Commit graph

4027 commits

Author SHA1 Message Date
Zhang Lei
f77ff1bc3d
Quantization support for split operator with its NHWC support (#6107)
* Make split working for quantization.

* NHWC transformer support for split operator

* Refactor some code according to feedback. Will add test cases soon.

* Fix build error on windows.

* Add test case for split op on uint8_t support

* Add nhwc_transformer_test for split uint8_t support

* Some changes according to PR feedback.
2021-01-13 10:05:34 -08:00
Dmitri Smirnov
6b73bae035
Java: add Semmle to Java publishing pipelines (#6326)
Add Semmle to Java API pipeline
  Add security results publishing and add Java GPU.
2021-01-12 15:12:13 -08:00
Tim Harris
aacc8dbfa3
Remove false positive prefast warning from threadpool (#6324) 2021-01-12 14:47:52 -08:00
Ashwini Khade
0ed56d491a
fix opset imports for function body (#6287)
* fix function opsets

* add tests and update onnx

* changes per review comments

* add comments

* plus updates

* build fix
2021-01-12 13:44:36 -08:00
Tim Harris
b491d7c179
Avoid false sharing on thread pool data structures (#6298)
Description: This change adds alignment and padding to avoid false sharing on fields in the thread pool. It also adds a new microbenchmark to profile thread-pool performance over short loops.

Motivation and Context
MobileNet on a 2*12-core system showed a performance gap between the ORT thread pool and OpenMP. One cause appeared to be false sharing on fields in the thread pool: ThreadPoolParallelSection::tasks_finished (which the main thread spins on waiting for workers to complete a loop), and the RunQueue::front_ and back_ fields (used respectively by the worker thread and the main thread).
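To illustrate the padding idea (this is not the thread pool's actual code), the sketch below uses Python's `ctypes` to model two hot fields, such as a queue's front and back indices, with and without a full cache line of padding between them. It assumes a 64-byte cache line, which is typical on x86-64. In C++ the same effect would usually come from `alignas(64)` on each hot field, which also aligns the base address; `ctypes` here only models the field offsets.

```python
import ctypes

CACHE_LINE = 64  # assumed cache-line size in bytes (typical on x86-64)

class Unpadded(ctypes.Structure):
    # Both hot fields share one cache line, so a write by one thread
    # invalidates the line in the other thread's cache (false sharing).
    _fields_ = [("front", ctypes.c_uint64), ("back", ctypes.c_uint64)]

class PaddedU64(ctypes.Structure):
    # One hot 8-byte field followed by padding that fills the rest of the line.
    _fields_ = [
        ("value", ctypes.c_uint64),
        ("_pad", ctypes.c_uint8 * (CACHE_LINE - ctypes.sizeof(ctypes.c_uint64))),
    ]

class Padded(ctypes.Structure):
    # Hypothetical layout: "front" written by a worker, "back" by the main thread.
    _fields_ = [("front", PaddedU64), ("back", PaddedU64)]

def line_index(offset: int, line: int = CACHE_LINE) -> int:
    """Cache line a byte offset falls in, assuming a line-aligned base address."""
    return offset // line

for cls in (Unpadded, Padded):
    print(cls.__name__, line_index(cls.front.offset), line_index(cls.back.offset))
```

Padding each hot field out to a whole line trades memory for isolation: the two fields can no longer land on the same line, whatever the base address.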

The additional microbenchmark BM_ThreadPoolSimpleParallelFor measures loops of different sizes at different thread counts. The results below are from a machine with 2*14-core processors (E5-2690 v4) running with 1, 14, 15, and 28 threads. In each test, N threads run a loop with N iterations, so the ideal result is a time that stays constant as threads are added (although power-management effects also help at very low thread counts). The loop durations (100000, 10000, 1000) correspond roughly to 200us, 20us, and 2us on this machine.

Before change (columns: benchmark name, wall time, CPU time, iterations):
BM_ThreadPoolSimpleParallelFor/1/1/100000/real_time 17153 us 17154 us 32
BM_ThreadPoolSimpleParallelFor/14/14/100000/real_time 22553 us 22553 us 30
BM_ThreadPoolSimpleParallelFor/15/15/100000/real_time 21521 us 21521 us 29
BM_ThreadPoolSimpleParallelFor/28/28/100000/real_time 24111 us 24111 us 24
BM_ThreadPoolSimpleParallelFor/1/1/10000/real_time 1719 us 1719 us 407
BM_ThreadPoolSimpleParallelFor/14/14/10000/real_time 3409 us 3409 us 200
BM_ThreadPoolSimpleParallelFor/15/15/10000/real_time 3541 us 3541 us 201
BM_ThreadPoolSimpleParallelFor/28/28/10000/real_time 4576 us 4576 us 151
BM_ThreadPoolSimpleParallelFor/1/1/1000/real_time 174 us 174 us 4017
BM_ThreadPoolSimpleParallelFor/14/14/1000/real_time 1586 us 1586 us 402
BM_ThreadPoolSimpleParallelFor/15/15/1000/real_time 1586 us 1586 us 397
BM_ThreadPoolSimpleParallelFor/28/28/1000/real_time 2864 us 2864 us 232

After change (columns: benchmark name, wall time, CPU time, iterations):
BM_ThreadPoolSimpleParallelFor/1/1/100000/real_time 17160 us 17160 us 33
BM_ThreadPoolSimpleParallelFor/14/14/100000/real_time 20989 us 20989 us 31
BM_ThreadPoolSimpleParallelFor/15/15/100000/real_time 22286 us 22286 us 31
BM_ThreadPoolSimpleParallelFor/28/28/100000/real_time 24631 us 24631 us 25
BM_ThreadPoolSimpleParallelFor/1/1/10000/real_time 1718 us 1718 us 407
BM_ThreadPoolSimpleParallelFor/14/14/10000/real_time 2868 us 2868 us 242
BM_ThreadPoolSimpleParallelFor/15/15/10000/real_time 2907 us 2907 us 240
BM_ThreadPoolSimpleParallelFor/28/28/10000/real_time 3872 us 3872 us 186
BM_ThreadPoolSimpleParallelFor/1/1/1000/real_time 175 us 175 us 3938
BM_ThreadPoolSimpleParallelFor/14/14/1000/real_time 933 us 933 us 659
BM_ThreadPoolSimpleParallelFor/15/15/1000/real_time 912 us 912 us 591
BM_ThreadPoolSimpleParallelFor/28/28/1000/real_time 1976 us 1976 us 317
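The relative changes in these tables can be checked with a little arithmetic. This sketch copies the wall-time column from the before and after runs and computes the percent reduction per configuration, plus the N-thread to 1-thread time ratio (1.0 would be the constant-time ideal described above). The helper names are illustrative.

```python
# Wall times in microseconds from the tables above: (threads, loop_len) -> us
BEFORE = {
    (1, 100000): 17153, (14, 100000): 22553, (15, 100000): 21521, (28, 100000): 24111,
    (1, 10000): 1719, (14, 10000): 3409, (15, 10000): 3541, (28, 10000): 4576,
    (1, 1000): 174, (14, 1000): 1586, (15, 1000): 1586, (28, 1000): 2864,
}
AFTER = {
    (1, 100000): 17160, (14, 100000): 20989, (15, 100000): 22286, (28, 100000): 24631,
    (1, 10000): 1718, (14, 10000): 2868, (15, 10000): 2907, (28, 10000): 3872,
    (1, 1000): 175, (14, 1000): 933, (15, 1000): 912, (28, 1000): 1976,
}

def reduction_pct(before: float, after: float) -> float:
    """Percent reduction in wall time; negative means the run got slower."""
    return 100.0 * (before - after) / before

def scaling_ratio(times: dict, threads: int, loop_len: int) -> float:
    """Time at N threads relative to 1 thread; 1.0 is the constant-time ideal."""
    return times[(threads, loop_len)] / times[(1, loop_len)]

for key in sorted(BEFORE, key=lambda k: (-k[1], k[0])):
    print(key, f"{reduction_pct(BEFORE[key], AFTER[key]):+.1f}%")
```

On these numbers the largest reductions are for the shortest (1000) loops, roughly 41 to 42% at 14 and 15 threads and 31% at 28 threads, while the 100000 loops at 15 and 28 threads are slightly slower after the change.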
2021-01-12 19:58:41 +00:00
Tianlei Wu
ec81e29c84
Add longformer to python package (#6314)
* add longformer to python package
* move test-related scripts and data to a new folder
2021-01-12 10:38:39 -08:00
Zhang Lei
a8257666bd
Support 1D input for Conv + Mul/Add fusion optimizer with test (#6295)
* Support 1D input (N C H) for Conv + Mul/Add fusion optimizer with test cases and test models.
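The fusion rests on the linearity of convolution. For a 1D input X of shape (N, C, H), a Mul by a per-output-channel scale s_c, or an Add of a per-channel offset a_c, after the Conv can be folded into the Conv's weights W and bias B. This is a sketch assuming group = 1 and a constant Mul/Add operand that broadcasts per output channel (for example shape (C_out, 1)); stride, padding, and dilation are omitted because they do not change the argument.

```latex
Y_{n,c,h} = \sum_{k}\sum_{j} W_{c,k,j}\,X_{n,k,h+j} + B_c

s_c\,Y_{n,c,h} = \sum_{k}\sum_{j} \underbrace{(s_c W_{c,k,j})}_{W'_{c,k,j}}\,X_{n,k,h+j} + \underbrace{s_c B_c}_{B'_c}

Y_{n,c,h} + a_c = \sum_{k}\sum_{j} W_{c,k,j}\,X_{n,k,h+j} + \underbrace{(B_c + a_c)}_{B'_c}
```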
2021-01-12 09:53:13 -08:00
Luyao Ren
3b3e698674
Remove abs in LpPool (#6303) 2021-01-12 01:39:13 -08:00
Tianlei Wu
a038924bee
update transformers required package versions (#6315) 2021-01-12 00:10:56 -08:00
Changming Sun
c43ca45c4f
Force reinstall onnx python package on Windows (#6309) 2021-01-11 22:12:56 -08:00
Vincent Wang
ac5b5e5d1e
More dtypes for Equal CUDA kernel (#6288)
Co-authored-by: Vincent Wang <weicwang@microsoft.com>
2021-01-12 10:46:21 +08:00
Tianlei Wu
938e65d878
add --sequence_lengths option (#6285) 2021-01-11 14:26:22 -08:00
Chun-Wei Chen
84024bdfa9
Enable ONNX backend test of SequenceProto input/output (#6043)
* assert sequence tensor and remove skips

* update testdata json

* use ONNX 1.8 in cgmanifest.json

* use previous commit as a workaround

* update ONNX commit ID in docker

* skip test_maxpool_2d_dilations test for now

* update function name
2021-01-11 11:30:33 -08:00
Changming Sun
5084ce0969
Update nuget build (#6297)
1. Update the ProtoSrc path. The old one is not used anymore.
2. Regenerate OnnxMl.cs
3. Delete some unused code in tools/ci_build/build.py
4. Avoid setting intra_op_param.thread_pool_size in ModelTests in the OpenMP build.
5. Fix a typo in the C API pipeline.
2021-01-11 10:49:05 -08:00
Jesse Benson
fa851bff66 Add workaround to remove ROCm-specific binary-elementwise files. 2021-01-11 10:00:18 -08:00
Jesse Benson
1059bfaf75 Workaround for static_cast<double>(half) 2021-01-11 10:00:18 -08:00
Ye Wang
da952a9a20
A list of changes in transformers tool (#6224)
* longformer fp16 e2e

* add fp16/fp32 parity check helper file

* exclude nodes with subgraphs in profiling

* use onnxconverter_common to do fp32->fp16

* add version check for onnxconverter_common

* remove helper file

* add package installation to notebooks and script
2021-01-08 11:11:14 -08:00
Tianlei Wu
ac5ca2bbe0
fix data_ptr assertion error for past_sequence_length=0 in GPT-2 (#6284)
fix io binding crash for past_sequence_length=0
2021-01-07 23:43:50 -08:00
Hariharan Seshadri
7fc827a8a1
Fix Min/Max CPU kernels for float16 type (#6205) 2021-01-07 23:32:52 -08:00
Ye Wang
a72fcbd5fc
Add helper to compare model with different precision (#6270)
* add parity_check_helper.py

* add real example

* remove lines
2021-01-07 16:54:56 -08:00
Edward Chen
04287ec770
Increase timeout for Linux GPU CUDA11 build. (#6280) 2021-01-07 15:44:42 -08:00
Edward Chen
c10948699b
Rename MakeString and ParseString functions. (#6272)
Rename MakeString to MakeStringWithClassicLocale, MakeStringLite to MakeString, *ParseString to *ParseStringWithClassicLocale.
Add missing pass-through versions of MakeStringWithClassicLocale for string types.
2021-01-07 15:43:42 -08:00
Tianlei Wu
b80e8ce6a5
rename past to past_key_values for GPT-2 (#6269)
rename past to past_key_values for transformers 4.*
2021-01-07 11:12:04 -08:00
Xavier Dupré
481a2cdf61
Add script to preprocess python documentation before publishing (#6129)
* add script to preprocess Python documentation before publishing
2021-01-07 19:23:59 +01:00
Edward Chen
d761571afc
Deprecate Python global configuration functions [Part 2] (#6171)
Update Python API to allow more flexibility for setting providers and provider options.

The providers argument (InferenceSession/TrainingSession constructors, InferenceSession.set_providers()) now also accepts a tuple of (name, options dict).
Fix get_available_providers() API (and the corresponding function in the C API) to return the providers in default priority order. It can now be used as a starting point for the providers argument while maintaining the default priority order.
Convert some usages of the deprecated global configuration functions to use EP-specific options instead.
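The tuple form of the providers argument can be illustrated with a small normalizing helper. `normalize_providers` below is hypothetical, not the onnxruntime implementation: it accepts a list mixing bare provider names and `(name, options dict)` tuples and splits it into parallel name and option lists, keeping list order as priority order. The provider names and the `device_id` option in the usage lines are examples only.

```python
from typing import Dict, List, Sequence, Tuple, Union

# A provider entry is either a bare name or a (name, options) tuple.
ProviderSpec = Union[str, Tuple[str, Dict[str, object]]]

def normalize_providers(
    providers: Sequence[ProviderSpec],
) -> Tuple[List[str], List[Dict[str, object]]]:
    """Hypothetical helper: split mixed provider specs into parallel lists of
    names and option dicts, preserving the caller's priority order."""
    names: List[str] = []
    options: List[Dict[str, object]] = []
    for spec in providers:
        if isinstance(spec, str):
            name, opts = spec, {}
        elif (isinstance(spec, tuple) and len(spec) == 2
              and isinstance(spec[0], str) and isinstance(spec[1], dict)):
            name, opts = spec
        else:
            raise ValueError(f"invalid provider spec: {spec!r}")
        if name in names:
            raise ValueError(f"duplicate provider: {name}")
        names.append(name)
        options.append(dict(opts))  # copy so the caller's dict is not aliased
    return names, options

names, opts = normalize_providers(
    [("CUDAExecutionProvider", {"device_id": 0}), "CPUExecutionProvider"]
)
print(names)  # ['CUDAExecutionProvider', 'CPUExecutionProvider']
print(opts)   # [{'device_id': 0}, {}]
```

Rejecting malformed entries up front, rather than silently dropping them, mirrors the spirit of failing on unknown options described below.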

Update some EP-specific option parsing to fail on unknown options.

Other cleanup.
2021-01-07 10:10:55 -08:00
Hariharan Seshadri
bbc9ed908a
Fix VS 2017 build break (#6276) 2021-01-07 02:09:35 -08:00
Tang, Cheng
431604ef89
add bfloat16 to gathergrad type constraints (#6267)
Co-authored-by: Cheng Tang <chenta@microsoft.com>
2021-01-06 15:04:14 -08:00
Hariharan Seshadri
2347de4a9e
Fix Linux/Mac error message on input type mismatch (#6256) 2021-01-05 22:21:24 -08:00
Hariharan Seshadri
d42399e1b0
Allow querying a GraphProto's doc_string as part of ModelMetadata (#6248) 2021-01-05 22:18:03 -08:00
pengwa
eea3806db1
model parallel refinement (#6244)
* Megatron Transformation as a separate step

* remove useless header

* clang formatting

* Restructure Megatron transformer for subsequent changes

* fix comments
2021-01-06 10:30:22 +08:00
liqunfu
addb4b8c2b
Liqun/speech model loop to scan (#6070)
Provide a tool to convert Loop to Scan for Nuphar performance
Fix Nuphar CI pipeline failures.

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-01-05 15:15:23 -08:00
Edward Chen
ce6161cf67
Add MakeStringLite which uses current locale, update some MakeString call sites to use it instead. (#6252)
* Add MakeStringLite which uses current locale, update macros to use that to generate messages.

* Convert calls to MakeStringLite().
2021-01-04 19:27:24 -08:00
ashbhandare
493bf931c5
Add the Concat Slice Elimination transform, fix constant_folding transform (#5457)
* Add concat slice transform + test

* Cosmetic improvements in concat slice transform

* Remove unrelated file, fix comment, fix constant folding bug

* Add test onnx graph

* fix windows build

* Review comments

* review comment

Co-authored-by: Aishwarya <aibhanda@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-01-04 16:18:33 -08:00
Changming Sun
6fd9d34bb0
Remove a debug log in provider_test_utils.cc (#6200) 2021-01-04 13:58:11 -08:00
baijumeswani
93bf7c4d52
Documentation for distributed CI tests pipeline (#6140) 2021-01-04 10:09:39 -08:00
Olivia Jain
c8de3f355a
Refactor EP Perf Tool (#6202)
* merge master, keep postprocess status commit

* download float16.py every time

* using variables to reference eps

* adding ACL EP to ep perf tool

* make the absolute tolerance of the accuracy check configurable

* add acl to dict + remove commented line
2021-01-04 08:50:41 -08:00
Suffian Khan
46e0e4e69f
Tune BiasGeluGradDx kernel in approximation mode to avoid tanh(...) on ROCm (#6239)
* bias gelu grad use exp(...) instead

* update cuda to rocm

* missing semicolon

* comment

* remove dockerfile

* missing factor of two
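The change swaps tanh(...) for exp(...) in the approximate-mode gradient. A Python sketch of the identity involved, tanh(u) = 2 / (1 + exp(-2u)) - 1 (note the factor of two in the exponent), applied to the derivative of the common tanh approximation of GELU. The constants sqrt(2/pi) and 0.044715 are the widely used approximation constants and are assumed here; the bias add and incoming gradient of the real BiasGeluGradDx kernel are omitted, and this is not the ROCm kernel itself.

```python
import math

SQRT_2_OVER_PI = math.sqrt(2.0 / math.pi)
GELU_C = 0.044715  # cubic coefficient of the common tanh approximation of GELU

def tanh_via_exp(u: float) -> float:
    """tanh(u) = 2 / (1 + exp(-2u)) - 1, evaluated so exp() never overflows."""
    a = abs(u)
    t = 2.0 / (1.0 + math.exp(-2.0 * a)) - 1.0  # exp argument is <= 0
    return t if u >= 0 else -t  # tanh is odd

def gelu_approx(x: float) -> float:
    return 0.5 * x * (1.0 + tanh_via_exp(SQRT_2_OVER_PI * (x + GELU_C * x**3)))

def gelu_approx_grad(x: float) -> float:
    """d/dx of gelu_approx, using tanh'(u) = 1 - tanh(u)^2."""
    u = SQRT_2_OVER_PI * (x + GELU_C * x**3)
    t = tanh_via_exp(u)
    du_dx = SQRT_2_OVER_PI * (1.0 + 3.0 * GELU_C * x * x)
    return 0.5 * (1.0 + t) + 0.5 * x * (1.0 - t * t) * du_dx
```

Evaluating exp on -2|u| and restoring the sign afterwards keeps the exponent non-positive, so large negative inputs saturate to -1 instead of overflowing.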
2021-01-02 08:54:16 -08:00
Hector Li
ffb4b62826
Fix allocator issue for TensorRT IOBinding (#6240)
* Fix issue: https://github.com/microsoft/onnxruntime/issues/6094

Root cause: we didn't expose the OrtMemoryInfo for TRT, which causes an issue if the user wants to use IOBinding with TensorRT.

Short-term fix: add the OrtMemoryInfo for TRT. In the long term, the allocators for CUDA and TRT should be unified.
2020-12-31 20:15:43 -08:00
Changming Sun
1685167e46
Update manylinux docker image to the latest (#6242) 2020-12-31 19:57:04 -08:00
Changming Sun
d5cb17c679 Update BUILD.md 2020-12-31 17:20:00 -08:00
Xavier Dupré
cd14c1af29
Support double for operator ArgMin (#6222)
* Support double for operator ArgMin
* add test specifically for double
* add new test to pai-excluded-tests.txt
2020-12-31 11:25:46 +01:00
Xavier Dupré
84addcd2cf
Support double for operator ReduceMean, ReduceLogSumExp (#6217)
* Support double for operators ReduceMean, ReduceLogSumExp
2020-12-31 11:24:54 +01:00
Xavier Dupré
5968a91ea6
Support double for operator Gemm + fix bug in gemm implementation for cuda, rocm when sizeof(type) != sizeof(float) (#6223)
* Support double for operator Gemm
* fix type size while copying data in gemm operator for GPU
* fix type in gemm implementation for rocm
2020-12-31 11:24:16 +01:00
Xavier Dupré
70e2f96ef4
Support double for operator TopK + fix one bug in TopK implementation for GPU for double (#6220)
* Support double for operator TopK
* add static classes for topk/double
* fix cast issue in topk
2020-12-31 11:23:19 +01:00
Tracy Sharpe
ecb2e119e4
MLAS: handle MlasGemm(M/N/K==0) cases (#6238) 2020-12-30 23:25:10 -08:00
Hariharan Seshadri
4cc2ffef21
Support MLFloat16 type in Pow opset-12 CUDA kernel (#6233) 2020-12-30 20:41:59 -08:00
William Tambellini
39a988ce1c Upgrade build.py to assert for python 3.6+
Python 3.5 can no longer build today's master.
2020-12-30 20:17:09 -08:00
Changming Sun
c15a858745 Update the readme file 2020-12-30 20:16:45 -08:00
Changming Sun
3911105f09 Remove python 3.5 2020-12-30 20:16:45 -08:00
Changming Sun
1b23b28706
Remove MKLML/openblas/jemalloc build config (#6212) 2020-12-30 17:18:19 -08:00