
4768 commits

Author SHA1 Message Date
Ryan Hill
bc7f8ad4f8 Revert python version 2021-05-03 17:11:50 -07:00
Ryan Hill
edf0522b35 Point to RyanWinGPU 2021-05-03 16:38:26 -07:00
Ryan Hill
9287e602b7 Revert "Test adding unload method for shared providers"
This reverts commit c427b78799.
2021-05-03 16:31:32 -07:00
Ryan Hill
58a675ec2b Revert "Disable DLL test"
This reverts commit e901cb93aa.
2021-05-03 16:31:12 -07:00
Ryan Hill
acba6779df Revert "Python test"
This reverts commit c7ec2cfe98.
2021-05-03 16:30:48 -07:00
Ryan Hill
c7ec2cfe98 Python test 2021-04-30 21:06:21 -07:00
Ryan Hill
e901cb93aa Disable DLL test 2021-04-30 15:17:03 -07:00
Ryan Hill
c427b78799 Test adding unload method for shared providers 2021-04-29 22:41:11 -07:00
Ryan Hill
9df9325fcb Update python version 2021-04-29 16:29:19 -07:00
Ryan Hill
2b650e6438 Change python version and disable dml 2021-04-29 15:39:35 -07:00
Ryan Hill
3e4199bf51 Test not using dml in pipeline 2021-04-29 12:27:05 -07:00
Ryan Hill
5a3a8fe2d0 Fix python shutdown 2021-04-28 23:53:28 -07:00
Ryan Hill
3717440937 ERROR -> WARNING 2021-04-28 19:37:38 -07:00
Ryan Hill
0605567e48 Fix more cmake files 2021-04-28 02:59:46 -07:00
Ryan Hill
42d744b51b Move unloading back to the OrtEnv as there are multiple Environments created during a session.
Remove some library dependencies for tests.
2021-04-28 01:32:34 -07:00
Ryan Hill
92be95d082 Add more diagnostics 2021-04-28 00:41:40 -07:00
Ryan Hill
99cc4418b0 Free more global allocations before library unloads 2021-04-27 20:03:52 -07:00
Ryan Hill
41144546d9 Move unloading of shared providers into Environment 2021-04-27 17:45:04 -07:00
Ryan Hill
784743e1a1 Revert profiler change 2021-04-26 16:43:14 -07:00
Ryan Hill
b89e981763 Fix merge break 2021-04-26 16:41:58 -07:00
Ryan Hill
9405a9cc72 Merge with master 2021-04-26 16:41:45 -07:00
RandySheriffH
40568d8821
Wait for dispatch done in RunParallelSection to fix random TP UT crash (#7443)
* wait for dispatch done in RunParallelSection

* pass worker_fn by value

* cancel move

* only move work_fn when it is last referenced

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2021-04-26 14:12:10 -07:00
Zhang Lei
ada0fbbd2d
Implement qlinear concat and unit test. (#7341)
* Implement qlinear concat and unit test.
Add quantization tools for QLinearConcat and its quantization tests.

* Add kernel def hash for QLinearConcat.

* Changes per PR feedback. Add QDQ transformer support for QLinearConcat.

* Add QDQ Transformer unittest. Fix typo on domain.

* remove duplicated, unused logic.

* fix x86 build error.

* Update operator docs.
2021-04-26 13:38:40 -07:00
Changming Sun
b5592856a7
Remove thread pool's cancel method and suppress some warnings (#7411) 2021-04-26 09:33:48 -07:00
Vincent Wang
368e4a324f
SqueezeGrad Bugfix (#7412)
* squeezegrad bugfix

* fix ut

Co-authored-by: Vincent Wang <weicwang@microsoft.com>
2021-04-26 09:12:03 +08:00
Weixing Zhang
ca9b3f18e9
Explicitly pass cuda stream to thrust function rather than use cuda default stream implicitly (#7414)
* Pass cuda stream to thrust function to not use default stream.

In the commit 299ace0, ORT has been changed to not use cuda default stream.

* update amd_hipify.py

* remove unnecessary stream sync

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
2021-04-25 01:18:56 -07:00
Ryan Hill
8c1b52cf6e Test disabling profiler 2021-04-24 02:43:06 -07:00
Ryan Hill
393286f484 Fix deleting registry at right time. 2021-04-24 01:15:14 -07:00
jeyblu
b9cbbc41ff
dnnl matmul tensor dimension check (#7383) 2021-04-23 23:17:22 -07:00
Ryan Hill
06eac846b8 Add diagnostics 2021-04-23 21:39:41 -07:00
RandySheriffH
afe912d47c
Reduce perf gap between thread pool and omp (#7333)
* add async dispatch

* minor renamings

* build py38

* restore yml

* fix sync up issue between dispatch thread and main

* fix comments

* refactor SummonWorker and rename to RunInParallelInternal
2021-04-23 18:36:36 -07:00
Thiago Crepaldi
410a81b21b
Add support for ORTModule to execute the graph when ONNX drops unused… (#7424) 2021-04-23 18:10:57 -07:00
Chen Fu
f4f2cc1a00
Add batch interface to floating point GEMM (#7323)
Currently, in high-dimensional MatMul we call multiple GEMMs sequentially. In this change we execute these GEMMs in parallel, removing the barrier between two adjacent GEMM operations.

Performance was tested with the BERT and T5 models. BERT shows no noticeable performance difference, because the heavy lifting is done by the Attention operator, which this PR does not change. For T5, we see no regression at a low thread count (4 threads), and the improvement is more pronounced at higher thread counts (8-16). T5 shows a 10% speedup with 16 threads. Profiling shows that the most expensive MatMul operators in T5 achieve around a 20% speedup with 16 threads.

Co-authored-by: Chen Fu <fuchen@microsoft.com>
2021-04-23 17:34:22 -07:00
Suffian Khan
7a3c1787af
Add CI pipeline to publish Python training package targeting Rocm (#7417)
* first attempt rocm training wheel

* modifications needed to python packaging pipeline for Rocm 4.1

* changes to not conflict with cuda

missed stage1 changes

remove package push

add option r to getopt

try again without python install

try again without python install

try again without python install

split pipelines and add back push to remote storage

try on cuda gpu pool

try again

try again

try running without az subscription set

try again on original pipeline

change pool

passing AMD Rocm whl on AMD-GPU pool

split rocm pipeline from cuda pipeline

remove comments

* try adding Rocm tests as well

* try with tests in place

* fix trailing ws

* add training data

* try again as root for tests

* use python3

* typo

* try to map video, render group into container

* try again

* try again

* try to avoid yum error code

* make UID 1001

* try without yum downgrade

* define rocm_version=None

* remove CUDA related comments for Rocm Dockerfile

* Don't pin nightly torch, torchvision, torchtext versions as they expire (for now nightly is required for Rocm 4.1)

* missed requirements-rocm.txt from last commit

* fix whitespace
2021-04-23 17:22:31 -07:00
Ryan Hill
fca6d30e46 Don't unload library 2021-04-23 15:54:58 -07:00
M. Zeeshan Siddiqui
34ebf7d3dd
Partial graph execution made simple. (#7324)
* Python changes.

* C++ changes.

* fixes/hacks.

* more hacks.

* perf.

* changes.

* changes.

* re-architect partial graph execution and remove iobinding.

* changes.

* refactor.

* prevent copies from python to c++.

* perf.

* merge conflicts.

* misc.

* fix merge conflicts and tests.

* Ifdef partial executor.

* PR feedback.

* Delete ORT Task et al.

* Clean up.

* clean up.

* Restore SetOutputMLValue().

* PR feedback.

* Re-enable disabled ORTModule tests.

* PR feedback.

* PR feedback.
2021-04-23 15:09:18 -07:00
Changming Sun
5208231126
Fix some warnings in our CUDA code (#7436) 2021-04-23 14:56:20 -07:00
Suffian Khan
8889e717eb
add gather elements (#7435) 2021-04-23 14:05:17 -07:00
Weixing Zhang
ef72764960
Build would fail when nccl is not under standard path (--nccl_home) (#7402)
* Build would fail when nccl is not under standard path (--nccl_home)

* fix build for ROCm EP
2021-04-23 14:04:22 -07:00
Changming Sun
9f683bae78
Revert the TRT change and move the build to a new pool (#7434) 2021-04-23 14:00:26 -07:00
Ryan Hill
96fa0845a2 Diagnostics 2021-04-23 13:39:49 -07:00
satyajandhyala
979d63159b
Add level two optimizations for constant propagation transformation. (#7410)
* Made the python script generating the testcases modular.

* Modified RemoveBackToBackCasts function to remove cast even if the parent node has other consumers.

* Modified InsertCastNodes to update the graph consistently for other functions to work.

* Moved ConcatNames function to the top.

* PropagateBackward/SearchUpstream and PropagateFP16CastsFromOutputsToInputs insert FP32 casts if the level >1 in order to propagate FP16 casts backwards.

* Added new testcases for level two setting.
2021-04-23 13:25:54 -07:00
Chi Lo
f1c3f3fcc1
TRT EP memory leak fix (#7415)
* fix memory leak

* small refactor

* code refactor
2021-04-23 12:04:23 -07:00
Guoyu Wang
043883b52d
[CoreML EP] Add Gemm/MatMul support (#7403)
* [CoreML EP]Add gemm/matmul support

* remove changes in get_execution_providers

* Address CR comments

* Switch to list initialization

* Minor update
2021-04-23 11:54:59 -07:00
Yufeng Li
e7912736b9
Add qdq propagation support (#7404)
* Add qdq propagation support

* add more unit tests
2021-04-23 11:17:44 -07:00
Tang, Cheng
1fa6d8fe1c
support loading external execution provider from python frontend (#7332)
* initial dynamic load example

* support load EP in the provider options

* support dynamic load EP in orttrainer

* split the provider interface; fix comments in pr

* remove experiment code

* add test

* remove useless file

* add test model file; fix linux break

* fix linux build and missing file

* fix python build

* fix python build

* fix python binding

* fix python test

* fix runtime path for posix env

* exclude the shared library from minimal build

* fix comments in pr;

* separate the provider shared lib loading

* excluded from minimal / macos / ios build

* skip copying the provider shared lib for minimal build and macOS

* fix macos build

* exclude the test for macos build

* exclude from android build

* exclude from WebAssembly build

* enable the invalid ep test

Co-authored-by: Cheng Tang <chenta@microsoft.com>
2021-04-23 09:54:09 -07:00
Ashwini Khade
75e054cd33
pick onnx release candidate (#7177)
* pick onnx release candidate

* fix typo

* filter batchnorm tests

* add implementation for reshape 14

* add identity op kernel for opset 14

* fix typo

* update onnx commit

* update commit to latest master

* add hashes for new kernel registrations and update 1

* TEST commit

* update onnx back to right commit

* Update onnx to latest in rel-1.9.0

* temp fix

* remove nonzeroshapesetter transformer

* pick rel branch latest commit

* fix build failures

* fix build failures

* fix build failures

* update the commit to latest in release branch

* add test filters for not-implemented op14 ops in C# tests

* plus review comments
2021-04-22 23:57:09 -07:00
Guoyu Wang
d414039189
Add ios coreml ci, and speedup ios ci run (#7420) 2021-04-22 23:41:58 -07:00
Ryan Hill
5c6910ed9c Fix memory cleanup on unload 2021-04-22 23:07:56 -07:00
sumitsays
d67c86265b
Enabled fp16-inception-v1 test (#7406)
Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com>
2021-04-22 23:05:03 -07:00