onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-08 17:17:15 +00:00

Author	SHA1	Message	Date
Ryan Hill	bc7f8ad4f8	Revert python version	2021-05-03 17:11:50 -07:00
Ryan Hill	edf0522b35	Point to RyanWinGPU	2021-05-03 16:38:26 -07:00
Ryan Hill	9287e602b7	Revert "Test adding unload method for shared providers" This reverts commit `c427b78799`.	2021-05-03 16:31:32 -07:00
Ryan Hill	58a675ec2b	Revert "Disable DLL test" This reverts commit `e901cb93aa`.	2021-05-03 16:31:12 -07:00
Ryan Hill	acba6779df	Revert "Python test" This reverts commit `c7ec2cfe98`.	2021-05-03 16:30:48 -07:00
Ryan Hill	c7ec2cfe98	Python test	2021-04-30 21:06:21 -07:00
Ryan Hill	e901cb93aa	Disable DLL test	2021-04-30 15:17:03 -07:00
Ryan Hill	c427b78799	Test adding unload method for shared providers	2021-04-29 22:41:11 -07:00
Ryan Hill	9df9325fcb	Update python version	2021-04-29 16:29:19 -07:00
Ryan Hill	2b650e6438	Change python version and disable dml	2021-04-29 15:39:35 -07:00
Ryan Hill	3e4199bf51	Test not using dml in pipeline	2021-04-29 12:27:05 -07:00
Ryan Hill	5a3a8fe2d0	Fix python shutdown	2021-04-28 23:53:28 -07:00
Ryan Hill	3717440937	ERROR -> WARNING	2021-04-28 19:37:38 -07:00
Ryan Hill	0605567e48	Fix more cmake files	2021-04-28 02:59:46 -07:00
Ryan Hill	42d744b51b	Move unloading back to the OrtEnv as there are multiple Environments created during a session. Remove some library dependencies for tests.	2021-04-28 01:32:34 -07:00
Ryan Hill	92be95d082	Add more diagnostics	2021-04-28 00:41:40 -07:00
Ryan Hill	99cc4418b0	Free more global allocations before library unloads	2021-04-27 20:03:52 -07:00
Ryan Hill	41144546d9	Move unloading of shared providers into Environment	2021-04-27 17:45:04 -07:00
Ryan Hill	784743e1a1	Revert profiler change	2021-04-26 16:43:14 -07:00
Ryan Hill	b89e981763	Fix merge break	2021-04-26 16:41:58 -07:00
Ryan Hill	9405a9cc72	Merge with master	2021-04-26 16:41:45 -07:00
RandySheriffH	40568d8821	Wait for dispatch done in RunParallelSection to fix random TP UT crash (#7443) * wait for dispatch done in RunParallelSection * pass worker_fn by value * cancel move * only move work_fn when it is lastly referred Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2021-04-26 14:12:10 -07:00
Zhang Lei	ada0fbbd2d	Implement qlinear concat and unit test. (#7341) * Implement qlinear concat and unit test. Add quantization tools for QLinearConcat and it quantization tests. * Add kernel def hash for QLinearConcat. * Change according to PR. Add qdq transformer support for QLinearConcat. * Add QDQ Transformer unittest. Fix typo on domain. * remove dup logic of no use. * fix x86 build error. * Update operator docs.	2021-04-26 13:38:40 -07:00
Changming Sun	b5592856a7	Remove thread pool's cancel method and suppress some warnings (#7411)	2021-04-26 09:33:48 -07:00
Vincent Wang	368e4a324f	SqueezeGrad Bugfix (#7412) * squeezegrad bugfix * fix ut Co-authored-by: Vincent Wang <weicwang@microsoft.com>	2021-04-26 09:12:03 +08:00
Weixing Zhang	ca9b3f18e9	Explicitly pass cuda stream to thrust function rather than use cuda default stream implicitly (#7414) * Pass cuda stream to thrust function to not use default stream. In the commit `299ace0`, ORT has been changed to not use cuda default stream. * update amd_hipify.py * remove un-necessary stream sync Co-authored-by: Weixing Zhang <wezhan@microsoft.com>	2021-04-25 01:18:56 -07:00
Ryan Hill	8c1b52cf6e	Test disabling profiler	2021-04-24 02:43:06 -07:00
Ryan Hill	393286f484	Fix deleting registry at right time.	2021-04-24 01:15:14 -07:00
jeyblu	b9cbbc41ff	dnnl matmul tensor dimension check (#7383)	2021-04-23 23:17:22 -07:00
Ryan Hill	06eac846b8	Add diagnostics	2021-04-23 21:39:41 -07:00
RandySheriffH	afe912d47c	Reduce perf gap between thread pool and omp (#7333) * add async dispatch * minor renamings * build py38 * restore yml * fix sync up issue between dispatch thread and main * fix comments * refactor SummonWorker and rename to RunInParallelInternal	2021-04-23 18:36:36 -07:00
Thiago Crepaldi	410a81b21b	Add support for ORTModule to execute the graph when ONNX drops unused… (#7424)	2021-04-23 18:10:57 -07:00
Chen Fu	f4f2cc1a00	Add batch interface to floating point GEMM (#7323) Currently in high dimension matmul, we call multiple GEMM sequentially. In this change we execute these GEMMs in parallel, removing barriers between two adjacent GEMM operations. Performance tested with Bert and T5 model. Bert model shows no noticeable perf differences, as the heavy lifting is done by the attention operator, which is not changed in this PR. In T5 model, we see no regression on low parallel threads (x4), and performance improvement is more pronounced in high number of threads (8-16). T5 shows 10% speedup with 16 threads. With profiling, we can see the most expensive MatMul operators in T5 achieves around 20% speedup with 16 threads. Co-authored-by: Chen Fu <fuchen@microsoft.com>	2021-04-23 17:34:22 -07:00
Suffian Khan	7a3c1787af	Add CI pipeline to publish Python training package targeting Rocm (#7417 ) * first attempt rocm training wheel * modifications needed to python packaging pipeline for Rocm 4.1 * changges to not conflict with cuda missed stage1 changes remove package push add option r to getopt try again without python install try again without python install try again without python install split pipelines and add back push to remote storage try on cuda gpu pool try again try again try running without az subscription set try again on original pipeline change pool passing AMD Rocm whl on AMD-GPU pool split rocm pipeline from cuda pipeline remove comments * try adding Rocm tests as well * try with tests in place * fix trailing ws * add training data * try again as root for tests * use python3 * typo * try to map video, render group into container * try again * try again * try to avoid yum error code * make UID 1001 * try without yum downgrade * define rocm_version=None * remove CUDA related comments for Rocm Dockerfile * Dont pin nightly torch torchvision torchtext versions as they expire (for now nightly is required for Rocm 4.1) * missed requirements-rocm.txt from last commit * fix whitespace	2021-04-23 17:22:31 -07:00
Ryan Hill	fca6d30e46	Don't unload library	2021-04-23 15:54:58 -07:00
M. Zeeshan Siddiqui	34ebf7d3dd	Partial graph execution made simple. (#7324 ) * Python changes. * C++ changes. * fixes/hacks. * more hacks. * perf. * changes. * changes. * re-architect partial graph execution and remove iobinding. * changes. * refactor. * prevent copies from python to c++. * perf. * merge conflicts. * misc. * fix merge conflicts and tests. * Ifdef partial executor. * PR feedback. * Delete ORT Task et al. * Clean up. * clean up. * Restore SetOutputMLValue(). * PR feedback. * Re-enable disabled ORTModule tests. * PR feedback. * PR feedback.	2021-04-23 15:09:18 -07:00
Changming Sun	5208231126	Fix some warnings in our CUDA code (#7436)	2021-04-23 14:56:20 -07:00
Suffian Khan	8889e717eb	add gather elements (#7435)	2021-04-23 14:05:17 -07:00
Weixing Zhang	ef72764960	Build would fail when nccl is not under standard path (--nccl_home) (#7402) * Build would fail when nccl is not under standard path (--nccl_home) * fix build for ROCm EP	2021-04-23 14:04:22 -07:00
Changming Sun	9f683bae78	Revert the TRT change and move the build to a new pool (#7434)	2021-04-23 14:00:26 -07:00
Ryan Hill	96fa0845a2	Diagnostics	2021-04-23 13:39:49 -07:00
satyajandhyala	979d63159b	Add level two optimizations for constant propagation transformation. (#7410 ) * Made the python script generating the testcases modular. * Modified RemoveBackToBackCasts function to remove cast even if the parent node has other consumers. * Modified InsertCastNodes to update the graph consistently for other functions to work. * Moved ConcatNames function to the top. * PropagateBackward/SearchUpstream and PropagateFP16CastsFromOutputsToInputs insert FP32 casts if the level >1 in order to propagate FP16 casts backwards. * Added new testcases for level two setting.	2021-04-23 13:25:54 -07:00
Chi Lo	f1c3f3fcc1	TRT EP memory leak fix (#7415) * fix memory leak * small refactor * code refactor	2021-04-23 12:04:23 -07:00
Guoyu Wang	043883b52d	[CoreML EP] Add Gemm/MatMul support (#7403) * [CoreML EP]Add gemm/matmul support * remove changes in get_execution_providers * Address CR comments * Switch to list initialization * Minor update	2021-04-23 11:54:59 -07:00
Yufeng Li	e7912736b9	Add qdq propagation support (#7404) * Add qdq propagation support * add more unit tests	2021-04-23 11:17:44 -07:00
Tang, Cheng	1fa6d8fe1c	support loading external execution provider from python frontend (#7332 ) * initial dynamic load example * support load EP in the provider options * support dynamic load EP in orttrainer * split the provider interface; fix comments in pr * remove experiment code * add test * remove useless file * add test model file;fix linux brewak * fix linux build and missing file * fix python build * fix python build * fix python binding * fix python test * fix runtime path for posix env * exclude the shared library from minimal build * fix comments in pr; * seperate the provider shared lib loading * excluded from minimal / macos / ios build * skip copy the provider shared lib for minimal build and mac os * fix macos build * exclude the test for macos build * exclude from andorid build * exclude from web assembly build * enable the invalid ep test Co-authored-by: Cheng Tang <chenta@microsoft.com>	2021-04-23 09:54:09 -07:00
Ashwini Khade	75e054cd33	pick onnx release candidate (#7177 ) * pick onnx release candidate * fix typo * filter batchnorm tests * add implementation for reshape 14 * add identity op kernel for opset 14 * fix typo * update onnx commit * update commit to latest master * add hashes for new kernel registrations and update 1 * TEST commit * update onnx back to right commit * Update onnx to latest in rel-1.9.0 * temp fix * remove nonzeroshapesetter transformer * pick rel branch latest commit * fix build failures * fix build failures * fix build failures * update the commit to latest in release branch * add test filters for not impemented op14 ops in c# tests * plus review comments	2021-04-22 23:57:09 -07:00
Guoyu Wang	d414039189	Add ios coreml ci, and speedup ios ci run (#7420)	2021-04-22 23:41:58 -07:00
Ryan Hill	5c6910ed9c	Fix memory cleanup on unload	2021-04-22 23:07:56 -07:00
sumitsays	d67c86265b	Enabled fp16-inception-v1 test (#7406) Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com>	2021-04-22 23:05:03 -07:00

4768 commits