onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-17 18:40:28 +00:00

Author	SHA1	Message	Date
PeixuanZuo	189aef2bea	[ADD] add skip layernorm to kernel explorer for ROCm EP (#12816 ) Description: Add skip layernorm to kernel explorer for profiling. Related PR: https://github.com/microsoft/onnxruntime/pull/12803 https://github.com/microsoft/onnxruntime/pull/12817 https://github.com/microsoft/onnxruntime/pull/12821	2022-09-20 17:17:01 +08:00
cloudhan	ffeba98a9d	Allow gemm profile by pass args from commandline (#12991 ) This allows us to quickly launch a microbench session, for example: ```bash python gemm_test.py T N float16 256 256 65536 ``` so that we can quickly see which one is the fastest.	2022-09-20 16:18:56 +08:00
Cheng	f26054deca	[XNNPACK] Support running in multi-thread with seperate pthreadpool (#11762 ) Description: XNNPACK uses pthreadpool as its internal threadpool implementation, which couples computation and parallelization, so it cannot use ORT's threadpool (Eigen/OpenMP based). This PR therefore enables pthreadpool in the XNNPACK EP. Case 1: pthreadpool simply coexists with the ORT threadpool. Experiment setup: the hardware is a RedMi8A with 8 cores, ARMv7. The two threadpools have the same pool size, from 1 to 8. Two models: mobilenet_v2 and mobilenet_egetppu. As the picture below shows, latency gets even higher from 5 threads onward. ![image](https://user-images.githubusercontent.com/9417365/190550127-2304adfe-97ac-4aeb-91a0-4606b5305a82.png) Case 2: The performance regression with 5 or more threads occurs because ORT threads keep spinning on the CPU and don't release it after computation finishes. That is equivalent to creating 5x2 threads for parallelization while we have only 8 CPU cores. So I manually disabled spinning after the ORT threadpool finished and enabled it on entering the ORT threadpool. The result is quite normal now. ![image](https://user-images.githubusercontent.com/9417365/190675230-0d85dd02-01f0-4255-967d-e3dbb2a1fe52.png) Case 3: Even though disabling spinning gives reasonable results, does the ORT threadpool still impact the performance of pthreadpool? In this experiment the ORT threadpool size (intra_thread_num) is set to 1, so only pthreadpool is created. Note that almost a third of the ops are run by the CPU EP. Surprisingly, disabling the ORT threadpool performs even better than creating two threadpools. ![image](https://user-images.githubusercontent.com/9417365/190556480-d6507396-d777-44fc-94e1-938d2b9bb7d7.png) Case 4: Use a unified threadpool between the CPU EP and the XNNPACK EP. It's the fastest among all, but it could be faster still with a workload partition strategy similar to the ORT threadpool's. ![image](https://user-images.githubusercontent.com/9417365/190674908-a68fd20f-bdf4-41f9-bf0a-76b304cda490.png) Co-authored-by: Jicheng Wen <jicwen@microsoft.com>	2022-09-20 16:02:15 +08:00
Pranav Sharma	a8b0f57d1a	Fix eager mode pipeline to accommodate recent allocator change. (#13000 )	2022-09-20 12:53:46 +08:00
cloudhan	0ddf4efbd9	Make PythonOp report dtype mismatch by name, instead of by using enum index (#13007 )	2022-09-20 12:29:30 +08:00
Chen Fu	77b567df66	test qdq loss presence (#12928 ) Description: Change the qdq debugger test oracle: instead of testing against a threshold, which occasionally fails, we just test that the loss value is present.	2022-09-19 15:58:27 -07:00
Prathik Rao	3cd2d4a7a1	Merge pull request #13013 from microsoft/prathikrao/setuptools-version-bug-fix downgrade setuptools	2022-09-19 15:50:48 -07:00
Prathik Rao	8ea742b507	downgrade setuptools	2022-09-19 12:39:35 -07:00
Justin Chu	14eb3cf485	Ignore settings.json in git (#12988 ) Description: Remove the `settings.json` line in gitignore. Motivation and Context Having `settings.json` tracked in git has created annoying diffs when it is modified locally. This PR removes the entry in gitignore but maintains the `settings.json` in the repo so that we have a good default.	2022-09-19 12:05:43 -07:00
cloudhan	14365b67a0	Fix hipify due to CUDA EP tensorrt_fused_multihead_attention optimization (#12990 ) A recent change in CUDA EP, #12814, makes hipify extremely slow and breaks the build. This PR fixes it by c The onnxruntime/contrib_ops/rocm/bert/attention.h is checked out from the version before #12814 and manually hipified. amd_hipify.py is slightly extended to allow wildcard file matching and to exclude all `tensorrt_fused_multihead_attention/*` files from hipify.	2022-09-19 15:29:23 +08:00
Changming Sun	e02bea2e3f	Fix some warnings (#12918 )	2022-09-18 10:55:33 -07:00
Baiju Meswani	4ed5a5b2a8	Disable local versions based on environment variable (#12997 )	2022-09-16 22:51:18 -07:00
Yufeng Li	b48f71fcfc	fix bug: quantization shape inference (#12983 ) The model path for onnx.shape_inference.infer_shapes_path and the external data need to be under the same directory, as documented here: `f4dea9e68b/docs/PythonAPIOverview.md (shape-inference-a-large-onnx-model-2gb)`	2022-09-16 10:17:22 -07:00
Wei-Sheng Chin	1a684152cc	Fix C6011: dereferencing NULL pointer with_data (and external_data) (#12982 ) As title. Patterns like ```cpp foo(ptr) ``` are changed to ```cpp if (ptr) foo(ptr) else throw ```	2022-09-16 09:49:36 -07:00
Wei-Sheng Chin	12aab3c01d	Fix TSA warnings (#12950 ) Fix two warnings: 1. Warning: Avoid calling new and delete explicitly, use std::make_unique<T> instead (r.11). Fix: new is replaced by creating a unique_ptr and calling unique_ptr.release; delete is replaced by unique_ptr.reset and unique_ptr's destructor. 2. Warning: Buffer overrun while writing to 'cpu_buffers_info->buffers': the writable size is 'buffers.public: unsigned __int64 __cdecl std::vector<void *,class std::allocator<void *> >::size(void)const () * 8' bytes, but '16' bytes might be written. Fix: Replace void* with cudaStream_t and void** with std::vector<cudaStream_t>.	2022-09-16 09:43:48 -07:00
Adam Louly	268bfe2a5d	python training api bindings (#12610 ) Description: Python API Bindings for on device training. Motivation and Context - This PR contains api bindings so python users can perform a whole training loop. Co-authored-by: Adam Louly <adamlouly@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net> Co-authored-by: Baiju Meswani <bmeswani@microsoft.com>	2022-09-16 09:38:24 -07:00
Alexey Gladyshev	2b5b11d373	[C#][TVM EP] Fix issues related to using TVM EP in C# front-end (#12958 ) Changes in this PR: * Update building of Nuget package for TVM EP * Update of documentation for using TVM EP in C#	2022-09-16 16:04:59 +02:00
Jake Mathern	85546255c4	make nhwc transformer only apply to CPU ep. (#11882 ) QLinearConv does not work with DML EP because this optimizer intended for CPU EP is wrongfully applied to it. Limit NHWC optimizer to nodes assigned to the CPU EP	2022-09-16 18:46:28 +10:00
sumitsays	ab45ac311f	Merge pull request #12980 from microsoft/WindowsAI [DML EP] Merge ORT/WindowsAI to ORT/main	2022-09-15 22:24:14 -07:00
Pranav Sharma	b935524e22	Revert reverse setup of allocators + create/register allocator in CPU EP only when needed. (#12954 )	2022-09-15 17:54:32 -07:00
Faith Xu	94d9e9ad6d	[Issue labeler] Separate out C# api as separate label (#12951 )	2022-09-15 17:36:57 -07:00
sumitsays	709254949a	DML EP temporarily fall back to CPU for LayerNorm when Bias is not present (#12987 ) * Temporarily fall back to CPU for LayerNorm * Build fix * Typo * TYPO Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com>	2022-09-15 16:13:18 -07:00
Ye Wang	3c427a8946	Fix an arithmetic overflow warning (#12961 )	2022-09-15 15:53:57 -07:00
Tang, Cheng	739b5675c8	remove legacy compile api (#12932 ) Co-authored-by: Cheng Tang <chenta@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2022-09-15 13:18:40 -07:00
Changming Sun	203f63c224	Publish WinML Nuget package to ORT-Nightly ADO feed (#12904 )	2022-09-15 12:10:27 -07:00
Sumit Agarwal	9f6646f11d	Merge branch 'master' into WindowsAI	2022-09-15 10:55:08 -07:00
sumitsays	363c695dad	Update DML 1.9.0 to 1.9.1 (#12966 ) Update DML to 1.9.1 Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>	2022-09-15 10:54:22 -07:00
Yi Zhang	08af88e3e2	Assign generate document job to CPU pool. (#12973 )	2022-09-15 10:42:12 -07:00
PeixuanZuo	647e09cc39	[FIX] skip layer norm for ROCm EP (#12803 ) * [FIX] fix skiplayernorm	2022-09-15 09:07:37 -07:00
cloudhan	d2aa2109c0	Make TunableOp follow stream semantics (#12856 )	2022-09-15 21:11:27 +08:00
Cheng	248f72e972	fix VC++ Static Code Analysis warnings (#12940 ) * fix VC++ Static Code Analysis warnings * fix warning	2022-09-15 16:33:13 +08:00
cloudhan	10f9a69707	Use CMake EXCLUDE_FROM_ALL for composable kernels to avoid building of conv related kernels (#12855 )	2022-09-14 22:11:31 -07:00
Chun-Wei Chen	d819b56fba	Consume ONNX 1.12.1 to prevent vulnerability issue while loading external file (#12915 ) * consume ONNX 1.12.1 to prevent vulnerability issue while loading external tensors * update ONNX 1.12.1 * test updated PR * use official rel-1.12.1 commit	2022-09-14 21:10:24 -07:00
PeixuanZuo	3f456a1847	[Update] update rocm5.2.3 (#12942 ) * [Update] update rocm5.2.3 * [Update] use rocm docker image as base	2022-09-15 10:41:49 +08:00
Cassie Breviu	5099dda42f	Lint updates csharp docs (#12962 ) * fix lint issues on docfx.vendor.js file * fix ci * remove submodule * fix ci * Update var name to AcceptedList * remove test branch from ci	2022-09-14 17:56:41 -05:00
Dmitri Smirnov	bc2df1bf95	Remove previously deprecated API (#12935 ) Format JS code, address review comments NPM Formatting	2022-09-14 10:58:03 -07:00
Yi Zhang	1ef1029163	Skip 2 tests in windows gpu workflow (#12956 )	2022-09-14 09:43:38 -07:00
cloudhan	b8e34fbd91	Split topk implementation into per-type translation units to speed up compilation (#12861 )	2022-09-14 19:36:54 +08:00
Vincent Wang	da07c83948	SoftmaxCrossEntropyLossInternalGrad and Sum Fusion (#12746 ) * fuse scegrad and sum * add yield output shapes to value_info * resolve comments * fix merge main	2022-09-14 14:45:51 +08:00
Dwayne Robinson	568950e28c	Warn on node EP silent fallback from preferred provider (#10831 ) * Warn on node EP fallback from preferred provider * Clarify with comment * Update to ORT's weird coding style for ragged parameter wrap * Android build error: unused parameter ‘providers’ * Update logic to be more robust * Updates from Pranav's feedback about messaging to rerun with verbose and respecting explicit vs implicit EP addition. Also merge from main. * brace style patch up * Update with feedback from Pranav and Scott McKay * Restore node_placement_set after realizing it only applies when is_verbose is true * Fix build warning on Android * Renamed to node_placement_provider_set per Pranav's suggestion	2022-09-13 15:53:17 -07:00
Yulong Wang	78bc53f91d	fix prefast:Warning C26814 in non_max_suppression.cc (#12934 )	2022-09-13 15:22:55 -07:00
Changming Sun	bb98922cc8	Delete nuphar docker file (#12944 )	2022-09-13 15:22:07 -07:00
Tianlei Wu	95c4fc6877	[CUDA] Add TensorRT fused attention fp16 v2 kernels (#12814 ) * Add TensorRT fused attention fp16 kernels * drop sm 72; seq 512 for sm75; and head_size 32 kernels * Add env variable ORT_DISABLE_FUSED_ATTENTION * exclude files in hipify * update AttentionPastState_dynamic test threshold * fix --use_mask_index in benchmark	2022-09-13 15:16:12 -07:00
Scott McKay	1016c33519	Fix prefast warning in upsample.cc. (#12938 ) * Fix prefast warning. * Fix some other static analysis warnings.	2022-09-14 08:14:33 +10:00
Changming Sun	626d94aa23	Refactor python packaging pipeline and nuget packaging pipeline (#12945 ) 1. Move the Linux ARM64 part of python packaging pipeline to a real ARM64 machine pool 2. Refactor the Linux CPU build jobs of python packaging pipeline to two parts: build and test. The test part will be exempted from Cyber EO compliance requirements as it won't affect the final bits we publish. This refactoring is to reduce dependencies in the build part. For example, this PR removes pytorch from the build dependencies. 3. Combine DML nuget packaging pipeline with "Zip-Nuget-Java-Nodejs Packaging Pipeline" as they all produce ORT nuget packages. Also, publish DML nuget packages and ORT GPU nuget packages to https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/ORT-Nightly feed.	2022-09-13 14:50:31 -07:00
Hariharan Seshadri	9edc9465f0	Fix some prefast warnings (#12936 )	2022-09-13 13:04:37 -07:00
RandySheriffH	64466c2d62	Remove nuphar provider folder (#12939 )	2022-09-13 09:10:52 -07:00
madurais	28e27ee7f7	Changes for AIX compilation to get CPU of running thread. hz is inter… (#12744 ) * Changes for AIX compilation to get CPU of running thread. hz is internal variable in AIX, hence changing to hz1 in window_functions.cc so that all OS shall work Co-authored-by: madurais <root@telesto10.in.ibm.com> Co-authored-by: tvkai <vamshikrishna@in.ibm.com>	2022-09-13 11:06:35 +10:00
Edward Chen	31a1403e06	Add --output_dir option to convert_onnx_models_to_ort.py. (#12844 ) Allows one to optionally specify an output directory for the converted model files.	2022-09-12 15:36:03 -07:00
Joseph Groenenboom	a433f22f17	Softmax interface update (#12469 ) * Template datatype for SoftmaxWithRawMaskSmallKernel in ROCm EP * Remove valid_items usage from SoftmaxWithRawMaskSmallKernel for ROCm EP The kernel already masks off invalid items and this gives a much faster implementation in hipCUB. * Update accumulator type in ROCm EP for SoftmaxWithRawMaskSmallKernel Hard code accumulator to fp32 for hipCUB in indicated kernel. * Reset casting to old behavior * Document steps to optimize SoftMax kernel on ROCm EP Usage of the hipCUB valid_items interface on reduction operations has a significant performance impact. Masking all thread data to avoid need to use the valid_items interface to hipCUB.	2022-09-12 13:02:31 -07:00

7390 commits