onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-05 04:17:53 +00:00

Author	SHA1	Message	Date
Adrian Lizarraga	ad4abbd75e	[EP-Perf-Dashboard] Add support for TensorRT 8.4 to EP Perf Dashboard (#11876) Co-authored-by: George Wu <jywu@microsoft.com>	2022-06-17 09:16:51 -07:00
Yi Zhang	8bb0062873	add manylinux_2_27 CPU wheel (#11886) * add manylinux_2_27 * minor refactory * change base image * minor refactor * add tests * fix condition	2022-06-17 19:38:38 +08:00
Yi Zhang	d2cbae3a04	Revert "Refactor ExecutionFrame and SessionState to reduce memory all… (#11888) Revert "Refactor ExecutionFrame and SessionState to reduce memory allocations and improve data locality (#11804)" This reverts commit `2ecba6fd25`.	2022-06-17 17:07:21 +08:00
stevenlix	bd65acd08d	Share execution context memory between TensorRT subgraphs (#11859) * share trt context memory * update parser to 8.4-EA * update parser to 8.4-GA * add context memory sharing enable option * update parser to 8.2-GA * fix format issue * reverse orders * fix format * fix format * fix issues	2022-06-16 22:42:40 -07:00
Changming Sun	10478a09ca	Revert "add manylinux_2_27 wheel (#11832)" This reverts commit `bbace23d0c`.	2022-06-16 18:28:12 -07:00
Dmitri Smirnov	2ecba6fd25	Refactor ExecutionFrame and SessionState to reduce memory allocations and improve data locality (#11804) Refactor ExecutionFrame and SessionState for better data locality and less memory allocations.	2022-06-16 16:50:48 -07:00
Dwayne Robinson	3d99f16e98	Merge pull request #11827 from microsoft/user/dwayner/DmlEp1.9 Integrate WindowsAI feature branch with DML EP features and DML 1.9	2022-06-16 13:04:00 -07:00
George Wu	df5ee6aa4e	[TensorRT EP] support TensorRT 8.4 (#11866) * update trt 8.4ga * trt 8.4 linux ci pipeline * fix cmake * placeholder_builder * trt 8.4 windows pipeline * gpu package pipeline * trt 8.4.1.5 , packaging pipeline updates * python packaging * ctest timeout * python packaging test * bump timeout * python format * format * revert * newline * enable trt python tests * typo * python format * disable on windows	2022-06-16 07:46:40 -07:00
Dwayne Robinson	fe7b8b80ae	Revert BatchNormalization change for now, falling back to CPU on mixed types until a more advanced solution is written	2022-06-15 21:49:18 -07:00
Dwayne Robinson	babd6e3fcd	Update DirectML preview package with unmangled names	2022-06-15 18:16:58 -07:00
Maxiwell S. Garcia	3f8c9146d5	ppc64le: specialize generic 'mlas' functions to use VSX instructions (#11845)	2022-06-15 16:49:38 -07:00
Scott McKay	d64f23fec0	EP factory creation cleanup and enhancements. (#11798) * Rework the EP factory creation setup so we're not cut-and-pasting function declarations in multiple places. Convert append EP for SNPE to be generic, and also use for XNNPACK. Add XNNPACK to C# API * Don't need stub for MIGraphX as it's using provider bridge. * Remove old 'create' functions that aren't applicable now that the EPs are built as separate libraries. * Only use EPs that require the layout transform if the opset is supported by the layout transformer. * Update wasm registration of xnnpack.	2022-06-16 07:01:41 +10:00
Rachel Guo	1a1c360a80	[NNAPI EP] Add Gather op support (#11824) * initial gather support nnapi * update * minor update * address pr comments * add int32 indices test case for nnapi * remove nnapi ep limitation for added UT * add link for memcpy type punning usage	2022-06-15 09:44:07 -07:00
Vincent Wang	02457ec30a	[CUDA] GatherElements[Grad]/ScatterElements Bugfix and Perf Improve (#11374) * gather elements bugfix and perf improve * fix win build * fix ut on some eps * ut change * resove comments * resolve comments * fix win build	2022-06-15 16:29:17 +08:00
Xavier Dupré	a805a49363	Move OrtValueVector from onnxruntime-training to onnxruntime (#11176 ) * Move OrtValueVector from onnxruntime-training to onnxruntime * disable dlpack on onnxruntime * disable dlpack * dlpack * opaque inlcuded in any cc file of the python binding * fix type issue * fix incomplete name * remove len() * remove unused parameter * black * black * black * remove unused import * add unit test to check the output type * black * lint * lint * lint * fix method name * Update onnxruntime/python/onnxruntime_pybind_ortvalue.cc Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * Update onnxruntime/python/onnxruntime_pybind_ortvalue.cc Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * Update onnxruntime/python/onnxruntime_pybind_ortvalue.cc Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * Update onnxruntime/python/onnxruntime_pybind_ortvalue.cc Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * Update onnxruntime/python/onnxruntime_pybind_ortvalue.cc Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * Update onnxruntime/test/python/onnxruntime_test_python_sparse_matmul.py Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * Update onnxruntime/test/python/onnxruntime_test_python_sparse_matmul.py Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * check return type of C API * lint * lint * fix missing ; * fix type issue * fix merge issue Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>	2022-06-15 09:36:28 +02:00
Dwayne Robinson	ff8b173286	Typo in DirectML.Debug.dll	2022-06-15 00:18:40 -07:00
Dwayne Robinson	508c76a246	Add missing DirectML.Debug.dll	2022-06-15 00:16:10 -07:00
Dwayne Robinson	e3ec30efb6	Add missing GELU to ApiHelpers.h	2022-06-14 23:28:15 -07:00
Dwayne Robinson	4c1a410d54	Unmangle DML preview package filenames	2022-06-14 23:12:58 -07:00
Yi Zhang	bbace23d0c	add manylinux_2_27 wheel (#11832) * add manylinux_2_27	2022-06-15 10:26:51 +08:00
Changming Sun	51ed27cf22	Delete win-gpu-cuda-10-2-pipeline.yml (#11847)	2022-06-14 18:34:56 -07:00
daquexian	3cbbf9dcae	Fix wasm static lib in sub-project (#11671) * wasm_static_lib_global Signed-off-by: daquexian <daquexian566@gmail.com> * make wasm static lib global Signed-off-by: daquexian <daquexian566@gmail.com> * fix the property Signed-off-by: daquexian <daquexian566@gmail.com> * add code missing after merge Signed-off-by: daquexian <daquexian566@gmail.com>	2022-06-14 15:18:11 -07:00
Gary Miguel	e8b0d24071	Support per-test tolerances for ONNX tests (#11775) Prior to this every test shared the same tolerances. This meant that if an ONNX test failed due to a small but acceptable difference in output, the only alternative was to disable the test entirely. In op set 17, the DFT operator is being added. Without this change, the tests for that operator fail because the output is off by about 5e-5. It's better to keep test coverage for this new op rather than disable the test entirely. Also prior to this change, the global tolerances were not shared between C++, JavaScript, and Python tests. Now they are. Also fix various minor issues raised by linters. Unblocks https://github.com/microsoft/onnxruntime/issues/11640.	2022-06-14 15:12:23 -07:00
Chen Fu	d936751aad	QlinearConv threading adjustments (#11228) * Reserve the first core for the main thread Currently in "auto affinity" mode the worker threads are affinized to cores 0..(N-1), leaving the very last core for the main thread. This patch preserves core #0 for the main thread, and affinizes the worker threads to cores 1..N. * Avoid unneeded spin_pause in thread pool's worker threads Remove unneeded PAUSE instruction (0.1-0.2 usec latency) after a worker thread finds a task to execute. * MLAS/x86: optimize QLinearConv on hybrid CPUs Existing 4x task granularity for task partitioning on hybrid CPUs is not sufficient to compensate the difference of VNNI instructions throughput between performance and efficient cores. This patch... * Increases granularity for QLinearConv by 2x, to have 2x more tasks with 2x smaller output count * Limits QLinearConv task count from above, to avoid output count per task getting smaller than kernel's capability * Remove hardcoded task count for QLineConv as it limited scaling on 16+ cores CPUs * Addressing comments * combining x86 ARM branches in qlinearconv threaded job partition * revert first core assignment Co-authored-by: Saurabh <saurabh.tangri@intel.com> Co-authored-by: Chen Fu <fuchen@microsoft.com>	2022-06-14 14:42:12 -07:00
Yufeng Li	80d8c4c7ff	add data type check before quantizing (#11840)	2022-06-14 14:22:34 -07:00
Yufeng Li	607afbe1c0	fix valgrind warnings:Conditional jump or move depends on uninitialis… (#11822) * fix valgrind warnings:Conditional jump or move depends on uninitialised value(s)	2022-06-14 14:02:15 -07:00
Gary Miguel	52f6db19da	Python backend: use packaging.version to parse ONNX version (#11800) Unlike the previous code, this handles version strings like "1.12.0rc3". Unblocks https://github.com/microsoft/onnxruntime/issues/11640.	2022-06-14 10:17:35 -07:00
zhangyaobit	f6d2b629a0	Add kernel explorer (#11779) * Add kernel explorer, a tool to help develop, test, profile, and tune GPU kernels. * clean up with some formatting issues * rename MACRO * macro renaming * improve cmake code * fix python lint errors * fix python lint errors * fix python lint errors * delete white space suggested by lint	2022-06-13 20:11:25 -07:00
Scott McKay	6bf6bac1fd	Add patching of xnnpack CMakeLists.txt to allow building with Emscripten. (#11829)	2022-06-14 09:31:17 +10:00
Chun-Wei Chen	63c483a998	1.12.0 is the right TBD instead of released 1.11.0 (#11817)	2022-06-13 14:27:59 -07:00
Adrian Lizarraga	aef53e2b0d	Support uploading EP perf data to a configurable database. (#11819)	2022-06-13 14:06:50 -07:00
Changming Sun	a93ebd2503	Move tvm pipeline to Github Actions (#11721)	2022-06-13 11:38:44 -07:00
Wil Brady	b0e027c661	Add aten::_softmax to eager ops. (#11820)	2022-06-13 13:05:26 -04:00
Hector Li	7582644f57	cmake changes for SNPE EP (#11821) * move code used to find the SNPE libs to a separate cmake file * Roll back the change for libc++_shared, it's the one from SNPE SDK, otherwise it will cause uncaught exception of type std::bad_cast because of conflict	2022-06-13 08:15:37 -07:00
Dwayne Robinson	04dd6639de	And appease the time wasting formatting tool now -_-...	2022-06-11 19:17:20 -07:00
Dwayne Robinson	2bc487a816	Appease flaky flake tool	2022-06-11 19:15:19 -07:00
Dwayne Robinson	50e0a193c8	Merge branch 'master' into user/dwayner/DmlEp1.9	2022-06-11 19:01:51 -07:00
Dwayne Robinson	76024b8a6a	Update DirectML.dll to 1.9.0 Preview	2022-06-11 18:51:32 -07:00
Maxiwell S. Garcia	0869f4f4ea	ppc64le: optimizing the MlasRequantizeOutput() with VSX (#11659)	2022-06-10 16:04:52 -07:00
Tianlei Wu	def78a1b81	Support T5 in BeamSearch operator (#11450) (1) Support T5 in BeamSearch operator, and add both CPU and CUDA implementation. (2) Change BeamSearch op: rename encoder_decoder_init attribute to encoder, and add decoder_start_token_id attribute (3) Update convert_to_onnx for T5 to use int32 instead of int64 inputs as default. (4) Add more tests in best_beam_search.py (5) fix ORT_ENFORCE of hypothesis_buffer_offset_ (6) Improve ONNX conversion: (a) Change encoder some dynamic axes to fixed dim value (b) add --separate_encoder_and_decoder_init (c) correct name t5-3B => t5-3b, t5-11B => t5-11b (d) Add --use_int32_inputs in convert t5 to onnx (e) Allow t5 beam search conversion in one step	2022-06-10 15:06:57 -07:00
Dwayne Robinson	c1b5f34362	DML EP BatchNormalization-15 (#11814) * Add external helper DirectMLX.h * Add BatchNormalization-15 using DMLX to achieve casting if types are different * Shape helper and some reformatting * Additional linting issues	2022-06-10 15:04:48 -07:00
Tianlei Wu	768b9cfb60	Fix GetDirNameFromFilePath to support forward slash in windows (#11793)	2022-06-10 14:37:30 -07:00
Jeff Daily	5562b47f06	missing #include <thrust/count.h> in non_max_suppression_impl.cu (#11730) Otherwise, depending on cuda or hip thrust versions, transitive header inclusions miss thrust::count_if.	2022-06-10 10:45:28 -07:00
Guenther Schmuelling	d4ea59654c	make xnnpack build for ort-web (#11745) * make xnnpack build for ort-web * make ci happy	2022-06-10 08:47:57 -07:00
Vincent Wang	f745eb1d3f	fix gradient ut (#11797)	2022-06-10 12:14:19 +08:00
Vincent Wang	5ecfaef042	ATen Fallback for Inference (#11597) * aten op for inference * fix build error * more some code to training only * remove domain from operator name * move aten_op_executor ext out from ortmodule * add pipeline * add exec mode * fix script * fix ut script * fix test pipeline * failure test * rollback * bugfix * resolve comments * enable aten for python build only * fix win build * use target_compile_definitions * support io binding * turn off aten by default * fix ut Co-authored-by: Vincent Wang <weicwang@microsoft.com> Co-authored-by: zhijxu <zhijxu@microsoft.com>	2022-06-09 16:07:30 +08:00
Scott McKay	927bac0f86	Rework allocator sharing to work for multiple devices. (#11700) * Rework allocator sharing to work for multiple devices. * Update SessionState to not use allocator name in matching for consistency with IExecutionProvider. The name doesn't have any clear meaning (e.g. we use the same name for the per-thread allocator in the CUDA EP as the shared allocate there and in the TRT EP). * NOTE: this means we will have one allocator per OrtMemType+OrtDevice. * Reverse order when doing allocator setup in SessionState. This will result in the CPU and CUDA EPs allocators being preferred (they are the most configurable), and also means the per-thread CUDA allocator for default GPU memory will be used even when TRT is enabled. * NOTE: Combined with the change to remove the allocator name from the key this will mean that if CUDA and TRT or ROCM and MIGraphX are both enabled the CUDA/ROCM per-thread allocator will be used to allocate GPU memory. * Use InsertAllocator instead of TryInsertAllocator. Each EP should be registered once, and we should only enter RegisterAllocator once, so the 'try' should not be required and would indicate an unexpected setup was involved. i.e. better to fail and figure out if we need to support that setup. * Add some clarifying comments around how replace allocator works. * Add unit testing for setup where EP has local allocator that may get out of sync with values in the IExecutionProvider base class. * Fix invalid check of whether data is on CPU to use device info instead of allocator name.	2022-06-09 17:38:38 +10:00
Dwayne Robinson	5e54611427	DML EP add Trilu-14 and Resize-13 nearest mode and others (#11782) * Add Trilu-14 kernel * Support Resize with rounding direction for round_prefer_ceil/round_prefer_floor * Add batch normalization query and RNN query * Appease CPPLINT.cfg per https://raw.githubusercontent.com/google/styleguide/gh-pages/cpplint/cpplint.py to reduce the noise	2022-06-08 19:08:00 -07:00
Dwayne Robinson	0f0b640b4b	Reformat build.py for WindowsAI branch (#11794)	2022-06-08 18:05:11 -07:00
Alex Fuller	8156b9370c	[Abseil] Adding URL_HASH so that an existing archive can be used from disk (#11690)	2022-06-08 17:12:59 -07:00


6879 commits