onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-05 04:17:53 +00:00

Author	SHA1	Message	Date
daquexian	3cbbf9dcae	Fix wasm static lib in sub-project (#11671 ) * wasm_static_lib_global Signed-off-by: daquexian <daquexian566@gmail.com> * make wasm static lib global Signed-off-by: daquexian <daquexian566@gmail.com> * fix the property Signed-off-by: daquexian <daquexian566@gmail.com> * add code missing after merge Signed-off-by: daquexian <daquexian566@gmail.com>	2022-06-14 15:18:11 -07:00
Gary Miguel	e8b0d24071	Support per-test tolerances for ONNX tests (#11775 ) Prior to this every test shared the same tolerances. This meant that if an ONNX test failed due to a small but acceptable difference in output, the only alternative was to disable the test entirely. In op set 17, the DFT operator is being added. Without this change, the tests for that operator fail because the output is off by about 5e-5. It's better to keep test coverage for this new op rather than disable the test entirely. Also prior to this change, the global tolerances were not shared between C++, JavaScript, and Python tests. Now they are. Also fix various minor issues raised by linters. Unblocks https://github.com/microsoft/onnxruntime/issues/11640.	2022-06-14 15:12:23 -07:00
Scott McKay	6bf6bac1fd	Add patching of xnnpack CMakeLists.txt to allow building with Emscripten. (#11829 )	2022-06-14 09:31:17 +10:00
Hector Li	7582644f57	cmake changes for SNPE EP (#11821 ) * move code used to find the SNPE libs to a separate cmake file * Roll back the change for libc++_shared, it's the one from SNPE SDK, otherwise it will cause uncaught exception of type std::bad_cast because of conflict	2022-06-13 08:15:37 -07:00
Guenther Schmuelling	d4ea59654c	make xnnpack build for ort-web (#11745 ) * make xnnpack build for ort-web * make ci happy	2022-06-10 08:47:57 -07:00
Vincent Wang	5ecfaef042	ATen Fallback for Inference (#11597 ) * aten op for inference * fix build error * move some code to training only * remove domain from operator name * move aten_op_executor ext out from ortmodule * add pipeline * add exec mode * fix script * fix ut script * fix test pipeline * failure test * rollback * bugfix * resolve comments * enable aten for python build only * fix win build * use target_compile_definitions * support io binding * turn off aten by default * fix ut Co-authored-by: Vincent Wang <weicwang@microsoft.com> Co-authored-by: zhijxu <zhijxu@microsoft.com>	2022-06-09 16:07:30 +08:00
Alex Fuller	8156b9370c	[Abseil] Adding URL_HASH so that an existing archive can be used from disk (#11690 )	2022-06-08 17:12:59 -07:00
Changming Sun	eeeb249a27	Update onnxruntime_providers.cmake to remove the reference of "onnxruntime_tvm_dependencies" (#11780 )	2022-06-08 09:06:00 -07:00
Valery Chernov	4296968f20	[TVM EP] update set input method for VirtualMachine (#11674 ) * update TVM * get alignment constant from TVM * update TVM_VM_SetInputs to upstream with TVM API * fix CI issue: update TVM EP dependencies * add sudo * revert changes needed to install missing package * add package for TVM EP CI Co-authored-by: Valery Chernov <valery.chernov@deelvin.com> Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru>	2022-06-04 09:31:01 +02:00
Hector Li	95a16c1ffe	Snpe ep (#11665 ) * Initiate Ort SNPE EP * fix snpe ep windows build which is caused by the utility method (ToUTF8String) name change on master * correct the source path for libonnxruntime.so while building for android package * add AdditionalDependencies for arm64 * On MS-Windows, the patchfile must be a text file, i.e. CR-LF must be used as line endings. A file with LF may give the error: "Assertion failed, hunk, file patch.c, line 343," unless the option '--binary' is given. * fix build failure if snpe is not enabled * update doc for contrib op * separate out snpe ep settings to onnxruntime_snpe_provider.cmake * renaming according to review comments * update according to review comments	2022-06-03 14:10:02 -07:00
Scott McKay	4445dd6bc1	XNNPACK EP (#11445 ) * Implement XNNPACK support via an EP. * Layout transform uses the GraphPartitioner infrastructure. * Node fusion is supported. * Conv and MaxPool implementations were ported from Changming's PR. * Added optional mutex in InferenceSession::Run as we only want to allow sequential calls if xnnpack is enabled	2022-06-03 20:22:34 +10:00
Scott McKay	4fabc400de	Fix CUDA 11.6 build error on Windows (#11578 ) * Avoid windows header that defines 'small'	2022-05-28 08:04:46 +10:00
Yi Zhang	a3f05da338	Revert "[TVM EP] update set input to remove excess copying inside TVM (#11247 )" (#11504 ) This reverts commit `5ae461ec0a`.	2022-05-13 02:27:36 +08:00
Tianlei Wu	ece1274ffa	revert safeint version (#11500 )	2022-05-12 11:24:43 -07:00
Tianlei Wu	f5473596fa	Change longformer default kernel (#11470 ) * change default to compact memory kernel * Remove a cuda stream synchronize that is not needed * Update longformer benchmark tool	2022-05-11 10:54:59 -07:00
symphonylyh	c2de603c10	Contrib ops for TRT plugin: Disentangled Attention Plugin (#11287 ) * Add disentangled attention TRT plugin as contrib op * update plugin name & remove null character * update onnx-tensorrt submodule with my beta version * use suggested plugin name & simpler shape propagation * update onnx-tensorrt gitsubmodule to temporary fork * update onnx-tensorrt to temporary commit * redirect submodule back to latest 8.2-GA release of onnx-tensorrt repo Co-authored-by: HHH-ComputeLab <haohangh@nvidia.com>	2022-05-08 15:25:25 -07:00
Dwayne Robinson	69b2fab810	Update DirectML from 1.8.0 to 1.8.2 (#11459 )	2022-05-06 17:52:52 -07:00
Tang, Cheng	3f3c5fcd68	Unify the Compile API for mobile build and normal build (#10632 ) * use the lightweight compile api as default; use dnnl ep for testing * apply to tensorrt ep * fix the missing files * fix build * fix the copy issue on linux * migrate migraphx and openvino ep * fix openvino build break * fix linux build * fix unused parameter * fix coreml build * use graph view's filtered initializers * fix openvino break * fix tvm compile api * fix tvm / rknpu / vitisai ep build * add IsInitializedTensor in graph_viewer; fix nuphar build * use serializer directly as tvm ep is still static lib * fix the type mismatch * fix the type mismatch * fix merge conflict * add a comment * fix minimal build * fix the DML EP's legacy approach * save type/shape in dnnl IR * fix linux break * fix tvm failure * dnnl ep: move initializer referenced out of dnnl subgraph * Revert "add IsInitializedTensor in graph_viewer; fix nuphar build" This reverts commit 1cc3c7f08c16fee4fe3309a67209eb769d479587. * add IsInitializedTensor to graph viewer * add the legacy code for nuphar build to temporarily make nuphar build work * ignore internal test for nuphar * remove the out of date tests * keep the legacy API in EP for a while * turn serializer into a static function * update comments * fix tvm build * Update include/onnxruntime/core/framework/execution_provider.h Co-authored-by: Pranav Sharma <prs@microsoft.com> * Update include/onnxruntime/core/framework/execution_provider.h Co-authored-by: Pranav Sharma <prs@microsoft.com> * Update onnxruntime/core/framework/execution_provider.cc Co-authored-by: Pranav Sharma <prs@microsoft.com> * update comments; add warning message for legacy compile call * add a flag to control out of scope arg in serialization * fix trt build; improve the test * resolve merge errors * fix a typo Co-authored-by: Cheng Tang <chenta@microsoft.com> Co-authored-by: Cheng Tang <chenta@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net> Co-authored-by: Pranav Sharma <prs@microsoft.com>	2022-05-05 08:30:07 -07:00
Valery Chernov	5ae461ec0a	[TVM EP] update set input to remove excess copying inside TVM (#11247 ) * update TVM * small fixes * update TVM with new set_input and NDArray API * use set_input instead of set_one_input Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>	2022-05-05 14:25:02 +02:00
Tang, Cheng	4b875e3543	Re-implement the function support in onnxruntime (#11167 ) * initial fix * refactor the function handle * update the implementation * fix linux build break * fix training build * fix minimal build * fix gradient checker * deprecate the local function members in graph. host it in model * fix changming's comments * fix comments about inlined containers * fix a missed inlined container * fix training build * avoid const for std string_view Co-authored-by: Cheng Tang <chenta@microsoft.com>	2022-04-29 10:15:58 -07:00
Edward Chen	e194a01787	Update SafeInt version. (#11379 )	2022-04-28 10:51:59 -07:00
Dmitri Smirnov	a7d0158c24	Introduce a way to disable Abseil library (#11353 ) Introduce a way to disable Abseil library. Use cmake extra args, no new build switch.	2022-04-27 08:57:52 -07:00
Scott McKay	63d4f45186	Add stub implementation of the NNAPI interface (#11288 ) * Add stub implementation of the NNAPI interface so that model builder code can be unit tested on all platforms. Needed to fix a lot of type mismatch warnings. As these don't occur on Android builds used static_cast for simplicity.	2022-04-27 15:39:09 +10:00
Tianlei Wu	1d96cbec73	Move gpt2 script to models\gpt2 sub-directory (#11256 ) * move gpt-2 scripts to models\gpt2 * change gpt2 beam search helper to make test_gpt2 passes	2022-04-20 11:09:26 -07:00
cloudhan	013306c940	[MinBuild] 132KB minimal build binary size reduction via dummy __cxa_demangle (#11071 ) Minimal build binary size reduction via dummy __cxa_demangle	2022-04-21 00:11:10 +08:00
Lukas Berbuer	efb0928e2b	Fix find_package for benchmark	2022-04-18 15:25:43 -07:00
George Nash	d9eeb48393	One dnn v2.6 update (#11220 ) * Disable training code in DNNL LayerNorm code The capability code already does not claim the LayerNorm and SkipLayerNorm that require more than one output. However, building with training enabled was causing issues. The training specific code has been removed even when building with training enabled. Signed-off-by: George Nash <george.nash@intel.com> * Fix for DNNL FusedMatMul op. The bug was in the transpose code. Signed-off-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com> * Use agreed upon memory format type when running Pooling Gradient in dnnl ep The dnnl ep does not currently have a way to pass memory_format information between the forward pooling primitive to the backward pooling primitive. This change explicitly sets the memory_format to match that of Onnxruntime, for both the forward and backward pooling code. This will prevent using un-matched memory format that could result in an `unimplemented` error from dnnl ep. Signed-off-by: George Nash <george.nash@intel.com> * Update dnnl ep to use OneDNN v2.6 Do not run ReduceInfLogSum on the kDnnlExecutionProvider due to a calculation bug when doing Log or infinity values. The fix for this issue will be part of the next OneDNN release. Signed-off-by: George Nash <george.nash@intel.com> * Update PrintMemory function in dnnl ep This modification can be used to enable/disable memory printing for dnnl ep developers. This is considered a developer only feature and is disabled by default. It must be enabled and code recompiled to use. Even if it is enabled it will not actually print any memory because the developer needs to take the extra step of specifying the memory that will be printed to the screen. Signed-off-by: George Nash <george.nash@intel.com> * Update binary ops to run on intel GPU when using dnnl ep Binary ops (i.e. Add, Div, Mul, and Sub) were updated to no longer call GetMemoryAndReshape; in the past this would move the memory from CPU to the GPU. This extra call is no longer needed since it is taken care of by the GetMemoryInOrtFormat call. Removing the GetMemoryAndReshape prevented copying the memory to GPU twice. Signed-off-by: George Nash <george.nash@intel.com> Co-authored-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>	2022-04-15 12:51:11 -07:00
Xavier Dupré	833f5d5604	Remove dependency on EP TVM in unit test project (#11170 )	2022-04-12 09:03:57 +02:00
Yi-Hong Lyu	749c0ddd1e	Upsample support NHWC (#10824 ) This patch implements bilinear interpolation for Upsample/Resize 4-D input with the outermost and innermost scale (usually channel of NHWC) as 1. It is parallelized with output_height * output_width instead of one dimension only. Besides, I also revert the HandleResize back to the original implementation for TransposeOptimizerTests.TestResize* tests. Finally, I add microbenchmark BM_NhwcUpsampleBilinear.	2022-04-11 11:39:17 -07:00
Tianlei Wu	00b595e389	move longformer and t5 to models subdirectory (#11161 ) * move longformer scripts to models subdirectory * Copy transformers\models\t5 to python package as well	2022-04-09 22:35:14 -07:00
Lukas	4c37f15c1b	Find boost, nsync, json, cpuinfo system libs with CMake option onnxruntime_PREFER_SYSTEM_LIB (#11146 )	2022-04-08 00:11:02 -07:00
Lukas	1b664e6d4c	Link cpuinfo only if supported (#11147 ) * Remove unnecessary target_include_directories for cpuinfo Headers already exposed as public by CMake target: `5916273f79/CMakeLists.txt (L213)` * Link to cpuinfo library only if supported	2022-04-07 21:32:12 -07:00
Justin Stoecker	7609694464	Enable building with a GDK (#11126 )	2022-04-07 15:06:31 -07:00
Maajid khan	81fa28bc56	OpenVINO-EP v4.0 Release PR with OpenVINO 2022.1 (#11025 ) * Enabling ov-ep for 2022.1 Release ->Added ov-ep 2022.1 flow ->Validated CPU Unit tests with OV Master using onnxruntime_test_all unit tests. Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fix for output mismatch b/w OpenVINO and ONNX Refer: https://jira.devtools.intel.com/browse/CVS-60310 Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Enabling Adobe ops ->Enable Resize op for iGPU ->Enable Add op for iGPU Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Removing irrelevant conditions ->Removing some conditions from GetCapability() which are now not required. (Removed conditions for OV version support less than 2021.2) Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Enable upsample op Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Enable Adobe proxy-e model Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Removing any extra conditions for Opset13 ops * Opset13 changes Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Exception handling for devices * Added comments * Implement GPU Throttling feature Added GPU Throttling feature for iGPU's. when user enables it as a runtime option, it helps in reducing overall CPU usage of the application Added changes to exercise this option using onnxruntime_perf_test application. Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Renaming the runtime config option Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Added the user to video and users group * Handling_GPU.0_GPU.1 * Handling special conditions ->Handling corner cases for device_type checks Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Modification to include new api 2.0 changes in the code * Added opset13 changes ->Enabled Few ops ->Added Debug info for case 3b in getcapability() Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Log comments updated * Changes to enable 2.0 api * Fix build issue Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fixes issues Fixes compiler warnings c4458 on windows. Fixes the bug in device_type check logic Adds print info for enable_opencl_throttling option in onnxruntime_perf_test Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> commit to make openvino_2021.4 compatible * Fixed IO Buffer Optimization * Fix output names issue * Fix 2021.3 branch * Bug Fix for Multiple inputs/outputs - Assigns the right output_name and input_name for the graph when returned by CompiledModel::inputs() OV function. - Also takes care of output mismatch issue b/w openvino output and onnx output Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Add comments for the changes made Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * IO Buffer Changes * Commit for Disabling GPU Throttling for 2021.4 * Updated branch * Fix windows build ->Fixed windows build in debug mode ->Disabled scatternd3_tensor_int64 Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fixed CPP Unit tests for CPU -Fixed shrink, MVN, ReduceL2, Maxpool, upsample, scatter, slice, reshape, unsqueeze. Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fixed first set of GPU Tests Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fixed additional failing tests on GPU ->Added conditions to disable certain ops under certain conditions ->Disabled certain tests ->Added some op supports for no_dimension supported Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Added Expand op support for CPU Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Added condition for squeeze op ->Shape can't have empty axes attribute Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Add support for LessOrEqual op function Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * OV Interface wait for replaced by indefinite wait call * use names from ONNX model to access OV tensors This change is to use the input/output names retrieved from original onnx model to access OV tensors and to check if there's any input or output names mismatch b/w ONNX naming and OV naming. Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fixes Myriad unit tests and other issues ->Fixes Myriad CPP unit tests ->Fixes output mismatch issue with models with sub graph partitioning Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fix segfault issue ->Fixed case 3b condition in get_capability() which was causing the segfault issue Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fixed build issue with ov 2021.4 with I/O buffer Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Disables performance counters for I/O Buffer Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fixed inputs/outputs mismatch for HDDL with 2022.1 Signed-off-by: Mohammad Amir Aqeel <mohammadx.amir.aqeel@intel.com> * Fix to enable GPU FP16 * Enabled mlperf_ssd_mobilenet_300 model fully on CPU Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Added ov version specific dll packaging for nuget * Fixed conditions for few ops Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Dockerfile updates * Updated License Info -Updated the copyrights License Info -modified FP16 transformations with OV 2022.1 Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Disabling mlperf_ssd_mobilenet_300 model ->Disabled this model for openvino. The test is failing in Internal_CI pipelines. Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Disabling failing python CPU Tests Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fixed flake8 python errors Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> Co-authored-by: hdgx <harinix.d.g@intel.com> Co-authored-by: mayavijx <mayax.vijayan@intel.com> Co-authored-by: sfatimar <sahar.fatima@intel.com> Co-authored-by: mohsinmx <mohsinx.mohammad@intel.com> Co-authored-by: Mohammad Amir Aqeel <mohammadx.amir.aqeel@intel.com>	2022-04-06 13:30:33 -07:00
Ben Niu	20fbf603d3	Fix ARM64EC build breaks (#11111 ) Apply this `4c015dbb49` to fix ARM64EC build breaks.	2022-04-05 10:00:42 -07:00
Jack·Boos·Yu	01631893cd	[cmake] Re-factor pre-compile header usage (#11093 )	2022-04-04 16:28:34 -07:00
Jack·Boos·Yu	ea004e953f	[cmake] Export multi targets in static build (#11063 ) * [cmake] Export multi targets in static build * Install more components in static build, format some code * Fix code pos	2022-04-03 22:37:18 -07:00
Jack·Boos·Yu	2dfd81b9bb	[cmake] Add option onnxruntime_ENABLE_CPUINFO (#11084 )	2022-04-01 22:29:27 -07:00
Chen Fu	dc72159105	Symmetric Quant indirect Conv kernel for ARMv8 A55 chip (#10862 ) ARM a55 micro-architecture (with dot product instructions), similar to a53, is widely used as little cores in big.LITTLE configurations. A55 has a narrower memory load/store hardware, where a 128b load instruction would block the pipeline for 2 whole cycles, during which no other instructions can be executed. On the other hand, a 64b load instruction can be dual-issued with many other instructions. This change adds a Symmetric Quant indirect Conv kernel for a55 micro-architecture, where we replace ldr q4,[x1], with ldr d4,[x1], ldr x11,[x1], ins v4.d[1],x11 so that we can try to hide the memory load cycles behind computing cycles in the kernel. With this new kernel, cartoongan model shows significant perf improvement on Pixel5a little cores (2 threads running on two little cores): new kernel: 2188.59 ms old kernel: 2360.61 ms	2022-03-25 17:10:47 -07:00
leqiao-1	8ddc45f52d	Add linux and macos arm64 java artifacts (#10981 )	2022-03-25 16:23:17 -07:00
Jack·Boos·Yu	d1be71eaa3	[cmake] Add keyword STATIC to add_library in function onnxruntime_add_static_library (#10998 )	2022-03-25 16:19:36 -07:00
pengwa	89ef987ab1	Improve NonZero on CUDA/ROCM (#10307 ) * improve NonZero * fix megatron_fp16 optimizer, fix the doc * multi_tensor_applier * resolve comment * fix building warning * fix build error when enabling training and use tensorrt	2022-03-25 07:35:45 +08:00
Shucai Xiao	7ee52fb8a0	amdmigraphx_ep-add ops to be supported by migraphx and fixed a bug in check ops to be supported (#10496 ) * backup debugging information related to debugging a jira ticket * fixed a bug in checking whether an input can be constant folded * added more operators that are supported by migraphx * revert unnecessary changes * remove unused logger parameter * rename function to make name style consistent * backup code changes * fix review comments * refactor graph utility functions to add unit tests * backup additional changes * fixed a link error in build migraphx_basic_test * add unit test for some migraphx utility functions * add more supported ops in migraphx	2022-03-23 19:17:19 -07:00
Baiju Meswani	565318ce86	Support ORT WASM compilation with the training flag (#10973 ) * Add training support for ORT web assembly compilation * Use wrapper for eigen includes in training	2022-03-22 16:13:35 -07:00
Yulong Wang	dce5d719c5	add build flag for emscripten settings (#10963 ) * allows multiple '--cmake_extra_defines' flags * fix flake8 error * Add build flag for emscripten settings * remove "emscripten_settings" in generate_build_tree() * format code	2022-03-22 11:55:45 -07:00
Sunghoon	b34d9f6867	[js/wasm] Add WebAssembly static library build into web CI pipeline (#10959 ) * add webassembly static library build into ci * add webassembly static library build into ci * skip publishing on static lib * fix type	2022-03-21 15:49:49 -07:00
Tiago Koji Castro Shibata	5ed2f4ad5f	Remove Windows Store specific code	2022-03-17 23:38:14 -07:00
Valery Chernov	625a1f7673	[TVM EP] code refactor (#10655 ) * rename info to options for TVM EP * transfer options processing from TVMExecutionProvider to TVMEPOptions * transfer TVMRunner to separated files * implement TVMCompiler class * replace CompileFunc by TVMCompiler object. update TVMRunner. now it does not depend on TvmExecutionProvider * correct logging of TVM EP options * RunnerImpl, GERunnerImpl and VMRunnerImpl were implemented * add prepareComputeInfo method * remove update_output_shapes flag * embed all TVM EP dependences to tvm namespace. transfer model compilation from TVMRunner. connect TVMRunnerImpl to TVMRunner * refactor compileModel method * small cleaning * separate TVM EP options data store and processing * replace TvmTensorShape by InlinedVector with max_size 5 * correct indentation * update TVM hash Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>	2022-03-16 13:55:04 +01:00
Scott McKay	f385c73058	Fix a couple of issues with the python package tools (#10858 ) * Tweaks to the model utils * Add handling for a dim_value of -1 when replacing the entire input shape. This occurs in models exported from PaddlePaddle * make pytorch helpers accessible in package * make QDQ helpers accessible in package	2022-03-15 15:52:12 +10:00
Edward Chen	e53422c6d0	Update convert_onnx_models_to_ort.py to support runtime optimizations. (#10765 ) Add runtime optimization support to ONNX -> ORT format conversion script. Replace `--optimization_level`, `--use_nnapi`, and `--use_coreml` with a new `--optimization_style` option.	2022-03-14 16:50:41 -07:00

1070 commits