onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-06 00:03:22 +00:00

Author	SHA1	Message	Date
RajalakshmiSR	5d8c5409ab	POWER10: QGEMM optimization (#10642 ) * POWER10: QGEMM optimization This patch makes use of POWER10 MMA feature for QGEMM function. This optimization includes signed and unsigned cases. Tested and there are no new failures with gcc11 and clang-14. * Changes as per review comments Co-authored-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>	2022-03-02 08:36:26 -08:00
Valery Chernov	62cc981599	[TVM EP] support of TVM Virtual Machine (#10341 ) * add executor option (vm or graph) and support virtual machine methods * nullptr check for compile and run methods (see also PR#10211 from microsoft:onnxruntime) * get output shapes for VM * remove run_with_benchmark. remove run methods from python api, get it from native side * get outputs method for VM was implemented * support multiple input for VM * update python logging and exception * small fix * update tvm with patch for VM API * update nhwc transformations for TVM EP * add data alignment check and support set_input_zero_copy for GE in TVM EP * fix logger name * return back to apache/tvm with VM fixes instead of local dev branch * hide customized tvm logger while issue is not resolved. fix tvm warning related to target_host * flake8 fix Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>	2022-03-02 11:02:33 +01:00
Edward Chen	9e7d7a9e97	Convert ConvActivationFusion transformer to a selector action transformer. (#10687 )	2022-03-02 13:47:55 +10:00
Yulong Wang	f4b2d3af2b	Upgrade emsdk to 3.1.3 (#10577 )	2022-02-28 23:52:41 -08:00
Dmitri Smirnov	2679711bee	Refactor transformers and other code to reduce memory allocation calls (#10523 ) Work on minimizing memory management calls by reducing number of allocations and copies. Replace std::unordered_set with InlinedHashSet and add usage of InlinedVector. Employ std::move() to minimize copying and memory allocations. Remove copying of the const shared data into each of the PropagateCast transformer instances. Move inlined_containers.h header to include/common. Adjust AsSpan implementation for C++ < 17	2022-02-24 16:17:14 -08:00
Alexey Gladyshev	7dc7529ec8	[TVM EP] Integrate tests for TVM EP into public onnxruntime CI (#10505 ) * add support for bool type * add TVM EP support for tests * include TVM EP in python test pool * fix pylint * moved technical imports to a separate file * clean up post build actions & move _ld_preload.py extension to CMake level * add files for include TVM EP into CI * implement custom logger for TVM * replace TVM logging with ONNX RT logging * update link for TVM EP tutorial * clean up TVM EP cmake * add pybind auto enabling for TVM EP * fix blank spaces * code review fixes * replace print with comment * add list of EP without TVM EP * enable onnx tests * disable contrib ops and ml ops * reuse Dockerfile.ubuntu * Move install_tvm_test_dependencies.sh out of Docker context dir, update build definition. Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2022-02-24 16:24:23 +01:00
Scott McKay	8cfa4b1c17	Fix build errors due to changes in warnings that VS 2022 17.1 produces. (#10621 ) Disable warning about padding for abseil-cpp flat_hash_map. Disable some warnings from compiling the test proto. This also required removing a line in CMakeList.txt where we move a level 4 warning to level 3. That ends up later on the command line and overrides the `/wd4800`. Couldn't find a way to handle that nicely. As we compile with `/W4` the value of moving 4800 to level 3 in dev mode is unclear so simplest was to remove that. Open to suggestions if there's a better way.	2022-02-23 07:32:07 +10:00
Justin D. Harris	742694f679	[python] [orttraining] Add utility to export a graph to compute gradients (#8125 )	2022-02-18 14:00:49 -08:00
Scott McKay	df841ee87d	Fix incorrect type constraint registration for operator kernels. (#10489 ) * Fix incorrect type constraint registration for RoiAlign. This led to the input type not actually being checked when matching a kernel as the invalid constraint name is treated as a missing optional input. * fix missing dependency for the unit test exe. Whilst it doesn't link against the CUDA providers lib, without the dependency VS doesn't know it needs to rebuild the library if there are changes. * Add check for invalid type constraints. * Fix invalid registrations for other kernels. * Add hash replacement logic to provide backwards compatibility in ORT format models when the registration is fixed. * Add tests	2022-02-18 16:55:32 +10:00
Scott McKay	2ca9566994	Add range of helpers for making usage of ORT Mobile easier. (#10458 ) * Add range of helpers for making usage of ORT Mobile easier.	2022-02-18 07:35:25 +10:00
Chi Lo	fad590a059	Enhance TRT EP unit tests (#10493 ) * Re-write tensorrt ep cache test * refactor the code * refactor * move stdc++fs flag to CMakeLists.txt	2022-02-17 10:30:03 -08:00
Ashwini Khade	f436d3437e	Add layout transformer for NNAPI (#10371 ) * Add layout transformer for NNAPI * plus merge fixes * plus some more merge fixes * test fixes * comments + cleanup * plus updates * post merge changes * enable layout transformer in extended minimal build * plus more comments * more tests + fix CI * plus updates per review * more updates per review * fix file name * fix qdq tests * plus more updates * plus updates * typo fix * fix qdq selection in 2nd optimization pass * fix typo * fix a test * update dependency structure for layout transformer * plus updates * more updates * plus change * more updates to fix linker error in minimal build * remove unnecessary headers	2022-02-15 20:25:29 -08:00
Valery Chernov	1cdc23aba4	[TVM EP] Rename Standalone TVM (STVM) Execution Provider to TVM EP (#10260 ) * update java API for STVM EP. Issue is from PR#10019 * use_stvm -> use_tvm * rename stvm worktree * STVMAllocator -> TVMAllocator * StvmExecutionProviderInfo -> TvmExecutionProviderInfo * stvm -> tvm for cpu_targets. resolve onnxruntime::tvm and origin tvm namespaces conflict * STVMRunner -> TVMRunner * StvmExecutionProvider -> TvmExecutionProvider * tvm::env_vars * StvmProviderFactory -> TvmProviderFactory * rename factory funcs * StvmCPUDataTransfer -> TvmCPUDataTransfer * small clean * STVMFuncState -> TVMFuncState * USE_TVM -> NUPHAR_USE_TVM * USE_STVM -> USE_TVM * python API: providers.stvm -> providers.tvm. clean TVM_EP.md * clean build scripts #1 * clean build scripts, java frontend and others #2 * once more clean #3 * fix build of nuphar tvm test * final transfer stvm namespace to onnxruntime::tvm * rename stvm->tvm * NUPHAR_USE_TVM -> USE_NUPHAR_TVM * small fixes for correct CI tests * clean after rebase. Last renaming stvm to tvm, separate TVM and Nuphar in cmake and build files * update CUDA support for TVM EP * roll back CudaNN home check * ERROR for not positive input shape dimension instead of WARNING * update documentation for CUDA * small corrections after review * update GPU description * update GPU description * misprints were fixed * cleaned up error msgs Co-authored-by: Valery Chernov <valery.chernov@deelvin.com> Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru> Co-authored-by: Thierry Moreau <tmoreau@octoml.ai>	2022-02-15 10:21:02 +01:00
Edward Chen	3199074ac7	Update QDQ propagation transformer to insert QDQ nodes (#10487 ) Update QDQ propagation transformer to insert new QDQ nodes instead of moving the existing one. This creates a more consistent `DQ -> op -> Q` pattern for other components to recognize. Upgrade this transformer to a basic level optimization as it yields a valid ONNX graph.	2022-02-14 14:20:03 -08:00
Baiju Meswani	7691e7ed12	Introduce load balancing dataset samplers (#10163 )	2022-02-14 13:46:14 -08:00
Edward Chen	f92e47e95b	Remove onnxruntime_util dependency on onnxruntime_framework (#10512 ) There's a circular dependency between onnxruntime_util and onnxruntime_framework. Remove onnxruntime_util's dependency on onnxruntime_framework.	2022-02-10 19:17:08 -08:00
Changming Sun	7a2bf3c24c	Reorganize contrib op schemas (#10494 )	2022-02-09 09:31:58 -08:00
Maxiwell S. Garcia	6bbf016dc4	cmake: disable 'attributes' error to fix the build with GCC < 9.x This patch fixes the error "requested alignment X is larger than Y" in older GCC's https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89357	2022-02-03 13:38:19 -08:00
Chi Lo	a7c67860a5	Reduce test time for TensorRT EP CI (#10408 ) * expand model tests name * skip cpu/cuda for trt when running onnxruntime_test_all * only run trt ep for c++ unit test * Update CMAKE_CUDA_ARCHITECTURES for T4 * Use new t4 agent pool * Update YAML for run T4 on Windows * revert code * Update CMAKE_CUDA_ARCHITECTURES * fix wrong value * Remove cpu/cuda directly in model tests * add only CMAKE_CUDA_ARCHITECTURES=75 * remove expanding model test name to see difference * revert code * Add fallback execution provider for unit test * Add fallback execution provider for unit test (cont) * add conditional to add fallback cuda ep * Reduction op takes much longer time for TRT 8.2, so we test smaller range of inputs * use M60 * revert code * revert code * add comments * Modify code and add comment * modify comment * update comment * add comment	2022-02-01 15:56:33 -08:00
Edward Chen	c43c1691ad	Enable transpose optimizer in minimal extended build (#10349 ) Enable transpose optimizer and infrastructure it depends on in a minimal extended build.	2022-01-31 09:41:04 -08:00
Guoyu Wang	5f0ba31890	Remove coremltools submodule security vulnerability and copy the coreml model schema (#10424 ) * remove coremltools submodule * update cgmanifest * Copy proto files directly from coremltools	2022-01-28 12:48:48 -08:00
Chen Fu	c4f1dfcfaa	Cfu s8s8 (#10413 ) Adding S8S8 kernels for symmetric quantized indirect conv and depthwise conv. Perf number with single thread: Nokia G10 (baseline / new) in ms Pixel 4 (baseline/new) in ms mobilenet_edgetpu 220 / 213 18.5 / 17.6 cartoongan 8537 / 8521 967 / 928 Co-authored-by: Chen Fu <fuchen@microsoft.com>	2022-01-28 09:26:52 -08:00
Changming Sun	b14da94fc1	Exclude CETCOMPAT from Windows ARM build (#10417 )	2022-01-27 17:57:01 -08:00
Xavier Dupré	481b96d32a	STVM, NUPHAR, remove tvm from submodules list, checks pointers are not null. (#10211 ) * STVM, checks pointers are not null. * removes submodules tvm * add missing include(FetchContent) * add target tvm * fix stvm test * extend cgmanifest with dependencies of tvm	2022-01-27 20:31:13 +01:00
Changming Sun	ec4362f8f3	Enable more static analysis warnings and enable the analyzer for training cpu (#10176 )	2022-01-27 11:17:20 -08:00
Hariharan Seshadri	27a4af6074	Fix some BinSkim defects (#10400 )	2022-01-26 20:22:22 -08:00
Weixing Zhang	ea9c8a7cdc	support MIGraphXEP to work with ROCMEP for inference on AMD GPU (#10368 ) Co-authored-by: Weixing Zhang <wezhan@microsoft.com> Support MIGraphXEP to work with ROCMEP for inference on AMD GPU	2022-01-26 15:52:56 -08:00
Yulong Wang	847801f5be	[wasm] update emscripten v2.0.34 (#10391 )	2022-01-26 14:46:02 -08:00
Guoyu Wang	4af116649c	[QDQ] Hookup NNAPI GetCapability/Compile with shared QDQ selectors (#10347 ) * add qdqgroup as input for NodeUnit * minor update * hookup nnapi_ep * minor update * update compiler setting * Add a simple UT * Pipeline change to add build minimal extended with NNAPI for Android * move GetAllNodeUnits to node_unit.h, add UT for NodeUnits, minor updates * minor updates * address CR comments Co-authored-by: gwang0000 <62914304+gwang0000@users.noreply.github.com>	2022-01-25 17:13:46 -08:00
Xavier Dupré	6e95c0316d	Builds onnxruntime + eager mode with the same value for _GLIBCXX_USE_CXX11_ABI as pytorch (#10114 ) * add _GLIBCXX_USE_CXX11_ABI * restrict to eager mode	2022-01-25 11:25:31 +01:00
Chen Fu	2afce4830c	Symmetric QGEMM (#10289 ) Adding code for symmetric quantized matrix multiplication. Used in quantized convolution, achieving significant perf gain. TODO, use Symmetric Quantized GEMM in other operators! TODO address activation buffer overread in custom allocators and tensors supplied by users. DOT kernel perf test: Pixel 5a: Cartoongan 513.539 ms 471.786 ms Efficient 57.5169 ms 56.4174 ms Edgetpu 14.6673 ms 13.5959 ms NEON kernel perf test Pixel 3a Cartoongan 1423.53 ms 1069.92 ms Efficient 114.086 ms 107.968 ms Edgetpu 39.2632 ms 36.9839 ms Co-authored-by: Chen Fu <fuchen@microsoft.com>	2022-01-24 10:49:04 -08:00
Dmitri Smirnov	7e092a7e3f	Reduce number of memory allocations based on a customer profiling case (#10193 ) Add abseil and inlined containers typedefs Introduce TensorShapeVector for shape building. Use gsl::span<const T> to make interfaces accept different types of vector like args. Introduce InineShapeVectorT for shape capacity typed instantiations Refactor cuda slice along with provider shared interfaces Refactor Concat, Conv, Pad Build with Conv Einsum and ConvTranspose refactored. Remove TensorShape::GetDimsAsVector() Refactor SliceIterator and SliceIteratorBase Refactor broadcast Refactor Pads for twice as long Remove memory planner intermediate shapes vector Refactor orttraining Fix passing TensorShapeVector to tests Remove abseil copy and submodule, use FetchContent_Declare/Fetch Path with separate command Make RocmAsyncBuffer accept anything convertible to span. Adjust Linux GPU pipeline.	2022-01-24 10:40:46 -08:00
Abhishek Jindal	4aa7cee0d8	Abjindal/clean eager backend (#10055 ) * clearing map for eager mode backends * clearing map for eager mode backends manager * making OrtBackendsManager an extern variable and trying to delete it * cleaning backends manager when the python interpret exits * adding ifdef for eager mode code * disabling warning for pybind state file * disabling warning for python module file * running clang auto format and reducing redundancy * remove new line * moving declaration to a new header file * adding the header file for eager mode for python module * removing source files for eager mode * add source file for python module in eager mode * Update orttraining/orttraining/python/orttraining_python_module_eager.h Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>	2022-01-19 14:20:09 -08:00
Sunghoon	b038f4e56f	Add a build option to create a WebAssembly static library (#10184 ) * add p50 in test * Add a build option to create a WebAssembly static library Co-authored-by: Yulong Wang <yulongw@microsoft.com>	2022-01-18 18:05:04 -08:00
Rachel Guo	a099bd454b	[QDQ] Add shared qdq selectors (#10178 ) * wip * wip * wip * wip * wip * save * minor changes * update test graph name * address pr comments * update * address pr comments * address pr comments * fix warning * minor include fix * update to nodegroupselectors * delete unnecessary includes Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>	2022-01-11 19:41:45 -08:00
Shucai Xiao	ce103ace93	Amdmigraphx fix build error (#9272 ) * fix build error * rename a missing api for the MIGraphX EP	2022-01-10 15:18:43 -08:00
Ye Wang	5ebb857501	Update onnxruntime_unittests.cmake (#10215 )	2022-01-07 16:14:15 -08:00
Edward Chen	34c025109c	Exclude graph_runtime_optimization_test.cc from reduced ops build. (#10191 )	2022-01-05 09:22:38 -08:00
Ye Wang	2803a9465d	Add example of registering custom cuda op as shared lib (#10025 )	2022-01-05 09:22:15 -08:00
Edward Chen	792db33f01	Enable loading of ORT format model graph runtime optimizations (#9901 ) Initial implementation of load/replay of runtime optimizations in an ORT format model.	2022-01-04 12:09:07 -08:00
Tongliang Liao	1d3b34cc92	Add `.git` suffix to github URL. Although github works with both, this is more precise. Having an extension also makes it easy to match with regex, when we want to inject code to reroute traffic to our own git mirror.	2022-01-03 14:38:35 -08:00
Yufeng Li	7208fcbe1c	use wasmscalar as default kernel (#9988 ) * use wasmscalar as default kernel	2022-01-03 10:55:08 -08:00
Edward Chen	3bc91c2151	Move reduced ops files into build directory (#10030 ) In a reduced ops build, some source files get updated. This change moves the updated files into the build directory. This way, it is easier to simultaneously manage different build directories (with possibly different reduced ops configurations) based on a single source directory.	2021-12-28 19:04:20 -08:00
Changming Sun	4e9e01cb3c	Fix SDL warnings in CPU EP (#9975 )	2021-12-19 20:54:29 -08:00
Guoyu Wang	f3c72de718	[QDQ] Add shared NodeUnit class (#10052 ) * initial change * move more function to node_unit * Remove commented code * Minor update * Update onnxruntime/core/providers/nnapi/nnapi_builtin/builders/op_builder.cc Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com> * address CR comments Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2021-12-16 17:37:51 -08:00
Valery Chernov	b327e89efa	Standalone TVM Executor Provider (#10019 ) * squashed commit for standalone tvm execution provider * critical fix for correct python build with stvm ep * get tuning log file from ep options. It has priority over AUTOTVM_TUNING_LOG * updates and fixes * update parsing of stvm provider options * add support of external data for onnx model * add conditional dump of subgraphs * remove unused code * get input tensor shapes through provider options. get output shapes for fixed input ones by TVM API * support AUTO_TVM tuning log file inside ORT. Selector for Ansor and Auto_TVM is provider option (tuning_type) * add fp16 * add functionality of conversion of model layout to NHWC if need. Necessary parameter was added to STVM provider options * fix license text in header. fix log format * small fixes * fix issues from flake8 * remove model proto construction from GetCapability * reserve memory for vector of DLTensors * add simple tutorial for STVM EP * STVM docs * jroesch/tvm -> apache/tvm * remove dead code, unnecessary logs and comments * fix in readme * improve tutorial notebook * tvm update * update STVM_EP.md * fix default value * update STVM_EP.md * some TODOs for the future development * shorten long lines * add hyperlink to STVM_EP.md * fix Linux CI error * fix error in csharp test Co-authored-by: Jared Roesch <jroesch@octoml.ai> Co-authored-by: Valery Chernov <valery.chernov@deelvin.com> Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru>	2021-12-15 16:59:20 -08:00
George Wu	16274beb6f	update TensorRT EP to use TensorRT 8.2 (#9981 ) * update base image from 11.4.0 to 11.4.2 * update Linux TRT GPU pipeline to TRT 8.2 * update onnx-tensorrt to 8.2-GA * disable failing TensorRT 8.2 tests. * update pad test. * fix * update win trt ci pipeline to trt 8.2 * test run with cuda 11.4 and cudnn 8.2 * increase timeout * revert * revert * update packaging pipelines to use trt 8.2 * fix typo * update trt gpu perf pipeline to trt 8.2 * increase timeout * delete deprecated ci-perf-pipeline.yml * bump timeout * adjust timeout packaging	2021-12-15 15:59:31 -08:00
Changming Sun	20f8a06f1f	Remove OpenMP code (#10032 )	2021-12-15 00:58:42 -08:00
Chen Fu	cd0af7ad44	Symmetric quantized convolution kernel ARM64 (#9772 ) Adding a symmetric quantized convolution kernel for ARM64 Note: Indirect conv performs worse for shallow convs (input channels are small). This is much more so for low end pre-dot CPUs, where only 128 or deeper conv is faster with indirect conv. With DOT-CPUs, 32 deep conv is already faster Co-authored-by: Chen Fu <fuchen@microsoft.com>	2021-12-13 21:14:45 -08:00
George Nash	d0b08af37a	Implementation of QAttention for the DNNL execution provider (#10004 ) * Add QAttention to DNNL EP Add QAttention to DNNL EP (limited support and disable for gpu) update ONEDNN version to 2.4.4 bug fix in getcapability add memory debug print Signed-off-by: Wang <zhaoyang.wang@intel.com> * Address Code Review + MatMulInteger Fix clean up code and add comments fix matmulinteger and add fusion rule to enable initialized vector weight zero points of 0s update DNNL_TAG to v2.5 Signed-off-by: Wang <zhaoyang.wang@intel.com> * Linux Compile Fix + rollback ONEDNN to 2.4.4 Signed-off-by: Zhaoyang Wang <zhaoyang.wang@intel.com> * Fix QAttention Debug build Signed-off-by: Wang <zhaoyang.wang@intel.com> * Fix QAttention build if USE_DNNL not specified Signed-off-by: George Nash <george.nash@intel.com> Co-authored-by: Wang <zhaoyang.wang@intel.com> Co-authored-by: MTC <63478620+jeyblu@users.noreply.github.com>	2021-12-10 21:50:13 -08:00


1001 commits