onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-05-25 22:26:24 +00:00

Author	SHA1	Message	Date
Valery Chernov	b327e89efa	Standalone TVM Executor Provider (#10019 ) * squashed commit for standalone tvm execution provider * critical fix for correct python build with stvm ep * get tuning log file from ep options. It has priority over AUTOTVM_TUNING_LOG * updates and fixes * update parsing of stvm provider options * add support of external data for onnx model * add conditional dump of subgraphs * remove unused code * get input tensor shapes through provider options. get output shapes for fixed input ones by TVM API * support AUTO_TVM tuning log file inside ORT. Selector for Ansor and Auto_TVM is provider option (tuning_type) * add fp16 * add functionality of conversion of model layout to NHWC if need. Necessary parameter was added to STVM provider options * fix license text in header. fix log format * small fixes * fix issues from flake8 * remove model proto construction from GetCapability * reserve memory for vector of DLTensors * add simple tutorial for STVM EP * STVM docs * jroesch/tvm -> apache/tvm * remove dead code, unneccessary logs and comments * fix in readme * improve tutorial notebook * tvm update * update STVM_EP.md * fix default value * update STVM_EP.md * some TODOs for the future development * shorten long lines * add hyperlink to STVM_EP.md * fix Linux CI error * fix error in csharp test Co-authored-by: Jared Roesch <jroesch@octoml.ai> Co-authored-by: Valery Chernov <valery.chernov@deelvin.com> Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru>	2021-12-15 16:59:20 -08:00
George Wu	16274beb6f	update TensorRT EP to use TensorRT 8.2 (#9981 ) * update base image from 11.4.0 to 11.4.2 * update Linux TRT GPU pipeline to TRT 8.2 * update onnx-tensorrt to 8.2-GA * disable failing TensorRT 8.2 tests. * update pad test. * fix * update win trt ci pipeline to trt 8.2 * test run with cuda 11.4 and cudnn 8.2 * increase timeout * revert * revert * update packaging pipelines to use trt 8.2 * fix typo * update trt gpu perf pipeline to trt 8.2 * increase timeout * delete deprecated ci-perf-pipeline.yml * bump timeout * adjust timeout packaging	2021-12-15 15:59:31 -08:00
Changming Sun	20f8a06f1f	Remove OpenMP code (#10032 )	2021-12-15 00:58:42 -08:00
Chen Fu	cd0af7ad44	Symmetric quantized convolution kernel ARM64 (#9772 ) Adding a symmetric quantized convolution kernel for ARM64 Note: Indirect conv performs worse for shallow convs (input channels are small). This is much more so for low end pre-dot CPUs, where only 128 or deeper conv is faster with indirect conv. With DOT-CPUs, 32 deep conv is already faster Co-authored-by: Chen Fu <fuchen@microsoft.com>	2021-12-13 21:14:45 -08:00
George Nash	d0b08af37a	Implementation of QAttention for the DNNL execution provider (#10004 ) * Add QAttention to DNNL EP Add QAttention to DNNL EP (limited support and disable for gpu) update ONEDNN version to 2.4.4 bug fix in getcapability add memory debug print Signed-off-by: Wang <zhaoyang.wang@intel.com> * Address Code Review + MatMulInteger Fix clean up code and add comments fix matmulinteger and add fusion rule to enable initialized vector weight zero points of 0s update DNNL_TAG to v2.5 Signed-off-by: Wang <zhaoyang.wang@intel.com> * Linux Compile Fix + rollback ONEDNN to 2.4.4 Signed-off-by: Zhaoyang Wang <zhaoyang.wang@intel.com> * Fix QAttention Debug build Signed-off-by: Wang <zhaoyang.wang@intel.com> * Fix QAttention build if USE_DNNL not specified Signed-off-by: George Nash <george.nash@intel.com> Co-authored-by: Wang <zhaoyang.wang@intel.com> Co-authored-by: MTC <63478620+jeyblu@users.noreply.github.com>	2021-12-10 21:50:13 -08:00
Chi Lo	4669048b47	Handle compiler warnings for TRT EP (#9956 ) * fix error C4996 * remove wd4996 and fix error C4966 * fix typo * remove wd4996 for onnx-tensorrt * remove more /wd for onnx-tensorrt * gix bug for strncpy_s of (Buffer is too small && 0) * fix code to remove warning 4244 * fix code to remove warning 4267 * remove /wd4267 /wd4244 * fix bug * change int to size_t * using size_t instead of int * use float instead of double * Use size_t instead of int * use size_t instead of int * use size_t instead of int. Also fix typo	2021-12-09 15:33:52 -08:00
Dmitri Smirnov	a7abd541c7	Correct message type (#9973 )	2021-12-09 10:00:44 -08:00
Patrik Vavercak	fb30e9fdae	Remove /safeseh link option from non-msvc builds (#9744 ) (#9935 )	2021-12-08 11:44:00 -08:00
Yi-Hong Lyu	f60a287a64	Add __x86.get_pc_thunk.bx to avoid dependency (#9955 )	2021-12-08 04:50:41 -08:00
Dmitri Smirnov	a7f649db7c	Enable proper override using MIMalloc (#9944 ) Redirect memory allocations to MiMalloc and advance its version to v2.0.3 Refactor for a universal ifdef	2021-12-07 17:56:58 -08:00
Guoyu Wang	b34b991aea	Improve reduced ops and types build (#9908 ) * Improve reduceops and types build * minor update * fix test error * fix minimal build break * minor update and add comments * Address CR comments	2021-12-07 13:02:05 -08:00
Justin Stoecker	63c8889944	Restore arm64x onnxruntime binaries (#9950 )	2021-12-07 12:39:46 -08:00
Yufeng Li	e613019174	add s8s8 support for quantized conv and gemm (#9902 ) * add s8s8 support for quantized conv and gemm	2021-12-03 14:55:18 -08:00
Jeff Daily	8d88a6ac7f	add --amdgpu-target=gfx90a (#9820 )	2021-12-01 22:28:52 -08:00
Abhishek Jindal	740679d329	Abjindal/fix windows ci pipeline (#9883 ) * switching to /wd4800 for eager mode * fixing compile flags ignore warnings, previously it was only using the last one	2021-11-30 10:33:13 -08:00
RandySheriffH	9345894c82	Add build option to enable cuda profiling (#9875 )	2021-11-29 22:44:50 -08:00
Maajid khan	0ae0f29f14	[OpenVINO-EP] V3.4 Release with OpenVINO 2021.4.2 LTS Release (#9848 ) * Changes to ensure openvino build go through in Windows * Modified Hetero plugin Logic Modified Hetero Feature logic. In Hetero, if the operator to be marked true in getcapability(), it should be supported by either of the devices specified with HETERO in the device_type. Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> OV updated to 2021.4.2 version * OV updated to 2021.4.2 version * Updated OV to 2021.4.2 version, mono download link and dotnet version * Copying Managed nugets in openvino c# docker file *Copying Managed nuget to nugets artifacts directory Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> Co-authored-by: saharfraza <sfatima.3001@gmail.com> Co-authored-by: mayavijx <mayax.vijayan@intel.com> Co-authored-by: Aravind Gunda <aravindx.gunda@intel.com>	2021-11-23 13:12:08 -08:00
RajalakshmiSR	8564fc1933	POWER10: Add optimized dgemm kernel (#9652 ) * POWER10: Add optimized dgemm kernel This patch makes use of POWER10 matrix multiply assist feature and adds new DGEMM kernel. * Indentation update Co-authored-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>	2021-11-22 20:28:21 -08:00
Dwayne Robinson	32419974ad	Merge remote-tracking branch 'origin/master' into user/dwayner/DML1.8forORT1.10	2021-11-19 05:20:26 -08:00
Dwayne Robinson	e0ffc30a0b	Update to 1.8.0	2021-11-19 04:44:32 -08:00
Zhang Lei	8ef6aff734	Zhalei/dwqconv3x3 5x5 arm64 (#9714 ) * Arm64 Depthwise Convolution 3x3. * Add 5x5 intrinsic dwqconv for arm64 * rebase to master, remove no-need logic after arm64 convsym enabled. * Some more adjustment on the instrunction pipeling. * Add specific test cases. * Fix test dimension too small. * Fix build warning as error on some CI. * better format, etc.	2021-11-18 13:57:16 -08:00
Changming Sun	76715ad525	Delete ioscross code (#9793 )	2021-11-18 11:31:13 -08:00
Hariharan Seshadri	e23892ddbe	Support disabling support for the optional type in ORT builds (#9745 )	2021-11-17 19:13:28 -08:00
Dwayne Robinson	99afb87a02	Update DirectML 1.5.1 to 1.8.0 for ORT1.10	2021-11-15 21:17:25 -08:00
sfatimar	1d03baa8cc	Openvino ep 2021.4 v3.3 (#9588 ) * Added checks for Hetero/Multi Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Remote Context Plugin * changes for IO Buffer plugin * erronous couts added * erronous entry rectified * Set the Openvino OP Buffer also as output * Enable AUTO plugin in OpenVINO EP Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Remote Context Plugin * changes for IO Buffer plugin * erronous couts added * erronous entry rectified * Added checks for Hetero/Multi Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Set the Openvino OP Buffer also as output * Enable AUTO plugin in OpenVINO EP Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Please commit error message and rectification of param.context * Alignment fixed Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Changed the string to OpenVINO_GPU * hanged OpenVINO to to OpenVINO_CPU * Onnxruntime updated API for memory location * Removing Duplicate LOG Error * Tensor.h removed DeviceType function. Updated comment * API Comments updated * Removing changes to Provider Indo * Erronous commit * Removing Extra logs * Merge CMAKE * Not copy from a local location * Duplicate Entry * Remove extra line Co-authored-by: MaajidKhan <n.maajidkhan@gmail.com>	2021-11-15 13:41:12 -08:00
Chen Fu	1c84621020	Adding ARM64 depthwise convolution kernel for symmetric quantization (#9655 ) Adding ARM64 depthwise convolution kernel for symmetric quantization Motivation and Context Two improvements against current kernel code : 1. Signed int8 based instructions, no need to extend from 8b to 16b before multiplication. 2. Unrolled loop with manual software pipelining Co-authored-by: Chen Fu <fuchen@microsoft.com>	2021-11-15 12:18:43 -08:00
Tang, Cheng	99257eb8e3	support build option to include external graph transformers (#9478 ) * temp code * support external graph transformer from build script * remove debug code * add test case * support register rewrite rule * fix source_group issue if external source is not share any common prefix * fix python code style checker * resolve merge conflict Co-authored-by: Cheng Tang <chenta@microsoft.com>	2021-11-15 08:16:20 -08:00
Edward Chen	9f69d8bbae	Disable partial runtime optimization implementation by default (#9748 ) * Only serialize runtime optimization records container if non-empty. * Remove runtime optimizations from onnxruntime/core/flatbuffers/schema/README.md as it's not completely implemented yet. * Disable partial runtime optimization implementation by default.	2021-11-12 17:37:29 -08:00
Sheil Kumar	a17bdaf725	Enable JoinModels API in WinML+RT Experimental API (#9746 ) * Dynamic onnx model fusion * empty node names shoudl remain empty * comments and cleanup * logic reversed for promoting_unlined_outputs * PR feedback * type * typo * fix model outputs with promote unlinked output * remove disembodied model Co-authored-by: Sheil Kumar <sheilk@microsoft.com>	2021-11-12 16:56:31 -08:00
Edward Chen	997266a620	Add build.py option to disable ORT format model runtime optimization (#9723 ) ORT format model runtime optimization implementation is in progress. This change adds a build.py option to disable the partial runtime optimization implementation, adds CI builds to test it, and disables runtime optimizations in mobile package builds.	2021-11-11 18:05:45 -08:00
Tang, Cheng	6420530b3a	fix the mkl dependency for eager mode (#9702 ) * explicit link with libtorch instead of use cmake var to avoid introduce mkl dependency * use find_lib to get libtorch lib name * temp fix * add missing libraries Co-authored-by: Cheng Tang <chenta@microsoft.com>	2021-11-09 08:52:55 -08:00
Changming Sun	53afaefe3b	Refactor Windows CI pipeline yaml files (#9672 )	2021-11-08 11:11:49 -08:00
Ginés Hidalgo	13e64f8ff7	Remove all warnings C4800: Implicit conversion from 'int32_t/int64_t' to bool. Possible information loss (#9535 )	2021-11-08 10:12:27 -08:00
Yulong Wang	c6fddb263f	Add Node.js binding support to packaging pipeline (#9577 )	2021-11-05 15:29:40 -07:00
Changming Sun	1cbbafdbe0	Change the default value of onnxruntime_DISABLE_RTTI (#9674 )	2021-11-05 15:27:04 -07:00
Weixing Zhang	e11fde0179	libonnxruntime_providers_rocm.so and libonnxruntime_providers_shared.so are not included in python package. (#9618 ) * libonnxruntime_providers_rocm.so and libonnxruntime_providers_shared.so are not included in python package. Co-authored-by: Weixing Zhang <wezhan@microsoft.com>	2021-11-01 19:12:09 -07:00
Edward Chen	c315d1b3cd	Always enable ORT format model loading. (#9586 )	2021-11-01 10:00:08 +10:00
Ginés Hidalgo	79436a2d5b	Avoided warning C5038 (#9543 ) Updated several DML EP files to avoid warning C5038: data member 'member1' will be initialized after data member 'member2' / base class 'base_class' More information: https://docs.microsoft.com/en-us/cpp/error-messages/compiler-warnings/c5038?view=msvc-160	2021-10-30 00:36:22 -07:00
Jingqiao Fu	f7774a91d6	Add api-ms-win-core-com-l1-1-0.dll, shlwapi.dll, oleaut32.dll to delay load (#9619 )	2021-10-29 18:54:23 -07:00
Hariharan Seshadri	b5f7bb7d10	Update ONNX (#9462 )	2021-10-29 10:33:40 -07:00
TomWildenhain-Microsoft	e8268c9a18	Add Transpose Optimizer and modify nhwc optimizer to use it. (#9284 ) * Add Transpose Optimizer and modify nhwc optimizer to use it. * Fix casts * Fix casts2 * Fix move * Add tests * Add headers * Fixes and tests * Remove explicit template instantiation * Fix build warning * Name unit tests * Code review fixes * Add some comments * Fix some casts * Make optimization slightly less agressive * Some unit test fixes * Update Attention pattern to work with transpose optimizer * Update attention fuser * Fix attention fusion python script * Improve transpose optimizer documentation * Create OptimizerCtx struct * Disable Slice handler for testing * Implement Slice int32 * Only push transposes leading up to other transposes * Improve optimization heuristic * Add exemption for MaxPool * Document transpose optimizer api.h * Revert fusion tests to master * Remove temp files * Replace typedef with using * Trim trailing whitespace * Move class declarations from api_impl.h to api_impl.cc * Remove copy constructors and move allocator * Alphabetize headers * Add override keyword * Comments for nhwc_transformer * Rename OrtGraph to ApiGraph, etc. * Wrap line * Remove extra qualifier on ApiGraph * Refector attention fusion * Remove c-style casts from api_impl.cc * Improve documentation * Avoid printing vector in ORT_ENSURES * Revert attention fusion refactor * Remove duplicate cost heuristics and improve documentation * Fix size_t casts * Fixes from Scott's review * Unrevert attention refactor and more updates from Scott's review * Revert api_impl.cc ValueInfo change * only optimize first transpose input * Unrevert api_impl.cc changes * Make vector call reserve * transpose_optimizer.cc update from Scott's comments * Rename api::Graph to api::GraphRef etc. * Consider domains 'onnx.ai' and '' equal * Replace AddInput with SetInput * Improve tests * quantization and heuristic tests * Comments for tests * Replace const string_view with string_view and update tests * Fixes requested by Edward * Fix std::string to string_view conversion * Add <string> to includes * Fix bug for broadcasting ops with unknown rank. Slight safety improvements * Changes requested by Edward * Fix formatting * Improve description of cost metric	2021-10-27 22:10:39 -07:00
Scott McKay	b5a652c578	Add Xamarin support (#9436 ) Add Xamarin support to the ORT nuget packages. - Update C# code to support Xamarin builds for iOS and Android - refactor some things to split out common code - include iOS and Android ORT native shared library in native nuget package	2021-10-27 20:07:07 +10:00
RajalakshmiSR	c54ad0dd0b	POWER: Add Dgemm kernel for POWER processor (#9459 ) * POWER: Add Dgemm kernel for POWER processor This patch adds new dgemm kernel specific to POWER processor. * POWER: Restrict new functions to VSX in header * Remove warning check in header * POWER: Dgemm Adjust indentation Fixing indentation based on review comments. Co-authored-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>	2021-10-26 20:27:24 -07:00
Yulong Wang	90555bf96d	[node.js binding] enable CI for macOS arm64 (#9532 ) * nodejs aggr * add dependency * no unzip * fix aggregation * add arm64 for mac * mac arm64 build * fix commandline * add check for multi-CMAKE_OSX_ARCHITECTURES * fix	2021-10-26 16:42:19 -07:00
Changming Sun	f39821adbc	Fix a bug in CMakeLists.txt when handling NO RTTI (#9547 )	2021-10-26 14:29:29 -07:00
Jingqiao Fu	da15f5fc2f	change cmake condition to prevent WCOS fom linking advapi32 (#9500 ) * change condition to prevent WCOS fom linking advapi32.dll * Remove linkage to advapi32.lib	2021-10-26 12:16:49 -07:00
Stella Stamenova	542f1a9737	Cleanup some whitespace and capitalization for set (#9504 )	2021-10-26 12:02:07 -07:00
pengwa	b125446f9c	Optimize python overhead of APEX amp (#9447 ) * optimize python overhead of _post_amp_backward * overwrite apex amp's zero_grad for faster implementation * move unscale_fp16_grads_into_fp32_grads into C++ impl * improve the efficiency furthur, reducing 3.5ms to 1.7ms for unilm. * unilm 1.7ms to 338us: 1). optimize python list <==> std::vector copy, 2). launch the kernels as long as num_elem reach thresh hold. This help reduce the CUDA idel time. * refine the logic a bit after validating Co-authored-by: Baiju Meswani <bmeswani@microsoft.com>	2021-10-26 13:13:49 +08:00
Changming Sun	f92b8e2ac8	Clean up optional-lite references (#9534 )	2021-10-25 21:05:45 -07:00
Yulong Wang	bf4c3fa3d6	[node.js binding] aggregate binaries for multiple platforms in single NPM package (#9501 )	2021-10-25 20:16:10 -07:00

956 commits