onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-05-20 21:40:57 +00:00

Author	SHA1	Message	Date
RajalakshmiSR	8564fc1933	POWER10: Add optimized dgemm kernel (#9652) * POWER10: Add optimized dgemm kernel This patch makes use of the POWER10 matrix multiply assist feature and adds a new DGEMM kernel. * Indentation update Co-authored-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>	2021-11-22 20:28:21 -08:00
Dwayne Robinson	32419974ad	Merge remote-tracking branch 'origin/master' into user/dwayner/DML1.8forORT1.10	2021-11-19 05:20:26 -08:00
Dwayne Robinson	e0ffc30a0b	Update to 1.8.0	2021-11-19 04:44:32 -08:00
Zhang Lei	8ef6aff734	Zhalei/dwqconv3x3 5x5 arm64 (#9714) * Arm64 Depthwise Convolution 3x3. * Add 5x5 intrinsic dwqconv for arm64 * rebase to master, remove unneeded logic after arm64 convsym enabled. * Some more adjustment on the instruction pipelining. * Add specific test cases. * Fix test dimension being too small. * Fix build warnings treated as errors on some CI builds. * better format, etc.	2021-11-18 13:57:16 -08:00
Changming Sun	76715ad525	Delete ioscross code (#9793)	2021-11-18 11:31:13 -08:00
Hariharan Seshadri	e23892ddbe	Support disabling support for the optional type in ORT builds (#9745)	2021-11-17 19:13:28 -08:00
Dwayne Robinson	99afb87a02	Update DirectML 1.5.1 to 1.8.0 for ORT1.10	2021-11-15 21:17:25 -08:00
sfatimar	1d03baa8cc	Openvino ep 2021.4 v3.3 (#9588) * Added checks for Hetero/Multi Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Remote Context Plugin * changes for IO Buffer plugin * erroneous couts added * erroneous entry rectified * Set the Openvino OP Buffer also as output * Enable AUTO plugin in OpenVINO EP Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Remote Context Plugin * changes for IO Buffer plugin * erroneous couts added * erroneous entry rectified * Added checks for Hetero/Multi Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Set the Openvino OP Buffer also as output * Enable AUTO plugin in OpenVINO EP Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Please commit error message and rectification of param.context * Alignment fixed Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Changed the string to OpenVINO_GPU * Changed OpenVINO to OpenVINO_CPU * Onnxruntime updated API for memory location * Removing Duplicate LOG Error * Tensor.h removed DeviceType function. Updated comment * API Comments updated * Removing changes to Provider Indo * Erroneous commit * Removing Extra logs * Merge CMAKE * Do not copy from a local location * Duplicate Entry * Remove extra line Co-authored-by: MaajidKhan <n.maajidkhan@gmail.com>	2021-11-15 13:41:12 -08:00
Chen Fu	1c84621020	Adding ARM64 depthwise convolution kernel for symmetric quantization (#9655) Adding ARM64 depthwise convolution kernel for symmetric quantization Motivation and Context Two improvements over the current kernel code: 1. Signed int8 based instructions, no need to extend from 8b to 16b before multiplication. 2. Unrolled loop with manual software pipelining Co-authored-by: Chen Fu <fuchen@microsoft.com>	2021-11-15 12:18:43 -08:00
Tang, Cheng	99257eb8e3	support build option to include external graph transformers (#9478) * temp code * support external graph transformer from build script * remove debug code * add test case * support registering rewrite rules * fix source_group issue if external source does not share any common prefix * fix python code style checker * resolve merge conflict Co-authored-by: Cheng Tang <chenta@microsoft.com>	2021-11-15 08:16:20 -08:00
Edward Chen	9f69d8bbae	Disable partial runtime optimization implementation by default (#9748) * Only serialize runtime optimization records container if non-empty. * Remove runtime optimizations from onnxruntime/core/flatbuffers/schema/README.md as it's not completely implemented yet. * Disable partial runtime optimization implementation by default.	2021-11-12 17:37:29 -08:00
Sheil Kumar	a17bdaf725	Enable JoinModels API in WinML+RT Experimental API (#9746) * Dynamic onnx model fusion * empty node names should remain empty * comments and cleanup * logic reversed for promoting_unlined_outputs * PR feedback * type * typo * fix model outputs with promote unlinked output * remove disembodied model Co-authored-by: Sheil Kumar <sheilk@microsoft.com>	2021-11-12 16:56:31 -08:00
Edward Chen	997266a620	Add build.py option to disable ORT format model runtime optimization (#9723) ORT format model runtime optimization implementation is in progress. This change adds a build.py option to disable the partial runtime optimization implementation, adds CI builds to test it, and disables runtime optimizations in mobile package builds.	2021-11-11 18:05:45 -08:00
Tang, Cheng	6420530b3a	fix the mkl dependency for eager mode (#9702) * explicitly link with libtorch instead of using a CMake variable, to avoid introducing an MKL dependency * use find_lib to get libtorch lib name * temp fix * add missing libraries Co-authored-by: Cheng Tang <chenta@microsoft.com>	2021-11-09 08:52:55 -08:00
Changming Sun	53afaefe3b	Refactor Windows CI pipeline yaml files (#9672)	2021-11-08 11:11:49 -08:00
Ginés Hidalgo	13e64f8ff7	Remove all warnings C4800: Implicit conversion from 'int32_t/int64_t' to bool. Possible information loss (#9535)	2021-11-08 10:12:27 -08:00
Yulong Wang	c6fddb263f	Add Node.js binding support to packaging pipeline (#9577)	2021-11-05 15:29:40 -07:00
Changming Sun	1cbbafdbe0	Change the default value of onnxruntime_DISABLE_RTTI (#9674)	2021-11-05 15:27:04 -07:00
Weixing Zhang	e11fde0179	libonnxruntime_providers_rocm.so and libonnxruntime_providers_shared.so are not included in python package. (#9618) * libonnxruntime_providers_rocm.so and libonnxruntime_providers_shared.so are not included in python package. Co-authored-by: Weixing Zhang <wezhan@microsoft.com>	2021-11-01 19:12:09 -07:00
Edward Chen	c315d1b3cd	Always enable ORT format model loading. (#9586)	2021-11-01 10:00:08 +10:00
Ginés Hidalgo	79436a2d5b	Avoided warning C5038 (#9543) Updated several DML EP files to avoid warning C5038: data member 'member1' will be initialized after data member 'member2' / base class 'base_class' More information: https://docs.microsoft.com/en-us/cpp/error-messages/compiler-warnings/c5038?view=msvc-160	2021-10-30 00:36:22 -07:00
Jingqiao Fu	f7774a91d6	Add api-ms-win-core-com-l1-1-0.dll, shlwapi.dll, oleaut32.dll to delay load (#9619)	2021-10-29 18:54:23 -07:00
Hariharan Seshadri	b5f7bb7d10	Update ONNX (#9462)	2021-10-29 10:33:40 -07:00
TomWildenhain-Microsoft	e8268c9a18	Add Transpose Optimizer and modify nhwc optimizer to use it. (#9284) * Add Transpose Optimizer and modify nhwc optimizer to use it. * Fix casts * Fix casts2 * Fix move * Add tests * Add headers * Fixes and tests * Remove explicit template instantiation * Fix build warning * Name unit tests * Code review fixes * Add some comments * Fix some casts * Make optimization slightly less aggressive * Some unit test fixes * Update Attention pattern to work with transpose optimizer * Update attention fuser * Fix attention fusion python script * Improve transpose optimizer documentation * Create OptimizerCtx struct * Disable Slice handler for testing * Implement Slice int32 * Only push transposes leading up to other transposes * Improve optimization heuristic * Add exemption for MaxPool * Document transpose optimizer api.h * Revert fusion tests to master * Remove temp files * Replace typedef with using * Trim trailing whitespace * Move class declarations from api_impl.h to api_impl.cc * Remove copy constructors and move allocator * Alphabetize headers * Add override keyword * Comments for nhwc_transformer * Rename OrtGraph to ApiGraph, etc. * Wrap line * Remove extra qualifier on ApiGraph * Refactor attention fusion * Remove c-style casts from api_impl.cc * Improve documentation * Avoid printing vector in ORT_ENSURES * Revert attention fusion refactor * Remove duplicate cost heuristics and improve documentation * Fix size_t casts * Fixes from Scott's review * Unrevert attention refactor and more updates from Scott's review * Revert api_impl.cc ValueInfo change * only optimize first transpose input * Unrevert api_impl.cc changes * Make vector call reserve * transpose_optimizer.cc update from Scott's comments * Rename api::Graph to api::GraphRef etc. * Consider domains 'onnx.ai' and '' equal * Replace AddInput with SetInput * Improve tests * quantization and heuristic tests * Comments for tests * Replace const string_view with string_view and update tests * Fixes requested by Edward * Fix std::string to string_view conversion * Add <string> to includes * Fix bug for broadcasting ops with unknown rank. Slight safety improvements * Changes requested by Edward * Fix formatting * Improve description of cost metric	2021-10-27 22:10:39 -07:00
Scott McKay	b5a652c578	Add Xamarin support (#9436) Add Xamarin support to the ORT nuget packages. - Update C# code to support Xamarin builds for iOS and Android - refactor some things to split out common code - include iOS and Android ORT native shared library in native nuget package	2021-10-27 20:07:07 +10:00
RajalakshmiSR	c54ad0dd0b	POWER: Add Dgemm kernel for POWER processor (#9459) * POWER: Add Dgemm kernel for POWER processor This patch adds a new dgemm kernel specific to the POWER processor. * POWER: Restrict new functions to VSX in header * Remove warning check in header * POWER: Dgemm Adjust indentation Fixing indentation based on review comments. Co-authored-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>	2021-10-26 20:27:24 -07:00
Yulong Wang	90555bf96d	[node.js binding] enable CI for macOS arm64 (#9532) * nodejs aggr * add dependency * no unzip * fix aggregation * add arm64 for mac * mac arm64 build * fix commandline * add check for multi-CMAKE_OSX_ARCHITECTURES * fix	2021-10-26 16:42:19 -07:00
Changming Sun	f39821adbc	Fix a bug in CMakeLists.txt when handling NO RTTI (#9547)	2021-10-26 14:29:29 -07:00
Jingqiao Fu	da15f5fc2f	change cmake condition to prevent WCOS from linking advapi32 (#9500) * change condition to prevent WCOS from linking advapi32.dll * Remove linkage to advapi32.lib	2021-10-26 12:16:49 -07:00
Stella Stamenova	542f1a9737	Cleanup some whitespace and capitalization for set (#9504)	2021-10-26 12:02:07 -07:00
pengwa	b125446f9c	Optimize python overhead of APEX amp (#9447) * optimize python overhead of _post_amp_backward * overwrite apex amp's zero_grad for faster implementation * move unscale_fp16_grads_into_fp32_grads into C++ impl * improve the efficiency further, reducing 3.5ms to 1.7ms for unilm. * unilm 1.7ms to 338us: 1). optimize python list <==> std::vector copy, 2). launch the kernels once num_elem reaches the threshold. This helps reduce CUDA idle time. * refine the logic a bit after validating Co-authored-by: Baiju Meswani <bmeswani@microsoft.com>	2021-10-26 13:13:49 +08:00
Changming Sun	f92b8e2ac8	Clean up optional-lite references (#9534)	2021-10-25 21:05:45 -07:00
Yulong Wang	bf4c3fa3d6	[node.js binding] aggregate binaries for multiple platforms in single NPM package (#9501)	2021-10-25 20:16:10 -07:00
marcusfreisleben	651955d3c9	CUDA: Enable parallel compilation (#8974) * Pass on parallel option to nvcc * Fixed build.py * Added missing string conversion * Addressed review points	2021-10-25 16:42:58 -07:00
Stella Stamenova	d608504438	Don't use legacy mode for protobuf (#9498)	2021-10-22 16:50:29 -07:00
Changming Sun	d83adaaf9f	Remove optional-lite (#9424)	2021-10-22 16:45:45 -07:00
Stella Stamenova	49b66c7486	NFC: Normalize whitespace around if statements in CMakeLists.txt (#9464) Always add a space after if to make the file consistent	2021-10-21 15:35:58 -07:00
Stella Stamenova	9fc53df33a	Only add aliasing to targets if the corresponding package was found (#9404)	2021-10-20 11:32:08 -07:00
Changming Sun	406f1629c1	Remove Featurizers code (#9300)	2021-10-20 10:20:35 -07:00
Yufeng Li	da3dd398c5	Kernels for QLinearConv with symmetrically quantized filter (#9323) Add kernels for QLinearConv with a symmetrically quantized filter, i.e., the filter type is int8 and the filter zero point is 0. This PR includes kernels for AVX2, AVX-VNNI, AVX512 and AVX512 VNNI. Kernels for ARM64 will be added in a following PR. The kernels use the input buffer directly for pointwise convolution, and an indirect buffer for depthwise and non-group convolution. The advantages of these new kernels are: there is no need to compute the sum for each pixel of the output image, and the sum/offset of the filter can be combined with the bias. With the indirect buffer, im2col returns an array of buffer pointers instead of memcpy'ing the original data. This saves memcpy time and reduces the size of the intermediate buffer needed to hold the im2col transform. In the future, im2col will be computed ahead of time for inputs with a fixed input size.	2021-10-18 19:40:18 -07:00
Jeff Daily	c8789d3047	[ROCm] static re-hipify of CUDA EP to ROCm EP, now a shared provider (#8877) * re-hipify all rocm EP sources * fix all other files affected by re-hipify * add cuda_provider_factory.h to amd_hipify.py * do not use cudnn_conv_algo_search in ROCm EP, missing reduce min registration * Fix ReduceConsts template specialization introduced in #9101. Fixes the error when building for ROCm 4.3.1: error: too many template headers for onnxruntime::rocm::ReduceConsts<__half>::One (should be 0) * fix flake8 error in amd_hipify.py * speed up hipify with concurrent.futures * flake8 fix in amd_hipify.py	2021-10-14 15:15:51 -07:00
Abhishek Jindal	23700a15a0	Abjindal/eager windows build (#9326) * removing warnings which are causing errors from torch and changing flags for Windows * adding MKL library resolution and comments * cleaning up the code * fixing onnxruntime_python file for windows build * fix the include order to avoid the python_d.lib issue on win debug build * changes for warnings, typos and other comments * merge conflict * adding fix for mkl library error * Revert "adding fix for mkl library error" This reverts commit `73b87c73c2`. * fix for dll path for windows * typo for dll path Co-authored-by: Cheng Tang <chenta@microsoft.com>	2021-10-14 12:54:49 -07:00
Sunghoon	2f1204a5d5	[js/web] Enable wasm profiling and preserve function names in profiling (#9314) * add p50 in test * allow WebAssembly profiling and preserve function names Co-authored-by: Yulong Wang <yulongw@microsoft.com>	2021-10-11 22:04:50 -07:00
Maajid khan	72c4cea9e6	[OpenVINO-EP] V3.2 Release (#9232) * model caching changes for 2021.4 Signed-off-by: Your Name <you@example.com> * changed the ov version check * Minor changes added Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Added support for external data format Starting from OpenVINO version 2021.4, OpenVINO-EP will support ONNX models with weights saved in an external file location. Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Introduced Hetero/Multi options for perf_test Enabled use of the HETERO/MULTI device feature of OpenVINO-EP from the onnxruntime_perf_test tool. Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * cleaned up CMake code for older OV version support OV 2020.3 is no longer supported by OpenVINO-EP. This check is not required now. Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Add option to disable graph partitioning Added an option to disable graph partitioning at build time for OpenVINO-EP. With this option, when the model is not fully supported on OpenVINO-EP, the model fully falls back to the default CPU EP (MLAS). Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Changed the flag for disabling graph partitioning Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fixes the flake8 check error Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Added changes for disable graph partition option Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fixed flake8 indentation error Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> Co-authored-by: Your Name <you@example.com>	2021-10-07 16:02:19 -07:00
Gary Miguel	e2b1852eec	Build: respect onnxruntime_PREFER_SYSTEM_LIB for more things (#9181) This is based on a patch applied locally by https://github.com/conda-forge/onnxruntime-feedstock. Having this in master seems useful.	2021-10-06 13:49:28 -07:00
baijumeswani	bcdb411c8d	Implement FusedAdam for ORT adapted from DeepSpeed (#9266)	2021-10-05 20:50:34 -07:00
Tiago Koji Castro Shibata	11a391a88f	Port ARM64x support (#9230)	2021-10-01 13:06:43 -07:00
Yulong Wang	e2d779246a	[wasm] remove deprecated prefix 'EXTRA_' in emcc flags (#9211)	2021-09-30 16:02:24 -07:00
Yulong Wang	8c57d51928	support WebAssembly SIMD for qgemm (#9191) * support WebAssembly SIMD for qgemm * remove '--experimental-wasm-bulk-memory' for test	2021-09-30 12:40:56 -07:00
Thiago Crepaldi	ceb51dda4a	Support external torch cpp extensions on ORTModule (#9223)	2021-09-30 10:37:35 -04:00

939 commits