onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-16 18:31:27 +00:00

Author	SHA1	Message	Date
Changming Sun	26fceca90f	Update tools/ci_build/upload_python_package_to_azure_storage.py to not use the azure blob storage python package (#11114 )	2022-04-06 14:30:51 -07:00
Maajid khan	81fa28bc56	OpenVINO-EP v4.0 Release PR with OpenVINO 2022.1 (#11025 ) * Enabling ov-ep for 2022.1 Release ->Added ov-ep 2022.1 flow ->Validated CPU Unit tests with OV Master using onnxruntime_test_all unit tests. Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fix for output mismatch b/w OpenVINO and ONNX Refer: https://jira.devtools.intel.com/browse/CVS-60310 Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Enabling Adobe ops ->Enable Resize op for iGPU ->Enable Add op for iGPU Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Removing irrelevant conditions ->Removing some conditions from GetCapability() which are now not required. (Removed conditions for OV version support less than 2021.2) Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Enable upsample op Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Enable Adobe proxy-e model Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Removing any extra conditions for Opset13 ops * Opset13 changes Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Exception handling for devices * Added comments * Implement GPU Throttling feature Added GPU Throttling feature for iGPU's. when user enables it as a runtime option, it helps in reducing overall CPU usage of the application Added changes to exercise this option using onnxruntime_perf_test application. 
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Renaming the runtime config option Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Added the user to video and users group * Handling_GPU.0_GPU.1 * Handling special conditions ->Handling corner cases for device_type checks Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Modification to include new api 2.0 changes in the code * Added opset13 changes ->Enabled Few ops ->Added Debug info for case 3b in getcapability() Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Enabling ov-ep for 2022.1 Release ->Added ov-ep 2022.1 flow ->Validated CPU Unit tests with OV Master using onnxruntime_test_all unit tests. Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fix for output mismatch b/w OpenVINO and ONNX Refer: https://jira.devtools.intel.com/browse/CVS-60310 Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Enabling Adobe ops ->Enable Resize op for iGPU ->Enable Add op for iGPU Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Removing irrelevant conditions ->Removing some conditions from GetCapability() which are now not required. (Removed conditions for OV version support less than 2021.2) Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Enable upsample op Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Enable Adobe proxy-e model Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Removing any extra conditions for Opset13 ops * Opset13 changes Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Exception handling for devices * Added comments * Implement GPU Throttling feature Added GPU Throttling feature for iGPU's. when user enables it as a runtime option, it helps in reducing overall CPU usage of the application Added changes to exercise this option using onnxruntime_perf_test application. 
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Renaming the runtime config option Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Added the user to video and users group * Handling_GPU.0_GPU.1 * Handling special conditions ->Handling corner cases for device_type checks Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Added opset13 changes ->Enabled Few ops ->Added Debug info for case 3b in getcapability() Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Log comments updated * Changes to enable 2.0 api * Enabling ov-ep for 2022.1 Release ->Added ov-ep 2022.1 flow ->Validated CPU Unit tests with OV Master using onnxruntime_test_all unit tests. Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fix for output mismatch b/w OpenVINO and ONNX Refer: https://jira.devtools.intel.com/browse/CVS-60310 Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Enabling Adobe ops ->Enable Resize op for iGPU ->Enable Add op for iGPU Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Removing irrelevant conditions ->Removing some conditions from GetCapability() which are now not required. (Removed conditions for OV version support less than 2021.2) Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Enable upsample op Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Enable Adobe proxy-e model Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Removing any extra conditions for Opset13 ops * Opset13 changes Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Exception handling for devices * Added comments * Implement GPU Throttling feature Added GPU Throttling feature for iGPU's. when user enables it as a runtime option, it helps in reducing overall CPU usage of the application Added changes to exercise this option using onnxruntime_perf_test application. 
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Renaming the runtime config option Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Added the user to video and users group * Handling_GPU.0_GPU.1 * Handling special conditions ->Handling corner cases for device_type checks Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Added opset13 changes ->Enabled Few ops ->Added Debug info for case 3b in getcapability() Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fix build issue Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fixes issues Fixes compiler warnings c4458 on windows. Fixes the bug in device_type check logic Adds print info for enable_opencl_throttling option in onnxruntime_perf_test Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> commit to make openvino_2021.4 compatible * Fixed IO Buffer Optimization * Fix output names issue * Fix 2021.3 branch * Bug Fix for Multiple inputs/outputs - Assigns the right output_name and input_name for the graph when returned by CompiledModel::inputs() OV function. - Also takex care of output mismatch issue b/w openvino output and onnx output Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Add comments for the changes made Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * IO Buffer Changes * Commit for Disabling GPU Throttling for 2021.4 * Updated branch * Fix windows build ->Fixed windows build in debug mode ->Disabled scatternd3_tensor_int64 Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fixed CPP Unit tests for CPU -Fixed shrink, MVN, ReduceL2, Maxpool, upsample, scatter, slice, reshape, unsqueeze. 
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fixed first set of GPU Tests Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fixed additional failing tests on GPU ->Added conditions to disable certain ops under certain conditions ->Disabled certain tests ->Added some op supports for no_dimension supported Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Added Expand op support for CPU Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Added condition for squeeze op ->Shape can't have empty axes attribute Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Add support for LessOrEqual op function Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * OV Interface wait for replaced by indefinite wait call * use names from ONNX model to access OV tensors This chnage is to use the input/output names retrieved from original onnx model to access OV tensors and to check if there's any input or output names mismatch b/w ONNX naming and OV naming. Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fixes Myriad unit tests and other issues ->Fixes Myriad CPP unit tests ->Fixes output mismatch issue with models with sub graph partitioning Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fix segfault issue ->Fixed case 3b condition in get_capability() which was causing the segfault issue Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fixed build isuse with ov 2021.4 with I/O buffer Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Disables performance counters for I/O Buffer Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fixed inputs/outputs mismatch for HDDL with 2022.1 Signed-off-by: Mohammad Amir Aqeel <mohammadx.amir.aqeel@intel.com> * Fix to enable GPU FP16 * Enabled mlperf_ssd_mobilenet_300 model fully on CPU Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Added ov version specific dll packaging for nuget * Fixed conditions for few ops Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Dockerfile updates * Updated License Info 
-Updated the copyrights License Info -modified FP16 transformations with OV 2022.1 Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Disabling mlperf_ssd_mobilenet_300 model ->Disabled this model for openvino. The test is failing in Internal_CI pipelines. Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Disabling failing python CPU Tests Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fixed flake8 python errors Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> Co-authored-by: hdgx <harinix.d.g@intel.com> Co-authored-by: mayavijx <mayax.vijayan@intel.com> Co-authored-by: sfatimar <sahar.fatima@intel.com> Co-authored-by: mohsinmx <mohsinx.mohammad@intel.com> Co-authored-by: Mohammad Amir Aqeel <mohammadx.amir.aqeel@intel.com>	2022-04-06 13:30:33 -07:00
Xavier Dupré	3f42665a40	Improve transferred time from ort to torch (#9610 ) * Improve transferred time from ort to torch * Use static_cast * fix call to Python API for python <= 3.8 * investigation * fix ref counts * disable import if no training * one function to convert multiple ortvalues * add proto_type * enforce dlpack->deleter to be not null * fix _ortvalues_to_torch_tensor for eager mode * rename proto_type into element_type in the Python API * conversion from ort to torch 2x times faster * fix conversion of list of OrtValue * replace has_bool_tensor by bool_tensor_indices * introduce _ortvalues_to_torch_tensor_list * use _ortvalues_to_torch_tensor_list for cache * fix ambiguity between c and python classes Co-authored-by: xavier dupré <xavier.dupre@gmail.com> Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>	2022-04-06 09:12:58 +02:00
Scott McKay	58d97691ac	Set dims for constant with multiple values (#11116 ) * Also fix issue with data transfer not handling Tensor<std::string> correctly.	2022-04-06 07:39:07 +10:00
Abhishek Jindal	91c940b619	adding fill scalar for torch ones direct initialization on ort device (#10898 ) * adding fill scalar for torch ones direct initialization on device and adding test case for it * using ConstantOfShape to for implementing fill Scalar in atenops * adding case for handling at::Tensor attribute * handling the at::Tensor type for ConstantOfShape * handling the at::Tensor type for ConstantOfShape with attr type * handling the at::Tensor type case * converting the data to tensor in case of aten tensor mapping is needed * handling aten tensor case * handling aten tensor case and reversing the string case * changing type of scalar	2022-04-05 11:17:25 -07:00
G. Ramalingam	2c2408814f	Add function body for SoftmaxCrossEntropyLossGrad (#10779 ) * Add function definition for SoftmaxCrossEntropyLossGrad Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> * Cleanup Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> * Eliminate unused variable Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> * Fix index of weight tensor Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> * A few fixes to handle typing and weight Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> * Fix for zero D dimensions Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> * Add function body to internal op also Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> * A few fixes Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> * Fix type variable name Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> * Fix type constraint var Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> * Fix ignore_index handling in testcase Signed-off-by: Ganesan Ramalingam <grama@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net> * Add fun def for SoftmaxCrossEntropyLossInternal Signed-off-by: Ganesan Ramalingam <grama@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net> * Specify opset Signed-off-by: Ganesan Ramalingam <grama@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net> * Handle opset in NLL function Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> * Address PR feedback Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> * Modify onehot Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> * Eliminate duplicate statement Co-authored-by: Ganesan Ramalingam <grama@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2022-04-05 10:52:40 -07:00
Ben Niu	20fbf603d3	Fix ARM64EC build breaks (#11111 ) Apply this `4c015dbb49` to fix ARM64EC build breaks.	2022-04-05 10:00:42 -07:00
Erick Muñoz	25fdf8b167	Add Dequantize Linear operator on OneDNN EP (#11036 )	2022-04-05 08:32:26 -07:00
Baiju Meswani	8db180c245	orttraining cuda 10.2 to not build for compute_80 (#11103 )	2022-04-04 17:22:05 -07:00
Jack·Boos·Yu	01631893cd	[cmake] Re-factor pre-compile header usage (#11093 )	2022-04-04 16:28:34 -07:00
Changming Sun	fc7fe0012f	Fix: nodejs installer file name is wrong (#11097 )	2022-04-04 16:24:08 -07:00
Olivia Jain	872ed91d8a	Perf FasterRCNN + MaskRCNN (#11102 ) * add faster mask * fix paths	2022-04-04 13:23:25 -07:00
chethanpk	112dec6565	Added code for FusedMatMul inside matmul op primitive (#11077 )	2022-04-04 10:00:02 -07:00
Jack·Boos·Yu	ea004e953f	[cmake] Export multi targets in static build (#11063 ) * [cmake] Export multi targets in static build * Install more components in static build, format some code * Fix code pos	2022-04-03 22:37:18 -07:00
Jack·Boos·Yu	2dfd81b9bb	[cmake] Add option onnxruntime_ENABLE_CPUINFO (#11084 )	2022-04-01 22:29:27 -07:00
Changming Sun	25398cc5fe	Add cleanup instruction to run_dockerbuild.sh (#11079 )	2022-04-01 22:18:56 -07:00
Baiju Meswani	f9940f17b1	Remove extra-index-url to avoid nuget security analysis vulnerability (#11082 )	2022-04-01 18:30:55 -07:00
Chun-Wei Chen	b9279f637d	update How_To_Update_ONNX_Dev_Notes with right paths (#11074 )	2022-04-01 08:05:31 -07:00
Changming Sun	588a66e221	Add cleanup steps to the build jobs which run in Linux CPU machine pool (#11078 )	2022-03-31 22:34:12 -07:00
Baiju Meswani	249c4dec7f	Update orttraining release pipelines to use torch 1.11.0 (#11018 ) * Update orttraining release pipelines to use torch 1.11.0 * Change requirements_torch...txt to requirements.txt * Update cuda cmake architectures and clean up old files	2022-03-31 21:51:06 -07:00
Changming Sun	8e6dbad287	FIX: Nuget pipeline doesn't report binary size for Linux ARM64 In #10652 #10637 #10624, we changed the RID. But I forgot to update this part.	2022-03-31 18:32:05 -07:00
wejoncy	11a4ca741d	fuse Conv+Add+activation for CPU from different op-branch (#10987 ) * Fuse op conv Add and activation from two branch * simplify code Co-authored-by: Jicheng Wen <jicwen@microsoft.com>	2022-04-01 09:25:17 +08:00
dependabot[bot]	79e4ed8064	Bump pytorch-lightning Bumps [pytorch-lightning](https://github.com/PyTorchLightning/pytorch-lightning) from 1.5.10 to 1.6.0. - [Release notes](https://github.com/PyTorchLightning/pytorch-lightning/releases) - [Changelog](https://github.com/PyTorchLightning/pytorch-lightning/blob/master/CHANGELOG.md) - [Commits](https://github.com/PyTorchLightning/pytorch-lightning/compare/1.5.10...1.6.0) --- updated-dependencies: - dependency-name: pytorch-lightning dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2022-03-31 16:51:24 -07:00
Boris Fomitchev	eab7c0d5bf	Fixing optimizer failure due to missing provider list (#10497 ) Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>	2022-03-31 11:05:49 -07:00
Linnea May	bfcd5bd4a2	remove hardcoded library name (#11058 ) Co-authored-by: Linnea May <linneamay@microsoft.com>	2022-03-31 10:41:31 -07:00
Yulong Wang	8dcadba670	[js] aggregation of recent dependabot security warnings fix (#11060 ) * update package-lock.json * Bump minimist from 1.2.5 to 1.2.6 in /js/react_native Bumps [minimist](https://github.com/substack/minimist) from 1.2.5 to 1.2.6. - [Release notes](https://github.com/substack/minimist/releases) - [Commits](https://github.com/substack/minimist/compare/1.2.5...1.2.6) --- updated-dependencies: - dependency-name: minimist dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> * Bump minimist from 1.2.5 to 1.2.6 in /js/react_native/e2e Bumps [minimist](https://github.com/substack/minimist) from 1.2.5 to 1.2.6. - [Release notes](https://github.com/substack/minimist/releases) - [Commits](https://github.com/substack/minimist/compare/1.2.5...1.2.6) --- updated-dependencies: - dependency-name: minimist dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> * Bump plist from 3.0.4 to 3.0.5 in /js/react_native Bumps [plist](https://github.com/TooTallNate/node-plist) from 3.0.4 to 3.0.5. - [Release notes](https://github.com/TooTallNate/node-plist/releases) - [Changelog](https://github.com/TooTallNate/plist.js/blob/master/History.md) - [Commits](https://github.com/TooTallNate/node-plist/commits) --- updated-dependencies: - dependency-name: plist dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> * Bump ansi-regex from 4.1.0 to 4.1.1 in /js/react_native Bumps [ansi-regex](https://github.com/chalk/ansi-regex) from 4.1.0 to 4.1.1. - [Release notes](https://github.com/chalk/ansi-regex/releases) - [Commits](https://github.com/chalk/ansi-regex/compare/v4.1.0...v4.1.1) --- updated-dependencies: - dependency-name: ansi-regex dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> * Bump plist from 3.0.4 to 3.0.5 in /js/react_native/e2e Bumps [plist](https://github.com/TooTallNate/node-plist) from 3.0.4 to 3.0.5. 
- [Release notes](https://github.com/TooTallNate/node-plist/releases) - [Changelog](https://github.com/TooTallNate/plist.js/blob/master/History.md) - [Commits](https://github.com/TooTallNate/node-plist/commits) --- updated-dependencies: - dependency-name: plist dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> * Bump ansi-regex from 4.1.0 to 4.1.1 in /js/react_native/e2e Bumps [ansi-regex](https://github.com/chalk/ansi-regex) from 4.1.0 to 4.1.1. - [Release notes](https://github.com/chalk/ansi-regex/releases) - [Commits](https://github.com/chalk/ansi-regex/compare/v4.1.0...v4.1.1) --- updated-dependencies: - dependency-name: ansi-regex dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-03-31 02:06:04 -07:00
dependabot[bot]	e9c68d57ca	Bump minimist from 1.2.5 to 1.2.6 in /js/web (#11033 ) Bumps [minimist](https://github.com/substack/minimist) from 1.2.5 to 1.2.6. - [Release notes](https://github.com/substack/minimist/releases) - [Commits](https://github.com/substack/minimist/compare/1.2.5...1.2.6) --- updated-dependencies: - dependency-name: minimist dependency-type: direct:development ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-03-30 16:26:34 -07:00
Yulong Wang	6c7090a829	[js/web] fix output type mapping (#11049 )	2022-03-30 16:26:04 -07:00
RandySheriffH	9505e8c6c1	fix json format (#11046 ) Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2022-03-30 16:15:33 -07:00
Adam Pocock	9616ad483f	[Java] Support configuring CUDA and TensorRT execution providers (#10697 ) Java side parts for configuring CUDA and TensorRT. Adding tests for CUDA and TensorRT. Refactoring library loading logic as provider options need to have their shared library loaded before they can be constructed.	2022-03-30 14:26:51 -07:00
Yulong Wang	179406bd25	[JS] upgrade package-lock.json from v1 to v2 (#11039 ) * upgrade package-lock.json from v1 to v2 * upgrade requirement of nodejs version to 16.x	2022-03-30 13:30:28 -07:00
Nat Kershaw (MSFT)	998bf0fdb6	Remove advice to use IO Binding for this scenario (#11006 )	2022-03-30 10:23:50 -07:00
Xavier Dupré	c37d2728bf	Implement TreeEnsemble for opset(ai.onnx.ml)==3 (#10821 ) * Implement TreeEnsemble for opset(ai.onnx.ml)==3 * use of InlineVector * refactoring * improve attributes retrieval * avoid creating a temporary buffer * modifies onnx.ml.cpu.json * use unordered_map * update docs/OperatorKernels.md * address PR comments (TH -> ThresholdType, ORT_RETURN...) * add a python unit test to load a TreeEnsembleRegressor following ai.onnx.ml==3 specifications	2022-03-30 12:53:12 +02:00
Yulong Wang	1424b796ff	[js/web] disable test_tan temporarily (#11048 )	2022-03-29 21:47:52 -07:00
Yi Zhang	d1bdd2cd94	allow trailing slash in directory (#11001 ) * allow trailing slash in directory * fix lint	2022-03-30 09:42:57 +08:00
ytaous	5868413caf	fix seg fault (#11038 ) Co-authored-by: Ethan Tao <ettao@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2022-03-29 14:12:45 -07:00
Edward Chen	8f456735d1	Remove unused variable. (#11043 )	2022-03-29 14:11:07 -07:00
Erick Alejandro Muñoz Alvarado	6c005bfdbc	Enabled Cast operator on OneDNN EP (#11023 )	2022-03-29 08:16:01 -07:00
Vincent Wang	6a6840d5c6	Fuse LayerNormalization for Apex O2 (#10233 )	2022-03-29 21:22:04 +08:00
Vincent Wang	3b6cee8059	[CUDA] Optimize Conv and ConvGrad for Training (#10999 ) * Optimize Conv and ConvGrad for Training * add provider option to control * fix typo	2022-03-29 07:31:36 +08:00
Chi Lo	8ba52b0a05	Bump master version to 1.12 (#10797 ) * bump master version to 1.11 * bump master version to 1.12	2022-03-28 12:30:11 -07:00
Edward Chen	9371401746	Move node EP assignment for ORT format into SessionState::FinalizeSessionState() (#10944 ) Follow up to #10904. - Move node EP assignment for ORT format into SessionState::FinalizeSessionState(). - Add unit test for #10904. - Make convert_onnx_models_to_ort.py optimization level configurable via environment variable.	2022-03-28 10:37:22 -07:00
Baiju Meswani	9c6cc018a9	Add utility to get the gradient graph from GradientGraphBuilder (#10995 ) * Add pybind method to get the gradient graph * Fix segmentation fault because of logging for gradient building	2022-03-25 17:13:56 -07:00
Chen Fu	dc72159105	Symmetric Quant indirect Conv kernel for ARMv8 A55 chip (#10862 ) ARM a55 micro-architecture (with dot product instructions), similar to a53, is widely used as little cores in big.Little configurations. A55 has a narrower memory load/store hardware, where a 128b load instruction would block the pipeline for 2 whole cycles, during which no other instructions can be executed. On the other hand, a 64b load instruction can be duo issued with many other instructions. This change adds a Symmetric Quant indirect Conv kernel for a55 micro-architecture, where we replace ldr q4,[x1], with ldr d4,[x1], ldr x11,[x1], ins v4.d[1],x11 so that we can try to hide the memory load cycles behind computing cycles in the kernel. With this new kernel, cartoongan model shows significant perf improvement on Pixel5a little cores (2 threads running on two little cores): new kernel: 2188.59 ms old kernel: 2360.61 ms	2022-03-25 17:10:47 -07:00
leqiao-1	8ddc45f52d	Add linux and macos arm64 java artifacts (#10981 )	2022-03-25 16:23:17 -07:00
Jack·Boos·Yu	d1be71eaa3	[cmake] Add keyword STATIC to add_library in function onnxruntime_add_static_library (#10998 )	2022-03-25 16:19:36 -07:00
Chandru Ramakrishnan	cb31b7eab1	Fixed creation of ORT_Value to pass offset of 0 (#11004 )	2022-03-25 15:52:10 -04:00
Scott McKay	47c09e6701	Clarify usage of kOnnxDomainAlias. (#10962 ) * Clarify usage of kOnnxDomainAlias.	2022-03-25 09:52:59 +10:00
pengwa	89ef987ab1	Improve NonZero on CUDA/ROCM (#10307 ) * improve NonZero * fix megatron_fp16 optimizer, fix the doc * multi_tensor_applier * resolve comment * fix building warning * fix build error when enabling training and use tensorrt	2022-03-25 07:35:45 +08:00
mpapdiwala	1e917c879e	Adding support for saving and loading train step info properties in the state dict and checkpoint file. (#10569 ) * Adding optimization step and step parameter to the ORTTrainer constructor * Added ORTTrainerOptions for optimization step * Adding Train Step Info Settings to State Dictionary * Adding train step info key * Updating comments * Reverting changes * Updating test case for new state dict entry train_step_info	2022-03-24 11:50:45 -07:00

6613 commits