Commit graph

6613 commits

Author SHA1 Message Date
Changming Sun
26fceca90f
Update tools/ci_build/upload_python_package_to_azure_storage.py to not use the azure blob storage python package (#11114) 2022-04-06 14:30:51 -07:00
Maajid khan
81fa28bc56
OpenVINO-EP v4.0 Release PR with OpenVINO 2022.1 (#11025)
* Enabling ov-ep for 2022.1 Release

->Added ov-ep 2022.1 flow
->Validated CPU Unit tests with OV
Master using onnxruntime_test_all unit
tests.

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fix for output mismatch b/w OpenVINO and ONNX

Refer:
https://jira.devtools.intel.com/browse/CVS-60310

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Enabling Adobe ops

->Enable Resize op for iGPU
->Enable Add op for iGPU

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Removing irrelevant conditions

->Removing some conditions from
GetCapability() which are now not
required. (Removed conditions for
OV version support less than 2021.2)

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Enable upsample op

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Enable Adobe proxy-e model

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Removing any extra conditions for Opset13 ops

* Opset13 changes

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Exception handling for devices

* Added comments

* Implement GPU Throttling feature

*Added GPU Throttling feature for iGPU's.
when user enables it as a runtime option,
it helps in reducing overall CPU usage
of the application

*Added changes to exercise this option
using onnxruntime_perf_test application.

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Renaming the runtime config option

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Added the user to video and users group

* Handling_GPU.0_GPU.1

* Handling special conditions

->Handling corner cases for
device_type checks

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Modification to include new api 2.0 changes in the code

* Added opset13 changes

->Enabled Few ops
->Added Debug info for case 3b in getcapability()

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Enabling ov-ep for 2022.1 Release

->Added ov-ep 2022.1 flow
->Validated CPU Unit tests with OV
Master using onnxruntime_test_all unit
tests.

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fix for output mismatch b/w OpenVINO and ONNX

Refer:
https://jira.devtools.intel.com/browse/CVS-60310

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Enabling Adobe ops

->Enable Resize op for iGPU
->Enable Add op for iGPU

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Removing irrelevant conditions

->Removing some conditions from
GetCapability() which are now not
required. (Removed conditions for
OV version support less than 2021.2)

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Enable upsample op

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Enable Adobe proxy-e model

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Removing any extra conditions for Opset13 ops

* Opset13 changes

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Exception handling for devices

* Added comments

* Implement GPU Throttling feature

*Added GPU Throttling feature for iGPU's.
when user enables it as a runtime option,
it helps in reducing overall CPU usage
of the application

*Added changes to exercise this option
using onnxruntime_perf_test application.

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Renaming the runtime config option

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Added the user to video and users group

* Handling_GPU.0_GPU.1

* Handling special conditions

->Handling corner cases for
device_type checks

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Added opset13 changes

->Enabled Few ops
->Added Debug info for case 3b in getcapability()

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Log comments updated

* Changes to enable 2.0 api

* Enabling ov-ep for 2022.1 Release

->Added ov-ep 2022.1 flow
->Validated CPU Unit tests with OV
Master using onnxruntime_test_all unit
tests.

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fix for output mismatch b/w OpenVINO and ONNX

Refer:
https://jira.devtools.intel.com/browse/CVS-60310

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Enabling Adobe ops

->Enable Resize op for iGPU
->Enable Add op for iGPU

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Removing irrelevant conditions

->Removing some conditions from
GetCapability() which are now not
required. (Removed conditions for
OV version support less than 2021.2)

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Enable upsample op

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Enable Adobe proxy-e model

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Removing any extra conditions for Opset13 ops

* Opset13 changes

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Exception handling for devices

* Added comments

* Implement GPU Throttling feature

*Added GPU Throttling feature for iGPU's.
when user enables it as a runtime option,
it helps in reducing overall CPU usage
of the application

*Added changes to exercise this option
using onnxruntime_perf_test application.

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Renaming the runtime config option

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Added the user to video and users group

* Handling_GPU.0_GPU.1

* Handling special conditions

->Handling corner cases for
device_type checks

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Added opset13 changes

->Enabled Few ops
->Added Debug info for case 3b in getcapability()

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fix build issue

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fixes issues

*Fixes compiler warnings c4458 on windows.
*Fixes the bug in device_type check logic
*Adds print info for enable_opencl_throttling
option in onnxruntime_perf_test

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* commit to make openvino_2021.4 compatible

* Fixed IO Buffer Optimization

* Fix output names issue

* Fix 2021.3 branch

* Bug Fix for Multiple inputs/outputs

- Assigns the right output_name and
input_name for the graph when
returned by CompiledModel::inputs()
OV function.

- Also takex care of output mismatch
issue b/w openvino output and onnx
output

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Add comments for the changes made

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* IO Buffer Changes

* Commit for Disabling GPU Throttling for 2021.4

* Updated branch

* Fix windows build

->Fixed windows build in debug mode
->Disabled scatternd3_tensor_int64

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fixed CPP Unit tests for CPU

-Fixed shrink, MVN, ReduceL2, Maxpool,
upsample, scatter, slice, reshape,
unsqueeze.

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fixed first set of GPU Tests

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fixed additional failing tests on GPU

->Added conditions to disable certain ops
under certain conditions

->Disabled certain tests

->Added some op supports for no_dimension
supported

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Added Expand op support for CPU

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Added condition for squeeze op

->Shape can't have empty axes attribute

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Add support for LessOrEqual op function

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* OV Interface wait for replaced by indefinite wait call

* use names from ONNX model to access OV tensors

This chnage is to use the input/output names
retrieved from original onnx model to access
OV tensors and to check if there's any input
or output names mismatch b/w ONNX naming
and OV naming.

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fixes Myriad unit tests and other issues

->Fixes Myriad CPP unit tests
->Fixes output mismatch issue with models with
sub graph partitioning

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fix segfault issue

->Fixed case 3b condition in get_capability()
which was causing the segfault issue

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fixed build isuse with ov 2021.4 with I/O buffer

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Disables performance counters for I/O Buffer

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fixed inputs/outputs mismatch for HDDL with 2022.1

Signed-off-by: Mohammad Amir Aqeel <mohammadx.amir.aqeel@intel.com>

* Fix to enable GPU FP16

* Enabled mlperf_ssd_mobilenet_300 model fully on CPU

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Added ov version specific dll packaging for nuget

* Fixed conditions for few ops

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Dockerfile updates

* Updated License Info

-Updated the copyrights License Info
-modified FP16 transformations with OV 2022.1

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Disabling mlperf_ssd_mobilenet_300 model

->Disabled this model for openvino. The
test is failing in Internal_CI pipelines.

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Disabling failing python CPU Tests

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fixed flake8 python errors

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

Co-authored-by: hdgx <harinix.d.g@intel.com>
Co-authored-by: mayavijx <mayax.vijayan@intel.com>
Co-authored-by: sfatimar <sahar.fatima@intel.com>
Co-authored-by: mohsinmx <mohsinx.mohammad@intel.com>
Co-authored-by: Mohammad Amir Aqeel <mohammadx.amir.aqeel@intel.com>
2022-04-06 13:30:33 -07:00
Xavier Dupré
3f42665a40
Improve transfered time from ort to torch (#9610)
* Improve transfered time from ort to torch
* Use static_cast
* fix call to Python API for python <= 3.8
* investigation
* fix ref counts
* disable import if no training
* one function to convert multiple ortvalues
* add proto_type
* enforce dlpack->deleter to be not null
* fix _ortvalues_to_torch_tensor for eager mode
* rename proto_type into element_type in the Python API
* conversion from ort to torch 2x times faster
* fix conversion of list of OrtValue
* replace has_bool_tensor by bool_tensor_indices
* introduce _ortvalues_to_torch_tensor_list
* use _ortvalues_to_torch_tensor_list for cache
* fix ambiguity between c and python classes

Co-authored-by: xavier dupré <xavier.dupre@gmail.com>
Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
2022-04-06 09:12:58 +02:00
Scott McKay
58d97691ac
Set dims for constant with multiple values (#11116)
* Also fix issue with data transfer not handling Tensor<std::string> correctly.
2022-04-06 07:39:07 +10:00
Abhishek Jindal
91c940b619
adding fill scalar for torch ones direct initialization on ort device (#10898)
* adding fill scalar for torch ones direct initialization on device and adding test case for it

* using ConstantOfShape to for implementing fill Scalar in atenops

* adding case for handling at::Tensor attribute

* handling the at::Tensor type for ConstantOfShape

* handling the at::Tensor type for ConstantOfShape with attr type

* handling the at::Tensor type case

* converting the data to tensor in case of aten tensor mapping is needed

* handling aten tensor case

* handling aten tensor case and reversing the string case

* changing type of scalar
2022-04-05 11:17:25 -07:00
G. Ramalingam
2c2408814f
Add function body for SoftmaxCrossEntropyLossGrad (#10779)
* Add function definition for SoftmaxCrossEntropyLossGrad

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Cleanup

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Eliminate unused variable

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Fix index of weight tensor

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* A few fixes to handle typing and weight

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Fix for zero D dimensions

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Add function body to internal op also

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* A few fixes

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Fix type variable name

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Fix type constraint var

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Fix ignore_index handling in testcase

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>

* Add fun def for SoftmaxCrossEntropyLossInternal

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>

* Specify opset

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>

* Handle opset in NLL function

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Address PR feedback

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Modify onehot

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Eliminate duplicate statement

Co-authored-by: Ganesan Ramalingam <grama@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2022-04-05 10:52:40 -07:00
Ben Niu
20fbf603d3
Fix ARM64EC build breaks (#11111)
Apply this 4c015dbb49 to fix ARM64EC build breaks.
2022-04-05 10:00:42 -07:00
Erick Muñoz
25fdf8b167
Add Dequantize Linear operator on OneDNN EP (#11036) 2022-04-05 08:32:26 -07:00
Baiju Meswani
8db180c245
orttraining cuda 10.2 to not build for compute_80 (#11103) 2022-04-04 17:22:05 -07:00
Jack·Boos·Yu
01631893cd
[cmake] Re-factor pre-compile header usage (#11093) 2022-04-04 16:28:34 -07:00
Changming Sun
fc7fe0012f
Fix: nodejs installer file name is wrong (#11097) 2022-04-04 16:24:08 -07:00
Olivia Jain
872ed91d8a
Perf FasterRCNN + MaskRCNN (#11102)
* add faster mask

* fix paths
2022-04-04 13:23:25 -07:00
chethanpk
112dec6565
Added code for FusedMatMul inside matmul op primitive (#11077) 2022-04-04 10:00:02 -07:00
Jack·Boos·Yu
ea004e953f
[cmake] Export multi targets in static build (#11063)
* [cmake] Export multi targets in static build

* Install more components in static build, format some code

* Fix code pos
2022-04-03 22:37:18 -07:00
Jack·Boos·Yu
2dfd81b9bb
[cmake] Add option onnxruntime_ENABLE_CPUINFO (#11084) 2022-04-01 22:29:27 -07:00
Changming Sun
25398cc5fe
Add cleanup instruction to run_dockerbuild.sh (#11079) 2022-04-01 22:18:56 -07:00
Baiju Meswani
f9940f17b1
Remove extra-index-url to avoid nuget security analysis vulnerability (#11082) 2022-04-01 18:30:55 -07:00
Chun-Wei Chen
b9279f637d
update How_To_Update_ONNX_Dev_Notes with right paths (#11074) 2022-04-01 08:05:31 -07:00
Changming Sun
588a66e221
Add cleanup steps to the build jobs which run in Linux CPU machine pool (#11078) 2022-03-31 22:34:12 -07:00
Baiju Meswani
249c4dec7f
Update orttraining release pipelines to use torch 1.11.0 (#11018)
* Update orttraining release pipelines to use torch 1.11.0

* Change requirements_torch...txt to requirements.txt

* Update cuda cmake architectures and clean up old files
2022-03-31 21:51:06 -07:00
Changming Sun
8e6dbad287 FIX: Nuget pipeline doesn't report binary size for Linux ARM64
In #10652 #10637 #10624, we changed the RID. But I forgot to update this part.
2022-03-31 18:32:05 -07:00
wejoncy
11a4ca741d
fuse Conv+Add+activation for CPU from different op-branch (#10987)
* Fuse op conv Add and activation from two branch
* simplify code

Co-authored-by: Jicheng Wen <jicwen@microsoft.com>
2022-04-01 09:25:17 +08:00
dependabot[bot]
79e4ed8064 Bump pytorch-lightning
Bumps [pytorch-lightning](https://github.com/PyTorchLightning/pytorch-lightning) from 1.5.10 to 1.6.0.
- [Release notes](https://github.com/PyTorchLightning/pytorch-lightning/releases)
- [Changelog](https://github.com/PyTorchLightning/pytorch-lightning/blob/master/CHANGELOG.md)
- [Commits](https://github.com/PyTorchLightning/pytorch-lightning/compare/1.5.10...1.6.0)

---
updated-dependencies:
- dependency-name: pytorch-lightning
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-03-31 16:51:24 -07:00
Boris Fomitchev
eab7c0d5bf
Fixing optimizer failure due to missing provider list (#10497)
Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
2022-03-31 11:05:49 -07:00
Linnea May
bfcd5bd4a2
remove hardcoded library name (#11058)
Co-authored-by: Linnea May <linneamay@microsoft.com>
2022-03-31 10:41:31 -07:00
Yulong Wang
8dcadba670
[js] aggregation of recent dependabot security warnings fix (#11060)
* update package-lock.json

* Bump minimist from 1.2.5 to 1.2.6 in /js/react_native

Bumps [minimist](https://github.com/substack/minimist) from 1.2.5 to 1.2.6.
- [Release notes](https://github.com/substack/minimist/releases)
- [Commits](https://github.com/substack/minimist/compare/1.2.5...1.2.6)

---
updated-dependencies:
- dependency-name: minimist
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

* Bump minimist from 1.2.5 to 1.2.6 in /js/react_native/e2e

Bumps [minimist](https://github.com/substack/minimist) from 1.2.5 to 1.2.6.
- [Release notes](https://github.com/substack/minimist/releases)
- [Commits](https://github.com/substack/minimist/compare/1.2.5...1.2.6)

---
updated-dependencies:
- dependency-name: minimist
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

* Bump plist from 3.0.4 to 3.0.5 in /js/react_native

Bumps [plist](https://github.com/TooTallNate/node-plist) from 3.0.4 to 3.0.5.
- [Release notes](https://github.com/TooTallNate/node-plist/releases)
- [Changelog](https://github.com/TooTallNate/plist.js/blob/master/History.md)
- [Commits](https://github.com/TooTallNate/node-plist/commits)

---
updated-dependencies:
- dependency-name: plist
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

* Bump ansi-regex from 4.1.0 to 4.1.1 in /js/react_native

Bumps [ansi-regex](https://github.com/chalk/ansi-regex) from 4.1.0 to 4.1.1.
- [Release notes](https://github.com/chalk/ansi-regex/releases)
- [Commits](https://github.com/chalk/ansi-regex/compare/v4.1.0...v4.1.1)

---
updated-dependencies:
- dependency-name: ansi-regex
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

* Bump plist from 3.0.4 to 3.0.5 in /js/react_native/e2e

Bumps [plist](https://github.com/TooTallNate/node-plist) from 3.0.4 to 3.0.5.
- [Release notes](https://github.com/TooTallNate/node-plist/releases)
- [Changelog](https://github.com/TooTallNate/plist.js/blob/master/History.md)
- [Commits](https://github.com/TooTallNate/node-plist/commits)

---
updated-dependencies:
- dependency-name: plist
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

* Bump ansi-regex from 4.1.0 to 4.1.1 in /js/react_native/e2e

Bumps [ansi-regex](https://github.com/chalk/ansi-regex) from 4.1.0 to 4.1.1.
- [Release notes](https://github.com/chalk/ansi-regex/releases)
- [Commits](https://github.com/chalk/ansi-regex/compare/v4.1.0...v4.1.1)

---
updated-dependencies:
- dependency-name: ansi-regex
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-03-31 02:06:04 -07:00
dependabot[bot]
e9c68d57ca
Bump minimist from 1.2.5 to 1.2.6 in /js/web (#11033)
Bumps [minimist](https://github.com/substack/minimist) from 1.2.5 to 1.2.6.
- [Release notes](https://github.com/substack/minimist/releases)
- [Commits](https://github.com/substack/minimist/compare/1.2.5...1.2.6)

---
updated-dependencies:
- dependency-name: minimist
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-03-30 16:26:34 -07:00
Yulong Wang
6c7090a829
[js/web] fix output type mapping (#11049) 2022-03-30 16:26:04 -07:00
RandySheriffH
9505e8c6c1
fix json format (#11046)
Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2022-03-30 16:15:33 -07:00
Adam Pocock
9616ad483f
[Java] Support configuring CUDA and TensorRT execution providers (#10697)
Java side parts for configuring CUDA and TensorRT.
Adding tests for CUDA and TensorRT. Refactoring library loading logic as provider options need to have their shared library loaded before they can be constructed.
2022-03-30 14:26:51 -07:00
Yulong Wang
179406bd25
[JS] upgrade package-lock.json from v1 to v2 (#11039)
* upgrade package-lock.json from v1 to v2

* upgrade requirement of nodejs version to 16.x
2022-03-30 13:30:28 -07:00
Nat Kershaw (MSFT)
998bf0fdb6
Remove advice to use IO Binding for this scenario (#11006) 2022-03-30 10:23:50 -07:00
Xavier Dupré
c37d2728bf
Implement TreeEnsemble for opset(ai.onnx.ml)==3 (#10821)
* Implement TreeEnsemble for opset(ai.onnx.ml)==3
* use of InlineVector
* refactoring
* improve attributes retrieval
* avoid creating a temporary buffer
* modifies onnx.ml.cpu.json
* use unordered_map
* update docs/OperatorKernels.md
* address PR comments (TH -> ThresholdType, ORT_RETURN...)
* add a python unit test to load a TreeEnsembleRegressor following ai.onnx.ml==3 specifications
2022-03-30 12:53:12 +02:00
Yulong Wang
1424b796ff
[js/web] disable test_tan temorarily (#11048) 2022-03-29 21:47:52 -07:00
Yi Zhang
d1bdd2cd94
allow trailing slash in directory (#11001)
* allow trailing slash in directory

* fix lint
2022-03-30 09:42:57 +08:00
ytaous
5868413caf
fix seg fault (#11038)
Co-authored-by: Ethan Tao <ettao@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2022-03-29 14:12:45 -07:00
Edward Chen
8f456735d1
Remove unused variable. (#11043) 2022-03-29 14:11:07 -07:00
Erick Alejandro Muñoz Alvarado
6c005bfdbc
Enabled Cast operator on OneDNN EP (#11023) 2022-03-29 08:16:01 -07:00
Vincent Wang
6a6840d5c6
Fuse LayerNormalization for Apex O2 (#10233) 2022-03-29 21:22:04 +08:00
Vincent Wang
3b6cee8059
[CUDA] Optimize Conv and ConvGrad for Training (#10999)
* Optimize Conv and ConvGrad for Training

* add provider option to control

* fix typo
2022-03-29 07:31:36 +08:00
Chi Lo
8ba52b0a05
Bump master version to 1.12 (#10797)
* bump master version to 1.11

* bump master version to 1.12
2022-03-28 12:30:11 -07:00
Edward Chen
9371401746
Move node EP assignment for ORT format into SessionState::FinalizeSessionState() (#10944)
Follow up to #10904.
- Move node EP assignment for ORT format into SessionState::FinalizeSessionState().
- Add unit test for #10904.
- Make convert_onnx_models_to_ort.py optimization level configurable via environment variable.
2022-03-28 10:37:22 -07:00
Baiju Meswani
9c6cc018a9
Add utility to get the gradient graph from GradientGraphBuilder (#10995)
* Add pybind method to get the gradient graph

* Fix segmentation fault because of logging for gradien building
2022-03-25 17:13:56 -07:00
Chen Fu
dc72159105
Symmetric Quant indirect Conv kernel for ARMv8 A55 chip (#10862)
ARM a55 micro-architecture (with dot product instructions), similar to a53, is widely used as little cores in big.Little configurations. A55 has a narrower memory load/store hardware, where a 128b load instruction would block the pipeline for 2 whole cycles, during which no other instructions can be executed. On the other hand, a 64b load instruction can be duo issued with many other instructions.

This change adds a Symmetric Quant indirect Conv kernel for a55 micro-architecture, where we replace

ldr q4,[x1],

with

ldr d4,[x1],
ldr x11,[x1],
ins v4.d[1],x11

so that we can try to hide the memory load cycles behind computing cycles in the kernel.

With this new kernel, cartoongan model shows significant perf improvement on Pixel5a little cores (2 threads running on two little cores):

new kernel: 2188.59 ms
old kernel: 2360.61 ms
2022-03-25 17:10:47 -07:00
leqiao-1
8ddc45f52d
Add linux and macos arm64 java aritifacts (#10981) 2022-03-25 16:23:17 -07:00
Jack·Boos·Yu
d1be71eaa3
[cmake] Add keyword STATIC to add_library in function onnxruntime_add_static_library (#10998) 2022-03-25 16:19:36 -07:00
Chandru Ramakrishnan
cb31b7eab1
Fixed creation of ORT_Value to pass offset of 0 (#11004) 2022-03-25 15:52:10 -04:00
Scott McKay
47c09e6701
Clarify usage of kOnnxDomainAlias. (#10962)
* Clarify usage of kOnnxDomainAlias.
2022-03-25 09:52:59 +10:00
pengwa
89ef987ab1
Improve NonZero on CUDA/ROCM (#10307)
* improve NonZero

* fix megatron_fp16 optimzier, fix the doc

* multi_tensor_applier

* resolve comment

* fix building warning

* fix build error when enabling training and use tensorrt
2022-03-25 07:35:45 +08:00
mpapdiwala
1e917c879e
Adding support for saving and loading train step info properties in the state dict and checkpoint file. (#10569)
* Adding optimization step and step parameter to the ORTTrainer constructor

* Added ORTTrainerOptions for optimization step

* Adding Train Step Info Settings to State Dictionary

* Adding train step info key

* Updating comments

* Reverting changes

* Updating test case for new state dict entry train_step_info
2022-03-24 11:50:45 -07:00