Commit graph

6940 commits

Author SHA1 Message Date
Vincent Wang
04f7c2deda
FP16_Optimizer Support for more Deepspeed Versions (#12046)
* fp16_optimizer for more ds versions

* change ds version

* bugfix

* fix bug
2022-06-30 18:36:17 +08:00
Tianlei Wu
ecca6f4d16
Move beamsearch shared initializers from subgraphs to main graph (#12025)
* move shared initializers to parent graph
* add --disable_shared_initializers
2022-06-29 22:43:41 -07:00
zhijxu
9f260fb60f resolve comments 2022-06-30 11:26:13 +08:00
zhijxu
100aebbd26 resolve comments 2022-06-30 11:26:13 +08:00
zhijxu
2295b24cd5 support optimizer opt for deepspeed 0.5.9 2022-06-30 11:26:13 +08:00
George Wu
102d01b206
update roialign cuda impl to onnx opset16 (#12036)
* roialign opset16

* fix

* fix
2022-06-29 17:32:59 -07:00
Yi-Hong Lyu
c8cd36da01
Resize optimization for all architectures (#11956)
This patch optimizes Resize when the input X is a 4D int8/uint8 tensor
and the mode is linear by:

* Transforming NCHW Resize to NHWC variant
* Using the NHWC Resize kernel without floating-point computation

It improves DeepLab V3 with uint8 quantization by 19% on X64. It also improves
Resize of DeepLab V3 with int8 quantization by 15%~18% on X64.
2022-06-29 09:19:19 -07:00
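
The first step of the Resize change above (#11956), the NCHW to NHWC layout transformation, can be illustrated with a minimal pure-Python sketch. This is not the kernel from the patch: the function name `nchw_to_nhwc` and the flat-list representation are assumptions made for illustration, and the integer-only interpolation step is not shown.

```python
def nchw_to_nhwc(data, n, c, h, w):
    """Reorder a flat row-major NCHW buffer into a flat row-major NHWC buffer.

    Hypothetical helper for illustration only; it is not part of onnxruntime.
    """
    if len(data) != n * c * h * w:
        raise ValueError("buffer length does not match the given shape")
    out = [0] * len(data)
    for ni in range(n):
        for ci in range(c):
            for hi in range(h):
                for wi in range(w):
                    # Row-major offset in the NCHW source buffer.
                    src = ((ni * c + ci) * h + hi) * w + wi
                    # Row-major offset in the NHWC destination buffer.
                    dst = ((ni * h + hi) * w + wi) * c + ci
                    out[dst] = data[src]
    return out


# Two channels, one row, three columns: channel values end up interleaved.
interleaved = nchw_to_nhwc([0, 1, 2, 10, 11, 12], n=1, c=2, h=1, w=3)
```

With NHWC, all channels of one pixel sit next to each other in memory, which is what lets a linear Resize kernel process a pixel's channels in one contiguous pass.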
Chun-Wei Chen
4eb54ff9a5
Add warning about future computation change for ConvTranspose with auto_pad (#11984)
* Add warning about future computation change for ConvTranspose with auto_pad

* improve msg

* update TODO to make lint happy

* update more contents for warning and add if

* valid was not affected

* move it into kernel registration

* parse auto_pad myself

* try to use conv_transpose_attrs_.auto_pad directly
2022-06-29 06:53:31 -07:00
Valery Chernov
8ba8146650
[TVM] handshake mechanism for support of TVMso EP (#11437)
* infrastructure for handshake mechanism was implemented. sha256 was selected as first hash algorithm

* check hash during compile in TVMso EP

* add IPP-CRYPTO to external dependencies for TVM EP

* made checkHash method constant

* removed the public implementation of the SHA-256 algorithm so as not to cause a license conflict

* implemented SHA-256 calculation using ipp-crypto library

* fix dependency for ipp-crypto

* add provider options for hash check

* update documentation for added provider options

* add hash check condition

* fix docs

* fix lint

* fix ORT_THROW

Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru>
2022-06-29 14:57:18 +02:00
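
The hash check described in the TVM commit above (#11437) is implemented in C++ on top of ipp-crypto. The idea of comparing a SHA-256 digest against an expected value can be sketched with Python's standard library instead; the function names `sha256_hex` and `check_hash` are hypothetical and do not correspond to the EP's actual provider options or methods.

```python
import hashlib
import hmac


def sha256_hex(payload: bytes) -> str:
    """Return the SHA-256 digest of the payload as a lowercase hex string."""
    return hashlib.sha256(payload).hexdigest()


def check_hash(payload: bytes, expected_hex: str) -> bool:
    """Return True if the payload's SHA-256 digest matches the expected value.

    Hypothetical helper: in the real EP the payload would be the compiled
    model artifact and the expected value would come from a provider option.
    """
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(sha256_hex(payload), expected_hex.lower())


EMPTY_SHA256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
empty_ok = check_hash(b"", EMPTY_SHA256)
```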
dependabot[bot]
c0dd9be7ba
Bump electron from 13.6.6 to 15.5.5 in /js/web (#11884)
Bumps [electron](https://github.com/electron/electron) from 13.6.6 to 15.5.5.
- [Release notes](https://github.com/electron/electron/releases)
- [Changelog](https://github.com/electron/electron/blob/main/docs/breaking-changes.md)
- [Commits](https://github.com/electron/electron/compare/v13.6.6...v15.5.5)

---
updated-dependencies:
- dependency-name: electron
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-06-28 15:50:44 -07:00
Yosshi999
0702364d7a
[js/web][bugfix] fix negative axes for unsqueeze (#11944)
[js/web] fix negative axes for unsqueeze
2022-06-28 11:28:35 -07:00
Tianlei Wu
9be2b6046b
convert_beam_search supports large gpt2 model (#11989)
(1) add --run_shape_inference to make shape inference optional
(2) add --vocab_mask to make the input optional
(3) add --overwrite in gpt2 convert_to_onnx to allow overwriting an existing raw onnx model exported from PyTorch
(4) save gpt2 model tensors to one external data file by default
(5) group convert_beam_search arguments to multiple groups
(6) make --decoder_onnx optional for gpt2 model
(7) replace print by logger
(8) update shape inference function to support external data.
(9) when saving external data, show warning if onnx version < 1.12
2022-06-28 10:02:35 -07:00
sumitsays
4552dd38c6
[DML EP] Pad operator: Handle negative pad counts (#11974)
* Pad fallback to CPU

* Added queryPad in operatorRegistration.cpp

* Acknowledged PR comments

* Used any_of

* used none_of instead of any_of

Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com>
2022-06-28 00:41:57 -07:00
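
The bullets of the Pad commit above (#11974) suggest a support query that rejects negative pad counts so the node falls back to CPU. The sketch below assumes that reading; `pads_supported` and `pad_1d` are hypothetical names, and the second function only illustrates what a negative pad count means in ONNX Pad (elements are cropped rather than added).

```python
def pads_supported(pads):
    """Return True when no pad count is negative (the 'none_of' style check).

    Assumption: an accelerator path handles only non-negative pads and
    anything else is delegated to a CPU fallback.
    """
    return not any(p < 0 for p in pads)


def pad_1d(values, begin, end, fill=0):
    """Pad a 1-D list; negative counts crop elements instead of adding them."""
    out = list(values)
    out = out[-begin:] if begin < 0 else [fill] * begin + out
    out = out[:end] if end < 0 else out + [fill] * end
    return out


cropped_front = pad_1d([1, 2, 3, 4], begin=-1, end=2)
```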
RandySheriffH
d5fcb432fa
Generalize native op creation (#11539)
* create op from ep

* read input count from context

* create holder to host nodes

* fix typo

* cast type before comparison

* throw error on API fail

* silence warning from minimal build

* switch to unique_ptr with deleter to host nodes

* fix typo

* fix build err for minimal

* fix build err for minimal

* add UT for conv

* enable test on CUDA

* add comment

* fix typo

* use gsl::span and string view for Node constructor

* Added two APIs - CopyKernelInfo and ReleaseKernelInfo

* pass gsl::span by value

* switch to span<NodeArg* const> to allow for reference to const containers

* fix typo

* fix reduced build err

* fix reduced build err

* refactoring node construction logic

* rename exceptions

* add input and output count as arguments for op creation

* refactor static member

* use ORT_CATCH instead of catch

* cancel try catch

* add static value name map

* format input definition and set err code

* fix comments

* fix typo
2022-06-27 21:12:15 -07:00
Dwayne Robinson
fc0143fe68
DML EP ResNet50 opset 15 fails in ONNX checker for FusedBatchNormalization lacking training_mode attribute (#12010)
FusedBatchNormalization include training_mode attribute
2022-06-27 19:41:34 -07:00
Edward Chen
f045994389
[NNAPI EP] Update NNAPI headers (#11954)
Update the NNAPI headers to a more recent version (copied from TF Lite v2.9.1).
2022-06-27 18:54:06 -07:00
Edward Chen
466b2d9f3d
[C# Tests] Add support for double tensor output in TestPreTrainedModels. (#12008)
Add support for double tensor output in TestPreTrainedModels.
2022-06-27 18:49:19 -07:00
Sheil Kumar
7d712c8f8b
Fix WinML tests that are still targeting deprecated (deleted) experimental signal op definitions (#12006)
* fix winml tests

* remove legacy test

* switch idft -> dft+inverse attr

* upgrade opset 13->17 for signal ops tests
2022-06-27 16:35:50 -07:00
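
The "idft -> dft+inverse attr" bullet in the commit above (#12006) reflects that a single DFT operator with an inverse flag can replace a separate IDFT operator. A naive sketch of that relationship follows, assuming the common convention in which the inverse transform flips the exponent sign and scales by 1/N. The `dft` function here is illustrative, not the ORT kernel.

```python
import cmath


def dft(signal, inverse=False):
    """Naive O(N^2) discrete Fourier transform with an inverse switch."""
    n = len(signal)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t, x in enumerate(signal):
            acc += x * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        # Assumed convention: only the inverse transform is scaled by 1/N.
        out.append(acc / n if inverse else acc)
    return out


samples = [1.0, 2.0, 3.0, 4.0]
spectrum = dft(samples)
roundtrip = dft(spectrum, inverse=True)
```

Applying the forward transform and then the inverse recovers the original samples up to floating-point error.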
Yulong Wang
bd973bcf1e
[js/rn] upgrade dependencies for e2e test (#11863)
* [js/rn] upgrade dependencies for e2e test

* use JDK11 only for gradle

* expand variable
2022-06-27 14:56:49 -07:00
Dwayne Robinson
8cd02508c8
Include opset 15 in Conv+BatchNormalization fusion (#11960) 2022-06-27 10:59:14 -07:00
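
As background for the fusion named in the commit above (#11960): folding an inference-mode BatchNormalization into the preceding Conv amounts to rescaling the convolution weights and bias per output channel. The sketch below shows the standard algebra for a single channel with scalar values; the function names are hypothetical and the code is not taken from the ORT optimizer.

```python
import math


def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode batch normalization of a single value."""
    return gamma * (y - mean) / math.sqrt(var + eps) + beta


def fuse_conv_bn(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold batch-norm parameters into one channel's conv weight and bias."""
    scale = gamma / math.sqrt(var + eps)
    return weight * scale, (bias - mean) * scale + beta


# A 1x1 "convolution" on one channel is just y = w * x + b.
w, b, x = 0.5, 0.1, 2.0
gamma, beta, mean, var = 1.5, -0.2, 0.3, 4.0
unfused = batch_norm(w * x + b, gamma, beta, mean, var)
fused_w, fused_b = fuse_conv_bn(w, b, gamma, beta, mean, var)
fused = fused_w * x + fused_b
```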
dependabot[bot]
68afa2d362
Bump async from 2.6.3 to 2.6.4 in /js/react_native/e2e (#11280)
Bumps [async](https://github.com/caolan/async) from 2.6.3 to 2.6.4.
- [Release notes](https://github.com/caolan/async/releases)
- [Changelog](https://github.com/caolan/async/blob/v2.6.4/CHANGELOG.md)
- [Commits](https://github.com/caolan/async/compare/v2.6.3...v2.6.4)

---
updated-dependencies:
- dependency-name: async
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-06-27 10:30:01 -07:00
George Nash
9583841ef7
Improve performance of BiasGelu on oneDNN execution provider (#11935)
Improve performance of BiasGelu on OneDNN execution provider

This modifies how BiasGelu is handled by the OneDNN execution provider
by executing the gelu_erf primitive as a postop of the binary_add primitive.

Also fixes extra data copies made when running on GPU.

Signed-off-by: George Nash <george.nash@intel.com>
2022-06-27 08:34:11 -07:00
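
The BiasGelu commit above (#11935) runs the erf-based GELU as a post-op of the bias addition. The arithmetic being fused can be sketched in plain Python; this only shows the math, not the oneDNN primitives, and `bias_gelu` with its flat-list broadcasting is an assumption made for illustration.

```python
import math


def gelu_erf(x):
    """Exact (erf-based) GELU: 0.5 * x * (1 + erf(x / sqrt(2)))."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def bias_gelu(values, bias):
    """Add a bias along the last dimension, then apply GELU in the same pass.

    `values` is a flat row-major list whose last dimension has len(bias)
    elements, so the bias index simply wraps around.
    """
    n = len(bias)
    return [gelu_erf(v + bias[i % n]) for i, v in enumerate(values)]


result = bias_gelu([0.5, -1.0, 0.0, 2.0], [0.5, 1.0])
```

Computing the activation while the sum is still in hand avoids writing the intermediate `x + bias` tensor and reading it back.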
Scott McKay
f72288b453
Fix a couple of typos (#11943)
Fix a couple of typos
2022-06-27 10:32:14 +10:00
Gary Miguel
dc5d6b9515
register signal ops for opset 17 (#11778)
* Register signal ops for op set 17

Note code is mostly being moved, not added. These ops were previously
only registered as Microsoft contrib ops and only built if
`BUILD_MS_EXPERIMENTAL_OPS=1`. They've been added to the ai.onnx
standard op set in version 17.

Main components of this change:

* Move the kernels from the contrib_ops directory to the
  core directory.
* Add function bodies for ms experimental ops. This will allow
  old models that use the contrib ops to continue to function.
  All the function bodies consist of a single op (the
  new standard op), so performance overhead should be minimal.

Minor clean-up also in this change:

* De-duplicate get_scalar_value_from_tensor: put it in a new utils.h.
* Fix some bugs that caused compilation errors with the experimental
  ops. Tested with `build.sh --ms_experimental`
* Fix some spelling errors and lint violations.
* Replace a couple of switch statements with `MLTypeCallDispatcher`.
* Use `InlineVector` instead of `std::vector`.

Unblocks https://github.com/microsoft/onnxruntime/issues/11640
2022-06-27 10:26:55 +10:00
Hubert Lu
f4ba199bad
Optimize FastGelu with float2 and float4 vectorized kernels on ROCm (#11491)
* Using vectorized loads (float2) for fp16 to improve performance

* Fix a few warnings from cpplint

* Fix a few warnings from cpplint

* Use __float2half2_rn and fix some cpplint warnings

* Move some computations to LaunchFastGeluKernel

* Fix some Lint C++ warning

* Using vectorized loads (float4) for fp16 to improve performance

* Switch whether to optimize FastGelu with float4 vectorization

* Switch to float4 memory access based on input_length in FastGelu

* Comment how to set the threshold of float2 and float4 vectorized kernels

* Add FastGelu fp16 unit tests for bias_length = 2 and 8

* Make vectorized kernels generic with aligned_vector

* Unify the vectorized kernels with/without bias

* Refactor the code to suppress cpplint warnings

* Solve formatting issues

* Remove cudaDeviceProp from FastGeluKernel and LaunchFastGeluKernel

* Move fast_gelu_impl.h to rocm/bert

* Fix some Lint C++ warnings and code alignment
2022-06-24 12:46:17 -07:00
Dmitri Smirnov
088bc7494b
Deprecate APIs returning raw ptrs and provide replacements (#11922)
Provide better documentation
2022-06-24 09:50:04 -07:00
G. Ramalingam
b1411c8357
Restructure function inliner (#11731)
* Add nested function call tests

* Add overload for Specialize

* Pass symboltable to onnx shape inference

* Avoid renaming empty names

* Enable sequence_map tests which failed before this change
2022-06-24 09:21:31 -07:00
pengwa
0d6cbc6e57
fix memory profile for partial graph run (#11911)
* fix mpi build for gcc8 or higher

* fix memory profile for partial graph run

* Revert "fix mpi build for gcc8 or higher"

This reverts commit fb60beb05402cd380597a12fc25880c0c8652ed4.

* remove debug code

* fix build

* fix build

* fix cpplint and python black format
2022-06-24 13:08:14 +08:00
Wil Brady
fa7f80c847
Eager mode: Argmax and fixup max and min. (#11861)
* Eager mode ArgMax support.

* Fix basic max and min functionality with minor generator update. Note this does not address all max and min api scope.

* Add addmm test.
2022-06-23 15:55:34 -04:00
Tianlei Wu
2c4e4b6afc
MT5 onnx conversion for beam search (#11958)
* support mt5
* save external data to one file
* update default value of --model_name_or_path and --decoder_onnx
2022-06-23 10:23:28 -07:00
Dmitri Smirnov
607b7df060
Allow saving on CPU usage for infrequent inference requests by reducing thread spinning (#11841)
Introduce Start/Stop threadpool spinning switch
Add a session config option to force spinning stop at the end of the Run()
2022-06-23 10:04:37 -07:00
pengwa
c398ad513f
Fix orttraining-linux-ci-pipeline - Symbolic shape infer (#11965)
fix symbolic shape error due to upgraded numpy + legacy sympy
2022-06-23 08:23:36 -07:00
Ye Wang
e24349b8f2
Optimize t5 encoder in beam search (#11926)
* optimize t5 encoder

* update

* update

* update

* refactor expand impl

* cuda tests passed

* update

* alignment

* more alignments

* review comments
2022-06-22 12:45:02 -07:00
Dwayne Robinson
f6d2fe8311
MeanVarianceNormalization CPU EP axes attribute validation (#11925)
Validate axes attribute parameter properly rather than silently returning incorrect results
2022-06-22 12:03:13 -07:00
Preetha Veeramalai
f54476a42f
Dll version fix ovep4.1 (#11953)
* Setting default version values for ovep dlls as well

* Update backend_manager.cc

Co-authored-by: mayavijx <mayax.vijayan@intel.com>
Co-authored-by: mohsin <mohsinx.mohammad@intel.com>
2022-06-22 11:09:36 -07:00
pengwa
2229c48547
fix mpi in training build (#11855)
fix mpi build for gcc8 or higher
2022-06-22 10:04:44 +08:00
Vincent Wang
03beed0ceb
Remove Cast before and after Gelu (#11885)
* fuse cast gelu

* use PropagateCastOps

* fix ut
2022-06-22 09:07:48 +08:00
Gary Miguel
4bf22e2a40
Update ONNX to 1.12 (#11924)
Follow-ups that need to happen after this and before the next ORT release:
* Support SequenceMap with https://github.com/microsoft/onnxruntime/pull/11731
* Support signal ops with https://github.com/microsoft/onnxruntime/pull/11778

Follow-ups that need to happen after this but don't necessarily need to happen before the release:
* Implement LayerNormalization kernel for opset version 17: https://github.com/microsoft/onnxruntime/issues/11916

Fixes #11640
2022-06-21 17:19:52 -07:00
Dwayne Robinson
64f95d400a
Update DML 1.9 Nuget package to fix WindowsAI nuget pipeline build issue (#11934) 2022-06-21 15:55:51 -07:00
Scott McKay
3b1224dc08
Add .net6 support to the C# nuget package. (#11908)
* Add .net6 support to the C# nuget package.

Currently requires jumping through a lot of hoops due to .net 6 only being supported in the preview release of VS 2022.

Build existing targets using msbuild.
Add .net6 targets and build using dotnet.
Create nuget package with combined targets.

A few misc automated changes from VS to spacing and adding a couple of properties.
2022-06-22 08:08:24 +10:00
Arseny
8c8a781cdb
fix: handle setBindingDimensions return value in TensorRT EP (#11929) 2022-06-21 14:30:27 -07:00
Edward Chen
5646410f65
Enable Pad test cases with initializer inputs only when building NNAPI EP on Android. (#11932) 2022-06-21 14:16:55 -07:00
sfatimar
61a74f2f4d
Mohsin/enable dynamic shapes (#11867)
* Add pypi build changes to latest Master

* Add ORT training part of OV build

* Disabling SqueezeOpTest.BadAxes

* Add ONNXruntime branch ARG to Docker build

* Changes to include file details versions

* Commit File Version Updates

* Change naming for linux build

* Add fix for pylint format errors

* Fix pylint warnings.

* Enable Dynamic Shapes for OV_API_20

* Update requirements.txt whl version- internal_ci fix

* Update backend_manager.cc MYRIAD Fix

* Update wheel version in requirements.txt

* Update backend_manager.cc

* Update backend_manager.cc

* Update backend_manager.cc

* Update setup.py

* Fix pylint warnings

* Fix pylint warnings 2

* Update backend_manager.cc

* Update backend_manager.cc

* Update backend_manager.cc

* Update backend_manager.cc

* Update backend_manager.cc

* Update backend_manager.cc

* Update backend_manager.cc

* Update backend_manager.cc

Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>
Co-authored-by: mayavijx <mayax.vijayan@intel.com>
Co-authored-by: Sahar Fatima <sfatima.3001@gmail.com>
Co-authored-by: mohsinmx <mohsinx.mohammad@intel.com>
2022-06-21 08:03:58 -07:00
Adrian Lizarraga
b20daeda81
Update Linux Multi GPU TensorRT pipeline to TensorRT 8.4 (#11923)
* Try manually installing trt8.4 in multi-gpu pipeline

* Remove stmts that clean up cmake, ctest. Update tensorrt repository name passed to get_docker_image.py

* Update trt and cudnn home

* Don't install trtexec cli tool.

* Increase job timeout

* Revert timeout change and use trt placeholder builder build option
2022-06-21 07:59:11 -07:00
Ye Wang
859ef277a0
apply zcode changes to the beam search op (#11880)
* apply zcode changes to the beam search op

* fix pipeline failure

* add doc

* workaround for C#

* update

* update

* use name zcode

* review comment

* review comments

* fix cpplint

* review comments
2022-06-20 18:39:07 -07:00
RandySheriffH
cefceff5c9
Mark the end of APIs for release 1.12 (#11914)
* mark the end of APIs for 1.12

* add static assert for C API 1.12
2022-06-20 15:22:55 -07:00
Adrian Lizarraga
ca35ea417a
[EP-Perf] Install new wheel>=0.35.1 dependency (#11917) 2022-06-20 15:09:27 -07:00
Yi Zhang
7f1e9e8c67
Bash: there should be a whitespace after not operator. (#11910)
add whitespace after not
2022-06-21 05:14:32 +08:00
Chi Lo
457ce6cb89
Make symbolic shape inference script support external weight (#11909)
* add support for external data

* fix format

* fix format

* fix typo

* fix typo
2022-06-20 13:07:45 -07:00
Dwayne Robinson
c1577d08ca
DML EP QuantizeLinear defer axis validation for test_quantizelinear_cpu (#11906)
DML EP QuantizeLinear defer axis validation
2022-06-20 11:03:32 -07:00