Commit graph

6967 commits

Author SHA1 Message Date
Hubert Lu
dbcf54aa41
Add hipified SkipLayerNorm code for ROCmEP (#12107)
* First attempt for half2 vectorized memory access in SkipLayerNorm

* Add some functions for debugging

* Clean up the code

* Clean up the code

* Generalize the vectorized kernels with aligned_vector and remove cudaDeviceProp

* Add a unit test for a larger input size

* Fix some Lint C++ warnings

* Use ILP = 4 for the vectorized kernels

* Rewrite the vectorized kernel and templatize ComputeSkipLayerNorm

* Use conditional operator for input_v

* Refactor LaunchSkipLayerNormKernel and replace the original SkipLayerNormKernelSmall with the vectorized kernel

* Clean some comments and rename the layernorm function

* Use ComputeSkipLayerNorm to replace LaunchSkipLayerNormKernel

* Resolve a Lint C++ warning

* Fix SkipLayerNormBatch1_Float16_vec output data

* Add hipified code of bert SkipLayerNorm for ROCmEP

* Resolve some Lint C++ warnings

* Resolve some Lint C++ warnings

* Resolve some Lint C++ warnings

* Resolve Python formatting issue
2022-07-06 22:13:11 -07:00
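For reference, SkipLayerNorm fuses a residual add with layer normalization. The pure-Python sketch below shows the per-row math the CUDA/HIP kernels compute, assuming the usual contrib-op form y = gamma * (s - mean) / sqrt(var + epsilon) + beta with s = input + skip (+ bias). The function name and the default epsilon are illustrative, not taken from the ORT sources.

```python
import math

def skip_layer_norm(x, skip, gamma, beta, bias=None, epsilon=1e-12):
    """Reference SkipLayerNorm over one row (the hidden dimension).

    s = x + skip (+ bias, if given); y = gamma * (s - mean) / sqrt(var + eps) + beta
    """
    n = len(x)
    s = [x[i] + skip[i] + (bias[i] if bias is not None else 0.0) for i in range(n)]
    mean = sum(s) / n
    var = sum((v - mean) ** 2 for v in s) / n  # population variance, as in LayerNorm
    inv_std = 1.0 / math.sqrt(var + epsilon)
    return [gamma[i] * (s[i] - mean) * inv_std + beta[i] for i in range(n)]
```

A kernel test for the vectorized or hipified path can compare its output row by row against a reference like this within a float16 tolerance.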
Yufeng Li
97b03fedff
check consumers of dq node before swap dq and transpose (#12099)
* check consumers of dq node before swap dq and transpose

* add unit test
2022-07-06 11:11:38 -07:00
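The commit message only says that the consumers of the DequantizeLinear (DQ) node are checked before the DQ and Transpose are swapped. One plausible reading, sketched below as an assumption, is that the swap is only safe when the Transpose is the sole consumer of the DQ output (and that output is not a graph output); otherwise other consumers would see different data after the rewrite. The `Node` class and `can_swap_dq_transpose` are hypothetical, not ORT's optimizer API.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    op_type: str
    inputs: list
    outputs: list

def consumers(nodes, tensor_name):
    """Return all nodes that read tensor_name."""
    return [n for n in nodes if tensor_name in n.inputs]

def can_swap_dq_transpose(nodes, dq, graph_outputs=()):
    """Hypothetical guard: DQ -> Transpose may become Transpose -> DQ only if
    the Transpose is the single consumer of the DQ output."""
    out = dq.outputs[0]
    if out in graph_outputs:
        return False  # the graph output itself would change
    users = consumers(nodes, out)
    return len(users) == 1 and users[0].op_type == "Transpose"
```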
ytaous
446f899fed
[ROCm] Temp disable AMD UT (#12105)
temp disable UT

Co-authored-by: Ethan Tao <ettao@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2022-07-06 11:08:26 -07:00
Edward Chen
bd76e21fb3
Add pipeline for building perf test binaries. (#12067)
Add initial pipeline for building perf test binaries. It only builds Android binaries now but can be expanded later.
2022-07-06 09:42:49 -07:00
Wil Brady
1948b7c726
Add eager support for eq and ne ops. (#12031)
* Add eager support for aten::eq and aten::ne.

* Add generator support for resizing output param.
2022-07-06 12:39:04 -04:00
Edward Chen
07b0469a23
Fix unused function warning for decodeMIDR(). (#12069)
Changed from a static function defined in a header to a function declared in the header and defined in a separate .cc file.
2022-07-06 09:18:01 -07:00
Hubert Lu
835ecb264d
Leverage vectorized load/write for SkipLayerNorm (#11803)
* First attempt for half2 vectorized memory access in SkipLayerNorm

* Add some functions for debugging

* Clean up the code

* Clean up the code

* Generalize the vectorized kernels with aligned_vector and remove cudaDeviceProp

* Add a unit test for a larger input size

* Fix some Lint C++ warnings

* Use ILP = 4 for the vectorized kernels

* Rewrite the vectorized kernel and templatize ComputeSkipLayerNorm

* Use conditional operator for input_v

* Refactor LaunchSkipLayerNormKernel and replace the original SkipLayerNormKernelSmall with the vectorized kernel

* Clean some comments and rename the layernorm function

* Use ComputeSkipLayerNorm to replace LaunchSkipLayerNormKernel

* Resolve a Lint C++ warning

* Fix SkipLayerNormBatch1_Float16_vec output data
2022-07-05 22:28:15 -07:00
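The vectorized kernels load and store ILP = 4 contiguous elements at a time through an aligned vector type. The Python sketch below only mimics that indexing scheme, assuming each thread strides over the hidden dimension in ILP-wide chunks and that the vectorized path is taken only when the hidden size is a multiple of ILP; the real kernel's dispatch condition may differ, and all names here are hypothetical.

```python
ILP = 4  # elements per vectorized load/store, as in the commit message

def chunk_offsets(hidden_size, num_threads, thread_id, ilp=ILP):
    """Start offsets of the ILP-wide chunks handled by one thread."""
    return list(range(thread_id * ilp, hidden_size, num_threads * ilp))

def use_vectorized_path(hidden_size, ilp=ILP):
    """Whole chunks only; otherwise a scalar fallback would be needed."""
    return hidden_size % ilp == 0

def row_sum_vectorized(row, num_threads, ilp=ILP):
    """Each simulated thread accumulates its chunks, then partials are reduced."""
    if not use_vectorized_path(len(row), ilp):
        raise ValueError("hidden size is not a multiple of ILP")
    partials = []
    for tid in range(num_threads):
        acc = 0.0
        for start in chunk_offsets(len(row), num_threads, tid, ilp):
            acc += sum(row[start:start + ilp])  # one "vector load" of ilp elements
        partials.append(acc)
    return sum(partials)
```

Striding by `num_threads * ilp` keeps neighbouring threads on neighbouring chunks, which is what lets the hardware coalesce the wide loads.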
ytaous
7b8f45dd60
[ROCm] Enable build option for autograd (#11945)
* add autograd build option

* disable UTs

* disable UTs

* UT-step1

* UT-step1

* UT-step2

* UT-step2

* UT-step2

* UT-step2

* UT-step2

* UT-step2

* Fix UTs

* increase shm

* code clean up

Co-authored-by: Ethan Tao <ettao@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2022-07-05 18:11:29 -07:00
Dwayne Robinson
32a8751dc4
DML EP Update to DML 1.9 (#12090)
* Update to DML 1.9

* Appease obnoxious Python formatting tool
2022-07-05 16:30:54 -07:00
Yufeng Li
3446a3750c
generate quantization parameter for outputs (#12089) 2022-07-05 14:57:43 -07:00
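The commit title does not say how the output parameters are derived. For context, a common way to compute asymmetric uint8 scale and zero point from an observed [min, max] range is sketched below; this is the generic textbook formula, not necessarily what the quantization tool does for outputs.

```python
def compute_uint8_qparams(rmin, rmax, qmin=0, qmax=255):
    """Asymmetric uint8 quantization parameters.

    The range is widened to include 0 so that real zero is exactly representable.
    """
    rmin = min(rmin, 0.0)
    rmax = max(rmax, 0.0)
    if rmax == rmin:  # degenerate all-zero range
        return 1.0, qmin
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = round(qmin - rmin / scale)
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point

def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Quantize one real value with saturation."""
    return max(qmin, min(qmax, round(x / scale) + zero_point))
```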
Wenbing Li
479e71a7a8
enable the extensions custom build for java and android (#11823) 2022-07-05 10:34:14 -07:00
Scott McKay
c20cbf0c97
Add undocumented attribute to disable generation of Java bindings from the Android AAR. (#12075)
The generated bindings cause C# build errors that require workaround code. Disabling generation should avoid the need for any workarounds.

As the user has the C# ORT package with the C# to C bindings, there's no need for generated bindings that call the ORT Java API (which would be C# -> Java -> C).
2022-07-05 10:29:32 -07:00
zhangyaobit
ddb6202df7
Add op tuning functionality and an example for vector add. (#12060)
* Add op tuning functionality and example for vector add.

* Add namespace.

* Various improvements.

* use unique pointer

* fix lint errors

* Check return error.
2022-07-03 21:12:04 -07:00
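Op tuning, in general, means running several candidate implementations of the same op on representative inputs and keeping the fastest correct one. The Python sketch below applies that idea to vector add; the candidate functions and the `tune` helper are hypothetical stand-ins, not the API added in this commit.

```python
import operator
import time

def add_loop(a, b):
    """Candidate 1: explicit index loop."""
    out = [0.0] * len(a)
    for i in range(len(a)):
        out[i] = a[i] + b[i]
    return out

def add_zip(a, b):
    """Candidate 2: list comprehension over zip."""
    return [x + y for x, y in zip(a, b)]

def add_map(a, b):
    """Candidate 3: map with operator.add."""
    return list(map(operator.add, a, b))

def tune(candidates, args, repeats=5):
    """Return (fastest correct candidate, {name: best time in seconds})."""
    reference = candidates[0](*args)
    timings = {}
    for fn in candidates:
        if fn(*args) != reference:  # skip candidates that disagree with the reference
            continue
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            fn(*args)
            best = min(best, time.perf_counter() - start)
        timings[fn] = best
    fastest = min(timings, key=timings.get)
    return fastest, {fn.__name__: t for fn, t in timings.items()}
```

In a real execution provider the chosen candidate would typically be cached per input shape, so the timing cost is paid once.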
Hariharan Seshadri
df712d80ca
Add data type check in ConvAddRelu fusion (#12058) 2022-07-01 15:31:15 -07:00
Justin Stoecker
57ac3d0a61
Disable DML command list reuse for Xbox (#12063)
disable cl reuse for xbox
2022-07-01 13:22:35 -07:00
Jameson Miller
ae88f43550
Eager mode: structure for supporting out= operators (#12066)
* Add utility methods for resize_output

* Eager mode: implement abs.out

This is an initial hand-written implementation of an out= operator to
demonstrate how to structure out= methods using resize_out helper
methods.

This is meant to be used as a reference when we update the code
generator to generate implementations for out= operations.
2022-07-01 13:35:12 -04:00
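An out= operator writes into a caller-supplied tensor, so it first resizes that tensor to the result shape. The sketch below models that pattern for abs.out with a toy tensor type, assuming PyTorch-like `resize_output` semantics as we understand them (resize only on shape mismatch, warning if a non-empty tensor had to be resized). `ToyTensor`, `resize_output`, and `abs_out` are illustrative, not ORT's eager-mode code.

```python
import warnings

def numel(shape):
    n = 1
    for d in shape:
        n *= d
    return n

class ToyTensor:
    """Minimal stand-in for a tensor: a shape plus a flat list of values."""
    def __init__(self, shape, data=None):
        self.shape = tuple(shape)
        self.data = list(data) if data is not None else [0.0] * numel(self.shape)

def resize_output(out, shape):
    """Resize `out` to `shape` if needed. Returns True if a resize happened."""
    shape = tuple(shape)
    if out.shape == shape:
        return False
    if numel(out.shape) != 0:
        warnings.warn(f"output with shape {out.shape} was resized to {shape}")
    out.shape = shape
    out.data = [0.0] * numel(shape)  # contents are overwritten by the op anyway
    return True

def abs_out(x, out):
    """abs.out: write |x| into `out`, resizing it first if necessary."""
    resize_output(out, x.shape)
    out.data = [abs(v) for v in x.data]
    return out
```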
Ye Wang
8dc8d44087
remove --disable_iobinding for trt ep benchmark (#12053)
Update run_benchmark.sh
2022-07-01 10:33:35 -07:00
Gary Miguel
043816f895
Make C# runtest.sh automatically set latest opset (#12039)
* Update C# runtest.sh for opset 17

Should have been part of https://github.com/microsoft/onnxruntime/pull/11924

* get appropriate opset version from onnx doc

* use absolute rather than relative path

* fix typo in var name
2022-07-01 10:12:33 -07:00
Jeff Bloomfield
02b9b12127
Fix DML custom operators which set descriptor heap to command list (#12059) 2022-07-01 09:49:23 -07:00
Scott McKay
bfe1eca10c
Add targets files for new .net6 frameworks (#12016)
* Add net6 targets.
Remove maccatalyst as we don't have a native build targeting that.

* Set platform in macos targets

* Add targetFramework entries

* Move NativeLib.DllName definition and set it using preprocessor values for simplicity. Couldn't get it to build with the preprocessor-based setup when it was in a separate file.

Update the nuspec generation to set the platform version for .net6 targets. TODO: Validate versions. I copied them from the managed nuget package the packaging pipeline generated prior to adding targets. Possibly we could/should lower some of the versions.

Hopefully the need to specify a version goes away when the release version of VS2022 supports .net6.

* Try android 31.1 as https://github.com/actions/virtual-environments/blob/main/images/win/Windows2022-Readme.md suggests that should be available on the CI machines

* Fix patch version mismatch
Add some extra debug info in case it helps

* Debug nuget location in CI

* Add workspace entry back in

* Add steps

* One more attempt with hardcoded nuget.exe path and original android31.0 version

* Better fix - found explicit nuget download and updated version there.

* flake8 fixes

* Fix black complaints.

* Exit Microsoft_ML_OnnxRuntime_CheckPrerequisites for net6 iOS.

* Removed outdated comment
2022-07-01 09:13:55 -07:00
Jameson Miller
3e6b8d159a
Eager mode: implement resize_ operation (#12004)
Add support for PyTorch `resize_` operation. The PyTorch API method is documented
here:

https://pytorch.org/docs/stable/generated/torch.Tensor.resize_.html

Implementation notes:

There are some implementation details that might deviate from
expectations:

  - As onnxruntime::tensor does not support a resize operation, this
    functionality is supported on the TensorImpl by swapping out the
    backing tensor when the size changes.

  - In the ORT model, the shape of the TensorImpl is defined by the
    backing onnxruntime::tensor, so a TensorImpl cannot have a
    different shape / size than its backing onnxruntime::tensor. This
    means that when resizing to a smaller TensorImpl, where other
    implementations might keep the same backing storage, ORT will
    re-allocate a new onnxruntime::tensor and copy over as many of the
    existing elements as fit. Functionally, you end up with the same
    output, but the underlying buffer is re-allocated.

    A future change could be to allow ORTTensorImpl to have a different
    size / shape than the onnxruntime::tensor backing it, and then we
    could improve this behavior.

 The canonical CPU / CUDA implementations in the PyTorch repository:
     CPU: aten/src/ATen/native/Resize.cpp
     CUDA: aten/src/ATen/native/cuda/Resize.cpp
2022-06-30 22:14:37 -04:00
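The resize behavior described above (a TensorImpl whose shape always equals that of its backing buffer, so every size change re-allocates and copies as many existing elements as fit) can be modeled in a few lines. `OrtBackedTensor` is a hypothetical toy class; new trailing elements are zero-filled here for determinism, whereas PyTorch leaves them uninitialized.

```python
def numel(shape):
    n = 1
    for d in shape:
        n *= d
    return n

class OrtBackedTensor:
    """Toy TensorImpl whose shape is always that of its backing buffer."""
    def __init__(self, shape, data):
        self.shape = tuple(shape)
        self.data = list(data)
        if len(self.data) != numel(self.shape):
            raise ValueError("data length does not match shape")

    def resize_(self, shape):
        shape = tuple(shape)
        if shape == self.shape:
            return self  # same shape: keep the existing buffer
        new_data = [0.0] * numel(shape)  # always a fresh backing buffer
        keep = min(len(self.data), len(new_data))
        new_data[:keep] = self.data[:keep]  # copy as many elements as fit
        self.data = new_data
        self.shape = shape
        return self
```

Shrinking therefore costs an allocation and a copy here, which is the trade-off the commit message calls out.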
RandySheriffH
b858c2f725
Extend lifetime of KernelDef when creating a standalone op (#12057)
place tmp kernel def as local variable to cover the lifetime of kernel creation
2022-06-30 17:38:59 -07:00
Hariharan Seshadri
2e27a7e330
Skip Constant Folding for ops producing an optional type output (#11839) 2022-06-30 13:38:35 -07:00
Wil Brady
0fa2041f68
Add eager support for aten::equal. (#12020) 2022-06-30 15:46:14 -04:00
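aten::eq / aten::ne are elementwise comparisons that return a boolean tensor, while aten::equal returns a single bool that is true only when sizes and all elements match. The sketch below contrasts them on 1-D Python lists; real ops work on N-D tensors with full broadcasting, which is omitted here.

```python
def eq(a, b):
    """Elementwise equality; b may be a same-length list or a scalar."""
    if not isinstance(b, list):
        b = [b] * len(a)
    if len(a) != len(b):
        raise ValueError("lengths must match (broadcasting omitted in this sketch)")
    return [x == y for x, y in zip(a, b)]

def ne(a, b):
    """Elementwise inequality, the negation of eq."""
    return [not v for v in eq(a, b)]

def equal(a, b):
    """Single bool: same size and every element equal."""
    return len(a) == len(b) and all(x == y for x, y in zip(a, b))
```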
Wei-Sheng Chin
0ee0b8cf18
Disable sequence-type tests since C# infra doesn't support them well (#12037) 2022-06-30 09:49:03 -07:00
zhangyaobit
da133ad3d8
Add FastGelu to kernel explorer for profiling. (#11995)
* Add FastGelu to kernel explorer for profiling.

* fix python lint errors

* Fix one more python lint error

* Delete white space (python lint)

* Various improvements.

* Update README.md

* refactor header files
2022-06-30 07:35:43 -07:00
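FastGelu is commonly defined as the tanh approximation of GELU, 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3))). The sketch below compares it with the exact erf-based GELU, which can serve as a numerical reference when checking profiled kernels; it is not the kernel explorer's code.

```python
import math

def gelu_exact(x):
    """GELU using the Gaussian CDF via erf."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def fast_gelu(x):
    """tanh approximation of GELU."""
    k = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + math.tanh(k * (x + 0.044715 * x ** 3)))

# Largest absolute difference over x in [-6, 6] with step 0.1.
max_err = max(abs(fast_gelu(i / 10) - gelu_exact(i / 10)) for i in range(-60, 61))
```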
Wil Brady
fdf12a5c35
Fix windows eager build break by pinning to torch version 1.11.0 (#12033)
Pin the Windows and Linux eager builds to torch 1.11.0.
2022-06-30 07:01:13 -04:00
Vincent Wang
04f7c2deda
FP16_Optimizer Support for more Deepspeed Versions (#12046)
* fp16_optimizer for more ds versions

* change ds version

* bugfix

* fix bug
2022-06-30 18:36:17 +08:00
Tianlei Wu
ecca6f4d16
Move beamsearch shared initializers from subgraphs to main graph (#12025)
* move shared initializers to parent graph
* add --disable_shared_initializers
2022-06-29 22:43:41 -07:00
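The idea of moving shared initializers can be sketched as follows, assuming an initializer counts as "shared" when the same name with identical contents appears in every subgraph (the commit does not spell out the matching rule). Graphs are modeled as dicts from initializer name to a tuple of values; after the move, the subgraphs would reference the value from the enclosing scope. The function is hypothetical.

```python
def move_shared_initializers(main_graph, subgraphs):
    """Move initializers identical across all subgraphs into main_graph.

    Each graph is a dict: initializer name -> tuple of values.
    Returns the sorted list of moved names.
    """
    if not subgraphs:
        return []
    common = set(subgraphs[0])
    for g in subgraphs[1:]:
        common &= set(g)
    moved = []
    for name in sorted(common):
        value = subgraphs[0][name]
        if name in main_graph:
            continue  # avoid clobbering an existing main-graph initializer
        if all(g[name] == value for g in subgraphs):
            main_graph[name] = value
            for g in subgraphs:
                del g[name]  # subgraphs now use the outer-scope copy
            moved.append(name)
    return moved
```

Storing one copy instead of one per subgraph is what saves model size for large GPT-2 / T5 style beam search models.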
zhijxu
9f260fb60f resolve comments 2022-06-30 11:26:13 +08:00
zhijxu
100aebbd26 resolve comments 2022-06-30 11:26:13 +08:00
zhijxu
2295b24cd5 support optimizer opt for deepspeed 0.5.9 2022-06-30 11:26:13 +08:00
George Wu
102d01b206
update roialign cuda impl to onnx opset16 (#12036)
* roialign opset16

* fix

* fix
2022-06-29 17:32:59 -07:00
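Opset 16 of ONNX RoiAlign added the `coordinate_transformation_mode` attribute: `half_pixel` (the new default) shifts the scaled ROI coordinates by -0.5, while `output_half_pixel` omits the shift to reproduce the earlier behavior. The sketch below shows only that coordinate mapping; `roi_start_end` is a hypothetical helper, not the CUDA kernel.

```python
def roi_start_end(roi, spatial_scale, mode="half_pixel"):
    """Map an ROI (x1, y1, x2, y2) from image coordinates to feature-map coordinates."""
    if mode not in ("half_pixel", "output_half_pixel"):
        raise ValueError(f"unsupported coordinate_transformation_mode: {mode}")
    offset = 0.5 if mode == "half_pixel" else 0.0
    x1, y1, x2, y2 = roi
    return (x1 * spatial_scale - offset, y1 * spatial_scale - offset,
            x2 * spatial_scale - offset, y2 * spatial_scale - offset)
```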
Yi-Hong Lyu
c8cd36da01
Resize optimization for all architectures (#11956)
With this patch, Resize is optimized when the input X is a 4D int8/uint8 tensor
and the mode is linear, by:

* Transforming NCHW Resize to NHWC variant
* Using the NHWC Resize kernel without floating-point computation

It improves DeepLab V3 with uint8 quantization by 19% on X64. It also improves
Resize of DeepLab V3 with int8 quantization by 15%~18% on X64.
2022-06-29 09:19:19 -07:00
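The commit does not describe the exact integer scheme. A common approach, assumed here, is to precompute the interpolation weights once as fixed-point integers (10 fractional bits in this sketch) so that the per-pixel work on uint8 data is only integer multiplies, adds, and a rounding shift. In NHWC layout the same weights apply to every channel of a pixel, which is part of what makes that layout convenient. The sketch covers one row along the width axis; all names and the precision are hypothetical.

```python
FRAC_BITS = 10          # hypothetical fixed-point precision
ONE = 1 << FRAC_BITS

def linear_weights(out_size, in_size):
    """Per output index: (left, right, fixed-point weight of right).

    Computed once per shape (half_pixel-style mapping); the per-pixel
    loop below then uses integers only.
    """
    scale = in_size / out_size
    table = []
    for o in range(out_size):
        x = min(max((o + 0.5) * scale - 0.5, 0.0), in_size - 1)
        left = int(x)
        right = min(left + 1, in_size - 1)
        table.append((left, right, round((x - left) * ONE)))
    return table

def resize_row_nhwc(row, out_width):
    """row: list of pixels, each a list of uint8 channel values."""
    out = []
    for left, right, w in linear_weights(out_width, len(row)):
        a, b = row[left], row[right]
        # Same integer weights for every channel; + ONE/2 rounds to nearest.
        out.append([(av * (ONE - w) + bv * w + (ONE >> 1)) >> FRAC_BITS
                    for av, bv in zip(a, b)])
    return out
```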
Chun-Wei Chen
4eb54ff9a5
Add warning about future computation change for ConvTranspose with auto_pad (#11984)
* Add warning about future computation change for ConvTranspose with auto_pad

* improve msg

* update TODO to make lint happy

* update more contents for warning and add if

* VALID was not affected

* move it into kernel registration

* parse auto_pad myself

* try to use conv_transpose_attrs_.auto_pad directly
2022-06-29 06:53:31 -07:00
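The warning concerns how the total padding on each spatial axis is split between begin and end when auto_pad is SAME_UPPER or SAME_LOWER; VALID uses no padding, which is why it is unaffected. The sketch follows the formula in the ONNX ConvTranspose specification as we read it: with an odd total, SAME_UPPER puts the extra pad at the end and SAME_LOWER at the beginning. The commit does not state which split ORT used before the change; the function name is hypothetical.

```python
def convtranspose_same_pads(input_size, kernel, stride=1, dilation=1,
                            output_padding=0, auto_pad="SAME_UPPER"):
    """Output size and (pad_begin, pad_end) for one axis with auto_pad SAME_*."""
    output_size = input_size * stride
    total = (stride * (input_size - 1) + output_padding
             + ((kernel - 1) * dilation + 1) - output_size)
    if auto_pad == "SAME_UPPER":
        begin, end = total // 2, total - total // 2
    elif auto_pad == "SAME_LOWER":
        begin, end = total - total // 2, total // 2
    else:
        raise ValueError("only SAME_UPPER / SAME_LOWER derive pads automatically")
    return output_size, (begin, end)
```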
Valery Chernov
8ba8146650
[TVM] handshake mechanism for support of TVMso EP (#11437)
* implement infrastructure for the handshake mechanism; SHA-256 was selected as the first hash algorithm

* check hash during compile in TVMso EP

* add IPP-CRYPTO to external dependencies for TVM EP

* made checkHash method constant

* removed the public implementation of the SHA-256 algorithm so as not to cause a license conflict

* implemented SHA-256 calculation using ipp-crypto library

* fix dependency for ipp-crypto

* add provider options for hash check

* update documentation for added provider options

* add hash check condition

* fix docs

* fix lint

* fix ORT_THROW

Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru>
2022-06-29 14:57:18 +02:00
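The handshake amounts to hashing the compiled artifact and comparing the digest with an expected value supplied through provider options. The EP does this in C++ with ipp-crypto; the Python sketch below shows the same SHA-256 check with `hashlib`. The helper names are hypothetical, and the real provider option names are documented in the TVM EP docs rather than here.

```python
import hashlib
import hmac

def sha256_hex(data):
    """Hex SHA-256 digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()

def sha256_of_file(path, chunk_size=1 << 20):
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def hashes_match(actual_hex, expected_hex):
    """Case-insensitive, constant-time comparison of two hex digests."""
    return hmac.compare_digest(actual_hex.lower(), expected_hex.lower())
```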
dependabot[bot]
c0dd9be7ba
Bump electron from 13.6.6 to 15.5.5 in /js/web (#11884)
Bumps [electron](https://github.com/electron/electron) from 13.6.6 to 15.5.5.
- [Release notes](https://github.com/electron/electron/releases)
- [Changelog](https://github.com/electron/electron/blob/main/docs/breaking-changes.md)
- [Commits](https://github.com/electron/electron/compare/v13.6.6...v15.5.5)

---
updated-dependencies:
- dependency-name: electron
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-06-28 15:50:44 -07:00
Yosshi999
0702364d7a
[js/web][bugfix] fix negative axes for unsqueeze (#11944)
[js/web] fix negative axes for unsqueeze
2022-06-28 11:28:35 -07:00
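For ONNX Unsqueeze, negative axes are counted from the end of the output rank, rank(input) + len(axes), not the input rank; normalizing against the input rank puts the new dimension in the wrong place. A pure-Python sketch of the output-shape computation follows (it is not the js/web code):

```python
def unsqueeze_shape(shape, axes):
    """Output shape of ONNX Unsqueeze with possibly negative axes."""
    out_rank = len(shape) + len(axes)
    norm = []
    for a in axes:
        if not -out_rank <= a < out_rank:
            raise ValueError(f"axis {a} out of range for output rank {out_rank}")
        norm.append(a + out_rank if a < 0 else a)  # normalize against OUTPUT rank
    if len(set(norm)) != len(norm):
        raise ValueError("duplicate axes")
    dims = iter(shape)
    return [1 if i in norm else next(dims) for i in range(out_rank)]
```

For example, `unsqueeze_shape([3, 4], [-1])` gives `[3, 4, 1]`, whereas normalizing -1 against the input rank would wrongly give `[3, 1, 4]`.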
Tianlei Wu
9be2b6046b
convert_beam_search supports large gpt2 model (#11989)
(1) add --run_shape_inference to make shape inference optional
(2) add --vocab_mask to make the input optional
(3) add --overwrite in gpt2 convert_to_onnx to allow overwriting an existing raw ONNX model exported from PyTorch
(4) save gpt2 model tensors to one external data file by default
(5) group convert_beam_search arguments into multiple groups
(6) make --decoder_onnx optional for gpt2 model
(7) replace print with logger
(8) update shape inference function to support external data
(9) when saving external data, show a warning if onnx version < 1.12
2022-06-28 10:02:35 -07:00
sumitsays
4552dd38c6
[DML EP] Pad operator: Handle negative pad counts (#11974)
* Pad fallback to CPU

* Added queryPad in operatorRegistration.cpp

* Acknowledged PR comments

* Used any_of

* used none_of instead of any_of

Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com>
2022-06-28 00:41:57 -07:00
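ONNX Pad allows negative pad amounts, which remove elements instead of adding them. The sketch below pairs a hypothetical capability query (any negative pad means fall back to CPU, matching the commit's `none_of` check) with a 1-D constant-mode pad that handles negative amounts. Both function names are illustrative, not the DML EP's registration code.

```python
def dml_supports_pad(pads):
    """Hypothetical query: run on DML only if no pad count is negative."""
    return all(p >= 0 for p in pads)

def pad_1d_constant(x, begin, end, value=0):
    """Constant Pad along one axis; negative amounts crop instead of pad."""
    x = list(x)
    if begin < 0:
        x = x[-begin:]
    else:
        x = [value] * begin + x
    if end < 0:
        x = x[:max(0, len(x) + end)]
    else:
        x = x + [value] * end
    return x
```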
RandySheriffH
d5fcb432fa
Generalize native op creation (#11539)
* create op from ep

* read input count from context

* create holder to host nodes

* fix typo

* cast type before comparison

* throw error on API fail

* silence warning from minimal build

* switch to unique_ptr with deleter to host nodes

* fix typo

* fix build err for minimal

* fix build err for minimal

* add UT for conv

* enable test on CUDA

* add comment

* fix typo

* use gsl::span and string view for Node constructor

* Added two APIs - CopyKernelInfo and ReleaseKernelInfo

* pass gsl::span by value

* switch to span<NodeArg* const> to allow for reference to const containers

* fix typo

* fix reduced build err

* fix reduced build err

* refactoring node construction logic

* rename exceptions

* add input and output count as arguments for op creation

* refactor static member

* use ORT_CATCH instead of catch

* cancel try catch

* add static value name map

* format input definition and set err code

* fix comments

* fix typo
2022-06-27 21:12:15 -07:00
Dwayne Robinson
fc0143fe68
DML EP ResNet50 opset 15 fails in ONNX checker for FusedBatchNormalization lacking training_mode attribute (#12010)
FusedBatchNormalization includes training_mode attribute
2022-06-27 19:41:34 -07:00
Edward Chen
f045994389
[NNAPI EP] Update NNAPI headers (#11954)
Update the NNAPI headers to a more recent version (copied from TF Lite v2.9.1).
2022-06-27 18:54:06 -07:00
Edward Chen
466b2d9f3d
[C# Tests] Add support for double tensor output in TestPreTrainedModels. (#12008)
Add support for double tensor output in TestPreTrainedModels.
2022-06-27 18:49:19 -07:00
Sheil Kumar
7d712c8f8b
Fix WinML tests still targeting deprecated (deleted) experimental signal op definitions (#12006)
* fix winml tests

* remove legacy test

* switch idft -> dft+inverse attr

* upgrade opset 13->17 for signal ops tests
2022-06-27 16:35:50 -07:00
Yulong Wang
bd973bcf1e
[js/rn] upgrade dependencies for e2e test (#11863)
* [js/rn] upgrade dependencies for e2e test

* use JDK11 only for gradle

* expand variable
2022-06-27 14:56:49 -07:00
Dwayne Robinson
8cd02508c8
Include opset 15 in Conv+BatchNormalization fusion (#11960) 2022-06-27 10:59:14 -07:00
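Conv+BatchNormalization fusion folds the inference-mode BatchNormalization affine transform into the convolution's weights and bias, per output channel: W'[c] = W[c] * gamma[c] / sqrt(var[c] + eps) and b'[c] = (b[c] - mean[c]) * gamma[c] / sqrt(var[c] + eps) + beta[c]. A pure-Python sketch of that algebra (weights flattened per output channel; the function name is illustrative):

```python
import math

def fuse_conv_bn(weights, bias, gamma, beta, mean, var, epsilon=1e-5):
    """Fold BatchNormalization into the preceding Conv, per output channel.

    weights[c] is the flat list of kernel weights for output channel c.
    """
    fused_w, fused_b = [], []
    for c in range(len(weights)):
        factor = gamma[c] / math.sqrt(var[c] + epsilon)
        fused_w.append([w * factor for w in weights[c]])
        fused_b.append((bias[c] - mean[c]) * factor + beta[c])
    return fused_w, fused_b
```

Because BatchNormalization at inference is a per-channel affine map, applying it after a linear Conv is equivalent to a single Conv with rescaled weights and a shifted bias.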
dependabot[bot]
68afa2d362
Bump async from 2.6.3 to 2.6.4 in /js/react_native/e2e (#11280)
Bumps [async](https://github.com/caolan/async) from 2.6.3 to 2.6.4.
- [Release notes](https://github.com/caolan/async/releases)
- [Changelog](https://github.com/caolan/async/blob/v2.6.4/CHANGELOG.md)
- [Commits](https://github.com/caolan/async/compare/v2.6.3...v2.6.4)

---
updated-dependencies:
- dependency-name: async
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-06-27 10:30:01 -07:00
George Nash
9583841ef7
Improve performance of BiasGelu on oneDNN execution provider (#11935)
Improve performance of BiasGelu on OneDNN execution provider

This modifies how BiasGelu is handled by the OneDNN execution provider
by executing the gelu_erf primitive as a postop of the binary_add primitive.

Also fixes extra data copies made when running on GPU.

Signed-off-by: George Nash <george.nash@intel.com>
2022-06-27 08:34:11 -07:00
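Running gelu_erf as a post-op of binary_add means the bias add and the activation happen in one pass, with no intermediate tensor written and re-read. The sketch below contrasts the two schedules on 1-D lists; it assumes the bias has the same length as the input (in the real op it broadcasts along the last dimension) and is not oneDNN code.

```python
import math

def gelu_erf(x):
    """Exact GELU using erf."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def bias_gelu_unfused(x, bias):
    tmp = [a + b for a, b in zip(x, bias)]  # binary_add writes an intermediate buffer
    return [gelu_erf(v) for v in tmp]       # a separate gelu pass reads it back

def bias_gelu_fused(x, bias):
    # binary_add with gelu_erf applied as a post-op: one pass, no intermediate buffer
    return [gelu_erf(a + b) for a, b in zip(x, bias)]
```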
Scott McKay
f72288b453
Fix a couple of typos (#11943)
Fix a couple of typos
2022-06-27 10:32:14 +10:00