Commit graph

6990 commits

Author SHA1 Message Date
Scott McKay
75cf5dc2c9
Fix GH issue 12151 by using inverse perms for updating DQ axis attribute (#12158)
* Fix GH issue 12151.

Need to use inverse perms for updating that axis to what is used for transposing the input. This only applies if the DQ node is doing per-axis dequantization.
2022-07-13 18:02:58 +10:00
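
The inverse-permutation point in #12158 above can be illustrated with a small sketch. It is not the ORT implementation; `invert_perm` and `axis_after_transpose` are hypothetical helper names, and the sketch assumes the usual Transpose convention that output dimension `i` takes input dimension `perm[i]`.

```python
def invert_perm(perm):
    """Return the inverse permutation: inverse[perm[i]] == i for every i."""
    inverse = [0] * len(perm)
    for out_dim, in_dim in enumerate(perm):
        inverse[in_dim] = out_dim
    return inverse


def axis_after_transpose(axis, perm):
    """Position that `axis` of the original layout occupies after transposing by `perm`.

    Hypothetical helper: the per-axis DQ `axis` attribute has to follow the data,
    so it is looked up in the inverse permutation, not in `perm` itself.
    """
    rank = len(perm)
    axis = axis % rank  # normalise a negative axis
    return invert_perm(perm)[axis]


# NCHW -> NHWC uses perm [0, 2, 3, 1]; the channel axis (1) ends up last (3).
new_axis = axis_after_transpose(1, [0, 2, 3, 1])
```

Indexing `perm` directly would give `perm[1] == 2` here, which points at the wrong dimension of the transposed tensor.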
cloudhan
785f74979b
Rework cmake for kernel_explorer (#12079)
Improve CMake for deep integration with ORT, so that we can easily hook ORT functions for microbenchmarking purposes.
2022-07-13 15:43:32 +08:00
PeixuanZuo
5579d81fc8
[add] Add operator gemmfastgelu for ROCM (#12101)
* [ADD] add gemm fast gelu

* [UPDATE] refunction matmul_impl

* [Update] delete tuning_ in this pr

* [FIX] code format

* [FIX] compiler warning

* [Update] update doc
2022-07-13 15:40:16 +08:00
jingyanwangms
a9d0d3323e
Use updated symbolic_helper.check_training_mode (#11900)
Co-authored-by: Jingyan Wang, Baiju Meswani
2022-07-12 17:26:06 -07:00
RandySheriffH
178a413ca1
List 3.10 as supported python version and remove 3.6 (#12141)
list 3.10 as supported python version and remove 3.6

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2022-07-12 15:28:30 -07:00
Adam Pocock
e0ed9f0f2f
[java] First part of the JNI error handling rewrite (#12013)
**Description**: This fixes error handling in the JNI code in OnnxMap, OnnxSequence, OnnxRuntime, RunOptions. SessionOptions and OrtEnvironment are correct as is.

The bulk of the work will be in rewriting OnnxTensor, OnnxSparseTensor (after the merge of #10653) and OrtSession, along with the helper methods in OrtJniUtil. I plan to tackle those in separate PRs to reduce the amount of code to review.

**Motivation and Context**
- Why is this change required? What problem does it solve? The current native interop code doesn't return control to Java immediately on throwing an exception from an ORT error code, which can cause incorrect interactions with native ORT, and issues with exception propagation on the Java side.
- If it fixes an open issue, please link to the issue here. Partial work towards solving #11451.
2022-07-12 15:16:54 -07:00
msftlincoln
a6fd1a3b85
Eager mode generator improvements for multiple onnx operators and extra test cases (#12111)
* test case for masked_select

* isolate variables per onnx_op, include line numbers for ORT errors

* format errors

* correct masked_select impl, broadcast test

* node attrs naming fixed
2022-07-12 16:05:09 -04:00
Edward Chen
6e051016c1
Add Python package to perf test pipeline. (#12135)
2022-07-12 10:50:24 -07:00
LironKesem
9647a3be40
Add tests for all unary aten ops supported in eager mode (#12087)
* Add tests for all unary aten ops supported in eager mode

* fixing the PR draft

* fixing the merge

* changing eval to be at compile time

* adding requirements for eager

* 1.adding function to {ops}_out
2.cleaning the code
  and adding comments

* editing the code according to code review

Co-authored-by: root <root@AHA-LIRONKESE-1>
2022-07-12 08:53:19 -04:00
Hariharan Seshadri
73310b2a0f
Fix Reduced Ops build pipeline (#12144)
Fix ReducedOps build pipeline
2022-07-11 19:02:38 -07:00
Carson Swope
c675c4750a
include coreml_provider_factory.h in macos build instead of coreml_ex… (#12138)
include coreml_provider_factory.h in macos build instead of coreml_execution_provider.h
2022-07-11 18:27:01 -07:00
Dwayne Robinson
742f843efc
RoiAlign CPU EP add warning for max mode with samples != 1 (#12136)
* RoiAlign add warning about incorrect max summation when sample size not 1
2022-07-11 17:44:41 -07:00
Wil Brady
f1047e0456
Fix minor python and cpp warnings from previous PR. (#12140)
Description: PR 12018 introduced a few fixable Python and C++ warnings, which this PR cleans up. It also adds a comment on the intent of test_mul_bool and adds verification of the out param to test_ones.

Motivation and Context

When iterating in Python, use a list instead of a set and don't use reserved words
Fix long line in cpp
Clarify test_mul_bool intent for future developers.
fill_ implements torch.ones under the covers, but the previous PR did not add verification of the out param, so it is added here.
2022-07-11 16:18:40 -04:00
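
The two Python guidelines in #12140 above can be sketched as follows. This is an illustration only, not code from the PR: `visit_in_order`, `op_names` and `op_type` are hypothetical names, and the sketch assumes that "reserved words" refers to built-in names such as `type` or `input`.

```python
def visit_in_order(names):
    """Return the names in exactly the order they were supplied."""
    visited = []
    for name in names:
        visited.append(name)
    return visited


# A list fixes the iteration order. Iterating a set of strings instead can
# produce a different order from one interpreter run to the next.
op_names = ["abs", "neg", "sqrt"]
visited = visit_in_order(op_names)

# A descriptive name such as `op_type` avoids shadowing the built-in `type`.
op_type = "unary"
```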
Preetha Veeramalai
99a370dd02
Update readme for OVEP (#12122)
* Add changes for training module in Readme

* Update ReadMeOV.rst
2022-07-11 10:54:12 -07:00
Wil Brady
418cfdc766
Update create_ort_attribute to set the tensor dimension and value correctly. Implement eager fill_ (#12018)
* Update create_ort_attribute to set the tensor dimension and value correctly.

* Eager mode support for fill_ and mm.out (mm uses mm.out).
2022-07-11 11:18:04 -04:00
PeixuanZuo
1c39d22f4e
[ADD] Rocm5.2 for Rocm python packaging pipeline (#12129)
[ADD] rocm5.2
2022-07-11 11:10:45 +08:00
Ashwini Khade
c6732c079b
pin protobuf version to be compatible with onnx (#12132)
Co-authored-by: Ashwini Khade <askhade@microsoft.com@orttrainingdev10.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2022-07-08 15:01:27 -07:00
Yulong Wang
d45c1a144e
[js/rn] Support UINT8 type for onnxruntime-react-native on Android (#12112)
* support uint8 for react native

* add test
2022-07-08 14:07:46 -07:00
Wil Brady
c04afae9a9
Add eager ops for unary ops with out. (#12106)
2022-07-08 12:09:26 -04:00
Jeff Bloomfield
2dd69cc3d9
Prevent unbounded growth of command allocator memory (#12114)
Prevent unbounded growth of command allocator memory
2022-07-07 19:55:06 -07:00
Yulong Wang
3ce25db7eb
[js/rn] optimize exception message on Android (#12113)
2022-07-07 13:26:50 -07:00
PeixuanZuo
b50239251d
[FIX] Add required variable for Rocm packaging ci pipeline (#12118)
[fix] packaging ci compiler error

[FIX] pipeline variable

[Frevert] fix compiler
2022-07-07 11:36:26 -07:00
zhangyaobit
a9b9c7f69f
Add autotuning support to FastGelu (#12093)
* Add autotuning for FastGelu (Draft).

* Clean up.

* delete unused header file

* Fix lint errors.

* Add missing template parameter.

* Improvements.

* Fix type.

* Fix namespace issue.
2022-07-06 23:17:48 -07:00
Hubert Lu
dbcf54aa41
Add hipified SkipLayerNorm code for ROCmEP (#12107)
* First attempt for half2 vectorized memory access in SkipLayerNorm

* Add some functions for debugging

* Clean up the code

* Clean up the code

* Generalize the vectorized kernels with aligned_vector and remove cudaDeviceProp

* Add a unit test for a larger input size

* Fix some Lint C++ warnings

* Use ILP = 4 for the vectorized kernels

* Rewrite the vectorized kernel and templatize ComputeSkipLayerNorm

* Use conditional operator for input_v

* Refactor LaunchSkipLayerNormKernel and replace the original SkipLayerNormKernelSmall with the vectorized kernel

* Clean some comments and rename the layernorm function

* Use ComputeSkipLayerNorm to replace LaunchSkipLayerNormKernel

* Resolve a Lint C++ warning

* Fix SkipLayerNormBatch1_Float16_vec output data

* Add hipified code of bert SkipLayerNorm for ROCmEP

* Resolve some Lint C++ warnings

* Resolve some Lint C++ warnings

* Resolve some Lint C++ warnings

* Resolve Python formatting issue
2022-07-06 22:13:11 -07:00
Yufeng Li
97b03fedff
check consumers of dq node before swap dq and transpose (#12099)
* check consumers of dq node before swap dq and transpose

* add unit test
2022-07-06 11:11:38 -07:00
ytaous
446f899fed
[ROCm] Temp disable AMD UT (#12105)
temp disable UT

Co-authored-by: Ethan Tao <ettao@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2022-07-06 11:08:26 -07:00
Edward Chen
bd76e21fb3
Add pipeline for building perf test binaries. (#12067)
Add initial pipeline for building perf test binaries. It only builds Android binaries now but can be expanded later.
2022-07-06 09:42:49 -07:00
Wil Brady
1948b7c726
Add eager support for eq and ne ops. (#12031)
* Add eager support for aten::eq and aten::ne.

* Add generator support for resizing output param.
2022-07-06 12:39:04 -04:00
Edward Chen
07b0469a23
Fix unused function warning for decodeMIDR(). (#12069)
Changed from static function defined in header to function declared in header and defined in separate .cc file.
2022-07-06 09:18:01 -07:00
Hubert Lu
835ecb264d
Leverage vectorized load/write for SkipLayerNorm (#11803)
* First attempt for half2 vectorized memory access in SkipLayerNorm

* Add some functions for debugging

* Clean up the code

* Clean up the code

* Generalize the vectorized kernels with aligned_vector and remove cudaDeviceProp

* Add a unit test for a larger input size

* Fix some Lint C++ warnings

* Use ILP = 4 for the vectorized kernels

* Rewrite the vectorized kernel and templatize ComputeSkipLayerNorm

* Use conditional operator for input_v

* Refactor LaunchSkipLayerNormKernel and replace the original SkipLayerNormKernelSmall with the vectorized kernel

* Clean some comments and rename the layernorm function

* Use ComputeSkipLayerNorm to replace LaunchSkipLayerNormKernel

* Resolve a Lint C++ warning

* Fix SkipLayerNormBatch1_Float16_vec output data
2022-07-05 22:28:15 -07:00
ytaous
7b8f45dd60
[ROCm] Enable build option for autograd (#11945)
* add autograd build option

* disable UTs

* disable UTs

* UT-step1

* UT-step1

* UT-step2

* UT-step2

* UT-step2

* UT-step2

* UT-step2

* UT-step2

* Fix UTs

* increase shm

* code clean up

Co-authored-by: Ethan Tao <ettao@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2022-07-05 18:11:29 -07:00
Dwayne Robinson
32a8751dc4
DML EP Update to DML 1.9 (#12090)
* Update to DML 1.9

* Appease obnoxious Python formatting tool
2022-07-05 16:30:54 -07:00
Yufeng Li
3446a3750c
generate quantization parameter for outputs (#12089)
2022-07-05 14:57:43 -07:00
Wenbing Li
479e71a7a8
enable the extensions custom build for java and android (#11823)
2022-07-05 10:34:14 -07:00
Scott McKay
c20cbf0c97
Add undocumented attribute to disable generation of Java bindings from the Android AAR. (#12075)
The generated bindings cause C# build errors that require workaround code. Disabling generation should avoid the need for any workarounds.

As the user has the C# ORT package with the C# to C bindings, there's no need for binding generation that calls the ORT Java API (which is C# -> Java -> C).
2022-07-05 10:29:32 -07:00
zhangyaobit
ddb6202df7
Add op tuning functionality and an example for vector add. (#12060)
* Add op tuning functionality and example for vector add.

* Add namespace.

* Various improvements.

* use unique pointer

* fix lint errors

* Check return error.
2022-07-03 21:12:04 -07:00
Hariharan Seshadri
df712d80ca
Add data type check in ConvAddRelu fusion (#12058)
2022-07-01 15:31:15 -07:00
Justin Stoecker
57ac3d0a61
Disable DML command list reuse for Xbox (#12063)
disable cl reuse for xbox
2022-07-01 13:22:35 -07:00
Jameson Miller
ae88f43550
Eager mode: structure for supporting out= operators (#12066)
* Add utility methods for resize_output

* Eager mode: implement abs.out

This is an initial hand written implementation of an out= operator to
demonstrate how to structure out= methods using resize_out helper
methods.

This is meant to be used as a reference when we update the code
generator to generate implementations for out= operations.
2022-07-01 13:35:12 -04:00
Ye Wang
8dc8d44087
remove --disable_iobinding for trt ep benchmark (#12053)
Update run_benchmark.sh
2022-07-01 10:33:35 -07:00
Gary Miguel
043816f895
Make C# runtest.sh automatically set latest opset (#12039)
* Update C# runtest.sh for opset 17

Should have been part of https://github.com/microsoft/onnxruntime/pull/11924

* get appropriate opset version from onnx doc

* use absolute rather than relative path

* fix typo in var name
2022-07-01 10:12:33 -07:00
Jeff Bloomfield
02b9b12127
Fix DML custom operators which set descriptor heap to command list (#12059)
2022-07-01 09:49:23 -07:00
Scott McKay
bfe1eca10c
Add targets files for new .net6 frameworks (#12016)
* Add net6 targets.
Remove maccatalyst as we don't have a native build targeting that.

* Set platform in macos targets

* Add targetFramework entries

* Move NativeLib.DllName definition and set using preprocessor values for simplicity. Couldn't get it to build with the preprocessor based setup when it was in a separate file.

Update the nuspec generation to set platform version for .net6 targets. TODO: Validate versions. I copied them from the managed nuget package the packaging pipeline generated prior to adding targets. Possibly we could/should lower some of the versions.

Hopefully the need to specify a version goes away when the release version of VS2022 supports .net6.

* Try android 31.1 as https://github.com/actions/virtual-environments/blob/main/images/win/Windows2022-Readme.md suggests that should be available on the CI machines

* Fix patch version mismatch
Add some extra debug info in case it helps

* Debug nuget location in CI

* Add workspace entry back in

* Add steps

* One more attempt with hardcoded nuget.exe path and original android31.0 version

* Better fix - found explicit nuget download and updated version there.

* flake8 fixes

* Fix black complaints.

* Exit Microsoft_ML_OnnxRuntime_CheckPrerequisites for net6 iOS.

* Removed outdated comment
2022-07-01 09:13:55 -07:00
Jameson Miller
3e6b8d159a
Eager mode: implement resize_ operation (#12004)
Add support for PyTorch `resize_` operation. The PyTorch API method is documented
here:

https://pytorch.org/docs/stable/generated/torch.Tensor.resize_.html

Implementation notes:

There are some implementation details that might deviate from
expectations:

  - As the onnxruntime::tensor does not support a resize operation, this
    functionality is supported on the TensorImpl by swapping out the
    backing tensor if the size changes.

  - In the ORT model the shape of the TensorImpl is defined by the
    backing onnxruntime::tensor, so a TensorImpl cannot have a different
    shape / size than the backing onnxruntime::tensor. This means that
    when resizing to a smaller TensorImpl, where other implementations
    might keep the same backing storage, ORT will allocate a new
    onnxruntime::tensor and copy over as many of the existing elements
    as fit. Functionally, you will end up with the same output, but the
    underlying buffer will be re-allocated.

    A future change could be to allow ORTTensorImpl to have a different
    size / shape than the onnxruntime::tensor backing it, and then we
    could improve this behavior.

 The canonical CPU / CUDA implementations in PyTorch repository:
     CPU: aten/src/ATen/native/Resize.cpp
     CUDA: aten/src/ATen/native/cuda/Resize.cpp
2022-06-30 22:14:37 -04:00
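
The swap-and-copy behaviour described in #12004 above can be sketched in plain Python. This is a toy model, not the ORT or PyTorch code: `BackingTensor` and `TensorImplSketch` are hypothetical stand-ins, the sketch assumes a reallocation whenever the requested shape differs from the current one, and it zero-fills any new elements purely so that the example is deterministic.

```python
from math import prod


class BackingTensor:
    """Hypothetical stand-in for a tensor whose shape is fixed at creation."""

    def __init__(self, shape, fill=0.0):
        self.shape = tuple(shape)
        self.data = [fill] * prod(self.shape)


class TensorImplSketch:
    """Hypothetical wrapper whose shape is always that of its backing tensor."""

    def __init__(self, shape):
        self.backing = BackingTensor(shape)

    @property
    def shape(self):
        return self.backing.shape

    def resize_(self, new_shape):
        new_shape = tuple(new_shape)
        if new_shape == self.backing.shape:
            return self  # nothing to do, keep the existing buffer
        # The backing tensor cannot be resized, so allocate a replacement...
        replacement = BackingTensor(new_shape)
        # ...and copy over as many existing elements as fit.
        count = min(len(self.backing.data), len(replacement.data))
        replacement.data[:count] = self.backing.data[:count]
        self.backing = replacement  # swap out the backing tensor
        return self


t = TensorImplSketch((2, 3))
t.backing.data = [1, 2, 3, 4, 5, 6]
original_buffer = t.backing
t.resize_((2, 2))  # shrinking still allocates a new backing tensor
```

After the call, `t.shape` is `(2, 2)` and the first four elements are preserved, but `t.backing` is a different object from `original_buffer`, which mirrors the re-allocation noted in the commit message.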
RandySheriffH
b858c2f725
Extend lifetime of KernelDef when creating a standalone op (#12057)
place tmp kernel def as local variable to cover the lifetime of kernel creation
2022-06-30 17:38:59 -07:00
Hariharan Seshadri
2e27a7e330
Skip Constant Folding for ops producing an optional type output (#11839)
2022-06-30 13:38:35 -07:00
Wil Brady
0fa2041f68
Add eager support for aten::equal. (#12020)
2022-06-30 15:46:14 -04:00
Wei-Sheng Chin
0ee0b8cf18
Disable sequence-type tests since C# infra doesn't support them well (#12037)
2022-06-30 09:49:03 -07:00
zhangyaobit
da133ad3d8
Add FastGelu to kernel explorer for profiling. (#11995)
* Add FastGelu to kernel explorer for profiling.

* fix python lint errors

* Fix one more python lint error

* Delete white space (python lint)

* Various improvements.

* Update README.md

* refactor header files
2022-06-30 07:35:43 -07:00
Wil Brady
fdf12a5c35
Fix windows eager build break by pinning to torch version 1.11.0 (#12033)
Fix the Windows and Linux eager builds by pinning to torch 1.11.0.
2022-06-30 07:01:13 -04:00