Commit graph

8559 commits

Author SHA1 Message Date
pengwa
bf32dbbd9b
Share more constant initializers (#15461)
### Share more constant initializers.

The `ConstantSharing` transformer originally handled only single-value
initializers (scalar or 1D).

This PR shares more cases so that the common subexpression elimination
transformer can remove more duplicated nodes.

Originally, we used a single
vector<std::variant<float,half,int32,int64>> to store different scalar
values. In this PR, we create an unordered map whose key is
data_type + rank + element count and whose value is a vector of
`InitializerValue`.

For a given initializer that fulfils the conditions, we first look up
the corresponding vector of `InitializerValue` by its
<data_type + rank + element count>, then search that vector to check
whether the constant tensor already exists. This returns a value id,
which is combined with <data_type + rank + element count> to form the
pattern key that decides which tensor to reuse (legacy code).
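
The lookup described above can be sketched roughly as follows. This is a minimal Python illustration, not the C++ code in this PR: the function name `share_constants` and the `(data_type, shape, values)` tuple layout are assumptions made for the sketch.

```python
from collections import defaultdict


def share_constants(initializers):
    """Map each initializer name to the name of the tensor it can reuse.

    `initializers` maps name -> (data_type, shape, values), where `shape`
    and `values` are tuples. This layout is hypothetical and only serves
    the illustration.
    """
    # Bucket key: (data_type, rank, element count) -> list of distinct tensors.
    buckets = defaultdict(list)
    # Pattern key: bucket key + value id -> name of the tensor to reuse.
    canonical = {}
    replacement = {}

    for name, (data_type, shape, values) in initializers.items():
        bucket_key = (data_type, len(shape), len(values))
        bucket = buckets[bucket_key]
        # Linear search inside the bucket for an identical constant tensor.
        for value_id, (known_shape, known_values) in enumerate(bucket):
            if known_shape == shape and known_values == values:
                break
        else:
            # Not seen before: append it and use its index as the value id.
            value_id = len(bucket)
            bucket.append((shape, values))
        pattern_key = bucket_key + (value_id,)
        replacement[name] = canonical.setdefault(pattern_key, name)
    return replacement
```

With two identical shape constants, both names map to the first one, so downstream nodes can be rewired to a single input.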

### Motivation and Context

One example we see here is:

```mermaid
stateDiagram
    [*] --> LayerNorm(b,s,64)
    LayerNorm(b,s,64) --> Reshape1
    Shape1_Const[b*s,64] --> Reshape1

    LayerNorm(b,s,64) --> Reshape2
    Shape2_Const[b*s,64] --> Reshape2


    Reshape1 --> AttentionSubGraph
    Reshape2 -->  Add
    AttentionSubGraph--> Add
   Add --> [*]
```

Ideally, CommonSubexpressionElimination would remove one of `Reshape1`
and `Reshape2`, but because `Shape1_Const` and `Shape2_Const` are
different NodeArg*, it did not remove the duplication.

This is just one example; removing such duplication opens up more
opportunities to apply graph transformations.
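
To make the effect concrete, here is a toy common subexpression elimination pass in Python. It is only a sketch under simplifying assumptions (nodes are `(output, op_type, inputs)` tuples in topological order, and two nodes are equal only if op type and input names match), not ORT's CommonSubexpressionElimination.

```python
def eliminate_common_subexpressions(nodes):
    """Drop nodes whose (op_type, inputs) pair was already produced.

    `nodes` is a list of (output, op_type, inputs) tuples in topological
    order; this representation is hypothetical.
    """
    seen = {}     # (op_type, inputs) -> output of the first equivalent node
    renamed = {}  # output of a removed node -> output of the surviving node
    kept = []
    for output, op_type, inputs in nodes:
        # Rewire inputs that pointed at a node removed earlier.
        inputs = tuple(renamed.get(name, name) for name in inputs)
        key = (op_type, inputs)
        if key in seen:
            renamed[output] = seen[key]
        else:
            seen[key] = output
            kept.append((output, op_type, inputs))
    return kept


def build_graph(shape2_name):
    # The graph from the diagram above, with a configurable second constant.
    return [
        ("ln", "LayerNorm", ("x",)),
        ("r1", "Reshape", ("ln", "Shape1_Const")),
        ("r2", "Reshape", ("ln", shape2_name)),
        ("attn", "AttentionSubGraph", ("r1",)),
        ("add", "Add", ("r2", "attn")),
    ]
```

`eliminate_common_subexpressions(build_graph("Shape2_Const"))` keeps both Reshape nodes, while `build_graph("Shape1_Const")`, which is what the graph looks like after constant sharing, lets the pass drop `r2` and feed `r1` into `Add`.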
2023-04-14 07:41:07 -07:00
Changming Sun
f297bbb89b
Fix an indent error in build.py (#15497)
### Description
Fix an indent error in build.py

### Motivation and Context
The problem was introduced in #15395 when I was deleting unused code.
2023-04-14 06:32:46 -07:00
mindest
0fdd356abf
[ROCm] Add hipBLASLt GEMM support to Tunable op. (#15351)
### Description
Add hipBLASLt to GEMM Tunable op, which supports GEMM and
StridedBatchedGEMM.

To enable hipBLASLt implementation, add an extra flag to the building
command: `--cmake_extra_defines onnxruntime_USE_HIPBLASLT=ON`.
2023-04-14 17:56:01 +08:00
Sunghoon
fda0aa14c8
SkipLayerNorm fusion with different input and output type (#15500)
SkipLayerNorm fusion now fuses LayerNorm and one or more Add kernels.
While the LayerNormalization kernel allows different input and output
types by definition, SkipLayerNormalization must have the same input and
output type.

This graph is valid, as the output of the Add node is float16 and the two
inputs from initializers are float.


![image](https://user-images.githubusercontent.com/35605090/231874079-3f3b03cc-f751-4ad9-a002-31116a35117f.png)

But when Add and LayerNormalization are fused, it fails because the two
inputs of the Add node are float16 and SkipLayerNormalization must have
the same input types. To avoid this failure, this PR adds a Cast node
before the inputs of SkipLayerNormalization when the input and output
types are different and the output type is float. The above graph is
fused as follows:


![image](https://user-images.githubusercontent.com/35605090/231874097-6405713a-7c95-4b5b-a293-1305976edc94.png)

For performance, it would be better for SkipLayerNormalization to
support different input and output types, but this PR is to unblock
Turing NLR v5 base mode in Babel. When we have more cases, we can
support it.
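
The Cast rule described above can be sketched as a small decision function. This is a Python illustration of one reading of the description, not the fusion code itself; the function name and the input names `input`, `skip`, `gamma`, `beta` are assumptions.

```python
def plan_skip_layer_norm_casts(input_types, output_type):
    """Return (input_name, cast_to) pairs for the fused node's inputs.

    `cast_to` is None when no Cast node is needed. A Cast is planned only
    when the input type differs from the output type and the output type
    is float, mirroring the condition stated in the description.
    """
    plan = []
    for name, dtype in input_types.items():
        needs_cast = dtype != output_type and output_type == "float"
        plan.append((name, output_type if needs_cast else None))
    return plan


# float16 activations with float scale and bias, float output (hypothetical names).
example = plan_skip_layer_norm_casts(
    {"input": "float16", "skip": "float16", "gamma": "float", "beta": "float"},
    "float",
)
```

In this example only `input` and `skip` get a Cast to float; when all types already match, no Cast is planned.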
2023-04-13 23:07:47 -07:00
Wei-Sheng Chin
d76cf374c4
Capture both ValueError and RuntimeError (#15503) 2023-04-13 19:29:34 -07:00
Akshay Sonawane
56ad68120e
Add support to use sequence as input ids in decoder inputs to Beam Search CUDA Op (#15232)
Add support to use sequence as input ids in decoder inputs to Beam
Search CUDA Op

### Description
Currently, the Beam Search op is supported only for the CPU EP; this PR
adds support for the CUDA EP.

### Motivation and Context
- For Turing models, inference was throwing a segmentation fault because
a copy was failing in CUDA memory. Beam search support was also not
present in CUDA.
2023-04-13 13:35:33 -07:00
Changming Sun
5bed8d0285
Disable XNNPack EP's tests in Windows CI pipeline (#15406)
### Description

1. Disable XNNPack EP's tests in the Windows CI pipeline.
The EP code has a known problem (memory alignment), but it does not
affect the scenarios we ship the code for. We currently use the XNNPack
EP only in mobile apps and on the web, and we already have pipelines
that cover these scenarios. We need to prioritize fixing the bugs found
in those pipelines, and there are no resources to spend on the Windows
one. We can re-enable the tests once we reach an agreement on how to fix
the memory alignment bug.

2. Delete anybuild.yml, which was for an already deleted pipeline.
3. Move Windows CPU pipelines to AMD CPU machine pools which are
cheaper.
4. Disable some qdq/int8 model tests that will fail if the CPU doesn't
have Intel AVX512 8-bit instructions.
2023-04-13 12:19:32 -07:00
zhijiang
05ec22330f
softmax perf improvement pr2 - import softmax bw (#15199)
When the softmax dimension is 2048, the original ORT code falls back to
cuDNN. With some optimizations to ORT's softmax_warp_backward, we can be
faster than the cuDNN implementation.

The ideas for optimizing softmax_warp_backward are:
1. instead of saving intermediate results in registers, recompute them
to save resources
2. save the input data in fp16 instead of fp32 to further save resources
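
As a rough illustration of the first idea, the softmax backward pass can be written so that only a scalar reduction is kept between passes. The Python sketch below uses the standard softmax gradient, dx_i = y_i * (dy_i - sum_j dy_j * y_j); it is one plausible reading of "recompute instead of saving" and does not model the CUDA kernel, its register usage, or the fp16 storage.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_backward(grad_out, out):
    """Gradient w.r.t. the softmax input, given grad_out and the output."""
    # Pass 1: reduce sum_j grad_out[j] * out[j]; the products are not stored.
    dot = 0.0
    for g, y in zip(grad_out, out):
        dot += g * y
    # Pass 2: read the inputs again instead of reusing cached intermediates.
    return [y * (g - dot) for g, y in zip(grad_out, out)]
```

The trade-off is extra arithmetic and reads in exchange for less per-thread state, which is the resource saving the list above refers to.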

the perf numbers:

![image](https://user-images.githubusercontent.com/43435212/227476335-ae0b61c4-cd15-40b7-b743-a956fadaedda.png)

Note that when the softmax dimension is less than 2048, nothing changes,
so perf numbers are given only for the 2048 case.


More perf numbers for smaller batch sizes:

![image](https://user-images.githubusercontent.com/43435212/231676120-c8944b09-a664-43f3-a1e8-dfe729c6e816.png)
2023-04-13 14:57:01 +08:00
mindest
67ac36101c
disable BatchNormalizationGrad test (#15485)
### Description
Temporarily disable BatchNormalizationGrad test due to random failure.

Example:

```
2023-04-12T06:33:24.1593811Z 1: [ RUN ] GradientCheckerTest.BatchNormalizationGrad
2023-04-12T06:33:27.5603881Z 1: D:\a\_work\1\s\orttraining\orttraining\test\gradient\gradient_ops_test.cc(1468): error: Value of: IsErrorWithinTolerance(max_error, error_tolerance)
2023-04-12T06:33:27.5604509Z 1: Actual: false
2023-04-12T06:33:27.5604719Z 1: Expected: true
2023-04-12T06:33:27.5604997Z 1: max_error: 1.776702880859375; tolerance: 0.019999999552965164; ORT test random seed: 2552121240;
2023-04-12T06:33:27.5605266Z 1: Google Test trace:
2023-04-12T06:33:27.5605531Z 1: D:\a\_work\1\s\onnxruntime\test\common\tensor_op_test_utils.cc(14): ORT test random seed: 8910
2023-04-12T06:33:27.5605843Z 1: D:\a\_work\1\s\onnxruntime\test\common\tensor_op_test_utils.cc(14): ORT test random seed: 5678
2023-04-12T06:33:27.5606478Z 1: D:\a\_work\1\s\onnxruntime\test\common\tensor_op_test_utils.cc(14): ORT test random seed: 1234
2023-04-12T06:33:27.8285560Z 1: D:\a\_work\1\s\orttraining\orttraining\test\gradient\gradient_ops_test.cc(1493): error: Value of: IsErrorWithinTolerance(max_error, error_tolerance)
2023-04-12T06:33:27.8286181Z 1: Actual: false
2023-04-12T06:33:27.8286404Z 1: Expected: true
2023-04-12T06:33:27.8286669Z 1: max_error: 1.776702880859375; tolerance: 0.019999999552965164; ORT test random seed: 2552121240;
2023-04-12T06:33:27.8286942Z 1: Google Test trace:
2023-04-12T06:33:27.8287208Z 1: D:\a\_work\1\s\onnxruntime\test\common\tensor_op_test_utils.cc(14): ORT test random seed: 8910
2023-04-12T06:33:27.8287532Z 1: D:\a\_work\1\s\onnxruntime\test\common\tensor_op_test_utils.cc(14): ORT test random seed: 5678
2023-04-12T06:33:27.8287849Z 1: D:\a\_work\1\s\onnxruntime\test\common\tensor_op_test_utils.cc(14): ORT test random seed: 1234
2023-04-12T06:33:51.6368960Z 1: [ FAILED ] GradientCheckerTest.BatchNormalizationGrad (27475 ms)
```
2023-04-13 14:53:47 +08:00
Changming Sun
a22dc65a81
Add a missing header to cuda_common.h (#15489)
### Description

The following three lines are needed before including some cutlass
header files, because cutlass uses the "and"/"or" keywords. Generally it
should not be a problem to omit this header, but nvcc is not strictly
compliant with the C++ standard.

```c++
#ifdef __cplusplus
#include <ciso646>
#endif
```

We didn't hit this problem because the above code exists in absl. We
always include absl headers first. However, ABSL recently deleted them!
https://github.com/abseil/abseil-cpp/pull/1246

The cutlass dependency was introduced in #14343, after we had abseil.
2023-04-12 22:16:59 -07:00
pengwa
516c8e95fa
Optimize SCE loss compute (#15401)
### Optimize SCE loss compute

Compute optimization based on label data sparsity:
- Insert ShrunkenGather before the SCELoss node to filter out invalid
labels before compute.
- Support ShrunkenGather upstream.
- Add tests for the above.
- Add a flag to enable the label sparsity optimization via an env var;
it is disabled by default for now and will be enabled after
comprehensive benchmarking.
- Extract common logic into test_optimizer_utils.h/cc from
core/optimizer/compute_optimizer_test.cc, so that the common functions
can be shared by both core/optimizer/compute_optimizer_test.cc and
orttraining/core/optimizer/compute_optimizer_test.cc.
- Extract common logic into shared_utils.h/cc: `GetONNXOpSetVersion` and
`Create1DInitializerFromVector`.


### Motivation and Context
2023-04-13 13:02:12 +08:00
Justin Chu
07b64d5275
Remove codecov from requirements-dev.txt (#15487)
### Motivation and Context
It is no longer supported, and we don't really use it.
2023-04-12 18:48:02 -07:00
Patrice Vignola
fd7f0c3cfc
[DML EP] Use ORT node names in DML execution plans (#15411) 2023-04-12 16:44:53 -07:00
G. Ramalingam
e361e3f138
Fix bug in handling of variadics in function schema creation (#15409)
### Description

The code handling variadic parameters when creating a schema for a
function has a minor bug: the checking logic was nested inside a
conditional instead of being outside it. This PR fixes the logic and
adds a test case. The bug manifests itself when the first parameter in
the variadic list is not an input/output of the enclosing function.

### Motivation and Context

Fixes https://github.com/microsoft/onnxruntime/issues/15404

---------

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
2023-04-12 14:32:24 -07:00
Yulong Wang
e1e8852213
[build/npm] dump ORT_COMMON_FROM from validation (#15475)
### Description
dump ORT_COMMON_FROM from validation

This writes environment variable ORT_COMMON_FROM for later steps in the
release pipeline to use.
2023-04-12 13:48:19 -07:00
Yulong Wang
3875c824d5
[build] fix default value of flag cmake_generator (#15471)
### Description
fix default value of flag cmake_generator
2023-04-12 13:47:58 -07:00
Yulong Wang
041a0e2747
[build] fix nuget linux in build_protoc_for_host() (#15472)
### Description
fix nuget linux in build_protoc_for_host()
2023-04-12 13:46:32 -07:00
yf711
8cd5f3ad9c
[TensorRT EP] support TensorRT 8.6-EA (#15299)
### Description

* Integrate TRT 8.6EA on relevant Linux/Windows/pkg pipelines
  * Update onnx-tensorrt to 8.6
  * Add new dockerfiles for TRT 8.6 and clean old ones
* Update
[CGManifest](https://github.com/microsoft/onnxruntime/tree/main/cgmanifests)
files and ort build deps version
  * yml/script update
* Enable built-in TRT parser option on TRT related pipelines by default
* Exclude the test TopKOperator.Top3ExplicitAxisInfinity from the TRT EP
tests (8.6-EA has an issue with the TopK operator)
2023-04-12 11:34:59 -07:00
Numfor Tiapo
e3086b2ed8
Move DML CI Pipeline to A10 (#15468)
This change moves the DML CI pipeline to the A10 machines and fixes or
disables tests that were failing from this change.

- Max error rate threshold was increased for Image Tests
- Some failing batch tests were disabled

---------

Co-authored-by: Changming Sun <chasun@microsoft.com>
2023-04-12 10:19:40 -07:00
PeixuanZuo
0016554090
[ROCm] disable composable_kernel and kernel explorer for MIGraphX CI (#15479)
Disable composable_kernel and kernel explorer for MIGraphX CI to save
build time.
Composable_kernel and kernel explorer are already tested on ROCm CI.
2023-04-12 22:26:40 +08:00
PeixuanZuo
d49a8de9b1
[ROCm] add FP16 support for FusedConv Op (#15443)
Add FP16 support for FusedConv Op and update UT
2023-04-12 12:19:14 +08:00
PeixuanZuo
ce1eb6d629
[ROCm] Add Tunable GroupNorm (#15298)
refactor GroupNorm and Add Tunable GroupNorm
2023-04-12 10:55:42 +08:00
Changming Sun
db4fc12318
Add support for building the code on Windows ARM64 natively (#15371)
### Description
Visual Studio and Python recently started to provide native Windows
ARM64 packages. This PR provides better support for building on
Windows ARM64. You can build the same way as for x64, for example:

```
python tools\ci_build\build.py --config Debug --update --skip_submodule_sync --build_dir b --cmake_generator "Visual Studio 17 2022"
```

You do not need to append the "--arm64" build arg, and do not need to
cross-compile protoc for a different arch as you are not cross-compiling.

**Caveat:** it does not work with the latest cmake release (3.26.x). It
only works with cmake 3.25.x and below. A bug has been filed with them:
https://gitlab.kitware.com/cmake/cmake/-/issues/24797

### Motivation and Context
Provide better support for building on Windows ARM64.
2023-04-11 17:14:54 -07:00
Rachel Guo
9c42d5e31f
[CoreML EP]Add broadcasting support for binary ops (#15187)
### Description
As title

### Motivation and Context
https://github.com/microsoft/onnxruntime/issues/15110

---------

Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>
Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
2023-04-11 13:50:45 -07:00
Yulong Wang
0fbf715824
[build] add script to validate generated NPM packages (#15453)
### Description
add script to validate generated NPM packages and publish it to
artifacts, so that release pipeline can use it.

once this PR is merged, I will update the NPM package release pipeline.
2023-04-11 11:04:55 -07:00
Dmitri Smirnov
ce3b4eabd3
Implement Optional Metadata support and C# test support (#15314)
### Description
Implement Optional Type metadata support in the library.
Implement optional support in C# API along with metadata.
Implement Sequence, Map, Optional test data support
and test execution.

Prune tests and provide more details for failing tests in C# code.

Note, this PR does not enable running onnx test models in C++.

### Motivation and Context
Opset18 optional type support.
2023-04-11 09:41:59 -07:00
Edward Chen
0497ac0432
Support additional op domains in op reduction script. (#15424)
Add support for kMSInternalNHWCDomain and kPytorchAtenDomain op domains to op reduction script.
Make it an error if the op reduction script encounters unknown op domains.
2023-04-11 08:57:51 -07:00
Patrice Vignola
3be5bfe363
[DML EP] Add MatMul + SoftMax fusion (#15240) 2023-04-11 08:31:04 -07:00
Patrice Vignola
7c927bb95c
[DML EP] Add BiasSplitGelu (#15197) 2023-04-11 08:30:37 -07:00
Yi Zhang
311f84d00c
Fix one nuget packaging pipeline error (#15458)
### Description
Fix one typo in #14965 


### Motivation and Context
Fix the error `"onnxruntime_providers_shared.dll not found for win-x64"`
2023-04-11 18:00:10 +08:00
zhijiang
29c74d3c43
softmax perf improvement pr1 - add more softmax related test (#15176)
1. add fp16 test
2. add tests for shapes that are not a power of two.
2023-04-11 17:02:40 +08:00
Ye Wang
ef42fd09fb
google/mt5 optimization and fix (#15454)
### Description
1. enabled self-attention fusion in mt-5 decoder graph
2. fix a parity issue
https://github.com/microsoft/onnxruntime/issues/15042


### Motivation and Context

---------

Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>
2023-04-11 00:09:11 -07:00
Patrice Vignola
c5b6ee1a99
[DML EP] Add NhwcConv (#15194) 2023-04-10 23:16:09 -07:00
cloudhan
9acbfc6a29
ROCm MHA (#15279)
Add MultiHeadAttention for ROCm EP.

**Before:**
```
'engine': 'onnxruntime'
'version': '1.15.0'
'height': 512
'width': 512
'steps': 50
'batch_size': 1
'batch_count': 5
'num_prompts': 1
'average_latency': 3.878769588470459
'median_latency': 3.8792178630828857
'first_run_memory_MB': -1
'second_run_memory_MB': -1
'model_name': 'runwayml/stable-diffusion-v1-5'
'directory': './sd-v1-5-onnx-fp16-nomha'
'provider': 'ROCMExecutionProvider'
'disable_safety_checker': True
```

**After:**
```
'engine': 'onnxruntime'
'version': '1.15.0'
'height': 512
'width': 512
'steps': 50
'batch_size': 1
'batch_count': 5
'num_prompts': 1
'average_latency': 2.364924430847168
'median_latency': 2.3650705814361572
'first_run_memory_MB': -1
'second_run_memory_MB': -1
'model_name': 'runwayml/stable-diffusion-v1-5'
'directory': './sd-v1-5-onnx-fp16'
'provider': 'ROCMExecutionProvider'
'disable_safety_checker': True
```
2023-04-11 13:20:44 +08:00
Yi Zhang
feafbc4263
Refactor all Mac build steps (#15440)
### Description


### Motivation and Context
Make the compilation cache steps easy to use and maintain.
Reduce cache storage.
2023-04-11 12:12:46 +08:00
Changming Sun
d175e87a1f
Delete eager mode code and increase minimal required python version to 3.8 (#15450)
### Description
1. Delete eager mode code.
2. Increase the minimal required python version to 3.8.
2023-04-10 16:00:04 -07:00
Patrice Vignola
4a676b011a
[DML EP] Add BiasAdd (#15211) 2023-04-10 14:46:33 -07:00
Sheil Kumar
ce9ad8c8bc
For HLSL shader ops in the DirectML EP (STFT,DFT) FP16 ops should fal… (#15448)
CP: [For HLSL shader ops in the DirectML EP (STFT,DFT) FP16 ops should fallback to CPU when there is no hardware support #15414](https://github.com/microsoft/onnxruntime/pull/15414)

For HLSL shader ops in the DirectML EP (STFT, DFT), FP16 ops should
fall back to CPU when there is no hardware support.
2023-04-10 13:21:40 -07:00
Shukant Pal
6657df9212
[CoreML EP] Add support for LeakyReLU activation layers (#15327)
## Description

Implements support for LeakyReLU in ActivationOpBuilder for CoreML's EP.

### Motivation and Context

This speeds up inference on macOS significantly for models using
LeakyReLU.
2023-04-10 13:01:55 -07:00
Yulong Wang
0205b63756
[wasm] optimize default session options parsing (#15428)
### Description
Optimize default session options parsing.
- Do minimal property assignment to the passed-in `options` object.
- Change the default value of `enableCpuMemArena` and `enableMemPattern`
to `false`. We don't benefit from enabling these two flags in
WebAssembly.
2023-04-10 11:09:09 -07:00
Changming Sun
c8524d2dab
Refactor web-ci pipeline and delete eager mode CI pipeline (#15416)
### Description
1. Move the web-ci pipeline to a separate pool that uses the same image
as [the public
hosted
pool](https://learn.microsoft.com/en-us/azure/devops/pipelines/agents/hosted?view=azure-devops&tabs=yaml).
Also, create a beta pool that contains the next version of the hosted
pool's image, and add jobs to our post-merge pipeline to test whether
the next image version will break our CI. This way we will usually have
at least one week to prepare.

2. Change the cmake generator used in our pipelines from "Ninja" to
"MinGW Makefiles", because the latest version of cmake doesn't work with
the latest version of Ninja. People who prefer Ninja can still use it in
their local build by passing "--cmake_generator ninja" to
[build.py](https://github.com/microsoft/onnxruntime/blob/main/tools/ci_build/build.py).

3. Delete eager mode CI pipeline. 


### Motivation and Context
I need to update the software we have in our CI build machines, and I
need to resolve this incompatibility issue. In more detail, the build
error I hit was:

em++: error:
CMakeFilesonnxruntime_mlas_test.dirC_a_work1sonnxruntimetestmlasunittesttest_activation.cpp.o:
No such file or directory
("CMakeFilesonnxruntime_mlas_test.dirC_a_work1sonnxruntimetestmlasunittesttest_activation.cpp.o"
was expected to be an input file, based on the commandline arguments
provided)

After this PR, we will deprecate python 3.7 support. The eager mode CI
pipeline is the last one that still uses python 3.7. Then we can rework
the PR #10953 made by [fs-eire](https://github.com/fs-eire) last year.

Fixed
[AB#14435](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/14435)
2023-04-10 10:41:04 -07:00
Hector Li
9ef11f1c6a
[QNN EP] Qnn batchnorm Op support (#15222)
### Description
Support BatchNorm Op in Qnn EP
Node Unit group support for BatchNorm, Exp ops

### Motivation and Context
Enable more models.
2023-04-10 10:36:57 -07:00
Yi Zhang
0ea965c541
clear cache stat. after building (#15439)
### Description
Add `ccache -z` after every build.


### Motivation and Context
The uploaded cache shouldn't include cache stats.
2023-04-10 13:56:55 +08:00
stevenlix
6d126f8996
Add FP16 support for Whisper model (#15427)
Currently, ORT can only run inference for the Whisper FP32 model. This
PR adds FP16 support.
2023-04-08 21:36:10 -07:00
Ye Wang
34f22daf25
Support T5 Beam Search with DecoderMaskedMHA (#15386)
### Description
tldr:
Latency improvement
t5-small: 37.8% 
t5-base: 24.5%


Benchmark on V100

Before:
T5-small
ORT {'test_times': 1, 'latency_variance': '0.00',
'latency_90_percentile': '104.74', 'latency_95_percentile': '104.74',
'latency_99_percentile': '104.74', 'average_latency_ms': '104.74',
'QPS': '19.10', 'parity': True}
T5-base
ORT {'test_times': 1, 'latency_variance': '0.00',
'latency_90_percentile': '200.93', 'latency_95_percentile': '200.93',
'latency_99_percentile': '200.93', 'average_latency_ms': '200.93',
'QPS': '9.95', 'parity': True}



After:
T5-small
ORT {'test_times': 1, 'latency_variance': '0.00',
'latency_90_percentile': '76.01', 'latency_95_percentile': '76.01',
'latency_99_percentile': '76.01', 'average_latency_ms': '76.01', 'QPS':
'26.31', 'parity': True}
T5-base
ORT {'test_times': 1, 'latency_variance': '0.00',
'latency_90_percentile': '161.40', 'latency_95_percentile': '161.40',
'latency_99_percentile': '161.40', 'average_latency_ms': '161.40',
'QPS': '12.39', 'parity': True}
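
For reference, the TL;DR percentages appear to match the ratio before/after minus one, computed from the `average_latency_ms` values listed above, rather than the fraction of latency removed. The short Python check below shows both conventions; the function names are made up for this note.

```python
def relative_gain_pct(before_ms, after_ms):
    # How much longer the old run took relative to the new one, in percent.
    return (before_ms / after_ms - 1.0) * 100.0


def latency_reduction_pct(before_ms, after_ms):
    # Share of the original latency that was removed, in percent.
    return (1.0 - after_ms / before_ms) * 100.0


# average_latency_ms values copied from the benchmark output above.
t5_small = (104.74, 76.01)
t5_base = (200.93, 161.40)

for name, (before, after) in (("t5-small", t5_small), ("t5-base", t5_base)):
    print(
        name,
        f"gain={relative_gain_pct(before, after):.1f}%",
        f"reduction={latency_reduction_pct(before, after):.1f}%",
    )
```

Under the first convention the numbers come out at about 37.8% and 24.5%, consistent with the TL;DR; expressed as a latency reduction they are about 27.4% and 19.7%.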


### Motivation and Context

---------

Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>
2023-04-08 12:50:18 -07:00
Hariharan Seshadri
f77c8f4863
Fix Npm packaging pipeline (#15425)
### Description
It seems that https://github.com/microsoft/onnxruntime/pull/15329
re-worked some jobs in `react-native-ci.yml` into stages. When this
template is used from within `npm-packaging-pipeline.yml`, there is a
problem: a stage ends up containing multiple stages as jobs. As I
understand it, this is not acceptable to Azure DevOps. So this PR
re-works part of `npm-packaging-pipeline.yml` to accommodate the
changes in https://github.com/microsoft/onnxruntime/pull/15329

### Motivation and Context
Fix NPM packaging pipeline
Validating test run with fix:
https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=297391&view=results
2023-04-07 22:13:39 -07:00
Ryan Hill
56beac4b5b
VIT model handling in the Benchmark.sh file (#15045)
### Description
Adds VIT model type to the benchmark
Also adds Swin (v1) model type

### Motivation and Context
Image models are important and we should verify these work as expected
at the performance we expect.
2023-04-07 20:17:29 -07:00
Pranav Prakash
3c5d02a9ce
Implement BatchNormGradient kernel for CPU EP (#7622)
**Description**: Register an implementation for BatchNormInternal and
add a CPU kernel for BatchNormGradient. This is the third in a series of
PRs to implement BN training on CPU (first was #6946, second was #7539).

**Motivation and Context**
Support training networks with BatchNorm (e.g. convnets). Also note that
there exists a CUDA kernel for BN (forward training & backwards) but
it's currently disabled due to flaky failures; someone more familiar
with those parts can register the implementation for BNInternal on CUDA
(gradient kernel doesn't have to change).

---------

Co-authored-by: Simon Zirui Guo <simonguozirui@berkeley.edu>
Co-authored-by: mindest <linminuser@gmail.com>
Co-authored-by: mindest <30493312+mindest@users.noreply.github.com>
2023-04-08 09:20:26 +08:00
Rui Ren
5e2f46df2b
update deepspeed version 0.8.3 (#15415)
### Description
Update the supported DeepSpeed version to 0.8.3, as it is the latest version.


### Motivation and Context
This will fix the error of `Skip modifying optimizer because of
unsupported DeepSpeed version`

Co-authored-by: ruiren <ruiren@microsoft.com>
2023-04-07 17:59:50 -07:00
Edward Chen
666aff56a4
Add workflow to update Objective-C docs. (#15413)
Add workflow to update Objective-C API docs. Remove the Objective-C API doc generation step from the packaging pipeline.

There are similar workflows for automatically updating other language API docs. This change enables this for Objective-C too.
2023-04-07 15:00:15 -07:00