Commit graph

8630 commits

Author SHA1 Message Date
Ye Wang
d00197aaa7
initialize cache_indir explicitly in beamsearch with encoder decoder model (#15667) 2023-04-25 11:05:21 -07:00
Chi Lo
e1755541cc
Fix TRT timing cache test (#15588)
The TRT EP test for the timing cache had incorrect logic: it enabled the
timing cache for both of the sessions whose engine build times it
compared, which is why CI had some intermittent failures.

This PR disables the timing cache test that compares the engine build
time with the timing cache enabled and disabled, until we find a model
that can benefit from the timing cache.
2023-04-25 10:20:26 -07:00
Wei-Sheng Chin
d0c3f92ec6
[DORT] Fix fake tensor problem caused by PyTorch change (#15664)
This should make `Orttraining Linux Lazy Tensor CI Pipeline` green
again.
2023-04-25 19:56:42 +08:00
Yulong Wang
3440d3a08e
remove 'lib/' from .gitignore (#15613)
The 'lib/' entry causes the source folder /js/web/lib/ to be ignored.
2023-04-24 18:43:32 -07:00
Ashwini Khade
124ea0a801
remove compute optimizer from lte (learning on the edge) builds (#15637)
### Description
Removing compute optimizer from on device training builds.

### Motivation and Context
1. mitigate android build failures
2. reduce binary size

Since only CPU EP is enabled for LTE builds, we can optimize the models
offline.
2023-04-24 15:57:15 -07:00
Yulong Wang
14cc02c65c
[js/web] WebGPU backend via JSEP (#14579)
### Description
This change introduced the following new components into ONNX Runtime
Web:
- JavaScript Execution Provider (JSEP)
  - Asynchronized inferencing execution powered by Emscripten's Asyncify
- WebGPU backend implemented in TypeScript
  - initial implementation of kernels:
    - elementwise operators (22)
    - binary operators (5)
    - tensor: Shape, Reshape, Transpose, Gemm
    - nn: Conv, {Global}Maxpool, {Global}AveragePool


The code still needs to be polished; work on it is ongoing.

## Q&A
What is JSEP?
> JSEP, aka JavaScript Execution Provider, is a new ONNX Runtime
execution provider that works specifically in web environments
(browsers). JSEP allows JavaScript code to kick in at various places
while ONNX Runtime runs inference on a model.

Why JSEP?
> JSEP is a hybrid-mode EP that contains both a C/C++ and a
TypeScript/JavaScript implementation. There are 2 strong reasons why we
introduce JSEP:
> 1. The C/C++ part lets JSEP leverage ONNX Runtime's capabilities as
much as possible, including graph transformers, optimizers, and the
ability to fall back to the CPU EP. The TypeScript/JavaScript part makes
kernel implementations much easier to develop and debug in the
browser.
> 2. The asynchronous execution required by JavaScript APIs (e.g.
`buffer.mapAsync()`) makes it impossible to run `OrtRun()` in a
synchronous context (see the "async problem" question below). This is
solved by using Emscripten's Asyncify.

What is WebGPU?
> WebGPU is the new GPU API available in browsers. It is one of only 2
APIs currently available for accessing the GPU from a browser (the
other is WebGL).
> WebGPU is designed with more advanced and more powerful features than
WebGL, and is potentially the solution that offers the best GPU
performance currently available for model inferencing.

What is the async problem and why do we have it?
> The "async problem" is that you cannot call an async function from a
synchronous context. Consider the following C code:
> ```c
> // C-style declarations (API)
> typedef void (*ON_COMPLETE)(PVOID state, DATA *data);
> void read_data_from_file(FILEHANDLE file, ON_COMPLETE on_complete);
> 
> // implementation
> DATA * my_impl_read_data_from_file_sync(FILEHANDLE file) {
>   // how to implement?
> }
> ```
> The answer is that it is impossible to implement this function.
Usually we would either find a synchronous version of the API, or launch
a thread to call the async function and sync-wait on the main thread.
Unfortunately, in a browser environment, neither is possible.
>
> WebGPU does not offer any synchronous API for downloading data (GPU
to CPU). This is the only operation that MUST be async. Because
`OrtRun()` eventually calls into DataTransfer to copy data from GPU to
CPU, and `OrtRun()` is a synchronous function, this cannot be done in
the normal way.
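
For contrast, the "launch a thread and sync-wait" workaround mentioned above can be sketched where threads and blocking waits are available. This is an illustrative Python sketch, not browser or ONNX Runtime code; `read_data_async` and `read_data_sync` are hypothetical names.

```python
import asyncio
import threading


async def read_data_async(payload: bytes) -> bytes:
    """Hypothetical async-only API, standing in for a GPU-to-CPU download."""
    await asyncio.sleep(0.01)  # simulate waiting on the device
    return payload[::-1]


def read_data_sync(payload: bytes) -> bytes:
    """Synchronous wrapper: run the coroutine on a worker thread and block."""
    result = {}

    def worker() -> None:
        # The worker thread owns its own event loop.
        result["data"] = asyncio.run(read_data_async(payload))

    thread = threading.Thread(target=worker)
    thread.start()
    # Blocking wait: the step that, per the text above, is not possible
    # in a browser environment.
    thread.join()
    return result["data"]


print(read_data_sync(b"abc"))  # b'cba'
```

The wrapper depends entirely on `thread.join()` being allowed to block the caller, which is the option the description says is unavailable in the browser.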

What is Emscripten? How does the Asyncify feature resolve the problem?
> Emscripten is the C/C++ compiler for WebAssembly. It is what we use to
compile ORT and generate the WebAssembly artifacts that run in
browsers.
>
> Asyncify is a [compiler
feature](https://emscripten.org/docs/porting/asyncify.html) that allows
calling async functions from a synchronous context. In short, it
generates code to unwind and rewind the call stack to emulate async
execution. With this feature, we are able to call the async function
inside the `OrtRun()` call.
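
As a loose analogy only, the unwind/rewind idea can be pictured with a Python generator: the callee suspends at the point where it needs async data, and a driver resumes it once the data exists. Asyncify itself instruments compiled code and is not built on generators; `ort_run` and `drive` are hypothetical names.

```python
from typing import Generator


def ort_run(tensor_id: int) -> Generator[str, bytes, int]:
    """Stand-in for a synchronous-looking call that needs one async result."""
    # "Unwind": hand control back to the driver with a description of the request.
    data = yield f"download:{tensor_id}"
    # "Rewind": execution resumes here once the driver supplies the data.
    return len(data)


def drive(tensor_id: int, downloaded: bytes) -> int:
    """Plays the role of the runtime that completes the async work and resumes."""
    call = ort_run(tensor_id)
    request = next(call)  # runs until the suspension point
    assert request == f"download:{tensor_id}"
    try:
        call.send(downloaded)  # resume with the async result
    except StopIteration as done:
        return done.value
    raise RuntimeError("ort_run suspended more than once")


print(drive(7, b"\x00\x01\x02\x03"))  # 4
```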

## Design Overview

**Inter-op**

JSEP works in much the same way as any other EP. It exposes an
interface for inter-op with JavaScript, which is defined in
onnxruntime/wasm/js_internal_api.js:
```js
// init JSEP
Module["jsepInit"] = function (backend, alloc, free, copy, copyAsync, createKernel, releaseKernel, run) {
    Module.jsepBackend = backend;
    Module.jsepAlloc = alloc;
    Module.jsepFree = free;
    Module.jsepCopy = copy;
    Module.jsepCopyAsync = copyAsync;
    Module.jsepCreateKernel = createKernel;
    Module.jsepReleaseKernel = releaseKernel;
    Module.jsepRun = run;
};
```
This simple JavaScript snippet defines all the functions at the language
boundary that JSEP requires in order to implement kernels and data
transfers using JavaScript inside ONNX Runtime:
- `jsepBackend`: assigns the singleton backend object to the WebAssembly
module
- `jsepAlloc` and `jsepFree`: implementation of the data transfer's
Alloc() and Free()
- `jsepCopy`: synchronous copy (GPU to GPU, CPU to GPU)
- `jsepCopyAsync`: asynchronous copy (GPU to CPU)
- `jsepCreateKernel` and `jsepReleaseKernel`: manage a corresponding
object in JS whose lifecycle matches that of the Kernel in ORT
- `jsepRun`: OpKernel::Compute() should call into this

The abstraction above keeps the connections and dependencies between
C/C++ and TypeScript/JavaScript to a minimum.
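
To make the shape of this callback surface concrete, here is a minimal Python sketch of a backend that a runtime could drive through alloc/free, create/release kernel, and run calls. The class, the method signatures, and the status codes are all assumptions for illustration; the real callbacks are the JavaScript functions passed to `jsepInit`, and their parameter lists are not shown in this description.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToyBackend:
    """Hypothetical backend; method names loosely mirror the jsep* callbacks."""
    buffers: Dict[int, bytearray] = field(default_factory=dict)
    kernels: Dict[int, str] = field(default_factory=dict)
    next_buffer_id: int = 1

    def alloc(self, size: int) -> int:
        buffer_id = self.next_buffer_id
        self.next_buffer_id += 1
        self.buffers[buffer_id] = bytearray(size)
        return buffer_id

    def free(self, buffer_id: int) -> None:
        del self.buffers[buffer_id]

    def create_kernel(self, kernel_id: int, op_type: str) -> None:
        self.kernels[kernel_id] = op_type

    def release_kernel(self, kernel_id: int) -> None:
        del self.kernels[kernel_id]

    def run(self, kernel_id: int) -> int:
        # Return 0 for success, 1 if the runtime asks for an unknown kernel.
        return 0 if kernel_id in self.kernels else 1


# The "runtime" side owns the lifecycle and only calls through the callbacks.
backend = ToyBackend()
buf = backend.alloc(16)
backend.create_kernel(kernel_id=42, op_type="Relu")
status_ok = backend.run(42)
backend.release_kernel(42)
status_missing = backend.run(42)
backend.free(buf)
print(status_ok, status_missing, len(backend.buffers))  # 0 1 0
```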

**Resource Management**

The lifecycle of tensor data and kernels is managed by ORT (C/C++), but
the implementation is left to JavaScript. The JavaScript code is
responsible for implementing the callbacks correctly.

For WebGPU, the GPU data is managed by JavaScript using a singleton map
(tensor_data_id => GPUBuffer). The GPU pipeline is managed as a
singleton. Shaders are managed using a singleton map (shader_key =>
gpu_program), where shader_key is generated from cache_key (OP specific,
including attributes) and the input shapes.
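
The shader map described above can be sketched as a cache keyed by the operator's cache key plus the input shapes. This is a hedged Python illustration of the keying idea only; `ProgramCache`, `make_key`, and the example cache-key string are hypothetical and do not reflect the actual TypeScript implementation.

```python
from typing import Callable, Dict, Sequence, Tuple

ShaderKey = Tuple[str, Tuple[Tuple[int, ...], ...]]


class ProgramCache:
    """Maps (cache_key, input shapes) to a compiled program."""

    def __init__(self) -> None:
        self._programs: Dict[ShaderKey, str] = {}
        self.compile_count = 0

    @staticmethod
    def make_key(cache_key: str, input_shapes: Sequence[Sequence[int]]) -> ShaderKey:
        # Shapes are converted to tuples so the key is hashable.
        return (cache_key, tuple(tuple(shape) for shape in input_shapes))

    def get_or_build(
        self,
        cache_key: str,
        input_shapes: Sequence[Sequence[int]],
        build: Callable[[], str],
    ) -> str:
        key = self.make_key(cache_key, input_shapes)
        if key not in self._programs:
            self._programs[key] = build()  # compile only on a cache miss
            self.compile_count += 1
        return self._programs[key]


def build_program() -> str:
    return "compiled-program"  # placeholder for a real compiled shader


cache = ProgramCache()
cache.get_or_build("Transpose;perm=0,2,3,1", [[1, 3, 8, 8]], build_program)
cache.get_or_build("Transpose;perm=0,2,3,1", [[1, 3, 8, 8]], build_program)    # reused
cache.get_or_build("Transpose;perm=0,2,3,1", [[1, 3, 16, 16]], build_program)  # new shape
print(cache.compile_count)  # 2
```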

**about data transfer**
`js::DataTransfer::CopyTensor` is implemented to call either the
synchronous or the asynchronous copy callback, depending on whether the
destination is the GPU or not. Emscripten's macro `EM_ASYNC_JS` is used
to wrap the async function so that it can be called in the synchronous
context.
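
The dispatch rule can be sketched as follows. This is a Python stand-in under stated assumptions: `copy_tensor`, `copy_sync`, and `copy_async` are hypothetical names, and `asyncio.run` merely plays the role that the `EM_ASYNC_JS` wrapper plays in the real code.

```python
import asyncio
from enum import Enum


class Device(Enum):
    CPU = "cpu"
    GPU = "gpu"


def copy_sync(src: Device, dst: Device) -> str:
    return "sync"  # placeholder for the synchronous copy callback


async def copy_async(src: Device, dst: Device) -> str:
    await asyncio.sleep(0)  # placeholder for waiting on the GPU download
    return "async"


def copy_tensor(src: Device, dst: Device) -> str:
    """Pick the copy path from the destination device, as described above."""
    if dst is Device.GPU:
        return copy_sync(src, dst)
    # Destination is not the GPU: block until the async copy completes.
    return asyncio.run(copy_async(src, dst))


print(copy_tensor(Device.CPU, Device.GPU))  # sync
print(copy_tensor(Device.GPU, Device.GPU))  # sync
print(copy_tensor(Device.GPU, Device.CPU))  # async
```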

**run kernel in JS**

The Kernel class constructor calls `jsepCreateKernel()` once, with an
optional kernel-specific serialization to pass attributes into
JavaScript.

`Compute()` is implemented so that metadata serialization is performed
in a base class, and JavaScript code can access the data using the
Emscripten-specific builtin macro `EM_ASM_*`.

**disabled features**
Memory pattern is forcibly disabled, because WebGPU data is not
represented by a general memory model (in which a buffer can be
represented by offset + size).
Concurrent run support is disabled. WebGPU is stateful and also has
async function calls. Supporting concurrent runs would significantly
increase the complexity, and we would not get any real benefit from it.

**prefer channels last**
JSEP prefers channels last and returns `DataLayout::NHWC` from the
method `GetPreferredLayout()`. This lets the graph transformers
preprocess the graph into channels-last form so that a more optimized
WebGPU shader can be used.
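
For readers unfamiliar with the two layouts, this small Python sketch compares the flat memory offsets of one pixel's channel values in NCHW and in NHWC (channels last). The helper names are hypothetical; the sketch only illustrates the layouts and says nothing about how the WebGPU shaders are written.

```python
def offset_nchw(n: int, c: int, h: int, w: int, C: int, H: int, W: int) -> int:
    """Flat offset of element (n, c, h, w) in a contiguous NCHW tensor."""
    return ((n * C + c) * H + h) * W + w


def offset_nhwc(n: int, c: int, h: int, w: int, C: int, H: int, W: int) -> int:
    """Flat offset of the same element in a contiguous NHWC tensor."""
    return ((n * H + h) * W + w) * C + c


C, H, W = 3, 2, 2
# Offsets of the three channel values of pixel (h=0, w=1) in batch 0.
nchw = [offset_nchw(0, c, 0, 1, C, H, W) for c in range(C)]
nhwc = [offset_nhwc(0, c, 0, 1, C, H, W) for c in range(C)]
print(nchw)  # [1, 5, 9]
print(nhwc)  # [3, 4, 5]
```

In NHWC the channel values of a pixel sit next to each other, whereas in NCHW they are H * W elements apart.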

**Testing code**
It is impossible to test JSEP directly because JSEP itself does not
contain any kernel implementation. However, it does contain the kernel
registrations, which need to work together with the corresponding
JavaScript code. There are unit tests that run ONNX models through the
JavaScript API.

---------

Co-authored-by: Scott McKay <skottmckay@gmail.com>
2023-04-24 15:21:18 -07:00
George Wu
8dd32fed47
[TensorRT EP] avoid excessive library load/unload overhead when running unit tests. (#15639)
TensorRT loads and unloads libraries as builder objects are created and
torn down. This happens for every single unit test, which leads to
excessive test execution time due to that overhead.
The overhead has steadily increased over the past few TensorRT versions
as the library objects have grown, to the point where running all the
unit tests takes 8 hours. Nvidia suggests keeping a placeholder builder
object around to avoid this.
2023-04-24 14:43:13 -07:00
George Wu
c2acf69d13
support new include,lib dir structure in upcoming QNN 2.11 (#15605)
upcoming QNN 2.11 will have a different include/lib directory structure.
update cmake files to support the new structure.
2023-04-24 13:10:17 -07:00
Ashwini Khade
ccb2243ee7
Update build option for training in java to enable_training_api (#15638)
### Description
Updating the build option for enabling training in java builds from
ENABLE_TRAINING -> ENABLE_TRAINING_APIS.
In the native codebase ENABLE_TRAINING is used for enabling full
training and ENABLE_TRAINING_APIS is used for creating the lte builds
with training apis. Making the change to sync the naming convention
across all the language bindings.

It was a bit confusing to see ENABLE_TRAINING when debugging the android
build failures for training. Making this change just to improve
readability of logs during debugging.

### Motivation and Context
2023-04-24 11:53:08 -07:00
Tianlei Wu
686fd3c22a
Fix cuda 12.1 windows Build (#15614)
### Description
Fix the CUDA 12.1 Windows build error caused by an ambiguous cuda namespace. Use a new namespace for attention softmax.

Tested with VS 2019 and VS 2022 with the following settings:
- OS: Microsoft Windows 11 Enterprise (Version 10.0.22621 Build 22621)
- CUDA: cuda_12.1.0_531.14_windows
- TensorRT: TensorRT-8.6.0.12.Windows10.x86_64.cuda-12.0
- CUDNN: 8.8.1.3 for cuda 12
- Visual Studio Enterprise 2019, version 16.11.26 (MSVC v142) or
  Visual Studio Enterprise 2022 (64-bit), version 17.5.4
- Python: 3.10
- CMake: 3.25.2

VS 2019:
```
build.bat --cmake_generator "Visual Studio 16 2019" --config Release --cmake_extra_defines "CMAKE_CUDA_ARCHITECTURES=52;60;61;70;75;80;86" --skip_submodule_sync --parallel --build_shared_lib --update --build --build_dir .\build\trt --use_cuda --cuda_version "12.1" --cuda_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1" --cudnn_home "C:\CuDNN\8.8.1.3_cuda12" --use_tensorrt --tensorrt_home "C:\TensorRT-8.6.0.12.Windows10.x86_64.cuda-12.0\TensorRT-8.6.0.12"
```

VS 2022:
```
build.bat --cmake_generator "Visual Studio 17 2022" --config Release --cmake_extra_defines "CMAKE_CUDA_ARCHITECTURES=52;60;61;70;75;80;86" --skip_submodule_sync --parallel --build_shared_lib --update --build --build_dir .\build\trt_2022 --use_cuda --cuda_version "12.1" --cuda_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1" --cudnn_home "C:\CuDNN\8.8.1.3_cuda12" --use_tensorrt --tensorrt_home "C:\TensorRT-8.6.0.12.Windows10.x86_64.cuda-12.0\TensorRT-8.6.0.12"
```


### Motivation and Context

https://github.com/microsoft/onnxruntime/issues/15242
2023-04-24 10:02:35 -07:00
cao lei
dc53ddef7a
Create a new C API KernelContext_GetAllocator() for Custom Op scenario (#15591)
### Description
Create a new C API KernelContext_GetAllocator() for Custom Op scenario



### Motivation and Context
Create a new C API KernelContext_GetAllocator() for Custom Op scenario
2023-04-23 21:54:35 -07:00
Hector Li
a8e2833050
[QNN EP]Unblock Qnn EP for Csharp support (#15640)
### Description
Unblock Qnn EP for Csharp support

### Motivation and Context
Enable Csharp support for Qnn EP
2023-04-23 21:28:34 -07:00
Changming Sun
c82bebde6a
Fix the TestCUDAProviderOptions test error (#15649)
The test limits the GPU's running memory to 20MB. That may have been
enough in the past, but it no longer seems to be enough when we upgrade
CUDA to a newer version or add more kernels/graph transformers to our
code. Therefore we need to increase it. Our test log shows that the
model sometimes needs 128MB of memory, so I set the limit to 256MB.
2023-04-24 11:21:59 +08:00
PeixuanZuo
9df1a5e605
[ROCm] enable LayerNorm opset Ver17 for ROCm EP (#15601)
enable LayerNorm opset Ver17 for ROCm EP.
2023-04-24 10:30:06 +08:00
Erick Muñoz
45c82eefb4
[OneDNN] Fix poolgrad bug (#15557)
* Fixed default dilation value for poolgrad ops

### Description
Changed default dilation value to 0 in poolgrad ops



### Motivation and Context
Fixes error on unit tests when --enable_training --use_dnnl flags are
active and
2023-04-23 08:20:26 -07:00
cloudhan
d1354dcc83
[ROCm] Add stable diffusion benchmark results for MI100 (#15646) 2023-04-23 18:29:35 +08:00
cloudhan
8297148bde
[ROCm] Update benchmark for stable diffusion (#15602)
1. update scripts for ROCm memory measurement.
2. update README to contain ROCm result.
3. address some minor issue in the README
2023-04-23 11:49:40 +08:00
cloudhan
9e44248bf9
Workaround ROCm global pool (#15481)
Implement global avg/max pool with reduction
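
The idea of expressing global pooling as a reduction can be sketched in plain Python: each channel is reduced over all of its spatial positions to a single value. The `global_pool` helper and the CHW list layout are assumptions for illustration, not the ROCm implementation.

```python
from typing import List


def global_pool(x: List[List[List[float]]], mode: str) -> List[float]:
    """x is one image in CHW layout; returns one value per channel."""
    out = []
    for channel in x:
        # Flatten the H x W plane, then reduce it to a single number.
        values = [v for row in channel for v in row]
        out.append(max(values) if mode == "max" else sum(values) / len(values))
    return out


x = [[[1.0, 2.0], [3.0, 4.0]], [[-1.0, 0.0], [5.0, 0.0]]]
print(global_pool(x, "avg"))  # [2.5, 1.0]
print(global_pool(x, "max"))  # [4.0, 5.0]
```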
2023-04-23 11:48:43 +08:00
Baiju Meswani
fd6ecc3909
Add env to the TrainingSession constructor (#15635) 2023-04-21 21:05:46 -07:00
Hector Li
fab3e33105
[Qnn EP]Enable Gelu op support (#15631)
### Description
Enable Gelu contrib op support

### Motivation and Context
unblock models with contrib op Gelu
2023-04-21 16:54:34 -07:00
Patrice Vignola
0080bb0331
Add NCHW transpose for GroupNorm (#15634)
It gives about a 2x perf improvement on Stable Diffusion on some
hardware.
2023-04-21 15:18:11 -07:00
Patrice Vignola
b49d428299
[DML EP] Add missing newline to image test logging (#15596) 2023-04-21 13:39:07 -07:00
Tianlei Wu
5a675d9113
Disable random failing DML image batch test (#15624)
### Description
Disable a test with random failure in Windows GPU CI Pipeline like the
following:

```
11: [       OK ] BatchTest/BatchTest.BatchSupport/163 (0 ms)
11: [ RUN      ] BatchTest/BatchTest.BatchSupport/164
11: D:\a\_work\1\s\winml\test\image\imagetests.cpp(186): error: Expected: m_model_binding.Bind(output_data_binding_name, output_video_frames) doesn't throw an exception.
11:   Actual: it throws.
11: D:\a\_work\1\s\winml\test\image\imagetests.cpp(211): error: Expected: m_result = m_session.Evaluate(m_model_binding, L"") doesn't throw an exception.
11:   Actual: it throws.
11: total errors is 0/2073600, errors rate is 0total errors is 0/2073600, errors rate is 0total errors is 0/2073600, errors rate is 0[  FAILED  ] BatchTest/BatchTest.BatchSupport/164, where GetParam() = ((L"fns-candy_Bgr8_Batch3.onnx", 0, { L"1080.jpg", L"fish_720_Gray.png", L"fish_720.png" }, 3, false), 0, 1, 1, 1, 4-byte object <02-00 00-00>) (3203 ms)
```

Since https://github.com/microsoft/onnxruntime/pull/15468 was merged to
main, about 10~15% of build jobs have failed in this test.
2023-04-21 13:29:56 -07:00
Ye Wang
633dec0b17
refactor some code (#15566)
### Description

1. Moved onnxruntime/contrib_ops/cuda/decoder to
onnxruntime/contrib_ops/cuda/bert
2. Created utils.cuh under /bert for shared implementations in
decoder_masked_multihead_attention_impl_utils.h and
rotary_embedding_util.h
3. Refactored relative_attn_bias_impl.cu by reusing the template
specializations in utils.cuh

### Motivation and Context

---------

Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>
2023-04-21 12:57:08 -07:00
Baiju Meswani
b5a1941835
C, C++, Python, C# API update for on device training (#15518) 2023-04-21 11:36:01 -07:00
Zhang Lei
a6d6e45be2
Tune block size for layer_norm considering #rows and GPU resource (#15410)
fine tune cuda layernorm block size considering number of rows to
process together with column number, and hardware resources (number of
SMs, etc)

Co-authored-by: Lei Zhang <phill.zhang@gmail.com>
2023-04-21 09:49:21 -07:00
Rachel Guo
2cb3fb18b5
Integrate React Native E2E test with detox framework (#15133)
### Description

Integrate react native e2e test framework with detox.
https://wix.github.io/Detox/

Good build in CI:

https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=946695&view=results

### Motivation and Context

Write cross-platform end-to-end tests in JavaScript. 
Resolve flaky e2e tests in react native ci pipelines.

---------

Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>
2023-04-21 09:46:26 -07:00
Adrian Lizarraga
f3d04cd1be
[QNN EP] Update Windows ARM64 pipeline to use Visual Studio 2022 (#15607)
### Description
- Updates the QNN Windows ARM64 pipeline to use a new image with Visual
Studio 2022 (updated from VS 2019)
- Creates a new gtest fixture class that skips tests for the QNN CPU
backend if we detect that the QNN CPU backend is not
available/functional. The current windows arm64 vm does not support any
QNN backend.

### Motivation and Context
Visual Studio 2022 adds support for native arm64 compilation. This
pipeline will help catch any build regressions on Windows ARM64 w/ VS
2022.
2023-04-21 09:31:10 -07:00
Yi Zhang
84746a8efe
Revert "Retry the step of Start Android simulator (#15584)" (#15620)
This reverts commit 64b63921a2.


### Motivation and Context
From
https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=970086&view=logs&s=28fb2bf2-39c5-5feb-1887-4904233f6193&j=de302ec2-2305-57e0-e8c6-cd89c569f2a3
It's useless to rerun the step.
2023-04-21 08:33:18 -07:00
kunal-vaishnavi
3de33e00c7
Fix issues for Whisper export with beam search (#15619)
### Description
This PR fixes an issue with calling the ORT transformer optimizer script
on the custom export of Whisper with beam search. It also includes the
[fix](https://github.com/microsoft/onnxruntime/pull/15616) for the GPU
out-of-memory issue.



### Motivation and Context
With this PR fix, the optimizer runs as described in the [Whisper model
optimization PR](https://github.com/microsoft/onnxruntime/pull/15473).
2023-04-21 00:08:58 -07:00
Ted Themistokleous
9011613b65
Add Trilu and GatherND to the list of supported OPs for MIGraphX EP (#15463)
Add support entry for Trilu op to be recognized in the MIGraphX EP

Co-authored-by: Ted Themistokleous <tthemist@amd.com>
2023-04-21 14:46:28 +08:00
Yi Zhang
a2f80a006b
update target framework to dotnet6.0 (#15615)
### Description
Upgrade dotnet E2E test target framework to dotnet6.0


### Motivation and Context
Fix dotnet3.1 deprecation issue which broke nuget building pipeline.
The error message in NuGet_Test_Linux_CPU was
```
To install missing framework, download:
https://aka.ms/dotnet-core-applaunch?framework=Microsoft.NETCore.App&framework_version=3.1.0&arch=x64&rid=ubuntu.20.04-x64
. Please check the diagnostic logs for more information.
```

Test Run:

https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=300655&view=results.
2023-04-21 12:11:43 +08:00
Chi Lo
6cf080ccbf
Temporarily disable two tests for TRT EP (#15578)
We are investigating an issue introduced by TRT 8.6 which causes [TRT EP
CI](https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=967950&view=logs&j=7536d2cd-87d4-54fe-4891-bfbbf2741d83&t=66420422-c7d6-5f71-625c-4b7851c9b9ba)
to fail. Disable two tests for now until the issue is root-caused and
fixed.
2023-04-20 16:32:56 -07:00
Justin Chu
dfa06bf81b
Add link to doc for lintrunner in CI (#15604)
Add a link to point to the doc where users can find instructions to set
up lintrunner should there be any lint issues in CI.
2023-04-20 15:54:14 -07:00
Dmitri Smirnov
a5dec8eedf
[C# ] Improve string marshalling and reduce GC pressure (#15545)
### Description

Reduce the number of auxiliary objects created to reduce GC pressure.
Eliminate GCHandle-based memory pinning in most places.
Improve string marshalling by allocating unmanaged memory that does not
require pinning. Change native methods from `IntPtr` to `byte[]`
(marshalling pinning is more efficient).

Allocate input/output UTF-8 names on the unmanaged heap for the lifetime
of the InferenceSession, so we do not keep converting and pinning them
on every Run.

Introduce a new native API that allows allocating and converting/copying
strings directly into a native tensor.

The PR delivers around 50% latency improvement and fewer GC pauses.

Inspired by: https://github.com/microsoft/onnxruntime/pull/15520

### Motivation and Context
Clients experience GC pressure and performance degradation when dealing
with string tensors.


Co-Authored-By: @tannergooding
2023-04-20 15:12:51 -07:00
Yufeng Li
373f912e51
add quantization support for whisper (#15589)
### Description
Add dynamic quantization support for the Whisper model.
There are 3 options to try out:
- quantize_embedding_layer: whether to quantize the embedding layer of
the decoder model
- quantize_per_channel: whether to quantize per channel for Gemm or
MatMul
- quantize_reduce_range: use 7 bits to quantize MatMul or Gemm. Use this
when hitting accuracy issues on x64 CPUs without VNNI.
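
The commit message does not spell out why 7 bits help. One commonly cited reason, stated here as an assumption, is that x64 integer kernels without VNNI multiply unsigned 8-bit activations by signed 8-bit weights and add adjacent products into a saturating signed 16-bit value. The Python arithmetic below checks the worst case under that assumption; the helper name is hypothetical.

```python
INT16_MAX = 32767


def worst_case_pair_sum(max_activation: int, max_weight_magnitude: int) -> int:
    """Largest magnitude of the sum of two adjacent u8 * s8 products."""
    return 2 * max_activation * max_weight_magnitude


# 8-bit signed weights span [-128, 127]; 7-bit signed weights span [-64, 63].
full_range = worst_case_pair_sum(255, 128)
reduced_range = worst_case_pair_sum(255, 64)

print(full_range, full_range > INT16_MAX)        # 65280 True
print(reduced_range, reduced_range > INT16_MAX)  # 32640 False
```

Under this assumption, full 8-bit weights can overflow the 16-bit intermediate and get clamped, while 7-bit weights always fit.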
2023-04-20 14:22:11 -07:00
Edward Chen
4b74cb1741
Make docker command fail if bash command fails. (#15564)
Add `set -e` so that failing bash commands will cause the containing docker command to fail.
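
The effect of `set -e` can be observed by comparing exit codes. This Python sketch shells out to `sh` (assumed to be available on the machine) rather than to docker, and the two scripts are made-up examples, not the commands from this change.

```python
import subprocess


def exit_code(script: str) -> int:
    """Run a shell script string and return the shell's exit status."""
    return subprocess.run(["sh", "-c", script], capture_output=True).returncode


# Without `set -e`, the failure of `false` is masked by the later `echo`.
without_e = exit_code("false; echo done")
# With `set -e`, the shell stops at the first failing command.
with_e = exit_code("set -e; false; echo done")

print(without_e, with_e)  # 0 1
```

Because a container's exit status follows the command it runs, a non-zero status from the shell is what lets the enclosing docker command fail too.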
2023-04-20 13:38:58 -07:00
Baiju Meswani
46210556f0
BatchnormInternal avoid setting num_channels if input shape is not known (#15544) 2023-04-20 12:57:16 -07:00
Baiju Meswani
11b0a18de6
Add support for cuda 11.8 and python 3.11 for training (#15548) 2023-04-20 12:56:45 -07:00
Justin Chu
1f7c2f724f
Fix lintrunner configurations (#15586)
### Description

- Fix lintrunner configurations to always use `python` instead of
`python3`.
- Set up dependabot
- Moved dependencies to requirements-lintrunner to allow dependabot to
update it similar to https://github.com/onnx/onnx/pull/5124
2023-04-20 08:54:26 -07:00
Adrian Lizarraga
9df96c7d5b
[QNN EP] Fix shape inference of NHWC Resize (#15477)
### Description
Adds schema for NHWC Resize that uses the default ONNX type/shape
inferencing.


### Motivation and Context
The QNN EP requires the Resize operator to be NHWC. Currently, the
Resize operator fails type and shape inference because the current
schema changes the input to NCHW, but the `scales` and `sizes` inputs
remain in NHWC.

This PR adds a schema for NHWC Resize that allows it to use the default
ONNX type/shape inference while still remaining in the internal NHWC
domain.
2023-04-20 07:25:25 -07:00
Scott McKay
446c478fbd
Add iOS Swift Package Manager support (#15297)
### Description
Add Swift Package Manager (SPM) support for ORT based on  #14621
- uses the existing objective-c bindings
- some re-organization of the directory structure was required but the
contents of the files are unchanged, apart from adjustments due to file
movements

Add tool for updating ORT native pod used in the SPM package
Update CIs to use ORT native pod from build, and build/test using SPM



### Motivation and Context
iOS developers are using SPM as much as cocoapods, so adding SPM means
both are catered for.
2023-04-20 16:18:35 +10:00
Yi Zhang
64b63921a2
Retry the step of Start Android simulator (#15584)
### Description
Add a retry once there is a failure in `Start Android Simulator`.

### Motivation and Context
`Start Android Simulator` isn't stable enough and the pipeline would
hang.

We could find many instances in
https://dev.azure.com/onnxruntime/onnxruntime/_pipeline/analytics/stageawareoutcome?definitionId=188&contextType=build
2023-04-20 12:06:35 +08:00
Yi Zhang
5b6f79e79b
Improve windows build cache steps (#15537)
### Description
1. Split the compilation cache for deps from the one for ORT.
2. Reduce cache generation in the merge branch.

### Motivation and Context
Reduce pipeline cache stage.
2023-04-20 09:42:22 +08:00
Chen Fu
29d00fb776
Set proper default values for pool attributes (#15559)
### Description
Setting proper default value for attributes of pool operators


### Motivation and Context
Fixed AB#14719

Global pooling and pooling operators usually share the same underlying
implementation. When we detect that the operator is global, the code for
setting up the attributes is skipped. This may cause nondeterministic
behavior.
2023-04-19 17:24:35 -07:00
George Nash
f2889b41c1
[AMX] Update assembler check (#15501)
A recent commit added an assembler check if the ASM dialect was ATT

This unfortunately broke the AMX build for systems that don't have the
ASM-ATT dialect.

This change assumes that if CMAKE_ASM-ATT_COMPILER_ID is not found and
CMAKE_ASM_COMPILER_ID is "GNU", then, given all the other checks that
have already passed, AMX is supported by the compiler and assembler.

### Description




### Motivation and Context
On my build system the recent change to add the ASM-ATT version check
disabled AMX code from the build.

---------

Signed-off-by: George Nash <george.nash@intel.com>
2023-04-19 14:16:26 -07:00
Chen Fu
142220ad87
Fix cmake 3.25 debug info config (#15565)
### Description

https://github.com/microsoft/onnxruntime/pull/15538
The pull request above breaks the Windows build on cmake 3.25 or
earlier. This change should fix it.


### Motivation and Context
2023-04-19 09:14:19 -07:00
Yi Zhang
573e4cf95f
[Fix] Python Packaging Pipeline exception. (#15568)
### Description
supplement of #15299

### Motivation and Context
The Python Packaging Pipeline has been broken since April 12.
2023-04-19 21:57:14 +08:00
PeixuanZuo
59ea35d592
[ROCm] add CK GroupNorm to GroupNormTunable (#15510)
- Add CK GroupNorm to GroupNormTunable.
- Reduce configuration of GroupNormNHWCOp because CK implementation is
better.

The performance gain on stable diffusion v1.5.
Before:
```
'height': 512
'width': 512
'steps': 50
'batch_size': 1
'batch_count': 5
'num_prompts': 1
'average_latency': 2.4782688856124877
'median_latency': 2.4783748388290405
'provider': 'ROCMExecutionProvider'
'disable_safety_checker': True 
```

After:
```
'height': 512,
'width': 512,
'steps': 50,
'batch_size': 1,
'batch_count': 5,
'num_prompts': 1,
'average_latency': 2.107170510292053,
'median_latency': 2.1067750453948975,
'first_run_memory_MB': -1,
'second_run_memory_MB': -1,
'provider': 'ROCMExecutionProvider',
'disable_safety_checker': True
```
2023-04-19 13:54:59 +08:00
Dmitri Smirnov
a66af390fa
[C#] Allow passing various options when creating singleton Environment object. (#14723)
### Description
Re-work OrtEnv class so we can pass various options when creating the
environment such as:
- logId
- initial logging level
- thread options
- user supplied logging function

Create the default instance when SessionOptions are instantiated as
users often forget to do so.

### Motivation and Context
We lack this capability.
Inspired by
https://github.com/microsoft/onnxruntime/pull/13822
https://github.com/microsoft/onnxruntime/pull/13951
https://github.com/microsoft/onnxruntime/pull/11593


Cc: @thoron

---------

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-04-18 21:49:55 -07:00