Commit graph

8819 commits

Author SHA1 Message Date
Sheil Kumar
a7ad859e3a
DML EP Register Split18 (#15931)
Register Split18 for DirectML

Split13 was previously implemented. Split18 adds a new attribute called
"num_outputs" that must be used mutually exclusively with the "split"
input.

The "num_outputs" attribute will split the tensor evenly (and also handles
uneven splits). To implement, the DML split tensor just needs to be
overridden in the presence of the "num_outputs" attribute.
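
As a rough illustration of the even-split rule, the sketch below computes the per-output sizes from "num_outputs" in plain Python. It assumes the ONNX Split-18 convention that each chunk has size ceil(dim / num_outputs) and the last chunk is smaller when the dimension is not evenly divisible. `split_sizes` is a hypothetical helper name and is not part of the DML EP; rejecting inputs that would leave a negative last chunk is a choice made for this sketch only.

```python
import math


def split_sizes(dim_size, num_outputs):
    """Return the chunk sizes for splitting `dim_size` into `num_outputs` parts."""
    if num_outputs <= 0:
        raise ValueError("num_outputs must be positive")
    chunk = math.ceil(dim_size / num_outputs)
    sizes = [chunk] * num_outputs
    # Shrink the final chunk so that the sizes add up to dim_size.
    sizes[-1] -= chunk * num_outputs - dim_size
    if sizes[-1] < 0:
        # Assumption of this sketch: reject splits the rule cannot express.
        raise ValueError("dim_size is too small for this many outputs")
    return sizes


print(split_sizes(6, 3))  # even split
print(split_sizes(7, 3))  # uneven split, last chunk is smaller
```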

---------

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>
2023-05-16 11:58:19 -07:00
Yulong Wang
04ea561fc8
[js/webgpu] throw error when WebGPU=ON and SIMD=OFF (#15924)
### Description
throw error when WebGPU=ON and SIMD=OFF
2023-05-16 11:05:56 -07:00
Jian Chen
780442b9f6
Change windows machine pools to use VS2022 (#15806)
### Description

Old pool | New pool | Notes
-- | -- | --
onnxruntime-Win-CPU-2019 | onnxruntime-Win-CPU-2022 |
onnxruntime-Win2019-CPU-training | onnxruntime-Win2022-CPU-training-AMD |
onnxruntime-Win2019-CPU-training-AMD | onnxruntime-Win2022-CPU-training-AMD | Same as the above
onnxruntime-Win2019-GPU-dml-A10 | Needs to be created | You need to create a new image for it first
onnxruntime-Win2019-GPU-T4 | onnxruntime-Win2022-GPU-T4 |
onnxruntime-Win2019-GPU-training-T4 | onnxruntime-Win2022-GPU-T4 | Same as the above because we do not have many T4 GPUs
onnxruntime-tensorrt8-winbuild-T4 | TBD | TBD
Win-CPU-2021 | onnxruntime-Win-CPU-2022 | Will do it in next PR
Win-CPU-2019 | onnxruntime-Win2022-Intel-CPU | Intel CPU needed for win-ci-pipeline.yml -> `stage: x64_release_dnnl`

### Motivation and Context
With VS2022 we can take advantage of the 64-bit compiler. It also comes
with better C++20 support.
2023-05-16 10:34:34 -07:00
RandySheriffH
7faad53632
Set default option for package name and build arg options (#15958)
Set default value for parameters in nuget-zip pipeline, and only apply
the configurations when they are not "NONE".

---------

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-05-16 09:07:38 -07:00
Akash
1079df6aaa
Update StableDiffusion path after cloning repo (#15948)
### Description
Correct path to SD files in README



### Motivation and Context
Small typo in path
2023-05-16 08:39:27 -07:00
Baiju Meswani
6b7181d31d
Add C# API documentation for training (and some other changes) (#15935) 2023-05-16 03:15:24 -07:00
Prathik Rao
a0ccb95f3c
add option to load pretrained weights for T5 model (#15951)
### Description
Adds option to pass in pretrained weights file during T5 inference onnx
export. Mimics the changes made to whisper:
https://github.com/microsoft/onnxruntime/pull/15759

### Motivation and Context
Required for ONNX Runtime demo being presented at BUILD.
2023-05-15 22:52:35 -07:00
PeixuanZuo
e96f10d27b
[ROCm] reduce batch size to fix CI error (#15714)
The ROCm CI batch size test occasionally fails. Try reducing the batch size to fix
it.

error log:
Non-zero status code returned while running FusedMatMul node.
Name:'MatMul_2914_Grad/FusedMatMul_0' Status Message: HIP error
hipErrorNotFound:named symbol not found
Non-zero status code returned while running Gemm node.
Name:'MatMul_2891_Grad/Gemm_5' Status Message: HIP error
hipErrorNotFound:named symbol not found
2023-05-16 13:10:02 +08:00
Aung T Naing
bc5018a4e1
[QNN EP] test coverage for MaxPool (#15904)
### Description
Added MaxPool tests to show the issues with MaxPool and also provide
test coverage

The following tests are currently Failing:
 ./onnxruntime_test_all --gtest_filter=*.TestMaxPool*

[  FAILED  ] 5 tests, listed below:
[  FAILED  ] QnnCPUBackendTests.TestMaxPool_Ceil
[  FAILED  ] QnnCPUBackendTests.TestMaxPool_Large_Input2_Ceil
[  FAILED  ] QnnHTPBackendTests.TestMaxPool_Large_Input_HTP_u8
[  FAILED  ] QnnHTPBackendTests.TestMaxPool_Large_Input2_HTP_u8
[  FAILED  ] QnnHTPBackendTests.TestMaxPool_Large_Input2_Ceil_HTP_u8


### Motivation and Context
Provide test coverage for MaxPool and debug model related issues.
2023-05-15 21:35:50 -07:00
Yulong Wang
22a9a1a630
[js/webgpu] only register webgpu backend when it's available (#15922)
### Description
only register webgpu backend when it's available
2023-05-15 18:09:31 -07:00
cloudhan
dc383ed4ce
Basic CSharp packaging support for ROCm EP (#15535)
This PR mainly fixes build errors when trying to build nupkg for ROCm EP.
It also slightly improves the packaging logic so that developers can
produce the nupkg on Linux natively.
2023-05-16 07:27:38 +08:00
Yulong Wang
204111a79e
[js/webgpu] support proxy for webgpu (#15851)
### Description
[js/webgpu] support proxy for webgpu. fixes #15832
2023-05-15 16:23:13 -07:00
Yulong Wang
f3b8130d1a
[js/web] support npm run pull:wasm [buildID] (#15877)
### Description
support `npm run pull:wasm [buildID]`

remove `npm run pull:wasm:debug` as it can be simply replaced with `npm
run pull:wasm debug`.
2023-05-15 16:19:34 -07:00
Jian Chen
00c1da5e0a
Fixing NhwcFusedConv fp16 (#15950)
### Description
This should produce a fused Resnet50.fp16.onnx

### Motivation and Context
2023-05-15 15:34:41 -07:00
kunal-vaishnavi
5b663d6797
Whisper Multitask and Multilingual (#15936)
### Description
This PR enables Whisper's multitask format and allows a user to use
Whisper for multiple tasks (e.g. transcription, translation) and for
multilingual purposes (e.g. English, Spanish). This PR also removes
`attention_mask` as a required input for Whisper with beam search.

### Usage
Here is an example of how you can use Whisper for English transcription.
```
import numpy as np
import onnxruntime as ort

from datasets import load_dataset
from transformers import AutoConfig, AutoProcessor

model = "openai/whisper-tiny"
config = AutoConfig.from_pretrained(model)
processor = AutoProcessor.from_pretrained(model)

forced_decoder_ids = processor.get_decoder_prompt_ids(language="english", task="transcribe")
# forced_decoder_ids is of the format [(1, 50259), (2, 50359), (3, 50363)] and needs to be 
# of the format [50258, 50259, 50359, 50363] where 50258 is the start token id
forced_decoder_ids = [config.decoder_start_token_id] + list(map(lambda token: token[1], forced_decoder_ids))

ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
input_features = processor(ds[0]["audio"]["array"], return_tensors="np").input_features

inputs = {
  "input_features": np.float32(input_features),
  "max_length": np.array([26], dtype=np.int32),
  "min_length": np.array([1], dtype=np.int32),
  "num_beams": np.array([2], dtype=np.int32),
  "num_return_sequences": np.array([1], dtype=np.int32),
  "length_penalty": np.array([1.0], dtype=np.float32),
  "repetition_penalty": np.array([1.0], dtype=np.float32),
  "decoder_input_ids": np.array([forced_decoder_ids], dtype=np.int32),
}
sess = ort.InferenceSession("whisper-tiny_beamsearch.onnx", providers=["CPUExecutionProvider"])
outputs = sess.run(None, inputs)

# Print tokens and decoded output
print(outputs[0][0][0])
print(processor.decode(outputs[0][0][0]))
```

If you don't want to provide specific decoder input ids or you want
Whisper to predict the output language and task, you can set
`forced_decoder_ids = [config.decoder_start_token_id]` instead.
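
The conversion described above can be sketched without loading any model. The helper name `build_decoder_input_ids` is hypothetical, and the token ids are the ones quoted in the comment of the usage example, not values looked up from a tokenizer.

```python
def build_decoder_input_ids(start_token_id, prompt_ids=None):
    """Flatten [(position, token_id), ...] into [start_token_id, token_id, ...]."""
    if not prompt_ids:
        # No forced ids: let Whisper predict the language and task itself.
        return [start_token_id]
    return [start_token_id] + [token_id for _, token_id in prompt_ids]


# Token ids copied from the comment in the usage example above.
print(build_decoder_input_ids(50258, [(1, 50259), (2, 50359), (3, 50363)]))
print(build_decoder_input_ids(50258))
```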

### Motivation and Context

As seen in the figure below from the [OpenAI Whisper
paper](https://cdn.openai.com/papers/whisper.pdf), Whisper can be used
for multiple tasks and languages.

![Screenshot 2023-05-12 165215](https://github.com/microsoft/onnxruntime/assets/115581922/49335e39-a79c-4f78-92e9-89b034405f65)
2023-05-15 14:36:33 -07:00
Ye Wang
3418ca28a8
pack qkv in t5 decoder (#15801)
### Description
V100, b_4_s_128, max_output_len=64, beam=4

before:
t5_small: 101.28ms
t5_base:  200.07ms

after:
t5_small: 87.65ms
t5_base: 174.44ms
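
For reference, the relative improvement implied by these numbers is roughly 13% for both models. A small sketch of the arithmetic, using only the latencies listed above (`latency_reduction` is a hypothetical helper name):

```python
# Latencies in milliseconds, copied from the before/after numbers above.
before = {"t5_small": 101.28, "t5_base": 200.07}
after = {"t5_small": 87.65, "t5_base": 174.44}


def latency_reduction(before_ms, after_ms):
    """Fraction of the original latency that was removed."""
    return (before_ms - after_ms) / before_ms


for name in before:
    pct = 100.0 * latency_reduction(before[name], after[name])
    print(f"{name}: {pct:.1f}% lower latency")
```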



### Motivation and Context
---------

Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>
2023-05-15 13:45:39 -07:00
liqun Fu
a8d9b29cd2
support AveragePool19 and Pad19 (#15597) 2023-05-15 10:46:24 -07:00
Ryan Hill
22bc020dcd
Ryanunderhill/ortapibase enforce (#15940)
### Description
To prevent people from accidentally changing OrtApiBase, these
static_asserts will fire if anyone adds a method, rearranges the method
ordering, or changes the signature of any of the methods.

### Motivation and Context
People have submitted changes that have done these things and there was
no mechanism to stop them besides someone noticing in a code review. An
automated way to let people know is needed.
2023-05-15 10:35:38 -07:00
yf711
825d691617
Unify cuda & trt version on few CIs (#15943)
### Description
The CUDA and TensorRT versions on some CIs were out of sync with the majority.
This unifies the CUDA version to 11.8 and the TensorRT version to 8.6 on these CIs.
2023-05-15 09:54:30 -07:00
Sheil Kumar
fa16e2e0f3
Register CPU OptionalGetElement, OptionalHasElement on DirectML (#15926)
Register CPU OptionalGetElement, OptionalHasElement on DirectML

Graphs with OptionalGetElement and OptionalHasElement should work in a
DML graph without extra memcpy operation on and off the GPU.

CopyCpuTensor is swapped with DataTransferManager.CopyTensor() to make
the CPU operator usable by other providers.

---------

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>
2023-05-15 09:53:35 -07:00
Rachel Guo
18133ddadb
[doc] add LeakyRelu to coreml supported ops (#15944)
### Description

### Motivation and Context
2023-05-15 09:46:30 -07:00
RandySheriffH
4b1d9d796a
Scope down UT (#15939)
Scope down a unit test case where the condition should only apply when:

1. Two streams, one GPU one CPU;
2. If node is on CPU;
3. There is a wait step before the If node.

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-05-15 09:30:57 -07:00
Adrian Lizarraga
5542e70dd1
[QNN EP] Update default QNN SDK version to 2.10 for QNN NuGet pipeline (#15899)
### Description
Updates the default QNN SDK version to 2.10 for the QNN NuGet pipeline.

### Motivation and Context
Ensures that the daily QNN NuGet pipeline builds ORT using the latest
QNN SDK by default.
2023-05-15 09:17:42 -07:00
cao lei
3b8f3a086f
change the EP device to default OrtDevice() for memoryType equals CPUInput (#15903)
### Description
Change the EP device to the default OrtDevice() when memoryType equals
CPUInput for the CUDA, ROCm, MIGraphX and TensorRT EPs.


### Motivation and Context
My previous PR (https://github.com/microsoft/onnxruntime/pull/15618)
caused random failures on cuda training test
GradientCheckerTest.TileGrad (see build
https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=986784&view=logs&j=5076e696-f193-5f12-2d8a-703dda41a79b&t=a3824a7c-2162-5e3d-3fdd-8cf808834fbb)
and rocm test:

root@a59558217e53:/workspace# pytest
orttraining/orttraining/test/python/orttraining_test_ortmodule_api.py::test_gradient_correctness_minmax
... 
E RuntimeError: Error in backward pass execution: Non-zero status code
returned while running ATen node.
Name:'/_original_module/ATen_Grad/ATen_1' Status Message: Storage size
calculation overflowed with sizes=[72340172838076673, 72340172838076673,
128]

Potential reason is that if the memType of cuda/tensorRT/rocm/migraphx
EP is CPUInput, previously the corresponding device in the IAllocator's
memoryInfo is default OrtDevice(), while after my change, it becomes
OrtDevice(CPU, xx_PINNED, 0);

Changing it back fixed GradientCheckerTest.TileGrad in Win GPU training
build.
2023-05-15 07:42:17 -07:00
PeixuanZuo
af6cb2af87
[ROCm] update ROCm/MIGraphX CI to ROCm5.5 (#15905)
Update ROCm/MIGraphX CI to ROCm 5.5.

TODO:
Two PRs to fix failures in
orttraining/orttraining/test/python/orttraining_test_ortmodule_api.py:
- test_gradient_correctness_minmax/test_gradient_correctness_argmax_unfold/test_gradient_correctness_argmax_diagonal
(https://github.com/microsoft/onnxruntime/pull/15903)
- test_ortmodule_attribute_name_collision_warning
(https://github.com/microsoft/onnxruntime/pull/15884)
2023-05-15 10:28:15 +08:00
Yi Zhang
b20d5e85d5
Update Cuda to 11.8 in 2 Linux GPU workflows. (#15925)
### Description
use template variable for cuda version


### Motivation and Context
2023-05-14 12:51:25 +08:00
Ryan Hill
310273cbe4
BeamScorer to use contiguous arrays for BeamHypotheses (#15923)
### Description
Change BeamHypotheses to not use a std::priority_queue; instead, all
BeamHypotheses use a single buffer that they each get a small slice of.

As the beam count is really small (typically 4 or 8, max of 32) and the
array size is fixed, the BeamHypotheses just does a sorted insert into an
array.

This also allows for the BeamHypotheses inside of the BeamSearchScorer
to be a single fixed allocation vs an onnxruntime::FastAllocVector.
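
The sorted-insert idea can be sketched in Python as follows. This is an illustration only, not the C++ code from the PR: `BeamHypothesesSlice` and its members are hypothetical names, and a preallocated Python list stands in for one hypothesis set's slice of the shared buffer.

```python
class BeamHypothesesSlice:
    """Keeps up to `capacity` (score, tokens) pairs, best score first."""

    def __init__(self, capacity):
        # Fixed-size storage; stands in for a slice of one shared buffer.
        self.slots = [None] * capacity
        self.used = 0

    def add(self, score, tokens):
        if self.used == len(self.slots):
            if score <= self.slots[-1][0]:
                return False  # not better than the worst kept hypothesis
            self.used -= 1  # drop the current worst to make room
        # Shift lower-scoring entries one slot right, then place the new one.
        i = self.used
        while i > 0 and self.slots[i - 1][0] < score:
            self.slots[i] = self.slots[i - 1]
            i -= 1
        self.slots[i] = (score, tokens)
        self.used += 1
        return True


beams = BeamHypothesesSlice(capacity=2)
beams.add(-1.0, [1])
beams.add(-0.5, [2])
beams.add(-2.0, [3])  # rejected: worse than both kept entries
beams.add(-0.7, [4])  # replaces the -1.0 entry
print([score for score, _ in beams.slots[:beams.used]])
```

With a beam count this small, the linear shift is cheap and avoids any per-insert allocation.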

### Motivation and Context
The goal is to simplify the memory usage and make the code more easily
ported to CUDA.
2023-05-13 14:17:45 -07:00
Dmitri Smirnov
896a963492
Adjust GetVersionString() GetBuildInfoString() signatures and move them to OrtApi (#15921)
### Description

This PR partially reverts changes introduced in
https://github.com/microsoft/onnxruntime/pull/15643

We make the two APIs always return a UTF-8 `std::string`.

We also move the entry points from OrtApiBase to OrtApi to make them
versioned.

### Motivation and Context

`GetVersionString` always returns x.y.z numbers that are not subject to
internationalization.
`GetBuildInfoString` can hold international chars, but UTF-8 should be
fine to contain those.
We prefix them with u8"" in case the compiler default charset is not
UTF-8.
Furthermore, creating platform dependent APIs is discouraged.
`ORTCHAR_T` is platform dependent and was created for paths only.
Non-Unix platforms would still produce a `std::string` that can only
contain UTF-8.

The API was introduced after the latest release, and can still be
adjusted.
2023-05-13 13:45:07 -07:00
RandySheriffH
9fe6d58857
Separate execution plan serialization for a new PR. (#15916)
Remove serialization for execution plans, will follow up with another PR
along with proper unit tests.

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-05-13 10:39:54 -07:00
Chester Liu
984dd02df3
Update optimize_pipeline.py to use __name__ detection (#15866)
### Description
Use `__name__` detection in `optimize_pipeline.py`.

### Motivation and Context
It prevents unwanted execution of `main` when importing the file.
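
A minimal sketch of the pattern, assuming a `main` function as the entry point (the body here is a hypothetical stand-in, not the real contents of `optimize_pipeline.py`):

```python
def main():
    # Hypothetical stand-in for the real pipeline entry point.
    print("running optimization pipeline")
    return 0


if __name__ == "__main__":
    # True only when the file is executed directly as a script.
    # When the file is imported, __name__ is the module name, so main()
    # is not called as a side effect of the import.
    main()
```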
2023-05-12 20:43:29 -07:00
Yulong Wang
9328a0f955
[js/webgpu] run test on chrome instead of chrome canary for webgpu (#15902)
### Description
WebGPU is released in Chrome v113, so the test CLI no longer needs to
use Chrome Canary.
2023-05-12 15:47:59 -07:00
Maximilian Müller
143551092f
fix: setting builder optimization level to TRT 8.6 default (#15897)
The actual released default level is 3 and not the previously used 2.

Just a small sample of the effects:
![Screenshot 2023-05-10 at 15 49 55](https://github.com/microsoft/onnxruntime/assets/44298237/5a694446-22c0-4943-9ddf-80670781878f)
2023-05-12 13:29:30 -07:00
Numfor Tiapo
b473d3eee5
Reenable ConstantOfShape TypeTests (#15910)
ConstantOfShape TypeTests were previously broken due to a bug where the
case for the uint64 test was being passed an int64_data_size. Changing
the data type to uint64_data_size fixes the bug.

TensorProto Int8 and Int16 tests are reenabled since they are now
passing.
2023-05-12 11:28:57 -07:00
petermcaughan
e5189330d5
Address OOM Issue when exporting Whisper (#15880)
### Description
Remove attention_mask from unnecessary code paths in the whisper export
process.

### Motivation and Context
The current export script frequently hits an OOM error when exporting
whisper-large. Memory profiling shows that this is a result of
generating dummy inputs for the `encoder_attention_mask` input for a
model pass during exporting - in whisper-large, this dummy tensor can be
around 20GB in size.

`encoder_attention_mask` is ultimately a dummy input - it's just there
to satisfy certain BeamSearch requirements. Thus, we're currently
creating a 20GB tensor and passing it to the model, which then discards
the input anyways. By removing the code path to generate a dummy
encoder_mask tensor, we can reduce the memory requirements to export
whisper substantially, while keeping the BeamSearch checks satisfied.

---------

Co-authored-by: Peter McAughan <petermca@microsoft.com>
2023-05-12 11:23:07 -07:00
Numfor Tiapo
000a600080
DML EP Mark ImageScalar Test As 'won't fix' (#15894)
ImageScalar is an experimental operator added in ONNX 1.2.1 and removed
in ONNX 1.5 so it's no longer in use.

Changing the comment to won't fix.

---------

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>
2023-05-12 10:11:51 -07:00
Hector Li
1bebc88069
[SNPE EP] Add option to enable SNPE init caching feature (#15917)
### Description
[SNPE EP] Add option to enable SNPE init caching feature

### Motivation and Context
To save model initialization time
2023-05-12 07:57:11 -07:00
RandySheriffH
7c4e8267e7
Implement openAI endpoint invoker for nuget (#15797)
Implement openAI audio endpoint, and enable nuget packaging.

---------

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-05-11 22:04:02 -07:00
Yi Zhang
0e7ae13e74
Run Linux GPU tests in docker container (#15872)
### Description
Run Linux GPU tests in docker container

### Motivation and Context
2023-05-12 06:29:22 +08:00
Yufeng Li
902c5f53ae
add cutlass fmha support in PackedAttention (#15838)
### Description
Support cutlass fMHA in PackedAttention. Though we have an fMHA TRT kernel,
it doesn't support relative position bias (RPB). Cutlass fMHA supports
RPB and also supports lower-end GPUs (5.3, 6.x).


### Motivation and Context
2023-05-11 13:47:15 -07:00
Nat Kershaw (MSFT)
27b2815d42
Update publish-csharp-apidocs.yml .NET 5->6 (#15854) 2023-05-11 13:35:53 -07:00
Jian Chen
1a73d61829
Update eigen to 3.4 and remove the eigen from git submodule (#15875)
### Description
Update eigen to 3.4 and remove the eigen from git submodule

### Motivation and Context
We need to have eigen 3.4 for c++20
2023-05-11 11:56:59 -07:00
Changming Sun
7c58d013aa
Remove Ubuntu 18.04 usages (#15781)
### Description
Remove Ubuntu 18.04 usages because it will be EOL this month.

### Motivation and Context
2023-05-11 11:44:00 -07:00
sdegrande
cf062dbdb1
FlatBuffers fails to compile with gcc13. (#15787)
When building the FlatBuffers dependencies, gcc13 emits a
stringop-overflow warning. All warnings being turned into errors, that
fails the compilation of FlatBuffers, and as a consequence also fails
the build of onnxruntime.

This commit adds the application of a patch to FlatBuffers's
CMakeList.txt, to add -Wno-error=stringop-overflow to the
CMAKE_CXX_FLAGS.
2023-05-11 11:20:19 -07:00
Yulong Wang
756cf3a76f
increase web CI timeout (#15876)
### Description
The CI is extremely slow at downloading source code (~1MB/sec), so the
web CI timed out. This is blocking the PR/checks.

Increase the timeout temporarily.
2023-05-11 11:17:46 -07:00
Ryan Hill
e15ab78052
Ryanunderhill/beamsearch simplify (#15883)
### Description
Simplify some sections of code by removing some extra gsl::span
conversions and passing parameter packs by an existing structure vs
directly.

### Motivation and Context
While stepping through the code, I noticed parts that could be
simplified. Simplifying then helped me understand it further.
2023-05-11 09:50:14 -07:00
RandySheriffH
657ab2f43c
Sync between parent node and subgraph (#15757)
While investigating https://github.com/microsoft/onnxruntime/issues/14691, we found
a mis-reuse of GPU memory between NonZero(GPU) and
Identity(GPU), where Identity is a subgraph node in If(CPU).
NonZero produces a GPU output consumed by Transpose(GPU), after which
that GPU output is marked as free in BFCArena and is soon reused by
Identity(GPU) in a subgraph of If(CPU).
However, NonZero(GPU) and Identity(GPU) run on separate CUDA streams, and
there is no synchronization because the Identity node is in a subgraph
of If(CPU). This means Identity(GPU) can write to the memory while
Transpose(GPU) is still reading from it.

---------

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-05-11 09:28:04 -07:00
pengwa
fed52053a7
Refine a bit (on device training) (#15803)
### Few minor refinements:
- Simplify ParameterOptimizerState a bit
- Use inlined containers
- Remove GetStateDict APIs
- Re-enable cuda test for lr scheduler
2023-05-10 20:36:13 -07:00
pengwa
346ec12377
Fix no contrib tests in TVM CI (#15895)
### Fix no contrib tests in TVM CI


Linux_CI / Onnxruntime-TVM 
Windows_CI / Onnxruntime-TVM


```
[----------] Global test environment tear-down
[==========] 2850 tests from 186 test suites ran. (51340 ms total)
[  PASSED  ] 2820 tests.
[  FAILED  ] 30 tests, listed below:
[  FAILED  ] GraphTransformationTests.LayerNormFusionTest
[  FAILED  ] GraphTransformationTests.LayerNormWithCastFusionTest_2
[  FAILED  ] GraphTransformationTests.LayerNormWithCastFusionTest_3
[  FAILED  ] GraphTransformationTests.LayerNormWithCastFusionTest_4
[  FAILED  ] GraphTransformationTests.LayerNormWithCastFusionTest_5
[  FAILED  ] GraphTransformationTests.SimplifiedLayerNormFusionTest
[  FAILED  ] GraphTransformationTests.SimplifiedLayerNormWithCastsFusionTestCudaEp
[  FAILED  ] GraphTransformationTests.SkipLayerNormFusionTest
[  FAILED  ] GraphTransformationTests.SkipLayerNormFusionWithCastTest
[  FAILED  ] GraphTransformationTests.SkipLayerNormFusion_Input_Output_Check
[  FAILED  ] GraphTransformationTests.SkipLayerNormFusion_NoBeta
[  FAILED  ] GraphTransformationTests.EmbedLayerNormFusionFormat1
[  FAILED  ] GraphTransformationTests.EmbedLayerNormFusionFormat2
[  FAILED  ] GraphTransformationTests.EmbedLayerNormFusionFormat3
[  FAILED  ] GraphTransformationTests.EmbedLayerNormFusionFormat3_OpSet13
[  FAILED  ] GraphTransformationTests.EmbedLayerNormFusionFormat3NoCast
[  FAILED  ] GraphTransformationTests.EmbedLayerNormFusionFormat3NoCast_OpSet13
[  FAILED  ] GraphTransformationTests.EmbedLayerNormFusionFormat4
[  FAILED  ] GraphTransformationTests.EmbedLayerNormFusionFormat5
[  FAILED  ] GraphTransformationTests.EmbedLayerNormFusionFormat5_OpSet13
[  FAILED  ] GraphTransformationTests.EmbedLayerNormFusionFormat6
[  FAILED  ] GraphTransformationTests.EmbedLayerNormFusionFormat6_OpSet13
[  FAILED  ] GraphTransformationTests.EmbedLayerNormFusionFormat7
[  FAILED  ] GraphTransformationTests.EmbedLayerNormFusionFormat7_OpSet13
[  FAILED  ] GraphTransformationTests.EmbedLayerNormFusionFormat8
[  FAILED  ] GraphTransformationTests.EmbedLayerNormFusionFormat8_OpSet13
[  FAILED  ] GraphTransformationTests.EmbedLayerNormFusionFormat9
[  FAILED  ] GraphTransformationTests.EmbedLayerNormFusionFormat9_OpSet13
[  FAILED  ] GraphTransformationTests.EmbedLayerNormFusionMultiple
[  FAILED  ] GraphTransformationTests.EmbedLayerNormFusionMultiple_OpSet13
```

Looks related to https://github.com/microsoft/onnxruntime/pull/15844. 


### Motivation and Context
2023-05-11 09:14:13 +08:00
George Nash
19f2cc6fb6
Check the AMX tile configuration is unchanged (#15387)
Don't assume the AMX tile configuration will always remain unchanged. It
is possible that other code will change the AMX tile configuration.

This change will read the current tile configuration:
- if the tile is unconfigured, it will be configured
- if the tile is configured but does not match the expected
configuration, it will be reconfigured for the expected configuration

This resolves issues seen in unit tests when building OneDNN ep.

### Description

### Motivation and Context

---------

Signed-off-by: George Nash <george.nash@intel.com>
2023-05-10 15:50:59 -07:00
liqun Fu
ac9ae9f7c5
update onnx release 1.14 for docker files (#15680)
### Description
this is for ort 1.15 release to work with onnx 1.14
It shall be merged after onnx 1.14 release and before ort 1.15 release.


### Motivation and Context

---------

Signed-off-by: Liqun Fu <liqfu@microsoft.com>
2023-05-10 13:15:56 -07:00