Commit graph

8703 commits

Author SHA1 Message Date
Baiju Meswani
bb33285ec2
C# training api updates for on device training (#15720) 2023-05-01 10:01:38 -07:00
shalvamist
c10a6a9d17
Tensor <--> image - Adding per channel compute for Norm mean & Bias (#14705)
### Description
Enabled the use of per channel Bias and Mean normalization when converting an image <--> tensor.
Added a few bug fixes and updates to the relevant E2E tests.

---------

Co-authored-by: shalvamist <shalva.mist@microsoft.com>
2023-05-01 09:37:50 -07:00
Scott McKay
0770cf3699
Remove C# SessionOptions.RegisterCustomOpsUsingFunction. (#15754)
Symbol visibility from DllImport is inconsistent across platforms resulting in the symbol not necessarily being visible to ORT native code that tries to look it up by name.

Best solution is to use DllImport to load the library and to call the registration function directly. That requires the native SessionOptions handle and OrtApiBase struct. We could either make those public, or provide a helper where the user passes in a delegate from their DllImport. Can add when needed.
2023-05-01 09:14:21 -07:00
RandySheriffH
cdf4fc49fc
Implement lite custom op API (#15590)
Implement a set of new APIs for lightweight custom op registration, to
save effort on schema composition.
A few highlights:

1. Support build-time type inference;
2. Support function-as-op for "stateless" ops;
3. Support structure-as-op for "stateful" ops;
4. Support varied input/output forms such as span, scalar, and tensors,
either optional or non-optional.

---------

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-05-01 08:45:26 -07:00
Chen Fu
0e9472d391
NHWC graph optimizer (#15724)
### Description

Augment nhwc graph optimizer to accommodate fp16 operators.


### Motivation and Context

A new fp16 conv operator has been added, and it prefers the NHWC data
layout. We need to augment the existing graph optimizers to make better
use of the new operator.
2023-05-01 08:44:07 -07:00
Chunye Wang@AMD
d35850c142
[VitisAI]Update VitisAI EP to be compatible with VitisAI 3.5 (#15673)
### Description

Originally, the VitisAI EP only worked with an old VitisAI release.


### Motivation and Context

Update the VitisAI EP so that it works with the current VitisAI 3.5
and later versions of VitisAI. We try our best to keep it forward
compatible.

---------

Co-authored-by: Wang Chunye <chunywan@xilinx.com>
Co-authored-by: mingyue <mingyue@amd.com>
Co-authored-by: mingyueliuh <131847423+mingyueliuh@users.noreply.github.com>
Co-authored-by: liumingyue <mingyue@xilinx.com>
Co-authored-by: moore-ch <129165652+moore-ch@users.noreply.github.com>
Co-authored-by: shoucair <shoucai.ren@amd.com>
Co-authored-by: zz002 <zhenze.wang@amd.com>
Co-authored-by: BoarQing <yuz75@Pitt.edu>
Co-authored-by: Yueqing Zhang <yueqingz@amd.com>
Co-authored-by: Scott McKay <Scott.McKay@microsoft.com>
2023-05-01 08:28:26 -07:00
Jeff Bloomfield
3df3a85114
Default kOrtSessionOptionsDisableQuantQDQ to 1 when the DML EP is registered (#15725)
This addresses a performance regression in some INT8 models with the
DirectML EP by defaulting OrtSessionOptionsDisableQuantQDQ to 1 when the
EP is registered.

This regression occurred due to the introduction of the QDQ propagation
transformer, which is based on this session option. That transformer
maximizes the number of nodes which are executed as quantized by
logically propagating quantize operators upstream and dequantize
operators downstream. However, it does this simply by inserting QDQ
pairs, with an expectation that something will recognize sequences of
DQ->Op->Q. This logic and related L2 transformers are not currently
enabled for the DirectML EP.

This change also removes a noisy warning when the session option for
memory pattern is overridden as the DirectML EP is registered.
2023-05-01 08:26:03 -07:00
Tianlei Wu
10dff4f665
only add type info from symbolic shape inference for fp16 conversion (#15617)
### Description

Workaround for https://github.com/microsoft/onnxruntime/issues/15521.


### Motivation and Context
2023-04-30 23:22:11 -07:00
Chi Lo
6e652d0554
Support explicit TRT profiles from provider options (#15546)
Previously, the TRT EP set TRT optimization profiles for dynamic
shape inputs based on the input tensor values. Users could not explicitly
specify the profiles.

This PR lets users specify min/max/opt profiles through
three newly added provider options:

`trt_profile_min_shapes`, `trt_profile_max_shapes` and
`trt_profile_opt_shapes`
with the format of "input1:dim1xdim2...,input2:dim3xdim4...".
(Note: It's similar to --minShapes, --maxShapes and --optShapes of
trtexec command-line
[flags](https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#trtexec-flags))

For example, if you are using onnxruntime_perf_test, you can try this:

`./onnxruntime_perf_test -e tensorrt -r 1 -i
"trt_profile_min_shapes|imgs:1x3x384x288
trt_profile_max_shapes|imgs:32x3x384x288
trt_profile_opt_shapes|imgs:16x3x384x288" your_model_path`

If the engine cache is enabled, you still need to provide these three
explicit provider options in order to use this feature. ORT TRT will
compare the min/max/opt profile shape with the ones saved in .profile
file to decide whether to rebuild the engine.

Constraint on using these provider options: min/max/opt profile shapes
must be specified for all dynamic shape inputs.
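The option strings described above could be assembled in Python roughly as follows. This is only a sketch: `format_profile_shapes` is a hypothetical helper, not part of ORT, and the commented-out session creation assumes a TensorRT-enabled onnxruntime build. Only the three option names and the "input:dim1xdim2,..." format come from this description.

```python
def format_profile_shapes(shapes):
    """Render {"imgs": [1, 3, 384, 288]} as "imgs:1x3x384x288".

    Hypothetical helper: multiple inputs are joined with commas, matching
    the "input1:dim1xdim2...,input2:dim3xdim4..." format described above.
    """
    return ",".join(
        f"{name}:{'x'.join(str(d) for d in dims)}" for name, dims in shapes.items()
    )


# The same shapes as the onnxruntime_perf_test example above.
trt_options = {
    "trt_profile_min_shapes": format_profile_shapes({"imgs": [1, 3, 384, 288]}),
    "trt_profile_max_shapes": format_profile_shapes({"imgs": [32, 3, 384, 288]}),
    "trt_profile_opt_shapes": format_profile_shapes({"imgs": [16, 3, 384, 288]}),
}

# Assumed usage, not executed here:
# import onnxruntime as ort
# sess = ort.InferenceSession(
#     "your_model_path",
#     providers=[("TensorrtExecutionProvider", trt_options)],
# )
```

All three options are built together here because the constraint above requires min, max and opt shapes for every dynamic shape input.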

 

This feature is also requested by other users:
https://github.com/microsoft/onnxruntime/issues/13851
2023-04-30 22:30:26 -07:00
Scott McKay
31e7d3d7d4
Disable TestRegisterCustomOpsWithFunction on Linux (#15747)
### Description
Disable a new test that is failing on Linux. It is not required for this
release and will be fixed in the next week.

Marshal.Prelink can be used on Windows to make the symbol available but
Linux appears to work differently.
Also need to update the pre-checkin tests so this is tested early as
it's only failing in the E2E tests run in the packaging pipeline.

### Motivation and Context
Fix packaging pipeline error.
2023-04-30 14:39:02 +10:00
Changming Sun
176161348e
Revert "make nuget workflow easy to debug. (#15693)" (#15744)
This reverts commit 53ff50d19a because it
makes the nuget pipeline fail.
2023-04-29 19:05:01 -07:00
kunal-vaishnavi
7ae01cec15
Update wheel path to Whisper custom export script (#15739)
### Description
This PR updates the documentation for using the Whisper custom export
scripts via the wheel.



### Motivation and Context
The path should say
`onnxruntime.transformers.models.whisper.convert_to_onnx` instead of
`onnxruntime.transformers.models.convert_to_onnx`.
2023-04-29 17:32:34 -07:00
sfatimar
4fbc08e3c2
VPUX config fix and dynamic_shape bug fixed. (#15737)
Dynamic shapes were not working with the serialized model, so we are switching
to the compile model method.

### Motivation and Context
Dynamic shapes were not working with the serialized model.

Signed-off-by: MaajidKhan <n.maajid.khan@intel.com>
Co-authored-by: MaajidKhan <n.maajid.khan@intel.com>
2023-04-29 15:48:34 -07:00
Adrian Lizarraga
d32c540b2d
[QNN EP] Support LRN operator (#15741)
### Description
Adds support for the LRN operator to QNN EP.

### Motivation and Context
Enables basic models like googlenet and alexnet to run entirely on QNN
EP.
2023-04-29 13:23:42 -07:00
Changming Sun
65020d433e
Prefast fixes for CUDA EP (#15726)
### Description
1. Adjust cmake flags. Do not modify CMAKE_CXX_FLAGS globally. Only
apply the flags to ORT code.
2. Fix some SDL warnings.
2023-04-29 12:43:12 -07:00
Jian Chen
ec2f038c6d
Update Nuget pipeline's Linux CUDA job to cuda 11.8 (#15516)
### Description
Fixed AB#14497
2023-04-29 07:38:18 -07:00
Rachel Guo
c8bd34f975
[js/rn] Package dependency change to manage ort-extensions for react_native app (#15641)
### Description

js/react_native package dependency change to manage ort-extensions for
react-native app.

Enable optional inclusion of the ort-ext aar / ort-ext pods for react-native
extensions apps when `ortExtensionsEnabled` is specified in the user's
package.json file.


### Motivation and Context

---------

Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>
Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
2023-04-29 00:07:12 -07:00
Yuhong Guo
41dcf0d32e
Expose build information in dynamic lib (#15643)
### Description
1. Add Build Info API to onnx.
2. Fix compile error while building onnxruntime_benchmark on macOS.


### Motivation and Context
1. When the Onnxruntime lib is serving online, we need a way to detect how
this lib was built. This PR helps the developer get the build
information, such as git branch, git commit id, build
type and cmake cxx flags, using `strings`, as shown below.


![image](https://user-images.githubusercontent.com/19584326/233794371-b2f95a2c-27fb-4709-a6dd-bf4bb12b0b5b.png)


![image](https://user-images.githubusercontent.com/19584326/233794360-f96f5d2e-332c-405c-83f1-370ccc2b86f8.png)

If the build env has no git, there will be no git-related info:


![image](https://user-images.githubusercontent.com/19584326/234558596-298c1b01-9a90-41bf-9372-7259a8f8e5be.png)


2. Fix the following compile error while building the benchmark on macOS.

![image](https://user-images.githubusercontent.com/19584326/233793571-c261ac1f-47b2-434d-a293-7e9edc6c8a66.png)

---------

Co-authored-by: Yuhong Guo <yuhong.gyh@antgroup.com>
2023-04-28 21:57:31 -07:00
Adrian Lizarraga
191deb4235
[QNN EP] Nuget package (#15711)
Adds pipeline for QNN NuGet package (x64 and arm64).
2023-04-28 19:33:14 -07:00
Rachel Guo
6a6091a519
[rn] Add support for loading model from buffer on iOS (#13802)
### Description

- Add support for loading model from buffer on iOS
- Update OnnxruntimeModuleTest to use updated loadModelFromBuffer
Based on #12676

### Motivation and Context

Issue: #12500

---------

Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>
2023-04-28 17:34:26 -07:00
kunal-vaishnavi
fe1ddd7b61
Fix bug when adding Whisper to wheel (#15708)
### Description
This PR adds `onnxruntime.transformers.models.whisper` to the wheel.

### Usage
There is a README.md document that shows sample commands. The following
command will show how to use the custom Whisper export script in more
detail.
```
$ python3 -m onnxruntime.transformers.models.whisper.convert_to_onnx --help
```

### Motivation and Context
This fixes an issue with adding the Whisper custom export scripts to the
wheel. The Whisper folder now appears in the wheel.
![Screenshot 2023-04-26
143705](https://user-images.githubusercontent.com/115581922/234708587-6d1b7d34-71a9-4f9f-a491-657ceb25afcb.jpg)
2023-04-28 16:03:55 -07:00
liqun Fu
2802c547a1
update OnnxMl.cs (#15702) 2023-04-28 11:20:29 -07:00
pengwa
29d13cea42
Cumulative update on optimizers and tests (on-device training) (#15499) 2023-04-28 09:55:39 -07:00
Adam Pocock
8a1a40ac63
[Java] CheckpointState AddProperty & GetProperty support (#15730) 2023-04-28 09:52:52 -07:00
Chen Fu
be08b47e7b
Refine cast optimizer for safety (#15658)
### Description

Cast optimizer may convert an fp16 node to fp32. This used to be safe as
all fp16 kernels had an fp32 implementation. As this assumption is no
longer true, we need to check the validity of the operation.



### Motivation and Context

Main work here is to introduce an API to check whether a kernel is
registered. Currently we don't have a way to do that without an operator
node. This needs to be augmented. We need to query whether a kernel is
registered by its property only, so that we can judge whether it is safe
to construct a node long before we actually do so.
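The idea of querying a kernel by its properties alone can be illustrated with a toy registry. This is a sketch under stated assumptions, not the ORT API: `REGISTERED_KERNELS`, `is_kernel_registered` and `safe_to_convert_to_fp32` are hypothetical names, and a real registry would also key on provider, domain and opset.

```python
# Hypothetical registry of (op type, element type) pairs with a kernel.
# "Conv" stands in for an operator that only has an fp16 kernel.
REGISTERED_KERNELS = {
    ("Conv", "float16"),
    ("Add", "float16"),
    ("Add", "float32"),
}


def is_kernel_registered(op_type, elem_type, registry=REGISTERED_KERNELS):
    """Answer the query from properties only; no graph node is needed."""
    return (op_type, elem_type) in registry


def safe_to_convert_to_fp32(op_type, registry=REGISTERED_KERNELS):
    """An fp16 node may be rewritten to fp32 only if an fp32 kernel exists."""
    return is_kernel_registered(op_type, "float32", registry)
```

With this registry, `safe_to_convert_to_fp32("Add")` is true while `safe_to_convert_to_fp32("Conv")` is false, so the optimizer would leave the fp16 `Conv` alone.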
2023-04-28 09:32:54 -07:00
Edward Chen
c415bc725f
Add 'name' key to xcodebuild 'destination' option. (#15690) 2023-04-28 08:52:18 -07:00
Jian Chen
c401cf4b51
Fix issue there 9573-quantizing-distilbert-models-after-optimizing-wi… (#15659)
…th-ort-leads-to-invalid-node-input-names

### Description
Fix issue where Quantizing DistilBERT models after optimizing with ORT
leads to invalid node input names



### Motivation and Context
2023-04-28 08:45:20 -07:00
Scott McKay
7e6331d5c7
Add ability to register custom ops from ORT extensions nuget package (#15696)
### Description
Add infrastructure so it's easy for a user to add the ORT extensions
nuget package and register the custom ops for C# apps.

### Motivation and Context
Need to be able to use extensions on mobile platforms with Xamarin/MAUI
2023-04-28 18:53:02 +10:00
Yulong Wang
94c9a31f83
[js/webgpu] fix download failure due to buffer change (#15723)
### Description
fix download failure due to buffer change.

WebAssembly buffer may change (growth triggered by memory allocation)
during an async function call.
2023-04-28 00:16:31 -07:00
Linnea May
2c3697be00
User/linneamay/reduce 18 (#15701)
### Description
Add registration for DML reduce functions in opset 18. 


### Motivation and Context

---------

Co-authored-by: Linnea May <linneamay@microsoft.com>
2023-04-27 20:32:11 -07:00
Changming Sun
5b826b1bc3
Update cmake version in Linux build (#15707)
### Description
All our Windows build pipelines already use cmake 3.26 except one
pipeline: QNN ARM64.
This PR does the same for Linux build pipelines.

### Motivation and Context
This change is related to #15704 .
2023-04-27 20:02:33 -07:00
Edward Chen
9db24f8fec
Update kernel registration validation to allow kernel registrations to appear in arbitrary order. (#15705)
The validation script will now sort them by increasing opset order before processing them.
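A minimal sketch of the sort-then-validate idea, assuming registrations are represented as `(op_type, start_opset, end_opset)` tuples with `None` for an open-ended range. The tuple layout and the overlap check are illustrative assumptions, not the actual validation script.

```python
def find_overlaps(registrations):
    """Sort registrations by increasing opset, then check adjacent ranges.

    Each registration is a hypothetical (op_type, start_opset, end_opset)
    tuple; end_opset is None when the registration is open-ended.
    """
    errors = []
    last_end = {}  # op_type -> end opset of the previous registration
    for op_type, start, end in sorted(registrations, key=lambda r: (r[0], r[1])):
        if op_type in last_end:
            prev_end = last_end[op_type]
            if prev_end is None or prev_end >= start:
                errors.append(f"{op_type}: opset {start} overlaps previous registration")
        last_end[op_type] = end
    return errors


# Registrations listed out of order are accepted once sorted.
unordered = [("Relu", 14, None), ("Relu", 6, 12), ("Relu", 13, 13)]
print(find_overlaps(unordered))
```

Sorting first means the check no longer depends on the order in which the registrations appear in the source.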
2023-04-27 18:49:31 -07:00
kunal-vaishnavi
39d6d7050d
Change EmbedLayerNormalization mask index output to optional (#15526)
### Description
This PR changes an EmbedLayerNormalization node's mask index output to
be an optional output if a mask input is not provided.



### Motivation and Context
The documentation for EmbedLayerNormalization states 
```
The last input mask is optional. If mask is provided, mask index (that is position of first 0 in mask, or number of words) will be calculated.
```
However, if the mask input is not provided, the mask index output is
still calculated and required.
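The intended behavior can be sketched as below, following the documentation quoted above (mask index is the position of the first 0 in the mask, or the number of words if there is none). `compute_mask_index` is a hypothetical helper for illustration, not the operator's implementation.

```python
def compute_mask_index(mask=None):
    """Return one mask index per batch row, or None when no mask is given.

    The mask index of a row is the position of its first 0, or the row
    length (number of words) if the row contains no 0.
    """
    if mask is None:
        return None  # optional output: nothing to compute without a mask
    indices = []
    for row in mask:
        first_zero = next((i for i, v in enumerate(row) if v == 0), len(row))
        indices.append(first_zero)
    return indices


print(compute_mask_index([[1, 1, 0, 0], [1, 1, 1, 1]]))
print(compute_mask_index(None))
```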
2023-04-27 16:32:42 -07:00
Yulong Wang
d471432e10
[js/webgpu] fix attribute cache key for 2 operators (#15710)
### Description
fix attribute cache key for LeakyRelu and ThresholdedRelu
2023-04-27 15:04:33 -07:00
Yulong Wang
c0116af619
[js/webgpu] operator Exp (#15713)
### Description
operator Exp
2023-04-27 15:04:09 -07:00
Tang, Cheng
627f5c9767
support allgather on different axis (#15610)
### Description
Extend the AllGather op to support performing allgather on different axes.
Provide the implementation in the NCCL kernels.

### Motivation and Context
We hit some scenarios in distributed inference where we need to support
gather on a non-first axis.
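As an illustration of what gathering on a non-first axis means, the sketch below assumes the gathered result is the concatenation of each rank's tensor along the chosen axis. It uses plain nested lists, and `concat_along_axis` is a hypothetical helper; it says nothing about how the NCCL kernels implement this.

```python
def concat_along_axis(tensors, axis):
    """Concatenate equally shaped nested-list tensors along `axis`."""
    if axis == 0:
        return [row for t in tensors for row in t]
    # Recurse into each leading index and concatenate one level deeper.
    return [
        concat_along_axis([t[i] for t in tensors], axis - 1)
        for i in range(len(tensors[0]))
    ]


rank0 = [[1, 2], [3, 4]]
rank1 = [[5, 6], [7, 8]]
print(concat_along_axis([rank0, rank1], axis=0))  # shape [4, 2]
print(concat_along_axis([rank0, rank1], axis=1))  # shape [2, 4]
```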

---------

Co-authored-by: Cheng Tang <chenta@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
Co-authored-by: Wei-Sheng Chin <wschin@outlook.com>
2023-04-27 14:47:28 -07:00
Sheil Kumar
5bde1e8e37
Add Bluestein Z-Chirp Algorithm to DirectML DFT implementation (#15686)
Add Bluestein Z-Chirp Algorithm to DirectML DFT implementation

This will enable STFT and DFT on signals whose lengths are not powers of 2.
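For readers unfamiliar with the technique, here is a small pure-Python sketch of Bluestein's algorithm: it rewrites a length-N DFT as a convolution with a chirp, which can then be evaluated with power-of-2 FFTs. This is a reference illustration only and is unrelated to the DirectML implementation; `fft_pow2` and `bluestein_dft` are names chosen for this example.

```python
import cmath


def fft_pow2(values, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(values) must be a power of 2."""
    n = len(values)
    if n == 1:
        return list(values)
    sign = 1 if inverse else -1
    even = fft_pow2(values[0::2], inverse)
    odd = fft_pow2(values[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def bluestein_dft(signal):
    """DFT of arbitrary length via chirp multiplication and convolution."""
    n = len(signal)
    if n == 0:
        return []
    # chirp[i] = exp(-i*pi*i^2/n); i^2 is reduced mod 2n to keep angles small.
    chirp = [cmath.exp(-1j * cmath.pi * ((i * i) % (2 * n)) / n) for i in range(n)]
    m = 1
    while m < 2 * n - 1:  # convolution length: power of 2, at least 2n-1
        m *= 2
    a = [signal[i] * chirp[i] for i in range(n)] + [0j] * (m - n)
    b = [0j] * m
    for i in range(n):
        b[i] = chirp[i].conjugate()
        if i:
            b[m - i] = chirp[i].conjugate()  # wrap negative lags around
    fa, fb = fft_pow2(a), fft_pow2(b)
    conv = fft_pow2([x * y for x, y in zip(fa, fb)], inverse=True)
    return [chirp[k] * conv[k] / m for k in range(n)]


spectrum = bluestein_dft([1.0, 2.0, -1.0, 0.5, 3.0])  # length 5, not a power of 2
```

The trick relies on the identity nk = (n² + k² − (k − n)²) / 2, which turns the DFT kernel into a product of two chirps and a term that depends only on k − n.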
2023-04-27 14:03:40 -07:00
Adrian Lizarraga
be5c582e65
[QNN EP] Update to QNN SDK 2.9.0 (#15709)
### Description
- Update to QNN SDK 2.9.0 for QNN pipelines
- Temporarily disable warnings as errors for QNN Windows x64 pipeline
- Note that this pipeline did not previously run to completion. It also
currently does not run for pull requests.

### Motivation and Context
Need to update and test the latest available version of the QNN SDK.
2023-04-27 13:44:09 -07:00
RandySheriffH
9773e76c44
Single-schema-multi-kernel (#15184)
This PR allows custom ops with different input types to share the same op
name in a graph.
The idea is to go over all ops with the same name and merge their input/output
types into a type-inference function.
With this enhancement, custom op nodes inside a graph can have the same
op type provided that their input/output types are different.
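The merging step might look roughly like the following sketch, which assumes each kernel is described by a hypothetical `(op_name, input_types, output_types)` tuple. It only illustrates the idea of one inference function per op name; it is not how ORT stores schemas.

```python
from collections import defaultdict


def merge_type_inference(kernels):
    """Merge same-named kernels into a single type-inference function.

    `kernels` is a list of hypothetical (op_name, input_types, output_types)
    tuples, where the type fields are tuples of type names.
    """
    merged = defaultdict(dict)  # op_name -> {input_types: output_types}
    for op_name, input_types, output_types in kernels:
        if input_types in merged[op_name]:
            raise ValueError(f"duplicate kernel for {op_name} with inputs {input_types}")
        merged[op_name][input_types] = output_types

    def infer(op_name, input_types):
        """Pick the output types of the kernel matching these input types."""
        return merged[op_name][tuple(input_types)]

    return infer


infer = merge_type_inference([
    ("MyOp", ("float",), ("float",)),
    ("MyOp", ("int64",), ("int32",)),
])
print(infer("MyOp", ["int64"]))
```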

---------

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-04-27 13:39:59 -07:00
Changming Sun
d3d232b047
Rename onnxruntime-Linux-CPU-2019 machine pool (#15691)
Rename onnxruntime-Linux-CPU-2019 machine pool to
"onnxruntime-Ubuntu2004-AMD-CPU". The old one hit an internal error and is
stuck. I cannot make any change to it, and it has been like this for
more than 1 week. So I created a new pool with the same settings under
a different name.
Also, move some android pipelines to
"onnxruntime-Linux-CPU-For-Android-CI" which uses a standard image from
https://github.com/actions/runner-images
2023-04-27 12:46:18 -07:00
Chi Lo
a957a872d3
Patch fix for the newly added TRT EP provider options (#15687)
We missed some code change with recently added TRT EP provider options
2023-04-27 10:36:01 -07:00
Changming Sun
d3e8d7a70d
Better support for cmake 3.26 and Windows ARM64 (#15704)
### Description

In #8953 I introduced a change in our onnxruntime_mlas.cmake so that it
enables the "ASM_MASM" cmake language for all Windows builds.
```cmake
enable_language(ASM_MASM)
```
Before the change, it was only enabled when onnxruntime_target_platform
equals x64.

However, cmake 3.26 added a new language:  ASM_MARMASM.

According to cmake's manual,
ASM_MASM is for Microsoft Assembler
ASM_MARMASM is for Microsoft ARM Assembler. This one is new in cmake
3.26.

We should choose the right one according to
${onnxruntime_target_platform}.
2023-04-27 10:25:45 -07:00
yf711
2e1f92a986
Fix EP Perf pipeline (#15507)
### Description
* Update TensorRT 8.6 lib dependencies in dockerfile of TRT EP Perf
pipeline
* Avoid using `--allow_running_as_root` and build ORT with non-root user


### Motivation and Context
To fix the build issue on EP perf pipeline

Fixed
[AB#14615]
2023-04-27 10:09:14 -07:00
Yi Zhang
8cda1ffa28
Fix error in post-merge pipeline (#15717)
### Description
Get the right drive letter on Windows

### Motivation and Context
Build Directory might be in drive C
2023-04-27 10:05:15 -07:00
cloudhan
a952419674
[ROCm] Fix FusedConv to stop caching fusion args (#15671)
The following code shows that the ROCm EP FusedConv produces incorrect results:
```py
import numpy as np
import onnx
import onnxruntime as ort

X = onnx.helper.make_tensor_value_info("input", onnx.TensorProto.FLOAT, [1, 64, 55, 55])
a = onnx.helper.make_tensor_value_info("tmp", onnx.TensorProto.FLOAT, [1, 64, 55, 55])
Y = onnx.helper.make_tensor_value_info("output", onnx.TensorProto.FLOAT, [1, 64, 55, 55])

weight_data = np.random.random([64, 64, 1, 1]).astype(np.float32)
weight1 = onnx.helper.make_tensor("weight1", onnx.TensorProto.FLOAT, [64, 64, 1, 1], weight_data)
bias_data = np.random.random(64).astype(np.float32)
bias1 = onnx.helper.make_tensor("bias1", onnx.TensorProto.FLOAT, [64], bias_data)

weight_data = np.random.random([64, 64, 1, 1]).astype(np.float32)  # <------ comment out
weight2 = onnx.helper.make_tensor("weight2", onnx.TensorProto.FLOAT, [64, 64, 1, 1], weight_data)
bias_data = np.random.random(64).astype(np.float32)  # <------ comment out
bias2 = onnx.helper.make_tensor("bias2", onnx.TensorProto.FLOAT, [64], bias_data)

node1 = onnx.helper.make_node("FusedConv", inputs=[X.name, weight1.name, bias1.name], outputs=[a.name], domain="com.microsoft", kernel_shape = [1,1], activation="Relu")
node2 = onnx.helper.make_node("FusedConv", inputs=[a.name, weight2.name, bias2.name], outputs=[Y.name], domain="com.microsoft", kernel_shape = [1,1], activation="Relu")

graph = onnx.helper.make_graph([node1, node2], "Graph", [X], [Y], initializer=[weight1, bias1, weight2, bias2])

model = onnx.helper.make_model(graph, producer_name="tmp", opset_imports=[
    onnx.helper.make_opsetid('com.microsoft', 1), 
    onnx.helper.make_opsetid('ai.onnx.ml', 1), 
    onnx.helper.make_opsetid('', 14),
])

sess0 = ort.InferenceSession(model.SerializeToString(), providers=["CPUExecutionProvider"])
sess1 = ort.InferenceSession(model.SerializeToString(), providers=["ROCMExecutionProvider"])

ref = sess0.run(["output"], {"input" : 0.05 * np.ones([1, 64, 55, 55], dtype=np.float32)})[0]
our = sess1.run(["output"], {"input" : 0.05 * np.ones([1, 64, 55, 55], dtype=np.float32)})[0]

print(ref - our)
```

The root cause is that the fusion args are cached together with the fusion plan.
It seems that, internal to MIOpen, the `miopenOperatorArgs_t` handle is
copied directly to the execution engine, instead of the content of the
`miopenOperatorArgs_t`. If two ORT `OpKernel`s have the same conv kernel
spatial dimensions, strides, etc., we get the same hash for the
fusion plan, and thus also the same fusion args handle. The
second `FusedConv` node may then modify the fusion args on the fly while the
first `FusedConv` node is still pending execution inside
MIOpen. This PR moves the fusion args out of the fusion plan cache to avoid
the problem.
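The shape of the fix can be shown with a toy model in Python. Everything here is hypothetical (`plan_cache`, `get_plan`, `FusedConvNode`); it only illustrates keeping the shared plan in the cache while each node owns its own args, and does not model MIOpen.

```python
plan_cache = {}


def get_plan(key):
    """Return the cached fusion plan for this configuration (shared by design)."""
    return plan_cache.setdefault(key, {"plan": f"plan<{key}>"})


class FusedConvNode:
    def __init__(self, key, bias):
        self.plan = get_plan(key)   # shared by nodes with the same hash
        self.args = {"bias": bias}  # owned by the node, never cached


# Two nodes with identical conv configuration but different weights/bias.
key = ("kernel=1x1", "stride=1", "Relu")
node1 = FusedConvNode(key, bias="bias1")
node2 = FusedConvNode(key, bias="bias2")

print(node1.plan is node2.plan)  # the plan is still shared
print(node1.args, node2.args)    # the args are independent
```

Had `args` lived inside the cached entry, constructing `node2` would have overwritten the values `node1` was still relying on, which mirrors the failure described above.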
2023-04-27 23:20:25 +08:00
pengwa
2efb75bfe9
Fold shape related operation (#14936)
### Fold shape-related operations on a best-effort basis

This is a follow-up to PR
https://github.com/microsoft/onnxruntime/pull/12561.
Create a specialized shape_optimzer to constant fold shape-related
operations.
ShapeOptimizer makes a best effort to constant fold the dim values that
are available from shape inference. This helps simplify the graph,
which in turn helps other graph transformers do more.

The transformer traverses the graph top-down and performs shape
optimizations.
It makes a best effort to constant fold the shape-related Shape node
outputs:
1. Shape generates a 1D tensor [12, 128, 512] (all dimensions have
concrete dim values), which can be constant folded
to an initializer holding the 1D tensor values [12, 128, 512]. (Some logic
in ConstantFolding also does the same thing.)
2. Shape generates a 1D tensor [batch_size, 128, 512] ->
Slice(start=1,end=3): we can constant fold the Shape->Slice to
  an initializer holding the 1D tensor values [128, 512].
3. Shape generates a 1D tensor [batch_size, 128, 512] -> Gather(axes=[0],
index=[2]): we can constant fold the
  Shape->Gather to an initializer holding the 1D tensor values [512].
4. Shape 15 takes an input of shape [batch_size, 128, 512], slicing from 1
to 2 (exclusive): we can constant fold
Shape15(start=1,end=2) to an initializer holding the 1D tensor values
[128].
This helps clean up the graph. Combined with ConstantFolding, the
graph becomes much simpler.
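The four cases above share one rule: the selected dims can be folded only if every one of them is concrete. A minimal sketch of that rule, assuming shapes are given as lists of ints (concrete) and strings (symbolic); `fold_shape_slice` and `fold_shape_gather` are hypothetical helpers, not the transformer's code.

```python
def fold_shape_slice(shape, start, end):
    """Fold Shape->Slice(start, end) to constant dims, or return None.

    `shape` mixes ints (concrete dims) and strings (symbolic dims such as
    "batch_size"). Folding is only possible if all selected dims are ints.
    """
    dims = shape[start:end]
    if all(isinstance(d, int) for d in dims):
        return dims
    return None


def fold_shape_gather(shape, index):
    """Fold Shape->Gather(index) to a one-element constant, or return None."""
    dim = shape[index]
    return [dim] if isinstance(dim, int) else None


print(fold_shape_slice([12, 128, 512], 0, 3))            # case 1
print(fold_shape_slice(["batch_size", 128, 512], 1, 3))  # case 2
print(fold_shape_gather(["batch_size", 128, 512], 2))    # case 3
print(fold_shape_slice(["batch_size", 128, 512], 1, 2))  # case 4
print(fold_shape_slice(["batch_size", 128, 512], 0, 2))  # symbolic dim: not folded
```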


### Motivation and Context



One direct motivation to have this is, we have a model subgraph like
this:

![image](https://user-images.githubusercontent.com/10530022/223390243-47b13922-4340-4999-9637-f52a33f69a2d.png)

The subgraph in the green rectangle is trying to get the value `30522`.
With the changes in this PR, the subgraph will be constant folded. In addition, the
ConstantFolding optimizer will further optimize out the subsequent
`Squeeze`/`Unsqueeze`/`ConcatTraining`, leaving a very
clean Reshape node whose shape input is a constant `[-1, 20522]`.

Having this simplified graph, our other compute optimizer can help
further optimize the graph by re-ordering gather/reshape nodes.
2023-04-27 18:59:28 +08:00
Yi Zhang
53ff50d19a
make nuget workflow easy to debug. (#15693)
### Description
Add parameters so that some stages can use another run's intermediate
output.

### Motivation and Context
The nuget workflow has 38 stages in 4 layers.
We had to run the whole workflow from the beginning to test one stage.
It would make life easier to run only one stage for testing,
like this:

![image](https://user-images.githubusercontent.com/16190118/234453721-e6e9a4bd-5e0b-4101-a18e-d5cf60615c9f.png)

### N.B.
In this PR, Nuget_Test_Linux_CPU, Nuget_Test_LinuxGPU and
Jar_Packaging_GPU are enabled as the first step.
So I can start to move tests from Linux host to container
2023-04-27 14:54:14 +08:00
Ted Themistokleous
926ae7d786
Add updated skipped test for multiheadattention Packed KV & QKV (#15587)
Adds a skip for MIGraphX EP builds for the Packed KV and QKV tests in
Multi Head attention, as they are not supported and cause CI failures
when building and testing EPs.

---------
Co-authored-by: Ted Themistokleous <tthemist@amd.com>
2023-04-27 10:31:53 +08:00
Changming Sun
e63bb5acef
Fix a memory leak in QGemm (#15703)
### Description
The BufferUniquePtrs in the old code don't have knowledge of the
allocator that the memory was allocated from, so they cannot free the
memory.
2023-04-26 18:48:00 -07:00
Rachel Guo
740d553c42
[rn] Reland support loading model from buffer for Android (#14514)
### Description

Reland previously reverted changes for loading model from buffer - Android


### Motivation and Context

#13903

---------

Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>
Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
2023-04-26 16:53:17 -07:00