### Description
<!-- Describe your changes. -->
* Partially revert [previous
change](https://github.com/microsoft/onnxruntime/pull/19804), and
* Redo concurrency_test_result parser outside of post.py
* Add support of syncing memtest result to db
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
To fix the error when CI is running on two model groups.
- When running on two model groups, the [previous
change](https://github.com/microsoft/onnxruntime/pull/19804) wrongly
navigates two levels up in the directory after running one model group,
while one level is needed. After that, the script can't find another
model group.
- Running on one model group can't repro the issue
### Description
<!-- Describe your changes. -->
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
- Move iOS package build to separate job so it can run in parallel with Android AAR build and be decoupled from the test stage. The test stage fails sometimes (not infrequently) and may need to be re-run.
- Update stop iOS simulator step so it doesn't fail if the start step doesn't run.
### Description
<!-- Describe your changes. -->
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
- This PR combine all CUDA 12 stage into the Zip-nuget-... pipeline.
- It also enables the cuda12 support
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
orttrainingtestdatascus has only save mnist whose size is only 64M in
Azure File
To meet security requirements and reduce maintenance cost, move the test
data to lotusscus and saved in Azure blob.
This pull request primarily involves changes to the build scripts in the
`tools/ci_build/github/azure-pipelines` directory. The changes add build
date and time information to the build process. This is achieved by
introducing two new parameters, `BuildDate` and `BuildTime`, and
incorporating them into the `msbuildArguments` in multiple locations.
Addition of new parameters:
*
[`tools/ci_build/github/azure-pipelines/templates/c-api-cpu.yml`](diffhunk://#diff-00815920cc190d10fdebceac0c3a4b8a59e408684ae38177dfe7f96cae276c59R309-R310):
Added `BuildDate` and `BuildTime` parameters using the pipeline's start
time.
Incorporation of new parameters in `msbuildArguments`:
*
[`tools/ci_build/github/azure-pipelines/c-api-noopenmp-packaging-pipelines.yml`](diffhunk://#diff-efb530efd945fdd9d3e1b92e53d25cc8db7df2e28071c364b07a7193092de01bL947-R948):
Added `CurrentDate` and `CurrentTime` parameters to `msbuildArguments`
in multiple locations.
[[1]](diffhunk://#diff-efb530efd945fdd9d3e1b92e53d25cc8db7df2e28071c364b07a7193092de01bL947-R948)
[[2]](diffhunk://#diff-efb530efd945fdd9d3e1b92e53d25cc8db7df2e28071c364b07a7193092de01bL1092-R1093)
[[3]](diffhunk://#diff-efb530efd945fdd9d3e1b92e53d25cc8db7df2e28071c364b07a7193092de01bL1114-R1115)
[[4]](diffhunk://#diff-efb530efd945fdd9d3e1b92e53d25cc8db7df2e28071c364b07a7193092de01bL1137-R1138)
*
[`tools/ci_build/github/azure-pipelines/templates/c-api-cpu.yml`](diffhunk://#diff-00815920cc190d10fdebceac0c3a4b8a59e408684ae38177dfe7f96cae276c59L446-R448):
Incorporated the `CurrentDate` and `CurrentTime` parameters into
`msbuildArguments`.### Description
<!-- Describe your changes. -->
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
- Update method for uploading to Azure storage to use managed identity.
- Allow helper script tasks to be split across different calls.
- Rewrite helper script in Python.
Motivation:
Recently the Azure storage account configuration was changed and now the old way of uploading to it no longer works.
### Description
<!-- Describe your changes. -->
Currently figuring out if the protobuf dependency is building protoc it
is a little obtuse and inconsistent
* in some places we directly set protobuf_BUILD_PROTOC_BINARIES to OFF
to indicate the protobuf dependency is not building protoc
* e.g. macOS/iOS/visionOS builds
* for a user provided protoc path we don't set
protobuf_BUILD_PROTOC_BINARIES, and inside protobuf_function.cmake that
determines if `protobuf::protoc` is added as a dependency or not
*
0dda8b0c44/cmake/external/protobuf_function.cmake (L40-L45)
To be more consistent/explicit, set protobuf_BUILD_PROTOC_BINARIES to
OFF when ONNX_CUSTOM_PROTOC_EXECUTABLE set and valid.
Remove outdated script that built and external protoc binary which was
used in later builds. The build setup will fetch a pre-built protoc so
there's no need for this additional build.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Make it easier to figure out if protoc is coming from the protobuf
dependency.
### Description
There was a bug with gqa on cpu where on token case, with batch_size >
1, and with past_present_share_buffer off, the output would occasionally
contain nans. this pr fixes that. it also updates documentation and
fixes posid gen for rotary in cuda in prompt case.
### Motivation and Context
this pr solves the GQA CPU bug as well as updates the documentation and
makes seqlens_k irrelevant for prompt case, which is useful to prevent
user error.
### Description
- Updates QNN pipelines to use QNN SDK 2.21
- Downloads QNN SDK from Azure storage to avoid having to rebuild images
when a new version is released.
### Motivation and Context
Test with the latest QNN SDK.
### Description
<!-- Describe your changes. -->
This branch is based on rel-1.18.0 and supports TensorRT 10-GA.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
<!-- Describe your changes. -->
https://github.com/microsoft/onnxruntime/pull/20418
Add back Catalyst changes only for now.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
### Description
<!-- Describe your changes. -->
Update order of steps
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Fix CI
### Description
<!-- Describe your changes. -->
Error:
**Artifact name input: e2e_test_logs_1364625_$(Date:yyyyMMddHHmmss)
##[error]Artifact name is not valid:
e2e_test_logs_1364625_$(Date:yyyyMMddHHmmss). It cannot contain '\', /',
"', ':', '<', '>', '|', '*', and '?'**
Date not correctly showing up in the artifact name. Use predefined
pipeline variable BuildNumber instead which also serves similarly as a
timestamp.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
RN CI failure
---------
Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>
Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
### Description
<!-- Describe your changes. -->
As title.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
<!-- Describe your changes. -->
Update to more generic url
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
1. Update the image name to avoid docker image wouldn't be overwrite.
there was an mistake that variables.CUDA_VERSION_MAJOR is always empty
14fcf0a52d/tools/ci_build/github/azure-pipelines/stages/nuget-linux-cuda-packaging-stage.yml (L120)
3. set one artifact name as variable to make the job rerunnable
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
<!-- Describe your changes. -->
This PR supports a build of onnxruntime.xcframework for xros/xrsimulator
for visionos via the build command of
`python3 tools/ci_build/github/apple/build_apple_framework.py --config
Release/Debug
tools/ci_build/github/apple/default_vision_os_framework_build_settings.json`.
For officially include visionos in ios cocoapods package and testing in
CI, would require separate work for upgrading the Xcode version &
upgrade macOS CI agent to macos-13-arm64 or higher.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
visionos support:
https://github.com/microsoft/onnxruntime/discussions/19313
---------
Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>
This reverts commit f396748ed6.
### Description
<!-- Describe your changes. -->
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
- Updates Windows QNN Nuget and Python packaging pipelines to download
QNN SDK from blob storage.
- Makes the QNN SDK version configurable when launching the python
packaging pipeline.
### Motivation and Context
Removes the need to rebuild images to update QNN SDK. Only applies to
Windows pipelines. Linux pipelines still get the SDK from disk.
### Description
<!-- Describe your changes. -->
Add Nuget package changes for adding new 'net6.0-maccatalyst' platform.
The output ORT Nuget package was manually tested and verified in a .NET
MAUI app setup.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
---------
Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
Co-authored-by: Yi Zhang <zhanyi@microsoft.com>
Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>
### Description
These changes include
Support to OpenVINO 2024.1
Import PreCompiled Blobs with EPContext Blob
Separate Device/Precision as input
Deprecate CPU_FP32 , GPU_FP32 terminology , introduce CPU, GPU
AUTO GPU, CPU will only create GPU Blob and not CPU Blob.
### Motivation and Context
- OpenVINO 2024.1 will be out soon
- Import Precompiled Blob can greatly reduce FEIL/FIL Time.
- Separating Device/Precision will make the input cleaner
-
---------
Co-authored-by: Suryaprakash Shanmugam <suryaprakash.shanmugam@intel.com>
Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>
### Description
<!-- Describe your changes. -->
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
I am prefiring this change to pre-run the non-dml checks, and also to
give folks the time to review it before DML gets released. When DML 1.14
officially releases, we'll only need to run the DML pipeline to
automatically pick up the nuget package. This should save us some
valuable time.
Note that DML 1.14 is the release needed for ORT 1.17.4, and DML 1.15
will come soon after.
### Description
Currently we try to include all prebuilt binaries into the NPM packages.
This was working until we added libonnxruntime_providers_cuda.so
(>400MB) into the NPM package. The NPM registry refuses to accept new
package publishment because the file is too large.
To make the new NPM package working, we have to remove the large file
from the package, and add a new script on package installation. This
script will try to dynamically install onnxruntime CUDA dynamic library
for Linux/x64.
This adds a new "Graph Capture" option to the DML ep, similar to the
cuda graph functionality. Here's how graph capture works:
- A user can enable graph capture in the session options by setting
`ep.dml.enable_graph_capture` to `true`
- When they want to capture a run, they set `gpu_graph_id` in their
`RunOptions` to a number bigger than 0 (0 is reserved for internal use
according to the cuda graph documentation).
- Then, when they start the inference, the graph will be captured and
stored in the DML EP for future use
- When they execute the run for a second time with the same id, the
`ReplayGraph` function in the DML EP will be called instead of executing
the kernels, resulting in very low overhead and avoiding kernel
recompilation.
This feature can give up-to-par or even better performance than
specifying the static dimensions at session creation time, but is also
much more flexible.
Bumps [transformers](https://github.com/huggingface/transformers) from
4.36.0 to 4.38.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/huggingface/transformers/releases">transformers's
releases</a>.</em></p>
<blockquote>
<h2>v4.38: Gemma, Depth Anything, Stable LM; Static Cache, HF Quantizer,
AQLM</h2>
<h2>New model additions</h2>
<h3>💎 Gemma 💎</h3>
<p>Gemma is a new opensource Language Model series from Google AI that
comes with a 2B and 7B variant. The release comes with the pre-trained
and instruction fine-tuned versions and you can use them via
<code>AutoModelForCausalLM</code>, <code>GemmaForCausalLM</code> or
<code>pipeline</code> interface!</p>
<p>Read more about it in the Gemma release blogpost: <a
href="https://hf.co/blog/gemma">https://hf.co/blog/gemma</a></p>
<pre lang="python"><code>from transformers import AutoTokenizer,
AutoModelForCausalLM
<p>tokenizer =
AutoTokenizer.from_pretrained("google/gemma-2b")
model =
AutoModelForCausalLM.from_pretrained("google/gemma-2b",
device_map="auto", torch_dtype=torch.float16)</p>
<p>input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text,
return_tensors="pt").to("cuda")</p>
<p>outputs = model.generate(**input_ids)
</code></pre></p>
<p>You can use the model with Flash Attention, SDPA, Static cache and
quantization API for further optimizations !</p>
<ul>
<li>Flash Attention 2</li>
</ul>
<pre lang="python"><code>from transformers import AutoTokenizer,
AutoModelForCausalLM
<p>tokenizer =
AutoTokenizer.from_pretrained("google/gemma-2b")</p>
<p>model = AutoModelForCausalLM.from_pretrained(
"google/gemma-2b", device_map="auto",
torch_dtype=torch.float16,
attn_implementation="flash_attention_2"
)</p>
<p>input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text,
return_tensors="pt").to("cuda")</p>
<p>outputs = model.generate(**input_ids)
</code></pre></p>
<ul>
<li>bitsandbytes-4bit</li>
</ul>
<pre lang="python"><code>from transformers import AutoTokenizer,
AutoModelForCausalLM
<p>tokenizer =
AutoTokenizer.from_pretrained("google/gemma-2b")</p>
<p>model = AutoModelForCausalLM.from_pretrained(
"google/gemma-2b", device_map="auto",
load_in_4bit=True
)
</tr></table>
</code></pre></p>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="08ab54ada5"><code>08ab54a</code></a>
[ <code>gemma</code>] Adds support for Gemma 💎 (<a
href="https://redirect.github.com/huggingface/transformers/issues/29167">#29167</a>)</li>
<li><a
href="2de9314197"><code>2de9314</code></a>
[<code>Maskformer</code>] safely get backbone config (<a
href="https://redirect.github.com/huggingface/transformers/issues/29166">#29166</a>)</li>
<li><a
href="476957b5b4"><code>476957b</code></a>
🚨 Llama: update rope scaling to match static cache changes (<a
href="https://redirect.github.com/huggingface/transformers/issues/29143">#29143</a>)</li>
<li><a
href="7a4bec6e8f"><code>7a4bec6</code></a>
Release: 4.38.0</li>
<li><a
href="ee3af60be0"><code>ee3af60</code></a>
Add support for fine-tuning CLIP-like models using
contrastive-image-text exa...</li>
<li><a
href="0996a10077"><code>0996a10</code></a>
Revert low cpu mem tie weights (<a
href="https://redirect.github.com/huggingface/transformers/issues/29135">#29135</a>)</li>
<li><a
href="15cfe38942"><code>15cfe38</code></a>
[<code>Core tokenization</code>] <code>add_dummy_prefix_space</code>
option to help with latest is...</li>
<li><a
href="efdd436663"><code>efdd436</code></a>
FIX [<code>PEFT</code> / <code>Trainer</code> ] Handle better peft +
quantized compiled models (<a
href="https://redirect.github.com/huggingface/transformers/issues/29">#29</a>...</li>
<li><a
href="5e95dcabe1"><code>5e95dca</code></a>
[<code>cuda kernels</code>] only compile them when initializing (<a
href="https://redirect.github.com/huggingface/transformers/issues/29133">#29133</a>)</li>
<li><a
href="a7755d2409"><code>a7755d2</code></a>
Generate: unset GenerationConfig parameters do not raise warning (<a
href="https://redirect.github.com/huggingface/transformers/issues/29119">#29119</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/huggingface/transformers/compare/v4.36.0...v4.38.0">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/microsoft/onnxruntime/network/alerts).
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Add `--user` option to pip install command.
Error:
```
ERROR: Could not install packages due to an OSError: [Errno 13] Permission denied: '/usr/local/bin/f2py'
Consider using the `--user` option or check the permissions.
```
See #19877.
### Description
update with ONNX 1.16.0 branch according to
https://github.com/microsoft/onnxruntime/blob/main/docs/How_To_Update_ONNX_Dev_Notes.md
ONNX 1.16.0 release notes:
https://github.com/onnx/onnx/releases/tag/v1.16.0
#### Updated ops for CPU EP:
- DequantizeLinear(21)
- Added int16 and uint16 support + various optimizer tests
- Missing int4 and uint4 support
- Missing block dequantization support
- QuantizeLinear(21)
- Added int16 and uint16 support + various optimizer tests
- Missing int4 and uint4 support
- Missing block quantization support
- Cast(21)
- Missing int4 and uint4 support
- CastLike(21)
- Missing int4 and uint4 support
- ConstantOfShape(21)
- Missing int4 and uint4 support
- Identity(21)
- Missing int4 and uint4 support
- If(21)
- Missing int4 and uint4 support
- Loop(21)
- Missing int4 and uint4 support
- Reshape(21)
- Missing int4 and uint4 support
- Scan(21)
- Missing int4 and uint4 support
- Shape(21)
- Missing int4 and uint4 support
- Size(21)
- Missing int4 and uint4 support
- Flatten(21)
- Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4
support
- Pad(21)
- Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4
support
- Squeeze(21)
- Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4
support
- Transpose(21)
- Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4
support
- Unsqueeze(21)
- Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4
support
#### Unimplemented opset 21 features/ops
- int4 and uint4 data type
- QLinearMatMul(21)
- GroupNormalization(21)
- ai.onnx.ml.TreeEnsemble(5)
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Disabled tests
#### ORT Training
orttraining/orttraining/test/python/orttraining_test_ort_apis_py_bindings.py
- test_ort_custom_ops: Potential shape inference bug for custom ops
#### Python quantization unit tests
test/onnx/python/quantization (shape inference bug)
- test_op_conv_transpose.py: test_quantize_conv_transpose_u8u8_fp16
- test_op_conv_transpose.py: test_quantize_conv_transpose_s8s8_fp16
- test_op_gemm.py: test_quantize_qop_gemm_s8s8
- test_op_gemm.py: test_quantize_qop_gemm_e4m3fn_same
- test_op_gemm.py: test_quantize_qop_gemm_e4m3fn_p3
- test_op_matmul.py: test_quantize_matmul_u8u8_f16
- test_op_matmul.py: test_quantize_matmul_s8s8_f16
- test_op_matmul.py: test_quantize_matmul_s8s8_f16_entropy
- test_op_matmul.py: test_quantize_matmul_s8s8_f16_percentile
- test_op_matmul.py: test_quantize_matmul_s8s8_f16_distribution
- test_op_relu.py: test_quantize_qop_relu_s8s8
#### ONNX tests
- test_maxpool_2d_ceil_output_size_reduce_by_one: ONNX 1.16.0 fixed a
maxpool output size bug and added this test. Enable this test when [ORT
PR](https://github.com/microsoft/onnxruntime/pull/18377) is merged.
Refer to original [ONNX PR](https://github.com/onnx/onnx/pull/5741).
- test_ai_onnx_ml_tree_ensemble_set_membership_cpu: new unimplemented op
ai.onnx.ml.TreeEnsemble
- test_ai_onnx_ml_tree_ensemble_single_tree_cpu: same
- test_ai_onnx_ml_tree_ensemble_set_membership_cuda: same
- test_ai_onnx_ml_tree_ensemble_single_tree_cuda: same
- test_cast_INT4_to_FLOAT_cpu: ORT Cast(21) impl doesn't support int4
yet
- test_cast_INT4_to_INT8_cpu: same
- test_cast_UINT4_to_FLOAT_cpu: same
- test_cast_UINT4_to_UINT8_cpu: same
- test_cast_INT4_to_FLOAT_cuda
- test_cast_INT4_to_INT8_cuda
- test_cast_UINT4_to_FLOAT_cuda
- test_cast_UINT4_to_UINT8_cuda
- test_constantofshape_float_ones_cuda: ConstantOfShape(21) not
implemented for cuda
- test_constantofshape_int_shape_zero_cuda: same
- test_constantofshape_int_zeros_cuda: same
- test_flatten_axis0_cuda: Flatten(21) not implemented for cuda
- test_flatten_axis1_cuda: same
- test_flatten_axis2_cuda: same
- test_flatten_axis3_cuda: same
- test_flatten_default_axis_cuda: same
- test_flatten_negative_axis1_cuda: same
- test_flatten_negative_axis2_cuda: same
- test_flatten_negative_axis3_cuda: same
- test_flatten_negative_axis4_cuda: same
- test_qlinearmatmul_2D_int8_float16_cpu: QLinearMatMul(21) for onnx not
implemented in ORT yet
- test_qlinearmatmul_2D_int8_float32_cpu: same
- test_qlinearmatmul_2D_uint8_float16_cpu: same
- test_qlinearmatmul_2D_uint8_float32_cpu: same
- test_qlinearmatmul_3D_int8_float16_cpu: same
- test_qlinearmatmul_3D_int8_float32_cpu: same
- test_qlinearmatmul_3D_uint8_float16_cpu: same
- test_qlinearmatmul_3D_uint8_float32_cpu: same
- test_qlinearmatmul_2D_int8_float16_cuda: same
- test_qlinearmatmul_2D_int8_float32_cuda: same
- test_qlinearmatmul_2D_uint8_float16_cuda: same
- test_qlinearmatmul_2D_uint8_float32_cuda: same
- test_qlinearmatmul_3D_int8_float16_cuda: same
- test_qlinearmatmul_3D_int8_float32_cuda: same
- test_qlinearmatmul_3D_uint8_float16_cuda: same
- test_qlinearmatmul_3D_uint8_float32_cuda: same
- test_size_cuda: Size(21) not implemented for cuda
- test_size_example_cuda: same
- test_dequantizelinear_blocked: Missing implementation for block
dequant for DequantizeLinear(21)
- test_quantizelinear_blocked_asymmetric: Missing implementation for
block quant for QuantizeLinear(21)
- test_quantizelinear_blocked_symmetric: Missing implementation for
block quant for QuantizeLinear(21)
---------
Signed-off-by: liqunfu <liqun.fu@microsoft.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Co-authored-by: Ganesan Ramalingam <grama@microsoft.com>
Co-authored-by: George Wu <jywu@microsoft.com>
Co-authored-by: adrianlizarraga <adlizarraga@microsoft.com>