### Description
ONNX model zoo changed their dir structure. So some our pipelines are
failing. In prevent such things happening again, we'd better to read the
test data for a cache from local disk instead of downloading it remotely
every time.
It's branched off from
https://github.com/microsoft/onnxruntime/pull/17751 but removes
KernelContext_SetOutput() API. It copies output allocation buffer to
kernel context.
---------
Co-authored-by: George Wu <jywu@microsoft.com>
### Description
<!-- Describe your changes. -->
### Motivation and Context
Now, the nightly Microsoft.ML.Onnxruntime.Managed Nuget Packag couldn't
be added in dotnet console program in VS2022 with target framework .NET
6.0.
I just restore it to previous setting to make it work.
- Add support for OpenVINO 2023.2
- num_of_threads provider option is mapped to the CPU device property
inference_num_threads of the CPU plugin, so users can control the
#threads used for inference by the CPU
- Logging in Debug mode now includes the runtime properties set for
devices
- Fix issue in using external weights through OpenVINO
---------
Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>
### Description
<!-- Describe your changes. -->
Add macos build for objc pod.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Follow up pr for #18550
---------
Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
### Description
<!-- Describe your changes. -->
TrainingSession has been deprecated for a while now, but the gradient
ops tests are still using training session. This PR updates these tests
to use inference session instead of training session.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
This will enable us to remove all the training session related
deprecated code from the repo.
### Description
The warning is:
```
C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(88,54): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.1812949Z with
2023-12-08T20:58:48.2144272Z [
2023-12-08T20:58:48.2145285Z Derived=Eigen::Map<const Eigen::SparseMatrix<uint64_t,1,int64_t>,0,Eigen::Stride<0,0>>
2023-12-08T20:58:48.2801935Z ]
2023-12-08T20:58:48.2804047Z C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(82,8): message : while compiling class template member function 'void onnxruntime::contrib::`anonymous-namespace'::SparseToDenseCsr<uint64_t>::operator ()(const onnxruntime::contrib::`anonymous-namespace'::ComputeCtx &,const onnxruntime::SparseTensor &,const onnxruntime::Tensor &,onnxruntime::Tensor &) const' [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.2806197Z C:\a\_work\1\s\include\onnxruntime\core/framework/data_types_internal.h(302,27): message : see the first reference to 'onnxruntime::contrib::`anonymous-namespace'::SparseToDenseCsr<uint64_t>::operator ()' in 'onnxruntime::utils::mltype_dispatcher_internal::CallableDispatchableHelper::Invoke' (compiling source file C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc) [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.2871783Z C:\a\_work\1\s\include\onnxruntime\core/framework/data_types_internal.h(438,100): message : see reference to class template instantiation 'onnxruntime::contrib::`anonymous-namespace'::SparseToDenseCsr<uint64_t>' being compiled (compiling source file C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc) [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.2893010Z C:\a\_work\1\s\include\onnxruntime\core/framework/data_types_internal.h(414,5): message : see reference to function template instantiation 'void onnxruntime::utils::MLTypeCallDispatcher<float,double,int32_t,uint32_t,int64_t,uint64_t>::InvokeWithLeadingTemplateArgs<Fn,onnxruntime::TypeList<>,onnxruntime::contrib::`anonymous-namespace'::ComputeCtx&,const T&,const onnxruntime::Tensor&,onnxruntime::Tensor&>(onnxruntime::contrib::`anonymous-namespace'::ComputeCtx &,const T &,const onnxruntime::Tensor &,onnxruntime::Tensor &) const' being compiled [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.2894476Z with
2023-12-08T20:58:48.2911521Z [
2023-12-08T20:58:48.2912457Z Fn=onnxruntime::contrib::`anonymous-namespace'::SparseToDenseCsr,
2023-12-08T20:58:48.3067840Z T=onnxruntime::SparseTensor
2023-12-08T20:58:48.3068863Z ] (compiling source file C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc)
2023-12-08T20:58:48.3195854Z C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(198,11): message : see reference to function template instantiation 'void onnxruntime::utils::MLTypeCallDispatcher<float,double,int32_t,uint32_t,int64_t,uint64_t>::Invoke<onnxruntime::contrib::`anonymous-namespace'::SparseToDenseCsr,onnxruntime::contrib::`anonymous-namespace'::ComputeCtx&,const T&,const onnxruntime::Tensor&,onnxruntime::Tensor&>(onnxruntime::contrib::`anonymous-namespace'::ComputeCtx &,const T &,const onnxruntime::Tensor &,onnxruntime::Tensor &) const' being compiled [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.3197946Z with
2023-12-08T20:58:48.3198565Z [
2023-12-08T20:58:48.3199093Z T=onnxruntime::SparseTensor
2023-12-08T20:58:48.3905678Z ]
2023-12-08T20:58:48.3907275Z C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(198,36): message : see the first reference to 'onnxruntime::utils::MLTypeCallDispatcher<float,double,int32_t,uint32_t,int64_t,uint64_t>::Invoke' in 'onnxruntime::contrib::SparseToDenseMatMul::Compute' [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.3910999Z ##[warning]onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(88,43): Warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data
2023-12-08T20:58:48.3912734Z 182>C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(88,43): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.3913414Z with
2023-12-08T20:58:48.3913660Z [
2023-12-08T20:58:48.3914001Z Derived=Eigen::Map<const Eigen::SparseMatrix<uint64_t,1,int64_t>,0,Eigen::Stride<0,0>>
2023-12-08T20:58:48.3914499Z ]
2023-12-08T20:58:48.3914743Z qlinear_concat.cc
2023-12-08T20:58:48.3917082Z ##[warning]onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(92,74): Warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data
2023-12-08T20:58:48.3918624Z 182>C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(92,74): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.5534583Z with
2023-12-08T20:58:48.5541266Z [
2023-12-08T20:58:48.5542401Z Derived=Eigen::Map<const Eigen::Matrix<uint64_t,-1,-1,1,-1,-1>,0,Eigen::Stride<0,0>>
2023-12-08T20:58:48.5544914Z ]
2023-12-08T20:58:48.5548670Z ##[warning]onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(92,63): Warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data
2023-12-08T20:58:48.5552099Z 182>C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(92,63): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.5553712Z with
2023-12-08T20:58:48.5555569Z [
2023-12-08T20:58:48.5556779Z Derived=Eigen::Map<const Eigen::Matrix<uint64_t,-1,-1,1,-1,-1>,0,Eigen::Stride<0,0>>
2023-12-08T20:58:48.5558707Z ]
2023-12-08T20:58:48.5561428Z ##[warning]onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(93,90): Warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data
2023-12-08T20:58:48.5565624Z 182>C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(93,90): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.5566354Z with
2023-12-08T20:58:48.5568185Z [
2023-12-08T20:58:48.5569305Z Derived=Eigen::Map<Eigen::Matrix<uint64_t,-1,-1,1,-1,-1>,0,Eigen::Stride<0,0>>
2023-12-08T20:58:48.5571339Z ]
2023-12-08T20:58:48.5574864Z ##[warning]onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(93,77): Warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data
2023-12-08T20:58:48.5577866Z 182>C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(93,77): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.5578562Z with
2023-12-08T20:58:48.5580399Z [
2023-12-08T20:58:48.5581503Z Derived=Eigen::Map<Eigen::Matrix<uint64_t,-1,-1,1,-1,-1>,0,Eigen::Stride<0,0>>
2023-12-08T20:58:48.5583465Z ]
2023-12-08T20:58:48.5587661Z ##[warning]onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(88,54): Warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data
2023-12-08T20:58:48.5590705Z 182>C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(88,54): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.5591396Z with
2023-12-08T20:58:48.5593220Z [
2023-12-08T20:58:48.5593693Z Derived=Eigen::Map<const Eigen::SparseMatrix<int64_t,1,int64_t>,0,Eigen::Stride<0,0>>
2023-12-08T20:58:48.5595955Z ]
```
And the warning in #18195
### Motivation and Context
AB#22894
---------
Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>
### Description
Move NuGet nightly package publishing job to a separated pipeline.
Before this change, it runs at the end of 'Zip-Nuget-Java-Nodejs
Packaging Pipeline'. This PR moves it to a separate pipeline so that we
can manually trigger this step for any branch(e.g. release branches).
### Description
This PR provided a vectorized matmul algorithm. In most situations, we
still go to the workgroup memory optimized matmul. But for some
situations, like N and K are very small, using workgroup optimized
matmul can't fully utilize the underlying hardware due to the 32x32 tile
size. So for very small N/K, we switch to the naive vectorized matmul
algorithm to improve the hardware execution unit usage.
With this PR, matmul with input0: [1, 36864, 3], input1: [1, 3, 3],
input2: [3] becomes less than 1 ms from 4.34 ms on Intel Gen9 GPUs.
Supported added in MIGraphX. should be in operator list
### Description
Simple change to add support to EP for DynamicQuantizeLinear
### Motivation and Context
Changes added in MIGraphX. Should also be available in the EP to run
models that are int8 quantized. Currently we fail and fallback ops to
ROCm->CPU EPs
Co-authored-by: Ted Themistokleous <tedthemistokleous@amd.com>
### Disable test_bert_result_with_layerwise_recompute
<!-- Describe your changes. -->
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Previously, we use `ck::tensor_layout::gemm::RowMajor` or `ColumnMajor`
to tag the template for correct dispatch. This is cumbersome in the case
of CK is disabled.
Switch to use the ORT BlasOp to tag the template and use
`CKBlasOpAdaptor` to adapt between ORT BlasOp enum and ck's Col/Row.
Just like what we have done for ORT datatype and ck datatype with
`CKDataTypeAdaptor`.
### Description
<!-- Describe your changes. -->
Added uniforms to Reduce op
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Improve perforamnce.
### Description
- Adds graph fusions to preprocessing step that can be called before
creating a QDQ model for QNN EP.
- Fuse Erf sequence to Gelu (adapted from
[optimizer.py](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/fusion_gelu.py)).
Required by QNN EP.
- Fuse ReduceMean sequence to LayerNormaliation (adapted from
[optimizer.py](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/fusion_layernorm.py)).
Not required by QNN EP.
- Fuse ReduceL2 sequence to LpNormalization (new, specific to QNN EP).
Required by QNN EP.
Example use:
```python3
from quantization.execution_providers.qnn import get_qnn_qdq_config, qnn_preprocess_model
# Added by this PR:
model_updated = qnn_preprocess_model("model.fp32.onnx", "model.fp32.preprocessed.onnx", fuse_layernorm=True)
model_to_quantize = "model.fp32.preprocessed.onnx" if model_updated else "model.fp32.onnx"
# Quantize model ...
qnn_config = get_qnn_qdq_config(model_to_quantize, data_reader, activation_type=QuantType.QUInt16)
quantize(model_to_quantize, "model.qdq.onnx", qnn_config)
```
### Motivation and Context
Allow more models to be quantized for use with QNN EP
---------
Signed-off-by: adrianlizarraga <adlizarraga@microsoft.com>
This fix resolves the build error “error: invalid parameter combination
for AltiVec intrinsic ‘__builtin_vec_vsx_st’” which is coming up with
the commit dea425e7c1.
### Description
<!-- Describe your changes. -->
Added uniforms to Tile and Where Ops
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Improve performance.
### Description
Logs out ORT session options as INFO if LogSeverityLevel is set high
enough. Also log out ORT session options on Windows if the provider is
enabled. The events are not Telemetry are will be emitted for local
analysis (if enabled).
[Microsoft.ML.ONNXRuntime](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/platform/windows/telemetry.cc#L47)
- 3a26b1ff-7484-7484-7484-15261f42614d
### Motivation and Context
ORT session options are key to understanding ORT behavior. This allows
better diagnosability to see what the options are set to.
### Description
* implemented lazyResetGrad function
### Motivation and Context
* we are in the process of adding language bindings to enable training
on web
* lazyresetgrad ensures that the gradients are calculated correctly
after the first runTrainStep call
---------
Co-authored-by: Ashwini Khade <askhade@microsoft.com>
### Allow layer-wise recompute
Early, we need users/developers to specify the subgraphs to recompute,
now we introduced a more user-friendly way to enable recompute for all
detected stashed activation recomputation subgraphs. This scarifies
getting the best configs while makes it easier to support user
requirements when they switches from PyTorch per-layer gradient
checkpoint to ORTModule.
`ORTMODULE_MEMORY_OPT_LEVEL` is introduced to control the usage, by
default, it is 0, e.g. `USER_SPECIFIED`, all subgraphs definedin
`ORTMODULE_MEMORY_OPT_CONFIG` will be recomputed. So this is compatible
to existing recompute usage in ORTModule integrated models.
Using `ORTMODULE_MEMORY_OPT_LEVEL=1`, we will enable all recompute plans
detected, so those configs in `ORTMODULE_MEMORY_OPT_CONFIG` will not be
respected any more.
Add Unit Tests using 3 layer blooms.
https://github.com/microsoft/onnxruntime/blob/pengwa/add_aggresive_recompute/docs/Memory_Optimizer.md
### Description
Bugfix: Dequantize4BitsKernel buffer overrun when the input matrix has
less than the number of blocks that a single thread block can handle.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
<!-- Describe your changes. -->
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
<!-- Describe your changes. -->
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
Updating transformers package in test pipeline to fix a security
vulnerability.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
<!-- Describe your changes. -->
Add cuda visible devices for Mistral benchmark as it is not working for
Torch compile and throwing an error.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Error:
File
"/opt/conda/envs/ptca/lib/python3.8/site-packages/torch/_inductor/triton_heuristics.py",
line 556, in run
return launcher(
File "<string>", line 8, in launcher
RuntimeError: Triton Error [CUDA]: invalid device context
### Fix gemm_float8 build failure on CUDA 11.3 ~ 11.7
User env: CUDA 11.3, build option include "--disable_types float8"
```
/tmp/onnxruntime/onnxruntime/contrib_ops/cuda/math/gemm_float8.cu(256): error: identifier "CUBLASLT_MATMUL_DESC_SM_COUNT_TARGET" is undefined
/tmp/onnxruntime/onnxruntime/contrib_ops/cuda/math/gemm_float8.cu(264): error: enum "cublasLtMatmulDescAttributes_t" has no member "CUBLASLT_MATMUL_DESC_FAST_ACCUM"
/tmp/onnxruntime/onnxruntime/contrib_ops/cuda/math/gemm_float8.cu(268): error: identifier "CUBLASLT_MATMUL_DESC_A_SCALE_POINTER" is undefined
/tmp/onnxruntime/onnxruntime/contrib_ops/cuda/math/gemm_float8.cu(271): error: identifier "CUBLASLT_MATMUL_DESC_B_SCALE_POINTER" is undefined
/tmp/onnxruntime/onnxruntime/contrib_ops/cuda/math/gemm_float8.cu(274): error: identifier "CUBLASLT_MATMUL_DESC_D_SCALE_POINTER" is undefined
5 errors detected in the compilation of "/tmp/onnxruntime/onnxruntime/contrib_ops/cu
```
Here is a versions (major version) diff on the requested attributes:
```
cuda 11.5.1
no CUBLASLT_MATMUL_DESC_SM_COUNT_TARGET
cuda 11.6
https://docs.nvidia.com/cuda/archive/11.6.0/pdf/CUBLAS_Library.pdf
has CUBLASLT_MATMUL_DESC_SM_COUNT_TARGET
cuda 11.7
no CUBLASLT_MATMUL_DESC_FAST_ACCUM
no CUBLASLT_MATMUL_DESC_A_SCALE_POINTER
no CUBLASLT_MATMUL_DESC_B_SCALE_POINTER
no CUBLASLT_MATMUL_DESC_D_SCALE_POINTER
cuda 11.8
https://docs.nvidia.com/cuda/archive/11.8.0/pdf/CUBLAS_Library.pdf
has CUBLASLT_MATMUL_DESC_FAST_ACCUM
has CUBLASLT_MATMUL_DESC_A_SCALE_POINTER
has CUBLASLT_MATMUL_DESC_A_SCALE_POINTER
has CUBLASLT_MATMUL_DESC_B_SCALE_POINTER
has CUBLASLT_MATMUL_DESC_D_SCALE_POINTER
```
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Currently all the inputs of Resize node will be converted to NHWC if the
preferred layout is NHWC, and the ORT will call `IsOpSupportedImpl`
twice, first time the inputs are NCHW, and the second time the inputs
have been converted to NHWC. This would make the validation for scales
input complicated and difficult to identify the height and width values.
### Description
Update absl and gtest to fix an ARM64EC build error
### Motivation and Context
We need to get an important fix into ORT.
The fix is:
8028a87c96
### Description
**This PR is a replacement of #17820.**
allow to specify callback for profiling data
*Previous*:
```js
ort.env.webgpu.profilingMode = 'default'; // enable profiling
// profiling data will output to console.
```
*Now*:
```js
ort.env.webgpu.profiling = {
mode: 'default'; // enable profiling
ondata: (data) => {
// .. process the profiling data
}
};
//for each kernel, "ondata" will be called once. only output to console if ondata is not specified.
```
### Description
reuse EO pool in NPM pipeline.
### Motivation and Context
build_web_debug failed in onnxruntime-Win-CPU-2022 but it works in EO
pool.
Reuse EO pool to make the pipeline work now.
When I'm free, I'll try upgrading the chrome in the custom image.
Build onnxruntime.dll as arm64x
Added a .cmake file to generate a link repro of the onnxruntime.dll
during arm64 build. This provides us a directory containing all the
arm64 objs, def file and libs to link to when it is time to building
arm64x onnxruntime.dll during the arm64ec build by passing the
/machine:arm64x flag to the linker along with the arm64 artifacts.
If other dlls wanted to be built as x, setting the ARM64X_TARGETS
variable in the toplevel cmakelists.txt to include these other targets
is all that will be needed.
Added build_arm64x.bat as a wrapper for the multiple (rm64, then
arm64ec) cmake calls needed to build as arm64x.
AB#22533
### Description
<!-- Describe your changes. -->
As title.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
yolo-v8 model missing operator support.
---------
Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Fix a bug that can't create context binary if the model has inputs/outputs with different data type
### Description
Update EPContext op schema to unblock nodes with different data type among inputs & outputs
### Description
- Update QNN CI Pipelines to use QNN SDK version 2.17.0
- **Print warning if unit test requires adjusted tolerance to pass**
- **Temporarily disable unloading QnnCpu.dll for windows x64 due to
crash when calling FreeLibrary**
- Enable fixed HTP tests
- QnnHTPBackendTests.LayerNorm1D_LastAxis_DynamicScale
- QnnHTPBackendTests.GlobalMaxPool_LargeInput2_u8
- QnnHTPBackendTests.ReduceSumS8Opset13_Rank5
- QnnHTPBackendTests.ReduceSumU8Opset13_Rank5_LastAxis
- QnnHTPBackendTests.WhereLargeDataBroadcastU8
- QnnHTPBackendTests.WhereLargeDataBroadcastTransformedU8
- Enabled fixed CPU tests
- QnnCPUBackendTests.Resize_DownSample_Linear_AlignCorners_scales
- Increased tolerance for HTP tests that are less accurate on QNN SDK
2.17.0
- QnnHTPBackendTests.AveragePool_CountIncludePad_HTP_u8
- QnnHTPBackendTests.AveragePool_AutopadSameUpper_HTP_u8
- QnnHTPBackendTests.AveragePool_AutopadSameLower_HTP_u8
- QnnHTPBackendTests.ConvU8U8S32_bias_dynamic_input
- QnnHTPBackendTests.ConvU8U8S32_bias_initializer
- QnnHTPBackendTests.ConvU8U8S32_large_input1_padding_bias_initializer
- QnnHTPBackendTests.LRNSize3
- QnnHTPBackendTests.LRNSize5
- QnnHTPBackendTests.MaxPool_Large_Input_HTP_u8
- QnnHTPBackendTests.MaxPool_LargeInput_1Pads
- QnnHTPBackendTests.Resize_DownSample_Linear_HalfPixel
- QnnHTPBackendTests.ResizeU8_2xLinearPytorchHalfPixel
- QnnHTPBackendTests.ResizeU8_2xLinearHalfPixel
- QnnHTPBackendTests.ResizeU8_2xLinearAlignCorners
- QnnHTPBackendTests.ResizeU8_2xLinearAsymmetric
- Disabled ONNX model tests
- averagepool_2d_ceil: Accuracy issues **only on Windows x64
QnnCpu.dll**
- Disabled QDQ model tests (onnx_test_runner)
- facedetection_op8_qdq: Accuracy issues
- Disabled CPU EP tests (these use QnnCpu.dll)
- ActivationOpTest.Relu: QNN SDK 2.17 Relu treats inf as FLT_MAX
- GemmOpTypedTests/0.TestGemmBroadcast: Inaccuracy when weight is
initializer and bias is not
- MathOpTest.MatMulFloatType "test padding and broadcast B > A":
Inaccuracy (**only linux**)
- Fix Gemm translation bugs in QNN EP:
- Do not skip processing of inputs that need to be transposed.
### Motivation and Context
- Allow testing with newest QNN SDK version
- Take advantage of improvements to enable new models.
### Description
<!-- Describe your changes. -->
Registered Sharded MoE op under contrib_op/cuda/collective with expert
slicing. The broadcast process happens just before adding second bias(if
has) and permutation undoing. Tensor slicing is planned but not included
in this PR.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->