Commit graph

10805 commits

Author SHA1 Message Date
Yi Zhang
ab2eaedfaa
Install ONNX by building source code in Windows DML stage (#20079)
### Description
In #20073, I pinned the onnx version to unblock the whole PR CI.
In fact, we can use the onnx installed by building from source,
whose version is controlled by deps.txt.
For historical reasons, the DML stage installed onnx from PyPI. Now
onnx can be installed the same way as in the other stages.

add an option to skip installing onnx in win-ci-prebuild-step
2024-03-27 12:29:34 -07:00
Yi Zhang
4df9d16f98
[Fix] TSAUpload task must be in building stage (#20098)
### Description
In #20085, TSAUpload was in testing stage so main branch failed.
2024-03-27 12:20:57 -07:00
Xiaoyu
c8676ffbff
Add ModelProto support for quantize api (#20018)
### Description
Add ModelProto support for `quantize` api



### Motivation and Context
Currently, the `quantize` API only accepts a model path as the input
model. However, for large models, saving and loading from disk can be
time-consuming. By adding `ModelProto` as an input option to the
`quantize` API, significant time can be saved.
2024-03-27 10:40:08 -07:00
Yulong Wang
47903e701a
fix condition in web CI YAML (#20095)
### Description
fix condition in web CI YAML
2024-03-27 10:35:43 -07:00
Nanashi
ca465dc087
[js] Make error friendly when isOrtFormat is undefined (#19958)
### Description
Make error friendly when isOrtFormat is undefined
(`onnxruntime.InferenceSession.create` is called with ArrayBuffer or
Uint8Array).

### Motivation and Context
I was trying to run my ONNX model with the WebGL EP, but it gave me the error
"Cannot read properties of null (reading 'irVersion')".
Using a debugger, I found that the actual error is `int64 is not supported`,
but that error was not visible to me.
So I made it show both errors when isOrtFormat is undefined.
<s>I haven't written unit test yet, so I'm making it draft. (I have no
idea about how do I test this though...)</s>
[d62d942](d62d9425ba)
2024-03-27 02:07:00 -07:00
guyang3532
4aa84003ca
support Pow/Div/Sqrt in PaddingElimination (#20083)
2024-03-27 16:10:07 +08:00
Yulong Wang
28907d8c59
[js/web] workaround NPM test fetch failure (#20020)
### Description

`npm test` sometimes fails with the error "TypeError: Failed to
fetch".

I checked the callback entry of the localhost server started by karma.
When "Failed to fetch" happens, no request reaches the
server side. The root cause has not been identified yet. However, because this
issue only happens occasionally, right after the browser is launched by the karma
runner, retrying works around it most of the time.
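
The retry workaround can be sketched as follows. This is a minimal Python illustration, not the PR's actual JavaScript change; `retry`, `flaky_fetch`, the attempt count, and the delay are hypothetical names and values chosen for the example.

```python
import time


def retry(operation, attempts=3, delay_seconds=0.0):
    """Call `operation` until it succeeds or `attempts` is exhausted.

    Hypothetical helper: the real change lives in the JS test runner.
    """
    last_error = None
    for _ in range(attempts):
        try:
            return operation()
        except Exception as error:  # e.g. "TypeError: Failed to fetch"
            last_error = error
            time.sleep(delay_seconds)
    raise last_error


# Simulate a fetch that fails once, right after the browser is launched.
calls = {"count": 0}


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] == 1:
        raise RuntimeError("Failed to fetch")
    return "ok"


result = retry(flaky_fetch)
```

Retrying hides the symptom without addressing the unidentified root cause, which is why the commit calls it a workaround.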
2024-03-26 21:35:49 -07:00
Chi Lo
3dcda13e62
[TensorRT EP] Fix concurrency issue for TRT custom op list (#20093)
The `CreateTensorRTCustomOpDomainList()` is not thread-safe due to its
static variables, `created_custom_op_list` and `custom_op_domain`.
This PR ensures synchronization using a mutex.

see issue: https://github.com/microsoft/onnxruntime/issues/20089
2024-03-26 21:20:14 -07:00
Yi Zhang
0561b9576e
Fix and Refactor Python Packaging Pipeline (#20085)
### Description
Make Windows GPU Packaging stage in Python Packaging pipeline run on CPU
machine as well



### Motivation and Context

### Test Link

https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=430961&view=results
2024-03-27 12:17:22 +08:00
zhijiang
b14d3f1d52
Zhijxu/fix softmax fp16 (#20059)
With fp16 input, softmax returns NaN in some cases.

The reason is that, for the float16 dtype,
`std::numeric_limits<float16>::infinity()` returns 0 instead of inf.
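
One way such a zero can lead to NaN, sketched under assumptions: if a numerically stable softmax seeds its running maximum with negative infinity and that value silently becomes 0, then inputs that are all very negative never lower the maximum, every exponential underflows to 0, and the normalization becomes 0/0. The Python below uses ordinary floats rather than float16 and is not the ORT kernel; `softmax` and its `initial_max` parameter are hypothetical.

```python
import math


def softmax(values, initial_max):
    """Stable softmax whose running max starts at `initial_max` (hypothetical)."""
    running_max = initial_max
    for v in values:
        running_max = max(running_max, v)
    exps = [math.exp(v - running_max) for v in values]
    total = sum(exps)
    # Mimic IEEE 0/0 -> NaN instead of raising ZeroDivisionError.
    return [e / total if total != 0.0 else math.nan for e in exps]


logits = [-1000.0, -1001.0]
good = softmax(logits, initial_max=-math.inf)  # the true max is found
bad = softmax(logits, initial_max=0.0)         # "infinity" collapsed to 0
```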
2024-03-27 11:37:10 +08:00
Xiaoyu
512c803550
fix import in transformer optimizer python script (#20091)
### Description
Fix import.

### Motivation and Context
Fix error.
2024-03-26 20:16:09 -07:00
Yulong Wang
473434c73f
[js/webgpu] perform uniform consistency check (#20019)
### Description

This PR makes a change in WebGPU backend to validate program uniforms.
It compares the uniform data that comes from the result of
`getRunData()` callback from the program info, with the `ShaderHelper`'s
maintained list of uniform variables.

Also fixes a few bugs found by this check.
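
The idea of such a consistency check can be sketched in Python. The data shapes here are hypothetical simplifications (the real check is TypeScript inside the WebGPU backend): `declared` stands for the uniform variables the shader helper tracks, and `provided` for the uniform data returned with the run data.

```python
def check_uniforms(declared, provided):
    """Return a list of mismatch messages (empty when consistent).

    declared: [(name, type, length)] as tracked by the shader helper.
    provided: [(type, length)] as returned with the run data.
    """
    problems = []
    if len(declared) != len(provided):
        problems.append(f"expected {len(declared)} uniforms, got {len(provided)}")
    for (name, d_type, d_len), (p_type, p_len) in zip(declared, provided):
        if d_type != p_type:
            problems.append(f"{name}: type {p_type} != declared {d_type}")
        if d_len != p_len:
            problems.append(f"{name}: length {p_len} != declared {d_len}")
    return problems


declared = [("output_size", "u32", 1), ("alpha", "f32", 1)]
ok = check_uniforms(declared, [("u32", 1), ("f32", 1)])
bad = check_uniforms(declared, [("u32", 1), ("i32", 1)])
```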
2024-03-26 17:14:43 -07:00
cao lei
793a8882ed
When copying inputs before inference, flush the stream that copies the input only if the input is consumed by ops from different streams (#19970)
### Description
When copying inputs before inference, flush the stream that copies
the input only if the input is consumed by ops from different
streams.
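
The flush decision described above reduces to a simple predicate. A minimal sketch, assuming streams are identified by integers; `needs_flush` is a hypothetical name and not a function in the runtime:

```python
def needs_flush(copy_stream, consumer_streams):
    """True when any consumer runs on a stream other than the copy stream."""
    return any(stream != copy_stream for stream in consumer_streams)


same_stream_only = needs_flush(0, [0, 0])  # consumers share the copy stream
cross_stream = needs_flush(0, [0, 1])      # one consumer is on another stream
```

Work queued on a single stream executes in order, so a flush is only needed when a consumer on another stream could otherwise read the input before the copy has completed.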


### Motivation and Context
This is an improvement on the fix in
https://github.com/microsoft/onnxruntime/pull/17303
2024-03-26 13:57:25 -07:00
Yulong Wang
050085a7fb
[js/web] remove "browser" field in package.json (#20021)
### Description

Field "browser" is deprecated in favor of "exports". Removes the unused
field.

Some bundlers may read "browser" and generate errors. Removing this
field should let bundlers look up "exports" instead. Fixes #19915
2024-03-26 13:57:11 -07:00
Yulong Wang
0313dd1f65
Update Web CI to use data dir under Agent.TempDirectory (#20074)
### Description
Update Web CI to use data dir under Agent.TempDirectory

This change fixes the random failure caused by unstable access to karma
temp directory (which is under AppData\Local\Temp) on CI pipeline
2024-03-26 13:16:59 -07:00
Baiju Meswani
40efbd6c37
Fix training and macos ci pipelines (#20034)
2024-03-26 12:20:11 -07:00
zesongw
ea3082edc6
[WebNN EP] Support Split before opset13 (#19988)
### Description
Support Split before opset13, where the `split` is an attribute.
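
In ONNX, `split` moved from a node attribute to an optional input in opset 13, so a backend has to look in a different place depending on the opset. A minimal Python sketch of that lookup follows; the dict-based `node` layout and the `get_split_sizes` helper are hypothetical and are not the WebNN EP code.

```python
def get_split_sizes(node, opset):
    """Return explicit split sizes, or None when the split is even.

    `node` is a hypothetical dict with "attributes" and "inputs" entries;
    the second input is shown as an already resolved constant for brevity.
    """
    if opset < 13:
        # Older opsets carry `split` as a node attribute.
        return node["attributes"].get("split")
    # From opset 13 on, `split` is an optional second input.
    inputs = node["inputs"]
    return inputs[1] if len(inputs) > 1 else None


old_node = {"attributes": {"axis": 0, "split": [2, 4]}, "inputs": ["x"]}
new_node = {"attributes": {"axis": 0}, "inputs": ["x", [2, 4]]}
```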



### Motivation and Context
Support more models which use the earlier opset.
2024-03-26 11:59:41 -07:00
pengwa
dfa891a2d8
Fix memory stats printing (#20061)
### Fix memory stats printing

Memory stats printing fails when the module is in eval mode at the time
it is wrapped by ORTModule. At that point, the runtime inspector for the training manager
should have its training flag set to true, but it got false (because the existing
logic reads the boolean from module.training). The runtime inspector, as part
of the training manager or inference manager, should know explicitly whether it is serving
training, so we cannot depend on the state of
module.training during ORTModule initialization.

### Motivation and Context
2024-03-26 21:25:59 +08:00
Yi Zhang
0906c57c9e
Pin Onnx Version (#20073)
### Description
1. change in build.py is to fix DML exception
(https://dev.azure.com/onnxruntime/onnxruntime/_build?definitionId=10&_a=summary)
2. change in requirements.txt is to fix exception in python packaging
pipeline.
https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=430433&view=results



### Motivation and Context

---------

Co-authored-by: Yi Zhang <your@email.com>
2024-03-26 17:59:46 +08:00
pengwa
1a0ba3f69f
Fix softmax export (#20057)
### Description


Why do we need to define the softmax export logic here?
For the usage `nn.functional.softmax(attn_weights, dim=-1,
dtype=torch.float32)` in the model,

76a33a1092/src/transformers/models/mistral/modeling_mistral.py (L302)
If dtype is specified, the input tensor is cast to dtype before the
operation is performed.
This is useful for preventing data type overflows. The existing ONNX
exporter, however, does the cast after the operation, which is not correct
(cf06189a2d/torch/onnx/symbolic_opset13.py (L27)).
This override can serve as a workaround until PyTorch fixes the issue in
a coming release.
 (TODO: pengwa - add PyTorch versions when the issue is fixed).


@thiagocrepaldi We may need a fix in PyTorch repo as well. 

### Motivation and Context
2024-03-26 13:09:20 +08:00
Adrian Lizarraga
7d976cf720
[QNN QDQ Quant] Utils to generate mixed-precision quant overrides (#20028)
### Description
- Adds a utility to the QNN quantization scripts that "fixes" an initial
set of tensor quantization overrides for mixed-precision QDQ models.
Follow-up to https://github.com/microsoft/onnxruntime/pull/19925
- Moves existing overrides for QNN compatibility (matmul, layernorm,
sigmoid, tanh) to separate functions. PR adds missing unit tests for
these.
- Adds `weight_symmetric=None` parameter to the `get_qnn_qdq_config()`
function to enable user specification (instead of always using default
behavior).
- If weight_symmetric is set to `None`, it will be set to
`weight_symmetric = weight_type in (QUInt8, QUInt16)`.
  - Otherwise, the user's value is used.

#### Example
Float model:

```
    input_0 --> Op1 --> Op3 --> Op5 --> Op6 --> output_0
                                 ^
                                 |
    input_1 --> Op2 -+-> Op4 ----+
                     |
                     +-> Op7 --> output_1
                     |
                     +-> Op8 --> output_2
```

If we'd like to quantize this model to uint8 precision, but would like
to make sure tensor "Op4_out" is quantized to 16-bit, then we would
specify the following initial tensor quantization overrides:
```python
# Op4_out could be an inaccurate tensor that should be upgraded to 16bit
initial_overrides = {"Op4_out": [{"quant_type": QuantType.QUInt16}]}
```

These initial overrides may not create a valid model because Op4 and Op5
may require both the input and output to be the same type (e.g.,
uint16). This helper fixes the overrides so that input/output data types
are valid:

```python
qnn_config = get_qnn_qdq_config(
    float_model_path,
    data_reader,
    activation_type=QuantType.QUInt8,
    weight_type=QuantType.QUInt8,
    init_overrides=initial_overrides,  # These initial overrides will be "fixed"
)
```

The above snippet generates the following "fixed" overrides (get via
`qnn_config.extra_options["TensorQuantOverrides"]`):
```python
    {
      "Op2_out": [{"quant_type": QUInt8, "convert": {"quant_type": QUInt16, "recv_nodes": {"Op4"}}}],
      "Op3_out": [{"quant_type": QUInt8, "convert": {"quant_type": QUInt16, "recv_nodes": {"Op5"}}}],
      "Op4_out": [{"quant_type": QUInt16}],
      "Op5_out": [{"quant_type": QUInt16, "convert": {"quant_type": QUInt8, "recv_nodes": {"Op6"}}}]
    }
```

How to interpret the fixed overrides:
- Op2's output is consumed by Op4, Op7, and Op8. Op4 consumes the
converted u16 type, but Op7 and Op8 consume the original u8 type.
- Op3's output is converted from u8 to u16. Op5 consumes the converted
u16 type.
- Op4's output is just u16 (not converted). All consumers of Op4_out get
the u16 type.
- Op5's output is converted from u16 to u8. Op6 consumes the u8 type.
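
The interpretation rules above can be expressed as a small lookup. This is a sketch, not part of the quantization tooling: `consumed_type` is a hypothetical helper, and the quantization types are written as plain strings so the example needs no imports.

```python
def consumed_type(overrides, tensor, consumer):
    """Return the quantization type that `consumer` sees for `tensor`."""
    entry = overrides[tensor][0]  # per-tensor quantization: a single dict
    convert = entry.get("convert")
    if convert and consumer in convert["recv_nodes"]:
        return convert["quant_type"]  # consumer receives the converted type
    return entry["quant_type"]        # consumer receives the original type


fixed_overrides = {
    "Op2_out": [{"quant_type": "QUInt8",
                 "convert": {"quant_type": "QUInt16", "recv_nodes": {"Op4"}}}],
    "Op3_out": [{"quant_type": "QUInt8",
                 "convert": {"quant_type": "QUInt16", "recv_nodes": {"Op5"}}}],
    "Op4_out": [{"quant_type": "QUInt16"}],
    "Op5_out": [{"quant_type": "QUInt16",
                 "convert": {"quant_type": "QUInt8", "recv_nodes": {"Op6"}}}],
}

op4_sees = consumed_type(fixed_overrides, "Op2_out", "Op4")
op7_sees = consumed_type(fixed_overrides, "Op2_out", "Op7")
```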

### Motivation and Context
Generating mixed-precision quantization overrides is currently a manual
process. This PR adds a utility that helps generate valid overrides.
2024-03-25 14:41:14 -07:00
Vincent Wang
d30c81d270
Add Symbolic Shape Hint to Triton Codegen Config (#20056)
Add a symbolic shape hint to the Triton codegen config so that we can avoid
unnecessary recompilation when input shapes keep changing. The
screenshot below shows that, with proper configuration, training can be
sped up considerably by reducing unnecessary recompilation.


![image](https://github.com/microsoft/onnxruntime/assets/11661208/699944d2-81cd-4c22-84e7-73a4fa0d2a28)
2024-03-25 15:05:02 +08:00
aciddelgado
4a196d1594
Packed QKV and Rotary Embedding Support for sm<80 GQA (#20012)
### Description
Add support for packed qkv input and rotary embedding with sm<80 using
memory efficient attention kernel.



### Motivation and Context
Allows lower-end GPUs to run GQA with packed QKV input and rotary
embedding.
2024-03-23 14:30:35 -07:00
Hector Li
f977be0663
Fix issue that failed to load Conv node with external initializer (#20042)
### Description
Fix an issue where a Conv node with an external initializer failed to load.
Root cause: the model path was not provided when loading the weight and
bias tensors for Conv.
2024-03-23 13:43:20 -07:00
Satya Kumar Jandhyala
5b64d7c32b
[JS/WebGPU] Use non-matmul implementation for ConvTranspose in channel-first case. (#20022)
### Description
Avoid using vec4 Matmul implementation for ConvTranspose with channel-last



### Motivation and Context
2024-03-23 11:19:14 -07:00
Adrian Lizarraga
cdc5d72ba9
[QDQ Quant] Support mixed-precision integer quantization via overrides (#19925)
### Description
Adds support for specifying mixed precision QDQ models via tensor
quantization overrides.



### Motivation and Context
This PR implements an approach for supporting "mixed precision" models.
The following figure demonstrates an example mixed precision model as
defined in this PR.


![image](https://github.com/microsoft/onnxruntime/assets/19691973/40ae3bf9-b21a-4ba5-a1cd-41c1e08c21e7)

A mixed precision QDQ model consists of regions with different
activation/weight quantization data types. The boundary between regions
converts between activation quantization data types (e.g., uint8 to
uint16) using a DQ to Q sequence.

The ability to specify regions with different quantization data types
enables exploring the tradeoffs between accuracy and latency. A higher
integer precision may improve accuracy at the expense of latency, so
selectively promoting certain regions to a higher precision can aid in
achieving a desirable balance in key metrics.

#### Current support
By default, the ORT quantizer supports specifying default activation and
weight quantization data types for the entire model. A recent PR added
support for specifying basic quantization overrides at the tensor level
via the `extra_options["TensorQuantOverrides"]` configuration:

```
TensorQuantOverrides = dictionary :
    Default is {}. Set tensor quantization overrides. The key is a tensor name and the value is a
    list of dictionaries. For per-tensor quantization, the list contains a single dictionary. For
    per-channel quantization, the list contains a dictionary for each channel in the tensor.
    Each dictionary contains optional overrides with the following keys and values.
           'quant_type' = QuantType : The tensor's quantization data type.
           'scale' =  Float         : The scale value to use. Must also specify `zero_point` if set.
           'zero_point' = Int       : The zero-point value to use. Must also specify `scale` if set.
           'symmetric' = Bool       : If the tensor should use symmetric quantization. Invalid if also
                                      set `scale` or `zero_point`.
           'reduce_range' = Bool    : If the quantization range should be reduced. Invalid if also
                                      set `scale` or `zero_point`.
           'rmax' = Float           : Override the maximum real tensor value in calibration data.
                                      Invalid if also set `scale` or `zero_point`.
           'rmin' = Float           : Override the minimum real tensor value in calibration data.
                                      Invalid if also set `scale` or `zero_point`.
```
The tensor-level overrides are currently used to override the
quantization type for weights/initializers or to set specific
scale/zero-point values for a tensor (e.g., QNN requires Sigmoid to use
a specific scale/zero-point at its output).
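
The constraints listed in the option text above can be checked mechanically. A minimal sketch, assuming a hypothetical `check_override` helper that inspects one override dictionary; it is not the validation the ORT quantizer itself performs.

```python
def check_override(entry):
    """Return rule violations for one override dictionary."""
    errors = []
    has_scale = "scale" in entry
    has_zero_point = "zero_point" in entry
    # 'scale' and 'zero_point' must be given together.
    if has_scale != has_zero_point:
        errors.append("'scale' and 'zero_point' must be set together")
    # Calibration-related keys are invalid once scale/zero_point are fixed.
    if has_scale or has_zero_point:
        for key in ("symmetric", "reduce_range", "rmax", "rmin"):
            if key in entry:
                errors.append(f"'{key}' is invalid with 'scale'/'zero_point'")
    return errors


valid = check_override({"scale": 0.1, "zero_point": 0})
missing_pair = check_override({"scale": 0.1})
conflicting = check_override({"scale": 0.1, "zero_point": 0, "rmax": 1.0})
```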

However, these overrides are not typically used to override activation
quantization types due in large part to operator data type constraints.
Consider, for example, that all inputs and outputs to an Add operator
must be of the same data type. Consequently, using tensor-level
overrides to promote the Add’s output to 16-bit would force the inputs
to also be overridden to 16-bit. In turn, this would have a cascading
effect on potentially the entire graph. The solution implemented by this
PR is to allow the specification of tensor boundaries where the
activation quantization data type changes.

#### The approach
The following figure shows a model with a region that has been promoted
to 16-bit from the default 8-bit activation type.


![image](https://github.com/microsoft/onnxruntime/assets/19691973/5998c301-ae20-4ac9-8a43-37f335cfcf8b)

Note the following observations:
- Op2’s output is consumed by Op4, Op7, and Op8. Op4 consumes the
converted u16 type, while Op7 and Op8 consume the original u8 type.
- Op3’s output is converted from u8 to u16. Op5 consumes the converted
u16 type.
 - Op4’s output is just u16 (not converted).
 - Op5’s output is converted from u16 to u8. Op6 consumes the u8 type.

The approach implemented by this PR uses the tensor-level quantization
overrides to specify a tensor’s quantization type at both the producer
and consumer ends. **The following shows the overrides necessary to
create this mixed precision QDQ model.**

```python3
overrides = {
  "Op2_out": [{"quant_type": QUInt8, "convert": {"quant_type": QUInt16, "recv_nodes": {"Op4"}}}],
  "Op3_out": [{"quant_type": QUInt8, "convert": {"quant_type": QUInt16, "recv_nodes": {"Op5"}}}],
  "Op4_out": [{"quant_type": QUInt16}],
  "Op5_out": [{"quant_type": QUInt16, "convert": {"quant_type": QUInt8, "recv_nodes": {"Op6"}}}]
}
```
2024-03-23 11:05:08 -07:00
Changming Sun
3b4b99b90b
Fix a bug in WASM's GEMM (#20023)
### Description
Fix a bug in WASM's GEMM. The bug was found when running the
"ConvAddActivationFusionTests.ConvGemmDirect" unit test in a wasm build
with address sanitizer enabled. When CountK=25, CountN=1, lda=25, ldc=1,
the function I am modifying triggered an out-of-bounds read error.

The bug fix was provided by @fs-eire.
2024-03-23 08:53:50 -07:00
Xiaoyu
71551dacd5
Add ModelProto support for transformers optimize_model (#19990)
### Description
Add `ModelProto` support as an input to transformers `optimize_model`
API.



### Motivation and Context
Currently, the `optimize_model` API only accepts a model path as the
input model. However, for large models, saving and loading from disk can
be time-consuming. By adding `ModelProto` as an input option to the
`optimize_model` API, significant time can be saved.
2024-03-22 18:40:58 -07:00
Dmitri Smirnov
3076b56947
Make MS Debug engine SymInitialize() called as needed. (#20036)
### Description
Initialize Symbol engine as needed with no duplicate calls.

### Motivation and Context
Currently the Abseil library may call SymInitialize more than once
when shared libraries are involved. However, it can be
called only once per process. Our debug_alloc may also call it
when enabled. This change lets initialization proceed
only when needed, with no duplicate effort.
2024-03-22 16:17:47 -07:00
kunal-vaishnavi
f9cddd2cf5
Remove early stopping from LLaMA end-to-end benchmarking (#20033)
### Description
This PR removes early stopping from the end-to-end LLaMA-2 benchmark
script.

### Motivation and Context
This allows models to always generate the requested number of new
tokens.
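
The effect of removing early stopping can be sketched with a toy generation loop. This is not the benchmark script: `generate`, `make_stream`, and the token values are hypothetical, with token 2 playing the role of the end-of-sequence token.

```python
def generate(next_token, max_new_tokens, eos_token=None):
    """Collect tokens from `next_token()`; stop at `eos_token` if one is given."""
    tokens = []
    for _ in range(max_new_tokens):
        token = next_token()
        tokens.append(token)
        if eos_token is not None and token == eos_token:
            break  # early stopping
    return tokens


def make_stream():
    stream = iter([5, 9, 2, 7, 7, 7])  # 2 plays the role of EOS
    return lambda: next(stream)


with_early_stop = generate(make_stream(), 6, eos_token=2)
without_early_stop = generate(make_stream(), 6)
```

With early stopping, runs can produce different numbers of tokens, which makes per-run latency harder to compare; always generating the requested count keeps the measured workload fixed.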
2024-03-22 14:44:34 -07:00
Abhishek Jindal
7e84ba0ea3
remove const cast for DLManagedTensor (#20015)
### Description
Removing const_cast as it might lead to unknown behavior. Specifying
DLManagedTensor as const doesn't seem to be necessary, and I have tested
this by running torch_ort.configure. I'm not sure what other tests
need to be run. Background can be found in this
[PR](https://github.com/microsoft/onnxruntime/pull/19982)


### Motivation and Context
2024-03-22 10:39:19 -07:00
Baiju Meswani
2bc29244b4
Support model with multiple SCE loss nodes (#20016)
2024-03-22 10:28:44 -07:00
kunal-vaishnavi
6238e9c0af
Add LLaMA end-to-end benchmarking (#19985)
### Description

This PR adds a benchmarking script to measure end-to-end performance and
saves the results in a CSV file.

### Motivation and Context

With this PR, end-to-end performance can be easily measured for many
large-language models such as LLaMA-2. The performance numbers for
LLaMA-2 are located
[here](https://github.com/microsoft/onnxruntime-inference-examples/tree/main/python/models/llama).
2024-03-21 18:59:05 -07:00
sfatimar
eab35c20fc
Ort openvino npu 1.17 master (#19966)
### Description
Add NPU to the list of supported devices.
Added changes to support OV 2024.0.
NuGet packages no longer package the OpenVINO DLL.
Bug fixes for the Python API.
Reverted Dockerfiles that are not being maintained.



### Motivation and Context
NPU Device has been introduced by Intel in latest client systems
OpenVINO 2024.0 release is out.

---------

Co-authored-by: Suryaprakash Shanmugam <suryaprakash.shanmugam@intel.com>
Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>
Co-authored-by: Ubuntu <ubuntu@ubuntu-118727.iind.intel.com>
Co-authored-by: hmamidix <hemax.sowjanya.mamidi@intel.com>
Co-authored-by: vthaniel <vishnudas.thaniel.s@intel.com>
Co-authored-by: saurabhkale17 <saurabh1.kale@intel.com>
2024-03-21 18:44:00 -07:00
Yi Zhang
cd6d3aea45
Refactor Python CUDA packaging pipeline to fix random hangs in building (#19989)
### Description
1. Move building to a CPU machine.
2. Optimize the pipeline.
3. Since there is no official ONNX package for Python 3.12, the Python 3.12
test stage uses the packages built from ONNX source in the build stage.


### Motivation and Context
1. Resolve the random hang in compilation.
2. Save a lot of GPU resources.

---------
2024-03-22 09:16:00 +08:00
Changming Sun
dafbef3a21
CMake: support reading dependency zip files from a local mirror (#20005)
### Description
To test this feature, run 
```bat
python cmake\deps_update_and_upload.py --root-path mirror
```
Then run build.py as usual. 

The zip files will be cached locally to avoid being downloaded again and
again.
2024-03-21 17:58:59 -07:00
TP Boudreau
983fd8393a
Recognize NaN operands in Min and Max ops (#19984)
### Description
Update the Min and Max CUDA math operations on float/double types to
propagate NaNs: if either operand is NaN, the result should be NaN.

TODO: float16/bfloat16 need similar change.

### Motivation
Currently, results differ between the CPU and CUDA implementations of
the floating point Min and Max operators: the CPU operators correctly
return NaN results if either operand is NaN. This PR updates the CUDA
implementations to conform with this correct behavior.

See the issue and comments raised
[here](https://github.com/onnx/onnx/issues/6003).

### Context
Same behavior in numpy, torch and Java:
```
>>> numpy.min([numpy.NAN, 1])
nan
>>> numpy.max([numpy.NAN, 1])
nan

>>> torch.min(torch.tensor([1, float('nan')]))
tensor(nan)
>>> torch.max(torch.tensor([1, float('nan')]))
tensor(nan)
```

The C language functions [fmin](https://en.cppreference.com/w/c/numeric/math/fmin)
and [fmax](https://en.cppreference.com/w/c/numeric/math/fmax) behave
differently:
```
fmax(NaN,1) = 1
fmin(NaN,1) = 1
```
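
The two behaviors shown above can be contrasted in a short Python sketch. The function names are hypothetical; `nan_propagating_min` follows the rule this PR adopts for CUDA (NaN if either operand is NaN), while `fmin_like` mimics C's `fmin`, which returns the non-NaN operand.

```python
import math


def nan_propagating_min(a, b):
    """Return NaN if either operand is NaN, otherwise the smaller value."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    return a if a < b else b


def fmin_like(a, b):
    """Mimic C fmin: a NaN operand is treated as missing data."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return a if a < b else b


propagated = nan_propagating_min(math.nan, 1.0)
ignored = fmin_like(math.nan, 1.0)
```

A naive `a < b ? a : b` comparison is order-dependent when one operand is NaN, because every comparison with NaN is false; the explicit `isnan` checks remove that dependence.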

https://grouper.ieee.org/groups/msc/ANSI_IEEE-Std-754-2019/background/minNum_maxNum_Removal_Demotion_v3.pdf

![image](https://github.com/microsoft/onnxruntime/assets/30328909/62446cf1-f252-4ddc-8118-5ce605252331)

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2273.pdf
2024-03-21 16:08:18 -07:00
Yi Zhang
30a0d80925
Fix exception in Publish unit test results step (#20007)
### Description
Test results files are all in RelWithDebInfo\RelWithDebInfo directory.
It's not necessary to stat the directory of _deps 

### Motivation and Context
Recently this exception has occurred many times in the zip-nuget pipeline.
`##[error]Error: Failed find: EPERM: operation not permitted, stat
'D:\a\_work\1\b\RelWithDebInfo\_deps\flatbuffers-src\java\src\test\java\DictionaryLookup'`

https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=426981&view=logs&j=75fc0348-fe99-522b-3acb-90fd80ac5271&t=5d4ebcc1-bcde-574d-6f4e-8abd0f04ae4b
2024-03-22 06:53:59 +08:00
Tianlei Wu
06fe4f3113
Increase MNIST test tolerance (#20000)
### Description

Found multiple occurrences of failures:

https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1321061&view=logs&j=6df8fe70-7b8f-505a-8ef0-8bf93da2bac7&t=56a04c0b-9e7f-5c69-cb7b-c2a7b1a7392a&l=17537

https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1329701&view=logs&j=6df8fe70-7b8f-505a-8ef0-8bf93da2bac7&t=4f6ef737-111d-50d1-a46b-5f86d9a970bc&s=3618b4c0-1011-591a-85b8-671e72e2cff1

1: [ RUN      ] ModelTests/ModelTest.Run/
cuda__models_zoo_opset7_MNIST_model
1: D:\a\_work\1\s\onnxruntime\test\providers\cpu\model_tests.cc(358):
error: Expected equality of these values:
1:   COMPARE_RESULT::SUCCESS
1:     Which is: 4-byte object <00-00 00-00>
1:   ret.first
1:     Which is: 4-byte object <01-00 00-00>
1: expected -2.33638 (c0158735), got -2.30239 (c0135a47), diff:
0.0339923, tol=0.0243638 idx=9
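
The numbers in this log are consistent with a combined tolerance of the form atol + rtol * |expected|. The absolute and relative values below (1e-3 and 1e-2) are assumptions, chosen only because they reproduce the logged `tol`; they are not taken from the PR.

```python
expected = -2.33638
actual = -2.30239

ATOL = 1e-3  # assumed absolute tolerance
RTOL = 1e-2  # assumed relative tolerance

tolerance = ATOL + RTOL * abs(expected)  # about 0.0243638, matching the log
difference = abs(expected - actual)      # about 0.03399
within_tolerance = difference <= tolerance
```

The difference exceeds the tolerance by roughly 0.0096, which is why the comparison fails at `idx=9`.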
2024-03-20 23:40:27 -07:00
Prathik Rao
0b958bb421
add random seed to layernorm tests (#19998)
Adds random seed to layernorm tests to prevent random failure.
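
Why a fixed seed removes this kind of flakiness can be shown with a short Python sketch. The `make_test_input` helper and the seed value are hypothetical; the actual tests are C++.

```python
import random


def make_test_input(size, seed):
    """Generate reproducible pseudo-random test data."""
    rng = random.Random(seed)  # local generator; global state is untouched
    return [rng.uniform(-1.0, 1.0) for _ in range(size)]


first = make_test_input(4, seed=2345)
second = make_test_input(4, seed=2345)
```

With the same seed, every run sees identical inputs, so a tolerance that passes once keeps passing.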

### Motivation and Context
Fixes https://github.com/microsoft/onnxruntime/issues/19983
2024-03-20 21:00:25 -07:00
Yi Zhang
175f149b30
Remove downloading deps in CUDA package test stage (#19993)
### Description



### Motivation and Context
downloading deps is not needed in test stage
remove it to reduce random downloading errors
2024-03-21 10:01:03 +08:00
Justin Chu
0335ea9f1e
Use Java 11 to build project in the codeql pipeline (#19999)
Codeql uses Java 8 by default, which is too old for the project.

Related:

https://learn.microsoft.com/en-us/java/openjdk/reasons-to-move-to-java-11
https://github.com/actions/setup-java
2024-03-20 17:53:48 -07:00
Yufeng Li
15219e2e71
turn on neural_speed by default (#19627)
### Description
The crash caused by neural_speed turns out to be a rare corner case.
Turn it on by default.


### Motivation and Context
2024-03-20 12:49:58 -07:00
Rachel Guo
6b305f95e0
Support xcframework for mac catalyst builds. (#19534)
### Description



### Motivation and Context

MAUI on macOS uses mac-catalyst which requires a different native
binary.

---------

Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
2024-03-20 10:55:19 -07:00
Adam Pocock
19ff4a6d6c
String Tensor SplitToSequence fix (#19942)
2024-03-20 10:52:00 -07:00
Markus Tavenrath
0af5eacc8b
Fix broken Pooling CUDA NHWC Ops and ensure NCHW / NHWC parity. (#19889)
### Description
Fixed all CUDA NHWC Pooling operations which were broken and enabled the
NHWC CUDA pooling tests. Disabled all pooling tests which are not
supported by the CUDA EP.



### Motivation and Context
Ensure parity between CUDA NHWC / NCHW and work towards 100% tests
enabled for the CUDA EP / CUDA NHWC EP.

---------

Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
2024-03-20 09:57:29 -07:00
Yi Zhang
8adbc09314
[Fix] Error Python Packaging Pipeline (Training CPU) (#19992)
### Description
fix the error caused by
https://github.com/microsoft/onnxruntime/pull/19973
2024-03-20 09:02:50 -07:00
zesongw
7e18cb4c35
[WebNN EP] Support MatMul 1D (#19862)
### Description
Support MatMul 1D inputs by combining Reshape and ReduceMean.



### Motivation and Context
ONNX MatMul can support 1D inputs, which is disabled in
`IsOpSupportedImpl`.
2024-03-20 08:32:57 -07:00
Ye Wang
6ff31e06d5
[MoE] Add TP and Mixtral MoE (#19945)
### Description

1. Support Tensor Parallelism in ShardedMoE.
2. Make necessary code changes to support Mixtral MoE.
3. Fix a bug related to using IOBinding in test script.
4. Fix the input size limitation.

### Motivation and Context
2024-03-19 21:28:15 -07:00
mindest
3dfe4a5e6d
[ROCm] Remove MPI dependency and collectives to use NCCL (#19830)
### Description
* Remove MPI dependency to use NCCL AllReduce, etc.
* Exclude unsupported collectives in hipify
2024-03-19 17:35:18 -07:00