Commit graph

8678 commits

Author SHA1 Message Date
Edward Chen
c415bc725f
Add 'name' key to xcodebuild 'destination' option. (#15690) 2023-04-28 08:52:18 -07:00
Jian Chen
c401cf4b51
Fix issue there 9573-quantizing-distilbert-models-after-optimizing-with-ort-leads-to-invalid-node-input-names (#15659)

### Description
Fix issue where Quantizing DistilBERT models after optimizing with ORT
leads to invalid node input names



### Motivation and Context
2023-04-28 08:45:20 -07:00
Scott McKay
7e6331d5c7
Add ability to register custom ops from ORT extensions nuget package (#15696)
### Description
Add infrastructure so it's easy for a user to add the ORT extensions
nuget package and register the custom ops for C# apps.

### Motivation and Context
Need to be able to use extensions on mobile platforms with Xamarin/MAUI
2023-04-28 18:53:02 +10:00
Yulong Wang
94c9a31f83
[js/webgpu] fix download failure due to buffer change (#15723)
### Description
fix download failure due to buffer change.

WebAssembly buffer may change (growth triggered by memory allocation)
during an async function call.
2023-04-28 00:16:31 -07:00
Linnea May
2c3697be00
User/linneamay/reduce 18 (#15701)
### Description
Add registration for DML reduce functions in opset 18. 


### Motivation and Context

---------

Co-authored-by: Linnea May <linneamay@microsoft.com>
2023-04-27 20:32:11 -07:00
Changming Sun
5b826b1bc3
Update cmake version in Linux build (#15707)
### Description
All our Windows build pipelines already use cmake 3.26, except one
pipeline: QNN ARM64.
This PR does the same for the Linux build pipelines.

### Motivation and Context
This change is related to #15704 .
2023-04-27 20:02:33 -07:00
Edward Chen
9db24f8fec
Update kernel registration validation to allow kernel registrations to appear in arbitrary order. (#15705)
The validation script will now sort them by increasing opset order before processing them.
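
A minimal sketch of the sorting step described above, assuming each registration is a hypothetical `(domain, op_type, start_opset, end_opset)` tuple. The real validation script's data structures may differ, and the overlap check is only an illustrative stand-in for whatever processing follows the sort.

```python
from collections import defaultdict


def group_and_sort(registrations):
    """Group registrations by (domain, op_type) and sort each group by start opset."""
    grouped = defaultdict(list)
    for domain, op_type, start, end in registrations:
        grouped[(domain, op_type)].append((start, end))
    return {key: sorted(ranges, key=lambda r: r[0]) for key, ranges in grouped.items()}


def find_overlaps(sorted_ranges):
    """Return adjacent pairs whose opset ranges overlap; end=None means open-ended."""
    problems = []
    for (s1, e1), (s2, e2) in zip(sorted_ranges, sorted_ranges[1:]):
        if e1 is None or e1 >= s2:
            problems.append(((s1, e1), (s2, e2)))
    return problems


# Registrations listed in arbitrary order (hypothetical data).
registrations = [
    ("", "Relu", 14, None),
    ("", "Relu", 6, 12),
    ("", "Relu", 13, 13),
]
by_op = group_and_sort(registrations)
```

Because the groups are sorted first, the order in which registrations appear in the source no longer affects the result of the check.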
2023-04-27 18:49:31 -07:00
kunal-vaishnavi
39d6d7050d
Change EmbedLayerNormalization mask index output to optional (#15526)
### Description
This PR changes an EmbedLayerNormalization node's mask index output to
be an optional output if a mask input is not provided.



### Motivation and Context
The documentation for EmbedLayerNormalization states 
```
The last input mask is optional. If mask is provided, mask index (that is position of first 0 in mask, or number of words) will be calculated.
```
However, if the mask input is not provided, the mask index output is
still calculated and required.
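
As a rough illustration of the documented behavior, the sketch below computes the mask index as the position of the first 0 in each mask row and returns nothing when no mask is given. The function names are hypothetical and this is plain Python, not the ORT kernel.

```python
def mask_index(mask_row):
    """Position of the first 0 in a 0/1 mask row, or its length if there is none."""
    for position, value in enumerate(mask_row):
        if value == 0:
            return position
    return len(mask_row)


def optional_mask_index(mask=None):
    """Return per-row mask indices only when a mask is supplied (hypothetical helper)."""
    if mask is None:
        return None  # the output is treated as optional when there is no mask
    return [mask_index(row) for row in mask]
```

For a padded batch such as `[[1, 1, 1, 0, 0], [1, 1, 0, 0, 0]]`, this gives the number of words in each row.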
2023-04-27 16:32:42 -07:00
Yulong Wang
d471432e10
[js/webgpu] fix attribute cache key for 2 operators (#15710)
### Description
fix attribute cache key for LeakyRelu and ThresholdedRelu
2023-04-27 15:04:33 -07:00
Yulong Wang
c0116af619
[js/webgpu] operator Exp (#15713)
### Description
operator Exp
2023-04-27 15:04:09 -07:00
Tang, Cheng
627f5c9767
support allgather on different axis (#15610)
### Description
Extend the AllGather op to support performing allgather on a different axis,
and provide the implementation in the NCCL kernels.

### Motivation and Context
We hit some scenarios in distributed inference where we need to support
gathering on a non-first axis.
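
To make the difference between axes concrete, here is a small single-process sketch that mimics the result of an allgather by concatenating per-rank shards along a chosen axis. It uses plain nested lists and a hypothetical helper name; it does not call NCCL or the ORT op.

```python
def concat_along_axis(shards, axis):
    """Concatenate equally shaped 2-D shards (lists of rows) along axis 0 or 1."""
    if axis == 0:
        # Stack the rows of every shard one after another.
        return [list(row) for shard in shards for row in shard]
    if axis == 1:
        # Extend each row with the matching row from every shard.
        return [
            [value for shard in shards for value in shard[r]]
            for r in range(len(shards[0]))
        ]
    raise ValueError("this sketch only handles 2-D shards")


# One shard per rank (hypothetical data).
rank0 = [[1, 2], [3, 4]]
rank1 = [[5, 6], [7, 8]]
```

Gathering on axis 0 yields a 4x2 result, while gathering on axis 1 yields a 2x4 result from the same shards.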

---------

Co-authored-by: Cheng Tang <chenta@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
Co-authored-by: Wei-Sheng Chin <wschin@outlook.com>
2023-04-27 14:47:28 -07:00
Sheil Kumar
5bde1e8e37
Add Bluestein Z-Chirp Algorithm to DirectML DFT implementation (#15686)
Add Bluestein Z-Chirp Algorithm to DirectML DFT implementation

This will enable STFT and DFT on signals whose lengths are not powers of 2.
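
For readers unfamiliar with the technique, the following is a pure-Python sketch of Bluestein's algorithm: it rewrites a length-N DFT as a convolution with a chirp, which can then be evaluated with power-of-2 FFTs. It is a reference illustration only and says nothing about how the DirectML implementation is structured.

```python
import cmath


def fft_pow2(x, invert=False):
    """Recursive radix-2 FFT; len(x) must be a power of 2. No 1/n scaling."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_pow2(x[0::2], invert)
    odd = fft_pow2(x[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def bluestein_dft(x):
    """DFT of arbitrary length via chirp multiplication and circular convolution."""
    n = len(x)
    m = 1
    while m < 2 * n - 1:  # padded size: power of 2 large enough for the convolution
        m *= 2
    # Chirp w[k] = exp(-i*pi*k^2/n); k^2 is reduced mod 2n to keep the angle small.
    w = [cmath.exp(-1j * cmath.pi * ((k * k) % (2 * n)) / n) for k in range(n)]
    a = [x[k] * w[k] for k in range(n)] + [0j] * (m - n)
    b = [0j] * m
    for k in range(n):
        b[k] = w[k].conjugate()
        if k:
            b[m - k] = w[k].conjugate()  # negative indices wrap around
    fa, fb = fft_pow2(a), fft_pow2(b)
    conv = fft_pow2([p * q for p, q in zip(fa, fb)], invert=True)
    return [w[k] * conv[k] / m for k in range(n)]


def naive_dft(x):
    """O(n^2) reference DFT used only to check the result."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]
```

The identity behind it is `j*k = (j^2 + k^2 - (k - j)^2) / 2`, which turns the DFT sum into a convolution of the chirped input with the conjugate chirp.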
2023-04-27 14:03:40 -07:00
Adrian Lizarraga
be5c582e65
[QNN EP] Update to QNN SDK 2.9.0 (#15709)
### Description
- Update to QNN SDK 2.9.0 for QNN pipelines
- Temporarily disable warnings as errors for QNN Windows x64 pipeline
- Note that this pipeline did not previously run to completion. It also
currently does not run for pull requests.

### Motivation and Context
Need to update and test the latest available version of the QNN SDK.
2023-04-27 13:44:09 -07:00
RandySheriffH
9773e76c44
Single-schema-multi-kernel (#15184)
This PR allows custom ops with different input types to share the same op
name in a graph.
The idea is to go over all ops of the same name and merge their input/output
types into a type-inference function.
With this enhancement, custom op nodes inside a graph can have the same
op-type, provided that their input/output types are different.

---------

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-04-27 13:39:59 -07:00
Changming Sun
d3d232b047
Rename onnxruntime-Linux-CPU-2019 machine pool (#15691)
Rename the onnxruntime-Linux-CPU-2019 machine pool to
"onnxruntime-Ubuntu2004-AMD-CPU". The old one hit an internal error and is
stuck there; I cannot make any changes to it. It has been like this for
more than a week, so I created a new pool with the same settings under
a different name.
Also, move some Android pipelines to
"onnxruntime-Linux-CPU-For-Android-CI", which uses a standard image from
https://github.com/actions/runner-images
2023-04-27 12:46:18 -07:00
Chi Lo
a957a872d3
Patch fix for the newly added TRT EP provider options (#15687)
We missed some code changes for the recently added TRT EP provider options.
2023-04-27 10:36:01 -07:00
Changming Sun
d3e8d7a70d
Better support for cmake 3.26 and Windows ARM64 (#15704)
### Description

In #8953 I introduced a change in our onnxruntime_mlas.cmake that
enables the "ASM_MASM" cmake language for all Windows builds.
```cmake
enable_language(ASM_MASM)
```
Before the change, it was only enabled when onnxruntime_target_platform
equals x64.

However, cmake 3.26 added a new language:  ASM_MARMASM.

According to cmake's manual,
ASM_MASM is for Microsoft Assembler
ASM_MARMASM is for Microsoft ARM Assembler. This one is new in cmake
3.26.

We should choose the right one according to
${onnxruntime_target_platform}.
2023-04-27 10:25:45 -07:00
yf711
2e1f92a986
Fix EP Perf pipeline (#15507)
### Description
* Update TensorRT 8.6 lib dependencies in dockerfile of TRT EP Perf
pipeline
* Avoid using `--allow_running_as_root` and build ORT with non-root user


### Motivation and Context
To fix the build issue on EP perf pipeline

Fixed
[AB#14615]
2023-04-27 10:09:14 -07:00
Yi Zhang
8cda1ffa28
Fix error in post-merge pipeline (#15717)
### Description
Get the right drive letter on Windows

### Motivation and Context
Build Directory might be in drive C
2023-04-27 10:05:15 -07:00
cloudhan
a952419674
[ROCm] Fix FusedConv to stop caching fusion args (#15671)
The following code shows that the ROCm EP FusedConv produces incorrect results:
```py
import numpy as np
import onnx
import onnxruntime as ort

X = onnx.helper.make_tensor_value_info("input", onnx.TensorProto.FLOAT, [1, 64, 55, 55])
a = onnx.helper.make_tensor_value_info("tmp", onnx.TensorProto.FLOAT, [1, 64, 55, 55])
Y = onnx.helper.make_tensor_value_info("output", onnx.TensorProto.FLOAT, [1, 64, 55, 55])

weight_data = np.random.random([64, 64, 1, 1]).astype(np.float32)
weight1 = onnx.helper.make_tensor("weight1", onnx.TensorProto.FLOAT, [64, 64, 1, 1], weight_data)
bias_data = np.random.random(64).astype(np.float32)
bias1 = onnx.helper.make_tensor("bias1", onnx.TensorProto.FLOAT, [64], bias_data)

weight_data = np.random.random([64, 64, 1, 1]).astype(np.float32)  # <------ comment out
weight2 = onnx.helper.make_tensor("weight2", onnx.TensorProto.FLOAT, [64, 64, 1, 1], weight_data)
bias_data = np.random.random(64).astype(np.float32)  # <------ comment out
bias2 = onnx.helper.make_tensor("bias2", onnx.TensorProto.FLOAT, [64], bias_data)

node1 = onnx.helper.make_node("FusedConv", inputs=[X.name, weight1.name, bias1.name], outputs=[a.name], domain="com.microsoft", kernel_shape = [1,1], activation="Relu")
node2 = onnx.helper.make_node("FusedConv", inputs=[a.name, weight2.name, bias2.name], outputs=[Y.name], domain="com.microsoft", kernel_shape = [1,1], activation="Relu")

graph = onnx.helper.make_graph([node1, node2], "Graph", [X], [Y], initializer=[weight1, bias1, weight2, bias2])

model = onnx.helper.make_model(graph, producer_name="tmp", opset_imports=[
    onnx.helper.make_opsetid('com.microsoft', 1), 
    onnx.helper.make_opsetid('ai.onnx.ml', 1), 
    onnx.helper.make_opsetid('', 14),
])

sess0 = ort.InferenceSession(model.SerializeToString(), providers=["CPUExecutionProvider"])
sess1 = ort.InferenceSession(model.SerializeToString(), providers=["ROCMExecutionProvider"])

ref = sess0.run(["output"], {"input" : 0.05 * np.ones([1, 64, 55, 55], dtype=np.float32)})[0]
our = sess1.run(["output"], {"input" : 0.05 * np.ones([1, 64, 55, 55], dtype=np.float32)})[0]

print(ref - our)
```

The root cause is that the fusion args are cached together with the fusion plan.
It seems that, internal to MIOpen, the `miopenOperatorArgs_t` handle is
copied directly to the execution engine, instead of the content of the
`miopenOperatorArgs_t`. If two ORT `OpKernel`s have the same conv kernel
spatial dimensions, strides, etc., we get the same hash for the
fusion plan, and thus also the same fusion args handle. The
second `FusedConv` node may then modify the fusion args on the fly while
the first `FusedConv` node is still pending execution inside
MIOpen. This PR moves the fusion args out of the fusion plan cache to avoid
the problem.
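
The shape of the fix can be sketched in a few lines of Python: the plan stays in a shared cache keyed by the configuration, while the mutable args object is owned by each kernel. All class and function names here are hypothetical stand-ins, not MIOpen or ORT APIs.

```python
class FusionPlan:
    """Stand-in for a compiled fusion plan; safe to share between kernels."""

    def __init__(self, config):
        self.config = config


class FusionArgs:
    """Stand-in for a mutable args handle holding per-node values (e.g. bias)."""

    def __init__(self):
        self.bias = None


_plan_cache = {}


def get_plan(config):
    """Return the cached plan for this config, creating it on first use."""
    if config not in _plan_cache:
        _plan_cache[config] = FusionPlan(config)
    return _plan_cache[config]


class FusedConvKernel:
    def __init__(self, config, bias):
        self.plan = get_plan(config)  # shared: depends only on the config
        self.args = FusionArgs()      # not cached: owned by this kernel
        self.args.bias = bias


# Two nodes with identical conv settings but different weights/bias.
config = ("conv", (1, 1), (1, 1), "Relu")
node1 = FusedConvKernel(config, bias="bias1")
node2 = FusedConvKernel(config, bias="bias2")
```

With the args outside the cache, setting up the second node can no longer overwrite values the first node is still waiting to use.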
2023-04-27 23:20:25 +08:00
pengwa
2efb75bfe9
Fold shape related operation (#14936)
### Fold shape-related operations on a best-effort basis

This is a follow-up to PR
https://github.com/microsoft/onnxruntime/pull/12561.
It creates a specialized shape_optimizer to constant fold shape-related
operations.
ShapeOptimizer makes a best effort to constant fold the dim values that
are known from shape inference. This helps simplify the graph, which in
turn helps other graph transformers do more.

The transformer traverses the graph top-down and performs shape
optimizations.
It makes a best effort to constant fold the subgraphs fed by Shape node
outputs:
1. Shape generates the 1D tensor [12, 128, 512] (all dimensions have
concrete dim values), which can be constant folded
to an initializer holding the 1D tensor values [12, 128, 512]. (Some logic
in ConstantFolding does the same thing.)
2. Shape generates the 1D tensor [batch_size, 128, 512] ->
Slice(start=1,end=3); we can constant fold the Shape->Slice to
  an initializer holding the 1D tensor values [128, 512].
3. Shape generates the 1D tensor [batch_size, 128, 512] -> Gather(axes=[0],
index=[2]); we can constant fold the
  Shape->Gather to an initializer holding the 1D tensor values [512].
4. Shape 15 takes an input of shape [batch_size, 128, 512] and slices from 1
to 2 (exclusive); we can constant fold the
Shape15(start=1,end=2) to an initializer holding the 1D tensor values
[128].
This helps clean up the graph; combined with ConstantFolding, the
graph becomes much simpler.
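
The core rule in cases 2 to 4 can be sketched as follows: a Shape->Slice or Shape->Gather result is foldable when every dimension it selects is concrete, even if other dimensions are symbolic. The helper names are hypothetical and the inferred shape is modelled as a plain list in which strings stand for symbolic dims.

```python
def fold_shape_slice(dims, start, end):
    """Return the sliced dims as constants, or None if any sliced dim is symbolic."""
    selected = dims[start:end]
    if all(isinstance(d, int) for d in selected):
        return selected
    return None


def fold_shape_gather(dims, indices):
    """Return the gathered dims as constants, or None if any is symbolic."""
    selected = [dims[i] for i in indices]
    if all(isinstance(d, int) for d in selected):
        return selected
    return None


# Inferred shape with a symbolic batch dimension.
inferred = ["batch_size", 128, 512]
```

Slicing `[1:3]` or gathering index 2 avoids the symbolic `batch_size`, so both fold to constants, whereas the full shape does not.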


### Motivation and Context



One direct motivation to have this is, we have a model subgraph like
this:

![image](https://user-images.githubusercontent.com/10530022/223390243-47b13922-4340-4999-9637-f52a33f69a2d.png)

The subgraph in the green rectangle is trying to get the value `30522`.
With the changes in this PR, the subgraph will be constant folded. The
ConstantFolding optimizer will then further optimize out the subsequent
`Squeeze`/`Unsqueeze`/`ConcatTraining`, leaving a very
clean Reshape node whose shape input is a constant `[-1, 30522]`.

Having this simplified graph, our other compute optimizer can help
further optimize the graph by re-ordering gather/reshape nodes.
2023-04-27 18:59:28 +08:00
Yi Zhang
53ff50d19a
make nuget workflow easy to debug. (#15693)
### Description
Add parameters so that some stages can use another run's intermediate
output.

### Motivation and Context
The nuget workflow has 38 stages in 4 layers.
We had to run the whole workflow from the beginning to test one stage.
Being able to run only one stage for testing makes life easier,
like

![image](https://user-images.githubusercontent.com/16190118/234453721-e6e9a4bd-5e0b-4101-a18e-d5cf60615c9f.png)

### N.B.
In this PR, Nuget_Test_Linux_CPU, Nuget_Test_LinuxGPU and
Jar_Packaging_GPU are enabled as the first step.
So I can start to move tests from Linux host to container
2023-04-27 14:54:14 +08:00
Ted Themistokleous
926ae7d786
Add updated skipped test for multiheadattention Packed KV & QKV (#15587)
Adds a skip for MIGraphX EP builds for the Packed KV and QKV tests in
Multi Head attention, as they are not supported and cause CI failures
when building and testing EPs.

---------
Co-authored-by: Ted Themistokleous <tthemist@amd.com>
2023-04-27 10:31:53 +08:00
Changming Sun
e63bb5acef
Fix a memory leak in QGemm (#15703)
### Description
The BufferUniquePtrs in the old code do not know which allocator
the allocated memory came from, so they cannot free the
memory.
2023-04-26 18:48:00 -07:00
Rachel Guo
740d553c42
[rn] Reland support loading model from buffer for Android (#14514)
### Description

Reland the previously reverted changes for loading a model from a buffer on Android


### Motivation and Context

#13903

---------

Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>
Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
2023-04-26 16:53:17 -07:00
Yulong Wang
a02c885f86
[js/webgpu] add implementation of Relu, LeakyRelu and ThresholdedRelu (#15668)
### Description
add implementation of Relu, LeakyRelu and ThresholdedRelu
2023-04-26 15:11:01 -07:00
Justin Chu
76ddc92fbd
Enable RUFF as a formatter (#15699)
### Description

RUFF can now format since lintrunner-adapters v0.8. Removed the RUFF-FIX
linter.



### Motivation and Context

Better engineering
2023-04-26 14:04:07 -07:00
Yufeng Li
d7ba9814cf
[prefast:Warning]: C26409 ('PackedAttention<onnxruntime::MLFloat16>::TryGettingFusedRunner') (#15663)
### Description



### Motivation and Context
2023-04-26 14:03:36 -07:00
Patrice Vignola
97c4cab6b7
[DML EP] Massage SkipLayerNorm axes to better target metacommands (#15676)
DML's MVN metacommand needs all axes except for batch and channel to be
reduced. By adding trailing dimensions of 1's and their corresponding
axes, the operation stays the same but we are now able to call
metacommands.
2023-04-26 14:00:36 -07:00
Hector Li
4c7b5032da
[QNN EP]Support unpack initializer from external data source (#15694)
### Description
Support unpack initializer from external data source

### Motivation and Context
Support unpack initializer from external data source
2023-04-26 13:39:40 -07:00
yf711
28985c47b7
[TensorRT EP] Unleash opset16-17 onnx model tests (#15657)
### Description
In 2021 we restricted the onnx node test CI execution to the opset range
14-15 for ORT-TRT, which was the latest opset range that the TRT EP could support.

Update this range to opset 14-17 to improve the ORT-TRT unit test
coverage, as [Nvidia announced that TRT 8.6 supported
opset17](https://github.com/onnx/onnx-tensorrt/blob/main/docs/operators.md)
2023-04-26 11:44:19 -07:00
kunal-vaishnavi
cfb8c0e2ca
Add Whisper custom export to wheel (#15685)
### Description
This PR adds the Whisper custom export scripts to the wheel.



### Motivation and Context
This enables access to the custom export scripts in the wheel.
2023-04-26 10:45:52 -07:00
yf711
d701dcd027
Fix Linux MultiGPU TensorRT CI (#15697)
### Description
* Reverting default TensorRT version to 8.5 as temporary fix
  
* Apart from that, this PR temporarily leaves this CI as a place to
validate user behavior that uses TRT 8.5 with latest ORT

### Context
* This CI pool is equipped with 2xTesla M60 GPUs, which are no longer supported by
TensorRT 8.6.
* Currently, other CIs are using single-T4 VM but there's no VM with
2xT4 or other suitable dualGPU in the range.
* Once we decide which VM instance for this CI to migrate to, TRT8.6 can
be enabled on this CI

* According to
[Nvidia](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/index.html):
* TensorRT 8.5.3 was the last release supporting NVIDIA Kepler (SM 3.x)
and NVIDIA Maxwell (SM 5.x) devices. *These devices are no longer
supported in TensorRT 8.6*. NVIDIA Pascal (SM 6.x) devices are
deprecated in TensorRT 8.6.
2023-04-26 10:01:33 -07:00
PeixuanZuo
0ecfe83932
[ROCm] add beam search support (#15625)
add beam search support for ROCm EP.
2023-04-26 17:53:33 +08:00
Xavier Dupré
699c9a520b
Fix TVM pipelines (#15653)
### Description
Fix TVM pipelines by adding a missing dependency of TVM (attrs).
2023-04-26 09:55:05 +02:00
Yulong Wang
b98317b907
[js/webgpu] following up for JSEP/WebGPU code cleanup (#15666)
### Description
This PR resolves a part of non-critical comments from code review
comments in #14579.

- use `USE_JSEP` instead of `USE_JS` in build definition to make it less
ambiguous
- remove unused util functions from util.ts
- fix transpose.h
- other misc fixes
2023-04-25 21:20:03 -07:00
sfatimar
ebaafac3f5
Openvino ep ort 5.0 (#15626)
### Description
This PR updates the OpenVINO Execution Provider:
- Adds VPU support
- Fixes bugs for GPU and CPU
- Changes the OpenVINO Backend in the Serialized Model API for faster First
Inference Latency
- Deprecates HDDL-VADM and MYRIAD and removes the related code
- Supports OpenVINO 2023.0
- Supports dynamic shapes for iGPU

### Motivation and Context
- VPU is an upcoming hardware that can provide AI Acceleration for
Client Systems through OpenVINO

---------

Signed-off-by: MaajidKhan <n.maajid.khan@intel.com>
Co-authored-by: Suryaprakash Shanmugam <suryaprakash.shanmugam@intel.com>
Co-authored-by: MaajidKhan <n.maajid.khan@intel.com>
Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>
2023-04-25 20:59:42 -07:00
Changming Sun
b1b6e5522e
Update cuda 11.6 to 11.8 for Windows pipelines (#15684)
### Description
Update cuda 11.6 to 11.8 for Windows pipelines
This PR is just for the Windows CUDA pipelines. It does not include any changes
for the Linux pipelines or TensorRT pipelines.

### Motivation and Context
It is a planned feature for the upcoming ONNX Runtime release.
2023-04-25 20:23:57 -07:00
Rui Ren
db6a9bc033
support latest deepspeed version for optim (#15682)
### Description

support the latest deepspeed 0.9.1 for the next release


### Motivation and Context
This avoids the warning message `Skip modifying optimizer because of
unsupported DeepSpeed version`

---------

Co-authored-by: ruiren <ruiren@microsoft.com>
2023-04-25 20:12:23 -07:00
Hector Li
3dc9720cfc
[QNN EP] Enable Qnn EP op support Elu, HardSwish, Atan (#15681)
### Description
Enable some Ops for QNN EP: Elu, HardSwish, Atan

### Motivation and Context
unblock more models
2023-04-25 20:11:06 -07:00
Wei-Sheng Chin
1524f73a09
Implement two easier random tensor generator (RTG) for flaky tests (#15517)
Some math ops have very bad numerical stability and essential randomness
(e.g., exp/log with reduction on large elements). To maintain the same
test coverage with lower CI failing rate, we can gradually replace flaky
tests' RTG with the ones implemented in this PR --- try Discrete first.
If still unstable, use Circular.

Overall recommended strategy to handle flaky test
- Find if it uses `Uniform` in
`onnxruntime/test/common/tensor_op_test_utils.h`. If yes, replace
`Uniform` with `Discrete` implemented in this PR. For
`candidate_values`, we can try `[-2, -1.5, -1, -0.5, 0, 0.5, 1, 1.5,
2]`, `[-2, -1, 0, 1, 2]`, `[-1, 0, 1]`, and `[0, 1]` and choose the most
difficult one among those passing 100 runs.
- If `Discrete` fails to meet the stability requirement, switch to
`Circular` and repeat the `candidate_values` selection process.

Let's keep an eye on the two bugs mentioned in
https://github.com/microsoft/onnxruntime/pull/15515. If the related unit
tests fail again, we can replace the underlying
`RandomValueGenerator::Uniform` with
`FixedPatternValueGenerator::Discrete` or
`FixedPatternValueGenerator::Circular` implemented in this PR.
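
As a rough picture of the two generators, the sketch below assumes `Discrete` draws each element at random from the candidate values and `Circular` cycles through them in order. These are Python stand-ins with hypothetical signatures; the C++ test utilities in the PR may behave differently in detail.

```python
import random


def discrete(count, candidate_values, seed=0):
    """Draw each element at random from a small fixed set of candidate values."""
    rng = random.Random(seed)  # seeded so a failing run can be reproduced
    return [rng.choice(candidate_values) for _ in range(count)]


def circular(count, candidate_values):
    """Cycle through the candidate values in order, with no randomness at all."""
    return [candidate_values[i % len(candidate_values)] for i in range(count)]
```

Restricting inputs to a few well-behaved values keeps ops such as exp/log with reductions away from the extreme inputs that tend to cause numerical flakiness.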
2023-04-25 17:52:44 -07:00
Numfor Tiapo
f44f6c5b2e
Fix Prefast Errors (#15651)
This PR adds fixes for prefast errors with the following codes:

- C26814
- C26451
- C26400
2023-04-25 16:41:39 -07:00
Rui Ren
4c3e350a6a
fix ORTModuleONNXModelException fallback OOM (#15523)
### Description
### Error 
```
RuntimeError: There was an error while exporting the PyTorch model to ONNX:-

Traceback (most recent call last):
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/onnxruntime/training/ortmodule/_utils.py", line 254, in get_exception_as_string
    raise exception
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/onnxruntime/training/ortmodule/_graph_execution_manager.py", line 385, in _get_exported_model
    torch.onnx.export(self._flattened_module,
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/torch/onnx/__init__.py", line 305, in export
    return utils.export(model, args, f, export_params, verbose, training,
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/torch/onnx/utils.py", line 118, in export
    _export(model, args, f, export_params, verbose, training, input_names, output_names,
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/torch/onnx/utils.py", line 743, in _export
    proto, export_map, val_use_external_data_format = graph._export_onnx(
RuntimeError: ONNX export failed: Couldn't export Python operator XDropout
```
The error leads to an Out of Memory issue, because the log.txt file grows to **26
GB**.


### Motivation and Context
The root cause is that in each `_forward`
```
      if log_level <= _logger.LogLevel.WARNING and not self._raised_ORTModuleONNXModelException:
          warnings.warn(
              (
                  f"Fallback to PyTorch due to exception {type(self._exception)} was triggered. "
                  "Report this issue with a minimal repro at https://www.github.com/microsoft/onnxruntime. "
                  f"See details below:\n\n{_utils.get_exception_as_string(self._exception)}"
              ),
              UserWarning,
          )
```


the code above is called and logs the `exception` through
`get_exception_as_string`.

In my training case, this leads to the `Traceback` being printed to stdout 40 k times
and 110 million lines of `onnx graph` output, and the run hits OOM.

### Validation

After the fix above, the log.txt file is only **2.4 MB**.
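
The warn-once pattern behind this kind of fix can be sketched as below. The class and the `_warned` flag are hypothetical names chosen for illustration; they are not ORTModule's actual attributes.

```python
import warnings


class FallbackManager:
    """Hypothetical stand-in for the object that handles fallback to PyTorch."""

    def __init__(self, exception):
        self._exception = exception
        self._warned = False  # hypothetical flag name

    def forward(self):
        if not self._warned:
            # Emit the (potentially very long) message only on the first call.
            warnings.warn(
                f"Fallback to PyTorch due to exception {type(self._exception)}.",
                UserWarning,
            )
            self._warned = True
        return "pytorch-result"


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # disable Python's own de-duplication
    manager = FallbackManager(RuntimeError("ONNX export failed"))
    for _ in range(1000):
        manager.forward()
```

Even with the `"always"` filter, only one warning is recorded for the 1000 calls, so the log no longer grows with the number of training steps.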

---------

Co-authored-by: ruiren <ruiren@microsoft.com>
2023-04-25 15:10:31 -07:00
Yulong Wang
d30831d829
[js/webgpu] make RunFunction return void (#15669)
### Description
make `RunFunction` return `void`.

The return value is meaningless in the OpResolveRule context. This allows any
JavaScript error to be caught and a non-zero value to be returned from
`computeKernel()`.
2023-04-25 14:14:26 -07:00
Chen Fu
2fa10fb803
Fp16 onnx pool operators, relu, leakyrelu (#15498)
### Description
Adding the fp16 onnx operator implementations:
 maxpool, averagepool, global average pool, relu, leaky relu


### Motivation and Context

Continue with support for fp16. Standard onnx operator implementations are needed as a basis for the graph optimizers to work.
2023-04-25 14:01:47 -07:00
Changming Sun
9bf08bdb52
Fix iconv link issue (#15592)
### Description
Fix iconv link issue. The library is used in string_normalizer.cc. 

### Motivation and Context
Though iconv is part of POSIX standard, some systems may have additional iconv providers, for example GNU iconv, that is not in the standard c runtime library. In these cases we may need to link to additional libraries. 
However, this change has two caveats:
1. It may silently pull in GNU libraries into libonnxruntime.so,  and make the shared library not distributable. 
2. The detection of iconv library runs before we add additional include folders to ORT. So the detection may be inaccurate.
2023-04-25 13:28:36 -07:00
Ye Wang
d05777ddb6
stabilize fusion script with a separate create_attention_node() (#15670)
### Description

Previously it used create_attention_node() from the base class in
fusion_attention.py. Changes in that file could sometimes silently
lead to generating a bad model.

### Motivation and Context

---------

Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>
2023-04-25 13:07:58 -07:00
Baiju Meswani
5885abfb35
Training Documentation (#15612) 2023-04-25 11:44:12 -07:00
Ye Wang
d00197aaa7
initialize cache_indir explicitly in beamsearch with encoder decoder model (#15667) 2023-04-25 11:05:21 -07:00
Chi Lo
e1755541cc
Fix TRT timing cache test (#15588)
The TRT EP test for the timing cache has wrong logic: it enables the timing
cache for both sessions when comparing the TRT engine build time, which is why
CI got some intermittent failures.

This PR disables the timing cache test that compares the engine build
time with the timing cache enabled and disabled, until we find a model that
can benefit from the timing cache.
2023-04-25 10:20:26 -07:00