Commit graph

8914 commits

Author SHA1 Message Date
Dale Phurrough
6e1c3003ff
DML EP and MLAS buffer allocator - increase alignment to 64 bytes for AVX-512 processing (#15141)
Fixes #13119 top concerns by

* using `onnxruntime::AllocatorDefaultAlloc` instead of `malloc`
* set `MLAS_DEFAULT_PREFERRED_BUFFER_ALIGNMENT=64` which cascades that
value
  to several members and functions not directly related to MLAS.

### Motivation and Context

* Fixes #13119 top concerns. Otherwise, alignment is to 16 bytes circa
1990s 👴
* Does not yet enable flexible alignment. Instead fixed at 64 (64 x 8
bits=512 bits) for modern NN hardware like AVX-512
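
The arithmetic behind the 64-byte figure can be sketched in a few lines. This is only an illustration of rounding sizes and checking addresses against a power-of-two alignment; the helper names `align_up` and `is_aligned` are hypothetical and are not part of MLAS or the DML EP.

```python
ALIGNMENT = 64  # bytes; 64 * 8 bits = 512 bits, the width of one AVX-512 register


def align_up(value: int, alignment: int = ALIGNMENT) -> int:
    """Round a non-negative value up to the next multiple of a power-of-two alignment."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (value + alignment - 1) & ~(alignment - 1)


def is_aligned(address: int, alignment: int = ALIGNMENT) -> bool:
    """Check whether an address is a multiple of the alignment."""
    return address % alignment == 0
```

An address that only satisfies the old 16-byte alignment, such as 16 or 48, fails `is_aligned` at 64 bytes, which is the case the commit is meant to rule out.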
2023-06-01 16:32:55 -07:00
Adrian Lizarraga
5a4c3b7937
[QNN EP] Support Equal, Less, LessOrEqual, Greater, GreaterOrEqual operators on HTP backend (#16171)
### Description
- Updates QDQ transformer to handle QDQ logical operators (Equal, Less,
LessOrEqual, Greater, GreaterOrEqual).
  - Expects 2 DQ inputs and no Qs in the output, which is boolean.

### Motivation and Context
This is needed to enable QDQ models with logical comparison operators to
run on QNN EP.
2023-06-01 15:07:15 -07:00
Hector Li
f72dc198c6
[QNN EP]Add UT for cached Qnn context binary (#16184)
### Description
1. Add UT for cached Qnn context binary
2. Minor change: set model path to "" if model_path is not available
since the model could be loaded from buffer instead of Onnx file

### Motivation and Context
Support more scenarios.

---------

Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com>
2023-06-01 14:28:46 -07:00
Changming Sun
5bfa1183d1
Add a Memory Profiling build job in post merge pipeline (#16172)
### Description
1. Add a Memory Profiling build job
2. Remove no absl build job since the feature will be removed
3. Simplify post-merge-jobs.yml by unifying the pool names

### Motivation and Context
To catch build errors in #16124
2023-06-01 13:00:44 -07:00
Alexander Visheratin
e6c6184fee
[JS/WebGPU] Unsqueeze operator implementation (#16138)
### Description

This PR adds an implementation of the `Unsqueeze` operator to WebGPU JSEP.
The implementation follows the [operator
schema](https://github.com/onnx/onnx/blob/main/docs/Operators.md#Unsqueeze).

To implement the `Unsqueeze` operator in the same fashion as the
`Squeeze`, I added the `ComputeOutputShape()` method to the
`UnsqueezeBase` class and made some slight modifications. Please let me
know if it is a bad idea and if I should move this method to the JS
implementation.

I also uncommented test case lines in the `suite-test-list.jsonc` file
for both Squeeze and Unsqueeze operators following @hariharans29's
[comment](https://github.com/microsoft/onnxruntime/pull/16024#issuecomment-1565113633).

### How was it tested

1. I created a model with only one operator:

```Python
import onnx.helper

node = onnx.helper.make_node(
    "Unsqueeze",
    inputs=["T", "axes"],
    outputs=["y"],
)
graph = onnx.helper.make_graph([node], "test", [onnx.helper.make_tensor_value_info("T", 1, [3, 4, 5]), onnx.helper.make_tensor_value_info("axes", 7, [2])], [onnx.helper.make_tensor_value_info("y", 1, [3, 1, 4, 5, 1])])
onnx.save(onnx.helper.make_model(graph), "unsqueeze.onnx")
```

2. I compiled the runtime using @fs-eire's
[instructions](https://gist.github.com/fs-eire/a55b2c7e10a6864b9602c279b8b75dce).
3. I ran the test models in the browser using this minimal setup:
```HTML
<html>
    <script src=".\dist\ort.webgpu.min.js"></script>
    <script>
        async function run() {
            const session = await ort.InferenceSession.create('unsqueeze.onnx', {executionProviders: ['webgpu']});
            console.log(session);
            const input = new ort.Tensor('float32', new Float32Array(60), [3, 4, 5]);
            const dim = new ort.Tensor('int64', [1n, 4n], [2]);
            const output = await session.run({ "T": input, "axes": dim });
            console.log(output);
        }
        run();
    </script>
</html>
```

### Motivation and Context

Improve operator coverage for WebGPU JSEP.
2023-06-01 12:23:02 -07:00
Changming Sun
5b08176314
Exclude shufflenet from DNNL's model tests (#16126) 2023-06-01 10:56:24 -07:00
FFFrog
d185bf444d
[CANN] Add IOBinding Support For CANN EP (#15802)
### Description
Add IOBinding Support For CANN EP

### Motivation and Context
Users can now use the IOBinding feature to speed up inference on CANN.
2023-06-01 03:13:38 -07:00
FFFrog
8c85d990c2
add third-party pipeline status to README.md (#16155)
Refer to this
[issue](https://github.com/microsoft/onnxruntime/issues/16154), please.
2023-05-31 22:14:39 -07:00
PeixuanZuo
1b518c6836
[ROCm] add early stop to tunable profile progress (#15716)
For TunableOp, some instances may perform very badly and take a long
time during the profiling process.
Add a `tunable_op_max_tuning_duration_ms` parameter to limit the maximum
tuning time.
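
A time-budgeted profiling loop of this kind could look roughly like the sketch below. It is not the ROCm EP's tuning code: `pick_fastest` and the injectable `clock` argument are hypothetical, and only the parameter name `max_tuning_duration_ms` echoes the option introduced by the commit.

```python
import time
from typing import Callable, Optional, Sequence


def pick_fastest(
    candidates: Sequence[Callable[[], None]],
    max_tuning_duration_ms: float,
    clock: Callable[[], float] = time.perf_counter,
) -> Optional[int]:
    """Return the index of the fastest candidate that was profiled within the budget."""
    budget_s = max_tuning_duration_ms / 1000.0
    start = clock()
    best_index, best_time = None, float("inf")
    for index, candidate in enumerate(candidates):
        # Early stop: once the budget is spent, skip the remaining candidates.
        # The first candidate is always profiled so that a result exists.
        if index > 0 and clock() - start >= budget_s:
            break
        t0 = clock()
        candidate()
        elapsed = clock() - t0
        if elapsed < best_time:
            best_index, best_time = index, elapsed
    return best_index
```

The trade-off is that a tight budget may stop before a faster candidate is reached, so the result is the best instance seen so far rather than the global best.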
2023-06-01 10:18:25 +08:00
pengwa
65b316a138
Consolidate ORTModule logging (#16078)
### Consolidate ORTModule logging

There are a few improvements to ORTModule logging:
- All ORTModule logging now uses the logger that is initialized in
`ortmodule.py`.
- Manage all export logs the same way, e.g. use
`_logger.suppress_os_stream_output(log_level=self._debug_options.logging.log_level)`
to control whether export-related logs are suppressed. If any warnings or
errors are suppressed, `self._warning_log_detected_during_export` is
set to True; then, when we log the ORTModule feature matrix, we also
tell users that some logs were suppressed.
- Downgrade some warnings. We have had some warnings for years, and it looks
like many models trigger them by default with no action we can actually take, so
downgrade them to make user logging cleaner.
- PyTorch export requires updating the custom export function signatures;
otherwise, `_symbolic_context_handler` complains with warnings,
so update the custom export function adaptation for PyTorch version >=1.13.
- Add an ORTModule feature matrix summary; **this is supposed to be the only
place users see our logs by default** (unless they use INFO or
VERBOSE). Feature ON/OFF states are shown clearly in case they
want to try some features that are OFF. This log only shows up on rank
0 (if there are multiple ranks); the intention is for users to see
useful and clean output from ORTModule by default. The output is shown
below:



![image](https://github.com/microsoft/onnxruntime/assets/10530022/9c6653ac-50fa-4b2d-ba7f-4d5ce44b25b2)


![image](https://github.com/microsoft/onnxruntime/assets/10530022/10dff5a9-2d46-4646-a4b4-2c515566376e)


- `reinitialize_ortmodule` in utils.py is only used by ortmodule.py, so
move it into ortmodule.py. utils.py then takes no dependency on
`orttraining/orttraining/python/training/ortmodule/_custom_op_symbolic_registry.py`,
and `_custom_op_symbolic_registry.py` can call functions defined in
utils.py (without a circular import).
2023-06-01 10:09:12 +08:00
Changming Sun
d19e5c0abb
Fix a misaligned error in CUDA GEMM (#16130)
### Description

Fix an issue where FusedMatMulOpTest.FloatTypeTransposeBatch fails to run on GPUs with TF32 support.


Authored-by: Tianlei Wu <tlwu@microsoft.com>
2023-05-31 18:10:17 -07:00
Yulong Wang
f67f7c0f0b
[js/web] disable node fallback in webpack (#16166)
### Description
Disable webpack's polyfill for node's `global`, `__filename` and
`__dirname` in the web build. The polyfill confuses the environment
detection generated by emscripten.

see https://webpack.js.org/configuration/node/
2023-05-31 16:47:00 -07:00
cao lei
13d6ac74de
fix memory profile build (#16177)
### Description
This PR is to fix the build break when onnxruntime_ENABLE_MEMORY_PROFILE
is on


### Motivation and Context
This PR is to fix the build break when onnxruntime_ENABLE_MEMORY_PROFILE
is on.
It fixes this issue
https://github.com/microsoft/onnxruntime/issues/16124

Co-authored-by: Lei Cao <leca@microsoft.com>
2023-05-31 16:08:14 -07:00
dependabot[bot]
a55637a103
Bump socket.io-parser from 4.2.2 to 4.2.3 in /onnxruntime/test/wasm (#16067) 2023-05-31 21:55:00 +00:00
Aung T Naing
3cca32beec
[QNN EP] Expand convolution test coverage. (#15975)
### Description
Add tests for convolution with padding and convolution with large inputs and outputs.




### Motivation and Context
This is mainly to check the CPU vs QNN EP output mismatch for models.

./onnxruntime_test_all --gtest_filter=*.TestQDQConvU8U8S32*
Failed tests with mismatch.
[  FAILED  ] 2 tests, listed below:
[ FAILED ] QnnHTPBackendTests.TestQDQConvU8U8S32_large_input1_padding_bias_initializer
[ FAILED ] QnnHTPBackendTests.TestQDQConvU8U8S32_large_input2_bias_initializer


./onnxruntime_test_all --gtest_filter=*.TestCPUConvf32_*
[ FAILED ] QnnCPUBackendTests.TestCPUConvf32_large_input1_pad_bias_initializer
2023-05-31 10:12:35 -07:00
Yi Zhang
e0199cfbd9
extend mac packaging timeout limit (#16173)
### Description

### Motivation and Context
MacOS_py_wheels often fails due to timeout.
2023-05-31 18:31:28 +08:00
Yulong Wang
ba5f5e3198
[js] allow manually release inference session (#16169)
### Description
This change adds a new instance function (method) to type
`InferenceSession` to allow users to manually release an inference
session instance.

#16131 depends on this change to work correctly.
2023-05-31 00:31:38 -07:00
PeixuanZuo
3dc5179a36
[ROCm] Change ortmodule test (#15884)
Change ortmodule test because rocm ep behaves differently than cuda.
The warning from torch `The first argument to symbolic functions is
deprecated in 1.13 and will be removed in the future. Please annotate
treat the first argument (g) as GraphContext and use context information
from the object instead.` appears twice on ROCm EP.

On ROCm EP, the log is shown as below:
```
The first argument to symbolic functions is deprecated in 1.13 and will be removed in the future. Please annotate treat the first argument (g) as GraphContext and use context information from the object instead.
The first argument to symbolic functions is deprecated in 1.13 and will be removed in the future. Please annotate treat the first argument (g) as GraphContext and use context information from the object instead.
User Module's attribute name _torch_module collides with ORTModule's attribute name. User Module's attribute may not be returned when trying to retrieve the attribute through ORTModule.
User Module's attribute name load_state_dict collides with ORTModule's attribute name. User Module's method may not be called upon invocation through ORTModule.
```
2023-05-31 15:14:10 +08:00
dependabot[bot]
03216e2313
Bump socket.io-parser from 4.2.2 to 4.2.3 in /js/web (#16068) 2023-05-31 02:15:23 +00:00
Baiju Meswani
7edc4b105d
Copy missing training header files to the package archive (#16119) 2023-05-30 16:45:40 -07:00
RandySheriffH
2802614846
Condition the usage of variadic callback by version (#16112)
For older versions of custom ops, optional and variadic callbacks are
null pointers, hence adding conditions to scope the usage.

---------

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-05-30 16:43:22 -07:00
Yulong Wang
ebe715a817
[js/webgpu] fix RangeError in buffer download (#16165)
### Description
This is a follow-up fix for #15990, which should resolve the
RangeError issue.
2023-05-30 15:04:50 -07:00
Sunghoon
bf05d4ec26
Fix nightly ort CI pipeline (#16162)
This PR changes the [nightly ort CI
pipeline](https://dev.azure.com/onnxruntime/onnxruntime/_build?definitionId=198)
to pick up the latest nightly ACPT image, which was changed from torch
2.0.0.dev to torch 2.1.0.dev.
2023-05-30 14:00:34 -07:00
Xavier Dupré
e726151b5c
Introduce float 8 types (#14731)
### Description
The PR implements FloatE4M3FN, FloatE5M2, FloatE4M3FNUZ, FloatE5M2FNUZ
as described in PR https://github.com/onnx/onnx/pull/4805. It uses the CUDA
API to cast float/half to float8 if CUDA>=11.8, and a custom implementation
if CUDA<11.8.

* It implements Cast, QuantizeLinear, DequantizeLinear for all types on
CPU, and only for types FloatE4M3FN, FloatE5M2 on CUDA.
* It extends the supported types for control flow operators, Shape,
Reshape, Identity, If, Loop, Scan.
* It implements Equal(19).
* Cast, QuantizeLinear, DequantizeLinear operators now support a
parameter `saturate`, which is only valid for float 8 types. It is true by
default. In that case, any value out of range is converted into the
maximum float 8 value. If false, it is converted into infinity.
* QuantizeLinear, DequantizeLinear now support multiple scales on CUDA
(and ROCm by extension): scale = 1D tensor with one scale per channel.
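
The out-of-range rule for `saturate` can be illustrated with a small sketch. It covers only FloatE5M2, whose largest finite value is 57344, and it deliberately omits rounding to float 8 precision; formats without an infinity encoding may treat the non-saturating case differently. The function name `clamp_for_e5m2` is hypothetical and is not an onnxruntime API.

```python
import math

E5M2_MAX = 57344.0  # largest finite FloatE5M2 value


def clamp_for_e5m2(value: float, saturate: bool = True) -> float:
    """Apply only the out-of-range rule; rounding to float 8 precision is omitted."""
    if math.isnan(value) or abs(value) <= E5M2_MAX:
        return value
    if saturate:
        # Saturating cast: clip to the largest finite value, keeping the sign.
        return math.copysign(E5M2_MAX, value)
    # Non-saturating cast: overflow becomes infinity, keeping the sign.
    return math.copysign(math.inf, value)
```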

### Motivation and Context
Supports latest onnx version.

Fixes
[AB#15395](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/15395)

---------

Co-authored-by: Xavier Dupre <xadupre@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
Co-authored-by: Randy Shuai <rashuai@microsoft.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: Scott McKay <Scott.McKay@microsoft.com>
2023-05-30 13:25:58 -07:00
神楽坂帕琪
abd94b65b7
eigen.cmake use url info from deps.txt (#16129)
### Description

`eigen.cmake` now uses the URL info provided by deps.txt instead of a raw
URL.
2023-05-30 11:07:20 -07:00
mindest
90e8c8daaf
profile_explorer: add op-kernel correlation info (#15946)
### Description
* Add aggregated op-kernel correlation information in profiler explorer
when running inference session.
* Add filtering feature so that we can focus on model runs of interest
(excluding warmup steps, etc.)
2023-05-30 23:25:43 +08:00
Yi Zhang
31fc25d2c2
[Fix] Check if CUDA is downloaded in AGENT_TEMPDIRECTORY (#16142)
### Description
Supplement to #15915.

### Motivation and Context
Fix the nuget pipeline exception in the
Final_Jar_Testing_Windows_GPU stage.

```
  JUnit Jupiter:ProviderOptionsTest:testCUDAOptions()
    MethodSource [className = 'ai.onnxruntime.providers.ProviderOptionsTest', methodName = 'testCUDAOptions', methodParameterTypes = '']
    => ai.onnxruntime.OrtException: Error code - ORT_RUNTIME_EXCEPTION - message: D:\a\_work\1\s\onnxruntime\core\session\provider_bridge_ort.cc:1131 onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : LoadLibrary failed with error 126 "" when trying to load "C:\Users\cloudtest\AppData\Local\Temp\onnxruntime-java17193857285260738736\onnxruntime_providers_cuda.dll"
```


### Verification

https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=313476&view=results
2023-05-30 13:14:08 +08:00
Jian Chen
6abdc3a87b
Fix static analysis bug (#16114)
### Description



### Motivation and Context
2023-05-28 10:58:07 -07:00
Yi Zhang
73584f9360
More fixes on nuget pipeline (#16091)
### Description
1. Parameters couldn't be compared as strings; change them to booleans.
2. Windows_CI_GPU_DML_DEV_arm64 on the pool onnxruntime-Win-CPU-2022
failed to pass the prefast step; change the pool to aiinfra-dml-winbuild.
3. Skip test_zfnet512, which fails in Nuget_Test_Win_Training_CPU.

Todo:
Only Final_Jar_Testing_Windows_GPU fails now.

https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=313042&view=logs&s=d66543d5-16de-5a48-6ecb-a36e21ff8d4d&j=d9489789-5e39-5a05-13ab-9aaf7b4d386f
2023-05-27 08:59:12 +08:00
Alexander Visheratin
415c26e46e
[JS/WebGPU] Squeeze operator implementation (#16024)
### Description

This PR adds an implementation of the `Squeeze` operator to WebGPU JSEP.
The implementation follows the [operator
schema](https://github.com/onnx/onnx/blob/main/docs/Operators.md#Squeeze)
and allows one or two inputs.

### How was it tested

1. I created two models. Without `axes`:

```Python
import onnx.helper

node = onnx.helper.make_node(
    "Squeeze",
    inputs=["T"],
    outputs=["y"],
)
graph = onnx.helper.make_graph([node], "test", [onnx.helper.make_tensor_value_info("T", 1, [3, 1, 4, 5])], 
    [onnx.helper.make_tensor_value_info("y", 1, [3, 4, 5])])
onnx.save(onnx.helper.make_model(graph), "squeeze.onnx")
```

And with `axes`:

```Python
import onnx.helper

node = onnx.helper.make_node(
    "Squeeze",
    inputs=["T", "axes"],
    outputs=["y"],
)
graph = onnx.helper.make_graph([node], "test", [onnx.helper.make_tensor_value_info("T", 1, [3, 1, 4, 5]), onnx.helper.make_tensor_value_info("axes", 7, [1])], [onnx.helper.make_tensor_value_info("y", 1, [3, 4, 5])])
onnx.save(onnx.helper.make_model(graph), "squeeze-dim.onnx")
```

2. I compiled the runtime using @fs-eire's
[instructions](https://gist.github.com/fs-eire/a55b2c7e10a6864b9602c279b8b75dce).
3. I ran the test models in the browser using this minimal setup:
```HTML
<html>
    <script src=".\dist\ort.webgpu.min.js"></script>
    <script>
        async function run() {
            const session = await ort.InferenceSession.create('squeeze-dim.onnx', {executionProviders: ['webgpu']});
            console.log(session);
            const input = new ort.Tensor('float32', new Float32Array(60), [3, 1, 4, 5]);
            const dim = new ort.Tensor('int64', [-3n], [1]);
            const output = await session.run({ "T": input, "axes": dim });
            console.log(output);
        }
        run();
    </script>
</html>
```

### Motivation and Context

Improve operator coverage for WebGPU JSEP.
2023-05-26 15:53:05 -07:00
Scott McKay
5e41d1600a
Add new QNN CIs to azp run tool (#16109)
### Description
Add 2 new QNN CIs to tools/python/run_CIs_for_external_pr.py


### Motivation and Context
Update tool so it runs all current CIs
2023-05-27 08:46:16 +10:00
Dmitri Smirnov
9939092e71
[CPP API]Fix constness in C++API (#16103)
### Description
`CreateMap` and `CreateSequence` should be able to take in const data.
2023-05-26 14:09:00 -07:00
Jeff Bloomfield
54fdb640fe
Address performance regression with duplicate initializers across DML partitions (#16087)
This addresses a DML performance regression introduced by the constant
sharing pass.

The constant sharing pass identifies small initializer tensors which
contain identical values and merges them. This could have the effect of
causing DML to treat those tensors as non-constant and skip certain
optimization.

To prevent this, there is now an element count threshold below which the
DML EP will enable this optimization, even though it results in
duplicate work uploading and pre-processing the common tensor at
multiple operators.
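
The merging step of a constant sharing pass, as described above, can be pictured as a deduplication keyed on shape and values. The sketch below is not onnxruntime's implementation and does not model the DML-side threshold; `share_constants`, the `(shape, values)` tuple representation, and `max_elements` are all hypothetical.

```python
from typing import Dict, Tuple

Initializer = Tuple[Tuple[int, ...], Tuple[float, ...]]  # (shape, flat values)


def share_constants(
    initializers: Dict[str, Initializer], max_elements: int
) -> Dict[str, str]:
    """Map each initializer name to the name of the tensor its consumers should use."""
    canonical: Dict[Initializer, str] = {}
    remap: Dict[str, str] = {}
    for name, (shape, values) in initializers.items():
        if len(values) > max_elements:
            remap[name] = name  # too large to be considered "small"; keep as-is
            continue
        # The first tensor seen with this shape and content becomes the shared copy.
        remap[name] = canonical.setdefault((shape, values), name)
    return remap
```

After such a remap, one tensor may feed operators in several partitions, which is the situation where the commit says DML could stop treating it as a constant.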
2023-05-26 13:37:34 -07:00
Changming Sun
a5410515ad
Fix: Some fields in OrtCUDAProviderOptionsV2 struct are not initialized (#16113)
### Description
The file include/onnxruntime/core/providers/cuda/cuda_provider_options.h
is a C++ file. It is not for C.

Before this commit, this header file was already incompatible with C compilers, because it has:
```
onnxruntime::ArenaExtendStrategy arena_extend_strategy;
```

This file is intended to be an internal header. It should not be included in onnxruntime_c_api.h and should not be used with the public C APIs. Users can only get an instance of OrtCUDAProviderOptionsV2 via CreateCUDAProviderOptions. That way we can add new members to this struct without breaking binary compatibility.
Since it is an internal header, we can safely use C++ syntax there.
2023-05-26 11:34:22 -07:00
cao lei
4ab7d410ae
ExecutionProvider API refactor - Detach allocator from EP by creating local cpu allocator instead (#16084)
### Description
ExecutionProvider API refactor - Detach allocator from EP by creating
local cpu allocator instead



### Motivation and Context
This PR is a refactor that creates a local CPU allocator instead of
getting the allocator from the ExecutionProvider. The final goal is to
fully detach allocators from the ExecutionProvider and keep them at the
session level, indexed by OrtDevice.
2023-05-26 04:54:42 -07:00
Edward Chen
4bfb8d3303
Update calls to OrtArenaCfg constructor to pass additional parameter. (#16104)
Update calls to OrtArenaCfg constructor to pass additional parameter.

Updating some call sites after change in #15983. Fix CI build.
2023-05-26 12:41:42 +08:00
cloudhan
2cf0ae7d01
[ROCm] Add AttentionMode to streamline attention logic (#15978)
Refactor for future kv cache change.
2023-05-26 12:06:36 +08:00
Skand Hurkat
b28e927ca4
Read AA64ISAR0_EL1 to check dot product support (#16082)
### Description

Use an assembly instruction to read the `AA64ISAR0_EL1` register for dot
product support.

### Motivation and Context

The only reliable way to check for supported instruction extensions in
ARM is to query the instruction set attribute registers. [Dot product
instructions can be checked using bits 47:44 in the AA64ISAR0_EL1
register](https://developer.arm.com/documentation/ddi0601/2021-12/AArch64-Registers/ID-AA64ISAR0-EL1--AArch64-Instruction-Set-Attribute-Register-0?lang=en#fieldset_0-47_44).

On `qemu-aarch64` with the `a64fx` cpu which does not support the dot
product instructions, running a quantized BERT-Large (from MLPerf) results
in `SIGILL`. With the change, the program continues without using the dot
product instructions. Also verified that `S8S8_SDOT` kernels are invoked
when running on hardware that supports dot product instructions.
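
Decoding the field mentioned above is a shift and a mask. The sketch below works on a register value that has already been read (the commit does that with an assembly instruction, which is not shown); `has_dot_product` and the constant names are hypothetical, and a non-zero value in bits 47:44 is taken to mean the dot product instructions are implemented.

```python
DP_SHIFT = 44  # lowest bit of the DP field in ID_AA64ISAR0_EL1
DP_MASK = 0xF  # the field is 4 bits wide (bits 47:44)


def has_dot_product(id_aa64isar0_el1: int) -> bool:
    """True when the DP field (bits 47:44) of the register value is non-zero."""
    return ((id_aa64isar0_el1 >> DP_SHIFT) & DP_MASK) != 0
```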

---------

Co-authored-by: Skand Hurkat <skhurkat@microsoft.com>
2023-05-25 17:05:30 -07:00
Wanming Lin
0d1a8cc651
[WebNN EP] Use NCHW as preferred layout for DML backend (#16037)
To improve performance on DML backend.
2023-05-25 09:47:41 -07:00
Yuhong Guo
04a8f50674
New configuration to limit the arena extension (#15983)
Add a configuration `max_power_of_two_extend_bytes` to limit the arena extension size.


### Motivation and Context
In our real scenario, we observe that if the model is big enough, the
BfcArena extends uncontrollably.
As shown by the following figures, if a model uses more than 16GB of
memory, the BfcArena requests 32GB of memory in total under the
`kNextPowerOfTwo` strategy. With the new configuration, the extension is
limited. The default maximum extension size is 1GB.

#### Without the new configuration
After loading the model, ORT uses 32G GPU memory.

![image](https://github.com/microsoft/onnxruntime/assets/19584326/42b93c66-b957-4f20-a13b-d34cb390afff)

#### With the new configuration
After loading the model, ORT uses 23G GPU memory.

![image](https://github.com/microsoft/onnxruntime/assets/19584326/5abffeff-9ca3-4187-a262-37fd2764fe1b)
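
The effect of capping a doubling growth strategy can be shown with a toy model. This is a simplified simulation, not BFCArena code: `reserved_after_growth`, the 1 GiB starting chunk, and the 17 GiB demand are assumptions chosen for illustration, and the resulting totals are properties of the model rather than the measurements in the figures above.

```python
from typing import Optional

GIB = 1 << 30


def reserved_after_growth(
    needed_bytes: int,
    initial_chunk_bytes: int,
    max_extend_bytes: Optional[int] = None,
) -> int:
    """Total bytes reserved once at least needed_bytes have been reserved."""
    reserved = 0
    chunk = initial_chunk_bytes
    while reserved < needed_bytes:
        reserved += chunk
        chunk *= 2  # power-of-two growth: each extension doubles the previous one
        if max_extend_bytes is not None:
            chunk = min(chunk, max_extend_bytes)  # cap the size of one extension
    return reserved


uncapped = reserved_after_growth(17 * GIB, GIB)
capped = reserved_after_growth(17 * GIB, GIB, max_extend_bytes=GIB)
```

In this model the uncapped arena overshoots a 17 GiB demand to 31 GiB, while a 1 GiB cap stops at exactly 17 GiB.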

Co-authored-by: Yuhong Guo <yuhong.gyh@antgroup.com>
2023-05-25 02:19:07 -07:00
Changming Sun
60bb07307b
Fix the TRT GPU build job in python packaging pipeline (#16073)
1. Cherry-pick #16054 back to the main branch
2. Replace onnxruntime-gpu-winbuild-t4 with onnxruntime-Win2022-GPU-T4.
The latter has VS2022.

---------

Co-authored-by: Patrice Vignola <vignola.patrice@gmail.com>
2023-05-25 00:09:08 -07:00
Changming Sun
cc0c5e5612
Fix an error in test/shared_lib/test_inference.cc (#16090)
### Description
Fix an error in test/shared_lib/test_inference.cc. It should use
ASSERT_NEAR to test float values.
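
The reason float results are compared with a tolerance rather than for exact equality can be shown with a short analogue. `ASSERT_NEAR` itself is a GoogleTest macro; the sketch below uses Python's `math.isclose` as a stand-in, with a tolerance chosen arbitrarily.

```python
import math

computed = 0.1 + 0.2  # accumulates binary rounding error
expected = 0.3

exact_match = computed == expected
near_match = math.isclose(computed, expected, abs_tol=1e-6)
print(exact_match, near_match)  # prints "False True"
```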

### Motivation and Context
Our OpenVino pipeline is failing because of this.
2023-05-24 22:59:28 -07:00
Yi Zhang
76fd9aa745
[Fix] Some pipelines have to be using VS2019 (#16034)
### Description


### Motivation and Context
Fix nuget and python package pipeline.

1. ARM 32 build isn't supported by VS2022 officially.

https://developercommunity.visualstudio.com/t/Compilation-Error-with-VS2022-ARM/10285309

2. onnxruntime-gpu-winbuild-T4 and onnxruntime-gpu-winbuild-tensorrt8-T4
don't have VS 2022.
2023-05-25 09:55:35 +08:00
pengwa
34fe8fb069
Type hint for ORTModule (#15938)
### Type hint for ORTModule

Add type hints for ORTModule.
Refine comments.

The reason for removing the interface `execution_session_run_forward` from
`orttraining/orttraining/python/training/ortmodule/_graph_execution_manager.py`:

PR
cc275e7529 (diff-497e18dc8878818205b81fd80f85942548d8aa15d0f1204ce3e3d9795e3dd195)
and some commits before it break the function interface contract
between the parent class in _graph_execution_manager.py and its children in
_training_manager.py and _inference_manager.py. So there is no need to
have this interface.


### Other EE work opportunities

1. Use the logger correctly.
2. Remove some duplicated logic that parses input/output recursively.
3. Clean up environment variable usage.
2023-05-25 09:28:20 +08:00
Sumit Agarwal
70d2dc8209
[DML EP] Fix issue with --dml_path build option (#15972)
### Description
The DML_PACKAGE_DIR cmake variable is not set properly when the dml_path
build option is used.


### Motivation and Context
It is required for the DML Perf dashboard.
2023-05-24 19:20:40 -05:00
Zhang Lei
63c9973b7a
Fix cuda provider crash on it (#16056) 2023-05-24 16:13:11 -07:00
yf711
105f5f0f20
Avoid trt deprecated api warnings shown as errors during ORT-TRT build (#16035)
### Description
Avoid TRT deprecated API warnings being shown as errors when building
onnxruntime_test_all.
This issue is only visible when installing TRT via binaries, rather than
deb/rpm packages (CI pipelines).


The change is similar to existing set_property for
onnxruntime_providers_tensorrt

89ea503024/cmake/onnxruntime_providers.cmake (L421)

### Motivation and Context

onnxruntime/test/unittest_main/[test_main.cc](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/test/unittest_main/test_main.cc#L32)
includes nvinfer.h, which includes deprecated TRT APIs and generates
warnings.
When building onnxruntime_test_all, these warnings are shown as errors and
block the build.

### Doubts
Although this issue is visible with the TRT tar binaries but not with the TRT
deb/rpm packages, their file sizes and hashes are the same (only creation
times vary), despite the headers/libs being installed in different ways.

| tarBin | pkg |
| --- | --- |
| 997284784 Apr 26 15:15 libnvinfer_builder_resource.so.8.6.1 | 997284784 Apr 26 22:21 libnvinfer_builder_resource.so.8.6.1 |
| 235369632 Apr 26 15:14 libnvinfer.so.8.6.1 | 235369632 Apr 26 22:21 libnvinfer.so.8.6.1 |
2023-05-24 13:19:27 -07:00
yf711
84f1af7ff5
ort build flag fix (#16072)
### Description
* Sync and clean up the build flag `--use_tensorrt_builtin_parser` in the
existing CI config, as it becomes the default flag
* Update the CUDA version
2023-05-24 12:32:10 -07:00
Guenther Schmuelling
20857c4ff2
workaround test failure in ci (#16070)
don't run wasm proxy test on debug build to unblock ci.
Needs some longer debugging.
2023-05-24 21:01:06 +08:00
Shukant Pal
f316bc57c4
[CoreML EP] Implement Unary & Reduce operators (#15532)
### Description

This change is a follow-up to #15327. It adds Unary operators (Sqrt,
Reciprocal) and Reduce operators (ReduceSum, ReduceMean). I've tried to
follow existing patterns in the code :-)


### Motivation and Context

This reduces fragmentation across EPs when using CoreML on macOS,
thereby speeding up execution.

---------

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-05-24 18:16:59 +10:00