Commit graph

1874 commits

Author SHA1 Message Date
Edward Chen
76461c8f4d
Increase timeout for iOS packaging pipeline jobs. (#20434) 2024-04-23 11:55:55 -07:00
Yi Zhang
7ebc653f04
Revert "Nuget .NET changes for Mac Catalyst (#19923)" (#20418)
This reverts commit f396748ed6.

2024-04-23 15:08:12 +08:00
Adrian Lizarraga
e6a677f6b7
[QNN EP] Download QNN SDK from azure blob in packaging pipelines (#20359)
### Description
- Updates Windows QNN Nuget and Python packaging pipelines to download
QNN SDK from blob storage.
- Makes the QNN SDK version configurable when launching the python
packaging pipeline.



### Motivation and Context
Removes the need to rebuild images to update QNN SDK. Only applies to
Windows pipelines. Linux pipelines still get the SDK from disk.
2024-04-22 22:32:55 -07:00
Yi Zhang
197b3f1d90
Enable Whisper Test with OMP_FFMPEG (#20402)
### Description
Install OMP_FFMPEG in the Docker image and re-add the Whisper test.
Download OMP_FFMPEG from a restricted-access Azure blob.
2024-04-22 10:55:56 -07:00
Yulong Wang
a457c1df80
upgrade emsdk to 3.1.57 (#20295)
### Description
upgrade emsdk to 3.1.57
2024-04-19 23:05:18 -07:00
Rachel Guo
f396748ed6
Nuget .NET changes for Mac Catalyst (#19923)
### Description

Add Nuget package changes for adding new 'net6.0-maccatalyst' platform.

The output ORT Nuget package was manually tested and verified in a .NET
MAUI app setup.


---------

Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
Co-authored-by: Yi Zhang <zhanyi@microsoft.com>
Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>
2024-04-19 14:20:03 -07:00
sfatimar
4d1963c2a2
OpenVINO EP Rel 1.18 Changes (#20337)
### Description
These changes include:
- Support for OpenVINO 2024.1
- Import precompiled blobs with an EPContext blob
- Separate device/precision as input
- Deprecate the CPU_FP32/GPU_FP32 terminology; introduce CPU and GPU
- AUTO:GPU,CPU will only create a GPU blob, not a CPU blob



### Motivation and Context
- OpenVINO 2024.1 will be out soon
- Importing precompiled blobs can greatly reduce FEIL/FIL time
- Separating device/precision makes the input cleaner

---------

Co-authored-by: Suryaprakash Shanmugam <suryaprakash.shanmugam@intel.com>
Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>
2024-04-19 00:31:38 -07:00
Yulong Wang
3577a4bd02
[Node.js binding] Allow installation to download CUDA binaries via script (#20364)
### Description
Currently we try to include all prebuilt binaries in the NPM packages.
This was working until we added libonnxruntime_providers_cuda.so
(>400 MB) to the NPM package. The NPM registry refuses to accept the new
package because the file is too large.

To make the new NPM package work, we have to remove the large file
from the package and add a new script that runs on package installation.
This script will try to dynamically install the onnxruntime CUDA dynamic
library for Linux/x64.
2024-04-18 13:44:42 -07:00
Yi Zhang
4d2b98155f
More fixes for random connection exceptions in Mac build. (#20328)
### Description
Supplement to #20322.



### Motivation and Context
Fixes random connection exceptions in the Mac build of the Python
Packaging Pipeline

https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=443617&view=logs&j=5849a411-e258-5ce5-39bd-7b65d44961a0&t=ccb871c8-76d9-5e80-55b0-4279efd5567f
and the iOS full xcframework build

https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=443458&view=logs&j=370fd1a2-3dec-5916-4d2c-8aae58c72d28&t=686352ba-ee61-5ad4-8739-e8abd07372a4&s=e9aa87c8-a9ad-51f7-3b12-045ecc319776
2024-04-17 08:37:56 +08:00
Yi Zhang
caf692e626
[Fix] Random connection exceptions in MacOS_C_API_Packaging_CPU stage (#20322)
### Description
Add download_deps to reduce downloading from 3rd party websites.


### Motivation and Context
Fix frequent random exception like
```
CMake Error at abseil_cpp-subbuild/abseil_cpp-populate-prefix/src/abseil_cpp-populate-stamp/download-abseil_cpp-populate.cmake:162 (message):
  Each download failed!

    error: downloading 'https://github.com/abseil/abseil-cpp/archive/refs/tags/20240116.0.zip' failed
          status_code: 35
          status_string: "SSL connect error"
          log:
          --- LOG BEGIN ---
            Trying 20.29.134.23:443...

  Connected to github.com (20.29.134.23) port 443

  ALPN: curl offers h2,http/1.1

  (304) (OUT), TLS handshake, Client hello (1):

  [315 bytes data]

   CAfile: /etc/ssl/cert.pem
   CApath: none

  Recv failure: Operation timed out

  LibreSSL/3.3.6: error:02FFF03C:system library:func(4095):Operation timed
  out

  Closing connection
```

https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=443278&view=logs&j=006a7a04-d43b-5fe1-df02-ecafb79c4d6e&t=110edd38-9f3b-50cf-b328-8ed0f915e5c1

---------

Co-authored-by: Yi Zhang <your@email.com>
2024-04-16 13:28:18 +08:00
Edward Chen
287ecea2f1
Fix binary size check build publish step. (#20298)
Add `--user` option to pip install command.

Error:
```
ERROR: Could not install packages due to an OSError: [Errno 13] Permission denied: '/usr/local/bin/f2py'
Consider using the `--user` option or check the permissions.
```

See #19877.
2024-04-15 10:15:42 -07:00
liqun Fu
cd7112f800
Integration with ONNX 1.16.0 (#19745)
### Description
Update to the ONNX 1.16.0 branch according to
https://github.com/microsoft/onnxruntime/blob/main/docs/How_To_Update_ONNX_Dev_Notes.md

ONNX 1.16.0 release notes:
https://github.com/onnx/onnx/releases/tag/v1.16.0

#### Updated ops for CPU EP:
- DequantizeLinear(21)
  - Added int16 and uint16 support + various optimizer tests
  - Missing int4 and uint4 support
  - Missing block dequantization support
- QuantizeLinear(21) (see the sketch after this list)
  - Added int16 and uint16 support + various optimizer tests
  - Missing int4 and uint4 support
  - Missing block quantization support
- Cast(21)
  - Missing int4 and uint4 support
- CastLike(21)
  - Missing int4 and uint4 support
- ConstantOfShape(21)
  - Missing int4 and uint4 support
- Identity(21)
  - Missing int4 and uint4 support
- If(21)
  - Missing int4 and uint4 support
- Loop(21)
  - Missing int4 and uint4 support
- Reshape(21)
  - Missing int4 and uint4 support
- Scan(21)
  - Missing int4 and uint4 support
- Shape(21)
  - Missing int4 and uint4 support
- Size(21)
  - Missing int4 and uint4 support
- Flatten(21)
  - Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4 support
- Pad(21)
  - Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4 support
- Squeeze(21)
  - Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4 support
- Transpose(21)
  - Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4 support
- Unsqueeze(21)
  - Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4 support
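
A minimal sketch of the new 16-bit QuantizeLinear(21) support noted above. This is not code from the PR; the tensor names and values are illustrative, and running it assumes an onnx release with opset 21 plus an ORT build that includes this integration.

```python
# Sketch: opset-21 QuantizeLinear with an int16 zero point (added in ONNX 1.16).
import numpy as np
import onnx
from onnx import TensorProto, helper

node = helper.make_node("QuantizeLinear", ["x", "scale", "zero_point"], ["y"])
graph = helper.make_graph(
    [node],
    "quantize_int16_example",
    inputs=[helper.make_tensor_value_info("x", TensorProto.FLOAT, [4])],
    outputs=[helper.make_tensor_value_info("y", TensorProto.INT16, [4])],
    initializer=[
        # Per-tensor scale and an int16 zero point; the zero point type drives the output type.
        helper.make_tensor("scale", TensorProto.FLOAT, [], [0.05]),
        helper.make_tensor("zero_point", TensorProto.INT16, [], [0]),
    ],
)
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 21)])
onnx.checker.check_model(model)

# Requires an onnxruntime build that includes this ONNX 1.16 integration.
import onnxruntime as ort
sess = ort.InferenceSession(model.SerializeToString(), providers=["CPUExecutionProvider"])
print(sess.run(None, {"x": np.array([0.1, -0.2, 0.3, 1.0], dtype=np.float32)}))
```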

#### Unimplemented opset 21 features/ops
- int4 and uint4 data type
- QLinearMatMul(21)
- GroupNormalization(21)
- ai.onnx.ml.TreeEnsemble(5)


### Disabled tests
#### ORT Training

orttraining/orttraining/test/python/orttraining_test_ort_apis_py_bindings.py
- test_ort_custom_ops: Potential shape inference bug for custom ops

#### Python quantization unit tests
test/onnx/python/quantization (shape inference bug)
- test_op_conv_transpose.py: test_quantize_conv_transpose_u8u8_fp16
- test_op_conv_transpose.py: test_quantize_conv_transpose_s8s8_fp16
- test_op_gemm.py: test_quantize_qop_gemm_s8s8
- test_op_gemm.py: test_quantize_qop_gemm_e4m3fn_same
- test_op_gemm.py: test_quantize_qop_gemm_e4m3fn_p3
- test_op_matmul.py: test_quantize_matmul_u8u8_f16
- test_op_matmul.py: test_quantize_matmul_s8s8_f16
- test_op_matmul.py: test_quantize_matmul_s8s8_f16_entropy
- test_op_matmul.py: test_quantize_matmul_s8s8_f16_percentile
- test_op_matmul.py: test_quantize_matmul_s8s8_f16_distribution
- test_op_relu.py: test_quantize_qop_relu_s8s8

#### ONNX tests
- test_maxpool_2d_ceil_output_size_reduce_by_one: ONNX 1.16.0 fixed a
maxpool output size bug and added this test. Enable this test when [ORT
PR](https://github.com/microsoft/onnxruntime/pull/18377) is merged.
Refer to original [ONNX PR](https://github.com/onnx/onnx/pull/5741).
- test_ai_onnx_ml_tree_ensemble_set_membership_cpu: new unimplemented op
ai.onnx.ml.TreeEnsemble
- test_ai_onnx_ml_tree_ensemble_single_tree_cpu: same
- test_ai_onnx_ml_tree_ensemble_set_membership_cuda: same
- test_ai_onnx_ml_tree_ensemble_single_tree_cuda: same
- test_cast_INT4_to_FLOAT_cpu: ORT Cast(21) impl doesn't support int4
yet
- test_cast_INT4_to_INT8_cpu: same
- test_cast_UINT4_to_FLOAT_cpu: same
- test_cast_UINT4_to_UINT8_cpu: same
- test_cast_INT4_to_FLOAT_cuda
- test_cast_INT4_to_INT8_cuda
- test_cast_UINT4_to_FLOAT_cuda
- test_cast_UINT4_to_UINT8_cuda
- test_constantofshape_float_ones_cuda: ConstantOfShape(21) not
implemented for cuda
- test_constantofshape_int_shape_zero_cuda: same
- test_constantofshape_int_zeros_cuda: same
- test_flatten_axis0_cuda: Flatten(21) not implemented for cuda
- test_flatten_axis1_cuda: same
- test_flatten_axis2_cuda: same
- test_flatten_axis3_cuda: same
- test_flatten_default_axis_cuda: same
- test_flatten_negative_axis1_cuda: same
- test_flatten_negative_axis2_cuda: same
- test_flatten_negative_axis3_cuda: same
- test_flatten_negative_axis4_cuda: same
- test_qlinearmatmul_2D_int8_float16_cpu: QLinearMatMul(21) for onnx not
implemented in ORT yet
- test_qlinearmatmul_2D_int8_float32_cpu: same
- test_qlinearmatmul_2D_uint8_float16_cpu: same
- test_qlinearmatmul_2D_uint8_float32_cpu: same
- test_qlinearmatmul_3D_int8_float16_cpu: same
- test_qlinearmatmul_3D_int8_float32_cpu: same
- test_qlinearmatmul_3D_uint8_float16_cpu: same
- test_qlinearmatmul_3D_uint8_float32_cpu: same
- test_qlinearmatmul_2D_int8_float16_cuda: same
- test_qlinearmatmul_2D_int8_float32_cuda: same
- test_qlinearmatmul_2D_uint8_float16_cuda: same
- test_qlinearmatmul_2D_uint8_float32_cuda: same
- test_qlinearmatmul_3D_int8_float16_cuda: same
- test_qlinearmatmul_3D_int8_float32_cuda: same
- test_qlinearmatmul_3D_uint8_float16_cuda: same
- test_qlinearmatmul_3D_uint8_float32_cuda: same
- test_size_cuda: Size(21) not implemented for cuda
- test_size_example_cuda: same
- test_dequantizelinear_blocked: Missing implementation for block
dequant for DequantizeLinear(21)
- test_quantizelinear_blocked_asymmetric: Missing implementation for
block quant for QuantizeLinear(21)
- test_quantizelinear_blocked_symmetric: Missing implementation for
block quant for QuantizeLinear(21)

---------

Signed-off-by: liqunfu <liqun.fu@microsoft.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Co-authored-by: Ganesan Ramalingam <grama@microsoft.com>
Co-authored-by: George Wu <jywu@microsoft.com>
Co-authored-by: adrianlizarraga <adlizarraga@microsoft.com>
2024-04-12 09:46:49 -07:00
Yifan Li
9577fe454d
[EP Perf] Customize onnx-tensorrt commit id when init CI tasks (#20175)
### Description
Customize the onnx-tensorrt commit id in the EP Perf CI variables when
testing OSS parsers at different versions.

### To Verify

![image](https://github.com/microsoft/onnxruntime/assets/109183385/9dc650d8-377d-4223-8951-f0849b1fe984)

After assigning `onnxTensorrtCommitId` in the EP Perf CI variables,
CI prints the following during the **[Build latest ORT Image with
TensorRT OSS
parser](https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=438217&view=logs&j=b6bfa4e2-8141-507f-8ca1-59b3f929fa71&t=fc64e110-ab59-54e4-1c37-853e84a52a7e&l=396450)** step:
```
Updated deps.txt with new commit id a43ce67187bab219520fd80f21af8bbd4354bc8c and hash 572535aefef477050f86744dfab1fef840198035
```
CI then [overwrites the onnx_tensorrt line in
deps.txt](https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=438217&view=logs&j=b6bfa4e2-8141-507f-8ca1-59b3f929fa71&t=fc64e110-ab59-54e4-1c37-853e84a52a7e&l=396451),
which is set to:
```
onnx_tensorrt;a43ce67187.zip;572535aefef477050f86744dfab1fef840198035

```


### Motivation and Context
To save the time spent manually editing deps.txt and calculating the zip hash.
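
For reference, a minimal sketch of how the zip hash for a deps.txt entry could be computed automatically, assuming the 40-character value is the SHA1 digest of the downloaded archive and mirroring the line quoted above; the URL and helper name below are illustrative, not the actual pipeline script.

```python
# Hypothetical helper: download an archive and build a deps.txt-style line
# "name;<short_id>.zip;<sha1>" like the one quoted in this description.
import hashlib
import urllib.request

def deps_txt_line(name: str, url: str, short_id: str) -> str:
    data = urllib.request.urlopen(url).read()      # fetch the archive bytes
    digest = hashlib.sha1(data).hexdigest()        # assumed: hash field is SHA1 of the file
    return f"{name};{short_id}.zip;{digest}"

# Example values taken from the description above (commit a43ce67187...).
print(deps_txt_line(
    "onnx_tensorrt",
    "https://github.com/onnx/onnx-tensorrt/archive/a43ce67187bab219520fd80f21af8bbd4354bc8c.zip",
    "a43ce67187",
))
```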
2024-04-10 09:46:05 -07:00
Yi Zhang
0acde1157a
Set parallel count to avoid OOM in training GPU packaging pipeline (#20255)
### Description
Make the compilation work on the Azure CPU agent by reducing the parallel
count.



### Motivation and Context
The OOM issue mentioned in #20244 was caused by too little memory per
parallel compilation job.
2024-04-10 14:05:53 +08:00
Yi Zhang
14d7872ce9
Reuse T4 for Cuda12.2 training packaging pipeline. (#20244)
### Description
The training CUDA 12.2 packaging pipeline
https://dev.azure.com/aiinfra/Lotus/_build?definitionId=1308&_a=summary
has consistently run out of memory since PR #19910.
I tried other CPU agents, for example D64as_v5 (256 GB memory) and
D32as_v4 (128 GB memory and 256 GB SSD temp storage), which still run out
of memory, as shown in the image below.

![image](https://github.com/microsoft/onnxruntime/assets/16190118/5acde9ef-674f-4b6d-a1b3-b54647645083)


But it works on T4, even though T4 only has 4 vCPUs, 28 GB memory, and
180 GB temp storage, and it takes much more time.

### Motivation and Context
Restore CUDA 12.2 training packaging pipeline first.
More time is needed to investigate the root cause


### Other clues
These two compilation steps take nearly 6 minutes each with CUDA 12.2 on T4,
and they run out of memory on the CPU machine. @ajindal1
CUDA 12.2 on T4:
```
2024-03-14T05:39:08.7726865Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim32_fp16_sm80.cu.o
2024-03-14T05:45:01.3223393Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim64_bf16_sm80.cu.o

2024-03-14T05:46:07.9218003Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim96_fp16_sm80.cu.o
2024-03-14T05:52:59.2387051Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/group_query_attention_impl.cu.o

```

But they can be finished in about one minute with CUDA 11.8 on CPU:
```
cuda11.8 on CPU
2024-04-09T11:34:35.0849836Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim32_fp16_sm80.cu.o
2024-04-09T11:35:53.6648154Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim64_bf16_sm80.cu.o

cuda11.8 on GPU
2024-03-13T12:16:33.4102477Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim32_fp16_sm80.cu.o
2024-03-13T12:19:58.8268272Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim64_bf16_sm80.cu.o
```
2024-04-10 09:21:40 +08:00
Adrian Lizarraga
05d97e8d18
Update QNN python packages to use QNN SDK version 2.19.2 (#20213)
### Description
Update QNN python packages to use QNN SDK version 2.19.2.



### Motivation and Context
Our CI builds already use QNN SDK version 2.19.2. We should make sure
the ort-nightly-qnn python packages are also built with the same QNN SDK
version.
2024-04-05 17:15:25 -07:00
Yi Zhang
23a5d0a305
Extend time out in Windows GPU packaging jobs (#20207)
### Description
Extend Windows GPU Packaging job building time out to 6 hours, and test
stage to 3 hours.



### Motivation and Context
There are still a few timeout issues after refactoring. The probability
is about 20% in
https://dev.azure.com/aiinfra/Lotus/_build?definitionId=84.
I found the build can finish in 4 hours when it becomes slow,
https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=434340&view=logs&j=0c6ee496-b38e-55a9-3699-12934156e90f,
although in most cases it takes only about 30 minutes.
Unlike before, when the build could not complete at all, it now finishes eventually.
So in this PR I extend the timeout to 6 hours.

One interesting thing: if one Windows GPU job becomes slow, all
other Windows GPU jobs in the same run become slow too.
So I suspect it has something to do with ADO or virtualization; that is,
it's not completely random.
https://dev.azure.com/aiinfra/Lotus/_build?definitionId=841
2024-04-06 08:03:42 +08:00
Yi Zhang
4ea54b82f9
[Fix] Upload training CUDA daily wheel (#20183)
2024-04-03 13:18:26 +08:00
Yi Zhang
523ef04240
Enable LTO in Python-CUDA-Packaging Pipeline (#20164)
### Description
Except for the [Python-CUDA-Packaging
pipeline](https://dev.azure.com/aiinfra/Lotus/_build?definitionId=1299&_a=summary),
all Windows CUDA packaging jobs have been running well now.
After comparison, enable_lto was not set in this pipeline, which might be
one root cause of the random hang.

2024-04-01 15:42:28 +08:00
Jeff Bloomfield
2f31560430
Enable generic feature level devices in DML EP (#20114)
### Description
Enable NPUs supporting DXCORE_ADAPTER_ATTRIBUTE_D3D12_GENERIC_ML and
D3D_FEATURE_LEVEL_1_0_GENERIC with DML EP. This also begins ingesting DX
headers through the DirectX-Headers repo.

Note that this includes an update to cgmanifest.json for onnx-tensorrt,
which is triggered during re-generation due to a prior change to
deps.txt.

2024-03-29 14:37:30 -07:00
Adam Pocock
2f82400b13
[java] Java 21 build support (#19876)
### Description
Bump spotless and the Gradle wrapper to 6.25.0 and 8.6 respectively to
allow compiling ORT on Java 21. The build still targets Java 8.

I'm not sure if there will be CI changes necessary to use this PR,
specifically for the Gradle version as I don't know if that is cached
somewhere earlier in the CI build process.

The new Gradle version adds a warning that using `--source` and
`--target` to select the Java language version is obsolete, which is
annoying; we can fix it if we decide to only allow building on newer
versions of Java while still supporting running on Java 8.

### Motivation and Context
Java 21 is the latest LTS release of Java and ORT should be able to
build on it.
2024-03-28 15:51:22 -07:00
Yi Zhang
f7b52d2e3e
[Fix] Only copy java files when build_java is True (#20121)
### Motivation and Context
Fix error in Nuget-CUDA-Packaging-Pipeline
2024-03-28 14:06:28 -07:00
Yi Zhang
c5d7310f1b
Remove TSA upload in testing stage (#20115)

---------

Co-authored-by: Yi Zhang <your@email.com>
2024-03-28 13:15:03 +08:00
Yi Zhang
8f069f81c4
Split more Windows GPU workflows into 2 stages, building and testing, to make them more stable (#20080)
### Description
Refactor win-ci.yml to solve the random hang issue in more GPU workflows;
move building of the nuget/zip packages and Python CUDA 12 packages to
CPU machines.

---------

Co-authored-by: Yi Zhang <your@email.com>
2024-03-28 12:55:44 +08:00
Dmitri Smirnov
b95fd4e644
Enable CUDA EP unit testing on Windows (#20039)
### Description
Address build issues and source code discrepancies.
Fix cuda_test_provider gtest argument stack corruption.

### Motivation and Context
The `OpTester` class that is widely used for kernel testing is not
suitable for testing internal classes of EPs that are built as shared
objects.
Currently, CUDA EP tests run only on Linux.
We want to enable testing and developments on Windows,
and create a usable pattern for testing of other EPs internals.

Alternatives considered:
Abstracting EP unit tests into a separate test executable such as
`onnxruntime_test_all`.
This alternative was rejected as it would create a lot more changes to
the established patterns, and could potentially interfere with CUDA
functionality while making source code maintenance more complex.
2024-03-27 13:32:36 -07:00
Yi Zhang
ab2eaedfaa
Install ONNX by building from source in the Windows DML stage (#20079)
### Description
In #20073, I pinned the ONNX version to unblock the whole PR CI.
In fact, we could use the ONNX installed by building from source,
so that the ONNX version is controlled by deps.txt.
For historical reasons, the DML stage installed ONNX from PyPI. Now
ONNX can be installed the same way as in the other stages.

Also add an option to skip installing ONNX in win-ci-prebuild-step.
2024-03-27 12:29:34 -07:00
Yi Zhang
4df9d16f98
[Fix] TSAUpload task must be in building stage (#20098)
### Description
In #20085, TSAUpload was in the testing stage, so the main branch failed.
2024-03-27 12:20:57 -07:00
Yulong Wang
47903e701a
fix condition in web CI YAML (#20095)
### Description
fix condition in web CI YAML
2024-03-27 10:35:43 -07:00
Yi Zhang
0561b9576e
Fix and Refactor Python Packaging Pipeline (#20085)
### Description
Make Windows GPU Packaging stage in Python Packaging pipeline run on CPU
machine as well




### Test Link

https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=430961&view=results
2024-03-27 12:17:22 +08:00
Yulong Wang
0313dd1f65
Update Web CI to use data dir under Agent.TempDirectory (#20074)
### Description
Update Web CI to use data dir under Agent.TempDirectory

This change fixes the random failure caused by unstable access to the karma
temp directory (which is under AppData\Local\Temp) on the CI pipeline.
2024-03-26 13:16:59 -07:00
Baiju Meswani
40efbd6c37
Fix training and macos ci pipelines (#20034) 2024-03-26 12:20:11 -07:00
sfatimar
eab35c20fc
Ort openvino npu 1.17 master (#19966)
### Description
- Add NPU to the list of supported devices.
- Add support for OpenVINO 2024.0.
- The Nuget package no longer bundles the OpenVINO DLLs.
- Bug fixes in the Python API.
- Revert Dockerfiles that are no longer maintained.



### Motivation and Context
The NPU device has been introduced by Intel in the latest client systems.
The OpenVINO 2024.0 release is out.

---------

Co-authored-by: Suryaprakash Shanmugam <suryaprakash.shanmugam@intel.com>
Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>
Co-authored-by: Ubuntu <ubuntu@ubuntu-118727.iind.intel.com>
Co-authored-by: hmamidix <hemax.sowjanya.mamidi@intel.com>
Co-authored-by: vthaniel <vishnudas.thaniel.s@intel.com>
Co-authored-by: saurabhkale17 <saurabh1.kale@intel.com>
2024-03-21 18:44:00 -07:00
Yi Zhang
cd6d3aea45
Refactor Python CUDA packaging pipeline to fix random hangs in building (#19989)
### Description
1. Move the build to a CPU machine.
2. Optimize the pipeline.
3. Since there is no official ONNX package for Python 3.12, the Python 3.12
test stage uses the packages built from ONNX source in the build stage.


### Motivation and Context
1. Resolve the random hang in compilation.
2. Save a lot of GPU resources.

---------
2024-03-22 09:16:00 +08:00
Yi Zhang
30a0d80925
Fix exception in Publish unit test results step (#20007)
### Description
Test result files are all in the RelWithDebInfo\RelWithDebInfo directory.
It's not necessary to stat the _deps directory.

### Motivation and Context
Recently this exception has occurred many times in the zip-nuget pipeline:
`##[error]Error: Failed find: EPERM: operation not permitted, stat
'D:\a\_work\1\b\RelWithDebInfo\_deps\flatbuffers-src\java\src\test\java\DictionaryLookup'`

https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=426981&view=logs&j=75fc0348-fe99-522b-3acb-90fd80ac5271&t=5d4ebcc1-bcde-574d-6f4e-8abd0f04ae4b
2024-03-22 06:53:59 +08:00
Yi Zhang
175f149b30
Remove downloading deps in CUDA package test stage (#19993)

### Motivation and Context
Downloading deps is not needed in the test stage;
remove it to reduce random download errors.
2024-03-21 10:01:03 +08:00
Yufeng Li
15219e2e71
turn on neural_speed by default (#19627)
### Description
<!-- Describe your changes. -->
The crash caused by neural_speed turns out to be a very rare corner case.
Turn it on by default.


2024-03-20 12:49:58 -07:00
Rachel Guo
6b305f95e0
Support xcframework for mac catalyst builds. (#19534)

### Motivation and Context

MAUI on macOS uses Mac Catalyst, which requires a different native
binary.

---------

Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
2024-03-20 10:55:19 -07:00
Yi Zhang
8adbc09314
[Fix] Error Python Packaging Pipeline (Training CPU) (#19992)
### Description
fix the error caused by
https://github.com/microsoft/onnxruntime/pull/19973
2024-03-20 09:02:50 -07:00
mindest
3dfe4a5e6d
[ROCm] Remove MPI dependency and collectives to use NCCL (#19830)
### Description
* Remove MPI dependency to use NCCL AllReduce, etc.
* Exclude unsupported collectives in hipify
2024-03-19 17:35:18 -07:00
Hariharan Seshadri
cd6ec50b50
Switch a portion of CI/packaging jobs to MacOS12 (#19908) 2024-03-19 14:54:58 -07:00
Yi Zhang
d4c8bc359e
Fix Training CPU docker image name to avoid unnecessary rebuilding (#19973)
### Description
The Docker image name was fixed, but the Docker argument differed across
jobs, which triggered rebuilding the Docker image almost every time.
2024-03-19 09:33:24 -07:00
Yulong Wang
b29849a287
[js/common] fix typedoc warnings (#19933)
### Description
Fix a few warnings in typedoc (for generating JS API):
```
[warning] The signature TrainingSession.loadParametersBuffer has an @param with name "buffer", which was not used.
[warning] NonTensorType, defined in ./lib/onnx-value.ts, is referenced by OnnxValue but not included in the documentation.
[warning] TensorFactory, defined in ./lib/tensor-factory.ts, is referenced by Tensor but not included in the documentation.
[warning] ExternalDataFileType, defined in ./lib/onnx-model.ts, is referenced by InferenceSession.SessionOptions.externalData but not included in the documentation.
[warning] TensorToDataUrlOptions, defined in ./lib/tensor-conversion.ts, is referenced by Tensor.toDataURL.toDataURL.options but not included in the documentation.
[warning] TensorToImageDataOptions, defined in ./lib/tensor-conversion.ts, is referenced by Tensor.toImageData.toImageData.options but not included in the documentation.
[warning] Failed to resolve link to "GpuBufferType" in comment for Env.WebGpuFlags.adapter.
[warning] Failed to resolve link to "GpuBufferType" in comment for Env.WebGpuFlags.device.
```

Changes highlighted:
- Merge `CoreMlExecutionProviderOption` and
`CoreMLExecutionProviderOption`. They expose two different sets of options
for React Native and the ORT Node.js binding. This should be fixed in the future.
- Fix a few name inconsistencies between JSDoc and parameters
- Fix broken type links
- Exclude trace functions
2024-03-15 19:01:50 -07:00
Yifan Li
0b2a75b274
[EP Perf] Add concurrency test (#19804)
### Description
* Add concurrency test to EP Perf CI panel (impl. by onnx_test_runner)
  * Model: FasterRCNN-10 model within CI image
  * `-c` param configurable via CI panel when kicking off CI tasks
  * Auto-replicate test inputs/outputs according to the `-c` param (see the sketch below)
* By default, the model test will be executed in 100 iterations (~2min
added to T4 CI task load overall)
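
A rough sketch of what the input/output replication mentioned above could look like, assuming the standard onnx_test_runner layout of `test_data_set_N` directories next to the model; the directory name, function name, and replication count below are illustrative, not the CI implementation.

```python
# Hypothetical sketch: copy test_data_set_0 so each of the `-c` concurrent runs
# has its own test data set directory (standard onnx_test_runner layout assumed).
import shutil
from pathlib import Path

def replicate_test_data(model_dir: str, concurrency: int) -> None:
    src = Path(model_dir) / "test_data_set_0"
    for i in range(1, concurrency):
        dst = Path(model_dir) / f"test_data_set_{i}"
        if not dst.exists():
            shutil.copytree(src, dst)   # duplicate input_*.pb / output_*.pb files

# e.g. prepare the FasterRCNN-10 test directory before running `onnx_test_runner -c 4 ...`
replicate_test_data("FasterRCNN-10", concurrency=4)
```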

### Motivation and Context
To monitor potential concurrency issues of ORT-TRT
2024-03-15 07:41:21 -07:00
Justin Chu
bcf47d3546
Update install_deps_lort.sh to fix onnxscript installation (#19922)
Install onnxscript correctly with `pip install`. Dev dependencies are
not required.

### Motivation and Context

Fix build breaks.
2024-03-14 17:05:50 -07:00
Adam Louly
32558134a9
[On-Device-Training] Upgrade Flatbuffers to Support 2GB+ Checkpoints. (#19770)
### Description
Modifications to support 2GB+ checkpoint & Upgrading Flatbuffers


### Motivation and Context
This PR includes changes that will make ort handle 2GB+ checkpoints.
To do that we need to upgrade flatbuffers to 23.5.9 -
https://github.com/google/flatbuffers/pull/7945

- Modified the commitHash and the hash for the new version
- Removed the patch for the Rust generator's unused-variable warning as it
no longer produces this warning - [check it out
here](d121e09d89/src/idl_gen_rust.cpp)
- Updated the VerifyField calls with alignment values that were
introduced in the new version.

---------

Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com>
2024-03-14 16:36:24 -07:00
Yi Zhang
87a9f77c56
Refactor Python Packaging Pipeline (Training CUDA 11.8) (#19910)
### Description
1. Use stages to organize the pipeline and split building and testing.
2. Move compilation to a CPU machine.
3. The test stage can leverage existing artifacts.
4. Check the wheel size; it warns if the size is above 300 MB (see the
sketch after this list).
5. The Docker image name wasn't changed even when the argument changed,
which caused the Docker image to always be rebuilt. Updating the Docker
image name according to the argument saves Docker build time.
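
As a rough illustration of the wheel-size check in item 4, a sketch along these lines; the glob pattern, threshold, and use of an Azure DevOps warning command are assumptions, not the actual pipeline script.

```python
# Hypothetical wheel-size check: emit an Azure DevOps warning when any built
# wheel exceeds 300 MB. Path pattern and limit are illustrative.
from pathlib import Path

LIMIT_BYTES = 300 * 1024 * 1024

for wheel in Path("dist").glob("*.whl"):
    size = wheel.stat().st_size
    if size > LIMIT_BYTES:
        # ##vso[task.logissue type=warning] surfaces a warning in the ADO build log.
        print(f"##vso[task.logissue type=warning]{wheel.name} is "
              f"{size / 2**20:.0f} MB, over the {LIMIT_BYTES / 2**20:.0f} MB limit")
```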

Pipeline duration reduced by 60% (2 hours -> 50 minutes).
Compilation time reduced by 75% (1.5 hours -> 20 minutes).
GPU time reduced by 87% (8 hours -> 1 hour).
For debugging, GPU time can be reduced by more than 95%, because we can
choose to run only one test stage and skip building.

### Motivation and Context
Make the pipeline efficient.
Optimized

https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=424177&view=results
Current

https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=422393&view=results

---------
2024-03-15 06:47:41 +08:00
Changming Sun
8b766bd24e
Change nuget pipeline's "Windows_Packaging_combined_GPU" job to download TRT binaries in every build (#19919)
### Description
Change nuget pipeline's "Final_Jar_Testing_Windows_GPU" job to download
TRT binaries in every build. Now all the other build jobs are already
doing this. This is the only one left.

Similar to #19909

### Motivation and Context

As a follow up of #19118
2024-03-14 15:07:56 -07:00
Changming Sun
ea4a5eea18
Change nuget pipeline's "Final_Jar_Testing_Windows_GPU" job to download TRT binaries in every build (#19909)
### Description
Change nuget pipeline's "Final_Jar_Testing_Windows_GPU" job to download
TRT binaries in every build. Now all the other build jobs are already
doing this. This is the only one left.


### Motivation and Context

As a follow up of #19118
2024-03-14 07:55:00 -07:00
Yulong Wang
e771a763c3
[js/test] align web test runner flags with ort.env (#19790)
### Description
The `npm test` flags are difficult to memorize because they are
different from the `ort.env` flags. This change makes those flags align
with the ORT JS API, e.g. `--wasm-enable-proxy` becomes `--wasm.proxy`.

Old flags are marked as deprecated except `-x` (a shortcut for
`--wasm.numThreads`).
2024-03-13 12:00:36 -07:00
Yi Zhang
d5d9dbd51d
reuse T4 on Linux GPU (#19879)
### Motivation and Context
Linux GPU test on A10 isn't very stable
2024-03-13 10:41:36 -07:00