Commit graph

1307 commits

Author SHA1 Message Date
Yulong Wang
0578eeff91
upgrade EsrpCodeSigning from v1 to v2 (#14531)
### Description
This change upgrade EsrpCodeSigning from v1 to v2 in our build pipeline.
2023-02-02 13:08:26 +08:00
Yi Zhang
80f807c03d
upgrade protobuf to 3.20.2 and onnx to 1.13 (#14279)
### Description
upgrade protobuf to 3.20.2, same as onnx 1.13.0

### Motivation and Context
Per component governance requirement and Fixes #14060

unused-parameter error occurs in 2 conditions.
1. compile protolbuf

`onnxruntime_src/cmake/external/protobuf/src/google/protobuf/repeated_ptr_field.h:752:66:
error: unused parameter ‘prototype’ [-Werror=unused-parameter]`
2. include onnx_pb.h
```
2023-01-28T10:20:15.0410853Z FAILED: CMakeFiles/onnxruntime_pybind11_state.dir/onnxruntime_src/onnxruntime/python/onnxruntime_pybind_iobinding.cc.o 
......
2023-01-28T10:20:15.0466024Z                  from /build/Debug/_deps/onnx-src/onnx/onnx_pb.h:51,
2023-01-28T10:20:15.0466958Z                  from /onnxruntime_src/include/onnxruntime/core/framework/to_tensor_proto_element_type.h:10,
....
2023-01-28T10:20:15.0609678Z /build/Debug/_deps/onnx-build/onnx/onnx-operators-ml.pb.h:1178:25:   required from here
2023-01-28T10:20:15.0610895Z /onnxruntime_src/cmake/external/protobuf/src/google/protobuf/repeated_ptr_field.h:752:66: error: unused parameter ‘prototype’ [-Werror=unused-parameter]
2023-01-28T10:20:15.0611707Z cc1plus: all warnings being treated as errors

```

https://dev.azure.com/onnxruntime/2a773b67-e88b-4c7f-9fc0-87d31fea8ef2/_apis/build/builds/874605/logs/22
2023-01-31 12:55:09 -08:00
cloudhan
3b6d551c35
Enable ccache for HIP objects (#14465)
This enables HIP compiler to be launched with `ccache` when build with `--use_cache`
2023-01-28 22:34:24 +08:00
Vincent Wang
7aecb2150f
Fix onnxruntime-CI-nightly-ort-pipeline Failure (#14464)
PyTorch skipped version 1.14 and jumped to 2.0, while the image for the
onnxruntime-CI-nightly-ort-pipeline is still using
nightly-ubuntu2004-cu116-py38-torch1140dev. Switch to the new torch
version image to fix the failure of the pipeline.
2023-01-28 16:05:56 +08:00
Tianlei Wu
94b1791974
Upgrade CUTLASS to v2.11 and add sequence length threshold for cutlass FMHA (#14401)
### Description
Add sequence length threshold for triggering cutlass FMHA in FP32. See
performance test results in
https://github.com/microsoft/onnxruntime/pull/14343 to see how this
threshold is selected.

Upgrade cutlass to v2.11 and update deps.txt and cgmanifest for nuget
pipeline build (test build:
https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=268574&view=results)
2023-01-25 09:43:48 -08:00
Edward Chen
7cc9aed314
Android package custom build script update (#14403)
Update Android package custom build script.
- Use later version of various dependencies (CMake, JDK, Android command line tools, Android NDK, Ubuntu). The CMake version was too old for the current ORT code.
- Do in-container build in a directory that is not shared with the host. Resolves some file permission issues and speeds up file access.

Add a nightly build to make sure the script works with the latest ORT.
2023-01-25 09:19:05 -08:00
Yi Zhang
cf3661ff6d
Revert "Allow PostAnalysis@2 task to continue on error for Windows_Pa… (#14375)
…ckaging_CPU_x86_default (#14332)"

This reverts commit a491f33f54.

### Description


### Motivation and Context
It looks an ADO issue.
Now, it's recovered.
It could be reenabled.
2023-01-21 09:32:39 +08:00
Edward Chen
3b382ea7e1
Free OrtStatus in ASSERT_ORT_STATUS_OK, make run_android_emulator.py work with newer JDK version (#14369)
- Free OrtStatus in ASSERT_ORT_STATUS_OK in model_tests.cc
- Make run_android_emulator.py work with newer JDK version
2023-01-20 09:27:47 -08:00
Yi Zhang
3d6cea14f4
Remove intermedia obj files once build finished (#14361)
### Description
Remove intermedia obj files and reenable cache

### Motivation and Context
Recently, training_debug_x64 pipeline often failed due to not enough
space.
It could free nearly 8G space by deleting obj files.
So, the compilation cache can be reenabled
2023-01-20 13:37:15 +08:00
Edward Chen
ae0e090c7b
Fix post merge jobs pipeline build issues (#14346)
- Fix debug node inputs outputs nullptr dereference with ONNX optional types.
- Fix model test memory leak.
- Convert jobs to stages in post-merge-jobs.yml to allow a subset of builds to be enabled when running manually.
- Fix buffer overrun in CumSum op exposed by Mimalloc build.
2023-01-19 11:16:42 -08:00
Yi Zhang
b51415b0ea
disable cache for training_x64_debug (#14358)
### Description
disable cache to save disk space for training_x64_debug


### Motivation and Context
To mitigate not enough disk space in training_x64_debug first.
2023-01-19 15:08:34 +08:00
Adrian Lizarraga
a491f33f54
Allow PostAnalysis@2 task to continue on error for Windows_Packaging_CPU_x86_default (#14332)
### Description
Allows the PostAnalysis@2 task for windows CI jobs to continue even if
an error is encountered.


### Motivation and Context
This is a temporary workaround that enables the
`Windows_Packaging_CPU_x86_default` job within the Zip-Nuget-Java-NodeJS
packaging pipeline to finish. A recent update to dotnet 6 has broken the
PostAnalysis task for this job.

This task was originally added by
https://github.com/microsoft/onnxruntime/pull/13694
2023-01-18 19:54:48 -08:00
Rui Ren
904e63633a
increase the time limit as more unit tests added (#14327)
### Description
Pipeline failed because we added more unit tests, reference:
https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=863643&view=logs&j=7536d2cd-87d4-54fe-4891-bfbbf2741d83&t=305229be-e8ba-5189-ca61-fcb77d866478

Now we have: [2430 tests](
https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=863619&view=logs&j=7536d2cd-87d4-54fe-4891-bfbbf2741d83&t=4efd38bc-b0da-5f98-81a8-ea2885f78448&l=43853)
Previously we had: [2422
tests](https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=859543&view=logs&j=7536d2cd-87d4-54fe-4891-bfbbf2741d83&t=4efd38bc-b0da-5f98-81a8-ea2885f78448&l=43640)

- Timeout error as we have 2 hour threshold
```
jobs:
- job: Linux_Build
  timeoutInMinutes: 120
  variables:
    skipComponentGovernanceDetection: true
```

### Motivation and Context

- Increase the timeoutInMinutes to `150`
2023-01-18 15:51:21 -08:00
Guenther Schmuelling
60290393f3
enable ort-extensions in wasm release builds (#14239)
enable ort-extensions in wasm release builds. sentence piece, gpt2, bert
and word piece tokenizers for now.

wasm size will grow from 8.4MB to 8.9MB.
2023-01-17 12:39:13 -08:00
Yi Zhang
fb801d58b1
Add Cache in Linux CPU Aten Pipeline (#14313)
### Description
Add compilation cache in Linux CPU Aten Pipeline.
The pipeline could be completed in 6 minutes at best.

### Motivation and Context
1. Accelerate the pipeline.
2. It's the shortest pipeline with docker image. I'll use it to try
moving the storage of linux docker image from ACR to ADO pipeline cache.
2023-01-17 10:49:29 +08:00
Yi Zhang
6d60dc24fe
install shared deps script (#14234)
### Description
Add a new install_shared_deps.sh

### Motivation and Context
Azcopy, Ninja, Node.js and CCache are all needed, but they are copied
everywhere.
2023-01-16 18:27:29 +08:00
Yi Zhang
2a82f95040
Increase package python test pipeline timeout limit (#14288)
### Description
Increase python test pipeline timeout limit. 
So far, It's a known issue for tensortRT8.5.
2023-01-14 13:46:09 +08:00
PeixuanZuo
d3a09cf77f
[ROCm] use pytest-xdist for fast pytest (#14261)
### Description

Use pytest-xdist to distribute tests across multiple CPUs to speed up
test execution.
Use pytest-rerunfailures to rerun failed test in case of pytest-xdist
crash.
`pytest -n 16` can reduce pytest time from 80 minutes to 20 minutes.


### Motivation and Context
Now kernel explorer pytest of ROCm CI takes nearly 1 hour 20 minutes. It
will take longer time when we add more tunableOp in the future.
2023-01-13 16:57:50 +08:00
Scott McKay
b9ecd428c1
Add ability to register custom ops by specifying a function name (#14177)
### Description
<!-- Describe your changes. -->
Use dlsym/GetProcAddress to lookup a custom ops registration function by
name and call it.

This will be better on mobile platforms where the custom ops library is
linked against, and there isn't necessarily a filesystem that a library
path can be loaded from.

Alternative is to wire up passing in the address of the function, but
that has multiple complications which differ by platform.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Enable using ort and ort-ext packages on mobile platforms.

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-01-12 15:11:34 +10:00
sfatimar
7654cd50e8
Openvino ep 2022.3 v4.3 (#14210)
### Description
Changes to incorporate OpenVINO EP 2022.3


### Motivation and Context
This change is required to incorportate OpenVINO EP 2022.3
- If it fixes an open issue, please link to the issue here. -->

Co-authored-by: mohsinmx <mohsinx.mohammad@intel.com>
Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>
Co-authored-by: Aravind <aravindx.gunda@intel.com>
Co-authored-by: mayavijx <mayax.vijayan@intel.com>
Co-authored-by: flexci <mohsinmx>
2023-01-11 16:31:26 -08:00
RandySheriffH
83ad562826
Rename CloudEP to AzureEP (#14175)
Rename CloudEP to AzureEP.

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-01-11 12:25:04 -08:00
Ashwini Khade
d92c663f28
Create dedicated build for training api (#14136)
### Description
Enable creating dedicated build for on device training. With this PR we
can build a lean binary for on device training using flag
--enable_training_apis. This binary includes only the essentials like
training ops, optimizers etc and NOT features like Aten fallback,
strided tensors, gradient builders etc . This binary also removes all
the deprecated components like training::TrainingSession and OrtTrainer
etc

### Motivation and Context
This enables our partners to create a lean binary for on device
training.
2023-01-10 20:58:04 -08:00
PeixuanZuo
33367fa2dc
[MIGraphX] update the MIGraphX version used in ORT to rocm-5.4.0 (#14184)
### Description
Update the MIGraphX version used in ORT to rocm-5.4.0

### Motivation and Context
The previous branch migraphx_for_ort has stopped updating, it is too far
away from the MIgraphX latest release branch. More discussion here:
https://github.com/microsoft/onnxruntime/issues/14126#issuecomment-1373201049

Co-authored-by: peixuanzuo <peixuanzuo@linmif39a000004.zvflicr54joexhdgnhvmxrxygg.phxx.internal.cloudapp.net>
2023-01-10 13:40:25 +08:00
Yi Zhang
6463f4383b
make WITHCACHE as an option in MacOS workflow (#14188)
### Description
1. Set the WithCache default value as false in Mac OS CI workflow too.
2. Add date of today in cache key to avoid cache size keep increasing
too.

WithCache, the pipeline duration reduced from 70 more minutes to 10 more
minutes
2023-01-10 10:54:19 +08:00
liqun Fu
1be36913cc
to work with onnx 1.13 rc, implement ver 18 reduce and optioanl ops, … (#13765) 2023-01-09 10:26:16 -08:00
Baiju Meswani
c6ff5bac9d
Update torch in eager mode CI pipeline (#14094) 2023-01-06 11:46:44 -08:00
zhijiang
0ed7277bbe
fix training compilation option (#14151)
fix the pipeline failure for compilation option error
2023-01-06 14:25:03 +08:00
Yi Zhang
2ce7b1c1dc
Enable cache for msbuild (#14085)
### Description
Enable ccache in windows CPU compilation.
The windows compilation in CI could be reduced to 1 more minute at most.

![image](https://user-images.githubusercontent.com/16190118/210294061-86742cf4-65c7-4cc2-9725-e102c3c64abd.png)
2023-01-06 11:19:57 +08:00
Ashwini Khade
e5e3570ac5
fix cg issue (#14112)
### Description
Update torch version to 1.13.1 to fix CG issue:
https://dev.azure.com/aiinfra/ONNX%20Runtime/_workitems/edit/10666/
2023-01-04 09:07:13 -08:00
Yi Zhang
f864b54393
Use today's cache only (#14120)
### Description
Add date value of today into the cache key.

### Motivation and Context
Microsoft-host agent has only 10GB for build.
To limit cache size, pipeline only use cache generated today.
2023-01-04 17:48:52 +08:00
Baiju Meswani
0ff61f7b97
Update torch to 1.13.1 in CI and packaging pipelines for ort training (#14055) 2023-01-03 20:03:33 -08:00
Ashwini Khade
68b5b2d7d3
Refactor training build options (#13964)
### Description
1. Renames all references of on device training to training apis. This
is to keep the naming general. Nothing really prevents us from using the
same apis on servers\non-edge devices.
2. Update ENABLE_TRAINING option: With this PR when this option is
enabled, training apis and torch interop is also enabled.
3. Refactoring for onnxruntime_ENABLE_TRAINING_TORCH_INTEROP option: 
   -  Removed user facing option
- Setting onnxruntime_ENABLE_TRAINING_TORCH_INTEROP to ON when
onnxruntime_ENABLE_TRAINING is ON as we always build with torch interop.

Once this PR is merged when --enable_training is selected we will do a
"FULL Build" for training (with all the training entry points and
features).
Training entry points include:
1. ORTModule
2. Training APIs

Features include:
1. ATen Fallback
2. All Training OPs includes communication and collectives
3. Strided Tensor Support
4. Python Op (torch interop)
5. ONNXBlock (Front end tools for training artifacts prep when using
trianing apis)

### Motivation and Context
Intention is to simply the options for building training enabled builds.
This is part of the larger work item to create dedicated build for
learning on the edge scenarios with just training apis enabled.
2023-01-03 13:28:16 -08:00
RandySheriffH
587e891cae
CloudEP (#13855)
Implement CloudEP for hybrid inferencing.
The PR introduces zero new API, customers could configure session and
run options to do inferencing with Azure [triton
endpoint.](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-deploy-with-triton?tabs=azure-cli%2Cendpoint)
Sample configuration in python be like:

```
sess_opt.add_session_config_entry('cloud.endpoint_type', 'triton');
sess_opt.add_session_config_entry('cloud.uri', 'https://cloud.com');
sess_opt.add_session_config_entry('cloud.model_name', 'detection2');
sess_opt.add_session_config_entry('cloud.model_version', '7'); // optional, default 1
sess_opt.add_session_config_entry('cloud.verbose', '1'); // optional, default '0', meaning no verbose
...
run_opt.add_run_config_entry('use_cloud', '1') # 0 for local inferencing, 1 for cloud endpoint.
run_opt.add_run_config_entry('cloud.auth_key', '...')
...
sess.run(None, {'input':input_}, run_opt)
```

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-01-03 10:03:15 -08:00
Baiju Meswani
b85878953f
Fix nightly ort training ci pipeline (#14007) 2022-12-30 12:28:57 -08:00
PeixuanZuo
b5fd2a6a80
[ROCm] Add ROCm5.4 to python package pipeline (#14012)
Add ROCm5.4 to python package pipeline.
The download link of ROCm5.4 nightly build whl is
https://download.onnxruntime.ai/onnxruntime_nightly_rocm54.html
The download linkd of ROCm5.4 nightly build whl with profiling is
https://download.onnxruntime.ai/onnxruntime_nightly_rocm54.profiling.html

Co-authored-by: peixuanzuo <peixuanzuo@linmif39a000004.zvflicr54joexhdgnhvmxrxygg.phxx.internal.cloudapp.net>
2022-12-22 10:01:40 +08:00
PeixuanZuo
ab2dd8dfaf
[ROCm] Update ROCm and MigraphX CI to ROCm5.4 (#14011)
Update ROCm and MigraphX CI to ROCm5.4
Run ortmodule_test with ROCm5.4 and all
passed(https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=824742&view=logs&j=8292f886-7946-5da9-7977-04484c342eda&t=5de68eaa-cbdc-5be5-13d0-bb946f4ddb2d).

Co-authored-by: peixuanzuo <peixuanzuo@linmif39a000004.zvflicr54joexhdgnhvmxrxygg.phxx.internal.cloudapp.net>
2022-12-22 10:01:05 +08:00
Changming Sun
fc2a6db573
Update absl to the latest release (#13990)
### Description
Update absl to a new version

### Motivation and Context
The new version contains fixes that are needed for Nvidia GPU build.
Once we update it to that version, we don't need to maintain our private
patches for Nvidia GPU build.
2022-12-19 14:25:13 -08:00
Yulong Wang
cc0a6213e4
[js] update versions of a few build dependencies (#13977)
### Description
update versions of a few build dependencies for onnxruntime NPM
packages.

update nodejs version to v16.x in linux CI. v12 is too out-of-dated. see
[nodejs release
schedule](https://github.com/nodejs/release#release-schedule)

### Motivation and Context
- upgrade to latest webpack allows using of latest Node.js LTS version.
previous version of webpack does not work on Node.js v18 and it is fixed
in latest version
- upgrade to latest typescript, ts-loader and other dev deps to
accelerate the build and bundling.
- upgrade also helps to resolve security warnings that may be vulnerable
in out-of-dated version
2022-12-16 17:26:54 -08:00
Chi Lo
ba89cae3bd
Update package pipelines to support TRT 8.5 (#13998)
Update following package pipelines to support TRT 8.5 after
https://github.com/microsoft/onnxruntime/pull/13867:

- [Linux Multi GPU TensorRT CI
Pipeline](https://aiinfra.visualstudio.com/Lotus/_build?definitionId=1016&_a=summary)
- [Python packaging
pipeline](https://aiinfra.visualstudio.com/Lotus/_build?definitionId=841&_a=summary)
-
[build-perf-test-binaries](https://aiinfra.visualstudio.com/Lotus/_build?definitionId=1130&_a=summary)
-
[Linux-GPU-EP-Perf](https://aiinfra.visualstudio.com/Lotus/_build?definitionId=841&_a=summary)
2022-12-16 15:01:50 -08:00
Yi Zhang
aa9fbed3d4
Add compilation cache for Linux GPU (#13995)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2022-12-16 16:38:12 +08:00
Yi Zhang
7d20d889d1
Use cache for compilation in container (#13960)
### Description
For compilation in container,  ADO Cache task doesn't work directly.
The workaround is to mount the cache directory to the container, and let
CCache in container to read/write cache data.
In short, we just leverage ADO API to download/upload cache data.

The Post-jobs works in stack-mode, So the PostBuildCleanUp Tasks should
be defined first.
Thus, The PostBuildCleanUp would be executed lastly.
Else, Cache Task would fail to upload cache because the Agent Directory
is cleaned.
2022-12-16 07:19:07 +08:00
Changming Sun
a9b1fb032b
FIX: macOS CI pipeline doesn't run tests (#13970)
### Description
Fix a problem: macOS CI pipeline doesn't run tests. It is due a code
refactoring I recently made.

### Motivation and Context
Add the tests back.
2022-12-14 18:39:31 -08:00
Chi Lo
5b492cbae3
[TensorRT EP] support TensorRT 8.5 (#13867)
Integrate TensorRT 8.5

- Update TensorRT EP to support TensorRT 8.5
- Update relevant CI pipelines
- Disable known non-supported ops for TensorRT
- Make timeout configurable.
We observe more than [20
hours](https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=256729&view=logs&j=71ce39d8-054f-502a-dcd0-e89fa9931f40)
of running unit tests with TensorRT 8.5 in package pipelines. Because we
can't use placeholder to significantly reduce testing time (c-api
application test will deadlock) in package pipelines, we only run
subsets of model tests and unit tests that are related to TRT (add new
build flag--test_all_timeout and set it to 72000 seconds by package
pipelines). Just to remember, we still run all the tests in TensorRT CI
pipelines to have full test coverage.

- include https://github.com/microsoft/onnxruntime/pull/13918 to fix
onnx-tensorrt compile error.

Co-authored-by: George Wu <jywu@microsoft.com>
2022-12-14 13:06:03 -08:00
Yi Zhang
7894d44d2d
Improve MacOS Cache Code (#13958)
### Description
Update cache key to make cache could be updated.
2022-12-14 20:47:09 +08:00
Edward Chen
b4dd5dda12
Revert "Update protobuf version to 3.18.3 in tools/ci_build/github/linux/docker/scripts/requirements.txt." (#13963)
Reverts microsoft/onnxruntime#13922
2022-12-13 18:15:06 -08:00
Edward Chen
b23395f977
Update protobuf version to 3.18.3 in tools/ci_build/github/linux/docker/scripts/requirements.txt. (#13922)
### Description
<!-- Describe your changes. -->

Update protobuf version to 3.18.3 in
tools/ci_build/github/linux/docker/scripts/requirements.txt.

### Motivation and Context

Address component governance alert CVE-2022-1941
2022-12-12 12:38:27 -08:00
Yi Zhang
2cb12caf93
Output cache stats (#13937)
### Description
Output cache stats
2022-12-12 15:22:13 +08:00
Changming Sun
89812a623e
Add two daily build jobs to validate some extra build configs (#13921)
### Description
Add two daily build jobs to validate some extra build configs


### Motivation and Context

To catch issues like: #13893
2022-12-10 09:15:14 -08:00
Adrian Lizarraga
db9c677b63
[EP Perf Dashboard] Add TensorRT 8.5.1.1 dockerfile (#13843)
### Description
- Adds a dockerfile for Ubuntu with TensorRT 8.5.1.1.
- Adds option to run EP Perf pipeline with TensorRT 8.5

### Motivation and Context
Necessary to benchmark models with TensorRT 8.5
2022-12-09 14:33:52 -08:00
shalvamist
d22be84add
Pin packaging to version 21.3 to address training pipeline failures 2022-12-09 09:05:55 -08:00