Commit graph

1537 commits

Author SHA1 Message Date
Yi Zhang
bd95a8ea77
update onnxruntime-gpu-winbuild-T4 to onnxruntime-Win2022-GPU-T4 (#16838)
### Description


### Motivation and Context
It's also used to upgrade visual studio to VS2022.
onnxruntime-gpu-winbuild-T4 and onnxruntime-gpu-tensorrt8-winbuild-t4
are using the image based on one dev branch and VS2019

To avoid breaking the current CIs, we move jobs running on
onnxruntime-gpu-winbuild-T4/onnxruntime-gpu-tensorrt8-winbuild-t4 to
onnxruntime-Win2022-GPU-T4.
2023-07-27 08:38:20 -07:00
Wang, Mengni
fe463d4957
Support SmoothQuant for ORT static quantization (#16288)
### Description

Support SmoothQuant for ORT static quantization via intel neural
compressor

> Note:
Please use neural-compressor==2.2 to try SmoothQuant function.

### Motivation and Context
For large language models (LLMs) with gigantic parameters, the
systematic outliers make quantification of activations difficult. As a
training free post-training quantization (PTQ) solution, SmoothQuant
offline migrates this difficulty from activations to weights with a
mathematically equivalent transformation. Integrating SmoothQuant into
ORT quantization can benefit the accuracy of INT8 LLMs.

---------

Signed-off-by: Mengni Wang <mengni.wang@intel.com>
2023-07-26 18:56:45 -07:00
Justin Chu
0c1a5098dc
Disable PERF* rules in ruff to allow better readability (#16834)
### Description

Disable two PERF* rules in ruff to allow better readability. Rational
commented inline. This change also removes the unused noqa directives
because of the rule change.

### Motivation and Context

Readability
2023-07-25 15:38:22 -07:00
Edward Chen
e01365f80b
Update upload_pod_archive_and_update_podspec.sh to take path pattern (#16810)
Update upload_pod_archive_and_update_podspec.sh to take a pod archive path glob pattern. The actual pod archive path has a version suffix which changes.
2023-07-25 08:55:31 -07:00
Yi Zhang
38db5eca65
replace onnxruntime-Win-CPU-2019 with onnxruntime-Win-CPU-2022 (#16844)
### Description
<!-- Describe your changes. -->



### Motivation and Context
upgrade to VS2022
2023-07-25 23:05:34 +08:00
Yi Zhang
f88f0d8e36
Upgrade 4 stages in nuget pipeline to VS2022 (#16825)
### Description


### Motivation and Context
Continue upgrading to VS2022

### Verfication

https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=331377&view=results

N.B.
In practice, SDLNativeRules@3 doesn't support VS2019.
2023-07-25 14:22:39 +08:00
PeixuanZuo
8ede2f139e
[ROCm] Optimize ROCm CI pipeline 2 (#16691)
- Set `KERNEL_EXPLORER_TEST_USE_CUPY=1` to replace numpy with cupy on
kernel explorer test.

KERNEL_EXPLORER_TEST_USE_CUPY=0 The CPU utilization is shown as below:

![image](https://github.com/microsoft/onnxruntime/assets/94887879/91724b78-0b4e-4cbd-ad88-83cad9976472)

KERNEL_EXPLORER_TEST_USE_CUPY=1 The CPU utilization is shown as below:

![image](https://github.com/microsoft/onnxruntime/assets/94887879/58239911-667c-4d5f-bb78-deca60d0266f)


- Use `Bash@3`.
- Update shell script.
2023-07-24 13:57:48 +08:00
Yi Zhang
3252ff2cb7
Change DML GPU pool in Windows GPU workflow use Visual Studio 2022 (#16784)
### Description
1. use the pool with VS2022
2. upgrade System.Memory to 4.5.5


### Motivation and Context
Solve the build error while using VS2022:
`[Failure] Msbuild failed when processing the file
'D:\a\_work\1\s\csharp\src\Microsoft.ML.OnnxRuntime\Microsoft.ML.OnnxRuntime.csproj'
with message: Method not found: 'System.ReadOnlySpan`1<Char>
Microsoft.IO.Path.GetFileName(System.ReadOnlySpan`1<Char>)'`

Ref:
https://stackoverflow.com/questions/73399777/azure-build-failing-due-to-method-not-found-system-readonlyspan1char-micros
2023-07-23 10:07:21 +08:00
Justin Chu
d79515041c
[Better Engineering] Bump ruff to 0.0.278 and fix new lint errors (#16789)
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at
bottom):
* __->__ #16789

Bump ruff to 0.0.278 and fix new lint errors. I added noqa to all
existing RUF012 errors which requires mutable class variables to be
annotated with `ClassVar`, as well as all PERF issues.

Signed-off-by: Justin Chu <justinchu@microsoft.com>
2023-07-21 12:53:41 -07:00
Baiju Meswani
538d2412ef
Objective-C Add Support to Create and Query String ORTValues (#16764)
This pull request contains a few changes:

1. Adds support for string ort values.
2. Fixes the training minimal build (that was broken with #16601) by
putting custom op registration behind #ifdefs
3. Fixes the iOS pod package generation (that was again broken with
#16601) by explicitly providing paths to be copied during pod creation.
2023-07-20 17:39:29 -07:00
Adrian Lizarraga
a8c263f92c
[QNN EP] Update QNN SDK to 2.12 (#16750)
### Description
- Updates the default QNN SDK to 2.12 for CI pipelines
- Adds a disabled InstanceNormalization test for regression on QNN SDK
2.12
- Cleans up logs for unsupported ops.

### Motivation and Context
Test with the latest QNN SDK.
2023-07-20 16:22:14 -07:00
Yi Zhang
c314d7724f
Update dml gpu pool to onnxruntime-Win2022-GPU-dml-A10 (#16765)
### Description
onnxruntime-Win2022-GPU-dml-A10 is using VS2022.



### Motivation and Context
1. Upgrade VS2019 to VS2022 to fix prefast issue.
2023-07-20 16:52:13 +08:00
Edward Chen
fc1f463ff1
[ios] Enable training package in packaging pipeline (#16683)
Build iOS training package in packaging pipeline.
Refactor iOS packaging pipeline to build different package variants in parallel.
2023-07-19 19:55:00 -07:00
saurabh
24566058b3
ovep dockerfile and wheel docs changes (#16482)
### Description
This PR is includes changes in the documentation of _readmeOV.rst_ file
and also the changes in the dockerfile which enables to build ORT with
latest OpenVINO 2023.0.0



### Motivation and Context
Modified the dockerfile to incorporate the latest version of OpenVINO
(2023.0.0) for building Onnxruntime.
The changes in the PR aim to improve the overall user experience by
providing accurate and up-to-date documentation while leveraging latest
OpenVINO 2023.0.0
2023-07-19 09:01:09 -07:00
Edward Chen
df8843c4a7
Upgrade old Python version in packaging pipeline (#16667)
- Upgrade from Python 3.6 to 3.8 in packaging pipeline.
- Raise build.py minimum required Python version.
2023-07-17 08:24:47 -07:00
Adrian Lizarraga
19169afe30
[QNN EP] Add option to skip unit tests in the QNN NuGet packaging pipeline (#16164)
Add option to skip unit tests in the QNN NuGet packaging pipeline.
2023-07-14 10:52:05 -07:00
Yi Zhang
36b121d8c2
add more check to Web CI on cache restore (#16689)
### Description
<!-- Describe your changes. -->



### Motivation and Context
Make sure the data is correct.
2023-07-14 10:00:13 +08:00
Scott McKay
a3fc04ba74
Fix CodeCoverage pipeline (#16684)
### Description
<!-- Describe your changes. -->
Delete second reference to onnxruntime_api_tests_without_env in the code
coverage commands. One was removed in #16373 and the duplicate wasn't
noticed.



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Fix pipeline.
2023-07-14 07:47:04 +10:00
PeixuanZuo
ebc311365b
[ROCm] Optimize ROCm CI to reduce time (#16620)
This PR mainly optimize ROCm CI test to reduce time and CPU utilization.

- use smaller batch size on strided_batched_gemm/batched_gemm test
- disable cpu training test
- fix test_e2e_padding_elimination Occasional failures on ROCm.
2023-07-13 10:58:03 +08:00
Yi Zhang
f3b40abe29
Use pipeline cache to cache onnx node test data. (#16659)
### Description
Use pipeline cache instead of reading data from the image.


### Motivation and Context
1. To reduce the browser dependency of custom image.
2. The onnx node test data is less than 30M and the cache download time
is very short.
2023-07-13 09:26:27 +08:00
PeixuanZuo
596dbe277e
[ROCm] add upgrade to fix security issue (#16668) 2023-07-12 17:57:18 +08:00
Scott McKay
ce68a4c06a
Fix Linux build failure when onnxruntime_DISABLE_ABSEIL=ON (#16373)
### Description
<!-- Describe your changes. -->
Add ort_value.h to session_options.h so OrtValue is defined. 

Update a unit test binary to add required include paths. Adding
ort_value.h pulls in more data type headers.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
#16193
2023-07-12 11:23:18 +10:00
Yulong Wang
5b6c1394cb
[js/test] CI: use pre-downloaded testdata in image (#16562)
### Description
update web CI to use pre-downloaded testdata in image
2023-07-10 22:22:14 -07:00
PeixuanZuo
2fd5e1cc39
[ROCm] fix shell bug (#16641)
`set -ex` with `grep` will exit when grep doesn't meet any string.
2023-07-10 17:31:27 +08:00
PeixuanZuo
cb4bf4f5c8
[ROCm] Move ROCm build step on CPU only machine (#16596)
- Move ROCm build step on CPU only machine
- Add the performance data of the huggingface bert-large model on the
MI200
- At the beginning of the test step, check the agent's GPU usage and
kill the threads occupying the GPU, which may be left over from previous
tasks that exited abnormally.
- Use different docker images during the build and test steps. The
difference is the `uid` and `user` when build docker image and create
docker container.
2023-07-10 11:55:10 +08:00
Edward Chen
e22b0836e7
[objc] Update docs and fix static analysis build (#16617)
- Update some documentation comments.
- Use onnxruntime_training.h as the umbrella header so training API docs are included in generated docs.
- Fix static analysis build.
2023-07-07 07:58:54 -07:00
Yi Zhang
fed08e070a
Add compiler cache in linux wasm build (#16579)
### Description
Add compiler cache in wasm build to accelerate web ci

### Motivation and Context
It could reduce the pipeline duration by 30 minutes.
web ci could be completed in 2 hours with cache.

https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1053219&view=results
2023-07-06 06:58:48 +08:00
Vrajang Parikh
fd8ad9b950
Enable iOS packaging for training (#16525)
### Description
Enable support for building iOS packages/CocoaPods with training API

- Add `Training` Package variant and config files in current iOS
packaging utilities to enable creation of training packages

### Motivation and Context
This PR introduces new `Training` variant in
`build_and_assemble_ios_pods.py` script which allows creating pods for
iOS with training API enabled.

The sample script to build training pods:

```
python3 tools/ci_build/github/apple/build_and_assemble_ios_pods.py --variant Training \
--build-settings-file  tools/ci_build/github/apple/default_full_ios_training_framework_build_settings.json \ 
-b=-- path_to_protoc_exe=<path/to/protoc>
``` 

Note: build settings file should have `--enable_training` as a build
parameter.


Simply adding training packaging increases the duration of the Azure
pipeline for packaging by 70 minutes. To address this issue, we need to
parallelize pod creation. In order not to further strain the pipeline,
the changes for training packaging will be added in another PR, which
optimizes the packaging pipeline.

---------

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-07-05 13:27:59 -07:00
PeixuanZuo
e2526714e2
[ROCm] Move MIGraphX build step on CPU only machine (#16582)
- Move MIGraphX build step on CPU only machine
- Use ccache on build step
- Not pass host uid into docker build process.
2023-07-05 13:55:28 +08:00
Wei-Sheng Chin
a0a5f57581
[DORT] Use new FX-to-ONNX exporter (#16450)
The ONNX exporter in DORT have been moved to PyTorch as a formal
feature. We therefore switch to consume the exporter from PyTorch
instead of maintaining two duplicates.
2023-07-04 13:13:04 -07:00
PeixuanZuo
d540c7da0f
[ROCm] Add ROCm5.6 to python package pipeline (#16572)
Add ROCm5.6 to python package pipeline.
2023-07-04 18:18:12 +08:00
pengwa
ac100ebb64
Fix orttraining-ortmodule-distributed CI (#16569)
### Fix orttraining-ortmodule-distributed CI

https://pypi.org/project/pydantic/#history released version 2.0 1st
July, Deepspeed has known issue on newer version of it
(https://github.com/microsoft/DeepSpeed/issues/3280). So fix this by add
similar check as DS did in
https://github.com/microsoft/DeepSpeed/pull/3290
2023-07-03 13:18:59 +08:00
Scott McKay
2fd25de360
Use verbose logging in Android emulator in React Native CI (#16528)
### Description
<!-- Describe your changes. -->
Set emulator logging to verbose to see if it helps with intermittent
React Native CI failures when emulator crashes at startup


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-06-30 11:51:20 +10:00
Yi Zhang
fb7e1f133f
[Fix] TSA Upload failed in nuget pipeline. (#16476)
### Description
partially revert PR  #16244.


### Motivation and Context
npm pipeline couldn't triggered if nuget pipeline status is warning.


### Test Run

https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=321873&view=logs&s=b17bed5b-cc14-5026-390a-fb2feea063f2
2023-06-28 06:40:49 +08:00
Rachel Guo
892b1b19ea
[js/rn] limit x86_64 arch in detox xcodebuild for react native e2e test (#16460)
### Description
<!-- Describe your changes. -->

As title.




### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

Works with local onnxruntime-c pod in js/rn/e2e test.
2023-06-27 09:45:04 -07:00
guyang3532
4768ac5f30
Fix onnxruntime-CI-nightly-ort-pipeline Failure (#16495)
The image for the onnxruntime-CI-nightly-ort-pipeline is too old. 
The ort package in the image is older than latest test code in nightly
ci. This causes the nightly ci failed.
2023-06-27 23:19:23 +08:00
Yi Zhang
6e9541046e
extend react native ci timeout limit (#16469)
### Description
<!-- Describe your changes. -->

### Motivation and Context
2 consecutive runs in npm pipeline failed due to time out
2023-06-27 08:44:03 +08:00
Yifan Li
e2c214d81f
[TensorRT EP] TRT 8.6 minor version update (#16475)
### Description
* Minor version update: TRT 8.6.0.12->8.6.1.6
  * CI pipeline ymls/dockerfiles are updated
* cgmanifest.json/deps.txt/download-deps.yml are updated; Win trt
binaries uploaded to [win img
307029](https://aiinfra.visualstudio.com/AI%20Infra%20Management/_build/results?buildId=307029&view=results)
* Re-enable unit tests which were failed in 8.6.0 and re-gained support
in 8.6.1
2023-06-26 10:44:27 -07:00
PeixuanZuo
7e211f0e03
[ROCm] Move mount data step into docker container (#16471)
Some CI jobs may interrupted unexpectedly and didn't execute umount data
step. The data left in host device will cause `device or resource busy`
and make subsequent CI jobs fail.

Move the mount data step into docker container, the host machine will
not be occupied when CI jobs exit incorrectly.
2023-06-26 10:25:06 +08:00
Rachel Guo
04dbdc96bf
[js/rn] Fix React Native CI pipeline E2E test (#16447)
### Description
<!-- Describe your changes. -->

Based on this kindly provided quick fix:
https://github.com/microsoft/onnxruntime/pull/16411

See more description in the above linked pr about bumping AGP version,
etc.

Also fixed import header file path in detox e2e test.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

Good build:

https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1041757&view=logs&j=de302ec2-2305-57e0-e8c6-cd89c569f2a3&t=9894c870-b8ce-548d-51ff-8f44d21a4117&l=18
2023-06-22 14:33:49 -07:00
Yi Zhang
8e8840f1de
Enable Web CI on Linux (#16419)
### Description
1. Enable Web ci on Linux

### Motivation and Context
1. speed up web ci, the duration can be reduced from 160 minutes to 130
minutes, a time saving of 20% could be be achieved.
The total computation time is 455 minutes now. Moved to Linux, it could
be reduced to 336 minutes.
2. It's the first step to enable compilation cache for emscripten
3. per Yulong's request, build_web stages are still using windows pool


![image](https://github.com/microsoft/onnxruntime/assets/16190118/c9496408-74bd-45ea-b4ae-a4dd2a574d17)


https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1038382&view=results
2023-06-22 15:42:58 +08:00
yf711
0ad0d6ebbf
Unblock Linux MultiGPU TensorRT CI (#16446)
### Description
Revert docker base image to
nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04@sha256:b754c43fe9d62e88862d168c4ab9282618a376dbc54871467870366cacfa456e



### Motivation and Context
The default img env of nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04 has
minor upgrade, which make Linux MultiGPU TensorRT CI (NV12 instance with
Maxwell GPU) fail on three CApiTestGlobalThreadPoolsWithProvider
tests (these three tests have higher error which are above the tolerance)

That minor upgrade includes cudnn 8.7.0->8.9.0, which might be a factor
that make maxwell GPU generator higher error. CIs with T4 GPU are not
affected.
2023-06-21 17:15:39 -07:00
Rachel Guo
961fa7274a
[NNAPI doc] add reducemean to supported op list (#16414)
### Description
<!-- Describe your changes. -->

As title.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-06-21 00:29:20 -07:00
Rachel Guo
b4b126ffb0
Set onnxruntime-c local pod path environment variable for react native e2e tests on ci (#16431)
### Description
<!-- Describe your changes. -->
Set onnxruntime-c local pod path environment variable for react native
e2e tests on react-native-ci.yml


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

Previously the E2E test project is not properly consuming a local built
onnxruntime-c version pod.


https://github.com/microsoft/onnxruntime/pull/16411#issuecomment-1598512816
2023-06-21 00:27:36 -07:00
PeixuanZuo
470d6c1cce
[ROCm] Delete unused file to fix Component Governance Alert (#16407)
Delete unused file to fix Component Governance Alert
2023-06-19 11:28:32 -07:00
PeixuanZuo
1418d8728c
[ROCm] Fix CI Pipeline (#16409)
1. add `set -ex` before commands.
2. update ccache.
2023-06-19 15:22:13 +08:00
Yi Zhang
8b9eab093b
keep symlinks in maven package (#16376)
### Description
1. Keep symlink in the package.
2. keep the artifact package format

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-06-19 09:41:39 +08:00
Changming Sun
188d5f5398
Fix Linux Multi GPU build pipeline (#16368)
### Description
The build pipeline runs on Azure NV12 machines that will be deprecated
soon because the SKU is too old. So this PR will move the pipeline to a
Windows machine with two A10 GPUs.
2023-06-15 16:24:46 -07:00
Yi Zhang
3e99e43a1d
extend Final AAR testing timeout limit (#16340)
### Description
<!-- Describe your changes. -->



### Motivation and Context
improve nuget pipeline stability
2023-06-15 17:27:45 +08:00
PeixuanZuo
097346be9d
[ROCm] Add clean step for ROCm CI pipeline (#16336)
1. Add clean step for ROCm CI pipeline
2. Fix error "device or resource busy" bug by setting umount dataset
step as `always()` step.
2023-06-15 13:44:12 +08:00