Commit graph

2015 commits

Author SHA1 Message Date
Xavier Dupré
47a0289ee6
[CI] Removes type2 in process_registration and fix Windows GPU Reduced Ops CI Pipeline (#16530)
### Description
Windows GPU Reduced Ops CI Pipeline is broken due to the introduction of
a second template type in registered kernels. The python code checking
the registration is broken due to that. This PR addresses this issue on
the python side by keeping only one type equal to the concatenation of
the two types.
2023-07-07 18:21:06 +02:00
Edward Chen
6be7b03e53
Enable -Wshorten-64-to-32 warning if available. (#16524)
- Fix some warnings from Xcode build (`-Wshorten-64-to-32`).
- Enable `-Wshorten-64-to-32` warning if available. Currently it's not fully enabled for `onnxruntime_test_all` and `onnxruntime_providers_xnnpack` yet.
- Some clean up in build.py including setting CMake generator more consistently.
2023-07-07 08:11:44 -07:00
Edward Chen
e22b0836e7
[objc] Update docs and fix static analysis build (#16617)
- Update some documentation comments.
- Use onnxruntime_training.h as the umbrella header so training API docs are included in generated docs.
- Fix static analysis build.
2023-07-07 07:58:54 -07:00
Scott McKay
697dd12f6e
Re-organize the transpose optimization and layout transformation files. (#16246)
### Description
<!-- Describe your changes. -->
Split out the more basic changes from #15552 for easier review.

Re-organize to clarify the structure
- Separate out generic base functionality from ORT specific components
  - pass in handlers for internal ORT ops to Optimize
- Split out layout transformation from transpose optimization
- Separate out level 1 transpose optimizer
- Cleanup some naming to try and clarify things like an optimizer vs.
general optimization code

Most of the changes are from this movement of code.

Two implementation changes:
- the extended handlers are queried first in GetHandler
- allows the extended handlers to override the default behaviour for an
ONNX operator
- simplify the Optimize function to remove OptimizerMode. 
- `can_modify_node` is used instead of `mode` and
`ignore_assigned_nodes` and a long description of the current usage is
added. I don't _think_ that changes the current behavior and hopefully
clarifies what happens and when, and makes the base transpose optimizer
implementation more generic.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Create a cleaner separation to support adding EP specific logic next to
cleanly handle where an EP has additional layout sensitive behaviour
required (e.g. it's Resize implementation only handles one layout).
2023-07-07 08:24:47 +10:00
Yi Zhang
fed08e070a
Add compiler cache in linux wasm build (#16579)
### Description
Add compiler cache in wasm build to accelerate web ci

### Motivation and Context
It could reduce the pipeline duration by 30 minutes.
web ci could be completed in 2 hours with cache.

https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1053219&view=results
2023-07-06 06:58:48 +08:00
Vrajang Parikh
fd8ad9b950
Enable iOS packaging for training (#16525)
### Description
Enable support for building iOS packages/CocoaPods with training API

- Add `Training` Package variant and config files in current iOS
packaging utilities to enable creation of training packages

### Motivation and Context
This PR introduces new `Training` variant in
`build_and_assemble_ios_pods.py` script which allows creating pods for
iOS with training API enabled.

The sample script to build training pods:

```
python3 tools/ci_build/github/apple/build_and_assemble_ios_pods.py --variant Training \
--build-settings-file  tools/ci_build/github/apple/default_full_ios_training_framework_build_settings.json \ 
-b=-- path_to_protoc_exe=<path/to/protoc>
``` 

Note: build settings file should have `--enable_training` as a build
parameter.


Simply adding training packaging increases the duration of the Azure
pipeline for packaging by 70 minutes. To address this issue, we need to
parallelize pod creation. In order not to further strain the pipeline,
the changes for training packaging will be added in another PR, which
optimizes the packaging pipeline.

---------

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-07-05 13:27:59 -07:00
PeixuanZuo
e2526714e2
[ROCm] Move MIGraphX build step on CPU only machine (#16582)
- Move MIGraphX build step on CPU only machine
- Use ccache on build step
- Not pass host uid into docker build process.
2023-07-05 13:55:28 +08:00
Wei-Sheng Chin
a0a5f57581
[DORT] Use new FX-to-ONNX exporter (#16450)
The ONNX exporter in DORT have been moved to PyTorch as a formal
feature. We therefore switch to consume the exporter from PyTorch
instead of maintaining two duplicates.
2023-07-04 13:13:04 -07:00
PeixuanZuo
d540c7da0f
[ROCm] Add ROCm5.6 to python package pipeline (#16572)
Add ROCm5.6 to python package pipeline.
2023-07-04 18:18:12 +08:00
pengwa
ac100ebb64
Fix orttraining-ortmodule-distributed CI (#16569)
### Fix orttraining-ortmodule-distributed CI

https://pypi.org/project/pydantic/#history released version 2.0 1st
July, Deepspeed has known issue on newer version of it
(https://github.com/microsoft/DeepSpeed/issues/3280). So fix this by add
similar check as DS did in
https://github.com/microsoft/DeepSpeed/pull/3290
2023-07-03 13:18:59 +08:00
Scott McKay
2fd25de360
Use verbose logging in Android emulator in React Native CI (#16528)
### Description
<!-- Describe your changes. -->
Set emulator logging to verbose to see if it helps with intermittent
React Native CI failures when emulator crashes at startup


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-06-30 11:51:20 +10:00
Yi Zhang
fb7e1f133f
[Fix] TSA Upload failed in nuget pipeline. (#16476)
### Description
partially revert PR  #16244.


### Motivation and Context
npm pipeline couldn't triggered if nuget pipeline status is warning.


### Test Run

https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=321873&view=logs&s=b17bed5b-cc14-5026-390a-fb2feea063f2
2023-06-28 06:40:49 +08:00
Rachel Guo
892b1b19ea
[js/rn] limit x86_64 arch in detox xcodebuild for react native e2e test (#16460)
### Description
<!-- Describe your changes. -->

As title.




### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

Works with local onnxruntime-c pod in js/rn/e2e test.
2023-06-27 09:45:04 -07:00
guyang3532
4768ac5f30
Fix onnxruntime-CI-nightly-ort-pipeline Failure (#16495)
The image for the onnxruntime-CI-nightly-ort-pipeline is too old. 
The ort package in the image is older than latest test code in nightly
ci. This causes the nightly ci failed.
2023-06-27 23:19:23 +08:00
Yi Zhang
6e9541046e
extend react native ci timeout limit (#16469)
### Description
<!-- Describe your changes. -->

### Motivation and Context
2 consecutive runs in npm pipeline failed due to time out
2023-06-27 08:44:03 +08:00
Yifan Li
e2c214d81f
[TensorRT EP] TRT 8.6 minor version update (#16475)
### Description
* Minor version update: TRT 8.6.0.12->8.6.1.6
  * CI pipeline ymls/dockerfiles are updated
* cgmanifest.json/deps.txt/download-deps.yml are updated; Win trt
binaries uploaded to [win img
307029](https://aiinfra.visualstudio.com/AI%20Infra%20Management/_build/results?buildId=307029&view=results)
* Re-enable unit tests which were failed in 8.6.0 and re-gained support
in 8.6.1
2023-06-26 10:44:27 -07:00
PeixuanZuo
7e211f0e03
[ROCm] Move mount data step into docker container (#16471)
Some CI jobs may interrupted unexpectedly and didn't execute umount data
step. The data left in host device will cause `device or resource busy`
and make subsequent CI jobs fail.

Move the mount data step into docker container, the host machine will
not be occupied when CI jobs exit incorrectly.
2023-06-26 10:25:06 +08:00
Rachel Guo
04dbdc96bf
[js/rn] Fix React Native CI pipeline E2E test (#16447)
### Description
<!-- Describe your changes. -->

Based on this kindly provided quick fix:
https://github.com/microsoft/onnxruntime/pull/16411

See more description in the above linked pr about bumping AGP version,
etc.

Also fixed import header file path in detox e2e test.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

Good build:

https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1041757&view=logs&j=de302ec2-2305-57e0-e8c6-cd89c569f2a3&t=9894c870-b8ce-548d-51ff-8f44d21a4117&l=18
2023-06-22 14:33:49 -07:00
Yi Zhang
8e8840f1de
Enable Web CI on Linux (#16419)
### Description
1. Enable Web ci on Linux

### Motivation and Context
1. speed up web ci, the duration can be reduced from 160 minutes to 130
minutes, a time saving of 20% could be be achieved.
The total computation time is 455 minutes now. Moved to Linux, it could
be reduced to 336 minutes.
2. It's the first step to enable compilation cache for emscripten
3. per Yulong's request, build_web stages are still using windows pool


![image](https://github.com/microsoft/onnxruntime/assets/16190118/c9496408-74bd-45ea-b4ae-a4dd2a574d17)


https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1038382&view=results
2023-06-22 15:42:58 +08:00
Baiju Meswani
42489a8a24
Add ability to create ort format models from training offline utility (#16360) 2023-06-21 18:51:43 -07:00
yf711
0ad0d6ebbf
Unblock Linux MultiGPU TensorRT CI (#16446)
### Description
Revert docker base image to
nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04@sha256:b754c43fe9d62e88862d168c4ab9282618a376dbc54871467870366cacfa456e



### Motivation and Context
The default img env of nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04 has
minor upgrade, which make Linux MultiGPU TensorRT CI (NV12 instance with
Maxwell GPU) fail on three CApiTestGlobalThreadPoolsWithProvider
tests (these three tests have higher error which are above the tolerance)

That minor upgrade includes cudnn 8.7.0->8.9.0, which might be a factor
that make maxwell GPU generator higher error. CIs with T4 GPU are not
affected.
2023-06-21 17:15:39 -07:00
Rachel Guo
961fa7274a
[NNAPI doc] add reducemean to supported op list (#16414)
### Description
<!-- Describe your changes. -->

As title.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-06-21 00:29:20 -07:00
Rachel Guo
b4b126ffb0
Set onnxruntime-c local pod path environment variable for react native e2e tests on ci (#16431)
### Description
<!-- Describe your changes. -->
Set onnxruntime-c local pod path environment variable for react native
e2e tests on react-native-ci.yml


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

Previously the E2E test project is not properly consuming a local built
onnxruntime-c version pod.


https://github.com/microsoft/onnxruntime/pull/16411#issuecomment-1598512816
2023-06-21 00:27:36 -07:00
PeixuanZuo
470d6c1cce
[ROCm] Delete unused file to fix Component Governance Alert (#16407)
Delete unused file to fix Component Governance Alert
2023-06-19 11:28:32 -07:00
PeixuanZuo
1418d8728c
[ROCm] Fix CI Pipeline (#16409)
1. add `set -ex` before commands.
2. update ccache.
2023-06-19 15:22:13 +08:00
Yi Zhang
8b9eab093b
keep symlinks in maven package (#16376)
### Description
1. Keep symlink in the package.
2. keep the artifact package format

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-06-19 09:41:39 +08:00
saurabh
a6ce7b339f
Enable model subgraph execution in OVEP and setting the OpenVINO dll's to the path from the OpenVINO pypi packge in OVEP and fix OVEP windows io buffer sample (#16147)
### Description
This PR enables execution of subgraphs in OVEP and currently, when OVEP
developers install the onnxruntime-openvino package on windows from
pypi, they would have to additionally download OpenVINO windows binaries
and run the setupvars.bat script which sets the environment PATH to
locate the OV dll's. Also this PR fixes issues of OVEP windows io buffer
sample.



### Motivation and Context
Fix: We want to make the user experience easy for OVEP Python developers
on windows platform.
This fix, introduces a function add_openvino_libs_to_path at the
location tools/python/util/add_openvino_win_libs.py.
The above function, can be called by OVEP python users in the
application code and that takes care of setting
the OpenVINO dll's to the path from the OpenVINO pypi packge (openvino)
which was installed.
This change also makes sure that add_openvino_libs_to_path() function is
added to onnxruntime python package
only when it is build for OpenVINO Execution Provider for ONNXRuntime
and not for default ORT python package builds.

New user experience for Python OVEP developers on windows platform:
step 1: pip install onnxruntime-openvino
step 2: pip install openvino
step 3: <Add these 2 lines in the application code>
import onnxruntime.tools.add_openvino_win_libs as utils
utils.add_openvino_libs_to_path()

---------

Signed-off-by: MaajidKhan <n.maajid.khan@intel.com>
Co-authored-by: MaajidKhan <n.maajid.khan@intel.com>
Co-authored-by: Suryaprakash Shanmugam <suryaprakash.shanmugam@intel.com>
2023-06-16 19:47:09 -07:00
dependabot[bot]
dd660c054e
Bump transformers from 4.24.0 to 4.30.0 in /tools/ci_build (#16331) 2023-06-16 13:08:46 -07:00
Changming Sun
188d5f5398
Fix Linux Multi GPU build pipeline (#16368)
### Description
The build pipeline runs on Azure NV12 machines that will be deprecated
soon because the SKU is too old. So this PR will move the pipeline to a
Windows machine with two A10 GPUs.
2023-06-15 16:24:46 -07:00
Yi Zhang
3e99e43a1d
extend Final AAR testing timeout limit (#16340)
### Description
<!-- Describe your changes. -->



### Motivation and Context
improve nuget pipeline stability
2023-06-15 17:27:45 +08:00
PeixuanZuo
097346be9d
[ROCm] Add clean step for ROCm CI pipeline (#16336)
1. Add clean step for ROCm CI pipeline
2. Fix error "device or resource busy" bug by setting umount dataset
step as `always()` step.
2023-06-15 13:44:12 +08:00
Baiju Meswani
5eec24837f
Fix for AMD GPU pipeline (#16357) 2023-06-14 20:36:16 -07:00
Changming Sun
dbc7a195b1
Update win-ci-pipeline.yml: enable xnnpack tests (#16244)
1. Enable xnnpack test
2. Change TSA database name from onnxruntime_master to onnxruntime_main.
This is a leftover of renaming the "master" branch to "main"
3. Add two static analysis jobs for WinML and DML
4. Rename the machine pool "aiinfra-dml-winbuild" to
"onnxruntime-Win2019-GPU-dml-A10", so that the internal and public ADO
instances use the same machine pool name.
5. Move Windows GPU CI build pipeline from "onnxruntime-Win2022-GPU-T4"
to "onnxruntime-Win2022-GPU-A10" machine pool, because we do not have
enough T4 GPUs.
2023-06-14 19:12:42 -07:00
Baiju Meswani
8a3de16d14
Temporary fix to make the training pipeline green (#16353) 2023-06-14 13:11:35 -07:00
Edward Chen
4f23577cb5
[React Native] Publish E2E test logs on build failure too. (#16327)
### Description
<!-- Describe your changes. -->

Publish E2E test logs on build failure too.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

Get more information about intermittent test failures.
2023-06-12 17:56:46 -07:00
JiCheng
eed02a3f78
Xnnpack QDQ test (#16281)
### Description
A few QDQ tests failed on XNNPACK EP.

The reason should be the range of input_data doesn't fit for scale and
zero_point.


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-06-12 14:00:42 +08:00
Yulong Wang
f274bbb0c8
[js] add API that allows to get package version (#16207)
### Description

Add an API for users to get version of current package. example usage:

```js
import { env } from 'onnxruntime-node';

console.log(env.versions.node);  // output "1.16.0"
```

```js
import { env } from 'onnxruntime-web';

console.log(env.versions.web);  // output "1.16.0"
console.log(env.versions.common);  // output "1.16.0"
console.log(env.versions.node);  // output "undefined"
```

#16156
2023-06-09 16:18:53 -07:00
Yi Zhang
3b5a8352c1
CodeSign Mac packages in nuget pipeline (#16291)
### Description
1. Updated Mac package workflow for easily debugging.
2. Changed Archive type from tgz to zip since zip is supported by ESRP.
3. .../dylib.dSYM/Contents/Resources/DWARF/libonnxruntime.1.16.0.dylib
is a debug symbol file, so it couldn't be signed.

### Motivation and Context
It‘s required from VS code.
Mac binaries in nuget should be signed
2023-06-10 06:35:47 +08:00
Edward Chen
b668a6da96
Treat Objective-C static analysis warnings as errors (#16293)
- Update Objective-C static analysis check to fail on warnings.
- Address warning.
- Clean up build definition.
2023-06-09 08:51:49 -07:00
Vrajang Parikh
67f4a4fd16
Objective-C binding for ORT training (#16127)
### Description
Implement Objective-C binding for `ORTCheckPoint`. Additionally, 
- Modify `onnxruntime_objectivec.cmake` to only include training header
and sources when training flag is enabled
- Enable objective-c binding for `orttraining-mac-ci-pipeline`

### Motivation and Context
This PR is part of implementing Objective-C bindings for training API.
It implements objective-c binding for ORTCheckPoint class. The
objective-C API closely resembles the C++ API.

**Note**: The test for saving checkpoint is skipped as it requires use
of training session. It will be added when the objective-c binding for
`ORTTrainingSession` is added.
2023-06-07 14:01:30 -07:00
Edward Chen
1261d0b8ba
Fix some build issues on MacOS with Xcode 14.3. (#15878)
- Fix flatbuffers flatc warning, unused-but-set-variable.
- Address `-Wshorten-64-to-32` warnings (fix in our code, allow in dependencies' code).
- Update CI builds to use Xcode 14.3.
- Update minimum iOS version to 12.0.
- Update Mac hosted agents to MacOS 13 where possible.
2023-06-07 12:07:11 -07:00
PeixuanZuo
a95f8ae53c
[ROCm] Update ROCm/MIGraphX CI pipeline (#16215)
MIGraphX CI

- Change docker container user name to `onnxruntimedev`

ROCm CI

- Build docker image every job instead of using prebuild image.
- Every job create a container with only one GPU with command `docker
run -it --device=/dev/kfd --device=/dev/dri/renderDxxx`
- Remove tests that are unstable or use outdated interfaces.
- Enable training ortmodule test.
2023-06-05 10:28:10 +08:00
Changming Sun
6b5b79872b
Avoid taking dependency on dl.fedoraproject.org (#16202)
### Description
1. Avoid taking dependency on dl.fedoraproject.org
The website is not very stable. Our build pipelines often fail to fetch
packages from there.

2. Update manylinux to the latest version
2023-06-02 07:41:46 -07:00
Changming Sun
5bfa1183d1
Add a Memory Profiling build job in post merge pipeline (#16172)
### Description
1. Add a Memory Profiling build job
2. Remove no absl build job since the feature will be removed
3. Simplify post-merge-jobs.yml by unifying the pool names

### Motivation and Context
To catch build errors in #16124
2023-06-01 13:00:44 -07:00
Yi Zhang
e0199cfbd9
extend mac packaging timeout limit (#16173)
### Description

### Motivation and Context
MacOS_py_wheels are often failed due to timeout
2023-05-31 18:31:28 +08:00
Baiju Meswani
7edc4b105d
Copy missing training header files to the package archive (#16119) 2023-05-30 16:45:40 -07:00
Sunghoon
bf05d4ec26
Fix nightly ort CI pipeline (#16162)
This PR changes [night ort CI
pipeline](https://dev.azure.com/onnxruntime/onnxruntime/_build?definitionId=198)
to pick up the latest night ACPT image, which was changed from torch
2.0.0.dev to torch 2.1.0.dev.
2023-05-30 14:00:34 -07:00
Xavier Dupré
e726151b5c
Introduce float 8 types (#14731)
### Description
The PR implements FloatE4M3FN, FloatE5M2, FloatE4MEFNUZ, FloatE5M2FNUZ
as described in PR https://github.com/onnx/onnx/pull/4805. It uses CUDA
API to cast float/half to float8 if CUDA>=11.8, a custom implementation
if CUDA<11.8.

* It implements, Cast, QuantizeLinear, DequantizeLinear for all types on
CPU, only for types FloatE4M3FN, FloatE5M2 on CUDA.
* It extends the supported types for control flow operator, Shape,
Reshape, Identity, If, Loop, Scan, Reshape
* It implements Equal(19).
* Cast, QuantizeLinear, DequantizeLinear operators now support a
parameter `saturate` only valid for float 8 types. It is true by
default. In that case, any value out of range is converted into the
maximum float 8 value. If false, it is infinite.
* QuantizeLinear, DequantizeLinear now supports multiple scales on CUDA
(and ROCm by extension), scale = 1D tensor with one scale per channel

### Motivation and Context
Supports latest onnx version.

Fixes
[AB#15395](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/15395)

---------

Co-authored-by: Xavier Dupre <xadupre@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
Co-authored-by: Randy Shuai <rashuai@microsoft.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: Scott McKay <Scott.McKay@microsoft.com>
2023-05-30 13:25:58 -07:00
Yi Zhang
31fc25d2c2
[Fix] Check if CUDA is downloaded in AGENT_TEMPDIRECTORY (#16142)
### Description
supplement of #15915

### Motivation and Context
fix nuget pipeline exception in the stage of
Final_Jar_Testing_Windows_GPU

```
  JUnit Jupiter:ProviderOptionsTest:testCUDAOptions()
    MethodSource [className = 'ai.onnxruntime.providers.ProviderOptionsTest', methodName = 'testCUDAOptions', methodParameterTypes = '']
    => ai.onnxruntime.OrtException: Error code - ORT_RUNTIME_EXCEPTION - message: D:\a\_work\1\s\onnxruntime\core\session\provider_bridge_ort.cc:1131 onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : LoadLibrary failed with error 126 "" when trying to load "C:\Users\cloudtest\AppData\Local\Temp\onnxruntime-java17193857285260738736\onnxruntime_providers_cuda.dll"
```


### Verification

https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=313476&view=results
2023-05-30 13:14:08 +08:00
Yi Zhang
73584f9360
More fixes on nuget pipeline (#16091)
### Description
1. parameters couldn't using string to comprare, change it to boolean.
2. Windows_CI_GPU_DML_DEV_arm64 on the pool onnxruntime-Win-CPU-2022
failed to pass prefast step, change the pool to aiinfra-dml-winbuild.
3. skipped test_zfnet512, it's failed in Nuget_Test_Win_Training_CPU

Todo
Only Final_Jar_Testing_Windows_GPU failed now.

https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=313042&view=logs&s=d66543d5-16de-5a48-6ecb-a36e21ff8d4d&j=d9489789-5e39-5a05-13ab-9aaf7b4d386f
2023-05-27 08:59:12 +08:00