Commit graph

1501 commits

Author SHA1 Message Date
Yi Zhang
64b63921a2
Retry the step of Start Android simulator (#15584)
### Description
Add Retry once There's a failure in `Start Android Simulator`. 

### Motivation and Context
`Start Android Simulator` isn't stable enough and the pipeline would
hang.

We could find many instances in
https://dev.azure.com/onnxruntime/onnxruntime/_pipeline/analytics/stageawareoutcome?definitionId=188&contextType=build
2023-04-20 12:06:35 +08:00
Yi Zhang
5b6f79e79b
Improve windows build cache steps (#15537)
### Description
1. Split deps' compilation cache and ort's
2. reduce the caches generation in merge branch.

### Motivation and Context
Reduce pipeline cache stage.
2023-04-20 09:42:22 +08:00
Yi Zhang
573e4cf95f
[Fix] Python Packaging Pipeline exception. (#15568)
### Description
supplement of #15299

### Motivation and Context
It broke Python Packaging Pipeline since April 12.
2023-04-19 21:57:14 +08:00
liqun Fu
919d8f2660
update with onnx main (#14929) 2023-04-18 08:42:51 -07:00
Justin Chu
a36caba073
Bump ruff in CI (#15533)
### Description

Bump ruff version in CI and fixed new lint errors. 

- This change enables the flake8-implicit-str-concat rules which helps
detect unintended string concatenations:
https://beta.ruff.rs/docs/rules/#flake8-implicit-str-concat-isc
- Update gitignore to include common python files that we want to
exclude.


### Motivation and Context

Code quality
2023-04-17 10:11:44 -07:00
Yi Zhang
4e1f75810c
Add compilation cache in 2 Linux CPU pipelines and refactor the Linux build step with cache (#15484)
### Description
1. Add compilation cache in Linux CPU ARM and Linux Minimal Build.
2. Integrate 4 Linux CPU build step with cache into one.
3. install ccache from source code in Linux ARM64 image.

### Motivation and Context
1. Enable more build steps with compilation cache.
2. Make it easier to add cache.

It could save 40 more minutes of compilation time in Linux ARM64.

https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=959619&view=logs&j=1e0830bb-fd74-5d0a-5029-1c63b4266d7b&t=75260ed7-7566-5947-2095-566660191920
2023-04-14 23:56:59 +08:00
Changming Sun
5bed8d0285
Disable XNNPack EP's tests in Windows CI pipeline (#15406)
### Description

1. Disable XNNPack EP's tests in Windows CI pipeline
The EP code has a known problem(memory alignment), but the problem does
not impact the usages that we ship the code to. Now we only use XNNPack
EP in mobile apps and web usages. We have already pipelines to cover
these usages. We need to prioritize fixing the bugs found in these
pipelines, and there no resource to put on this Windows one. We can
re-enable the tests once we reached an agreement on how to fix the
memory alignment bug.

2.  Delete anybuild.yml which was for an already deleted pipeline.
3. Move Windows CPU pipelines to AMD CPU machine pools which are
cheaper.
4. Disable some qdq/int8 model tests that will fail if the CPU doesn't
have Intel AVX512 8-bit instructions.
2023-04-13 12:19:32 -07:00
Yulong Wang
e1e8852213
[build/npm] dump ORT_COMMON_FROM from validation (#15475)
### Description
dump ORT_COMMON_FROM from validation

This writes environment variable ORT_COMMON_FROM for later steps in the
release pipeline to use.
2023-04-12 13:48:19 -07:00
yf711
8cd5f3ad9c
[TensorRT EP] support TensorRT 8.6-EA (#15299)
### Description

<!-- Describe your changes. -->

* Integrate TRT 8.6EA on relevant Linux/Windows/pkg pipelines
  * Update onnx-tensorrt to 8.6
  * Add new dockerfiles for TRT 8.6 and clean old ones
* Update
[CGManifest](https://github.com/microsoft/onnxruntime/tree/main/cgmanifests)
files and ort build deps version
  * yml/script update
* Enable built-in TRT parser option on TRT related pipelines by default
* Exclude test TopKOperator.Top3ExplicitAxisInfinity out of TRT EP tests
(8.6-EA has issue with topk operator)
2023-04-12 11:34:59 -07:00
Numfor Tiapo
e3086b2ed8
Move DML CI Pipeline to A10 (#15468)
This change moves the DML CI pipeline to the A10 machines and fixes or
disables tests that were failing from this change.

- Max error rate threshold was increased for Image Tests
- Some failing batch tests were disabled

---------

Co-authored-by: Changming Sun <chasun@microsoft.com>
2023-04-12 10:19:40 -07:00
PeixuanZuo
0016554090
[ROCm] disable composable_kernel and kernel explorer for MIGraphX CI (#15479)
Disable composable_kernel and kernel explorer for MIGraphx CI to save
build time.
Composable_kernel and kernel explorer are tested on ROCm CI.
2023-04-12 22:26:40 +08:00
Rachel Guo
9c42d5e31f
[CoreML EP]Add broadcasting support for binary ops (#15187)
### Description
<!-- Describe your changes. -->

As title

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

https://github.com/microsoft/onnxruntime/issues/15110

---------

Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>
Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
2023-04-11 13:50:45 -07:00
Yulong Wang
0fbf715824
[build] add script to validate generated NPM packages (#15453)
### Description
add script to validate generated NPM packages and publish it to
artifacts, so that release pipeline can use it.

once this PR is merged, I will update the NPM package release pipeline.
2023-04-11 11:04:55 -07:00
Dmitri Smirnov
ce3b4eabd3
Implement Optional Metadata support and C# test support (#15314)
### Description
Implement Optional Type metadata support in the library.
Implement optional support in C# API along with metadata.
Implement Sequence, Map, Optional test data support
and test execution.

Prune tests and provide more details for failing tests in C# code.

Note, this PR does not enable running onnx test models in C++.

### Motivation and Context
Opset18 optional type support.
2023-04-11 09:41:59 -07:00
Yi Zhang
311f84d00c
Fix one nuget packaging pipline error (#15458)
### Description
Fix one typo in #14965 


### Motivation and Context
Fix the error `"onnxruntime_providers_shared.dll not found for win-x64"`
2023-04-11 18:00:10 +08:00
zhijiang
29c74d3c43
softmax perf improvement pr1 - add more softmax related test (#15176)
1. add fp16 test
2. add test for shape is not power of two.
2023-04-11 17:02:40 +08:00
Yi Zhang
feafbc4263
Refactor all Mac build steps (#15440)
### Description


### Motivation and Context
Make the compilation cache steps easy to use and maintain
Reduce cache storage.
2023-04-11 12:12:46 +08:00
Changming Sun
c8524d2dab
Refactor web-ci pipeline and delete eager mode CI pipeline (#15416)
### Description
1. Move it to a separated pool that use the same image as [the public
hosted
pool](https://learn.microsoft.com/en-us/azure/devops/pipelines/agents/hosted?view=azure-devops&tabs=yaml).
Also, create a beta pool which contains the next version image of the
hosted pool, and add jobs in our post merge pipeline to test if the next
version image will break our CI. So, usually we will have at least one
week to prepare.

2. Change the cmake generator in use in our pipelines from "Ninja" to
"MingW Makefile", because the latest version of cmake doesn't work with
the latest version of Ninja. People who prefer Ninja could still use
ninja in their local build by passing "--cmake_generator ninja" to
[build.py](https://github.com/microsoft/onnxruntime/blob/main/tools/ci_build/build.py).

3. Delete eager mode CI pipeline. 


### Motivation and Context
I need to update the software we have in our CI build machines, and I
need to resolve this incompatibility issue. In more detail, the build
error I hit was:

em++: error:
CMakeFilesonnxruntime_mlas_test.dirC_a_work1sonnxruntimetestmlasunittesttest_activation.cpp.o:
No such file or directory
("CMakeFilesonnxruntime_mlas_test.dirC_a_work1sonnxruntimetestmlasunittesttest_activation.cpp.o"
was expected to be an input file, based on the commandline arguments
provided)

After this PR we will deprecate python 3.7 support. The eager mode CI
pipeline is the last one that still use python 3.7. Then we can rework
the PR #10953 made by [fs-eire](https://github.com/fs-eire) last year.

Fixed
[AB#14435](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/14435)
2023-04-10 10:41:04 -07:00
Yi Zhang
0ea965c541
clear cache stat. after building (#15439)
### Description
Add  `ccache -z` after every building.


### Motivation and Context
Uploaded Cache stat shouldn't include cache stat.
2023-04-10 13:56:55 +08:00
Hariharan Seshadri
f77c8f4863
Fix Npm packaging pipeline (#15425)
### Description
It seems like https://github.com/microsoft/onnxruntime/pull/15329
re-worked some jobs in `react-native-ci.yml` into stages. When this
template is used from within `npm-packaging-pipeline.yml`, there is
problem in that there is a stage that contains multiple stages as jobs.
Per my understanding, this is not acceptable to Azure DevOps. So,
re-working some portion of `npm-packaging-pipeline.yml` to accomadate
changes in https://github.com/microsoft/onnxruntime/pull/15329

### Motivation and Context
Fix NPM packaging pipeline
Validating test run with fix:
https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=297391&view=results
2023-04-07 22:13:39 -07:00
Edward Chen
666aff56a4
Add workflow to update Objective-C docs. (#15413)
Add workflow to update Objective-C API docs. Remove the Objective-C API doc generation step from the packaging pipeline.

There are similar workflows for automatically updating other language API docs. This change enables this for Objective-C too.
2023-04-07 15:00:15 -07:00
Edward Chen
8db86f2c52
Use fixed version of Android NDK in binary size checks pipeline. (#15422)
Ensure that we build with a known version of NDK and are not surprised when the default version on the build machine changes.

A similar change was made for other Android build pipelines previously, but this one was missed.
2023-04-07 14:53:54 -07:00
Edward Chen
139f3df4d2
Update binary size checks pipeline to use stages for separate checks. (#15408)
Allow running of any single check instead of all of them.
2023-04-07 09:55:40 -07:00
Changming Sun
df11c85955
Download protoc.exe from nuget when cross-compiling (#15395)
### Description
1. The protoc package on nuget.org contains binaries for
Windows_x86/Windows_x64/Linux_x86/Linux_x64/MacOS_x64, which can cover
most use cases. Though it doesn't have binaries for AMR64, they are only
needed when we cross-compile for Intel CPUs on ARM CPUs. It is rare.
When you have such a need, you always can build protoc from source by
yourself and pass it to build.py as "--path_to_protoc_exe". Or if you
have security concerns that you don't want to use prebuilt binaries from
outside, you can do the same thing.

2. Remove GoogleTestAdapter related thing. That part of code is out of
maintain.

### Motivation and Context
As a follow-up of PR #15190.
2023-04-06 17:06:59 -07:00
Dmitri Smirnov
dc1845a9c8
Update mimalloc dependancy to the latest release (2.1.1) for Windows build. (#15382)
### Description
Update mimalloc dependency.

### Motivation and Context
The latest release contains important fixes including memory leaks and
used by customers.
2023-04-06 13:07:00 -07:00
Sheil Kumar
0fbbb6a43e
WindowsAI build failing due to deprecated .NET5 SDK missing in build image (#15383)
WindowsAI build failing due to deprecated .NET5 SDK missing in build
image

.NET5 was deprecated last year, and recently the build machine images
have been updated to not include this SDK.
Unblock failing builds by force insalling .NET5 SDK as part of the build
pipeline.
2023-04-06 08:51:07 -07:00
Jian Chen
2e52de265a
Upgrade remainding python to 3.11 removing 3.7 (#15321)
### Description
Upgrade remainding python to 3.11 removing 3.7


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-04-05 21:43:51 -07:00
Yi Zhang
962d8d2b19
Add compilation cache in react native CI (#15329)
### Description
1. Replacing jobs with stages for better debugging and maintainance
2. Added compilation cache to accelerate the workflow.
3. Splited building protobuf and major code as 2 tasks



### Motivation and Context
Reduced compilation time about one hour.
test run:

https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=943695&view=logs&j=de302ec2-2305-57e0-e8c6-cd89c569f2a3&t=8b360243-7783-51da-8079-2304089d3d1d
2023-04-06 10:39:14 +08:00
Jian Chen
af28754e6f
Update python package pipeline to support 3.11 (#15311)
### Description
Update python package pipeline to support 3.11

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-04-04 10:55:32 -07:00
Yi Zhang
b54ca9a041
Read the cache in main build if it's a (Intermediate)merge branch. (#15330)
### Description
In merge branch,  the run only reads the cache generated in main build.
As a result, each run in merge branch will not upload new cache except
at the first time.

### Motivation and Context
1.Reduce the cache storage.
If there's some big changes, devs should trigger the specific builds
manually in https://dev.azure.com/onnxruntime/onnxruntime/_build. It
still reads own branch cache.
2023-04-04 20:21:05 +08:00
RandySheriffH
e4aae94f20
Remove azure build to unblock PRs (#15336)
Temporarily remove Azure build check to unblock PR(s).
We need to investigate the sudden build failure and reenable.

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-04-03 12:47:14 -07:00
PeixuanZuo
d80859f63d
[ROCm] fix python packaging pipeline and add python10 (#15282)
rocm python packaging pipeline failed because manylinux version and
manylinux.patch update.
1. fix duplicate `epel-release` installation issue, ROCm pipeline
install it at the begin of the dockerfile to install rocm libs. remove
duplicate installation on install-runtime-packages.sh.
```
/var/tmp/yum-root-sMRl36/epel-release-latest-7.noarch.rpm: does not update installed package.
Error: Nothing to do
```
2. add python10 to fix error below.
```
+ /opt/python/cp310-cp310/bin/python -m venv /opt/_internal/tools
build_scripts/finalize.sh: line 40: /opt/python/cp310-cp310/bin/python: No such file or directory
```
3. add python10 to rocm pipeline.

pipeline link:
https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=294776&view=results
2023-03-31 10:25:21 +08:00
Yi Zhang
c5f5e3ec5e
Improve 2 cache tasks in one pipeline yaml (#15267)
### Description
1. Make 2 cache tasks in one pipeline really works
2. Each building step has its own environment variable CCACHE_DIR
instead of job variables.
3. Extenal Protobuf compilation cache only updates with deps.txt. It
doesn't generate new cache in every commit.


### Motivation and Context
The simple workflow is as below
```
--------build with ccache-------             
         |                       
        cache                    
         |                       
      {CCACHE_DIR}-----cache stat.
```

```
-------Cache@2------
           |
    download cache           
           |                         
          {path}--------upload cache
```

1. {XXX} means environment variable or task input.
2. {CCACHE_DIR} must be consistent with {path}. Ccache produces caches
in {CCACHE_DIR} and Cache@2 download cache into {path} and tar {path}
and upload it.
3. Protobuf changes with deps.txt so that it would reduce the storage
size.
4. Next step, we may split the compilation into 2 steps, one for
external dependencies and another for ORT.
2023-03-30 23:22:11 +08:00
Yi Zhang
aab3c15585
Add Compliation Cache in CoreML pipeline (#15259)
### Description
1. move the cache task definition into template
2. In debug mode, the compiler mtime is different in different machine.
So, change the CCACHE_COMPILERCHECK to content.


### Motivation and Context
1. Accelerate the CoreML pipeline.
Test run:
https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=938040&view=logs&j=1ac7588f-a5bd-5ff7-4a8a-a34869d50220
With Cache, the run can be finished in 12 minutes. Without cache, it
takes about 1 hour.
3. Make the cache function easy to use and maintain.

---------

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-03-30 23:18:52 +08:00
Yulong Wang
2928fda490
[web] disable browser test temporarily (#15280)
### Description
This PR disables browser test temporarily. The test randomly fails and
we are investigating the issue. Disable the test to unblock others.
2023-03-30 08:15:36 -07:00
Changming Sun
15f7dca9fb
Update protobuf to 3.21.x (#15245)
### Description

Fixed
[AB#10092](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/10092),
[AB#11753](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/11753),
[AB#11759](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/11759)

### Motivation and Context
The one we use has a security issue in Java, though we don't use that
version's protobuf java package.
2023-03-29 14:08:18 -07:00
Changming Sun
4a0b86eba6
Update the post-merge pipeline (#14965)
### Description
1.  Remove Linux jobs for ORT-Extension combined build
2.  Add a macOS build job for ORT-Extension combined build
3. Adjust the yaml file so that it can support two different ADO
instances.


### Motivation and Context
To test our code better. And it will enable us to run such tests for
every commit in the main branch. It would be easier for us to figure out
which change caused a build break.

See
[AB#13435](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/13435)
2023-03-29 13:12:07 -07:00
Changming Sun
fb1f03fdff
Increase the timeout value of win-wasm-ci.yml (#15257) 2023-03-29 13:11:51 -07:00
Jian Chen
85948d6bc6
Cjian/windows update python3.11 (#15243)
### Description
windows update python3.11



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

---------

Co-authored-by: Ubuntu <chasun@chasunlinux.lw3b1xzoyrkuzm34swpscft0ff.dx.internal.cloudapp.net>
2023-03-28 22:15:47 -07:00
PeixuanZuo
62b2947ac1
[ROCm] remove python3.7 from python packaging pipeline (#15230)
remove python3.7 from python packaging pipeline.

https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=289720&view=results
2023-03-28 10:37:04 +08:00
Changming Sun
462c6043b5
Remove Win8 support (#15219)
### Description
Remove Win8 support since it is EOL.

See
https://learn.microsoft.com/en-us/lifecycle/announcements/windows-8-1-end-support-january-2023

### Motivation and Context
Simplify code.
2023-03-27 18:51:49 -07:00
Jian Chen
792d411135
Update python 3.11 and remove 3.7 for Linux (#15214)
### Description
Update python 3.11 and remove 3.7



### Motivation and Context
Update python 3.11 and remove 3.7

---------

Co-authored-by: Ubuntu <chasun@chasunlinux.lw3b1xzoyrkuzm34swpscft0ff.dx.internal.cloudapp.net>
2023-03-27 14:46:30 -07:00
Changming Sun
63cc1bb26a
Move Linux CPU pipelines to an AMD CPU pool which is cheaper (#15144)
### Description
1. Move Linux CPU pipelines to an AMD CPU pool which is cheaper
2. Enable CCache for orttraining pipeline

### Motivation and Context
Azure AMD CPU machines are generally much cheaper than Intel CPU
machines. However, they don't have local disks.
2023-03-27 14:10:08 -07:00
Changming Sun
ffcfb1ec98
Remove protobuf submodule (#15190)
### Description
Remove protobuf submodule as a follow-up of #13523

"Android CI Pipeline" and "Zip-Nuget-Java-Nodejs Packaging Pipeline"
need to be tested.


### Motivation and Context
It is related to
[AB#11753](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/11753)

Fixed
[AB#14027](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/14027)
2023-03-27 10:35:49 -07:00
Yi Zhang
d182d34f1d
pause caching docker image in pipeline cache in Linux Aten Pipeline (#15227)
### Description
Pause caching the docker images in pipeline cache in Linux Aten
Pipeline.

### Motivation and Context
We need to work out a better way to reduce the storage.
2023-03-27 11:06:53 +08:00
Jian Chen
750747d8c9
Cjian/multi stage packaging pipeline (#14993) 2023-03-24 23:39:15 -07:00
Justin Chu
d834ec895a
Adopt linrtunner as the linting tool - take 2 (#15085)
### Description

`lintrunner` is a linter runner successfully used by pytorch, onnx and
onnx-script. It provides a uniform experience running linters locally
and in CI. It supports all major dev systems: Windows, Linux and MacOs.
The checks are enforced by the `Python format` workflow.

This PR adopts `lintrunner` to onnxruntime and fixed ~2000 flake8 errors
in Python code. `lintrunner` now runs all required python lints
including `ruff`(replacing `flake8`), `black` and `isort`. Future lints
like `clang-format` can be added.

Most errors are auto-fixed by `ruff` and the fixes should be considered
robust.

Lints that are more complicated to fix are applied `# noqa` for now and
should be fixed in follow up PRs.

### Notable changes

1. This PR **removed some suboptimal patterns**:

	- `not xxx in` -> `xxx not in` membership checks
	- bare excepts (`except:` -> `except Exception`)
	- unused imports
	
	The follow up PR will remove:
	
	- `import *`
	- mutable values as default in function definitions (`def func(a=[])`)
	- more unused imports
	- unused local variables

2. Use `ruff` to replace `flake8`. `ruff` is much (40x) faster than
flake8 and is more robust. We are using it successfully in onnx and
onnx-script. It also supports auto-fixing many flake8 errors.

3. Removed the legacy flake8 ci flow and updated docs.

4. The added workflow supports SARIF code scanning reports on github,
example snapshot:
	

![image](https://user-images.githubusercontent.com/11205048/212598953-d60ce8a9-f242-4fa8-8674-8696b704604a.png)

5. Removed `onnxruntime-python-checks-ci-pipeline` as redundant

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

Unified linting experience in CI and local.

Replacing https://github.com/microsoft/onnxruntime/pull/14306

---------

Signed-off-by: Justin Chu <justinchu@microsoft.com>
2023-03-24 15:29:03 -07:00
Yi Zhang
5c5c345abc
Add smoking tests for all CPU Packages. (#15153)
### Description
So far, 2 packages are not supported.
1. Mac silicon, because there isn't Mac silicon agent in Azure.
2. Linux ARM64, because there isn't microsoft-hosted Linux ARM64 agent
in ADO and UsePythonVersion isn't supported in self-hosted Linux ARM
pool.

Test Run:

https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=291132&view=logs&j=3a60a0ba-1640-5a1c-2d51-19af647b2d6b
2023-03-24 12:30:05 +08:00
Yi Zhang
338e6672dd
use build.sourceversion in cache image key (#15019)
### Description
Use build.sourceversion in docker image cache key.



### Motivation and Context
We used filpath as the cache key in #14496.
In most cases, the docker base image tag is latest.
So, the hash of the files couldn't be aware of the change of base image.
As the result, the docker image restored, but the image will still be
rebuilt .
The maintenance cost would be huge if we pin image hash in docker file.
For example,
https://quay.io/repository/pypa/manylinux2014_x86_64?tab=tags&tag=latest,
it's updated almost every week.
So far, the build.sourceversion is the right way to keep cache is
updated and valid.
2023-03-24 10:01:22 +08:00
Ye Wang
2ee822d483
Extend memory efficient attention coverage in Attention/MHA cuda op (#15064)
### Description
<!-- Describe your changes. -->

1. upgrade cutlass to 3.0 that containing attn_bias support.
2. extend Attention/MHA to use memory efficient attention when
rel_pos_bias with [1, num_head, s, s*] and 1d mask with [2 * batch_size
+ 1] are present.

new mask format introduction:
MASK_1D_KEY_SEQ_LEN_START,  
[3 * batch_size + 2] with [key_len[0], ..., key_len[batch_size - 1],
query_start[0], ..., query_start[batch_size - 1], query_end[batch_size -
1], key_start[0], ..., key_start[batch_size - 1], key_end[batch_size -
1]]

e.g
2D mask with [[1, 1, 1, 0, 0, 0], [1, 1, 1, 1, 1, 0]] converts to this
1D mask is [3, 5, 0, 6, 12, 0, 6, 12]


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

It potentially benefits tnlrv6 and t5(encoder)

---------

Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>
Co-authored-by: Kunal Vaishnavi <kvaishnavi@microsoft.com>
Co-authored-by: Kunal Vaishnavi <kvaishnavi@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2023-03-23 11:05:17 -07:00
Justin Chu
896ab94780
Remove root in run_python_dockerbuild.sh (#15169)
Running docker in root causes the pipeline to be stateful and
subsequently fail
2023-03-23 06:32:36 -07:00
Yi Zhang
a3570eb5bf
Add mac packages smoking test (#15122)
### Description
Check the Mac x86_64 packages installation.

### Motivation and Context
To avoid installation error, add packages smoking test before release.
2023-03-21 18:02:44 +08:00
PeixuanZuo
32a4eebc17
[ROCm] add rocm5.4.2 to python package pipeline (#15081)
add rocm5.4.2 to python package pipeline:
https://download.onnxruntime.ai/onnxruntime_nightly_rocm542.html
2023-03-20 10:30:14 +08:00
Yi Zhang
1e7849c2c8
Add compilation cache in iOS pipeline (#15070)
### Description
<!-- Describe your changes. -->

### Motivation and Context
iOS pipeline duration could be reduced to 20 more minutes from 90 more
minutes

https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=921577&view=results

### Ref
https://ccache.dev/manual/4.8.html#_c_modules
2023-03-16 21:43:18 +08:00
Yi Zhang
881f3f6be3
[Fix] Error in Linux_Packaging_combined_GPU of nuget packaing pipeline (#15060)
### Description


### Motivation and Context
It caused by the #14958, in the nuget packaging pipeline, it calls
get_docker_image.py directly rather than by get-docker-image-steps.yml.
Considering the difference, one parameter is added for compatibility.


### Test Link

https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=288042&view=logs&j=505ca2b7-596d-550d-8417-9b1519e87977
2023-03-16 08:49:37 +08:00
Jian Chen
6891ab5bac
fix_macos (#15018)
### Description
<!-- Describe your changes. -->
This fix macos packaging build on universal2 arch. 


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-03-14 21:54:44 -07:00
Yi Zhang
f096f6167b
Remove python37 and cuda37 packages in orttraing (#15041)
### Description
supplement of #14874 and #14887

### Motivation and Context


N.B.
I'm not sure if python matrix of rocm is expected (python3.7-3.9) @faxu
@snnn

(https://github.com/microsoft/onnxruntime/blob/main/tools/ci_build/github/azure-pipelines/orttraining-py-packaging-pipeline-rocm.yml)
2023-03-15 08:54:15 +08:00
Rachel Guo
db4e664f7c
Re-enable react native e2e android unit test for CI and upgrade targetSDK level for test project (#14989)
### Description
<!-- Describe your changes. -->

Re-enable the react native e2e android unit test for react native CI as
recent change of specifying `default` instead of `google-apis` in
android emulator CI tests gives pretty stable result for now.

Upgrade the targetSDKversion for gradle test project in
react-native/android to meet minimum target api level requirement for
Google Play apps.


https://support.google.com/googleplay/android-developer/answer/11926878?hl=en

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

React Native CI issue.
2023-03-14 13:35:38 -07:00
Yi Zhang
ca315b9148
Use ADO cache to cache docker image instead of ACR (#14496)
### Description
Now, we only enable image cache in pipeline cache for Linux Aten
Pipeline.
It'll be enabled in other Linux pipelines gradually.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

Fixed
[AB#13143](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/13143)


### Verification
1. No Image Cache in Pipeline

https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=904531&view=results
2. Use Cached Image in Pipeline

https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=904533&view=results
2023-03-11 10:32:02 +08:00
Changming Sun
a8ad0edbeb
BUG FIX: the if...else in telemetry-steps.yml does not really work (#14972)
### Description
BUG FIX: the if...else in telemetry-steps.yml does not really work. It
always says "Telemetry is disabled." even through the pipeline doesn't
have the pipeline variable.

### Motivation and Context
For example, recently I setup a new pipeline in
https://dev.azure.com/onnxruntime/onnxruntime/_build without setting the
ADO variable, but the powershell code still thinks that we have enabled
telemetry.

See:

https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=910107&view=results

The reason it didn't work because when the pipeline
variable("TELEMETRYGUID") doesn't exist,  the occurrence of
 "$(TELEMETRYGUID)" would be not replace to anything. It will remain as
it is.
2023-03-10 15:39:07 -08:00
Adrian Lizarraga
e2febe87f6
[QNN EP] Update QNN SDK to 2.8 (#14978)
### Description
- Add QNN 2.8 SDK
- Make QNN SDK version a pipeline template parameter for QNN pipelines.

### Motivation and Context
Updates to latest QNN SDK version, and allows testing different QNN SDK
versions without modifying yaml files.
2023-03-10 13:21:19 -08:00
Edward Chen
bd142bfb04
Gradle clean up (#14973)
- Use java/gradlew directly in .github/workflows/publish-java-apidocs.yml.
- Remove use of deleted step from tools/ci_build/github/azure-pipelines/android-arm64-v8a-QNN-crosscompile-ci-pipeline.yml.
- Remove Gradle installations and PATH updates from Dockerfiles and scripts. Now Gradle wrapper is used so a system Gradle installation is not needed.
2023-03-10 10:50:32 -08:00
Yi Zhang
acbb7ad453
enable cache in orttraining-mac-ci (#14979)
### Description
enable compilation cache  in orttraining-mac-ci

### Motivation and Context
The workflow duration can be reduced to 12 minutes from about 100
minutes at best.

https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=911536&view=results
2023-03-10 07:34:25 +08:00
Yulong Wang
1187d4ade6
[wasm] extend build timeout for static lib (#14952)
### Description
extend build timeout for web assembly static lib.
2023-03-09 15:03:34 -08:00
Jian Chen
b4fe98ac2e
Update to MacOS-12 (#14924)
### Description
<!-- Describe your changes. -->


Update to MacOS-12
### Motivation and Context

Fixed
[AB#13233](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/13233)
2023-03-09 10:18:14 -08:00
Yi Zhang
d55ae490e1
detach patch manylinux from get_docker_image (#14958)
### Description
Make patch manylinux one single step.


### Motivation and Context
If we want to use hash of docker-related files as the cache key, the
files should keep consistent before and after docker build.
And changes in generated build_scripts should trigger rebuilding the
image as well.
2023-03-09 15:40:58 +08:00
zhijiang
80e25ad6ac
fix cg issue (#14372)
### Description
tensorboard depends on rsa>=3.1.4, while rsa 4.5 has vuln issue, so pin
it to higher version as suggested

Fixed
[AB#7352](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/7352)



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-03-09 15:28:11 +08:00
Edward Chen
c46c7ccba5
Update Gradle version (#14862)
- Update Gradle version used in most places from 6.8.3 to 8.0.1. Update Android Gradle Plugin version where applicable.
  Not updated in this change: React Native Android projects (under `js/react_native/`). That can be done later along with updating the React Native projects.

- Add Gradle wrapper in `java/` to make it easier to consistently use a specific Gradle version.
2023-03-08 12:22:06 -08:00
Adam Pocock
47f00b5d49
[Java] Initial on device training support (#14027)
contributor: @Craigacp
2023-03-08 10:01:08 -08:00
Ashwini Khade
f71ac9859e
Update acpt image in the training pipeline (#14855)
### Description
Current pipeline refers to an old image which is causing test failures.
Updating the image to the latest one.



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
Fixes pipeline failure:
https://dev.azure.com/onnxruntime/onnxruntime/_build?definitionId=198
- If it fixes an open issue, please link to the issue here. -->
2023-03-07 14:10:32 -08:00
Changming Sun
3e08a67dd6
Add Linux ARM64 CI pipeline (#14904) 2023-03-06 21:47:10 -08:00
Adrian Lizarraga
d45b47945c
Linux QNN Pipeline: fix build error reporting (#14922)
### Description
Split up the ORT build step in the Linux QNN CI Pipeline.


### Motivation and Context
Build errors were not being immediately reported at the end of the build
step. The build step currently concatenates multiple shell commands, and
the return code for the last (mkdir) was being reported. This PR ensures
that the return code of the `python build.py ...` command is reported
for the build step.
2023-03-06 17:49:35 -08:00
Changming Sun
c1155b70c5
Remove 37 and 50 from CUDA compute archs (#14874)
### Description
To reduce CUDA package's size a little bit. 37 is for Tesla K80. Azure's
NC-series uses it, but in most cases CUDA can dynamic generate device
code .
2023-03-03 12:24:21 -08:00
Yi Zhang
8c454a76e0
Check Mac silicon package name (#14898)
### Description
1. add comments 
2. check Mac silicon package name 

### Motivation and Context
There isn't Mac silicon Agent in ADO.
We couldn't add smoking test to test the wheel can be installed.
But We can check whether the package name is correct to avoid the
mistake in 1.14 release.

Test run

https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=283100&view=logs&j=fe710151-df7c-5aa4-0cea-cf5331faa499&t=3182cefe-2612-53c6-4445-e5b3e0c4ac57
2023-03-03 18:27:54 +08:00
Changming Sun
f3b6664384
Remove Python 3.7 from the python packaging pipeline (#14887)
### Description
1. Remove Python 3.7 from the python packaging pipeline. It is planned
for the next release and approved by the PMs. Also we will add 3.11, but
it will be addressed in another PR.
2. Stop generating python packages based on Ubuntu 18.04 which will
reach EOL next month. We will either replace them with Ubuntu 20.04 or a
CentOS 8 variant.
2023-03-02 19:44:49 -08:00
Chun-Wei Chen
70a31e047a
Consume ONNX 1.13.1 in ONNX Runtime (#14812)
### Description
<!-- Describe your changes. -->
Consume ONNX 1.13.1 in ONNX Runtime. (ONNX 1.13.0 to ONNX 1.13.1)


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
ONNX 1.13.1 patch was just released yesterday. This PR is making ORT's
ONNX submodule consistent with the latest released ONNX. Not sure
whether this PR is really needed, but let me make it ready. Previous PR
for testing ONNX 1.13.1rc2 :
https://github.com/microsoft/onnxruntime/pull/14634.

Fixed
[AB#13174](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/13174)
.
2023-03-02 14:57:35 -08:00
Hector Li
c6074f3a4b
OnnxRuntime QNN EP (#14791)
### Description
Integrate Qualcomm QNN SDK to enable inference on QC hexagon NPU devices

### Motivation and Context
Enable Ort inference on QC hexagon NPU devices.

---------

Co-authored-by: Satya Jandhyala <sajandhy@microsoft.com>
Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com>
Co-authored-by: Adrian Lizarraga <adrianlm2@gmail.com>
2023-03-01 13:48:20 -08:00
Scott McKay
b7fde84341
Changes to support standalone custom ops in a minimal build. (#14497)
### Description
<!-- Describe your changes. -->
Changes to support standalone custom ops in a minimal build. Also
incorporates changes from #14492 (needed to test builds prior to that
being checked in).

We first need to save the schema info from the operators used by the
standalone op invoker in the ORT format model. Add mechanism for that.

Merge the kernel lookup logic so the same is used in full and minimal
build. NOTE: the version matching is now consistent with all other
kernel lookups, and the call to CreateOp MUST use the exact version for
the operator. Previously matching wasn't as strict, but this can lead to
the incorrect kernel being chosen.

Add tests.

NOTE: There is currently no way to detect the ops/types/opsets used
inside these custom ops as they don't exist until we create kernels,
which is after model loading completes (which is the point the ORT
format model is saved). Due to that they have to be manually added to
the configuration used to do the reduced ops build. That shouldn't be
too hard for the custom op author to add given the custom op
implementation is specifying the op, opset and type constraints (i.e.
they have the info and it's just a case of capturing/formatting it
correctly).


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Enable usage of the standalone op invoker by custom ops in a minimal
build.

---------

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-03-01 11:22:54 +10:00
Yulong Wang
69c5edb11b
[wasm] upgrade emsdk from 3.1.19 to 3.1.32 (#14818)
### Description
upgrade emsdk from 3.1.19 to 3.1.32

also add explicit config for stack size (1MB).
2023-02-28 11:06:09 -08:00
Yi Zhang
6320decf04
increase Test GPU Job's timeout to 8 hours (#14850)
### Description
<!-- Describe your changes. -->

### Motivation and Context
In practice, 6 hours is not enough to finish the job.
2023-02-28 18:52:03 +08:00
Yi Zhang
0be20dc0f6
Run GPU test job after all CPU test jobs succeed. (#14833)
### Description
Make GPU job depends on all CPU jobs

### Motivation and Context
GPU resources are very limited in packaging pipeline.
And GPU test job is very time consuming.
Only one CPU job fails, the workflow fails, so the GPU job is
meaningless.
To utilize GPU resources more efficiently, run GPU job only after all
CPU jobs succeed.

###test pipeline

https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=280905&view=results
2023-02-28 07:44:51 +08:00
Yulong Wang
6b83ad9659
[js/web] allow unittest (onnxruntime_test_all) to run in browser (#14820)
### Description
allow onnxruntime_test_all to run in browser for WebAssembly build (use
flag `--wasm_run_tests_in_browser`).

To output the logs from stdout correctly, this test needs to be build
with `--enable_wasm_threads`.
2023-02-24 16:45:33 -08:00
Rachel Guo
0700788b6e
Disable e2e android react native CI test temporarily (#14803)
### Description
<!-- Describe your changes. -->

Disable e2e android react native test temporarily to unblock the CI
failure with no easy fix.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

Temp solution to unblock CI failure.
2023-02-24 09:32:18 -08:00
Jian Chen
29428cd9dc
Cjian/pr into main for 1.14.1 fix (#14805)
### Description
<!-- Describe your changes. -->

PR a change made to 1.14.1 into Main branch as well. 

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-02-23 18:10:57 -08:00
Jian Chen
62ee0c8110
Migrating ORT Extensions from Git submodule to cmake FetchContent (#14298)
### Description
<!-- Describe your changes. -->

Merging extensions from Git submodule to cmake FetchContent


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

---------

Co-authored-by: Changming Sun <chasun@microsoft.com>
Co-authored-by: Jian Chen <jchen351@MacBook-Pro.local>
2023-02-22 19:42:36 -08:00
Edward Chen
b3b9be19b1
Update clang-tidy path for updated Mac image. (#14760)
Update clang-tidy path for updated Mac image. Fix Objective-C static analysis build.
2023-02-22 09:00:42 -08:00
Edward Chen
ad78579b66
Update java/build.gradle to not use deprecated features that were removed in gradle 8.0. (#14733)
### Description
<!-- Describe your changes. -->

Update java/build.gradle to not use deprecated features that were
removed in gradle 8.0.
Also move gradle wrapper setup from a script into a step template.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

Fix builds which use hosted Mac agents and gradle.

Recently the system version of gradle got upgraded to 8.0. Even though
we use an older gradle wrapper version, java/build.gradle is still
processed with gradle 8.0 in the initial call to `gradle wrapper`.
2023-02-20 11:19:49 +08:00
Wei-Sheng Chin
7b31bcda2e
Disable LazyTensor-ORT Test (#14703)
As title since LazyTensor is replaced by Dynamo in PyTorch 2.0.
2023-02-17 17:46:51 +08:00
Patrice Vignola
ce9a71620f
Fix DML release build (#14661)
### Description
Fixes the DML release build for 1.14.1. This was initially fixed by
https://github.com/microsoft/onnxruntime/pull/13417 for 1.13.1, but the
changes didn't make their way back to the main branch.
2023-02-13 17:31:11 -08:00
Tang, Cheng
8f34c8c8ed
Introduce collective ops to ort inference build (#14399)
### Description
Introduce collective ops into onnxruntime inference build, including
1) AllReduce and AllGather schema in contrib op, controlled by USE_MPI
flag
2) AllReduce and AllGather kernel in cuda EP, controlled by ORT_USE_NCCL
flag


### Motivation and Context
Enable the collective ops in onnxruntime inference build so we have the
ability to run distributed inference with multiple GPUs.
The original ncclAllReduce ops in training build require quite complex
configurations, which is not suitable for inference case, and it already
broken. so we introduce a new implementation.

---------

Co-authored-by: Cheng Tang <chenta@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2023-02-07 13:47:48 -08:00
RandySheriffH
b6bec54341
Revert mimalloc from v2.0.9 to v2.0.3 (#14603)
Revert mimalloc from v2.0.9 to v2.0.3 to silence build error in
[post-merge
](https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=273075&view=logs&j=f019f681-ae8f-5ee4-d119-02530df66a84&t=6c90c65c-2ab2-56af-633f-b5631256a8e1&l=351)
pipeline.
New dependency version was generated
[here](https://aiinfra.visualstudio.com/Lotus/_artifacts/feed/Lotus/UPack/onnxruntime_build_dependencies/overview/1.0.29).

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
Co-authored-by: rui-ren <ruiren1225@gmail.com>
2023-02-07 09:58:25 -08:00
Baiju Meswani
68a402e739
Add support for python 3.10 for onnxruntime-training cuda and cpu (#14100) 2023-02-02 11:32:41 -08:00
RandySheriffH
01cafe89f0
Specify deps in deps.txt and manifest (#14530)
Specify new deps and update cgmanifest.json.

---------

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-02-02 09:44:57 -08:00
Baiju Meswani
7954976e0a
Fix python packaging pipeline (#14533)
fix onnx and protobuf inconsistencies in python packaging pipeline.
2023-02-02 13:11:18 +08:00
Yulong Wang
0578eeff91
upgrade EsrpCodeSigning from v1 to v2 (#14531)
### Description
This change upgrade EsrpCodeSigning from v1 to v2 in our build pipeline.
2023-02-02 13:08:26 +08:00
Yi Zhang
80f807c03d
upgrade protobuf to 3.20.2 and onnx to 1.13 (#14279)
### Description
upgrade protobuf to 3.20.2, same as onnx 1.13.0

### Motivation and Context
Per component governance requirement and Fixes #14060

unused-parameter error occurs in 2 conditions.
1. compile protolbuf

`onnxruntime_src/cmake/external/protobuf/src/google/protobuf/repeated_ptr_field.h:752:66:
error: unused parameter ‘prototype’ [-Werror=unused-parameter]`
2. include onnx_pb.h
```
2023-01-28T10:20:15.0410853Z FAILED: CMakeFiles/onnxruntime_pybind11_state.dir/onnxruntime_src/onnxruntime/python/onnxruntime_pybind_iobinding.cc.o 
......
2023-01-28T10:20:15.0466024Z                  from /build/Debug/_deps/onnx-src/onnx/onnx_pb.h:51,
2023-01-28T10:20:15.0466958Z                  from /onnxruntime_src/include/onnxruntime/core/framework/to_tensor_proto_element_type.h:10,
....
2023-01-28T10:20:15.0609678Z /build/Debug/_deps/onnx-build/onnx/onnx-operators-ml.pb.h:1178:25:   required from here
2023-01-28T10:20:15.0610895Z /onnxruntime_src/cmake/external/protobuf/src/google/protobuf/repeated_ptr_field.h:752:66: error: unused parameter ‘prototype’ [-Werror=unused-parameter]
2023-01-28T10:20:15.0611707Z cc1plus: all warnings being treated as errors

```

https://dev.azure.com/onnxruntime/2a773b67-e88b-4c7f-9fc0-87d31fea8ef2/_apis/build/builds/874605/logs/22
2023-01-31 12:55:09 -08:00
cloudhan
3b6d551c35
Enable ccache for HIP objects (#14465)
This enables HIP compiler to be launched with `ccache` when build with `--use_cache`
2023-01-28 22:34:24 +08:00
Vincent Wang
7aecb2150f
Fix onnxruntime-CI-nightly-ort-pipeline Failure (#14464)
PyTorch skipped version 1.14 and jumped to 2.0, while the image for the
onnxruntime-CI-nightly-ort-pipeline is still using
nightly-ubuntu2004-cu116-py38-torch1140dev. Switch to the new torch
version image to fix the failure of the pipeline.
2023-01-28 16:05:56 +08:00
Tianlei Wu
94b1791974
Upgrade CUTLASS to v2.11 and add sequence length threshold for cutlass FMHA (#14401)
### Description
Add sequence length threshold for triggering cutlass FMHA in FP32. See
performance test results in
https://github.com/microsoft/onnxruntime/pull/14343 to see how this
threshold is selected.

Upgrade cutlass to v2.11 and update deps.txt and cgmanifest for nuget
pipeline build (test build:
https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=268574&view=results)
2023-01-25 09:43:48 -08:00
Edward Chen
7cc9aed314
Android package custom build script update (#14403)
Update Android package custom build script.
- Use later version of various dependencies (CMake, JDK, Android command line tools, Android NDK, Ubuntu). The CMake version was too old for the current ORT code.
- Do in-container build in a directory that is not shared with the host. Resolves some file permission issues and speeds up file access.

Add a nightly build to make sure the script works with the latest ORT.
2023-01-25 09:19:05 -08:00
Yi Zhang
cf3661ff6d
Revert "Allow PostAnalysis@2 task to continue on error for Windows_Pa… (#14375)
…ckaging_CPU_x86_default (#14332)"

This reverts commit a491f33f54.

### Description


### Motivation and Context
It looks an ADO issue.
Now, it's recovered.
It could be reenabled.
2023-01-21 09:32:39 +08:00
Edward Chen
3b382ea7e1
Free OrtStatus in ASSERT_ORT_STATUS_OK, make run_android_emulator.py work with newer JDK version (#14369)
- Free OrtStatus in ASSERT_ORT_STATUS_OK in model_tests.cc
- Make run_android_emulator.py work with newer JDK version
2023-01-20 09:27:47 -08:00
Yi Zhang
3d6cea14f4
Remove intermedia obj files once build finished (#14361)
### Description
Remove intermedia obj files and reenable cache

### Motivation and Context
Recently, training_debug_x64 pipeline often failed due to not enough
space.
It could free nearly 8G space by deleting obj files.
So, the compilation cache can be reenabled
2023-01-20 13:37:15 +08:00
Edward Chen
ae0e090c7b
Fix post merge jobs pipeline build issues (#14346)
- Fix debug node inputs outputs nullptr dereference with ONNX optional types.
- Fix model test memory leak.
- Convert jobs to stages in post-merge-jobs.yml to allow a subset of builds to be enabled when running manually.
- Fix buffer overrun in CumSum op exposed by Mimalloc build.
2023-01-19 11:16:42 -08:00
Yi Zhang
b51415b0ea
disable cache for training_x64_debug (#14358)
### Description
disable cache to save disk space for training_x64_debug


### Motivation and Context
To mitigate not enough disk space in training_x64_debug first.
2023-01-19 15:08:34 +08:00
Adrian Lizarraga
a491f33f54
Allow PostAnalysis@2 task to continue on error for Windows_Packaging_CPU_x86_default (#14332)
### Description
Allows the PostAnalysis@2 task for windows CI jobs to continue even if
an error is encountered.


### Motivation and Context
This is a temporary workaround that enables the
`Windows_Packaging_CPU_x86_default` job within the Zip-Nuget-Java-NodeJS
packaging pipeline to finish. A recent update to dotnet 6 has broken the
PostAnalysis task for this job.

This task was originally added by
https://github.com/microsoft/onnxruntime/pull/13694
2023-01-18 19:54:48 -08:00
Rui Ren
904e63633a
increase the time limit as more unit tests added (#14327)
### Description
Pipeline failed because we added more unit tests, reference:
https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=863643&view=logs&j=7536d2cd-87d4-54fe-4891-bfbbf2741d83&t=305229be-e8ba-5189-ca61-fcb77d866478

Now we have: [2430 tests](
https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=863619&view=logs&j=7536d2cd-87d4-54fe-4891-bfbbf2741d83&t=4efd38bc-b0da-5f98-81a8-ea2885f78448&l=43853)
Previously we had: [2422
tests](https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=859543&view=logs&j=7536d2cd-87d4-54fe-4891-bfbbf2741d83&t=4efd38bc-b0da-5f98-81a8-ea2885f78448&l=43640)

- Timeout error as we have 2 hour threshold
```
jobs:
- job: Linux_Build
  timeoutInMinutes: 120
  variables:
    skipComponentGovernanceDetection: true
```

### Motivation and Context

- Increase the timeoutInMinutes to `150`
2023-01-18 15:51:21 -08:00
Guenther Schmuelling
60290393f3
enable ort-extensions in wasm release builds (#14239)
enable ort-extensions in wasm release builds. sentence piece, gpt2, bert
and word piece tokenizers for now.

wasm size will grow from 8.4MB to 8.9MB.
2023-01-17 12:39:13 -08:00
Yi Zhang
fb801d58b1
Add Cache in Linux CPU Aten Pipeline (#14313)
### Description
Add compilation cache in Linux CPU Aten Pipeline.
The pipeline could be completed in 6 minutes at best.

### Motivation and Context
1. Accelerate the pipeline.
2. It's the shortest pipeline with docker image. I'll use it to try
moving the storage of linux docker image from ACR to ADO pipeline cache.
2023-01-17 10:49:29 +08:00
Yi Zhang
6d60dc24fe
install shared deps script (#14234)
### Description
Add a new install_shared_deps.sh

### Motivation and Context
Azcopy, Ninja, Node.js and CCache are all needed, but they are copied
everywhere.
2023-01-16 18:27:29 +08:00
Yi Zhang
2a82f95040
Increase package python test pipeline timeout limit (#14288)
### Description
Increase python test pipeline timeout limit. 
So far, It's a known issue for tensortRT8.5.
2023-01-14 13:46:09 +08:00
PeixuanZuo
d3a09cf77f
[ROCm] use pytest-xdist for fast pytest (#14261)
### Description

Use pytest-xdist to distribute tests across multiple CPUs to speed up
test execution.
Use pytest-rerunfailures to rerun failed test in case of pytest-xdist
crash.
`pytest -n 16` can reduce pytest time from 80 minutes to 20 minutes.


### Motivation and Context
Now kernel explorer pytest of ROCm CI takes nearly 1 hour 20 minutes. It
will take longer time when we add more tunableOp in the future.
2023-01-13 16:57:50 +08:00
Scott McKay
b9ecd428c1
Add ability to register custom ops by specifying a function name (#14177)
### Description
<!-- Describe your changes. -->
Use dlsym/GetProcAddress to lookup a custom ops registration function by
name and call it.

This will be better on mobile platforms where the custom ops library is
linked against, and there isn't necessarily a filesystem that a library
path can be loaded from.

Alternative is to wire up passing in the address of the function, but
that has multiple complications which differ by platform.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Enable using ort and ort-ext packages on mobile platforms.

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-01-12 15:11:34 +10:00
sfatimar
7654cd50e8
Openvino ep 2022.3 v4.3 (#14210)
### Description
Changes to incorporate OpenVINO EP 2022.3


### Motivation and Context
This change is required to incorportate OpenVINO EP 2022.3
- If it fixes an open issue, please link to the issue here. -->

Co-authored-by: mohsinmx <mohsinx.mohammad@intel.com>
Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>
Co-authored-by: Aravind <aravindx.gunda@intel.com>
Co-authored-by: mayavijx <mayax.vijayan@intel.com>
Co-authored-by: flexci <mohsinmx>
2023-01-11 16:31:26 -08:00
RandySheriffH
83ad562826
Rename CloudEP to AzureEP (#14175)
Rename CloudEP to AzureEP.

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-01-11 12:25:04 -08:00
Ashwini Khade
d92c663f28
Create dedicated build for training api (#14136)
### Description
Enable creating dedicated build for on device training. With this PR we
can build a lean binary for on device training using flag
--enable_training_apis. This binary includes only the essentials like
training ops, optimizers etc and NOT features like Aten fallback,
strided tensors, gradient builders etc . This binary also removes all
the deprecated components like training::TrainingSession and OrtTrainer
etc

### Motivation and Context
This enables our partners to create a lean binary for on device
training.
2023-01-10 20:58:04 -08:00
PeixuanZuo
33367fa2dc
[MIGraphX] update the MIGraphX version used in ORT to rocm-5.4.0 (#14184)
### Description
Update the MIGraphX version used in ORT to rocm-5.4.0

### Motivation and Context
The previous branch migraphx_for_ort has stopped updating, it is too far
away from the MIgraphX latest release branch. More discussion here:
https://github.com/microsoft/onnxruntime/issues/14126#issuecomment-1373201049

Co-authored-by: peixuanzuo <peixuanzuo@linmif39a000004.zvflicr54joexhdgnhvmxrxygg.phxx.internal.cloudapp.net>
2023-01-10 13:40:25 +08:00
Yi Zhang
6463f4383b
make WITHCACHE as an option in MacOS workflow (#14188)
### Description
1. Set the WithCache default value as false in Mac OS CI workflow too.
2. Add date of today in cache key to avoid cache size keep increasing
too.

WithCache, the pipeline duration reduced from 70 more minutes to 10 more
minutes
2023-01-10 10:54:19 +08:00
liqun Fu
1be36913cc
to work with onnx 1.13 rc, implement ver 18 reduce and optioanl ops, … (#13765) 2023-01-09 10:26:16 -08:00
Baiju Meswani
c6ff5bac9d
Update torch in eager mode CI pipeline (#14094) 2023-01-06 11:46:44 -08:00
zhijiang
0ed7277bbe
fix training compilation option (#14151)
fix the pipeline failure for compilation option error
2023-01-06 14:25:03 +08:00
Yi Zhang
2ce7b1c1dc
Enable cache for msbuild (#14085)
### Description
Enable ccache in windows CPU compilation.
The windows compilation in CI could be reduced to 1 more minute at most.

![image](https://user-images.githubusercontent.com/16190118/210294061-86742cf4-65c7-4cc2-9725-e102c3c64abd.png)
2023-01-06 11:19:57 +08:00
Ashwini Khade
e5e3570ac5
fix cg issue (#14112)
### Description
Update torch version to 1.13.1 to fix CG issue:
https://dev.azure.com/aiinfra/ONNX%20Runtime/_workitems/edit/10666/
2023-01-04 09:07:13 -08:00
Yi Zhang
f864b54393
Use today's cache only (#14120)
### Description
Add date value of today into the cache key.

### Motivation and Context
Microsoft-host agent has only 10GB for build.
To limit cache size, pipeline only use cache generated today.
2023-01-04 17:48:52 +08:00
Baiju Meswani
0ff61f7b97
Update torch to 1.13.1 in CI and packaging pipelines for ort training (#14055) 2023-01-03 20:03:33 -08:00
Ashwini Khade
68b5b2d7d3
Refactor training build options (#13964)
### Description
1. Renames all references of on device training to training apis. This
is to keep the naming general. Nothing really prevents us from using the
same apis on servers\non-edge devices.
2. Update ENABLE_TRAINING option: With this PR when this option is
enabled, training apis and torch interop is also enabled.
3. Refactoring for onnxruntime_ENABLE_TRAINING_TORCH_INTEROP option: 
   -  Removed user facing option
- Setting onnxruntime_ENABLE_TRAINING_TORCH_INTEROP to ON when
onnxruntime_ENABLE_TRAINING is ON as we always build with torch interop.

Once this PR is merged when --enable_training is selected we will do a
"FULL Build" for training (with all the training entry points and
features).
Training entry points include:
1. ORTModule
2. Training APIs

Features include:
1. ATen Fallback
2. All Training OPs includes communication and collectives
3. Strided Tensor Support
4. Python Op (torch interop)
5. ONNXBlock (Front end tools for training artifacts prep when using
trianing apis)

### Motivation and Context
Intention is to simply the options for building training enabled builds.
This is part of the larger work item to create dedicated build for
learning on the edge scenarios with just training apis enabled.
2023-01-03 13:28:16 -08:00
RandySheriffH
587e891cae
CloudEP (#13855)
Implement CloudEP for hybrid inferencing.
The PR introduces zero new API, customers could configure session and
run options to do inferencing with Azure [triton
endpoint.](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-deploy-with-triton?tabs=azure-cli%2Cendpoint)
Sample configuration in python be like:

```
sess_opt.add_session_config_entry('cloud.endpoint_type', 'triton');
sess_opt.add_session_config_entry('cloud.uri', 'https://cloud.com');
sess_opt.add_session_config_entry('cloud.model_name', 'detection2');
sess_opt.add_session_config_entry('cloud.model_version', '7'); // optional, default 1
sess_opt.add_session_config_entry('cloud.verbose', '1'); // optional, default '0', meaning no verbose
...
run_opt.add_run_config_entry('use_cloud', '1') # 0 for local inferencing, 1 for cloud endpoint.
run_opt.add_run_config_entry('cloud.auth_key', '...')
...
sess.run(None, {'input':input_}, run_opt)
```

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-01-03 10:03:15 -08:00
Baiju Meswani
b85878953f
Fix nightly ort training ci pipeline (#14007) 2022-12-30 12:28:57 -08:00
PeixuanZuo
b5fd2a6a80
[ROCm] Add ROCm5.4 to python package pipeline (#14012)
Add ROCm5.4 to python package pipeline.
The download link of ROCm5.4 nightly build whl is
https://download.onnxruntime.ai/onnxruntime_nightly_rocm54.html
The download linkd of ROCm5.4 nightly build whl with profiling is
https://download.onnxruntime.ai/onnxruntime_nightly_rocm54.profiling.html

Co-authored-by: peixuanzuo <peixuanzuo@linmif39a000004.zvflicr54joexhdgnhvmxrxygg.phxx.internal.cloudapp.net>
2022-12-22 10:01:40 +08:00
PeixuanZuo
ab2dd8dfaf
[ROCm] Update ROCm and MigraphX CI to ROCm5.4 (#14011)
Update ROCm and MigraphX CI to ROCm5.4
Run ortmodule_test with ROCm5.4 and all
passed(https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=824742&view=logs&j=8292f886-7946-5da9-7977-04484c342eda&t=5de68eaa-cbdc-5be5-13d0-bb946f4ddb2d).

Co-authored-by: peixuanzuo <peixuanzuo@linmif39a000004.zvflicr54joexhdgnhvmxrxygg.phxx.internal.cloudapp.net>
2022-12-22 10:01:05 +08:00
Changming Sun
fc2a6db573
Update absl to the latest release (#13990)
### Description
Update absl to a new version

### Motivation and Context
The new version contains fixes that are needed for Nvidia GPU build.
Once we update it to that version, we don't need to maintain our private
patches for Nvidia GPU build.
2022-12-19 14:25:13 -08:00
Yulong Wang
cc0a6213e4
[js] update versions of a few build dependencies (#13977)
### Description
update versions of a few build dependencies for onnxruntime NPM
packages.

update nodejs version to v16.x in linux CI. v12 is too out-of-dated. see
[nodejs release
schedule](https://github.com/nodejs/release#release-schedule)

### Motivation and Context
- upgrade to latest webpack allows using of latest Node.js LTS version.
previous version of webpack does not work on Node.js v18 and it is fixed
in latest version
- upgrade to latest typescript, ts-loader and other dev deps to
accelerate the build and bundling.
- upgrade also helps to resolve security warnings that may be vulnerable
in out-of-dated version
2022-12-16 17:26:54 -08:00
Chi Lo
ba89cae3bd
Update package pipelines to support TRT 8.5 (#13998)
Update following package pipelines to support TRT 8.5 after
https://github.com/microsoft/onnxruntime/pull/13867:

- [Linux Multi GPU TensorRT CI
Pipeline](https://aiinfra.visualstudio.com/Lotus/_build?definitionId=1016&_a=summary)
- [Python packaging
pipeline](https://aiinfra.visualstudio.com/Lotus/_build?definitionId=841&_a=summary)
-
[build-perf-test-binaries](https://aiinfra.visualstudio.com/Lotus/_build?definitionId=1130&_a=summary)
-
[Linux-GPU-EP-Perf](https://aiinfra.visualstudio.com/Lotus/_build?definitionId=841&_a=summary)
2022-12-16 15:01:50 -08:00
Yi Zhang
aa9fbed3d4
Add compilation cache for Linux GPU (#13995)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2022-12-16 16:38:12 +08:00
Yi Zhang
7d20d889d1
Use cache for compilation in container (#13960)
### Description
For compilation in container,  ADO Cache task doesn't work directly.
The workaround is to mount the cache directory to the container, and let
CCache in container to read/write cache data.
In short, we just leverage ADO API to download/upload cache data.

The Post-jobs works in stack-mode, So the PostBuildCleanUp Tasks should
be defined first.
Thus, The PostBuildCleanUp would be executed lastly.
Else, Cache Task would fail to upload cache because the Agent Directory
is cleaned.
2022-12-16 07:19:07 +08:00
Changming Sun
a9b1fb032b
FIX: macOS CI pipeline doesn't run tests (#13970)
### Description
Fix a problem: macOS CI pipeline doesn't run tests. It is due a code
refactoring I recently made.

### Motivation and Context
Add the tests back.
2022-12-14 18:39:31 -08:00
Chi Lo
5b492cbae3
[TensorRT EP] support TensorRT 8.5 (#13867)
Integrate TensorRT 8.5

- Update TensorRT EP to support TensorRT 8.5
- Update relevant CI pipelines
- Disable known non-supported ops for TensorRT
- Make timeout configurable.
We observe more than [20
hours](https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=256729&view=logs&j=71ce39d8-054f-502a-dcd0-e89fa9931f40)
of running unit tests with TensorRT 8.5 in package pipelines. Because we
can't use placeholder to significantly reduce testing time (c-api
application test will deadlock) in package pipelines, we only run
subsets of model tests and unit tests that are related to TRT (add new
build flag--test_all_timeout and set it to 72000 seconds by package
pipelines). Just to remember, we still run all the tests in TensorRT CI
pipelines to have full test coverage.

- include https://github.com/microsoft/onnxruntime/pull/13918 to fix
onnx-tensorrt compile error.

Co-authored-by: George Wu <jywu@microsoft.com>
2022-12-14 13:06:03 -08:00
Yi Zhang
7894d44d2d
Improve MacOS Cache Code (#13958)
### Description
Update cache key to make cache could be updated.
2022-12-14 20:47:09 +08:00
Edward Chen
b4dd5dda12
Revert "Update protobuf version to 3.18.3 in tools/ci_build/github/linux/docker/scripts/requirements.txt." (#13963)
Reverts microsoft/onnxruntime#13922
2022-12-13 18:15:06 -08:00
Edward Chen
b23395f977
Update protobuf version to 3.18.3 in tools/ci_build/github/linux/docker/scripts/requirements.txt. (#13922)
### Description
<!-- Describe your changes. -->

Update protobuf version to 3.18.3 in
tools/ci_build/github/linux/docker/scripts/requirements.txt.

### Motivation and Context

Address component governance alert CVE-2022-1941
2022-12-12 12:38:27 -08:00
Yi Zhang
2cb12caf93
Output cache stats (#13937)
### Description
Output cache stats
2022-12-12 15:22:13 +08:00
Changming Sun
89812a623e
Add two daily build jobs to validate some extra build configs (#13921)
### Description
Add two daily build jobs to validate some extra build configs


### Motivation and Context

To catch issues like: #13893
2022-12-10 09:15:14 -08:00
Adrian Lizarraga
db9c677b63
[EP Perf Dashboard] Add TensorRT 8.5.1.1 dockerfile (#13843)
### Description
- Adds a dockerfile for Ubuntu with TensorRT 8.5.1.1.
- Adds option to run EP Perf pipeline with TensorRT 8.5

### Motivation and Context
Necessary to benchmark models with TensorRT 8.5
2022-12-09 14:33:52 -08:00
shalvamist
d22be84add
Pin packaging to version 21.3 to address training pipeline failures 2022-12-09 09:05:55 -08:00
Changming Sun
81c2defd3b
Remove unused git submodules (#13830) 2022-12-07 21:59:16 -08:00
PeixuanZuo
7694b695a9
[ROCm] Simplify ROCm manylinux dockerfile (#13873)
### Description
<!-- Describe your changes. -->

1. Remove ROCm5.3 pipeline because it has rocblas bug, we don't need it.
2. We removed the dependency on centos docker image provided by
AMD(https://hub.docker.com/r/rocm/dev-centos-7) and build ROCm centos
base image by ourselves. The reference
dockerfile(https://github.com/RadeonOpenCompute/ROCm-docker/blob/master/dev/Dockerfile-centos-7)
is very redundant for our need. We simplified the ROCm manylinux
dockerfile.
3. Different versions of rocm use the same dockerfile
`Dockerfile.manylinux2014_rocm`.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

Co-authored-by: peixuanzuo <peixuanzuo@linmif39a000004.zvflicr54joexhdgnhvmxrxygg.phxx.internal.cloudapp.net>
2022-12-08 09:18:27 +08:00
Edward Chen
a64ddb36d0
Always build with XNNPACK EP in iOS CI build. (#13850)
Always build with XNNPACK EP in iOS CI build.
Combine builds for CPU, CoreML, and XNNPACK EPs due to limited build agent resources.
2022-12-07 16:08:34 -08:00
Changming Sun
d12521d7b2
Upgrade pybind11 (#13853)
Upgrade pybind11 to include the fix for #9735
2022-12-06 15:39:23 -08:00
Yi Zhang
78d18fbf34
Use CacheTask to Accelerate MacOS build (#13859)
### Description
Use CCache and ADO CacheTask to Accelerate MacOS build.
ref:
https://learn.microsoft.com/en-us/azure/devops/pipelines/release/caching?view=azure-devops

### Motivation and Context
The MacOS CI duration could be reduced from more than **70minutes** to
**10 minutes**

https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=824912&view=results
2022-12-07 07:14:40 +08:00
Ashwini Khade
65201e47bf
Enable nuget packages for on device training (#13637)
### Description
This PR enables building nuget packages locally for on device training
using --build_nuget arg.
This PR also enables the C# bindings by default in the managed package.
If a user triggers any training apis when the native binary is not built
for training, an exception with message "Training is disabled in the
current build. Please build ONNXRuntime from source with the build flags
enable_training and enable_training_on_device. " is thrown.

Build command for creating nuget packes for on device training:
build.bat --enable_training --enable_training_on_device --build_nuget 

2 Nuget packages are built
1. Microsoft.ML.OnnxRuntime.Managed
2. Microsoft.ML.OnnxRuntime.Training OR
Microsoft.ML.OnnxRuntime.Training.Gpu



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2022-12-05 14:54:09 -08:00