### Description
1. Remove Linux jobs for ORT-Extension combined build
2. Add a macOS build job for ORT-Extension combined build
3. Adjust the yaml file so that it can support two different ADO
instances.
### Motivation and Context
To test our code better. And it will enable us to run such tests for
every commit in the main branch. It would be easier for us to figure out
which change caused a build break.
See
[AB#13435](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/13435)
### Description
windows update python3.11
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
---------
Co-authored-by: Ubuntu <chasun@chasunlinux.lw3b1xzoyrkuzm34swpscft0ff.dx.internal.cloudapp.net>
### Description
1. Move Linux CPU pipelines to an AMD CPU pool which is cheaper
2. Enable CCache for orttraining pipeline
### Motivation and Context
Azure AMD CPU machines are generally much cheaper than Intel CPU
machines. However, they don't have local disks.
### Description
Pause caching the docker images in pipeline cache in Linux Aten
Pipeline.
### Motivation and Context
We need to work out a better way to reduce the storage.
### Description
`lintrunner` is a linter runner successfully used by pytorch, onnx and
onnx-script. It provides a uniform experience running linters locally
and in CI. It supports all major dev systems: Windows, Linux and MacOs.
The checks are enforced by the `Python format` workflow.
This PR adopts `lintrunner` to onnxruntime and fixed ~2000 flake8 errors
in Python code. `lintrunner` now runs all required python lints
including `ruff`(replacing `flake8`), `black` and `isort`. Future lints
like `clang-format` can be added.
Most errors are auto-fixed by `ruff` and the fixes should be considered
robust.
Lints that are more complicated to fix are applied `# noqa` for now and
should be fixed in follow up PRs.
### Notable changes
1. This PR **removed some suboptimal patterns**:
- `not xxx in` -> `xxx not in` membership checks
- bare excepts (`except:` -> `except Exception`)
- unused imports
The follow up PR will remove:
- `import *`
- mutable values as default in function definitions (`def func(a=[])`)
- more unused imports
- unused local variables
2. Use `ruff` to replace `flake8`. `ruff` is much (40x) faster than
flake8 and is more robust. We are using it successfully in onnx and
onnx-script. It also supports auto-fixing many flake8 errors.
3. Removed the legacy flake8 ci flow and updated docs.
4. The added workflow supports SARIF code scanning reports on github,
example snapshot:

5. Removed `onnxruntime-python-checks-ci-pipeline` as redundant
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Unified linting experience in CI and local.
Replacing https://github.com/microsoft/onnxruntime/pull/14306
---------
Signed-off-by: Justin Chu <justinchu@microsoft.com>
### Description
Use build.sourceversion in docker image cache key.
### Motivation and Context
We used filpath as the cache key in #14496.
In most cases, the docker base image tag is latest.
So, the hash of the files couldn't be aware of the change of base image.
As the result, the docker image restored, but the image will still be
rebuilt .
The maintenance cost would be huge if we pin image hash in docker file.
For example,
https://quay.io/repository/pypa/manylinux2014_x86_64?tab=tags&tag=latest,
it's updated almost every week.
So far, the build.sourceversion is the right way to keep cache is
updated and valid.
### Description
<!-- Describe your changes. -->
1. upgrade cutlass to 3.0 that containing attn_bias support.
2. extend Attention/MHA to use memory efficient attention when
rel_pos_bias with [1, num_head, s, s*] and 1d mask with [2 * batch_size
+ 1] are present.
new mask format introduction:
MASK_1D_KEY_SEQ_LEN_START,
[3 * batch_size + 2] with [key_len[0], ..., key_len[batch_size - 1],
query_start[0], ..., query_start[batch_size - 1], query_end[batch_size -
1], key_start[0], ..., key_start[batch_size - 1], key_end[batch_size -
1]]
e.g
2D mask with [[1, 1, 1, 0, 0, 0], [1, 1, 1, 1, 1, 0]] converts to this
1D mask is [3, 5, 0, 6, 12, 0, 6, 12]
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
It potentially benefits tnlrv6 and t5(encoder)
---------
Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>
Co-authored-by: Kunal Vaishnavi <kvaishnavi@microsoft.com>
Co-authored-by: Kunal Vaishnavi <kvaishnavi@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
### Description
Check the Mac x86_64 packages installation.
### Motivation and Context
To avoid installation error, add packages smoking test before release.
### Description
<!-- Describe your changes. -->
This fix macos packaging build on universal2 arch.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
<!-- Describe your changes. -->
Re-enable the react native e2e android unit test for react native CI as
recent change of specifying `default` instead of `google-apis` in
android emulator CI tests gives pretty stable result for now.
Upgrade the targetSDKversion for gradle test project in
react-native/android to meet minimum target api level requirement for
Google Play apps.
https://support.google.com/googleplay/android-developer/answer/11926878?hl=en
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
React Native CI issue.
### Description
BUG FIX: the if...else in telemetry-steps.yml does not really work. It
always says "Telemetry is disabled." even through the pipeline doesn't
have the pipeline variable.
### Motivation and Context
For example, recently I setup a new pipeline in
https://dev.azure.com/onnxruntime/onnxruntime/_build without setting the
ADO variable, but the powershell code still thinks that we have enabled
telemetry.
See:
https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=910107&view=results
The reason it didn't work because when the pipeline
variable("TELEMETRYGUID") doesn't exist, the occurrence of
"$(TELEMETRYGUID)" would be not replace to anything. It will remain as
it is.
### Description
- Add QNN 2.8 SDK
- Make QNN SDK version a pipeline template parameter for QNN pipelines.
### Motivation and Context
Updates to latest QNN SDK version, and allows testing different QNN SDK
versions without modifying yaml files.
- Use java/gradlew directly in .github/workflows/publish-java-apidocs.yml.
- Remove use of deleted step from tools/ci_build/github/azure-pipelines/android-arm64-v8a-QNN-crosscompile-ci-pipeline.yml.
- Remove Gradle installations and PATH updates from Dockerfiles and scripts. Now Gradle wrapper is used so a system Gradle installation is not needed.
### Description
Make patch manylinux one single step.
### Motivation and Context
If we want to use hash of docker-related files as the cache key, the
files should keep consistent before and after docker build.
And changes in generated build_scripts should trigger rebuilding the
image as well.
### Description
tensorboard depends on rsa>=3.1.4, while rsa 4.5 has vuln issue, so pin
it to higher version as suggested
Fixed
[AB#7352](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/7352)
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
- Update Gradle version used in most places from 6.8.3 to 8.0.1. Update Android Gradle Plugin version where applicable.
Not updated in this change: React Native Android projects (under `js/react_native/`). That can be done later along with updating the React Native projects.
- Add Gradle wrapper in `java/` to make it easier to consistently use a specific Gradle version.
### Description
Current pipeline refers to an old image which is causing test failures.
Updating the image to the latest one.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
Fixes pipeline failure:
https://dev.azure.com/onnxruntime/onnxruntime/_build?definitionId=198
- If it fixes an open issue, please link to the issue here. -->
### Description
Split up the ORT build step in the Linux QNN CI Pipeline.
### Motivation and Context
Build errors were not being immediately reported at the end of the build
step. The build step currently concatenates multiple shell commands, and
the return code for the last (mkdir) was being reported. This PR ensures
that the return code of the `python build.py ...` command is reported
for the build step.
### Description
To reduce CUDA package's size a little bit. 37 is for Tesla K80. Azure's
NC-series uses it, but in most cases CUDA can dynamic generate device
code .
### Description
1. Remove Python 3.7 from the python packaging pipeline. It is planned
for the next release and approved by the PMs. Also we will add 3.11, but
it will be addressed in another PR.
2. Stop generating python packages based on Ubuntu 18.04 which will
reach EOL next month. We will either replace them with Ubuntu 20.04 or a
CentOS 8 variant.
### Description
<!-- Describe your changes. -->
Consume ONNX 1.13.1 in ONNX Runtime. (ONNX 1.13.0 to ONNX 1.13.1)
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
ONNX 1.13.1 patch was just released yesterday. This PR is making ORT's
ONNX submodule consistent with the latest released ONNX. Not sure
whether this PR is really needed, but let me make it ready. Previous PR
for testing ONNX 1.13.1rc2 :
https://github.com/microsoft/onnxruntime/pull/14634.
Fixed
[AB#13174](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/13174)
.
### Description
<!-- Describe your changes. -->
Changes to support standalone custom ops in a minimal build. Also
incorporates changes from #14492 (needed to test builds prior to that
being checked in).
We first need to save the schema info from the operators used by the
standalone op invoker in the ORT format model. Add mechanism for that.
Merge the kernel lookup logic so the same is used in full and minimal
build. NOTE: the version matching is now consistent with all other
kernel lookups, and the call to CreateOp MUST use the exact version for
the operator. Previously matching wasn't as strict, but this can lead to
the incorrect kernel being chosen.
Add tests.
NOTE: There is currently no way to detect the ops/types/opsets used
inside these custom ops as they don't exist until we create kernels,
which is after model loading completes (which is the point the ORT
format model is saved). Due to that they have to be manually added to
the configuration used to do the reduced ops build. That shouldn't be
too hard for the custom op author to add given the custom op
implementation is specifying the op, opset and type constraints (i.e.
they have the info and it's just a case of capturing/formatting it
correctly).
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Enable usage of the standalone op invoker by custom ops in a minimal
build.
---------
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
### Description
Make GPU job depends on all CPU jobs
### Motivation and Context
GPU resources are very limited in packaging pipeline.
And GPU test job is very time consuming.
Only one CPU job fails, the workflow fails, so the GPU job is
meaningless.
To utilize GPU resources more efficiently, run GPU job only after all
CPU jobs succeed.
###test pipeline
https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=280905&view=results
### Description
allow onnxruntime_test_all to run in browser for WebAssembly build (use
flag `--wasm_run_tests_in_browser`).
To output the logs from stdout correctly, this test needs to be build
with `--enable_wasm_threads`.
### Description
<!-- Describe your changes. -->
Disable e2e android react native test temporarily to unblock the CI
failure with no easy fix.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Temp solution to unblock CI failure.
### Description
<!-- Describe your changes. -->
PR a change made to 1.14.1 into Main branch as well.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
<!-- Describe your changes. -->
Merging extensions from Git submodule to cmake FetchContent
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
---------
Co-authored-by: Changming Sun <chasun@microsoft.com>
Co-authored-by: Jian Chen <jchen351@MacBook-Pro.local>