### Description
* Leverage the template `common-variables.yml` and reduce the usage of hardcoded
`trt_version`:
8391b24447/tools/ci_build/github/azure-pipelines/templates/common-variables.yml (L2-L7)
* Across all CI YAML files, this PR reduces the number of hardcoded `trt_version`
values from 40 to 6 by importing `trt_version` from `common-variables.yml`.
* Apply TRT 10.5 and re-enable the control flow op test.
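A rough way to audit how many hardcoded versions remain is sketched below in Python. This is only an illustration, not the tooling used for this PR: the helper `count_hardcoded`, the four-part version pattern, and the sample YAML (including the variable name `linux_trt_version`) are all assumptions made for the example.

```python
import re

# Assumption: a hardcoded version is a literal four-part number such as
# "10.4.0.26". Values that reference a variable contain no such literal.
LITERAL_VERSION = re.compile(r"\d+\.\d+\.\d+\.\d+")


def count_hardcoded(yaml_text, key="trt_version"):
    """Count non-comment lines that mention `key` together with a literal version."""
    total = 0
    for line in yaml_text.splitlines():
        code = line.split("#", 1)[0]  # ignore anything after a YAML comment marker
        if key in code.lower() and LITERAL_VERSION.search(code):
            total += 1
    return total


# Hypothetical YAML content, made up for this example.
sample = """\
variables:
  trt_version: 10.4.0.26
  trt_version: ${{ variables.linux_trt_version }}
  # trt_version: 10.3.0.26
"""

print(count_hardcoded(sample))  # prints 1
```

Only the first `trt_version` line is counted: the second references a variable, and the third is a comment.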
### Motivation and Context
- Reduce the number of hardcoded `trt_version` values across all CI YAML files
### Next refactor PR
The next refactor PR will reduce the usage of hardcoded `trt_version` in
`.dockerfile` and `.bat` files and in the remaining 2 YAML files
(`download_win_gpu_library.yml` and `set-winenv.yml`, which are step-template
YAML files that can't import variables).
### Description
TensorRT 10.4 is now GA; update to 10.4.
### Description
- TensorRT 10.2.0.19 -> 10.3.0.26
### Description
* Promote the TRT version to 10.2.0.19
* EP_Perf CI: clean up the config for legacy TRT<8.6 and promote the test env to
trt10.2-cu118/cu125
* Skip two tests, because Float8/BF16 are supported by TRT>10.0 but the TRT CI
machines are not hardware-compatible with these types:
```
1: [ FAILED ] 2 tests, listed below:
1: [ FAILED ] IsInfTest.test_isinf_bfloat16
1: [ FAILED ] IsInfTest.test_Float8E4M3FN
```
### Description
This branch is based on rel-1.18.0 and supports TensorRT 10-GA.
### Description
---------
Co-authored-by: Yi Zhang <zhanyi@microsoft.com>
### Description
1. Update download-artifacts to flex-downloadartifacts to make it easier
to debug.
2. Move the native files into the Gpu.Windows and Gpu.Linux packages.
Onnxruntime.Gpu has a dependency on them.
3. Update the package validation as well.
4. Add 2 stages to run E2E tests for GPU.Windows and GPU.Linux.

### Motivation and Context
The size of the single Onnxruntime.Gpu package has already exceeded the NuGet
size limit.
We split the package into several smaller packages so that they can be
published.
For compatibility, the user can install or upgrade Onnxruntime.Gpu,
which will install Gpu.Windows and Gpu.Linux automatically.
The user can also install only Gpu.Windows or Gpu.Linux directly.
### Test Link
1. In ORT_NIGHTLY
2. Install the preview version in nuget-int. (nuget source:
https://apiint.nugettest.org/v3/index.json)
---------
Co-authored-by: Scott McKay <skottmckay@gmail.com>
### Description
1. Add a Memory Profiling build job
2. Remove the no-absl build job, since the feature will be removed
3. Simplify post-merge-jobs.yml by unifying the pool names
### Motivation and Context
To catch build errors in #16124
### Description
Fix the bug in #15693
### Description
This PR creates NuGet and Android packages for training.
### Motivation and Context
These packages are intended to be released in ORT 1.15 to enable
On-Device Training Scenarios.
## Packaging Story for Learning On The Edge Release
### Nuget Packages:
1. New Native package -> **Microsoft.ML.OnnxRuntime.Training** (Native
package will contain binaries for: win-x86, win-x64, win-arm, win-arm64,
linux-x64, linux-arm64, android)
2. C# bindings will be added to existing package ->
**Microsoft.ML.OnnxRuntime.Managed**
### Android Package published to Maven:
1. New package for training (full build) ->
**onnxruntime-training-android-full-aar**
### Python Package published to PyPi:
1. Python bindings and offline tooling will be added to the existing ort
training package -> **onnxruntime-training**
### Description
Add parameters so that some stages can use another run's intermediate
output.
### Motivation and Context
The NuGet workflow has 38 stages in 4 layers.
We had to run the whole workflow from the beginning to test one stage.
Being able to run only one stage makes testing much easier.

### N.B.
In this PR, Nuget_Test_Linux_CPU, Nuget_Test_LinuxGPU and
Jar_Packaging_GPU are enabled as the first step,
so that I can start moving tests from the Linux host to a container.
### Description
1. Disable the XNNPack EP's tests in the Windows CI pipeline.
The EP code has a known memory alignment problem, but the problem does
not affect the scenarios we ship the code for. Currently we use the XNNPack
EP only in mobile apps and on the web, and we already have pipelines that
cover these usages. We need to prioritize fixing the bugs found in those
pipelines, and there are no resources to spend on the Windows one. We can
re-enable the tests once we reach an agreement on how to fix the
memory alignment bug.
2. Delete anybuild.yml, which was for an already deleted pipeline.
3. Move the Windows CPU pipelines to AMD CPU machine pools, which are
cheaper.
4. Disable some qdq/int8 model tests that fail if the CPU doesn't
have Intel AVX512 8-bit instructions.
Merge the CPU/GPU nuget pipelines. The old GPU nuget pipeline will be used only for DML.
TODO: the resulting GPU package contains PDB files for some of the DLLs, but not all. This is due to the refactoring of the CUDA EP into pluggable DLLs; at that time we forgot to copy the PDB files. However, I can't add them now, because the package is already 220MB. If the missing PDB files were added, it would be oversized: nuget.org doesn't accept packages larger than 250MB.
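The size constraint in the TODO can be checked with simple arithmetic, sketched here in Python. The 220MB and 250MB figures come from the note above; the PDB file sizes are hypothetical, since the real sizes are not stated.

```python
NUGET_ORG_LIMIT_MB = 250   # limit quoted in the note above
CURRENT_PACKAGE_MB = 220   # current package size quoted in the note above


def remaining_budget_mb(current_mb=CURRENT_PACKAGE_MB, limit_mb=NUGET_ORG_LIMIT_MB):
    """Return how many MB can still be added before reaching the limit."""
    return limit_mb - current_mb


def fits_in_package(extra_files_mb, current_mb=CURRENT_PACKAGE_MB,
                    limit_mb=NUGET_ORG_LIMIT_MB):
    """Return True if adding the extra files keeps the package within the limit."""
    return current_mb + sum(extra_files_mb) <= limit_mb


# Hypothetical sizes (MB) of the missing PDB files, for illustration only.
hypothetical_pdbs_mb = [18, 25]

print(remaining_budget_mb())                  # prints 30
print(fits_in_package(hypothetical_pdbs_mb))  # prints False
```

With only 30MB of headroom, any set of missing PDB files larger than that would make the package unpublishable.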
1. Remove some unused code and simplify tools/ci_build/github/linux/run_dockerbuild.sh.
2. Enable the Nuget CUDA tests. The original design was that we could leverage Directory.Build.props and let cmake generate the required properties (USE_CUDA/...) there. However, in the nuget packaging pipeline we test the package on a different host that doesn't run the cmake command and doesn't have the auto-generated Directory.Build.props file.
1. Update the manylinux build scripts. This adds [PEP600](https://www.python.org/dev/peps/pep-0600/) (manylinux2 tags) support. numpy has adopted this new feature, and we should do the same. The old build script files were copied from https://github.com/pypa/manylinux, but they have been deleted and replaced in the upstream repo. The manylinux repo doesn't have a manylinux2014 branch anymore, so I'm removing the obsolete code and syncing the files with the latest master.
2. Update GPU CUDA version from 11.0 to 11.1(after a discussion with PMs).
3. Delete tools/ci_build/github/linux/docker/Dockerfile.manylinux2014_cuda10_2. (Merged the content to tools/ci_build/github/linux/docker/Dockerfile.manylinux2014_cuda11)
4. Modernize the cmake code of how to locate python devel files. It was suggested in https://github.com/onnx/onnx/pull/1631 .
5. Remove `onnxruntime_MSVC_STATIC_RUNTIME` and `onnxruntime_GCC_STATIC_CPP_RUNTIME` build options. Now cmake has builtin support for it. Starting from cmake 3.15, we can use `CMAKE_MSVC_RUNTIME_LIBRARY` cmake variable to choose which MSVC runtime library we want to use.
6. Update the Ubuntu docker images used in our CI build from Ubuntu 18.04 to Ubuntu 20.04.
7. Update GCC version in CUDA 11.1 pipelines from 8.x to 9.3.1
8. Split the Linux GPU CI pipeline into two jobs: build the code on a CPU machine, then run the tests on a separate GPU machine. In the past we didn't test our python packages; we only tested the pre-packed files, so we didn't catch the rpath issue in the CI build.
9. Add a CentOS machine pool and test our Linux GPU build on real CentOS machines.
10. Rework the ARM64 Linux GPU python packaging pipeline. Previously it used cross-compiling, so we had to statically link to the C runtime. But now we have the pluggable EP API, which doesn't support static linking, so I changed the pipeline to use qemu emulation instead. The build is now 10x slower than before, but it is more extensible.
1. Remove openmp related packaging pipelines and build jobs.
2. Set continueOnError to true for the TSAUpload tasks. Their service has been unstable recently.
3. Update Ubuntu 16 docker images to Ubuntu 18, in preparation for getting C++17 support
4. Cherry-pick the changes in 1.7.1 to the master: updating CFLAGS/CXXFLAGS to strip out debug symbols
Add python 3.8/3.9 support for Windows GPU and Linux ARM64
Delete jemalloc from cgmanifest.json.
Add onnx node test to Nuphar pipeline.
Change $ANDROID_HOME/ndk-bundle to $ANDROID_NDK_HOME. The latter is more accurate.
Delete Java GPU packaging pipeline
Remove the test data download step in the Nuget Mac OS pipeline. Because these machines are outside our control and outside our network, it's hard to make the download reliable and keep the data secure.
Fix a doc problem in c-api-artifacts-package-and-publish-steps-windows.yml. It shouldn't copy C_API.md, because the file has been moved into a different branch.
Delete the CI build docker file for Ubuntu cuda 9.x and Ubuntu x86 32 bits
And, due to some internal restrictions, I need to rename some of the agent pools
Update gpu packaging pipelines to CUDA11
In the next release we will use CUDA 11. And our CUDA 11 build suddenly became broken because recently CentOS 7 posted an update of glibc. The version of glibc was changed from 2.17-317.el7 to 2.17-322.el7_9. But the newer one isn't compatible with CUDA 11. We have to downgrade it.
1. Publish the image to ACR, instead of building it every time for every PR
2. Allow USE_MKLML and USE_OPENMP to co-exist. Currently both of them are enabled in our Linux CI build, but only one of them actually takes effect.
3. Split nuphar and DNNL into separate pipelines.
4. Fix two warnings in onnxruntime/core/optimizer/matmul_scale_fusion.cc and onnxruntime/test/tvm/tvm_basic_test.cc.
5. Update the manylinux2010_x86_64 image to the latest.
1. Fix the nuget cpu pipeline and put code coverage pipeline back.
2. Reduce onnx_test_runner's default logging level from WARNING to ERROR. Because there are too many log messages now.
3. Enlarge the protobuf read buffer size for onnx_test_runner. It was missed from PR #4020.
* add centos tests to linux cpu ci pipeline
* Disable failing test
* use centos6 instead of centos7
* change back to centos7
* add dotnet runtime dependency
* fix dotnet runtime dependencies
* install dotnet sdk instead of runtimes
* add more dotnet dependencies
* temporary skip failing test
* fix lib path
* reenable failing test
* add SAS token to download internal test data for nuget pipeline
* update azure endpoint
* fix keyvault download step
* fix variable declaration for secret group
* fix indentation
* fix yaml syntax for variables
* fix setting secrets for script
* fix env syntax
* Fix macos pipeline
* attempt to add secrets to windows download data
* fix mac and win data download
* fix windows data download
* update test data set url and location
* Simplify linux gpu pipeline
* Refactor win-gpu-ci-pipeline.yml
* Set cuda environment variables for testing and version
* Remove variables from starter script
* minor fix
* Add GPU Nuget pipeline
* Set DisableContribOps environment variable for Linux package tests
* Add ESRP tasks
* Add ESRP signing templates
* Test out hardcoded value of ESRP
* test variable expansion
* test out variable expansion
* update cpu pipeline to conditionally esrp sign
* Set C# GPU tests to run only if env var is set
* Refactor for easy parameter passing
* refactored esrp templates
* remove variables from template
* Add packaging variables back to pipelines
* update C# for cuda 10
* Merge vars and parameters for gpu pipeline
* remove vars from mklml pipeline
* display envvars on terminal
* Clean up C# cuda tests, and upgrade to Cuda10
* Introduce CUDNN_PATH pipeline variable
* YAML variables are always uppercased (not true with classic)
* Update C# GPU test to be more meaningful
* remove macos from gpu tests
* remove debugging info for DisableContribOps option
* Remove DisableContrib ops parameters -- use variables only
* Fix typo from = to -
* remove debug steps
* fix typo
* remove unused variable TESTONGPU from some templates
* clean up CUDA env setup scripts
* Remove CUDNN_PATH from setup_env_cuda.bat
* Initial check in
* Add win x86
* minor update to x86
* update win-ci
* update win-x86ci
* add linux and mac templates
* add nuget pipelines and test templates
* remove buildConfig
* add compliance template
* fix minor typos
* update pool for macos
* update mac agent pool
* update macos pool
* update agent pools for tests
* turn off debug build for testing
* some modifications to packaging scripts
* change ordering of compliance tasks
* Add mklml pipeline
* Add packagename variable to mklml pipeline
* remove unrequired dependent jobs from mklml pipeline
* Update build command for macOS legs in mklml and cpu pipeline
* Set vcvars to true
* Add no contrib ops pipeline
* Add no-contrib-ops pipeline
* set vcvars to true for package tests
* remove repetition in nuget templates
* get buildarch correct
* get name of test template correct
* remove steps from test_all_os.yml
* add parameters to test_all_os.yml
* Need jobs, not steps
* set envars for disablecontrib ops
* add cleanup tasks and CG to package tests
* fix path to cleanup script for macos
* remove buildDirectory -- not needed
* remove fp16tiny_yolov2 model from nocontribops tests
* remove debugging info
* fix individual linux pipelines to use correct template
* remove unneeded bak_latest2
* increase timeout to 120 to allow for variance
* turn off code coverage report