### Description
1. Enable VCPKG flag in Windows CPU CI build pipelines.
2. Increased the min supported cmake version from 3.26 to 3.28. Because
of it, drop the support for the old way of finding python by
"find_package(PythonLibs)". Therefore, in build.py we no longer set
"PYTHON_EXECUTABLE" cmake var when doing cmake configure.
3. Added "xnnpack-ep" as a feature for ORT's vcpkg config.
4. Added asset cache support for ORT's vcpkg build
5. Added VCPKG triplet files for Android build.
6. Set VCPKG triplet to "universal2-osx" if CMAKE_OSX_ARCHITECTURES was
found in cmake extra defines.
7. Removed a small piece of code in build.py, which was for support CUDA
version < 11.8.
8. Fixed an issue that CMAKE_OSX_ARCHITECTURES sometimes got specified
twice when build.py invoked cmake.
9. Added more model tests to Android build. After this change, we will
test all ONNX versions instead of just the latest one.
10. Fixed issues that are related to build.py's "--build_nuget"
parameter. Also, enable the flag in most Windows CPU CI build jobs.
11. Removed a restriction in build.py that disallowed cross-compiling
Windows ARM64 nuget package on Windows x86.
### Motivation and Context
Adopt vcpkg.
Remove PostBuildCleanup tasks since it is deprecated. It is to address a
warning in our pipelines:
"Task 'Post Build Cleanup' version 3 (PostBuildCleanup@3) is dependent
on a Node version (6) that is end-of-life. Contact the extension owner
for an updated version of the task. Task maintainers should review Node
upgrade guidance: https://aka.ms/node-runner-guidance"
Now the cleanup is controlled in another place:
https://learn.microsoft.com/en-us/azure/devops/pipelines/yaml-schema/workspace?view=azure-pipelines
The code change was generated by the following Linux command:
```bash
find . -name \*.yml -exec sed -i '/PostBuildCleanup/,+2d' {} \;
```
### Description
1. Add XNNPack build on Linux ARM64
2. Build only one python wheel for PR request.
[AB#49763](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/49763)
### Motivation and Context
Why I add xnnpack build on Linux ARM64 rather than Windows ARM64.
Becuase KleidiAI doesn't support Windows
```
IF(XNNPACK_TARGET_PROCESSOR STREQUAL "arm64" AND XNNPACK_ENABLE_ARM_I8MM AND NOT CMAKE_C_COMPILER_ID STREQUAL "MSVC")
IF (XNNPACK_ENABLE_KLEIDIAI)
MESSAGE(STATUS "Enabling KleidiAI for Arm64")
ENDIF()
ELSE()
SET(XNNPACK_ENABLE_KLEIDIAI OFF)
ENDIF()
```
---------
### Description
Use a common set of prebuilt manylinux base images to build the
packages, to avoid building the manylinux part again and again. The base
images can be used in GenAI and other projects too.
This PR also updates the GCC version for inference python CUDA11/CUDA12
builds from 8 to 11. Later on I will update all other CUDA pipelines to
use GCC 11, to avoid the issue described in
https://github.com/onnx/onnx/issues/6047 and
https://github.com/microsoft/onnxruntime-genai/issues/257 .
### Motivation and Context
To extract the common part as a reusable build infra among different
ONNX Runtime projects.
### Description
1. Remove 'dnf update' from docker build scripts, because it upgrades TRT
packages from CUDA 11.x to CUDA 12.x.
To reproduce it, you can run the following commands in a CentOS CUDA
11.x docker image such as nvidia/cuda:11.8.0-cudnn8-devel-ubi8.
```
export v=8.6.1.6-1.cuda11.8
dnf install -y libnvinfer8-${v} libnvparsers8-${v} libnvonnxparsers8-${v} libnvinfer-plugin8-${v} libnvinfer-vc-plugin8-${v} libnvinfer-devel-${v} libnvparsers-devel-${v} libnvonnxparsers-devel-${v} libnvinfer-plugin-devel-${v} libnvinfer-vc-plugin-devel-${v} libnvinfer-headers-devel-${v} libnvinfer-headers-plugin-devel-${v}
dnf update -y
```
The last command will generate the following outputs:
```
========================================================================================================================
Package Architecture Version Repository Size
========================================================================================================================
Upgrading:
libnvinfer-devel x86_64 8.6.1.6-1.cuda12.0 cuda 542 M
libnvinfer-headers-devel x86_64 8.6.1.6-1.cuda12.0 cuda 118 k
libnvinfer-headers-plugin-devel x86_64 8.6.1.6-1.cuda12.0 cuda 14 k
libnvinfer-plugin-devel x86_64 8.6.1.6-1.cuda12.0 cuda 13 M
libnvinfer-plugin8 x86_64 8.6.1.6-1.cuda12.0 cuda 13 M
libnvinfer-vc-plugin-devel x86_64 8.6.1.6-1.cuda12.0 cuda 107 k
libnvinfer-vc-plugin8 x86_64 8.6.1.6-1.cuda12.0 cuda 251 k
libnvinfer8 x86_64 8.6.1.6-1.cuda12.0 cuda 543 M
libnvonnxparsers-devel x86_64 8.6.1.6-1.cuda12.0 cuda 467 k
libnvonnxparsers8 x86_64 8.6.1.6-1.cuda12.0 cuda 757 k
libnvparsers-devel x86_64 8.6.1.6-1.cuda12.0 cuda 2.0 M
libnvparsers8 x86_64 8.6.1.6-1.cuda12.0 cuda 854 k
Installing dependencies:
cuda-toolkit-12-0-config-common noarch 12.0.146-1 cuda 7.7 k
cuda-toolkit-12-config-common noarch 12.2.140-1 cuda 7.9 k
libcublas-12-0 x86_64 12.0.2.224-1 cuda 361 M
libcublas-devel-12-0 x86_64 12.0.2.224-1 cuda 397 M
Transaction Summary
========================================================================================================================
```
As you can see from the output, they are CUDA 12 packages.
The problem can also be solved by lock the packages' versions by using
"dnf versionlock" command right after installing the CUDA/TRT packages.
However, going forward, to get the better reproducibility, I suggest
manually fix dnf package versions in the installation scripts like we do
for TRT now.
```bash
v="8.6.1.6-1.cuda11.8" &&\
yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo &&\
yum -y install libnvinfer8-${v} libnvparsers8-${v} libnvonnxparsers8-${v} libnvinfer-plugin8-${v} libnvinfer-vc-plugin8-${v}\
libnvinfer-devel-${v} libnvparsers-devel-${v} libnvonnxparsers-devel-${v} libnvinfer-plugin-devel-${v} libnvinfer-vc-plugin-devel-${v} libnvinfer-headers-devel-${v} libnvinfer-headers-plugin-devel-${v}
```
When we have a need to upgrade a package due to security alert or some
other reasons, we manually change the version string instead of relying
on "dnf update". Though this approach increases efforts, it can make our
pipeines more stable.
2. Move python test to docker
### Motivation and Context
Right now the nightly gpu package mixes using CUDA 11.x and CUDA 12.x
and the result package is totally not usable(crashes every time)