Commit graph

2065 commits

Author SHA1 Message Date
Yi Zhang
560778fd07
use mac 12 for esrp code sign (#22134)
### Description
Fix regression caused by #17361 



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-09-19 12:06:41 +08:00
Adrian Lizarraga
b8dae685e4
[QNN EP] Build Python 3.12 wheel for Windows ARM64 (#22118)
### Description
Builds arm64 python 3.12 wheel for QNN EP.


### Motivation and Context
2024-09-17 21:16:31 -07:00
Yi Zhang
b94ba09e4f
Upgrade XNNPACK to latest version (#22012)
### Description
Update XNNPack to latest version (Sep 4)
- Some op outputs are changed, channel or stride paras are moved into
reshape func.
e.g.
96962a602d
- input params of xnnpack's resize related function are changed a lot
- KleidiAI is added as a dependency in ARM64
- The latest XNNPACK includes 2 static libs microkernels-prod and
xnnpack.
Without microkernels-prod, it throws the exception of Undefined symbols.
- Add ORT_TARGET_PROCESSOR to get the real processor target in CMake
2024-09-17 10:12:16 -07:00
Jian Chen
fa68ae2def
Update pool to MacOS-13 (#17361)
### Description
See https://github.com/microsoft/onnxruntime-extensions/pull/476
and https://github.com/actions/runner-images/issues/7671

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

### Current issue
- [ ] For default xcode 15.2, that come with the MacOS-13, We Need to
update the boost container header boost/container_hash/hash.hpp version
to pass the build
- [x] For xcode 14.2 The Build passed but the `Run React Native Detox
Android e2e Test` Failed.
Possible flaky test, https://github.com/microsoft/onnxruntime/pull/21969
- [x] For xcode 14.3.1 We encountered following issue in `Build React
Native Detox iOS e2e Tests`
```
ld: file not found: /Applications/Xcode_14.3.1.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/arc/libarclite_iphonesimulator.a
clang: error: linker command failed with exit code 1 (use -v to see invocation)
```
Applied following code to the eof in both ios/Podfile and fixed the
issue
```
post_install do |installer|
    installer.generated_projects.each do |project|
        project.targets.each do |target|
            target.build_configurations.each do |config|
                config.build_settings['IPHONEOS_DEPLOYMENT_TARGET'] = '13.0'
            end
        end
    end
end
```


- [x] https://github.com/facebook/react-native/issues/32483

Applying changes to ios/Pofile
```
pre_install do |installer|
  # Custom pre-install script or commands
  puts "Running pre-install script..."

  # Recommended fix for https://github.com/facebook/react-native/issues/32483
  # from https://github.com/facebook/react-native/issues/32483#issuecomment-966784501
  system("sed -i '' 's/typedef uint8_t clockid_t;//' \"${SRCROOT}/Pods/RCT-Folly/folly/portability/Time.h\"")
end
```

- [ ] Detox environment setting up exceeded time out of 120000ms during
iso e2e test


### dependent 

- [x] https://github.com/microsoft/onnxruntime/pull/21159

---------

Co-authored-by: Changming Sun <chasun@microsoft.com>
2024-09-17 10:07:30 -07:00
Changming Sun
59b7b6bb7c
Remove training from web ci pipeline (#22082)
### Description
Remove training from web ci pipeline


### Motivation and Context
2024-09-13 09:52:49 -07:00
mindest
951b1b7160
[CI] Linux ROCm CI Pipeline: fix error, set trigger rules. (#22069)
### Description
* Correct the wrong EP name for ROCm, fix CI error.
* Update `set-trigger-rules.py`.
* Modify the .yml via `set-trigger-rules.py`
2024-09-12 09:54:32 -07:00
Yi Zhang
ae39c40e5b
fix typo in iOS pipeline (#22067)
### Description
<!-- Describe your changes. -->



### Motivation and Context
The parameter isn't correct.
Maybe it hasn't negative impact by chance so far.

d8e64bb529/cmake/CMakeLists.txt (L1712-L1717)
2024-09-12 19:07:42 +08:00
jingyanwangms
4a5d66c15f
Default value 10.2->10.3 in linux-gpu-tensorrt-daily-perf-pipeline.yml (#21823)
### Description
Fix default value 10.2->10.3 in
linux-gpu-tensorrt-daily-perf-pipeline.yml

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-09-10 15:26:16 -07:00
George Wu
31ae11788a
[QNN EP] Update QNN SDK to 2.26 (#22037)
* update default QNN SDK version to 2.26
* enable layernorm implicit bias workaround for QNN 2.26
* update artifact names for py win arm64 and arm64ec to re-enable
ort-qnn-nightly arm64 python packages
2024-09-10 14:03:06 -07:00
Yi Zhang
de7a02beef
Add parameter for flexdonwload (#22009)
### Description
<!-- Describe your changes. -->



### Motivation and Context
Thus, we can run Nuget_Packaging_GPU stage directly
2024-09-08 14:17:55 +08:00
Edward Chen
f3725b9f06
Use output variable from InstallAppleProvisioningProfile task to set provisioning profile UUID. (#22018)
This is more flexible than hardcoding the provisioning profile name or UUID. The name shouldn't usually change but it is not guaranteed to remain constant.
2024-09-06 18:00:34 -07:00
Edward Chen
970ebc2ccf
Fix typo in coreml_supported_mlprogram_ops.md (#22004)
### Description
<!-- Describe your changes. -->

Fix typo: ai:onnx -> ai.onnx

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

Typo.
2024-09-06 12:50:56 +10:00
Edward Chen
0c398b3e52
Update Android NDK version to 27.0.12077973. (#21989)
Upgrade to newer version. r26 will be unsupported soon.
2024-09-05 17:57:24 -07:00
Scott McKay
8b661f7157
Fix DML packaging CIs (#21997)
### Description
<!-- Describe your changes. -->
The DML CIs build native and C# as well as sign DLLs in the same CI.
Some parts of that require .net 8 and some .net 6.
Update to use .net 8 in general, and revert to .net 6 for the signing.


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Fix packaging pipeline.
2024-09-05 22:30:40 +08:00
mindest
30f07758a2
Add packaging version constraint. (#21814)
### Description
Newer `setuptools` requires newer version of `packaging`, due to
function update.

### Motivation and Context
Fixes #21792
2024-09-04 16:57:04 -07:00
Prathik Rao
ed232dc1ef
Sets enable_windows_arm64ec_qnn to false in training CI (#21981)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-09-04 16:01:14 -07:00
Scott McKay
44fc7b443c
Update C# test projects (#21631)
### Description
<!-- Describe your changes. -->
Update various test projects to .net8 from EOL frameworks.
Replace the Xamarin based Android and iOS test projects with a MAUI
based project that uses .net 8.
Add new CoreML flags to C# bindings

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Remove usage of EOL frameworks.
2024-09-05 08:21:23 +10:00
Jian Chen
09d786fc14
Rename ios_packaging.requirements.txt to ios_packaging/requirements.txt (#21936)
### Description
Rename ios_packaging.requirements.txt to ios_packaging/requirements.txt



### Motivation and Context
By doing this, the package within os_packaging/requirements.txt can be
scanned by CG task
2024-09-04 13:18:05 -07:00
Edward Chen
cbf3c50d75
Improve stability of Android ReactNative E2E test (#21969)
- Remove redundant `OnnxruntimeModuleExampleE2ETest CheckOutputComponentExists` test
- Attempt to close any Application Not Responding (ANR) dialog prior to running Android test
- Add `--take-screenshots failing` option to detox test commands to save screenshots on failure
2024-09-04 08:41:07 -07:00
sfatimar
8dba8e3e24
Memory Optimization for Compilation in OVEP (#21872)
Calling Split API Calls Read+Model in lieu of unified Compile Model call
for export compile flow to ensure memory optimization. Freeing up model
proto and serialized string and read model ov ir later to free up memory
for the ahead pipeline
Optimization during EpCtxt flow
All the Graph related operations require all the Node Attributes to be
set while dealing with model instances internally with them, in the
existing implementation these attributes make a copy when constructing a
Graph dynamically during runtime.
Propose to use these attributes in place without creating a copy to
avoid memory allocation / copy while calling these Graph related
functions.
Changes to ensure the bug fixes related to openvino version and epctxt
file path.
Moving Compiler version to C++20 for getting r-value mem optimizations
benefit

### Motivation and Context
This change is required because memory optimization during Compilation
flow is too high.

---------

Co-authored-by: saurabhkale17 <saurabh1.kale@intel.com>
Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>
Co-authored-by: Vishnudas Thaniel S <vishnudas.thaniel.s@intel.com>
Co-authored-by: Javier E. Martinez <javier.e.martinez@intel.com>
Co-authored-by: jatinwadhwa921 <110383850+jatinwadhwa921@users.noreply.github.com>
Co-authored-by: ankitm3k <ankit.maheshkar@intel.com>
Co-authored-by: jatinwadhwa921 <jatin.wadhwa@intel.com>
2024-09-03 13:52:31 -07:00
Yulong Wang
bad00a3657
Add dependency dawn into deps.txt (#21910)
### Description

Add dependency dawn into deps.txt. This is a preparation for introducing
WebGPU EP.
2024-09-02 04:24:28 -07:00
Kyle
b1ae43cbcb
Add Files Signature Validation after Signed by ESRP (#21949)
### Description
<!-- Describe your changes. -->
Files signature validation after signed by ESRP.


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
- Add validation after the ESRP process.
- Make sure the targeting pattern/suffix files are signed successfully
by ESRP.
- If the signature is not Valid, then will fail the following stages.
2024-09-02 17:16:59 +08:00
Yi Zhang
60b07623a2
Add a reminder in set-trigger-rules script (#21929)
### Description
After editing the set-trigger-rules.py, we must run the file.



### Motivation and Context
Obviously the script wasn't run because some files's name are incorrect.
2024-08-30 12:18:10 -07:00
mindest
bfa4da4f65
Add Linux ROCm CI Pipeline (#21798)
### Description

* Add new ROCm CI pipeline (`Linux ROCm CI Pipeline`) focusing on
inference.
* Resolve test errors; disable flaky tests.

based on test PR #21614.
2024-08-30 14:50:32 +08:00
dependabot[bot]
4ac1558498
Bump torch from 1.13.1+cpu to 2.2.0 in /tools/ci_build/github/linux/docker/scripts/training/ortmodule/stage1/torch_eager_cpu (#21919)
Bumps [torch](https://github.com/pytorch/pytorch) from 1.13.1+cpu to
2.2.0.
2024-08-29 21:57:24 -07:00
Yi Zhang
be76e1e1b8
Add dependent stages in nuget packaging pipelines (#21886)
### Description
Since the stage need to download drop-extra, it should add the
dependencies



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-08-29 11:34:10 +08:00
Jian Chen
e95277484e
Adding $(Build.SourcesDirectory)s to the ignoreDirectories (#21878) 2024-08-27 19:56:48 -07:00
George Wu
23f3912334
support both qnn x64 and arm64ec stages in py packaging pipeline (#21880)
both arm64ec and x64 packages are needed.  
x64 is needed for offline context binary generation
and arm64ec is needed for interop with python packages that don't have
prebuilt arm64 packages and only have x64.
2024-08-27 15:07:30 -07:00
Caroline Zhu
b7f09d4c27
Increase timeout for orttraining-linux-gpu pipeline (#21844)
### Description
Increase timeout to 160 minutes

### Motivation and Context
- Recent runs of orttraining-linux-gpu pipeline have been timing out
2024-08-27 11:47:12 -07:00
Jian Chen
7f851f4e61
Removing docker_base_image parameter and variables (#21864)
### Description
Removing `docker_base_image` parameter and variables. From the Cuda
Packaging pipeline.



### Motivation and Context
Since the docker image is hard coded in the 

`onnxruntime/tools/ci_build/github/linux/docker/inference/x86_64/default/cuda12/Dockerfile`
and 

`onnxruntime/tools/ci_build/github/linux/docker/inference/x86_64/default/cuda11/Dockerfile`
This parameter and variable is no longer needed.
2024-08-27 10:36:17 -07:00
Yi Zhang
2877de73e1
sign native dll with correct cert (#21854)
### Description
Fixed #21775



### Motivation and Context
The dlls should be signed with Keycode CP-230012.
The default is the test code sign.
2024-08-26 16:46:19 +08:00
Caroline Zhu
983c4d57a4
Fix typo for react native pipeline (#21845)
### Description
fix typo

### Motivation and Context
[RN pipeline
failing](https://dev.azure.com/onnxruntime/onnxruntime/_build?definitionId=188&_a=summary)
since #21578 with this error:

![image](https://github.com/user-attachments/assets/75e5b968-572f-42cc-9816-7940de464cfa)
2024-08-26 12:05:11 +10:00
Guenther Schmuelling
ba7baae994
Revert "Upgrade emsdk from 3.1.59 to 3.1.62" (#21817)
Reverts microsoft/onnxruntime#21421

Users are seeing chrome memory grow to 16GB before it crashes:
https://github.com/microsoft/onnxruntime/issues/21810

Revert for now so we have time to debug.
2024-08-22 11:21:00 -07:00
Jian Chen
6c1a3f85a6
Do not allow clearing Android logs if the emulator is not running (#21578)
### Description
Do not allow clearing Android logs if the emulator is not running



### Motivation and Context
Previously the Clearing Android logs step stuck until the pipeline
timeout. If one of the previous steps failed.
2024-08-22 10:18:01 -07:00
Yi Zhang
12f426c63f
update size limit check of training GPU wheel (#21762)
### Description
<!-- Describe your changes. -->



### Motivation and Context
The training wheel size limit should be 400M
2024-08-21 09:30:05 +08:00
Tianlei Wu
7c93d5ded1
Upgrade pytorch_lightning to 2.3.3 to fix orttraining_amd_gpu_ci_pipeline (#21789)
### Description
Upgrade pytorch_lightning to fix orttraining_amd_gpu_ci_pipeline
```
#24 1.838 WARNING: Ignoring version 1.6.0 of pytorch_lightning since it has invalid metadata:
#24 1.838 Requested pytorch_lightning==1.6.0 from cee67f4849/pytorch_lightning-1.6.0-py3-none-any.whl has invalid metadata: .* suffix can only be used with `==` or `!=` operators
#24 1.838     torch (>=1.8.*)
#24 1.838            ~~~~~~^
#24 1.838 Please use pip<24.1 if you need to use this version.
#24 1.838 ERROR: Ignored the following versions that require a different python version: 1.14.0 Requires-Python >=3.10; 1.14.0rc1 Requires-Python >=3.10; 1.14.0rc2 Requires-Python >=3.10; 2.1.0 Requires-Python >=3.10; 2.1.0rc1 Requires-Python >=3.10
#24 1.838 ERROR: Could not find a version that satisfies the requirement pytorch_lightning==1.6.0 (from versions: 0.0.2, 0.2, 0.2.2, 0.2.3, 0.2.4, 0.2.4.1, 0.2.5, 0.2.5.1, 0.2.5.2, 0.2.6, 0.3, 0.3.1, 0.3.2, 0.3.3, 0.3.4, 0.3.4.1, 0.3.5, 0.3.6, 0.3.6.1, 0.3.6.3, 0.3.6.4, 0.3.6.5, 0.3.6.6, 0.3.6.7, 0.3.6.8, 0.3.6.9, 0.4.0, 0.4.1, 0.4.2, 0.4.3, 0.4.4, 0.4.5, 0.4.6, 0.4.7, 0.4.8, 0.4.9, 0.5.0, 0.5.1, 0.5.1.2, 0.5.1.3, 0.5.2, 0.5.2.1, 0.5.3, 0.5.3.1, 0.5.3.2, 0.5.3.3, 0.6.0, 0.7.1, 0.7.3, 0.7.5, 0.7.6, 0.8.1, 0.8.3, 0.8.4, 0.8.5, 0.9.0, 0.10.0, 1.0.0, 1.0.1, 1.0.2, 1.0.3, 1.0.4, 1.0.5, 1.0.6, 1.0.7, 1.0.8, 1.1.0, 1.1.1, 1.1.2, 1.1.3, 1.1.4, 1.1.5, 1.1.6, 1.1.7, 1.1.8, 1.2.0rc0, 1.2.0rc1, 1.2.0rc2, 1.2.0, 1.2.1, 1.2.2, 1.2.3, 1.2.4, 1.2.5, 1.2.6, 1.2.7, 1.2.8, 1.2.9, 1.2.10, 1.3.0rc1, 1.3.0rc2, 1.3.0rc3, 1.3.0, 1.3.1, 1.3.2, 1.3.3, 1.3.4, 1.3.5, 1.3.6, 1.3.7, 1.3.7.post0, 1.3.8, 1.4.0rc0, 1.4.0rc1, 1.4.0rc2, 1.4.0, 1.4.1, 1.4.2, 1.4.3, 1.4.4, 1.4.5, 1.4.6, 1.4.7, 1.4.8, 1.4.9, 1.5.0rc0, 1.5.0rc1, 1.5.0, 1.5.1, 1.5.2, 1.5.3, 1.5.4, 1.5.5, 1.5.6, 1.5.7, 1.5.8, 1.5.9, 1.5.10, 1.6.0rc0, 1.6.0rc1, 1.6.0, 1.6.1, 1.6.2, 1.6.3, 1.6.4, 1.6.5, 1.7.0rc0, 1.7.0rc1, 1.7.0, 1.7.1, 1.7.2, 1.7.3, 1.7.4, 1.7.5, 1.7.6, 1.7.7, 1.8.0rc0, 1.8.0rc1, 1.8.0rc2, 1.8.0, 1.8.0.post1, 1.8.1, 1.8.2, 1.8.3, 1.8.3.post0, 1.8.3.post1, 1.8.3.post2, 1.8.4, 1.8.4.post0, 1.8.5, 1.8.5.post0, 1.8.6, 1.9.0rc0, 1.9.0, 1.9.1, 1.9.2, 1.9.3, 1.9.4, 1.9.5, 2.0.0rc0, 2.0.0, 2.0.1, 2.0.1.post0, 2.0.2, 2.0.3, 2.0.4, 2.0.5, 2.0.6, 2.0.7, 2.0.8, 2.0.9, 2.0.9.post0, 2.1.0rc0, 2.1.0rc1, 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.1.4, 2.2.0rc0, 2.2.0, 2.2.0.post0, 2.2.1, 2.2.2, 2.2.3, 2.2.4, 2.2.5, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.4.0)
#24 1.838 ERROR: No matching distribution found for pytorch_lightning==1.6.0
```
2024-08-19 12:58:22 -07:00
jingyanwangms
c018ba43ef
[Running CI] [TensorRT EP] support TensorRT 10.3-GA (#21742)
### Description
- TensorRT 10.2.0.19 -> 10.3.0.26

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-08-18 13:26:41 -07:00
Edward Chen
63e8849992
build_aar_package.py - Check that executable is present before trying to copy it. (#21730)
Check that executable is present before trying to copy it.

Accommodate builds where we skip building the test executables.
2024-08-16 11:21:09 -07:00
Yi Zhang
8a59b4dc4b
Move Python Training CUDA 12.2 pipeline to another pool. (#21745)
### Description
<!-- Describe your changes. -->



### Motivation and Context
[Python Training CUDA 12.2
pipeline](https://dev.azure.com/aiinfra/Lotus/_build?definitionId=1308&_a=summary)
has been always cancelled by remote provider since Aug 2nd.
But other workflows with the same pool haven't this issue.
 It looks like there're some weird things in Azure devops.
It works by using another pool. In fact, the SKU is smaller than the
old.

### Verification
https://dev.azure.com/aiinfra/Lotus/_build?definitionId=1308&_a=summary
2024-08-15 17:31:56 +08:00
Satya Kumar Jandhyala
6d8de1f7b8
Upgrade emsdk from 3.1.59 to 3.1.62 (#21421)
### Description
Upgrade EM SDK to 3.1.62.



### Motivation and Context
The changes are required to clear wasm64 errors.
2024-08-14 12:38:52 -07:00
Prathik Rao
e32e3575d8
pin pytorch lightning version for training CI (#21731)
### Description
<!-- Describe your changes. -->

Pins pytorch-lightning package to version 2.3.3 since version >=2.4.0
requires torch > 2.1.0 which is not compatible with cu118.


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

ORT 1.19 Release Preparation
2024-08-13 20:04:56 -07:00
Yi Zhang
6db3d63add
move the A100 stage to main build (#21722)
### Description
<!-- Describe your changes. -->



### Motivation and Context
We couldn't get enough A100 agent time to finish the jobs since today.
The PR makes the A100 job only runs in main branch to unblock other PRs
if it's not recovered in a short time.
2024-08-13 22:48:58 +08:00
George Wu
a8462ffb61
enable qnn python arm64ec packaging (#21575)
create the x64 qnn python package as arm64ec so it can be published
publicly.
2024-08-12 22:43:17 -07:00
Yulong Wang
6ae7e02d34
Web CI: make multi-browser test job optional (#21669)
### Description

This job is a little bit unstable. Make it optional to avoid blocking
other PRs before we revise it.
2024-08-09 23:53:26 -07:00
Scott McKay
410ae94e9e
Use zipped xcframework in nuget package (#21663)
### Description
<!-- Describe your changes. -->
The xcframework now uses symlinks to have the correct structure
according to Apple requirements. Symlinks are not supported by nuget on
Windows.

In order to work around that we can store a zip of the xcframeworks in
the nuget package.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Fix nuget packaging build break
2024-08-09 17:38:18 +10:00
Tianlei Wu
a46e49b439
Unblock migraphx and linux GPU training ci pipelines (#21662)
### Description
* Fix migraphx build error caused by
https://github.com/microsoft/onnxruntime/pull/21598:
Add a conditional compile on code block that depends on ROCm >= 6.2.
Note that the pipeline uses ROCm 6.0.

Unblock orttraining-linux-gpu-ci-pipeline and
orttraining-ortmodule-distributed and orttraining-amd-gpu-ci-pipeline
pipelines:
* Disable a model test in linux GPU training ci pipelines caused by
https://github.com/microsoft/onnxruntime/pull/19470:
Sometime, cudnn frontend throws exception that cudnn graph does not
support a Conv node of keras_lotus_resnet3D model on V100 GPU.
Note that same test does not throw exception in other GPU pipelines. The
failure might be related to cudnn 8.9 and V100 GPU used in the pipeline
(Amper GPUs and cuDNN 9.x do not have the issue).
The actual fix requires fallback logic, which will take time to
implement, so we temporarily disable the test in training pipelines.
* Force install torch for cuda 11.8. (The docker has torch 2.4.0 for
cuda 12.1 to build torch extension, which it is not compatible cuda
11.8). Note that this is temporary walkround. More elegant fix is to
make sure right torch version in docker build step, that might need
update install_python_deps.sh and corresponding requirements.txt.
* Skip test_gradient_correctness_conv1d since it causes segment fault.
Root cause need more investigation (maybe due to cudnn frontend as
well).
* Skip test_aten_attention since it causes assert failure. Root cause
need more investigation (maybe due to torch version).
* Skip orttraining_ortmodule_distributed_tests.py since it has error
that compiler for torch extension does not support c++17. One possible
fix it to set the following compile argument inside setup.py of
extension fused_adam: extra_compile_args['cxx'] = ['-std=c++17'].
However, due to the urgency of unblocking the pipelines, just disable
the test for now.
* skip test_softmax_bf16_large. For some reason,
torch.cuda.is_bf16_supported() returns True in V100 with torch 2.3.1, so
the test was run in CI, but V100 does not support bf16 natively.
* Fix typo of deterministic

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-08-08 19:44:15 -07:00
Scott McKay
d616025884
Match changes in gh-pages PR (#21628)
### Description
<!-- Describe your changes. -->
Update to match #21627 and make the info for Split consistent.

As a Split that doesn't split anything is a no-op it doesn't seem
meaningful to call that limitation out in the docs.


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-08-08 10:29:15 +10:00
Adrian Lizarraga
0acefc7988
[QNN EP] Update QNN SDK to 2.25 (#21623)
### Description
- Update pipelines to use QNN SDK 2.25 by default
- Update ifdef condition to apply workaround for QNN LayerNorm
validation bug to QNN SDK 2.25 (as well as 2.24)



### Motivation and Context
Use the latest QNN SDK
2024-08-06 09:08:48 -07:00
Yi Zhang
0d1da41ca8
Fix docker image layer caching to avoid redundant docker building and transient connection exceptions. (#21612)
### Description
Improve docker commands to make docker image layer caching works.
It can make docker building faster and more stable.
So far, A100 pool's system disk is too small to use docker cache.
We won't use pipeline cache for docker image and remove some legacy
code.

### Motivation and Context
There are often an exception of
```
64.58 + curl https://nodejs.org/dist/v18.17.1/node-v18.17.1-linux-x64.tar.gz -sSL --retry 5 --retry-delay 30 --create-dirs -o /tmp/src/node-v18.17.1-linux-x64.tar.gz --fail
286.4 curl: (92) HTTP/2 stream 0 was not closed cleanly: INTERNAL_ERROR (err 2)
```
Because Onnxruntime pipeline have been sending too many requests to
download Nodejs in docker building.
Which is the major reason of pipeline failing now

In fact, docker image layer caching never works.
We can always see the scrips are still running
```
#9 [3/5] RUN cd /tmp/scripts && /tmp/scripts/install_centos.sh && /tmp/scripts/install_deps.sh && rm -rf /tmp/scripts
#9 0.234 /bin/sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
#9 0.235 /bin/sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
#9 0.235 /tmp/scripts/install_centos.sh: line 1: !/bin/bash: No such file or directory
#9 0.235 ++ '[' '!' -f /etc/yum.repos.d/microsoft-prod.repo ']'
#9 0.236 +++ tr -dc 0-9.
#9 0.236 +++ cut -d . -f1
#9 0.238 ++ os_major_version=8
....
#9 60.41 + curl https://nodejs.org/dist/v18.17.1/node-v18.17.1-linux-x64.tar.gz -sSL --retry 5 --retry-delay 30 --create-dirs -o /tmp/src/node-v18.17.1-linux-x64.tar.gz --fail
#9 60.59 + return 0
...
```

This PR is improving the docker command to make image layer caching
work.
Thus, CI won't send so many redundant request of downloading NodeJS.
```
#9 [2/5] ADD scripts /tmp/scripts
#9 CACHED

#10 [3/5] RUN cd /tmp/scripts && /tmp/scripts/install_centos.sh && /tmp/scripts/install_deps.sh && rm -rf /tmp/scripts
#10 CACHED

#11 [4/5] RUN adduser --uid 1000 onnxruntimedev
#11 CACHED

#12 [5/5] WORKDIR /home/onnxruntimedev
#12 CACHED
```

###Reference
https://docs.docker.com/build/drivers/

---------

Co-authored-by: Yi Zhang <your@email.com>
2024-08-06 21:37:09 +08:00
Edward Chen
a5ce65d87a
Clean up some mobile package related files and their usages. (#21606)
The mobile packages have been removed.
2024-08-05 16:38:20 -07:00