### Description
fix session option access in Node.js binding
### Motivation and Context
This is a bug that affect transformer.js using ONNX Runtime Node.js
binding. Issue: #17377
This bug is already fixed in main branch, but it is not picked in 1.16
release.
Cherry-pick the following PRs to the release branch:
Fix: Fail to skip disabledmodel in winml (#17728)
Move dotnet build and test into docker in Linux CPU CI (#17417)
Run Nuget_Test_Linux_GPU in container (#17452)
Run Final_Jar_Testing_Linux_GPU in docker (#17533)
TreeEnsemble speed up (#17449)
Remove onnxruntime extensions from list of gitmodules (#17615)
Include onnxruntime_float16.h in the package. (#17637)
Fix static quantization for QDQ and Percentile distribution (#17649)
[TensorRT EP] Back out the PerThreadContext (#17690)
Update nodejs to 18.x (#17657)
Update linux-wasm-ci.yml: remove the ln command (#17735)
### Description
1. Delete Prefast tasks (#17522)
2. Disable yum update (#17551)
3. Avoid calling patchelf (#17365 and #17562) we that we can validate
the above fix
The main problem I'm trying to solve is: our GPU package depends on both
CUDA 11.x and CUDA 12.x . However, it's not easy to see the information
because ldd doesn't work with the shared libraries we generate(see issue
#9754) . So the patchelf change are useful for me to validate the
"Disabling yum update" was successful. As you can see we call "yum
update" from multiple places. Without some kind of validation it's hard
to say if I have covered all of them.
The Prefast change is needed because I'm going to update the VM images
in the next a few weeks. In case of we need to publish a patch release
after that.
### Motivation and Context
Without this fix we will mix using CUDA 11.x and CUDA 12.x. And it will
crash every time when we use TensorRT.
Cherry-pick #17507 for rel-1.16.0.
Note: The PR 17507 contains the part of engine decryption refactor that
we don't want to include it in ORT 1.16 release. This cherry pick PR
excludes this part.
### Description
Remove 52 from CMAKE_CUDA_ARCHITECTURES to reduce Nuget package size.
### Motivation and Context
PR #17227 increased binary size by 20%. Right the package size is about
260MB. However, nuget has a hard limit of 250MB. Without this change we
cannot publish the package.
Disable QNN QDQ test for release branch
### Description
Disable QNN QDQ test for release branch to get rid of model test failure
caused by new model update in build image.
### Description
<!-- Describe your changes. -->
Use name of temporary provisioning profile.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
The old provisioning profile no longer works. Switched to a temporary
one that we can use before a new one is available. The temporary one has
a different name.
Alternative to #17454.
### Description
When I worked on PR #17173, I didn't notice that
onnxruntime\core\platform\windows\debug_alloc.cc also needs to call
dbghelp functions like SymInitialize. So, if we use vc runtime's
stacktrace functionality, vc runtime will initialize/uninitialize the
dbghelp library independently and vc runtime's stacktrace helper DLLs
get unloaded before our memory leak checker starts get work. Then we
call SymSetOptions, it crashes.
More details:
In VC runtime the C++23 stacktrace functions are implemented on top of
dbgeng.dll. In C:\Program Files\Microsoft Visual
Studio\2022\Enterprise\VC\Tools\MSVC\14.37.32822\crt\src\stl\stacktrace.cpp,
you can see it has:
```
dbgeng = LoadLibraryExW(L"dbgeng.dll", nullptr, LOAD_LIBRARY_SEARCH_SYSTEM32);
```
The dbgeng.dll is a wrapper around dbghelp.dll. It calls SymInitialize
and SymCleanup. dbgeng.dll gets unloaded before our memory leak check
starts to run. In theory we should be able to call SymInitialize again
if the previous user who called SymInitialize has also called
SymCleanup. However, users can use
SymRegisterCallback/SymRegisterCallback64/SymRegisterCallbackW64 to
register callback functions to dbghelp.dll. These callback functions
need to be alive when SymSetOptions(and some other dbghelp APIs) get
called.
### Motivation and Context
### Description
1. Fix python packaging test pipeline. There was an error in
tools/ci_build/github/linux/run_python_tests.sh that it installed a
released version of onnxruntime python package from pypi.org to run the
test. Supposedly it should pick one from the current build.
2. Refactor the pipeline to allow choosing cmake build type from the web
UI when manually trigger a build. Now this feature is for Linux only.
Because I don't want to change too much when we are about to cut a
release branch. After that I will expand it to all platforms. This
feature is useful for debugging pipeline issues, also, we may consider
having a nightly pipeline to run all tests in Debug mode which may catch
extra bugs because in debug mode we can enforce range check.
Test run:
https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=342674&view=results
### Motivation and Context
Currently the pipeline has a crash error.
AB#18580
### Description
Enable typed binary and support int32 type for binary.
Co-authored-by: Xing Xu <xing.xu@intel.com>
---------
Co-authored-by: Xing Xu <xing.xu@intel.com>
### Description
Add SkipLayerNormalization operator to JSEP.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
Minor PR to move some CUDA only on-device training tests to CPU as well.
This is to make sure we have good coverage for CPU too.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
(1) Remove class BertOptimizationOptions that has been deprecated a long
time ago
(2) Move sys path setttings to `__init__.py`, and update imports
(3) Fix bert_perf_test to run properly.
(4) Fix a onnx path in a whisper test case
(5) Fix a few typos
(6) Update comments in bert_perf_test regarding to graph inputs
### Description
Updates NuGet packaging pipelines to use the correct license name.
### Motivation and Context
The license name changed. See https://github.com/microsoft/onnxruntime/pull/17170
The QNN_Windows_Nuget and Zip-Nuget-* pipelines will not run without this update.
### Description
When functions are inlined and constant nodes are being converted to
initializers, we need to create NodeArg for them.
Similar for inlined function subgraph, but we choose to give priority to
non-constant nodes and then fill the gaps with constant and
initializers.
### Motivation and Context
This addresses issue
https://github.com/microsoft/onnxruntime/issues/16813 for
`eca_halonext26ts_mod.onnx` model where it fails to remove unused
initializer because `NodeArg` was not created for it.
### Description
Move DML build job's Prefast task to a CPU machine pool which has larger
memory. The current one runs out of memory in every run.
### Motivation and Context
To fix the broken python packaging pipeline.
- 'js/web'
- 'js/node'
- 'onnxruntime/core/providers/js'
is updated
### Description
<!-- Describe your changes. -->
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
<!-- Describe your changes. -->
1. Fix reading .dat and .onnx on Linux 2. Fix issue of compiling graph
twice
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
1. Previous we have not tested large model on Linux. When the model is
sperate into .dat and .onnx, it failed to load the model.
2. Check if the provider pointer is already existed. If existed, do not
create again.
This PR handles two changes:
1. There is an issue when running "Debug build" TRT EP with "Release
build" TRT builtin parser on Windows. Enforce use oss parser for Debug
build.
Note: args.config in build.py is an array, for example ["Debug",
"Release"...]. The code will be much mess if we made the change there.
2. Update to use latest commit of oss parser.
Please see the https://github.com/microsoft/onnxruntime/issues/16273
### Description
Re-implement stacktrace. The new implementation doesn't directly use
Windows API, hence can avoid problems regarding to
initialize/uninitialize the dbghelp library.
### Motivation and Context
### Description
For windows headers are not duplicated to the normal cuda include. For
linux they are:
```
(base) maximilianm@maximilianm-dt-linux:~$ ls /usr/local/cuda/include/nvtx3 | grep nvTool
nvToolsExt.h
nvToolsExtCuda.h
nvToolsExtCudaRt.h
nvToolsExtOpenCL.h
nvToolsExtSync.h
(base) maximilianm@maximilianm-dt-linux:~$ ls /usr/local/cuda/include | grep nvTool
nvToolsExt.h
nvToolsExtCuda.h
nvToolsExtCudaRt.h
nvToolsExtOpenCL.h
nvToolsExtSync.h
```
Is the preference via those added defines or should the include just be
changed to be `nvtx3/` ?
Also there is no library linking needed on Windows and the library is
not even present.
### Description
Adds continuous integration and pull-requestion validation triggers
directly to the yaml file for the Windows x64 QNN CI Pipeline.
### Motivation and Context
There have been various unit tests failures that break the
QNN_Windows_Nuget pipeline, which builds QNN EP for Windows x64. This PR
ensures that QNN EP is built and tested on a Windows x64 image for every
pull request.
Dump statistics of input and/or output tensors of each node. It could
help to find out why a model outputs NaN.
To use this tool, just add `--cmake_extra_defines
onnxruntime_DEBUG_NODE_INPUTS_OUTPUTS=1` when build onnxruntime package.
Then set some environment varaibles before running model with
onnxruntime:
```
export ORT_DEBUG_NODE_IO_DUMP_INPUT_DATA=1
export ORT_DEBUG_NODE_IO_DUMP_OUTPUT_DATA=1
export ORT_DEBUG_NODE_IO_DUMP_STATISTICS_DATA=1
```
Then statistics data will be appended after the dumping of input and
output tensors.
One possible cause of a FP16 or mixed precision model outputs NaN: some
number exceeds the limit of FP16 (like max FP16 value is 65504). When a
fp32 model has value > 65504 in a node output, it will become INF when
converting the node to FP16. In this case, you need keep related nodes
in FP32 to avoid the issue. You can dump tensor statistics of FP32 model
to find out such candidate nodes.
### Description
Fix a typo. LayerNormalization takes 2 or 3 inputs. The third input,
bias, is optional.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
Bug fix for OVEP graph provider options and fallback
### Motivation and Context
A bug fix logic is added to handle the fallback to CPU EP.
Corner case Assertions are added for ProviderOptions in OpenVINO.
---------
Co-authored-by: Sahar Fatima <sfatima.3001@gmail.com>
Co-authored-by: Saurabh Kale <saurabh1.kale@intel.com>
### Description
<!-- Describe your changes. -->
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->