### Description
[Successful pipeline
run](https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1123141&view=results)
Added flag to build the training artifacts & updated the
pull-wasm-artifacts script to pull the training artifacts as well.
Bundled into this PR are minor formatting fixes + naming fixes.
### Motivation and Context
[This PR](https://github.com/microsoft/onnxruntime/pull/16521) extended
the WASM API wrapper to build training WASM artifacts as well.
The ORT training WASM artifacts are required to support ORT training web
bindings.
### Description
1. Update docker files and their build instructions.
ARM64 and x86_64 can use the same docker file.
2. Upgrade Linux CUDA pipeline's base docker image from CentOS7 to UBI8
AB#18990
### Description
Git commands producing `git-commid-id` and `git-branch` are always run
in `CMAKE_CURRENT_SOURCE_DIR` (i.e. `onnxruntime/cmake`)
### Motivation and Context
Please refer to corresponding issue
[#17197](https://github.com/microsoft/onnxruntime/issues/17197).
In flatbuffers@v23.5.9 was broken forward declaration for
FlatBufferBuilder. Trying to compile onnxruntime falls with the
following error:
```
flatbuffers/include/flatbuffers/flatbuffer_builder.h:1420:38: error: typedef redefinition with different types ('FlatBufferBuilderImpl<false>' vs 'flatbuffers::FlatBufferBuilder')
typedef FlatBufferBuilderImpl<false> FlatBufferBuilder;
^
onnx_runtime/include/onnxruntime/core/graph/graph.h:47:11: note: previous definition is here
class FlatBufferBuilder;
```
This PR removes these declarations and puts includes instead
### Description
* Created `wasm/training_api` source and header files & modified
WebAssembly CMake to include training flags
* The `wasm/training_api` files use an `OrtTrainingManager` handle which
is a struct of an OrtCheckpointState and an OrtTrainingSession, rather
than creating a CheckpointState handle & a separate TrainingSession
handle.
* This is so that the TypeScript side only has to manage one handle that
will be passed between TrainingSession & CheckpointState
representations, rather than the TypeScript side managing separate
CheckpointStateHandle and TrainingSessionHandle.
### Motivation and Context
WASM API needs to be updated with ORT training API function calls so
that ORT training web bindings can be added for on-device training.
---------
Co-authored-by: Baiju Meswani <bmeswani@microsoft.com>
Co-authored-by: carzh <carolinezhu@microsoft.com>
Co-authored-by: Ashwini Khade <askhade@microsoft.com>
### Description
This PR contains changes to support error pop and kernel name.
- Add a function `JsepGetNodeName` to allow reading kernel name from JS
to C++
- When in debug mode ( `env.debug = true;` ) or in profiling mode (
`env.webgpu.profilingMode = 'default';` ), kernel name will be read from
ORT; otherwise use the kernel pointer ( a number ) as kernel name to
save calls from JS to C++.
- When in debug mode, WebGPU validation errors will be recorded and if
any error occurs, `inferenceSession.run()` will fail (Promise get
rejected). Behavior when not in debug mode is not changed. This is
because recording errors are not zero-overhead, and GPU validation
errors should occur consistently in and not in debug mode.
- Add `jsepOnRunStart()` and `jsepOnRunEnd()` hook to:
- allow implementation of the features mentioned above.
- pass session ID to backend.
### Description
This PR adds the following scripts for LLaMA:
- LLaMA conversion (support for TorchScript and Dynamo exporters)
- LLaMA parity
- LLaMA benchmark
- LLaMA quantization
- LLaMA integration with [Hugging Face
Optimum](https://github.com/huggingface/optimum)
### Motivation and Context
This PR adds scripts for using LLaMA. There is a [follow-up
PR](https://github.com/microsoft/onnxruntime/pull/17043) for adding
scripts for Whisper.
Enable verbose logging in unit test program with environment variable.
E.g., `ORT_UNIT_TEST_MAIN_LOG_LEVEL=0 ./onnxruntime_test_all --gtest_filter="<test that I want to see more logs for>"`.
### Description
When I worked on PR #17173, I didn't notice that
onnxruntime\core\platform\windows\debug_alloc.cc also needs to call
dbghelp functions like SymInitialize. So, if we use vc runtime's
stacktrace functionality, vc runtime will initialize/uninitialize the
dbghelp library independently and vc runtime's stacktrace helper DLLs
get unloaded before our memory leak checker starts get work. Then we
call SymSetOptions, it crashes.
More details:
In VC runtime the C++23 stacktrace functions are implemented on top of
dbgeng.dll. In C:\Program Files\Microsoft Visual
Studio\2022\Enterprise\VC\Tools\MSVC\14.37.32822\crt\src\stl\stacktrace.cpp,
you can see it has:
```
dbgeng = LoadLibraryExW(L"dbgeng.dll", nullptr, LOAD_LIBRARY_SEARCH_SYSTEM32);
```
The dbgeng.dll is a wrapper around dbghelp.dll. It calls SymInitialize
and SymCleanup. dbgeng.dll gets unloaded before our memory leak check
starts to run. In theory we should be able to call SymInitialize again
if the previous user who called SymInitialize has also called
SymCleanup. However, users can use
SymRegisterCallback/SymRegisterCallback64/SymRegisterCallbackW64 to
register callback functions to dbghelp.dll. These callback functions
need to be alive when SymSetOptions(and some other dbghelp APIs) get
called.
### Motivation and Context
This PR handles two changes:
1. There is an issue when running "Debug build" TRT EP with "Release
build" TRT builtin parser on Windows. Enforce use oss parser for Debug
build.
Note: args.config in build.py is an array, for example ["Debug",
"Release"...]. The code will be much mess if we made the change there.
2. Update to use latest commit of oss parser.
Please see the https://github.com/microsoft/onnxruntime/issues/16273
### Description
Re-implement stacktrace. The new implementation doesn't directly use
Windows API, hence can avoid problems regarding to
initialize/uninitialize the dbghelp library.
### Motivation and Context
### Description
For windows headers are not duplicated to the normal cuda include. For
linux they are:
```
(base) maximilianm@maximilianm-dt-linux:~$ ls /usr/local/cuda/include/nvtx3 | grep nvTool
nvToolsExt.h
nvToolsExtCuda.h
nvToolsExtCudaRt.h
nvToolsExtOpenCL.h
nvToolsExtSync.h
(base) maximilianm@maximilianm-dt-linux:~$ ls /usr/local/cuda/include | grep nvTool
nvToolsExt.h
nvToolsExtCuda.h
nvToolsExtCudaRt.h
nvToolsExtOpenCL.h
nvToolsExtSync.h
```
Is the preference via those added defines or should the include just be
changed to be `nvtx3/` ?
Also there is no library linking needed on Windows and the library is
not even present.
### Description
1. Clean up cmake files. Remove some unused code
2. Remove the "Semmle" task from
tools/ci_build/github/azure-pipelines/templates/win-ci.yml. Semmle is
deprecated and replaced by CodeQL.
### Description
1. `onnxruntime_fetchcontent_makeavailable` works around unconditional
install commands so that can be used instead of `FetchContent_Populate`
2. This dependency is Windows specific, mark it as such.
### Motivation and Context
1. This simplifies `cmake/external/wil.cmake` not to do anything
specific wether WIL was fetched or found
2. Given it's specific to Windows, it might not be available on other OS
in specific air-gapped environment such as
[conan-center-index](https://github.com/conan-io/conan-center-index).
This allows downstream builds not to require specific patches for
something not required by the build in the first place.
### Description
Remove the onnxruntime-extensions submodule since it now was used via
cmake FetchContent
### Motivation and Context
The submodule relies on an outdated version of the extensions, and the
build instructions should be updated to eliminate any confusion.
### Description
This PR fixes build break for WebAssembly introduced in
6986981482
(435ad2b1d8).
This change updates onnx.patch in onnxruntime repo. the corresponding PR
in onnx repo is: https://github.com/onnx/onnx/pull/5495.
It may takes a while for the next onnx version bump.
### Description
Some pipelines are failing. It is because PR #16325 set ONNX version to
`rel-1.14.1` . It is a branch name, not a commit or tag name. It means
whenever the branch got a new commit, we will auto pick it and use it.
### Description
This change upgrade emsdk to 3.1.44.
Because backend is upgraded to LLVM 16, so need to fix a lot of build
failures caused by "-Wshorten-64-to-32".
most of the build failures comes from generated `onnx.pb.h`, and this
can be fixed by including "core/graph/onnx_protobuf.h", which detects
and ignore shorten-64-to-32 warnings.
This is in preparation for planned ROCm 6.0 changes that are not
backward compatible. However, the adjustments made by this PR to the
current onnxruntime cmake files will work with ROCm 5.x and 6.x.
### Description
1. Add "--windows_sdk_version" argument to build.py
2. Fix Windows Static Analysis build pipeline. It is failing because it
picks up a different Windows SDK version after a build machine image
update. If we can explicitly specify Windows SDK version, we can avoid
such things happening again.
3. Remove --enable_training from Windows Static Analysis build pipeline
because PR #16993 makes it incompatible with "no_rtti".
AB#18315
OpenVINO EP ORT 5.1 Branch
Changes for the new API to take in OpenVINO Provider Options
and compatibility with OV 2023.1
### Motivation and Context
The change is required for the new API to take in OpenVINO Provider
Options
and make it seamless.
---------
Signed-off-by: MaajidKhan <n.maajid.khan@intel.com>
Co-authored-by: saurabhintel0 <saurabh1.kale@intel.com>
Co-authored-by: MaajidKhan <n.maajid.khan@intel.com>
Co-authored-by: Suryaprakash Shanmugam <suryaprakash.shanmugam@intel.com>
Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>
### Description
<!-- Describe your changes. -->
Adjust nativs to display tagged strings.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Hard to debug without seeing names.
### Motivation and Context
When we handle PyTorch models' inputs in different places (ORTModule or
others), it's common for us to flatten a structured data into a 1-D
tensor list (required by lib for example torch.onnx.export,
torch.autograd.Function.forward or ORT inference session), then do
subsequent work, then unflatten back to original hierarchy as returned
values.
DeepStage3 hooks support work also need such a lib to do similar things,
so I was proposing to extract this pair of APIs in training/utils/,
which can be more used more generally. Also a comprehensive set of test
data are used for testing unflatten/flatten in unit tests.
Let me know if you have any other suggestions.
### Refactor schema extraction and output unflattening
Move `_extract_schema` and `unflatten_user_output` in
`orttraining/orttraining/python/training/ortmodule/_io.py` . to
`extract_data_and_schema` and `unflatten_data_using_schema` in
`orttraining/orttraining/python/training/utils/torch_io_helper.py` as
shared libs, which can be used later by other features (deepspeed stage
3 hook rewrite).
While there are still a few duplicated logic handling flatten with
different task by recursively loop the data struct, will change them
step by step in case of heavy review efforts.
Last week I fixed error #16484 found when trying to build onnxruntime
with the icpx compiler. Another thing I found out is that icpx uses
-ffast-math flag by default. You can check it by running the compiler
with -v flag like following:
```bash
# Setup the environment
. /opt/intel/oneapi/setvars.sh
# Compile any file to see all the implicit flags
icpx -v main.cpp
```
This leads to a bunch of warnings during the build like:
```bash
In file included from /mnt/f/wsl_home/onnxruntime/onnxruntime/test/providers/cpu/tensor/upsample_op_test.cc:5:
In file included from /mnt/f/wsl_home/onnxruntime/onnxruntime/test/providers/provider_test_utils.h:6:
In file included from /mnt/f/wsl_home/onnxruntime/onnxruntime/test/providers/checkers.h:10:
In file included from /mnt/f/wsl_home/onnxruntime/onnxruntime/core/util/math_cpuonly.h:68:
In file included from /mnt/f/wsl_home/onnxruntime/build/Linux/RelWithDebInfo/_deps/eigen-src/Eigen/Core:172:
/mnt/f/wsl_home/onnxruntime/build/Linux/RelWithDebInfo/_deps/eigen-src/Eigen/src/Core/MathFunctions.h:1019:12: warning: comparison with NaN always evaluates to false in fast floating point modes [-Wtautological-constant-compare]
return isnan EIGEN_NOT_A_MACRO (x);
^~~~~~~~~~~~~~~~~~~~~~~~~~~
```
And some tests are failing as well, usually with infinities involved. To
list a few:
```bash
# ...
1: [ FAILED ] IsInfTest.test_isinf_float
1: [ FAILED ] IsInfTest.test_isinf_double
1: [ FAILED ] IsInfTest.test_isinf_positive_float
1: [ FAILED ] IsInfTest.test_isinf_positive_double
1: [ FAILED ] IsInfTest.test_isinf_negative_float
1: [ FAILED ] IsInfTest.test_isinf_negative_double
1: [ FAILED ] IsNaNOpTest.IsNaNFloat
1: [ FAILED ] IsNaNOpTest.IsNaNDouble
# ...
```
This PR adds a quick global check for the IntelLLVM compiler, as in the
way its name is reported by CMake and then, depending on the compiler
driver, sets either MSVC-like or GCC-like switch to disable fast-maths.
Probably a bit cleaner solution would be to use
```target_compile_options(${TARGET} PRIVATE MEOW)``` instead of a
global-wide ```set(CMAKE_CXX_FLAGS MEOW)```, but then we'd be required
to add it to all the individual targets and execution providers and this
will lead to a lot of code duplication.
In additions to `onnxruntime_test_all`, `onnxruntime_shared_lib_test`
and `onnxruntime_customopregistration_test` should
also add "-Wno-deprecated-declarations" flag to ignore compiler warning
### Description
1. As a follow-up of #16761, this PR allows build ORT on iOS/Android
without the need to explicitly specify a protoc path. #16761 is for
WASM. This one is for iOS/Android
2. Update the MacOS/Linux build scripts that build/install protobuf from
source. Make them be more flexible. Add the support for
RedHatEnterprise(ubi), which will needed for upgrading the base image
from centos:7 to ubi:8.
3. Update tools/ci_build/github/pai/rocm-ci-pipeline-env.Dockerfile :
the docker file's base image has preinstalled protobuf in /usr/local, we
should uninstall them to avoid conflicts.
### Description
Added Gelu operator to JSEP
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
Add some safety check for conv op.
It is to validate if the attributes coming from a conv op are in a valid
range. (shouldn't be too large or too small).