### Description
[DirectML EP] Add DML EP registration for Col2Im operator
### Motivation and Context
Add Col2Im support for opset 18.
This operator is implemented as the DirectML Fold operator.
---------
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>
### Fix build when flash attention and memory efficient attention are
disabled
On a customer env with lower version of CUDA < 11.6. Both flash
attention and memory efficient attention is turned OFF according to
e8f33b54ba/cmake/CMakeLists.txt (L701).
So
e8f33b54ba/cmake/external/cutlass.cmake (L1)
condition check return false. No cutlass lib is built.
```
Turn off flash attention since CUDA compiler version < 11.6
```
While, the kernels in
https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/contrib_ops/cuda/moe/ft_moe
are depending on cutass for its build, so we get error like this:
```
[ 77%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/tmp/onnxruntime/onnxruntime/contrib_ops/cuda/moe/ft_moe/moe_gemm_kernels_fp16_fp16.cu.o
In file included from /tmp/onnxruntime/onnxruntime/contrib_ops/cuda/moe/ft_moe/moe_gemm_kernels_fp16_fp16.cu:17:
/tmp/onnxruntime/onnxruntime/contrib_ops/cuda/moe/ft_moe/moe_gemm_kernels_template.h:23:10: fatal error: cutlass/array.h: No such file or directory
23 | #include "cutlass/array.h"
| ^~~~~~~~~~~~~~~~~
compilation terminated.
In file included from /tmp/onnxruntime/onnxruntime/contrib_ops/cuda/moe/ft_moe/moe_gemm_kernels_fp16_fp16.cu:17:
/tmp/onnxruntime/onnxruntime/contrib_ops/cuda/moe/ft_moe/moe_gemm_kernels_template.h:23:10: fatal error: cutlass/array.h: No such file or directory
23 | #include "cutlass/array.h"
| ^~~~~~~~~~~~~~~~~
compilation terminated.
In file included from /tmp/onnxruntime/onnxruntime/contrib_ops/cuda/moe/ft_moe/moe_gemm_kernels_fp16_fp16.cu:17:
/tmp/onnxruntime/onnxruntime/contrib_ops/cuda/moe/ft_moe/moe_gemm_kernels_template.h:23:10: fatal error: cutlass/array.h: No such file or directory
23 | #include "cutlass/array.h"
| ^~~~~~~~~~~~~~~~~
compilation terminated.
In file included from /tmp/onnxruntime/onnxruntime/contrib_ops/cuda/moe/ft_moe/moe_gemm_kernels_fp16_fp16.cu:17:
/tmp/onnxruntime/onnxruntime/contrib_ops/cuda/moe/ft_moe/moe_gemm_kernels_template.h:23:10: fatal error: cutlass/array.h: No such file or directory
23 | #include "cutlass/array.h"
| ^~~~~~~~~~~~~~~~~
compilation terminated.
fatal : Could not open input file /tmp/tmpxft_00044da3_00000000-11_moe_gemm_kernels_fp16_fp16.compute_60.cpp1.ii
make[2]: *** [CMakeFiles/onnxruntime_providers_cuda.dir/build.make:6290: CMakeFiles/onnxruntime_providers_cuda.dir/tmp/onnxruntime/onnxruntime/contrib_ops/cuda/moe/ft_moe/moe_gemm_kernels_fp16_fp16.cu.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [CMakeFiles/Makefile2:2210: CMakeFiles/onnxruntime_providers_cuda.dir/all] Error 2
make: *** [Makefile:166: all] Error 2
Traceback (most recent call last):
File "/tmp/onnxruntime/tools/ci_build/build.py", line 2746, in <module>
sys.exit(main())
File "/tmp/onnxruntime/tools/ci_build/build.py", line 2639, in main
build_targets(args, cmake_path, build_dir, configs, num_parallel_jobs, args.target)
File "/tmp/onnxruntime/tools/ci_build/build.py", line 1527, in build_targets
run_subprocess(cmd_args, env=env)
File "/tmp/onnxruntime/tools/ci_build/build.py", line 824, in run_subprocess
return run(*args, cwd=cwd, capture_stdout=capture_stdout, shell=shell, env=my_env)
File "/tmp/onnxruntime/tools/python/util/run.py", line 49, in run
completed_process = subprocess.run(
File "/opt/conda/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
```
### Motivation and Context
To summarize, there are two cases we will have build failure for Linux
CUDA build:
1. User use cuda version < 11.6
2. User disabled Flash attention and memory efficient attention
explictly with onnxruntime_USE_FLASH_ATTENTION and
onnxruntime_USE_MEMORY_EFFICIENT_ATTENTION
Add ACL as the DNNL runtime option for aarch64 platforms. Update
makefile and the python wheel build script.
### Description
<!-- Describe your changes. -->
Add ACL as the DNNL runtime option for aarch64 platforms. Update
makefile and the python wheel build script.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
This is to enable the optimized ACL gemm kernels for dnnl execution
provider on aarch64 platform.
### Description
Truncate traling non-existing arguments.
Make sure we do not skip on the non-existing arguments in the middle,
because shape inferece relies on their proper position.
This also affects the argument position in the Edges that must be
properly rebuilt
each time If node branch is inlined.
Make sure that when we rename Defs in subgraphs, new renamed defs are
created in those subgraphs
instead of pointing to outer scope defs.
Add unit test.
### Motivation and Context
This is a follow up for
https://github.com/microsoft/onnxruntime/pull/18105
Currently, the non-trailing arguments are simply ignored and the edges
are created
with potentially incorrect positions.
### Description
<!-- Describe your changes. -->
1. Introduce MoE CUDA op to ORT based on FT implementation.
2. Upgrade cutlass to 3.1.0 to avoid some build failures on Windows.
Remove patch file for cutlass 3.0.0.
3. Sharded MoE implementation will come with another PR
limitation: __CUDA_ARCH__ >= 700
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
<!-- Describe your changes. -->
Add 32-bit patch binary and infra to fallback to it. The Azure devops
Windows CIs are missing patch.exe from their git install for some reason
so the default `find_package(Patch)` fails as that is where it expects
to find it.
Remove Eigen patch. Underlying issue was fixed in source 3 years ago by
c6c84ed961
and the patch command is invalid (args are for git apply not patch).
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Make usage of patch consistent across all CIs
Fix https://github.com/microsoft/onnxruntime/issues/15248
### Description
Some CMake scripts reference Microsoft.GSL::GSL. Most of the time, the
GSL package that is found on the system is used. However, when cuda is
enabled, it is downloaded and patched. Most CMake scripts rely on the
first case and forget about the second. This patch makes the second case
behave like the first case.
### Motivation and Context
This is an issue that occurs 'in the wild'. For example, I had to patch
this to be able to enable the CUDA provider for the onnxruntime conan
package (see https://github.com/conan-io/conan-center-index/pull/20392).
### Description
<!-- Describe your changes. -->
Fix the broken pieces due to the latest Abseil update.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
Make the debugging bearable.
### Description
<!-- Describe your changes. -->
Update XNNPACK to latest version
- adds fp16 kernels and various other improvements
- requires pthreadpool update as well
Most code updates in the XNNPACK EP are to adjust to the new XNNPACK API
- 'setup' is split into 'reshape' and 'setup'
- some ops use a workspace buffer
- copied workspace allocation from XNNPACK unit test code
- some suffixes changed
Added wrapper for XNNPACK caches to base XNNPACK EP kernel
- simplifies usage
- XNNPACK split out the code and weights caches, but the code cache
isn't currently usable via the public API
- we could use the internal types if we think it's required for
performance reasons. non-trivial though as we'd need to propagate ifdef
values from the XNNPACK build up to the ORT build.
- using XNNPACK internals would also mean we would not be able to
support using a pre-build XNNPACK package
- not an issue currently
Fixed opset registration for internal NHWC domain
- was not being tied to the ONNX version, so nodes inserted by layout
transformation had the incorrect opset
- a number of other places needed updating once this issue was fixed
Remove support for NCHW Resize from XNNPACK EP so it's NHWC only
- we only supported NCHW for fp32,
- doing so adds complexity in multiple places (XNNPACK EP kernel
implementation, layout transformation and transpose optimization)
- unclear if that complexity provides any benefit. can add back if
required by production scenario
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
We're looking at enabling fp16 support for CoreML and NNAPI. If we do
that we need a good fallback story if the CPU EP will be used. The
XNNPACK fp16 kernels will hopefully provide that.
NOTE: This PR doesn't add fp16 support to the XNNPACK EP kernels. That
can be done as required in separate EPs and should be relatively simple
to do.
### Description
Google test can be built either with absl/re2 or not. This PR enables
the build option so that google test framework can print out a nice
stacktrace when something went wrong. It helps locate test errors in CI
build pipelines.
Also, Google test will remove the build option and make it always ON. So
sooner or later we must make this change.
This reverts commit bb136f86c8, then
re-implement it in a different way.
I reverted the original change, then added a version constraint to the
find_package args.
If you still found it picks up wrong gtest version after this change,
you may disable `find_package` by setting
'FETCHCONTENT_TRY_FIND_PACKAGE_MODE' to NEVER. For example, the latest
gtest version is 1.14.0. If at a later time Google releases a new
version of gtest and that one is incompatible with the ONNX Runtime
source code you get today and your dev environment already installed the
new version and you do not want to create a new clean build environment
that is without the package, you can add `--cmake_extra_defines
FETCHCONTENT_TRY_FIND_PACKAGE_MODE=NEVER` to your build command to solve
the problem.
### Description
1. `onnxruntime_fetchcontent_makeavailable` works around unconditional
install commands so that can be used instead of `FetchContent_Populate`
2. This dependency is Windows specific, mark it as such.
### Motivation and Context
1. This simplifies `cmake/external/wil.cmake` not to do anything
specific wether WIL was fetched or found
2. Given it's specific to Windows, it might not be available on other OS
in specific air-gapped environment such as
[conan-center-index](https://github.com/conan-io/conan-center-index).
This allows downstream builds not to require specific patches for
something not required by the build in the first place.
### Description
Remove the onnxruntime-extensions submodule since it now was used via
cmake FetchContent
### Motivation and Context
The submodule relies on an outdated version of the extensions, and the
build instructions should be updated to eliminate any confusion.
### Description
This change upgrade emsdk to 3.1.44.
Because backend is upgraded to LLVM 16, so need to fix a lot of build
failures caused by "-Wshorten-64-to-32".
most of the build failures comes from generated `onnx.pb.h`, and this
can be fixed by including "core/graph/onnx_protobuf.h", which detects
and ignore shorten-64-to-32 warnings.
### Description
<!-- Describe your changes. -->
Adjust nativs to display tagged strings.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Hard to debug without seeing names.
### Description
1. As a follow-up of #16761, this PR allows build ORT on iOS/Android
without the need to explicitly specify a protoc path. #16761 is for
WASM. This one is for iOS/Android
2. Update the MacOS/Linux build scripts that build/install protobuf from
source. Make them be more flexible. Add the support for
RedHatEnterprise(ubi), which will needed for upgrading the base image
from centos:7 to ubi:8.
3. Update tools/ci_build/github/pai/rocm-ci-pipeline-env.Dockerfile :
the docker file's base image has preinstalled protobuf in /usr/local, we
should uninstall them to avoid conflicts.
### Description
Changes allow downloading prebuilt protoc compiler when building
WebAssebly version on mac systems.
Otherwise it tries to build a js/wasm version of protoc and throws an
error while executing it: "protoc.js permission denied"
### Motivation and Context
I need to switch between my main working computer and a PC to make
changes to WebAssebly build. Would like not to do that anymore.
Otherwise, an unsupported version of gtest/gmock will be found at
/opt/conda/include for ROCm builds. Though this issue was initially
found for ROCm builds, the issue is generic. onnxruntime requires a
specific version of googletest and should not rely on locating
googletest using find_package.
The ROCm error was:
```
In file included from /opt/conda/include/gmock/gmock-spec-builders.h:75,
from /opt/conda/include/gmock/gmock-generated-function-mockers.h:47,
from /opt/conda/include/gmock/gmock-function-mocker.h:39,
from /opt/conda/include/gmock/gmock.h:61,
from /stage/onnxruntime/onnxruntime/test/util/test_utils.cc:17:
/opt/conda/include/gmock/gmock-matchers.h: In instantiation of ‘bool testing::internal::PointwiseMatcher<TupleMatcher, RhsContainer>::Impl<LhsContainer>::
MatchAndExplain(LhsContainer, testing::MatchResultListener*) const [with LhsContainer = const gsl::span<const float>&; TupleMatcher = testing::internal::
FloatingEq2Matcher<float>; RhsContainer = gsl::span<const float>]’:
/opt/conda/include/gmock/gmock-matchers.h:2303:10: required from here
/opt/conda/include/gmock/gmock-matchers.h:2312:48: error: no type named ‘const_iterator’ in ‘testing::internal::PointwiseMatcher<testing::internal::
FloatingEq2Matcher<float>, gsl::span<const float> >::Impl<const gsl::span<const float>&>::LhsStlContainer’ {aka ‘class gsl::span<const float>’}
```
- Fix flatbuffers flatc warning, unused-but-set-variable.
- Address `-Wshorten-64-to-32` warnings (fix in our code, allow in dependencies' code).
- Update CI builds to use Xcode 14.3.
- Update minimum iOS version to 12.0.
- Update Mac hosted agents to MacOS 13 where possible.
### Description
DML_PACKAGE_DIR cmake variable is not getting set properly when dml_path
build options is used.
### Motivation and Context
- Why is this change required? What problem does it solve?
It is required for DML Perf dashboard.
<!--- If it fixes an open issue, please link to the issue here. -->
### Description
Remove the "onnxruntime_BUILD_WEBASSEMBLY" cmake option. Use `if
(CMAKE_SYSTEM_NAME STREQUAL "Emscripten")` instead. It makes some code
look more nature.
For example,
```cmake
if (CMAKE_SYSTEM_NAME STREQUAL "iOS" OR CMAKE_SYSTEM_NAME STREQUAL "Android" OR onnxruntime_BUILD_WEBASSEMBLY)
```
becomes
```cmake
if (CMAKE_SYSTEM_NAME STREQUAL "iOS" OR CMAKE_SYSTEM_NAME STREQUAL "Android" OR CMAKE_SYSTEM_NAME STREQUAL "Emscripten")
```
### Description
Revert a change in #15797: restore the correct version of emsdk
### Motivation and Context
Without change, when you build it on Windows you will see:
```
2023-05-17 19:41:30,093 build [INFO] - Activating emsdk...
2023-05-17 19:41:30,093 util.run [INFO] - Running subprocess in 'C:\src\onnxruntime2\cmake\external\emsdk'
'C:\src\onnxruntime2\cmake\external\emsdk\emsdk.bat' activate 3.1.37
error: tool or SDK not found: '3.1.37'
```
When building the FlatBuffers dependencies, gcc13 emits a
stringop-overflow warning. All warnings being turned into errors, that
fails the compilation of FlatBuffers, and as a consequence also fails
the build of onnxruntime.
This commit adds the application of a patch to FlatBuffers's
CMakeList.txt, to add -Wno-error=stringop-overflow to the
CMAKE_CXX_FLAGS.
### Description
this is for ort 1.15 release to work with onnx 1.14
It shall be merged after onnx 1.14 release and before ort 1.15 release.
### Motivation and Context
---------
Signed-off-by: Liqun Fu <liqfu@microsoft.com>
### Description
- Update DML version to 1.11.0
- Disable Gemm+Softmax fusion
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
Download protoc from Github Release instead of Nuget to avoid having
dependency on nuget.exe on Linux
### Motivation and Context
To avoid having dependency on nuget.exe on Linux. Many users' build
environment do not have nuget or dotnet.
### Description
While building ORT for DML EP with `dml_EXTERNAL_PROJECT` flag, 2
variables (`DML_SHARED_LIB`, `DML_PACKAGE_DIR`) value is not set
properly.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->