onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-05-24 22:17:32 +00:00

Author	SHA1	Message	Date
Wenbing Li	d052c8a45c	Remove the extensions submodule (#17097 ) ### Description Remove the onnxruntime-extensions submodule since it is now used via CMake FetchContent ### Motivation and Context The submodule relies on an outdated version of the extensions, and the build instructions should be updated to eliminate any confusion.	2023-08-14 10:16:33 -07:00
Yulong Wang	5704e71b89	update onnx.patch to apply wasm build break fix (#17104 ) ### Description This PR fixes the WebAssembly build break introduced in `6986981482` (`435ad2b1d8`). This change updates onnx.patch in the onnxruntime repo. The corresponding PR in the onnx repo is: https://github.com/onnx/onnx/pull/5495. It may take a while for the next onnx version bump.	2023-08-11 15:00:39 -07:00
Changming Sun	4728f20f9a	Fix CI build (#17118 ) ### Description Some pipelines are failing. It is because PR #16325 set ONNX version to `rel-1.14.1` . It is a branch name, not a commit or tag name. It means whenever the branch got a new commit, we will auto pick it and use it.	2023-08-11 10:56:38 -07:00
Yulong Wang	9cd4e5af68	[wasm] upgrade emsdk to 3.1.44 (#17069 ) ### Description This change upgrades emsdk to 3.1.44. Because the backend is upgraded to LLVM 16, a lot of build failures caused by "-Wshorten-64-to-32" need to be fixed. Most of the build failures come from the generated `onnx.pb.h`, and these can be fixed by including "core/graph/onnx_protobuf.h", which detects and ignores shorten-64-to-32 warnings.	2023-08-10 16:08:36 -07:00
Bowen Bao	6986981482	Bump ONNX version (#16325 ) ### Description Bump ONNX version to https://github.com/onnx/onnx/tree/rel-1.14.1 to include a fix for segfault when shape inferencing nested onnx functions. ### Motivation and Context Resolves #16170	2023-08-10 11:27:28 -07:00
Jeff Daily	dbbfc249f7	[ROCm] update header and binary search paths used by cmake (#17083 ) This is in preparation for planned ROCm 6.0 changes that are not backward compatible. However, the adjustments made by this PR to the current onnxruntime cmake files will work with ROCm 5.x and 6.x.	2023-08-10 11:05:21 +08:00
Changming Sun	7d340256f1	Add "windows_sdk_version" build arg and fix SCA build pipeline (#17062 ) ### Description 1. Add "--windows_sdk_version" argument to build.py 2. Fix Windows Static Analysis build pipeline. It is failing because it picks up a different Windows SDK version after a build machine image update. If we can explicitly specify Windows SDK version, we can avoid such things happening again. 3. Remove --enable_training from Windows Static Analysis build pipeline because PR #16993 makes it incompatible with "no_rtti". AB#18315	2023-08-09 14:01:16 -07:00
sfatimar	2c5d4dce77	Openvino ep ort 5.1 (#17042 ) OpenVINO EP ORT 5.1 Branch Changes for the new API to take in OpenVINO Provider Options and compatibility with OV 2023.1 ### Motivation and Context The change is required for the new API to take in OpenVINO Provider Options and make it seamless. --------- Signed-off-by: MaajidKhan <n.maajid.khan@intel.com> Co-authored-by: saurabhintel0 <saurabh1.kale@intel.com> Co-authored-by: MaajidKhan <n.maajid.khan@intel.com> Co-authored-by: Suryaprakash Shanmugam <suryaprakash.shanmugam@intel.com> Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>	2023-08-09 11:50:10 -07:00
Dmitri Smirnov	07dfe34714	Fix FunctionProto visualization (#17063 ) ### Description Title ### Motivation and Context Need to debug function protos	2023-08-09 11:05:52 -07:00
Baiju Meswani	249917a093	Add mac and windows python packages for onnxruntime-training (#16993 )	2023-08-07 20:32:55 -07:00
Chen Fu	3c10f027de	4b quantization for weights of LLMs (#16833 ) ### Description Blockwise 4b quantization for LLMs. 1. Introduces 4b block-wise quantization for linear layer weights. 2. Implements a matrix multiplication kernel for fp32 x int4. 3. Implements the special operator MatMulFpQ4. 4. Implements a quantization tool that converts a MatMul operator to MatMulFpQ4 when the right-hand side is a 2D const tensor. ### Motivation and Context Compress and accelerate LLMs \|Benchmark \| Time(ns)\| \|-------------\|----------\| \|Q4GEMM/Q4Sym/M:1/N:4096/K:4096/Threads:8\| 218054\| \|Q4GEMM/Q4Sym/M:1024/N:4096/K:4096/Threads:8\| 35830155\| \|Q4GEMM/Q4Sym/M:2048/N:4096/K:4096/Threads:8\| 73479790\| \|Q4GEMM/Q4Zp8/M:1/N:4096/K:4096/Threads:8\| 270152\| \|Q4GEMM/Q4Zp8/M:1024/N:4096/K:4096/Threads:8\| 35826721\| \|Q4GEMM/Q4Zp8/M:2048/N:4096/K:4096/Threads:8\| 73021200\| \|Q4GEMM/Q4Sym128/M:1/N:4096/K:4096/Threads:8\| 213832\| \|Q4GEMM/Q4Sym128/M:1024/N:4096/K:4096/Threads:8\| 36749874\| \|Q4GEMM/Q4Sym128/M:2048/N:4096/K:4096/Threads:8\| 72618120\| \|Benchmark \| Time(ns)\| \|-------------\|----------\| \|SGEMM/LLM/M:1/N:4096/K:4096/Threads:8\| 522610\| \|SGEMM/LLM/M:1024/N:4096/K:4096/Threads:8\| 39237689\| \|SGEMM/LLM/M:2048/N:4096/K:4096/Threads:8\| 75983467\| --------- Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-08-07 12:23:55 -07:00
Dmitri Smirnov	d5e4bdbe7d	Fix protobuf TaggedStringPtr display (#17008 ) ### Description Adjust natvis to display tagged strings. ### Motivation and Context Hard to debug without seeing names.	2023-08-04 17:51:01 -07:00
pengwa	a6887f171f	Refactor schema extraction and output unflattening (#16894 ) ### Motivation and Context When we handle PyTorch models' inputs in different places (ORTModule or others), it's common to flatten structured data into a 1-D tensor list (required by libraries such as torch.onnx.export, torch.autograd.Function.forward or the ORT inference session), do subsequent work, then unflatten back to the original hierarchy as returned values. The DeepSpeed stage 3 hooks support work also needs such a lib to do similar things, so I am proposing to extract this pair of APIs into training/utils/, where they can be used more generally. A comprehensive set of test data is also used for testing unflatten/flatten in unit tests. Let me know if you have any other suggestions. ### Refactor schema extraction and output unflattening Move `_extract_schema` and `unflatten_user_output` in `orttraining/orttraining/python/training/ortmodule/_io.py` to `extract_data_and_schema` and `unflatten_data_using_schema` in `orttraining/orttraining/python/training/utils/torch_io_helper.py` as shared libs, which can be used later by other features (deepspeed stage 3 hook rewrite). There is still some duplicated logic that handles flattening for different tasks by recursively looping over the data struct; it will be changed step by step to avoid heavy review effort.	2023-08-04 13:58:21 +08:00
Jeff Daily	1629a6fa75	[ROCm] add gfx1100 and gfx1101 to CMAKE_HIP_ARCHITECTURES (#16972 ) ### Description Support additional AMD GPU architectures. ### Motivation and Context AMD announced expanding support for additional GPUs. https://community.amd.com/t5/rocm/new-rocm-5-6-release-brings-enhancements-and-optimizations-for/ba-p/614745 This PR is how we will deliver that expanded support to onnxruntime.	2023-08-04 08:38:42 +08:00
Michael Klimenko	07e6648e12	Enable Intel oneAPI DPC++/C++ compiler build (#16587 ) Last week I fixed error #16484 found when trying to build onnxruntime with the icpx compiler. Another thing I found out is that icpx uses -ffast-math flag by default. You can check it by running the compiler with -v flag like following: ```bash # Setup the environment . /opt/intel/oneapi/setvars.sh # Compile any file to see all the implicit flags icpx -v main.cpp ``` This leads to a bunch of warnings during the build like: ```bash In file included from /mnt/f/wsl_home/onnxruntime/onnxruntime/test/providers/cpu/tensor/upsample_op_test.cc:5: In file included from /mnt/f/wsl_home/onnxruntime/onnxruntime/test/providers/provider_test_utils.h:6: In file included from /mnt/f/wsl_home/onnxruntime/onnxruntime/test/providers/checkers.h:10: In file included from /mnt/f/wsl_home/onnxruntime/onnxruntime/core/util/math_cpuonly.h:68: In file included from /mnt/f/wsl_home/onnxruntime/build/Linux/RelWithDebInfo/_deps/eigen-src/Eigen/Core:172: /mnt/f/wsl_home/onnxruntime/build/Linux/RelWithDebInfo/_deps/eigen-src/Eigen/src/Core/MathFunctions.h:1019:12: warning: comparison with NaN always evaluates to false in fast floating point modes [-Wtautological-constant-compare] return isnan EIGEN_NOT_A_MACRO (x); ^~~~~~~~~~~~~~~~~~~~~~~~~~~ ``` And some tests are failing as well, usually with infinities involved. To list a few: ```bash # ... 1: [ FAILED ] IsInfTest.test_isinf_float 1: [ FAILED ] IsInfTest.test_isinf_double 1: [ FAILED ] IsInfTest.test_isinf_positive_float 1: [ FAILED ] IsInfTest.test_isinf_positive_double 1: [ FAILED ] IsInfTest.test_isinf_negative_float 1: [ FAILED ] IsInfTest.test_isinf_negative_double 1: [ FAILED ] IsNaNOpTest.IsNaNFloat 1: [ FAILED ] IsNaNOpTest.IsNaNDouble # ... 
``` This PR adds a quick global check for the IntelLLVM compiler, as in the way its name is reported by CMake and then, depending on the compiler driver, sets either MSVC-like or GCC-like switch to disable fast-maths. Probably a bit cleaner solution would be to use ```target_compile_options(${TARGET} PRIVATE MEOW)``` instead of a global-wide ```set(CMAKE_CXX_FLAGS MEOW)```, but then we'd be required to add it to all the individual targets and execution providers and this will lead to a lot of code duplication.	2023-08-02 12:50:35 -07:00
Chi Lo	f4faceab28	Ignore deprecated declarations warning for TRT EP build (#16948 ) In addition to `onnxruntime_test_all`, `onnxruntime_shared_lib_test` and `onnxruntime_customopregistration_test` should also add the "-Wno-deprecated-declarations" flag to ignore the compiler warning	2023-08-02 09:51:58 -07:00
Changming Sun	73ddba964f	Update the MacOS/Linux build scripts that build/install protobuf from source (#16906 ) ### Description 1. As a follow-up of #16761, this PR allows building ORT on iOS/Android without the need to explicitly specify a protoc path. #16761 is for WASM. This one is for iOS/Android. 2. Update the MacOS/Linux build scripts that build/install protobuf from source. Make them more flexible. Add support for RedHatEnterprise (ubi), which will be needed for upgrading the base image from centos:7 to ubi:8. 3. Update tools/ci_build/github/pai/rocm-ci-pipeline-env.Dockerfile: the docker file's base image has protobuf preinstalled in /usr/local, which we should uninstall to avoid conflicts.	2023-07-31 10:51:48 -07:00
Dmitri Smirnov	50764362ac	Update protobuf Natvis visualization (#16911 ) ### Description Protobuf library update broke debug visualization. ### Motivation and Context Hard to debug	2023-07-31 09:35:21 -07:00
satyajandhyala	dd24d52737	[JS/Web] Added Gelu contrib operator support to JSEP (#16909 ) ### Description Added Gelu operator to JSEP ### Motivation and Context	2023-07-31 09:18:58 -07:00
Tianlei Wu	742edec5e8	[CUDA] Add PackedMultiHeadAttention operator (#16779 ) ### Description Add new operator for MultiHeadAttention with padding removed from inputs. This only supports packed QKV format.	2023-07-28 16:35:38 -07:00
Changming Sun	161a9d1d6d	Add some safety check for conv op (#16839 ) ### Description Add some safety check for conv op. It is to validate if the attributes coming from a conv op are in a valid range. (shouldn't be too large or too small).	2023-07-27 16:37:55 -07:00
Yi Zhang	2e214d6e27	Workaround to upgrade VS2022 for Windows ARM build (#16826 ) ### Description ### Motivation and Context It should be reverted when VS2022 is upgraded to 17.7 or above. ### Verification https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=331401&view=logs&j=7517abfd-115a-5c61-78a0-7ba3c9e3a88d	2023-07-25 08:35:52 +08:00
Arthur Islamov	210d29b40e	Allow --build_wasm on a mac system (#16761 ) ### Description Changes allow downloading a prebuilt protoc compiler when building the WebAssembly version on mac systems. Otherwise it tries to build a js/wasm version of protoc and throws an error while executing it: "protoc.js permission denied" ### Motivation and Context I need to switch between my main working computer and a PC to make changes to the WebAssembly build. Would like not to do that anymore.	2023-07-21 14:21:37 -07:00
Jeff Daily	bb136f86c8	[ROCm][MIGraphX] for googletest dep, set OVERRIDE_FIND_PACKAGE (#16715 ) Otherwise, an unsupported version of gtest/gmock will be found at /opt/conda/include for ROCm builds. Though this issue was initially found for ROCm builds, the issue is generic. onnxruntime requires a specific version of googletest and should not rely on locating googletest using find_package. The ROCm error was: ``` In file included from /opt/conda/include/gmock/gmock-spec-builders.h:75, from /opt/conda/include/gmock/gmock-generated-function-mockers.h:47, from /opt/conda/include/gmock/gmock-function-mocker.h:39, from /opt/conda/include/gmock/gmock.h:61, from /stage/onnxruntime/onnxruntime/test/util/test_utils.cc:17: /opt/conda/include/gmock/gmock-matchers.h: In instantiation of ‘bool testing::internal::PointwiseMatcher<TupleMatcher, RhsContainer>::Impl<LhsContainer>:: MatchAndExplain(LhsContainer, testing::MatchResultListener*) const [with LhsContainer = const gsl::span<const float>&; TupleMatcher = testing::internal:: FloatingEq2Matcher<float>; RhsContainer = gsl::span<const float>]’: /opt/conda/include/gmock/gmock-matchers.h:2303:10: required from here /opt/conda/include/gmock/gmock-matchers.h:2312:48: error: no type named ‘const_iterator’ in ‘testing::internal::PointwiseMatcher<testing::internal:: FloatingEq2Matcher<float>, gsl::span<const float> >::Impl<const gsl::span<const float>&>::LhsStlContainer’ {aka ‘class gsl::span<const float>’} ```	2023-07-21 00:57:38 +08:00
Edward Chen	f236768d5c	[ios] Enable `--use_extensions` with custom built iOS pod (#16711 ) - Fix link errors by including the needed onnxruntime-extensions libraries in the static framework. - Add Objective-C API to register custom ops from embedded onnxruntime-extensions. Caveat: Not all onnxruntime-extensions build options are working yet. E.g., building with the onnxruntime-extensions OpenCV dependency does not work.	2023-07-14 15:37:16 -07:00
Dmitri Smirnov	853c4ff0a5	[C#, CPP] Introduce Float16/BFloat16 support and tests for C#, C++ (#16506 ) ### Description Introduce `Float16/BFloat16` support for C# and C++ APIs. Users should be able to perform conversions from `float` to/from `Float16/BFloat16`, compare values and test for `NaN, Infinity, and whether the number is denormalized.` ### Motivation and Context Users filed issues such as: https://github.com/microsoft/onnxruntime/issues/14303	2023-07-14 10:46:52 -07:00
Dipanjan Sengupta	a461608409	Amx flag removal (#16527 ) ### Description 1. Replacing AMX intrinsics with machine code macros in QGEMM kernel. 2. Removing AMX build flags for GCC in cmake file. 3. Fixing the link time optimization (LTO) issue introduced with asm .include of an assembly file. I have moved the AMX instruction macro definitions from QgemmU8S8KernelAmxCommon.S to amx_common.h to fix the LTO issue. Note that I am also pushing the macros defined in QgemmU8S8KernelAmxCommon.S for future reference. A special thanks to @laxmansole who helped in the development of the instruction macro definitions for AMX intrinsics and fixing the LTO issue. ### Motivation and Context The additional AMX flag in cmake adds an extra layer of dependency on the GCC version to use the feature. These changes should allow the usage of the AMX feature with just the CPU ID check.	2023-07-13 11:19:49 -07:00
Vincent Wang	c07a3b869c	Triton Codegen for ORTModule (#15831 ) Fuse connected elementwise and reduce Ops to TritonOp and codegen triton code to run the kernel. This PR is co-edited by @wejoncy and @er3x3	2023-07-13 18:17:58 +08:00
mindest	b7fd5af48b	[ROCm] TunableOp: Update rocBLAS get_solutions API (since ROCm5.6) (#16657 ) ### Description - Update existing rocBLAS get_solutions API using `*_get_solutions_by_type` (supported from ROCm5.6); remove the original nested TunableOp logic. - Update kernel_explorer.	2023-07-13 11:20:26 +08:00
cloudhan	3866614519	Avoid cmake repeatedly printing DISABLE_FLOAT8_TYPES=ON (#16656 )	2023-07-13 09:29:20 +08:00
Scott McKay	ce68a4c06a	Fix Linux build failure when onnxruntime_DISABLE_ABSEIL=ON (#16373 ) ### Description Add ort_value.h to session_options.h so OrtValue is defined. Update a unit test binary to add required include paths. Adding ort_value.h pulls in more data type headers. ### Motivation and Context #16193	2023-07-12 11:23:18 +10:00
mindest	347c963d5c	[ROCm] Add ROCm Triton TunableOp for GroupNorm (#16196 ) ### Description - Refactor existing Triton TunableOp-related code (based on work in #15862) - Add GroupNorm Triton implementation	2023-07-11 13:55:30 +08:00
cloudhan	5fee3f4302	Remove the special min cmake for rocm (#16570 ) #15807 fixed the building error for rocm with cmake 3.26. The specialized relaxation of the cmake version is not needed anymore.	2023-07-10 13:19:48 +08:00
Edward Chen	6be7b03e53	Enable `-Wshorten-64-to-32` warning if available. (#16524 ) - Fix some warnings from Xcode build (`-Wshorten-64-to-32`). - Enable `-Wshorten-64-to-32` warning if available. Currently it's not fully enabled for `onnxruntime_test_all` and `onnxruntime_providers_xnnpack` yet. - Some clean up in build.py including setting CMake generator more consistently.	2023-07-07 08:11:44 -07:00
Scott McKay	697dd12f6e	Re-organize the transpose optimization and layout transformation files. (#16246 ) ### Description Split out the more basic changes from #15552 for easier review. Re-organize to clarify the structure - Separate out generic base functionality from ORT specific components - pass in handlers for internal ORT ops to Optimize - Split out layout transformation from transpose optimization - Separate out level 1 transpose optimizer - Cleanup some naming to try and clarify things like an optimizer vs. general optimization code Most of the changes are from this movement of code. Two implementation changes: - the extended handlers are queried first in GetHandler - allows the extended handlers to override the default behaviour for an ONNX operator - simplify the Optimize function to remove OptimizerMode. - `can_modify_node` is used instead of `mode` and `ignore_assigned_nodes` and a long description of the current usage is added. I don't _think_ that changes the current behavior and hopefully clarifies what happens and when, and makes the base transpose optimizer implementation more generic. ### Motivation and Context Create a cleaner separation to support adding EP specific logic next to cleanly handle where an EP has additional layout sensitive behaviour required (e.g. its Resize implementation only handles one layout).	2023-07-07 08:24:47 +10:00
cao lei	0c5f492493	remove AllocatorMgr class (#16509 ) ### Description Remove AllocatorManager class ### Motivation and Context After the refactor PR #15833 is in, AllocatorManager class is not referenced anymore.	2023-06-28 15:43:19 -07:00
Vrajang Parikh	960e320dff	Objective C Training API: TrainingSession (#16374 ) ### Description - Implement Objective-C binding for `ORTTrainingSession` - Add `ORTUtils` utility class to handle conversion between C++ and Objective-C types - Add test case for saving checkpoint - Add unit test cases for `ORTTrainingSession` ### Motivation and Context This PR is part of implementing Objective-C bindings for training API. It implements objective-c binding for training session. The objective-C API closely resembles the C++ API. --------- Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-06-28 09:13:56 -07:00
Baiju Meswani	cbfbe210a8	Fix bug that accidentally disabled training op tests (#16488 )	2023-06-27 18:39:54 -07:00
Yifan Li	e2c214d81f	[TensorRT EP] TRT 8.6 minor version update (#16475 ) ### Description * Minor version update: TRT 8.6.0.12->8.6.1.6 * CI pipeline ymls/dockerfiles are updated * cgmanifest.json/deps.txt/download-deps.yml are updated; Win trt binaries uploaded to [win img 307029](https://aiinfra.visualstudio.com/AI%20Infra%20Management/_build/results?buildId=307029&view=results) * Re-enable unit tests which failed in 8.6.0 and regained support in 8.6.1	2023-06-26 10:44:27 -07:00
Scott McKay	48eff09664	Fix file list for test of build with IO debug (#16474 ) ### Description Update file list to adjust for recent changes to test infra. ### Motivation and Context	2023-06-26 16:36:22 +10:00
Chen Fu	5c125b4366	Cfu revertamx (#16455 ) ### Description This is to revert two PRs that aim at reducing AMX toolchain requirements. Unfortunately we still have some pipeline issues. https://github.com/microsoft/onnxruntime/pull/16390 https://github.com/microsoft/onnxruntime/pull/16086 ### Motivation and Context Looks like gcc link time optimization does not work very well with inline assembly in the above PRs.	2023-06-23 09:20:23 -07:00
Baiju Meswani	10ba1e270c	Minimal Build for On-Device Training (#16326 ) 🛠️ __Changes in this pull request:__ This pull request introduces two significant changes to the project: - Changing on device training checkpoint format: The current implementation stores the on device training checkpoint as a sequence of tensors in multiple files inside a checkpoint folder, which can be inefficient in terms of storage and performance. In this PR, I have modified the checkpoint format to utilize the flatbuffer table to save the checkpoint to a single file, providing a more compact and efficient representation. The changes around this are twofold: - Add the checkpoint flatbuffer schema that will generate the necessary checkpoint source files. - Update the checkpoint saving and loading functionality to use the new format. - Adding support for onnxruntime minimal build: To support scenarios where binary size is a constraint, I made changes to ensure that the training build can work well with the minimal build. 🔍 __Open Issues:__ - In order to extract the optimizer type, the existing implementation re-loaded the onnx optimizer model and parsed it. This is no longer possible, since the model format can either be onnx or ort. One idea is to do the same for ort format optimizer model. This needs some investigation. - Changes to the offline tooling to generate ort format training artifacts. - End-to-end training example showcasing the use of the minimal training build. - Add support for export model for inferencing in a minimal build.	2023-06-22 12:27:23 -07:00
RandySheriffH	6e29e185f3	Clean AzureEP logics (#16367 ) Moving out AzureEP invokers out of core runtime. --------- Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2023-06-21 09:38:52 -07:00
Chi Lo	4e3cff60fd	CUDA graph support for TRT EP (#16081 ) CUDA EP already supports [CUDA graph](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#cuda-graphs), also we observed some models can benefit from using CUDA graph with `trtexec`. Therefore, this PR enables the CUDA graph support for TRT EP. The implementation is based on https://github.com/microsoft/onnxruntime/pull/9978 with the same [constraints](https://github.com/microsoft/onnxruntime/pull/9978) as below: - Models with control-flow ops (i.e. If, Loop and Scan ops) are not supported. - Usage of CUDA Graphs is limited to models where-in all the model ops (graph nodes) can be partitioned to the TRT EP. - The input/output types of models need to be tensors. - Shapes of inputs/outputs cannot change across inference calls. - IObinding is required.	2023-06-21 09:36:45 -07:00
Yuhong Guo	48e6186b1a	Move tests from core/providers/cuda/test/* to test/providers/cuda/ and refactor CUDA UT (#16161 ) ### Description 1. Add a new test lib `onnxruntime_providers_cuda_ut` which is similar to `onnxruntime_providers_cuda` but `onnxruntime_providers_cuda_ut` is only built if `onnxruntime_BUILD_UNIT_TESTS` is set. We can call all CUDA UTs through this ut lib without affecting production lib `onnxruntime_providers_cuda`. 2. Move all test cases from `core/providers/cuda/test/` to `test/providers/cuda/`. These test cases are built into lib `onnxruntime_providers_cuda_ut` and run by `./onnxruntime_test_all --gtest_filter="CUDA_EP_Unittest"`. Since the lib is only for test, we can use gtest macros in the test cases. The previous implementation did not support using the gtest lib in the CUDA UT cases. 3. The cmake code in `cmake/onnxruntime_providers.cmake` is refactored a bit. A new function `onnxruntime_add_object_library` builds an object target. The 2 libs `onnxruntime_providers_cuda_ut` & `onnxruntime_providers_cuda` share most of the code, so the object files can be used in both libs, which helps reduce build time. Another function `config_cuda_provider_shared_module` is used to configure all 3 similar targets (onnxruntime_providers_cuda_obj/onnxruntime_providers_cuda/onnxruntime_providers_cuda_ut). 4. Refactored the test to call `testing::InitGoogleTest` & `RUN_ALL_TESTS` in `libonnxruntime_providers_cuda_ut.so`'s `TestAll`. After this change, we can see all the cases running in `CUDA_EP_Unittest.All`: ![image](https://github.com/microsoft/onnxruntime/assets/19584326/8ff80df6-060b-4ef0-90b7-657e68d3db87) ### Motivation and Context After https://github.com/microsoft/onnxruntime/pull/13016, there are still test files in test/providers/cuda/ that are not moved to core/providers/cuda/test/ and the test cases are disabled. This PR helps to clean up the unfinished TODOs. Even though onnxruntime_shared_lib_test covers some tests for the CUDA provider, it works like a coarse-grained end-to-end test for the CUDA provider. If the CUDA unit tests can run cases for a single component, this would be helpful for CUDA developers. --------- Co-authored-by: Yuhong Guo <yuhong.gyh@antgroup.com>	2023-06-20 14:54:55 -07:00
Prateek Chokse	12dffef768	added support for cmake "find_package" (#8919 ) Description: Adds support for cmake find_package. Motivation and Context As mentioned in issue #7150 onnxruntime doesn't have support for CMake find_package, this PR adds that and also adds the CMake package version file. Now anyone can link onnxruntime like this: ```cmake find_package(onnxruntime) add_executable(test Source.cpp) target_link_libraries(test PRIVATE onnxruntime::onnxruntime) ``` this also simplifies #3124	2023-06-19 22:20:31 -07:00
Dipanjan Sengupta	35fa6af428	Fix for the build break in AMX feature on Mac OS. (#16390 ) ### Description Fixing the build break issue in Apple pipeline due to AMX flag removal.	2023-06-16 21:00:41 -07:00
Scott McKay	8fdfd20191	Separate out operator vs model testing. (#16228 ) ### Description Split up OpTester to separate out operator vs model testing. This led to a lot of other cleanups/refactoring. - create BaseTester class and derived OpTester/ModelTester classes to limit APIs to what is applicable for each test type - e.g. adding an attribute isn't relevant to a model test - cleanup structure - don't expose member variables either directly or via public methods returning them - split out checkers so they can be easily re-used - refactor so there's one public Check method for comparing two OrtValue instances containing any data type - refactor the GradientOpTester usage - it required a lot of OpTester internals to be exposed and no other tests needed this - it also returned Status through various parts which prevented the usage of the google test macros which provide better output. change to return void and use the macros. - fix some other minor issues - update some cmake files so all the source files are included - remove some low value helpers (FetchTensor and GetShapeVector) - remove some outdated code to allow unreleased opset versions from when onnx opset 15 wasn't released - move files from test/util/include/test to test/util/include - doesn't seem to be any reason for the additional subdirectory given they're not files used to test the code in test/util - files were moved with no changes ### Motivation and Context Cleanup test infrastructure. --------- Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-06-17 12:58:57 +10:00
saurabh	a6ce7b339f	Enable model subgraph execution in OVEP and setting the OpenVINO dll's to the path from the OpenVINO pypi package in OVEP and fix OVEP windows io buffer sample (#16147 ) ### Description This PR enables execution of subgraphs in OVEP. Currently, when OVEP developers install the onnxruntime-openvino package on windows from pypi, they have to additionally download OpenVINO windows binaries and run the setupvars.bat script which sets the environment PATH to locate the OV dll's. This PR also fixes issues with the OVEP windows io buffer sample. ### Motivation and Context Fix: We want to make the user experience easy for OVEP Python developers on the windows platform. This fix introduces a function add_openvino_libs_to_path at the location tools/python/util/add_openvino_win_libs.py. The function can be called by OVEP python users in the application code and takes care of setting the OpenVINO dll's to the path from the OpenVINO pypi package (openvino) which was installed. This change also makes sure that the add_openvino_libs_to_path() function is added to the onnxruntime python package only when it is built for OpenVINO Execution Provider for ONNXRuntime and not for default ORT python package builds. New user experience for Python OVEP developers on the windows platform: step 1: pip install onnxruntime-openvino step 2: pip install openvino step 3: <Add these 2 lines in the application code> import onnxruntime.tools.add_openvino_win_libs as utils utils.add_openvino_libs_to_path() --------- Signed-off-by: MaajidKhan <n.maajid.khan@intel.com> Co-authored-by: MaajidKhan <n.maajid.khan@intel.com> Co-authored-by: Suryaprakash Shanmugam <suryaprakash.shanmugam@intel.com>	2023-06-16 19:47:09 -07:00
Silvio Traversaro	4915191e63	Fix build of Python wheel on Windows with single-config generator (#16337 ) ### Description Before this PR, the CMake code assumed that a multiple-config CMake generator was used on Windows and a single-config CMake generator on non-Windows platforms. After this PR this information is obtained from the [`GENERATOR_IS_MULTI_CONFIG`](https://cmake.org/cmake/help/latest/prop_gbl/GENERATOR_IS_MULTI_CONFIG.html) global CMake property. ### Motivation and Context I discovered this problem when building with the Ninja generator on Windows, but I guess this should also fix problems on non-Windows platforms when using a multiple-config generator (such as Xcode on macOS or "Ninja Multi-Config" on all platforms). See https://cmake.org/cmake/help/latest/prop_gbl/GENERATOR_IS_MULTI_CONFIG.html for more info.	2023-06-16 09:17:49 -07:00

1432 commits