onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-03 03:58:54 +00:00

Author	SHA1	Message	Date
Vincent Wang	e6301eee6a	Bump Up Version to 1.17.0 (#17587 ) Bump up version to 1.17.0 as the 1.16.0 release branch had been branched out.	2023-09-20 11:02:58 +08:00
Dmitri Smirnov	fdb132643d	Remove redundant Resolve() after each inlined function (#17556 ) ### Description Remove `Resolve()` on the entire graph as each function is resolved. We retain `Resolve()` after each inlining iteration. ### Motivation and Context Poor performance for inlining the model and session initialization. Original model before Resolve() removal FunctionTest.Profiling (65953 ms) After Resolve() Removal FunctionTest.Profiling (2911 ms) RelWithDebInfo pre-inlined model. Presumably because it runs Level1 optimizers Non-inlined model consists of functions and Level1 optimizers have no effect. FunctionTest.Profiling (9851 ms)	2023-09-15 12:13:37 -07:00
cao lei	32f5658abb	remove gsl to make status.h independent from gsl (#17402 ) ### Description <!-- Describe your changes. --> Make status.h independent from gsl. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> In the coming new feature external EP API (see the prototype https://github.com/microsoft/onnxruntime/pull/16718), we need to expose stream in the public header, however, stream is dependent on status.h which is dependent on gsl. We are seeking a way to decouple stream from gsl. From Changming's comment offline, prefast is disabled so all GSL_SUPPRESS are not taking any effect now. He will handle the warnings when enable prefast in the future	2023-09-13 21:47:43 -07:00
Yulong Wang	550293d9ad	OrtMemoryInfo: support new name "WebGPU_Buffer" (#17469 ) ### Description Add new name "WebGPU_Buffer" to OrtMemoryInfo. This is one of the prerequisites for supporting IO binding for WebGPU buffer in onnxruntime-web. list of prerequisites PRs: #17465 #17469 (this one)	2023-09-08 16:37:35 -07:00
Xavier Dupré	024f1dd72b	Fix float 8 rounding on CPU (#16940 ) ### Description Fix float 8 rounding issues discovered in issue #16938 (only CPU provider).	2023-09-07 20:48:25 +02:00
RandySheriffH	6c39641ea2	Fix a memleak in RunAsync python (#17326 ) Release ort value outputs that are created and released from ort::run(...). --------- Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2023-08-30 12:54:17 -07:00
Artem Shilkin	6e60dba726	Fix compilation with newer flatbuffers (#17164 ) In flatbuffers@v23.5.9 was broken forward declaration for FlatBufferBuilder. Trying to compile onnxruntime falls with the following error: ``` flatbuffers/include/flatbuffers/flatbuffer_builder.h:1420:38: error: typedef redefinition with different types ('FlatBufferBuilderImpl<false>' vs 'flatbuffers::FlatBufferBuilder') typedef FlatBufferBuilderImpl<false> FlatBufferBuilder; ^ onnx_runtime/include/onnxruntime/core/graph/graph.h:47:11: note: previous definition is here class FlatBufferBuilder; ``` This PR removes these declarations and puts includes instead	2023-08-29 10:28:26 -07:00
pengwa	18d5cfdb85	Fix build - redefinition of default argument for ‘long unsigned int Extent’ (#17281 ) ### Fix build - redefinition of default argument for ‘long unsigned int Extent’ One of the training customer env, building ORT, there is such a build error. The GCC version are ``` aiscuser@node-0:/tmp/onnxruntime$ gcc --version gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0 aiscuser@node-0:/tmp/onnxruntime$ g++ --version g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0 ``` But on our dev node using same GCC/G++, we don't have build issue., not sure what's the difference but giving an explict type when creating `gsl::span` fixed the problem. ``` /tmp/onnxruntime/build/Linux/RelWithDebInfo/_deps/gsl-src/include/gsl/span:394:7: error: redefinition of default argument for ‘long unsigned int Extent’ 394 \| class span \| ^~~~ /tmp/onnxruntime/build/Linux/RelWithDebInfo/_deps/gsl-src/include/gsl/span_ext:46:51: note: original definition appeared here 46 \| template <class ElementType, std::size_t Extent = dynamic_extent> \| ^~~~~~~~~~~~~~~ /tmp/onnxruntime/include/onnxruntime/core/common/span_utils.h:82:93: error: return type ‘class gsl::span<const std::byte>’ is incomplete 82 \| [[nodiscard]] inline gsl::span<const std::byte> AsByteSpan(const void* data, size_t length) { \| ^ /tmp/onnxruntime/include/onnxruntime/core/common/span_utils.h: In function ‘void onnxruntime::AsByteSpan(const void, size_t)’: /tmp/onnxruntime/include/onnxruntime/core/common/span_utils.h:83:68: error: class template argument deduction failed: 83 \| return gsl::span(reinterpret_cast<const std::byte>(data), length); \| ^ /tmp/onnxruntime/include/onnxruntime/core/common/span_utils.h:83:68: error: no matching function for call to ‘span(const std::byte, size_t&)’ /tmp/onnxruntime/build/Linux/RelWithDebInfo/_deps/gsl-src/include/gsl/span:740:1: note: candidate: ‘template<class Type, long unsigned int Extent> gsl::span(Type (&)[Extent])-> gsl::span<ElementType, FirstExtent>’ 740 \| span(Type (&)[Extent]) -> span<Type, Extent>; \| ^~~~ /tmp/onnxruntime/build/Linux/RelWithDebInfo/_deps/gsl-src/include/gsl/span:740:1: note: template argument deduction/substitution failed: /tmp/onnxruntime/include/onnxruntime/core/common/span_utils.h:83:68: note: mismatched types ‘Type [Extent]’ and ‘const std::byte’ 83 \| return gsl::span(reinterpret_cast<const std::byte>(data), length); \| ^ /tmp/onnxruntime/build/Linux/RelWithDebInfo/_deps/gsl-src/include/gsl/span:743:1: note: candidate: ‘template<class Type, long unsigned int Size> gsl::span(std::array<_Tp, _Nm>&)-> gsl::span<ElementType, FirstExtent>’ 743 \| span(std::array<Type, Size>&) -> span<Type, Size>; \| ^~~~ /tmp/onnxruntime/build/Linux/RelWithDebInfo/_deps/gsl-src/include/gsl/span:743:1: note: template argument deduction/substitution failed: /tmp/onnxruntime/include/onnxruntime/core/common/span_utils.h:83:68: note: mismatched types ‘std::array<_Tp, _Nm>’ and ‘const std::byte’ 83 \| return gsl::span(reinterpret_cast<const std::byte*>(data), length); \| ^ ``` ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-08-25 00:40:40 +08:00
Scott McKay	b3cb775cf9	Two fixes involving minimal builds (#17000 ) ### Description <!-- Describe your changes. --> - allocation planner was breaking if graph had no nodes - in this particular model a branch of an If node returned an outer scope value directly. - if model used non-tensor types and sparse tensors are disabled the call to IsSpareTensor causes an exception when prematurely terminates the code. - it's perfectly fine to check if a value is a sparse tensor when support for them is disabled. we just can't do anything with that OrtValue which is what the current ifdef's after the call to IsSparseTensor handle. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Fix model execution failure for partner with model that uses sequences in a minimal build with sparse tensors disabled.	2023-08-23 16:01:22 +10:00
Edward Chen	ae62d752d6	Prevent GSL_SUPPRESS arguments from being modified by clang-format (#17242 ) Prevent `GSL_SUPPRESS` arguments from being modified by clang-format and update existing usages. clang-format was changing something like `GSL_SUPPRESS(r.11)` to `GSL_SUPPRESS(r .11)`. For some compilers (e.g., clang), the `gsl::suppress` attribute takes a quoted string argument. We don't want to insert spaces there.	2023-08-22 18:26:53 -07:00
Edward Chen	d6cd41cfc1	[CoreML EP] Add Shape, Gather, and Slice ops (#17153 ) Add CoreML EP shape related ops: - Shape - Gather - Slice Add support for int64/int32 inputs in CoreML EP.	2023-08-18 22:34:34 -07:00
Dmitri Smirnov	5c54b64a63	Create NodeArgs for all Constant nodes and initializers for functions being inlined (#17089 ) ### Description When functions are inlined and constant nodes are being converted to initializers, we need to create NodeArg for them. Similar for inlined function subgraph, but we choose to give priority to non-constant nodes and then fill the gaps with constant and initializers. ### Motivation and Context This addresses issue https://github.com/microsoft/onnxruntime/issues/16813 for `eca_halonext26ts_mod.onnx` model where it fails to remove unused initializer because `NodeArg` was not created for it.	2023-08-17 14:22:28 -07:00
Changming Sun	5249b7ab7c	Re-implement stacktrace (#17173 ) ### Description Re-implement stacktrace. The new implementation doesn't directly use Windows API, hence can avoid problems regarding to initialize/uninitialize the dbghelp library. ### Motivation and Context	2023-08-16 16:07:49 -07:00
RandySheriffH	3dd2c1b4d7	EP context for custom op (#16454 ) Implement infrastructures to allow EP resources surfaced to custom ops. --------- Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2023-08-16 13:03:40 -07:00
Yulong Wang	9cd4e5af68	[wasm] upgrade emsdk to 3.1.44 (#17069 ) ### Description This change upgrade emsdk to 3.1.44. Because backend is upgraded to LLVM 16, so need to fix a lot of build failures caused by "-Wshorten-64-to-32". most of the build failures comes from generated `onnx.pb.h`, and this can be fixed by including "core/graph/onnx_protobuf.h", which detects and ignore shorten-64-to-32 warnings.	2023-08-10 16:08:36 -07:00
Chi Lo	7361c283c7	Add API for updating CUDA EP provider option user compute stream (#17037 ) Add a generic `UpdateCUDAProviderOptionsWithValue()` C API to update CUDA EP provider options where its data type is pointer that can't be represented by string. Note: Please see some comments for the similar [PR ](https://github.com/microsoft/onnxruntime/pull/16965)for TRT EP.	2023-08-09 09:24:19 -07:00
Chi Lo	fc8003349e	Add API for updating TRT EP provider option user compute stream (#16965 ) Add a generic `UpdateTensorRTProviderOptionsWithValue()` C API to update TensorRT provider options where its data type is pointer that can't be represented by string.	2023-08-04 15:14:43 -07:00
Edward Chen	f98d3f8a23	[CoreML EP] Enable inputs with dynamic shape (#16915 ) Enable node inputs with dynamic shape to be handled by the CoreML EP.	2023-08-03 18:15:00 -07:00
satyajandhyala	dd24d52737	[JS/Web] Added Gelu contrib operator support to JSEP (#16909 ) ### Description Added Gelu operator to JSEP ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-07-31 09:18:58 -07:00
Dmitri Smirnov	bf006d34a9	Used feature macro for if constexpr in a public header (#16836 ) ### Description Use feature macro for `if constexpr` ### Motivation and Context We still do not require customers to use C++17 compiler.	2023-07-25 21:42:30 -07:00
kunal-vaishnavi	b7176f9826	Fix bug with saving model optimized by inference session (#16716 ) ### Description A [previous PR](https://github.com/microsoft/onnxruntime/pull/16531) added a temporary directory to save the model optimizations after loading a model into an `InferenceSession`. Many models that have an external data file, however, require the data file to be in the same directory as the ONNX model file. Because the model is saved in a temporary directory and the data is saved in another directory, this causes a `FileNotFoundError` error when trying to load the model in the temporary directory. This PR fixes this error by saving the external data file in the same directory that the optimized model is located in. ### Motivation and Context This PR fixes a bug with using a temporary directory while running the optimizer for models that have an external data file.	2023-07-20 18:44:28 -07:00
Xavier Dupré	2bc9fbb621	Fix url in the code documentation (graph optimizations) (#16770 ) ### Description Fix a wrong url in the documentation as mentioned in issue #16678. ### Motivation and Context Better documentation.	2023-07-20 07:02:22 -07:00
Dmitri Smirnov	e752cbe7f2	Work on eliminating Internal Compiler Error (#16741 ) ### Description <!-- Describe your changes. --> Replace the offending bitwise `operator \|` with if() logic for ARM.	2023-07-18 10:17:52 -07:00
cloudhan	a45b834722	Fix warning about uninitialized member (#16736 ) #16506 Cause almost every translation units on linux complaint ``` [1175/1235] Building CXX object CMakeFiles/onnxruntime_test_all.dir/home/guangyunhan/onnxruntime/orttraining/orttraining/test/training_ops/cuda/softmax_test.cc.o In file included from /home/guangyunhan/onnxruntime/include/onnxruntime/core/framework/float16.h:18, from /home/guangyunhan/onnxruntime/include/onnxruntime/core/framework/data_types.h:17, from /home/guangyunhan/onnxruntime/include/onnxruntime/core/framework/tensor.h:17, from /home/guangyunhan/onnxruntime/onnxruntime/test/common/tensor_op_test_utils.h:16, from /home/guangyunhan/onnxruntime/onnxruntime/test/providers/compare_provider_test_utils.h:7, from /home/guangyunhan/onnxruntime/orttraining/orttraining/test/training_ops/cuda/softmax_test.cc:4: /home/guangyunhan/onnxruntime/include/onnxruntime/core/session/onnxruntime_float16.h: In instantiation of ‘static constexpr uint16_t onnxruntime_float16::Float16Impl<Derived>::ToUint16Impl(float) [with Derived = onnxruntime::MLFloat16; uint16_t = short unsigned int]’: /home/guangyunhan/onnxruntime/include/onnxruntime/core/framework/float16.h:42:66: required from here /home/guangyunhan/onnxruntime/include/onnxruntime/core/session/onnxruntime_float16.h:241:7: note: ‘union onnxruntime_float16::detail::float32_bits’ has no user-provided default constructor 241 \| union float32_bits { \| ^~~~~~~~~~~~ /home/guangyunhan/onnxruntime/include/onnxruntime/core/session/onnxruntime_float16.h:242:16: note: and the implicitly-defined constructor does not initialize ‘unsigned int onnxruntime_float16::detail::float32_bits::u’ 242 \| unsigned int u; \| ^ ``` This PR shut the compiler up.	2023-07-17 11:33:54 -07:00
Dmitri Smirnov	b8c40b7813	Fix parameter naming that fails Doc generation. (#16717 ) ### Description Rename `FromBits` param name to match the docs. ### Motivation and Context Fix API Doc generation.	2023-07-16 22:02:05 -07:00
RandySheriffH	e1ca8ee6d4	RunAsync C/CXX API (#16613 ) Implement RunAsync API - the session will run in a thread of intra-op thread pool. --------- Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2023-07-16 16:51:40 -07:00
Dmitri Smirnov	853c4ff0a5	[C#, CPP] Introduce Float16/BFloat16 support and tests for C#, C++ (#16506 ) ### Description Introduce `Float16/BFloat16` support for C# and C++ APIs. User should be able to perform conversions from `float` to/from `Float16/BFloat16`, compare values and tests for `NaN, Inifnity, and whether the number is denormalized.` ### Motivation and Context User filed issues such as: https://github.com/microsoft/onnxruntime/issues/14303	2023-07-14 10:46:52 -07:00
cao lei	329e8156d4	clean unused parameter in ORT_UNUSED_PARAMETER (#16538 ) ### Description clean unused parameter in ORT_UNUSED_PARAMETER ### Motivation and Context clean unused parameters in ORT_UNUSED_PARAMETER which are introduced from #15833	2023-07-07 13:20:36 -07:00
Edward Chen	6be7b03e53	Enable `-Wshorten-64-to-32` warning if available. (#16524 ) - Fix some warnings from Xcode build (`-Wshorten-64-to-32`). - Enable `-Wshorten-64-to-32` warning if available. Currently it's not fully enabled for `onnxruntime_test_all` and `onnxruntime_providers_xnnpack` yet. - Some clean up in build.py including setting CMake generator more consistently.	2023-07-07 08:11:44 -07:00
Xavier Dupré	d906d48ae9	Support custom ops taking float 8 tensors as inputs and outputs (#16323 ) ### Description C API for custom ops does not support float 8 types. This PR changes that. ### Motivation and Context The list of operators supporting float 8 is very limited. It should be extended to custom ops to let developpers add customized operators for these specific types.	2023-07-06 14:36:06 +02:00
cao lei	0c5f492493	remove AllocatorMgr class (#16509 ) ### Description Remove AllocatorManager class ### Motivation and Context After the refactor PR #15833 is in, AllocatorManager class is not referenced anymore.	2023-06-28 15:43:19 -07:00
Baiju Meswani	efeb6672d6	Temporary optimizer support for ort format models in non minimal build (#16485 )	2023-06-28 11:35:57 -07:00
Christian Bourjau	6dd4e4801a	Allow custom operator functions to safely propagate errors through the C-API (#16479 ) ### Description This PR implements a backward-compatible way to define custom operators with fallible compute functions. The C++ API templated gained an optional `Fallible` argument. Closes #14287 ### Motivation and Context #14287 contains more context. The gist is that the current C-API defines compute operations of custom operators as functions returning `void` rather than an `OrtStatusPtr`. Currently, errors are often propagated across the C-ABI using C++ exceptions. That is very unsafe and undefined behavior. Moreover, it is difficult for languages other than C++ to use this approach even if they wanted to. A C-compliant sound and safe way to propagate errors allows for non-C++ fallible custom operators. ### An example in action https://github.com/cbourjau/ort-custom-op/pull/6/files is a demonstration of how this PR can be used to write safe and fallible custom operators in Rust.	2023-06-28 08:16:32 -07:00
Pranav Sharma	a270d8407e	Allow saving of large models after optimization (github issue 12882) (#16440 ) ### Description Allow saving of large models after optimization. ### Motivation and Context Addresses https://github.com/microsoft/onnxruntime/issues/12882	2023-06-21 22:46:26 -07:00
Chi Lo	4e3cff60fd	CUDA graph support for TRT EP (#16081 ) CUDA EP already supports [CUDA graph](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#cuda-graphs), also we observed some models can benefit from using CUDA graph with `trtexec`. Therefore, this PR enables the CUDA graph support for TRT EP. The implementation is based on https://github.com/microsoft/onnxruntime/pull/9978 with the same [constraints](https://github.com/microsoft/onnxruntime/pull/9978) as below: - Models with control-flow ops (i.e. If, Loop and Scan ops) are not supported. - Usage of CUDA Graphs is limited to models where-in all the model ops (graph nodes) can be partitioned to the TRT EP. - The input/output types of models need to be tensors. - Shapes of inputs/outputs cannot change across inference calls. - IObinding is required.	2023-06-21 09:36:45 -07:00
Yuhong Guo	48e6186b1a	Move tests from core/providers/cuda/test/* to test/providers/cuda/ and refactor CUDA UT (#16161 ) ### Description <!-- Describe your changes. --> 1. Add a new test lib `onnxruntime_providers_cuda_ut` which is similar to `onnxruntime_providers_cuda` but `onnxruntime_providers_cuda_ut` is only built if `onnxruntime_BUILD_UNIT_TESTS` is set. We can call all CUDA UTs through this ut lib without affecting production lib `onnxruntime_providers_cuda`. 2. Move all test cases from `core/providers/cuda/test/` to `test/providers/cuda/`. These test cases are built into lib `onnxruntime_providers_cuda_ut` and run by `./onnxruntime_test_all --gtest_filter="CUDA_EP_Unittest"`. Since the lib is only for test, we can use gtest macros in the test cases. Previous implementation do not support using gtest lib in the CUDA UT cases. 3. The cmake code in `cmake/onnxruntime_providers.cmake` is refactored a bit. A new function `onnxruntime_add_object_library` is to build a object target. The 2 libs `onnxruntime_providers_cuda_ut` & `onnxruntime_providers_cuda` share most of the code, so the object files can be used in both libs, which helps reduce build time. Another function `config_cuda_provider_shared_module` is used to configure all 3 similar targets(onnxruntime_providers_cuda_obj/onnxruntime_providers_cuda/onnxruntime_providers_cuda_ut). 4. Refactored the test to call `testing::InitGoogleTest` & `RUN_ALL_TESTS` in `libonnxruntime_providers_cuda_ut.so`'s `TestAll`. After this change, we can see all the cases running in `CUDA_EP_Unittest.All`: ![image](https://github.com/microsoft/onnxruntime/assets/19584326/8ff80df6-060b-4ef0-90b7-657e68d3db87) ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> After https://github.com/microsoft/onnxruntime/pull/13016, there are still test files in test/providers/cuda/ that are not moved to core/providers/cuda/test/ and the test cases are disabled. This PR helps to clean the unfinished TODOs. Even through onnxruntime_shared_lib_test covers some test for CUDA provider. onnxruntime_shared_lib_test works like a coarse grain end-to-end test for CUDA provider. If CUDA unittest can run cases for a single component, this wound be helpful for CUDA developers. --------- Co-authored-by: Yuhong Guo <yuhong.gyh@antgroup.com>	2023-06-20 14:54:55 -07:00
cao lei	dd72192cf4	ExecutionProvider API refactor - move allocator from EP level to SessionState level and indexed by OrtDevice (#15833 ) ### Description This PR is to refactor ExecutionProvider API for memory management, which is to move allocators from EP level to SessionState level and indexed by OrtDevice ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> This PR is to refactor ExecutionProvider API for memory management, which is to move allocators from EP level to SessionState level and indexed by OrtDevice. By this change, EP level will shift the burden of maintaining allocators, which will be user friendly for EP developers --------- Co-authored-by: Lei Cao <leca@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2023-06-19 17:44:45 -07:00
Changming Sun	5754cd7d1d	Add fp16 support to CPU EP gemm op (#15506 )	2023-06-15 14:38:17 -07:00
Changming Sun	b72fe664c1	Refactor prepack buffer code (#16280 ) ### Description 1. Use IAllocatorUniquePtr to replace BufferUniquePtr. It will ensure the deleter is always right. 2. Change some std::unique_ptr to std::optional 3. Bypass Arena allocator when allocating the prepack buffers for mlas. In this special case, Arena doesn't help any. And this change is just an internal implementation change, it doesn't affect our public interface.	2023-06-08 14:42:02 -07:00
Dmitri Smirnov	908e940660	[CPP Api] Remove deprecated CustomOp API (#16256 ) ### Description Custom Op API has been deprecated in 1.15 release. We are removing it.	2023-06-07 14:03:13 -07:00
PeixuanZuo	1b518c6836	[ROCm] add early stop to tunable profile progress (#15716 ) For TunableOp, some instance may has very bad performance and it will take a long time during profile process. Add `tunable_op_max_tuning_duration_ms` parameter to limit max tuning time.	2023-06-01 10:18:25 +08:00
Xavier Dupré	e726151b5c	Introduce float 8 types (#14731 ) ### Description The PR implements FloatE4M3FN, FloatE5M2, FloatE4MEFNUZ, FloatE5M2FNUZ as described in PR https://github.com/onnx/onnx/pull/4805. It uses CUDA API to cast float/half to float8 if CUDA>=11.8, a custom implementation if CUDA<11.8. * It implements, Cast, QuantizeLinear, DequantizeLinear for all types on CPU, only for types FloatE4M3FN, FloatE5M2 on CUDA. * It extends the supported types for control flow operator, Shape, Reshape, Identity, If, Loop, Scan, Reshape * It implements Equal(19). * Cast, QuantizeLinear, DequantizeLinear operators now support a parameter `saturate` only valid for float 8 types. It is true by default. In that case, any value out of range is converted into the maximum float 8 value. If false, it is infinite. * QuantizeLinear, DequantizeLinear now supports multiple scales on CUDA (and ROCm by extension), scale = 1D tensor with one scale per channel ### Motivation and Context Supports latest onnx version. Fixes [AB#15395](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/15395) --------- Co-authored-by: Xavier Dupre <xadupre@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net> Co-authored-by: Randy Shuai <rashuai@microsoft.com> Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com> Co-authored-by: Scott McKay <Scott.McKay@microsoft.com>	2023-05-30 13:25:58 -07:00
Dmitri Smirnov	9939092e71	[CPP API]Fix constness in C++API (#16103 ) ### Description `CreateMap` and `CreateSequence` should be able to take in const data.	2023-05-26 14:09:00 -07:00
Changming Sun	a5410515ad	Fix: Some fields in OrtCUDAProviderOptionsV2 struct are not initialized (#16113 ) ### Description The file include/onnxruntime/core/providers/cuda/cuda_provider_options.h is a C++ file. It is not for C. Before this commit, this header file is already not compatible with C compilers. Because it has: ``` onnxruntime::ArenaExtendStrategy arena_extend_strategy; ``` And this file is intended to be internal only. It is an internal header file. It should not be included in onnxruntime_c_api.h and should not be used with the public C APIs. User can only get the instance of OrtCUDAProviderOptionsV2 via CreateCUDAProviderOptions. In such a way we can add new members to this struct without breaking binary compatibility. Since it is an internal header, we can safely use C++ grammar there.	2023-05-26 11:34:22 -07:00
Yuhong Guo	04a8f50674	New configuration to limit the arena extension (#15983 ) Add a configuration `max_power_of_two_extend_bytes ` to limit the arena extension size. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> In our real scenario, we observe that if the model is big enough the BfcArena will extend uncontrollable. As showed by the following figures, if a model uses more than 16GB memory, the BfcArena will totally apply for 32GB memory according to the `kNextPowerOfTwo` strategy. With the new strategy, the extension is limited. The default maximum extension size is 1GB. #### Without the new configuration After loading the model, ORT uses 32G GPU memory. ![image](https://github.com/microsoft/onnxruntime/assets/19584326/42b93c66-b957-4f20-a13b-d34cb390afff) #### With the new configuration After loading the model, ORT uses 23G GPU memory. ![image](https://github.com/microsoft/onnxruntime/assets/19584326/5abffeff-9ca3-4187-a262-37fd2764fe1b) Co-authored-by: Yuhong Guo <yuhong.gyh@antgroup.com>	2023-05-25 02:19:07 -07:00
Adrian Lizarraga	efc84a43e8	[QNN EP] Add session option to disable fallback to default CPU EP (#16016 ) ### Description Adds the session config option `disable_cpu_ep_fallback` to allow the user to prevent the CPU EP from handling nodes not supported by other execution providers. ```C++ // Graph nodes that are not supported by the execution providers (EPs) explicitly added to the session are // assigned (i.e., "fallback") to the CPU EP by default. // // This option allows the user to disable the fallback of unsupported graph nodes to the CPU EP. // If this option is set to "1", session creation will fail if the execution providers other than the CPU EP cannot // fully support all of the nodes in the graph. // // It is invalid to set this option and explicitly add the CPU EP to the session. In this case, session creation // will also fail with an error. // // Option values: // - "0": CPU EP fallback is not disabled. [DEFAULT] // - "1": CPU EP fallback is disabled. static const char* const kOrtSessionOptionsDisableCPUEPFallback = "session.disable_cpu_ep_fallback"; ``` #### Example use ```C++ #include "core/session/onnxruntime_cxx_api.h" #include "core/session/onnxruntime_session_options_config_keys.h" int main(int argc, char** argv) { Ort::SessionOptions so; so.AddConfigEntry(kOrtSessionOptionsDisableCPUEPFallback, "1"); // Disable fallback to the CPU EP. onnxruntime::ProviderOptions options; #if defined(_WIN32) options["backend_path"] = "QnnCpu.dll"; #else options["backend_path"] = "libQnnCpu.so"; #endif so.AppendExecutionProvider("QNN", options); const ORTCHAR_T* ort_model_path = ORT_MODEL_FOLDER "qnn_ep_partial_support.onnx"; Ort::Session session(*ort_env, ort_model_path, so); // Throws exception if nodes fallback to CPU // ... ``` ### Motivation and Context Makes it easier for application developers to ensure that the entire model runs on specific EPs. This is critical for Qualcomm/scenarios. If the compute cannot be offloaded to the NPU, running on CPU is not acceptable. (could be the difference between 90 second inference and 6 seconds inference) --------- Co-authored-by: Pranav Sharma <prs@microsoft.com>	2023-05-23 17:56:32 -07:00
Hector Li	4324d2173b	[QNN EP] Enable Qnn context cache to save model initialization time (#15815 ) ### Description Enable Qnn Context cache feature to save model initialization time Provider options: qnn_context_cache_enable\|1 to enable the cache feature qnn_context_cache_path to set the cache path. It is set to model_file.onnx.bin by default. ### Motivation and Context Model initialization time takes long because the cost of conversion from Onnx model to Qnn model. Qnn have feature to serialize the Qnn context to file, then next time user can load it from the cache context and execute the graph to save the cost. --------- Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com>	2023-05-19 10:52:17 -07:00
RandySheriffH	4dfb89b3ad	Implement mutex-free spin lock for task queue (#14834 ) Implemented "lock-free" spinlock to save CPU usage on context switching. The change has been tested on queene service of Ads team, the lock-free version of ort (40 threads) saves CPU usage on gen8 (128 logical processors on 8 numa nodes) windows by nearly half, from 65% to 35%. For 32 cores, the curve is flat: Anubis, 32 vCPU, windows, hugging face models, 95 percentile E2E latency in ms: model \| mutex(ms) \| mutex-free --- \| --- \| --- alvert_base_v2 \| 34.21 \| 34.09 bert_large_uncased \| 116.27\| 117.84 bart_base \| 72.06 \| 71.99 distilgpt2 \| 25.43 \| 25.02 vit_base_patch16_224 \| 37.33 \| 37.76 Anubis, 32 vCPU win, Linux, 1st party models, 95 percentile E2E latency in ms: model \| mutex(ms) \| mutex-free --- \| --- \| --- deepthink_v2 \| 24.35 \| 22.95 bing_feeds \| 36.96 \| 36.48 deep_writes \| 14.46 \| 14.32 keypoints \| 9.34 \| 7.69 model11 \| 1.71 \| 1.66 model12 \| 1.82 \| 1.44 model2 \| 4.21 \| 3.95 model6 \| 1.08 \| 1.05 agiencoder \| 0.99 \| 0.93 geminet_transformer \| 5.32 \| 5.24 --------- Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2023-05-19 10:12:10 -07:00
cloudhan	856afa49dd	[C#] Add missing rocm csharp api (#15540 )	2023-05-18 08:15:19 +08:00
Baiju Meswani	6b7181d31d	Add C# API documentation for training (and some other changes) (#15935 )	2023-05-16 03:15:24 -07:00

1 2 3 4 5 ...

858 commits