Commit graph

858 commits

Author SHA1 Message Date
Vincent Wang
e6301eee6a
Bump Up Version to 1.17.0 (#17587)
Bump up version to 1.17.0 as the 1.16.0 release branch had been branched
out.
2023-09-20 11:02:58 +08:00
Dmitri Smirnov
fdb132643d
Remove redundant Resolve() after each inlined function (#17556)
### Description
Remove `Resolve()` on the entire graph as each function is resolved.
We retain `Resolve()` after each inlining iteration.

### Motivation and Context
Poor performance for inlining the model and session initialization.

Original model before Resolve() removal
FunctionTest.Profiling (**65953 ms**)
After Resolve() Removal
FunctionTest.Profiling (**2911 ms**)

RelWithDebInfo pre-inlined model. Presumably because it runs Level1
optimizers
Non-inlined model consists of functions and Level1 optimizers have no
effect.
FunctionTest.Profiling (**9851 ms**)
2023-09-15 12:13:37 -07:00
cao lei
32f5658abb
remove gsl to make status.h independent from gsl (#17402)
### Description
<!-- Describe your changes. -->
Make status.h independent from gsl.


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
In the coming new feature external EP API (see the prototype
https://github.com/microsoft/onnxruntime/pull/16718), we need to expose
stream in the public header, however, stream is dependent on status.h
which is dependent on gsl. We are seeking a way to decouple stream from
gsl.

From Changming's comment offline, prefast is disabled so all
GSL_SUPPRESS are not taking any effect now. He will handle the warnings
when enable prefast in the future
2023-09-13 21:47:43 -07:00
Yulong Wang
550293d9ad
OrtMemoryInfo: support new name "WebGPU_Buffer" (#17469)
### Description
Add new name "WebGPU_Buffer" to OrtMemoryInfo.

This is one of the prerequisites for supporting IO binding for WebGPU
buffer in onnxruntime-web.

list of prerequisites PRs:
#17465
#17469 (this one)
2023-09-08 16:37:35 -07:00
Xavier Dupré
024f1dd72b
Fix float 8 rounding on CPU (#16940)
### Description
Fix float 8 rounding issues discovered in issue #16938 (only CPU
provider).
2023-09-07 20:48:25 +02:00
RandySheriffH
6c39641ea2
Fix a memleak in RunAsync python (#17326)
Release ort value outputs that are created and released from
ort::run(...).

---------

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-08-30 12:54:17 -07:00
Artem Shilkin
6e60dba726
Fix compilation with newer flatbuffers (#17164)
In flatbuffers@v23.5.9 was broken forward declaration for
FlatBufferBuilder. Trying to compile onnxruntime falls with the
following error:
```
flatbuffers/include/flatbuffers/flatbuffer_builder.h:1420:38: error: typedef redefinition with different types ('FlatBufferBuilderImpl<false>' vs 'flatbuffers::FlatBufferBuilder')
typedef FlatBufferBuilderImpl<false> FlatBufferBuilder;
                                     ^
onnx_runtime/include/onnxruntime/core/graph/graph.h:47:11: note: previous definition is here
    class FlatBufferBuilder;
```
This PR removes these declarations and puts includes instead
2023-08-29 10:28:26 -07:00
pengwa
18d5cfdb85
Fix build - redefinition of default argument for ‘long unsigned int Extent’ (#17281)
### Fix build - redefinition of default argument for ‘long unsigned int
Extent’

One of the training customer env, building ORT, there is such a build
error. The GCC version are

```
aiscuser@node-0:/tmp/onnxruntime$ gcc --version
gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0


aiscuser@node-0:/tmp/onnxruntime$ g++ --version
g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0


```

But on our dev node using same GCC/G++, we don't have build issue., not
sure what's the difference but giving an explict type when creating
`gsl::span` fixed the problem.

```
/tmp/onnxruntime/build/Linux/RelWithDebInfo/_deps/gsl-src/include/gsl/span:394:7: error: redefinition of default argument for ‘long unsigned int Extent’
  394 | class span
      |       ^~~~
/tmp/onnxruntime/build/Linux/RelWithDebInfo/_deps/gsl-src/include/gsl/span_ext:46:51: note: original definition appeared here
   46 | template <class ElementType, std::size_t Extent = dynamic_extent>
      |                                                   ^~~~~~~~~~~~~~~
/tmp/onnxruntime/include/onnxruntime/core/common/span_utils.h:82:93: error: return type ‘class gsl::span<const std::byte>’ is incomplete
   82 | [[nodiscard]] inline gsl::span<const std::byte> AsByteSpan(const void* data, size_t length) {
      |                                                                                             ^
/tmp/onnxruntime/include/onnxruntime/core/common/span_utils.h: In function ‘void onnxruntime::AsByteSpan(const void*, size_t)’:
/tmp/onnxruntime/include/onnxruntime/core/common/span_utils.h:83:68: error: class template argument deduction failed:
   83 |   return gsl::span(reinterpret_cast<const std::byte*>(data), length);
      |                                                                    ^
/tmp/onnxruntime/include/onnxruntime/core/common/span_utils.h:83:68: error: no matching function for call to ‘span(const std::byte*, size_t&)’
/tmp/onnxruntime/build/Linux/RelWithDebInfo/_deps/gsl-src/include/gsl/span:740:1: note: candidate: ‘template<class Type, long unsigned int Extent> gsl::span(Type (&)[Extent])-> gsl::span<ElementType, FirstExtent>’
  740 | span(Type (&)[Extent]) -> span<Type, Extent>;
      | ^~~~
/tmp/onnxruntime/build/Linux/RelWithDebInfo/_deps/gsl-src/include/gsl/span:740:1: note:   template argument deduction/substitution failed:
/tmp/onnxruntime/include/onnxruntime/core/common/span_utils.h:83:68: note:   mismatched types ‘Type [Extent]’ and ‘const std::byte*’
   83 |   return gsl::span(reinterpret_cast<const std::byte*>(data), length);
      |                                                                    ^
/tmp/onnxruntime/build/Linux/RelWithDebInfo/_deps/gsl-src/include/gsl/span:743:1: note: candidate: ‘template<class Type, long unsigned int Size> gsl::span(std::array<_Tp, _Nm>&)-> gsl::span<ElementType, FirstExtent>’
  743 | span(std::array<Type, Size>&) -> span<Type, Size>;
      | ^~~~
/tmp/onnxruntime/build/Linux/RelWithDebInfo/_deps/gsl-src/include/gsl/span:743:1: note:   template argument deduction/substitution failed:
/tmp/onnxruntime/include/onnxruntime/core/common/span_utils.h:83:68: note:   mismatched types ‘std::array<_Tp, _Nm>’ and ‘const std::byte*’
   83 |   return gsl::span(reinterpret_cast<const std::byte*>(data), length);
      |                                                                    ^
```



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-08-25 00:40:40 +08:00
Scott McKay
b3cb775cf9
Two fixes involving minimal builds (#17000)
### Description
<!-- Describe your changes. -->
- allocation planner was breaking if graph had no nodes
- in this particular model a branch of an If node returned an outer
scope value directly.

- if model used non-tensor types and sparse tensors are disabled the
call to IsSpareTensor causes an exception when prematurely terminates
the code.
- it's perfectly fine to check if a value is a sparse tensor when
support for them is disabled. we just can't do anything with that
OrtValue which is what the current ifdef's after the call to
IsSparseTensor handle.




### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Fix model execution failure for partner with model that uses sequences
in a minimal build with sparse tensors disabled.
2023-08-23 16:01:22 +10:00
Edward Chen
ae62d752d6
Prevent GSL_SUPPRESS arguments from being modified by clang-format (#17242)
Prevent `GSL_SUPPRESS` arguments from being modified by clang-format and update existing usages.

clang-format was changing something like `GSL_SUPPRESS(r.11)` to `GSL_SUPPRESS(r .11)`.

For some compilers (e.g., clang), the `gsl::suppress` attribute takes a quoted string argument. We don't want to insert spaces there.
2023-08-22 18:26:53 -07:00
Edward Chen
d6cd41cfc1
[CoreML EP] Add Shape, Gather, and Slice ops (#17153)
Add CoreML EP shape related ops:
- Shape
- Gather
- Slice

Add support for int64/int32 inputs in CoreML EP.
2023-08-18 22:34:34 -07:00
Dmitri Smirnov
5c54b64a63
Create NodeArgs for all Constant nodes and initializers for functions being inlined (#17089)
### Description
When functions are inlined and constant nodes are being converted to
initializers, we need to create NodeArg for them.
Similar for inlined function subgraph, but we choose to give priority to
non-constant nodes and then fill the gaps with constant and
initializers.

### Motivation and Context
This addresses issue
https://github.com/microsoft/onnxruntime/issues/16813 for
`eca_halonext26ts_mod.onnx` model where it fails to remove unused
initializer because `NodeArg` was not created for it.
2023-08-17 14:22:28 -07:00
Changming Sun
5249b7ab7c
Re-implement stacktrace (#17173)
### Description
Re-implement stacktrace. The new implementation doesn't directly use
Windows API, hence can avoid problems regarding to
initialize/uninitialize the dbghelp library.

### Motivation and Context
2023-08-16 16:07:49 -07:00
RandySheriffH
3dd2c1b4d7
EP context for custom op (#16454)
Implement infrastructures to allow EP resources surfaced to custom ops.

---------

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-08-16 13:03:40 -07:00
Yulong Wang
9cd4e5af68
[wasm] upgrade emsdk to 3.1.44 (#17069)
### Description
This change upgrade emsdk to 3.1.44.

Because backend is upgraded to LLVM 16, so need to fix a lot of build
failures caused by "-Wshorten-64-to-32".

most of the build failures comes from generated `onnx.pb.h`, and this
can be fixed by including "core/graph/onnx_protobuf.h", which detects
and ignore shorten-64-to-32 warnings.
2023-08-10 16:08:36 -07:00
Chi Lo
7361c283c7
Add API for updating CUDA EP provider option user compute stream (#17037)
Add a generic `UpdateCUDAProviderOptionsWithValue()` C API to update
CUDA EP provider options where its data type is pointer that can't be
represented by string.

Note: Please see some comments for the similar [PR
](https://github.com/microsoft/onnxruntime/pull/16965)for TRT EP.
2023-08-09 09:24:19 -07:00
Chi Lo
fc8003349e
Add API for updating TRT EP provider option user compute stream (#16965)
Add a generic `UpdateTensorRTProviderOptionsWithValue()` C API to update
TensorRT provider options where its data type is pointer that can't be
represented by string.
2023-08-04 15:14:43 -07:00
Edward Chen
f98d3f8a23
[CoreML EP] Enable inputs with dynamic shape (#16915)
Enable node inputs with dynamic shape to be handled by the CoreML EP.
2023-08-03 18:15:00 -07:00
satyajandhyala
dd24d52737
[JS/Web] Added Gelu contrib operator support to JSEP (#16909)
### Description
Added Gelu operator to JSEP


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-07-31 09:18:58 -07:00
Dmitri Smirnov
bf006d34a9
Used feature macro for if constexpr in a public header (#16836)
### Description
Use feature macro for `if constexpr`

### Motivation and Context
We still do not require customers to use C++17 compiler.
2023-07-25 21:42:30 -07:00
kunal-vaishnavi
b7176f9826
Fix bug with saving model optimized by inference session (#16716)
### Description
A [previous PR](https://github.com/microsoft/onnxruntime/pull/16531)
added a temporary directory to save the model optimizations after
loading a model into an `InferenceSession`. Many models that have an
external data file, however, require the data file to be in the same
directory as the ONNX model file. Because the model is saved in a
temporary directory and the data is saved in another directory, this
causes a `FileNotFoundError` error when trying to load the model in the
temporary directory.

This PR fixes this error by saving the external data file in the same
directory that the optimized model is located in.

### Motivation and Context
This PR fixes a bug with using a temporary directory while running the
optimizer for models that have an external data file.
2023-07-20 18:44:28 -07:00
Xavier Dupré
2bc9fbb621
Fix url in the code documentation (graph optimizations) (#16770)
### Description
Fix a wrong url in the documentation as mentioned in issue #16678.



### Motivation and Context
Better documentation.
2023-07-20 07:02:22 -07:00
Dmitri Smirnov
e752cbe7f2
Work on eliminating Internal Compiler Error (#16741)
### Description
<!-- Describe your changes. -->
Replace the offending bitwise `operator |` with if() logic for ARM.
2023-07-18 10:17:52 -07:00
cloudhan
a45b834722
Fix warning about uninitialized member (#16736)
#16506 Cause almost every translation units on linux complaint

```
[1175/1235] Building CXX object CMakeFiles/onnxruntime_test_all.dir/home/guangyunhan/onnxruntime/orttraining/orttraining/test/training_ops/cuda/softmax_test.cc.o
In file included from /home/guangyunhan/onnxruntime/include/onnxruntime/core/framework/float16.h:18,
                 from /home/guangyunhan/onnxruntime/include/onnxruntime/core/framework/data_types.h:17,
                 from /home/guangyunhan/onnxruntime/include/onnxruntime/core/framework/tensor.h:17,
                 from /home/guangyunhan/onnxruntime/onnxruntime/test/common/tensor_op_test_utils.h:16,
                 from /home/guangyunhan/onnxruntime/onnxruntime/test/providers/compare_provider_test_utils.h:7,
                 from /home/guangyunhan/onnxruntime/orttraining/orttraining/test/training_ops/cuda/softmax_test.cc:4:
/home/guangyunhan/onnxruntime/include/onnxruntime/core/session/onnxruntime_float16.h: In instantiation of ‘static constexpr uint16_t onnxruntime_float16::Float16Impl<Derived>::ToUint16Impl(float) [with Derived = onnxruntime::MLFloat16; uint16_t = short unsigned int]’:
/home/guangyunhan/onnxruntime/include/onnxruntime/core/framework/float16.h:42:66:   required from here
/home/guangyunhan/onnxruntime/include/onnxruntime/core/session/onnxruntime_float16.h:241:7: note: ‘union onnxruntime_float16::detail::float32_bits’ has no user-provided default constructor
  241 | union float32_bits {
      |       ^~~~~~~~~~~~
/home/guangyunhan/onnxruntime/include/onnxruntime/core/session/onnxruntime_float16.h:242:16: note: and the implicitly-defined constructor does not initialize ‘unsigned int onnxruntime_float16::detail::float32_bits::u’
  242 |   unsigned int u;
      |                ^
```

This PR shut the compiler up.
2023-07-17 11:33:54 -07:00
Dmitri Smirnov
b8c40b7813
Fix parameter naming that fails Doc generation. (#16717)
### Description
Rename `FromBits` param name to match the docs.

### Motivation and Context
Fix API Doc generation.
2023-07-16 22:02:05 -07:00
RandySheriffH
e1ca8ee6d4
RunAsync C/CXX API (#16613)
Implement RunAsync API - the session will run in a thread of intra-op
thread pool.

---------

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-07-16 16:51:40 -07:00
Dmitri Smirnov
853c4ff0a5
[C#, CPP] Introduce Float16/BFloat16 support and tests for C#, C++ (#16506)
### Description
Introduce `Float16/BFloat16` support for C# and C++ APIs.
User should be able to perform conversions from `float` to/from
`Float16/BFloat16`, compare values and tests for `NaN, Inifnity, and
whether the number is denormalized.`

### Motivation and Context
User filed issues such as:
https://github.com/microsoft/onnxruntime/issues/14303
2023-07-14 10:46:52 -07:00
cao lei
329e8156d4
clean unused parameter in ORT_UNUSED_PARAMETER (#16538)
### Description
clean unused parameter in ORT_UNUSED_PARAMETER


### Motivation and Context
clean unused parameters in ORT_UNUSED_PARAMETER which are introduced
from #15833
2023-07-07 13:20:36 -07:00
Edward Chen
6be7b03e53
Enable -Wshorten-64-to-32 warning if available. (#16524)
- Fix some warnings from Xcode build (`-Wshorten-64-to-32`).
- Enable `-Wshorten-64-to-32` warning if available. Currently it's not fully enabled for `onnxruntime_test_all` and `onnxruntime_providers_xnnpack` yet.
- Some clean up in build.py including setting CMake generator more consistently.
2023-07-07 08:11:44 -07:00
Xavier Dupré
d906d48ae9
Support custom ops taking float 8 tensors as inputs and outputs (#16323)
### Description
C API for custom ops does not support float 8 types. This PR changes
that.



### Motivation and Context
The list of operators supporting float 8 is very limited. It should be
extended to custom ops to let developpers add customized operators for
these specific types.
2023-07-06 14:36:06 +02:00
cao lei
0c5f492493
remove AllocatorMgr class (#16509)
### Description
Remove AllocatorManager class


### Motivation and Context
After the refactor PR #15833 is in, AllocatorManager class is not
referenced anymore.
2023-06-28 15:43:19 -07:00
Baiju Meswani
efeb6672d6
Temporary optimizer support for ort format models in non minimal build (#16485) 2023-06-28 11:35:57 -07:00
Christian Bourjau
6dd4e4801a
Allow custom operator functions to safely propagate errors through the C-API (#16479)
### Description
This PR implements a backward-compatible way to define custom operators
with fallible compute functions. The C++ API templated gained an
optional `Fallible` argument. Closes #14287

### Motivation and Context
#14287 contains more context. The gist is that the current C-API defines
compute operations of custom operators as functions returning `void`
rather than an `OrtStatusPtr`. Currently, errors are often propagated
across the C-ABI using C++ exceptions. That is very unsafe and undefined
behavior. Moreover, it is difficult for languages other than C++ to use
this approach even if they wanted to. A C-compliant sound and safe way
to propagate errors allows for non-C++ fallible custom operators.

### An example in action
https://github.com/cbourjau/ort-custom-op/pull/6/files is a
demonstration of how this PR can be used to write safe and fallible
custom operators in Rust.
2023-06-28 08:16:32 -07:00
Pranav Sharma
a270d8407e
Allow saving of large models after optimization (github issue 12882) (#16440)
### Description
Allow saving of large models after optimization.

### Motivation and Context
Addresses https://github.com/microsoft/onnxruntime/issues/12882
2023-06-21 22:46:26 -07:00
Chi Lo
4e3cff60fd
CUDA graph support for TRT EP (#16081)
CUDA EP already supports [CUDA
graph](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#cuda-graphs),
also we observed some models can benefit from using CUDA graph with
`trtexec`. Therefore, this PR enables the CUDA graph support for TRT EP.

The implementation is based on
https://github.com/microsoft/onnxruntime/pull/9978 with the same
[constraints](https://github.com/microsoft/onnxruntime/pull/9978) as
below:

- Models with control-flow ops (i.e. If, Loop and Scan ops) are not
supported.
- Usage of CUDA Graphs is limited to models where-in all the model ops
(graph nodes) can be partitioned to the TRT EP.
- The input/output types of models need to be tensors.
- Shapes of inputs/outputs cannot change across inference calls.
- IObinding is required.
2023-06-21 09:36:45 -07:00
Yuhong Guo
48e6186b1a
Move tests from core/providers/cuda/test/* to test/providers/cuda/ and refactor CUDA UT (#16161)
### Description
<!-- Describe your changes. -->

1. Add a new test lib `onnxruntime_providers_cuda_ut` which is similar
to `onnxruntime_providers_cuda` but `onnxruntime_providers_cuda_ut` is
only built if `onnxruntime_BUILD_UNIT_TESTS` is set. We can call all
CUDA UTs through this ut lib without affecting production lib
`onnxruntime_providers_cuda`.
2. Move all test cases from `core/providers/cuda/test/` to
`test/providers/cuda/`. These test cases are built into lib
`onnxruntime_providers_cuda_ut` and run by `./onnxruntime_test_all
--gtest_filter="*CUDA_EP_Unittest*"`. Since the lib is only for test, we
can use gtest macros in the test cases. Previous implementation do not
support using gtest lib in the CUDA UT cases.
3. The cmake code in `cmake/onnxruntime_providers.cmake` is refactored a
bit. A new function `onnxruntime_add_object_library` is to build a
object target. The 2 libs `onnxruntime_providers_cuda_ut` &
`onnxruntime_providers_cuda` share most of the code, so the object files
can be used in both libs, which helps reduce build time. Another
function `config_cuda_provider_shared_module` is used to configure all 3
similar
targets(onnxruntime_providers_cuda_obj/onnxruntime_providers_cuda/onnxruntime_providers_cuda_ut).
4. Refactored the test to call `testing::InitGoogleTest` &
`RUN_ALL_TESTS` in `libonnxruntime_providers_cuda_ut.so`'s `TestAll`.
After this change, we can see all the cases running in
`CUDA_EP_Unittest.All`:

![image](https://github.com/microsoft/onnxruntime/assets/19584326/8ff80df6-060b-4ef0-90b7-657e68d3db87)




### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

After https://github.com/microsoft/onnxruntime/pull/13016, there are
still test files in test/providers/cuda/ that are not moved to
core/providers/cuda/test/ and the test cases are disabled. This PR helps
to clean the unfinished TODOs.

Even through onnxruntime_shared_lib_test covers some test for CUDA
provider. onnxruntime_shared_lib_test works like a coarse grain
end-to-end test for CUDA provider. If CUDA unittest can run cases for a
single component, this wound be helpful for CUDA developers.

---------

Co-authored-by: Yuhong Guo <yuhong.gyh@antgroup.com>
2023-06-20 14:54:55 -07:00
cao lei
dd72192cf4
ExecutionProvider API refactor - move allocator from EP level to SessionState level and indexed by OrtDevice (#15833)
### Description
This PR is to refactor ExecutionProvider API for memory management,
which is to move allocators from EP level to SessionState level and
indexed by OrtDevice



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
This PR is to refactor ExecutionProvider API for memory management,
which is to move allocators from EP level to SessionState level and
indexed by OrtDevice. By this change, EP level will shift the burden of
maintaining allocators, which will be user friendly for EP developers

---------

Co-authored-by: Lei Cao <leca@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2023-06-19 17:44:45 -07:00
Changming Sun
5754cd7d1d
Add fp16 support to CPU EP gemm op (#15506) 2023-06-15 14:38:17 -07:00
Changming Sun
b72fe664c1
Refactor prepack buffer code (#16280)
### Description
1. Use IAllocatorUniquePtr to replace BufferUniquePtr. It will ensure
the deleter is always right.
2. Change some std::unique_ptr to std::optional
3. Bypass Arena allocator when allocating the prepack buffers for mlas.
In this special case, Arena doesn't help any. And this change is just an
internal implementation change, it doesn't affect our public interface.
2023-06-08 14:42:02 -07:00
Dmitri Smirnov
908e940660
[CPP Api] Remove deprecated CustomOp API (#16256)
### Description
Custom Op API has been deprecated in 1.15 release. We are removing it.
2023-06-07 14:03:13 -07:00
PeixuanZuo
1b518c6836
[ROCm] add early stop to tunable profile progress (#15716)
For TunableOp, some instance may has very bad performance and it will
take a long time during profile process.
Add `tunable_op_max_tuning_duration_ms` parameter to limit max tuning
time.
2023-06-01 10:18:25 +08:00
Xavier Dupré
e726151b5c
Introduce float 8 types (#14731)
### Description
The PR implements FloatE4M3FN, FloatE5M2, FloatE4MEFNUZ, FloatE5M2FNUZ
as described in PR https://github.com/onnx/onnx/pull/4805. It uses CUDA
API to cast float/half to float8 if CUDA>=11.8, a custom implementation
if CUDA<11.8.

* It implements, Cast, QuantizeLinear, DequantizeLinear for all types on
CPU, only for types FloatE4M3FN, FloatE5M2 on CUDA.
* It extends the supported types for control flow operator, Shape,
Reshape, Identity, If, Loop, Scan, Reshape
* It implements Equal(19).
* Cast, QuantizeLinear, DequantizeLinear operators now support a
parameter `saturate` only valid for float 8 types. It is true by
default. In that case, any value out of range is converted into the
maximum float 8 value. If false, it is infinite.
* QuantizeLinear, DequantizeLinear now supports multiple scales on CUDA
(and ROCm by extension), scale = 1D tensor with one scale per channel

### Motivation and Context
Supports latest onnx version.

Fixes
[AB#15395](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/15395)

---------

Co-authored-by: Xavier Dupre <xadupre@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
Co-authored-by: Randy Shuai <rashuai@microsoft.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: Scott McKay <Scott.McKay@microsoft.com>
2023-05-30 13:25:58 -07:00
Dmitri Smirnov
9939092e71
[CPP API]Fix constness in C++API (#16103)
### Description
`CreateMap` and `CreateSequence` should be able to take in const data.
2023-05-26 14:09:00 -07:00
Changming Sun
a5410515ad
Fix: Some fields in OrtCUDAProviderOptionsV2 struct are not initialized (#16113)
### Description
The file include/onnxruntime/core/providers/cuda/cuda_provider_options.h
is a C++ file. It is not for C.

Before this commit, this header file is already not compatible with C compilers. Because it has:
```
onnxruntime::ArenaExtendStrategy arena_extend_strategy;
```

And this file is intended to be internal only. It is an internal header file. It should not be included in onnxruntime_c_api.h and should not be used with the public C APIs. User can only get the instance of OrtCUDAProviderOptionsV2 via CreateCUDAProviderOptions. In such a way we can add new members to this struct without breaking binary compatibility.
Since it is an internal header, we can safely use C++ grammar there.
2023-05-26 11:34:22 -07:00
Yuhong Guo
04a8f50674
New configuration to limit the arena extension (#15983)
Add a configuration `max_power_of_two_extend_bytes ` to limit the arena extension size.


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
In our real scenario, we observe that if the model is big enough the
BfcArena will extend uncontrollable.
As showed by the following figures, if a model uses more than 16GB
memory, the BfcArena will totally apply for 32GB memory according to the
`kNextPowerOfTwo` strategy. With the new strategy, the extension is
limited. The default maximum extension size is 1GB.

#### Without the new configuration
After loading the model, ORT uses 32G GPU memory.

![image](https://github.com/microsoft/onnxruntime/assets/19584326/42b93c66-b957-4f20-a13b-d34cb390afff)

#### With the new configuration
After loading the model, ORT uses 23G GPU memory.

![image](https://github.com/microsoft/onnxruntime/assets/19584326/5abffeff-9ca3-4187-a262-37fd2764fe1b)

Co-authored-by: Yuhong Guo <yuhong.gyh@antgroup.com>
2023-05-25 02:19:07 -07:00
Adrian Lizarraga
efc84a43e8
[QNN EP] Add session option to disable fallback to default CPU EP (#16016)
### Description
Adds the session config option `disable_cpu_ep_fallback` to allow the
user to prevent the CPU EP from handling
nodes not supported by other execution providers.

```C++
// Graph nodes that are not supported by the execution providers (EPs) explicitly added to the session are
// assigned (i.e., "fallback") to the CPU EP by default.
//
// This option allows the user to disable the fallback of unsupported graph nodes to the CPU EP.
// If this option is set to "1", session creation will fail if the execution providers other than the CPU EP cannot
// fully support all of the nodes in the graph.
//
// It is invalid to set this option and explicitly add the CPU EP to the session. In this case, session creation
// will also fail with an error.
//
// Option values:
// - "0": CPU EP fallback is not disabled. [DEFAULT]
// - "1": CPU EP fallback is disabled.
static const char* const kOrtSessionOptionsDisableCPUEPFallback = "session.disable_cpu_ep_fallback";
```

#### Example use
```C++
#include "core/session/onnxruntime_cxx_api.h"
#include "core/session/onnxruntime_session_options_config_keys.h"

int main(int argc, char** argv) {
    Ort::SessionOptions so;
    so.AddConfigEntry(kOrtSessionOptionsDisableCPUEPFallback, "1");  // Disable fallback to the CPU EP.

    onnxruntime::ProviderOptions options;
#if defined(_WIN32)
    options["backend_path"] = "QnnCpu.dll";
#else
    options["backend_path"] = "libQnnCpu.so";
#endif

    so.AppendExecutionProvider("QNN", options);

    const ORTCHAR_T* ort_model_path = ORT_MODEL_FOLDER "qnn_ep_partial_support.onnx";
    Ort::Session session(*ort_env, ort_model_path, so);  // Throws exception if nodes fallback to CPU
    // ...
```

### Motivation and Context
Makes it easier for application developers to ensure that the entire
model runs on specific EPs. This is critical for Qualcomm/scenarios. If
the compute cannot be offloaded to the NPU, running on CPU is not
acceptable. (could be the difference between 90 second inference and 6
seconds inference)

---------

Co-authored-by: Pranav Sharma <prs@microsoft.com>
2023-05-23 17:56:32 -07:00
Hector Li
4324d2173b
[QNN EP] Enable Qnn context cache to save model initialization time (#15815)
### Description
Enable Qnn Context cache feature to save model initialization time
Provider options:
qnn_context_cache_enable|1 to enable the cache feature
qnn_context_cache_path to set the cache path. It is set to model_file.onnx.bin by default.

### Motivation and Context
Model initialization time takes long because the cost of conversion from Onnx model to Qnn model. Qnn have feature to serialize the Qnn context to file, then next time user can load it from the cache context and execute the graph to save the cost.

---------

Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com>
2023-05-19 10:52:17 -07:00
RandySheriffH
4dfb89b3ad
Implement mutex-free spin lock for task queue (#14834)
Implemented "lock-free" spinlock to save CPU usage on context switching.
The change has been tested on queene service of Ads team, the lock-free
version of ort (40 threads) saves CPU usage on gen8 (128 logical
processors on 8 numa nodes) windows by nearly half, from 65% to 35%.

For 32 cores, the curve is flat:

Anubis, 32 vCPU, windows, hugging face models,
95 percentile E2E latency in ms:

model | mutex(ms) | mutex-free
--- | --- | ---
 alvert_base_v2 | 34.21 | 34.09
 bert_large_uncased | 116.27| 117.84
 bart_base | 72.06 | 71.99
 distilgpt2 | 25.43 | 25.02
 vit_base_patch16_224 | 37.33 | 37.76

Anubis, 32 vCPU win, Linux, 1st party models,
95 percentile E2E latency in ms:

model | mutex(ms) | mutex-free
--- | --- | ---
deepthink_v2 | 24.35 | 22.95
bing_feeds |  36.96 | 36.48
deep_writes |  14.46 | 14.32
keypoints |  9.34 | 7.69
model11 |  1.71 | 1.66
model12 |  1.82 | 1.44
model2 |  4.21 | 3.95
model6 |  1.08 | 1.05
agiencoder |  0.99 | 0.93
geminet_transformer |  5.32 | 5.24

---------

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-05-19 10:12:10 -07:00
cloudhan
856afa49dd
[C#] Add missing rocm csharp api (#15540) 2023-05-18 08:15:19 +08:00
Baiju Meswani
6b7181d31d
Add C# API documentation for training (and some other changes) (#15935) 2023-05-16 03:15:24 -07:00