onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-18 18:52:16 +00:00

Author	SHA1	Message	Date
Dmitri Smirnov	81a763a9eb	Make TensorShapeVector use InlinedVector<int64_t> to reduce template instantiations (#18519 ) ### Description Use InlinedVector<int64_t> instead of InlinedVector<int64_t, 5> to reduce the number of template instantiations. ### Motivation and Context The reported size reduction is small, only a few KB. Just trying it out.	2023-11-21 14:13:50 -08:00
Sheil Kumar	2a01622536	Hide NPU Adapter selection behind macro (#18515 ) Hide NPU Adapter selection behind macro --------- Co-authored-by: Sheil Kumar <sheilk@microsoft.com>	2023-11-21 08:47:56 -08:00
RandySheriffH	53917a3353	Move up members in Lite Custom Op hierarchy for possible memleaks. (#18478 ) Move data member in LiteOpFunc to its parent to avoid possible mem leaks. --------- Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2023-11-18 15:00:54 -08:00
Edward Chen	0a4d76d98b	MLAS AArch64 quantized int4 Gemm kernel (#18031 ) - Implement MLAS function for quantized 4-bit int Gemm (Gemm with float A and quantized 4-bit int B) for ARM NEON. This is an initial implementation. Only the M=1 path (with M being number of rows of A and C) has any optimization attempted so far. More optimization to come in future PRs. - Connect MatMulNBits contrib op to MLAS function.	2023-11-15 09:31:54 -08:00
Dmitri Smirnov	f19c673595	If Branch Constant Folding (#18105 ) ### Description When an `If` condition proves to be a constant value, inline the corresponding subgraph, which enables further constant folding and optimization. ### Motivation and Context Newly converted models feature lots of nested `If` nodes that can be inlined and collapsed. In particular, for the sample models we are gaining on TorchScript exported models. For `HF Mobile Bert Dynamo`, runtime went down from 0.069 -> 0.046. In total, AOT inlining + `If` constant folding yields an improvement of about 50%, 0.102 -> 0.046, bringing us very close to TorchScript exported models. `HF Bart Dynamo` further improves 0.668 -> 0.45. AOT + `If` constant folding improves 0.98 -> 0.45. Earlier, the HF Mobile Bert model was 161 MB+; it is now 98 MB. The HF Bart Dynamo pre-optimized model was about 1.2 GB; it is now 710 MB. ![image](https://github.com/microsoft/onnxruntime/assets/11303988/1491a247-d371-4e66-85a3-2aeb702e8ca0)	2023-11-13 17:33:30 -08:00
RandySheriffH	646f77a94b	Align context virtuals (#18396 ) Deprecate ROCM context virtual function, to align with CUDA. Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2023-11-11 12:41:37 +10:00
RandySheriffH	59262dfc63	Add cuda context headers to zip (#18330 ) Expose cuda context headers for cuda custom ops. --------- Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2023-11-09 14:53:58 -08:00
Ted Themistokleous	8d50313816	[Migraphx EP] Static int8 QDQ support (#17931 ) ### Description Adding static int8 quantization support for MIGraphX Execution Provider - Allows for parsing in calibration tables generated by Onnxruntime or TensorRT's toolsets - Add proper environment variables into the MIGraphX EP - Update python API to include updating execution provider flags -> was missing on python side - Hook into MIGraphX's int8 quantization and optimization of models ### Motivation and Context Required so that we can get onnxruntime to pass in models while leveraging the existing tooling for int8 static QDQ quantization. First step in a series of PRs which will add further static quantization on the operator level as MIGraphX releases further support. These changes drew heavily from the TensorRT EP and should allow for similar functionality for GPU based (versus CPU) quantization of models before an inference is performed. --------- Co-authored-by: Ted Themistokleous <tthemist@amd.com> Co-authored-by: Ted Themistokleous <tedthemistokleous@amd.com>	2023-11-09 17:46:49 +08:00
Hector Li	55c19d6ab5	[QNN EP] Enable option to set QNN context priority (#18315 ) ### Description Enable option qnn_context_priority to set QNN context priority, options: "low", "normal", "normal_high", "high". This feature prioritizes inference for the context with the higher priority. Tested with the onnxruntime_perf_test tool using the same model. 1. Run the model on the NPU with a single instance: the latency is 300 ms. 2. Run the same model on the NPU with 2 instances at the same time. Case 1: both with the same priority (high): latency is 600 ms. Case 2: one with low priority: latency is 30,000 ms; one with high priority: latency is 300 ms. Case 3: one with normal priority: latency is 15,000 ms; one with high priority: latency is 300 ms.	2023-11-08 20:56:36 -08:00
Justin Chu	c250540722	Bump linter versions (#18341 ) Bump linter versions and run format.	2023-11-08 13:04:40 -08:00
Adrian Lizarraga	a0eeeafa80	[QNN EP] Session option for graph optimization (#18262 ) ### Description Adds the QNN session option `htp_graph_finalization_optimization_mode` to enable QNN graph optimizations at the expense of longer preparation time. ### Motivation and Context Allow enabling QNN graph optimizations per app/model.	2023-11-08 10:06:15 -08:00
Preetha Veeramalai	d87216bcb1	Openvino ep ort 23.1 (#17911 ) ### Description Integration to OpenVINO 2023.1 ### Motivation and Context - Alignment with latest OpenVINO Version. - Device name change from VPUX to NPU and Remove from supported list until official public support is available. --------- Co-authored-by: Sahar Fatima <sfatima.3001@gmail.com> Co-authored-by: Saurabh Kale <saurabh1.kale@intel.com> Co-authored-by: Suryaprakash Shanmugam <suryaprakash.shanmugam@intel.com> Co-authored-by: sfatimar <sahar.fatima@intel.com>	2023-11-01 08:39:39 -07:00
RandySheriffH	2b95e74fa1	Versioning for custom op (#18088 ) Allow custom ops to have versions. --------- Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2023-10-31 16:50:27 -07:00
Maximilian Müller	2eeafc37bc	Enable global TRT timing cache (#17865 ) I am adding a new `trt_timing_cache_path` option. Internally it is handled as `global_cache_path_` and will be set via a fall through approach: 1. no path provided => workdir 2. `trt_engine_cache_path` provided but no `trt_timing_cache_path` => `trt_engine_cache_path` 3. `trt_timing_cache_path` provided => `trt_timing_cache_path` (if not provided `trt_engine_cache_path` will still be workdir) ### Motivation and Context A TRT timing cache can be reused across multiple models as it only holds kernel timings and it is common that network "patterns" are reused. This can accelerate build times a lot. --------- Co-authored-by: Carson M <carson@pyke.io>	2023-10-27 09:23:19 -07:00
Patrice Vignola	538e97cbda	[DML EP] Add dynamic graph compilation (#17876 ) Historically, DML was only able to fuse partitions when all sizes are known in advance or when we were overriding them at session creation time. But in practice, it should be possible to compile partitions at compute time if the caller knows that the dimensions won't be changed for every inference (e.g. resizing a webcam window, or padding the input to powers of 2). This graph will be cached and reused until the sizes change. This is an opt-in option gated under the `enable_dynamic_graph_fusion` option, which means that it will only be enabled when the caller requests it since they have more context on how their model will be called between inferences. This PR also adds the option to disable metacommands from the python API, which is an option for the C API but was lacking for python.	2023-10-25 19:56:16 -07:00
liqun Fu	efa0cc2562	implement isinf20 and isnan20 (#17874 )	2023-10-24 10:58:54 -07:00
Dmitri Smirnov	2c50b75a26	Functions Ahead Of Time inlining (#17764 ) ### Description Inline functions in an EP aware fashion. The result of this PR is that models that have been inlined by the ONNX inliner and optimized, and models that have been AOT inlined, appear to be visually identical. For tests I used two models. The only difference is the resulting size because the ONNX inliner removes local function definitions and AOT does not. Difference in sizes for `HF Mobile` model was 2.5 MB, and for `HF Bart` it was ~500K. It seems that the resulting model size affects the load time more than the actual optimizations. In general, the inlined models grow in size very fast and can easily exceed the 2 GB limit. Q. Should we make AOT optional? `If` constant folding and the removal of local inlined models will be coming in other PRs. Some stats: ![image](https://github.com/microsoft/onnxruntime/assets/11303988/fcb4c815-2e06-4574-8d96-5a0a727d1ecf)	2023-10-23 17:42:20 -07:00
RandySheriffH	009cd4ea2e	Allow cuda custom ops allocate deferred cpu mem (#17893 ) Expose a new allocator from cuda stream. The allocator manages deferred cpu memory which only get recycled before stream destruction. --------- Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2023-10-20 16:12:21 -07:00
Maximilian Müller	7c17e33c07	Make CUDA a NHWC EP (#17200 ) ### Description CUDA inference speed relies heavily on Tensor Cores. To achieve optimal throughput, Tensor Cores require the data layout to be NHWC rather than NCHW. ### Motivation and Context This is especially important for convolutional networks. I will illustrate this using a very simple network:

```
import torch
import torch.nn as nn


class Net1(nn.Module):
    def __init__(self):
        super(Net1, self).__init__()
        # Five stacked convolutions: one 5x5 kernel followed by four 3x3 kernels
        self.m = nn.ModuleList([
            nn.Conv2d(in_channels=8, out_channels=32, kernel_size=5, stride=1),
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=1),
            nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, stride=1),
            nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, stride=1, bias=False),
            nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, stride=1, bias=False),
        ])

    def forward(self, x):
        for module in self.m:
            x = module(x)
        return x


if __name__ == "__main__":
    dtype = torch.half
    device = "cuda"
    dummy_input = torch.randn(8, 8, 512, 512, dtype=dtype, device=device)
    model = Net1().to(dtype=dtype, device=device)
    input_names = ["input1"]
    output_names = ["output1"]
    torch.onnx.export(model, dummy_input, "test.onnx", input_names=input_names, output_names=output_names)
```

I profiled the launch of `./build/RelWithDebInfo/onnxruntime_perf_test -e cuda -I -q -t 5 test.onnx` using sys and nvtx ranges. Current master launches the kernels below: ![image](https://github.com/microsoft/onnxruntime/assets/44298237/81655fce-0f8e-4f78-9335-b858a8c8977b) With the newly introduced `-l` flag, we see the kernels below: ![image](https://github.com/microsoft/onnxruntime/assets/44298237/fceb5d6f-c12d-442b-b15a-948797630008) Notice the missing NCHW<>NHWC kernels per operation. The layout optimizer introduced a transpose op as the first and last op of the whole network. The `op_generic_tensor_kernel` shows the bias used, which should also be optimized out next. Measured across some very basic models:

| CUDA EP | NCHW [ms] | NHWC [ms] | Speedup |
|:------------------------|----------------:|-------------------:|--------:|
| | -e cuda -t 5 -q | -e cuda -t 5 -q -l | |
| resnet101-v2-7_bs8_fp16 | 18.33 | 13.07 | 1.4 |
| resnet101-v2-7_bs8 | 21.8 | 12.06 | 1.81 |
| test | 102.07 | 73.62 | 1.39 |

Average speedup: 1.53 ## Outlook Next, the mission will be to first write a templated unit test to check the correctness of NHWC vs NCHW ops. After that, we have to transition more ops to measure perf improvements on a broader range of models. Currently this is not easily possible as we do not support all ops in the NHWC domain. --------- Co-authored-by: Tianlei Wu <tlwu@microsoft.com>	2023-10-16 10:16:37 -07:00
RandySheriffH	c6c3555d0e	Custom op shape inference API (#17737 ) Add c/cxx API to allow custom ops do shape inference. --------- Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2023-10-13 12:57:42 -07:00
Zhang Lei	762703e037	Support output cross qk, dtw and more for whisper model (#17500 ) Support cross qk in beam search for whisper model and related features Make whisper exporting tools support cross qk and some related features, * extra_decoding_ids * no_speech_prob Implement DTW kernel, unfold tensor kernel with unit test Several fix related with multiple session running parallel, like: * guard multihead_attention, fused_fp16_runner_ * some memory allocation with stream awareness * add use_ep_level_unified_stream option	2023-10-13 11:47:15 -07:00
Numfor Tiapo	b8f373b0ae	Add API for NPU Device Selection in the DML EP (#17612 ) Co-authored-by: Sheil Kumar <sheilk@microsoft.com>	2023-10-11 14:53:00 -07:00
Hector Li	385fab5bae	[QNN EP] Qnn cache improvement (#17757 ) ### Description Improve the QNN context binary cache feature to reduce the memory overhead and initialization time overhead. Instead of dumping a Qnn context binary file with metadata as header, we dump a Onnx format file with metadata inside Onnx node. ### Motivation and Context reduce the memory overhead and initialization time overhead	2023-10-06 15:56:33 -07:00
Chi Lo	569876fb16	[TensorRT EP] Refactor OrtTensorRTProviderOptions initialization and make it easy to add new field (#17617 ) Two major modifications of this PR: 1. Refactor OrtTensorRTProviderOptions initialization and make it easy to add new field. 2. Make Python API capable of using TensorRT plugins by adding new Python binding api `register_tensorrt_plugins_as_custom_ops`. (It needs to register the EP's custom op domain before model load. For the C++ API, it's slightly different: when calling SessionOptionsAppendExecutionProvider_TensorRT_XX, it appends the custom op domain to the session option. Later, ORT can register the custom op domain from the session option before model loading.)	2023-10-06 14:12:20 -07:00
Adrian Lizarraga	8e6019af2e	[QNN EP] Enable QNN Saver for debugging issues (#17747 ) ### Description - Enables option to use the QNN Saver backend for dumping QNN API calls to file. - Adds logic to read environment variable `ORT_UNIT_TEST_ENABLE_QNN_SAVER` from QNN EP unit tests. If enabled, unit tests will use the QNN Saver backend and dump files to `./saver_output/`. ### Motivation and Context QNN Saver makes it easier to debug issues when unit tests fail. The output files generated by QNN Saver can be used to replay the exact QNN API calls that lead to a specific error condition. QNN Saver dumps QNN API calls (and weights) to disk. - saver_output/saver_output.c: C file containing all QNN API calls. - saver_output/params.bin: binary file containing all input/output/parameter tensor data provided during tensor creation, op config validation, and graph execution. Enabling the QNN Saver backend has 2 note-worthy effects: 1. All QNN API calls will succeed. 2. Inference output returns dummy data. Because the output files from QNN Saver are always overwritten, it is recommended to run individual unit tests via the `--gtest_filter` command-line option. Example (linux): ```shell $ ORT_UNIT_TEST_ENABLE_QNN_SAVER=1 ./onnxruntime_test_all --gtest_filter=QnnHTPBackendTests.Resize_DownSample_Linear_AlignCorners ```	2023-10-03 16:24:33 -07:00
Pranav Sharma	668c70ee11	Add support for specifying a custom logging function per session. (#17727 ) ### Description Add support for specifying a custom logging function per session. Bindings for other languages will be added after this PR is merged. ### Motivation and Context Users want a way to override the logging provided by the environment.	2023-09-29 19:46:55 -07:00
Scott McKay	33295ed883	Handle string initializers in constant folding (#17422 ) ### Description * Allow either an allocator or a MemBuffer to be used when creating an OrtValue from a TensorProto * `Tensor<std::string>` requires an allocator to allocate/free the string values * Forcing the buffer to be allocated outside of the Tensor doesn't seem to provide any benefit in this usage as the Tensor class disables copy and assignment (so we wouldn't create 2 copies of the buffer via the Tensor class that externally managing the buffer would avoid) * New approach means we don't need to manage the buffers in the optimizer Info class as the Tensor dtor will do that * Update naming - MLValue was replaced by OrtValue a long time ago ### Motivation and Context #17392	2023-09-27 21:15:58 +10:00
RandySheriffH	37dcefb5b7	Patch lite custom op API (#17605 ) A few enhancements: - Support compute returning status; - Support variadic; --------- Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2023-09-26 14:02:18 -07:00
Vincent Wang	e6301eee6a	Bump Up Version to 1.17.0 (#17587 ) Bump up version to 1.17.0 as the 1.16.0 release branch had been branched out.	2023-09-20 11:02:58 +08:00
Dmitri Smirnov	fdb132643d	Remove redundant Resolve() after each inlined function (#17556 ) ### Description Remove `Resolve()` on the entire graph as each function is resolved. We retain `Resolve()` after each inlining iteration. ### Motivation and Context Poor performance for inlining the model and session initialization. Original model before Resolve() removal FunctionTest.Profiling (65953 ms) After Resolve() Removal FunctionTest.Profiling (2911 ms) RelWithDebInfo pre-inlined model. Presumably because it runs Level1 optimizers Non-inlined model consists of functions and Level1 optimizers have no effect. FunctionTest.Profiling (9851 ms)	2023-09-15 12:13:37 -07:00
cao lei	32f5658abb	remove gsl to make status.h independent from gsl (#17402 ) ### Description Make status.h independent from gsl. ### Motivation and Context In the upcoming external EP API feature (see the prototype https://github.com/microsoft/onnxruntime/pull/16718), we need to expose stream in the public header. However, stream depends on status.h, which depends on gsl. We are seeking a way to decouple stream from gsl. From Changming's offline comment, prefast is disabled, so all GSL_SUPPRESS usages currently have no effect. He will handle the warnings when prefast is enabled in the future.	2023-09-13 21:47:43 -07:00
Yulong Wang	550293d9ad	OrtMemoryInfo: support new name "WebGPU_Buffer" (#17469 ) ### Description Add new name "WebGPU_Buffer" to OrtMemoryInfo. This is one of the prerequisites for supporting IO binding for WebGPU buffer in onnxruntime-web. list of prerequisites PRs: #17465 #17469 (this one)	2023-09-08 16:37:35 -07:00
Xavier Dupré	024f1dd72b	Fix float 8 rounding on CPU (#16940 ) ### Description Fix float 8 rounding issues discovered in issue #16938 (only CPU provider).	2023-09-07 20:48:25 +02:00
RandySheriffH	6c39641ea2	Fix a memleak in RunAsync python (#17326 ) Release ort value outputs that are created and released from ort::run(...). --------- Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2023-08-30 12:54:17 -07:00
Artem Shilkin	6e60dba726	Fix compilation with newer flatbuffers (#17164 ) The forward declaration of FlatBufferBuilder was broken in flatbuffers@v23.5.9. Trying to compile onnxruntime fails with the following error:

```
flatbuffers/include/flatbuffers/flatbuffer_builder.h:1420:38: error: typedef redefinition with different types ('FlatBufferBuilderImpl<false>' vs 'flatbuffers::FlatBufferBuilder')
typedef FlatBufferBuilderImpl<false> FlatBufferBuilder;
^
onnx_runtime/include/onnxruntime/core/graph/graph.h:47:11: note: previous definition is here
class FlatBufferBuilder;
```

This PR removes these declarations and uses includes instead.	2023-08-29 10:28:26 -07:00
pengwa	18d5cfdb85	Fix build - redefinition of default argument for ‘long unsigned int Extent’ (#17281 ) ### Fix build - redefinition of default argument for ‘long unsigned int Extent’ In one training customer's environment, building ORT fails with the build error shown further down. The GCC versions are:

```
aiscuser@node-0:/tmp/onnxruntime$ gcc --version
gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
aiscuser@node-0:/tmp/onnxruntime$ g++ --version
g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
```

On our dev node, which uses the same GCC/G++ versions, the build does not fail. It is not clear what the difference is, but giving an explicit type when creating `gsl::span` fixed the problem.

```
/tmp/onnxruntime/build/Linux/RelWithDebInfo/_deps/gsl-src/include/gsl/span:394:7: error: redefinition of default argument for ‘long unsigned int Extent’
  394 | class span
      | ^~~~
/tmp/onnxruntime/build/Linux/RelWithDebInfo/_deps/gsl-src/include/gsl/span_ext:46:51: note: original definition appeared here
   46 | template <class ElementType, std::size_t Extent = dynamic_extent>
      | ^~~~~~~~~~~~~~~
/tmp/onnxruntime/include/onnxruntime/core/common/span_utils.h:82:93: error: return type ‘class gsl::span<const std::byte>’ is incomplete
   82 | [[nodiscard]] inline gsl::span<const std::byte> AsByteSpan(const void* data, size_t length) {
      | ^
/tmp/onnxruntime/include/onnxruntime/core/common/span_utils.h: In function ‘void onnxruntime::AsByteSpan(const void, size_t)’:
/tmp/onnxruntime/include/onnxruntime/core/common/span_utils.h:83:68: error: class template argument deduction failed:
   83 | return gsl::span(reinterpret_cast<const std::byte>(data), length);
      | ^
/tmp/onnxruntime/include/onnxruntime/core/common/span_utils.h:83:68: error: no matching function for call to ‘span(const std::byte, size_t&)’
/tmp/onnxruntime/build/Linux/RelWithDebInfo/_deps/gsl-src/include/gsl/span:740:1: note: candidate: ‘template<class Type, long unsigned int Extent> gsl::span(Type (&)[Extent])-> gsl::span<ElementType, FirstExtent>’
  740 | span(Type (&)[Extent]) -> span<Type, Extent>;
      | ^~~~
/tmp/onnxruntime/build/Linux/RelWithDebInfo/_deps/gsl-src/include/gsl/span:740:1: note: template argument deduction/substitution failed:
/tmp/onnxruntime/include/onnxruntime/core/common/span_utils.h:83:68: note: mismatched types ‘Type [Extent]’ and ‘const std::byte’
   83 | return gsl::span(reinterpret_cast<const std::byte>(data), length);
      | ^
/tmp/onnxruntime/build/Linux/RelWithDebInfo/_deps/gsl-src/include/gsl/span:743:1: note: candidate: ‘template<class Type, long unsigned int Size> gsl::span(std::array<_Tp, _Nm>&)-> gsl::span<ElementType, FirstExtent>’
  743 | span(std::array<Type, Size>&) -> span<Type, Size>;
      | ^~~~
/tmp/onnxruntime/build/Linux/RelWithDebInfo/_deps/gsl-src/include/gsl/span:743:1: note: template argument deduction/substitution failed:
/tmp/onnxruntime/include/onnxruntime/core/common/span_utils.h:83:68: note: mismatched types ‘std::array<_Tp, _Nm>’ and ‘const std::byte’
   83 | return gsl::span(reinterpret_cast<const std::byte*>(data), length);
      | ^
```

### Motivation and Context	2023-08-25 00:40:40 +08:00
Scott McKay	b3cb775cf9	Two fixes involving minimal builds (#17000 ) ### Description - allocation planner was breaking if graph had no nodes - in this particular model a branch of an If node returned an outer scope value directly. - if model used non-tensor types and sparse tensors are disabled, the call to IsSparseTensor causes an exception, which prematurely terminates the code. - it's perfectly fine to check if a value is a sparse tensor when support for them is disabled. we just can't do anything with that OrtValue, which is what the current ifdef's after the call to IsSparseTensor handle. ### Motivation and Context Fix model execution failure for partner with model that uses sequences in a minimal build with sparse tensors disabled.	2023-08-23 16:01:22 +10:00
Edward Chen	ae62d752d6	Prevent GSL_SUPPRESS arguments from being modified by clang-format (#17242 ) Prevent `GSL_SUPPRESS` arguments from being modified by clang-format and update existing usages. clang-format was changing something like `GSL_SUPPRESS(r.11)` to `GSL_SUPPRESS(r .11)`. For some compilers (e.g., clang), the `gsl::suppress` attribute takes a quoted string argument. We don't want to insert spaces there.	2023-08-22 18:26:53 -07:00
Edward Chen	d6cd41cfc1	[CoreML EP] Add Shape, Gather, and Slice ops (#17153 ) Add CoreML EP shape related ops: - Shape - Gather - Slice Add support for int64/int32 inputs in CoreML EP.	2023-08-18 22:34:34 -07:00
Dmitri Smirnov	5c54b64a63	Create NodeArgs for all Constant nodes and initializers for functions being inlined (#17089 ) ### Description When functions are inlined and constant nodes are being converted to initializers, we need to create NodeArg for them. Similar for inlined function subgraph, but we choose to give priority to non-constant nodes and then fill the gaps with constant and initializers. ### Motivation and Context This addresses issue https://github.com/microsoft/onnxruntime/issues/16813 for `eca_halonext26ts_mod.onnx` model where it fails to remove unused initializer because `NodeArg` was not created for it.	2023-08-17 14:22:28 -07:00
Changming Sun	5249b7ab7c	Re-implement stacktrace (#17173 ) ### Description Re-implement stacktrace. The new implementation doesn't directly use the Windows API, and hence avoids problems related to initializing/uninitializing the dbghelp library. ### Motivation and Context	2023-08-16 16:07:49 -07:00
RandySheriffH	3dd2c1b4d7	EP context for custom op (#16454 ) Implement infrastructures to allow EP resources surfaced to custom ops. --------- Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2023-08-16 13:03:40 -07:00
Yulong Wang	9cd4e5af68	[wasm] upgrade emsdk to 3.1.44 (#17069 ) ### Description This change upgrades emsdk to 3.1.44. Because the backend is upgraded to LLVM 16, a lot of build failures caused by "-Wshorten-64-to-32" need to be fixed. Most of the build failures come from the generated `onnx.pb.h`, and they can be fixed by including "core/graph/onnx_protobuf.h", which detects and ignores shorten-64-to-32 warnings.	2023-08-10 16:08:36 -07:00
Chi Lo	7361c283c7	Add API for updating CUDA EP provider option user compute stream (#17037 ) Add a generic `UpdateCUDAProviderOptionsWithValue()` C API to update CUDA EP provider options where its data type is pointer that can't be represented by string. Note: Please see some comments for the similar [PR ](https://github.com/microsoft/onnxruntime/pull/16965)for TRT EP.	2023-08-09 09:24:19 -07:00
Chi Lo	fc8003349e	Add API for updating TRT EP provider option user compute stream (#16965 ) Add a generic `UpdateTensorRTProviderOptionsWithValue()` C API to update TensorRT provider options where its data type is pointer that can't be represented by string.	2023-08-04 15:14:43 -07:00
Edward Chen	f98d3f8a23	[CoreML EP] Enable inputs with dynamic shape (#16915 ) Enable node inputs with dynamic shape to be handled by the CoreML EP.	2023-08-03 18:15:00 -07:00
satyajandhyala	dd24d52737	[JS/Web] Added Gelu contrib operator support to JSEP (#16909 ) ### Description Added Gelu operator to JSEP ### Motivation and Context	2023-07-31 09:18:58 -07:00
Dmitri Smirnov	bf006d34a9	Used feature macro for if constexpr in a public header (#16836 ) ### Description Use feature macro for `if constexpr` ### Motivation and Context We still do not require customers to use C++17 compiler.	2023-07-25 21:42:30 -07:00
kunal-vaishnavi	b7176f9826	Fix bug with saving model optimized by inference session (#16716 ) ### Description A [previous PR](https://github.com/microsoft/onnxruntime/pull/16531) added a temporary directory to save the model optimizations after loading a model into an `InferenceSession`. Many models that have an external data file, however, require the data file to be in the same directory as the ONNX model file. Because the model is saved in a temporary directory and the data is saved in another directory, this causes a `FileNotFoundError` error when trying to load the model in the temporary directory. This PR fixes this error by saving the external data file in the same directory that the optimized model is located in. ### Motivation and Context This PR fixes a bug with using a temporary directory while running the optimizer for models that have an external data file.	2023-07-20 18:44:28 -07:00
Xavier Dupré	2bc9fbb621	Fix url in the code documentation (graph optimizations) (#16770 ) ### Description Fix a wrong url in the documentation as mentioned in issue #16678. ### Motivation and Context Better documentation.	2023-07-20 07:02:22 -07:00
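
The fall-through described in commit 2eeafc37bc (Enable global TRT timing cache) can be sketched in a few lines of Python. This is only an illustration of the three rules listed in that commit message, not the TensorRT EP's actual code: the function name `resolve_timing_cache_dir` and its `workdir` parameter are hypothetical, and only `trt_engine_cache_path` and `trt_timing_cache_path` come from the message.

```python
import os
from typing import Optional


def resolve_timing_cache_dir(
    trt_engine_cache_path: Optional[str] = None,
    trt_timing_cache_path: Optional[str] = None,
    workdir: Optional[str] = None,
) -> str:
    """Hypothetical helper mirroring the fall-through in the commit message."""
    # Rule 3: an explicit timing cache path always wins.
    if trt_timing_cache_path:
        return trt_timing_cache_path
    # Rule 2: otherwise reuse the engine cache path if one was given.
    if trt_engine_cache_path:
        return trt_engine_cache_path
    # Rule 1: no path provided, so fall back to the working directory.
    return workdir if workdir is not None else os.getcwd()
```

Checking the most specific option first keeps each rule to a single early return, so the order of the `if` statements is the whole policy.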


886 commits
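
As a rough illustration of the idea behind commit f19c673595 (If Branch Constant Folding), the sketch below inlines the taken branch of an `If` node whose condition is a known constant. It is a toy model under stated assumptions, not ONNX Runtime's implementation: nodes are plain dictionaries, the keys `op`, `cond`, `then_branch` and `else_branch` are hypothetical, and the input/output renaming that a real inliner needs is ignored.

```python
from typing import Dict, List

Node = Dict[str, object]


def inline_constant_ifs(nodes: List[Node], constants: Dict[str, bool]) -> List[Node]:
    """Replace each `If` whose condition is a known constant with its taken branch."""
    result: List[Node] = []
    for node in nodes:
        if node["op"] == "If" and node["cond"] in constants:
            # The condition is constant, so only one branch can ever run.
            taken = node["then_branch"] if constants[node["cond"]] else node["else_branch"]
            # Recurse: the inlined branch may itself contain constant `If` nodes.
            result.extend(inline_constant_ifs(taken, constants))
        else:
            # Non-constant `If` nodes and all other ops are kept unchanged.
            result.append(node)
    return result


graph = [
    {"op": "If", "cond": "c", "then_branch": [{"op": "Add"}], "else_branch": [{"op": "Mul"}]},
    {"op": "Relu"},
]
folded = inline_constant_ifs(graph, {"c": True})
```

Once the branch is spliced into the parent graph, its nodes become visible to the usual constant folding passes, which is the effect the commit message describes.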