onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-05-28 22:56:32 +00:00

Author	SHA1	Message	Date
Dmitri Smirnov	fdb132643d	Remove redundant Resolve() after each inlined function (#17556 ) ### Description Remove `Resolve()` on the entire graph as each function is resolved. We retain `Resolve()` after each inlining iteration. ### Motivation and Context Poor performance for inlining the model and session initialization. Original model before Resolve() removal FunctionTest.Profiling (65953 ms) After Resolve() Removal FunctionTest.Profiling (2911 ms) RelWithDebInfo pre-inlined model. Presumably because it runs Level1 optimizers Non-inlined model consists of functions and Level1 optimizers have no effect. FunctionTest.Profiling (9851 ms)	2023-09-15 12:13:37 -07:00
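The cost difference described in #17556 can be illustrated with a toy model. This is a sketch, not ONNX Runtime code: `ToyGraph`, `inline_one`, `resolve`, and both driver functions are hypothetical names, `resolve` only counts how many nodes a full-graph pass would visit, and a single inlining iteration without nested functions is assumed.

```python
class ToyGraph:
    """Hypothetical stand-in for a graph; counts work done by full resolves."""

    def __init__(self, function_calls, nodes_per_function):
        self.function_calls = function_calls        # function nodes still to inline
        self.nodes_per_function = nodes_per_function
        self.node_count = function_calls            # each call starts as one node
        self.resolve_calls = 0
        self.nodes_visited = 0

    def inline_one(self):
        # Replace one function node with the nodes of its body.
        self.function_calls -= 1
        self.node_count += self.nodes_per_function - 1

    def resolve(self):
        # A full resolve walks every node currently in the graph.
        self.resolve_calls += 1
        self.nodes_visited += self.node_count


def inline_resolving_each_function(graph):
    """Old behaviour in this toy model: resolve after every inlined function."""
    while graph.function_calls > 0:
        graph.inline_one()
        graph.resolve()
    return graph


def inline_resolving_per_iteration(graph):
    """New behaviour in this toy model: resolve once per inlining iteration."""
    while graph.function_calls > 0:
        graph.inline_one()
    graph.resolve()
    return graph


before = inline_resolving_each_function(ToyGraph(100, 20))
after = inline_resolving_per_iteration(ToyGraph(100, 20))
print(before.resolve_calls, before.nodes_visited)
print(after.resolve_calls, after.nodes_visited)
```

In this model the per-function variant visits a number of nodes that grows roughly quadratically with the number of inlined functions, which is consistent in spirit with the timings quoted in the commit message, although the real numbers depend on the actual graph.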
Artem Shilkin	6e60dba726	Fix compilation with newer flatbuffers (#17164 ) flatbuffers@v23.5.9 broke the forward declaration of FlatBufferBuilder. Compiling onnxruntime fails with the following error: ``` flatbuffers/include/flatbuffers/flatbuffer_builder.h:1420:38: error: typedef redefinition with different types ('FlatBufferBuilderImpl<false>' vs 'flatbuffers::FlatBufferBuilder') typedef FlatBufferBuilderImpl<false> FlatBufferBuilder; ^ onnx_runtime/include/onnxruntime/core/graph/graph.h:47:11: note: previous definition is here class FlatBufferBuilder; ``` This PR removes these declarations and uses includes instead.	2023-08-29 10:28:26 -07:00
Dmitri Smirnov	5c54b64a63	Create NodeArgs for all Constant nodes and initializers for functions being inlined (#17089 ) ### Description When functions are inlined and constant nodes are being converted to initializers, we need to create NodeArg for them. Similar for inlined function subgraph, but we choose to give priority to non-constant nodes and then fill the gaps with constant and initializers. ### Motivation and Context This addresses issue https://github.com/microsoft/onnxruntime/issues/16813 for `eca_halonext26ts_mod.onnx` model where it fails to remove unused initializer because `NodeArg` was not created for it.	2023-08-17 14:22:28 -07:00
Yulong Wang	9cd4e5af68	[wasm] upgrade emsdk to 3.1.44 (#17069 ) ### Description This change upgrades emsdk to 3.1.44. Because the backend is upgraded to LLVM 16, we need to fix a lot of build failures caused by "-Wshorten-64-to-32". Most of the build failures come from the generated `onnx.pb.h`, and this can be fixed by including "core/graph/onnx_protobuf.h", which detects and ignores shorten-64-to-32 warnings.	2023-08-10 16:08:36 -07:00
kunal-vaishnavi	b7176f9826	Fix bug with saving model optimized by inference session (#16716 ) ### Description A [previous PR](https://github.com/microsoft/onnxruntime/pull/16531) added a temporary directory to save the model optimizations after loading a model into an `InferenceSession`. Many models that have an external data file, however, require the data file to be in the same directory as the ONNX model file. Because the model is saved in a temporary directory and the data is saved in another directory, this causes a `FileNotFoundError` error when trying to load the model in the temporary directory. This PR fixes this error by saving the external data file in the same directory that the optimized model is located in. ### Motivation and Context This PR fixes a bug with using a temporary directory while running the optimizer for models that have an external data file.	2023-07-20 18:44:28 -07:00
Wanming Lin	00b1e79e04	Support WebNN EP (#15698 ) Description: This PR intends to enable WebNN EP in ONNX Runtime Web. It translates the ONNX nodes by [WebNN API](https://webmachinelearning.github.io/webnn/), which is implemented in C++ and uses Emscripten [Embind API](https://emscripten.org/docs/porting/connecting_cpp_and_javascript/embind.html#). Temporarily using preferred layout NHWC for WebNN graph partitions since the restriction in WebNN XNNPack backend implementation and the ongoing [discussion](https://github.com/webmachinelearning/webnn/issues/324) in WebNN spec that whether WebNN should support both 'NHWC' and 'NCHW' layouts. No WebNN native EP, only for Web. Motivation and Context: Allow ONNXRuntime Web developers to access WebNN API to benefit from hardware acceleration. WebNN API Implementation Status in Chromium: - Tracked in Chromium issue: [#1273291](https://bugs.chromium.org/p/chromium/issues/detail?id=1273291) - CPU device: based on XNNPack backend, and had been available on Chrome Canary M112 behind "#enable-experimental-web-platform-features" flag for Windows and Linux platforms. Further implementation for more ops is ongoing. - GPU device: based on DML, implementation is ongoing. Open: - GitHub CI: WebNN currently is only available on Chrome Canary/Dev with XNNPack backend for Linux and Windows. This is an open to reviewers to help identify which GitHub CI should involved the WebNN EP and guide me to enable it. Thanks!	2023-05-08 21:25:10 -07:00
Yulong Wang	14cc02c65c	[js/web] WebGPU backend via JSEP (#14579 ) ### Description This change introduced the following new components into ONNX Runtime Web: - JavaScript Execution Provider (JSEP) - Asynchronized inferencing execution powered by Emscripten's Asyncify - WebGPU backend implemented in TypeScript - initial implementation of kernels: - elementwise operators (22) - binary operators (5) - tensor: Shape, Reshape, Transpose, Gemm - nn: Conv, {Global}Maxpool, {Global}AveragePool Code need to be polished. still working on it. ## Q&A What is JSEP? > JSEP, aka JavaScript Execution Provider, is a new ONNXRuntime execution provider that specifically works on Web environment (browsers). JSEP allows JavaScript code to kick in from various places when ONNX Runtime inferences a model. Why JSEP? > JSEP is a hybrid mode EP that contains both C/C++ and TypeScript/JavaScript implementation. There are 2 strong reasons why we introduces JSEP: > 1. the C/C++ part helps JSEP to leverage ONNX Runtime's capabilities as much as possible including graph transformer, optimizers and also the capabilities to fallback to CPU EP. TypeScript/JavaScript helps JSEP to develop and debug much easier in the browser for the kernel implementation. > 2. the requirement of asynchronized execution from JavaScript API (eg. `buffer.mapAsync()`) makes it impossible to run `OrtRun()` in a synchronized context (see "async problem" section below). This is done by using Emscripten's Asyncify. What is WebGPU? > WebGPU is the new GPU API that available in browser. It's one of the only 2 APIs that currently available to access the GPU from browser (the other is WebGL). > WebGPU is designed with more advanced and stronger features comparing to WebGL and is potentially solution that offer the best GPU performance for model inferencing that currently available. What is the async problem and why we have the problem? 
> The "async problem" is a problem that you cannot call an async function in a synchronous context. Think about the following C++ code: > ```c > // C-style declarations (API) > typedef void (ON_COMPLETE)(PVOID state, DATA data); > void read_data_from_file(FILEHANDLE file, ON_COMPLETE on_complete); > > // implementation > DATA * my_impl_read_data_from_file_sync(FILEHANDLE file) { > // how to implement? > } > ``` > The answer is, it's impossible to implement this function. Usually we try to find a sync version API, or launch a thread to call the async function and sync-wait on the main thread. Unfortunately, in browser environment, neither is possible. > > WebGPU does not offer any synchronized API for data downloading (GPU to CPU). This is the only operation that MUST be async. As `OrtRun()` will eventually call into DataTransfer for copy data from GPU to CPU, and `OrtRun()` is a synchronized function, this cannot be done in normal way. What is Emscripten? How is the Asyncify feature resolved the problem? > Emscripten is the C/C++ compiler for WebAssembly. It's what we use to compile ORT and generates the WebAssembly artifacts which runs on browsers. > > Asyncify is a [compiler feature](https://emscripten.org/docs/porting/asyncify.html) that allows calling async functions from a synchronized context. In short, it generates code to unwind and rewind call stack to emulate async execution. With this feature, we are able to call the async function inside `OrtRun()` call. ## Design Overview Inter-op JSEP is doing pretty much same thing to just another EP. 
It exposes an interface for inter-op with JavaScript, which is defined in onnxruntime/wasm/js_internal_api.js:

```js
// init JSEP
Module["jsepInit"] = function (backend, alloc, free, copy, copyAsync, createKernel, releaseKernel, run) {
  Module.jsepBackend = backend;
  Module.jsepAlloc = alloc;
  Module.jsepFree = free;
  Module.jsepCopy = copy;
  Module.jsepCopyAsync = copyAsync;
  Module.jsepCreateKernel = createKernel;
  Module.jsepReleaseKernel = releaseKernel;
  Module.jsepRun = run;
};
```

This simple JavaScript snippet defines all the language-barrier-level functions that JSEP requires to implement kernels and data transfers using JavaScript inside ONNX Runtime:

- `jsepBackend`: assign the singleton object to the WebAssembly module
- `jsepAlloc` and `jsepFree`: implementation of data transfer's Alloc() and Free()
- `jsepCopy`: synchronized copy (GPU to GPU, CPU to GPU)
- `jsepCopyAsync`: asynchronized copy (GPU to CPU)
- `jsepCreateKernel` and `jsepReleaseKernel`: a corresponding object that is maintained in JS to match the lifecycle of the Kernel in ORT
- `jsepRun`: OpKernel::Compute() should call into this

The abstraction above keeps the connections and dependencies between C/C++ and TypeScript/JavaScript to a minimum.

Resource Management

The lifecycle of tensor data and kernels is managed by ORT (C/C++), but the implementation is left to JavaScript. JavaScript code is responsible for implementing the callbacks correctly.

For WebGPU, the GPU data is managed by JavaScript using a singleton map (tensor_data_id => GPUBuffer). The GPU pipeline is managed as a singleton. Shaders are managed using a singleton map (shader_key => gpu_program), where shader_key is generated from cache_key (OP specific, including attributes) and the input shapes.

about data transfer

`js::DataTransfer::CopyTensor` is implemented to call either the synchronized or the asynchronized copy callback, depending on whether the destination is GPU or not.
Emscripten's macro `EM_ASYNC_JS` is used to wrap the async function to be called in the synchronized context. run kernel in JS Kernel class constructor calls once `jsepCreateKernel()` with an optional per-kernel specific serialization to pass attributes into JavaScript. `Compute()` are implemented in a way that a metadata serialization is performed in a base class and JavaScript code can access the data using the Emscripten specific builtin macro `EM_ASM_`. disabled features* memory pattern is force disabled, because the WebGPU data is not presented by a general memory model (a buffer can be represented by offset + size). concurrent run support is disabled. WebGPU is stateful and it also has async function call. To support concurrent run will significantly increase the complexity and we don't get any real benefit from it. prefer channels last JSEP prefers channels last and returns `DataLayout::NHWC` in method `GetPreferredLayout()`. This will let the graph transformers to preprocess the graph into a channels last form so that a more optimized WebGPU shader can be used. Testing code It's impossible to test JSEP directly because JSEP itself does not contain any kernel implementation. However, it has the kernel registration which need to work together with the corresponding JavaScript code. There are unit tests that run onnx models from JavaScript API. --------- Co-authored-by: Scott McKay <skottmckay@gmail.com>	2023-04-24 15:21:18 -07:00
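The shader caching described in this commit (a singleton map from `shader_key` to `gpu_program`, where the key combines an op-specific `cache_key` with the input shapes) can be sketched as follows. Python is used here only for illustration; the actual backend is written in TypeScript, and `ProgramCache`, `make_shader_key`, the key layout, and the `build` callback are hypothetical names rather than the JSEP API.

```python
from typing import Callable, Dict, Sequence, Tuple

Shape = Tuple[int, ...]


def make_shader_key(cache_key: str, input_shapes: Sequence[Shape]) -> str:
    """Combine the op-specific cache key with the input shapes (assumed layout)."""
    shapes = ";".join(",".join(str(d) for d in shape) for shape in input_shapes)
    return f"{cache_key}|{shapes}"


class ProgramCache:
    """Hypothetical map from shader_key to an already built program object."""

    def __init__(self) -> None:
        self._programs: Dict[str, object] = {}
        self.builds = 0  # number of times a program had to be built

    def get_or_build(self, cache_key: str, input_shapes: Sequence[Shape],
                     build: Callable[[], object]) -> object:
        key = make_shader_key(cache_key, input_shapes)
        if key not in self._programs:
            # Cache miss: build once and remember the result under this key.
            self._programs[key] = build()
            self.builds += 1
        return self._programs[key]


cache = ProgramCache()
# Same op attributes and same input shape: the second call reuses the program.
first = cache.get_or_build("Transpose;perm=1,0", [(2, 3)], lambda: object())
second = cache.get_or_build("Transpose;perm=1,0", [(2, 3)], lambda: object())
# Same op attributes but a different input shape: a new program is built.
third = cache.get_or_build("Transpose;perm=1,0", [(4, 3)], lambda: object())
print(cache.builds)
```

Including the input shapes in the key means a program specialized for one shape is never reused for another, at the cost of more cache entries for models with many distinct shapes.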
Justin Chu	cf19c3697d	Run clang-format in CI (#15524 ) ### Description Run clang-format in CI. Formatted all c/c++, objective-c/c++ files. Excluded ``` 'onnxruntime/core/mlas/', 'onnxruntime/contrib_ops/cuda/bert/tensorrt_fused_multihead_attention/', ``` because they contain assembly or is data heavy ### Motivation and Context Coding style consistency	2023-04-18 09:26:58 -07:00
Edward Chen	9f942e1a3e	Graph transformer to ensure unique DQ nodes for QDQ node units (#15145 ) ### Description <!-- Describe your changes. --> Add required graph transformer to duplicate DQ nodes to ensure that QDQ node units have unique DQ nodes. This condition is necessary for QDQ node unit processing. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> There is an existing Python utility that does this: `c7ced7a5e9/tools/python/util/qdq_helpers/qdq_model_utils.py (L77)` This PR implements it as a graph transformer so it is integrated into ORT and does not require a separate step to update the model. There are also tests to ensure that its effects are not undone by basic level graph optimizations.	2023-03-31 08:39:43 +10:00
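The idea of duplicating DQ nodes so that every consumer reads from its own DQ node can be sketched on a toy graph. This is an illustration under stated assumptions, not the ORT graph transformer: the `Node` dataclass, `ensure_unique_dq`, and the `/dup_N` naming are hypothetical, duplication is done per consumer edge, and graph outputs are ignored.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    op_type: str
    inputs: List[str]
    outputs: List[str]


def ensure_unique_dq(nodes: List[Node]) -> List[Node]:
    """Give every consumer edge of a DequantizeLinear output its own DQ node."""
    result = list(nodes)
    for dq in [n for n in nodes if n.op_type == "DequantizeLinear"]:
        out = dq.outputs[0]
        # All (consumer, input index) pairs that read this DQ output.
        edges = [(c, i) for c in nodes
                 for i, name in enumerate(c.inputs) if name == out]
        # The first edge keeps the original DQ; each further edge gets a copy.
        for k, (consumer, idx) in enumerate(edges[1:], start=1):
            new_out = f"{out}/dup_{k}"
            result.append(Node(f"{dq.name}/dup_{k}", dq.op_type,
                               list(dq.inputs), [new_out]))
            consumer.inputs[idx] = new_out  # rewire the consumer to the copy
    return result


graph = [
    Node("dq", "DequantizeLinear", ["x_q", "x_scale", "x_zp"], ["x"]),
    Node("relu_a", "Relu", ["x"], ["a"]),
    Node("relu_b", "Relu", ["x"], ["b"]),
]
updated = ensure_unique_dq(graph)
for node in updated:
    print(node.name, node.inputs, node.outputs)
```

After the pass, `relu_a` still reads `x` from the original DQ node, while `relu_b` reads `x/dup_1` from a copy that shares the same quantized input, scale, and zero point.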
Xavier Dupré	5930e7e22f	Introduce RemovableAttributes (#14868 ) ### Description TreeEnsemble* kernels fully copy all the parameters from the onnx graph. Even if they are no longer needed or unused (hitrates), they remain in memory. For big models >= 200 trees, max_depth > 10, the model usually weighs more than 10 Mb. This change offers a kernel the possibility to remove all unneeded attributes after they were used to create the session. Attributes are deleted after the model was possibly saved, at the end of the session creation. The current design is to be debated: * it stores the list of removable attributes in class `onnxruntime::Node`, * the node is marked as `const` every time this implementation needs to register the name of a removable attribute or to remove them. The current implementation is just a POC as it needs to cast `onnxruntime::Node` into `const onnxruntime::Node`. Should we keep the list of removable attributes in `onnxruntime::Node`? ### Motivation and Context Motivation is mostly to reduce memory consumption. --------- Signed-off-by: xadupre <xadupre@microsoft.com>	2023-03-07 12:37:12 +01:00
Hector Li	c6074f3a4b	OnnxRuntime QNN EP (#14791 ) ### Description Integrate Qualcomm QNN SDK to enable inference on QC hexagon NPU devices ### Motivation and Context Enable Ort inference on QC hexagon NPU devices. --------- Co-authored-by: Satya Jandhyala <sajandhy@microsoft.com> Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com> Co-authored-by: Adrian Lizarraga <adrianlm2@gmail.com>	2023-03-01 13:48:20 -08:00
Scott McKay	b7fde84341	Changes to support standalone custom ops in a minimal build. (#14497 ) ### Description <!-- Describe your changes. --> Changes to support standalone custom ops in a minimal build. Also incorporates changes from #14492 (needed to test builds prior to that being checked in). We first need to save the schema info from the operators used by the standalone op invoker in the ORT format model. Add mechanism for that. Merge the kernel lookup logic so the same is used in full and minimal build. NOTE: the version matching is now consistent with all other kernel lookups, and the call to CreateOp MUST use the exact version for the operator. Previously matching wasn't as strict, but this can lead to the incorrect kernel being chosen. Add tests. NOTE: There is currently no way to detect the ops/types/opsets used inside these custom ops as they don't exist until we create kernels, which is after model loading completes (which is the point the ORT format model is saved). Due to that they have to be manually added to the configuration used to do the reduced ops build. That shouldn't be too hard for the custom op author to add given the custom op implementation is specifying the op, opset and type constraints (i.e. they have the info and it's just a case of capturing/formatting it correctly). ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Enable usage of the standalone op invoker by custom ops in a minimal build. --------- Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-03-01 11:22:54 +10:00
Yuriy Chernyshov	973aaf110b	Improve compatibility with certain STL's We use customized libc++ which uses raw pointers as std::vector::iterators. As per [expr.pre.incr](https://eel.is/c++draft/expr.compound#expr.pre.incr), builtin `operator++` can only be applied to lvalue, while `std::vector::begin()` returns an rvalue. See [this](https://godbolt.org/z/d3a1aKTWP) godbolt snippet for the details.	2023-02-21 14:06:16 -08:00
Dmitri Smirnov	61e7636e61	Re-work GetAvailableProviders API (#14486 ) ### Description Re-work `OrtApi::GetAvailableProviders` in a way that the data is returned in a single allocation. Fix exception safety issues and fix `Release` function. Remove warning suppressions. Fix exception safety issue in C++ API. Fix exception safety issue in C# API. Move EP name length enforcement to the implementation. ### Motivation and Context The original motivation comes from https://github.com/microsoft/onnxruntime/issues/14378. However, the API is already implemented. Cc: @prabhat00155	2023-02-01 14:38:04 -08:00
RandySheriffH	83ad562826	Rename CloudEP to AzureEP (#14175 ) Rename CloudEP to AzureEP. Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2023-01-11 12:25:04 -08:00
RandySheriffH	587e891cae	CloudEP (#13855 ) Implement CloudEP for hybrid inferencing. The PR introduces no new API; customers can configure session and run options to do inferencing with an Azure [triton endpoint.](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-deploy-with-triton?tabs=azure-cli%2Cendpoint) A sample configuration in Python looks like:
```
sess_opt.add_session_config_entry('cloud.endpoint_type', 'triton')
sess_opt.add_session_config_entry('cloud.uri', 'https://cloud.com')
sess_opt.add_session_config_entry('cloud.model_name', 'detection2')
sess_opt.add_session_config_entry('cloud.model_version', '7')  # optional, default 1
sess_opt.add_session_config_entry('cloud.verbose', '1')  # optional, default '0', meaning no verbose
...
run_opt.add_run_config_entry('use_cloud', '1')  # 0 for local inferencing, 1 for cloud endpoint.
run_opt.add_run_config_entry('cloud.auth_key', '...')
...
sess.run(None, {'input':input_}, run_opt)
```
Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2023-01-03 10:03:15 -08:00
Tang, Cheng	a81faee41e	Multi-stream execution support (#13495 ) Description: This PR includes the following work: 1. provide stream and related synchronization abstractions in onnxruntime. 2. enhance onnxruntime's execution planner / executor / memory arena to support executing multiple streams in parallel. 3. deprecate the parallel executor for cpu. 4. deprecate the Fence mechanism. 5. update the cuda / tensorrt EP to support the stream mechanism and to support running different requests in different cuda streams. Motivation and Context - Why is this change required? Currently, the execution plan is just a linear list of those primitives, and ort will execute them step by step. For any given graph, ORT will serialize it to a fixed execution order. This sequential execution design simplifies most scenarios, but it has the following limitations: 1. it is difficult to enable inter-node parallelization; we have a half-baked parallel executor but it is very difficult to make it work with GPU. 2. The fence mechanism can work with the single gpu stream + cpu thread case, but when extended to multiple streams, it is difficult to manage the cross GPU stream synchronizations. 3. our cuda EP relies on the BFCArena to make the memory management work with the GPU async kernels, but the current BFCArena is not aware of the streams, so it doesn't behave correctly when run with multiple streams. This PR enhances our existing execution plan and executor to support multiple stream execution. We use a unified algorithm to manage both single stream and multiple stream scenarios. This PR mainly focuses on the infrastructure support for multiple stream execution; that is, given a valid stream assignment, onnxruntime can execute it correctly. How to generate a good stream assignment for a given model will be addressed in a future PR. 
Co-authored-by: Cheng Tang <chenta@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net> Co-authored-by: Cheng Tang <chenta@microsoft.com> Co-authored-by: RandySheriffH <48490400+RandySheriffH@users.noreply.github.com> Co-authored-by: Randy Shuai <rashuai@microsoft.com> Co-authored-by: cao lei <jslhcl@gmail.com> Co-authored-by: Lei Cao <leca@microsoft.com>	2022-12-15 07:39:29 -08:00
Edward Chen	215732f74b	Ignore saved runtime optimizations when updating ORT format model <v5. (#13393 ) The old runtime optimization format is not readily convertible to the new one without extra information for translating kernel def hashes. Ignore such saved runtime optimizations and output a warning for now.	2022-11-08 13:36:46 -08:00
Edward Chen	2ecd1d6622	Switch GSL to MS GSL 4.0.0 (#13416 )	2022-10-29 04:15:20 -07:00
RandySheriffH	a83a9ed6b0	Remove miscellaneous nuphar configs (#13070 ) Remove a handful of nuphar related configurations after deprecation. Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2022-09-26 13:41:28 -07:00
wangxiyuan	952c99304a	Add CANN EP (#12416 ) Description: This PR adds Ascend CANN execution provider support. Motivation and Context - Why is this change required? What problem does it solve? As the info shown in the issue. CANN is the API layer for Ascend processor. Add CANN EP can allow user run onnx model on Ascend hardware via onnxruntime The detail change: 1. Added CANN EP framework. 2. Added the basic operators to support ResNet and VGG model. 3. Added C/C++、Python API support - If it fixes an open issue, please link to the issue here. https://github.com/microsoft/onnxruntime/issues/11477 Author: lijiawei <lijiawei19@huawei.com> wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: FFrog <ljw1101.vip@gmail.com>	2022-09-22 14:53:40 -07:00
Edward Chen	454f77cd94	Update kernel matching logic: decouple from op schemas and remove kernel def hashes (#12791 ) # Motivation Currently, ORT minimal builds use kernel def hashes to map from nodes to kernels to execute when loading the model. As the kernel def hashes must be known ahead of time, this works for statically registered kernels. This works well for the CPU EP. For this approach to work, the kernel def hashes must also be known at ORT format model conversion time, which means the EP with statically registered kernels must also be enabled then. This is not an issue for the always-available CPU EP. However, we do not want to require that any EP which statically registers kernels is always available too. Consequently, we explore another approach to match nodes to kernels that does not rely on kernel def hashes. An added benefit of this is the possibility of moving away from kernel def hashes completely, which would eliminate the maintenance burden of keeping the hashes stable. # Approach In a full build, ORT uses some information from the ONNX op schema to match a node to a kernel. We want to avoid including the ONNX op schema in a minimal build to reduce binary size. Essentially, we take the necessary information from the ONNX op schema and make it available in a minimal build. We decouple the ONNX op schema from the kernel matching logic. The kernel matching logic instead relies on per-op information which can either be obtained from the ONNX op schema or another source. This per-op information must be available in a minimal build when there are no ONNX op schemas. We put it in the ORT format model. Existing uses of kernel def hashes to look up kernels are replaced with the updated kernel matching logic. We no longer store kernel def hashes in the ORT format model’s session state and runtime optimization representations. We no longer keep the logic to generate and ensure stability of kernel def hashes.	2022-09-20 14:24:59 -07:00
Cheng	8cedafe250	[xnnpack] Have `Initializer` in Mobile related EPs in Minimal_build and creating EP specific dynamic-schema (#12555 ) * Remove the dependence of Qlinearsoftmax schema * refactor initializerview && create shared schema * Dynamic Create EP specific schema * Have Initializer in minimal_build * address comments * remove CancelFuseSubGraph	2022-09-06 14:32:15 +08:00
Scott McKay	0b0c51e028	Support direct usage of ORT format model flatbuffer for initializers (#12465 ) * Add ability to use ORT format model flatbuffer directly for initializers by leveraging the TensorProto external data infrastructure. Requires user to provide ORT format model bytes when creating the session, and set both `session.use_ort_model_bytes_directly` and `session.use_ort_model_bytes_for_initializers` to 1 in SessionOptions config entries (AddSessionConfigEntry in C API).	2022-08-12 18:31:43 +10:00
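The two config entries named in this commit can be set through any options object that exposes `add_session_config_entry`, the Python method name that also appears in the CloudEP sample in this log. The sketch below is a minimal illustration: `enable_direct_ort_bytes` and `RecordingOptions` are hypothetical helpers, and the stub stands in for a real session options object so that the example runs without the onnxruntime package installed.

```python
ORT_BYTES_CONFIG_KEYS = (
    "session.use_ort_model_bytes_directly",
    "session.use_ort_model_bytes_for_initializers",
)


def enable_direct_ort_bytes(session_options):
    """Set both config entries from the commit message to "1"."""
    for key in ORT_BYTES_CONFIG_KEYS:
        session_options.add_session_config_entry(key, "1")
    return session_options


class RecordingOptions:
    """Hypothetical stub that records config entries instead of applying them."""

    def __init__(self):
        self.entries = {}

    def add_session_config_entry(self, key, value):
        self.entries[key] = value


options = enable_direct_ort_bytes(RecordingOptions())
print(options.entries)
```

With the real library, the same helper would presumably be called on the session options before creating a session from the ORT format model bytes; that usage is an assumption and is not shown in the commit message.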
Scott McKay	8d830adf24	Rework parts of Graph::Resolve to reduce memory usage (#12176 ) * Rework some aspects of Graph::Resolve to reduce memory usage.	2022-08-05 13:20:25 +10:00
Yateng Hong	c579497134	Fix TRT custom op issue (#12283 ) * Pass schema registry on CreateModel. * Fix ORT_MINIMAL_BUILD. * Fix build issue.	2022-07-29 03:39:56 -07:00
RandySheriffH	d5fcb432fa	Generalize native op creation (#11539 ) * create op from ep * read input count from context * create holder to host nodes * fix typo * cast type before comparison * throw error on API fail * silence warning from minimal build * switch to unique_ptr with deleter to host nodes * fix typo * fix build err for minimal * fix build err for minimal * add UT for conv * enable test on CUDA * add comment * fix typo * use gsl::span and string view for Node constructor * Added two APIs - CopyKernelInfo and ReleaseKernelInfo * pass gsl::span by value * switch to span<NodeArg* const> to allow for reference to const containers * fix typo * fix reduced build err * fix reduced build err * refactoring node construction logic * rename exceptions * add input and output count as arguments for op creation * refactor static member * use ORT_CATCH instead of catch * cancel try catch * add static value name map * format input definition and set err code * fix comments * fix typo	2022-06-27 21:12:15 -07:00
G. Ramalingam	b1411c8357	Restructure function inliner (#11731 ) * Add nested function call tests * Add overload for Specialize * Pass symboltable to onnx shape inference * Avoid renaming empty names * Enable sequence_map tests which failed before this change	2022-06-24 09:21:31 -07:00
Dmitri Smirnov	267a424e52	Retry Rework execution frame to reduce memory allocations (#11897 ) * Revert "Revert "Refactor ExecutionFrame and SessionState to reduce memory all… (#11888)" This reverts commit `d2cbae3a04`. * Revert prepacked_weights to avoid indirect inclusion in CUDA and TRT code that breaks the build.	2022-06-20 10:29:43 -07:00
Yi Zhang	d2cbae3a04	Revert "Refactor ExecutionFrame and SessionState to reduce memory all… (#11888 ) Revert "Refactor ExecutionFrame and SessionState to reduce memory allocations and improve data locality (#11804)" This reverts commit `2ecba6fd25`.	2022-06-17 17:07:21 +08:00
Dmitri Smirnov	2ecba6fd25	Refactor ExecutionFrame and SessionState to reduce memory allocations and improve data locality (#11804 ) Refactor ExecutionFrame and SessionState for better data locality and less memory allocations.	2022-06-16 16:50:48 -07:00
Hector Li	95a16c1ffe	Snpe ep (#11665 ) * Initiate Ort SNPE EP * fix snpe ep windows build which is caused by the utility method (ToUTF8String) name change on master * correct the source path for libonnxruntime.so while building for android package * add AdditionalDependencies for arm64 * On MS-Windows, the patchfile must be a text file, i.e. CR-LF must be used as line endings. A file with LF may give the error: "Assertion failed, hunk, file patch.c, line 343," unless the option '--binary' is given. * fix build failure if snpe is not enabled * update doc for contrib op * separate out snpe ep settings to onnxruntime_snpe_provider.cmake * renaming according to review comments * update according to review comments	2022-06-03 14:10:02 -07:00
Scott McKay	4445dd6bc1	XNNPACK EP (#11445 ) * Implement XNNPACK support via an EP. * Layout transform uses the GraphPartitioner infrastructure. * Node fusion is supported. * Conv and MaxPool implementations were ported from Changming's PR. * Added optional mutex in InferenceSession::Run as we only want to allow sequential calls if xnnpack is enabled	2022-06-03 20:22:34 +10:00
Tang, Cheng	3f3c5fcd68	Unify the Compile API for mobile build and normal build (#10632 ) * use the lightweight compile api as default; use dnnl ep for testing * apply to tensorrt ep * fix the missing files * fix build * fix the copy issue on linux * migrate migraphx and openvino ep * fix openvino build break * fix linux build * fix unused parameter * fix coreml build * use graph view's filtered initializers * fix openvino break * fix tvm compile api * fix tvm / rknpu / vitisai ep build * add IsInitializedTensor in graph_viewer; fix nuphar build * use serializer directly as tvm ep is still static lib * fix the type mismatch * fix the type mismatch * fix merge conflict * add a comment * fix minimal build * fix the DML EP's legacy approach * save type/shape in dnnl IR * fix linux break * fix tvm failure * dnnl ep: move initializer referenced out of dnnl subgraph * Revert "add IsInitializedTensor in graph_viewer; fix nuphar build" This reverts commit 1cc3c7f08c16fee4fe3309a67209eb769d479587. 
* add IsInitializedTensor to graph viewer * add the legacy code for nuphar build to temporarily make nuphar build work * ignore internal test for nuphar * remove the out of date tests * keep the legacy API in EP for a while * turn serializer into a static function * update comments * fix tvm build * Update include/onnxruntime/core/framework/execution_provider.h Co-authored-by: Pranav Sharma <prs@microsoft.com> * Update include/onnxruntime/core/framework/execution_provider.h Co-authored-by: Pranav Sharma <prs@microsoft.com> * Update onnxruntime/core/framework/execution_provider.cc Co-authored-by: Pranav Sharma <prs@microsoft.com> * update comments; add warning message for legacy compile call * add a flag to control out of scope arg in serialization * fix trt build; improve the test * resolve merge errors * fix a typo Co-authored-by: Cheng Tang <chenta@microsoft.com> Co-authored-by: Cheng Tang <chenta@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net> Co-authored-by: Pranav Sharma <prs@microsoft.com>	2022-05-05 08:30:07 -07:00
RandySheriffH	8d69b9398b	APIs for custom op to invoke ort operator directly (#10713 ) * draft kernel creation * setup eager context * call into kernel in eager mode * redefine test case * refact eager context * add comment * remove header * rename argument * redefine API definition with types * list outputs as argument * switch to int to represent length * fix compile err * create attribute API * add test case for topk * remove bool from c api * add gru test case * remove var * fix compile warnings * rename status * fix compile err * exclude sparse tensor * fix comments * fix comments * fix build err * rename file and move location * format code * move file to session folder * fix comments Co-authored-by: Randy <Randy@randysmac.attlocal.net>	2022-05-03 14:16:30 -07:00
Tang, Cheng	4b875e3543	Re-implement the function support in onnxruntime (#11167 ) * initial fix * refactor the function handle * update the implementation * fix linux build break * fix training build * fix minimal build * fix gradient checker * deprecate the local function members in graph. host it in model * fix changming's comments * fix comments about inlined containers * fix a missed inlined container * fix training build * avoid const for std string_view Co-authored-by: Cheng Tang <chenta@microsoft.com>	2022-04-29 10:15:58 -07:00
Gary Miguel	7aa4af238a	Add strict_shape_type_inference config option (#11081 ) Prior to this, certain shape and type errors were surfaced only when the model was using the latest known op set version. Providing users an explicit option allows for better testing of code that produces models, which includes unit tests within this repo and other repos such as the TF-ONNX and PT-ONNX converters. Remove the previous behavior which seems quite counter-intuitive: an otherwise identical model with a later op set version should be treated identically in this regard. The option defaults to false to avoid causing errors for users that rely on the previous permissive behavior. Turned on the strict enforcement by default in OpTester, which revealed a few disagreements between ORT and ONNX on what the correct output shape should be. Fix shape inference bug in ReduceSumTraining with noop_with_empty_axes=1 which was revealed. Fix TensorOpTest.Unsqueeze_scalar, which was testing negative axes on an op set version where the op did not actually support negative axes. Fixes #9506.	2022-04-21 08:32:40 -07:00
Dmitri Smirnov	2700261f7c	Provide an API to supply external initializers data from user buffers (#11109 ) Implement AddExternalInitializers	2022-04-07 12:21:53 -07:00
Scott McKay	47c09e6701	Clarify usage of kOnnxDomainAlias. (#10962 ) * Clarify usage of kOnnxDomainAlias.	2022-03-25 09:52:59 +10:00
Edward Chen	f468ea40e5	Refactor Node::AddAttribute() (#10869 )	2022-03-16 14:53:00 +10:00
Edward Chen	c147c9dda6	Remove ORT_ENABLE_RUNTIME_OPTIMIZATION_IN_MINIMAL_BUILD. (#10778 ) Remove ORT_ENABLE_RUNTIME_OPTIMIZATION_IN_MINIMAL_BUILD as it is now implied by ORT_EXTENDED_MINIMAL_BUILD. Remove related CMake option.	2022-03-08 16:18:49 -08:00
Dmitri Smirnov	e23a224518	Fix CUDA 10.2 compile error due to inlined_containers.h inclusion (#10702 ) Fix CUDA 10.2 compile error due to inlined_containers.h inclusion into a common CUDA header. Use NumberOfNodes() to reserve space in a hash table Prefer separate call to reserve() rather than passing in the hash table constructor. They have somewhat different meaning.	2022-02-28 19:56:44 -08:00
Thiago Crepaldi	e788cc2a23	Convert com.microsoft::ATen into org.pytorch.aten::ATen onnx op (#10060 ) Signed-off-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>	2022-02-28 14:14:45 -05:00
Dmitri Smirnov	2679711bee	Refactor transformers and other code to reduce memory allocation calls (#10523 ) Work on minimizing memory management calls by reducing number of allocations and copies. Replace std::unordered_set with InlinedHashSet and add usage of InlinedVector. Employ std::move() to minimize copying and memory allocations. Remove copying of the const shared data into each of the PropagateCast transformer instances. Move inlined_containers.h header to include/common Adjust AsSpan implementation for C++ < 17	2022-02-24 16:17:14 -08:00
Ashwini Khade	f436d3437e	Add layout transformer for NNAPI (#10371 ) * Add layout transformer for NNAPI * plus merge fixes * plus some more merge fixes * test fixes * comments + cleanup * plus updates * post merge changes * enable layout transformer in extended minimal build * plus more comments * more tests + fix CI * plus updates per review * more updates per review * fix file name * fix qdq tests * plus more updates * plus updates * typo fix * fix qdq selection in 2nd optimization pass * fix typo * fix a test * update dependency structure for layout transformer * plus updates * more updates * plus change * more updates to fix linker error in minimal build * remove unnecessary headers	2022-02-15 20:25:29 -08:00
Valery Chernov	1cdc23aba4	[TVM EP] Rename Standalone TVM (STVM) Execution Provider to TVM EP (#10260 ) * update java API for STVM EP. Issue is from PR#10019 * use_stvm -> use_tvm * rename stvm worktree * STVMAllocator -> TVMAllocator * StvmExecutionProviderInfo -> TvmExecutionProviderInfo * stvm -> tvm for cpu_targets. resolve onnxruntime::tvm and origin tvm namespaces conflict * STVMRunner -> TVMRunner * StvmExecutionProvider -> TvmExecutionProvider * tvm::env_vars * StvmProviderFactory -> TvmProviderFactory * rename factory funcs * StvmCPUDataTransfer -> TvmCPUDataTransfer * small clean * STVMFuncState -> TVMFuncState * USE_TVM -> NUPHAR_USE_TVM * USE_STVM -> USE_TVM * python API: providers.stvm -> providers.tvm. clean TVM_EP.md * clean build scripts #1 * clean build scripts, java frontend and others #2 * once more clean #3 * fix build of nuphar tvm test * final transfer stvm namespace to onnxruntime::tvm * rename stvm->tvm * NUPHAR_USE_TVM -> USE_NUPHAR_TVM * small fixes for correct CI tests * clean after rebase. Last renaming stvm to tvm, separate TVM and Nuphar in cmake and build files * update CUDA support for TVM EP * roll back CudaNN home check * ERROR for not positive input shape dimension instead of WARNING * update documentation for CUDA * small corrections after review * update GPU description * update GPU description * misprints were fixed * cleaned up error msgs Co-authored-by: Valery Chernov <valery.chernov@deelvin.com> Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru> Co-authored-by: Thierry Moreau <tmoreau@octoml.ai>	2022-02-15 10:21:02 +01:00
Edward Chen	c43c1691ad	Enable transpose optimizer in minimal extended build (#10349 ) Enable transpose optimizer and infrastructure it depends on in a minimal extended build.	2022-01-31 09:41:04 -08:00
Edward Chen	792db33f01	Enable loading of ORT format model graph runtime optimizations (#9901 ) Initial implementation of load/replay of runtime optimizations in an ORT format model.	2022-01-04 12:09:07 -08:00
stevenlix	05d20343ee	Remove duplicated constant initializer copies for TensorRT nodes (#10105 ) * add new field constant_initializers in metadef and remove constant initializers from trt node inputs * remove redundancy * use GetConstantInitializer() to get constant initializers * add ORT_ENFORCE check Co-authored-by: Ubuntu <azureuser@orteplinuxdev.bxgbzpva45kedp3rhbsbit4phb.jx.internal.cloudapp.net>	2021-12-22 12:19:56 -08:00
Changming Sun	4e9e01cb3c	Fix SDL warnings in CPU EP (#9975 )	2021-12-19 20:54:29 -08:00

198 commits