Commit graph

263 commits

Author SHA1 Message Date
Edward Chen
454f77cd94
Update kernel matching logic: decouple from op schemas and remove kernel def hashes (#12791)
# Motivation
Currently, ORT minimal builds use kernel def hashes to map from nodes to
kernels to execute when loading the model. As the kernel def hashes must
be known ahead of time, this works for statically registered kernels.
This works well for the CPU EP.
For this approach to work, the kernel def hashes must also be known at
ORT format model conversion time, which means the EP with statically
registered kernels must also be enabled then. This is not an issue for
the always-available CPU EP. However, we do not want to require that any
EP which statically registers kernels is always available too.
Consequently, we explore another approach to match nodes to kernels that
does not rely on kernel def hashes. An added benefit of this is the
possibility of moving away from kernel def hashes completely, which
would eliminate the maintenance burden of keeping the hashes stable.

# Approach
In a full build, ORT uses some information from the ONNX op schema to
match a node to a kernel. We want to avoid including the ONNX op schema
in a minimal build to reduce binary size. Essentially, we take the
necessary information from the ONNX op schema and make it available in a
minimal build.
We decouple the ONNX op schema from the kernel matching logic. The
kernel matching logic instead relies on per-op information which can
either be obtained from the ONNX op schema or another source.
This per-op information must be available in a minimal build when there
are no ONNX op schemas. We put it in the ORT format model.
Existing uses of kernel def hashes to look up kernels are replaced
with the updated kernel matching logic. We no longer store
kernel def hashes in the ORT format model’s session state and runtime
optimization representations. We no longer keep the logic to
generate and ensure stability of kernel def hashes.
2022-09-20 14:24:59 -07:00
Tang, Cheng
739b5675c8
remove legacy compile api (#12932)
Co-authored-by: Cheng Tang <chenta@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2022-09-15 13:18:40 -07:00
Scott McKay
1016c33519
Fix prefast warning in upsample.cc. (#12938)
* Fix prefast warning.
* Fix some other static analysis warnings.
2022-09-14 08:14:33 +10:00
Yulong Wang
c144acc534
Replace 'master' branch ref to 'main' in the code (#12547) 2022-08-22 10:48:12 -07:00
Haoming Chen
8a038b9b0c
Fix a build error (#12600)
LLVM compiler complains the std::hash<const char*> and suggests std::hash<const void*>. But the intention is to hash the name string instead of the pointer. So use std::hash<std::string> to be explicit.
2022-08-17 10:49:54 -07:00
Changming Sun
ac7538b909
Remove CUDA 10.2 support (#12541) 2022-08-10 22:46:41 -07:00
Cheng
64e991a9fc
[Qlinearsoftmax] contrib cpu (#12177)
* [Qlinearsoftmax] contrib cpu

* int8 implementation

* contrib operator md

* qdq transformer test

* new attribute: opset

* doc

* quantized tool

* remove template to reduce Binary size

* doc of contribe operators

* enforce x_shape is valid

* fix reduce_size if input-shape is dynamic

* add UT

* register one op for reducing binarysize

* kernel hash update

* docs/ContribOperators.md
2022-08-10 10:52:02 +08:00
Dmitri Smirnov
3bf614fd47
Eliminate memory allocations per recent profiling (#12225)
* Alloc begin

FeedsFetches refactoring
Refactor Tensor class
Fix buffer deletor
Remove new/delete deleted
Adjust alloc move
Fix up xnnpack provider
Clarifying the comment on Create()
2022-07-25 14:14:38 -07:00
Dmitri Smirnov
267a424e52
Retry Rework execution frame to reduce memory allocations (#11897)
* Revert "Revert "Refactor ExecutionFrame and SessionState to reduce memory all… (#11888)"

This reverts commit d2cbae3a04.

* Revert prepacked_weights to avoid indirect inclusion in CUDA and TRT code that breaks the build.
2022-06-20 10:29:43 -07:00
Edward Chen
a93fe7824a
Update EP compile API deprecation warning message. (#11808)
Minor wording update to warning message to clarify that the function style Compile API is deprecated now and will be removed soon.
Also updated some code comments.
2022-06-17 12:49:24 -07:00
Yi Zhang
d2cbae3a04
Revert "Refactor ExecutionFrame and SessionState to reduce memory all… (#11888)
Revert "Refactor ExecutionFrame and SessionState to reduce memory allocations and improve data locality (#11804)"

This reverts commit 2ecba6fd25.
2022-06-17 17:07:21 +08:00
Dmitri Smirnov
2ecba6fd25
Refactor ExecutionFrame and SessionState to reduce memory allocations and improve data locality (#11804)
Refactor ExecutionFrame and SessionState for better data locality and less memory allocations.
2022-06-16 16:50:48 -07:00
Vincent Wang
5ecfaef042
ATen Fallback for Inference (#11597)
* aten op for inference

* fix build error

* more some code to training only

* remove domain from operator name

* move aten_op_executor ext out from ortmodule

* add pipeline

* add exec mode

* fix script

* fix ut script

* fix test pipeline

* failure test

* rollback

* bugfix

* resolve comments

* enable aten for python build only

* fix win build

* use target_compile_definitions

* support io binding

* turn off aten by default

* fix ut

Co-authored-by: Vincent Wang <weicwang@microsoft.com>
Co-authored-by: zhijxu <zhijxu@microsoft.com>
2022-06-09 16:07:30 +08:00
Scott McKay
927bac0f86
Rework allocator sharing to work for multiple devices. (#11700)
* Rework allocator sharing to work for multiple devices.
* Update SessionState to not use allocator name in matching for consistency with IExecutionProvider. The name doesn't have any clear meaning (e.g. we use the same name for the per-thread allocator in the CUDA EP as the shared allocate there and in the TRT EP).
  * NOTE: this means we will have one allocator per OrtMemType+OrtDevice. 
* Reverse order when doing allocator setup in SessionState. This will result in the CPU and CUDA EPs allocators being preferred (they are the most configurable), and also means the per-thread CUDA allocator for default GPU memory will be used even when TRT is enabled. 
  * NOTE: Combined with the change to remove the allocator name from the key this will mean that if CUDA and TRT or ROCM and MIGraphX are both enabled the CUDA/ROCM per-thread allocator will be used to allocate GPU memory.  
* Use InsertAllocator instead of TryInsertAllocator. Each EP should be registered once, and we should only enter RegisterAllocator once, so the 'try' should not be required and would indicate an unexpected setup was involved. i.e. better to fail and figure out if we need to support that setup.
* Add some clarifying comments around how replace allocator works.
* Add unit testing for setup where EP has local allocator that may get out of sync with values in the IExecutionProvider base class.
* Fix invalid check of whether data is on CPU to use device info instead of allocator name.
2022-06-09 17:38:38 +10:00
Hector Li
95a16c1ffe
Snpe ep (#11665)
* Initiate Ort SNPE EP
* fix snpe ep windows build which is caused by the utility method (ToUTF8String) name change on master
* correct the source path for libonnxruntime.so while building for andorid package
* add AdditionalDependencies for amr64
* On MS-Windows, the patchfile must be a text file, i.e. CR-LF must be used as line endings. A file with LF may give the error: "Assertion failed, hunk, file patch.c, line 343," unless the option '--binary' is given.
* fix build failure if snpe is not enabled
* update doc for contrib op
* separate out snpe ep settings to onnxruntime_snpe_provider.cmake
* renaming according review comments
* update according review comments
2022-06-03 14:10:02 -07:00
Scott McKay
4445dd6bc1
XNNPACK EP (#11445)
* Implement XNNPACK support via an EP.
  * Layout transform uses the GraphPartitioner infrastructure.
  * Node fusion is supported.
  * Conv and MaxPool implementations were ported from Changming's PR.
  * Added optional mutex in InferenceSession::Run as we only want to allow sequential calls if xnnpack is enabled
2022-06-03 20:22:34 +10:00
Tang, Cheng
3f3c5fcd68
Unify the Compile API for mobile build and normal build (#10632)
* use the lightweight compile api as default; use dnnl ep for testing

* apply to tensorrt ep

* fix the missing files

* fix build

* fix the copy issue on linux

* migrate migraphx and openvino ep

* fix openvino build break

* fix linux build

* fix unused parameter

* fix coreml build

* use graph view's filtered initializers

* fix openvino break

* fix tvm compile api

* fix tvm / rknpu / vitisai ep build

* add IsInitializedTensor in graph_viewer; fix nuphar build

* use serializer directly as tvm ep is still static lib

* fix the type mismatch

* fix the type mismatch

* fix merge conflict

* add a comment

* fix minimal build

* fix the DML EP's legacy approach

* save type/shape in dnnl IR

* fix linux break

* fix tvm failure

* dnnl ep: move initializer referenced out of dnnl subgraph

* Revert "add IsInitializedTensor in graph_viewer; fix nuphar build"

This reverts commit 1cc3c7f08c16fee4fe3309a67209eb769d479587.

* add IsInitializedTensor to graph viewer

* add the legacy code for nuphar build to temporarily make nuphar build work

* ignore internal test for nuphar

* remove the out of date tests

* keep the legacy API in EP for a while

* turn serializer into a static function

* update comments

* fix tvm build

* Update include/onnxruntime/core/framework/execution_provider.h

Co-authored-by: Pranav Sharma <prs@microsoft.com>

* Update include/onnxruntime/core/framework/execution_provider.h

Co-authored-by: Pranav Sharma <prs@microsoft.com>

* Update onnxruntime/core/framework/execution_provider.cc

Co-authored-by: Pranav Sharma <prs@microsoft.com>

* updatee comments; add warning message for legacy compil call

* add a flag to control out of scope arg in serialization

* fix trt  build; improve the test

* resolve merege errors

* fix a typo

Co-authored-by: Cheng Tang <chenta@microsoft.com>
Co-authored-by: Cheng Tang <chenta@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
Co-authored-by: Pranav Sharma <prs@microsoft.com>
2022-05-05 08:30:07 -07:00
RandySheriffH
8d69b9398b
APIs for custom op to invoke ort operator directly (#10713)
* draft kernel creation

* setup eager context

* call into kernel in eager mode

* redefine test case

* refact eager context

* add comment

* remove header

* rename argument

* redefine API definition with types

* list outputs as argument

* switch to int to represent length

* fix compile err

* create attribute API

* add test case for topk

* remove bool from c api

* add gru test case

* remove var

* fix compile warnings

* rename status

* fix compile err

* exclude sparse tensor

* fix comments

* fix comments

* fix build err

* rename file and move location

* format code

* move file to session folder

* fix comments

Co-authored-by: Randy <Randy@randysmac.attlocal.net>
2022-05-03 14:16:30 -07:00
Vincent Wang
1c64351e09
Create Tensor with Strides (#11294)
* create tensor with strides

* resolve comments

* refactor

Co-authored-by: Vincent Wang <weicwang@microsoft.com>
2022-04-28 16:49:37 +08:00
Dmitri Smirnov
a7d0158c24
Introduce a way to disable Abseil library (#11353)
Introduce a way to disable Abseil library.
Use cmake extra args, no new build switch.
2022-04-27 08:57:52 -07:00
Edward Chen
4854a09340
Consolidate utils::ToTensorProtoElementType, TypeToDataType, and data_types_internal::ToTensorDataType. (#9824) 2022-04-20 12:45:53 -07:00
Vincent Wang
9707181257
fix build error (#11199) 2022-04-13 13:09:19 +08:00
Dmitri Smirnov
12c687f594
Rework initializer.cc to eliminate code duplication (#11131)
Rework initializer.cc to eliminate code duplication and add type enforcement.
 Address review comments.  Add literal operators for MLFloat16 abd BFloat16 and tests.
2022-04-08 09:42:31 -07:00
Scott McKay
47c09e6701
Clarify usage of kOnnxDomainAlias. (#10962)
* Clarify usage of kOnnxDomainAlias.
2022-03-25 09:52:59 +10:00
Hariharan Seshadri
e80ff63274
Fix bug in MemcpyToHost (#10816) 2022-03-10 07:02:27 -08:00
Vincent Wang
4a38f9e31d
enable strided tensor for training only (#10748) 2022-03-08 08:31:28 +08:00
Fei Hu
60acfd3dd8
Support CUDA Graph in the CUDA EP (#9978) 2022-03-06 20:47:31 -08:00
Vincent Wang
9a22b5d253
Strided Tensor Support for Eager Mode (#10578)
* strided tensor for eager mode

* fix build and resolve comments

* fix win x86 build
2022-03-01 14:25:31 +08:00
Dmitri Smirnov
e23a224518
Fix CUDA 10.2 compile error due to inlined_containers.h inclusion (#10702)
Fix CUDA 10.2 compile error due to inlined_containers.h inclusion
 into a common CUDA header.
 Use NumberOfNodes() to reserve space in a hash table
 Prefer separate call to reserve() rather than passing in the
 hash table constructor. They have somewhat different meaning.
2022-02-28 19:56:44 -08:00
Dmitri Smirnov
b30e0e2283
Remove inline_containers include from tensor_shape (#10682)
Hide Inlined Hash set and maps guts behind template forward declarations.
Currently CUDA 10.2 compiler can not compile abseil but provider interfaces
use those types in their signatures. InlinedVector seems to be fine.
Introduce core/common/inlined_containers_fwd.h header
2022-02-26 20:07:18 -08:00
Dmitri Smirnov
2679711bee
Refactor transformers and other code to reduce memory allocation calls (#10523)
Work on minimizing memory management calls by
  reducing number of allocations and copies.
  Replace std::unordered_set to InlinedHashSet
  and add usage of InlinedVector.
  Employ std::move() to minimize copying and memory allocations.
  Remove copying of the const shared data into each of the
  PropagateCast transformer instances.
  Move inlined_containers.h header to include/common
  Adjust AsSpan imlementation for C++ < 17
2022-02-24 16:17:14 -08:00
Ashwini Khade
f436d3437e
Add layout transformer for NNAPI (#10371)
* Add layout transformer for NNAPI

* plus merge fixes

* plus some more merge fixes

* test fixes

* comments + cleanup

* plus updates

* post merge changes

* enable layout transformer in extended minimal build

* plus more comments

* more tests + fix CI

* plus updates per review

* more updates per review

* fix file name

* fix qdq tests

* plus more updates

* plus updates

* typo fix

* fix qdq selection in 2nd optimization pass

* fix typo

* fix a test

* update dependency structure for layout transformer

* plus updates

* more updates

* plus change

* more updates to fix linker error in minimal build

* remove unnecessary headers
2022-02-15 20:25:29 -08:00
Vincent Wang
ceb1e2b1a6
[ROCm] Bugfix of BFloat16-float conversion and Add FastGelu Kernel for AMD (#10557)
* bf16 bugfix on amd

* enable fastgelu ut on amd
2022-02-16 11:11:08 +08:00
Dwayne Robinson
b02f4ece5e
Remove cbegin and cend calls which do not exist in std::span or gsl::span (#10426) 2022-01-28 14:25:12 -08:00
Changming Sun
ec4362f8f3
Enable more static analysis warnings and enable the analyzer for training cpu (#10176) 2022-01-27 11:17:20 -08:00
Dmitri Smirnov
3367ddc5ba
Add abseil cgmanifest declaration. Update coding standards. (#10374)
Add abseil cgmanifest declaration. Update coding standards for InlinedContainers
  Adjust coding guidelines. Add default N calculation for InlinedVector<T, N> for general use.
  Rename T from InlinedShapeVectorT. Fix Eager build
  Add LLVM Copyright with modified derived code notice.
2022-01-27 08:32:05 -08:00
Weixing Zhang
ea9c8a7cdc
support MIGraphXEP to work with ROCMEP for inference on AMD GPU (#10368)
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>

Support MIGraphXEP to work with ROCMEP for inference on AMD GPU
2022-01-26 15:52:56 -08:00
Dmitri Smirnov
7e092a7e3f
Reduce number of memory allocations based on a customer profiling case (#10193)
Add abseil and inlined containers typedefs
Introduce TensorShapeVector for shape building.
Use gsl::span<const T> to make interfaces accept different types of vector like args.
Introduce InineShapeVectorT for shape capacity typed instantiations
Refactor cuda slice along with provider shared interfaces
Refactor Concat, Conv, Pad
Build with Conv Einsum and ConvTranspose refactored.
Remove TesnorShape::GetDimsAsVector()
Refactor SliceIterator and SliceIteratorBase
Refactor broadcast
Refactor Pads for twice as long
Remove memory planner intermediate shapes vector
Refactor orttraining
Fix passing TenshroShapeVector to tests
Remove abseil copy and submodule, use FetchContent_Declare/Fetch
Path with separate command
Make RocmAsyncBuffer accept anything convertible to span. Adjust Linux GPU pipeline.
2022-01-24 10:40:46 -08:00
Vincent Wang
44e2db9397
CUDA BFloat16 Refactor (#10085) 2022-01-14 19:38:56 +08:00
Changming Sun
4e9e01cb3c
Fix SDL warnings in CPU EP (#9975) 2021-12-19 20:54:29 -08:00
Edward Chen
3466ee45a3
Add hash value typedef. (#9710)
Add a typedef for the various hash value variables. Use of a typedef conveys some additional meaning.
2021-12-15 19:07:17 -08:00
Changming Sun
9d9ebd3b85
Fix some static analysis warnings in the core framework (#10033) 2021-12-14 14:41:42 -08:00
Dmitri Smirnov
a7f649db7c
Enable proper override using MIMalloc (#9944)
Redirect memory allocations to MiMalloc and advance its version to v2.0.3
Refactor for a universal ifdef
2021-12-07 17:56:58 -08:00
Ryan Lai
57a6f7c205
Various fixes to fix WindowsAI RI build. (#9877)
* WAI RI fixes

* span changes

* Spaces

* Additional warnings to fix

* Fix redundant commment
2021-11-29 21:33:15 -08:00
Edward Chen
bcc6ab29f6
Trim DataTypeImpl binary size (#9813)
* De-virtualize DataTypeImpl::AsXType() functions.
* Refactor helpers.
2021-11-22 12:06:24 -08:00
Scott McKay
dc1724b0e2
Reduce DataTypeImpl binary size (#9783)
* Reduce the number of virtual methods in DataTypeImpl to reduce binary size.
Refactor some helpers to reduce the amount of templatized code.
2021-11-19 10:29:12 +10:00
Dmitri Smirnov
6284cbe833
Add TensorShape noexcept for move ops and fix some warnings (#9802)
Add TensorShape noexcept for move ops and fix some warnings
2021-11-18 15:27:24 -08:00
Hariharan Seshadri
e23892ddbe
Support disabling support for the optional type in ORT builds (#9745) 2021-11-17 19:13:28 -08:00
sfatimar
1d03baa8cc
Openvino ep 2021.4 v3.3 (#9588)
* Added checks for Hetero/Multi

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Remote Context Plugin

* changes for IO Buffer plugin

* erronous couts added

* erronous entry rectified

* Set the Openvino OP Buffer also as output

* Enable AUTO plugin in OpenVINO EP

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Remote Context Plugin

* changes for IO Buffer plugin

* erronous couts added

* erronous entry rectified

* Added checks for Hetero/Multi

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Set the Openvino OP Buffer also as output

* Enable AUTO plugin in OpenVINO EP

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Please commit error message and rectification of param.context

* Alignment fixed

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Changed the string to OpenVINO_GPU

* hanged OpenVINO to to OpenVINO_CPU

* Onnxruntime updated API for memory location

* Removing Duplicate LOG Error

* Tensor.h removed DeviceType function. Updated comment

* API Comments updated

* Removing changes to Provider Indo

* Erronous commit

* Removing Extra logs

* Merge CMAKE

* Not copy from a  local location

* Duplicate Entry

* Remove extra line

Co-authored-by: MaajidKhan <n.maajidkhan@gmail.com>
2021-11-15 13:41:12 -08:00
Sheil Kumar
3d0bd2596f
Enable creating OrtValues from ID3D12Resources from the onnxruntime C-API (#9686)
* Add onnxruntime-windows api.

* minor fixes

* add to package headers

* Build ort_dml_api for provider extensions.

* Cleanup

* misc comment

* remove winml specific comments

* use dml check in onnxruntime

* Update include/onnxruntime/core/providers/dml/dml_provider_factory.h

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* Update include/onnxruntime/core/session/onnxruntime_c_api.h

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* Update include/onnxruntime/core/providers/dml/dml_provider_factory.h

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* Update include/onnxruntime/core/providers/dml/dml_provider_factory.h

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* Update onnxruntime/core/session/onnxruntime_c_api.cc

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* Update onnxruntime/core/session/ort_apis.h

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* Update winml/test/adapter/AdapterSessionTest.cpp

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* Update onnxruntime/core/session/onnxruntime_c_api.cc

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* Update winml/adapter/winml_adapter_c_api.cpp

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* Update include/onnxruntime/core/session/onnxruntime_c_api.h

Co-authored-by: Pranav Sharma <prs@microsoft.com>

* Update onnxruntime/core/session/onnxruntime_c_api.cc

Co-authored-by: Pranav Sharma <prs@microsoft.com>

* Update winml/adapter/winml_adapter_c_api.cpp

* PR feedback

* Update include/onnxruntime/core/providers/dml/dml_provider_factory.h

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* Update include/onnxruntime/core/providers/dml/dml_provider_factory.h

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* Update include/onnxruntime/core/providers/dml/dml_provider_factory.h

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* PR feedback

* merge resolution and unreference param

* (naming) Remove Dml prefix

* maybe unused version

* move DML code into DML path. CIs failing because DML is not available when --use_dml is not on

* fix warning causing local build failures after merging

* Change getvaluememoryinfo to gettensormemoryinfo

* minor breaks

* fix comment paste

* fix comment

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>
Co-authored-by: Pranav Sharma <prs@microsoft.com>
2021-11-13 03:34:54 -08:00