Commit graph

736 commits

Author SHA1 Message Date
Dmitri Smirnov
2700261f7c
Provide an API to supply external initializers data from user buffers (#11109)
Implement AddExternalInitializers
2022-04-07 12:21:53 -07:00
Maajid khan
81fa28bc56
OpenVINO-EP v4.0 Release PR with OpenVINO 2022.1 (#11025)
* Enabling ov-ep for 2022.1 Release

->Added ov-ep 2022.1 flow
->Validated CPU Unit tests with OV
Master using onnxruntime_test_all unit
tests.

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fix for output mismatch b/w OpenVINO and ONNX

Refer:
https://jira.devtools.intel.com/browse/CVS-60310

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Enabling Adobe ops

->Enable Resize op for iGPU
->Enable Add op for iGPU

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Removing irrelevant conditions

->Removing some conditions from
GetCapability() which are now not
required. (Removed conditions for
OV version support less than 2021.2)

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Enable upsample op

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Enable Adobe proxy-e model

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Removing any extra conditions for Opset13 ops

* Opset13 changes

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Exception handling for devices

* Added comments

* Implement GPU Throttling feature

*Added GPU Throttling feature for iGPUs.
When the user enables it as a runtime option,
it helps reduce the overall CPU usage
of the application.

*Added changes to exercise this option
using onnxruntime_perf_test application.

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Renaming the runtime config option

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Added the user to video and users group

* Handling_GPU.0_GPU.1

* Handling special conditions

->Handling corner cases for
device_type checks

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Modification to include new api 2.0 changes in the code

* Added opset13 changes

->Enabled Few ops
->Added Debug info for case 3b in getcapability()

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Log comments updated

* Changes to enable 2.0 api

* Fix build issue

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fixes issues

*Fixes compiler warnings c4458 on windows.
*Fixes the bug in device_type check logic
*Adds print info for enable_opencl_throttling
option in onnxruntime_perf_test

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* commit to make openvino_2021.4 compatible

* Fixed IO Buffer Optimization

* Fix output names issue

* Fix 2021.3 branch

* Bug Fix for Multiple inputs/outputs

- Assigns the right output_name and
input_name for the graph when
returned by CompiledModel::inputs()
OV function.

- Also takes care of output mismatch
issue b/w openvino output and onnx
output

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Add comments for the changes made

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* IO Buffer Changes

* Commit for Disabling GPU Throttling for 2021.4

* Updated branch

* Fix windows build

->Fixed windows build in debug mode
->Disabled scatternd3_tensor_int64

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fixed CPP Unit tests for CPU

-Fixed shrink, MVN, ReduceL2, Maxpool,
upsample, scatter, slice, reshape,
unsqueeze.

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fixed first set of GPU Tests

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fixed additional failing tests on GPU

->Added conditions to disable certain ops
under certain conditions

->Disabled certain tests

->Added some op supports for no_dimension
supported

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Added Expand op support for CPU

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Added condition for squeeze op

->Shape can't have empty axes attribute

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Add support for LessOrEqual op function

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* OV Interface wait for replaced by indefinite wait call

* use names from ONNX model to access OV tensors

This change is to use the input/output names
retrieved from original onnx model to access
OV tensors and to check if there's any input
or output names mismatch b/w ONNX naming
and OV naming.

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fixes Myriad unit tests and other issues

->Fixes Myriad CPP unit tests
->Fixes output mismatch issue with models with
sub graph partitioning

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fix segfault issue

->Fixed case 3b condition in get_capability()
which was causing the segfault issue

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fixed build issue with ov 2021.4 with I/O buffer

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Disables performance counters for I/O Buffer

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fixed inputs/outputs mismatch for HDDL with 2022.1

Signed-off-by: Mohammad Amir Aqeel <mohammadx.amir.aqeel@intel.com>

* Fix to enable GPU FP16

* Enabled mlperf_ssd_mobilenet_300 model fully on CPU

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Added ov version specific dll packaging for nuget

* Fixed conditions for few ops

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Dockerfile updates

* Updated License Info

-Updated the copyrights License Info
-modified FP16 transformations with OV 2022.1

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Disabling mlperf_ssd_mobilenet_300 model

->Disabled this model for openvino. The
test is failing in Internal_CI pipelines.

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Disabling failing python CPU Tests

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fixed flake8 python errors

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

Co-authored-by: hdgx <harinix.d.g@intel.com>
Co-authored-by: mayavijx <mayax.vijayan@intel.com>
Co-authored-by: sfatimar <sahar.fatima@intel.com>
Co-authored-by: mohsinmx <mohsinx.mohammad@intel.com>
Co-authored-by: Mohammad Amir Aqeel <mohammadx.amir.aqeel@intel.com>
2022-04-06 13:30:33 -07:00
Vincent Wang
3b6cee8059
[CUDA] Optimize Conv and ConvGrad for Training (#10999)
* Optimize Conv and ConvGrad for Training

* add provider option to control

* fix typo
2022-03-29 07:31:36 +08:00
Chi Lo
8ba52b0a05
Bump master version to 1.12 (#10797)
* bump master version to 1.11

* bump master version to 1.12
2022-03-28 12:30:11 -07:00
Scott McKay
47c09e6701
Clarify usage of kOnnxDomainAlias. (#10962)
* Clarify usage of kOnnxDomainAlias.
2022-03-25 09:52:59 +10:00
Leandro Gracia Gil
1cc2cfb7b8
Move #ifndef ORT_CXX_API_THROW to the no exceptions case. (#10937)
This is related to https://github.com/microsoft/onnxruntime/issues/10564
which introduced a fix in the wrong case where exceptions are enabled.
2022-03-21 11:12:56 -07:00
Valery Chernov
625a1f7673
[TVM EP] code refactor (#10655)
* rename info to options for TVM EP

* transfer options processing from TVMExecutionProvider to TVMEPOptions

* transfer TVMRunner to separated files

* implement TVMCompiler class

* replace CompileFunc by TVMCompiler object. update TVMRunner. now it does not depend on TvmExecutionProvider

* correct logging of TVM EP options

* RunnerImpl, GERunnerImpl and VMRunnerImpl were implemented

* add prepareComputeInfo method

* remove update_output_shapes flag

* embed all TVM EP dependences to tvm namespace. transfer model compilation from TVMRunner. connect TVMRunnerImpl to TVMRunner

* refactor compileModel method

* small cleaning

* separate TVM EP options data store and processing

* replace TvmTensorShape by InlinedVector with max_size 5

* correct indentation

* update TVM hash

Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
2022-03-16 13:55:04 +01:00
Edward Chen
f468ea40e5
Refactor Node::AddAttribute() (#10869)
2022-03-16 14:53:00 +10:00
Edward Chen
e53422c6d0
Update convert_onnx_models_to_ort.py to support runtime optimizations. (#10765)
Add runtime optimization support to ONNX -> ORT format conversion script.
Replace `--optimization_level`, `--use_nnapi`, and `--use_coreml` with a new `--optimization_style` option.
2022-03-14 16:50:41 -07:00
Hariharan Seshadri
e80ff63274
Fix bug in MemcpyToHost (#10816)
2022-03-10 07:02:27 -08:00
Edward Chen
c147c9dda6
Remove ORT_ENABLE_RUNTIME_OPTIMIZATION_IN_MINIMAL_BUILD. (#10778)
Remove ORT_ENABLE_RUNTIME_OPTIMIZATION_IN_MINIMAL_BUILD as it is now implied by ORT_EXTENDED_MINIMAL_BUILD.
Remove related CMake option.
2022-03-08 16:18:49 -08:00
Vincent Wang
4a38f9e31d
enable strided tensor for training only (#10748)
2022-03-08 08:31:28 +08:00
Fei Hu
60acfd3dd8
Support CUDA Graph in the CUDA EP (#9978)
2022-03-06 20:47:31 -08:00
Scott McKay
e337f5faf3
Enable QDQ cleanup and NHWC optimizers in an extended minimal build. (#10729)
* Enable QDQ cleanup and NHWC optimizers in an extended minimal build.
2022-03-04 15:45:42 +10:00
Rachel Guo
a9dc50ba8b
Add option to force QDQIsInt8Allowed to return true when exporting to ORT format (#10719)
* wip

* save

* minor update

* fix

* fix

* Revert "fix"

This reverts commit a76f364b2d.

* revert

* revert

* revert submodule removal

* address pr comments

* minor fix

* address cr comments

* fix format

Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
2022-03-02 23:26:14 -08:00
Yulong Wang
f4b2d3af2b
Upgrade emsdk to 3.1.3 (#10577)
2022-02-28 23:52:41 -08:00
Vincent Wang
9a22b5d253
Strided Tensor Support for Eager Mode (#10578)
* strided tensor for eager mode

* fix build and resolve comments

* fix win x86 build
2022-03-01 14:25:31 +08:00
Dmitri Smirnov
e23a224518
Fix CUDA 10.2 compile error due to inlined_containers.h inclusion (#10702)
Fix CUDA 10.2 compile error due to inlined_containers.h inclusion
 into a common CUDA header.
 Use NumberOfNodes() to reserve space in a hash table
 Prefer separate call to reserve() rather than passing in the
 hash table constructor. They have somewhat different meaning.
2022-02-28 19:56:44 -08:00
cloudhan
3243c9579f
Fix VLOG?_DEFAULT macros usability. (#10568)
* Add `set_default_logger_verbosity` api.

* fix docs

* make flake8 happy
2022-03-01 13:16:26 +10:00
Scott McKay
1f6d8248da
Add optional optimizer to remove leftover Q->DQ pairs after all other QDQ processing has completed (#10659)
Add an optimizer that can remove leftover Q->DQ pairs. Depending on the model this may help with performance and/or improve accuracy. Optional as it could make things worse so user needs to be aware of this and test what works best for their scenario. Enable with SessionOptions config param `session.enable_quant_qdq_cleanup`
2022-03-01 08:05:02 +10:00
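The Q->DQ cleanup commit above (#10659) names the SessionOptions config param `session.enable_quant_qdq_cleanup`. A minimal sketch of setting it from Python follows. The key comes from the commit message; the value `"1"` for "enabled" is an assumption, and `FakeSessionOptions` is a hypothetical stand-in so the sketch runs without onnxruntime installed.

```python
# Key name is taken from the commit message; the "1"/"0" values are an assumption.
QDQ_CLEANUP_KEY = "session.enable_quant_qdq_cleanup"


def enable_qdq_cleanup(session_options, enabled=True):
    """Set the Q->DQ cleanup config entry on a SessionOptions-like object.

    The object only needs an add_session_config_entry(key, value) method,
    which onnxruntime.SessionOptions provides.
    """
    session_options.add_session_config_entry(QDQ_CLEANUP_KEY, "1" if enabled else "0")
    return session_options


class FakeSessionOptions:
    """Hypothetical stand-in that records config entries in a dict."""

    def __init__(self):
        self.entries = {}

    def add_session_config_entry(self, key, value):
        self.entries[key] = value


opts = enable_qdq_cleanup(FakeSessionOptions())
print(opts.entries)
```

With onnxruntime installed, the same helper should accept `onnxruntime.SessionOptions()` in place of the stand-in. As the commit notes, the optimizer is optional because it can make results worse, so compare accuracy and latency with and without it.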
Thiago Crepaldi
e788cc2a23
Convert com.microsoft::ATen into org.pytorch.aten::ATen onnx op (#10060)
Signed-off-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
2022-02-28 14:14:45 -05:00
Ryan Hill
eb116595d4
Add ability to customize ORT_CXX_API_THROW (#10688)
2022-02-28 00:15:10 -08:00
Dmitri Smirnov
b30e0e2283
Remove inline_containers include from tensor_shape (#10682)
Hide Inlined Hash set and maps guts behind template forward declarations.
Currently CUDA 10.2 compiler can not compile abseil but provider interfaces
use those types in their signatures. InlinedVector seems to be fine.
Introduce core/common/inlined_containers_fwd.h header
2022-02-26 20:07:18 -08:00
Dmitri Smirnov
2679711bee
Refactor transformers and other code to reduce memory allocation calls (#10523)
Work on minimizing memory management calls by
  reducing number of allocations and copies.
  Replace std::unordered_set to InlinedHashSet
  and add usage of InlinedVector.
  Employ std::move() to minimize copying and memory allocations.
  Remove copying of the const shared data into each of the
  PropagateCast transformer instances.
  Move inlined_containers.h header to include/common
  Adjust AsSpan implementation for C++ < 17
2022-02-24 16:17:14 -08:00
RandySheriffH
e056fbaa51
Add restrictions for hybrid cpus for thread pool task distribution (#10393)
* add restrictions for hybrid cpus

* add unit test to mock hybrid cpu

* attach hybrid flag

* add mocking interface to CpuInfo

* make is_hybrid

* make mock function const

* add force_hybrid for thread pool

* remove header
2022-02-17 14:34:09 -08:00
Ashwini Khade
f436d3437e
Add layout transformer for NNAPI (#10371)
* Add layout transformer for NNAPI

* plus merge fixes

* plus some more merge fixes

* test fixes

* comments + cleanup

* plus updates

* post merge changes

* enable layout transformer in extended minimal build

* plus more comments

* more tests + fix CI

* plus updates per review

* more updates per review

* fix file name

* fix qdq tests

* plus more updates

* plus updates

* typo fix

* fix qdq selection in 2nd optimization pass

* fix typo

* fix a test

* update dependency structure for layout transformer

* plus updates

* more updates

* plus change

* more updates to fix linker error in minimal build

* remove unnecessary headers
2022-02-15 20:25:29 -08:00
Vincent Wang
ceb1e2b1a6
[ROCm] Bugfix of BFloat16-float conversion and Add FastGelu Kernel for AMD (#10557)
* bf16 bugfix on amd

* enable fastgelu ut on amd
2022-02-16 11:11:08 +08:00
Valery Chernov
1cdc23aba4
[TVM EP] Rename Standalone TVM (STVM) Execution Provider to TVM EP (#10260)
* update java API for STVM EP. Issue is from PR#10019

* use_stvm -> use_tvm

* rename stvm worktree

* STVMAllocator -> TVMAllocator

* StvmExecutionProviderInfo -> TvmExecutionProviderInfo

* stvm -> tvm for cpu_targets. resolve onnxruntime::tvm and origin tvm namespaces conflict

* STVMRunner -> TVMRunner

* StvmExecutionProvider -> TvmExecutionProvider

* tvm::env_vars

* StvmProviderFactory -> TvmProviderFactory

* rename factory funcs

* StvmCPUDataTransfer -> TvmCPUDataTransfer

* small clean

* STVMFuncState -> TVMFuncState

* USE_TVM -> NUPHAR_USE_TVM

* USE_STVM -> USE_TVM

* python API: providers.stvm -> providers.tvm. clean TVM_EP.md

* clean build scripts #1

* clean build scripts, java frontend and others #2

* once more clean #3

* fix build of nuphar tvm test

* final transfer stvm namespace to onnxruntime::tvm

* rename stvm->tvm

* NUPHAR_USE_TVM -> USE_NUPHAR_TVM

* small fixes for correct CI tests

* clean after rebase. Last renaming stvm to tvm, separate TVM and Nuphar in cmake and build files

* update CUDA support for TVM EP

* roll back CudaNN home check

* ERROR for not positive input shape dimension instead of WARNING

* update documentation for CUDA

* small corrections after review

* update GPU description

* update GPU description

* misprints were fixed

* cleaned up error msgs

Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru>
Co-authored-by: Thierry Moreau <tmoreau@octoml.ai>
2022-02-15 10:21:02 +01:00
Chi Lo
0f5d0a091a
Make user capable of adding new field in OrtTensorRTProviderOptionsV2 as new provider option (#10450)
* modify code for add additional field in OrtTensorRTProviderOptionsV2

* add include file

* fix typo

* fix bug

* add comment

* fix code

* revert change
2022-02-05 11:15:12 -08:00
Edward Chen
c43c1691ad
Enable transpose optimizer in minimal extended build (#10349)
Enable transpose optimizer and infrastructure it depends on in a minimal extended build.
2022-01-31 09:41:04 -08:00
Dwayne Robinson
b02f4ece5e
Remove cbegin and cend calls which do not exist in std::span or gsl::span (#10426)
2022-01-28 14:25:12 -08:00
Edward Chen
0e951d7d6b
Add some more documentation for the C/C++ API tensor creation functions. (#10394)
2022-01-27 13:19:11 -08:00
Changming Sun
ec4362f8f3
Enable more static analysis warnings and enable the analyzer for training cpu (#10176)
2022-01-27 11:17:20 -08:00
Dmitri Smirnov
3367ddc5ba
Add abseil cgmanifest declaration. Update coding standards. (#10374)
Add abseil cgmanifest declaration. Update coding standards for InlinedContainers
  Adjust coding guidelines. Add default N calculation for InlinedVector<T, N> for general use.
  Rename T from InlinedShapeVectorT. Fix Eager build
  Add LLVM Copyright with modified derived code notice.
2022-01-27 08:32:05 -08:00
Weixing Zhang
ea9c8a7cdc
support MIGraphXEP to work with ROCMEP for inference on AMD GPU (#10368)
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>

Support MIGraphXEP to work with ROCMEP for inference on AMD GPU
2022-01-26 15:52:56 -08:00
Edward Chen
df16c605e8
Add "available since" message for C API additions since v1.10.0. (#10348)
2022-01-25 10:15:34 -08:00
Edward Chen
4b87d2c172
Fix dockerfiles/Dockerfile.arm32v7 build. (#10360)
Install CMake, ignore some Eigen warnings.
2022-01-24 19:06:09 -08:00
Dmitri Smirnov
7e092a7e3f
Reduce number of memory allocations based on a customer profiling case (#10193)
Add abseil and inlined containers typedefs
Introduce TensorShapeVector for shape building.
Use gsl::span<const T> to make interfaces accept different types of vector like args.
Introduce InlinedShapeVectorT for shape capacity typed instantiations
Refactor cuda slice along with provider shared interfaces
Refactor Concat, Conv, Pad
Build with Conv Einsum and ConvTranspose refactored.
Remove TensorShape::GetDimsAsVector()
Refactor SliceIterator and SliceIteratorBase
Refactor broadcast
Refactor Pads for twice as long
Remove memory planner intermediate shapes vector
Refactor orttraining
Fix passing TensorShapeVector to tests
Remove abseil copy and submodule, use FetchContent_Declare/Fetch
Path with separate command
Make RocmAsyncBuffer accept anything convertible to span. Adjust Linux GPU pipeline.
2022-01-24 10:40:46 -08:00
Vincent Wang
44e2db9397
CUDA BFloat16 Refactor (#10085)
2022-01-14 19:38:56 +08:00
RandySheriffH
79d2a0d185
Dynamic cost model to mitigate high E2E perf variance (#9833)
* commit dynamic block size

* summarize granularity

* add configure

* add test case

* call std stoi

* add comments

* fix typo

* rename var

* update comment

* reset default

* better comments

* extend LoopCounter for dynamic blocking

* fix comments and add more UT

* update comments

* switch type to std::ptrdiff_t

* format code with better indentation

* cast ptrdiff_t

* fix typo
2022-01-11 17:26:41 -08:00
Shucai Xiao
ce103ace93
Amdmigraphx fix build error (#9272)
* fix build error

* rename a missing api for the MIGraphX EP
2022-01-10 15:18:43 -08:00
Dwayne Robinson
1f5b073508
Minor DirectML EP provider factory comments (#9965)
2022-01-10 02:06:31 -08:00
Nat Kershaw (MSFT)
d52d3c0052
Update C/C++ API docs automation to create a PR (instead of push to publish branch) (#10093)
2022-01-07 16:16:47 -08:00
Hariharan Seshadri
0552a47ec2
Enable CUDA provider option configuration for C# (#10188)
2022-01-06 11:03:14 -08:00
Edward Chen
792db33f01
Enable loading of ORT format model graph runtime optimizations (#9901)
Initial implementation of load/replay of runtime optimizations in an ORT format model.
2022-01-04 12:09:07 -08:00
stevenlix
05d20343ee
Remove duplicated constant initializer copies for TensorRT nodes (#10105)
* add new field constant_initializers in metadef and remove constant initializers from trt node inputs

* remove redundancy

* use GetConstantInitializer() to get constant initializers

* add ORT_ENFORCE check

Co-authored-by: Ubuntu <azureuser@orteplinuxdev.bxgbzpva45kedp3rhbsbit4phb.jx.internal.cloudapp.net>
2021-12-22 12:19:56 -08:00
Changming Sun
4e9e01cb3c
Fix SDL warnings in CPU EP (#9975)
2021-12-19 20:54:29 -08:00
Edward Chen
3466ee45a3
Add hash value typedef. (#9710)
Add a typedef for the various hash value variables. Use of a typedef conveys some additional meaning.
2021-12-15 19:07:17 -08:00
Valery Chernov
b327e89efa
Standalone TVM Executor Provider (#10019)
* squashed commit for standalone tvm execution provider

* critical fix for correct python build with stvm ep

* get tuning log file from ep options. It has priority over AUTOTVM_TUNING_LOG

* updates and fixes

* update parsing of stvm provider options

* add support of external data for onnx model

* add conditional dump of subgraphs

* remove unused code

* get input tensor shapes through provider options. get output shapes for fixed input ones by TVM API

* support AUTO_TVM tuning log file inside ORT. Selector for Ansor and Auto_TVM is provider option (tuning_type)

* add fp16

* add functionality of conversion of model layout to NHWC if needed. Necessary parameter was added to STVM provider options

* fix license text in header. fix log format

* small fixes

* fix issues from flake8

* remove model proto construction from GetCapability

* reserve memory for vector of DLTensors

* add simple tutorial for STVM EP

* STVM docs

* jroesch/tvm -> apache/tvm

* remove dead code, unnecessary logs and comments

* fix in readme

* improve tutorial notebook

* tvm update

* update STVM_EP.md

* fix default value

* update STVM_EP.md

* some TODOs for the future development

* shorten long lines

* add hyperlink to STVM_EP.md

* fix Linux CI error

* fix error in csharp test

Co-authored-by: Jared Roesch <jroesch@octoml.ai>
Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru>
2021-12-15 16:59:20 -08:00
Changming Sun
20f8a06f1f
Remove OpenMP code (#10032)
2021-12-15 00:58:42 -08:00
Changming Sun
9d9ebd3b85
Fix some static analysis warnings in the core framework (#10033)
2021-12-14 14:41:42 -08:00
Ryan Hill
343a76945b
Fix some documentation errors plus ones generating doxygen warnings (#9993)
2021-12-09 17:42:34 -08:00
Dmitri Smirnov
a7f649db7c
Enable proper override using MIMalloc (#9944)
Redirect memory allocations to MiMalloc and advance its version to v2.0.3
Refactor for a universal ifdef
2021-12-07 17:56:58 -08:00
Ryan Lai
57a6f7c205
Various fixes to fix WindowsAI RI build. (#9877)
* WAI RI fixes

* span changes

* Spaces

* Additional warnings to fix

* Fix redundant comment
2021-11-29 21:33:15 -08:00
jingyanwangms
bf5e9a5044
bumping up ORT_API_VERSION to 10 (#9838)
Co-authored-by: Jingyan Wang <jingywa@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-11-22 20:27:45 -08:00
Edward Chen
bcc6ab29f6
Trim DataTypeImpl binary size (#9813)
* De-virtualize DataTypeImpl::AsXType() functions.
* Refactor helpers.
2021-11-22 12:06:24 -08:00
Dmitri Smirnov
567749b2dc
Expose IOBinding SynchronizeInputs/Outputs via C/C++/C# And Python APIs (#9823)
Add C/C++ APIs for SynchronizeBoundInputs/Outputs
 Add python bindings
 Expose SynchronizeBoundInputs/Outputs to C# API
2021-11-22 09:45:31 -08:00
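The IOBinding commit above (#9823) exposes input/output synchronization to Python. A hedged sketch of the call order one might use around a bound run follows. The method names `synchronize_inputs`, `run_with_iobinding`, and `synchronize_outputs` reflect my understanding of the onnxruntime Python bindings and should be checked against the installed version; `FakeBinding` and `FakeSession` are hypothetical stand-ins that only record calls.

```python
def run_with_synchronized_binding(session, io_binding):
    """Synchronize bound inputs, run the session, then synchronize outputs.

    Assumes io_binding offers synchronize_inputs()/synchronize_outputs()
    and session offers run_with_iobinding(io_binding).
    """
    io_binding.synchronize_inputs()
    session.run_with_iobinding(io_binding)
    io_binding.synchronize_outputs()


class FakeBinding:
    """Hypothetical stand-in that records the order of calls."""

    def __init__(self, log):
        self.log = log

    def synchronize_inputs(self):
        self.log.append("synchronize_inputs")

    def synchronize_outputs(self):
        self.log.append("synchronize_outputs")


class FakeSession:
    """Hypothetical stand-in for an inference session."""

    def __init__(self, log):
        self.log = log

    def run_with_iobinding(self, io_binding):
        self.log.append("run_with_iobinding")


call_log = []
run_with_synchronized_binding(FakeSession(call_log), FakeBinding(call_log))
print(call_log)
```

Synchronizing before the run matters when inputs were produced asynchronously on a device; synchronizing afterwards ensures the outputs are complete before they are read.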
Wei-Sheng Chin
e520bb5145
Improve print functions for NodeArg, Node, and Graph (#9801)
2021-11-19 09:48:27 -08:00
Scott McKay
dc1724b0e2
Reduce DataTypeImpl binary size (#9783)
* Reduce the number of virtual methods in DataTypeImpl to reduce binary size.
Refactor some helpers to reduce the amount of templatized code.
2021-11-19 10:29:12 +10:00
Dmitri Smirnov
6284cbe833
Add TensorShape noexcept for move ops and fix some warnings (#9802)
Add TensorShape noexcept for move ops and fix some warnings
2021-11-18 15:27:24 -08:00
Hariharan Seshadri
e23892ddbe
Support disabling support for the optional type in ORT builds (#9745)
2021-11-17 19:13:28 -08:00
satyajandhyala
421e4c03ce
Update default cast propagation strategy from None to FloodFill (#9713)
* Changed the default cast propagation strategy from None to FloodFill.
2021-11-16 13:15:57 -08:00
Edward Chen
9acbfeba09
Address some code scan issues. (#9752)
2021-11-16 10:24:46 -08:00
sfatimar
1d03baa8cc
Openvino ep 2021.4 v3.3 (#9588)
* Added checks for Hetero/Multi

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Remote Context Plugin

* changes for IO Buffer plugin

* erroneous couts added

* erroneous entry rectified

* Set the Openvino OP Buffer also as output

* Enable AUTO plugin in OpenVINO EP

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Please commit error message and rectification of param.context

* Alignment fixed

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Changed the string to OpenVINO_GPU

* Changed OpenVINO to OpenVINO_CPU

* Onnxruntime updated API for memory location

* Removing Duplicate LOG Error

* Tensor.h removed DeviceType function. Updated comment

* API Comments updated

* Removing changes to Provider Info

* Erroneous commit

* Removing Extra logs

* Merge CMAKE

* Not copy from a local location

* Duplicate Entry

* Remove extra line

Co-authored-by: MaajidKhan <n.maajidkhan@gmail.com>
2021-11-15 13:41:12 -08:00
Sheil Kumar
3d0bd2596f
Enable creating OrtValues from ID3D12Resources from the onnxruntime C-API (#9686)
* Add onnxruntime-windows api.

* minor fixes

* add to package headers

* Build ort_dml_api for provider extensions.

* Cleanup

* misc comment

* remove winml specific comments

* use dml check in onnxruntime

* Update include/onnxruntime/core/providers/dml/dml_provider_factory.h

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* Update include/onnxruntime/core/session/onnxruntime_c_api.h

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* Update include/onnxruntime/core/providers/dml/dml_provider_factory.h

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* Update include/onnxruntime/core/providers/dml/dml_provider_factory.h

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* Update onnxruntime/core/session/onnxruntime_c_api.cc

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* Update onnxruntime/core/session/ort_apis.h

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* Update winml/test/adapter/AdapterSessionTest.cpp

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* Update onnxruntime/core/session/onnxruntime_c_api.cc

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* Update winml/adapter/winml_adapter_c_api.cpp

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* Update include/onnxruntime/core/session/onnxruntime_c_api.h

Co-authored-by: Pranav Sharma <prs@microsoft.com>

* Update onnxruntime/core/session/onnxruntime_c_api.cc

Co-authored-by: Pranav Sharma <prs@microsoft.com>

* Update winml/adapter/winml_adapter_c_api.cpp

* PR feedback

* Update include/onnxruntime/core/providers/dml/dml_provider_factory.h

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* Update include/onnxruntime/core/providers/dml/dml_provider_factory.h

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* Update include/onnxruntime/core/providers/dml/dml_provider_factory.h

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* PR feedback

* merge resolution and unreference param

* (naming) Remove Dml prefix

* maybe unused version

* move DML code into DML path. CIs failing because DML is not available when --use_dml is not on

* fix warning causing local build failures after merging

* Change getvaluememoryinfo to gettensormemoryinfo

* minor breaks

* fix comment paste

* fix comment

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>
Co-authored-by: Pranav Sharma <prs@microsoft.com>
2021-11-13 03:34:54 -08:00
RandySheriffH
21eb747a0f
Custom thread creation and join hooks (#9426)
2021-11-12 19:10:31 -08:00
Guoyu Wang
5ad6dbb314
Remove experimental from ORT format namespace (#9729)
* schema change

* cc changes

* remove temp debug code

* Adding fbs namespace to session_state_flatbuffers_utils.h

* Add fbs namespace to all ort format utils
2021-11-11 19:46:30 -08:00
Gary Miguel
93e239747f
Construct valid graphs for ONNX checker for IR version < 4. (#9665)
* Construct valid graphs for ONNX checker for IR version < 4.

Previously the constructed graph was not guaranteed to have its
initializers be a subset of its inputs, which is required for IR
version < 4. This resulted in spurious failures.

Fixes #9663
2021-11-12 09:13:28 +10:00
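The commit above (#9665) relies on the rule that, for IR version < 4, a graph's initializers must be a subset of its inputs. A small sketch of that single check, using plain lists of names rather than real ONNX protos, could look like this. The function names are hypothetical, and the sketch covers only this one rule, not the full ONNX checker.

```python
def missing_graph_inputs(input_names, initializer_names):
    """Return initializer names that are not listed as graph inputs, sorted."""
    return sorted(set(initializer_names) - set(input_names))


def initializers_ok_for_ir_version(ir_version, input_names, initializer_names):
    """Apply the subset rule only where the commit says it is required (IR < 4)."""
    if ir_version >= 4:
        return True
    return not missing_graph_inputs(input_names, initializer_names)


# Hypothetical names: "W" is an initializer that is also declared as an input,
# "B" is an initializer that is missing from the inputs.
inputs = ["X", "W"]
initializers = ["W", "B"]

print(missing_graph_inputs(inputs, initializers))
print(initializers_ok_for_ir_version(3, inputs, initializers))
print(initializers_ok_for_ir_version(4, inputs, initializers))
```

A graph constructed for an older IR version can satisfy the rule by adding each missing initializer name to its input list before it is handed to the checker.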
Guoyu Wang
a70ae24475
Add QDQ::Selector::Select to use const GraphViewer instead of mutable Graph (#9621)
* Move qdq selector to use const GraphViewer

* minor update

* Move qdq logic from NodeSelector to QDQ Selectors

* Fix build break

* Move selector result to NodesToOptimizeIndexes

* fix build break

* address CR comments

* move indexes -> indices

* Pass graph_viewer to avoid recreating many times

* Update after merge master

* update graph viewer remarks

* update comments

* Add ut for new qdq selector logic

* Increase minimal binary size limit

* UT minor update

* Address CR comments
2021-11-08 21:36:29 -08:00
Hariharan Seshadri
65590b049c
Expose an API to query the CUDA compute stream to launch a custom kernel (#9141) 2021-11-08 21:10:34 -08:00
Ryan Hill
24e35fba32
Change TensorShape to typically not allocate heap memory (#9542) 2021-11-08 10:29:54 -08:00
Hariharan Seshadri
bbeceb7541
Support optional type in ORT (#8339) 2021-11-04 15:01:42 -07:00
Edward Chen
ddb4c05852
Save graph runtime optimizations for minimal build (#9508)
Add support for saving graph runtime optimizations in an ORT format model. The idea is to allow some optimizations to be "replayed" at runtime in a minimal build. The replaying part will be in a future change.
2021-11-04 10:49:46 -07:00
Edward Chen
c315d1b3cd
Always enable ORT format model loading. (#9586) 2021-11-01 10:00:08 +10:00
TomWildenhain-Microsoft
e8268c9a18
Add Transpose Optimizer and modify nhwc optimizer to use it. (#9284)
* Add Transpose Optimizer and modify nhwc optimizer to use it.

* Fix casts

* Fix casts2

* Fix move

* Add tests

* Add headers

* Fixes and tests

* Remove explicit template instantiation

* Fix build warning

* Name unit tests

* Code review fixes

* Add some comments

* Fix some casts

* Make optimization slightly less aggressive

* Some unit test fixes

* Update Attention pattern to work with transpose optimizer

* Update attention fuser

* Fix attention fusion python script

* Improve transpose optimizer documentation

* Create OptimizerCtx struct

* Disable Slice handler for testing

* Implement Slice int32

* Only push transposes leading up to other transposes

* Improve optimization heuristic

* Add exemption for MaxPool

* Document transpose optimizer api.h

* Revert fusion tests to master

* Remove temp files

* Replace typedef with using

* Trim trailing whitespace

* Move class declarations from api_impl.h to api_impl.cc

* Remove copy constructors and move allocator

* Alphabetize headers

* Add override keyword

* Comments for nhwc_transformer

* Rename OrtGraph to ApiGraph, etc.

* Wrap line

* Remove extra qualifier on ApiGraph

* Refactor attention fusion

* Remove c-style casts from api_impl.cc

* Improve documentation

* Avoid printing vector in ORT_ENSURES

* Revert attention fusion refactor

* Remove duplicate cost heuristics and improve documentation

* Fix size_t casts

* Fixes from Scott's review

* Unrevert attention refactor and more updates from Scott's review

* Revert api_impl.cc ValueInfo change

* only optimize first transpose input

* Unrevert api_impl.cc changes

* Make vector call reserve

* transpose_optimizer.cc update from Scott's comments

* Rename api::Graph to api::GraphRef etc.

* Consider domains 'onnx.ai' and '' equal

* Replace AddInput with SetInput

* Improve tests

* quantization and heuristic tests

* Comments for tests

* Replace const string_view with string_view and update tests

* Fixes requested by Edward

* Fix std::string to string_view conversion

* Add <string> to includes

* Fix bug for broadcasting ops with unknown rank. Slight safety improvements

* Changes requested by Edward

* Fix formatting

* Improve description of cost metric
2021-10-27 22:10:39 -07:00
Ginés Hidalgo
9639eded4b
Missing #pragma once in dml_provider_factory.h (#9457) 2021-10-27 02:49:52 -07:00
Ginés Hidalgo
a79d375d24 Added fixes for Clang on Win64 2021-10-22 16:59:09 -07:00
Changming Sun
d83adaaf9f
Remove optional-lite (#9424) 2021-10-22 16:45:45 -07:00
Sherlock
ff23b9ff55
Avoid cudaStreamSync at the end of Forward/Backward (#9470)
* Skip cudaStreamSynchronize at the end of fw 

* skip sync stream for end of backward
2021-10-21 11:28:25 -07:00
Changming Sun
406f1629c1
Remove Featurizers code (#9300) 2021-10-20 10:20:35 -07:00
Jeff Daily
c8789d3047
[ROCm] static re-hipify of CUDA EP to ROCm EP, now a shared provider (#8877)
* re-hipify all rocm EP sources

* fix all other files affected by re-hipify

* add cuda_provider_factory.h to amd_hipify.py

* do not use cudnn_conv_algo_search in ROCm EP, missing reduce min registration

* Fix ReduceConsts template specialization introduced in #9101.

Fixes the error when building for ROCm 4.3.1:

error: too many template headers for onnxruntime::rocm::ReduceConsts<__half>::One (should be 0)

* fix flake8 error in amd_hipify.py

* speed up hipify with concurrent.futures

* flake8 fix in amd_hipify.py
2021-10-14 15:15:51 -07:00
Edward Chen
79e736ed25
Make onnxruntime::Status nodiscard (#9279)
Mark onnxruntime::Status class with [[nodiscard]] attribute.
Fix existing warnings.
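
A minimal sketch of what the attribute buys, assuming a C++17 compiler. `SketchStatus` and `CheckPositive` are hypothetical stand-ins for illustration, not the actual onnxruntime::Status class:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical status type (an assumption, not the real onnxruntime::Status).
// With [[nodiscard]] on the class, ignoring a returned value triggers a
// compiler warning at the call site.
class [[nodiscard]] SketchStatus {
 public:
  SketchStatus() = default;
  explicit SketchStatus(std::string msg) : ok_(false), msg_(std::move(msg)) {}
  bool IsOK() const { return ok_; }
  const std::string& Message() const { return msg_; }

 private:
  bool ok_ = true;
  std::string msg_;
};

// Hypothetical function returning a status.
inline SketchStatus CheckPositive(int v) {
  if (v > 0) return SketchStatus();
  return SketchStatus("value must be positive");
}

inline void NodiscardExample() {
  // CheckPositive(0);            // would warn: return value discarded
  SketchStatus s = CheckPositive(0);  // the result has to be consumed
  assert(!s.IsOK());
}
```

The "existing warnings" fixed by such a change are typically call sites like the commented-out line, where a returned status was silently dropped.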
2021-10-08 17:10:31 -07:00
Guoyu Wang
60bbdf1403
Remove unused NodeArgs in Graph::Resolve (#9213)
* Remove unused NodeArgs

* Handle case where a node arg from an initializer from initializer_names_to_preserve

* Fix CI failure

* update test

* Fix outer scope node args failure

* Use NodeArg* as the key of the std::set instead of string

* Minor updates
2021-10-01 11:44:26 -07:00
RandySheriffH
058108bef9
Execution Provider Profiler (#8406)
* implement cuda provider

* define profiler common

* call start after register

* add memcpy event

* add cuda correlation

* format code

* add cupti to test path

* switch to CUpti_ActivityKernel3

* reset cupti path

* fix test case

* fix trt pipeline

* add namespace

* format code

* exclude training from testing

* remove mutex
2021-09-28 13:59:52 -07:00
Hariharan Seshadri
f7dedc9002
Fix default initialization value in C API header (#9126)
* fix default initialization value in C API header

* Fix conflicts

* Nits
2021-09-20 20:58:13 -07:00
Ryan Hill
6ae5f7a244
C API Docs - Add build instructions (#9106)
* Update Doxyfile, add build instructions to header
* Update paths in README.md
2021-09-17 18:40:27 -07:00
Ryan Hill
b876e5675b
C API Enum Name Fixes (#9092) 2021-09-17 15:11:26 -07:00
Ryan Hill
280e79463a
Fill in more documentation (#9088)
Fix plural values with %s
Fix more symbol links
Add custom header for web metrics
2021-09-16 17:08:27 -07:00
Ryan Hill
26509465f0
Add default C++ initialization to OrtCUDAProviderOptions (#9064)
* Add default C++ initialization to OrtCUDAProviderOptions
2021-09-16 15:03:58 -07:00
Guoyu Wang
bee5c26580
Add CPU_ONLY runtime option to NNAPI EP (#9066)
* Add NNAPI cpu only option

* update java

* Update comments
2021-09-15 15:50:18 -07:00
Edward Chen
e574be4a53
[C API Docs] Add docs for run options tag/log level accessors/modifiers. (#9045)
Add documentation for these C API functions:
RunOptionsGetRunLogSeverityLevel
RunOptionsGetRunLogVerbosityLevel
RunOptionsGetRunTag
RunOptionsSetRunLogSeverityLevel
RunOptionsSetRunLogVerbosityLevel
RunOptionsSetRunTag

Update some existing documentation.
2021-09-14 08:53:35 -07:00
satyajandhyala
ce7b12bf5d
Added new fp16 allow/safe opcodes in PropagateCastOps (#8964)
* Removed RemoveInputOutputUpDownCasts strategy in PropagateCastOps.

* Added Expand, Squeeze and Unsqueeze ops to fp16 allow ops

* Added onnx models for squeeze/unsqueeze tests.
2021-09-10 11:53:26 -07:00
Ryan Hill
2439ced3ec
API Documentation (#8948)
* Make help information compile properly
2021-09-09 22:04:51 -07:00
Ashwini Khade
ec63d10303
add model local function support (#8540)
* updates for picking pnnx commit

* add tests filter to c# tests

* plus test fixes

* fix versioning for contrib ops

* fix tests

* test filter for optional ops

* more versioning related updates

* fix test

* fix layernorm spec

* more updates

* update docs

* add more test filters

* more filters

* update binary size threshold

* update docs

* draft - enable model local function

* enable model local functions in ORT

* update to latest rel onnx commit

* plus tests

* plus more updates

* plus updates

* test updates

* Fix for nested functions + shape inference

* plus bug fix and updates per review

* plus fixes per review

* plus test updates

* plus updates per review

* plus fixes

* fix a test
2021-09-08 11:47:01 -07:00
Vincent Wang
c343f7cb43
Add Algorithm Search for ConvGrad (#8613)
* algo search for conv grad

* global cache, bigger workspace size

* fix build error

* refactor

* refactor

* resolve comments

* fix rocm

* change lock places

* rename variable

* remove setting for inference

* resolve comments
2021-09-03 11:25:17 +08:00
Hariharan Seshadri
acd9db7fad
Fix location planning for initializers used only in nested subgraphs (#8642) 2021-09-01 00:02:08 -07:00
Tang, Cheng
4dc0ddf606
support register external ep lib information (#8897)
* support register external ep lib information; make eager mode share the same ep pools with training workloads

* fix inference code

* fix build break

* fix the message
2021-08-31 20:51:22 -07:00
Tang, Cheng
ae7f2d824d
Share the execution provider instance for training (#8719)
* separate the training python module; share the execution provider instance

* fix build break

* fix cuda test crash; reorg the python module code base

* use correct env

* use provider customized hash func

* fix build break

* fix rocm break

* use const ref in argument

* rename the file

* move hash func to training module
2021-08-27 16:23:35 -07:00
Scott McKay
0034ad72e6
Minimize changes to fix missing symbols used from C# (#8867)
* Revert "Cleanup C# bindings to add EP (#8810)"

This reverts commit b21ea00020.

* Add back in a minimal set of changes.
Provide stubs in for a limited set of things
  - things called from C# using a static lib of ORT built for mac/ios
  - things in OrtApis that are not included in the build by default
  - things in OrtApis that are excluded in a minimal build

* Cleanup order of EPs in test

* Fix unused function in ROCM build
2021-08-28 07:10:14 +10:00
Edward Chen
7e53a1df6f
Enable selector action transformer infrastructure in minimal build. (#8804) 2021-08-27 17:16:05 +10:00
Rachel Guo
1886f1a737
Make SparseTensor infrastructure optional (#8802)
Add cmake parameter and #ifdefs to allow for disabling sparse tensor support. This comes with a significant binary size cost so we want to be able to exclude it in a minimal build.
2021-08-27 17:12:26 +10:00
Scott McKay
b21ea00020
Cleanup C# bindings to add EP (#8810)
Fix C# add EP bindings.
Add stubs to ORT so that if EP is not included in the build we return a graceful error message.
Move declaration of stubs into C API and out of the EP so they're in one place and are easier to use (no extra header required in the C/C++ world and consistent with the CUDA EP setup).
Fix inconsistency in ROCM EP.
Cleanup a few other things.
2021-08-26 13:59:40 +10:00
Hariharan Seshadri
cee79526fd
Add opset 15 kernels for Pow, BatchNorm, and Shape (#8442) 2021-08-25 12:04:20 -07:00
Changming Sun
4bfff45859
Downgrade Eigen (#8817) 2021-08-23 18:06:23 -07:00
Dmitri Smirnov
8713d76dd1
Introduce C and C++ APIs for Sparse Tensors (#8621)
Add IsSparseTensor
  Add CreateSparseTensor
  Add utilities and test fully sparse instantiation
  Fully sparse blocksparse
  Add test and docs for fully sparse tensor instantiation
  Rework creation API
  Use API
  Non string API
  Retrofit of existing String API
  Add tests
  Add documentation
  Address build issues (Winml pending)
  Add inference test
  Bump binary size
  Add ifdef DISABLE CONTRIB
2021-08-16 16:33:47 -07:00
Changming Sun
436ac6dd5f
Rename ml_value.h to ort_value.h (#8726) 2021-08-13 07:04:56 -07:00
Dmitri Smirnov
1a8adb96fe
Reduce templatization of C API and refactor for InitOrtValue (#8700)
Refactor for OrtInit
  Simplify C API
  Add ort_provider bridge interfaces
2021-08-12 16:51:18 -07:00
Edward Chen
89601ee6b3
[EP Partitioning Utils] Add check for assigned node. (#8473)
Adds a check that a node is not already assigned to an EP before adding it to an EP partition.
2021-08-12 16:08:25 -07:00
Hariharan Seshadri
e791faeca5
Fix bug in CPU force fallback logic (#8597) 2021-08-05 21:36:28 -07:00
Tim Harris
56441dcd88
Limit work items to available threads, upgrade checks from assert to ORT_ENFORCE (#8495) 2021-07-27 19:25:12 -07:00
Guoyu Wang
4c939e1cb7
Add an option to use the input model bytes (ORT format only) directly without copy at session creation (#8502)
* Do not copy the model_data when session is started by CreateSessionFromArray

* Add config option for disabling copy model bytes

* Add one additional test

* Address CR comments
2021-07-27 09:11:42 -07:00
Vincent Wang
619a8782a5
Improve AddValueInfo (#8451)
* change AddValueInfo

* fix after merge master
2021-07-23 16:39:55 +08:00
Dmitri Smirnov
950fe5e28b
Implement SparseTensor and infrastructure support and advance ONNX commit (#8038)
SparseTensor support
  Implement Builder pattern
  Fix support for 1-D and 2-D COO indices
  Implement and test CSR support.
  Handle shape inference for SparseTensors
  Implement conversion for COO, CSR and tests.
  Address the case where constant sparse initializer is the output.
  Implement test infra for SparseTensors
  Implement SparseDenseMatMul for Csr and COO and tested it.
  Add hash for SparseToDenseMatMul
  Finish shared provider refactor
  Refactor GetOrCreate to Create
  Working on py interface
  Expose OrtDevice and use it in allocate_numpy
  Adjust Sparse interfaces, add support for string SparseTensor. Add tests.
  Add and test to_cuda()
  Add accessors to format specific indices
  Test values and indices views, read-only flag, after GC access
  Add sparse related methods to OrtValue
  Re-work SparseTensor wrapper, add OrtValue methods
  Rework numpy_array_to_cuda/to_cpu
  Add run_with_ort_values
  Add models and test sparse_mat_mul with run_with_ort_values
  Refactor sparse tensor to use a single buffer
  Ifdef x86 Eigen CSR sparse matmul implementation
  Exclude broken test, check for string type when copying cross device
  Split pybind schema, regenerate docs, add exclusion
  Conditionally exclude schema module
  Update docs fix cuda build
  Add test to a filter and regenerate JS docs
  Add conversion and test string support for sparse tensors
  Exclude conversion utils from minimal build
  Add CUDA Memcpy and adjust provider interfaces
2021-07-22 15:24:36 -07:00
Hariharan Seshadri
3360024a0b
Support plugging in custom user-defined allocators for sharing between sessions (#8059) 2021-07-22 10:17:35 -07:00
Edward Chen
989491c333
[NNAPI EP] Make partitioning stop ops configurable. (#8444)
Enable NNAPI EP partitioning stop ops to be overridden by a session configuration option.
2021-07-22 09:21:42 -07:00
Edward Chen
695536a7ac
Make some common macros safer to use. (#8445) 2021-07-21 12:14:36 -07:00
Ryan Hill
cc9f793b48
Move one function from cuda_provider_factory.h (#8407) 2021-07-19 17:55:59 -07:00
satyajandhyala
84bc20fe9d
Enable cast propagation with level one by default. (#8286) 2021-07-08 14:38:09 -07:00
RandySheriffH
f40df30219
Replace functions with secured version for OSX compliance (#7586)
* replace strlen with strnlen

* replace vsnprintf with vsnprintf_l

* add macro

* switch to std numeric::limits

* apply uint16 max

* fix build err

* fix mac build

* define MAX_STR_LEN

* define MAX_STR_LEN

* fix typo

* trim empty lines

* apply constexpr

* fix typo

* add namespace

* fix build err

* rename global constant

Co-authored-by: Randy <Randy@randysmac.attlocal.net>
Co-authored-by: Randy Shuai <rashuai@microsoft.com>
Co-authored-by: Randy <Randy@randysmac.local>
2021-07-08 11:02:36 -07:00
Zuwei Zhao
b46310b349
Integrate onnxruntime-extensions into onnxruntime. (#8143)
Co-authored-by: Zuwei Zhao <zuzhao@microsoft.com>
2021-07-01 09:34:03 -07:00
Scott McKay
4993680e56
Graph::GetNodeProvidesGraphOutput -> NodeProducesGraphOutput (#8243)
'GetNode' is a little confusing as it returns a bool.

Update a couple more places where GetNodeOutputsInGraphOutputs was being used unnecessarily.
2021-06-30 20:43:33 +10:00
Scott McKay
b3479367cf
Add helper to check if node provides a graph output. (#8186)
* Add helper to check if node provides a graph output. The current approach unnecessarily creates a vector when most of the optimizers only care about a true/false response.

* Undo accidental change

* Fix a couple of issues due to copying from larger set of changes.
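
A rough sketch of the idea, assuming simplified types. `SketchNode` and `ProducesGraphOutput` are hypothetical names for illustration and not the actual Graph API; the point is only that a boolean query can stop at the first match without building a vector:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified node: just the names of its outputs.
struct SketchNode {
  std::vector<std::string> outputs;
};

// Returns true as soon as any node output is found among the graph outputs.
// No intermediate container of matching outputs is allocated.
inline bool ProducesGraphOutput(const SketchNode& node,
                                const std::vector<std::string>& graph_outputs) {
  return std::any_of(node.outputs.begin(), node.outputs.end(),
                     [&](const std::string& name) {
                       return std::find(graph_outputs.begin(), graph_outputs.end(),
                                        name) != graph_outputs.end();
                     });
}

inline void ProducesGraphOutputExample() {
  SketchNode node;
  node.outputs = {"x", "y"};
  std::vector<std::string> graph_outputs = {"y"};
  assert(ProducesGraphOutput(node, graph_outputs));
}
```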
2021-06-30 12:15:42 +10:00
Changming Sun
c716b56f26
Update C++ Standard from 14 to 17 (#8041)
Switched the code to C++17. To build ONNX Runtime on old distros like CentOS 7, you need to install a newer GCC from additional repos. If you build onnxruntime with the newer GCC, the resulting binary typically can't be distributed elsewhere, because it depends on the new GCC's runtime libraries, which the stock OS doesn't have. On RHEL/CentOS the situation is better: we build our code on CentOS 7 with Red Hat devtoolset 8/9/10. New library features (like std::filesystem) that don't exist in the old C++ runtime are statically linked into the application, with some restrictions:

1. GCC has a dual ABI, but we can only use the old one. This means std::string is still copy-on-write and std::list::size() is still O(n). Also, if you build onnxruntime on CentOS 7 and link it with binaries that were built on CentOS 8 or Ubuntu with the new ABI and that export C++ symbols directly (instead of using a C API), it won't work.

2. We still can't use std::optional. This limitation comes from macOS. We will resolve it once we have macOS 11 build machines, which shouldn't take long.

3. Please avoid using C++17 in CUDA files (*.cu), and in the *.h files they include (like core/framework/float16.h), because CUDA 10.2 doesn't support C++17. You are welcome to use the new features in any *.cc file.
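
To see which ABI a given toolchain selects (restriction 1), one option is to inspect libstdc++'s `_GLIBCXX_USE_CXX11_ABI` macro. This is a small sketch, not part of the change itself; `LibstdcxxAbi` is a hypothetical helper name:

```cpp
#include <cassert>
#include <string>

// Reports which libstdc++ ABI this translation unit was compiled against.
// _GLIBCXX_USE_CXX11_ABI is defined by libstdc++ once any of its headers is
// included; on other standard libraries the macro is absent.
inline std::string LibstdcxxAbi() {
#if defined(_GLIBCXX_USE_CXX11_ABI)
  return _GLIBCXX_USE_CXX11_ABI ? "new (cxx11) ABI" : "old (pre-cxx11) ABI";
#else
  return "not libstdc++";
#endif
}

inline void LibstdcxxAbiExample() {
  // Always one of the three strings above, so never empty.
  assert(!LibstdcxxAbi().empty());
}
```

Binaries that exchange std::string or std::list across a library boundary need to agree on this setting, which is why exporting C++ symbols between old-ABI and new-ABI builds fails to link or misbehaves.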
2021-06-25 14:08:01 -07:00
Chi Lo
91075255a7
Enable TRT provider option configuration for C# (updated version) (#7808)
* prepare for C# to configure provider options

* add c# code

* revert modification

* Add update provider info configuration in trt ep side

* fix bugs

* fix bug for compiler error C2259

* Add c# test

* fix bug

* fix bug

* Properly deal with string

* Add c# api for accepting trt provider options

* fix bug

* Modify C# test

* add shared lib test

* Add get provider options functionality

* clean up

* clean up

* fix bug

* fix bugs for CI

* Fix bugs for CI and documentation

* Move TRT EP provider options related functions out of C API

* revert

* fix bug

* refactor

* add check for provider options string

* code refactor

* fix CI bug

* Fix CI bugs

* clean up

* fix bug

* Fix bug for Post Analysis

* fix accidental bug

* Add API_IMPL_BEGIN/API_IMPL_END

* clean up

* code refactor

* code refactor

* fix CI fail

* fix bug

* use string append

* Change the code to better handle strncpy and string append
2021-06-25 03:21:22 -07:00
Negin Raoof
80b7b134bf
Adding optional ops in contrib ops (#7946)
* Added optional const spec
2021-06-24 13:16:31 -07:00
Guoyu Wang
f6292d9b38
[Android] Output error message to android log instead of stderr (#8114)
* Output error message to android log instead of stderr

* Address CR comments, move macro to a helper function

* Address CR comments

* Fix ort minimal build break
2021-06-22 17:50:06 -07:00
Tang, Cheng
059d705988
support pass in custom op registry for eager mode (#8087)
* support pass in custom op registry for eager mode

* fix the comments
2021-06-20 13:38:09 -07:00
Nat Kershaw (MSFT)
0237225117
Add @file annotation to support doxygen generation of C API docs (#7458) 2021-06-10 16:10:32 -07:00
Edward Chen
ab973dce33
[Objective-C API] Enable CoreML EP (#7914)
Enable CoreML EP in Objective-C API.
2021-06-03 18:59:10 -07:00
Jorn Tuyls
3bb780dcd5
Update Vitis AI EP to support multiple DPU targets through provider options (#6690)
* Update Vitis-AI EP support multiple DPU targets & specifically arm64 dpuczdx8g target

* Fix Vitis AI docker and default PyXIR versions

Co-authored-by: Jorn Tuyls <jornt@xilinx.com>
Co-authored-by: Jorn Tuyls <jornt.tuyls@gmail.com>
2021-06-03 19:53:46 +10:00
RandySheriffH
451fcb7df1
Add sequence support for identity on GPU (#7810)
* Add sequence support for identity on GPU

* implement TensorSeq in provider interface

* fix definition err

* Add new interface to TensorSeq

* fix comments

* fix comments

* fix mac warning

* move TensorSeq forward declaration

* add TensorSeq header

* remove declaration

* fix minor format

* fix minor format

* define TensorSeq as struct

Co-authored-by: RandySheriffH <rashuai@microsoft.com>
2021-05-28 18:00:06 -07:00
Ryan Hill
5a63904aa9
Remove some templated versions of functions that are no longer needed (#7868)
* Switch to non template version of function
2021-05-28 13:22:45 -07:00
Edward Chen
fa093d8e45
[Objective-C API] Add ORTSession methods to get input, overridable initializer, and output names. (#7837) 2021-05-26 19:54:55 -07:00
Guoyu Wang
ae14cedd63
Fix c_api warning (#7803) 2021-05-22 01:23:39 -07:00
Edward Chen
b5c5e8c1ca
Update C++ API comment to resolve warning. (#7776) 2021-05-21 13:12:13 -07:00
Ryan Hill
c99aa3a3f3
Ryanunderhill/cuda shared (#7626)
* First iteration of making cuda a shared provider.
Separated out shared OpKernel change, so doing this to merge with that change.

* More cuda shared library refactoring

* More cuda shared library refactoring

* More build options tested, converted the training ops over.

* Fix merge breaks

* Fix submodules

* Fix submodules

* Fix submodules

* Fix python

* Fix compile errors

* Duplicate symbol fix

* Test fix for ROCM provider

* Another ROCM test workaround

* ROCM Build Test

* ROCM build fix

* ROCM

* ROCM

* ROCM

* ROCM

* ROCM

* ROCM test

* Reduce header dependencies

* Remove redundant namespace

* Test fix for linux

* Fix linux build

* Fix Eigen build error

* Fix unused parameter warning

* Test link error

* Another linker test

* Linker test

* Linker test

* Another test

* Another build test

* Fix linux link error

* Build test

* Fix control flow ops to use common base class with core code

* Remove extra qualifiers

* Fix template syntax for linux

* Fix cuda memory leak

* Fix pybind

* Test disabling cast

* Cleanup

* Restore cuda in test

* Remove more header dependencies

* Test not adding cuda provider to session

* Make GetProviderInfo_CUDA throw

* No-op cuda provider creation

* Fix some setup issues

* Fix memory cleanup on unload

* Diagnostics

* Don't unload library

* Add diagnostics

* Fix deleting registry at right time.

* Test disabling profiler

* Fix merge break

* Revert profiler change

* Move unloading of shared providers into Environment

* Free more global allocations before library unloads

* Add more diagnostics

* Move unloading back to the OrtEnv as there are multiple Environments created during a session.

Remove some library dependencies for tests.

* Fix more cmake files

* ERROR -> WARNING

* Fix python shutdown

* Test not using dml in pipeline

* Change python version and disable dml

* Update python version

* Test adding unload method for shared providers

* Disable DLL test

* Python test

* Revert "Python test"

This reverts commit c7ec2cfe98.

* Revert "Disable DLL test"

This reverts commit e901cb93aa.

* Revert "Test adding unload method for shared providers"

This reverts commit c427b78799.

* Point to RyanWinGPU

* Revert python version

* Fix id_to_allocator_map

* Another python exit test

* Remove extra debug messages
Try a more clean python shutdown through DllMain

* Revert DllMain idea, it didn't work

* Merge conflicts

* Fix merge with master issues.

* Comments

* Undo edit to file

* Cleanup + new training ops

* Revert yml changes

* Fix another merge error

* ROCM fix

* ROCM fix v2

* Put back Linux hack, it is necessary

* Stupid fixes

* Fix submodule out of sync

* ROCM fix 3

* ROCM 4

* Test java fix

* Fix typos

* Java test on my VM

* Fix build error

* Spotless fix

* Leave temp file around to load properly

* Fix cleanup on exit

* Fix break

* Java comments

* Remove LongformerAttentionBase workaround

* Spotless fix

* Switch yml back to regular build pool

* Revert "Switch yml back to regular build pool"

This reverts commit be35fc2a5a.

* Code review feedback

* Fix errors due to merge

* Spotless fix

* Fix minimal build

* Java fix for non cuda case

* Java fix for CPU build

* Fix Nuphar?

* Fix nuphar 2

* Fix formatting

* Revert "Remove LongformerAttentionBase workaround"

This reverts commit 648679b370.

* Training fix

* Another java fix

* Formatting

* Formatting

* For orttraining

* Last orttraining build fix...

* training fixes

* Fix test provider error

* Missing pass command

* Removed in wrong spot

* Python typo

* Python typos

* Python crash on exit, possibly due to unloading of libraries.

* Remove test_execution_provider from training build
Only enable python atexit on windows
Remove assert on provider library exit

* Still can't unload providers in python, alas.

* Disable Nvtx temporarily

* MPI Kernels for Training

* MPI Kernels part 2

* Patch through INcclService

* Oops, wrong CMakeLists

* Missing namespace

* Fix missing ()

* Move INcclService::GetInstance around to link nicer

* Missing }

* Missing MPI libraries for Cuda

* Add extra GetType functions used by MPI

* Missing Nccl library

* Remove LOGS statements as a test

* Add in a couple more missing GetType methods

* Update comments

* Missed a logging reference in mpi_context.h

* Convert aten_op to shared (due to merge with master)

* Test moving DistributedRunContext instance into shared provider layer
(with purpose error to verify it's being built properly)

* Test passed, now with fix

* Missing static

* Oops, scope DistributedRunContext to just NCCL

* Merge related issues and code review feedback.

* Merge error

* Bump to rel-1.9.1 (#7684)

* Formatting

* Code review feedback for Java build on non Windows

* Remove cupti library dependency from core library

* Test Java pipeline fix

* Linux build fix

* Revert "Linux build fix"

This reverts commit a73a811516.

* Revert "Remove cupti library dependency from core library"

This reverts commit 6a889ee8bf.

* Packaging pipeline fixes to copy cuda shared provider for tensorrt & standard packages

* Add cuda to Tensorrt nuget package

* onnxruntime_common still has a cuda header dependency

Co-authored-by: ashbhandare <ash.bhandare@gmail.com>
2021-05-20 07:53:47 -07:00
Changming Sun
31e6d3f85c
Revert CUPTI profiling feature (#7763)
For an unknown reason, it causes deadlocks when used with CUDA 11.1.
2021-05-19 21:54:29 -07:00
alonre24
374acf1423
Disable external initializers build option (#7635)
* Merge set custom allocator to master

* Add documentation for the new API.
Reset global env in testCustomArenaAllocator so won't have a registered allocator of type arena (from previous test)

* Add a session option config that will allow to disable loading model with initializers that have an external data (+test it).

* Add the model used for the test and its external initializers data

* Change the session config option that disable external initializers to a build option.

* Addressing PR comments
2021-05-19 14:16:36 -07:00
Hariharan Seshadri
43e2ee37f2
Some cosmetic changes (#7741) 2021-05-18 00:02:07 -07:00
stevenlix
557b94637d
Add more TensorRT env variables to provider options (#7698)
* add all trt env variables to provider options

* add python test

* Update onnxruntime_c_api.h

* fix issues

* validate values for options
2021-05-16 22:09:52 -07:00
Hariharan Seshadri
53d1d55ea8
Add ability for pre-packed weights of shared initializers to be shared across sessions (#7421) 2021-05-14 20:44:42 -07:00
Changming Sun
1d403ba03b
Fix a compile warning in EigenNonBlockingThreadPool.h (#7638) 2021-05-14 11:38:34 -07:00
Edward Chen
19704aedbb
Update Objective-C API (#7675)
- Add session/run configuration
- Add additional supported tensor data types
- Clean up
2021-05-13 18:47:22 -07:00
satyajandhyala
9f69b2f291
Added InsertAndReduce strategy to PropagateCastOps transformation in addition to FloodFill strategy (#7454)
* Moved GraphTransformerConfiguration to a separate file and added strategy option to PropagateCastOps transformation.

* Added testing of both FloodFill and InsertAndReduce strategies for cast propagation.

* Added AddConsumer and RemoveConsumer functions to graph.h for efficient graph editing.

* Added PropagateCastOps code documentation

* Added GraphTransformationConfiguration class hierarchy information

* Added RemoveInputOutputUpDownCasts
2021-05-10 20:46:28 -07:00
ankurverma85
de4089f8cb
GCC11/Libstdc++11 Compilation fixes (#7599)
Authored-by: Ankur Verma <ankurv@microsoft.com>
2021-05-10 12:50:08 -07:00
Hariharan Seshadri
4b691a5c0d
Add ability for memory arenas to "shrink" periodically (#7284) 2021-05-08 07:53:21 -07:00
Edward Chen
5a5fec0452
Fix logs getting skipped in single-line conditionals. (#7589)
Fix an issue where a log message got skipped.

A log call like this:
```
LOGS(...) << "message";
```
expands to something like this:
```
if (<output enabled>)
  logging::Capture(...).Stream() << "message";
```

This if statement without brackets is handy for logging arbitrary arguments with the `<<` operator. However, it has other drawbacks like possibly associating with a subsequent `else`.

```
if (cond)
  LOGS(...) << "a";
else
  <do something> // not run when !cond

// equivalently:
if (cond)
  if (<output enabled>)
    logging::Capture(...).Stream() << "a";
  else
    <do something> // not run when !cond
```

Updated the logging macros to handle this case by replacing `if (<enabled>) logging::Capture(...).Stream()` with `if (!<enabled>) {} else logging::Capture(...).Stream()`.

Thanks @tlh20 for the idea for the fix!
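
The difference between the two macro shapes can be reproduced in isolation. The sketch below uses hypothetical stand-ins (`g_enabled`, `g_sink`, `LOG_OLD`, `LOG_NEW`) rather than the real ORT logging macros, and silences the compiler's dangling-else warning because the old form is ambiguous on purpose:

```cpp
#include <cassert>
#include <sstream>
#include <string>

#if defined(__GNUC__)
// The old macro shape deliberately triggers this warning.
#pragma GCC diagnostic ignored "-Wdangling-else"
#endif

// Hypothetical stand-ins for the "output enabled" check and the log stream.
static bool g_enabled = true;
static std::ostringstream g_sink;

// Old shape: a bare `if` that can capture a following `else`.
#define LOG_OLD() if (g_enabled) g_sink
// New shape: the macro supplies its own `else`, so a following `else`
// binds to the caller's `if`.
#define LOG_NEW() if (!g_enabled) {} else g_sink

inline std::string RunOld(bool cond) {
  std::string fallback;
  if (cond)
    LOG_OLD() << "a";
  else
    fallback = "fallback ran";  // actually attached to the macro's `if`
  return fallback;
}

inline std::string RunNew(bool cond) {
  std::string fallback;
  if (cond)
    LOG_NEW() << "a";
  else
    fallback = "fallback ran";  // attached to `if (cond)` as intended
  return fallback;
}

inline void DanglingElseExample() {
  g_enabled = true;
  assert(RunOld(false).empty());            // else branch silently skipped
  assert(RunNew(false) == "fallback ran");  // else branch runs
}
```

With the old shape, the fallback also runs when `cond` is true and logging is disabled, which is the other half of the same mis-association.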
2021-05-07 15:40:47 -07:00
Pranav Sharma
bdb2ed7864
Revert "Add log to allow serving platforms to quantify ORT usage. (#7476)" (#7598)
This reverts commits da5c926, 4186233 and be2a304.
2021-05-06 16:21:32 -07:00
Pranav Prakash
053bada30f
Add support for setting shape inference function on fused nodes (#7007)
* Add support for setting shape inference function on fused nodes
* Add test for fused node shape inference
2021-05-05 13:32:07 +10:00
Tim Harris
2e09d9921a
"Sticky" allocation of worker threads (#7551)
[ PR previously merged as https://github.com//pull/7372, then reverted pending investigation of lost-wake-up issue seen with ParallelExecutor. Issue was a missing test for new work pushed to thread concurrent with a worker blocking. Change from 7372 is the addition of: https://github.com/microsoft/onnxruntime/blob/tiharr/dev-sticky-4/include/onnxruntime/core/platform/EigenNonBlockingThreadPool.h#L1473-L1492 ]

Description: This change updates the heuristics used when a thread selects which worker threads to push work to on entering a parallel loop. Previously, worker threads would maintain a best-effort bitmap of "good worker hints" indicating the threads that were likely to be spinning waiting for work. This change uses a simpler heuristic where a thread records which workers ran its previous loop, and then re-submits its next loop to those same workers. The aim is to retain affinity between a thread and a set of workers, and to avoid maintaining the "good worker hints" bitmaps.

Motivation and Context: Profiling suggested that maintaining the "good worker hints" was taking unexpected time, particularly on NUMA systems. In addition, when running many concurrent workloads, the hints did not provide a way to help retain locality of workers and hence data in caches. Testing to confirm no regressions on microbenchmark (./build/Linux/Release/onnxruntime_benchmark --benchmark_filter=BM_ThreadPoolParallelFor) and on Linux mobilenet_v1_1.0_224.onnx, comparing p50 and p99 with vs without this change:

1 concurrent:
p50 0.0172s vs 0.0181s
p99 0.0204s vs 0.0216s

2 concurrent:
p50 0.0172s vs 0.0181s
p99 0.0213s vs 0.0221s
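
As an illustration of the "sticky" heuristic described above, here is a simplified, single-threaded sketch. `StickyHint` and `PickWorkers` are hypothetical names and the real thread pool logic in EigenNonBlockingThreadPool.h is considerably more involved; this only shows the selection order (previous workers first, then any others):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-caller state: the workers that ran the previous loop.
struct StickyHint {
  std::vector<std::size_t> preferred_workers;
};

// Picks `wanted` distinct workers out of `num_workers`, preferring those
// recorded in the hint, and records the choice for the next loop.
inline std::vector<std::size_t> PickWorkers(StickyHint& hint,
                                            std::size_t num_workers,
                                            std::size_t wanted) {
  assert(wanted <= num_workers);
  std::vector<std::size_t> chosen;
  std::vector<bool> used(num_workers, false);
  // First, re-use the workers from the previous loop.
  for (std::size_t w : hint.preferred_workers) {
    if (chosen.size() == wanted) break;
    if (w < num_workers && !used[w]) {
      used[w] = true;
      chosen.push_back(w);
    }
  }
  // Then top up with any remaining workers, in index order.
  for (std::size_t w = 0; w < num_workers && chosen.size() < wanted; ++w) {
    if (!used[w]) {
      used[w] = true;
      chosen.push_back(w);
    }
  }
  hint.preferred_workers = chosen;
  return chosen;
}
```

Calling `PickWorkers` repeatedly with the same hint keeps returning the same workers as long as the requested count does not grow, which is the affinity the change aims for.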
2021-05-03 18:28:13 +01:00