Commit graph

228 commits

Author SHA1 Message Date
Dmitri Smirnov
3367ddc5ba
Add abseil cgmanifest declaration. Update coding standards. (#10374)
Add abseil cgmanifest declaration. Update coding standards for InlinedContainers
  Adjust coding guidelines. Add default N calculation for InlinedVector<T, N> for general use.
  Rename T from InlinedShapeVectorT. Fix Eager build
  Add LLVM Copyright with modified derived code notice.
2022-01-27 08:32:05 -08:00
Weixing Zhang
ea9c8a7cdc
support MIGraphXEP to work with ROCMEP for inference on AMD GPU (#10368)
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>

Support MIGraphXEP to work with ROCMEP for inference on AMD GPU
2022-01-26 15:52:56 -08:00
Dmitri Smirnov
7e092a7e3f
Reduce number of memory allocations based on a customer profiling case (#10193)
Add abseil and inlined containers typedefs
Introduce TensorShapeVector for shape building.
Use gsl::span<const T> to make interfaces accept different types of vector like args.
Introduce InlinedShapeVectorT for shape capacity typed instantiations
Refactor cuda slice along with provider shared interfaces
Refactor Concat, Conv, Pad
Build with Conv Einsum and ConvTranspose refactored.
Remove TensorShape::GetDimsAsVector()
Refactor SliceIterator and SliceIteratorBase
Refactor broadcast
Refactor Pads for twice as long
Remove memory planner intermediate shapes vector
Refactor orttraining
Fix passing TensorShapeVector to tests
Remove abseil copy and submodule, use FetchContent_Declare/Fetch
Patch with separate command
Make RocmAsyncBuffer accept anything convertible to span. Adjust Linux GPU pipeline.
2022-01-24 10:40:46 -08:00
Vincent Wang
44e2db9397
CUDA BFloat16 Refactor (#10085) 2022-01-14 19:38:56 +08:00
Changming Sun
4e9e01cb3c
Fix SDL warnings in CPU EP (#9975) 2021-12-19 20:54:29 -08:00
Edward Chen
3466ee45a3
Add hash value typedef. (#9710)
Add a typedef for the various hash value variables. Use of a typedef conveys some additional meaning.
2021-12-15 19:07:17 -08:00
Changming Sun
9d9ebd3b85
Fix some static analysis warnings in the core framework (#10033) 2021-12-14 14:41:42 -08:00
Dmitri Smirnov
a7f649db7c
Enable proper override using MIMalloc (#9944)
Redirect memory allocations to MiMalloc and advance its version to v2.0.3
Refactor for a universal ifdef
2021-12-07 17:56:58 -08:00
Ryan Lai
57a6f7c205
Various fixes to fix WindowsAI RI build. (#9877)
* WAI RI fixes

* span changes

* Spaces

* Additional warnings to fix

* Fix redundant comment
2021-11-29 21:33:15 -08:00
Edward Chen
bcc6ab29f6
Trim DataTypeImpl binary size (#9813)
* De-virtualize DataTypeImpl::AsXType() functions.
* Refactor helpers.
2021-11-22 12:06:24 -08:00
Scott McKay
dc1724b0e2
Reduce DataTypeImpl binary size (#9783)
* Reduce the number of virtual methods in DataTypeImpl to reduce binary size.
Refactor some helpers to reduce the amount of templatized code.
2021-11-19 10:29:12 +10:00
Dmitri Smirnov
6284cbe833
Add TensorShape noexcept for move ops and fix some warnings (#9802)
2021-11-18 15:27:24 -08:00
Hariharan Seshadri
e23892ddbe
Support disabling support for the optional type in ORT builds (#9745) 2021-11-17 19:13:28 -08:00
sfatimar
1d03baa8cc
Openvino ep 2021.4 v3.3 (#9588)
* Added checks for Hetero/Multi

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Remote Context Plugin

* changes for IO Buffer plugin

* erroneous couts added

* erroneous entry rectified

* Set the Openvino OP Buffer also as output

* Enable AUTO plugin in OpenVINO EP

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Remote Context Plugin

* changes for IO Buffer plugin

* erroneous couts added

* erroneous entry rectified

* Added checks for Hetero/Multi

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Set the Openvino OP Buffer also as output

* Enable AUTO plugin in OpenVINO EP

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Please commit error message and rectification of param.context

* Alignment fixed

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Changed the string to OpenVINO_GPU

* Changed OpenVINO to OpenVINO_CPU

* Onnxruntime updated API for memory location

* Removing Duplicate LOG Error

* Tensor.h removed DeviceType function. Updated comment

* API Comments updated

* Removing changes to Provider Info

* Erroneous commit

* Removing Extra logs

* Merge CMAKE

* Do not copy from a local location

* Duplicate Entry

* Remove extra line

Co-authored-by: MaajidKhan <n.maajidkhan@gmail.com>
2021-11-15 13:41:12 -08:00
Sheil Kumar
3d0bd2596f
Enable creating OrtValues from ID3D12Resources from the onnxruntime C-API (#9686)
* Add onnxruntime-windows api.

* minor fixes

* add to package headers

* Build ort_dml_api for provider extensions.

* Cleanup

* misc comment

* remove winml specific comments

* use dml check in onnxruntime

* Update include/onnxruntime/core/providers/dml/dml_provider_factory.h

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* Update include/onnxruntime/core/session/onnxruntime_c_api.h

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* Update include/onnxruntime/core/providers/dml/dml_provider_factory.h

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* Update include/onnxruntime/core/providers/dml/dml_provider_factory.h

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* Update onnxruntime/core/session/onnxruntime_c_api.cc

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* Update onnxruntime/core/session/ort_apis.h

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* Update winml/test/adapter/AdapterSessionTest.cpp

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* Update onnxruntime/core/session/onnxruntime_c_api.cc

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* Update winml/adapter/winml_adapter_c_api.cpp

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* Update include/onnxruntime/core/session/onnxruntime_c_api.h

Co-authored-by: Pranav Sharma <prs@microsoft.com>

* Update onnxruntime/core/session/onnxruntime_c_api.cc

Co-authored-by: Pranav Sharma <prs@microsoft.com>

* Update winml/adapter/winml_adapter_c_api.cpp

* PR feedback

* Update include/onnxruntime/core/providers/dml/dml_provider_factory.h

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* Update include/onnxruntime/core/providers/dml/dml_provider_factory.h

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* Update include/onnxruntime/core/providers/dml/dml_provider_factory.h

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* PR feedback

* merge resolution and unreference param

* (naming) Remove Dml prefix

* maybe unused version

* move DML code into DML path. CIs failing because DML is not available when --use_dml is not on

* fix warning causing local build failures after merging

* Change getvaluememoryinfo to gettensormemoryinfo

* minor breaks

* fix comment paste

* fix comment

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>
Co-authored-by: Pranav Sharma <prs@microsoft.com>
2021-11-13 03:34:54 -08:00
Hariharan Seshadri
65590b049c
Expose an API to query the CUDA compute stream to launch a custom kernel (#9141) 2021-11-08 21:10:34 -08:00
Ryan Hill
24e35fba32
Change TensorShape to typically not allocate heap memory (#9542) 2021-11-08 10:29:54 -08:00
Hariharan Seshadri
bbeceb7541
Support optional type in ORT (#8339) 2021-11-04 15:01:42 -07:00
Edward Chen
ddb4c05852
Save graph runtime optimizations for minimal build (#9508)
Add support for saving graph runtime optimizations in an ORT format model. The idea is to allow some optimizations to be "replayed" at runtime in a minimal build. The replaying part will be in a future change.
2021-11-04 10:49:46 -07:00
Sherlock
ff23b9ff55
Avoid cudaStreamSync at the end of Forward/Backward (#9470)
* Skip cudaStreamSynchronize at the end of fw 

* skip sync stream for end of backward
2021-10-21 11:28:25 -07:00
Changming Sun
406f1629c1
Remove Featurizers code (#9300) 2021-10-20 10:20:35 -07:00
RandySheriffH
058108bef9
Execution Provider Profiler (#8406)
* implement cuda provider

* define profiler common

* call start after register

* add memcpy event

* add cuda correlation

* format code

* add cupti to test path

* switch to CUpti_ActivityKernel3

* reset cupti path

* fix test case

* fix trt pipeline

* add namespace

* format code

* exclude training from testing

* remove mutex
2021-09-28 13:59:52 -07:00
Ryan Hill
b876e5675b
C API Enum Name Fixes (#9092) 2021-09-17 15:11:26 -07:00
Vincent Wang
c343f7cb43
Add Algorithm Search for ConvGrad (#8613)
* algo search for conv grad

* global cache, bigger workspace size

* fix build error

* refactor

* refactor

* resolve comments

* fix rocm

* change lock places

* rename variable

* remove setting for inference

* resolve comments
2021-09-03 11:25:17 +08:00
Edward Chen
7e53a1df6f
Enable selector action transformer infrastructure in minimal build. (#8804) 2021-08-27 17:16:05 +10:00
Rachel Guo
1886f1a737
Make SparseTensor infrastructure optional (#8802)
Add cmake parameter and #ifdefs to allow for disabling sparse tensor support. This comes with a significant binary size cost so we want to be able to exclude it in a minimal build.
2021-08-27 17:12:26 +10:00
Hariharan Seshadri
cee79526fd
Add opset 15 kernels for Pow, BatchNorm, and Shape (#8442) 2021-08-25 12:04:20 -07:00
Dmitri Smirnov
8713d76dd1
Introduce C and C++ APIs for Sparse Tensors (#8621)
Add IsSparseTensor
  Add CreateSparseTensor
  Add utilities and test fully sparse instantiation
  Fully sparse blocksparse
  Add test and docs for fully sparse tensor instantiation
  Rework creation API
  Use API
  Non string API
  Retrofit of existing String API
  Add tests
  Add documentation
  Address build issues (Winml pending)
  Add inference test
  Bump binary size
  Add ifdef DISABLE CONTRIB
2021-08-16 16:33:47 -07:00
Changming Sun
436ac6dd5f
Rename ml_value.h to ort_value.h (#8726) 2021-08-13 07:04:56 -07:00
Dmitri Smirnov
1a8adb96fe
Reduce templatization of C API and refactor for InitOrtValue (#8700)
Refactor for OrtInit
  Simplify C API
  Add ort_provider bridge interfaces
2021-08-12 16:51:18 -07:00
Edward Chen
89601ee6b3
[EP Partitioning Utils] Add check for assigned node. (#8473)
Adds a check that a node is not already assigned to an EP before adding it to an EP partition.
2021-08-12 16:08:25 -07:00
Dmitri Smirnov
950fe5e28b
Implement SparseTensor and infrastructure support and advance ONNX commit (#8038)
SparseTensor support
  Implement Builder pattern
  Fix support for 1-D and 2-D COO indices
  Implement and test CSR support.
  Handle shape inference for SparseTensors
  Implement conversion for COO, CSR and tests.
  Address the case where constant sparse initializer is the output.
  Implement test infra for SparseTensors
  Implement SparseDenseMatMul for Csr and COO and tested it.
  Add hash for SparseToDenseMatMul
  Finish shared provider refactor
  Refactor GetOrCreate to Create
  Working on py interface
  Expose OrtDevice and use it in allocate_numpy
  Adjust Sparse interfaces, add support for string SparseTensor. Add tests.
  Add and test to_cuda()
  Add accessors to format specific indices
  Test values and indices views, read-only flag, after GC access
  Add sparse related methods to OrtValue
  Re-work SparseTensor wrapper, add OrtValue methods
  Rework numpy_array_to_cuda/to_cpu
  Add run_with_ort_values
  Add models and test sparse_mat_mul with run_with_ort_values
  Refactor sparse tensor to use a single buffer
  Ifdef x86 Eigen CSR sparse matmul implementation
  Exclude broken test, check for string type when copying cross device
  Split pybind schema, regenerate docs, add exclusion
  Conditionally exclude schema module
  Update docs fix cuda build
  Add test to a filter and regenerate JS docs
  Add conversion and test string support for sparse tensors
  Exclude conversion utils from minimal build
  Add CUDA Memcpy and adjust provider interfaces
2021-07-22 15:24:36 -07:00
Hariharan Seshadri
3360024a0b
Support plugging in custom user-defined allocators for sharing between sessions (#8059) 2021-07-22 10:17:35 -07:00
Changming Sun
c716b56f26
Update C++ Standard from 14 to 17 (#8041)
Switched the code to C++17. To build ONNX Runtime on old distros like CentOS 7, you need to install a newer GCC from additional repos. If you build onnxruntime with the newer GCC, the resulting binary typically can't be distributed elsewhere because it depends on the new GCC's runtime libraries, which the stock OS doesn't have. On RHEL/CentOS the situation is better: we build our code with Red Hat devtoolset 8/9/10 on CentOS 7. New library features (like std::filesystem) that do not exist in the old C++ runtime are statically linked into the applications, with some restrictions:

1. GCC has a dual ABI, but we can only use the old one. This means std::string is still copy-on-write and std::list::size() is still O(n). Also, if you build onnxruntime on CentOS 7 and link it with binaries that were built on CentOS 8 or Ubuntu with the new ABI and that export C++ symbols directly (instead of using a C API), it won't work.

2. We still can't use std::optional. This limitation comes from macOS. We will resolve it when we get macOS 11 build machines, which shouldn't take long.

3. Please avoid using C++17 in CUDA files (*.cu) and in the *.h files they include (like core/framework/float16.h), because CUDA 10.2 doesn't support C++17. You are welcome to use the new features in any *.cc files.
2021-06-25 14:08:01 -07:00
RandySheriffH
451fcb7df1
Add sequence support for identity on GPU (#7810)
* Add sequence support for identity on GPU

* implement TensorSeq in provider interface

* fix definition err

* Add new interface to TensorSeq

* fix comments

* fix comments

* fix mac warning

* move TensorSeq forward declaration

* add TensorSeq header

* remove declaration

* fix minor format

* fix minor format

* define TensorSeq as struct

Co-authored-by: RandySheriffH <rashuai@microsoft.com>
2021-05-28 18:00:06 -07:00
Ryan Hill
5a63904aa9
Remove some templated versions of functions that are no longer needed (#7868)
* Switch to non template version of function
2021-05-28 13:22:45 -07:00
Ryan Hill
c99aa3a3f3
Ryanunderhill/cuda shared (#7626)
* First iteration of making cuda a shared provider.
Separated out shared OpKernel change, so doing this to merge with that change.

* More cuda shared library refactoring

* More cuda shared library refactoring

* More build options tested, converted the training ops over.

* Fix merge breaks

* Fix submodules

* Fix submodules

* Fix submodules

* Fix python

* Fix compile errors

* Duplicate symbol fix

* Test fix for ROCM provider

* Another ROCM test workaround

* ROCM Build Test

* ROCM build fix

* ROCM

* ROCM

* ROCM

* ROCM

* ROCM

* ROCM test

* Reduce header dependencies

* Remove redundant namespace

* Test fix for linux

* Fix linux build

* Fix Eigen build error

* Fix unused parameter warning

* Test link error

* Another linker test

* Linker test

* Linker test

* Another test

* Another build test

* Fix linux link error

* Build test

* Fix control flow ops to use common base class with core code

* Remove extra qualifiers

* Fix template syntax for linux

* Fix cuda memory leak

* Fix pybind

* Test disabling cast

* Cleanup

* Restore cuda in test

* Remove more header dependencies

* Test not adding cuda provider to session

* Make GetProviderInfo_CUDA throw

* No-op cuda provider creation

* Fix some setup issues

* Fix memory cleanup on unload

* Diagnostics

* Don't unload library

* Add diagnostics

* Fix deleting registry at right time.

* Test disabling profiler

* Fix merge break

* Revert profiler change

* Move unloading of shared providers into Environment

* Free more global allocations before library unloads

* Add more diagnostics

* Move unloading back to the OrtEnv as there are multiple Environments created during a session.

Remove some library dependencies for tests.

* Fix more cmake files

* ERROR -> WARNING

* Fix python shutdown

* Test not using dml in pipeline

* Change python version and disable dml

* Update python version

* Test adding unload method for shared providers

* Disable DLL test

* Python test

* Revert "Python test"

This reverts commit c7ec2cfe98.

* Revert "Disable DLL test"

This reverts commit e901cb93aa.

* Revert "Test adding unload method for shared providers"

This reverts commit c427b78799.

* Point to RyanWinGPU

* Revert python version

* Fix id_to_allocator_map

* Another python exit test

* Remove extra debug messages
Try a more clean python shutdown through DllMain

* Revert DllMain idea, it didn't work

* Merge conflicts

* Fix merge with master issues.

* Comments

* Undo edit to file

* Cleanup + new training ops

* Revert yml changes

* Fix another merge error

* ROCM fix

* ROCM fix v2

* Put back Linux hack, it is necessary

* Stupid fixes

* Fix submodule out of sync

* ROCM fix 3

* ROCM 4

* Test java fix

* Fix typos

* Java test on my VM

* Fix build error

* Spotless fix

* Leave temp file around to load properly

* Fix cleanup on exit

* Fix break

* Java comments

* Remove LongformerAttentionBase workaround

* Spotless fix

* Switch yml back to regular build pool

* Revert "Switch yml back to regular build pool"

This reverts commit be35fc2a5a.

* Code review feedback

* Fix errors due to merge

* Spotless fix

* Fix minimal build

* Java fix for non cuda case

* Java fix for CPU build

* Fix Nuphar?

* Fix nuphar 2

* Fix formatting

* Revert "Remove LongformerAttentionBase workaround"

This reverts commit 648679b370.

* Training fix

* Another java fix

* Formatting

* Formatting

* For orttraining

* Last orttraining build fix...

* training fixes

* Fix test provider error

* Missing pass command

* Removed in wrong spot

* Python typo

* Python typos

* Python crash on exit, possibly due to unloading of libraries.

* Remove test_execution_provider from training build
Only enable python atexit on windows
Remove assert on provider library exit

* Still can't unload providers in python, alas.

* Disable Nvtx temporarily

* MPI Kernels for Training

* MPI Kernels part 2

* Patch through INcclService

* Oops, wrong CMakeLists

* Missing namespace

* Fix missing ()

* Move INcclService::GetInstance around to link nicer

* Missing }

* Missing MPI libraries for Cuda

* Add extra GetType functions used by MPI

* Missing Nccl library

* Remove LOGS statements as a test

* Add in a couple more missing GetType methods

* Update comments

* Missed a logging reference in mpi_context.h

* Convert aten_op to shared (due to merge with master)

* Test moving DistributedRunContext instance into shared provider layer
(with purpose error to verify it's being built properly)

* Test passed, now with fix

* Missing static

* Oops, scope DistributedRunContext to just NCCL

* Merge related issues and code review feedback.

* Merge error

* Bump to rel-1.9.1 (#7684)

* Formatting

* Code review feedback for Java build on non Windows

* Remove cupti library dependency from core library

* Test Java pipeline fix

* Linux build fix

* Revert "Linux build fix"

This reverts commit a73a811516.

* Revert "Remove cupti library dependency from core library"

This reverts commit 6a889ee8bf.

* Packaging pipeline fixes to copy cuda shared provider for tensorrt & standard packages

* Add cuda to Tensorrt nuget package

* onnxruntime_common still has a cuda header dependency

Co-authored-by: ashbhandare <ash.bhandare@gmail.com>
2021-05-20 07:53:47 -07:00
Hariharan Seshadri
43e2ee37f2
Some cosmetic changes (#7741) 2021-05-18 00:02:07 -07:00
Hariharan Seshadri
53d1d55ea8
Add ability for pre-packed weights of shared initializers to be shared across sessions (#7421) 2021-05-14 20:44:42 -07:00
Hariharan Seshadri
4b691a5c0d
Add ability for memory arenas to "shrink" periodically (#7284) 2021-05-08 07:53:21 -07:00
Changming Sun
1012535dab
Change onnxruntime::make_unique to std::make_unique (#7502)
1. Change onnxruntime::make_unique to std::make_unique
2. Add "-std=c++14" to ROCM EP's build flags.
2021-04-29 17:04:53 -07:00
Hariharan Seshadri
7b11283af0
Add ability to allocate initialized tensor memory from non-arena memory (#7267) 2021-04-20 20:27:48 -07:00
Edward Chen
0ebeaf529d
Check kernel def hashes (#7120)
Add unit test for verifying kernel def hashes.
Add way to add new types to kernel definition without changing hash.
2021-04-01 17:42:58 -07:00
Edward Chen
0ccfe6c86a
Enable type reduction for Scatter/ScatterElements CPU kernels (#7171)
Enable type reduction for Scatter/ScatterElements CPU kernels. Some refactoring to reduce binary size.
Add MLTypeCallDispatcher methods.
Minor cleanup for Pad CPU kernel.
2021-03-30 11:02:24 -07:00
Sherlock
ab86634c36
Address comments from ORTModule master merge (#7101)
* Address ortmodule merge master comments

Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-03-26 16:26:42 -07:00
Vincent Wang
fda0470683
Add New AllocKind for YieldOp Outputs, Run YieldOp with InferenceSession in UT (#7125)
* new allockind, add ut

* change macro

* fix win build

* rename alloc kind

* fix mem leak
2021-03-25 15:18:51 +08:00
Thiago Crepaldi
3348b8485f
Post merge update for ORTModule
Changes include:
* Revert Event Pool changes
* Add copyright and revert unrelated changes
* Add DLPack as submodule and remove to_dlpack and from_dlpack from public API
* Update golden numbers for DHP Parallel tests
* Update ORTTrainer unit test numbers
* Rollback to DLPack v0.3
* Disable flaky test
* Update third party notices and CG manifest file
* Minor refactoring of ORTValue API
2021-03-16 20:11:59 -07:00
Thiago Crepaldi
89d450697b
Introduce ORTModule training API to ONNX Runtime 2021-03-10 10:48:10 -08:00
Vincent Wang
8468099f93
Use DLPack for Graph Inputs and External Outputs of YieldOp (#6968) 2021-03-10 09:13:45 -08:00
Edward Chen
d5ed3e7fba
Enable type reduction in EyeLike, Mod, random.cc CPU kernels. (#6960)
* Update EyeLike CPU kernel.

* Update Mod CPU kernel.

* Update Multinomial CPU kernel.

* Slight improvement to Pad CPU kernel binary size.

* Update RandomNormal[Like], RandomUniform[Like] CPU kernels.
2021-03-10 15:32:56 +10:00