Commit graph

600 commits

Author SHA1 Message Date
Edward Chen
4b87d2c172
Fix dockerfiles/Dockerfile.arm32v7 build. (#10360)
Install CMake, ignore some Eigen warnings.
2022-01-24 19:06:09 -08:00
Dmitri Smirnov
7e092a7e3f
Reduce number of memory allocations based on a customer profiling case (#10193)
Add abseil and inlined containers typedefs
Introduce TensorShapeVector for shape building.
Use gsl::span<const T> to make interfaces accept different types of vector like args.
Introduce InineShapeVectorT for shape capacity typed instantiations
Refactor cuda slice along with provider shared interfaces
Refactor Concat, Conv, Pad
Build with Conv Einsum and ConvTranspose refactored.
Remove TesnorShape::GetDimsAsVector()
Refactor SliceIterator and SliceIteratorBase
Refactor broadcast
Refactor Pads for twice as long
Remove memory planner intermediate shapes vector
Refactor orttraining
Fix passing TenshroShapeVector to tests
Remove abseil copy and submodule, use FetchContent_Declare/Fetch
Path with separate command
Make RocmAsyncBuffer accept anything convertible to span. Adjust Linux GPU pipeline.
2022-01-24 10:40:46 -08:00
Vincent Wang
44e2db9397
CUDA BFloat16 Refactor (#10085) 2022-01-14 19:38:56 +08:00
RandySheriffH
79d2a0d185
Dynamic cost model to mitigate high E2E perf variance (#9833)
* commit dyamic block size

* summarize granularity

* add configure

* add test case

* call std stoi

* add comments

* fix typo

* rename var

* update comment

* reset default

* better comments

* extend LoopCounter for dynamic blocking

* fix comments and add more UT

* update comments

* swtich type to std::ptrdiff_t

* format code with better indention

* cast ptrdiff_t

* fix typo
2022-01-11 17:26:41 -08:00
Shucai Xiao
ce103ace93
Amdmigraphx fix build error (#9272)
* fix build error

* rename a missing api for the MIGraphX EP
2022-01-10 15:18:43 -08:00
Dwayne Robinson
1f5b073508
Minor DirectML EP provider factory comments (#9965) 2022-01-10 02:06:31 -08:00
Nat Kershaw (MSFT)
d52d3c0052
Update C/C++ API docs automation to create a PR (instead of push to publish branch) (#10093) 2022-01-07 16:16:47 -08:00
Hariharan Seshadri
0552a47ec2
Enable CUDA provider option configuration for C# (#10188) 2022-01-06 11:03:14 -08:00
Edward Chen
792db33f01
Enable loading of ORT format model graph runtime optimizations (#9901)
Initial implementation of load/replay of runtime optimizations in an ORT format model.
2022-01-04 12:09:07 -08:00
stevenlix
05d20343ee
Remove duplicated constant initializer copies for TensorRT nodes (#10105)
* add new field constant_initializers in metadef and remove constant initializers from trt node inputs

* remove redundancy

* use GetConstantInitializer() to get constant initializers

* add ORT_ENFORCE check

Co-authored-by: Ubuntu <azureuser@orteplinuxdev.bxgbzpva45kedp3rhbsbit4phb.jx.internal.cloudapp.net>
2021-12-22 12:19:56 -08:00
Changming Sun
4e9e01cb3c
Fix SDL warnings in CPU EP (#9975) 2021-12-19 20:54:29 -08:00
Edward Chen
3466ee45a3
Add hash value typedef. (#9710)
Add a typedef for the various hash value variables. Use of a typedef conveys some additional meaning.
2021-12-15 19:07:17 -08:00
Valery Chernov
b327e89efa
Standalone TVM Executor Provider (#10019)
* squashed commit for standalone tvm execution provider

* critical fix for correct python build with stvm ep

* get tuning log file from ep options. It has priority over AUTOTVM_TUNING_LOG

* updates and fixes

* update parsing of stvm provider options

* add support of external data for onnx model

* add conditional dump of subgraphs

* remove unused code

* get input tensor shapes through provider options. get output shapes for fixed input ones by TVM API

* support AUTO_TVM tuning log file inside ORT. Selector for Ansor and Auto_TVM is provider option (tuning_type)

* add fp16

* add functionality of conversion of model layout to NHWC if need. Necessary parameter was added to STVM provider options

* fix license text in header. fix log format

* small fixes

* fix issues from flake8

* remove model proto construction from GetCapability

* reserve memory for vector of DLTensors

* add simple tutorial for STVM EP

* STVM docs

* jroesch/tvm -> apache/tvm

* remove dead code, unneccessary logs and comments

* fix in readme

* improve tutorial notebook

* tvm update

* update STVM_EP.md

* fix default value

* update STVM_EP.md

* some TODOs for the future development

* shorten long lines

* add hyperlink to STVM_EP.md

* fix Linux CI error

* fix error in csharp test

Co-authored-by: Jared Roesch <jroesch@octoml.ai>
Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru>
2021-12-15 16:59:20 -08:00
Changming Sun
20f8a06f1f
Remove OpenMP code (#10032) 2021-12-15 00:58:42 -08:00
Changming Sun
9d9ebd3b85
Fix some static analysis warnings in the core framework (#10033) 2021-12-14 14:41:42 -08:00
Ryan Hill
343a76945b
Fix some documentation errors plus ones generating doxygen warnings (#9993) 2021-12-09 17:42:34 -08:00
Dmitri Smirnov
a7f649db7c
Enable proper override using MIMalloc (#9944)
Redirect memory allocations to MiMalloc and advance its version to v2.0.3
Refactor for a universal ifdef
2021-12-07 17:56:58 -08:00
Ryan Lai
57a6f7c205
Various fixes to fix WindowsAI RI build. (#9877)
* WAI RI fixes

* span changes

* Spaces

* Additional warnings to fix

* Fix redundant commment
2021-11-29 21:33:15 -08:00
jingyanwangms
bf5e9a5044
bumping up ORT_API_VERSION to 10 (#9838)
Co-authored-by: Jingyan Wang <jingywa@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-11-22 20:27:45 -08:00
Edward Chen
bcc6ab29f6
Trim DataTypeImpl binary size (#9813)
* De-virtualize DataTypeImpl::AsXType() functions.
* Refactor helpers.
2021-11-22 12:06:24 -08:00
Dmitri Smirnov
567749b2dc
Expose IOBinding SynchronizeInputs/Outputs via C/C++/C# And Python APIs (#9823)
Add C/C++ APIs for SynchronizeBoundInputs/Outputs
 Add python bindings
 Expose SynchronizeBoundInputs/Outputs to C# API
2021-11-22 09:45:31 -08:00
Wei-Sheng Chin
e520bb5145
Improve print functions for NodeArg, Node, and Graph (#9801) 2021-11-19 09:48:27 -08:00
Scott McKay
dc1724b0e2
Reduce DataTypeImpl binary size (#9783)
* Reduce the number of virtual methods in DataTypeImpl to reduce binary size.
Refactor some helpers to reduce the amount of templatized code.
2021-11-19 10:29:12 +10:00
Dmitri Smirnov
6284cbe833
Add TensorShape noexcept for move ops and fix some warnings (#9802)
Add TensorShape noexcept for move ops and fix some warnings
2021-11-18 15:27:24 -08:00
Hariharan Seshadri
e23892ddbe
Support disabling support for the optional type in ORT builds (#9745) 2021-11-17 19:13:28 -08:00
satyajandhyala
421e4c03ce
Update default cast propagation strategy from None to FloodFill (#9713)
* Changed the default cast propagation strategy from None to FloodFill.
2021-11-16 13:15:57 -08:00
Edward Chen
9acbfeba09
Address some code scan issues. (#9752) 2021-11-16 10:24:46 -08:00
sfatimar
1d03baa8cc
Openvino ep 2021.4 v3.3 (#9588)
* Added checks for Hetero/Multi

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Remote Context Plugin

* changes for IO Buffer plugin

* erronous couts added

* erronous entry rectified

* Set the Openvino OP Buffer also as output

* Enable AUTO plugin in OpenVINO EP

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Remote Context Plugin

* changes for IO Buffer plugin

* erronous couts added

* erronous entry rectified

* Added checks for Hetero/Multi

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Set the Openvino OP Buffer also as output

* Enable AUTO plugin in OpenVINO EP

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Please commit error message and rectification of param.context

* Alignment fixed

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Changed the string to OpenVINO_GPU

* hanged OpenVINO to to OpenVINO_CPU

* Onnxruntime updated API for memory location

* Removing Duplicate LOG Error

* Tensor.h removed DeviceType function. Updated comment

* API Comments updated

* Removing changes to Provider Indo

* Erronous commit

* Removing Extra logs

* Merge CMAKE

* Not copy from a  local location

* Duplicate Entry

* Remove extra line

Co-authored-by: MaajidKhan <n.maajidkhan@gmail.com>
2021-11-15 13:41:12 -08:00
Sheil Kumar
3d0bd2596f
Enable creating OrtValues from ID3D12Resources from the onnxruntime C-API (#9686)
* Add onnxruntime-windows api.

* minor fixes

* add to package headers

* Build ort_dml_api for provider extensions.

* Cleanup

* misc comment

* remove winml specific comments

* use dml check in onnxruntime

* Update include/onnxruntime/core/providers/dml/dml_provider_factory.h

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* Update include/onnxruntime/core/session/onnxruntime_c_api.h

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* Update include/onnxruntime/core/providers/dml/dml_provider_factory.h

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* Update include/onnxruntime/core/providers/dml/dml_provider_factory.h

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* Update onnxruntime/core/session/onnxruntime_c_api.cc

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* Update onnxruntime/core/session/ort_apis.h

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* Update winml/test/adapter/AdapterSessionTest.cpp

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* Update onnxruntime/core/session/onnxruntime_c_api.cc

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* Update winml/adapter/winml_adapter_c_api.cpp

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* Update include/onnxruntime/core/session/onnxruntime_c_api.h

Co-authored-by: Pranav Sharma <prs@microsoft.com>

* Update onnxruntime/core/session/onnxruntime_c_api.cc

Co-authored-by: Pranav Sharma <prs@microsoft.com>

* Update winml/adapter/winml_adapter_c_api.cpp

* PR feedback

* Update include/onnxruntime/core/providers/dml/dml_provider_factory.h

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* Update include/onnxruntime/core/providers/dml/dml_provider_factory.h

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* Update include/onnxruntime/core/providers/dml/dml_provider_factory.h

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* PR feedback

* merge resolution and unreference param

* (naming) Remove Dml prefix

* maybe unused version

* move DML code into DML path. CIs failing because DML is not available when --use_dml is not on

* fix warning causing local build failures after merging

* Change getvaluememoryinfo to gettensormemoryinfo

* minor breaks

* fix comment paste

* fix comment

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>
Co-authored-by: Pranav Sharma <prs@microsoft.com>
2021-11-13 03:34:54 -08:00
RandySheriffH
21eb747a0f
Custom thread creation and join hooks (#9426) 2021-11-12 19:10:31 -08:00
Guoyu Wang
5ad6dbb314
Remove experimental from ORT format namespace (#9729)
* schema change

* cc channges

* remove temp debug code

* Adding fbs namespace to session_state_flatbuffers_utils.h

* Add fbs namepsace to all ort format utils
2021-11-11 19:46:30 -08:00
Gary Miguel
93e239747f
Construct valid graphs for ONNX checker for IR version < 4. (#9665)
* Construct valid graphs for ONNX checker for IR version < 4.

Previously the constructed graph was not guaranteed to have its
initializers be a subset of its inputs, which is required for IR
version < 4. This resulted in spurious failures.

Fixes #9663
2021-11-12 09:13:28 +10:00
Guoyu Wang
a70ae24475
Add QDQ::Selector::Select to use const GraphViewer instead of mutable Graph (#9621)
* Move qdq selector to use const GraphViewer

* minor update

* Move qdq logic from NodeSelector to QDQ Selectors

* Fix build break

* Move selector result to NodesToOptimizeIndexes

* fix build break

* address CR comments

* move indexes -> indices

* Pass  graph_viewer to avoid recreating many times

* Update after merge master

* update graph viewer remarks

* update comments

* Add ut for new qdq selector logic

* Increase minimal binary size limit

* UT minor update

* Address CR comments
2021-11-08 21:36:29 -08:00
Hariharan Seshadri
65590b049c
Expose an API to query the CUDA compute stream to launch a custom kernel (#9141) 2021-11-08 21:10:34 -08:00
Ryan Hill
24e35fba32
Change TensorShape to typically not allocate heap memory (#9542) 2021-11-08 10:29:54 -08:00
Hariharan Seshadri
bbeceb7541
Support optional type in ORT (#8339) 2021-11-04 15:01:42 -07:00
Edward Chen
ddb4c05852
Save graph runtime optimizations for minimal build (#9508)
Add support for saving graph runtime optimizations in an ORT format model. The idea is to allow some optimizations to be "replayed" at runtime in a minimal build. The replaying part will be in a future change.
2021-11-04 10:49:46 -07:00
Edward Chen
c315d1b3cd
Always enable ORT format model loading. (#9586) 2021-11-01 10:00:08 +10:00
TomWildenhain-Microsoft
e8268c9a18
Add Transpose Optimizer and modify nhwc optimizer to use it. (#9284)
* Add Transpose Optimizer and modify nhwc optimizer to use it.

* Fix casts

* Fix casts2

* Fix move

* Add tests

* Add headers

* Fixes and tests

* Remove explicit template instantiation

* Fix build warning

* Name unit tests

* Code review fixes

* Add some comments

* Fix some casts

* Make optimization slightly less agressive

* Some unit test fixes

* Update Attention pattern to work with transpose optimizer

* Update attention fuser

* Fix attention fusion python script

* Improve transpose optimizer documentation

* Create OptimizerCtx struct

* Disable Slice handler for testing

* Implement Slice int32

* Only push transposes leading up to other transposes

* Improve optimization heuristic

* Add exemption for MaxPool

* Document transpose optimizer api.h

* Revert fusion tests to master

* Remove temp files

* Replace typedef with using

* Trim trailing whitespace

* Move class declarations from api_impl.h to api_impl.cc

* Remove copy constructors and move allocator

* Alphabetize headers

* Add override keyword

* Comments for nhwc_transformer

* Rename OrtGraph to ApiGraph, etc.

* Wrap line

* Remove extra qualifier on ApiGraph

* Refector attention fusion

* Remove c-style casts from api_impl.cc

* Improve documentation

* Avoid printing vector in ORT_ENSURES

* Revert attention fusion refactor

* Remove duplicate cost heuristics and improve documentation

* Fix size_t casts

* Fixes from Scott's review

* Unrevert attention refactor and more updates from Scott's review

* Revert api_impl.cc ValueInfo change

* only optimize first transpose input

* Unrevert api_impl.cc changes

* Make vector call reserve

* transpose_optimizer.cc update from Scott's comments

* Rename api::Graph to api::GraphRef etc.

* Consider domains 'onnx.ai' and '' equal

* Replace AddInput with SetInput

* Improve tests

* quantization and heuristic tests

* Comments for tests

* Replace const string_view with string_view and update tests

* Fixes requested by Edward

* Fix std::string to string_view conversion

* Add <string> to includes

* Fix bug for broadcasting ops with unknown rank. Slight safety improvements

* Changes requested by Edward

* Fix formatting

* Improve description of cost metric
2021-10-27 22:10:39 -07:00
Ginés Hidalgo
9639eded4b
Missing #pragma once in dml_provider_factory.h (#9457) 2021-10-27 02:49:52 -07:00
Ginés Hidalgo
a79d375d24 Added fixes for Clang on Win64 2021-10-22 16:59:09 -07:00
Changming Sun
d83adaaf9f
Remove optional-lite (#9424) 2021-10-22 16:45:45 -07:00
Sherlock
ff23b9ff55
Avoid cudaStreamSync at the end of Forward/Backward (#9470)
* Skip cudaStreamSynchronize at the end of fw 

* skip sync stream for end of backward
2021-10-21 11:28:25 -07:00
Changming Sun
406f1629c1
Remove Featurizers code (#9300) 2021-10-20 10:20:35 -07:00
Jeff Daily
c8789d3047
[ROCm] static re-hipify of CUDA EP to ROCm EP, now a shared provider (#8877)
* re-hipify all rocm EP sources

* fix all other files affected by re-hipify

* add cuda_provider_factory.h to amd_hipify.py

* do not use cudnn_conv_algo_search in ROCm EP, missing reduce min registration

* Fix ReduceConsts template specialization introduced in #9101.

Fixes the error when building for ROCm 4.3.1:

error: too many template headers for onnxruntime::rocm::ReduceConsts<__half>::One (should be 0)

* fix flake8 error in amd_hipify.py

* speed up hipify with concurrent.futures

* flake8 fix in amd_hipify.py
2021-10-14 15:15:51 -07:00
Edward Chen
79e736ed25
Make onnxruntime::Status nodiscard (#9279)
Mark onnxruntime::Status class with [[nodiscard]] attribute.
Fix existing warnings.
2021-10-08 17:10:31 -07:00
Guoyu Wang
60bbdf1403
Remove unused NodeArgs in Graph::Resolve (#9213)
* Remove unused NodeArgs

* Handle case where a node arg from an initializer from initializer_names_to_preserve

* Fix CI failure

* update test

* Fix outer scope node args failure

* Use NodeArg* as the key of the std::set instead of string

* Minor updates
2021-10-01 11:44:26 -07:00
RandySheriffH
058108bef9
Execution Provider Profiler (#8406)
* implement cuda provider

* define profiler common

* call start after register

* add memcpy event

* add cuda correlation

* format code

* add cupti to test path

* switch to CUpti_ActivityKernel3

* reset cupti path

* fix test case

* fix trt pipeline

* add namespace

* format code

* exclude training from testing

* remove mutex
2021-09-28 13:59:52 -07:00
Hariharan Seshadri
f7dedc9002
Fix default initialization value in C API header (#9126)
* fix default initialization value in C API header

* Fix conflicts

* Nits
2021-09-20 20:58:13 -07:00
Ryan Hill
6ae5f7a244
C API Docs - Add build instructions (#9106)
* Update Doxyfile, add build instructions to header
* Update paths in README.md
2021-09-17 18:40:27 -07:00