Commit graph

372 commits

Author SHA1 Message Date
Dmitri Smirnov
3433576fd3
Support for Sparse Initializers (#5540)
Introduce sparse_initializers support.
  Convert them to dense on model load and prune graph_proto_
  so they don't consume space. Convert back to sparse on ORT Format model save.
  Implement serializing sparse initializers to OrtFormat.
  Fix Model::ToProto() to return original sparse initializers
  Set a flag that graph_sync is needed when loading a simple ORT Format model.
  otherwise nothing is resolved.
  Add ORT Format history to README.md
  ifdef MINIMAL build for DenseToSparseTensorInitializer
  Allow duplicate initializers to support existing models.
  Issue a warning instead of aborting.

* Revert "Remove SparseTensor support from minimal build. (#5114)"
This reverts commit 59ee8ffb17.



Signed-off-by: Dmitri Smirnov <dmitrism@microsoft.com>
2020-10-27 10:32:06 -07:00
Yufeng Li
30cdc74bc0
Enable prepacking in subgraph (#5433)
Prepacking in subgraph is not supported currently. We see more and more models with subgraph, which has MatMul, MatMulInteger and other ops. Prepacking can speed up those models significantly.
2020-10-26 22:22:31 -07:00
Du Li
860cb22260
Bug fix for C API (#5520)
* remove if_def from C api

* Fix CI issues.

* revert change for symbols.txt
2020-10-24 13:37:58 -07:00
Ryan Hill
82c7a9756e
Fix shared provider unload crash (#5553) 2020-10-21 13:01:21 -07:00
Changming Sun
280cdf31f5
Revert "Fix shared provider unload crash (#5523)" (#5547)
This reverts commit 610676293e. Because Linux DNNL pipeline is failing.
2020-10-20 08:01:28 -07:00
Ryan Hill
610676293e
Fix shared provider unload crash (#5523)
* Change shared providers so that they are shutdown before shared library unload
* Move UnloadSharedProviders declaration into a shared header to avoid bugs.
2020-10-19 18:08:38 -07:00
Sunghoon
645d978589
Sunghcho/denormals (#5391)
* Add session option and global thread pool option to set denormal as zero.

* Revert unneccessary changes.

* Add cpuinfo submodule

* Add more comments

* Remove cpuinfo submodule dependency and check only SSE3 support for ftz and daz inspired by Tensorflow

* Preserve API order in C api

* Clean up and utilize SSE3 detection logic from existeing cpuid_info.h

* Keep the same order with header file

* Fix build issue with Linux pipeline, which has old g++ compiler

* Fix broken build on Linux and remove a duplicated unit test

* Remove reformatting at eigen thread pool

* Remove flatbuffers which is not intentionally added

* Revert "Remove flatbuffers which is not intentionally added"

This reverts commit 9f509a9aaaa3c7832d88854c82fd26b234770b7f.

* Remove flatbuffers which is not intentionally added

* Resolve comments
  - Put details on APIs
  - Add a log for ftz/daz initialization
  - Add clang
  - Fix typo

* Remove unnecessary header include

* Resolve comments
2020-10-15 12:47:42 -07:00
Chun-Wei Chen
2b6b3a2ee6
Add GetProfilingStartTimeNs() to Python/C# APIs (#5280)
* add Python API for getProfilingStartTime

* debug for using Python API

* add in C# api

* use uint intead of uint64_t to prevent warning

* typo for GetProfilingStartTimeNs

* remove const

* Update onnxruntime/python/session.py

Co-authored-by: Pranav Sharma <emailpranav@gmail.com>

* remove unnecessary return

* Add Python unit test

* Add C# unit test and refactor Python test

* use ulong in C# for uint64_t in C++

* remove time.monotonic_ns

* syntax: remove public for inner function

* correct the API's order

* getprofilingstarttime after run

* Correct the right order in NativeMethod.cs

* update order

* nit: remove spaces

* Update csharp/src/Microsoft.ML.OnnxRuntime/InferenceSession.cs

Co-authored-by: Guoyu Wang <62914304+gwang-msft@users.noreply.github.com>

* use the updated function

* add comment about the precision

* add more comments

* add session.py back

* fix flake8

* remove session.py

* Add comments in C, C#, Python APIs about precision

Co-authored-by: Pranav Sharma <emailpranav@gmail.com>
Co-authored-by: Guoyu Wang <62914304+gwang-msft@users.noreply.github.com>
2020-10-14 05:32:43 -07:00
Xiang Zhang
b12824fa7a
add telemetry event for nodejs binding (#5463) 2020-10-12 22:53:01 -07:00
KeDengMS
c444b9d76a
Add CUDA option to run copy in default stream (#5445)
* Add CUDA option to run copy in default stream

This change fixes #4829. Thanks @maherzog for providing the repro!

The bug is caused by memory reuse in BFC arena, where copy and
compute stream in CUDA has a racing condition.

BFC arena is an arena allocator on top of cudaMalloc/Free to
reduce the cost in syncing CPU and GPU when alloc/free. It means
when CPU alloc/free the memory, GPU might not finished previous
work on the memory, so that CPU and GPU could run asynchronously.

This is OK if there's only one stream, where the execution order
in CPU and GPU are consistent. For example, if we have two kernels
A and B, CPU runs allocA->computeA->freeA->allocB->computeB->freeB,
A and B could shares the same memory since computeA and computeB
will not have racing as long as they run in the same GPU compute
stream.

However, if CPU runs allocA->CopyA->freeA->allocB->computeB->freeB,
the order of execution in GPU could have copyA happen after computeB,
if copy and compute happens in different GPU streams.

This change makes copy to run in default compute stream, while adding
an option to fall back to previous behavior if there's perf hit. This
is a short term fix before BFC arena could support multiple streams.

User may use following options to revert to previous behavior:
C API:
  struct OrtCUDAProviderOptions cudaProviderOpt;
  cudaProviderOpt.do_copy_in_default_stream = false;
C++ API:
  CUDAExecutionProviderInfo cudaEPInfo;
  cudaEPInfo.do_copy_in_default_stream = false;
C# API:
  pending...
Python:
  import onnxruntime
  onnxruntime.capi._pybind_state.set_do_copy_in_default_stream(False)

* Confirmed the test failes in CI when doing copy in separate stream

Revert the test to get CI pass now

* Fix Windows test

* Address CR
2020-10-12 22:12:05 -07:00
Scott McKay
a92ccbe1bc
Various armv7 related fixes (#5394)
* - Link with libatomic if needed
 - Install pip differently so it doesn't clash with the system pip which may involve a wrapper script
 - Remove ability to specify offset when Tensor allocates the data. The data prior to offset isn't accessible by anything.
 - Fix use of offset in TensorOpTest to work on armv7 where it must be aligned to the type it points to.
 - Fix ActivationOpNoInfTest.Softsign to allow for armv7 behavior
 - Fix ReductionOpTest.ReduceMean_*keepdims to allow for armv7 floating point inaccuracy

* Address PR comments
2020-10-09 22:34:32 +10:00
Du Li
323c4dfe02
Adding an option for cudnn conv algorithms. (#5159)
* adding cudnn conv algorithm selection options.

* adding cudnn conv algorithm selection options.

* export the api

* adding the perf test option.

* accomodating pr comments.

* Move OrtSessionOptionsAppendExecutionProvider_CUDA to onnxruntime_c_api.h

* Accomodating PR comments.
2020-10-05 16:53:52 -07:00
Sherlock
e71668f92c
Expose recompute configs to the frontend (#5318)
* Expose recompute configs to the frontend

* Add frontend test

* Ensure recompute graph transformer is only applied once

Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-10-02 09:49:47 -07:00
Guoyu Wang
3a3f26f38e
Move ort flatbuffers helper functions and value info r/w functions into separated lib (#5276)
* Move fbs include from header to cc

* add initial cmake for flatbuffers

* Move most flatbuffers util to ort_flatbuffers

* move code around

* fix

* move test/perf runner to use flatbuffer directly instead of model

* minor update

* Fix build break

* Clean up includes and foward decl

* Fix traning CI build breaks

* Addressed PR comment, replaced some include with forward decls

* Remove ORT_MUST_USE_RESULT temporarily
2020-09-25 05:36:29 -07:00
Sherlock
b03fb82ab7
Transformer layer-wise Recompute (#4526)
* Build Recomputation Graph

* Make topological sort to run FW nodes first

* Pattern match start and end of transformer layer

* Topological sort with Priority

* Add logger to Gradient Graph Builder

* Use Logger

* Introduce Execution Order
2020-09-24 19:56:32 -07:00
Josh Bradley
4ed31ca214
Combine custom logger global threadpools (#4857)
* add custom logger and global threadpools to C and C++ API

* code cleanup and formatting

* reformat code

* tidy up some more code formatting

* remove comment

* fix API break from merging from master

* renamed API function to CreateEnvWithCustomLoggerAndGlobalThreadPools

* rename log variable and apply clang-format
2020-09-24 00:50:26 -07:00
Sherlock
038192bdb2
Place shape related compute nodes in CPU (#4940)
* Place shape related nodes in CPU
* visit candidates by topological order
* Make CPU node placement a utility function
* skip placing on CPU if the data typs is float16 or bfloat16
2020-09-21 17:10:39 -07:00
Pranav Sharma
974b9bfc09
Allow sharing of initializers between sessions. (#5092)
* Allow sharing of initializers between sessions.

* Allow sharing of initializers between sessions (2).

* Add test for C#

* Add test for C#; address PR comments

* Address PR comments
Moved AddInitializer logic to internal session options
Added tests for owned buffer
Clarified documentation
Fix bug where memory info and not device was getting compared

* Fix test

* Fix training build

* Add ver 5 end marker and ver 6 starter, add scenario and usage examples.
2020-09-21 14:09:37 -07:00
Pranav Sharma
d535894297
Add API to allow configuration of the global thread pools. (#5199) 2020-09-17 09:19:18 -07:00
Dmitri Smirnov
e6f85f338e
Refactor TensorAt, prepare for release (#5180)
* Refactor TensorAt
  locations* must be const and int64_t since our dims are int64_t
  Remove unnecessary copy of locations.
  Remove unnecesary casting and C-casting. Simplify implementation.
  Add a check for string type.
  Make CXX api return T& to fully expose C API in C++, const std::vector& by value as it
  covers more ground and eliminate redundant copy.
  Eliminate inner loop, compute strides first.
2020-09-16 10:20:45 -07:00
Chun-Wei Chen
7f3aa3a163
Add GetStartTime() for profiler to get private profiling_start_time_ (#4994)
* add GetStartTime() for profiler

* add function in inference_session

* remove qualified name

* add the api in cxx_api.h

* rename starttime to StartTimeNs, expost profiling object

* rename GetProfilingStartTime

* move Ortapis to the right place

* move to the end

* add const for session

* const the right place

* use const auto instead of const auto* for session

* remove const for auto getstarttime

* remove const for auto getstarttime

add unit tests

* nit: update test name and add comments
2020-09-16 00:17:04 -07:00
S. Manohar Karlapalem
f7edf0aa57
[OpenVINO-EP] Enable EP config options for VPU hardware (#5119)
* Added config flags for VPU Fast Recompile

* clean-up ifdefs

* Add VPU Fast compile config option

Adds an option that enables Fast compilation of models to VPU
hardware specific format.

* Add config option to choose specific device id for inference

Inference of all subgraphs will be scheduled only on this device
even if other devices of the same type are available.

* Add Python API to list available device IDs

* code cleanup

* Add second C/C++ API with settings string parameter

Adds an additional C/C++ API that allows passing multiple
key-value pairs for settings as a single string. Multiple
settings are delimited by '\n' while the key and value
within a setting are delimited by '|'.

* Append 'Ex' to the extended C/C++ API

* Use set_providers Py API to set config options.

Uses Session.set_providers Python API to set EP runtime config
options as key/val pairs
Deprecated older module function definitions for config settings.
Updates documentation.

* avoid globals for py config options where possible

Co-authored-by: intel <you@example.com>
2020-09-14 15:46:14 -07:00
Scott McKay
323a1ba8a4
Add option to exclude support for loading ORT format models in full build. (#5129)
* Add ability to exclude support for loading ORT format models.
Disable support for ORT format models in packages
2020-09-12 12:21:30 +10:00
Scott McKay
59ee8ffb17
Remove SparseTensor support from minimal build. (#5114)
* Remove SparseTensor support from minimal build.

Currently the only valid usage of a SparseTensor is as an attribute of a Constant node. That would have been lifted to a dense tensor initializer when loading the onnx model, so would not exist when saving the ORT format model. Due to that there can be no SparseTensors in an ORT format model.

Co-authored-by: gwang <wanggy@outlook.com>
2020-09-11 17:56:54 +10:00
Ryan Hill
3207de276c
Remove IDeviceAllocator class as it doesn't extend IAllocator in any way. (#5067) 2020-09-10 00:46:35 -07:00
Guoyu Wang
433061531e
Enable onnx_test_runner for ort format (#5100)
* Enable onnx_test_runner using ort format, for ort minimal build only

Co-authored-by: gwang0000 <62914304+gwang0000@users.noreply.github.com>
2020-09-10 17:15:19 +10:00
Scott McKay
796ddeb2cb
Remove serialization of outer scope value info in ORT format model (#5077)
* Remove serialization of outer scope node arg info in ORT format model. We don't currently need it in a minimal build as only SessionState calls Graph::IsConstantInitializer and it doesn't search outer scope. If we do need it in the future the information can be calculated at runtime (small binary size cost to do so).

Motivation: ORT format model was 32% bigger for a BERT model with multiple levels of subgraph and a lot of nodes due to this. Size is about 5% larger of the original ONNX model with the change. ORT format has type/shape info for all nodes, and this model has 2000 nodes so this seems reasonable.

Added example code to dump ORT format model to json.

Fixed misc bug in python test script around handling float and non-float expected output.
2020-09-08 17:43:42 +10:00
Pranav Sharma
2c1410afe7
Remove usage of macros for constants in public header. (#5061)
* Remove usage of macros for constants

* Fix linkage issue
2020-09-05 01:27:20 -07:00
Vincent Wang
84de14a833
Register OpSet13 CUDA Kernels for BERT/UniLMv2 (#4856)
* opset13 cuda kernels for BERT.

* add opset13 SoftmaxCrossEntropyLoss.

* opset13 size.

* fix argmax/min for ut.

* fix ut failure for argmax/min.

* OrtMemTypeCPUInput

Co-authored-by: Vincent Wang <weicwang@microsoft.com>
2020-09-05 08:09:52 +08:00
Scott McKay
b5c2932ae8
Last major set of ORT format model changes (#5056)
* Add minimal build option to build.py
Group some of the build settings so binary size reduction options are all together
Make some cmake variable naming more consistent
Replace usage of std::hash with murmurhash3 for kernel. std::hash is implementation dependent so can't be used.
Add initial doco and ONNX to ORT model conversion script
Misc cleanups of minimal build breaks.
2020-09-05 07:59:01 +10:00
Du Li
6134994db9
Parallelizing elementwise kernels (#4577)
* Parallelizing unary elementarywise ops.

* Parallelizing binary elementwise ops.

* Accommodating PR comments.
2020-09-04 14:45:43 -07:00
Xiang Zhang
0dad79b495
Add SetLanguageProjection C Api and use it in four projections (#5023)
* Add SetLanguageProjection C Api and use it in four projections

* static cast enum languageprojection to uint32_t

* resolve comments

* fix typo and line added unintentionally

* revert unecessary change

* reorder c# api

* add TensorAt and CreateAndRegisterAllocator in Csharp to keep the same order as C apis
2020-09-04 14:26:39 -07:00
Ryan Hill
d792af776d
Remove Cuda dependency from TensorRT shared provider (#5014) 2020-09-04 11:35:02 -07:00
Scott McKay
28445c88f9
Changes to enable saving and loading an ORT format model (#4995)
* Changes to enable saving and loading an ORT format model via the public APIs.
Cleanup session.py to try and make slightly more understandable. More refactoring is needed here.
Couple of bug fixes

* Fix bug in handling NodeArg serialization for optional inputs which has a name and no type info.

* Address PR comments
  - tweak SessionOptions config to avoid double lookup
  - merge duplicated functionality in python binding around registering an EP with optional options

Fix a couple of build issues.

* Update C API to be consistent with python API
  - only load model in InferenceSession ctor if required
  - support loading ORT model in minimal build

* Fix nodejs test.
We get an invalid path error from LoadInterOp first now

* Another attempt at fixing nodejs test.
Error message depends on whether ENABLE_LANGUAGE_INTEROP_OPS is defined. Make the output consistent.

The interop implementation looks suspicious given it appears to be internal code that is going via the public api. TBD if that should be fixed.

* Fix couple of build issues.

* Disable test temporarily so PR can be checked in.
Will fix in separate PR that adds final pieces for minimal build as the test is required there.

* Give up on nodejs test and make the match simpler.
Fix init call in TrainingSession python to not pass through sess. it wasn't being used in Session anyway so passing it through just adds confusion.

* Fix call to Session.__init__ in TrainingSession.
Session now initializes Session._sess to None to make it clearer where the 'ownership' of that member is, and that needs to happen before TrainingSession sets it.
2020-09-03 09:10:48 -07:00
Tim Harris
bbb9d92a5f
Remove SchedulingParams variants of ThreadPool::TryParallelFor (#5050) 2020-09-03 09:04:31 -07:00
gwang-msft
64237d999c
Add Cmake config for onnxruntime_NO_EXCEPTIONS (#4975)
* additional noexception setting, added compile options

* more no exception changes

* addressed PR comments

* Fix build issue when MSVC static library is used.

* Clarify comment

* add fatal message for onnxruntime_NO_EXCEPTIONS enabled without onnxruntime_MINIMAL_BUILD

Co-authored-by: Scott McKay <skottmckay@gmail.com>
2020-09-01 10:17:50 -07:00
Pranav Sharma
ad1701dfb1
Rename DeviceAllocatorRegistrationInfo to a more generic name; Use OrtArenaCfg for arena members; Remove unused OrtMemType; Simplify CreateAllocator interface. (#4970)
* Rename DeviceAllocatorRegistrationInfo to a more generic name; Remove OrtMemType; Simplify CreateAllocator interface.

* - fix builds
- fixed mixed aggregation + constructor calls (which were coded before this PR)
- changed default value of max_mem in API header
- added some validation of values for for arena_extend_strategy

* fix tensorrt and cuda tests
2020-09-01 09:25:32 -07:00
gwang-msft
7ca8388dc9
[ORT Mobile] file format schema and file I/O code (#4973)
* ort mobile file format schema and [de]serializing code
2020-09-01 11:51:31 +10:00
Ashwini Khade
8679a7244e
Enable rejecting models based on onnx opset (#4912)
* enable rejecting models based on onnx opset

* enable unreleased opsets in linux and mac CI

* test fixes and more updates

* enable unreleased opsets in CI builds

* enable released opsets in linux cis

* try fix windows ci yml

* yml fixes

* update yml

* yml updates post master merge

* review comments

* bug fix
2020-08-31 13:35:36 -07:00
gwang-msft
ea5732319e
Add option ORT_NO_EXCEPTIONS to disable most exception/throw in /onnxruntime/ (#4894)
* init no exception changes

* initial test

* disable exceptions

* more throw handling

* minor update

* fix linux build break

* fix windows/nuphar build break

* address cr comments, move #ifdef to ORT_CATCH

* address cr comments, move #ifdef to ORT_CATCH

* handle return statement in ORT_CATCH

* linux build break fix

* addressed cr comments, remove ort_catch_end

* addressed cr comments, remove ort_catch_end

* move mlas to a separated ifdef flag

* merge master, move some new code in master to no_exc

Co-authored-by: gwang0000 <62914304+gwang0000@users.noreply.github.com>
2020-08-28 23:03:51 -07:00
Scott McKay
08eb15068c
Exclude the Map types from the build if ML ops are disabled. (#4908)
* Exclude the Map types from the build if ML ops are disabled. They're the only ops that use Map.
2020-08-27 17:48:12 +10:00
Dudeldu
3d63d8d4f1
Extend C++ API for Map/Sequence Type Info (#3517) (#4781)
* Extend C++ API for Map/Sequence Type Info (#3517)

Expose functionality to view type information about sequences/maps
to C++ API.

- Add functions
    - `TypeInfo::GetSequenceTypeInfo`
    - `SequenceTypeInfo::GetSequenceElementType`
    - `TypeInfo::GetMapTypeInfo`
    - `MapTypeInfo::GetMapValueType`
    - `MapTypeInfo::GetMapKeyType`
- Add structs
    - `SequenceTypeInfo`
    - `MapTypeInfo`

Co-authored-by: Dudeldu <mustermann.informatik@gmail.com>
Co-authored-by: Jonas-Heinrich <Jonas@JonasHeinrich.com>

* Extend tests to cover new type info functionality for sequences and maps

 - two new test case in test_nontensor_types for maps and sequences

Co-authored-by: Jonas-Heinrich <Jonas@JonasHeinrich.com>
2020-08-25 12:03:23 -07:00
Scott McKay
14c691030f
Fix build break from removing custom ORT onnx protobuf (#4904)
Exclude parsing of json config in model (also excludes json parsing library)
2020-08-25 18:10:42 +10:00
Changming Sun
26546f81fe
Remove the private ONNX protobuf definition file (#4878) 2020-08-24 12:40:33 -07:00
Scott McKay
728e886bba
Add kernel def hash logic for minimal build (#4891)
* Add hash based lookup of kernels
2020-08-23 14:39:07 +10:00
Scott McKay
db7669b225
Reduce ONNX dependency in minimal build (#4890)
* Next round of changes.

Remove inclusion of ONNX schema header
Exclude custom registry related things
Move IsConstantInitializer from graph_utils to Graph as it's needed in a minimal build and graph_utils is excluded.
2020-08-23 07:02:13 +10:00
Pranav Sharma
29dcfb24ab
Allow multiple sessions to share an allocator, optimize constant folding memory usage, expose arena configs. (#4813)
* Add support for sharing allocators

* Incremental update

* Address some PR comments, add unit tests, add documentation.

* Address PR comments, add tests and some documentation.

* Fix build and test issues

* Remove RegisterAllocator API restoring the OrtAllocator interface changes. Changed docs to reflect this.
Also fixed the orttraining segfault. The segfault was because in the case of training session,
the CPU exec prov is not available at the time the transformers are applied. Changed it to create
a new one.
2020-08-22 10:03:17 -07:00
RandySheriffH
3fa73a5b6a
ReduceBinarySize (#4747)
* cancel night build on pyop

* add rewriter to rewrite cpu provider

* skip BuildKernelCreateInfo<void>

* refactor variable name and comment

* include ops from csv file

* process multiple eps

* add default function to cuda provider

* rename function and add license header

* fix import

* add doc

* fix typo

* deal with empty kernel entry in cuda

* rename the rewriter file

* add comment into provider file

* add comment and rename function

* log warnings

* refactor extracting logic

* add entry for script to run solo

* add better example

* avoid onnx importing

* fix flake8 alerts

* minor fixes to better comments and doc

* add entries for all domains

* add void entry into contrib providers

* format cuda_contrib_kernels.cc

* format cpu_contrib_kernels.cc

* add all providers

* add default entry to all providers

* include op_kernel header

* cancelling change in providers beyond cpu/cuda

* rename file and switch file format to domain;opset;op1,op2...

* update doc

* restore non-regular ending grammar in cuda_contrib_kernels.cc

* add ort_root as input argument of script

* enable test in ci

* update doc

* update doc

* revert change on linux gnu ci

* switch to set to host ops

* simplify trimming logic

* add domain map to track current model

* allow ort_root to take relative path
2020-08-21 19:50:13 -07:00
Scott McKay
e00ad83f2b
Initial changes to disable code in a minimal build (#4872)
* Initial set of changes to start disabling code in the minimal build. Breaking changes into multiple PRs so they're more easily reviewed. Focus on InferenceSession, Model and Graph here. SessionState will be next.
Needs to be integrated with de/serialization code before being testable so changes are all off by default.

Changes are limited to
  - #ifdef'ing out code
  - moving some things around so there are fewer #ifdef statements
  - moving definition of some one-line methods into the header so we don't need to #ifdef out in a .cc as well
  - exclude some things in the cmake setup

* Update session state and a few other places.

The core code builds if ORT_MINIMAL_BUILD is specified.
2020-08-22 07:14:53 +10:00
Scott McKay
ef19916d07
Add Node::SinceVersion() (#4874)
* Add Node::SinceVersion() so that the value is known when loading a graph from the ORT format (OpSchema is not available).

* Fix build warning from returning 'const int'
2020-08-21 16:48:52 +10:00