Rework TensorSeq in a manner consistent with Tensor and SparseTensor
in terms of type system setup.
Reduce templating. Introduce helpers to ensure the same
data type.
Make OrtValue __dtor not virtual.
Introduce ContainerChecker
* Disable optimizers for operator unit tests as they're intended to test the operator directly rather than something that could have been modified by an optimizer.
Disable TensorRT for Scan9 unit tests that fails when optimizers are enabled. Bug 525222 tracks that.
* Disable TRT for the lenient shape inferencing test as it uses Unsqueeze and TRT doesn't cope with that op.
* work on slice elimination for opset 10
* more work on slice elimination
* first working version
* adding python notebook for building models; fixing test
* fixing build error in macOS
* Introduce execution mode for clarity and extensibility; Change Python APIs accordingly; Replace DisableSequentialExecution API with EnableParallelExecution for clarity.
* Fix cuda build
* Modify the test slightly
* Make C and C# APIs consistent with Python.
* Mention OrtCreateSessionFromArray in C API doc
* fix seq of tensors
* changes on 9/30
* All tests passing
* Add SequenceAt op
* Fix shared_lib non_tensor_types test
* Address some PR comments
* Address PR comments
* Add support in python bindings to accept seq(tensor)
* Change data type from vector<Tensor> to TensorSeq
* Change data type from vector<Tensor> to TensorSeq
* Added some documentation
* Added missing test model
* Fix Linux build
* Fix Mac build
* Fix Mac build
Remove gsl subodule and replace with a local copy of gsl-lite
Refactor for onnxruntime::make_unique
gsl::span size and index are now size_t
Remove lambda auto argument type detection.
Remove constexpr from fail_fast in gsl due to Linux not being happy.
Comment out std::stream support due to MacOS std lib broken.
Move make_unique into include/core/common so it is accessible for server builds.
Relax requirements for onnxruntime/test/providers/cpu/ml/write_scores_test.cc
due to x86 build.
Add ONNXRUNTIME_ROOT to Server Lib includes so gsl is recognized
Description: Refine threading control options and move inter op thread pool to session state.
Added thread_utils.h/cc to centralize the decision around the thread pool size under various conditions.
Motivation and Context
Currently the thread pool size of the parallel executor is hardcoded to 32 for some reason. This PR makes the options to configure the thread pool sizes clearer.
* Implement Nuphar execution provider
Nuphar execution provider is a TVM-based compilation provider. It has shown great speedups for RNN models using Scan.
This PR is mainly for a preview of the shared codegen library for other TVM-based providers.
* Fix submodules
* Fix TVM submodule
* Update Nuphar to latest and resolve confliction
* Remove stale files caused by merge -X theirs
* Revert heap buffer change to not introduce onnxruntime_framework into onnxruntime_perf_test
* Fix bad merge
* Merge from Nuphar
* Fix warning treated as error, revert some unnecessary changes
* Revert some more test changes
* Some more test revert or comments to make review easier
New tests could be added later
* One more revert of unnecessary changes
* More change revert. Test could be added back later.
1.Let mlas use session thread pool
2.Remove onnxruntime_USE_MLAS cmake option
3. Remove the win32 thread pool code inside mlas
mlas will:
1.use ort thread pool if it get passed in
2.use openmp if the threadpool parameter is nullptr
3.run single threaded if the threadpool parameter is nullptr and openmp is disabled.
* remove memory copy between CUDA and TRT
* add info to RegisterExecutionProvider input
* use new IDeviceAllocator for trt allocator
* remove SetDefaultInputsMemoryType from TRT EP
* remove onnx-tensorrt 5.0
* add submodule onnx-tensorrt branch 5.1
* remove redundancy
* Update transformer_memcpy.cc
* Update tensorrt_execution_provider.cc
* switch to TensorRT 5.1.5.0
* update python binding
* disable failed test case on TensorRT
* Update activation_op_test.cc
* upgrade to TensorRT container 19.06
* update according to feedback
* add comments
* remove tensorrt allocator and use cuda(gpu) allocator
* update onnx-tensorrt submodule
* change ci build cuda directory name
* If there is an outer scope value that matches a subgraph input, don't create an implicit input from the outer scope value.
Minor unrelated change for issue noticed while debugging: Use unordered_set for implicit inputs so we don't add them multiple times.
* Add unit test based on onnx issue.
* Initial commit
* Add test case
* Revert unintentional change
* Update comments
* Resolve PR feedback
* Craft test casse and add more logs
* Fix build failures
* Fix minor bug in the way modified is updated
* Remove full model inference session test
* Resolve PR comments
* Resolve more PR feedback
* Resolve more PR feedback
* Resolve more PR comments
* Remove logging
* Move GetInitializer() method to memcpy_transformer scope
* Remove some unnecessary blank lines
* Make GetInitializer static
* Now that we check for a constant initializer in an ancestor graph we also need to be able to retrieve and replace that initializer.
Add helpers to do so.
Update optimizers to use the new helpers.
Fix bug in UnsqueezeElimination where it wasn't checking if the initializer it was replacing was constant.
Description:
Disallow overriding an initializer via a graph input if the IR version is < 4. This enforces an implicit assumption that initializers should be treated as constant, and allows constant folding to be done on a model with an older IR version.
Separate constant and overridable initializers so that it's clear which ones constant folding can utilize.
Update Graph to not add all initializers to the graph inputs when the graph is manually created (i.e. not loaded from a GraphProto) and the IR version is >= 4.
Motivation and Context
In order to do constant folding we need to know which initializers can be treated as constant and which are overridable. All initializers were required to have a matching graph input prior to IR version 4, technically making all of them overridable. The intention however was for them to be treated as constants, and this change enforces that intent.
The benefit of doing so is that constant folding will work for models with IR version < 4. The cost is that if someone is actually overriding an initializer they will need to update the IR version of their model to version 4 in order to keep doing so. The belief is that this is a very small subset of usage (e.g. models involving feeding in a truncated sequence) and the cost to update that small subset is warranted by the benefit of constant folding being able to be enabled on all older models without them needing an IR version update.