* schema change
* cc channges
* remove temp debug code
* Adding fbs namespace to session_state_flatbuffers_utils.h
* Add fbs namepsace to all ort format utils
* Construct valid graphs for ONNX checker for IR version < 4.
Previously the constructed graph was not guaranteed to have its
initializers be a subset of its inputs, which is required for IR
version < 4. This resulted in spurious failures.
Fixes#9663
Add support for saving graph runtime optimizations in an ORT format model. The idea is to allow some optimizations to be "replayed" at runtime in a minimal build. The replaying part will be in a future change.
* re-hipify all rocm EP sources
* fix all other files affected by re-hipify
* add cuda_provider_factory.h to amd_hipify.py
* do not use cudnn_conv_algo_search in ROCm EP, missing reduce min registration
* Fix ReduceConsts template specialization introduced in #9101.
Fixes the error when building for ROCm 4.3.1:
error: too many template headers for onnxruntime::rocm::ReduceConsts<__half>::One (should be 0)
* fix flake8 error in amd_hipify.py
* speed up hipify with concurrent.futures
* flake8 fix in amd_hipify.py
* Remove unused NodeArgs
* Handle case where a node arg from an initializer from initializer_names_to_preserve
* Fix CI failure
* update test
* Fix outer scope node args failure
* Use NodeArg* as the key of the std::set instead of string
* Minor updates
* implement cuda provider
* define profiler common
* call start after register
* add memcpy event
* add cuda correlation
* format code
* add cupti to test path
* switch to CUpti_ActivityKernel3
* reset cupti path
* fix test case
* fix trt pipeline
* add namespace
* format code
* exclude training from testing
* remove mutex
Add documentation for these C API functions:
RunOptionsGetRunLogSeverityLevel
RunOptionsGetRunLogVerbosityLevel
RunOptionsGetRunTag
RunOptionsSetRunLogSeverityLevel
RunOptionsSetRunLogVerbosityLevel
RunOptionsSetRunTag
Update some existing documentation.
* updates for picking pnnx commit
* add tests filter to c# tests
* plus test fixes
* fix versioning for contrib ops
* fix tests
* test filter for optional ops
* more versioning related updates
* fix test
* fix layernorm spec
* more updates
* update docs
* add more test filters
* more filters
* update binary size threshold
* update docs
* draft - enable model local function
* enable model local functions in ORT
* update to latest rel onnx commit
* plus tests
* plus more updates
* plus updates
* test updates
* Fix for nested functions + shape inference
* plus bug fix and updates per review
* plus fixes per review
* plus test updates
* plus updates per review
* plus fixes
* fix a test
* support register external ep lib inforation; make eager mode share the same ep pools with training workloads
* fix inference code
* fix build break
* fix the message
* seperate the training python module; share the execution proivder instance
* fix build break
* fix cuda test crash; reorg the python module code base
* se correct env
* use provider customized hash func
* fixbuild break
* fix rocm break
* use const ref in argument
* rename the file
* move hash func to trainiing module
* Revert "Cleanup C# bindings to add EP (#8810)"
This reverts commit b21ea00020.
* Add back in a minimal set of changes.
Provide stubs in for a limited set of things
- things called from C# using a static lib of ORT built for mac/ios
- things in OrtApis that are not included in the build by default
- things in OrtApis that are excluded in a minimal build
* Cleanup order or EPs in test
* Fix unused function in ROCM build
Add cmake parameter and #ifdefs to allow for disabling sparse tensor support. This comes with a significant binary size cost so we want to be able to exclude it in a minimal build.
Fix C# add EP bindings.
Add stubs to ORT so that if EP is not included in the build we return a graceful error message.
Move declaration of stubs into C API and out for EP so they're in one place and are easier to use (no extra header required in the C/C++ world and consistent with the CUDA EP setup).
Fix inconsistency in ROCM EP.
Cleanup a few other things.
Add IsSparseTensor
Add CreateSparseTensor
Add utilities and test fully sparse instantiation
Fully sparse blocksparse
Add test and docs for fully sparse tensor instantiation
Rework creation API
Use API
Non string API
Retrofit of existing String API
Add tests
Add documentation
Address build issues (Winml pending)
Add inference test
Bump binary size
Add ifdef DISABLE CONTRIB
* Do not copy the model_data when session is started by CreateSessionFromArray
* Add config option for disabling copy model bytes
* Add one additional test
* Address CR comments
SparseTensor support
Implement Builder pattern
Fix support for 1-D and 2-D COO indices
Implement and test CSR support.
Handle shape inference for SparseTensors
Implement conversion for COO, CSR and tests.
Address the case where constant sparse initializer is the output.
Implement test infra for SparseTensors
Implement SparseDenseMatMul for Csr and COO and tested it.
Add hash for SparseToDenseMatMul
Finish shared provider refactor
Refactor GetOrCreate to Create
Working on py interface
Expose OrtDevice and use it in allocate_numpy
Adjust Sparse interfaces, add support for string SparseTensor. Add tests.
Add and test to_cuda()
Add accessors to format specific indices
Test values and indices views, read-only flag, after GC access
Add sparse related methods to OrtValue
Re-work SparseTensor wrapper, add OrtValue methods
Rework numpy_array_to_cuda/to_cpu
Add run_with_ort_values
Add models and test sparse_mat_mul with run_with_ort_values
Refactor sparse tensor to use a single buffer
Ifdef x86 Eigen CSR sparse matmul implementation
Exclude broken test, check for string type when copying cross device
Split pybind schema, regenerate docs, add exclusion
Conditionally exclude schema module
Update docs fix cuda build
Add test to a filter and renerate JS docs
Add conversion and test string support for sparse tensors
Exclude conversion utils from minimal build
Add CUDA Memcpy and adjust provider interfaces