Commit graph

9 commits

Author SHA1 Message Date
Justin Chu
c203d89958
Update ruff and clang-format versions (#21479)
ruff -> 0.5.4
clang-format -> 18
2024-07-24 11:50:11 -07:00
mindest
5b9369e93c
Fix typos according to reviewdog report. (#21335)
### Description
Fix typos based on reviewdog report but with some
exceptions/corrections.
2024-07-22 13:37:32 -07:00
Jian Chen
1ad2c0a4b2
fix Window_CI in Github Action (#21070)
### Description
fix Window_CI in Github Action
2024-06-18 23:14:08 -07:00
Dmitri Smirnov
08ecf30e0b
Implement numpy array over CPU OrtValues on return values (#20539)
### Description
Create numpy arrays based on the native buffers of returned OrtValues.
Hold on to the OrtValue until the numpy array is garbage collected.

### Motivation and Context
This saves cpu on tensor copies and addresses customer concerns.
2024-05-08 10:56:36 -07:00
Tang, Cheng
a81faee41e
Multi-stream execution support (#13495)
**Description**: This PR including following works:
1. provide stream and related synchronization abstractions in
onnxruntime.
2. enhance onnxruntime's execution planner / executor / memory arena to
support execute multiple streams in parallel.
3. deprecate the parallel executor for cpu.
4. deprecate the Fence mechanism. 
5. update the cuda / tensorrt EP to support the stream mechanism,
support running different request in different cuda stream.

**Motivation and Context**
- Why is this change required? 
currently, the execution plan is just a linear list of those primitives,
ort will execute them step by step. For any given graph, ORT will
serialize it to a fixed execution order. This sequential execution
design simplifies most scenarios, but it has the following limitations:
1. it is difficult to enable inter-node parallelization, we have a
half-baked parallel executor but it is very difficult to make it work
with GPU.
2. The fence mechanism can work with single gpu stream + cpu thread
case, but when extend to multiple stream, it is difficult to manage the
cross GPU stream synchronizations.
3. our cuda EP rely on the BFCArena to make the memory management work
with the GPU async kernels, but current BFCArena is not aware of the
streams, so it doesn't behavior correctly when run with multiple
streams.

This PR enhance our existing execution plan and executor to support
multiple stream execution. we use an unified algorithm to mange both
single stream and multiple stream scenarios.
This PR mainly focus on the infrastructure support for multiple stream
execution, that is said, given a valid stream assignment, onnxruntime
can execute it correctly. How to generate a good stream assignment for a
given model will be in the future PR.

Co-authored-by: Cheng Tang <chenta@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
Co-authored-by: Cheng Tang <chenta@microsoft.com>
Co-authored-by: RandySheriffH <48490400+RandySheriffH@users.noreply.github.com>
Co-authored-by: Randy Shuai <rashuai@microsoft.com>
Co-authored-by: cao lei <jslhcl@gmail.com>
Co-authored-by: Lei Cao <leca@microsoft.com>
2022-12-15 07:39:29 -08:00
Xavier Dupré
b88fb68fac
Adds missing numpy type when looking for the ort correspondance (#10943) 2022-03-22 14:44:48 -07:00
Changming Sun
4e9e01cb3c
Fix SDL warnings in CPU EP (#9975) 2021-12-19 20:54:29 -08:00
Scott McKay
912e50f61c
Add CI minimal build with all options disabled. Fix python binding code if sparse tensors are disabled. (#9898)
* Add 2 builds to validate the cmake defines for excluding optional components work in both full and minimal builds.

* Create empty config for no-ops build

* Create empty config for no-ops build - attempt #2

* Create empty config for no-ops build - attempt #3

* Update python binding code to work when sparse tensors are disabled.
2021-12-03 06:56:51 +10:00
Dmitri Smirnov
950fe5e28b
Implement SparseTensor and infrastructure suppport and advance ONNX commit (#8038)
SparseTensor support
  Implement Builder pattern
  Fix support for 1-D and 2-D COO indices
  Implement and test CSR support.
  Handle shape inference for SparseTensors
  Implement conversion for COO, CSR and tests.
  Address the case where constant sparse initializer is the output.
  Implement test infra for SparseTensors
  Implement SparseDenseMatMul for Csr and COO and tested it.
  Add hash for SparseToDenseMatMul
  Finish shared provider refactor
  Refactor GetOrCreate to Create
  Working on py interface
  Expose OrtDevice and use it in allocate_numpy
	Adjust Sparse interfaces, add support for string SparseTensor. Add tests.
	Add and test to_cuda()
	Add accessors to format specific indices
	Test values and indices views, read-only flag, after GC access
	Add sparse related methods to OrtValue
	Re-work SparseTensor wrapper, add OrtValue methods
	Rework numpy_array_to_cuda/to_cpu
	Add run_with_ort_values
	Add models and test sparse_mat_mul with run_with_ort_values
	Refactor sparse tensor to use a single buffer
        Ifdef x86 Eigen CSR sparse matmul implementation
        Exclude broken test, check for string type when copying cross device
       Split pybind schema, regenerate docs, add exclusion
       Conditionally exclude schema module
       Update docs fix cuda build
       Add test to a filter and renerate JS docs
      Add conversion and test string support for sparse tensors
      Exclude conversion utils from minimal build
      Add CUDA Memcpy and adjust provider interfaces
2021-07-22 15:24:36 -07:00