onnxruntime/onnxruntime
Patrice Vignola 49512e558a
[DML EP] Add I/O binding and If operator (#16859)
Supporting I/O binding in the DML EP and registering the `If` operator
for it lets us avoid copying the past/present key/values back and
forth between the CPU and the GPU after every token.

This gives us a 25% performance increase for Dolly V2 with 128 tokens on
an RTX 4090.
2023-07-31 19:45:59 -07:00
contrib_ops [JS/Web] Added Gelu contrib operator support to JSEP (#16909) 2023-07-31 09:18:58 -07:00
core [DML EP] Add I/O binding and If operator (#16859) 2023-07-31 19:45:59 -07:00
python [DML EP] Add I/O binding and If operator (#16859) 2023-07-31 19:45:59 -07:00
test Extend saving models optimized by inference session (#16912) 2023-07-31 16:39:35 -07:00
tool/etw
wasm [js/web] enable ONNX Runtime Web error messages in JS (#16335) 2023-06-15 09:45:41 -07:00
__init__.py ExecutionProvider API refactor - move allocator from EP level to SessionState level and indexed by OrtDevice (#15833) 2023-06-19 17:44:45 -07:00
ReformatSource.ps1
ReformatSourcePython.bat
VSCodeCoverage.runsettings