onnxruntime/docs
Tianlei Wu eb2ac72fa9
Stable Diffusion CUDA Optimizations Part 4 (#14680)
(1) Support packed QKV format in MultiHeadAttention. This format can
avoid the add-bias transpose when the TRT fused kernel is used.
(2) Add a cache for the cumulated sequence length computation. For SD, it
only needs to be computed once since the sequence length is fixed.
(3) Do not allocate the qkv workspace for packed KV or QKV, to save memory.
(4) Add unit tests for the packed KV and packed QKV formats in
MultiHeadAttention.
(5) Mark some fusion options for SD only.
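
The caching idea in item (2) can be illustrated with a minimal Python sketch. This is not the CUDA implementation from the commit: the function name `cumulated_seq_len` and the use of `functools.lru_cache` are assumptions made for illustration only. It shows why a fixed sequence length means the offsets need to be computed just once per (batch size, sequence length) pair.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def cumulated_seq_len(batch_size: int, sequence_length: int) -> tuple:
    """Hypothetical helper: offsets (0, S, 2S, ..., B*S) for a fixed-length batch."""
    # With a fixed sequence length S, the cumulated lengths are multiples of S.
    return tuple(i * sequence_length for i in range(batch_size + 1))


# The first call computes the offsets; later calls with the same
# (batch_size, sequence_length) are served from the cache.
offsets = cumulated_seq_len(2, 77)
offsets_again = cumulated_seq_len(2, 77)
```

Keying the cache on the shape is sufficient here only because no padding or variable-length input is assumed; with variable lengths the offsets would depend on the data.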

Performance tests show a slight improvement on T4. Average latency is reduced
by 0.15 seconds (from 5.25s to 5.10s) for 512x512 in 50 steps for SD 1.5
models. Memory usage drops from 5.1GB to 4.8GB.
2023-02-15 14:55:42 -08:00
..
c_cxx Upgrade doxygen to fix C API docs build issue (#13950) 2023-02-03 09:43:29 -08:00
execution_providers/images
images
python Bump ORT version number (#14226) 2023-01-26 12:33:47 -08:00
ABI_Dev_Notes.md skip windows GPU check if changes only in doc (#13248) 2022-10-11 13:51:44 +08:00
Android_testing.md
C_API_Guidelines.md Replace 'master' branch ref to 'main' in the code (#12547) 2022-08-22 10:48:12 -07:00
cmake_guideline.md fix some typo in docs (#13212) 2022-10-07 15:58:18 -07:00
Coding_Conventions_and_Standards.md Fixed a minor typo (#13194) 2022-10-05 12:10:14 -07:00
ContribOperators.md Stable Diffusion CUDA Optimizations Part 4 (#14680) 2023-02-15 14:55:42 -08:00
FAQ.md Fix typo enviroment => environment (#13195) 2022-10-03 17:02:26 -07:00
How_To_Update_ONNX_Dev_Notes.md Remove exclusions for ONNX model tests that now pass. (#14337) 2023-01-24 08:04:27 +10:00
Memory_Optimizer.md Add guidelines for ORTModule (#13553) 2022-11-04 19:42:10 +08:00
Model_Test.md
NotesOnThreading.md Replace 'master' branch ref to 'main' in the code (#12547) 2022-08-22 10:48:12 -07:00
ONNX_Runtime_Server_Usage.md
onnxruntime_dependencies.dot
onnxruntime_dependencies.png
onnxruntime_extensions.md replace 'master' branch ref to 'main' for onnx repo (#12678) 2022-08-30 13:41:42 -07:00
OperatorKernels.md Stable Diffusion CUDA Optimizations Part 3 (#14646) 2023-02-14 12:46:50 -08:00
ORT_Format_Update_in_1.13.md Update ORT format v5 change docs to cover limited backwards compatibility in 1.14. (#14413) 2023-01-25 08:23:12 -08:00
ORTMobilePackageOperatorTypeSupport.md Replace 'master' branch ref to 'main' in the code (#12547) 2022-08-22 10:48:12 -07:00
ORTModule_Training_Guidelines.md Refactor training build options (#13964) 2023-01-03 13:28:16 -08:00
PR_Guidelines.md Add guidelines for writing a good PR. (#3830) 2020-05-05 16:28:21 -07:00
Privacy.md [C# and Python APIs] Expose knobs to enable/disable platform telemetry collection (#5481) 2020-10-21 10:32:13 -07:00
Python_Dev_Notes.md
Reduced_Operator_Kernel_build.md replace 'master' branch ref to 'main' for onnx repo (#12678) 2022-08-30 13:41:42 -07:00
ReleaseManagement.md
Roadmap.md Replace 'master' branch ref to 'main' in the code (#12547) 2022-08-22 10:48:12 -07:00
Server.md
TVM_EP.md [C#][TVM EP] Fix issues related to using TVM EP in C# front-end (#12958) 2022-09-16 16:04:59 +02:00
Versioning.md replace 'master' branch ref to 'main' for onnx repo (#12678) 2022-08-30 13:41:42 -07:00
WinML_principles.md Replace 'master' branch ref to 'main' in the code (#12547) 2022-08-22 10:48:12 -07:00