mirror of
https://github.com/saymrwulf/onnxruntime.git
synced 2026-05-14 20:48:00 +00:00
### Description 1. Added CUDA EP support for blocked quantization in QuantizeLinear and DequantizeLinear ops. 2. Currently CUDA EP blocked quantization only supports int4/uint4 quantized types and float32/float16 unquantized types. 3. Added CUDA EP support in QDQ selector/action transformer. CUDA EP is only added to DQ + MatMul -> MatMulNBits rule. Other rules' EP support are not changed. ### Motivation and Context ONNX opset 21 introduced blocked quantization for Q/DQ opts. ORT originally only supports CPU EP blocked quantization. |
||
|---|---|---|
| .. | ||
| c_cxx | ||
| execution_providers/images | ||
| images | ||
| python | ||
| ABI_Dev_Notes.md | ||
| Android_testing.md | ||
| C_API_Guidelines.md | ||
| cmake_guideline.md | ||
| Coding_Conventions_and_Standards.md | ||
| ContribOperators.md | ||
| FAQ.md | ||
| How_To_Update_ONNX_Dev_Notes.md | ||
| Memory_Optimizer.md | ||
| Model_Test.md | ||
| NotesOnThreading.md | ||
| ONNX_Runtime_Server_Usage.md | ||
| onnxruntime_dependencies.dot | ||
| onnxruntime_dependencies.png | ||
| onnxruntime_extensions.md | ||
| OperatorKernels.md | ||
| ORT_Format_Update_in_1.13.md | ||
| ORT_Use_Triton_Kernel.md | ||
| ORTModule_Convergence_Notes.md | ||
| ORTModule_ModuleWithLoss_Wrapper.md | ||
| ORTModule_PythonOp_Notes.md | ||
| ORTModule_Training_Guidelines.md | ||
| PR_Guidelines.md | ||
| Privacy.md | ||
| Reduced_Operator_Kernel_build.md | ||
| ReleaseManagement.md | ||
| Roadmap.md | ||
| Server.md | ||
| TVM_EP.md | ||
| Versioning.md | ||
| WinML_principles.md | ||