onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-05-16 21:00:14 +00:00

Yufeng Li ceeb1a65d6 Add quantization support of GEMM directly with QGemm (#8447) QGemm takes quantized A, B, and C, plus the quantization parameters of output Y; C and the quantization parameters of Y are optional. Its output is either quantized or full precision, depending on whether the quantization parameters of Y exist: if they are provided, the output is requantized; otherwise it is full precision. Compared with QLinearMatMul and MatMulInteger, QGemm supports the transpose, alpha, and beta attributes. The formula for quantized GEMM is: Y = alpha * scale_a * scale_b * ((A_int8 - zp_a) * (B_int8 - zp_b) + C_int32), in which C_int32 is quantized with the formula: C_int32 = (beta * C) / (alpha * scale_a * scale_b)		2021-07-27 21:21:49 -07:00
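The formula in the commit message above can be checked with a small NumPy sketch. This is an illustration only, not the ONNX Runtime kernel: the function name `qgemm_reference` and its parameter names are hypothetical, it covers only the full-precision output path (no quantization parameters for Y), it omits the transpose attributes, and rounding C_int32 to the nearest integer is an assumption that the commit message does not state.

```python
import numpy as np

def qgemm_reference(a_q, scale_a, zp_a, b_q, scale_b, zp_b,
                    c=None, alpha=1.0, beta=1.0):
    """Hypothetical reference for the QGemm formula (full-precision output).

    Y = alpha * scale_a * scale_b * ((A - zp_a) @ (B - zp_b) + C_int32)
    C_int32 = (beta * C) / (alpha * scale_a * scale_b)
    """
    factor = alpha * scale_a * scale_b

    # Widen to int32 before subtracting zero points to avoid 8-bit overflow.
    a = a_q.astype(np.int32) - np.int32(zp_a)
    b = b_q.astype(np.int32) - np.int32(zp_b)
    acc = a @ b

    if c is not None:
        # Assumption: the bias is rounded to the nearest integer.
        c_int32 = np.rint(beta * np.asarray(c, dtype=np.float64) / factor)
        acc = acc + c_int32.astype(np.int32)

    return factor * acc.astype(np.float64)
```

When the inputs are exactly representable, the result matches the float computation `alpha * (A_real @ B_real) + beta * C`, where `A_real = scale_a * (A - zp_a)` and `B_real = scale_b * (B - zp_b)`.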
..
execution_providers/images	Remove docs that have been migrated to https://onnxruntime.ai/docs (#6225)	2021-02-05 18:09:27 -08:00
images	Expand the documentation on using compiling EPs with a minimal build (#5893)	2020-12-02 09:12:36 +10:00
python	Improves documentation, show InferenceSession contructor attributes (#8494)	2021-07-26 15:58:47 +02:00
ABI_Dev_Notes.md
Android_testing.md	Removed BUILD.md from master as source now lives in gh-pages (#6709)	2021-02-19 11:34:21 -08:00
C_API_Guidelines.md	Add C API Guidelines document (#5686)	2020-11-04 18:50:31 -08:00
cmake_guideline.md
Coding_Conventions_and_Standards.md	Change onnxruntime::make_unique to std::make_unique (#7502)	2021-04-29 17:04:53 -07:00
ContribOperators.md	Add quantization support of GEMM directly with QGemm (#8447)	2021-07-27 21:21:49 -07:00
FAQ.md
How_To_Update_ONNX_Dev_Notes.md	CGManifest - add training entries and generate entries for submodules. (#3933)	2020-05-15 13:34:18 -07:00
Model_Test.md
NotesOnThreading.md	Support multi-loop parallel sections, use multi-loop sections in GRU (#5602)	2020-11-10 12:24:57 +00:00
ONNX_Runtime_Server_Usage.md	Update docs/ONNX_Runtime_Server_Usage.md (#7818)	2021-05-26 16:17:20 -07:00
onnxruntime_dependencies.dot
onnxruntime_dependencies.png
onnxruntime_extensions.md	Update submodule onnxruntime-extensions. (#8282)	2021-07-13 10:21:11 +08:00
OperatorKernels.md	Add quantization support of GEMM directly with QGemm (#8447)	2021-07-27 21:21:49 -07:00
ORTMobilePackageOperatorTypeSupport.md	Add supported operators/types documentation for the ORT Mobile package (#7807)	2021-05-26 15:57:40 +10:00
PR_Guidelines.md
Privacy.md	[C# and Python APIs] Expose knobs to enable/disable platform telemetry collection (#5481)	2020-10-21 10:32:13 -07:00
Python_Dev_Notes.md
Reduced_Operator_Kernel_build.md	Support required types when excluding typed registrations (#6871)	2021-03-08 08:22:07 -08:00
ReleaseManagement.md	Updated TPN for OpenMPI and cleanup (#3932)	2020-05-14 11:42:44 -07:00
Roadmap.md	Doc updates for 1.5 (#5302)	2020-09-30 09:53:33 -07:00
Server.md	Update documentation for contributing a PR and add deprecation notices for PyOp and ORT server. (#6172)	2020-12-18 02:00:42 -08:00
Versioning.md	bumping onnxruntime version to 1.8.1 (#8429)	2021-07-19 16:48:56 -07:00
WinML_principles.md	Winml_principles_change (#5727)	2020-11-12 10:39:24 -08:00