onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-05-21 21:52:11 +00:00


Chen Fu 040c2f4517 x86/64 U8S8 Gemm Precision Fix (#12088)	2022-07-13 10:12:25 -07:00

Add a graph optimization that converts u8s8 matrix multiplication to u8u8 when needed.

On x86/64 platforms, specifically SSE4.1, AVX2 and AVX512 CPUs, u8s8 matrix multiplication runs faster than u8u8. Unfortunately, the higher performance comes with value overflow problems, as described in: https://www.intel.com/content/www/us/en/develop/documentation/onednn-developer-guide-and-reference/top/advanced-topics/nuances-of-int8-computations.html

This change adds a session option "session.x64quantprecision" (default off). Operators that call u8s8 matrix multiplications, e.g. QAttention, are converted to u8u8 when all of the following conditions are satisfied:

1. The current CPU is SSE4.1, AVX2 or AVX512 with no VNNI support.
2. The session option "session.x64quantprecision" is on.
3. The constant weight tensor contains values outside the [-64, 63] range.

Note that when the weight tensor is not constant, QDQS8ToU8Transformer should already convert it to u8.
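A minimal C++ sketch of why the [-64, 63] threshold in condition 3 matters. It assumes the non-VNNI u8s8 kernel behaves like the `pmaddubsw` instruction discussed in the Intel page linked above: each 16-bit lane adds two adjacent u8 × s8 products with signed saturation. All function names here are hypothetical illustrations, not onnxruntime or MLAS APIs. `ShiftS8ToU8` shows the common +128 shift from s8 to u8 (the weight zero point moves by the same 128); treating that as the conversion used by this change is an assumption.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <vector>

// Clamp a 32-bit value into int16 range, as a saturating 16-bit add would.
int16_t SaturateToInt16(int32_t v) {
  constexpr int32_t kMin = std::numeric_limits<int16_t>::min();
  constexpr int32_t kMax = std::numeric_limits<int16_t>::max();
  return static_cast<int16_t>(std::min(std::max(v, kMin), kMax));
}

// Assumed model of one 16-bit lane of a pmaddubsw-style u8s8 multiply-add:
// two adjacent u8*s8 products summed, then saturated to int16.
int16_t U8S8PairMadd(uint8_t a0, int8_t b0, uint8_t a1, int8_t b1) {
  int32_t exact = int32_t(a0) * b0 + int32_t(a1) * b1;
  return SaturateToInt16(exact);
}

// True when saturation changed the result, i.e. the lane overflowed.
bool PairOverflows(uint8_t a0, int8_t b0, uint8_t a1, int8_t b1) {
  int32_t exact = int32_t(a0) * b0 + int32_t(a1) * b1;
  return exact != U8S8PairMadd(a0, b0, a1, b1);
}

// Condition 3: some constant weight lies outside the 7-bit range [-64, 63].
bool WeightsNeedU8U8(const std::vector<int8_t>& weights) {
  return std::any_of(weights.begin(), weights.end(),
                     [](int8_t w) { return w < -64 || w > 63; });
}

// Shift s8 weights into u8 by adding 128; the weight zero point must be
// shifted by 128 as well so the dequantized values are unchanged.
std::vector<uint8_t> ShiftS8ToU8(const std::vector<int8_t>& weights) {
  std::vector<uint8_t> out;
  out.reserve(weights.size());
  for (int8_t w : weights) {
    out.push_back(static_cast<uint8_t>(int(w) + 128));
  }
  return out;
}
```

With weights in [-64, 63], the worst-case lane sum is 2 × 255 × 63 = 32130 or 2 × 255 × (-64) = -32640, both inside the int16 range. A weight of 127 gives 2 × 255 × 127 = 64770, which saturates to 32767, so only tensors that fail the range check need the slower u8u8 path.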
environment.h	Revert "Call pluggable EP's shutdown function in Environment::~Environment() (#11120)" (#11393)	2022-05-02 14:38:31 -07:00
experimental_onnxruntime_cxx_api.h
experimental_onnxruntime_cxx_inline.h	Deprecate APIs returning raw ptrs and provide replacements (#11922)	2022-06-24 09:50:04 -07:00
onnxruntime_c_api.h	Generalize native op creation (#11539)	2022-06-27 21:12:15 -07:00
onnxruntime_cxx_api.h	Generalize native op creation (#11539)	2022-06-27 21:12:15 -07:00
onnxruntime_cxx_inline.h	Generalize native op creation (#11539)	2022-06-27 21:12:15 -07:00
onnxruntime_run_options_config_keys.h	Add ability for memory arenas to "shrink" periodically (#7284)	2021-05-08 07:53:21 -07:00
onnxruntime_session_options_config_keys.h	x86/64 U8S8 Gemm Precision Fix (#12088)	2022-07-13 10:12:25 -07:00
snippets.dox	[C API Docs] Add docs for run options tag/log level accessors/modifiers. (#9045)	2021-09-14 08:53:35 -07:00