onnxruntime/cmake/external
luoyu-intel 459c750b03
Update x64 template kernel library for 'sqnbitgemm' (#19016)
### Description
1. Make the JBLAS code an external module of ORT.
2. Move the q4 GEMM code to contrib_ops.
3. Update the template kernel library to the v0.1 release.


### Motivation and Context
We found that current LLM performance in ORT is far below our
expectations. Here is some performance data collected on the Mistral-7B
model with a Xeon-8480:
Framework (8 threads) | prompt length=32, past_len=32 | prompt length=1, past_len=32
-- | -- | --
ORT-main | 1220ms | 263ms
Neural-speed | 564ms | 87ms
ORT-this PR | 597ms | 120ms

Although `Neural-speed` and `ORT-this PR` use the same int4 kernel code,
there is a 33ms (87ms vs. 120ms) latency gap between the two frameworks.
Our profiling statistics show that the total latency of `MatMulNBits`
is 86.7ms, while the total latency of all int4 GEMMs in `Neural-speed`
is 84.8ms. So the other ops introduce about 30ms of extra latency.
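
The breakdown above can be checked with a short calculation. This is only a sketch: the numbers are the ones quoted in the table and text for prompt length=1, and the variable names are illustrative, not part of either framework.

```python
# Latencies in ms for prompt length=1, past_len=32, taken from the table above.
neural_speed_total = 87.0
ort_pr_total = 120.0

# Total time spent in int4 GEMM kernels, as reported in the text.
ort_matmulnbits = 86.7
neural_speed_int4_gemm = 84.8

# End-to-end gap between the two frameworks.
framework_gap = ort_pr_total - neural_speed_total

# Time spent outside the int4 GEMM kernels in each framework.
ort_other_ops = ort_pr_total - ort_matmulnbits
neural_speed_other_ops = neural_speed_total - neural_speed_int4_gemm

# Extra non-GEMM overhead in ORT relative to Neural-speed.
extra_other_ops = ort_other_ops - neural_speed_other_ops

# Gap attributable to the int4 kernels themselves.
kernel_gap = ort_matmulnbits - neural_speed_int4_gemm

print(f"framework gap:        {framework_gap:.1f} ms")
print(f"ORT non-GEMM time:    {ort_other_ops:.1f} ms")
print(f"NS non-GEMM time:     {neural_speed_other_ops:.1f} ms")
print(f"extra non-GEMM time:  {extra_other_ops:.1f} ms")
print(f"int4 kernel gap:      {kernel_gap:.1f} ms")
```

The int4 kernels account for only about 2ms of the 33ms gap; the remaining roughly 31ms comes from time spent outside them, which matches the "about 30ms" estimate.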

The performance of MatMulNBits in this PR meets our expectations.

### Remaining Issues
1. On hybrid CPUs, such as the Core i9-12900K, the ONNXRuntime thread pool
uses TaskGranularityFactor to scale its number of threads. Our code
design does not expect this, and it may slow down hybrid CPU performance
by 30~40%.
2. Prepack uses a single thread, which makes session initialization very slow.
3. MatMulNBits with zero points falls through to COMP_FP32 even when
accuracy_level=4. Our COMP_INT8 IGemmCore path with zero points is
not optimized yet and will be updated in the future. So, for an int4
model with zero points, it makes no difference whether accuracy_level
is 0 or 4.
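
The fallback described in item 3 can be pictured with a small sketch. The function `select_compute_type` and the `ComputeType` enum are hypothetical and written only for illustration; they are not the actual ORT or kernel-library code, and the handling of accuracy levels other than 4 is a simplifying assumption.

```python
from enum import Enum


class ComputeType(Enum):
    """Hypothetical stand-in for the kernel compute types named above."""
    COMP_FP32 = "fp32"
    COMP_INT8 = "int8"


def select_compute_type(accuracy_level: int, has_zero_points: bool) -> ComputeType:
    """Illustrative dispatch only; not the real MatMulNBits implementation."""
    # Weights with zero points always fall through to FP32 compute,
    # because the INT8 path with zero points is not optimized yet.
    if has_zero_points:
        return ComputeType.COMP_FP32
    # Without zero points, accuracy_level=4 selects INT8 compute.
    if accuracy_level == 4:
        return ComputeType.COMP_INT8
    # Assumption: every other level is treated as FP32 in this sketch.
    return ComputeType.COMP_FP32


for level in (0, 4):
    with_zp = select_compute_type(level, has_zero_points=True)
    without_zp = select_compute_type(level, has_zero_points=False)
    print(f"accuracy_level={level}: zero points -> {with_zp.name}, "
          f"no zero points -> {without_zp.name}")
```

With zero points, both accuracy levels resolve to the same compute type in this sketch, which is why the setting has no effect for such models.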
2024-01-18 13:16:34 -08:00
..
emsdk@4e2496141e update to emsdk-3.1.51 (#18844) 2024-01-12 16:04:33 -08:00
git.Win32.2.41.03.patch Fix ability to use patch on Windows CI machines (#18356) 2023-11-11 07:32:14 +10:00
libprotobuf-mutator@7a2ed51a6b
onnx@b86cc54efc use onnx rel-1.15.0, update cgman, cmake/external and requirement hash (#18177) 2023-10-31 14:58:21 -07:00
abseil-cpp.cmake Update C/C++ dependencies: abseil, date, nsync, googletest, wil, mp11, cpuinfo and safeint (#15470) 2023-09-08 13:35:04 -07:00
abseil-cpp.natvis Create edges with arg positons correctly accounting for non-existing args (#18462) 2023-11-20 14:49:09 -08:00
composable_kernel.cmake [ROCm] Update CK version (#17628) 2023-11-13 15:43:38 -08:00
cutlass.cmake Fix build when flash attention and memory efficient attention are disabled (#18761) 2023-12-26 08:57:58 +08:00
dml.cmake Update DirectML nuget version to 1.13.1 (#19122) 2024-01-15 19:04:41 -08:00
dnnl.cmake [DNNL] add Arm Compute Library (ACL) backend for dnnl execution provider (#15847) 2023-12-01 09:16:44 -08:00
eigen.cmake Fix ability to use patch on Windows CI machines (#18356) 2023-11-11 07:32:14 +10:00
extensions.cmake Update C/C++ dependencies: abseil, date, nsync, googletest, wil, mp11, cpuinfo and safeint (#15470) 2023-09-08 13:35:04 -07:00
find_snpe.cmake
FindNumPy.cmake
helper_functions.cmake Improve cache hit rate in windows build (#15538) 2023-04-18 09:31:35 -07:00
ipp-crypto.cmake
mimalloc.cmake
neural_speed.cmake Update x64 template kernel library for 'sqnbitgemm' (#19016) 2024-01-18 13:16:34 -08:00
onnx_minimal.cmake Fix some build issues on MacOS with Xcode 14.3. (#15878) 2023-06-07 12:07:11 -07:00
onnx_protobuf.natvis Fix visualization issues with Attribute/Tensor protos (#17188) 2023-08-16 13:56:51 -07:00
onnxruntime_external_deps.cmake Revert "iOS packaging pipeline stability" (#19135) 2024-01-16 09:18:35 -08:00
protobuf_function.cmake Fix some build issues on MacOS with Xcode 14.3. (#15878) 2023-06-07 12:07:11 -07:00
pybind11.cmake
pyxir.cmake
tvm.cmake [TVM EP] Support zero copying TVM EP output tensor to ONNX Runtime output tensor (#12593) 2023-02-08 10:02:20 -08:00
wil.cmake Rework WIL dependency retrieval/usage (#17130) 2023-08-15 09:11:46 -07:00
xnnpack.cmake Update XNNPACK to latest version (#18038) 2023-11-03 09:04:28 -07:00