Mirror of https://github.com/saymrwulf/onnxruntime.git (synced 2026-05-16 21:00:14 +00:00)
Adding ARM64 depthwise convolution kernel for symmetric quantization

Motivation and Context

Two improvements over the current kernel code:

1. Signed int8 based instructions, so there is no need to extend from 8 bits to 16 bits before multiplication.
2. Unrolled loop with manual software pipelining.

Co-authored-by: Chen Fu <fuchen@microsoft.com>

| Name | | |
|---|---|---|
| docker | ||
| ort_minimal | ||
| build_yocto.sh | ||
| copy_strip_binary.sh | ||
| create_package.sh | ||
| extract_and_bundle_gpu_package.sh | ||
| java_copy_strip_binary.sh | ||
| java_linux_final_test.sh | ||
| run_build.sh | ||
| run_dockerbuild.sh | ||
| test_custom_ops_pytorch_export.sh | ||
| upload_code_coverage_data.sh | ||
| upload_ortsrv_binaries.sh | ||
| yocto_build_toolchain.sh | ||