Various updates to the int8_t GEMMs:
1) Add ARM64 udot kernel to take advantage of dot product instructions available in newer cores. Some models run 4x faster than the stock implementation we used before.
2) Refactor the x64 kernels to share common code for AVX2(u8u8/u8s8/avxvnni) vs AVX512(u8u8/u8s8/avx512vnni) to reduce binary size.
3) Extend kernels to support per-column zero points for matrix B. This is not currently wired to an operator.
* Update EyeLike CPU kernel.
* Update Mod CPU kernel.
* Update Multinomial CPU kernel.
* Slight improvement to Pad CPU kernel binary size.
* Update RandomNormal[Like], RandomUniform[Like] CPU kernels.
* Added code for Relugrad with GPU support.
Signed-off-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>
* Add GPU support for DNNL ConvGrad
Signed-off-by: George Nash <george.nash@intel.com>
* Add GPU support for DNNL MaxPoolGrad
Updates to MaxPool for training with GPU
Update oneDNN to version 1.8.1
Signed-off-by: George Nash <george.nash@intel.com>
* Fixed issues found durring code review
- error in code comment
- using auto when the direct type would have been better
- removed ternary operators that were returning bool values
Signed-off-by: George Nash <george.nash@intel.com>
Co-authored-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>
Cleaning up some naming in the op kernel type control infrastructure.
"Supported types" was a bit semantically overloaded. Renamed it to "default types". They are the types that are supported by default.
Implement an alternate workaround for the LLVM x86 problem described in PR #5088. That change made the x86 assembly files build with the GNU assembler by using -fno-integrated-as
Implemented following change to avoid the error when using both --use_external_data_form and --precision int8 with GPT2LMHeadModel, which results in
line 161, in save_external_data; open(external_data_file_path, 'ab').close()
FileNotFoundError: [Errno 2] No such file or directory:
This may also be related to the identified bug #6047.
* add config allow_spinning
* add config allow_spinning
* set true as default
* split configures for inter and intra ops
Co-authored-by: Randy Shuai <rashuai@microsoft.com>
* Change msbuild condition for UAP
* update .netcore target as well
* create nuget packages with _native path
* validate path under _native directory for windowsai package
* pep8
* add diagnostic error message
* pep8
* use baseame
* lib\uap10.0
* uap10
* build\\uap10.0
* Manually binplace winmds into appx when PackageReference is used.
* always binplace winmd regardless of packagereference since c# should work with packages.config also
* resolve all paths to full paths to avoid some reference warnings
* move winmds out of lib folder to prevent automatic component registration
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
In the previous shared providers there aren't many OpKernel classes, and the existing Provider_OpKernel wrapper was fine. With the opposibility of making Cuda a shared provider, having this need to be changed per OpKernel adds a lot of complexity.
It was fairly straightforward to make OpKernel work with shared providers with minimal changes.
In this change, the ONNX_OPERATOR_* macros can also be shared with the shared providers.
Adds support for required types to the op kernel type control infrastructure. Required types are always enabled.
Added int64 as a required type for certain ops.
* update Attention operator spec to support pruned model
* update Attention and QAttention cpu & cuda kernel
* Fix invalid embed layer norm fusion test models.
* Handle case where bias_name is already quantized
If bias is shared between multiple nodes and we've already quantized it, just return the quantized name from the map
* Remove qType attribute from QuantizedValue and QuantizedInitializer
These are unused (and were incorrectly set in the case of int8 quantization)
* Add Reshape op to quantizer
* Add test for Reshape quant
* Fixed issue in python cmake to update wheel package
* Fixes python cmake issue for OV EP
Added post build step for libonnxruntime_providers_openvino
that copies the updated libonnxruntime_providers_openvino.so file
to /onnxruntime/capi directory every time this target is rebuilt.
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Removed post_build step from onnxruntime_python.cmake
Now that we have added the post build step to copy
onnxruntime_providers_openvino.so and providers_shared.so
to /onnxruntime/capi directory in onnxruntime_providers.cmake file.
so removing the duplication of the same from here.
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Fixed python cmake issue for OpenVINO-EP
->Fixed issue for both Linux and windows
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
Co-authored-by: MaajidKhan <n.maajidkhan@gmail.com>