onnxruntime/tools
Hubert Lu dbcf54aa41
Add hipified SkipLayerNorm code for ROCmEP (#12107)
* First attempt for half2 vectorized memory access in SkipLayerNorm

* Add some functions for debugging

* Clean up the code

* Clean up the code

* Generalize the vectorized kernels with aligned_vector and remove cudaDeviceProp

* Add a unit test for a larger input size

* Fix some Lint C++ warnings

* Use ILP = 4 for the vectorized kernels

* Rewrite the vectorized kernel and templatize ComputeSkipLayerNorm

* Use conditional operator for input_v

* Refactor LaunchSkipLayerNormKernel and replace the original SkipLayerNormKernelSmall with the vectorized kernel

* Clean some comments and rename the layernorm function

* Use ComputeSkipLayerNorm to replace LaunchSkipLayerNormKernel

* Resolve a Lint C++ warning

* Fix SkipLayerNormBatch1_Float16_vec output data

* Add hipified code of bert SkipLayerNorm for ROCmEP

* Resolve some Lint C++ warnings

* Resolve some Lint C++ warnings

* Resolve some Lint C++ warnings

* Resolve Python formatting issue
2022-07-06 22:13:11 -07:00
android_custom_build Format all python files under onnxruntime with black and isort (#11324) 2022-04-26 09:35:16 -07:00
ci_build Add hipified SkipLayerNorm code for ROCmEP (#12107) 2022-07-06 22:13:11 -07:00
doc Format all python files under onnxruntime with black and isort (#11324) 2022-04-26 09:35:16 -07:00
natvis Refactor transformers and other code to reduce memory allocation calls (#10523) 2022-02-24 16:17:14 -08:00
nuget DML EP Update to DML 1.9 (#12090) 2022-07-05 16:30:54 -07:00
perf_view fix json format (#11046) 2022-03-30 16:15:33 -07:00
python Set black's target version (#11370) 2022-04-27 14:52:19 -07:00