onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-20 19:12:24 +00:00

Hubert Lu f4ba199bad Optimize FastGelu with float2 and float4 vectorized kernels on ROCm (#11491)	2022-06-24 12:46:17 -07:00
* Using vectorized loads (float2) for fp16 to improve performance
* Fix a few warnings from cpplint
* Fix a few warnings from cpplint
* Use __float2half2_rn and fix some cpplint warnings
* Move some computations to LaunchFastGeluKernel
* Fix some Lint C++ warning
* Using vectorized loads (float4) for fp16 to improve performance
* Switch whether to optimize FastGelu with float4 vectorization
* Switch to float4 memory access based on input_length in FastGelu
* Comment how to set the threshold of float2 and float4 vectorized kernels
* Add FastGelu fp16 unit tests for bias_length = 2 and 8
* Make vectorized kernels generic with aligned_vector
* Unify the vectorized kernels with/without bias
* Refactor the code to suppress cpplint warnings
* Solve formatting issues
* Remove cudaDeviceProp from FastGeluKernel and LaunchFastGeluKernel
* Move fast_gelu_impl.h to rocm/bert
* Fix some Lint C++ warnings and code alignment
..
github	Fix orttraining-linux-ci-pipeline - Symbolic shape infer (#11965)	2022-06-23 08:23:36 -07:00
__init__.py
amd_hipify.py	Optimize FastGelu with float2 and float4 vectorized kernels on ROCm (#11491)	2022-06-24 12:46:17 -07:00
build.py	Fix orttraining-linux-ci-pipeline - Symbolic shape infer (#11965)	2022-06-23 08:23:36 -07:00
clean_docker_image_cache.py	Format all python files under onnxruntime with black and isort (#11324)	2022-04-26 09:35:16 -07:00
coverage.py	Format all python files under onnxruntime with black and isort (#11324)	2022-04-26 09:35:16 -07:00
gen_def.py	Snpe ep (#11665)	2022-06-03 14:10:02 -07:00
get_docker_image.py	Set black's target version (#11370)	2022-04-27 14:52:19 -07:00
logger.py	Format all python files under onnxruntime with black and isort (#11324)	2022-04-26 09:35:16 -07:00
op_registration_utils.py	Format all python files under onnxruntime with black and isort (#11324)	2022-04-26 09:35:16 -07:00
op_registration_validator.py	Format all python files under onnxruntime with black and isort (#11324)	2022-04-26 09:35:16 -07:00
policheck_exclusions.xml	A new pipeline to replace the existing WindowsAI packaging pipeline (#10646)	2022-03-03 08:56:49 -08:00
reduce_op_kernels.py	Include layout transformation ops in extended minimal build and above. (#11355)	2022-04-27 10:31:02 -07:00
requirements.txt	Bump numpy from 1.19.2 to 1.21.0 in /tools/ci_build	2022-01-12 17:45:35 -08:00
upload_python_package_to_azure_storage.py	Format all python files under onnxruntime with black and isort (#11324)	2022-04-26 09:35:16 -07:00
upload_python_package_to_azure_storage_with_python.py	Format all python files under onnxruntime with black and isort (#11324)	2022-04-26 09:35:16 -07:00