Mirror of https://github.com/saymrwulf/onnxruntime.git, synced 2026-06-04 23:59:56 +00:00
* Using vectorized loads (float2) for fp16 to improve performance
* Fix a few warnings from cpplint
* Fix a few warnings from cpplint
* Use __float2half2_rn and fix some cpplint warnings
* Move some computations to LaunchFastGeluKernel
* Fix some Lint C++ warning
* Using vectorized loads (float4) for fp16 to improve performance
* Switch whether to optimize FastGelu with float4 vectorization
* Switch to float4 memory access based on input_length in FastGelu
* Comment how to set the threshold of float2 and float4 vectorized kernels
* Add FastGelu fp16 unit tests for bias_length = 2 and 8
* Make vectorized kernels generic with aligned_vector
* Unify the vectorized kernels with/without bias
* Refactor the code to suppress cpplint warnings
* Solve formatting issues
* Remove cudaDeviceProp from FastGeluKernel and LaunchFastGeluKernel
* Move fast_gelu_impl.h to rocm/bert
* Fix some Lint C++ warnings and code alignment

| Name | | |
|---|---|---|
| github | ||
| __init__.py | ||
| amd_hipify.py | ||
| build.py | ||
| clean_docker_image_cache.py | ||
| coverage.py | ||
| gen_def.py | ||
| get_docker_image.py | ||
| logger.py | ||
| op_registration_utils.py | ||
| op_registration_validator.py | ||
| policheck_exclusions.xml | ||
| reduce_op_kernels.py | ||
| requirements.txt | ||
| upload_python_package_to_azure_storage.py | ||
| upload_python_package_to_azure_storage_with_python.py | ||