mirror of
https://github.com/saymrwulf/onnxruntime.git
synced 2026-06-04 23:59:56 +00:00
There is an accuracy regression in the GPT-2 model: the top-1 match rate (vs. the PyTorch model) drops by about 1%. The cause is that the fused causal attention uses fp16 accumulation. This change disables it by default and adds an environment variable, ORT_ENABLE_FUSED_CAUSAL_ATTENTION=1, to turn it on manually. It also updates the GPT-2 parity test script to generate left-side padding, reflecting actual usage. To test: ``` python -m onnxruntime.transformers.models.gpt2.convert_to_onnx -m gpt2 --output gpt2.onnx -o -p fp16 --use_gpu ``` The top1-match-rate in the output is on par with ORT 1.13.1. |
||
|---|---|---|
| backend | ||
| datasets | ||
| providers/tvm | ||
| tools | ||
| torch_cpp_extensions | ||
| training | ||
| __init__.py | ||
| _ld_preload.py | ||
| _pybind_state.py.in | ||
| exported_symbols.lst | ||
| numpy_helper.h | ||
| onnxruntime_collect_build_info.py | ||
| onnxruntime_inference_collection.py | ||
| onnxruntime_pybind.h | ||
| onnxruntime_pybind_exceptions.cc | ||
| onnxruntime_pybind_exceptions.h | ||
| onnxruntime_pybind_iobinding.cc | ||
| onnxruntime_pybind_mlvalue.cc | ||
| onnxruntime_pybind_mlvalue.h | ||
| onnxruntime_pybind_module.cc | ||
| onnxruntime_pybind_ortvalue.cc | ||
| onnxruntime_pybind_schema.cc | ||
| onnxruntime_pybind_sparse_tensor.cc | ||
| onnxruntime_pybind_state.cc | ||
| onnxruntime_pybind_state.h | ||
| onnxruntime_pybind_state_common.cc | ||
| onnxruntime_pybind_state_common.h | ||
| onnxruntime_validation.py | ||
| pybind.def | ||
| version_script.lds | ||
| version_script_expose_onnx_protobuf.lds | ||
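
The commit message above describes an opt-in switch, `ORT_ENABLE_FUSED_CAUSAL_ATTENTION=1`, and a parity command. A minimal sketch of how the two settings might be compared from Python follows. The helper names `fused_causal_attention_env` and `parity_command` are hypothetical and not part of ONNX Runtime. Only the variable name and the command-line arguments come from the commit message. The sketch assumes the variable is read by the process that runs the model, so it is set in the child process environment before launch.

```python
import os
import sys

# Variable name taken from the commit message; everything else is illustrative.
FUSED_CAUSAL_ATTENTION_VAR = "ORT_ENABLE_FUSED_CAUSAL_ATTENTION"


def fused_causal_attention_env(enable, base_env=None):
    """Return a copy of the environment with the switch set ("1") or cleared."""
    env = dict(os.environ if base_env is None else base_env)
    if enable:
        env[FUSED_CAUSAL_ATTENTION_VAR] = "1"
    else:
        # Leaving the variable unset keeps the default, which the commit
        # message describes as "disabled".
        env.pop(FUSED_CAUSAL_ATTENTION_VAR, None)
    return env


def parity_command(output_path="gpt2.onnx"):
    """Build the conversion and parity command quoted in the commit message."""
    return [
        sys.executable, "-m",
        "onnxruntime.transformers.models.gpt2.convert_to_onnx",
        "-m", "gpt2",
        "--output", output_path,
        "-o", "-p", "fp16", "--use_gpu",
    ]
```

Running `subprocess.run(parity_command(), env=fused_causal_attention_env(False))` and then the same call with `True` would, under these assumptions, produce two reports whose top1-match-rate values can be compared by hand. Building a fresh environment dictionary for each run avoids leaking the switch into the parent process.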