onnxruntime/onnxruntime/python
Tianlei Wu c0d2472ede
Disable fused causal attention (#14732)
There is an accuracy regression in the GPT-2 model: the top-1 match rate (vs. the
PyTorch model) drops by about 1%. The cause is that fused causal attention uses fp16
accumulation. Disable it by default and add an environment variable,
ORT_ENABLE_FUSED_CAUSAL_ATTENTION=1, to turn it on manually.
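
A minimal sketch of opting back in from Python. Only the variable name and the value 1 come from this commit message. The helper names are hypothetical, and setting the variable before the session is created is an assumption about when it is read.

```python
import os

# Variable name taken from this commit message.
ENV_VAR = "ORT_ENABLE_FUSED_CAUSAL_ATTENTION"


def set_fused_causal_attention(enabled: bool) -> None:
    """Opt in to (or back out of) fused causal attention for this process."""
    if enabled:
        os.environ[ENV_VAR] = "1"
    else:
        # Removing the variable falls back to the default, which is disabled.
        os.environ.pop(ENV_VAR, None)


def create_gpu_session(model_path: str):
    """Create a CUDA session; call set_fused_causal_attention() beforehand."""
    # Imported lazily so the helper above works without onnxruntime installed.
    import onnxruntime as ort

    return ort.InferenceSession(model_path, providers=["CUDAExecutionProvider"])
```

Typical use would be `set_fused_causal_attention(True)` followed by `create_gpu_session("gpt2.onnx")`. Leaving the variable unset keeps the default (disabled) behavior.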

This change also updates the GPT-2 parity test script to generate left-side
padding, which reflects actual usage.

To test:
```
python -m onnxruntime.transformers.models.gpt2.convert_to_onnx -m gpt2 --output gpt2.onnx -o -p fp16 --use_gpu
```
The top1-match-rate in the output is on par with ORT 1.13.1.
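
For readers unfamiliar with the metric, a top-1 match rate can be sketched as the fraction of positions where two models pick the same highest-scoring token. This is an illustrative assumption, not the parity script's actual code, and the function names are hypothetical.

```python
from typing import Sequence


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def top1_match_rate(
    reference_logits: Sequence[Sequence[float]],
    candidate_logits: Sequence[Sequence[float]],
) -> float:
    """Fraction of positions where both models choose the same top-1 token."""
    if len(reference_logits) != len(candidate_logits):
        raise ValueError("both inputs must cover the same number of positions")
    if not reference_logits:
        raise ValueError("at least one position is required")
    matches = sum(
        argmax(ref) == argmax(cand)
        for ref, cand in zip(reference_logits, candidate_logits)
    )
    return matches / len(reference_logits)
```

Here `reference_logits` would hold the PyTorch outputs and `candidate_logits` the ONNX Runtime outputs, one row of vocabulary scores per position.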
2023-02-21 09:53:31 -08:00
..
backend replace 'master' branch ref to 'main' for onnx repo (#12678) 2022-08-30 13:41:42 -07:00
datasets
providers/tvm Format all python files under onnxruntime with black and isort (#11324) 2022-04-26 09:35:16 -07:00
tools Disable fused causal attention (#14732) 2023-02-21 09:53:31 -08:00
torch_cpp_extensions [ORTModule] ATen Support for aten::upsample_nearest (#13364) 2022-10-20 08:30:04 +08:00
training
__init__.py
_ld_preload.py
_pybind_state.py.in Make ORT callable from various Pytorch compilers (LazyTensor, TorchDynamo, etc) (#10460) 2022-08-22 09:40:40 -07:00
exported_symbols.lst
numpy_helper.h
onnxruntime_collect_build_info.py Format all python files under onnxruntime with black and isort (#11324) 2022-04-26 09:35:16 -07:00
onnxruntime_inference_collection.py Offline tuning (#14558) 2023-02-15 14:17:34 +08:00
onnxruntime_pybind.h fix windows ci debug build break (#11495) 2022-05-12 16:54:00 -07:00
onnxruntime_pybind_exceptions.cc
onnxruntime_pybind_exceptions.h
onnxruntime_pybind_iobinding.cc Adds missing numpy type when looking for the ort correspondance (#10943) 2022-03-22 14:44:48 -07:00
onnxruntime_pybind_mlvalue.cc Multi-stream execution support (#13495) 2022-12-15 07:39:29 -08:00
onnxruntime_pybind_mlvalue.h Move OrtValueVector from onnxruntime-training to onnxruntime (#11176) 2022-06-15 09:36:28 +02:00
onnxruntime_pybind_module.cc
onnxruntime_pybind_ortvalue.cc Enable ORT in TorchDynamo (#13259) 2022-11-01 11:19:29 -07:00
onnxruntime_pybind_schema.cc Pass SessionOptions to XnnpackProviderFactoryCreator. (#13318) 2022-12-10 14:23:46 +08:00
onnxruntime_pybind_sparse_tensor.cc Multi-stream execution support (#13495) 2022-12-15 07:39:29 -08:00
onnxruntime_pybind_state.cc Offline tuning (#14558) 2023-02-15 14:17:34 +08:00
onnxruntime_pybind_state.h
onnxruntime_pybind_state_common.cc Allow CUDA EP enable or disable TunableOp via session options and environment variable (#13601) 2022-11-15 14:43:54 +08:00
onnxruntime_pybind_state_common.h [oneDNN] Improved thread handling (#13618) 2023-01-31 14:37:13 -08:00
onnxruntime_validation.py Format all python files under onnxruntime with black and isort (#11324) 2022-04-26 09:35:16 -07:00
pybind.def
version_script.lds
version_script_expose_onnx_protobuf.lds