onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-02 03:55:34 +00:00

Prathik Rao 544407038d SimplifiedLayerNormalization Fusion BFloat16 support for Llama-v2 on A100 (#18898)	2024-02-14 10:05:16 -08:00

### Description

Adds bfloat16 as a supported dtype for SimplifiedLayerNormFusion which will provide speedup for Llama-v2 on A100 using bfloat16 numerical format.

_layernorm_optimized_training.onnx exported in bfloat16 vs. float16:_

![image](https://github.com/microsoft/onnxruntime/assets/31260940/8c0a5f0f-5fcb-4637-bcd9-f34272ec0284)

### Repro Instructions

```python
from torch import nn
from onnxruntime.training.ortmodule import ORTModule, DebugOptions, LogLevel
import torch

dtype = torch.bfloat16
# dtype = torch.float16

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(784, 10, dtype=dtype)
        self.layernorm = nn.LayerNorm([784], dtype=dtype)

    def forward(self, x):
        x = x.view(x.shape[0], -1)
        x = self.layernorm(x)
        x = self.fc(x)
        return x

model = Net()
model = ORTModule(model, DebugOptions(save_onnx=True, onnx_prefix='layernorm', log_level=LogLevel.INFO))
model.to("cuda")

images = torch.randn((8, 28, 28), dtype=dtype).to("cuda")
output = model(images)
```

### Motivation and Context

ONNX Runtime integration with Llama-v2 family of LLMs.

---------

Co-authored-by: Prathik Rao <prathikrao@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
..
c_cxx	Remove extraneous javascript includes (#17558)	2023-09-14 20:43:24 -07:00
execution_providers/images	Remove docs that have been migrated to https://onnxruntime.ai/docs (#6225)	2021-02-05 18:09:27 -08:00
images	API Documentation (#8948)	2021-09-09 22:04:51 -07:00
python	Bump ruff linter to 0.2.1 (#19471)	2024-02-08 16:08:27 -08:00
ABI_Dev_Notes.md	Fix a typo in ABI_Dev_Notes.md (#17832)	2023-10-09 07:51:34 -07:00
Android_testing.md	Removed BUILD.md from master as source now lives in gh-pages (#6709)	2021-02-19 11:34:21 -08:00
C_API_Guidelines.md	Replace 'master' branch ref to 'main' in the code (#12547)	2022-08-22 10:48:12 -07:00
cmake_guideline.md	fix some typo in docs (#13212)	2022-10-07 15:58:18 -07:00
Coding_Conventions_and_Standards.md	[docs] Specify Objective-C max line length. (#16503)	2023-06-28 16:58:23 -07:00
ContribOperators.md	GQA Rotary and Packed QKV with Flash (#18906)	2024-01-23 16:34:26 -08:00
FAQ.md	[Technical docs] Fixed a couple of old links in `FAQ.md` (#17415)	2023-09-26 13:38:24 -07:00
How_To_Update_ONNX_Dev_Notes.md	Remove exclusions for ONNX model tests that now pass. (#14337)	2023-01-24 08:04:27 +10:00
Memory_Optimizer.md	Allow layer-wise recompute (#18566)	2023-12-12 08:44:05 +08:00
Model_Test.md	Renaming MKL-DNN as DNNL (#2515)	2019-12-03 07:34:23 -08:00
NotesOnThreading.md	Replace 'master' branch ref to 'main' in the code (#12547)	2022-08-22 10:48:12 -07:00
ONNX_Runtime_Server_Usage.md	Update docs/ONNX_Runtime_Server_Usage.md (#7818)	2021-05-26 16:17:20 -07:00
onnxruntime_dependencies.dot	Update dependencies graph	2020-04-17 07:38:45 -07:00
onnxruntime_dependencies.png	Update dependencies graph	2020-04-17 07:38:45 -07:00
onnxruntime_extensions.md	Remove the extensions submodule (#17097)	2023-08-14 10:16:33 -07:00
OperatorKernels.md	SimplifiedLayerNormalization Fusion BFloat16 support for Llama-v2 on A100 (#18898)	2024-02-14 10:05:16 -08:00
ORT_Format_Update_in_1.13.md	Update ORT format v5 change docs to cover limited backwards compatibility in 1.14. (#14413)	2023-01-25 08:23:12 -08:00
ORT_Use_Trtion_Kernel.md	[ROCm] Add ROCm Triton TunableOp for GroupNorm (#16196)	2023-07-11 13:55:30 +08:00
ORTMobilePackageOperatorTypeSupport.md	Replace 'master' branch ref to 'main' in the code (#12547)	2022-08-22 10:48:12 -07:00
ORTModule_Convergence_Notes.md	Introduce ZeROOffloadSubscriber for ORTModule (#17006)	2023-08-25 00:15:22 +08:00
ORTModule_ModuleWithLoss_Wrapper.md	add steps to write modulewithloss wrapper (#16486)	2023-07-11 09:07:35 +08:00
ORTModule_PythonOp_Notes.md	Add document for PythonOp (#17888)	2023-10-12 08:36:22 +08:00
ORTModule_Training_Guidelines.md	ORTModule memory improvement (#18924)	2024-01-16 08:57:37 +08:00
PR_Guidelines.md	Add guidelines for writing a good PR. (#3830)	2020-05-05 16:28:21 -07:00
Privacy.md	[C# and Python APIs] Expose knobs to enable/disable platform telemetry collection (#5481)	2020-10-21 10:32:13 -07:00
Python_Dev_Notes.md	Changes related to the release binaries requiring Visual C++ 2019 runtime (#3871)	2020-05-12 17:07:06 -07:00
Reduced_Operator_Kernel_build.md	replace 'master' branch ref to 'main' for onnx repo (#12678)	2022-08-30 13:41:42 -07:00
ReleaseManagement.md	Updated TPN for OpenMPI and cleanup (#3932)	2020-05-14 11:42:44 -07:00
Roadmap.md	Replace 'master' branch ref to 'main' in the code (#12547)	2022-08-22 10:48:12 -07:00
Server.md	Update documentation for contributing a PR and add deprecation notices for PyOp and ORT server. (#6172)	2020-12-18 02:00:42 -08:00
TVM_EP.md	Fix: update hyperlinks to the Jupyter notebooks (#16145)	2023-08-21 09:53:05 -07:00
Versioning.md	replace 'master' branch ref to 'main' for onnx repo (#12678)	2022-08-30 13:41:42 -07:00
WinML_principles.md	Replace 'master' branch ref to 'main' in the code (#12547)	2022-08-22 10:48:12 -07:00