onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-21 19:18:55 +00:00


Prathik Rao 544407038d SimplifiedLayerNormalization Fusion BFloat16 support for Llama-v2 on A100 (#18898)	2024-02-14 10:05:16 -08:00
orttraining	SimplifiedLayerNormalization Fusion BFloat16 support for Llama-v2 on A100 (#18898)	2024-02-14 10:05:16 -08:00
tools	Bump ruff linter to 0.2.1 (#19471)	2024-02-08 16:08:27 -08:00

SimplifiedLayerNormalization Fusion BFloat16 support for Llama-v2 on A100 (#18898)

### Description

Adds bfloat16 as a supported dtype for SimplifiedLayerNormFusion, which
speeds up Llama-v2 on A100 when running in the bfloat16 numerical
format.
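
For readers unfamiliar with the op being fused, here is a minimal reference sketch in plain Python. It assumes SimplifiedLayerNormalization is the RMS-style normalization `y = x / sqrt(mean(x^2) + eps) * scale`, with no mean subtraction and no bias. The function name, argument names, and default `eps` are hypothetical and chosen for illustration; this is not the ONNX Runtime kernel.

```python
import math


def simplified_layer_norm(x, scale, eps=1e-5):
    """Reference RMS-style normalization over one feature vector of floats.

    Assumption: y_i = x_i / sqrt(mean(x^2) + eps) * scale_i.
    """
    if len(x) != len(scale):
        raise ValueError("x and scale must have the same length")
    # Mean of squares over the normalized axis (no mean subtraction).
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    # Scale each normalized element by its learned weight.
    return [v * inv_rms * s for v, s in zip(x, scale)]
```

With `eps=0.0` and a unit `scale`, the output has a mean square of 1, which is a quick sanity check on the formula.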

_layernorm_optimized_training.onnx exported in bfloat16 vs. float16:_

![image](https://github.com/microsoft/onnxruntime/assets/31260940/8c0a5f0f-5fcb-4637-bcd9-f34272ec0284)

### Repro Instructions

```python
import torch
from torch import nn
from onnxruntime.training.ortmodule import ORTModule, DebugOptions, LogLevel

# Switch between the two dtypes to compare the exported graphs.
dtype = torch.bfloat16
# dtype = torch.float16

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(784, 10, dtype=dtype)
        self.layernorm = nn.LayerNorm([784], dtype=dtype)

    def forward(self, x):
        # Flatten each 28x28 image into a 784-element vector.
        x = x.view(x.shape[0], -1)
        x = self.layernorm(x)
        x = self.fc(x)

        return x

model = Net()
# save_onnx=True writes the exported ONNX graphs using the 'layernorm' prefix.
model = ORTModule(model, DebugOptions(save_onnx=True, onnx_prefix='layernorm', log_level=LogLevel.INFO))
model.to("cuda")

# A batch of 8 random inputs; the forward pass triggers the ONNX export.
images = torch.randn((8, 28, 28), dtype=dtype).to("cuda")
output = model(images)
```
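
To compare the bfloat16 and float16 exports, one option is to count the operator types in the saved graph. The sketch below is an assumption-laden helper, not part of the PR: `count_op_types` and `count_ops_in_file` are hypothetical names, and the file name `layernorm_optimized_training.onnx` is taken from the caption above. It relies only on `onnx.load` and the `op_type` field of each graph node.

```python
from collections import Counter


def count_op_types(nodes):
    """Count how many times each operator type appears among graph nodes.

    `nodes` is any iterable of objects with an `op_type` attribute,
    such as `onnx.load(path).graph.node`.
    """
    return Counter(node.op_type for node in nodes)


def count_ops_in_file(path):
    """Load an ONNX file and count its operator types (requires `onnx`)."""
    import onnx  # imported lazily so count_op_types works without onnx

    return count_op_types(onnx.load(path).graph.node)


# Example (hypothetical path, run after the repro script above):
# print(count_ops_in_file("layernorm_optimized_training.onnx"))
```

Running this once per dtype and diffing the two counters shows which operators differ between the exported graphs.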

### Motivation and Context

Supports ONNX Runtime integration with the Llama-v2 family of LLMs.

---------

Co-authored-by: Prathik Rao <prathikrao@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
