pytorch/test/cpp/api
Matthew Hoffman 258f47fc0b Add padding_side to pad_sequence with "left" and "right" options ("right" as default) (#131884)
Fixes #10536

Reattempt of #61467. Thank you so much to @mskoh52 for your excellent work!

As I was trying to create a more efficient LLM data collator, I realized that `pad_sequence` only supports right padding, even though left padding is a very common format for LLMs, like Llama and Mistral.

The proposed alternative implementation was to use multiple flips, which tends to be 1.5x-2x slower. Instead we can add a [`padding_side` parameter as there is for for Hugging Face tokenizers](9d6c0641c4/src/transformers/tokenization_utils_base.py (L1565)), which requires only a very small change in the C++ code.

Here are the benchmarks of the new implementation!

`float32`:

![eaaa95ef-9384-45d2-be56-6898bc1d3514](https://github.com/user-attachments/assets/3b0eb309-e5a0-4a4d-97bb-4e3298783dbb)

`bool`:

![892f32da-8d9a-492b-9507-18d3f0a41e8e](https://github.com/user-attachments/assets/6824ea15-7d4e-4b89-95f0-8546635f0c2e)

Code:

```python
from __future__ import annotations

import random
import time
from typing import Literal

import numpy as np
import torch

def pad_sequence_with_flips(
    sequences: list[torch.Tensor],
    batch_first: bool = False,
    padding_value: int | float | bool = 0.0,
    padding_side: Literal["left", "right"] | str = "left",
) -> torch.Tensor:
    if padding_side == 'right':
        padded_sequence = torch._C._nn.pad_sequence([t.flatten() for t in sequences], batch_first=batch_first, padding_value=padding_value)
    elif padding_side=='left':
        padded_sequence = torch._C._nn.pad_sequence([t.flatten().flip(0) for t in sequences], batch_first=batch_first, padding_value=padding_value)  # pyright: ignore[reportArgumentType]
        padded_sequence = padded_sequence.flip(int(batch_first))
    else:
        raise ValueError(f"padding_side should be either 'right' or 'left', but got {padding_side}")

    return padded_sequence

sequence_lengths: list[int] = []

flip_left_pad_times: list[float] = []
flip_left_pad_times_std: list[float] = []

left_pad_times: list[float] = []
left_pad_times_std: list[float] = []

RUNS_PER_LOOP: int = 100

for i in range(1, 7):
    sequence_length = i * int(1e6) // 6
    sequence_lengths.append(sequence_length)

    sequences = [torch.randint(0, 2, (random.randint(1, sequence_length),), dtype=torch.bool) for _ in range(64)]

    inner_left_pad_times: list[float] = []
    inner_right_pad_times: list[float] = []

    inner_flip_left_pad_times: list[float] = []
    inner_flip_right_pad_times: list[float] = []

    for _ in range(RUNS_PER_LOOP):

        start = time.perf_counter()
        torch._C._nn.pad_sequence(sequences, batch_first=True, padding_value=False, padding_side="left")
        end = time.perf_counter()
        inner_left_pad_times.append(end - start)

        start = time.perf_counter()
        pad_sequence_with_flips(sequences, batch_first=True, padding_value=False, padding_side="left")
        end = time.perf_counter()
        inner_flip_left_pad_times.append(end - start)

    left_pad_times.append(sum(inner_left_pad_times) / len(inner_left_pad_times))
    left_pad_times_std.append(np.std(inner_left_pad_times))

    flip_left_pad_times.append(sum(inner_flip_left_pad_times) / len(inner_flip_left_pad_times))
    flip_left_pad_times_std.append(np.std(inner_flip_left_pad_times))

    print(f"Sequence Length: {sequence_length}, Left Pad Time: {left_pad_times[-1]}, Left with Flips Pad Time: {flip_left_pad_times[-1]}")

import matplotlib.pyplot as plt

plt.plot(sequence_lengths, left_pad_times, label="new pad_sequence left")
plt.scatter(sequence_lengths, left_pad_times)
plt.errorbar(sequence_lengths, left_pad_times, yerr=left_pad_times_std, linestyle='None', marker='^')

plt.plot(sequence_lengths, flip_left_pad_times, label="old pad_sequence left (2 flips)")
plt.scatter(sequence_lengths, flip_left_pad_times)
plt.errorbar(sequence_lengths, flip_left_pad_times, yerr=flip_left_pad_times_std, linestyle='None', marker='^')

plt.xlabel("Sequence Length")
plt.ylabel("Time (s)")
plt.legend(loc="upper right")

# Sequence Length: 166666, Left Pad Time: 0.06147645162009212, Left with Flips Pad Time: 0.09842291727001794
# Sequence Length: 333333, Left Pad Time: 0.08933195920990329, Left with Flips Pad Time: 0.15597836187991562
# Sequence Length: 500000, Left Pad Time: 0.08863158334006585, Left with Flips Pad Time: 0.15224887342999863
# Sequence Length: 666666, Left Pad Time: 0.10524682551997103, Left with Flips Pad Time: 0.18177212480995877
# Sequence Length: 833333, Left Pad Time: 0.11801802741003485, Left with Flips Pad Time: 0.20821274195001024
# Sequence Length: 1000000, Left Pad Time: 0.131894061660023, Left with Flips Pad Time: 0.23223503091008751
```

Co-authored-by: mskoh52 <mskoh52@users.noreply.github.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131884
Approved by: https://github.com/ezyang
2024-08-07 15:53:07 +00:00
..
any.cpp
autograd.cpp
CMakeLists.txt
dataloader.cpp
dispatch.cpp
enum.cpp
expanding-array.cpp
fft.cpp
functional.cpp
grad_mode.cpp
inference_mode.cpp
init.cpp
init_baseline.h
init_baseline.py [BE][Easy][7/19] enforce style for empty lines in import segments in test/[a-c]*/ and test/[q-z]*/ (#129758) 2024-07-31 10:54:03 +00:00
integration.cpp
ivalue.cpp
jit.cpp
memory.cpp
meta_tensor.cpp
misc.cpp
module.cpp
moduledict.cpp
modulelist.cpp
modules.cpp [structural binding][11/N] Replace std::tie with structural binding (#130830) 2024-07-18 00:45:06 +00:00
namespace.cpp
nested.cpp
nested_int.cpp
nn_utils.cpp Add padding_side to pad_sequence with "left" and "right" options ("right" as default) (#131884) 2024-08-07 15:53:07 +00:00
operations.cpp
optim.cpp
optim_baseline.h
optim_baseline.py
ordered_dict.cpp
parallel.cpp
parallel_benchmark.cpp
parameterdict.cpp
parameterlist.cpp
README.md
rnn.cpp
sequential.cpp
serialize.cpp
special.cpp
static.cpp
support.cpp
support.h
tensor.cpp
tensor_cuda.cpp
tensor_flatten.cpp
tensor_indexing.cpp
tensor_options.cpp
tensor_options_cuda.cpp
torch_include.cpp
transformer.cpp

C++ Frontend Tests

In this folder live the tests for PyTorch's C++ Frontend. They use the GoogleTest test framework.

CUDA Tests

To make a test runnable only on platforms with CUDA, you should suffix your test with _CUDA, e.g.

TEST(MyTestSuite, MyTestCase_CUDA) { }

To make it runnable only on platforms with at least two CUDA machines, suffix it with _MultiCUDA instead of _CUDA, e.g.

TEST(MyTestSuite, MyTestCase_MultiCUDA) { }

There is logic in main.cpp that detects the availability and number of CUDA devices and supplies the appropriate negative filters to GoogleTest.

Integration Tests

Integration tests use the MNIST dataset. You must download it by running the following command from the PyTorch root folder:

$ python tools/download_mnist.py -d test/cpp/api/mnist

The required paths will be referenced as test/cpp/api/mnist/... in the test code, so you must run the integration tests from the PyTorch root folder.