onnxruntime/cmake
pengwa 2f5bf75e51
Optimize computation orders (#13672)
### Optimize computation orders

In `Roberta/Electra`, when `ClassificationHead` is used, there is
slicing operation on features on sequence_length dimensions, then loss
calculations only depend on this sliced data. This is a slicing at axis
1. Before slicing the shape is [batch, sequence_length, hidden], after
slicing, it becomes [batch , hidden_stage]

We had opportunities to bring this slicing earlier as much as possible,
by passing through simple elementwise ops (like Add/Div), or
Layernorm/Softmax(if their reduce axis is after the slicing axis), or
even MatMul's the left operand (if only it did not affect the last
dims).

For operators like Reshape/Transpose, it is special since they have
either data specified (after slicing we need update), or they have perm
specified, which requires the input rank remain unchanged. So for those
kinds of operators, we can remain the original rank, but just leave the
sliced dim to be 1, after the compute completed, we do a Squeeze.

```
class RobertaClassificationHead(nn.Module):
    """Head for sentence-level classification tasks."""

    def __init__(self, config):
        super().__init__()
        self.dense = nn.Linear(config.hidden_size, config.hidden_size)
        classifier_dropout = (
            config.classifier_dropout if config.classifier_dropout is not None else config.hidden_dropout_prob
        )
        self.dropout = nn.Dropout(classifier_dropout)
        self.out_proj = nn.Linear(config.hidden_size, config.num_labels)

    def forward(self, features, **kwargs):
        x = features[:, 0, :]  # take <s> token (equiv. to [CLS])
        x = self.dropout(x)
        x = self.dense(x)
        x = torch.tanh(x)
        x = self.dropout(x)
        x = self.out_proj(x)
        return x
```

src\transformers\models\roberta\modeling_roberta.py
src\transformers\models\electra\modeling_electra.py

#### Benchmark

A simple benchmark shows Robeta training latency dropped from 208ms ~
199ms. 4.5+% reduction.
More comprehensive tests are on the way.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2022-12-22 15:12:52 +08:00
..
external Use target name for flatbuffers (#13991) 2022-12-20 11:44:02 -08:00
patches Update absl to the latest release (#13990) 2022-12-19 14:25:13 -08:00
tensorboard Improve dependency management (#13523) 2022-12-01 09:51:59 -08:00
adjust_global_compile_flags.cmake Multi-stream execution support (#13495) 2022-12-15 07:39:29 -08:00
CMakeLists.txt Fix deprecated-builtins (#14001) 2022-12-17 18:17:05 +08:00
CMakeSettings.json
codeconv.runsettings
deps.txt Update absl to the latest release (#13990) 2022-12-19 14:25:13 -08:00
EnableVisualStudioCodeAnalysis.props Fix SDL warnings in CPU EP (#9975) 2021-12-19 20:54:29 -08:00
gdk_toolchain.cmake Enable building with a GDK (#11126) 2022-04-07 15:06:31 -07:00
Info.plist.in
libonnxruntime.pc.cmake.in
nuget_helpers.cmake
onnxruntime.cmake Remove miscellaneous nuphar configs (#13070) 2022-09-26 13:41:28 -07:00
onnxruntime_codegen_tvm.cmake Use target name for flatbuffers (#13991) 2022-12-20 11:44:02 -08:00
onnxruntime_common.cmake Enabling thread pool to be numa-aware (#13778) 2022-12-12 10:33:55 -08:00
onnxruntime_config.h.in [wasm] update emscripten v2.0.34 (#10391) 2022-01-26 14:46:02 -08:00
onnxruntime_csharp.cmake Enable nuget packages for on device training (#13637) 2022-12-05 14:54:09 -08:00
onnxruntime_eager.cmake Use target name for flatbuffers (#13991) 2022-12-20 11:44:02 -08:00
onnxruntime_flatbuffers.cmake Use target name for flatbuffers (#13991) 2022-12-20 11:44:02 -08:00
onnxruntime_framework.cmake Use target name for flatbuffers (#13991) 2022-12-20 11:44:02 -08:00
onnxruntime_fuzz_test.cmake
onnxruntime_graph.cmake Use target name for flatbuffers (#13991) 2022-12-20 11:44:02 -08:00
onnxruntime_ios.toolchain.cmake
onnxruntime_java.cmake Add linux and macos arm64 java aritifacts (#10981) 2022-03-25 16:23:17 -07:00
onnxruntime_java_unittests.cmake [Java] Adds support for DNNL, OpenVINO, TensorRT shared providers and refactors the CUDA shared provider loader (#8013) 2021-07-20 22:33:15 -07:00
onnxruntime_kernel_explorer.cmake Share TunableOp between CUDA and ROCM EP (#13560) 2022-11-11 13:56:44 +08:00
onnxruntime_language_interop_ops.cmake Use target name for flatbuffers (#13991) 2022-12-20 11:44:02 -08:00
onnxruntime_mlas.cmake Switch GSL to MS GSL 4.0.0 (#13416) 2022-10-29 04:15:20 -07:00
onnxruntime_nodejs.cmake Add Node.js binding support to packaging pipeline (#9577) 2021-11-05 15:29:40 -07:00
onnxruntime_objectivec.cmake Remove SafeInt dependency from Objective-C API. (#13698) 2022-11-18 17:06:12 -08:00
onnxruntime_opschema_lib.cmake Use target name for flatbuffers (#13991) 2022-12-20 11:44:02 -08:00
onnxruntime_optimizer.cmake Optimize computation orders (#13672) 2022-12-22 15:12:52 +08:00
onnxruntime_providers.cmake Use target name for flatbuffers (#13991) 2022-12-20 11:44:02 -08:00
onnxruntime_pyop.cmake Use target name for flatbuffers (#13991) 2022-12-20 11:44:02 -08:00
onnxruntime_python.cmake Improve dependency management (#13523) 2022-12-01 09:51:59 -08:00
onnxruntime_rocm_hipify.cmake Multi-stream execution support (#13495) 2022-12-15 07:39:29 -08:00
onnxruntime_session.cmake Use target name for flatbuffers (#13991) 2022-12-20 11:44:02 -08:00
onnxruntime_snpe_provider.cmake Use target name for flatbuffers (#13991) 2022-12-20 11:44:02 -08:00
onnxruntime_training.cmake Use target name for flatbuffers (#13991) 2022-12-20 11:44:02 -08:00
onnxruntime_unittests.cmake Use target name for flatbuffers (#13991) 2022-12-20 11:44:02 -08:00
onnxruntime_util.cmake Improve dependency management (#13523) 2022-12-01 09:51:59 -08:00
onnxruntime_webassembly.cmake Fix usage of enable_training_ops and reduce ifdef complexity for training builds (#13888) 2022-12-14 08:32:46 -08:00
precompiled_header.cmake Fix Windows Store build (#8753) 2021-08-23 11:19:03 -07:00
Sdl.ruleset Update Sdl.ruleset to remove C26812 from the rules (#12695) 2022-09-01 20:05:20 -07:00
set_winapi_family_desktop.h
target_delayload.cmake Remove Windows Store specific code 2022-03-17 23:38:14 -07:00
uwp_stubs.h Fix Windows Store build (#8753) 2021-08-23 11:19:03 -07:00
wcos_rules_override.cmake
winml.cmake Use target name for flatbuffers (#13991) 2022-12-20 11:44:02 -08:00
winml_cppwinrt.cmake Fix Windows Store build (#8753) 2021-08-23 11:19:03 -07:00
winml_sdk_helpers.cmake
winml_unittests.cmake Use target name for flatbuffers (#13991) 2022-12-20 11:44:02 -08:00