onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-05-16 21:00:14 +00:00

pengwa 2f5bf75e51 Optimize computation orders (#13672)	2022-12-22 15:12:52 +08:00

### Optimize computation orders

In `Roberta/Electra`, when `ClassificationHead` is used, the features are sliced along the sequence_length dimension, and the loss calculation depends only on the sliced data. This is a slice at axis 1: before slicing the shape is [batch, sequence_length, hidden]; after slicing it becomes [batch, hidden].

We have the opportunity to move this slicing as early as possible by passing it through simple elementwise ops (like Add/Div), through LayerNorm/Softmax (if their reduce axis comes after the slicing axis), or even through the left operand of MatMul (as long as the slice does not affect the last dims).

Operators like Reshape/Transpose are special: they either have shape data specified (which needs updating after slicing) or have perm specified, which requires the input rank to remain unchanged. For those operators we keep the original rank and leave the sliced dim as 1; after the computation completes, we apply a Squeeze.

```
import torch
from torch import nn


class RobertaClassificationHead(nn.Module):
    """Head for sentence-level classification tasks."""

    def __init__(self, config):
        super().__init__()
        self.dense = nn.Linear(config.hidden_size, config.hidden_size)
        classifier_dropout = (
            config.classifier_dropout if config.classifier_dropout is not None else config.hidden_dropout_prob
        )
        self.dropout = nn.Dropout(classifier_dropout)
        self.out_proj = nn.Linear(config.hidden_size, config.num_labels)

    def forward(self, features, **kwargs):
        x = features[:, 0, :]  # take <s> token (equiv. to [CLS])
        x = self.dropout(x)
        x = self.dense(x)
        x = torch.tanh(x)
        x = self.dropout(x)
        x = self.out_proj(x)
        return x
```

src\transformers\models\roberta\modeling_roberta.py
src\transformers\models\electra\modeling_electra.py

#### Benchmark

A simple benchmark shows Roberta training latency dropped from 208ms to 199ms, a 4.5+% reduction. More comprehensive tests are on the way.
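The reordering described above can be checked on toy data. The following is a minimal pure-Python sketch, not the ONNX Runtime implementation: tensors are nested lists, and every helper name (`slice_first_token`, `add_bias`, `matmul_last_dim`, `slice_keepdim`, `transpose_01`) is hypothetical and chosen only for illustration. It shows that slicing at axis 1 gives the same result whether it runs after or before an elementwise Add and a MatMul on the left operand, and that a Transpose can be handled by keeping the sliced dim as 1 and squeezing afterwards.

```python
# Toy tensors are nested lists shaped [batch][sequence_length][hidden].
# All helper names here are hypothetical; this is not ONNX Runtime code.

def slice_first_token(x):
    """Like features[:, 0, :]: [batch, seq, hidden] -> [batch, hidden]."""
    return [sample[0] for sample in x]

def slice_keepdim(x):
    """Same slice, but the sliced dim stays as size 1: -> [batch, 1, hidden]."""
    return [[sample[0]] for sample in x]

def add_bias(x, bias):
    """Elementwise Add of a [hidden] bias over the last axis, for any rank."""
    if isinstance(x[0], list):
        return [add_bias(row, bias) for row in x]
    return [v + b for v, b in zip(x, bias)]

def matmul_last_dim(x, w):
    """MatMul with x on the left: contracts only x's last axis with w [hidden, out]."""
    if isinstance(x[0], list):
        return [matmul_last_dim(row, w) for row in x]
    return [sum(v * w[k][j] for k, v in enumerate(x)) for j in range(len(w[0]))]

def transpose_01(x):
    """Transpose with perm (1, 0, 2); needs a rank-3 input."""
    return [list(group) for group in zip(*x)]

features = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]],
]  # batch=2, sequence_length=3, hidden=2
bias = [0.5, -0.5]
weight = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]  # hidden=2 -> out=3

# Original order: compute on every sequence position, slice at the end.
late = slice_first_token(matmul_last_dim(add_bias(features, bias), weight))
# Reordered: slice first, so Add and MatMul see 1/sequence_length of the data.
early = matmul_last_dim(add_bias(slice_first_token(features), bias), weight)
assert early == late

# Transpose needs rank 3, so keep the sliced dim as 1 and squeeze afterwards.
transposed_then_sliced = transpose_01(features)[0]    # slice after Transpose
squeezed = transpose_01(slice_keepdim(features))[0]   # [0] acts as the Squeeze
assert squeezed == transposed_then_sliced
```

In this sketch the early-slice path does the Add and MatMul on one sequence position per sample instead of three, which is the kind of saving the commit message is after.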
..
external	Use target name for flatbuffers (#13991 )	2022-12-20 11:44:02 -08:00
patches	Update absl to the latest release (#13990 )	2022-12-19 14:25:13 -08:00
tensorboard	Improve dependency management (#13523 )	2022-12-01 09:51:59 -08:00
adjust_global_compile_flags.cmake	Multi-stream execution support (#13495 )	2022-12-15 07:39:29 -08:00
CMakeLists.txt	Fix deprecated-builtins (#14001 )	2022-12-17 18:17:05 +08:00
CMakeSettings.json
codeconv.runsettings
deps.txt	Update absl to the latest release (#13990 )	2022-12-19 14:25:13 -08:00
EnableVisualStudioCodeAnalysis.props	Fix SDL warnings in CPU EP (#9975 )	2021-12-19 20:54:29 -08:00
gdk_toolchain.cmake	Enable building with a GDK (#11126 )	2022-04-07 15:06:31 -07:00
Info.plist.in
libonnxruntime.pc.cmake.in
nuget_helpers.cmake
onnxruntime.cmake	Remove miscellaneous nuphar configs (#13070 )	2022-09-26 13:41:28 -07:00
onnxruntime_codegen_tvm.cmake	Use target name for flatbuffers (#13991 )	2022-12-20 11:44:02 -08:00
onnxruntime_common.cmake	Enabling thread pool to be numa-aware (#13778 )	2022-12-12 10:33:55 -08:00
onnxruntime_config.h.in	[wasm] update emscripten v2.0.34 (#10391 )	2022-01-26 14:46:02 -08:00
onnxruntime_csharp.cmake	Enable nuget packages for on device training (#13637 )	2022-12-05 14:54:09 -08:00
onnxruntime_eager.cmake	Use target name for flatbuffers (#13991 )	2022-12-20 11:44:02 -08:00
onnxruntime_flatbuffers.cmake	Use target name for flatbuffers (#13991 )	2022-12-20 11:44:02 -08:00
onnxruntime_framework.cmake	Use target name for flatbuffers (#13991 )	2022-12-20 11:44:02 -08:00
onnxruntime_fuzz_test.cmake
onnxruntime_graph.cmake	Use target name for flatbuffers (#13991 )	2022-12-20 11:44:02 -08:00
onnxruntime_ios.toolchain.cmake
onnxruntime_java.cmake	Add linux and macos arm64 java artifacts (#10981 )	2022-03-25 16:23:17 -07:00
onnxruntime_java_unittests.cmake	[Java] Adds support for DNNL, OpenVINO, TensorRT shared providers and refactors the CUDA shared provider loader (#8013 )	2021-07-20 22:33:15 -07:00
onnxruntime_kernel_explorer.cmake	Share TunableOp between CUDA and ROCM EP (#13560 )	2022-11-11 13:56:44 +08:00
onnxruntime_language_interop_ops.cmake	Use target name for flatbuffers (#13991 )	2022-12-20 11:44:02 -08:00
onnxruntime_mlas.cmake	Switch GSL to MS GSL 4.0.0 (#13416 )	2022-10-29 04:15:20 -07:00
onnxruntime_nodejs.cmake	Add Node.js binding support to packaging pipeline (#9577 )	2021-11-05 15:29:40 -07:00
onnxruntime_objectivec.cmake	Remove SafeInt dependency from Objective-C API. (#13698 )	2022-11-18 17:06:12 -08:00
onnxruntime_opschema_lib.cmake	Use target name for flatbuffers (#13991 )	2022-12-20 11:44:02 -08:00
onnxruntime_optimizer.cmake	Optimize computation orders (#13672 )	2022-12-22 15:12:52 +08:00
onnxruntime_providers.cmake	Use target name for flatbuffers (#13991 )	2022-12-20 11:44:02 -08:00
onnxruntime_pyop.cmake	Use target name for flatbuffers (#13991 )	2022-12-20 11:44:02 -08:00
onnxruntime_python.cmake	Improve dependency management (#13523 )	2022-12-01 09:51:59 -08:00
onnxruntime_rocm_hipify.cmake	Multi-stream execution support (#13495 )	2022-12-15 07:39:29 -08:00
onnxruntime_session.cmake	Use target name for flatbuffers (#13991 )	2022-12-20 11:44:02 -08:00
onnxruntime_snpe_provider.cmake	Use target name for flatbuffers (#13991 )	2022-12-20 11:44:02 -08:00
onnxruntime_training.cmake	Use target name for flatbuffers (#13991 )	2022-12-20 11:44:02 -08:00
onnxruntime_unittests.cmake	Use target name for flatbuffers (#13991 )	2022-12-20 11:44:02 -08:00
onnxruntime_util.cmake	Improve dependency management (#13523 )	2022-12-01 09:51:59 -08:00
onnxruntime_webassembly.cmake	Fix usage of enable_training_ops and reduce ifdef complexity for training builds (#13888 )	2022-12-14 08:32:46 -08:00
precompiled_header.cmake	Fix Windows Store build (#8753 )	2021-08-23 11:19:03 -07:00
Sdl.ruleset	Update Sdl.ruleset to remove C26812 from the rules (#12695 )	2022-09-01 20:05:20 -07:00
set_winapi_family_desktop.h
target_delayload.cmake	Remove Windows Store specific code	2022-03-17 23:38:14 -07:00
uwp_stubs.h	Fix Windows Store build (#8753 )	2021-08-23 11:19:03 -07:00
wcos_rules_override.cmake
winml.cmake	Use target name for flatbuffers (#13991 )	2022-12-20 11:44:02 -08:00
winml_cppwinrt.cmake	Fix Windows Store build (#8753 )	2021-08-23 11:19:03 -07:00
winml_sdk_helpers.cmake
winml_unittests.cmake	Use target name for flatbuffers (#13991 )	2022-12-20 11:44:02 -08:00