mirror of
https://github.com/saymrwulf/onnxruntime.git
synced 2026-05-14 20:48:00 +00:00
### Description

Optimize the compute graph by eliminating padding in embedding.

### Motivation and Context

Computing on padding in the nodes after an embedding is unnecessary and wastes computation resources. This PR adds a PaddingElimination optimizer that checks for padding after the embedding and eliminates it automatically by modifying the graph.

### Implementation

1. Find and check the embedding node in the graph.
2. Iterate over the subgraph that follows the embedding node and record all input nodes and output nodes of this subgraph.
3. Insert 'Reshape + ShrunkenGather' to flatten each input node's shape from [batch_size, seqlen, ...] to [valid_token_without_padding, ...], and insert 'GatherGrad + Reshape' to unflatten each output node's shape from [valid_token_without_padding, ...] back to [batch_size, seqlen, ...].

---------

Co-authored-by: mindest <linminuser@gmail.com>

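
The flatten and unflatten steps from the commit message can be illustrated on plain Python lists. This is only a sketch of the shape transformation, not the ONNX Runtime implementation: the function names, the `padding_idx=0` default, and the zero fill used when restoring the padded positions are all assumptions made for this example.

```python
from typing import List, Tuple

Vector = List[float]


def flatten_valid_tokens(
    input_ids: List[List[int]],
    hidden: List[List[Vector]],
    padding_idx: int = 0,  # assumed padding token id (hypothetical default)
) -> Tuple[List[Vector], List[int], Tuple[int, int]]:
    """[batch_size, seqlen, ...] -> [valid_token_without_padding, ...]."""
    batch_size, seqlen = len(input_ids), len(input_ids[0])
    # "Reshape": merge the batch and sequence dimensions into one.
    flat_ids = [tok for row in input_ids for tok in row]
    flat_hidden = [vec for row in hidden for vec in row]
    # "Gather": keep only the rows whose token is not padding.
    valid_indices = [i for i, tok in enumerate(flat_ids) if tok != padding_idx]
    gathered = [flat_hidden[i] for i in valid_indices]
    return gathered, valid_indices, (batch_size, seqlen)


def unflatten_tokens(
    values: List[Vector],
    valid_indices: List[int],
    shape: Tuple[int, int],
    fill: Vector,
) -> List[List[Vector]]:
    """[valid_token_without_padding, ...] -> [batch_size, seqlen, ...]."""
    batch_size, seqlen = shape
    # Scatter the valid rows back; padded positions get `fill` (assumed zeros).
    flat = [list(fill) for _ in range(batch_size * seqlen)]
    for src, dst in enumerate(valid_indices):
        flat[dst] = values[src]
    # "Reshape": split the merged dimension back into batch and sequence.
    return [flat[b * seqlen:(b + 1) * seqlen] for b in range(batch_size)]


# Two sequences of length 3; token id 0 marks padding.
input_ids = [[5, 7, 0], [3, 0, 0]]
hidden = [
    [[1.0, 1.0], [2.0, 2.0], [9.0, 9.0]],
    [[3.0, 3.0], [9.0, 9.0], [9.0, 9.0]],
]
tokens, indices, shape = flatten_valid_tokens(input_ids, hidden)
# Downstream work now touches 3 rows instead of 6.
processed = [[2.0 * x for x in vec] for vec in tokens]
restored = unflatten_tokens(processed, indices, shape, fill=[0.0, 0.0])
```

In this toy case half of the token positions are padding, so the per-token work between the two steps is halved. The real optimizer performs the equivalent rewrite on graph nodes rather than on Python data.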
| | | |
|---|---|---|
| c_cxx | ||
| execution_providers/images | ||
| images | ||
| python | ||
| ABI_Dev_Notes.md | ||
| Android_testing.md | ||
| C_API_Guidelines.md | ||
| cmake_guideline.md | ||
| Coding_Conventions_and_Standards.md | ||
| ContribOperators.md | ||
| FAQ.md | ||
| How_To_Update_ONNX_Dev_Notes.md | ||
| Memory_Optimizer.md | ||
| Model_Test.md | ||
| NotesOnThreading.md | ||
| ONNX_Runtime_Server_Usage.md | ||
| onnxruntime_dependencies.dot | ||
| onnxruntime_dependencies.png | ||
| onnxruntime_extensions.md | ||
| OperatorKernels.md | ||
| ORT_Format_Update_in_1.13.md | ||
| ORT_use_trtion_kernel.md | ||
| ORTMobilePackageOperatorTypeSupport.md | ||
| ORTModule_Convergence_Notes.md | ||
| ORTModule_Training_Guidelines.md | ||
| PR_Guidelines.md | ||
| Privacy.md | ||
| Python_Dev_Notes.md | ||
| Reduced_Operator_Kernel_build.md | ||
| ReleaseManagement.md | ||
| Roadmap.md | ||
| Server.md | ||
| TVM_EP.md | ||
| Versioning.md | ||
| WinML_principles.md | ||