onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-29 03:30:52 +00:00

guyang3532 341484e67c Embedding sparsity optimization (#16141)	2023-06-19 20:34:53 +08:00

### Description
Optimize the compute graph by eliminating padding in embedding.

### Motivation and Context
Computing on padding in the nodes after the embedding is unnecessary and wastes computation resources. This PR adds a PaddingElimination optimizer that checks for and eliminates the padding after the embedding automatically by modifying the graph.

### Implementation
1. Find and check the embedding node in the graph.
2. Iterate over the subgraph that follows the embedding node and record all input nodes and output nodes of this subgraph.
3. Insert 'Reshape + ShrunkenGather' to flatten each input node's shape from [batch_size, seqlen, ...] to [valid_token_without_padding, ...], and insert 'GatherGrad + Reshape' to unflatten each output node's shape from [valid_token_without_padding, ...] back to [batch_size, seqlen, ...].

Co-authored-by: mindest <linminuser@gmail.com>
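The flatten and unflatten step in the commit message can be pictured with a small shape-only sketch. This is not ONNX Runtime code: the helper names, the padding id of 0, and the toy "embedding" are all assumptions made for illustration, and plain Python lists stand in for tensors. It only mimics what the 'Reshape + ShrunkenGather' and 'GatherGrad + Reshape' pairs are described as doing to shapes.

```python
# Hypothetical illustration of padding elimination; names are not from ONNX Runtime.
PAD_ID = 0  # assumed padding token id


def valid_positions(input_ids, pad_id=PAD_ID):
    """Flat indices (into batch_size * seqlen) of the non-padding tokens."""
    flat = [tok for row in input_ids for tok in row]
    return [i for i, tok in enumerate(flat) if tok != pad_id]


def flatten_and_gather(x, indices):
    """[batch_size, seqlen, ...] -> [valid_tokens, ...] (reshape, then gather)."""
    flat = [vec for row in x for vec in row]
    return [flat[i] for i in indices]


def scatter_and_unflatten(y, indices, batch_size, seqlen, fill):
    """[valid_tokens, ...] -> [batch_size, seqlen, ...] (scatter, then reshape)."""
    flat = [list(fill) for _ in range(batch_size * seqlen)]
    for i, vec in zip(indices, y):
        flat[i] = vec
    return [flat[b * seqlen:(b + 1) * seqlen] for b in range(batch_size)]


# Two sequences of length 4; zeros are padding.
input_ids = [[5, 7, 0, 0], [3, 0, 0, 0]]
# Toy stand-in for an embedding lookup: each token becomes a 2-element vector.
embedded = [[[float(t), 2.0 * t] for t in row] for row in input_ids]

idx = valid_positions(input_ids)              # positions of real tokens
compact = flatten_and_gather(embedded, idx)   # 3 rows instead of 8
# Stand-in for the subgraph after the embedding: runs on valid tokens only.
processed = [[v + 1.0 for v in vec] for vec in compact]
restored = scatter_and_unflatten(processed, idx, 2, 4, fill=[0.0, 0.0])
```

In this toy case the work after the embedding touches 3 token rows rather than 8, and the padded positions come back filled with the assumed zero vector.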
c_cxx	Training Documentation (#15612)	2023-04-25 11:44:12 -07:00
execution_providers/images
images
python	Enable model subgraph execution in OVEP and setting the OpenVINO dll's to the path from the OpenVINO pypi packge in OVEP and fix OVEP windows io buffer sample (#16147)	2023-06-16 19:47:09 -07:00
ABI_Dev_Notes.md	skip windows GPU check if changes only in doc (#13248)	2022-10-11 13:51:44 +08:00
Android_testing.md
C_API_Guidelines.md	Replace 'master' branch ref to 'main' in the code (#12547)	2022-08-22 10:48:12 -07:00
cmake_guideline.md	fix some typo in docs (#13212)	2022-10-07 15:58:18 -07:00
Coding_Conventions_and_Standards.md	Enable RUFF as a formatter (#15699)	2023-04-26 14:04:07 -07:00
ContribOperators.md	optimization for whisper model with decoder masked multihead attention (#15827)	2023-05-18 15:38:31 -07:00
FAQ.md	Fix typo enviroment => environment (#13195)	2022-10-03 17:02:26 -07:00
How_To_Update_ONNX_Dev_Notes.md	Remove exclusions for ONNX model tests that now pass. (#14337)	2023-01-24 08:04:27 +10:00
Memory_Optimizer.md	Add guidelines for ORTModule (#13553)	2022-11-04 19:42:10 +08:00
Model_Test.md
NotesOnThreading.md	Replace 'master' branch ref to 'main' in the code (#12547)	2022-08-22 10:48:12 -07:00
ONNX_Runtime_Server_Usage.md
onnxruntime_dependencies.dot
onnxruntime_dependencies.png
onnxruntime_extensions.md	Fix broken and outdated links in documentation (#14092)	2023-02-23 10:48:04 -08:00
OperatorKernels.md	Fix MS domain QuantizeLinear and DequantizeLinear type registrations … (#16298)	2023-06-15 18:21:56 -07:00
ORT_Format_Update_in_1.13.md	Update ORT format v5 change docs to cover limited backwards compatibility in 1.14. (#14413)	2023-01-25 08:23:12 -08:00
ORT_use_trtion_kernel.md	integrate triton into ort (#15862)	2023-05-17 09:35:28 +08:00
ORTMobilePackageOperatorTypeSupport.md	Replace 'master' branch ref to 'main' in the code (#12547)	2022-08-22 10:48:12 -07:00
ORTModule_Convergence_Notes.md	Enhance StatisticsSubscriber (#16098)	2023-06-12 18:32:08 +08:00
ORTModule_Training_Guidelines.md	Embedding sparsity optimization (#16141)	2023-06-19 20:34:53 +08:00
PR_Guidelines.md
Privacy.md
Python_Dev_Notes.md
Reduced_Operator_Kernel_build.md	replace 'master' branch ref to 'main' for onnx repo (#12678)	2022-08-30 13:41:42 -07:00
ReleaseManagement.md
Roadmap.md	Replace 'master' branch ref to 'main' in the code (#12547)	2022-08-22 10:48:12 -07:00
Server.md
TVM_EP.md	Update python 3.11 and remove 3.7 for Linux (#15214)	2023-03-27 14:46:30 -07:00
Versioning.md	replace 'master' branch ref to 'main' for onnx repo (#12678)	2022-08-30 13:41:42 -07:00
WinML_principles.md	Replace 'master' branch ref to 'main' in the code (#12547)	2022-08-22 10:48:12 -07:00