onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-03 03:58:54 +00:00

Ankit Maheshkar a6ea57b8f3 OpenVINO EP Weights Sharing Feature (#23553 )	2025-02-06 14:57:38 -08:00

### Description
These changes ensure that weight sharing happens between two models using the session context option ep_weight_sharing.

Key changes introduced in this feature are:
- Creating a shared context between two models.
- Extracting external constant initializers and relabelling them as inputs to the model to allow weight loading in the direct blob.
- Creating EP Context Nodes when subgraph partitioning is happening.

### Motivation and Context
This change was required to ensure that LLM with prefill and kvcache models can use the same share
The change was also required to ensure EP Context nodes can be formed even when the model is being subgraph partitioned.

---------

Co-authored-by: jatinwadhwa921 <jatin.wadhwa@intel.com>
Co-authored-by: jatinwadhwa921 <110383850+jatinwadhwa921@users.noreply.github.com>
Co-authored-by: saurabh <saurabh1.kale@intel.com>
Co-authored-by: TejalKhade28 <tejal.khade@intel.com>
Co-authored-by: sfatimar <sahar.fatima@intel.com>
Co-authored-by: Javier E. Martinez <javier.e.martinez@intel.com>
Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>
Co-authored-by: Eric Crawford <eric.r.crawford@intel.com>

attention	Extend Attention Bias Broadcast Support (#21710 )	2024-08-16 15:40:04 -07:00
cloud_models
CNTK	Target py310 and modernize codebase with ruff (#23401 )	2025-01-16 19:10:14 -08:00
custom_execution_provider_library	Remove core/common/gsl.h (#20894 )	2024-07-08 18:09:39 -07:00
custom_op_get_const_input_test_library
custom_op_invalid_library
custom_op_library	[ROCm] prefer hip interfaces over roc during hipify (#22394 )	2024-10-14 20:34:03 -07:00
custom_op_local_function	Enable comprehension simplification in ruff rules (#23414 )	2025-01-17 08:43:06 -08:00
custom_op_openvino_wrapper_library
float8
lora	Multi-Lora support (#22046 )	2024-09-30 15:59:07 -07:00
mobilenet_v3_small_excerpt_gen
multi_stream_models	Avoid call to Node::ToProto on first Graph::Resolve to improve session creation performance. (#20296 )	2024-04-17 10:07:12 +10:00
ort_minimal_e2e_test_data
ort_minimal_test_models
qnn_ctx	Support loading from model with multiple QNN context binary (#20930 )	2024-06-06 14:44:57 -07:00
snpe
squeezenet	Fix handling of nodes that get assigned to kMSInternalNHWCDomain when loading an ORT format model. (#20379 )	2024-04-22 18:34:01 -07:00
test_data_generation	Use ruff as the formatter to replace black-isort (#23397 )	2025-01-16 11:14:15 -08:00
training_api	Use ruff as the formatter to replace black-isort (#23397 )	2025-01-16 11:14:15 -08:00
transform	Update BiasGelu fusion and related ops (#23518 )	2025-01-30 22:53:59 -08:00
transformers
TRTEP_test_model
abs_free_dimensions.onnx
add_opset_314159.onnx
alloc_tensor_reuse.onnx
attention_mask1d_fp16.onnx
attention_mask1d_fp32.onnx
attention_mask2d_fp32.onnx
attention_no_mask_fp16.onnx
attention_past_state.onnx
attention_past_state.u8s8.onnx
attention_past_state.u8u8.onnx
avoid_reuse_of_buffer_for_node_output_with_no_consumers.onnx	Avoid reusing buffer for node outputs with no consumers (#21019 )	2024-06-13 16:08:16 -07:00
bart_tiny.onnx
bert_toy_optimized.onnx
bert_toy_postprocessed.onnx
capi_symbolic_dims.onnx
capi_symbolic_dims.py
ckpt_mnist.pt
clip_div_shared_initializer.onnx	[NNAPI EP] Track skipped initializer usage (#21286 )	2024-07-09 13:43:22 -07:00
clip_div_shared_initializer.py	[NNAPI EP] Track skipped initializer usage (#21286 )	2024-07-09 13:43:22 -07:00
constant_floats.onnx
conv.int4_weights.qdq.onnx	[QNN EP] Initial INT4 support (#21171 )	2024-07-10 10:03:53 -07:00
conv_autopad.onnx
conv_qdq_external_ini.bin	enable model with external data be loaded from memory buffer (#19089 )	2024-04-17 19:01:01 -07:00
conv_qdq_external_ini.onnx	enable model with external data be loaded from memory buffer (#19089 )	2024-04-17 19:01:01 -07:00
conv_qdq_s8s8.onnx
conv_qdq_s8s8_perchannel.onnx
conv_qdq_u8u8.onnx
coreml_argmax_cast_test.onnx	[CoreML EP] Fix ArgMaxOpBuilder::AddToModelBuilderImpl() nullptr Node access. (#21797 )	2024-08-23 10:19:53 -07:00
coreml_argmax_cast_test.py	[CoreML EP] Fix ArgMaxOpBuilder::AddToModelBuilderImpl() nullptr Node access. (#21797 )	2024-08-23 10:19:53 -07:00
coreml_argmax_unsupported_cast_test.onnx	[CoreML EP] Fix ArgMaxOpBuilder::AddToModelBuilderImpl() nullptr Node access. (#21797 )	2024-08-23 10:19:53 -07:00
crop_and_resize.onnx
cuda_graph_with_shape_nodes.onnx
custom_op_negpos.onnx
custom_op_single_schema_multi_kernel.onnx
custom_op_string_lower.onnx
custom_op_variadic_io.onnx
custom_op_variadic_undef_io.onnx
dummy_t5.onnx	Fix BeamSearch T5 if initializers are on outer scope (#23044 )	2024-12-09 15:15:20 -08:00
dummy_t5_model_generator.py	Use ruff as the formatter to replace black-isort (#23397 )	2025-01-16 11:14:15 -08:00
dummy_t5_pointer_generator.onnx	Enable pointer-generator T5 models in BeamSearch (#23134 )	2024-12-22 21:30:49 -08:00
dummy_t5_with_outer_scope_initializers.onnx	Fix BeamSearch T5 if initializers are on outer scope (#23044 )	2024-12-09 15:15:20 -08:00
dummy_t5_with_sequence_input_ids.onnx	[CUDA EP] Fix BeamSearch on T5 with sequence_as_input_ids (#20667 ) (#20668 )	2024-12-10 16:20:47 -08:00
dynamic_quantize_matmul_int8.onnx
dynamic_quantize_matmul_int8_bias.onnx
dynamic_quantize_matmul_test.py	Target py310 and modernize codebase with ruff (#23401 )	2025-01-16 19:10:14 -08:00
dynamic_quantize_matmul_uint8.onnx
dynamic_quantize_matmul_uint8_bias.onnx
ep_dynamic_graph_input_test.onnx
ep_dynamic_graph_input_test.py
ep_partitioning_test_1.onnx
ep_partitioning_test_2.onnx
ep_partitioning_tests.py	Target py310 and modernize codebase with ruff (#23401 )	2025-01-16 19:10:14 -08:00
flatten_broadcast.onnx
foo_1.onnx
foo_1.onnx.ort
foo_1_clip_11.onnx
foo_3.onnx
foo_bar_1.onnx
foo_bar_2.onnx
foo_bar_3.onnx
function_opset_test.onnx
function_with_variadics.onnx
fuse_mul_1.onnx
fuse_select_filter.onnx
fuse_select_filter_opset_8.onnx
gather_with_scalar_indices_then_shape.onnx
gather_with_scalar_indices_then_shape.py
gh_issue_9671.onnx
gh_issue_11717.onnx
identity_9799.onnx
identity_opt.onnx
identity_string.onnx
initializer_as_output.onnx
invalid_dim_param_value_repetition.onnx	Avoid reusing buffer for node outputs with no consumers (#21019 )	2024-06-13 16:08:16 -07:00
invalid_dim_param_value_repetition.py	Avoid reusing buffer for node outputs with no consumers (#21019 )	2024-06-13 16:08:16 -07:00
issue4829.onnx
LabelEncoder.onnx
layernorm.onnx
layernorm_no_bias.onnx
layout_transform_const_folding.qdq.onnx	Layout transform: Fix-up QDQ units and add constant folding (#20685 )	2024-05-20 20:19:06 -07:00
layout_transform_fix_transpose_without_dq.qdq.onnx	Layout transform: Fix-up QDQ units and add constant folding (#20685 )	2024-05-20 20:19:06 -07:00
layout_transform_nonconst_broadcast_input.onnx
layout_transform_reshape.onnx
layout_transform_reshape.qdq.onnx
logicaland.onnx
make_conv_int4_weights_model.py	[QNN EP] Initial INT4 support (#21171 )	2024-07-10 10:03:53 -07:00
make_qdq_layout_transform_const_folding.py	Layout transform: Fix-up QDQ units and add constant folding (#20685 )	2024-05-20 20:19:06 -07:00
make_transpose_optimizer_empty_dq_q_at_output_model.py	Remove empty (DQ -> Q -> graph output) sequence in TransposeOptimizer (#22172 )	2024-09-24 21:02:17 -07:00
make_transpose_optimizer_per_axis_qdq_models.py	[TransposeOptimizer] Support Unsqueeze/Transpose of input consumed by per-axis DQ (#21821 )	2024-09-05 17:26:17 -07:00
matmul_1.onnx
matmul_2.onnx
matmul_integer_to_float.py	Target py310 and modernize codebase with ruff (#23401 )	2025-01-16 19:10:14 -08:00
matmul_integer_to_float_int8.onnx
matmul_integer_to_float_int8_bias.onnx
matmul_integer_to_float_int8_int8.onnx
matmul_integer_to_float_int8_int8_bias.onnx
matmul_integer_to_float_uint8.onnx
matmul_integer_to_float_uint8_bias.onnx
matmul_with_dynamic_input_shape.onnx
matmul_with_dynamic_input_shape.py
merge.onnx
mlnet_encoder.onnx
mnist.basic.ort
mnist.basic.v4.ort
mnist.internal_testing_ep.ort
mnist.onnx
mnist.readme.txt
mobilenet_v3_small_excerpt.onnx
mobilenet_v3_small_excerpt_gen.py
model_containing_op_with_function_body.onnx
model_resize_empty_optional_input.onnx
model_with_external_initializer_come_from_user.onnx
model_with_external_initializer_come_from_user.py
model_with_external_initializers.onnx	Revert "enable serialize prepacked weights into data file (#22256 )" (#22788 )	2024-11-11 09:59:05 -08:00
model_with_external_initializers.py	Revert "enable serialize prepacked weights into data file (#22256 )" (#22788 )	2024-11-11 09:59:05 -08:00
model_with_invalid_ort_config_json.onnx
model_with_metadata.onnx
model_with_metadata.py
model_with_orig_ext_data.bin
model_with_orig_ext_data.onnx	Revert "enable serialize prepacked weights into data file (#22256 )" (#22788 )	2024-11-11 09:59:05 -08:00
model_with_valid_ort_config_json.onnx
mul_1.noopset.onnx
mul_1.onnx
mul_1_dynamic.onnx
mul_16.onnx
nhwc_conv_clip_relu.onnx
nhwc_conv_clip_relu.txt
nhwc_resize_scales_opset11.onnx
nhwc_resize_scales_opset18.onnx
nhwc_resize_sizes_opset11.onnx
nhwc_resize_sizes_opset18.onnx
nhwc_resize_sizes_opset18.quant.onnx
nnapi_internal_uint8_support.onnx
nnapi_internal_uint8_support.py
nnapi_reshape_flatten_test.onnx
nnapi_reshape_flatten_test.py
nnapi_sigmoid_input_rank_test.onnx
nnapi_sigmoid_input_rank_test.py
onnx_backend_test_series_filters.jsonc	OpenVINO EP Weights Sharing Feature (#23553 )	2025-02-06 14:57:38 -08:00
onnx_backend_test_series_overrides.jsonc
optional_1.onnx
optional_2.onnx
optional_3.onnx
optional_inputs_ir3.onnx
optional_inputs_ir4.onnx
optional_sequence_tensor.onnx
ort_github_issue_4031.onnx
ort_github_issue_4031.onnx.ort
ort_github_issue_4031.py
ort_github_issue_10305.onnx
ort_github_issue_10305.py
ort_github_issue_11536.onnx
ort_github_issue_12151.onnx
ort_github_issue_12151_neg_dq_axis.onnx	[TransposeOptimizer] Fix axis for QuantizeLinear inserted after DQ (per-channel) -> Unsqueeze (#21793 )	2024-08-20 16:26:02 -07:00
ort_github_issue_15949.onnx
ort_github_issue_17000.onnx
ort_github_issue_17000.ort
ort_github_issue_17000.py
ort_github_issue_19590.onnx
ort_github_issue_19590.py
overridable_initializer.onnx
packed_attention_fp16.onnx
packed_attention_fp16.rbp.onnx
packed_attention_fp16.rbp.py
packed_attention_fp32.onnx
Pads.bin
pipeline_vectorize.onnx
pyop_1.onnx
pyop_2.onnx
pyop_3.onnx
qdq_minimal_model.onnx	Fix quantization tools for issue #19529 (#19591 )	2024-04-24 19:16:27 +02:00
qdq_with_multi_consumer_dq_nodes.onnx
qdq_with_multi_consumer_q_dq_axis.onnx
qnn_ctx_2_inputs_order_test.onnx
qnn_ep_partial_support.onnx
reduced_build_test.onnx_model_with_excluded_ops
reduced_build_test.readme.txt
relu_with_optional.onnx	#22890 Fix profiling on empty Optional (#22891 )	2024-11-26 11:18:47 -08:00
required_ops.config
required_ops_and_types.config
required_ops_config.readme.txt
scan_1.onnx
sequence_construct.onnx
sequence_insert.onnx
sequence_length.onnx
shape_then_slice_and_gather.onnx
shape_then_slice_and_gather.py
sklearn_bin_voting_classifier_soft.onnx
sklearn_bin_voting_classifier_soft.onnx.ort
sklearn_bin_voting_classifier_soft.readme.txt
sparse_initializer_as_output.onnx
sparse_initializer_as_output.py	Target py310 and modernize codebase with ruff (#23401 )	2025-01-16 19:10:14 -08:00
sparse_to_dense_matmul.onnx
sparse_to_dense_matmul.py	Target py310 and modernize codebase with ruff (#23401 )	2025-01-16 19:10:14 -08:00
subgraph_implicit_input_from_initializer.onnx
subgraph_input_shadows_outer_scope_value.onnx
test_cast_back_to_back_non_const_mixed_types_origin.onnx
test_conv_follow_convtrans.onnx
test_conv_follow_convtrans_s8.onnx
test_kernel_info_get_const_input.onnx
test_kernel_info_get_const_input.py
test_model_with_fullonnxdomain.onnx
test_resize.onnx
test_training_model.onnx
test_training_model_0.onnx
test_training_model_1.onnx
test_training_model_2.onnx
transpose_optimizer_cancel_squeeze_per_axis_dq.onnx	[TransposeOptimizer] Support Unsqueeze/Transpose of input consumed by per-axis DQ (#21821 )	2024-09-05 17:26:17 -07:00
transpose_optimizer_cancel_transpose_per_axis_dq.onnx	[TransposeOptimizer] Support Unsqueeze/Transpose of input consumed by per-axis DQ (#21821 )	2024-09-05 17:26:17 -07:00
transpose_optimizer_empty_dq_q_at_graph_output.onnx	Remove empty (DQ -> Q -> graph output) sequence in TransposeOptimizer (#22172 )	2024-09-24 21:02:17 -07:00
transpose_optimizer_in_place_transpose_unsqueeze_per_axis_dq.onnx	[TransposeOptimizer] Support Unsqueeze/Transpose of input consumed by per-axis DQ (#21821 )	2024-09-05 17:26:17 -07:00
transpose_optimizer_qdq_fixup_unsqueeze_per_axis_dq.onnx	[TransposeOptimizer] Support Unsqueeze/Transpose of input consumed by per-axis DQ (#21821 )	2024-09-05 17:26:17 -07:00
transpose_optimizer_shared_initializers.onnx
transpose_optimizer_shared_initializers.py	Fix typos according to reviewdog report. (#21335 )	2024-07-22 13:37:32 -07:00
transpose_optimizer_shared_initializers_broadcast.onnx
transpose_optimizer_shared_initializers_broadcast2.onnx
tree_ensemble_as_tensor.onnx
trt_plugin_custom_op_test.onnx
trt_plugin_custom_op_test.py
trt_reshape.onnx
trt_reshape_test.py
unused_initializer.onnx
VariedInputCustomOp.onnx
zipmap_int64float.onnx
zipmap_stringfloat.onnx