onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-05-19 21:32:23 +00:00

Author	SHA1	Message	Date
pengwa	ccc4487553	fix CI onnxruntime_test_python_sparse_matmul.py (#14039 ) ### Description NumPy 1.24.0 removed the `np.float` alias. ``` /opt/hostedtoolcache/Python/3.8.15/x64/bin/python onnxruntime_test_python_sparse_matmul.py EE. ====================================================================== ERROR: testRunContribSparseMatMul (__main__.TestSparseToDenseMatmul) Mutliple sparse COO tensor to dense ---------------------------------------------------------------------- Traceback (most recent call last): File "onnxruntime_test_python_sparse_matmul.py", line 407, in testRunContribSparseMatMul np.float, File "/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/numpy/__init__.py", line 284, in __getattr__ raise AttributeError("module {!r} has no attribute " AttributeError: module 'numpy' has no attribute 'float' ====================================================================== ERROR: testRunSparseOutputOnly (__main__.TestSparseToDenseMatmul) Try running models using the new run_with_ort_values ---------------------------------------------------------------------- Traceback (most recent call last): File "onnxruntime_test_python_sparse_matmul.py", line 39, in testRunSparseOutputOnly values = np.array([1.764052391052246, 0.40015721321105957, 0.978738009929657], np.float) File "/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/numpy/__init__.py", line 284, in __getattr__ raise AttributeError("module {!r} has no attribute " AttributeError: module 'numpy' has no attribute 'float' ```	2022-12-21 17:31:52 +08:00
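The traceback in the entry above comes from `np.float`, a deprecated alias that NumPy 1.24 removed. The actual diff is not shown in this log, so the following is only a minimal sketch of the kind of replacement that resolves such an error, assuming the test just needs a 64-bit float dtype:

```python
import numpy as np

# Before (raises AttributeError on NumPy >= 1.24):
#   values = np.array([...], np.float)
# np.float was only an alias for the builtin float, so either an explicit
# NumPy dtype or the builtin itself can take its place.
values = np.array([1.764052391052246, 0.40015721321105957, 0.978738009929657], np.float64)
same_values = np.array([1.764052391052246, 0.40015721321105957, 0.978738009929657], float)

# Both arrays end up with the same 64-bit float dtype.
print(values.dtype, same_values.dtype)
```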
Zhang Lei	fba09faf5b	Implement reuse past and present tensor in Attention Ops. (#13791 ) Implement reuse kv_cache past and present tensor in Attention Ops. Unit test for the above feature. Utilize the reuse kv_cache for past and present tensor in Greedy Search. Correctness test for it. Co-authored-by: Zhang Lei <phill.zhang@gmail.com>	2022-12-18 10:03:53 -08:00
Jian Chen	d7d932c1c2	Cjian/where python operator (#12795 ) Description: This PR will enable the python tool to run QWhere and QDQWhere operation Limitation: s8s8 Where is still not supported.	2022-12-12 13:27:47 -08:00
Jian Chen	b8d941f065	Cjian/pad ops bug (#13930 )	2022-12-12 10:23:49 -08:00
Tianlei Wu	abe1642a0c	Update fusion for distilbert accuracy test on SQuAD (#13748 ) (1) Embed layer fusion to work with --use_mask_index. (2) Parse num_heads and hidden_size from a pattern of Concat shape node. (3) Fix a typo (CUDAExcecutionProvider=> CUDAExecutionProvider) in eval_squad.py (4) Update example comments in eval_squad.py to use optimized fp16 model. (5) Update tests in test_optimizer.py	2022-11-29 13:06:39 -08:00
Ted Themistokleous	c6bea4f02f	Modify MIGraphX EP for Accuracy tests (#13455 ) Allows MIGraphX EP to run the following additional tests. Also adds support to get MIGraphX to run eval_squad.py Reference to the Rocm EP changes: https://github.com/microsoft/onnxruntime/pull/13306 Co-authored-by: Joseph Groenenboom <joseph.groenenboom@amd.com> Co-authored-by: Ted Themistokleous <tthemist@amd.com>	2022-11-27 18:26:49 +08:00
PeixuanZuo	8f3c6ea0df	[ROCm] Add GemmFastGelu TunableOp (#13589 ) ### Description 1. Update the rules for GemmFastGelu fusion: MatMul input x must have at least two dimensions, and the input weight must have exactly two dimensions. 2. Add GemmFastGelu fusion test. 3. Add GemmFastGelu TunableOp, which only contains the original implementation (Gemm + FastGelu). Co-authored-by: peixuanzuo <peixuanzuo@linmif39a000004.zvflicr54joexhdgnhvmxrxygg.phxx.internal.cloudapp.net>	2022-11-22 12:58:01 +08:00
cloudhan	9e649d1ac4	Allow CUDA EP enable or disable TunableOp via session options and environment variable (#13601 ) This ports #13116 from ROCm EP to CUDA EP	2022-11-15 14:43:54 +08:00
Ye Wang	df796bbb62	cast logits to half when T=MLFloat16 (#13454 )	2022-11-03 16:40:19 -07:00
Yi Zhang	1885460776	skip some models failed in dynamic shape infer (#13400 ) ### Motivation and Context Some models from the model zoo failed in the Linux CPU workflow. https://github.com/onnx/models/issues/562 Skip them temporarily. ### Verification Linux CPU CI passed with beta image https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=789772&view=results 2022-10-21T13:31:17.6740348Z Skip symbolic shape inference on : /mnt/vss/_work/1/b/Release/../models/zoo/opset12/Inception-1-int8/inception-v1-12-int8.onnx 2022-10-21T13:31:17.6740998Z Running symbolic shape inference on : /mnt/vss/_work/1/b/Release/../models/zoo/opset12/DenseNet-121-12-int8/densenet-12-int8.onnx 2022-10-21T13:31:17.6741618Z Running symbolic shape inference on : /mnt/vss/_work/1/b/Release/../models/zoo/opset12/MNIST-12/mnist-12.onnx 2022-10-21T13:31:17.6742207Z Skip symbolic shape inference on : /mnt/vss/_work/1/b/Release/../models/zoo/opset12/SSD-int8/ssd-12-int8.onnx 2022-10-21T13:31:17.6742898Z Running symbolic shape inference on : /mnt/vss/_work/1/b/Release/../models/zoo/opset12/ResNet50_fp32/resnet50-v1-12.onnx 2022-10-21T13:31:17.6743544Z Running symbolic shape inference on : /mnt/vss/_work/1/b/Release/../models/zoo/opset12/MobileNet v2-1.0-fp32/mobilenetv2-12.onnx 2022-10-21T13:31:17.6744259Z Running symbolic shape inference on : /mnt/vss/_work/1/b/Release/../models/zoo/opset12/ResNet101_DUC_HDC-12/ResNet101-DUC-12.onnx 2022-10-21T13:31:17.6744891Z Running symbolic shape inference on : /mnt/vss/_work/1/b/Release/../models/zoo/opset12/YOLOv3-12-int8/yolov3-12-int8.onnx 2022-10-21T13:31:17.6745501Z Running symbolic shape inference on : /mnt/vss/_work/1/b/Release/../models/zoo/opset12/AlexNet/bvlcalexnet-12.onnx 2022-10-21T13:31:17.6746114Z Running symbolic shape inference on : /mnt/vss/_work/1/b/Release/../models/zoo/opset12/ZFNet-512-int8/zfnet512-12-int8.onnx 2022-10-21T13:31:17.6746768Z Skip symbolic shape inference on : /mnt/vss/_work/1/b/Release/../models/zoo/opset12/SSD-MobilenetV1-12-int8/ssd_mobilenet_v1_12-int8.onnx	2022-10-25 01:48:46 +08:00
cloudhan	fc12abf6b1	Enable/Disable tunable GEMM by using tunable switch in provider options and env var (#13116 ) Related PRs #12853 This allows the user to enable/disable tunable GEMM on demand.	2022-10-19 22:35:08 -07:00
fxmarty	4fe6b23699	Fix typo OpTypesToExcludeOutputQuantizatioin (#13096 ) Change all occurrences of `OpTypesToExcludeOutputQuantizatioin` into `OpTypesToExcludeOutputQuantization`	2022-10-14 14:11:37 -07:00
Yufeng Li	1342baf1c7	refine QuantConfig (#13155 ) Refine the QuantConfig: 1. Remove the default EP config. 2. Pass QuantConfig to the quantize API directly.	2022-10-03 08:34:49 -07:00
Chen Fu	e9b1bbc6a5	fix Numpy array None judgement bug (#13103 ) fix https://github.com/microsoft/onnxruntime/issues/13054	2022-09-26 15:15:32 -07:00
Jian Chen	44c14e8cbb	Adding test case for conv per channel with QDQ format (#13041 ) Description: Adding test case for conv per channel with QDQ format	2022-09-26 16:25:28 -04:00
Jian Chen	6248b69795	Fixes bug which makes quantized_input_names = [] (#13029 ) Description: Fixes bug in `tools/quantization/operators/split.py` which would make `quantized_input_names == []`	2022-09-21 14:25:38 -04:00
Chen Fu	77b567df66	test qdq loss presence (#12928 ) Description: Change the qdq debugger test oracle: instead of testing against a threshold, which occasionally fails, we just test that the loss value is present.	2022-09-19 15:58:27 -07:00
Yufeng Li	b48f71fcfc	fix bug: quantization shape inference (#12983 ) model path for onnx.shape_inference.infer_shapes_path and the external data needs to be under the same directory as doc here: `f4dea9e68b/docs/PythonAPIOverview.md (shape-inference-a-large-onnx-model-2gb)`	2022-09-16 10:17:22 -07:00
RandySheriffH	d3b684cd9e	Drop nuphar (#11555 ) * drop nuphar code and configs * refactor test case * format python * remove nuphar from training test * remove commented nuphar logics * restore llvm setting * drop nuphar ci * fix compile err * fix compile err Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2022-09-07 15:11:18 -07:00
petermcaughan	69f7cc6494	Add pybind support for all memory config options in OrtArenaCfg (#12658 ) * Add support for initial_growth_chunk_size_bytes setting in OrtArenaCfg pybind * Add overloaded constructor for KVP, UT still in progress * Fix class member access in pybind, fix unit test * Resolve linter warnings * Improve formatting * Simplify UT * Fix linter formatting Co-authored-by: Peter Mcaughan <petermca@microsoft.com>	2022-09-07 11:15:00 -07:00
Chen Fu	d761a7ceb3	Pre-processing of Quantization (#12729 ) Shape Inference and Model Optimization before Quantization Model quantization with QDQ format, i.e. inserting QuantizeLinear/DeQuantizeLinear on the tensor, requires tensor shape information to perform at its best. Currently, shape inferencing works best with an optimized model. As a result, it is highly recommended to run quantization on an optimized model with shape information. This change adds code for model optimization and shape inferencing in the following three steps: 1. Symbolic shape inference. 2. Model optimization. 3. ONNX shape inference. At the same time, we recommend turning model optimization off during quantization, as the optimization might change the computation graph, making it harder for the QDQ debugger to locate matching tensors between the original and the quantized models.	2022-08-29 15:47:52 -07:00
Chen Fu	8456f5fd97	qdq_util bug fix (#12647 ) bugfix: when creating a temp infer file, an existing file may be accidentally deleted	2022-08-22 09:32:43 -04:00
Chen Fu	56dd0176a1	QDQ debugger - Adding Error Calculator (#12632 ) QDQ debugger - Adding Error Calculator	2022-08-18 09:30:43 -07:00
Chen Fu	f2db6bb293	weight matching (#12607 ) QDQ loss debug - Weights Matching Part 2 of QDQ loss debugging tool: given a float model and its qdq model, return the matching of all weight tensors and their corresponding dequantized weights from the qdq model.	2022-08-17 11:01:10 -07:00
Chen Fu	eb6aa861cf	QDQ debugger - activations compare (#12544 ) Debugger for QDQ loss - activation matching This is the first part of the QDQ debugger tool: activation matching, where we identify and match corresponding activations from the float model and the qdq model. The idea is that during quantization, we have an original float model and a qdq model. The debugger can run the two models side by side using the same input data. By comparing intermediate activations, we can help the model author figure out where the values differ, and take steps to reduce precision loss.	2022-08-15 17:03:28 -07:00
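The side-by-side comparison described in the entry above can be sketched as follows. This is not the debugger's actual API, which this log does not show: the function name, the tensor names, and the choice of maximum absolute difference as the metric are all assumptions made for illustration, and activations are modelled as flat lists of floats keyed by tensor name.

```python
from typing import Dict, List, Tuple


def compare_activations(
    float_acts: Dict[str, List[float]],
    qdq_acts: Dict[str, List[float]],
) -> List[Tuple[str, float]]:
    """Return (tensor name, max absolute difference), largest difference first.

    Hypothetical helper: only tensors present in both runs with the same
    number of elements are compared.
    """
    report = []
    for name, reference in float_acts.items():
        candidate = qdq_acts.get(name)
        if candidate is None or len(candidate) != len(reference):
            continue  # no matching activation in the qdq run
        max_diff = max((abs(a - b) for a, b in zip(reference, candidate)), default=0.0)
        report.append((name, max_diff))
    report.sort(key=lambda item: item[1], reverse=True)
    return report


# Toy data standing in for activations dumped from the two model runs.
float_run = {"conv1_out": [0.5, 1.0, -0.25], "relu1_out": [0.5, 1.0, 0.0]}
qdq_run = {"conv1_out": [0.5, 1.0, -0.25], "relu1_out": [0.5, 0.75, 0.0], "extra": [1.0]}
report = compare_activations(float_run, qdq_run)
print(report)
```

Sorting by the largest difference puts the tensors where the two models diverge most at the top, which is where a model author would likely start looking.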
Yufeng Li	30ee5a4f79	release calibrator before deleting temporary files (#12601 )	2022-08-15 16:03:46 -07:00
Yufeng Li	95df5dac51	do not quantize Relu/Clip if their inputs are not quantized (#12565 )	2022-08-11 16:16:10 -07:00
Chen Fu	b2382dc43a	fix qdq relu removal bug (#12542 ) Fix minor bug in qdq quantization tool Motivation and Context Relu node is removed in qdq quantization tool if it can be merged to its input node. When performing the removal, we forgot to check whether the input is actually the graph input	2022-08-10 14:06:51 -07:00
Cheng	64e991a9fc	[Qlinearsoftmax] contrib cpu (#12177 ) * [Qlinearsoftmax] contrib cpu * int8 implementation * contrib operator md * qdq transformer test * new attribute: opset * doc * quantized tool * remove template to reduce Binary size * doc of contribe operators * enforce x_shape is valid * fix reduce_size if input-shape is dynamic * add UT * register one op for reducing binarysize * kernel hash update * docs/ContribOperators.md	2022-08-10 10:52:02 +08:00
Chen Fu	47b787c28f	Python module for dumping activation tensors when running an ONNX model (#12474 ) Python module for dumping activation tensors when running an ONNX model This is the first step towards a quantization debugging tool. We dump the activation tensors. Next step would be to compare them: original model vs quantized model (running with same input) to see where the difference becomes significant.	2022-08-09 13:15:45 -07:00
Jian Chen	8c5c283471	new quantized operators split (#12495 ) * adding conditional variable again * Adding split test cases in python * Adding python cases for split * Enable s8s8 split * Optimize input * Revert "Remove git and python packages from the docker images used by Zip-Nuget-Java-Nodejs Packaging Pipeline (#11651)" This reverts commit `d5e34acb` * Revert "Revert "Remove git and python packages from the docker images used by Zip-Nuget-Java-Nodejs Packaging Pipeline (#11651)"" This reverts commit 3c1a330dd3afeb55aa7eabb8ebea39b6deb37bad. * format file * Update c-api-linux-cpu.yml * Update c-api-linux-cpu.yml * Update c-api-linux-cpu.yml * Reformat file * Reformat file * format file * Optimize input * Remove unused import * Remove useless init * Format split.py with black	2022-08-08 15:12:09 -04:00
Yufeng Li	bdd6b00c9a	set zero point to 0 if all value are 0.0 (#12470 ) * set zero point to 0 if all value are 0.0 * fix bug: lower version of numpy.finfo doesn't have smallest_subnormal * check scale to make sure it is not subnormal	2022-08-07 21:34:58 -07:00
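The guard described in the entry above can be illustrated with a small sketch. It is not the code from the commit: the function name, the uint8 range, the fallback scale of 1.0, and the use of `sys.float_info.min` (the smallest positive normal double) as the subnormal threshold are assumptions for illustration.

```python
import sys
from typing import Tuple


def compute_scale_zero_point(rmin: float, rmax: float, qmin: int = 0, qmax: int = 255) -> Tuple[float, int]:
    """Hypothetical asymmetric quantization parameters for the range [rmin, rmax]."""
    # Extend the range to include 0.0 so that zero is exactly representable.
    rmin = min(rmin, 0.0)
    rmax = max(rmax, 0.0)
    scale = (rmax - rmin) / (qmax - qmin)
    # An all-zero tensor gives scale == 0.0, and a tiny range gives a
    # subnormal scale; both fall below the smallest normal double.
    # In that case fall back to zero point 0 (scale 1.0 is an assumed default).
    if scale < sys.float_info.min:
        return 1.0, 0
    zero_point = round(qmin - rmin / scale)
    zero_point = max(qmin, min(qmax, zero_point))  # clamp to the quantized range
    return scale, zero_point


print(compute_scale_zero_point(0.0, 0.0))       # all values are 0.0
print(compute_scale_zero_point(-128.0, 127.0))  # ordinary range
```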
Yufeng Li	ac10f33d2d	Enable quant op to share quantization parameter between input and output (#12408 ) * share quant param between tensors	2022-08-03 21:25:35 -07:00
Ye Wang	89ac61f4d4	support gpt2 model with greedy search (#12068 ) * greedy search gpt2 cpu checkin * add cuda support * add test * provider * update * fix some bugs * refactor impl class * refactor test * remove unused func * refactor parameters class * simplify padding * fix lint warnings * python format * Revert "python format" This reverts commit f25fe1017fa33d960b2418ebbb5dba6a4bd043cf. * python format * fix pipelines * fix pipeline * move bufferallocater to generate_impl_base * review comments(alignment, filename/namespace change) * rebase2 * python reformat * reformat * fix rocm build * review comment * review comments * review comments * fix a bug * rebase test files * python format * format import order * review comments * fix build	2022-07-22 15:45:16 -07:00
Yufeng Li	7194ec1894	fix bug: output of Concat is quantized twice in qdq format (#12254 )	2022-07-21 14:55:47 -07:00
Ye Wang	5066ef1185	Fix a bug in beam search custom attention mask allocation (#12240 )	2022-07-20 23:42:54 -07:00
Yulong Wang	0c78b71352	prepare test folder from GitHub (#12220 ) * consume onnx test data from github * ensure tests * update script and allow opset specification * fix python format * fix python format * consume new filter format * fix linting error	2022-07-20 22:01:08 -07:00
Tianlei Wu	568d08994f	fix test_optimizer.py (#12219 ) * fix optimizer test * update message and skip test instead of uncomment * fix deprecated warning	2022-07-20 19:21:26 -07:00
Tianlei Wu	972e5e7300	Improve symbolic shape inference in transformers tools (#12217 ) improve symbolic shape inference handling in transformers tools: avoid infinite loop and suppress duplicated warnings	2022-07-19 13:27:35 -07:00
Alexey Gladyshev	66978c7ef5	[TVM EP][CI] Added TVMso EP testing into CI (#12188 ) * refactor test for model with undefined shapes * add test for TVMso EP * update build script for TVM EP tests * fix pylint * disable test for Windows * fix black * fix python format * fix pylint * fix python format * replace Path.resolve with os.path.join * fix python path issue Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>	2022-07-19 16:05:28 +02:00
Yufeng Li	3446a3750c	generate quantization parameter for outputs (#12089 )	2022-07-05 14:57:43 -07:00
Tianlei Wu	ecca6f4d16	Move beamsearch shared initializers from subgraphs to main graph (#12025 ) * move shared initializers to parent graph * add --disable_shared_initializers	2022-06-29 22:43:41 -07:00
Gary Miguel	4bf22e2a40	Update ONNX to 1.12 (#11924 ) Follow-ups that need to happen after this and before the next ORT release: * Support SequenceMap with https://github.com/microsoft/onnxruntime/pull/11731 * Support signal ops with https://github.com/microsoft/onnxruntime/pull/11778 Follow-ups that need to happen after this but don't necessarily need to happen before the release: * Implement LayerNormalization kernel for opset version 17: https://github.com/microsoft/onnxruntime/issues/11916 Fixes #11640	2022-06-21 17:19:52 -07:00
Tianlei Wu	6ee2c1b5fc	Remove temperature input from BeamSearch operator (#11896 ) * remove temperature input * update index of remaining inputs	2022-06-20 09:50:45 -07:00
George Wu	df5ee6aa4e	[TensorRT EP] support TensorRT 8.4 (#11866 ) * update trt 8.4ga * trt 8.4 linux ci pipeline * fix cmake * placeholder_builder * trt 8.4 windows pipeline * gpu package pipeline * trt 8.4.1.5 , packaging pipeline updates * python packaging * ctest timeout * python packaging test * bump timeout * python format * format * revert * newline * enable trt python tests * typo * python format * disable on windows	2022-06-16 07:46:40 -07:00
Xavier Dupré	a805a49363	Move OrtValueVector from onnxruntime-training to onnxruntime (#11176 ) * Move OrtValueVector from onnxruntime-training to onnxruntime * disable dlpack on onnxruntime * disable dlpack * dlpack * opaque inlcuded in any cc file of the python binding * fix type issue * fix incomplete name * remove len() * remove unused parameter * black * black * black * remove unused import * add unit test to check the output type * black * lint * lint * lint * fix method name * Update onnxruntime/python/onnxruntime_pybind_ortvalue.cc Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * Update onnxruntime/python/onnxruntime_pybind_ortvalue.cc Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * Update onnxruntime/python/onnxruntime_pybind_ortvalue.cc Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * Update onnxruntime/python/onnxruntime_pybind_ortvalue.cc Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * Update onnxruntime/python/onnxruntime_pybind_ortvalue.cc Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * Update onnxruntime/test/python/onnxruntime_test_python_sparse_matmul.py Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * Update onnxruntime/test/python/onnxruntime_test_python_sparse_matmul.py Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * check return type of C API * lint * lint * fix missing ; * fix type issue * fix merge issue Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>	2022-06-15 09:36:28 +02:00
Gary Miguel	e8b0d24071	Support per-test tolerances for ONNX tests (#11775 ) Prior to this every test shared the same tolerances. This meant that if an ONNX test failed due to a small but acceptable difference in output, the only alternative was to disable the test entirely. In op set 17, the DFT operator is being added. Without this change, the tests for that operator fail because the output is off by about 5e-5. It's better to keep test coverage for this new op rather than disable the test entirely. Also prior to this change, the global tolerances were not shared between C++, JavaScript, and Python tests. Now they are. Also fix various minor issues raised by linters. Unblocks https://github.com/microsoft/onnxruntime/issues/11640.	2022-06-14 15:12:23 -07:00
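The idea of per-test tolerances in the entry above can be sketched as a set of shared defaults plus overrides keyed by test name. The names, the default values, and the `test_dft` override below are hypothetical; the commit's actual file format and numbers are not shown in this log.

```python
import math
from typing import Dict, Sequence

# Hypothetical global defaults shared by every test.
DEFAULT_TOLERANCES: Dict[str, float] = {"atol": 1e-5, "rtol": 1e-3}
# Hypothetical per-test overrides; only the listed keys replace the defaults.
PER_TEST_OVERRIDES: Dict[str, Dict[str, float]] = {"test_dft": {"atol": 1e-4}}


def tolerances_for(test_name: str) -> Dict[str, float]:
    """Merge the defaults with any override registered for this test."""
    merged = dict(DEFAULT_TOLERANCES)
    merged.update(PER_TEST_OVERRIDES.get(test_name, {}))
    return merged


def outputs_match(test_name: str, expected: Sequence[float], actual: Sequence[float]) -> bool:
    """Compare two flat outputs element-wise using the test's tolerances."""
    tol = tolerances_for(test_name)
    return len(expected) == len(actual) and all(
        math.isclose(e, a, rel_tol=tol["rtol"], abs_tol=tol["atol"])
        for e, a in zip(expected, actual)
    )


# An output that is off by 5e-5 fails under the defaults but passes
# for the test that has a looser absolute tolerance.
print(outputs_match("test_other", [0.0], [5e-5]))
print(outputs_match("test_dft", [0.0], [5e-5]))
```

Keeping the override next to the test name means a single noisy operator can be loosened without disabling its test or relaxing the tolerance for every other test.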
Tianlei Wu	def78a1b81	Support T5 in BeamSearch operator (#11450 ) (1) Support T5 in BeamSearch operator, and add both CPU and CUDA implementation. (2) Change BeamSearch op: rename encoder_decoder_init attribute to encoder, and add decoder_start_token_id attribute (3) Update convert_to_onnx for T5 to use int32 instead of int64 inputs as default. (4) Add more tests in best_beam_search.py (5) fix ORT_ENFORCE of hypothesis_buffer_offset_ (6) Improve ONNX conversion: (a) Change encoder some dynamic axes to fixed dim value (b) add --separate_encoder_and_decoder_init (c) correct name t5-3B => t5-3b, t5-11B => t5-11b (d) Add --use_int32_inputs in convert t5 to onnx (e) Allow t5 beam search conversion in one step	2022-06-10 15:06:57 -07:00
Vincent Wang	5ecfaef042	ATen Fallback for Inference (#11597 ) * aten op for inference * fix build error * more some code to training only * remove domain from operator name * move aten_op_executor ext out from ortmodule * add pipeline * add exec mode * fix script * fix ut script * fix test pipeline * failure test * rollback * bugfix * resolve comments * enable aten for python build only * fix win build * use target_compile_definitions * support io binding * turn off aten by default * fix ut Co-authored-by: Vincent Wang <weicwang@microsoft.com> Co-authored-by: zhijxu <zhijxu@microsoft.com>	2022-06-09 16:07:30 +08:00
Yufeng Li	f6f457aa57	not remove relu/clip for symmetric activation (#11696 ) * not remove relu/clip for symmetric activation	2022-06-07 18:02:31 -07:00


382 commits