onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-04 23:59:56 +00:00

Author	SHA1	Message	Date
Ti-Tai Wang	87f55505b3	[ONNX] Support huggingface BART to ONNX (#12779 ) Add BART into transformer support, specifically for `BartForConditionalGeneration` Motivation and Context - fixes #11210 Currently, the custom op beam search is not working in nightly, this PR should be run with a [custom commit](`10f3d46d92`)	2022-10-06 12:20:03 -07:00
cloudhan	72076b1eb2	Update ROCm CI to use HIP LANGUAGE (#13214 ) Update for ROCm CI before reland tunable GEMM #12853. This PR also updates composable kernel to use CMake's HIP language support so that we can mix C/C++ compiler with HIP compiler instead of locking to hip-clang	2022-10-05 16:15:16 +08:00
Tianlei Wu	b6c04f48c1	Fix reshape fusion (#13150 ) (1) Hot fix for reshape fusion, which made the stable diffusion unet model invalid. (2) Update remove_cascaded_cast_nodes to make it faster	2022-10-04 00:26:29 -07:00
Tony Xia	962fee5fe5	Fix typo enviroment => environment (#13195 )	2022-10-03 17:02:26 -07:00
Yufeng Li	1342baf1c7	refine QuantConfig (#13155 ) Refine the QuantConfig: 1. Remove the default EP config. 2. Pass QuantConfig to quantize API directly.	2022-10-03 08:34:49 -07:00
PeixuanZuo	c26bb1bb19	Allow fastgelu/skiplayernorm profile by pass args from commandline (#13025 ) This allows us to quickly launch a microbench session by, for example: `python skip_layer_norm_test.py 8 128 128 float32`	2022-09-28 15:48:59 -07:00
PeixuanZuo	13d1a3c007	[ROCm] add SkipLayerNorm vectorize Regular case (#12821 ) Add SkipLayerNorm vectorize regular case: 1. When hidden size <= 1024, SkipLayerNormTunable op can use both small case and regular case. 2. When hidden size > 1024, SkipLayerNormTunable op can only use regular case.	2022-09-27 12:52:10 -07:00
Yufeng Li	c746083344	use parameter names to specify argument mapping (#13108 ) use parameter names to specify argument mapping to avoid mismatches.	2022-09-26 20:56:59 -07:00
Chen Fu	e9b1bbc6a5	fix Numpy array None judgement bug (#13103 ) fix https://github.com/microsoft/onnxruntime/issues/13054	2022-09-26 15:15:32 -07:00
Hariharan Seshadri	19c51376c4	Introduce QDQ transformer fusion tools for ordered quantized ops (#12661 )	2022-09-24 23:22:44 -07:00
PeixuanZuo	2ef1f8b93e	[ROCm] add tunable SkipLayerNorm for ROCm EP (#12817 ) Related PR: https://github.com/microsoft/onnxruntime/pull/12803 https://github.com/microsoft/onnxruntime/pull/12816 https://github.com/microsoft/onnxruntime/pull/12821 1. Add tunable skip layernorm for ROCm EP. 2. Keep original implementation when tuning is disabled.	2022-09-23 16:39:44 +08:00
cloudhan	a24b41d92e	Move all TunableOp related facilities to EP level directory (#12857 ) Some Ops in EP directory instead of contrib_ops directory will require TunableOp. We will also need to add EP level session tuning options for it. So move all of that code at once. Also remove duplicated utility functions.	2022-09-23 11:10:19 +08:00
wangxiyuan	952c99304a	Add CANN EP (#12416 ) Description: This PR adds Ascend CANN execution provider support. Motivation and Context: As shown in the linked issue, CANN is the API layer for Ascend processors. Adding the CANN EP allows users to run ONNX models on Ascend hardware via onnxruntime. Detailed changes: 1. Added CANN EP framework. 2. Added the basic operators to support ResNet and VGG models. 3. Added C/C++ and Python API support. Issue: https://github.com/microsoft/onnxruntime/issues/11477 Author: lijiawei <lijiawei19@huawei.com> wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: FFrog <ljw1101.vip@gmail.com>	2022-09-22 14:53:40 -07:00
Hariharan Seshadri	057567f39f	Fix bug in Attention Fusion (#13050 )	2022-09-22 13:46:59 -07:00
sfatimar	cccbe90764	Openvino ep 2022.2 v4.2 (#13023 ) These changes align the OV 2022.2 Release with ORT. Changes: CPU FP16 Support, dGPU Support, RHEL Dockerfile, Ubuntu 20 Dockerfile. Motivation and Context - This change is required to ensure ORT-OpenVINO Execution Provider is aligned with latest changes. Co-authored-by: mayavijx <mayax.vijayan@intel.com> Co-authored-by: shamaksx <shamax.kshirsagar@intel.com> Co-authored-by: pratiksha <pratikshax.bapusaheb.vanse@intel.com> Co-authored-by: pratiksha <mohsinx.mohammad@intel.com> Co-authored-by: Sahar Fatima <sfatima.3001@gmail.com> Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com> Co-authored-by: nmaajidk <n.maajid.khan@intel.com> Co-authored-by: Mateusz Tabaka <mateusz.tabaka@intel.com> Co-authored-by: intel <intel@iotgecsp-nuc04.iind.intel.com>	2022-09-22 12:31:40 -07:00
Jian Chen	051a0a67a5	Cjian/per channels not working (#13038 ) Description: This fixes the bug where per_channel quantization isn't working when axis == 0	2022-09-21 16:24:23 -04:00
Jian Chen	6248b69795	Fixes bug which makes quantized_input_names = [] (#13029 ) Description: Fixes bug in `tools/quantization/operators/split.py` which would make `quantized_input_names == []`	2022-09-21 14:25:38 -04:00
Adrian Lizarraga	39e20686a0	[EP Perf Dashboard] Fix incorrect calls to trtexec with fp16 inputs (#13018 )	2022-09-21 10:31:45 -07:00
cloudhan	a5d70d8609	Allow bert_perf_test.py make some noise by log_severity option (#13024 ) This makes it much easier for developers to inspect the benchmark session.	2022-09-21 18:38:46 +08:00
Justin Chu	1245c6397e	Remove usage of torch.onnx symbolic_registry (#13011 ) Description: symbolic_registry is deprecated in torch.onnx. This PR removes its usage. Fixes #13008	2022-09-20 10:59:41 -07:00
PeixuanZuo	189aef2bea	[ADD] add skip layernorm to kernel explorer for ROCm EP (#12816 ) Related PR: https://github.com/microsoft/onnxruntime/pull/12803 https://github.com/microsoft/onnxruntime/pull/12817 https://github.com/microsoft/onnxruntime/pull/12821 Add skip layernorm to kernel explorer for profiling.	2022-09-20 17:17:01 +08:00
cloudhan	ffeba98a9d	Allow gemm profile by pass args from commandline (#12991 ) This allows us to quickly launch a microbench session by, for example: `python gemm_test.py T N float16 256 256 65536` so that we can quickly see which one is the fastest.	2022-09-20 16:18:56 +08:00
Yufeng Li	b48f71fcfc	fix bug: quantization shape inference (#12983 ) model path for onnx.shape_inference.infer_shapes_path and the external data needs to be under the same directory as doc here: `f4dea9e68b/docs/PythonAPIOverview.md (shape-inference-a-large-onnx-model-2gb)`	2022-09-16 10:17:22 -07:00
cloudhan	d2aa2109c0	Make TunableOp follow stream semantics (#12856 )	2022-09-15 21:11:27 +08:00
Dmitri Smirnov	bc2df1bf95	Remove previously deprecated API (#12935 ) Remove previously deprecated API Format JS code, address review comments NPM Formatting	2022-09-14 10:58:03 -07:00
Tianlei Wu	95c4fc6877	[CUDA] Add TensorRT fused attention fp16 v2 kernels (#12814 ) * Add TensorRT fused attention fp16 kernels * drop sm 72; seq 512 for sm75; and head_size 32 kernels * Add env variable ORT_DISABLE_FUSED_ATTENTION * exclude files in hipify * update AttentionPastState_dynamic test threshold * fix --use_mask_index in benchmark	2022-09-13 15:16:12 -07:00
Tianlei Wu	30ebc9e00a	Useless Cast removal after converting model from float32 to float16 (#12871 )	2022-09-12 11:07:33 -07:00
Jian Chen	e561a7cf29	Adding QuantConfig Class (#12810 ) * Initial commit for testing * Adding DynamicQuantConfig * Adding DynamicQuantConfig * Format file * Adding Default configuration placeholder. * Update onnxruntime/python/tools/quantization/quantize.py Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com> * Reformat file * Reformat Rest Docstring style to google * Update set to frozenset * Update Quant Config * Updates Quant Config * Update enum comparison * Update onnxruntime/python/tools/quantization/quantize.py Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com> * Update Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>	2022-09-09 14:08:47 -04:00
Dwayne Robinson	8e4eb24648	Update operator kernel table to include DML operators (#12887 ) * Fix bug in pybind get_all_operator_schema due to premature reference dropping * Add updated operator kernels markdown table * Update build.py to include documentation generation for DML operators too * Update GPU pipeline to include DML in the build so operators can be generated. * Use a separate pipeline stage, feedback from Changming and Scott * Appease annoying Python linter * Add onnxruntime_BUILD_UNIT_TESTS=OFF and remove stale --use_dml in cuda stage	2022-09-09 10:21:25 -07:00
RandySheriffH	d3b684cd9e	Drop nuphar (#11555 ) * drop nuphar code and configs * refactor test case * format python * remove nuphar from training test * remove commented nuphar logics * restore llvm setting * drop nuphar ci * fix compile err * fix compile err Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2022-09-07 15:11:18 -07:00
Jian Chen	acc8bdc6c5	Splitting quantize_tensor and quantize_input (#12873 ) * Splitting quantize_tensor and quantize_input * Reformat code * Reformat code * Update is_input_a_weight to is_input_a_initializer	2022-09-07 18:05:42 -04:00
petermcaughan	69f7cc6494	Add pybind support for all memory config options in OrtArenaCfg (#12658 ) * Add support for initial_growth_chunk_size_bytes setting in OrtArenaCfg pybind * Add overloaded constructor for KVP, UT still in progress * Fix class member access in pybind, fix unit test * Resolve linter warnings * Improve formatting * Simplify UT * Fix linter formatting Co-authored-by: Peter Mcaughan <petermca@microsoft.com>	2022-09-07 11:15:00 -07:00
Chen Fu	8004db4bf1	fix python import sequence warning (#12864 ) fix python import sequence warning	2022-09-07 09:53:39 -07:00
Tianlei Wu	d19955fd89	fix transformers script issues (#12802 ) Fix a few obvious issues: (1) bert_perf_test.py create session without provider in line 65. (2) compare_bert_results.py miss a parameter in create_session in line 37 (3) onnx_exporter.py returns value mismatch in lines 667, 690. (4) remove some imports not used in the scripts. (5) fusion_utils need not print "Removed 0 cast nodes" or "Removed 0 Identity nodes"... (6) update requirements for numpy version since gpt2 parity tool use equal_nan in numpy v1.19+	2022-09-06 16:15:16 -07:00
Chen Fu	9ad5b95e4f	Fix math domain error with log10 (#12841 ) fix math domain error with log10	2022-09-06 08:54:41 -07:00
Yulong Wang	1a402a3f25	replace 'master' branch ref to 'main' for onnx repo (#12678 )	2022-08-30 13:41:42 -07:00
Chen Fu	d761a7ceb3	Pre-processing of Quantization (#12729 ) Shape Inference and Model Optimization before Quantization. Model quantization with QDQ format, i.e. inserting QuantizeLinear/DeQuantizeLinear on the tensor, requires tensor shape information to perform at its best. Currently, shape inferencing works best with an optimized model. As a result, it is highly recommended to run quantization on an optimized model with shape information. This change adds code for model optimization and shape inferencing in the following three steps: 1. Symbolic shape inference. 2. Model optimization. 3. ONNX shape inference. At the same time, we recommend turning off model optimization during quantization, as the optimization might change the computation graph, making it harder for the QDQ debugger to locate matching tensors between the original and the quantized models.	2022-08-29 15:47:52 -07:00
Dmitri Smirnov	3ff75fa05f	Address static analysis warnings (#12711 ) Address static analysis warnings	2022-08-26 14:24:14 -07:00
cloudhan	5bdb1d4146	Add Tunable GEMM composed from rocblas and composable kernels (#12599 ) * Add tunable gemm	2022-08-26 14:32:56 +08:00
cloudhan	f76b40aa5b	Change TunableOp to use a type erased interface (#12597 ) * Change to type erased interface, so that there is no need to implement a class for a simple kernel launch function	2022-08-25 19:46:04 -07:00
Yulong Wang	c144acc534	Replace 'master' branch ref to 'main' in the code (#12547 )	2022-08-22 10:48:12 -07:00
Wei-Sheng Chin	dc486d146b	Make ORT callable from various Pytorch compilers (LazyTensor, TorchDynamo, etc) (#10460 ) * Make ORT as Pytorch JIT backend LORT likely doesn't work with aten fallback so we only test LORT in its own CI. * Revert changes to enable external CUDA allocator. Will add it later. Revert "Revert changes to enable external CUDA allocator. Will add it later." This reverts commit d5487f2e193014c805505afae8fb577c53667658. Fix external allocator * Relax tolerance and remove commented code * Print more information in CI * Fix pointer * Address comments. 1. Reuse ORT-eager mode's environment. 2. Remove unused ctor. * Use Pytorch master branch as all PRs are merged Fix * Refine based on cpplint feedbacks * Revert changes to allow custom CUDA allocator in public APIs * Use torch.testing.assert_close * Use unittest framework * Switch docker repo * Rename .cpp to .cc * Address comments * Add comment * Use same pipeline file for eager and lort pipelines * Address comments * Add yaml comment * Fix cmake files * Address comments * Rename flags, remove printing code, remove dead comment	2022-08-22 09:40:40 -07:00
Chen Fu	8456f5fd97	qdq_util bug fix (#12647 ) bugfix: when creating a temp infer file, an existing file may be accidentally deleted	2022-08-22 09:32:43 -04:00
Chen Fu	56dd0176a1	QDQ debugger - Adding Error Calculator (#12632 ) QDQ debugger - Adding Error Calculator	2022-08-18 09:30:43 -07:00
Chen Fu	f2db6bb293	weight matching (#12607 ) QDQ loss debug - Weights Matching Part 2 of QDQ loss debugging tool: given a float model and its qdq model, return the matching of all weight tensors and their corresponding dequantized weights from the qdq model.	2022-08-17 11:01:10 -07:00
Tianlei Wu	ce01ed02da	Improve LongformerAttention performance: AddBiasTranspose and New weight format (#12448 ) * add AddBiasTranspose kernel, new format of weights * Use compact global_q in GEMM * sequence_index from BxS to S; new stream for copy * merge input and output pointers in scratch2 * update default benchmark tests * add new format 0 for weight and bias * avoid integer overflow * check gpu memory * output summary in benchmark * add logging * update unit tests with non empty bias value * add rocblasGemmHelper and rocblasGemmStridedBatchedHelper for Rocm	2022-08-17 09:36:48 -07:00
Chen Fu	eb6aa861cf	QDQ debugger - activations compare (#12544 ) Debugger for QDQ loss - activation matching This is the first part of the QDQ debugger tool: activation matching, where we identify and match corresponding activations from the float model and the qdq model. The idea is that during quantization, we have an original float model and a qdq model. The debugger can run the two models side by side using the same input data. By comparing intermediate activations, we can help the model author figure out where the values differ, and take steps to reduce precision loss.	2022-08-15 17:03:28 -07:00
Yufeng Li	30ee5a4f79	release calibrator before deleting temporary files (#12601 )	2022-08-15 16:03:46 -07:00
Yufeng Li	95df5dac51	do not quantize Relu/Clip if their inputs are not quantized (#12565 )	2022-08-11 16:16:10 -07:00
Cheng	819c36701f	[xnnpack] basic QDQ operators support (#11912 ) * basic ops for mobilenet,qconv,qsoftmax,qavgpool update Xnnpack to latest unit test * NodeUnit: use outputedge to replace output-node * qdq model e2e test * use inlinedvector to replace vector * conv bias check * tensorshape helpers * Refactor xnn_op minmax * Qlinearsoftmax schema update * Remove qlinearsoftmax registration Co-authored-by: Jicheng Wen <jicwen@microsoft.com>	2022-08-11 10:12:51 +08:00

820 commits