onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-16 01:33:39 +00:00

Author	SHA1	Message	Date
Ye Wang	4f82ad1b58	Topo sort the model before saving (#7913 ) * checkin toposort * review comments * revert and add TODO	2021-06-02 16:57:08 -07:00
Tianlei Wu	71b05f74a2	fix duplicated node name (#7865 )	2021-05-27 17:16:17 -07:00
Yufeng Li	94bb09bf47	fix topo sort in quant tool (#7833 ) * fix topo sort in quant tool * add unit test and make the topo sort stable	2021-05-26 17:53:35 -07:00
Xiaoyu Liu	224a664811	GPT-2 one step search tutorial (#7718 ) * GPT2 with one step search tutorial * remove quantization section Co-authored-by: Xiaoyu Liu <xiaoyu@xiaoyu-VM.z4vh1dzj5eoevgybsksdpz2izh.jx.internal.cloudapp.net>	2021-05-18 12:31:39 -07:00
Young Jin Kim	e9057d2e49	ZCode FastFormers changes (#5827 ) * Add FBGEMM submodule * Add fbgemm based per-channel quantization * Add missing logic for pre-layernorm transformer model fusion * add support for structured pruning architecture -fastformers * Fix windows build * Add a default behavior when head_size is not present for the backward compatibility * Remove FBGEMM and default to tensor-wise quantization, column-wise quantization will be enabled later * Fixed some unit test errors * Fix windows compile error and unit test errors * delete the option removed from the upstream * Addresses review comments and fixes a merge error * Remove commented out code * add non-zero zp support * support A and B scale with any dimensions * fix build breaks * fix warning in MSVC * Fix bug for not checking original float value names when treat it as not existing. * Clean up head size * Clean up python tools * Enable per column quantization * fix quant weight cleanup bug * A few code clean up * Some code clean-up * Some code clean-up * Change option name * update default value * Rename option and parameter names * Missing argument name change * Add tests for quantization options for attention and matmul Co-authored-by: Yufeng Li <liyufeng1987@gmail.com> Co-authored-by: Lei Zhang <zhang.huanning@hotmail.com>	2021-05-17 21:12:21 -07:00
Ye Wang	5e8086ad8e	Support fusions inside subgraphs in optimizer tool (#7701 ) * skip subgraph when updating model * intreim checkin * interim checkin 2 * support transformers optimizations in subgraph * change more files * fix comments typo	2021-05-17 12:43:55 -07:00
Yufeng Li	6b0a7905ed	fix quant weight cleanup bug (#7707 )	2021-05-14 22:04:35 -07:00
Zhang Lei	0f7721a019	Fix bug for not checking original float value names when treat it as not existing. (#7695 )	2021-05-14 12:50:30 -07:00
Zhang Lei	033f0b3b7c	fix typo. (#7690 )	2021-05-14 10:25:34 -07:00
Vincent Wang	dac24f7d63	Add ATenOp and call aten::embedding and its Backward Op from ORT (#7590 ) * build with libtorch and impl torchembedding * fix op shape infer * local commit * atenfunctionop * call aten operator from online extension * rollback build.py * resolve comments * bugfix * fix build * fix ortmodule test * remove external outputs, resolve comments * resolve comments * export embedding to microsoft::atenop * bugfix	2021-05-13 09:24:27 +08:00
Zhang Lei	1c7e683a95	Add Squeeze and Unsqueeze support for quantizaton tools. (#7673 )	2021-05-12 14:56:46 -07:00
Zhang Lei	31d4413919	fix quantization tool bug when existing pass through only input (#7674 )	2021-05-12 14:54:42 -07:00
Olivia Jain	29172d8f54	Setup EP Dashboard (#7321 ) * setting up dashboard * posting to ort dashboard * creating separate docker file * including common deps * tracking latency over time	2021-05-11 10:33:39 -07:00
Tianlei Wu	55c086b664	symbolic shape inference improvements for contrib ops (#7606 ) * add EmbedLayerNormalization * use onnx shape inference for Unsqueeze * Fix type warning in Attention	2021-05-07 17:03:24 -07:00
Tianlei Wu	d88da44066	Allow flexible order of Add inputs in Attention fusion (#7565 )	2021-05-06 09:43:28 -07:00
Zhang Lei	9465948715	Quantization tools using one more extra_options on interface. (#7293 ) handle nnapi special sigmoid options.	2021-05-05 13:51:50 -07:00
Zhang Lei	f6cefc92e2	Add quantized value map after quantize input node added. (#7558 )	2021-05-04 15:27:56 -07:00
Tianlei Wu	3c9ece4a11	[transformers optimizer] catch symbolic shape inference exception and clean up (#7560 ) catch symbolic shape inference exception. no prune graph when there is inner graph (Loop/If/Scan) add an wrapper for numpy_helper.to_array so that we can debug onnx graph without external data remove fuse_mask that is not used any more in onnx_model_bert_tf.py	2021-05-03 20:42:13 -07:00
Tianlei Wu	731f9e5033	Fix symbolic shape inference for Unsqueeze (#7555 ) * fix Unsqueeze shape inference * add tests	2021-05-03 18:06:59 -07:00
Ryota Tomioka	d1cb8c9dc9	Support negative indices and fix bound checking in symbolic shape inference for Slice (#7401 ) * Use positivity everywhere; handle negative index in Slice * limit positivity to inputs * make handle_negative_index private * strengthen sympy comparison * further strengthen comparison and a minor refactoring * Add flip test * Fall through if -int_max in handle_negative_index() * minor fix for infer_Concat to include initializers * Add more tests * use simplify * more tests	2021-05-03 09:07:55 -07:00
Xiaoyu Liu	994c2ed420	GPT2 one step beam search update with configuration support (#7425 ) * check in early stop search as separate type * rename to beam search configurations * update do sample configuration flag help * rename to configurable search step * add option groups * add more unit tests Co-authored-by: Xiaoyu Liu <xiaoyu@xiaoyu-VM.z4vh1dzj5eoevgybsksdpz2izh.jx.internal.cloudapp.net>	2021-04-29 13:19:56 -07:00
thilow	22d7cde725	Fix a 'Squeeze' related issue in symbolic_shape_infer.py (#7380 ) * Update symbolic_shape_infer.py don't rely on static code infer in _infer_Squeeze_ * checking if dorpped axes might be =! 1 * Checking opset. Logging assumption that symbolic dimensions are unequal to 1. * more checks	2021-04-28 13:13:04 -07:00
Zhang Lei	ada0fbbd2d	Implement qlinear concat and unit test. (#7341 ) * Implement qlinear concat and unit test. Add quantization tools for QLinearConcat and it quantization tests. * Add kernel def hash for QLinearConcat. * Change according to PR. Add qdq transformer support for QLinearConcat. * Add QDQ Transformer unittest. Fix typo on domain. * remove dup logic of no use. * fix x86 build error. * Update operator docs.	2021-04-26 13:38:40 -07:00
Xiaoyu Liu	913ea8264b	GPT2 with one step beam search (#7163 ) * beam search refactoring checkin * add factory class and deduplicate code * one step beam search works on gpu Co-authored-by: Xiaoyu Liu <xiaoyu@xiaoyu-VM.z4vh1dzj5eoevgybsksdpz2izh.jx.internal.cloudapp.net>	2021-04-20 06:23:52 -07:00
Tianlei Wu	aa9ab565f5	FastGelu fusion for Megatron model (#7344 ) * add a fastgelu pattern from Megatron model * update comment * add test	2021-04-15 00:39:33 -07:00
Oliver Rausch	87bd836886	Fixes in symbolic shape inference (#7258 ) * Add symbolic shape inference for Transpose * Support steps in symbolic shape inference for Slice * Add inference for BatchNormalization * Address review changes * Address review changes	2021-04-13 22:17:30 -07:00
Zhang Lei	f62db1a09c	quantization tools support qlinear average pool (#7309 )	2021-04-13 18:22:42 -07:00
Zhang Lei	a4fdb4dbd9	Support transpose by merge Reshape etc into direct xint8 operators. (#7265 ) * Suppose transpose by merge Reshape etc into direct xint8 operators. * Add resize operator quantization support * Add QDQ tests for resize, reshape, maxpool, transpose.	2021-04-08 18:00:35 -07:00
KeDengMS	0d49e53985	[Symbolic shape infer] fix scalar shape in Expand (#7285 )	2021-04-08 10:26:28 -07:00
Olivia Jain	fb40602ea2	Mem trt (#6868 ) * adding trt comparison and memory consumption * creating separate docker file	2021-04-05 22:16:12 -07:00
Marek Šuppa	008065aab1	Update README.md (#7043 ) * Fix the precision type (switch from nonexistent `int32` to `fp32`).	2021-04-05 10:03:14 -07:00
Yufeng Li	8d737f9770	handle optional input in quant topo sort (#7223 )	2021-04-02 20:42:48 -07:00
Yufeng Li	c4ebc60870	sort quantized nodes in topo logical order (#7172 )	2021-03-30 09:01:15 -07:00
Yufeng Li	77c19436c0	add a notebook for mobilenetv2 quantization (#7164 ) * add a notebook for quant mobilenetv2	2021-03-29 13:24:14 -07:00
Ashwini Khade	b22e60bd44	pull onnx latest commit (#7102 ) * update onnx commit * fix test scripts to remove deprecated call * update filters * add registration for relu and cumsum ver 14 * add promote trilu to onnx domain * update onnx-tensorrt submodule * update flag * update flag * update dependencies * fix android ci failure	2021-03-29 11:00:38 -07:00
Yufeng Li	3771e0bf10	update bert quantization notebook (#7137 )	2021-03-25 18:12:53 -07:00
Yufeng Li	8e54b76e2d	QDQ implementation (#7033 ) * Add QDQ basic implementation	2021-03-25 09:17:23 -07:00
Yufeng Li	fffe16cb43	Fix a bug in quant GEMM and add an unit test (#7111 )	2021-03-23 16:39:35 -07:00
Yufeng Li	c965878a69	fix a bug in global average pool and add unit test (#6913 ) * fix bug in QGlobalAveragePool * add unit test for quant GlobalAveragePool * not run quantization tests if disable_contrib_ops enabled	2021-03-22 20:01:27 -07:00
Scott McKay	b2c6617b0f	Use 'as_scalar' when checking the 'cond' value of 'If' (#7063 ) #6884	2021-03-22 18:04:38 +10:00
Chi Lo	8c3b59a026	Quantization calibration refactor (#6893 ) * Code refactor * Modify code to tackle OOM when calibrating on larget dataset * Fix mismatch issue when setting keepdims on ReduceMin/ReduceMax * Add COCO val 2017 annotation * Fix mismatch issue when setting keepdims on ReduceMin/ReduceMax * Fix bug of "No module named:onnxruntime.quantization.CalTableFlatBuffers" * Check and install flatbuffers module * Add script to donwload coco dataset image and refactor example * Fix bug of "No module named:onnxruntime.quantization.CalTableFlatBuffers" * Add CalTableFaltBuffers as module * Remove annotation, user can download by themselves. * Uncommet code * Add back instances_val2017.json * Make sure flatbuffers installed when ORT is installed * Refactor code to call coco api * Enable FP16 for example	2021-03-19 01:09:11 -07:00
Cecilia Liu	4fd9fef9ee	Support HuggingFace Models Converted From tf2onnx in Python Script (#6985 ) Support tf2onnx huggingface models in python script	2021-03-17 15:33:57 -07:00
Tianlei Wu	73d085ccdd	add slow test (#7035 )	2021-03-16 20:49:51 -07:00
Ye Wang	4e670f7ab1	Support larger hidden size in Attention Cuda kernel (#7002 ) * Support larger hidden size in Attention Cuda kernel * Update attention_transpose.cu * review comments * fix typo and add check in quantization * update readme	2021-03-15 15:46:10 -07:00
Ye Wang	b57a85d863	Support symbolic shape infer in transformers tool (#6899 ) * fusion support runtime edge shape checking * trim ctor * add test * fix * Update test_shape_infer_helper.py * use torch input size as dynamic axis hints * check dir * update * support longformerattention * update and add support for bert ops * trim * review comments * review comments	2021-03-10 21:37:12 -08:00
Tianlei Wu	4884eee642	Attention fusion detect num_heads and hidden_size automatically (#6920 )	2021-03-10 10:17:00 -08:00
Funtowicz Morgan	9126faa35b	Ability to fuse non-square (pruned) attention weights for BERT-like models (#6850 )	2021-03-04 17:08:08 -08:00
Reuben Zotz-Wilson	107c9672fd	No such file or directory with --use_external_data_form and int8 (#6867 ) Implemented following change to avoid the error when using both --use_external_data_form and --precision int8 with GPT2LMHeadModel, which results in line 161, in save_external_data; open(external_data_file_path, 'ab').close() FileNotFoundError: [Errno 2] No such file or directory: This may also be related to the identified bug #6047.	2021-03-04 15:14:23 -08:00
Tianlei Wu	8f1786d5d2	Save output tensors in bert_test_data tool (#6872 )	2021-03-04 13:09:05 -08:00
Faith Xu	6285ee2398	Reroute quantization tool readme to /docs page (#6854 )	2021-03-02 13:49:42 -08:00

251 commits