* GPT2 with one step search tutorial
* remove quantization section
Co-authored-by: Xiaoyu Liu <xiaoyu@xiaoyu-VM.z4vh1dzj5eoevgybsksdpz2izh.jx.internal.cloudapp.net>
* Add FBGEMM submodule
* Add fbgemm based per-channel quantization
* Add missing logic for pre-layernorm transformer model fusion
* add support for the structured pruning architecture (FastFormers)
* Fix windows build
* Add a default behavior when head_size is not present, for backward compatibility
* Remove FBGEMM and default to tensor-wise quantization; column-wise quantization will be enabled later
* Fixed some unit test errors
* Fix windows compile error and unit test errors
* delete the option removed from the upstream
* Addresses review comments and fixes a merge error
* Remove commented out code
* add non-zero zero point (zp) support
* support A and B scale with any dimensions
* fix build breaks
* fix warning in MSVC
* Fix bug where original float value names were not checked before treating a value as nonexistent.
* Clean up head size
* Clean up python tools
* Enable per column quantization
* fix quant weight cleanup bug
* A few code clean-ups
* Some code clean-up
* Some code clean-up
* Change option name
* update default value
* Rename option and parameter names
* Missing argument name change
* Add tests for quantization options for attention and matmul
Co-authored-by: Yufeng Li <liyufeng1987@gmail.com>
Co-authored-by: Lei Zhang <zhang.huanning@hotmail.com>
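The entries above contrast tensor-wise quantization with per-column quantization and mention non-zero zero points. The following is a minimal pure-Python sketch of that difference, assuming asymmetric uint8 quantization. All function names here are hypothetical illustrations; this is not the FBGEMM or ONNX Runtime implementation.

```python
def quant_params(values, qmin=0, qmax=255):
    """Return (scale, zero_point) for asymmetric quantization of `values`."""
    # Extend the range to include 0.0 so that zero maps close to the zero point.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    if hi == lo:
        return 1.0, 0
    scale = (hi - lo) / (qmax - qmin)
    zero_point = int(round(qmin - lo / scale))
    return scale, max(qmin, min(qmax, zero_point))


def quantize(values, scale, zero_point, qmin=0, qmax=255):
    """Quantize a sequence of floats, clamping to [qmin, qmax]."""
    return [max(qmin, min(qmax, int(round(v / scale)) + zero_point)) for v in values]


def dequantize(q, scale, zero_point):
    """Map one quantized value back to a float."""
    return (q - zero_point) * scale


def quantize_per_tensor(matrix):
    """One (scale, zero_point) pair shared by the whole matrix."""
    flat = [v for row in matrix for v in row]
    scale, zero_point = quant_params(flat)
    return [quantize(row, scale, zero_point) for row in matrix], scale, zero_point


def quantize_per_column(matrix):
    """One (scale, zero_point) pair per column of the matrix."""
    columns = list(zip(*matrix))
    params = [quant_params(col) for col in columns]
    q_columns = [quantize(col, s, zp) for col, (s, zp) in zip(columns, params)]
    q_rows = [list(row) for row in zip(*q_columns)]
    return q_rows, [s for s, _ in params], [zp for _, zp in params]
```

When columns have very different value ranges, a single shared scale is dominated by the widest column, so per-column parameters tend to give a smaller round-trip error for the narrow columns.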
catch symbolic shape inference exception.
do not prune the graph when there is an inner graph (Loop/If/Scan)
add a wrapper for numpy_helper.to_array so that we can debug an onnx graph without external data
remove fuse_mask that is not used any more in onnx_model_bert_tf.py
* Use positivity everywhere; handle negative index in Slice
* limit positivity to inputs
* make handle_negative_index private
* strengthen sympy comparison
* further strengthen comparison and a minor refactoring
* Add flip test
* Fall through if -int_max in handle_negative_index()
* minor fix for infer_Concat to include initializers
* Add more tests
* use simplify
* more tests
* check in early stop search as separate type
* rename to beam search configurations
* update do sample configuration flag help
* rename to configurable search step
* add option groups
* add more unit tests
Co-authored-by: Xiaoyu Liu <xiaoyu@xiaoyu-VM.z4vh1dzj5eoevgybsksdpz2izh.jx.internal.cloudapp.net>
* Update symbolic_shape_infer.py
don't rely on static code infer in _infer_Squeeze_
* checking if dropped axes might be != 1
* Checking opset. Logging assumption that symbolic dimensions are unequal to 1.
* more checks
* Implement qlinear concat and unit test.
Add quantization tools for QLinearConcat and its quantization tests.
* Add kernel def hash for QLinearConcat.
* Change according to PR. Add qdq transformer support for QLinearConcat.
* Add QDQ Transformer unittest. Fix typo on domain.
* remove unused duplicate logic.
* fix x86 build error.
* Update operator docs.
* beam search refactoring checkin
* add factory class and deduplicate code
* one step beam search works on gpu
Co-authored-by: Xiaoyu Liu <xiaoyu@xiaoyu-VM.z4vh1dzj5eoevgybsksdpz2izh.jx.internal.cloudapp.net>
* Code refactor
* Modify code to tackle OOM when calibrating on a larger dataset
* Fix mismatch issue when setting keepdims on ReduceMin/ReduceMax
* Add COCO val 2017 annotation
* Fix mismatch issue when setting keepdims on ReduceMin/ReduceMax
* Fix bug of "No module named:onnxruntime.quantization.CalTableFlatBuffers"
* Check and install flatbuffers module
* Add script to download COCO dataset images and refactor example
* Fix bug of "No module named:onnxruntime.quantization.CalTableFlatBuffers"
* Add CalTableFlatBuffers as module
* Remove annotation; users can download it themselves.
* Uncomment code
* Add back instances_val2017.json
* Make sure flatbuffers is installed when ORT is installed
* Refactor code to call coco api
* Enable FP16 for example
* fusion support runtime edge shape checking
* trim ctor
* add test
* fix
* Update test_shape_infer_helper.py
* use torch input size as dynamic axis hints
* check dir
* update
* support longformerattention
* update and add support for bert ops
* trim
* review comments
* review comments
Implemented the following change to avoid the error raised when using both --use_external_data_format and --precision int8 with GPT2LMHeadModel, which results in
line 161, in save_external_data; open(external_data_file_path, 'ab').close()
FileNotFoundError: [Errno 2] No such file or directory:
This may also be related to the identified bug #6047.
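The change itself is not shown in this message. As a hedged illustration only: one common cause of a FileNotFoundError from `open(path, 'ab')` is a missing parent directory. The sketch below assumes that cause and creates the directory first. The helper name `touch_external_data_file` is hypothetical and is not part of onnx or onnxruntime.

```python
import os


def touch_external_data_file(external_data_file_path: str) -> None:
    """Create an empty external data file, creating parent directories if needed.

    Assumption: the original failure came from a missing parent directory.
    """
    parent_dir = os.path.dirname(os.path.abspath(external_data_file_path))
    os.makedirs(parent_dir, exist_ok=True)
    # Append mode creates the file if it is missing and keeps existing content.
    with open(external_data_file_path, "ab"):
        pass
```

Calling the helper on a path whose directory does not yet exist creates both the directory and an empty file. Calling it again leaves the file unchanged.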