Commit graph

382 commits

Author SHA1 Message Date
pengwa
ccc4487553
fix CI onnxruntime_test_python_sparse_matmul.py (#14039)
### Description

NumPy 1.24.0 removed the deprecated `np.float` alias.
```

  /opt/hostedtoolcache/Python/3.8.15/x64/bin/python onnxruntime_test_python_sparse_matmul.py
EE.
======================================================================
ERROR: testRunContribSparseMatMul (__main__.TestSparseToDenseMatmul)
Mutliple sparse COO tensor to dense
----------------------------------------------------------------------
Traceback (most recent call last):
  File "onnxruntime_test_python_sparse_matmul.py", line 407, in testRunContribSparseMatMul
    np.float,
  File "/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/numpy/__init__.py", line 284, in __getattr__
    raise AttributeError("module {!r} has no attribute "
AttributeError: module 'numpy' has no attribute 'float'

======================================================================
ERROR: testRunSparseOutputOnly (__main__.TestSparseToDenseMatmul)
Try running models using the new run_with_ort_values
----------------------------------------------------------------------
Traceback (most recent call last):
  File "onnxruntime_test_python_sparse_matmul.py", line 39, in testRunSparseOutputOnly
    values = np.array([1.764052391052246, 0.40015721321105957, 0.978738009929657], np.float)
  File "/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/numpy/__init__.py", line 284, in __getattr__
    raise AttributeError("module {!r} has no attribute "
AttributeError: module 'numpy' has no attribute 'float'

```
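
A minimal sketch of the kind of replacement this implies, assuming the test only needs a float64 array (`np.float` was an alias for the builtin `float`). The exact change made in the PR is not shown in this message:

```python
import numpy as np

# np.float was only an alias for Python's builtin float (i.e. float64).
# Passing float or np.float64 as the dtype works on old and new NumPy alike.
values = np.array(
    [1.764052391052246, 0.40015721321105957, 0.978738009929657],
    dtype=np.float64,
)
print(values.dtype)
```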



2022-12-21 17:31:52 +08:00
Zhang Lei
fba09faf5b
Implement reuse past and present tensor in Attention Ops. (#13791)
Implement reuse kv_cache past and present tensor in Attention Ops. 
Unit test for the above feature.
Utilize the reuse kv_cache for past and present tensor in Greedy Search.
Correctness test for it.
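
To illustrate the general idea of sharing one buffer between past and present, here is a toy sketch. It is not the Attention op implementation: `SharedKVBuffer`, its single-head layout, and the shapes are assumptions made for illustration only.

```python
import numpy as np


class SharedKVBuffer:
    """Toy single-head cache: past and present share one preallocated buffer."""

    def __init__(self, max_seq_len: int, head_size: int):
        self.buffer = np.zeros((max_seq_len, head_size), dtype=np.float32)
        self.length = 0  # number of valid positions (the "past")

    def append(self, new_kv: np.ndarray) -> np.ndarray:
        """Write new rows in place and return a view of the "present"."""
        new_len = self.length + new_kv.shape[0]
        if new_len > self.buffer.shape[0]:
            raise ValueError("sequence exceeds preallocated max_seq_len")
        self.buffer[self.length:new_len] = new_kv
        self.length = new_len
        return self.buffer[:new_len]  # a view into the buffer, not a copy


cache = SharedKVBuffer(max_seq_len=8, head_size=4)
cache.append(np.ones((3, 4), dtype=np.float32))  # prompt tokens
present = cache.append(np.full((1, 4), 2.0, dtype=np.float32))  # one decoded token
```

Because `present` is a view, each decoding step only writes the new rows instead of concatenating past and new values into a fresh tensor.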

Co-authored-by: Zhang Lei <phill.zhang@gmail.com>
2022-12-18 10:03:53 -08:00
Jian Chen
d7d932c1c2
Cjian/where python operator (#12795)
**Description**: 
This PR enables the Python tool to run QWhere and QDQWhere operations.

**Limitation**:
s8s8 Where is still not supported.
2022-12-12 13:27:47 -08:00
Jian Chen
b8d941f065
Cjian/pad ops bug (#13930) 2022-12-12 10:23:49 -08:00
Tianlei Wu
abe1642a0c
Update fusion for distilbert accuracy test on SQuAD (#13748)
(1) Embed layer fusion to work with --use_mask_index.
(2) Parse num_heads and hidden_size from a pattern of Concat shape node.
(3) Fix a typo (CUDAExcecutionProvider=> CUDAExecutionProvider) in eval_squad.py
(4) Update example comments in eval_squad.py to use optimized fp16 model.
(5) Update tests in test_optimizer.py
2022-11-29 13:06:39 -08:00
Ted Themistokleous
c6bea4f02f
Modify MIGraphX EP for Accuracy tests (#13455)
Allows MIGraphX EP to run the following additional tests. Also adds support to get MIGraphX to run eval_squad.py

Reference to the Rocm EP changes: https://github.com/microsoft/onnxruntime/pull/13306

Co-authored-by: Joseph Groenenboom <joseph.groenenboom@amd.com>
Co-authored-by: Ted Themistokleous <tthemist@amd.com>
2022-11-27 18:26:49 +08:00
PeixuanZuo
8f3c6ea0df
[ROCm] Add GemmFastGelu TunableOp (#13589)
### Description
1. Update the rules for GemmFastGelu fusion: the MatMul input x must have at least
two dimensions, and the weight input must have exactly two dimensions.
2. Add GemmFastGelu fusion test.
3. Add a GemmFastGelu TunableOp, which currently contains only the original
implementation (Gemm + FastGelu).
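
The rank rule in item 1 can be sketched as a small predicate. `can_fuse_gemm_fast_gelu` is a hypothetical helper, not the fusion code from this PR, and `fast_gelu` uses the widely used tanh approximation of GELU, which is assumed here to be what FastGelu computes:

```python
import math


def can_fuse_gemm_fast_gelu(x_shape, weight_shape) -> bool:
    # Hypothetical check: input x needs rank >= 2, the weight needs rank == 2.
    return len(x_shape) >= 2 and len(weight_shape) == 2


def fast_gelu(x: float) -> float:
    # tanh approximation of GELU (assumed formula, scalar version).
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


print(can_fuse_gemm_fast_gelu((8, 128, 768), (768, 3072)))
print(can_fuse_gemm_fast_gelu((768,), (768, 3072)))
```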


Co-authored-by: peixuanzuo <peixuanzuo@linmif39a000004.zvflicr54joexhdgnhvmxrxygg.phxx.internal.cloudapp.net>
2022-11-22 12:58:01 +08:00
cloudhan
9e649d1ac4
Allow CUDA EP enable or disable TunableOp via session options and environment variable (#13601)
This ports #13116 from ROCm EP to CUDA EP
2022-11-15 14:43:54 +08:00
Ye Wang
df796bbb62
cast logits to half when T=MLFloat16 (#13454)
2022-11-03 16:40:19 -07:00
Yi Zhang
1885460776
skip some models failed in dynamic shape infer (#13400)
### Motivation and Context
Some models from model zoo failed in the Linux CPU workflow.
https://github.com/onnx/models/issues/562
Skip them temporarily.

### Verification
Linux CPU CI passed with beta image

https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=789772&view=results
**2022-10-21T13:31:17.6740348Z Skip symbolic shape inference on :
/mnt/vss/_work/1/b/Release/../models/zoo/opset12/Inception-1-int8/inception-v1-12-int8.onnx**
2022-10-21T13:31:17.6740998Z Running symbolic shape inference on :
/mnt/vss/_work/1/b/Release/../models/zoo/opset12/DenseNet-121-12-int8/densenet-12-int8.onnx
2022-10-21T13:31:17.6741618Z Running symbolic shape inference on :
/mnt/vss/_work/1/b/Release/../models/zoo/opset12/MNIST-12/mnist-12.onnx
**2022-10-21T13:31:17.6742207Z Skip symbolic shape inference on :
/mnt/vss/_work/1/b/Release/../models/zoo/opset12/SSD-int8/ssd-12-int8.onnx**
2022-10-21T13:31:17.6742898Z Running symbolic shape inference on :
/mnt/vss/_work/1/b/Release/../models/zoo/opset12/ResNet50_fp32/resnet50-v1-12.onnx
2022-10-21T13:31:17.6743544Z Running symbolic shape inference on :
/mnt/vss/_work/1/b/Release/../models/zoo/opset12/MobileNet
v2-1.0-fp32/mobilenetv2-12.onnx
2022-10-21T13:31:17.6744259Z Running symbolic shape inference on :
/mnt/vss/_work/1/b/Release/../models/zoo/opset12/ResNet101_DUC_HDC-12/ResNet101-DUC-12.onnx
2022-10-21T13:31:17.6744891Z Running symbolic shape inference on :
/mnt/vss/_work/1/b/Release/../models/zoo/opset12/YOLOv3-12-int8/yolov3-12-int8.onnx
2022-10-21T13:31:17.6745501Z Running symbolic shape inference on :
/mnt/vss/_work/1/b/Release/../models/zoo/opset12/AlexNet/bvlcalexnet-12.onnx
2022-10-21T13:31:17.6746114Z Running symbolic shape inference on :
/mnt/vss/_work/1/b/Release/../models/zoo/opset12/ZFNet-512-int8/zfnet512-12-int8.onnx
**2022-10-21T13:31:17.6746768Z Skip symbolic shape inference on :
/mnt/vss/_work/1/b/Release/../models/zoo/opset12/SSD-MobilenetV1-12-int8/ssd_mobilenet_v1_12-int8.onnx**
2022-10-25 01:48:46 +08:00
cloudhan
fc12abf6b1
Enable/Disbale tunable GEMM by using tunable switch in provider options and env var (#13116)
Related PRs #12853

This allows the user to enable or disable tunable GEMM on demand.
2022-10-19 22:35:08 -07:00
fxmarty
4fe6b23699
Fix typo OpTypesToExcludeOutputQuantizatioin (#13096)
Change all occurrences of `OpTypesToExcludeOutputQuantizatioin` to `OpTypesToExcludeOutputQuantization`
2022-10-14 14:11:37 -07:00
Yufeng Li
1342baf1c7
refine QuantConfig (#13155)
Refine the QuantConfig: 1. Remove the default EP config. 2. Pass
QuantConfig to the quantize API directly.
2022-10-03 08:34:49 -07:00
Chen Fu
e9b1bbc6a5
fix Numpy array None judgement bug (#13103)
fix https://github.com/microsoft/onnxruntime/issues/13054
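
The commit message does not spell out the bug. One common pitfall of this kind, shown here purely as an assumption about what "None judgement" refers to, is testing a NumPy array for `None` with `==` or plain truthiness instead of `is None`. The helper name `is_missing` is hypothetical:

```python
import numpy as np


def is_missing(arr) -> bool:
    # `arr == None` compares elementwise, and `not arr` raises ValueError for
    # arrays with more than one element, so use an identity check instead.
    return arr is None


zeros = np.zeros(3)
try:
    bool(zeros)  # truthiness of a multi-element array is ambiguous
    ambiguous = False
except ValueError:
    ambiguous = True

print(is_missing(None), is_missing(zeros), ambiguous)
```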
2022-09-26 15:15:32 -07:00
Jian Chen
44c14e8cbb
Adding test case for conv per channel with QDQ format (#13041)
**Description**: Adding test case for conv per channel with QDQ format
2022-09-26 16:25:28 -04:00
Jian Chen
6248b69795
Fixes bug which makes quantized_input_names = [] (#13029)
**Description**: Fixes bug in `tools/quantization/operators/split.py`
which would make `quantized_input_names == []`
2022-09-21 14:25:38 -04:00
Chen Fu
77b567df66
test qdq loss presence (#12928)
**Description**: Change qdq debugger test oracle

Instead of testing against a threshold, which occasionally fails, we just test
that the loss value is present.
2022-09-19 15:58:27 -07:00
Yufeng Li
b48f71fcfc
fix bug: quantization shape inference (#12983)
The model path passed to onnx.shape_inference.infer_shapes_path and the external
data need to be under the same directory, as documented here:
f4dea9e68b/docs/PythonAPIOverview.md (shape-inference-a-large-onnx-model-2gb)
2022-09-16 10:17:22 -07:00
RandySheriffH
d3b684cd9e
Drop nuphar (#11555)
* drop nuphar code and configs

* refactor test case

* format python

* remove nuphar from training test

* remove commented nuphar logics

* restore llvm setting

* drop nuphar ci

* fix compile err

* fix compile err

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2022-09-07 15:11:18 -07:00
petermcaughan
69f7cc6494
Add pybind support for all memory config options in OrtArenaCfg (#12658)
* Add support for initial_growth_chunk_size_bytes setting in OrtArenaCfg pybind

* Add overloaded constructor for KVP, UT still in progress

* Fix class member access in pybind, fix unit test

* Resolve linter warnings

* Improve formatting

* Simplify UT

* Fix linter formatting

Co-authored-by: Peter Mcaughan <petermca@microsoft.com>
2022-09-07 11:15:00 -07:00
Chen Fu
d761a7ceb3
Pre-processing of Quantization (#12729)
Shape Inference and Model Optimization before Quantization

Model quantization with the QDQ format, i.e. inserting QuantizeLinear/DequantizeLinear on
the tensors, requires tensor shape information to perform at its best. Currently, shape inference
works best on an optimized model. As a result, it is highly recommended to run quantization
on an optimized model with shape information.

This change adds code for model optimization and shape inference, performed in the following three steps:

1. Symbolic shape inference
2. Model optimization
3. ONNX shape inference

At the same time, we recommend turning model optimization off during quantization itself,
because the optimization might change the computation graph, making it harder for the QDQ debugger
to locate matching tensors between the original and the quantized models.
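
As a rough usage sketch, assuming the pre-processing entry point is the `quant_pre_process` function in `onnxruntime.quantization.shape_inference` and that onnxruntime and onnx are installed. The wrapper name `preprocess_for_quantization` is made up for illustration:

```python
def preprocess_for_quantization(input_path: str, output_path: str) -> None:
    # Imported lazily so this sketch can be loaded without onnxruntime installed.
    from onnxruntime.quantization.shape_inference import quant_pre_process

    # With default arguments this is expected to run symbolic shape inference,
    # model optimization and ONNX shape inference, then write the
    # pre-processed model to output_path.
    quant_pre_process(input_path, output_path)
```

A call such as `preprocess_for_quantization("model.onnx", "model.preprocessed.onnx")` (placeholder file names) would then be followed by quantizing the pre-processed model.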
2022-08-29 15:47:52 -07:00
Chen Fu
8456f5fd97
qdq_util bug fix (#12647)
bugfix: when creating a temp infer file, an existing file may be accidentally deleted
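
The message does not say how the fix works. One general way to avoid clobbering existing files, sketched here with a hypothetical helper `write_temp_model`, is to let `tempfile.mkstemp` pick a unique name:

```python
import os
import tempfile


def write_temp_model(data: bytes, suffix: str = ".onnx") -> str:
    # mkstemp atomically creates a new, uniquely named file, so it never
    # reuses the path of a file that already exists.
    fd, path = tempfile.mkstemp(prefix="inferred_", suffix=suffix)
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path


first = write_temp_model(b"a")
second = write_temp_model(b"b")
both_exist = os.path.exists(first) and os.path.exists(second)

# Clean up only the files this sketch created.
for path in (first, second):
    os.remove(path)
```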
2022-08-22 09:32:43 -04:00
Chen Fu
56dd0176a1
QDQ debugger - Adding Error Calculator (#12632)
QDQ debugger - Adding Error Calculator
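
The message does not name the error metric. As an assumption, one common choice for comparing a float tensor with its quantized counterpart is the signal-to-quantization-noise ratio (SQNR) in decibels; `sqnr_db` below is an illustrative helper, not the function added by this commit:

```python
import math


def sqnr_db(reference, quantized) -> float:
    # Signal power is the sum of squares of the float values; noise power is
    # the sum of squared differences to the quantized values.
    signal = sum(x * x for x in reference)
    noise = sum((x - y) ** 2 for x, y in zip(reference, quantized))
    if noise == 0.0:
        return math.inf  # identical tensors: no quantization noise
    if signal == 0.0:
        return -math.inf  # all-zero reference with nonzero noise
    return 10.0 * math.log10(signal / noise)


print(sqnr_db([10.0, 0.0], [9.0, 0.0]))
```

Higher values mean the quantized tensor is closer to the float one.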
2022-08-18 09:30:43 -07:00
Chen Fu
f2db6bb293
weight matching (#12607)
QDQ loss debug - Weights Matching

Part 2 of QDQ loss debugging tool: given a float model and its qdq model, return the matching of all weight tensors and their corresponding dequantized weights from the qdq model.
2022-08-17 11:01:10 -07:00
Chen Fu
eb6aa861cf
QDQ debugger - activations compare (#12544)
Debugger for QDQ loss - activation matching

This is the first part of the QDQ debugger tool: activation matching, where we identify and match corresponding activations from the float model and the qdq model. The idea is that during quantization, we have an original float model and a qdq model. The debugger can run the two models side by side using the same input data. By comparing intermediate activations, we can help the model author figure out where the values differ, and take steps to reduce precision loss.
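
The matching step can be pictured with plain dictionaries. This is only a sketch: `match_activations` and the `_DequantizeLinear_Output` suffix are assumptions for illustration, not the debugger's actual API or naming convention.

```python
# Assumed naming convention for dequantized tensors in the qdq model.
DEQUANT_SUFFIX = "_DequantizeLinear_Output"


def match_activations(float_acts: dict, qdq_acts: dict) -> dict:
    """Pair each float activation with its dequantized counterpart, if any."""
    matches = {}
    for name, float_values in float_acts.items():
        dq_name = name + DEQUANT_SUFFIX
        if dq_name in qdq_acts:
            matches[name] = {"float": float_values, "post_qdq": qdq_acts[dq_name]}
    return matches


float_acts = {"conv1_out": [0.11, 0.52], "relu1_out": [0.11, 0.52]}
qdq_acts = {"conv1_out_DequantizeLinear_Output": [0.10, 0.50]}
matched = match_activations(float_acts, qdq_acts)
```

Both dictionaries would come from running the two models on the same input; tensors with no counterpart in the qdq model are simply left out.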
2022-08-15 17:03:28 -07:00
Yufeng Li
30ee5a4f79
release calibrator before deleting temporary files (#12601) 2022-08-15 16:03:46 -07:00
Yufeng Li
95df5dac51
do not quantize Relu/Clip if their inputs are not quantized (#12565) 2022-08-11 16:16:10 -07:00
Chen Fu
b2382dc43a
fix qdq relu removal bug (#12542)
Fix minor bug in qdq quantization tool

Motivation and Context
A Relu node is removed by the qdq quantization tool if it can be merged into its input node. When performing the removal, we forgot to check whether the input is actually a graph input.
2022-08-10 14:06:51 -07:00
Cheng
64e991a9fc
[Qlinearsoftmax] contrib cpu (#12177)
* [Qlinearsoftmax] contrib cpu

* int8 implementation

* contrib operator md

* qdq transformer test

* new attribute: opset

* doc

* quantized tool

* remove template to reduce Binary size

* doc of contrib operators

* enforce x_shape is valid

* fix reduce_size if input-shape is dynamic

* add UT

* register one op for reducing binarysize

* kernel hash update

* docs/ContribOperators.md
2022-08-10 10:52:02 +08:00
Chen Fu
47b787c28f
Python module for dumping activation tensors when running an ONNX model (#12474)
Python module for dumping activation tensors when running an ONNX model

This is the first step towards a quantization debugging tool. We dump the activation tensors. Next step would be to compare them: original model vs quantized model (running with same input) to see where the difference becomes significant.
2022-08-09 13:15:45 -07:00
Jian Chen
8c5c283471
new quantized operators split (#12495)
* adding conditional variable again

* Adding split test cases in python

* Adding python cases for split

* Enable s8s8 split

* Optimize input

* Revert "Remove git and python packages from the docker images used by Zip-Nuget-Java-Nodejs Packaging Pipeline (#11651)"

This reverts commit d5e34acb

* Revert "Revert "Remove git and python packages from the docker images used by Zip-Nuget-Java-Nodejs Packaging Pipeline (#11651)""

This reverts commit 3c1a330dd3afeb55aa7eabb8ebea39b6deb37bad.

* format file

* Update c-api-linux-cpu.yml

* Update c-api-linux-cpu.yml

* Update c-api-linux-cpu.yml

* Reformat file

* Reformat file

* format file

* Optimize input

* Remove unused import

* Remove useless init

* Format split.py with black
2022-08-08 15:12:09 -04:00
Yufeng Li
bdd6b00c9a
set zero point to 0 if all value are 0.0 (#12470)
* set zero point to 0 if all values are 0.0

* fix bug: lower version of numpy.finfo doesn't have smallest_subnormal

* check scale to make sure it is not subnormal
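
A minimal sketch of the guards listed above, using a hypothetical helper (`compute_scale_zero_point` is not the onnxruntime function) and a hard-coded smallest normal float32, so it does not depend on which attributes `numpy.finfo` offers:

```python
# Smallest positive normal float32; anything below it is subnormal.
FLOAT32_TINY = 2.0 ** -126


def compute_scale_zero_point(rmin, rmax, qmin=0, qmax=255):
    # Extend the range to include 0.0 so that zero is exactly representable.
    rmin = min(rmin, 0.0)
    rmax = max(rmax, 0.0)
    scale = (rmax - rmin) / (qmax - qmin)
    if scale < FLOAT32_TINY:
        # All values are (nearly) 0.0: use a safe scale and zero point 0.
        return 1.0, 0
    zero_point = int(round(qmin - rmin / scale))
    # Clamp the zero point into the quantized range.
    return scale, max(qmin, min(qmax, zero_point))


print(compute_scale_zero_point(0.0, 0.0))
print(compute_scale_zero_point(-128.0, 127.0, qmin=-128, qmax=127))
```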
2022-08-07 21:34:58 -07:00
Yufeng Li
ac10f33d2d
Enable quant op to share quantization parameter between input and ouput (#12408)
* share quant param between tensors
2022-08-03 21:25:35 -07:00
Ye Wang
89ac61f4d4
support gpt2 model with greedy search (#12068)
* greedy search gpt2 cpu checkin

* add cuda support

* add test

* provider

* update

* fix some bugs

* refactor impl class

* refactor test

* remove unused func

* refactor parameters class

* simplify padding

* fix lint warnings

* python format

* Revert "python format"

This reverts commit f25fe1017fa33d960b2418ebbb5dba6a4bd043cf.

* python format

* fix pipelines

* fix pipeline

* move bufferallocater to generate_impl_base

* review comments(alignment, filename/namespace change)

* rebase2

* python reformat

* reformat

* fix rocm build

* review comment

* review comments

* review comments

* fix a bug

* rebase test files

* python format

* format import order

* review comments

* fix build
2022-07-22 15:45:16 -07:00
Yufeng Li
7194ec1894
fix bug: output of Concat is quantized twice in qdq format (#12254) 2022-07-21 14:55:47 -07:00
Ye Wang
5066ef1185
Fix a bug in beam search custom attention mask allocation (#12240) 2022-07-20 23:42:54 -07:00
Yulong Wang
0c78b71352
prepare test folder from GitHub (#12220)
* consume onnx test data from github

* ensure tests

* update script and allow opset specification

* fix python format

* fix python format

* consume new filter format

* fix linting error
2022-07-20 22:01:08 -07:00
Tianlei Wu
568d08994f
fix test_optimizer.py (#12219)
* fix optimizer test
* update message and skip test instead of uncomment
* fix deprecated warning
2022-07-20 19:21:26 -07:00
Tianlei Wu
972e5e7300
Improve symbolic shape inference in transformers tools (#12217)
Improve symbolic shape inference handling in transformers tools: avoid an infinite loop and suppress duplicate warnings
2022-07-19 13:27:35 -07:00
Alexey Gladyshev
66978c7ef5
[TVM EP][CI] Added TVMso EP testing into CI (#12188)
* refactor test for model with undefined shapes

* add test for TVMso EP

* update build script for TVM EP tests

* fix pylint

* disable test for Windows

* fix black

* fix python format

* fix pylint

* fix python format

* replace Path.resolve with os.path.join

* fix python path issue

Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
2022-07-19 16:05:28 +02:00
Yufeng Li
3446a3750c
generate quantization parameter for outputs (#12089) 2022-07-05 14:57:43 -07:00
Tianlei Wu
ecca6f4d16
Move beamsearch shared initializers from subgraphs to main graph (#12025)
* move shared initializers to parent graph
* add --disable_shared_initializers
2022-06-29 22:43:41 -07:00
Gary Miguel
4bf22e2a40
Update ONNX to 1.12 (#11924)
Follow-ups that need to happen after this and before the next ORT release:
* Support SequenceMap with https://github.com/microsoft/onnxruntime/pull/11731
* Support signal ops with https://github.com/microsoft/onnxruntime/pull/11778

Follow-ups that need to happen after this but don't necessarily need to happen before the release:
* Implement LayerNormalization kernel for opset version 17: https://github.com/microsoft/onnxruntime/issues/11916

Fixes #11640
2022-06-21 17:19:52 -07:00
Tianlei Wu
6ee2c1b5fc
Remove temperature input from BeamSearch operator (#11896)
* remove temperature input
* update index of remaining inputs
2022-06-20 09:50:45 -07:00
George Wu
df5ee6aa4e
[TensorRT EP] support TensorRT 8.4 (#11866)
* update trt 8.4ga

* trt 8.4 linux ci pipeline

* fix cmake

* placeholder_builder

* trt 8.4 windows pipeline

* gpu package pipeline

* trt 8.4.1.5 , packaging pipeline updates

* python packaging

* ctest timeout

* python packaging test

* bump timeout

* python format

* format

* revert

* newline

* enable trt python tests

* typo

* python format

* disable on windows
2022-06-16 07:46:40 -07:00
Xavier Dupré
a805a49363
Move OrtValueVector from onnxruntime-training to onnxruntime (#11176)
* Move OrtValueVector from onnxruntime-training to onnxruntime

* disable dlpack on onnxruntime

* disable dlpack

* dlpack

* opaque included in any cc file of the python binding

* fix type issue

* fix incomplete name

* remove len()

* remove unused parameter

* black

* black

* black

* remove unused import

* add unit test to check the output type

* black

* lint

* lint

* lint

* fix method name

* Update onnxruntime/python/onnxruntime_pybind_ortvalue.cc

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>

* Update onnxruntime/python/onnxruntime_pybind_ortvalue.cc

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>

* Update onnxruntime/python/onnxruntime_pybind_ortvalue.cc

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>

* Update onnxruntime/python/onnxruntime_pybind_ortvalue.cc

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>

* Update onnxruntime/python/onnxruntime_pybind_ortvalue.cc

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>

* Update onnxruntime/test/python/onnxruntime_test_python_sparse_matmul.py

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>

* Update onnxruntime/test/python/onnxruntime_test_python_sparse_matmul.py

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>

* check return type of C API

* lint

* lint

* fix missing ;

* fix type issue

* fix merge issue

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
2022-06-15 09:36:28 +02:00
Gary Miguel
e8b0d24071
Support per-test tolerances for ONNX tests (#11775)
Prior to this every test shared the same tolerances. This meant
that if an ONNX test failed due to a small but acceptable difference in
output, the only alternative was to disable the test entirely.

In op set 17, the DFT operator is being added. Without this change, the
tests for that operator fail because the output is off by about 5e-5.
It's better to keep test coverage for this new op rather than disable
the test entirely.

Also prior to this change, the global tolerances were not shared between
C++, JavaScript, and Python tests. Now they are.
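
The idea of per-test tolerances layered over shared defaults can be sketched as follows. The default values, the test name `test_dft`, and the override are assumptions for illustration, not the values used by the ONNX Runtime test runners:

```python
import math

# Assumed shared defaults and one hypothetical per-test override.
DEFAULT_TOLERANCES = {"atol": 1e-5, "rtol": 1e-4}
PER_TEST_OVERRIDES = {"test_dft": {"atol": 1e-4}}


def tolerances_for(test_name: str) -> dict:
    # Start from the defaults and apply any override for this test.
    merged = dict(DEFAULT_TOLERANCES)
    merged.update(PER_TEST_OVERRIDES.get(test_name, {}))
    return merged


def outputs_match(test_name: str, expected, actual) -> bool:
    tol = tolerances_for(test_name)
    return len(expected) == len(actual) and all(
        math.isclose(e, a, rel_tol=tol["rtol"], abs_tol=tol["atol"])
        for e, a in zip(expected, actual)
    )


print(outputs_match("test_dft", [0.0], [5e-5]))
print(outputs_match("test_add", [0.0], [5e-5]))
```

With this layout a difference of about 5e-5 passes only for the test that declares a looser tolerance, while every other test keeps the stricter defaults.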

Also fix various minor issues raised by linters.

Unblocks https://github.com/microsoft/onnxruntime/issues/11640.
2022-06-14 15:12:23 -07:00
Tianlei Wu
def78a1b81
Support T5 in BeamSearch operator (#11450)
(1) Support T5 in BeamSearch operator, and add both CPU and CUDA implementation.
(2) Change BeamSearch op: rename encoder_decoder_init attribute to encoder, and add decoder_start_token_id attribute
(3) Update convert_to_onnx for T5 to use int32 instead of int64 inputs as default.
(4) Add more tests in best_beam_search.py
(5) fix ORT_ENFORCE of hypothesis_buffer_offset_
(6) Improve ONNX conversion:
   (a) Change some encoder dynamic axes to fixed dim values
   (b) add --separate_encoder_and_decoder_init
   (c) correct name t5-3B => t5-3b, t5-11B => t5-11b
   (d) Add --use_int32_inputs in convert t5 to onnx
   (e) Allow t5 beam search conversion in one step
2022-06-10 15:06:57 -07:00
Vincent Wang
5ecfaef042
ATen Fallback for Inference (#11597)
* aten op for inference

* fix build error

* move some code to training only

* remove domain from operator name

* move aten_op_executor ext out from ortmodule

* add pipeline

* add exec mode

* fix script

* fix ut script

* fix test pipeline

* failure test

* rollback

* bugfix

* resolve comments

* enable aten for python build only

* fix win build

* use target_compile_definitions

* support io binding

* turn off aten by default

* fix ut

Co-authored-by: Vincent Wang <weicwang@microsoft.com>
Co-authored-by: zhijxu <zhijxu@microsoft.com>
2022-06-09 16:07:30 +08:00
Yufeng Li
f6f457aa57
not remove relu/clip for symmetric activation (#11696)
* not remove relu/clip for symmetric activation
2022-06-07 18:02:31 -07:00