### Description
Numpy1.24.0 removed the np.float.
```
/opt/hostedtoolcache/Python/3.8.15/x64/bin/python onnxruntime_test_python_sparse_matmul.py
EE.
======================================================================
ERROR: testRunContribSparseMatMul (__main__.TestSparseToDenseMatmul)
Mutliple sparse COO tensor to dense
----------------------------------------------------------------------
Traceback (most recent call last):
File "onnxruntime_test_python_sparse_matmul.py", line 407, in testRunContribSparseMatMul
np.float,
File "/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/numpy/__init__.py", line 284, in __getattr__
raise AttributeError("module {!r} has no attribute "
AttributeError: module 'numpy' has no attribute 'float'
======================================================================
ERROR: testRunSparseOutputOnly (__main__.TestSparseToDenseMatmul)
Try running models using the new run_with_ort_values
----------------------------------------------------------------------
Traceback (most recent call last):
File "onnxruntime_test_python_sparse_matmul.py", line 39, in testRunSparseOutputOnly
values = np.array([1.764052391052246, 0.40015721321105957, 0.978738009929657], np.float)
File "/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/numpy/__init__.py", line 284, in __getattr__
raise AttributeError("module {!r} has no attribute "
AttributeError: module 'numpy' has no attribute 'float'
```
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Implement reuse kv_cache past and present tensor in Attention Ops.
Unit test for abover feature.
Utilize the reuse kv_cache for past and present tensor in Greedy Search.
Correctness test for it.
Co-authored-by: Zhang Lei <phill.zhang@gmail.com>
(1) Embed layer fusion to work with --use_mask_index.
(2) Parse num_heads and hidden_size from a pattern of Concat shape node.
(3) Fix a typo (CUDAExcecutionProvider=> CUDAExecutionProvider) in eval_squad.py
(4) Update example comments in eval_squad.py to use optimized fp16 model.
(5) Update tests in test_optimizer.py
Allows MIGraphX EP to run the following additional tests. Also adds support to get MIGraphX to run eval_squad.py
Reference to the Rocm EP changes: https://github.com/microsoft/onnxruntime/pull/13306
Co-authored-by: Joseph Groenenboom <joseph.groenenboom@amd.com>
Co-authored-by: Ted Themistokleous <tthemist@amd.com>
### Description
<!-- Describe your changes. -->
1. Update the rules for GemmFastGelu fusion, MatMul input x should >=
two dimension, input weight should == two dimension.
2. Add GemmFastGelu fusion test.
3. Add GemmFastGelu TunableOp, only contains the original
implementation(Gemm + FastGelu).
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Co-authored-by: peixuanzuo <peixuanzuo@linmif39a000004.zvflicr54joexhdgnhvmxrxygg.phxx.internal.cloudapp.net>
### Description
<!-- Describe your changes. -->
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
<!-- Describe your changes. -->
### Motivation and Context
Some models from model zoo failed in the Linux CPU workflow.
https://github.com/onnx/models/issues/562
Skip them temporarily.
###Verfication
Linux CPU CI passed with beta image
https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=789772&view=results
**2022-10-21T13:31:17.6740348Z Skip symbolic shape inference on :
/mnt/vss/_work/1/b/Release/../models/zoo/opset12/Inception-1-int8/inception-v1-12-int8.onnx**
2022-10-21T13:31:17.6740998Z Running symbolic shape inference on :
/mnt/vss/_work/1/b/Release/../models/zoo/opset12/DenseNet-121-12-int8/densenet-12-int8.onnx
2022-10-21T13:31:17.6741618Z Running symbolic shape inference on :
/mnt/vss/_work/1/b/Release/../models/zoo/opset12/MNIST-12/mnist-12.onnx
**2022-10-21T13:31:17.6742207Z Skip symbolic shape inference on :
/mnt/vss/_work/1/b/Release/../models/zoo/opset12/SSD-int8/ssd-12-int8.onnx**
2022-10-21T13:31:17.6742898Z Running symbolic shape inference on :
/mnt/vss/_work/1/b/Release/../models/zoo/opset12/ResNet50_fp32/resnet50-v1-12.onnx
2022-10-21T13:31:17.6743544Z Running symbolic shape inference on :
/mnt/vss/_work/1/b/Release/../models/zoo/opset12/MobileNet
v2-1.0-fp32/mobilenetv2-12.onnx
2022-10-21T13:31:17.6744259Z Running symbolic shape inference on :
/mnt/vss/_work/1/b/Release/../models/zoo/opset12/ResNet101_DUC_HDC-12/ResNet101-DUC-12.onnx
2022-10-21T13:31:17.6744891Z Running symbolic shape inference on :
/mnt/vss/_work/1/b/Release/../models/zoo/opset12/YOLOv3-12-int8/yolov3-12-int8.onnx
2022-10-21T13:31:17.6745501Z Running symbolic shape inference on :
/mnt/vss/_work/1/b/Release/../models/zoo/opset12/AlexNet/bvlcalexnet-12.onnx
2022-10-21T13:31:17.6746114Z Running symbolic shape inference on :
/mnt/vss/_work/1/b/Release/../models/zoo/opset12/ZFNet-512-int8/zfnet512-12-int8.onnx
**2022-10-21T13:31:17.6746768Z Skip symbolic shape inference on :
/mnt/vss/_work/1/b/Release/../models/zoo/opset12/SSD-MobilenetV1-12-int8/ssd_mobilenet_v1_12-int8.onnx**
* drop nuphar code and configs
* refactor test case
* format python
* remove nuphar from training test
* remove commented nuphar logics
* restore llvm setting
* drop nuphar ci
* fix compile err
* fix compile err
Co-authored-by: Randy Shuai <rashuai@microsoft.com>
* Add support for initial_growth_chunk_size_bytes setting in OrtArenaCfg pybind
* Add overloaded constructor for KVP, UT still in progress
* Fix class member access in pybind, fix unit test
* Resolve linter warnings
* Improve formatting
* Simplify UT
* Fix linter formatting
Co-authored-by: Peter Mcaughan <petermca@microsoft.com>
Shape Inference and Model Optimization before Quantization
Model quantization with QDQ format, i.e. inserting QuantizeLinear/DeQuantizeLinear on
the tensor, requires tensor shape information to perform its best. Currently, shape inferencing
works best with optimized model. As a result, it is highly recommended to run quantization
on optimized model with shape information.
This change adds code for model optimization and shape inferencing of the following three steps:
1. Symbolic shape inference.
2. Model optimization
3. ONNX shape inference
At the same time we should recommend model optimization should be turned off during quantization.
As the optimization might change the computation graph, making it harder for the QDQ debugger
to locate matching tensors between original and the quantized models.
QDQ loss debug - Weights Matching
Part 2 of QDQ loss debugging tool: given a float model and its qdq model, return the matching of all weight tensors and their corresponding dequantized weights from the qdq model.
Debugger for QDQ loss - activation matching
This is the first part of the QDQ debugger tool: activation matching, where we identify and match corresponding activations from the float model and the qdq model. The idea is that during quantization, we have an original float model and a qdq model. The debugger can run the two models side by side using the same input data. By comparing intermediate activations, we can help the model author figure out where the values differ, and take steps to reduce precision loss.
Fix minor bug in qdq quantization tool
Motivation and Context
Relu node is removed in qdq quantization tool if it can be merged to its input node. When performing the removal, we forgot to check whether the input is actually the graph input
Python module for dumping activation tensors when running an ONNX model
This is the first step towards a quantization debugging tool. We dump the activation tensors. Next step would be to compare them: original model vs quantized model (running with same input) to see where the difference becomes significant.
* adding conditional variable again
* Adding split test cases in python
* Adding python cases for split
* Enable s8s8 split
* Optimize input
* Revert "Remove git and python packages from the docker images used by Zip-Nuget-Java-Nodejs Packaging Pipeline (#11651)"
This reverts commit d5e34acb
* Revert "Revert "Remove git and python packages from the docker images used by Zip-Nuget-Java-Nodejs Packaging Pipeline (#11651)""
This reverts commit 3c1a330dd3afeb55aa7eabb8ebea39b6deb37bad.
* format file
* Update c-api-linux-cpu.yml
* Update c-api-linux-cpu.yml
* Update c-api-linux-cpu.yml
* Reformat file
* Reformat file
* format file
* Optimize input
* Remove unused import
* Remove useless init
* Format split.py with black
* set zero point to 0 if all value are 0.0
* fix bug: lower version of numpy.finfo doesn't have smallest_subnormal
* check scale to make sure it is not subnormal
* consume onnx test data from github
* ensure tests
* update script and allow opset specification
* fix python format
* fix python format
* consume new filter format
* fix linting error
* refactor test for model with undefined shapes
* add test for TVMso EP
* update build script for TVM EP tests
* fix pylint
* disable test for Windows
* fix black
* fix python format
* fix pylint
* fix python format
* replace Path.resolve with os.path.join
* fix python path issue
Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
* update trt 8.4ga
* trt 8.4 linux ci pipeline
* fix cmake
* placeholder_builder
* trt 8.4 windows pipeline
* gpu package pipeline
* trt 8.4.1.5 , packaging pipeline updates
* python packaging
* ctest timeout
* python packaging test
* bump timeout
* python format
* format
* revert
* newline
* enable trt python tests
* typo
* python format
* disable on windows
Prior to this every test shared the same tolerances. This meant
that if an ONNX test failed due to a small but acceptable difference in
output, the only alternative was to disable the test entirely.
In op set 17, the DFT operator is being added. Without this change, the
tests for that operator fail because the output is off by about 5e-5.
It's better to keep test coverage for this new op rather than disable
the test entirely.
Also prior to this change, the global tolerances were not shared between
C++, JavaScript, and Python tests. Now they are.
Also fix various minor issues raised by linters.
Unblocks https://github.com/microsoft/onnxruntime/issues/11640.
(1) Support T5 in BeamSearch operator, and add both CPU and CUDA implementation.
(2) Change BeamSearch op: rename encoder_decoder_init attribute to encoder, and add decoder_start_token_id attribute
(3) Update convert_to_onnx for T5 to use int32 instead of int64 inputs as default.
(4) Add more tests in best_beam_search.py
(5) fix ORT_ENFORCE of hypothesis_buffer_offset_
(6) Improve ONNX conversion:
(a) Change encoder some dynamic axes to fixed dim value
(b) add --separate_encoder_and_decoder_init
(c) correct name t5-3B => t5-3b, t5-11B => t5-11b
(d) Add --use_int32_inputs in convert t5 to onnx
(e) Allow t5 beam search conversion in one step
* aten op for inference
* fix build error
* more some code to training only
* remove domain from operator name
* move aten_op_executor ext out from ortmodule
* add pipeline
* add exec mode
* fix script
* fix ut script
* fix test pipeline
* failure test
* rollback
* bugfix
* resolve comments
* enable aten for python build only
* fix win build
* use target_compile_definitions
* support io binding
* turn off aten by default
* fix ut
Co-authored-by: Vincent Wang <weicwang@microsoft.com>
Co-authored-by: zhijxu <zhijxu@microsoft.com>