Commit graph

820 commits

Author SHA1 Message Date
Ti-Tai Wang
87f55505b3
[ONNX] Support huggingface BART to ONNX (#12779)
Add BART to transformer support, specifically for
`BartForConditionalGeneration`.

**Motivation and Context**
- Fixes #11210

The BeamSearch custom op currently does not work in the nightly build, so this PR
should be run with a [custom commit](10f3d46d92).
2022-10-06 12:20:03 -07:00
cloudhan
72076b1eb2
Update ROCm CI to use HIP LANGUAGE (#13214)
Update the ROCm CI before relanding tunable GEMM #12853. This PR also updates
composable kernel to use CMake's HIP language support, so that we can
mix a C/C++ compiler with the HIP compiler instead of being locked to hip-clang.
2022-10-05 16:15:16 +08:00
Tianlei Wu
b6c04f48c1
Fix reshape fusion (#13150)
(1) Hotfix for reshape fusion, which made the Stable Diffusion UNet model invalid.
(2) Update remove_cascaded_cast_nodes to make it faster.
2022-10-04 00:26:29 -07:00
Tony Xia
962fee5fe5
Fix typo enviroment => environment (#13195) 2022-10-03 17:02:26 -07:00
Yufeng Li
1342baf1c7
refine QuantConfig (#13155)
Refine the QuantConfig: 1. Remove the default EP config. 2. Pass
QuantConfig to the quantize API directly.
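A minimal sketch of the "pass one config object straight to a single quantize entry point" pattern. `StaticQuantConfig`, the field names, and the string return values are hypothetical placeholders, not the onnxruntime.quantization API; the stub only reports which path it would take.

```python
from dataclasses import dataclass, field
from typing import FrozenSet


@dataclass(frozen=True)
class QuantConfig:
    """Hypothetical base config; field names are illustrative only."""
    op_types_to_quantize: FrozenSet[str] = field(default_factory=frozenset)
    per_channel: bool = False


@dataclass(frozen=True)
class StaticQuantConfig(QuantConfig):
    calibration_method: str = "MinMax"  # hypothetical field


@dataclass(frozen=True)
class DynamicQuantConfig(QuantConfig):
    pass


def quantize(model_input: str, model_output: str, quant_config: QuantConfig) -> str:
    """Stub entry point: dispatch on the config type instead of many kwargs."""
    if isinstance(quant_config, StaticQuantConfig):
        return "static"
    if isinstance(quant_config, DynamicQuantConfig):
        return "dynamic"
    raise TypeError(f"unsupported config: {type(quant_config).__name__}")
```

Using frozen dataclasses with a `frozenset` of op types keeps a config immutable once it has been handed to the quantizer.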
2022-10-03 08:34:49 -07:00
PeixuanZuo
c26bb1bb19
Allow fastgelu/skiplayernorm profile by pass args from commandline (#13025)
This allows us to quickly launch a microbenchmark session, for example:
`python skip_layer_norm_test.py 8 128 128 float32`
2022-09-28 15:48:59 -07:00
PeixuanZuo
13d1a3c007
[ROCm] add SkipLayerNorm vectorize Regular case (#12821)
Add the vectorized regular case for SkipLayerNorm:
1. When the hidden size is <= 1024, the SkipLayerNormTunable op can use both the small
case and the regular case.
2. When the hidden size is > 1024, the SkipLayerNormTunable op can only use the regular
case.
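The size rule above can be sketched as a candidate filter that a tuner would then time. This is a minimal Python illustration assuming the 1024 threshold from the description; the labels "small" and "regular" and the function name are hypothetical, not ORT identifiers (the real op is a C++/HIP kernel).

```python
from typing import List

# Threshold from the commit description; labels below are illustrative only.
SMALL_CASE_MAX_HIDDEN_SIZE = 1024


def skip_layer_norm_candidates(hidden_size: int) -> List[str]:
    """Return the kernel variants a tuner may choose from for a hidden size."""
    if hidden_size <= 0:
        raise ValueError("hidden_size must be positive")
    candidates = ["regular"]  # the regular case handles any hidden size
    if hidden_size <= SMALL_CASE_MAX_HIDDEN_SIZE:
        candidates.insert(0, "small")  # the small case only fits sizes <= 1024
    return candidates
```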

2022-09-27 12:52:10 -07:00
Yufeng Li
c746083344
use parameter names to specify argument mapping (#13108)
Use parameter names (keyword arguments) to specify the argument mapping, to avoid mismatches.
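A small illustration of why naming arguments avoids mismatches, using a hypothetical helper (not the function touched by this commit). The bare `*` makes the following parameters keyword-only, so callers cannot swap `scale` and `zero_point` by position.

```python
def quantize_values(data, *, scale, zero_point, qmin=-128, qmax=127):
    """Hypothetical helper: affine-quantize floats into the int8 range.

    Everything after `*` must be passed by name, so reordering the
    parameters later cannot silently change which value goes where.
    """
    return [max(qmin, min(qmax, round(x / scale) + zero_point)) for x in data]


q = quantize_values([0.5, -1.0], scale=0.5, zero_point=0)  # -> [1, -2]
```

A positional call such as `quantize_values([0.5], 0.5, 0)` raises `TypeError` instead of quietly binding the wrong values.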
2022-09-26 20:56:59 -07:00
Chen Fu
e9b1bbc6a5
fix Numpy array None judgement bug (#13103)
fix https://github.com/microsoft/onnxruntime/issues/13054
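The exact failing expression from the linked issue is not reproduced here; this is a minimal sketch of a common form of the bug, assuming the code tested an optional NumPy array for presence with truthiness instead of an explicit `None` check. Function names are hypothetical.

```python
import numpy as np


def is_provided_buggy(value):
    # `if value:` asks NumPy for the truth value of the whole array: this raises
    # ValueError for multi-element arrays and returns False for np.zeros(1).
    return True if value else False


def is_provided(value):
    # Compare against None explicitly instead of relying on truthiness.
    return value is not None
```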
2022-09-26 15:15:32 -07:00
Hariharan Seshadri
19c51376c4
Introduce QDQ transformer fusion tools for ordered quantized ops (#12661) 2022-09-24 23:22:44 -07:00
PeixuanZuo
2ef1f8b93e
[ROCm] add tunable SkipLayerNorm for ROCm EP (#12817)
Related PR: https://github.com/microsoft/onnxruntime/pull/12803
https://github.com/microsoft/onnxruntime/pull/12816
https://github.com/microsoft/onnxruntime/pull/12821

1. Add tunable SkipLayerNorm for the ROCm EP.
2. Keep the original implementation when tuning is disabled.

2022-09-23 16:39:44 +08:00
cloudhan
a24b41d92e
Move all TunableOp related facilities to EP level directory (#12857)
Some ops in the EP directory (rather than the contrib_ops directory) will
require TunableOp. We will also need to add EP level session tuning
options for it, so move all of that code at once.

Also remove duplicated utility functions.
2022-09-23 11:10:19 +08:00
wangxiyuan
952c99304a
Add CANN EP (#12416)
**Description**: This PR adds Ascend CANN execution provider support.

**Motivation and Context**
CANN is the API layer for Ascend processors. As described in the issue, adding
the CANN EP allows users to run ONNX models on Ascend hardware via ONNX Runtime.
  The detailed changes:
  1. Added the CANN EP framework.
  2. Added the basic operators needed to support the ResNet and VGG models.
  3. Added C/C++ and Python API support.
- Related issue:
   https://github.com/microsoft/onnxruntime/issues/11477

Author: 
lijiawei <lijiawei19@huawei.com>
wangxiyuan <wangxiyuan1007@gmail.com>

Co-authored-by: FFrog <ljw1101.vip@gmail.com>
2022-09-22 14:53:40 -07:00
Hariharan Seshadri
057567f39f
Fix bug in Attention Fusion (#13050) 2022-09-22 13:46:59 -07:00
sfatimar
cccbe90764
Openvino ep 2022.2 v4.2 (#13023)
These changes align ORT with the OpenVINO (OV) 2022.2 release. Changes:
CPU FP16 support, dGPU support, RHEL Dockerfile, Ubuntu 20 Dockerfile.

**Motivation and Context**
- This change is required to ensure the ORT OpenVINO Execution Provider is
aligned with the latest OpenVINO changes.

Co-authored-by: mayavijx <mayax.vijayan@intel.com>
Co-authored-by: shamaksx <shamax.kshirsagar@intel.com>
Co-authored-by: pratiksha <pratikshax.bapusaheb.vanse@intel.com>
Co-authored-by: pratiksha <mohsinx.mohammad@intel.com>
Co-authored-by: Sahar Fatima <sfatima.3001@gmail.com>
Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>
Co-authored-by: nmaajidk <n.maajid.khan@intel.com>
Co-authored-by: Mateusz Tabaka <mateusz.tabaka@intel.com>
Co-authored-by: intel <intel@iotgecsp-nuc04.iind.intel.com>
2022-09-22 12:31:40 -07:00
Jian Chen
051a0a67a5
Cjian/per channels not working (#13038)
**Description**: This fixes the bug where per-channel quantization doesn't
work when axis == 0.
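The commit does not show the faulty check. One common way an `axis == 0` bug arises is testing the axis for truthiness (`if axis:`), which is False for 0; that cause is an assumption here. Below is a minimal NumPy sketch of symmetric int8 per-channel quantization that treats axis 0 like any other axis; the function name is hypothetical.

```python
import numpy as np


def per_channel_quantize(weight: np.ndarray, axis: int):
    """Symmetric int8 per-channel quantization along `axis` (sketch)."""
    # Check for None explicitly: `if axis:` would wrongly reject axis == 0.
    if axis is None:
        raise ValueError("per-channel quantization needs an axis")
    axis = axis % weight.ndim
    reduce_axes = tuple(i for i in range(weight.ndim) if i != axis)
    max_abs = np.max(np.abs(weight), axis=reduce_axes)
    scale = np.maximum(max_abs, 1e-12) / 127.0  # one scale per channel
    shape = [1] * weight.ndim
    shape[axis] = -1  # broadcast the scales along `axis`
    q = np.clip(np.round(weight / scale.reshape(shape)), -127, 127).astype(np.int8)
    return q, scale
```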
2022-09-21 16:24:23 -04:00
Jian Chen
6248b69795
Fixes bug which makes quantized_input_names = [] (#13029)
**Description**: Fixes bug in `tools/quantization/operators/split.py`
which would make `quantized_input_names == []`
2022-09-21 14:25:38 -04:00
Adrian Lizarraga
39e20686a0
[EP Perf Dashboard] Fix incorrect calls to trtexec with fp16 inputs (#13018) 2022-09-21 10:31:45 -07:00
cloudhan
a5d70d8609
Allow bert_perf_test.py make some noise by log_severity option (#13024)
This makes it much easier for developers to inspect
the benchmark session.
2022-09-21 18:38:46 +08:00
Justin Chu
1245c6397e
Remove usage of torch.onnx symbolic_registry (#13011)
**Description**: symbolic_registry is deprecated in torch.onnx. This PR
removes its usage.

Fixes #13008
2022-09-20 10:59:41 -07:00
PeixuanZuo
189aef2bea
[ADD] add skip layernorm to kernel explorer for ROCm EP (#12816)
Related PR: https://github.com/microsoft/onnxruntime/pull/12803
https://github.com/microsoft/onnxruntime/pull/12817
https://github.com/microsoft/onnxruntime/pull/12821

Add skip layernorm to kernel explorer for profiling.

2022-09-20 17:17:01 +08:00
cloudhan
ffeba98a9d
Allow gemm profile by pass args from commandline (#12991)
This allows us to quickly launch a microbenchmark session, for example:
```bash
python gemm_test.py T N float16 256 256 65536
```
This makes it easy to see which implementation is the fastest.
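A sketch of parsing such a positional invocation with `argparse`. The meanings assigned to the six arguments (transpose flags for A and B, dtype, then M, N, K) are an assumption for illustration; the actual script's argument names are not shown in this log.

```python
import argparse


def parse_gemm_args(argv=None):
    """Parse e.g. `T N float16 256 256 65536` (argument meanings assumed)."""
    parser = argparse.ArgumentParser(description="GEMM microbenchmark (sketch)")
    parser.add_argument("transa", choices=["N", "T"], help="op(A): N as-is, T transposed")
    parser.add_argument("transb", choices=["N", "T"], help="op(B): N as-is, T transposed")
    parser.add_argument("dtype", choices=["float16", "float32"])
    parser.add_argument("m", type=int)
    parser.add_argument("n", type=int)
    parser.add_argument("k", type=int)
    return parser.parse_args(argv)


args = parse_gemm_args(["T", "N", "float16", "256", "256", "65536"])
```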
2022-09-20 16:18:56 +08:00
Yufeng Li
b48f71fcfc
fix bug: quantization shape inference (#12983)
The model path passed to onnx.shape_inference.infer_shapes_path and its external
data need to be in the same directory, as documented here:
f4dea9e68b/docs/PythonAPIOverview.md (shape-inference-a-large-onnx-model-2gb)
2022-09-16 10:17:22 -07:00
cloudhan
d2aa2109c0
Make TunableOp follow stream semantics (#12856) 2022-09-15 21:11:27 +08:00
Dmitri Smirnov
bc2df1bf95
Remove previously deprecated API (#12935)
Remove previously deprecated API
Format JS code, address review comments
NPM Formatting
2022-09-14 10:58:03 -07:00
Tianlei Wu
95c4fc6877
[CUDA] Add TensorRT fused attention fp16 v2 kernels (#12814)
* Add TensorRT fused attention fp16 kernels
* drop sm72, seq 512 for sm75, and head_size 32 kernels
* Add env variable ORT_DISABLE_FUSED_ATTENTION
* exclude files in hipify
* update AttentionPastState_dynamic test threshold
* fix --use_mask_index in benchmark
2022-09-13 15:16:12 -07:00
Tianlei Wu
30ebc9e00a
Useless Cast removal after converting model from float32 to float16 (#12871) 2022-09-12 11:07:33 -07:00
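A Cast is redundant when its input already has the target element type, which can happen after a float32 to float16 conversion. The sketch below uses a hypothetical toy graph format (a list of dicts plus a name-to-dtype map), not ONNX protobufs or ORT's model helpers, and it does not handle Casts that produce graph outputs.

```python
def remove_useless_casts(nodes, dtypes):
    """Drop Cast nodes whose input already has the target dtype.

    `nodes` is a topologically sorted list of dicts with "op", "inputs",
    "outputs" and, for Cast, "to"; `dtypes` maps tensor name -> dtype string.
    """
    rename = {}  # removed Cast output -> tensor that replaces it
    kept = []
    for node in nodes:
        inputs = [rename.get(name, name) for name in node["inputs"]]
        if node["op"] == "Cast" and dtypes.get(inputs[0]) == node["to"]:
            rename[node["outputs"][0]] = inputs[0]  # consumers read the input
            continue
        kept.append({**node, "inputs": inputs})
    return kept
```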
Jian Chen
e561a7cf29
Adding QuantConfig Class (#12810)
* Initial commit for testing

* Adding DynamicQuantConfig

* Adding DynamicQuantConfig

* Format file

* Adding Default configuration placeholder.

* Update onnxruntime/python/tools/quantization/quantize.py

Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>

* Reformat file

* Reformat reST docstring style to Google style

* Update set to frozenset

* Update Quant Config

* Updates Quant Config

* Update enum comparison

* Update onnxruntime/python/tools/quantization/quantize.py

Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>

* Update

Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>
2022-09-09 14:08:47 -04:00
Dwayne Robinson
8e4eb24648
Update operator kernel table to include DML operators (#12887)
* Fix bug in pybind get_all_operator_schema due to premature reference dropping
* Add updated operator kernels markdown table
* Update build.py to include documentation generation for DML operators too
* Update GPU pipeline to include DML in the build so operators can be generated.
* Use a separate pipeline stage, feedback from Changming and Scott
* Appease annoying Python linter
* Add onnxruntime_BUILD_UNIT_TESTS=OFF and remove stale --use_dml in cuda stage
2022-09-09 10:21:25 -07:00
RandySheriffH
d3b684cd9e
Drop nuphar (#11555)
* drop nuphar code and configs

* refactor test case

* format python

* remove nuphar from training test

* remove commented nuphar logics

* restore llvm setting

* drop nuphar ci

* fix compile err

* fix compile err

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2022-09-07 15:11:18 -07:00
Jian Chen
acc8bdc6c5
Splitting quantize_tensor and quantize_input (#12873)
* Splitting quantize_tensor and quantize_input

* Reformat code

* Reformat code

* Update is_input_a_weight to is_input_a_initializer
2022-09-07 18:05:42 -04:00
petermcaughan
69f7cc6494
Add pybind support for all memory config options in OrtArenaCfg (#12658)
* Add support for initial_growth_chunk_size_bytes setting in OrtArenaCfg pybind

* Add overloaded constructor for KVP, UT still in progress

* Fix class member access in pybind, fix unit test

* Resolve linter warnings

* Improve formatting

* Simplify UT

* Fix linter formatting

Co-authored-by: Peter Mcaughan <petermca@microsoft.com>
2022-09-07 11:15:00 -07:00
Chen Fu
8004db4bf1
fix python import sequence warning (#12864)
fix python import sequence warning
2022-09-07 09:53:39 -07:00
Tianlei Wu
d19955fd89
fix transformers script issues (#12802)
Fix a few obvious issues:
(1) bert_perf_test.py creates a session without a provider in line 65.
(2) compare_bert_results.py is missing a parameter to create_session in line 37.
(3) onnx_exporter.py has mismatched return values in lines 667 and 690.
(4) Remove some unused imports from the scripts.
(5) fusion_utils should not print "Removed 0 cast nodes" or "Removed 0 Identity nodes"...
(6) Update the numpy version requirement, since the gpt2 parity tool uses equal_nan, which needs numpy v1.19+.
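Assuming the parity tool compares outputs with `numpy.array_equal(..., equal_nan=True)` (the `equal_nan` parameter was added to `array_equal` in NumPy 1.19), a minimal illustration of why NaN-aware comparison matters:

```python
import numpy as np

a = np.array([1.0, np.nan, 3.0])
b = np.array([1.0, np.nan, 3.0])

strict = np.array_equal(a, b)                     # NaN != NaN, so False
nan_aware = np.array_equal(a, b, equal_nan=True)  # True; needs NumPy >= 1.19
close = np.allclose(a, b, equal_nan=True)         # tolerance-based, also NaN-aware
```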
2022-09-06 16:15:16 -07:00
Chen Fu
9ad5b95e4f
Fix math domain error with log10 (#12841)
fix math domain error with log10
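`math.log10` raises `ValueError: math domain error` for zero or negative input, for example when a noise or error term computes to exactly zero. The actual fix is not shown here; this is a hypothetical guard that maps zero to its limit and still rejects negative values.

```python
import math


def safe_log10(x: float) -> float:
    """log10 that returns -inf for 0 instead of raising (sketch)."""
    if x == 0.0:
        return float("-inf")  # limit of log10(x) as x -> 0+
    return math.log10(x)  # still raises ValueError for negative x
```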
2022-09-06 08:54:41 -07:00
Yulong Wang
1a402a3f25
replace 'master' branch ref to 'main' for onnx repo (#12678) 2022-08-30 13:41:42 -07:00
Chen Fu
d761a7ceb3
Pre-processing of Quantization (#12729)
Shape Inference and Model Optimization before Quantization

Model quantization in QDQ format, i.e. inserting QuantizeLinear/DequantizeLinear on
the tensors, requires tensor shape information to perform best. Currently, shape inference
works best with an optimized model. As a result, it is highly recommended to run quantization
on an optimized model with shape information.

This change adds code for model optimization and shape inference in the following three steps:

1. Symbolic shape inference.
2. Model optimization.
3. ONNX shape inference.

At the same time, we recommend turning off model optimization during quantization itself,
as the optimization might change the computation graph, making it harder for the QDQ debugger
to locate matching tensors between the original and the quantized models.
2022-08-29 15:47:52 -07:00
Dmitri Smirnov
3ff75fa05f
Address static analysis warnings (#12711)
Address static analysis warnings
2022-08-26 14:24:14 -07:00
cloudhan
5bdb1d4146
Add Tunable GEMM composed from rocblas and composable kernels (#12599)
* Add tunable gemm
2022-08-26 14:32:56 +08:00
cloudhan
f76b40aa5b
Change TunableOp to use a type erased interface (#12597)
* Change to type erased interface, so that there is no need to implement a class for a simple kernel launch function
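A Python sketch of the idea behind a type-erased tunable op: candidates are plain callables, so no class is needed per kernel, and the fastest one is chosen once by timing. This is an illustration only, not ORT's C++ TunableOp API; the class name, the `timer` parameter, and the tune-on-first-call policy are assumptions.

```python
import time
from typing import Callable, List, Optional


class TunableOp:
    """Pick the fastest of several interchangeable launchers (sketch)."""

    def __init__(self, candidates: List[Callable[..., object]],
                 timer: Optional[Callable[[], float]] = None):
        self.candidates = candidates
        self.timer = timer or time.perf_counter  # injectable for testing
        self.best: Optional[Callable[..., object]] = None

    def __call__(self, *args):
        if self.best is None:  # tune once, on the first call
            timings = []
            for fn in self.candidates:
                start = self.timer()
                fn(*args)
                timings.append(self.timer() - start)
            self.best = self.candidates[timings.index(min(timings))]
        return self.best(*args)
```

Because any callable qualifies, a simple kernel launch function can be registered directly instead of being wrapped in its own class.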
2022-08-25 19:46:04 -07:00
Yulong Wang
c144acc534
Replace 'master' branch ref to 'main' in the code (#12547) 2022-08-22 10:48:12 -07:00
Wei-Sheng Chin
dc486d146b
Make ORT callable from various Pytorch compilers (LazyTensor, TorchDynamo, etc) (#10460)
* Make ORT as Pytorch JIT backend

LORT likely doesn't work with aten fallback so we only test LORT in its own CI.

* Revert changes to enable external CUDA allocator. Will add it later.

Revert "Revert changes to enable external CUDA allocator. Will add it later."

This reverts commit d5487f2e193014c805505afae8fb577c53667658.

Fix external allocator

* Relax tolerance and remove commented code

* Print more information in CI

* Fix pointer

* Address comments.
1. Reuse ORT-eager mode's environment.
2. Remove unused ctor.

* Use Pytorch master branch as all PRs are merged

Fix

* Refine based on cpplint feedback

* Revert changes to allow custom CUDA allocator in public APIs

* Use torch.testing.assert_close

* Use unittest framework

* Switch docker repo

* Rename *.cpp to *.cc

* Address comments

* Add comment

* Use same pipeline file for eager and lort pipelines

* Address comments

* Add yaml comment

* Fix cmake files

* Address comments

* Rename flags, remove printing code, remove dead comment
2022-08-22 09:40:40 -07:00
Chen Fu
8456f5fd97
qdq_util bug fix (#12647)
bugfix: when creating a temp inference file, an existing file may be accidentally deleted
2022-08-22 09:32:43 -04:00
Chen Fu
56dd0176a1
QDQ debugger - Adding Error Calculator (#12632)
QDQ debugger - Adding Error Calculator
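One common error metric for comparing a float tensor with its quantized counterpart is the signal-to-quantization-noise ratio, SQNR = 20 * log10(||x|| / ||x - y||). Whether this calculator uses exactly this formula is not stated here; the sketch and its function name are illustrative.

```python
import numpy as np


def sqnr_db(reference: np.ndarray, approx: np.ndarray) -> float:
    """SQNR in dB between a float tensor and its quantized version (sketch)."""
    signal = np.linalg.norm(reference)
    noise = np.linalg.norm(reference - approx)
    if noise == 0.0:
        return float("inf")  # identical tensors: no quantization noise
    return float(20.0 * np.log10(signal / noise))
```

Higher values mean the quantized tensor tracks the float one more closely.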
2022-08-18 09:30:43 -07:00
Chen Fu
f2db6bb293
weight matching (#12607)
QDQ loss debug - Weights Matching

Part 2 of QDQ loss debugging tool: given a float model and its qdq model, return the matching of all weight tensors and their corresponding dequantized weights from the qdq model.
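Matching a weight to its dequantized counterpart relies on the ONNX DequantizeLinear formula, y = (x - x_zero_point) * x_scale. A per-tensor sketch with hypothetical helper names (per-axis scales are omitted for brevity):

```python
import numpy as np


def dequantize_linear(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Per-tensor DequantizeLinear: y = (q - zero_point) * scale."""
    return (q.astype(np.float32) - np.float32(zero_point)) * np.float32(scale)


def weight_error(float_weight: np.ndarray, q_weight: np.ndarray,
                 scale: float, zero_point: int) -> float:
    """Max absolute difference between a float weight and its dequantized copy."""
    dequantized = dequantize_linear(q_weight, scale, zero_point)
    return float(np.max(np.abs(float_weight - dequantized)))
```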
2022-08-17 11:01:10 -07:00
Tianlei Wu
ce01ed02da
Improve LongformerAttention performance: AddBiasTranspose and New weight format (#12448)
* add AddBiasTranspose kernel, new format of weights
* Use compact global_q in GEMM
* sequence_index from BxS to S; new stream for copy
* merge input and output pointers in scratch2
* update default benchmark tests
* add new format 0 for weight and bias
* avoid integer overflow
* check gpu memory
* output summary in benchmark
* add logging
* update unit tests with non empty bias value
* add rocblasGemmHelper and rocblasGemmStridedBatchedHelper for Rocm
2022-08-17 09:36:48 -07:00
Chen Fu
eb6aa861cf
QDQ debugger - activations compare (#12544)
Debugger for QDQ loss - activation matching

This is the first part of the QDQ debugger tool: activation matching, where we identify and match corresponding activations from the float model and the qdq model. The idea is that during quantization, we have an original float model and a qdq model. The debugger can run the two models side by side using the same input data. By comparing intermediate activations, we can help the model author figure out where the values differ, and take steps to reduce precision loss.
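A minimal sketch of the matching step, assuming the intermediate activations from both runs have already been collected into dicts keyed by tensor name and that corresponding tensors share a name; the real tool's collection mechanism and naming rules are not shown here, and the function name is hypothetical.

```python
import numpy as np


def match_activations(float_acts, qdq_acts):
    """Pair activations that share a tensor name in both models (sketch)."""
    matched = {}
    for name, float_value in float_acts.items():
        if name in qdq_acts:
            qdq_value = qdq_acts[name]
            matched[name] = {
                "float": float_value,
                "qdq": qdq_value,
                "max_abs_diff": float(np.max(np.abs(float_value - qdq_value))),
            }
    return matched
```

Sorting the result by `max_abs_diff` gives a quick view of where precision loss first appears.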
2022-08-15 17:03:28 -07:00
Yufeng Li
30ee5a4f79
release calibrator before deleting temporary files (#12601) 2022-08-15 16:03:46 -07:00
Yufeng Li
95df5dac51
do not quantize Relu/Clip if their inputs are not quantized (#12565) 2022-08-11 16:16:10 -07:00
Cheng
819c36701f
[xnnpack] basic QDQ operators support (#11912)
* basic ops for mobilenet, qconv, qsoftmax, qavgpool

update Xnnpack to latest

unit test

* NodeUnit: use outputedge to replace output-node

* qdq model e2e test

* use inlinedvector to replace vector

* conv bias check

* tensorshape helpers

* Refactor xnn_op minmax

* Qlinearsoftmax schema update

* Remove qlinearsoftmax registration

Co-authored-by: Jicheng Wen <jicwen@microsoft.com>
2022-08-11 10:12:51 +08:00