* initial implementation of longformer tools for onnx conversion and benchmark
* Support ONNX conversion for transformers 4.0
Add an option to optimize onnx model, and export fp16 model
* optimize a bert model converted using tf2onnx
* add test data
* update
* remove comments
* format
* Revert "format"
This reverts commit f8ae88cb564bce5caf4780e56561403f3ba3d524.
* Revert "remove comments"
This reverts commit 59d8a693581a731fd0291b70fe2c9cec6c4950fe.
* add a squeeze node to convert a 3-d mask to 2-d
* update
* update
* verify and add comments
* build off a specific commit and archive wheel file
* rename to fp32, prefix results w/ commit, add CPU col
* rename 99th to 90 percentile
* get symbolic_shape from master each time
* add install archive wheel, parallel build
* shortening hash
Quantize LSTM:
1. dynamically quantizes MatMul inside the LSTM. It doesn't quantize activation function.
2. support per-channel on the input weight and recurrent weight.
* Add YAML file for pipeline
* Modify typo
* Add working directory
* Modify and test
* Modfiy and test
* Modify and test
* Modify and test
* Modify
* Modify
* Modify
* Modify
* Make sure to copy all the result files
* Add clearn up
* Modify
* Modify agent pool name
* Upload only specific artifacts
* Modify
* Integrated CI Pipeline for running TRT perf as well as added the “large amount of models” into perf model target
* Fix bug
* Fix bug
* Add reading the information regarding previously known failing models
and then skip testing them during benchmark/validation
* Modify the script file for CI
* Replace print with logger.info
* Fix bug
* Fix bug
* Refine the code
* Modify the script so that it can capture script segmentation fault while
running ORT
* Fix bug
* fix bug
* fix bug
* Add debug info
* fix bug
* Refine perf code
* Refine the code
* fix bug
* Code refactoring
* change many-models path
* remove metadata after validation/benchmark are done
* Update README.md
* Fix bug so that metadata doesn't hold stale value
* Remove hardcode and update README
* Add arguments to the script to make it run correctly
* Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines
* Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines
* Fix bug so that metadata doesn't hold stale value
* Fix small bug of finding test dataset directory for FP16 test data, as
well as modification of some output information
* use -i random for perf test of TRT changes
Co-authored-by: Olivia Jain <oljain@microsoft.com>
* Some fixes to symbolic shape inference
1. Topological sort before iteration in graph
2. Fix a case in slice: start=100000, end=-100000, step=-1, dim=2
3. Fix Nuphar Gemm test's random seed
4. Slice opset 1 axes is optional
The bug happens when merging following shapes:
input0: [1, 1, 'Min(1024, input1_dynamic_axes_3)', 'Min(1024, input1_dynamic_axes_3)']
input1: ['input1_dynamic_axes_1*input1_dynamic_axes_2', 12, 'input1_dynamic_axes_3', 'input1_dynamic_axes_3']
input2: []
The fix is to avoid broadcasting merge on input2
* Symbolic shape inference: fix a case when concat requires merge multiple dims
* Fix a bug triggered in newer version of sympy
Fix a bug in output data type guessing
* remove shape inference and fix save large model problem
* remove unnecessary import
* refine code and add external format for quantize_qat
* remove initializers in tensors_to_calibrate
* small refine
Co-authored-by: t-yguo <t-yguo@microsoft.com>
* Initialize tensorrt perf script
* Add bert-squad dependencies
* Modified code to make ort inference with CUDA/Tensorrt
* Add get CUDA/TRT version
* uncomment bert-squad
* Add BERT-SQUAD inputs.json
* Add FastRCNN
* Make preprocess/validation in to common functions
* Add MaskRCNN and SSD and consolidate the code
* Add dependencies for MaskRCNN
* following modifications are made:
- create common fetch function to get inputs/outputs of model from ONNX model zoo.
- create common validation function to compare inference outputs with reference outputs from ONNX model zoo.
- move run/repeat time to argument list. (still working on other arguments, like fp16 or fp32, latency percentile).
- generate table in csv file to show the latency comparison (TRT vs CUDA) side by side.
* Add approache to analyze profling file and also update model related
settings
* Add models
* Add most of models from ONNX model zoo
* Add model input name and print all the model names at the end of run
* Add system info
* Add TRT fp16 support
* Refine the code
* Handle TRT fall back and modify the way to get input data
* Refine code
* Modify code
* Add more precise approach to measure inference
* Add io-binding
* Add YoLoV4
* Refine the code
* Refine the code
* Add models
* Add yolov4 notebook for jetson device
* Update notebook
* Update notebook
* Add CVS models
* Add missing model
* Add support of float16
* Add new way to get trt version
* Add "validate" and "benchmark" mode
* Add randomly generated input
* Refine perf script
* Refine the code.
* Add README
* Refine the code
* Update README.md
* Refine code
* Update README.md
* Remove all the model related python and instead using model_list.json as
models configuration.
Refine the benchmark.py
* Refine the code
Co-authored-by: Chi Lo <lochi@microsoft.com>