### Description
Bump the ruff version in CI and fix the new lint errors.
- This change enables the flake8-implicit-str-concat rules, which help
detect unintended string concatenations:
https://beta.ruff.rs/docs/rules/#flake8-implicit-str-concat-isc
- Update gitignore to include common python files that we want to
exclude.
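
As a minimal illustration of the kind of bug the implicit-str-concat rules are meant to surface (the list contents here are hypothetical and not taken from the codebase), a missing comma silently merges two adjacent string literals:

```python
# Hypothetical example: the missing comma after "black" joins two literals.
linters_buggy = [
    "ruff",
    "black"  # <- no comma, so this literal is concatenated with the next one
    "isort",
]

# Intended version, with every element separated by a comma.
linters_fixed = [
    "ruff",
    "black",
    "isort",
]

print(len(linters_buggy))  # 2
print(linters_buggy[1])    # blackisort
print(len(linters_fixed))  # 3
```

Python accepts both lists without warning, which is why a lint rule is useful here.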
### Motivation and Context
Code quality
### Description
`lintrunner` is a linter runner successfully used by pytorch, onnx and
onnx-script. It provides a uniform experience running linters locally
and in CI. It supports all major dev systems: Windows, Linux and macOS.
The checks are enforced by the `Python format` workflow.
This PR adopts `lintrunner` in onnxruntime and fixes ~2000 flake8 errors
in Python code. `lintrunner` now runs all required Python lints,
including `ruff` (replacing `flake8`), `black` and `isort`. Future lints
like `clang-format` can be added.
Most errors are auto-fixed by `ruff` and the fixes should be considered
robust.
Lints that are more complicated to fix are suppressed with `# noqa` for now and
should be fixed in follow-up PRs.
### Notable changes
1. This PR **removed some suboptimal patterns**:
- `not xxx in` -> `xxx not in` membership checks
- bare excepts (`except:` -> `except Exception`)
- unused imports
The follow up PR will remove:
- `import *`
- mutable values as default in function definitions (`def func(a=[])`)
- more unused imports
- unused local variables
2. Use `ruff` to replace `flake8`. `ruff` is much (40x) faster than
flake8 and is more robust. We are using it successfully in onnx and
onnx-script. It also supports auto-fixing many flake8 errors.
3. Removed the legacy flake8 ci flow and updated docs.
4. The added workflow supports SARIF code scanning reports on GitHub.
5. Removed `onnxruntime-python-checks-ci-pipeline` as redundant
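
The patterns listed under item 1 can be illustrated with a short sketch. The function names are hypothetical and chosen only for this example; they do not come from the onnxruntime code.

```python
def is_missing(key, mapping):
    # Preferred spelling of the membership check (instead of `not key in mapping`).
    return key not in mapping


def parse_int(text, default=0):
    try:
        return int(text)
    except Exception:
        # Unlike a bare `except:`, this does not swallow
        # KeyboardInterrupt or SystemExit.
        return default


def append_bad(item, items=[]):
    # Mutable default: the same list object is shared across calls.
    items.append(item)
    return items


def append_good(item, items=None):
    # A fresh list is created on each call when none is passed in.
    if items is None:
        items = []
    items.append(item)
    return items


append_bad(1)
second = append_bad(2)
print(second)          # [1, 2]  (state leaked from the first call)
print(append_good(1))  # [1]
print(append_good(2))  # [2]
```

The mutable-default case is the one most likely to change behavior when fixed, which may be why it is deferred to a follow-up PR.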
### Motivation and Context
Unified linting experience in CI and local.
Replacing https://github.com/microsoft/onnxruntime/pull/14306
---------
Signed-off-by: Justin Chu <justinchu@microsoft.com>
### Description
- Adds a dockerfile for Ubuntu with TensorRT 8.5.1.1.
- Adds option to run EP Perf pipeline with TensorRT 8.5
### Motivation and Context
Necessary to benchmark models with TensorRT 8.5
Accuracy loss is observed when transformer models such as BERT, DeBERTa,
and ViT run in TRT FP16 mode. The cause is an overflow at the
Pow op in layer norm.
This PR provides the option to force Pow to run in TRT FP32 precision if
overflow occurs.
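
To make the overflow concrete, here is a small sketch using only the Python standard library. It relies on `struct`'s half-precision format (`"e"`), which raises `OverflowError` for values outside the finite FP16 range (largest finite value 65504). The input magnitude is a hypothetical value chosen for illustration, not a measurement from these models.

```python
import struct


def fits_in_fp16(value: float) -> bool:
    """Return True if `value` can be packed as a finite half-precision float."""
    try:
        struct.pack("<e", value)
        return True
    except OverflowError:
        return False


x = 300.0  # hypothetical activation magnitude
print(fits_in_fp16(x))       # True
print(fits_in_fp16(x ** 2))  # False, since 90000 exceeds 65504
```

Squaring even a moderately large FP16 value leaves the representable range, which is consistent with keeping Pow in FP32 while the rest of the network stays in FP16.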
Co-authored-by: Ubuntu <azureuser@orteplinuxdev.bxgbzpva45kedp3rhbsbit4phb.jx.internal.cloudapp.net>
### Description
Properly cleans up all temporary resources created while running
benchmarks.
Details:
- Dump all temporary artifacts (TRT engines, TRT profiles, inference
profiles, fp16 models) into a temp directory in `/tmp/`. Each model/EP
combination has its own temp directory that is deleted after validation
and benchmarking.
- Allow running both validation and benchmarking in one invocation of
the benchmark.py script. This is necessary to allow the benchmarking
step to reuse artifacts (e.g., TRT engines) created during validation.
Before this PR, we ran validation on all model/EP combinations before
running benchmarks on all combinations again. This required us to keep
all temporary artifacts for all model/EP combinations throughout the
entire run (expensive).
- Create individual functions for validation and benchmarking (split-up
large function that did it all)
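
The per-combination temp directory can be sketched as follows. This is not the code in benchmark.py: `run_model_ep`, the callbacks, and the file and model names are assumptions made for illustration, and `tempfile.TemporaryDirectory` uses the system default temp location (typically `/tmp/` on Linux).

```python
import os
import tempfile


def run_model_ep(model_name, ep_name, validate, benchmark):
    """Validate, then benchmark, sharing one temp dir that is removed afterwards."""
    prefix = f"{model_name}_{ep_name}_"
    with tempfile.TemporaryDirectory(prefix=prefix) as tmp_dir:
        validate(tmp_dir)            # may write artifacts into tmp_dir
        result = benchmark(tmp_dir)  # can reuse those artifacts
    # The directory and everything in it is deleted when the `with` block exits.
    return result, tmp_dir


def fake_validate(tmp_dir):
    # Stand-in for a step that produces a reusable artifact.
    with open(os.path.join(tmp_dir, "engine.cache"), "w") as f:
        f.write("cached")


def fake_benchmark(tmp_dir):
    # Stand-in for a step that reuses the artifact created during validation.
    return os.path.exists(os.path.join(tmp_dir, "engine.cache"))


reused, used_dir = run_model_ep("resnet50", "TensorRT", fake_validate, fake_benchmark)
print(reused)                    # True
print(os.path.exists(used_dir))  # False
```

Running both steps inside one `with` block is what lets the second step reuse artifacts while still bounding disk usage to one model/EP combination at a time.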
### Motivation and Context
The EP Perf pipeline failed to run because the script generated too much
output and the VM ran out of disk space.
Updates EP perf benchmarking scripts to upload new data with an improved table schema. In order to preserve compatibility with the current benchmarking pipeline, we still upload data that uses the old schema as well. These changes are required in order to improve data filtering capabilities and general UX in dashboards that visualize this data.
Details:
- EP names no longer hardcoded as columns for tables that store inference latency, session creation times, memory usage, and model/EP status.
- Add explicit branch, commit ID, and commit date columns to all tables
- Improvements to the docker image building scripts (simplify docker image build; support installing binary TensorRT packages)
- Remove use of deprecated DataFrame.append in favor of pandas.concat.
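
One way to picture the schema change is a conversion from "wide" rows (one column per EP) to "long" rows (one row per model/EP pair with explicit commit metadata). The column names, values, and helper below are hypothetical and do not reflect the actual table definitions.

```python
def to_long_rows(wide_row, ep_names, branch, commit_id, commit_date):
    """Turn one wide row (EP names as columns) into one row per EP."""
    rows = []
    for ep in ep_names:
        if ep in wide_row:
            rows.append(
                {
                    "model": wide_row["model"],
                    "ep": ep,  # the EP is now a value, not a column name
                    "latency_ms": wide_row[ep],
                    "branch": branch,
                    "commit_id": commit_id,
                    "commit_date": commit_date,
                }
            )
    return rows


# Old-style row: adding a new EP would require a new column.
old_row = {"model": "resnet50", "CUDA": 4.2, "TensorRT": 2.9}

new_rows = to_long_rows(old_row, ["CUDA", "TensorRT"], "main", "abc1234", "2023-01-01")
for row in new_rows:
    print(row)
```

With the EP stored as a value, dashboards can filter or group by EP, branch, or commit without schema changes when a new EP is added.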
* move all logic for ubuntu dockerfiles
* pass in trt version
* update trt 8.0 file
* downgrade protobuf
* uncomment
* and
* change to 8.0
* update dockerfiles
* checkout protobuf based on version
* adding last dockerfile
* checkout 3.10 protobuf
* fix checkout version
* update to 8.2
* keep only one submodule sync
* cleanup
* Delete Dockerfile.custom-trt-perf
* create checkout submodules script
* properly compare decimals in bin/sh
* combine build ort paths
* deprecate TRT 7.2
* only checkout protobuf if we checkout older onnx-tensorrt
* only pull nvidia container if true, update image
* downgrade protobuf only if we checkout onnx-trt
* Update linux-gpu-tensorrt-daily-perf-pipeline.yml for Azure Pipelines
* Update linux-gpu-tensorrt-daily-perf-pipeline.yml for Azure Pipelines
* Add quotes to avoid path splitting
* address shellcheck
* use shellcheck suggestions
Fix the order of onnx and onnxruntime imports. Importing onnx before onnxruntime causes a dependency issue in the tensorrt containers that prevents onnxruntime_pybind11_state.so from finding the system libstdc++. This is a workaround to get the EP Perf pipeline working until we can investigate the issue more closely.
Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>
* add aten export for max, max.dim
* rewrite grad of max (no dim); add cases for min
* update UT cases
* mod sym shape infer
* resolve comments: shape infer, add comments, etc.
* add test for torch.max of two tensors
* resolve peng's comments: keepdim; test case
* correct python format
* fix recently introduced lint error
Description: Format all python files under onnxruntime with black and isort.
After checking in, we can use .git-blame-ignore-revs to ignore the formatting PR in git blame.
#11315, #11316
* delete unused files
* only use one dockerfile, otherwise install
* Update pipeline file
* get other changes
* minimal packages
* update pull nightly variable
* try logical boolean
* test boolean
* have build ort as boolean
* case sensitive
* use the current head not the previous commit
* add helpful note
* get inputs independently for trtexec
* track one process only
* remove engine and profile files
* change time to commit time
* add runtime option for io binding
* move to commit date
* fixes
* add option for graph optimization
* cleanup docker script
* note second time creation
* allow for parameters to be configured from pipeline at runtime
* uncomment
* include optional arguments at runtime
* post second session creation
* update cmake version
* Revert "update cmake version"
This reverts commit 09a1364eae68610724c8e90eeea777b7ee03f74b.
* Move data format import
* get inputs independently for trtexec
* track one process only
* remove engine and profile files
* change time to commit time
* add runtime option for io binding
* move to commit date
* fixes
* add option for graph optimization
* cleanup docker script
* include remaining changes
* choose graph optimization option
* add space in option
* move table names to one location
* remove session metadata
* reload trt inputs
* fix posting names
* Update linux-gpu-tensorrt-daily-perf-pipeline.yml for Azure Pipelines
* remove comments
* Split up anubis job and perf run
* add trt environ variables
* No embedded links
* add back previous changes lost in merge
* post session to dashboard
* post session creation time to dashboard
* fix trt 8 functionality:
* add component governance
* Remove hardcoded values
* Update linux-gpu-tensorrt-daily-perf-pipeline.yml for Azure Pipelines
* cleanup errors
* post results only once
* checkout 8.0 GA
* try build 8.0 without building shared lib
* add back build_shared_lib, not the problem
* add upload_time to table
* use identifier to post
* Shorten to TRT x.x
* shorten commit hash using rev_parse
* use shortened commit hash
* use nvidia's default TRT_VERSION
* migrate to 1ES Hosted Pool
* migrate to Kusto database
* refactor and organize ep names with ORT prefix
* standardize TRT benchmarking with save/load engine, input binding, and workspace
* Add TRT 8.2 to ep perf pipeline
* update model_list.json with full onnx zoo
* add anubis credentials
* add anubis credentials
* clarify trt variables
* get system info from docker image
* remove unwanted commenting
* copy changes from trt_and_mem
* second edits
* Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines
* Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines
* Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines
* change to cuda 11.4
* build with cuda 11.4
* Update Dockerfile.ubuntu_cuda11_1_tensorrt7_2
* add cmake extra defines
* cmake architectures
* fix cmake arch
* Delete ubuntu-18.04.Dockerfile
* Rename Dockerfile.ubuntu_cuda11_1_tensorrt7_2 to Dockerfile.ubuntu_cuda11_4_tensorrt7_2
* Update linux-gpu-tensorrt-ci-perf-pipeline.yml
* Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines
* removing previous ort args
* rename to cuda 11.4
* remove cuda 10_2
* delete trt 7.1
* remove 7.1
* Passing in cuda architecture to reduce build time
* always add submodule sync due to recursive cloning
* fix run command
* add and
* take away unused arms and share python installation script
* Update linux-gpu-tensorrt-ci-perf-pipeline.yml
* Update Dockerfile.tensorrt
* cleanup file
* install python directly on dockerfile - move to scripts in future
* Update Dockerfile.custom-trt-perf
* adding cuda 11.1 for missing Libnvrtc.so.11.1
* Delete install_python.sh
* Add memory check for TRT perf
* Revise test app
* Add memory check for TRT perf
* Revise test app
* add test cases
* Modify script and add pipeline YAML
* remove redundant code
* temporarily change
* Change YAML
* revise test app
* fix minor bug
* code refactor
* small fix
* temporarily change for test
* prepare result log
* rm container when it exits
* code refactor
- Allow anyone to kick off a perf test here. Customize: branch, eps, model selection, cuda version.
- Only run shape inference when required.
- Kill errored out memory processes.
- Remove warmup run.
- Clean up script.
- Standalone_TRT is its own "EP" rather than an additional run with the TRT EP
* merge master, keep postprocess status commit
* download float16.py every time
* using variables to reference eps
* adding ACL EP to ep perf tool
* accuracy with absolute tolerance configurable
* add acl to dict + remove commented line
* build off a specific commit and archive wheel file
* rename to fp32, prefix results w/ commit, add CPU col
* rename 99th to 90 percentile
* get symbolic_shape from master each time
* add install archive wheel, parallel build
* shortening hash
* Add YAML file for pipeline
* Modify typo
* Add working directory
* Modify and test
* Modify and test
* Modify and test
* Modify and test
* Modify
* Modify
* Modify
* Modify
* Make sure to copy all the result files
* Add clean up
* Modify
* Modify agent pool name
* Upload only specific artifacts
* Modify
* Integrated CI Pipeline for running TRT perf as well as added the “large amount of models” into perf model target
* Fix bug
* Fix bug
* Add reading the information regarding previously known failing models
and then skip testing them during benchmark/validation
* Modify the script file for CI
* Replace print with logger.info
* Fix bug
* Fix bug
* Refine the code
* Modify the script so that it can catch segmentation faults that occur while
running ORT
* Fix bug
* fix bug
* fix bug
* Add debug info
* fix bug
* Refine perf code
* Refine the code
* fix bug
* Code refactoring
* change many-models path
* remove metadata after validation/benchmark are done
* Update README.md
* Fix bug so that metadata doesn't hold stale value
* Remove hardcode and update README
* Add arguments to the script to make it run correctly
* Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines
* Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines
* Fix bug so that metadata doesn't hold stale value
* Fix small bug of finding test dataset directory for FP16 test data, as
well as modification of some output information
* use -i random for perf test of TRT changes
Co-authored-by: Olivia Jain <oljain@microsoft.com>
* Initialize tensorrt perf script
* Add bert-squad dependencies
* Modified code to make ort inference with CUDA/Tensorrt
* Add get CUDA/TRT version
* uncomment bert-squad
* Add BERT-SQUAD inputs.json
* Add FastRCNN
* Make preprocess/validation into common functions
* Add MaskRCNN and SSD and consolidate the code
* Add dependencies for MaskRCNN
* following modifications are made:
- create common fetch function to get inputs/outputs of model from ONNX model zoo.
- create common validation function to compare inference outputs with reference outputs from ONNX model zoo.
- move run/repeat time to argument list. (still working on other arguments, like fp16 or fp32, latency percentile).
- generate table in csv file to show the latency comparison (TRT vs CUDA) side by side.
* Add approach to analyze profiling file and also update model related
settings
* Add models
* Add most of models from ONNX model zoo
* Add model input name and print all the model names at the end of run
* Add system info
* Add TRT fp16 support
* Refine the code
* Handle TRT fall back and modify the way to get input data
* Refine code
* Modify code
* Add more precise approach to measure inference
* Add io-binding
* Add YOLOv4
* Refine the code
* Refine the code
* Add models
* Add yolov4 notebook for jetson device
* Update notebook
* Update notebook
* Add CVS models
* Add missing model
* Add support of float16
* Add new way to get trt version
* Add "validate" and "benchmark" mode
* Add randomly generated input
* Refine perf script
* Refine the code.
* Add README
* Refine the code
* Update README.md
* Refine code
* Update README.md
* Remove all the model-related Python code and instead use model_list.json as
the model configuration.
Refine benchmark.py
* Refine the code
Co-authored-by: Chi Lo <lochi@microsoft.com>