### Description
`lintrunner` is a linter runner successfully used by pytorch, onnx and
onnx-script. It provides a uniform experience running linters locally
and in CI. It supports all major dev systems: Windows, Linux and MacOs.
The checks are enforced by the `Python format` workflow.
This PR adopts `lintrunner` to onnxruntime and fixed ~2000 flake8 errors
in Python code. `lintrunner` now runs all required python lints
including `ruff`(replacing `flake8`), `black` and `isort`. Future lints
like `clang-format` can be added.
Most errors are auto-fixed by `ruff` and the fixes should be considered
robust.
Lints that are more complicated to fix are applied `# noqa` for now and
should be fixed in follow up PRs.
### Notable changes
1. This PR **removed some suboptimal patterns**:
- `not xxx in` -> `xxx not in` membership checks
- bare excepts (`except:` -> `except Exception`)
- unused imports
The follow up PR will remove:
- `import *`
- mutable values as default in function definitions (`def func(a=[])`)
- more unused imports
- unused local variables
2. Use `ruff` to replace `flake8`. `ruff` is much (40x) faster than
flake8 and is more robust. We are using it successfully in onnx and
onnx-script. It also supports auto-fixing many flake8 errors.
3. Removed the legacy flake8 ci flow and updated docs.
4. The added workflow supports SARIF code scanning reports on github,
example snapshot:

5. Removed `onnxruntime-python-checks-ci-pipeline` as redundant
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Unified linting experience in CI and local.
Replacing https://github.com/microsoft/onnxruntime/pull/14306
---------
Signed-off-by: Justin Chu <justinchu@microsoft.com>
### Description
<!-- Describe your changes. -->
Changes to support standalone custom ops in a minimal build. Also
incorporates changes from #14492 (needed to test builds prior to that
being checked in).
We first need to save the schema info from the operators used by the
standalone op invoker in the ORT format model. Add mechanism for that.
Merge the kernel lookup logic so the same is used in full and minimal
build. NOTE: the version matching is now consistent with all other
kernel lookups, and the call to CreateOp MUST use the exact version for
the operator. Previously matching wasn't as strict, but this can lead to
the incorrect kernel being chosen.
Add tests.
NOTE: There is currently no way to detect the ops/types/opsets used
inside these custom ops as they don't exist until we create kernels,
which is after model loading completes (which is the point the ORT
format model is saved). Due to that they have to be manually added to
the configuration used to do the reduced ops build. That shouldn't be
too hard for the custom op author to add given the custom op
implementation is specifying the op, opset and type constraints (i.e.
they have the info and it's just a case of capturing/formatting it
correctly).
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Enable usage of the standalone op invoker by custom ops in a minimal
build.
---------
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
# Motivation
Currently, ORT minimal builds use kernel def hashes to map from nodes to
kernels to execute when loading the model. As the kernel def hashes must
be known ahead of time, this works for statically registered kernels.
This works well for the CPU EP.
For this approach to work, the kernel def hashes must also be known at
ORT format model conversion time, which means the EP with statically
registered kernels must also be enabled then. This is not an issue for
the always-available CPU EP. However, we do not want to require that any
EP which statically registers kernels is always available too.
Consequently, we explore another approach to match nodes to kernels that
does not rely on kernel def hashes. An added benefit of this is the
possibility of moving away from kernel def hashes completely, which
would eliminate the maintenance burden of keeping the hashes stable.
# Approach
In a full build, ORT uses some information from the ONNX op schema to
match a node to a kernel. We want to avoid including the ONNX op schema
in a minimal build to reduce binary size. Essentially, we take the
necessary information from the ONNX op schema and make it available in a
minimal build.
We decouple the ONNX op schema from the kernel matching logic. The
kernel matching logic instead relies on per-op information which can
either be obtained from the ONNX op schema or another source.
This per-op information must be available in a minimal build when there
are no ONNX op schemas. We put it in the ORT format model.
Existing uses of kernel def hashes to look up kernels are replaced
with the updated kernel matching logic. We no longer store
kernel def hashes in the ORT format model’s session state and runtime
optimization representations. We no longer keep the logic to
generate and ensure stability of kernel def hashes.
Description: Format all python files under onnxruntime with black and isort.
After checking in, we can use .git-blame-ignore-revs to ignore the formatting PR in git blame.
#11315, #11316
Move binary size check(s) to a separate pipeline. In the future, other binary size-related builds can go here.
Add publishing of build artifacts for easier analysis.
Add optional build with debug info.
In a reduced ops build, some source files get updated. This change moves the updated files into the build directory. This way, it is easier to simultaneously manage different build directories (with possibly different reduced ops configurations) based on a single source directory.
Adding ARM64 depthwise convolution kernel for symmetric quantization
Motivation and Context
Two improvements against current kernel code :
1. Signed int8 based instructions, no need to extend from 8b to 16b before multiplication.
2. Unrolled loop with manual software pipelining
Co-authored-by: Chen Fu <fuchen@microsoft.com>
ORT format model runtime optimization implementation is in progress.
This change adds a build.py option to disable the partial runtime optimization implementation, adds CI builds to test it, and disables runtime optimizations in mobile package builds.
Add IsSparseTensor
Add CreateSparseTensor
Add utilities and test fully sparse instantiation
Fully sparse blocksparse
Add test and docs for fully sparse tensor instantiation
Rework creation API
Use API
Non string API
Retrofit of existing String API
Add tests
Add documentation
Address build issues (Winml pending)
Add inference test
Bump binary size
Add ifdef DISABLE CONTRIB
* updates for picking pnnx commit
* add tests filter to c# tests
* plus test fixes
* fix versioning for contrib ops
* fix tests
* test filter for optional ops
* more versioning related updates
* fix test
* fix layernorm spec
* more updates
* update docs
* add more test filters
* more filters
* update binary size threshold
* update docs
* plus more fixes
* updates per review
* update to release commit
* add filters for optional type tests
* plus updates
SparseTensor support
Implement Builder pattern
Fix support for 1-D and 2-D COO indices
Implement and test CSR support.
Handle shape inference for SparseTensors
Implement conversion for COO, CSR and tests.
Address the case where constant sparse initializer is the output.
Implement test infra for SparseTensors
Implement SparseDenseMatMul for Csr and COO and tested it.
Add hash for SparseToDenseMatMul
Finish shared provider refactor
Refactor GetOrCreate to Create
Working on py interface
Expose OrtDevice and use it in allocate_numpy
Adjust Sparse interfaces, add support for string SparseTensor. Add tests.
Add and test to_cuda()
Add accessors to format specific indices
Test values and indices views, read-only flag, after GC access
Add sparse related methods to OrtValue
Re-work SparseTensor wrapper, add OrtValue methods
Rework numpy_array_to_cuda/to_cpu
Add run_with_ort_values
Add models and test sparse_mat_mul with run_with_ort_values
Refactor sparse tensor to use a single buffer
Ifdef x86 Eigen CSR sparse matmul implementation
Exclude broken test, check for string type when copying cross device
Split pybind schema, regenerate docs, add exclusion
Conditionally exclude schema module
Update docs fix cuda build
Add test to a filter and renerate JS docs
Add conversion and test string support for sparse tensors
Exclude conversion utils from minimal build
Add CUDA Memcpy and adjust provider interfaces
* Add ability to generate ios static framework
* Fix typos
* Add pod cache clean, update some comments of previous commit
* Fix CI failure with newly added cpuinfo library
* Update test model (CoreML requires node has a name)
* Addressed CR comments
Pytorch cpuinfo library allows us to query current cpu features, micro-architecture and cache size, etc. These information is needed for targeted performance optimizations.
Unfortunately it does not work under Windows/ARM. We need to develop our own later
* Add metadata_props to ORT model
* Minor update
* Update python binding, and increase the minimal pipeline size threshold
* Fixed a small bug in serializing ir_version
* Remove temp ort.py.fbs and add it to .gitignore
1. Remove openmp related packaging pipelines and build jobs.
2. Set continueOnError to true for the TSAUpload tasks. Their service is unstable recently.
3. Update Ubuntu 16 docker images to Ubuntu 18, in prepare for getting C++17 support
4. Cherry-pick the changes in 1.7.1 to the master: updating CFLAGS/CXXFLAGS to strip out debug symbols
* Add support for custom ops library to the ORT model conversion script
Simplify model conversion now that we read ops from the ORT format model.
Enable custom ops in the python bindings if custom ops are turned on in a minimal build.
* Add test of model conversion involving custom ops.
* Remove support from custom ops from the base minimal build as they contribute too much binary growth to an Android build.
Add ability to explicitly enable custom op support in a minimal build.
Change one minimal build CI to test adding custom op support (unit tests are run in that build to validate)
Add python 3.8/3.9 support for Windows GPU and Linux ARM64
Delete jemalloc from cgmanifest.json.
Add onnx node test to Nuphar pipeline.
Change $ANDROID_HOME/ndk-bundle to $ANDROID_NDK_HOME. The later one is more accurate.
Delete Java GPU packaging pipeline
Remove test data download step in Nuget Mac OS pipeline. Because these machines are out of control and out of our network, it's hard to make it reliable and the data secure.
Fix a doc problem in c-api-artifacts-package-and-publish-steps-windows.yml. It shouldn't copy C_API.md, because the file has been moved into a different branch.
Delete the CI build docker file for Ubuntu cuda 9.x and Ubuntu x86 32 bits
And, due to some internal restrictions, I need to rename some of the agent pools
* Add ability to generate configuration that includes required types for individual operators, to allow build size reduction based on that.
- Add python bindings for ORT format models
- Add script to update bindings and help info
- Add parsing of ORT format models
- Add ability to enable type reduction to config generation
- Update build.py to only allow operator/type reduction via config
- simpler to require config to be generated first
- can't mix a type aware (ORT format model only) and non-type aware config as that may result in insufficient types being enabled
- Add script to create reduced build config
- Update CIs
* Use readelf for minimal build binary size checks.
The on-disk size grows in 4KB chunks which makes it hard to see how much growth an individual checkin causes.
Only downside is that the sum of the sections is larger than the on-disk size (assumably things get packed smaller on disk and some of the section alignment constraints can be ignored)
* Remove unused function
Description: Add ORT minimal with NNAPI EP to Android CI
Motivation and Context
The added build/test to Android CI will only run UT, additional onnx_test_runner with customer .ort models will be added later
* Add validation of operator registrations to the reduction script
- the script has all the logic to process the registrations, and there's a CI that uses it
Fix some operator registrations
* Fix CUDA PRelu registration
* Refactor to split out kernel registration file parsing and use in the exclude ops script and an op registration validation script.
Run op validation in minimal build CI
* Fix PEP8 error and some comments
* Add copy sparse model in minimal CI
* Add squeeze 13 support
* fix small typo
* Add ut for squeeze in NNAPI
* Fix some issue in the UT and code
* Modify based on the master change
* Fix build break