### FP16 optimizer automatically detect DeepSpeed compatibility
Optimum/Transformers are using accelerate lib to prepare models, so our
FP16 optimizer wrapper does not work for long time. Because the
namespace is `accelerate.utils.deepspeed.DeepSpeedOptimizerWrapper`,
which underlying is still calling into DeepSpeed stage1and2 optimizer.
This PR includes following changes:
1. Add `accelerate.utils.deepspeed.DeepSpeedOptimizerWrapper` in the
modifier registry, plus a check on its contained `optimizer` property
MUST be DeepSpeed stage 1 and 2 optimizer. (let's cover Stage 3
optimizer later)
2. For DeepSpeed version > 0.9.1, we will store the source code in a
version list. As long as the related function in DeepSpeed remains
unchanged during its new release, we won't need manually upgrade the
version check any more. If some day, the source code did not match, a
warning will be raised to users, to add a new version of source code in
the list.
With the above change, we will have our FP16 Optimizer working again in
Optimum.

Bump ruff version and remove pylint from the linter list. Fix any new
error detected by ruff.
### Motivation and Context
Ruff covers many of the pylint rules. Since pylint is not enabled in
this repo and runs slow, we remove it from the linters
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at
bottom):
* __->__ #16789
Bump ruff to 0.0.278 and fix new lint errors. I added noqa to all
existing RUF012 errors which requires mutable class variables to be
annotated with `ClassVar`, as well as all PERF issues.
Signed-off-by: Justin Chu <justinchu@microsoft.com>
### Description
torch.norm is deprecated as mentioned in issue #16751. This PR replaces
the call to torch.norm by the options suggested by torch documentation.
### Description
<!-- Describe your changes. -->
support the latest deepspeed 0.9.1 for the next release
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
This will avoid the warn message `Skip modifying optimizer because of
unsupported DeepSpeed version`
---------
Co-authored-by: ruiren <ruiren@microsoft.com>
### Description
Bump ruff version in CI and fixed new lint errors.
- This change enables the flake8-implicit-str-concat rules which helps
detect unintended string concatenations:
https://beta.ruff.rs/docs/rules/#flake8-implicit-str-concat-isc
- Update gitignore to include common python files that we want to
exclude.
### Motivation and Context
Code quality
### Description
<!-- Describe your changes. -->
Update the support deepspeed to 0.8.3 as it's the latest version
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
This will fix the error of `Skip modifying optimizer because of
unsupported DeepSpeed version`
Co-authored-by: ruiren <ruiren@microsoft.com>
### Description
`lintrunner` is a linter runner successfully used by pytorch, onnx and
onnx-script. It provides a uniform experience running linters locally
and in CI. It supports all major dev systems: Windows, Linux and MacOs.
The checks are enforced by the `Python format` workflow.
This PR adopts `lintrunner` to onnxruntime and fixed ~2000 flake8 errors
in Python code. `lintrunner` now runs all required python lints
including `ruff`(replacing `flake8`), `black` and `isort`. Future lints
like `clang-format` can be added.
Most errors are auto-fixed by `ruff` and the fixes should be considered
robust.
Lints that are more complicated to fix are applied `# noqa` for now and
should be fixed in follow up PRs.
### Notable changes
1. This PR **removed some suboptimal patterns**:
- `not xxx in` -> `xxx not in` membership checks
- bare excepts (`except:` -> `except Exception`)
- unused imports
The follow up PR will remove:
- `import *`
- mutable values as default in function definitions (`def func(a=[])`)
- more unused imports
- unused local variables
2. Use `ruff` to replace `flake8`. `ruff` is much (40x) faster than
flake8 and is more robust. We are using it successfully in onnx and
onnx-script. It also supports auto-fixing many flake8 errors.
3. Removed the legacy flake8 ci flow and updated docs.
4. The added workflow supports SARIF code scanning reports on github,
example snapshot:

5. Removed `onnxruntime-python-checks-ci-pipeline` as redundant
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Unified linting experience in CI and local.
Replacing https://github.com/microsoft/onnxruntime/pull/14306
---------
Signed-off-by: Justin Chu <justinchu@microsoft.com>
### Description
change deepspeed version in warning from 0.7.3 to 0.8.0
### Motivation and Context
The version was updated for Deepspeed support in ORT from 0.7.3 to 0.8.0
but wasn't updated in the warnings message and this PR is to fix that.
Description: Format all python files under onnxruntime with black and isort.
After checking in, we can use .git-blame-ignore-revs to ignore the formatting PR in git blame.
#11315, #11316
* improve NonZero
* fix megatron_fp16 optimzier, fix the doc
* multi_tensor_applier
* resolve comment
* fix building warning
* fix build error when enabling training and use tensorrt
* optimize python overhead of _post_amp_backward
* overwrite apex amp's zero_grad for faster implementation
* move unscale_fp16_grads_into_fp32_grads into C++ impl
* improve the efficiency furthur, reducing 3.5ms to 1.7ms for unilm.
* unilm 1.7ms to 338us: 1). optimize python list <==> std::vector copy, 2). launch the kernels as long as num_elem reach thresh hold. This help reduce the CUDA idel time.
* refine the logic a bit after validating
Co-authored-by: Baiju Meswani <bmeswani@microsoft.com>
* Exception when duplicated autograd.Function name detected
* reorder a bit for a bittle bit better perf
* fix a bug in previous PR :(
* correct the error message a bit