Commit graph

32 commits

Author SHA1 Message Date
pengwa
2c6b31c5aa
FP16 optimizer automatically detect DeepSpeed compatibility (#18084)
### FP16 optimizer automatically detect DeepSpeed compatibility

Optimum/Transformers use the accelerate library to prepare models, so our
FP16 optimizer wrapper has not worked for a long time: the optimizer it
receives is an `accelerate.utils.deepspeed.DeepSpeedOptimizerWrapper`,
which underneath still calls into the DeepSpeed stage 1 and 2 optimizer.

This PR includes the following changes:
1. Add `accelerate.utils.deepspeed.DeepSpeedOptimizerWrapper` to the
modifier registry, plus a check that its contained `optimizer` property
MUST be a DeepSpeed stage 1 or 2 optimizer (Stage 3 optimizer support is
left for later).
2. For DeepSpeed versions > 0.9.1, we store the source code of the
relevant function in a version list. As long as that function stays
unchanged in a new DeepSpeed release, we no longer need to manually
update the version check. If the source code ever stops matching, users
get a warning asking them to add the new version of the source code to
the list.
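Change 1 can be pictured as a lookup keyed on the optimizer's fully qualified class name, with an extra check when the object is the accelerate wrapper. The sketch below is hypothetical: `MODIFIER_REGISTRY`, `find_modifier`, `make_fake` and the modifier label are invented for illustration, and the class path `deepspeed.runtime.zero.stage_1_and_2.DeepSpeedZeroOptimizer` is an assumption about where the stage 1/2 optimizer lives. Only the accelerate class path comes from the PR text.

```python
# Hypothetical sketch: a modifier registry keyed on fully qualified class names.
# Not onnxruntime's actual code; class paths other than the accelerate one are assumed.

ZERO_STAGE12 = "deepspeed.runtime.zero.stage_1_and_2.DeepSpeedZeroOptimizer"  # assumed path
ACCELERATE_WRAPPER = "accelerate.utils.deepspeed.DeepSpeedOptimizerWrapper"

MODIFIER_REGISTRY = {
    ZERO_STAGE12: "DeepSpeedZeROModifier",  # hypothetical modifier label
    ACCELERATE_WRAPPER: "DeepSpeedZeROModifier",
}


def qualified_name(obj):
    """Return 'package.module.ClassName' for an object's class."""
    cls = type(obj)
    return f"{cls.__module__}.{cls.__qualname__}"


def find_modifier(optimizer):
    """Return the registered modifier label, or None if unsupported."""
    name = qualified_name(optimizer)
    if name not in MODIFIER_REGISTRY:
        return None
    if name == ACCELERATE_WRAPPER:
        # The wrapper is only accepted if the optimizer it wraps is ZeRO stage 1/2.
        inner = getattr(optimizer, "optimizer", None)
        if inner is None or qualified_name(inner) != ZERO_STAGE12:
            return None
    return MODIFIER_REGISTRY[name]


def make_fake(path):
    """Build a stand-in class whose qualified name is `path` (demonstration only)."""
    module, _, cls_name = path.rpartition(".")
    return type(cls_name, (), {"__module__": module})


ZeroOptimizer = make_fake(ZERO_STAGE12)
Wrapper = make_fake(ACCELERATE_WRAPPER)

wrapped = Wrapper()
wrapped.optimizer = ZeroOptimizer()
print(find_modifier(wrapped))  # DeepSpeedZeROModifier

stage3_wrapped = Wrapper()
stage3_wrapped.optimizer = make_fake("deepspeed.runtime.zero.stage3.DeepSpeedZeroOptimizer_Stage3")()
print(find_modifier(stage3_wrapped))  # None
```

Keying on the class path string means the registry never has to import accelerate or DeepSpeed just to decide whether an optimizer is supported.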

With these changes, our FP16 optimizer works again in
Optimum.
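Change 2 replaces an exact version pin with a comparison against recorded copies of the DeepSpeed function's source. A minimal stdlib-only sketch follows, assuming the current source would be obtained with `inspect.getsource` on the patched DeepSpeed function. `normalize`, `source_is_known`, `KNOWN_SOURCES` and the warning text are hypothetical, not the PR's code.

```python
import textwrap
import warnings


def normalize(source):
    """Dedent and strip trailing whitespace so cosmetic differences don't count."""
    lines = textwrap.dedent(source).strip().splitlines()
    return "\n".join(line.rstrip() for line in lines)


def source_is_known(current_source, known_sources, func_name):
    """Return True if current_source matches a recorded version, else warn."""
    current = normalize(current_source)
    if any(current == normalize(known) for known in known_sources):
        return True
    warnings.warn(
        f"Source of {func_name} matches no recorded version; skip modifying the "
        "optimizer. Please add the new source code to the known-sources list.",
        UserWarning,
    )
    return False


# In practice (assumed): source_is_known(inspect.getsource(fn), KNOWN_SOURCES, fn.__qualname__)
KNOWN_SOURCES = ["def f(x):\n    return x + 1\n"]
```

Dedenting first matters because `inspect.getsource` returns methods with their class-level indentation.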


![image](https://github.com/microsoft/onnxruntime/assets/10530022/d35b4aa9-b371-46f1-98ae-73114f91179b)
2023-10-25 15:11:02 +08:00
Justin Chu
be7541ef4a
[Linter] Bump ruff and remove pylint (#17797)
Bump the ruff version, remove pylint from the linter list, and fix any
new errors detected by ruff.

### Motivation and Context

Ruff covers many of the pylint rules. Because pylint is not enabled in
this repo and runs slowly, we remove it from the linters.
2023-10-05 21:07:33 -07:00
Justin Chu
d79515041c
[Better Engineering] Bump ruff to 0.0.278 and fix new lint errors (#16789)

Bump ruff to 0.0.278 and fix new lint errors. I added `noqa` to all
existing RUF012 errors (which require mutable class variables to be
annotated with `ClassVar`), as well as to all PERF issues.
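RUF012 exists because a mutable class attribute is shared by every instance, which is easy to mistake for per-instance state. This PR silenced the existing cases with `noqa`. The alternative fix, sketched here with a hypothetical class, is to annotate the intentionally shared value with `typing.ClassVar` and keep per-instance state in `__init__`:

```python
from typing import ClassVar


class LinterConfig:
    """Hypothetical example class."""

    # Shared by all instances on purpose; ClassVar documents that (and satisfies RUF012).
    DEFAULT_RULES: ClassVar[list[str]] = ["E", "F", "PERF"]

    def __init__(self) -> None:
        # Per-instance mutable state is created fresh for each object.
        self.extra_rules: list[str] = []


a, b = LinterConfig(), LinterConfig()
a.extra_rules.append("RUF")
print(b.extra_rules)  # []
print(a.DEFAULT_RULES is b.DEFAULT_RULES)  # True
```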

Signed-off-by: Justin Chu <justinchu@microsoft.com>
2023-07-21 12:53:41 -07:00
Xavier Dupré
b508c7236f
Replace call to deprecated torch.norm (#16758)
### Description
`torch.norm` is deprecated, as noted in issue #16751. This PR replaces
the call to `torch.norm` with the alternatives suggested by the PyTorch documentation.
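For context, the PyTorch documentation points to `torch.linalg.vector_norm` for vector norms and `torch.linalg.matrix_norm` for matrix norms. Optimizer code typically needs the global L2 norm over many gradient tensors, and that can be assembled from per-tensor norms, because the L2 norm of the per-tensor L2 norms equals the L2 norm of all elements taken together. The stdlib-only check below uses plain lists as stand-ins for tensors; it illustrates the identity and is not the code from this PR.

```python
import math


def l2_norm(values):
    """L2 norm of a flat sequence of numbers."""
    return math.sqrt(sum(v * v for v in values))


def global_l2_norm(tensors):
    """Global L2 norm computed as the norm of per-tensor norms."""
    return l2_norm([l2_norm(t) for t in tensors])


grads = [[3.0, 4.0], [12.0], [0.0, 0.0, 84.0]]
flat = [v for t in grads for v in t]
print(global_l2_norm(grads), l2_norm(flat))  # 85.0 85.0
```

With PyTorch, the analogous expression would be `torch.linalg.vector_norm(torch.stack([torch.linalg.vector_norm(g) for g in grads]))`.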
2023-07-20 19:52:19 -07:00
jingyanwangms
5dcaf70501
Adding this set_to_none flag to zero_grad to have signature parity with pytorch Adam (#16375)
### Description
The signature of `torch.optim.Adam.zero_grad()` is
`zero_grad(set_to_none=True)`:

https://pytorch.org/docs/stable/generated/torch.optim.Adam.html#torch.optim.Adam.zero_grad

As DeepSpeed does, we set this flag at initialization:
https://deepspeed.readthedocs.io/en/latest/optimizers.html#deepspeed.ops.adam.FusedAdam

This flag is added for signature parity with PyTorch Adam.
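Setting the default at construction time while still accepting a per-call argument might look like the sketch below. `SketchOptimizer`, `Param` and the `set_grad_none` constructor argument (named after DeepSpeed's `FusedAdam` option) are hypothetical stand-ins; how ORT's optimizer resolves the two flags is an assumption here, not taken from the PR.

```python
class Param:
    """Hypothetical stand-in for a tensor parameter with a .grad attribute."""

    def __init__(self, grad):
        self.grad = grad


class SketchOptimizer:
    """Hypothetical optimizer showing an init-time default for zero_grad."""

    def __init__(self, params, set_grad_none=True):
        self.params = list(params)
        self.set_grad_none = set_grad_none

    def zero_grad(self, set_to_none=None):
        # A per-call argument wins; otherwise fall back to the constructor flag.
        if set_to_none is None:
            set_to_none = self.set_grad_none
        for p in self.params:
            if p.grad is None:
                continue
            if set_to_none:
                p.grad = None
            else:
                p.grad[:] = [0.0] * len(p.grad)  # stands in for grad.zero_()


params = [Param([1.0, 2.0]), Param(None)]
opt = SketchOptimizer(params, set_grad_none=False)
opt.zero_grad()                  # constructor default: zero in place
print(params[0].grad)            # [0.0, 0.0]
opt.zero_grad(set_to_none=True)  # per-call override
print(params[0].grad)            # None
```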

### Motivation and Context
Easier model integration

Co-authored-by: Jingyan Wang <jingywa@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2023-06-19 17:27:41 -07:00
Rui Ren
db6a9bc033
support latest deepspeed version for optim (#15682)
### Description

Support the latest DeepSpeed version, 0.9.1, for the next release.


### Motivation and Context
This avoids the warning message `Skip modifying optimizer because of
unsupported DeepSpeed version`.
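The message quoted above comes from a version gate around the optimizer modification. A stdlib-only sketch of such a gate follows. The bounds (0.5.9 and 0.9.1, taken from versions mentioned in this log), the function names and the exact warning wording are illustrative assumptions; real code would more likely use `packaging.version.Version`.

```python
import re
import warnings


def parse_version(text):
    """Parse the leading 'X.Y[.Z]' of a version string into a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version: {text!r}")
    return tuple(int(part or 0) for part in match.groups())


def deepspeed_supported(version, min_version="0.5.9", max_version="0.9.1"):
    """Return True if version is within [min_version, max_version], else warn."""
    # Compare int tuples: as strings, "0.10.0" would sort before "0.9.1".
    low, high = parse_version(min_version), parse_version(max_version)
    if low <= parse_version(version) <= high:
        return True
    warnings.warn(
        "Skip modifying optimizer because of unsupported DeepSpeed version "
        f"{version}, supported version: {min_version} - {max_version}."
    )
    return False
```

Every release outside the range needs a code change to this gate, which is the maintenance burden that the later source-matching approach (#18084) removes.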

---------

Co-authored-by: ruiren <ruiren@microsoft.com>
2023-04-25 20:12:23 -07:00
Justin Chu
a36caba073
Bump ruff in CI (#15533)
### Description

Bump the ruff version in CI and fix new lint errors.

- This change enables the flake8-implicit-str-concat rules, which help
detect unintended string concatenations:
https://beta.ruff.rs/docs/rules/#flake8-implicit-str-concat-isc
- Update `.gitignore` to cover common Python files that we want to
exclude.
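The kind of bug the flake8-implicit-str-concat rules are meant to surface: Python joins adjacent string literals, so a missing comma silently merges two list items. A small illustration (the list contents are arbitrary):

```python
# Missing comma after "black": Python concatenates the adjacent literals.
buggy = [
    "ruff",
    "black"
    "isort",
]

fixed = [
    "ruff",
    "black",
    "isort",
]

print(buggy)       # ['ruff', 'blackisort']
print(len(fixed))  # 3
```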


### Motivation and Context

Code quality
2023-04-17 10:11:44 -07:00
Rui Ren
5e2f46df2b
update deepspeed version 0.8.3 (#15415)
### Description
Update the supported DeepSpeed version to 0.8.3, as it is the latest version.


### Motivation and Context
This fixes the `Skip modifying optimizer because of
unsupported DeepSpeed version` message.

Co-authored-by: ruiren <ruiren@microsoft.com>
2023-04-07 17:59:50 -07:00
Justin Chu
938e2136c6
Enable pylint and numpy rules (#15218)
### Description

Enable pylint and numpy rules

### Motivation and Context

Modernize numpy usage and enable more quality checks
2023-03-27 20:37:53 -07:00
Justin Chu
d834ec895a
Adopt lintrunner as the linting tool - take 2 (#15085)
### Description

`lintrunner` is a linter runner used successfully by PyTorch, ONNX and
onnx-script. It provides a uniform experience running linters locally
and in CI. It supports all major dev systems: Windows, Linux and macOS.
The checks are enforced by the `Python format` workflow.

This PR adopts `lintrunner` in onnxruntime and fixes ~2000 flake8 errors
in the Python code. `lintrunner` now runs all required Python lints,
including `ruff` (replacing `flake8`), `black` and `isort`. Future lints
such as `clang-format` can be added.

Most errors are auto-fixed by `ruff` and the fixes should be considered
robust.

Lints that are harder to fix are suppressed with `# noqa` for now and
should be fixed in follow-up PRs.

### Notable changes

1. This PR **removed some suboptimal patterns**:

	- `not xxx in` -> `xxx not in` membership checks
	- bare excepts (`except:` -> `except Exception`)
	- unused imports
	
	The follow-up PR will remove:
	
	- `import *`
	- mutable values as default in function definitions (`def func(a=[])`)
	- more unused imports
	- unused local variables

2. Use `ruff` to replace `flake8`. `ruff` is much faster than flake8
(about 40x) and more robust. We are using it successfully in onnx and
onnx-script. It also supports auto-fixing many flake8 errors.

3. Removed the legacy flake8 CI flow and updated the docs.

4. The added workflow supports SARIF code scanning reports on GitHub;
example snapshot:
	

![image](https://user-images.githubusercontent.com/11205048/212598953-d60ce8a9-f242-4fa8-8674-8696b704604a.png)

5. Removed `onnxruntime-python-checks-ci-pipeline` as redundant
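Small before/after illustrations of three of the patterns listed in item 1 (the function names are made up for this example):

```python
def append_bad(item, bucket=[]):  # mutable default: one list shared by every call
    bucket.append(item)
    return bucket


def append_good(item, bucket=None):
    if bucket is None:
        bucket = []  # a fresh list for each call
    bucket.append(item)
    return bucket


def parse_port(value):
    try:
        return int(value)
    except Exception:  # a bare `except:` would also swallow KeyboardInterrupt/SystemExit
        return None


def missing_keys(required, config):
    return [key for key in required if key not in config]  # not `not key in config`


first, second = append_bad(1), append_bad(2)
print(second, first is second)  # [1, 2] True
print(append_good(1), append_good(2))  # [1] [2]
print(parse_port("8080"), parse_port("abc"))  # 8080 None
```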

### Motivation and Context

Unified linting experience in CI and local.

Replacing https://github.com/microsoft/onnxruntime/pull/14306

---------

Signed-off-by: Justin Chu <justinchu@microsoft.com>
2023-03-24 15:29:03 -07:00
Abhishek Jindal
3d388a1aea
change deepspeed version in warning from 0.7.3 to 0.8.0 (#14527)
### Description
Change the DeepSpeed version in the warning from 0.7.3 to 0.8.0.



### Motivation and Context
The supported DeepSpeed version in ORT was updated from 0.7.3 to 0.8.0,
but the warning message was not updated; this PR fixes that.
2023-02-01 12:00:43 -08:00
Abhishek Jindal
6fa4555a06
Including support for Deepspeed 0.8.0 (#14506)
### Description
Include support for DeepSpeed 0.8.0.



### Motivation and Context
DeepSpeed 0.8.0 includes a bug fix and MLflow integration.
2023-02-01 06:19:41 -08:00
Vincent Wang
6fb70a82df
[ORTModule] Update Supported DeepSpeed Version for FP16_Optimizer (#13305)
Raise the highest supported DeepSpeed version for FP16_Optimizer from
0.7.1 to 0.7.3, and add version information to the warning log.
2022-10-13 13:03:01 +08:00
pengwa
a0c25e5c2f
Fix segment fault for alltoall (#12701)
* fix segmentation fault

* formatting
2022-08-30 11:27:14 +08:00
Vincent Wang
53ecb9e635
Update Supporting DS Version to 0.7.1 for ORTModule (#12696)
Update the supported DeepSpeed version for fp16_optimizer.
2022-08-24 14:56:12 +08:00
Vincent Wang
a078c8d99b
Update Supporting Deepspeed Version of ORTModule's FP16_Optimizer (#12668) 2022-08-22 22:22:53 +08:00
Vincent Wang
a7eb9fe3ac
Remove Apex Dependency For Deepspeed FP16_Optimizer (#12077)
* remove apex dependency

* fix AMD build
2022-07-14 11:15:53 +08:00
Vincent Wang
04f7c2deda
FP16_Optimizer Support for more Deepspeed Versions (#12046)
* fp16_optimizer for more ds versions

* change ds version

* bugfix

* fix bug
2022-06-30 18:36:17 +08:00
zhijxu
9f260fb60f resolve comments 2022-06-30 11:26:13 +08:00
zhijxu
100aebbd26 resolve comments 2022-06-30 11:26:13 +08:00
zhijxu
2295b24cd5 support optimizer opt for deepspeed 0.5.9 2022-06-30 11:26:13 +08:00
Justin Chu
fdce4fa6af
Format all python files under onnxruntime with black and isort (#11324)
Description: Format all python files under onnxruntime with black and isort.

After this is checked in, we can list it in `.git-blame-ignore-revs` so that git blame ignores the formatting PR.

#11315, #11316
2022-04-26 09:35:16 -07:00
pengwa
89ef987ab1
Improve NonZero on CUDA/ROCM (#10307)
* improve NonZero

* fix megatron_fp16 optimizer, fix the doc

* multi_tensor_applier

* resolve comment

* fix build warning

* fix build error when training is enabled and TensorRT is used
2022-03-25 07:35:45 +08:00
Baiju Meswani
141606534c
Add support for FusedAdam to be mathematically equivalent to pytorch/AdamW (#10106) 2022-01-21 13:37:59 -08:00
pengwa
b125446f9c
Optimize python overhead of APEX amp (#9447)
* optimize python overhead of _post_amp_backward

* overwrite apex amp's zero_grad for faster implementation

* move unscale_fp16_grads_into_fp32_grads into C++ impl

* improve the efficiency further, reducing 3.5ms to 1.7ms for unilm.

* unilm 1.7ms to 338us: (1) optimize the Python list <==> std::vector copy; (2) launch the kernels as soon as num_elem reaches the threshold. This helps reduce CUDA idle time.

* refine the logic a bit after validating
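The second point of the unilm 338us bullet, launching kernels as soon as the accumulated element count reaches a threshold rather than after collecting every tensor, is essentially greedy bucketing. A pure-Python sketch, where `launch` is a hypothetical callback standing in for a CUDA kernel launch and the threshold value is arbitrary:

```python
def launch_in_buckets(tensor_sizes, threshold, launch):
    """Group tensor indices into buckets and call launch(bucket) as soon as a
    bucket's total element count reaches `threshold`; flush the rest at the end."""
    bucket, count = [], 0
    for idx, size in enumerate(tensor_sizes):
        bucket.append(idx)
        count += size
        if count >= threshold:
            launch(bucket)  # work starts without waiting for the remaining tensors
            bucket, count = [], 0
    if bucket:
        launch(bucket)


launched = []
launch_in_buckets([4, 4, 10, 1, 1], threshold=8, launch=launched.append)
print(launched)  # [[0, 1], [2], [3, 4]]
```

Launching early lets the GPU start on the first buckets while the host is still preparing later ones, which is where the idle-time reduction would come from.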

Co-authored-by: Baiju Meswani <bmeswani@microsoft.com>
2021-10-26 13:13:49 +08:00
ashbhandare
0270ff7951
Minor import fix (#9538) 2021-10-25 21:29:31 -07:00
baijumeswani
5da4e07daa
Make FusedAdam mathematically equivalent to Transformers AdamW (#9343) 2021-10-18 16:03:18 -07:00
pengwa
f05c285a58
Exception when duplicated autograd.Function name detected (#9351)
* Exception when duplicated autograd.Function name detected

* reorder slightly for a little better performance

* fix a bug in the previous PR :(

* correct the error message a bit
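A registry keyed by name is only safe if names are unique, so failing fast on a duplicate is cheaper than debugging a silent mismatch later. The sketch below is a hypothetical stand-in rather than ORTModule's code: it keys classes by their qualified name and raises when a different class arrives under a name that is already taken.

```python
class FunctionRegistry:
    """Hypothetical registry for autograd.Function-like classes, keyed by name."""

    def __init__(self):
        self._classes = {}

    def register(self, cls):
        key = f"{cls.__module__}.{cls.__qualname__}"
        existing = self._classes.get(key)
        if existing is not None and existing is not cls:
            raise RuntimeError(
                f"Duplicated autograd.Function name detected: {key}. "
                "Rename one of the classes so the names are unique."
            )
        self._classes[key] = cls  # re-registering the same class is a no-op
        return cls


registry = FunctionRegistry()
First = type("ScaleGrad", (), {"__module__": "my_model"})
Second = type("ScaleGrad", (), {"__module__": "my_model"})  # same name, different class
registry.register(First)
registry.register(First)  # fine: same class object
try:
    registry.register(Second)
    duplicate_detected = False
except RuntimeError:
    duplicate_detected = True
print(duplicate_detected)  # True
```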
2021-10-15 12:23:13 +08:00
pengwa
5ee47e3ffa
legacy_megatron-lm/deepspeed_ZERO1&2 FP16_Optimizer wrapper (#9184)
* wrap the Megatron-LM FP16_Optimizer; make model-parallel aggregation optional

* add DeepSpeed ZeRO-1 and ZeRO-2: check overflow & clip norm

* re-structure code and add the copyright

* update the document

* refine the code after validation
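The overflow check mentioned above boils down to scanning gradients for inf or NaN values, which in fp16 training usually means the loss scale was too large and the step should be skipped. A stdlib-only sketch with lists standing in for tensors; halving the loss scale on overflow is the usual dynamic loss-scaling convention and an assumption here, not something this log states.

```python
import math


def has_overflow(grads):
    """True if any gradient element is inf or NaN."""
    return any(not math.isfinite(v) for g in grads for v in g)


def step_with_overflow_check(grads, loss_scale, apply_update):
    """Skip the update and back off the loss scale if gradients overflowed."""
    if has_overflow(grads):
        return False, loss_scale / 2.0  # assumed back-off factor of 2
    apply_update(grads)
    return True, loss_scale


updates = []
print(step_with_overflow_check([[1.0, float("inf")]], 1024.0, updates.append))  # (False, 512.0)
print(step_with_overflow_check([[1.0, 2.0]], 512.0, updates.append))  # (True, 512.0)
```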
2021-10-14 09:01:23 +08:00
baijumeswani
bcdb411c8d
Implement FusedAdam for ORT adapted from DeepSpeed (#9266) 2021-10-05 20:50:34 -07:00
pengwa
453431f7bb
Add max_norm for gradient clipping. (#6289)
* add max_norm as user option for gradient clipping

* add adam and lamb test cases for clip norm

* add frontend tests
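Clipping by `max_norm` rescales all gradients by a common factor when their global L2 norm exceeds the limit. The sketch below follows the same formula as PyTorch's `torch.nn.utils.clip_grad_norm_` (coefficient `max_norm / (total_norm + 1e-6)`, applied only when it is below 1); whether ORT's optimizers use exactly this epsilon is an assumption, and lists stand in for tensors.

```python
import math


def clip_grad_norm(grads, max_norm, eps=1e-6):
    """Scale grads so their global L2 norm is at most max_norm; return (grads, norm)."""
    total_norm = math.sqrt(sum(v * v for g in grads for v in g))
    coef = max_norm / (total_norm + eps)
    if coef < 1.0:
        grads = [[v * coef for v in g] for g in grads]
    return grads, total_norm


clipped, norm = clip_grad_norm([[3.0, 4.0]], max_norm=1.0)
print(norm)  # 5.0
print([round(v, 4) for v in clipped[0]])  # [0.6, 0.8]
```

Scaling every gradient by the same factor preserves the update direction and only shrinks its length.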
2021-01-21 01:01:11 +08:00
Thiago Crepaldi
6594d6672f
Move onnxruntime.experiment to onnxruntime.training namespace (#5045) 2020-09-09 09:46:06 -07:00