Commit graph

86 commits

Author SHA1 Message Date
Jian Chen
2eb3db6bf0
Adding python3.12 support to ORT (#18814)
### Description
Adding python3.12 support to ORT



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-01-11 08:34:28 -08:00
Ashwini Khade
897a4163d7
Update transformer version for training CIs (#19046)
### Description
Updating version to resolve security vulnerability.
2024-01-09 12:00:34 -08:00
Ashwini Khade
4dff154f51
Fix nightly pipeline failure (#18867)
### Description
Fixes a failure in the ortmodule nightly pipeline. 



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-12-19 09:18:00 -08:00
Ashwini Khade
16df8377d3
Update transformers package to fix the security issue (#18730)
### Description
Updating transformers package in test pipeline to fix a security
vulnerability.



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-12-11 09:15:23 -08:00
Abhishek Jindal
680a526e73
Training packaging pipeline for cuda12 (#18524)
### Description
<!-- Describe your changes. -->
Build ORT-training packaging pipeline for CUDA 12.2


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
This will help any customer using CUDA 12 and would not need to build
ORT-training from source

Test run:
https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=382993&view=logs&s=130be951-c2f3-5601-5709-434b5e50ddb0
2023-11-21 13:19:21 -08:00
Changming Sun
398ef677ba
Update protobuf python package's version (#18203)
1. Now we use a released version of ONNX, so we can directly download a
prebuilt package from pypi.org. We do not need to build one from source.
2. Update protobuf python package's version to match the C/C++ version
we are using.
3. Update tensorboard python python because the current one is
incompatible with the newer protobuf version.
2023-11-06 09:22:54 -08:00
Wei-Sheng Chin
b5a103ae16
Upgrade transformers to fix CI (#17823)
Python package pipeline fails due to "tokenizers" compilation. Since
"tokenizers" is a dep of "transformers", we update its version and hope
a new solution had been there.

```
error: casting `&T` to `&mut T` is undefined behavior, even if the reference is unused, consider instead using an `UnsafeCell`
--> tokenizers-lib/src/models/bpe/trainer.rs:517:47
```
2023-10-07 09:51:24 -07:00
Changming Sun
276e8733bd
Update onnx python package and setuptools (#17709)
### Description
A follow-up for #17125
2023-09-27 07:54:48 -07:00
pengwa
ac100ebb64
Fix orttraining-ortmodule-distributed CI (#16569)
### Fix orttraining-ortmodule-distributed CI

https://pypi.org/project/pydantic/#history released version 2.0 1st
July, Deepspeed has known issue on newer version of it
(https://github.com/microsoft/DeepSpeed/issues/3280). So fix this by add
similar check as DS did in
https://github.com/microsoft/DeepSpeed/pull/3290
2023-07-03 13:18:59 +08:00
Baiju Meswani
11b0a18de6
Add support for cuda 11.8 and python 3.11 for training (#15548) 2023-04-20 12:56:45 -07:00
Jian Chen
af28754e6f
Update python package pipeline to support 3.11 (#15311)
### Description
Update python package pipeline to support 3.11

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-04-04 10:55:32 -07:00
zhijiang
80e25ad6ac
fix cg issue (#14372)
### Description
tensorboard depends on rsa>=3.1.4, while rsa 4.5 has vuln issue, so pin
it to higher version as suggested

Fixed
[AB#7352](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/7352)



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-03-09 15:28:11 +08:00
Baiju Meswani
c6ff5bac9d
Update torch in eager mode CI pipeline (#14094) 2023-01-06 11:46:44 -08:00
Ashwini Khade
e5e3570ac5
fix cg issue (#14112)
### Description
Update torch version to 1.13.1 to fix CG issue:
https://dev.azure.com/aiinfra/ONNX%20Runtime/_workitems/edit/10666/
2023-01-04 09:07:13 -08:00
Baiju Meswani
0ff61f7b97
Update torch to 1.13.1 in CI and packaging pipelines for ort training (#14055) 2023-01-03 20:03:33 -08:00
Baiju Meswani
b85878953f
Fix nightly ort training ci pipeline (#14007) 2022-12-30 12:28:57 -08:00
shalvamist
d22be84add
Pin packaging to version 21.3 to address training pipeline failures 2022-12-09 09:05:55 -08:00
Edward Chen
9e65f3bfdb
Replace deprecated Python dependency sklearn with scikit-learn. (#13585) 2022-11-08 09:08:29 -08:00
Baiju Meswani
5182d6610d
Upgrade pytorch to 1.12.1 for training pipelines (#13128) 2022-09-28 17:59:49 -07:00
Prathik Rao
8ea742b507 downgrade setuptools 2022-09-19 12:39:35 -07:00
PeixuanZuo
19ca2a0089
[ADD] python package pipeline for ROCm5.2.3 (#12770)
* [TEST] test rocm5.2.3

[TEST] rm torchversion

[Update]sort

Co-authored-by: Ubuntu <peixuanzuo@peixuanzuomi200vm.zvflicr54joexhdgnhvmxrxygg.phxx.internal.cloudapp.net>
2022-08-30 11:05:59 +08:00
Adam Louly
ee543a47f6
upgrade cuda version on ci pipelines (training CI pipelines) (#12708)
* upgrade cuda version on ci pipelines

* keeping folder name same

* keeping folder name same

* setting manual seed for primitive test case

* resolving comments

* changing atol and rtrol only for test case

Co-authored-by: Adam Louly <adamlouly@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2022-08-26 16:51:19 -07:00
Adam Louly
3bb5fb0f90
moving training pipelines from cuda 11.5 to 11.6 and deprecating 11.3 (packaging pipeline) (#12688)
* moving training pipelines from cuda 11.5 to 11.6 and deprecating cuda 11.3

* change to cuda 11.6.2

* change pytorch's & torchvision's cuda version to 11.6

* specify deps version to 11.6.2

* update pytorch and torch text version

* torch 1.12.1

* change torchvision and torchtext version to be compatible with torch 1.12.1

* change cuda to 11.6 for cuda_home comaptibility

Co-authored-by: Adam Louly <adamlouly@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2022-08-25 22:12:01 -07:00
Adam Louly
94f76b944e
nightly pipeline build using PTCA image. (#12605)
* nightly pipeline yaml and requirements files

* changed names, removed torchvision installing

* delete old file

Co-authored-by: Adam Louly <adamlouly@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2022-08-24 10:40:55 -07:00
Changming Sun
b270334e1e
Update numpy version from 1.21.0 to 1.21.6 to avoid building it from source (#12644) 2022-08-18 22:11:48 -07:00
PeixuanZuo
3e1b0ac4b3
[DELETE] delete python package rocm4.3.1 (#12480)
[delete] delete rocm4.3.1
2022-08-05 13:27:42 +08:00
Jian Chen
7a7e372b9f
Remove training cuda 10.2 pipeline (#12347)
* update to 2022

* Update the VS version

* Rolling back to gcc 10

* Rolling back

* Update cuda home

* remove "CMAKE_CUDA_ARCHITECTURES=52"

* update cuda Architure to 70

* Delete cuda 10.2 training pipeline

* rolling back a mistake

* Update win-gpu-reduce-op-ci-pipeline.yml

* Update win-gpu-reduce-op-ci-pipeline.yml

* Update win-gpu-reduce-op-ci-pipeline.yml

* Delete tools/ci_build/github/linux/docker/scripts/training/ortmodule/stage1/requirements_torch1.10.0_cu10.2 directory

* Delete tools/ci_build/github/linux/docker/scripts/training/ortmodule/stage1/requirements_torch1.11.0_cu10.2 directory
2022-07-28 14:58:17 -04:00
msftlincoln
9cf6912bba
Fix ORT Eager Mode to work with Pytorch 1.12 (#12323) 2022-07-27 16:24:46 -04:00
pengwa
2b2367efbf
Fix orttraining-linux-gpu-ci-pipeline (fairscale dependency) (#12320)
authored by: @pengwa
2022-07-26 15:11:04 -07:00
LironKesem
9647a3be40
Add tests for all unary aten ops supported in eager mode (#12087)
* Add tests for all uniary aten ops supported in eager mode

* fixing the PR draft

* fixing the merge

* changing eval to be at compile time

* adding requirements for eager

* 1.adding function to {ops}_out
2.cleaning the code
  and adding comments

* editing the code according to code review

Co-authored-by: root <root@AHA-LIRONKESE-1>
2022-07-12 08:53:19 -04:00
PeixuanZuo
1c39d22f4e
[ADD] Rocm5.2 for Rocm python packaging pipeline (#12129)
[ADD] rocm5.2
2022-07-11 11:10:45 +08:00
Wil Brady
fdf12a5c35
Fix windows eager build break by pinning to torch version 1.11.0 (#12033)
Fix windows and linux eager build to torch 1.11.0.
2022-06-30 07:01:13 -04:00
PeixuanZuo
c556f5f22f
Add AMD python package ROCm5.1.1+torch1.11 (#11516)
* [FIX] fix name error

* [ADD] add rocm5.1.1 python package

* [ADD] torch1.10.0 rocm requirements

* [UPDATE] update docker Repository name
2022-05-16 08:14:11 +08:00
Justin Chu
a1f9847b23
[Fix] Add the extra param to match gelu in PyTorch in the contrib symbolic function (#11318)
Description:

Add the extra param to match gelu in PyTorch in the contrib symbolic function

Motivation and Context

Why is this change required? What problem does it solve?
The symbolic function in /onnxruntime/python/tools/pytorch_export_contrib_ops.py is missing a recently added parameter approximate. We add this parameter and use the exporter defined gelu if approximate is "tanh".
2022-05-04 10:36:38 -07:00
ytaous
eec5187801
Remove Rocm 4.2 from CI Build (#11130)
* remove rocm42 CI

* update torch to v1.11.0

Co-authored-by: Ethan Tao <ettao@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2022-04-07 11:42:09 -07:00
Baiju Meswani
f9940f17b1
Remove extra-index-url to avoid nuget security analysis vulnerability (#11082) 2022-04-01 18:30:55 -07:00
Baiju Meswani
249c4dec7f
Update orttraining release pipelines to use torch 1.11.0 (#11018)
* Update orttraining release pipelines to use torch 1.11.0

* Change requirements_torch...txt to requirements.txt

* Update cuda cmake architectures and clean up old files
2022-03-31 21:51:06 -07:00
dependabot[bot]
79e4ed8064 Bump pytorch-lightning
Bumps [pytorch-lightning](https://github.com/PyTorchLightning/pytorch-lightning) from 1.5.10 to 1.6.0.
- [Release notes](https://github.com/PyTorchLightning/pytorch-lightning/releases)
- [Changelog](https://github.com/PyTorchLightning/pytorch-lightning/blob/master/CHANGELOG.md)
- [Commits](https://github.com/PyTorchLightning/pytorch-lightning/compare/1.5.10...1.6.0)

---
updated-dependencies:
- dependency-name: pytorch-lightning
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-03-31 16:51:24 -07:00
raviskolli
480c793125
Update training packages to Pytorch 1.11.0 (#10851)
* Update ortmodule training packages to Pytorch 1.11.0

Co-authored-by: Harshitha Venkata <havenka@microsoft.com>
Co-authored-by: Baiju Meswani <bmeswani@microsoft.com>
2022-03-22 16:45:51 -07:00
Changming Sun
6260733533
Fix eager mode pipeline (#10802)
It was still using python 3.6
2022-03-08 09:26:20 -08:00
dependabot[bot]
e3c85d4262 Bump numpy
Bumps [numpy](https://github.com/numpy/numpy) from 1.19.5 to 1.21.0.
- [Release notes](https://github.com/numpy/numpy/releases)
- [Changelog](https://github.com/numpy/numpy/blob/main/doc/HOWTO_RELEASE.rst.txt)
- [Commits](https://github.com/numpy/numpy/compare/v1.19.5...v1.21.0)

---
updated-dependencies:
- dependency-name: numpy
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-03-04 09:51:32 -08:00
dependabot[bot]
b780a3784e Bump numpy in /tools/ci_build/github/linux/docker/scripts/training
Bumps [numpy](https://github.com/numpy/numpy) from 1.19.5 to 1.21.0.
- [Release notes](https://github.com/numpy/numpy/releases)
- [Changelog](https://github.com/numpy/numpy/blob/main/doc/HOWTO_RELEASE.rst.txt)
- [Commits](https://github.com/numpy/numpy/compare/v1.19.5...v1.21.0)

---
updated-dependencies:
- dependency-name: numpy
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-03-04 09:38:38 -08:00
dependabot[bot]
0b0e8ccf92 Bump numpy
Bumps [numpy](https://github.com/numpy/numpy) from 1.19.5 to 1.21.0.
- [Release notes](https://github.com/numpy/numpy/releases)
- [Changelog](https://github.com/numpy/numpy/blob/main/doc/HOWTO_RELEASE.rst.txt)
- [Commits](https://github.com/numpy/numpy/compare/v1.19.5...v1.21.0)

---
updated-dependencies:
- dependency-name: numpy
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-03-04 09:34:58 -08:00
Changming Sun
feae842a7c
Update pytorch-lightning (#10421) 2022-01-27 21:15:00 -08:00
Thiago Crepaldi
6a7d3deb22
Update pytorch-lightning (#10276) 2022-01-14 16:49:10 -05:00
Abhishek Jindal
d5742f3a43
moving from torch nightly build to stable build (#10150)
* moving from torch nightly build to stable build

* using torch cpu version

* using torch cpu version from link
2021-12-29 19:35:10 -08:00
Suffian Khan
7e55a942cd
Add torch 1.10 requirements for rocm (#10028) 2021-12-13 20:39:58 -08:00
Xavier Dupré
42c176b60c
Update default opset to 14 in ORTModule (#9743)
* update to torch 1.10
* update torchvision version
* update torchtext version
* remove deprecated option enable_onnx_checker
* add unit test to test gradient of GatherElements
* add ORTMODULE_ONNX_OPSET_VERSION in a docker file
2021-12-09 12:45:35 +01:00
Tang, Cheng
8db49e3d0f
add ortmodule and eager mode test (#9888)
* add ortmodule and eager mode test

* add ortmodule dependency

* fix eager pipeline

* skip tthe ortmodule test for windows due to win ci issue

* remove useless win ci change

* add torch

Co-authored-by: Abhishek Jindal <abjindal@microsoft.com>
2021-12-02 19:49:18 -08:00
raviskolli
9f4e8cf6a0
Update training pipelines to pytorch 1.10 (#9709)
* Update training pipelines to pytorch 1.10

* Fixed a typo in cuda version.

* Downgraded gcc to 8 for cuda 10.2
2021-11-15 11:21:55 -08:00