Commit graph

1164 commits

Author SHA1 Message Date
RandySheriffH
d3b684cd9e
Drop nuphar (#11555)
* drop nuphar code and configs

* refactor test case

* format python

* remove nuphar from training test

* remove commented nuphar logics

* restore llvm setting

* drop nuphar ci

* fix compile err

* fix compile err

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2022-09-07 15:11:18 -07:00
Yi Zhang
c571b99336
Refactor setup_test_data (#12818)
* refactory setup_test_data

* mv setup test data to test stage

* model link for C# test

* add comment
2022-09-07 08:33:27 +08:00
Baiju Meswani
295bd26980
Remove orttraining-distributed CI pipeline (#12738) 2022-09-02 14:34:26 -07:00
PeixuanZuo
adbc0757ad
[UPDATE] update ROCm ci pipeline to ROCm5.2.3 (#12799)
* [Update] update to rocm5.2.3

* [Fix] cmake version

* [Fix] disbale ortmodule tests

* [revert] revert performance number
2022-09-01 10:32:24 +08:00
Baiju Meswani
a52543ecd8
Generate windows training package (#12789) 2022-08-30 16:35:50 -07:00
Yulong Wang
82a28cc2c3
upgrade emsdk to 3.1.19 (#12690)
* upgrade emsdk to 3.1.19

* fix build break

* ignore '-Wunused-but-set-variable' in eigen

* add malloc and free in exported functions

* EXPORTED_FUNCTIONS
2022-08-30 13:42:45 -07:00
Yi Zhang
b4f6dad7c9
increase timeout limit of mac silicon package workflow (#12784)
increase timeout
2022-08-30 13:57:01 +08:00
PeixuanZuo
19ca2a0089
[ADD] python package pipeline for ROCm5.2.3 (#12770)
* [TEST] test rocm5.2.3

[TEST] rm torchversion

[Update]sort

Co-authored-by: Ubuntu <peixuanzuo@peixuanzuomi200vm.zvflicr54joexhdgnhvmxrxygg.phxx.internal.cloudapp.net>
2022-08-30 11:05:59 +08:00
Edward Chen
1ce14e752b
Increase timeout for clean-build-docker-image-cache-pipeline. (#12776) 2022-08-29 15:30:35 -07:00
Baiju Meswani
80c8d934b8
Add debug option to packaging pipeline (#12685) 2022-08-26 20:25:52 -07:00
Adam Louly
ee543a47f6
upgrade cuda version on ci pipelines (training CI pipelines) (#12708)
* upgrade cuda version on ci pipelines

* keeping folder name same

* keeping folder name same

* setting manual seed for primitive test case

* resolving comments

* changing atol and rtrol only for test case

Co-authored-by: Adam Louly <adamlouly@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2022-08-26 16:51:19 -07:00
Baiju Meswani
34d90dd5bd
mac-objc-static-analysis-ci-pipeline increase timeout (#12737) 2022-08-26 12:49:49 -07:00
Adam Louly
3bb5fb0f90
moving training pipelines from cuda 11.5 to 11.6 and deprecating 11.3 (packaging pipeline) (#12688)
* moving training pipelines from cuda 11.5 to 11.6 and deprecating cuda 11.3

* change to cuda 11.6.2

* change pytorch's & torchvision's cuda version to 11.6

* specify deps version to 11.6.2

* update pytorch and torch text version

* torch 1.12.1

* change torchvision and torchtext version to be compatible with torch 1.12.1

* change cuda to 11.6 for cuda_home comaptibility

Co-authored-by: Adam Louly <adamlouly@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2022-08-25 22:12:01 -07:00
Scott McKay
8483b9c6e3
MacOS pipeline and MAUI CoreML fixes (#12724)
* Add asm statement to model.mm to force linker to link against CoreML.Framework.

Update targets.xml as per Rolf's suggestions

* Remove explicit numpy version from macos build. We don't specify it for other CIs and the version specified doesn't have a pre-built 3.10 wheel. This leads to the CI attempting to build numpy which fails.
2022-08-26 08:51:37 +10:00
Cassie Breviu
e85dce8cea
Add csharp docfx (#12596)
* add docfx and gh action to build docs

* kick off build from feature branch

* Fix LGTM linting

* update az pipeline to win22 & remove nuget install

* remove azure ci changes

* fix implicit using to support 5.0

* fix more js issues

* remove resource designer changes

* remove space

* fix linting misspellings in autogenerated js temp

* fix misspellings in generated code

* delete log file
2022-08-25 09:51:32 -05:00
Yi Zhang
dee2fdffb0
Remove debug build/test in Mac CPU training (#12698)
* run mac training parallely

* update jobname

* remove debug build/test
2022-08-25 13:38:53 +08:00
Yi Zhang
d91f017da1
remove redundant publish unit test results (#12697)
rm redundant publish unit test results
2022-08-25 11:18:07 +08:00
Cheng
eba4f77d00
enable xnnpack in default_full_aar_build_settings (#12682) 2022-08-25 10:41:06 +08:00
Changming Sun
7927d525a7
Remove CUDNN path from CI build scripts (#12671) 2022-08-24 18:21:50 -07:00
Adam Louly
94f76b944e
nightly pipeline build using PTCA image. (#12605)
* nightly pipeline yaml and requirements files

* changed names, removed torchvision installing

* delete old file

Co-authored-by: Adam Louly <adamlouly@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2022-08-24 10:40:55 -07:00
Changming Sun
cb2601c5ea
Update mac-ci.yml to increase macOS build jobs' timeout value to 3 hours (#12675) 2022-08-22 21:31:30 -07:00
Wei-Sheng Chin
dc486d146b
Make ORT callable from various Pytorch compilers (LazyTensor, TorchDynamo, etc) (#10460)
* Make ORT as Pytorch JIT backend

LORT likely doesn't work with aten fallback so we only test LORT in its own CI.

* Revert changes to enable external CUDA allocator. Will add it later.

Revert "Revert changes to enable external CUDA allocator. Will add it later."

This reverts commit d5487f2e193014c805505afae8fb577c53667658.

Fix external allocator

* Relax tolerance and remove commented code

* Print more information in CI

* Fix pointer

* Address comments.
1. Reuse ORT-eager mode's environment.
2. Remove unused ctor.

* Use Pytorch master branch as all PRs are merged

Fix

* Refine based on cpplint feedbacks

* Revert changes to allow custom CUDA allocator in public APIs

* Use torch.testing.assert_close

* Use unittest framework

* Switch docker repo

* Rename *.cpp to *.cc

* Address comments

* Add comment

* Use same pipeline file for eager and lort pipelines

* Address comments

* Add yaml comment

* Fix cmake files

* Address comments

* Rename flags, remove printing code, remove dead comment
2022-08-22 09:40:40 -07:00
Changming Sun
b270334e1e
Update numpy version from 1.21.0 to 1.21.6 to avoid building it from source (#12644) 2022-08-18 22:11:48 -07:00
Changming Sun
ac7538b909
Remove CUDA 10.2 support (#12541) 2022-08-10 22:46:41 -07:00
Baiju Meswani
3e78f3cf1f
Add win-ci pipeline for on-device training (#12513) 2022-08-10 14:45:39 -07:00
Changming Sun
c0d396d176
Restrict "Component Detection" task to Lotus project only (#12536)
It is related to PR #12426
2022-08-10 03:25:29 -07:00
Changming Sun
e810480403
Replace the occurrences of "master" to "main" in yaml files (#12534) 2022-08-09 22:03:21 -07:00
Vincent Wang
e85e31ee80
Update ORTModule Default Opset Version to 15 (#12419)
* update ortmodule opset to 15

* update torch version

* fix ut

* fix ut

* rollback

* rollback for orttrainer
2022-08-05 16:55:04 +08:00
PeixuanZuo
3e1b0ac4b3
[DELETE] delete python package rocm4.3.1 (#12480)
[delete] delete rocm4.3.1
2022-08-05 13:27:42 +08:00
Changming Sun
5d610bc8eb
Disable CG task in PR pipelines (#12426) 2022-08-02 19:01:41 -07:00
Yulong Wang
feed5da435
[js] loosen test timeout (#12427)
Losen the following test timeout:

1. "Test Web Multi-Browsers" stage in "ONNX Runtime Web CI Pipeline": 30min -> 60min
2. Node.js binding default per-case timeout: 30 sec -> 90 sec
2022-08-02 19:01:19 -07:00
Changming Sun
1a64b94f60
Fix a small issue in nuget packaging pipeline (#12405)
In #12358 I typed a wrong path in the yaml file.
2022-08-02 15:44:43 -07:00
Yi Zhang
5d1173fe68
Run IOS pipeline concurrently (#12400)
split ios pipelines
2022-08-02 11:07:17 +08:00
Yi Zhang
63d64636f6
Add the comment linking to wiki (#12398)
add the comment
2022-08-02 10:09:16 +08:00
Yi Zhang
8b4ad77ea2
pipeline can use last run's artifacts (#12379)
* first step

* depends on stage

* temp change

* specific

* runId

* parameters

* fix typo

* fix typo

* add nnapi

* add nnapi

* fix typo

* minor fix

* condition on stage

* format

* format
2022-07-30 21:34:57 +08:00
Changming Sun
7b4ce0c1e1
Delete the build scripts that were copied from manylinux project (#12358)
1. Delete the build scripts that were copied from manylinux project. Use "git checkout" instead.
2. Update manylinux version to get python 3.11. Related issue: Python 3.11 support #12343
3. Change the cuda version of linux gpu build job of nuget packaging pipeline from cuda 11.4 to cuda 11.6 to match the TRT job within the same pipeline.. (A lot other places need be updated as well, but I'd prefer to put them in another PR)
4. Make dockerfile names static. For example, replace tools/ci_build/github/linux/docker/$(DockerFile) to tools/ci_build/github/linux/docker/Dockerfile.manylinux2014_cpu . The former one relies on a runtime variable $(DockerFile), Template Parameters are expanded early in processing a pipeline run when most variables are not available. It like C++ macros vs variables.
2022-07-29 18:24:19 -07:00
Jian Chen
7a7e372b9f
Remove training cuda 10.2 pipeline (#12347)
* update to 2022

* Update the VS version

* Rolling back to gcc 10

* Rolling back

* Update cuda home

* remove "CMAKE_CUDA_ARCHITECTURES=52"

* update cuda Architure to 70

* Delete cuda 10.2 training pipeline

* rolling back a mistake

* Update win-gpu-reduce-op-ci-pipeline.yml

* Update win-gpu-reduce-op-ci-pipeline.yml

* Update win-gpu-reduce-op-ci-pipeline.yml

* Delete tools/ci_build/github/linux/docker/scripts/training/ortmodule/stage1/requirements_torch1.10.0_cu10.2 directory

* Delete tools/ci_build/github/linux/docker/scripts/training/ortmodule/stage1/requirements_torch1.11.0_cu10.2 directory
2022-07-28 14:58:17 -04:00
Edward Chen
6e892a95b4
Use specific Android NDK version in CI builds. (#12350)
Current builds use a NDK version that happens to be on the build machine. The build machine environment may change in ways that are outside of our control.
This change installs a specific version of NDK (the current LTS version 25.0.8775105) and uses it.
2022-07-28 11:01:04 -07:00
Changming Sun
e6bb447101
Change native folder name for java macos arm64 (#12335) 2022-07-27 15:13:07 -07:00
msftlincoln
9cf6912bba
Fix ORT Eager Mode to work with Pytorch 1.12 (#12323) 2022-07-27 16:24:46 -04:00
Yi Zhang
4df4471d5e
add missing build_java in Android testing stage. (#12187)
add missing build_java in testing
2022-07-27 14:13:08 +08:00
pengwa
2b2367efbf
Fix orttraining-linux-gpu-ci-pipeline (fairscale dependency) (#12320)
authored by: @pengwa
2022-07-26 15:11:04 -07:00
Baiju Meswani
ddb45e9126
On device training CI pipeline (#11987) 2022-07-25 10:07:17 -07:00
Rachel Guo
496618594f
Update supported ops md for NNAPI/CoreML EP (#12245)
* update supported ops md

* address pr comments

* address pr comments

* wording
2022-07-21 10:23:08 -07:00
Jian Chen
43e1e89453
Update aarch64 building pool to aiinfra-linux-ARM64-CPU-2019 (#12243)
* Setting new pool for arm64

* Setting defualt pool name

* adding DockerInstaller stage

* try to install docker from apt-get

* change to specific

* adding chmod to docker.sock

* install dotnet sdk

* specic dotnet 3.1.x

* add manuall step to install dotnet

* typo bass

* remove inputs

* change dotnet installation dir

* skipComponentGovernanceDetection on arm64 linux

* variables typo

* variables:
    - name: skipComponentGovernanceDetection
      value: true

* update variables

* skipComponentGovernanceDetection set to true

* moving varliables

* moving the variables again

* setting condition on cgd

* indentation

* indentation again

* conditional variable

* if

* remove cgd

* conditionl on cgd

* condition

* parameters

* clean up
2022-07-20 12:08:02 -04:00
mindest
add631410a
[ROCm] Re-enable ReduceL1, L2 and related tests (#12209)
Re-enable ReduceL1,L2 and related tests
2022-07-20 13:13:02 +08:00
leqiao-1
09af4a7fdd
remove wrong placed libs (#12201) 2022-07-18 09:22:22 -07:00
PeixuanZuo
7b53b223b8
[UPDATE] update AMD CI pipeline to Rocm5.2 with torch1.11 (#12162)
* [UPDATE] update ci to rocm5.2 + torch1.11

* [Revert] disable ort module test

* [DELETE] delete Rocm5.1.1 ci test result

* [UPDATE] update the comments
2022-07-14 16:38:16 +08:00
Edward Chen
6e051016c1
Add Python package to perf test pipeline. (#12135) 2022-07-12 10:50:24 -07:00
LironKesem
9647a3be40
Add tests for all unary aten ops supported in eager mode (#12087)
* Add tests for all uniary aten ops supported in eager mode

* fixing the PR draft

* fixing the merge

* changing eval to be at compile time

* adding requirements for eager

* 1.adding function to {ops}_out
2.cleaning the code
  and adding comments

* editing the code according to code review

Co-authored-by: root <root@AHA-LIRONKESE-1>
2022-07-12 08:53:19 -04:00