Commit graph

287 commits

Author SHA1 Message Date
Ryan Hill
9ef92f352f Test Java pipeline fix 2021-05-19 16:33:55 -07:00
Changming Sun
38d90b0f15
Cleanup install_deps.sh (#7734) 2021-05-17 19:27:47 -07:00
liqunfu
d604281a86
Liqun/training pkg to run tests (#7662) 2021-05-16 09:10:57 -07:00
liqunfu
3ead2f2f39
update pt lightning version (#7711)
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-05-15 21:46:16 -07:00
liqunfu
359fe1d197
Liqun/ort training version (#7620) 2021-05-14 09:54:19 -07:00
ashbhandare
56e993a434
Bump to rel-1.9.1 (#7684) 2021-05-13 18:41:28 -07:00
Hariharan Seshadri
4b691a5c0d
Add ability for memory arenas to "shrink" periodically (#7284) 2021-05-08 07:53:21 -07:00
Changming Sun
41e370c2b3
Update protobuf to 3.16 (#7616) 2021-05-07 14:09:23 -07:00
baijumeswani
f3a70f1aec
Ignore invalid input argument to install_os_deps.sh (#7566) 2021-05-05 14:33:31 -07:00
Changming Sun
a284eede64
Fix Linux CPU pipeline (#7584) 2021-05-05 13:26:10 -07:00
Scott McKay
594dde2647
Validate that the conversion script from the python package can be used to convert models. (#7517) 2021-05-04 16:25:04 +10:00
George Wu
faea7a222d
linux trt package pipeline (#7537) 2021-05-03 19:14:20 -07:00
baijumeswani
cab84d902e
Install and use conda on ortmodule CI pipelines (#7530)
* Install and use conda on ortmodule CI pipelines

* Update build script to install onnxruntime wheel before running unit tests

* Remove python 3.5 from install_python_deps

* Pinning deepspeed version to 0.3.15
2021-05-03 15:52:22 -07:00
liqunfu
196e6702ad
to support multiple cuda versions in published onnxruntime-training package (#7468)
to support multiple CUDA versions in published onnxruntime-training package
2021-04-27 17:15:33 -07:00
Suffian Khan
7a3c1787af
Add CI pipeline to publish Python training package targeting Rocm (#7417)
* first attempt rocm training wheel

* modifications needed to python packaging pipeline for Rocm 4.1

* changges to not conflict with cuda

missed stage1 changes

remove package push

add option r to getopt

try again without python install

try again without python install

try again without python install

split pipelines and add back push to remote storage

try on cuda gpu pool

try again

try again

try running without az subscription set

try again on original pipeline

change pool

passing AMD Rocm whl on AMD-GPU pool

split rocm pipeline from cuda pipeline

remove comments

* try adding Rocm tests as well

* try with tests in place

* fix trailing ws

* add training data

* try again as root for tests

* use python3

* typo

* try to map video, render group into container

* try again

* try again

* try to avoid yum error code

* make UID 1001

* try without yum downgrade

* define rocm_version=None

* remove CUDA related comments for Rocm Dockerfile

* Dont pin nightly torch torchvision torchtext versions as they expire (for now nightly is required for Rocm 4.1)

* missed requirements-rocm.txt from last commit

* fix whitespace
2021-04-23 17:22:31 -07:00
Ashwini Khade
75e054cd33
pick onnx release candidate (#7177)
* pick onnx release candidate

* fix typo

* filter batchnorm tests

* add implementation for reshape 14

* add identity op kernel for opset 14

* fix typo

* update onnx commit

* update commit to latest master

* add hashes for new kernel registrations and update 1

* TEST commit

* update onnx back to right commit

* Update onnx to latest in rel-1.9.0

* temp fix

* remove nonzeroshapesetter transformer

* pick rel branch latest commit

* fix build failures

* fix build failures

* fix build failures

* update the commit to latest in release branch

* add test filters for not impemented op14 ops in c# tests

* plus review comments
2021-04-22 23:57:09 -07:00
Changming Sun
6822ae95ec
Reduce the number of TensorRT tests needed to run (#7419) 2021-04-22 19:14:39 -07:00
Changming Sun
65b2b87f83
Update CI build docker images (#7386)
Update CI build docker images: delete ubuntu 16.04 support.
2021-04-21 13:18:34 -07:00
Changming Sun
b4cfa88bf7
Update protobuf to the latest version (#7396) 2021-04-21 10:30:06 -07:00
Guoyu Wang
fce67e2b9b
Create Android Package pipeline (#7295)
* Create Android Package pipeline

* adress CR comments

* Switch to jdk11
2021-04-12 17:56:25 -07:00
Edward Chen
0ebeaf529d
Check kernel def hashes (#7120)
Add unit test for verifying kernel def hashes.
Add way to add new types to kernel definition without changing hash.
2021-04-01 17:42:58 -07:00
sfatimar
52bcef4d4f
Openvino ep 2021.3 (#7180)
* Integrate openvino-ep-2021.3

* operators type

* changed the myriad as it is case sensitive

* logging information for openvino-ep-2021.3

* Unit test fix

* Resize operator added for myriad

* Fixed python tests for CPU and GPU

* data commit for loop tile and gatherelements failure

* adding checks for Where

* fixing gatherelements and loop tests

* disabling instance normalization test for now as there seems to be a
myriad bug, putting loop in ops supported only because all the tests
fail

* gather elements op test taking care of warning message

* condition needs to be an intializers

* Disabled python test for Myriad

* Disable compilation warning for MSVC windows compiler

* softmax_test, threedimaxis0 and 1 test give accuracy mismatch
tensoroptest disables test gives accuracy mismatch
gather test gives accuracy mismatch

* Updated with ov version 2021.3

* Updated with ov version 2021.3

* Updated README

* Disabling python tests for cpu

* Disabling python tests with accuracy mismatch on cpu

* Added fix for Linux CI Pipeline failure

-> Disabled tests that were throwing segfault

Co-authored-by: sfatimar <sahar.fatima@intel/com>
Co-authored-by: MaajidKhan <n.maajidkhan@gmail.com>
Co-authored-by: Aravind <aravindx.gunda@intel.com>
2021-04-01 11:28:54 -07:00
baijumeswani
249a2c14ef
Pin version of pytorch to 1.8.1 for ORTModule CI pipeline (#7167)
* Pin version of pytorch to 1.8.1 for ORTModule CI pipeline
* Use pytorch-lightning stable version 1.2.5
* Revert to cuda 10.1
2021-04-01 09:37:47 -07:00
liqunfu
e545604499
. (#7165) 2021-03-30 13:58:30 -07:00
Ashwini Khade
b22e60bd44
pull onnx latest commit (#7102)
* update onnx commit

* fix test scripts to remove deprecated call

* update filters

* add registration for relu and cumsum ver 14

* add promote trilu to onnx domain

* update onnx-tensorrt submodule

* update flag

* update flag

* update dependencies

* fix android ci failure
2021-03-29 11:00:38 -07:00
harshithapv
540eac253e
Deepspeed pipeline parallel and fairscale sharded optimizer test samples with ORTModule (#7078)
* adding samples for Deepspeed pipeline parallel and fairscale sharded optimizer with ortmodule

* fixed typo in args

* addressed Thiago's comments

* Update orttraining/orttraining/test/python/orttraining_test_ortmodule_deepspeed_pipeline_parallel.py

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
2021-03-24 09:43:05 -07:00
baijumeswani
a7a2a16edd
Pass arguments to azure_scale_set_vm_mount_test_data from perf test ci pipeline (#7094) 2021-03-22 21:48:32 -07:00
Thiago Crepaldi
867804bea1
Add auto doc gen for ORTModule API during CI build (#7046)
In addition to ORTModule auto documentation during packaging, this PR also update golden numbers to fix CI
2021-03-22 10:20:33 -07:00
Changming Sun
701e73b5b8
Move Linux minimal build CI pipeline to the new Linux machine pool (#7050) 2021-03-18 12:09:12 -07:00
Thiago Crepaldi
335edaa2c4
Merge pull request #6973 from microsoft/thiagofc/merge-ortmodule-into-master
Introduce ORTModule training API to ONNX Runtime
2021-03-17 10:30:06 -07:00
Changming Sun
ed2d441a2e
Update ORT server build pipeline (#7030)
1. Migrated it to Ed's new docker build script
2. Use python 3.6 instead, because it is the default one in ubuntu 18.04
3. Move the "pip install" command to the docker image build stage(instead of when running the image)
2021-03-16 18:02:09 -07:00
Changming Sun
4161758058
Remove openmp related packaging pipeline (#6991)
1. Remove openmp related packaging pipelines and build jobs.
2. Set continueOnError to true for the TSAUpload tasks. Their service is unstable recently.
3. Update Ubuntu 16 docker images to Ubuntu 18, in prepare for getting C++17 support
4. Cherry-pick the changes in 1.7.1 to the master: updating CFLAGS/CXXFLAGS to strip out debug symbols
2021-03-12 10:02:59 -08:00
baijumeswani
79f832c682
Separate requirements.txt file for ORTModule pipelines (#6879)
* Move all ORTModule dependency installations to ortmodule subfolder
2021-03-05 14:12:11 -08:00
Baiju Meswani
d5667554e6 Merge branch 'master' of github.com:microsoft/onnxruntime into bmeswani/merge_master_onto_ortmodule 2021-03-03 20:37:29 -08:00
Guoyu Wang
36a44d55ed
Only report Android Baseline binary size for master branch (#6844)
* Only report binary size from master

* update script

* Correct the typo
2021-03-01 15:57:18 -08:00
Sherlock
12edf22f11
Merge pull request #6838 from microsoft/mzs/ortmodule-api-sync-from-master-210226
Sync from master
2021-02-27 12:32:36 -08:00
Thiago Crepaldi
f71d93ea2b
Enable PyTorch Lightning basic test on CI (#6809) 2021-02-27 09:35:42 -08:00
M. Zeeshan Siddiqui
ca48310d6d Merge branch 'master' of https://github.com/microsoft/onnxruntime into mzs/ortmodule-api-sync-from-master-210226 2021-02-27 04:25:23 +00:00
Maajid khan
7465673e33
[OpenVINO-EP] Find package changes (#6801)
* Find package changes to cmake

* Removing unwanted code from cmake

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

Co-authored-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
2021-02-25 05:12:57 -08:00
Edward Chen
5db0c9c648
Enable CI to cover globally allowed types (#6778)
Add test to CI build to cover type reduction with globally allowed types.
2021-02-23 10:24:12 -08:00
stevenlix
53eb948f4c
Upgrade TensorRT to v7.2.2 (#6452)
* upgrade to TensorRT 7.2.2

* extend GPU tensorrt CI timeout to 150 minutes

* update docker image name

* disable user interaction to avoid tensorrt container stuck when install tzdata

* upgrade to libssl1.1 for ubuntu20.04

* remove libicu60 from ubuntu20.04

* add libicu66 for ubuntu20.04

* debug

* llvm

* llvm

* disable ReverseSequenceTest.InvalidInput

* disable ReverseSequenceTest.InvalidInput

* fix issues

* fix issues

* Update linux-gpu-tensorrt-ci-pipeline.yml

* disable warning 4458 for TensorRT parser

* update onnx-tensorrt submodule

* disable warnings for TensorRT parser

* update onnx-tensorrt submodule to include latest bug fixes

* update setup_env_trt

* update pool for win trt ci pipeline'

Co-authored-by: George Wu <jywu@microsoft.com>
2021-02-18 04:30:47 -08:00
M. Zeeshan Siddiqui
40dda452cf Merge branch 'master' of https://github.com/microsoft/onnxruntime into mzs/sync-from-master 2021-02-18 03:03:01 +00:00
liqunfu
dd8ef4409a
Liqun/migrate perf test (#6733)
move ort training perf tests to azure devops
2021-02-17 17:48:47 -08:00
Thiago Crepaldi
3184c47ad1 Merge branch 'master' into thiagofc/merge-from-master 2021-02-17 11:49:52 -08:00
baijumeswani
01dfa8e125
Support non tuple return values from torch.nn.module (#6660)
* Support dictionary, namedtuples and huffingface ModelOutput type for model return values
2021-02-16 20:48:32 -08:00
Scott McKay
02c7873b0e
Update ORT model conversion script to support custom ops (#6701)
* Add support for custom ops library to the ORT model conversion script
Simplify model conversion now that we read ops from the ORT format model.
Enable custom ops in the python bindings if custom ops are turned on in a minimal build.
* Add test of model conversion involving custom ops.
2021-02-17 12:52:39 +10:00
Scott McKay
25f7c93504
Require explicit inclusion of custom op support in a minimal build (#6663)
* Remove support from custom ops from the base minimal build as they contribute too much binary growth to an Android build.
Add ability to explicitly enable custom op support in a minimal build.
Change one minimal build CI to test adding custom op support (unit tests are run in that build to validate)
2021-02-13 12:42:33 +10:00
Changming Sun
8378a45ae7
Add python 3.8/3.9 support for Windows GPU and Linux ARM64 (#6615)
Add python 3.8/3.9 support for Windows GPU and Linux ARM64

Delete jemalloc from cgmanifest.json.

Add onnx node test to Nuphar pipeline.

Change $ANDROID_HOME/ndk-bundle to $ANDROID_NDK_HOME. The later one is more accurate.

Delete Java GPU packaging pipeline

Remove test data download step in Nuget Mac OS pipeline. Because these machines are out of control and out of our network, it's hard to make it reliable and the data secure.

Fix a doc problem in c-api-artifacts-package-and-publish-steps-windows.yml. It shouldn't copy C_API.md, because the file has been moved into a different branch.

Delete the CI build docker file for Ubuntu cuda 9.x and Ubuntu x86 32 bits

And, due to some internal restrictions, I need to rename some of the agent pools
2021-02-11 16:43:35 -08:00
Edward Chen
e59cb9455e
Add CI build with type reduction enabled (#6622) 2021-02-10 13:31:51 -08:00
Changming Sun
0b89f931d0
Update CUDA build configs (#6598)
1. Fix Nuget package build break caused by #6225
2. Delete Dockerfile.centos_gpu. It is not used anywhere.
3. Fix Linux CUDA 10.2 build error caused by glibc upgrade
2021-02-08 22:55:42 -08:00