Commit graph

512 commits

Author SHA1 Message Date
sfatimar
6d2a30eae3
[OPENVINO-EP] 2021.1 Release (#5431)
* Cmake changes for 2021.1

* added new ov version 2020.1 for faster rcnn

* Added missing defs

* equal op modified

* changes to incorporate faster rcnn

* backend util.cc

* hddl_plugin_config.hpp is deprecated; use hddl_config.hpp instead

* changing myriad precision bool to i32

* gather is not enabled for gpu

* conv2D and pooltest auto_pad attribute should not be null

* negative indices are not valid for scatter op in myriad

* non max suppression op only supported in faster rcnn mode

* maxpool indices output is not supported

* Cleaned redundant code in backends

* Added ifdefs for HDDL config

* cast output dimensions check
topk operator k input it seems only resolved for myriad as it is
throwing issues for ask rcnn . need to verify

* we are limiting the subgraph size to 3 here

* taking care of review comments

* Fixed minor bugs

* Modified Slice op checks
* Added NonZero, Upsample
* Removed TopK if it's in the middle of a subgraph

* incorporated upsample conditions too

* Dockerfile changes for 2021.1 release

* dockerfile aptkey update

* Minor fixes

* ceil condition added again

* Fixed few gpu models

* Disabled LSTM and yolov3 in ModelTests

* python softmax cross entropy tests and negative log likelihood

* Update Build.md

Updated for openvino 2021.1

* Update OpenVINO-ExecutionProvider.md

update openvino execution provider for 2021.1

* Update READMe.md

updated new openvino version

* Update Dockerfile.openvino 

added environment variable for DEBIAN Frontend

* Fixed myriad models

* Fixed gather condition
* Fixed mask rcnn model on myriad

* Modified Gather condition

* set default target of MCR dockerfile to MYRIAD_FP16

* Fixed tinyolov3 on CPU

* Update OpenVINO-ExecutionProvider.md

update openvino execution provider documentation

* Update Dockerfile.openvino

Removed environment variable

* Update OpenVINO-ExecutionProvider.md

update image manipulation networks supported

* Update onnx_backend_test_series_filters.jsonc

removed test_upsample_nearest from cpu test cases

* New InternalCI changes for 2021.1

* Full protobuf removed for OpenVINO

* Protobuf added

* Updated with apt installation for openvino

* Revert the testing changes

* Reverted testing changes

* File permissions are changed to original

* Deleted openvino installation and cmake change

* Optimized Dockerfile

Removed unnecessary cmake installation, numpy

* Added missing ifdefs

* delete array fix

* backend_utils.cc output_shape

* Revert "set default target of MCR dockerfile to MYRIAD_FP16"

This reverts commit 928d3e2b71e2f589cf51dacd3a133951cf9ca18d.

Co-authored-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
Co-authored-by: sfatimar <sahar.fatima@intel/com>
Co-authored-by: suryasidd <48925384+suryasidd@users.noreply.github.com>
Co-authored-by: S. Manohar Karlapalem <manohar.karlapalem@intel.com>
Co-authored-by: Aravind <aravindx.gunda@intel.com>
Co-authored-by: Aravind Gunda <38353114+gundaarx@users.noreply.github.com>
2020-10-14 15:56:00 -07:00
Pranav Sharma
c2c78399ee
Include config keys header file in the release packages for Linux and Mac. (#5388) 2020-10-08 15:00:29 -07:00
Changming Sun
09aef240d6
Skip running onnx tests in python mac os pipeline (#5416) 2020-10-08 11:49:28 -07:00
liqunfu
773992c7d4
Liqun/bert pretrain tb (#5377)
* add tensor board, remove torch.distributed.launch because ort nccl depends on MPI. Use MPI to launch parallel training.

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-10-06 16:28:31 -07:00
Wenbing Li
4721729fdc
Enable iOS CI pipeline (#5360)
* add the ios ci build.

* no dependency on mac ci pipeline.

* fix the command line.

* keep sync

* automatically retrieve sdpath

* fix the case errors and warnings

* fix the vlog switch issue.

* add parallel flag for build.

* update the display name of the pipeline.
2020-10-02 20:14:45 -07:00
Guoyu Wang
9df0790856
Update linux minimal CI to report Android minimal baseline binary size (#5361)
* Update linux minimal CI to report Android minimal baseline binary size

* Fix some issues in the script
2020-10-02 17:35:23 -07:00
edgchen1
d62873a331
Docker image release build updates (#5326)
- Update docker image release build to use build commit.
- Use valid default in component governance detection step.
- Use smaller docker build context.
2020-10-01 12:25:31 -07:00
liqunfu
fe50213491
Liqun/bert pretrain2 (#5327)
* bert single node multi GPU pretrain w/o checkpoint

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-10-01 11:01:26 -07:00
Changming Sun
17f1178c2e
Downgrade GCC (#5269)
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2020-09-24 21:14:54 -07:00
Dmitri Smirnov
89742411ec
Insert telemetry template into GPU build, add telemetry build switches. (#5278) 2020-09-24 17:13:09 -07:00
edgchen1
6d5b93b805
Synchronize training dependency versions between Docker image and Python wheel. (#5261)
Synchronize training dependency versions between Docker image and wheel, update docs, refactor build scripts.
2020-09-23 19:03:42 -07:00
suffian khan
417929b049 jobs timeout .. 2020-09-21 21:51:59 -07:00
Xueyun Zhu
55e4b5d302
add pipeline distributed training test (#5222)
* add pipeline distributed training test

* fix max line length error in windows build

* function header indent

* fix

* fix flake8 error
2020-09-21 14:35:01 -07:00
Guoyu Wang
78a29aebbc
[ORT Mobile] ORT Minimal E2E CI (#5200)
* Modify the ort minimal CI to ort minimal e2e ci
2020-09-19 18:43:22 +10:00
KeDengMS
ce3b67e0cd
[Python] Move symbolic_shape_infer from nuphar to tools (#5162)
* [Python] Move symbolic shape inference from nuphar to tools

* Fix PEP8 ERROR
2020-09-18 09:31:06 -07:00
liqunfu
f37e1292a1
--shm-size=1024m to fix nccl shared memory issue (#5214)
* --shm-size=256m to fix nccl shared memory issue

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-09-17 17:21:47 -07:00
Guoyu Wang
8156e0dd10
[ORT Mobile] Some updates to iOS/Android build settings (#5184)
* Update android CI and build settings

* add build_java to arm64 also

* Add ios signing param

* fix a small build warning

* address pr comments
2020-09-17 15:53:14 -07:00
Tiago Koji Castro Shibata
1a2e289d2d
Fix nuget build (#5163)
* Fix nuget content

* Revert "Fix nuget content"

This reverts commit e2cdcec4e39964c50eac2fb306c7a4bb84352443.

* Nuget packaging

* skip tests

* msbuild path

* Force msbuild version

* Workaround https://github.com/NuGet/Home/issues/7621

* cleanup
2020-09-16 10:37:09 -07:00
Changming Sun
a0a435abc6
Add sympy==1.1.1 to Linux docker image (#5177) 2020-09-15 16:08:49 -07:00
Scott McKay
089789c135
Revert change to disable support for loading ORT format models in the packaging pipelines. (#5168) 2020-09-15 15:11:06 +10:00
RandySheriffH
1dde215d96
promote cuda version on packaging pipelines (#5154)
* promote cuda version on packaging pipelines

* fix cudnn version in py packaging template

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2020-09-14 21:09:09 -07:00
RandySheriffH
9392aa2f64
Promote Cuda version to 10.2 for windows pipelines (#5138) 2020-09-13 20:32:06 -07:00
Scott McKay
323a1ba8a4
Add option to exclude support for loading ORT format models in full build. (#5129)
* Add ability to exclude support for loading ORT format models.
Disable support for ORT format models in packages
2020-09-12 12:21:30 +10:00
RandySheriffH
120e3cda74
fix path (#5131)
Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2020-09-11 12:18:07 -07:00
Changming Sun
c5efb0085d
Update Linux GPU build pipelines to CUDA 10.2 (#5120)
* Update Linux GPU build pipelines to CUDA 10.2
2020-09-10 17:40:51 -07:00
Changming Sun
a5530358c9
Fix a path problem in Dockerfile.manylinux2014_cuda10_2 (#5106) 2020-09-10 10:30:13 -07:00
Tiago Koji Castro Shibata
62848c4de5
Add store builds to nuget packaging (#5040)
* Nuget store packaging

* Move DNNL workaround to EP

* Fix warning as error

* Disable store tests

* Skip store tests

* msbuild target

* Cross compile protoc in Store

* Disable DML in store

* Move store builds to CPU queue

* Copy uap10 to final nuget

* Fix pep8 error

* Remove extra dml copies

* Fix argparse

* pep8

* Forward IsStoreBuild

* Apply is_store_build to duplicate generate_nuspec

* runtimes

* Refactor uap10

* Store .NET

* uap

* PR feedback
2020-09-09 21:38:14 -07:00
RandySheriffH
5e10cde006
PipelinesForCuda11Cudnn8 (#4938)
* cancel night build on pyop

* setup win cuda11 pipeline

* add debug build

* test base gpu settings

* setup pipelines to test cuda 10.2 and 11

* rename linux docker images

* rename docker image tag and add clean up job

* fix typo in cuda 11 config

* set cuda11 env

* update linux cuda 11 pipeline

* reset docker image name

* disable uninitialized warning from linux build

* change the way to silence uninitialized warning

* add flags to linux gpu pipeline

* switch docker image for linux cuda 10.2

* switch linux cuda 10.2 image

* test cuda11 with devtool8

* try latest built images

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2020-09-09 16:13:58 -07:00
Changming Sun
924ecb0623
Use manylinux2014 for Linux CPU build (#5091) 2020-09-09 10:09:52 -07:00
gwang-msft
a1a81470e3
Add minimal build binary size verification (arm64) to Android CI (#5087)
* Add minimal build binary size verification (arm64) to Android CI

* Add comments in the CI yaml
2020-09-09 19:06:20 +10:00
gwang-msft
a40d34386a
Add Linux CPU CI for ORT minimal build (#5074)
* initial test version

* update yml

* minor updates

* minor updates

* Test minimal build

* update with include ops for minimal build ut only

* error case to see build failure

* test no_exceptio

* Remove error cases

* address pr comments

Co-authored-by: gwang0000 <62914304+gwang0000@users.noreply.github.com>
2020-09-08 17:09:33 -07:00
Changming Sun
370d194db7
Add a docker file for CI build CUDA 10.2 (#5065) 2020-09-04 16:28:45 -07:00
Scott McKay
b5c2932ae8
Last major set of ORT format model changes (#5056)
* Add minimal build option to build.py
Group some of the build settings so binary size reduction options are all together
Make some cmake variable naming more consistent
Replace usage of std::hash with murmurhash3 for kernel. std::hash is implementation dependent so can't be used.
Add initial doco and ONNX to ORT model conversion script
Misc cleanups of minimal build breaks.
2020-09-05 07:59:01 +10:00
Changming Sun
d5d5e37e76
Build system enhancements (#5012)
1. Add a docker file for CUDA11
2. Support setting CUDA_ARCHITECTURES from command line.
2020-09-02 10:13:26 -07:00
RandySheriffH
14b51d6502
CiPipeline@ReducedOpsBuild (#4917)
* cancel night build on pyop

* setup ci pipeline for build of reduced ops

* add back c# test

* remove debugging print

* add testing model

* add more arg in pipeline script

* disable pipeline trigger temporarily

* fix yaml format

* fix yaml format

* fix pipeline error

* rid c# test

* add ops for test cases

* add Conv from domain com.microsoft.nchwc

* remove --reduce_ops

* fix typo

* remove --build_java

* add test case for excluded op

* update doc with --skip_test

* formatting code, renaming files and simplify yaml

* remove debug build from yaml

* remove surplus ops from included_ops.txt

* add MinSizeRel build to yaml

* rename test cases and models

* exclude ir test from minimum build

* restrict ir test to be only applied to reduced ops build
2020-08-31 21:21:18 -07:00
Ashwini Khade
8679a7244e
Enable rejecting models based on onnx opset (#4912)
* enable rejecting models based on onnx opset

* enable unreleased opsets in linux and mac CI

* test fixes and more updates

* enable unreleased opsets in CI builds

* enable released opsets in linux cis

* try fix windows ci yml

* yml fixes

* update yml

* yml updates post master merge

* review comments

* bug fix
2020-08-31 13:35:36 -07:00
Hariharan Seshadri
b945225de3
Include DirectML pdb in x86 bin folder (#4953) 2020-08-28 11:29:26 -07:00
Changming Sun
c37fa7c278
Delete Dockerfile.centos6_gpu (#4851) 2020-08-28 09:56:52 -07:00
edgchen1
71d8846635
Fix telemetry-steps.yml (#4903)
Fix bug in telemetry-steps.yml that causes telemetry setup to be disabled even if TELEMETRYGUID is set.
2020-08-24 22:14:40 -07:00
Changming Sun
f34ed3a576
Hot fix for the python packaging pipeline Linux ARM build (#4902) 2020-08-24 20:14:33 -07:00
Rayan-Krishnan
eb05db5a2a
Fix OptimizerConfig params groups (#4877)
* Copy samples to build folder and load models from there. Fix CI
* This PR also includes a fix to path validation for save_as_onnx API
* Add torchtext to CI for GPU training
* Remove new frontend tests from CI

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
2020-08-22 22:04:17 -07:00
liqunfu
6260d073b3
Glue parallel training (#4550)
add mpi size, rank python API

add single node parallel training example
2020-08-21 21:24:27 -07:00
Yulong Wang
c6119a548c enable telemetry in node.js binding 2020-08-20 09:47:57 -07:00
suryasidd
3a00b50cf8
[OpenVINO-EP] Updating OpenVINO EP to 2020.4 (#4836)
* Removed building ngraph from source

* Disabled some tests temporarily

* Enabled softmax for all dims

* Added onnx importer to link libraries

* int64 changes

* fixed

* temp

* slice update start and end need to be initializer

* Disabled GatherND, ScatterND, ReverseSequence operators

* Added supported ops instead of unsupported ops

* Set precision only for CPU

* Removed some unnecessary conditions

* Fixed segfault in slice

* Softmax restriction removed

* changes

* Setting precision for all plugins

* Changes added to include precision
and supported ops for gpu and vpu

* branch op support

* checking for disabled python test failure

* mapped input names and tensors directly rather than copying which was leading to mismatch

* last index is not supported
mkldnn does not support pow between integers

* included the code changes

* Rename inner-scoped variable to avoid MSVC warning

* applied changed to vadm as well and removed the utility function
getinputtensors() completely

* OpenVINO multi version support: CMake changes

* OpenVINO multi version support: C++ support

* removed commented code

* Remove redundant code lines

* Revert "Rename inner-scoped variable to avoid MSVC warning"

This reverts commit 2f650493162675bc6fb70730de9656ec400be332.
Merged separately in master.

* vadm changes disabled reduction op test

* putting test_gather_negative_indices in unsupported list for now

* Update MCR Dockerfile with 2020.4

Installs OpenVINO 2020.4 from deb packages via APT tool.

* Update build docs with 2020.4 info

* Update dockerfile with OV 2020.4 info

Instructions for building OpenVINO based docker image no longer require
downloading installer package as it is installed by the dockerfile
using OpenVINO 2020.4 APT package for Ubuntu 18.04

* Added constant folding bypass logic

* Added cout statements for ci

* Added NDEBUG flag for debug symbols

* Update Ops info in docs

* fixes multiple unit tests

* mathoptest.ceil disabled for gpu and myriad

* activation test temp disabled

* Fix models for CPU

* Fixed a syntax error

* local commit

* fixing unit tests for myriad

* Fixed Variadic Split, Topk issues

* fix_model commit

* Fix models in myriad

* Added ifdefs for OpenVINO 2020.4

* temp

* made some changes to not operator

* Added unused parameter

* relu enabled

* Fixed bug in Conv output

* Consolidated GPU failing tests into one category

* Made it compatible to InternalCI 2020.4

* Made changes for ngraph

* Disabled test for mask,fastercnn,tinyyolov3

* Removed proxy for ci

* run_dockerbuild.sh restored to same version

* run_dockerbuild.sh restored to same version

* run_dockerbuild.sh restored to same version

* Updated documentation for 2020.4

* Removed FP32 to FP16 transformation for GPU

* Disabled Coreml-FNS-Candy model test

* Added FP16 transformations

Co-authored-by: sfatimar <sahar.fatima@intel.com>
Co-authored-by: Manohar Karlapalem <manohar.karlapalem@intel.com>
Co-authored-by: sfatimar <sahar.fatima@intel/com>
Co-authored-by: sfatimar <64512376+sfatimar@users.noreply.github.com>
Co-authored-by: intel <you@example.com>
Co-authored-by: gundaarx <aravindx.gunda@intel.com>
2020-08-19 23:18:08 -07:00
Changming Sun
1ba07ccfaf Codesign validator fixes 2020-08-18 16:20:15 -07:00
Changming Sun
e98697ec28
Fix nuget cpu package pipeline (#4832) 2020-08-17 17:08:48 -07:00
Ksenija Stanojevic
ea37a4d89b
Add Trilu custom op (#4537)
Co-authored-by: neginraoof <neginmr@utexas.edu>
2020-08-17 14:42:26 -07:00
Thiago Crepaldi
42408aa3ed
Add new PyTorch front-end (#4815)
* Add ORTTrainerOptions class for the new pytorch frontend (#4382)

Add ORTTrainerOptions class and some placeholders

* Add _ORTTrainerModelDesc to perform validation for model description (#4416)

* Add Loss Scaler classes to the new frontend (#4306)

* Add TrainStepInfo used on the new frontend API (#4256)

* Add Optimizer classes to the new frontend (#4280)

* Add LRScheduler implementation (#4357)

* Add basic ORTTrainer API (#4435)

This PR presents the public API for ORTTrainer for the short term
development.

It also validates and saves input parameters, which will be used in the
next stages, such as building ONNX model, post processing the model and
configuring the training session

* Add opset_version into ORTTrainerOptions and change type of ORTTrainer.loss_fn (#4592)

* Update ModelDescription and minor fix on ORTTrainer ctor (#4605)

* Update ModelDescription and minor fix on ORTTrainer/ORTTrainerOptions

This PR keeps the public API intact, but changes how model description is stored on the backend

Currently, users create a dict with two lists of tuples.
One list called 'inputs' and each tuple has the following format tuple(name, shape).
The second list is called 'outputs' and each tuple can be either tuple(name, shape) or tuple(name, shape, is_loss).

With this PR, when this dict is passed in to ORTTrainer, it is fully validated as usual.
However, tuples are internally replaced by namedtuples and all output tuples will have
tuple(name, shape, is_loss) format instead of is_loss being optionally present.

In addition to that normalization in the internal representation (which eases coding),
two internal methods were created to replace a namedtuple(name, shape) with namedtuple(name, shape, dtype)
or namedtuple(name, shape, is_loss, dtype), depending on whether the tuple is an input or output.

This is necessary because ORTTrainer finds out the data type of each input/output during model export to ONNX.

Finally, a minor fix was done on ORTTrainer. It could initialize ORTTrainerOptions incorrectly when options=None
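
The normalization described above can be illustrated with a minimal sketch. It is not the code from this PR: the names `InputDesc`, `OutputDesc`, `normalize_model_desc` and `with_dtype` are hypothetical, and the validation is reduced to length checks. It only shows the idea of accepting the user's plain tuples, always filling in `is_loss` for outputs, and later appending a `dtype` field.

```python
from collections import namedtuple

# Hypothetical names for illustration; the real ORTTrainer internals may differ.
InputDesc = namedtuple("InputDesc", ["name", "shape"])
OutputDesc = namedtuple("OutputDesc", ["name", "shape", "is_loss"])


def normalize_model_desc(model_desc):
    """Validate a user-supplied dict and return namedtuple-based descriptions."""
    if not isinstance(model_desc, dict):
        raise TypeError("model_desc must be a dict")
    if set(model_desc) != {"inputs", "outputs"}:
        raise ValueError("model_desc needs exactly the keys 'inputs' and 'outputs'")

    inputs = []
    for item in model_desc["inputs"]:
        if len(item) != 2:
            raise ValueError(f"input must be (name, shape), got {item!r}")
        inputs.append(InputDesc(*item))

    outputs = []
    for item in model_desc["outputs"]:
        if len(item) == 2:
            name, shape = item
            is_loss = False  # optional for the user, always present internally
        elif len(item) == 3:
            name, shape, is_loss = item
        else:
            raise ValueError(f"output must have 2 or 3 fields, got {item!r}")
        outputs.append(OutputDesc(name, shape, bool(is_loss)))

    return {"inputs": inputs, "outputs": outputs}


def with_dtype(desc, dtype):
    """Return a copy of desc with a trailing dtype field appended."""
    extended = namedtuple(type(desc).__name__, desc._fields + ("dtype",))
    return extended(*desc, dtype)
```

For example, calling `normalize_model_desc({"inputs": [("x", [1, 2])], "outputs": [("loss", [], True), ("logits", [1, 3])]})` yields output entries that both carry `is_loss`, and `with_dtype(entry, "float32")` stands in for the step where the data type becomes known during export.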

* Rename input name for test

* Add ONNX Model Export to New Frontend (#4612)

Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>

* Create training session + minor improvements (#4668)

Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>

* Save ONNX model in file (#4671)

Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>

* Add eval step (#4674)

Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>

* Add train_step (#4677)

Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>

* Add LR Scheduler (#4694)

Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>

* Add deterministic compute tests (#4716)


Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>

* Add legacy vs experimental ORTTrainer accuracy comparison (#4727)

Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>

* Add Mixed precision/LossScaler + several fixes (#4739)

In addition to the mixed precision/loss scaler code, this PR includes:

* Fix CUDA training
* Add optimization_step into TrainStepInfo class
* Refactor LRScheduler to use optimization_step instead of step
* Updated several default values at ORTTrainerOptions
* Add initial Gradient Accumulation support. Untested
* Fix ONNX model post processing
* Refactor unit tests

* Add ONNX BERT example + minor fixes (#4757)

* Fix training issue when passing ONNX file into ORTTrainer

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>

* Add Dynamic Shape support (#4758)

* Update DeepSpeed Zero Stage option to a separate option group (#4772)

* Add support to fetches (#4777)

* Add Gradient Accumulation Steps support (#4793)

* Fix Dynamic Axes feature and add unit test (#4795)

* Add frozen weights test (#4807)

* Move new pytorch front-end to 'experimental' namespace (#4814)

* Fix build

Co-authored-by: Rayan-Krishnan <rayankrishnan@live.com>
Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-08-17 09:45:25 -07:00
Changming Sun
5eec4f66ed
Refactor manylinux docker image and the related pipelines (#4751)
1. Publish the image to ACR, instead of building it every time for every PR
2. Allow USE_MKLML and USE_OPENMP to co-exist. Currently both of them are enabled in our Linux CI build, but in fact only one of them takes effect.
3. Split nuphar and DNNL into separate pipelines.
4. Fix two warnings in onnxruntime/core/optimizer/matmul_scale_fusion.cc and onnxruntime/test/tvm/tvm_basic_test.cc.
5. Update the manylinux2010_x86_64 image to the latest.
2020-08-17 09:40:31 -07:00
Yulong Wang
aa993e95c9
enable build flag '--use_openmp' on MacOS (#4774)
* enable build flag '--use_openmp' on MacOS

* cmake 3.16.1 to enable find_package(OpenMP) on mac
2020-08-13 15:56:42 -07:00