Commit graph

244 commits

Author SHA1 Message Date
Andrews548
20bc83400b
ACL/ArmNN update (#5515)
* Build ACL and ArmNN with custom library path

* Define import to tensor as a separate function for maintenance and readability

* Enabled optimized depthwise convolution for ACL v20.02

* Check operation status for ACL and ArmNN Execution Providers

* Enabled fused operation for convolution-activation

Co-authored-by: Andrei-Alexandru <andrei-alexandru.avram@nxp.com>
2020-10-22 09:29:44 -07:00
Changming Sun
5802fe1699
Remove MKLML build config (#5559)
Remove MKLML build config
2020-10-21 13:11:25 -07:00
Guoyu Wang
915d475353
Android CI update (#5474)
* Update Android CI

* update comments
2020-10-14 16:56:50 -07:00
Wenbing Li
80d36eab86
enable the onnxruntime shared library test on iOS (#5443)
* enable the onnxruntime shared library test on iOS

* fixing as commented.

* add return status check.
2020-10-12 21:40:57 -07:00
liqunfu
dbe7e6623b
only use/import pytest if needed (by enable_training) (#5437)
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-10-09 12:42:19 -07:00
liqunfu
1cceefc7d4
use run_orttraining_test_orttrainer_frontend_separately to work aroun… (#5408)
* use run_orttraining_test_orttrainer_frontend_separately to work around a sporadic segfault.

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-10-09 09:16:10 -07:00
Changming Sun
09aef240d6
Skip running onnx tests in python mac os pipeline (#5416) 2020-10-08 11:49:28 -07:00
Hariharan Seshadri
6f54113a1b
Support OrtValue binding in Python to enable interesting IOBinding scenarios in Python (#5248) 2020-10-06 21:14:41 -07:00
Guoyu Wang
b4934b0016
Mitigate pybind11 build break using Xcode 12 on macOS (#5381)
* turn dev_mode off if we are using macos to build python with xcode 12

* Address CR comments

* Add ways to check compiler version
2020-10-06 19:03:33 -07:00
liqunfu
773992c7d4
Liqun/bert pretrain tb (#5377)
* add tensor board, remove torch.distributed.lanuch because ort nccl depends on MPI. Use MPI to launch parallel training.

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-10-06 16:28:31 -07:00
Wenbing Li
4721729fdc
Enable iOS CI pipeline (#5360)
* add the ios ci build.

* no dependency on mac ci pipeline.

* fix the command line.

* keep sync

* automatically retrieve sdpath

* fix the case errors and warnings

* fix the vlog switch issue.

* add parallel flag for build.

* update the display name of the pipeline.
2020-10-02 20:14:45 -07:00
Guoyu Wang
9df0790856
Update linux minimal CI to report Android mininal baseline binary size (#5361)
* Update linux minimal CI to report Android mininal baseline binary size

* Fix some issues in the script
2020-10-02 17:35:23 -07:00
Ashwini Khade
ce49cfa67c
add support for configurable build dir when building nuget packages (#5352)
* add support for configurable build dir when building nuget packages

* rename vars
2020-10-02 09:31:35 -07:00
liqunfu
fe50213491
Liqun/bert pretrain2 (#5327)
* bert single node multi GPU pretrain w/o checkpoint

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-10-01 11:01:26 -07:00
Wenbing Li
ed102e9d88
Add iOS test pipeline and a sample app. (#5298)
* Add iOS test pipeline and a sample app.

* clean up the unused code.

* clean up.

* revert the unknown change

* disable the shared library for iOS.

* add open source notice text.

* ignore the skipped test.

* extract the common ortenv setup
2020-09-29 13:53:11 -07:00
liqunfu
24d8b1bf42
to skip an unstable test to unblock release (#5314)
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-09-28 22:30:11 -07:00
Changming Sun
1a04b8f8b7
Add valgrind support to our cmake files (#5296) 2020-09-28 09:31:08 -07:00
Guoyu Wang
d957dbebea
Fix possible ios build break after update to Xcode 12 (#5246)
* Fix possible ios build break after update to Xcode 12

* Address comments
2020-09-22 07:42:54 -07:00
Xueyun Zhu
55e4b5d302
add pipeline distributed training test (#5222)
* add pipeline distributed training test

* fix max line length error in windows build

* function header indent

* fix

* fix flake8 error
2020-09-21 14:35:01 -07:00
Tiago Koji Castro Shibata
cd663d58f5
Fix WinML warnings (#5228) 2020-09-19 12:41:42 -07:00
KeDengMS
ce3b67e0cd
[Python] Move symbolic_shape_infer from nuphar to tools (#5162)
* [Python] Move symbolic shape inference from nuphar to tools

* Fix PEP8 ERROR
2020-09-18 09:31:06 -07:00
Guoyu Wang
8156e0dd10
[ORT Mobile] Some updates to iOS/Android build settings (#5184)
* Update android CI and build settings

* add build_java to arm64 also

* Add ios signing param

* fix a small build warning

* address pr comments
2020-09-17 15:53:14 -07:00
Tiago Koji Castro Shibata
f3f119a945
Use onecore umbrella lib in onecore builds (#5182)
* delayload hack

* Skip tests

* Onecore uses onecore umbrella

* Uncomment tests

* cleanup

* Disable dev mode for WinML
2020-09-16 10:46:27 -07:00
Changming Sun
8946d212bf
Remove the dependency on CUDA SDk's version.txt (#5155) 2020-09-14 14:25:28 -07:00
Wenbing Li
2a456d16c0
Enable onnxruntime iOS shared library build. (#5148) 2020-09-14 10:32:39 -07:00
Scott McKay
323a1ba8a4
Add option to exclude support for loading ORT format models in full build. (#5129)
* Add ability to exclude support for loading ORT format models.
Disable support for ORT format models in packages
2020-09-12 12:21:30 +10:00
Tiago Koji Castro Shibata
62848c4de5
Add store builds to nuget packaging (#5040)
* Nuget store packaging

* Move DNNL workaround to EP

* Fix warning as error

* Disable store tests

* Skip store tests

* msbuild target

* Cross compile protoc in Store

* Disable DML in store

* Move store builds to CPU queue

* Copy uap10 to final nuget

* Fix pip8 error

* Remove extra dml copies

* Fix argparse

* pep8

* Forward IsStoreBuild

* Apply is_store_build to duplicate generate_nuspec

* runtimes

* Refactor uap10

* Store .NET

* uap

* PR feedback
2020-09-09 21:38:14 -07:00
Scott McKay
dbf4e7019d
Add ability to generate configuration file with required operators. (#5089)
* Add ability to generate configuration file with required operators.
2020-09-09 21:39:17 +10:00
Scott McKay
80ada0291f
Improve the minimal build size on android and linux (#5086)
Fix bug where linux build fails when python is enabled and rtti is disabled
Update doco for new build settings
2020-09-09 21:38:34 +10:00
gwang-msft
a1a81470e3
Add minimal build binary size verification (arm64) to Android CI (#5087)
* Add minimal build binary size verification (arm64) to Android CI

* Add comments in the CI ymal
2020-09-09 19:06:20 +10:00
Cameron Maske
4553b2eecd
Expose DirectML provider to python (conflicts resolved from #3359) (#4630) 2020-09-08 14:34:09 -07:00
liqunfu
de58720a97
Liqun/transformer test and e2e golden numbers (#5064)
* match new/old api numbers

* new golden numbers for Roberta and MC

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-09-04 18:11:37 -07:00
Scott McKay
b5c2932ae8
Last major set of ORT format model changes (#5056)
* Add minimal build option to build.py
Group some of the build settings so binary size reduction options are all together
Make some cmake variable naming more consistent
Replace usage of std::hash with murmurhash3 for kernel. std::hash is implementation dependent so can't be used.
Add initial doco and ONNX to ORT model conversion script
Misc cleanups of minimal build breaks.
2020-09-05 07:59:01 +10:00
Thiago Crepaldi
0fc9c504fe
Re-enable CI tests for the new PyTorch frontend (#5017)
This PR includes:

* Re-enable CI tests for new PyTorch frontend
* Re-enable fp16 and adjust tolerances for number matching
2020-09-04 09:36:24 -07:00
Andrews548
bd215b79a2
ACL v20.02 (#4981)
* Add ACL version 20.02

* fix loging typo

* check depthwise operation based on group param

* Generate ArmNN runtime inside class constructor

* Update to the latest ONNX operation set

* Update BUILD.md

Co-authored-by: Andrei-Alexandru <andrei-alexandru.avram@nxp.com>
2020-09-03 20:44:27 -07:00
Nat Kershaw (MSFT)
8a03b6e5c7
Render Operator documentation as compliant markdown (#3658) 2020-09-02 15:07:50 -07:00
RandySheriffH
14b51d6502
CiPipeline@ReducedOpsBuild (#4917)
* cancel night build on pyop

* setup ci pipeline for build of reduced ops

* add back c# test

* remove debugging print

* add testing model

* add more arg in pipeline script

* disable pipeline trigger temporarily

* fix yaml format

* fix yaml format

* fix pipeline error

* rid c# test

* add ops for test cases

* add Conv from domain com.microsoft.nchwc

* remove --reduce_ops

* fix typo

* remove --build_java

* add test case for excluded op

* update doc with --skip_test

* formatting code, renaming files and simplify yaml

* remove debug build from yaml

* remove surplus ops from included_ops.txt

* add MinSizeRel build to yaml

* rename test cases and models

* exclude ir test from minimum build

* restrict ir test to be only applied to reduced ops build
2020-08-31 21:21:18 -07:00
Ashwini Khade
0d3bbfdd0f
enable nuget packaging in local builds (#4884)
* enable building nuget packages

* add nuget creation from build.py

* add documentation

* fix flake8 errors

* fix nuget package version

* enable csharp tests

* update csharp tests

* copy nuget packges to nuget-artifacts

* add libmklml_gnu

* plus review updates

* fix references for release builds
2020-08-26 12:33:48 -07:00
Rayan-Krishnan
eb05db5a2a
Fix OptimizerConfig params groups (#4877)
* Copy samples to build folder and load models from there. Fix CI
* This PR also includes a fix to path validation for save_as_onnx API
* Add torchtext to CI for GPU training
* Remove new frontend tests from CI

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
2020-08-22 22:04:17 -07:00
liqunfu
6260d073b3
Glue parallel training (#4550)
add mpi size, rank python API

add single node parallel training example
2020-08-21 21:24:27 -07:00
RandySheriffH
3fa73a5b6a
ReduceBinarySize (#4747)
* cancel night build on pyop

* add rewriter to rewrite cpu provider

* skip BuildKernelCreateInfo<void>

* refactor variable name and comment

* include ops from csv file

* process multiple eps

* add default function to cuda provider

* rename function and add license header

* fix import

* add doc

* fix typo

* deal with empty kernel entry in cuda

* rename the rewriter file

* add comment into provider file

* add comment and rename function

* log warnings

* refactor extracting logic

* add entry for script to run solo

* add better example

* avoid onnx importing

* fix flake8 alerts

* minor fixes to better comments and doc

* add entries for all domains

* add void entry into contrib providers

* format cuda_contrib_kernels.cc

* format cpu_contrib_kernels.cc

* add all providers

* add default entry to all providers

* include op_kernel header

* cancelling change in providers beyond cpu/cuda

* rename file and switch file format to domain;opset;op1,op2...

* update doc

* restore non-regular ending grammar in cuda_contrib_kernels.cc

* add ort_root as input argument of script

* enable test in ci

* update doc

* update doc

* revert change on linux gnu ci

* switch to set to host ops

* simplify trimming logic

* add domain map to track current model

* allow ort_root to take relative path
2020-08-21 19:50:13 -07:00
Thiago Crepaldi
42408aa3ed
Add new PytTrch front-end (#4815)
* Add ORTTrainerOptions class for the new pytorch frontend (#4382)

Add ORTTrainerOptions class and some placeholders

* Add _ORTTrainerModelDesc to perform validation for model description (#4416)

* Add Loss Scaler classes to the new frontend (#4306)

* Add TrainStepInfo used on the new frontend API (#4256)

* Add Optimizer classes to the new frontend (#4280)

* Add LRScheduler implementation (#4357)

* Add basic ORTTrainer API (#4435)

This PR presents the public API for ORTTrainer for the short term
development.

It also validates and saves input parameters, which will be used in the
next stages, such as building ONNX model, post processing the model and
configuring the training session

* Add opset_version into ORTTrainerOptions and change type of ORTTrainer.loss_fn (#4592)

* Update ModelDescription and minor fix on ORTTrainer ctor (#4605)

* Update ModelDescription and minor fix on ORTTrainer/ORTTrainerOptions

This PR keeps the public API intact, but changes how model description is stored on the backend

Currently, users creates a dict with two lists of tuples.
One list called 'inputs' and each tuple has the following format tuple(name, shape).
The second list is called 'outputs' and each tuple can be either tuple(name, shape) or tuple(name, shape, is_loss).

With this PR, when this dict is passed in to ORTTrainer, it is fully validated as usual.
However, tuples are internally replaced by namedtuples and all output tuples will have
tuple(name, shape, is_loss) format instead of is_loss being optionally present.

Additionally to that normalization in the internal representation (which eases coding),
two internal methods were created to replace a namedtuple(name, shape) to namedtuple(name, shape, dtype)
or namedtuple(name, shape, is_loss, dtype) dependeing whether the tuple is an input or output.

This is necessary as ORTTRainer finds out data types of each input/output during model export to onnx.

Finally, a minor fix was done on ORTTrainer. It could initialize ORTTrainerOptions incorrectly when options=None

* Rename input name for test

* Add ONNX Model Export to New Frontend (#4612)

Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>

* Create training session + minor improvements (#4668)

Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>

* Save ONNX model in file (#4671)

Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>

* Add eval step (#4674)

Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>

* Add train_step (#4677)

Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>

* Add LR Scheduler (#4694)

Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>

* Add deterministic compute tests (#4716)


Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>

* Add legacy vs experimental ORTTrainer accuracy comparison (#4727)

Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>

* Add Mixed precision/LossScaler + several fixes (#4739)

Additionally to the mixed precision/loss scaler code, this PR includes:

* Fix CUDA training
* Add optimization_step into TrainStepInfo class
* Refactor LRSCheduler to use optimization_step instead of step
* Updated several default values at ORTTrainerOptions
* Add initial Gradient Accumulation supported. Untested
* Fix ONNX model post processing
* Refactor unit tests

* Add ONNX BERT example + minor fixes (#4757)

* Fix training issue when passing ONNX file into ORTTrainer

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>

* Add Dynamic Shape support (#4758)

* Update DeepSpeed Zero Stage option to a separate option group (#4772)

* Add support to fetches (#4777)

* Add Gradient Accumulation Steps support (#4793)

* Fix Dynamic Axes feature and add unit test (#4795)

* Add frozen weights test (#4807)

* Move new pytorch front-end to 'experimental' namespace (#4814)

* Fix build

Co-authored-by: Rayan-Krishnan <rayankrishnan@live.com>
Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-08-17 09:45:25 -07:00
Changming Sun
5eec4f66ed
Refactor manylinux docker image and the related pipelines (#4751)
1. Publish the image ACR, instead of building it every time for every PR
2. Make USE_MKLML and USE_OPENMP be able to co-exist. Currently both of them are enabled in our Linux CI build but indeed only one of them is taking effect.
3. Split nuphar and DNNL to separated pipelines.
4. Fix two warnings in onnxruntime/core/optimizer/matmul_scale_fusion.cc and onnxruntime/test/tvm/tvm_basic_test.cc.
5. Update the manylinux2010_x86_64 image to the latest.
2020-08-17 09:40:31 -07:00
gwang-msft
8507bc1f48
[Android NNAPI EP] Enable test for BatchNormalization, enable dev_mode for Android, fix some issues in concat (#4715)
* update batch_norm test, enable dev_mode for nnapi, ignore onnx protobuf warning for nnapi ep

* fix some issues in concat and mark input without shape as not supported for now

* address review comments

* addressed comments
2020-08-06 14:11:59 -07:00
Changming Sun
1e054739b8
Remove the requirement of CUDA's version.txt (#4706)
Sometimes there is a file named "version.txt" in your CUDA installation dir, but sometimes there isn't one. I couldn't figure out it why, but the latest CUDA 11 on our CI build machines doesn't have this file. As the file is not needed for building onnxruntime, so I removed the check.
2020-08-04 17:03:40 -07:00
RandySheriffH
1fcd3eb376
cancel night build on pyop (#4673) 2020-07-30 19:51:52 -07:00
gwang-msft
c2ec3b734b
[Android NNAPI EP] Remove dependency on external JD/DNNLibrary (#4576)
* remove dependency of external jd-dnnlibrary

* remove extra variables not used any more

* update /cgmanifest.json
2020-07-22 14:08:12 -07:00
Andrews548
f20afc4991
Update ACL/ArmNN EP (#4571)
* Add BN to ArmNN EP

* Add Concat to ArmNN EP

* ACL logging improvements

* ArmNN logging improvements

* Fallback to CPU for 9x9 convolution in ACL EP

* Fallback to CPU for 9x9 convolution in ArmNN EP

* Enable python support for ACL and ArmNN EPs when compiled with BSP toolchain

* Removed the matmul operator

* Fix conv infer shape function

* Fix provider_names list for armnn

Co-authored-by: Andrei-Alexandru <andrei-alexandru.avram@nxp.com>
2020-07-21 22:25:58 -07:00
Changming Sun
bc1d197ddf
Re-enable dnnl in CI build (#4544)
* Revert "Temporarily remove dnnl from Linux CI build to unblock the whole team (#4266)"

Previously it fails because it used too much memory.
Now we only run dnnl EP with opset12 models in unit tests, to reduce peak memory usage.
2020-07-19 23:20:03 -07:00
Changming Sun
8ada440961
Move model tests to onnxruntime_test_all (#4521)
1. Move model tests to onnxruntime_test_all
2. Publish TestResults of Windows CI build.
2020-07-15 16:46:18 -07:00