onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-16 18:31:27 +00:00

Author	SHA1	Message	Date
Edward Chen	919c270f3c	Increase build timeouts.	2020-11-09 22:26:27 -08:00
Chi Lo	92292de135	Tensorrt perf tool (#5436 ) * Add YAML file for pipeline * Modify typo * Add working directory * Modify and test * Modfiy and test * Modify and test * Modify and test * Modify * Modify * Modify * Modify * Make sure to copy all the result files * Add clearn up * Modify * Modify agent pool name * Upload only specific artifacts * Modify * Integrated CI Pipeline for running TRT perf as well as added the “large amount of models” into perf model target * Fix bug * Fix bug * Add reading the information regarding previously known failing models and then skip testing them during benchmark/validation * Modify the script file for CI * Replace print with logger.info * Fix bug * Fix bug * Refine the code * Modify the script so that it can capture script segmentation fault while running ORT * Fix bug * fix bug * fix bug * Add debug info * fix bug * Refine perf code * Refine the code * fix bug * Code refactoring * change many-models path * remove metadata after validation/benchmark are done * Update README.md * Fix bug so that metadata doesn't hold stale value * Remove hardcode and update README * Add arguments to the script to make it run correctly * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Fix bug so that metadata doesn't hold stale value * Fix small bug of finding test dataset directory for FP16 test data, as well as modification of some output information * use -i random for perf test of TRT changes Co-authored-by: Olivia Jain <oljain@microsoft.com>	2020-11-06 12:27:42 -08:00
RandySheriffH	71f90e08f1	Nuget packaging no omp (#5666 ) * create new nuget packaging pipeline without openmp * rename package * update image name * rename package name * rename managed package * reset project attribute * merge master * set package name * set NoOpenMP as cpu build * shorten line length Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2020-11-06 11:43:35 -08:00
Tiago Koji Castro Shibata	9e68e98423	Add static CRT DLLs to Nuget package (#5661 ) * Add static runtime yaml option * Add to WAI Nuget build matrix * Support empty build flags * Add DML to x64 * Bundle static rt * Bundle after Nugets are built * Fix typo * Skip static tests * Pack test artifact only in x64 dynamic * No DML static runtime * Add Store static * Revert "Add Store static" This reverts commit `69133e5838`. * Static subfolder	2020-11-05 09:26:17 -08:00
Changming Sun	357a51c75c	Update python packaging pipeline's docker image (#5680 )	2020-11-03 12:01:36 -08:00
Ashwini Khade	1cca903680	update onnx commit id (#5594 ) * update onnx commit id * update onnx commit for docker images * update docker images	2020-11-02 09:46:36 -08:00
Weixing Zhang	aec4cb489e	ROCm EP for AMD GPU (#5480 ) The ROCm EP is designed and implemented based on AMD GPU software stack named ROCm. Here is the link for the details about ROCm: https://rocmdocs.amd.com/en/latest/ ROCm EP was created based on the following things: 1. AMD GPU programming language: HIP 2. AMD GPU HIP language runtime: amdhip64 3. BLAS: rocBLAS, hipBLAS 4. DNN: miOpen 5. Collective Communication library: RCCL 6. cub: hipCub 7. … Current status: BERT-L and GPT2 training can be run on AMD GPU with data parallel. Next: 1. Make more GPU code be sharable between ROCm EP and CUDA EP since HIP language and HIP runtime API are very close to CUDA. 2. Continue improving the implementation. 3. Continue GPU kernel optimization. 4. Support model parallelism on ROCm EP. …… The rocm kernels have been removed from this commit and will be in a separate PR. Since the original PR was too big (~180 files), it was suggested to split the PR into two parts, one is rocm-kernels, the other is non rocm kernels. Co-authored-by: Weixing Zhang <wezhan@microsoft.com> Co-authored-by: sabreshao <sabre.shao@amd.com> Co-authored-by: anghostcici <11013544+anghostcici@users.noreply.github.com> Co-authored-by: Suffian Khan <sukha@microsoft.com> Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2020-10-29 17:13:04 -07:00
Changming Sun	e6956be40c	Publish no-openmp python packages to test pypi (#5610 ) Publish no-openmp python packages to test pypi	2020-10-28 19:49:53 -07:00
liqunfu	92662659ba	Liqun/remove number matching (#5606 ) replace number matching with relaxed comparison in frontend tests Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-10-27 21:27:37 -07:00
Changming Sun	5802fe1699	Remove MKLML build config (#5559 ) Remove MKLML build config	2020-10-21 13:11:25 -07:00
Ashwini Khade	df22611026	Update ONNX commit (#5487 ) * update ONNX * update onnx + register kernels for reduction ops * bug fix kernel reg * update cgmanifests * revert unsqueeze op 13 registration * filter ops which are not implemented yet * filter some tests * update onnx commit to include conv transpose bug fix * update docker images * undo not required test changes * fix test failures	2020-10-21 07:22:20 -07:00
Guoyu Wang	915d475353	Android CI update (#5474 ) * Update Android CI * update comments	2020-10-14 16:56:50 -07:00
sfatimar	6d2a30eae3	[OPENVINO-EP] 2021.1 Release (#5431 ) * Cmake changes for 2021.1 * added new ov version 2020.1 for faster rcnn * Added missing defs * equal op modified * changes to incoroporate faster rcnn * backend util.cc * hddl_plugin_config.hpp is depreceated . instead use hddl_config.hpp * changing myriad precision bool to i32 * gather is not enabled for gpu * conv2D and pooltest auto_pad attribute should not be null * negative indices are not valid for scatter op in myriad * non max suppression op only supported in faster rcnn mode * maxpool indices output is not supported * Cleaned redundant code in backends * Added ifdefs for HDDL config * cast output dimensions check topk operator k input it seems only resolved for myriad as it is throwing issues for ask rcnn . need to verify * we are limiting the subgraph size to 3 here * taking care of review comments * Fixed minor bugs * Modified Slice op checks * Added NonZero, Upsample * Removed TopK if it's in the middle of a subgraph * incorporated upsample conditions too * Dockerfile changes for 2021.1 release * dockerfile aptkey update * Minor fixes * ceil condition added again * Fixed few gpu models * Disabled LSTM and yolov3 in ModelTests * python softmax cross entropy tests and negative log likelihood * Update Build.md Updated for openvino 2021.1 * Update OpenVINO-ExecutionProvider.md update openvino execution provider for 2021.1 * Update READMe.md updated new openvino version * Update Dockerfile.openvino added environment variable for DEBIAN Frontend * Fixed myriad models * Fixed gather condition * Fixed mask rcnn model on myriad * Modified Gather condition * set default target of MCR dockerfile to MYRIAD_FP16 * Fixed tinyolov3 on CPU * Update OpenVINO-ExecutionProvider.md update openvino execution provider documentation * Update Dockerfile.openvino Removed environment variable * Update OpenVINO-ExecutionProvider.md update image manipulation networks supported * Update 
onnx_backend_test_series_filters.jsonc removed test_upsample_nearest from cpu test cases * New InternalCI changes for 2021.1 * Full protobuf removed for OpenVINO * Protobuf added * Updated with apt installation for openvino * Revert the testing changes * Reverted testing changes * File permessions are changed to original * Deleted openvino installation and cmake change * Optimized Dockerfile Removed unnecessary cmake installation, numpy * Added missing ifdefs * delete array fix * backend_utils.cc output_shape * Revert "set default target of MCR dockerfile to MYRIAD_FP16" This reverts commit 928d3e2b71e2f589cf51dacd3a133951cf9ca18d. Co-authored-by: suryasidd <surya.siddharth.pemmaraju@intel.com> Co-authored-by: sfatimar <sahar.fatima@intel/com> Co-authored-by: suryasidd <48925384+suryasidd@users.noreply.github.com> Co-authored-by: S. Manohar Karlapalem <manohar.karlapalem@intel.com> Co-authored-by: Aravind <aravindx.gunda@intel.com> Co-authored-by: Aravind Gunda <38353114+gundaarx@users.noreply.github.com>	2020-10-14 15:56:00 -07:00
Pranav Sharma	c2c78399ee	Include config keys header file in the release packages for Linux and Mac. (#5388 )	2020-10-08 15:00:29 -07:00
Changming Sun	09aef240d6	Skip running onnx tests in python mac os pipeline (#5416 )	2020-10-08 11:49:28 -07:00
liqunfu	773992c7d4	Liqun/bert pretrain tb (#5377 ) * add tensor board, remove torch.distributed.launch because ort nccl depends on MPI. Use MPI to launch parallel training. Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-10-06 16:28:31 -07:00
Wenbing Li	4721729fdc	Enable iOS CI pipeline (#5360 ) * add the ios ci build. * no dependency on mac ci pipeline. * fix the command line. * keep sync * automatically retrieve sdpath * fix the case errors and warnings * fix the vlog switch issue. * add parallel flag for build. * update the display name of the pipeline.	2020-10-02 20:14:45 -07:00
Guoyu Wang	9df0790856	Update linux minimal CI to report Android minimal baseline binary size (#5361 ) * Update linux minimal CI to report Android minimal baseline binary size * Fix some issues in the script	2020-10-02 17:35:23 -07:00
edgchen1	d62873a331	Docker image release build updates (#5326 ) - Update docker image release build to use build commit. - Use valid default in component governance detection step. - Use smaller docker build context.	2020-10-01 12:25:31 -07:00
liqunfu	fe50213491	Liqun/bert pretrain2 (#5327 ) * bert single node multi GPU pretrain w/o checkpoint Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-10-01 11:01:26 -07:00
Changming Sun	17f1178c2e	Downgrade GCC (#5269 ) Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2020-09-24 21:14:54 -07:00
Dmitri Smirnov	89742411ec	Insert telemetry template into GPU build, add telemetry build switches. (#5278 )	2020-09-24 17:13:09 -07:00
edgchen1	6d5b93b805	Synchronize training dependency versions between Docker image and Python wheel. (#5261 ) Synchronize training dependency versions between Docker image and wheel, update docs, refactor build scripts.	2020-09-23 19:03:42 -07:00
suffian khan	417929b049	jobs timeout ..	2020-09-21 21:51:59 -07:00
Xueyun Zhu	55e4b5d302	add pipeline distributed training test (#5222 ) * add pipeline distributed training test * fix max line length error in windows build * function header indent * fix * fix flake8 error	2020-09-21 14:35:01 -07:00
Guoyu Wang	78a29aebbc	[ORT Mobile] ORT Minimal E2E CI (#5200 ) * Modify the ort minimal CI to ort minimal e2e ci	2020-09-19 18:43:22 +10:00
KeDengMS	ce3b67e0cd	[Python] Move symbolic_shape_infer from nuphar to tools (#5162 ) * [Python] Move symbolic shape inference from nuphar to tools * Fix PEP8 ERROR	2020-09-18 09:31:06 -07:00
liqunfu	f37e1292a1	--shm-size=1024m to fix nccl shared memory issue (#5214 ) * --shm-size=256m to fix nccl shared memory issue Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-09-17 17:21:47 -07:00
Guoyu Wang	8156e0dd10	[ORT Mobile] Some updates to iOS/Android build settings (#5184 ) * Update android CI and build settings * add build_java to arm64 also * Add ios signing param * fix a small build warning * address pr comments	2020-09-17 15:53:14 -07:00
Tiago Koji Castro Shibata	1a2e289d2d	Fix nuget build (#5163 ) * Fix nuget content * Revert "Fix nuget content" This reverts commit e2cdcec4e39964c50eac2fb306c7a4bb84352443. * Nuget packaging * skip tests * msbuild path * Force msbuild version * Workaround https://github.com/NuGet/Home/issues/7621 * cleanup	2020-09-16 10:37:09 -07:00
Changming Sun	a0a435abc6	Add sympy==1.1.1 to Linux docker image (#5177 )	2020-09-15 16:08:49 -07:00
Scott McKay	089789c135	Revert change to disable support for loading ORT format models in the packaging pipelines. (#5168 )	2020-09-15 15:11:06 +10:00
RandySheriffH	1dde215d96	promote cuda version on packaging pipelines (#5154 ) * promote cuda version on packaging pipelines * fix cudnn version in py packaging template Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2020-09-14 21:09:09 -07:00
RandySheriffH	9392aa2f64	Promote Cuda version to 10.2 for windows pipelines (#5138 )	2020-09-13 20:32:06 -07:00
Scott McKay	323a1ba8a4	Add option to exclude support for loading ORT format models in full build. (#5129 ) * Add ability to exclude support for loading ORT format models. Disable support for ORT format models in packages	2020-09-12 12:21:30 +10:00
RandySheriffH	120e3cda74	fix path (#5131 ) Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2020-09-11 12:18:07 -07:00
Changming Sun	c5efb0085d	Update Linux GPU build pipelines to CUDA 10.2 (#5120 ) * Update Linux GPU build pipelines to CUDA 10.2	2020-09-10 17:40:51 -07:00
Changming Sun	a5530358c9	Fix a path problem in Dockerfile.manylinux2014_cuda10_2 (#5106 )	2020-09-10 10:30:13 -07:00
Tiago Koji Castro Shibata	62848c4de5	Add store builds to nuget packaging (#5040 ) * Nuget store packaging * Move DNNL workaround to EP * Fix warning as error * Disable store tests * Skip store tests * msbuild target * Cross compile protoc in Store * Disable DML in store * Move store builds to CPU queue * Copy uap10 to final nuget * Fix pip8 error * Remove extra dml copies * Fix argparse * pep8 * Forward IsStoreBuild * Apply is_store_build to duplicate generate_nuspec * runtimes * Refactor uap10 * Store .NET * uap * PR feedback	2020-09-09 21:38:14 -07:00
RandySheriffH	5e10cde006	PipelinesForCuda11Cudnn8 (#4938 ) * cancel night build on pyop * setup win cuda11 pipeline * add debug build * test base gpu settings * setup pipelines to test cuda 10.2 and 11 * rename linux docker images * rename docker image tag and add clean up job * fix typo in cuda 11 config * set cuda11 env * update linux cuda 11 pipeline * reset docker image name * disable uninitialized warning from linux build * change the way to silence uninitialized warning * add flags to linux gpu pipeline * switch docker image for linux cuda 10.2 * switch linux cuda 10.2 image * test cuda11 with devtool8 * try latest built images Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2020-09-09 16:13:58 -07:00
Changming Sun	924ecb0623	Use manylinux2014 for Linux CPU build (#5091 )	2020-09-09 10:09:52 -07:00
gwang-msft	a1a81470e3	Add minimal build binary size verification (arm64) to Android CI (#5087 ) * Add minimal build binary size verification (arm64) to Android CI * Add comments in the CI yaml	2020-09-09 19:06:20 +10:00
gwang-msft	a40d34386a	Add Linux CPU CI for ORT minimal build (#5074 ) * initial test version * update yml * minor updates * minor updates * Test minimal build * update with include ops for minimal build ut only * error case to see build failure * test no_exceptio * Remove error cases * address pr comments Co-authored-by: gwang0000 <62914304+gwang0000@users.noreply.github.com>	2020-09-08 17:09:33 -07:00
Changming Sun	370d194db7	Add a docker file for CI build CUDA 10.2 (#5065 )	2020-09-04 16:28:45 -07:00
Scott McKay	b5c2932ae8	Last major set of ORT format model changes (#5056 ) * Add minimal build option to build.py Group some of the build settings so binary size reduction options are all together Make some cmake variable naming more consistent Replace usage of std::hash with murmurhash3 for kernel. std::hash is implementation dependent so can't be used. Add initial doco and ONNX to ORT model conversion script Misc cleanups of minimal build breaks.	2020-09-05 07:59:01 +10:00
Changming Sun	d5d5e37e76	Build system enhancements (#5012 ) 1. Add a docker file for CUDA11 2. Support setting CUDA_ARCHITECTURES from command line.	2020-09-02 10:13:26 -07:00
RandySheriffH	14b51d6502	CiPipeline@ReducedOpsBuild (#4917 ) * cancel night build on pyop * setup ci pipeline for build of reduced ops * add back c# test * remove debugging print * add testing model * add more arg in pipeline script * disable pipeline trigger temporarily * fix yaml format * fix yaml format * fix pipeline error * rid c# test * add ops for test cases * add Conv from domain com.microsoft.nchwc * remove --reduce_ops * fix typo * remove --build_java * add test case for excluded op * update doc with --skip_test * formatting code, renaming files and simplify yaml * remove debug build from yaml * remove surplus ops from included_ops.txt * add MinSizeRel build to yaml * rename test cases and models * exclude ir test from minimum build * restrict ir test to be only applied to reduced ops build	2020-08-31 21:21:18 -07:00
Ashwini Khade	8679a7244e	Enable rejecting models based on onnx opset (#4912 ) * enable rejecting models based on onnx opset * enable unreleased opsets in linux and mac CI * test fixes and more updates * enable unreleased opsets in CI builds * enable released opsets in linux cis * try fix windows ci yml * yml fixes * update yml * yml updates post master merge * review comments * bug fix	2020-08-31 13:35:36 -07:00
Hariharan Seshadri	b945225de3	Include DirectML pdb in x86 bin folder (#4953 )	2020-08-28 11:29:26 -07:00
Changming Sun	c37fa7c278	Delete Dockerfile.centos6_gpu (#4851 )	2020-08-28 09:56:52 -07:00
edgchen1	71d8846635	Fix telemetry-steps.yml (#4903 ) Fix bug in telemetry-steps.yml that causes telemetry setup to be disabled even if TELEMETRYGUID is set.	2020-08-24 22:14:40 -07:00
Changming Sun	f34ed3a576	Hot fix for the python packaging pipeline Linux ARM build (#4902 )	2020-08-24 20:14:33 -07:00
Rayan-Krishnan	eb05db5a2a	Fix OptimizerConfig params groups (#4877 ) * Copy samples to build folder and load models from there. Fix CI * This PR also includes a fix to path validation for save_as_onnx API * Add torchtext to CI for GPU training * Remove new frontend tests from CI Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>	2020-08-22 22:04:17 -07:00
liqunfu	6260d073b3	Glue parallel training (#4550 ) add mpi size, rank python API add single node parallel training example	2020-08-21 21:24:27 -07:00
Yulong Wang	c6119a548c	enable telemetry in node.js binding	2020-08-20 09:47:57 -07:00
suryasidd	3a00b50cf8	[OpenVINO-EP] Updating OpenVINO EP to 2020.4 (#4836 ) * Removed building ngraph from source * Disabled some tests temporarily * Enabled softmax for all dims * Added onnx importer to link libraries * int64 changes * fixed * temp * slice update start and end need to be initializer * Disabled GatherND, ScatterND, ReverseSequence operators * Added supported ops instead of unsupported ops * Set precision only for CPU * Removed some unecessary conditions * Fixed segfault in slice * Softmax restriction removed * changes * Setting precision for all plugins * Changes added to include precision and supported ops for gpu and vpu * branch op support * checking for disabled python test failure * mapped input names and tensors directly rather than copying which was leading to mismatch * last index is not supported mkldnn does not support pow between integers * included the code changes * Rename inner-scoped variable to avoid MSVC warning * applied changed to vadm as well and removed the utility function getinputtensors() completely * OpenVINO multi version support: CMake changes * OpenVINO multi version support: C++ support * removed commented code * Remove redundant code lines * Revert "Rename inner-scoped variable to avoid MSVC warning" This reverts commit 2f650493162675bc6fb70730de9656ec400be332. Merged separately in master. * vadm changes disabled reduction op test * putting test_gather_negative_indices in unsupported list for now * Update MCR Dockerfile with 2020.4 Installs OpenVINO 2020.4 from deb packages via APT tool. 
* Update build docs with 2020.4 info * Update dockerfile with OV 2020.4 info Instructions for building OpenVINO based docker image no longer require downloading installer package as it is installed by the dockerfile using OpenVINO 2020.4 APT package for Ubuntu 18.04 * Added constant folding bypass logic * Added cout statements for ci * Added NDEBUG flag for debug symbols * Update Ops info in docs * fixes multiple unit tests * mathoptest.ceil disabled for gpu and myriad * activation test temp disabled * Fix models for CPU * Fixed a syntax error * local cmmit * fixing unit tests for myriad * Fixed Variadic Split, Topk issues * fix_model commit * Fix models in myriad * Added ifdefs for OpenVINO 2020.4 * temp * made some changes to not operator * Added unused parameter * relu enabled * Fixed bug in Conv output * Consolidated GPU failing tests into one category * Made it compatible to InternalCI 2020.4 * Made changes for ngraph * Disabled test for mask,fastercnn,tinyyolov3 * Removed proxy for ci * run_dockerbuild.sh restored to same version * run_dockerbuild.sh restored to same version * run_dockerbuild.sh restored to same version * Updated documentation for 2020.4 * Removed FP32 to FP16 transformation for GPU * Disabled Coreml-FNS-Candy model test * Added FP16 transformations Co-authored-by: sfatimar <sahar.fatima@intel.com> Co-authored-by: Manohar Karlapalem <manohar.karlapalem@intel.com> Co-authored-by: sfatimar <sahar.fatima@intel/com> Co-authored-by: sfatimar <64512376+sfatimar@users.noreply.github.com> Co-authored-by: intel <you@example.com> Co-authored-by: gundaarx <aravindx.gunda@intel.com>	2020-08-19 23:18:08 -07:00
Changming Sun	1ba07ccfaf	Codesign validator fixes	2020-08-18 16:20:15 -07:00
Changming Sun	e98697ec28	Fix nuget cpu package pipeline (#4832 )	2020-08-17 17:08:48 -07:00
Ksenija Stanojevic	ea37a4d89b	Add Trilu custom op (#4537 ) Co-authored-by: neginraoof <neginmr@utexas.edu>	2020-08-17 14:42:26 -07:00
Thiago Crepaldi	42408aa3ed	Add new PytTrch front-end (#4815 ) * Add ORTTrainerOptions class for the new pytorch frontend (#4382) Add ORTTrainerOptions class and some placeholders * Add _ORTTrainerModelDesc to perform validation for model description (#4416) * Add Loss Scaler classes to the new frontend (#4306) * Add TrainStepInfo used on the new frontend API (#4256) * Add Optimizer classes to the new frontend (#4280) * Add LRScheduler implementation (#4357) * Add basic ORTTrainer API (#4435) This PR presents the public API for ORTTrainer for the short term development. It also validates and saves input parameters, which will be used in the next stages, such as building ONNX model, post processing the model and configuring the training session * Add opset_version into ORTTrainerOptions and change type of ORTTrainer.loss_fn (#4592) * Update ModelDescription and minor fix on ORTTrainer ctor (#4605) * Update ModelDescription and minor fix on ORTTrainer/ORTTrainerOptions This PR keeps the public API intact, but changes how model description is stored on the backend Currently, users creates a dict with two lists of tuples. One list called 'inputs' and each tuple has the following format tuple(name, shape). The second list is called 'outputs' and each tuple can be either tuple(name, shape) or tuple(name, shape, is_loss). With this PR, when this dict is passed in to ORTTrainer, it is fully validated as usual. However, tuples are internally replaced by namedtuples and all output tuples will have tuple(name, shape, is_loss) format instead of is_loss being optionally present. Additionally to that normalization in the internal representation (which eases coding), two internal methods were created to replace a namedtuple(name, shape) to namedtuple(name, shape, dtype) or namedtuple(name, shape, is_loss, dtype) dependeing whether the tuple is an input or output. This is necessary as ORTTRainer finds out data types of each input/output during model export to onnx. 
Finally, a minor fix was done on ORTTrainer. It could initialize ORTTrainerOptions incorrectly when options=None * Rename input name for test * Add ONNX Model Export to New Frontend (#4612) Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * Create training session + minor improvements (#4668) Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> * Save ONNX model in file (#4671) Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> * Add eval step (#4674) Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> * Add train_step (#4677) Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> * Add LR Scheduler (#4694) Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * Add deterministic compute tests (#4716) Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * Add legacy vs experimental ORTTrainer accuracy comparison (#4727) Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * Add Mixed precision/LossScaler + several fixes (#4739) Additionally to the mixed precision/loss scaler code, this PR includes: * Fix CUDA training * Add optimization_step into TrainStepInfo class * Refactor LRSCheduler to use optimization_step instead of step * Updated several default values at ORTTrainerOptions * Add initial Gradient Accumulation supported. 
Untested * Fix ONNX model post processing * Refactor unit tests * Add ONNX BERT example + minor fixes (#4757) * Fix training issue when passing ONNX file into ORTTrainer Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> * Add Dynamic Shape support (#4758) * Update DeepSpeed Zero Stage option to a separate option group (#4772) * Add support to fetches (#4777) * Add Gradient Accumulation Steps support (#4793) * Fix Dynamic Axes feature and add unit test (#4795) * Add frozen weights test (#4807) * Move new pytorch front-end to 'experimental' namespace (#4814) * Fix build Co-authored-by: Rayan-Krishnan <rayankrishnan@live.com> Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-08-17 09:45:25 -07:00
Changming Sun	5eec4f66ed	Refactor manylinux docker image and the related pipelines (#4751 ) 1. Publish the image to ACR, instead of building it every time for every PR 2. Make USE_MKLML and USE_OPENMP able to co-exist. Currently both of them are enabled in our Linux CI build, but only one of them takes effect. 3. Split nuphar and DNNL into separate pipelines. 4. Fix two warnings in onnxruntime/core/optimizer/matmul_scale_fusion.cc and onnxruntime/test/tvm/tvm_basic_test.cc. 5. Update the manylinux2010_x86_64 image to the latest.	2020-08-17 09:40:31 -07:00
Yulong Wang	aa993e95c9	enable build flag '--use_openmp' on MacOS (#4774 ) * enable build flag '--use_openmp' on MacOS * cmake 3.16.1 to enable find_package(OpenMP) on mac	2020-08-13 15:56:42 -07:00
jingyanwangms	adda8c66d9	Docker image release pipeline (#4682 ) * create orttraining-1p-linux-gpu-ci-pipeline.yml * fix syntax * fix file path * fix template path * publish docker image to test acr * use right task name * change parameter list * use variables * use python.version * remove --enable_onnx_tests due to segfault * add back --enable_onnx_tests * fix docker push command line * change docker login command * login differently * fix docker tag script * create password.txt * add ortrelease docker image * enable test in build.sh * add pipeline parameter * add pipeline parameter * change timeout * change timeout * fix run_dockerbuild.sh * use PR checkin build docker * fix strategy syntax * fix strategy syntax * change dockerfile * change run_dockerbuild.sh * change tag name * build with root user * use build id for docker image tag * remove all user lines * change docker tag * add mpi, mellanox * add missing args * use release dockerfile for ci build * remove install wheel * use release docker image * fix syntax * use different pool * add Dockerfile.training * remove sudo to run on Linux-Multi-GPU-V100 * change docker file path * update dockerfile * use latest dockerfile * change agent pool * remove --preserve-env * add back parameter * Add test_flag * use azuredevops docker * change repository * use cmd for docker login * echo build script * use ortrelrease ACR * change key vault connection * Move --build flag * change build command * add paramter for image tag * clean up for PR * remove unnecessary changes * whitespace changes * whitespace changes * change build flag * change flag name * change flag * use latest dockerfile * enable build tests * build builder stage and run test * Add back python.version * change build directory * always run build entire dockerfile * fix yml syntax * fix syntax * add en-UTF8 locale * rename * remove unused template * Update orttraining-linux-gpu-docker-release-pipeline.yml for Azure Pipelines * Update 
orttraining-linux-gpu-docker-release-pipeline.yml for Azure Pipelines * Test commit sha1 in pipeline * fix parameter * update docker file * fix --from=build * remove commented blocks * PR comments * fix syntax * fix syntax * use timestamp as build number * remove latest tag * add build_timestamp variable * remove wrong property * fix docker run command * test build id * Use datestamp build id * change build tags * add no-cache to docker build * rename BUILD_VERSION -> BUILD_CONFIG Co-authored-by: Jingyan Wang <jingywa@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Jingyan Wang <jingywa@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-08-12 13:29:37 -07:00
Dmitri Smirnov	ac4997665a	Make Java Publishing and Java GPU pipelines run nightly (#4749 ) Schedule Java daily. Bump up Linux GPU build timeout.	2020-08-10 17:38:45 -07:00
stevenlix	77c69a0325	Upgrade TensorRT to v7.1.3.4 (#4704 ) * upgrade to TensorRT 7.1.3.4 * Upgrade onnx-tensorrt parser for TensorRT 7.1.3.4 * fix format issue * fix format issue * fix format issue * Update tensorrt_execution_provider.cc * change cmake version to 3.14 * Remove --msvc_toolset 14.16 * change to onnxruntime::make_unique * use onnxruntime::make_unique * disable some tests for TensorRT * disable some tests for TensorRT * Update upsample_op_test.cc * Update tile_op_test.cc * disable some tests for TensorRT * Update constant_of_shape_test.cc * update parser * Update Dockerfile.ubuntu_tensorrt	2020-08-07 17:43:56 -07:00
Sheil Kumar	5c5efa900d	Add .NET Core 3.0 nuget e2e pipeline tests (#4695 ) * bump cswinrt version * add cswinrt * test dotnetcore 3.0 * rename buildpacakge source * set folder path to the package source and not the version * refactor .netframework tests * build .net core anycpu Co-authored-by: Sheil Kumar <sheilk@microsoft.com>	2020-08-05 13:02:24 -07:00
Changming Sun	d0297f8d24	Add 'Install ONNX' step to Windows GPU pipeline (#4696 ) Add 'Install ONNX' step to Windows GPU pipeline Previously it's not a problem because onnxruntime python package explicitly said it depends on ONNX, so ONNX will get installed when we test onnxruntime. However, it was removed in #4073	2020-08-03 18:51:24 -07:00
Changming Sun	01ca6392cb	Avoid building every historical ONNX version in our CI (#4678 ) 1. Avoid building every historical ONNX version in our CI; it is costly and fails easily. 2. Run the docker command without sudo. Previously the user was not in the docker group; Azure DevOps Service has now added it.	2020-08-03 10:18:10 -07:00
Changming Sun	f9f25c5559	Remove featurizer from CI build (#4661 )	2020-07-30 18:37:55 -07:00
Changming Sun	51332e3c81	Change Linux CI build time out value to 3 hours (#4664 ) Because it often needs more than 1 hr 55 minutes, increase the value so that the pipeline is less likely to fail.	2020-07-30 02:52:05 -07:00
Xiang Zhang	d73e01e5b9	remove ENABLE_TELEMETRY macro (#4633 )	2020-07-27 20:06:11 -07:00
gwang-msft	c2ec3b734b	[Android NNAPI EP] Remove dependency on external JD/DNNLibrary (#4576 ) * remove dependency of external jd-dnnlibrary * remove extra variables not used any more * update /cgmanifest.json	2020-07-22 14:08:12 -07:00
Sheil Kumar	fa6d035090	Create WindowsAI zip files automatically as part of the pipeline (#4584 ) * copy rename nupkg to zip as part of build task * update both symbols and regular package Co-authored-by: Sheil Kumar <sheilk@microsoft.com>	2020-07-22 10:53:47 -07:00
Changming Sun	c2c4e6760b	Fix code sign validation errors in nuget and nodejs pipeline (#4527 )	2020-07-20 14:18:47 -07:00
Changming Sun	bc1d197ddf	Re-enable dnnl in CI build (#4544 ) * Revert "Temporarily remove dnnl from Linux CI build to unblock the whole team (#4266)" Previously it fails because it used too much memory. Now we only run dnnl EP with opset12 models in unit tests, to reduce peak memory usage.	2020-07-19 23:20:03 -07:00
Yulong Wang	5086e55a35	Fix condition of running tests in win CI (#4459 )	2020-07-16 16:33:30 -07:00
Changming Sun	8ada440961	Move model tests to onnxruntime_test_all (#4521 ) 1. Move model tests to onnxruntime_test_all 2. Publish TestResults of Windows CI build.	2020-07-15 16:46:18 -07:00
edgchen1	34f73fa1aa	Add sudo --preserve-env option to allow environment to go through to docker commands. (#4512 )	2020-07-14 18:12:31 -07:00
liqunfu	f721f5f1cd	Liqun/multiple choice (#4480 ) * multiple choice runner * add docker cleanup task to frontend pipeline	2020-07-14 17:57:58 -07:00
Sheil Kumar	ee5ca27ae2	Split Microsoft.AI.MachineLearning.nupkg in a NuGet package and symbol NuGet package (#4503 ) * add threadpool interface * generate snupkgs * include_pdb check * fix snupkg generation * Add task to merge snupkgs * folder exists * check dir * revert thread pool stuff Co-authored-by: Sheil Kumar <sheilk@microsoft.com>	2020-07-14 14:52:39 -07:00
gwang-msft	5f8f443ac4	Android CI build, test copy, emulator boot improvement (#4481 ) * Enable onnxruntime_test_all for NNAPI EP * switch to use ninja for Android CI * make android emulator boot faster in android ci * simplify adb push * more style change * more tweaking on android ci * build.py style update	2020-07-13 14:18:34 -07:00
Dmitri Smirnov	35ee00d888	Pin typing version. (#4490 )	2020-07-13 11:48:30 -07:00
Hariharan Seshadri	26ebcfab88	Fix Nuget GPU pipeline (#4462 )	2020-07-10 14:02:28 -07:00
Yulong Wang	bec18eb3f4	[Node.js binding] support CentOS 7 in CI (#4447 )	2020-07-09 00:59:50 -07:00
Negin Raoof	71aec2adcb	Custom op export test template (#4383 ) * Adding pytorch custom op export tests to CI * Test clean build * Fix export for intended failure * update export script * Build onnxruntime	2020-07-08 10:14:56 -07:00
Hariharan Seshadri	6d6b6b54a5	Support binding a graph output to a specific device via the Python binding (#4439 )	2020-07-07 21:09:37 -07:00
Sheil Kumar	fdb4a3a2e8	Add cppwinrt and cswinrt tests in windowsai nuget pipeline (#4381 ) * build e2e cppwinrt tests * add use nuget task * make all references to package version prop/target-ified * remove dupe props/targets reference * work around project.assets.json error by deleting it * powershell test invocation * switch to batch script * print debug info * update x86->x64 * stdio.h * pushd/popd * add csharp tests * package.config -> packages.config * typo * x86 -> anycpu * debug is default * add test path * update csproj as well * debug * really replace all package versions * debug output * really use [PackageVersion] * sleep instead of converting async operation to task and waiting * don't close software bitmap * switch to powershell script * remove binding check * continue on failure * continue on error action * continueOnError and errorActionPreference * tabbing Co-authored-by: Sheil Kumar <sheilk@microsoft.com>	2020-07-07 09:36:42 -07:00
suffiank	7a05b3ca87	Increase python packaging pipeline timeout (#4412 ) * increase python packaging pipeline from 90 to 110 min * change timeout to Linux GPU and do 120 min to match Win GPU	2020-07-02 15:38:39 -07:00
gwang-msft	0bef9d5114	Fix the broken Android NNAPI CI (#4403 ) * Change NNAPI CI to run on new NNAPI EP * update android ci to mac 10.15 and remove in install cmake * update the android ci to target android api level 29 * remove unnecessary ndk install git submodule call	2020-07-02 10:22:18 -07:00
Changming Sun	3bb6a865cc	Revert "remove openmp and scipy from build pipelines (#4305 )"	2020-07-02 00:30:02 -07:00
Tiago Koji Castro Shibata	7fea332f93	Support builds without RTTI (#4333 ) * Support builds without RTTI * Disable RTTI in all builds	2020-07-01 13:05:35 -07:00
Dmitri Smirnov	49268c42da	Change the way java home is set on Mac OS for CI and Java publishing pipeline (#4385 ) * Change the way java_home is set on Mac. * Change the way JAVA_HOME is set on Mac OS	2020-07-01 07:37:14 -07:00
Negin Raoof	37cbe8551d	Adding export registration and tests for custom ops (#4248 )	2020-06-25 22:29:02 -07:00
Changming Sun	5db67ec000	Fix python package issue and upgrade the linux image to 2010 (#4342 ) 1. Increase job timeout, while we are investigating why the tests take much longer 2. Upgrade the linux docker image to manylinux2010, by request from Tianlei. (We had an offline discussion with Pranav and Tracy) 3. Remove the installation of "devtoolset-7" in the CUDA image. It was added for CUDA 10.0, it is not needed for CUDA 10.1. We have moved to CUDA 10.1.	2020-06-25 20:22:39 -07:00
Dmitri Smirnov	a08805daf9	Fix a minor typo in POM file name (#4250 ) Co-authored-by: Changming Sun <chasun@microsoft.com>	2020-06-25 11:15:14 -07:00
Changming Sun	deea945f80	Remove openmp and scipy from build pipelines (#4305 ) 1. Remove openmp because the default thread pool is already good enough. 2. Remove scipy from build pipelines because it stopped supporting Python 3.5.	2020-06-23 20:18:16 -07:00
edgchen1	4e39fda06a	Fix version of torch and torchvision in install_deps.sh. (#4316 )	2020-06-23 14:55:18 -07:00
edgchen1	737c22a911	Refactor Python packaging builds (#4283 ) Reuse the same template file for all Python packaging builds.	2020-06-22 17:13:22 -07:00
Pranav Sharma	2204d39a06	Add build option to disable traditional ML ops from the binary. (#4272 ) * Add build option to disable traditional ML ops from the binary. * Fix python tests by splitting tests for ML ops to a separate file. Exclude ML tests from onnx_test_runner and C# tests. Exclude ML op sources. * Update Edge pkg pipelines with new MLops env variable and fix C# packaging pipeline tests to skip ML ops.	2020-06-20 06:36:06 -07:00
Changming Sun	0349479b19	Fix component governance and codesign validation errors (#4277 ) Adjust the job steps so that these security tasks run before the build directory clean up.	2020-06-18 15:54:18 -07:00

574 commits