onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-03 03:58:54 +00:00

Author	SHA1	Message	Date
Adam Louly	3bb5fb0f90	moving training pipelines from cuda 11.5 to 11.6 and deprecating 11.3 (packaging pipeline) (#12688 ) * moving training pipelines from cuda 11.5 to 11.6 and deprecating cuda 11.3 * change to cuda 11.6.2 * change pytorch's & torchvision's cuda version to 11.6 * specify deps version to 11.6.2 * update pytorch and torch text version * torch 1.12.1 * change torchvision and torchtext version to be compatible with torch 1.12.1 * change cuda to 11.6 for cuda_home compatibility Co-authored-by: Adam Louly <adamlouly@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2022-08-25 22:12:01 -07:00
Adam Louly	94f76b944e	nightly pipeline build using PTCA image. (#12605 ) * nightly pipeline yaml and requirements files * changed names, removed torchvision installing * delete old file Co-authored-by: Adam Louly <adamlouly@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2022-08-24 10:40:55 -07:00
Wei-Sheng Chin	dc486d146b	Make ORT callable from various Pytorch compilers (LazyTensor, TorchDynamo, etc) (#10460 ) * Make ORT as Pytorch JIT backend LORT likely doesn't work with aten fallback so we only test LORT in its own CI. * Revert changes to enable external CUDA allocator. Will add it later. Revert "Revert changes to enable external CUDA allocator. Will add it later." This reverts commit d5487f2e193014c805505afae8fb577c53667658. Fix external allocator * Relax tolerance and remove commented code * Print more information in CI * Fix pointer * Address comments. 1. Reuse ORT-eager mode's environment. 2. Remove unused ctor. * Use Pytorch master branch as all PRs are merged Fix * Refine based on cpplint feedbacks * Revert changes to allow custom CUDA allocator in public APIs * Use torch.testing.assert_close * Use unittest framework * Switch docker repo * Rename .cpp to .cc * Address comments * Add comment * Use same pipeline file for eager and lort pipelines * Address comments * Add yaml comment * Fix cmake files * Address comments * Rename flags, remove printing code, remove dead comment	2022-08-22 09:40:40 -07:00
Changming Sun	b270334e1e	Update numpy version from 1.21.0 to 1.21.6 to avoid building it from source (#12644 )	2022-08-18 22:11:48 -07:00
Changming Sun	e810480403	Replace the occurrences of "master" to "main" in yaml files (#12534 )	2022-08-09 22:03:21 -07:00
Vincent Wang	e85e31ee80	Update ORTModule Default Opset Version to 15 (#12419 ) * update ortmodule opset to 15 * update torch version * fix ut * fix ut * rollback * rollback for orttrainer	2022-08-05 16:55:04 +08:00
PeixuanZuo	3e1b0ac4b3	[DELETE] delete python package rocm4.3.1 (#12480 ) [delete] delete rocm4.3.1	2022-08-05 13:27:42 +08:00
Changming Sun	7b4ce0c1e1	Delete the build scripts that were copied from manylinux project (#12358 ) 1. Delete the build scripts that were copied from manylinux project. Use "git checkout" instead. 2. Update manylinux version to get python 3.11. Related issue: Python 3.11 support #12343 3. Change the cuda version of linux gpu build job of nuget packaging pipeline from cuda 11.4 to cuda 11.6 to match the TRT job within the same pipeline. (A lot of other places need to be updated as well, but I'd prefer to put them in another PR) 4. Make dockerfile names static. For example, replace tools/ci_build/github/linux/docker/$(DockerFile) with tools/ci_build/github/linux/docker/Dockerfile.manylinux2014_cpu . The former relies on a runtime variable $(DockerFile), but template parameters are expanded early in processing a pipeline run, when most variables are not available. It is like C++ macros vs. variables.	2022-07-29 18:24:19 -07:00
Jian Chen	7a7e372b9f	Remove training cuda 10.2 pipeline (#12347 ) * update to 2022 * Update the VS version * Rolling back to gcc 10 * Rolling back * Update cuda home * remove "CMAKE_CUDA_ARCHITECTURES=52" * update cuda architecture to 70 * Delete cuda 10.2 training pipeline * rolling back a mistake * Update win-gpu-reduce-op-ci-pipeline.yml * Update win-gpu-reduce-op-ci-pipeline.yml * Update win-gpu-reduce-op-ci-pipeline.yml * Delete tools/ci_build/github/linux/docker/scripts/training/ortmodule/stage1/requirements_torch1.10.0_cu10.2 directory * Delete tools/ci_build/github/linux/docker/scripts/training/ortmodule/stage1/requirements_torch1.11.0_cu10.2 directory	2022-07-28 14:58:17 -04:00
msftlincoln	9cf6912bba	Fix ORT Eager Mode to work with Pytorch 1.12 (#12323 )	2022-07-27 16:24:46 -04:00
pengwa	2b2367efbf	Fix orttraining-linux-gpu-ci-pipeline (fairscale dependency) (#12320 ) authored by: @pengwa	2022-07-26 15:11:04 -07:00
LironKesem	9647a3be40	Add tests for all unary aten ops supported in eager mode (#12087 ) * Add tests for all unary aten ops supported in eager mode * fixing the PR draft * fixing the merge * changing eval to be at compile time * adding requirements for eager * 1.adding function to {ops}_out 2.cleaning the code and adding comments * editing the code according to code review Co-authored-by: root <root@AHA-LIRONKESE-1>	2022-07-12 08:53:19 -04:00
PeixuanZuo	1c39d22f4e	[ADD] Rocm5.2 for Rocm python packaging pipeline (#12129 ) [ADD] rocm5.2	2022-07-11 11:10:45 +08:00
Wil Brady	fdf12a5c35	Fix windows eager build break by pinning to torch version 1.11.0 (#12033 ) Fix windows and linux eager build to torch 1.11.0.	2022-06-30 07:01:13 -04:00
pengwa	c398ad513f	Fix orttraining-linux-ci-pipeline - Symbolic shape infer (#11965 ) fix symbolic shape error due to upgraded numpy + legacy sympy	2022-06-23 08:23:36 -07:00
Gary Miguel	4bf22e2a40	Update ONNX to 1.12 (#11924 ) Follow-ups that need to happen after this and before the next ORT release: * Support SequenceMap with https://github.com/microsoft/onnxruntime/pull/11731 * Support signal ops with https://github.com/microsoft/onnxruntime/pull/11778 Follow-ups that need to happen after this but don't necessarily need to happen before the release: * Implement LayerNormalization kernel for opset version 17: https://github.com/microsoft/onnxruntime/issues/11916 Fixes #11640	2022-06-21 17:19:52 -07:00
Adrian Lizarraga	b20daeda81	Update Linux Multi GPU TensorRT pipeline to TensorRT 8.4 (#11923 ) * Try manually installing trt8.4 in multi-gpu pipeline * Remove stmts that clean up cmake, ctest. Update tensorrt repository name passed to get_docker_image.py * Update trt and cudnn home * Don't install trtexec cli tool. * Increase job timeout * Revert timeout change and use trt placeholder builder build option	2022-06-21 07:59:11 -07:00
Yi Zhang	7f1e9e8c67	Bash: there should be a whitespace after not operator. (#11910 ) add whitespace after not	2022-06-21 05:14:32 +08:00
sfatimar	f97bd38c4f	UEP 4.1 release (#11834 ) * Add pypi build changes to latest Master * Add ORT training part of OV build * Disabling SqueezeOpTest.BadAxes * Add ONNXruntime branch ARG to Docker build * Changes to include file details versions * Commit File Version Updates * Change naming for linux build * Add fix for pylint format errors * Fix pylint warnings. * Fix pylint errors - stage 2 Signed-off-by: Preetha Veeramalai <preetha.veeramalai@intel.com> * Fix pylint errors - stage 3 * Fix pylint format - stage4 Signed-off-by: Preetha Veeramalai <preetha.veeramalai@intel.com> * Commit for Wheel Release >0.35.1 Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com> Co-authored-by: mayavijx <mayax.vijayan@intel.com> Co-authored-by: Sahar Fatima <sfatima.3001@gmail.com> Co-authored-by: nmaajidk <n.maajid.khan@intel.com>	2022-06-17 14:49:04 -07:00
Yi Zhang	f70201c801	Make sure the command works in both centos and ubuntu. (#11894 ) make one bash condition compatible with POSIX	2022-06-17 12:19:22 -07:00
Adrian Lizarraga	ad4abbd75e	[EP-Perf-Dashboard] Add support for TensorRT 8.4 to EP Perf Dashboard (#11876 ) Co-authored-by: George Wu <jywu@microsoft.com>	2022-06-17 09:16:51 -07:00
Yi Zhang	8bb0062873	add manylinux_2_27 CPU wheel (#11886 ) * add manylinux_2_27 * minor refactory * change base image * minor refactor * add tests * fix condition	2022-06-17 19:38:38 +08:00
Changming Sun	10478a09ca	Revert "add manylinux_2_27 wheel (#11832 )" This reverts commit `bbace23d0c`.	2022-06-16 18:28:12 -07:00
George Wu	df5ee6aa4e	[TensorRT EP] support TensorRT 8.4 (#11866 ) * update trt 8.4ga * trt 8.4 linux ci pipeline * fix cmake * placeholder_builder * trt 8.4 windows pipeline * gpu package pipeline * trt 8.4.1.5 , packaging pipeline updates * python packaging * ctest timeout * python packaging test * bump timeout * python format * format * revert * newline * enable trt python tests * typo * python format * disable on windows	2022-06-16 07:46:40 -07:00
Yi Zhang	bbace23d0c	add manylinux_2_27 wheel (#11832 ) * add manylinux_2_27	2022-06-15 10:26:51 +08:00
Vincent Wang	5ecfaef042	ATen Fallback for Inference (#11597 ) * aten op for inference * fix build error * more some code to training only * remove domain from operator name * move aten_op_executor ext out from ortmodule * add pipeline * add exec mode * fix script * fix ut script * fix test pipeline * failure test * rollback * bugfix * resolve comments * enable aten for python build only * fix win build * use target_compile_definitions * support io binding * turn off aten by default * fix ut Co-authored-by: Vincent Wang <weicwang@microsoft.com> Co-authored-by: zhijxu <zhijxu@microsoft.com>	2022-06-09 16:07:30 +08:00
Valery Chernov	4296968f20	[TVM EP] update set input method for VirtualMachine (#11674 ) * update TVM * get alignment constant from TVM * update TVM_VM_SetInputs to upstream with TVM API * fix CI issue: update TVM EP dependencies * add sudo * revert changes needed to install missing package * add package for TVM EP CI Co-authored-by: Valery Chernov <valery.chernov@deelvin.com> Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru>	2022-06-04 09:31:01 +02:00
Changming Sun	d5e34acb82	Remove git and python packages from the docker images used by Zip-Nuget-Java-Nodejs Packaging Pipeline (#11651 )	2022-06-03 20:00:54 -07:00
leqiao-1	2ac3649752	Update requirements.txt (#11682 ) set protobuf version	2022-06-01 12:31:21 +08:00
Changming Sun	6a45f9f059	Pin protobuf version to 3.18.1 (#11645 )	2022-05-26 21:14:56 -07:00
PeixuanZuo	c556f5f22f	Add AMD python package ROCm5.1.1+torch1.11 (#11516 ) * [FIX] fix name error * [ADD] add rocm5.1.1 python package * [ADD] torch1.10.0 rocm requirements * [UPDATE] update docker Repository name	2022-05-16 08:14:11 +08:00
Changming Sun	027fc1d391	Completely delete ORT server	2022-05-10 22:02:21 -07:00
Changming Sun	903743e823	Delete unused TRT docker files (#11486 ) * Delete unused TRT docker files * revert tools/ci_build/github/linux/docker/Dockerfile.manylinux2014_cuda11_4_tensorrt8_0	2022-05-10 22:00:53 -07:00
Changming Sun	0ac2e6e546	Update install-entrypoint.sh: add version lock for NCCL (#11475 )	2022-05-10 15:37:55 -07:00
Justin Chu	a1f9847b23	[Fix] Add the extra param to match gelu in PyTorch in the contrib symbolic function (#11318 ) Description: Add the extra param to match gelu in PyTorch in the contrib symbolic function Motivation and Context Why is this change required? What problem does it solve? The symbolic function in /onnxruntime/python/tools/pytorch_export_contrib_ops.py is missing a recently added parameter approximate. We add this parameter and use the exporter defined gelu if approximate is "tanh".	2022-05-04 10:36:38 -07:00
Olivia Jain	49d7050b88	Create Checkout Submodules Script (#11344 ) * move all logic for ubuntu dockerfiles * pass in trt version * update trt 8.0 file * downgrade protobuf * uncomment * and * change to 8.0 * update dockerfiles * checkout protobuf based on version * adding last dockerfile: : * checkout 3.10 protobuf * fix checkout version * update to 8.2 * keep only one submodule sync * cleanup * Delete Dockerfile.custom-trt-perf * create checkout submodules script * properly compare decimals in bin/sh * combine build ort paths * deprecate TRT 7.2 * only checkout protobuf if we checkout older onnx-tensorrt * only pull nvidia container if true, update image * downgrade protobuf only if we checkout onnx-trt * Update linux-gpu-tensorrt-daily-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-daily-perf-pipeline.yml for Azure Pipelines * Add quotes to avoid path splitting * address shellcheck * use shellcheck suggestions	2022-04-29 13:04:26 -07:00
Justin Chu	fdce4fa6af	Format all python files under onnxruntime with black and isort (#11324 ) Description: Format all python files under onnxruntime with black and isort. After checking in, we can use .git-blame-ignore-revs to ignore the formatting PR in git blame. #11315, #11316	2022-04-26 09:35:16 -07:00
ytaous	eec5187801	Remove Rocm 4.2 from CI Build (#11130 ) * remove rocm42 CI * update torch to v1.11.0 Co-authored-by: Ethan Tao <ettao@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2022-04-07 11:42:09 -07:00
Maajid khan	81fa28bc56	OpenVINO-EP v4.0 Release PR with OpenVINO 2022.1 (#11025 ) * Enabling ov-ep for 2022.1 Release ->Added ov-ep 2022.1 flow ->Validated CPU Unit tests with OV Master using onnxruntime_test_all unit tests. Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fix for output mismatch b/w OpenVINO and ONNX Refer: https://jira.devtools.intel.com/browse/CVS-60310 Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Enabling Adobe ops ->Enable Resize op for iGPU ->Enable Add op for iGPU Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Removing irrelevant conditions ->Removing some conditions from GetCapability() which are now not required. (Removed conditions for OV version support less than 2021.2) Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Enable upsample op Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Enable Adobe proxy-e model Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Removing any extra conditions for Opset13 ops * Opset13 changes Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Exception handling for devices * Added comments * Implement GPU Throttling feature Added GPU Throttling feature for iGPU's. when user enables it as a runtime option, it helps in reducing overall CPU usage of the application Added changes to exercise this option using onnxruntime_perf_test application. 
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Renaming the runtime config option Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Added the user to video and users group * Handling_GPU.0_GPU.1 * Handling special conditions ->Handling corner cases for device_type checks Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Modification to include new api 2.0 changes in the code * Added opset13 changes ->Enabled Few ops ->Added Debug info for case 3b in getcapability() Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Enabling ov-ep for 2022.1 Release ->Added ov-ep 2022.1 flow ->Validated CPU Unit tests with OV Master using onnxruntime_test_all unit tests. Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fix for output mismatch b/w OpenVINO and ONNX Refer: https://jira.devtools.intel.com/browse/CVS-60310 Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Enabling Adobe ops ->Enable Resize op for iGPU ->Enable Add op for iGPU Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Removing irrelevant conditions ->Removing some conditions from GetCapability() which are now not required. (Removed conditions for OV version support less than 2021.2) Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Enable upsample op Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Enable Adobe proxy-e model Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Removing any extra conditions for Opset13 ops * Opset13 changes Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Exception handling for devices * Added comments * Implement GPU Throttling feature Added GPU Throttling feature for iGPU's. when user enables it as a runtime option, it helps in reducing overall CPU usage of the application Added changes to exercise this option using onnxruntime_perf_test application. 
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Renaming the runtime config option Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Added the user to video and users group * Handling_GPU.0_GPU.1 * Handling special conditions ->Handling corner cases for device_type checks Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Added opset13 changes ->Enabled Few ops ->Added Debug info for case 3b in getcapability() Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Log comments updated * Changes to enable 2.0 api * Enabling ov-ep for 2022.1 Release ->Added ov-ep 2022.1 flow ->Validated CPU Unit tests with OV Master using onnxruntime_test_all unit tests. Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fix for output mismatch b/w OpenVINO and ONNX Refer: https://jira.devtools.intel.com/browse/CVS-60310 Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Enabling Adobe ops ->Enable Resize op for iGPU ->Enable Add op for iGPU Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Removing irrelevant conditions ->Removing some conditions from GetCapability() which are now not required. (Removed conditions for OV version support less than 2021.2) Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Enable upsample op Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Enable Adobe proxy-e model Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Removing any extra conditions for Opset13 ops * Opset13 changes Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Exception handling for devices * Added comments * Implement GPU Throttling feature Added GPU Throttling feature for iGPU's. when user enables it as a runtime option, it helps in reducing overall CPU usage of the application Added changes to exercise this option using onnxruntime_perf_test application. 
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Renaming the runtime config option Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Added the user to video and users group * Handling_GPU.0_GPU.1 * Handling special conditions ->Handling corner cases for device_type checks Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Added opset13 changes ->Enabled Few ops ->Added Debug info for case 3b in getcapability() Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fix build issue Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fixes issues Fixes compiler warnings c4458 on windows. Fixes the bug in device_type check logic Adds print info for enable_opencl_throttling option in onnxruntime_perf_test Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> commit to make openvino_2021.4 compatible * Fixed IO Buffer Optimization * Fix output names issue * Fix 2021.3 branch * Bug Fix for Multiple inputs/outputs - Assigns the right output_name and input_name for the graph when returned by CompiledModel::inputs() OV function. - Also takex care of output mismatch issue b/w openvino output and onnx output Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Add comments for the changes made Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * IO Buffer Changes * Commit for Disabling GPU Throttling for 2021.4 * Updated branch * Fix windows build ->Fixed windows build in debug mode ->Disabled scatternd3_tensor_int64 Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fixed CPP Unit tests for CPU -Fixed shrink, MVN, ReduceL2, Maxpool, upsample, scatter, slice, reshape, unsqueeze. 
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fixed first set of GPU Tests Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fixed additional failing tests on GPU ->Added conditions to disable certain ops under certain conditions ->Disabled certain tests ->Added some op supports for no_dimension supported Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Added Expand op support for CPU Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Added condition for squeeze op ->Shape can't have empty axes attribute Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Add support for LessOrEqual op function Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * OV Interface wait for replaced by indefinite wait call * use names from ONNX model to access OV tensors This chnage is to use the input/output names retrieved from original onnx model to access OV tensors and to check if there's any input or output names mismatch b/w ONNX naming and OV naming. Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fixes Myriad unit tests and other issues ->Fixes Myriad CPP unit tests ->Fixes output mismatch issue with models with sub graph partitioning Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fix segfault issue ->Fixed case 3b condition in get_capability() which was causing the segfault issue Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fixed build isuse with ov 2021.4 with I/O buffer Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Disables performance counters for I/O Buffer Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fixed inputs/outputs mismatch for HDDL with 2022.1 Signed-off-by: Mohammad Amir Aqeel <mohammadx.amir.aqeel@intel.com> * Fix to enable GPU FP16 * Enabled mlperf_ssd_mobilenet_300 model fully on CPU Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Added ov version specific dll packaging for nuget * Fixed conditions for few ops Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Dockerfile updates * Updated License Info 
-Updated the copyrights License Info -modified FP16 transformations with OV 2022.1 Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Disabling mlperf_ssd_mobilenet_300 model ->Disabled this model for openvino. The test is failing in Internal_CI pipelines. Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Disabling failing python CPU Tests Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fixed flake8 python errors Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> Co-authored-by: hdgx <harinix.d.g@intel.com> Co-authored-by: mayavijx <mayax.vijayan@intel.com> Co-authored-by: sfatimar <sahar.fatima@intel.com> Co-authored-by: mohsinmx <mohsinx.mohammad@intel.com> Co-authored-by: Mohammad Amir Aqeel <mohammadx.amir.aqeel@intel.com>	2022-04-06 13:30:33 -07:00
Changming Sun	fc7fe0012f	Fix: nodejs installer file name is wrong (#11097 )	2022-04-04 16:24:08 -07:00
Baiju Meswani	f9940f17b1	Remove extra-index-url to avoid nuget security analysis vulnerability (#11082 )	2022-04-01 18:30:55 -07:00
Baiju Meswani	249c4dec7f	Update orttraining release pipelines to use torch 1.11.0 (#11018 ) * Update orttraining release pipelines to use torch 1.11.0 * Change requirements_torch...txt to requirements.txt * Update cuda cmake architectures and clean up old files	2022-03-31 21:51:06 -07:00
dependabot[bot]	79e4ed8064	Bump pytorch-lightning Bumps [pytorch-lightning](https://github.com/PyTorchLightning/pytorch-lightning) from 1.5.10 to 1.6.0. - [Release notes](https://github.com/PyTorchLightning/pytorch-lightning/releases) - [Changelog](https://github.com/PyTorchLightning/pytorch-lightning/blob/master/CHANGELOG.md) - [Commits](https://github.com/PyTorchLightning/pytorch-lightning/compare/1.5.10...1.6.0) --- updated-dependencies: - dependency-name: pytorch-lightning dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2022-03-31 16:51:24 -07:00
Yulong Wang	179406bd25	[JS] upgrade package-lock.json from v1 to v2 (#11039 ) * upgrade package-lock.json from v1 to v2 * upgrade requirement of nodejs version to 16.x	2022-03-30 13:30:28 -07:00
raviskolli	480c793125	Update training packages to Pytorch 1.11.0 (#10851 ) * Update ortmodule training packages to Pytorch 1.11.0 Co-authored-by: Harshitha Venkata <havenka@microsoft.com> Co-authored-by: Baiju Meswani <bmeswani@microsoft.com>	2022-03-22 16:45:51 -07:00
leqiao-1	a6ea278502	add python3.10 support (#10848 ) * add python3.10 support * upgrade numpy version in build pipeline * add python 3.10 path * upgrade torch version in build pipeline * update docker run arguments * change torch version * fix typo * fix permission issue * change python version * remove python3.10 for openvino build * remove python 3.10 for openvino build	2022-03-21 09:46:02 +08:00
Chun-Wei Chen	5202efd11e	remove unused six in code and CIs (#10832 )	2022-03-10 15:38:44 -08:00
Changming Sun	cc6bc34c8c	Update protobuf submodule (#10801 )	2022-03-09 09:37:58 -08:00
dependabot[bot]	7e04dccca7	Bump numpy in /tools/ci_build/github/linux/docker/scripts (#10385 ) Bumps [numpy](https://github.com/numpy/numpy) from 1.16.6 to 1.21.0. - [Release notes](https://github.com/numpy/numpy/releases) - [Changelog](https://github.com/numpy/numpy/blob/main/doc/HOWTO_RELEASE.rst.txt) - [Commits](https://github.com/numpy/numpy/compare/v1.16.6...v1.21.0) --- updated-dependencies: - dependency-name: numpy dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-03-08 11:02:36 -08:00
Changming Sun	6260733533	Fix eager mode pipeline (#10802 ) It was still using python 3.6	2022-03-08 09:26:20 -08:00
liqun Fu	da885a72e8	update with onnx 1.11 release (#10441 )	2022-03-07 21:10:55 -08:00
dependabot[bot]	4d943c9bd3	Bump numpy from 1.16.6 to 1.21.0 in /tools/ci_build/github/linux/docker/scripts/manylinux (#10387 ) * Bump numpy in /tools/ci_build/github/linux/docker/scripts/manylinux	2022-03-07 20:39:49 -08:00
PeixuanZuo	c07a27a008	[FIX] delete python3.6 from AMD python package docker image builder (#10790 ) * [UPDATE] delete python3.6 to cooperate numpy==1.21.0 * [UPDATE] delete python3.6 to cooperate numpy==1.21.0	2022-03-07 18:21:43 -08:00
dependabot[bot]	e3c85d4262	Bump numpy Bumps [numpy](https://github.com/numpy/numpy) from 1.19.5 to 1.21.0. - [Release notes](https://github.com/numpy/numpy/releases) - [Changelog](https://github.com/numpy/numpy/blob/main/doc/HOWTO_RELEASE.rst.txt) - [Commits](https://github.com/numpy/numpy/compare/v1.19.5...v1.21.0) --- updated-dependencies: - dependency-name: numpy dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2022-03-04 09:51:32 -08:00
dependabot[bot]	b780a3784e	Bump numpy in /tools/ci_build/github/linux/docker/scripts/training Bumps [numpy](https://github.com/numpy/numpy) from 1.19.5 to 1.21.0. - [Release notes](https://github.com/numpy/numpy/releases) - [Changelog](https://github.com/numpy/numpy/blob/main/doc/HOWTO_RELEASE.rst.txt) - [Commits](https://github.com/numpy/numpy/compare/v1.19.5...v1.21.0) --- updated-dependencies: - dependency-name: numpy dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2022-03-04 09:38:38 -08:00
dependabot[bot]	0b0e8ccf92	Bump numpy Bumps [numpy](https://github.com/numpy/numpy) from 1.19.5 to 1.21.0. - [Release notes](https://github.com/numpy/numpy/releases) - [Changelog](https://github.com/numpy/numpy/blob/main/doc/HOWTO_RELEASE.rst.txt) - [Commits](https://github.com/numpy/numpy/compare/v1.19.5...v1.21.0) --- updated-dependencies: - dependency-name: numpy dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2022-03-04 09:34:58 -08:00
Baiju Meswani	f9b6eef05f	orttraining packaging pipeline for rocm 5.0.1 (#10725 )	2022-03-02 12:32:14 -08:00
leqiao-1	8d06e5a9df	Add openvino base image option (#10581 ) * add selectable python package build pipeline * update tensorrt version * update tensorrt version * Update Dockerfile.ubuntu_openvino * Update install_ubuntu.sh * add parameters for openvino base image * fix syntax error	2022-02-17 17:10:01 +08:00
leqiao-1	f22cd3af5d	Leqiao/add selectable pipeline (#10560 ) * add selectable python package build pipeline * update tensorrt version * update tensorrt version	2022-02-16 09:07:29 +08:00
Changming Sun	feae842a7c	Update pytorch-lightning (#10421 )	2022-01-27 21:15:00 -08:00
Thiago Crepaldi	6a7d3deb22	Update pytorch-lightning (#10276 )	2022-01-14 16:49:10 -05:00
Baiju Meswani	2affd6e71e	orttraining packaging and ci pipelines to use cuda 11.3 (#10252 )	2022-01-13 13:36:33 -08:00
Olivia Jain	4048ed326c	Update EP Perf Pipeline (#10149 ) * migrate to 1ES Hosted Pool * migrate to Kusto database * refactor and organize ep names with ORT prefix * standardize TRT benchmarking with save/load engine, input binding, and workspace * Add TRT 8.2 to ep perf pipeline * update model_list.json with full onnx zoo * add anubis credentials * add anubis credentials * clarify trt variables * get system info from docker image * remove unwanted commenting	2022-01-11 16:12:32 -08:00
George Wu	91f85dfdad	update Dockerfile.manylinux2014_cuda11_4_tensorrt8_2 to TensorRT 8.2.2.1 (#10167 )	2022-01-03 20:38:37 -08:00
Abhishek Jindal	d5742f3a43	moving from torch nightly build to stable build (#10150 ) * moving from torch nightly build to stable build * using torch cpu version * using torch cpu version from link	2021-12-29 19:35:10 -08:00
George Wu	3d6786c92e	update tensorrt multi gpu pipeline to tensorrt 8.2 (#10141 )	2021-12-27 15:43:27 -08:00
George Wu	16274beb6f	update TensorRT EP to use TensorRT 8.2 (#9981 ) * update base image from 11.4.0 to 11.4.2 * update Linux TRT GPU pipeline to TRT 8.2 * update onnx-tensorrt to 8.2-GA * disable failing TensorRT 8.2 tests. * update pad test. * fix * update win trt ci pipeline to trt 8.2 * test run with cuda 11.4 and cudnn 8.2 * increase timeout * revert * revert * update packaging pipelines to use trt 8.2 * fix typo * update trt gpu perf pipeline to trt 8.2 * increase timeout * delete deprecated ci-perf-pipeline.yml * bump timeout * adjust timeout packaging	2021-12-15 15:59:31 -08:00
Suffian Khan	7e55a942cd	Add torch 1.10 requirements for rocm (#10028 )	2021-12-13 20:39:58 -08:00
Xavier Dupré	42c176b60c	Update default opset to 14 in ORTModule (#9743 ) * update to torch 1.10 * update torchvision version * update torchtext version * remove deprecated option enable_onnx_checker * add unit test to test gradient of GatherElements * add ORTMODULE_ONNX_OPSET_VERSION in a docker file	2021-12-09 12:45:35 +01:00
Tang, Cheng	8db49e3d0f	add ortmodule and eager mode test (#9888 ) * add ortmodule and eager mode test * add ortmodule dependency * fix eager pipeline * skip the ortmodule test for windows due to win ci issue * remove useless win ci change * add torch Co-authored-by: Abhishek Jindal <abjindal@microsoft.com>	2021-12-02 19:49:18 -08:00
Maajid khan	0ae0f29f14	[OpenVINO-EP] V3.4 Release with OpenVINO 2021.4.2 LTS Release (#9848 ) * Changes to ensure openvino build go through in Windows * Modified Hetero plugin Logic Modified Hetero Feature logic. In Hetero, if the operator to be marked true in getcapability(), it should be supported by either of the devices specified with HETERO in the device_type. Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> OV updated to 2021.4.2 version * OV updated to 2021.4.2 version * Updated OV to 2021.4.2 version, mono download link and dotnet version * Copying Managed nugets in openvino c# docker file *Copying Managed nuget to nugets artifacts directory Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> Co-authored-by: saharfraza <sfatima.3001@gmail.com> Co-authored-by: mayavijx <mayax.vijayan@intel.com> Co-authored-by: Aravind Gunda <aravindx.gunda@intel.com>	2021-11-23 13:12:08 -08:00
Changming Sun	4ca11b05a5	Remove python 3.10 from rocm docker image (#9749 ) * Remove python 3.10 from rocm docker image * update	2021-11-15 12:49:59 -08:00
raviskolli	9f4e8cf6a0	Update training pipelines to pytorch 1.10 (#9709 ) * Update training pipelines to pytorch 1.10 * Fixed a typo in cuda version. * Downgraded gcc to 8 for cuda 10.2	2021-11-15 11:21:55 -08:00
Changming Sun	de018f58e8	Update manylinux build scripts (#9701 )	2021-11-09 11:55:49 -08:00
Yulong Wang	c6fddb263f	Add Node.js binding support to packaging pipeline (#9577 )	2021-11-05 15:29:40 -07:00
Hariharan Seshadri	b5f7bb7d10	Update ONNX (#9462 )	2021-10-29 10:33:40 -07:00
Changming Sun	87b1fddd97	Add Linux/MacOS ARM64 support to nuget packaging pipeline (#9570 )	2021-10-27 19:00:43 -07:00
Suffian Khan	47888392ab	Fix nightly CI pipeline to generate ROCm 4.2 wheels and add ROCm 4.3.1 wheels (#9101 ) * make work for both rocm 4.2 and rocm 4.3.1 * fix rocm 4.3.1 docker image reference * fix CUDA_VERSION to ROCM_VERSION * fix ReduceConsts conflict def * add ifdef to miopen_common.h as well * trailing ws	2021-09-19 23:36:03 -07:00
Maajid khan	7fc28cd539	[OpenVINO-EP] UEP v3.1 Release with OpenVINO 2021.4.1 (#9081 ) * 2021.4.1 Docker and ci changes * OV version change * Removing Imagescaler op from the op's list Reverting this change which was added in last PR. Imagescaler is now deprecated. so removing it from the supported list. Also this op is causing regression in the performance of the FP16 models. Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Re-writing the help message for num_of_threads Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> Co-authored-by: Aravind Gunda <aravindx.gunda@intel.com>	2021-09-16 17:09:07 -07:00
Suffian Khan	4322f7e647	Fix ROCm wheels CI pipeline break by installing latest protobuf from source (#9047 ) * install protobuf from source * fix rm command in Dockerfile * fix options on rm command * fix cd into protobuf source directory * try again * remove strip step * debug list the files * ls on /usr * more debug * more debug * adjust LD_LIBRARY_PATH * try remove protobuf before ORT build	2021-09-14 12:07:00 -07:00
baijumeswani	1422a9ba6b	Remove previous temporary fixes and address TODOs (#9020 )	2021-09-13 10:10:07 -07:00
Ashwini Khade	ec63d10303	add model local function support (#8540 ) * updates for picking onnx commit * add tests filter to c# tests * plus test fixes * fix versioning for contrib ops * fix tests * test filter for optional ops * more versioning related updates * fix test * fix layernorm spec * more updates * update docs * add more test filters * more filters * update binary size threshold * update docs * draft - enable model local function * enable model local functions in ORT * update to latest rel onnx commit * plus tests * plus more updates * plus updates * test updates * Fix for nested functions + shape inference * plus bug fix and updates per review * plus fixes per review * plus test updates * plus updates per review * plus fixes * fix a test	2021-09-08 11:47:01 -07:00
Olivia Jain	a0c9408f0d	Make TRT Version Configurable (#8864 ) * copy changes from trt_and_mem * second edits * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * change to cuda 11.4 * build with cuda 11.4 * Update Dockerfile.ubuntu_cuda11_1_tensorrt7_2 * add cmake extra defines * cmake architectures * fix cmake arch * Delete ubuntu-18.04.Dockerfile * Rename Dockerfile.ubuntu_cuda11_1_tensorrt7_2 to Dockerfile.ubuntu_cuda11_4_tensorrt7_2 * Update linux-gpu-tensorrt-ci-perf-pipeline.yml * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * removing previous ort args * rename to cuda 11.4 * remove cuda 10_2 * delete trt 7.1 * remove 7.1 * Passing in cuda architecture to reduce build time * always add submodule sync due to recursive cloning * fix run command * add and * take away unused arms and share python installation script * Update linux-gpu-tensorrt-ci-perf-pipeline.yml * Update Dockerfile.tensorrt * cleanup file * install python directly on dockerfile - move to scripts in future * Update Dockerfile.custom-trt-perf * adding cuda 11.1 for missing Libnvrtc.so.11.1 * Delete install_python.sh	2021-09-03 13:32:27 -07:00
liqun Fu	a7f5bd226b	retarget torch181 to torch182 (#8947 ) Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2021-09-03 09:44:42 -07:00
Abhishek Jindal	868c8af9ac	Abjindal/eager mode pipeline (#8870 ) * Adding pipeline file for eager mode * adding the build eager mode flag * adding torch wheel files for installation * Changing pytorch version for change in wheel files * updating requirements file path * Removing Java and NodeJS from the build * removing import torch for testing build of eager mode * changing the build command * import torch * building eager mode separately * removing Java tests * python path issues * changing python path location * changing the build path file loc * installing torch before build * setting environment for building eager mode * Copying the build file and getting rid of flags * changing python path * adding missing packages * moving build eager mode code * changing python path to python3 * adding amd_hipify * adding logger file * install torch before build * change requirements file location * install torch before build eager * modifying eager mode build * modifying build location * adding new docker image * handling gradle move issue * Typo fix * changing deps file * adding java and nodejs * changing repo name for docker image * removing pybind * building only eager mode * changing the image name * removing install wheel package * build complete onnxruntime with eager mode * building wheel * enabling pybind * adding build eager mode flag in unit tests * removing build java nodejs * adding build command * removing java tests * moving Debug tests before Release * building Debug only case * changing debug test code * running the build eager mode with tests * adding build dir * adding build dir path * changing build dir path * changing build command for eager mode * building eager mode and running tests simultaneously * adding more flags to the pipeline * chaning flag * adding Debug and Release * changing torch to nightly build * changing torch version for nightly build * chaning torch version * move to Ubuntu image * adding pool * adding dockerfile for eager mode * 
adding python deps file for eager * modifying python deps file for eager * changing deps file * changing deps file statements * changing python path * REMOVING ECHO line * going to original docker file * changing docker file * changing to eager requirements file * changing python deps file * changing paths * changing cmake path * changing build script * changing python installation * running debug mode only * changing pipeline file * test name * test name * test name2 * changing requirements file * final flags for eager mode * previous pipeline * moving to ubuntu image and including some deps * adding cmake path * returning to manylinux image * removing unnecessary files for pipeline	2021-08-30 18:24:39 -07:00
Changming Sun	ced2d8e597	Clean up TRT docker files (#8847 )	2021-08-25 22:26:31 -07:00
Changming Sun	9cd7d836f7	Delete Dockerfile.ubuntu_for_android (#8848 )	2021-08-25 22:25:14 -07:00
liqun Fu	2beb873c6b	move training CI agent pools to 1ES hosted (#8775 )	2021-08-18 18:36:19 -07:00
Olivia Jain	60089f7093	Cuda11.4 (#8709 ) * initial update from 11.1 to 11.4 * change 11.4.1 to 11.4.0 * adjusting to match nvidia/cuda image tags * adjusting to match nvidia/cuda image tags centos7 * correction to 11.4.0 * correction to 11.4.0 * update to cuda 11.4 * change training back to 11.1 * change training back to 11.1 * point to correct nvcr.io/nvidia/cuda 11.4.1 image * change centos8 to centos7 * correct cudnn path * Update linux-gpu-ci-pipeline.yml for Azure Pipelines * Update c-api-noopenmp-packaging-pipelines.yml * need to resolve centos images but remove space and change to 11.4 * Update linux-gpu-ci-pipeline.yml * add cudnn to docker image * bump devtoolset to 10 * revert cuda 11.4 change to setup_env_trt * orttraining back to 11.1 * use nvcr.io * Fix previous change back to cuda 11.1 * update cudnn path * use cudnn image (revert if failure)	2021-08-17 16:36:26 -07:00
Changming Sun	f04a235c77	Update manylinux build scripts (#8724 ) Update manylinux build scripts. Sync it with the latest upstream.	2021-08-13 12:04:00 -07:00
liqun Fu	bec24ca4c1	create packaging pipeline to support cuda11.4 (#8663 )	2021-08-11 17:44:57 -07:00
Edward Chen	20f006c580	Remove flake8 check from CMake build. (#8662 )	2021-08-09 14:10:36 -07:00
Suffian Khan	6dd59a1117	revert onnx version (#8643 )	2021-08-09 05:53:40 -07:00
Ashwini Khade	96eb9810ba	Update onnx (#8458 ) * updates for picking onnx commit * add tests filter to c# tests * plus test fixes * fix versioning for contrib ops * fix tests * test filter for optional ops * more versioning related updates * fix test * fix layernorm spec * more updates * update docs * add more test filters * more filters * update binary size threshold * update docs * plus more fixes * updates per review * update to release commit * add filters for optional type tests * plus updates	2021-08-05 09:21:44 -07:00
stevenlix	ee99fb400c	Upgrade TensorRT to v8.0.1 (#8512 ) * update onnx-tensorrt parser to master * disable unsupported tests * add cuda sm 75 for T4 * update tensorrt pipeline * update trt pipelines * update trt pipelines * Update linux-gpu-tensorrt-ci-pipeline.yml * update trt cid pipeline * Update linux-gpu-tensorrt-ci-pipeline.yml * Update Tensorrt Windows build pool and TensorRT/CUDA/CuDNN version * update to cuda11.4 in trt ci pipeline * update base image to cuda11.4 * update packaging pipeline to cuda11.4 * clean up * remove cuda11.1 and cuda11.3 docker file * disable unsupported tensorrt tests at runtime * Update linux-multi-gpu-tensorrt-ci-pipeline.yml	2021-08-02 11:20:31 -07:00
Changming Sun	0510688411	Update compliance tasks in python packaging pipeline and fix some compile warnings (#8471 ) 1. Update SDLNativeRules from v2 to v3. The new one allows us setting excluded paths. 2. Update TSAUpload from v1 to v2. And add a config file ".gdn/.gdntsa" for it. 3. Fix some parentheses warnings 4. Update cmake to the latest. 5. Remove "--x86" build option from pipeline yaml files. Now we can auto-detect cpu architecture from python. So we don't need to ask user to specify it.	2021-07-30 17:16:37 -07:00
Thiago Crepaldi	9073c094d4	Update torch lightning and re-enable test	2021-07-22 14:18:07 -07:00
Adam Pocock	55b26b6951	[Java] Adds support for DNNL, OpenVINO, TensorRT shared providers and refactors the CUDA shared provider loader (#8013 )	2021-07-20 22:33:15 -07:00
Maajid khan	1686e8ff57	[OpenVINO-EP] 2021.4 Release (#8369 ) * Changes to ensure the openvino-ep-2021.4 branch is created * Fix failing cpp and python unit tests * Fixed Myriad Tests for Ov_2021.4 * Disabled failing python tests for myriad * Fixes models which were breaking w.r.t 2021.4 * Added fixes to Fix tinyyolov3 working on Myriad and MaskRcnn, FasterRcnn using GPU_FP32 * Added FP16 output data type support for ngraph * Implemented ReadNetwork() method ->Using Core::ReadNetwork() method for reading and creating a CNNNework ->Since OpenVINO™ 2020.4 version, Inference Engine enables reading ONNX models via the Inference Engine Core API and there is no need to use directly the low-level ONNX* Importer API anymore. To read ONNX* models, it's recommended to use the Core::ReadNetwork() method that provide a uniform way to read models from ONNX format. Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fixed ngraph f16 supported output type Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Added comments in data_ops.cc Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fixed broken windows build Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Disable failing CPP tests on CPU Some of the convtranspose tests are failing on OpenVINO-EP CPU due to accuracy mismatch w.r.t default CPU. so currently we are disbaling these tests. Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Updated for ov version 2021.4 * Changes to include qdq ops in code * Disabled failing python tests on GPU Disabled two maxpool python tests on GPU as they were passing but throwing segfault Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fix the backward compatibility issue ReadNetwork() API has a bug and will only work starting from OpenVINO 2021.4 version. 
The previous versions will still have to use onnx importer route Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fix CMakeLists.txt for OpenVINO EP If a directory with OpenVINO is sourced, the latest OpenVINO settings have to be imported. Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> Co-authored-by: sfatimar <sahar.fatima@intel/com> Co-authored-by: sfatimar <64512376+sfatimar@users.noreply.github.com> Co-authored-by: Aravind Gunda <aravindx.gunda@intel.com>	2021-07-19 10:40:56 -07:00
baijumeswani	090bae21ab	Pinning pillow version to 8.2.0 to circumvent regression introduced by 8.3.0 (#8303 )	2021-07-06 13:02:39 -07:00
Suffian Khan	008c5f7640	Use single builder image across Python versions for ROCm wheels (#8302 ) * first attempt share docker image across python and torch versions * set dependency between jobs * fix yaml grammar * remove python version from first stage * clean deepspeed directory * split into two images according to torch version * fix yaml syntax * invalidate cache * remove DS to prevent torch 1.9.0 upgrade	2021-07-06 11:56:00 -07:00
baijumeswani	2bda2a62fd	Pin version of Pillow to 8.2.0 to circumvent incompatibility with numpy (#8278 )	2021-07-02 09:05:49 -07:00
Thiago Crepaldi	83be3759bc	Add post-install command to build PyTorch CPP extensions from within onnxruntime package (#8027 ) ORTModule requires two PyTorch CPP extensions that are currently JIT compiled. The runtime compilation can cause issues in environments that lack the full build requirements or that run multiple instances of ORTModule in parallel. This PR creates a custom command to compile these extensions, which must be executed manually before ORTModule is run for the first time. When users try to use ORTModule before the extensions are compiled, an error with instructions is raised. PyTorch CPP Extensions for ORTModule can be compiled by running: python -m onnxruntime.training.ortmodule.torch_cpp_extensions.install A full build environment is needed for this.	2021-06-28 18:11:58 -07:00
liqunfu	9366114028	make pipelines to support torch1.8.1 and torch1.9.0 (#8084 )	2021-06-25 14:55:49 -07:00
Negin Raoof	80b7b134bf	Adding optional ops in contrib ops (#7946 ) * Added optional const spec	2021-06-24 13:16:31 -07:00
Changming Sun	6e2b064aec	Delete some unused code in run_dockerbuild.sh and Enable Nuget CUDA tests (#8089 ) 1. Remove some unused code and simplify tools/ci_build/github/linux/run_dockerbuild.sh. 2. Enable Nuget CUDA tests. The original design was we could leverage Directory.Build.props and let cmake generate the required properties(USE_CUDA/...) there. However, in nuget packaging pipeline we test the package on a different host that doesn't run cmake command and doesn't have the auto-generated Directory.Build.props file.	2021-06-22 18:43:33 -07:00
Chi Lo	27d1784d44	Add TRT 7.1 Pipeline (#8073 ) * Revert for testing TensorRT 7.1 * change to original googletest version * change machine * remove build arg * change back machine * revert back googletest version * Make it ready to merge to master * revert onnx-tensorrt to v7.1 * rename yml * use [[ ]] in bash command * add sudo * add chmod * add correct path * change another way to revert onnx-tensorrt * change docker image to manylinux build	2021-06-21 20:57:04 -07:00
baijumeswani	7701c8703e	Add module attribute to ORTModule to support HuggingFace Trainer save_model (#8088 )	2021-06-18 13:13:45 -07:00
Suffian Khan	35ca3c99d1	Fix ROCm wheels pipeline after changes to manylinux scripts (#8026 ) * update * try fix rocm pipeline * avoid already installed error * ignore python3.10 since build fails * fix * try setting user * try again * try again * try again * fix script * disable inference docs generation * try print device id * fix name qual * try again * try again * try again * provider_options * add device verify * try again * try again * try again * print video/render gid * try again * run as root * try again with uid, gid * cleanup * run as root * temp fix * add /bin/bash Co-authored-by: Changming Sun <chasun@microsoft.com>	2021-06-10 21:01:28 -07:00
pengwa	cb5f411da3	Fix Python Packaging Pipeline && Build Clean Up (#7993 ) * remove link to python * revert orttraining-linux-ci build env change introduced by pr https://github.com/microsoft/onnxruntime/pull/7993. * fix builds * fix builds * clean up * fix builds * Fix unused params * fix some comments.	2021-06-09 17:35:17 +08:00
Changming Sun	4ecbae43b2	Use GCC 10 in Linux CPU CI pipeline (#7985 )	2021-06-08 11:53:29 -07:00
pengwa	9e4dc08483	training with custom autograd Functions (#7513 ) * Register Torch Custom autograd.Function * Add flag to supress pybind11 warning * Avoid unnecessary include in cmake * Add missing reference * Add getter for registerred functions * Format for making subsquent changes cleaner * Fix interop feature build failure * Forward pass, run PyOP on CPU EP * clean up the code * Fix build * Define new ops * refactor pyop - extract PyOpLibProxy class * Hacks to run example * implement the kernel compute func * add back PyOP for comparision experiments * debug info - thread id * refine the kernels * Polish code (cherry picked from commit `4ed606f9a0`) * Fix a the Tensor address mismatch in C++ side * PythonOpGrad compute * add distributed test case * refine test cases * get dist.get_rank() in Autograd forward pass * Add CUDA kernels * Store float, int, and tuple of them as PythonOp's attributes * Populate local changes * Fix bugs * PythonOp/PythonOpGrad CUDA kernels * Support non-tensor inputs * Single GPU FP16 Run Pass (cherry picked from commit e539989e91e18ee997900292d3493b97d3eafa8a) * Fix segement * add basic test cases * Save progress * fix gradient builder for a Add op who have same inputs * add test cases for auto grad fallback feature * fix ref cnt issue. 
add thread id for debugging * POC: remove interface class * Remove interface classes * Clean a bit * Coarse-grained clean up after rebase master * reset pyop and language_interop_ops to latest master * Fix missing part during merge * re-structure torch related language interop files * Fix build * Fix tests and build * Fix build and basic unit tests * Fix most of uts * remove unnecessary import * clean up and fix build when enabling language_interop_ops * Fix single-GPU UTs * Move runner register into ORT package * Update dist UTs to new style * Also fix distributed UTs and leaf gradient problem * Static generation for constant args * Move arg_positions_ to static field * Rename some functions * Move arg ceration into a function * Clean output logic in PythonOp * Move PythonOp's ctor * Revise PythonOpGrad * Fix "ORT only supports contiguous tensor for now" for inputs * Fix evaulation mode error, add test & clean up * clean up codes * Fix issues introduced by recent master change (enabled symbolic shape infer) * automatically register forward/backward function pointers && clean up * Fix multi-output case * Add a test back * fix build and clean up * RAII for function params PyObject * Use new exporter * Clean full name in new exporter * Fix UTs * Format a file * Add "inplace" back Remove a legacy comment * Refine TorchProxy 1. Make TorchProxy a formal singleton class. 2. Remove unused Scope class. 3. Simplify the call to Forward and Backward. The two functions now automatically acquire and release GIL state, so user doesn't need any GIL-related calls. * Format * Add lock to avoid racing condition when registering Python objs * Fix Python call param ref issues && Add RefcountTracker for debug build && Clean up * clean up print * Resolve part of comments && clean up * Fix a potential bug * track pyobject consistently * move kernels to cpu provider as base class * Refactor - 1. Extract PythonOpBase/PythonOpGradBase 2. Implement CPU kernels 3. 
Test coverage for CPU kernels * Refine register code * Add a missing macro * Release python call result objects with PythonObjectPtr && Add UnRegisterContext && Track PyObject for Debugging && Clena up * Fix random segfault issue - relasing a wrong ctx pointer for inplace cases * put ref count in debug macro * Move GIL out * Refine tests * Fix memory leak issue && forward output lifecycle issue: 1. Unregister the OrtValue PythonObject. Currently, the OrtValue shared same buffer with PythonOp/PythonOpGrad's output. So after those kernels outputs are released, the "leaked" OrtValue caused the shared buffer cannot be released. 2. According PyTorch forward+backward execution. The forward outputs (e.g. torch tensors) maintains the context/saved variables/dirty inputs, etc, which are used for backward execution, so its life should be after the backward runs. This change added such a depencencies between PythonOpGrad on PythonOp. * Move dlpack->ortvalue into C++ to avoid temp object registration * Fix the over released Py_False/Py_True && refine tests * Clean up unused functions * Always assume the first forward output is context so we don't need to test unused cases. * Fix a memory leak * move-copy unique_ptr & avoid C-style casting * Use inplace attribute to determine if input tensors are copied * Move DlpackCapsuleDestructor's to a common place * Thread-safe TorchProxy * Use OrtValue instead of OrtValue* * Only keep checks for Debug build * Wrap some long line per comment * onnx_export_type --> kwargs * Use requires_grads to create PythonOpGrad's inputs * add missing files during master merge * Fix build issue after merge * Address two comments. 1. Internalize DlpackCapsuleDestructor 2. Change "(" to "]" for describing closed interval. * Address some comments. 1. "override" -> "overwrite" to avoid using reserved keyword. 2. Call DLPack's helper to create OrtValue for avoiding repeated code. * Address comments. 1. 
Pass std::mutex to registeration helpers so their callers don't have to lock the mutex expclicitly. 2. Rename "func_context_pool_mutex_" to "mutex_". This mutex is the global mutex for OrtTorchFunctionPool. * Add bridging code to make cuda kernels work with merged master * put debue macro check within RefCountTracker && use default logger for debug info && remove useless ortvalue_ptr interface && typos && revert unncessary blank line changes * fix some comments * Resolve more comments * Capitalize a word * use unique_ptr instead of ObjectPointer for PyObject management && add converntion * Support symbolic shape * Remove unused variable * fix build * Enable function registration for training only && rectify ToDlpack/FromDlpack merge with master. * Don't add context for non-PythonOp opeartors (for example AtenOp) * Fix build error * Polish frontend part. 1. Avoid adding kwargs to ORTModule's ctor 2. Use onnx_export_type rather than kwargs for type safty 3. Fix some build bugs. * Resolve simpler comments * Resolve export related comments * sync master && fix tests && fix non-training build error * Fix build errors * add target link lib * windows build error * Fix orttraining-linux-ci build * disable autograd test && clean up * fix linux orttraining ci build * try fixing win build error * Revise append calls in runner * Enable custom function using a function * Rename to avoid using reservied keyword * Use list comprehension * Set ORT random seed in tests * Remove print code and fix ctx shape * [] -> list() * Move autograd.Function and nn.Module into corresponding functions * Move test helpers * Polish dist test a bit. Tried move helpers to helper file but it causes a deadlock. 
* trying fix undefined reference * Context is not managed by global pool * Polish dist test * Polish dist test * Add enable_custom_autograd_function * Remove enable_custom_autograd_function from ctors * Add doc strings * Shorter code * Address comments * Add one empty line * revert a minor and not needed change * Address comments * Back to reference * Fix windows builds * Fix windows debug build fail to find "'python39_d.lib'" * fix mac build error * revert _to_contiguous change * add debugging tag for orttraining-cpu-ci * Fix the wrong PYTHON_LIBRARIES which is affected by PYTHON_LIBRARY given in build command * add debugging info * Fix the build in this case: PYTHON_LIBDIR: /opt/_internal/cpython-3.7.10/lib, PYTHON_EXECUTABLE: /opt/python/cp37-cp37m/bin/python3, PYTHON_MULTIARCH: x86_64-linux-gnu PYTHON_LIBRARY_PATH python3.7m * fix build error due to python lib not found * Fixes 1. Release PyObject's 2. Not useing deepcopy because we assume autograd.Function's non-tensor inputs are static (constants) so there should be no side effect after calling any autograd.Function multiple times. * Revert dtoc for decreasing refcnt * add debugging log * add debugging tag * Fix a small leak * Remove ONNX_FALLTHROUGH flag * debug tag * debug tag * fix builds * remove debug tag * fix build * fix builds * fix build * install python3 in centos, in case there is no libpython3.xm.so * build python so for redhat * add training cpu specific docker, build python so inside * revert build-cpython change * try fixing numpy include issue * install_deps after re-installing cpython * fix build && remove debug tag * install openssl before cpython * let's say: builds pass! * add build flag for torch iterop, only enable it when training+Python is enabled * skip ComputeBroadcastBackwardAxesDynamic for the shared inputs * fix build * add debug info for padgrad test * Fix builds * Split dlpack_converter into C++ and Python interfaces respecitively. Then different build use them as needed. 
* clean up the changes * fix addsubgradient builder * Fix builds * clean up * clean up * Address some comments. 1. Use pointer wraper to avoid calling Py_DECREF 2. Remove unregister_* functions 3. Allow repeated registration by skipping those with existing keys 4. Unregister context in PythonOpGrad * Fix over-released Py_Boolean Co-authored-by: Wei-Sheng Chin <wschin@outlook.com>	2021-06-07 13:01:21 -07:00
Changming Sun	5a7f65b831	Fix training e2e pipeline (#7942 ) 1. Fix training e2e pipeline. The failure was caused by my recent change #7632. The fix is adding "--cmake_extra_defines CMAKE_CUDA_ARCHITECTURES=70" to the build parameters because the machines have V100 GPUs. 2. Simplify Nuphar pipeline. It doesn't need to install a separate ONNX version (1.5.0). 3. Fix a problem where run_dockerbuild.sh ignored the OS version parameter. Now that it takes effect, I also set the python version to the system default one (3.8 for ubuntu 20.04).	2021-06-04 09:37:09 -07:00
Changming Sun	b854f2399d	Update manylinux build scripts and GPU CUDA version from 11.0 to 11.1 (#7632 ) 1. Update manylinux build scripts. This will add [PEP600](https://www.python.org/dev/peps/pep-0600/)(manylinux2 tags) support. numpy has adopted this new feature, we should do the same. The old build script files were copied from https://github.com/pypa/manylinux, but they have been deleted and replaced in the upstream repo. The manylinux repo doesn't have a manylinux2014 branch anymore. So I'm removing the obsolete code and syncing the files with the latest master. 2. Update GPU CUDA version from 11.0 to 11.1 (after a discussion with PMs). 3. Delete tools/ci_build/github/linux/docker/Dockerfile.manylinux2014_cuda10_2. (Merged the content to tools/ci_build/github/linux/docker/Dockerfile.manylinux2014_cuda11) 4. Modernize the cmake code of how to locate python devel files. It was suggested in https://github.com/onnx/onnx/pull/1631 . 5. Remove `onnxruntime_MSVC_STATIC_RUNTIME` and `onnxruntime_GCC_STATIC_CPP_RUNTIME` build options. Now cmake has builtin support for it. Starting from cmake 3.15, we can use `CMAKE_MSVC_RUNTIME_LIBRARY` cmake variable to choose which MSVC runtime library we want to use. 6. Update Ubuntu docker images that are used in our CI build from Ubuntu 18.04 to Ubuntu 20.04. 7. Update GCC version in CUDA 11.1 pipelines from 8.x to 9.3.1 8. Split Linux GPU CI pipeline into two jobs: build the code on a CPU machine, then run the tests on another GPU machine. In the past we didn't test our python packages. We only tested the pre-packed files. So we didn't catch the rpath issue in CI build. 9. Add a CentOS machine pool and test our Linux GPU build on real CentOS machines. 10. Rework ARM64 Linux GPU python packaging pipeline. Previously it used cross-compiling, so we had to statically link to the C runtime. But now we have the pluggable EP API, which doesn't support static linking. So I changed it to use qemu emulation instead. Now the build is 10x slower than before.
But it is more extensible.	2021-06-02 23:36:49 -07:00
Thiago Crepaldi	c45ac166d3	Add graphviz into Dockerfile images for Python API documentation (#7819 )	2021-06-02 16:12:54 -07:00
Suffian Khan	02c78a8aa8	test migration to rocm4.2 (#7800 )	2021-05-24 11:48:44 -07:00
Changming Sun	ee29330cab	Delete unused file: Dockerfile.ubuntu_gpu (#7797 )	2021-05-21 17:05:35 -07:00
liqunfu	f6eb0f76ae	use cudnn7 to build onnxruntime-training wheel with Cuda 10.2 support (#7760 )	2021-05-20 09:18:41 -07:00
Changming Sun	3a68c389d9	Add version lock to manylinux build scripts (#7755 )	2021-05-19 09:28:40 -07:00
Changming Sun	38d90b0f15	Cleanup install_deps.sh (#7734 )	2021-05-17 19:27:47 -07:00
liqunfu	d604281a86	Liqun/training pkg to run tests (#7662 )	2021-05-16 09:10:57 -07:00
liqunfu	3ead2f2f39	update pt lightning version (#7711 ) Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2021-05-15 21:46:16 -07:00
liqunfu	359fe1d197	Liqun/ort training version (#7620 )	2021-05-14 09:54:19 -07:00
ashbhandare	56e993a434	Bump to rel-1.9.1 (#7684 )	2021-05-13 18:41:28 -07:00
Changming Sun	41e370c2b3	Update protobuf to 3.16 (#7616 )	2021-05-07 14:09:23 -07:00
baijumeswani	f3a70f1aec	Ignore invalid input argument to install_os_deps.sh (#7566 )	2021-05-05 14:33:31 -07:00
Changming Sun	a284eede64	Fix Linux CPU pipeline (#7584 )	2021-05-05 13:26:10 -07:00
George Wu	faea7a222d	linux trt package pipeline (#7537 )	2021-05-03 19:14:20 -07:00
baijumeswani	cab84d902e	Install and use conda on ortmodule CI pipelines (#7530 ) * Install and use conda on ortmodule CI pipelines * Update build script to install onnxruntime wheel before running unit tests * Remove python 3.5 from install_python_deps * Pinning deepspeed version to 0.3.15	2021-05-03 15:52:22 -07:00
liqunfu	196e6702ad	to support multiple cuda versions in published onnxruntime-training package (#7468 ) to support multiple CUDA versions in published onnxruntime-training package	2021-04-27 17:15:33 -07:00
Suffian Khan	7a3c1787af	Add CI pipeline to publish Python training package targeting Rocm (#7417 ) * first attempt rocm training wheel * modifications needed to python packaging pipeline for Rocm 4.1 * changes to not conflict with cuda missed stage1 changes remove package push add option r to getopt try again without python install try again without python install try again without python install split pipelines and add back push to remote storage try on cuda gpu pool try again try again try running without az subscription set try again on original pipeline change pool passing AMD Rocm whl on AMD-GPU pool split rocm pipeline from cuda pipeline remove comments * try adding Rocm tests as well * try with tests in place * fix trailing ws * add training data * try again as root for tests * use python3 * typo * try to map video, render group into container * try again * try again * try to avoid yum error code * make UID 1001 * try without yum downgrade * define rocm_version=None * remove CUDA related comments for Rocm Dockerfile * Don't pin nightly torch torchvision torchtext versions as they expire (for now nightly is required for Rocm 4.1) * missed requirements-rocm.txt from last commit * fix whitespace	2021-04-23 17:22:31 -07:00
Ashwini Khade	75e054cd33	pick onnx release candidate (#7177 ) * pick onnx release candidate * fix typo * filter batchnorm tests * add implementation for reshape 14 * add identity op kernel for opset 14 * fix typo * update onnx commit * update commit to latest master * add hashes for new kernel registrations and update 1 * TEST commit * update onnx back to right commit * Update onnx to latest in rel-1.9.0 * temp fix * remove nonzeroshapesetter transformer * pick rel branch latest commit * fix build failures * fix build failures * fix build failures * update the commit to latest in release branch * add test filters for not implemented op14 ops in c# tests * plus review comments	2021-04-22 23:57:09 -07:00
Changming Sun	65b2b87f83	Update CI build docker images (#7386 ) Update CI build docker images: delete ubuntu 16.04 support.	2021-04-21 13:18:34 -07:00
Changming Sun	b4cfa88bf7	Update protobuf to the latest version (#7396 )	2021-04-21 10:30:06 -07:00
Guoyu Wang	fce67e2b9b	Create Android Package pipeline (#7295 ) * Create Android Package pipeline * address CR comments * Switch to jdk11	2021-04-12 17:56:25 -07:00
sfatimar	52bcef4d4f	Openvino ep 2021.3 (#7180 ) * Integrate openvino-ep-2021.3 * operators type * changed the myriad as it is case sensitive * logging information for openvino-ep-2021.3 * Unit test fix * Resize operator added for myriad * Fixed python tests for CPU and GPU * data commit for loop tile and gatherelements failure * adding checks for Where * fixing gatherelements and loop tests * disabling instance normalization test for now as there seems to be a myriad bug, putting loop in ops supported only because all the tests fail * gather elements op test taking care of warning message * condition needs to be an intializers * Disabled python test for Myriad * Disable compilation warning for MSVC windows compiler * softmax_test, threedimaxis0 and 1 test give accuracy mismatch tensoroptest disables test gives accuracy mismatch gather test gives accuracy mismatch * Updated with ov version 2021.3 * Updated with ov version 2021.3 * Updated README * Disabling python tests for cpu * Disabling python tests with accuracy mismatch on cpu * Added fix for Linux CI Pipeline failure -> Disabled tests that were throwing segfault Co-authored-by: sfatimar <sahar.fatima@intel/com> Co-authored-by: MaajidKhan <n.maajidkhan@gmail.com> Co-authored-by: Aravind <aravindx.gunda@intel.com>	2021-04-01 11:28:54 -07:00
baijumeswani	249a2c14ef	Pin version of pytorch to 1.8.1 for ORTModule CI pipeline (#7167 ) * Pin version of pytorch to 1.8.1 for ORTModule CI pipeline * Use pytorch-lightning stable version 1.2.5 * Revert to cuda 10.1	2021-04-01 09:37:47 -07:00
liqunfu	e545604499	. (#7165 )	2021-03-30 13:58:30 -07:00
Ashwini Khade	b22e60bd44	pull onnx latest commit (#7102 ) * update onnx commit * fix test scripts to remove deprecated call * update filters * add registration for relu and cumsum ver 14 * add promote trilu to onnx domain * update onnx-tensorrt submodule * update flag * update flag * update dependencies * fix android ci failure	2021-03-29 11:00:38 -07:00
harshithapv	540eac253e	Deepspeed pipeline parallel and fairscale sharded optimizer test samples with ORTModule (#7078 ) * adding samples for Deepspeed pipeline parallel and fairscale sharded optimizer with ortmodule * fixed typo in args * addressed Thiago's comments * Update orttraining/orttraining/test/python/orttraining_test_ortmodule_deepspeed_pipeline_parallel.py Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>	2021-03-24 09:43:05 -07:00
baijumeswani	a7a2a16edd	Pass arguments to azure_scale_set_vm_mount_test_data from perf test ci pipeline (#7094 )	2021-03-22 21:48:32 -07:00
Thiago Crepaldi	867804bea1	Add auto doc gen for ORTModule API during CI build (#7046 ) In addition to ORTModule auto documentation during packaging, this PR also update golden numbers to fix CI	2021-03-22 10:20:33 -07:00
Thiago Crepaldi	335edaa2c4	Merge pull request #6973 from microsoft/thiagofc/merge-ortmodule-into-master Introduce ORTModule training API to ONNX Runtime	2021-03-17 10:30:06 -07:00
Changming Sun	ed2d441a2e	Update ORT server build pipeline (#7030 ) 1. Migrated it to Ed's new docker build script 2. Use python 3.6 instead, because it is the default one in ubuntu 18.04 3. Move the "pip install" command to the docker image build stage(instead of when running the image)	2021-03-16 18:02:09 -07:00
Changming Sun	4161758058	Remove openmp related packaging pipeline (#6991 ) 1. Remove openmp related packaging pipelines and build jobs. 2. Set continueOnError to true for the TSAUpload tasks. Their service has been unstable recently. 3. Update Ubuntu 16 docker images to Ubuntu 18, in preparation for getting C++17 support 4. Cherry-pick the changes in 1.7.1 to the master: updating CFLAGS/CXXFLAGS to strip out debug symbols	2021-03-12 10:02:59 -08:00
baijumeswani	79f832c682	Separate requirements.txt file for ORTModule pipelines (#6879 ) * Move all ORTModule dependency installations to ortmodule subfolder	2021-03-05 14:12:11 -08:00
Sherlock	12edf22f11	Merge pull request #6838 from microsoft/mzs/ortmodule-api-sync-from-master-210226 Sync from master	2021-02-27 12:32:36 -08:00
Thiago Crepaldi	f71d93ea2b	Enable PyTorch Lightning basic test on CI (#6809 )	2021-02-27 09:35:42 -08:00
M. Zeeshan Siddiqui	ca48310d6d	Merge branch 'master' of https://github.com/microsoft/onnxruntime into mzs/ortmodule-api-sync-from-master-210226	2021-02-27 04:25:23 +00:00
Maajid khan	7465673e33	[OpenVINO-EP] Find package changes (#6801 ) * Find package changes to cmake * Removing unwanted code from cmake Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> Co-authored-by: suryasidd <surya.siddharth.pemmaraju@intel.com>	2021-02-25 05:12:57 -08:00
stevenlix	53eb948f4c	Upgrade TensorRT to v7.2.2 (#6452 ) * upgrade to TensorRT 7.2.2 * extend GPU tensorrt CI timeout to 150 minutes * update docker image name * disable user interaction to avoid tensorrt container stuck when install tzdata * upgrade to libssl1.1 for ubuntu20.04 * remove libicu60 from ubuntu20.04 * add libicu66 for ubuntu20.04 * debug * llvm * llvm * disable ReverseSequenceTest.InvalidInput * disable ReverseSequenceTest.InvalidInput * fix issues * fix issues * Update linux-gpu-tensorrt-ci-pipeline.yml * disable warning 4458 for TensorRT parser * update onnx-tensorrt submodule * disable warnings for TensorRT parser * update onnx-tensorrt submodule to include latest bug fixes * update setup_env_trt * update pool for win trt ci pipeline' Co-authored-by: George Wu <jywu@microsoft.com>	2021-02-18 04:30:47 -08:00
M. Zeeshan Siddiqui	40dda452cf	Merge branch 'master' of https://github.com/microsoft/onnxruntime into mzs/sync-from-master	2021-02-18 03:03:01 +00:00
liqunfu	dd8ef4409a	Liqun/migrate perf test (#6733 ) move ort training perf tests to azure devops	2021-02-17 17:48:47 -08:00
Thiago Crepaldi	3184c47ad1	Merge branch 'master' into thiagofc/merge-from-master	2021-02-17 11:49:52 -08:00
baijumeswani	01dfa8e125	Support non tuple return values from torch.nn.module (#6660 ) * Support dictionary, namedtuples and huggingface ModelOutput type for model return values	2021-02-16 20:48:32 -08:00
Changming Sun	8378a45ae7	Add python 3.8/3.9 support for Windows GPU and Linux ARM64 (#6615 ) Add python 3.8/3.9 support for Windows GPU and Linux ARM64 Delete jemalloc from cgmanifest.json. Add onnx node test to Nuphar pipeline. Change $ANDROID_HOME/ndk-bundle to $ANDROID_NDK_HOME. The latter one is more accurate. Delete Java GPU packaging pipeline Remove test data download step in Nuget Mac OS pipeline. Because these machines are out of control and out of our network, it's hard to make it reliable and the data secure. Fix a doc problem in c-api-artifacts-package-and-publish-steps-windows.yml. It shouldn't copy C_API.md, because the file has been moved into a different branch. Delete the CI build docker file for Ubuntu cuda 9.x and Ubuntu x86 32 bits And, due to some internal restrictions, I need to rename some of the agent pools	2021-02-11 16:43:35 -08:00
Changming Sun	0b89f931d0	Update CUDA build configs (#6598 ) 1. Fix Nuget package build break caused by #6225 2. Delete Dockerfile.centos_gpu. It is not used anywhere. 3. Fix Linux CUDA 10.2 build error caused by glibc upgrade	2021-02-08 22:55:42 -08:00
Changming Sun	b5bd14fc9f	Update GPU packaging pipelines to cuda11 and fix the other build break issues (#6585 ) Update gpu packaging pipelines to CUDA11 In the next release we will use CUDA 11. And our CUDA 11 build suddenly became broken because recently CentOS 7 posted an update of glibc. The version of glibc was changed from 2.17-317.el7 to 2.17-322.el7_9. But the newer one isn't compatible with CUDA 11. We have to downgrade it.	2021-02-05 16:58:37 -08:00
Chun-Wei Chen	f2ce3aae13	add set_model_dir and update ONNX (#6119 )	2021-02-05 09:30:49 -08:00
baijumeswani	62ac164279	Cache datasets on CI machines (#6525 )	2021-02-02 21:11:35 -08:00
Thiago Crepaldi	8a890ddfd7	Sync ORTModule branch with master and fix tests (#6526 ) * Deprecate Python global configuration functions [Part 1] (#5923) Enable options to be set via execution provider (EP)-specific options and log deprecation warning from current global configuration functions. * remove dnnl_dll_path from post build copy (#6142) * Model Fusion For Bart (#6105) Fusion fix for Bart models * Unify IExecutionProvider and IExecutionProviderFactory interfaces (#6108) * Remove Provider_IExecutionProvider and make the internal IExecutionProvider usable by shared providers * Change Provider_IExecutionProviderFactory to be the core version. * Enable running the mnist_training sample without cuda (#6085) Signed-off-by: George Nash <george.nash@intel.com> * nnapi add min max support (#6117) * Fix CUDA test hang: (#6138) - Make condition check in `CUDAAllocatorTest` to ensure CUDA device is present. * Fix TensorRT kernel conflict issue for subgraphs of control flow operators (#6115) * add static subgraph kernel index * change kernel naming to avoid conflicts * Add gradient registration for Abs. (#6139) * Partition initial optimizer state for Zero-1 (#6093) * Initial changes * Working changes * Working changes * Cleanup * fix windows CI * Review comments * review comments * Fix edge case in BFCArena where allocation failures could lead to an infinite loop. (#6145) #4656 * Revert "work around of the build break in mac (#6069)" (#6150) This reverts commit `3cae28699b`. * Fix clean_docker_image_cache.py detection of image pushes. (#6151) Fix clean_docker_image_cache.py detection of image pushes. They were being ignored because the expected HTTP status code was wrong. For pushes, it's 201 instead of 200. * MLAS: add NEON version of int8 depthwise convolution (#6152) * Using a map of of ops to stages as input of partition function. (#5940) * New partition algorithm running before AD * Convert cut_group_info into device map. 
Work in progress -- works for bert-tiny with pp=2 * Removing code for partition of bwd graphs * Remove old code * Adding some verification code * Handle Shared Initializer * Renaming rank with stage * Added first unit test * new test * redundant check * undo change in bert * Moved cut-based partition to testing utils file Co-authored-by: xzhu1900 Co-authored-by: wschin * New conversion function and tests * minor * remove test that is not needed2 * improve GetDeviceAssignment and PR comments * minor changes * PR comments * improving documentation and variable naming * add documentation * Variable naming and docs * more doc improvements * more doc improvements * missing static cast * Fix test file for windows * Fix test file for windows * Fix test file for windows * stage id is not the same as rank id * PR comments * PR comments * More comments * More comments * Minor fix to satisfy c++14 (#6162) * Deprecating Horovod and refactored Adasum computations (#5468) deprecated horovod submodule refactored adasum logic to be ort-native added tests for native kernel and e2e tests * Update TensorRT-ExecutionProvider.md (#6161) * Bugfix for topk cuda kernel (#6164) * fix the issue that std::numeric_limits cannot handle half type * adding a test Co-authored-by: Du Li <duli@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> * Revert "Fuse MatMulIntegerToFloat only when scales are scalar (#6008)" (#6169) This reverts commit `f2dcba7afe`. * Remove ignored build warnings for pybind on Mac (#6165) * save_checkpoint, load_checkpoint and aggregate_checkpoints (#6136) * save_checkpoint and load_checkpoint implementations * checkpoint aggregation logic * unit tests for save_checkpoint, load_checkpoint and aggregate_checkpoints * Don't try to bind unused inputs in the Training frontend (#6166) * Update documentation for contributing a PR and add deprecation notices for PyOp and ORT server. 
(#6172) * aggregate model states only for the case when mixed precision was true (#6176) * [NNAPI EP] Enable per-channel quantization for QlinearConv (#6155) * Enable qlinearconv per-channel quantization * Fix the android CI test failure * Add Android Version Check for Per-Channel Quant * Address PR comments * Fix some minor issues * Add verification of per-channel zero points * Make the error tolerance configurable * Fix typo in BERT pretraining script (#6175) A misplaced `}` meant that the `'enable_adasum'` option was interpreted incorrectly, causing the test to fail. * Update get_docker_image.py to enable use without image cache container registry. (#6177) Update get_docker_image.py to enable use without image cache container registry. * Helper for compiling EP to generate deterministic unique ids for use in MetaDef names (#6156) * Create a helper for generating unique ids that can be used by an EP that creates compiled nodes and needs ids to be deterministic for a model when used in multiple sessions. Added to IExecutionProvider as this can potentially be used by all compiling EPs and is more robust than a simplistic counter (although EP implementer is free to choose either approach). * Restructure the helper so it can be called across the EP bridge. Add ability to call id generation helper from EP bridge - convert DNNL EP to use helper to validate Address issue where a new Model may be loaded into the same address as a previous one. - hash the bytes in the Graph instance (1728 bytes currently) to use as the key to the full hash for the model Add lock around id generation to ensure no issues if multiple sessions partitions graphs at exactly the same time. - Extremely unlikely but would be hard to debug and the locking cost is not an issue as it's only incurred during graph partitioning and not execution. 
* Backend APIs for checkpointing (#5803) * Add backend API GetOptimizerState and GetModelState * add GetPartitionInfoMap * Android coverage dashboard (#6163) * Write the report to a file. * Post code coverage to the Dashboard database. * Add usage details of unified MCR container image (#6182) Going forward, a single unifed docker image will be published in MCR. The hardware accelerator target choice will have to be made in the application using OpenVINO EP's runtime config options. * improve perf for softmax (#6128) * improve perf for both gathergrad and softmax * revert the change in gathergrad and will be done in another PR. * address comments from code review. * Tune fast Gelu to use exp(x) instead of tanh(x) on Rocm platform (#6174) * tune fast gelu to use exp(x) instead of tanh(x) on rocm * update to use expression 2/(1+exp(-2x))-1 for stability * Add Status.csv to EP Perf Tool (#6167) * merge master, keep postprocess status commit * download float16.py everytime * removing hardcoded values * Lochi/quantization tool for trt (#6103) * Initial implementation of generating calibration dynamic range table * Initialize validation support for Quantization * Initialize validation support for Quantization (cont.) * Improve validation support for Quantization * Improve validation support for Quantization * Rewrite/Refine for calibration and validation * Rewrite/Refine for calibration and validation (cont.) 
* Refine code * Refine code * Add data reader for BERT * Add flatbuffers to serialize calibration table * Refine code and add BERT evaluation * Refine the code * minor modification * Add preprocess/postprocess of vision team yolov3 and refine the code * Update annotation * Make bbox cooridates more accurate * Fix bug * Add support of batch processing * Batch processing for model zoo yolov3 * Add batch inference for evaluation * Refine the code * Add README * Add comments * Refine the code for PR * Remove batch support checking in data_reader and refine the code * Refine the code for PR * Refine the code for PR review Co-authored-by: Olivia Jain <oljain@microsoft.com> * Implement ScatterND for CUDA EP (#6184) * Condition fix in Resize operator (#6193) * Clean up checkpoint tests to use the new checkpoint functions (#6188) * add deprecation warning for old checkpoint functions * update all the distributed checkpoint tests to use new checkpoint functions * Implement comparing outputs that are sequence of maps of strings to floats (#6180) * Implement conversion from ortvalue to Itensor for string tensors and comparing sequence of maps of strings to floats * PR comments * Dockerfile to build onnxruntime with ROCm 4.0 * Add ability to skip GPU tests based on GPU adapter name (#6198) * Implement conversion from ortvalue to Itensor for string tensors and comparing sequence of maps of strings to floats * PR comments * Add ability to skip gpu tests according to adapter description * spacing * spacing * spacing * Openvino ep 2021.2 (#6196) * Enabling fasterrcnn variant and vehicle detector * changes for 2021_2 branch * yolov3_pytorch commit * fixed braces in basic_backend.cc * ci information added * faster rcnn variant and vehicle detector changes were made in 2021.1 and not in 2021.2 * some changes to support unit tests * disable some tests which are failing * fix myriad tests for vehicle detector * Did some cleanup cleaned up comments Disabled Add_Broadcast_0x1 and 
Add_Broadcast_1x0 tests on MYRIAD_FP16 backend due to a bug cleaned up capability_2021_2.cc file Removed extra conditions which were added for some validation in backend_utils Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * yolov3 pytorch workaround to ensure that the output names are matched * gemmoptest fixed on myriad * Fixed MYRIADX CPP Test Failures Expand,GatherND,Range,Round op's are only supported in model where op with float input data types are not supported and fixed Scatter and ScatterElements op's with negative axis are fixed Reshape op with 0 dim value are not supported and fixed Disabled InstanceNorm_2 test on MYRIADX Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> make changes to yolov3 pytorch * Fixed python unit tests Fixed failing python tests on vpu, GPU and CPU Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> Fixes POW op failures on GPU_FP16 Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Clean up capability_2021_2.cc Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Updated docx for MultiThreading option Added extra info on setting the num_of_threads option using the API and it's actual usage Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> fixed slice and removed extra prints * Disabled failing python tests Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Minor changes added in capabilty_2021_2 Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * made changes to slice to avoid failures * Disabling FP16 support for GPU_FP32 ->Inferencing an FP16 model on GPU_FP32 leads to accuracy mismatches. 
so, we would rather use GPU_FP16 to infer an FP16 model on GPU Device Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Updated docx for Inferencing a FP16 Model Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * fix for mask rcnn * Script for installing openvino from source * Updated with openvino 2021.2 online installation * code comment fixes fixed accuracy mismatch for div * Update OpenvinoEP-ExecutionProvider.md updated for 2021.2 branch * Update README.md updated dockerfile documentation * Update BUILD.md build.md update documentation * permissiong change of install_openvino.sh * made changes to align with microsoft onnxruntime changes * Updated with ov 2021.2.200 Co-authored-by: suryasidd <surya.siddharth.pemmaraju@intel.com> Co-authored-by: sfatimar <sahar.fatima@intel/com> Co-authored-by: MaajidKhan <n.maajidkhan@gmail.com> Co-authored-by: mohdansx <mohdx.ansari@intel.com> * Fix a memory leak in test_inference.cc (#6201) * Fix a memory leak in test_inference.cc * Use TArray in AMD element-wise kernels, rather than manually copying memory to device. * Remove most ROCm-specific element-wise code and reuse CUDA element-wise code. * Minor change to improve performance for operator Pad. (#5537) * small improvment for pad * Support double for operators Log, Reciprocal, Sum (CPU) (#6032) * Support double for operators Log, Reciprocal, Sum * remove tesdt erf_double * Support double for operators Where, LpNormalisation (#6034) * Support double for operators Relu, Tanh, Sigmoid (#6221) * Fix ImportError in build.py (#6231) There is a possible ImportError where build.py can import the wrong 'util' package if there are others present in `sys.path` already * Removed executor todo that looks dead. (#6234) * Remove MKLML/openblas/jemalloc build config (#6212) * Remove python 3.5 * Update the readme file * Upgrade build.py to assert for python 3.6+ Upgrade build.py to assert for python 3.6+ as python 3.5 cannot build anymore todays master. 
* Support MLFloat16 type in Pow opset-12 CUDA kernel (#6233) * MLAS: handle MlasGemm(M/N/K==0) cases (#6238) * Support double for operator TopK + fix one bug in TopK implementation for GPU for double (#6220) * Support double for operator TopK * add static classes for topk/double * fix cast issue in topk * Support double for operator Gemm + fix bug in gemm implementation for cuda, rocm when sizeof(type) != sizeof(float) (#6223) * Support double for operator Gemm * fix type size while copying data in gemm operator for GPU * fix type in gemm implementation for rocm * Support double for operator ReduceMean, ReduceLogSumExp (#6217) * Support double for operators ReduceMean, ReduceLogSumExp * Support double for operator ArgMin (#6222) * Support double for operator ArgMin * add test specifically for double * add new test on pai-excluded-tests.txt * Update BUILD.md * Update manylinux docker image to the latest (#6242) * Fix allocator issue for TensorRT IOBinding (#6240) * Fix issue: https://github.com/microsoft/onnxruntime/issues/6094 Root cause: we didn't expose the OrtMemoryInfo for TRT, so it will cause issue if user want use IObinding for Tensorrt. Short term fix, add the OrtMemoryInfo for TRT. Long term should unify the allocator for CUDA and TRT * Tune BiasGeluGradDx kernel in approximation mode to avoid tanh(...) on Rocm (#6239) * bias gelu grad use exp(...) 
instead * update cuda to rocm * missing semicolon * comment * remove dockerfile * missing factor of two * Refactor EP Perf Tool (#6202) * merge master, keep postprocess status commit * download float16.py everytime * using variables to reference eps * adding ACL EP to ep perf tool * accuracy with absolute tolerance configurable * add acl to dict + remove commented line * Documentation for distributed CI tests pipeline (#6140) * Remove a debug log in provider_test_utils.cc (#6200) * Add the Concat Slice Elimination transform, fix constant_folding transform (#5457) * Add concat slice transform + test * Cosmetic improvements in concat slice transform * Remove unrelated file, fix comment, fix constant folding bug * Add test onnx graph * fix windows build * Review comments * review comment Co-authored-by: Aishwarya <aibhanda@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> * Add MakeStringLite which uses current locale, update some MakeString call sites to use it instead. (#6252) * Add MakeStringLite which uses current locale, update macros to use that to generate messages. * Convert calls to MakeStringLite(). * Liqun/speech model loop to scan (#6070) Provide a tool to convert Loop to Scan for Nuphar performance Fix Nuphar CI pipeline failures. 
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> * model parallel refinement (#6244) * Megatron Transformation as a seperate step * remove useless header * clang formating * Re-Structure megatron transformer for subsquent changes * fix comments * Allow querying a GraphProto's doc_string as part of ModelMetadata (#6248) * Fix Linux/Mac error message on input type mismatch (#6256) * add bfloat16 to gathergrad type constrains (#6267) Co-authored-by: Cheng Tang <chenta@microsoft.com> * Fix VS 2017 build break (#6276) * Deprecate Python global configuration functions [Part 2] (#6171) Update Python API to allow more flexibility for setting providers and provider options. The providers argument (InferenceSession/TrainingSession constructors, InferenceSession.set_providers()) now also accepts a tuple of (name, options dict). Fix get_available_providers() API (and the corresponding function in the C API) to return the providers in default priority order. Now it can be used as a starting point for the providers argument and maintain the default priority order. Convert some usages of the deprecated global configuration functions to use EP-specific options instead. Update some EP-specific option parsing to fail on unknown options. Other clean up. * Add script to preprocess python documentation before publishing (#6129) * add script to preprocessing python documentation before publishing * rename past to past_key_values for GPT-2 (#6269) rename past to past_key_values for transformers 4.* * Rename MakeString and ParseString functions. (#6272) Rename MakeString to MakeStringWithClassicLocale, MakeStringLite to MakeString, ParseString to ParseStringWithClassicLocale. Add missing pass-through versions of MakeStringWithClassicLocale for string types. * Increase timeout for Linux GPU CUDA11 build. 
(#6280) * Add helper to compare model with different precision (#6270) * add parity_check_helper.py * add real example * remove lines * Fix Min/Max CPU kernels for float16 type (#6205) * fix data_ptr assertion error for past_sequence_length=0 in GPT-2 (#6284) fix io binding crash for past_sequence_length=0 * A list of changes in transformers tool (#6224) * longformer fp16 e2e * add fp16/fp32 parity check helper file * excludes nodes with subgraph in profiling * use onnxconverter_common to do fp32->fp16 * add version check for onnxconverter_common * remove helper file * add pkg installation on notebooks and script * Workaround for static_cast<double>(half) * Add workaround to remove ROCm-specific binary-elementwise files. * Update nuget build (#6297) 1. Update the ProtoSrc path. The old one is not used anymore. 2. Regenerate OnnxMl.cs 3. Delete some unused code in tools/ci_build/build.py 4. Avoid set intra_op_param.thread_pool_size in ModelTests in OpenMP build. 5. Fix a typo in the C API pipeline. * Enable ONNX backend test of SequenceProto input/output (#6043) * assert sequence tensor and remove skips * update testdata json * use ONNX 1.8 in cgmanifest.json * use previous commit to workaround * update ONNX commit ID in docker * skip test_maxpool_2d_dilations test for now * update function name * add --sequence_lengths option (#6285) * more dtype for Equal CUDA kernel (#6288) Co-authored-by: Vincent Wang <weicwang@microsoft.com> * Force reinstall onnx python package on Windows (#6309) * update transformers required package versions (#6315) * Remove abs in LpPool (#6303) * Support 1D input for Conv + Mul/Add fusion optimizer with test (#6295) * Support 1D input (N C H) for Conv + Mul/Add fusion optimizer with test cases and test models. 
* Add longformer to python package (#6314) * add longformer to python package * move test related script and data to a new folder * Avoid false sharing on thread pool data structures (#6298) Description: This change adds alignment and padding to avoid false sharing on fields in the thread pool. It also adds a new microbenchmark to profile thread-pool performance over short loops. Motivation and Context: MobileNet on a 2*12-core system showed a performance gap between the ORT thread pool and OpenMP. One cause appeared to be false sharing on fields in the thread pool: ThreadPoolParallelSection::tasks_finished (which the main thread spins on waiting for workers to complete a loop), and the RunQueue::front_ and back_ fields (used respectively by the worker thread and the main thread). The additional micro-benchmark BM_ThreadPoolSimpleParallelFor tests performance of loops of different sizes at different thread counts. The results below are on a machine with 2*14-core processors (E5-2690 v4) running with 1, 14, 15, and 28 threads. For each test, the microbenchmark has N threads run a loop with N iterations; hence a perfect result is for the time taken to be constant as additional threads are added (although we will also see power management effects helping at very low thread counts). The loop durations (100000, 10000, 1000) correspond roughly to 200us, 20us, and 2us on this machine. 
Before change:
BM_ThreadPoolSimpleParallelFor/1/1/100000/real_time 17153 us 17154 us 32
BM_ThreadPoolSimpleParallelFor/14/14/100000/real_time 22553 us 22553 us 30
BM_ThreadPoolSimpleParallelFor/15/15/100000/real_time 21521 us 21521 us 29
BM_ThreadPoolSimpleParallelFor/28/28/100000/real_time 24111 us 24111 us 24
BM_ThreadPoolSimpleParallelFor/1/1/10000/real_time 1719 us 1719 us 407
BM_ThreadPoolSimpleParallelFor/14/14/10000/real_time 3409 us 3409 us 200
BM_ThreadPoolSimpleParallelFor/15/15/10000/real_time 3541 us 3541 us 201
BM_ThreadPoolSimpleParallelFor/28/28/10000/real_time 4576 us 4576 us 151
BM_ThreadPoolSimpleParallelFor/1/1/1000/real_time 174 us 174 us 4017
BM_ThreadPoolSimpleParallelFor/14/14/1000/real_time 1586 us 1586 us 402
BM_ThreadPoolSimpleParallelFor/15/15/1000/real_time 1586 us 1586 us 397
BM_ThreadPoolSimpleParallelFor/28/28/1000/real_time 2864 us 2864 us 232
After change:
BM_ThreadPoolSimpleParallelFor/1/1/100000/real_time 17160 us 17160 us 33
BM_ThreadPoolSimpleParallelFor/14/14/100000/real_time 20989 us 20989 us 31
BM_ThreadPoolSimpleParallelFor/15/15/100000/real_time 22286 us 22286 us 31
BM_ThreadPoolSimpleParallelFor/28/28/100000/real_time 24631 us 24631 us 25
BM_ThreadPoolSimpleParallelFor/1/1/10000/real_time 1718 us 1718 us 407
BM_ThreadPoolSimpleParallelFor/14/14/10000/real_time 2868 us 2868 us 242
BM_ThreadPoolSimpleParallelFor/15/15/10000/real_time 2907 us 2907 us 240
BM_ThreadPoolSimpleParallelFor/28/28/10000/real_time 3872 us 3872 us 186
BM_ThreadPoolSimpleParallelFor/1/1/1000/real_time 175 us 175 us 3938
BM_ThreadPoolSimpleParallelFor/14/14/1000/real_time 933 us 933 us 659
BM_ThreadPoolSimpleParallelFor/15/15/1000/real_time 912 us 912 us 591
BM_ThreadPoolSimpleParallelFor/28/28/1000/real_time 1976 us 1976 us 317
* fix opset imports for function body (#6287) * fix function opsets * add tests and update onnx * changes per review comments * add comments * plus updates * build fix * Remove false positive prefast warning from threadpool 
(#6324) * Java: add Semmle to Java publishing pipelines (#6326) Add Semmle to Java API pipeline Add security results publishing and add Java GPU. * Quantization support for split operator with its NHWC support (#6107) * Make split working for quantization. * NHWC transformer support for split operator * Refactor some according to Feedback. Will add test cases soon. * Fix build error on windows. * Add test case for split op on uint8_t support * Add nhwc_transformer_test for split uint8_t support * Some change according to PR feedbacks. * Liqun/enable pipeline parallel test (#6331) enable pipeline parallel test Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> * Use onnxruntime_USE_FULL_PROTOBUF=OFF for the cuda execution provider (#6340) This removes a special case of the cuda EP. * MLAS: add fallback implementation for quantized GEMM (#6335) Add a non-vectorized version of the kernel used for the quantized version of MlasGemm. * Delete float16.py (#6336) No longer needed. Also doesn't pass policheck. * Enable add + softmax fusion for Rocm platform (#6259) * add bias softmax; tests appear to pass * check fusion occurs for rocm as well * check for rocm provider compatible as well * build for cpu scenario as well * try again; broader cope * proper scope on kGpuExecutionProvider * been editing wrong file * remove commented #include lines * try again due to mac os ci error * try again * test fusion both cuda and rocm to avoid mac ci error * add external data support to tensor proto utils (#6257) * update unpack tensor utilities to support loading external data * more updates * fix test * fix nuphar build * minor build fix * add tests * fix Android CI * fix warning * fix DML build failure and some warnings * more updates * more updates * plus few updates * plus some refactoring * changes per review * plus some change * remove temp code * plus updates to safeint usage * build fix * fix for safeint * changed wording. 
(#6337) * Remove OpSchema dummy definition. Only needed for Function now, and we can just exclude the method in Function (#6321) * remove gemmlowp submodule (#6341) * [NNAPI] Add pow support (#6310) * Add support for running Android emulator from build.py on Windows. (#6317) * fix the pipeline failure (#6346) * Train BERT Using BFloat16 on A100 (#6090) * traing bert using bf16 * Adam support bf16 * bugfix * add fusedmatmul support * fix after merge from master. * bugfix * bugfix after merge from master * fast reduction for bf16. * resolve comments * fix win build * bugfix * change header file. Co-authored-by: Vincent Wang <weicwang@microsoft.com> * Fix DerefNullPtr issues raised by SDLNativeRules. (#6348) * update quantize to support basic optimization and e2e example for image classification (#6313) update the resnet50-v1 to standard one from onnx zoo. add an example for mobilenet run basic optimization before quantization fix a bug in Clip * Enable graph save for orttrainer (#6333) * Enable graph save for orttrainer * Fix CI * Update orttraining/orttraining/python/training/orttrainer_options.py * Update orttraining/orttraining/python/training/orttrainer_options.py * Update orttraining/orttraining/python/training/orttrainer_options.py * Update orttraining/orttraining/python/training/orttrainer_options.py * Update orttraining/orttraining/python/training/orttrainer_options.py Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * Add PREfast to python packaging pipeline (#6343) * Add PREfast to python packaging pipeline * fix longformer benchmark io_binding output_buffers (#6345) * fix longformer benchmark io_binding output_buffers * format * import benchmark_helper from parent directory. * Use readelf for minimal build binary size checks. (#6338) * Use readelf for minimal build binary size checks. The on-disk size grows in 4KB chunks which makes it hard to see how much growth an individual checkin causes. 
Only downside is that the sum of the sections is larger than the on-disk size (assumably things get packed smaller on disk and some of the section alignment constraints can be ignored) * Remove unused function * Java: Set C language warnings to W4 and adjust JNI code (#6347) Set /W3 for C language and fix up JNI warnings. * Pipeline Parallel Experimental Python API (#5815) * Add create session to WinML telemetry to track WinML Usage (#6356) * Fix one more SDL warning (#6359) * fix -Wdangling-gsl (#6357) * Add python example of TensorRT INT8 inference on ResNet model (#6255) * add trt int8 example on resnet model * Update e2e_tensorrt_resnet_example.py * remove keras dependency and update class names * move ImageNetDataReader and ImageClassificationEvaluator to tensorrt resnet example * simplify e2e_tensorrt_resnet_example.py * Update preprocessing.py * merge tensorrt_calibrate * Update calibrate.py * Update calibrate.py * generalize calibrate * Update calibrate.py * fix issues * fix formating * remove augment_all * This added telemetry isn't needed (#6363) * Wezuo/memory analysis (#5658) * merged alloc_plan * pass compilation * Start running, incorrect allocation memory info * add in comments * fix a bug of recording pattern too early. * debugging lifetime * fix lifetime * passed mnist * in process of visualization * Add code to generate chrome trace for allocations. * in process of collecting fragmentation * before rebuild * passed mnist * passed bert tiny * fix the inplace reuse * fix the exception of weight in pinned memory * add guards to ensure the tensor is in AllocPlan * add customized profiling * debugging * debugging * fix the reuse of differnt location type * add rank * add the rank * add fragmentation * add time_step_trace * Add summary for each execution step (total bytes, used/free bytes). 
* add top k * change type of top k parameter * remove prints * change heap to set{ * add the name pattern * add the useage for pattern * add partition * change to static class * add custom group * remove const * update memory_info * in process of adding it as runtime config * change the memory profiling to be an argument * add some comments * add checks to recored meomry_info in traaining session * set the "local rank setting" to correct argument. * addressing comments * format adjustment * formatting * remove alloc_interval * update memory_info.cc to skip session when there is no tensor for a particular memory type * fix memory_info multiple iteration seg-fault * consolidate mainz changes * fixed some minor errors * guard by ORT_MINIMAL_BUILD * add ORT_MEMORY_PROFILE flag * added compiler flag to turn on/off memory profiling related code * clean up the code regarding comments * add comments * revoke the onnx version * clean up the code to match master * clean up the code to match master * clean up the code to match master Co-authored-by: Jesse Benson <benson.jesse@gmail.com> Co-authored-by: Wei Zuo <wezuo@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: wezuo <wezuo@az-eus-v100-32gb-5-worker-mgtbby.eastus.cloudapp.azure.com> Co-authored-by: wezuo <wezuo@az-eus-v100-32gb-5-worker-yclzsf.eastus.cloudapp.azure.com> * Support MLFloat16 in CumSum Cuda op for Opset 14 (#6355) * Add CumSum-14 for Cuda * fix convert_common version retrival (#6382) * Refine auto_pad based pad computation in ConvTranspose (#6305) * Fix SDL warning (#6390) * Add max_norm for gradient clipping. 
(#6289) * add max_norm as user option for gradient clipping * add adam and lamb test cases for clip norm * add frontend tests * Add the custom op project information (#6334) * Dont use default string marshalling in C# (#6219) * Fix Windows x86 compiler warnings in the optimizers project (#6377) * [Perf] Optimize Tile CPU and CUDA kernels for a corner case (#6376) * Unblock Android CI code coverage failure (#6393) * fix build on cuda11 (#6394) Co-authored-by: Vincent Wang <weicwang@microsoft.com> * Load the model path correctly (#6369) * Fix some compile warnings (#6316) * OpenVino docker file changes to bypass privileged mode Description: Builds and installs libusb without UDEV support, which is used for communicating with the VPU device. Motivation and Context This enables the resulting docker container to be run without '--privileged' and '--network host' options which may not be suitable in deployment environments. * Megatron checkpointing (#6293) * Add bart fairseq run script * Add frontend change to enable megatron * Initial changes for checkpointing * Megatron optim state loading, checkpoint aggregation, frontend distributed tests for H, D+H * Add load_checkpoint changes * Fix CI * Cleanup * Fix CI * review comments * review comments * review comments: * Fix generate_submodule_cgmanifest.py Windows issues. (#6404) * Continue memory planning when unknown shape tensor is encountered. (#6413) * Reintroduce experimental api changes and fix remote build break (#6385) Co-authored-by: Ori Levari <orlevari@microsoft.com> * Add support for custom ops to minimal build. (#6228) * Add support for custom ops to minimal build. Cost is only ~8KB so including in base minimal build. 
* enable pipeline to run quantization tests (#6416) * enable pipeline to run quantization tests setup test pipeline for quantization * Minor cmake change (#6431) * Liqun/liqun/enable pipeline parallel test2 (#6399) * enable data and pipeline parallism test Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> * Farewell TrainableDropout (#5793) * Deprecate TrainableDropout kernel. * Update bert_toy_postprocessed.onnx to opset 12. * Add more dropout tests. * Fix BiasDropout kernel. Co-authored-by: Ubuntu <OrtTrainingDev3@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Sergii Dymchenko <sedymche@microsoft.com> * fix null dereference warning (#6437) * Expose graph ModelPath to TensorRT shared library (#6353) * Update graph_viewer.cc * Update tensorrt_execution_provider.cc * Update graph_viewer.h * Update tensorrt_execution_provider.cc * Update tensorrt_execution_provider.cc * Update provider_api.h * Update provider_bridge_ort.cc * Update provider_interfaces.h * Update provider_interfaces.h * expose GraphViewer ModelPath API to TRT shared lib * add modelpath to compile * update * add model_path to onnx tensorrt parser * use GenerateMetaDefId to generate unique TRT kernel name * use GenerateMetaDefId to generate unique TRT engine name * fix issue * Update tensorrt_execution_provider.cc * remove GetVecHash * Update tensorrt_execution_provider.h * convert wchar_t to char for tensorrt parser * update tensorrt parser to include latest changes * fix issues * Update tensorrt_execution_provider.cc * merge trt parser latest change * add PROVIDER_DISALLOW_ALL(Path) * add tool for generating test data for longformer (#6415) * only build experimental api in redist (#6465) Co-authored-by: Sheil Kumar <sheilk@microsoft.com> * Add an option to save the training graph after optimization (#6410) * 
expose optimized_model_filepath in SessionOptions as `debug.graph_save_paths.model_with_training_graph_after_optimization_path` in `ORTTrainerOptions` * Share allocator between CUDA EP & TRT EP. (#6332) * Share allocator between CUDA EP & TRT EP. limitation: 1. Does not cover the per-thread allocator created by CUDA EP, still need to figure out the way to remove it 2. Need to have more identifiers to make it able to share CPU allocator across all EPs * fix max norm clipping test in python packaging pipeline test (#6468) * fix python packaging pipeline * make clip norm test compatabile with both V100 and M60 GPUs * Initial version of CoreML EP (#6392) * Bug 31463811: Servicing: Redist (Nuget) conflicts with Microsoft.AI.MachineLearning starting 21H1+ (#6460) * update load library code to have the fullly qualified path * make it work for syswow32 * git Revert "make it work for syswow32" This reverts commit b9f594341b7cf07241b18d0c376af905edcabae3. Co-authored-by: Sheil Kumar <sheilk@microsoft.com> * dequantize 1st input of lstm back if it is quantized (#6444) * [java] Adds support for OrtEnvironment thread pools (#6406) * Updates for Gradle 7. * Adding support for OrtThreadingOptions into the Java API. * Fixing a typo in the JNI code. * Adding a test for the environment's thread pool. * Fix cuda test, add comment to failure. * Updating build.gradle * fix SDL native rule warning #6246 (#6461) * fix SDL rule (#6464) * use tickcount64 (#6447) Co-authored-by: Ori Levari <orlevari@microsoft.com> * Update pypi package metadata (#6354) * Update setup file data * add missing comma * remove python 3.5 * fix typo bracket * Delete nuget extra configs (#6477) * Op kernel type reduction infrastructure. (#6466) Add infrastructure to support type reduction in Op kernel implementations. Update Cast and IsInf CPU kernels to use it. * Fixing a leak in OnnxSequences with String keys or values. 
(#6473) * Increase the distributes tests pipeline timeout to 120 minutes (#6479) * [CoreML EP] Add CI for CoreML EP (macOS) and add coreml_flags for EP options (#6481) * Add macos coreml CI and coreml_flags * Move save debuggubg model to use environment var * Move pipeline off from macos CI template * Fix an issue building using unix make, add parallel to build script * Fixed build break for shared_lib and cmpile warning * Fix a compile warning * test * Revert the accidental push from another branch This reverts commit 472029ba25d50f9508474c9eeceb3454cead7877. * Add ability to track per operator types in reduced build config. (#6428) * Add ability to generate configuration that includes required types for individual operators, to allow build size reduction based on that. - Add python bindings for ORT format models - Add script to update bindings and help info - Add parsing of ORT format models - Add ability to enable type reduction to config generation - Update build.py to only allow operator/type reduction via config - simpler to require config to be generated first - can't mix a type aware (ORT format model only) and non-type aware config as that may result in insufficient types being enabled - Add script to create reduced build config - Update CIs * merge e2e with distributed pipeline (#6443) merge e2e with distributed pipeline * Fix test breaks in Windows ingestion pipeline (#6476) * fix various build breaks with Windows build * fix runtime errors loading libraries from system32 * add build_inbox check to winml_test_common * use raw string * cleanup * fix dll load Co-authored-by: Sheil Kumar <sheilk@microsoft.com> * Speed up the Mac CI runs (#6483) * expose learningmodelpixelrange property (#5877) * Fix of support api version bug for [de]quantize (#6492) * SDL fixes: add proper casts/format specifiers (#6446) * SDL annotation fixes (#6448) Co-authored-by: Ori Levari <orlevari@microsoft.com> * [OpenVINO-EP] Remove support for OpenVINO 2020.2 (#6493) * Removed 
OpenVINO 2020.2 support * Updated documentation and build.py * Removed unnecessary libraries from setup.py * Support pad operator in quantization and quantized nhwc transformer. Fix Pad operator bug. (#6325) Support pad operator in quantization tool. Support pad operator in quantized nhwc transformer. Fix pad() operator bug when pad input's inner(right) most axis value is zero for Edge and Reflect mode, it copied wrong value to the cells to be padded. Note the Constant mode will not trigger this bug, as Edge/Reflect need copy value from the already copied array while Constant mode only fill specified value. Add more test cases to cover pad() operator bug fixed here. Fix quantization tools uint8/int8 value overflow issue when quantize weights in python. * Improve work distribution for Expand operator, and sharded LoopCounter configuration (#6454) Description: This PR makes two changes identified while looking at a PGAN model. First, it uses ThreadPool::TryParallelFor for the main parallel loops in the Expand operator. This lets the thread pool decide on the granularity at which to distribute work (unlike TrySimpleParallelFor). Profiling showed high costs when running "simple" loops with 4M iterations each of which copied only 4 bytes. Second, it updates the sharded loop counter in the thread pool so that the number of shards is capped by the number of threads. This helps make the performance of any other high-contention "simple" loops more robust at low thread counts by letting each thread work on its own "home" shard for longer. Motivation and Context Profiling showed a PGAN model taking 2x+ longer with the non-OpenMP build. The root cause was that the OpenMP build uses simple static scheduling of loop iterations, while the non-OpenMP build uses dynamic scheduling. The combination of large numbers of tiny iterations is less significant with static scheduling --- although still desirable to avoid, given that each iteration incurs a std::function invocation. 
* Update document of transformer optimization (#6487) * nuphar test to avoid test data download to improve passing rate (#6467) nuphar test to avoid test data download to improve passing rate * Fuse cuda conv with activation (#6351) * optimize cuda conv by fused activation * remove needless print out * exclude test from cpu * handle status error from cudnn 8.x * add reference to base class * add hipify * [CoreML EP] Add support for some activations/Transpose, move some shared helpers from NNAPI to shared space (#6498) * Init change * Move some helper from nnapi ep to shared * Add transpose support * Fix trt ci build break * Refine transformers profiler output (#6502) * output nodes in the original order; grouped by node name * add document for profiler * Update to match new test setup. (#6496) * Update to match new test setup. * Add Gemm(7) manually for now. Will fix properly on Monday. It's used by mnist.ort as that is created by optimizing mnist.onnx to level 1 causing 2 nodes to be replaced by a Gemm and the op to be missing from the required list as that is created using the original onnx model. 
* Enable dense sequence optimized version of Pytorch exported BERT-L on AMD GPU (#6504) * Permit dense seq optimization on BERT-L pytorch export by enabling ReduceSumTraining, Equal, and NonZero on AMD * enable Equal tests * enable fast_matrix_reduction test case * Optimize GatherGrad for AMD GPU (#6381) * optimize gathergrad * address comments Co-authored-by: Weixing Zhang <wezhan@microsoft.com> * add explicit barriers for buffer overread and overrwrite (#6484) Co-authored-by: Ori Levari <orlevari@microsoft.com> * fix sdl bugs for uninitialized variables and returns (#6450) Co-authored-by: Ori Levari <orlevari@microsoft.com> * handle hr error conditions (#6449) Co-authored-by: Ori Levari <orlevari@microsoft.com> * Dnnl training (#6045) * Add ReluGrad and ConvGrad ops for the dnnl provider * the mnist sample is updated to add the --use_dnnl option that will cause the sample to use the dnnl execution provider for nodes that exist in dnnl provider. * Added the ability to find forward ops. Dnnl backward gradient ops require the forward primitive description and workspace from the forward operation. * Enable specifying the execution provider for Gradient Checker Tests * Prevent memory leak when running dnnl_provider in training mode Prevent creating a SubgraphPrimitivePool when the code is built with the ENABLE_TRAINING build flag. Instead create a SubgraphPrimitive directly. The SubgraphPrimitivePool was causing a pool of SubgraphPrimitives to be stashed in a map for reuse. Due to the way the Training Loop uses threads the pool of SubgraphPrimitives were not being reuse instead a new pool of SubgraphPrimitives being created each run. The old pool was not instantly freed. This behavior could be a language error when using thread_local memory. Signed-off-by: George Nash <george.nash@intel.com> * Added fixes to maxpoolgrad and memory leak. Maxpoolgrad will now pass all unit tests. 
With the conv and convgrad disabled for dnnl, mnist is able to train till 95% Signed-off-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com> * Fixed misc issues when testing training code with dnnl provider * fix conv_grad dnnl tests with dilation to run dnnl execution provider * update mnist training sample to accept convolution type models. Convolution models require the input shape to be {1, 28, 28} instead of the flat {784} image that is used for the gemm models. This enables models that require the different shape by adding `--model_type conv` to the command line when running the mnist sample. (While testing, a workaround was used; see #4762.) * Disable weight caching in dnnl conv operator when training. When training we cannot use cached weights because the weight will be updated each run. This re-enables dnnl Conv and ConvGrad Ops. The weight caching was the source of the error from Conv when training. * Fix issues found when building grad ops on Linux * The dnnl_convgrad code was overusing the scope operator, causing a compilation problem. * The dnnl_maxpoolgrad code had a logic error: it was comparing with the source description when it should have been comparing with the destination description. * Update BUILD.md so it shows DNNL for training * Updated the table of contents. Since the same providers are listed twice, once for Inference and again for Training, an HTML anchor was added to distinguish the second header from the first for the TOC. * Fix build failure when not using --enable-training build option * reorganize the gradient operators so they are grouped together * Fix issues found when running onnx_backend_test_series.py * Pooling code only supports 2 outputs when built with --enable-training * Address code review feedback * class member variables end in underscore_ * use dst instead of dist to match the pattern used elsewhere in DNNL code. 
* Remove workaround that was introduced to handle problems running convolution based training models. See issue #4762 Signed-off-by: George Nash <george.nash@intel.com> * Isolate training code and code cleanup * Do not build dnnl_gpu_runtime if enable_training is set; training code does not support dnnl_gpu_runtime yet. * Isolated Training code inside ifdefs so that they won't affect project if built without training enabled * Inadvertent changes in whitespace were removed to make code review simpler * Undid some code reordering that was not needed * comments added to closing #endif statements to simplify reading complex ifdefs * Modified the GetPrimitiveDesc functions to return shared_ptr instead of raw pointer. This matches what was done in Pool code and is safer memory code. Signed-off-by: George Nash <george.nash@intel.com> * Address code review issues - whitespace changes caused by running clang-format on the code - Several spelling errors fixed - Removed/changed some ifdefs to improve readability - other misc. changes in response to code review. Signed-off-by: George Nash <george.nash@intel.com> * Code changes to address code review - Simplify iteration code using `auto` keyword - remove C style cast that was not needed - remove instance variable that was not needed [relugrad.h] - added the execution providers to `ComputeGradientErrorInternal()` and `ComputeTheoreticalJacobianTranspose()` instead of using a pointer to an instance variable [gradient_checker.h/.cc] Signed-off-by: George Nash <george.nash@intel.com> * Combined the default gradient ops test and dnnl gradient ops test for ConvGrad and MaxPoolGrad into one function with the help of a helper function. This will reduce repeated code. Signed-off-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com> * Replaced the stack used by convgrad with a vector so that the vector (used as a stack) can be easily cleared every time the graph is created. 
This will prevent a memory leak from convolution kernels being pushed constantly onto the stack. Signed-off-by: chethan.palangotu.keshava@intel.com * Code clean up and formatting updates - Removed empty else statement - updated indentation of code that was causing double curly brackets to look unusual - Changed check for NumDimensions to Size in Relu and ReluGrad error checking code. - isolated training code Signed-off-by: George Nash <george.nash@intel.com> * Restore inadvertently removed ConvGrad tests. When combining the DNNL and CPU version of the ConvGrad tests two tests were inadvertently excluded. This adds back the Conv3d and Conv3d with strides test cases. Signed-off-by: George Nash <george.nash@intel.com> * Add validation to ConvGrad. This validates that the dimensions of the ConvGrad match the passed-in Convolution forward primitive description. The current code for DNNL ConvGrad makes the assumption that the ConvGrad nodes will be visited in the reverse order from the corresponding Conv nodes. The added validation will return an error if this assumption is not true. Signed-off-by: George Nash <george.nash@intel.com> * Do not create new execution providers in provider_test_utils. This removes the code that generated new execution providers in the OpTester::Run function. This was added because the std::move was leaving the `entry` value empty so subsequent calls would cause a segfault. The problem is that this potentially changed the execution_provider because it would create the default provider, dropping any custom arguments. When the now removed code was originally added the std::move was causing crashes when the GradientChecker unit tests were run. However, it is no longer causing problems even with the code removed. Signed-off-by: George Nash <george.nash@intel.com> * Change the forward conv stack to a forward conv map. This changes how the forward conv kernel is mapped to the bwd ConvGrad kernel; the problematic stack is no longer used. 
The convolution stack made the assumption that the corresponding ConvGrad operator would be visited in reverse order of the forward Conv operators. This was always problematic and was unlikely to work for inception models. Important changes: - The weight_name is added to the ConvGrad dnnl_node making it possible to use the weight_name as a lookup key to find the Conv forward Kernel - the `std::vector fwd_conv_stack_` has been replaced with a `std::map fwd_conv_kernel_map_` - Although it is not needed lock_guards were added when writing to and reading from the fwd_conv_kernel_map_ as well as the fwd_kernel_map_. These should always be accessed by a single thread when preparing the dnnl subgraphs so the guard should not be needed but its added just in case. - Updated the comments ConvGrad.h code to no longer mention the stack. The error check is not removed. It will be good to verify there are no errors as we continue to test against more models. Signed-off-by: George Nash <george.nash@intel.com> Co-authored-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com> Co-authored-by: unknown <63478620+jeyblu@users.noreply.github.com> * Lochi/refactor yolov3 quantization (#6290) * Refactor the code and move data reader, preprocessing, evaluation to E2E_example_mode * Refactor the code. Move data reader, preprocessing, evaluation to model specific example under E2E_example_mode * refactor code * Move yolov3 example to specific folder and add additional pre/post processing * Print a warning message for using newer c_api header on old binary (#6507) * Fix issues with ArmNN build setup (#6495) * ArmNN build fixes * Update BUILD.md to document that the ACL paths must be specified to build ArmNN * Fix CUDA build error. We don't setup the link libraries correctly/consistently so improve that. * Fix Windows CI builds by updating test scripts to work with numpy 1.20. (#6518) * Update onnxruntime_test_python.py to work with numpy 1.20. 
Some aliases are deprecated in favor of the built-in python types. See https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations np.array with bytes for entries and dtype of np.void no longer automatically pads. Change a test to adjust for that. * Fix another test script * Fix ORTModule branch for orttraining-* pipelines * Update pytorch nightly version dependency Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com> Co-authored-by: George Wu <jywu@microsoft.com> Co-authored-by: Cecilia Liu <ziyue.liu7@gmail.com> Co-authored-by: Ryan Hill <38674843+RyanUnderhill@users.noreply.github.com> Co-authored-by: George Nash <george.nash@intel.com> Co-authored-by: Guoyu Wang <62914304+gwang-msft@users.noreply.github.com> Co-authored-by: Yateng Hong <toothache9010@gmail.com> Co-authored-by: stevenlix <38092805+stevenlix@users.noreply.github.com> Co-authored-by: Derek Murray <Derek.Murray@microsoft.com> Co-authored-by: ashbhandare <ash.bhandare@gmail.com> Co-authored-by: Scott McKay <skottmckay@gmail.com> Co-authored-by: Changming Sun <chasun@microsoft.com> Co-authored-by: Tracy Sharpe <42477615+tracysh@users.noreply.github.com> Co-authored-by: Juliana Franco <jufranc@microsoft.com> Co-authored-by: Pranav Sharma <prs@microsoft.com> Co-authored-by: Tixxx <tix@microsoft.com> Co-authored-by: Jay Rodge <jayrodge@live.com> Co-authored-by: Du Li <duli1@microsoft.com> Co-authored-by: Du Li <duli@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Yufeng Li <liyufeng1987@gmail.com> Co-authored-by: baijumeswani <bmeswani@microsoft.com> Co-authored-by: Sergii Dymchenko <sedymche@microsoft.com> Co-authored-by: jingyanwangms <47403504+jingyanwangms@users.noreply.github.com> Co-authored-by: satyajandhyala <satya.k.jandhyala@gmail.com> Co-authored-by: S. 
Manohar Karlapalem <manohar.karlapalem@intel.com> Co-authored-by: Weixing Zhang <weixingzhang@users.noreply.github.com> Co-authored-by: Suffian Khan <sukha@microsoft.com> Co-authored-by: Olivia Jain <oljain@microsoft.com> Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com> Co-authored-by: Hariharan Seshadri <shariharan91@gmail.com> Co-authored-by: Ryan Lai <rylai@microsoft.com> Co-authored-by: Jesse Benson <jesseb@microsoft.com> Co-authored-by: sfatimar <64512376+sfatimar@users.noreply.github.com> Co-authored-by: suryasidd <surya.siddharth.pemmaraju@intel.com> Co-authored-by: sfatimar <sahar.fatima@intel/com> Co-authored-by: MaajidKhan <n.maajidkhan@gmail.com> Co-authored-by: mohdansx <mohdx.ansari@intel.com> Co-authored-by: Xavier Dupré <xadupre@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin@vols.utk.edu> Co-authored-by: Michael Giba <michaelgiba@gmail.com> Co-authored-by: William Tambellini <wtambellini@sdl.com> Co-authored-by: Hector Li <hecli@microsoft.com> Co-authored-by: Aishwarya <aibhanda@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: liqunfu <liqfu@microsoft.com> Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: pengwa <pengwa@microsoft.com> Co-authored-by: Tang, Cheng <souptc@gmail.com> Co-authored-by: Cheng Tang <chenta@microsoft.com> Co-authored-by: Tianlei Wu <tlwu@microsoft.com> Co-authored-by: Ye Wang <52801275+wangyems@users.noreply.github.com> Co-authored-by: Chun-Wei Chen <jacky82226@gmail.com> Co-authored-by: Vincent Wang <wangwchpku@outlook.com> Co-authored-by: Vincent Wang <weicwang@microsoft.com> Co-authored-by: Luyao Ren <375833274@qq.com> Co-authored-by: Zhang Lei <zhang.huanning@hotmail.com> Co-authored-by: Tim Harris <tiharr@microsoft.com> Co-authored-by: Ashwini Khade <askhade@microsoft.com> Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com> Co-authored-by: Alberto Magni 
<49027342+alberto-magni@users.noreply.github.com> Co-authored-by: Wei-Sheng Chin <wschin@outlook.com> Co-authored-by: wezuo <49965641+wezuo@users.noreply.github.com> Co-authored-by: Jesse Benson <benson.jesse@gmail.com> Co-authored-by: Wei Zuo <wezuo@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: wezuo <wezuo@az-eus-v100-32gb-5-worker-mgtbby.eastus.cloudapp.azure.com> Co-authored-by: wezuo <wezuo@az-eus-v100-32gb-5-worker-yclzsf.eastus.cloudapp.azure.com> Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com> Co-authored-by: Martin Man <supermt@gmail.com> Co-authored-by: M. Zeeshan Siddiqui <mzs@microsoft.com> Co-authored-by: Ori Levari <ori.levari@microsoft.com> Co-authored-by: Ori Levari <orlevari@microsoft.com> Co-authored-by: Ubuntu <OrtTrainingDev3@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Sheil Kumar <smk2007@gmail.com> Co-authored-by: Sheil Kumar <sheilk@microsoft.com> Co-authored-by: Ryota Tomioka <ryoto@microsoft.com> Co-authored-by: Adam Pocock <adam.pocock@oracle.com> Co-authored-by: Yulong Wang <f.s@qq.com> Co-authored-by: Faith Xu <faxu@microsoft.com> Co-authored-by: Xiang Zhang <xianz@microsoft.com> Co-authored-by: suryasidd <48925384+suryasidd@users.noreply.github.com> Co-authored-by: RandySheriffH <48490400+RandySheriffH@users.noreply.github.com> Co-authored-by: Weixing Zhang <wezhan@microsoft.com> Co-authored-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com> Co-authored-by: unknown <63478620+jeyblu@users.noreply.github.com>	2021-02-02 08:59:56 -08:00
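The sharded loop counter change described in the merge commit above (#6454) caps the number of shards by the number of threads, so that each thread can keep working on its own "home" shard for longer. A minimal single-threaded Python sketch of that idea follows. It is not the ORT implementation: the names `ShardedLoopCounter`, `home_shard` and `claim`, the default cap of 8 shards, and the order in which other shards are tried once the home shard is empty are all assumptions made for illustration, and a real thread pool would update the counters atomically.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Shard:
    """A contiguous range of loop iterations [next, end) owned by one shard."""
    next: int
    end: int


class ShardedLoopCounter:
    """Illustrative sharded loop counter (hypothetical names, not the ORT API)."""

    def __init__(self, num_iterations: int, num_threads: int, max_shards: int = 8):
        # Cap the shard count by the thread count (and by the iteration count),
        # so a thread is less likely to share its home shard with other threads.
        self.num_shards = max(1, min(max_shards, num_threads, num_iterations))
        base, extra = divmod(num_iterations, self.num_shards)
        self.shards: List[Shard] = []
        start = 0
        for i in range(self.num_shards):
            size = base + (1 if i < extra else 0)  # spread the remainder evenly
            self.shards.append(Shard(start, start + size))
            start += size

    def home_shard(self, thread_id: int) -> int:
        """Map a thread to the shard it should try first."""
        return thread_id % self.num_shards

    def claim(self, home: int, block: int = 1) -> Optional[Tuple[int, int]]:
        """Claim up to `block` iterations, trying the home shard first.

        Not thread-safe: a real implementation would use an atomic
        fetch-and-add (or similar) on each shard's counter.
        """
        for offset in range(self.num_shards):
            shard = self.shards[(home + offset) % self.num_shards]
            if shard.next < shard.end:
                begin = shard.next
                shard.next = min(shard.end, begin + block)
                return begin, shard.next
        return None  # every shard is exhausted


counter = ShardedLoopCounter(num_iterations=10, num_threads=2)
print(counter.num_shards)              # 2: capped by the thread count
print(counter.claim(home=1, block=4))  # (5, 9): thread 1 starts on its home shard
```

With two threads and ten iterations the sketch creates two shards, `[0, 5)` and `[5, 10)`; a thread only moves on to another shard once its home shard is empty, which is the behaviour the commit message credits with making high-contention "simple" loops more robust at low thread counts.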
Scott McKay	c84bb9df9f	Add ability to track per operator types in reduced build config. (#6428 ) * Add ability to generate configuration that includes required types for individual operators, to allow build size reduction based on that. - Add python bindings for ORT format models - Add script to update bindings and help info - Add parsing of ORT format models - Add ability to enable type reduction to config generation - Update build.py to only allow operator/type reduction via config - simpler to require config to be generated first - can't mix a type aware (ORT format model only) and non-type aware config as that may result in insufficient types being enabled - Add script to create reduced build config - Update CIs	2021-01-29 07:59:51 +10:00
Hariharan Seshadri	d7bdd96425	Refine auto_pad based pad computation in ConvTranspose (#6305 )	2021-01-19 19:01:49 -08:00
Guoyu Wang	e35db194e3	fix the pipeline failure (#6346 )	2021-01-14 00:33:22 -08:00
baijumeswani	0586c610b2	Add ORTModule BERT classifier to the CI pipeline (#6330 )	2021-01-13 12:34:04 -08:00
baijumeswani	9b7510d88c	Add ORTModule distributed CI pipeline (#6278 ) * Add ortmodule distributed ci pipeline	2021-01-13 11:24:01 -08:00
Ashwini Khade	0ed56d491a	fix opset imports for function body (#6287 ) * fix function opsets * add tests and update onnx * changes per review comments * add comments * plus updates * build fix	2021-01-12 13:44:36 -08:00
Chun-Wei Chen	84024bdfa9	Enable ONNX backend test of SequenceProto input/output (#6043 ) * assert sequence tensor and remove skips * update testdata json * use ONNX 1.8 in cgmanifest.json * use previous commit to workaround * update ONNX commit ID in docker * skip test_maxpool_2d_dilations test for now * update function name	2021-01-11 11:30:33 -08:00
Changming Sun	1685167e46	Update manylinux docker image to the latest (#6242 )	2020-12-31 19:57:04 -08:00
sfatimar	7347996942	Openvino ep 2021.2 (#6196 ) * Enabling fasterrcnn variant and vehicle detector * changes for 2021_2 branch * yolov3_pytorch commit * fixed braces in basic_backend.cc * ci information added * faster rcnn variant and vehicle detector changes were made in 2021.1 and not in 2021.2 * some changes to support unit tests * disable some tests which are failing * fix myriad tests for vehicle detector * Did some cleanup cleaned up comments Disabled Add_Broadcast_0x1 and Add_Broadcast_1x0 tests on MYRIAD_FP16 backend due to a bug cleaned up capability_2021_2.cc file Removed extra conditions which were added for some validation in backend_utils Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * yolov3 pytorch workaround to ensure that the output names are matched * gemmoptest fixed on myriad * Fixed MYRIADX CPP Test Failures Expand,GatherND,Range,Round op's are only supported in model where op with float input data types are not supported and fixed Scatter and ScatterElements op's with negative axis are fixed Reshape op with 0 dim value are not supported and fixed Disabled InstanceNorm_2 test on MYRIADX Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> make changes to yolov3 pytorch * Fixed python unit tests Fixed failing python tests on vpu, GPU and CPU Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> Fixes POW op failures on GPU_FP16 Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Clean up capability_2021_2.cc Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Updated docx for MultiThreading option Added extra info on setting the num_of_threads option using the API and it's actual usage Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> fixed slice and removed extra prints * Disabled failing python tests Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Minor changes added in capabilty_2021_2 Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * made changes to slice to avoid failures * Disabling FP16 support for GPU_FP32 ->Inferencing an 
FP16 model on GPU_FP32 leads to accuracy mismatches. so, we would rather use GPU_FP16 to infer an FP16 model on GPU Device Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Updated docx for Inferencing a FP16 Model Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * fix for mask rcnn * Script for installing openvino from source * Updated with openvino 2021.2 online installation * code comment fixes fixed accuracy mismatch for div * Update OpenvinoEP-ExecutionProvider.md updated for 2021.2 branch * Update README.md updated dockerfile documentation * Update BUILD.md build.md update documentation * permissiong change of install_openvino.sh * made changes to align with microsoft onnxruntime changes * Updated with ov 2021.2.200 Co-authored-by: suryasidd <surya.siddharth.pemmaraju@intel.com> Co-authored-by: sfatimar <sahar.fatima@intel/com> Co-authored-by: MaajidKhan <n.maajidkhan@gmail.com> Co-authored-by: mohdansx <mohdx.ansari@intel.com>	2020-12-23 08:47:22 -08:00
liqunfu	cde723a136	Liqun/move nightly pl to linux multi gpu v100 (#6024 ) * move e2e nightly pipeline to azure devop Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-12-14 12:43:41 -08:00
Edward Chen	d8139814fd	Clean up builds (#6015 ) Update training Python packaging build to use get_docker_image.py. Remove BUILD_EXTR_PAR docker build argument. Update get_docker_image.py to check again for the image in the cache after building and before pushing to reduce the chance of a redundant push.	2020-12-04 15:13:17 -08:00
Edward Chen	6d642a3dba	Replace direct pulls from image cache container registry with get_docker_image.py, build definition clean up. (#5906 )	2020-12-01 19:10:23 -08:00
Changming Sun	2d9dcc4576	Add python 3.9 support (#5874 ) 1. Add python 3.9 support(except Linux ARM) 2. Add Windows GPU python 3.8 to our packaging pipeline.	2020-11-30 12:02:48 -08:00
Ashwini Khade	705d093167	Update onnx (#5720 ) * update onnx * update docker image for testing	2020-11-24 11:20:15 -08:00
baijumeswani	208f4c1d3c	Azure ci pipeline for distributed environment tests (#5881 )	2020-11-23 14:01:00 -08:00
Changming Sun	79350a642a	Update install_deps.sh: remove the unnecessary data generating step (#5758 ) We install onnx python package from this script, so python tests can run the tests for the latest commit which we are importing.	2020-11-10 22:19:03 -08:00
Ashwini Khade	1cca903680	update onnx commit id (#5594 ) * update onnx commit id * update onnx commit for docker images * update docker images	2020-11-02 09:46:36 -08:00
liqunfu	92662659ba	Liqun/remove number matching (#5606 ) replace number matching with relaxed comparison in frontend tests Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-10-27 21:27:37 -07:00
Ashwini Khade	df22611026	Update ONNX commit (#5487 ) * update ONNX * update onnx + register kernels for reduction ops * bug fix kernel reg * update cgmanifests * revert unsqueeze op 13 registration * filter ops which are not implemented yet * filter some tests * update onnx commit to include conv transpose bug fix * update docker images * undo not required test changes * fix test failures	2020-10-21 07:22:20 -07:00
sfatimar	6d2a30eae3	[OPENVINO-EP] 2021.1 Release (#5431 ) * Cmake changes for 2021.1 * added new ov version 2020.1 for faster rcnn * Added missing defs * equal op modified * changes to incoroporate faster rcnn * backend util.cc * hddl_plugin_config.hpp is depreceated . instead use hddl_config.hpp * changing myriad precision bool to i32 * gather is not enabled for gpu * conv2D and pooltest auto_pad attribute should not be null * negative indices are not valid for scatter op in myriad * non max suppression op only supported in faster rcnn mode * maxpool indices output is not supported * Cleaned redundant code in backends * Added ifdefs for HDDL config * cast output dimensions check topk operator k input it seems only resolved for myriad as it is throwing issues for ask rcnn . need to verify * we are limiting the subgraph size to 3 here * taking care of review comments * Fixed minor bugs * Modified Slice op checks * Added NonZero, Upsample * Removed TopK if it's in the middle of a subgraph * incorporated upsample conditions too * Dockerfile changes for 2021.1 release * dockerfile aptkey update * Minor fixes * ceil condition added again * Fixed few gpu models * Disabled LSTM and yolov3 in ModelTests * python softmax cross entropy tests and negative log likelihood * Update Build.md Updated for openvino 2021.1 * Update OpenVINO-ExecutionProvider.md update openvino execution provider for 2021.1 * Update READMe.md updated new openvino version * Update Dockerfile.openvino added environment variable for DEBIAN Frontend * Fixed myriad models * Fixed gather condition * Fixed mask rcnn model on myriad * Modified Gather condition * set default target of MCR dockerfile to MYRIAD_FP16 * Fixed tinyolov3 on CPU * Update OpenVINO-ExecutionProvider.md update openvino execution provider documentation * Update Dockerfile.openvino Removed environment variable * Update OpenVINO-ExecutionProvider.md update image manipulation networks supported * Update 
onnx_backend_test_series_filters.jsonc removed test_upsample_nearest from cpu test cases * New InternalCI changes for 2021.1 * Full protobuf removed for OpenVINO * Protobuf added * Updated with apt installation for openvino * Revert the testing changes * Reverted testing changes * File permessions are changed to original * Deleted openvino installation and cmake change * Optimized Dockerfile Removed unnecessary cmake installation, numpy * Added missing ifdefs * delete array fix * backend_utils.cc output_shape * Revert "set default target of MCR dockerfile to MYRIAD_FP16" This reverts commit 928d3e2b71e2f589cf51dacd3a133951cf9ca18d. Co-authored-by: suryasidd <surya.siddharth.pemmaraju@intel.com> Co-authored-by: sfatimar <sahar.fatima@intel/com> Co-authored-by: suryasidd <48925384+suryasidd@users.noreply.github.com> Co-authored-by: S. Manohar Karlapalem <manohar.karlapalem@intel.com> Co-authored-by: Aravind <aravindx.gunda@intel.com> Co-authored-by: Aravind Gunda <38353114+gundaarx@users.noreply.github.com>	2020-10-14 15:56:00 -07:00
liqunfu	773992c7d4	Liqun/bert pretrain tb (#5377 ) * add tensor board, remove torch.distributed.launch because ort nccl depends on MPI. Use MPI to launch parallel training. Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-10-06 16:28:31 -07:00
liqunfu	fe50213491	Liqun/bert pretrain2 (#5327 ) * bert single node multi GPU pretrain w/o checkpoint Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-10-01 11:01:26 -07:00
Changming Sun	17f1178c2e	Downgrade GCC (#5269 ) Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2020-09-24 21:14:54 -07:00
edgchen1	6d5b93b805	Synchronize training dependency versions between Docker image and Python wheel. (#5261 ) Synchronize training dependency versions between Docker image and wheel, update docs, refactor build scripts.	2020-09-23 19:03:42 -07:00
Xueyun Zhu	55e4b5d302	add pipeline distributed training test (#5222 ) * add pipeline distributed training test * fix max line length error in windows build * function header indent * fix * fix flake8 error	2020-09-21 14:35:01 -07:00
KeDengMS	ce3b67e0cd	[Python] Move symbolic_shape_infer from nuphar to tools (#5162 ) * [Python] Move symbolic shape inference from nuphar to tools * Fix PEP8 ERROR	2020-09-18 09:31:06 -07:00
Changming Sun	a0a435abc6	Add sympy==1.1.1 to Linux docker image (#5177 )	2020-09-15 16:08:49 -07:00
Changming Sun	c5efb0085d	Update Linux GPU build pipelines to CUDA 10.2 (#5120 ) * Update Linux GPU build pipelines to CUDA 10.2	2020-09-10 17:40:51 -07:00
Changming Sun	a5530358c9	Fix a path problem in Dockerfile.manylinux2014_cuda10_2 (#5106 )	2020-09-10 10:30:13 -07:00
RandySheriffH	5e10cde006	PipelinesForCuda11Cudnn8 (#4938 ) * cancel night build on pyop * setup win cuda11 pipeline * add debug build * test base gpu settings * setup pipelines to test cuda 10.2 and 11 * rename linux docker images * rename docker image tag and add clean up job * fix typo in cuda 11 config * set cuda11 env * update linux cuda 11 pipeline * reset docker image name * disable uninitialized warning from linux build * change the way to silence uninitialized warning * add flags to linux gpu pipeline * switch docker image for linux cuda 10.2 * switch linux cuda 10.2 image * test cuda11 with devtool8 * try latest built images Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2020-09-09 16:13:58 -07:00
Changming Sun	924ecb0623	Use manylinux2014 for Linux CPU build (#5091 )	2020-09-09 10:09:52 -07:00
Changming Sun	370d194db7	Add a docker file for CI build CUDA 10.2 (#5065 )	2020-09-04 16:28:45 -07:00
Changming Sun	d5d5e37e76	Build system enhancements (#5012 ) 1. Add a docker file for CUDA11 2. Support setting CUDA_ARCHITECTURES from command line.	2020-09-02 10:13:26 -07:00
Changming Sun	c37fa7c278	Delete Dockerfile.centos6_gpu (#4851 )	2020-08-28 09:56:52 -07:00
Rayan-Krishnan	eb05db5a2a	Fix OptimizerConfig params groups (#4877 ) * Copy samples to build folder and load models from there. Fix CI * This PR also includes a fix to path validation for save_as_onnx API * Add torchtext to CI for GPU training * Remove new frontend tests from CI Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>	2020-08-22 22:04:17 -07:00
liqunfu	6260d073b3	Glue parallel training (#4550 ) add mpi size, rank python API add single node parallel training example	2020-08-21 21:24:27 -07:00
suryasidd	3a00b50cf8	[OpenVINO-EP] Updating OpenVINO EP to 2020.4 (#4836 ) * Removed building ngraph from source * Disabled some tests temporarily * Enabled softmax for all dims * Added onnx importer to link libraries * int64 changes * fixed * temp * slice update start and end need to be initializer * Disabled GatherND, ScatterND, ReverseSequence operators * Added supported ops instead of unsupported ops * Set precision only for CPU * Removed some unecessary conditions * Fixed segfault in slice * Softmax restriction removed * changes * Setting precision for all plugins * Changes added to include precision and supported ops for gpu and vpu * branch op support * checking for disabled python test failure * mapped input names and tensors directly rather than copying which was leading to mismatch * last index is not supported mkldnn does not support pow between integers * included the code changes * Rename inner-scoped variable to avoid MSVC warning * applied changed to vadm as well and removed the utility function getinputtensors() completely * OpenVINO multi version support: CMake changes * OpenVINO multi version support: C++ support * removed commented code * Remove redundant code lines * Revert "Rename inner-scoped variable to avoid MSVC warning" This reverts commit 2f650493162675bc6fb70730de9656ec400be332. Merged separately in master. * vadm changes disabled reduction op test * putting test_gather_negative_indices in unsupported list for now * Update MCR Dockerfile with 2020.4 Installs OpenVINO 2020.4 from deb packages via APT tool. 
* Update build docs with 2020.4 info * Update dockerfile with OV 2020.4 info Instructions for building OpenVINO based docker image no longer require downloading installer package as it is installed by the dockerfile using OpenVINO 2020.4 APT package for Ubuntu 18.04 * Added constant folding bypass logic * Added cout statements for ci * Added NDEBUG flag for debug symbols * Update Ops info in docs * fixes multiple unit tests * mathoptest.ceil disabled for gpu and myriad * activation test temp disabled * Fix models for CPU * Fixed a syntax error * local cmmit * fixing unit tests for myriad * Fixed Variadic Split, Topk issues * fix_model commit * Fix models in myriad * Added ifdefs for OpenVINO 2020.4 * temp * made some changes to not operator * Added unused parameter * relu enabled * Fixed bug in Conv output * Consolidated GPU failing tests into one category * Made it compatible to InternalCI 2020.4 * Made changes for ngraph * Disabled test for mask,fastercnn,tinyyolov3 * Removed proxy for ci * run_dockerbuild.sh restored to same version * run_dockerbuild.sh restored to same version * run_dockerbuild.sh restored to same version * Updated documentation for 2020.4 * Removed FP32 to FP16 transformation for GPU * Disabled Coreml-FNS-Candy model test * Added FP16 transformations Co-authored-by: sfatimar <sahar.fatima@intel.com> Co-authored-by: Manohar Karlapalem <manohar.karlapalem@intel.com> Co-authored-by: sfatimar <sahar.fatima@intel/com> Co-authored-by: sfatimar <64512376+sfatimar@users.noreply.github.com> Co-authored-by: intel <you@example.com> Co-authored-by: gundaarx <aravindx.gunda@intel.com>	2020-08-19 23:18:08 -07:00
Thiago Crepaldi	42408aa3ed	Add new PyTorch front-end (#4815 ) * Add ORTTrainerOptions class for the new pytorch frontend (#4382) Add ORTTrainerOptions class and some placeholders * Add _ORTTrainerModelDesc to perform validation for model description (#4416) * Add Loss Scaler classes to the new frontend (#4306) * Add TrainStepInfo used on the new frontend API (#4256) * Add Optimizer classes to the new frontend (#4280) * Add LRScheduler implementation (#4357) * Add basic ORTTrainer API (#4435) This PR presents the public API for ORTTrainer for the short term development. It also validates and saves input parameters, which will be used in the next stages, such as building ONNX model, post processing the model and configuring the training session * Add opset_version into ORTTrainerOptions and change type of ORTTrainer.loss_fn (#4592) * Update ModelDescription and minor fix on ORTTrainer ctor (#4605) * Update ModelDescription and minor fix on ORTTrainer/ORTTrainerOptions This PR keeps the public API intact, but changes how model description is stored on the backend Currently, users create a dict with two lists of tuples. One list called 'inputs' and each tuple has the following format tuple(name, shape). The second list is called 'outputs' and each tuple can be either tuple(name, shape) or tuple(name, shape, is_loss). With this PR, when this dict is passed in to ORTTrainer, it is fully validated as usual. However, tuples are internally replaced by namedtuples and all output tuples will have tuple(name, shape, is_loss) format instead of is_loss being optionally present. Additionally to that normalization in the internal representation (which eases coding), two internal methods were created to replace a namedtuple(name, shape) to namedtuple(name, shape, dtype) or namedtuple(name, shape, is_loss, dtype) depending on whether the tuple is an input or output. This is necessary as ORTTrainer finds out data types of each input/output during model export to onnx. 
Finally, a minor fix was done on ORTTrainer. It could initialize ORTTrainerOptions incorrectly when options=None * Rename input name for test * Add ONNX Model Export to New Frontend (#4612) Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * Create training session + minor improvements (#4668) Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> * Save ONNX model in file (#4671) Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> * Add eval step (#4674) Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> * Add train_step (#4677) Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> * Add LR Scheduler (#4694) Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * Add deterministic compute tests (#4716) Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * Add legacy vs experimental ORTTrainer accuracy comparison (#4727) Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * Add Mixed precision/LossScaler + several fixes (#4739) Additionally to the mixed precision/loss scaler code, this PR includes: * Fix CUDA training * Add optimization_step into TrainStepInfo class * Refactor LRSCheduler to use optimization_step instead of step * Updated several default values at ORTTrainerOptions * Add initial Gradient Accumulation supported. 
Untested * Fix ONNX model post processing * Refactor unit tests * Add ONNX BERT example + minor fixes (#4757) * Fix training issue when passing ONNX file into ORTTrainer Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> * Add Dynamic Shape support (#4758) * Update DeepSpeed Zero Stage option to a separate option group (#4772) * Add support to fetches (#4777) * Add Gradient Accumulation Steps support (#4793) * Fix Dynamic Axes feature and add unit test (#4795) * Add frozen weights test (#4807) * Move new pytorch front-end to 'experimental' namespace (#4814) * Fix build Co-authored-by: Rayan-Krishnan <rayankrishnan@live.com> Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-08-17 09:45:25 -07:00
Changming Sun	5eec4f66ed	Refactor manylinux docker image and the related pipelines (#4751 ) 1. Publish the image ACR, instead of building it every time for every PR 2. Make USE_MKLML and USE_OPENMP be able to co-exist. Currently both of them are enabled in our Linux CI build but indeed only one of them is taking effect. 3. Split nuphar and DNNL to separated pipelines. 4. Fix two warnings in onnxruntime/core/optimizer/matmul_scale_fusion.cc and onnxruntime/test/tvm/tvm_basic_test.cc. 5. Update the manylinux2010_x86_64 image to the latest.	2020-08-17 09:40:31 -07:00

486 commits