Commit graph

640 commits

Author SHA1 Message Date
liqunfu
6ed12402a4
Liqun/liqun/enable pipeline parallel test2 (#6399)
* enable data and pipeline parallism test

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-01-25 15:15:26 -08:00
ashbhandare
60c772e2bc
Megatron checkpointing (#6293)
* Add bart fairseq run script

* Add frontend change to enable megatron

* Initial changes for checkpointing

* Megatron optim state loading, checkpoint aggregation, frontend distributed tests for H, D+H

* Add load_checkpoint changes

* Fix CI

* Cleanup

* Fix CI

* review comments

* review comments

* review comments:
2021-01-22 11:26:47 -08:00
Guoyu Wang
eb946c4177
Unblock Android CI code coverage failure (#6393) 2021-01-20 21:26:10 -08:00
pengwa
453431f7bb
Add max_norm for gradient clipping. (#6289)
* add max_norm as user option for gradient clipping

* add adam and lamb test cases for clip norm

* add frontend tests
2021-01-21 01:01:11 +08:00
Hariharan Seshadri
d7bdd96425
Refine auto_pad based pad computation in ConvTranspose (#6305) 2021-01-19 19:01:49 -08:00
Scott McKay
e54e2f969d
Use readelf for minimal build binary size checks. (#6338)
* Use readelf for minimal build binary size checks.
The on-disk size grows in 4KB chunks which makes it hard to see how much growth an individual checkin causes.
Only downside is that the sum of the sections is larger than the on-disk size (assumably things get packed smaller on disk and some of the section alignment constraints can be ignored)

* Remove unused function
2021-01-15 07:46:02 +10:00
Changming Sun
ea6789b754
Add PREfast to python packaging pipeline (#6343)
* Add PREfast to python packaging pipeline
2021-01-14 10:39:24 -08:00
Guoyu Wang
e35db194e3
fix the pipeline failure (#6346) 2021-01-14 00:33:22 -08:00
Edward Chen
042053c55e
Add support for running Android emulator from build.py on Windows. (#6317) 2021-01-13 19:21:49 -08:00
liqunfu
aeca96caba
Liqun/enable pipeline parallel test (#6331)
enable pipeline parallel test
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-01-13 10:24:04 -08:00
Dmitri Smirnov
6b73bae035
Java: add Semmle to Java publishing pipelines (#6326)
Add Semmle to Java API pipeline
  Add security results publishing and add Java GPU.
2021-01-12 15:12:13 -08:00
Ashwini Khade
0ed56d491a
fix opset imports for function body (#6287)
* fix function opsets

* add tests and update onnx

* changes per review comments

* add comments

* plus updates

* build fix
2021-01-12 13:44:36 -08:00
Changming Sun
c43ca45c4f
Force reinstall onnx python package on Windows (#6309) 2021-01-11 22:12:56 -08:00
Chun-Wei Chen
84024bdfa9
Enable ONNX backend test of SequenceProto input/output (#6043)
* assert sequence tensor and remove skips

* update testdata json

* use ONNX 1.8 in cgmanifest.json

* use previous commit to workaround

* update ONNX commit ID in docker

* skip test_maxpool_2d_dilations test for now

* update function name
2021-01-11 11:30:33 -08:00
Changming Sun
5084ce0969
Update nuget build (#6297)
1. Update the ProtoSrc path. The old one is not used anymore.
2. Regenerate OnnxMl.cs
3. Delete some unused code in tools/ci_build/build.py
4. Avoid set intra_op_param.thread_pool_size in ModelTests in OpenMP build.
5. Fix a typo in the C API pipeline.
2021-01-11 10:49:05 -08:00
Edward Chen
04287ec770
Increase timeout for Linux GPU CUDA11 build. (#6280) 2021-01-07 15:44:42 -08:00
baijumeswani
93bf7c4d52
Documentation for distributed CI tests pipeline (#6140) 2021-01-04 10:09:39 -08:00
Changming Sun
1685167e46
Update manylinux docker image to the latest (#6242) 2020-12-31 19:57:04 -08:00
Xavier Dupré
cd14c1af29
Support double for operator ArgMin (#6222)
* Support double for operator ArgMin
* add test specifically for double
* add new test on pai-excluded-tests.txt
2020-12-31 11:25:46 +01:00
Xavier Dupré
84addcd2cf
Support double for operator ReduceMean, ReduceLogSumExp (#6217)
* Support double for operators ReduceMean, ReduceLogSumExp
2020-12-31 11:24:54 +01:00
Changming Sun
3911105f09 Remove python 3.5 2020-12-30 20:16:45 -08:00
Changming Sun
1b23b28706
Remove MKLML/openblas/jemalloc build config (#6212) 2020-12-30 17:18:19 -08:00
sfatimar
7347996942
Openvino ep 2021.2 (#6196)
* Enabling fasterrcnn variant and vehicle detector

* changes for 2021_2 branch

* yolov3_pytorch commit

* fixed braces in basic_backend.cc

* ci information added

* faster rcnn variant and vehicle detector changes were made in 2021.1 and not in 2021.2

* some changes to support unit tests

* disable some tests which are failing

* fix myriad tests for vehicle detector

* Did some cleanup
*cleaned up comments
*Disabled Add_Broadcast_0x1 and Add_Broadcast_1x0
tests on MYRIAD_FP16 backend due to a bug
*cleaned up capability_2021_2.cc file
*Removed extra conditions which were added
for some validation in backend_utils

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* yolov3 pytorch workaround to ensure that the output names are matched

* gemmoptest fixed on myriad

* Fixed MYRIADX CPP Test Failures

*Expand,GatherND,Range,Round op's
are only supported in model

*where op with float input data
types are not supported and fixed

*Scatter and ScatterElements op's with
negative axis are fixed

*Reshape op with 0 dim value are not
supported and fixed

*Disabled InstanceNorm_2 test on MYRIADX

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* make changes to yolov3 pytorch

* Fixed python unit tests
*Fixed failing python tests on vpu,
GPU and CPU

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fixes POW op failures on GPU_FP16

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Clean up capability_2021_2.cc

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Updated docx for MultiThreading option
*Added extra info on setting the num_of_threads
option using the API and it's actual usage

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* fixed slice and removed extra prints

* Disabled failing python tests

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Minor changes added in capabilty_2021_2

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* made changes to slice to avoid failures

* Disabling FP16 support for GPU_FP32
->Inferencing an FP16 model on GPU_FP32
leads to accuracy mismatches. so, we would
rather use GPU_FP16 to infer an FP16 model
on GPU Device

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Updated docx for Inferencing a FP16 Model

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* fix for mask rcnn

* Script for installing openvino from source

* Updated with openvino 2021.2 online installation

* code comment fixes
fixed accuracy mismatch for div

* Update OpenvinoEP-ExecutionProvider.md

updated for 2021.2 branch

* Update README.md

updated dockerfile documentation

* Update BUILD.md

build.md update documentation

* permissiong change of install_openvino.sh

* made changes to align with microsoft onnxruntime changes

* Updated with ov 2021.2.200

Co-authored-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
Co-authored-by: sfatimar <sahar.fatima@intel/com>
Co-authored-by: MaajidKhan <n.maajidkhan@gmail.com>
Co-authored-by: mohdansx <mohdx.ansari@intel.com>
2020-12-23 08:47:22 -08:00
satyajandhyala
201d0dbb1a
Android coverage dashboard (#6163)
* Write the report to a file.

* Post code coverage to the Dashboard database.
2020-12-21 10:34:01 -08:00
Edward Chen
cd3a5acca0
Update get_docker_image.py to enable use without image cache container registry. (#6177)
Update get_docker_image.py to enable use without image cache container registry.
2020-12-18 19:01:02 -08:00
Changming Sun
344a2a8ee5
Revert "work around of the build break in mac (#6069)" (#6150)
This reverts commit 3cae28699b.
2020-12-16 14:41:18 -08:00
Sheil Kumar
a6a23db130
Enable C# .NET5 for WinML (#6120)
* build for .net5

* only reference cswinrt for .net5

* remove netstandard2.0 references

* upgrade language version

* net5

* remove extra comment closure

* add targetframework

* set target framework

* remove net*

* pep8 errors

* make test project build with .net windows SDK projection

* disable c# builds for non-x64 builds

* fix pep8 errors

* disable for store build

* fix tests

* remove cswinrt and sdk references from package

* bump cswinrt down to 1.0.1

* fix bin path

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2020-12-14 15:05:15 -08:00
liqunfu
cde723a136
Liqun/move nightly pl to linux multi gpu v100 (#6024)
* move e2e nightly pipeline to azure devop
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-12-14 12:43:41 -08:00
Vincent Wang
7ddeafdfcc
Add ReduceL2Grad and ClipGrad (#5970)
* ReduceL2Grad and ClipGrad.

* fix win build and amd ci pipeline

* resolve comments.

Co-authored-by: Vincent Wang <weicwang@AiFramework2080ti2.corp.microsoft.com>
2020-12-10 11:03:26 +08:00
Edward Chen
e357486707
Fix build definition template typo, add logging (#6065)
Fix a typo in tools/ci_build/github/azure-pipelines/templates/get-docker-image-steps.yml.
Add logging to tools/ci_build/get_docker_image.py for easier debugging.
2020-12-08 15:16:50 -08:00
satyajandhyala
f68a256140
Android code coverage (#6061)
* Added Onnxruntime_GCOV_COVERAGE flag for Android.

* Set CMAKE_SYSTEM_NAME explicityly for Android.

* Added GCOV_PREFIX option to collect code coverage data.
Added a new python script to generate code coverage info.
Modified build pipeline to geneate Android code coverage info

* Added build command line option --android_coverage

* Added a comment describing the GCOV environment variables

* Fixed PEP8 issues.

* Added --android_coverage option to the build command.

* Increased Android emulator memory from 3K to 8K.

* Increased Android partition-size from 2GB to 4GB to overcome no-space-left-on-device error

* Removed source_dir from command line args.

* Use cwd absolute path to run tests.

* Added commands to output the contents of /data/local/tmp on the emulator.

* Added run_adb_shell function.

* Format changes.

* Removed keywd argument cwd.

* Removed Android in the --build_dir path.

* Removed commands added for debugging.

* Removed exxtra new-lines.

* Fix MacOs build pipeline failures by uninstalling openssl before running build script.

* Revert "Fix MacOs build pipeline failures by uninstalling openssl before running build script."

This reverts commit 90d0568fe533e9456c20d061a2d435c8fea48266.

* Change dir to the build directory where the tar file is copied.

* Changed the option from --android_coverage to --code_coverage

* Moved steps to generate Android code coverage to run_nnap_code_coverage.sh

* Require --android option if --code_coverage is specified.

* No code coverage needed for onnx_test_runner.

* Expect that the emulator is running when the script is executed.

* Fixed the title in the buildpipeline step.

* Fixed the formatting issue.

* Added a command line argument, ORT_ROOT, to run_nnapi_code_coverage.sh script

Co-authored-by: Satya Jandhyala <satyajandhyala@Satyas-Mac-mini.local>
2020-12-08 10:55:02 -08:00
Suffian Khan
e35211c0ff
Fix AMD GPU pipeline by adjusting reference /opt/rocm-3.9.0 => /opt/rocm (#6063)
* use /opt/rocm instead

* fix indent
2020-12-08 08:53:20 -08:00
Yufeng Li
3cae28699b
work around of the build break in mac (#6069)
* Fix the build break in macos release

* revert android change
2020-12-07 20:39:36 -08:00
Edward Chen
b348538c8a
Update build docker image cache cleanup (#6048)
The current image cache cleanup is not removing many images. Upon examining the cache container registry logs, it appears there are some infrequent pulls of old images which may be made by something other than CI builds (perhaps some automated scan of the registry).
This change adds a minimum access count for images in the cache so that infrequently but periodically accessed images can be removed. The idea is that images used by CI builds that are worth caching will have a higher volume of accesses.
2020-12-07 13:07:19 -08:00
Changming Sun
925879a8b0
Remove python 3.8 Windows GPU build from python packaging pipeline (#6054)
Revert the last a few changes to get the pipeline back to a normal state.
2020-12-07 10:23:07 -08:00
Edward Chen
d8139814fd
Clean up builds (#6015)
Update training Python packaging build to use get_docker_image.py.
Remove BUILD_EXTR_PAR docker build argument.
Update get_docker_image.py to check again for the image in the cache after building and before pushing to reduce the chance of a redundant push.
2020-12-04 15:13:17 -08:00
Jesse Benson
98ea7372d3 Re-enable Lamb unit tests for AMD 2020-12-03 13:06:34 -08:00
Edward Chen
6572a4d306
Disable Python 3.9 for training Python packaging build. (#6012)
Disable Python 3.9 for training Python packaging build. Python 3.9 is not supported by the PyTorch dependency.
2020-12-03 11:42:28 -08:00
Edward Chen
6d642a3dba
Replace direct pulls from image cache container registry with get_docker_image.py, build definition clean up. (#5906) 2020-12-01 19:10:23 -08:00
Changming Sun
2d9dcc4576
Add python 3.9 support (#5874)
1. Add python 3.9 support(except Linux ARM)
2. Add Windows GPU python 3.8 to our packaging pipeline.
2020-11-30 12:02:48 -08:00
Changming Sun
5fdd9f0fd2
Fix Python Linux GPU package name (#5943)
Fix Python Linux GPU package name. I accidentally added "noopenmp" to it.
2020-11-25 17:46:11 -08:00
Edward Chen
7546d251e0
Expose parameters in clean build Docker image cache build. (#5941)
Expose some parameters in the clean build Docker image cache build. In particular, whether to do a dry-run and the lifetime of unused cache images.
2020-11-25 14:15:54 -08:00
Ashwini Khade
705d093167
Update onnx (#5720)
* update onnx

* update docker image for testing
2020-11-24 11:20:15 -08:00
Suffian Khan
9b8189dd0a
Rework AMD CI pipeline to use pool AMD-GPU and disable more tests in order to enable it. (#5885)
Move AMD test pipeline to use self-hosted pool AMD-GPU. For time being, remove failing/flaky unit tests for AMD pipeline.
2020-11-24 09:38:14 -08:00
Guoyu Wang
846c5fb917
Report arm64 minimal baseline binary size only for continuous integration (#5913)
* Report binary size only for continuous integration
2020-11-24 20:24:08 +10:00
Guoyu Wang
4137c18d9b
Add ORT minimal with NNAPI EP to Android CI (#5890)
Description: Add ORT minimal with NNAPI EP to Android CI

Motivation and Context

The added build/test to Android CI will only run UT, additional onnx_test_runner with customer .ort models will be added later
2020-11-23 18:21:34 -08:00
Edward Chen
5e8fcda24a
Build docker image cache fixes. (#5902)
Fix Python 3.5 compatibility issue in tools/ci_build/get_docker_image.py.
Fix line endings in tools/ci_build/github/azure-pipelines/clean-build-docker-image-cache-pipeline.yml.
2020-11-23 14:43:12 -08:00
baijumeswani
208f4c1d3c
Azure ci pipeline for distributed environment tests (#5881) 2020-11-23 14:01:00 -08:00
satyajandhyala
353e071b7e
Fuzz testing misc (#5862)
* Run only required steps relevant to fuzz testing.

* Exit status non-zero for any uncaught exception other than ort_exception in the driver code
Co-authored-by: Satya Jandhyala <sajandhy@microsoft.com>
2020-11-23 13:43:44 -08:00
Guoyu Wang
26e6ced172
Temporary fix for Android CI failure (#5889)
* Unblock the Android CI

* Add python to android ci's command
2020-11-21 17:58:32 +10:00
Dmitri Smirnov
ceedf5630b
Document all C# API pubic interfaces (#5853)
Address documentation shortcomings.
 Document all required public interfaces.
 Add pipeline configuration.
Make Doxygen lookup a env vars for paths.
2020-11-20 14:03:55 -08:00
Edward Chen
bef06dac93
Automatically clean up build docker image cache. (#5843)
Follow up to #5811 to automate cleanup of the build docker image cache.
Added a script and build definition to clean up docker images that haven't been accessed recently.
2020-11-20 11:56:26 -08:00
S. Manohar Karlapalem
ff58f621fa
Remove nGraph Execution Provider (#5858)
* Remove nGraph Execution Provider

Pursuant to nGraph deprecation notice: https://github.com/microsoft/onnxruntime/blob/master/docs/execution_providers/nGraph-ExecutionProvider.md#deprecation-notice

**Deprecation Notice**

| | |
| --- | --- |
| Deprecation Begins	| June 1, 2020 |
| Removal Date |	December 1, 2020 |

Starting with the OpenVINO™ toolkit 2020.2 release, all of the features
previously available through nGraph have been merged into the OpenVINO™
toolkit. As a result, all the features previously available through
ONNX RT Execution Provider for nGraph have been merged with ONNX RT
Execution Provider for OpenVINO™ toolkit.

Therefore, ONNX RT Execution Provider for **nGraph** will be deprecated
starting June 1, 2020 and will be completely removed on December 1,
2020. Users are recommended to migrate to the ONNX RT Execution Provider
for OpenVINO™ toolkit as the unified solution for all AI inferencing on
Intel® hardware.

* Remove nGraph Licence info from ThirdPartyNotices.txt

* Use simple Test.Run() for tests without EP exclusions

To be consistent with rest of test code.

* Remove nGraph EP functions from Java code
2020-11-19 16:47:55 -08:00
Hariharan Seshadri
62508ef0e4
Revert "Remove MKLML build config (#5559)" (#5855) 2020-11-19 10:53:08 -08:00
Changming Sun
26db396b4b
Reduce the number of CI build variants (#5856) 2020-11-18 20:41:30 -08:00
satyajandhyala
b495ae8103
ORT fuzz testing (#5771)
* Added fuzz testing using ORT model.

* The onnxruntime_security_fuzz driver code should accept either ONNX or ORT (based on the file extension) input file if /f flag is provided.

*  Added ValidateOrtFormatModelDoesNotRunOptimizersInFullBuild test.

* Added win-ci-fuzz-testing.yml to run build pipeline.

* Prevent out-of-range access in the graph.cpp.
2020-11-18 16:07:36 -08:00
Changming Sun
85f945a875
Regenerate CI build docker images (#5850) 2020-11-18 14:36:59 -08:00
Edward Chen
71e7c2b423
Cache build docker images in container registry. (#5811)
This PR adds infrastructure to automatically cache docker images used in CI builds in a container registry.

Currently, build images are pulled from a container registry for some builds and built every time for others. The container registry requires maintenance to keep the images up to date and building images every time wastes build agent resources.

With this change, a given build image can be looked up in a cache container registry and if present, pulled, and otherwise, built and pushed. The uniqueness of a build image is determined by a hash digest of the dockerfile, docker build context directory, and certain "docker build" options. This digest is part of the image tag in the cache container repository.

The cache container registry will need to be cleaned up periodically. This is not automated yet.
2020-11-17 17:02:24 -08:00
Justin Stoecker
bd236ecc26
Switch to unified DirectML 1.4.0 redistributable (#5794)
Transitions from the ORT-only DML NuGet (hosted on the onnxruntime_public feed) to the new unified DirectML NuGet (Microsoft.AI.DirectML) on nuget.org. In addition, the Microsoft.AI.MachineLearning (WinML) and Microsoft.ML.OnnxRuntime.DirectML packages now take a dependency on the Microsoft.AI.DirectML package. This means we can remove the extra copy of DML binaries in these packages since they will be installed by the DML package.
2020-11-17 13:42:23 -08:00
Scott McKay
c84bc25e28
Add validation of op registrations (#5817)
* Add validation of operator registrations to the reduction script
  - the script has all the logic to process the registrations, and there's a CI that uses it

Fix some operator registrations

* Fix CUDA PRelu registration

* Refactor to split out kernel registration file parsing and use in the exclude ops script and an op registration validation script.
Run op validation in minimal build CI

* Fix PEP8 error and some comments
2020-11-17 10:44:09 -08:00
Sherlock
241b2226a7
Update orttraining-linux-gpu-ci-pipeline.yml for Azure Pipelines (#5826) 2020-11-17 09:27:59 -08:00
Guoyu Wang
1a66dfc0f9
Enable Squeeze Opset 13 for NNAPI (#5717)
* Add copy sparse model in minimal CI

* Add squeeze 13 support

* fix small typo

* Add ut for squeeze in NNAPI

* Fix some issue in the UT and code

* Modify based on the master change

* Fix build break
2020-11-17 00:26:06 -08:00
edgchen1
4d517c68a3
Fix reference to old download_e2e_test_data.py script. It was renamed to download_azure_blob.py. (#5790) 2020-11-12 15:48:06 -08:00
liqunfu
1416d12f0b
Liqun/merge e2e pipelines (#5702)
* Create an Azure Pipeline to merge cpp and python e2e pipelines into one. Still keep cpp 2e2 pipeline until this new pipeline is stable.

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-11-11 09:42:08 -08:00
Changming Sun
79350a642a
Update install_deps.sh: remove the unnecessary data generating step (#5758)
We install onnx python package from this script, so python tests can run the tests for the latest commit which we are importing.
2020-11-10 22:19:03 -08:00
Changming Sun
4094a09a56
Merge pull request #5731 from microsoft/snnn/rtti
Disable RTTI in Windows GPU CI pipeline
2020-11-10 09:02:59 -08:00
Edward Chen
919c270f3c Increase build timeouts. 2020-11-09 22:26:27 -08:00
Chi Lo
92292de135
Tensorrt perf tool (#5436)
* Add YAML file for pipeline

* Modify typo

* Add working directory

* Modify and test

* Modfiy and test

* Modify and test

* Modify and test

* Modify

* Modify

* Modify

* Modify

* Make sure to copy all the result files

* Add clearn up

* Modify

* Modify agent pool name

* Upload only specific artifacts

* Modify

* Integrated CI Pipeline for running TRT perf as well as added the “large amount of models” into perf model target

* Fix bug

* Fix bug

* Add reading the information regarding previously known failing models
and then skip testing them during benchmark/validation

* Modify the script file for CI

* Replace print with logger.info

* Fix bug

* Fix bug

* Refine the code

* Modify the script so that it can capture script segmentation fault while
running ORT

* Fix bug

* fix bug

* fix bug

* Add debug info

* fix bug

* Refine perf code

* Refine the code

* fix bug

* Code refactoring

* change many-models path

* remove metadata after validation/benchmark are done

* Update README.md

* Fix bug so that metadata doesn't hold stale value

* Remove hardcode and update README

* Add arguments to the script to make it run correctly

* Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines

* Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines

* Fix bug so that metadata doesn't hold stale value

* Fix small bug of finding test dataset directory for FP16 test data, as
well as modification of some output information

* use -i random for perf test of TRT changes

Co-authored-by: Olivia Jain <oljain@microsoft.com>
2020-11-06 12:27:42 -08:00
RandySheriffH
71f90e08f1
Nuget packaging no omp (#5666)
* create new nuget packaging pipeline without openmp

* rename package

* update image name

* rename package name

* rename managed package

* reset project attribute

* merge master

* set package name

* set NoOpenMP as cpu build

* shorten line length

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2020-11-06 11:43:35 -08:00
Tiago Koji Castro Shibata
9e68e98423
Add static CRT DLLs to Nuget package (#5661)
* Add static runtime yaml option

* Add to WAI Nuget build matrix

* Support empty build flags

* Add DML to x64

* Bundle static rt

* Bundle after Nugets are built

* Fix typo

* Skip static tests

* Pack test artifact only in x64 dynamic

* No DML static runtime

* Add Store static

* Revert "Add Store static"

This reverts commit 69133e5838.

* Static subfolder
2020-11-05 09:26:17 -08:00
Changming Sun
357a51c75c
Update python packaging pipeline's docker image (#5680) 2020-11-03 12:01:36 -08:00
Ashwini Khade
1cca903680
update onnx commit id (#5594)
* update onnx commit id

* update onnx commit for docker images

* update docker images
2020-11-02 09:46:36 -08:00
Weixing Zhang
aec4cb489e
ROCm EP for AMD GPU (#5480)
The ROCm EP is designed and implemented based on AMD GPU software stack named ROCm. Here is the link for the details about ROCm: https://rocmdocs.amd.com/en/latest/

ROCm EP was created based on the following things:
1. AMD GPU programming language: HIP
2. AMD GPU HIP language runtime: amdhip64
3. BLAS: rocBLAS, hipBLAS
4. DNN: miOpen
5. Collective Communication library: RCCL
6. cub: hipCub
7. …

Current status:
BERT-L and GPT2 training can be ran on AMD GPU with data parallel.

Next:
1. Make more GPU code be sharable between ROCm EP and CUDA EP since HIP language and HIP runtime API are very close to CUDA.
2. Continue improving the implementation.
3. Continue GPU kernel optimization.
4. Support model parallelism on ROCm EP.
……

The rocm kernels have been removed from this commit and will be in a separate PR. Since the original PR was too big(~180 files), it was suggested to split the PR into two parts, one is rocm-kernels, the other is non rocm kernels.  

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
Co-authored-by: sabreshao <sabre.shao@amd.com>
Co-authored-by: anghostcici <11013544+anghostcici@users.noreply.github.com>
Co-authored-by: Suffian Khan <sukha@microsoft.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2020-10-29 17:13:04 -07:00
Changming Sun
e6956be40c
Publish no-openmp python packages to test pypi (#5610)
Publish no-openmp python packages to test pypi
2020-10-28 19:49:53 -07:00
liqunfu
92662659ba
Liqun/remove number matching (#5606)
replace number matching with relaxed comparison in frontend tests
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-10-27 21:27:37 -07:00
Changming Sun
5802fe1699
Remove MKLML build config (#5559)
Remove MKLML build config
2020-10-21 13:11:25 -07:00
Ashwini Khade
df22611026
Update ONNX commit (#5487)
* update ONNX

* update onnx + register kernels for reduction ops

* bug fix kernel reg

* update cgmanifests

* revert unsqueeze op 13 registration

* filter ops which are not implemented yet

* filter some tests

* update onnx commit to include conv transpose bug fix

* update docker images

* undo not required test changes

* fix test failures
2020-10-21 07:22:20 -07:00
Guoyu Wang
915d475353
Android CI update (#5474)
* Update Android CI

* update comments
2020-10-14 16:56:50 -07:00
sfatimar
6d2a30eae3
[OPENVINO-EP] 2021.1 Release (#5431)
* Cmake changes for 2021.1

* added new ov version 2020.1 for faster rcnn

* Added missing defs

* equal op modified

* changes to incoroporate faster rcnn

* backend util.cc

* hddl_plugin_config.hpp is depreceated . instead use hddl_config.hpp

* changing myriad precision bool to i32

* gather is not enabled for gpu

* conv2D and pooltest auto_pad attribute should not be null

* negative indices are not valid for scatter op in myriad

* non max suppression op only supported in faster rcnn mode

* maxpool indices output is not supported

* Cleaned redundant code in backends

* Added ifdefs for HDDL config

* cast output dimensions check
topk operator k input it seems only resolved for myriad as it is
throwing issues for ask rcnn . need to verify

* we are limiting the subgraph size to 3 here

* taking care of review comments

* Fixed minor bugs

* Modified Slice op checks
* Added NonZero, Upsample
* Removed TopK if it's in the middle of a subgraph

* incorporated upsample conditions too

* Dockerfile changes for 2021.1 release

* dockerfile aptkey update

* Minor fixes

* ceil condition added  again

* Fixed few gpu models

* Disabled LSTM and yolov3 in ModelTests

* python softmax cross entropy tests and negative log likelihood

* Update Build.md

Updated for openvino 2021.1

* Update OpenVINO-ExecutionProvider.md

update openvino execution provider for 2021.1

* Update READMe.md

updated new openvino version

* Update Dockerfile.openvino 

added environment variable for DEBIAN Frontend

* Fixed myriad models

* Fixed gather condition
* Fixed mask rcnn model on myriad

* Modified Gather condition

* set default target of MCR dockerfile to MYRIAD_FP16

* Fixed tinyolov3 on CPU

* Update OpenVINO-ExecutionProvider.md

update openvino execution provider documentation

* Update Dockerfile.openvino

Removed environment variable

* Update OpenVINO-ExecutionProvider.md

update image manipulation networks supported

* Update onnx_backend_test_series_filters.jsonc

removed test_upsample_nearest from cpu test cases

* New InternalCI changes for 2021.1

* Full protobuf removed for OpenVINO

* Protobuf added

* Updated with apt installation for openvino

* Revert the testing changes

* Reverted testing changes

* File permessions are changed to original

* Deleted openvino installation and cmake change

* Optimized Dockerfile

Removed unnecessary cmake installation, numpy

* Added missing ifdefs

* delete array fix

* backend_utils.cc output_shape

* Revert "set default target of MCR dockerfile to MYRIAD_FP16"

This reverts commit 928d3e2b71e2f589cf51dacd3a133951cf9ca18d.

Co-authored-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
Co-authored-by: sfatimar <sahar.fatima@intel/com>
Co-authored-by: suryasidd <48925384+suryasidd@users.noreply.github.com>
Co-authored-by: S. Manohar Karlapalem <manohar.karlapalem@intel.com>
Co-authored-by: Aravind <aravindx.gunda@intel.com>
Co-authored-by: Aravind Gunda <38353114+gundaarx@users.noreply.github.com>
2020-10-14 15:56:00 -07:00
Pranav Sharma
c2c78399ee
Include config keys header file in the release packages for Linux and Mac. (#5388) 2020-10-08 15:00:29 -07:00
Changming Sun
09aef240d6
Skip running onnx tests in python mac os pipeline (#5416) 2020-10-08 11:49:28 -07:00
liqunfu
773992c7d4
Liqun/bert pretrain tb (#5377)
* add tensor board, remove torch.distributed.lanuch because ort nccl depends on MPI. Use MPI to launch parallel training.

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-10-06 16:28:31 -07:00
Wenbing Li
4721729fdc
Enable iOS CI pipeline (#5360)
* add the ios ci build.

* no dependency on mac ci pipeline.

* fix the command line.

* keep sync

* automatically retrieve sdpath

* fix the case errors and warnings

* fix the vlog switch issue.

* add parallel flag for build.

* update the display name of the pipeline.
2020-10-02 20:14:45 -07:00
Guoyu Wang
9df0790856
Update linux minimal CI to report Android mininal baseline binary size (#5361)
* Update linux minimal CI to report Android mininal baseline binary size

* Fix some issues in the script
2020-10-02 17:35:23 -07:00
edgchen1
d62873a331
Docker image release build updates (#5326)
- Update docker image release build to use build commit.
- Use valid default in component governance detection step.
- Use smaller docker build context.
2020-10-01 12:25:31 -07:00
liqunfu
fe50213491
Liqun/bert pretrain2 (#5327)
* bert single node multi GPU pretrain w/o checkpoint

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-10-01 11:01:26 -07:00
Changming Sun
17f1178c2e
Downgrade GCC (#5269)
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2020-09-24 21:14:54 -07:00
Dmitri Smirnov
89742411ec
Insert telemetry template into GPU build, add telemry build switches. (#5278) 2020-09-24 17:13:09 -07:00
edgchen1
6d5b93b805
Synchronize training dependency versions between Docker image and Python wheel. (#5261)
Synchronize training dependency versions between Docker image and wheel, update docs, refactor build scripts.
2020-09-23 19:03:42 -07:00
suffian khan
417929b049 jobs timeout .. 2020-09-21 21:51:59 -07:00
Xueyun Zhu
55e4b5d302
add pipeline distributed training test (#5222)
* add pipeline distributed training test

* fix max line length error in windows build

* function header indent

* fix

* fix flake8 error
2020-09-21 14:35:01 -07:00
Guoyu Wang
78a29aebbc
[ORT Mobile] ORT Minimal E2E CI (#5200)
* Modify the ort minimal CI to ort minimal e2e ci
2020-09-19 18:43:22 +10:00
KeDengMS
ce3b67e0cd
[Python] Move symbolic_shape_infer from nuphar to tools (#5162)
* [Python] Move symbolic shape inference from nuphar to tools

* Fix PEP8 ERROR
2020-09-18 09:31:06 -07:00
liqunfu
f37e1292a1
--shm-size=1024m to fix nccl shared memory issue (#5214)
* --shm-size=256m to fix nccl shared memory issue

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-09-17 17:21:47 -07:00
Guoyu Wang
8156e0dd10
[ORT Mobile] Some updates to iOS/Android build settings (#5184)
* Update android CI and build settings

* add build_java to arm64 also

* Add ios signing param

* fix a small build warning

* address pr comments
2020-09-17 15:53:14 -07:00
Tiago Koji Castro Shibata
1a2e289d2d
Fix nuget build (#5163)
* Fix nuget content

* Revert "Fix nuget content"

This reverts commit e2cdcec4e39964c50eac2fb306c7a4bb84352443.

* Nuget packaging

* skip tests

* msbuild path

* Force msbuild version

* Workaround https://github.com/NuGet/Home/issues/7621

* cleanup
2020-09-16 10:37:09 -07:00
Changming Sun
a0a435abc6
Add sympy==1.1.1 to Linux docker image (#5177) 2020-09-15 16:08:49 -07:00
Scott McKay
089789c135
Revert change to disable support for loading ORT format models in the packaging pipelines. (#5168) 2020-09-15 15:11:06 +10:00
RandySheriffH
1dde215d96
promote cuda version on packacking pipelines (#5154)
* promote cuda version on packacking pipelines

* fix cudnn version in py packaing template

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2020-09-14 21:09:09 -07:00
RandySheriffH
9392aa2f64
Promote Cuda version to 10.2 for windows pipelines (#5138) 2020-09-13 20:32:06 -07:00