Commit graph

986 commits

Author SHA1 Message Date
liqunfu
dd8ef4409a
Liqun/migrate perf test (#6733)
move ort training perf tests to azure devops
2021-02-17 17:48:47 -08:00
Thiago Crepaldi
3184c47ad1 Merge branch 'master' into thiagofc/merge-from-master 2021-02-17 11:49:52 -08:00
Changming Sun
0be5475de6
Update packaging pipelines(#6664) 2021-02-17 09:53:36 -08:00
Changming Sun
46c06f6ac7
Change Windows GPU CI pipeline to CUDA11 (#6616) 2021-02-17 09:44:44 -08:00
RandySheriffH
9043df8b66
Deprecate OMP from nuget pipeline (release:1.7) (#6671)
* deprecate OMP from nuget

* remove omp build

* remove

* add openmp build

* add variants

* rename package

* move GPU to no omp pipeline

* reset path

* switch to abs path

* reset path

* add cpu package

* remove obsolete name

* set package name

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2021-02-17 00:03:44 -08:00
baijumeswani
01dfa8e125
Support non tuple return values from torch.nn.module (#6660)
* Support dictionary, namedtuples and huffingface ModelOutput type for model return values
2021-02-16 20:48:32 -08:00
Scott McKay
02c7873b0e
Update ORT model conversion script to support custom ops (#6701)
* Add support for custom ops library to the ORT model conversion script
Simplify model conversion now that we read ops from the ORT format model.
Enable custom ops in the python bindings if custom ops are turned on in a minimal build.
* Add test of model conversion involving custom ops.
2021-02-17 12:52:39 +10:00
RandySheriffH
c36ee4bd40
Rename Python packaging pipelines (#6682)
* rename pipelines

* resync and rename

* resync master

* rename package id

* remove OrtPackageId which is for nuget

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2021-02-16 11:43:03 -08:00
RandySheriffH
497eef8d3d
remove omp (#6675)
Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2021-02-16 11:42:32 -08:00
Changming Sun
d48a4c0a54
Add CG step to nuget GPU pipeline (#6678) 2021-02-16 09:46:20 -08:00
Changming Sun
9a01174037
Disable some unit tests for training (#6699) 2021-02-16 09:45:59 -08:00
RandySheriffH
df3d6bad5f
Deprecate OMP from Python package (#6610)
1. For previous openmp build, remove --use_openmp, so thread pool will become default;
2. For previous non-openmp build, add --use_openmp and rename the package to indicate the inclusion.
3. Add a mac build with openmp enabled.
2021-02-12 21:50:41 -08:00
Scott McKay
25f7c93504
Require explicit inclusion of custom op support in a minimal build (#6663)
* Remove support from custom ops from the base minimal build as they contribute too much binary growth to an Android build.
Add ability to explicitly enable custom op support in a minimal build.
Change one minimal build CI to test adding custom op support (unit tests are run in that build to validate)
2021-02-13 12:42:33 +10:00
Changming Sun
dd50c39ac6
Change Linux python packaging pipeline compile flags (#6668) 2021-02-12 15:28:56 -08:00
Sheil Kumar
87cb6fd495
Add LearningModelBuilder to WinML Experimental Namespace along with various Audio operators (#6623)
* model building

* fix build

* winml adapter model building api

* model building

* make build

* make build again

* add model building with audio op

* inplace and inorder fft

* add ifft

* works!

* cleanup

* add comments

* switch to iterative rather than recursive and use parallelization

* batched parallelization

* fft->dft

* cleanup

* window functions

* add melweightmatrix op

* updates to make spectrogram test work

* push latest

* add onesided

* cleanup

* Clean up building apis and fix mel

* cleanup

* cleanup

* naive stft

* fix test output

* middle c complete

* 3 tones

* cleanup

* signal def new line

* Add save functionality

* Perf improvements, 10x improvement

* cleanup

* use bitreverse lookup table for performance

* implement constant initializers for tensors

* small changes

* add matmul tests

* merge issues

* support add attribute

* add tests for double data type windowfunctions and minor cleanup

* stft onesided/and not tests

* cleanup

* cleanup

* clean up

* cleanup

* remove threading attribute

* forward declare orttypeinfo

* warnings

* fwd declare

* fix warnings

* 1 more warning

* remove saving to e drive...

* cleanup and fix stft test

* add opset picker

* small additions

* add onnxruntime tests

* add signed/unsigned

* fix warning

* fix warning

* finish onnxruntime tests

* make windows namespace build succeed

* add experimental flag

* add experimental api into nuget package

* add experimental api build flag and add to windows ai nuget package

* turn experimental for tests

* add minimum opset version to new experimental domain

* api cleanup

* disable ms experimental ops test when --ms_experimental is not enabled

* add macro behind flag

* remove unused x

* pr feedback

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2021-02-12 14:17:10 -08:00
Suffian Khan
e6de0eb813
Add nightly pipeline for MI100 to run convergence and batch size test similar to V100. (#6611)
* Partial updating of ROCM reduction code.

* Update reduction_all.cu

* Add reduce template parameters.

* miopen common

* Reuse CUDA's reduction_functions.cc

* Reduction ops.

* Update remaining reduction ops to use MIOpen.  double datatype is not supported, so disable those typed kernels.

* Disable a couple more unsupported tests.

* Code formatting.

* Delete ROCM-specific reduction code that is identical to CUDA reduction code.

* Fix scratch buffer early free.

* Fix merge conflict.

* first attempt nightly amd ci pipeline

* try fix bad yaml file

* try again with corrected model directory

* add convergence test as well

* update reference loss for amd mi100

* include mi100 test results csv

* update the mi100  convergence test reference values

* update batch sizes for mi100 32g

* fix gpu sku for run_convergence_test.py

* undo unrelated changes to master

* pr comments

* pr comment

Co-authored-by: Jesse Benson <jesseb@microsoft.com>
2021-02-12 13:22:06 -08:00
Changming Sun
8378a45ae7
Add python 3.8/3.9 support for Windows GPU and Linux ARM64 (#6615)
Add python 3.8/3.9 support for Windows GPU and Linux ARM64

Delete jemalloc from cgmanifest.json.

Add onnx node test to Nuphar pipeline.

Change $ANDROID_HOME/ndk-bundle to $ANDROID_NDK_HOME. The later one is more accurate.

Delete Java GPU packaging pipeline

Remove test data download step in Nuget Mac OS pipeline. Because these machines are out of control and out of our network, it's hard to make it reliable and the data secure.

Fix a doc problem in c-api-artifacts-package-and-publish-steps-windows.yml. It shouldn't copy C_API.md, because the file has been moved into a different branch.

Delete the CI build docker file for Ubuntu cuda 9.x and Ubuntu x86 32 bits

And, due to some internal restrictions, I need to rename some of the agent pools
2021-02-11 16:43:35 -08:00
Changming Sun
042964f633 Change how ONNX get installed 2021-02-10 14:41:26 -08:00
Edward Chen
e59cb9455e
Add CI build with type reduction enabled (#6622) 2021-02-10 13:31:51 -08:00
Changming Sun
e70344e648
Fix training python packaging pipeline (#6613)
In a previous PR, I set the docker file name to a wrong value.
2021-02-09 11:04:39 -08:00
Changming Sun
0b89f931d0
Update CUDA build configs (#6598)
1. Fix Nuget package build break caused by #6225
2. Delete Dockerfile.centos_gpu. It is not used anywhere.
3. Fix Linux CUDA 10.2 build error caused by glibc upgrade
2021-02-08 22:55:42 -08:00
Xavier Dupré
d3a2c8c1c7
Support double for operators ReduceMax, ReduceMin (#6265)
* Support double for operators ReduceMax, ReduceMin

* add unit test to pai-excluded-tests.txt

Co-authored-by: xavier dupré <xavier.dupre@gmail.com>
2021-02-08 19:14:26 -08:00
Randy Shuai
ff063309b0 enable omp for debug build 2021-02-08 19:10:13 -08:00
Randy Shuai
6c5f50d00e deprecate omp in ci 2021-02-08 19:10:13 -08:00
Jesse Benson
d18aa45b46 Enable more ROCM ops that are sharing CUDA code. Some are needed for Turing NLG models. 2021-02-06 14:40:34 -08:00
Changming Sun
b5bd14fc9f
Update GPU packaging pipelines to cuda11 and fix the other build break issues (#6585)
Update gpu packaging pipelines to CUDA11

In the next release we will use CUDA 11. And our CUDA 11 build suddenly became broken because recently CentOS 7 posted an update of glibc. The version of glibc was changed from 2.17-317.el7 to 2.17-322.el7_9. But the newer one isn't compatible with CUDA 11. We have to downgrade it.
2021-02-05 16:58:37 -08:00
Chun-Wei Chen
f2ce3aae13
add set_model_dir and update ONNX (#6119) 2021-02-05 09:30:49 -08:00
Jesse Benson
21a47ec8d9 Disable a couple more unsupported tests. 2021-02-04 15:00:05 -08:00
Jesse Benson
0b147702af Update remaining reduction ops to use MIOpen. double datatype is not supported, so disable those typed kernels. 2021-02-04 15:00:05 -08:00
Jesse Benson
a28ddb85b6 Reduction ops. 2021-02-04 15:00:05 -08:00
Changming Sun
aa31ba5774
Merge CPU packaging pipelines (#6480)
1. Merge Nuget CPU pipeline, Java CPU pipeline, C-API pipeline into a single one.
2. Enable compile warnings for cuda files(*.cu) on Windows.
3. Enable static code analyze for the Windows builds in these jobs. For example, this is our first time scanning the JNI code.
4. Fix some warnings in the training code.
5. Enable code sign for Java. Previously we forgot it.
6. Update TPN.txt to remove Jemalloc.
2021-02-04 08:38:56 -08:00
Guoyu Wang
6cf54ff296
Switch Android CI java build to JDK 11 (#6552)
* switch to jdk11

* fix java

* Update
2021-02-03 17:49:23 -08:00
baijumeswani
62ac164279
Cache datasets on CI machines (#6525) 2021-02-02 21:11:35 -08:00
ashbhandare
85434273ff
Fix CUDA Reduction kernel for ArgMax/ArgMix for when reduction dim=1 (#6490)
* Fix for when reduction dim=1

* Disable test for AMD GPUs

* Specify Async
2021-02-02 09:50:16 -08:00
Thiago Crepaldi
8a890ddfd7
Sync ORTModule branch with master and fix tests (#6526)
* Deprecate Python global configuration functions [Part 1] (#5923)

Enable options to be set via execution provider (EP)-specific options and log deprecation warning from current global configuration functions.

* remove dnnl_dll_path from post build copy (#6142)

* Model Fusion For Bart (#6105)

Fusion fix for Bart models

* Unify IExecutionProvider and IExecutionProviderFactory interfaces (#6108)

* Remove Provider_IExecutionProvider and make the internal IExecutionProvider usable by shared providers
* Change Provider_IExecutionProviderFactory to be the core version.

* Enable running the mnist_training sample without cuda (#6085)

Signed-off-by: George Nash <george.nash@intel.com>

* nnapi add min max support (#6117)

* Fix CUDA test hang: (#6138)

- Make condition check in `CUDAAllocatorTest` to ensure CUDA device is present.

* Fix TensorRT kernel conflict issue for subgraphs of control flow operators (#6115)

* add static subgraph kernel index

* change kernel naming to avoid conflicts

* Add gradient registration for Abs. (#6139)

* Partition initial optimizer state for Zero-1 (#6093)

* Initial changes

* Working changes

* Working changes

* Cleanup

* fix windows CI

* Review comments

* review comments

* Fix edge case in BFCArena where allocation failures could lead to an infinite loop. (#6145)

#4656

* Revert "work around of the build break in mac (#6069)" (#6150)

This reverts commit 3cae28699b.

* Fix clean_docker_image_cache.py detection of image pushes. (#6151)

Fix clean_docker_image_cache.py detection of image pushes. They were being ignored because the expected HTTP status code was wrong. For pushes, it's 201 instead of 200.

* MLAS: add NEON version of int8 depthwise convolution (#6152)

* Using a map of of ops to stages as input of partition function. (#5940)

* New partition algorithm running before AD

* Convert cut_group_info into device map. Work in progress -- works for  bert-tiny with pp=2

* Removing code for partition of bwd graphs

* Remove old code

* Adding some verification code

* Handle Shared Initializer

* Renaming rank with stage

* Added first unit test

* new test

* redundant check

* undo change in bert

* Moved cut-based partition to testing utils file

Co-authored-by: xzhu1900
Co-authored-by: wschin

* New conversion function and tests

* minor

* remove test that is not needed2

* improve GetDeviceAssignment and PR comments

* minor changes

* PR comments

* improving documentation and variable naming

* add documentation

* Variable naming and docs

* more doc improvements

* more doc improvements

* missing static cast

* Fix test file for windows

* Fix test file for windows

* Fix test file for windows

* stage id is not the same as rank id

* PR comments

* PR comments

* More comments

* More comments

* Minor fix to satisfy c++14 (#6162)

* Deprecating Horovod and refactored Adasum computations (#5468)

deprecated horovod submodule
refactored adasum logic to be ort-native
added tests for native kernel and e2e tests

* Update TensorRT-ExecutionProvider.md (#6161)

* Bugfix for topk cuda kernel (#6164)

* fix the issue that std::numeric_limits cannot handle half type

* adding a test

Co-authored-by: Du Li <duli@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>

* Revert "Fuse MatMulIntegerToFloat only when scales are scalar (#6008)" (#6169)

This reverts commit f2dcba7afe.

* Remove ignored build warnings for pybind on Mac (#6165)

* save_checkpoint, load_checkpoint and aggregate_checkpoints (#6136)

* save_checkpoint and load_checkpoint implementations

* checkpoint aggregation logic

* unit tests for save_checkpoint, load_checkpoint and aggregate_checkpoints

* Don't try to bind unused inputs in the Training frontend (#6166)

* Update documentation for contributing a PR and add deprecation notices for PyOp and ORT server. (#6172)

* aggregate model states only for the case when mixed precision was true (#6176)

* [NNAPI EP] Enable per-channel quantization for QlinearConv  (#6155)

* Enable qlinearconv per-channel quantization

* Fix the android CI test failure

* Add Android Version Check for Per-Channel Quant

* Address PR comments

* Fix some minor issues

* Add verification of per-channel zero points

* Make the error tolerance configurable

* Fix typo in BERT pretraining script (#6175)

A misplaced `}` meant that the `'enable_adasum'` option was interpreted incorrectly, causing the test to fail.

* Update get_docker_image.py to enable use without image cache container registry. (#6177)

Update get_docker_image.py to enable use without image cache container registry.

* Helper for compiling EP to generate deterministic unique ids for use in MetaDef names (#6156)

* Create a helper for generating unique ids that can be used by an EP that creates compiled nodes and needs ids to be deterministic for a model when used in multiple sessions.

Added to IExecutionProvider as this can potentially be used by all compiling EPs and is more robust than a simplistic counter (although EP implementer is free to choose either approach).

* Restructure the helper so it can be called across the EP bridge.
Add ability to call id generation helper from EP bridge
  - convert DNNL EP to use helper to validate
Address issue where a new Model may be loaded into the same address as a previous one.
  - hash the bytes in the Graph instance (1728 bytes currently) to use as the key to the full hash for the model
Add lock around id generation to ensure no issues if multiple sessions partitions graphs at exactly the same time.
  - Extremely unlikely but would be hard to debug and the locking cost is not an issue as it's only incurred during graph partitioning and not execution.

* Backend APIs for checkpointing (#5803)

* Add backend API GetOptimizerState and GetModelState

* add GetPartitionInfoMap

* Android coverage dashboard (#6163)

* Write the report to a file.

* Post code coverage to the Dashboard database.

* Add usage details of unified MCR container image (#6182)

Going forward, a single unifed docker image will be published in
MCR. The hardware accelerator target choice will have to be made
in the application using OpenVINO EP's runtime config options.

* improve perf for softmax (#6128)

* improve perf for both gathergrad and softmax

* revert the change in gathergrad and will be done in another PR.

* address comments from code review.

* Tune fast Gelu to use exp(x) instead of tanh(x) on Rocm platform (#6174)

* tune fast gelu to use exp(x) instead of tanh(x) on rocm

* update to use expression 2/(1+exp(-2x))-1 for stability

* Add Status.csv to EP Perf Tool (#6167)

* merge master, keep postprocess status commit

* download float16.py everytime

* removing hardcoded values

* Lochi/quantization tool for trt (#6103)

* Initial implementation of generating calibration dynamic range table

* Initialize validation support for Quantization

* Initialize validation support for Quantization (cont.)

* Improve validation support for Quantization

* Improve validation support for Quantization

* Rewrite/Refine for calibration and validation

* Rewrite/Refine for calibration and validation (cont.)

* Refine code

* Refine code

* Add data reader for BERT

* Add flatbuffers to serialize calibration table

* Refine code and add BERT evaluation

* Refine the code

* minor modification

* Add preprocess/postprocess of vision team yolov3 and refine the code

* Update annotation

* Make bbox cooridates more accurate

* Fix bug

* Add support of batch processing

* Batch processing for model zoo yolov3

* Add batch inference for evaluation

* Refine the code

* Add README

* Add comments

* Refine the code for PR

* Remove batch support checking in data_reader and refine the code

* Refine the code for PR

* Refine the code for PR review

Co-authored-by: Olivia Jain <oljain@microsoft.com>

* Implement ScatterND for CUDA EP (#6184)

* Condition fix in Resize operator (#6193)

* Clean up checkpoint tests to use the new checkpoint functions (#6188)

* add deprecation warning for old checkpoint functions

* update all the distributed checkpoint tests to use new checkpoint functions

* Implement comparing outputs that are sequence of maps of strings to floats (#6180)

* Implement conversion from ortvalue to Itensor for string tensors and comparing sequence of maps of strings to floats

* PR comments

* Dockerfile to build onnxruntime with ROCm 4.0

* Add ability to skip GPU tests based on GPU adapter name (#6198)

* Implement conversion from ortvalue to Itensor for string tensors and comparing sequence of maps of strings to floats

* PR comments

* Add ability to skip gpu tests according to adapter description

* spacing

* spacing

* spacing

* Openvino ep 2021.2 (#6196)

* Enabling fasterrcnn variant and vehicle detector

* changes for 2021_2 branch

* yolov3_pytorch commit

* fixed braces in basic_backend.cc

* ci information added

* faster rcnn variant and vehicle detector changes were made in 2021.1 and not in 2021.2

* some changes to support unit tests

* disable some tests which are failing

* fix myriad tests for vehicle detector

* Did some cleanup
*cleaned up comments
*Disabled Add_Broadcast_0x1 and Add_Broadcast_1x0
tests on MYRIAD_FP16 backend due to a bug
*cleaned up capability_2021_2.cc file
*Removed extra conditions which were added
for some validation in backend_utils

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* yolov3 pytorch workaround to ensure that the output names are matched

* gemmoptest fixed on myriad

* Fixed MYRIADX CPP Test Failures

*Expand,GatherND,Range,Round op's
are only supported in model

*where op with float input data
types are not supported and fixed

*Scatter and ScatterElements op's with
negative axis are fixed

*Reshape op with 0 dim value are not
supported and fixed

*Disabled InstanceNorm_2 test on MYRIADX

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* make changes to yolov3 pytorch

* Fixed python unit tests
*Fixed failing python tests on vpu,
GPU and CPU

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fixes POW op failures on GPU_FP16

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Clean up capability_2021_2.cc

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Updated docx for MultiThreading option
*Added extra info on setting the num_of_threads
option using the API and it's actual usage

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* fixed slice and removed extra prints

* Disabled failing python tests

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Minor changes added in capabilty_2021_2

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* made changes to slice to avoid failures

* Disabling FP16 support for GPU_FP32
->Inferencing an FP16 model on GPU_FP32
leads to accuracy mismatches. so, we would
rather use GPU_FP16 to infer an FP16 model
on GPU Device

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Updated docx for Inferencing a FP16 Model

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* fix for mask rcnn

* Script for installing openvino from source

* Updated with openvino 2021.2 online installation

* code comment fixes
fixed accuracy mismatch for div

* Update OpenvinoEP-ExecutionProvider.md

updated for 2021.2 branch

* Update README.md

updated dockerfile documentation

* Update BUILD.md

build.md update documentation

* permissiong change of install_openvino.sh

* made changes to align with microsoft onnxruntime changes

* Updated with ov 2021.2.200

Co-authored-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
Co-authored-by: sfatimar <sahar.fatima@intel/com>
Co-authored-by: MaajidKhan <n.maajidkhan@gmail.com>
Co-authored-by: mohdansx <mohdx.ansari@intel.com>

* Fix a memory leak in test_inference.cc (#6201)

* Fix a memory leak in test_inference.cc

* Use TArray in AMD element-wise kernels, rather than manually copying memory to device.

* Remove most ROCm-specific element-wise code and reuse CUDA element-wise code.

* Minor change to improve performance for operator Pad. (#5537)

* small improvment for pad

* Support double for operators Log, Reciprocal, Sum (CPU) (#6032)

* Support double for operators Log, Reciprocal, Sum
* remove tesdt erf_double

* Support double for operators Where, LpNormalisation (#6034)

* Support double for operators Relu, Tanh, Sigmoid (#6221)

* Fix ImportError in build.py (#6231)

There is a possible ImportError where build.py can import the wrong 'util' package if there are others present in `sys.path` already

* Removed executor todo that looks dead. (#6234)

* Remove MKLML/openblas/jemalloc build config (#6212)

* Remove python 3.5

* Update the readme file

* Upgrade build.py to assert for python 3.6+

Upgrade build.py to assert for python 3.6+
as python 3.5 cannot build anymore todays master.

* Support MLFloat16 type in Pow opset-12 CUDA kernel (#6233)

* MLAS: handle MlasGemm(M/N/K==0) cases (#6238)

* Support double for operator TopK + fix one bug in TopK implementation for GPU for double (#6220)

* Support double for operator TopK
* add static classes for topk/double
* fix cast issue in topk

* Support double for operator Gemm + fix bug in gemm implementation for cuda, rocm when sizeof(type) != sizeof(float) (#6223)

* Support double for operator Gemm
* fix type size while copying data in gemm operator for GPU
* fix type in gemm implementation for rocm

* Support double for operator ReduceMean, ReduceLogSumExp (#6217)

* Support double for operators ReduceMean, ReduceLogSumExp

* Support double for operator ArgMin (#6222)

* Support double for operator ArgMin
* add test specifically for double
* add new test on pai-excluded-tests.txt

* Update BUILD.md

* Update manylinux docker image to the latest (#6242)

* Fix allocator issue for TensorRT IOBinding (#6240)

* Fix issue: https://github.com/microsoft/onnxruntime/issues/6094

Root cause: we didn't expose the OrtMemoryInfo for TRT, so it will cause issue if user want use IObinding for Tensorrt.

Short term fix, add the OrtMemoryInfo for TRT. Long term should unify the allocator for CUDA and TRT

* Tune BiasGeluGradDx kernel in approximation mode to avoid tanh(...) on Rocm (#6239)

* bias gelu grad use exp(...) instead

* update cuda to rocm

* missing semicolon

* comment

* remove dockerfile

* missing factor of two

* Refactor EP Perf Tool  (#6202)

* merge master, keep postprocess status commit

* download float16.py everytime

* using variables to reference eps

* adding ACL EP to ep perf tool

* accuracy with absolute tolerance configurable

* add acl to dict + remove commented line

* Documentation for distributed CI tests pipeline (#6140)

* Remove a debug log in provider_test_utils.cc (#6200)

* Add the Concat Slice Elimination transform, fix constant_folding transform (#5457)

* Add concat slice transform + test

* Cosmetic improvements in concat slice transform

* Remove unrelated file, fix comment, fix constant folding bug

* Add test onnx graph

* fix windows build

* Review comments

* review comment

Co-authored-by: Aishwarya <aibhanda@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>

* Add MakeStringLite which uses current locale, update some MakeString call sites to use it instead. (#6252)

* Add MakeStringLite which uses current locale, update macros to use that to generate messages.

* Convert calls to MakeStringLite().

* Liqun/speech model loop to scan (#6070)

Provide a tool to convert Loop to Scan for Nuphar performance
Fix Nuphar CI pipeline failures.

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>

* model parallel refinement (#6244)

* Megatron Transformation as a seperate step

* remove useless header

* clang formating

* Re-Structure megatron transformer for subsquent changes

* fix  comments

* Allow querying a GraphProto's doc_string as part of ModelMetadata (#6248)

* Fix Linux/Mac error message on input type mismatch (#6256)

* add bfloat16 to gathergrad type constrains (#6267)

Co-authored-by: Cheng Tang <chenta@microsoft.com>

* Fix VS 2017 build break (#6276)

* Deprecate Python global configuration functions [Part 2] (#6171)

Update Python API to allow more flexibility for setting providers and provider options.

The providers argument (InferenceSession/TrainingSession constructors, InferenceSession.set_providers()) now also accepts a tuple of (name, options dict).
Fix get_available_providers() API (and the corresponding function in the C API) to return the providers in default priority order. Now it can be used as a starting point for the providers argument and maintain the default priority order.
Convert some usages of the deprecated global configuration functions to use EP-specific options instead.

Update some EP-specific option parsing to fail on unknown options.

Other clean up.

* Add script to preprocess python documentation before publishing (#6129)

* add script to preprocessing python documentation before publishing

* rename past to past_key_values for GPT-2 (#6269)

rename past to past_key_values for transformers 4.*

* Rename MakeString and ParseString functions. (#6272)

Rename MakeString to MakeStringWithClassicLocale, MakeStringLite to MakeString, *ParseString to *ParseStringWithClassicLocale.
Add missing pass-through versions of MakeStringWithClassicLocale for string types.

* Increase timeout for Linux GPU CUDA11 build. (#6280)

* Add helper to compare model with different precision (#6270)

* add parity_check_helper.py

* add real example

* remove lines

* Fix Min/Max CPU kernels for float16 type (#6205)

* fix data_ptr assertion error for past_sequence_length=0 in GPT-2 (#6284)

 fix io binding crash for past_sequence_length=0

* A list of changes in transformers tool (#6224)

* longformer fp16 e2e

* add fp16/fp32 parity check helper file

* excludes nodes with subgraph in profiling

* use onnxconverter_common to do fp32->fp16

* add version check for onnxconverter_common

* remove helper file

* add pkg installation on notebooks and script

* Workaround for static_cast<double>(half)

* Add workaround to remove ROCm-specific binary-elementwise files.

* Update nuget build (#6297)

1. Update the ProtoSrc path. The old one is not used anymore.
2. Regenerate OnnxMl.cs
3. Delete some unused code in tools/ci_build/build.py
4. Avoid set intra_op_param.thread_pool_size in ModelTests in OpenMP build.
5. Fix a typo in the C API pipeline.

* Enable ONNX backend test of SequenceProto input/output  (#6043)

* assert sequence tensor and remove skips

* update testdata json

* use ONNX 1.8 in cgmanifest.json

* use previous commit to workaround

* update ONNX commit ID in docker

* skip test_maxpool_2d_dilations test for now

* update function name

* add --sequence_lengths option (#6285)

* more dtype for Equal CUDA kernel (#6288)

Co-authored-by: Vincent Wang <weicwang@microsoft.com>

* Force reinstall onnx python package on Windows (#6309)

* update transformers required package versions (#6315)

* Remove abs in LpPool (#6303)

* Support 1D input for Conv + Mul/Add fusion optimizer with test (#6295)

* Support 1D input (N C H) for Conv + Mul/Add fusion optimizer with test cases and test models.

* Add longformer to  python package (#6314)

* add longformer to python package
* move test related script and data to a new folder

* Avoid false sharing on thread pool data structures (#6298)

Description: This change adds alignment and padding to avoid false sharing on fields in the thread pool. It also adds a new microbenchmark to profile thread-pool performance over short loops.

Motivation and Context
MobileNet on a 2*12-core system showed a performance gap between the ORT thread pool and OpenMP. One cause appeared to be false sharing on fields in the thread pool: ThreadPoolParallelSection::tasks_finished (which the main thread spins on waiting for workers to complete a loop), and the RunQueue::front_ and back_ fields (used respectively by the worker thread and the main thread).

The additional micro-benchmark BM_ThreadPoolSimpleParallelFor tests performance of loops of different sizes at different thread counts. The results below are on a machine with 2*14-core processors (E5-2690 v4) running with 1, 14, 15, and 28 threads. For each test, the microbenchmark has N threads run a loop with N iterations; hence a perfect result is for the time taken to be constant as additional threads are added (although we will also see power management effects helping at very low thread counts). The loop durations (100000, 10000, 1000) correspond roughly to 200us, 20us, and 2us on this machine.

Before change:
BM_ThreadPoolSimpleParallelFor/1/1/100000/real_time 17153 us 17154 us 32
BM_ThreadPoolSimpleParallelFor/14/14/100000/real_time 22553 us 22553 us 30
BM_ThreadPoolSimpleParallelFor/15/15/100000/real_time 21521 us 21521 us 29
BM_ThreadPoolSimpleParallelFor/28/28/100000/real_time 24111 us 24111 us 24
BM_ThreadPoolSimpleParallelFor/1/1/10000/real_time 1719 us 1719 us 407
BM_ThreadPoolSimpleParallelFor/14/14/10000/real_time 3409 us 3409 us 200
BM_ThreadPoolSimpleParallelFor/15/15/10000/real_time 3541 us 3541 us 201
BM_ThreadPoolSimpleParallelFor/28/28/10000/real_time 4576 us 4576 us 151
BM_ThreadPoolSimpleParallelFor/1/1/1000/real_time 174 us 174 us 4017
BM_ThreadPoolSimpleParallelFor/14/14/1000/real_time 1586 us 1586 us 402
BM_ThreadPoolSimpleParallelFor/15/15/1000/real_time 1586 us 1586 us 397
BM_ThreadPoolSimpleParallelFor/28/28/1000/real_time 2864 us 2864 us 232

After change:
BM_ThreadPoolSimpleParallelFor/1/1/100000/real_time 17160 us 17160 us 33
BM_ThreadPoolSimpleParallelFor/14/14/100000/real_time 20989 us 20989 us 31
BM_ThreadPoolSimpleParallelFor/15/15/100000/real_time 22286 us 22286 us 31
BM_ThreadPoolSimpleParallelFor/28/28/100000/real_time 24631 us 24631 us 25
BM_ThreadPoolSimpleParallelFor/1/1/10000/real_time 1718 us 1718 us 407
BM_ThreadPoolSimpleParallelFor/14/14/10000/real_time 2868 us 2868 us 242
BM_ThreadPoolSimpleParallelFor/15/15/10000/real_time 2907 us 2907 us 240
BM_ThreadPoolSimpleParallelFor/28/28/10000/real_time 3872 us 3872 us 186
BM_ThreadPoolSimpleParallelFor/1/1/1000/real_time 175 us 175 us 3938
BM_ThreadPoolSimpleParallelFor/14/14/1000/real_time 933 us 933 us 659
BM_ThreadPoolSimpleParallelFor/15/15/1000/real_time 912 us 912 us 591
BM_ThreadPoolSimpleParallelFor/28/28/1000/real_time 1976 us 1976 us 317

* fix opset imports for function body  (#6287)

* fix function opsets

* add tests and update onnx

* changes per review comments

* add comments

* plus updates

* build fix

* Remove false positive prefast warning from threadpool (#6324)

* Java: add Semmle to Java publishing pipelines (#6326)

Add Semmle to Java API pipeline
  Add security results publishing and add Java GPU.

* Quantization support for split operator with its NHWC support (#6107)

* Make split working for quantization.

* NHWC transformer support for split operator

* Refactor some according to Feedback. Will add test cases soon.

* Fix build error on windows.

* Add test case for split op on uint8_t support

* Add nhwc_transformer_test for split uint8_t support

* Some change according to PR feedbacks.

* Liqun/enable pipeline parallel test (#6331)

enable pipeline parallel test
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>

* Use onnxruntime_USE_FULL_PROTOBUF=OFF for the cuda execution provider (#6340)

This removes a special case of the cuda EP.

* MLAS: add fallback implementation for quantized GEMM (#6335)

Add a non-vectorized version of the kernel used for the quantized version of MlasGemm.

* Delete float16.py (#6336)

No longer needed. Also doesn't pass policheck.

* Enable add + softmax fusion for Rocm platform (#6259)

* add bias softmax; tests appear to pass

* check fusion occurs for rocm as well

* check for rocm provider compatible as well

* build for cpu scenario as well

* try again; broader cope

* proper scope on kGpuExecutionProvider

* been editing wrong file

* remove commented #include lines

* try again due to mac os ci error

* try again

* test fusion both cuda and rocm to avoid mac ci error

* add external data support to tensor proto utils (#6257)

* update unpack tensor utilities to support loading external data

* more updates

* fix test

* fix nuphar build

* minor build fix

* add tests

* fix Android CI

* fix warning

* fix DML build failure and some warnings

* more updates

* more updates

* plus few updates

* plus some refactoring

* changes per review

* plus some change

* remove temp code

* plus updates to safeint usage

* build fix

* fix for safeint

* changed wording. (#6337)

* Remove OpSchema dummy definition. Only needed for Function now, and we can just exclude the method in Function (#6321)

* remove gemmlowp submodule (#6341)

* [NNAPI] Add pow support (#6310)

* Add support for running Android emulator from build.py on Windows. (#6317)

* fix the pipeline failure (#6346)

* Train BERT Using BFloat16 on A100 (#6090)

* traing bert using bf16

* Adam support bf16

* bugfix

* add fusedmatmul support

* fix after merge from master.

* bugfix

* bugfix after merge from master

* fast reduction for bf16.

* resolve comments

* fix win build

* bugfix

* change header file.

Co-authored-by: Vincent Wang <weicwang@microsoft.com>

* Fix DerefNullPtr issues raised by SDLNativeRules. (#6348)

* update quantize to support basic optimization and e2e example for image classification (#6313)

update the resnet50-v1 to standard one from onnx zoo.
add an example for mobilenet
run basic optimization before quantization
fix a bug in Clip

* Enable graph save for orttrainer (#6333)

* Enable graph save for orttrainer

* Fix CI

* Update orttraining/orttraining/python/training/orttrainer_options.py

* Update orttraining/orttraining/python/training/orttrainer_options.py

* Update orttraining/orttraining/python/training/orttrainer_options.py

* Update orttraining/orttraining/python/training/orttrainer_options.py

* Update orttraining/orttraining/python/training/orttrainer_options.py

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>

* Add PREfast to python packaging pipeline (#6343)

* Add PREfast to python packaging pipeline

* fix longformer benchmark io_binding output_buffers (#6345)

* fix longformer benchmark io_binding output_buffers

* format

* import benchmark_helper from parent directory.

* Use readelf for minimal build binary size checks. (#6338)

* Use readelf for minimal build binary size checks.
The on-disk size grows in 4KB chunks which makes it hard to see how much growth an individual checkin causes.
Only downside is that the sum of the sections is larger than the on-disk size (assumably things get packed smaller on disk and some of the section alignment constraints can be ignored)

* Remove unused function

* Java: Set C language warnings to W4 and adjust JNI code (#6347)

Set /W3 for C language and fix up JNI warnings.

* Pipeline Parallel Experimental Python API (#5815)

* Add create session to WinML telemetry to track WinML Usage (#6356)

* Fix one more SDL warning (#6359)

* fix -Wdangling-gsl (#6357)

* Add python example of TensorRT INT8 inference on ResNet model (#6255)

* add trt int8 example on resnet model

* Update e2e_tensorrt_resnet_example.py

* remove keras dependency and update class names

* move ImageNetDataReader and ImageClassificationEvaluator to tensorrt resnet example

* simplify e2e_tensorrt_resnet_example.py

* Update preprocessing.py

* merge tensorrt_calibrate

* Update calibrate.py

* Update calibrate.py

* generalize calibrate

* Update calibrate.py

* fix issues

* fix formating

* remove augment_all

* This added telemetry isn't needed (#6363)

* Wezuo/memory analysis (#5658)

* merged alloc_plan

* pass compilation

* Start running, incorrect allocation memory info

* add in comments

* fix a bug of recording pattern too early.

* debugging lifetime

* fix lifetime

* passed mnist

* in process of visualization

* Add code to generate chrome trace for allocations.

* in process of collecting fragmentation

* before rebuild

* passed mnist

* passed bert tiny

* fix the inplace reuse

* fix the exception of weight in pinned memory

* add guards to ensure the tensor is in AllocPlan

* add customized profiling

* debugging

* debugging

* fix the reuse of differnt location type

* add rank

* add the rank

* add fragmentation

* add time_step_trace

* Add summary for each execution step (total bytes, used/free bytes).

* add top k

* change type of top k parameter

* remove prints

* change heap to set{

* add the name pattern

* add the useage for pattern

* add partition

* change to static class

* add custom group

* remove const

* update memory_info

* in process of adding it as runtime config

* change the memory profiling to be an argument

* add some comments

* add checks to recored meomry_info in traaining session

* set the "local rank setting" to correct argument.

* addressing comments

* format adjustment

* formatting

* remove alloc_interval

* update memory_info.cc to skip session when there is no tensor for a particular memory type

* fix memory_info multiple iteration seg-fault

* consolidate mainz changes

* fixed some minor errors

* guard by ORT_MINIMAL_BUILD

* add ORT_MEMORY_PROFILE flag

* added compiler flag to turn on/off memory profiling related code

* clean up the code regarding comments

* add comments

* revoke the onnx version

* clean up the code to match master

* clean up the code to match master

* clean up the code to match master

Co-authored-by: Jesse Benson <benson.jesse@gmail.com>
Co-authored-by: Wei Zuo <wezuo@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: wezuo <wezuo@az-eus-v100-32gb-5-worker-mgtbby.eastus.cloudapp.azure.com>
Co-authored-by: wezuo <wezuo@az-eus-v100-32gb-5-worker-yclzsf.eastus.cloudapp.azure.com>

* Support MLFloat16 in CumSum Cuda op for Opset 14 (#6355)

* Add CumSum-14 for Cuda

* fix convert_common version retrival (#6382)

* Refine auto_pad based pad computation in ConvTranspose (#6305)

* Fix SDL warning (#6390)

* Add max_norm for gradient clipping. (#6289)

* add max_norm as user option for gradient clipping

* add adam and lamb test cases for clip norm

* add frontend tests

* Add the custom op project information (#6334)

* Dont use default string marshalling in C# (#6219)

* Fix Windows x86 compiler warnings in the optimizers project  (#6377)

* [Perf] Optimize Tile CPU and CUDA kernels for a corner case (#6376)

* Unblock Android CI code coverage failure (#6393)

* fix build on cuda11 (#6394)

Co-authored-by: Vincent Wang <weicwang@microsoft.com>

* Load the model path correctly (#6369)

* Fix some compile warnings (#6316)

* OpenVino docker file changes to bypass privileged mode

Description: Builds and installs libusb without UDEV support, which is used for communicating with the VPU device.

Motivation and Context

This enables the resulting docker container to be run without '--privileged' and '--network host' options which may not be suitable in deployment environments.

* Megatron checkpointing (#6293)

* Add bart fairseq run script

* Add frontend change to enable megatron

* Initial changes for checkpointing

* Megatron optim state loading, checkpoint aggregation, frontend distributed tests for H, D+H

* Add load_checkpoint changes

* Fix CI

* Cleanup

* Fix CI

* review comments

* review comments

* review comments:

* Fix generate_submodule_cgmanifest.py Windows issues. (#6404)

* Continue memory planning when unknown shape tensor is encountered. (#6413)

* Reintroduce experimental api changes and fix remote build break (#6385)

Co-authored-by: Ori Levari <orlevari@microsoft.com>

* Add support for custom ops to minimal build. (#6228)

* Add support for custom ops to minimal build.
Cost is only ~8KB so including in base minimal build.

* enable pipeline to run quantization tests (#6416)

* enable pipeline to run quantization tests
setup test pipeline for quantization

* Minor cmake change (#6431)

* Liqun/liqun/enable pipeline parallel test2 (#6399)

* enable data and pipeline parallism test

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>

* Farewell TrainableDropout (#5793)

* Deprecate TrainableDropout kernel.

* Update bert_toy_postprocessed.onnx to opset 12.

* Add more dropout tests.

* Fix BiasDropout kernel.

Co-authored-by: Ubuntu <OrtTrainingDev3@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Sergii Dymchenko <sedymche@microsoft.com>

* fix null dereference warning (#6437)

* Expose graph ModelPath to TensorRT shared library (#6353)

* Update graph_viewer.cc

* Update tensorrt_execution_provider.cc

* Update graph_viewer.h

* Update tensorrt_execution_provider.cc

* Update tensorrt_execution_provider.cc

* Update provider_api.h

* Update provider_bridge_ort.cc

* Update provider_interfaces.h

* Update provider_interfaces.h

* expose GraphViewer ModelPath API to TRT shared lib

* add modelpath to compile

* update

* add model_path to onnx tensorrt parser

* use GenerateMetaDefId to generate unique TRT kernel name

* use GenerateMetaDefId to generate unique TRT engine name

* fix issue

* Update tensorrt_execution_provider.cc

* remove GetVecHash

* Update tensorrt_execution_provider.h

* convert wchar_t to char for tensorrt parser

* update tensorrt parser to include latest changes

* fix issues

* Update tensorrt_execution_provider.cc

* merge trt parser latest change

* add PROVIDER_DISALLOW_ALL(Path)

* add tool for generating test data for longformer (#6415)

* only build experimental api in redist (#6465)

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>

* Add an option to save the training graph after optimization (#6410)

* expose optimized_model_filepath in SessionOptions as `debug.graph_save_paths.model_with_training_graph_after_optimization_path` in `ORTTrainerOptions`

* Share allocator between CUDA EP & TRT EP. (#6332)

* Share allocator between CUDA EP & TRT EP.
limitation:
1. Does not cover the per-thread allocator created by CUDA EP, still need to figure out the way to remove it
2. Need to have more identifiers to make it able to share CPU allocator across all EPs

* fix max norm clipping test in python packaging pipeline test (#6468)

* fix python packaging pipeline

* make clip norm test compatabile with both V100 and M60 GPUs

* Initial version of CoreML EP (#6392)

* Bug 31463811: Servicing: Redist (Nuget) conflicts with Microsoft.AI.MachineLearning starting 21H1+ (#6460)

* update load library code to have the fullly qualified path

* make it work for syswow32

* git Revert "make it work for syswow32"

This reverts commit b9f594341b7cf07241b18d0c376af905edcabae3.

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>

* dequantize 1st input of lstm back if it is quantized (#6444)

* [java] Adds support for OrtEnvironment thread pools (#6406)

* Updates for Gradle 7.

* Adding support for OrtThreadingOptions into the Java API.

* Fixing a typo in the JNI code.

* Adding a test for the environment's thread pool.

* Fix cuda test, add comment to failure.

* Updating build.gradle

* fix SDL native rule warning #6246 (#6461)

* fix SDL rule (#6464)

* use tickcount64 (#6447)

Co-authored-by: Ori Levari <orlevari@microsoft.com>

* Update pypi package metadata (#6354)

* Update setup file data

* add missing comma

* remove python 3.5

* fix typo bracket

* Delete nuget extra configs (#6477)

* Op kernel type reduction infrastructure. (#6466)

Add infrastructure to support type reduction in Op kernel implementations.
Update Cast and IsInf CPU kernels to use it.

* Fixing a leak in OnnxSequences with String keys or values. (#6473)

* Increase the distributes tests pipeline timeout to 120 minutes (#6479)

* [CoreML EP] Add CI for CoreML EP (macOS) and add coreml_flags for EP options (#6481)

* Add macos coreml CI and coreml_flags

* Move save debuggubg model to use environment var

* Move pipeline off from macos CI template

* Fix an issue building using unix make, add parallel to build script

* Fixed build break for shared_lib and cmpile warning

* Fix a compile warning

* test

* Revert the accidental push from another branch

This reverts commit 472029ba25d50f9508474c9eeceb3454cead7877.

* Add ability to track per operator types in reduced build config. (#6428)

* Add ability to generate configuration that includes required types for individual operators, to allow build size reduction based on that.
  - Add python bindings for ORT format models
    - Add script to update bindings and help info
  - Add parsing of ORT format models
  - Add ability to enable type reduction to config generation
  - Update build.py to only allow operator/type reduction via config
    - simpler to require config to be generated first
    - can't mix a type aware (ORT format model only) and non-type aware config as that may result in insufficient types being enabled
  - Add script to create reduced build config
  - Update CIs

* merge e2e with distributed pipeline (#6443)

merge e2e with distributed pipeline

* Fix test breaks in Windows ingestion pipeline (#6476)

* fix various build breaks with Windows build

* fix runtime errors loading libraries from system32

* add build_inbox check to winml_test_common

* use raw string

* cleanup

* fix dll load

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>

* Speed up the Mac CI runs (#6483)

* expose learningmodelpixelrange property (#5877)

* Fix of support api version bug for [de]quantize (#6492)

* SDL fixes: add proper casts/format specifiers (#6446)

* SDL annotation fixes (#6448)

Co-authored-by: Ori Levari <orlevari@microsoft.com>

* [OpenVINO-EP] Remove support for OpenVINO 2020.2 (#6493)

* Removed OpenVINO 2020.2 support

* Updated documentation and build.py

* Removed unnecessary libraries from setup.py

* Support pad operator in quantization and quantized nhwc transformer. Fix Pad operator bug. (#6325)

Support pad operator in quantization tool.
Support pad operator in quantized nhwc transformer.
Fix pad() operator bug when pad input's inner(right) most axis value is zero for Edge and Reflect mode, it copied wrong value to the cells to be padded. Note the Constant mode will not trigger this bug, as Edge/Reflect need copy value from the already copied array while Constant mode only fill specified value.
Add more test cases to cover pad() operator bug fixed here.
Fix quantization tools uint8/int8 value overflow issue when quantize weights in python.

* Improve work distribution for Expand operator, and sharded LoopCounter configuration (#6454)

Description: This PR makes two changes identified while looking at a PGAN model.

First, it uses ThreadPool::TryParallelFor for the main parallel loops in the Expand operator. This lets the thread pool decide on the granularity at which to distribute work (unlike TrySimpleParallelFor). Profiling showed high costs when running "simple" loops with 4M iterations each of which copied only 4 bytes.

Second, it updates the sharded loop counter in the thread pool so that the number of shards is capped by the number of threads. This helps make the performance of any other high-contention "simple" loops more robust at low thread counts by letting each thread work on its own "home" shard for longer.

Motivation and Context

Profiling showed a PGAN model taking 2x+ longer with the non-OpenMP build. The root cause was that the OpenMP build uses simple static scheduling of loop iterations, while the non-OpenMP build uses dynamic scheduling. The combination of large numbers of tiny iterations is less significant with static scheduling --- although still desirable to avoid, given that each iteration incurs a std::function invocation.

* Update document of transformer optimization (#6487)

* nuphar test to avoid test data download to improve passing rate (#6467)

nuphar test to avoid test data download to improve passing rate

* Fuse cuda conv with activation (#6351)

* optimize cuda conv by fused activation

* remove needless print out

* exclude test from cpu

* handle status error from cudnn 8.x

* add reference to base class

* add hipify

* [CoreML EP] Add support for some activations/Transpose, move some shared helpers from NNAPI to shared space (#6498)

* Init change

* Move some helper from nnapi ep to shared

* Add transpose support

* Fix trt ci build break

* Refine transformers profiler output (#6502)

* output nodes in the original order; grouped by node name
* add document for profiler

* Update to match new test setup. (#6496)

* Update to match new test setup.

* Add Gemm(7) manually for now.
Will fix properly on Monday. It's used by mnist.ort as that is created by optimizing mnist.onnx to level 1 causing 2 nodes to be replaced by a Gemm and the op to be missing from the required list as that is created using the original onnx model.

* Enable dense sequence optimized version of Pytorch exported BERT-L on AMD GPU (#6504)

* Permit dense seq optimization on BERT-L pytorch export by enabling ReduceSumTraining, Equal, and NonZero on AMD

* enable Equal tests

* enable fast_matrix_reduction test case

* Optimize GatherGrad for AMD GPU (#6381)

* optimize gathergrad

* address comments

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>

* add explicit barriers for buffer overread and overrwrite (#6484)

Co-authored-by: Ori Levari <orlevari@microsoft.com>

* fix sdl bugs for uninitialized variables and returns (#6450)

Co-authored-by: Ori Levari <orlevari@microsoft.com>

* handle hr error conditions (#6449)

Co-authored-by: Ori Levari <orlevari@microsoft.com>

* Dnnl training (#6045)

* Add ReluGrad and ConvGrad ops for the dnnl provider

* the mnist sample is updated to add the --use_dnnl option that
will cause the sample to use the dnnl execution provider for
nodes that exist in dnnl provider.

* Added the ability to find forward ops. Dnnl backward gradient
ops require the forward primitive description and workspace
from the forward operation.

* Enable specifying the execution provider for Gradient Checker Tests

* Prevent memory leak when running dnnl_provider in training mode

Prevent creating a SubgraphPrimitivePool when the code is built with the
ENABLE_TRAINING build flag. Instead create a SubgraphPrimitive directly.

The SubgraphPrimitivePool was causing a pool of SubgraphPrimitives to be
stashed in a map for reuse. Due to the way the Training Loop uses threads
the pool of SubgraphPrimitives were not being reuse instead a new pool of
SubgraphPrimitives being created each run. The old pool was not instantly
freed. This behavior could be a language error when using thread_local
memory.

Signed-off-by: George Nash <george.nash@intel.com>

* Added fixes to maxpoolgrad and memory leak.

Maxpoolgrad will now pass all unit tests.
With the conv and convgrad disabled for dnnl, mnist is able to train till 95%

Signed-off-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>

* Fixed misc issues when testing training code with dnnl provider

* fix conv_grad dnnl tests with dilation to run dnnl execution provider

* update mnist training sample to accept convolution type models

  convolution models require the input shape to be {1, 28, 28}
  instead of the flat {728} image that is used for the gemm models

  this will enable models that require the different shape by adding
 `--model_type conv` to the command line when running the mnist sample.
 (while testing a workaround was used see #4762)

* Disable weight caching in dnnl conv operator when using training

  When training we can not use cached weights because the weight
  will be updated each run. This re-enables dnnl Conv and ConvGrad Ops.
  The weight caching was the source of the error from Conv when training.

* Fix issues found when building grad ops on Linux
  * The dnnl_convgrad code was over using the scope operator
    causing a compilation problem.
  * The dnnl_maxpoolgrad code had a logic error that is was
    comparing with the source description when it should have
    been comparing with the destination despription.

* Update BUILD.md so it shows DNNL for training
  * Updated the table of contents. Since the same providers
    are listed twice. Once for Infrance and again for Training
    an HTML anchor was added to distinguish the second header
    from the first for the TOC.

* Fix build failure when not using --enable-training build option

* reorganize the gradient operators so they are grouped together

* Fix issues found when running onnx_backend_test_series.py

* Pooling code only supports 2 outputs when built with --enable-training

* Address code review feedback
  * class member variables end in underscore_
  * use dst instead of dist to match pattern use elsewhere in DNNL code.

* Remove workaround that was introduced to handle problems running
  convolution based training models. See issue #4762

Signed-off-by: George Nash <george.nash@intel.com>

* Isolate training code and code cleanup

* Do not build if dnnl_gpu_runtime if enable_training is set training code
  does not support dnnl_gpu_runtime yet.
* Isolated Training code inside ifdefs so that they wont affect
  project if built without training enabled
* Inadvertant changes in whitespace were removed to make code review simpler
* Undid some code reordering that was not needed
* comments added to closing #endif statments to simplify reading complex ifdefs
* Modified the GetPrimitiveDesc functions to return shared_ptr instead of raw
  pointer. This matches what was done in Pool code and is safer memory code.

Signed-off-by: George Nash <george.nash@intel.com>

* Address code review issues

- whitespace changes caused by running clang-format on the code
- Several spelling errors fixed
- Removed/changed some ifdefs to improve readability
- other misc. changes in responce to code review.

Signed-off-by: George Nash <george.nash@intel.com>

* Code changes to address code review

- Simplify iteration code using `auto` keyword
- remove C style cast that was not needed
- remove instance variable that was not needed [relugrad.h]
- added the execution providers to `ComputeGradientErrorInternal()`
  and `ComputeTheoreticalJacobianTranspose()` instead of using
  a pointer to an instance varaible [gradient_checker.h/.cc]

Signed-off-by: George Nash <george.nash@intel.com>

* Combined the default gradient ops test and dnnl gradient ops test for ConvGrad and MaxPoolGrad into one function with the help of a helper function.
This will reduce repeated code.
Signed-off-by: Palangotu Keshava, Chethan's avatarChethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>

* Replaced the stack used by convgrad to vector so that the vector(used as stack) can be easily cleared everytime the graph is created.
This will prevent memory leak from convolution kernels being pushed constantly onto the stack.
Signed-off-by: chethan.palangotu.keshava@intel.com

* Code clean up and formating updates

 - Removed empty else statment
 - updated indentation of code that was causing double curly brackets to look unususal
 - Changed check for NumDimensions to Size in Relu and ReluGrad error checking code.
 - isolated training code

Signed-off-by: George Nash <george.nash@intel.com>

* Restore inadvertantly removed ConvGrad tests

When combining the DNNL and CPU version of the ConvGrad
tests two test were inadvertantly excluded.  This adds
back the Conv3d and Conv3d with strides test cases.

Signed-off-by: George Nash <george.nash@intel.com>

* Add validation to ConvGrad

This validates the dimensions of the ConvGrad match the
passed in Convolution forward primitive description.

The current code for DNNL ConvGrad makes the assumption that the ConvGrad
nodes will be visited in the reverse order from the corresponding Conv nodes

The added validation will return an error if this assumption is not true.

Signed-off-by: George Nash <george.nash@intel.com>

* Do not create new execution providers in provider_test_utils

This removes the code that generated new execution providers in the
OpTester::Run function. This was added because the std::move was
leaving the `entry` value empty so subsequent calls would cause a
segfault.

Problem is this potentially changed the execution_provider because it
would create the default provider dropping any custom arguments.

When the now removed code was originally added the std::move was causing
crashes when the GradientChecker unit tests were run.  However, it is no
longer causing problems even with the code removed.

Signed-off-by: George Nash <george.nash@intel.com>

* Change the forward conv stack to a forward conv map

This changes how the forward conv kernel is mapped to the bwd ConvGrad
kernel the problematic stack is no longer used.

The convolution stack made the assumption that the corresponding
ConvGrad operator would be visited in reverse order of the forward
Conv operators.  This was always problematic and was unlikely to
work for inception models.

Important changes:
- The weight_name is added to the ConvGrad dnnl_node making it
  possible to use the weight_name as a lookup key to find the
  Conv forward Kernel
- the `std::vector fwd_conv_stack_` has been replaced with a
  `std::map fwd_conv_kernel_map_`
- Although it is not needed lock_guards were added when writing
  to and reading from the fwd_conv_kernel_map_ as well as the
  fwd_kernel_map_. These should always be accessed by a single
  thread when preparing the dnnl subgraphs so the guard should not
  be needed but its added just in case.
- Updated the comments ConvGrad.h code to no longer mention the
  stack. The error check is not removed. It will be good to verify
  there are no errors as we continue to test against more models.

Signed-off-by: George Nash <george.nash@intel.com>

Co-authored-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>
Co-authored-by: unknown <63478620+jeyblu@users.noreply.github.com>

* Lochi/refactor yolov3 quantization (#6290)

* Refactor the code and move data reader, preprocessing, evaluation to
E2E_example_mode

* Refactor the code.

Move data reader, preprocessing, evaluation to model specific example
under E2E_example_mode

* refactor code

* Move yolov3 example to specific folder and add additional pre/post
processing

* Print a warning message for using newer c_api header on old binary (#6507)

* Fix issues with ArmNN build setup (#6495)

* ArmNN build fixes
* Update BUILD.md to document that the ACL paths must be specified to build ArmNN
* Fix CUDA build error. We don't setup the link libraries correctly/consistently so improve that.

* Fix Windows CI builds by updating test scripts to work with numpy 1.20. (#6518)

* Update onnxruntime_test_python.py to work with numpy 1.20.

Some aliases are deprecated in favor of the built-in python types. See https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations

np.array with bytes for entries and dtype of np.void no longer automatically pads. Change a test to adjust for that.

* Fix another test script

* Fix ORTModule branch for orttraining-* pipelines

* Update pytorch nightly version dependency

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: George Wu <jywu@microsoft.com>
Co-authored-by: Cecilia Liu <ziyue.liu7@gmail.com>
Co-authored-by: Ryan Hill <38674843+RyanUnderhill@users.noreply.github.com>
Co-authored-by: George Nash <george.nash@intel.com>
Co-authored-by: Guoyu Wang <62914304+gwang-msft@users.noreply.github.com>
Co-authored-by: Yateng Hong <toothache9010@gmail.com>
Co-authored-by: stevenlix <38092805+stevenlix@users.noreply.github.com>
Co-authored-by: Derek Murray <Derek.Murray@microsoft.com>
Co-authored-by: ashbhandare <ash.bhandare@gmail.com>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
Co-authored-by: Changming Sun <chasun@microsoft.com>
Co-authored-by: Tracy Sharpe <42477615+tracysh@users.noreply.github.com>
Co-authored-by: Juliana Franco <jufranc@microsoft.com>
Co-authored-by: Pranav Sharma <prs@microsoft.com>
Co-authored-by: Tixxx <tix@microsoft.com>
Co-authored-by: Jay Rodge <jayrodge@live.com>
Co-authored-by: Du Li <duli1@microsoft.com>
Co-authored-by: Du Li <duli@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Yufeng Li <liyufeng1987@gmail.com>
Co-authored-by: baijumeswani <bmeswani@microsoft.com>
Co-authored-by: Sergii Dymchenko <sedymche@microsoft.com>
Co-authored-by: jingyanwangms <47403504+jingyanwangms@users.noreply.github.com>
Co-authored-by: satyajandhyala <satya.k.jandhyala@gmail.com>
Co-authored-by: S. Manohar Karlapalem <manohar.karlapalem@intel.com>
Co-authored-by: Weixing Zhang <weixingzhang@users.noreply.github.com>
Co-authored-by: Suffian Khan <sukha@microsoft.com>
Co-authored-by: Olivia Jain <oljain@microsoft.com>
Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com>
Co-authored-by: Hariharan Seshadri <shariharan91@gmail.com>
Co-authored-by: Ryan Lai <rylai@microsoft.com>
Co-authored-by: Jesse Benson <jesseb@microsoft.com>
Co-authored-by: sfatimar <64512376+sfatimar@users.noreply.github.com>
Co-authored-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
Co-authored-by: sfatimar <sahar.fatima@intel/com>
Co-authored-by: MaajidKhan <n.maajidkhan@gmail.com>
Co-authored-by: mohdansx <mohdx.ansari@intel.com>
Co-authored-by: Xavier Dupré <xadupre@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin@vols.utk.edu>
Co-authored-by: Michael Giba <michaelgiba@gmail.com>
Co-authored-by: William Tambellini <wtambellini@sdl.com>
Co-authored-by: Hector Li <hecli@microsoft.com>
Co-authored-by: Aishwarya <aibhanda@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: liqunfu <liqfu@microsoft.com>
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: pengwa <pengwa@microsoft.com>
Co-authored-by: Tang, Cheng <souptc@gmail.com>
Co-authored-by: Cheng Tang <chenta@microsoft.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
Co-authored-by: Ye Wang <52801275+wangyems@users.noreply.github.com>
Co-authored-by: Chun-Wei Chen <jacky82226@gmail.com>
Co-authored-by: Vincent Wang <wangwchpku@outlook.com>
Co-authored-by: Vincent Wang <weicwang@microsoft.com>
Co-authored-by: Luyao Ren <375833274@qq.com>
Co-authored-by: Zhang Lei <zhang.huanning@hotmail.com>
Co-authored-by: Tim Harris <tiharr@microsoft.com>
Co-authored-by: Ashwini Khade <askhade@microsoft.com>
Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>
Co-authored-by: Alberto Magni <49027342+alberto-magni@users.noreply.github.com>
Co-authored-by: Wei-Sheng Chin <wschin@outlook.com>
Co-authored-by: wezuo <49965641+wezuo@users.noreply.github.com>
Co-authored-by: Jesse Benson <benson.jesse@gmail.com>
Co-authored-by: Wei Zuo <wezuo@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: wezuo <wezuo@az-eus-v100-32gb-5-worker-mgtbby.eastus.cloudapp.azure.com>
Co-authored-by: wezuo <wezuo@az-eus-v100-32gb-5-worker-yclzsf.eastus.cloudapp.azure.com>
Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
Co-authored-by: Martin Man <supermt@gmail.com>
Co-authored-by: M. Zeeshan Siddiqui <mzs@microsoft.com>
Co-authored-by: Ori Levari <ori.levari@microsoft.com>
Co-authored-by: Ori Levari <orlevari@microsoft.com>
Co-authored-by: Ubuntu <OrtTrainingDev3@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Sheil Kumar <smk2007@gmail.com>
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
Co-authored-by: Ryota Tomioka <ryoto@microsoft.com>
Co-authored-by: Adam Pocock <adam.pocock@oracle.com>
Co-authored-by: Yulong Wang <f.s@qq.com>
Co-authored-by: Faith Xu <faxu@microsoft.com>
Co-authored-by: Xiang Zhang <xianz@microsoft.com>
Co-authored-by: suryasidd <48925384+suryasidd@users.noreply.github.com>
Co-authored-by: RandySheriffH <48490400+RandySheriffH@users.noreply.github.com>
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
Co-authored-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>
Co-authored-by: unknown <63478620+jeyblu@users.noreply.github.com>
2021-02-02 08:59:56 -08:00
Suffian Khan
76bc0e479c
Enable dense sequence optimized version of Pytorch exported BERT-L on AMD GPU (#6504)
* Permit dense seq optimization on BERT-L pytorch export by enabling ReduceSumTraining, Equal, and NonZero on AMD

* enable Equal tests

* enable fast_matrix_reduction test case
2021-01-29 13:12:34 -08:00
Scott McKay
8c6d76a4c0
Update to match new test setup. (#6496)
* Update to match new test setup.

* Add Gemm(7) manually for now.
Will fix properly on Monday. It's used by mnist.ort as that is created by optimizing mnist.onnx to level 1 causing 2 nodes to be replaced by a Gemm and the op to be missing from the required list as that is created using the original onnx model.
2021-01-30 06:27:19 +10:00
suryasidd
1a5b75a554
[OpenVINO-EP] Remove support for OpenVINO 2020.2 (#6493)
* Removed OpenVINO 2020.2 support

* Updated documentation and build.py

* Removed unnecessary libraries from setup.py
2021-01-28 23:00:41 -08:00
Guoyu Wang
3f60b27703
Speed up the Mac CI runs (#6483) 2021-01-28 15:13:44 -08:00
liqunfu
00afd00059
merge e2e with distributed pipeline (#6443)
merge e2e with distributed pipeline
2021-01-28 14:17:47 -08:00
Scott McKay
c84bb9df9f
Add ability to track per operator types in reduced build config. (#6428)
* Add ability to generate configuration that includes required types for individual operators, to allow build size reduction based on that.
  - Add python bindings for ORT format models
    - Add script to update bindings and help info
  - Add parsing of ORT format models
  - Add ability to enable type reduction to config generation
  - Update build.py to only allow operator/type reduction via config
    - simpler to require config to be generated first
    - can't mix a type aware (ORT format model only) and non-type aware config as that may result in insufficient types being enabled
  - Add script to create reduced build config
  - Update CIs
2021-01-29 07:59:51 +10:00
Guoyu Wang
752627c5bb
[CoreML EP] Add CI for CoreML EP (macOS) and add coreml_flags for EP options (#6481)
* Add macos coreml CI and coreml_flags

* Move save debuggubg model to use environment var

* Move pipeline off from macos CI template

* Fix an issue building using unix make, add parallel to build script

* Fixed build break for shared_lib and cmpile warning

* Fix a compile warning

* test

* Revert the accidental push from another branch

This reverts commit 472029ba25d50f9508474c9eeceb3454cead7877.
2021-01-28 12:25:46 -08:00
baijumeswani
2e228d74d0
Increase the distributes tests pipeline timeout to 120 minutes (#6479) 2021-01-28 12:04:26 -08:00
liqunfu
6ed12402a4
Liqun/liqun/enable pipeline parallel test2 (#6399)
* enable data and pipeline parallism test

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-01-25 15:15:26 -08:00
ashbhandare
60c772e2bc
Megatron checkpointing (#6293)
* Add bart fairseq run script

* Add frontend change to enable megatron

* Initial changes for checkpointing

* Megatron optim state loading, checkpoint aggregation, frontend distributed tests for H, D+H

* Add load_checkpoint changes

* Fix CI

* Cleanup

* Fix CI

* review comments

* review comments

* review comments:
2021-01-22 11:26:47 -08:00
Guoyu Wang
eb946c4177
Unblock Android CI code coverage failure (#6393) 2021-01-20 21:26:10 -08:00
pengwa
453431f7bb
Add max_norm for gradient clipping. (#6289)
* add max_norm as user option for gradient clipping

* add adam and lamb test cases for clip norm

* add frontend tests
2021-01-21 01:01:11 +08:00
Hariharan Seshadri
d7bdd96425
Refine auto_pad based pad computation in ConvTranspose (#6305) 2021-01-19 19:01:49 -08:00
Scott McKay
e54e2f969d
Use readelf for minimal build binary size checks. (#6338)
* Use readelf for minimal build binary size checks.
The on-disk size grows in 4KB chunks which makes it hard to see how much growth an individual checkin causes.
Only downside is that the sum of the sections is larger than the on-disk size (assumably things get packed smaller on disk and some of the section alignment constraints can be ignored)

* Remove unused function
2021-01-15 07:46:02 +10:00
Changming Sun
ea6789b754
Add PREfast to python packaging pipeline (#6343)
* Add PREfast to python packaging pipeline
2021-01-14 10:39:24 -08:00
Guoyu Wang
e35db194e3
fix the pipeline failure (#6346) 2021-01-14 00:33:22 -08:00
Edward Chen
042053c55e
Add support for running Android emulator from build.py on Windows. (#6317) 2021-01-13 19:21:49 -08:00
baijumeswani
0586c610b2
Add ORTModule BERT classifier to CI the pipeline (#6330) 2021-01-13 12:34:04 -08:00
baijumeswani
9b7510d88c
Add ORTModule distributed CI pipeline (#6278)
* Add ortmodule distributed ci pipeline
2021-01-13 11:24:01 -08:00
liqunfu
aeca96caba
Liqun/enable pipeline parallel test (#6331)
enable pipeline parallel test
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-01-13 10:24:04 -08:00
Dmitri Smirnov
6b73bae035
Java: add Semmle to Java publishing pipelines (#6326)
Add Semmle to Java API pipeline
  Add security results publishing and add Java GPU.
2021-01-12 15:12:13 -08:00
Ashwini Khade
0ed56d491a
fix opset imports for function body (#6287)
* fix function opsets

* add tests and update onnx

* changes per review comments

* add comments

* plus updates

* build fix
2021-01-12 13:44:36 -08:00
Changming Sun
c43ca45c4f
Force reinstall onnx python package on Windows (#6309) 2021-01-11 22:12:56 -08:00
Chun-Wei Chen
84024bdfa9
Enable ONNX backend test of SequenceProto input/output (#6043)
* assert sequence tensor and remove skips

* update testdata json

* use ONNX 1.8 in cgmanifest.json

* use previous commit to workaround

* update ONNX commit ID in docker

* skip test_maxpool_2d_dilations test for now

* update function name
2021-01-11 11:30:33 -08:00
Changming Sun
5084ce0969
Update nuget build (#6297)
1. Update the ProtoSrc path. The old one is not used anymore.
2. Regenerate OnnxMl.cs
3. Delete some unused code in tools/ci_build/build.py
4. Avoid set intra_op_param.thread_pool_size in ModelTests in OpenMP build.
5. Fix a typo in the C API pipeline.
2021-01-11 10:49:05 -08:00
Edward Chen
04287ec770
Increase timeout for Linux GPU CUDA11 build. (#6280) 2021-01-07 15:44:42 -08:00
baijumeswani
e0f2a12c2c
ortmodule ci pipeline setup (#6251) 2021-01-05 09:13:19 -08:00
baijumeswani
93bf7c4d52
Documentation for distributed CI tests pipeline (#6140) 2021-01-04 10:09:39 -08:00
Changming Sun
1685167e46
Update manylinux docker image to the latest (#6242) 2020-12-31 19:57:04 -08:00
Xavier Dupré
cd14c1af29
Support double for operator ArgMin (#6222)
* Support double for operator ArgMin
* add test specifically for double
* add new test on pai-excluded-tests.txt
2020-12-31 11:25:46 +01:00
Xavier Dupré
84addcd2cf
Support double for operator ReduceMean, ReduceLogSumExp (#6217)
* Support double for operators ReduceMean, ReduceLogSumExp
2020-12-31 11:24:54 +01:00
Changming Sun
3911105f09 Remove python 3.5 2020-12-30 20:16:45 -08:00
Changming Sun
1b23b28706
Remove MKLML/openblas/jemalloc build config (#6212) 2020-12-30 17:18:19 -08:00
sfatimar
7347996942
Openvino ep 2021.2 (#6196)
* Enabling fasterrcnn variant and vehicle detector

* changes for 2021_2 branch

* yolov3_pytorch commit

* fixed braces in basic_backend.cc

* ci information added

* faster rcnn variant and vehicle detector changes were made in 2021.1 and not in 2021.2

* some changes to support unit tests

* disable some tests which are failing

* fix myriad tests for vehicle detector

* Did some cleanup
*cleaned up comments
*Disabled Add_Broadcast_0x1 and Add_Broadcast_1x0
tests on MYRIAD_FP16 backend due to a bug
*cleaned up capability_2021_2.cc file
*Removed extra conditions which were added
for some validation in backend_utils

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* yolov3 pytorch workaround to ensure that the output names are matched

* gemmoptest fixed on myriad

* Fixed MYRIADX CPP Test Failures

*Expand,GatherND,Range,Round op's
are only supported in model

*where op with float input data
types are not supported and fixed

*Scatter and ScatterElements op's with
negative axis are fixed

*Reshape op with 0 dim value are not
supported and fixed

*Disabled InstanceNorm_2 test on MYRIADX

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* make changes to yolov3 pytorch

* Fixed python unit tests
*Fixed failing python tests on vpu,
GPU and CPU

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fixes POW op failures on GPU_FP16

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Clean up capability_2021_2.cc

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Updated docx for MultiThreading option
*Added extra info on setting the num_of_threads
option using the API and it's actual usage

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* fixed slice and removed extra prints

* Disabled failing python tests

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Minor changes added in capabilty_2021_2

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* made changes to slice to avoid failures

* Disabling FP16 support for GPU_FP32
->Inferencing an FP16 model on GPU_FP32
leads to accuracy mismatches. so, we would
rather use GPU_FP16 to infer an FP16 model
on GPU Device

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Updated docx for Inferencing a FP16 Model

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* fix for mask rcnn

* Script for installing openvino from source

* Updated with openvino 2021.2 online installation

* code comment fixes
fixed accuracy mismatch for div

* Update OpenvinoEP-ExecutionProvider.md

updated for 2021.2 branch

* Update README.md

updated dockerfile documentation

* Update BUILD.md

build.md update documentation

* permissiong change of install_openvino.sh

* made changes to align with microsoft onnxruntime changes

* Updated with ov 2021.2.200

Co-authored-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
Co-authored-by: sfatimar <sahar.fatima@intel/com>
Co-authored-by: MaajidKhan <n.maajidkhan@gmail.com>
Co-authored-by: mohdansx <mohdx.ansari@intel.com>
2020-12-23 08:47:22 -08:00
satyajandhyala
201d0dbb1a
Android coverage dashboard (#6163)
* Write the report to a file.

* Post code coverage to the Dashboard database.
2020-12-21 10:34:01 -08:00
Edward Chen
cd3a5acca0
Update get_docker_image.py to enable use without image cache container registry. (#6177)
Update get_docker_image.py to enable use without image cache container registry.
2020-12-18 19:01:02 -08:00
Changming Sun
344a2a8ee5
Revert "work around of the build break in mac (#6069)" (#6150)
This reverts commit 3cae28699b.
2020-12-16 14:41:18 -08:00
Sheil Kumar
a6a23db130
Enable C# .NET5 for WinML (#6120)
* build for .net5

* only reference cswinrt for .net5

* remove netstandard2.0 references

* upgrade language version

* net5

* remove extra comment closure

* add targetframework

* set target framework

* remove net*

* pep8 errors

* make test project build with .net windows SDK projection

* disable c# builds for non-x64 builds

* fix pep8 errors

* disable for store build

* fix tests

* remove cswinrt and sdk references from package

* bump cswinrt down to 1.0.1

* fix bin path

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2020-12-14 15:05:15 -08:00
liqunfu
cde723a136
Liqun/move nightly pl to linux multi gpu v100 (#6024)
* move e2e nightly pipeline to azure devop
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-12-14 12:43:41 -08:00
Vincent Wang
7ddeafdfcc
Add ReduceL2Grad and ClipGrad (#5970)
* ReduceL2Grad and ClipGrad.

* fix win build and amd ci pipeline

* resolve comments.

Co-authored-by: Vincent Wang <weicwang@AiFramework2080ti2.corp.microsoft.com>
2020-12-10 11:03:26 +08:00
Edward Chen
e357486707
Fix build definition template typo, add logging (#6065)
Fix a typo in tools/ci_build/github/azure-pipelines/templates/get-docker-image-steps.yml.
Add logging to tools/ci_build/get_docker_image.py for easier debugging.
2020-12-08 15:16:50 -08:00
satyajandhyala
f68a256140
Android code coverage (#6061)
* Added Onnxruntime_GCOV_COVERAGE flag for Android.

* Set CMAKE_SYSTEM_NAME explicityly for Android.

* Added GCOV_PREFIX option to collect code coverage data.
Added a new python script to generate code coverage info.
Modified build pipeline to geneate Android code coverage info

* Added build command line option --android_coverage

* Added a comment describing the GCOV environment variables

* Fixed PEP8 issues.

* Added --android_coverage option to the build command.

* Increased Android emulator memory from 3K to 8K.

* Increased Android partition-size from 2GB to 4GB to overcome no-space-left-on-device error

* Removed source_dir from command line args.

* Use cwd absolute path to run tests.

* Added commands to output the contents of /data/local/tmp on the emulator.

* Added run_adb_shell function.

* Format changes.

* Removed keywd argument cwd.

* Removed Android in the --build_dir path.

* Removed commands added for debugging.

* Removed exxtra new-lines.

* Fix MacOs build pipeline failures by uninstalling openssl before running build script.

* Revert "Fix MacOs build pipeline failures by uninstalling openssl before running build script."

This reverts commit 90d0568fe533e9456c20d061a2d435c8fea48266.

* Change dir to the build directory where the tar file is copied.

* Changed the option from --android_coverage to --code_coverage

* Moved steps to generate Android code coverage to run_nnap_code_coverage.sh

* Require --android option if --code_coverage is specified.

* No code coverage needed for onnx_test_runner.

* Expect that the emulator is running when the script is executed.

* Fixed the title in the buildpipeline step.

* Fixed the formatting issue.

* Added a command line argument, ORT_ROOT, to run_nnapi_code_coverage.sh script

Co-authored-by: Satya Jandhyala <satyajandhyala@Satyas-Mac-mini.local>
2020-12-08 10:55:02 -08:00
Suffian Khan
e35211c0ff
Fix AMD GPU pipeline by adjusting reference /opt/rocm-3.9.0 => /opt/rocm (#6063)
* use /opt/rocm instead

* fix indent
2020-12-08 08:53:20 -08:00
Yufeng Li
3cae28699b
work around of the build break in mac (#6069)
* Fix the build break in macos release

* revert android change
2020-12-07 20:39:36 -08:00
Edward Chen
b348538c8a
Update build docker image cache cleanup (#6048)
The current image cache cleanup is not removing many images. Upon examining the cache container registry logs, it appears there are some infrequent pulls of old images which may be made by something other than CI builds (perhaps some automated scan of the registry).
This change adds a minimum access count for images in the cache so that infrequently but periodically accessed images can be removed. The idea is that images used by CI builds that are worth caching will have a higher volume of accesses.
2020-12-07 13:07:19 -08:00
Changming Sun
925879a8b0
Remove python 3.8 Windows GPU build from python packaging pipeline (#6054)
Revert the last a few changes to get the pipeline back to a normal state.
2020-12-07 10:23:07 -08:00
Edward Chen
d8139814fd
Clean up builds (#6015)
Update training Python packaging build to use get_docker_image.py.
Remove BUILD_EXTR_PAR docker build argument.
Update get_docker_image.py to check again for the image in the cache after building and before pushing to reduce the chance of a redundant push.
2020-12-04 15:13:17 -08:00
Jesse Benson
98ea7372d3 Re-enable Lamb unit tests for AMD 2020-12-03 13:06:34 -08:00
Edward Chen
6572a4d306
Disable Python 3.9 for training Python packaging build. (#6012)
Disable Python 3.9 for training Python packaging build. Python 3.9 is not supported by the PyTorch dependency.
2020-12-03 11:42:28 -08:00
Edward Chen
6d642a3dba
Replace direct pulls from image cache container registry with get_docker_image.py, build definition clean up. (#5906) 2020-12-01 19:10:23 -08:00
Changming Sun
2d9dcc4576
Add python 3.9 support (#5874)
1. Add python 3.9 support(except Linux ARM)
2. Add Windows GPU python 3.8 to our packaging pipeline.
2020-11-30 12:02:48 -08:00
Changming Sun
5fdd9f0fd2
Fix Python Linux GPU package name (#5943)
Fix Python Linux GPU package name. I accidentally added "noopenmp" to it.
2020-11-25 17:46:11 -08:00
Edward Chen
7546d251e0
Expose parameters in clean build Docker image cache build. (#5941)
Expose some parameters in the clean build Docker image cache build. In particular, whether to do a dry-run and the lifetime of unused cache images.
2020-11-25 14:15:54 -08:00
Ashwini Khade
705d093167
Update onnx (#5720)
* update onnx

* update docker image for testing
2020-11-24 11:20:15 -08:00
Suffian Khan
9b8189dd0a
Rework AMD CI pipeline to use pool AMD-GPU and disable more tests in order to enable it. (#5885)
Move AMD test pipeline to use self-hosted pool AMD-GPU. For time being, remove failing/flaky unit tests for AMD pipeline.
2020-11-24 09:38:14 -08:00
Guoyu Wang
846c5fb917
Report arm64 minimal baseline binary size only for continuous integration (#5913)
* Report binary size only for continuous integration
2020-11-24 20:24:08 +10:00
Guoyu Wang
4137c18d9b
Add ORT minimal with NNAPI EP to Android CI (#5890)
Description: Add ORT minimal with NNAPI EP to Android CI

Motivation and Context

The added build/test to Android CI will only run UT, additional onnx_test_runner with customer .ort models will be added later
2020-11-23 18:21:34 -08:00
Edward Chen
5e8fcda24a
Build docker image cache fixes. (#5902)
Fix Python 3.5 compatibility issue in tools/ci_build/get_docker_image.py.
Fix line endings in tools/ci_build/github/azure-pipelines/clean-build-docker-image-cache-pipeline.yml.
2020-11-23 14:43:12 -08:00
baijumeswani
208f4c1d3c
Azure ci pipeline for distributed environment tests (#5881) 2020-11-23 14:01:00 -08:00
satyajandhyala
353e071b7e
Fuzz testing misc (#5862)
* Run only required steps relevant to fuzz testing.

* Exit status non-zero for any uncaught exception other than ort_exception in the driver code
Co-authored-by: Satya Jandhyala <sajandhy@microsoft.com>
2020-11-23 13:43:44 -08:00
Guoyu Wang
26e6ced172
Temporary fix for Android CI failure (#5889)
* Unblock the Android CI

* Add python to android ci's command
2020-11-21 17:58:32 +10:00
Dmitri Smirnov
ceedf5630b
Document all C# API pubic interfaces (#5853)
Address documentation shortcomings.
 Document all required public interfaces.
 Add pipeline configuration.
Make Doxygen lookup a env vars for paths.
2020-11-20 14:03:55 -08:00
Edward Chen
bef06dac93
Automatically clean up build docker image cache. (#5843)
Follow up to #5811 to automate cleanup of the build docker image cache.
Added a script and build definition to clean up docker images that haven't been accessed recently.
2020-11-20 11:56:26 -08:00
S. Manohar Karlapalem
ff58f621fa
Remove nGraph Execution Provider (#5858)
* Remove nGraph Execution Provider

Pursuant to nGraph deprecation notice: https://github.com/microsoft/onnxruntime/blob/master/docs/execution_providers/nGraph-ExecutionProvider.md#deprecation-notice

**Deprecation Notice**

| | |
| --- | --- |
| Deprecation Begins	| June 1, 2020 |
| Removal Date |	December 1, 2020 |

Starting with the OpenVINO™ toolkit 2020.2 release, all of the features
previously available through nGraph have been merged into the OpenVINO™
toolkit. As a result, all the features previously available through
ONNX RT Execution Provider for nGraph have been merged with ONNX RT
Execution Provider for OpenVINO™ toolkit.

Therefore, ONNX RT Execution Provider for **nGraph** will be deprecated
starting June 1, 2020 and will be completely removed on December 1,
2020. Users are recommended to migrate to the ONNX RT Execution Provider
for OpenVINO™ toolkit as the unified solution for all AI inferencing on
Intel® hardware.

* Remove nGraph Licence info from ThirdPartyNotices.txt

* Use simple Test.Run() for tests without EP exclusions

To be consistent with rest of test code.

* Remove nGraph EP functions from Java code
2020-11-19 16:47:55 -08:00
Hariharan Seshadri
62508ef0e4
Revert "Remove MKLML build config (#5559)" (#5855) 2020-11-19 10:53:08 -08:00
Changming Sun
26db396b4b
Reduce the number of CI build variants (#5856) 2020-11-18 20:41:30 -08:00
satyajandhyala
b495ae8103
ORT fuzz testing (#5771)
* Added fuzz testing using ORT model.

* The onnxruntime_security_fuzz driver code should accept either ONNX or ORT (based on the file extension) input file if /f flag is provided.

*  Added ValidateOrtFormatModelDoesNotRunOptimizersInFullBuild test.

* Added win-ci-fuzz-testing.yml to run build pipeline.

* Prevent out-of-range access in the graph.cpp.
2020-11-18 16:07:36 -08:00
Changming Sun
85f945a875
Regenerate CI build docker images (#5850) 2020-11-18 14:36:59 -08:00
Edward Chen
71e7c2b423
Cache build docker images in container registry. (#5811)
This PR adds infrastructure to automatically cache docker images used in CI builds in a container registry.

Currently, build images are pulled from a container registry for some builds and built every time for others. The container registry requires maintenance to keep the images up to date and building images every time wastes build agent resources.

With this change, a given build image can be looked up in a cache container registry and if present, pulled, and otherwise, built and pushed. The uniqueness of a build image is determined by a hash digest of the dockerfile, docker build context directory, and certain "docker build" options. This digest is part of the image tag in the cache container repository.

The cache container registry will need to be cleaned up periodically. This is not automated yet.
2020-11-17 17:02:24 -08:00
Justin Stoecker
bd236ecc26
Switch to unified DirectML 1.4.0 redistributable (#5794)
Transitions from the ORT-only DML NuGet (hosted on the onnxruntime_public feed) to the new unified DirectML NuGet (Microsoft.AI.DirectML) on nuget.org. In addition, the Microsoft.AI.MachineLearning (WinML) and Microsoft.ML.OnnxRuntime.DirectML packages now take a dependency on the Microsoft.AI.DirectML package. This means we can remove the extra copy of DML binaries in these packages since they will be installed by the DML package.
2020-11-17 13:42:23 -08:00
Scott McKay
c84bc25e28
Add validation of op registrations (#5817)
* Add validation of operator registrations to the reduction script
  - the script has all the logic to process the registrations, and there's a CI that uses it

Fix some operator registrations

* Fix CUDA PRelu registration

* Refactor to split out kernel registration file parsing and use in the exclude ops script and an op registration validation script.
Run op validation in minimal build CI

* Fix PEP8 error and some comments
2020-11-17 10:44:09 -08:00
Sherlock
241b2226a7
Update orttraining-linux-gpu-ci-pipeline.yml for Azure Pipelines (#5826) 2020-11-17 09:27:59 -08:00
Guoyu Wang
1a66dfc0f9
Enable Squeeze Opset 13 for NNAPI (#5717)
* Add copy sparse model in minimal CI

* Add squeeze 13 support

* fix small typo

* Add ut for squeeze in NNAPI

* Fix some issue in the UT and code

* Modify based on the master change

* Fix build break
2020-11-17 00:26:06 -08:00
edgchen1
4d517c68a3
Fix reference to old download_e2e_test_data.py script. It was renamed to download_azure_blob.py. (#5790) 2020-11-12 15:48:06 -08:00
liqunfu
1416d12f0b
Liqun/merge e2e pipelines (#5702)
* Create an Azure Pipeline to merge cpp and python e2e pipelines into one. Still keep cpp 2e2 pipeline until this new pipeline is stable.

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-11-11 09:42:08 -08:00
Changming Sun
79350a642a
Update install_deps.sh: remove the unnecessary data generating step (#5758)
We install onnx python package from this script, so python tests can run the tests for the latest commit which we are importing.
2020-11-10 22:19:03 -08:00
Changming Sun
4094a09a56
Merge pull request #5731 from microsoft/snnn/rtti
Disable RTTI in Windows GPU CI pipeline
2020-11-10 09:02:59 -08:00
Edward Chen
919c270f3c Increase build timeouts. 2020-11-09 22:26:27 -08:00
Chi Lo
92292de135
Tensorrt perf tool (#5436)
* Add YAML file for pipeline

* Modify typo

* Add working directory

* Modify and test

* Modfiy and test

* Modify and test

* Modify and test

* Modify

* Modify

* Modify

* Modify

* Make sure to copy all the result files

* Add clearn up

* Modify

* Modify agent pool name

* Upload only specific artifacts

* Modify

* Integrated CI Pipeline for running TRT perf as well as added the “large amount of models” into perf model target

* Fix bug

* Fix bug

* Add reading the information regarding previously known failing models
and then skip testing them during benchmark/validation

* Modify the script file for CI

* Replace print with logger.info

* Fix bug

* Fix bug

* Refine the code

* Modify the script so that it can capture script segmentation fault while
running ORT

* Fix bug

* fix bug

* fix bug

* Add debug info

* fix bug

* Refine perf code

* Refine the code

* fix bug

* Code refactoring

* change many-models path

* remove metadata after validation/benchmark are done

* Update README.md

* Fix bug so that metadata doesn't hold stale value

* Remove hardcode and update README

* Add arguments to the script to make it run correctly

* Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines

* Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines

* Fix bug so that metadata doesn't hold stale value

* Fix small bug of finding test dataset directory for FP16 test data, as
well as modification of some output information

* use -i random for perf test of TRT changes

Co-authored-by: Olivia Jain <oljain@microsoft.com>
2020-11-06 12:27:42 -08:00
RandySheriffH
71f90e08f1
Nuget packaging no omp (#5666)
* create new nuget packaging pipeline without openmp

* rename package

* update image name

* rename package name

* rename managed package

* reset project attribute

* merge master

* set package name

* set NoOpenMP as cpu build

* shorten line length

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2020-11-06 11:43:35 -08:00
Tiago Koji Castro Shibata
9e68e98423
Add static CRT DLLs to Nuget package (#5661)
* Add static runtime yaml option

* Add to WAI Nuget build matrix

* Support empty build flags

* Add DML to x64

* Bundle static rt

* Bundle after Nugets are built

* Fix typo

* Skip static tests

* Pack test artifact only in x64 dynamic

* No DML static runtime

* Add Store static

* Revert "Add Store static"

This reverts commit 69133e5838.

* Static subfolder
2020-11-05 09:26:17 -08:00
Changming Sun
357a51c75c
Update python packaging pipeline's docker image (#5680) 2020-11-03 12:01:36 -08:00
Ashwini Khade
1cca903680
update onnx commit id (#5594)
* update onnx commit id

* update onnx commit for docker images

* update docker images
2020-11-02 09:46:36 -08:00
Weixing Zhang
aec4cb489e
ROCm EP for AMD GPU (#5480)
The ROCm EP is designed and implemented based on AMD GPU software stack named ROCm. Here is the link for the details about ROCm: https://rocmdocs.amd.com/en/latest/

ROCm EP was created based on the following things:
1. AMD GPU programming language: HIP
2. AMD GPU HIP language runtime: amdhip64
3. BLAS: rocBLAS, hipBLAS
4. DNN: miOpen
5. Collective Communication library: RCCL
6. cub: hipCub
7. …

Current status:
BERT-L and GPT2 training can be ran on AMD GPU with data parallel.

Next:
1. Make more GPU code be sharable between ROCm EP and CUDA EP since HIP language and HIP runtime API are very close to CUDA.
2. Continue improving the implementation.
3. Continue GPU kernel optimization.
4. Support model parallelism on ROCm EP.
……

The rocm kernels have been removed from this commit and will be in a separate PR. Since the original PR was too big(~180 files), it was suggested to split the PR into two parts, one is rocm-kernels, the other is non rocm kernels.  

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
Co-authored-by: sabreshao <sabre.shao@amd.com>
Co-authored-by: anghostcici <11013544+anghostcici@users.noreply.github.com>
Co-authored-by: Suffian Khan <sukha@microsoft.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2020-10-29 17:13:04 -07:00
Changming Sun
e6956be40c
Publish no-openmp python packages to test pypi (#5610)
Publish no-openmp python packages to test pypi
2020-10-28 19:49:53 -07:00
liqunfu
92662659ba
Liqun/remove number matching (#5606)
replace number matching with relaxed comparison in frontend tests
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-10-27 21:27:37 -07:00
Changming Sun
5802fe1699
Remove MKLML build config (#5559)
Remove MKLML build config
2020-10-21 13:11:25 -07:00
Ashwini Khade
df22611026
Update ONNX commit (#5487)
* update ONNX

* update onnx + register kernels for reduction ops

* bug fix kernel reg

* update cgmanifests

* revert unsqueeze op 13 registration

* filter ops which are not implemented yet

* filter some tests

* update onnx commit to include conv transpose bug fix

* update docker images

* undo not required test changes

* fix test failures
2020-10-21 07:22:20 -07:00
Guoyu Wang
915d475353
Android CI update (#5474)
* Update Android CI

* update comments
2020-10-14 16:56:50 -07:00
sfatimar
6d2a30eae3
[OPENVINO-EP] 2021.1 Release (#5431)
* Cmake changes for 2021.1

* added new ov version 2020.1 for faster rcnn

* Added missing defs

* equal op modified

* changes to incoroporate faster rcnn

* backend util.cc

* hddl_plugin_config.hpp is depreceated . instead use hddl_config.hpp

* changing myriad precision bool to i32

* gather is not enabled for gpu

* conv2D and pooltest auto_pad attribute should not be null

* negative indices are not valid for scatter op in myriad

* non max suppression op only supported in faster rcnn mode

* maxpool indices output is not supported

* Cleaned redundant code in backends

* Added ifdefs for HDDL config

* cast output dimensions check
topk operator k input it seems only resolved for myriad as it is
throwing issues for ask rcnn . need to verify

* we are limiting the subgraph size to 3 here

* taking care of review comments

* Fixed minor bugs

* Modified Slice op checks
* Added NonZero, Upsample
* Removed TopK if it's in the middle of a subgraph

* incorporated upsample conditions too

* Dockerfile changes for 2021.1 release

* dockerfile aptkey update

* Minor fixes

* ceil condition added  again

* Fixed few gpu models

* Disabled LSTM and yolov3 in ModelTests

* python softmax cross entropy tests and negative log likelihood

* Update Build.md

Updated for openvino 2021.1

* Update OpenVINO-ExecutionProvider.md

update openvino execution provider for 2021.1

* Update READMe.md

updated new openvino version

* Update Dockerfile.openvino 

added environment variable for DEBIAN Frontend

* Fixed myriad models

* Fixed gather condition
* Fixed mask rcnn model on myriad

* Modified Gather condition

* set default target of MCR dockerfile to MYRIAD_FP16

* Fixed tinyolov3 on CPU

* Update OpenVINO-ExecutionProvider.md

update openvino execution provider documentation

* Update Dockerfile.openvino

Removed environment variable

* Update OpenVINO-ExecutionProvider.md

update image manipulation networks supported

* Update onnx_backend_test_series_filters.jsonc

removed test_upsample_nearest from cpu test cases

* New InternalCI changes for 2021.1

* Full protobuf removed for OpenVINO

* Protobuf added

* Updated with apt installation for openvino

* Revert the testing changes

* Reverted testing changes

* File permessions are changed to original

* Deleted openvino installation and cmake change

* Optimized Dockerfile

Removed unnecessary cmake installation, numpy

* Added missing ifdefs

* delete array fix

* backend_utils.cc output_shape

* Revert "set default target of MCR dockerfile to MYRIAD_FP16"

This reverts commit 928d3e2b71e2f589cf51dacd3a133951cf9ca18d.

Co-authored-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
Co-authored-by: sfatimar <sahar.fatima@intel/com>
Co-authored-by: suryasidd <48925384+suryasidd@users.noreply.github.com>
Co-authored-by: S. Manohar Karlapalem <manohar.karlapalem@intel.com>
Co-authored-by: Aravind <aravindx.gunda@intel.com>
Co-authored-by: Aravind Gunda <38353114+gundaarx@users.noreply.github.com>
2020-10-14 15:56:00 -07:00
Pranav Sharma
c2c78399ee
Include config keys header file in the release packages for Linux and Mac. (#5388) 2020-10-08 15:00:29 -07:00
Changming Sun
09aef240d6
Skip running onnx tests in python mac os pipeline (#5416) 2020-10-08 11:49:28 -07:00
liqunfu
773992c7d4
Liqun/bert pretrain tb (#5377)
* add tensor board, remove torch.distributed.lanuch because ort nccl depends on MPI. Use MPI to launch parallel training.

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-10-06 16:28:31 -07:00
Wenbing Li
4721729fdc
Enable iOS CI pipeline (#5360)
* add the ios ci build.

* no dependency on mac ci pipeline.

* fix the command line.

* keep sync

* automatically retrieve sdpath

* fix the case errors and warnings

* fix the vlog switch issue.

* add parallel flag for build.

* update the display name of the pipeline.
2020-10-02 20:14:45 -07:00
Guoyu Wang
9df0790856
Update linux minimal CI to report Android mininal baseline binary size (#5361)
* Update linux minimal CI to report Android mininal baseline binary size

* Fix some issues in the script
2020-10-02 17:35:23 -07:00
edgchen1
d62873a331
Docker image release build updates (#5326)
- Update docker image release build to use build commit.
- Use valid default in component governance detection step.
- Use smaller docker build context.
2020-10-01 12:25:31 -07:00
liqunfu
fe50213491
Liqun/bert pretrain2 (#5327)
* bert single node multi GPU pretrain w/o checkpoint

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-10-01 11:01:26 -07:00
Changming Sun
17f1178c2e
Downgrade GCC (#5269)
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2020-09-24 21:14:54 -07:00
Dmitri Smirnov
89742411ec
Insert telemetry template into GPU build, add telemry build switches. (#5278) 2020-09-24 17:13:09 -07:00
edgchen1
6d5b93b805
Synchronize training dependency versions between Docker image and Python wheel. (#5261)
Synchronize training dependency versions between Docker image and wheel, update docs, refactor build scripts.
2020-09-23 19:03:42 -07:00
suffian khan
417929b049 jobs timeout .. 2020-09-21 21:51:59 -07:00
Xueyun Zhu
55e4b5d302
add pipeline distributed training test (#5222)
* add pipeline distributed training test

* fix max line length error in windows build

* function header indent

* fix

* fix flake8 error
2020-09-21 14:35:01 -07:00
Guoyu Wang
78a29aebbc
[ORT Mobile] ORT Minimal E2E CI (#5200)
* Modify the ort minimal CI to ort minimal e2e ci
2020-09-19 18:43:22 +10:00
KeDengMS
ce3b67e0cd
[Python] Move symbolic_shape_infer from nuphar to tools (#5162)
* [Python] Move symbolic shape inference from nuphar to tools

* Fix PEP8 ERROR
2020-09-18 09:31:06 -07:00
liqunfu
f37e1292a1
--shm-size=1024m to fix nccl shared memory issue (#5214)
* --shm-size=256m to fix nccl shared memory issue

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-09-17 17:21:47 -07:00
Guoyu Wang
8156e0dd10
[ORT Mobile] Some updates to iOS/Android build settings (#5184)
* Update android CI and build settings

* add build_java to arm64 also

* Add ios signing param

* fix a small build warning

* address pr comments
2020-09-17 15:53:14 -07:00
Tiago Koji Castro Shibata
1a2e289d2d
Fix nuget build (#5163)
* Fix nuget content

* Revert "Fix nuget content"

This reverts commit e2cdcec4e39964c50eac2fb306c7a4bb84352443.

* Nuget packaging

* skip tests

* msbuild path

* Force msbuild version

* Workaround https://github.com/NuGet/Home/issues/7621

* cleanup
2020-09-16 10:37:09 -07:00
Changming Sun
a0a435abc6
Add sympy==1.1.1 to Linux docker image (#5177) 2020-09-15 16:08:49 -07:00
Scott McKay
089789c135
Revert change to disable support for loading ORT format models in the packaging pipelines. (#5168) 2020-09-15 15:11:06 +10:00
RandySheriffH
1dde215d96
promote cuda version on packacking pipelines (#5154)
* promote cuda version on packacking pipelines

* fix cudnn version in py packaing template

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2020-09-14 21:09:09 -07:00
RandySheriffH
9392aa2f64
Promote Cuda version to 10.2 for windows pipelines (#5138) 2020-09-13 20:32:06 -07:00
Scott McKay
323a1ba8a4
Add option to exclude support for loading ORT format models in full build. (#5129)
* Add ability to exclude support for loading ORT format models.
Disable support for ORT format models in packages
2020-09-12 12:21:30 +10:00
RandySheriffH
120e3cda74
fix path (#5131)
Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2020-09-11 12:18:07 -07:00
Changming Sun
c5efb0085d
Update Linux GPU build pipelines to CUDA 10.2 (#5120)
* Update Linux GPU build pipelines to CUDA 10.2
2020-09-10 17:40:51 -07:00
Changming Sun
a5530358c9
Fix a path problem in Dockerfile.manylinux2014_cuda10_2 (#5106) 2020-09-10 10:30:13 -07:00
Tiago Koji Castro Shibata
62848c4de5
Add store builds to nuget packaging (#5040)
* Nuget store packaging

* Move DNNL workaround to EP

* Fix warning as error

* Disable store tests

* Skip store tests

* msbuild target

* Cross compile protoc in Store

* Disable DML in store

* Move store builds to CPU queue

* Copy uap10 to final nuget

* Fix pip8 error

* Remove extra dml copies

* Fix argparse

* pep8

* Forward IsStoreBuild

* Apply is_store_build to duplicate generate_nuspec

* runtimes

* Refactor uap10

* Store .NET

* uap

* PR feedback
2020-09-09 21:38:14 -07:00
RandySheriffH
5e10cde006
PipelinesForCuda11Cudnn8 (#4938)
* cancel night build on pyop

* setup win cuda11 pipeline

* add debug build

* test base gpu settings

* setup pipelines to test cuda 10.2 and 11

* rename linux docker images

* rename docker image tag and add clean up job

* fix typo in cuda 11 config

* set cuda11 env

* update linux cuda 11 pipeline

* reset docker image name

* disable uninitialized warning from linux build

* change the way to silence uninitialized warning

* add flags to linux gpu pipeline

* switch docker image for linux cuda 10.2

* switch linuc cuda 10.2 image

* test cuda11 with devtool8

* try latest built images

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2020-09-09 16:13:58 -07:00
Changming Sun
924ecb0623
Use manylinux2014 for Linux CPU build (#5091) 2020-09-09 10:09:52 -07:00
gwang-msft
a1a81470e3
Add minimal build binary size verification (arm64) to Android CI (#5087)
* Add minimal build binary size verification (arm64) to Android CI

* Add comments in the CI ymal
2020-09-09 19:06:20 +10:00
gwang-msft
a40d34386a
Add Linux CPU CI for ORT minimal build (#5074)
* initial test version

* update yml

* minor updates

* minor updates

* Test minimal build

* update with include ops for minimal build ut only

* error case to see build failure

* test no_exceptio

* Remove error cases

* address pr comments

Co-authored-by: gwang0000 <62914304+gwang0000@users.noreply.github.com>
2020-09-08 17:09:33 -07:00
Changming Sun
370d194db7
Add a docker file for CI build CUDA 10.2 (#5065) 2020-09-04 16:28:45 -07:00
Scott McKay
b5c2932ae8
Last major set of ORT format model changes (#5056)
* Add minimal build option to build.py
Group some of the build settings so binary size reduction options are all together
Make some cmake variable naming more consistent
Replace usage of std::hash with murmurhash3 for kernel. std::hash is implementation dependent so can't be used.
Add initial doco and ONNX to ORT model conversion script
Misc cleanups of minimal build breaks.
2020-09-05 07:59:01 +10:00
Changming Sun
d5d5e37e76
Build system enhancements (#5012)
1. Add a docker file for CUDA11
2. Support setting CUDA_ARCHITECTURES from command line.
2020-09-02 10:13:26 -07:00
RandySheriffH
14b51d6502
CiPipeline@ReducedOpsBuild (#4917)
* cancel night build on pyop

* setup ci pipeline for build of reduced ops

* add back c# test

* remove debugging print

* add testing model

* add more arg in pipeline script

* disable pipeline trigger temporarily

* fix yaml format

* fix yaml format

* fix pipeline error

* rid c# test

* add ops for test cases

* add Conv from domain com.microsoft.nchwc

* remove --reduce_ops

* fix typo

* remove --build_java

* add test case for excluded op

* update doc with --skip_test

* formatting code, renaming files and simplify yaml

* remove debug build from yaml

* remove surplus ops from included_ops.txt

* add MinSizeRel build to yaml

* rename test cases and models

* exclude ir test from minimum build

* restrict ir test to be only applied to reduced ops build
2020-08-31 21:21:18 -07:00
Ashwini Khade
8679a7244e
Enable rejecting models based on onnx opset (#4912)
* enable rejecting models based on onnx opset

* enable unreleased opsets in linux and mac CI

* test fixes and more updates

* enable unreleased opsets in CI builds

* enable released opsets in linux cis

* try fix windows ci yml

* yml fixes

* update yml

* yml updates post master merge

* review comments

* bug fix
2020-08-31 13:35:36 -07:00
Hariharan Seshadri
b945225de3
Include DirectML pdb in x86 bin folder (#4953) 2020-08-28 11:29:26 -07:00
Changming Sun
c37fa7c278
Delete Dockerfile.centos6_gpu (#4851) 2020-08-28 09:56:52 -07:00
edgchen1
71d8846635
Fix telemetry-steps.yml (#4903)
Fix bug in telemetry-steps.yml that causes telemetry setup to be disabled even if TELEMETRYGUID is set.
2020-08-24 22:14:40 -07:00
Changming Sun
f34ed3a576
Hot fix for the python packaging pipeline Linux ARM build (#4902) 2020-08-24 20:14:33 -07:00
Rayan-Krishnan
eb05db5a2a
Fix OptimizerConfig params groups (#4877)
* Copy samples to build folder and load models from there. Fix CI
* This PR also includes a fix to path validation for save_as_onnx API
* Add torchtext to CI for GPU training
* Remove new frontend tests from CI

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
2020-08-22 22:04:17 -07:00
liqunfu
6260d073b3
Glue parallel training (#4550)
add mpi size, rank python API

add single node parallel training example
2020-08-21 21:24:27 -07:00
Yulong Wang
c6119a548c enable telemetry in node.js binding 2020-08-20 09:47:57 -07:00
suryasidd
3a00b50cf8
[OpenVINO-EP] Updating OpenVINO EP to 2020.4 (#4836)
* Removed building ngraph from source

* Disabled some tests temporarily

* Enabled softmax for all dims

* Added onnx importer to link libraries

* int64 changes

* fixed

* temp

* slice update start and end need to be initializer

* Disabled GatherND, ScatterND, ReverseSequence operators

* Added supported ops instead of unsupported ops

* Set precision only for CPU

* Removed some unecessary conditions

* Fixed segfault in slice

* Softmax restriction removed

* changes

* Setting precision for all plugins

* Changes added to include precision
and supported ops for gpu and vpu

* branch op support

* checking for disabled python test failure

* mapped input names and tensors directly rather than copying which was leading to mismatch

* last index is not supported
mkldnn does not support pow between integers

* included the code changes

* Rename inner-scoped variable to avoid MSVC warning

* applied changed to vadm as well and removed the utility function
getinputtensors() completely

* OpenVINO multi version support: CMake changes

* OpenVINO multi version support: C++ support

* removed commented code

* Remove redundant code lines

* Revert "Rename inner-scoped variable to avoid MSVC warning"

This reverts commit 2f650493162675bc6fb70730de9656ec400be332.
Merged separately in master.

* vadm changes disabled reduction op test

* putting test_gather_negative_indices in unsupported list for now

* Update MCR Dockerfile with 2020.4

Installs OpenVINO 2020.4 from deb packages via APT tool.

* Update build docs with 2020.4 info

* Update dockerfile with OV 2020.4 info

Instructions for building OpenVINO based docker image no longer require
downloading installer package as it is installed by the dockerfile
using OpenVINO 2020.4 APT package for Ubuntu 18.04

* Added constant folding bypass logic

* Added cout statements for ci

* Added NDEBUG flag for debug symbols

* Update Ops info in docs

* fixes multiple unit tests

* mathoptest.ceil disabled for gpu and myriad

* activation test temp disabled

* Fix models for CPU

* Fixed a syntax error

* local cmmit

* fixing unit tests for myriad

* Fixed Variadic Split, Topk issues

* fix_model commit

* Fix models in myriad

* Added ifdefs for OpenVINO 2020.4

* temp

* made some changes to not operator

* Added unused parameter

* relu enabled

* Fixed bug in Conv output

* Consolidated GPU failing tests into one category

* Made it compatible to InternalCI 2020.4

* Made changes for ngraph

* Disabled test for mask,fastercnn,tinyyolov3

* Removed proxy for ci

* run_dockerbuild.sh restored to same version

* run_dockerbuild.sh restored to same version

* run_dockerbuild.sh restored to same version

* Updated documentation for 2020.4

* Removed FP32 to FP16 transformation for GPU

* Disabled Coreml-FNS-Candy model test

* Added FP16 transformations

Co-authored-by: sfatimar <sahar.fatima@intel.com>
Co-authored-by: Manohar Karlapalem <manohar.karlapalem@intel.com>
Co-authored-by: sfatimar <sahar.fatima@intel/com>
Co-authored-by: sfatimar <64512376+sfatimar@users.noreply.github.com>
Co-authored-by: intel <you@example.com>
Co-authored-by: gundaarx <aravindx.gunda@intel.com>
2020-08-19 23:18:08 -07:00
Changming Sun
1ba07ccfaf Codesign validator fixes 2020-08-18 16:20:15 -07:00
Changming Sun
e98697ec28
Fix nuget cpu package pipeline (#4832) 2020-08-17 17:08:48 -07:00
Ksenija Stanojevic
ea37a4d89b
Add Trilu custom op (#4537)
Co-authored-by: neginraoof <neginmr@utexas.edu>
2020-08-17 14:42:26 -07:00
Thiago Crepaldi
42408aa3ed
Add new PytTrch front-end (#4815)
* Add ORTTrainerOptions class for the new pytorch frontend (#4382)

Add ORTTrainerOptions class and some placeholders

* Add _ORTTrainerModelDesc to perform validation for model description (#4416)

* Add Loss Scaler classes to the new frontend (#4306)

* Add TrainStepInfo used on the new frontend API (#4256)

* Add Optimizer classes to the new frontend (#4280)

* Add LRScheduler implementation (#4357)

* Add basic ORTTrainer API (#4435)

This PR presents the public API for ORTTrainer for the short term
development.

It also validates and saves input parameters, which will be used in the
next stages, such as building ONNX model, post processing the model and
configuring the training session

* Add opset_version into ORTTrainerOptions and change type of ORTTrainer.loss_fn (#4592)

* Update ModelDescription and minor fix on ORTTrainer ctor (#4605)

* Update ModelDescription and minor fix on ORTTrainer/ORTTrainerOptions

This PR keeps the public API intact, but changes how model description is stored on the backend

Currently, users creates a dict with two lists of tuples.
One list called 'inputs' and each tuple has the following format tuple(name, shape).
The second list is called 'outputs' and each tuple can be either tuple(name, shape) or tuple(name, shape, is_loss).

With this PR, when this dict is passed in to ORTTrainer, it is fully validated as usual.
However, tuples are internally replaced by namedtuples and all output tuples will have
tuple(name, shape, is_loss) format instead of is_loss being optionally present.

Additionally to that normalization in the internal representation (which eases coding),
two internal methods were created to replace a namedtuple(name, shape) to namedtuple(name, shape, dtype)
or namedtuple(name, shape, is_loss, dtype) dependeing whether the tuple is an input or output.

This is necessary as ORTTRainer finds out data types of each input/output during model export to onnx.

Finally, a minor fix was done on ORTTrainer. It could initialize ORTTrainerOptions incorrectly when options=None

* Rename input name for test

* Add ONNX Model Export to New Frontend (#4612)

Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>

* Create training session + minor improvements (#4668)

Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>

* Save ONNX model in file (#4671)

Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>

* Add eval step (#4674)

Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>

* Add train_step (#4677)

Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>

* Add LR Scheduler (#4694)

Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>

* Add deterministic compute tests (#4716)


Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>

* Add legacy vs experimental ORTTrainer accuracy comparison (#4727)

Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>

* Add Mixed precision/LossScaler + several fixes (#4739)

Additionally to the mixed precision/loss scaler code, this PR includes:

* Fix CUDA training
* Add optimization_step into TrainStepInfo class
* Refactor LRSCheduler to use optimization_step instead of step
* Updated several default values at ORTTrainerOptions
* Add initial Gradient Accumulation supported. Untested
* Fix ONNX model post processing
* Refactor unit tests

* Add ONNX BERT example + minor fixes (#4757)

* Fix training issue when passing ONNX file into ORTTrainer

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>

* Add Dynamic Shape support (#4758)

* Update DeepSpeed Zero Stage option to a separate option group (#4772)

* Add support to fetches (#4777)

* Add Gradient Accumulation Steps support (#4793)

* Fix Dynamic Axes feature and add unit test (#4795)

* Add frozen weights test (#4807)

* Move new pytorch front-end to 'experimental' namespace (#4814)

* Fix build

Co-authored-by: Rayan-Krishnan <rayankrishnan@live.com>
Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-08-17 09:45:25 -07:00
Changming Sun
5eec4f66ed
Refactor manylinux docker image and the related pipelines (#4751)
1. Publish the image ACR, instead of building it every time for every PR
2. Make USE_MKLML and USE_OPENMP be able to co-exist. Currently both of them are enabled in our Linux CI build but indeed only one of them is taking effect.
3. Split nuphar and DNNL to separated pipelines.
4. Fix two warnings in onnxruntime/core/optimizer/matmul_scale_fusion.cc and onnxruntime/test/tvm/tvm_basic_test.cc.
5. Update the manylinux2010_x86_64 image to the latest.
2020-08-17 09:40:31 -07:00
Yulong Wang
aa993e95c9
enable build flag '--use_openmp' on MacOS (#4774)
* enable build flag '--use_openmp' on MacOS

* cmake 3.16.1 to enable find_package(OpenMP) on mac
2020-08-13 15:56:42 -07:00
jingyanwangms
adda8c66d9
Docker image release pipeline (#4682)
* create orttraining-1p-linux-gpu-ci-pipeline.yml

* fix syntax

* fix file path

* fix template path

* publish docker image to test acr

* use right task name

* change parameter list

* use variables

* use python.version

* remove --enable_onnx_tests due to segfault

* add back --enable_onnx_tests

* fix docker push command line

* change docker login command

* login differently

* fix docker tag script

* create password.txt

* add ortrelease docker image

* enable test in build.sh

* add pipeline parameter

* add pipeline parameter

* change timeout

* change timeout

* fix run_dockerbuild.sh

* use PR checkin build docker

* fix strategy syntax

* fix strategy syntax

* change dockerfile

* change run_dockerbuild.sh

* change tag name

* build with root user

* use build id for docker image tag

* remove all user lines

* change docker tag

* add mpi, mellanox

* add missing args

* use release dockerfile for ci build

* remove install wheel

* use release docker image

* fix syntax

* use different pool

* add Dockerfile.training

* remove sudo to run on Linux-Multi-GPU-V100

* change docker file path

* update dockerfile

* use latest dockerfile

* change agent pool

* remove --preserve-env

* add back parameter

* Add test_flag

* use azuredevops docker

* change repository

* use cmd for docker login

* echo build script

* use ortrelrease ACR

* change key vault connection

* Move --build flag

* change build command

* add paramter for image tag

* clean up for PR

* remove unnecessary changes

* whitespace changes

* whitespace changes

* change build flag

* change flag name

* change flag

* use latest dockerfile

* enable build tests

* build builder stage and run test

* Add back python.version

* change build directory

* always run build entire dockerfile

* fix yml syntax

* fix syntax

* add en-UTF8 locale

* rename

* remove unused template

* Update orttraining-linux-gpu-docker-release-pipeline.yml for Azure Pipelines

* Update orttraining-linux-gpu-docker-release-pipeline.yml for Azure Pipelines

* Test commit sha1 in pipeline

* fix parameter

* update docker file

* fix --from=build

* remove commented blocks

* PR comments

* fix syntax

* fix syntax

* use timestamp as build number

* remove latest tag

* add build_timestamp variable

* remove wrong property

* fix docker run command

* test build id

* Use datestamp build id

* change build tags

* add no-cache to docker build

* rename BUILD_VERSION -> BUILD_CONFIG

Co-authored-by: Jingyan Wang <jingywa@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Jingyan Wang <jingywa@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-08-12 13:29:37 -07:00
Dmitri Smirnov
ac4997665a
Make Java Publishing and Java GPU pipelines to run nightly (#4749)
Schedule Java daily
  Bump up iInux GPU build timeout
2020-08-10 17:38:45 -07:00
stevenlix
77c69a0325
Upgrade TensorRT to v7.1.3.4 (#4704)
* upgrade to TensorRT 7.1.3.4

* Upgrade onnx-tensorrt parser for TensorRT 7.1.3.4

* fix format issue

* fix format issue

* fix format issue

* Update tensorrt_execution_provider.cc

* change cmake version to 3.14

* Remove --msvc_toolset 14.16

* change to onnxruntime::make_unique

* use onnxruntime::make_unique

* disable some tests for TensorRT

* disable some tests for TensorRT

* Update upsample_op_test.cc

* Update tile_op_test.cc

* disable some tests for TensorRT

* Update constant_of_shape_test.cc

* update parser

* Update Dockerfile.ubuntu_tensorrt
2020-08-07 17:43:56 -07:00
Sheil Kumar
5c5efa900d
Add .NET Core 3.0 nuget e2e pipeline tests (#4695)
* bump cswinrt version

* add cswinrt

* test dotnetcore 3.0

* rename buildpacakge source

* set folder path to the package source and not the version

* refactor .netframework tests

* build .net core anycpu

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2020-08-05 13:02:24 -07:00
Changming Sun
d0297f8d24
Add 'Install ONNX' step to Windows GPU pipeline (#4696)
Add 'Install ONNX' step to Windows GPU pipeline

Previously it's not a problem because onnxruntime python package explicitly said it depends on ONNX, so ONNX will get installed when we test onnxruntime. However, it was removed in #4073
2020-08-03 18:51:24 -07:00
Changming Sun
01ca6392cb
Avoid building ONNX of every history ONNX versions in our CI (#4678)
1. Avoid building ONNX of every history ONNX versions in our CI, it is costly and easy to fail.
2. Run docker command without sudo. Previously the user is not in docker group, now Azure DevOps Service have added it in.
2020-08-03 10:18:10 -07:00
Changming Sun
f9f25c5559
Remove featurizer from CI build (#4661) 2020-07-30 18:37:55 -07:00
Changming Sun
51332e3c81
Change Linux CI build time out value to 3 hours (#4664)
Because it often need more than 1 hr 55 minutes, increase the value so that we'll less likely see pipeline failed.
2020-07-30 02:52:05 -07:00
Xiang Zhang
d73e01e5b9
remove ENABLE_TELEMETRY macro (#4633) 2020-07-27 20:06:11 -07:00
gwang-msft
c2ec3b734b
[Android NNAPI EP] Remove dependency on external JD/DNNLibrary (#4576)
* remove dependency of external jd-dnnlibrary

* remove extra variables not used any more

* update /cgmanifest.json
2020-07-22 14:08:12 -07:00
Sheil Kumar
fa6d035090
Create WindowsAI zip files automatically as part of the pipeline (#4584)
* copy rename nupkg to zip as part of build task

* update both symbols and regular package

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2020-07-22 10:53:47 -07:00
Changming Sun
c2c4e6760b
Fix code sign validation errors in nuget and nodejs pipeline (#4527) 2020-07-20 14:18:47 -07:00
Changming Sun
bc1d197ddf
Re-enable dnnl in CI build (#4544)
* Revert "Temporarily remove dnnl from Linux CI build to unblock the whole team (#4266)"

Previously it fails because it used too much memory.
Now we only run dnnl EP with opset12 models in unit tests, to reduce peak memory usage.
2020-07-19 23:20:03 -07:00
Yulong Wang
5086e55a35
Fix condition of running tests in win CI (#4459) 2020-07-16 16:33:30 -07:00
Changming Sun
8ada440961
Move model tests to onnxruntime_test_all (#4521)
1. Move model tests to onnxruntime_test_all
2. Publish TestResults of Windows CI build.
2020-07-15 16:46:18 -07:00
edgchen1
34f73fa1aa
Add sudo --preserve-env option to allow environment to go through to docker commands. (#4512) 2020-07-14 18:12:31 -07:00
liqunfu
f721f5f1cd
Liqun/multiple choice (#4480)
* multiple choice runner

* add docker cleanup task to frontent pipeline
2020-07-14 17:57:58 -07:00
Sheil Kumar
ee5ca27ae2
Split Microsoft.AI.MachineLearning.nupkg in a NuGet package and symbol NuGet package (#4503)
* add threadpool interface

* generate snupkgs

* include_pdb check

* fix snupkg generation

* Add task to merge snupkgs

* folder exists

* check dir

* revert thread pool stuff

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2020-07-14 14:52:39 -07:00
gwang-msft
5f8f443ac4
Android CI build, test copy, emulator boot improvement (#4481)
* Enable onnxruntime_test_all for NNAPI EP

* switch to use ninja for ANdroid CI

* make android elumator boot faster in android ci

* simplify adb push

* more style change

* more tweaking on android ci

* build.py style update
2020-07-13 14:18:34 -07:00
Dmitri Smirnov
35ee00d888
Pin typing version. (#4490) 2020-07-13 11:48:30 -07:00
Hariharan Seshadri
26ebcfab88
Fix Nuget GPU pipeline (#4462) 2020-07-10 14:02:28 -07:00
Yulong Wang
bec18eb3f4
[Node.js binding] support CentOS 7 in CI (#4447) 2020-07-09 00:59:50 -07:00
Negin Raoof
71aec2adcb
Custom op export test template (#4383)
* Adding pytorch custom op export tests to CI

* Test clean build

* Fix export for intended failure

* update export script

* Build onnxruntime
2020-07-08 10:14:56 -07:00
Hariharan Seshadri
6d6b6b54a5
Support binding a graph output to a specific device via the Python binding (#4439) 2020-07-07 21:09:37 -07:00
Sheil Kumar
fdb4a3a2e8
Add cppwinrt and cswinrt tests in windowsai nuget pipeline (#4381)
* build e2e cppwinrt tests

* add use nuget task

* make all referenced to package version prop/target-ified

* remove dupe props/targets reference

* work around project.assets.json error by deleting it

* powershell test invocation

* switch to batch script

* print debug info

* update x86->x64

* stdio.h

* pushd/popd

* add csharp tests

* package.config -> packages.config

* typo

* x86 -> anycpu

* debug is default

* add test path

* update csproj as well

* debug

* really replace all package versions

* debug output

* really use [PackageVersion]

* sleep intead of converting async operation to task and waiting

* dont close software bitmap

* switch to powershell script

* remove binding check

* continue on failure

* continuse on error action

* continueOnError and errorActionPreference

* tabbing

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2020-07-07 09:36:42 -07:00
suffiank
7a05b3ca87
Increase python packaging pipeline timeout (#4412)
* increase python packaging pipeline from 90 to 110 min

* change timeout to Linux GPU and do 120 min to match Win GPU
2020-07-02 15:38:39 -07:00
gwang-msft
0bef9d5114
Fix the broken Android NNAPI CI (#4403)
* Change NNAPI CI to run on new NNAPI EP

* update android ci to mac 10.15 and remove in install cmake

* update the android ci to targe android api level 29

* remove unnecessary ndk install git submodule call
2020-07-02 10:22:18 -07:00
Changming Sun
3bb6a865cc Revert "remove openmp and scipy from build pipelines (#4305)" 2020-07-02 00:30:02 -07:00
Tiago Koji Castro Shibata
7fea332f93
Support builds without RTTI (#4333)
* Support builds without RTTI

* Disable RTTI in all builds
2020-07-01 13:05:35 -07:00
Dmitri Smirnov
49268c42da
Change the way java home is set on Mac OS for CI and Java publishing pipeline (#4385)
* Change the way java_home is set on Mac.

* Change the way JAVA_HOME is set on Mac OS
2020-07-01 07:37:14 -07:00
Negin Raoof
37cbe8551d
Adding export registration and tests for custom ops (#4248) 2020-06-25 22:29:02 -07:00
Changming Sun
5db67ec000
Fix python package issue and upgrade the linux image to 2010 (#4342)
1. Increase job timeout, while we are investigating why the tests take much longer
2. Upgrade the linux docker image to manylinux2010, by request from Tianlei. (We had an offline discussion with Pranav and Tracy)
3. Remove the installation of "devtoolset-7" in the CUDA image. It was added for CUDA 10.0, it is not needed for CUDA 10.1. We have moved to CUDA 10.1.
2020-06-25 20:22:39 -07:00
Dmitri Smirnov
a08805daf9
Fix a minor typon in POM file name (#4250)
Co-authored-by: Changming Sun <chasun@microsoft.com>
2020-06-25 11:15:14 -07:00
Changming Sun
deea945f80
Remove openmp and scipy from build pipelines (#4305)
1. Remove openmp because the default thread pool is already good enough.
2. Remove scipy from build pipelines because it stops support python 3.5.
2020-06-23 20:18:16 -07:00
edgchen1
4e39fda06a
Fix version of torch and torchvision in install_deps.sh. (#4316) 2020-06-23 14:55:18 -07:00
edgchen1
737c22a911
Refactor Python packaging builds (#4283)
Reuse the same template file for all Python packaging builds.
2020-06-22 17:13:22 -07:00
Pranav Sharma
2204d39a06
Add build option to disable traditional ML ops from the binary. (#4272)
* Add build option to disable traditional ML ops from the binary.

* Fix python tests by splitting tests for ML ops to a separate file. Exclude ML tests from onnx_test_runner and C# tests. Exclude ML op sources.

* Update Edge pkg pipelines with new MLops env variable and fix C# packaging pipeline tests to skip ML ops.
2020-06-20 06:36:06 -07:00
Changming Sun
0349479b19
Fix component governance and codesign validation errors (#4277)
Adjust the job steps so that these security tasks run before the build directory clean up.
2020-06-18 15:54:18 -07:00
Changming Sun
43deec2174
Temporarily remove dnnl from Linux CI build to unblock the whole team (#4266) 2020-06-17 16:25:24 -07:00
edgchen1
63bf587623
Use azcopy to download test data (#4221)
Use azcopy from download_e2e_test_data.py, add helper function for downloading azcopy.
Update download_test_data.py to use helper function.
2020-06-16 10:14:34 -07:00
Hariharan Seshadri
91a41298cc
Fix ORT build when onnxruntime_PYBIND_EXPORT_OPSCHEMA is enabled (#3954) 2020-06-12 19:32:57 -07:00
Changming Sun
6f4320fb85
Fix the python package name issue (#4207)
Fix the package package name issue. In my last change(#4197) about enabling code sign. I forgot to pass the additional flags to setup.py,
2020-06-12 08:32:59 -07:00
Changming Sun
8f8d899bf2 Enable code sign in c api pipeline and python pipeline 2020-06-10 19:31:22 -07:00
Yulong Wang
73bc6be5d1
build: split nodejs binding build and test to avoid timeout issue (#4188)
* split nodejs binding build and test

* enable nodejs tests
2020-06-10 19:16:32 -07:00
Dmitri Smirnov
af0750ba1b
Java GPu artifact naming (#4179)
Modify gradle build so artifactID has _gpu for GPU builds.
  Pass USE_CUDA flag on CUDA build
  Adjust publishing pipelines to extract POM from a correct path.

Co-Authored-By: @Craigacp
2020-06-10 11:15:48 -07:00
Changming Sun
c0bdbc0b39
Enable telemetry for the C API and python pipeline (#4174) 2020-06-10 00:07:46 -07:00
George Wu
9d65ce53bc
move back to toolset 14.16 to possibly work around nvcc bug (#4180) 2020-06-09 19:36:30 -07:00
Sheil Kumar
4377ff4a1a
Enable .NET Core 2.0 and .NET Framework 4.6.1 in Microsoft.AI.MachineLearning NuGet package (#4125)
* add project to download cswinrt and build winrt c# interop dll

* Add to nuget package

* reverse if check

* run generation before core compile

* add generated files to compile

* update .net package to binplace native libs

* add props to .netstandard2.0 folder

* auto binplace ml native binaries

* force 'Any CPU' platform build

* Fix anycpu and platform targets

* fix flake errors

* fix variable order

* fix flake pep8 errors, semicolon

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2020-06-09 09:08:19 -07:00
Changming Sun
2ab3a19728
Enlarge the read buffer size in C#/Java test code (#4150)
1. Enlarge the read buffer size further, so that our code can run even faster. TODO: need apply the similar changes to python some other language bindings.
2. Add coreml_VGG16_ImageNet to the test exclusion set of x86_32. It is not a new model but previously we didn't run the test against x86_32.
2020-06-08 16:13:11 -07:00
Yulong Wang
842be1535d
[Node.js binding] add linux and mac package (#4157)
* try mac pipeline

* fix path separator

* copy prebuilds folder

* split esrp yaml for win/mac

* disable mac signing temporarily

* add linux

* fix indent

* add nodetool in linux

* add nodetool in win-ci-2019

* replace linux build by custom docker scripts

* use manylinux as node 12.16 not working on centos6

* try ubuntu

* loosen timeout for test case - multiple runs calls
2020-06-08 14:12:05 -07:00
liqunfu
ffed43e9b8
handle loss and name marching wrappers (#4066)
* handle loss and name marching wrappers

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-06-05 23:34:26 -07:00
Yulong Wang
2aab20b4ea
[Node.js binding] upgrade node-addon-api to 3.0 (#4148) 2020-06-05 21:24:34 -07:00
Yulong Wang
2e58097f8f
fix build: pipeline Node.js version to 12.16.3 (#4145) 2020-06-05 17:56:03 -07:00
Yulong Wang
647a886587
[Nodejs binding] create a new pipeline to generate signed binaries (#4104)
* add yml files

* update pipeline

* fix yaml syntax

* yaml pop BuildCSharp

* udpate yaml

* do not stage codesign summary
2020-06-02 01:28:05 -07:00
Dmitri Smirnov
afca0d15ee
Create Java publishing pipeline (#3944)
Create CPU and GPu Java publishing pipelines. Final jars are tested on all platforms. However, signing and publishing to maven are manual steps.
2020-06-01 18:18:57 -07:00
Changming Sun
3eaec57c38
Fix the daily pipeline failures (#4084)
1. Fix the nuget cpu pipeline and put code coverage pipeline back.
2. Reduce onnx_test_runner's default logging level from WARNING to ERROR. Because there are too many log messages now.
3. Enlarge the protobuf read buffer size for onnx_test_runner. It was missed from PR #4020.
2020-06-01 14:44:49 -07:00
edgchen1
a715d55bcc
Training Python package fixes (#4063)
- Add support for ENABLE_LANGUAGE_INTEROP_OPS in training build which is enabled for nightly builds
- Fix passing of environment variables to `sudo docker run` in build definitions
- Fix setup.py package naming logic
2020-06-01 09:30:56 -07:00
Scott McKay
1d441f89ac
Re-enable PEP8 check in Win CI build (#4075)
* Add flake8 to Win CI build so it's re-enabled. It was in the static analysis build that is currently disabled so checks are not running.
Fix build.py to be compliant again.
Add prefix to flake8 output so it's (hopefully) easier to identify the errors in build output.

* Add to all builds in Windows CPU CI so they all fail quickly if there's an issue.
2020-05-30 09:10:05 +10:00
edgchen1
38d76cc904
Clean up training E2E test (#4078)
Update training E2E build to not go through CTest and call test scripts directly.
2020-05-29 09:20:47 -07:00
liqunfu
6665d5e2bc
Liqun/a transformer example (#3845)
Add transformer glue test example to show how to use ORTTrainer to fine-tune a transformer model

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-05-27 15:21:35 -07:00
Yulong Wang
b3ec8035ee
[Node.js binding] add build flag for node.js binding (#3948) 2020-05-27 13:30:22 -07:00
Wei-Sheng Chin
24eda3df33
Create Utils for Adding Range and Marker (#4013)
In this PR, we
  1. create some APIs for creating NVTX objects
  2. apply those APIs in pipeline-related operators and sequential executor.
As a result, we can explicitly see how a pipeline schedule is run by GPUs in 
Nvidia's visual profiler. Note that these APIs are Linux only due to Nvidia's
limited support.
2020-05-24 22:55:24 -07:00
Changming Sun
aafe988a11 Temporarily disable windows static analysis CI job 2020-05-24 16:31:09 -07:00
Ryan Lai
357bffe47c
Fix deprecated CentOS link for Linux CI pipeline (#4000)
* Fix Linux_CI_GPU_Dev

* centos6
2020-05-20 16:14:48 -07:00
Bowen Bao
0a5395bb78
Remove 'model_.' prefix from onnx model initializers in training (#3881)
* Remove 'model_.' prefix for onnx model initializers in training

* fix test case remove redundant device test

* rename

* Fix state_dict/load_state_dict with frozen_weight

* nit

* Add monkey patch for pt opset 10

* remove pt patch in CI

* nit: newline
2020-05-20 10:06:31 -07:00
Prabhat
08763e80e0
Fix permission denied while creating directory in azure pipelines (#4001)
* Fix permission denied while creating directory

* Run tar with sudo
2020-05-20 09:47:12 -07:00
edgchen1
989fe2498f
Change training perf test build to use "docker" instead of "sudo docker" (#3995)
Change training perf test build to use "docker" instead of "sudo docker". The training perf test build runs in an environment that supports calling "docker" and not "sudo docker".
2020-05-19 16:54:35 -07:00
Ryan Lai
354e571277
Miscounted the number of characters in package version of DirectML nuget (#3993)
Co-authored-by: Ryan Lai <ryalai96@gamil.com>
2020-05-19 15:28:30 -07:00
ytaous
fb4efafc8e
GPT-2 training perf scripts (#3974)
* gpt2 training perf

* gpt2 training perf

* debug

* debug

* debug

* fix bug

* minor

* on comments

* dynamic sql

* fix build

* minor

* linked hash

* on comments

* minor

* mem

* minor

Co-authored-by: Ethan Tao <ettao@microsoft.com>
2020-05-19 10:21:40 -07:00
Changming Sun
2fa2019daf
Run docker commands with sudo (#3979) 2020-05-18 17:35:09 -07:00
edgchen1
024b92a970
Use path relative to script location to refer to symbolic_opset10.py from install_deps.sh. (#3975)
Update install_deps.sh to use relative path from script directory to symbolic_opset10.py. This allows install_deps.sh to be called from different working directories.
2020-05-18 13:36:06 -07:00
Adam Pocock
9d2d1eb6f6
[java] Adds a CUDA test (#3956)
* [java] - adding a cuda enabled test.

* Adding --build_java to the windows gpu ci pipeline.

* Removing a stray line from the unit tests that always enabled CUDA for Java.
2020-05-18 12:05:51 -07:00
edgchen1
e259a13f8e
Initial training Python packaging pipeline (#3767)
Add a pipeline to produce training-enabled ORT wheels.
2020-05-18 09:41:00 -07:00
edgchen1
e55f24364a
Disable LTO on Windows training CPU build (#3960)
Disable LTO on Windows training CPU build. Add a parameter to the win-ci-2019.yml build template for enabling LTO with a default value of true.
2020-05-18 09:24:10 -07:00
Prabhat
4ff73d00b0
Fix python pkg permission issue (#3957)
* Fix python pkg permission issue

* Run chown with sudo

* Add workspace clean to arm pipeline

* Run docker as current user
2020-05-17 14:06:55 +05:30
Ryan Lai
38467f8c9a
DirectML Nuget package has different time stamp than Native and Managed Nuget (#3950)
* Fix DirectML nuget creation in Nuget pipeline

* DirectML Nuget package has different timestamp

* remove accidentally changed file
2020-05-14 18:52:08 -07:00
Scott McKay
5e0928a777
Enable running PEP8 on python scripts using flake8 (#3928)
* Enable running PEP8 checks via flake8 as part of the build if flake8 is installed.
Update scripts in \tools and \onnxruntime\python. Excluding \onnxruntime\python\tools which needs a lot more work to be PEP8 compliant. Also excluding orttraining\tools for the same reason.
Install flake8 as part of the static_analysis build task in the Win-CPU CI so the checks are run in one CI build.
Update coding standards doc.
2020-05-15 07:15:06 +10:00
ytaous
93eb9bcfde
Add yaml/perf scripts for new perf test pipeline (#3909)
* yaml/perf scripts for new pipeline

* yaml/perf scripts for new pipeline

* remove unused imports

* testing some comments change

* testing some comments change

* testing jdbc

* testing jdbc

* testing jdbc

* exclude pwd from jdbc properties

* exclude pwd from jdbc properties

* namedtuple

* on comments

Co-authored-by: Ethan Tao <ettao@microsoft.com>
2020-05-13 14:15:17 -07:00
Prabhat
25257a661d
Added onnxruntime aarch64 wheel to pypi publishing pipeline (#3903)
* Added onnxruntime aarch64 wheel to pypi publishing pipeline

* Support nightly build flag

* Add support for nightly build
2020-05-13 23:20:29 +05:30
liqunfu
9b5daa2039
patch torch onnx opset 10 (#3910)
patch pytorch to export onnx nll_loss opset version 10. add mnist test to covert onnx opset version 10.
2020-05-12 18:11:25 -07:00
Ori Levari
7b858d60b0
Various changes for automated downlevel test pipeline (#3901)
Co-authored-by: Ori Levari <orlevari@microsoft.com>
2020-05-12 17:22:47 -07:00
Prabhat
ce3678ffaf
Added aarch64 build pipeline (#3841)
* Added aarch64 build pipeline

* Fix build error

* Remove auditwheel repair which doesn't work with cross compiling

* Statically link C++

* Added auditwheel repair back and fix stdlib.h

* Remove extra space
2020-05-11 22:56:16 +05:30
Ryan Lai
7fd2c8f9e8
Add signed GPU nuget package to publish ort-nightly nuget feed (#3834)
* Add signed nuget package to publish ort-nightly nuget feed

* Push managed nuget as well

* Indentation fix

* Indentation fix

* Update gpu.yml to also publish directml nuget

* Fix typo in naming of task
2020-05-10 16:24:45 -07:00
M. Zeeshan Siddiqui
5e1244eb4d
Update ONNX submodule to ONNX 1.7 release branch. (#3888)
* Update to ONNX submodule to ONNX 1.7 release branch.

* Update to ONNX submodule to ONNX 1.7 release branch.

* fix version.
2020-05-10 15:44:44 -07:00
Pranav Sharma
22a711457f
Fix C# log APIs. Also fixes github issue #3409. (#3840)
* Fix C# log APIs. Fixes github issue #3409.

* Fix build error due to accidental duplication of GraphOptimizationLevel

* Fix runoptions

* Fix broken test. Add --blame switch to dotnet test cmd line to print the failed test in case of crash.
2020-05-08 14:31:06 -07:00
stevenlix
4ea10c9202
bump up ORT version and extend time limit for windows cpu packaging pipelines (#3852) 2020-05-07 14:22:20 -07:00
M. Zeeshan Siddiqui
9b02b3df6f
Update ONNX submodule to ONNX 1.7 release candidate 3. (#3838) 2020-05-06 00:55:19 -07:00
M. Zeeshan Siddiqui
ef4d73e887
Update ONNX submodule to ONNX 1.7 release candidate 2. (#3818)
* Update ONNX submodule to ONNX 1.7 release candidate 2.

* fix build error.

* Update ONNX submodule to latest and disable preview op tests.
2020-05-05 15:08:40 -07:00
Changming Sun
c11fbf68e4
Publish gpu package to nuget feed (#3816) 2020-05-04 21:49:19 -07:00
Changming Sun
2684d47fc5
Disable data downloading in linux-nocontribops-ci-pipeline (#3803)
* Disable data downloading in linux-nocontribops-ci-pipeline

* update

* update
2020-05-02 12:59:24 -07:00
Sheil Kumar
37b60251ca
test packaging (#3756)
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
Co-authored-by: Changming Sun <chasun@microsoft.com>
2020-05-02 12:23:33 -07:00
Changming Sun
ee8900e21a
Update centos-ci-pipeline.yml (#3800)
* Update centos-ci-pipeline.yml
2020-05-02 11:04:23 -07:00
edgchen1
440f361363
Remove orttraining-linux-gpu-inference-only-ci-pipeline.yml. (#3788) 2020-05-02 00:35:08 -07:00
M. Zeeshan Siddiqui
517bff9675
Function expansion support and Update ONNX to 1.7 release candidate 1. (#3782)
* Function expansion support, Update ONNX to 1.7 release candidate 1.

* Renable disabled tests.
2020-05-01 10:35:16 -07:00
George Wu
dcb1a21552
fix python package linux gpu failure (#3786)
* pin base image for manylinux2010_gpu

* pin base image for Dockerfile.manylinux2010
2020-05-01 17:04:59 +08:00
liqunfu
af3988198c
Liqun/e2e transformer test (#3540)
* initial change to transformer.py

* prepare e2e transformer tests

* refactor transformer tests

* put test python files in a flat folder

* fix typo pip install transform(s)

* python 3.6

* python version to 3.6 in install_ubuntu.sh

* remove argparser

* to use opset ver 12

* workaround loss_scale naming patch in case of loss_fn_

* assign self.loss_fn_ so it can be checked

* skip a few un-needed post-process steps

* fix loss_scale_input_name, clean up post process steps

* skip non-frontend tests

* move cpu/cuda related files to coresponding cpu/cuda folder (#3668)

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>

* type cast for ratio is not necessary for dropout (#3682)

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>

* thrustallocator is not needed since cub is used directly for gather now. (#3683)

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>

* GatherND-12 Implementation (#3645)

* Renamed, UT passing

* Move GatherND CUDA Kerenl into onnxruntime

* Merge GatherNDOpTest

* Refactor Test code

* Merge CPU Kernel Impl

* Handle Negative Indice, Fix UT

* Improve CUDA kernel to handle negative index

* Minor Fixes

* Preserve GatherND-1 Cuda kernel

* Fix Mac build

* fix UT

* Fix Build

* fix GatherNDOpTest.double > CUDA error cudaErrorInvalidDeviceFunction:invalid device function

Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Peng Wang (pengwa) <pengwa@microsoft.com>

* update with reviewers' comments

* testBertTrainingGradientAccumulation was not using rtol and may fail occasionally with small (e-06) difference

* fix merge mistakes

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Weixing Zhang <weixingzhang@users.noreply.github.com>
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
Co-authored-by: Sherlock <baihan.huang@gmail.com>
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Peng Wang (pengwa) <pengwa@microsoft.com>
2020-04-30 12:26:38 -07:00
Scott McKay
9f72752397
Fix 'Install ONNX' CI failure (#3761)
* Disable flaky test temporarily

* turn off pip upgrade warning

Co-authored-by: Aishwarya Bhandare <aibhanda@microsoft.com>
Co-authored-by: Zeeshan Siddiqui <mzs@microsoft.com>
2020-04-30 18:18:58 +10:00
Changming Sun
7ff06056bd
Fix the test coverage pipeline (#3710) 2020-04-28 21:21:19 -07:00
Sheil Kumar
f1a948fd62
Enable telemetry on windows zip packages (#3738)
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2020-04-28 14:07:11 -07:00
Ori Levari
78fde2c4cb
add downlevel test artifact to windowsai-nuget build (#3711) 2020-04-28 10:05:32 -07:00
Changming Sun
805ffc01e5
Temp remove --enable_wcos --use_winml from CI build (#3707)
The flags "--enable_wcos --use_winml" don't work with the latest VC++ and CMake. I don't know which caused the failure. But it doesn't work. Remove it to make the pipelines work first.
Will add them back before 1.3 release.
2020-04-26 16:10:25 -07:00
edgchen1
e22d97ba56
Merge pull request #3643 from microsoft/ort_training_for_merge_to_master
Introduce ORT training implementation
2020-04-25 07:15:22 -07:00
Sheil Kumar
a475f2824d
Create the Nuget WindowsAI Pipeline (#3684)
* add windowsai.yml for new Microsoft.AI.MachineLearning nuget

* temporarily add windowsai.yml to gpu.yml

* pass in build arch

* remove install onnx task

* no dml for arm or arm64

* refactor nuget pipeline defs

* update package creation

* pass in build and sources path

* missing hyphens

* copy license file

* fix parameter variable

* disable arm builds for now

* remove commented script block

* download pipeline atifcat name update

* set working dir

* Add bundling nuget script

* path combine

* null path

* combine needs parentheses

* binplace microsoft.* dlls in new nuget package

* update artifact name

* move merged nuget to artifacts directory

* move to merged subfolder in artifacts staging dir

* forward slash to back

* enable arm

* vcvarsall needs x64 vars setup

* Run Tests

* fix tests

* move global variables

* update yml to not have global variable in template

* removed parameters

* fixes

* Add build arch as an env variable

* ne not neq

* %Var% for batch script

* dont pass argument for x64

* disable arm tests

* skip csharp/cxx tests for microsoft nuget package

* remove test-win as it tests only c# cxx and capi

* test build for store apps

* dont build for store

* tools/nuget/generate_nuspec_for_native_nuget.py

* remove args.

* add new props and targets for microsoft.ai

* make windowsai props/targets static

* add dependency

* dont ship dot net props

* Remove c# fom windowsai nuget

* copy license file

* native packages must have win10 as the platform, not win

* cuda header in wrong if branch

* no dml for arm builds

* only build dml for x64/ x86

* User/sheilk/props update (#3616)

* prelim store work

* props

* Fix desktop nuget props/targets

* clean up targets and make store apps work

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>

* update windowsai.yml with latest

* remove extra dloadhelpers

* Add abi headers to abi dir, and reference native includes

* update windowsai.yml

* minor update

* remove parameters

* add doesrp param

* hard code esrp to true

* add directml for x86/x64

* revert gpu yml changes

* add store builds

* add store builds

* add checks again in old way

* dup job names for store and desktop builds

* move all of the runtime binaries to win10 folder

* only set safeseh on x86

* disable the store builds for now... missing msvcprt.lib

* copy paste deletion...

* switch back to win- (#3646)

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>

* use stahlworks

* & not supported in ado

* add cuda to cpu nuget(???) and EnableDelayedExpansion to enable x86 dml package

* revert nocontribops

* add underscore...

* extra win/win10 change

* merged nuget... still not being bundled...

* files in merged directory

* missing parens causing dml to be included in cpu package

* more diagnostic info

* switch dir to get-childitem

* wait for compression to complete

* add winml_adapter to mkml and gpu packages

* enable_wcos

* add mklml binaries

* props and targets missing from mklml

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2020-04-24 20:20:04 -07:00
Ethan Tao
e9f1e7e797 resolve conflicts 2020-04-24 15:15:36 -07:00
S. Manohar Karlapalem
6d4f2f5bf9
OpenVINO EP v2.0 (#3585)
* Added FP16 transformations

* Revert "Added CMAKE_BUILD_TYPE to make building dynamic"

This reverts commit d3e17af1af655cfdc4d2fec33f52055caa525e85.

* Added FP16 transformations for FP16 builds

* Backend logic cleanup

Cleans the backend(intel_graph.*) code in the following ways:-

1. Minimize global usage: Since all the IR graphs need to be
re-generated on every Infer, it is bad practice to rely on globals
for their saving and usage as there would be multiple readers and
writers to the same global variable leading to incorrect usages or
contentions. This change replaces globals with locals where possible.
 This change also fixes an existing bug with due to
incorrect global usage.

2. Remove all unused functions.

3. Remove all unused headers and prepocessor directives.

* removed commented out code

* Disabled default optimization for Intel EP

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Fix missed plugins.xml for python bindings

* Fixed the build after latest master changes

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Disabled unsupported ops for accelerators

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Added some more disabled ops

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Added environment variable to enable debugging

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Added more debug statements

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Fixed unsupported ops list for GPU and VPU

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Fixed unsqueeze unit tests

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Added error message to the status

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Overwrite Model proto with shape info from data

Overwrites the shape info of Model proto with the shape from
actual input data. Needed for inferring models with Dynamic
shapes.

* Removed print statement and disabled where op

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Disabled Reshape with Empty initializer

* Added more debug statements for 1P

* Don't allow 1D inputs with symbol for dimension

* Disabled some 3rd phase ops

* Disabled split and added zero dimension check for OutputDefs

* Cleanup zero dimensionality check

* Added different data type check for inputs and initializers

* Added conditions for Mod, Cast and Pad

* Removed unused variable

* Disabled scan and added conditions for squeeze

* Added changes for fixing all C++ unit tests

* Implements Backend Manager class for caching

Backend Manager provides a layer of indirection between EP interface
and OV backend that provides caching services for models with
symbolic dims in input shapes.

* clean up commented blocks

* clang-formatting

* Read I/O type info from ModleProto

Read the tensor element type information from ModelProto object,
as FusedNode is no longer available.

* code cleanup

* clang-formatting

* Added print statement for jenkins

* Disabled some python tests

* Changed the path of convert fp32 to fp16 hpp

* Added conditions for BatchNorm in GetCapability

* Fixed failed tests

* Revert "Added conditions for BatchNorm in GetCapability"

This reverts commit c3c28c3b00d27892c42546b35dacdd807a48ee90.

* Added Intel to onnxruntime backends

* pick up vars set by OV package setupvars.sh

* Added conditions for Identity

* remove a few cout prints

* Added conditions for GPU_FP32 unit tests

* Revert "pick up vars set by OV package setupvars.sh"

This reverts commit 8199e029c03eae21a1a7ef6bfdc93d00e5d0198b.

* Commented out fatal message for protobuf

* Might need to be removed

* Add interface class for current backend

* moved common logic to base class

* simplified cpu backend

* Removed unused headers

* use vectors to save i/o tensors for windows compatibility

* move utils fxns to backend_utils namespace

* rename ov_backend to ibackend

* Factory pattern for backend creation

* rename CPU backend to Basic backend

* renamed to vad-M and added to factory list

* Added conditions for VPU

* Added print statements

* Changed the logic for checking for symbolic shapes

* Modified logic for zero dimension check

* Removed VPU single dimension condition

* Removed comments

* Modified logic in DimensionCheck method

* Remove legacy OpenVINO EP

Remove all the legacy code for OpenVINO EP. UEP code will take its
place going forward.

This change does NOT remove OVEP files in the following areas asa
they will be reused by UEP:-
1. Documentation: All .md files
2. Docker releated files
3. Python bindings
4. Java bindings
5. C# bindings
6. ORT Server
7. CI pipeline setup files

* Rename Intel EP to OpenVINO EP

* Added unique names to the subgraphs

* Removed subgraphs with only constant inputs

* Modified subgraph partitioning algorithm to remove const input subgraphs

* Apply suggestion to onnxruntime/core/providers/openvino/openvino_execution_provider.cc

* Tracking output names to fix the output order bug

* Changed output names to a unordered map

* Modified logic to check for symbolic input shapes

* Fixed a bug in Reshape check

* Added empty model path to Model constructor

* Made necessary changes to cmake to build from the binary package

* Changed INTEL_CVSDK_DIR to INTEL_OPENVINO_DIR

* Enable dyn device selection with C++ API

* Added Round operator to unsupported list

* Modified subgraph partition logic for MYRIAD

* Removed supported ops from the list

* Enable dyn dev selection in Py API's

* Add documentation for dynamic device selection

* Use MYRIAD || HDDL instead of VPU

* Removed temporary cast of Int64 to FP32

* Disabled unit Tests for CPU_FP32 and GPU_FP32

* Removed default "CPU" from unit tests to allow overriding

* Removed ops Concat, Squeeze, Unsqueeze from unsupported list

* Get the device id from info

* Removed overwriting device_id and precision

* Enabled ConvTranspose and EyeLike

* Reordered unsupported ops in alphabetical order

* Fixed syntax error

* Fixed syntax error

* Code clean-up: Handle exceptions, logs and formatting

Code formatted according to ORT coding guidelines.

* remove debug print from pybind code

* updated docs with ops and models

* formatting prints

* Added default values for c and j for openvino

* Overriding the values set for c and j to be 1
* BACKEND_OPENVINO should be empty if openvino is not in build

* Overriding c value with default for perftest

* fix VAD-M device string bug

* Add IE error details to exceptions

* Use IE specific device names in EP

* Add VAD-F (FPGA) device support

* Removed unecessary libraries from whl package

* Code changes for Windows compatibility

* Add VAD-F option to python API

* [revert before merge] cmake changes for RC

* Enable Windows build in CMake

* Unset macro OPTIONAL for windows builds

inference_engine.hpp's include chain defines a macro 'OPTIONAL'
which conflicts with onnx project's headers when using MSVC. So
would need to explictly unset it for MSVC.

* Use a single copy of plugin/IE::Core

Defined as a static member in Backend manager

* Remove restriction of single subgraphs for  myriad

* Passed subgraph name to Backend to enhance log statements

* Disabled zero dimension conditions

* Disabled concat to remove zero dims

* Enabled building ngraph as part of ORT

* Removed serializing and added versioning

* Fix CPU_FP32 unit tests

* Removed unecessary condition

* add ngraph.so.0.0 to .whl

* Check for zero dimensions only for inputs and outputs

* Restrict loading only 10 subgraphs on myriad

* Build ngraph.dll within UEP. Doesn't link yet

* Rename Linux included libngraph.so to libovep_ngraph.so

Renames locally built libngraph.so containing ONNX importer to
libovep_ngraph.so in order to avoid linkage conflicts with
libngraph.so supplied by OpenVINO binary installer.
Applies only for Linux builds.

* use output_name cmake properties for lib name

* fix .so name format in lib_name.patch

* CMake code cleanup

* Rename WIN32 included ngraph.dll to ovep_ngraph.dll

To avoid conflict with ngraph.dll distributed by openvino.

* Added myriad config for networks without 4 dimensions

* Loading the 10 max clusters for inference on myriad

* Refactor code and add Batching support

Encapsulate subgraph settings into context structs.

Add batching support for completely supported models.

* Disabled some broken tests

* use input_indexes to avoid batch-checking initializers

* Avoid static initialization order error on WOS

* Added candy to broken tests

* InternalCI changes for 2020.2

* Updated DLDT instructions

* Unsaved changed in install_openvino.sh

* Changes after manual check

* Remove custom ngraph onnx_import build for WOS

ONNX Importer on WOS does not have protobuf issue.

* Remove FP32ToFP16 ngraph pass

This conversion is performed implicitly within IE.

* Surround debug logic by #ifndef NDEBUG

* remove invalid TODO comments

* removed references to ngrpah-ep

* clang-formatting

* remove commented code

* comment edits

* updating copyright year to that of first OpenVINO-EP release

* remove redundant log msg

* Modified operator and topology support

* Update build instructions

* doc formatting

* Fixed clip unit tests

* Revert "Remove FP32ToFP16 ngraph pass"

This reverts commit ec962ca5f315a5658ad980e740196f19de2639c1.

* Applying FP16 transformation only for GPU FP16

* Fixed GPU FP32 python tests

* automatically use full protobuf

* disable onnxrt server for now

* Disabled upsample

* update dockerfile instructions

* Removed MO paths and added ngraph path

* Remove OVEP from ORT Server docs

Will put it back in after validation

* Updated path to Ngraph lib

* Disabled Resize and some other python tests

* Removed unnecesary header files

* Use commit SHA to fetch ngraph repo

* Avoid un-needed file changes due to version update

* Fixed clip tests

* Fixed Pow, max and min onnx tests

* build.md doc typo

* Update cmake patch command for ngraph src

* remove dead cmake code for onnxruntime_USE_OPENVINO_BINARY

* use spaces instead of tab

* remove commented code

* Add info about protobuf version

* edit debug env var and enable for WIN32

* specify only version tag of 2020.2 for dockerbuilds

* remove unnecessary file changes

* Pass empty string as default argument to C# tests

* Use ${OPENVINO_VERSION} to name openvino install directory in CI builds

* Enabled unnecessarily disabled tests

* Fixed ngraph protobuf patch

* Fixed error in protobuf patch

* Revert "Use ${OPENVINO_VERSION} to name openvino install directory in CI builds"

This reverts commit 89e72adb8bf3b9712f5c81c5e13fe68c6c0df002.

* Remove unsetting OPTIONAL macro

This is no longer used in recent ONNX update onnx/onnx@da13be2,
so this unset workaround is no longer necessary.

* Use a null string  default argument for C# API

* Set OpenVINO version yml files and pass to CI Docker builds

Git Tag info for DLDT as well as install directory are set
using this value.

This reverts commit 9fa9c20348ed72ae360a95c98e9b074d2f9fafc5.

* Documentation: recommendation and instructions for disabling ORT graph optimizations

* more doc updates

* Reduced the number of models according to CI time constraints

Co-authored-by: ynimmaga <yamini.nimmagadda@intel.com>
Co-authored-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
Co-authored-by: Mikhail Treskin <mikhail.treskin@intel.com>
Co-authored-by: mbencer <mateusz.bencer@intel.com>
Co-authored-by: Aravind <aravindx.gunda@intel.com>
Co-authored-by: suryasidd <48925384+suryasidd@users.noreply.github.com>
2020-04-24 04:06:02 -07:00
Edward Chen
deac467683 Merge remote-tracking branch 'origin/master' into edgchen1/merge_from_master 2020-04-23 20:50:33 +00:00
David Brownell
3ce31933bb
Wheel file updates for FeaturizerLibrary data (#3640) 2020-04-23 13:27:22 -07:00
edgchen1
49a1c5e546
Change CentOS build to use agent pool because builds on hosted agents run out of disk space. (#3662) 2020-04-23 12:19:19 -07:00
Changming Sun
00917917d6
Downgrade numpy requirement to 1.16.6 (#3635) 2020-04-22 16:11:33 -07:00
Edward Chen
8d09cefafc Merge remote-tracking branch 'origin/ort_training' into edgchen1/merge_from_master 2020-04-22 16:56:15 +00:00
edgchen1
5492d02c4e
Remove Windows CUDA 9 build definition and helper scripts. (#3615) 2020-04-21 15:22:27 -07:00
Edward Chen
47f1758fdc Add --skip_onnx_tests to orttraining Windows builds. 2020-04-21 21:50:35 +00:00
Edward Chen
297ab43b0c Add --enable_onnx_tests to Windows builds to allow set up of test data directory. 2020-04-21 20:34:55 +00:00
Edward Chen
daa14b64e3 Merge remote-tracking branch 'origin/master' into edgchen1/merge_from_master 2020-04-21 03:31:32 +00:00
Changming Sun
911d125323 Remove openmp from gpu build 2020-04-20 17:13:54 -07:00
liqunfu
781e1c36be
Add front-end MNIST test (#3231)
* add frontend minst test

* to use torch nightly with torchvision

* remove incorrect comment per reviewer's comment

* experiment torchvision import failure

* experiment install_deps.sh

* more experiment install_deps.sh

* experiment install_deps.sh with --upgrade

* Experiment with install_deps.sh.

* Experiment with install_ubuntu.sh.

* Use Ubuntu 18.04 and Python 3.6 for CI.

* Update cmake version for CI.

* Install MPI on Ubuntu 18.04 for CI.

* Increase tolerance for MNIST test.

* Go back to Ubuntu 16.04 for CI, fix installing from deadsnakes ppa.

* Clean-up.

* Update ort_trainer.py from ort_training.

* Get default Ubuntu Python ver back to 3.5.

* Add underscore to opset_version parameter name in ORTTrainer constructor.

* Move loss/model wrap before the call for sample output.

* Update expected values for MNIST test.

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Sergii Dymchenko <sedymche@microsoft.com>
2020-04-20 11:19:31 -07:00
Changming Sun
d68245853e
Disable downloading test data on Linux (#3581) 2020-04-18 15:54:58 -07:00
Hector Li
5acd8dbe7d
remove option --enable_lto (#3515) 2020-04-17 14:18:56 -07:00
Sheil Kumar
2717c178cc
Fork the WinML APIs into the Microsoft namespace (#3503)
* Migrate winml to Microsoft Namespace (packaging changes are pending)

* add ns_prefix toggle

* fix packaging

* Users/sheilk/add missing raw header (#3484)

* add dualapipartition

* wrong variable for repo root

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>

* remove existence check to force failures

* extra paren

* dualapipartition needs to be referenced from the source

* add microsoft.ai.machinelearning.dll to the output dir

* rename the idl file so that assembly info is correctly added into the winmd

* fix namespaces

* update namespaces

* default to microsoft, and add namespace override as build argument

* update cmakesetings.json as well

* remove from cmakelists.txt

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
Co-authored-by: Changming Sun <chasun@microsoft.com>
2020-04-17 06:18:54 -07:00
Changming Sun
1a222b3f6e
Disable downloading test data on Windows (#3551)
* Disable downloading test data on Windows
2020-04-16 22:15:20 -07:00
harshitha
80e0c64e2e merged with master 2020-04-16 17:13:36 +00:00
Changming Sun
7c89f38a34
Fix static analysis warnings found by VC++ (#3530)
1. Fix static analysis warnings found by VC++
2. Add a new pipeline for static analysis
3. Merge all the windows CI build into one single yaml file.(Easier to queue them all).
4. Make DNNL build faster by disabling building the tests and examples.
5. Enable custom op unitest.
2020-04-16 01:46:47 -07:00
David Brownell
72cd61baae
Removed use of parameters in python wheel build scripts (#3524) 2020-04-15 10:31:14 -07:00
Changming Sun
a2feb29b0d
Fix build break (#3528)
Ignore some known test failures
Install ONNX package before running Windows CI builds
2020-04-14 18:07:56 -07:00
David Brownell
006c5be1b1
Optionally produce a python wheel that includes featurizers (#3491) 2020-04-14 09:00:13 -07:00
M. Zeeshan Siddiqui
5d99f179b9
Merge pull request #3486 from microsoft/sedymche/merge_master_ort_training
Merge from master into ort_training
2020-04-13 10:55:36 -07:00
Sergii Dymchenko
bf3df41424 Put back SubmoduleCheckoutMode parameter into mac-ci.yml. 2020-04-12 21:49:38 -07:00
George Wu
7f6e407e09
fix python packaging manylinux1 build break. (#3482) 2020-04-11 06:58:22 +08:00
edgchen1
cffdff6702
Publish unit test results from Linux and Mac builds (#3480)
* Added publish test results step to Linux and Mac builds.

* Fix test result file pattern.
2020-04-10 14:51:56 -07:00
liqunfu
e7297e6c9d
create pipeline for ci frontend tests (#3422)
create pipeline for nightly python front-end e2e tests
2020-04-09 15:31:22 -07:00
Sergii Dymchenko
6ba7c99e50 Merge branch 'master' into ort_training 2020-04-09 12:42:04 -07:00
Changming Sun
33006f48c0
Update onnx submodule to 1.7.0 release candidate (#3405)
Update onnx submodule to 1.7.0 release candidate.  This isn't a release tag,  but it will be released soon, in 1-2 weeks.
2020-04-04 16:23:42 -07:00
Changming Sun
a5fea26cb4 Disable model tests for Mac OS X builds 2020-04-02 15:14:32 -07:00
Thiago Crepaldi
759818f2c1 Merge remote-tracking branch 'origin/master' into thiagofc/ort_training_merge_from_master 2020-03-31 10:53:22 -07:00
stevenlix
2332a93db0
Update onnx-tensorrt parser (#3369)
* sync onnx-tensorrt parser and update TensorRT doc

* remove --msvc_toolset 14.16 in tensorrt ci pipeline
2020-03-30 20:31:59 -07:00
Xueyun Zhu
ccc3535e72 resolve conflict 2020-03-20 20:20:35 +00:00
Tiago Koji Castro Shibata
3bdb0b620a
Fix WCOS/Win32 linking bugs (#3126)
* Fix WCOS/Win32 linking bugs

* Remove unused NODEFAULTLIB flags

* Avoid plain target_link_libraries signature

* Avoid plain target_link_libraries signature

* Fix library list escaping

* Use library list instead of string

* Remove duplicate link to windowsapp.lib

* Remove Win32 build workarounds

* Specify CMake policies before initializing language

* Expose Win32 header definitions during build

* Force set API family

* Enable Win32 APIs in featurizer

* Use MT dynamic CRT

* Expose Win32 specific functions

* Disable app container globally

* Disable default wide functions in featurizers

* Add featurizers to test include path

* Workaround https://gitlab.kitware.com/cmake/cmake/issues/19428

* Revert pipeline debugging hacks

* Skip /FI in CUDA sources

* Default to Win32 builds

* Enable WCOS when using WinML

* Use generator expression to apply CMAKE_MSVC_RUNTIME_LIBRARY to C++ only
2020-03-19 08:52:40 -07:00
Changming Sun
0fceb33288
Fix onnxruntime server docker file build failure (#3219)
1. Fix onnxruntime server docker file build failure. Tested with the notebook in ONNX tutorial, it works well.
2. Delete the docker files for the other EPs, because currently they don't work and I don't have enough time to update them.
2020-03-15 14:46:46 -07:00
Tracy Sharpe
fe0b2b2abd
QLinearConv speed up (#3196)
For x86/x64 builds, change the QLinearConv op to use MLAS for the u8u8=s32 GEMM, then requantize the intermediate buffer to u8.
2020-03-13 16:54:55 -07:00
Zeeshan Siddiqui
2cad08bd60 Merged PR 5688: Upgrade ONNX submodule to the latest from github ONNX master.
We want to implement SoftmaxCrossentropy and NegativeLossLikelihoodLoss forward training ops for opset-12 but that requires ONNX submodule to point to the latest commit to have the latest and greatest ONNX spec!

- Reverse integrate changes from *.in.proto files in github ONNX repo.
- Regenerate csharp/test/Microsoft.ML.OnnxRuntime.Tests/OnnxMl.cs
- Disable ONNX tests that don't have op implementation for the latest opset.
2020-03-12 16:51:45 -07:00
edgchen1
fa4dd51e3b
Add back orttraining-linux-gpu-inference-only-ci-pipeline.yml. (#3182) 2020-03-11 18:03:58 -07:00
Edward Chen
80dd62a240 Enable CI for training. 2020-03-11 14:41:32 -07:00
Edward Chen
e542cfd0e0 Introduce training changes. 2020-03-11 14:39:03 -07:00
Hariharan Seshadri
3464801c3e
Explicitly specify NugetPackage parameter while validating nuget in some release pipelines (#3139) 2020-03-10 15:14:09 -07:00
Dmitri Smirnov
f87b6913cd
Add package download step before pushing to feeds (#3162)
Add package download step before publishing.
2020-03-09 14:32:18 -07:00
Changming Sun
6ed5d7c332
Update post_binary_sizes_to_dashboard.py (#3161)
Discussed with Faith, because the data size is very small and changes are gradual, there is no need to delete the old data. We want to keep all the history.
2020-03-09 13:21:58 -07:00
Tiago Koji Castro Shibata
a59243090a
Publish release symbols (#3152)
* Publish release symbols

* Publish symbols if IsReleaseBuild
2020-03-05 22:32:18 -08:00
Dmitri Smirnov
e2894c5ffb
Fix package name overrides (#3150)
Add env var with the package name.
2020-03-05 17:10:55 -08:00
Dmitri Smirnov
2c446a7f2f
Add push to ORT-NIGHTLY. (#3146) 2020-03-05 11:38:22 -08:00
Dmitri Smirnov
ef8768a53f
Override native package name. Preserve managed package name the same. (#3133)
Override native package name. Preserve managed package name the same.
  Specify pckage name for validation purposes.
 Fix up validation package name parameter.
2020-03-04 10:12:55 -08:00
Changming Sun
12605f05d1
Fix CUDA PATH (#3131)
Previously, we put the "bin" folder of all the CUDA verions in the system PATH. And 10.2 is in the front. It's a mess.
So I've removed all of them from the system PATH env. But I need to add one of them back through build scripts.

(The problem only affect the C# test, not the C/C++ tests that forked from build.py).
2020-03-03 14:34:19 -08:00
smk2007
6cdd2b4934
Enable DML Nuget Package for x64 or x86 architectures (#3120)
* add dml gpu pipelines

* add x86 to the gpu dml dev build pipeline

* Enable DML x86 builds

* Fix uint64_t -> size_t warning

* fix warnings

* enable dml on x86 ci builds

* operatorHelper 773 error uint32_t vs uint64_t

* operatorHelper 773 error uint32_t vs uint64_t

* make x86 pipeline use the gpu pool

* more warnings

* fix x86 directml path

* make dml nuget package

* disable tf_pnasnet_large

* disable zfnet512

* make validation use wildcards

* disable x86 dml gpu tests

* add args.

* update gpu.yml

* change nupkg wildcard

* add debug statements

* package x86 dml nupkg

* dont drop managed nuget again from dml pipeline build

* Add DML EULA

* directml license should be renamed to not clobber the existing license

* casing on dml package....

* {} to ()

* fix license name

* disable dml from x86 ci

* typo and cr feedback

* remove featurizers

* ship the dml pdb as well
2020-03-02 20:18:46 -08:00
Dmitri Smirnov
e45326b5df
Create NuGet packaging pipeline for ORT Featurizers (#3125)
Create a new pipeline to publish ORT with Featurizers
  Update pipeline for two separate packages.
  Change package names.
2020-03-02 17:00:56 -08:00
Hariharan Seshadri
86b755774f
Create a separate Nuget hosting just managed assemblies (#3020)
* Initial commit

* More changes

* More changes

* More changes 3

* More changes 4

* More changes 5

* More changes 5

* More changes 6

* More changes 7

* More changes 8

* Remove C# ifdefs

* More changes 10

* More changes 11

* YAML changes for other release pipelines

* Add release notes metadata

* Props and Targets change

* Add CSHarp proj

* More changes 12

* More changes

* Minor fix

* Minor fix

* Fix yaml

* Some missing logic for winml

* Minor update

* Fix casing for winmd file

* Fix casing

* Add targets and props for managed section into native nuget

* revert file

* a
2020-02-27 18:00:17 -08:00
daquexian
37a905f557
Make Java API available on Android (#3030) 2020-02-27 08:23:50 -08:00
Changming Sun
d7500b26bd
Remove Publish Build Symbols from pre-checkin CI build (#3088) 2020-02-25 08:02:36 -08:00
stevenlix
f4a5d17294
Upgrade to CUDA10.2 for TensorRT (#3084)
* Switch to CUDA10.2

* Update win-gpu-tensorrt-ci-pipeline.yml

* Update win-gpu-tensorrt-ci-pipeline.yml

* remove dynamic_shape

* update onnx-tensorrt submodule

* check if input shape is specified for TensorRT subgraph input and enable some TensorRT unit tests

* fix format issue

* add shape inference instruction for TensorRT

* update according to the reviews

* Update win-gpu-tensorrt-ci-pipeline.yml
2020-02-25 05:36:01 -08:00
Hariharan Seshadri
d7f2cdcc7e
Fix target platform of managed OnnxRuntime dll and enable x86 .NET testing (#3056)
* WIP: Re-enable x86 .NET testing in Release pipelines

Enabling x86 testing will make sure that ORT packages doesn’t break x86 projects of customers

* Remove setting some env variables

* Comment out a test failing on x86 builds

* More changes

* Minor fix

* More changes

* More changes

* s

* s

* s

* Revert minor change

* More changes

* More changes

* More changes 2

* explicitly set platform target

* Delete bin and obj folders

* Clean output dirs

* Add back TargetFramwork

* Disable x86 .net framework tests

* Skip x86 tests in MKLML pipeline
2020-02-24 23:02:59 -08:00
Dmitri Smirnov
dae9a31719
Introduce new Featurizers packaging pipeline. (#3068)
Introduce new Featruizers packaging pipeline.
2020-02-24 13:57:38 -08:00
Changming Sun
61ae134469
Fix binary size report (#3080) 2020-02-22 21:01:06 -08:00
Changming Sun
fb871978b5
Adjust build flags for the release pipelines (#3066)
1. Add LTCG back. It was set to default OFF in my previous PR to speed up Windows build. It is only needed in release pipelines.
2. Remove --use_featurizers from all the packaging pipelines
3. Make sure all the packages have openmp
2020-02-21 16:45:42 -08:00
Changming Sun
179603775f
Use CUDA 10.1 for Linux build (#3057)
Use CUDA 10.1 for Linux build
(Windows change is already in)

Please note, cublas 10.2.1.243 is for CUDA SDK 10.1.243, not CUDA 10.2.x. CUDA 10.2.89 need cublas 10.2.2.89. They match on the last part of the digits.

libcublas10-10.1.0.105 won't work!!!

The cuda docker image by viswamy is already using 10.1, no need to change.
2020-02-21 11:55:32 -08:00
Ori Levari
be12fb3143
include winml x86 binaries in the drop-signed-nuget artifact (#3058) 2020-02-21 11:17:23 -08:00
Changming Sun
ef2bba316b
CUDA 10.1 for Windows(#3049) 2020-02-19 23:26:47 -08:00
Dmitri Smirnov
daf8c4bee4
Remove faturizers from CPU MLDNN and NoContribOps builds. (#3039)
The first one is temp. The second one is permanent removal.
2020-02-19 06:23:36 -08:00
James Yuzawa
411b3aa801
Java build system enhancements (#2866) 2020-02-18 15:41:49 -08:00
stevenlix
da653ccdac
Upgrade TensorRT to version 7.0.0.11 (#2973)
* update onnx-tensorrt submodule to trt7 branch

* add fp16 option for TRT7

* switch to master branch of onnx tensorrt

* update submodule

* update to TensorRT7.0.0.11

* update to onnx-tensorrt for TensorRT7.0

* switch to private branch due to issues in master branch

* remove trt_onnxify

* disable warnings c4804 for TensorRT parser

* disable warnings c4702 for TensorRT parser

* add back sanity check of shape tensort input in the parser

* disable some warnings for TensorRT7

* change fp16 threshold for TensorRT

* update onn-tensorrt parser

* fix cycle issue in faster-rcnn and add cycle detection in GetCapability

* Update TensorRT container to v20.01

* Update TensorRT image name

* Update linux-multi-gpu-tensorrt-ci-pipeline.yml

* Update linux-gpu-tensorrt-ci-pipeline.yml

* disable rnn tests for TensorRT

* disable rnn tests for TensorRT

* disabled some unit test for TensorRT

* update onnx-tensorrt submodule

* update build scripts for TensorRT

* formating the code

* Update TensorRT-ExecutionProvider.md

* Update BUILD.md

* Update tensorrt_execution_provider.h

* Update tensorrt_execution_provider.cc

* Update win-gpu-tensorrt-ci-pipeline.yml

* use GetEnvironmentVar function to get env virables and switch to Win-GPU-2019 agent pool for win CI build

* change tensorrt path

* change tensorrt path

* fix win ci build issue

* update code based on the reviews

* fix build issue

* roll back to cuda10.0

* add RemoveCycleTest for TensorRT

* fix windows ci build issues

* fix ci build issues

* fix file permission

* fix out of range issue for max_workspace_size_env
2020-02-12 07:03:58 -08:00
Dmitri Smirnov
273868eaa5
Disable NuGetPackaging on Linux GPU and remove DML from the pipelines (#3006) 2020-02-11 20:08:18 -08:00
Dmitri Smirnov
c1997db85e Exclude faturizers from Linux NuGet packaging. 2020-02-10 22:21:52 -08:00
Dmitri Smirnov
36915b3674 Temporarily remove Featirizers from packaging-pipelines 2020-02-10 22:21:52 -08:00
smk2007
ce713823cc
enable winml in the gpu ci pipeline (#2993) 2020-02-10 22:21:13 -08:00
smk2007
5c5ac34b5c
Disable use_dml in nuget pipeline (#3001) 2020-02-10 22:09:58 -08:00
Tiago Koji Castro Shibata
fb2182f3fc
Release ARM/ARM64 Nuget packages (#2987)
* Enable ARM64 release builds

* Add ARM release

* Skip C# dll signing in ARM

* Copy ARM binaries to Nuget

* Restore nuget packages before ARM packaging

* wip

* Use host protoc at C# build

* Set ProtocDirectory on cross-compiled builds

* wip

* Fix typo
2020-02-10 16:29:27 -08:00
Dmitri Smirnov
c8ea154e55
Package data_frame_tool, include featurizers into Manilinux2010 (#2989)
* Package data_frame_tool, exclude featurizers from Manilinux2010 as their fail to build.
2020-02-07 11:38:42 -08:00
Dmitri Smirnov
0e5582bcc3
Publish nightly NuGet packages to a new Vsts feed (#2976)
* Add a push to the internal ORT-NIGHTLY

* Add a descriptive name

* Specify packageToPush attribute
2020-02-06 10:08:21 -08:00
smk2007
c32cedc6c9
Merge windowsai (winml layering) into master (#2956)
* Initial Commit

* Merged PR 3985217: add onecoreuap_apiset.lib in order to avoid linking against kernel32.lib etc (#2346)

add onecoreuap_apiset.lib in order to avoid linking against kernel32.lib etc and violating our OS layering requirements.

We linked against onecoreuap_apiset.lib in VB so we will continue doing this, but I am still unsure why not to link against onecore instead since that is where we ship. However, since Sheil is the owner of this code we will wait to discuss with him before changing anything.

* Initial changes for layering

* more snipping to get core into ort

* update build instructions to include --build_shared_lib (#2358)

* update build instructions to include --build_shared_lib

* fix line breaks

* Task 23998197: add winml_lib_core into onnnxruntime.dll (#2368)

* Task 23998197: add winml_lib_core into onnnxruntime.dll

* PR feedback
build break on perf_test

* return proper error when the model path isn't found (#2391)

* LearningModelSession is cleaned up to use the adapter, and parts of b… (#2382)

this is a big PR.    we are going to move it up to layer_dev , which is still a L3 so we are still safe to do work there agile.

we are going to move this into the L3 so that ryan can start doing intergration testing.   

we will pause for a full code review and integration test result prior to going into the L2.

>>>> raw comments from previous commits >>> 

* LearningModelSession is cleaned up to use the adapter, and parts of binding are.
* moved everything in the winmladapter
made it all nano-com using, WRL to construct objects in the ORT side.
base interfaces for everythign for winml to call
cleaned up a bunch of winml to use the base interfaces.
* more pieces
* GetData across the abi.
* renamed some namepsace
cleaned up OrtValue
cleaned up Tensor
cleaned up custom ops.
everything *but* learnignmodel should be clean
* make sure it's building.   winml.dll is still a monolith.

* model moved over.
everything builds clean.
step !

* weak ref comment

* Layer dev paulm (#2408)

* model moved over.
everything builds clean.
step !

* weak ref comment

* added a wrapper for RoGetActivationFactory to hook back into winml for creating winml objects.
fixes model load.

* Layer dev paulm (#2414)

* model moved over.
everything builds clean.
step !

* weak ref comment

* added a wrapper for RoGetActivationFactory to hook back into winml for creating winml objects.
fixes model load.

* User/xianz/win ml telemetry (#2410)

* add option to enable winml telemetry

* add option to enable winml telemetry

* clean logs while developping

* clean the log of GUID

* compile onnxruntime_common with winml telemetry

* use option for use_telemetry

* rename option winml_use_telemetry to onnxruntime_use_telemetry

* little change

* fixed some lifetime management.
fixed the debug build.
squeezenet passes using winmlrunner for CPU and GPU

* Layer dev paulm (#2423)

* model moved over.
everything builds clean.
step !

* weak ref comment

* added a wrapper for RoGetActivationFactory to hook back into winml for creating winml objects.
fixes model load.

* fixed some lifetime management.
fixed the debug build.
squeezenet passes using winmlrunner for CPU and GPU

* PR feedback.

* Layer dev paulm (#2424)

* model moved over.
everything builds clean.
step !

* weak ref comment

* added a wrapper for RoGetActivationFactory to hook back into winml for creating winml objects.
fixes model load.

* fixed some lifetime management.
fixed the debug build.
squeezenet passes using winmlrunner for CPU and GPU

* PR feedback.

* couple of fixes and coded getmutabledata()

* Layer dev paulm (#2425)

* model moved over.
everything builds clean.
step !

* weak ref comment

* added a wrapper for RoGetActivationFactory to hook back into winml for creating winml objects.
fixes model load.

* fixed some lifetime management.
fixed the debug build.
squeezenet passes using winmlrunner for CPU and GPU

* PR feedback.

* couple of fixes and coded getmutabledata()

* fixed 2 more heap corruptions

* Layer dev paulm (#2426)

* model moved over.
everything builds clean.
step !

* weak ref comment

* added a wrapper for RoGetActivationFactory to hook back into winml for creating winml objects.
fixes model load.

* fixed some lifetime management.
fixed the debug build.
squeezenet passes using winmlrunner for CPU and GPU

* PR feedback.

* couple of fixes and coded getmutabledata()

* fixed 2 more heap corruptions

* Add opset and IR check when loading model (#2413)

* Add opset and IR check.
* Add test case for future opsets.

https://github.com/microsoft/onnxruntime/issues/2371

* fixed map and sequence when passing stl types across the ABI .
found a leak in nvidia driver, but skipped it.
all winmlapitests pass now

* Moved SessionOptions over to the abi

* WinML CI (#2412)

* Pass flags to build/test WinML in CI

* Add initial CMake config for unit tests in WinML

* Set winml_unittests standard to C++17

* Add WinML API tests and port them to googletest

* Install WinML test collateral

* Add LearningModelSessionAPITests ported to googletest

* Fix WinML test files encoding

* Add GPU tests

* Add parameterized test, skip GPU tests

* Enable precompiled header

* Remove unused code and collateral

* Remove brand images

* Add dllload.cpp

* Remove images not used in API tests

* Add LICENSE.md to image collaterals

* Add models with licenses

* Remove FNS Candy tests

* Add API test models

* Add ModelInSubdirectory

* Install collaterals post-build with copy_if_different, split common lib

* fix warnings

* Link to gtest_main

* Register WinML TraceLogging provider on Onnxruntime.dll (#2455)

* Register WinML TraceLogging provider on Onnxruntime.dll

* Add ifdef to make sure trace logging provider has telemetry option when LAYERING_DONE

* No need for ifdef for TraceLoggingOptionMicrosoftTelemetry

* PR feedback

* Move etw registration into lotus environment constructor and deresgister in lotus environment destructor

* Brianma/cpuwinml (#2466)

* allow building winml cpu without dml.

* Brianma/breaks (#2469)

* fix some more breaks

* learning model doesn't need lotusEnvironment and CPU shouldn't include dmlEP headers

* move dml checks out of winml and into the adapter

* better error handling

* Brianma/fi (#2470)

* learning model doesn't need lotusEnvironment and CPU shouldn't include dmlEP headers

* User/xianz/win ml telemetry (#2410)

* add option to enable winml telemetry

* add option to enable winml telemetry

* clean logs while developping

* clean the log of GUID

* compile onnxruntime_common with winml telemetry

* use option for use_telemetry

* rename option winml_use_telemetry to onnxruntime_use_telemetry

* little change

* Add opset and IR check when loading model (#2413)

* Add opset and IR check.
* Add test case for future opsets.

https://github.com/microsoft/onnxruntime/issues/2371

* WinML CI (#2412)

* Pass flags to build/test WinML in CI

* Add initial CMake config for unit tests in WinML

* Set winml_unittests standard to C++17

* Add WinML API tests and port them to googletest

* Install WinML test collateral

* Add LearningModelSessionAPITests ported to googletest

* Fix WinML test files encoding

* Add GPU tests

* Add parameterized test, skip GPU tests

* Enable precompiled header

* Remove unused code and collateral

* Remove brand images

* Add dllload.cpp

* Remove images not used in API tests

* Add LICENSE.md to image collaterals

* Add models with licenses

* Remove FNS Candy tests

* Add API test models

* Add ModelInSubdirectory

* Install collaterals post-build with copy_if_different, split common lib

* fix warnings

* Link to gtest_main

* fix bad merge

* Checking in a staging checkpoint point so that Ryan can work with me in parrallel

* build break.

* Brianma/testfails (#2473)

* add missing ir version to dictvectorizer-string.onnx

* add missing ir version to relu.onnx

* add missing ir version to zipmap*onnx

* add IR version to manually generated models

* remove an unnecessary ifdef dml

* Brianma/windowsai fi (#2475)

* update dockerfiles/README (#2336)

* Make elementwise op run 4 items per thread (#2335)

Description: Describe your changes.
Make elementwise op run 4 items per thread
unroll for loop to leverage ILP
remove unnessary N==0 check inside elementwise GPU kernel
Motivation and Context
Why is this change required? What problem does it solve?
It can improve the performance of GPU elementwise ops. ~2% performance gain on popular NLP bert model.
If it fixes an open issue, please link to the issue here.

* Add CUDA GatherElements kernel (#2310)

* Updates

* Update test

* Update

* Updates

* nits

* PR feedback

* Update

* Update

* PR feedback

* PR comments

* Update

* Fix build

* Fix build

* Nits

* Fix

* Layer Normalization Fusion  (#2319)

basic layer normalization transform

* Add FastGelu Cuda Op for Gelu and Add bias fusion (#2293)

* Add FastGelu cuda op

* Add AddBiasGelu for experiment

* Revert "Add AddBiasGelu for experiment"

This reverts commit 5c1ee019858c657e6bb75887265cb85675626e5b.

* Add bias

* Add unit tests

* update comment

* update script

* fix build error

* update coding style

* update for CR feedback
Enable half2 optimization only when cuda arch >= 7.0

* move _Tanh to common.cuh

* implement CPU contrib OP Attention (#2333)

* Remove unused initializer from GraphProto as well as name_to_initial_tensor_ in CleanUnusedInitializers. (#2320)

* Remove unused initializer from GraphProto as well as name_to_initial_tensor_ in CleanupUnusedInitializers.

This means initializers that have been replaced during graph optimizations are not left in the GraphProto when we save an optimized model.

* Handle edge case where a model has an unused initializer with matching graph input by also removing the graph input.

* Use non-const iterators in std::find_if calls to make centos build happy.

* Nuget pipeline changes (#2305)

1. refactor the pipeline, remove some duplicated code
2. Move Windows_py_GPU_Wheels job to Win-GPU-CUDA10. We'll deprecated the "Win-GPU" pool
3. Delete cpu-nocontribops-esrp-pipeline.yml and cpu-nocontribops-pipeline.yml
4. In Linux nuget jobs, run "make install" before creating the package. So that extra RPAH info will be removed

* Cuda Reverse Sequence Op, maping types of same size using same template function. (#2281)

* Set ElementType to String type of node metadata, instead of byte[] (#2348)

* Set ElementType to String type of node metadata, instead of byte[]

* Fix spacing

* Introduce PrimitiveType into a Type System along with an integer constant (#2307)

Improve perf by avoiding GetType<T>() calls. Introduce MLTypeCallDispatcher to switch on Input Type. Add Tensor IsType<T>() fast method.

* Fix/test dim value of 0 handling in a couple of places (#2337)

* Update the CUDA Where implementation broadcasting logic to handle a dim with value of 0.
Add unit test
Also add unit test for unary op with dim value of 0

* Exclude ngraph from Where test with 0 dim.

* Openvino EP R3.1 onnxrt server (#2357)

* onnxrt server with OVEP

* onnxrt server with OVEP

* Update Dockerfile.server.openvino

* onnxrt server OVEP fix reviews

* onnxrt server OVEP fix reviews

* Implement cuda nonzero op. (#2056)

Implement cuda nonzero op.

* Direct use python numpy array's memory if already contiguous.  (#2355)

* Direct use python numpy array's memory if already contiguous. This
could greatly improve performance for session with large input,
like big image 1920x1080 fastrcnn, 30~40% speed up could be achieved.

* Add test case enforce contiguous/non-contiguos numpy array as inputs.

* Add helper to create output to minimize binary size. (#2365)

Add ConstEigenTensorMap typedef so we don't unnecessarily const_cast the const input Tensor.

* fix builds enabling onnxruntime_DEBUG_NODE_INPUTS_OUTPUTS (#2369)

* fix builds enabling onnxruntime_DEBUG_NODE_INPUTS_OUTPUTS

* update

* Add Tracelogging for profiling (#1639)

Enabled only if onnxruntime_ENABLE_INSTRUMENT is ON

* test bidaf with nuphar for avx target (#2370)

increase nuphar test coverage a bit

* Fix a bug in TLS refcount that may destabilized CUDA CI (#2374)

* update output size calculation for resize (#2366)

* change how output size is calculated for resize op

* add tests for ver 10 resize

* Extend OneHot CPU kernel to support more types (#2311)

* Extend OneHot CPU kernel to support input int64_t, depth int32_t, output float

* Skip BERT before the test data fix is picked up

* Fix bug with Slice. Need to pass in flattened input dimensions so the initial offset into the input is calculated correctly. (#2372)

* Add opset 11 version of Split to CUDA ops (#2376)

Organize the CUDA ops definitions so all the opset 10 and 11 parts are together (same setup used for CPU ops)

* Layer Norm Fusion Fix (#2379)

* layer norm fusion fix

* Add input shape check in code and unit tests

* Fuse Add + Gelu (#2360)

Implement the transformer to fuse add + gelu
Implement the accurate kernel

* Skip layer norm transform (#2350)

* skip layer normalization transformer

* Another try to stabilize CUDA CI (#2383)

The root cause seems to be failure in CUDA dealloc when tear down. cudaFree return code was ignored before, so should the debug check.

* fix BUILD.md typo (#2375)

build.py: error: argument --config: invalid choice: 'RelWithDebugInfo' (choose from 'Debug', 'MinSizeRel', 'Release', 'RelWithDebInfo')

* Fixed compilation with ngraph (#2388)

* Fix reuse logic in allocation planner. (#2393)

* Fix reuse logic in allocation planner.

* PR comments

* Add helpful comments

* Don't allow reuse across string tensors.

* [NupharEP] Multiple optimizations  (#2380)

Fuse transpose into MatMul
Implement Pow and constant scalar simplification
Vectorize ReduceMean
Improve symbolic shape inference
Minor updates for better debugging in fused function name

* Avoid using the default logger in the graph lib and optimizers (#2361)

1. Use the session logger if it is available.
2. Don't disable warning 4100 globally. We should fix the warnings instead of disabling it.

* Change CUDA implementation of Transpose to support all fixed size tensor types (#2387)

* Change CUDA implementation of Transpose to not use a typed kernel so we can support more types with minimum binary size.
Add support for 8, 16, 32 and 64 bit types.
Add unit tests.
Add method so the implementation can be called directly (will be used by CUDA Scan very soon).

* Disable TensorRT for MLFloat16 and int8 unit tests.

* Address PR comment and add support for calling cublas implementation if type is mlfloat16.

* Add opset 11 versions of the existing CUDA operators that had negative axis support explicitly added. (#2398)

* Add opset 11 versions of the existing CUDA operators that had negative axis support explicitly added.

* [NupharEP] force some low/zero cost ops to be inlined (#2409)

* fix cross compile bug (#2415)

* Minor optimization: if a node has already been placed, there's no need to find a kernel for it. (#2417)

* Add Reshape Fusion (#2395)

* Add reshape fusion

* Add some comments

* update comments

* update comment format

* update according to feedback

* update for recent logger change

* fix build error

* (1) Support both input and output edges in find path in graphutils
(2) Add a test case of only one constant initializer of Concat input.
(3) Refactor ReshapeFusion class to allow add more subgraph fusion in the future.

* fix error

* (1) loose constraint on initializer: non constant is allowed for reshape fusion.
(2) Change versions type to vector.
(3) Add logging.
(4) Return false when multiple output edges matched in FindPath. Add comments.

* only allow one direction (input or output) in FindPath

* [NupharEP] Update notebook and docker image (#2416)

Add BERT squad in Nuphar tutorial
Enhance speed comparsion readability

* Fix the issue in matmul_add_fusion (#2407)

Fix the issue in matmul_add_fusion

If Muatmul + Add has shape [K] * [K, N], reset it to [1, K] * [K, N] will make the output shape to [1, N] will also requires a reshape on the output.
Fix: just remove the shape reset to not fuse it.

Add a negative test case for matmul+add fusion

* feat(treeregressor): Update TreeEnsembleRegressor for type support (#2389)

Updates the `TreeEnsembleRegressor` to allow for `double`, `float`,
`int64`, and `int32` inputs to match the upstream specification.

Signed-off-by: Nick Groszewski <nicholas.groszewski@capitalone.com>

* onnxrt server documentation update (#2396)

* Added support for Pad-2 operator in OpenVINO-EP (#2405)

* Add CUDA If operator. (#2377)

* Add CUDA If operator.
Uses CPU operator for implementation.
By adding a CUDA version the inputs/outputs (with the exception of the 'cond' input) stay on GPU, and no other logic is required to avoid a copy to CPU across the control flow node.

* Improved documentation for onnxruntime::utils::SwapByteOrderCopy(), added precondition check.

* Fix the type constraints on CUDA If operator to exclude strings. (#2431)

* add Im2col<uint8_t> (#2438)

* Adjust codegen vectorization width from target (#2439)

* Adjust codegen vectorization width from target

* Add CUDA Scan operator. (#2403)

* Add Scan CUDA op.
Uses CPU implementation for logic.
Added some device specific functors for handling when data needs to be manipulated on a different device.
Added ability to override the materialization logic in the OrtValue slicer so DML can plugin their handling.

* Fix Windows GPU C API packaging pipeline failure (#2440)

Fix Windows GPU C API packaging pipeline failure (#2440)

* Correctly handle implicit inputs for fused nodes (#2390)

* Correctly handle implicit inputs for fused nodes

Previously, nuphar's partitioning function didn't include
node's implicit inputs into the inputs list of MetaDef, and hence
a crash was triggered in the onnx graph checker.

This commit fixed the issue. Furthermore, it also fixed a related
issue where we didn't add implicit inputs into
graph_inputs_excluding_initializers_ in Graph::SetGraphInputsOutputs.

the issue was that graph_inputs_including_initializers_ populated by
SetInputs (e.g. called by FunctionImpl::FunctionImpl) may contain
implicit inputs which were not of any node's initializers in the graph.
Because they were not part of any initializers, these implicit inputs
couldn't be visited by going through all nodes' inputs.
Consequently, they would *not* be added into graph_inputs_excluding_initializers_.

We fixed the issue by first copying the populated graph_inputs_including_initializers_
into graph_inputs_excluding_initalizers_, which then had both initializers and
non-initializers as its initial content. Later, we erase initializers from the
list. In this way, we can ensure all implicit inputs to remain in
graph_inputs_excluding_initializers_.

* refined comments and fixed duplicates

Address CR by revisiting comments in terms of implicit inputs

Also fixed an issue by skipping duplicates while copying inputs
from graph_inputs_including_initializers_.

* address CR

explain why we need to collect nodes' implicit inputs

* don't rely on pointer values for iterating std::set

Previously, openvino relied on iterating a set of NodeArg pointers
to construct inputs and outputs for a fused graph. It could cause
non-determinism. The reason was that although iterating std::set by
itself is stable, pointer values of NodeArgs may vary. Consequently,
we could end up visiting the set's elements in different orders for
different runs for the same test, which resulted in constructing
inputs (and outputs) with different orders to the fused graph.
For example, for the same test, we may have inputs [A, B] in some
runs but inputs[B, A] in others.

Let's use std::string as the key type to avoid such nondeterminism.

This commit also added implicit inputs into meta->inputs while returning
the capability from the openvino provider.

* Fixed another latent issue in openvino's GetCapability function

The issue was that we couldn't simply erase fused_inputs and fused_outputs
while iterating the nodes. For example, an output NodeArg may have multiple
uses, and it's wrong if we erase it from fused_outputs when we encounter only
one of its uses as input.

* Remove DeviceAllocatorRegistry class (#2451)

Remove DeviceAllocatorRegistry class

* CSharp api and test for loading custom op shared library (#2420)

- Added C-API test for loading custom op shared lib.
- Made some changes in C++ api header and C-api implementation to get it working.
- Added C# API and corresponding test for loading custom op shared library.

* Parallel Gelu with ParallelFor (#2399)

Parallel Gelu to get better performance for Gelu

* Clean up build.py (#2446)

* Pull the latest image before running docker build

* Fuse SkipLayerNorm with Bias (#2453)

Fuse SkipLayerNorm with Bias

* Allow more than one invocation of CreateEnv in the same process. (#2467)

* Allow more than one invocation of CreateEnv in the same process.

* Fix centos build

* Symbolic shape inference improvements: (#2460)

* Symbolic shape inference improvements:
- add a mode to guess unknown ops' output rank
- add support for GatherND
- add support for If
- fix a bug in get_int_values when then tensor rank > 1D, by treating it as no sympy data
- add symbol to literal merge when ONNX silently merges dims
- fix a bug in Concat when input dim is 0
- fix a bug in ConstantOfShape that computed dim is not updated
- add support for dynamic shape in ConstantOfShape
- fix a bug in Loop output shape that loop iterator dim is not inserted at dim 0
- add support for dynamic padding in Pad
- add support for dynamic shape in Reshape
- add support for Resize with opset > 10, by treating output dims as dynamic
- fix a bug in Slice when starts/ends are dynamic
- restrict input model to opset 7 and above
- make output model optional to avoid disk write when testing

Run model tests for symbolic shape inference

Reduce 2GB docker image size of nuphar

* add additional test data set for nuget pipeline (#2448)

* add SAS token to download internal test data for nuget pipeline

* update azure endpoint

* fix keyvault download step

* fix variable declaration for secret group

* fix indentation

* fix yaml syntax for variables

* fix setting secrets for script

* fix env synctax

* Fix macos pipeline

* attempt to add secrets to windows download data

* fix mac and win data download

* fix windows data download

* update test data set url and location

* Revert "Brianma/windowsai fi (#2475)"

This reverts commit 5780b864a1.

* Add scenario tests (#2457)

* Add scenario tests

* Remove TODO from model license

* Add winml_api test dependency

* fix model load test. fi from master changed the constructor (#2483)

* make api tests all pass (#2486)

* fix bad merge

* fix bad model merge

* Layer dev paulm (#2492)

* commetns for dml graph transformer
fixed ort value passing using the allocatir info

* fixed and coded maps and sequences across the abi

* Rename ambiguous header (#2489)

* fix one more missing IR version model (#2500)

* add missing IR version to 4 more models used by scenario tests (#2501)

* Add CLI parameters to test runner, build WinML in ARM and x86 CI (#2479)

* Support test parameters through CLI arguments

* Add WinML do Windows x86/ARM CI builds

* Code style fixes

* Update googletest

Remove GPUTEST macros everywhere now that GTEST_SKIP is supported

* Refactor main.cpp

* Build scenario tests without DML

* Link scenario tests to DML when it's enabled (#2502)

* Layer dev release pipeline (#2488)

Adds winml binaries to existing cpu nuget package, and creates new gpu dml nuget package with winml binaries and DML EP.

* Layer dev paulm (#2506)

* commetns for dml graph transformer
fixed ort value passing using the allocatir info

* fixed and coded maps and sequences across the abi

* cleaned up w4's
cleaned up the model info ABI
delayload directml.dll from winml

* Remove usage of IOBinding in WinML and use C_API Run method (#2504)

* remove usage of iobinding

* Change data structure to use vector of Ort::Values

* Polish bind input / output

* Use C APIrun method

* Update providers on evaluate getresults

* Remove run and IObinding interface from WinMLAdapter

* Remove use of IObinding

* bind unbound outputs code moved to learningmodelbinding

* clean up unneeded istensor adapter function

* Fix comment

* Check if session is closed before binding and clearing

* PR feedback

* Layer dev paulm (#2507)

* commetns for dml graph transformer
fixed ort value passing using the allocatir info

* fixed and coded maps and sequences across the abi

* cleaned up w4's
cleaned up the model info ABI
delayload directml.dll from winml

* cleaned up namepsace aliases.
renamed _winmla to winmla
this was good PR feedback from tiago a while back.

* Make tests dependend on winml_dll (#2509)

* add dml binaries to DirectML package and be more explicit about condition variables (#2520)

* re-enable warnings for winml builds and fix the warnings that were hiding (#2526)

* turn devmode back on for winml builds

* fix some warnings. include protobuf in a way that disables some warnings

* undo protobufhelpers changes and just ignore 4100 errors in pb code

* attempt to isolate protobufhelpers errors

* add template specialization for getting tensor proto data

* Layer dev paulm (#2533)

* commetns for dml graph transformer
fixed ort value passing using the allocatir info

* fixed and coded maps and sequences across the abi

* cleaned up w4's
cleaned up the model info ABI
delayload directml.dll from winml

* cleaned up namepsace aliases.
renamed _winmla to winmla
this was good PR feedback from tiago a while back.

* moved files from inc to lib\api.core
cleaned up some of the cmake

* staged changes

* Spawn child process to run DeviceLostRecovery scenario test (#2530)

* Spawn child process to run DeviceLostRecovery scenario test

* Layer dev paulm (#2536)

ori said yes

* add missing namespace to winml_trace_logging_provider in lotusenvironment.h (#2542)

* Handle exception thrown from all apis in WinMLAdapter (#2539)

* various changes to unblock windowsai ADO build

* Fix custom ops scenario tests (#2562)

* Do not shutdown protobuf after ort environment gets destroyed. Lazy load lotus environment first time it is needed

* comment typo

* pr comment  about calling phoenix singleton

* Make lotus_environment static in winmladapter

* Layer dev paulm (#2567)

* commetns for dml graph transformer
fixed ort value passing using the allocatir info

* fixed and coded maps and sequences across the abi

* cleaned up w4's
cleaned up the model info ABI
delayload directml.dll from winml

* cleaned up namepsace aliases.
renamed _winmla to winmla
this was good PR feedback from tiago a while back.

* moved files from inc to lib\api.core
cleaned up some of the cmake

* staged changes

* making windowsAI azure dev ops work.

* code review comments.

* revert changes

* Cmake and preprocessor fixes that where uncovered by building on agents without DML available via SDK

* Layer dev dml delayload (#2580)

* Brianma/cpu (#2583)

* don't include dml stuff in cpu builds

* tests that link the image lib also need the telemetry lib now

* Throw Winml_err_invalid_binding if binding gpu resource on cpu device (#2589)

* Throw Winml_err_invalid_binding if binding gpu resource on cpu device

* PR comments. No need to query executionprovider for is gpu device

* User/xianz/ortthrow (#2596)

* thrown and handle onnxruntime exceptions

* handle exception thrown from ort in winmladapter

* undo changes in error.h

* add message to HRESULT

* User/xianz/ortthrow (#2599)

* thrown and handle onnxruntime exceptions

* handle exception thrown from ort in winmladapter

* undo changes in error.h

* add message to HRESULT

* add status error message

* Remove uwp onsuspending winrt call because logruntimeperf is getting removed (#2630)

* User/xianz/dedup telemetry (#2631)

* investigate duplication of telemetry in winml and ort

* remove winml telemetry events

* telemetry executionProviderEvent

* remove unneccessary file and refactor code little bit

* Revert back TelemetryEvent, which send up ETW event.

* merge changes from layer_dev to windowsai (#2638)

* Remove underscore from googletest names (#2616)

* Fix leaking memory allocator

Fix https://microsoft.visualstudio.com/OS/_workitems/edit/24278761
and https://microsoft.visualstudio.com/OS/_workitems/edit/24330198

* Explicitly initialize Ort::Value with nullptr

* Cache WinML adapter

* bad merge

* define private version of dxcore enum that is added in 19H1 SDK. (#2654)

* add comment for explaning private definition of dxcore d3d feature level ennum value. (#2672)

* do not package directml.pdb for redist packages. (#2676)

* Fix leaking operator registry (#2645)

Fix https://microsoft.visualstudio.com/OS/_workitems/edit/24354916

* User/orilevari/windowsai master merge (#2674)

merge resolutions included pulling in telemetry logic that was merged to master and not windowsai and dereferencing InferenceSession::sessionstate now that it is a unique pointer

* Delete Ort Allocator in LearningModelBinding (#2653)

* Delete OrtAllocator in LearningModelBinding

* PR comments to make Ort::Allocator a smart pointer

* Small comment change

* PR feedback to clean up code

* PR feedback on move semantics

* Clean up std::move

* Fix memory leaks (#2679)

Fix https://microsoft.visualstudio.com/OS/_workitems/edit/24356109,
https://microsoft.visualstudio.com/OS/_workitems/edit/24388361 and
https://microsoft.visualstudio.com/OS/_workitems/edit/24388596

* various changes to properly organize and skip GPU tests. For now for No DML builds we will not run GPU tests at all. In the future we should adapt the tests to expect the appropiate errors. (#2695)

* Windowsai without fi (#2701)

* Disable Attention fusion tests when DISABLE_CONTRIB_OPS is defined (#2529)

* Setup java ci (#2528)

* Add provision in ORT for session options to be parsed when available via model file  (#2449)

* Initial commit

* Fix gitmodules

* Nits

* Nits

* Updates

* Update

* More changes

* Updates

* Update

* Some updates

* More changes

* Update

* Update

* Merge

* Update

* Updates

* More changes

* Update

* Fix nits

* Updates

* Fix warning

* Fix build

* Add comment

* PR feedback

* PR feedback

* Updates

* Updates

* Update

* More changes

* Fix build break

* Comment test for now

* Updates

* Updates

* PR feedback

* Updates

* Nits

* Add tests

* Fix build

* Fix build

* Fix build

* Fix build break

* Fix build

* Nits

* PR feedback

* More change

* Expose GetSessionOptions in pybind logic and add unit test for python

* Fix build

* PR feedback

* PR feedback

* Revert "Disable thread pool creation when enabled OpenMP (#2485)" (#2535)

This reverts commit 7c7d5a149c.

* Add dynamic shape support in TensorRT execution provider (#2450)

* remove onnx-tensorrt submodule

* add new onnx-tensorrt submodule (experiment) for trt6

* update engine build for trt6

* update compile and compute for tensorrt6.0

* Update tensorrt_execution_provider.cc

* Update tensorrt_execution_provider.cc

* Update tensorrt_execution_provider.cc

* Update tensorrt_execution_provider.cc

* switch to onnx-tensorrt master for TensorRT6'

* Update tensorrt_execution_provider.cc

* Handle dynamic batch size and add memcpy in TensorRT EP

* update test cases

* Update tensorrt_execution_provider.cc

* update onnx-tensorrt submodule

* Update Dockerfile.ubuntu_tensorrt

* Update Dockerfile.ubuntu_tensorrt

* Update run_dockerbuild.sh

* Update run_dockerbuild.sh

* Update install_ubuntu.sh

* Update concat_op_test.cc

* Update tensorrt_execution_provider.cc

* Upgrade TensorRT to version 6.0.1.5

* Update onnxruntime_providers.cmake

* Update CMakeLists.txt

* Update reduction_ops_test.cc

* Update install_ubuntu.sh

* Update Dockerfile.ubuntu_tensorrt

* Update Dockerfile.tensorrt

* Update BUILD.md

* Update run_dockerbuild.sh

* Update install_ubuntu.sh

* Update onnxruntime_providers.cmake

* Update install_ubuntu.sh

* Update install_ubuntu.sh

* Update gemm_test.cc

* Update gather_op_test.cc

* Update CMakeLists.txt

* Removed submodule

* update onnx-tensorrt submodule

* update header file

* Removed submodule

* add submodule onnx-tensorrt kevin's branch shape-test'

* add debugging code

* Update tensorrt_execution_provider.cc

* Update tensorrt_execution_provider.cc

* merge master

* Removed submodule

* update onnx-tensorrt submodule

* add more changes for dynamic shapes

* Update tensorrt_execution_provider.cc

* update for dynamic shape

* update dynamic shape processing

* fix logger issue

* remove submodule onnx-tensorrt

* add submodule onnx-tensorrt

* add env variable min_subgraph_size

* remove redundency

* update document

* use onnxruntime::make_unique

* fix multi-run issue

* remove some tests to save CI build time

* Add dynamic shape test

* Update TensorRT-ExecutionProvider.md

* Add example of running Faster R-CNN model on TensorRT EP

* Add more details on env variables

* update environment variables

* Update tensorrt_basic_test.cc

* Update model tests

* Update tensor_op_test.cc

* remove --use_full_protobuf

* Update build.py

* User/xianz/telemetry (#2458)

* enabme telemetry

* enable telemetry

* set enable telemetry as default

* for debugging

* remove log and set disable telemetry as default back

* delete private file while testing

* resolve comment: mainly add license header, rename macro and update docs

* rewording in privacy.md

* Fix integer overflow in cuda NonMaxSuppression implementation (#2540)

* add test case that should pass but fail

* fix nms

* extract int_max_output_boxes_per_class

* Introduce container type runtime checks and other improvements (#2522)

Rework TensorSeq in a manner consistent with Tensor and SparseTensor
  in terms of type system setup.
  Reduce templating. Introduce helpers to ensure the same
  data type.
  Make OrtValue __dtor not virtual.
  Introduce ContainerChecker

* Fix C API tests for centos and mac (#2544)

* change c++14 to c++11

* add ld lib path for centos

* enable csharp tests on macos

* fix C API test on MacOS + fix manylinux dotnet install

* fix manylinux dotnet install

* fix lib link

* Add back executable bit to build.py

* Fix a bug handling negative begin pad values in Pad op (#2550)

* Fix bug in Pad op

* Update

* DNNL CMAKE update (#2548)

* Fix android build (#2558)

* Update win-x86-ci.yml (#2557)

Fix build pipeline break

* Re-enable Windows C# tests (#2564)

* disable onnx_test_runner -x invocations for dnnl (#2568)

* Allow sequence length to be symbolic (#2559)

* setup java ci mac (#2570)

* make layernorm fusion to support opset 11 (#2545)

* Fix a warning found in the latest VS release

* Add more check on SkipLayerNorm and BiasGelu fusion (#2574)

* Fix file not found error during docker build. (#2569)

* Add ConvTranspose1D (#2578)

* Ryanunderhill/packagename test (#2582)

* [Nuphar EP] fixes for some object detection models (#2581)

Update notebook tutorial with multi-threaded int8 GEMM from #2517

* EmbedLayerNormalization Fusion Improvement (#2553)

Embedding layer norm fusion improvements - add more checks

* Update version (#2584)

* Temporarily exclude vgg19 test from Python backend test

1. temporarily exclude vgg19 test which comsumes too much memory, run out of memory on Upsquared device. Single test pass for vgg19, need furture investigation (#2588)
2. Update docker file to decrease the docker image size

* Update docs for Android NNAPI EP (#2586)

* Fix lto bug for protobuf and ubuntu

* add path to build dir before test run (#2590)

* Add missig env variables for mac pipeline test (#2595)

* Fixed an issue in updating realized dims (#2597)

when we update realized dims for scan's output, the sliced axis also
needs to be inclusive, i.e. we should check with "dim >= insert_inclusive_axis",
because the offset in the symbols are based on Scan sugraph.
Otherwise, we would end up with shape mismatch later.

* Java API for onnxruntime (#2215)

* Add support for opset 11 in reshape fusion (#2592)

 Support opset verion 11 in reshape fusion

* Rename automl python tools folder to featurizer_ops. (#2593)

* Support opset 11 subgraph of Squad model in Embed Layer Normalization (#2605)

Support opset 11 Squad model that is exported from PyTorch nightly. The embed layer uses Range op which is missed in the transformer.

* symbolic shape inference: fix warnings in GPT-2 model (#2608)

And revise nuphar perf test on BERT squad

* Dump subgraph ID and fused graph ID (#2607)

* Dump subgraph ID and fused graph ID

Dump subgraph ID and fused graph ID for better debugging

* Remove local static fused_count

added a field global_fused_count_ to NupharExecutionProvider class

* EmbedLayerNormalization Fusion For Dynamic Squad Model Opset 10 (#2613)

Support subgraph of SQuAD model exported from pytorch with dynamic input axes

* Allow providers to be set for InferenceSession at construction (#2606)

* Remove unnecessary parameter in some places in GatherElements implementation (#2612)

* Remove unnecessary parameter in some places

* Update

* Update

* Make sure fenced tensor could not reuse other tensor. (#2561)

Fix random error caused by this.

* Improve Embed Layer Norm Fusion for SQuAD with static input shape  (#2621)

* fix float16 comparison in initializer (#2629)

* epsilon attribute for layernormalization fusion (#2639)

* removed unnecessary batch file and fix path (#2640)

* Add shape inference to ConvTransposeWithDynamicPads schema (#2632)

* Improve cuda expand() opeator's performance. (#2624)

* Cuda pad optimize when no padding is needed. (#2625)

* Shortcut cuda Pad() when no padding is needed.

* Optimize cuda scatter() on 2D compatible. (#2628)

* Optimize cuda scatter() on 2D compatible.

* Add some comments.

* fix build error for ARM (#2648)

* Improve performance of resize() in Nearest mode (#2626)

Special treatment for 2D, check same size as input image.
And in 2d kernel, template use_expolation.

* Fix memory exception in Layer Norm Fusion (#2644)

* Windows CI changes(#2650)

* Revert "User/orilevari/windowsai master merge (#2674)"

This reverts commit fe26146311.

* Revert "Windowsai without fi (#2701)"

This reverts commit 285d4c85ff.

* Revert "User/orilevari/windowsai master merge (#2674)"

This reverts commit fe26146311.

* Deref unique pointer for session_state

* send shutdown event when dll is unloaded and EvaluationStop, SessionC… (#2704)

* send shutdown event when dll is unloaded and EvaluationStop, SessionCreationStart Events.

* Add EvalutationStart Event

* add comment

* use correct type for for loop (#2755)

* ARM CI (#2759)

* Set ARM agent pool

* Set CMake generator to VS 2019 in ARM

* Use system-wide CMake instead of custom version

Our custom version is too old for VS 2019

* Use DML and build shared lib in ARM CI

* Restore nuget packages in ARM CI

* Disable DML

* Refactor ARM debug/release builds

* Use system packaged Python version

* Remove hardcoded Python path

* Downgrade Python to 3.7 for build

* Remove explicit CMake path

* Fix invalid JSON in cgmanifest.json (#2760)

* Fix cgmanifest.json generating script (#2770)

* Fix protobuf submodule name

* Workaround pygit2 bug

* Remove usage of WHOLEARCHIVE in WinML CMake and add WinMLAdapterFactory (#2726)

* Remove usage of WHOLEARCHIVE in WinMLAdapter CMake and add WinMLAdapterFactory

* PR feedback, no need for dll(export) since using def file

* PR comments

* Small comment in gen_def.py

* User/orilevari/32bit comparison warning (#2800)

* use correct type for for loop

* explicitly specify void for parameters of OrtGetApiBase because the function is defined in c, so when the function is just (), it is interpreted as having an unknown number of parameters. This was causing compiler warning C4276.

* Move winml_provider_factory.h to proper location (#2801)

* Scneario Test : Build Google Test and Taef Test based on preprocessor definition (#2809)

* Add winml macro wrappers on top of google test macros

* change test methods to disabled

* Add custom winml macros for both taef and google tests

* PR comments

* Filter CPU case for IsFloat16Supported (#2802)

* Merge fixes

* CMake cross-generator fixes (#2790)

* Fix compilation w/ non-VS CMake generators

* Fix custom WINMD target in Ninja

* Remove usage of msbuild .targets file

* Fix linking using DML in Ninja

* Automate SDK kit version choice

* Cleanup DML package install

* Fix SDK version detection

* Fix comment

* Revert unittest linkage changes

* Fix latest SDK detection

* Don't link to non-uapcore libraries

* Remove MessageBoxA reference and unused link libs

* Refactor WinMLAPI Tests to build both google and taef test based on preprocessor definition (#2829)

* Add winml macro wrappers on top of google test macros

* change test methods to disabled

* Add custom winml macros for both taef and google tests

* PR comments

* Refactor winml api tests

* Move additional gtest specific macro definition into googleTestMacros.h

* Fix test build break since winml_lib_api needs to be statically linked to tests since winmlp::learningmodeldevice::iscpu() is being used in devicehelpers.cpp (#2837)

* Enforce WINML_TEST_CLASS_BEGIN_* matches w/ a WINML_TEST_CLASS_END (#2841)

* Fix warnings that cause build to fail

* Fix test warnings and delayload linking (#2843)

* Ortmemoryinfo struct changed

* mark the camera scenario test as edgecore because it uses d3d11 (#2852)

* User/orilevari/pipeline fi breaks (#2853)

* remove conflicting artifact names. Decided to stop using drop-nuget-cuda since this may have implications on other dependent pipelines.

* change job name in gpu.yml back to Windows_CI_GPU_CUDA_Dev

* Remove internal libs from tests (#2864)

* Support custom DML in onnxruntime_providers.cmake (#2867)

* Make DML include path global (#2882)

* Make DML include path global

* Add generated cppwinrt headers to winml_lib_common

* Integrate changes to WindowsAI to make ADO Build (#2886)

* Revert "CMake cross-generator fixes (#2790)"

This reverts commit dbe7d97fa1.

*  add additional suppress warning in onnx_proto

* ignore /wd4996 warning

* DML execution provider fixes

* Revert "Revert "CMake cross-generator fixes (#2790)""

This reverts commit 1ae7b4bcbc.

* Update func signature of custom op function overloads

* common devicehelpers fixes

* Add pch.h for winml_lib_common

* re-add winml_lib_common_dir/inc to include path for winml_adapter

* User/orilevari/dml redist shared folder (#2890)

* move dml nuget package directory up one level to make it shared between build flavors

* Merge conflict fix

* Revert "Merge conflict fix"

This reverts commit 142fa72cf9ce4344ad717b50b7ea2b8582aadc7c.

* Revert "Merge remote-tracking branch 'origin/master' into windowsai"

This reverts commit 6e2126d46e5e5f564d65da37dd4f70c93dd81165, reversing
changes made to b3f5583dc9249834b947c8ea905f6a98060d5bd6.

* Make winml_test_common free of test macros (#2902)

* Add option to build winml_test_common without googletest specifics

* remove test macros from squeezenet

* comment change

* Make cmake functions to get scenario and api source

* PRcomments about hresult

* Build errors fixed

* Fix cmake variable

* Make winml_google_test_lib to build main.cpp once

* PRcomments

* Don't generate files outside the build root (#2914)

* Don't generate files outside the build root

* Add onnxruntime_EXTERNAL_DEPENDENCIES to WinML

* Add DML depedency on RESTORE_PACKAGES

* User/orilevari/fix yaml merge bugs (#2918)

* Add winml test source parameter into cmake function (#2919)

* Add option to build winml_test_common without googletest specifics

* remove test macros from squeezenet

* comment change

* Make cmake functions to get scenario and api source

* PRcomments about hresult

* Build errors fixed

* Fix cmake variable

* Make winml_google_test_lib to build main.cpp once

* PRcomments

* Add arguments to unittest cmake functions

* remove comment

* Revert "Revert "Merge remote-tracking branch 'origin/master' into windowsai""

This reverts commit ade5abe72a4234fdbc3623093c61c02c6b0bdc26.

* Fix breaks from merge with ORT master

* Brianma/linux (#2917)

* don't include windows.h in cross-plat header

* add default case for switch statement

* signed/unsigned mismatch fix

Co-authored-by: Brian Martin <42186431+martinb35@users.noreply.github.com>

* User/sheilk/winml adapter c api (#2891)

* Create winml adapter c api

* fix build

* make it build

* move adapter into onnxruntime core/session

* entry point not exported

* minor changes

* make model metadata work

* make tests pass

* implement all the model reflection apis on the adapter c abi

* update the new ort interface to create a lotus ennvironment with a logging sink

* start adding ort env

* move all winml code into adapter folder/lib to isolate it

* ensure a single logging manager at a time

* start refactoring session

* refactor session creation interface

* add cpu and dml session option methods to adapter

* finish session init

* stub out interfaces in ort lib to perform similar mechanics of iinference session

* enable profiling, and enable schema override

* update session register graph transformers

* turn back on custom registry for custom ops

* Add sync api

* add last c api stubs

* should build... but all feature values are broken since this is in flight to moving all implementation details into ivalue

* remove ep adapter header

* Implement DML execution provider functions from adapter (#2846)

* Implement DML execution provider functions from adapter

* Use functions in OnnxruntimeEngine.cpp

* make map/sequence type_infos freeable, and start implementing ivalue

* make it build again

* implement value methods

* implement remaining methods

* remove com adapter abi

* check dml session

* cache the allocator on ivalue

* check if resource is cpu/gpu when access its mutable data

* update tensor

* mismatched parentheses

* fix tensor base and binding obj

* it evaluates tensors! sometimes...

* minor fixes

* enable gpu evals

* wrapper all existing winml adapter apis with API_IMPL to try catch (#2854)

* update winml... tensor strings are broken, need to template tensorbase to do different things for strings

* make tensor strings work with 2 copies in/2 copies out

* Fix tensor string and allocator bug

* make maps work again... needs some fixes still

* Make it build!

* enable map inputs

* map outputs

* unbound outputs for sequences and maps

* User/xianz/merge windowsai (#2883)

* Packaging pipeline changes for VS 2019 (#2711)

* Tiny fix to codegen

* Simplify cache implementation and avoid static variables that may carry over between models

* Extend DML kernels (#2641)

* Additional DML operators

* Check unsupported attributes and inputs

* Address PR comments

* Add kernel capability function used for partitioning, and re-enable stride-based int64 support based on value range

* Fix test failures

* Build fix

* PR comments

* Update Nuphar tutorial notebook (#2721)

1. Reflect int8 GEMV improvements for multi-threading from #2696
2. Add notes on multi-threading control using OpenMP
3. Add samples of running multi-isa AOT, and show int8 GEMM differences between AVX and AVX2
4. Add rnn_benchmark example to resolve #1993

* Add schema for new Qops (#2611)

* Add schema for new Qops

* adding shape inference + qlinearaveragepool

* plus review comments

* plus review comments

* updates per review comments

* plus review comments

* [server] Add supposed for model_name and model_version as cli parameter (#2708)

* remove 64bit warning message from python validation. (#2727)

* MLAS: ARM64 build fix (#2734)

fix bad usage of vreinterpret to cast vector element types

* Fix broken python docs links (#2740)

* Fix build on Mac OS (#2731)

mac os ld doesn't support --while-archive, correct option is -all_load

* fix ngraph wheel (#2737)

* fix ngraph wheel

1.1.0 onnxruntime_ngraph wheel doesn't work

* remove libdnnl.so in nGraph Libs

* make it easy to compare

* Split onnxruntime server to a separated folder (#2744)

* Fix build for Python 3.8 (#2747)

* Fix build for Python 3.8

* Update protobuf to 3.11.2 (#1928)

Update protobuf to 3.11.2 (#1928)

* Change default optimization level to All (from Basic) (#2745)

* change default optimization level to All (from Basic)

* fix test

* fix c# test

* Update numpy to 1.18 (#2758)

* Update numpy to 1.18

* Pipeline changes for python 3.8 (#2753)

1. Pipeline changes for python 3.8
2. Fix a regression in setup.py which was just introduced in the previous commit.

Please notice, we still haven't made python 3.8 + Windows + CUDA work.

* Add basic stacktrace output for posix debug builds. (#2749)

* [NupharEP] fix a race condition when multiple sessions running different models concurrently (#2772)

* Revert "Change default optimization level to All (from Basic) (#2745)"

This reverts commit 56bb503c2f.

* Fix typo in error message (#2736)

* Rename MKL-DNN to DNNL to fix broken link (#2730)

* Fix nightly build version number issue

* Pass BUILD_BUILDNUMBER to linux docker

* Disable featurizers in python packages

* Import more featurizers (#2781)

Make kernels non-template. Add input constraint for learnt data.
  Add min_max_scalar_transformer, robust_scalar_transformer,
  inputation_marker_transfomer, label_encoder_transformer,
 missing_dummies_transformer along with tests.
 Advance Featurizers library commit.

* Implement a more stable softmax (#2715)

* Implement a more stable SoftMax
 e^x is represented as infinity if x is large enough, like 100.f. Infinity divided by Infinity is a NAN. Thus, softmax gets a NAN if one or more item are large enough.
A math transform as below is leveraged to get a stable softmax:
e^xi/(e^x1 + ...e^xn) = e^(xi - max) / (e^(x1 - max) + ... + e^(xn - max))

And for convenience, force max to 0.f if all xi are negative

* Contributing: Fix a typo (#2784)

* ACL EP GEMM improvements (#2780)

When it is posible we use a fully connected layer instead of the gemm implementation.
This will let the library use the best implementation based on the input data.

* ACL EP convolution improvements (#2774)

Added the optimized implementation for depthwise convolution for both ACL v19.02 and ACL 19.05.
Also the pointwise convolution seems to be more optimal in the CPU implementation so we opted for that instead.

* Add script for release Nuget validation (#2719)

* Initial commit

* Nits

* Disable a test temporarily

* Change working directory

* Test

* Add download python step

* Test update

* More changes

* Fix space issue

* Fix

* Verify nuget signing

* Fix

* Spaces

* PR feedback

* Nit

* Fix

* Fix

* Remove temporary changes

* add uint8 support to where op (#2792)

* Improve bert optimization script: (#2712)

(1) Move input int64=>int32 conversion to embed layer fusion.
(2) Output epsilon attribute for LayerNormalization fusion.

* add session creation time cost. (#2798)

* ML.NET team needs featurizers within a package (#2789)

Add auto ml featurizers to Windows, MacOS as well as to GPU  packaging-pipelines.

* Initialize max of softmax with lowest of float (#2786)

* MLAS: update SGEMM threading parameters (#2808)

* add interface to copy batch tensors. (#2807)

* add interface to copy batch tensors.

* onnxruntime

* speed up Windows TRT CI (#2811)

* don't run cuda tests if building with tensorrt

* remove unnecessary build options for win trt ci

* refactor win gpu tensorrt ci yml

* --numpy_version=1.17

* update

* update

* azcopy and cuda path

* Update test data (#2356)

* Add timeseries imputer transformer featurizer kernel (#2813)

 Make kernels non-template. Add input constraint for learnt data.
  Fixup tests.
  Add two more featurizers along with tests. Tests fail.
  min_max_scalar_transformer
  robust_scalar_transformer
  Fix tests serialized stream by prepending version bytes.
  Add inputation_marker_transfomer and the test.
  Fix up float/double type designations.
 Added label_encoder_transformer along with a test.
  string_throw case is broken at the momement.
  Fix labelencodertransfomer_test.cc string_throw case
  Rename maxabsscalertransformer_test.cc
  Add MissingDummiesTransformer along with the test.
  Update manifest.
  Add TimeSeriesImputerTransformer definition, implementation and tests

* Fix memory leak in TRT (#2815)

* fix memory leak issue

* revert EP_FAIL on enueueV2

* Add manifest missing comma

* Run static code analyzer on most of our code (#2817)

* Scneario Test : Build Google Test and Taef Test based on preprocessor definition (#2809)

* Add winml macro wrappers on top of google test macros

* change test methods to disabled

* Add custom winml macros for both taef and google tests

* PR comments

* update quantization doc (#2783)

* update documentation for quantization script

* plus some spell corrections

* Filter CPU case for IsFloat16Supported (#2802)

* update default optimization level + fix gemm_activation fusion (#2791)

* update defualt optimization level + fix gemm_activation fusion

* fix typo

* add unit test and incorporate review comments

* fix test comment

* Fix dnnl wheel package name (#2823)

* Append '-dnnl' to whl package name when --use_dnnl

* Update build.py

* Update Ubuntu & TensorRT version  in README (#2820)

Dockerfile.tensorrt is using nvcr.io/nvidia/tensorrt:19.09-py3 as base Image, update Ubuntu and TensorRT version according to
https://docs.nvidia.com/deeplearning/sdk/tensorrt-container-release-notes/rel_19-09.html#rel_19-09

* Merge fixes

* Add OneHotEncoder and HashOneHotEncoder kernels. (#2830)

 Add defs and imlementation for OneHotEncoders, adjuist date_time_transformer kernel and test.
  Add OneHotEncoder kernel test.
  Add HashOneHotVectorizerTransformer unit test.
  This does not link due to multiple definitions of functions
  that are included into header from a CPP file.

* Upgrade gtest to the latest version (#2827)

WinML would like to update the googletest submodule. They want some newer features (namely GTEST_SKIP to skip tests programmatically and be able to skip entire fixtures easily) and would need to update the submodule version.

However, because the new version of code hit a bug in gcc, even though the bug is already fixed in the latest gcc but we're using gcc 4.8.x and it won't get patched for the bug, so we have to do a compromise, change our code a little bit to make it work.

The gcc bug:  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51213

* Add support for int64_t for topk CPU. Fixes github issue #2806. (#2833)

* Ignore allocator type in ExecutionProviders allocator map. Make default initialization of OrtMemoryInfo more clearly invalid. (#2768)

* Remove allocator type from the key comparison in ExecutionProviders.
Remove usage of DummyArena as it's no longer necessary.

* Fix x86 tests where arena allocator is disabled.
Make initialization of OrtMemoryInfo clearer by adding Invalid enum value.

* Make OrtValueNameIdxMap::MaxIdx more intuitive.

* Convert ExternalProject Featurizers into git submodule (#2834)

Add git submodule for Featurizer library.
  Update cmake to build for git submodule.

* add domain check for nodes + update documentation (#2831)

* Fix cgmanifest.json generating script (#2770)

* Fix protobuf submodule name

* Workaround pygit2 bug

* User/orilevari/32bit comparison warning (#2800)

* use correct type for for loop

* explicitly specify void for parameters of OrtGetApiBase because the function is defined in c, so when the function is just (), it is interpreted as having an unknown number of parameters. This was causing compiler warning C4276.

* CMake cross-generator fixes (#2790)

* Fix compilation w/ non-VS CMake generators

* Fix custom WINMD target in Ninja

* Remove usage of msbuild .targets file

* Fix linking using DML in Ninja

* Automate SDK kit version choice

* Cleanup DML package install

* Fix SDK version detection

* Fix comment

* Revert unittest linkage changes

* Fix latest SDK detection

* Don't link to non-uapcore libraries

* Remove MessageBoxA reference and unused link libs

* Fix Linux CUDA nuget packaging pipeline break

* Refactor WinMLAPI Tests to build both google and taef test based on preprocessor definition (#2829)

* Add winml macro wrappers on top of google test macros

* change test methods to disabled

* Add custom winml macros for both taef and google tests

* PR comments

* Refactor winml api tests

* Move additional gtest specific macro definition into googleTestMacros.h

* Fix test build break since winml_lib_api needs to be statically linked to tests since winmlp::learningmodeldevice::iscpu() is being used in devicehelpers.cpp (#2837)

* Enforce WINML_TEST_CLASS_BEGIN_* matches w/ a WINML_TEST_CLASS_END (#2841)

* update optimization doc for BERT related fusions  (#2819)

* Add bert related transformers to doc
* Add execution provider and comment for bert optimizations
* Add comment about accuracy impact of approximation

* Fix warnings that cause build to fail

* MLAS: enable threading for quantized GEMMs (#2844)

* Fix test warnings and delayload linking (#2843)

* Ortmemoryinfo struct changed

* mark the camera scenario test as edgecore because it uses d3d11 (#2852)

* User/orilevari/pipeline fi breaks (#2853)

* remove conflicting artifact names. Decided to stop using drop-nuget-cuda since this may have implications on other dependent pipelines.

* change job name in gpu.yml back to Windows_CI_GPU_CUDA_Dev

* Remove internal libs from tests (#2864)

* Support custom DML in onnxruntime_providers.cmake (#2867)

* remove old winmladapter cpp

Co-authored-by: Changming Sun <chasun@microsoft.com>
Co-authored-by: KeDengMS <kedeng@microsoft.com>
Co-authored-by: Jeff <38966965+jeffbloo@users.noreply.github.com>
Co-authored-by: Ashwini Khade <askhade@microsoft.com>
Co-authored-by: Andrey <andrey.lompart@gmail.com>
Co-authored-by: George Wu <jywu@microsoft.com>
Co-authored-by: Tracy Sharpe <42477615+tracysh@users.noreply.github.com>
Co-authored-by: Faith Xu <txsafx@gmail.com>
Co-authored-by: zhanyi-ms <zhanyi@microsoft.com>
Co-authored-by: Changyoung Koh <gkcy1019@gmail.com>
Co-authored-by: Scott McKay <Scott.McKay@microsoft.com>
Co-authored-by: Takeshi Watanabe <take-cheeze@users.noreply.github.com>
Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>
Co-authored-by: Yufeng Li <liyufeng1987@gmail.com>
Co-authored-by: Maher Jendoubi <maher.jendoubi@gmail.com>
Co-authored-by: Andrews548 <32704142+Andrews548@users.noreply.github.com>
Co-authored-by: Hariharan Seshadri <shariharan91@gmail.com>
Co-authored-by: Nathan <7902510+ybrnathan@users.noreply.github.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
Co-authored-by: Ke Zhang <kezhan@microsoft.com>
Co-authored-by: stevenlix <38092805+stevenlix@users.noreply.github.com>
Co-authored-by: Ryan Lai <ryalai96@gmail.com>
Co-authored-by: Ori Levari <ori.levari@microsoft.com>
Co-authored-by: Yingge WAN <y-wan@users.noreply.github.com>
Co-authored-by: Qing <cwq1913@gmail.com>
Co-authored-by: Pranav Sharma <emailpranav@gmail.com>
Co-authored-by: Tiago Koji Castro Shibata <tiago.shibata@gmail.com>

* move sequence implementation into ort lib... still commented out... need to turn back on...

* begin sequence implementation

* make maps and sequences work

* fix broken tests

* remove dead code

* misc cleanup

* CR feedback

* User/xianz/winml adapter c api (#2869)

* wrapper all existing winml adapter apis with API_IMPL to try catch

* Return HR or Throw for WinML adapter APIs if failed

* undo macro wrapper for two places

* Wrap error macros around ort apis, too.

* address CR feedback #2

* add more api throw/return macros

* Revert changes no longer needed

* revert changes to cxx api

* format winml lib.ort and winml adapter

* remove static pheonix singleton

Co-authored-by: Ryan Lai <ryalai96@gmail.com>
Co-authored-by: Xiang Zhang <xianz@microsoft.com>
Co-authored-by: Changming Sun <chasun@microsoft.com>
Co-authored-by: KeDengMS <kedeng@microsoft.com>
Co-authored-by: Jeff <38966965+jeffbloo@users.noreply.github.com>
Co-authored-by: Ashwini Khade <askhade@microsoft.com>
Co-authored-by: Andrey <andrey.lompart@gmail.com>
Co-authored-by: George Wu <jywu@microsoft.com>
Co-authored-by: Tracy Sharpe <42477615+tracysh@users.noreply.github.com>
Co-authored-by: Faith Xu <txsafx@gmail.com>
Co-authored-by: zhanyi-ms <zhanyi@microsoft.com>
Co-authored-by: Changyoung Koh <gkcy1019@gmail.com>
Co-authored-by: Scott McKay <Scott.McKay@microsoft.com>
Co-authored-by: Takeshi Watanabe <take-cheeze@users.noreply.github.com>
Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>
Co-authored-by: Yufeng Li <liyufeng1987@gmail.com>
Co-authored-by: Maher Jendoubi <maher.jendoubi@gmail.com>
Co-authored-by: Andrews548 <32704142+Andrews548@users.noreply.github.com>
Co-authored-by: Hariharan Seshadri <shariharan91@gmail.com>
Co-authored-by: Nathan <7902510+ybrnathan@users.noreply.github.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
Co-authored-by: Ke Zhang <kezhan@microsoft.com>
Co-authored-by: stevenlix <38092805+stevenlix@users.noreply.github.com>
Co-authored-by: Ori Levari <ori.levari@microsoft.com>
Co-authored-by: Yingge WAN <y-wan@users.noreply.github.com>
Co-authored-by: Qing <cwq1913@gmail.com>
Co-authored-by: Pranav Sharma <emailpranav@gmail.com>
Co-authored-by: Tiago Koji Castro Shibata <tiago.shibata@gmail.com>

* missing use_dml check in winml_adapter_session (#2930)

* --use_dnnl flag was mangled in merge (#2931)

* use dml macro not wrapping custom registry code (#2934)

* Disable LNK4199 winml_dll to enable cuda builds (#2936)

* Disable LNK4199 in winml_dll

* linkler->linker

* LearningModelSessionAPITestGpu.CreateSessionWithCastToFloat16InModel should return DXGI_ERROR_UNSUPPORTED when FP16 not supported (#2937)

* Disable LNK4199 in winml_dll

* linkler->linker

* Need to return DXGI_ERROR_UNSUPPORTED when Model does not support fp16

* Publish build symbols (#2939)

* Publish build symbols

* Don't upload PDBs for .exe files

* Make x86 build (#2943)

* fix last remaining size_t/int64_t warnings->errors (#2948)

* TensorString, Sequences and Maps use the first allocator, but should use the cpu default allocator. (#2952)

* fix tensor string allcoator

* clean up default allocator usage for strings in winml lib/api.ort

Co-authored-by: Ryan Lai <ryalai96@gmail.com>

* Handle tensor shape of zero (#2954)

Co-authored-by: Ryan Lai <ryalai96@gmail.com>

* CR feedback (#2970)

* CR feedback

* fix weird formatting on privacy readme

* Add 'All rights reserved.' everywhere

* readd all rights reserved to winml_provider_factory.h

* remove extra space in comment

* remove extra whitespace

* fixes post master merge

* remove winml from nuget gpu pipeline

* set IR VERSION on generated_model in rnn_benchmark (#2972)

* Fix slice conformance failures (#2908)

Co-authored-by: Adrian Tsai <adtsai@microsoft.com>
Co-authored-by: Brian Martin <42186431+martinb35@users.noreply.github.com>
Co-authored-by: Ryan Lai <ryalai96@gmail.com>
Co-authored-by: Paul McDaniel <paul_mcdaniel@hotmail.com>
Co-authored-by: Xiang Zhang <xianz@microsoft.com>
Co-authored-by: Dwayne Robinson <fdwr@hotmail.com>
Co-authored-by: Tiago Koji Castro Shibata <tiago.shibata@gmail.com>
Co-authored-by: Ori Levari <ori.levari@microsoft.com>
Co-authored-by: Jeff <38966965+jeffbloo@users.noreply.github.com>
Co-authored-by: Changming Sun <chasun@microsoft.com>
Co-authored-by: KeDengMS <kedeng@microsoft.com>
Co-authored-by: Ashwini Khade <askhade@microsoft.com>
Co-authored-by: Andrey <andrey.lompart@gmail.com>
Co-authored-by: George Wu <jywu@microsoft.com>
Co-authored-by: Tracy Sharpe <42477615+tracysh@users.noreply.github.com>
Co-authored-by: Faith Xu <txsafx@gmail.com>
Co-authored-by: zhanyi-ms <zhanyi@microsoft.com>
Co-authored-by: Changyoung Koh <gkcy1019@gmail.com>
Co-authored-by: Scott McKay <Scott.McKay@microsoft.com>
Co-authored-by: Takeshi Watanabe <take-cheeze@users.noreply.github.com>
Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>
Co-authored-by: Yufeng Li <liyufeng1987@gmail.com>
Co-authored-by: Maher Jendoubi <maher.jendoubi@gmail.com>
Co-authored-by: Andrews548 <32704142+Andrews548@users.noreply.github.com>
Co-authored-by: Hariharan Seshadri <shariharan91@gmail.com>
Co-authored-by: Nathan <7902510+ybrnathan@users.noreply.github.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
Co-authored-by: Ke Zhang <kezhan@microsoft.com>
Co-authored-by: stevenlix <38092805+stevenlix@users.noreply.github.com>
Co-authored-by: Yingge WAN <y-wan@users.noreply.github.com>
Co-authored-by: Qing <cwq1913@gmail.com>
Co-authored-by: Pranav Sharma <emailpranav@gmail.com>
2020-02-04 17:12:19 -08:00
Changming Sun
7ff5c0e5a3
CMake changes (#2961)
1. Add support for vstest. 
2. Add support for vcpkg. To use it:
  ```bat
   vcpkg install zlib:x64-windows benchmark:x64-windows gtest:x64-windows protobuf:x64-windows pybind11:x64-windows re2:x64-windows
   mkdir build
   cmake ..\cmake -DCMAKE_BUILD_TYPE=Debug -A x64 -T host=x64 -DCMAKE_TOOLCHAIN_FILE=C:\vcpkg\scripts\buildsystems\vcpkg.cmake -DVCPKG_TARGET_TRIPLET=x64-windows -Donnxruntime_PREFER_SYSTEM_LIB=ON
  ```
3. New cmake option: onnxruntime_PREFER_SYSTEM_LIB, which allows user using the preinstall libs instead of the things in onnxruntime submodule.
4. New cmake option: onnxruntime_ENABLE_MEMLEAK_CHECKER, which allows user turn on/off the memory leak checker by @RyanUnderhill in Windows Debug Build. The checker doesn't work with vstest.
4. Fix the post merge pipeline(Mainly for test coverage report).
5. Ignore the compile warning from the Featurizer library code
6. Apply "/utf-8" VC compile flag to our code. Without this, you can't build onnxruntime on Chinese Windows.
7. Remove the SingleUnitTestProject cmake option because it's deprecated more than one year and nobody is using it.
8. Move opaque api tests to onnxruntime_test_all
9. Enable "/W4" on CUDA ep's C++ code(Not the *.cu files), and fix some warnings, add some extra checks.
10. Delete the onnxruntime::test::TestEnvironment class.
11. Add a DLLmain for onnxruntime.dll. 
12. Allow dynamic link to libprotobuf
2020-02-03 19:33:14 -08:00
jignparm
645d8fb213
Jignparm/upgrade macos vm image (#2928)
* update MacOS image to 10.14

* Update to macos 10.14
2020-01-28 20:24:45 -08:00
Changming Sun
e0c9cdaa73
Fix the nuget pipelines (#2901) 2020-01-23 20:02:18 -08:00
Jeff
ba336b5583
Disable DML EP on software adapter, fix float16 fallback bug, re-enable DML in CI (#2896)
* Re-enable DML in CI pipeline

* Fix bug with float16 fallback + fusion, and disallow DML EP with software adapter

* Address PR comments
2020-01-23 15:18:28 -08:00
Changming Sun
201b089a36
Fix some warnings on Windows (#2560)
1. Enable warning "4503" # Decorated name length exceeded.
2. Enable warning "4146" # unary minus operator applied to unsigned type.
3. Enable float64 support for the Softmax operator
4. Enable compliance checks for Windows x86 32bits build
5. Use TryBatchParallelFor to replace some fallback code in mlas pooling.cc
6. Fix Android CI pipeline.
2020-01-22 15:59:11 -08:00
Pranav Sharma
49725f896c
Disable openmp for the nocontribops pipeline. (#2888) 2020-01-22 12:07:44 -08:00
KeDengMS
f9f25ec047
Fix spurious component detection warning (#2857)
Fix spurious component detection warning
Use component detection template for all pipelines
2020-01-18 20:10:35 -08:00
Changming Sun
e6f7658ade Update Windows GPU build to use cudnn 7.6 2020-01-17 12:23:13 -08:00
Changming Sun
47e27ec9a1
Disable DML in Windows GPU CI build (#2856)
Disable DML in Windows GPU CI build for now, because there are some wired model test failure and I don't know how to fix it. Will seek help from WinML team.
2020-01-16 18:47:30 -08:00
Changming Sun
c4e4abce73
Run static code analyzer on most of our code (#2817) 2020-01-10 22:17:17 -08:00
Changming Sun
48e042868f
Update test data (#2356) 2020-01-10 10:52:23 -08:00
George Wu
31200ed92c
speed up Windows TRT CI (#2811)
* don't run cuda tests if building with tensorrt

* remove unnecessary build options for win trt ci

* refactor win gpu tensorrt ci yml

* --numpy_version=1.17

* update

* update

* azcopy and cuda path
2020-01-10 08:40:40 -08:00
Dmitri Smirnov
2c8179bee4
ML.NET team needs featurizers within a package (#2789)
Add auto ml featurizers to Windows, MacOS as well as to GPU  packaging-pipelines.
2020-01-09 10:54:12 -08:00
Hariharan Seshadri
ebfcad1c90
Add script for release Nuget validation (#2719)
* Initial commit

* Nits

* Disable a test temporarily

* Change working directory

* Test

* Add download python step

* Test update

* More changes

* Fix space issue

* Fix

* Verify nuget signing

* Fix

* Spaces

* PR feedback

* Nit

* Fix

* Fix

* Remove temporary changes
2020-01-08 18:42:22 +05:30
Changming Sun
e3f674b563 Disable featurizers in python packages 2020-01-06 11:16:44 -08:00
Changming Sun
7ace7a5bcd Pass BUILD_BUILDNUMBER to linux docker 2020-01-06 11:16:44 -08:00
Changming Sun
382fa86af8
Pipeline changes for python 3.8 (#2753)
1. Pipeline changes for python 3.8
2. Fix a regression in setup.py which was just introduced in the previous commit.

Please notice, we still haven't made python 3.8 + Windows + CUDA work.
2020-01-02 15:25:25 -08:00
Changming Sun
fd334aff44
Update numpy to 1.18 (#2758)
* Update numpy to 1.18
2019-12-30 14:51:01 -08:00
Changming Sun
c7a9c6b488
Split onnxruntime server to a separated folder (#2744) 2019-12-27 11:21:23 -08:00
Changming Sun
b42cb61904
Packaging pipeline changes for VS 2019 (#2711) 2019-12-20 19:53:51 -08:00
Changming Sun
89d6bfaa94
VS 2019 build pipeline changes (#2693)
1. Move Win GPU pipeline to VS2019
2. Move C API pipeline to VS 2019
3. Move nuget mklml pipeline to VS 2019
4. Move windows no contrib ops pipeline to VS 2019
2019-12-18 15:34:58 -08:00
Dmitri Smirnov
ce7a180f21
Import more featurizers with tests (#2685)
Advance commit to 4df80d5865a9d4e97f6d0b9304d4316115a04d9e
  Add generated code for the commit before editing.
  Import more featurizers.
  Rename Automl ops domain to mlfeaturizers.
  Rename conditional compilation macro.
  Move and rename files getting rid of automl
  Rename --use_automl build switch to --use_featurizers
  Rename CMake option accordingly. Rename automl CMake targets.
  Adjust CI and packaging pipeline switches.
  Rename namespace automl to featurizers.
2019-12-17 22:17:40 -08:00
Hector Li
47503ec7a6
Initiate the build scripts for ARM ACL (#2652)
1. Add scripts to build Yocto image & toolchain
2. Update docker build scripts to support Onnxruntime build with ARM ACL 19.02/19.05
2019-12-16 09:44:19 -08:00
Dmitri Smirnov
7c87070b24
Import Featurizers (#2643)
Import FeaturizerLibrary as ExternalPorject which is optional and is not registered as git submodule.
2019-12-13 16:07:12 -08:00
Changming Sun
a46a28b7d8
Windows CI changes(#2650) 2019-12-13 12:23:49 -08:00
shahasad
4dbf9442cc
removed unnecessary batch file and fix path (#2640) 2019-12-12 14:21:02 -08:00
Adam Pocock
35ceb1a6a6 Java API for onnxruntime (#2215) 2019-12-10 08:28:46 -08:00
Ashwini Khade
78099701b4
Add missig env variables for mac pipeline test (#2595) 2019-12-09 21:10:43 -08:00
shahasad
41fc820f76
add path to build dir before test run (#2590) 2019-12-09 18:58:55 -08:00
Hector Li
0ab54521f4
Temporarily exclude vgg19 test from Python backend test
1. temporarily exclude vgg19 test which comsumes too much memory, run out of memory on Upsquared device. Single test pass for vgg19, need furture investigation (#2588)
2. Update docker file to decrease the docker image size
2019-12-09 12:25:46 -08:00
shahasad
eeb28a80c0
setup java ci mac (#2570) 2019-12-06 11:43:40 -08:00
Changming Sun
7eddac16c2
Re-enable Windows C# tests (#2564) 2019-12-05 21:22:31 -08:00
Ryan Hill
854362cf05
Update win-x86-ci.yml (#2557)
Fix build pipeline break
2019-12-05 18:44:12 -08:00
Ashwini Khade
281933fa1c
Fix C API tests for centos and mac (#2544)
* change c++14 to c++11

* add ld lib path for centos

* enable csharp tests on macos

* fix C API test on MacOS + fix manylinux dotnet install

* fix manylinux dotnet install

* fix lib link
2019-12-04 18:01:35 -08:00
Xiang Zhang
3e7aaf8fa1 User/xianz/telemetry (#2458)
* enabme telemetry

* enable telemetry

* set enable telemetry as default

* for debugging

* remove log and set disable telemetry as default back

* delete private file while testing

* resolve comment: mainly add license header, rename macro and update docs

* rewording in privacy.md
2019-12-03 23:34:53 -08:00
stevenlix
293b15480b Add dynamic shape support in TensorRT execution provider (#2450)
* remove onnx-tensorrt submodule

* add new onnx-tensorrt submodule (experiment) for trt6

* update engine build for trt6

* update compile and compute for tensorrt6.0

* Update tensorrt_execution_provider.cc

* Update tensorrt_execution_provider.cc

* Update tensorrt_execution_provider.cc

* Update tensorrt_execution_provider.cc

* switch to onnx-tensorrt master for TensorRT6'

* Update tensorrt_execution_provider.cc

* Handle dynamic batch size and add memcpy in TensorRT EP

* update test cases

* Update tensorrt_execution_provider.cc

* update onnx-tensorrt submodule

* Update Dockerfile.ubuntu_tensorrt

* Update Dockerfile.ubuntu_tensorrt

* Update run_dockerbuild.sh

* Update run_dockerbuild.sh

* Update install_ubuntu.sh

* Update concat_op_test.cc

* Update tensorrt_execution_provider.cc

* Upgrade TensorRT to version 6.0.1.5

* Update onnxruntime_providers.cmake

* Update CMakeLists.txt

* Update reduction_ops_test.cc

* Update install_ubuntu.sh

* Update Dockerfile.ubuntu_tensorrt

* Update Dockerfile.tensorrt

* Update BUILD.md

* Update run_dockerbuild.sh

* Update install_ubuntu.sh

* Update onnxruntime_providers.cmake

* Update install_ubuntu.sh

* Update install_ubuntu.sh

* Update gemm_test.cc

* Update gather_op_test.cc

* Update CMakeLists.txt

* Removed submodule

* update onnx-tensorrt submodule

* update header file

* Removed submodule

* add submodule onnx-tensorrt kevin's branch shape-test'

* add debugging code

* Update tensorrt_execution_provider.cc

* Update tensorrt_execution_provider.cc

* merge master

* Removed submodule

* update onnx-tensorrt submodule

* add more changes for dynamic shapes

* Update tensorrt_execution_provider.cc

* update for dynamic shape

* update dynamic shape processing

* fix logger issue

* remove submodule onnx-tensorrt

* add submodule onnx-tensorrt

* add env variable min_subgraph_size

* remove redundency

* update document

* use onnxruntime::make_unique

* fix multi-run issue

* remove some tests to save CI build time

* Add dynamic shape test

* Update TensorRT-ExecutionProvider.md

* Add example of running Faster R-CNN model on TensorRT EP

* Add more details on env variables

* update environment variables

* Update tensorrt_basic_test.cc

* Update model tests

* Update tensor_op_test.cc

* remove --use_full_protobuf

* Update build.py
2019-12-03 23:18:33 -08:00
shahasad
178d059111 Setup java ci (#2528) 2019-12-03 14:21:51 -08:00
Ashwini Khade
e32eff826c
enable nuget package testing on centos7 (#2527)
* add centos tests to linux cpu ci pipeline

* Disable failing test

* use centos6 instead of centos7

* change back to centos7

* add dotnet runtime dependency

* fix dotnet runtime dependencies

* install dotnet sdk instead of runtimes

* add more dotnet dependencies

* temporary skip failing test

* ix lib path

* reenable failing test
2019-12-03 10:16:45 -08:00
Sreekanth Yalachigere
31ea11a696 Renaming MKL-DNN as DNNL (#2515)
* DNNL: Moving Files to rename file names

* DNNL name change

* azure pipeline updated

* disable ceil/dialation and enable Opset10

* disable ceil/dialation tests in Python

* mlperf_ssd_resnet34_1200 disabled
2019-12-03 07:34:23 -08:00
Changming Sun
3d627362a0
Upgrade Windows CPU CI pipeline to use VS 2019 (#2519) 2019-12-02 23:05:35 -08:00
shahasad
882f28a74b
Fix NuGet end to end tests for custom op dll (#2472) 2019-11-25 15:26:09 -08:00
Ashwini Khade
4caf5c9c13
add additional test data set for nuget pipeline (#2448)
* add SAS token to download internal test data for nuget pipeline

* update azure endpoint

* fix keyvault download step

* fix variable declaration for secret group

* fix indentation

* fix yaml syntax for variables

* fix setting secrets for script

* fix env synctax

* Fix macos pipeline

* attempt to add secrets to windows download data

* fix mac and win data download

* fix windows data download

* update test data set url and location
2019-11-25 13:08:03 -08:00
Changming Sun
e7131f6d12 Pull the latest image before running docker build 2019-11-22 13:48:37 -08:00
Changming Sun
6bcf95477e
Fix Windows GPU C API packaging pipeline failure (#2440)
Fix Windows GPU C API packaging pipeline failure (#2440)
2019-11-20 14:00:37 -08:00
Changming Sun
080a0a3186
Nuget pipeline changes (#2305)
1. refactor the pipeline, remove some duplicated code
2. Move Windows_py_GPU_Wheels job to Win-GPU-CUDA10. We'll deprecated the "Win-GPU" pool
3. Delete cpu-nocontribops-esrp-pipeline.yml and cpu-nocontribops-pipeline.yml
4. In Linux nuget jobs, run "make install" before creating the package. So that extra RPAH info will be removed
2019-11-08 09:45:52 -08:00
Patrick Foley
151075790d [OpenVINO-EP] Update to latest version: OpenVINO 2019 R3.1 (#2308)
* Updates OpenVINO EP to latest version: 2019 R3.1

* Reviews fixed

* Update Dockerfile.openvino

* Addressed PR comments and disabled model tests temporarily

* Update Dockerfile.ubuntu_openvino
2019-11-05 19:55:46 -08:00
Changming Sun
138a7f194e Add cleanup step 2019-10-30 08:13:09 -07:00
Changming Sun
ce14b07b1c
Fix the GPU nuget pipeline failure (#2255) 2019-10-25 13:55:38 -07:00
Tomasz Dołbniak
63acd4e89b Adjust the nGraph EP to the newest CI test data (#2180)
* Adjust the nGraph EP to the newest CI test data

* Increase the linux pipeline timeout for nGraph
2019-10-24 16:44:03 -07:00
Changming Sun
4b62241c77
Update ONNX to 1.6.1 (#2235) 2019-10-23 13:47:45 -07:00