Commit graph

341 commits

Author SHA1 Message Date
Changming Sun
5a7f65b831
Fix training e2e pipeline (#7942)
1. Fix training e2e pipeline. The failure was caused by my recent change #7632. The fix adds "--cmake_extra_defines CMAKE_CUDA_ARCHITECTURES=70" to the build parameters because the machines have V100 GPUs.
2. Simplify the Nuphar pipeline. It doesn't need to install a separate ONNX version (1.5.0).
3. Fix a problem where run_dockerbuild.sh ignored the OS version parameter. Now that the parameter takes effect, I also set the Python version to the system default (3.8 for Ubuntu 20.04).
2021-06-04 09:37:09 -07:00
Changming Sun
b854f2399d
Update manylinux build scripts and GPU CUDA version from 11.0 to 11.1 (#7632)
1. Update manylinux build scripts. This adds [PEP600](https://www.python.org/dev/peps/pep-0600/) (perennial manylinux tags) support. numpy has adopted this new feature, and we should do the same. The old build script files were copied from https://github.com/pypa/manylinux, but they have since been deleted and replaced in the upstream repo, and the manylinux repo no longer has a manylinux2014 branch. So I'm removing the obsolete code and syncing the files with the latest master.
2. Update GPU CUDA version from 11.0 to 11.1(after a discussion with PMs). 
3. Delete tools/ci_build/github/linux/docker/Dockerfile.manylinux2014_cuda10_2.  (Merged the content to tools/ci_build/github/linux/docker/Dockerfile.manylinux2014_cuda11)
4. Modernize the cmake code of how to locate python devel files. It was suggested in https://github.com/onnx/onnx/pull/1631 .
5. Remove the `onnxruntime_MSVC_STATIC_RUNTIME` and `onnxruntime_GCC_STATIC_CPP_RUNTIME` build options, since cmake now has built-in support for this: starting with cmake 3.15, the `CMAKE_MSVC_RUNTIME_LIBRARY` variable selects which MSVC runtime library to use.
6. Update the Ubuntu docker images used in our CI builds from Ubuntu 18.04 to Ubuntu 20.04.
7. Update the GCC version in CUDA 11.1 pipelines from 8.x to 9.3.1.
8. Split the Linux GPU CI pipeline into two jobs: build the code on a CPU machine, then run the tests on GPU machines. In the past we didn't test our python packages; we only tested the pre-packed files, so we didn't catch the rpath issue in CI builds.
9. Add a CentOS machine pool and test our Linux GPU build on real CentOS machines. 
10. Rework the ARM64 Linux GPU python packaging pipeline. Previously it used cross-compilation, which required statically linking the C runtime. But we now have a pluggable EP API, which doesn't support static linking, so I switched to QEMU emulation instead. The build is now 10x slower than before, but it is more extensible.
2021-06-02 23:36:49 -07:00
Thiago Crepaldi
c45ac166d3
Add graphviz into Dockerfile images for Python API documentation (#7819) 2021-06-02 16:12:54 -07:00
Suffian Khan
02c78a8aa8
test migration to rocm4.2 (#7800) 2021-05-24 11:48:44 -07:00
Changming Sun
38d90b0f15
Cleanup install_deps.sh (#7734) 2021-05-17 19:27:47 -07:00
liqunfu
d604281a86
Liqun/training pkg to run tests (#7662) 2021-05-16 09:10:57 -07:00
liqunfu
3ead2f2f39
update pt lightning version (#7711)
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-05-15 21:46:16 -07:00
liqunfu
359fe1d197
Liqun/ort training version (#7620) 2021-05-14 09:54:19 -07:00
ashbhandare
56e993a434
Bump to rel-1.9.1 (#7684) 2021-05-13 18:41:28 -07:00
Changming Sun
41e370c2b3
Update protobuf to 3.16 (#7616) 2021-05-07 14:09:23 -07:00
baijumeswani
f3a70f1aec
Ignore invalid input argument to install_os_deps.sh (#7566) 2021-05-05 14:33:31 -07:00
Changming Sun
a284eede64
Fix Linux CPU pipeline (#7584) 2021-05-05 13:26:10 -07:00
baijumeswani
cab84d902e
Install and use conda on ortmodule CI pipelines (#7530)
* Install and use conda on ortmodule CI pipelines

* Update build script to install onnxruntime wheel before running unit tests

* Remove python 3.5 from install_python_deps

* Pinning deepspeed version to 0.3.15
2021-05-03 15:52:22 -07:00
liqunfu
196e6702ad
to support multiple cuda versions in published onnxruntime-training package (#7468)
2021-04-27 17:15:33 -07:00
Suffian Khan
7a3c1787af
Add CI pipeline to publish Python training package targeting Rocm (#7417)
* first attempt rocm training wheel

* modifications needed to python packaging pipeline for Rocm 4.1

* changes to not conflict with cuda

missed stage1 changes

remove package push

add option r to getopt

try again without python install

try again without python install

try again without python install

split pipelines and add back push to remote storage

try on cuda gpu pool

try again

try again

try running without az subscription set

try again on original pipeline

change pool

passing AMD Rocm whl on AMD-GPU pool

split rocm pipeline from cuda pipeline

remove comments

* try adding Rocm tests as well

* try with tests in place

* fix trailing ws

* add training data

* try again as root for tests

* use python3

* typo

* try to map video, render group into container

* try again

* try again

* try to avoid yum error code

* make UID 1001

* try without yum downgrade

* define rocm_version=None

* remove CUDA related comments for Rocm Dockerfile

* Don't pin nightly torch, torchvision, torchtext versions, as they expire (for now nightly is required for Rocm 4.1)

* missed requirements-rocm.txt from last commit

* fix whitespace
2021-04-23 17:22:31 -07:00
Ashwini Khade
75e054cd33
pick onnx release candidate (#7177)
* pick onnx release candidate

* fix typo

* filter batchnorm tests

* add implementation for reshape 14

* add identity op kernel for opset 14

* fix typo

* update onnx commit

* update commit to latest master

* add hashes for new kernel registrations and update 1

* TEST commit

* update onnx back to right commit

* Update onnx to latest in rel-1.9.0

* temp fix

* remove nonzeroshapesetter transformer

* pick rel branch latest commit

* fix build failures

* fix build failures

* fix build failures

* update the commit to latest in release branch

* add test filters for not implemented op14 ops in c# tests

* plus review comments
2021-04-22 23:57:09 -07:00
Changming Sun
65b2b87f83
Update CI build docker images (#7386)
Update CI build docker images: delete ubuntu 16.04 support.
2021-04-21 13:18:34 -07:00
Changming Sun
b4cfa88bf7
Update protobuf to the latest version (#7396) 2021-04-21 10:30:06 -07:00
Guoyu Wang
fce67e2b9b
Create Android Package pipeline (#7295)
* Create Android Package pipeline

* address CR comments

* Switch to jdk11
2021-04-12 17:56:25 -07:00
baijumeswani
249a2c14ef
Pin version of pytorch to 1.8.1 for ORTModule CI pipeline (#7167)
* Pin version of pytorch to 1.8.1 for ORTModule CI pipeline
* Use pytorch-lightning stable version 1.2.5
* Revert to cuda 10.1
2021-04-01 09:37:47 -07:00
Ashwini Khade
b22e60bd44
pull onnx latest commit (#7102)
* update onnx commit

* fix test scripts to remove deprecated call

* update filters

* add registration for relu and cumsum ver 14

* add promote trilu to onnx domain

* update onnx-tensorrt submodule

* update flag

* update flag

* update dependencies

* fix android ci failure
2021-03-29 11:00:38 -07:00
harshithapv
540eac253e
Deepspeed pipeline parallel and fairscale sharded optimizer test samples with ORTModule (#7078)
* adding samples for Deepspeed pipeline parallel and fairscale sharded optimizer with ortmodule

* fixed typo in args

* addressed Thiago's comments

* Update orttraining/orttraining/test/python/orttraining_test_ortmodule_deepspeed_pipeline_parallel.py

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
2021-03-24 09:43:05 -07:00
baijumeswani
a7a2a16edd
Pass arguments to azure_scale_set_vm_mount_test_data from perf test ci pipeline (#7094) 2021-03-22 21:48:32 -07:00
Thiago Crepaldi
867804bea1
Add auto doc gen for ORTModule API during CI build (#7046)
In addition to ORTModule auto documentation during packaging, this PR also updates golden numbers to fix CI
2021-03-22 10:20:33 -07:00
Thiago Crepaldi
335edaa2c4
Merge pull request #6973 from microsoft/thiagofc/merge-ortmodule-into-master
Introduce ORTModule training API to ONNX Runtime
2021-03-17 10:30:06 -07:00
Changming Sun
ed2d441a2e
Update ORT server build pipeline (#7030)
1. Migrated it to Ed's new docker build script
2. Use Python 3.6 instead, because it is the default in Ubuntu 18.04
3. Move the "pip install" command to the docker image build stage (instead of when running the image)
2021-03-16 18:02:09 -07:00
baijumeswani
79f832c682
Separate requirements.txt file for ORTModule pipelines (#6879)
* Move all ORTModule dependency installations to ortmodule subfolder
2021-03-05 14:12:11 -08:00
Sherlock
12edf22f11
Merge pull request #6838 from microsoft/mzs/ortmodule-api-sync-from-master-210226
Sync from master
2021-02-27 12:32:36 -08:00
Thiago Crepaldi
f71d93ea2b
Enable PyTorch Lightning basic test on CI (#6809) 2021-02-27 09:35:42 -08:00
M. Zeeshan Siddiqui
ca48310d6d Merge branch 'master' of https://github.com/microsoft/onnxruntime into mzs/ortmodule-api-sync-from-master-210226 2021-02-27 04:25:23 +00:00
stevenlix
53eb948f4c
Upgrade TensorRT to v7.2.2 (#6452)
* upgrade to TensorRT 7.2.2

* extend GPU tensorrt CI timeout to 150 minutes

* update docker image name

* disable user interaction to avoid the tensorrt container getting stuck when installing tzdata

* upgrade to libssl1.1 for ubuntu20.04

* remove libicu60 from ubuntu20.04

* add libicu66 for ubuntu20.04

* debug

* llvm

* llvm

* disable ReverseSequenceTest.InvalidInput

* disable ReverseSequenceTest.InvalidInput

* fix issues

* fix issues

* Update linux-gpu-tensorrt-ci-pipeline.yml

* disable warning 4458 for TensorRT parser

* update onnx-tensorrt submodule

* disable warnings for TensorRT parser

* update onnx-tensorrt submodule to include latest bug fixes

* update setup_env_trt

* update pool for win trt ci pipeline

Co-authored-by: George Wu <jywu@microsoft.com>
2021-02-18 04:30:47 -08:00
M. Zeeshan Siddiqui
40dda452cf Merge branch 'master' of https://github.com/microsoft/onnxruntime into mzs/sync-from-master 2021-02-18 03:03:01 +00:00
liqunfu
dd8ef4409a
Liqun/migrate perf test (#6733)
move ort training perf tests to azure devops
2021-02-17 17:48:47 -08:00
Thiago Crepaldi
3184c47ad1 Merge branch 'master' into thiagofc/merge-from-master 2021-02-17 11:49:52 -08:00
baijumeswani
01dfa8e125
Support non tuple return values from torch.nn.module (#6660)
* Support dictionary, namedtuple and huggingface ModelOutput types for model return values
2021-02-16 20:48:32 -08:00
Changming Sun
b5bd14fc9f
Update GPU packaging pipelines to cuda11 and fix the other build break issues (#6585)
Update gpu packaging pipelines to CUDA11

In the next release we will use CUDA 11, and our CUDA 11 build suddenly broke because CentOS 7 recently published a glibc update: the version changed from 2.17-317.el7 to 2.17-322.el7_9, but the newer one isn't compatible with CUDA 11, so we have to downgrade it.
2021-02-05 16:58:37 -08:00
Chun-Wei Chen
f2ce3aae13
add set_model_dir and update ONNX (#6119) 2021-02-05 09:30:49 -08:00
baijumeswani
62ac164279
Cache datasets on CI machines (#6525) 2021-02-02 21:11:35 -08:00
Thiago Crepaldi
8a890ddfd7
Sync ORTModule branch with master and fix tests (#6526)
* Deprecate Python global configuration functions [Part 1] (#5923)

Enable options to be set via execution provider (EP)-specific options and log deprecation warning from current global configuration functions.

* remove dnnl_dll_path from post build copy (#6142)

* Model Fusion For Bart (#6105)

Fusion fix for Bart models

* Unify IExecutionProvider and IExecutionProviderFactory interfaces (#6108)

* Remove Provider_IExecutionProvider and make the internal IExecutionProvider usable by shared providers
* Change Provider_IExecutionProviderFactory to be the core version.

* Enable running the mnist_training sample without cuda (#6085)

Signed-off-by: George Nash <george.nash@intel.com>

* nnapi add min max support (#6117)

* Fix CUDA test hang: (#6138)

- Make condition check in `CUDAAllocatorTest` to ensure CUDA device is present.

* Fix TensorRT kernel conflict issue for subgraphs of control flow operators (#6115)

* add static subgraph kernel index

* change kernel naming to avoid conflicts

* Add gradient registration for Abs. (#6139)

* Partition initial optimizer state for Zero-1 (#6093)

* Initial changes

* Working changes

* Working changes

* Cleanup

* fix windows CI

* Review comments

* review comments

* Fix edge case in BFCArena where allocation failures could lead to an infinite loop. (#6145)

#4656

* Revert "work around of the build break in mac (#6069)" (#6150)

This reverts commit 3cae28699b.

* Fix clean_docker_image_cache.py detection of image pushes. (#6151)

Fix clean_docker_image_cache.py detection of image pushes. They were being ignored because the expected HTTP status code was wrong. For pushes, it's 201 instead of 200.

* MLAS: add NEON version of int8 depthwise convolution (#6152)

* Using a map of of ops to stages as input of partition function. (#5940)

* New partition algorithm running before AD

* Convert cut_group_info into device map. Work in progress -- works for  bert-tiny with pp=2

* Removing code for partition of bwd graphs

* Remove old code

* Adding some verification code

* Handle Shared Initializer

* Renaming rank with stage

* Added first unit test

* new test

* redundant check

* undo change in bert

* Moved cut-based partition to testing utils file

Co-authored-by: xzhu1900
Co-authored-by: wschin

* New conversion function and tests

* minor

* remove test that is not needed2

* improve GetDeviceAssignment and PR comments

* minor changes

* PR comments

* improving documentation and variable naming

* add documentation

* Variable naming and docs

* more doc improvements

* more doc improvements

* missing static cast

* Fix test file for windows

* Fix test file for windows

* Fix test file for windows

* stage id is not the same as rank id

* PR comments

* PR comments

* More comments

* More comments

* Minor fix to satisfy c++14 (#6162)

* Deprecating Horovod and refactored Adasum computations (#5468)

* deprecated horovod submodule
* refactored adasum logic to be ort-native
* added tests for native kernel and e2e tests

* Update TensorRT-ExecutionProvider.md (#6161)

* Bugfix for topk cuda kernel (#6164)

* fix the issue that std::numeric_limits cannot handle half type

* adding a test

Co-authored-by: Du Li <duli@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>

* Revert "Fuse MatMulIntegerToFloat only when scales are scalar (#6008)" (#6169)

This reverts commit f2dcba7afe.

* Remove ignored build warnings for pybind on Mac (#6165)

* save_checkpoint, load_checkpoint and aggregate_checkpoints (#6136)

* save_checkpoint and load_checkpoint implementations

* checkpoint aggregation logic

* unit tests for save_checkpoint, load_checkpoint and aggregate_checkpoints

* Don't try to bind unused inputs in the Training frontend (#6166)

* Update documentation for contributing a PR and add deprecation notices for PyOp and ORT server. (#6172)

* aggregate model states only for the case when mixed precision was true (#6176)

* [NNAPI EP] Enable per-channel quantization for QlinearConv  (#6155)

* Enable qlinearconv per-channel quantization

* Fix the android CI test failure

* Add Android Version Check for Per-Channel Quant

* Address PR comments

* Fix some minor issues

* Add verification of per-channel zero points

* Make the error tolerance configurable

* Fix typo in BERT pretraining script (#6175)

A misplaced `}` meant that the `'enable_adasum'` option was interpreted incorrectly, causing the test to fail.

* Update get_docker_image.py to enable use without image cache container registry. (#6177)

Update get_docker_image.py to enable use without image cache container registry.

* Helper for compiling EP to generate deterministic unique ids for use in MetaDef names (#6156)

* Create a helper for generating unique ids that can be used by an EP that creates compiled nodes and needs ids to be deterministic for a model when used in multiple sessions.

Added to IExecutionProvider as this can potentially be used by all compiling EPs and is more robust than a simplistic counter (although EP implementer is free to choose either approach).

* Restructure the helper so it can be called across the EP bridge.
Add ability to call id generation helper from EP bridge
  - convert DNNL EP to use helper to validate
Address issue where a new Model may be loaded into the same address as a previous one.
  - hash the bytes in the Graph instance (1728 bytes currently) to use as the key to the full hash for the model
Add lock around id generation to ensure no issues if multiple sessions partition graphs at exactly the same time.
  - Extremely unlikely but would be hard to debug and the locking cost is not an issue as it's only incurred during graph partitioning and not execution.
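
The idea described above can be sketched in a few lines of Python (the actual helper is C++ on IExecutionProvider; all names below are hypothetical): hash the model's graph bytes so ids are deterministic per model, and guard a per-model counter with a lock for concurrent partitioning.

```python
import hashlib
import threading

_lock = threading.Lock()
_counters: dict[str, int] = {}  # model hash -> next id (illustrative state)

def generate_metadef_id(graph_bytes: bytes, ep_name: str) -> str:
    """Hypothetical sketch: deterministic MetaDef-style id for a compiled node.

    The same graph bytes always hash to the same prefix, so a model gets
    the same sequence of ids across sessions; the lock mirrors the
    commit's protection against simultaneous graph partitioning.
    """
    model_hash = hashlib.sha256(graph_bytes).hexdigest()[:16]
    with _lock:
        n = _counters.get(model_hash, 0)
        _counters[model_hash] = n + 1
    return f"{ep_name}_{model_hash}_{n}"
```

A real implementation must also handle the aliasing problem the commit notes: a new model loaded at the same address as an old one, which is why hashing the graph contents (rather than a pointer) is used as the key.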

* Backend APIs for checkpointing (#5803)

* Add backend API GetOptimizerState and GetModelState

* add GetPartitionInfoMap

* Android coverage dashboard (#6163)

* Write the report to a file.

* Post code coverage to the Dashboard database.

* Add usage details of unified MCR container image (#6182)

Going forward, a single unified docker image will be published in
MCR. The hardware accelerator target choice will have to be made
in the application using OpenVINO EP's runtime config options.

* improve perf for softmax (#6128)

* improve perf for both gathergrad and softmax

* revert the change in gathergrad and will be done in another PR.

* address comments from code review.

* Tune fast Gelu to use exp(x) instead of tanh(x) on Rocm platform (#6174)

* tune fast gelu to use exp(x) instead of tanh(x) on rocm

* update to use expression 2/(1+exp(-2x))-1 for stability
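
The rewrite rests on the identity tanh(x) = 2/(1 + exp(-2x)) - 1, so a kernel with a fast exp but no fast tanh can still evaluate the tanh-based Gelu approximation. A quick numerical check (the 0.044715 constant below is the standard Gelu approximation constant, not taken from this diff):

```python
import math

def tanh_via_exp(x: float) -> float:
    # tanh(x) == 2 / (1 + exp(-2x)) - 1; for x >= 0 the exp argument is
    # non-positive, which is why the commit calls this form more stable.
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0

def fast_gelu(x: float) -> float:
    # The common tanh-based Gelu approximation, written with the exp form.
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + tanh_via_exp(c * (x + 0.044715 * x ** 3)))
```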

* Add Status.csv to EP Perf Tool (#6167)

* merge master, keep postprocess status commit

* download float16.py everytime

* removing hardcoded values

* Lochi/quantization tool for trt (#6103)

* Initial implementation of generating calibration dynamic range table

* Initialize validation support for Quantization

* Initialize validation support for Quantization (cont.)

* Improve validation support for Quantization

* Improve validation support for Quantization

* Rewrite/Refine for calibration and validation

* Rewrite/Refine for calibration and validation (cont.)

* Refine code

* Refine code

* Add data reader for BERT

* Add flatbuffers to serialize calibration table

* Refine code and add BERT evaluation

* Refine the code

* minor modification

* Add preprocess/postprocess of vision team yolov3 and refine the code

* Update annotation

* Make bbox coordinates more accurate

* Fix bug

* Add support of batch processing

* Batch processing for model zoo yolov3

* Add batch inference for evaluation

* Refine the code

* Add README

* Add comments

* Refine the code for PR

* Remove batch support checking in data_reader and refine the code

* Refine the code for PR

* Refine the code for PR review

Co-authored-by: Olivia Jain <oljain@microsoft.com>

* Implement ScatterND for CUDA EP (#6184)

* Condition fix in Resize operator (#6193)

* Clean up checkpoint tests to use the new checkpoint functions (#6188)

* add deprecation warning for old checkpoint functions

* update all the distributed checkpoint tests to use new checkpoint functions

* Implement comparing outputs that are sequence of maps of strings to floats (#6180)

* Implement conversion from ortvalue to Itensor for string tensors and comparing sequence of maps of strings to floats

* PR comments

* Dockerfile to build onnxruntime with ROCm 4.0

* Add ability to skip GPU tests based on GPU adapter name (#6198)

* Implement conversion from ortvalue to Itensor for string tensors and comparing sequence of maps of strings to floats

* PR comments

* Add ability to skip gpu tests according to adapter description

* spacing

* spacing

* spacing

* Openvino ep 2021.2 (#6196)

* Enabling fasterrcnn variant and vehicle detector

* changes for 2021_2 branch

* yolov3_pytorch commit

* fixed braces in basic_backend.cc

* ci information added

* faster rcnn variant and vehicle detector changes were made in 2021.1 and not in 2021.2

* some changes to support unit tests

* disable some tests which are failing

* fix myriad tests for vehicle detector

* Did some cleanup
*cleaned up comments
*Disabled Add_Broadcast_0x1 and Add_Broadcast_1x0
tests on MYRIAD_FP16 backend due to a bug
*cleaned up capability_2021_2.cc file
*Removed extra conditions which were added
for some validation in backend_utils

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* yolov3 pytorch workaround to ensure that the output names are matched

* gemmoptest fixed on myriad

* Fixed MYRIADX CPP Test Failures

*Expand,GatherND,Range,Round op's
are only supported in model

*where op with float input data
types are not supported and fixed

*Scatter and ScatterElements op's with
negative axis are fixed

*Reshape op with 0 dim value are not
supported and fixed

*Disabled InstanceNorm_2 test on MYRIADX

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* make changes to yolov3 pytorch

* Fixed python unit tests
*Fixed failing python tests on vpu,
GPU and CPU

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fixes POW op failures on GPU_FP16

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Clean up capability_2021_2.cc

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Updated docs for MultiThreading option
*Added extra info on setting the num_of_threads
option using the API and it's actual usage

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* fixed slice and removed extra prints

* Disabled failing python tests

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Minor changes added in capabilty_2021_2

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* made changes to slice to avoid failures

* Disabling FP16 support for GPU_FP32
->Inferencing an FP16 model on GPU_FP32
leads to accuracy mismatches. so, we would
rather use GPU_FP16 to infer an FP16 model
on GPU Device

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Updated docs for Inferencing a FP16 Model

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* fix for mask rcnn

* Script for installing openvino from source

* Updated with openvino 2021.2 online installation

* code comment fixes
fixed accuracy mismatch for div

* Update OpenvinoEP-ExecutionProvider.md

updated for 2021.2 branch

* Update README.md

updated dockerfile documentation

* Update BUILD.md

build.md update documentation

* permissiong change of install_openvino.sh

* made changes to align with microsoft onnxruntime changes

* Updated with ov 2021.2.200

Co-authored-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
Co-authored-by: sfatimar <sahar.fatima@intel/com>
Co-authored-by: MaajidKhan <n.maajidkhan@gmail.com>
Co-authored-by: mohdansx <mohdx.ansari@intel.com>

* Fix a memory leak in test_inference.cc (#6201)

* Fix a memory leak in test_inference.cc

* Use TArray in AMD element-wise kernels, rather than manually copying memory to device.

* Remove most ROCm-specific element-wise code and reuse CUDA element-wise code.

* Minor change to improve performance for operator Pad. (#5537)

* small improvment for pad

* Support double for operators Log, Reciprocal, Sum (CPU) (#6032)

* Support double for operators Log, Reciprocal, Sum
* remove test erf_double

* Support double for operators Where, LpNormalisation (#6034)

* Support double for operators Relu, Tanh, Sigmoid (#6221)

* Fix ImportError in build.py (#6231)

There is a possible ImportError where build.py can import the wrong 'util' package if there are others present in `sys.path` already
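
The usual fix for this class of bug, sketched here in generic form (the helper name is illustrative, not the actual build.py change), is to put the script's own directory at the front of `sys.path` so its local `util` package shadows any same-named package elsewhere:

```python
import sys

def prefer_local_packages(script_dir: str) -> None:
    """Put script_dir first on sys.path so packages next to the script
    (e.g. a local 'util') win over same-named packages found elsewhere."""
    if script_dir in sys.path:
        sys.path.remove(script_dir)  # keep the list free of duplicates
    sys.path.insert(0, script_dir)
```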

* Removed executor todo that looks dead. (#6234)

* Remove MKLML/openblas/jemalloc build config (#6212)

* Remove python 3.5

* Update the readme file

* Upgrade build.py to assert for python 3.6+

Upgrade build.py to assert for python 3.6+
as Python 3.5 can no longer build today's master.

* Support MLFloat16 type in Pow opset-12 CUDA kernel (#6233)

* MLAS: handle MlasGemm(M/N/K==0) cases (#6238)

* Support double for operator TopK + fix one bug in TopK implementation for GPU for double (#6220)

* Support double for operator TopK
* add static classes for topk/double
* fix cast issue in topk

* Support double for operator Gemm + fix bug in gemm implementation for cuda, rocm when sizeof(type) != sizeof(float) (#6223)

* Support double for operator Gemm
* fix type size while copying data in gemm operator for GPU
* fix type in gemm implementation for rocm

* Support double for operator ReduceMean, ReduceLogSumExp (#6217)

* Support double for operators ReduceMean, ReduceLogSumExp

* Support double for operator ArgMin (#6222)

* Support double for operator ArgMin
* add test specifically for double
* add new test on pai-excluded-tests.txt

* Update BUILD.md

* Update manylinux docker image to the latest (#6242)

* Fix allocator issue for TensorRT IOBinding (#6240)

* Fix issue: https://github.com/microsoft/onnxruntime/issues/6094

Root cause: we didn't expose the OrtMemoryInfo for TRT, so it caused issues when users wanted to use IOBinding with TensorRT.

Short-term fix: add the OrtMemoryInfo for TRT. Long term, we should unify the allocators for CUDA and TRT.

* Tune BiasGeluGradDx kernel in approximation mode to avoid tanh(...) on Rocm (#6239)

* bias gelu grad use exp(...) instead

* update cuda to rocm

* missing semicolon

* comment

* remove dockerfile

* missing factor of two

* Refactor EP Perf Tool  (#6202)

* merge master, keep postprocess status commit

* download float16.py everytime

* using variables to reference eps

* adding ACL EP to ep perf tool

* accuracy with absolute tolerance configurable

* add acl to dict + remove commented line

* Documentation for distributed CI tests pipeline (#6140)

* Remove a debug log in provider_test_utils.cc (#6200)

* Add the Concat Slice Elimination transform, fix constant_folding transform (#5457)

* Add concat slice transform + test

* Cosmetic improvements in concat slice transform

* Remove unrelated file, fix comment, fix constant folding bug

* Add test onnx graph

* fix windows build

* Review comments

* review comment

Co-authored-by: Aishwarya <aibhanda@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>

* Add MakeStringLite which uses current locale, update some MakeString call sites to use it instead. (#6252)

* Add MakeStringLite which uses current locale, update macros to use that to generate messages.

* Convert calls to MakeStringLite().

* Liqun/speech model loop to scan (#6070)

Provide a tool to convert Loop to Scan for Nuphar performance
Fix Nuphar CI pipeline failures.

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>

* model parallel refinement (#6244)

* Megatron transformation as a separate step

* remove useless header

* clang formatting

* Re-structure megatron transformer for subsequent changes

* fix  comments

* Allow querying a GraphProto's doc_string as part of ModelMetadata (#6248)

* Fix Linux/Mac error message on input type mismatch (#6256)

* add bfloat16 to gathergrad type constrains (#6267)

Co-authored-by: Cheng Tang <chenta@microsoft.com>

* Fix VS 2017 build break (#6276)

* Deprecate Python global configuration functions [Part 2] (#6171)

Update Python API to allow more flexibility for setting providers and provider options.

The providers argument (InferenceSession/TrainingSession constructors, InferenceSession.set_providers()) now also accepts a tuple of (name, options dict).
Fix get_available_providers() API (and the corresponding function in the C API) to return the providers in default priority order. Now it can be used as a starting point for the providers argument and maintain the default priority order.
Convert some usages of the deprecated global configuration functions to use EP-specific options instead.

Update some EP-specific option parsing to fail on unknown options.

Other clean up.
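
The accepted forms of the providers argument can be illustrated with a small sketch (a hypothetical helper, not ORT's actual code) that normalizes a mixed list of names and (name, options) tuples into parallel lists:

```python
def normalize_providers(providers):
    """Accept each provider entry as either a name string or a
    (name, options_dict) tuple, returning parallel name/options lists."""
    names, options = [], []
    for entry in providers:
        if isinstance(entry, (tuple, list)):
            name, opts = entry
        else:
            name, opts = entry, {}
        if not isinstance(opts, dict):
            raise ValueError(f"options for {name!r} must be a dict")
        names.append(name)
        options.append(dict(opts))
    return names, options
```

This mirrors calls such as `InferenceSession(model, providers=[("CUDAExecutionProvider", {"device_id": "0"}), "CPUExecutionProvider"])`, where entries without options simply get empty option dicts.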

* Add script to preprocess python documentation before publishing (#6129)

* add script to preprocessing python documentation before publishing

* rename past to past_key_values for GPT-2 (#6269)

rename past to past_key_values for transformers 4.*

* Rename MakeString and ParseString functions. (#6272)

Rename MakeString to MakeStringWithClassicLocale, MakeStringLite to MakeString, *ParseString to *ParseStringWithClassicLocale.
Add missing pass-through versions of MakeStringWithClassicLocale for string types.

* Increase timeout for Linux GPU CUDA11 build. (#6280)

* Add helper to compare model with different precision (#6270)

* add parity_check_helper.py

* add real example

* remove lines

* Fix Min/Max CPU kernels for float16 type (#6205)

* fix data_ptr assertion error for past_sequence_length=0 in GPT-2 (#6284)

fix IO binding crash for past_sequence_length=0

* A list of changes in transformers tool (#6224)

* longformer fp16 e2e

* add fp16/fp32 parity check helper file

* excludes nodes with subgraph in profiling

* use onnxconverter_common to do fp32->fp16

* add version check for onnxconverter_common

* remove helper file

* add pkg installation on notebooks and script

* Workaround for static_cast<double>(half)

* Add workaround to remove ROCm-specific binary-elementwise files.

* Update nuget build (#6297)

1. Update the ProtoSrc path. The old one is not used anymore.
2. Regenerate OnnxMl.cs
3. Delete some unused code in tools/ci_build/build.py
4. Avoid setting intra_op_param.thread_pool_size in ModelTests in the OpenMP build.
5. Fix a typo in the C API pipeline.

* Enable ONNX backend test of SequenceProto input/output  (#6043)

* assert sequence tensor and remove skips

* update testdata json

* use ONNX 1.8 in cgmanifest.json

* use previous commit to workaround

* update ONNX commit ID in docker

* skip test_maxpool_2d_dilations test for now

* update function name

* add --sequence_lengths option (#6285)

* more dtype for Equal CUDA kernel (#6288)

Co-authored-by: Vincent Wang <weicwang@microsoft.com>

* Force reinstall onnx python package on Windows (#6309)

* update transformers required package versions (#6315)

* Remove abs in LpPool (#6303)

* Support 1D input for Conv + Mul/Add fusion optimizer with test (#6295)

* Support 1D input (N C H) for Conv + Mul/Add fusion optimizer with test cases and test models.

* Add longformer to  python package (#6314)

* add longformer to python package
* move test related script and data to a new folder

* Avoid false sharing on thread pool data structures (#6298)

Description: This change adds alignment and padding to avoid false sharing on fields in the thread pool. It also adds a new microbenchmark to profile thread-pool performance over short loops.

Motivation and Context
MobileNet on a 2*12-core system showed a performance gap between the ORT thread pool and OpenMP. One cause appeared to be false sharing on fields in the thread pool: ThreadPoolParallelSection::tasks_finished (which the main thread spins on waiting for workers to complete a loop), and the RunQueue::front_ and back_ fields (used respectively by the worker thread and the main thread).

The additional micro-benchmark BM_ThreadPoolSimpleParallelFor tests performance of loops of different sizes at different thread counts. The results below are on a machine with 2*14-core processors (E5-2690 v4) running with 1, 14, 15, and 28 threads. For each test, the microbenchmark has N threads run a loop with N iterations; hence a perfect result is for the time taken to be constant as additional threads are added (although we will also see power management effects helping at very low thread counts). The loop durations (100000, 10000, 1000) correspond roughly to 200us, 20us, and 2us on this machine.

Before change:
BM_ThreadPoolSimpleParallelFor/1/1/100000/real_time 17153 us 17154 us 32
BM_ThreadPoolSimpleParallelFor/14/14/100000/real_time 22553 us 22553 us 30
BM_ThreadPoolSimpleParallelFor/15/15/100000/real_time 21521 us 21521 us 29
BM_ThreadPoolSimpleParallelFor/28/28/100000/real_time 24111 us 24111 us 24
BM_ThreadPoolSimpleParallelFor/1/1/10000/real_time 1719 us 1719 us 407
BM_ThreadPoolSimpleParallelFor/14/14/10000/real_time 3409 us 3409 us 200
BM_ThreadPoolSimpleParallelFor/15/15/10000/real_time 3541 us 3541 us 201
BM_ThreadPoolSimpleParallelFor/28/28/10000/real_time 4576 us 4576 us 151
BM_ThreadPoolSimpleParallelFor/1/1/1000/real_time 174 us 174 us 4017
BM_ThreadPoolSimpleParallelFor/14/14/1000/real_time 1586 us 1586 us 402
BM_ThreadPoolSimpleParallelFor/15/15/1000/real_time 1586 us 1586 us 397
BM_ThreadPoolSimpleParallelFor/28/28/1000/real_time 2864 us 2864 us 232

After change:
BM_ThreadPoolSimpleParallelFor/1/1/100000/real_time 17160 us 17160 us 33
BM_ThreadPoolSimpleParallelFor/14/14/100000/real_time 20989 us 20989 us 31
BM_ThreadPoolSimpleParallelFor/15/15/100000/real_time 22286 us 22286 us 31
BM_ThreadPoolSimpleParallelFor/28/28/100000/real_time 24631 us 24631 us 25
BM_ThreadPoolSimpleParallelFor/1/1/10000/real_time 1718 us 1718 us 407
BM_ThreadPoolSimpleParallelFor/14/14/10000/real_time 2868 us 2868 us 242
BM_ThreadPoolSimpleParallelFor/15/15/10000/real_time 2907 us 2907 us 240
BM_ThreadPoolSimpleParallelFor/28/28/10000/real_time 3872 us 3872 us 186
BM_ThreadPoolSimpleParallelFor/1/1/1000/real_time 175 us 175 us 3938
BM_ThreadPoolSimpleParallelFor/14/14/1000/real_time 933 us 933 us 659
BM_ThreadPoolSimpleParallelFor/15/15/1000/real_time 912 us 912 us 591
BM_ThreadPoolSimpleParallelFor/28/28/1000/real_time 1976 us 1976 us 317

* fix opset imports for function body  (#6287)

* fix function opsets

* add tests and update onnx

* changes per review comments

* add comments

* plus updates

* build fix

* Remove false positive prefast warning from threadpool (#6324)

* Java: add Semmle to Java publishing pipelines (#6326)

Add Semmle to Java API pipeline
  Add security results publishing and add Java GPU.

* Quantization support for split operator with its NHWC support (#6107)

* Make split working for quantization.

* NHWC transformer support for split operator

* Refactor some according to Feedback. Will add test cases soon.

* Fix build error on windows.

* Add test case for split op on uint8_t support

* Add nhwc_transformer_test for split uint8_t support

* Some change according to PR feedbacks.

* Liqun/enable pipeline parallel test (#6331)

enable pipeline parallel test
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>

* Use onnxruntime_USE_FULL_PROTOBUF=OFF for the cuda execution provider (#6340)

This removes a special case of the cuda EP.

* MLAS: add fallback implementation for quantized GEMM (#6335)

Add a non-vectorized version of the kernel used for the quantized version of MlasGemm.

* Delete float16.py (#6336)

No longer needed. Also doesn't pass policheck.

* Enable add + softmax fusion for Rocm platform (#6259)

* add bias softmax; tests appear to pass

* check fusion occurs for rocm as well

* check for rocm provider compatible as well

* build for cpu scenario as well

* try again; broader scope

* proper scope on kGpuExecutionProvider

* been editing wrong file

* remove commented #include lines

* try again due to mac os ci error

* try again

* test fusion both cuda and rocm to avoid mac ci error

* add external data support to tensor proto utils (#6257)

* update unpack tensor utilities to support loading external data

* more updates

* fix test

* fix nuphar build

* minor build fix

* add tests

* fix Android CI

* fix warning

* fix DML build failure and some warnings

* more updates

* more updates

* plus few updates

* plus some refactoring

* changes per review

* plus some change

* remove temp code

* plus updates to safeint usage

* build fix

* fix for safeint

* changed wording. (#6337)

* Remove OpSchema dummy definition. Only needed for Function now, and we can just exclude the method in Function (#6321)

* remove gemmlowp submodule (#6341)

* [NNAPI] Add pow support (#6310)

* Add support for running Android emulator from build.py on Windows. (#6317)

* fix the pipeline failure (#6346)

* Train BERT Using BFloat16 on A100 (#6090)

* traing bert using bf16

* Adam support bf16

* bugfix

* add fusedmatmul support

* fix after merge from master.

* bugfix

* bugfix after merge from master

* fast reduction for bf16.

* resolve comments

* fix win build

* bugfix

* change header file.

Co-authored-by: Vincent Wang <weicwang@microsoft.com>

* Fix DerefNullPtr issues raised by SDLNativeRules. (#6348)

* update quantize to support basic optimization and e2e example for image classification (#6313)

Update the resnet50-v1 model to the standard one from the ONNX model zoo.
Add an example for mobilenet.
Run basic optimization before quantization.
Fix a bug in Clip.

* Enable graph save for orttrainer (#6333)

* Enable graph save for orttrainer

* Fix CI

* Update orttraining/orttraining/python/training/orttrainer_options.py

* Update orttraining/orttraining/python/training/orttrainer_options.py

* Update orttraining/orttraining/python/training/orttrainer_options.py

* Update orttraining/orttraining/python/training/orttrainer_options.py

* Update orttraining/orttraining/python/training/orttrainer_options.py

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>

* Add PREfast to python packaging pipeline (#6343)

* Add PREfast to python packaging pipeline

* fix longformer benchmark io_binding output_buffers (#6345)

* fix longformer benchmark io_binding output_buffers

* format

* import benchmark_helper from parent directory.

* Use readelf for minimal build binary size checks. (#6338)

* Use readelf for minimal build binary size checks.
The on-disk size grows in 4KB chunks, which makes it hard to see how much growth an individual check-in causes.
The only downside is that the sum of the sections is larger than the on-disk size (presumably things get packed more tightly on disk and some of the section alignment constraints can be ignored)

* Remove unused function

* Java: Set C language warnings to W4 and adjust JNI code (#6347)

Set /W3 for C language and fix up JNI warnings.

* Pipeline Parallel Experimental Python API (#5815)

* Add create session to WinML telemetry to track WinML Usage (#6356)

* Fix one more SDL warning (#6359)

* fix -Wdangling-gsl (#6357)

* Add python example of TensorRT INT8 inference on ResNet model (#6255)

* add trt int8 example on resnet model

* Update e2e_tensorrt_resnet_example.py

* remove keras dependency and update class names

* move ImageNetDataReader and ImageClassificationEvaluator to tensorrt resnet example

* simplify e2e_tensorrt_resnet_example.py

* Update preprocessing.py

* merge tensorrt_calibrate

* Update calibrate.py

* Update calibrate.py

* generalize calibrate

* Update calibrate.py

* fix issues

* fix formating

* remove augment_all

* This added telemetry isn't needed (#6363)

* Wezuo/memory analysis (#5658)

* merged alloc_plan

* pass compilation

* Start running, incorrect allocation memory info

* add in comments

* fix a bug of recording pattern too early.

* debugging lifetime

* fix lifetime

* passed mnist

* in process of visualization

* Add code to generate chrome trace for allocations.

* in process of collecting fragmentation

* before rebuild

* passed mnist

* passed bert tiny

* fix the inplace reuse

* fix the exception of weight in pinned memory

* add guards to ensure the tensor is in AllocPlan

* add customized profiling

* debugging

* debugging

* fix the reuse of different location type

* add rank

* add the rank

* add fragmentation

* add time_step_trace

* Add summary for each execution step (total bytes, used/free bytes).

* add top k

* change type of top k parameter

* remove prints

* change heap to set

* add the name pattern

* add the usage for pattern

* add partition

* change to static class

* add custom group

* remove const

* update memory_info

* in process of adding it as runtime config

* change the memory profiling to be an argument

* add some comments

* add checks to record memory_info in training session

* set the "local rank setting" to correct argument.

* addressing comments

* format adjustment

* formatting

* remove alloc_interval

* update memory_info.cc to skip session when there is no tensor for a particular memory type

* fix memory_info multiple iteration seg-fault

* consolidate mainz changes

* fixed some minor errors

* guard by ORT_MINIMAL_BUILD

* add ORT_MEMORY_PROFILE flag

* added compiler flag to turn on/off memory profiling related code

* clean up the code regarding comments

* add comments

* revoke the onnx version

* clean up the code to match master

* clean up the code to match master

* clean up the code to match master

Co-authored-by: Jesse Benson <benson.jesse@gmail.com>
Co-authored-by: Wei Zuo <wezuo@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: wezuo <wezuo@az-eus-v100-32gb-5-worker-mgtbby.eastus.cloudapp.azure.com>
Co-authored-by: wezuo <wezuo@az-eus-v100-32gb-5-worker-yclzsf.eastus.cloudapp.azure.com>

* Support MLFloat16 in CumSum Cuda op for Opset 14 (#6355)

* Add CumSum-14 for Cuda

* fix convert_common version retrieval (#6382)

* Refine auto_pad based pad computation in ConvTranspose (#6305)

* Fix SDL warning (#6390)

* Add max_norm for gradient clipping. (#6289)

* add max_norm as user option for gradient clipping

* add adam and lamb test cases for clip norm

* add frontend tests
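
The max_norm option corresponds to standard global-norm gradient clipping. A minimal numpy sketch of that technique (an illustration only, not ORT's kernel; `clip_by_global_norm` is a hypothetical helper name):

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Scale all gradients so their combined L2 norm does not exceed max_norm."""
    total_norm = np.sqrt(sum(float(np.sum(g * g)) for g in grads))
    # Scale factor is 1.0 when the norm is already within the bound.
    scale = min(1.0, max_norm / (total_norm + 1e-12))
    return [g * scale for g in grads], total_norm

# A gradient set with global norm 5 gets scaled down to norm 1.
clipped, norm = clip_by_global_norm([np.array([3.0]), np.array([4.0])], max_norm=1.0)
```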

* Add the custom op project information (#6334)

* Dont use default string marshalling in C# (#6219)

* Fix Windows x86 compiler warnings in the optimizers project  (#6377)

* [Perf] Optimize Tile CPU and CUDA kernels for a corner case (#6376)

* Unblock Android CI code coverage failure (#6393)

* fix build on cuda11 (#6394)

Co-authored-by: Vincent Wang <weicwang@microsoft.com>

* Load the model path correctly (#6369)

* Fix some compile warnings (#6316)

* OpenVino docker file changes to bypass privileged mode

Description: Builds and installs libusb without UDEV support, which is used for communicating with the VPU device.

Motivation and Context

This enables the resulting docker container to be run without '--privileged' and '--network host' options which may not be suitable in deployment environments.

* Megatron checkpointing (#6293)

* Add bart fairseq run script

* Add frontend change to enable megatron

* Initial changes for checkpointing

* Megatron optim state loading, checkpoint aggregation, frontend distributed tests for H, D+H

* Add load_checkpoint changes

* Fix CI

* Cleanup

* Fix CI

* review comments

* review comments

* review comments:

* Fix generate_submodule_cgmanifest.py Windows issues. (#6404)

* Continue memory planning when unknown shape tensor is encountered. (#6413)

* Reintroduce experimental api changes and fix remote build break (#6385)

Co-authored-by: Ori Levari <orlevari@microsoft.com>

* Add support for custom ops to minimal build. (#6228)

* Add support for custom ops to minimal build.
Cost is only ~8KB, so it is included in the base minimal build.

* enable pipeline to run quantization tests (#6416)

* enable pipeline to run quantization tests
setup test pipeline for quantization

* Minor cmake change (#6431)

* Liqun/liqun/enable pipeline parallel test2 (#6399)

* enable data and pipeline parallism test

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>

* Farewell TrainableDropout (#5793)

* Deprecate TrainableDropout kernel.

* Update bert_toy_postprocessed.onnx to opset 12.

* Add more dropout tests.

* Fix BiasDropout kernel.

Co-authored-by: Ubuntu <OrtTrainingDev3@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Sergii Dymchenko <sedymche@microsoft.com>

* fix null dereference warning (#6437)

* Expose graph ModelPath to TensorRT shared library (#6353)

* Update graph_viewer.cc

* Update tensorrt_execution_provider.cc

* Update graph_viewer.h

* Update tensorrt_execution_provider.cc

* Update tensorrt_execution_provider.cc

* Update provider_api.h

* Update provider_bridge_ort.cc

* Update provider_interfaces.h

* Update provider_interfaces.h

* expose GraphViewer ModelPath API to TRT shared lib

* add modelpath to compile

* update

* add model_path to onnx tensorrt parser

* use GenerateMetaDefId to generate unique TRT kernel name

* use GenerateMetaDefId to generate unique TRT engine name

* fix issue

* Update tensorrt_execution_provider.cc

* remove GetVecHash

* Update tensorrt_execution_provider.h

* convert wchar_t to char for tensorrt parser

* update tensorrt parser to include latest changes

* fix issues

* Update tensorrt_execution_provider.cc

* merge trt parser latest change

* add PROVIDER_DISALLOW_ALL(Path)

* add tool for generating test data for longformer (#6415)

* only build experimental api in redist (#6465)

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>

* Add an option to save the training graph after optimization (#6410)

* expose optimized_model_filepath in SessionOptions as `debug.graph_save_paths.model_with_training_graph_after_optimization_path` in `ORTTrainerOptions`

* Share allocator between CUDA EP & TRT EP. (#6332)

* Share allocator between CUDA EP & TRT EP.
limitation:
1. Does not cover the per-thread allocator created by the CUDA EP; still need to figure out how to remove it
2. Need more identifiers to make it possible to share the CPU allocator across all EPs

* fix max norm clipping test in python packaging pipeline test (#6468)

* fix python packaging pipeline

* make clip norm test compatible with both V100 and M60 GPUs

* Initial version of CoreML EP (#6392)

* Bug 31463811: Servicing: Redist (Nuget) conflicts with Microsoft.AI.MachineLearning starting 21H1+ (#6460)

* update load library code to have the fully qualified path

* make it work for syswow32

* git Revert "make it work for syswow32"

This reverts commit b9f594341b7cf07241b18d0c376af905edcabae3.

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>

* dequantize 1st input of lstm back if it is quantized (#6444)

* [java] Adds support for OrtEnvironment thread pools (#6406)

* Updates for Gradle 7.

* Adding support for OrtThreadingOptions into the Java API.

* Fixing a typo in the JNI code.

* Adding a test for the environment's thread pool.

* Fix cuda test, add comment to failure.

* Updating build.gradle

* fix SDL native rule warning #6246 (#6461)

* fix SDL rule (#6464)

* use tickcount64 (#6447)

Co-authored-by: Ori Levari <orlevari@microsoft.com>

* Update pypi package metadata (#6354)

* Update setup file data

* add missing comma

* remove python 3.5

* fix typo bracket

* Delete nuget extra configs (#6477)

* Op kernel type reduction infrastructure. (#6466)

Add infrastructure to support type reduction in Op kernel implementations.
Update Cast and IsInf CPU kernels to use it.

* Fixing a leak in OnnxSequences with String keys or values. (#6473)

* Increase the distributes tests pipeline timeout to 120 minutes (#6479)

* [CoreML EP] Add CI for CoreML EP (macOS) and add coreml_flags for EP options (#6481)

* Add macos coreml CI and coreml_flags

* Move save debugging model to use environment var

* Move pipeline off from macos CI template

* Fix an issue building using unix make, add parallel to build script

* Fixed build break for shared_lib and compile warning

* Fix a compile warning

* test

* Revert the accidental push from another branch

This reverts commit 472029ba25d50f9508474c9eeceb3454cead7877.

* Add ability to track per operator types in reduced build config. (#6428)

* Add ability to generate configuration that includes required types for individual operators, to allow build size reduction based on that.
  - Add python bindings for ORT format models
    - Add script to update bindings and help info
  - Add parsing of ORT format models
  - Add ability to enable type reduction to config generation
  - Update build.py to only allow operator/type reduction via config
    - simpler to require config to be generated first
    - can't mix a type aware (ORT format model only) and non-type aware config as that may result in insufficient types being enabled
  - Add script to create reduced build config
  - Update CIs

* merge e2e with distributed pipeline (#6443)

merge e2e with distributed pipeline

* Fix test breaks in Windows ingestion pipeline (#6476)

* fix various build breaks with Windows build

* fix runtime errors loading libraries from system32

* add build_inbox check to winml_test_common

* use raw string

* cleanup

* fix dll load

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>

* Speed up the Mac CI runs (#6483)

* expose learningmodelpixelrange property (#5877)

* Fix of support api version bug for [de]quantize (#6492)

* SDL fixes: add proper casts/format specifiers (#6446)

* SDL annotation fixes (#6448)

Co-authored-by: Ori Levari <orlevari@microsoft.com>

* [OpenVINO-EP] Remove support for OpenVINO 2020.2 (#6493)

* Removed OpenVINO 2020.2 support

* Updated documentation and build.py

* Removed unnecessary libraries from setup.py

* Support pad operator in quantization and quantized nhwc transformer. Fix Pad operator bug. (#6325)

Support pad operator in the quantization tool.
Support pad operator in the quantized NHWC transformer.
Fix a Pad operator bug: when the pad input's inner-most (right-most) axis value is zero in Edge and Reflect modes, the wrong value was copied into the cells to be padded. Note that Constant mode does not trigger this bug, since Edge/Reflect must copy values from the already-padded array while Constant mode only fills a specified value.
Add more test cases to cover the Pad operator bug fixed here.
Fix a quantization-tools uint8/int8 value overflow issue when quantizing weights in Python.
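
The Edge/Reflect semantics the fix restores can be illustrated with numpy's reference pad behavior (a sketch for illustration only; the ORT fix itself is in the C++ Pad kernel). The inner-most axis here has zero padding, which was the triggering case:

```python
import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6]])

# Pad only the outer axis; the inner-most (right-most) axis gets zero
# padding, which is the case that triggered the bug.
edge = np.pad(x, ((1, 1), (0, 0)), mode="edge")        # repeats the border rows
reflect = np.pad(x, ((1, 1), (0, 0)), mode="reflect")  # mirrors around the border rows
```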

* Improve work distribution for Expand operator, and sharded LoopCounter configuration (#6454)

Description: This PR makes two changes identified while looking at a PGAN model.

First, it uses ThreadPool::TryParallelFor for the main parallel loops in the Expand operator. This lets the thread pool decide on the granularity at which to distribute work (unlike TrySimpleParallelFor). Profiling showed high costs when running "simple" loops with 4M iterations each of which copied only 4 bytes.

Second, it updates the sharded loop counter in the thread pool so that the number of shards is capped by the number of threads. This helps make the performance of any other high-contention "simple" loops more robust at low thread counts by letting each thread work on its own "home" shard for longer.

Motivation and Context

Profiling showed a PGAN model taking 2x+ longer with the non-OpenMP build. The root cause was that the OpenMP build uses simple static scheduling of loop iterations, while the non-OpenMP build uses dynamic scheduling. The combination of large numbers of tiny iterations is less significant with static scheduling --- although still desirable to avoid, given that each iteration incurs a std::function invocation.
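
The capped shard count can be sketched in Python (illustration only; ORT's LoopCounter is lock-free C++, while `ShardedCounter` here is a hypothetical name using locks for simplicity):

```python
import threading

class ShardedCounter:
    """Counter split across shards to reduce contention. The shard count is
    capped by the number of worker threads so that, at low thread counts,
    each thread keeps hitting its own "home" shard for longer."""

    def __init__(self, num_threads, max_shards=64):
        self.num_shards = max(1, min(num_threads, max_shards))
        self.counts = [0] * self.num_shards
        self.locks = [threading.Lock() for _ in range(self.num_shards)]

    def add(self, thread_id, n=1):
        shard = thread_id % self.num_shards  # home shard for this thread
        with self.locks[shard]:
            self.counts[shard] += n

    def total(self):
        return sum(self.counts)
```

With 4 threads the counter creates only 4 shards, so each thread owns one shard outright instead of bouncing between many mostly-idle shards.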

* Update document of transformer optimization (#6487)

* nuphar test to avoid test data download to improve passing rate (#6467)

nuphar test to avoid test data download to improve passing rate

* Fuse cuda conv with activation (#6351)

* optimize cuda conv by fused activation

* remove needless print out

* exclude test from cpu

* handle status error from cudnn 8.x

* add reference to base class

* add hipify

* [CoreML EP] Add support for some activations/Transpose, move some shared helpers from NNAPI to shared space (#6498)

* Init change

* Move some helper from nnapi ep to shared

* Add transpose support

* Fix trt ci build break

* Refine transformers profiler output (#6502)

* output nodes in the original order; grouped by node name
* add document for profiler

* Update to match new test setup. (#6496)

* Update to match new test setup.

* Add Gemm(7) manually for now.
Will fix properly on Monday. It's used by mnist.ort, which is created by optimizing mnist.onnx to level 1, causing 2 nodes to be replaced by a Gemm and the op to be missing from the required list, since that list is created using the original onnx model.

* Enable dense sequence optimized version of Pytorch exported BERT-L on AMD GPU (#6504)

* Permit dense seq optimization on BERT-L pytorch export by enabling ReduceSumTraining, Equal, and NonZero on AMD

* enable Equal tests

* enable fast_matrix_reduction test case

* Optimize GatherGrad for AMD GPU (#6381)

* optimize gathergrad

* address comments

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>

* add explicit barriers for buffer overread and overrwrite (#6484)

Co-authored-by: Ori Levari <orlevari@microsoft.com>

* fix sdl bugs for uninitialized variables and returns (#6450)

Co-authored-by: Ori Levari <orlevari@microsoft.com>

* handle hr error conditions (#6449)

Co-authored-by: Ori Levari <orlevari@microsoft.com>

* Dnnl training (#6045)

* Add ReluGrad and ConvGrad ops for the dnnl provider

* The mnist sample is updated to add the --use_dnnl option, which
causes the sample to use the dnnl execution provider for
nodes that exist in the dnnl provider.

* Added the ability to find forward ops. Dnnl backward gradient
ops require the forward primitive description and workspace
from the forward operation.

* Enable specifying the execution provider for Gradient Checker Tests

* Prevent memory leak when running dnnl_provider in training mode

Prevent creating a SubgraphPrimitivePool when the code is built with the
ENABLE_TRAINING build flag. Instead create a SubgraphPrimitive directly.

The SubgraphPrimitivePool was causing a pool of SubgraphPrimitives to be
stashed in a map for reuse. Due to the way the Training Loop uses threads
the pool of SubgraphPrimitives were not being reuse instead a new pool of
SubgraphPrimitives being created each run. The old pool was not instantly
freed. This behavior could be a language error when using thread_local
memory.

Signed-off-by: George Nash <george.nash@intel.com>

* Added fixes to maxpoolgrad and memory leak.

Maxpoolgrad will now pass all unit tests.
With the conv and convgrad disabled for dnnl, mnist is able to train till 95%

Signed-off-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>

* Fixed misc issues when testing training code with dnnl provider

* fix conv_grad dnnl tests with dilation to run dnnl execution provider

* update mnist training sample to accept convolution type models

  convolution models require the input shape to be {1, 28, 28}
  instead of the flat {784} image that is used for the gemm models

  this will enable models that require the different shape by adding
 `--model_type conv` to the command line when running the mnist sample.
 (while testing a workaround was used see #4762)

* Disable weight caching in dnnl conv operator when using training

  When training we can not use cached weights because the weight
  will be updated each run. This re-enables dnnl Conv and ConvGrad Ops.
  The weight caching was the source of the error from Conv when training.

* Fix issues found when building grad ops on Linux
  * The dnnl_convgrad code was over using the scope operator
    causing a compilation problem.
  * The dnnl_maxpoolgrad code had a logic error that is was
    comparing with the source description when it should have
    been comparing with the destination despription.

* Update BUILD.md so it shows DNNL for training
  * Updated the table of contents. Since the same providers
    are listed twice (once for Inference and again for Training),
    an HTML anchor was added to distinguish the second header
    from the first for the TOC.

* Fix build failure when not using --enable-training build option

* reorganize the gradient operators so they are grouped together

* Fix issues found when running onnx_backend_test_series.py

* Pooling code only supports 2 outputs when built with --enable-training

* Address code review feedback
  * class member variables end in underscore_
  * use dst instead of dist to match pattern use elsewhere in DNNL code.

* Remove workaround that was introduced to handle problems running
  convolution based training models. See issue #4762

Signed-off-by: George Nash <george.nash@intel.com>

* Isolate training code and code cleanup

* Do not build dnnl_gpu_runtime if enable_training is set; training code
  does not support dnnl_gpu_runtime yet.
* Isolated training code inside ifdefs so that it won't affect the
  project if built without training enabled
* Inadvertent changes in whitespace were removed to make code review simpler
* Undid some code reordering that was not needed
* comments added to closing #endif statements to simplify reading complex ifdefs
* Modified the GetPrimitiveDesc functions to return shared_ptr instead of raw
  pointer. This matches what was done in Pool code and is safer memory code.

Signed-off-by: George Nash <george.nash@intel.com>

* Address code review issues

- whitespace changes caused by running clang-format on the code
- Several spelling errors fixed
- Removed/changed some ifdefs to improve readability
- other misc. changes in response to code review.

Signed-off-by: George Nash <george.nash@intel.com>

* Code changes to address code review

- Simplify iteration code using `auto` keyword
- remove C style cast that was not needed
- remove instance variable that was not needed [relugrad.h]
- added the execution providers to `ComputeGradientErrorInternal()`
  and `ComputeTheoreticalJacobianTranspose()` instead of using
  a pointer to an instance variable [gradient_checker.h/.cc]

Signed-off-by: George Nash <george.nash@intel.com>

* Combined the default gradient ops test and dnnl gradient ops test for ConvGrad and MaxPoolGrad into one function with the help of a helper function.
This will reduce repeated code.
Signed-off-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>

* Replaced the stack used by convgrad with a vector so that the vector (used as a stack) can be easily cleared every time the graph is created.
This will prevent memory leaks from convolution kernels being pushed constantly onto the stack.
Signed-off-by: chethan.palangotu.keshava@intel.com

* Code clean up and formating updates

 - Removed empty else statement
 - updated indentation of code that was causing double curly brackets to look unusual
 - Changed check for NumDimensions to Size in Relu and ReluGrad error checking code.
 - isolated training code

Signed-off-by: George Nash <george.nash@intel.com>

* Restore inadvertently removed ConvGrad tests

When combining the DNNL and CPU versions of the ConvGrad
tests, two tests were inadvertently excluded. This adds
back the Conv3d and Conv3d-with-strides test cases.

Signed-off-by: George Nash <george.nash@intel.com>

* Add validation to ConvGrad

This validates the dimensions of the ConvGrad match the
passed in Convolution forward primitive description.

The current code for DNNL ConvGrad makes the assumption that the ConvGrad
nodes will be visited in the reverse order from the corresponding Conv nodes

The added validation will return an error if this assumption is not true.

Signed-off-by: George Nash <george.nash@intel.com>

* Do not create new execution providers in provider_test_utils

This removes the code that generated new execution providers in the
OpTester::Run function. This was added because the std::move was
leaving the `entry` value empty so subsequent calls would cause a
segfault.

Problem is this potentially changed the execution_provider because it
would create the default provider dropping any custom arguments.

When the now removed code was originally added the std::move was causing
crashes when the GradientChecker unit tests were run.  However, it is no
longer causing problems even with the code removed.

Signed-off-by: George Nash <george.nash@intel.com>

* Change the forward conv stack to a forward conv map

This changes how the forward conv kernel is mapped to the bwd ConvGrad
kernel the problematic stack is no longer used.

The convolution stack made the assumption that the corresponding
ConvGrad operator would be visited in reverse order of the forward
Conv operators.  This was always problematic and was unlikely to
work for inception models.

Important changes:
- The weight_name is added to the ConvGrad dnnl_node making it
  possible to use the weight_name as a lookup key to find the
  Conv forward Kernel
- the `std::vector fwd_conv_stack_` has been replaced with a
  `std::map fwd_conv_kernel_map_`
- Although it is not strictly needed, lock_guards were added when writing
  to and reading from the fwd_conv_kernel_map_ as well as the
  fwd_kernel_map_. These should always be accessed by a single
  thread when preparing the dnnl subgraphs, so the guard should not
  be needed, but it's added just in case.
- Updated the comments in the ConvGrad.h code to no longer mention the
  stack. The error check is not removed; it will be good to verify
  there are no errors as we continue to test against more models.

Signed-off-by: George Nash <george.nash@intel.com>

Co-authored-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>
Co-authored-by: unknown <63478620+jeyblu@users.noreply.github.com>
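
The weight-name lookup described in the commit above can be sketched as a small registry (a Python illustration of the idea only; the real code is C++ using std::map and std::lock_guard, and `FwdConvRegistry` is a hypothetical name):

```python
import threading

class FwdConvRegistry:
    """Maps a Conv weight name to its forward kernel so ConvGrad can find it
    regardless of the order in which nodes are visited, replacing the
    order-dependent stack."""

    def __init__(self):
        self._kernels = {}
        self._lock = threading.Lock()

    def register(self, weight_name, fwd_kernel):
        # Guarded even though subgraph preparation is single-threaded,
        # mirroring the "just in case" locking in the change above.
        with self._lock:
            self._kernels[weight_name] = fwd_kernel

    def lookup(self, weight_name):
        with self._lock:
            if weight_name not in self._kernels:
                raise KeyError(f"no forward Conv registered for {weight_name!r}")
            return self._kernels[weight_name]
```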

* Lochi/refactor yolov3 quantization (#6290)

* Refactor the code and move data reader, preprocessing, evaluation to
E2E_example_mode

* Refactor the code.

Move data reader, preprocessing, evaluation to model specific example
under E2E_example_mode

* refactor code

* Move yolov3 example to specific folder and add additional pre/post
processing

* Print a warning message for using newer c_api header on old binary (#6507)

* Fix issues with ArmNN build setup (#6495)

* ArmNN build fixes
* Update BUILD.md to document that the ACL paths must be specified to build ArmNN
* Fix CUDA build error. We don't setup the link libraries correctly/consistently so improve that.

* Fix Windows CI builds by updating test scripts to work with numpy 1.20. (#6518)

* Update onnxruntime_test_python.py to work with numpy 1.20.

Some aliases are deprecated in favor of the built-in python types. See https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations

np.array with bytes for entries and dtype of np.void no longer automatically pads. Change a test to adjust for that.
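
For reference, the deprecated-alias fix looks like this (a generic sketch of the numpy 1.20 migration, not the exact test-script diff):

```python
import numpy as np

# numpy 1.20 deprecates np.float, np.int, np.bool, np.object in favor of
# the builtin types (or explicit sized dtypes such as np.float64).
a = np.zeros(3, dtype=float)       # instead of dtype=np.float
b = np.zeros(3, dtype=np.float64)  # explicit width, unambiguous
```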

* Fix another test script

* Fix ORTModule branch for orttraining-* pipelines

* Update pytorch nightly version dependency

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: George Wu <jywu@microsoft.com>
Co-authored-by: Cecilia Liu <ziyue.liu7@gmail.com>
Co-authored-by: Ryan Hill <38674843+RyanUnderhill@users.noreply.github.com>
Co-authored-by: George Nash <george.nash@intel.com>
Co-authored-by: Guoyu Wang <62914304+gwang-msft@users.noreply.github.com>
Co-authored-by: Yateng Hong <toothache9010@gmail.com>
Co-authored-by: stevenlix <38092805+stevenlix@users.noreply.github.com>
Co-authored-by: Derek Murray <Derek.Murray@microsoft.com>
Co-authored-by: ashbhandare <ash.bhandare@gmail.com>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
Co-authored-by: Changming Sun <chasun@microsoft.com>
Co-authored-by: Tracy Sharpe <42477615+tracysh@users.noreply.github.com>
Co-authored-by: Juliana Franco <jufranc@microsoft.com>
Co-authored-by: Pranav Sharma <prs@microsoft.com>
Co-authored-by: Tixxx <tix@microsoft.com>
Co-authored-by: Jay Rodge <jayrodge@live.com>
Co-authored-by: Du Li <duli1@microsoft.com>
Co-authored-by: Du Li <duli@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Yufeng Li <liyufeng1987@gmail.com>
Co-authored-by: baijumeswani <bmeswani@microsoft.com>
Co-authored-by: Sergii Dymchenko <sedymche@microsoft.com>
Co-authored-by: jingyanwangms <47403504+jingyanwangms@users.noreply.github.com>
Co-authored-by: satyajandhyala <satya.k.jandhyala@gmail.com>
Co-authored-by: S. Manohar Karlapalem <manohar.karlapalem@intel.com>
Co-authored-by: Weixing Zhang <weixingzhang@users.noreply.github.com>
Co-authored-by: Suffian Khan <sukha@microsoft.com>
Co-authored-by: Olivia Jain <oljain@microsoft.com>
Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com>
Co-authored-by: Hariharan Seshadri <shariharan91@gmail.com>
Co-authored-by: Ryan Lai <rylai@microsoft.com>
Co-authored-by: Jesse Benson <jesseb@microsoft.com>
Co-authored-by: sfatimar <64512376+sfatimar@users.noreply.github.com>
Co-authored-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
Co-authored-by: sfatimar <sahar.fatima@intel/com>
Co-authored-by: MaajidKhan <n.maajidkhan@gmail.com>
Co-authored-by: mohdansx <mohdx.ansari@intel.com>
Co-authored-by: Xavier Dupré <xadupre@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin@vols.utk.edu>
Co-authored-by: Michael Giba <michaelgiba@gmail.com>
Co-authored-by: William Tambellini <wtambellini@sdl.com>
Co-authored-by: Hector Li <hecli@microsoft.com>
Co-authored-by: Aishwarya <aibhanda@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: liqunfu <liqfu@microsoft.com>
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: pengwa <pengwa@microsoft.com>
Co-authored-by: Tang, Cheng <souptc@gmail.com>
Co-authored-by: Cheng Tang <chenta@microsoft.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
Co-authored-by: Ye Wang <52801275+wangyems@users.noreply.github.com>
Co-authored-by: Chun-Wei Chen <jacky82226@gmail.com>
Co-authored-by: Vincent Wang <wangwchpku@outlook.com>
Co-authored-by: Vincent Wang <weicwang@microsoft.com>
Co-authored-by: Luyao Ren <375833274@qq.com>
Co-authored-by: Zhang Lei <zhang.huanning@hotmail.com>
Co-authored-by: Tim Harris <tiharr@microsoft.com>
Co-authored-by: Ashwini Khade <askhade@microsoft.com>
Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>
Co-authored-by: Alberto Magni <49027342+alberto-magni@users.noreply.github.com>
Co-authored-by: Wei-Sheng Chin <wschin@outlook.com>
Co-authored-by: wezuo <49965641+wezuo@users.noreply.github.com>
Co-authored-by: Jesse Benson <benson.jesse@gmail.com>
Co-authored-by: Wei Zuo <wezuo@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: wezuo <wezuo@az-eus-v100-32gb-5-worker-mgtbby.eastus.cloudapp.azure.com>
Co-authored-by: wezuo <wezuo@az-eus-v100-32gb-5-worker-yclzsf.eastus.cloudapp.azure.com>
Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
Co-authored-by: Martin Man <supermt@gmail.com>
Co-authored-by: M. Zeeshan Siddiqui <mzs@microsoft.com>
Co-authored-by: Ori Levari <ori.levari@microsoft.com>
Co-authored-by: Ori Levari <orlevari@microsoft.com>
Co-authored-by: Ubuntu <OrtTrainingDev3@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Sheil Kumar <smk2007@gmail.com>
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
Co-authored-by: Ryota Tomioka <ryoto@microsoft.com>
Co-authored-by: Adam Pocock <adam.pocock@oracle.com>
Co-authored-by: Yulong Wang <f.s@qq.com>
Co-authored-by: Faith Xu <faxu@microsoft.com>
Co-authored-by: Xiang Zhang <xianz@microsoft.com>
Co-authored-by: suryasidd <48925384+suryasidd@users.noreply.github.com>
Co-authored-by: RandySheriffH <48490400+RandySheriffH@users.noreply.github.com>
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
Co-authored-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>
Co-authored-by: unknown <63478620+jeyblu@users.noreply.github.com>
2021-02-02 08:59:56 -08:00
Scott McKay
c84bb9df9f
Add ability to track per operator types in reduced build config. (#6428)
* Add ability to generate configuration that includes required types for individual operators, to allow build size reduction based on that.
  - Add python bindings for ORT format models
    - Add script to update bindings and help info
  - Add parsing of ORT format models
  - Add ability to enable type reduction to config generation
  - Update build.py to only allow operator/type reduction via config
    - simpler to require config to be generated first
    - can't mix a type aware (ORT format model only) and non-type aware config as that may result in insufficient types being enabled
  - Add script to create reduced build config
  - Update CIs
2021-01-29 07:59:51 +10:00
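The per-operator config generation described above can be sketched as follows; the `domain;opset;op1,op2` line format and the hardcoded op list are assumptions for illustration, since the real tooling derives the required ops by parsing ONNX/ORT-format models:

```python
from collections import defaultdict

# Hypothetical (domain, opset, op_type) triples; the real generator
# collects these by walking the nodes of ONNX/ORT-format models.
ops = [("ai.onnx", 13, "Conv"), ("ai.onnx", 13, "Add"), ("ai.onnx", 13, "Conv")]

grouped = defaultdict(set)
for domain, opset, op_type in ops:
    grouped[(domain, opset)].add(op_type)

# One config line per (domain, opset) pair, listing the required ops.
config_lines = ["%s;%d;%s" % (d, o, ",".join(sorted(t)))
                for (d, o), t in sorted(grouped.items())]
print("\n".join(config_lines))  # ai.onnx;13;Add,Conv
```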
Hariharan Seshadri
d7bdd96425
Refine auto_pad based pad computation in ConvTranspose (#6305) 2021-01-19 19:01:49 -08:00
Guoyu Wang
e35db194e3
fix the pipeline failure (#6346) 2021-01-14 00:33:22 -08:00
baijumeswani
0586c610b2
Add ORTModule BERT classifier to the CI pipeline (#6330) 2021-01-13 12:34:04 -08:00
baijumeswani
9b7510d88c
Add ORTModule distributed CI pipeline (#6278)
* Add ortmodule distributed ci pipeline
2021-01-13 11:24:01 -08:00
Ashwini Khade
0ed56d491a
fix opset imports for function body (#6287)
* fix function opsets

* add tests and update onnx

* changes per review comments

* add comments

* plus updates

* build fix
2021-01-12 13:44:36 -08:00
Chun-Wei Chen
84024bdfa9
Enable ONNX backend test of SequenceProto input/output (#6043)
* assert sequence tensor and remove skips

* update testdata json

* use ONNX 1.8 in cgmanifest.json

* use previous commit to workaround

* update ONNX commit ID in docker

* skip test_maxpool_2d_dilations test for now

* update function name
2021-01-11 11:30:33 -08:00
liqunfu
cde723a136
Liqun/move nightly pl to linux multi gpu v100 (#6024)
* move e2e nightly pipeline to azure devop
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-12-14 12:43:41 -08:00
Edward Chen
d8139814fd
Clean up builds (#6015)
Update training Python packaging build to use get_docker_image.py.
Remove BUILD_EXTR_PAR docker build argument.
Update get_docker_image.py to check again for the image in the cache after building and before pushing to reduce the chance of a redundant push.
2020-12-04 15:13:17 -08:00
Edward Chen
6d642a3dba
Replace direct pulls from image cache container registry with get_docker_image.py, build definition clean up. (#5906) 2020-12-01 19:10:23 -08:00
Changming Sun
2d9dcc4576
Add python 3.9 support (#5874)
1. Add python 3.9 support (except Linux ARM)
2. Add Windows GPU python 3.8 to our packaging pipeline.
2020-11-30 12:02:48 -08:00
Ashwini Khade
705d093167
Update onnx (#5720)
* update onnx

* update docker image for testing
2020-11-24 11:20:15 -08:00
baijumeswani
208f4c1d3c
Azure ci pipeline for distributed environment tests (#5881) 2020-11-23 14:01:00 -08:00
Changming Sun
79350a642a
Update install_deps.sh: remove the unnecessary data generating step (#5758)
We install the onnx python package from this script so that the python tests run against the latest ONNX commit that we import.
2020-11-10 22:19:03 -08:00
Ashwini Khade
1cca903680
update onnx commit id (#5594)
* update onnx commit id

* update onnx commit for docker images

* update docker images
2020-11-02 09:46:36 -08:00
liqunfu
92662659ba
Liqun/remove number matching (#5606)
replace number matching with relaxed comparison in frontend tests
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-10-27 21:27:37 -07:00
Ashwini Khade
df22611026
Update ONNX commit (#5487)
* update ONNX

* update onnx + register kernels for reduction ops

* bug fix kernel reg

* update cgmanifests

* revert unsqueeze op 13 registration

* filter ops which are not implemented yet

* filter some tests

* update onnx commit to include conv transpose bug fix

* update docker images

* undo not required test changes

* fix test failures
2020-10-21 07:22:20 -07:00
sfatimar
6d2a30eae3
[OPENVINO-EP] 2021.1 Release (#5431)
* Cmake changes for 2021.1

* added new ov version 2020.1 for faster rcnn

* Added missing defs

* equal op modified

* changes to incorporate faster rcnn

* backend util.cc

* hddl_plugin_config.hpp is deprecated; use hddl_config.hpp instead

* changing myriad precision bool to i32

* gather is not enabled for gpu

* conv2D and pooltest auto_pad attribute should not be null

* negative indices are not valid for scatter op in myriad

* non max suppression op only supported in faster rcnn mode

* maxpool indices output is not supported

* Cleaned redundant code in backends

* Added ifdefs for HDDL config

* cast output dimensions check
The TopK operator's k input seems to be resolved only for MYRIAD, as it is
throwing issues for Mask R-CNN; needs verification.

* we are limiting the subgraph size to 3 here

* taking care of review comments

* Fixed minor bugs

* Modified Slice op checks
* Added NonZero, Upsample
* Removed TopK if it's in the middle of a subgraph

* incorporated upsample conditions too

* Dockerfile changes for 2021.1 release

* dockerfile aptkey update

* Minor fixes

* ceil condition added  again

* Fixed few gpu models

* Disabled LSTM and yolov3 in ModelTests

* python softmax cross entropy tests and negative log likelihood

* Update Build.md

Updated for openvino 2021.1

* Update OpenVINO-ExecutionProvider.md

update openvino execution provider for 2021.1

* Update READMe.md

updated new openvino version

* Update Dockerfile.openvino 

added environment variable for DEBIAN_FRONTEND

* Fixed myriad models

* Fixed gather condition
* Fixed mask rcnn model on myriad

* Modified Gather condition

* set default target of MCR dockerfile to MYRIAD_FP16

* Fixed tinyolov3 on CPU

* Update OpenVINO-ExecutionProvider.md

update openvino execution provider documentation

* Update Dockerfile.openvino

Removed environment variable

* Update OpenVINO-ExecutionProvider.md

update image manipulation networks supported

* Update onnx_backend_test_series_filters.jsonc

removed test_upsample_nearest from cpu test cases

* New InternalCI changes for 2021.1

* Full protobuf removed for OpenVINO

* Protobuf added

* Updated with apt installation for openvino

* Revert the testing changes

* Reverted testing changes

* File permissions are restored to original

* Deleted openvino installation and cmake change

* Optimized Dockerfile

Removed unnecessary cmake installation, numpy

* Added missing ifdefs

* delete array fix

* backend_utils.cc output_shape

* Revert "set default target of MCR dockerfile to MYRIAD_FP16"

This reverts commit 928d3e2b71e2f589cf51dacd3a133951cf9ca18d.

Co-authored-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
Co-authored-by: sfatimar <sahar.fatima@intel/com>
Co-authored-by: suryasidd <48925384+suryasidd@users.noreply.github.com>
Co-authored-by: S. Manohar Karlapalem <manohar.karlapalem@intel.com>
Co-authored-by: Aravind <aravindx.gunda@intel.com>
Co-authored-by: Aravind Gunda <38353114+gundaarx@users.noreply.github.com>
2020-10-14 15:56:00 -07:00
liqunfu
773992c7d4
Liqun/bert pretrain tb (#5377)
* add TensorBoard, remove torch.distributed.launch because ORT NCCL depends on MPI. Use MPI to launch parallel training.

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-10-06 16:28:31 -07:00
liqunfu
fe50213491
Liqun/bert pretrain2 (#5327)
* bert single node multi GPU pretrain w/o checkpoint

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-10-01 11:01:26 -07:00
Changming Sun
17f1178c2e
Downgrade GCC (#5269)
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2020-09-24 21:14:54 -07:00
edgchen1
6d5b93b805
Synchronize training dependency versions between Docker image and Python wheel. (#5261)
Synchronize training dependency versions between Docker image and wheel, update docs, refactor build scripts.
2020-09-23 19:03:42 -07:00
Xueyun Zhu
55e4b5d302
add pipeline distributed training test (#5222)
* add pipeline distributed training test

* fix max line length error in windows build

* function header indent

* fix

* fix flake8 error
2020-09-21 14:35:01 -07:00
KeDengMS
ce3b67e0cd
[Python] Move symbolic_shape_infer from nuphar to tools (#5162)
* [Python] Move symbolic shape inference from nuphar to tools

* Fix PEP8 ERROR
2020-09-18 09:31:06 -07:00
Changming Sun
d5d5e37e76
Build system enhancements (#5012)
1. Add a docker file for CUDA11
2. Support setting CUDA_ARCHITECTURES from command line.
2020-09-02 10:13:26 -07:00
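The new command-line support can be sketched as follows (the architecture value 70 targets V100 GPUs, as noted elsewhere in this log; flags other than the cmake define are illustrative):

```shell
# Pass the target GPU architecture through to CMake at build time.
# 70 = Volta (V100); adjust for your hardware.
./build.sh --use_cuda \
    --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES=70
```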
Changming Sun
c37fa7c278
Delete Dockerfile.centos6_gpu (#4851) 2020-08-28 09:56:52 -07:00
Rayan-Krishnan
eb05db5a2a
Fix OptimizerConfig params groups (#4877)
* Copy samples to build folder and load models from there. Fix CI
* This PR also includes a fix to path validation for save_as_onnx API
* Add torchtext to CI for GPU training
* Remove new frontend tests from CI

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
2020-08-22 22:04:17 -07:00
liqunfu
6260d073b3
Glue parallel training (#4550)
add mpi size, rank python API

add single node parallel training example
2020-08-21 21:24:27 -07:00
suryasidd
3a00b50cf8
[OpenVINO-EP] Updating OpenVINO EP to 2020.4 (#4836)
* Removed building ngraph from source

* Disabled some tests temporarily

* Enabled softmax for all dims

* Added onnx importer to link libraries

* int64 changes

* fixed

* temp

* slice update start and end need to be initializer

* Disabled GatherND, ScatterND, ReverseSequence operators

* Added supported ops instead of unsupported ops

* Set precision only for CPU

* Removed some unnecessary conditions

* Fixed segfault in slice

* Softmax restriction removed

* changes

* Setting precision for all plugins

* Changes added to include precision
and supported ops for gpu and vpu

* branch op support

* checking for disabled python test failure

* mapped input names and tensors directly rather than copying, which was leading to mismatches

* last index is not supported
mkldnn does not support pow between integers

* included the code changes

* Rename inner-scoped variable to avoid MSVC warning

* applied changes to VAD-M as well and removed the utility function
getinputtensors() completely

* OpenVINO multi version support: CMake changes

* OpenVINO multi version support: C++ support

* removed commented code

* Remove redundant code lines

* Revert "Rename inner-scoped variable to avoid MSVC warning"

This reverts commit 2f650493162675bc6fb70730de9656ec400be332.
Merged separately in master.

* VAD-M changes: disabled reduction op test

* putting test_gather_negative_indices in unsupported list for now

* Update MCR Dockerfile with 2020.4

Installs OpenVINO 2020.4 from deb packages via APT tool.

* Update build docs with 2020.4 info

* Update dockerfile with OV 2020.4 info

Instructions for building the OpenVINO-based docker image no longer require
downloading the installer package, as it is installed by the dockerfile
using the OpenVINO 2020.4 APT package for Ubuntu 18.04.

* Added constant folding bypass logic

* Added cout statements for ci

* Added NDEBUG flag for debug symbols

* Update Ops info in docs

* fixes multiple unit tests

* mathoptest.ceil disabled for gpu and myriad

* activation test temp disabled

* Fix models for CPU

* Fixed a syntax error

* local commit

* fixing unit tests for myriad

* Fixed Variadic Split, Topk issues

* fix_model commit

* Fix models in myriad

* Added ifdefs for OpenVINO 2020.4

* temp

* made some changes to not operator

* Added unused parameter

* relu enabled

* Fixed bug in Conv output

* Consolidated GPU failing tests into one category

* Made it compatible to InternalCI 2020.4

* Made changes for ngraph

* Disabled test for mask,fastercnn,tinyyolov3

* Removed proxy for ci

* run_dockerbuild.sh restored to same version

* run_dockerbuild.sh restored to same version

* run_dockerbuild.sh restored to same version

* Updated documentation for 2020.4

* Removed FP32 to FP16 transformation for GPU

* Disabled Coreml-FNS-Candy model test

* Added FP16 transformations

Co-authored-by: sfatimar <sahar.fatima@intel.com>
Co-authored-by: Manohar Karlapalem <manohar.karlapalem@intel.com>
Co-authored-by: sfatimar <sahar.fatima@intel/com>
Co-authored-by: sfatimar <64512376+sfatimar@users.noreply.github.com>
Co-authored-by: intel <you@example.com>
Co-authored-by: gundaarx <aravindx.gunda@intel.com>
2020-08-19 23:18:08 -07:00
Thiago Crepaldi
42408aa3ed
Add new PyTorch front-end (#4815)
* Add ORTTrainerOptions class for the new pytorch frontend (#4382)

Add ORTTrainerOptions class and some placeholders

* Add _ORTTrainerModelDesc to perform validation for model description (#4416)

* Add Loss Scaler classes to the new frontend (#4306)

* Add TrainStepInfo used on the new frontend API (#4256)

* Add Optimizer classes to the new frontend (#4280)

* Add LRScheduler implementation (#4357)

* Add basic ORTTrainer API (#4435)

This PR presents the public API for ORTTrainer for short-term
development.

It also validates and saves input parameters, which will be used in the
next stages, such as building the ONNX model, post-processing the model, and
configuring the training session.

* Add opset_version into ORTTrainerOptions and change type of ORTTrainer.loss_fn (#4592)

* Update ModelDescription and minor fix on ORTTrainer ctor (#4605)

* Update ModelDescription and minor fix on ORTTrainer/ORTTrainerOptions

This PR keeps the public API intact, but changes how model description is stored on the backend

Currently, users create a dict with two lists of tuples.
One list called 'inputs' and each tuple has the following format tuple(name, shape).
The second list is called 'outputs' and each tuple can be either tuple(name, shape) or tuple(name, shape, is_loss).

With this PR, when this dict is passed in to ORTTrainer, it is fully validated as usual.
However, tuples are internally replaced by namedtuples and all output tuples will have
tuple(name, shape, is_loss) format instead of is_loss being optionally present.

In addition to that normalization in the internal representation (which eases coding),
two internal methods were created to replace a namedtuple(name, shape) with namedtuple(name, shape, dtype)
or namedtuple(name, shape, is_loss, dtype) depending on whether the tuple is an input or an output.

This is necessary as ORTTrainer determines the data types of each input/output during model export to ONNX.

Finally, a minor fix was done on ORTTrainer. It could initialize ORTTrainerOptions incorrectly when options=None

* Rename input name for test
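A minimal sketch of the normalization described for #4605 above (the namedtuple fields follow the commit text, but the helper name and defaulting logic are illustrative, not the actual ORT implementation):

```python
from collections import namedtuple

# Internal representations: outputs always carry an explicit is_loss flag.
Input = namedtuple("Input", ["name", "shape"])
Output = namedtuple("Output", ["name", "shape", "is_loss"])

def normalize(desc):
    """Replace the user's plain tuples with namedtuples, defaulting is_loss."""
    inputs = [Input(*t) for t in desc["inputs"]]
    outputs = [Output(t[0], t[1], t[2] if len(t) == 3 else False)
               for t in desc["outputs"]]
    return {"inputs": inputs, "outputs": outputs}

desc = normalize({"inputs": [("x", [2, 784])],
                  "outputs": [("loss", [], True), ("logits", [2, 10])]})
```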

* Add ONNX Model Export to New Frontend (#4612)

Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>

* Create training session + minor improvements (#4668)

Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>

* Save ONNX model in file (#4671)

Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>

* Add eval step (#4674)

Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>

* Add train_step (#4677)

Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>

* Add LR Scheduler (#4694)

Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>

* Add deterministic compute tests (#4716)


Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>

* Add legacy vs experimental ORTTrainer accuracy comparison (#4727)

Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>

* Add Mixed precision/LossScaler + several fixes (#4739)

In addition to the mixed-precision/loss-scaler code, this PR includes:

* Fix CUDA training
* Add optimization_step into TrainStepInfo class
* Refactor LRScheduler to use optimization_step instead of step
* Updated several default values at ORTTrainerOptions
* Add initial Gradient Accumulation support (untested)
* Fix ONNX model post processing
* Refactor unit tests

* Add ONNX BERT example + minor fixes (#4757)

* Fix training issue when passing ONNX file into ORTTrainer

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>

* Add Dynamic Shape support (#4758)

* Update DeepSpeed Zero Stage option to a separate option group (#4772)

* Add support to fetches (#4777)

* Add Gradient Accumulation Steps support (#4793)

* Fix Dynamic Axes feature and add unit test (#4795)

* Add frozen weights test (#4807)

* Move new pytorch front-end to 'experimental' namespace (#4814)

* Fix build

Co-authored-by: Rayan-Krishnan <rayankrishnan@live.com>
Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-08-17 09:45:25 -07:00
Changming Sun
5eec4f66ed
Refactor manylinux docker image and the related pipelines (#4751)
1. Publish the image ACR, instead of building it every time for every PR
2. Make USE_MKLML and USE_OPENMP be able to co-exist. Currently both of them are enabled in our Linux CI build but indeed only one of them is taking effect.
3. Split nuphar and DNNL into separate pipelines.
4. Fix two warnings in onnxruntime/core/optimizer/matmul_scale_fusion.cc and onnxruntime/test/tvm/tvm_basic_test.cc.
5. Update the manylinux2010_x86_64 image to the latest.
2020-08-17 09:40:31 -07:00
Yulong Wang
aa993e95c9
enable build flag '--use_openmp' on MacOS (#4774)
* enable build flag '--use_openmp' on MacOS

* cmake 3.16.1 to enable find_package(OpenMP) on mac
2020-08-13 15:56:42 -07:00
Changming Sun
01ca6392cb
Avoid building every historical ONNX version in our CI (#4678)
1. Avoid building every historical ONNX version in our CI; it is costly and failure-prone.
2. Run docker commands without sudo. Previously the user was not in the docker group; Azure DevOps Services has now added it.
2020-08-03 10:18:10 -07:00
Dmitri Smirnov
35ee00d888
Pin typing version. (#4490) 2020-07-13 11:48:30 -07:00
Hariharan Seshadri
26ebcfab88
Fix Nuget GPU pipeline (#4462) 2020-07-10 14:02:28 -07:00
Hariharan Seshadri
6d6b6b54a5
Support binding a graph output to a specific device via the Python binding (#4439) 2020-07-07 21:09:37 -07:00
Changming Sun
deea945f80
Remove openmp and scipy from build pipelines (#4305)
1. Remove openmp because the default thread pool is already good enough.
2. Remove scipy from build pipelines because it no longer supports Python 3.5.
2020-06-23 20:18:16 -07:00
edgchen1
4e39fda06a
Fix version of torch and torchvision in install_deps.sh. (#4316) 2020-06-23 14:55:18 -07:00
liqunfu
ffed43e9b8
handle loss and name matching wrappers (#4066)
* handle loss and name matching wrappers

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-06-05 23:34:26 -07:00
Dmitri Smirnov
afca0d15ee
Create Java publishing pipeline (#3944)
Create CPU and GPU Java publishing pipelines. Final jars are tested on all platforms. However, signing and publishing to Maven are manual steps.
2020-06-01 18:18:57 -07:00
liqunfu
6665d5e2bc
Liqun/a transformer example (#3845)
Add transformer glue test example to show how to use ORTTrainer to fine-tune a transformer model

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-05-27 15:21:35 -07:00
Yulong Wang
b3ec8035ee
[Node.js binding] add build flag for node.js binding (#3948) 2020-05-27 13:30:22 -07:00
Ryan Lai
357bffe47c
Fix deprecated CentOS link for Linux CI pipeline (#4000)
* Fix Linux_CI_GPU_Dev

* centos6
2020-05-20 16:14:48 -07:00
Bowen Bao
0a5395bb78
Remove 'model_.' prefix from onnx model initializers in training (#3881)
* Remove 'model_.' prefix for onnx model initializers in training

* fix test case remove redundant device test

* rename

* Fix state_dict/load_state_dict with frozen_weight

* nit

* Add monkey patch for pt opset 10

* remove pt patch in CI

* nit: newline
2020-05-20 10:06:31 -07:00
edgchen1
024b92a970
Use path relative to script location to refer to symbolic_opset10.py from install_deps.sh. (#3975)
Update install_deps.sh to use relative path from script directory to symbolic_opset10.py. This allows install_deps.sh to be called from different working directories.
2020-05-18 13:36:06 -07:00
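The script-relative path technique can be sketched as follows (the sibling file name is hypothetical):

```shell
#!/usr/bin/env bash
# Resolve the directory containing this script, independent of the
# caller's working directory.
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]:-$0}")" && pwd)"

# Reference sibling files relative to the script, not the caller's CWD:
PATCH_FILE="${SCRIPT_DIR}/symbolic_opset10.py"  # hypothetical sibling
echo "${SCRIPT_DIR}"
```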
edgchen1
e259a13f8e
Initial training Python packaging pipeline (#3767)
Add a pipeline to produce training-enabled ORT wheels.
2020-05-18 09:41:00 -07:00
Scott McKay
5e0928a777
Enable running PEP8 on python scripts using flake8 (#3928)
* Enable running PEP8 checks via flake8 as part of the build if flake8 is installed.
Update scripts in \tools and \onnxruntime\python. Excluding \onnxruntime\python\tools which needs a lot more work to be PEP8 compliant. Also excluding orttraining\tools for the same reason.
Install flake8 as part of the static_analysis build task in the Win-CPU CI so the checks are run in one CI build.
Update coding standards doc.
2020-05-15 07:15:06 +10:00
liqunfu
9b5daa2039
patch torch onnx opset 10 (#3910)
patch pytorch to export onnx nll_loss at opset version 10. add mnist test to cover onnx opset version 10.
2020-05-12 18:11:25 -07:00
M. Zeeshan Siddiqui
5e1244eb4d
Update ONNX submodule to ONNX 1.7 release branch. (#3888)
* Update to ONNX submodule to ONNX 1.7 release branch.

* Update to ONNX submodule to ONNX 1.7 release branch.

* fix version.
2020-05-10 15:44:44 -07:00
M. Zeeshan Siddiqui
9b02b3df6f
Update ONNX submodule to ONNX 1.7 release candidate 3. (#3838) 2020-05-06 00:55:19 -07:00
M. Zeeshan Siddiqui
ef4d73e887
Update ONNX submodule to ONNX 1.7 release candidate 2. (#3818)
* Update ONNX submodule to ONNX 1.7 release candidate 2.

* fix build error.

* Update ONNX submodule to latest and disable preview op tests.
2020-05-05 15:08:40 -07:00
M. Zeeshan Siddiqui
517bff9675
Function expansion support and Update ONNX to 1.7 release candidate 1. (#3782)
* Function expansion support, Update ONNX to 1.7 release candidate 1.

* Renable disabled tests.
2020-05-01 10:35:16 -07:00
liqunfu
af3988198c
Liqun/e2e transformer test (#3540)
* initial change to transformer.py

* prepare e2e transformer tests

* refactor transformer tests

* put test python files in a flat folder

* fix typo pip install transform(s)

* python 3.6

* python version to 3.6 in install_ubuntu.sh

* remove argparser

* to use opset ver 12

* workaround loss_scale naming patch in case of loss_fn_

* assign self.loss_fn_ so it can be checked

* skip a few un-needed post-process steps

* fix loss_scale_input_name, clean up post process steps

* skip non-frontend tests

* move cpu/cuda related files to corresponding cpu/cuda folder (#3668)

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>

* type cast for ratio is not necessary for dropout (#3682)

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>

* thrustallocator is not needed since cub is used directly for gather now. (#3683)

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>

* GatherND-12 Implementation (#3645)

* Renamed, UT passing

* Move GatherND CUDA Kernel into onnxruntime

* Merge GatherNDOpTest

* Refactor Test code

* Merge CPU Kernel Impl

* Handle Negative Indices, Fix UT

* Improve CUDA kernel to handle negative index

* Minor Fixes

* Preserve GatherND-1 Cuda kernel

* Fix Mac build

* fix UT

* Fix Build

* fix GatherNDOpTest.double > CUDA error cudaErrorInvalidDeviceFunction:invalid device function

Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Peng Wang (pengwa) <pengwa@microsoft.com>
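The negative-index handling mentioned above can be sketched as follows (illustrative only, not the actual CPU/CUDA kernel logic): a negative index i into a dimension of size d addresses element i + d.

```python
def normalize_index(i, dim_size):
    """Map a possibly-negative index onto [0, dim_size), as gather-style ops do."""
    if i < 0:
        i += dim_size
    if not 0 <= i < dim_size:
        raise IndexError("index %d out of range for dim of size %d" % (i, dim_size))
    return i

print(normalize_index(-1, 4))  # 3
```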

* update with reviewers' comments

* testBertTrainingGradientAccumulation was not using rtol and may fail occasionally with small (e-06) difference

* fix merge mistakes

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Weixing Zhang <weixingzhang@users.noreply.github.com>
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
Co-authored-by: Sherlock <baihan.huang@gmail.com>
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Peng Wang (pengwa) <pengwa@microsoft.com>
2020-04-30 12:26:38 -07:00
Ethan Tao
e9f1e7e797 resolve conflicts 2020-04-24 15:15:36 -07:00
S. Manohar Karlapalem
6d4f2f5bf9
OpenVINO EP v2.0 (#3585)
* Added FP16 transformations

* Revert "Added CMAKE_BUILD_TYPE to make building dynamic"

This reverts commit d3e17af1af655cfdc4d2fec33f52055caa525e85.

* Added FP16 transformations for FP16 builds

* Backend logic cleanup

Cleans the backend (intel_graph.*) code in the following ways:

1. Minimize global usage: Since all the IR graphs need to be
re-generated on every Infer, it is bad practice to rely on globals
for their saving and usage as there would be multiple readers and
writers to the same global variable leading to incorrect usages or
contentions. This change replaces globals with locals where possible.
This change also fixes an existing bug caused by
incorrect global usage.

2. Remove all unused functions.

3. Remove all unused headers and preprocessor directives.

* removed commented out code

* Disabled default optimization for Intel EP

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Fix missed plugins.xml for python bindings

* Fixed the build after latest master changes

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Disabled unsupported ops for accelerators

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Added some more disabled ops

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Added environment variable to enable debugging

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Added more debug statements

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Fixed unsupported ops list for GPU and VPU

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Fixed unsqueeze unit tests

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Added error message to the status

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Overwrite Model proto with shape info from data

Overwrites the shape info of Model proto with the shape from
actual input data. Needed for inferring models with Dynamic
shapes.

* Removed print statement and disabled where op

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Disabled Reshape with Empty initializer

* Added more debug statements for 1P

* Don't allow 1D inputs with symbol for dimension

* Disabled some 3rd phase ops

* Disabled split and added zero dimension check for OutputDefs

* Cleanup zero dimensionality check

* Added different data type check for inputs and initializers

* Added conditions for Mod, Cast and Pad

* Removed unused variable

* Disabled scan and added conditions for squeeze

* Added changes for fixing all C++ unit tests

* Implements Backend Manager class for caching

Backend Manager provides a layer of indirection between EP interface
and OV backend that provides caching services for models with
symbolic dims in input shapes.

* clean up commented blocks

* clang-formatting

* Read I/O type info from ModelProto

Read the tensor element type information from ModelProto object,
as FusedNode is no longer available.

* code cleanup

* clang-formatting

* Added print statement for jenkins

* Disabled some python tests

* Changed the path of convert fp32 to fp16 hpp

* Added conditions for BatchNorm in GetCapability

* Fixed failed tests

* Revert "Added conditions for BatchNorm in GetCapability"

This reverts commit c3c28c3b00d27892c42546b35dacdd807a48ee90.

* Added Intel to onnxruntime backends

* pick up vars set by OV package setupvars.sh

* Added conditions for Identity

* remove a few cout prints

* Added conditions for GPU_FP32 unit tests

* Revert "pick up vars set by OV package setupvars.sh"

This reverts commit 8199e029c03eae21a1a7ef6bfdc93d00e5d0198b.

* Commented out fatal message for protobuf

* Might need to be removed

* Add interface class for current backend

* moved common logic to base class

* simplified cpu backend

* Removed unused headers

* use vectors to save i/o tensors for windows compatibility

* move utils fxns to backend_utils namespace

* rename ov_backend to ibackend

* Factory pattern for backend creation

* rename CPU backend to Basic backend

* renamed to vad-M and added to factory list

* Added conditions for VPU

* Added print statements

* Changed the logic for checking for symbolic shapes

* Modified logic for zero dimension check

* Removed VPU single dimension condition

* Removed comments

* Modified logic in DimensionCheck method

* Remove legacy OpenVINO EP

Remove all the legacy code for OpenVINO EP. UEP code will take its
place going forward.

This change does NOT remove OVEP files in the following areas, as
they will be reused by UEP:
1. Documentation: All .md files
2. Docker related files
3. Python bindings
4. Java bindings
5. C# bindings
6. ORT Server
7. CI pipeline setup files

* Rename Intel EP to OpenVINO EP

* Added unique names to the subgraphs

* Removed subgraphs with only constant inputs

* Modified subgraph partitioning algorithm to remove const input subgraphs

* Apply suggestion to onnxruntime/core/providers/openvino/openvino_execution_provider.cc

* Tracking output names to fix the output order bug

* Changed output names to an unordered map

* Modified logic to check for symbolic input shapes

* Fixed a bug in Reshape check

* Added empty model path to Model constructor

* Made necessary changes to cmake to build from the binary package

* Changed INTEL_CVSDK_DIR to INTEL_OPENVINO_DIR

* Enable dyn device selection with C++ API

* Added Round operator to unsupported list

* Modified subgraph partition logic for MYRIAD

* Removed supported ops from the list

* Enable dyn dev selection in Py API's

* Add documentation for dynamic device selection

* Use MYRIAD || HDDL instead of VPU

* Removed temporary cast of Int64 to FP32

* Disabled unit Tests for CPU_FP32 and GPU_FP32

* Removed default "CPU" from unit tests to allow overriding

* Removed ops Concat, Squeeze, Unsqueeze from unsupported list

* Get the device id from info

* Removed overwriting device_id and precision

* Enabled ConvTranspose and EyeLike

* Reordered unsupported ops in alphabetical order

* Fixed syntax error

* Fixed syntax error

* Code clean-up: Handle exceptions, logs and formatting

Code formatted according to ORT coding guidelines.

* remove debug print from pybind code

* updated docs with ops and models

* formatting prints

* Added default values for c and j for openvino

* Overriding the values set for c and j to be 1
* BACKEND_OPENVINO should be empty if openvino is not in build

* Overriding c value with default for perftest

* fix VAD-M device string bug

* Add IE error details to exceptions

* Use IE specific device names in EP

* Add VAD-F (FPGA) device support

* Removed unnecessary libraries from whl package

* Code changes for Windows compatibility

* Add VAD-F option to python API

* [revert before merge] cmake changes for RC

* Enable Windows build in CMake

* Unset macro OPTIONAL for windows builds

inference_engine.hpp's include chain defines a macro 'OPTIONAL'
which conflicts with the onnx project's headers when using MSVC, so
we need to explicitly unset it for MSVC.

* Use a single copy of plugin/IE::Core

Defined as a static member in Backend manager

* Remove restriction of single subgraphs for  myriad

* Passed subgraph name to Backend to enhance log statements

* Disabled zero dimension conditions

* Disabled concat to remove zero dims

* Enabled building ngraph as part of ORT

* Removed serializing and added versioning

* Fix CPU_FP32 unit tests

* Removed unnecessary condition

* add ngraph.so.0.0 to .whl

* Check for zero dimensions only for inputs and outputs

* Restrict loading only 10 subgraphs on myriad

* Build ngraph.dll within UEP. Doesn't link yet

* Rename Linux included libngraph.so to libovep_ngraph.so

Renames the locally built libngraph.so containing the ONNX importer to
libovep_ngraph.so in order to avoid linkage conflicts with the
libngraph.so supplied by the OpenVINO binary installer.
Applies only to Linux builds.

* use output_name cmake properties for lib name

* fix .so name format in lib_name.patch

* CMake code cleanup

* Rename WIN32 included ngraph.dll to ovep_ngraph.dll

To avoid conflict with ngraph.dll distributed by openvino.

* Added myriad config for networks without 4 dimensions

* Loading the 10 max clusters for inference on myriad

* Refactor code and add Batching support

Encapsulate subgraph settings into context structs.

Add batching support for completely supported models.

* Disabled some broken tests

* use input_indexes to avoid batch-checking initializers

* Avoid static initialization order error on WOS

* Added candy to broken tests

* InternalCI changes for 2020.2

* Updated DLDT instructions

* Unsaved changes in install_openvino.sh

* Changes after manual check

* Remove custom ngraph onnx_import build for WOS

ONNX Importer on WOS does not have protobuf issue.

* Remove FP32ToFP16 ngraph pass

This conversion is performed implicitly within IE.

* Surround debug logic by #ifndef NDEBUG

* remove invalid TODO comments

* removed references to ngraph-ep

* clang-formatting

* remove commented code

* comment edits

* updating copyright year to that of first OpenVINO-EP release

* remove redundant log msg

* Modified operator and topology support

* Update build instructions

* doc formatting

* Fixed clip unit tests

* Revert "Remove FP32ToFP16 ngraph pass"

This reverts commit ec962ca5f315a5658ad980e740196f19de2639c1.

* Applying FP16 transformation only for GPU FP16

* Fixed GPU FP32 python tests

* automatically use full protobuf

* disable onnxrt server for now

* Disabled upsample

* update dockerfile instructions

* Removed MO paths and added ngraph path

* Remove OVEP from ORT Server docs

Will put it back in after validation

* Updated path to Ngraph lib

* Disabled Resize and some other python tests

* Removed unnecessary header files

* Use commit SHA to fetch ngraph repo

* Avoid un-needed file changes due to version update

* Fixed clip tests

* Fixed Pow, max and min onnx tests

* build.md doc typo

* Update cmake patch command for ngraph src

* remove dead cmake code for onnxruntime_USE_OPENVINO_BINARY

* use spaces instead of tab

* remove commented code

* Add info about protobuf version

* edit debug env var and enable for WIN32

* specify only version tag of 2020.2 for dockerbuilds

* remove unnecessary file changes

* Pass empty string as default argument to C# tests

* Use ${OPENVINO_VERSION} to name openvino install directory in CI builds

* Enabled unnecessarily disabled tests

* Fixed ngraph protobuf patch

* Fixed error in protobuf patch

* Revert "Use ${OPENVINO_VERSION} to name openvino install directory in CI builds"

This reverts commit 89e72adb8bf3b9712f5c81c5e13fe68c6c0df002.

* Remove unsetting OPTIONAL macro

The OPTIONAL macro is no longer used after the recent ONNX update onnx/onnx@da13be2,
so this unset workaround is no longer necessary.

* Use a null string default argument for the C# API

* Set OpenVINO version in yml files and pass it to CI Docker builds

Git tag info for DLDT as well as the install directory are set
using this value.

This reverts commit 9fa9c20348ed72ae360a95c98e9b074d2f9fafc5.

* Documentation: recommendation and instructions for disabling ORT graph optimizations

* more doc updates

* Reduced the number of models according to CI time constraints

Co-authored-by: ynimmaga <yamini.nimmagadda@intel.com>
Co-authored-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
Co-authored-by: Mikhail Treskin <mikhail.treskin@intel.com>
Co-authored-by: mbencer <mateusz.bencer@intel.com>
Co-authored-by: Aravind <aravindx.gunda@intel.com>
Co-authored-by: suryasidd <48925384+suryasidd@users.noreply.github.com>
2020-04-24 04:06:02 -07:00
Edward Chen
deac467683 Merge remote-tracking branch 'origin/master' into edgchen1/merge_from_master 2020-04-23 20:50:33 +00:00
Changming Sun
00917917d6
Downgrade numpy requirement to 1.16.6 (#3635) 2020-04-22 16:11:33 -07:00
liqunfu
781e1c36be
Add front-end MNIST test (#3231)
* add frontend minst test

* to use torch nightly with torchvision

* remove incorrect comment per reviewer's comment

* experiment torchvision import failure

* experiment install_deps.sh

* more experiment install_deps.sh

* experiment install_deps.sh with --upgrade

* Experiment with install_deps.sh.

* Experiment with install_ubuntu.sh.

* Use Ubuntu 18.04 and Python 3.6 for CI.

* Update cmake version for CI.

* Install MPI on Ubuntu 18.04 for CI.

* Increase tolerance for MNIST test.

* Go back to Ubuntu 16.04 for CI, fix installing from deadsnakes ppa.

* Clean-up.

* Update ort_trainer.py from ort_training.

* Get default Ubuntu Python ver back to 3.5.

* Add underscore to opset_version parameter name in ORTTrainer constructor.

* Move loss/model wrap before the call for sample output.

* Update expected values for MNIST test.

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Sergii Dymchenko <sedymche@microsoft.com>
2020-04-20 11:19:31 -07:00
Sergii Dymchenko
6ba7c99e50 Merge branch 'master' into ort_training 2020-04-09 12:42:04 -07:00
Changming Sun
33006f48c0
Update onnx submodule to 1.7.0 release candidate (#3405)
Update onnx submodule to 1.7.0 release candidate. This isn't a release tag, but it will be released soon, in 1-2 weeks.
2020-04-04 16:23:42 -07:00
Xueyun Zhu
ccc3535e72 resolve conflict 2020-03-20 20:20:35 +00:00
Changming Sun
0fceb33288
Fix onnxruntime server docker file build failure (#3219)
1. Fix onnxruntime server docker file build failure. Tested with the notebook in ONNX tutorial, it works well.
2. Delete the docker files for the other EPs, because currently they don't work and I don't have enough time to update them.
2020-03-15 14:46:46 -07:00
Zeeshan Siddiqui
2cad08bd60 Merged PR 5688: Upgrade ONNX submodule to the latest from github ONNX master.
We want to implement SoftmaxCrossentropy and NegativeLogLikelihoodLoss forward training ops for opset-12, but that requires the ONNX submodule to point to the latest commit to have the latest and greatest ONNX spec!

- Reverse integrate changes from *.in.proto files in github ONNX repo.
- Regenerate csharp/test/Microsoft.ML.OnnxRuntime.Tests/OnnxMl.cs
- Disable ONNX tests that don't have op implementation for the latest opset.
2020-03-12 16:51:45 -07:00
Edward Chen
e542cfd0e0 Introduce training changes. 2020-03-11 14:39:03 -07:00
Changming Sun
179603775f
Use CUDA 10.1 for Linux build (#3057)
Use CUDA 10.1 for Linux build
(Windows change is already in)

Please note, cublas 10.2.1.243 is for CUDA SDK 10.1.243, not CUDA 10.2.x. CUDA 10.2.89 needs cublas 10.2.2.89. They match on the last part of the version digits.

libcublas10-10.1.0.105 won't work!!!

The cuda docker image by viswamy is already using 10.1, no need to change.
2020-02-21 11:55:32 -08:00
James Yuzawa
411b3aa801
Java build system enhancements (#2866) 2020-02-18 15:41:49 -08:00
Changming Sun
382fa86af8
Pipeline changes for python 3.8 (#2753)
1. Pipeline changes for python 3.8
2. Fix a regression in setup.py which was just introduced in the previous commit.

Please notice, we still haven't made python 3.8 + Windows + CUDA work.
2020-01-02 15:25:25 -08:00
Changming Sun
fd334aff44
Update numpy to 1.18 (#2758)
* Update numpy to 1.18
2019-12-30 14:51:01 -08:00
Changming Sun
c7a9c6b488
Split onnxruntime server to a separated folder (#2744) 2019-12-27 11:21:23 -08:00
Hector Li
47503ec7a6
Initiate the build scripts for ARM ACL (#2652)
1. Add scripts to build Yocto image & toolchain
2. Update docker build scripts to support Onnxruntime build with ARM ACL 19.02/19.05
2019-12-16 09:44:19 -08:00
Ashwini Khade
281933fa1c
Fix C API tests for centos and mac (#2544)
* change c++14 to c++11

* add ld lib path for centos

* enable csharp tests on macos

* fix C API test on MacOS + fix manylinux dotnet install

* fix manylinux dotnet install

* fix lib link
2019-12-04 18:01:35 -08:00
shahasad
178d059111 Setup java ci (#2528) 2019-12-03 14:21:51 -08:00
Ashwini Khade
e32eff826c
enable nuget package testing on centos7 (#2527)
* add centos tests to linux cpu ci pipeline

* Disable failing test

* use centos6 instead of centos7

* change back to centos7

* add dotnet runtime dependency

* fix dotnet runtime dependencies

* install dotnet sdk instead of runtimes

* add more dotnet dependencies

* temporary skip failing test

* fix lib path

* reenable failing test
2019-12-03 10:16:45 -08:00
Patrick Foley
151075790d [OpenVINO-EP] Update to latest version: OpenVINO 2019 R3.1 (#2308)
* Updates OpenVINO EP to latest version: 2019 R3.1

* Reviews fixed

* Update Dockerfile.openvino

* Addressed PR comments and disabled model tests temporarily

* Update Dockerfile.ubuntu_openvino
2019-11-05 19:55:46 -08:00
Changming Sun
4b62241c77
Update ONNX to 1.6.1 (#2235) 2019-10-23 13:47:45 -07:00
Changming Sun
cff7879d89
Update C API pipeline to use CentOS 6 (#2198) 2019-10-19 22:25:42 -07:00
Dmitri Smirnov
acec4b446f Make CentOS 6 CUDA build and run (#2159)
* Add manylinux1 source code changes

* Disable a python test
2019-10-19 15:33:31 -07:00
Changming Sun
021073b5e5
Update python packaging pipelines (#2167) 2019-10-19 07:42:54 -07:00
Changming Sun
5558b80774
clean up ubuntu docker scripts (#2103) 2019-10-14 07:20:20 -07:00
Changming Sun
a314402097
Downgrade python gpu package to CUDA 10.0 (#2086) 2019-10-10 18:31:24 -07:00
Changming Sun
a00ca56ae1
Remove gcc from manylinux1 docker image (#2048) 2019-10-08 13:49:15 -07:00
Changming Sun
e9bed8b23b
Change python packaging pipeline to use manylinux1 (#2035)
1. Change the python packaging pipeline to use manylinux1
2. Temporarily disable model tests in the python pipeline.
2019-10-08 10:03:54 -07:00
stevenlix
544e53e24e Update TensorRT to version 6.0.1.5 (#1966)
* remove onnx-tensorrt submodule

* add new onnx-tensorrt submodule (experiment) for trt6

* update engine build for trt6

* update compile and compute for tensorrt6.0

* Update tensorrt_execution_provider.cc

* Update tensorrt_execution_provider.cc

* Update tensorrt_execution_provider.cc

* Update tensorrt_execution_provider.cc

* switch to onnx-tensorrt master for TensorRT6'

* Update tensorrt_execution_provider.cc

* Handle dynamic batch size and add memcpy in TensorRT EP

* update test cases

* Update tensorrt_execution_provider.cc

* update onnx-tensorrt submodule

* Update Dockerfile.ubuntu_tensorrt

* Update Dockerfile.ubuntu_tensorrt

* Update run_dockerbuild.sh

* Update run_dockerbuild.sh

* Update install_ubuntu.sh

* Update concat_op_test.cc

* Update tensorrt_execution_provider.cc

* Upgrade TensorRT to version 6.0.1.5

* Update onnxruntime_providers.cmake

* Update CMakeLists.txt

* Update reduction_ops_test.cc

* Update install_ubuntu.sh

* Update Dockerfile.ubuntu_tensorrt

* Update Dockerfile.tensorrt

* Update BUILD.md

* Update run_dockerbuild.sh

* Update install_ubuntu.sh

* Update onnxruntime_providers.cmake

* Update install_ubuntu.sh

* Update install_ubuntu.sh

* Update gemm_test.cc

* Update gather_op_test.cc

* Update CMakeLists.txt

* Removed submodule

* update onnx-tensorrt submodule

* Add Ubuntu18.04 build option

* Add Ubuntu18.04 build option

* Add Ubuntu18.04 build option

* Add Ubuntu18.04 build option

* Remove redundency

* Fix issue where memcpy nodes were not added correctly when some nodes fall back to the CUDA EP.
e.g. after partitioning, if there's TRT_Node -> Cuda_node (with CPU memory expected), we still need to add a memcpy node between them.

* update for Trt Windows build

* Update onnxruntime_providers.cmake

* Disable opset11 tests on TensorRT

* Update pad_test.cc

* Update build.py

* update scripts for ubuntu18.04

* Disable warning for Windows build
2019-10-06 10:40:53 -07:00
Hariharan Seshadri
f528da35f2
Update ONNX to a newer commit (#2015)
* Update ONNX to a newer version

* PR comments
2019-10-04 19:41:00 -07:00
Changming Sun
c86d17754a
Dockerfile for CentOS CI build (#1986) 2019-10-03 11:46:27 -07:00
Dmitri Smirnov
d1b1cdc5c4
Replace GSL with GSL-LITE submodule and fix up refs (#1920)
Remove gsl submodule and replace with a local copy of gsl-lite
  Refactor for onnxruntime::make_unique
  gsl::span size and index are now size_t
  Remove lambda auto argument type detection.
  Remove constexpr from fail_fast in gsl because Linux builds reject it.
  Comment out std::stream support because the macOS std lib is broken.
  Move make_unique into include/core/common so it is accessible for server builds.
  Relax requirements for onnxruntime/test/providers/cpu/ml/write_scores_test.cc
  due to x86 build.
  Add ONNXRUNTIME_ROOT to Server Lib includes so gsl is recognized
2019-10-01 12:43:29 -07:00
baowenlei
9fc5598b7e
update nuphar ci llvm version and uncomment unit tests (#1954) 2019-09-29 23:35:14 -07:00
Pranav Sharma
052339d9dc
Fix python packaging pipeline (#1922)
* Mention OrtCreateSessionFromArray in C API doc

* Fix python packaging pipeline broken by this commit id dc03ce.
2019-09-25 14:55:17 -07:00
Hariharan Seshadri
dbff8272e7
Update ONNX to newer commit (#1907) 2019-09-24 19:25:34 -07:00
Hariharan Seshadri
aacfa2af65
Bump up ONNX to the latest commit (#1868)
* Initial commit

* Delete unnecessary files

* Update generated proto files

* Update server proto file

* Update submodule onnx

* Update OnnxMl.cs

* update OnnxMl.cs

* Update OnnxMl.cs

* Comment one test

* Update disabled test list

* Update backend tests

* Formatting fix

* Formatting

* Disable a test

* More tests updated

* commit id update

* Update to a newer commit

* More updates

* More test updates

* Update

* Update

* Updates

* Update
2019-09-20 18:15:16 -07:00
Hector Li
582a27f546
remove sudo from the cleanup step for Linux so that we don't need sudo access for the vstsagent build user
1. remove sudo from the cleanup step for Linux so that we don't need sudo access for the vstsagent build user
2. a minor fix in install_ubuntu.sh to make the image smaller for openvino
2019-09-18 11:22:37 -07:00
Changming Sun
dc03ce0278
New OP: CDist (#1808)
Add a new op for the scikit-learn converter. It implements SciPy's cdist function:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.cdist.html

Will add docs and shape-inference function later.
Will convert it to an ONNX function before pushing into ONNX.
2019-09-17 10:55:31 -07:00
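The cdist semantics the new op mirrors can be sketched in NumPy. This is a minimal illustration of the Euclidean metric only, not the ORT kernel implementation:

```python
import numpy as np

def cdist_euclidean(xa, xb):
    # Pairwise Euclidean distances between rows of xa (m, k) and xb (n, k),
    # matching scipy.spatial.distance.cdist(xa, xb, metric="euclidean").
    diff = xa[:, None, :] - xb[None, :, :]    # broadcast to (m, n, k)
    return np.sqrt((diff ** 2).sum(axis=-1))  # reduce to (m, n)

xa = np.array([[0.0, 0.0], [3.0, 4.0]])
xb = np.array([[0.0, 0.0], [3.0, 0.0]])
print(cdist_euclidean(xa, xb))
# [[0. 3.]
#  [5. 4.]]
```

The result is an (m, n) matrix where entry (i, j) is the distance between row i of the first input and row j of the second.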
Bowen Bao
8712a523a4
Bump onnx to latest (#1756)
* Bump onnx to latest

Update onnx.in.proto with changes for SparseTensor.

* add temp skip tests

* remove passed tests from skip list

* skip more tests for new ops in opset 11

* skip crashing tests

* update handling of new attribute types sparse tensor and sparse tensors

* advance onnx commit and remove skip cpu_flaky_tests

* temporarily skip yolo3 model test due to resize opset10 shape inference regression

* update proto for onnxruntime server

* advance onnx commit further
2019-09-12 11:46:49 -07:00
Hector Li
2b8677b210
Enable Openvino nightly build on edge device (#1684)
1. Add openvino GPU nightly build pipeline; this test runs on an Intel UP Squared edge device. The devices are hosted locally, not on Azure VMs. We persist a smaller set of model test data on the edge device.

2. Update the build condition for openvino GPU so it works for GPU_FP32, GPU_FP16

3. add an option to install_ubuntu.sh to exclude the packages used for nuphar, so that we can save some disk space, as edge devices usually have limited disk space.
2019-09-11 16:36:12 -07:00
KeDengMS
58fe5a6bf1
Enable Nuphar docker build, and reinstate Nuphar tests (#1757)
Enable Nuphar EP docker build
Revert back to LLVM 6.0.1
Reinstate Softmax tests that were disabled because of LLVM 8.0.1
Reinstate Nuphar Python test that was disabled due to a stale sympy version
Increase build timeout of Linux CI
2019-09-05 08:50:48 -07:00
Changming Sun
94d9161166
Add nuphar to Linux CI build (#1750) 2019-09-03 11:39:27 -07:00
Ashwini Khade
0044be6259
update onnx to latest commit (#1622)
* update onnx to latest commit

* Disable and/or fix failing tests

* disable not yet implemented tests for opset 11

* disable tests

* fix bug in mkldnn fp16 graph check
2019-08-15 17:10:32 -07:00
Hariharan Seshadri
28a6f6b11b
Add back MacOS leg of the Python packaging job (#1523) (#1526)
* Add MacOS leg of Python packaging job

* Update copy files source directory for Mac OS leg

* Add a task to display the binaries directories contents after build wheel creation

* Revert some changes

* Add task to log

* Update

* Remove unnecessary logs
2019-07-31 15:57:26 -07:00
daquexian
ec3c553501 NNAPI EP Update (#1483)
* Update DNNLibrary

* Allow fp16 by default

* Add nnapi build in ci

* Fix nnapi ep after #1268

* Remove unused variables

* Support nnapi in onnx_test_runner

* Update DNNLibrary to fix tests

* Update build.py for android build support, solve conflict of
tools/ci_build/build.py

* Support non-ARM Android build, solve conflict of tools/ci_build/build.py

* Enable android test by x86_64 android emulator

* Add dnnlibrary/NNAPI support in build.py

* suppress the verbose adb output

* Remove debug logs

* Install cmake by pip

* Fix undefined host_protoc_path

* cmake==3.13.2 in pypi is actually 3.12.2, so install 3.13.2.post1 instead

* Fix Android ARM64 build

* Use android ndk r20 instead of r19c, fix conflicts in install_deps_android.sh
2019-07-24 13:20:05 -07:00
Ke Zhang
638398e675
sync onnx to get equal op with float support (#1432)
* sync onnx to get equal op with float support

* doc update

* fix test failure because of updated shape inference logic for roialign.

* filter consum test cases since it's not implemented yet.
2019-07-19 13:19:09 -07:00
suryasidd
e9e777925f [OpenVINO-EP] Added support for OpenVINO R1.1 (#1438)
* Initial commit for OpenVINO R1

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Fixed MO dynamic shape error

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Add debug messages for failure

* Update install_openvino.sh script

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Try/catch included. Return type of Isgraphsupported function changed to void

* Removed error_msg variable and commented code

* formatting cleanup

* Added missing return statement

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Changed MO to be compatible with both R5 and R1

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Updated docker scripts to include openvino version number

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Ignore compiler warnings from external headers

* Updated dockerfiles

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Code cleanup using clang-format

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Suppress model optimizer info error

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Python code formatting using auto pep8

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Updated documentation

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
2019-07-19 00:52:15 -07:00
Colin Versteeg
5ee0f185dc Add GRPC support to ONNX Runtime Server (#1144)
* add grpc

* add-submodule

* Revert "add-submodule"

This reverts commit e35994b25035ce310a98909658582bff759ee358.

* fix submodule

* IT BUILDS

* Initial commit of prediction_service_impl.cpp

* Server builds and runs!

* add request id, health and reflection. GRPC is done

* enable channelz for monitoring

* GRPC unit tests

* clang format

* add unit tests

* Add function tests for GRPC

* add grpc to model_zoo_tests

* revert update protobuf to 3.7.0

* update submodules

* builds but runs some gflags tests which fail

* get build working

* confine build changes to onnxruntime_server.cmake

* update build files

* code reveiw comments

* Maik's code review comments

* update cares version to fix compilation issue

* update build to fix c-ares

* code review comments

* update cgmanifest.json

* remove extraneous file

* Klein comments.

* update ci based on discussions for go dependency

* fix tag issue

* fix build issues

* remove stray submodule

* update dockerfile and build script

* dynamic linking changes

* update build script

* code review comments

* update dockerfile

* update script for mount

* code review comments
2019-07-18 11:10:38 -07:00
Matthieu Darbois
04d581995d Use manylinux2010 image to build linux python wheels (#1282)
* Update cuda for python wheels

* Update cuda for python wheels

* Update cuda for python wheels

* Update azure-pipelines-py-packaging.yml

* Update to cuda 10

* Only test win gpu

* Update cuda for python wheels

* Use manylinux2010 image to build linux python wheels

Allow wheels built to truly be compliant with a manylinux policy
2019-06-27 15:45:06 -07:00
Scott McKay
0951f53c80 Update ONNX to d94f99d21a9a0820d58966410ceaf525132f85f1 to pickup change to checker that makes ssd_mobilenet model load 20x faster by avoiding unnecessary copies. (#1307) 2019-06-27 08:39:41 -07:00
Ashwini Khade
a571ea74a6 update onnx (#1287) 2019-06-24 14:17:27 -07:00
Raymond Yang
c96049fe4a Update ONNX version to include new fixes/changes (#1250) 2019-06-18 14:39:36 -07:00
RandySheriffH
a4148c85a5
call install_onnx.sh with relative path (#1225)
* call install_onnx with relative path

* use caller path

* get abs path of current script

* process unicode char

* replace script name

* add suffix as part of match

* match only the end of path
2019-06-18 13:28:09 -07:00
S. Manohar Karlapalem
8d15ffd8f5 Initial commit for OpenVINO Execution Provider (#935)
* Initial commit for OpenVINO Execution Provider

OpenVINO Execution Provider provides the interface for ONNX Runtime
applications to access Intel's hardware accelerators using Intel's
OpenVINO Toolkit.

* Fixed bug in GetCapability to disable custom ops

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Added OPENVINO ci pipeline

Added new pipeline for openvino provider,
made changes to support the docker build and
onnxruntime build with openvino.

Signed-off-by: Luis Daniel Castellanos <luis.daniel.castellanos@intel.com>

* Enabled all unit tests for OpenVINO EP

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Fixed syntax issue in run_docker_build.sh file

* Added missing default OPENVINO_VERSION

Default value for the OPENVINO_VERSION env var was
missing, causing the build to fail

* Added install Model Optimizer deps step

* Fixed python unit tests and some tests from onnx_backend_test_series

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Fixed indentation bug

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Disabled some of the python backend tests for OpenVINO

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Disabled some model tests

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Remove Duplicate checks for openvino in build.py

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Modified GetCapability for FP16

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Disabled GPU FP32 tests that are not supported

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Convert modelProto to string and use it in compile

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Pass byte-array input args to MO

* Serialized ModelProto passed in-memory to MO

ModelOptimizer python module receives the serialized  ModelProto
in-memory.
Uses appropriate ONNX function to load the serialized bytes.

* Make Py_Finalize compatible with older python versions

Also, remove the possibility of an unassigned pFunc variable.

* Fallback if input dims of Matmul is greater than 2

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* fixup: Device #define syntax

* Updated the documentation

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Enable dynamic dim value

* removed commented out code

* Added Dockerfile for openvino EP

Updated instructions on dockerfiles/README.md file

Signed-off-by: Luis Daniel Castellanos <luis.daniel.castellanos@intel.com>

* Disabled fp16_inception_v1 test

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Code formatting with clang-format

Uses style from the .clang-format file in root directory.

* fixup: docker tag and build error fixes

* Heuristics to automatically detect batching

Distributes slices from batch into parallel infer-request objects.
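The distribution step described here might be sketched as round-robin assignment of per-sample batch slices to a fixed pool of infer-request slots. The function name and the round-robin policy are assumptions for illustration, not the EP's actual heuristic:

```python
def distribute_batch(batch, num_infer_requests):
    """Hypothetical sketch: split a batch into per-sample slices and
    assign them round-robin to parallel infer-request slots."""
    slots = [[] for _ in range(num_infer_requests)]
    for i, sample in enumerate(batch):
        slots[i % num_infer_requests].append(sample)
    return slots

slots = distribute_batch(list(range(10)), 4)
print(slots)  # [[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]
```

Each slot's samples can then be submitted to its own infer request and run concurrently.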

* Handle disabled tests in GetCapability

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Disabled average pool and max pool if ceil_mode is 1

Also dilations are not supported if they are greater than 1

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Disabled Unsqueeze int32 test

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* changes to fix output results bug

* Disabled a few C++ unit tests for MYRIAD FP16

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Manually revert '9fe162bb Enable dynamic dim value'

Reverts compile time setting of dynamic shape
Reverting manually due to significant auto-revert conflicts.

* Fixed unused variable warning

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Disabled Mul test for GPU_FP16 due to accuracy issue

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* VPU documentation update

* Disabled inception_v1 for MYRIAD and HDDL

* Also disabled a few C++ accuracy tests for HDDL

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* updates from upstream

* use the new CustomOpApis for I/O interfacing

* Pass initializers as subgraph meta-def inputs in GetCapability()

Requirement due to API changes introduced with PR# 1019.

* Remove obsolete functions

* Save indexes of graph inputs from fused_node info

Both inputs and initializers are passed as data inputs to the
infer function. To identify only the inputs among them, save their
index info from the fused_node in the Compile function.
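The index bookkeeping described in this commit can be sketched as follows. The names here are hypothetical; the real code reads this information from the fused node's input defs:

```python
def true_input_indexes(data_input_names, initializer_names):
    """Hypothetical sketch: among all data inputs passed to the infer
    function (graph inputs plus initializers), record the indexes of
    the true graph inputs so they can be picked out at run time."""
    init_set = set(initializer_names)
    return [i for i, name in enumerate(data_input_names)
            if name not in init_set]

idx = true_input_indexes(["x", "W", "b", "y"], ["W", "b"])
print(idx)  # [0, 3]
```

At inference time, only the positions in `idx` need fresh data from the caller; the rest are constants baked into the compiled network.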

* Documentation changes to enable VPU

* Fix VPU related changes in documentation

* Fix minor changes in documentation

* Fix VPU related changes in documentation

* Use Node.In/OutputDefs() to track graph inputs and outputs.

Don't use graph_viewer's GetInputs() or
GetInputsIncludingInitializers().

* Permit "SAME_UPPER" auto_pad attribute from MaxPool

* Disabled fp16_tiny_yolov2 in onnx model tests

* Updated documentation to include configuration guides for myriad and hddl

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Use 8 Infer requests only for VAD-R

* disable debug prints

* Clang-format source files

* Updated BUILD.md with OpenVINO R5 links

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Disabled same upper python tests

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Update test exclusion syntax

* Change path of install_onnx.sh

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Disable tiny_yolov2 in broken tests

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Revert "Change path of install_onnx.sh"

This reverts commit ba9db165f3be430f2aff1ef413299ed04637196a.
This change is only required for Intel internal CI pipeline until
the settings are matched with the upstream's CI pipeline.

* Added debug statements for debugging CI error

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Add --build_wheel to linux openvino pipeline

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Added -v option to onnx_test_runner for debugging

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Removed path change patch

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Added -c 1  to onnx_test_runner

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Refactor MO python invocation in separate function

Cleans up Model Optimizer python invocation check and conversion
logic. Invokes MO only once in GetCapability() and passes the
IR strings (xml and bin) to the Compiler as meta-def attributes.
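
The refactor described above (convert once in GetCapability(), cache the IR for Compile) can be illustrated with a minimal Python sketch; `convert_with_mo`, `build_engine`, and the attribute names are hypothetical stand-ins for the real Model Optimizer invocation:

```python
def get_capability(graph, convert_with_mo):
    """Run the Model Optimizer exactly once and stash the resulting
    IR strings as meta-def attributes for the Compile step."""
    xml_str, bin_bytes = convert_with_mo(graph)
    return {"ir_xml": xml_str, "ir_bin": bin_bytes}

def compile_subgraph(metadef_attrs, build_engine):
    """Compile() reads the cached IR instead of re-invoking the converter."""
    return build_engine(metadef_attrs["ir_xml"], metadef_attrs["ir_bin"])
```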

* Add comments

* code cleanup and comments

* Code cleanup for GetCapability

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Removed unnecessary files

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Revert "Added -v option to onnx_test_runner for debugging"

This reverts commit d1dd70938a94d648df1a1dbbc2e48d0b97e49ec8.

* Revert "Added debug statements for debugging CI error"

This reverts commit b86d41afed2aa29c3508155d6f9c8d3a7263cc60.

* incorporate Status Code changes

* ComputeFunc returns Status::OK() on success

* Use test names to disable tests for MYRIAD and VAD-R

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Rename local identifiers from CNNNetwork to OpenVINO network

CNNNetwork is an OpenVINO's API class that represents more than
just convolutional neural networks (CNNs). Renaming helps to avoid
confusion that the API's only support CNN type models.

* Added error message if building on windows

* Removed duplicate option in Cmake
* Removed unnecessary parameters in activation_opt_test

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Refactor Map search and access logic for efficiency and cleanliness.

* use C++ style casts

* Use os.path.join for python directory path operations

* use C++ style casts

* EP classes should use onnxruntime namespace

* Clean up fixes from PR comments

* Don't explicitly shutdown Py interpreter

* Remove debug print statements

Prints will be re-enabled later with a logging mechanism with
debug/verbose printing options.

* Decrement ref counts for used pyObjects

* Restore build instructions for other compilers

Content under the "Using other compilers" section has been
accidentally deleted by a previous commit. Restoring back that
content from the latest upstream repo.

* CMake code cleanup

Code clean up, commenting and formatting of CMake code.

* Don't pass the unused device_info parameter to OpenVINOGraph ctor.

* Add support for multiple I/O data types

Adds support for the following tensor data types for graph inputs
and outputs:
1) float
2) float16
3) int32
4) int16
5) int8
6) uint16
7) uint8

* cleanup setup.py module list definition

* Deduce index of input using tracked input index map

Ignores initializers in case they are ordered before inputs.

* Removed debug statement in MO code

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* PR feedback

* Removed per_sample_tolerance for openvino
* Removed unnecessary disabled tests

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Removed debug function

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Disabled tiny_yolo_v2 due to accuracy issues

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Changed the disabled reason for broken tests

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Disabled Reshape with no input

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Python formatting with Autopep8

* Minor fix for MYRIAD devices

* Added zero dimension check

* Removed setting batch size for the network

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Set the threshold to larger value for MNIST

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Removed setting higher threshold in provider_test_utils

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Check for --use_openvino in python wheel setup.py

Add OpenVINO modules to the setup script so they are included in the
wheel package only when the --use_openvino build option is given.

* Removed nullptr checks for GetNode()

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
2019-06-18 08:58:53 -07:00
RandySheriffH
6850e55966
Fix install_onnx.sh issue (#1201)
* switch from map to array to keep visiting sequence

* add comment

* improve comment
2019-06-13 13:37:59 -07:00
Dmitri Smirnov
a92998c235
Uncomment ConstantOfShape tests. (#1059)
Advance ONNX submodule to 5c51f0dbbe88ee1536f17ee7bd462b2ab3772c52
  This commit in ONNX contains a fix to ConstantOfShape test data.
  Uncomment ConstantOfShape.
  Update test script, make sure exclusions are uniform.
2019-06-10 14:36:36 -07:00
Changming Sun
be36385a8c
Delete docker/scripts/install_deps_x86.sh and enable onnx tests for x86 (#1191) 2019-06-08 16:17:18 -07:00
Changming Sun
ccab8165eb
Delete scripts/install_ubuntu_x86.sh (#1189)
* Delete scripts/install_ubuntu_x86.sh to reduce duplicated code
2019-06-07 15:48:52 -07:00
RandySheriffH
4757933afe
Exclude test by onnx version tag (#1073)
* add version filter to failed tests

* exclude test from backend

* exclude shrink from opset 9

* fix compile err

* exclude certain version of constant shape

* enable flatten test

* fix compile err

* comment mvn test

* disable constantofshape test in x86

* disable x86 test

* get model version from imported opset

* test linux x86 case

* disable nonzero opset 10

* make mutex const

* test filter by commit id

* adjust substr offset

* Limit test platform

* remove change impacting TFModleInfo.h

* refactoring

* refactoring

* test x86 pipeline with filter

* add comment

* restrict version extraction on non-win

* restrict version extraction on non-win

* add tag

* exclude case from backend test

* remove dup

* remove dup

* make script runnable

* hard code absolute path

* refactor log

* fix x86 compile err

* fix x86 compile err

* fix x86 compile err

* sync with latest tensorrt

* switch to regex

* fix cpu pipeline err

* test filter

* disable nonzero from all versions
2019-05-30 16:19:06 -07:00
Bowen Bao
a42222f9de
bump onnx version & fix conv/pool tests (#1067) 2019-05-21 09:52:41 -07:00
Klein Hu
c2b412f7be Update the ONNX Runtime Server CI pipeline setup (#986)
* Update the ORT-SRV ci pipeline setup

* Update pip package installation for server tests

* Install requests package in build setup

* Check if python dependencies exists before install
2019-05-08 11:37:39 -07:00
Ashwini Khade
f4fd36ee91
merge rel-0.4.0 into master (#959)
* Accommodate missing optional 'axes' when 'steps' is present in Slice op (#946)

* Accommodate missing optional axes when steps is present in Slice implementation

* PR feedback

* Update package links (#937)

* Update package links

* Minor fix

* Update README.md

* Minor edit

* Update onnx commit (#949)

* Update onnx commit

* disable failing tests which don't have to be fixed for this release

* dummy change to fix file permission

* fix file permission
2019-05-03 09:07:19 -07:00
Raymond Yang
01cd7eaca8 Bump up onnx version (#936)
* bump up onnx version
2019-04-30 08:44:32 -07:00
Changming Sun
1f066d4dc4 Update onnx (#893) 2019-04-24 21:31:49 +10:00
Hector Li
e8d722003a
Move NMS to Onnx domain (#865)
* move files

* move files

* Remove NonMaxSuppression from Contrib op, move it to Onnx domain, opset 10

* move NMS out of namespace contrib

* update data type in UT

* update to latest onnx

* white list the node test for Mod which is not implemented yet
2019-04-22 13:24:27 -07:00
nivas-x86
a4d7052aeb Add nGraph Execution Provider (#832)
* Add nGraph Execution Provider

* feedback changes 1

* feedback2

* Feedback and upgrade nGraph

* Feedback 4

* Fix CI

* Disable new ops
2019-04-20 17:02:35 -07:00
Changming Sun
d78c340eac update onnx (#861)
* update onnx

* ignore some tests
2019-04-19 10:52:47 -07:00
Changming Sun
687bac455d Convert eigen to a submodule and update it to the latest version 2019-04-18 21:24:56 -07:00
Bowen Bao
ed0c86cd90 update onnx to fix matmul shape inference (#847)
* update onnx to fix matmul shape inference

* update onnx submodule hash in cgmanifest.json and ci scripts
2019-04-18 14:52:48 -07:00
daquexian
ac82c1f483 enable android build (#715)
* enable android build

* Add 'log' to onnxruntime_EXTERNAL_LIBRARIES

* Remove cmake about header_files_test.cc

* Add Android CI pipeline

* Remove some ms-specific(?) ci

* Fix bash error

* Add execute flag for install_deps_android.sh

* Add install_ubuntu_for_android.sh

* Remove python in deps for android

* Add comment for BUILD_ARCH

* Set BUILD_SERVICE to cpu

* Set BUILD_OS in run_build.sh

* Fix -o bug in run_build.sh

* Android -> android

* Correct the android ndk location

* Checkout submodules in my own azure pipelines

* Revert "Remove some ms-specific(?) ci"

This reverts commit 302463213480487d8944c3127a3b311c591d55c0.

* Revert "Checkout submodules in my own azure pipelines"

This reverts commit 1acfb6755f933e532b8312ca35bb4900a833903f.
2019-04-18 09:59:04 +08:00
Ashwini Khade
07e6dfa7ab
update onnx and enable tests for qlinearconv (#840) 2019-04-16 09:43:17 -07:00
Pranav Sharma
4b4a359943
Exclude unreferenced global data and op doc strings in the opschema object. The first causes a decrease in the binary size by at least 85k. The latter reduces resident memory size. (#823)
* Exclude unreferenced global data and op doc strings in the opschema object. The first causes a decrease in the binary size by at least 85k. The latter reduces resident memory size.

* Update onnx to incorporate my PR that fixes SetDoc compiler warnings
2019-04-15 15:57:19 -07:00
Ashwini Khade
10b113f144
update onnx to bring in quantized ops (#808)
* update onnx + move quantized ops kernels and test to onnx + remove exp ops

* update onnx

* Revert "update onnx"

This reverts commit 533abfc297e75473a74505fb89921ffc05c46a1c.

* add generated csharp test file
2019-04-10 17:20:35 -07:00
Changming Sun
290112d614
Update onnx (#761)
* update onnx
2019-04-04 10:58:45 -07:00
Ashwini Khade
8bc532bfb9 update onnx and add removed experimental ops to contrib ops (#723) 2019-04-02 22:30:00 -07:00
KeDengMS
deaea702ff
Bump up cmake_minimum_required to 3.13 (#722)
This is consistent with CI version. cmake 3.11 has issues with CUDA build in Linux.
2019-03-27 14:45:24 -07:00
Raymond Yang
c35b605b8d
Support updated opschema with functionbody (#640)
* Update onnx

* Support updated function schema in ORT

* Update onnx related commit hash

* Check out an older commit in ONNX

* Add support for subgraph attribute

* Add comments
2019-03-27 11:38:10 -07:00
Changming Sun
a26696fb0e Enable LTO on Linux 2019-03-22 15:30:37 -07:00
stevenlix
e8b0ae8923
Trt execution provider (#382)
* updated cmake files for trt

* added trt execution provider

* added trt basic test

* removed trt_path action attribute

* Add files via upload

* Update build.py

* Update trt_allocator.h

* fixed issues found by reviewers

* changed cast operator

* added comment for custom kernel implementation

* changed auto to auto&

* changed to function compile APIs for TRT execution provider

* changed to function compile APIs for TRT execution provider

* added new DType DInt64

* adapted to the changes of onnxruntime_c_api

* removed trt kernel (use function compile instead)

* updated onnx-tensorrt submodule

* set default memory type to TRT fused kernel

* resolve merge conflict

* fixed the issue that USE_CUDA conflicts with USE_TRT

* construct graph by adding nodes in topological order

* made changes for Windows

* change buffers type

* bypass HasImplementationOf check for TRT XP because TRT kernel is not registered

* added domain to version info in rebuilt model proto

* added trt to test option list

* added DomainToVersionMap() to GraphViewer

* removed Copy()

* fixed broken code

* format the code to clang format

* used local reference to the frequently used values

* fixed a couple of issues according to reviewers feedback

* fixed a couple of issues according to reviewers feedback

* added python binding for TRT and enable use_cuda when use_trt is on

* fixed a redefinition issue

* changed shared_ptr to unique_ptr on trt engines, and made a few changes required by reviewers

* enabled trtexecution provider for unit tests

* renamed trt to tensorrt

* added tensorrt to python binding

* update submodule onnx and onnx-tensorrt

* made a couple of minor changes based on reviewer's feedback

* added CUDA_CHECK

* removed test code

* fixed broken code after merge

* updated onnx-tensorrt submodule

* added post processing to align trt inputs/outputs with graph inputs/outputs

* updated onnx submodule

* added CUDA fallback for TensorRT and fixed TensorRT cmake issue

* added ci pipeline for tensorrt and removed some redundant code from trt xp

* fixed syntax issue

* updated onnx-tensorrt submodule

* fix trt build problem by: (#602)

1. Add additional /wd for debug build
2. Add io.h for additional targets
3. Bring back mb version of getopt

* Update install_ubuntu.sh

* Update linux-gpu-tensorrt-ci-pipeline.yml

* Update linux-gpu-tensorrt-ci-pipeline.yml

* Update run_build.sh

* Update run_build.sh

* Update run_build.sh

* Update run_build.sh

* fixed the issue that GetKernelRegistry returns nullptr

* merged master to this branch

* moved some data types to private

* fixed tensorrt CI pipeline issue

* customized test data for TensorRT pipeline

* added onnx-tensorrt in json file and fixed an issue in ci script

* added comments
2019-03-14 12:00:39 -07:00
Hariharan Seshadri
cfb08c4848
TopK op: Promote onnx to a newer commit and handle changed TopK spec for opset 10 (#611)
* Initial commit

* Nit fix
2019-03-13 10:21:58 -07:00
Ke Zhang
5bb842538d
sync onnx and maintain old version history for removed exp ops (#588)
* sync onnx and maintain old version history for removed exp ops in onnx runtime.

* update

* updating to specific onnx commit - remove exp ops.

* update

* disable the 3 failures to push the change as it's blocking folks.

* update test
2019-03-12 18:48:27 -07:00
Randy
f048fc5fb0 cross compile x86 linux (#562)
* cross compile x86 linux

* fix comments

* install multilib for ubuntu cross compile

* remove tailing slash

* fix -fPIC relocations for x86 target too

* add asm make flag

* fix x86 compile err

* test x86 with zlib and png

* Disable zlib from x86

* install x86 python header

* remove cross-compiling changes

* test 32bit ubuntu

* add x86 ubuntu docker file

* add x86 as arch parameter for docker build

* config pipeline

* avoid dotnet install

* install cmake

* skip dep install

* use latest ubuntu

* install latest cmake

* install x86 deps

* configure cmake

* install ninja

* correct ninja dir

* apt get re2c

* install onnx

* set processor x86

* disable warning

* skip test

* disable test

* disable test

* find lib

* fix typo

* restore test

* disable backend model test

* disable test

* fix test err

* stop installing onnx

* disable onnx test on x86

* restore yml

* merged with master yml

* cancel needless config setting

* enable x86 flag

* restore all onnx tests

* fix yml typo

* install onnx

* add back x86 flag

* disable cases

* disable case

* disable cases

* add macro to disable cases

* fix typo

* print platform

* remove condition
2019-03-12 09:47:45 -07:00
Hariharan Seshadri
1d3fcc525a
deps: update onnx to a newer commit and update test exclusions (#542)
* Update onnx dep to a newer commit and update test exclusions

* Keeping Shrink excluded in C++ tests

* More changes
2019-03-05 12:03:16 -08:00
Raymond Yang
f5dfbba655
Clarify numpy version requirement (#537)
* update packaging numpy version to 1.15.0

* update version in numpy version in linux

* Install numpy 1.15.0

* Finish up numpy requirement after test

* Try fix

* Fix ci script
2019-03-05 11:07:28 -08:00
Raymond Yang
ec8ac04f30
Update cast op to support string <-> numeric (#379)
* Update cast kernel to support to/from string

* Update namespace

* Add support for literal numeric case

* Update to support -INF test

* Update kernel registration for cast

* Update ONNX to 1.4.1

* Update registry api

* Resolve some comments

* Update cast kernel implementation

* Resolve comments

* Fixed test data in onnx

* Update cast kernel implementation

* Resolve PR comments

* Update cast_op.cc

* Update onnx commits info

* Update comments
2019-02-12 10:10:56 -08:00
Raymond Yang
7cd393d697
Fix 3.7 build; Add cuda version in README (#427) 2019-02-06 13:38:04 -08:00
shahasad
f94fdad861
Fixes on the dotnet end-to-end test scripts to get it running on linux (#376)
* fixed typo in runtest.sh

* some fixes

* some fixes

* some fixes in the runtest.sh

* added test data url

* fixes on the dotnet test scripts

* fix on prior mistake regarding installation of apt-transport-https

* added verbosity in the test run for easy debugging

* updated comment in the runtest.sh
2019-01-24 13:14:29 -08:00
Scott McKay
bca8daf762
Update ONNX. Implement Scan 9 changes (#366)
* Update ONNX version to pickup Scan spec change that adds scan_output_axes.
Add logic to transpose an output
  - write to temporary buffer when executing subgraph
  - transpose temporary buffer into Scan output when execution completes
Add unit tests
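
The buffer-then-transpose step described above can be sketched in Python with NumPy (the helper name `finalize_scan_output` is hypothetical; the real implementation is C++):

```python
import numpy as np

def finalize_scan_output(iteration_slices, scan_output_axis):
    """Stack per-iteration subgraph results along axis 0 (the temporary
    buffer), then move the iteration axis to the position requested by
    the scan_output_axes attribute."""
    temp = np.stack(iteration_slices, axis=0)   # temporary buffer
    return np.moveaxis(temp, 0, scan_output_axis)
```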

* Update to ONNX dbf3581835e3a05716e10587511d7ab3b2cdc386 to pickup inferencing bugfix.
Update test to match.

* Disable some tests for opset 9 operators that haven't been implemented yet.
2019-01-24 08:10:39 +10:00
Changming Sun
948cc03490 upgrade onnx 2019-01-17 13:10:30 -08:00
Raymond Yang
0efc48a11a
Install dotnet sdk on linux ci (#320)
* Try install dotnet sdk on linux ci

* Fix install script

* Add configurable os version in docker build script

* Avoid use ARG in docker
2019-01-14 17:51:45 -08:00
edgchen1
34bcc92554 Added test data URL and checksum arguments to build.py. (#302)
* Added test data arguments to build.py, modified win-ci-pipeline build.

* Updated CI builds to use template tasks, added test data args, removed AZURE_BLOB_KEY uses.

* Fixed up set test data step template.
2019-01-09 22:33:14 -08:00
Changming Sun
5e113661a9 Build system upgrades (#281)
* update

* runas normal user
2019-01-07 13:15:24 -08:00
Dmitri Smirnov
7af1887b33
Introduce basic BFloat16 runtime support (#235)
* Add basic support for BFloat16 type.

* Advance onnx submodule for bfloat16 support.

* Update install_deps for linux.

* Address review comments.
2018-12-21 12:40:59 -08:00
Changming Sun
dc8b37f4c4
update onnx (#209)
* update onnx
2018-12-18 14:50:28 -08:00
Changming Sun
618cc51754
Update onnx_backend_test_series.py (#146)
* Update onnx_backend_test_series.py

* Update BUILD.md
2018-12-12 16:25:16 -08:00
Dmitri Smirnov
fbb23a9ed0
Implement StringNormalizer (#69)
* Implement StringNormalizer
  Add mixed-language tests, test case-insensitive path.
* Create a locale on the fly. The default locale does not seem to initialize well.
* Add CI language-pack-en to make the default locale available.
  Catch and translate the locale creation exception to make the message
  meaningful.
* Make sure locales are configured on Ubuntu.
2018-12-04 13:47:08 -08:00
Raymond Yang
1b3efc36c1 Add pipeline for building python wheels (#41)
* Add pipeline for building python wheels for Windows/Linux CPU and GPU

* try enable mkldnn

* remove mklml

* Update python packaging configuration

* Add python3.7 support

* Revert to disable the py37 packaging on windows
2018-11-27 20:02:41 -08:00
Pranav Sharma
89618e8f1e Initial bootstrap commit. 2018-11-19 16:48:22 -08:00