1. Fix training e2e pipeline. The failure was caused by my recent change #7632. The fix is adding "--cmake_extra_defines CMAKE_CUDA_ARCHITECTURES=70" to the build parameters because the machines are with V100 GPUs.
2. Simplify Nuphar pipeline. It doesn't need to install a separated ONNX version(1.5.0)
3. Fix a problem that run_dockerbuild.sh ignored OS version parameter. Now because it starts to take effect, I also set python version to the system default one(3.8 for ubuntu 20.04)
1. Update manylinux build scripts. This will add [PEP600](https://www.python.org/dev/peps/pep-0600/)(manylinux2 tags) support. numpy has adopted this new feature, we should do the same. The old build script files were copied from https://github.com/pypa/manylinux, but they has been deleted and replaced in the upstream repo. The manylinux repo doesn't have a manylinux2014 branch anymore. So I'm removing the obsolete code, sync the files with the latest master.
2. Update GPU CUDA version from 11.0 to 11.1(after a discussion with PMs).
3. Delete tools/ci_build/github/linux/docker/Dockerfile.manylinux2014_cuda10_2. (Merged the content to tools/ci_build/github/linux/docker/Dockerfile.manylinux2014_cuda11)
4. Modernize the cmake code of how to locate python devel files. It was suggested in https://github.com/onnx/onnx/pull/1631 .
5. Remove `onnxruntime_MSVC_STATIC_RUNTIME` and `onnxruntime_GCC_STATIC_CPP_RUNTIME` build options. Now cmake has builtin support for it. Starting from cmake 3.15, we can use `CMAKE_MSVC_RUNTIME_LIBRARY` cmake variable to choose which MSVC runtime library we want to use.
6. Update Ubuntu docker images that used in our CI build from Ubuntu 18.04 to Ubuntu 20.04.
7. Update GCC version in CUDA 11.1 pipelines from 8.x to 9.3.1
8. Split Linux GPU CI pipeline to two jobs: build the code on a CPU machine then run the tests on another GPU machines. In the past we didn't test our python packages. We only tested the pre-packed files. So we didn't catch the rpath issue in CI build.
9. Add a CentOS machine pool and test our Linux GPU build on real CentOS machines.
10. Rework ARM64 Linux GPU python packaging pipeline. Previously it uses cross-compiling therefore we must static link to C Runtime. But now have pluggable EP API and it doesn't support static link. So I changed to use qemu emulation instead. Now the build is 10x slower than before. But it is more extensible.
* Install and use conda on ortmodule CI pipelines
* Update build script to install onnxruntime wheel before running unit tests
* Remove python 3.5 from install_python_deps
* Pinning deepspeed version to 0.3.15
* first attempt rocm training wheel
* modifications needed to python packaging pipeline for Rocm 4.1
* changges to not conflict with cuda
missed stage1 changes
remove package push
add option r to getopt
try again without python install
try again without python install
try again without python install
split pipelines and add back push to remote storage
try on cuda gpu pool
try again
try again
try running without az subscription set
try again on original pipeline
change pool
passing AMD Rocm whl on AMD-GPU pool
split rocm pipeline from cuda pipeline
remove comments
* try adding Rocm tests as well
* try with tests in place
* fix trailing ws
* add training data
* try again as root for tests
* use python3
* typo
* try to map video, render group into container
* try again
* try again
* try to avoid yum error code
* make UID 1001
* try without yum downgrade
* define rocm_version=None
* remove CUDA related comments for Rocm Dockerfile
* Dont pin nightly torch torchvision torchtext versions as they expire (for now nightly is required for Rocm 4.1)
* missed requirements-rocm.txt from last commit
* fix whitespace
1. Migrated it to Ed's new docker build script
2. Use python 3.6 instead, because it is the default one in ubuntu 18.04
3. Move the "pip install" command to the docker image build stage(instead of when running the image)
Update gpu packaging pipelines to CUDA11
In the next release we will use CUDA 11. And our CUDA 11 build suddenly became broken because recently CentOS 7 posted an update of glibc. The version of glibc was changed from 2.17-317.el7 to 2.17-322.el7_9. But the newer one isn't compatible with CUDA 11. We have to downgrade it.
* Deprecate Python global configuration functions [Part 1] (#5923)
Enable options to be set via execution provider (EP)-specific options and log deprecation warning from current global configuration functions.
* remove dnnl_dll_path from post build copy (#6142)
* Model Fusion For Bart (#6105)
Fusion fix for Bart models
* Unify IExecutionProvider and IExecutionProviderFactory interfaces (#6108)
* Remove Provider_IExecutionProvider and make the internal IExecutionProvider usable by shared providers
* Change Provider_IExecutionProviderFactory to be the core version.
* Enable running the mnist_training sample without cuda (#6085)
Signed-off-by: George Nash <george.nash@intel.com>
* nnapi add min max support (#6117)
* Fix CUDA test hang: (#6138)
- Make condition check in `CUDAAllocatorTest` to ensure CUDA device is present.
* Fix TensorRT kernel conflict issue for subgraphs of control flow operators (#6115)
* add static subgraph kernel index
* change kernel naming to avoid conflicts
* Add gradient registration for Abs. (#6139)
* Partition initial optimizer state for Zero-1 (#6093)
* Initial changes
* Working changes
* Working changes
* Cleanup
* fix windows CI
* Review comments
* review comments
* Fix edge case in BFCArena where allocation failures could lead to an infinite loop. (#6145)
#4656
* Revert "work around of the build break in mac (#6069)" (#6150)
This reverts commit 3cae28699b.
* Fix clean_docker_image_cache.py detection of image pushes. (#6151)
Fix clean_docker_image_cache.py detection of image pushes. They were being ignored because the expected HTTP status code was wrong. For pushes, it's 201 instead of 200.
* MLAS: add NEON version of int8 depthwise convolution (#6152)
* Using a map of of ops to stages as input of partition function. (#5940)
* New partition algorithm running before AD
* Convert cut_group_info into device map. Work in progress -- works for bert-tiny with pp=2
* Removing code for partition of bwd graphs
* Remove old code
* Adding some verification code
* Handle Shared Initializer
* Renaming rank with stage
* Added first unit test
* new test
* redundant check
* undo change in bert
* Moved cut-based partition to testing utils file
Co-authored-by: xzhu1900
Co-authored-by: wschin
* New conversion function and tests
* minor
* remove test that is not needed2
* improve GetDeviceAssignment and PR comments
* minor changes
* PR comments
* improving documentation and variable naming
* add documentation
* Variable naming and docs
* more doc improvements
* more doc improvements
* missing static cast
* Fix test file for windows
* Fix test file for windows
* Fix test file for windows
* stage id is not the same as rank id
* PR comments
* PR comments
* More comments
* More comments
* Minor fix to satisfy c++14 (#6162)
* Deprecating Horovod and refactored Adasum computations (#5468)
deprecated horovod submodule
refactored adasum logic to be ort-native
added tests for native kernel and e2e tests
* Update TensorRT-ExecutionProvider.md (#6161)
* Bugfix for topk cuda kernel (#6164)
* fix the issue that std::numeric_limits cannot handle half type
* adding a test
Co-authored-by: Du Li <duli@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
* Revert "Fuse MatMulIntegerToFloat only when scales are scalar (#6008)" (#6169)
This reverts commit f2dcba7afe.
* Remove ignored build warnings for pybind on Mac (#6165)
* save_checkpoint, load_checkpoint and aggregate_checkpoints (#6136)
* save_checkpoint and load_checkpoint implementations
* checkpoint aggregation logic
* unit tests for save_checkpoint, load_checkpoint and aggregate_checkpoints
* Don't try to bind unused inputs in the Training frontend (#6166)
* Update documentation for contributing a PR and add deprecation notices for PyOp and ORT server. (#6172)
* aggregate model states only for the case when mixed precision was true (#6176)
* [NNAPI EP] Enable per-channel quantization for QlinearConv (#6155)
* Enable qlinearconv per-channel quantization
* Fix the android CI test failure
* Add Android Version Check for Per-Channel Quant
* Address PR comments
* Fix some minor issues
* Add verification of per-channel zero points
* Make the error tolerance configurable
* Fix typo in BERT pretraining script (#6175)
A misplaced `}` meant that the `'enable_adasum'` option was interpreted incorrectly, causing the test to fail.
* Update get_docker_image.py to enable use without image cache container registry. (#6177)
Update get_docker_image.py to enable use without image cache container registry.
* Helper for compiling EP to generate deterministic unique ids for use in MetaDef names (#6156)
* Create a helper for generating unique ids that can be used by an EP that creates compiled nodes and needs ids to be deterministic for a model when used in multiple sessions.
Added to IExecutionProvider as this can potentially be used by all compiling EPs and is more robust than a simplistic counter (although EP implementer is free to choose either approach).
* Restructure the helper so it can be called across the EP bridge.
Add ability to call id generation helper from EP bridge
- convert DNNL EP to use helper to validate
Address issue where a new Model may be loaded into the same address as a previous one.
- hash the bytes in the Graph instance (1728 bytes currently) to use as the key to the full hash for the model
Add lock around id generation to ensure no issues if multiple sessions partitions graphs at exactly the same time.
- Extremely unlikely but would be hard to debug and the locking cost is not an issue as it's only incurred during graph partitioning and not execution.
* Backend APIs for checkpointing (#5803)
* Add backend API GetOptimizerState and GetModelState
* add GetPartitionInfoMap
* Android coverage dashboard (#6163)
* Write the report to a file.
* Post code coverage to the Dashboard database.
* Add usage details of unified MCR container image (#6182)
Going forward, a single unifed docker image will be published in
MCR. The hardware accelerator target choice will have to be made
in the application using OpenVINO EP's runtime config options.
* improve perf for softmax (#6128)
* improve perf for both gathergrad and softmax
* revert the change in gathergrad and will be done in another PR.
* address comments from code review.
* Tune fast Gelu to use exp(x) instead of tanh(x) on Rocm platform (#6174)
* tune fast gelu to use exp(x) instead of tanh(x) on rocm
* update to use expression 2/(1+exp(-2x))-1 for stability
* Add Status.csv to EP Perf Tool (#6167)
* merge master, keep postprocess status commit
* download float16.py everytime
* removing hardcoded values
* Lochi/quantization tool for trt (#6103)
* Initial implementation of generating calibration dynamic range table
* Initialize validation support for Quantization
* Initialize validation support for Quantization (cont.)
* Improve validation support for Quantization
* Improve validation support for Quantization
* Rewrite/Refine for calibration and validation
* Rewrite/Refine for calibration and validation (cont.)
* Refine code
* Refine code
* Add data reader for BERT
* Add flatbuffers to serialize calibration table
* Refine code and add BERT evaluation
* Refine the code
* minor modification
* Add preprocess/postprocess of vision team yolov3 and refine the code
* Update annotation
* Make bbox cooridates more accurate
* Fix bug
* Add support of batch processing
* Batch processing for model zoo yolov3
* Add batch inference for evaluation
* Refine the code
* Add README
* Add comments
* Refine the code for PR
* Remove batch support checking in data_reader and refine the code
* Refine the code for PR
* Refine the code for PR review
Co-authored-by: Olivia Jain <oljain@microsoft.com>
* Implement ScatterND for CUDA EP (#6184)
* Condition fix in Resize operator (#6193)
* Clean up checkpoint tests to use the new checkpoint functions (#6188)
* add deprecation warning for old checkpoint functions
* update all the distributed checkpoint tests to use new checkpoint functions
* Implement comparing outputs that are sequence of maps of strings to floats (#6180)
* Implement conversion from ortvalue to Itensor for string tensors and comparing sequence of maps of strings to floats
* PR comments
* Dockerfile to build onnxruntime with ROCm 4.0
* Add ability to skip GPU tests based on GPU adapter name (#6198)
* Implement conversion from ortvalue to Itensor for string tensors and comparing sequence of maps of strings to floats
* PR comments
* Add ability to skip gpu tests according to adapter description
* spacing
* spacing
* spacing
* Openvino ep 2021.2 (#6196)
* Enabling fasterrcnn variant and vehicle detector
* changes for 2021_2 branch
* yolov3_pytorch commit
* fixed braces in basic_backend.cc
* ci information added
* faster rcnn variant and vehicle detector changes were made in 2021.1 and not in 2021.2
* some changes to support unit tests
* disable some tests which are failing
* fix myriad tests for vehicle detector
* Did some cleanup
*cleaned up comments
*Disabled Add_Broadcast_0x1 and Add_Broadcast_1x0
tests on MYRIAD_FP16 backend due to a bug
*cleaned up capability_2021_2.cc file
*Removed extra conditions which were added
for some validation in backend_utils
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* yolov3 pytorch workaround to ensure that the output names are matched
* gemmoptest fixed on myriad
* Fixed MYRIADX CPP Test Failures
*Expand,GatherND,Range,Round op's
are only supported in model
*where op with float input data
types are not supported and fixed
*Scatter and ScatterElements op's with
negative axis are fixed
*Reshape op with 0 dim value are not
supported and fixed
*Disabled InstanceNorm_2 test on MYRIADX
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* make changes to yolov3 pytorch
* Fixed python unit tests
*Fixed failing python tests on vpu,
GPU and CPU
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Fixes POW op failures on GPU_FP16
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Clean up capability_2021_2.cc
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Updated docx for MultiThreading option
*Added extra info on setting the num_of_threads
option using the API and it's actual usage
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* fixed slice and removed extra prints
* Disabled failing python tests
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Minor changes added in capabilty_2021_2
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* made changes to slice to avoid failures
* Disabling FP16 support for GPU_FP32
->Inferencing an FP16 model on GPU_FP32
leads to accuracy mismatches. so, we would
rather use GPU_FP16 to infer an FP16 model
on GPU Device
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Updated docx for Inferencing a FP16 Model
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* fix for mask rcnn
* Script for installing openvino from source
* Updated with openvino 2021.2 online installation
* code comment fixes
fixed accuracy mismatch for div
* Update OpenvinoEP-ExecutionProvider.md
updated for 2021.2 branch
* Update README.md
updated dockerfile documentation
* Update BUILD.md
build.md update documentation
* permissiong change of install_openvino.sh
* made changes to align with microsoft onnxruntime changes
* Updated with ov 2021.2.200
Co-authored-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
Co-authored-by: sfatimar <sahar.fatima@intel/com>
Co-authored-by: MaajidKhan <n.maajidkhan@gmail.com>
Co-authored-by: mohdansx <mohdx.ansari@intel.com>
* Fix a memory leak in test_inference.cc (#6201)
* Fix a memory leak in test_inference.cc
* Use TArray in AMD element-wise kernels, rather than manually copying memory to device.
* Remove most ROCm-specific element-wise code and reuse CUDA element-wise code.
* Minor change to improve performance for operator Pad. (#5537)
* small improvment for pad
* Support double for operators Log, Reciprocal, Sum (CPU) (#6032)
* Support double for operators Log, Reciprocal, Sum
* remove tesdt erf_double
* Support double for operators Where, LpNormalisation (#6034)
* Support double for operators Relu, Tanh, Sigmoid (#6221)
* Fix ImportError in build.py (#6231)
There is a possible ImportError where build.py can import the wrong 'util' package if there are others present in `sys.path` already
* Removed executor todo that looks dead. (#6234)
* Remove MKLML/openblas/jemalloc build config (#6212)
* Remove python 3.5
* Update the readme file
* Upgrade build.py to assert for python 3.6+
Upgrade build.py to assert for python 3.6+
as python 3.5 cannot build anymore todays master.
* Support MLFloat16 type in Pow opset-12 CUDA kernel (#6233)
* MLAS: handle MlasGemm(M/N/K==0) cases (#6238)
* Support double for operator TopK + fix one bug in TopK implementation for GPU for double (#6220)
* Support double for operator TopK
* add static classes for topk/double
* fix cast issue in topk
* Support double for operator Gemm + fix bug in gemm implementation for cuda, rocm when sizeof(type) != sizeof(float) (#6223)
* Support double for operator Gemm
* fix type size while copying data in gemm operator for GPU
* fix type in gemm implementation for rocm
* Support double for operator ReduceMean, ReduceLogSumExp (#6217)
* Support double for operators ReduceMean, ReduceLogSumExp
* Support double for operator ArgMin (#6222)
* Support double for operator ArgMin
* add test specifically for double
* add new test on pai-excluded-tests.txt
* Update BUILD.md
* Update manylinux docker image to the latest (#6242)
* Fix allocator issue for TensorRT IOBinding (#6240)
* Fix issue: https://github.com/microsoft/onnxruntime/issues/6094
Root cause: we didn't expose the OrtMemoryInfo for TRT, so it will cause issue if user want use IObinding for Tensorrt.
Short term fix, add the OrtMemoryInfo for TRT. Long term should unify the allocator for CUDA and TRT
* Tune BiasGeluGradDx kernel in approximation mode to avoid tanh(...) on Rocm (#6239)
* bias gelu grad use exp(...) instead
* update cuda to rocm
* missing semicolon
* comment
* remove dockerfile
* missing factor of two
* Refactor EP Perf Tool (#6202)
* merge master, keep postprocess status commit
* download float16.py everytime
* using variables to reference eps
* adding ACL EP to ep perf tool
* accuracy with absolute tolerance configurable
* add acl to dict + remove commented line
* Documentation for distributed CI tests pipeline (#6140)
* Remove a debug log in provider_test_utils.cc (#6200)
* Add the Concat Slice Elimination transform, fix constant_folding transform (#5457)
* Add concat slice transform + test
* Cosmetic improvements in concat slice transform
* Remove unrelated file, fix comment, fix constant folding bug
* Add test onnx graph
* fix windows build
* Review comments
* review comment
Co-authored-by: Aishwarya <aibhanda@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
* Add MakeStringLite which uses current locale, update some MakeString call sites to use it instead. (#6252)
* Add MakeStringLite which uses current locale, update macros to use that to generate messages.
* Convert calls to MakeStringLite().
* Liqun/speech model loop to scan (#6070)
Provide a tool to convert Loop to Scan for Nuphar performance
Fix Nuphar CI pipeline failures.
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
* model parallel refinement (#6244)
* Megatron Transformation as a seperate step
* remove useless header
* clang formating
* Re-Structure megatron transformer for subsquent changes
* fix comments
* Allow querying a GraphProto's doc_string as part of ModelMetadata (#6248)
* Fix Linux/Mac error message on input type mismatch (#6256)
* add bfloat16 to gathergrad type constrains (#6267)
Co-authored-by: Cheng Tang <chenta@microsoft.com>
* Fix VS 2017 build break (#6276)
* Deprecate Python global configuration functions [Part 2] (#6171)
Update Python API to allow more flexibility for setting providers and provider options.
The providers argument (InferenceSession/TrainingSession constructors, InferenceSession.set_providers()) now also accepts a tuple of (name, options dict).
Fix get_available_providers() API (and the corresponding function in the C API) to return the providers in default priority order. Now it can be used as a starting point for the providers argument and maintain the default priority order.
Convert some usages of the deprecated global configuration functions to use EP-specific options instead.
Update some EP-specific option parsing to fail on unknown options.
Other clean up.
* Add script to preprocess python documentation before publishing (#6129)
* add script to preprocessing python documentation before publishing
* rename past to past_key_values for GPT-2 (#6269)
rename past to past_key_values for transformers 4.*
* Rename MakeString and ParseString functions. (#6272)
Rename MakeString to MakeStringWithClassicLocale, MakeStringLite to MakeString, *ParseString to *ParseStringWithClassicLocale.
Add missing pass-through versions of MakeStringWithClassicLocale for string types.
* Increase timeout for Linux GPU CUDA11 build. (#6280)
* Add helper to compare model with different precision (#6270)
* add parity_check_helper.py
* add real example
* remove lines
* Fix Min/Max CPU kernels for float16 type (#6205)
* fix data_ptr assertion error for past_sequence_length=0 in GPT-2 (#6284)
fix io binding crash for past_sequence_length=0
* A list of changes in transformers tool (#6224)
* longformer fp16 e2e
* add fp16/fp32 parity check helper file
* excludes nodes with subgraph in profiling
* use onnxconverter_common to do fp32->fp16
* add version check for onnxconverter_common
* remove helper file
* add pkg installation on notebooks and script
* Workaround for static_cast<double>(half)
* Add workaround to remove ROCm-specific binary-elementwise files.
* Update nuget build (#6297)
1. Update the ProtoSrc path. The old one is not used anymore.
2. Regenerate OnnxMl.cs
3. Delete some unused code in tools/ci_build/build.py
4. Avoid set intra_op_param.thread_pool_size in ModelTests in OpenMP build.
5. Fix a typo in the C API pipeline.
* Enable ONNX backend test of SequenceProto input/output (#6043)
* assert sequence tensor and remove skips
* update testdata json
* use ONNX 1.8 in cgmanifest.json
* use previous commit to workaround
* update ONNX commit ID in docker
* skip test_maxpool_2d_dilations test for now
* update function name
* add --sequence_lengths option (#6285)
* more dtype for Equal CUDA kernel (#6288)
Co-authored-by: Vincent Wang <weicwang@microsoft.com>
* Force reinstall onnx python package on Windows (#6309)
* update transformers required package versions (#6315)
* Remove abs in LpPool (#6303)
* Support 1D input for Conv + Mul/Add fusion optimizer with test (#6295)
* Support 1D input (N C H) for Conv + Mul/Add fusion optimizer with test cases and test models.
* Add longformer to python package (#6314)
* add longformer to python package
* move test related script and data to a new folder
* Avoid false sharing on thread pool data structures (#6298)
Description: This change adds alignment and padding to avoid false sharing on fields in the thread pool. It also adds a new microbenchmark to profile thread-pool performance over short loops.
Motivation and Context
MobileNet on a 2*12-core system showed a performance gap between the ORT thread pool and OpenMP. One cause appeared to be false sharing on fields in the thread pool: ThreadPoolParallelSection::tasks_finished (which the main thread spins on waiting for workers to complete a loop), and the RunQueue::front_ and back_ fields (used respectively by the worker thread and the main thread).
The additional micro-benchmark BM_ThreadPoolSimpleParallelFor tests performance of loops of different sizes at different thread counts. The results below are on a machine with 2*14-core processors (E5-2690 v4) running with 1, 14, 15, and 28 threads. For each test, the microbenchmark has N threads run a loop with N iterations; hence a perfect result is for the time taken to be constant as additional threads are added (although we will also see power management effects helping at very low thread counts). The loop durations (100000, 10000, 1000) correspond roughly to 200us, 20us, and 2us on this machine.
Before change:
BM_ThreadPoolSimpleParallelFor/1/1/100000/real_time 17153 us 17154 us 32
BM_ThreadPoolSimpleParallelFor/14/14/100000/real_time 22553 us 22553 us 30
BM_ThreadPoolSimpleParallelFor/15/15/100000/real_time 21521 us 21521 us 29
BM_ThreadPoolSimpleParallelFor/28/28/100000/real_time 24111 us 24111 us 24
BM_ThreadPoolSimpleParallelFor/1/1/10000/real_time 1719 us 1719 us 407
BM_ThreadPoolSimpleParallelFor/14/14/10000/real_time 3409 us 3409 us 200
BM_ThreadPoolSimpleParallelFor/15/15/10000/real_time 3541 us 3541 us 201
BM_ThreadPoolSimpleParallelFor/28/28/10000/real_time 4576 us 4576 us 151
BM_ThreadPoolSimpleParallelFor/1/1/1000/real_time 174 us 174 us 4017
BM_ThreadPoolSimpleParallelFor/14/14/1000/real_time 1586 us 1586 us 402
BM_ThreadPoolSimpleParallelFor/15/15/1000/real_time 1586 us 1586 us 397
BM_ThreadPoolSimpleParallelFor/28/28/1000/real_time 2864 us 2864 us 232
After change:
BM_ThreadPoolSimpleParallelFor/1/1/100000/real_time 17160 us 17160 us 33
BM_ThreadPoolSimpleParallelFor/14/14/100000/real_time 20989 us 20989 us 31
BM_ThreadPoolSimpleParallelFor/15/15/100000/real_time 22286 us 22286 us 31
BM_ThreadPoolSimpleParallelFor/28/28/100000/real_time 24631 us 24631 us 25
BM_ThreadPoolSimpleParallelFor/1/1/10000/real_time 1718 us 1718 us 407
BM_ThreadPoolSimpleParallelFor/14/14/10000/real_time 2868 us 2868 us 242
BM_ThreadPoolSimpleParallelFor/15/15/10000/real_time 2907 us 2907 us 240
BM_ThreadPoolSimpleParallelFor/28/28/10000/real_time 3872 us 3872 us 186
BM_ThreadPoolSimpleParallelFor/1/1/1000/real_time 175 us 175 us 3938
BM_ThreadPoolSimpleParallelFor/14/14/1000/real_time 933 us 933 us 659
BM_ThreadPoolSimpleParallelFor/15/15/1000/real_time 912 us 912 us 591
BM_ThreadPoolSimpleParallelFor/28/28/1000/real_time 1976 us 1976 us 317
* fix opset imports for function body (#6287)
* fix function opsets
* add tests and update onnx
* changes per review comments
* add comments
* plus updates
* build fix
* Remove false positive prefast warning from threadpool (#6324)
* Java: add Semmle to Java publishing pipelines (#6326)
Add Semmle to Java API pipeline
Add security results publishing and add Java GPU.
* Quantization support for split operator with its NHWC support (#6107)
* Make split working for quantization.
* NHWC transformer support for split operator
* Refactor some according to Feedback. Will add test cases soon.
* Fix build error on windows.
* Add test case for split op on uint8_t support
* Add nhwc_transformer_test for split uint8_t support
* Some change according to PR feedbacks.
* Liqun/enable pipeline parallel test (#6331)
enable pipeline parallel test
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
* Use onnxruntime_USE_FULL_PROTOBUF=OFF for the cuda execution provider (#6340)
This removes a special case of the cuda EP.
* MLAS: add fallback implementation for quantized GEMM (#6335)
Add a non-vectorized version of the kernel used for the quantized version of MlasGemm.
* Delete float16.py (#6336)
No longer needed. Also doesn't pass policheck.
* Enable add + softmax fusion for Rocm platform (#6259)
* add bias softmax; tests appear to pass
* check fusion occurs for rocm as well
* check for rocm provider compatible as well
* build for cpu scenario as well
* try again; broader cope
* proper scope on kGpuExecutionProvider
* been editing wrong file
* remove commented #include lines
* try again due to mac os ci error
* try again
* test fusion both cuda and rocm to avoid mac ci error
* add external data support to tensor proto utils (#6257)
* update unpack tensor utilities to support loading external data
* more updates
* fix test
* fix nuphar build
* minor build fix
* add tests
* fix Android CI
* fix warning
* fix DML build failure and some warnings
* more updates
* more updates
* plus few updates
* plus some refactoring
* changes per review
* plus some change
* remove temp code
* plus updates to safeint usage
* build fix
* fix for safeint
* changed wording. (#6337)
* Remove OpSchema dummy definition. Only needed for Function now, and we can just exclude the method in Function (#6321)
* remove gemmlowp submodule (#6341)
* [NNAPI] Add pow support (#6310)
* Add support for running Android emulator from build.py on Windows. (#6317)
* fix the pipeline failure (#6346)
* Train BERT Using BFloat16 on A100 (#6090)
* traing bert using bf16
* Adam support bf16
* bugfix
* add fusedmatmul support
* fix after merge from master.
* bugfix
* bugfix after merge from master
* fast reduction for bf16.
* resolve comments
* fix win build
* bugfix
* change header file.
Co-authored-by: Vincent Wang <weicwang@microsoft.com>
* Fix DerefNullPtr issues raised by SDLNativeRules. (#6348)
* update quantize to support basic optimization and e2e example for image classification (#6313)
update the resnet50-v1 to standard one from onnx zoo.
add an example for mobilenet
run basic optimization before quantization
fix a bug in Clip
* Enable graph save for orttrainer (#6333)
* Enable graph save for orttrainer
* Fix CI
* Update orttraining/orttraining/python/training/orttrainer_options.py
* Update orttraining/orttraining/python/training/orttrainer_options.py
* Update orttraining/orttraining/python/training/orttrainer_options.py
* Update orttraining/orttraining/python/training/orttrainer_options.py
* Update orttraining/orttraining/python/training/orttrainer_options.py
Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
* Add PREfast to python packaging pipeline (#6343)
* Add PREfast to python packaging pipeline
* fix longformer benchmark io_binding output_buffers (#6345)
* fix longformer benchmark io_binding output_buffers
* format
* import benchmark_helper from parent directory.
* Use readelf for minimal build binary size checks. (#6338)
* Use readelf for minimal build binary size checks.
The on-disk size grows in 4KB chunks which makes it hard to see how much growth an individual checkin causes.
Only downside is that the sum of the sections is larger than the on-disk size (assumably things get packed smaller on disk and some of the section alignment constraints can be ignored)
* Remove unused function
* Java: Set C language warnings to W4 and adjust JNI code (#6347)
Set /W3 for C language and fix up JNI warnings.
* Pipeline Parallel Experimental Python API (#5815)
* Add create session to WinML telemetry to track WinML Usage (#6356)
* Fix one more SDL warning (#6359)
* fix -Wdangling-gsl (#6357)
* Add python example of TensorRT INT8 inference on ResNet model (#6255)
* add trt int8 example on resnet model
* Update e2e_tensorrt_resnet_example.py
* remove keras dependency and update class names
* move ImageNetDataReader and ImageClassificationEvaluator to tensorrt resnet example
* simplify e2e_tensorrt_resnet_example.py
* Update preprocessing.py
* merge tensorrt_calibrate
* Update calibrate.py
* Update calibrate.py
* generalize calibrate
* Update calibrate.py
* fix issues
* fix formating
* remove augment_all
* This added telemetry isn't needed (#6363)
* Wezuo/memory analysis (#5658)
* merged alloc_plan
* pass compilation
* Start running, incorrect allocation memory info
* add in comments
* fix a bug of recording pattern too early.
* debugging lifetime
* fix lifetime
* passed mnist
* in process of visualization
* Add code to generate chrome trace for allocations.
* in process of collecting fragmentation
* before rebuild
* passed mnist
* passed bert tiny
* fix the inplace reuse
* fix the exception of weight in pinned memory
* add guards to ensure the tensor is in AllocPlan
* add customized profiling
* debugging
* debugging
* fix the reuse of differnt location type
* add rank
* add the rank
* add fragmentation
* add time_step_trace
* Add summary for each execution step (total bytes, used/free bytes).
* add top k
* change type of top k parameter
* remove prints
* change heap to set{
* add the name pattern
* add the useage for pattern
* add partition
* change to static class
* add custom group
* remove const
* update memory_info
* in process of adding it as runtime config
* change the memory profiling to be an argument
* add some comments
* add checks to recored meomry_info in traaining session
* set the "local rank setting" to correct argument.
* addressing comments
* format adjustment
* formatting
* remove alloc_interval
* update memory_info.cc to skip session when there is no tensor for a particular memory type
* fix memory_info multiple iteration seg-fault
* consolidate mainz changes
* fixed some minor errors
* guard by ORT_MINIMAL_BUILD
* add ORT_MEMORY_PROFILE flag
* added compiler flag to turn on/off memory profiling related code
* clean up the code regarding comments
* add comments
* revoke the onnx version
* clean up the code to match master
* clean up the code to match master
* clean up the code to match master
Co-authored-by: Jesse Benson <benson.jesse@gmail.com>
Co-authored-by: Wei Zuo <wezuo@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: wezuo <wezuo@az-eus-v100-32gb-5-worker-mgtbby.eastus.cloudapp.azure.com>
Co-authored-by: wezuo <wezuo@az-eus-v100-32gb-5-worker-yclzsf.eastus.cloudapp.azure.com>
* Support MLFloat16 in CumSum Cuda op for Opset 14 (#6355)
* Add CumSum-14 for Cuda
* fix convert_common version retrival (#6382)
* Refine auto_pad based pad computation in ConvTranspose (#6305)
* Fix SDL warning (#6390)
* Add max_norm for gradient clipping. (#6289)
* add max_norm as user option for gradient clipping
* add adam and lamb test cases for clip norm
* add frontend tests
* Add the custom op project information (#6334)
* Dont use default string marshalling in C# (#6219)
* Fix Windows x86 compiler warnings in the optimizers project (#6377)
* [Perf] Optimize Tile CPU and CUDA kernels for a corner case (#6376)
* Unblock Android CI code coverage failure (#6393)
* fix build on cuda11 (#6394)
Co-authored-by: Vincent Wang <weicwang@microsoft.com>
* Load the model path correctly (#6369)
* Fix some compile warnings (#6316)
* OpenVino docker file changes to bypass privileged mode
Description: Builds and installs libusb without UDEV support, which is used for communicating with the VPU device.
Motivation and Context
This enables the resulting docker container to be run without '--privileged' and '--network host' options which may not be suitable in deployment environments.
* Megatron checkpointing (#6293)
* Add bart fairseq run script
* Add frontend change to enable megatron
* Initial changes for checkpointing
* Megatron optim state loading, checkpoint aggregation, frontend distributed tests for H, D+H
* Add load_checkpoint changes
* Fix CI
* Cleanup
* Fix CI
* review comments
* review comments
* review comments:
* Fix generate_submodule_cgmanifest.py Windows issues. (#6404)
* Continue memory planning when unknown shape tensor is encountered. (#6413)
* Reintroduce experimental api changes and fix remote build break (#6385)
Co-authored-by: Ori Levari <orlevari@microsoft.com>
* Add support for custom ops to minimal build. (#6228)
* Add support for custom ops to minimal build.
Cost is only ~8KB so including in base minimal build.
* enable pipeline to run quantization tests (#6416)
* enable pipeline to run quantization tests
setup test pipeline for quantization
* Minor cmake change (#6431)
* Liqun/liqun/enable pipeline parallel test2 (#6399)
* enable data and pipeline parallism test
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
* Farewell TrainableDropout (#5793)
* Deprecate TrainableDropout kernel.
* Update bert_toy_postprocessed.onnx to opset 12.
* Add more dropout tests.
* Fix BiasDropout kernel.
Co-authored-by: Ubuntu <OrtTrainingDev3@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Sergii Dymchenko <sedymche@microsoft.com>
* fix null dereference warning (#6437)
* Expose graph ModelPath to TensorRT shared library (#6353)
* Update graph_viewer.cc
* Update tensorrt_execution_provider.cc
* Update graph_viewer.h
* Update tensorrt_execution_provider.cc
* Update tensorrt_execution_provider.cc
* Update provider_api.h
* Update provider_bridge_ort.cc
* Update provider_interfaces.h
* Update provider_interfaces.h
* expose GraphViewer ModelPath API to TRT shared lib
* add modelpath to compile
* update
* add model_path to onnx tensorrt parser
* use GenerateMetaDefId to generate unique TRT kernel name
* use GenerateMetaDefId to generate unique TRT engine name
* fix issue
* Update tensorrt_execution_provider.cc
* remove GetVecHash
* Update tensorrt_execution_provider.h
* convert wchar_t to char for tensorrt parser
* update tensorrt parser to include latest changes
* fix issues
* Update tensorrt_execution_provider.cc
* merge trt parser latest change
* add PROVIDER_DISALLOW_ALL(Path)
* add tool for generating test data for longformer (#6415)
* only build experimental api in redist (#6465)
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
* Add an option to save the training graph after optimization (#6410)
* expose optimized_model_filepath in SessionOptions as `debug.graph_save_paths.model_with_training_graph_after_optimization_path` in `ORTTrainerOptions`
* Share allocator between CUDA EP & TRT EP. (#6332)
* Share allocator between CUDA EP & TRT EP.
limitation:
1. Does not cover the per-thread allocator created by CUDA EP, still need to figure out the way to remove it
2. Need to have more identifiers to make it able to share CPU allocator across all EPs
* fix max norm clipping test in python packaging pipeline test (#6468)
* fix python packaging pipeline
* make clip norm test compatabile with both V100 and M60 GPUs
* Initial version of CoreML EP (#6392)
* Bug 31463811: Servicing: Redist (Nuget) conflicts with Microsoft.AI.MachineLearning starting 21H1+ (#6460)
* update load library code to have the fullly qualified path
* make it work for syswow32
* git Revert "make it work for syswow32"
This reverts commit b9f594341b7cf07241b18d0c376af905edcabae3.
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
* dequantize 1st input of lstm back if it is quantized (#6444)
* [java] Adds support for OrtEnvironment thread pools (#6406)
* Updates for Gradle 7.
* Adding support for OrtThreadingOptions into the Java API.
* Fixing a typo in the JNI code.
* Adding a test for the environment's thread pool.
* Fix cuda test, add comment to failure.
* Updating build.gradle
* fix SDL native rule warning #6246 (#6461)
* fix SDL rule (#6464)
* use tickcount64 (#6447)
Co-authored-by: Ori Levari <orlevari@microsoft.com>
* Update pypi package metadata (#6354)
* Update setup file data
* add missing comma
* remove python 3.5
* fix typo bracket
* Delete nuget extra configs (#6477)
* Op kernel type reduction infrastructure. (#6466)
Add infrastructure to support type reduction in Op kernel implementations.
Update Cast and IsInf CPU kernels to use it.
* Fixing a leak in OnnxSequences with String keys or values. (#6473)
* Increase the distributes tests pipeline timeout to 120 minutes (#6479)
* [CoreML EP] Add CI for CoreML EP (macOS) and add coreml_flags for EP options (#6481)
* Add macos coreml CI and coreml_flags
* Move save debuggubg model to use environment var
* Move pipeline off from macos CI template
* Fix an issue building using unix make, add parallel to build script
* Fixed build break for shared_lib and cmpile warning
* Fix a compile warning
* test
* Revert the accidental push from another branch
This reverts commit 472029ba25d50f9508474c9eeceb3454cead7877.
* Add ability to track per operator types in reduced build config. (#6428)
* Add ability to generate configuration that includes required types for individual operators, to allow build size reduction based on that.
- Add python bindings for ORT format models
- Add script to update bindings and help info
- Add parsing of ORT format models
- Add ability to enable type reduction to config generation
- Update build.py to only allow operator/type reduction via config
- simpler to require config to be generated first
- can't mix a type aware (ORT format model only) and non-type aware config as that may result in insufficient types being enabled
- Add script to create reduced build config
- Update CIs
* merge e2e with distributed pipeline (#6443)
merge e2e with distributed pipeline
* Fix test breaks in Windows ingestion pipeline (#6476)
* fix various build breaks with Windows build
* fix runtime errors loading libraries from system32
* add build_inbox check to winml_test_common
* use raw string
* cleanup
* fix dll load
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
* Speed up the Mac CI runs (#6483)
* expose learningmodelpixelrange property (#5877)
* Fix of support api version bug for [de]quantize (#6492)
* SDL fixes: add proper casts/format specifiers (#6446)
* SDL annotation fixes (#6448)
Co-authored-by: Ori Levari <orlevari@microsoft.com>
* [OpenVINO-EP] Remove support for OpenVINO 2020.2 (#6493)
* Removed OpenVINO 2020.2 support
* Updated documentation and build.py
* Removed unnecessary libraries from setup.py
* Support pad operator in quantization and quantized nhwc transformer. Fix Pad operator bug. (#6325)
Support pad operator in quantization tool.
Support pad operator in quantized nhwc transformer.
Fix pad() operator bug when pad input's inner(right) most axis value is zero for Edge and Reflect mode, it copied wrong value to the cells to be padded. Note the Constant mode will not trigger this bug, as Edge/Reflect need copy value from the already copied array while Constant mode only fill specified value.
Add more test cases to cover pad() operator bug fixed here.
Fix quantization tools uint8/int8 value overflow issue when quantize weights in python.
* Improve work distribution for Expand operator, and sharded LoopCounter configuration (#6454)
Description: This PR makes two changes identified while looking at a PGAN model.
First, it uses ThreadPool::TryParallelFor for the main parallel loops in the Expand operator. This lets the thread pool decide on the granularity at which to distribute work (unlike TrySimpleParallelFor). Profiling showed high costs when running "simple" loops with 4M iterations each of which copied only 4 bytes.
Second, it updates the sharded loop counter in the thread pool so that the number of shards is capped by the number of threads. This helps make the performance of any other high-contention "simple" loops more robust at low thread counts by letting each thread work on its own "home" shard for longer.
Motivation and Context
Profiling showed a PGAN model taking 2x+ longer with the non-OpenMP build. The root cause was that the OpenMP build uses simple static scheduling of loop iterations, while the non-OpenMP build uses dynamic scheduling. The combination of large numbers of tiny iterations is less significant with static scheduling --- although still desirable to avoid, given that each iteration incurs a std::function invocation.
* Update document of transformer optimization (#6487)
* nuphar test to avoid test data download to improve passing rate (#6467)
nuphar test to avoid test data download to improve passing rate
* Fuse cuda conv with activation (#6351)
* optimize cuda conv by fused activation
* remove needless print out
* exclude test from cpu
* handle status error from cudnn 8.x
* add reference to base class
* add hipify
* [CoreML EP] Add support for some activations/Transpose, move some shared helpers from NNAPI to shared space (#6498)
* Init change
* Move some helper from nnapi ep to shared
* Add transpose support
* Fix trt ci build break
* Refine transformers profiler output (#6502)
* output nodes in the original order; grouped by node name
* add document for profiler
* Update to match new test setup. (#6496)
* Update to match new test setup.
* Add Gemm(7) manually for now.
Will fix properly on Monday. It's used by mnist.ort as that is created by optimizing mnist.onnx to level 1 causing 2 nodes to be replaced by a Gemm and the op to be missing from the required list as that is created using the original onnx model.
* Enable dense sequence optimized version of Pytorch exported BERT-L on AMD GPU (#6504)
* Permit dense seq optimization on BERT-L pytorch export by enabling ReduceSumTraining, Equal, and NonZero on AMD
* enable Equal tests
* enable fast_matrix_reduction test case
* Optimize GatherGrad for AMD GPU (#6381)
* optimize gathergrad
* address comments
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
* add explicit barriers for buffer overread and overrwrite (#6484)
Co-authored-by: Ori Levari <orlevari@microsoft.com>
* fix sdl bugs for uninitialized variables and returns (#6450)
Co-authored-by: Ori Levari <orlevari@microsoft.com>
* handle hr error conditions (#6449)
Co-authored-by: Ori Levari <orlevari@microsoft.com>
* Dnnl training (#6045)
* Add ReluGrad and ConvGrad ops for the dnnl provider
* the mnist sample is updated to add the --use_dnnl option that
will cause the sample to use the dnnl execution provider for
nodes that exist in dnnl provider.
* Added the ability to find forward ops. Dnnl backward gradient
ops require the forward primitive description and workspace
from the forward operation.
* Enable specifying the execution provider for Gradient Checker Tests
* Prevent memory leak when running dnnl_provider in training mode
Prevent creating a SubgraphPrimitivePool when the code is built with the
ENABLE_TRAINING build flag. Instead create a SubgraphPrimitive directly.
The SubgraphPrimitivePool was causing a pool of SubgraphPrimitives to be
stashed in a map for reuse. Due to the way the Training Loop uses threads
the pool of SubgraphPrimitives were not being reuse instead a new pool of
SubgraphPrimitives being created each run. The old pool was not instantly
freed. This behavior could be a language error when using thread_local
memory.
Signed-off-by: George Nash <george.nash@intel.com>
* Added fixes to maxpoolgrad and memory leak.
Maxpoolgrad will now pass all unit tests.
With the conv and convgrad disabled for dnnl, mnist is able to train till 95%
Signed-off-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>
* Fixed misc issues when testing training code with dnnl provider
* fix conv_grad dnnl tests with dilation to run dnnl execution provider
* update mnist training sample to accept convolution type models
convolution models require the input shape to be {1, 28, 28}
instead of the flat {728} image that is used for the gemm models
this will enable models that require the different shape by adding
`--model_type conv` to the command line when running the mnist sample.
(while testing a workaround was used see #4762)
* Disable weight caching in dnnl conv operator when using training
When training we can not use cached weights because the weight
will be updated each run. This re-enables dnnl Conv and ConvGrad Ops.
The weight caching was the source of the error from Conv when training.
* Fix issues found when building grad ops on Linux
* The dnnl_convgrad code was over using the scope operator
causing a compilation problem.
* The dnnl_maxpoolgrad code had a logic error that is was
comparing with the source description when it should have
been comparing with the destination despription.
* Update BUILD.md so it shows DNNL for training
* Updated the table of contents. Since the same providers
are listed twice. Once for Infrance and again for Training
an HTML anchor was added to distinguish the second header
from the first for the TOC.
* Fix build failure when not using --enable-training build option
* reorganize the gradient operators so they are grouped together
* Fix issues found when running onnx_backend_test_series.py
* Pooling code only supports 2 outputs when built with --enable-training
* Address code review feedback
* class member variables end in underscore_
* use dst instead of dist to match pattern use elsewhere in DNNL code.
* Remove workaround that was introduced to handle problems running
convolution based training models. See issue #4762
Signed-off-by: George Nash <george.nash@intel.com>
* Isolate training code and code cleanup
* Do not build if dnnl_gpu_runtime if enable_training is set training code
does not support dnnl_gpu_runtime yet.
* Isolated Training code inside ifdefs so that they wont affect
project if built without training enabled
* Inadvertant changes in whitespace were removed to make code review simpler
* Undid some code reordering that was not needed
* comments added to closing #endif statments to simplify reading complex ifdefs
* Modified the GetPrimitiveDesc functions to return shared_ptr instead of raw
pointer. This matches what was done in Pool code and is safer memory code.
Signed-off-by: George Nash <george.nash@intel.com>
* Address code review issues
- whitespace changes caused by running clang-format on the code
- Several spelling errors fixed
- Removed/changed some ifdefs to improve readability
- other misc. changes in responce to code review.
Signed-off-by: George Nash <george.nash@intel.com>
* Code changes to address code review
- Simplify iteration code using `auto` keyword
- remove C style cast that was not needed
- remove instance variable that was not needed [relugrad.h]
- added the execution providers to `ComputeGradientErrorInternal()`
and `ComputeTheoreticalJacobianTranspose()` instead of using
a pointer to an instance varaible [gradient_checker.h/.cc]
Signed-off-by: George Nash <george.nash@intel.com>
* Combined the default gradient ops test and dnnl gradient ops test for ConvGrad and MaxPoolGrad into one function with the help of a helper function.
This will reduce repeated code.
Signed-off-by: Palangotu Keshava, Chethan's avatarChethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>
* Replaced the stack used by convgrad to vector so that the vector(used as stack) can be easily cleared everytime the graph is created.
This will prevent memory leak from convolution kernels being pushed constantly onto the stack.
Signed-off-by: chethan.palangotu.keshava@intel.com
* Code clean up and formating updates
- Removed empty else statment
- updated indentation of code that was causing double curly brackets to look unususal
- Changed check for NumDimensions to Size in Relu and ReluGrad error checking code.
- isolated training code
Signed-off-by: George Nash <george.nash@intel.com>
* Restore inadvertantly removed ConvGrad tests
When combining the DNNL and CPU version of the ConvGrad
tests two test were inadvertantly excluded. This adds
back the Conv3d and Conv3d with strides test cases.
Signed-off-by: George Nash <george.nash@intel.com>
* Add validation to ConvGrad
This validates the dimensions of the ConvGrad match the
passed in Convolution forward primitive description.
The current code for DNNL ConvGrad makes the assumption that the ConvGrad
nodes will be visited in the reverse order from the corresponding Conv nodes
The added validation will return an error if this assumption is not true.
Signed-off-by: George Nash <george.nash@intel.com>
* Do not create new execution providers in provider_test_utils
This removes the code that generated new execution providers in the
OpTester::Run function. This was added because the std::move was
leaving the `entry` value empty so subsequent calls would cause a
segfault.
Problem is this potentially changed the execution_provider because it
would create the default provider dropping any custom arguments.
When the now removed code was originally added the std::move was causing
crashes when the GradientChecker unit tests were run. However, it is no
longer causing problems even with the code removed.
Signed-off-by: George Nash <george.nash@intel.com>
* Change the forward conv stack to a forward conv map
This changes how the forward conv kernel is mapped to the bwd ConvGrad
kernel the problematic stack is no longer used.
The convolution stack made the assumption that the corresponding
ConvGrad operator would be visited in reverse order of the forward
Conv operators. This was always problematic and was unlikely to
work for inception models.
Important changes:
- The weight_name is added to the ConvGrad dnnl_node making it
possible to use the weight_name as a lookup key to find the
Conv forward Kernel
- the `std::vector fwd_conv_stack_` has been replaced with a
`std::map fwd_conv_kernel_map_`
- Although it is not needed lock_guards were added when writing
to and reading from the fwd_conv_kernel_map_ as well as the
fwd_kernel_map_. These should always be accessed by a single
thread when preparing the dnnl subgraphs so the guard should not
be needed but its added just in case.
- Updated the comments ConvGrad.h code to no longer mention the
stack. The error check is not removed. It will be good to verify
there are no errors as we continue to test against more models.
Signed-off-by: George Nash <george.nash@intel.com>
Co-authored-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>
Co-authored-by: unknown <63478620+jeyblu@users.noreply.github.com>
* Lochi/refactor yolov3 quantization (#6290)
* Refactor the code and move data reader, preprocessing, evaluation to
E2E_example_mode
* Refactor the code.
Move data reader, preprocessing, evaluation to model specific example
under E2E_example_mode
* refactor code
* Move yolov3 example to specific folder and add additional pre/post
processing
* Print a warning message for using newer c_api header on old binary (#6507)
* Fix issues with ArmNN build setup (#6495)
* ArmNN build fixes
* Update BUILD.md to document that the ACL paths must be specified to build ArmNN
* Fix CUDA build error. We don't setup the link libraries correctly/consistently so improve that.
* Fix Windows CI builds by updating test scripts to work with numpy 1.20. (#6518)
* Update onnxruntime_test_python.py to work with numpy 1.20.
Some aliases are deprecated in favor of the built-in python types. See https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
np.array with bytes for entries and dtype of np.void no longer automatically pads. Change a test to adjust for that.
* Fix another test script
* Fix ORTModule branch for orttraining-* pipelines
* Update pytorch nightly version dependency
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: George Wu <jywu@microsoft.com>
Co-authored-by: Cecilia Liu <ziyue.liu7@gmail.com>
Co-authored-by: Ryan Hill <38674843+RyanUnderhill@users.noreply.github.com>
Co-authored-by: George Nash <george.nash@intel.com>
Co-authored-by: Guoyu Wang <62914304+gwang-msft@users.noreply.github.com>
Co-authored-by: Yateng Hong <toothache9010@gmail.com>
Co-authored-by: stevenlix <38092805+stevenlix@users.noreply.github.com>
Co-authored-by: Derek Murray <Derek.Murray@microsoft.com>
Co-authored-by: ashbhandare <ash.bhandare@gmail.com>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
Co-authored-by: Changming Sun <chasun@microsoft.com>
Co-authored-by: Tracy Sharpe <42477615+tracysh@users.noreply.github.com>
Co-authored-by: Juliana Franco <jufranc@microsoft.com>
Co-authored-by: Pranav Sharma <prs@microsoft.com>
Co-authored-by: Tixxx <tix@microsoft.com>
Co-authored-by: Jay Rodge <jayrodge@live.com>
Co-authored-by: Du Li <duli1@microsoft.com>
Co-authored-by: Du Li <duli@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Yufeng Li <liyufeng1987@gmail.com>
Co-authored-by: baijumeswani <bmeswani@microsoft.com>
Co-authored-by: Sergii Dymchenko <sedymche@microsoft.com>
Co-authored-by: jingyanwangms <47403504+jingyanwangms@users.noreply.github.com>
Co-authored-by: satyajandhyala <satya.k.jandhyala@gmail.com>
Co-authored-by: S. Manohar Karlapalem <manohar.karlapalem@intel.com>
Co-authored-by: Weixing Zhang <weixingzhang@users.noreply.github.com>
Co-authored-by: Suffian Khan <sukha@microsoft.com>
Co-authored-by: Olivia Jain <oljain@microsoft.com>
Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com>
Co-authored-by: Hariharan Seshadri <shariharan91@gmail.com>
Co-authored-by: Ryan Lai <rylai@microsoft.com>
Co-authored-by: Jesse Benson <jesseb@microsoft.com>
Co-authored-by: sfatimar <64512376+sfatimar@users.noreply.github.com>
Co-authored-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
Co-authored-by: sfatimar <sahar.fatima@intel/com>
Co-authored-by: MaajidKhan <n.maajidkhan@gmail.com>
Co-authored-by: mohdansx <mohdx.ansari@intel.com>
Co-authored-by: Xavier Dupré <xadupre@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin@vols.utk.edu>
Co-authored-by: Michael Giba <michaelgiba@gmail.com>
Co-authored-by: William Tambellini <wtambellini@sdl.com>
Co-authored-by: Hector Li <hecli@microsoft.com>
Co-authored-by: Aishwarya <aibhanda@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: liqunfu <liqfu@microsoft.com>
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: pengwa <pengwa@microsoft.com>
Co-authored-by: Tang, Cheng <souptc@gmail.com>
Co-authored-by: Cheng Tang <chenta@microsoft.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
Co-authored-by: Ye Wang <52801275+wangyems@users.noreply.github.com>
Co-authored-by: Chun-Wei Chen <jacky82226@gmail.com>
Co-authored-by: Vincent Wang <wangwchpku@outlook.com>
Co-authored-by: Vincent Wang <weicwang@microsoft.com>
Co-authored-by: Luyao Ren <375833274@qq.com>
Co-authored-by: Zhang Lei <zhang.huanning@hotmail.com>
Co-authored-by: Tim Harris <tiharr@microsoft.com>
Co-authored-by: Ashwini Khade <askhade@microsoft.com>
Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>
Co-authored-by: Alberto Magni <49027342+alberto-magni@users.noreply.github.com>
Co-authored-by: Wei-Sheng Chin <wschin@outlook.com>
Co-authored-by: wezuo <49965641+wezuo@users.noreply.github.com>
Co-authored-by: Jesse Benson <benson.jesse@gmail.com>
Co-authored-by: Wei Zuo <wezuo@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: wezuo <wezuo@az-eus-v100-32gb-5-worker-mgtbby.eastus.cloudapp.azure.com>
Co-authored-by: wezuo <wezuo@az-eus-v100-32gb-5-worker-yclzsf.eastus.cloudapp.azure.com>
Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
Co-authored-by: Martin Man <supermt@gmail.com>
Co-authored-by: M. Zeeshan Siddiqui <mzs@microsoft.com>
Co-authored-by: Ori Levari <ori.levari@microsoft.com>
Co-authored-by: Ori Levari <orlevari@microsoft.com>
Co-authored-by: Ubuntu <OrtTrainingDev3@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Sheil Kumar <smk2007@gmail.com>
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
Co-authored-by: Ryota Tomioka <ryoto@microsoft.com>
Co-authored-by: Adam Pocock <adam.pocock@oracle.com>
Co-authored-by: Yulong Wang <f.s@qq.com>
Co-authored-by: Faith Xu <faxu@microsoft.com>
Co-authored-by: Xiang Zhang <xianz@microsoft.com>
Co-authored-by: suryasidd <48925384+suryasidd@users.noreply.github.com>
Co-authored-by: RandySheriffH <48490400+RandySheriffH@users.noreply.github.com>
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
Co-authored-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>
Co-authored-by: unknown <63478620+jeyblu@users.noreply.github.com>
* Add ability to generate configuration that includes required types for individual operators, to allow build size reduction based on that.
- Add python bindings for ORT format models
- Add script to update bindings and help info
- Add parsing of ORT format models
- Add ability to enable type reduction to config generation
- Update build.py to only allow operator/type reduction via config
- simpler to require config to be generated first
- can't mix a type aware (ORT format model only) and non-type aware config as that may result in insufficient types being enabled
- Add script to create reduced build config
- Update CIs
* assert sequence tensor and remove skips
* update testdata json
* use ONNX 1.8 in cgmanifest.json
* use previous commit to workaround
* update ONNX commit ID in docker
* skip test_maxpool_2d_dilations test for now
* update function name
Update training Python packaging build to use get_docker_image.py.
Remove BUILD_EXTR_PAR docker build argument.
Update get_docker_image.py to check again for the image in the cache after building and before pushing to reduce the chance of a redundant push.
replace number matching with relaxed comparison in frontend tests
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
* Cmake changes for 2021.1
* added new ov version 2020.1 for faster rcnn
* Added missing defs
* equal op modified
* changes to incoroporate faster rcnn
* backend util.cc
* hddl_plugin_config.hpp is depreceated . instead use hddl_config.hpp
* changing myriad precision bool to i32
* gather is not enabled for gpu
* conv2D and pooltest auto_pad attribute should not be null
* negative indices are not valid for scatter op in myriad
* non max suppression op only supported in faster rcnn mode
* maxpool indices output is not supported
* Cleaned redundant code in backends
* Added ifdefs for HDDL config
* cast output dimensions check
topk operator k input it seems only resolved for myriad as it is
throwing issues for ask rcnn . need to verify
* we are limiting the subgraph size to 3 here
* taking care of review comments
* Fixed minor bugs
* Modified Slice op checks
* Added NonZero, Upsample
* Removed TopK if it's in the middle of a subgraph
* incorporated upsample conditions too
* Dockerfile changes for 2021.1 release
* dockerfile aptkey update
* Minor fixes
* ceil condition added again
* Fixed few gpu models
* Disabled LSTM and yolov3 in ModelTests
* python softmax cross entropy tests and negative log likelihood
* Update Build.md
Updated for openvino 2021.1
* Update OpenVINO-ExecutionProvider.md
update openvino execution provider for 2021.1
* Update READMe.md
updated new openvino version
* Update Dockerfile.openvino
added environment variable for DEBIAN Frontend
* Fixed myriad models
* Fixed gather condition
* Fixed mask rcnn model on myriad
* Modified Gather condition
* set default target of MCR dockerfile to MYRIAD_FP16
* Fixed tinyolov3 on CPU
* Update OpenVINO-ExecutionProvider.md
update openvino execution provider documentation
* Update Dockerfile.openvino
Removed environment variable
* Update OpenVINO-ExecutionProvider.md
update image manipulation networks supported
* Update onnx_backend_test_series_filters.jsonc
removed test_upsample_nearest from cpu test cases
* New InternalCI changes for 2021.1
* Full protobuf removed for OpenVINO
* Protobuf added
* Updated with apt installation for openvino
* Revert the testing changes
* Reverted testing changes
* File permessions are changed to original
* Deleted openvino installation and cmake change
* Optimized Dockerfile
Removed unnecessary cmake installation, numpy
* Added missing ifdefs
* delete array fix
* backend_utils.cc output_shape
* Revert "set default target of MCR dockerfile to MYRIAD_FP16"
This reverts commit 928d3e2b71e2f589cf51dacd3a133951cf9ca18d.
Co-authored-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
Co-authored-by: sfatimar <sahar.fatima@intel/com>
Co-authored-by: suryasidd <48925384+suryasidd@users.noreply.github.com>
Co-authored-by: S. Manohar Karlapalem <manohar.karlapalem@intel.com>
Co-authored-by: Aravind <aravindx.gunda@intel.com>
Co-authored-by: Aravind Gunda <38353114+gundaarx@users.noreply.github.com>
* add tensor board, remove torch.distributed.lanuch because ort nccl depends on MPI. Use MPI to launch parallel training.
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
* Copy samples to build folder and load models from there. Fix CI
* This PR also includes a fix to path validation for save_as_onnx API
* Add torchtext to CI for GPU training
* Remove new frontend tests from CI
Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
* Removed building ngraph from source
* Disabled some tests temporarily
* Enabled softmax for all dims
* Added onnx importer to link libraries
* int64 changes
* fixed
* temp
* slice update start and end need to be initializer
* Disabled GatherND, ScatterND, ReverseSequence operators
* Added supported ops instead of unsupported ops
* Set precision only for CPU
* Removed some unecessary conditions
* Fixed segfault in slice
* Softmax restriction removed
* changes
* Setting precision for all plugins
* Changes added to include precision
and supported ops for gpu and vpu
* branch op support
* checking for disabled python test failure
* mapped input names and tensors directly rather than copying which was leading to mismatch
* last index is not supported
mkldnn does not support pow between integers
* included the code changes
* Rename inner-scoped variable to avoid MSVC warning
* applied changed to vadm as well and removed the utility function
getinputtensors() completely
* OpenVINO multi version support: CMake changes
* OpenVINO multi version support: C++ support
* removed commented code
* Remove redundant code lines
* Revert "Rename inner-scoped variable to avoid MSVC warning"
This reverts commit 2f650493162675bc6fb70730de9656ec400be332.
Merged separately in master.
* vadm changes disabled reduction op test
* putting test_gather_negative_indices in unsupported list for now
* Update MCR Dockerfile with 2020.4
Installs OpenVINO 2020.4 from deb packages via APT tool.
* Update build docs with 2020.4 info
* Update dockerfile with OV 2020.4 info
Instructions for building OpenVINO based docker image no longer require
downloading installer package as it is installed by the dockerfile
using OpenVINO 2020.4 APT package for Ubuntu 18.04
* Added constant folding bypass logic
* Added cout statements for ci
* Added NDEBUG flag for debug symbols
* Update Ops info in docs
* fixes multiple unit tests
* mathoptest.ceil disabled for gpu and myriad
* activation test temp disabled
* Fix models for CPU
* Fixed a syntax error
* local cmmit
* fixing unit tests for myriad
* Fixed Variadic Split, Topk issues
* fix_model commit
* Fix models in myriad
* Added ifdefs for OpenVINO 2020.4
* temp
* made some changes to not operator
* Added unused parameter
* relu enabled
* Fixed bug in Conv output
* Consolidated GPU failing tests into one category
* Made it compatible to InternalCI 2020.4
* Made changes for ngraph
* Disabled test for mask,fastercnn,tinyyolov3
* Removed proxy for ci
* run_dockerbuild.sh restored to same version
* run_dockerbuild.sh restored to same version
* run_dockerbuild.sh restored to same version
* Updated documentation for 2020.4
* Removed FP32 to FP16 transformation for GPU
* Disabled Coreml-FNS-Candy model test
* Added FP16 transformations
Co-authored-by: sfatimar <sahar.fatima@intel.com>
Co-authored-by: Manohar Karlapalem <manohar.karlapalem@intel.com>
Co-authored-by: sfatimar <sahar.fatima@intel/com>
Co-authored-by: sfatimar <64512376+sfatimar@users.noreply.github.com>
Co-authored-by: intel <you@example.com>
Co-authored-by: gundaarx <aravindx.gunda@intel.com>
* Add ORTTrainerOptions class for the new pytorch frontend (#4382)
Add ORTTrainerOptions class and some placeholders
* Add _ORTTrainerModelDesc to perform validation for model description (#4416)
* Add Loss Scaler classes to the new frontend (#4306)
* Add TrainStepInfo used on the new frontend API (#4256)
* Add Optimizer classes to the new frontend (#4280)
* Add LRScheduler implementation (#4357)
* Add basic ORTTrainer API (#4435)
This PR presents the public API for ORTTrainer for the short term
development.
It also validates and saves input parameters, which will be used in the
next stages, such as building ONNX model, post processing the model and
configuring the training session
* Add opset_version into ORTTrainerOptions and change type of ORTTrainer.loss_fn (#4592)
* Update ModelDescription and minor fix on ORTTrainer ctor (#4605)
* Update ModelDescription and minor fix on ORTTrainer/ORTTrainerOptions
This PR keeps the public API intact, but changes how model description is stored on the backend
Currently, users creates a dict with two lists of tuples.
One list called 'inputs' and each tuple has the following format tuple(name, shape).
The second list is called 'outputs' and each tuple can be either tuple(name, shape) or tuple(name, shape, is_loss).
With this PR, when this dict is passed in to ORTTrainer, it is fully validated as usual.
However, tuples are internally replaced by namedtuples and all output tuples will have
tuple(name, shape, is_loss) format instead of is_loss being optionally present.
Additionally to that normalization in the internal representation (which eases coding),
two internal methods were created to replace a namedtuple(name, shape) to namedtuple(name, shape, dtype)
or namedtuple(name, shape, is_loss, dtype) dependeing whether the tuple is an input or output.
This is necessary as ORTTRainer finds out data types of each input/output during model export to onnx.
Finally, a minor fix was done on ORTTrainer. It could initialize ORTTrainerOptions incorrectly when options=None
* Rename input name for test
* Add ONNX Model Export to New Frontend (#4612)
Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
* Create training session + minor improvements (#4668)
Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
* Save ONNX model in file (#4671)
Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
* Add eval step (#4674)
Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
* Add train_step (#4677)
Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
* Add LR Scheduler (#4694)
Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
* Add deterministic compute tests (#4716)
Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
* Add legacy vs experimental ORTTrainer accuracy comparison (#4727)
Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
* Add Mixed precision/LossScaler + several fixes (#4739)
Additionally to the mixed precision/loss scaler code, this PR includes:
* Fix CUDA training
* Add optimization_step into TrainStepInfo class
* Refactor LRSCheduler to use optimization_step instead of step
* Updated several default values at ORTTrainerOptions
* Add initial Gradient Accumulation supported. Untested
* Fix ONNX model post processing
* Refactor unit tests
* Add ONNX BERT example + minor fixes (#4757)
* Fix training issue when passing ONNX file into ORTTrainer
Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
* Add Dynamic Shape support (#4758)
* Update DeepSpeed Zero Stage option to a separate option group (#4772)
* Add support to fetches (#4777)
* Add Gradient Accumulation Steps support (#4793)
* Fix Dynamic Axes feature and add unit test (#4795)
* Add frozen weights test (#4807)
* Move new pytorch front-end to 'experimental' namespace (#4814)
* Fix build
Co-authored-by: Rayan-Krishnan <rayankrishnan@live.com>
Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
1. Publish the image ACR, instead of building it every time for every PR
2. Make USE_MKLML and USE_OPENMP be able to co-exist. Currently both of them are enabled in our Linux CI build but indeed only one of them is taking effect.
3. Split nuphar and DNNL to separated pipelines.
4. Fix two warnings in onnxruntime/core/optimizer/matmul_scale_fusion.cc and onnxruntime/test/tvm/tvm_basic_test.cc.
5. Update the manylinux2010_x86_64 image to the latest.
1. Avoid building ONNX of every history ONNX versions in our CI, it is costly and easy to fail.
2. Run docker command without sudo. Previously the user is not in docker group, now Azure DevOps Service have added it in.
Add transformer glue test example to show how to use ORTTrainer to fine-tune a transformer model
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
* Remove 'model_.' prefix for onnx model initializers in training
* fix test case remove redundant device test
* rename
* Fix state_dict/load_state_dict with frozen_weight
* nit
* Add monkey patch for pt opset 10
* remove pt patch in CI
* nit: newline
Update install_deps.sh to use relative path from script directory to symbolic_opset10.py. This allows install_deps.sh to be called from different working directories.
* Enable running PEP8 checks via flake8 as part of the build if flake8 is installed.
Update scripts in \tools and \onnxruntime\python. Excluding \onnxruntime\python\tools which needs a lot more work to be PEP8 compliant. Also excluding orttraining\tools for the same reason.
Install flake8 as part of the static_analysis build task in the Win-CPU CI so the checks are run in one CI build.
Update coding standards doc.
* initial change to transformer.py
* prepare e2e transformer tests
* refactor transformer tests
* put test python files in a flat folder
* fix typo pip install transform(s)
* python 3.6
* python version to 3.6 in install_ubuntu.sh
* remove argparser
* to use opset ver 12
* workaround loss_scale naming patch in case of loss_fn_
* assign self.loss_fn_ so it can be checked
* skip a few un-needed post-process steps
* fix loss_scale_input_name, clean up post process steps
* skip non-frontend tests
* move cpu/cuda related files to coresponding cpu/cuda folder (#3668)
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
* type cast for ratio is not necessary for dropout (#3682)
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
* thrustallocator is not needed since cub is used directly for gather now. (#3683)
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
* GatherND-12 Implementation (#3645)
* Renamed, UT passing
* Move GatherND CUDA Kerenl into onnxruntime
* Merge GatherNDOpTest
* Refactor Test code
* Merge CPU Kernel Impl
* Handle Negative Indice, Fix UT
* Improve CUDA kernel to handle negative index
* Minor Fixes
* Preserve GatherND-1 Cuda kernel
* Fix Mac build
* fix UT
* Fix Build
* fix GatherNDOpTest.double > CUDA error cudaErrorInvalidDeviceFunction:invalid device function
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Peng Wang (pengwa) <pengwa@microsoft.com>
* update with reviewers' comments
* testBertTrainingGradientAccumulation was not using rtol and may fail occasionally with small (e-06) difference
* fix merge mistakes
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Weixing Zhang <weixingzhang@users.noreply.github.com>
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
Co-authored-by: Sherlock <baihan.huang@gmail.com>
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Peng Wang (pengwa) <pengwa@microsoft.com>
* Added FP16 transformations
* Revert "Added CMAKE_BUILD_TYPE to make building dynamic"
This reverts commit d3e17af1af655cfdc4d2fec33f52055caa525e85.
* Added FP16 transformations for FP16 builds
* Backend logic cleanup
Cleans the backend(intel_graph.*) code in the following ways:-
1. Minimize global usage: Since all the IR graphs need to be
re-generated on every Infer, it is bad practice to rely on globals
for their saving and usage as there would be multiple readers and
writers to the same global variable leading to incorrect usages or
contentions. This change replaces globals with locals where possible.
This change also fixes an existing bug with due to
incorrect global usage.
2. Remove all unused functions.
3. Remove all unused headers and prepocessor directives.
* removed commented out code
* Disabled default optimization for Intel EP
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Fix missed plugins.xml for python bindings
* Fixed the build after latest master changes
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Disabled unsupported ops for accelerators
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Added some more disabled ops
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Added environment variable to enable debugging
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Added more debug statements
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Fixed unsupported ops list for GPU and VPU
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Fixed unsqueeze unit tests
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Added error message to the status
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Overwrite Model proto with shape info from data
Overwrites the shape info of Model proto with the shape from
actual input data. Needed for inferring models with Dynamic
shapes.
* Removed print statement and disabled where op
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Disabled Reshape with Empty initializer
* Added more debug statements for 1P
* Don't allow 1D inputs with symbol for dimension
* Disabled some 3rd phase ops
* Disabled split and added zero dimension check for OutputDefs
* Cleanup zero dimensionality check
* Added different data type check for inputs and initializers
* Added conditions for Mod, Cast and Pad
* Removed unused variable
* Disabled scan and added conditions for squeeze
* Added changes for fixing all C++ unit tests
* Implements Backend Manager class for caching
Backend Manager provides a layer of indirection between EP interface
and OV backend that provides caching services for models with
symbolic dims in input shapes.
* clean up commented blocks
* clang-formatting
* Read I/O type info from ModleProto
Read the tensor element type information from ModelProto object,
as FusedNode is no longer available.
* code cleanup
* clang-formatting
* Added print statement for jenkins
* Disabled some python tests
* Changed the path of convert fp32 to fp16 hpp
* Added conditions for BatchNorm in GetCapability
* Fixed failed tests
* Revert "Added conditions for BatchNorm in GetCapability"
This reverts commit c3c28c3b00d27892c42546b35dacdd807a48ee90.
* Added Intel to onnxruntime backends
* pick up vars set by OV package setupvars.sh
* Added conditions for Identity
* remove a few cout prints
* Added conditions for GPU_FP32 unit tests
* Revert "pick up vars set by OV package setupvars.sh"
This reverts commit 8199e029c03eae21a1a7ef6bfdc93d00e5d0198b.
* Commented out fatal message for protobuf
* Might need to be removed
* Add interface class for current backend
* moved common logic to base class
* simplified cpu backend
* Removed unused headers
* use vectors to save i/o tensors for windows compatibility
* move utils fxns to backend_utils namespace
* rename ov_backend to ibackend
* Factory pattern for backend creation
* rename CPU backend to Basic backend
* renamed to vad-M and added to factory list
* Added conditions for VPU
* Added print statements
* Changed the logic for checking for symbolic shapes
* Modified logic for zero dimension check
* Removed VPU single dimension condition
* Removed comments
* Modified logic in DimensionCheck method
* Remove legacy OpenVINO EP
Remove all the legacy code for OpenVINO EP. UEP code will take its
place going forward.
This change does NOT remove OVEP files in the following areas asa
they will be reused by UEP:-
1. Documentation: All .md files
2. Docker releated files
3. Python bindings
4. Java bindings
5. C# bindings
6. ORT Server
7. CI pipeline setup files
* Rename Intel EP to OpenVINO EP
* Added unique names to the subgraphs
* Removed subgraphs with only constant inputs
* Modified subgraph partitioning algorithm to remove const input subgraphs
* Apply suggestion to onnxruntime/core/providers/openvino/openvino_execution_provider.cc
* Tracking output names to fix the output order bug
* Changed output names to a unordered map
* Modified logic to check for symbolic input shapes
* Fixed a bug in Reshape check
* Added empty model path to Model constructor
* Made necessary changes to cmake to build from the binary package
* Changed INTEL_CVSDK_DIR to INTEL_OPENVINO_DIR
* Enable dyn device selection with C++ API
* Added Round operator to unsupported list
* Modified subgraph partition logic for MYRIAD
* Removed supported ops from the list
* Enable dyn dev selection in Py API's
* Add documentation for dynamic device selection
* Use MYRIAD || HDDL instead of VPU
* Removed temporary cast of Int64 to FP32
* Disabled unit Tests for CPU_FP32 and GPU_FP32
* Removed default "CPU" from unit tests to allow overriding
* Removed ops Concat, Squeeze, Unsqueeze from unsupported list
* Get the device id from info
* Removed overwriting device_id and precision
* Enabled ConvTranspose and EyeLike
* Reordered unsupported ops in alphabetical order
* Fixed syntax error
* Fixed syntax error
* Code clean-up: Handle exceptions, logs and formatting
Code formatted according to ORT coding guidelines.
* remove debug print from pybind code
* updated docs with ops and models
* formatting prints
* Added default values for c and j for openvino
* Overriding the values set for c and j to be 1
* BACKEND_OPENVINO should be empty if openvino is not in build
* Overriding c value with default for perftest
* fix VAD-M device string bug
* Add IE error details to exceptions
* Use IE specific device names in EP
* Add VAD-F (FPGA) device support
* Removed unecessary libraries from whl package
* Code changes for Windows compatibility
* Add VAD-F option to python API
* [revert before merge] cmake changes for RC
* Enable Windows build in CMake
* Unset macro OPTIONAL for windows builds
inference_engine.hpp's include chain defines a macro 'OPTIONAL'
which conflicts with onnx project's headers when using MSVC. So
would need to explictly unset it for MSVC.
* Use a single copy of plugin/IE::Core
Defined as a static member in Backend manager
* Remove restriction of single subgraphs for myriad
* Passed subgraph name to Backend to enhance log statements
* Disabled zero dimension conditions
* Disabled concat to remove zero dims
* Enabled building ngraph as part of ORT
* Removed serializing and added versioning
* Fix CPU_FP32 unit tests
* Removed unecessary condition
* add ngraph.so.0.0 to .whl
* Check for zero dimensions only for inputs and outputs
* Restrict loading only 10 subgraphs on myriad
* Build ngraph.dll within UEP. Doesn't link yet
* Rename Linux included libngraph.so to libovep_ngraph.so
Renames locally built libngraph.so containing ONNX importer to
libovep_ngraph.so in order to avoid linkage conflicts with
libngraph.so supplied by OpenVINO binary installer.
Applies only for Linux builds.
* use output_name cmake properties for lib name
* fix .so name format in lib_name.patch
* CMake code cleanup
* Rename WIN32 included ngraph.dll to ovep_ngraph.dll
To avoid conflict with ngraph.dll distributed by openvino.
* Added myriad config for networks without 4 dimensions
* Loading the 10 max clusters for inference on myriad
* Refactor code and add Batching support
Encapsulate subgraph settings into context structs.
Add batching support for completely supported models.
* Disabled some broken tests
* use input_indexes to avoid batch-checking initializers
* Avoid static initialization order error on WOS
* Added candy to broken tests
* InternalCI changes for 2020.2
* Updated DLDT instructions
* Unsaved changed in install_openvino.sh
* Changes after manual check
* Remove custom ngraph onnx_import build for WOS
ONNX Importer on WOS does not have protobuf issue.
* Remove FP32ToFP16 ngraph pass
This conversion is performed implicitly within IE.
* Surround debug logic by #ifndef NDEBUG
* remove invalid TODO comments
* removed references to ngrpah-ep
* clang-formatting
* remove commented code
* comment edits
* updating copyright year to that of first OpenVINO-EP release
* remove redundant log msg
* Modified operator and topology support
* Update build instructions
* doc formatting
* Fixed clip unit tests
* Revert "Remove FP32ToFP16 ngraph pass"
This reverts commit ec962ca5f315a5658ad980e740196f19de2639c1.
* Applying FP16 transformation only for GPU FP16
* Fixed GPU FP32 python tests
* automatically use full protobuf
* disable onnxrt server for now
* Disabled upsample
* update dockerfile instructions
* Removed MO paths and added ngraph path
* Remove OVEP from ORT Server docs
Will put it back in after validation
* Updated path to Ngraph lib
* Disabled Resize and some other python tests
* Removed unnecesary header files
* Use commit SHA to fetch ngraph repo
* Avoid un-needed file changes due to version update
* Fixed clip tests
* Fixed Pow, max and min onnx tests
* build.md doc typo
* Update cmake patch command for ngraph src
* remove dead cmake code for onnxruntime_USE_OPENVINO_BINARY
* use spaces instead of tab
* remove commented code
* Add info about protobuf version
* edit debug env var and enable for WIN32
* specify only version tag of 2020.2 for dockerbuilds
* remove unnecessary file changes
* Pass empty string as default argument to C# tests
* Use ${OPENVINO_VERSION} to name openvino install directory in CI builds
* Enabled unnecessarily disabled tests
* Fixed ngraph protobuf patch
* Fixed error in protobuf patch
* Revert "Use ${OPENVINO_VERSION} to name openvino install directory in CI builds"
This reverts commit 89e72adb8bf3b9712f5c81c5e13fe68c6c0df002.
* Remove unsetting OPTIONAL macro
This is no longer used in recent ONNX update onnx/onnx@da13be2,
so this unset workaround is no longer necessary.
* Use a null string default argument for C# API
* Set OpenVINO version yml files and pass to CI Docker builds
Git Tag info for DLDT as well as install directory are set
using this value.
This reverts commit 9fa9c20348ed72ae360a95c98e9b074d2f9fafc5.
* Documentation: recommendation and instructions for disabling ORT graph optimizations
* more doc updates
* Reduced the number of models according to CI time constraints
Co-authored-by: ynimmaga <yamini.nimmagadda@intel.com>
Co-authored-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
Co-authored-by: Mikhail Treskin <mikhail.treskin@intel.com>
Co-authored-by: mbencer <mateusz.bencer@intel.com>
Co-authored-by: Aravind <aravindx.gunda@intel.com>
Co-authored-by: suryasidd <48925384+suryasidd@users.noreply.github.com>
* add frontend minst test
* to use torch nightly with torchvision
* remove incorrect comment per reviewer's comment
* experiment torchvision import failure
* experiment install_deps.sh
* more experiment install_deps.sh
* experiment install_deps.sh with --upgrade
* Experiment with install_deps.sh.
* Experiment with install_ubuntu.sh.
* Use Ubuntu 18.04 and Python 3.6 for CI.
* Update cmake version for CI.
* Install MPI on Ubuntu 18.04 for CI.
* Increase tolerance for MNIST test.
* Go back to Ubuntu 16.04 for CI, fix installing from deadsnakes ppa.
* Clean-up.
* Update ort_trainer.py from ort_training.
* Get default Ubuntu Python ver back to 3.5.
* Add underscore to opset_version parameter name in ORTTrainer constructor.
* Move loss/model wrap before the call for sample output.
* Update expected values for MNIST test.
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Sergii Dymchenko <sedymche@microsoft.com>
1. Fix onnxruntime server docker file build failure. Tested with the notebook in ONNX tutorial, it works well.
2. Delete the docker files for the other EPs, because currently they don't work and I don't have enough time to update them.
We want to implement SoftmaxCrossentropy and NegativeLossLikelihoodLoss forward training ops for opset-12 but that requires ONNX submodule to point to the latest commit to have the latest and greatest ONNX spec!
- Reverse integrate changes from *.in.proto files in github ONNX repo.
- Regenerate csharp/test/Microsoft.ML.OnnxRuntime.Tests/OnnxMl.cs
- Disable ONNX tests that don't have op implementation for the latest opset.
Use CUDA 10.1 for Linux build
(Windows change is already in)
Please note, cublas 10.2.1.243 is for CUDA SDK 10.1.243, not CUDA 10.2.x. CUDA 10.2.89 need cublas 10.2.2.89. They match on the last part of the digits.
libcublas10-10.1.0.105 won't work!!!
The cuda docker image by viswamy is already using 10.1, no need to change.
1. Pipeline changes for python 3.8
2. Fix a regression in setup.py which was just introduced in the previous commit.
Please notice, we still haven't made python 3.8 + Windows + CUDA work.
* change c++14 to c++11
* add ld lib path for centos
* enable csharp tests on macos
* fix C API test on MacOS + fix manylinux dotnet install
* fix manylinux dotnet install
* fix lib link
* add centos tests to linux cpu ci pipeline
* Disable failing test
* use centos6 instead of centos7
* change back to centos7
* add dotnet runtime dependency
* fix dotnet runtime dependencies
* install dotnet sdk instead of runtimes
* add more dotnet dependencies
* temporary skip failing test
* ix lib path
* reenable failing test
Remove gsl subodule and replace with a local copy of gsl-lite
Refactor for onnxruntime::make_unique
gsl::span size and index are now size_t
Remove lambda auto argument type detection.
Remove constexpr from fail_fast in gsl due to Linux not being happy.
Comment out std::stream support due to MacOS std lib broken.
Move make_unique into include/core/common so it is accessible for server builds.
Relax requirements for onnxruntime/test/providers/cpu/ml/write_scores_test.cc
due to x86 build.
Add ONNXRUNTIME_ROOT to Server Lib includes so gsl is recognized
1. remove sudo from the cleanup step for Linux so that we don't need the sudo access for vstsagent build user
2. a minor fix in the install_ubuntu.sh to make the image smaller for openvino
* Bump onnx to latest
Update onnx.in.proto with changes for SparseTensor.
* add temp skip tests
* remove passed tests from skip list
* skip more tests for new ops in opset 11
* skip crashing tests
* update handling of new attribute types sparse tensor and sparse tensors
* advance onnx commit and remove skip cpu_flaky_tests
* temporarily skip yolo3 model test due to resize opset10 shape inference regression
* update proto for onnxruntime server
* advance onnx commit further
1. Add openvino GPU nightly build pipeline, this test is running on Intel Up square Edge device. The device are host locally not from Azure VM. We persist a smaller model test data on Edge device.
2. Update the build condition for openvino GPU so it works for GPU_FP32, GPU_FP16
3. add option to install_ubuntu.sh to exclude the package used for nuphar, so that we can save some disk space as the Edge device usually have limited disk space.
Enable Nuphar EP docker build
Revert back to LLVM 6.0.1
Reinstate disabled Softmax tests caused by LLVM 8.0.1
Reinstate Nuphar Python test due to stale sympy version
Increase build timeout of Linux CI
* Add MacOS leg of Python packaging job
* Update copy files source directory for Mac OS leg
* Add a task to display the binaries directories contents after build wheel creation
* Revert some changes
* Add task to log
* Update
* Remove unnecessary logs
* Update DNNLibrary
* Allow fp16 by default
* Add nnapi build in ci
* Fix nnapi ep after #1268
* Remove unused variables
* Support nnapi in onnx_test_runner
* Update DNNLibrary to fix tests
* Update build.py for android build support, solve conflict of
tools/ci_build/build.py
* Support non-ARM Android build, solve conflict of tools/ci_build/build.py
* Enable android test by x86_64 android emulator
* Add dnnlibrary/NNAPI support in build.py
* suppress the verbose adb output
* Remove debug logs
* Install cmake by pip
* Fix undefined host_protoc_path
* cmake==3.13.2 in pypi is actually 3.12.2, so install 3.13.2.post1 instead
* Fix Android ARM64 build
* Use android ndk r20 instead of r19c, fix conflicts in install_deps_android.sh
* sync onnx to get equal op with float support
* doc update
* fix test failure because of updated shape inference logic for roialign.
* filter consum test cases since it's not implemented yet.
* Update cuda for python wheels
* Update cuda for python wheels
* Update cuda for python wheels
* Update azure-pipelines-py-packaging.yml
* Update to cuda 10
* Only test win gpu
* Update cuda for python wheels
* Use manylinux2010 image to build linux python wheels
Allow wheels built to truly be compliant with a manylinux policy
* call install_onnx with relative path
* use caller path
* get abs path of current script
* process unicode char
* replace script name
* add suffix as part of match
* match only the end of path
* Initial commit for OpenVINO Execution Provider
OpenVINO Execution Provider provides the interface for ONNX Runtime
applications to access Intel's hardware accelerators using Intel's
OpenVINO Toolkit.
* Fixed bug in GetCapability to disable custom ops
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Added OPENVINO ci pipeline
Added new pipeline for openvino provider,
made changes to support the docker build and
onnxruntime build with openvino.
Signed-off-by: Luis Daniel Castellanos <luis.daniel.castellanos@intel.com>
* Enabled all unit tests for OpenVINO EP
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Fixed syntax issue in run_docker_build.sh file
* Added missing default OPENVINO_VERSION
Default value for OPENVINO_VERSION env was
missing causing the build to fail
* Added install Model Optimizer deps step
* Fixed python unit tests and some tests from onnx_backend_test_series
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Fixed indentation bug
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Disabled some of the python backend tests for OpenVINO
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Disabled some model tests
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Remove Duplicate checks for openvino in build.py
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Modified GetCapability for FP16
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Disabled GPU FP32 tests that are not supported
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Convert modelProto to string and use it in compile
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Pass byte-array input args to MO
* Serialized ModelProto passed in-memory to MO
ModelOptimizer python module receives the serialized ModelProto
in-memory.
Uses appropriate ONNX function to load the serialized bytes.
* Make Py_Finalize compatible with older python versions
Also, remove pFunc unassigned variable possibility.
* Fallback if input dims of Matmul is greater than 2
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* fixup: Device #define syntax
* Updated the documentation
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Enable dynamic dim value
* removed commented out code
* Added Dockerfile for openvino EP
Updated instructions on dockerfiles/README.md file
Signed-off-by: Luis Daniel Castellanos <luis.daniel.castellanos@intel.com>
* Disabled fp16_inception_v1 test
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Code formatting with clang-format
Uses style from the .clang-format file in root directory.
* fixup: docker tag and build error fixes
* Heuristics to automatically detect batching
Distributes slices from batch into parallel infer-request objects.
* Handle disabled tests in GetCapability
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Disabled average pool and max pool if ceil_mode is 1
Also dilations are not supported if they are greater than 1
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Disabled Unsqueeze int32 test
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* changes to fix output results bug
* Disabled a few C++ unit tests for MYRIAD FP16
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Manually revert '9fe162bb Enable dynamic dim value'
Reverts compile time setting of dynamic shape
Reverting manually due to significantly huge auto-revert conflicts.
* Fixed unused variable warning
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Disabled Mul test for GPU_FP16 due to accuracy issue
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* VPU documentation update
* Disabled inception_v1 for MYRIAD and HDDL
*Also disabled few C++ accuracy tests for HDDL
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* updates from upstream
* use the new CustomOpApis for I/O interfacing
* Pass initializers as subgraph meta-def inputs in GetCapability()
Requirement due to API changes introduced with PR# 1019.
* Remove obsolete functions
* Save indexes of graph inputs from fused_node info
Both inputs and initializers are passed as data inputs to the
infer function. To identify only inputs among them, save thier
index info from fused_node in Compile function.
* Documentation changes to enable VPU
* Fix VPU related changes in documentation
* Fix minor changes in documentation
* Fix VPU related changes in documentation
* Use Node.In/OutputDefs() to track graph inputs and outputs.
Don't use graph_viewer's GetInputs() or
GetInputsIncludingInitializers().
* Permit "SAME_UPPER" auto_pad attribute from MaxPool
* Disabled fp16_tiny_yolov2 in onnx model tests
* Updated documentation to include configuration guides for myriad and hddl
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Use 8 Infer requests only for VAD-R
* disable debug prints
* Clang-format source files
* Updated BUILD.md with OpenVINO R5 links
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Disabled same upper python tests
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Update test exclusion syntax
* Change path of install_onnx.sh
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Disable tiny_yolov2 in broken tests
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Revert "Change path of install_onnx.sh"
This reverts commit ba9db165f3be430f2aff1ef413299ed04637196a.
This change is only required for Intel internal CI pipeline until
the settings are matched with the upstream's CI pipeline.
* Added debug statements for debugging CI error
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Add --build_wheel to linux openvino pipeline
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Added -v option to onnx_test_runner for debugging
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Removed path change patch
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Added -c 1 to onnx_test_runner
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Refactor MO python invocation in separate function
Cleans up Model Optimizer python invocation check and conversion
logic. Invokes MO only once in GetCapability() and passes the
IR strings (xml and bin) to the Compiler as meta-def attributes.
* Add comments
* code cleanup and comments
* Code cleanup for GetCapability
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Removed unnecessary files
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Revert "Added -v option to onnx_test_runner for debugging"
This reverts commit d1dd70938a94d648df1a1dbbc2e48d0b97e49ec8.
* Revert "Added debug statements for debugging CI error"
This reverts commit b86d41afed2aa29c3508155d6f9c8d3a7263cc60.
* incorporate Status Code changes
* ComputeFunc returns Status::OK() on success
* Use test names to disable tests for MYRIAD and VAD-R
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Rename local identifiers from CNNNetwork to OpenVINO network
CNNNetwork is an OpenVINO's API class that represents more than
just convolutional neural networks (CNNs). Renaming helps to avoid
confusion that the API's only support CNN type models.
* Added error message if building on windows
* Removed duplicate option in Cmake
* Removed unnecessary parameters in activation_opt_test
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Refactor Map search and access logic for efficiently and cleanliness.
* use C++ style casts
* Use os.path.join for python directory path operations
* use C++ style casts
* EP classes should use onnxruntime namespace
* Clean up fixes from PR comments
* Don't explicitly shutdown Py interpreter
* Remove debug print statements
Prints will be re-enabled later with a logging mechanism with
debug/verbose printing options.
* Decrement ref counts for used pyObjects
* Restore build instructions for other compilers
Content under the "Using other compilers" section has been
accidentally deleted by a previous commit. Restoring back that
content from the latest upstream repo.
* CMake code cleanup
Code clean up, commenting and formatting of CMake code.
* Don't pass the unused device_info parameter to OpenVINOGraph ctor.
* Add support for multiple I/O data types
Adds support for the following tensor data types for graph inputs
and outputs:
1) float
2) float16
3) int32
4) int16
5) int8
6) uint16
7) uint8
* cleanup setup.py module list definition
* Deduce index of input using tracked input index map
Ignores initializers in case they are ordered before inputs.
* Removed debug statement in MO code
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* PR feedback
* Removed per_sample_tolerance for openvino
* Removed unnecessary disabled tests
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Removed debug function
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Disabled tiny_yolo_v2 due to accuracy issues
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Changed the disabled reason for broken tests
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Disabled Reshape with no input
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Python formatting with Autopep8
* Minor fix for MYRIAD devices
* Added zero dimension check
*Removed setting batch size for the network
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Set the threshold to larger value for MNIST
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Removed setting higher threshold in provider_test_utils
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Check for --use_openvino in python wheel setup.py
Add openvino modules to the setup script for building the wheel
package only for --use_openvino a build option.
* Removed nullptr checks for GetNode()
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
Advance ONNX submodule to 5c51f0dbbe88ee1536f17ee7bd462b2ab3772c52
This commit in ONNX contains a fix to ConstantOfShape test data.
Uncomment ConstantOfShape.
Update test script, make sure exclusions are uniform.
* add version filter to failed tests
* exclude test from backend
* exclude shrink from opset 9
* fix compile err
* exclude certain version of constant shape
* enable flatten test
* fix compile err
* comment mvn test
* disable constantofshape test in x86
* disable x86 test
* get model version from imported opset
* test linux x86 case
* disable nonzero opset 10
* make mutex const
* test filter by commit id
* adjust substr offset
* Limit test platform
* remove change impacting TFModleInfo.h
* refactoring
* refactoring
* test x86 pipeline with filter
* add comment
* restrict version extraction on non-win
* restrict version extraction on non-win
* add tag
* exclude case from backend test
* remove dup
* remove dup
* make script runnable
* hard code adsolute path
* refactor log
* fix x86 compile err
* fix x86 compile err
* fix x86 compile err
* sync with latest tensorrt
* switch to regex
* fix cpu pipeline err
* test filter
* disable nonzero from all versions
* Update the ORT-SRV ci pipeline setup
* Update pip package installation for server tests
* Install requests package in build setup
* Check if python dependencies exists before install
* Accomodate missing optional 'axes' when 'steps' is present in Slice op (#946)
* Accomodate missing optional axes when steps is present in Slice implementation
* PR feedback
* Update package links (#937)
* Update package links
* Minor fix
* Update README.md
* Minor edit
* Update onnx commit (#949)
* Update onnx commit
* disable failing tests which don't have to be fixed for this release
* dummy change to fix file permission
* fix file permission
* move files
* move files
* Remove NonMaxSuppression from Contrib op, move it to Onnx domain, opset 10
* move NMS out of namespace contrib
* update data type in UT
* update to latest onnx
* white list the node test for Mod which is not implemented yet
* enable android build
* Add 'log' to onnxruntime_EXTERNAL_LIBRARIES
* Remove cmake about header_files_test.cc
* Add Android CI pipeline
* Remove some ms-specific(?) ci
* Fix bash error
* Add execute flag for install_deps_android.sh
* Add install_ubuntu_for_android.sh
* Remove python in deps for android
* Add comment for BUILD_ARCH
* Set BUILD_SERVICE to cpu
* Set BUILD_OS in run_build.sh
* Fix -o bug in run_build.sh
* Android -> android
* Correct the android ndk location
* Checkout submodules in my own azure pipelines
* Revert "Remove some ms-specific(?) ci"
This reverts commit 302463213480487d8944c3127a3b311c591d55c0.
* Revert "Checkout submodules in my own azure pipelines"
This reverts commit 1acfb6755f933e532b8312ca35bb4900a833903f.
* Exclude unreferenced global data and op doc strings in the opschema object. The first causes a decrease in the binary size by at least 85k. The latter reduces resident memory size.
* Update onnx to incorporate my PR that fixes SetDoc compiler warnings
* Update onnx
* Support updated function schema in ORT
* Update onnx related commit hash
* Check out an older commit in ONNX
* Add support for subgraph attribute
* Add comments
* updated cmake files for trt
* added trt execution provider
* added trt basic test
* removed trt_path action attribute
* Add files via upload
* Update build.py
* Update trt_allocator.h
* fixed issues found by reviewers
* changed cast operator
* added comment for custom kernel implementation
* changed auto to auto&
* changed to function compile APIs for TRT execution provider
* changed to function compile APIs for TRT execution provider
* added new DType DInt64
* adapted to the changes of onnxruntime_c_api
* removed trt kernel (use function compile instead)
* updated onnx-tensorrt submodule
* set default memory type to TRT fused kernel
* resolve merge conflict
* fixed the issue that USE_CUDA conflicts with USE_TRT
* construct graph by adding nodes in topological order
* made changes for Windows
* change buffers type
* bypass HasImplementationOf check for TRT XP because TRT kernel is not registered
* added domain to version info in rebuilt model proto
* added trt to test option list
* added DomainToVersionMap() to GraphViewer
* removed Copy()
* fixed broken code
* format the code to clang format
* used local reference to the frequently used values
* fixed a couple of issues according to reviewers feedback
* fixed a couple of issues according to reviewers feedback
* added python binding for TRT and enable use_cuda when use_trt is on
* fixed a redefinition issue
* changed shared_ptr to unique_ptr on trt engines, and made a few changes required by reviewers
* enabled trtexecution provider for unit tests
* renamed trt to tensorrt
* added tesorrt to python binding
* update submodule onnx and onnx-tensorrt
* made a couple of minor changes based on reviewer's feedback
* added CUDA_CHECK
* removed test code
* fixed broken code after merge
* updated onnx-tensorrt submodule
* added post processing to align trt inputs/outputs with graph inputs/outputs
* updated onnx submodule
* added CUDA fallback for TensorRT and fixed TensorRT cmake issue
* added ci pipeline for tensorrt and removed some redundent code from trt xp
* fixed syntax issue
* updated onnx-tensorrt submodule
* fix trt build problem by: (#602)
1. Add additional /wd for debug build
2. Add io.h for additional targets
3. Bring back mb version of getopt
* Update install_ubuntu.sh
* Update linux-gpu-tensorrt-ci-pipeline.yml
* Update linux-gpu-tensorrt-ci-pipeline.yml
* Update run_build.sh
* Update run_build.sh
* Update run_build.sh
* Update run_build.sh
* fixed the issue that GetKernelRegistry returns nullptr
* merged master to this branch
* moved some data types to private
* fixed tensorrt CI pipeline issue
* customized test data for TensorRT pipeline
* added onnx-tensorrt in json file and fixed an issue in ci script
* added comments
* sync onnx and maintain old version history for removed exp ops in onnx runtime.
* update
* updating to specific onnx commit - remove exp ops.
* update
* disable the 3 failures to push the change as it's blocking folks.
* update test
* cross compile x86 linux
* fix comments
* install multilib for ubuntu cross compile
* remove tailing slash
* fix -fPIC relocations for x86 target too
* add asm make flag
* fix x86 compile err
* test x86 with zlib and png
* Disable zlib from x86
* install x86 python header
* remove cross-compiling changes
* test 32bit ubuntu
* add x86 ubuntu docker file
* add x86 as arch parametr for docker build
* config pipeline
* avoid dotnet install
* install cmake
* skip dep install
* use latest ubuntu
* install latest cmake
* install x86 deps
* configure cmake
* install ninja
* correct ninja dir
* apt get re2c
* install onnx
* set processor x86
* disable warning
* skip test
* disable test
* disable test
* find lib
* fix typo
* restore test
* disable backend model test
* disable test
* fix test err
* stop installing onnx
* disable onnx test on x86
* restore yml
* mergef with master yml
* cancel needless config setting
* enable x86 flag
* restore all onnx tests
* fix yml typo
* install onnx
* add back x86 flag
* disable cases
* disable case
* disable cases
* add macro to disable cases
* fix typo
* print platform
* remove condition
* update packaging numpy version to 1.15.0
* update version in numpy version in linux
* Install numpy 1.15.0
* Finish up numpy requirement after test
* Try fix
* Fix ci script
* Update cast kernel to support to/from string
* Update namespace
* Add support for literal numeric case
* Update to support -INF test
* Update kernel registration for cast
* Update ONNX to 1.4.1
* Update registy api
* Resolve some comments
* Update cast kernel implementation
* Resolve comments
* Fixed test data in onnx
* Update cast kernel implementation
* Resolve PR comments
* Update cast_op.cc
* Update onnx commits info
* Update comments
* fixed typo in runtest.sh
* some fixes
* some fixes
* some fixes in the runtest.sh
* added test data url
* fixes on the dotnet test scripts
* fix on prior mistake regarding installation of apt-transport-https
* added verbosity in the test run for easy debugging
* updated comment in the runtest.sh
* Update ONNX version to pickup Scan spec change that adds scan_output_axes.
Add logic to transpose an output
- write to temporary buffer when executing subgraph
- transpose temporary buffer into Scan output when execution completes
Add unit tests
* Update to ONNX dbf3581835e3a05716e10587511d7ab3b2cdc386 to pickup inferencing bugfix.
Update test to match.
* Disable some tests for opset 9 operators that haven't been implemented yet.
* Added test data arguments to build.py, modified win-ci-pipeline build.
* Updated CI builds to use template tasks, added test data args, removed AZURE_BLOB_KEY uses.
* Fixed up set test data step template.
* Imlpement StringNormalizer
Add mixed language tests, test case insentive path.
* Create a locale on the fly. Default locale does not seem to create well.
* Add CI language-pack-en to make default locale available.
Catch and translate locale creation exception to make the message
meaningful.
* Make sure locales are configured on Ubuntu.
* Add pipeline for building python wheels for Windows/Linux CPU and GPU
* try enable mkldnn
* remove mklml
* Update python packaging configuration
* Add python3.7 support
* Revert to disable the py37 packaging on windows