Add Xamarin support to the ORT NuGet packages.
- Update C# code to support Xamarin builds for iOS and Android
- Refactor to split out common code
- Include the iOS and Android ORT native shared libraries in the native NuGet package
* make the build work for both ROCm 4.2 and ROCm 4.3.1
* fix ROCm 4.3.1 Docker image reference
* rename CUDA_VERSION to ROCM_VERSION
* fix conflicting ReduceConsts definition
* add ifdef to miopen_common.h as well
* remove trailing whitespace
* 2021.4.1 Docker and ci changes
* OV version change
* Remove the ImageScaler op from the supported ops list
Reverting this change, which was added in the last
PR. ImageScaler is now deprecated, so it is being
removed from the supported list. This op was also
causing a performance regression in FP16 models.
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Re-writing the help message for num_of_threads
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
Co-authored-by: Aravind Gunda <aravindx.gunda@intel.com>
* install protobuf from source
* fix rm command in Dockerfile
* fix options on rm command
* fix cd into protobuf source directory
* try again
* remove strip step
* debug list the files
* ls on /usr
* more debug
* more debug
* adjust LD_LIBRARY_PATH
* try remove protobuf before ORT build
* updates for picking pnnx commit
* add test filter to C# tests
* plus test fixes
* fix versioning for contrib ops
* fix tests
* test filter for optional ops
* more versioning related updates
* fix test
* fix layernorm spec
* more updates
* update docs
* add more test filters
* more filters
* update binary size threshold
* update docs
* draft - enable model local function
* enable model local functions in ORT
* update to latest rel onnx commit
* plus tests
* plus more updates
* plus updates
* test updates
* Fix for nested functions + shape inference
* plus bug fix and updates per review
* plus fixes per review
* plus test updates
* plus updates per review
* plus fixes
* fix a test
* copy changes from trt_and_mem
* second edits
* Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines
* Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines
* Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines
* change to cuda 11.4
* build with cuda 11.4
* Update Dockerfile.ubuntu_cuda11_1_tensorrt7_2
* add cmake extra defines
* cmake architectures
* fix cmake arch
* Delete ubuntu-18.04.Dockerfile
* Rename Dockerfile.ubuntu_cuda11_1_tensorrt7_2 to Dockerfile.ubuntu_cuda11_4_tensorrt7_2
* Update linux-gpu-tensorrt-ci-perf-pipeline.yml
* Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines
* removing previous ort args
* rename to cuda 11.4
* remove cuda 10_2
* delete trt 7.1
* remove 7.1
* Passing in cuda architecture to reduce build time
* always add submodule sync due to recursive cloning
* fix run command
* add and
* remove unused args and share the Python installation script
* Update linux-gpu-tensorrt-ci-perf-pipeline.yml
* Update Dockerfile.tensorrt
* cleanup file
* install Python directly in the Dockerfile; move to scripts in the future
* Update Dockerfile.custom-trt-perf
* adding CUDA 11.1 for missing libnvrtc.so.11.1
* Delete install_python.sh
* modify for test
* modify for test
* modify for test
* modify for test
* modify for test
* modify for test
* prepare for PR
* Rename cuda directory to gpu directory in tarball
* Fix gpu java package
* fix bug
* fix small bug
* Add onnxruntime_providers_shared.dll into gpu nuget package
* Modify for test
* Temporarily remove for test
* Modify for test
* Modify for test
* Test packaging Windows combined GPU
* Test packaging Windows combined GPU
* Test packaging Windows combined GPU
* Test packaging Windows combined GPU
* modify for test
* modify for test
* fix bug
* Modify for test
* Modify for test
* Modify for test
* Modify for test
* Modify for test
* Modify for test
* Modify for test
* Modify for test
* Prepare for PR
* Prepare for PR
* Code refactor
* Rename proper Artifact name
* Rename intermediate Artifact names
* Revert Artifact Names
* Rename Artifact Names
* Modify Artifact name
* Modify Artifact name
* Modify Artifact name
* Update Java package
* Update Java package
* fix bug to change artifact name
* Fix bug for the wrong file path
* Fix not fetching the correct artifact, and test
* temporarily modify for test
* undo the change for test
* fix build - Python.h not found
* disable --build_shared_lib for ortmodule tests
* fix
* fix the build flag
* disable --build_shared_lib for training path (not only for ortmodule)
* fix missing test model files
* disable test CApiTest.test_custom_op_library when ENABLE_TRAINING_TORCH_INTEROP is ON
* enable custom_op_library build
* fix build
* fix
* merge master and fix build failure
* build onnx_test_runner when onnxruntime_ENABLE_TRAINING_TORCH_INTEROP is ON
* resolve comments
* use --enable_training_torch_interop to replace "onnxruntime_ENABLE_TRAINING_TORCH_INTEROP=ON"
* initial update from 11.1 to 11.4
* change 11.4.1 to 11.4.0
* adjusting to match nvidia/cuda image tags
* adjusting to match nvidia/cuda image tags centos7
* correction to 11.4.0
* correction to 11.4.0
* update to cuda 11.4
* change training back to 11.1
* change training back to 11.1
* point to correct nvcr.io/nvidia/cuda 11.4.1 image
* change centos8 to centos7
* correct cudnn path
* Update linux-gpu-ci-pipeline.yml for Azure Pipelines
* Update c-api-noopenmp-packaging-pipelines.yml
* still need to resolve CentOS images, but remove space and change to 11.4
* Update linux-gpu-ci-pipeline.yml
* add cudnn to docker image
* bump devtoolset to 10
* revert cuda 11.4 change to setup_env_trt
* orttraining back to 11.1
* use nvcr.io
* Fix previous change back to cuda 11.1
* update cudnn path
* use cudnn image (revert if failure)
Add IsSparseTensor
Add CreateSparseTensor
Add utilities and test fully sparse instantiation
Fully sparse blocksparse
Add test and docs for fully sparse tensor instantiation
Rework creation API
Use API
Non string API
Retrofit of existing String API
Add tests
Add documentation
Address build issues (Winml pending)
Add inference test
Bump binary size
Add ifdef DISABLE CONTRIB
* updates for picking pnnx commit
* add test filter to C# tests
* plus test fixes
* fix versioning for contrib ops
* fix tests
* test filter for optional ops
* more versioning related updates
* fix test
* fix layernorm spec
* more updates
* update docs
* add more test filters
* more filters
* update binary size threshold
* update docs
* plus more fixes
* updates per review
* update to release commit
* add filters for optional type tests
* plus updates
* update onnx-tensorrt parser to master
* disable unsupported tests
* add cuda sm 75 for T4
* update tensorrt pipeline
* update trt pipelines
* update trt pipelines
* Update linux-gpu-tensorrt-ci-pipeline.yml
* update trt ci pipeline
* Update linux-gpu-tensorrt-ci-pipeline.yml
* Update Tensorrt Windows build pool and TensorRT/CUDA/CuDNN version
* update to cuda11.4 in trt ci pipeline
* update base image to cuda11.4
* update packaging pipeline to cuda11.4
* clean up
* remove cuda11.1 and cuda11.3 docker file
* disable unsupported tensorrt tests at runtime
* Update linux-multi-gpu-tensorrt-ci-pipeline.yml
1. Update SDLNativeRules from v2 to v3. The new one allows us to set excluded paths.
2. Update TSAUpload from v1 to v2. And add a config file ".gdn/.gdntsa" for it.
3. Fix some parentheses warnings
4. Update cmake to the latest.
5. Remove the "--x86" build option from pipeline YAML files. We can now auto-detect the CPU architecture from Python, so we don't need to ask the user to specify it.
SparseTensor support
Implement Builder pattern
Fix support for 1-D and 2-D COO indices
Implement and test CSR support.
Handle shape inference for SparseTensors
Implement conversion for COO, CSR and tests.
Address the case where constant sparse initializer is the output.
Implement test infra for SparseTensors
Implement and test SparseDenseMatMul for CSR and COO.
Add hash for SparseToDenseMatMul
Finish shared provider refactor
Refactor GetOrCreate to Create
Working on py interface
Expose OrtDevice and use it in allocate_numpy
Adjust Sparse interfaces, add support for string SparseTensor. Add tests.
Add and test to_cuda()
Add accessors to format specific indices
Test values and indices views, read-only flag, after GC access
Add sparse related methods to OrtValue
Re-work SparseTensor wrapper, add OrtValue methods
Rework numpy_array_to_cuda/to_cpu
Add run_with_ort_values
Add models and test sparse_mat_mul with run_with_ort_values
Refactor sparse tensor to use a single buffer
Ifdef x86 Eigen CSR sparse matmul implementation
Exclude broken test, check for string type when copying cross device
Split pybind schema, regenerate docs, add exclusion
Conditionally exclude schema module
Update docs fix cuda build
Add test to a filter and regenerate JS docs
Add conversion and test string support for sparse tensors
Exclude conversion utils from minimal build
Add CUDA Memcpy and adjust provider interfaces
* Changes to ensure the openvino-ep-2021.4 branch is created
* Fix failing cpp and python unit tests
* Fixed Myriad Tests for Ov_2021.4
* Disabled failing python tests for myriad
* Fix models that were breaking with 2021.4
* Added fixes to get TinyYOLOv3 working on Myriad,
and MaskRCNN and FasterRCNN working with GPU_FP32
* Added FP16 output data type support for ngraph
* Implemented ReadNetwork() method
-> Using the Core::ReadNetwork() method for reading and creating a CNNNetwork
-> Since the OpenVINO™ 2020.4 version, the Inference Engine enables reading ONNX models
via the Inference Engine Core API, and there is no need to use the low-level
ONNX Importer API directly anymore. To read ONNX models, it is recommended to use the
Core::ReadNetwork() method, which provides a uniform way to read models from the ONNX format (a minimal sketch follows below).
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
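A minimal sketch, assuming the classic Inference Engine C++ API; the model path "model.onnx" and the "CPU" device name are illustrative placeholders, not the EP's actual code:

```cpp
// Sketch only: load an ONNX model through InferenceEngine::Core::ReadNetwork()
// instead of the low-level ONNX Importer API (available since OpenVINO 2020.4).
#include <inference_engine.hpp>
#include <iostream>

int main() {
  InferenceEngine::Core core;

  // ReadNetwork() accepts an ONNX file path directly; "model.onnx" is a placeholder.
  InferenceEngine::CNNNetwork network = core.ReadNetwork("model.onnx");

  // Compile the network for a target device ("CPU" here) and create a request.
  InferenceEngine::ExecutableNetwork exec_net = core.LoadNetwork(network, "CPU");
  InferenceEngine::InferRequest request = exec_net.CreateInferRequest();

  std::cout << "Loaded network: " << network.getName() << std::endl;
  return 0;
}
```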
* Fixed ngraph f16 supported output type
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Added comments in data_ops.cc
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Fixed broken windows build
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Disable failing CPP tests on CPU
Some of the ConvTranspose tests are failing on
OpenVINO-EP CPU due to an accuracy mismatch w.r.t.
the default CPU, so we are currently disabling
these tests.
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Updated for ov version 2021.4
* Changes to include qdq ops in code
* Disabled failing Python tests on GPU
Disabled two MaxPool Python tests on
GPU as they were passing but then throwing a
segfault.
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Fix the backward compatibility issue
The ReadNetwork() API has a bug and will only work
starting from the OpenVINO 2021.4 version.
Previous versions will still have to use the
ONNX Importer route.
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Fix CMakeLists.txt for OpenVINO EP
If a directory with OpenVINO is sourced,
the latest OpenVINO settings have to
be imported.
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
Co-authored-by: sfatimar <sahar.fatima@intel.com>
Co-authored-by: sfatimar <64512376+sfatimar@users.noreply.github.com>
Co-authored-by: Aravind Gunda <aravindx.gunda@intel.com>
* Add ability to generate ios static framework
* Fix typos
* Add pod cache clean, update some comments of previous commit
* Fix CI failure with newly added cpuinfo library
* Update test model (CoreML requires node has a name)
* Addressed CR comments
The PyTorch cpuinfo library allows us to query the current CPU's features, micro-architecture, cache sizes, etc. This information is needed for targeted performance optimizations.
Unfortunately it does not work under Windows/ARM; we will need to develop our own solution later.
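A minimal sketch of the kind of queries cpuinfo exposes; the specific feature checks below are illustrative examples, not the exact queries ORT performs:

```cpp
// Sketch only: query core counts, CPU features, and cache sizes via pytorch/cpuinfo.
#include <cpuinfo.h>
#include <cstdio>

int main() {
  if (!cpuinfo_initialize()) {
    std::fprintf(stderr, "cpuinfo_initialize() failed\n");
    return 1;
  }

  std::printf("cores: %u, logical processors: %u\n",
              cpuinfo_get_cores_count(), cpuinfo_get_processors_count());

  // Feature checks are per-architecture; non-matching architectures return false.
  std::printf("x86 AVX2: %d, ARM NEON: %d\n",
              cpuinfo_has_x86_avx2(), cpuinfo_has_arm_neon());

  // Cache sizes can guide blocking and threading decisions.
  const struct cpuinfo_cache* l2 = cpuinfo_get_l2_cache(0);
  if (l2 != nullptr) {
    std::printf("L2 cache size: %u bytes\n", l2->size);
  }

  cpuinfo_deinitialize();
  return 0;
}
```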
* Add metadata_props to ORT model
* Minor update
* Update python binding, and increase the minimal pipeline size threshold
* Fixed a small bug in serializing ir_version
* Remove temp ort.py.fbs and add it to .gitignore
* first attempt to share the Docker image across Python and torch versions
* set dependency between jobs
* fix yaml grammar
* remove python version from first stage
* clean deepspeed directory
* split into two images according to torch version
* fix yaml syntax
* invalidate cache
* remove DS to prevent torch 1.9.0 upgrade
ORTModule requires two PyTorch CPP extensions that are currently JIT compiled. The runtime compilation can cause issues in environments that lack the full set of build requirements, or in environments with multiple instances of ORTModule running in parallel.
This PR creates a custom command to compile these extensions, which must be executed manually before ORTModule is used for the first time. When users try to use ORTModule before the extensions are compiled, an error with instructions is raised.
PyTorch CPP extensions for ORTModule can be compiled by running:
python -m onnxruntime.training.ortmodule.torch_cpp_extensions.install
A full build environment is needed for this.
1. Remove some unused code and simplify tools/ci_build/github/linux/run_dockerbuild.sh.
2. Enable NuGet CUDA tests. The original design was that we could leverage Directory.Build.props and let CMake generate the required properties (USE_CUDA/...) there. However, in the NuGet packaging pipeline we test the package on a different host that doesn't run the CMake command and doesn't have the auto-generated Directory.Build.props file.
* Revert for testing TensorRT 7.1
* change to original googletest version
* change machine
* remove build arg
* change back machine
* revert googletest version
* Make it ready to merge to master
* revert onnx-tensorrt to v7.1
* rename yml
* use [[ ]] in bash command
* add sudo
* add chmod
* add correct path
* change to another way to revert onnx-tensorrt
* change docker image to manylinux build