The previous attempt to enable static analysis (#8842) didn't actually run the static analysis checks.
- Run clang-tidy directly.
- Address static analysis warnings.
* Expose symbols in onnx and protobuf namespaces in python when building with --enable_external_custom_op_schemas
* Add external onnx and protobuf files to wheel
* Added an example to demonstrate external custom ops use-case
* Added a Linux build pipeline to test external custom ops
* modify for test
* modify for test
* modify for test
* modify for test
* modify for test
* modify for test
* prepare for PR
* Rename cuda directory to gpu directory in tarball
* Fix gpu java package
* fix bug
* fix small bug
* Add onnxruntime_providers_shared.dll into gpu nuget package
* Modify for test
* Temporarily remove for test
* Modify for test
* Modify for test
* Test packging Windows combined GPU
* Test packging Windows combined GPU
* Test packging Windows combined GPU
* Test packging Windows combined GPU
* modify for test
* modify for test
* fix bug
* Modify for test
* Modify for test
* Modify for test
* Modify for test
* Modify for test
* Modify for test
* Modify for test
* Modify for test
* Prepare for PR
* Prepare for PR
* Code refactor
* Rename proper Artifact name
* Rename intermediate Artifact names
* Revert Artifact Names
* Rename Artifact Names
* Modify Artifact name
* Modify Artifact name
* Modify Artifact name
* Update Java package
* Update Java package
* fix bug to change artifact name
* Fix bug for the wrong file path
* Fix no fetching correct artifact and test
* temporarily modify for test
* undo the change for test
* additional changes
* test package run
* minor fix
* minor fix
* minor fix
* Get around no arm64 simulator
* fix objc pod build failure
* downgrade_eigen
* update objc podspec template
* fix build - python.h not found
* disable --build_shared_lib for ortmodule tests
* fix
* fix the build flag
* disable --build_shared_lib for training path (not only for ortmodule)
* fix missing test model files
* disable test CApiTest.test_custom_op_library when ENABLE_TRAINING_TORCH_INTEROP is ON
* enable custom_op_library build
* fix build
* fix
* merge master and fix build failure
* build onnx_test_runner when onnxruntime_ENABLE_TRAINING_TORCH_INTEROP is ON
* resolve comments
* use --enable_training_torch_interop to replace "onnxruntime_ENABLE_TRAINING_TORCH_INTEROP=ON"
* Merge CPU/GPU nuget pipeline
* Include TensorRT EP libraries into existing GPU nuget package pipeline
* modify to use correct YAML
* Modify for test
* modify for test
* Add depedance
* Add depedance (cont.)
* modify for test
* Add create TensorRT nuget package
* modify for test
* modify for test
* Merge CPU/GPU nuget pipeline
* Include TensorRT EP libraries into existing GPU nuget package pipeline
* modify to use correct YAML
* Modify for test
* modify for test
* Add depedance
* Add depedance (cont.)
* modify for test
* Add create TensorRT nuget package
* modify for test
* fix merge bug
* code refactor
* code refactor
* modify for test
* modify for test
* modify for test
* modify for test
* modify for test
* modify for test
* cleanup
* modify for test
* fix bug
* modify for test
* refactor
* fix bug and test
* Modify for test
* Modify for test
* Modify for test
* Modify for test
* Prepare for PR
* Prepare for PR
* code refacotr from review
* Remove naming 'Microsoft.ML.OnnxRuntime.TensorRT' to avoid confusion
* Add linux TensorRT libraries
* Remove redundant variable in YMAL
* revert file
* undo revert file
* Modify regular expression so that it can capture the correct file
* Remove newline at end of file
* small fix
* Revert to CUDA11.1 on Windows
* Add unit tests for nuget package on Linux
Co-authored-by: Changming Sun <chasun@microsoft.com>
* initial update from 11.1 to 11.4
* change 11.4.1 to 11.4.0
* adjusting to match nvidia/cuda image tags
* adjusting to match nvidia/cuda image tags centos7
* correction to 11.4.0
* correction to 11.4.0
* update to cuda 11.4
* change training back to 11.1
* change training back to 11.1
* point to correct nvcr.io/nvidia/cuda 11.4.1 image
* change centos8 to centos7
* correct cudnn path
* Update linux-gpu-ci-pipeline.yml for Azure Pipelines
* Update c-api-noopenmp-packaging-pipelines.yml
* need to resolve centos images but remove space and change to 11.4
* Update linux-gpu-ci-pipeline.yml
* add cudnn to docker image
* bump devtoolset to 10
* revert cuda 11.4 change to setup_env_trt
* orttraining back to 11.1
* use nvcr.io
* Fix previous change back to cuda 11.1
* update cudnn path
* use cudnn image (revert if failure)
Add IsSparseTensor
Add CreateSparseTensor
Add utilities and test fully sparse instantiation
Fully sparse blocksparse
Add test and docs for fully sparse tensor instantiation
Rework creation API
Use API
Non string API
Retrofit of existing String API
Add tests
Add documentation
Address build issues (Winml pending)
Add inference test
Bump binary size
Add ifdef DISABLE CONTRIB
Merge CPU/GPU nuget pipeline. The old GPU nuget pipeline will be only for DML.
TODO: the result GPU package contains PDB files for some of the DLLs, but not all. It is due to the refactoring of CUDA EP to pluggable DLLs. At that time we forgot to copy the PDB files. However, I can't add them in now. Because currently the package is already 220MB large. If the missed PDB files were added, then it will be oversize. nuget.org doesn't accept >250MB packages.
This change adds a new pipeline for checking Python code. Currently this pipeline only runs flake8.
flake8 is also run as part of the CMake project builds, but we can switch over completely to the new pipeline later.
The .flake8 config file was also updated to make it easier to run standalone (flake8 --config ./.flake8) and some Python formatting issues were addressed in files that were not previously scanned.
* updates for picking pnnx commit
* add tests filter to c# tests
* plus test fixes
* fix versioning for contrib ops
* fix tests
* test filter for optional ops
* more versioning related updates
* fix test
* fix layernorm spec
* more updates
* update docs
* add more test filters
* more filters
* update binary size threshold
* update docs
* plus more fixes
* updates per review
* update to release commit
* add filters for optional type tests
* plus updates
* update onnx-tensorrt parser to master
* disable unsupported tests
* add cuda sm 75 for T4
* update tensorrt pipeline
* update trt pipelines
* update trt pipelines
* Update linux-gpu-tensorrt-ci-pipeline.yml
* update trt cid pipeline
* Update linux-gpu-tensorrt-ci-pipeline.yml
* Update Tensorrt Windows build pool and TensorRT/CUDA/CuDNN version
* update to cuda11.4 in trt ci pipeline
* update base image to cuda11.4
* update packaging pipeline to cuda11.4
* clean up
* remove cuda11.1 and cuda11.3 docker file
* disable unsupported tensorrt tests at runtime
* Update linux-multi-gpu-tensorrt-ci-pipeline.yml
1. Update SDLNativeRules from v2 to v3. The new one allows us setting excluded paths.
2. Update TSAUpload from v1 to v2. And add a config file ".gdn/.gdntsa" for it.
3. Fix some parentheses warnings
4. Update cmake to the latest.
5. Remove "--x86" build option from pipeline yaml files. Now we can auto-detect cpu architecture from python. So we don't need to ask user to specify it.
SparseTensor support
Implement Builder pattern
Fix support for 1-D and 2-D COO indices
Implement and test CSR support.
Handle shape inference for SparseTensors
Implement conversion for COO, CSR and tests.
Address the case where constant sparse initializer is the output.
Implement test infra for SparseTensors
Implement SparseDenseMatMul for Csr and COO and tested it.
Add hash for SparseToDenseMatMul
Finish shared provider refactor
Refactor GetOrCreate to Create
Working on py interface
Expose OrtDevice and use it in allocate_numpy
Adjust Sparse interfaces, add support for string SparseTensor. Add tests.
Add and test to_cuda()
Add accessors to format specific indices
Test values and indices views, read-only flag, after GC access
Add sparse related methods to OrtValue
Re-work SparseTensor wrapper, add OrtValue methods
Rework numpy_array_to_cuda/to_cpu
Add run_with_ort_values
Add models and test sparse_mat_mul with run_with_ort_values
Refactor sparse tensor to use a single buffer
Ifdef x86 Eigen CSR sparse matmul implementation
Exclude broken test, check for string type when copying cross device
Split pybind schema, regenerate docs, add exclusion
Conditionally exclude schema module
Update docs fix cuda build
Add test to a filter and renerate JS docs
Add conversion and test string support for sparse tensors
Exclude conversion utils from minimal build
Add CUDA Memcpy and adjust provider interfaces
* Changes to ensure the openvino-ep-2021.4 branch is created
* Fix failing cpp and python unit tests
* Fixed Myriad Tests for Ov_2021.4
* Disabled failing python tests for myriad
* Fixes models which were breaking w.r.t 2021.4
* Added fixes to Fix tinyyolov3 working on Myriad
and MaskRcnn, FasterRcnn using GPU_FP32
* Added FP16 output data type support for ngraph
* Implemented ReadNetwork() method
->Using Core::ReadNetwork() method for reading and creating a CNNNework
->Since OpenVINO™ 2020.4 version, Inference Engine enables reading ONNX models
via the Inference Engine Core API and there is no need to use directly the low-level
ONNX* Importer API anymore. To read ONNX* models, it's recommended to use the
Core::ReadNetwork() method that provide a uniform way to read models from ONNX format.
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Fixed ngraph f16 supported output type
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Added comments in data_ops.cc
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Fixed broken windows build
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Disable failing CPP tests on CPU
Some of the convtranspose tests are failing on
OpenVINO-EP CPU due to accuracy mismatch w.r.t
default CPU. so currently we are disbaling
these tests.
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Updated for ov version 2021.4
* Changes to include qdq ops in code
* Disabled failing python tests on GPU
Disabled two maxpool python tests on
GPU as they were passing but throwing
segfault
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Fix the backward compatibility issue
ReadNetwork() API has a bug and will only work
starting from OpenVINO 2021.4 version.
The previous versions will still have to use
onnx importer route
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Fix CMakeLists.txt for OpenVINO EP
If a directory with OpenVINO is sourced,
the latest OpenVINO settings have to
be imported.
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
Co-authored-by: sfatimar <sahar.fatima@intel/com>
Co-authored-by: sfatimar <64512376+sfatimar@users.noreply.github.com>
Co-authored-by: Aravind Gunda <aravindx.gunda@intel.com>
* Add ability to generate ios static framework
* Fix typos
* Add pod cache clean, update some comments of previous commit
* Fix CI failure with newly added cpuinfo library
* Update test model (CoreML requires node has a name)
* Addressed CR comments
Updates to the iOS packaging pipeline:
- Make it harder to overwrite package archives accidentally when uploading (fails if the archive already exists)
- Only upload package archives for release builds
- Some clean up
* Add memory check for TRT perf
* Revise test app
* Add memory check for TRT perf
* Revise test app
* add test cases
* Modify script and add pipeline YAML
* remove redundant code
* temporarily change
* Change YAML
* revise test app
* fix minor bug
* code refactor
* small fix
* temporarily change for test
* prepare result log
* rm container when it exits
* code refactor
Pytorch cpuinfo library allows us to query current cpu features, micro-architecture and cache size, etc. These information is needed for targeted performance optimizations.
Unfortunately it does not work under Windows/ARM. We need to develop our own later
* Add metadata_props to ORT model
* Minor update
* Update python binding, and increase the minimal pipeline size threshold
* Fixed a small bug in serializing ir_version
* Remove temp ort.py.fbs and add it to .gitignore
* first attempt share docker image across python and torch versons
* set dependency between jobs
* fix yaml grammer
* remove python version from first stage
* clean deepspeed directroy
* split into two images according torch version
* fix yaml syntax
* invalidate cache
* remove DS to prevent torch 1.9.0 upgrade
ORTModule requires two PyTorch CPP extensions that are currently JIT compiled. The runtime compilation can cause issues in some environments without all build requirements or in environments with multiple instances of ORTModule running in parallel
This PR creates a custom command to compile such extensions that must be manually executed before ORTModule is executed for the first time. When users try to use ORTModule before the extensions are compiled, an error with instructions are raised
PyTorch CPP Extensions for ORTModule can be compiled by running:
python -m onnxruntime.training.ortmodule.torch_cpp_extensions.install
Full build environment is needed for this
Switched the code to C++17. To build ONNX Runtime on old distros like CentOS 7, you need to install a newer GCC from additionary repos. If you build onnxruntime with the newer GCC, typically the result binary can't be distributed to other places because it depends on the new GCC's runtime libraries, something that the stock OS doesn't have. But on RHEL/CentOS, it can be better. We use Red Hat devtoolset 8/9/10 with CentOS7 building our code. The new library features(like std::filesystem) that not exists in the old C++ runtime will be statically linked into the applications with some restrictions:
1. GCC has dual ABI, but we can only use the old one. It means std::string is still copy-on-write and std::list::size() is still O(n). Also, if you build onnxruntime on CentOS 7 and link it with some binaries that were built on CentOS 8 or Ubuntu with the new ABI and export C++ symbols directly(instead of using a C API), the it won't work.
2. We still can't use std::optional. It is a limitation coming from macOS. We will solve it when we got macOS 11 build machines. It won't be too long.
3. Please avoid to use C++17 in CUDA files(*.cu). Also, the *.h files that they include(like core/framework/float16.h). This is Because CUDA 10.2 doesn't support C++17. You are welcome to use the new features in any *.cc files.
This is an update to https://github.com/microsoft/onnxruntime/pull/8079
The sample application motivating the original update changed to use an updated version of the model. Now, fewer ops are required. This change removes the previously added ops which are no longer needed.
1. Remove some unused code and simplify tools/ci_build/github/linux/run_dockerbuild.sh.
2. Enable Nuget CUDA tests. The original design was we could leverage Directory.Build.props and let cmake generate the required properties(USE_CUDA/...) there. However, in nuget packaging pipeline we test the package on a different host that doesn't run cmake command and doesn't have the auto-generated Directory.Build.props file.
* Revert for testing TensorRT 7.1
* change to origianl googletest version
* change machine
* remove build arg
* change back machine
* revert back googletest version
* Make it ready to merge to master
* revert onnx-tensorrt to v7.1
* rename yml
* use [[ ]] in bash command
* add sudo
* add chmod
* add correct path
* change another way to revert onnx-tensorrt
* change docker image to manylinux build
* clean up builds for interop_torch
* add python dependency for executables
* disable onnxruntime_ENABLE_TRAINING_TORCH_INTEROP by default; enable it in ortmodule GPU training pipeline only
* disable training unrelated tests when torch interop is enabled
* simplify the python dependency.
* clean up and fix
- Allow anyone to kick off a perf test here. Customize: branch, eps, model selection, cuda version.
- Only run shape inference when required.
- Kill errored out memory processes.
- Remove warmup run.
- Clean up script.
- Standalone_TRT is it's own "EP" vs as an additional run with TRT EP