* Revert "Temporarily remove dnnl from Linux CI build to unblock the whole team (#4266)"
Previously it failed because it used too much memory.
Now we only run the dnnl EP with opset12 models in unit tests, to reduce peak memory usage.
* Enable onnxruntime_test_all for NNAPI EP
* switch to using ninja for Android CI
* make Android emulator boot faster in Android CI
* simplify adb push
* more style changes
* more tweaking on android ci
* build.py style update
* build e2e cppwinrt tests
* add use nuget task
* make all package version references prop/target-ified
* remove duplicate props/targets reference
* work around project.assets.json error by deleting it
* powershell test invocation
* switch to batch script
* print debug info
* update x86->x64
* stdio.h
* pushd/popd
* add csharp tests
* package.config -> packages.config
* typo
* x86 -> anycpu
* debug is default
* add test path
* update csproj as well
* debug
* really replace all package versions
* debug output
* really use [PackageVersion]
* sleep instead of converting async operation to task and waiting
* don't close software bitmap
* switch to powershell script
* remove binding check
* continue on failure
* continue on error action
* continueOnError and errorActionPreference
* tabbing
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
* Add protobuf mutator library as a git submodule
* Added files and instructions to build the protobuf mutator library in CMake
* Added fuzzing flag to build system and added fuzzing dependency library. To run the fuzzing test, use the flags --fuzz_testing --build_shared_lib --use_full_protobuf --cmake_generator 'Visual Studio 16 2019' (see the invocation sketch after this group of commits)
* Added src files and build instructions for the main fuzzing engine
* Removed Random number generation test from inside the engine
* Added license header to files
* Removed all pep8 violations introduced by this change and other E501 violations
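A minimal sketch of a full fuzzing build invocation with the flags listed above, driven from Python; the tools/ci_build/build.py path is an assumption about where build.py lives:

    # Hypothetical driver for the fuzzing build described above.
    # Assumption: build.py is at tools/ci_build/build.py and accepts the listed flags unchanged.
    import subprocess
    import sys

    cmd = [
        sys.executable, "tools/ci_build/build.py",
        "--fuzz_testing",                # flag introduced by this change
        "--build_shared_lib",
        "--use_full_protobuf",
        "--cmake_generator", "Visual Studio 16 2019",
    ]
    subprocess.run(cmd, check=True)      # raise if the build fails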
* Change NNAPI CI to run on new NNAPI EP
* update Android CI to macOS 10.15 and remove the cmake install
* update the Android CI to target Android API level 29
* remove unnecessary ndk install git submodule call
* Move nnapi dnnlib to subfolder
* dnnlib compile settings
* add nnapi builtin option in build.py
* add onnxruntime_USE_NNAPI_BUILTIN
* compile using onnxruntime_USE_NNAPI_BUILTIN
* remove dnnlib from built-in code
* Group onnxruntime_USE_NNAPI_BUILTIN sources
* add file stubs
* fix java 32-bit compile error
* built-in nnapi support 5-26
* init working version
* initializer support
* fix crash on free execution
* add dynamic input support
* bug fixes for dynamic input shape, add mul support, working on conv and batchnorm
* Add batchnormalization, add overflow check for int64 attributes
* add global average/max pool and reshape
* minor changes
* minor changes
* add skip relu and options to use different types of memory
* small bug fix for operator relu
* bug fix for nnapi
* add transpose support, minor bug fix
* Add transpose support
* minor bug fixes, depthwise conv weight fix
* fixed the bug where the onnx model input order did not match the nnapi model input order
* add helper to add scalar operand
* add separate opbuilders, each handling a single operator
* add cast operator
* fixed reshape, moved some logs to verbose
* Add softmax and identity support, change shaper calling signature, and add support for int32 output
* changed the way the NNAPI model is executed
* move NNMemory and InputOutputInfo into Model class
* add limited support for dynamic input shapes
* add gemm support, fixed crash when allocating big array on stack
* add abs/exp/floor/log/sigmoid/neg/sin/sqrt/tanh support
* better dynamic input shape support
* add more checks for IsOpSupportedImpl, refactored some code
* some code style fixes, switch to safeint
* Move opbuilders to a map with single instance, minor bug fixes
* add GetUniqueName for new temp tensors
* change from throwing std exceptions to ORT_THROW
* build settings change and 3rd party notice update
* add readme for nnapi_lib, move to ort log, add comments to public functions, clean the code
* add android log sink and more logging changes, add new string for NnApiErrorDescription
* add nnapi execution options/fp16 relax
* fix a dnnlibrary build break
* addressed review comments
* address review comments, changed adding output for subgraph in NnapiExecutionProvider::GetCapability, minor issue fixes
* formatting in build.py
* more formatting fix in build.py, return fail status instead of throw in compute_func
* moved android_log_sink to platform folder, minor coding style changes
* addressed review comments
1. Increase job timeout while we investigate why the tests take much longer.
2. Upgrade the Linux docker image to manylinux2010, at Tianlei's request. (We had an offline discussion with Pranav and Tracy.)
3. Remove the installation of "devtoolset-7" from the CUDA image. It was added for CUDA 10.0 and is not needed for CUDA 10.1, which we have now moved to.
* Add build option to disable traditional ML ops from the binary.
* Fix python tests by splitting tests for ML ops into a separate file. Exclude ML tests from onnx_test_runner and C# tests. Exclude ML op sources.
* Update Edge pkg pipelines with new MLops env variable and fix C# packaging pipeline tests to skip ML ops.
* expose ACL/ARMNN providers to python
* add -acl / -armnn to package name when use_acl / use_armnn is specified
* build python wheel for ARMNN EP
* link ACL/ARMNN EPs into onnxruntime_pybind11_state
* fix wrong argument order in build_python_wheel for wheel_name_suffix
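A rough illustration of what the ACL/ARMNN python exposure above enables, from a wheel built with --use_acl or --use_armnn; the provider name strings are assumptions, not confirmed by these commits:

    # Hypothetical smoke check after installing a wheel built with --use_acl or --use_armnn.
    # Assumption: the EPs register as 'ACLExecutionProvider' / 'ArmNNExecutionProvider'.
    import onnxruntime as ort

    providers = ort.get_available_providers()
    print(providers)
    assert any(p in providers for p in ("ACLExecutionProvider", "ArmNNExecutionProvider")), \
        "ARM EPs were not linked into onnxruntime_pybind11_state"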
* ORT on CUDA 11
1. Separate HOROVOD and MPI
2. Separate NCCL from HOROVOD in CMakeLists.txt
3. Remove dependency on external cub
4. cudnnSetRNNDescriptor is changed in cuDNN 8.0
* polish the MPI/NCCL code in CMakeLists.txt and build.py
* check CUDA version
* ${MPI_INCLUDE_DIRS} should be PUBLIC
* sm30, sm50 are deprecated in CUDA 11 Toolkit
* update change based on code review feedback.
* add sm_52
* improve MPI/NCCL build path
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
Modify gradle build so artifactID has _gpu for GPU builds.
Pass USE_CUDA flag on CUDA build
Adjust publishing pipelines to extract POM from the correct path.
Co-Authored-By: @Craigacp
Disable nuphar large model test because it takes too long (40+ minutes), while the default cpu provider takes about 5 minutes. After this change we still keep a lot of other nuphar model tests, which I think should be enough.
1. Enlarge the read buffer size further, so that our code can run even faster. TODO: apply similar changes to the python and other language bindings.
2. Add coreml_VGG16_ImageNet to the test exclusion set of x86_32. It is not a new model but previously we didn't run the test against x86_32.