Make kernels non-template. Add input constraint for learnt data.
Fixup tests.
Add two more featurizers along with tests. Tests fail.
min_max_scalar_transformer
robust_scalar_transformer
Fix tests serialized stream by prepending version bytes.
Add inputation_marker_transfomer and the test.
Fix up float/double type designations.
Added label_encoder_transformer along with a test.
string_throw case is broken at the momement.
Fix labelencodertransfomer_test.cc string_throw case
Rename maxabsscalertransformer_test.cc
Add MissingDummiesTransformer along with the test.
Update manifest.
Add TimeSeriesImputerTransformer definition, implementation and tests
Update featurizers. Fix up constraint issue.
Pass static VCRT library option down to Featurizers CMAKE.
Make build Featurizers OFF by default.
Rename registration call.
Advance commit to 4df80d5865a9d4e97f6d0b9304d4316115a04d9e
Add generated code for the commit before editing.
Import more featurizers.
Rename Automl ops domain to mlfeaturizers.
Rename conditional compilation macro.
Move and rename files getting rid of automl
Rename --use_automl build switch to --use_featurizers
Rename CMake option accordingly. Rename automl CMake targets.
Adjust CI and packaging pipeline switches.
Rename namespace automl to featurizers.
* [NupharEP] Enable parallel schedule
* Update TVM with the fix to TVM threadpool to use OpenMP if possible
* Add parallel schedule when trying to vectorize
With this change, BERT squad perf on a 4-core (8 HT) CPU goes from 187ms to 150ms
* Address CR, docs and cmake update
* Doc fix
* Fix mkl
* Fix TVM windows build when using mklml
* Guard unused parameter
Guard unused parameter for Linux Arm and other cases.
* Add ACL (Arm Compute Library) execution provider
Add a new execution provider targeting Arm architecture based on Arm Compute Library.
Validated on NXP i.MX8QM CPU with ResNet50, MobileNetv2 and VGG models.
All unit tests are passing.
Comparative performance improvements for ResNet50v1 model obtained with
onnxruntime_perf_test:
A72 2xA72 A53 4xA53
ACL vs CPU 16% 9% 21% 13%
Usage documentation available in ACL-ExecutionProvider.
* Fix eigen unused parameter
Fix eigen unused parameter error for Arm cross-compilation.
* Refine optimizers
* Address PR comments
* Changes from PR comments and discussion.
* Fixed signed/unsigned mismatch
* Address PR comments
* Address PR comments
* Fix linux build
* Fix issue with mkldnn logic.
* Turn off optimizers by default for operator unit tests.
* Handle edge case of graph with no nodes in partitioner so all execution providers don't need to.
* Comment out change to turn off optimizers for unit tests. Add details on what needs to be done to re-enable.
This change adds a new execution provider powered by [DirectML](https://aka.ms/DirectML).
DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning on Windows. DirectML provides GPU acceleration for common machine learning tasks across a broad range of supported hardware and drivers.
The DirectML execution provider is capable of greatly improving evaluation time of models using commodity GPU hardware, without sacrificing broad hardware support or requiring vendor-specific extensions to be installed.
**Note** that the DML EP code was moved verbatim from the existing WindowsAI project, which is why it doesn't yet conform to the onnxruntime coding style. This is something that can be fixed later; we would like to keep formatting/whitespace changes to a minimum for the time being to make it easier to port fixes from WindowsAI to ORT during this transition.
Summary of changes:
* Initial commit of DML EP files under onnxruntime/core/providers/dml
* Add cmake entries for building the DML EP and for pulling down the DirectML redist using nuget
* Add a submodule dependency on the Windows Implementation Library (WIL)
* Add docs under docs/execution_providers/DirectML-ExecutionProvider.md
* Add support for DML EP to provider tests and perf tests
* Add support for DML EP to fns_candy_style_transfer sample
* Add entries to the C ABI for instantiating the DML EP
* Adjust ngraph cmake files to onnx 1.5.0
* Enable LSTM reverse direction mode in nGraph EP
* Enable full support for the Split op in nGraph EP
* Revert "Disable the unsigned input Shrink op tests for nGraph until the next update"
This reverts commit 257b42a55bdd98f804d4846868542b8e3aeb4b4e.
* Enable Gather and remove unused subgraph attribute
* Remove the unused param from AppendClusterToSubGraph
* Fix for the incorrect onnx opset version
* Use the r0.26 release branch before the tag is created
* Enable the quantizelinear and dequantizelinear for NGEP
* Use the v0.26.0-rc.2 tag in ngraph.cmake
* Add skip for modes others than default in Pad operator
* Reenable negative axis tests for ngraph
* Use temporary ngraph version
* Use branch name instead of SHA for temporary ngraph branch
* Use ngraph v0.26.0-rc.4
* Remove patch for missing symbol in MKLDNN
* Use MKLDNN 1.0 in ngraph
* Exclude the Pad op for opsets greater than 10
* Disable quantizelinear and dequantizelinear tests for ONNX 1.5.0
* Fix the onnx-headers related compilation errors
* ONNX libs linking fix
* Use a tag for ngraph and support more Pad modes
* Use the v0.26.0 release tag for nGraph
* Update ngraph to RC8 - bigobj flag for Windows builds
* Fix the MKLDNN constexpr error on Windows
Remove gsl subodule and replace with a local copy of gsl-lite
Refactor for onnxruntime::make_unique
gsl::span size and index are now size_t
Remove lambda auto argument type detection.
Remove constexpr from fail_fast in gsl due to Linux not being happy.
Comment out std::stream support due to MacOS std lib broken.
Move make_unique into include/core/common so it is accessible for server builds.
Relax requirements for onnxruntime/test/providers/cpu/ml/write_scores_test.cc
due to x86 build.
Add ONNXRUNTIME_ROOT to Server Lib includes so gsl is recognized
* Fix symbolic shape inference for faster_rcnn, mask_rcnn, yolov3
Force merge when --auto_merge, on symbolic dims which sympy cannot simplify
Add symbolic inference for Resize opset 10
Add support for step != 1 in Slice
Add support for computed dim in TopK
Bug fixes in passing symbolic dims from subgraph
Fix an outdate comment in Nuphar provider header
* add mimalloc submodule
* basic hooks into execution provider header and build script option
* pull mimalloc into build
* windows has to use the override vcxproj already set up, and disable bfcarena when using mimalloc
* fix import_location
* generalize build msbuild command
* add mimalloc dependency to python package as well as various commenting cleanups
* update mimalloc commit as stop gap
* include mimalloc changes from master
* create capi directory if doesn't exist for mimalloc copying over
* disable runtime hooks and remove old comment
* temporary change to test CI
* fetch the mimalloc output name property
* uniformly call target_link_libraries
* query cmake to get the correct windows sdk to target
* revert change to trailing directory slash
* pickup windows sdk off msbuild path if possible
* copy the produced dll/so at install time, not configure time
* deal with mimalloc unimplemented atomic
* move to dev branch of mimalloc to avoid atomic issues on gcc
* for windows specify solution settings (x86) rather than individual project settings
* pin mimalloc submodule to updated commit
* typo
* Revert "temporary change to test CI"
This reverts commit 764867376936a5d307dded3cc37f00a34e3b0c96.
* Bump onnx to latest
Update onnx.in.proto with changes for SparseTensor.
* add temp skip tests
* remove passed tests from skip list
* skip more tests for new ops in opset 11
* skip crashing tests
* update handling of new attribute types sparse tensor and sparse tensors
* advance onnx commit and remove skip cpu_flaky_tests
* temporarily skip yolo3 model test due to resize opset10 shape inference regression
* update proto for onnxruntime server
* advance onnx commit further
* Implement Nuphar execution provider
Nuphar execution provider is a TVM-based compilation provider. It has shown great speedups for RNN models using Scan.
This PR is mainly for a preview of the shared codegen library for other TVM-based providers.
* Fix submodules
* Fix TVM submodule
* Update Nuphar to latest and resolve confliction
* Remove stale files caused by merge -X theirs
* Revert heap buffer change to not introduce onnxruntime_framework into onnxruntime_perf_test
* Fix bad merge
* Merge from Nuphar
* Fix warning treated as error, revert some unnecessary changes
* Revert some more test changes
* Some more test revert or comments to make review easier
New tests could be added later
* One more revert of unnecessary changes
* More change revert. Test could be added back later.
* update MKLML which has bugfix for thread hang. move PATCH_COMMAND outside BUILD_FOR_NATIVE_MACHINE check.
* MKLML_VERSION 2020.0.20190813 is for windows only.
* Update nGraph to 0.21 and adjust the EP
* Share the graph initializers between custom ops
* Update nGraph to 0.22 and exclude Gather entirely
* Enable building on Windows with nGraph v0.21.1-rc.0
* Disable the unsigned input Shrink op tests for nGraph until the next update
* Line-shortening code refactor
* Fix for the master branch merge artifact
* MKLDNN patches adjustment for Windows
* Exclude MatMulInteger for non-const zero points
* Exclude ConvInteger for non-const zero points
* Enable full Cast op support
* Use the v0.22.1 tag
* Skip ConvTranspose_InvalidKernelShape test for ngraph provider
* Create sub-graph ModelProto from fused_node
* remove memory copy between CUDA and TRT
* add info to RegisterExecutionProvider input
* use new IDeviceAllocator for trt allocator
* remove SetDefaultInputsMemoryType from TRT EP
* remove onnx-tensorrt 5.0
* add submodule onnx-tensorrt branch 5.1
* remove redundancy
* Update transformer_memcpy.cc
* Update tensorrt_execution_provider.cc
* switch to TensorRT 5.1.5.0
* update python binding
* disable failed test case on TensorRT
* Update activation_op_test.cc
* upgrade to TensorRT container 19.06
* update according to feedback
* add comments
* remove tensorrt allocator and use cuda(gpu) allocator
* update onnx-tensorrt submodule
* change ci build cuda directory name
* Update DNNLibrary
* Allow fp16 by default
* Add nnapi build in ci
* Fix nnapi ep after #1268
* Remove unused variables
* Support nnapi in onnx_test_runner
* Update DNNLibrary to fix tests
* Update build.py for android build support, solve conflict of
tools/ci_build/build.py
* Support non-ARM Android build, solve conflict of tools/ci_build/build.py
* Enable android test by x86_64 android emulator
* Add dnnlibrary/NNAPI support in build.py
* suppress the verbose adb output
* Remove debug logs
* Install cmake by pip
* Fix undefined host_protoc_path
* cmake==3.13.2 in pypi is actually 3.12.2, so install 3.13.2.post1 instead
* Fix Android ARM64 build
* Use android ndk r20 instead of r19c, fix conflicts in install_deps_android.sh
* sync onnx to get equal op with float support
* doc update
* fix test failure because of updated shape inference logic for roialign.
* filter consum test cases since it's not implemented yet.