This is a big PR. We are going to move it up to layer_dev, which is still an L3, so it is still safe to work there in an agile way.
We are moving this into the L3 so that Ryan can start integration testing.
We will pause for a full code review and integration test results before going into the L2.
>>>> raw comments from previous commits >>>
* LearningModelSession is cleaned up to use the adapter, and parts of binding are.
* moved everything in the winmladapter
made it all nano-COM, using WRL to construct objects on the ORT side.
base interfaces for everything for winml to call
cleaned up a bunch of winml to use the base interfaces.
* more pieces
* GetData across the abi.
* renamed some namespaces
cleaned up OrtValue
cleaned up Tensor
cleaned up custom ops.
everything *but* learningmodel should be clean
* make sure it's building. winml.dll is still a monolith.
add onecoreuap_apiset.lib in order to avoid linking against kernel32.lib etc and violating our OS layering requirements.
We linked against onecoreuap_apiset.lib in VB, so we will continue doing this, but I am still unsure why we do not link against onecore instead, since that is where we ship. However, since Sheil is the owner of this code, we will wait to discuss it with him before changing anything.
thread_local/global/static destruction order depends on implementation details of the compiler and OS. The bug happens when a thread_local is already out of scope while the static EP is being destructed, causing an access violation in the EP's destructor when it accesses the thread_local.
The fix is to maintain ownership inside EP with a mapping from tid to ThreadLocalContext, to avoid accessing thread_local in EP's destructor. This way, no matter what the destruction order is, no access violation would be triggered.
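The ownership pattern described above can be sketched as follows. This is only an illustration in Python, not the EP's actual C++ code: `ExecutionProviderSketch`, `context_for_current_thread`, `shutdown`, and the fields of `ThreadLocalContext` are hypothetical names, and `shutdown` merely stands in for the EP destructor.

```python
import threading


class ThreadLocalContext:
    """Hypothetical per-thread state; the real type's contents are not shown in this PR."""

    def __init__(self, tid):
        self.tid = tid
        self.released = False

    def release(self):
        self.released = True


class ExecutionProviderSketch:
    """Illustrative stand-in for the EP: it owns every per-thread context itself."""

    def __init__(self):
        self._lock = threading.Lock()
        self._contexts = {}  # tid -> ThreadLocalContext, owned by the provider

    def context_for_current_thread(self):
        # Look up (or lazily create) the context keyed by the calling thread's id.
        tid = threading.get_ident()
        with self._lock:
            ctx = self._contexts.get(tid)
            if ctx is None:
                ctx = ThreadLocalContext(tid)
                self._contexts[tid] = ctx
            return ctx

    def shutdown(self):
        # Plays the role of the EP destructor: it walks only the provider-owned
        # map and never touches thread-local storage that may already be gone.
        with self._lock:
            contexts = list(self._contexts.values())
            self._contexts.clear()
        for ctx in contexts:
            ctx.release()
        return len(contexts)
```

Because cleanup only iterates the map the provider owns, it does not matter whether any thread-local storage has already been torn down by the time it runs.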
* Add logic to try and flatten inner dimensions being copied by Slice and do a block copy if they can be.
Do a block copy for just the inner most dimension where possible (applies even if we don't flatten inner dimensions).
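A minimal sketch of the idea, assuming a flat row-major buffer and a step of 1 on every axis. The function name `slice_block_copy` and its signature are hypothetical and do not reflect the actual Slice kernel.

```python
from itertools import product


def slice_block_copy(data, shape, starts, ends):
    """Slice a flat row-major list (step 1 on all axes) by copying contiguous blocks."""
    rank = len(shape)
    # Row-major strides, measured in elements.
    strides = [1] * rank
    for axis in range(rank - 2, -1, -1):
        strides[axis] = strides[axis + 1] * shape[axis + 1]

    # Walk from the innermost axis outwards while each axis is copied in full;
    # those axes can be flattened into one contiguous run.
    block_axis = rank - 1
    while (block_axis > 0 and starts[block_axis] == 0
           and ends[block_axis] == shape[block_axis]):
        block_axis -= 1

    # One block spans the sliced range of block_axis plus all flattened inner axes.
    # If nothing could be flattened, this is just the innermost dimension.
    block_len = (ends[block_axis] - starts[block_axis]) * strides[block_axis]

    out = []
    outer_ranges = [range(starts[a], ends[a]) for a in range(block_axis)]
    for outer_index in product(*outer_ranges):
        offset = sum(i * strides[a] for a, i in enumerate(outer_index))
        offset += starts[block_axis] * strides[block_axis]
        out.extend(data[offset:offset + block_len])  # single block copy
    return out
```

For a `[2, 3, 4]` input sliced only on axis 1, the innermost axis is flattened into the block, so each outer index produces one copy instead of one copy per row.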
- Improves symbolic shape inference in the following ways:
1. Extend suggested merge to map to literals with --auto_merge. For example, MatMul of ['ax1', 'ax2'] x [128, 256] would now map 'ax2' to 128
2. Add --int_max option to simplify computations like Min(100000, 'dim') to be 'dim'. This helps ops like Slice to generate correct shape, i.e. start=0, end=Min(100000, dim - 2) on dim. It was previously treated as equal, since sympy cannot determine Min(100000, dim - 2) < dim.
- Fix a bug in the create_shared script on Windows where the AOT dll was not generated because linking failed when there were too many obj files
- Fix a bug for Split since TOPI does not support split on symbolic dimension.
- Some build warning fixes for NupharEP.
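Items 1 and 2 of the symbolic shape inference changes above can be illustrated with a small sketch. It uses plain Python values (strings for symbolic dims, ints for literals) rather than sympy, and `merge_matmul_dims` and `simplify_min` are hypothetical helpers, not functions from the actual script.

```python
def merge_matmul_dims(lhs_shape, rhs_shape, suggested_merge):
    """Record a merge for the contracted dims of a 2-D MatMul; return the output shape."""
    k_lhs, k_rhs = lhs_shape[-1], rhs_shape[0]
    if k_lhs != k_rhs:
        if isinstance(k_lhs, str) and isinstance(k_rhs, int):
            suggested_merge[k_lhs] = k_rhs  # symbolic dim maps to a literal
        elif isinstance(k_rhs, str) and isinstance(k_lhs, int):
            suggested_merge[k_rhs] = k_lhs  # symbolic dim maps to a literal
        elif isinstance(k_lhs, str) and isinstance(k_rhs, str):
            suggested_merge[k_rhs] = k_lhs  # two symbols: merge one into the other
        else:
            raise ValueError(f"incompatible literal dims: {k_lhs} vs {k_rhs}")
    return [lhs_shape[0], rhs_shape[1]]


def simplify_min(literal, dim_expr, int_max):
    """Treat a literal at or above int_max as unbounded, so Min(literal, dim) is dim."""
    if literal >= int_max:
        return dim_expr
    return ("Min", literal, dim_expr)  # left unresolved in this sketch


merges = {}
print(merge_matmul_dims(["ax1", "ax2"], [128, 256], merges), merges)
print(simplify_min(100000, "dim - 2", int_max=100000))
```

With the MatMul example from item 1, the merge table ends up mapping 'ax2' to 128, and with the Slice example from item 2 the end expression reduces to `dim - 2`.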
* dump cuda tensor
* move data_type definition
* Dump cuda tensors for cuda build only.
Output tensor location (if it is not in CPU or pinned)
* update for cuda build
* Update for code review feedback
* update for CR feedback
* use data transfer manager for tensor copy
* Guard unused parameter
Guard unused parameter for Linux Arm and other cases.
* Add ACL (Arm Compute Library) execution provider
Add a new execution provider targeting Arm architecture based on Arm Compute Library.
Validated on NXP i.MX8QM CPU with ResNet50, MobileNetv2 and VGG models.
All unit tests are passing.
Comparative performance improvements for ResNet50v1 model obtained with
onnxruntime_perf_test:
             A72    2xA72   A53    4xA53
ACL vs CPU   16%    9%      21%    13%
Usage documentation available in ACL-ExecutionProvider.
* Fix eigen unused parameter
Fix eigen unused parameter error for Arm cross-compilation.
* implement cuda topk
* implement heap
* add type support
* refactor interface
* add support for sorting by index
* add test case
* use cub device radix sort
* register for opset 9 and 10
* add opset 9/10 declaration
* refactor code
* refactor code
* fix comment
* fix comment
* switch to scratched mem
* [Nuphar EP] performance improvements
1. Add new ops: Shape, Expand
2. Add support for steps in Slice
3. Simplify Gather
4. Always inline alias nodes
5. Transpose nodes whose inner loop is symbolic fall back to the CPU provider when vectorization is not possible
6. Add opt_inproj option to model_editor to extract MatMuls inside Scan for input projection to outside
* Add node and op type info to error message if there's a type or shape inferencing exception thrown by the ONNX checker.
* Fix line break from auto format
* Remove unused param from unit test code.