* Upgrade onehot to OpSet 11
* Move Onehot test out of blacklist
* Add support for negative indices in addition to negative axis.
* PR comments - 1
* PR comments - 2
* work on slice elimination for opset 10
* more work on slice elimination
* first working version
* adding python notebook for building models; fixing test
* fixing build error in macOS
* move some optimizers to level1
* move matmul add fusion to level 1
* bug fix in the test code
* fix make_uniques + add test exceptions
* add exception for tests in C# too
* Initial draft
* updates per review
* fix link
* plus one more link fix
* small changes to the optimizer documentation
* some more changes
* done
* update C_API with doc link
* script for converting ONNX model for BERT performance optimization
* Remove code that is no longer needed.
* refine the script
* Support BERT model exported from PyTorch 1.3
Keep opset version
Exact match in Attention and Layer Normalization fusions.
* read batch_size from input model directly
* Refine optimizers
* Address PR comments
* Changes from PR comments and discussion.
* Fixed signed/unsigned mismatch
* Address PR comments
* Address PR comments
* Fix linux build
* Fix issue with mkldnn logic.
* Turn off optimizers by default for operator unit tests.
* Handle the edge case of a graph with no nodes in the partitioner so that individual execution providers don't need to.
* Comment out change to turn off optimizers for unit tests. Add details on what needs to be done to re-enable.
This change adds a new execution provider powered by [DirectML](https://aka.ms/DirectML).
DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning on Windows. DirectML provides GPU acceleration for common machine learning tasks across a broad range of supported hardware and drivers.
The DirectML execution provider is capable of greatly improving evaluation time of models using commodity GPU hardware, without sacrificing broad hardware support or requiring vendor-specific extensions to be installed.
**Note** that the DML EP code was moved verbatim from the existing WindowsAI project, which is why it doesn't yet conform to the onnxruntime coding style. This is something that can be fixed later; we would like to keep formatting/whitespace changes to a minimum for the time being to make it easier to port fixes from WindowsAI to ORT during this transition.
Summary of changes:
* Initial commit of DML EP files under onnxruntime/core/providers/dml
* Add CMake entries for building the DML EP and for pulling down the DirectML redist using NuGet
* Add a submodule dependency on the Windows Implementation Library (WIL)
* Add docs under docs/execution_providers/DirectML-ExecutionProvider.md
* Add support for DML EP to provider tests and perf tests
* Add support for DML EP to fns_candy_style_transfer sample
* Add entries to the C ABI for instantiating the DML EP
* use dilations for computing effective kernel shape for conv/pool ops
* when auto_pad is 'VALID', total_pads should be empty
* added support for ArrayFeatureExtractor and ZipMap
* check out_shape only if the output has a shape, i.e. the output is of TensorType or SparseTensorType
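
To illustrate the two conv/pool items above, here is a minimal sketch of how dilations change the effective kernel shape and what the output shape is when auto_pad is 'VALID'. The function names are hypothetical and are not taken from the changed code; the formulas are the standard ones for dilated convolution and pooling with no padding.

```python
from typing import List, Sequence


def effective_kernel_shape(kernel_shape: Sequence[int],
                           dilations: Sequence[int]) -> List[int]:
    """Spatial extent covered by a dilated kernel (hypothetical helper).

    A kernel of size k with dilation d spans (k - 1) * d + 1 input elements.
    """
    return [(k - 1) * d + 1 for k, d in zip(kernel_shape, dilations)]


def valid_output_shape(input_spatial: Sequence[int],
                       kernel_shape: Sequence[int],
                       strides: Sequence[int],
                       dilations: Sequence[int]) -> List[int]:
    """Output spatial shape for auto_pad == 'VALID' (hypothetical helper).

    'VALID' means no padding is applied, so no pad values enter the formula.
    """
    eff = effective_kernel_shape(kernel_shape, dilations)
    return [(i - e) // s + 1 for i, e, s in zip(input_spatial, eff, strides)]


if __name__ == "__main__":
    # A 3x3 kernel with dilation 2 behaves like a 5x5 kernel for shape purposes.
    print(effective_kernel_shape([3, 3], [2, 2]))
    print(valid_output_shape([7, 7], [3, 3], [1, 1], [2, 2]))
```

Using the undilated kernel shape here would overestimate the output size whenever a dilation is greater than 1, which is presumably the kind of mismatch the first item addresses.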