* Update Versioning.md
Update documentation to cover the latest Windows 10 release (Vb) and the NuGet packages.
* PR feedback.
* readability changes
* spell out Windows ML Availability
* Add ability to retrieve inferred shapes when executing a kernel.
This lets Recv determine its output shapes without doing
actual communication. If the output shapes cannot be
inferred, Recv still has to communicate with Send to get
them.
* Avoid communicating shape information when it can be inferred statically
* Replace unordered_map with thread-safe wrapper.
We don't want race conditions or undefined behavior
when using the parallel executor.
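
A minimal sketch of the idea, assuming a plain lock-guarded dictionary; `ThreadSafeMap` and `count_in_parallel` are hypothetical names for illustration and are not the wrapper from this change (which is C++ and was later removed again):

```python
import threading


class ThreadSafeMap:
    """Hypothetical dict wrapper: every access holds the same lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)

    def set(self, key, value):
        with self._lock:
            self._data[key] = value

    def update(self, key, fn, default=None):
        # Read-modify-write happens under one lock acquisition,
        # so concurrent updates cannot overwrite each other.
        with self._lock:
            value = fn(self._data.get(key, default))
            self._data[key] = value
            return value


def count_in_parallel(num_threads=8, increments=1000):
    """Increment one shared counter from several threads."""
    counts = ThreadSafeMap()

    def worker():
        for _ in range(increments):
            counts.update("hits", lambda v: v + 1, default=0)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts.get("hits")
```

Separate `get` and `set` calls would still race with each other, which is why the sketch offers a combined `update`.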
* Remove cout
* Add missing file
* Address comments
* Check dim_value. -1 means missing
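
The check above, combined with the Recv fallback described earlier, could be sketched roughly as follows. This assumes that -1 marks a missing `dim_value`; `try_static_shape`, `recv_output_shape` and the `receive_shape_from_sender` callback are hypothetical names, not the actual kernel code:

```python
def try_static_shape(dims):
    """Return the shape as a tuple if every dimension is known, else None.

    Assumption: -1 marks a missing dim_value.
    """
    if any(d == -1 for d in dims):
        return None
    return tuple(dims)


def recv_output_shape(inferred_dims, receive_shape_from_sender):
    """Return (shape, communicated).

    Uses the statically inferred shape when it is complete; otherwise
    falls back to asking the sender (the callback stands in for the
    actual shape exchange).
    """
    shape = try_static_shape(inferred_dims)
    if shape is not None:
        return shape, False
    return tuple(receive_shape_from_sender()), True
```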
* lock properly
* Address comments (remove thread-safe map)
* Remove poc header
* Replace Stream with DeferredReleaseCPUPtr
* Register ILearningModelSessionOptionsNative interface
* Threading options exposed
* Add interrogator for Session options
* Add test
* Polish test
* PR comments
* Set intra op threads
* Add adapter api to grab intra op threads
* Add adapter test for getting intraop num threads
* Make ILearningModelSessionNative and update winml api test
* Make it required when building engine to set the intraop num threads
* Make test more pretty
* Change naming of idl function
* Revert "Change naming of idl function"
This reverts commit c06916aa5bf94e3bf233ed281e508b935fc8638d.
* PR comment on naming
* Skip the test because its result depends on whether the build uses OpenMP
Co-authored-by: Ryan Lai <ryalai96@gamil.com>
* add dynamic output shape support
* fix bugs associated with scalar inputs
* addressed comments, fixed an issue where the output buffer size was not set correctly, refactored shaper class
* split the execution logic from nnapi::Model into nnapi::Execution
* update comments for two scenarios: (1) dynamic output buffer size, (2) ONNX scalar input
* move ctor of nnapi::Execution to public
* remove dependency on external jd-dnnlibrary
* add qlinearadd support
* combine logic shared by some qlinear ops, return a status instead of throwing in some places
* merge master
* minor bug fixes
* addressed comments
* load dml and onnxruntime from system32 only when winml and onnxruntime are loaded from system32
* use __ImageBase, since that avoids the call into GetModuleHandleEx, which is not a supported Store API
* remove accidental comment
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
* added ReduceLogSumExp gradient
added test
fixed type mismatch when calling cudnnreduce kernel
fixed python frontend to remove redundant states to match pytorch state dict
* minor fix for test dir util
* add pause option for onnx_test_runner
* add flush std to show pause prompt text
Co-authored-by: gwang0000 <62914304+gwang0000@users.noreply.github.com>
* Adding CPU implementation of BroadcastGradientArgs op
Modify to take shape as input instead of tensor
Cleanup
Correct schema
Corrected kernel, added tests, addressed review comments.
Initial change, to add ReduceSumTraining cpu op
cpu support
Initial changes to gradient builder
Non-empty reduction case passing.
Added exception and test for invalid broadcast, addressed review comments.
cuda support + more UTs
on comments + UT
no op support for {} axes with new attr - noop_with_empty_axes
Add noop attribute to ReduceSumTraining use
Add testing for no-shape graph, modify AddSub grad builder, logging.
MulGrad support
Div support
Expand support
Gemm support
MatMul grad change
Transpose Grad change
BiasGeluGrad change.
Fixes after squash
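
As a rough illustration of what a BroadcastGradientArgs-style op computes from two input shapes, here is a minimal sketch. It assumes NumPy-style broadcasting rules and is not the onnxruntime kernel; `broadcast_gradient_args` is a hypothetical helper that returns, for each operand, the output axes over which its gradient has to be summed:

```python
def broadcast_gradient_args(a_shape, b_shape):
    """Return (a_axes, b_axes): output axes along which each operand
    was broadcast, so its gradient must be reduce-summed there.

    Raises ValueError when the two shapes cannot be broadcast together.
    """
    rank = max(len(a_shape), len(b_shape))
    a_pad = rank - len(a_shape)  # leading axes missing from a
    b_pad = rank - len(b_shape)  # leading axes missing from b
    a_axes, b_axes = [], []
    for axis in range(rank):
        da = a_shape[axis - a_pad] if axis >= a_pad else None
        db = b_shape[axis - b_pad] if axis >= b_pad else None
        if da is None:
            a_axes.append(axis)  # axis exists only in b
        elif db is None:
            b_axes.append(axis)  # axis exists only in a
        elif da == db:
            continue             # no broadcasting on this axis
        elif da == 1:
            a_axes.append(axis)
        elif db == 1:
            b_axes.append(axis)
        else:
            raise ValueError(
                f"cannot broadcast {tuple(a_shape)} with {tuple(b_shape)}")
    return a_axes, b_axes
```

For shapes `(2, 3, 4)` and `(3, 1)`, the second operand is broadcast along axes 0 and 2, so its gradient would be summed over those axes and reshaped back to `(3, 1)`.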
* Remove logging, add specific exception for shape inference error
* fix build
* Review comments
* Fix windows build
Co-authored-by: Ethan Tao <ettao@microsoft.com>
* Adjust indentation of statement; without this fix GCC 7.5 errors
out with:
"this ‘if’ clause does not guard this statement, but the
latter is misleadingly indented as if it were guarded by the ‘if’"
* Add braces around the if-statement for improved clarity.
Co-authored-by: Alberto Magni <alberto.magni@microsoft.com>
* Working changes for ConcatTraining op
* Refactor to move changes to orttraining
* Fix segfault
* Support negative axis in shape inference
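
A minimal sketch of how a negative axis can be handled when inferring a concat output shape, assuming the usual convention that axis `-1` refers to the last dimension; `normalize_axis` and `concat_output_shape` are hypothetical helpers, not the ConcatTraining implementation:

```python
def normalize_axis(axis, rank):
    """Map an axis in [-rank, rank) to its non-negative equivalent."""
    if not -rank <= axis < rank:
        raise ValueError(f"axis {axis} out of range for rank {rank}")
    return axis + rank if axis < 0 else axis


def concat_output_shape(shapes, axis):
    """Infer the output shape of concatenating `shapes` along `axis`."""
    rank = len(shapes[0])
    axis = normalize_axis(axis, rank)
    out = list(shapes[0])
    for shape in shapes[1:]:
        if len(shape) != rank:
            raise ValueError("all inputs must have the same rank")
        for i, (expected, actual) in enumerate(zip(out, shape)):
            # Every dimension except the concat axis has to match.
            if i != axis and expected != actual:
                raise ValueError(f"dimension {i} differs between inputs")
        out[axis] += shape[axis]
    return tuple(out)
```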
* fix build
Co-authored-by: Ethan Tao <ettao@microsoft.com>
(1) Add bert-base-cased and gpt2 benchmark results on V100
(2) Update list of supported models.
(3) Add comments to gpt2_helper.
(4) Use IO Binding in test parity by default.
* add LRN/Grouped Conv Support, minor changes
* better pool ops sdk version requirement
* reduce string comparisons for gemm/matmul ops
* fix nnapi fall back to cpu for softmax
* addressed review comments, correct a small error in the code
* improve calibration tool
* modify calibration interface name
* refine calibrate and calibrate_user
* refine and add type info
* add e2e user example file
* remove unnecessary files
* remove test images no longer needed
* update readme document
Co-authored-by: t-yguo <t-yguo@microsoft.com>
* Updated pushed CPU and CUDA tags.
* Add TensorRT, fix typo.
* Add OpenVINO tags. Remove 2020.2 installation instructions for VAD-M.
* Revert instruction changes for VAD-M and update 2020.2 to 2020.3
* Optimize CreateEnv by not creating the logging manager instance if env instance has already been created.
* Move creation of logging mgr inside if block
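
The optimization in the two bullets above can be pictured as follows. This is a sketch only, assuming a process-wide singleton environment; the `Env` and `LoggingManager` classes are hypothetical stand-ins, not the onnxruntime types:

```python
import threading


class LoggingManager:
    """Stand-in for an expensive-to-create logging manager."""
    instances_created = 0

    def __init__(self):
        LoggingManager.instances_created += 1


class Env:
    """Stand-in for a process-wide environment singleton."""
    _instance = None
    _lock = threading.Lock()

    def __init__(self, logging_manager):
        self.logging_manager = logging_manager

    @classmethod
    def create(cls):
        with cls._lock:
            if cls._instance is None:
                # The logging manager is built inside the if block, so
                # later calls that reuse the existing env do not pay for it.
                cls._instance = cls(LoggingManager())
            return cls._instance
```

Calling `Env.create()` repeatedly returns the same object and constructs only one `LoggingManager`.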
* Add BN to ArmNN EP
* Add Concat to ArmNN EP
* ACL logging improvements
* ArmNN logging improvements
* Fallback to CPU for 9x9 convolution in ACL EP
* Fallback to CPU for 9x9 convolution in ArmNN EP
* Enable python support for ACL and ArmNN EPs when compiled with BSP toolchain
* Removed the matmul operator
* Fix conv infer shape function
* Fix provider_names list for armnn
Co-authored-by: Andrei-Alexandru <andrei-alexandru.avram@nxp.com>