* Eliminate redundant subexpressions
Apply local value numbering to merge graph nodes that will always
evaluate to the same value.
* Rename cpp->cc
* Handle optional arguments
* Add test models
* Add more tests with optional arguments
* Fix processing of subgraphs
Also, be resilient to a possible mixture of optional and variadic
parameters.
* Fix random operators
* Address PR comments
* Minor changes and a test
* Move CSE before constant folding
* Random* operators are always non-deterministic
Even when seed is provided.
* Fix a CSE test
* Reuse the list of non-deterministic operators with constant folding pass
* Address PR comments
* Fix formatting
* Address PR comment
* Minor cleanup / comments
* Fix build failure in Linux
* Reuse existing optimizer/utils file.
Also, check for graph outputs when removing a node.
* Add a test
* Fix compiler warnings
* Fix build in older compilers
* More compatibility with old STL versions
* test
* test
* add missing CUDA header include
* debug
* fix
* fix python package for dnnl and tensorrt.
* fix
* fix windows build.
* revert
* target_link_directories for tensorrt shared lib.
* update java API docs
* fix link
* rearrange
* update platforms, use table
* use javadoc.io
* craigacp tested it in java 14
* update link
* fix broken link
* fix testdata link
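The common-subexpression elimination entries at the top of this log (local value numbering, optional arguments, non-deterministic Random* operators, the graph-output check) can be illustrated with a minimal sketch. This is not the optimizer's actual code: the dict-based node format, the function name `eliminate_common_subexpressions`, and the contents of `NON_DETERMINISTIC_OPS` are assumptions made for illustration only.

```python
# Illustrative list only; the real pass shares its list with constant folding.
NON_DETERMINISTIC_OPS = {
    "RandomNormal", "RandomNormalLike", "RandomUniform", "RandomUniformLike",
}


def eliminate_common_subexpressions(nodes, graph_outputs):
    """Merge nodes that always evaluate to the same value.

    `nodes` is a topologically sorted list of dicts (hypothetical format) with
    keys "op", "inputs", "outputs" and optionally "attrs". A missing optional
    input is written as "" so that input positions stay comparable.
    """
    table = {}        # value number key -> outputs of the representative node
    replacement = {}  # output of a removed node -> surviving output
    kept = []
    for node in nodes:
        # Rewire inputs to the representatives chosen so far.
        inputs = [replacement.get(name, name) for name in node["inputs"]]
        node = dict(node, inputs=inputs)
        attrs = node.get("attrs", {})
        key = (
            node["op"],
            tuple(inputs),
            tuple(sorted((k, repr(v)) for k, v in attrs.items())),
        )
        deterministic = node["op"] not in NON_DETERMINISTIC_OPS
        # Simplification: never remove a node that produces a graph output.
        produces_output = any(o in graph_outputs for o in node["outputs"])
        if deterministic and not produces_output and key in table:
            for old, new in zip(node["outputs"], table[key]):
                replacement[old] = new
            continue  # drop the redundant node
        if deterministic:
            table.setdefault(key, node["outputs"])
        kept.append(node)
    return kept
```

In this sketch two `Add(x, y)` nodes collapse into one, while two `RandomNormal` nodes with the same seed are both kept, matching the entry that treats Random* operators as non-deterministic even when a seed is provided.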
When the thread pool is configured to spin, the main thread now also spins at the barrier at the end of a parallel section, in addition to the workers spinning while they wait for work.
The change updates Barrier.h to take an additional boolean to select spin/block, and passes this in based on the thread pool configuration.
It also adds a test case for barriers; the test case did not identify any problems.
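The spin/block choice described above can be sketched as follows. This is a Python illustration of the idea, not the contents of Barrier.h: the class layout, the `spin` constructor argument, and the helper `run_parallel_section` are hypothetical names chosen for this example.

```python
import threading


class Barrier:
    """Counts down from `count`; wait() returns once every notify() has arrived."""

    def __init__(self, count, spin):
        self._remaining = count
        self._spin = spin            # True: busy-wait, False: block
        self._lock = threading.Lock()
        self._done = threading.Event()
        if count == 0:
            self._done.set()

    def notify(self):
        with self._lock:
            self._remaining -= 1
            if self._remaining == 0:
                self._done.set()

    def wait(self):
        if self._spin:
            # Spin: poll the flag without giving up the thread.
            while not self._done.is_set():
                pass
        else:
            # Block: sleep until the last notify() wakes us.
            self._done.wait()


def run_parallel_section(num_workers, spin):
    """Run one task per worker; the calling thread waits at the barrier."""
    results = [0] * num_workers
    barrier = Barrier(num_workers, spin)

    def worker(i):
        results[i] = i * i
        barrier.notify()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    barrier.wait()  # the main thread spins or blocks here
    for t in threads:
        t.join()
    return results
```

Spinning trades CPU time for lower wake-up latency, which is why the sketch selects the mode from a single flag, in the same way the entry above says the thread pool configuration selects it.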
* Add experimental winrt api idl with dummy type to satisfy the build
* remove experimental from the api_lib target
* make experimental api available on windows builds also
* remove /y /d
* revert some pathing changes
* remove experimental api call from tests
* revert cppwinrt cmake changes
* switch to stdapi
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
Fix the Transpose/MatMul fusion's handling of attributes on an existing TransposeScaleMatMul node, and add support for a Transpose node whose perm attribute is missing.
Update expected test data to account for floating point calculation differences resulting from the fusion.
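A small sketch may help show what the fusion has to preserve. It assumes 2-D inputs, assumes that folding a Transpose into an already-fused node toggles that node's transpose flag, and uses hypothetical names (`trans_a`, `trans_b`, `fuse_input_transpose`). The one fact taken from the ONNX operator spec is that a Transpose with no perm attribute reverses the dimensions.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose_scale_matmul(a, b, alpha=1.0, trans_a=False, trans_b=False):
    """Reference semantics of the fused node: alpha * op(a) @ op(b)."""
    if trans_a:
        a = transpose(a)
    if trans_b:
        b = transpose(b)
    return [[alpha * v for v in row] for row in matmul(a, b)]


def fuse_input_transpose(attrs, side, perm=None):
    """Fold a 2-D Transpose feeding input `side` ("a" or "b") into `attrs`.

    Returns the updated attribute dict, or None if the Transpose cannot be
    expressed as a flag (for example an identity perm).
    """
    if perm is None:
        perm = [1, 0]  # missing perm: ONNX default reverses the dimensions
    if list(perm) != [1, 0]:
        return None
    key = "trans_a" if side == "a" else "trans_b"
    fused = dict(attrs)  # keep existing attributes such as alpha
    fused[key] = not attrs.get(key, False)  # toggle rather than overwrite
    return fused
```

Both paths in this pure-Python sketch run the same arithmetic, so the results match exactly. A real fused kernel may apply the scale or accumulate in a different order, which is the kind of floating-point difference the updated test data accounts for.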
* update batch_norm test, enable dev_mode for nnapi, ignore onnx protobuf warning for nnapi ep
* fix some issues in concat and mark input without shape as not supported for now
* address review comments
* addressed comments
* Reshape optimization
* Refactor the Reshape optimization to be more generic
Co-authored-by: Ke Deng <kedeng@OrtTrainingDev1.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
* add training dockerfile tested for examples repo
* forgot pytorch patch for build from source
* make apt-get update -y adjacent to apt-get install -y due to Docker caching rules
* comment for mellanox libraries
* mpi4py comment as I forgot where it came from
* apparently curl not included anymore
* grr.. nvidia changed nccl location
* dont need findnccl.patch after nvidia changed nccl location
* pr comment /opt/ompi4 => /opt/openmpi-xxx
* switch to pip install pytorch
* use Release instead of RelWithDebInfo
* comment wording
* wording
* missed RelWithDebInfo => Release
* replace Mellanox with libibverbs
* stale comment
* ordering
* no more ninja
* add / at end of copy
* update cgmanifest.json
* pr comments
Co-authored-by: suffian khan <sukha@OrtTrainingDev1.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
* [Nuphar] added Gemm-to-MatMul conversion in model editor
* added a mode gemm_to_matmul that turns Gemm Ops into MatMul Ops
* enabled model_quantizer to quantize MatMul inside a Loop op
* this PR also included Gemm-11 support from Ke Deng
* Fixed a couple of existing bugs
Fixed a couple of old bugs exposed by the newly-added tests and the support
of Gemm-11, including:
* correctly handle aliasing among states and outputs in Scan
* fixed a transpose issue in building tvm IR for MatMul
* fixed an issue related to generating IR for computing Gemm alpha
* disabled several tests that trigger a deeper issue, likely in
the graph partitioner. That issue is better addressed in a
separate PR.
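The gemm_to_matmul mode mentioned above can be pictured with a short sketch. It relies on the ONNX definition of Gemm, `Y = alpha * A' * B' + beta * C`, where `C` is optional from Gemm-11 onward. It is not the Nuphar model editor's code: the dict-based node format, the generated tensor names, and the `const:` placeholders for scalar initializers are assumptions made for illustration.

```python
def gemm_to_matmul(node):
    """Rewrite one Gemm node (hypothetical dict format) into simpler nodes."""
    attrs = node.get("attrs", {})
    inputs = list(node["inputs"])
    out = node["outputs"][0]
    a, b = inputs[0], inputs[1]
    # C is optional starting with Gemm-11; "" also means "not provided".
    c = inputs[2] if len(inputs) > 2 and inputs[2] else None
    new_nodes = []

    def emit(op, ins, op_attrs=None):
        name = f"{out}_{op.lower()}_{len(new_nodes)}"
        new_nodes.append(
            {"op": op, "inputs": ins, "outputs": [name], "attrs": op_attrs or {}}
        )
        return name

    if attrs.get("transA", 0):
        a = emit("Transpose", [a], {"perm": [1, 0]})
    if attrs.get("transB", 0):
        b = emit("Transpose", [b], {"perm": [1, 0]})
    y = emit("MatMul", [a, b])

    alpha = attrs.get("alpha", 1.0)
    if alpha != 1.0:
        y = emit("Mul", [y, f"const:{alpha}"])  # placeholder for a scalar initializer
    if c is not None:
        beta = attrs.get("beta", 1.0)
        if beta != 1.0:
            c = emit("Mul", [c, f"const:{beta}"])
        y = emit("Add", [y, c])

    # The last emitted node takes over the original Gemm output name.
    new_nodes[-1]["outputs"] = [out]
    return new_nodes
```

For example, a Gemm with `transB=1` and `alpha=0.5` becomes Transpose, MatMul, Mul, Add in this sketch, while a Gemm with default attributes and no `C` becomes a single MatMul.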