* test
* test
* add missing CUDA header include
* debug
* fix
* fix python package for dnnl and tensorrt.
* fix
* fix windows build.
* revert
* target_link_directories for tensorrt shared lib.
* update java API docs
* fix link
* rearrange
* update platforms, use table
* use javadoc.io
* craigacp tested it in java 14
* update link
* fix broken link
* fix testdata link
When the thread pool is configured to spin, the main thread now also spins at the barrier at the end of parallel sections, in addition to the workers spinning while they wait for work.
The change updates Barrier.h to take an additional boolean that selects spin or block, and passes it in based on the thread pool configuration.
It also adds a test case for barriers; the test did not reveal any problems.
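The spin/block choice described above can be sketched as follows. This is a minimal Python illustration, not the code in Barrier.h: the class name `Barrier`, the `spin` flag, and the helper `run_parallel_section` are all hypothetical names chosen for the example.

```python
import threading
import time


class Barrier:
    """Hypothetical barrier: wait() returns once `count` parties have notified.

    The `spin` flag (an assumption for this sketch) selects busy-waiting
    instead of blocking on an OS-level primitive.
    """

    def __init__(self, count, spin=False):
        self._remaining = count
        self._spin = spin
        self._lock = threading.Lock()
        self._done = threading.Event()

    def notify(self):
        # Called by each worker when it has finished its share of the work.
        with self._lock:
            self._remaining -= 1
            if self._remaining == 0:
                self._done.set()

    def wait(self):
        if self._spin:
            # Spin: keep polling, yielding the CPU briefly on each pass.
            while not self._done.is_set():
                time.sleep(0)
        else:
            # Block: sleep until the last notify() wakes this thread.
            self._done.wait()


def run_parallel_section(num_workers, spin):
    """Run a toy parallel section and wait for it at the barrier."""
    results = [0] * num_workers
    barrier = Barrier(num_workers, spin=spin)

    def work(i):
        results[i] = i * i
        barrier.notify()

    threads = [threading.Thread(target=work, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    barrier.wait()  # the main thread spins or blocks here
    for t in threads:
        t.join()
    return results
```

Spinning trades CPU time for lower wake-up latency, which is why it makes sense to tie the flag to the same configuration that already makes the workers spin.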
* Add experimental winrt api idl with dummy type to satisfy the build
* remove experimental from the api_lib target
* make experimental api available on windows builds also
* remove /y /d
* revert some pathing changes
* remove experimental api call from tests
* revert cppwinrt cmake changes
* switch to stdapi
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
Fix the Transpose MatMul fusion's handling of the attributes of an existing TransposeScaleMatMul node, and add support for Transpose nodes with a missing perm attribute.
Update expected test data to account for floating point calculation differences resulting from the fusion.
* update batch_norm test, enable dev_mode for nnapi, ignore onnx protobuf warning for nnapi ep
* fix some issues in concat and mark input without shape as not supported for now
* address review comments
* addressed comments
* Reshape optimization
* Refactor the Reshape optimization to be more generic
Co-authored-by: Ke Deng <kedeng@OrtTrainingDev1.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
* add training dockerfile tested for examples repo
* forgot pytorch patch for build from source
* make apt-get update -y adjacent to apt-get install -y due to Docker caching rules
* comment for mellanox libraries
* mpi4py comment as I forgot where it came from
* apparently curl is not included anymore
* grr.. nvidia changed nccl location
* don't need findnccl.patch after nvidia changed nccl location
* pr comment /opt/ompi4 => /opt/openmpi-xxx
* switch to pip install pytorch
* use Release instead of RelWithDebInfo
* comment wording
* wording
* missed RelWithDebInfo => Release
* replace Mellanox with libibverbs
* stale comment
* ordering
* no more ninja
* add / at end of copy
* update cgmanifest.json
* pr comments
Co-authored-by: suffian khan <sukha@OrtTrainingDev1.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
* [Nuphar] added Gemm-to-MatMul conversion in model editor
* added a mode gemm_to_matmul that turns Gemm Ops into MatMul Ops
* enabled model_quantizer to quantize MatMul inside a Loop op
* this PR also included Gemm-11 support from Ke Deng
* Fixed a couple of existing bugs
Fixed a couple of old bugs exposed by the newly-added tests and the support
of Gemm-11, including:
* correctly handle aliasing among states and outputs in Scan
* fixed a transpose issue in building tvm IR for MatMul
* fixed an issue related to generating IR for computing Gemm alpha
* disabled several tests that triggered a deeper issue, likely in
the graph partitioner. It is probably better to address that issue
in a separate PR.
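The arithmetic behind a Gemm-to-MatMul conversion can be sketched as below. This is only an illustration of the identity Y = alpha * A' * B' + beta * C on nested Python lists, not the Nuphar model editor, which rewrites ONNX graph nodes. The function names are hypothetical, and C is assumed to already have the output shape (no broadcasting).

```python
def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def gemm_reference(a, b, c, alpha=1.0, beta=1.0, trans_a=False, trans_b=False):
    """Gemm computed directly: Y = alpha * A' * B' + beta * C."""
    m = len(a[0]) if trans_a else len(a)
    k = len(a) if trans_a else len(a[0])
    n = len(b) if trans_b else len(b[0])
    y = []
    for i in range(m):
        row = []
        for j in range(n):
            acc = 0.0
            for p in range(k):
                a_ip = a[p][i] if trans_a else a[i][p]
                b_pj = b[j][p] if trans_b else b[p][j]
                acc += a_ip * b_pj
            row.append(alpha * acc + beta * c[i][j])
        y.append(row)
    return y


def gemm_via_matmul(a, b, c, alpha=1.0, beta=1.0, trans_a=False, trans_b=False):
    """Same result built from separate steps: Transpose, MatMul, scale, Add."""
    a2 = transpose(a) if trans_a else a
    b2 = transpose(b) if trans_b else b
    prod = matmul(a2, b2)
    return [
        [alpha * p + beta * cv for p, cv in zip(prow, crow)]
        for prow, crow in zip(prod, c)
    ]


A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3x2, used transposed (2x3)
B = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3x2
C = [[1.0, 1.0], [1.0, 1.0]]              # 2x2, same shape as the output

direct = gemm_reference(A, B, C, alpha=0.5, beta=2.0, trans_a=True)
decomposed = gemm_via_matmul(A, B, C, alpha=0.5, beta=2.0, trans_a=True)
```

Once the MatMul is a separate step, anything that operates on MatMul (such as the quantizer mentioned above) can apply to what used to be a Gemm.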
* bump cswinrt version
* add cswinrt
* test dotnetcore 3.0
* rename buildpackage source
* set folder path to the package source and not the version
* refactor .netframework tests
* build .net core anycpu
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
* Gelu Activation Recompute Draft
* Prototype for localized recompute
* Introduce localized_recompute rewriter
* Command line args for enabling recompute
* Add logger to Gradient Graph Builder
* use const when possible
* change some functions from throwing to returning a status
* move more functions to return status
* move most of the exceptions to return status
* move logsink on android from CLogSink to AndroidLogSink
* addressed comments
* add type attr to check at compile time whether the return status is used
Sometimes there is a file named "version.txt" in your CUDA installation dir, but sometimes there isn't one. I couldn't figure out why, but the latest CUDA 11 on our CI build machines doesn't have this file. Since the file is not needed for building onnxruntime, I removed the check.
Update TransposeMatMul to support scaling of the matrix product by a constant scalar value (analogous to the GEMM alpha parameter). Rename TransposeMatMul to TransposeScaleMatMul.
Fuse MatMul with surrounding Mul/Div nodes that have a constant scalar operand into TransposeScaleMatMul.
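Why the scalar can be folded into the fused node can be sketched with plain Python lists. This is an illustration under assumptions, not the onnxruntime kernel or graph transformer: the function `transpose_scale_matmul` and its parameter names `alpha`, `trans_a`, and `trans_b` are made up for the example.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def scale(m, s):
    return [[s * x for x in row] for row in m]


def transpose_scale_matmul(a, b, alpha=1.0, trans_a=False, trans_b=False):
    """Hypothetical fused op: alpha * (A' @ B'), with optional transposes."""
    a2 = transpose(a) if trans_a else a
    b2 = transpose(b) if trans_b else b
    return scale(matmul(a2, b2), alpha)


A = [[1.0, 2.0], [3.0, 4.0]]
B = [[2.0, 0.0], [1.0, 2.0]]

# Unfused chain: Mul(A, 2) -> MatMul -> Div(., 4)
unfused = scale(matmul(scale(A, 2.0), B), 1.0 / 4.0)

# Fused: both scalars collapse into a single alpha = 2 / 4
fused = transpose_scale_matmul(A, B, alpha=2.0 / 4.0)
```

The two results match exactly here only because every scalar is a power of two. With arbitrary constants the fused and unfused forms can differ by rounding, which is consistent with the expected test data being updated for floating point differences after the fusion.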