* add modern standards to function arguments
* code cleanup
* fix code formatting
* add element access convenience function
* change template type name to match rest of code
* remove new At() convenience function
* add better documentation message
* build e2e cppwinrt tests
* add use nuget task
* make all references to package version prop/target-ified
* remove dupe props/targets reference
* work around project.assets.json error by deleting it
* powershell test invocation
* switch to batch script
* print debug info
* update x86->x64
* stdio.h
* pushd/popd
* add csharp tests
* package.config -> packages.config
* typo
* x86 -> anycpu
* debug is default
* add test path
* update csproj as well
* debug
* really replace all package versions
* debug output
* really use [PackageVersion]
* sleep instead of converting async operation to task and waiting
* dont close software bitmap
* switch to powershell script
* remove binding check
* continue on failure
* continue on error action
* continueOnError and errorActionPreference
* tabbing
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
* init version to use graph instead of model_proto for IsOpSupported
* move add to modelbuilder to use graph node
* move the rest of model_builder to use graph instead of modelproto
* remove redundant code
* Clear some redundant code
* merge master and some minor style changes
* move the check for whether an initializer is external to the individual op instead of the whole graph
* Addressed comments
* Change GetType and GetShape to log warning info internally to simplify the caller, remove some redundant onnxruntime namespace qualifiers
* add squeeze op support, some more code style clean up
* fix a bug where duplicate output can be added to a subgraph, some other minor logging changes
* Add protobuf mutator library as a git submodule
* Added files and instructions to build the protobuf mutator library in CMake
* Added fuzzing flag to build system and added fuzzing dependency library. To run fuzzing test use the flags --fuzz_testing --build_shared_lib --use_full_protobuf --cmake_generator 'Visual Studio 16 2019'
* Added src files and build instructions for the main fuzzing engine
* Removed Random number generation test from inside the engine
* Added license header to files
* Removed all pep8 violations introduced by this change and other E501 violations
* Draft for LayerNorm Optimization
* Modify LayernormGrad kernel based on new backward graph.
* keep two LayernormGrad implementations.
One is implemented based on input X, mean. The other is based on output Y, scale, bias. The first one is enabled by default. The second one can be enabled by --use_invertible_layernorm_grad
* expose use_invertible_layernorm_grad to frontend.
* add fp16 tests.
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
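
The two LayernormGrad variants above differ in which tensors the backward pass reads. The following is a minimal Python sketch of that idea, not the actual kernel code. It assumes the standard LayerNorm definition Y = x_hat * scale + bias, that every scale entry is nonzero, and that the inverse standard deviation is available to both variants. All function names are hypothetical.

```python
import math

def layer_norm(x, scale, bias, eps=1e-5):
    """Forward pass over one row; returns (y, mean, inv_std)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    y = [(v - mean) * inv_std * s + b for v, s, b in zip(x, scale, bias)]
    return y, mean, inv_std

def x_hat_from_input(x, mean, inv_std):
    # First variant: rebuild the normalized values from input X and mean.
    return [(v - mean) * inv_std for v in x]

def x_hat_from_output(y, scale, bias):
    # Second ("invertible") variant: rebuild them from output Y, scale and
    # bias, so X does not have to be read. Assumes nonzero scale entries.
    return [(v - b) / s for v, s, b in zip(y, scale, bias)]

def layer_norm_grad_dx(dy, x_hat, scale, inv_std):
    """dX for one row, given the normalized values x_hat."""
    n = len(dy)
    g = [d * s for d, s in zip(dy, scale)]  # dY * scale
    mean_g = sum(g) / n
    mean_gx = sum(a * h for a, h in zip(g, x_hat)) / n
    return [inv_std * (a - mean_g - h * mean_gx) for a, h in zip(g, x_hat)]
```

Both ways of rebuilding x_hat feed the same dX formula, so the two variants should agree up to floating-point rounding.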
* optimize transpose
* optimize for the case when the tensor is 3D and the permutation is done in the last two dimensions.
BERT-L throughput is improved ~1.4% from transpose optimization
* fix UT MegatronSelfAttentionPartitionCorrectnessTest
* polish code.
* add test and change tile size to 16x16 for better perf.
* fix UT
* fix test of mask_rcnn
* address code review comments.
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
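
The 3D, last-two-dimensions transpose case above can be illustrated with a tiled loop. This is a CPU-side Python sketch only: the real change is a GPU kernel, and the function name, flat row-major layout and loop structure here are assumptions made for illustration. Only the 16x16 tile size comes from the commit message.

```python
TILE = 16  # tile edge, matching the 16x16 size mentioned above

def transpose_last_two(src, batch, rows, cols, tile=TILE):
    """Transpose a flat row-major [batch, rows, cols] buffer to [batch, cols, rows]."""
    dst = [0] * (batch * rows * cols)
    for b in range(batch):
        base = b * rows * cols
        # Walk each matrix tile by tile. On a GPU, one tile would typically
        # map to one thread block that stages the tile in shared memory.
        for r0 in range(0, rows, tile):
            for c0 in range(0, cols, tile):
                for r in range(r0, min(r0 + tile, rows)):
                    for c in range(c0, min(c0 + tile, cols)):
                        dst[base + c * rows + r] = src[base + r * cols + c]
    return dst
```

The `min(...)` bounds handle matrices whose sides are not multiples of the tile size.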
For BiasGeluGradDxKernel:
- Implement optimization to first load from global memory into registers as suggested by Weixing.
- Support larger bias sizes which were previously limited by the number of threads per block.
- Address flaky unit test by increasing the error tolerance to the default value.
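
The first two points above can be sketched as a strided per-thread loop. This Python emulation is an assumption-laden illustration, not the kernel itself: it assumes dX = dY * gelu'(X + bias) with the exact (erf-based) GELU, and the function names and the `num_threads` parameter are hypothetical.

```python
import math

def gelu_grad(v):
    # Derivative of exact GELU v * Phi(v), which is Phi(v) + v * phi(v).
    cdf = 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)
    return cdf + v * pdf

def bias_gelu_grad_dx_row(dy, x, bias, num_threads):
    """dX for one row of length len(bias), emulating num_threads GPU threads."""
    n = len(bias)
    dx = [0.0] * n
    for tid in range(num_threads):
        # Strided loop: thread tid covers tid, tid + num_threads, ...
        # so the bias size may exceed the number of threads per block.
        for i in range(tid, n, num_threads):
            # Load operands into locals first (registers on a GPU), then compute.
            dy_i, x_i, b_i = dy[i], x[i], bias[i]
            dx[i] = dy_i * gelu_grad(x_i + b_i)
    return dx
```

Because every index is visited exactly once, the result does not depend on `num_threads`.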
* Use the file size while reading onnx models. Ensure models are loaded using APIs in model.h for consistency.
* Refactor existing GetFileLength in posix.cc and address PR comments.
* Fix linux build - signed/unsigned conversion
* Add ability to specify just the device when using IOBinding for an output. This enables keeping an output on a different device (e.g. GPU) when it has a dynamic size that is not known ahead of graph execution.
* Keep loss subgraph as FP32 during mixed-precision training.
* Fix case where there is no white-list loss op.
* Get nodes from loss_scale instead of whitelist.
* rename const variables.
Co-authored-by: Vincent Wang <weicwang@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
* Change NNAPI CI to run on new NNAPI EP
* update android ci to mac 10.15 and remove the cmake install
* update the android ci to target android api level 29
* remove unnecessary ndk install git submodule call