Add transformer glue test example to show how to use ORTTrainer to fine-tune a transformer model
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
* Add amd migraphx execution provider to onnx runtime
* rename MiGraphX to MIGraphX
* remove unnecessary changes in migraphx_execution_provider.cc
* add migraphx EP to tests
* add input requests of the batchnorm operator
* add to support an onnx operator PRelu
* update migrapx dockerfile and removed one unused line
* sync submodules with mater branch
* fixed a small bug
* fix various bugs to run msft real models correctly
* some code cleanup
* fix python file format
* fixed a code style issue
* add default provider for migraphx execution provider
Co-authored-by: Shucai Xiao <Shucai.Xiao@amd.com>
In this PR, we
1. create some APIs for creating NVTX objects
2. apply those APIs in pipeline-related operators and sequential executor.
As a result, we can explicitly see how a pipeline schedule is run by GPUs in
Nvidia's visual profiler. Note that these APIs are Linux only due to Nvidia's
limited support.
* Remove 'model_.' prefix for onnx model initializers in training
* fix test case remove redundant device test
* rename
* Fix state_dict/load_state_dict with frozen_weight
* nit
* Add monkey patch for pt opset 10
* remove pt patch in CI
* nit: newline
Change training perf test build to use "docker" instead of "sudo docker". The training perf test build runs in an environment that supports calling "docker" and not "sudo docker".
* gpt2 training perf
* gpt2 training perf
* debug
* debug
* debug
* fix bug
* minor
* on comments
* dynamic sql
* fix build
* minor
* linked hash
* on comments
* minor
* mem
* minor
Co-authored-by: Ethan Tao <ettao@microsoft.com>
Update Android build instructions to provide more information.
Add info on testing directly on Android
Update build.py to better support using Ninja generator to build Android on Windows.
Update install_deps.sh to use relative path from script directory to symbolic_opset10.py. This allows install_deps.sh to be called from different working directories.
* [java] - adding a cuda enabled test.
* Adding --build_java to the windows gpu ci pipeline.
* Removing a stray line from the unit tests that always enabled CUDA for Java.
* Enable running PEP8 checks via flake8 as part of the build if flake8 is installed.
Update scripts in \tools and \onnxruntime\python. Excluding \onnxruntime\python\tools which needs a lot more work to be PEP8 compliant. Also excluding orttraining\tools for the same reason.
Install flake8 as part of the static_analysis build task in the Win-CPU CI so the checks are run in one CI build.
Update coding standards doc.
* Added aarch64 build pipeline
* Fix build error
* Remove auditwheel repair which doesn't work with cross compiling
* Statically link C++
* Added auditwheel repair back and fix stdlib.h
* Remove extra space
* Add signed nuget package to publish ort-nightly nuget feed
* Push managed nuget as well
* Indentation fix
* Indentation fix
* Update gpu.yml to also publish directml nuget
* Fix typo in naming of task
* Fix C# log APIs. Fixes github issue #3409.
* Fix build error due to accidental duplication of GraphOptimizationLevel
* Fix runoptions
* Fix broken test. Add --blame switch to dotnet test cmd line to print the failed test in case of crash.
* initial change to transformer.py
* prepare e2e transformer tests
* refactor transformer tests
* put test python files in a flat folder
* fix typo pip install transform(s)
* python 3.6
* python version to 3.6 in install_ubuntu.sh
* remove argparser
* to use opset ver 12
* workaround loss_scale naming patch in case of loss_fn_
* assign self.loss_fn_ so it can be checked
* skip a few un-needed post-process steps
* fix loss_scale_input_name, clean up post process steps
* skip non-frontend tests
* move cpu/cuda related files to coresponding cpu/cuda folder (#3668)
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
* type cast for ratio is not necessary for dropout (#3682)
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
* thrustallocator is not needed since cub is used directly for gather now. (#3683)
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
* GatherND-12 Implementation (#3645)
* Renamed, UT passing
* Move GatherND CUDA Kerenl into onnxruntime
* Merge GatherNDOpTest
* Refactor Test code
* Merge CPU Kernel Impl
* Handle Negative Indice, Fix UT
* Improve CUDA kernel to handle negative index
* Minor Fixes
* Preserve GatherND-1 Cuda kernel
* Fix Mac build
* fix UT
* Fix Build
* fix GatherNDOpTest.double > CUDA error cudaErrorInvalidDeviceFunction:invalid device function
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Peng Wang (pengwa) <pengwa@microsoft.com>
* update with reviewers' comments
* testBertTrainingGradientAccumulation was not using rtol and may fail occasionally with small (e-06) difference
* fix merge mistakes
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Weixing Zhang <weixingzhang@users.noreply.github.com>
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
Co-authored-by: Sherlock <baihan.huang@gmail.com>
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Peng Wang (pengwa) <pengwa@microsoft.com>
* allow switching between eval and training modes dynamically
Co-authored-by: Tixxx <root@525204a066204ea794f942530b05ae7f000000.axlncovkyjne5caro2tmz3zryb.xx.internal.cloudapp.net>
* Enable iOS cross build on MacOS (step#1)
* Changed parallel option
* fixed style issues
* Enable ios arm64 crossbuild on MacOS
* Enable ios arm64 crossbuild on MacOS
* Enable parallel build for xcode
* Fix arm64 function not 4-byte aligned warning
* Rename onnxruntime_ios.cmake to onnxruntime_ios.toolchain.cmake
* change build.py to use the new ios toolchain file name