* initial change to transformer.py
* prepare e2e transformer tests
* refactor transformer tests
* put test python files in a flat folder
* fix typo pip install transform(s)
* python 3.6
* python version to 3.6 in install_ubuntu.sh
* remove argparser
* to use opset ver 12
* workaround loss_scale naming patch in case of loss_fn_
* assign self.loss_fn_ so it can be checked
* skip a few un-needed post-process steps
* fix loss_scale_input_name, clean up post process steps
* skip non-frontend tests
* move cpu/cuda related files to corresponding cpu/cuda folder (#3668)
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
* type cast for ratio is not necessary for dropout (#3682)
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
* thrustallocator is not needed since cub is used directly for gather now. (#3683)
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
* GatherND-12 Implementation (#3645)
* Renamed, UT passing
* Move GatherND CUDA Kernel into onnxruntime
* Merge GatherNDOpTest
* Refactor Test code
* Merge CPU Kernel Impl
* Handle Negative Indices, Fix UT
* Improve CUDA kernel to handle negative index
* Minor Fixes
* Preserve GatherND-1 Cuda kernel
* Fix Mac build
* fix UT
* Fix Build
* fix GatherNDOpTest.double > CUDA error cudaErrorInvalidDeviceFunction:invalid device function
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Peng Wang (pengwa) <pengwa@microsoft.com>
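The negative-index handling mentioned in the GatherND-12 items above can be illustrated with a small pure-Python sketch. This is not the onnxruntime CPU or CUDA kernel: it assumes batch_dims is 0, uses nested lists instead of tensors, and the helper names `normalize_index` and `gather_nd` are hypothetical.

```python
def normalize_index(index, dim_size):
    """Map a possibly negative index into [0, dim_size)."""
    if not -dim_size <= index < dim_size:
        raise IndexError(f"index {index} out of range for axis of size {dim_size}")
    # A negative index counts back from the end of the axis.
    return index + dim_size if index < 0 else index


def gather_nd(data, indices):
    """Gather one slice of nested-list `data` per index tuple in `indices`.

    Simplified assumption: batch_dims is 0 and `indices` is a flat list of
    index tuples, each no longer than the rank of `data`.
    """
    results = []
    for index_tuple in indices:
        item = data
        for index in index_tuple:
            item = item[normalize_index(index, len(item))]
        results.append(item)
    return results


data = [[0, 1], [2, 3]]
print(gather_nd(data, [[0, 0], [1, 1]]))     # [0, 3]
print(gather_nd(data, [[-1, -1], [0, -1]]))  # [3, 1]
print(gather_nd(data, [[-1]]))               # [[2, 3]]
```

Normalizing each index before the lookup keeps the bounds check explicit, so an out-of-range negative index raises an error instead of silently wrapping around.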
* update with reviewers' comments
* make testBertTrainingGradientAccumulation use rtol; without it the test could fail occasionally on small (e-06) differences
* fix merge mistakes
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Weixing Zhang <weixingzhang@users.noreply.github.com>
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
Co-authored-by: Sherlock <baihan.huang@gmail.com>
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Peng Wang (pengwa) <pengwa@microsoft.com>
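To show why the rtol item above matters, here is a minimal sketch of a relative-tolerance comparison using only the standard library. The loss values and the tolerance of 1e-5 are made up for illustration; the tolerance actually used by testBertTrainingGradientAccumulation is not stated here, and `losses_match` is a hypothetical helper.

```python
import math


def losses_match(actual, expected, rtol=1e-5):
    """Return True if every pair of values agrees within relative tolerance rtol."""
    if len(actual) != len(expected):
        return False
    return all(math.isclose(a, e, rel_tol=rtol) for a, e in zip(actual, expected))


# Illustrative values only: the two runs differ by about 3e-06.
expected = [11.029063, 11.097955]
actual = [11.029066, 11.097952]

print(actual == expected)              # False: exact comparison is too strict
print(losses_match(actual, expected))  # True: within rtol=1e-5
```

An exact comparison turns harmless floating-point noise into intermittent failures, while a relative tolerance scales the allowed difference with the magnitude of the values being compared.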
* Add TrySimpleParallelFor so that there's a path with OpenMP awareness for SimpleParallelFor. Makes it consistent with [Try]BatchParallelFor and [Try]ParallelFor.
Update TopK to check the number of threads more accurately, and to use TrySimpleParallelFor.
* Update docs to mention TrySimpleParallelFor
* Refactor GatherND CPU Kernel (Renaming & Simplify)
* Add batch_dim=1 or 2, negative slices tests
* Rename gather_nd_gard_impl.cu
* Use dispatcher to refactor CUDA GatherND/GatherNDGrad
* Change GatherNDBase::CommonComputeKernel --> GatherNDBase::PrepareCompute
* Use HasCudaEnvironment instead of __CUDA_ARCH__ for some double type tests
* Change naming of moments to Moment_x_<weight_name>
* Add checkpointing code and zero checkpoint aggregation
* Correct aggregation for LAMB, cleanup
* Add simple checkpointing test
* Add test for zero checkpoint aggregation
* Fix tests
* fix test
* Review changes
* Fix test after review comment fix
* Fix API, test
* Fix test after API change
* Decouple save load from ORTTrainer
* Add flag to not break checkpointing with ORTModel
Co-authored-by: aishwarya bhandare <aibhanda@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
* allow switching between eval and training modes dynamically
Co-authored-by: Tixxx <root@525204a066204ea794f942530b05ae7f000000.axlncovkyjne5caro2tmz3zryb.xx.internal.cloudapp.net>
* Enable iOS cross build on MacOS (step#1)
* Changed parallel option
* fixed style issues
* Enable ios arm64 crossbuild on MacOS
* Enable parallel build for xcode
* Fix arm64 function not 4-byte aligned warning
* Rename onnxruntime_ios.cmake to onnxruntime_ios.toolchain.cmake
* change build.py to use the new ios toolchain file name
* updated dockerfile.openvino
* Group all RUN commands and add a 'cd WORKDIR' between each
* Update doc with installer and build info
Highlight usage of Online installer package.
Specify --rm option during docker build to avoid caching layer.
Co-authored-by: avidiyal <akhila.vidiyala@intel.com>
* Set gradient as output only for easy mode (#3694)
* Support GPU Event Operators (#3653)
* Add GPU event operators to support in-place updates in
gradient accumulator and optimizer for modifying the tensors
passing through those event operators.
* Address comment and polish code
* Merge shared code between CPU and GPU kernels
* Move event test to a new file
* Address comments
* Update onnxruntime/core/providers/cuda/gpu_data_transfer.cc
* fix path of cpu_featurizers_kernels.cc and cpu_featurizers_kernels.h
Co-authored-by: Weixing Zhang <weixingzhang@users.noreply.github.com>
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
Co-authored-by: Sherlock <baihan.huang@gmail.com>
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Peng Wang (pengwa) <pengwa@microsoft.com>
Co-authored-by: ashbhandare <ash.bhandare@gmail.com>
Co-authored-by: Wei-Sheng Chin <wschin@outlook.com>
Co-authored-by: Ethan Tao <ettao@microsoft.com>
The flags "--enable_wcos --use_winml" fail with the latest VC++ and CMake; it is not yet known which of the two causes the failure. Remove the flags for now so that the pipelines work.
They will be added back before the 1.3 release.
* Support for country-specific holidays in the DateTimeTransformer
Updates the DateTimeTransformer featurizer to support holidays, where holiday information is read from country-specific json files.
* Addressed build breaks
* Enhanced Windows strategies for scenarios when tests run from root dir
* Skipping test for nuget installations
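As a rough illustration of the country-specific holiday lookup described above, the sketch below reads holiday dates from a JSON document and maps a date to a holiday name. The JSON layout, the sample entries, and the helper names `load_holidays` and `holiday_name` are all assumptions made for this example; they are not the DateTimeTransformer's actual file format or API.

```python
import json
from datetime import date, datetime

# Hypothetical file content: one JSON document per country.
SAMPLE_HOLIDAYS_JSON = """
{
  "country": "US",
  "holidays": {
    "2020-01-01": "New Year's Day",
    "2020-07-04": "Independence Day"
  }
}
"""


def load_holidays(json_text):
    """Parse the assumed JSON layout into a {date: holiday name} mapping."""
    raw = json.loads(json_text)
    return {
        datetime.strptime(day, "%Y-%m-%d").date(): name
        for day, name in raw["holidays"].items()
    }


def holiday_name(holidays, day):
    """Return the holiday name for `day`, or an empty string if it is not a holiday."""
    return holidays.get(day, "")


holidays = load_holidays(SAMPLE_HOLIDAYS_JSON)
print(holiday_name(holidays, date(2020, 7, 4)))  # Independence Day
print(holiday_name(holidays, date(2020, 7, 5)) == "")  # True
```

Keeping one data file per country means new countries can be supported by adding data rather than changing code.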
* add windowsai.yml for new Microsoft.AI.MachineLearning nuget
* temporarily add windowsai.yml to gpu.yml
* pass in build arch
* remove install onnx task
* no dml for arm or arm64
* refactor nuget pipeline defs
* update package creation
* pass in build and sources path
* missing hyphens
* copy license file
* fix parameter variable
* disable arm builds for now
* remove commented script block
* download pipeline artifact name update
* set working dir
* Add bundling nuget script
* path combine
* null path
* combine needs parentheses
* binplace microsoft.* dlls in new nuget package
* update artifact name
* move merged nuget to artifacts directory
* move to merged subfolder in artifacts staging dir
* forward slash to back
* enable arm
* vcvarsall needs x64 vars setup
* Run Tests
* fix tests
* move global variables
* update yml to not have global variable in template
* removed parameters
* fixes
* Add build arch as an env variable
* ne not neq
* %Var% for batch script
* dont pass argument for x64
* disable arm tests
* skip csharp/cxx tests for microsoft nuget package
* remove test-win as it tests only c# cxx and capi
* test build for store apps
* dont build for store
* tools/nuget/generate_nuspec_for_native_nuget.py
* remove args.
* add new props and targets for microsoft.ai
* make windowsai props/targets static
* add dependency
* dont ship dot net props
* Remove c# from windowsai nuget
* copy license file
* native packages must have win10 as the platform, not win
* cuda header in wrong if branch
* no dml for arm builds
* only build dml for x64/ x86
* User/sheilk/props update (#3616)
* prelim store work
* props
* Fix desktop nuget props/targets
* clean up targets and make store apps work
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
* update windowsai.yml with latest
* remove extra dloadhelpers
* Add abi headers to abi dir, and reference native includes
* update windowsai.yml
* minor update
* remove parameters
* add doesrp param
* hard code esrp to true
* add directml for x86/x64
* revert gpu yml changes
* add store builds
* add checks again in old way
* dup job names for store and desktop builds
* move all of the runtime binaries to win10 folder
* only set safeseh on x86
* disable the store builds for now... missing msvcprt.lib
* copy paste deletion...
* switch back to win- (#3646)
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
* use stahlworks
* & not supported in ado
* add cuda to cpu nuget(???) and EnableDelayedExpansion to enable x86 dml package
* revert nocontribops
* add underscore...
* extra win/win10 change
* merged nuget... still not being bundled...
* files in merged directory
* missing parens causing dml to be included in cpu package
* more diagnostic info
* switch dir to get-childitem
* wait for compression to complete
* add winml_adapter to mklml and gpu packages
* enable_wcos
* add mklml binaries
* props and targets missing from mklml
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>