* Enable iOS cross build on MacOS (step#1)
* Changed parallel option
* fixed style issues
* Enable ios arm64 crossbuild on MacOS
* Enable parallel build for xcode
* Fix arm64 function not 4-byte aligned warning
* Rename onnxruntime_ios.cmake to onnxruntime_ios.toolchain.cmake
* change build.py to use the new ios toolchain file name
* updated dockerfile.openvino
* Group all RUN commands and add a 'cd WORKDIR' between each
* Update doc with installer and build info
Highlight usage of Online installer package.
Specify --rm option during docker build to avoid caching layer.
Co-authored-by: avidiyal <akhila.vidiyala@intel.com>
* move cpu/cuda related files to corresponding cpu/cuda folder (#3668)
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
* type cast for ratio is not necessary for dropout (#3682)
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
* thrustallocator is not needed since cub is used directly for gather now. (#3683)
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
* GatherND-12 Implementation (#3645)
* Renamed, UT passing
* Move GatherND CUDA Kernel into onnxruntime
* Merge GatherNDOpTest
* Refactor Test code
* Merge CPU Kernel Impl
* Handle Negative Indices, Fix UT
* Improve CUDA kernel to handle negative index
* Minor Fixes
* Preserve GatherND-1 Cuda kernel
* Fix Mac build
* fix UT
* Fix Build
* fix GatherNDOpTest.double > CUDA error cudaErrorInvalidDeviceFunction:invalid device function
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Peng Wang (pengwa) <pengwa@microsoft.com>
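
A minimal sketch of the negative-index handling mentioned in the GatherND items above. It is plain Python on nested lists, not the C++/CUDA kernels from this change, and it assumes the usual convention that an index `i < 0` on an axis of size `s` refers to element `i + s`, with valid values in `[-s, s-1]`. The helper names `shape_of` and `gather_nd` are hypothetical and the sketch ignores `batch_dims`.

```python
def shape_of(data):
    """Return the shape of a rectangular nested list."""
    shape = []
    while isinstance(data, list):
        shape.append(len(data))
        if not data:
            break
        data = data[0]
    return shape


def gather_nd(data, indices):
    """Gather one element or slice of `data` per index tuple in `indices`.

    Each index tuple addresses the leading axes of `data`. Negative
    entries are wrapped by adding the axis size (assumed convention).
    """
    shape = shape_of(data)
    result = []
    for index_tuple in indices:
        item = data
        for axis, idx in enumerate(index_tuple):
            dim = shape[axis]
            # Reject anything outside [-dim, dim - 1] before wrapping.
            if not -dim <= idx < dim:
                raise IndexError(f"index {idx} out of range for axis {axis} of size {dim}")
            if idx < 0:
                idx += dim
            item = item[idx]
        result.append(item)
    return result


data = [[0, 1], [2, 3]]
positive = gather_nd(data, [[0, 0], [1, 1]])
negative = gather_nd(data, [[-1, -1], [0, -1]])
row = gather_nd(data, [[-1]])
```

Wrapping is done once per axis before the lookup, so an out-of-range value such as `2` or `-3` on an axis of size 2 raises instead of silently reading the wrong element.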
* Set gradient as output only for easy mode (#3694)
* Support GPU Event Operators (#3653)
* Add GPU event operators to support in-place updates in
gradient accumulator and optimizer for modifying the tensors
passing through those event operators.
* Address comment and polish code
* Merge shared code between CPU and GPU kernels
* Move event test to a new file
* Address comments
* Update onnxruntime/core/providers/cuda/gpu_data_transfer.cc
* fix path of cpu_featurizers_kernels.cc and cpu_featurizers_kernels.h
Co-authored-by: Weixing Zhang <weixingzhang@users.noreply.github.com>
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
Co-authored-by: Sherlock <baihan.huang@gmail.com>
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Peng Wang (pengwa) <pengwa@microsoft.com>
Co-authored-by: ashbhandare <ash.bhandare@gmail.com>
Co-authored-by: Wei-Sheng Chin <wschin@outlook.com>
Co-authored-by: Ethan Tao <ettao@microsoft.com>
The flags "--enable_wcos --use_winml" fail with the latest VC++ and CMake; it is unclear which of the two causes the failure. Remove the flags for now so the pipelines work.
They will be added back before the 1.3 release.
* Support for country-specific holidays in the DateTimeTransformer
Updates the DateTimeTransformer featurizer to support holidays; holiday information is read from country-specific JSON files.
* Addressed build breaks
* Enhanced Windows strategies for scenarios when tests run from root dir
* Skipping test for nuget installations
* add windowsai.yml for new Microsoft.AI.MachineLearning nuget
* temporarily add windowsai.yml to gpu.yml
* pass in build arch
* remove install onnx task
* no dml for arm or arm64
* refactor nuget pipeline defs
* update package creation
* pass in build and sources path
* missing hyphens
* copy license file
* fix parameter variable
* disable arm builds for now
* remove commented script block
* download pipeline artifact name update
* set working dir
* Add bundling nuget script
* path combine
* null path
* combine needs parentheses
* binplace microsoft.* dlls in new nuget package
* update artifact name
* move merged nuget to artifacts directory
* move to merged subfolder in artifacts staging dir
* forward slash to back
* enable arm
* vcvarsall needs x64 vars setup
* Run Tests
* fix tests
* move global variables
* update yml to not have global variable in template
* removed parameters
* fixes
* Add build arch as an env variable
* ne not neq
* %Var% for batch script
* dont pass argument for x64
* disable arm tests
* skip csharp/cxx tests for microsoft nuget package
* remove test-win as it tests only c# cxx and capi
* test build for store apps
* dont build for store
* tools/nuget/generate_nuspec_for_native_nuget.py
* remove args.
* add new props and targets for microsoft.ai
* make windowsai props/targets static
* add dependency
* dont ship dot net props
* Remove c# from windowsai nuget
* copy license file
* native packages must have win10 as the platform, not win
* cuda header in wrong if branch
* no dml for arm builds
* only build dml for x64/ x86
* User/sheilk/props update (#3616)
* prelim store work
* props
* Fix desktop nuget props/targets
* clean up targets and make store apps work
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
* update windowsai.yml with latest
* remove extra dloadhelpers
* Add abi headers to abi dir, and reference native includes
* update windowsai.yml
* minor update
* remove parameters
* add doesrp param
* hard code esrp to true
* add directml for x86/x64
* revert gpu yml changes
* add store builds
* add checks again in old way
* dup job names for store and desktop builds
* move all of the runtime binaries to win10 folder
* only set safeseh on x86
* disable the store builds for now... missing msvcprt.lib
* copy paste deletion...
* switch back to win- (#3646)
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
* use stahlworks
* & not supported in ado
* add cuda to cpu nuget(???) and EnableDelayedExpansion to enable x86 dml package
* revert nocontribops
* add underscore...
* extra win/win10 change
* merged nuget... still not being bundled...
* files in merged directory
* missing parens causing dml to be included in cpu package
* more diagnostic info
* switch dir to get-childitem
* wait for compression to complete
* add winml_adapter to mklml and gpu packages
* enable_wcos
* add mklml binaries
* props and targets missing from mklml
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>