Add 'Install ONNX' step to Windows GPU pipeline
Previously this was not a problem because the onnxruntime Python package explicitly declared a dependency on ONNX, so ONNX was installed whenever we tested onnxruntime. However, that dependency was removed in #4073.
1. Avoid building every historical ONNX version in our CI; it is costly and prone to failure.
2. Run docker commands without sudo. Previously the user was not in the docker group; Azure DevOps Service has now added it.
* Revert "Temporarily remove dnnl from Linux CI build to unblock the whole team (#4266)"
Previously it failed because it used too much memory.
Now we only run dnnl EP with opset12 models in unit tests, to reduce peak memory usage.
* Enable onnxruntime_test_all for NNAPI EP
* switch to use ninja for Android CI
* make android emulator boot faster in android ci
* simplify adb push
* more style change
* more tweaking on android ci
* build.py style update
* build e2e cppwinrt tests
* add use nuget task
* make all references to package version prop/target-ified
* remove dupe props/targets reference
* work around project.assets.json error by deleting it
* powershell test invocation
* switch to batch script
* print debug info
* update x86->x64
* stdio.h
* pushd/popd
* add csharp tests
* package.config -> packages.config
* typo
* x86 -> anycpu
* debug is default
* add test path
* update csproj as well
* debug
* really replace all package versions
* debug output
* really use [PackageVersion]
* sleep instead of converting async operation to task and waiting
* dont close software bitmap
* switch to powershell script
* remove binding check
* continue on failure
* continue on error action
* continueOnError and errorActionPreference
* tabbing
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
* Change NNAPI CI to run on new NNAPI EP
* update android ci to mac 10.15 and remove the cmake install
* update the android ci to target android api level 29
* remove unnecessary ndk install git submodule call
1. Increase the job timeout while we investigate why the tests take much longer.
2. Upgrade the linux docker image to manylinux2010, by request from Tianlei. (We had an offline discussion with Pranav and Tracy)
3. Remove the installation of "devtoolset-7" in the CUDA image. It was added for CUDA 10.0 and is not needed for CUDA 10.1, which we have since moved to.
* Add build option to disable traditional ML ops from the binary.
* Fix python tests by splitting tests for ML ops to a separate file. Exclude ML tests from onnx_test_runner and C# tests. Exclude ML op sources.
* Update Edge pkg pipelines with new MLops env variable and fix C# packaging pipeline tests to skip ML ops.
Modify gradle build so artifactID has _gpu for GPU builds.
Pass USE_CUDA flag on CUDA build
Adjust publishing pipelines to extract POM from a correct path.
Co-Authored-By: @Craigacp
1. Enlarge the read buffer size further so that our code runs even faster. TODO: apply similar changes to the Python and other language bindings.
2. Add coreml_VGG16_ImageNet to the test exclusion set of x86_32. It is not a new model but previously we didn't run the test against x86_32.
* try mac pipeline
* fix path separator
* copy prebuilds folder
* split esrp yaml for win/mac
* disable mac signing temporarily
* add linux
* fix indent
* add nodetool in linux
* add nodetool in win-ci-2019
* replace linux build by custom docker scripts
* use manylinux as node 12.16 not working on centos6
* try ubuntu
* loosen timeout for test case - multiple runs calls
1. Fix the nuget cpu pipeline and put code coverage pipeline back.
2. Reduce onnx_test_runner's default logging level from WARNING to ERROR, because there are too many log messages now.
3. Enlarge the protobuf read buffer size for onnx_test_runner. It was missed from PR #4020.
- Add support for ENABLE_LANGUAGE_INTEROP_OPS in training build which is enabled for nightly builds
- Fix passing of environment variables to `sudo docker run` in build definitions
- Fix setup.py package naming logic
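
The environment-variable fix above can be illustrated with a small Python sketch. It assumes the build definition assembles the `docker run` argument list itself; `docker_run_command`, the image name, and the variable name are hypothetical and not taken from the actual build definitions. The `-e NAME=value` flag is Docker's standard way to set a variable inside the container. Variables exported in the calling shell are not forwarded automatically, and `sudo` typically resets the environment as well.

```python
import shlex


def docker_run_command(image, env, command, use_sudo=False):
    """Build an argv list for `docker run` that forwards `env` explicitly."""
    argv = ["sudo", "docker"] if use_sudo else ["docker"]
    argv += ["run", "--rm"]
    # Each variable is passed with its own -e flag so it reaches the container
    # regardless of what the calling shell (or sudo) does with its environment.
    for name, value in sorted(env.items()):
        argv += ["-e", f"{name}={value}"]
    argv.append(image)
    argv += list(command)
    return argv


if __name__ == "__main__":
    # Hypothetical image and variable, for illustration only.
    cmd = docker_run_command(
        "example-image:latest",
        {"BUILD_CONFIG": "Release"},
        ["python3", "build.py"],
        use_sudo=True,
    )
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids quoting problems when a value contains spaces.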
* Add flake8 to Win CI build so it's re-enabled. It was in the static analysis build that is currently disabled so checks are not running.
Fix build.py to be compliant again.
Add prefix to flake8 output so it's (hopefully) easier to identify the errors in build output.
* Add to all builds in Windows CPU CI so they all fail quickly if there's an issue.
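
A minimal sketch of the output-prefixing idea mentioned above, assuming the flake8 output is captured as text before being written to the build log. The helper name and the `flake8: ` prefix are hypothetical; the prefix actually used by the build is not stated here.

```python
def prefix_lines(output, prefix="flake8: "):
    """Return `output` with `prefix` prepended to every non-empty line."""
    return "\n".join(
        prefix + line if line else line for line in output.splitlines()
    )


if __name__ == "__main__":
    sample = "tools/ci_build/build.py:10:80: E501 line too long"
    print(prefix_lines(sample))
```

With a fixed prefix, the flake8 findings can be located in a long build log with a simple text search.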
Add transformer glue test example to show how to use ORTTrainer to fine-tune a transformer model
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
In this PR, we
1. create some APIs for creating NVTX objects
2. apply those APIs in pipeline-related operators and sequential executor.
As a result, we can explicitly see how a pipeline schedule is run by GPUs in
Nvidia's visual profiler. Note that these APIs are Linux only due to Nvidia's
limited support.
* Remove 'model_.' prefix for onnx model initializers in training
* fix test case remove redundant device test
* rename
* Fix state_dict/load_state_dict with frozen_weight
* nit
* Add monkey patch for pt opset 10
* remove pt patch in CI
* nit: newline
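
The prefix removal described at the top of this list can be sketched as a plain renaming step. This is an illustration only: `strip_prefix` and the example initializer names are hypothetical, and the real change operates on the ONNX model's initializers rather than on a list of strings.

```python
def strip_prefix(names, prefix="model_."):
    """Map each name to the same name with a leading `prefix` removed."""
    return {
        name: name[len(prefix):] if name.startswith(prefix) else name
        for name in names
    }


if __name__ == "__main__":
    # Hypothetical initializer names.
    for old, new in strip_prefix(["model_.fc1.weight", "fc2.bias"]).items():
        print(f"{old} -> {new}")
```

Only a leading occurrence is removed, so a name that merely contains `model_.` somewhere in the middle is left unchanged.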
Change training perf test build to use "docker" instead of "sudo docker". The training perf test build runs in an environment that supports calling "docker" and not "sudo docker".
* gpt2 training perf
* gpt2 training perf
* debug
* debug
* debug
* fix bug
* minor
* on comments
* dynamic sql
* fix build
* minor
* linked hash
* on comments
* minor
* mem
* minor
Co-authored-by: Ethan Tao <ettao@microsoft.com>
Update install_deps.sh to use relative path from script directory to symbolic_opset10.py. This allows install_deps.sh to be called from different working directories.
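
The same idea, expressed as a Python sketch rather than the shell used by install_deps.sh: resolve the file against the directory that contains the script, not against the current working directory. The helper name is hypothetical, and the relative location of symbolic_opset10.py in the repository is not stated here, so the file name alone is used.

```python
from pathlib import Path


def resolve_from_script_dir(script_file, relative_path):
    """Resolve `relative_path` against the directory holding `script_file`."""
    script_dir = Path(script_file).resolve().parent
    return (script_dir / relative_path).resolve()


if __name__ == "__main__":
    # The result does not depend on where the script is invoked from.
    print(resolve_from_script_dir(__file__, "symbolic_opset10.py"))
```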
* [java] - adding a cuda enabled test.
* Adding --build_java to the windows gpu ci pipeline.
* Removing a stray line from the unit tests that always enabled CUDA for Java.
* Enable running PEP8 checks via flake8 as part of the build if flake8 is installed.
Update scripts in \tools and \onnxruntime\python. Excluding \onnxruntime\python\tools which needs a lot more work to be PEP8 compliant. Also excluding orttraining\tools for the same reason.
Install flake8 as part of the static_analysis build task in the Win-CPU CI so the checks are run in one CI build.
Update coding standards doc.
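
A minimal sketch of the "run flake8 only if it is installed" behaviour described in this change. It is not the build's actual implementation, and the function name and parameters are hypothetical. It assumes flake8 is invoked as an executable found on PATH; `runner` and `which` are injectable only to make the sketch easy to exercise without flake8 present.

```python
import shutil
import subprocess
import sys


def run_flake8_if_available(paths, runner=subprocess.run, which=shutil.which):
    """Run flake8 on `paths` if the executable is on PATH.

    Returns the flake8 exit code, or None when flake8 is not installed.
    """
    exe = which("flake8")
    if exe is None:
        print("flake8 not found; skipping PEP8 checks", file=sys.stderr)
        return None
    result = runner([exe, *paths])
    return result.returncode


if __name__ == "__main__":
    code = run_flake8_if_available(["tools"])
    # A missing flake8 is not treated as a failure; a non-zero exit code is.
    sys.exit(0 if code is None else code)
```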
* Added aarch64 build pipeline
* Fix build error
* Remove auditwheel repair which doesn't work with cross compiling
* Statically link C++
* Added auditwheel repair back and fix stdlib.h
* Remove extra space
* Add signed nuget package to publish ort-nightly nuget feed
* Push managed nuget as well
* Indentation fix
* Indentation fix
* Update gpu.yml to also publish directml nuget
* Fix typo in naming of task
* Fix C# log APIs. Fixes github issue #3409.
* Fix build error due to accidental duplication of GraphOptimizationLevel
* Fix runoptions
* Fix broken test. Add --blame switch to dotnet test cmd line to print the failed test in case of crash.
* initial change to transformer.py
* prepare e2e transformer tests
* refactor transformer tests
* put test python files in a flat folder
* fix typo pip install transform(s)
* python 3.6
* python version to 3.6 in install_ubuntu.sh
* remove argparser
* to use opset ver 12
* workaround loss_scale naming patch in case of loss_fn_
* assign self.loss_fn_ so it can be checked
* skip a few un-needed post-process steps
* fix loss_scale_input_name, clean up post process steps
* skip non-frontend tests
* move cpu/cuda related files to corresponding cpu/cuda folder (#3668)
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
* type cast for ratio is not necessary for dropout (#3682)
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
* thrustallocator is not needed since cub is used directly for gather now. (#3683)
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
* GatherND-12 Implementation (#3645)
* Renamed, UT passing
* Move GatherND CUDA Kernel into onnxruntime
* Merge GatherNDOpTest
* Refactor Test code
* Merge CPU Kernel Impl
* Handle Negative Indices, Fix UT
* Improve CUDA kernel to handle negative index
* Minor Fixes
* Preserve GatherND-1 Cuda kernel
* Fix Mac build
* fix UT
* Fix Build
* fix GatherNDOpTest.double > CUDA error cudaErrorInvalidDeviceFunction:invalid device function
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Peng Wang (pengwa) <pengwa@microsoft.com>
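
The negative-index handling mentioned in the GatherND-12 commits above can be sketched as follows. The ONNX GatherND specification allows each index along an axis of size s to lie in [-s, s - 1], with negative values counting from the end. The helper below is a hypothetical illustration of that normalization and does not reproduce the CPU or CUDA kernel code.

```python
def normalize_index(index, dim_size):
    """Map an index in [-dim_size, dim_size - 1] to [0, dim_size - 1]."""
    if not -dim_size <= index < dim_size:
        raise IndexError(
            f"index {index} is out of bounds for axis of size {dim_size}"
        )
    return index + dim_size if index < 0 else index


if __name__ == "__main__":
    # -1 refers to the last element of an axis of size 4.
    print(normalize_index(-1, 4))
```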
* update with reviewers' comments
* testBertTrainingGradientAccumulation was not using rtol and may fail occasionally with small (e-06) difference
* fix merge mistakes
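
The rtol fix for testBertTrainingGradientAccumulation noted above comes down to comparing floating-point results with a relative tolerance instead of exact equality. A minimal standard-library sketch follows; the helper name and the default tolerance are assumptions, and the actual test may use a different comparison routine such as NumPy's, whose tolerance formula differs slightly from `math.isclose`.

```python
import math


def assert_all_close(actual, expected, rtol=1e-4, atol=0.0):
    """Raise AssertionError unless every pair agrees within the tolerances."""
    if len(actual) != len(expected):
        raise AssertionError(f"length mismatch: {len(actual)} != {len(expected)}")
    for i, (a, e) in enumerate(zip(actual, expected)):
        if not math.isclose(a, e, rel_tol=rtol, abs_tol=atol):
            raise AssertionError(
                f"index {i}: {a} != {e} (rtol={rtol}, atol={atol})"
            )


if __name__ == "__main__":
    # A difference on the order of 1e-06 passes with a relative tolerance.
    assert_all_close([1.000001, 2.0], [1.0, 2.0])
    print("ok")
```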
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Weixing Zhang <weixingzhang@users.noreply.github.com>
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
Co-authored-by: Sherlock <baihan.huang@gmail.com>
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Peng Wang (pengwa) <pengwa@microsoft.com>
The flags "--enable_wcos --use_winml" don't work with the latest VC++ and CMake. I don't know which of the two caused the failure. Remove the flags for now to get the pipelines working.
Will add them back before the 1.3 release.
* add windowsai.yml for new Microsoft.AI.MachineLearning nuget
* temporarily add windowsai.yml to gpu.yml
* pass in build arch
* remove install onnx task
* no dml for arm or arm64
* refactor nuget pipeline defs
* update package creation
* pass in build and sources path
* missing hyphens
* copy license file
* fix parameter variable
* disable arm builds for now
* remove commented script block
* download pipeline artifact name update
* set working dir
* Add bundling nuget script
* path combine
* null path
* combine needs parentheses
* binplace microsoft.* dlls in new nuget package
* update artifact name
* move merged nuget to artifacts directory
* move to merged subfolder in artifacts staging dir
* forward slash to back
* enable arm
* vcvarsall needs x64 vars setup
* Run Tests
* fix tests
* move global variables
* update yml to not have global variable in template
* removed parameters
* fixes
* Add build arch as an env variable
* ne not neq
* %Var% for batch script
* dont pass argument for x64
* disable arm tests
* skip csharp/cxx tests for microsoft nuget package
* remove test-win as it tests only c# cxx and capi
* test build for store apps
* dont build for store
* tools/nuget/generate_nuspec_for_native_nuget.py
* remove args.
* add new props and targets for microsoft.ai
* make windowsai props/targets static
* add dependency
* dont ship dot net props
* Remove c# from windowsai nuget
* copy license file
* native packages must have win10 as the platform, not win
* cuda header in wrong if branch
* no dml for arm builds
* only build dml for x64/ x86
* User/sheilk/props update (#3616)
* prelim store work
* props
* Fix desktop nuget props/targets
* clean up targets and make store apps work
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
* update windowsai.yml with latest
* remove extra dloadhelpers
* Add abi headers to abi dir, and reference native includes
* update windowsai.yml
* minor update
* remove parameters
* add doesrp param
* hard code esrp to true
* add directml for x86/x64
* revert gpu yml changes
* add store builds
* add store builds
* add checks again in old way
* dup job names for store and desktop builds
* move all of the runtime binaries to win10 folder
* only set safeseh on x86
* disable the store builds for now... missing msvcprt.lib
* copy paste deletion...
* switch back to win- (#3646)
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
* use stahlworks
* & not supported in ado
* add cuda to cpu nuget(???) and EnableDelayedExpansion to enable x86 dml package
* revert nocontribops
* add underscore...
* extra win/win10 change
* merged nuget... still not being bundled...
* files in merged directory
* missing parens causing dml to be included in cpu package
* more diagnostic info
* switch dir to get-childitem
* wait for compression to complete
* add winml_adapter to mklml and gpu packages
* enable_wcos
* add mklml binaries
* props and targets missing from mklml
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>