onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-05 04:17:53 +00:00

Author	SHA1	Message	Date
Thiago Crepaldi	42408aa3ed	Add new PyTorch front-end (#4815 ) * Add ORTTrainerOptions class for the new pytorch frontend (#4382) Add ORTTrainerOptions class and some placeholders * Add _ORTTrainerModelDesc to perform validation for model description (#4416) * Add Loss Scaler classes to the new frontend (#4306) * Add TrainStepInfo used on the new frontend API (#4256) * Add Optimizer classes to the new frontend (#4280) * Add LRScheduler implementation (#4357) * Add basic ORTTrainer API (#4435) This PR presents the public API for ORTTrainer for the short term development. It also validates and saves input parameters, which will be used in the next stages, such as building ONNX model, post processing the model and configuring the training session * Add opset_version into ORTTrainerOptions and change type of ORTTrainer.loss_fn (#4592) * Update ModelDescription and minor fix on ORTTrainer ctor (#4605) * Update ModelDescription and minor fix on ORTTrainer/ORTTrainerOptions This PR keeps the public API intact, but changes how model description is stored on the backend Currently, users create a dict with two lists of tuples. One list called 'inputs' and each tuple has the following format tuple(name, shape). The second list is called 'outputs' and each tuple can be either tuple(name, shape) or tuple(name, shape, is_loss). With this PR, when this dict is passed in to ORTTrainer, it is fully validated as usual. However, tuples are internally replaced by namedtuples and all output tuples will have tuple(name, shape, is_loss) format instead of is_loss being optionally present. Additionally to that normalization in the internal representation (which eases coding), two internal methods were created to replace a namedtuple(name, shape) to namedtuple(name, shape, dtype) or namedtuple(name, shape, is_loss, dtype) depending on whether the tuple is an input or output. This is necessary as ORTTrainer finds out data types of each input/output during model export to onnx.
Finally, a minor fix was done on ORTTrainer. It could initialize ORTTrainerOptions incorrectly when options=None * Rename input name for test * Add ONNX Model Export to New Frontend (#4612) Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * Create training session + minor improvements (#4668) Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> * Save ONNX model in file (#4671) Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> * Add eval step (#4674) Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> * Add train_step (#4677) Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> * Add LR Scheduler (#4694) Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * Add deterministic compute tests (#4716) Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * Add legacy vs experimental ORTTrainer accuracy comparison (#4727) Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * Add Mixed precision/LossScaler + several fixes (#4739) Additionally to the mixed precision/loss scaler code, this PR includes: * Fix CUDA training * Add optimization_step into TrainStepInfo class * Refactor LRSCheduler to use optimization_step instead of step * Updated several default values at ORTTrainerOptions * Add initial Gradient Accumulation supported. 
Untested * Fix ONNX model post processing * Refactor unit tests * Add ONNX BERT example + minor fixes (#4757) * Fix training issue when passing ONNX file into ORTTrainer Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> * Add Dynamic Shape support (#4758) * Update DeepSpeed Zero Stage option to a separate option group (#4772) * Add support to fetches (#4777) * Add Gradient Accumulation Steps support (#4793) * Fix Dynamic Axes feature and add unit test (#4795) * Add frozen weights test (#4807) * Move new pytorch front-end to 'experimental' namespace (#4814) * Fix build Co-authored-by: Rayan-Krishnan <rayankrishnan@live.com> Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-08-17 09:45:25 -07:00
Changming Sun	5eec4f66ed	Refactor manylinux docker image and the related pipelines (#4751 ) 1. Publish the image ACR, instead of building it every time for every PR 2. Make USE_MKLML and USE_OPENMP be able to co-exist. Currently both of them are enabled in our Linux CI build but indeed only one of them is taking effect. 3. Split nuphar and DNNL to separated pipelines. 4. Fix two warnings in onnxruntime/core/optimizer/matmul_scale_fusion.cc and onnxruntime/test/tvm/tvm_basic_test.cc. 5. Update the manylinux2010_x86_64 image to the latest.	2020-08-17 09:40:31 -07:00
Yulong Wang	aa993e95c9	enable build flag '--use_openmp' on MacOS (#4774 ) * enable build flag '--use_openmp' on MacOS * cmake 3.16.1 to enable find_package(OpenMP) on mac	2020-08-13 15:56:42 -07:00
jingyanwangms	adda8c66d9	Docker image release pipeline (#4682 ) * create orttraining-1p-linux-gpu-ci-pipeline.yml * fix syntax * fix file path * fix template path * publish docker image to test acr * use right task name * change parameter list * use variables * use python.version * remove --enable_onnx_tests due to segfault * add back --enable_onnx_tests * fix docker push command line * change docker login command * login differently * fix docker tag script * create password.txt * add ortrelease docker image * enable test in build.sh * add pipeline parameter * add pipeline parameter * change timeout * change timeout * fix run_dockerbuild.sh * use PR checkin build docker * fix strategy syntax * fix strategy syntax * change dockerfile * change run_dockerbuild.sh * change tag name * build with root user * use build id for docker image tag * remove all user lines * change docker tag * add mpi, mellanox * add missing args * use release dockerfile for ci build * remove install wheel * use release docker image * fix syntax * use different pool * add Dockerfile.training * remove sudo to run on Linux-Multi-GPU-V100 * change docker file path * update dockerfile * use latest dockerfile * change agent pool * remove --preserve-env * add back parameter * Add test_flag * use azuredevops docker * change repository * use cmd for docker login * echo build script * use ortrelrease ACR * change key vault connection * Move --build flag * change build command * add paramter for image tag * clean up for PR * remove unnecessary changes * whitespace changes * whitespace changes * change build flag * change flag name * change flag * use latest dockerfile * enable build tests * build builder stage and run test * Add back python.version * change build directory * always run build entire dockerfile * fix yml syntax * fix syntax * add en-UTF8 locale * rename * remove unused template * Update orttraining-linux-gpu-docker-release-pipeline.yml for Azure Pipelines * Update 
orttraining-linux-gpu-docker-release-pipeline.yml for Azure Pipelines * Test commit sha1 in pipeline * fix parameter * update docker file * fix --from=build * remove commented blocks * PR comments * fix syntax * fix syntax * use timestamp as build number * remove latest tag * add build_timestamp variable * remove wrong property * fix docker run command * test build id * Use datestamp build id * change build tags * add no-cache to docker build * rename BUILD_VERSION -> BUILD_CONFIG Co-authored-by: Jingyan Wang <jingywa@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Jingyan Wang <jingywa@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-08-12 13:29:37 -07:00
Dmitri Smirnov	ac4997665a	Make Java Publishing and Java GPU pipelines to run nightly (#4749 ) Schedule Java daily Bump up Linux GPU build timeout	2020-08-10 17:38:45 -07:00
stevenlix	77c69a0325	Upgrade TensorRT to v7.1.3.4 (#4704 ) * upgrade to TensorRT 7.1.3.4 * Upgrade onnx-tensorrt parser for TensorRT 7.1.3.4 * fix format issue * fix format issue * fix format issue * Update tensorrt_execution_provider.cc * change cmake version to 3.14 * Remove --msvc_toolset 14.16 * change to onnxruntime::make_unique * use onnxruntime::make_unique * disable some tests for TensorRT * disable some tests for TensorRT * Update upsample_op_test.cc * Update tile_op_test.cc * disable some tests for TensorRT * Update constant_of_shape_test.cc * update parser * Update Dockerfile.ubuntu_tensorrt	2020-08-07 17:43:56 -07:00
Sheil Kumar	5c5efa900d	Add .NET Core 3.0 nuget e2e pipeline tests (#4695 ) * bump cswinrt version * add cswinrt * test dotnetcore 3.0 * rename buildpacakge source * set folder path to the package source and not the version * refactor .netframework tests * build .net core anycpu Co-authored-by: Sheil Kumar <sheilk@microsoft.com>	2020-08-05 13:02:24 -07:00
Changming Sun	d0297f8d24	Add 'Install ONNX' step to Windows GPU pipeline (#4696 ) Add 'Install ONNX' step to Windows GPU pipeline Previously it's not a problem because onnxruntime python package explicitly said it depends on ONNX, so ONNX will get installed when we test onnxruntime. However, it was removed in #4073	2020-08-03 18:51:24 -07:00
Changming Sun	01ca6392cb	Avoid building ONNX of every history ONNX versions in our CI (#4678 ) 1. Avoid building ONNX of every history ONNX versions in our CI, it is costly and easy to fail. 2. Run docker command without sudo. Previously the user is not in docker group, now Azure DevOps Service have added it in.	2020-08-03 10:18:10 -07:00
Changming Sun	f9f25c5559	Remove featurizer from CI build (#4661 )	2020-07-30 18:37:55 -07:00
Changming Sun	51332e3c81	Change Linux CI build time out value to 3 hours (#4664 ) Because it often needs more than 1 hr 55 minutes, increase the value so that we are less likely to see the pipeline fail.	2020-07-30 02:52:05 -07:00
Xiang Zhang	d73e01e5b9	remove ENABLE_TELEMETRY macro (#4633 )	2020-07-27 20:06:11 -07:00
gwang-msft	c2ec3b734b	[Android NNAPI EP] Remove dependency on external JD/DNNLibrary (#4576 ) * remove dependency of external jd-dnnlibrary * remove extra variables not used any more * update /cgmanifest.json	2020-07-22 14:08:12 -07:00
Sheil Kumar	fa6d035090	Create WindowsAI zip files automatically as part of the pipeline (#4584 ) * copy rename nupkg to zip as part of build task * update both symbols and regular package Co-authored-by: Sheil Kumar <sheilk@microsoft.com>	2020-07-22 10:53:47 -07:00
Changming Sun	c2c4e6760b	Fix code sign validation errors in nuget and nodejs pipeline (#4527 )	2020-07-20 14:18:47 -07:00
Changming Sun	bc1d197ddf	Re-enable dnnl in CI build (#4544 ) * Revert "Temporarily remove dnnl from Linux CI build to unblock the whole team (#4266)" Previously it fails because it used too much memory. Now we only run dnnl EP with opset12 models in unit tests, to reduce peak memory usage.	2020-07-19 23:20:03 -07:00
Yulong Wang	5086e55a35	Fix condition of running tests in win CI (#4459 )	2020-07-16 16:33:30 -07:00
Changming Sun	8ada440961	Move model tests to onnxruntime_test_all (#4521 ) 1. Move model tests to onnxruntime_test_all 2. Publish TestResults of Windows CI build.	2020-07-15 16:46:18 -07:00
edgchen1	34f73fa1aa	Add sudo --preserve-env option to allow environment to go through to docker commands. (#4512 )	2020-07-14 18:12:31 -07:00
liqunfu	f721f5f1cd	Liqun/multiple choice (#4480 ) * multiple choice runner * add docker cleanup task to frontend pipeline	2020-07-14 17:57:58 -07:00
Sheil Kumar	ee5ca27ae2	Split Microsoft.AI.MachineLearning.nupkg in a NuGet package and symbol NuGet package (#4503 ) * add threadpool interface * generate snupkgs * include_pdb check * fix snupkg generation * Add task to merge snupkgs * folder exists * check dir * revert thread pool stuff Co-authored-by: Sheil Kumar <sheilk@microsoft.com>	2020-07-14 14:52:39 -07:00
gwang-msft	5f8f443ac4	Android CI build, test copy, emulator boot improvement (#4481 ) * Enable onnxruntime_test_all for NNAPI EP * switch to use ninja for Android CI * make android emulator boot faster in android ci * simplify adb push * more style change * more tweaking on android ci * build.py style update	2020-07-13 14:18:34 -07:00
Dmitri Smirnov	35ee00d888	Pin typing version. (#4490 )	2020-07-13 11:48:30 -07:00
Hariharan Seshadri	26ebcfab88	Fix Nuget GPU pipeline (#4462 )	2020-07-10 14:02:28 -07:00
Yulong Wang	bec18eb3f4	[Node.js binding] support CentOS 7 in CI (#4447 )	2020-07-09 00:59:50 -07:00
Negin Raoof	71aec2adcb	Custom op export test template (#4383 ) * Adding pytorch custom op export tests to CI * Test clean build * Fix export for intended failure * update export script * Build onnxruntime	2020-07-08 10:14:56 -07:00
Hariharan Seshadri	6d6b6b54a5	Support binding a graph output to a specific device via the Python binding (#4439 )	2020-07-07 21:09:37 -07:00
Sheil Kumar	fdb4a3a2e8	Add cppwinrt and cswinrt tests in windowsai nuget pipeline (#4381 ) * build e2e cppwinrt tests * add use nuget task * make all referenced to package version prop/target-ified * remove dupe props/targets reference * work around project.assets.json error by deleting it * powershell test invocation * switch to batch script * print debug info * update x86->x64 * stdio.h * pushd/popd * add csharp tests * package.config -> packages.config * typo * x86 -> anycpu * debug is default * add test path * update csproj as well * debug * really replace all package versions * debug output * really use [PackageVersion] * sleep intead of converting async operation to task and waiting * dont close software bitmap * switch to powershell script * remove binding check * continue on failure * continuse on error action * continueOnError and errorActionPreference * tabbing Co-authored-by: Sheil Kumar <sheilk@microsoft.com>	2020-07-07 09:36:42 -07:00
suffiank	7a05b3ca87	Increase python packaging pipeline timeout (#4412 ) * increase python packaging pipeline from 90 to 110 min * change timeout to Linux GPU and do 120 min to match Win GPU	2020-07-02 15:38:39 -07:00
gwang-msft	0bef9d5114	Fix the broken Android NNAPI CI (#4403 ) * Change NNAPI CI to run on new NNAPI EP * update android ci to mac 10.15 and remove in install cmake * update the android ci to target android api level 29 * remove unnecessary ndk install git submodule call	2020-07-02 10:22:18 -07:00
Changming Sun	3bb6a865cc	Revert "remove openmp and scipy from build pipelines (#4305 )"	2020-07-02 00:30:02 -07:00
Tiago Koji Castro Shibata	7fea332f93	Support builds without RTTI (#4333 ) * Support builds without RTTI * Disable RTTI in all builds	2020-07-01 13:05:35 -07:00
Dmitri Smirnov	49268c42da	Change the way java home is set on Mac OS for CI and Java publishing pipeline (#4385 ) * Change the way java_home is set on Mac. * Change the way JAVA_HOME is set on Mac OS	2020-07-01 07:37:14 -07:00
Negin Raoof	37cbe8551d	Adding export registration and tests for custom ops (#4248 )	2020-06-25 22:29:02 -07:00
Changming Sun	5db67ec000	Fix python package issue and upgrade the linux image to 2010 (#4342 ) 1. Increase job timeout, while we are investigating why the tests take much longer 2. Upgrade the linux docker image to manylinux2010, by request from Tianlei. (We had an offline discussion with Pranav and Tracy) 3. Remove the installation of "devtoolset-7" in the CUDA image. It was added for CUDA 10.0, it is not needed for CUDA 10.1. We have moved to CUDA 10.1.	2020-06-25 20:22:39 -07:00
Dmitri Smirnov	a08805daf9	Fix a minor typo in POM file name (#4250 ) Co-authored-by: Changming Sun <chasun@microsoft.com>	2020-06-25 11:15:14 -07:00
Changming Sun	deea945f80	Remove openmp and scipy from build pipelines (#4305 ) 1. Remove openmp because the default thread pool is already good enough. 2. Remove scipy from build pipelines because it stops support python 3.5.	2020-06-23 20:18:16 -07:00
edgchen1	4e39fda06a	Fix version of torch and torchvision in install_deps.sh. (#4316 )	2020-06-23 14:55:18 -07:00
edgchen1	737c22a911	Refactor Python packaging builds (#4283 ) Reuse the same template file for all Python packaging builds.	2020-06-22 17:13:22 -07:00
Pranav Sharma	2204d39a06	Add build option to disable traditional ML ops from the binary. (#4272 ) * Add build option to disable traditional ML ops from the binary. * Fix python tests by splitting tests for ML ops to a separate file. Exclude ML tests from onnx_test_runner and C# tests. Exclude ML op sources. * Update Edge pkg pipelines with new MLops env variable and fix C# packaging pipeline tests to skip ML ops.	2020-06-20 06:36:06 -07:00
Changming Sun	0349479b19	Fix component governance and codesign validation errors (#4277 ) Adjust the job steps so that these security tasks run before the build directory clean up.	2020-06-18 15:54:18 -07:00
Changming Sun	43deec2174	Temporarily remove dnnl from Linux CI build to unblock the whole team (#4266 )	2020-06-17 16:25:24 -07:00
edgchen1	63bf587623	Use azcopy to download test data (#4221 ) Use azcopy from download_e2e_test_data.py, add helper function for downloading azcopy. Update download_test_data.py to use helper function.	2020-06-16 10:14:34 -07:00
Hariharan Seshadri	91a41298cc	Fix ORT build when onnxruntime_PYBIND_EXPORT_OPSCHEMA is enabled (#3954 )	2020-06-12 19:32:57 -07:00
Changming Sun	6f4320fb85	Fix the python package name issue (#4207 ) Fix the package package name issue. In my last change(#4197) about enabling code sign. I forgot to pass the additional flags to setup.py,	2020-06-12 08:32:59 -07:00
Changming Sun	8f8d899bf2	Enable code sign in c api pipeline and python pipeline	2020-06-10 19:31:22 -07:00
Yulong Wang	73bc6be5d1	build: split nodejs binding build and test to avoid timeout issue (#4188 ) * split nodejs binding build and test * enable nodejs tests	2020-06-10 19:16:32 -07:00
Dmitri Smirnov	af0750ba1b	Java GPU artifact naming (#4179 ) Modify gradle build so artifactID has _gpu for GPU builds. Pass USE_CUDA flag on CUDA build Adjust publishing pipelines to extract POM from a correct path. Co-Authored-By: @Craigacp	2020-06-10 11:15:48 -07:00
Changming Sun	c0bdbc0b39	Enable telemetry for the C API and python pipeline (#4174 )	2020-06-10 00:07:46 -07:00
George Wu	9d65ce53bc	move back to toolset 14.16 to possibly work around nvcc bug (#4180 )	2020-06-09 19:36:30 -07:00
Sheil Kumar	4377ff4a1a	Enable .NET Core 2.0 and .NET Framework 4.6.1 in Microsoft.AI.MachineLearning NuGet package (#4125 ) * add project to download cswinrt and build winrt c# interop dll * Add to nuget package * reverse if check * run generation before core compile * add generated files to compile * update .net package to binplace native libs * add props to .netstandard2.0 folder * auto binplace ml native binaries * force 'Any CPU' platform build * Fix anycpu and platform targets * fix flake errors * fix variable order * fix flake pep8 errors, semicolon Co-authored-by: Sheil Kumar <sheilk@microsoft.com>	2020-06-09 09:08:19 -07:00
Changming Sun	2ab3a19728	Enlarge the read buffer size in C#/Java test code (#4150 ) 1. Enlarge the read buffer size further, so that our code can run even faster. TODO: need apply the similar changes to python some other language bindings. 2. Add coreml_VGG16_ImageNet to the test exclusion set of x86_32. It is not a new model but previously we didn't run the test against x86_32.	2020-06-08 16:13:11 -07:00
Yulong Wang	842be1535d	[Node.js binding] add linux and mac package (#4157 ) * try mac pipeline * fix path separator * copy prebuilds folder * split esrp yaml for win/mac * disable mac signing temporarily * add linux * fix indent * add nodetool in linux * add nodetool in win-ci-2019 * replace linux build by custom docker scripts * use manylinux as node 12.16 not working on centos6 * try ubuntu * loosen timeout for test case - multiple runs calls	2020-06-08 14:12:05 -07:00
liqunfu	ffed43e9b8	handle loss and name marching wrappers (#4066 ) * handle loss and name marching wrappers Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-06-05 23:34:26 -07:00
Yulong Wang	2aab20b4ea	[Node.js binding] upgrade node-addon-api to 3.0 (#4148 )	2020-06-05 21:24:34 -07:00
Yulong Wang	2e58097f8f	fix build: pipeline Node.js version to 12.16.3 (#4145 )	2020-06-05 17:56:03 -07:00
Yulong Wang	647a886587	[Nodejs binding] create a new pipeline to generate signed binaries (#4104 ) * add yml files * update pipeline * fix yaml syntax * yaml pop BuildCSharp * update yaml * do not stage codesign summary	2020-06-02 01:28:05 -07:00
Dmitri Smirnov	afca0d15ee	Create Java publishing pipeline (#3944 ) Create CPU and GPU Java publishing pipelines. Final jars are tested on all platforms. However, signing and publishing to maven are manual steps.	2020-06-01 18:18:57 -07:00
Changming Sun	3eaec57c38	Fix the daily pipeline failures (#4084 ) 1. Fix the nuget cpu pipeline and put code coverage pipeline back. 2. Reduce onnx_test_runner's default logging level from WARNING to ERROR. Because there are too many log messages now. 3. Enlarge the protobuf read buffer size for onnx_test_runner. It was missed from PR #4020.	2020-06-01 14:44:49 -07:00
edgchen1	a715d55bcc	Training Python package fixes (#4063 ) - Add support for ENABLE_LANGUAGE_INTEROP_OPS in training build which is enabled for nightly builds - Fix passing of environment variables to `sudo docker run` in build definitions - Fix setup.py package naming logic	2020-06-01 09:30:56 -07:00
Scott McKay	1d441f89ac	Re-enable PEP8 check in Win CI build (#4075 ) * Add flake8 to Win CI build so it's re-enabled. It was in the static analysis build that is currently disabled so checks are not running. Fix build.py to be compliant again. Add prefix to flake8 output so it's (hopefully) easier to identify the errors in build output. * Add to all builds in Windows CPU CI so they all fail quickly if there's an issue.	2020-05-30 09:10:05 +10:00
edgchen1	38d76cc904	Clean up training E2E test (#4078 ) Update training E2E build to not go through CTest and call test scripts directly.	2020-05-29 09:20:47 -07:00
liqunfu	6665d5e2bc	Liqun/a transformer example (#3845 ) Add transformer glue test example to show how to use ORTTrainer to fine-tune a transformer model Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-05-27 15:21:35 -07:00
Yulong Wang	b3ec8035ee	[Node.js binding] add build flag for node.js binding (#3948 )	2020-05-27 13:30:22 -07:00
Wei-Sheng Chin	24eda3df33	Create Utils for Adding Range and Marker (#4013 ) In this PR, we 1. create some APIs for creating NVTX objects 2. apply those APIs in pipeline-related operators and sequential executor. As a result, we can explicitly see how a pipeline schedule is run by GPUs in Nvidia's visual profiler. Note that these APIs are Linux only due to Nvidia's limited support.	2020-05-24 22:55:24 -07:00
Changming Sun	aafe988a11	Temporarily disable windows static analysis CI job	2020-05-24 16:31:09 -07:00
Ryan Lai	357bffe47c	Fix deprecated CentOS link for Linux CI pipeline (#4000 ) * Fix Linux_CI_GPU_Dev * centos6	2020-05-20 16:14:48 -07:00
Bowen Bao	0a5395bb78	Remove 'model_.' prefix from onnx model initializers in training (#3881 ) * Remove 'model_.' prefix for onnx model initializers in training * fix test case remove redundant device test * rename * Fix state_dict/load_state_dict with frozen_weight * nit * Add monkey patch for pt opset 10 * remove pt patch in CI * nit: newline	2020-05-20 10:06:31 -07:00
Prabhat	08763e80e0	Fix permission denied while creating directory in azure pipelines (#4001 ) * Fix permission denied while creating directory * Run tar with sudo	2020-05-20 09:47:12 -07:00
edgchen1	989fe2498f	Change training perf test build to use "docker" instead of "sudo docker" (#3995 ) Change training perf test build to use "docker" instead of "sudo docker". The training perf test build runs in an environment that supports calling "docker" and not "sudo docker".	2020-05-19 16:54:35 -07:00
Ryan Lai	354e571277	Miscounted the number of characters in package version of DirectML nuget (#3993 ) Co-authored-by: Ryan Lai <ryalai96@gamil.com>	2020-05-19 15:28:30 -07:00
ytaous	fb4efafc8e	GPT-2 training perf scripts (#3974 ) * gpt2 training perf * gpt2 training perf * debug * debug * debug * fix bug * minor * on comments * dynamic sql * fix build * minor * linked hash * on comments * minor * mem * minor Co-authored-by: Ethan Tao <ettao@microsoft.com>	2020-05-19 10:21:40 -07:00
Changming Sun	2fa2019daf	Run docker commands with sudo (#3979 )	2020-05-18 17:35:09 -07:00
edgchen1	024b92a970	Use path relative to script location to refer to symbolic_opset10.py from install_deps.sh. (#3975 ) Update install_deps.sh to use relative path from script directory to symbolic_opset10.py. This allows install_deps.sh to be called from different working directories.	2020-05-18 13:36:06 -07:00
Adam Pocock	9d2d1eb6f6	[java] Adds a CUDA test (#3956 ) * [java] - adding a cuda enabled test. * Adding --build_java to the windows gpu ci pipeline. * Removing a stray line from the unit tests that always enabled CUDA for Java.	2020-05-18 12:05:51 -07:00
edgchen1	e259a13f8e	Initial training Python packaging pipeline (#3767 ) Add a pipeline to produce training-enabled ORT wheels.	2020-05-18 09:41:00 -07:00
edgchen1	e55f24364a	Disable LTO on Windows training CPU build (#3960 ) Disable LTO on Windows training CPU build. Add a parameter to the win-ci-2019.yml build template for enabling LTO with a default value of true.	2020-05-18 09:24:10 -07:00
Prabhat	4ff73d00b0	Fix python pkg permission issue (#3957 ) * Fix python pkg permission issue * Run chown with sudo * Add workspace clean to arm pipeline * Run docker as current user	2020-05-17 14:06:55 +05:30
Ryan Lai	38467f8c9a	DirectML Nuget package has different time stamp than Native and Managed Nuget (#3950 ) * Fix DirectML nuget creation in Nuget pipeline * DirectML Nuget package has different timestamp * remove accidentally changed file	2020-05-14 18:52:08 -07:00
Scott McKay	5e0928a777	Enable running PEP8 on python scripts using flake8 (#3928 ) * Enable running PEP8 checks via flake8 as part of the build if flake8 is installed. Update scripts in \tools and \onnxruntime\python. Excluding \onnxruntime\python\tools which needs a lot more work to be PEP8 compliant. Also excluding orttraining\tools for the same reason. Install flake8 as part of the static_analysis build task in the Win-CPU CI so the checks are run in one CI build. Update coding standards doc.	2020-05-15 07:15:06 +10:00
ytaous	93eb9bcfde	Add yaml/perf scripts for new perf test pipeline (#3909 ) * yaml/perf scripts for new pipeline * yaml/perf scripts for new pipeline * remove unused imports * testing some comments change * testing some comments change * testing jdbc * testing jdbc * testing jdbc * exclude pwd from jdbc properties * exclude pwd from jdbc properties * namedtuple * on comments Co-authored-by: Ethan Tao <ettao@microsoft.com>	2020-05-13 14:15:17 -07:00
Prabhat	25257a661d	Added onnxruntime aarch64 wheel to pypi publishing pipeline (#3903 ) * Added onnxruntime aarch64 wheel to pypi publishing pipeline * Support nightly build flag * Add support for nightly build	2020-05-13 23:20:29 +05:30
liqunfu	9b5daa2039	patch torch onnx opset 10 (#3910 ) patch pytorch to export onnx nll_loss opset version 10. add mnist test to covert onnx opset version 10.	2020-05-12 18:11:25 -07:00
Ori Levari	7b858d60b0	Various changes for automated downlevel test pipeline (#3901 ) Co-authored-by: Ori Levari <orlevari@microsoft.com>	2020-05-12 17:22:47 -07:00
Prabhat	ce3678ffaf	Added aarch64 build pipeline (#3841 ) * Added aarch64 build pipeline * Fix build error * Remove auditwheel repair which doesn't work with cross compiling * Statically link C++ * Added auditwheel repair back and fix stdlib.h * Remove extra space	2020-05-11 22:56:16 +05:30
Ryan Lai	7fd2c8f9e8	Add signed GPU nuget package to publish ort-nightly nuget feed (#3834 ) * Add signed nuget package to publish ort-nightly nuget feed * Push managed nuget as well * Indentation fix * Indentation fix * Update gpu.yml to also publish directml nuget * Fix typo in naming of task	2020-05-10 16:24:45 -07:00
M. Zeeshan Siddiqui	5e1244eb4d	Update ONNX submodule to ONNX 1.7 release branch. (#3888 ) * Update to ONNX submodule to ONNX 1.7 release branch. * Update to ONNX submodule to ONNX 1.7 release branch. * fix version.	2020-05-10 15:44:44 -07:00
Pranav Sharma	22a711457f	Fix C# log APIs. Also fixes github issue #3409 . (#3840 ) * Fix C# log APIs. Fixes github issue #3409. * Fix build error due to accidental duplication of GraphOptimizationLevel * Fix runoptions * Fix broken test. Add --blame switch to dotnet test cmd line to print the failed test in case of crash.	2020-05-08 14:31:06 -07:00
stevenlix	4ea10c9202	bump up ORT version and extend time limit for windows cpu packaging pipelines (#3852 )	2020-05-07 14:22:20 -07:00
M. Zeeshan Siddiqui	9b02b3df6f	Update ONNX submodule to ONNX 1.7 release candidate 3. (#3838 )	2020-05-06 00:55:19 -07:00
M. Zeeshan Siddiqui	ef4d73e887	Update ONNX submodule to ONNX 1.7 release candidate 2. (#3818 ) * Update ONNX submodule to ONNX 1.7 release candidate 2. * fix build error. * Update ONNX submodule to latest and disable preview op tests.	2020-05-05 15:08:40 -07:00
Changming Sun	c11fbf68e4	Publish gpu package to nuget feed (#3816 )	2020-05-04 21:49:19 -07:00
Changming Sun	2684d47fc5	Disable data downloading in linux-nocontribops-ci-pipeline (#3803 ) * Disable data downloading in linux-nocontribops-ci-pipeline * update * update	2020-05-02 12:59:24 -07:00
Sheil Kumar	37b60251ca	test packaging (#3756 ) Co-authored-by: Sheil Kumar <sheilk@microsoft.com> Co-authored-by: Changming Sun <chasun@microsoft.com>	2020-05-02 12:23:33 -07:00
Changming Sun	ee8900e21a	Update centos-ci-pipeline.yml (#3800 ) * Update centos-ci-pipeline.yml	2020-05-02 11:04:23 -07:00
edgchen1	440f361363	Remove orttraining-linux-gpu-inference-only-ci-pipeline.yml. (#3788 )	2020-05-02 00:35:08 -07:00
M. Zeeshan Siddiqui	517bff9675	Function expansion support and Update ONNX to 1.7 release candidate 1. (#3782 ) * Function expansion support, Update ONNX to 1.7 release candidate 1. * Renable disabled tests.	2020-05-01 10:35:16 -07:00
George Wu	dcb1a21552	fix python package linux gpu failure (#3786 ) * pin base image for manylinux2010_gpu * pin base image for Dockerfile.manylinux2010	2020-05-01 17:04:59 +08:00
liqunfu	af3988198c	Liqun/e2e transformer test (#3540 ) * initial change to transformer.py * prepare e2e transformer tests * refactor transformer tests * put test python files in a flat folder * fix typo pip install transform(s) * python 3.6 * python version to 3.6 in install_ubuntu.sh * remove argparser * to use opset ver 12 * workaround loss_scale naming patch in case of loss_fn_ * assign self.loss_fn_ so it can be checked * skip a few un-needed post-process steps * fix loss_scale_input_name, clean up post process steps * skip non-frontend tests * move cpu/cuda related files to coresponding cpu/cuda folder (#3668) Co-authored-by: Weixing Zhang <wezhan@microsoft.com> * type cast for ratio is not necessary for dropout (#3682) Co-authored-by: Weixing Zhang <wezhan@microsoft.com> * thrustallocator is not needed since cub is used directly for gather now. (#3683) Co-authored-by: Weixing Zhang <wezhan@microsoft.com> * GatherND-12 Implementation (#3645) * Renamed, UT passing * Move GatherND CUDA Kerenl into onnxruntime * Merge GatherNDOpTest * Refactor Test code * Merge CPU Kernel Impl * Handle Negative Indice, Fix UT * Improve CUDA kernel to handle negative index * Minor Fixes * Preserve GatherND-1 Cuda kernel * Fix Mac build * fix UT * Fix Build * fix GatherNDOpTest.double > CUDA error cudaErrorInvalidDeviceFunction:invalid device function Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Peng Wang (pengwa) <pengwa@microsoft.com> * update with reviewers' comments * testBertTrainingGradientAccumulation was not using rtol and may fail occasionally with small (e-06) difference * fix merge mistakes Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Weixing Zhang <weixingzhang@users.noreply.github.com> Co-authored-by: Weixing Zhang <wezhan@microsoft.com> Co-authored-by: Sherlock <baihan.huang@gmail.com> Co-authored-by: Sherlock Huang 
<bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Peng Wang (pengwa) <pengwa@microsoft.com>	2020-04-30 12:26:38 -07:00
Scott McKay	9f72752397	Fix 'Install ONNX' CI failure (#3761 ) * Disable flaky test temporarily * turn off pip upgrade warning Co-authored-by: Aishwarya Bhandare <aibhanda@microsoft.com> Co-authored-by: Zeeshan Siddiqui <mzs@microsoft.com>	2020-04-30 18:18:58 +10:00

515 commits