onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-05-24 22:17:32 +00:00

Author	SHA1	Message	Date
Edward Chen	45a7352622	Update Mac CI builds to use macOS-10.15 image, Xcode 12.4. (#7437 )	2021-05-27 09:39:34 -07:00
Tixxx	2a3851cd75	fixed bugs in packed mode and enable pack mode tests in ci (#7848 ) * fixed bugs in packed mode and enable pack mode tests in ci * removed unnecessary space * pr comments * pr comments * disable an average pool test * try disabling another avg pool * disable more avg pool tests * disable maxpool tests	2021-05-27 07:56:58 -07:00
liqunfu	bed6e87cbd	add environment variable to control default training package's local version (#7849 )	2021-05-26 22:44:20 -07:00
Thiago Crepaldi	c5ea5907c0	Fix permission error for ORTModule lock file (#7814 )	2021-05-26 14:18:25 -07:00
George Wu	1c6b6f696e	fixes for cuda centos/manylinux (#7830 ) * fixes for cuda centos/manylinux * remove providers_shared.so dep processing.	2021-05-25 19:38:59 -07:00
Changming Sun	93c8e29782	Improve code coverage report (#7770 )	2021-05-25 08:26:01 -07:00
Guoyu Wang	98007f0be6	Fix typo in the ios packaging script (#7802 )	2021-05-24 11:57:13 -07:00
Suffian Khan	02c78a8aa8	test migration to rocm4.2 (#7800 )	2021-05-24 11:48:44 -07:00
Changming Sun	ee29330cab	Delete unused file: Dockerfile.ubuntu_gpu (#7797 )	2021-05-21 17:05:35 -07:00
liqunfu	f6eb0f76ae	Use cudnn7 to build onnxruntime-training wheel with CUDA 10.2 support (#7760 )	2021-05-20 09:18:41 -07:00
Ryan Hill	c99aa3a3f3	Ryanunderhill/cuda shared (#7626 ) * First iteration of making cuda a shared provider. Separated out shared OpKernel change, so doing this to merge with that change. * More cuda shared library refactoring * More cuda shared library refactoring * More build options tested, converted the training ops over. * Fix merge breaks * Fix submodules * Fix submodules * Fix submodules * Fix python * Fix compile errors * Duplicate symbol fix * Test fix for ROCM provider * Another ROCM test workaround * ROCM Build Test * ROCM build fix * ROCM * ROCM * ROCM * ROCM * ROCM * ROCM test * Reduce header dependencies * Remove redundant namespace * Test fix for linux * Fix linux build * Fix Eigen build error * Fix unused parameter warning * Test link error * Another linker test * Linker test * Linker test * Another test * Another build test * Fix linux link error * Build test * Fix control flow ops to use common base class with core code * Remove extra qualifiers * Fix template syntax for linux * Fix cuda memory leak * Fix pybind * Test disabling cast * Cleanup * Restore cuda in test * Remove more header dependencies * Test not adding cuda provider to session * Make GetProviderInfo_CUDA throw * No-op cuda provider creation * Fix some setup issues * Fix memory cleanup on unload * Diagnostics * Don't unload library * Add diagnostics * Fix deleting registry at right time. * Test disabling profiler * Fix merge break * Revert profiler change * Move unloading of shared providers into Environment * Free more global allocations before library unloads * Add more diagnostics * Move unloading back to the OrtEnv as there are multiple Environments created during a session. Remove some library dependencies for tests. 
* Fix more cmake files * ERROR -> WARNING * Fix python shutdown * Test not using dml in pipeline * Change python version and disable dml * Update python version * Test adding unload method for shared providers * Disable DLL test * Python test * Revert "Python test" This reverts commit `c7ec2cfe98`. * Revert "Disable DLL test" This reverts commit `e901cb93aa`. * Revert "Test adding unload method for shared providers" This reverts commit `c427b78799`. * Point to RyanWinGPU * Revert python version * Fix id_to_allocator_map * Another python exit test * Remove extra debug messages Try a more clean python shutdown through DllMain * Revert DllMain idea, it didn't work * Merge conflicts * Fix merge with master issues. * Comments * Undo edit to file * Cleanup + new training ops * Revert yml changes * Fix another merge error * ROCM fix * ROCM fix v2 * Put back Linux hack, it is necessary * Stupid fixes * Fix submodule out of sync * ROCM fix 3 * ROCM 4 * Test java fix * Fix typos * Java test on my VM * Fix build error * Spotless fix * Leave temp file around to load properly * Fix cleanup on exit * Fix break * Java comments * Remove LongformerAttentionBase workaround * Spotless fix * Switch yml back to regular build pool * Revert "Switch yml back to regular build pool" This reverts commit `be35fc2a5a`. * Code review feedback * Fix errors due to merge * Spotless fix * Fix minimal build * Java fix for non cuda case * Java fix for CPU build * Fix Nuphar? * Fix nuphar 2 * Fix formatting * Revert "Remove LongformerAttentionBase workaround" This reverts commit `648679b370`. * Training fix * Another java fix * Formatting * Formatting * For orttraining * Last orttraining build fix... * training fixes * Fix test provider error * Missing pass command * Removed in wrong spot * Python typo * Python typos * Python crash on exit, possibly due to unloading of libraries. 
* Remove test_execution_provider from training build Only enable python atexit on windows Remove assert on provider library exit * Still can't unload providers in python, alas. * Disable Nvtx temporarily * MPI Kernels for Training * MPI Kernels part 2 * Patch through INcclService * Oops, wrong CMakeLists * Missing namespace * Fix missing () * Move INcclService::GetInstance around to link nicer * Missing } * Missing MPI libraries for Cuda * Add extra GetType functions used by MPI * Missing Nccl library * Remove LOGS statements as a test * Add in a couple more missing GetType methods * Update comments * Missed a logging reference in mpi_context.h * Convert aten_op to shared (due to marge with master) * Test moving DistributedRunContext instance into shared provider layer (with purpose error to verify it's being built properly) * Test passed, now with fix * Missing static * Oops, scope DistributedRunContext to just NCCL * Merge related issues and code review feedback. * Merge error * Bump to rel-1.9.1 (#7684) * Formatting * Code review feedback for Java build on non Windows * Remove cupti library dependency from core library * Test Java pipeline fix * Linux build fix * Revert "Linux build fix" This reverts commit `a73a811516`. * Revert "Remove cupti library dependency from core library" This reverts commit `6a889ee8bf`. * Packaging pipeline fixes to copy cuda shared provider for tensorrt & standard packages * Add cuda to Tensorrt nuget package * onnxruntime_common still has a cuda header dependency Co-authored-by: ashbhandare <ash.bhandare@gmail.com>	2021-05-20 07:53:47 -07:00
Changming Sun	31e6d3f85c	Revert CUPTI profiling feature (#7763 ) For unknown reason it causes deadlocks when it is used with CUDA 11.1	2021-05-19 21:54:29 -07:00
Changming Sun	6c868341e3	Fix CUDA 10.2 pipelines (#7759 ) 1. Move the multi GPU pipeline to CUDA 11.0 2. Exclude the keras2coreml_SimpleRNN_ImageNet model test. 3. Add a test for NV6+CUDA 11.0 BTW, it's known our code doesn’t build with CUDA10.2 + Nvidia T4.	2021-05-19 13:52:06 -07:00
Changming Sun	3a68c389d9	Add version lock to manylinux build scripts (#7755 )	2021-05-19 09:28:40 -07:00
Changming Sun	38d90b0f15	Cleanup install_deps.sh (#7734 )	2021-05-17 19:27:47 -07:00
Jesse Benson	f977644324	ROCM support int reductions	2021-05-17 16:42:06 -07:00
liqunfu	d604281a86	Liqun/training pkg to run tests (#7662 )	2021-05-16 09:10:57 -07:00
liqunfu	3ead2f2f39	update pt lightning version (#7711 ) Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2021-05-15 21:46:16 -07:00
Yulong Wang	97d9bcd644	[js/web] fix bundle for multi-thread, add e2e test and support nodejs (#7688 ) * fix bundle for multi-thread, add e2e test and support nodejs * add copyright banner * resolve comments * add comments for isMultiThreadSupported()	2021-05-14 18:15:38 -07:00
Yulong Wang	c53b5be509	force multi steps to use the same commit in CI (#7697 )	2021-05-14 15:13:38 -07:00
liqunfu	359fe1d197	Liqun/ort training version (#7620 )	2021-05-14 09:54:19 -07:00
ashbhandare	56e993a434	Bump to rel-1.9.1 (#7684 )	2021-05-13 18:41:28 -07:00
Olivia Jain	29172d8f54	Setup EP Dashboard (#7321 ) * setting up dashboard * posting to ort dashboard * creating separate docker file * including common deps * tracking latency over time	2021-05-11 10:33:39 -07:00
Guoyu Wang	ce8473a4ea	Add script to build fat iOS framework (#7607 )	2021-05-11 09:46:24 -07:00
Scott McKay	d39db89fbb	Add info on some additional pytorch models that were added to the test models. No new operators are required. (#7644 )	2021-05-11 19:48:28 +10:00
Hariharan Seshadri	4b691a5c0d	Add ability for memory arenas to "shrink" periodically (#7284 )	2021-05-08 07:53:21 -07:00
Changming Sun	41e370c2b3	Update protobuf to 3.16 (#7616 )	2021-05-07 14:09:23 -07:00
baijumeswani	f3a70f1aec	Ignore invalid input argument to install_os_deps.sh (#7566 )	2021-05-05 14:33:31 -07:00
Changming Sun	a284eede64	Fix Linux CPU pipeline (#7584 )	2021-05-05 13:26:10 -07:00
Sheil Kumar	91985ab03d	add use_dml (#7569 ) Co-authored-by: Sheil Kumar <sheilk@microsoft.com>	2021-05-05 08:55:13 -07:00
Guoyu Wang	e05528a365	Update Android AAR packaging pipeline script (#7559 ) * update android package pipeline * update shell script * update script * add kMSExperimentalDomain to reduction	2021-05-04 11:13:33 -07:00
Scott McKay	594dde2647	Validate that the conversion script from the python package can be used to convert models. (#7517 )	2021-05-04 16:25:04 +10:00
George Wu	faea7a222d	linux trt package pipeline (#7537 )	2021-05-03 19:14:20 -07:00
baijumeswani	cab84d902e	Install and use conda on ortmodule CI pipelines (#7530 ) * Install and use conda on ortmodule CI pipelines * Update build script to install onnxruntime wheel before running unit tests * Remove python 3.5 from install_python_deps * Pinning deepspeed version to 0.3.15	2021-05-03 15:52:22 -07:00
Yulong Wang	7079dfb93d	[wasm] fix and unify webassembly target name (#7549 )	2021-05-03 10:37:25 -07:00
Sheil Kumar	94c4c44bfc	Enable Microsoft.Ai.MachineLearning package to work on .NET5 down to 17763 Windows SDK (#7522 ) * upgrade cswinrt and downgrade target framework * fix sdk version references * cswinrt 1.1.0 Co-authored-by: Sheil Kumar <sheilk@microsoft.com>	2021-05-01 00:56:36 -07:00
Adrian Tsai	70e67ddd2b	Update DirectML version to 1.5.1 and enable ARM/ARM64 builds with DML (#7511 ) * Update DirectML to version 1.5.1 * Enable --use_dml with ARM and ARM64 * Add ARM/ARM64 binaries to nuget packages	2021-04-30 00:49:30 -07:00
Yulong Wang	00aaa6dabb	update CI for onnxruntime-web (#7497 )	2021-04-29 22:22:52 -07:00
Scott McKay	d6df5764d7	Android package infrastructure (#7430 ) * Include ORT format model conversion scripts and infrastructure in ORT python package. - tweak existing script setup so it can be easily run directly and from the ORT python package Add config file and readme for Android minimal build package Update ORT Mobile doco Disable warning if 'all' optimizations are enabled but NCHWc transformer is excluded (device specific optimizations don't apply in this scenario so the warning is moot). * Address PR comments	2021-04-30 14:23:54 +10:00
Changming Sun	7b003967b1	Add static code analyzer to Windows CPU/GPU CI builds and fix the warnings (#7489 )	2021-04-29 11:54:57 -07:00
liqunfu	196e6702ad	Support multiple CUDA versions in published onnxruntime-training package (#7468 )	2021-04-27 17:15:33 -07:00
Edward Chen	d21304ceb0	Initial Objective-C API (#7366 ) Initial implementation of an Objective-C API.	2021-04-27 10:06:30 -07:00
Changming Sun	78e583d08c	Add CMAKE_CUDA_ARCHITECTURES=52 to TensorRT CI pipelines (#7455 )	2021-04-27 09:55:23 -07:00
Edward Chen	4804ede501	Update build docker image cache cleanup build definition (#7452 ) Decrease default cache history length to 4 days. Other minor updates to build definition.	2021-04-26 14:39:46 -07:00
Suffian Khan	7a3c1787af	Add CI pipeline to publish Python training package targeting Rocm (#7417 ) * first attempt rocm training wheel * modifications needed to python packaging pipeline for Rocm 4.1 * changes to not conflict with cuda missed stage1 changes remove package push add option r to getopt try again without python install try again without python install try again without python install split pipelines and add back push to remote storage try on cuda gpu pool try again try again try running without az subscription set try again on original pipeline change pool passing AMD Rocm whl on AMD-GPU pool split rocm pipeline from cuda pipeline remove comments * try adding Rocm tests as well * try with tests in place * fix trailing ws * add training data * try again as root for tests * use python3 * typo * try to map video, render group into container * try again * try again * try to avoid yum error code * make UID 1001 * try without yum downgrade * define rocm_version=None * remove CUDA related comments for Rocm Dockerfile * Don't pin nightly torch torchvision torchtext versions as they expire (for now nightly is required for Rocm 4.1) * missed requirements-rocm.txt from last commit * fix whitespace	2021-04-23 17:22:31 -07:00
Changming Sun	9f683bae78	Revert the TRT change and move the build to a new pool (#7434 )	2021-04-23 14:00:26 -07:00
Ashwini Khade	75e054cd33	pick onnx release candidate (#7177 ) * pick onnx release candidate * fix typo * filter batchnorm tests * add implementation for reshape 14 * add identity op kernel for opset 14 * fix typo * update onnx commit * update commit to latest master * add hashes for new kernel registrations and update 1 * TEST commit * update onnx back to right commit * Update onnx to latest in rel-1.9.0 * temp fix * remove nonzeroshapesetter transformer * pick rel branch latest commit * fix build failures * fix build failures * fix build failures * update the commit to latest in release branch * add test filters for not implemented op14 ops in c# tests * plus review comments	2021-04-22 23:57:09 -07:00
Guoyu Wang	d414039189	Add ios coreml ci, and speedup ios ci run (#7420 )	2021-04-22 23:41:58 -07:00
Changming Sun	6822ae95ec	Reduce the number of TensorRT tests needed to run (#7419 )	2021-04-22 19:14:39 -07:00
Changming Sun	afa7b23609	Update docs/ContribOperators.md and the script that generates it. (#7399 )	2021-04-21 16:20:56 -07:00
Changming Sun	65b2b87f83	Update CI build docker images (#7386 ) Update CI build docker images: delete ubuntu 16.04 support.	2021-04-21 13:18:34 -07:00
Changming Sun	b4cfa88bf7	Update protobuf to the latest version (#7396 )	2021-04-21 10:30:06 -07:00
Changming Sun	243713c464	Upload detailed code coverage result to azure blob storage (#7392 )	2021-04-21 08:24:44 -07:00
Guoyu Wang	96cdc65d57	Fix android CI failure after gradle updated to 7.0 (#7364 ) * Fix android ci failure after gradle updated to 7.0 * minor update	2021-04-16 15:28:28 -07:00
Yulong Wang	009f342caf	[JS] refactor Javascript/Typescript libraries in ONNX Runtime (#7308 ) * working on re-organizing js code for ortweb * remove dup files * move folder * fix common references * fix common es5 * add webpack to common * split interface/impl * use cjs for node * add npmignore for common * update sourcemap config for common * update node * adjust folder/path in CI and build * update folder * nit: readme * add bundle for dev * correct nodejs paths * enable ORT_API_MANUAL_INIT * set name for umd library * correct name for commonjs export * add priority into registerBackend() * fix npm ci pwd * update eslintrc * revise code * revert package-lock lockfileVersion 2->1 * update prebuild * resolve comments * update document * revise eslint config * update eslint for typescript rules * revert changes by mistake in backend.ts * add env * resolve comments	2021-04-16 01:33:10 -07:00
Changming Sun	f1c1c38d44	Delete an unused var in nuget pipelines(#7345 )	2021-04-15 07:29:52 -07:00
Jesse Benson	be79575c6a	Use built-in reduce_sum() for simple reduction cases, specifically reduce all to a scalar.	2021-04-14 08:55:35 -07:00
liqunfu	4c862c73ed	for training to use new python package naming convention to explicitl… (#7204 )	2021-04-13 16:19:42 -07:00
Guoyu Wang	fce67e2b9b	Create Android Package pipeline (#7295 ) * Create Android Package pipeline * address CR comments * Switch to jdk11	2021-04-12 17:56:25 -07:00
Weixing Zhang	75c0192e4f	enable more unit tests for ROCM EP (#7307 )	2021-04-09 15:15:13 -07:00
Weixing Zhang	c22963c23d	Polish Lamb Kernel (#7299 )	2021-04-09 09:55:57 -07:00
Weixing Zhang	8ad5007f8f	Polish Adam kernel (#7294 )	2021-04-09 01:11:09 -07:00
Yulong Wang	405ca49012	build ONNXRuntime into WebAssembly (#6478 ) * Simplified version of WebAssembly support to keep most of existing data structures and add cmake using Ninja and emcmake * Clean up CMakeLists.txt and add an example to create and compute a kernel * Load a model from bytes and remove graph building steps * Add all cpu and contrib ops with mlas library * WebAssembly build with Onnxruntime C/CXX API * Use protobuf cmakefile directory instead of adding every necessary source file * Fix invalid output at example * add missing files * Change an example to use Teams model and support ort mobile format * add API for javascript * fix input releasing in _ort_run() * update API * Let onnxruntime cmake build WebAssembly with option '--wasm' * allow one-step building for wasm * Make build script working on Linux and MacOS * Fix broken build from Windows command * Enable unit test on building WebAssembly * Resolve comments * update build flags * wasm conv improvement from: 1) GemmV; 2) Depthwise direct convolution 3x3; 3) Direct convolution 3x3 * Cleaned mlas unittest. * use glob * update comments * Update baseline due to loss scale fix (#6948) * fix stream sync issue (#6954) * Enable type reduction in EyeLike, Mod, random.cc CPU kernels. (#6960) * Update EyeLike CPU kernel. * Update Mod CPU kernel. * Update Multinomial CPU kernel. * Slight improvement to Pad CPU kernel binary size. * Update RandomNormal[Like], RandomUniform[Like] CPU kernels. * Fix warning from setting multiple MSVC warning level options. (#6917) Fix warning from setting multiple MSVC warning level options. Replace an existing /Wn flag instead of always appending a new one. * MLAS: quantized GEMM update (#6916) Various updates to the int8_t GEMMs: 1) Add ARM64 udot kernel to take advantage of dot product instructions available in newer cores. Some models run 4x faster than the stock implementation we used before. 
2) Refactor the x64 kernels to share common code for AVX2(u8u8/u8s8/avxvnni) vs AVX512(u8u8/u8s8/avx512vnni) to reduce binary size. 3) Extend kernels to support per-column zero points for matrix B. This is not currently wired to an operator. * Implement QLinearAveragePool with unit tests. (#6896) Implement QLinearAveragePool with unit tests. * Attention fusion detect num_heads and hidden_size automatically (#6920) * fixed type to experimental session constructor (#6950) * fixed type to experimental session constructor Co-authored-by: David Medine <david.medine@brainproducts.com> * Update onnxruntime_perf_test.exe to accept free dimension overrides (#6962) Co-authored-by: Ori Levari <orlevari@microsoft.com> * Fix possible fd leak in NNAPI (#6966) * Release buffers for prepacked tensors (#6820) Unsolved problems: 1. One test failure was caused by a bug in Cudnn rnn kernels, when they can allocate a buffer and partially initialize it, the garbage data near tail of the buffer caused problem in some of the hardware. To attack this problem in a broader sense, should we add code in our allocators, and during a memory fuzzing test, fill an allocated buffer with garbage before returning to the caller? 2. Prepacking is used more widely than we know. For instance, Cudnn rnn kernels also cache their weights. They mix several weight tensors together into a single buffer, and never touch the original weight tensor anymore. This is the same idea with pre-pack, but they didn't override the virtual function, and they never tried to release those weight tensors, leading to memory waste. It also seems to me that there are some other kernels have similar behavior. Wonder how much memory we can save if we try to cleanup those too. 3. Turning off memory pattern planning does increase memory fragmentation, leading to out of memory error in some training test cases. 
Perhaps we can revisit the idea of pushing kernels-creation stage earlier, and then during initializer deserialization, we only avoid tracing those that will be prepacked. * Enable type reduction for Range, ReverseSequence, ScatterND, Split, and Unique CPU kernels. (#6963) * add CI * fix test in ci * fix flags for nsync in wasm build * add copyright banner * fix wasm source glob * add missing exports * resolve comments * Perf gain by make packb wide to 4 from 16 on GEMM for WASM. Remove no need direct conv in previous perf tuning. * fix buildbreak introduced from latest master merge * fix buildbreak in mlasi.h * resolve all comments except MLAS * rewrite packb related 3 functions for WASM_SCALAR seperately rather than using #ifdef in each. and other changes according to PR feedback in mlas. * More complete scalar path in sgemm from Tracy. * Fix edge case handling in depthwise conv2d kernel 3x3. where: ) support input W==1 and H==1 ) recalc in accurate pad_right and pad_bottom ) support hidden pad_right == 2 or pad_bottom == 2 when W == 1 or H==1 and no pad left/top Add more test coverage for conv depthwise from Tracy. Fix one typo according to PR. 
* resolve comments * replace typedef by using * do not use throw in OrtRun() * output error message Co-authored-by: Sunghoon <35605090+hanbitmyths@users.noreply.github.com> Co-authored-by: Lei Zhang <zhang.huanning@hotmail.com> Co-authored-by: Wei-Sheng Chin <wschin@outlook.com> Co-authored-by: Tianlei Wu <tlwu@microsoft.com> Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com> Co-authored-by: Tracy Sharpe <42477615+tracysh@users.noreply.github.com> Co-authored-by: David Medine <david.eric.medine@gmail.com> Co-authored-by: David Medine <david.medine@brainproducts.com> Co-authored-by: Ori Levari <ori.levari@microsoft.com> Co-authored-by: Ori Levari <orlevari@microsoft.com> Co-authored-by: Guoyu Wang <62914304+gwang-msft@users.noreply.github.com> Co-authored-by: Chen Fu <chenfucs@gmail.com>	2021-04-06 16:18:10 -07:00
Olivia Jain	fb40602ea2	Mem trt (#6868 ) * adding trt comparison and memory consumption * creating separate docker file	2021-04-05 22:16:12 -07:00
Guoyu Wang	c5973fbbac	Update the build script for Android AAR package (#7229 ) * Update the build script for Android AAR package * Address CR comments	2021-04-05 16:37:22 -07:00
Suffian Khan	9f14af9809	Add BERT-L perf regression test on MI100 and re-enable batch size test (#7240 ) * restore bs test and add perf test * update perf number and fix path to results	2021-04-05 15:51:52 -07:00
Edward Chen	0ebeaf529d	Check kernel def hashes (#7120 ) Add unit test for verifying kernel def hashes. Add way to add new types to kernel definition without changing hash.	2021-04-01 17:42:58 -07:00
Jesse Benson	4543459984	MIOpen supports MIOPEN_REDUCE_TENSOR_AVG now.	2021-04-01 16:00:34 -07:00
sfatimar	52bcef4d4f	Openvino ep 2021.3 (#7180 ) * Integrate openvino-ep-2021.3 * operators type * changed the myriad as it is case sensitive * logging information for openvino-ep-2021.3 * Unit test fix * Resize operator added for myriad * Fixed python tests for CPU and GPU * data commit for loop tile and gatherelements failure * adding checks for Where * fixing gatherelements and loop tests * disabling instance normalization test for now as there seems to be a myriad bug, putting loop in ops supported only because all the tests fail * gather elements op test taking care of warning message * condition needs to be an initializer * Disabled python test for Myriad * Disable compilation warning for MSVC windows compiler * softmax_test, threedimaxis0 and 1 test give accuracy mismatch tensoroptest disables test gives accuracy mismatch gather test gives accuracy mismatch * Updated with ov version 2021.3 * Updated with ov version 2021.3 * Updated README * Disabling python tests for cpu * Disabling python tests with accuracy mismatch on cpu * Added fix for Linux CI Pipeline failure -> Disabled tests that were throwing segfault Co-authored-by: sfatimar <sahar.fatima@intel/com> Co-authored-by: MaajidKhan <n.maajidkhan@gmail.com> Co-authored-by: Aravind <aravindx.gunda@intel.com>	2021-04-01 11:28:54 -07:00
baijumeswani	249a2c14ef	Pin version of pytorch to 1.8.1 for ORTModule CI pipeline (#7167 ) * Pin version of pytorch to 1.8.1 for ORTModule CI pipeline * Use pytorch-lightning stable version 1.2.5 * Revert to cuda 10.1	2021-04-01 09:37:47 -07:00
Edward Chen	04679e31ab	Specify CUDA compute capability 7.5 in Linux GPU build (#7203 ) Recently a build agent pool was changed to use T4 GPUs (CUDA compute capability 7.5). Updating some CUDA build options to accommodate that.	2021-03-31 18:51:44 -07:00
Guoyu Wang	d500c5952b	Add Android AAR packaging script for ORT-Mobile (#7138 ) * Add Android aar packaging script for ORT-Mobile * Address CR comments	2021-03-30 18:42:18 -07:00
liqunfu	e545604499	. (#7165 )	2021-03-30 13:58:30 -07:00
Changming Sun	bbcf419ac6	Move the Windows GPU machine pool of Onnxruntime packaging pipelines to a new one (#7161 )	2021-03-29 17:32:03 -07:00
RandySheriffH	aeca7c2940	Cuda Profiler (#7110 ) * implement cuda profiler * add counters * downgrade cupti kernel version * move mutex * add cupti to path * fix win gpu build err * add path for cuda10 * fix linux com err * extend include path * add init flag * fix test case * fix tensorrt pipeline * add UT Co-authored-by: Ubuntu <randysheriff@rashuai-linux-gpu-3.3cfnmjowvu4e5bidlsmcxsmzwg.xx.internal.cloudapp.net>	2021-03-29 12:04:36 -07:00
Ashwini Khade	b22e60bd44	pull onnx latest commit (#7102 ) * update onnx commit * fix test scripts to remove deprecated call * update filters * add registration for relu and cumsum ver 14 * add promote trilu to onnx domain * update onnx-tensorrt submodule * update flag * update flag * update dependencies * fix android ci failure	2021-03-29 11:00:38 -07:00
Suffian Khan	f27835c4de	Disable batch size test for AMD CI pipeline after agent upgrade to Rocm 4.1 (#7153 ) * disable batch size test for rocm 4.1 until resolved * Update orttraining-pai-ci-pipeline.yml Forgot to modify both pipelines	2021-03-26 22:32:39 -05:00
Suffian Khan	2b31b80b1f	increase timeout (#7145 )	2021-03-26 11:26:18 -07:00
Changming Sun	2e3bbad19f	Move TensorRT Windows CI build to the machine pool (#7127 )	2021-03-24 14:28:25 -07:00
harshithapv	540eac253e	Deepspeed pipeline parallel and fairscale sharded optimizer test samples with ORTModule (#7078 ) * adding samples for Deepspeed pipeline parallel and fairscale sharded optimizer with ortmodule * fixed typo in args * addressed Thiago's comments * Update orttraining/orttraining/test/python/orttraining_test_ortmodule_deepspeed_pipeline_parallel.py Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>	2021-03-24 09:43:05 -07:00
Changming Sun	b07e168a2b	Delete an unused file: download_test_data.py (#7109 )	2021-03-23 14:49:26 -07:00
baijumeswani	a7a2a16edd	Pass arguments to azure_scale_set_vm_mount_test_data from perf test ci pipeline (#7094 )	2021-03-22 21:48:32 -07:00
liqunfu	309885b08d	upload ort-gpu-training python nightly package to azure feed (#6998 )	2021-03-22 18:44:54 -07:00
Thiago Crepaldi	867804bea1	Add auto doc gen for ORTModule API during CI build (#7046 ) In addition to ORTModule auto documentation during packaging, this PR also update golden numbers to fix CI	2021-03-22 10:20:33 -07:00
Edward Chen	8d5bfdeb47	Increase timeout for Android CI pipeline by 30 minutes. (#7065 )	2021-03-19 08:03:22 -07:00
Changming Sun	701e73b5b8	Move Linux minimal build CI pipeline to the new Linux machine pool (#7050 )	2021-03-18 12:09:12 -07:00
Xavier Dupré	514444d820	Fix pipeline generating python documentation (#7027 ) Co-authored-by: xavier dupré <xavier.dupre@gmail.com>	2021-03-17 16:57:51 -07:00
Thiago Crepaldi	335edaa2c4	Merge pull request #6973 from microsoft/thiagofc/merge-ortmodule-into-master Introduce ORTModule training API to ONNX Runtime	2021-03-17 10:30:06 -07:00
Changming Sun	ed2d441a2e	Update ORT server build pipeline (#7030 ) 1. Migrated it to Ed's new docker build script 2. Use python 3.6 instead, because it is the default one in ubuntu 18.04 3. Move the "pip install" command to the docker image build stage(instead of when running the image)	2021-03-16 18:02:09 -07:00
Changming Sun	2361cb99b6	Remove CentOS CI pipeline (#6997 )	2021-03-16 10:55:03 -07:00
Tiago Koji Castro Shibata	975e4efb8a	Package ARM artifacts (#6805 )	2021-03-16 10:12:48 -07:00
Changming Sun	4161758058	Remove openmp related packaging pipeline (#6991 ) 1. Remove openmp related packaging pipelines and build jobs. 2. Set continueOnError to true for the TSAUpload tasks. Their service is unstable recently. 3. Update Ubuntu 16 docker images to Ubuntu 18, in prepare for getting C++17 support 4. Cherry-pick the changes in 1.7.1 to the master: updating CFLAGS/CXXFLAGS to strip out debug symbols	2021-03-12 10:02:59 -08:00
Xavier Dupré	694389a85d	Automate generation of python documentation (#6909 ) Co-authored-by: xavier dupré <xavier.dupre@gmail.com>	2021-03-11 19:02:45 -08:00
Thiago Crepaldi	89d450697b	Introduce ORTModule training API to ONNX Runtime	2021-03-10 10:48:10 -08:00
Edward Chen	b6c4a7ac54	Support required types when excluding typed registrations (#6871 )	2021-03-08 08:22:07 -08:00
baijumeswani	79f832c682	Separate requirements.txt file for ORTModule pipelines (#6879 ) * Move all ORTModule dependency installations to ortmodule subfolder	2021-03-05 14:12:11 -08:00
RandySheriffH	f986ffcb5f	move pipeline file and change relative path (#6882 ) Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2021-03-04 15:31:42 -08:00
Tiago Koji Castro Shibata	fa8d1b44b8	Fix app packaging in UWP (#6804 ) * Change msbuild condition for UAP * update .netcore target as well * create nuget packages with _native path * validate path under _native directory for windowsai package * pep8 * add diagnostic error message * pep8 * use basename * lib\uap10.0 * uap10 * build\\uap10.0 * Manually binplace winmds into appx when PackageReference is used. * always binplace winmd regardless of packagereference since c# should work with packages.config also * resolve all paths to full paths to avoid some reference warnings * move winmds out of lib folder to prevent automatic component registration Co-authored-by: Sheil Kumar <sheilk@microsoft.com>	2021-03-04 11:16:25 -08:00
Baiju Meswani	d5667554e6	Merge branch 'master' of github.com:microsoft/onnxruntime into bmeswani/merge_master_onto_ortmodule	2021-03-03 20:37:29 -08:00
Guoyu Wang	36a44d55ed	Only report Android Baseline binary size for master branch (#6844 ) * Only report binary size from master * update script * Correct the typo	2021-03-01 15:57:18 -08:00
Sherlock	12edf22f11	Merge pull request #6838 from microsoft/mzs/ortmodule-api-sync-from-master-210226 Sync from master	2021-02-27 12:32:36 -08:00
Thiago Crepaldi	f71d93ea2b	Enable PyTorch Lightning basic test on CI (#6809 )	2021-02-27 09:35:42 -08:00
M. Zeeshan Siddiqui	ca48310d6d	Merge branch 'master' of https://github.com/microsoft/onnxruntime into mzs/ortmodule-api-sync-from-master-210226	2021-02-27 04:25:23 +00:00
baijumeswani	c1b0cf6d0b	Add pipeline to clear the cache for huggingface transformers models (#6813 )	2021-02-26 10:39:22 -08:00
baijumeswani	fa8a9015bd	Mount hf model cache and use cache for loading hf models (#6810 )	2021-02-25 13:30:14 -08:00
Olivia Jain	db05d53b94	Setup perf in docker and add features (#6582 ) * setup scripts to run in docker * percent threshold for accuracy * branch testing	2021-02-25 09:31:03 -08:00
Maajid khan	7465673e33	[OpenVINO-EP] Find package changes (#6801 ) * Find package changes to cmake * Removing unwanted code from cmake Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> Co-authored-by: suryasidd <surya.siddharth.pemmaraju@intel.com>	2021-02-25 05:12:57 -08:00
Suffian Khan	8a148e44fb	make ci pipeline also run batch and convergence test (#6798 )	2021-02-24 20:18:03 -08:00
Weixing Zhang	40fa40f3ce	Enable more unit tests for ROCM EP (#6776 ) * enable more ops and unit tests for ROCM EP	2021-02-24 15:20:50 -08:00
baijumeswani	65ba51d93e	Re-enable test and increase timeout (#6785 )	2021-02-23 18:51:06 -08:00
Edward Chen	5db0c9c648	Enable CI to cover globally allowed types (#6778 ) Add test to CI build to cover type reduction with globally allowed types.	2021-02-23 10:24:12 -08:00
liqunfu	79b966b01a	. (#6751 ) make ort training perf test green. remove unneeded yaml	2021-02-19 09:03:58 -08:00
stevenlix	53eb948f4c	Upgrade TensorRT to v7.2.2 (#6452 ) * upgrade to TensorRT 7.2.2 * extend GPU tensorrt CI timeout to 150 minutes * update docker image name * disable user interaction to avoid tensorrt container stuck when install tzdata * upgrade to libssl1.1 for ubuntu20.04 * remove libicu60 from ubuntu20.04 * add libicu66 for ubuntu20.04 * debug * llvm * llvm * disable ReverseSequenceTest.InvalidInput * disable ReverseSequenceTest.InvalidInput * fix issues * fix issues * Update linux-gpu-tensorrt-ci-pipeline.yml * disable warning 4458 for TensorRT parser * update onnx-tensorrt submodule * disable warnings for TensorRT parser * update onnx-tensorrt submodule to include latest bug fixes * update setup_env_trt * update pool for win trt ci pipeline' Co-authored-by: George Wu <jywu@microsoft.com>	2021-02-18 04:30:47 -08:00
M. Zeeshan Siddiqui	40dda452cf	Merge branch 'master' of https://github.com/microsoft/onnxruntime into mzs/sync-from-master	2021-02-18 03:03:01 +00:00
liqunfu	dd8ef4409a	Liqun/migrate perf test (#6733 ) move ort training perf tests to azure devops	2021-02-17 17:48:47 -08:00
Thiago Crepaldi	3184c47ad1	Merge branch 'master' into thiagofc/merge-from-master	2021-02-17 11:49:52 -08:00
Changming Sun	0be5475de6	Update packaging pipelines(#6664 )	2021-02-17 09:53:36 -08:00
Changming Sun	46c06f6ac7	Change Windows GPU CI pipeline to CUDA11 (#6616 )	2021-02-17 09:44:44 -08:00
RandySheriffH	9043df8b66	Deprecate OMP from nuget pipeline (release:1.7) (#6671 ) * deprecate OMP from nuget * remove omp build * remove * add openmp build * add variants * rename package * move GPU to no omp pipeline * reset path * switch to abs path * reset path * add cpu package * remove obsolete name * set package name Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2021-02-17 00:03:44 -08:00
baijumeswani	01dfa8e125	Support non tuple return values from torch.nn.module (#6660 ) * Support dictionary, namedtuples and huggingface ModelOutput type for model return values	2021-02-16 20:48:32 -08:00
Scott McKay	02c7873b0e	Update ORT model conversion script to support custom ops (#6701 ) * Add support for custom ops library to the ORT model conversion script Simplify model conversion now that we read ops from the ORT format model. Enable custom ops in the python bindings if custom ops are turned on in a minimal build. * Add test of model conversion involving custom ops.	2021-02-17 12:52:39 +10:00
RandySheriffH	c36ee4bd40	Rename Python packaging pipelines (#6682 ) * rename pipelines * resync and rename * resync master * rename package id * remove OrtPackageId which is for nuget Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2021-02-16 11:43:03 -08:00
RandySheriffH	497eef8d3d	remove omp (#6675 ) Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2021-02-16 11:42:32 -08:00
Changming Sun	d48a4c0a54	Add CG step to nuget GPU pipeline (#6678 )	2021-02-16 09:46:20 -08:00
Changming Sun	9a01174037	Disable some unit tests for training (#6699 )	2021-02-16 09:45:59 -08:00
RandySheriffH	df3d6bad5f	Deprecate OMP from Python package (#6610 ) 1. For previous openmp build, remove --use_openmp, so thread pool will become default; 2. For previous non-openmp build, add --use_openmp and rename the package to indicate the inclusion. 3. Add a mac build with openmp enabled.	2021-02-12 21:50:41 -08:00
Scott McKay	25f7c93504	Require explicit inclusion of custom op support in a minimal build (#6663 ) * Remove support from custom ops from the base minimal build as they contribute too much binary growth to an Android build. Add ability to explicitly enable custom op support in a minimal build. Change one minimal build CI to test adding custom op support (unit tests are run in that build to validate)	2021-02-13 12:42:33 +10:00
Changming Sun	dd50c39ac6	Change Linux python packaging pipeline compile flags (#6668 )	2021-02-12 15:28:56 -08:00
Sheil Kumar	87cb6fd495	Add LearningModelBuilder to WinML Experimental Namespace along with various Audio operators (#6623 ) * model building * fix build * winml adapter model building api * model building * make build * make build again * add model building with audio op * inplace and inorder fft * add ifft * works! * cleanup * add comments * switch to iterative rather than recursive and use parallelization * batched parallelization * fft->dft * cleanup * window functions * add melweightmatrix op * updates to make spectrogram test work * push latest * add onesided * cleanup * Clean up building apis and fix mel * cleanup * cleanup * naive stft * fix test output * middle c complete * 3 tones * cleanup * signal def new line * Add save functionality * Perf improvements, 10x improvement * cleanup * use bitreverse lookup table for performance * implement constant initializers for tensors * small changes * add matmul tests * merge issues * support add attribute * add tests for double data type windowfunctions and minor cleanup * stft onesided/and not tests * cleanup * cleanup * clean up * cleanup * remove threading attribute * forward declare orttypeinfo * warnings * fwd declare * fix warnings * 1 more warning * remove saving to e drive... * cleanup and fix stft test * add opset picker * small additions * add onnxruntime tests * add signed/unsigned * fix warning * fix warning * finish onnxruntime tests * make windows namespace build succeed * add experimental flag * add experimental api into nuget package * add experimental api build flag and add to windows ai nuget package * turn experimental for tests * add minimum opset version to new experimental domain * api cleanup * disable ms experimental ops test when --ms_experimental is not enabled * add macro behind flag * remove unused x * pr feedback Co-authored-by: Sheil Kumar <sheilk@microsoft.com>	2021-02-12 14:17:10 -08:00
Suffian Khan	e6de0eb813	Add nightly pipeline for MI100 to run convergence and batch size test similar to V100. (#6611 ) * Partial updating of ROCM reduction code. * Update reduction_all.cu * Add reduce template parameters. * miopen common * Reuse CUDA's reduction_functions.cc * Reduction ops. * Update remaining reduction ops to use MIOpen. double datatype is not supported, so disable those typed kernels. * Disable a couple more unsupported tests. * Code formatting. * Delete ROCM-specific reduction code that is identical to CUDA reduction code. * Fix scratch buffer early free. * Fix merge conflict. * first attempt nightly amd ci pipeline * try fix bad yaml file * try again with corrected model directory * add convergence test as well * update reference loss for amd mi100 * include mi100 test results csv * update the mi100 convergence test reference values * update batch sizes for mi100 32g * fix gpu sku for run_convergence_test.py * undo unrelated changes to master * pr comments * pr comment Co-authored-by: Jesse Benson <jesseb@microsoft.com>	2021-02-12 13:22:06 -08:00
Changming Sun	8378a45ae7	Add python 3.8/3.9 support for Windows GPU and Linux ARM64 (#6615 ) Add python 3.8/3.9 support for Windows GPU and Linux ARM64 Delete jemalloc from cgmanifest.json. Add onnx node test to Nuphar pipeline. Change $ANDROID_HOME/ndk-bundle to $ANDROID_NDK_HOME. The latter one is more accurate. Delete Java GPU packaging pipeline Remove test data download step in Nuget Mac OS pipeline. Because these machines are out of control and out of our network, it's hard to make it reliable and the data secure. Fix a doc problem in c-api-artifacts-package-and-publish-steps-windows.yml. It shouldn't copy C_API.md, because the file has been moved into a different branch. Delete the CI build docker file for Ubuntu cuda 9.x and Ubuntu x86 32 bits. And, due to some internal restrictions, I need to rename some of the agent pools	2021-02-11 16:43:35 -08:00
Changming Sun	042964f633	Change how ONNX get installed	2021-02-10 14:41:26 -08:00
Edward Chen	e59cb9455e	Add CI build with type reduction enabled (#6622 )	2021-02-10 13:31:51 -08:00
Changming Sun	e70344e648	Fix training python packaging pipeline (#6613 ) In a previous PR, I set the docker file name to a wrong value.	2021-02-09 11:04:39 -08:00
Changming Sun	0b89f931d0	Update CUDA build configs (#6598 ) 1. Fix Nuget package build break caused by #6225 2. Delete Dockerfile.centos_gpu. It is not used anywhere. 3. Fix Linux CUDA 10.2 build error caused by glibc upgrade	2021-02-08 22:55:42 -08:00
Xavier Dupré	d3a2c8c1c7	Support double for operators ReduceMax, ReduceMin (#6265 ) * Support double for operators ReduceMax, ReduceMin * add unit test to pai-excluded-tests.txt Co-authored-by: xavier dupré <xavier.dupre@gmail.com>	2021-02-08 19:14:26 -08:00
Randy Shuai	ff063309b0	enable omp for debug build	2021-02-08 19:10:13 -08:00
Randy Shuai	6c5f50d00e	deprecate omp in ci	2021-02-08 19:10:13 -08:00
Jesse Benson	d18aa45b46	Enable more ROCM ops that are sharing CUDA code. Some are needed for Turing NLG models.	2021-02-06 14:40:34 -08:00
Changming Sun	b5bd14fc9f	Update GPU packaging pipelines to cuda11 and fix the other build break issues (#6585 ) Update gpu packaging pipelines to CUDA11 In the next release we will use CUDA 11. And our CUDA 11 build suddenly became broken because recently CentOS 7 posted an update of glibc. The version of glibc was changed from 2.17-317.el7 to 2.17-322.el7_9. But the newer one isn't compatible with CUDA 11. We have to downgrade it.	2021-02-05 16:58:37 -08:00
Chun-Wei Chen	f2ce3aae13	add set_model_dir and update ONNX (#6119 )	2021-02-05 09:30:49 -08:00
Jesse Benson	21a47ec8d9	Disable a couple more unsupported tests.	2021-02-04 15:00:05 -08:00
Jesse Benson	0b147702af	Update remaining reduction ops to use MIOpen. double datatype is not supported, so disable those typed kernels.	2021-02-04 15:00:05 -08:00
Jesse Benson	a28ddb85b6	Reduction ops.	2021-02-04 15:00:05 -08:00
Changming Sun	aa31ba5774	Merge CPU packaging pipelines (#6480 ) 1. Merge Nuget CPU pipeline, Java CPU pipeline, C-API pipeline into a single one. 2. Enable compile warnings for cuda files(*.cu) on Windows. 3. Enable static code analysis for the Windows builds in these jobs. For example, this is our first time scanning the JNI code. 4. Fix some warnings in the training code. 5. Enable code sign for Java. Previously we forgot it. 6. Update TPN.txt to remove Jemalloc.	2021-02-04 08:38:56 -08:00
Guoyu Wang	6cf54ff296	Switch Android CI java build to JDK 11 (#6552 ) * switch to jdk11 * fix java * Update	2021-02-03 17:49:23 -08:00
baijumeswani	62ac164279	Cache datasets on CI machines (#6525 )	2021-02-02 21:11:35 -08:00
ashbhandare	85434273ff	Fix CUDA Reduction kernel for ArgMax/ArgMin for when reduction dim=1 (#6490 ) * Fix for when reduction dim=1 * Disable test for AMD GPUs * Specify Async	2021-02-02 09:50:16 -08:00
Thiago Crepaldi	8a890ddfd7	Sync ORTModule branch with master and fix tests (#6526 ) * Deprecate Python global configuration functions [Part 1] (#5923) Enable options to be set via execution provider (EP)-specific options and log deprecation warning from current global configuration functions. * remove dnnl_dll_path from post build copy (#6142) * Model Fusion For Bart (#6105) Fusion fix for Bart models * Unify IExecutionProvider and IExecutionProviderFactory interfaces (#6108) * Remove Provider_IExecutionProvider and make the internal IExecutionProvider usable by shared providers * Change Provider_IExecutionProviderFactory to be the core version. * Enable running the mnist_training sample without cuda (#6085) Signed-off-by: George Nash <george.nash@intel.com> * nnapi add min max support (#6117) * Fix CUDA test hang: (#6138) - Make condition check in `CUDAAllocatorTest` to ensure CUDA device is present. * Fix TensorRT kernel conflict issue for subgraphs of control flow operators (#6115) * add static subgraph kernel index * change kernel naming to avoid conflicts * Add gradient registration for Abs. (#6139) * Partition initial optimizer state for Zero-1 (#6093) * Initial changes * Working changes * Working changes * Cleanup * fix windows CI * Review comments * review comments * Fix edge case in BFCArena where allocation failures could lead to an infinite loop. (#6145) #4656 * Revert "work around of the build break in mac (#6069)" (#6150) This reverts commit `3cae28699b`. * Fix clean_docker_image_cache.py detection of image pushes. (#6151) Fix clean_docker_image_cache.py detection of image pushes. They were being ignored because the expected HTTP status code was wrong. For pushes, it's 201 instead of 200. * MLAS: add NEON version of int8 depthwise convolution (#6152) * Using a map of of ops to stages as input of partition function. (#5940) * New partition algorithm running before AD * Convert cut_group_info into device map. 
Work in progress -- works for bert-tiny with pp=2 * Removing code for partition of bwd graphs * Remove old code * Adding some verification code * Handle Shared Initializer * Renaming rank with stage * Added first unit test * new test * redundant check * undo change in bert * Moved cut-based partition to testing utils file Co-authored-by: xzhu1900 Co-authored-by: wschin * New conversion function and tests * minor * remove test that is not needed2 * improve GetDeviceAssignment and PR comments * minor changes * PR comments * improving documentation and variable naming * add documentation * Variable naming and docs * more doc improvements * more doc improvements * missing static cast * Fix test file for windows * Fix test file for windows * Fix test file for windows * stage id is not the same as rank id * PR comments * PR comments * More comments * More comments * Minor fix to satisfy c++14 (#6162) * Deprecating Horovod and refactored Adasum computations (#5468) deprecated horovod submodule refactored adasum logic to be ort-native added tests for native kernel and e2e tests * Update TensorRT-ExecutionProvider.md (#6161) * Bugfix for topk cuda kernel (#6164) * fix the issue that std::numeric_limits cannot handle half type * adding a test Co-authored-by: Du Li <duli@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> * Revert "Fuse MatMulIntegerToFloat only when scales are scalar (#6008)" (#6169) This reverts commit `f2dcba7afe`. * Remove ignored build warnings for pybind on Mac (#6165) * save_checkpoint, load_checkpoint and aggregate_checkpoints (#6136) * save_checkpoint and load_checkpoint implementations * checkpoint aggregation logic * unit tests for save_checkpoint, load_checkpoint and aggregate_checkpoints * Don't try to bind unused inputs in the Training frontend (#6166) * Update documentation for contributing a PR and add deprecation notices for PyOp and ORT server. 
(#6172) * aggregate model states only for the case when mixed precision was true (#6176) * [NNAPI EP] Enable per-channel quantization for QlinearConv (#6155) * Enable qlinearconv per-channel quantization * Fix the android CI test failure * Add Android Version Check for Per-Channel Quant * Address PR comments * Fix some minor issues * Add verification of per-channel zero points * Make the error tolerance configurable * Fix typo in BERT pretraining script (#6175) A misplaced `}` meant that the `'enable_adasum'` option was interpreted incorrectly, causing the test to fail. * Update get_docker_image.py to enable use without image cache container registry. (#6177) Update get_docker_image.py to enable use without image cache container registry. * Helper for compiling EP to generate deterministic unique ids for use in MetaDef names (#6156) * Create a helper for generating unique ids that can be used by an EP that creates compiled nodes and needs ids to be deterministic for a model when used in multiple sessions. Added to IExecutionProvider as this can potentially be used by all compiling EPs and is more robust than a simplistic counter (although EP implementer is free to choose either approach). * Restructure the helper so it can be called across the EP bridge. Add ability to call id generation helper from EP bridge - convert DNNL EP to use helper to validate Address issue where a new Model may be loaded into the same address as a previous one. - hash the bytes in the Graph instance (1728 bytes currently) to use as the key to the full hash for the model Add lock around id generation to ensure no issues if multiple sessions partitions graphs at exactly the same time. - Extremely unlikely but would be hard to debug and the locking cost is not an issue as it's only incurred during graph partitioning and not execution. 
* Backend APIs for checkpointing (#5803) * Add backend API GetOptimizerState and GetModelState * add GetPartitionInfoMap * Android coverage dashboard (#6163) * Write the report to a file. * Post code coverage to the Dashboard database. * Add usage details of unified MCR container image (#6182) Going forward, a single unified docker image will be published in MCR. The hardware accelerator target choice will have to be made in the application using OpenVINO EP's runtime config options. * improve perf for softmax (#6128) * improve perf for both gathergrad and softmax * revert the change in gathergrad and will be done in another PR. * address comments from code review. * Tune fast Gelu to use exp(x) instead of tanh(x) on Rocm platform (#6174) * tune fast gelu to use exp(x) instead of tanh(x) on rocm * update to use expression 2/(1+exp(-2x))-1 for stability * Add Status.csv to EP Perf Tool (#6167) * merge master, keep postprocess status commit * download float16.py everytime * removing hardcoded values * Lochi/quantization tool for trt (#6103) * Initial implementation of generating calibration dynamic range table * Initialize validation support for Quantization * Initialize validation support for Quantization (cont.) * Improve validation support for Quantization * Improve validation support for Quantization * Rewrite/Refine for calibration and validation * Rewrite/Refine for calibration and validation (cont.) 
* Refine code * Refine code * Add data reader for BERT * Add flatbuffers to serialize calibration table * Refine code and add BERT evaluation * Refine the code * minor modification * Add preprocess/postprocess of vision team yolov3 and refine the code * Update annotation * Make bbox cooridates more accurate * Fix bug * Add support of batch processing * Batch processing for model zoo yolov3 * Add batch inference for evaluation * Refine the code * Add README * Add comments * Refine the code for PR * Remove batch support checking in data_reader and refine the code * Refine the code for PR * Refine the code for PR review Co-authored-by: Olivia Jain <oljain@microsoft.com> * Implement ScatterND for CUDA EP (#6184) * Condition fix in Resize operator (#6193) * Clean up checkpoint tests to use the new checkpoint functions (#6188) * add deprecation warning for old checkpoint functions * update all the distributed checkpoint tests to use new checkpoint functions * Implement comparing outputs that are sequence of maps of strings to floats (#6180) * Implement conversion from ortvalue to Itensor for string tensors and comparing sequence of maps of strings to floats * PR comments * Dockerfile to build onnxruntime with ROCm 4.0 * Add ability to skip GPU tests based on GPU adapter name (#6198) * Implement conversion from ortvalue to Itensor for string tensors and comparing sequence of maps of strings to floats * PR comments * Add ability to skip gpu tests according to adapter description * spacing * spacing * spacing * Openvino ep 2021.2 (#6196) * Enabling fasterrcnn variant and vehicle detector * changes for 2021_2 branch * yolov3_pytorch commit * fixed braces in basic_backend.cc * ci information added * faster rcnn variant and vehicle detector changes were made in 2021.1 and not in 2021.2 * some changes to support unit tests * disable some tests which are failing * fix myriad tests for vehicle detector * Did some cleanup cleaned up comments Disabled Add_Broadcast_0x1 and 
Add_Broadcast_1x0 tests on MYRIAD_FP16 backend due to a bug cleaned up capability_2021_2.cc file Removed extra conditions which were added for some validation in backend_utils Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * yolov3 pytorch workaround to ensure that the output names are matched * gemmoptest fixed on myriad * Fixed MYRIADX CPP Test Failures Expand,GatherND,Range,Round op's are only supported in model where op with float input data types are not supported and fixed Scatter and ScatterElements op's with negative axis are fixed Reshape op with 0 dim value are not supported and fixed Disabled InstanceNorm_2 test on MYRIADX Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> make changes to yolov3 pytorch * Fixed python unit tests Fixed failing python tests on vpu, GPU and CPU Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> Fixes POW op failures on GPU_FP16 Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Clean up capability_2021_2.cc Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Updated docx for MultiThreading option Added extra info on setting the num_of_threads option using the API and it's actual usage Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> fixed slice and removed extra prints * Disabled failing python tests Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Minor changes added in capabilty_2021_2 Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * made changes to slice to avoid failures * Disabling FP16 support for GPU_FP32 ->Inferencing an FP16 model on GPU_FP32 leads to accuracy mismatches. 
so, we would rather use GPU_FP16 to infer an FP16 model on GPU Device Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Updated docx for Inferencing a FP16 Model Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * fix for mask rcnn * Script for installing openvino from source * Updated with openvino 2021.2 online installation * code comment fixes fixed accuracy mismatch for div * Update OpenvinoEP-ExecutionProvider.md updated for 2021.2 branch * Update README.md updated dockerfile documentation * Update BUILD.md build.md update documentation * permissiong change of install_openvino.sh * made changes to align with microsoft onnxruntime changes * Updated with ov 2021.2.200 Co-authored-by: suryasidd <surya.siddharth.pemmaraju@intel.com> Co-authored-by: sfatimar <sahar.fatima@intel/com> Co-authored-by: MaajidKhan <n.maajidkhan@gmail.com> Co-authored-by: mohdansx <mohdx.ansari@intel.com> * Fix a memory leak in test_inference.cc (#6201) * Fix a memory leak in test_inference.cc * Use TArray in AMD element-wise kernels, rather than manually copying memory to device. * Remove most ROCm-specific element-wise code and reuse CUDA element-wise code. * Minor change to improve performance for operator Pad. (#5537) * small improvment for pad * Support double for operators Log, Reciprocal, Sum (CPU) (#6032) * Support double for operators Log, Reciprocal, Sum * remove tesdt erf_double * Support double for operators Where, LpNormalisation (#6034) * Support double for operators Relu, Tanh, Sigmoid (#6221) * Fix ImportError in build.py (#6231) There is a possible ImportError where build.py can import the wrong 'util' package if there are others present in `sys.path` already * Removed executor todo that looks dead. (#6234) * Remove MKLML/openblas/jemalloc build config (#6212) * Remove python 3.5 * Update the readme file * Upgrade build.py to assert for python 3.6+ Upgrade build.py to assert for python 3.6+ as python 3.5 cannot build anymore todays master. 
* Support MLFloat16 type in Pow opset-12 CUDA kernel (#6233) * MLAS: handle MlasGemm(M/N/K==0) cases (#6238) * Support double for operator TopK + fix one bug in TopK implementation for GPU for double (#6220) * Support double for operator TopK * add static classes for topk/double * fix cast issue in topk * Support double for operator Gemm + fix bug in gemm implementation for cuda, rocm when sizeof(type) != sizeof(float) (#6223) * Support double for operator Gemm * fix type size while copying data in gemm operator for GPU * fix type in gemm implementation for rocm * Support double for operator ReduceMean, ReduceLogSumExp (#6217) * Support double for operators ReduceMean, ReduceLogSumExp * Support double for operator ArgMin (#6222) * Support double for operator ArgMin * add test specifically for double * add new test on pai-excluded-tests.txt * Update BUILD.md * Update manylinux docker image to the latest (#6242) * Fix allocator issue for TensorRT IOBinding (#6240) * Fix issue: https://github.com/microsoft/onnxruntime/issues/6094 Root cause: we didn't expose the OrtMemoryInfo for TRT, so it will cause issue if user want use IObinding for Tensorrt. Short term fix, add the OrtMemoryInfo for TRT. Long term should unify the allocator for CUDA and TRT * Tune BiasGeluGradDx kernel in approximation mode to avoid tanh(...) on Rocm (#6239) * bias gelu grad use exp(...) 
instead * update cuda to rocm * missing semicolon * comment * remove dockerfile * missing factor of two * Refactor EP Perf Tool (#6202) * merge master, keep postprocess status commit * download float16.py everytime * using variables to reference eps * adding ACL EP to ep perf tool * accuracy with absolute tolerance configurable * add acl to dict + remove commented line * Documentation for distributed CI tests pipeline (#6140) * Remove a debug log in provider_test_utils.cc (#6200) * Add the Concat Slice Elimination transform, fix constant_folding transform (#5457) * Add concat slice transform + test * Cosmetic improvements in concat slice transform * Remove unrelated file, fix comment, fix constant folding bug * Add test onnx graph * fix windows build * Review comments * review comment Co-authored-by: Aishwarya <aibhanda@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> * Add MakeStringLite which uses current locale, update some MakeString call sites to use it instead. (#6252) * Add MakeStringLite which uses current locale, update macros to use that to generate messages. * Convert calls to MakeStringLite(). * Liqun/speech model loop to scan (#6070) Provide a tool to convert Loop to Scan for Nuphar performance Fix Nuphar CI pipeline failures. 
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> * model parallel refinement (#6244) * Megatron Transformation as a separate step * remove useless header * clang formatting * Re-Structure megatron transformer for subsequent changes * fix comments * Allow querying a GraphProto's doc_string as part of ModelMetadata (#6248) * Fix Linux/Mac error message on input type mismatch (#6256) * add bfloat16 to gathergrad type constraints (#6267) Co-authored-by: Cheng Tang <chenta@microsoft.com> * Fix VS 2017 build break (#6276) * Deprecate Python global configuration functions [Part 2] (#6171) Update Python API to allow more flexibility for setting providers and provider options. The providers argument (InferenceSession/TrainingSession constructors, InferenceSession.set_providers()) now also accepts a tuple of (name, options dict). Fix get_available_providers() API (and the corresponding function in the C API) to return the providers in default priority order. Now it can be used as a starting point for the providers argument and maintain the default priority order. Convert some usages of the deprecated global configuration functions to use EP-specific options instead. Update some EP-specific option parsing to fail on unknown options. Other clean up. * Add script to preprocess python documentation before publishing (#6129) * add script to preprocessing python documentation before publishing * rename past to past_key_values for GPT-2 (#6269) rename past to past_key_values for transformers 4.* * Rename MakeString and ParseString functions. (#6272) Rename MakeString to MakeStringWithClassicLocale, MakeStringLite to MakeString, ParseString to ParseStringWithClassicLocale. Add missing pass-through versions of MakeStringWithClassicLocale for string types. * Increase timeout for Linux GPU CUDA11 build. 
(#6280) * Add helper to compare model with different precision (#6270) * add parity_check_helper.py * add real example * remove lines * Fix Min/Max CPU kernels for float16 type (#6205) * fix data_ptr assertion error for past_sequence_length=0 in GPT-2 (#6284) fix io binding crash for past_sequence_length=0 * A list of changes in transformers tool (#6224) * longformer fp16 e2e * add fp16/fp32 parity check helper file * excludes nodes with subgraph in profiling * use onnxconverter_common to do fp32->fp16 * add version check for onnxconverter_common * remove helper file * add pkg installation on notebooks and script * Workaround for static_cast<double>(half) * Add workaround to remove ROCm-specific binary-elementwise files. * Update nuget build (#6297) 1. Update the ProtoSrc path. The old one is not used anymore. 2. Regenerate OnnxMl.cs 3. Delete some unused code in tools/ci_build/build.py 4. Avoid set intra_op_param.thread_pool_size in ModelTests in OpenMP build. 5. Fix a typo in the C API pipeline. * Enable ONNX backend test of SequenceProto input/output (#6043) * assert sequence tensor and remove skips * update testdata json * use ONNX 1.8 in cgmanifest.json * use previous commit to workaround * update ONNX commit ID in docker * skip test_maxpool_2d_dilations test for now * update function name * add --sequence_lengths option (#6285) * more dtype for Equal CUDA kernel (#6288) Co-authored-by: Vincent Wang <weicwang@microsoft.com> * Force reinstall onnx python package on Windows (#6309) * update transformers required package versions (#6315) * Remove abs in LpPool (#6303) * Support 1D input for Conv + Mul/Add fusion optimizer with test (#6295) * Support 1D input (N C H) for Conv + Mul/Add fusion optimizer with test cases and test models. 
* Add longformer to python package (#6314) * add longformer to python package * move test related script and data to a new folder * Avoid false sharing on thread pool data structures (#6298) Description: This change adds alignment and padding to avoid false sharing on fields in the thread pool. It also adds a new microbenchmark to profile thread-pool performance over short loops. Motivation and Context MobileNet on a 2*12-core system showed a performance gap between the ORT thread pool and OpenMP. One cause appeared to be false sharing on fields in the thread pool: ThreadPoolParallelSection::tasks_finished (which the main thread spins on waiting for workers to complete a loop), and the RunQueue::front_ and back_ fields (used respectively by the worker thread and the main thread). The additional micro-benchmark BM_ThreadPoolSimpleParallelFor tests performance of loops of different sizes at different thread counts. The results below are on a machine with 2*14-core processors (E5-2690 v4) running with 1, 14, 15, and 28 threads. For each test, the microbenchmark has N threads run a loop with N iterations; hence a perfect result is for the time taken to be constant as additional threads are added (although we will also see power management effects helping at very low thread counts). The loop durations (100000, 10000, 1000) correspond roughly to 200us, 20us, and 2us on this machine. 
Before change: BM_ThreadPoolSimpleParallelFor/1/1/100000/real_time 17153 us 17154 us 32 BM_ThreadPoolSimpleParallelFor/14/14/100000/real_time 22553 us 22553 us 30 BM_ThreadPoolSimpleParallelFor/15/15/100000/real_time 21521 us 21521 us 29 BM_ThreadPoolSimpleParallelFor/28/28/100000/real_time 24111 us 24111 us 24 BM_ThreadPoolSimpleParallelFor/1/1/10000/real_time 1719 us 1719 us 407 BM_ThreadPoolSimpleParallelFor/14/14/10000/real_time 3409 us 3409 us 200 BM_ThreadPoolSimpleParallelFor/15/15/10000/real_time 3541 us 3541 us 201 BM_ThreadPoolSimpleParallelFor/28/28/10000/real_time 4576 us 4576 us 151 BM_ThreadPoolSimpleParallelFor/1/1/1000/real_time 174 us 174 us 4017 BM_ThreadPoolSimpleParallelFor/14/14/1000/real_time 1586 us 1586 us 402 BM_ThreadPoolSimpleParallelFor/15/15/1000/real_time 1586 us 1586 us 397 BM_ThreadPoolSimpleParallelFor/28/28/1000/real_time 2864 us 2864 us 232 After change: BM_ThreadPoolSimpleParallelFor/1/1/100000/real_time 17160 us 17160 us 33 BM_ThreadPoolSimpleParallelFor/14/14/100000/real_time 20989 us 20989 us 31 BM_ThreadPoolSimpleParallelFor/15/15/100000/real_time 22286 us 22286 us 31 BM_ThreadPoolSimpleParallelFor/28/28/100000/real_time 24631 us 24631 us 25 BM_ThreadPoolSimpleParallelFor/1/1/10000/real_time 1718 us 1718 us 407 BM_ThreadPoolSimpleParallelFor/14/14/10000/real_time 2868 us 2868 us 242 BM_ThreadPoolSimpleParallelFor/15/15/10000/real_time 2907 us 2907 us 240 BM_ThreadPoolSimpleParallelFor/28/28/10000/real_time 3872 us 3872 us 186 BM_ThreadPoolSimpleParallelFor/1/1/1000/real_time 175 us 175 us 3938 BM_ThreadPoolSimpleParallelFor/14/14/1000/real_time 933 us 933 us 659 BM_ThreadPoolSimpleParallelFor/15/15/1000/real_time 912 us 912 us 591 BM_ThreadPoolSimpleParallelFor/28/28/1000/real_time 1976 us 1976 us 317 * fix opset imports for function body (#6287) * fix function opsets * add tests and update onnx * changes per review comments * add comments * plus updates * build fix * Remove false positive prefast warning from threadpool 
(#6324) * Java: add Semmle to Java publishing pipelines (#6326) Add Semmle to Java API pipeline Add security results publishing and add Java GPU. * Quantization support for split operator with its NHWC support (#6107) * Make split working for quantization. * NHWC transformer support for split operator * Refactor some according to Feedback. Will add test cases soon. * Fix build error on windows. * Add test case for split op on uint8_t support * Add nhwc_transformer_test for split uint8_t support * Some change according to PR feedbacks. * Liqun/enable pipeline parallel test (#6331) enable pipeline parallel test Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> * Use onnxruntime_USE_FULL_PROTOBUF=OFF for the cuda execution provider (#6340) This removes a special case of the cuda EP. * MLAS: add fallback implementation for quantized GEMM (#6335) Add a non-vectorized version of the kernel used for the quantized version of MlasGemm. * Delete float16.py (#6336) No longer needed. Also doesn't pass policheck. * Enable add + softmax fusion for Rocm platform (#6259) * add bias softmax; tests appear to pass * check fusion occurs for rocm as well * check for rocm provider compatible as well * build for cpu scenario as well * try again; broader cope * proper scope on kGpuExecutionProvider * been editing wrong file * remove commented #include lines * try again due to mac os ci error * try again * test fusion both cuda and rocm to avoid mac ci error * add external data support to tensor proto utils (#6257) * update unpack tensor utilities to support loading external data * more updates * fix test * fix nuphar build * minor build fix * add tests * fix Android CI * fix warning * fix DML build failure and some warnings * more updates * more updates * plus few updates * plus some refactoring * changes per review * plus some change * remove temp code * plus updates to safeint usage * build fix * fix for safeint * changed wording. 
(#6337) * Remove OpSchema dummy definition. Only needed for Function now, and we can just exclude the method in Function (#6321) * remove gemmlowp submodule (#6341) * [NNAPI] Add pow support (#6310) * Add support for running Android emulator from build.py on Windows. (#6317) * fix the pipeline failure (#6346) * Train BERT Using BFloat16 on A100 (#6090) * traing bert using bf16 * Adam support bf16 * bugfix * add fusedmatmul support * fix after merge from master. * bugfix * bugfix after merge from master * fast reduction for bf16. * resolve comments * fix win build * bugfix * change header file. Co-authored-by: Vincent Wang <weicwang@microsoft.com> * Fix DerefNullPtr issues raised by SDLNativeRules. (#6348) * update quantize to support basic optimization and e2e example for image classification (#6313) update the resnet50-v1 to standard one from onnx zoo. add an example for mobilenet run basic optimization before quantization fix a bug in Clip * Enable graph save for orttrainer (#6333) * Enable graph save for orttrainer * Fix CI * Update orttraining/orttraining/python/training/orttrainer_options.py * Update orttraining/orttraining/python/training/orttrainer_options.py * Update orttraining/orttraining/python/training/orttrainer_options.py * Update orttraining/orttraining/python/training/orttrainer_options.py * Update orttraining/orttraining/python/training/orttrainer_options.py Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * Add PREfast to python packaging pipeline (#6343) * Add PREfast to python packaging pipeline * fix longformer benchmark io_binding output_buffers (#6345) * fix longformer benchmark io_binding output_buffers * format * import benchmark_helper from parent directory. * Use readelf for minimal build binary size checks. (#6338) * Use readelf for minimal build binary size checks. The on-disk size grows in 4KB chunks which makes it hard to see how much growth an individual checkin causes. 
Only downside is that the sum of the sections is larger than the on-disk size (presumably things get packed smaller on disk and some of the section alignment constraints can be ignored) * Remove unused function * Java: Set C language warnings to W4 and adjust JNI code (#6347) Set /W4 for C language and fix up JNI warnings. * Pipeline Parallel Experimental Python API (#5815) * Add create session to WinML telemetry to track WinML Usage (#6356) * Fix one more SDL warning (#6359) * fix -Wdangling-gsl (#6357) * Add python example of TensorRT INT8 inference on ResNet model (#6255) * add trt int8 example on resnet model * Update e2e_tensorrt_resnet_example.py * remove keras dependency and update class names * move ImageNetDataReader and ImageClassificationEvaluator to tensorrt resnet example * simplify e2e_tensorrt_resnet_example.py * Update preprocessing.py * merge tensorrt_calibrate * Update calibrate.py * Update calibrate.py * generalize calibrate * Update calibrate.py * fix issues * fix formatting * remove augment_all * This added telemetry isn't needed (#6363) * Wezuo/memory analysis (#5658) * merged alloc_plan * pass compilation * Start running, incorrect allocation memory info * add in comments * fix a bug of recording pattern too early. * debugging lifetime * fix lifetime * passed mnist * in process of visualization * Add code to generate chrome trace for allocations. * in process of collecting fragmentation * before rebuild * passed mnist * passed bert tiny * fix the inplace reuse * fix the exception of weight in pinned memory * add guards to ensure the tensor is in AllocPlan * add customized profiling * debugging * debugging * fix the reuse of different location type * add rank * add the rank * add fragmentation * add time_step_trace * Add summary for each execution step (total bytes, used/free bytes). 
* add top k * change type of top k parameter * remove prints * change heap to set{ * add the name pattern * add the useage for pattern * add partition * change to static class * add custom group * remove const * update memory_info * in process of adding it as runtime config * change the memory profiling to be an argument * add some comments * add checks to recored meomry_info in traaining session * set the "local rank setting" to correct argument. * addressing comments * format adjustment * formatting * remove alloc_interval * update memory_info.cc to skip session when there is no tensor for a particular memory type * fix memory_info multiple iteration seg-fault * consolidate mainz changes * fixed some minor errors * guard by ORT_MINIMAL_BUILD * add ORT_MEMORY_PROFILE flag * added compiler flag to turn on/off memory profiling related code * clean up the code regarding comments * add comments * revoke the onnx version * clean up the code to match master * clean up the code to match master * clean up the code to match master Co-authored-by: Jesse Benson <benson.jesse@gmail.com> Co-authored-by: Wei Zuo <wezuo@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: wezuo <wezuo@az-eus-v100-32gb-5-worker-mgtbby.eastus.cloudapp.azure.com> Co-authored-by: wezuo <wezuo@az-eus-v100-32gb-5-worker-yclzsf.eastus.cloudapp.azure.com> * Support MLFloat16 in CumSum Cuda op for Opset 14 (#6355) * Add CumSum-14 for Cuda * fix convert_common version retrival (#6382) * Refine auto_pad based pad computation in ConvTranspose (#6305) * Fix SDL warning (#6390) * Add max_norm for gradient clipping. 
(#6289) * add max_norm as user option for gradient clipping * add adam and lamb test cases for clip norm * add frontend tests * Add the custom op project information (#6334) * Dont use default string marshalling in C# (#6219) * Fix Windows x86 compiler warnings in the optimizers project (#6377) * [Perf] Optimize Tile CPU and CUDA kernels for a corner case (#6376) * Unblock Android CI code coverage failure (#6393) * fix build on cuda11 (#6394) Co-authored-by: Vincent Wang <weicwang@microsoft.com> * Load the model path correctly (#6369) * Fix some compile warnings (#6316) * OpenVino docker file changes to bypass privileged mode Description: Builds and installs libusb without UDEV support, which is used for communicating with the VPU device. Motivation and Context This enables the resulting docker container to be run without '--privileged' and '--network host' options which may not be suitable in deployment environments. * Megatron checkpointing (#6293) * Add bart fairseq run script * Add frontend change to enable megatron * Initial changes for checkpointing * Megatron optim state loading, checkpoint aggregation, frontend distributed tests for H, D+H * Add load_checkpoint changes * Fix CI * Cleanup * Fix CI * review comments * review comments * review comments: * Fix generate_submodule_cgmanifest.py Windows issues. (#6404) * Continue memory planning when unknown shape tensor is encountered. (#6413) * Reintroduce experimental api changes and fix remote build break (#6385) Co-authored-by: Ori Levari <orlevari@microsoft.com> * Add support for custom ops to minimal build. (#6228) * Add support for custom ops to minimal build. Cost is only ~8KB so including in base minimal build. 
* enable pipeline to run quantization tests (#6416) * enable pipeline to run quantization tests setup test pipeline for quantization * Minor cmake change (#6431) * Liqun/liqun/enable pipeline parallel test2 (#6399) * enable data and pipeline parallism test Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> * Farewell TrainableDropout (#5793) * Deprecate TrainableDropout kernel. * Update bert_toy_postprocessed.onnx to opset 12. * Add more dropout tests. * Fix BiasDropout kernel. Co-authored-by: Ubuntu <OrtTrainingDev3@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Sergii Dymchenko <sedymche@microsoft.com> * fix null dereference warning (#6437) * Expose graph ModelPath to TensorRT shared library (#6353) * Update graph_viewer.cc * Update tensorrt_execution_provider.cc * Update graph_viewer.h * Update tensorrt_execution_provider.cc * Update tensorrt_execution_provider.cc * Update provider_api.h * Update provider_bridge_ort.cc * Update provider_interfaces.h * Update provider_interfaces.h * expose GraphViewer ModelPath API to TRT shared lib * add modelpath to compile * update * add model_path to onnx tensorrt parser * use GenerateMetaDefId to generate unique TRT kernel name * use GenerateMetaDefId to generate unique TRT engine name * fix issue * Update tensorrt_execution_provider.cc * remove GetVecHash * Update tensorrt_execution_provider.h * convert wchar_t to char for tensorrt parser * update tensorrt parser to include latest changes * fix issues * Update tensorrt_execution_provider.cc * merge trt parser latest change * add PROVIDER_DISALLOW_ALL(Path) * add tool for generating test data for longformer (#6415) * only build experimental api in redist (#6465) Co-authored-by: Sheil Kumar <sheilk@microsoft.com> * Add an option to save the training graph after optimization (#6410) * 
expose optimized_model_filepath in SessionOptions as `debug.graph_save_paths.model_with_training_graph_after_optimization_path` in `ORTTrainerOptions` * Share allocator between CUDA EP & TRT EP. (#6332) * Share allocator between CUDA EP & TRT EP. limitation: 1. Does not cover the per-thread allocator created by CUDA EP, still need to figure out the way to remove it 2. Need to have more identifiers to make it able to share CPU allocator across all EPs * fix max norm clipping test in python packaging pipeline test (#6468) * fix python packaging pipeline * make clip norm test compatabile with both V100 and M60 GPUs * Initial version of CoreML EP (#6392) * Bug 31463811: Servicing: Redist (Nuget) conflicts with Microsoft.AI.MachineLearning starting 21H1+ (#6460) * update load library code to have the fullly qualified path * make it work for syswow32 * git Revert "make it work for syswow32" This reverts commit b9f594341b7cf07241b18d0c376af905edcabae3. Co-authored-by: Sheil Kumar <sheilk@microsoft.com> * dequantize 1st input of lstm back if it is quantized (#6444) * [java] Adds support for OrtEnvironment thread pools (#6406) * Updates for Gradle 7. * Adding support for OrtThreadingOptions into the Java API. * Fixing a typo in the JNI code. * Adding a test for the environment's thread pool. * Fix cuda test, add comment to failure. * Updating build.gradle * fix SDL native rule warning #6246 (#6461) * fix SDL rule (#6464) * use tickcount64 (#6447) Co-authored-by: Ori Levari <orlevari@microsoft.com> * Update pypi package metadata (#6354) * Update setup file data * add missing comma * remove python 3.5 * fix typo bracket * Delete nuget extra configs (#6477) * Op kernel type reduction infrastructure. (#6466) Add infrastructure to support type reduction in Op kernel implementations. Update Cast and IsInf CPU kernels to use it. * Fixing a leak in OnnxSequences with String keys or values. 
(#6473) * Increase the distributes tests pipeline timeout to 120 minutes (#6479) * [CoreML EP] Add CI for CoreML EP (macOS) and add coreml_flags for EP options (#6481) * Add macos coreml CI and coreml_flags * Move save debuggubg model to use environment var * Move pipeline off from macos CI template * Fix an issue building using unix make, add parallel to build script * Fixed build break for shared_lib and cmpile warning * Fix a compile warning * test * Revert the accidental push from another branch This reverts commit 472029ba25d50f9508474c9eeceb3454cead7877. * Add ability to track per operator types in reduced build config. (#6428) * Add ability to generate configuration that includes required types for individual operators, to allow build size reduction based on that. - Add python bindings for ORT format models - Add script to update bindings and help info - Add parsing of ORT format models - Add ability to enable type reduction to config generation - Update build.py to only allow operator/type reduction via config - simpler to require config to be generated first - can't mix a type aware (ORT format model only) and non-type aware config as that may result in insufficient types being enabled - Add script to create reduced build config - Update CIs * merge e2e with distributed pipeline (#6443) merge e2e with distributed pipeline * Fix test breaks in Windows ingestion pipeline (#6476) * fix various build breaks with Windows build * fix runtime errors loading libraries from system32 * add build_inbox check to winml_test_common * use raw string * cleanup * fix dll load Co-authored-by: Sheil Kumar <sheilk@microsoft.com> * Speed up the Mac CI runs (#6483) * expose learningmodelpixelrange property (#5877) * Fix of support api version bug for [de]quantize (#6492) * SDL fixes: add proper casts/format specifiers (#6446) * SDL annotation fixes (#6448) Co-authored-by: Ori Levari <orlevari@microsoft.com> * [OpenVINO-EP] Remove support for OpenVINO 2020.2 (#6493) * Removed 
OpenVINO 2020.2 support * Updated documentation and build.py * Removed unnecessary libraries from setup.py * Support pad operator in quantization and quantized nhwc transformer. Fix Pad operator bug. (#6325) Support pad operator in quantization tool. Support pad operator in quantized nhwc transformer. Fix pad() operator bug when pad input's inner(right) most axis value is zero for Edge and Reflect mode, it copied wrong value to the cells to be padded. Note the Constant mode will not trigger this bug, as Edge/Reflect need copy value from the already copied array while Constant mode only fill specified value. Add more test cases to cover pad() operator bug fixed here. Fix quantization tools uint8/int8 value overflow issue when quantize weights in python. * Improve work distribution for Expand operator, and sharded LoopCounter configuration (#6454) Description: This PR makes two changes identified while looking at a PGAN model. First, it uses ThreadPool::TryParallelFor for the main parallel loops in the Expand operator. This lets the thread pool decide on the granularity at which to distribute work (unlike TrySimpleParallelFor). Profiling showed high costs when running "simple" loops with 4M iterations each of which copied only 4 bytes. Second, it updates the sharded loop counter in the thread pool so that the number of shards is capped by the number of threads. This helps make the performance of any other high-contention "simple" loops more robust at low thread counts by letting each thread work on its own "home" shard for longer. Motivation and Context Profiling showed a PGAN model taking 2x+ longer with the non-OpenMP build. The root cause was that the OpenMP build uses simple static scheduling of loop iterations, while the non-OpenMP build uses dynamic scheduling. The combination of large numbers of tiny iterations is less significant with static scheduling --- although still desirable to avoid, given that each iteration incurs a std::function invocation. 
* Update document of transformer optimization (#6487) * nuphar test to avoid test data download to improve passing rate (#6467) nuphar test to avoid test data download to improve passing rate * Fuse cuda conv with activation (#6351) * optimize cuda conv by fused activation * remove needless print out * exclude test from cpu * handle status error from cudnn 8.x * add reference to base class * add hipify * [CoreML EP] Add support for some activations/Transpose, move some shared helpers from NNAPI to shared space (#6498) * Init change * Move some helper from nnapi ep to shared * Add transpose support * Fix trt ci build break * Refine transformers profiler output (#6502) * output nodes in the original order; grouped by node name * add document for profiler * Update to match new test setup. (#6496) * Update to match new test setup. * Add Gemm(7) manually for now. Will fix properly on Monday. It's used by mnist.ort as that is created by optimizing mnist.onnx to level 1 causing 2 nodes to be replaced by a Gemm and the op to be missing from the required list as that is created using the original onnx model. 
* Enable dense sequence optimized version of Pytorch exported BERT-L on AMD GPU (#6504) * Permit dense seq optimization on BERT-L pytorch export by enabling ReduceSumTraining, Equal, and NonZero on AMD * enable Equal tests * enable fast_matrix_reduction test case * Optimize GatherGrad for AMD GPU (#6381) * optimize gathergrad * address comments Co-authored-by: Weixing Zhang <wezhan@microsoft.com> * add explicit barriers for buffer overread and overrwrite (#6484) Co-authored-by: Ori Levari <orlevari@microsoft.com> * fix sdl bugs for uninitialized variables and returns (#6450) Co-authored-by: Ori Levari <orlevari@microsoft.com> * handle hr error conditions (#6449) Co-authored-by: Ori Levari <orlevari@microsoft.com> * Dnnl training (#6045) * Add ReluGrad and ConvGrad ops for the dnnl provider * the mnist sample is updated to add the --use_dnnl option that will cause the sample to use the dnnl execution provider for nodes that exist in dnnl provider. * Added the ability to find forward ops. Dnnl backward gradient ops require the forward primitive description and workspace from the forward operation. * Enable specifying the execution provider for Gradient Checker Tests * Prevent memory leak when running dnnl_provider in training mode Prevent creating a SubgraphPrimitivePool when the code is built with the ENABLE_TRAINING build flag. Instead create a SubgraphPrimitive directly. The SubgraphPrimitivePool was causing a pool of SubgraphPrimitives to be stashed in a map for reuse. Due to the way the Training Loop uses threads the pool of SubgraphPrimitives were not being reuse instead a new pool of SubgraphPrimitives being created each run. The old pool was not instantly freed. This behavior could be a language error when using thread_local memory. Signed-off-by: George Nash <george.nash@intel.com> * Added fixes to maxpoolgrad and memory leak. Maxpoolgrad will now pass all unit tests. 
With the conv and convgrad disabled for dnnl, mnist is able to train till 95% Signed-off-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com> * Fixed misc issues when testing training code with dnnl provider * fix conv_grad dnnl tests with dilation to run dnnl execution provider * update mnist training sample to accept convolution type models convolution models require the input shape to be {1, 28, 28} instead of the flat {784} image that is used for the gemm models this will enable models that require the different shape by adding `--model_type conv` to the command line when running the mnist sample. (while testing a workaround was used see #4762) * Disable weight caching in dnnl conv operator when using training When training we can not use cached weights because the weight will be updated each run. This re-enables dnnl Conv and ConvGrad Ops. The weight caching was the source of the error from Conv when training. * Fix issues found when building grad ops on Linux * The dnnl_convgrad code was overusing the scope operator causing a compilation problem. * The dnnl_maxpoolgrad code had a logic error in that it was comparing with the source description when it should have been comparing with the destination description. * Update BUILD.md so it shows DNNL for training * Updated the table of contents. Since the same providers are listed twice. Once for Inference and again for Training an HTML anchor was added to distinguish the second header from the first for the TOC. * Fix build failure when not using --enable-training build option * reorganize the gradient operators so they are grouped together * Fix issues found when running onnx_backend_test_series.py * Pooling code only supports 2 outputs when built with --enable-training * Address code review feedback * class member variables end in underscore_ * use dst instead of dist to match pattern used elsewhere in DNNL code. 
* Remove workaround that was introduced to handle problems running convolution based training models. See issue #4762 Signed-off-by: George Nash <george.nash@intel.com> * Isolate training code and code cleanup * Do not build dnnl_gpu_runtime if enable_training is set; training code does not support dnnl_gpu_runtime yet. * Isolated Training code inside ifdefs so that they won't affect project if built without training enabled * Inadvertent changes in whitespace were removed to make code review simpler * Undid some code reordering that was not needed * comments added to closing #endif statements to simplify reading complex ifdefs * Modified the GetPrimitiveDesc functions to return shared_ptr instead of raw pointer. This matches what was done in Pool code and is safer memory code. Signed-off-by: George Nash <george.nash@intel.com> * Address code review issues - whitespace changes caused by running clang-format on the code - Several spelling errors fixed - Removed/changed some ifdefs to improve readability - other misc. changes in response to code review. Signed-off-by: George Nash <george.nash@intel.com> * Code changes to address code review - Simplify iteration code using `auto` keyword - remove C style cast that was not needed - remove instance variable that was not needed [relugrad.h] - added the execution providers to `ComputeGradientErrorInternal()` and `ComputeTheoreticalJacobianTranspose()` instead of using a pointer to an instance variable [gradient_checker.h/.cc] Signed-off-by: George Nash <george.nash@intel.com> * Combined the default gradient ops test and dnnl gradient ops test for ConvGrad and MaxPoolGrad into one function with the help of a helper function. This will reduce repeated code. Signed-off-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com> * Replaced the stack used by convgrad to vector so that the vector(used as stack) can be easily cleared every time the graph is created. 
This will prevent memory leak from convolution kernels being pushed constantly onto the stack. Signed-off-by: chethan.palangotu.keshava@intel.com * Code clean up and formating updates - Removed empty else statment - updated indentation of code that was causing double curly brackets to look unususal - Changed check for NumDimensions to Size in Relu and ReluGrad error checking code. - isolated training code Signed-off-by: George Nash <george.nash@intel.com> * Restore inadvertantly removed ConvGrad tests When combining the DNNL and CPU version of the ConvGrad tests two test were inadvertantly excluded. This adds back the Conv3d and Conv3d with strides test cases. Signed-off-by: George Nash <george.nash@intel.com> * Add validation to ConvGrad This validates the dimensions of the ConvGrad match the passed in Convolution forward primitive description. The current code for DNNL ConvGrad makes the assumption that the ConvGrad nodes will be visited in the reverse order from the corresponding Conv nodes The added validation will return an error if this assumption is not true. Signed-off-by: George Nash <george.nash@intel.com> * Do not create new execution providers in provider_test_utils This removes the code that generated new execution providers in the OpTester::Run function. This was added because the std::move was leaving the `entry` value empty so subsequent calls would cause a segfault. Problem is this potentially changed the execution_provider because it would create the default provider dropping any custom arguments. When the now removed code was originally added the std::move was causing crashes when the GradientChecker unit tests were run. However, it is no longer causing problems even with the code removed. Signed-off-by: George Nash <george.nash@intel.com> * Change the forward conv stack to a forward conv map This changes how the forward conv kernel is mapped to the bwd ConvGrad kernel the problematic stack is no longer used. 
The convolution stack made the assumption that the corresponding ConvGrad operator would be visited in reverse order of the forward Conv operators. This was always problematic and was unlikely to work for inception models. Important changes: - The weight_name is added to the ConvGrad dnnl_node making it possible to use the weight_name as a lookup key to find the Conv forward Kernel - the `std::vector fwd_conv_stack_` has been replaced with a `std::map fwd_conv_kernel_map_` - Although it is not needed lock_guards were added when writing to and reading from the fwd_conv_kernel_map_ as well as the fwd_kernel_map_. These should always be accessed by a single thread when preparing the dnnl subgraphs so the guard should not be needed but its added just in case. - Updated the comments ConvGrad.h code to no longer mention the stack. The error check is not removed. It will be good to verify there are no errors as we continue to test against more models. Signed-off-by: George Nash <george.nash@intel.com> Co-authored-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com> Co-authored-by: unknown <63478620+jeyblu@users.noreply.github.com> * Lochi/refactor yolov3 quantization (#6290) * Refactor the code and move data reader, preprocessing, evaluation to E2E_example_mode * Refactor the code. Move data reader, preprocessing, evaluation to model specific example under E2E_example_mode * refactor code * Move yolov3 example to specific folder and add additional pre/post processing * Print a warning message for using newer c_api header on old binary (#6507) * Fix issues with ArmNN build setup (#6495) * ArmNN build fixes * Update BUILD.md to document that the ACL paths must be specified to build ArmNN * Fix CUDA build error. We don't setup the link libraries correctly/consistently so improve that. * Fix Windows CI builds by updating test scripts to work with numpy 1.20. (#6518) * Update onnxruntime_test_python.py to work with numpy 1.20. 
Some aliases are deprecated in favor of the built-in python types. See https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations np.array with bytes for entries and dtype of np.void no longer automatically pads. Change a test to adjust for that. * Fix another test script * Fix ORTModule branch for orttraining-* pipelines * Update pytorch nightly version dependency Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com> Co-authored-by: George Wu <jywu@microsoft.com> Co-authored-by: Cecilia Liu <ziyue.liu7@gmail.com> Co-authored-by: Ryan Hill <38674843+RyanUnderhill@users.noreply.github.com> Co-authored-by: George Nash <george.nash@intel.com> Co-authored-by: Guoyu Wang <62914304+gwang-msft@users.noreply.github.com> Co-authored-by: Yateng Hong <toothache9010@gmail.com> Co-authored-by: stevenlix <38092805+stevenlix@users.noreply.github.com> Co-authored-by: Derek Murray <Derek.Murray@microsoft.com> Co-authored-by: ashbhandare <ash.bhandare@gmail.com> Co-authored-by: Scott McKay <skottmckay@gmail.com> Co-authored-by: Changming Sun <chasun@microsoft.com> Co-authored-by: Tracy Sharpe <42477615+tracysh@users.noreply.github.com> Co-authored-by: Juliana Franco <jufranc@microsoft.com> Co-authored-by: Pranav Sharma <prs@microsoft.com> Co-authored-by: Tixxx <tix@microsoft.com> Co-authored-by: Jay Rodge <jayrodge@live.com> Co-authored-by: Du Li <duli1@microsoft.com> Co-authored-by: Du Li <duli@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Yufeng Li <liyufeng1987@gmail.com> Co-authored-by: baijumeswani <bmeswani@microsoft.com> Co-authored-by: Sergii Dymchenko <sedymche@microsoft.com> Co-authored-by: jingyanwangms <47403504+jingyanwangms@users.noreply.github.com> Co-authored-by: satyajandhyala <satya.k.jandhyala@gmail.com> Co-authored-by: S. 
Manohar Karlapalem <manohar.karlapalem@intel.com> Co-authored-by: Weixing Zhang <weixingzhang@users.noreply.github.com> Co-authored-by: Suffian Khan <sukha@microsoft.com> Co-authored-by: Olivia Jain <oljain@microsoft.com> Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com> Co-authored-by: Hariharan Seshadri <shariharan91@gmail.com> Co-authored-by: Ryan Lai <rylai@microsoft.com> Co-authored-by: Jesse Benson <jesseb@microsoft.com> Co-authored-by: sfatimar <64512376+sfatimar@users.noreply.github.com> Co-authored-by: suryasidd <surya.siddharth.pemmaraju@intel.com> Co-authored-by: sfatimar <sahar.fatima@intel/com> Co-authored-by: MaajidKhan <n.maajidkhan@gmail.com> Co-authored-by: mohdansx <mohdx.ansari@intel.com> Co-authored-by: Xavier Dupré <xadupre@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin@vols.utk.edu> Co-authored-by: Michael Giba <michaelgiba@gmail.com> Co-authored-by: William Tambellini <wtambellini@sdl.com> Co-authored-by: Hector Li <hecli@microsoft.com> Co-authored-by: Aishwarya <aibhanda@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: liqunfu <liqfu@microsoft.com> Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: pengwa <pengwa@microsoft.com> Co-authored-by: Tang, Cheng <souptc@gmail.com> Co-authored-by: Cheng Tang <chenta@microsoft.com> Co-authored-by: Tianlei Wu <tlwu@microsoft.com> Co-authored-by: Ye Wang <52801275+wangyems@users.noreply.github.com> Co-authored-by: Chun-Wei Chen <jacky82226@gmail.com> Co-authored-by: Vincent Wang <wangwchpku@outlook.com> Co-authored-by: Vincent Wang <weicwang@microsoft.com> Co-authored-by: Luyao Ren <375833274@qq.com> Co-authored-by: Zhang Lei <zhang.huanning@hotmail.com> Co-authored-by: Tim Harris <tiharr@microsoft.com> Co-authored-by: Ashwini Khade <askhade@microsoft.com> Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com> Co-authored-by: Alberto Magni 
<49027342+alberto-magni@users.noreply.github.com> Co-authored-by: Wei-Sheng Chin <wschin@outlook.com> Co-authored-by: wezuo <49965641+wezuo@users.noreply.github.com> Co-authored-by: Jesse Benson <benson.jesse@gmail.com> Co-authored-by: Wei Zuo <wezuo@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: wezuo <wezuo@az-eus-v100-32gb-5-worker-mgtbby.eastus.cloudapp.azure.com> Co-authored-by: wezuo <wezuo@az-eus-v100-32gb-5-worker-yclzsf.eastus.cloudapp.azure.com> Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com> Co-authored-by: Martin Man <supermt@gmail.com> Co-authored-by: M. Zeeshan Siddiqui <mzs@microsoft.com> Co-authored-by: Ori Levari <ori.levari@microsoft.com> Co-authored-by: Ori Levari <orlevari@microsoft.com> Co-authored-by: Ubuntu <OrtTrainingDev3@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Sheil Kumar <smk2007@gmail.com> Co-authored-by: Sheil Kumar <sheilk@microsoft.com> Co-authored-by: Ryota Tomioka <ryoto@microsoft.com> Co-authored-by: Adam Pocock <adam.pocock@oracle.com> Co-authored-by: Yulong Wang <f.s@qq.com> Co-authored-by: Faith Xu <faxu@microsoft.com> Co-authored-by: Xiang Zhang <xianz@microsoft.com> Co-authored-by: suryasidd <48925384+suryasidd@users.noreply.github.com> Co-authored-by: RandySheriffH <48490400+RandySheriffH@users.noreply.github.com> Co-authored-by: Weixing Zhang <wezhan@microsoft.com> Co-authored-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com> Co-authored-by: unknown <63478620+jeyblu@users.noreply.github.com>	2021-02-02 08:59:56 -08:00
Suffian Khan	76bc0e479c	Enable dense sequence optimized version of Pytorch exported BERT-L on AMD GPU (#6504 ) * Permit dense seq optimization on BERT-L pytorch export by enabling ReduceSumTraining, Equal, and NonZero on AMD * enable Equal tests * enable fast_matrix_reduction test case	2021-01-29 13:12:34 -08:00
Scott McKay	8c6d76a4c0	Update to match new test setup. (#6496 ) * Update to match new test setup. * Add Gemm(7) manually for now. Will fix properly on Monday. It's used by mnist.ort as that is created by optimizing mnist.onnx to level 1 causing 2 nodes to be replaced by a Gemm and the op to be missing from the required list as that is created using the original onnx model.	2021-01-30 06:27:19 +10:00
suryasidd	1a5b75a554	[OpenVINO-EP] Remove support for OpenVINO 2020.2 (#6493 ) * Removed OpenVINO 2020.2 support * Updated documentation and build.py * Removed unnecessary libraries from setup.py	2021-01-28 23:00:41 -08:00
Guoyu Wang	3f60b27703	Speed up the Mac CI runs (#6483 )	2021-01-28 15:13:44 -08:00
liqunfu	00afd00059	merge e2e with distributed pipeline (#6443 ) merge e2e with distributed pipeline	2021-01-28 14:17:47 -08:00
Scott McKay	c84bb9df9f	Add ability to track per operator types in reduced build config. (#6428 ) * Add ability to generate configuration that includes required types for individual operators, to allow build size reduction based on that. - Add python bindings for ORT format models - Add script to update bindings and help info - Add parsing of ORT format models - Add ability to enable type reduction to config generation - Update build.py to only allow operator/type reduction via config - simpler to require config to be generated first - can't mix a type aware (ORT format model only) and non-type aware config as that may result in insufficient types being enabled - Add script to create reduced build config - Update CIs	2021-01-29 07:59:51 +10:00
Guoyu Wang	752627c5bb	[CoreML EP] Add CI for CoreML EP (macOS) and add coreml_flags for EP options (#6481 ) * Add macos coreml CI and coreml_flags * Move save debugging model to use environment var * Move pipeline off from macos CI template * Fix an issue building using unix make, add parallel to build script * Fixed build break for shared_lib and compile warning * Fix a compile warning * test * Revert the accidental push from another branch This reverts commit 472029ba25d50f9508474c9eeceb3454cead7877.	2021-01-28 12:25:46 -08:00
baijumeswani	2e228d74d0	Increase the distributed tests pipeline timeout to 120 minutes (#6479 )	2021-01-28 12:04:26 -08:00
liqunfu	6ed12402a4	Liqun/liqun/enable pipeline parallel test2 (#6399 ) * enable data and pipeline parallelism test Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2021-01-25 15:15:26 -08:00
ashbhandare	60c772e2bc	Megatron checkpointing (#6293 ) * Add bart fairseq run script * Add frontend change to enable megatron * Initial changes for checkpointing * Megatron optim state loading, checkpoint aggregation, frontend distributed tests for H, D+H * Add load_checkpoint changes * Fix CI * Cleanup * Fix CI * review comments * review comments * review comments:	2021-01-22 11:26:47 -08:00
Guoyu Wang	eb946c4177	Unblock Android CI code coverage failure (#6393 )	2021-01-20 21:26:10 -08:00
pengwa	453431f7bb	Add max_norm for gradient clipping. (#6289 ) * add max_norm as user option for gradient clipping * add adam and lamb test cases for clip norm * add frontend tests	2021-01-21 01:01:11 +08:00
Hariharan Seshadri	d7bdd96425	Refine auto_pad based pad computation in ConvTranspose (#6305 )	2021-01-19 19:01:49 -08:00
Scott McKay	e54e2f969d	Use readelf for minimal build binary size checks. (#6338 ) * Use readelf for minimal build binary size checks. The on-disk size grows in 4KB chunks which makes it hard to see how much growth an individual checkin causes. Only downside is that the sum of the sections is larger than the on-disk size (presumably things get packed smaller on disk and some of the section alignment constraints can be ignored) * Remove unused function	2021-01-15 07:46:02 +10:00
Changming Sun	ea6789b754	Add PREfast to python packaging pipeline (#6343 ) * Add PREfast to python packaging pipeline	2021-01-14 10:39:24 -08:00
Guoyu Wang	e35db194e3	fix the pipeline failure (#6346 )	2021-01-14 00:33:22 -08:00
Edward Chen	042053c55e	Add support for running Android emulator from build.py on Windows. (#6317 )	2021-01-13 19:21:49 -08:00
baijumeswani	0586c610b2	Add ORTModule BERT classifier to the CI pipeline (#6330 )	2021-01-13 12:34:04 -08:00
baijumeswani	9b7510d88c	Add ORTModule distributed CI pipeline (#6278 ) * Add ortmodule distributed ci pipeline	2021-01-13 11:24:01 -08:00
liqunfu	aeca96caba	Liqun/enable pipeline parallel test (#6331 ) enable pipeline parallel test Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2021-01-13 10:24:04 -08:00
Dmitri Smirnov	6b73bae035	Java: add Semmle to Java publishing pipelines (#6326 ) Add Semmle to Java API pipeline Add security results publishing and add Java GPU.	2021-01-12 15:12:13 -08:00
Ashwini Khade	0ed56d491a	fix opset imports for function body (#6287 ) * fix function opsets * add tests and update onnx * changes per review comments * add comments * plus updates * build fix	2021-01-12 13:44:36 -08:00
Changming Sun	c43ca45c4f	Force reinstall onnx python package on Windows (#6309 )	2021-01-11 22:12:56 -08:00
Chun-Wei Chen	84024bdfa9	Enable ONNX backend test of SequenceProto input/output (#6043 ) * assert sequence tensor and remove skips * update testdata json * use ONNX 1.8 in cgmanifest.json * use previous commit to workaround * update ONNX commit ID in docker * skip test_maxpool_2d_dilations test for now * update function name	2021-01-11 11:30:33 -08:00
Changming Sun	5084ce0969	Update nuget build (#6297 ) 1. Update the ProtoSrc path. The old one is not used anymore. 2. Regenerate OnnxMl.cs 3. Delete some unused code in tools/ci_build/build.py 4. Avoid set intra_op_param.thread_pool_size in ModelTests in OpenMP build. 5. Fix a typo in the C API pipeline.	2021-01-11 10:49:05 -08:00
Edward Chen	04287ec770	Increase timeout for Linux GPU CUDA11 build. (#6280 )	2021-01-07 15:44:42 -08:00
baijumeswani	e0f2a12c2c	ortmodule ci pipeline setup (#6251 )	2021-01-05 09:13:19 -08:00
baijumeswani	93bf7c4d52	Documentation for distributed CI tests pipeline (#6140 )	2021-01-04 10:09:39 -08:00
Changming Sun	1685167e46	Update manylinux docker image to the latest (#6242 )	2020-12-31 19:57:04 -08:00
Xavier Dupré	cd14c1af29	Support double for operator ArgMin (#6222 ) * Support double for operator ArgMin * add test specifically for double * add new test on pai-excluded-tests.txt	2020-12-31 11:25:46 +01:00
Xavier Dupré	84addcd2cf	Support double for operator ReduceMean, ReduceLogSumExp (#6217 ) * Support double for operators ReduceMean, ReduceLogSumExp	2020-12-31 11:24:54 +01:00
Changming Sun	3911105f09	Remove python 3.5	2020-12-30 20:16:45 -08:00
Changming Sun	1b23b28706	Remove MKLML/openblas/jemalloc build config (#6212 )	2020-12-30 17:18:19 -08:00
sfatimar	7347996942	Openvino ep 2021.2 (#6196 ) * Enabling fasterrcnn variant and vehicle detector * changes for 2021_2 branch * yolov3_pytorch commit * fixed braces in basic_backend.cc * ci information added * faster rcnn variant and vehicle detector changes were made in 2021.1 and not in 2021.2 * some changes to support unit tests * disable some tests which are failing * fix myriad tests for vehicle detector * Did some cleanup cleaned up comments Disabled Add_Broadcast_0x1 and Add_Broadcast_1x0 tests on MYRIAD_FP16 backend due to a bug cleaned up capability_2021_2.cc file Removed extra conditions which were added for some validation in backend_utils Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * yolov3 pytorch workaround to ensure that the output names are matched * gemmoptest fixed on myriad * Fixed MYRIADX CPP Test Failures Expand,GatherND,Range,Round op's are only supported in model where op with float input data types are not supported and fixed Scatter and ScatterElements op's with negative axis are fixed Reshape op with 0 dim value are not supported and fixed Disabled InstanceNorm_2 test on MYRIADX Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> make changes to yolov3 pytorch * Fixed python unit tests Fixed failing python tests on vpu, GPU and CPU Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> Fixes POW op failures on GPU_FP16 Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Clean up capability_2021_2.cc Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Updated docx for MultiThreading option Added extra info on setting the num_of_threads option using the API and it's actual usage Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> fixed slice and removed extra prints * Disabled failing python tests Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Minor changes added in capabilty_2021_2 Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * made changes to slice to avoid failures * Disabling FP16 support for GPU_FP32 ->Inferencing an 
FP16 model on GPU_FP32 leads to accuracy mismatches. so, we would rather use GPU_FP16 to infer an FP16 model on GPU Device Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Updated docx for Inferencing a FP16 Model Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * fix for mask rcnn * Script for installing openvino from source * Updated with openvino 2021.2 online installation * code comment fixes fixed accuracy mismatch for div * Update OpenvinoEP-ExecutionProvider.md updated for 2021.2 branch * Update README.md updated dockerfile documentation * Update BUILD.md build.md update documentation * permissiong change of install_openvino.sh * made changes to align with microsoft onnxruntime changes * Updated with ov 2021.2.200 Co-authored-by: suryasidd <surya.siddharth.pemmaraju@intel.com> Co-authored-by: sfatimar <sahar.fatima@intel/com> Co-authored-by: MaajidKhan <n.maajidkhan@gmail.com> Co-authored-by: mohdansx <mohdx.ansari@intel.com>	2020-12-23 08:47:22 -08:00
satyajandhyala	201d0dbb1a	Android coverage dashboard (#6163 ) * Write the report to a file. * Post code coverage to the Dashboard database.	2020-12-21 10:34:01 -08:00
Edward Chen	cd3a5acca0	Update get_docker_image.py to enable use without image cache container registry. (#6177 ) Update get_docker_image.py to enable use without image cache container registry.	2020-12-18 19:01:02 -08:00
Changming Sun	344a2a8ee5	Revert "work around of the build break in mac (#6069 )" (#6150 ) This reverts commit `3cae28699b`.	2020-12-16 14:41:18 -08:00
Sheil Kumar	a6a23db130	Enable C# .NET5 for WinML (#6120 ) * build for .net5 * only reference cswinrt for .net5 * remove netstandard2.0 references * upgrade language version * net5 * remove extra comment closure * add targetframework * set target framework * remove net* * pep8 errors * make test project build with .net windows SDK projection * disable c# builds for non-x64 builds * fix pep8 errors * disable for store build * fix tests * remove cswinrt and sdk references from package * bump cswinrt down to 1.0.1 * fix bin path Co-authored-by: Sheil Kumar <sheilk@microsoft.com>	2020-12-14 15:05:15 -08:00
liqunfu	cde723a136	Liqun/move nightly pl to linux multi gpu v100 (#6024 ) * move e2e nightly pipeline to azure devop Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-12-14 12:43:41 -08:00
Vincent Wang	7ddeafdfcc	Add ReduceL2Grad and ClipGrad (#5970 ) * ReduceL2Grad and ClipGrad. * fix win build and amd ci pipeline * resolve comments. Co-authored-by: Vincent Wang <weicwang@AiFramework2080ti2.corp.microsoft.com>	2020-12-10 11:03:26 +08:00
Edward Chen	e357486707	Fix build definition template typo, add logging (#6065 ) Fix a typo in tools/ci_build/github/azure-pipelines/templates/get-docker-image-steps.yml. Add logging to tools/ci_build/get_docker_image.py for easier debugging.	2020-12-08 15:16:50 -08:00
satyajandhyala	f68a256140	Android code coverage (#6061 ) * Added Onnxruntime_GCOV_COVERAGE flag for Android. * Set CMAKE_SYSTEM_NAME explicitly for Android. * Added GCOV_PREFIX option to collect code coverage data. Added a new python script to generate code coverage info. Modified build pipeline to generate Android code coverage info * Added build command line option --android_coverage * Added a comment describing the GCOV environment variables * Fixed PEP8 issues. * Added --android_coverage option to the build command. * Increased Android emulator memory from 3K to 8K. * Increased Android partition-size from 2GB to 4GB to overcome no-space-left-on-device error * Removed source_dir from command line args. * Use cwd absolute path to run tests. * Added commands to output the contents of /data/local/tmp on the emulator. * Added run_adb_shell function. * Format changes. * Removed keywd argument cwd. * Removed Android in the --build_dir path. * Removed commands added for debugging. * Removed extra new-lines. * Fix MacOs build pipeline failures by uninstalling openssl before running build script. * Revert "Fix MacOs build pipeline failures by uninstalling openssl before running build script." This reverts commit 90d0568fe533e9456c20d061a2d435c8fea48266. * Change dir to the build directory where the tar file is copied. * Changed the option from --android_coverage to --code_coverage * Moved steps to generate Android code coverage to run_nnapi_code_coverage.sh * Require --android option if --code_coverage is specified. * No code coverage needed for onnx_test_runner. * Expect that the emulator is running when the script is executed. * Fixed the title in the buildpipeline step. * Fixed the formatting issue. * Added a command line argument, ORT_ROOT, to run_nnapi_code_coverage.sh script Co-authored-by: Satya Jandhyala <satyajandhyala@Satyas-Mac-mini.local>	2020-12-08 10:55:02 -08:00
Suffian Khan	e35211c0ff	Fix AMD GPU pipeline by adjusting reference /opt/rocm-3.9.0 => /opt/rocm (#6063 ) * use /opt/rocm instead * fix indent	2020-12-08 08:53:20 -08:00
Yufeng Li	3cae28699b	work around of the build break in mac (#6069 ) * Fix the build break in macos release * revert android change	2020-12-07 20:39:36 -08:00
Edward Chen	b348538c8a	Update build docker image cache cleanup (#6048 ) The current image cache cleanup is not removing many images. Upon examining the cache container registry logs, it appears there are some infrequent pulls of old images which may be made by something other than CI builds (perhaps some automated scan of the registry). This change adds a minimum access count for images in the cache so that infrequently but periodically accessed images can be removed. The idea is that images used by CI builds that are worth caching will have a higher volume of accesses.	2020-12-07 13:07:19 -08:00
Changming Sun	925879a8b0	Remove python 3.8 Windows GPU build from python packaging pipeline (#6054 ) Revert the last few changes to get the pipeline back to a normal state.	2020-12-07 10:23:07 -08:00
Edward Chen	d8139814fd	Clean up builds (#6015 ) Update training Python packaging build to use get_docker_image.py. Remove BUILD_EXTR_PAR docker build argument. Update get_docker_image.py to check again for the image in the cache after building and before pushing to reduce the chance of a redundant push.	2020-12-04 15:13:17 -08:00
Jesse Benson	98ea7372d3	Re-enable Lamb unit tests for AMD	2020-12-03 13:06:34 -08:00
Edward Chen	6572a4d306	Disable Python 3.9 for training Python packaging build. (#6012 ) Disable Python 3.9 for training Python packaging build. Python 3.9 is not supported by the PyTorch dependency.	2020-12-03 11:42:28 -08:00
Edward Chen	6d642a3dba	Replace direct pulls from image cache container registry with get_docker_image.py, build definition clean up. (#5906 )	2020-12-01 19:10:23 -08:00
Changming Sun	2d9dcc4576	Add python 3.9 support (#5874 ) 1. Add python 3.9 support(except Linux ARM) 2. Add Windows GPU python 3.8 to our packaging pipeline.	2020-11-30 12:02:48 -08:00
Changming Sun	5fdd9f0fd2	Fix Python Linux GPU package name (#5943 ) Fix Python Linux GPU package name. I accidentally added "noopenmp" to it.	2020-11-25 17:46:11 -08:00
Edward Chen	7546d251e0	Expose parameters in clean build Docker image cache build. (#5941 ) Expose some parameters in the clean build Docker image cache build. In particular, whether to do a dry-run and the lifetime of unused cache images.	2020-11-25 14:15:54 -08:00
Ashwini Khade	705d093167	Update onnx (#5720 ) * update onnx * update docker image for testing	2020-11-24 11:20:15 -08:00
Suffian Khan	9b8189dd0a	Rework AMD CI pipeline to use pool AMD-GPU and disable more tests in order to enable it. (#5885 ) Move AMD test pipeline to use self-hosted pool AMD-GPU. For time being, remove failing/flaky unit tests for AMD pipeline.	2020-11-24 09:38:14 -08:00
Guoyu Wang	846c5fb917	Report arm64 minimal baseline binary size only for continuous integration (#5913 ) * Report binary size only for continuous integration	2020-11-24 20:24:08 +10:00
Guoyu Wang	4137c18d9b	Add ORT minimal with NNAPI EP to Android CI (#5890 ) Description: Add ORT minimal with NNAPI EP to Android CI Motivation and Context The added build/test to Android CI will only run UT, additional onnx_test_runner with customer .ort models will be added later	2020-11-23 18:21:34 -08:00
Edward Chen	5e8fcda24a	Build docker image cache fixes. (#5902 ) Fix Python 3.5 compatibility issue in tools/ci_build/get_docker_image.py. Fix line endings in tools/ci_build/github/azure-pipelines/clean-build-docker-image-cache-pipeline.yml.	2020-11-23 14:43:12 -08:00
baijumeswani	208f4c1d3c	Azure ci pipeline for distributed environment tests (#5881 )	2020-11-23 14:01:00 -08:00
satyajandhyala	353e071b7e	Fuzz testing misc (#5862 ) * Run only required steps relevant to fuzz testing. * Exit status non-zero for any uncaught exception other than ort_exception in the driver code Co-authored-by: Satya Jandhyala <sajandhy@microsoft.com>	2020-11-23 13:43:44 -08:00
Guoyu Wang	26e6ced172	Temporary fix for Android CI failure (#5889 ) * Unblock the Android CI * Add python to android ci's command	2020-11-21 17:58:32 +10:00
Dmitri Smirnov	ceedf5630b	Document all C# API public interfaces (#5853 ) Address documentation shortcomings. Document all required public interfaces. Add pipeline configuration. Make Doxygen look up env vars for paths.	2020-11-20 14:03:55 -08:00
Edward Chen	bef06dac93	Automatically clean up build docker image cache. (#5843 ) Follow up to #5811 to automate cleanup of the build docker image cache. Added a script and build definition to clean up docker images that haven't been accessed recently.	2020-11-20 11:56:26 -08:00
S. Manohar Karlapalem	ff58f621fa	Remove nGraph Execution Provider (#5858 ) * Remove nGraph Execution Provider Pursuant to nGraph deprecation notice: https://github.com/microsoft/onnxruntime/blob/master/docs/execution_providers/nGraph-ExecutionProvider.md#deprecation-notice Deprecation Notice \| \| \| \| --- \| --- \| \| Deprecation Begins \| June 1, 2020 \| \| Removal Date \| December 1, 2020 \| Starting with the OpenVINO™ toolkit 2020.2 release, all of the features previously available through nGraph have been merged into the OpenVINO™ toolkit. As a result, all the features previously available through ONNX RT Execution Provider for nGraph have been merged with ONNX RT Execution Provider for OpenVINO™ toolkit. Therefore, ONNX RT Execution Provider for nGraph will be deprecated starting June 1, 2020 and will be completely removed on December 1, 2020. Users are recommended to migrate to the ONNX RT Execution Provider for OpenVINO™ toolkit as the unified solution for all AI inferencing on Intel® hardware. * Remove nGraph Licence info from ThirdPartyNotices.txt * Use simple Test.Run() for tests without EP exclusions To be consistent with rest of test code. * Remove nGraph EP functions from Java code	2020-11-19 16:47:55 -08:00
Hariharan Seshadri	62508ef0e4	Revert "Remove MKLML build config (#5559 )" (#5855 )	2020-11-19 10:53:08 -08:00
Changming Sun	26db396b4b	Reduce the number of CI build variants (#5856 )	2020-11-18 20:41:30 -08:00
satyajandhyala	b495ae8103	ORT fuzz testing (#5771 ) * Added fuzz testing using ORT model. * The onnxruntime_security_fuzz driver code should accept either ONNX or ORT (based on the file extension) input file if /f flag is provided. * Added ValidateOrtFormatModelDoesNotRunOptimizersInFullBuild test. * Added win-ci-fuzz-testing.yml to run build pipeline. * Prevent out-of-range access in the graph.cpp.	2020-11-18 16:07:36 -08:00
Changming Sun	85f945a875	Regenerate CI build docker images (#5850 )	2020-11-18 14:36:59 -08:00
Edward Chen	71e7c2b423	Cache build docker images in container registry. (#5811 ) This PR adds infrastructure to automatically cache docker images used in CI builds in a container registry. Currently, build images are pulled from a container registry for some builds and built every time for others. The container registry requires maintenance to keep the images up to date and building images every time wastes build agent resources. With this change, a given build image can be looked up in a cache container registry and if present, pulled, and otherwise, built and pushed. The uniqueness of a build image is determined by a hash digest of the dockerfile, docker build context directory, and certain "docker build" options. This digest is part of the image tag in the cache container repository. The cache container registry will need to be cleaned up periodically. This is not automated yet.	2020-11-17 17:02:24 -08:00
Justin Stoecker	bd236ecc26	Switch to unified DirectML 1.4.0 redistributable (#5794 ) Transitions from the ORT-only DML NuGet (hosted on the onnxruntime_public feed) to the new unified DirectML NuGet (Microsoft.AI.DirectML) on nuget.org. In addition, the Microsoft.AI.MachineLearning (WinML) and Microsoft.ML.OnnxRuntime.DirectML packages now take a dependency on the Microsoft.AI.DirectML package. This means we can remove the extra copy of DML binaries in these packages since they will be installed by the DML package.	2020-11-17 13:42:23 -08:00
Scott McKay	c84bc25e28	Add validation of op registrations (#5817 ) * Add validation of operator registrations to the reduction script - the script has all the logic to process the registrations, and there's a CI that uses it Fix some operator registrations * Fix CUDA PRelu registration * Refactor to split out kernel registration file parsing and use in the exclude ops script and an op registration validation script. Run op validation in minimal build CI * Fix PEP8 error and some comments	2020-11-17 10:44:09 -08:00
Sherlock	241b2226a7	Update orttraining-linux-gpu-ci-pipeline.yml for Azure Pipelines (#5826 )	2020-11-17 09:27:59 -08:00
Guoyu Wang	1a66dfc0f9	Enable Squeeze Opset 13 for NNAPI (#5717 ) * Add copy sparse model in minimal CI * Add squeeze 13 support * fix small typo * Add ut for squeeze in NNAPI * Fix some issue in the UT and code * Modify based on the master change * Fix build break	2020-11-17 00:26:06 -08:00
edgchen1	4d517c68a3	Fix reference to old download_e2e_test_data.py script. It was renamed to download_azure_blob.py. (#5790 )	2020-11-12 15:48:06 -08:00
liqunfu	1416d12f0b	Liqun/merge e2e pipelines (#5702 ) * Create an Azure Pipeline to merge cpp and python e2e pipelines into one. Still keep cpp e2e pipeline until this new pipeline is stable. Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-11-11 09:42:08 -08:00
Changming Sun	79350a642a	Update install_deps.sh: remove the unnecessary data generating step (#5758 ) We install onnx python package from this script, so python tests can run the tests for the latest commit which we are importing.	2020-11-10 22:19:03 -08:00
Changming Sun	4094a09a56	Merge pull request #5731 from microsoft/snnn/rtti Disable RTTI in Windows GPU CI pipeline	2020-11-10 09:02:59 -08:00
Edward Chen	919c270f3c	Increase build timeouts.	2020-11-09 22:26:27 -08:00
Chi Lo	92292de135	Tensorrt perf tool (#5436 ) * Add YAML file for pipeline * Modify typo * Add working directory * Modify and test * Modfiy and test * Modify and test * Modify and test * Modify * Modify * Modify * Modify * Make sure to copy all the result files * Add clearn up * Modify * Modify agent pool name * Upload only specific artifacts * Modify * Integrated CI Pipeline for running TRT perf as well as added the “large amount of models” into perf model target * Fix bug * Fix bug * Add reading the information regarding previously known failing models and then skip testing them during benchmark/validation * Modify the script file for CI * Replace print with logger.info * Fix bug * Fix bug * Refine the code * Modify the script so that it can capture script segmentation fault while running ORT * Fix bug * fix bug * fix bug * Add debug info * fix bug * Refine perf code * Refine the code * fix bug * Code refactoring * change many-models path * remove metadata after validation/benchmark are done * Update README.md * Fix bug so that metadata doesn't hold stale value * Remove hardcode and update README * Add arguments to the script to make it run correctly * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Fix bug so that metadata doesn't hold stale value * Fix small bug of finding test dataset directory for FP16 test data, as well as modification of some output information * use -i random for perf test of TRT changes Co-authored-by: Olivia Jain <oljain@microsoft.com>	2020-11-06 12:27:42 -08:00
RandySheriffH	71f90e08f1	Nuget packaging no omp (#5666 ) * create new nuget packaging pipeline without openmp * rename package * update image name * rename package name * rename managed package * reset project attribute * merge master * set package name * set NoOpenMP as cpu build * shorten line length Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2020-11-06 11:43:35 -08:00
Tiago Koji Castro Shibata	9e68e98423	Add static CRT DLLs to Nuget package (#5661 ) * Add static runtime yaml option * Add to WAI Nuget build matrix * Support empty build flags * Add DML to x64 * Bundle static rt * Bundle after Nugets are built * Fix typo * Skip static tests * Pack test artifact only in x64 dynamic * No DML static runtime * Add Store static * Revert "Add Store static" This reverts commit `69133e5838`. * Static subfolder	2020-11-05 09:26:17 -08:00
Changming Sun	357a51c75c	Update python packaging pipeline's docker image (#5680 )	2020-11-03 12:01:36 -08:00
Ashwini Khade	1cca903680	update onnx commit id (#5594 ) * update onnx commit id * update onnx commit for docker images * update docker images	2020-11-02 09:46:36 -08:00
Weixing Zhang	aec4cb489e	ROCm EP for AMD GPU (#5480 ) The ROCm EP is designed and implemented based on AMD GPU software stack named ROCm. Here is the link for the details about ROCm: https://rocmdocs.amd.com/en/latest/ ROCm EP was created based on the following things: 1. AMD GPU programming language: HIP 2. AMD GPU HIP language runtime: amdhip64 3. BLAS: rocBLAS, hipBLAS 4. DNN: miOpen 5. Collective Communication library: RCCL 6. cub: hipCub 7. … Current status: BERT-L and GPT2 training can be ran on AMD GPU with data parallel. Next: 1. Make more GPU code be sharable between ROCm EP and CUDA EP since HIP language and HIP runtime API are very close to CUDA. 2. Continue improving the implementation. 3. Continue GPU kernel optimization. 4. Support model parallelism on ROCm EP. …… The rocm kernels have been removed from this commit and will be in a separate PR. Since the original PR was too big(~180 files), it was suggested to split the PR into two parts, one is rocm-kernels, the other is non rocm kernels. Co-authored-by: Weixing Zhang <wezhan@microsoft.com> Co-authored-by: sabreshao <sabre.shao@amd.com> Co-authored-by: anghostcici <11013544+anghostcici@users.noreply.github.com> Co-authored-by: Suffian Khan <sukha@microsoft.com> Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2020-10-29 17:13:04 -07:00
Changming Sun	e6956be40c	Publish no-openmp python packages to test pypi (#5610 ) Publish no-openmp python packages to test pypi	2020-10-28 19:49:53 -07:00
liqunfu	92662659ba	Liqun/remove number matching (#5606 ) replace number matching with relaxed comparison in frontend tests Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-10-27 21:27:37 -07:00
Changming Sun	5802fe1699	Remove MKLML build config (#5559 ) Remove MKLML build config	2020-10-21 13:11:25 -07:00
Ashwini Khade	df22611026	Update ONNX commit (#5487 ) * update ONNX * update onnx + register kernels for reduction ops * bug fix kernel reg * update cgmanifests * revert unsqueeze op 13 registration * filter ops which are not implemented yet * filter some tests * update onnx commit to include conv transpose bug fix * update docker images * undo not required test changes * fix test failures	2020-10-21 07:22:20 -07:00
Guoyu Wang	915d475353	Android CI update (#5474 ) * Update Android CI * update comments	2020-10-14 16:56:50 -07:00
sfatimar	6d2a30eae3	[OPENVINO-EP] 2021.1 Release (#5431 ) * Cmake changes for 2021.1 * added new ov version 2020.1 for faster rcnn * Added missing defs * equal op modified * changes to incoroporate faster rcnn * backend util.cc * hddl_plugin_config.hpp is depreceated . instead use hddl_config.hpp * changing myriad precision bool to i32 * gather is not enabled for gpu * conv2D and pooltest auto_pad attribute should not be null * negative indices are not valid for scatter op in myriad * non max suppression op only supported in faster rcnn mode * maxpool indices output is not supported * Cleaned redundant code in backends * Added ifdefs for HDDL config * cast output dimensions check topk operator k input it seems only resolved for myriad as it is throwing issues for ask rcnn . need to verify * we are limiting the subgraph size to 3 here * taking care of review comments * Fixed minor bugs * Modified Slice op checks * Added NonZero, Upsample * Removed TopK if it's in the middle of a subgraph * incorporated upsample conditions too * Dockerfile changes for 2021.1 release * dockerfile aptkey update * Minor fixes * ceil condition added again * Fixed few gpu models * Disabled LSTM and yolov3 in ModelTests * python softmax cross entropy tests and negative log likelihood * Update Build.md Updated for openvino 2021.1 * Update OpenVINO-ExecutionProvider.md update openvino execution provider for 2021.1 * Update READMe.md updated new openvino version * Update Dockerfile.openvino added environment variable for DEBIAN Frontend * Fixed myriad models * Fixed gather condition * Fixed mask rcnn model on myriad * Modified Gather condition * set default target of MCR dockerfile to MYRIAD_FP16 * Fixed tinyolov3 on CPU * Update OpenVINO-ExecutionProvider.md update openvino execution provider documentation * Update Dockerfile.openvino Removed environment variable * Update OpenVINO-ExecutionProvider.md update image manipulation networks supported * Update 
onnx_backend_test_series_filters.jsonc removed test_upsample_nearest from cpu test cases * New InternalCI changes for 2021.1 * Full protobuf removed for OpenVINO * Protobuf added * Updated with apt installation for openvino * Revert the testing changes * Reverted testing changes * File permessions are changed to original * Deleted openvino installation and cmake change * Optimized Dockerfile Removed unnecessary cmake installation, numpy * Added missing ifdefs * delete array fix * backend_utils.cc output_shape * Revert "set default target of MCR dockerfile to MYRIAD_FP16" This reverts commit 928d3e2b71e2f589cf51dacd3a133951cf9ca18d. Co-authored-by: suryasidd <surya.siddharth.pemmaraju@intel.com> Co-authored-by: sfatimar <sahar.fatima@intel/com> Co-authored-by: suryasidd <48925384+suryasidd@users.noreply.github.com> Co-authored-by: S. Manohar Karlapalem <manohar.karlapalem@intel.com> Co-authored-by: Aravind <aravindx.gunda@intel.com> Co-authored-by: Aravind Gunda <38353114+gundaarx@users.noreply.github.com>	2020-10-14 15:56:00 -07:00
Pranav Sharma	c2c78399ee	Include config keys header file in the release packages for Linux and Mac. (#5388 )	2020-10-08 15:00:29 -07:00
Changming Sun	09aef240d6	Skip running onnx tests in python mac os pipeline (#5416 )	2020-10-08 11:49:28 -07:00
liqunfu	773992c7d4	Liqun/bert pretrain tb (#5377 ) * add tensor board, remove torch.distributed.launch because ort nccl depends on MPI. Use MPI to launch parallel training. Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-10-06 16:28:31 -07:00
Wenbing Li	4721729fdc	Enable iOS CI pipeline (#5360 ) * add the ios ci build. * no dependency on mac ci pipeline. * fix the command line. * keep sync * automatically retrieve sdpath * fix the case errors and warnings * fix the vlog switch issue. * add parallel flag for build. * update the display name of the pipeline.	2020-10-02 20:14:45 -07:00
Guoyu Wang	9df0790856	Update linux minimal CI to report Android minimal baseline binary size (#5361 ) * Update linux minimal CI to report Android minimal baseline binary size * Fix some issues in the script	2020-10-02 17:35:23 -07:00
edgchen1	d62873a331	Docker image release build updates (#5326 ) - Update docker image release build to use build commit. - Use valid default in component governance detection step. - Use smaller docker build context.	2020-10-01 12:25:31 -07:00
liqunfu	fe50213491	Liqun/bert pretrain2 (#5327 ) * bert single node multi GPU pretrain w/o checkpoint Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-10-01 11:01:26 -07:00
Changming Sun	17f1178c2e	Downgrade GCC (#5269 ) Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2020-09-24 21:14:54 -07:00
Dmitri Smirnov	89742411ec	Insert telemetry template into GPU build, add telemetry build switches. (#5278 )	2020-09-24 17:13:09 -07:00
edgchen1	6d5b93b805	Synchronize training dependency versions between Docker image and Python wheel. (#5261 ) Synchronize training dependency versions between Docker image and wheel, update docs, refactor build scripts.	2020-09-23 19:03:42 -07:00
suffian khan	417929b049	jobs timeout ..	2020-09-21 21:51:59 -07:00
Xueyun Zhu	55e4b5d302	add pipeline distributed training test (#5222 ) * add pipeline distributed training test * fix max line length error in windows build * function header indent * fix * fix flake8 error	2020-09-21 14:35:01 -07:00
Guoyu Wang	78a29aebbc	[ORT Mobile] ORT Minimal E2E CI (#5200 ) * Modify the ort minimal CI to ort minimal e2e ci	2020-09-19 18:43:22 +10:00
KeDengMS	ce3b67e0cd	[Python] Move symbolic_shape_infer from nuphar to tools (#5162 ) * [Python] Move symbolic shape inference from nuphar to tools * Fix PEP8 ERROR	2020-09-18 09:31:06 -07:00
liqunfu	f37e1292a1	--shm-size=1024m to fix nccl shared memory issue (#5214 ) * --shm-size=256m to fix nccl shared memory issue Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-09-17 17:21:47 -07:00
Guoyu Wang	8156e0dd10	[ORT Mobile] Some updates to iOS/Android build settings (#5184 ) * Update android CI and build settings * add build_java to arm64 also * Add ios signing param * fix a small build warning * address pr comments	2020-09-17 15:53:14 -07:00
Tiago Koji Castro Shibata	1a2e289d2d	Fix nuget build (#5163 ) * Fix nuget content * Revert "Fix nuget content" This reverts commit e2cdcec4e39964c50eac2fb306c7a4bb84352443. * Nuget packaging * skip tests * msbuild path * Force msbuild version * Workaround https://github.com/NuGet/Home/issues/7621 * cleanup	2020-09-16 10:37:09 -07:00
Changming Sun	a0a435abc6	Add sympy==1.1.1 to Linux docker image (#5177 )	2020-09-15 16:08:49 -07:00
Scott McKay	089789c135	Revert change to disable support for loading ORT format models in the packaging pipelines. (#5168 )	2020-09-15 15:11:06 +10:00
RandySheriffH	1dde215d96	promote cuda version on packaging pipelines (#5154 ) * promote cuda version on packaging pipelines * fix cudnn version in py packaging template Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2020-09-14 21:09:09 -07:00
RandySheriffH	9392aa2f64	Promote Cuda version to 10.2 for windows pipelines (#5138 )	2020-09-13 20:32:06 -07:00
Scott McKay	323a1ba8a4	Add option to exclude support for loading ORT format models in full build. (#5129 ) * Add ability to exclude support for loading ORT format models. Disable support for ORT format models in packages	2020-09-12 12:21:30 +10:00
RandySheriffH	120e3cda74	fix path (#5131 ) Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2020-09-11 12:18:07 -07:00
Changming Sun	c5efb0085d	Update Linux GPU build pipelines to CUDA 10.2 (#5120 ) * Update Linux GPU build pipelines to CUDA 10.2	2020-09-10 17:40:51 -07:00
Changming Sun	a5530358c9	Fix a path problem in Dockerfile.manylinux2014_cuda10_2 (#5106 )	2020-09-10 10:30:13 -07:00
Tiago Koji Castro Shibata	62848c4de5	Add store builds to nuget packaging (#5040 ) * Nuget store packaging * Move DNNL workaround to EP * Fix warning as error * Disable store tests * Skip store tests * msbuild target * Cross compile protoc in Store * Disable DML in store * Move store builds to CPU queue * Copy uap10 to final nuget * Fix pip8 error * Remove extra dml copies * Fix argparse * pep8 * Forward IsStoreBuild * Apply is_store_build to duplicate generate_nuspec * runtimes * Refactor uap10 * Store .NET * uap * PR feedback	2020-09-09 21:38:14 -07:00
RandySheriffH	5e10cde006	PipelinesForCuda11Cudnn8 (#4938 ) * cancel night build on pyop * setup win cuda11 pipeline * add debug build * test base gpu settings * setup pipelines to test cuda 10.2 and 11 * rename linux docker images * rename docker image tag and add clean up job * fix typo in cuda 11 config * set cuda11 env * update linux cuda 11 pipeline * reset docker image name * disable uninitialized warning from linux build * change the way to silence uninitialized warning * add flags to linux gpu pipeline * switch docker image for linux cuda 10.2 * switch linux cuda 10.2 image * test cuda11 with devtool8 * try latest built images Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2020-09-09 16:13:58 -07:00
Changming Sun	924ecb0623	Use manylinux2014 for Linux CPU build (#5091 )	2020-09-09 10:09:52 -07:00
gwang-msft	a1a81470e3	Add minimal build binary size verification (arm64) to Android CI (#5087 ) * Add minimal build binary size verification (arm64) to Android CI * Add comments in the CI yaml	2020-09-09 19:06:20 +10:00
gwang-msft	a40d34386a	Add Linux CPU CI for ORT minimal build (#5074 ) * initial test version * update yml * minor updates * minor updates * Test minimal build * update with include ops for minimal build ut only * error case to see build failure * test no_exceptio * Remove error cases * address pr comments Co-authored-by: gwang0000 <62914304+gwang0000@users.noreply.github.com>	2020-09-08 17:09:33 -07:00
Changming Sun	370d194db7	Add a docker file for CI build CUDA 10.2 (#5065 )	2020-09-04 16:28:45 -07:00
Scott McKay	b5c2932ae8	Last major set of ORT format model changes (#5056 ) * Add minimal build option to build.py Group some of the build settings so binary size reduction options are all together Make some cmake variable naming more consistent Replace usage of std::hash with murmurhash3 for kernel. std::hash is implementation dependent so can't be used. Add initial doco and ONNX to ORT model conversion script Misc cleanups of minimal build breaks.	2020-09-05 07:59:01 +10:00
Changming Sun	d5d5e37e76	Build system enhancements (#5012 ) 1. Add a docker file for CUDA11 2. Support setting CUDA_ARCHITECTURES from command line.	2020-09-02 10:13:26 -07:00
RandySheriffH	14b51d6502	CiPipeline@ReducedOpsBuild (#4917 ) * cancel night build on pyop * setup ci pipeline for build of reduced ops * add back c# test * remove debugging print * add testing model * add more arg in pipeline script * disable pipeline trigger temporarily * fix yaml format * fix yaml format * fix pipeline error * rid c# test * add ops for test cases * add Conv from domain com.microsoft.nchwc * remove --reduce_ops * fix typo * remove --build_java * add test case for excluded op * update doc with --skip_test * formatting code, renaming files and simplify yaml * remove debug build from yaml * remove surplus ops from included_ops.txt * add MinSizeRel build to yaml * rename test cases and models * exclude ir test from minimum build * restrict ir test to be only applied to reduced ops build	2020-08-31 21:21:18 -07:00
Ashwini Khade	8679a7244e	Enable rejecting models based on onnx opset (#4912 ) * enable rejecting models based on onnx opset * enable unreleased opsets in linux and mac CI * test fixes and more updates * enable unreleased opsets in CI builds * enable released opsets in linux cis * try fix windows ci yml * yml fixes * update yml * yml updates post master merge * review comments * bug fix	2020-08-31 13:35:36 -07:00
Hariharan Seshadri	b945225de3	Include DirectML pdb in x86 bin folder (#4953 )	2020-08-28 11:29:26 -07:00
Changming Sun	c37fa7c278	Delete Dockerfile.centos6_gpu (#4851 )	2020-08-28 09:56:52 -07:00
edgchen1	71d8846635	Fix telemetry-steps.yml (#4903 ) Fix bug in telemetry-steps.yml that causes telemetry setup to be disabled even if TELEMETRYGUID is set.	2020-08-24 22:14:40 -07:00
Changming Sun	f34ed3a576	Hot fix for the python packaging pipeline Linux ARM build (#4902 )	2020-08-24 20:14:33 -07:00
Rayan-Krishnan	eb05db5a2a	Fix OptimizerConfig params groups (#4877 ) * Copy samples to build folder and load models from there. Fix CI * This PR also includes a fix to path validation for save_as_onnx API * Add torchtext to CI for GPU training * Remove new frontend tests from CI Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>	2020-08-22 22:04:17 -07:00
liqunfu	6260d073b3	Glue parallel training (#4550 ) add mpi size, rank python API add single node parallel training example	2020-08-21 21:24:27 -07:00
Yulong Wang	c6119a548c	enable telemetry in node.js binding	2020-08-20 09:47:57 -07:00
suryasidd	3a00b50cf8	[OpenVINO-EP] Updating OpenVINO EP to 2020.4 (#4836 ) * Removed building ngraph from source * Disabled some tests temporarily * Enabled softmax for all dims * Added onnx importer to link libraries * int64 changes * fixed * temp * slice update start and end need to be initializer * Disabled GatherND, ScatterND, ReverseSequence operators * Added supported ops instead of unsupported ops * Set precision only for CPU * Removed some unecessary conditions * Fixed segfault in slice * Softmax restriction removed * changes * Setting precision for all plugins * Changes added to include precision and supported ops for gpu and vpu * branch op support * checking for disabled python test failure * mapped input names and tensors directly rather than copying which was leading to mismatch * last index is not supported mkldnn does not support pow between integers * included the code changes * Rename inner-scoped variable to avoid MSVC warning * applied changed to vadm as well and removed the utility function getinputtensors() completely * OpenVINO multi version support: CMake changes * OpenVINO multi version support: C++ support * removed commented code * Remove redundant code lines * Revert "Rename inner-scoped variable to avoid MSVC warning" This reverts commit 2f650493162675bc6fb70730de9656ec400be332. Merged separately in master. * vadm changes disabled reduction op test * putting test_gather_negative_indices in unsupported list for now * Update MCR Dockerfile with 2020.4 Installs OpenVINO 2020.4 from deb packages via APT tool. 
* Update build docs with 2020.4 info * Update dockerfile with OV 2020.4 info Instructions for building OpenVINO based docker image no longer require downloading installer package as it is installed by the dockerfile using OpenVINO 2020.4 APT package for Ubuntu 18.04 * Added constant folding bypass logic * Added cout statements for ci * Added NDEBUG flag for debug symbols * Update Ops info in docs * fixes multiple unit tests * mathoptest.ceil disabled for gpu and myriad * activation test temp disabled * Fix models for CPU * Fixed a syntax error * local cmmit * fixing unit tests for myriad * Fixed Variadic Split, Topk issues * fix_model commit * Fix models in myriad * Added ifdefs for OpenVINO 2020.4 * temp * made some changes to not operator * Added unused parameter * relu enabled * Fixed bug in Conv output * Consolidated GPU failing tests into one category * Made it compatible to InternalCI 2020.4 * Made changes for ngraph * Disabled test for mask,fastercnn,tinyyolov3 * Removed proxy for ci * run_dockerbuild.sh restored to same version * run_dockerbuild.sh restored to same version * run_dockerbuild.sh restored to same version * Updated documentation for 2020.4 * Removed FP32 to FP16 transformation for GPU * Disabled Coreml-FNS-Candy model test * Added FP16 transformations Co-authored-by: sfatimar <sahar.fatima@intel.com> Co-authored-by: Manohar Karlapalem <manohar.karlapalem@intel.com> Co-authored-by: sfatimar <sahar.fatima@intel/com> Co-authored-by: sfatimar <64512376+sfatimar@users.noreply.github.com> Co-authored-by: intel <you@example.com> Co-authored-by: gundaarx <aravindx.gunda@intel.com>	2020-08-19 23:18:08 -07:00
Changming Sun	1ba07ccfaf	Codesign validator fixes	2020-08-18 16:20:15 -07:00
Changming Sun	e98697ec28	Fix nuget cpu package pipeline (#4832 )	2020-08-17 17:08:48 -07:00
Ksenija Stanojevic	ea37a4d89b	Add Trilu custom op (#4537 ) Co-authored-by: neginraoof <neginmr@utexas.edu>	2020-08-17 14:42:26 -07:00
Thiago Crepaldi	42408aa3ed	Add new PyTorch front-end (#4815 ) * Add ORTTrainerOptions class for the new pytorch frontend (#4382) Add ORTTrainerOptions class and some placeholders * Add _ORTTrainerModelDesc to perform validation for model description (#4416) * Add Loss Scaler classes to the new frontend (#4306) * Add TrainStepInfo used on the new frontend API (#4256) * Add Optimizer classes to the new frontend (#4280) * Add LRScheduler implementation (#4357) * Add basic ORTTrainer API (#4435) This PR presents the public API for ORTTrainer for the short term development. It also validates and saves input parameters, which will be used in the next stages, such as building ONNX model, post processing the model and configuring the training session * Add opset_version into ORTTrainerOptions and change type of ORTTrainer.loss_fn (#4592) * Update ModelDescription and minor fix on ORTTrainer ctor (#4605) * Update ModelDescription and minor fix on ORTTrainer/ORTTrainerOptions This PR keeps the public API intact, but changes how model description is stored on the backend Currently, users create a dict with two lists of tuples. One list is called 'inputs' and each tuple has the following format tuple(name, shape). The second list is called 'outputs' and each tuple can be either tuple(name, shape) or tuple(name, shape, is_loss). With this PR, when this dict is passed in to ORTTrainer, it is fully validated as usual. However, tuples are internally replaced by namedtuples and all output tuples will have tuple(name, shape, is_loss) format instead of is_loss being optionally present. In addition to that normalization in the internal representation (which eases coding), two internal methods were created to replace a namedtuple(name, shape) with namedtuple(name, shape, dtype) or namedtuple(name, shape, is_loss, dtype) depending on whether the tuple is an input or output. This is necessary as ORTTrainer finds out data types of each input/output during model export to onnx. 
Finally, a minor fix was done on ORTTrainer. It could initialize ORTTrainerOptions incorrectly when options=None * Rename input name for test * Add ONNX Model Export to New Frontend (#4612) Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * Create training session + minor improvements (#4668) Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> * Save ONNX model in file (#4671) Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> * Add eval step (#4674) Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> * Add train_step (#4677) Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> * Add LR Scheduler (#4694) Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * Add deterministic compute tests (#4716) Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * Add legacy vs experimental ORTTrainer accuracy comparison (#4727) Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * Add Mixed precision/LossScaler + several fixes (#4739) Additionally to the mixed precision/loss scaler code, this PR includes: * Fix CUDA training * Add optimization_step into TrainStepInfo class * Refactor LRSCheduler to use optimization_step instead of step * Updated several default values at ORTTrainerOptions * Add initial Gradient Accumulation supported. 
Untested * Fix ONNX model post processing * Refactor unit tests * Add ONNX BERT example + minor fixes (#4757) * Fix training issue when passing ONNX file into ORTTrainer Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> * Add Dynamic Shape support (#4758) * Update DeepSpeed Zero Stage option to a separate option group (#4772) * Add support to fetches (#4777) * Add Gradient Accumulation Steps support (#4793) * Fix Dynamic Axes feature and add unit test (#4795) * Add frozen weights test (#4807) * Move new pytorch front-end to 'experimental' namespace (#4814) * Fix build Co-authored-by: Rayan-Krishnan <rayankrishnan@live.com> Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-08-17 09:45:25 -07:00
Changming Sun	5eec4f66ed	Refactor manylinux docker image and the related pipelines (#4751 ) 1. Publish the image ACR, instead of building it every time for every PR 2. Make USE_MKLML and USE_OPENMP be able to co-exist. Currently both of them are enabled in our Linux CI build but indeed only one of them is taking effect. 3. Split nuphar and DNNL to separated pipelines. 4. Fix two warnings in onnxruntime/core/optimizer/matmul_scale_fusion.cc and onnxruntime/test/tvm/tvm_basic_test.cc. 5. Update the manylinux2010_x86_64 image to the latest.	2020-08-17 09:40:31 -07:00
Yulong Wang	aa993e95c9	enable build flag '--use_openmp' on MacOS (#4774 ) * enable build flag '--use_openmp' on MacOS * cmake 3.16.1 to enable find_package(OpenMP) on mac	2020-08-13 15:56:42 -07:00
jingyanwangms	adda8c66d9	Docker image release pipeline (#4682 ) * create orttraining-1p-linux-gpu-ci-pipeline.yml * fix syntax * fix file path * fix template path * publish docker image to test acr * use right task name * change parameter list * use variables * use python.version * remove --enable_onnx_tests due to segfault * add back --enable_onnx_tests * fix docker push command line * change docker login command * login differently * fix docker tag script * create password.txt * add ortrelease docker image * enable test in build.sh * add pipeline parameter * add pipeline parameter * change timeout * change timeout * fix run_dockerbuild.sh * use PR checkin build docker * fix strategy syntax * fix strategy syntax * change dockerfile * change run_dockerbuild.sh * change tag name * build with root user * use build id for docker image tag * remove all user lines * change docker tag * add mpi, mellanox * add missing args * use release dockerfile for ci build * remove install wheel * use release docker image * fix syntax * use different pool * add Dockerfile.training * remove sudo to run on Linux-Multi-GPU-V100 * change docker file path * update dockerfile * use latest dockerfile * change agent pool * remove --preserve-env * add back parameter * Add test_flag * use azuredevops docker * change repository * use cmd for docker login * echo build script * use ortrelrease ACR * change key vault connection * Move --build flag * change build command * add paramter for image tag * clean up for PR * remove unnecessary changes * whitespace changes * whitespace changes * change build flag * change flag name * change flag * use latest dockerfile * enable build tests * build builder stage and run test * Add back python.version * change build directory * always run build entire dockerfile * fix yml syntax * fix syntax * add en-UTF8 locale * rename * remove unused template * Update orttraining-linux-gpu-docker-release-pipeline.yml for Azure Pipelines * Update 
orttraining-linux-gpu-docker-release-pipeline.yml for Azure Pipelines * Test commit sha1 in pipeline * fix parameter * update docker file * fix --from=build * remove commented blocks * PR comments * fix syntax * fix syntax * use timestamp as build number * remove latest tag * add build_timestamp variable * remove wrong property * fix docker run command * test build id * Use datestamp build id * change build tags * add no-cache to docker build * rename BUILD_VERSION -> BUILD_CONFIG Co-authored-by: Jingyan Wang <jingywa@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Jingyan Wang <jingywa@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-08-12 13:29:37 -07:00
Dmitri Smirnov	ac4997665a	Make Java Publishing and Java GPU pipelines to run nightly (#4749 ) Schedule Java daily Bump up Linux GPU build timeout	2020-08-10 17:38:45 -07:00
stevenlix	77c69a0325	Upgrade TensorRT to v7.1.3.4 (#4704 ) * upgrade to TensorRT 7.1.3.4 * Upgrade onnx-tensorrt parser for TensorRT 7.1.3.4 * fix format issue * fix format issue * fix format issue * Update tensorrt_execution_provider.cc * change cmake version to 3.14 * Remove --msvc_toolset 14.16 * change to onnxruntime::make_unique * use onnxruntime::make_unique * disable some tests for TensorRT * disable some tests for TensorRT * Update upsample_op_test.cc * Update tile_op_test.cc * disable some tests for TensorRT * Update constant_of_shape_test.cc * update parser * Update Dockerfile.ubuntu_tensorrt	2020-08-07 17:43:56 -07:00
Sheil Kumar	5c5efa900d	Add .NET Core 3.0 nuget e2e pipeline tests (#4695 ) * bump cswinrt version * add cswinrt * test dotnetcore 3.0 * rename buildpackage source * set folder path to the package source and not the version * refactor .netframework tests * build .net core anycpu Co-authored-by: Sheil Kumar <sheilk@microsoft.com>	2020-08-05 13:02:24 -07:00
Changming Sun	d0297f8d24	Add 'Install ONNX' step to Windows GPU pipeline (#4696 ) Add 'Install ONNX' step to Windows GPU pipeline Previously it's not a problem because onnxruntime python package explicitly said it depends on ONNX, so ONNX will get installed when we test onnxruntime. However, it was removed in #4073	2020-08-03 18:51:24 -07:00
Changming Sun	01ca6392cb	Avoid building ONNX of every history ONNX versions in our CI (#4678 ) 1. Avoid building ONNX of every history ONNX versions in our CI, it is costly and easy to fail. 2. Run docker command without sudo. Previously the user is not in docker group, now Azure DevOps Service have added it in.	2020-08-03 10:18:10 -07:00
Changming Sun	f9f25c5559	Remove featurizer from CI build (#4661 )	2020-07-30 18:37:55 -07:00
Changming Sun	51332e3c81	Change Linux CI build time out value to 3 hours (#4664 ) Because it often needs more than 1 hr 55 minutes, increase the value so that pipeline failures become less likely.	2020-07-30 02:52:05 -07:00
Xiang Zhang	d73e01e5b9	remove ENABLE_TELEMETRY macro (#4633 )	2020-07-27 20:06:11 -07:00
gwang-msft	c2ec3b734b	[Android NNAPI EP] Remove dependency on external JD/DNNLibrary (#4576 ) * remove dependency of external jd-dnnlibrary * remove extra variables not used any more * update /cgmanifest.json	2020-07-22 14:08:12 -07:00
Sheil Kumar	fa6d035090	Create WindowsAI zip files automatically as part of the pipeline (#4584 ) * copy rename nupkg to zip as part of build task * update both symbols and regular package Co-authored-by: Sheil Kumar <sheilk@microsoft.com>	2020-07-22 10:53:47 -07:00
Changming Sun	c2c4e6760b	Fix code sign validation errors in nuget and nodejs pipeline (#4527 )	2020-07-20 14:18:47 -07:00
Changming Sun	bc1d197ddf	Re-enable dnnl in CI build (#4544 ) * Revert "Temporarily remove dnnl from Linux CI build to unblock the whole team (#4266)" Previously it fails because it used too much memory. Now we only run dnnl EP with opset12 models in unit tests, to reduce peak memory usage.	2020-07-19 23:20:03 -07:00
Yulong Wang	5086e55a35	Fix condition of running tests in win CI (#4459 )	2020-07-16 16:33:30 -07:00
Changming Sun	8ada440961	Move model tests to onnxruntime_test_all (#4521 ) 1. Move model tests to onnxruntime_test_all 2. Publish TestResults of Windows CI build.	2020-07-15 16:46:18 -07:00
edgchen1	34f73fa1aa	Add sudo --preserve-env option to allow environment to go through to docker commands. (#4512 )	2020-07-14 18:12:31 -07:00
liqunfu	f721f5f1cd	Liqun/multiple choice (#4480 ) * multiple choice runner * add docker cleanup task to frontend pipeline	2020-07-14 17:57:58 -07:00
Sheil Kumar	ee5ca27ae2	Split Microsoft.AI.MachineLearning.nupkg in a NuGet package and symbol NuGet package (#4503 ) * add threadpool interface * generate snupkgs * include_pdb check * fix snupkg generation * Add task to merge snupkgs * folder exists * check dir * revert thread pool stuff Co-authored-by: Sheil Kumar <sheilk@microsoft.com>	2020-07-14 14:52:39 -07:00
gwang-msft	5f8f443ac4	Android CI build, test copy, emulator boot improvement (#4481 ) * Enable onnxruntime_test_all for NNAPI EP * switch to use ninja for Android CI * make android emulator boot faster in android ci * simplify adb push * more style change * more tweaking on android ci * build.py style update	2020-07-13 14:18:34 -07:00
Dmitri Smirnov	35ee00d888	Pin typing version. (#4490 )	2020-07-13 11:48:30 -07:00
Hariharan Seshadri	26ebcfab88	Fix Nuget GPU pipeline (#4462 )	2020-07-10 14:02:28 -07:00
Yulong Wang	bec18eb3f4	[Node.js binding] support CentOS 7 in CI (#4447 )	2020-07-09 00:59:50 -07:00
Negin Raoof	71aec2adcb	Custom op export test template (#4383 ) * Adding pytorch custom op export tests to CI * Test clean build * Fix export for intended failure * update export script * Build onnxruntime	2020-07-08 10:14:56 -07:00
Hariharan Seshadri	6d6b6b54a5	Support binding a graph output to a specific device via the Python binding (#4439 )	2020-07-07 21:09:37 -07:00
Sheil Kumar	fdb4a3a2e8	Add cppwinrt and cswinrt tests in windowsai nuget pipeline (#4381 ) * build e2e cppwinrt tests * add use nuget task * make all referenced to package version prop/target-ified * remove dupe props/targets reference * work around project.assets.json error by deleting it * powershell test invocation * switch to batch script * print debug info * update x86->x64 * stdio.h * pushd/popd * add csharp tests * package.config -> packages.config * typo * x86 -> anycpu * debug is default * add test path * update csproj as well * debug * really replace all package versions * debug output * really use [PackageVersion] * sleep instead of converting async operation to task and waiting * dont close software bitmap * switch to powershell script * remove binding check * continue on failure * continue on error action * continueOnError and errorActionPreference * tabbing Co-authored-by: Sheil Kumar <sheilk@microsoft.com>	2020-07-07 09:36:42 -07:00
suffiank	7a05b3ca87	Increase python packaging pipeline timeout (#4412 ) * increase python packaging pipeline from 90 to 110 min * change timeout to Linux GPU and do 120 min to match Win GPU	2020-07-02 15:38:39 -07:00
gwang-msft	0bef9d5114	Fix the broken Android NNAPI CI (#4403 ) * Change NNAPI CI to run on new NNAPI EP * update android ci to mac 10.15 and remove in install cmake * update the android ci to target android api level 29 * remove unnecessary ndk install git submodule call	2020-07-02 10:22:18 -07:00
Changming Sun	3bb6a865cc	Revert "remove openmp and scipy from build pipelines (#4305 )"	2020-07-02 00:30:02 -07:00
Tiago Koji Castro Shibata	7fea332f93	Support builds without RTTI (#4333 ) * Support builds without RTTI * Disable RTTI in all builds	2020-07-01 13:05:35 -07:00
Dmitri Smirnov	49268c42da	Change the way java home is set on Mac OS for CI and Java publishing pipeline (#4385 ) * Change the way java_home is set on Mac. * Change the way JAVA_HOME is set on Mac OS	2020-07-01 07:37:14 -07:00
Negin Raoof	37cbe8551d	Adding export registration and tests for custom ops (#4248 )	2020-06-25 22:29:02 -07:00
Changming Sun	5db67ec000	Fix python package issue and upgrade the linux image to 2010 (#4342 ) 1. Increase job timeout, while we are investigating why the tests take much longer 2. Upgrade the linux docker image to manylinux2010, by request from Tianlei. (We had an offline discussion with Pranav and Tracy) 3. Remove the installation of "devtoolset-7" in the CUDA image. It was added for CUDA 10.0, it is not needed for CUDA 10.1. We have moved to CUDA 10.1.	2020-06-25 20:22:39 -07:00
Dmitri Smirnov	a08805daf9	Fix a minor typo in POM file name (#4250 ) Co-authored-by: Changming Sun <chasun@microsoft.com>	2020-06-25 11:15:14 -07:00
Changming Sun	deea945f80	Remove openmp and scipy from build pipelines (#4305 ) 1. Remove openmp because the default thread pool is already good enough. 2. Remove scipy from build pipelines because it no longer supports Python 3.5.	2020-06-23 20:18:16 -07:00
edgchen1	4e39fda06a	Fix version of torch and torchvision in install_deps.sh. (#4316 )	2020-06-23 14:55:18 -07:00
edgchen1	737c22a911	Refactor Python packaging builds (#4283 ) Reuse the same template file for all Python packaging builds.	2020-06-22 17:13:22 -07:00
Pranav Sharma	2204d39a06	Add build option to disable traditional ML ops from the binary. (#4272 ) * Add build option to disable traditional ML ops from the binary. * Fix python tests by splitting tests for ML ops to a separate file. Exclude ML tests from onnx_test_runner and C# tests. Exclude ML op sources. * Update Edge pkg pipelines with new MLops env variable and fix C# packaging pipeline tests to skip ML ops.	2020-06-20 06:36:06 -07:00
Changming Sun	0349479b19	Fix component governance and codesign validation errors (#4277 ) Adjust the job steps so that these security tasks run before the build directory clean up.	2020-06-18 15:54:18 -07:00
Changming Sun	43deec2174	Temporarily remove dnnl from Linux CI build to unblock the whole team (#4266 )	2020-06-17 16:25:24 -07:00
edgchen1	63bf587623	Use azcopy to download test data (#4221 ) Use azcopy from download_e2e_test_data.py, add helper function for downloading azcopy. Update download_test_data.py to use helper function.	2020-06-16 10:14:34 -07:00
Hariharan Seshadri	91a41298cc	Fix ORT build when onnxruntime_PYBIND_EXPORT_OPSCHEMA is enabled (#3954 )	2020-06-12 19:32:57 -07:00
Changming Sun	6f4320fb85	Fix the python package name issue (#4207 ) Fix the Python package name issue. In my last change (#4197), which enabled code signing, I forgot to pass the additional flags to setup.py.	2020-06-12 08:32:59 -07:00
Changming Sun	8f8d899bf2	Enable code sign in c api pipeline and python pipeline	2020-06-10 19:31:22 -07:00
Yulong Wang	73bc6be5d1	build: split nodejs binding build and test to avoid timeout issue (#4188 ) * split nodejs binding build and test * enable nodejs tests	2020-06-10 19:16:32 -07:00
Dmitri Smirnov	af0750ba1b	Java GPU artifact naming (#4179 ) Modify gradle build so artifactID has _gpu for GPU builds. Pass USE_CUDA flag on CUDA build Adjust publishing pipelines to extract POM from a correct path. Co-Authored-By: @Craigacp	2020-06-10 11:15:48 -07:00
Changming Sun	c0bdbc0b39	Enable telemetry for the C API and python pipeline (#4174 )	2020-06-10 00:07:46 -07:00
George Wu	9d65ce53bc	move back to toolset 14.16 to possibly work around nvcc bug (#4180 )	2020-06-09 19:36:30 -07:00
Sheil Kumar	4377ff4a1a	Enable .NET Core 2.0 and .NET Framework 4.6.1 in Microsoft.AI.MachineLearning NuGet package (#4125 ) * add project to download cswinrt and build winrt c# interop dll * Add to nuget package * reverse if check * run generation before core compile * add generated files to compile * update .net package to binplace native libs * add props to .netstandard2.0 folder * auto binplace ml native binaries * force 'Any CPU' platform build * Fix anycpu and platform targets * fix flake errors * fix variable order * fix flake pep8 errors, semicolon Co-authored-by: Sheil Kumar <sheilk@microsoft.com>	2020-06-09 09:08:19 -07:00
Changming Sun	2ab3a19728	Enlarge the read buffer size in C#/Java test code (#4150 ) 1. Enlarge the read buffer size further, so that our code can run even faster. TODO: apply similar changes to Python and other language bindings. 2. Add coreml_VGG16_ImageNet to the test exclusion set of x86_32. It is not a new model, but previously we didn't run the test against x86_32.	2020-06-08 16:13:11 -07:00
Yulong Wang	842be1535d	[Node.js binding] add linux and mac package (#4157 ) * try mac pipeline * fix path separator * copy prebuilds folder * split esrp yaml for win/mac * disable mac signing temporarily * add linux * fix indent * add nodetool in linux * add nodetool in win-ci-2019 * replace linux build by custom docker scripts * use manylinux as node 12.16 not working on centos6 * try ubuntu * loosen timeout for test case - multiple runs calls	2020-06-08 14:12:05 -07:00
liqunfu	ffed43e9b8	handle loss and name marching wrappers (#4066 ) * handle loss and name marching wrappers Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-06-05 23:34:26 -07:00
Yulong Wang	2aab20b4ea	[Node.js binding] upgrade node-addon-api to 3.0 (#4148 )	2020-06-05 21:24:34 -07:00
Yulong Wang	2e58097f8f	fix build: pipeline Node.js version to 12.16.3 (#4145 )	2020-06-05 17:56:03 -07:00
Yulong Wang	647a886587	[Nodejs binding] create a new pipeline to generate signed binaries (#4104 ) * add yml files * update pipeline * fix yaml syntax * yaml pop BuildCSharp * update yaml * do not stage codesign summary	2020-06-02 01:28:05 -07:00
Dmitri Smirnov	afca0d15ee	Create Java publishing pipeline (#3944 ) Create CPU and GPU Java publishing pipelines. Final jars are tested on all platforms. However, signing and publishing to maven are manual steps.	2020-06-01 18:18:57 -07:00
Changming Sun	3eaec57c38	Fix the daily pipeline failures (#4084 ) 1. Fix the nuget cpu pipeline and put code coverage pipeline back. 2. Reduce onnx_test_runner's default logging level from WARNING to ERROR, because there are too many log messages now. 3. Enlarge the protobuf read buffer size for onnx_test_runner. It was missed from PR #4020.	2020-06-01 14:44:49 -07:00
edgchen1	a715d55bcc	Training Python package fixes (#4063 ) - Add support for ENABLE_LANGUAGE_INTEROP_OPS in training build which is enabled for nightly builds - Fix passing of environment variables to `sudo docker run` in build definitions - Fix setup.py package naming logic	2020-06-01 09:30:56 -07:00
Scott McKay	1d441f89ac	Re-enable PEP8 check in Win CI build (#4075 ) * Add flake8 to Win CI build so it's re-enabled. It was in the static analysis build that is currently disabled so checks are not running. Fix build.py to be compliant again. Add prefix to flake8 output so it's (hopefully) easier to identify the errors in build output. * Add to all builds in Windows CPU CI so they all fail quickly if there's an issue.	2020-05-30 09:10:05 +10:00
edgchen1	38d76cc904	Clean up training E2E test (#4078 ) Update training E2E build to not go through CTest and call test scripts directly.	2020-05-29 09:20:47 -07:00
liqunfu	6665d5e2bc	Liqun/a transformer example (#3845 ) Add transformer glue test example to show how to use ORTTrainer to fine-tune a transformer model Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-05-27 15:21:35 -07:00
Yulong Wang	b3ec8035ee	[Node.js binding] add build flag for node.js binding (#3948 )	2020-05-27 13:30:22 -07:00
Wei-Sheng Chin	24eda3df33	Create Utils for Adding Range and Marker (#4013 ) In this PR, we 1. create some APIs for creating NVTX objects 2. apply those APIs in pipeline-related operators and sequential executor. As a result, we can explicitly see how a pipeline schedule is run by GPUs in Nvidia's visual profiler. Note that these APIs are Linux only due to Nvidia's limited support.	2020-05-24 22:55:24 -07:00
Changming Sun	aafe988a11	Temporarily disable windows static analysis CI job	2020-05-24 16:31:09 -07:00
Ryan Lai	357bffe47c	Fix deprecated CentOS link for Linux CI pipeline (#4000 ) * Fix Linux_CI_GPU_Dev * centos6	2020-05-20 16:14:48 -07:00
Bowen Bao	0a5395bb78	Remove 'model_.' prefix from onnx model initializers in training (#3881 ) * Remove 'model_.' prefix for onnx model initializers in training * fix test case remove redundant device test * rename * Fix state_dict/load_state_dict with frozen_weight * nit * Add monkey patch for pt opset 10 * remove pt patch in CI * nit: newline	2020-05-20 10:06:31 -07:00
Prabhat	08763e80e0	Fix permission denied while creating directory in azure pipelines (#4001 ) * Fix permission denied while creating directory * Run tar with sudo	2020-05-20 09:47:12 -07:00
edgchen1	989fe2498f	Change training perf test build to use "docker" instead of "sudo docker" (#3995 ) Change training perf test build to use "docker" instead of "sudo docker". The training perf test build runs in an environment that supports calling "docker" and not "sudo docker".	2020-05-19 16:54:35 -07:00
Ryan Lai	354e571277	Miscounted the number of characters in package version of DirectML nuget (#3993 ) Co-authored-by: Ryan Lai <ryalai96@gamil.com>	2020-05-19 15:28:30 -07:00
ytaous	fb4efafc8e	GPT-2 training perf scripts (#3974 ) * gpt2 training perf * gpt2 training perf * debug * debug * debug * fix bug * minor * on comments * dynamic sql * fix build * minor * linked hash * on comments * minor * mem * minor Co-authored-by: Ethan Tao <ettao@microsoft.com>	2020-05-19 10:21:40 -07:00
Changming Sun	2fa2019daf	Run docker commands with sudo (#3979 )	2020-05-18 17:35:09 -07:00
edgchen1	024b92a970	Use path relative to script location to refer to symbolic_opset10.py from install_deps.sh. (#3975 ) Update install_deps.sh to use relative path from script directory to symbolic_opset10.py. This allows install_deps.sh to be called from different working directories.	2020-05-18 13:36:06 -07:00
Adam Pocock	9d2d1eb6f6	[java] Adds a CUDA test (#3956 ) * [java] - adding a cuda enabled test. * Adding --build_java to the windows gpu ci pipeline. * Removing a stray line from the unit tests that always enabled CUDA for Java.	2020-05-18 12:05:51 -07:00
edgchen1	e259a13f8e	Initial training Python packaging pipeline (#3767 ) Add a pipeline to produce training-enabled ORT wheels.	2020-05-18 09:41:00 -07:00
edgchen1	e55f24364a	Disable LTO on Windows training CPU build (#3960 ) Disable LTO on Windows training CPU build. Add a parameter to the win-ci-2019.yml build template for enabling LTO with a default value of true.	2020-05-18 09:24:10 -07:00
Prabhat	4ff73d00b0	Fix python pkg permission issue (#3957 ) * Fix python pkg permission issue * Run chown with sudo * Add workspace clean to arm pipeline * Run docker as current user	2020-05-17 14:06:55 +05:30
Ryan Lai	38467f8c9a	DirectML Nuget package has different time stamp than Native and Managed Nuget (#3950 ) * Fix DirectML nuget creation in Nuget pipeline * DirectML Nuget package has different timestamp * remove accidentally changed file	2020-05-14 18:52:08 -07:00
Scott McKay	5e0928a777	Enable running PEP8 on python scripts using flake8 (#3928 ) * Enable running PEP8 checks via flake8 as part of the build if flake8 is installed. Update scripts in \tools and \onnxruntime\python. Excluding \onnxruntime\python\tools which needs a lot more work to be PEP8 compliant. Also excluding orttraining\tools for the same reason. Install flake8 as part of the static_analysis build task in the Win-CPU CI so the checks are run in one CI build. Update coding standards doc.	2020-05-15 07:15:06 +10:00
ytaous	93eb9bcfde	Add yaml/perf scripts for new perf test pipeline (#3909 ) * yaml/perf scripts for new pipeline * yaml/perf scripts for new pipeline * remove unused imports * testing some comments change * testing some comments change * testing jdbc * testing jdbc * testing jdbc * exclude pwd from jdbc properties * exclude pwd from jdbc properties * namedtuple * on comments Co-authored-by: Ethan Tao <ettao@microsoft.com>	2020-05-13 14:15:17 -07:00
Prabhat	25257a661d	Added onnxruntime aarch64 wheel to pypi publishing pipeline (#3903 ) * Added onnxruntime aarch64 wheel to pypi publishing pipeline * Support nightly build flag * Add support for nightly build	2020-05-13 23:20:29 +05:30
liqunfu	9b5daa2039	patch torch onnx opset 10 (#3910 ) patch pytorch to export onnx nll_loss opset version 10. add mnist test to cover onnx opset version 10.	2020-05-12 18:11:25 -07:00
Ori Levari	7b858d60b0	Various changes for automated downlevel test pipeline (#3901 ) Co-authored-by: Ori Levari <orlevari@microsoft.com>	2020-05-12 17:22:47 -07:00
Prabhat	ce3678ffaf	Added aarch64 build pipeline (#3841 ) * Added aarch64 build pipeline * Fix build error * Remove auditwheel repair which doesn't work with cross compiling * Statically link C++ * Added auditwheel repair back and fix stdlib.h * Remove extra space	2020-05-11 22:56:16 +05:30
Ryan Lai	7fd2c8f9e8	Add signed GPU nuget package to publish ort-nightly nuget feed (#3834 ) * Add signed nuget package to publish ort-nightly nuget feed * Push managed nuget as well * Indentation fix * Indentation fix * Update gpu.yml to also publish directml nuget * Fix typo in naming of task	2020-05-10 16:24:45 -07:00
M. Zeeshan Siddiqui	5e1244eb4d	Update ONNX submodule to ONNX 1.7 release branch. (#3888 ) * Update to ONNX submodule to ONNX 1.7 release branch. * Update to ONNX submodule to ONNX 1.7 release branch. * fix version.	2020-05-10 15:44:44 -07:00
Pranav Sharma	22a711457f	Fix C# log APIs. Also fixes github issue #3409 . (#3840 ) * Fix C# log APIs. Fixes github issue #3409. * Fix build error due to accidental duplication of GraphOptimizationLevel * Fix runoptions * Fix broken test. Add --blame switch to dotnet test cmd line to print the failed test in case of crash.	2020-05-08 14:31:06 -07:00
stevenlix	4ea10c9202	bump up ORT version and extend time limit for windows cpu packaging pipelines (#3852 )	2020-05-07 14:22:20 -07:00
M. Zeeshan Siddiqui	9b02b3df6f	Update ONNX submodule to ONNX 1.7 release candidate 3. (#3838 )	2020-05-06 00:55:19 -07:00
M. Zeeshan Siddiqui	ef4d73e887	Update ONNX submodule to ONNX 1.7 release candidate 2. (#3818 ) * Update ONNX submodule to ONNX 1.7 release candidate 2. * fix build error. * Update ONNX submodule to latest and disable preview op tests.	2020-05-05 15:08:40 -07:00
Changming Sun	c11fbf68e4	Publish gpu package to nuget feed (#3816 )	2020-05-04 21:49:19 -07:00
Changming Sun	2684d47fc5	Disable data downloading in linux-nocontribops-ci-pipeline (#3803 ) * Disable data downloading in linux-nocontribops-ci-pipeline * update * update	2020-05-02 12:59:24 -07:00
Sheil Kumar	37b60251ca	test packaging (#3756 ) Co-authored-by: Sheil Kumar <sheilk@microsoft.com> Co-authored-by: Changming Sun <chasun@microsoft.com>	2020-05-02 12:23:33 -07:00
Changming Sun	ee8900e21a	Update centos-ci-pipeline.yml (#3800 ) * Update centos-ci-pipeline.yml	2020-05-02 11:04:23 -07:00
edgchen1	440f361363	Remove orttraining-linux-gpu-inference-only-ci-pipeline.yml. (#3788 )	2020-05-02 00:35:08 -07:00
M. Zeeshan Siddiqui	517bff9675	Function expansion support and Update ONNX to 1.7 release candidate 1. (#3782 ) * Function expansion support, Update ONNX to 1.7 release candidate 1. * Renable disabled tests.	2020-05-01 10:35:16 -07:00
George Wu	dcb1a21552	fix python package linux gpu failure (#3786 ) * pin base image for manylinux2010_gpu * pin base image for Dockerfile.manylinux2010	2020-05-01 17:04:59 +08:00
liqunfu	af3988198c	Liqun/e2e transformer test (#3540 ) * initial change to transformer.py * prepare e2e transformer tests * refactor transformer tests * put test python files in a flat folder * fix typo pip install transform(s) * python 3.6 * python version to 3.6 in install_ubuntu.sh * remove argparser * to use opset ver 12 * workaround loss_scale naming patch in case of loss_fn_ * assign self.loss_fn_ so it can be checked * skip a few un-needed post-process steps * fix loss_scale_input_name, clean up post process steps * skip non-frontend tests * move cpu/cuda related files to coresponding cpu/cuda folder (#3668) Co-authored-by: Weixing Zhang <wezhan@microsoft.com> * type cast for ratio is not necessary for dropout (#3682) Co-authored-by: Weixing Zhang <wezhan@microsoft.com> * thrustallocator is not needed since cub is used directly for gather now. (#3683) Co-authored-by: Weixing Zhang <wezhan@microsoft.com> * GatherND-12 Implementation (#3645) * Renamed, UT passing * Move GatherND CUDA Kerenl into onnxruntime * Merge GatherNDOpTest * Refactor Test code * Merge CPU Kernel Impl * Handle Negative Indice, Fix UT * Improve CUDA kernel to handle negative index * Minor Fixes * Preserve GatherND-1 Cuda kernel * Fix Mac build * fix UT * Fix Build * fix GatherNDOpTest.double > CUDA error cudaErrorInvalidDeviceFunction:invalid device function Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Peng Wang (pengwa) <pengwa@microsoft.com> * update with reviewers' comments * testBertTrainingGradientAccumulation was not using rtol and may fail occasionally with small (e-06) difference * fix merge mistakes Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Weixing Zhang <weixingzhang@users.noreply.github.com> Co-authored-by: Weixing Zhang <wezhan@microsoft.com> Co-authored-by: Sherlock <baihan.huang@gmail.com> Co-authored-by: Sherlock Huang 
<bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Peng Wang (pengwa) <pengwa@microsoft.com>	2020-04-30 12:26:38 -07:00
Scott McKay	9f72752397	Fix 'Install ONNX' CI failure (#3761 ) * Disable flaky test temporarily * turn off pip upgrade warning Co-authored-by: Aishwarya Bhandare <aibhanda@microsoft.com> Co-authored-by: Zeeshan Siddiqui <mzs@microsoft.com>	2020-04-30 18:18:58 +10:00
Changming Sun	7ff06056bd	Fix the test coverage pipeline (#3710 )	2020-04-28 21:21:19 -07:00
Sheil Kumar	f1a948fd62	Enable telemetry on windows zip packages (#3738 ) Co-authored-by: Sheil Kumar <sheilk@microsoft.com>	2020-04-28 14:07:11 -07:00
Ori Levari	78fde2c4cb	add downlevel test artifact to windowsai-nuget build (#3711 )	2020-04-28 10:05:32 -07:00
Changming Sun	805ffc01e5	Temp remove --enable_wcos --use_winml from CI build (#3707 ) The flags "--enable_wcos --use_winml" don't work with the latest VC++ and CMake. I don't know which of the two caused the failure. Remove the flags to get the pipelines working first. Will add them back before 1.3 release.	2020-04-26 16:10:25 -07:00
edgchen1	e22d97ba56	Merge pull request #3643 from microsoft/ort_training_for_merge_to_master Introduce ORT training implementation	2020-04-25 07:15:22 -07:00
Sheil Kumar	a475f2824d	Create the Nuget WindowsAI Pipeline (#3684 ) * add windowsai.yml for new Microsoft.AI.MachineLearning nuget * temporarily add windowsai.yml to gpu.yml * pass in build arch * remove install onnx task * no dml for arm or arm64 * refactor nuget pipeline defs * update package creation * pass in build and sources path * missing hyphens * copy license file * fix parameter variable * disable arm builds for now * remove commented script block * download pipeline atifcat name update * set working dir * Add bundling nuget script * path combine * null path * combine needs parentheses * binplace microsoft.* dlls in new nuget package * update artifact name * move merged nuget to artifacts directory * move to merged subfolder in artifacts staging dir * forward slash to back * enable arm * vcvarsall needs x64 vars setup * Run Tests * fix tests * move global variables * update yml to not have global variable in template * removed parameters * fixes * Add build arch as an env variable * ne not neq * %Var% for batch script * dont pass argument for x64 * disable arm tests * skip csharp/cxx tests for microsoft nuget package * remove test-win as it tests only c# cxx and capi * test build for store apps * dont build for store * tools/nuget/generate_nuspec_for_native_nuget.py * remove args. 
* add new props and targets for microsoft.ai * make windowsai props/targets static * add dependency * dont ship dot net props * Remove c# fom windowsai nuget * copy license file * native packages must have win10 as the platform, not win * cuda header in wrong if branch * no dml for arm builds * only build dml for x64/ x86 * User/sheilk/props update (#3616) * prelim store work * props * Fix desktop nuget props/targets * clean up targets and make store apps work Co-authored-by: Sheil Kumar <sheilk@microsoft.com> * update windowsai.yml with latest * remove extra dloadhelpers * Add abi headers to abi dir, and reference native includes * update windowsai.yml * minor update * remove parameters * add doesrp param * hard code esrp to true * add directml for x86/x64 * revert gpu yml changes * add store builds * add store builds * add checks again in old way * dup job names for store and desktop builds * move all of the runtime binaries to win10 folder * only set safeseh on x86 * disable the store builds for now... missing msvcprt.lib * copy paste deletion... * switch back to win- (#3646) Co-authored-by: Sheil Kumar <sheilk@microsoft.com> * use stahlworks * & not supported in ado * add cuda to cpu nuget(???) and EnableDelayedExpansion to enable x86 dml package * revert nocontribops * add underscore... * extra win/win10 change * merged nuget... still not being bundled... * files in merged directory * missing parens causing dml to be included in cpu package * more diagnostic info * switch dir to get-childitem * wait for compression to complete * add winml_adapter to mkml and gpu packages * enable_wcos * add mklml binaries * props and targets missing from mklml Co-authored-by: Sheil Kumar <sheilk@microsoft.com>	2020-04-24 20:20:04 -07:00
Ethan Tao	e9f1e7e797	resolve conflicts	2020-04-24 15:15:36 -07:00
S. Manohar Karlapalem	6d4f2f5bf9	OpenVINO EP v2.0 (#3585 ) * Added FP16 transformations * Revert "Added CMAKE_BUILD_TYPE to make building dynamic" This reverts commit d3e17af1af655cfdc4d2fec33f52055caa525e85. * Added FP16 transformations for FP16 builds * Backend logic cleanup Cleans the backend(intel_graph.) code in the following ways:- 1. Minimize global usage: Since all the IR graphs need to be re-generated on every Infer, it is bad practice to rely on globals for their saving and usage as there would be multiple readers and writers to the same global variable leading to incorrect usages or contentions. This change replaces globals with locals where possible. This change also fixes an existing bug with due to incorrect global usage. 2. Remove all unused functions. 3. Remove all unused headers and prepocessor directives. removed commented out code * Disabled default optimization for Intel EP Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com> * Fix missed plugins.xml for python bindings * Fixed the build after latest master changes Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com> * Disabled unsupported ops for accelerators Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com> * Added some more disabled ops Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com> * Added environment variable to enable debugging Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com> * Added more debug statements Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com> * Fixed unsupported ops list for GPU and VPU Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com> * Fixed unsqueeze unit tests Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com> * Added error message to the status Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com> * Overwrite Model proto with shape info from data Overwrites the shape info of Model proto with the shape from actual input data. 
Needed for inferring models with Dynamic shapes. * Removed print statement and disabled where op Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com> * Disabled Reshape with Empty initializer * Added more debug statements for 1P * Don't allow 1D inputs with symbol for dimension * Disabled some 3rd phase ops * Disabled split and added zero dimension check for OutputDefs * Cleanup zero dimensionality check * Added different data type check for inputs and initializers * Added conditions for Mod, Cast and Pad * Removed unused variable * Disabled scan and added conditions for squeeze * Added changes for fixing all C++ unit tests * Implements Backend Manager class for caching Backend Manager provides a layer of indirection between EP interface and OV backend that provides caching services for models with symbolic dims in input shapes. * clean up commented blocks * clang-formatting * Read I/O type info from ModleProto Read the tensor element type information from ModelProto object, as FusedNode is no longer available. * code cleanup * clang-formatting * Added print statement for jenkins * Disabled some python tests * Changed the path of convert fp32 to fp16 hpp * Added conditions for BatchNorm in GetCapability * Fixed failed tests * Revert "Added conditions for BatchNorm in GetCapability" This reverts commit c3c28c3b00d27892c42546b35dacdd807a48ee90. * Added Intel to onnxruntime backends * pick up vars set by OV package setupvars.sh * Added conditions for Identity * remove a few cout prints * Added conditions for GPU_FP32 unit tests * Revert "pick up vars set by OV package setupvars.sh" This reverts commit 8199e029c03eae21a1a7ef6bfdc93d00e5d0198b. 
* Commented out fatal message for protobuf * Might need to be removed * Add interface class for current backend * moved common logic to base class * simplified cpu backend * Removed unused headers * use vectors to save i/o tensors for windows compatibility * move utils fxns to backend_utils namespace * rename ov_backend to ibackend * Factory pattern for backend creation * rename CPU backend to Basic backend * renamed to vad-M and added to factory list * Added conditions for VPU * Added print statements * Changed the logic for checking for symbolic shapes * Modified logic for zero dimension check * Removed VPU single dimension condition * Removed comments * Modified logic in DimensionCheck method * Remove legacy OpenVINO EP Remove all the legacy code for OpenVINO EP. UEP code will take its place going forward. This change does NOT remove OVEP files in the following areas asa they will be reused by UEP:- 1. Documentation: All .md files 2. Docker releated files 3. Python bindings 4. Java bindings 5. C# bindings 6. ORT Server 7. 
CI pipeline setup files * Rename Intel EP to OpenVINO EP * Added unique names to the subgraphs * Removed subgraphs with only constant inputs * Modified subgraph partitioning algorithm to remove const input subgraphs * Apply suggestion to onnxruntime/core/providers/openvino/openvino_execution_provider.cc * Tracking output names to fix the output order bug * Changed output names to a unordered map * Modified logic to check for symbolic input shapes * Fixed a bug in Reshape check * Added empty model path to Model constructor * Made necessary changes to cmake to build from the binary package * Changed INTEL_CVSDK_DIR to INTEL_OPENVINO_DIR * Enable dyn device selection with C++ API * Added Round operator to unsupported list * Modified subgraph partition logic for MYRIAD * Removed supported ops from the list * Enable dyn dev selection in Py API's * Add documentation for dynamic device selection * Use MYRIAD \|\| HDDL instead of VPU * Removed temporary cast of Int64 to FP32 * Disabled unit Tests for CPU_FP32 and GPU_FP32 * Removed default "CPU" from unit tests to allow overriding * Removed ops Concat, Squeeze, Unsqueeze from unsupported list * Get the device id from info * Removed overwriting device_id and precision * Enabled ConvTranspose and EyeLike * Reordered unsupported ops in alphabetical order * Fixed syntax error * Fixed syntax error * Code clean-up: Handle exceptions, logs and formatting Code formatted according to ORT coding guidelines. 
* remove debug print from pybind code * updated docs with ops and models * formatting prints * Added default values for c and j for openvino * Overriding the values set for c and j to be 1 * BACKEND_OPENVINO should be empty if openvino is not in build * Overriding c value with default for perftest * fix VAD-M device string bug * Add IE error details to exceptions * Use IE specific device names in EP * Add VAD-F (FPGA) device support * Removed unecessary libraries from whl package * Code changes for Windows compatibility * Add VAD-F option to python API * [revert before merge] cmake changes for RC * Enable Windows build in CMake * Unset macro OPTIONAL for windows builds inference_engine.hpp's include chain defines a macro 'OPTIONAL' which conflicts with onnx project's headers when using MSVC. So would need to explictly unset it for MSVC. * Use a single copy of plugin/IE::Core Defined as a static member in Backend manager * Remove restriction of single subgraphs for myriad * Passed subgraph name to Backend to enhance log statements * Disabled zero dimension conditions * Disabled concat to remove zero dims * Enabled building ngraph as part of ORT * Removed serializing and added versioning * Fix CPU_FP32 unit tests * Removed unecessary condition * add ngraph.so.0.0 to .whl * Check for zero dimensions only for inputs and outputs * Restrict loading only 10 subgraphs on myriad * Build ngraph.dll within UEP. Doesn't link yet * Rename Linux included libngraph.so to libovep_ngraph.so Renames locally built libngraph.so containing ONNX importer to libovep_ngraph.so in order to avoid linkage conflicts with libngraph.so supplied by OpenVINO binary installer. Applies only for Linux builds. * use output_name cmake properties for lib name * fix .so name format in lib_name.patch * CMake code cleanup * Rename WIN32 included ngraph.dll to ovep_ngraph.dll To avoid conflict with ngraph.dll distributed by openvino. 
* Added myriad config for networks without 4 dimensions * Loading the 10 max clusters for inference on myriad * Refactor code and add Batching support Encapsulate subgraph settings into context structs. Add batching support for completely supported models. * Disabled some broken tests * use input_indexes to avoid batch-checking initializers * Avoid static initialization order error on WOS * Added candy to broken tests * InternalCI changes for 2020.2 * Updated DLDT instructions * Unsaved changed in install_openvino.sh * Changes after manual check * Remove custom ngraph onnx_import build for WOS ONNX Importer on WOS does not have protobuf issue. * Remove FP32ToFP16 ngraph pass This conversion is performed implicitly within IE. * Surround debug logic by #ifndef NDEBUG * remove invalid TODO comments * removed references to ngrpah-ep * clang-formatting * remove commented code * comment edits * updating copyright year to that of first OpenVINO-EP release * remove redundant log msg * Modified operator and topology support * Update build instructions * doc formatting * Fixed clip unit tests * Revert "Remove FP32ToFP16 ngraph pass" This reverts commit ec962ca5f315a5658ad980e740196f19de2639c1. 
* Applying FP16 transformation only for GPU FP16 * Fixed GPU FP32 python tests * automatically use full protobuf * disable onnxrt server for now * Disabled upsample * update dockerfile instructions * Removed MO paths and added ngraph path * Remove OVEP from ORT Server docs Will put it back in after validation * Updated path to Ngraph lib * Disabled Resize and some other python tests * Removed unnecesary header files * Use commit SHA to fetch ngraph repo * Avoid un-needed file changes due to version update * Fixed clip tests * Fixed Pow, max and min onnx tests * build.md doc typo * Update cmake patch command for ngraph src * remove dead cmake code for onnxruntime_USE_OPENVINO_BINARY * use spaces instead of tab * remove commented code * Add info about protobuf version * edit debug env var and enable for WIN32 * specify only version tag of 2020.2 for dockerbuilds * remove unnecessary file changes * Pass empty string as default argument to C# tests * Use ${OPENVINO_VERSION} to name openvino install directory in CI builds * Enabled unnecessarily disabled tests * Fixed ngraph protobuf patch * Fixed error in protobuf patch * Revert "Use ${OPENVINO_VERSION} to name openvino install directory in CI builds" This reverts commit 89e72adb8bf3b9712f5c81c5e13fe68c6c0df002. * Remove unsetting OPTIONAL macro This is no longer used in recent ONNX update onnx/onnx@da13be2, so this unset workaround is no longer necessary. * Use a null string default argument for C# API * Set OpenVINO version yml files and pass to CI Docker builds Git Tag info for DLDT as well as install directory are set using this value. This reverts commit 9fa9c20348ed72ae360a95c98e9b074d2f9fafc5. 
* Documentation: recommendation and instructions for disabling ORT graph optimizations * more doc updates * Reduced the number of models according to CI time constraints Co-authored-by: ynimmaga <yamini.nimmagadda@intel.com> Co-authored-by: suryasidd <surya.siddharth.pemmaraju@intel.com> Co-authored-by: Mikhail Treskin <mikhail.treskin@intel.com> Co-authored-by: mbencer <mateusz.bencer@intel.com> Co-authored-by: Aravind <aravindx.gunda@intel.com> Co-authored-by: suryasidd <48925384+suryasidd@users.noreply.github.com>	2020-04-24 04:06:02 -07:00
Edward Chen	deac467683	Merge remote-tracking branch 'origin/master' into edgchen1/merge_from_master	2020-04-23 20:50:33 +00:00
David Brownell	3ce31933bb	Wheel file updates for FeaturizerLibrary data (#3640 )	2020-04-23 13:27:22 -07:00
edgchen1	49a1c5e546	Change CentOS build to use agent pool because builds on hosted agents run out of disk space. (#3662 )	2020-04-23 12:19:19 -07:00
Changming Sun	00917917d6	Downgrade numpy requirement to 1.16.6 (#3635 )	2020-04-22 16:11:33 -07:00
Edward Chen	8d09cefafc	Merge remote-tracking branch 'origin/ort_training' into edgchen1/merge_from_master	2020-04-22 16:56:15 +00:00
edgchen1	5492d02c4e	Remove Windows CUDA 9 build definition and helper scripts. (#3615 )	2020-04-21 15:22:27 -07:00
Edward Chen	47f1758fdc	Add --skip_onnx_tests to orttraining Windows builds.	2020-04-21 21:50:35 +00:00
Edward Chen	297ab43b0c	Add --enable_onnx_tests to Windows builds to allow set up of test data directory.	2020-04-21 20:34:55 +00:00
Edward Chen	daa14b64e3	Merge remote-tracking branch 'origin/master' into edgchen1/merge_from_master	2020-04-21 03:31:32 +00:00
Changming Sun	911d125323	Remove openmp from gpu build	2020-04-20 17:13:54 -07:00
liqunfu	781e1c36be	Add front-end MNIST test (#3231 ) * add frontend MNIST test * to use torch nightly with torchvision * remove incorrect comment per reviewer's comment * experiment torchvision import failure * experiment install_deps.sh * more experiment install_deps.sh * experiment install_deps.sh with --upgrade * Experiment with install_deps.sh. * Experiment with install_ubuntu.sh. * Use Ubuntu 18.04 and Python 3.6 for CI. * Update cmake version for CI. * Install MPI on Ubuntu 18.04 for CI. * Increase tolerance for MNIST test. * Go back to Ubuntu 16.04 for CI, fix installing from deadsnakes ppa. * Clean-up. * Update ort_trainer.py from ort_training. * Get default Ubuntu Python ver back to 3.5. * Add underscore to opset_version parameter name in ORTTrainer constructor. * Move loss/model wrap before the call for sample output. * Update expected values for MNIST test. Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Sergii Dymchenko <sedymche@microsoft.com>	2020-04-20 11:19:31 -07:00
Changming Sun	d68245853e	Disable downloading test data on Linux (#3581 )	2020-04-18 15:54:58 -07:00
Hector Li	5acd8dbe7d	remove option --enable_lto (#3515 )	2020-04-17 14:18:56 -07:00
Sheil Kumar	2717c178cc	Fork the WinML APIs into the Microsoft namespace (#3503 ) * Migrate winml to Microsoft Namespace (packaging changes are pending) * add ns_prefix toggle * fix packaging * Users/sheilk/add missing raw header (#3484) * add dualapipartition * wrong variable for repo root Co-authored-by: Sheil Kumar <sheilk@microsoft.com> * remove existence check to force failures * extra paren * dualapipartition needs to be referenced from the source * add microsoft.ai.machinelearning.dll to the output dir * rename the idl file so that assembly info is correctly added into the winmd * fix namespaces * update namespaces * default to microsoft, and add namespace override as build argument * update cmakesetings.json as well * remove from cmakelists.txt Co-authored-by: Sheil Kumar <sheilk@microsoft.com> Co-authored-by: Changming Sun <chasun@microsoft.com>	2020-04-17 06:18:54 -07:00
Changming Sun	1a222b3f6e	Disable downloading test data on Windows (#3551 ) * Disable downloading test data on Windows	2020-04-16 22:15:20 -07:00
harshitha	80e0c64e2e	merged with master	2020-04-16 17:13:36 +00:00
Changming Sun	7c89f38a34	Fix static analysis warnings found by VC++ (#3530 ) 1. Fix static analysis warnings found by VC++ 2. Add a new pipeline for static analysis 3. Merge all the windows CI build into one single yaml file.(Easier to queue them all). 4. Make DNNL build faster by disabling building the tests and examples. 5. Enable custom op unitest.	2020-04-16 01:46:47 -07:00
David Brownell	72cd61baae	Removed use of parameters in python wheel build scripts (#3524 )	2020-04-15 10:31:14 -07:00
Changming Sun	a2feb29b0d	Fix build break (#3528 ) Ignore some known test failures Install ONNX package before running Windows CI builds	2020-04-14 18:07:56 -07:00
David Brownell	006c5be1b1	Optionally produce a python wheel that includes featurizers (#3491 )	2020-04-14 09:00:13 -07:00
M. Zeeshan Siddiqui	5d99f179b9	Merge pull request #3486 from microsoft/sedymche/merge_master_ort_training Merge from master into ort_training	2020-04-13 10:55:36 -07:00
Sergii Dymchenko	bf3df41424	Put back SubmoduleCheckoutMode parameter into mac-ci.yml.	2020-04-12 21:49:38 -07:00
George Wu	7f6e407e09	fix python packaging manylinux1 build break. (#3482 )	2020-04-11 06:58:22 +08:00
edgchen1	cffdff6702	Publish unit test results from Linux and Mac builds (#3480 ) * Added publish test results step to Linux and Mac builds. * Fix test result file pattern.	2020-04-10 14:51:56 -07:00
liqunfu	e7297e6c9d	create pipeline for ci frontend tests (#3422 ) create pipeline for nightly python front-end e2e tests	2020-04-09 15:31:22 -07:00
Sergii Dymchenko	6ba7c99e50	Merge branch 'master' into ort_training	2020-04-09 12:42:04 -07:00
Changming Sun	33006f48c0	Update onnx submodule to 1.7.0 release candidate (#3405 ) Update onnx submodule to 1.7.0 release candidate. This isn't a release tag, but it will be released soon, in 1-2 weeks.	2020-04-04 16:23:42 -07:00
Changming Sun	a5fea26cb4	Disable model tests for Mac OS X builds	2020-04-02 15:14:32 -07:00
Thiago Crepaldi	759818f2c1	Merge remote-tracking branch 'origin/master' into thiagofc/ort_training_merge_from_master	2020-03-31 10:53:22 -07:00
stevenlix	2332a93db0	Update onnx-tensorrt parser (#3369 ) * sync onnx-tensorrt parser and update TensorRT doc * remove --msvc_toolset 14.16 in tensorrt ci pipeline	2020-03-30 20:31:59 -07:00
Xueyun Zhu	ccc3535e72	resolve conflict	2020-03-20 20:20:35 +00:00
Tiago Koji Castro Shibata	3bdb0b620a	Fix WCOS/Win32 linking bugs (#3126 ) * Fix WCOS/Win32 linking bugs * Remove unused NODEFAULTLIB flags * Avoid plain target_link_libraries signature * Avoid plain target_link_libraries signature * Fix library list escaping * Use library list instead of string * Remove duplicate link to windowsapp.lib * Remove Win32 build workarounds * Specify CMake policies before initializing language * Expose Win32 header definitions during build * Force set API family * Enable Win32 APIs in featurizer * Use MT dynamic CRT * Expose Win32 specific functions * Disable app container globally * Disable default wide functions in featurizers * Add featurizers to test include path * Workaround https://gitlab.kitware.com/cmake/cmake/issues/19428 * Revert pipeline debugging hacks * Skip /FI in CUDA sources * Default to Win32 builds * Enable WCOS when using WinML * Use generator expression to apply CMAKE_MSVC_RUNTIME_LIBRARY to C++ only	2020-03-19 08:52:40 -07:00
Changming Sun	0fceb33288	Fix onnxruntime server docker file build failure (#3219 ) 1. Fix onnxruntime server docker file build failure. Tested with the notebook in ONNX tutorial, it works well. 2. Delete the docker files for the other EPs, because currently they don't work and I don't have enough time to update them.	2020-03-15 14:46:46 -07:00
Tracy Sharpe	fe0b2b2abd	QLinearConv speed up (#3196 ) For x86/x64 builds, change the QLinearConv op to use MLAS for the u8u8=s32 GEMM, then requantize the intermediate buffer to u8.	2020-03-13 16:54:55 -07:00
Zeeshan Siddiqui	2cad08bd60	Merged PR 5688: Upgrade ONNX submodule to the latest from github ONNX master. We want to implement SoftmaxCrossentropy and NegativeLossLikelihoodLoss forward training ops for opset-12 but that requires ONNX submodule to point to the latest commit to have the latest and greatest ONNX spec! - Reverse integrate changes from *.in.proto files in github ONNX repo. - Regenerate csharp/test/Microsoft.ML.OnnxRuntime.Tests/OnnxMl.cs - Disable ONNX tests that don't have op implementation for the latest opset.	2020-03-12 16:51:45 -07:00
edgchen1	fa4dd51e3b	Add back orttraining-linux-gpu-inference-only-ci-pipeline.yml. (#3182 )	2020-03-11 18:03:58 -07:00
Edward Chen	80dd62a240	Enable CI for training.	2020-03-11 14:41:32 -07:00
Edward Chen	e542cfd0e0	Introduce training changes.	2020-03-11 14:39:03 -07:00
Hariharan Seshadri	3464801c3e	Explicitly specify NugetPackage parameter while validating nuget in some release pipelines (#3139 )	2020-03-10 15:14:09 -07:00
Dmitri Smirnov	f87b6913cd	Add package download step before pushing to feeds (#3162 ) Add package download step before publishing.	2020-03-09 14:32:18 -07:00
Changming Sun	6ed5d7c332	Update post_binary_sizes_to_dashboard.py (#3161 ) Discussed with Faith, because the data size is very small and changes are gradual, there is no need to delete the old data. We want to keep all the history.	2020-03-09 13:21:58 -07:00
Tiago Koji Castro Shibata	a59243090a	Publish release symbols (#3152 ) * Publish release symbols * Publish symbols if IsReleaseBuild	2020-03-05 22:32:18 -08:00
Dmitri Smirnov	e2894c5ffb	Fix package name overrides (#3150 ) Add env var with the package name.	2020-03-05 17:10:55 -08:00
Dmitri Smirnov	2c446a7f2f	Add push to ORT-NIGHTLY. (#3146 )	2020-03-05 11:38:22 -08:00
Dmitri Smirnov	ef8768a53f	Override native package name. Preserve managed package name the same. (#3133 ) Override native package name. Preserve managed package name the same. Specify package name for validation purposes. Fix up validation package name parameter.	2020-03-04 10:12:55 -08:00
Changming Sun	12605f05d1	Fix CUDA PATH (#3131 ) Previously, we put the "bin" folder of all the CUDA versions in the system PATH. And 10.2 is in the front. It's a mess. So I've removed all of them from the system PATH env. But I need to add one of them back through build scripts. (The problem only affects the C# test, not the C/C++ tests that forked from build.py).	2020-03-03 14:34:19 -08:00
smk2007	6cdd2b4934	Enable DML Nuget Package for x64 or x86 architectures (#3120 ) * add dml gpu pipelines * add x86 to the gpu dml dev build pipeline * Enable DML x86 builds * Fix uint64_t -> size_t warning * fix warnings * enable dml on x86 ci builds * operatorHelper 773 error uint32_t vs uint64_t * operatorHelper 773 error uint32_t vs uint64_t * make x86 pipeline use the gpu pool * more warnings * fix x86 directml path * make dml nuget package * disable tf_pnasnet_large * disable zfnet512 * make validation use wildcards * disable x86 dml gpu tests * add args. * update gpu.yml * change nupkg wildcard * add debug statements * package x86 dml nupkg * dont drop managed nuget again from dml pipeline build * Add DML EULA * directml license should be renamed to not clobber the existing license * casing on dml package.... * {} to () * fix license name * disable dml from x86 ci * typo and cr feedback * remove featurizers * ship the dml pdb as well	2020-03-02 20:18:46 -08:00
Dmitri Smirnov	e45326b5df	Create NuGet packaging pipeline for ORT Featurizers (#3125 ) Create a new pipeline to publish ORT with Featurizers Update pipeline for two separate packages. Change package names.	2020-03-02 17:00:56 -08:00
Hariharan Seshadri	86b755774f	Create a separate Nuget hosting just managed assemblies (#3020 ) * Initial commit * More changes * More changes * More changes 3 * More changes 4 * More changes 5 * More changes 5 * More changes 6 * More changes 7 * More changes 8 * Remove C# ifdefs * More changes 10 * More changes 11 * YAML changes for other release pipelines * Add release notes metadata * Props and Targets change * Add CSHarp proj * More changes 12 * More changes * Minor fix * Minor fix * Fix yaml * Some missing logic for winml * Minor update * Fix casing for winmd file * Fix casing * Add targets and props for managed section into native nuget * revert file * a	2020-02-27 18:00:17 -08:00
daquexian	37a905f557	Make Java API available on Android (#3030 )	2020-02-27 08:23:50 -08:00
Changming Sun	d7500b26bd	Remove Publish Build Symbols from pre-checkin CI build (#3088 )	2020-02-25 08:02:36 -08:00
stevenlix	f4a5d17294	Upgrade to CUDA10.2 for TensorRT (#3084 ) * Switch to CUDA10.2 * Update win-gpu-tensorrt-ci-pipeline.yml * Update win-gpu-tensorrt-ci-pipeline.yml * remove dynamic_shape * update onnx-tensorrt submodule * check if input shape is specified for TensorRT subgraph input and enable some TensorRT unit tests * fix format issue * add shape inference instruction for TensorRT * update according to the reviews * Update win-gpu-tensorrt-ci-pipeline.yml	2020-02-25 05:36:01 -08:00
Hariharan Seshadri	d7f2cdcc7e	Fix target platform of managed OnnxRuntime dll and enable x86 .NET testing (#3056 ) * WIP: Re-enable x86 .NET testing in Release pipelines Enabling x86 testing will make sure that ORT packages doesn’t break x86 projects of customers * Remove setting some env variables * Comment out a test failing on x86 builds * More changes * Minor fix * More changes * More changes * s * s * s * Revert minor change * More changes * More changes * More changes 2 * explicitly set platform target * Delete bin and obj folders * Clean output dirs * Add back TargetFramwork * Disable x86 .net framework tests * Skip x86 tests in MKLML pipeline	2020-02-24 23:02:59 -08:00
Dmitri Smirnov	dae9a31719	Introduce new Featurizers packaging pipeline. (#3068 ) Introduce new Featurizers packaging pipeline.	2020-02-24 13:57:38 -08:00
Changming Sun	61ae134469	Fix binary size report (#3080 )	2020-02-22 21:01:06 -08:00
Changming Sun	fb871978b5	Adjust build flags for the release pipelines (#3066 ) 1. Add LTCG back. It was set to default OFF in my previous PR to speed up Windows build. It is only needed in release pipelines. 2. Remove --use_featurizers from all the packaging pipelines 3. Make sure all the packages have openmp	2020-02-21 16:45:42 -08:00
Changming Sun	179603775f	Use CUDA 10.1 for Linux build (#3057 ) Use CUDA 10.1 for Linux build (Windows change is already in) Please note, cublas 10.2.1.243 is for CUDA SDK 10.1.243, not CUDA 10.2.x. CUDA 10.2.89 needs cublas 10.2.2.89. They match on the last part of the digits. libcublas10-10.1.0.105 won't work!!! The cuda docker image by viswamy is already using 10.1, no need to change.	2020-02-21 11:55:32 -08:00

1150 commits