onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-06 00:03:22 +00:00

Author	SHA1	Message	Date
Edward Chen	d761571afc	Deprecate Python global configuration functions [Part 2] (#6171) Update the Python API to allow more flexibility when setting providers and provider options. The providers argument (InferenceSession/TrainingSession constructors, InferenceSession.set_providers()) now also accepts a tuple of (name, options dict). Fix the get_available_providers() API (and the corresponding function in the C API) to return the providers in default priority order, so it can be used as a starting point for the providers argument while maintaining the default priority order. Convert some usages of the deprecated global configuration functions to use EP-specific options instead. Update some EP-specific option parsing to fail on unknown options. Other cleanup.	2021-01-07 10:10:55 -08:00
Tang, Cheng	431604ef89	Add bfloat16 to GatherGrad type constraints (#6267) Co-authored-by: Cheng Tang <chenta@microsoft.com>	2021-01-06 15:04:14 -08:00
pengwa	eea3806db1	Model parallel refinement (#6244) * Megatron transformation as a separate step * Remove unused header * clang-format * Restructure Megatron transformer for subsequent changes * Fix comments	2021-01-06 10:30:22 +08:00
ashbhandare	493bf931c5	Add the Concat Slice Elimination transform, fix constant_folding transform (#5457) * Add concat slice transform + test * Cosmetic improvements in concat slice transform * Remove unrelated file, fix comment, fix constant folding bug * Add test ONNX graph * Fix Windows build * Review comments * Review comment Co-authored-by: Aishwarya <aibhanda@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2021-01-04 16:18:33 -08:00
baijumeswani	93bf7c4d52	Documentation for distributed CI tests pipeline (#6140)	2021-01-04 10:09:39 -08:00
Suffian Khan	46e0e4e69f	Tune BiasGeluGradDx kernel in approximation mode to avoid tanh(...) on ROCm (#6239) * BiasGelu grad uses exp(...) instead * Update CUDA to ROCm * Add missing semicolon * Comment * Remove Dockerfile * Add missing factor of two	2021-01-02 08:54:16 -08:00
Jesse Benson	7ccdfed1a6	Remove most ROCm-specific element-wise code and reuse CUDA element-wise code.	2020-12-27 10:30:29 -08:00
Jesse Benson	52228a703c	Use TArray in AMD element-wise kernels, rather than manually copying memory to device.	2020-12-27 10:30:29 -08:00
Jesse Benson	c562952750	Dockerfile to build onnxruntime with ROCm 4.0	2020-12-22 10:21:12 -08:00
baijumeswani	a8b482681a	Clean up checkpoint tests to use the new checkpoint functions (#6188) * Add deprecation warning for old checkpoint functions * Update all the distributed checkpoint tests to use new checkpoint functions	2020-12-22 09:15:57 -08:00
Weixing Zhang	53307a5f2e	Improve perf for Softmax (#6128) * Improve perf for both GatherGrad and Softmax * Revert the GatherGrad change; it will be done in another PR * Address comments from code review	2020-12-21 14:15:54 -08:00
jingyanwangms	f874260b9e	Backend APIs for checkpointing (#5803) * Add backend API GetOptimizerState and GetModelState * Add GetPartitionInfoMap	2020-12-21 08:21:29 -08:00
Derek Murray	11b0a5401e	Fix typo in BERT pretraining script (#6175) A misplaced `}` meant that the `'enable_adasum'` option was interpreted incorrectly, causing the test to fail.	2020-12-18 16:38:14 -08:00
baijumeswani	39aedbc97f	Aggregate model states only when mixed precision is enabled (#6176)	2020-12-18 14:09:32 -08:00
Sergii Dymchenko	824ef9a1de	Don't try to bind unused inputs in the Training frontend (#6166)	2020-12-17 21:41:28 -08:00
baijumeswani	adc2071043	save_checkpoint, load_checkpoint and aggregate_checkpoints (#6136) * save_checkpoint and load_checkpoint implementations * Checkpoint aggregation logic * Unit tests for save_checkpoint, load_checkpoint and aggregate_checkpoints	2020-12-17 21:01:36 -08:00
Tixxx	32c67c2944	Deprecate Horovod and refactor Adasum computations (#5468) * Deprecate Horovod submodule * Refactor Adasum logic to be ORT-native * Add tests for native kernel and e2e tests	2020-12-17 16:21:33 -08:00
Juliana Franco	36c03b32e9	Use a map of ops to stages as input of the partition function (#5940) * New partition algorithm running before AD * Convert cut_group_info into device map. Work in progress; works for bert-tiny with pp=2 * Remove code for partition of bwd graphs * Remove old code * Add some verification code * Handle shared initializer * Rename rank to stage * Add first unit test * New test * Redundant check * Undo change in BERT * Move cut-based partition to testing utils file Co-authored-by: xzhu1900 Co-authored-by: wschin * New conversion function and tests * Minor * Remove test that is not needed * Improve GetDeviceAssignment and PR comments * Minor changes * PR comments * Improve documentation and variable naming * Add documentation * Variable naming and docs * More doc improvements * More doc improvements * Missing static cast * Fix test file for Windows * Fix test file for Windows * Fix test file for Windows * Stage ID is not the same as rank ID * PR comments * PR comments * More comments * More comments	2020-12-17 09:03:33 -08:00
ashbhandare	82690486c1	Partition initial optimizer state for Zero-1 (#6093) * Initial changes * Working changes * Working changes * Cleanup * Fix Windows CI * Review comments * Review comments	2020-12-16 15:27:42 -05:00
Derek Murray	8fd085801a	Add gradient registration for Abs (#6139)	2020-12-16 08:32:10 -08:00
George Nash	939cc9b410	Enable running the mnist_training sample without CUDA (#6085) Signed-off-by: George Nash <george.nash@intel.com>	2020-12-15 17:06:54 -08:00
Edward Chen	64709b1335	Deprecate Python global configuration functions [Part 1] (#5923) Enable options to be set via execution provider (EP)-specific options and log a deprecation warning from the current global configuration functions.	2020-12-15 11:32:43 -08:00
Edward Chen	9810b9e02b	Reduce amount of compiled CUDA device code (#6118) Move CudaKernel from cuda_common.h to a new, separate header, cuda_kernel.h. Update include sites that need CudaKernel to use cuda_kernel.h instead. Inclusions of cuda_common.h are now more lightweight. Make corresponding changes for ROCm execution provider code. Other minor cleanup.	2020-12-14 15:27:40 -08:00
liqunfu	cde723a136	Liqun/move nightly pipeline to Linux multi-GPU V100 (#6024) * Move e2e nightly pipeline to Azure DevOps Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-12-14 12:43:41 -08:00
baijumeswani	dd2e5a1a05	state_dict and load_state_dict for ORTTrainer (#6095) * Add functions state_dict and load_state_dict to ORTTrainer * Unit tests for state_dict and load_state_dict for ORTTrainer	2020-12-14 11:55:52 -08:00
Suffian Khan	6cb5d3ac09	Fix multi-tensor LAMB reduction to be deterministic (#6028) * Define ordering of reduction across blocks * Save state * Remove debug code * Remove debug code * Review comments * Significant correction for reduction only over blocks on same tensor * Address comments * Update rocm/lamb.cc to build as well * Remove the times-2048 size in multi-tensor test until threshold error in ROCm is resolved; convert tuple => struct as per recommendation * Update comment * Apply perfect forwarding for launch_multitensor to permit passing a reference rather than a pointer * Remove excess template arguments from rocm/lamb.cc launch_multitensor as well * Fixes for AMD build * PR comments * Run formatter from VS Code * Formatter on CUDA files	2020-12-11 13:13:05 -08:00
Sherlock	a53f4dd379	Introduce VariadicAlias, remove hardcoded alias limits (#6106) * Introduce VariadicAlias, remove hardcoded alias limits * Include optional-lite in WinML build Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-12-11 10:47:08 -08:00
Jesse Benson	38c49c2483	Make ROCm and CUDA reduction_all code more similar.	2020-12-11 09:35:07 -08:00
Vincent Wang	7ddeafdfcc	Add ReduceL2Grad and ClipGrad (#5970) * ReduceL2Grad and ClipGrad * Fix Windows build and AMD CI pipeline * Resolve comments Co-authored-by: Vincent Wang <weicwang@AiFramework2080ti2.corp.microsoft.com>	2020-12-10 11:03:26 +08:00
Sergii Dymchenko	9e26e59a37	Deprecate opsets < 12 for training (#6027)	2020-12-09 00:15:27 -08:00
Weixing Zhang	d95fc5e849	Clean up unused code (#6059) Co-authored-by: Weixing Zhang <wezhan@microsoft.com>	2020-12-08 23:15:30 -08:00
Weixing Zhang	2705115732	Add Dockerfile for ROCm 3.10 and update BUILD.md for ROCm EP (#5821) * Add HSA_NO_SCRATCH_RECLAIM=1 to Dockerfile. This works around an issue in the AMD compiler, which generates poor GPU ISA when a kernel parameter is a structure passed by value * Update BUILD.md * Add Dockerfile for ROCm 3.10	2020-12-08 23:14:56 -08:00
ashbhandare	b1a75d0e98	Enable passing initial optimizer state while creating training session (#5869) * Support passing initial optimizer states to optimizer graph builder * Changes for passing initial optimizer state to training session config * Pass optimizer state through C++ and Python frontends * Cleanup * Review comments * Fix Windows and Mac CI * Review comments * Review comments * Review comments * Frontend review changes * Fix CI	2020-12-08 21:20:51 -05:00
Sherlock	7a43fa0028	Fix AllReduce kernel for contiguous buffer (#6064)	2020-12-08 15:55:13 -08:00
baijumeswani	523d187193	Save data to and load data from an HDF5 file for checkpointing (#5975) * Save Python dictionary to HDF5 representation and load an HDF5 file into a Python dictionary * Unit tests for saving data to and loading data from HDF5 file	2020-12-08 11:40:57 -08:00
ashbhandare	7cebf76a46	Improve checkpointing for Zero stage 1 (#5478) * Initial running changes * Checkpointing aggregation changes * Compare with older version * Initial cleanup * Add Zero test, minor fix * Fix Zero test, transform, formatting * Review comments * Add more unit tests * Review comments * Try to fix CI * Add additional check on just aggregation code * Try to fix ckpt gen * Add pregenerated ckpt for CI, enable Zero test in e2e * Move test to nightly, remove ckpt files * Add tests to dist GPU CI * Fix dist test * Review comments * Fix test	2020-12-07 09:16:01 -08:00
Jesse Benson	14f6eb14b1	Use __launch_bounds__ workaround, rather than limiting threads to 256 on AMD.	2020-12-03 13:06:34 -08:00
Jesse Benson	245d43615d	Fix AMD multi-tensor implementation.	2020-12-03 13:06:34 -08:00
Sherlock	c86a1e5c13	Fix flaky orttraining tests (#5977) * Fix flaky orttraining tests	2020-12-03 10:24:25 -08:00
Alberto Magni	fb310fba0c	Avoid adding non-existent inputs to new Event nodes (#5915) During graph resolve, non-existent nodes cause shape-inference failures.	2020-12-01 08:21:05 -08:00
Jesse Benson	45966d878a	Code review feedback	2020-11-30 09:24:22 -08:00
Jesse Benson	86e30a2db6	Update CUDA IsAllFinite kernel	2020-11-30 09:24:22 -08:00
Jesse Benson	bd96f60888	Use CUDA's IsAllFinite kernel for ROCm	2020-11-30 09:24:22 -08:00
baijumeswani	69b9368c93	Add unit tests to identify configuration migration scenarios for checkpointing (#5678)	2020-11-25 09:40:26 -08:00
baijumeswani	208f4c1d3c	Azure CI pipeline for distributed environment tests (#5881)	2020-11-23 14:01:00 -08:00
Vincent Wang	47185b9513	ReduceAllL2 CPU kernel (#5833) Co-authored-by: Vincent Wang <weicwang@AiFramework2080ti2.corp.microsoft.com>	2020-11-19 10:20:05 +08:00
Tracy Sharpe	f964bb94ba	Add QLinearConv NHWC transformer (#5824) The implementation of QLinearConv internally does a transpose(NHWC)->im2col+GEMM->transpose(NCHW). This adds a graph transformer that rewrites a model to use a com.microsoft.QLinearConv that supports NHWC natively, avoiding unnecessary transposes.	2020-11-17 20:51:02 -08:00
Edward Chen	71e7c2b423	Cache build Docker images in container registry (#5811) This PR adds infrastructure to automatically cache Docker images used in CI builds in a container registry. Currently, build images are pulled from a container registry for some builds and built every time for others. The container registry requires maintenance to keep the images up to date, and building images every time wastes build agent resources. With this change, a given build image is looked up in a cache container registry; if present, it is pulled, otherwise it is built and pushed. The uniqueness of a build image is determined by a hash digest of the Dockerfile, the docker build context directory, and certain "docker build" options. This digest is part of the image tag in the cache container repository. The cache container registry will need to be cleaned up periodically; this is not automated yet.	2020-11-17 17:02:24 -08:00
zhijxu	89e5b3a24f	Resolve review comments	2020-11-16 11:23:01 +08:00
zhijxu	89902c2519	Fix frontend bug: an old ORT session may still exist when a new ORT session is created, which can cause an OOM error	2020-11-16 11:23:01 +08:00
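
Commit d761571afc says the `providers` argument now also accepts a `(name, options dict)` tuple, and that `get_available_providers()` returns provider names in default priority order. The sketch below illustrates, under stated assumptions, how such a mixed list could be split into parallel name and options lists. `normalize_providers` is a hypothetical helper written for illustration, not onnxruntime's internal code, and its validation rules (unknown or duplicate names raise `ValueError`) are assumptions of this sketch.

```python
from typing import Any, Dict, List, Sequence, Tuple, Union

# One entry of a providers list: a bare provider name, or a
# (name, options dict) tuple as described in commit d761571afc.
ProviderEntry = Union[str, Tuple[str, Dict[str, Any]]]


def normalize_providers(
    providers: Sequence[ProviderEntry],
    available: Sequence[str],
) -> Tuple[List[str], List[Dict[str, Any]]]:
    """Hypothetical helper: split providers into parallel name/options lists.

    Illustration only; this is not the onnxruntime implementation.
    """
    names: List[str] = []
    options: List[Dict[str, Any]] = []
    for entry in providers:
        if isinstance(entry, str):
            name, opts = entry, {}
        elif (
            isinstance(entry, tuple)
            and len(entry) == 2
            and isinstance(entry[0], str)
            and isinstance(entry[1], dict)
        ):
            name, opts = entry
        else:
            raise ValueError(f"invalid provider entry: {entry!r}")
        # Assumed validation rules for this sketch.
        if name not in available:
            raise ValueError(f"unknown provider: {name}")
        if name in names:
            raise ValueError(f"duplicate provider: {name}")
        names.append(name)
        options.append(dict(opts))  # copy so caller mutations do not leak in
    return names, options


# Start from the available providers in priority order (an assumed example of
# what get_available_providers() could return) and attach options to one.
available = ["CUDAExecutionProvider", "CPUExecutionProvider"]
providers = [
    ("CUDAExecutionProvider", {"device_id": 0}) if p == "CUDAExecutionProvider" else p
    for p in available
]
names, options = normalize_providers(providers, available)
```

With onnxruntime installed, the same list shape would be passed straight to the session constructor, e.g. `onnxruntime.InferenceSession("model.onnx", providers=providers)`, where `model.onnx` is a placeholder path. Deriving the list from the available providers keeps the default priority order intact while still allowing per-provider options.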

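Commit 71e7c2b423 keys cached CI images on a hash digest of the Dockerfile, the build context directory, and certain `docker build` options, then pulls the image on a cache hit and builds and pushes it on a miss. The following is a minimal sketch of that scheme, assuming SHA-256 and a sorted walk of the context directory. `image_cache_tag` and `ensure_image` are hypothetical names, and the registry operations are passed in as callables rather than tied to any real Docker client.

```python
import hashlib
import os
from typing import Callable, Sequence


def _feed(digest, data: bytes) -> None:
    # Length-prefix each chunk so different inputs cannot collide by concatenation.
    digest.update(len(data).to_bytes(8, "big"))
    digest.update(data)


def image_cache_tag(dockerfile: str, context_dir: str, build_options: Sequence[str]) -> str:
    """Hypothetical: hash the inputs that determine a build image."""
    digest = hashlib.sha256()
    with open(dockerfile, "rb") as f:
        _feed(digest, f.read())
    # Walk the context in sorted order so the digest is reproducible.
    for root, dirs, files in os.walk(context_dir):
        dirs.sort()  # in-place sort controls os.walk's descent order
        for name in sorted(files):
            path = os.path.join(root, name)
            rel = os.path.relpath(path, context_dir).replace(os.sep, "/")
            _feed(digest, rel.encode("utf-8"))
            with open(path, "rb") as f:
                _feed(digest, f.read())
    for option in build_options:
        _feed(digest, option.encode("utf-8"))
    return digest.hexdigest()


def ensure_image(
    tag: str,
    exists: Callable[[str], bool],
    pull: Callable[[str], None],
    build_and_push: Callable[[str], None],
) -> str:
    """Hypothetical: pull on a cache hit, otherwise build and push."""
    if exists(tag):
        pull(tag)
        return "pulled"
    build_and_push(tag)
    return "built"
```

Hashing file paths as well as contents means a renamed file also changes the tag. As the commit notes, tags produced this way accumulate in the cache registry, so old entries still need periodic cleanup.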

392 commits