onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-05-29 23:06:41 +00:00

Author	SHA1	Message	Date
baijumeswani	5da4e07daa	Make FusedAdam mathematically equivalent to Transformers AdamW (#9343 )	2021-10-18 16:03:18 -07:00
pengwa	f05c285a58	Exception when duplicated autograd.Function name detected (#9351 ) * Exception when duplicated autograd.Function name detected * reorder a bit for a bittle bit better perf * fix a bug in previous PR :( * correct the error message a bit	2021-10-15 12:23:13 +08:00
Jeff Daily	c8789d3047	[ROCm] static re-hipify of CUDA EP to ROCm EP, now a shared provider (#8877 ) * re-hipify all rocm EP sources * fix all other files affected by re-hipify * add cuda_provider_factory.h to amd_hipify.py * do not use cudnn_conv_algo_search in ROCm EP, missing reduce min registration * Fix ReduceConsts template specialization introduced in #9101. Fixes the error when building for ROCm 4.3.1: error: too many template headers for onnxruntime::rocm::ReduceConsts<__half>::One (should be 0) * fix flake8 error in amd_hipify.py * speed up hipify with concurrent.futures * flake8 fix in amd_hipify.py	2021-10-14 15:15:51 -07:00
Abhishek Jindal	23700a15a0	Abjindal/eager windows build (#9326 ) * removing warnings which are causing errors from torch and changing flags for Windows * adding MKL library resolution and comments * cleaning up the code * fixing onnxruntime_python file for windows build * fix the include order to aovid the python_d.lib issue on win debug build * changes for warnings, typos and other comments * merge conflict * adding fix for mkl library error * Revert "adding fix for mkl library error" This reverts commit `73b87c73c2`. * fix for dll path for windows * typo for dll path Co-authored-by: Cheng Tang <chenta@microsoft.com>	2021-10-14 12:54:49 -07:00
Xavier Dupré	22e3f8bf54	Refactor TrainingManager.forward (#9354 ) * Refactor TrainingManager.forward	2021-10-14 12:54:31 +02:00
pengwa	5ee47e3ffa	legacy_megatron-lm/deepspeed_ZERO1&2 FP16_Optimizer wrapper (#9184 ) * megatron-lm FP16_Optimizer Wrap, allow model parallelism aggregation optional * add deepspeed zero1 and zero2 - checkoverflow & clip norm * re-structure code and add the copyright * update the document * refine the code after validation	2021-10-14 09:01:23 +08:00
Chandru Ramakrishnan	ba0cca96f0	Hooked up eager logging to ORT default logger. (#9340 ) * Hooked up eager logging to ORT default logger.	2021-10-13 18:10:32 -04:00
Tang, Cheng	f0bc35c4ba	fix a hardcode type (#9337 )	2021-10-12 13:44:46 -07:00
Tang, Cheng	48737091c0	resolve the provider options before create training session in orttrainer (#9199 ) * resolve the provider options before create training session in orttrainer * Update orttraining/orttraining/python/orttraining_pybind_common.h Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * support clear the training ep instance pool * fix status error Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>	2021-10-12 09:30:45 -07:00
ashbhandare	52c021d1f3	Fix export of aten op for Max and Avg Pool 2D (#9330 )	2021-10-12 09:03:14 -07:00
Edward Chen	79e736ed25	Make onnxruntime::Status nodiscard (#9279 ) Mark onnxruntime::Status class with [[nodiscard]] attribute. Fix existing warnings.	2021-10-08 17:10:31 -07:00
satyajandhyala	29379db432	Added SigmoidGrad schema and kernels. (#9244 ) * Added SigmoidGrad schema and kernels. * Added test_sigmoid_grad function.	2021-10-08 11:03:28 -07:00
Tang, Cheng	68601fc296	error handling ffor eager mode's data transfer (#9261 )	2021-10-07 17:16:33 -07:00
ytaous	7166586d7e	Enable SkipCheck by default (#9215 ) * Enable SkipCheck by default * fix UTs * fix UT * fix UTs * fix UTs * address comments * fix UT * enable skipchecks * move _SkipCheck back * move _SkipCheck back * move _SkipCheck back * Update orttraining/orttraining/python/training/ortmodule/_inference_manager.py * Update orttraining/orttraining/python/training/ortmodule/_utils.py Co-authored-by: Ethan Tao <ettao@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>	2021-10-07 15:47:14 -07:00
Tang, Cheng	c002dc86a3	set mpi group init flag after add group (#9293 )	2021-10-07 10:09:16 -07:00
Thiago Crepaldi	52d067402a	Fix all-or-nothing fallback for bad ORTModule init (#9277 ) * Fix all-or-nothing fallback for bad ORTModule init * Address comments	2021-10-06 15:12:27 -04:00
baijumeswani	bcdb411c8d	Implement FusedAdam for ORT adapted from DeepSpeed (#9266 )	2021-10-05 20:50:34 -07:00
ashbhandare	35c2102cfa	Fixes for GatherND, Multinomial (#9143 ) * register gathernd kernel, aten multinomial * fix CI, add test * review comments	2021-10-05 14:51:58 -07:00
G. Ramalingam	0b77c9ca7c	Cleanup function definitions of contrib ops (#9265 ) * Simplify function definitions * Simplify fast-gelu function definition * Simplify training function op body definitions Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> * Eliminate redundant function Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> * Formatting changes Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> * Minor formatting changes Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> * Add comment Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> * Specify int64 type for constant 1 Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>	2021-10-05 11:38:42 -07:00
Thiago Crepaldi	6e2f66ee9c	Allow custom exporter args + bug fix (#9242 )	2021-10-04 11:32:42 -04:00
baijumeswani	45399d5ace	Remove TORCH_WARN to avoid torch string related operations that take up time (#9238 )	2021-10-01 13:56:04 -04:00
Tang, Cheng	be4d887439	Fix ONNX exporter call with latest API for ORTrainer (#9228 ) * update the exporter call with latest api in orttrainer * use official export api instead of the private call	2021-10-01 13:49:55 -04:00
G. Ramalingam	e79be39081	LayerNormGrad function body and LayerNorm inference/body fix (#9160 ) * Add function body for LayerNormGrad * Fix LayerNorm schema for multiple normalization dims	2021-09-30 12:03:08 -07:00
Thiago Crepaldi	ceb51dda4a	Support external torch cpp extensions on ORTModule (#9223 )	2021-09-30 10:37:35 -04:00
satyajandhyala	278928a102	Added a test case for python gradient builder. (#9207 ) * Register Cos operator gradient using ORTModule's register_gradient and compare gradient against PyTorch.	2021-09-29 09:24:12 -07:00
Suffian Khan	6f580f07de	Switch AMD CI pipeline to use environment image from onnxruntimecibuildenvironment (#9206 ) * shift docker image reference for amd ci pipeline * fix service endpoint * reduce perf tolerance	2021-09-28 13:06:16 -07:00
ytaous	d3f859fe30	Dropout Vectorized Kernel (#9157 ) * vectorized kernel * fix build * re-calibrate expected loss * fix build * re-calibrate convergence results * more re-calibrate on loss * divide kernels * adress comments * more calibration * calibration * per comments * enable sync Co-authored-by: Ethan Tao <ettao@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2021-09-27 17:19:12 -07:00
Wei-Sheng Chin	1b0816859f	Only wrap sub-modules which can be wrapped as ORTModule (#9021 )	2021-09-27 17:18:22 -07:00
baijumeswani	c30cc9190a	Change the agent pool for orttraining-distributed pipeline (#9179 )	2021-09-26 21:26:44 -07:00
baijumeswani	fd91bf91c9	Print full stacktrace exception when exporter fails (#9169 )	2021-09-24 10:24:37 -04:00
Vincent Wang	39dc6ea8a3	Fix to_dlpack Failure on PyTorch-1.10 (#9151 ) * workaround to_dlpack fail in new pt version * add torch code link	2021-09-24 09:48:07 +08:00
Thiago Crepaldi	153767bab4	Add internal determinism flag configuration for ORTModule (#9074 )	2021-09-21 15:11:41 -04:00
Ryan Hill	b876e5675b	C API Enum Name Fixes (#9092 )	2021-09-17 15:11:26 -07:00
Ryan Hill	26509465f0	Add default C++ initialization to OrtCUDAProviderOptions (#9064 ) * Add default C++ initialization to OrtCUDAProviderOptions	2021-09-16 15:03:58 -07:00
Suffian Khan	e758870b18	Upgrade ROCm CI pipeline for ROCm 4.3.1 and permit run inside container (#9070 ) * try to run inside 4.3.1 container * no \ in container run command * remove networking options * try with adding video render groups * add job to build docker image * try without 1st stage * change alpha, beta to float * try adding service connection * retain huggingface directory * static video and render gid * use runtime expression for variables * install torch-ort * pin sacrebleu==1.5.1 * update curves for rocm 4.3.1 * try again * disable determinism and only check tail of loss curve and with a much larger threshold of 0.05 * disable RoBERTa due to high run variablity on ROCm 4.3.1 * put reduction unit tests back in	2021-09-15 12:32:02 -07:00
ashbhandare	98ac341c5b	Filter nones from ctx saved tensors (#9063 ) Co-authored-by: Aishwarya Bhandare <aibhanda@5cb7a9c3931a4b19a66ae028b49221a6000001.ahkw4qp232huflxlm4gmpq4nbh.jx.internal.cloudapp.net>	2021-09-15 10:13:45 -07:00
G. Ramalingam	7d28b596f4	Add function-body to opschema of FastGeluGrad (#9028 ) * Add function body to FastGeluGrad * Add test case	2021-09-14 12:27:55 -07:00
Sherlock	9174cbe3d5	Optimize CUDA Kernel for 3D and 4D Transpose (#8928 ) * Optimize Transpose120 and Transpose102 * Generalize Transpose0123 for more input shapes * Add Transpose3D test cases * update rocm kernel	2021-09-13 23:00:53 -07:00
baijumeswani	34f37d2920	Disable fallback for ortmodule api tests (#9018 )	2021-09-13 16:00:13 -07:00
mindest	a1021a1cf4	Add BatchNorm kernel for ROCm (#9014 ) * Add BatchNorm kernel for ROCm, update BN test * correct epsilon_ setting; limit min epsilon	2021-09-13 15:15:05 +08:00
Ryan Hill	c3321b1778	Fix NVTX profiling so it can run in the shared CUDA provider (#9035 ) * Move NVTX profiling so it can run in the shared provider properly	2021-09-11 00:35:54 -07:00
Tang, Cheng	8eb6546e8e	enable eager mode with ortmodule (#8961 ) * initial change for eager/ortmodule integration * pdate to latest pytorch api * add test model;fix torch version issue * fix comments in pr * fix python test break * fix api change * fix comments in PR * pass device into the fw function	2021-09-10 15:09:23 -07:00
satyajandhyala	ce7b12bf5d	Added new fp16 allow/safe opcodes in PropagateCastOps (#8964 ) * Removed RemoveInputOutputUpDownCasts strategy in PropagatCastOps. * Added Expand, Squeeze and Unsqueeze ops to fp16 allow ops * Added onnx models for squeeze/unsqueeze tests.	2021-09-10 11:53:26 -07:00
Bowen Bao	31af88c0bc	Update cross_entropy_loss symbolic for new argument from upstream torch (#9007 ) In torch 1.10, `label_smoothing` is added as additional input to `cross_entropy_loss`. Update the symbolic function to handle this change.	2021-09-10 10:32:59 -07:00
baijumeswani	d78e90d1af	Adding preprocessor checks for torch version during torch cpp extensions compilation (#8989 )	2021-09-09 10:26:38 -07:00
pengwa	d209fe29b9	custom autograd func memory refinement (#8993 ) * Release torch tensor referenced by torch gradient graph (created in PythonOp) * Update orttraining/orttraining/python/training/ortmodule/torch_cpp_extensions/torch_interop_utils/torch_interop_utils.cc * refine with comments Co-authored-by: Wei-Sheng Chin <wschin@outlook.com>	2021-09-09 18:37:24 +08:00
Ashwini Khade	ec63d10303	add model local function support (#8540 ) * updates for picking pnnx commit * add tests filter to c# tests * plus test fixes * fix versioning for contrib ops * fix tests * test filter for optional ops * more versioning related updates * fix test * fix layernorm spec * more updates * update docs * add more test filters * more filters * update binary size threshold * update docs * draft - enable model local function * enable model local functions in ORT * update to latest rel onnx commit * plus tests * plus more updates * plus updates * test updates * Fix for nested functions + shape inference * plus bug fix and updates per review * plus fixes per review * plus test updates * plus updates per review * plus fixes * fix a test	2021-09-08 11:47:01 -07:00
baijumeswani	0cc2909573	Auto forward non method attribute lookups to the user's model and bind custom methods to ORTModule (#8798 )	2021-09-03 08:25:44 -07:00
Vincent Wang	c343f7cb43	Add Algorithm Search for ConvGrad (#8613 ) * algo search for conv grad * global cache, bigger workspace size * fix build error * refactor * refactor * resolve comments * fix rocm * change lock places * rename variable * remove setting for inference * resolve comments	2021-09-03 11:25:17 +08:00
Gary Miguel	47435311f4	Include pytorch_export_contrib_ops in inference builds (#8878 ) * Include pytorch_export_contrib_ops in inference builds Rename / move it from tools/python/register_custom_ops_pytorch_exporter to onnxruntime/python/tools/pytorch_export_contrib_ops. Rationale for inclusion in inference builds: This code is potentially useful for anyone using ORT, not just training. Rationale for new name: "Contrib op" is the nomenclature used within ORT to refer to the set of ops that are not in the standard op set but are included by default with ORT. This is more specific than "custom op", which is what the PyTorch exporter uses to refer to any non-standard op. Step 1 of addressing #8818. After this is merged I will update the docs. * Enable test_pytorch_export_contrib_ops.py in CI Fixes AB#1342330	2021-09-02 14:26:58 -07:00

797 commits