onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-26 03:00:54 +00:00

Author	SHA1	Message	Date
Ryan Hill	80cae23393	Merge with master	2021-04-14 19:07:25 -07:00
Jesse Benson	be79575c6a	Use built-in reduce_sum() for simple reduction cases, specifically reduce all to a scalar.	2021-04-14 08:55:35 -07:00
ashbhandare	6ceee5d131	IsInf ReduceSum transform (#7188 ) * IsInf ReduceSum transform * Revert unnecessary changes, add isinf_only and isnan_only attr * add tests, review comments * Disable test for non-cuda * Move IsAllFinite from training to contrib op * review comments * Review comment, formatting * Enable test for ROCm EP	2021-04-13 16:05:21 -07:00
G. Ramalingam	f8a36dd6b3	Add DropoutGrad function body (#7310 ) * Add DropoutGrad function body * Add DropoutGrad function body * Fix documentation and add test cases * Fix template specialization * Check expansion for float16 and bfloat16	2021-04-13 14:31:53 -07:00
harshithapv	a5d3a52d1a	Add Tile grad (#7289 ) * tile grad * fixed bugs * added tile grad test * bug fix * Added tests. Addressed comments * added optimization recommended and addressed comments * fixed comment	2021-04-13 12:54:45 -07:00
Ryan Hill	20644043e5	Fix merge breaks	2021-04-12 17:08:11 -07:00
Ryan Hill	57591f5b27	Merge with master	2021-04-12 16:51:35 -07:00
Ryan Hill	a841d17d06	More build options tested, converted the training ops over.	2021-04-12 14:02:08 -07:00
Weixing Zhang	75c0192e4f	enable more unit tests for ROCM EP (#7307 )	2021-04-09 15:15:13 -07:00
baijumeswani	b221a4fd86	Better error message when ORTModule used with torch.DataParallel (#7287 ) * Better error message when ORTModule used with torch.DataParallel	2021-04-09 10:07:22 -07:00
Weixing Zhang	c22963c23d	Polish Lamb Kernel (#7299 )	2021-04-09 09:55:57 -07:00
Weixing Zhang	8ad5007f8f	Polish Adam kernel (#7294 ) * Polish Adam kernel	2021-04-09 01:11:09 -07:00
Thiago Crepaldi	7b4362c21a	Add support to dynamic positional/keyword input for ORTModule (#7189 )	2021-04-08 12:46:21 -07:00
ytaous	e14b291ce7	Enable symbolic shape inference in ORTModule (#7282 ) Co-authored-by: Ethan Tao <ettao@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2021-04-08 09:47:09 -07:00
baijumeswani	d272c8434d	Suppress tracer warnings from onnx export in ORTModule (#7221 ) * Suppress tracer warnings from onnx export in ORTModule	2021-04-08 03:41:38 -07:00
Sherlock	aa2c465143	Restrict ConvGrad to __CUDA_ARCH__>=700 (#7278 ) * Restrict ConvGrad to __CUDA_ARCH__>=700 Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2021-04-07 20:10:29 -07:00
Vincent Wang	beb299e17d	ConvGrad CUDA Kernel Bugfix (#7273 ) * bugfix * add ut	2021-04-08 08:22:18 +08:00
baijumeswani	844361bc67	Support eval mode and torch.no_grad context in ORTModule and restructure ortmodule.py (#7162 )	2021-04-07 09:29:54 -07:00
Sherlock	4bc17ca04e	CUDA ConvGrad Kernel (#7227 ) * ConvGrad CUDA impl * Set up the test case for Deberta Conv1D * Add fp16 test Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2021-04-06 22:09:06 -07:00
Derek Murray	25e261f196	Avoid passing zero bias to Gemm in gradients (#7244 ) * Avoid passing zero bias to Gemm in gradients The bias argument to Gemm is optional and defaults to zero. Therefore we do not need to generate zero initializers and pass them to that argument. * Remove unused declaration.	2021-04-06 16:49:34 -07:00
ashbhandare	2aa89989c4	Not-where fusion (#7182 ) * Not-where fusion * Change to rewrite rule * Add to inference transforms * Support multiple where consumers * review comments	2021-04-06 16:12:26 -07:00
raviskolli	5d759e182b	Allocate external Rocm allocator via PyBind (#7148 ) * Enabled rocm support for graph transformations * Support for external Hip allocator * Added const_cast to reinterpret_cast to fix compiler issue * Another crack at fixing the compile error * More compilation fixes * Added compilation flags to load_inline extension * Added ROCM, ROCM_PINNED constants * Changes to address PR comments * Changed gpu identifier from ROCM to CUDA * Added HIP compilation flag for torch inline functions * Fixed a typo in header allocator string formatting * Fix for runtime error with external_cuda_allocator * Removed cuda/rocm specific code paths for allocators * More name changes to generic gpu from rocm/cuda * Removed duplicate allocator creation * Rename cuda_external_ config options as gpu_external_ * Rename hip_mem_limit to gpu_mem_limit * Rename cuda_mem_limit to gpu_mem_limit	2021-04-06 15:23:51 -07:00
G. Ramalingam	a9ff4c29e5	Add function body to GeluGrad schema (#7190 ) * Add GeluGrad function definition * complete gelugrad function definition * add opset to function definition	2021-04-06 12:40:59 -07:00
ashari4	56b22c1c6b	Fix assert that the tensor's device type is 'cpu' #7248	2021-04-06 09:08:32 -07:00
Pranav Prakash	3b16afc0db	Make dW optional for convgrad (#7083 )	2021-04-05 17:05:20 -07:00
Suffian Khan	9f14af9809	Add BERT-L perf regression test on MI100 and re-enable batch size test (#7240 ) * restore bs test and add perf test * update perf number and fix path to results	2021-04-05 15:51:52 -07:00
ashbhandare	2b8513539e	Div mul fusion (#7183 ) * Div mul fusion * Change to rewrite rule * Add to inference transformers	2021-04-05 09:35:30 -07:00
Weixing Zhang	74ee24cf7f	rename cuda_mem_limit and hip_mem_limit to gpu_mem_limit for both CUDA EP and ROCm EP (#7226 ) With this change, differentiating CUDA EP and ROCm EP is not needed in training script when mem_limit option needs to be set. Co-authored-by: Weixing Zhang <wezhan@microsoft.com>	2021-04-05 09:04:04 -07:00
baijumeswani	68b12a6179	Support for saving and loading pytorch compatible state dictionaries (#7220 ) * Override methods on torch.nn.Module to get direct access to the methods on the original module.	2021-04-05 03:40:41 -07:00
Weixing Zhang	59b57d8322	HSA_NO_SCRATCH_RECLAIM and RCCL_ALLTOALL_KERNEL_DISABLE are not needed for ROCm 4.1 (#7224 ) Co-authored-by: Weixing Zhang <wezhan@microsoft.com>	2021-04-02 18:19:11 -07:00
Weixing Zhang	ef88dc912c	enable more unit tests for ROCM EP (#7222 )	2021-04-02 15:57:08 -07:00
Sherlock	a98c2ebb8c	Enable saving optimized models in OrtModule (#7214 ) * Enable saving optimized models in OrtModule Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2021-04-02 12:37:05 -07:00
Weixing Zhang	a3f17c8b0d	update lamb and GatherGrad kernel for ROCm EP (#7184 ) With ROCm4.1, the CUDA implementation of Lamb and GatherGrad can be utilized for ROCm EP.	2021-04-02 09:02:49 -07:00
Edward Chen	0ebeaf529d	Check kernel def hashes (#7120 ) Add unit test for verifying kernel def hashes. Add way to add new types to kernel definition without changing hash.	2021-04-01 17:42:58 -07:00
ashbhandare	15c67ddbf0	Make output 1 of ConcatTraining Optional and place on CPU (#7199 ) * Optional input 1 on CPU ConcatTraining * Rename output_1	2021-04-01 16:05:17 -07:00
Tang, Cheng	07201bac7a	expose session option and provider options (#7112 ) * expose session option and provider options * merge provider_names and provider_options * integrate into orttrainer options * fix doc string * fix a typo * Update orttraining/orttraining/python/training/orttrainer.py Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * Update orttraining/orttraining/python/training/orttrainer.py Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * Update orttraining/orttraining/python/training/orttrainer_options.py Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * fix the usage of provider_options * Update orttraining/orttraining/python/training/orttrainer.py Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * Update orttraining/orttraining/python/training/orttrainer.py Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * update expected result in tests * fix default provider options * minor update to trigger rebuild * minor update to trigger rebuild Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>	2021-03-30 09:49:45 -07:00
Scott McKay	9297527b7a	Enable NHWC transformer when generating ORT format model (#7126 ) * Allow specific optimizers to be disabled. - replace unused ability to specify just the optimizers to run - never used so not needed Allow the disabled list to be specified via the python bindings - expected usage is internal, so using kwargs for that so as not to pollute the documentation with stuff no user is likely to need Update the ORT format model conversion script to disable NCHWc transformer when level is 'all' - currently there aren't any known use cases where we'd want the NCHWc transformations to run as they create a device specific model and aren't used on ARM - the ORT format model is not expected to be generated on the target device (e.g. generate on Windows/Linux/macOS to deploy to Android/iOS so there's a good chance we'd generate a useless/invalid model - default to 'all' as ARM and MLAS prefer NHWC and the NHWC transformer runs at that level * Add matching changes to optimizer generation in training code	2021-03-29 18:39:48 +10:00
Jeff Daily	65ce5f07b3	add Dockerfile.rocm4.1.pytorch (#7152 )	2021-03-26 21:40:10 -07:00
Sherlock	ab86634c36	Address comments from ORTModule master merge (#7101 ) * Address ortmodule merge master comments Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2021-03-26 16:26:42 -07:00
Thiago Crepaldi	a01f15198c	Add support for large models (#7113 ) * Add support for large models * Handle models with registered buffers	2021-03-26 14:08:46 -07:00
KeDengMS	c9b29fbd06	Disable MatmulTransposeFusion for CPU EP (#7135 ) It causes convergence issue in BERT on CPU	2021-03-25 17:16:58 -07:00
G. Ramalingam	cc0e7bee76	Add function-body to SoftmaxGrad (#6988 ) * Add function body to SoftmaxGrad schema * Add type context and cleanup * Add test case with symbolic dimensions * Add opset specification to function * handle opset dependence * Exclude from minimal build	2021-03-25 11:34:06 -07:00
Vincent Wang	fda0470683	Add New AllocKind for YieldOp Outputs, Run YieldOp with InferenceSession in UT (#7125 ) * new allockind, add ut * change macro * fix win build * rename alloc kind * fix mem leak	2021-03-25 15:18:51 +08:00
Sherlock	1c8d874412	Promote BiasDropout from orttraining to onnxruntime (#7116 ) * Promote BiasDropout from orttraining to onnxruntime Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2021-03-24 20:42:42 -07:00
jingyanwangms	cd67f12add	Move IOBinding and RunOptions to ctx (#7028 ) * Liqun/ort module perf1 (#6806) add mysql script to log perf data Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> * Resolve HTTP Error 503: Service Unavailable for MNIST dataset (#6989) * Reduce logging for ORTModule for the end user (#6982) * Support none types in forward output (#7001) * Missed test case for none type output (#7014) * save iobinding to ctx * save run_options to ctx * remove debug tests * PR comments and clean up * add RunStateInfo * remove whitespace edits * PR comments * remove test changes * fix test failure * Fit unit test test_nesting_forward_backward_calls Co-authored-by: liqunfu <liqfu@microsoft.com> Co-authored-by: baijumeswani <bmeswani@microsoft.com> Co-authored-by: Jingyan Wang <jingywa@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2021-03-24 17:51:00 -07:00
harshithapv	540eac253e	Deepspeed pipeline parallel and fairscale sharded optimizer test samples with ORTModule (#7078 ) * adding samples for Deepspeed pipeline parallel and fairscale sharded optimizer with ortmodule * fixed typo in args * addressed Thiago's comments * Update orttraining/orttraining/test/python/orttraining_test_ortmodule_deepspeed_pipeline_parallel.py Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>	2021-03-24 09:43:05 -07:00
Suffian Khan	5cb8934459	update Dockerfile for workaround for issue in RCCL for rocm4.0 (#7108 )	2021-03-23 13:36:04 -07:00
Suffian Khan	c0994fdfbb	Update ORTTrainer to permit Rocm and permit export of opset 13 (#7059 ) * update orttrainer to permit rocm and allow export for opset 13 * wrap rocm check in try-except block	2021-03-23 11:09:48 -07:00
baijumeswani	c3310efdcd	Support for models having partially non trainable parameters (#7058 ) * Support for models having partially non trainable parameters	2021-03-23 09:41:16 -07:00
Sherlock	5ec0e71542	ORTModule support non-differentiable module output (#7048 ) * Handle non-differentiable module output Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2021-03-22 15:46:11 -07:00

582 commits