onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-02 23:39:58 +00:00

Author	SHA1	Message	Date
M. Zeeshan Siddiqui	82108b18e3	Partial graph execution perf improvements. (#7438 ) * Partial graph execution perf improvements. * PR feedback. * Decrement reference count of tensors in ORTModule. * PR feedback. * PR feedback. * PR feedback.	2021-04-26 17:13:55 -07:00
Thiago Crepaldi	0702a14ee7	Add pytorch version check before loading Python ONNX Runtime training module (#7377 )	2021-04-26 14:53:50 -07:00
Vincent Wang	368e4a324f	SqueezeGrad Bugfix (#7412 ) * squeezegrad bugfix * fix ut Co-authored-by: Vincent Wang <weicwang@microsoft.com>	2021-04-26 09:12:03 +08:00
Weixing Zhang	ca9b3f18e9	Explicitly pass cuda stream to thrust function rather than use cuda default stream implicitly (#7414 ) * Pass cuda stream to thrust function to not use default stream. In the commit `299ace0`, ORT has been changed to not use cuda default stream. * update amd_hipify.py * remove un-necessary stream sync Co-authored-by: Weixing Zhang <wezhan@microsoft.com>	2021-04-25 01:18:56 -07:00
Thiago Crepaldi	410a81b21b	Add support for ORTModule to execute the graph when ONNX drops unused… (#7424 )	2021-04-23 18:10:57 -07:00
M. Zeeshan Siddiqui	34ebf7d3dd	Partial graph execution made simple. (#7324 ) * Python changes. * C++ changes. * fixes/hacks. * more hacks. * perf. * changes. * changes. * re-architect partial graph execution and remove iobinding. * changes. * refactor. * prevent copies from python to c++. * perf. * merge conflicts. * misc. * fix merge conflicts and tests. * Ifdef partial executor. * PR feedback. * Delete ORT Task et al. * Clean up. * clean up. * Restore SetOutputMLValue(). * PR feedback. * Re-enable disabled ORTModule tests. * PR feedback. * PR feedback.	2021-04-23 15:09:18 -07:00
Tang, Cheng	1fa6d8fe1c	support loading external execution provider from python frontend (#7332 ) * initial dynamic load example * support load EP in the provider options * support dynamic load EP in orttrainer * split the provider interface; fix comments in pr * remove experiment code * add test * remove useless file * add test model file;fix linux brewak * fix linux build and missing file * fix python build * fix python build * fix python binding * fix python test * fix runtime path for posix env * exclude the shared library from minimal build * fix comments in pr; * seperate the provider shared lib loading * excluded from minimal / macos / ios build * skip copy the provider shared lib for minimal build and mac os * fix macos build * exclude the test for macos build * exclude from andorid build * exclude from web assembly build * enable the invalid ep test Co-authored-by: Cheng Tang <chenta@microsoft.com>	2021-04-23 09:54:09 -07:00
Ashwini Khade	75e054cd33	pick onnx release candidate (#7177 ) * pick onnx release candidate * fix typo * filter batchnorm tests * add implementation for reshape 14 * add identity op kernel for opset 14 * fix typo * update onnx commit * update commit to latest master * add hashes for new kernel registrations and update 1 * TEST commit * update onnx back to right commit * Update onnx to latest in rel-1.9.0 * temp fix * remove nonzeroshapesetter transformer * pick rel branch latest commit * fix build failures * fix build failures * fix build failures * update the commit to latest in release branch * add test filters for not impemented op14 ops in c# tests * plus review comments	2021-04-22 23:57:09 -07:00
Thiago Crepaldi	771a6d235b	Fix IsContiguousTensor check on backend (#7391 )	2021-04-21 17:01:17 -07:00
Sherlock	16ca7677e6	Relax ConvGrad Test tol (#7393 ) Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2021-04-21 08:06:00 -07:00
Thiago Crepaldi	8421124344	Add support to **kwargs in ORTModule forward() method (#7360 )	2021-04-20 16:21:52 -07:00
ashbhandare	76cc118dbe	Gemm transpose fusion (#7306 ) * Gemm transpose fusion * Correct rewrite rule effect * Add to inference transforms to trigger on gradient graph	2021-04-20 09:35:05 -07:00
mindest	1a3ddf0714	Add gradient registration and tests for Min/Max (#7217 ) * Add gradient registration and tests for Min/Max * Add helper function for min/max grad test * limit Min/Max Grad to accept at most two inputs; modify test case accordingly * resolve merge error	2021-04-20 18:14:31 +08:00
Sherlock	ce7ff27bac	Fix perf issue in Conv CUDA kernel (#7348 ) * Fix perf issue in Conv CUDA kernel * Read avaiable memory from device * assuming 10% fragmentation Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2021-04-19 23:37:05 -07:00
ashbhandare	ac346a1b90	Modify SimplifiedLayerNormFusion to allow fusion in the presence of Casts optionally (#7352 ) * LN transform partial changes * LN transform fix * Make transform optional, remove unnecessary code * Fix windows build * review comment, windows CI fix * review comments	2021-04-19 19:59:23 -07:00
ytaous	7abe1fd392	Identity elimination with graph output (#7312 ) * Identity removal * fix build * fix build * fix build * fix builld * UTs * fix UT * fix UTs * per comments * fix UTs * fix UTs * per comments Co-authored-by: Ethan Tao <ettao@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2021-04-19 16:36:35 -07:00
satyajandhyala	bb1e417da0	Add logging support to Cast Propagation transformation from python (#7353 ) * Fixes needed to PropagateCast transformation. * Added number of passes to the logs. * Added logging support to OrtModuleGraphBuilder. * Added new testcases. * Added NodeArgToConsumerMap	2021-04-19 12:14:30 -07:00
M. Zeeshan Siddiqui	6dda1e0681	Flag for tensor memory re-use in allocation planner. (#7359 )	2021-04-16 17:53:25 -07:00
satyajandhyala	0da085ed48	Propagate Cast operations to maximize lower precision (float16) computation (#7191 ) * Added propagate_cast_ops option and PropagateCastOps transformation. * Added test cases to propagate Cast operations. * Expose GraphTransformerConfiguration to python interface and added propagate_cast_ops options. * Added functionality to propagate Cast operations. * Added logging. * Apply cast propagation to the subgraphs.	2021-04-14 20:54:24 -07:00
Jesse Benson	be79575c6a	Use built-in reduce_sum() for simple reduction cases, specifically reduce all to a scalar.	2021-04-14 08:55:35 -07:00
ashbhandare	6ceee5d131	IsInf ReduceSum transform (#7188 ) * IsInf ReduceSum transform * Revert unnecessary changes, add isinf_only and isnan_only attr * add tests, review comments * Disable test for non-cuda * Move IsAllFinite from training to contrib op * review comments * Review comment, formatting * Enable test for ROCm EP	2021-04-13 16:05:21 -07:00
G. Ramalingam	f8a36dd6b3	Add DropoutGrad function body (#7310 ) * Add DropoutGrad function body * Add DropoutGrad function body * Fix documentation and add test cases * Fix template specialization * Check expansion for float16 and bfloat16	2021-04-13 14:31:53 -07:00
harshithapv	a5d3a52d1a	Add Tile grad (#7289 ) * tile grad * fixed bugs * added tile grad test * bug fix * Added tests. Addressed comments * added optimization recommended and addressed comments * fixed comment	2021-04-13 12:54:45 -07:00
Weixing Zhang	75c0192e4f	enable more unit tests for ROCM EP (#7307 )	2021-04-09 15:15:13 -07:00
baijumeswani	b221a4fd86	Better error message when ORTModule used with torch.DataParallel (#7287 ) * Better error message when ORTModule used with torch.DataParallel	2021-04-09 10:07:22 -07:00
Weixing Zhang	c22963c23d	Polish Lamb Kernel (#7299 )	2021-04-09 09:55:57 -07:00
Weixing Zhang	8ad5007f8f	Polish Adam kernel (#7294 ) * Polish Adam kernel	2021-04-09 01:11:09 -07:00
Thiago Crepaldi	7b4362c21a	Add support to dynamic positional/keyword input for ORTModule (#7189 )	2021-04-08 12:46:21 -07:00
ytaous	e14b291ce7	Enable symbolic shape inference in ORTModule (#7282 ) Co-authored-by: Ethan Tao <ettao@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2021-04-08 09:47:09 -07:00
baijumeswani	d272c8434d	Suppress tracer warnings from onnx export in ORTModule (#7221 ) * Suppress tracer warnings from onnx export in ORTModule	2021-04-08 03:41:38 -07:00
Sherlock	aa2c465143	Restrict ConvGrad to __CUDA_ARCH__>=700 (#7278 ) * Restrict ConvGrad to __CUDA_ARCH__>=700 Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2021-04-07 20:10:29 -07:00
Vincent Wang	beb299e17d	ConvGrad CUDA Kernel Bugfix (#7273 ) * bugfix * add ut	2021-04-08 08:22:18 +08:00
baijumeswani	844361bc67	Support eval mode and torch.no_grad context in ORTModule and restructure ortmodule.py (#7162 )	2021-04-07 09:29:54 -07:00
Sherlock	4bc17ca04e	CUDA ConvGrad Kernel (#7227 ) * ConvGrad CUDA impl * Set up the test case for Deberta Conv1D * Add fp16 test Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2021-04-06 22:09:06 -07:00
Derek Murray	25e261f196	Avoid passing zero bias to Gemm in gradients (#7244 ) * Avoid passing zero bias to Gemm in gradients The bias argument to Gemm is optional and defaults to zero. Therefore we do not need to generate zero initializers and pass them to that argument. * Remove unused declaration.	2021-04-06 16:49:34 -07:00
ashbhandare	2aa89989c4	Not-where fusion (#7182 ) * Not-where fusion * Change to rewrite rule * Add to inference transforms * Support numtiple where consumers * review comments	2021-04-06 16:12:26 -07:00
raviskolli	5d759e182b	Allocate external Rocm allocator via PyBind (#7148 ) * Enabled rocm support for graph transformations * Support for external Hip allocator * Added const_cast to reinterpret_cast to fix compiler issue * Another crack at fixing the compile error * More compilation fixes * Added compilation flags to load_inline extension * Added ROCM, ROCM_PINNED constants * Changes to address PR comments * Changed gpu identifier from ROCM to CUDA * Added HIP compilation flag for torch inline functions * Fixed a typo in header allocator string formatting * Fix for runtime error with external_cuda_allocator * Removed cuda/rocm specific code paths for allocators * More name changes to generic gpu from rocm/cuda * Removed duplicate allocator creation * Rename cuda_external_ config options as gpu_external_ * Rename hip_mem_limit to gpu_mem_limit * Rename cuda_mem_limit to gpu_mem_limit	2021-04-06 15:23:51 -07:00
G. Ramalingam	a9ff4c29e5	Add function body to GeluGrad schema (#7190 ) * Add GeluGrad function definition * complete gelugrad function definition * add opset to function definition	2021-04-06 12:40:59 -07:00
ashari4	56b22c1c6b	Fix assert that the tensor's device type is 'cpu' #7248	2021-04-06 09:08:32 -07:00
Pranav Prakash	3b16afc0db	Make dW optional for convgrad (#7083 )	2021-04-05 17:05:20 -07:00
Suffian Khan	9f14af9809	Add BERT-L perf regression test on MI100 and re-enable batch size test (#7240 ) * restore bs test and add perf test * update perf number and fix path to results	2021-04-05 15:51:52 -07:00
ashbhandare	2b8513539e	Div mul fusion (#7183 ) * Div mul fusion * Change to rewrite rule * Add to inference transformers	2021-04-05 09:35:30 -07:00
Weixing Zhang	74ee24cf7f	rename cuda_mem_limit and hip_mem_limit to gpu_mem_limit for both CUDA EP and ROCm EP (#7226 ) With this change, differentiating CUDA EP and ROCm EP is not needed in training script when mem_limit option needs to be set. Co-authored-by: Weixing Zhang <wezhan@microsoft.com>	2021-04-05 09:04:04 -07:00
baijumeswani	68b12a6179	Support for saving and loading pytorch compatible state dictionaries (#7220 ) * Override methods on torch.nn.Module to get direct access to the methods on the original module.	2021-04-05 03:40:41 -07:00
Weixing Zhang	59b57d8322	HSA_NO_SCRATCH_RECLAIM and RCCL_ALLTOALL_KERNEL_DISABLE are not needed for ROCm 4.1 (#7224 ) Co-authored-by: Weixing Zhang <wezhan@microsoft.com>	2021-04-02 18:19:11 -07:00
Weixing Zhang	ef88dc912c	enable more unit tests for ROCM EP (#7222 )	2021-04-02 15:57:08 -07:00
Sherlock	a98c2ebb8c	Enable saving optimized models in OrtModule (#7214 ) * Enable saving optimized models in OrtModule Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2021-04-02 12:37:05 -07:00
Weixing Zhang	a3f17c8b0d	update lamb and GatherGrad kernel for ROCm EP (#7184 ) With ROCm4.1, the CUDA implementation of Lamb and GatherGrad can be utilized for ROCm EP.	2021-04-02 09:02:49 -07:00
Edward Chen	0ebeaf529d	Check kernel def hashes (#7120 ) Add unit test for verifying kernel def hashes. Add way to add new types to kernel definition without changing hash.	2021-04-01 17:42:58 -07:00
ashbhandare	15c67ddbf0	Make output 1 of ConcatTraining Optional and place on CPU (#7199 ) * Optional input 1 on CPU ConcatTraining * Rename output_1	2021-04-01 16:05:17 -07:00

597 commits