onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-05 04:17:53 +00:00

Author	SHA1	Message	Date
baijumeswani	f1ade14e44	Assert that the data is on the same device as ORTModule (#6942 )	2021-03-08 17:03:28 -08:00
Vincent Wang	56c5620fd2	Disable Materializing Grads (#6822 ) * disable materialize grads * gradient builder bugfix * fix ut * fix ut * resolve comments and bugfix * add more assert * disable forward compare for now	2021-03-08 16:56:06 +08:00
Thiago Crepaldi	dfc7c18e31	Introducing TrainingAgent interface to performance training using YieldOp (#6898 )	2021-03-05 17:03:46 -08:00
baijumeswani	79f832c682	Separate requirements.txt file for ORTModule pipelines (#6879 ) * Move all ORTModule dependency installations to ortmodule subfolder	2021-03-05 14:12:11 -08:00
ytaous	ac4d615553	Enable priority-based execution order as default to support inputs with symbolic/dynamic shape (#6892 ) * priority-based exec order * disable 1 failing test * fix UT * more comments Co-authored-by: Ethan Tao <ettao@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2021-03-04 22:36:25 -08:00
Sherlock	749e6a08a6	Add more asserts for ORTModule forward's correctness (#6887 ) * Add more asserts on forward outputs * Found one more failing case Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2021-03-03 19:57:42 -08:00
Vincent Wang	4238ce341a	Add External Outputs Flag for YieldOp (#6789 ) * add external outputs flag for YieldOp * use kPreExisting * add ut for mem_pattern * fix ut after merge from master	2021-03-02 11:38:18 +08:00
Sherlock	12edf22f11	Merge pull request #6838 from microsoft/mzs/ortmodule-api-sync-from-master-210226 Sync from master	2021-02-27 12:32:36 -08:00
Thiago Crepaldi	f71d93ea2b	Enable PyTorch Lightning basic test on CI (#6809 )	2021-02-27 09:35:42 -08:00
M. Zeeshan Siddiqui	ca48310d6d	Merge branch 'master' of https://github.com/microsoft/onnxruntime into mzs/ortmodule-api-sync-from-master-210226	2021-02-27 04:25:23 +00:00
Sergii Dymchenko	059ed1c241	Copy forward signature from PyTorch model. (#6777 )	2021-02-26 13:02:13 -08:00
satyajandhyala	355057cf9c	Added RequiredGrad attribute to YieldOp (#6657 ) * Added required_grad attribute to YieldOp * Changed YieldOp attribute to hold the indices of the required gradient outputs from the count, and removed the code reordering the outputs. * Changed backward_output_grad_names to a map from backward output gradient name to the corresponding output index.	2021-02-26 10:38:38 -08:00
Sherlock	8a450d523f	Check gradient correctness in the UTs (#6803 ) * Check gradient correctness in the UTs Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2021-02-25 13:31:07 -08:00
baijumeswani	fa8a9015bd	Mount hf model cache and use cache for loading hf models (#6810 )	2021-02-25 13:30:14 -08:00
Sergii Dymchenko	99ffffbe6a	Remove backward workaround from test. (#6811 )	2021-02-25 13:23:46 -08:00
ashbhandare	b05403d877	Clear iobinding outputs (#6774 )	2021-02-25 11:50:43 -08:00
Sherlock	8e200e13fe	Rewrite ORTModule background task coordination (#6700 ) * Introduce OrtTasks to replace EventPool * return run_id to frontend * pass run_id to backward * OrtTasks support multiple bg_events * make message_queue a member of orttask * Replace MessageQueue with std::promise * Move status_promise into Task * Move terminate flag into Task * Reenable previously disabled UTs * Add unit tests * Replace condition variables with std::promise * Move to CreateBackgroundTask in the main thread * return status and output in forward_future * use throw for terminating background thread * cleanup tasks at destructor * reenable test_mixed_nnmodule_ortmodules_training * add mutex for ORTTasks functions * add mutex for bg_threads * delay tests before start * add ut for multi-task common backbone Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2021-02-24 18:00:25 -08:00
baijumeswani	7ce4075bbd	Support nested sequence and mapping types in ORTModule (#6791 )	2021-02-24 15:45:56 -08:00
Weixing Zhang	40fa40f3ce	Enable more unit tests for ROCM EP (#6776 ) * enable more ops and unit tests for ROCM EP	2021-02-24 15:20:50 -08:00
Thiago Crepaldi	aa5cd37ac8	Refactor device handling and basic support for PyTorch Lightning (#6758 )	2021-02-24 14:12:55 -08:00
jingyanwangms	c02ec38f8a	[Running CI now] Remove duplicate tests to speed up CI (#6768 ) * remove tests to speed up CI * add back _into_data_parallelism tests to see how long the CI test takes * remove unnecessary save calls * add back data_parallelism_full_precision_bart_path * add data_parallelism_full_precision_path * remove data parallelism tests Co-authored-by: Jingyan Wang <jingywa@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2021-02-23 23:21:06 -08:00
baijumeswani	65ba51d93e	Re-enable test and increase timeout (#6785 )	2021-02-23 18:51:06 -08:00
Thiago Crepaldi	563218dcda	Update torchtext usage for pytorch transformer sample (#6767 ) * Update torchtext usage for pytorch transformer sample * Temporarily disable tests to unblock repo (failures are being worked on already) * Update loss numbers for ORTTrainer UTs	2021-02-23 14:06:35 -08:00
Sergii Dymchenko	58f3aca95d	Support keyword arguments for ORTModule (#6539 ) * Support keyword arguments for ORTModule. * Add backward workaround to the test. * Specify test name directly without -k. * Handle unused inputs removed by ONNX exporter.	2021-02-19 13:40:44 -08:00
M. Zeeshan Siddiqui	1a2f1bd23a	Enable external CUDA allocator in ORTModule. (#6745 ) * Enable external CUDA allocator in ORTModule. * Fix assert after unification of allocators. * Update no grad memory test. * update comments. * fix provider options array when not sharing allocator.	2021-02-18 20:01:13 -08:00
Thiago Crepaldi	fb3f1f5cc1	Enable custom ops on ORTModule (#6740 )	2021-02-18 09:08:10 -08:00
M. Zeeshan Siddiqui	40dda452cf	Merge branch 'master' of https://github.com/microsoft/onnxruntime into mzs/sync-from-master	2021-02-18 03:03:01 +00:00
M. Zeeshan Siddiqui	5b7e7aaa45	Move event_pool and message_queue to core.	2021-02-17 11:50:56 -08:00
M. Zeeshan Siddiqui	eecce31a8b	Fix build, cleanup.	2021-02-17 11:50:41 -08:00
Thiago Crepaldi	3184c47ad1	Merge branch 'master' into thiagofc/merge-from-master	2021-02-17 11:49:52 -08:00
Wei-Sheng Chin	9e67b88c83	Use local rank as GPU ID (#6719 )	2021-02-17 22:42:54 +08:00
baijumeswani	01dfa8e125	Support non tuple return values from torch.nn.module (#6660 ) * Support dictionary, namedtuples and huggingface ModelOutput type for model return values	2021-02-16 20:48:32 -08:00
Thiago Crepaldi	7f33671ade	Handle multiple devices scenarios (#6672 ) * Handle multiple devices scenarios	2021-02-16 18:22:30 -08:00
Thiago Crepaldi	7ee5baa60d	Remove monkey patch for PyTorch Nightly + ORTTrainer (#6659 )	2021-02-16 17:24:50 -08:00
Edward Chen	b2cddc5337	Consolidate MLTypeCallDispatcher classes (#6651 )	2021-02-12 13:26:56 -08:00
Suffian Khan	e6de0eb813	Add nightly pipeline for MI100 to run convergence and batch size test similar to V100. (#6611 ) * Partial updating of ROCM reduction code. * Update reduction_all.cu * Add reduce template parameters. * miopen common * Reuse CUDA's reduction_functions.cc * Reduction ops. * Update remaining reduction ops to use MIOpen. double datatype is not supported, so disable those typed kernels. * Disable a couple more unsupported tests. * Code formatting. * Delete ROCM-specific reduction code that is identical to CUDA reduction code. * Fix scratch buffer early free. * Fix merge conflict. * first attempt nightly amd ci pipeline * try fix bad yaml file * try again with corrected model directory * add convergence test as well * update reference loss for amd mi100 * include mi100 test results csv * update the mi100 convergence test reference values * update batch sizes for mi100 32g * fix gpu sku for run_convergence_test.py * undo unrelated changes to master * pr comments * pr comment Co-authored-by: Jesse Benson <jesseb@microsoft.com>	2021-02-12 13:22:06 -08:00
Yufeng Li	1c3168c0f6	Skip constant folding dequantizelinear for quant qdq format (#6643 ) * skip constant folding dequantizelinear for quant qdq format	2021-02-11 14:06:13 -08:00
Thiago Crepaldi	0732d72706	Add support for dynamic axes for outputs + check model output type before export (#6648 )	2021-02-11 10:18:02 -08:00
Sherlock	9294dde143	Rename ONNX graphs variables in ORTModule (#6645 ) Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2021-02-10 20:10:23 -08:00
Vincent Wang	eec602e48a	OrtModule v0.21 (#6395 ) * ortmodule v0.2 * use pt module for eval * get user outputs in yield op * pass output grads to yield output without copy * Disable mem_pattern for ORTModule * Avoid allocating output buffer for Yield op * Change to WaitAndReset to avoid overriding signal * remove unnecessary signal/wait at the end of bg thread * Return Session.Run result as a std::future * export model with torch.no_grad() * Handle bg thread's early return in Forward call * Removed duplicated Yield kernel * Silence "CUDA kernel missing log" * Add missing transforms, clear iobinding (#6532) * revert ortmodule.py to a working state first * Apply ortmodule.py change from dev branch * Rename to YieldOp Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: ashbhandare <ash.bhandare@gmail.com> Co-authored-by: Sherlock <baihan.huang@gmail.com>	2021-02-10 13:27:15 -08:00
Derek Murray	88d48063fa	Log warning when GetGradientForOp() silently fails. (#6586 ) * Add warning when GetGradientForOp() silently fails. In some cases, `GetGradientForOp()` can return without creating any nodes, which may lead to an invalid graph being created.	2021-02-10 10:01:16 -08:00
Wei-Sheng Chin	8972621138	Generate shape-independent graph if any input dimension < 2 (#6581 ) * Throw for non-supported case * Not to go to shape-dependent branch when seeing unsupported shapes	2021-02-10 15:44:25 +08:00
Cian Hayes	16eed68a1e	Fix layer_norm.cc on x86 (#6556 ) * Fix LayerNormGrad on x86 * PR feedback	2021-02-08 17:36:14 -08:00
Jesse Benson	d18aa45b46	Enable more ROCM ops that are sharing CUDA code. Some are needed for Turing NLG models.	2021-02-06 14:40:34 -08:00
George Nash	b50b0a89aa	Fix build failure when building with --build_wheel on Windows This resolves issue #6536 Signed-off-by: George Nash <george.nash@intel.com>	2021-02-05 18:59:01 -08:00
Scott McKay	ccfd90291b	Remove condition from ORT_RETURN_IF[_NOT] macro output. (#6563 ) Remove condition from ORT_RETURN_IF[_NOT] macro output as repeating the condition doesn't add much value compared to the explicit error message, and the error message includes the file and line anyway so it's easy enough to find the condition if needed. Update the few places where the macros were used without an explicit error message to provide an explicit error message. Saves 12.5KB in a minimal MinSizeRel build with all DNN ops, 16KB in full release build.	2021-02-05 17:33:29 -08:00
Weixing Zhang	299ace0759	Support to allow user to specify compute stream per session (#3723 ) * Support to allow user to specify compute stream per session Create computation cuda stream explicitly rather than use default legacy stream or per-thread default stream. remove some redundant cudaStreamSynchronize fix gpt2 model test failures don't use default stream in nccl either. add stream synchronization in OnRunEnd() using cub::DeviceScan::InclusiveSum which can be called with stream specified. fix topK failure due to latest rebase fix tensorrt support user specified stream add user_stream support in tensorrt EP use same stream for both tensorrt and CUDA EP. fix ScatterND specify stream for adasum and p2p kernels. fix loop fix CApiTest.custom_op_handler fix CApiTest.varied_input_custom_op_handler change for cudaMemcpyFromSymbol improve provider options for user specified compute stream * add changes for ROCM EP * fix GatherGrad UT for ROCM EP * clean code and fix NonMaxSuppression * use default stream for ROCM now * fix CApiTest.custom_op_handler:OrtFormatCustomOpTests.ConvertOnnxModelToOrt * fix tensorrt ut: CApiTest.io_binding_cuda Co-authored-by: Weixing Zhang <wezhan@microsoft.com>	2021-02-05 15:48:18 -08:00
Jesse Benson	a9e4d70b50	Fix merge conflict.	2021-02-04 15:00:05 -08:00
Jesse Benson	86ac11af1a	Delete ROCM-specific reduction code that is identical to CUDA reduction code.	2021-02-04 15:00:05 -08:00
Jesse Benson	5d8792705b	Code formatting.	2021-02-04 15:00:05 -08:00

514 commits