onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-16 18:31:27 +00:00

Author	SHA1	Message	Date
Tim Harris	5c6a27408a	Remove signed/unsigned compiler warnings, add additional pipeline test case (#4314 ) * Avoid signed/unsigned warning on loops * Report sizes when distributed world configuration is inconsistent * Add DistributedRunContextTest for pipeline stage configuration	2020-06-24 11:36:18 +01:00
Vincent Wang	f26c149d7d	Set NonZero Output Shape for Gradient Building. (#4246 ) * Set NonZero output shape for gradient building. * Resolve comments. Co-authored-by: Vincent Wang <weicwang@AiFramework2080ti2.corp.microsoft.com>	2020-06-24 13:43:22 +08:00
Vincent Wang	3374733783	Refactor ReduceMean/Sum Gradient without Shape Dependency. (#4261 ) * ReduceMean/Sum gradient without shape dependency. * optimize expand and use it to replace add. * Adjust test. Co-authored-by: Vincent Wang <weicwang@microsoft.com>	2020-06-24 11:36:53 +08:00
Bowen Bao	15cb4b3023	Fix session load state & run extra_postpasses only once (#4255 ) * Fix session load state & run extra_postpasses only once * add testcase for onnx model as well	2020-06-23 11:45:26 -07:00
Vincent Wang	b41fcf1570	Bugfix for shape inference and GetShape. (#4243 ) Co-authored-by: Vincent Wang <weicwang@microsoft.com>	2020-06-17 15:11:02 +08:00
Wei-Sheng Chin	189fb60ef9	Fix a bug and add code to profile memory (#4241 ) * Fix a bug and add code to profile memory 1. Compile Send/Recv again (currently broken because of HOROVOD refactor). 2. Add code to print out initializer allocation size and activation memory size. * Address comments * Split memory counts per locations * Fix a metric	2020-06-16 10:17:27 -07:00
edgchen1	63bf587623	Use azcopy to download test data (#4221 ) Use azcopy from download_e2e_test_data.py, add helper function for downloading azcopy. Update download_test_data.py to use helper function.	2020-06-16 10:14:34 -07:00
ytaous	5d28efd434	opset12 code cleanup (#4242 ) * opset12 code cleanup * opset12 code cleanup Co-authored-by: Ethan Tao <ettao@microsoft.com>	2020-06-15 19:45:35 -07:00
ytaous	e0334f177c	Opset12 upgrade for existing models used by perf/e2e pipelines (#4238 ) * opset12 support * opset12 support * on comments Co-authored-by: Ethan Tao <ettao@microsoft.com>	2020-06-15 14:26:53 -07:00
Bowen Bao	b08771f00e	Add ONNX Training Post-Passes to Front-End - Cont (#4041 ) * Add ONNX postpasses * add flag + add bert test from onnx file * address PR comments * fix typo * fix rebase * address comments * Fix test failures * add new pass for expand for new pt version, add comments * fix rebase Co-authored-by: lahaidar <lahaidar@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-06-15 10:33:26 -07:00
Weixing Zhang	b4b1c6440a	Enable ORT with CUDA 11 toolkit (#4168 ) * ORT on CUDA 11 1. Seperate HOROVOD and MPI 2. Seperate NCCL from HOROVOD in CMakeLists.txt 2. Remove dependency on external cub 3. cudnnSetRNNDescriptor is changed in cuDNN 8.0 * polish the code about MPI/NCCL in CMakeLists.txt and build.py * check CUDA version * ${MPI_INCLUDE_DIRS} should be PUBLIC * sm30, sm50 are deprecated in CUDA 11 Toolkit * update change based on code review feedback. * add sm_52 * improve MPI/NCCL build path Co-authored-by: Weixing Zhang <wezhan@microsoft.com>	2020-06-15 08:47:03 -07:00
Wei-Sheng Chin	ecc901717e	Use subset to release gradient tensors earlier (#4222 )	2020-06-14 22:52:54 -07:00
Wei-Sheng Chin	de9da123cf	Enable static memory planning for pipeline. (#4204 ) * Enable static memory planning for pipeline. 1. We fix a bug when resolving symbolic shape for scalars. 2. We pass the original inputs to all pipeline stages so that the symbolic shapes can be resolved. * Further Improvements 1. Address comments. 2. Further reduce activation size by ~50% when pipeline is on. This is done by removing all but one gradient tensor from the last RecordEvent in the backward pass. * Address a comment * Fix Windows build	2020-06-12 21:43:50 -07:00
Edward Chen	6b4f652017	Clean up status checks in gradient_graph_builder_test.cc.	2020-06-12 14:28:39 -07:00
Edward Chen	7096e6f5ef	Reduce severity of GraphAugmenter logging statement.	2020-06-12 14:28:39 -07:00
pengwa	e6ccb1ac28	GatherNDGrad for CPU (#4123 ) * GatherNDGrad on CPU * Remove __CUDA_ARCH__ check in .cc files	2020-06-12 02:43:49 +08:00
Xueyun Zhu	65a682354b	enable pipeline to run with mixed precision (#4113 ) * enable pipeline to run with mixed precision * address feedback * address feedback * test log * pipe infomation if test fails * ci failure	2020-06-10 22:16:24 -07:00
suffiank	7f5339505e	Discover trainable parameters using reverse DFS from loss node (#4116 ) Discover trainable parameters using reverse DFS from loss node, omitting recursion along untrainable inputs. Co-authored-by: suffian khan <sukha@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: suffian khan <sukha@microsoft.com>	2020-06-08 14:16:10 -07:00
Sergii Dymchenko	653417ae4b	Fix scaler->scalar typo. (#4142 )	2020-06-08 13:02:12 -07:00
Dmitri Smirnov	4e1dac67cd	Address memory leak and improve memory handling (#4124 ) Fix memory leak when a Python list passed as a feed. Create a custom allocator that can take ownership of python arrays that are created inside pybind. Allow direct memory use if continuous array is a copy because we now can take ownership of it by the allocator.	2020-06-08 09:29:46 -07:00
liqunfu	ffed43e9b8	handle loss and name marching wrappers (#4066 ) * handle loss and name marching wrappers Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-06-05 23:34:26 -07:00
Bowen Bao	1e5307d458	Bug fix for parameter names of models not using wrapper (#4061 ) * bug fix for models not using wrapper * add test case for no wrapper case * update test case to use internal learning rate * fix bug with frozen weight update	2020-06-05 12:03:38 -07:00
Thiago Crepaldi	81101c9efd	Fix DropoutGrad op (#4052 ) Dropout op was recently changed to accept a new input named 'training_mode', which is passed in to DropoutGrad automatically. This PR updates the DropoutGrad schema to accommodate the new input. Tests were also update to reflect the API change Co-authored-by: Thiago Crepaldi <thiag.crepaldi@microsoft.com>	2020-06-04 15:00:02 -07:00
liqunfu	905c535626	still need to make the test stable. Lower the acc number a bit to make the test pass for now (#4117 ) Co-authored-by: liqun fu <liqun@OrtTrainingDev1.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-06-02 21:37:48 -07:00
ashbhandare	f18a99b245	Exclude non-trainable torch buffers from trainable weights (#4099 ) * Initial changes * Removed redundant fix * Revert unintended formatting change. * Add unit test	2020-06-02 14:05:44 -07:00
edgchen1	ba74914c5a	Remove evaluation output from training e2e test baseline data. (#4092 )	2020-06-01 15:06:21 -07:00
ytaous	72d508b7a0	New perf metric - e2e throughput (#4085 ) * new metric * on comments * tab to spaces Co-authored-by: Ethan Tao <ettao@microsoft.com>	2020-06-01 12:11:34 -07:00
Tixxx	6404aba5ae	Orttraining rc1 master merge (#4080 ) * fixed seg fault when using concrete shape disable gradient as output * fix evaluation hang issue for multiple gpu run * Remove dead code, ORTModel and improve docstrings (#3814) * Refine ORTTrainer docstring descriptions (#3907)	2020-05-29 12:28:12 -07:00
Wei-Sheng Chin	e951b29a0b	Fix a macro and memory regression (#4068 ) onnxruntime_training_bert can run the following command again. ./onnxruntime_training_bert --model_name=bert-large-uncased_L_24_H_1024_A_16_V_30528_S_512_Dp_0.1_optimized_layer_norm --num_train_steps=16 --train_batch_size=52 --mode=train --train_data_dir=/bert_data/128/books_wiki_en_corpus/train --test_data_dir=/bert_data/128/books_wiki_en_corpus/test --gradient_accumulation_steps=16 --optimizer=Lamb --learning_rate=3e-3 --max_seq_length=128 --max_predictions_per_seq=20 --warmup_ratio=0.2843 --warmup_mode=Poly --display_loss_steps=100 --use_mixed_precision=True --allreduce_in_fp16 --use_nccl	2020-05-29 09:24:40 -07:00
edgchen1	38d76cc904	Clean up training E2E test (#4078 ) Update training E2E build to not go through CTest and call test scripts directly.	2020-05-29 09:20:47 -07:00
pengwa	6d03470587	Add e2e measurement for training (#4049 ) * add e2e measurement	2020-05-29 10:08:29 +08:00
liqunfu	6665d5e2bc	Liqun/a transformer example (#3845 ) Add transformer glue test example to show how to use ORTTrainer to fine-tune a transformer model Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-05-27 15:21:35 -07:00
Xueyun Zhu	633008b5ef	Add pipeline online partition logic for pipeline (#3996 ) * online partition * fix when multiple consumer nodes is in cut info * fix windows build * address feedback * adding test * feedback * address feedback * add parser for cut edge * windows build	2020-05-26 17:44:09 -07:00
Wei-Sheng Chin	24eda3df33	Create Utils for Adding Range and Marker (#4013 ) In this PR, we 1. create some APIs for creating NVTX objects 2. apply those APIs in pipeline-related operators and sequential executor. As a result, we can explicitly see how a pipeline schedule is run by GPUs in Nvidia's visual profiler. Note that these APIs are Linux only due to Nvidia's limited support.	2020-05-24 22:55:24 -07:00
Bowen Bao	0a5395bb78	Remove 'model_.' prefix from onnx model initializers in training (#3881 ) * Remove 'model_.' prefix for onnx model initializers in training * fix test case remove redundant device test * rename * Fix state_dict/load_state_dict with frozen_weight * nit * Add monkey patch for pt opset 10 * remove pt patch in CI * nit: newline	2020-05-20 10:06:31 -07:00
ytaous	fb4efafc8e	GPT-2 training perf scripts (#3974 ) * gpt2 training perf * gpt2 training perf * debug * debug * debug * fix bug * minor * on comments * dynamic sql * fix build * minor * linked hash * on comments * minor * mem * minor Co-authored-by: Ethan Tao <ettao@microsoft.com>	2020-05-19 10:21:40 -07:00
Faith Xu	b8a255e1b5	Doc Updates for Build (#3976 ) * Initial update of readme * Readme updates * Review of consolidated README (#3930) * Proposed updates for readme (#3953) I found some of the information was duplicated within the doc, so attempted to streamline * Fix links * More updates - fix build instructions - nodejs doc reorganization - roadmap update - version fixes * Update ORT Server build instructions * More doc cleanup * fix python dev notes name * Update nodejs and some links * sync eigen version back to master * Minor fixes * add nodsjs to sample table of content * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * address PR feedback * address PR feedback * nodejs build instruction * Update Java instructions to include gradle * Roadmap refresh Reformat some data, fix link, minor rewording * Clarify Visual C++ runtime req Co-authored-by: Nat Kershaw (MSFT) <nakersha@microsoft.com> Co-authored-by: Prasanth Pulavarthi <prasantp@microsoft.com> Co-authored-by: manashgoswami <magoswam@microsoft.com>	2020-05-18 20:08:36 -07:00
M. Zeeshan Siddiqui	44731e88bb	Add comments for zero valued normalization factor in SoftmaxCrossEntropyLossGrad CUDA kernel. (#3972 )	2020-05-18 09:08:09 -07:00
Wei-Sheng Chin	0d11649bb3	Address comments from #3823 and polish code (#3964 ) * Address comments from #3823 and polish code * One line	2020-05-17 14:08:33 -07:00
M. Zeeshan Siddiqui	a296b16719	Prevent divide by zero in CUDA implementation of SoftmaxCrossEntropyLossGrad. (#3962 )	2020-05-16 00:33:25 -07:00
Wei-Sheng Chin	33208c9f6b	Modify Pipeline Facilities to Fix PipeDream Deadlock (#3823 ) * Prepare utils for adding Wait's and Record's * Have a running PipeDream * Add comments * Polish comments * Clean code * Fix test * Polish names * Polish names * Remove debug headers * Fix a shape inference bug (not related to pipeline code) * Fix a warning * Address some comments * Address comments * Only touch consumers of outputs when re-wire edges	2020-05-15 18:27:19 -07:00
ytaous	bc441b7e5c	Add cpu/mem usage for perf metrics (#3947 ) * add cpu/mem usage * on comments * on comments * renaming Co-authored-by: Ethan Tao <ettao@microsoft.com>	2020-05-15 12:29:40 -07:00
ytaous	93eb9bcfde	Add yaml/perf scripts for new perf test pipeline (#3909 ) * yaml/perf scripts for new pipeline * yaml/perf scripts for new pipeline * remove unused imports * testing some comments change * testing some comments change * testing jdbc * testing jdbc * testing jdbc * exclude pwd from jdbc properties * exclude pwd from jdbc properties * namedtuple * on comments Co-authored-by: Ethan Tao <ettao@microsoft.com>	2020-05-13 14:15:17 -07:00
Bowen Bao	0f82b42fed	Ensure pt model is set to cpu in ort_trainer (#3867 ) * Ensure pt model is set to cpu in ort_trainer * add note comment	2020-05-12 13:32:27 -07:00
Thiago Crepaldi	70abb120b3	Remove ORTModel from frontend API (#3825 ) * Resolve conflict * Address review	2020-05-11 18:20:33 -07:00
M. Zeeshan Siddiqui	c46a9e8d65	Add numerical stability to SoftmaxGrad test inputs. (#3857 ) * Increase the tolerance for SoftmaxGrad CPU-GPU compare tests. * Increase the tolerance for SoftmaxGrad CPU-GPU compare tests. * Add 1e-2 to Y for numerical stability. * build break. * comments. * PR feedback. * PR feedback.	2020-05-11 17:59:24 -07:00
ytaous	96030fdcbc	dashboard integration - output training perf metrics as json (#3809 ) * dashboard integration - first phase * change a field * perf scripts * addressing PR comments * address comments and fix build * minor * make GetConfigFromData() const * more update for comments * addressing comments * more on addressing comments * minor * fix build * add condition check * more on comments * retrun status * remove batch size * on comments * rename pkg path * rename pkg path * additional commentss Co-authored-by: Ethan Tao <ettao@microsoft.com>	2020-05-10 10:29:38 -07:00
M. Zeeshan Siddiqui	eb33d5eda9	Do not register Dropout(12) as training ONLY kernel. (#3859 ) * Do not register Dropout(12) as training ONLY kernel. * Move Dropout forward implementation in inference project. * fix inference build test failures. * remove fp16 test since its support is absent on CPU. * build break.	2020-05-09 21:38:17 -07:00
Vincent Wang	3c24841569	Fold Shape Node During Constant Folding (#3748 ) * Fold Shape node in constant folding. * bugfix * Fix test failure. * Bugfix for C++ frontend. * Bugfix for C++ frontend. Co-authored-by: Vincent Wang <weicwang@microsoft.com>	2020-05-09 20:15:03 +08:00
ashbhandare	424a00bf04	Fix enabling gradient as output for easy mode. (#3866 )	2020-05-07 15:07:14 -07:00

149 commits