onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-28 03:20:58 +00:00

Author	SHA1	Message	Date
suffiank	7f5339505e	Discover trainable parameters using reverse DFS from loss node (#4116 ) Discover trainable parameters using reverse DFS from loss node, omitting recursion along untrainable inputs. Co-authored-by: suffian khan <sukha@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: suffian khan <sukha@microsoft.com>	2020-06-08 14:16:10 -07:00
Sergii Dymchenko	653417ae4b	Fix scaler->scalar typo. (#4142 )	2020-06-08 13:02:12 -07:00
Dmitri Smirnov	4e1dac67cd	Address memory leak and improve memory handling (#4124 ) Fix memory leak when a Python list passed as a feed. Create a custom allocator that can take ownership of python arrays that are created inside pybind. Allow direct memory use if continuous array is a copy because we now can take ownership of it by the allocator.	2020-06-08 09:29:46 -07:00
liqunfu	ffed43e9b8	handle loss and name marching wrappers (#4066 ) * handle loss and name marching wrappers Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-06-05 23:34:26 -07:00
Bowen Bao	1e5307d458	Bug fix for parameter names of models not using wrapper (#4061 ) * bug fix for models not using wrapper * add test case for no wrapper case * update test case to use internal learning rate * fix bug with frozen weight update	2020-06-05 12:03:38 -07:00
Thiago Crepaldi	81101c9efd	Fix DropoutGrad op (#4052 ) Dropout op was recently changed to accept a new input named 'training_mode', which is passed in to DropoutGrad automatically. This PR updates the DropoutGrad schema to accommodate the new input. Tests were also update to reflect the API change Co-authored-by: Thiago Crepaldi <thiag.crepaldi@microsoft.com>	2020-06-04 15:00:02 -07:00
liqunfu	905c535626	still need to make the test stable. Lower the acc number a bit to make the test pass for now (#4117 ) Co-authored-by: liqun fu <liqun@OrtTrainingDev1.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-06-02 21:37:48 -07:00
ashbhandare	f18a99b245	Exclude non-trainable torch buffers from trainable weights (#4099 ) * Initial changes * Removed redundant fix * Revert unintended formatting change. * Add unit test	2020-06-02 14:05:44 -07:00
edgchen1	ba74914c5a	Remove evaluation output from training e2e test baseline data. (#4092 )	2020-06-01 15:06:21 -07:00
ytaous	72d508b7a0	New perf metric - e2e throughput (#4085 ) * new metric * on comments * tab to spaces Co-authored-by: Ethan Tao <ettao@microsoft.com>	2020-06-01 12:11:34 -07:00
Tixxx	6404aba5ae	Orttraining rc1 master merge (#4080 ) * fixed seg fault when using concrete shape disable gradient as output * fix evaluation hang issue for multiple gpu run * Remove dead code, ORTModel and improve docstrings (#3814) * Refine ORTTrainer docstring descriptions (#3907)	2020-05-29 12:28:12 -07:00
Wei-Sheng Chin	e951b29a0b	Fix a macro and memory regression (#4068 ) onnxruntime_training_bert can run the following command again. ./onnxruntime_training_bert --model_name=bert-large-uncased_L_24_H_1024_A_16_V_30528_S_512_Dp_0.1_optimized_layer_norm --num_train_steps=16 --train_batch_size=52 --mode=train --train_data_dir=/bert_data/128/books_wiki_en_corpus/train --test_data_dir=/bert_data/128/books_wiki_en_corpus/test --gradient_accumulation_steps=16 --optimizer=Lamb --learning_rate=3e-3 --max_seq_length=128 --max_predictions_per_seq=20 --warmup_ratio=0.2843 --warmup_mode=Poly --display_loss_steps=100 --use_mixed_precision=True --allreduce_in_fp16 --use_nccl	2020-05-29 09:24:40 -07:00
edgchen1	38d76cc904	Clean up training E2E test (#4078 ) Update training E2E build to not go through CTest and call test scripts directly.	2020-05-29 09:20:47 -07:00
pengwa	6d03470587	Add e2e measurement for training (#4049 ) * add e2e measurement	2020-05-29 10:08:29 +08:00
liqunfu	6665d5e2bc	Liqun/a transformer example (#3845 ) Add transformer glue test example to show how to use ORTTrainer to fine-tune a transformer model Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-05-27 15:21:35 -07:00
Xueyun Zhu	633008b5ef	Add pipeline online partition logic for pipeline (#3996 ) * online partition * fix when multiple consumer nodes is in cut info * fix windows build * address feedback * adding test * feedback * address feedback * add parser for cut edge * windows build	2020-05-26 17:44:09 -07:00
Wei-Sheng Chin	24eda3df33	Create Utils for Adding Range and Marker (#4013 ) In this PR, we 1. create some APIs for creating NVTX objects 2. apply those APIs in pipeline-related operators and sequential executor. As a result, we can explicitly see how a pipeline schedule is run by GPUs in Nvidia's visual profiler. Note that these APIs are Linux only due to Nvidia's limited support.	2020-05-24 22:55:24 -07:00
Bowen Bao	0a5395bb78	Remove 'model_.' prefix from onnx model initializers in training (#3881 ) * Remove 'model_.' prefix for onnx model initializers in training * fix test case remove redundant device test * rename * Fix state_dict/load_state_dict with frozen_weight * nit * Add monkey patch for pt opset 10 * remove pt patch in CI * nit: newline	2020-05-20 10:06:31 -07:00
ytaous	fb4efafc8e	GPT-2 training perf scripts (#3974 ) * gpt2 training perf * gpt2 training perf * debug * debug * debug * fix bug * minor * on comments * dynamic sql * fix build * minor * linked hash * on comments * minor * mem * minor Co-authored-by: Ethan Tao <ettao@microsoft.com>	2020-05-19 10:21:40 -07:00
Faith Xu	b8a255e1b5	Doc Updates for Build (#3976 ) * Initial update of readme * Readme updates * Review of consolidated README (#3930) * Proposed updates for readme (#3953) I found some of the information was duplicated within the doc, so attempted to streamline * Fix links * More updates - fix build instructions - nodejs doc reorganization - roadmap update - version fixes * Update ORT Server build instructions * More doc cleanup * fix python dev notes name * Update nodejs and some links * sync eigen version back to master * Minor fixes * add nodsjs to sample table of content * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * address PR feedback * address PR feedback * nodejs build instruction * Update Java instructions to include gradle * Roadmap refresh Reformat some data, fix link, minor rewording * Clarify Visual C++ runtime req Co-authored-by: Nat Kershaw (MSFT) <nakersha@microsoft.com> Co-authored-by: Prasanth Pulavarthi <prasantp@microsoft.com> Co-authored-by: manashgoswami <magoswam@microsoft.com>	2020-05-18 20:08:36 -07:00
M. Zeeshan Siddiqui	44731e88bb	Add comments for zero valued normalization factor in SoftmaxCrossEntropyLossGrad CUDA kernel. (#3972 )	2020-05-18 09:08:09 -07:00
Wei-Sheng Chin	0d11649bb3	Address comments from #3823 and polish code (#3964 ) * Address comments from #3823 and polish code * One line	2020-05-17 14:08:33 -07:00
M. Zeeshan Siddiqui	a296b16719	Prevent divide by zero in CUDA implementation of SoftmaxCrossEntropyLossGrad. (#3962 )	2020-05-16 00:33:25 -07:00
Wei-Sheng Chin	33208c9f6b	Modify Pipeline Facilities to Fix PipeDream Deadlock (#3823 ) * Prepare utils for adding Wait's and Record's * Have a running PipeDream * Add comments * Polish comments * Clean code * Fix test * Polish names * Polish names * Remove debug headers * Fix a shape inference bug (not related to pipeline code) * Fix a warning * Address some comments * Address comments * Only touch consumers of outputs when re-wire edges	2020-05-15 18:27:19 -07:00
ytaous	bc441b7e5c	Add cpu/mem usage for perf metrics (#3947 ) * add cpu/mem usage * on comments * on comments * renaming Co-authored-by: Ethan Tao <ettao@microsoft.com>	2020-05-15 12:29:40 -07:00
ytaous	93eb9bcfde	Add yaml/perf scripts for new perf test pipeline (#3909 ) * yaml/perf scripts for new pipeline * yaml/perf scripts for new pipeline * remove unused imports * testing some comments change * testing some comments change * testing jdbc * testing jdbc * testing jdbc * exclude pwd from jdbc properties * exclude pwd from jdbc properties * namedtuple * on comments Co-authored-by: Ethan Tao <ettao@microsoft.com>	2020-05-13 14:15:17 -07:00
Bowen Bao	0f82b42fed	Ensure pt model is set to cpu in ort_trainer (#3867 ) * Ensure pt model is set to cpu in ort_trainer * add note comment	2020-05-12 13:32:27 -07:00
Thiago Crepaldi	70abb120b3	Remove ORTModel from frontend API (#3825 ) * Resolve conflict * Address review	2020-05-11 18:20:33 -07:00
M. Zeeshan Siddiqui	c46a9e8d65	Add numerical stability to SoftmaxGrad test inputs. (#3857 ) * Increase the tolerance for SoftmaxGrad CPU-GPU compare tests. * Increase the tolerance for SoftmaxGrad CPU-GPU compare tests. * Add 1e-2 to Y for numerical stability. * build break. * comments. * PR feedback. * PR feedback.	2020-05-11 17:59:24 -07:00
ytaous	96030fdcbc	dashboard integration - output training perf metrics as json (#3809 ) * dashboard integration - first phase * change a field * perf scripts * addressing PR comments * address comments and fix build * minor * make GetConfigFromData() const * more update for comments * addressing comments * more on addressing comments * minor * fix build * add condition check * more on comments * retrun status * remove batch size * on comments * rename pkg path * rename pkg path * additional commentss Co-authored-by: Ethan Tao <ettao@microsoft.com>	2020-05-10 10:29:38 -07:00
M. Zeeshan Siddiqui	eb33d5eda9	Do not register Dropout(12) as training ONLY kernel. (#3859 ) * Do not register Dropout(12) as training ONLY kernel. * Move Dropout forward implementation in inference project. * fix inference build test failures. * remove fp16 test since its support is absent on CPU. * build break.	2020-05-09 21:38:17 -07:00
Vincent Wang	3c24841569	Fold Shape Node During Constant Folding (#3748 ) * Fold Shape node in constant folding. * bugfix * Fix test failure. * Bugfix for C++ frontend. * Bugfix for C++ frontend. Co-authored-by: Vincent Wang <weicwang@microsoft.com>	2020-05-09 20:15:03 +08:00
ashbhandare	424a00bf04	Fix enabling gradient as output for easy mode. (#3866 )	2020-05-07 15:07:14 -07:00
Wei-Sheng Chin	0aeb383273	Support Pipeline in Training Runner (#3770 )	2020-05-06 21:03:36 -07:00
Xueyun Zhu	0e59668c1b	add support for symbolic broadcast for Add/Sub/Mul (#3743 ) * add support for symbolic broadcast * fix comment * address feedback	2020-05-06 10:40:57 -07:00
Bowen Bao	f7ff5a7aa1	Fix state_dict and save_as_onnx for training (#3774 )	2020-05-05 11:47:46 -07:00
Changming Sun	bd78364411	Parallel all the activations ops (#3722 ) 1. Parallel all the activations ops. 2. Parallel the performance critical path of the LRN op, which makes the ONNX model zoo googlenet model runs 60% faster(latency reduced from 21ms to 13ms). 3. Make the Gemm-Activation fusion support with all the activations ops. Before this change, it only supports LeakyRelu/Relu/Sigmoid/Tanh. 4. Delete onnxruntime/test/framework/op_kernel_test.cc because the file is almost empty. 5. Remove the loggings in KernelRegistry::TryFindKernel, return Status with error message instead.	2020-05-05 01:18:17 -07:00
M. Zeeshan Siddiqui	a24c71af40	Update Dropout(12) forward kernel with training_mode input. (#3805 ) * Update Dropout(12) forward and backward kernel with training_mode input. * Revert deleted assert. * clean up. * PR feedback.	2020-05-04 20:05:42 -07:00
M. Zeeshan Siddiqui	6f95cdfa68	Use new cost based threadpool abstractions in CPU gradient operators. (#3807 ) * Use ThreadPool abstractions instead of OpenMP. * PR feedback.	2020-05-04 15:23:10 -07:00
Sherlock	2f8a2364c3	Fix loss function builder (#3801 )	2020-05-04 10:41:15 -07:00
M. Zeeshan Siddiqui	4f9f6aedea	CUDA/CPU test for NegativeLogLikelihoodLoss(12) function based loss operator. (#3793 )	2020-05-01 21:36:29 -07:00
Xueyun Zhu	e8e95110d3	add pipeline to distributed context config (#3789 ) * add pipeline to distributed context * white space	2020-05-01 13:49:51 -07:00
edgchen1	047975e404	Address flaky test ReduceApiTest.Sum. (#3716 ) Increase test comparison tolerance. Add output of random seed value for easier debugging later. Unify RandomValueGenerator::Uniform() to consistently use [min, max) interval.	2020-05-01 09:18:26 -07:00
pengwa	98b97be635	collect the last few iteration latency for throuput calculation (#3766 )	2020-05-01 13:24:17 +08:00
liqunfu	af3988198c	Liqun/e2e transformer test (#3540 ) * initial change to transformer.py * prepare e2e transformer tests * refactor transformer tests * put test python files in a flat folder * fix typo pip install transform(s) * python 3.6 * python version to 3.6 in install_ubuntu.sh * remove argparser * to use opset ver 12 * workaround loss_scale naming patch in case of loss_fn_ * assign self.loss_fn_ so it can be checked * skip a few un-needed post-process steps * fix loss_scale_input_name, clean up post process steps * skip non-frontend tests * move cpu/cuda related files to coresponding cpu/cuda folder (#3668) Co-authored-by: Weixing Zhang <wezhan@microsoft.com> * type cast for ratio is not necessary for dropout (#3682) Co-authored-by: Weixing Zhang <wezhan@microsoft.com> * thrustallocator is not needed since cub is used directly for gather now. (#3683) Co-authored-by: Weixing Zhang <wezhan@microsoft.com> * GatherND-12 Implementation (#3645) * Renamed, UT passing * Move GatherND CUDA Kerenl into onnxruntime * Merge GatherNDOpTest * Refactor Test code * Merge CPU Kernel Impl * Handle Negative Indice, Fix UT * Improve CUDA kernel to handle negative index * Minor Fixes * Preserve GatherND-1 Cuda kernel * Fix Mac build * fix UT * Fix Build * fix GatherNDOpTest.double > CUDA error cudaErrorInvalidDeviceFunction:invalid device function Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Peng Wang (pengwa) <pengwa@microsoft.com> * update with reviewers' comments * testBertTrainingGradientAccumulation was not using rtol and may fail occasionally with small (e-06) difference * fix merge mistakes Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Weixing Zhang <weixingzhang@users.noreply.github.com> Co-authored-by: Weixing Zhang <wezhan@microsoft.com> Co-authored-by: Sherlock <baihan.huang@gmail.com> Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Peng Wang (pengwa) <pengwa@microsoft.com>	2020-04-30 12:26:38 -07:00
M. Zeeshan Siddiqui	b9a5ed1fe2	Add SoftmaxCrossEntropyLoss to mixed-precision-transformer. (#3760 )	2020-04-30 02:48:21 -07:00
pengwa	0531acccc5	Refine GatherND CPU/CUDA Kernels & Add UTs (#3688 ) * Refactor GatherND CPU Kernel (Renaming & Simplify) * Add batch_dim=1 or 2, negative slices tests * Rename gather_nd_gard_impl.cu * Use dispatcher to refactor CUDA GatherND/GatherNDGrad * Change GatherNDBase::CommonComputeKernel --> GatherNDBase::PrepareCompute * Use HasCudaEnvironment instead of __CUDA_ARCH__ for some double type tests	2020-04-30 10:17:54 +08:00
ashbhandare	58f53966d3	Add Distributed Checkpointing support (#3639 ) * Change naming of moments to Moment_x_<weight_name> * Add checkpointing code and zero checkpoint aggregation * Correct aggregation for LAMB, cleanup * Add simple checkpointing test * Add test for zero checkpoint aggregation * Fix tests * fix test * Review changes * Fix test after review comment fix * Fix API, test * Fix test after API change * Decouple save load from ORTTrainer * Add flag to not break checkpointing with ORTModel' Co-authored-by: aishwarya bhandare <aibhanda@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-04-29 14:52:21 -07:00
suffiank	ea0e2d1dde	fix warning treated as error due to ignoring return status (#3739 ) Co-authored-by: suffian khan <sukha@microsoft.com>	2020-04-29 02:38:53 -07:00
Tixxx	0638565fe0	Fix evaluation issues (#3538 ) * allow switching between eval and training modes dynamically Co-authored-by: Tixxx <root@525204a066204ea794f942530b05ae7f000000.axlncovkyjne5caro2tmz3zryb.xx.internal.cloudapp.net>	2020-04-28 21:03:37 -07:00

1 2 3

132 commits