onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-13 18:08:13 +00:00

Author	SHA1	Message	Date
Wei-Sheng Chin	0aeb383273	Support Pipeline in Training Runner (#3770 )	2020-05-06 21:03:36 -07:00
Xueyun Zhu	0e59668c1b	add support for symbolic broadcast for Add/Sub/Mul (#3743 ) * add support for symbolic broadcast * fix comment * address feedback	2020-05-06 10:40:57 -07:00
Bowen Bao	f7ff5a7aa1	Fix state_dict and save_as_onnx for training (#3774 )	2020-05-05 11:47:46 -07:00
Changming Sun	bd78364411	Parallel all the activations ops (#3722 ) 1. Parallel all the activations ops. 2. Parallel the performance critical path of the LRN op, which makes the ONNX model zoo googlenet model runs 60% faster(latency reduced from 21ms to 13ms). 3. Make the Gemm-Activation fusion support with all the activations ops. Before this change, it only supports LeakyRelu/Relu/Sigmoid/Tanh. 4. Delete onnxruntime/test/framework/op_kernel_test.cc because the file is almost empty. 5. Remove the loggings in KernelRegistry::TryFindKernel, return Status with error message instead.	2020-05-05 01:18:17 -07:00
M. Zeeshan Siddiqui	a24c71af40	Update Dropout(12) forward kernel with training_mode input. (#3805 ) * Update Dropout(12) forward and backward kernel with training_mode input. * Revert deleted assert. * clean up. * PR feedback.	2020-05-04 20:05:42 -07:00
M. Zeeshan Siddiqui	6f95cdfa68	Use new cost based threadpool abstractions in CPU gradient operators. (#3807 ) * Use ThreadPool abstractions instead of OpenMP. * PR feedback.	2020-05-04 15:23:10 -07:00
Sherlock	2f8a2364c3	Fix loss function builder (#3801 )	2020-05-04 10:41:15 -07:00
M. Zeeshan Siddiqui	4f9f6aedea	CUDA/CPU test for NegativeLogLikelihoodLoss(12) function based loss operator. (#3793 )	2020-05-01 21:36:29 -07:00
Xueyun Zhu	e8e95110d3	add pipeline to distributed context config (#3789 ) * add pipeline to distributed context * white space	2020-05-01 13:49:51 -07:00
edgchen1	047975e404	Address flaky test ReduceApiTest.Sum. (#3716 ) Increase test comparison tolerance. Add output of random seed value for easier debugging later. Unify RandomValueGenerator::Uniform() to consistently use [min, max) interval.	2020-05-01 09:18:26 -07:00
pengwa	98b97be635	collect the last few iteration latency for throuput calculation (#3766 )	2020-05-01 13:24:17 +08:00
liqunfu	af3988198c	Liqun/e2e transformer test (#3540 ) * initial change to transformer.py * prepare e2e transformer tests * refactor transformer tests * put test python files in a flat folder * fix typo pip install transform(s) * python 3.6 * python version to 3.6 in install_ubuntu.sh * remove argparser * to use opset ver 12 * workaround loss_scale naming patch in case of loss_fn_ * assign self.loss_fn_ so it can be checked * skip a few un-needed post-process steps * fix loss_scale_input_name, clean up post process steps * skip non-frontend tests * move cpu/cuda related files to coresponding cpu/cuda folder (#3668) Co-authored-by: Weixing Zhang <wezhan@microsoft.com> * type cast for ratio is not necessary for dropout (#3682) Co-authored-by: Weixing Zhang <wezhan@microsoft.com> * thrustallocator is not needed since cub is used directly for gather now. (#3683) Co-authored-by: Weixing Zhang <wezhan@microsoft.com> * GatherND-12 Implementation (#3645) * Renamed, UT passing * Move GatherND CUDA Kerenl into onnxruntime * Merge GatherNDOpTest * Refactor Test code * Merge CPU Kernel Impl * Handle Negative Indice, Fix UT * Improve CUDA kernel to handle negative index * Minor Fixes * Preserve GatherND-1 Cuda kernel * Fix Mac build * fix UT * Fix Build * fix GatherNDOpTest.double > CUDA error cudaErrorInvalidDeviceFunction:invalid device function Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Peng Wang (pengwa) <pengwa@microsoft.com> * update with reviewers' comments * testBertTrainingGradientAccumulation was not using rtol and may fail occasionally with small (e-06) difference * fix merge mistakes Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Weixing Zhang <weixingzhang@users.noreply.github.com> Co-authored-by: Weixing Zhang <wezhan@microsoft.com> Co-authored-by: Sherlock <baihan.huang@gmail.com> Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Peng Wang (pengwa) <pengwa@microsoft.com>	2020-04-30 12:26:38 -07:00
M. Zeeshan Siddiqui	b9a5ed1fe2	Add SoftmaxCrossEntropyLoss to mixed-precision-transformer. (#3760 )	2020-04-30 02:48:21 -07:00
pengwa	0531acccc5	Refine GatherND CPU/CUDA Kernels & Add UTs (#3688 ) * Refactor GatherND CPU Kernel (Renaming & Simplify) * Add batch_dim=1 or 2, negative slices tests * Rename gather_nd_gard_impl.cu * Use dispatcher to refactor CUDA GatherND/GatherNDGrad * Change GatherNDBase::CommonComputeKernel --> GatherNDBase::PrepareCompute * Use HasCudaEnvironment instead of __CUDA_ARCH__ for some double type tests	2020-04-30 10:17:54 +08:00
ashbhandare	58f53966d3	Add Distributed Checkpointing support (#3639 ) * Change naming of moments to Moment_x_<weight_name> * Add checkpointing code and zero checkpoint aggregation * Correct aggregation for LAMB, cleanup * Add simple checkpointing test * Add test for zero checkpoint aggregation * Fix tests * fix test * Review changes * Fix test after review comment fix * Fix API, test * Fix test after API change * Decouple save load from ORTTrainer * Add flag to not break checkpointing with ORTModel' Co-authored-by: aishwarya bhandare <aibhanda@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-04-29 14:52:21 -07:00
suffiank	ea0e2d1dde	fix warning treated as error due to ignoring return status (#3739 ) Co-authored-by: suffian khan <sukha@microsoft.com>	2020-04-29 02:38:53 -07:00
Tixxx	0638565fe0	Fix evaluation issues (#3538 ) * allow switching between eval and training modes dynamically Co-authored-by: Tixxx <root@525204a066204ea794f942530b05ae7f000000.axlncovkyjne5caro2tmz3zryb.xx.internal.cloudapp.net>	2020-04-28 21:03:37 -07:00
M. Zeeshan Siddiqui	939589c265	Fix flaky test and avoid divide by zero in SoftmaxCrossEntropyLoss-CPU. (#3734 ) * Fix flaky test and avoid divide by zero in SoftmaxCrossEntropyLoss-CPU. * fix gather test? * PR feedback.	2020-04-28 19:35:14 -07:00
edgchen1	1bcfd49918	Merge pull request #3731 from microsoft/ettao/ort-2-master Merge from ort_training to master	2020-04-28 07:56:05 -07:00
ytaous	75c24a5fac	Revert "Merge from ort_training to master (#3719 )" (#3726 ) This reverts commit `b990ba0059`.	2020-04-27 20:42:43 -07:00
ytaous	b990ba0059	Merge from ort_training to master (#3719 ) * move cpu/cuda related files to coresponding cpu/cuda folder (#3668) Co-authored-by: Weixing Zhang <wezhan@microsoft.com> * type cast for ratio is not necessary for dropout (#3682) Co-authored-by: Weixing Zhang <wezhan@microsoft.com> * thrustallocator is not needed since cub is used directly for gather now. (#3683) Co-authored-by: Weixing Zhang <wezhan@microsoft.com> * GatherND-12 Implementation (#3645) * Renamed, UT passing * Move GatherND CUDA Kerenl into onnxruntime * Merge GatherNDOpTest * Refactor Test code * Merge CPU Kernel Impl * Handle Negative Indice, Fix UT * Improve CUDA kernel to handle negative index * Minor Fixes * Preserve GatherND-1 Cuda kernel * Fix Mac build * fix UT * Fix Build * fix GatherNDOpTest.double > CUDA error cudaErrorInvalidDeviceFunction:invalid device function Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Peng Wang (pengwa) <pengwa@microsoft.com> * Set gradient as output only for easy mode (#3694) * Support GPU Event Operators (#3653) * Add GPU event operators to support in-place updates in gradient accumulator and optimizer for modifying the tensors passing through those event operators. * Address comment and polish code * Merge shared code between CPU and GPU kernels * Move event test to a new file * Address comments * Update onnxruntime/core/providers/cuda/gpu_data_transfer.cc * fix path of cpu_featurizers_kernels.cc and cpu_featurizers_kernels.h Co-authored-by: Weixing Zhang <weixingzhang@users.noreply.github.com> Co-authored-by: Weixing Zhang <wezhan@microsoft.com> Co-authored-by: Sherlock <baihan.huang@gmail.com> Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Peng Wang (pengwa) <pengwa@microsoft.com> Co-authored-by: ashbhandare <ash.bhandare@gmail.com> Co-authored-by: Wei-Sheng Chin <wschin@outlook.com> Co-authored-by: Ethan Tao <ettao@microsoft.com>	2020-04-27 16:45:21 -07:00
Sherlock	635bc9cd04	Fix graph transformers to support opset 12 ops (#3715 )	2020-04-27 11:53:45 -07:00
Ethan Tao	0516e7d22e	Merge branch 'ort_public_ort_training' into ettao/ort-2-master	2020-04-27 18:17:17 +00:00
Wei-Sheng Chin	72b38f0a8b	Support GPU Event Operators (#3653 ) * Add GPU event operators to support in-place updates in gradient accumulator and optimizer for modifying the tensors passing through those event operators. * Address comment and polish code * Merge shared code between CPU and GPU kernels * Move event test to a new file * Address comments * Update onnxruntime/core/providers/cuda/gpu_data_transfer.cc	2020-04-24 17:43:04 -07:00
edgchen1	8b5d6fbaf5	Remove internal work item links. (#3698 )	2020-04-24 15:38:30 -07:00
ashbhandare	d06763ac1c	Set gradient as output only for easy mode (#3694 )	2020-04-24 15:28:28 -07:00
Sherlock	b4d4ea2e5f	GatherND-12 Implementation (#3645 ) * Renamed, UT passing * Move GatherND CUDA Kerenl into onnxruntime * Merge GatherNDOpTest * Refactor Test code * Merge CPU Kernel Impl * Handle Negative Indice, Fix UT * Improve CUDA kernel to handle negative index * Minor Fixes * Preserve GatherND-1 Cuda kernel * Fix Mac build * fix UT * Fix Build * fix GatherNDOpTest.double > CUDA error cudaErrorInvalidDeviceFunction:invalid device function Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Peng Wang (pengwa) <pengwa@microsoft.com>	2020-04-24 20:55:30 +08:00
Weixing Zhang	2f8a17dcde	thrustallocator is not needed since cub is used directly for gather now. (#3683 ) Co-authored-by: Weixing Zhang <wezhan@microsoft.com>	2020-04-24 01:51:54 -07:00
Weixing Zhang	c929963d74	type cast for ratio is not necessary for dropout (#3682 ) Co-authored-by: Weixing Zhang <wezhan@microsoft.com>	2020-04-24 00:49:37 -07:00
Weixing Zhang	f4a04c04e1	move cpu/cuda related files to coresponding cpu/cuda folder (#3668 ) Co-authored-by: Weixing Zhang <wezhan@microsoft.com>	2020-04-24 00:12:02 -07:00
Weixing Zhang	336624806e	Simplify and clean code (#3655 ) 1. It is not necessary to include cudnn_common.h for kernels which are not implemented with CUDNN. 2. Minor change in layer norm kernel to simplify the code and resolve building warning. Co-authored-by: Weixing Zhang <wezhan@microsoft.com>	2020-04-23 10:12:55 -07:00
XiaocenDong	125f68f305	fixed mnist bug (#3569 ) * fixed mnist bug * fixed train_step param	2020-04-23 23:22:38 +08:00
Xueyun Zhu	f1ba9aaf34	Add pipeline transformer for wait/record node (#3513 ) * pipeline transformer * clean up * address feedback * add record/wait for first stage and updated split script * address feedback * make recv/send signal as initializer * merge * address feedback * unify input and initializer * address feedback and bug fix * minor fix * windows build * fix	2020-04-22 23:28:01 -07:00
pengwa	6136fd0789	GatherElementsGrad Kernels (#3627 ) * GatherElementsGrad cuda kernel & tests * Fix comments * Fix include path	2020-04-23 14:02:34 +08:00
Vincent Wang	ffe19ae49b	Expand elimination and Expand gradient. (#3610 ) * Expand elmination and Expand gradient. * Resolve comments. * Fix test break. * Check if graph can remove the node. * Resolve comment. Co-authored-by: Vincent Wang <weicwang@microsoft.com>	2020-04-23 13:17:15 +08:00
Tang, Cheng	37f4f74308	expose training session so the training app could register custom kernel and transformers (#3642 ) Co-authored-by: Cheng Tang <chenta@microsoft.com>	2020-04-22 21:35:41 -07:00
suffiank	0e12d05cd2	fixes for ort_trainer.py to resume from checkpoint (#3510 ) * fixes for ort_trainer.py to resume from checkpoint * define self.state_dict_ during init * add comment of explanation * add unit test for restore from checkpoint * fix file not found Co-authored-by: suffian khan <sukha@microsoft.com>	2020-04-22 16:33:58 -07:00
Weixing Zhang	e4fc83252d	Refactoring code related to WARP_SIZE. (#3623 ) 1. Centralize its definition in common.cuh. 2. Rename it to GPU_WARP_SIZE which can be extended to AMD GPU later. 3. Centralize warp shuffle functions. Co-authored-by: Weixing Zhang <wezhan@microsoft.com>	2020-04-22 15:19:06 -07:00
edgchen1	bb9b0ba5b3	Merge pull request #3607 from microsoft/edgchen1/merge_from_master Merge from master to ort_training	2020-04-22 13:22:32 -07:00
Wei-Sheng Chin	ab70625b29	Add Lamb shape inference (#3634 )	2020-04-22 11:32:28 -07:00
Edward Chen	8d09cefafc	Merge remote-tracking branch 'origin/ort_training' into edgchen1/merge_from_master	2020-04-22 16:56:15 +00:00
edgchen1	b518cb2a7a	Clean up OPTIONAL name conflict workarounds in ort_training. (#3622 ) * Clean up OPTIONAL name conflict workarounds. * Cleanup unnecessory header files onnx_protobuf.h Co-authored-by: Sherlock Huang	2020-04-22 09:07:55 -07:00
Vincent Wang	d3a2ac5c5c	Eliminate Useless Cast during Transformer. (#3606 ) * Remove Useless Cast during Transformer. * Resolve comments. * Check if graph can remove the node. Co-authored-by: Vincent Wang <weicwang@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-04-22 16:36:46 +08:00
Sherlock	d66d5bb86a	Update Optimizer Domain and Opset (#3602 ) * Update Domain and Opset for SGD * Update Adam Domain and Opset * Update Lamb Domain and Opset	2020-04-21 15:06:02 -07:00
Edward Chen	2e4b9b1d0e	Disable CudaKernelTest.SoftmaxCrossEntropyLoss_LargeSizeTensor because it's flaky.	2020-04-21 20:30:45 +00:00
Edward Chen	d50c3e7a71	Fix GraphTransformationTests tests.	2020-04-21 18:43:49 +00:00
Edward Chen	daa14b64e3	Merge remote-tracking branch 'origin/master' into edgchen1/merge_from_master	2020-04-21 03:31:32 +00:00
liqunfu	781e1c36be	Add front-end MNIST test (#3231 ) * add frontend minst test * to use torch nightly with torchvision * remove incorrect comment per reviewer's comment * experiment torchvision import failure * experiment install_deps.sh * more experiment install_deps.sh * experiment install_deps.sh with --upgrade * Experiment with install_deps.sh. * Experiment with install_ubuntu.sh. * Use Ubuntu 18.04 and Python 3.6 for CI. * Update cmake version for CI. * Install MPI on Ubuntu 18.04 for CI. * Increase tolerance for MNIST test. * Go back to Ubuntu 16.04 for CI, fix installing from deadsnakes ppa. * Clean-up. * Update ort_trainer.py from ort_training. * Get default Ubuntu Python ver back to 3.5. * Add underscore to opset_version parameter name in ORTTrainer constructor. * Move loss/model wrap before the call for sample output. * Update expected values for MNIST test. Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Sergii Dymchenko <sedymche@microsoft.com>	2020-04-20 11:19:31 -07:00
edgchen1	811bd67872	Clean up docs. (#3579 ) * Fix orttraining/README.md formatting. * Delete ORT_TRAINING_BUILDS.md. * Fix typo.	2020-04-17 22:13:11 -07:00
edgchen1	2cb8cb816f	Disable or update flaky tests, improve test random seed accessibility. (#3495 ) - Add output of test random seed - Allow setting of test random seed with environment variable - Disable / relax tolerance for flaky tests	2020-04-17 15:57:32 -07:00

99 commits