onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-09 00:30:53 +00:00

Author	SHA1	Message	Date
Xueyun Zhu	e8e95110d3	add pipeline to distributed context config (#3789 ) * add pipeline to distributed context * white space	2020-05-01 13:49:51 -07:00
edgchen1	047975e404	Address flaky test ReduceApiTest.Sum. (#3716 ) Increase test comparison tolerance. Add output of random seed value for easier debugging later. Unify RandomValueGenerator::Uniform() to consistently use [min, max) interval.	2020-05-01 09:18:26 -07:00
pengwa	98b97be635	collect the last few iterations' latency for throughput calculation (#3766 )	2020-05-01 13:24:17 +08:00
liqunfu	af3988198c	Liqun/e2e transformer test (#3540 ) * initial change to transformer.py * prepare e2e transformer tests * refactor transformer tests * put test python files in a flat folder * fix typo pip install transform(s) * python 3.6 * python version to 3.6 in install_ubuntu.sh * remove argparser * to use opset ver 12 * workaround loss_scale naming patch in case of loss_fn_ * assign self.loss_fn_ so it can be checked * skip a few un-needed post-process steps * fix loss_scale_input_name, clean up post process steps * skip non-frontend tests * move cpu/cuda related files to coresponding cpu/cuda folder (#3668) Co-authored-by: Weixing Zhang <wezhan@microsoft.com> * type cast for ratio is not necessary for dropout (#3682) Co-authored-by: Weixing Zhang <wezhan@microsoft.com> * thrustallocator is not needed since cub is used directly for gather now. (#3683) Co-authored-by: Weixing Zhang <wezhan@microsoft.com> * GatherND-12 Implementation (#3645) * Renamed, UT passing * Move GatherND CUDA Kerenl into onnxruntime * Merge GatherNDOpTest * Refactor Test code * Merge CPU Kernel Impl * Handle Negative Indice, Fix UT * Improve CUDA kernel to handle negative index * Minor Fixes * Preserve GatherND-1 Cuda kernel * Fix Mac build * fix UT * Fix Build * fix GatherNDOpTest.double > CUDA error cudaErrorInvalidDeviceFunction:invalid device function Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Peng Wang (pengwa) <pengwa@microsoft.com> * update with reviewers' comments * testBertTrainingGradientAccumulation was not using rtol and may fail occasionally with small (e-06) difference * fix merge mistakes Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Weixing Zhang <weixingzhang@users.noreply.github.com> Co-authored-by: Weixing Zhang <wezhan@microsoft.com> Co-authored-by: Sherlock <baihan.huang@gmail.com> Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Peng Wang (pengwa) <pengwa@microsoft.com>	2020-04-30 12:26:38 -07:00
M. Zeeshan Siddiqui	b9a5ed1fe2	Add SoftmaxCrossEntropyLoss to mixed-precision-transformer. (#3760 )	2020-04-30 02:48:21 -07:00
pengwa	0531acccc5	Refine GatherND CPU/CUDA Kernels & Add UTs (#3688 ) * Refactor GatherND CPU Kernel (Renaming & Simplify) * Add batch_dim=1 or 2, negative slices tests * Rename gather_nd_gard_impl.cu * Use dispatcher to refactor CUDA GatherND/GatherNDGrad * Change GatherNDBase::CommonComputeKernel --> GatherNDBase::PrepareCompute * Use HasCudaEnvironment instead of __CUDA_ARCH__ for some double type tests	2020-04-30 10:17:54 +08:00
ashbhandare	58f53966d3	Add Distributed Checkpointing support (#3639 ) * Change naming of moments to Moment_x_<weight_name> * Add checkpointing code and zero checkpoint aggregation * Correct aggregation for LAMB, cleanup * Add simple checkpointing test * Add test for zero checkpoint aggregation * Fix tests * fix test * Review changes * Fix test after review comment fix * Fix API, test * Fix test after API change * Decouple save load from ORTTrainer * Add flag to not break checkpointing with ORTModel' Co-authored-by: aishwarya bhandare <aibhanda@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-04-29 14:52:21 -07:00
suffiank	ea0e2d1dde	fix warning treated as error due to ignoring return status (#3739 ) Co-authored-by: suffian khan <sukha@microsoft.com>	2020-04-29 02:38:53 -07:00
Tixxx	0638565fe0	Fix evaluation issues (#3538 ) * allow switching between eval and training modes dynamically Co-authored-by: Tixxx <root@525204a066204ea794f942530b05ae7f000000.axlncovkyjne5caro2tmz3zryb.xx.internal.cloudapp.net>	2020-04-28 21:03:37 -07:00
M. Zeeshan Siddiqui	939589c265	Fix flaky test and avoid divide by zero in SoftmaxCrossEntropyLoss-CPU. (#3734 ) * Fix flaky test and avoid divide by zero in SoftmaxCrossEntropyLoss-CPU. * fix gather test? * PR feedback.	2020-04-28 19:35:14 -07:00
edgchen1	1bcfd49918	Merge pull request #3731 from microsoft/ettao/ort-2-master Merge from ort_training to master	2020-04-28 07:56:05 -07:00
ytaous	75c24a5fac	Revert "Merge from ort_training to master (#3719 )" (#3726 ) This reverts commit `b990ba0059`.	2020-04-27 20:42:43 -07:00
ytaous	b990ba0059	Merge from ort_training to master (#3719 ) * move cpu/cuda related files to coresponding cpu/cuda folder (#3668) Co-authored-by: Weixing Zhang <wezhan@microsoft.com> * type cast for ratio is not necessary for dropout (#3682) Co-authored-by: Weixing Zhang <wezhan@microsoft.com> * thrustallocator is not needed since cub is used directly for gather now. (#3683) Co-authored-by: Weixing Zhang <wezhan@microsoft.com> * GatherND-12 Implementation (#3645) * Renamed, UT passing * Move GatherND CUDA Kerenl into onnxruntime * Merge GatherNDOpTest * Refactor Test code * Merge CPU Kernel Impl * Handle Negative Indice, Fix UT * Improve CUDA kernel to handle negative index * Minor Fixes * Preserve GatherND-1 Cuda kernel * Fix Mac build * fix UT * Fix Build * fix GatherNDOpTest.double > CUDA error cudaErrorInvalidDeviceFunction:invalid device function Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Peng Wang (pengwa) <pengwa@microsoft.com> * Set gradient as output only for easy mode (#3694) * Support GPU Event Operators (#3653) * Add GPU event operators to support in-place updates in gradient accumulator and optimizer for modifying the tensors passing through those event operators. * Address comment and polish code * Merge shared code between CPU and GPU kernels * Move event test to a new file * Address comments * Update onnxruntime/core/providers/cuda/gpu_data_transfer.cc * fix path of cpu_featurizers_kernels.cc and cpu_featurizers_kernels.h Co-authored-by: Weixing Zhang <weixingzhang@users.noreply.github.com> Co-authored-by: Weixing Zhang <wezhan@microsoft.com> Co-authored-by: Sherlock <baihan.huang@gmail.com> Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Peng Wang (pengwa) <pengwa@microsoft.com> Co-authored-by: ashbhandare <ash.bhandare@gmail.com> Co-authored-by: Wei-Sheng Chin <wschin@outlook.com> Co-authored-by: Ethan Tao <ettao@microsoft.com>	2020-04-27 16:45:21 -07:00
Sherlock	635bc9cd04	Fix graph transformers to support opset 12 ops (#3715 )	2020-04-27 11:53:45 -07:00
Ethan Tao	0516e7d22e	Merge branch 'ort_public_ort_training' into ettao/ort-2-master	2020-04-27 18:17:17 +00:00
Wei-Sheng Chin	72b38f0a8b	Support GPU Event Operators (#3653 ) * Add GPU event operators to support in-place updates in gradient accumulator and optimizer for modifying the tensors passing through those event operators. * Address comment and polish code * Merge shared code between CPU and GPU kernels * Move event test to a new file * Address comments * Update onnxruntime/core/providers/cuda/gpu_data_transfer.cc	2020-04-24 17:43:04 -07:00
edgchen1	8b5d6fbaf5	Remove internal work item links. (#3698 )	2020-04-24 15:38:30 -07:00
ashbhandare	d06763ac1c	Set gradient as output only for easy mode (#3694 )	2020-04-24 15:28:28 -07:00
Sherlock	b4d4ea2e5f	GatherND-12 Implementation (#3645 ) * Renamed, UT passing * Move GatherND CUDA Kerenl into onnxruntime * Merge GatherNDOpTest * Refactor Test code * Merge CPU Kernel Impl * Handle Negative Indice, Fix UT * Improve CUDA kernel to handle negative index * Minor Fixes * Preserve GatherND-1 Cuda kernel * Fix Mac build * fix UT * Fix Build * fix GatherNDOpTest.double > CUDA error cudaErrorInvalidDeviceFunction:invalid device function Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Peng Wang (pengwa) <pengwa@microsoft.com>	2020-04-24 20:55:30 +08:00
Weixing Zhang	2f8a17dcde	thrustallocator is not needed since cub is used directly for gather now. (#3683 ) Co-authored-by: Weixing Zhang <wezhan@microsoft.com>	2020-04-24 01:51:54 -07:00
Weixing Zhang	c929963d74	type cast for ratio is not necessary for dropout (#3682 ) Co-authored-by: Weixing Zhang <wezhan@microsoft.com>	2020-04-24 00:49:37 -07:00
Weixing Zhang	f4a04c04e1	move cpu/cuda related files to coresponding cpu/cuda folder (#3668 ) Co-authored-by: Weixing Zhang <wezhan@microsoft.com>	2020-04-24 00:12:02 -07:00
Weixing Zhang	336624806e	Simplify and clean code (#3655 ) 1. It is not necessary to include cudnn_common.h for kernels which are not implemented with CUDNN. 2. Minor change in layer norm kernel to simplify the code and resolve building warning. Co-authored-by: Weixing Zhang <wezhan@microsoft.com>	2020-04-23 10:12:55 -07:00
XiaocenDong	125f68f305	fixed mnist bug (#3569 ) * fixed mnist bug * fixed train_step param	2020-04-23 23:22:38 +08:00
Xueyun Zhu	f1ba9aaf34	Add pipeline transformer for wait/record node (#3513 ) * pipeline transformer * clean up * address feedback * add record/wait for first stage and updated split script * address feedback * make recv/send signal as initializer * merge * address feedback * unify input and initializer * address feedback and bug fix * minor fix * windows build * fix	2020-04-22 23:28:01 -07:00
pengwa	6136fd0789	GatherElementsGrad Kernels (#3627 ) * GatherElementsGrad cuda kernel & tests * Fix comments * Fix include path	2020-04-23 14:02:34 +08:00
Vincent Wang	ffe19ae49b	Expand elimination and Expand gradient. (#3610 ) * Expand elimination and Expand gradient. * Resolve comments. * Fix test break. * Check if graph can remove the node. * Resolve comment. Co-authored-by: Vincent Wang <weicwang@microsoft.com>	2020-04-23 13:17:15 +08:00
Tang, Cheng	37f4f74308	expose training session so the training app could register custom kernel and transformers (#3642 ) Co-authored-by: Cheng Tang <chenta@microsoft.com>	2020-04-22 21:35:41 -07:00
suffiank	0e12d05cd2	fixes for ort_trainer.py to resume from checkpoint (#3510 ) * fixes for ort_trainer.py to resume from checkpoint * define self.state_dict_ during init * add comment of explanation * add unit test for restore from checkpoint * fix file not found Co-authored-by: suffian khan <sukha@microsoft.com>	2020-04-22 16:33:58 -07:00
Weixing Zhang	e4fc83252d	Refactoring code related to WARP_SIZE. (#3623 ) 1. Centralize its definition in common.cuh. 2. Rename it to GPU_WARP_SIZE which can be extended to AMD GPU later. 3. Centralize warp shuffle functions. Co-authored-by: Weixing Zhang <wezhan@microsoft.com>	2020-04-22 15:19:06 -07:00
edgchen1	bb9b0ba5b3	Merge pull request #3607 from microsoft/edgchen1/merge_from_master Merge from master to ort_training	2020-04-22 13:22:32 -07:00
Wei-Sheng Chin	ab70625b29	Add Lamb shape inference (#3634 )	2020-04-22 11:32:28 -07:00
Edward Chen	8d09cefafc	Merge remote-tracking branch 'origin/ort_training' into edgchen1/merge_from_master	2020-04-22 16:56:15 +00:00
edgchen1	b518cb2a7a	Clean up OPTIONAL name conflict workarounds in ort_training. (#3622 ) * Clean up OPTIONAL name conflict workarounds. * Cleanup unnecessary header files onnx_protobuf.h Co-authored-by: Sherlock Huang	2020-04-22 09:07:55 -07:00
Vincent Wang	d3a2ac5c5c	Eliminate Useless Cast during Transformer. (#3606 ) * Remove Useless Cast during Transformer. * Resolve comments. * Check if graph can remove the node. Co-authored-by: Vincent Wang <weicwang@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-04-22 16:36:46 +08:00
Sherlock	d66d5bb86a	Update Optimizer Domain and Opset (#3602 ) * Update Domain and Opset for SGD * Update Adam Domain and Opset * Update Lamb Domain and Opset	2020-04-21 15:06:02 -07:00
Edward Chen	2e4b9b1d0e	Disable CudaKernelTest.SoftmaxCrossEntropyLoss_LargeSizeTensor because it's flaky.	2020-04-21 20:30:45 +00:00
Edward Chen	d50c3e7a71	Fix GraphTransformationTests tests.	2020-04-21 18:43:49 +00:00
Edward Chen	daa14b64e3	Merge remote-tracking branch 'origin/master' into edgchen1/merge_from_master	2020-04-21 03:31:32 +00:00
liqunfu	781e1c36be	Add front-end MNIST test (#3231 ) * add frontend minst test * to use torch nightly with torchvision * remove incorrect comment per reviewer's comment * experiment torchvision import failure * experiment install_deps.sh * more experiment install_deps.sh * experiment install_deps.sh with --upgrade * Experiment with install_deps.sh. * Experiment with install_ubuntu.sh. * Use Ubuntu 18.04 and Python 3.6 for CI. * Update cmake version for CI. * Install MPI on Ubuntu 18.04 for CI. * Increase tolerance for MNIST test. * Go back to Ubuntu 16.04 for CI, fix installing from deadsnakes ppa. * Clean-up. * Update ort_trainer.py from ort_training. * Get default Ubuntu Python ver back to 3.5. * Add underscore to opset_version parameter name in ORTTrainer constructor. * Move loss/model wrap before the call for sample output. * Update expected values for MNIST test. Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Sergii Dymchenko <sedymche@microsoft.com>	2020-04-20 11:19:31 -07:00
edgchen1	811bd67872	Clean up docs. (#3579 ) * Fix orttraining/README.md formatting. * Delete ORT_TRAINING_BUILDS.md. * Fix typo.	2020-04-17 22:13:11 -07:00
edgchen1	2cb8cb816f	Disable or update flaky tests, improve test random seed accessibility. (#3495 ) - Add output of test random seed - Allow setting of test random seed with environment variable - Disable / relax tolerance for flaky tests	2020-04-17 15:57:32 -07:00
manashgoswami	9fc2b6482b	Ort training README (#3404 ) Added README for ORT Training	2020-04-16 14:51:33 -07:00
M. Zeeshan Siddiqui	6c1ccb659f	SoftmaxCrossEntropyLoss-12 forward and backward kernel implementation. (#3465 ) * Update ONNX submodule commit to the latest. * build break. * SoftmaxCrossEntropyLoss: Forward and backward kernel implementation. * Revert "build break." This reverts commit 847cb50d294efbe6c09fa760e7cacf25bfb6146d. * Add more tests and misc clean up. * revert unintended changes. * PR feedback. * cleanup. * PR feedback.	2020-04-16 12:27:07 -07:00
Jesse Benson	644bc05830	Add Python API to set random seed: onnxruntime.seed(<seed>)	2020-04-15 09:44:48 -07:00
pengwa	2c7c45076b	MaxBatchSize E2E Test (#3454 ) * max batch size e2e test * update test data snapshot	2020-04-15 09:50:44 +08:00
edgchen1	4fa88a0a23	Remove cast to OpKernelContextInternal to get threadpool and directly use OpKernelContext. (#3523 )	2020-04-14 14:30:26 -07:00
Tixxx	06b63975c0	Fix fp16 type mismatch when graph output is an fp32-only node (#3411 ) * verify output node before changing its type in mixed precision mode	2020-04-14 09:35:19 -07:00
edgchen1	ba7225f986	Update Graph SetInputs and SetOutputs for training (#3446 ) Fix training modification of Graph SetInputs() and SetOutputs(). Originally there were distinct code paths in Graph based on whether the graph was loaded from a GraphProto or created from scratch. The training modifications made that distinction a bit ambiguous - i.e., even though the Graph is loaded from a GraphProto for training, sometimes we rely on the other code path, e.g., to deduce the graph inputs after modifying it. Consequently, there was some odd behavior when using SetInputs(). For correctness, this change separates the cases where the graph is loaded from a GraphProto and where it is created from scratch.	2020-04-13 19:10:44 -07:00
M. Zeeshan Siddiqui	5d99f179b9	Merge pull request #3486 from microsoft/sedymche/merge_master_ort_training Merge from master into ort_training	2020-04-13 10:55:36 -07:00


91 commits