onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-30 03:37:44 +00:00

Author	SHA1	Message	Date
liqunfu	0bff55512e	updated expected values for frontend test to pass frontend e2e pipeline. raise tolerance to reduce future risk of failure (#4497 ) * updated expected values for frontend test, raise tol	2020-07-13 19:25:54 -07:00
edgchen1	c71c49aaa0	Make TArray safer to use and update method name for consistency. (#4483 ) - make size_ and data_ data members private - rename GetCapacity() to Capacity() to be consistent (e.g., with Size()) - add static_assert for trivially copyable T because it is copied with memcpy	2020-07-13 09:59:56 -07:00
Vincent Wang	7fb194d03d	Update convergence baseline for ci_test. (#4465 ) Co-authored-by: Vincent Wang <weicwang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-07-09 15:29:36 +08:00
Tixxx	b156ae4448	Support training_mode flag in eval (#4324 ) * add training_mode feed for evaluation to support opset12	2020-07-08 10:38:54 -07:00
Hariharan Seshadri	6d6b6b54a5	Support binding a graph output to a specific device via the Python binding (#4439 )	2020-07-07 21:09:37 -07:00
Ashwini Khade	dd73e8c016	add function initialization back to graph resolve (#4434 )	2020-07-06 15:17:27 -07:00
liqunfu	0fdb1e9f60	Liqun/roberta (#4408 ) add GLUE Roberta example, fix unused initializer issue at backend. Bert GLUE expected out updated due to graph changes between June29 to July1st	2020-07-06 09:19:30 -07:00
pengwa	8bcdefc9c1	Optimize GatherND (#4097 ) * Optimize GatherND * Refine the code, Fix few comments	2020-07-03 19:42:32 +08:00
Weixing Zhang	bd11ab6816	Optimize LayernormGrad (#4156 ) * Draft for LayerNorm Optimization * Modify LayernormGrad kernel based on new backward graph. * keep two LayernormGrad implementations. One is implemented based on input X, mean. The other is based on output Y, scale, bias. The first one is enabled by default. The second one can be enabled by --use_invertible_layernorm_grad * expose use_invertible_layernorm_grad to frontend. * add fp16 tests. Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Weixing Zhang <wezhan@microsoft.com>	2020-07-02 22:09:30 -07:00
edgchen1	dba22b17b4	Update BiasGeluGradDxKernel and tests. (#4400 ) For BiasGeluGradDxKernel: - Implement optimization to first load from global memory into registers as suggested by Weixing. - Support larger bias sizes which were previously limited by the number of threads per block. - Address flaky unit test by increasing the error tolerance to the default value.	2020-07-02 18:55:44 -07:00
Vincent Wang	28e4c0edf5	Keep loss_scale and Whole Loss Subgraph in FP32 during Mixed Precision Training (#4268 ) * Keep loss subgraph as FP32 when mixed-p training. * Fix case where there is no white-list loss op. * Get nodes from loss_scale instead of whitelist. * rename const variables. Co-authored-by: Vincent Wang <weicwang@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-07-03 06:54:56 +08:00
Sherlock	2d54c89d77	Update filename and Cleanup unused cudnn kernels (#4387 ) * Update filename and Cleanup unused cudnn kernels * Cleanup unnecessary dependency	2020-07-01 17:19:49 -07:00
Bowen Bao	7ec9a73202	deprecate frontend layernorm postpass (#4372 )	2020-07-01 13:06:03 -07:00
liqunfu	5dcb9b4858	Liqun/backprop deterministic graph (#4315 ) make gradient graph deterministic add to session option use_deterministic_compute.	2020-07-01 12:39:10 -07:00
Sherlock	6365760906	BiasDropoutFusion (#4167 ) * Implement BiasDropout Fusion and Kernel Dropout kernel for residual input BiasDropout Fusion to take residual input Fix BiasDropout Kernel Optimize DropoutGrad with 4 elements per thread * Add graph transformer UT * MLTypeCallDispatcher for RatioData * Use MLTypeDispatcher for ratio tensor * Handle training_mode input for BiasDropout fusion * Add test case for missing ratio input * Replace using FinalizeNodeFusion * Make BiasDropout kernel template-less * Make DropoutGrad template-less * Make Dropout and TrainableDropout template-less * Regenerate onnx file for UT * Minor fix on divmod in BiasDropoutKernel * Adjust pt frontend test due to dropout randomness * Make dropout kernel operation in fp32 Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-06-30 15:43:14 -07:00
Ashwini Khade	0404763f23	Update function body initialization for ONNX functions (#4332 ) * Update function body initialization * minor fix * changes per review comments * minor fix * format fix * add function initialization in mixed precision transformer * more updates * more fixes	2020-06-30 14:30:59 -07:00
ytaous	4380b8ba68	adjust bs size (#4375 ) Co-authored-by: Ethan Tao <ettao@microsoft.com>	2020-06-30 10:29:48 -07:00
Scott McKay	274e6b4153	Cleanup SessionState. Move allocator lookup to SessionState. (#4194 ) * Move allocators to SessionState so they're decoupled from ExecutionProviders - when looking up an allocator it's based on OrtMemoryInfo not the EP so SessionState is a more natural place for that information to be stored - add device based lookup - simplifies logic for copying feeds/fetches across devices Cleanup SessionState and SessionStateInitializer - provide more things to SessionState at construction time so we don't construct an instance and immediately after call a bunch of setters - simplify SessionStateInitializer - reduced down to FinalizeSessionState method	2020-06-28 14:55:42 +10:00
liqunfu	c3c4ce5ceb	refactor prototypes into headers (#4337 ) * refactor prototypes into headers	2020-06-26 12:02:14 -07:00
edgchen1	0b450dcd9f	Enable BiasGelu fusion for training (#4146 ) Add gradient for BiasGelu and FastGelu with bias. Enable BiasGeluFusion and GeluApproximation transformers in TrainingSession.	2020-06-25 17:48:12 -07:00
edgchen1	a6d10376df	Fix build error when USE_NCCL is defined. (#4334 )	2020-06-24 23:32:31 -07:00
Tim Harris	a241eb0bbe	Renaming --partition_optimizer to --deepspeed_zero_stage (#4312 ) * Rename partition_optimizer -> deepspeed_zero * Use ZeROConfig in orttraining_pybind_state.cc * deepspeed_zero -> deepspeed_zero_stage for clarity * Expose as deepspeed_zero_stage in pybind	2020-06-24 22:05:03 +01:00
Tim Harris	5c6a27408a	Remove signed/unsigned compiler warnings, add additional pipeline test case (#4314 ) * Avoid signed/unsigned warning on loops * Report sizes when distributed world configuration is inconsistent * Add DistributedRunContextTest for pipeline stage configuration	2020-06-24 11:36:18 +01:00
Vincent Wang	f26c149d7d	Set NonZero Output Shape for Gradient Building. (#4246 ) * Set NonZero output shape for gradient building. * Resolve comments. Co-authored-by: Vincent Wang <weicwang@AiFramework2080ti2.corp.microsoft.com>	2020-06-24 13:43:22 +08:00
Vincent Wang	3374733783	Refactor ReduceMean/Sum Gradient without Shape Dependency. (#4261 ) * ReduceMean/Sum gradient without shape dependency. * optimize expand and use it to replace add. * Adjust test. Co-authored-by: Vincent Wang <weicwang@microsoft.com>	2020-06-24 11:36:53 +08:00
Bowen Bao	15cb4b3023	Fix session load state & run extra_postpasses only once (#4255 ) * Fix session load state & run extra_postpasses only once * add testcase for onnx model as well	2020-06-23 11:45:26 -07:00
Vincent Wang	b41fcf1570	Bugfix for shape inference and GetShape. (#4243 ) Co-authored-by: Vincent Wang <weicwang@microsoft.com>	2020-06-17 15:11:02 +08:00
Wei-Sheng Chin	189fb60ef9	Fix a bug and add code to profile memory (#4241 ) * Fix a bug and add code to profile memory 1. Compile Send/Recv again (currently broken because of HOROVOD refactor). 2. Add code to print out initializer allocation size and activation memory size. * Address comments * Split memory counts per locations * Fix a metric	2020-06-16 10:17:27 -07:00
edgchen1	63bf587623	Use azcopy to download test data (#4221 ) Use azcopy from download_e2e_test_data.py, add helper function for downloading azcopy. Update download_test_data.py to use helper function.	2020-06-16 10:14:34 -07:00
ytaous	5d28efd434	opset12 code cleanup (#4242 ) * opset12 code cleanup * opset12 code cleanup Co-authored-by: Ethan Tao <ettao@microsoft.com>	2020-06-15 19:45:35 -07:00
ytaous	e0334f177c	Opset12 upgrade for existing models used by perf/e2e pipelines (#4238 ) * opset12 support * opset12 support * on comments Co-authored-by: Ethan Tao <ettao@microsoft.com>	2020-06-15 14:26:53 -07:00
Bowen Bao	b08771f00e	Add ONNX Training Post-Passes to Front-End - Cont (#4041 ) * Add ONNX postpasses * add flag + add bert test from onnx file * address PR comments * fix typo * fix rebase * address comments * Fix test failures * add new pass for expand for new pt version, add comments * fix rebase Co-authored-by: lahaidar <lahaidar@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-06-15 10:33:26 -07:00
Weixing Zhang	b4b1c6440a	Enable ORT with CUDA 11 toolkit (#4168 ) * ORT on CUDA 11 1. Separate HOROVOD and MPI 2. Separate NCCL from HOROVOD in CMakeLists.txt 2. Remove dependency on external cub 3. cudnnSetRNNDescriptor is changed in cuDNN 8.0 * polish the code about MPI/NCCL in CMakeLists.txt and build.py * check CUDA version * ${MPI_INCLUDE_DIRS} should be PUBLIC * sm30, sm50 are deprecated in CUDA 11 Toolkit * update change based on code review feedback. * add sm_52 * improve MPI/NCCL build path Co-authored-by: Weixing Zhang <wezhan@microsoft.com>	2020-06-15 08:47:03 -07:00
Wei-Sheng Chin	ecc901717e	Use subset to release gradient tensors earlier (#4222 )	2020-06-14 22:52:54 -07:00
Wei-Sheng Chin	de9da123cf	Enable static memory planning for pipeline. (#4204 ) * Enable static memory planning for pipeline. 1. We fix a bug when resolving symbolic shape for scalars. 2. We pass the original inputs to all pipeline stages so that the symbolic shapes can be resolved. * Further Improvements 1. Address comments. 2. Further reduce activation size by ~50% when pipeline is on. This is done by removing all but one gradient tensor from the last RecordEvent in the backward pass. * Address a comment * Fix Windows build	2020-06-12 21:43:50 -07:00
Edward Chen	6b4f652017	Clean up status checks in gradient_graph_builder_test.cc.	2020-06-12 14:28:39 -07:00
Edward Chen	7096e6f5ef	Reduce severity of GraphAugmenter logging statement.	2020-06-12 14:28:39 -07:00
pengwa	e6ccb1ac28	GatherNDGrad for CPU (#4123 ) * GatherNDGrad on CPU * Remove __CUDA_ARCH__ check in .cc files	2020-06-12 02:43:49 +08:00
Xueyun Zhu	65a682354b	enable pipeline to run with mixed precision (#4113 ) * enable pipeline to run with mixed precision * address feedback * address feedback * test log * pipe information if test fails * ci failure	2020-06-10 22:16:24 -07:00
suffiank	7f5339505e	Discover trainable parameters using reverse DFS from loss node (#4116 ) Discover trainable parameters using reverse DFS from loss node, omitting recursion along untrainable inputs. Co-authored-by: suffian khan <sukha@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: suffian khan <sukha@microsoft.com>	2020-06-08 14:16:10 -07:00
Sergii Dymchenko	653417ae4b	Fix scaler->scalar typo. (#4142 )	2020-06-08 13:02:12 -07:00
Dmitri Smirnov	4e1dac67cd	Address memory leak and improve memory handling (#4124 ) Fix memory leak when a Python list passed as a feed. Create a custom allocator that can take ownership of python arrays that are created inside pybind. Allow direct memory use if continuous array is a copy because we now can take ownership of it by the allocator.	2020-06-08 09:29:46 -07:00
liqunfu	ffed43e9b8	handle loss and name marching wrappers (#4066 ) * handle loss and name marching wrappers Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-06-05 23:34:26 -07:00
Bowen Bao	1e5307d458	Bug fix for parameter names of models not using wrapper (#4061 ) * bug fix for models not using wrapper * add test case for no wrapper case * update test case to use internal learning rate * fix bug with frozen weight update	2020-06-05 12:03:38 -07:00
Thiago Crepaldi	81101c9efd	Fix DropoutGrad op (#4052 ) Dropout op was recently changed to accept a new input named 'training_mode', which is passed in to DropoutGrad automatically. This PR updates the DropoutGrad schema to accommodate the new input. Tests were also updated to reflect the API change. Co-authored-by: Thiago Crepaldi <thiag.crepaldi@microsoft.com>	2020-06-04 15:00:02 -07:00
liqunfu	905c535626	still need to make the test stable. Lower the acc number a bit to make the test pass for now (#4117 ) Co-authored-by: liqun fu <liqun@OrtTrainingDev1.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-06-02 21:37:48 -07:00
ashbhandare	f18a99b245	Exclude non-trainable torch buffers from trainable weights (#4099 ) * Initial changes * Removed redundant fix * Revert unintended formatting change. * Add unit test	2020-06-02 14:05:44 -07:00
edgchen1	ba74914c5a	Remove evaluation output from training e2e test baseline data. (#4092 )	2020-06-01 15:06:21 -07:00
ytaous	72d508b7a0	New perf metric - e2e throughput (#4085 ) * new metric * on comments * tab to spaces Co-authored-by: Ethan Tao <ettao@microsoft.com>	2020-06-01 12:11:34 -07:00
Tixxx	6404aba5ae	Orttraining rc1 master merge (#4080 ) * fixed seg fault when using concrete shape disable gradient as output * fix evaluation hang issue for multiple gpu run * Remove dead code, ORTModel and improve docstrings (#3814) * Refine ORTTrainer docstring descriptions (#3907)	2020-05-29 12:28:12 -07:00


171 commits