onnxruntime/orttraining
Suffian Khan 6cb5d3ac09
Fix multi-tensor LAMB reduction to be deterministic (#6028)
* define ordering of reduction across blocks

* save state

* remove debug code

* remove debug code

* review comments

* significant correction for reduction only over blocks on same tensor

* addressing comments

* update rocm/lamb.cc to build as well

* remove times 2048*size in multitensor test until threshold error in rocm resolved

* convert tuple => struct as per recommendation

* update comment

* apply perfect forwarding for launch_multitensor to permit passing ref rather than pointer

* remove excess template arguments from rocm lamb.cc launch_multitensor as well

* fixes for AMD build

* pr comments

* run formatter from vscode

* formatter on cuda files
2020-12-11 13:13:05 -08:00
orttraining Fix multi-tensor LAMB reduction to be deterministic (#6028) 2020-12-11 13:13:05 -08:00
pytorch_frontend_examples Fix mnist example (#4926) 2020-08-26 15:28:39 -07:00
tools add dockerfile for ROCm3.10 and update BUILD.md for ROCm EP (#5821) 2020-12-08 23:14:56 -08:00