onnxruntime/orttraining
Suffian Khan 225439193e
Optimize Concat and Split on CUDA to eliminate host-to-device copies when sizes are all the same (#8833)
* special case concat and split when sizes are equal

* add tests for 16 and 32 inputs with same dim

* add tests for 16/64 inputs on concat or 16/64 outputs on split

* try to eliminate Windows warning

* outter => outer
2021-09-01 15:25:45 -07:00
orttraining Optimize Concat and Split on CUDA to eliminate host-to-device copies when sizes are all the same (#8833) 2021-09-01 15:25:45 -07:00
pytorch_frontend_examples Sync ORTModule branch with master and fix tests (#6526) 2021-02-02 08:59:56 -08:00
tools Add hugging-face models loss curve and performance guards to ROCm CI pipeline. (#8915) 2021-09-01 09:03:10 -07:00