pengwa 003c7d3e4d
Add CPU allocation test for multiple GPU distributed run (#15829)
### Add CPU allocation test for non-CPU devices distributed run

When the CUDA EP is enabled in distributed training, CPU memory is still
used for some node outputs. We already had test coverage for distributed
runs, but it did not cover the case where some nodes store their output
tensors on CPU devices. As a result, as far as I recall, we hit a regression
twice in the past few months:
- https://github.com/microsoft/onnxruntime/pull/14050
- https://github.com/microsoft/onnxruntime/pull/15823

This test is added to prevent future regressions.

The test graph looks like this:

![image](https://user-images.githubusercontent.com/10530022/236594940-70c68a55-18bf-4e09-bbf5-8a64895d3045.png)

2023-05-09 10:27:19 +08:00
orttraining Add CPU allocation test for multiple GPU distributed run (#15829) 2023-05-09 10:27:19 +08:00
pytorch_frontend_examples Enable pylint and numpy rules (#15218) 2023-03-27 20:37:53 -07:00
tools Bump ruff in CI (#15533) 2023-04-17 10:11:44 -07:00