Ensure that Loop operators run on CPU. Fix memcpy for Sequence Tensors, so that empty sequences (like when SequenceEmpty runs on DirectML) can be copied back to CPU.