In gpt_j_residual (https://arxiv.org/pdf/2204.06745.pdf), two LN (LayerNorm) nodes share the same input. ORT runs the CSE (common subexpression elimination) graph optimization before LN fusion; CSE modifies the LN graph pattern, which causes LN fusion to fail.
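
The interaction can be illustrated with a toy graph. This is a minimal sketch, not ORT's implementation: the node tuples, the helper names (`layer_norm_nodes`, `toy_cse`, `consumer_counts`), and the particular decomposition of LN into primitive ops are all hypothetical. It also assumes, for illustration only, a fusion matcher that expects each intermediate LN value to be consumed only inside its own pattern.

```python
from collections import Counter


def layer_norm_nodes(prefix, x, gamma, beta):
    """Return a decomposed LayerNorm as (output, op, inputs) tuples (hypothetical layout)."""
    p = prefix
    return [
        (f"{p}_mean", "ReduceMean", (x,)),
        (f"{p}_diff", "Sub", (x, f"{p}_mean")),
        (f"{p}_sq", "Pow", (f"{p}_diff", "two")),
        (f"{p}_var", "ReduceMean", (f"{p}_sq",)),
        (f"{p}_var_eps", "Add", (f"{p}_var", "eps")),
        (f"{p}_std", "Sqrt", (f"{p}_var_eps",)),
        (f"{p}_norm", "Div", (f"{p}_diff", f"{p}_std")),
        (f"{p}_scaled", "Mul", (f"{p}_norm", gamma)),
        (f"{p}_out", "Add", (f"{p}_scaled", beta)),
    ]


def toy_cse(nodes):
    """Merge nodes that have the same op and the same (already renamed) inputs."""
    rename = {}  # eliminated output name -> surviving output name
    seen = {}    # (op, inputs) -> surviving output name
    kept = []
    for out, op, inputs in nodes:  # nodes are assumed to be in topological order
        inputs = tuple(rename.get(i, i) for i in inputs)
        key = (op, inputs)
        if key in seen:
            rename[out] = seen[key]
        else:
            seen[key] = out
            kept.append((out, op, inputs))
    return kept


def consumer_counts(nodes):
    """Count how many node inputs read each value."""
    return Counter(i for _, _, inputs in nodes for i in inputs)


# Two LN subgraphs that read the same input "x" but use different scale/bias.
graph = (layer_norm_nodes("ln1", "x", "gamma1", "beta1")
         + layer_norm_nodes("ln2", "x", "gamma2", "beta2"))

before = consumer_counts(graph)
after_graph = toy_cse(graph)
after = consumer_counts(after_graph)

# The shared prefix (ReduceMean ... Div) collapses into one copy, so the
# normalized value now feeds the Mul nodes of both LN subgraphs.
print(len(graph), len(after_graph))
print(before["ln1_norm"], after["ln1_norm"])
```

In this sketch the two LN subgraphs no longer exist as separate chains after CSE: they share every node up to the normalization step, and only the final scale and bias nodes stay distinct. A matcher written against the un-merged, single-chain shape would not recognize either one.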