* checkin
* add 4dmask support in attention cuda op
* trim
* add comments
* fix build/test error
* review comments and add tests
* sync doc
* review comments
* minor change