* Export GPT-2 ONNX model without postion_ids and attention_mask inputs
* allow benchmark_gpt2 on user's model
* refactor: get_dummy_inputs returns a data class.
* Support another two format of mask_index input: 2D attention mask, or 1D mask index with end and start positions.
* Update dynamic axes of gpt2 with past state
* Update script to fuse model with attention mask
* support position_ids input
* support fp16 conversion for gpt2 past state
* output results to csv file
* Remove the useless check that output of matmul is in cuda
* update benchmark_gpt2 to use past state only
* update dynamic axes of input/output tensors
* Remove --use_openmp option since it is default for onnxruntime 1.3 cpu.
* Use same option names as benchmark.py