onnxruntime/onnxruntime/test
Adam Louly c55c6255e0
Eliminate safe nodes that are followed by a shape node. (#16065)
### Description
Eliminate a Cast operator when the node that immediately follows it is a Shape operator.

### Motivation and Context
#### Cast
With ONNX opset 15 and above, the Shape operator accepts input tensors
of any type.
This change is documented in the [ONNX
Changelog](https://github.com/onnx/onnx/blob/main/docs/Changelog.md#Shape-15).

As a result, a Cast placed immediately before a Shape is unnecessary:
Shape reads only the dimensions of its input, and a Cast does not change them.
Removing these casts simplifies the graph and may improve performance.
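
The rewrite described above can be sketched on a toy graph. This is only an illustration under stated assumptions, not the onnxruntime implementation: the `Node` class and `eliminate_cast_before_shape` function are hypothetical names, and the sketch treats a Cast as removable only when every consumer of its output is a Shape and the output is not a graph output.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Toy stand-in for a graph node (hypothetical, not an onnxruntime type)."""
    op_type: str
    inputs: list
    outputs: list


def eliminate_cast_before_shape(nodes, graph_outputs=()):
    """Return the node list without Cast nodes that feed only Shape nodes.

    Shape consumers are rewired in place to read the Cast's input.
    """
    # Map each tensor name to the nodes that consume it.
    consumers = {}
    for node in nodes:
        for name in node.inputs:
            consumers.setdefault(name, []).append(node)

    kept = []
    for node in nodes:
        if node.op_type == "Cast":
            out = node.outputs[0]
            users = consumers.get(out, [])
            removable = (
                bool(users)
                and all(u.op_type == "Shape" for u in users)
                and out not in graph_outputs
            )
            if removable:
                # Shape only needs dimensions, so it can read the uncast tensor.
                for user in users:
                    user.inputs = [
                        node.inputs[0] if name == out else name
                        for name in user.inputs
                    ]
                continue  # drop the Cast node
        kept.append(node)
    return kept


graph = [
    Node("Cast", ["x"], ["x_cast"]),
    Node("Shape", ["x_cast"], ["x_shape"]),
]
optimized = eliminate_cast_before_shape(graph)
print([n.op_type for n in optimized], optimized[0].inputs)
```

A Cast whose output is also consumed by a non-Shape node is kept, since removing it would change the data type seen by that other consumer.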


## Results
Measured with the following command:
torchrun examples/onnxruntime/training/language-modeling/run_clm.py
--model_name_or_path gpt2 --do_train --overwrite_output_dir --output_dir
./outputs/ --seed 1337 --fp16 True --per_device_train_batch_size 4
--num_train_epochs 1 --dataset_name wikitext --dataset_config_name
wikitext-2-raw-v1 --learning_rate 2e-5 --report_to none --optim
adamw_ort_fused

Without changes:
***** train metrics *****
  epoch                    =        1.0
  train_loss               =     3.2981
  train_runtime            = 0:02:13.29
  train_samples            =       2318
  train_samples_per_second =      17.39
  train_steps_per_second   =      4.351

With my changes:
***** train metrics *****
  epoch                    =        1.0
  train_loss               =     3.2981
  train_runtime            = 0:02:08.98
  train_samples            =       2318
  train_samples_per_second =     17.971
  train_steps_per_second   =      4.497

We see a gain of around 3% in training throughput (samples per second).
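
As a quick check of that figure, the relative change can be recomputed from the two metric dumps above. This is a minimal sketch; the `relative_change` helper and variable names are illustrative, and the numbers are copied from the reported train metrics.

```python
def relative_change(before, after):
    """Fractional change from `before` to `after`."""
    return (after - before) / before


# Values copied from the train metrics above.
baseline_runtime_s = 2 * 60 + 13.29   # 0:02:13.29
changed_runtime_s = 2 * 60 + 8.98     # 0:02:08.98

throughput_gain = relative_change(17.39, 17.971)        # samples per second
steps_gain = relative_change(4.351, 4.497)              # steps per second
runtime_reduction = -relative_change(baseline_runtime_s, changed_runtime_s)

print(f"samples/s gain:    {throughput_gain:.1%}")
print(f"steps/s gain:      {steps_gain:.1%}")
print(f"runtime reduction: {runtime_reduction:.1%}")
```

All three ratios come out slightly above 3%, consistent with the stated gain. The comparison is a single run per configuration, so it does not account for run-to-run variance.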

---------

Co-authored-by: Adam Louly <adamlouly@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2023-06-26 16:35:07 +08:00
| Name | Last commit | Date |
| --- | --- | --- |
| api_tests_without_env | Run clang-format in CI (#15524) | 2023-04-18 09:26:58 -07:00 |
| common | Move tests from core/providers/cuda/test/* to test/providers/cuda/ and refactor CUDA UT (#16161) | 2023-06-20 14:54:55 -07:00 |
| contrib_ops | Use M_PI to replace 3.14 constants (#16421) | 2023-06-20 15:09:10 -07:00 |
| custom_op_registration | Run clang-format in CI (#15524) | 2023-04-18 09:26:58 -07:00 |
| debug_node_inputs_outputs | Separate out operator vs model testing. (#16228) | 2023-06-17 12:58:57 +10:00 |
| framework | Move tests from core/providers/cuda/test/* to test/providers/cuda/ and refactor CUDA UT (#16161) | 2023-06-20 14:54:55 -07:00 |
| fuzzing | Run clang-format in CI (#15524) | 2023-04-18 09:26:58 -07:00 |
| global_thread_pools | Run clang-format in CI (#15524) | 2023-04-18 09:26:58 -07:00 |
| ir | Run clang-format in CI (#15524) | 2023-04-18 09:26:58 -07:00 |
| logging_apis | Run clang-format in CI (#15524) | 2023-04-18 09:26:58 -07:00 |
| mlas | NhwcFusedConv: Add before Activation (#15837) | 2023-05-08 21:02:35 -07:00 |
| onnx | ExecutionProvider API refactor - move allocator from EP level to SessionState level and indexed by OrtDevice (#15833) | 2023-06-19 17:44:45 -07:00 |
| opaque_api | Run clang-format in CI (#15524) | 2023-04-18 09:26:58 -07:00 |
| optimizer | Eliminate safe nodes that are followed by a shape node. (#16065) | 2023-06-26 16:35:07 +08:00 |
| perftest | CUDA graph support for TRT EP (#16081) | 2023-06-21 09:36:45 -07:00 |
| platform | Run clang-format in CI (#15524) | 2023-04-18 09:26:58 -07:00 |
| proto | | |
| providers | Fix file list for test of build with IO debug (#16474) | 2023-06-26 16:36:22 +10:00 |
| python | Allow saving of large models after optimization (github issue 12882) (#16440) | 2023-06-21 22:46:26 -07:00 |
| quantization | ExecutionProvider API refactor - move allocator from EP level to SessionState level and indexed by OrtDevice (#15833) | 2023-06-19 17:44:45 -07:00 |
| shared_lib | CUDA graph support for TRT EP (#16081) | 2023-06-21 09:36:45 -07:00 |
| testdata | Eliminate safe nodes that are followed by a shape node. (#16065) | 2023-06-26 16:35:07 +08:00 |
| unittest_main | [TensorRT EP] avoid excessive library load/unload overhead when running unit tests. (#15639) | 2023-04-24 14:43:13 -07:00 |
| util | ExecutionProvider API refactor - move allocator from EP level to SessionState level and indexed by OrtDevice (#15833) | 2023-06-19 17:44:45 -07:00 |
| wasm | Enable Web CI on Linux (#16419) | 2023-06-22 15:42:58 +08:00 |
| win_getopt | Run clang-format in CI (#15524) | 2023-04-18 09:26:58 -07:00 |
| xctest | Run clang-format in CI (#15524) | 2023-04-18 09:26:58 -07:00 |