Improve code readability and performance. (#2257)
Remove one-time checks from loops.
Move GetType<>() calls out of loops, as they
go through function-local statics.
Remove index calculations for input and output
so we can simply advance pointers and potentially get better prefetching.
Improve code readability.
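The pointer-advancing pattern described above can be sketched as follows. This is a minimal illustration, not the code from the commit: `CopyBlock` and its parameters are hypothetical, and a plain strided copy stands in for the actual kernel. The argument check runs once before the loops, and the inner loop only increments pointers instead of recomputing `r * stride + c` offsets.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical kernel: copy a rows x cols block out of a row-major input whose
// rows are in_stride elements apart, into a contiguous output.
// The argument check runs once, before the loops; a per-type lookup (for
// example one backed by a function-local static) would be hoisted the same way.
void CopyBlock(const float* input, std::size_t in_stride, std::size_t rows,
               std::size_t cols, float* output) {
  assert(input != nullptr && output != nullptr && in_stride >= cols);
  for (std::size_t r = 0; r < rows; ++r) {
    const float* in = input;
    // No input[r * in_stride + c] / output[r * cols + c] arithmetic:
    // both pointers just move forward.
    for (std::size_t c = 0; c < cols; ++c) {
      *output++ = *in++;
    }
    input += in_stride;  // start of the next input row
  }
}
```

Sequential pointer increments also give the hardware prefetcher a simple, linear access pattern to follow.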
* Disable optimizers for operator unit tests as they're intended to test the operator directly rather than something that could have been modified by an optimizer.
Disable TensorRT for the Scan9 unit tests, which fail when optimizers are enabled. Bug 525222 tracks this.
* Disable TRT for the lenient shape inferencing test as it uses Unsqueeze and TRT doesn't cope with that op.
* added more input data types for pad
* replacing the comments
* added first set of tests
* added tests
* added more tests
* keep NGRAPH test
* avoid type cast
* avoid type conversion for value float to T
* fixed tabs
* Update tests exclusion list
* Nits
* comments fix
* Format files
* Nit updates
* rebased
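A minimal sketch of why the pad value is kept as `T` rather than passed through `float`, assuming constant-mode padding of a 1-D tensor; `PadConstant1D` is a hypothetical helper, not the ORT Pad kernel. Keeping the value typed avoids a per-call conversion and, for wide integer types, avoids rounding (a `float` cannot represent every `int64_t`).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical constant-mode pad of a 1-D tensor. The pad value arrives as T,
// so it is never converted from float (which would round large int64_t values).
template <typename T>
std::vector<T> PadConstant1D(const std::vector<T>& input, std::size_t pad_begin,
                             std::size_t pad_end, T value) {
  std::vector<T> output;
  output.reserve(pad_begin + input.size() + pad_end);
  output.insert(output.end(), pad_begin, value);              // leading pads
  output.insert(output.end(), input.begin(), input.end());    // original data
  output.insert(output.end(), pad_end, value);                // trailing pads
  assert(output.size() == pad_begin + input.size() + pad_end);
  return output;
}
```

For example, 16777217 (2^24 + 1) survives as an `int64_t` pad value but would become 16777216 if it were stored as `float` first.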
Add the necessary checks.
Before finalizing the fusion, trim any optional outputs from the output defs (we have already checked that they don't exist, so they are known to be unused), since those defs are copied to the Conv node to maintain the output names.
Add unit tests for both cases.
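The check-then-trim step can be sketched as below. `OutputDef`, `OnlyLeadingOutputsExist` and `TrimOptionalOutputs` are hypothetical stand-ins, not ORT's graph types; the sketch assumes an optional output that is not produced is represented by an empty name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical output definition; an empty name marks an optional output
// that the node does not produce.
struct OutputDef {
  std::string name;
  bool Exists() const { return !name.empty(); }
};

// The "necessary check": every output after the first `keep` must be absent.
bool OnlyLeadingOutputsExist(const std::vector<OutputDef>& defs, std::size_t keep) {
  for (std::size_t i = keep; i < defs.size(); ++i) {
    if (defs[i].Exists()) return false;
  }
  return true;
}

// Drops the (already verified absent) optional outputs so that only outputs
// the fused Conv node can produce are copied across with their names.
void TrimOptionalOutputs(std::vector<OutputDef>& defs, std::size_t keep) {
  assert(OnlyLeadingOutputsExist(defs, keep));
  if (defs.size() > keep) defs.resize(keep);
}
```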
* Split the graph_utils methods for finalizing a fusion so that more than two nodes can be fused into one.
Update GELU fusion to use graph_utils to set up the input/output edges for the fused node and to remove the nodes being replaced.
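A toy model of finalizing a multi-node fusion, assuming the matched nodes form a contiguous run in a node list and edges are tracked by value name. `ToyNode` and `FuseNodes` are hypothetical and much simpler than ORT's graph_utils; the point is that the fused node takes the caller-chosen inputs and the last node's outputs, so downstream consumers keep their edges, and all replaced nodes are removed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct ToyNode {
  std::string op_type;
  std::vector<std::string> inputs;   // names of consumed values
  std::vector<std::string> outputs;  // names of produced values
};

// Replaces nodes[first..last] (inclusive), already matched as a fusable
// pattern, with one node of type fused_op. It consumes fused_inputs and
// produces the last node's outputs, so consumers still find them by name.
void FuseNodes(std::vector<ToyNode>& nodes, std::size_t first, std::size_t last,
               const std::string& fused_op, std::vector<std::string> fused_inputs) {
  assert(first < last && last < nodes.size());
  ToyNode fused{fused_op, std::move(fused_inputs), nodes[last].outputs};
  auto begin = nodes.begin() + static_cast<std::ptrdiff_t>(first);
  auto end = nodes.begin() + static_cast<std::ptrdiff_t>(last) + 1;
  begin = nodes.erase(begin, end);  // remove every node being replaced
  nodes.insert(begin, std::move(fused));
}
```

For a GELU pattern (Div, Erf, Add, Mul, Mul), the fused node would take only `x` as input; the constant inputs of the replaced nodes are dropped.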
Skip GemmNoTrans_f16 test for CUDA if the hardware does not support fp16
Motivation and Context
Unblock the multi_gpu build pipeline. The build agent uses Nvidia K80 GPU which doesn't have fp16 support.
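A hedged sketch of the skip condition, assuming the relevant threshold is compute capability 5.3 (the first architecture with native fp16 arithmetic); the K80 reports compute capability 3.7. `ComputeCapability`, `SupportsFp16` and `ShouldSkipFp16Test` are hypothetical names, not ORT's test helpers. In a real test the major/minor values would come from the `major` and `minor` fields filled in by `cudaGetDeviceProperties`.

```cpp
#include <cassert>

// Compute capability as reported by the CUDA runtime.
struct ComputeCapability {
  int major;
  int minor;
};

// Assumed threshold: native fp16 arithmetic from compute capability 5.3 on.
constexpr ComputeCapability kMinFp16{5, 3};

constexpr bool SupportsFp16(ComputeCapability cc) {
  return cc.major > kMinFp16.major ||
         (cc.major == kMinFp16.major && cc.minor >= kMinFp16.minor);
}

// Returns true when an fp16 test such as GemmNoTrans_f16 should be skipped.
inline bool ShouldSkipFp16Test(int major, int minor) {
  assert(major >= 0 && minor >= 0);
  return !SupportsFp16({major, minor});
}
```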
* MaxUnpool should reconstitute what was pooled by MaxPool. The kernel_shape, pads and strides attributes are used only to infer an output shape when output_shape is not explicitly provided.
The unpool should not add new padding, so output_shape is not a means of auto-generating pad values and inserting pads.
The current ORT implementation misinterprets output_shape and inserts pads instead of allocating an output of the specified shape and unpooling directly into it.
Update to find the correct output shape and unpool directly into it.
Update unit tests to reflect this.
* Exclude maxunpool_export_with_output_shape which has invalid data in the output.
* Fix test name in backend test series exclusion
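The corrected behaviour can be sketched on a single flattened buffer. `InferUnpoolDim` and `MaxUnpool` are hypothetical helpers; the default extent is assumed to be the inverse of the MaxPool output-size formula, and the indices are assumed to be flat positions into the output. The output is allocated at the requested size and every value is scattered to its index; nothing else is shifted or padded.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Default extent of one spatial dimension when output_shape is not given:
// the inverse of MaxPool's output-size formula.
std::int64_t InferUnpoolDim(std::int64_t in, std::int64_t kernel, std::int64_t stride,
                            std::int64_t pad_begin, std::int64_t pad_end) {
  assert(in > 0 && kernel > 0 && stride > 0);
  return (in - 1) * stride - (pad_begin + pad_end) + kernel;
}

// Unpools into an output of exactly output_size elements: x[i] goes to flat
// position indices[i], all other positions stay zero. No pads are inserted,
// even when output_size is larger than the inferred shape.
std::vector<float> MaxUnpool(const std::vector<float>& x,
                             const std::vector<std::int64_t>& indices,
                             std::size_t output_size) {
  assert(x.size() == indices.size());
  std::vector<float> y(output_size, 0.0f);
  for (std::size_t i = 0; i < x.size(); ++i) {
    assert(indices[i] >= 0 && static_cast<std::size_t>(indices[i]) < output_size);
    y[static_cast<std::size_t>(indices[i])] = x[i];
  }
  return y;
}
```

For example, MaxPool with kernel 2 and stride 2 over [1, 5, 7, 3] yields values [5, 7] at indices [1, 2]; unpooling gives [0, 5, 7, 0], and an explicit output_shape of 5 gives [0, 5, 7, 0, 0] rather than a padded copy.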
* Add script to find calls to graph_utils::IsSupportedOptypeVersionAndDomain where the latest supported version is prior to the latest defined version.
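The comparison the script performs can be sketched as follows. How the script collects call sites and the latest defined versions is not described here, so this sketch takes both as given; `CallSite` and `FindOutdatedCalls` are hypothetical, and the logic is written in C++ only to keep one language across the examples.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One call to IsSupportedOptypeVersionAndDomain, reduced to the op type and
// the versions listed at that call.
struct CallSite {
  std::string op_type;
  std::vector<int> versions;
};

// Returns the op types whose newest supported version at a call site is
// older than the newest version defined for that op.
std::vector<std::string> FindOutdatedCalls(const std::vector<CallSite>& calls,
                                           const std::map<std::string, int>& latest_defined) {
  std::vector<std::string> outdated;
  for (const auto& call : calls) {
    assert(!call.versions.empty());
    auto it = latest_defined.find(call.op_type);
    if (it == latest_defined.end()) continue;  // unknown op: nothing to compare
    int latest_supported = *std::max_element(call.versions.begin(), call.versions.end());
    if (latest_supported < it->second) outdated.push_back(call.op_type);
  }
  return outdated;
}
```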
* enable exclude_outside for Resize
* fix centos error
* updates per review + plus more data types for resize
* fix typo in error message
* reset wrong fix
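A sketch of what `exclude_outside` does to cubic interpolation weights, based on the ONNX Resize description (weights of sampling locations outside the tensor are set to 0 and the rest renormalized to sum to 1). `CubicKernel` and `CubicWeights` are hypothetical helpers; the kernel is the Keys cubic convolution with coefficient `a`, for which ONNX's `cubic_coeff_a` defaults to -0.75.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

// Keys cubic convolution kernel with coefficient a.
double CubicKernel(double x, double a) {
  x = std::fabs(x);
  if (x <= 1.0) return ((a + 2.0) * x - (a + 3.0)) * x * x + 1.0;
  if (x < 2.0) return ((a * x - 5.0 * a) * x + 8.0 * a) * x - 4.0 * a;
  return 0.0;
}

// Weights of the four neighbours base-1 .. base+2 for a sample at base + t.
// With exclude_outside, neighbours outside [0, length) get weight 0 and the
// remaining weights are renormalized so they sum to 1.
std::array<double, 4> CubicWeights(std::int64_t base, double t, std::int64_t length,
                                   double a, bool exclude_outside) {
  assert(length > 0 && t >= 0.0 && t < 1.0);
  std::array<double, 4> w{};
  double sum = 0.0;
  for (int k = 0; k < 4; ++k) {
    const std::int64_t pos = base - 1 + k;
    w[k] = CubicKernel(t - (k - 1), a);  // distance from sample to neighbour
    if (exclude_outside && (pos < 0 || pos >= length)) w[k] = 0.0;
    sum += w[k];
  }
  if (exclude_outside && sum != 0.0) {
    for (double& v : w) v /= sum;
  }
  return w;
}
```

Away from the border the four weights already sum to 1 and are unchanged; at the border the excluded neighbour's (possibly negative) weight is redistributed instead of sampling outside the tensor.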
* Upgrade OneHot to opset 11
* Move OneHot test out of blacklist
* Add support for negative indices in addition to negative axis.
* PR comments - 1
* PR comments-2
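The two normalizations involved in OneHot-11 can be sketched as below; the helper names are hypothetical. Following the ONNX OneHot description, indices in [-depth, depth - 1] are valid and negative ones count back from `depth`; anything outside that range produces a row of all `off_value`. Because the output has one more dimension than `indices`, a negative axis counts from the end of the output rank.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Maps an index in [-depth, depth - 1] to [0, depth). Returns nullopt for
// out-of-range indices, whose one-hot rows stay all off_value.
std::optional<std::int64_t> NormalizeOneHotIndex(std::int64_t index, std::int64_t depth) {
  assert(depth > 0);
  if (index < -depth || index >= depth) return std::nullopt;
  return index < 0 ? index + depth : index;
}

// The output rank is indices_rank + 1, so axis must lie in
// [-(indices_rank + 1), indices_rank].
std::int64_t NormalizeOneHotAxis(std::int64_t axis, std::int64_t indices_rank) {
  const std::int64_t output_rank = indices_rank + 1;
  assert(axis >= -output_rank && axis < output_rank);
  return axis < 0 ? axis + output_rank : axis;
}
```

With depth 3, index -1 selects position 2, while 3 and -4 yield an all-off_value row.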