onnxruntime/onnxruntime/contrib_ops
Dmitri Smirnov 88c58c19d4
Improve code readability and performance. (#2257)
  Remove one time checks from loops.
  Move out GetType<>() calls from loop as they
  go through local function statics.
  Get rid of index calculations from input and output
  so we can simply advance ptrs and potentially do better pre-fetch.
  Improve code readability.
2019-10-25 16:19:59 -07:00
..
cpu Improve code readability and performance. (#2257) 2019-10-25 16:19:59 -07:00
cuda use cublasHgemm for Volta GPU (#2074) 2019-10-14 17:29:13 -07:00
cpu_contrib_kernels.cc Fix kernel registry bug (#2137) 2019-10-17 23:10:54 -07:00
cpu_contrib_kernels.h Fix kernel registry bug (#2137) 2019-10-17 23:10:54 -07:00
cuda_contrib_kernels.cc Add EmbedLayerNormalization and SkipLayerNormalization ops for bert optimization (#2012) 2019-10-07 17:29:43 -07:00
cuda_contrib_kernels.h move all contrib ops to contrib ops namespace (#1190) 2019-06-24 10:19:01 -07:00