onnxruntime/include/onnxruntime/core
Scott McKay 6e430c0526
A few performance improvements coming out of ssd_mobilenet and ssd_resnet34 analysis (#1578)
* A few performance improvements:
 - Make the iteration in NonZero more efficient by using a raw pointer and simplifying the increment logic
   - Add another unit test to check that the new logic works with a 3-dimensional tensor
   - Gains about 2% for ssd_mobilenet
 - Avoid floating-point operations on each iteration in Concat
   - Gains about 0.5% for ssd_mobilenet and ssd_resnet34
 - Put the common case first in ExecutionFrame::AllocateAsPerAllocationPlan to avoid an unnecessary call to IsSparseTensor
   - Gains about 0.05% for ssd_mobilenet
 - Minor tweak to put some ctors in the TensorShape header so they can be inlined more easily
2019-08-08 07:20:00 +10:00
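
The NonZero item in the commit message (raw pointer iteration plus simpler increment logic) can be illustrated with a small sketch. This is not the code from the commit: the function name NonZeroCoords, its signature, and the row-major float layout are assumptions made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not the onnxruntime implementation): returns the
// coordinates of every non-zero element of a row-major tensor, one
// coordinate vector per non-zero element, in flat-index order.
std::vector<std::vector<int64_t>> NonZeroCoords(const float* data,
                                                const std::vector<int64_t>& dims) {
  std::vector<std::vector<int64_t>> result;

  // Total element count; zero if any dimension is zero.
  int64_t size = 1;
  for (int64_t d : dims) size *= d;

  // Current coordinate, updated incrementally instead of being recomputed
  // from the flat index with divisions on every element.
  std::vector<int64_t> index(dims.size(), 0);

  const float* end = data + size;
  for (const float* p = data; p != end; ++p) {
    if (*p != 0.0f) result.push_back(index);

    // Odometer-style increment: bump the innermost axis and carry into
    // outer axes only when an axis wraps around.
    for (std::size_t axis = dims.size(); axis-- > 0;) {
      if (++index[axis] < dims[axis]) break;
      index[axis] = 0;
    }
  }
  return result;
}
```

In the common case the inner loop touches only the innermost axis, so the per-element cost is one pointer increment, one comparison, and one counter increment.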
| Name | Last commit | Date |
| --- | --- | --- |
| common | Remove unneeded C APIs + some refactoring. (#1555) | 2019-08-07 11:05:29 -07:00 |
| framework | A few performance improvements coming out of ssd_mobilenet and ssd_resnet34 analysis (#1578) | 2019-08-08 07:20:00 +10:00 |
| graph | Don't create implicit input for outer scope value if there is a subgraph input with the same name. (#1186) | 2019-08-02 07:23:41 +10:00 |
| optimizer | Add/correct missing SAL annotations + avoid using unsigned types (except where counts are involved). (#1451) | 2019-07-22 23:25:53 -07:00 |
| platform | Enable use of session based threadpool. (#854) | 2019-04-18 10:20:46 -07:00 |
| providers | Expose provider factory C API, especially for CUDA users (#1461) | 2019-07-22 19:03:06 -07:00 |
| session | Remove unneeded C APIs + some refactoring. (#1555) | 2019-08-07 11:05:29 -07:00 |