onnxruntime/onnxruntime/core
Dmitri Smirnov e1901a7e10
Improve performance of CUDA implementations for GatherElements and Greater, Equal and Less (#4989)
Make GatherElements kernel process 16 items each.
Unroll the constant loop. Quit loops early for zero dividend.
Optimize Binary CompareFunction and remove Impl_Cast invocation.
2020-09-02 10:17:39 -07:00
codegen Convert TensorRT provider into a shared library (#4721) 2020-08-10 21:17:16 -07:00
common Add Cmake config for onnxruntime_NO_EXCEPTIONS (#4975) 2020-09-01 10:17:50 -07:00
dll
flatbuffers [ORT Mobile] file format schema and file I/O code (#4973) 2020-09-01 11:51:31 +10:00
framework Rename DeviceAllocatorRegistrationInfo to a more generic name; Use OrtArenaCfg for arena members; Remove unused OrtMemType; Simplify CreateAllocator interface. (#4970) 2020-09-01 09:25:32 -07:00
graph [ORT Mobile] file format schema and file I/O code (#4973) 2020-09-01 11:51:31 +10:00
language_interop_ops FixPyOpSegFault&MakeItStaticLib (#4600) 2020-07-28 11:45:25 -07:00
mlas Add option ORT_NO_EXCEPTIONS to disable most exception/throw in /onnxruntime/ (#4894) 2020-08-28 23:03:51 -07:00
optimizer Pass Model Path to TensorProtoToMLValue from Constant Folding for External Inputs (#5000) 2020-09-02 21:54:40 +08:00
platform Remove evaluate telemetry due to redundancy (#4996) 2020-09-01 17:02:00 -07:00
profile
protobuf implement per-channel for quantizelinear and dequantizelinear (#4759) 2020-08-21 12:08:50 -07:00
providers Improve performance of CUDA implementations for GatherElements and Greater, Equal and Less (#4989) 2020-09-02 10:17:39 -07:00
session Remove evaluate telemetry due to redundancy (#4996) 2020-09-01 17:02:00 -07:00
util