onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-08 17:17:15 +00:00

History

edgchen1	6c7da5e9d3	Optimize CUDA Sum op kernel and refactor CUDA elementwise variadic input op kernels (#4418). For the special case where all variadic inputs of a kernel are the same shape (i.e. no broadcasting is required) and there are few enough of them, we perform the entire computation in a single kernel. The general implementation (which was previously used for this special case) handles broadcasting by repeatedly invoking a binary kernel on successive inputs.	2020-07-10 10:20:23 -07:00
onnxruntime/core	Optimize CUDA Sum op kernel and refactor CUDA elementwise variadic input op kernels (#4418)	2020-07-10 10:20:23 -07:00
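
The two strategies named in the commit message can be sketched on the CPU as follows. This is a minimal illustration in plain C++, not the code from the commit: the names `Tensor`, `SumPairwise`, `SumFused`, `Sum`, and the limit `kMaxFusedInputs` are all hypothetical, the value 8 is an assumption (the message only says "few enough"), and broadcasting is left out entirely. Each loop over elements stands in for one kernel launch.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical cap on inputs for the fused path; the commit message only
// says "few enough", so this value is an assumption for illustration.
constexpr std::size_t kMaxFusedInputs = 8;

using Tensor = std::vector<float>;  // flat buffer; all inputs share one shape

// Throws unless there is at least one input and all inputs have equal size.
void CheckSameShape(const std::vector<Tensor>& inputs) {
  if (inputs.empty()) throw std::invalid_argument("need at least one input");
  for (const Tensor& t : inputs) {
    if (t.size() != inputs[0].size()) {
      throw std::invalid_argument("sketch supports same-shape inputs only");
    }
  }
}

// General strategy: one binary pass per additional input. Every pass
// re-reads and re-writes the whole output buffer.
Tensor SumPairwise(const std::vector<Tensor>& inputs) {
  CheckSameShape(inputs);
  Tensor out = inputs[0];
  for (std::size_t k = 1; k < inputs.size(); ++k) {
    for (std::size_t i = 0; i < out.size(); ++i) {
      out[i] += inputs[k][i];
    }
  }
  return out;
}

// Special-case strategy: a single pass that reads element i of every input,
// accumulates locally, and writes the output once.
Tensor SumFused(const std::vector<Tensor>& inputs) {
  CheckSameShape(inputs);
  Tensor out(inputs[0].size());
  for (std::size_t i = 0; i < out.size(); ++i) {
    float acc = inputs[0][i];
    for (std::size_t k = 1; k < inputs.size(); ++k) {
      acc += inputs[k][i];
    }
    out[i] = acc;
  }
  return out;
}

// Dispatch: take the fused path when the input count is small enough,
// otherwise fall back to the pairwise path.
Tensor Sum(const std::vector<Tensor>& inputs) {
  return inputs.size() <= kMaxFusedInputs ? SumFused(inputs)
                                          : SumPairwise(inputs);
}
```

With N inputs, the pairwise path makes N-1 passes over the output buffer while the fused path makes one, which is the source of the saving the commit message describes. Calling `Sum` on three same-shape tensors takes the fused path under the assumed limit.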
