onnxruntime/tools
mindest a71dab691d
Implement BatchNormInternal for cuda (#8172)
* correct batchnorm replacement output order; remove bn replacement in grad graph builder

* update op defs and kernel class

* implement batch norm internal and grad.

* change saved_var into saved_inv_std

* cuda test case: bn internal

* remove redundant include

* fix comment; add support and UT for 1d input.

* exclude batch_norm_internal in amd_hipify

* run BNInternal UT for CUDA only

* fix CI error

* fix comment errors

* fix error

* add comment for inconsistency with cudnnBN doc

* additional comments for cudnnBN inconsistency
2021-07-28 16:04:49 +08:00
ci_build Implement BatchNormInternal for cuda (#8172) 2021-07-28 16:04:49 +08:00
doc Add graphviz into Dockerfile images for Python API documentation (#7819) 2021-06-02 16:12:54 -07:00
nuget Move one function from cuda_provider_factory.h (#8407) 2021-07-19 17:55:59 -07:00
perf_util Update mysql-connector-java (#5802) 2020-11-16 14:09:14 -08:00
python Decouple Forward and Backward of ATenOp (#8301) 2021-07-23 16:53:26 +08:00
test