pytorch/benchmarks/cpp
jjsjann123 0dc3f829d9 Nvfuser code bump 11 5 (#67943)
Summary:
nvfuser code update:
1. Tuning heuristics on schedulers for reduction/normalization kernels;
2. bfloat16 on IO tensor support;
3. Refactored memory format support, now we can support dimension collapsing with non-coherent input tensors with different memory format. e.g. channels last tensor input to batch normalization. Note that we are currently limiting memory format to only Contiguous and Channels last;
4. Refactored nvfuser graph partitioning in `graph_fuser.cpp`, separated node merge and profile node API. Updated `profiling_record.cpp`.

Things that are reverted from our local branch:
1. changes on some entries in autodiff
2. aten::gelu with approximation
3. native_dropout(_backward)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67943

Reviewed By: ngimel

Differential Revision: D32288709

Pulled By: dzhulgakov

fbshipit-source-id: fc9491182ea7e0158bc112c66f096823c588eaf1
2021-11-17 01:22:17 -08:00
..
nvfuser Nvfuser code bump 11 5 (#67943) 2021-11-17 01:22:17 -08:00
tensorexpr [PyTorch] Add int version of vectorized PrefixSum to Benchmark (#67865) 2021-11-04 14:00:19 -07:00
CMakeLists.txt CPU Convolution benchmark harness for some popular models (#56455) 2021-04-22 22:14:36 -07:00
convolution.cpp Disable avoid-non-const-global-variables lint check (#62008) 2021-07-22 18:04:40 -07:00