pytorch

mirror of https://github.com/saymrwulf/pytorch.git synced 2026-05-14 20:57:59 +00:00


cuda	Nvfuser code bump 11 5 (#67943)	2021-11-17 01:22:17 -08:00
fuser	Deduplicate codegenOutputQuery to query maximum CUDA compute capabilities (#55901)	2021-10-18 07:42:15 -07:00

jjsjann123 0dc3f829d9 Nvfuser code bump 11 5 (#67943)

Summary:
nvfuser code update:
1. Tuning heuristics on schedulers for reduction/normalization kernels;
2. bfloat16 on IO tensor support;
3. Refactored memory format support; we can now support dimension collapsing with non-coherent input tensors that have different memory formats, e.g. a channels last tensor input to batch normalization. Note that memory format is currently limited to Contiguous and Channels last;
4. Refactored nvfuser graph partitioning in `graph_fuser.cpp`, separated node merge and profile node API. Updated `profiling_record.cpp`.

Things that are reverted from our local branch:
1. changes on some entries in autodiff
2. aten::gelu with approximation
3. native_dropout(_backward)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67943

Reviewed By: ngimel

Differential Revision: D32288709

Pulled By: dzhulgakov

fbshipit-source-id: fc9491182ea7e0158bc112c66f096823c588eaf1

2021-11-17 01:22:17 -08:00
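
Item 3 of the summary refers to collapsing dimensions of inputs that use different memory formats. The sketch below is a toy illustration of that idea in plain Python, not nvfuser's implementation: the helper names (`contiguous_strides`, `channels_last_strides`, `collapsible_groups`) are hypothetical, and the merge rule (two dimensions can be merged when the outer stride equals the inner stride times the inner size) is an assumption about what "collapsing" means here. It shows that an NCHW-shaped tensor collapses into a single dimension in either format, but in a different dimension order.

```python
def contiguous_strides(shape):
    """Element strides for a row-major (Contiguous) tensor."""
    strides = []
    running = 1
    for size in reversed(shape):
        strides.append(running)
        running *= size
    return tuple(reversed(strides))


def channels_last_strides(shape):
    """Element strides for an NCHW-shaped tensor stored physically as NHWC."""
    n, c, h, w = shape
    return (h * w * c, 1, w * c, c)


def collapsible_groups(shape, strides):
    """Group logical dims that can be merged without copying (hypothetical helper).

    Dims are visited from largest to smallest stride; a dim joins the current
    group when the previous dim's stride equals its own stride times its size.
    """
    order = sorted(range(len(shape)), key=lambda d: strides[d], reverse=True)
    groups = [[order[0]]]
    for prev, cur in zip(order, order[1:]):
        if strides[prev] == strides[cur] * shape[cur]:
            groups[-1].append(cur)
        else:
            groups.append([cur])
    return groups


shape = (2, 3, 4, 5)  # N, C, H, W
for name, strides in [
    ("contiguous", contiguous_strides(shape)),
    ("channels_last", channels_last_strides(shape)),
]:
    print(name, strides, collapsible_groups(shape, strides))
```

Because the two formats yield different dimension orders, a fused kernel that mixes such inputs has to agree on one traversal order, which is presumably why the commit restricts support to these two formats for now.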
