pytorch/caffe2/python
Alexander Sidorov b905166362 RNN: fix bug for parameter gradient in a case when SumOp is
Summary:
The issue is that AliasOp does not work well with the swaps that we do
for param.grad and param.accGrad. The two tensors become the same if the
gradient tensor is not reallocated inside the backward cell net's
local workspace.

Bug explanation from akyrola:

```
gpu_0/decoder/decoder_hidden_encoder_outputs_sum_grad: tensor A

on each timestep back to 0, we Alias
gpu_0/decoder/weighted_encoder_outputs_grad,
so then also

gpu_0/decoder/weighted_encoder_outputs_grad: tensor A

Its acc is:
gpu_0/decoder/weighted_encoder_outputs_grad_acc: tensor B

Now after timesteps, we swap (line 626) with _acc to get

gpu_0/decoder/weighted_encoder_outputs_grad: tensor B

gpu_0/decoder/weighted_encoder_outputs_grad_acc: tensor A

OPTION A -- batch size is same as before or smaller:
Then on next iteration, we do again the Alias to
gpu_0/decoder/decoder_hidden_encoder_outputs_sum_grad, so now

gpu_0/decoder/weighted_encoder_outputs_grad: tensor A

and also

gpu_0/decoder/weighted_encoder_outputs_grad_acc: tensor A

swapping them does nothing and they are the same

OPTION B -- batch size increases
gpu_0/decoder/decoder_hidden_encoder_outputs_sum_grad is reallocated,
becomes tensor C

gpu_0/decoder/weighted_encoder_outputs_grad becomes tensor C with
Alias

gpu_0/decoder/weighted_encoder_outputs_grad_acc: is tensor A

```
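
The aliasing described above can be reproduced with a toy model. This is a minimal sketch, not Caffe2 code: `Buffer`, `make_workspace`, `backward_pass` and the short blob names (`sum_grad`, `grad`, `grad_acc`) are hypothetical stand-ins, and the assumption that the source buffer is reallocated only when the batch no longer fits is taken from the explanation above.

```python
class Buffer:
    """Hypothetical stand-in for a tensor's underlying storage."""

    def __init__(self, label, capacity):
        self.label = label
        self.capacity = capacity


def make_workspace(capacity=4):
    # sum_grad starts as tensor A and grad_acc as tensor B, as in the explanation.
    return {
        "sum_grad": Buffer("A", capacity),
        "grad_acc": Buffer("B", capacity),
    }


def backward_pass(ws, batch_size):
    """Run one toy backward pass; return (grad, grad_acc) labels after the Alias."""
    # Assumption: the source gradient is reallocated only if the batch grew.
    if batch_size > ws["sum_grad"].capacity:
        ws["sum_grad"] = Buffer("C", batch_size)
    # Alias: grad now shares storage with sum_grad.
    ws["grad"] = ws["sum_grad"]
    after_alias = (ws["grad"].label, ws["grad_acc"].label)
    # Swap grad and grad_acc after the timesteps.
    ws["grad"], ws["grad_acc"] = ws["grad_acc"], ws["grad"]
    return after_alias


if __name__ == "__main__":
    ws = make_workspace()
    print(backward_pass(ws, 4))  # first pass: ('A', 'B')
    print(backward_pass(ws, 4))  # option A: ('A', 'A'), both names share one buffer
```

In this model, a second pass with a larger batch (for example `backward_pass(ws, 8)`) takes the option B path instead, and the two names stay on distinct buffers.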

Reviewed By: urikz

Differential Revision: D4946730

Tags: rnn, caffe2

fbshipit-source-id: b52d63cb238b81d2ad40e05e70deb32a81336f47
2017-04-25 20:46:59 -07:00
..
docs doxygen python block added 2017-03-29 06:46:16 -07:00
examples resnet train print loss and accuracy 2017-04-25 16:03:58 -07:00
helpers Adding add_weight_decay and image_input to brew module 2017-04-25 16:03:58 -07:00
layers MapToRange layer 2017-04-25 16:03:58 -07:00
mint doxygen python block added 2017-03-29 06:46:16 -07:00
mkl MKL related files with review comments incorporated 2017-04-25 00:31:29 -07:00
models fix download progress bar's percentage exceed 100% 2017-04-20 10:41:06 -07:00
operator_test add gradient for LengthsTileOp 2017-04-25 14:31:15 -07:00
predictor temporarily fix sync script bugs changes by reverting partially https://github.com/caffe2/caffe2/pull/266/files 2017-04-24 15:49:22 -07:00
_import_c_extension.py doxygen python block added 2017-03-29 06:46:16 -07:00
attention.py unbreak test_seq2seq_caffe2_model_cnn_one_stack_encoder 2017-04-20 10:06:25 -07:00
brew.py Adding add_weight_decay and image_input to brew module 2017-04-25 16:03:58 -07:00
brew_test.py rename model_helpers to brew and lowercase all helper functions 2017-04-24 15:52:26 -07:00
caffe_translator.py Add Reduction layer in caffe_translator 2017-04-07 16:17:07 -07:00
caffe_translator_test.py Allow test discovery in caffe2/python/ 2017-03-14 18:16:41 -07:00
checkpoint.py Adds interfaces to check the existence of a DB 2017-04-11 14:07:49 -07:00
checkpoint_test.py Adds interfaces to check the existence of a DB 2017-04-11 14:07:49 -07:00
CMakeLists.txt CMake completions work 2017-01-11 16:59:22 -08:00
cnn.py Adding add_weight_decay and image_input to brew module 2017-04-25 16:03:58 -07:00
context.py doxygen python block added 2017-03-29 06:46:16 -07:00
context_test.py Make ContextManager thread-safe 2017-02-13 19:45:35 -08:00
control.py doxygen python block added 2017-03-29 06:46:16 -07:00
control_test.py fbsync. TODO: check if build files need update. 2016-11-15 00:00:46 -08:00
convnet_benchmarks.py doxygen python block added 2017-03-29 06:46:16 -07:00
convnet_benchmarks_test.py
core.py MKL related files with review comments incorporated 2017-04-25 00:31:29 -07:00
core_gradients_test.py Fix backward pass computation when an input is used in a Fill-op input for shape 2017-04-11 19:32:22 -07:00
core_test.py NextScopedBlob with well-defined behavior and respect namescope 2017-02-16 17:16:36 -08:00
crf.py cuDNN version of TransposeOp 2017-04-03 13:33:10 -07:00
cudnn_recurrent_test.py RNNCell, LSTMCell, LSTMWithAttentionCell 2017-04-18 00:47:20 -07:00
data_parallel_model.py share forward activation blobs + pass unused free blobs down all branches + use shape inference 2017-04-25 14:23:25 -07:00
data_parallel_model_test.py RNNCell, LSTMCell, LSTMWithAttentionCell 2017-04-18 00:47:20 -07:00
data_workers.py doxygen python block added 2017-03-29 06:46:16 -07:00
data_workers_test.py Fix a data_workers test 2017-04-20 11:38:11 -07:00
dataio.py doxygen python block added 2017-03-29 06:46:16 -07:00
dataio_test.py Stop multi_reader if we run out of data before max_examples 2017-03-10 18:03:57 -08:00
dataset.py doxygen python block added 2017-03-29 06:46:16 -07:00
db_test.py Fix db_test under tsan 2016-11-29 15:18:37 -08:00
device_checker.py doxygen python block added 2017-03-29 06:46:16 -07:00
dyndep.py doxygen python block added 2017-03-29 06:46:16 -07:00
experiment_util.py doxygen python block added 2017-03-29 06:46:16 -07:00
extension_loader.py Make extension loader properly handle visibility. 2017-03-30 14:38:38 -07:00
gradient_check_test.py gradient checker for nets 2017-03-28 13:03:14 -07:00
gradient_checker.py add net gradient check 2017-04-19 15:19:55 -07:00
hsm_util.py doxygen python block added 2017-03-29 06:46:16 -07:00
hypothesis_test.py Fix tests for ops without a CUDA backend 2017-04-24 15:52:25 -07:00
hypothesis_test_util.py Add option to control the size of lengths tensor 2017-04-20 09:53:22 -07:00
layer_model_helper.py rename ModelHelperBase 2017-04-24 15:52:26 -07:00
layer_model_instantiator.py layer_model_instantiator: filter layers by tags 2017-04-17 14:18:27 -07:00
layer_test_util.py MapToRange layer 2017-04-25 16:03:58 -07:00
layers_test.py MapToRange layer 2017-04-25 16:03:58 -07:00
load_save_test.py Improve error message from LogFileDB on missing file 2017-03-10 23:31:28 -08:00
lstm_benchmark.py Forward-only rnns 2017-04-24 15:52:27 -07:00
memonger.py share forward activation blobs + pass unused free blobs down all branches + use shape inference 2017-04-25 14:23:25 -07:00
memonger_test.py share forward activation blobs + pass unused free blobs down all branches + use shape inference 2017-04-25 14:23:25 -07:00
mkl_test_util.py doxygen python block added 2017-03-29 06:46:16 -07:00
model_device_test.py Comment out NHWC Alexnet test for now 2017-01-23 13:59:29 -08:00
model_helper.py rename model_helpers to brew and lowercase all helper functions 2017-04-24 15:52:26 -07:00
mpi_python.cc Move mpi_python.cc to the python folder to be more consistent about source file locations. 2017-01-09 10:59:39 -08:00
muji.py doxygen python block added 2017-03-29 06:46:16 -07:00
muji_test.py
net_builder.py doxygen python block added 2017-03-29 06:46:16 -07:00
net_builder_test.py Allow test discovery in caffe2/python/ 2017-03-14 18:16:41 -07:00
net_drawer.py doxygen python block added 2017-03-29 06:46:16 -07:00
net_printer.py doxygen python block added 2017-03-29 06:46:16 -07:00
net_printer_test.py Debug/Analysis tools for Jobs/ExecutionSteps 2017-02-06 17:31:20 -08:00
optimizer.py Returns auxiliary parameters in the optimizers. 2017-04-17 10:16:32 -07:00
optimizer_test.py Returns auxiliary parameters in the optimizers. 2017-04-17 10:16:32 -07:00
optimizer_test_util.py create_net: explicitly specify if one wants to overwrite the network. 2017-04-17 21:46:53 -07:00
pipeline.py doxygen python block added 2017-03-29 06:46:16 -07:00
predictor_constants.py temporarily fix sync script bugs changes by reverting partially https://github.com/caffe2/caffe2/pull/266/files 2017-04-24 15:49:22 -07:00
pybind_state.cc create_net: explicitly specify if one wants to overwrite the network. 2017-04-17 21:46:53 -07:00
pybind_state.h bugfix for Windows, esp. VS 2017 2017-03-21 05:17:59 -07:00
pybind_state_gpu.cc Cudnn v6 2017-02-28 17:46:33 -08:00
pybind_state_mkl.cc Expose MKLMemory to the Python Feed and Fetch interface, and misc changes 2016-11-29 15:18:36 -08:00
python_op_test.py Allow PythonOp to access the workspace 2016-12-05 11:53:26 -08:00
queue_util.py doxygen python block added 2017-03-29 06:46:16 -07:00
record_queue.py doxygen python block added 2017-03-29 06:46:16 -07:00
recurrent.py RNN: fix bug for parameter gradient in a case when SumOp is 2017-04-25 20:46:59 -07:00
rnn_cell.py Forward-only rnns 2017-04-24 15:52:27 -07:00
schema.py Improving usability of schema 2017-04-25 10:32:08 -07:00
schema_test.py fix getting empty struct 2017-04-19 22:36:05 -07:00
scope.py Fix corruption of NameScope when exception is thrown 2017-04-24 22:46:27 -07:00
scope_test.py Fix corruption of NameScope when exception is thrown 2017-04-24 22:46:27 -07:00
session.py doxygen python block added 2017-03-29 06:46:16 -07:00
session_test.py NextScopedBlob with well-defined behavior and respect namescope 2017-02-16 17:16:36 -08:00
sparse_to_dense_mask_test.py Fix few more operators to handle empty batches correctly. 2016-11-29 15:18:37 -08:00
task.py doxygen python block added 2017-03-29 06:46:16 -07:00
test_util.py doxygen python block added 2017-03-29 06:46:16 -07:00
text_file_reader.py doxygen python block added 2017-03-29 06:46:16 -07:00
timeout_guard.py doxygen python block added 2017-03-29 06:46:16 -07:00
toy_regression_test.py
tt_core.py doxygen python block added 2017-03-29 06:46:16 -07:00
tt_core_test.py
utils.py Flag to report total memory in GPUs + op and python func to retrieve 2017-04-19 10:49:11 -07:00
visualize.py doxygen python block added 2017-03-29 06:46:16 -07:00
workspace.py create_net: explicitly specify if one wants to overwrite the network. 2017-04-17 21:46:53 -07:00
workspace_test.py create_net: explicitly specify if one wants to overwrite the network. 2017-04-17 21:46:53 -07:00