pytorch/caffe2/python
Eileen Pan 4102fbdf08 [1/n] Allow dense NaN value in dper raw input processor output
Summary:
## TLDR
Support using NaN as the default value for missing dense features in RawInputProcessor for *DPER2*, in preparation for subsequent support for null flag features in *compute meta*. For train_eval, this is already supported in DPER3, and we do not plan to support it in DPER2 train_eval.
## Overview
This is part of an intern project to add dense flags for missing feature values instead of replacing those values with zero.

Project plan:
https://docs.google.com/document/d/1OsPUTjpJycwxWLCue3Tnb1mx0uDC_2KKWvC1Rwpo2NI/edit?usp=sharing
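
A minimal sketch of the idea behind the missing-value flags mentioned in the overview, not the DPER implementation. It assumes missing dense features arrive as NaN and shows how a flag column could then distinguish "missing" from a genuine `0.0`. The helper name `split_values_and_flags` and its `fill_value` parameter are hypothetical.

```python
import math


def split_values_and_flags(dense_row, fill_value=0.0):
    """Split a dense row into (filled values, missing flags).

    Hypothetical helper: a NaN entry is treated as "feature missing".
    The flag is 1.0 for a missing entry and 0.0 otherwise.
    """
    values, flags = [], []
    for x in dense_row:
        missing = math.isnan(x)
        flags.append(1.0 if missing else 0.0)
        values.append(fill_value if missing else x)
    return values, flags


# The first entry is missing; the last entry is a real zero.
row = [float("nan"), 1.0, 0.0]
values, flags = split_values_and_flags(row)
```

With a zero default alone, the first and last entries of `row` would be indistinguishable; the flag list keeps that information.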

## Code paths:
See https://fb.quip.com/eFXUA0tbDmNw for the call stack for all affected code paths.

Test Plan:
# A. DPER3 blob value inspection
## 1. Build local bento kernel in fbcode folder
`buck build mode/dev-nosan //bento/kernels:bento_kernel_ads_ranking`

## 2. Use kernel `ads_ranking (local)` to print dense feature blob values
n280239

## 2.1 Try `default_dense_value = "0.0"` (default)
```
preproc_6/feature_preproc_6/dper_feature_processor_7/raw_input_proc_7/float_feature_sparse_to_dense_7/float_features [[0.       ]
 [0.       ]
 [0.       ]
 [0.       ]
 [0.       ]
 [0.       ]
 [0.       ]
 [1.       ]
 [1.7857143]
 [1.7777778]
 [1.       ]
 [0.       ]
 [0.5625   ]
 [0.       ]
 [0.       ]
 [0.8      ]
 [0.       ]
 [1.       ]
 [0.56     ]
 [0.       ]]
```
## 2.2 Try `default_dense_value = "123"`
```
preproc_2/feature_preproc_2/dper_feature_processor_3/raw_input_proc_3/float_feature_sparse_to_dense_3/float_features [[123.       ]
 [123.       ]
 [123.       ]
 [123.       ]
 [123.       ]
 [123.       ]
 [123.       ]
 [  1.       ]
 [  1.7857143]
 [  1.7777778]
 [  1.       ]
 [123.       ]
 [  0.5625   ]
 [123.       ]
 [123.       ]
 [  0.8      ]
 [123.       ]
 [  1.       ]
 [  0.56     ]
 [123.       ]]
```
## 2.3 Try `default_dense_value = float("nan")`
```
RuntimeError: [enforce fail at enforce_finite_op.h:40] std::isfinite(input_data[i]). Index 0 is not finite (e.g., NaN, Inf): -nan (Error from operator:
input: "unary_4/logistic_regression_loss_4/average_loss_4/average_loss" name: "" type: "EnforceFinite" device_option { random_seed: 54 })
```
This failure is expected because of the NaN input.
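
The three cases above can be reproduced in spirit with a small pure-Python sketch. It is an illustration under stated assumptions, not the Caffe2 operators: `sparse_to_dense`, `enforce_finite`, and `average` are hypothetical stand-ins for the sparse-to-dense step, the `EnforceFinite` check, and the averaged loss.

```python
import math


def sparse_to_dense(feature_ids, values, wanted_ids, default_value=0.0):
    """Return one dense value per wanted id; absent ids get default_value."""
    present = dict(zip(feature_ids, values))
    return [present.get(fid, default_value) for fid in wanted_ids]


def enforce_finite(xs):
    """Raise if any entry is NaN or Inf (stand-in for a finite check)."""
    for i, x in enumerate(xs):
        if not math.isfinite(x):
            raise RuntimeError(f"Index {i} is not finite (e.g., NaN, Inf): {x}")


def average(xs):
    return sum(xs) / len(xs)


# Feature id 3 is absent from the sparse input, so it receives the default.
ids, vals = [7, 9], [1.0, 0.5625]
wanted = [3, 7, 9]

dense_zero = sparse_to_dense(ids, vals, wanted)
dense_123 = sparse_to_dense(ids, vals, wanted, default_value=123.0)
dense_nan = sparse_to_dense(ids, vals, wanted, default_value=float("nan"))

# NaN propagates through arithmetic, so any aggregate over dense_nan is NaN.
avg_nan = average(dense_nan)
try:
    enforce_finite([avg_nan])
    finite_check_failed = False
except RuntimeError:
    finite_check_failed = True
```

Here a single NaN default is enough to make the aggregate non-finite, which is the same pattern as the enforce failure reported on the loss blob.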

# B. Unit test
`buck test fblearner/flow/projects/dper/tests/preprocs:raw_feature_extractor_test`

https://www.internalfb.com/intern/testinfra/testconsole/testrun/5348024586274923/

{F241336814}

Differential Revision: D21961595

fbshipit-source-id: 3dcb153b3c7f42f391584f5e7f52f3d9c76de31f
2020-06-26 16:54:14 -07:00
benchmarks [caffe2] optimize 2/4-bit row-wise quantization (#387) 2020-06-19 21:28:31 -07:00
docs
examples fix ROCm bench CI by increasing first iter timeout (#37633) 2020-05-04 20:49:32 -07:00
fakelowp Make FakeLowP tests work (#36525) 2020-04-13 20:16:33 -07:00
helpers Fix typos, via a Levenshtein-type corrector (#31523) 2020-01-17 16:03:19 -08:00
ideep Remove (most) Python 2 support from Python code (#35615) 2020-04-22 09:23:14 -07:00
layers [1/n] Allow dense NaN value in dper raw input processor output 2020-06-26 16:54:14 -07:00
mint
mkl Fix typos, via a Levenshtein-type corrector (#31523) 2020-01-17 16:03:19 -08:00
modeling [c2] fix compute_norm test (#38529) 2020-05-14 20:49:36 -07:00
models Skip c2_ref_tests on network failures (#37972) 2020-05-06 22:19:28 -07:00
onnx [Onnxifi] Allow adding timeout for OnnxifOp run (#40081) 2020-06-17 16:21:25 -07:00
operator_test [Caffe2][Pruning] Make the caffe2 Sum operator support long types (#40379) 2020-06-23 14:18:29 -07:00
predictor [online trainer] Add blob reorder (#39534) 2020-06-05 17:33:08 -07:00
rnn
serialized_test [Caffe2][Pruning] Make the caffe2 Sum operator support long types (#40379) 2020-06-23 14:18:29 -07:00
test Correct #39759 for HIP. (#39801) 2020-06-12 10:34:28 -07:00
trt Fix typos, via a Levenshtein-type corrector (#31523) 2020-01-17 16:03:19 -08:00
__init__.py Fixes caffe2 loading issues on Windows (#39513) 2020-06-23 20:11:24 -07:00
_import_c_extension.py [AMD] Remove num_gpu check for remote execution (#34318) 2020-03-06 09:53:57 -08:00
allcompare_test.py
attention.py
benchmark_generator.py
binarysize.py
brew.py
brew_test.py
build.py
cached_reader.py
caffe_translator.py
caffe_translator_test.py
checkpoint.py Fix typos, via a Levenshtein-type corrector (#31523) 2020-01-17 16:03:19 -08:00
checkpoint_test.py
CMakeLists.txt
cnn.py
compatibility.py
context.py
context_test.py
control.py
control_ops_grad.py
control_ops_grad_test.py
control_ops_util.py
control_test.py
convert.py
convert_test.py
convnet_benchmarks.py
convnet_benchmarks_test.py
core.py [net_transform] only skip ConstantFill for autogen_grad (#34628) 2020-03-11 19:09:52 -07:00
core_gradients_test.py
core_test.py
crf.py Fix typos, via a Levenshtein-type corrector (#31523) 2020-01-17 16:03:19 -08:00
crf_predict.py
crf_viterbi_test.py
data_parallel_model.py [Caffe2] raise exceptions instead of str (#37744) 2020-05-05 13:34:33 -07:00
data_parallel_model_test.py
data_workers.py
data_workers_test.py
dataio.py
dataio_test.py Fix typos, via a Levenshtein-type corrector (#31523) 2020-01-17 16:03:19 -08:00
dataset.py
db_file_reader.py Adding support for manifold files in DBReader (#37727) 2020-05-15 07:18:30 -07:00
db_test.py
device_checker.py
dlpack.h Fix typos (#30606) 2019-12-02 20:17:42 -08:00
dyndep.py
embedding_generation_benchmark.py
experiment_util.py
extension_loader.py
filler_test.py
functional.py Fix typos, via a Levenshtein-type corrector (#31523) 2020-01-17 16:03:19 -08:00
functional_test.py
fused_8bit_rowwise_conversion_ops_test.py [caffe2] make fused rowwise quant/dequant op work for N-dim tensors (#33426) 2020-02-19 23:29:42 -08:00
gradient_check_test.py
gradient_checker.py [caffe2] fix type and shape inference for common gradient ops (#35857) 2020-04-02 11:17:04 -07:00
gru_cell.py
hip_test_util.py
hsm_util.py
hypothesis_test.py topk tensor k support (#39407) 2020-06-15 13:10:20 -07:00
hypothesis_test_util.py Update cafe2 hypothesis_test_util to support hypothesis-5 (#39498) 2020-06-05 08:27:50 -07:00
ideep_test_util.py
layer_model_helper.py Add transfer_learning_blob_name_mappings into layer_model_helper to support layer model transfer learning 2020-03-18 15:28:00 -07:00
layer_model_instantiator.py
layer_parameter_sharing_test.py
layer_test_util.py
layers_test.py FCTransposed to FbFCPacked (#29766) 2019-12-10 10:18:21 -08:00
lengths_reducer_fused_8bit_rowwise_ops_test.py [caffe2] fix how np.clip is used in lengths_reducer_fused_{4,8}_rowwise_ops_test (#32086) 2020-01-14 22:53:28 -08:00
lengths_reducer_rowwise_8bit_ops_test.py
lstm_benchmark.py Fix typos (#30606) 2019-12-02 20:17:42 -08:00
memonger.py [pyfi] override TP2 networkx -> PyFI networkx (#37764) 2020-05-11 13:20:00 -07:00
memonger_test.py
mkl_test_util.py
model_device_test.py
model_helper.py Fix TensorProtosDBInput AttributeError (#32274) 2020-01-29 12:05:43 -08:00
model_helper_test.py
modifier_context.py Fix typos (#30606) 2019-12-02 20:17:42 -08:00
mpi_python.cc Replace c10::guts::stuff with std::stuff (#30915) 2019-12-16 13:57:19 -08:00
muji.py
muji_test.py
net_builder.py
net_builder_test.py
net_drawer.py Fix typos, via a Levenshtein-type corrector (#31523) 2020-01-17 16:03:19 -08:00
net_printer.py
net_printer_test.py
nomnigraph.py [caffe2/nomnigraph] handle when PATH env is not defined (#39373) 2020-06-10 17:09:59 -07:00
nomnigraph_test.py
nomnigraph_transformations.py
nomnigraph_transformations_test.py
normalizer.py Scale init for batch-norm and layer-norm (#31983) 2020-01-10 11:55:56 -08:00
normalizer_context.py Fix typos (#30606) 2019-12-02 20:17:42 -08:00
normalizer_test.py
numa_benchmark.py
numa_test.py
observer_test.py
operator_fp_exceptions_test.py
optimizer.py Skipping L2 regularization on sparse biases 2020-06-11 11:21:25 -07:00
optimizer_context.py Fix typos (#30606) 2019-12-02 20:17:42 -08:00
optimizer_test.py Implementation of STORM optimizer caffe2 python wrapper (#36399) 2020-04-14 23:05:45 -07:00
optimizer_test_util.py Fix typos (#30606) 2019-12-02 20:17:42 -08:00
parallel_workers.py
parallel_workers_test.py
parallelize_bmuf_distributed_test.py
pipeline.py Fix typos, via a Levenshtein-type corrector (#31523) 2020-01-17 16:03:19 -08:00
pipeline_test.py
predictor_constants.py
pybind_state.cc [Onnxifi] Allow adding timeout for OnnxifOp run (#40081) 2020-06-17 16:21:25 -07:00
pybind_state.h caffe2: preserve python exception type from PythonOp (#36267) 2020-04-09 12:43:24 -07:00
pybind_state_dlpack.cc
pybind_state_dlpack.h
pybind_state_gpu.cc
pybind_state_hip.cc
pybind_state_ideep.cc Upgrade MKL-DNN to DNNL v1.2 (#32422) 2020-03-26 22:07:59 -07:00
pybind_state_int8.cc
pybind_state_nomni.cc
pybind_state_registry.cc
pybind_state_registry.h
python_op_test.py caffe2: preserve python exception type from PythonOp (#36267) 2020-04-09 12:43:24 -07:00
queue_util.py
record_queue.py
recurrent.py
regularizer.py Adding LpNorm regularization for sparse features in DPER3 (#38582) 2020-06-09 12:34:50 -07:00
regularizer_context.py Fix typos (#30606) 2019-12-02 20:17:42 -08:00
regularizer_test.py
rnn_cell.py [caffe2] Remove python2 from operator_test (#33977) 2020-03-02 08:55:53 -08:00
schema.py Fix typos, via a Levenshtein-type corrector (#31523) 2020-01-17 16:03:19 -08:00
schema_test.py
scope.py
scope_test.py
session.py Fix typos (#30606) 2019-12-02 20:17:42 -08:00
session_test.py
sparse_to_dense_mask_test.py
sparse_to_dense_test.py
task.py Fix typos (#30606) 2019-12-02 20:17:42 -08:00
task_test.py
test_util.py
text_file_reader.py
timeout_guard.py
toy_regression_test.py
transformations.py
transformations_test.py
tt_core.py
tt_core_test.py
utils.py [C2] Introduce extra_info force CPU tags for auto-generated iteration counter blobs (#32607) 2020-02-05 23:49:27 -08:00
utils_test.py [C2] Introduce extra_info force CPU tags for auto-generated iteration counter blobs (#32607) 2020-02-05 23:49:27 -08:00
visualize.py Fix typos, via a Levenshtein-type corrector (#31523) 2020-01-17 16:03:19 -08:00
workspace.py Fix typos, via a Levenshtein-type corrector (#31523) 2020-01-17 16:03:19 -08:00
workspace_test.py