pytorch

mirror of https://github.com/saymrwulf/pytorch.git synced 2026-05-15 21:00:47 +00:00

Author	SHA1	Message	Date
Xiaomeng Yang	cbcf45274b	Move tanh function to math (#9328 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9328 Move tanh function to math Reviewed By: houseroad Differential Revision: D8794745 fbshipit-source-id: ea525dedde6f53592b06c2caffd6426688dea5fc	2018-07-11 13:59:50 -07:00
Yinghai Lu	80380f637c	Fix to make ONNXIFI flow work (#9340 ) Summary: Small step to have Relu test work. Pull Request resolved: https://github.com/pytorch/pytorch/pull/9340 Reviewed By: bddppq Differential Revision: D8807018 Pulled By: yinghai fbshipit-source-id: 429f3185e12afb12aaecfea8dd9595fdf838d356	2018-07-11 13:09:41 -07:00
Viswanath Sivakumar	c2dd90c40e	Add angle normalization for rotated boxes (#9056 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9056 Closes https://github.com/pytorch/pytorch/pull/9056 Updates bbox_transform for rotated boxes with angle info to normalize the predicted angle to be within [angle_bound_lo, angle_bound_hi] range. Reviewed By: pjh5 Differential Revision: D8706240 fbshipit-source-id: f3ee834cf362736136e285f0f8f0c063af94a879	2018-07-11 11:25:54 -07:00
JerryShih	8da936ab52	Fix the build break for python3.7 PyUnicode_AsUTF8AndSize() prototype changing (#9259 ) Summary: https://docs.python.org/3.7/c-api/unicode.html#c.PyUnicode_AsUTF8AndSize The return type changes from "char *" to "const char *". Pull Request resolved: https://github.com/pytorch/pytorch/pull/9259 Reviewed By: orionr Differential Revision: D8776219 Pulled By: pjh5 fbshipit-source-id: e5eadf71264002ba57cfb68dd39686a7ec074092	2018-07-11 10:39:43 -07:00
Viswanath Sivakumar	748a90d05b	BBoxTransform op: Add support for rotated boxes (#8952 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/8952 Closes https://github.com/pytorch/pytorch/pull/8952 Based on RRPN paper: https://arxiv.org/abs/1703.01086 Reviewed By: pjh5 Differential Revision: D8598547 fbshipit-source-id: 3699379df9bf45ed5bdd395175a0e26a77e079f7	2018-07-11 10:25:34 -07:00
Lu Fang	04a7fc1dc4	Add Upsample support in C2 onnx backend for opset 1 Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9327 Reviewed By: ailzhang Differential Revision: D8798462 Pulled By: houseroad fbshipit-source-id: d7d1127a853de6a7bb8fdef146f283487e1e5569	2018-07-10 22:43:25 -07:00
Huamin Li	fb9f9c9ba2	Implement Sinh and Cosh (#9213 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9213 Closes https://github.com/pytorch/pytorch/pull/9213 Added hyperbolic trig functions Sinh and Cosh Reviewed By: BIT-silence Differential Revision: D8752566 fbshipit-source-id: 5a58336a5153ec804404b9ac7b10b5662ede3cb7	2018-07-10 18:55:31 -07:00
Lu Fang	e06abab264	Fix Upsample ONNX Symbolic (#9288 ) Summary: Adjust to the changes in ATen. Pull Request resolved: https://github.com/pytorch/pytorch/pull/9288 Reviewed By: ailzhang Differential Revision: D8779078 Pulled By: houseroad fbshipit-source-id: 7f387eeb35ae1f5a1494afc6287853a87a6173b4	2018-07-09 23:25:26 -07:00
Lu Fang	181d2a5e60	Add support of is_compatible for old version of onnx (#9284 ) Summary: Fix the problem that occurs when caffe2 works with an old version of onnx. Pull Request resolved: https://github.com/pytorch/pytorch/pull/9284 Reviewed By: yinghai Differential Revision: D8773894 Pulled By: houseroad fbshipit-source-id: 99b5a962099f854edc85a2ea815cb88c82a6e175	2018-07-09 21:09:14 -07:00
Yinghai Lu	7ace3a99ec	Fix TensorRT tests (#9285 ) Summary: ONNX-TensorRT is still using an old opset (<7). Patch it for now. A future fix would be to expose versioning in the onnx exporter. Pull Request resolved: https://github.com/pytorch/pytorch/pull/9285 Reviewed By: houseroad Differential Revision: D8775268 Pulled By: yinghai fbshipit-source-id: c272073f80cce35ebd971e44ec9472e3c8fd4b9e	2018-07-09 20:40:19 -07:00
Yinghai Lu	cb98c5020a	Normalize IDEEP spatial bn op test (#9276 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9276 Use `checkDevice` instead of rolling our own. Reviewed By: orionr Differential Revision: D8769401 fbshipit-source-id: bd47ec2b2501552c2da1cee2eb9ad96a215602b4	2018-07-09 11:55:41 -07:00
Orion Reblitz-Richardson	936f47f271	Make roi_align_rotated_op_test not rely on 1.12.0 numpy.rot90 (#9267 ) Summary: Breaking this out of https://github.com/pytorch/pytorch/pull/8338 Use a local version of `np.rot90` with an `axes` argument, since we don't have NumPy 1.12.0 in all of the test environments. Caffe2 conda2-ubuntu16.04, for example, fails. Generally, it seems better to not require a NumPy bump just for this test. cc mingzhe09088 Pull Request resolved: https://github.com/pytorch/pytorch/pull/9267 Reviewed By: mingzhe09088 Differential Revision: D8767819 Pulled By: orionr fbshipit-source-id: c51a6295d58366eba06e4e55e3f1ffaa8af96975	2018-07-09 11:55:39 -07:00
Zhaoheng Ni	f87499a8f3	Modify the original PackSegments operator by adding "max_length" argument (#9048 ) Summary: Closes https://github.com/pytorch/pytorch/pull/9048 max_length argument helps fix the shape of the output to be N * max_length * D, where N is the batch_size, D is the feature_dim. Reviewed By: bddppq Differential Revision: D8702782 fbshipit-source-id: e30555608fee1c4a61cc95922f4a71c7f54903af	2018-07-06 14:33:59 -07:00
Xiuyan Ni	4e5369349f	Add FTRL Optimizer with Group Lasso regularizer (#9074 ) Summary: Closes https://github.com/pytorch/pytorch/pull/9074 Implement an optimizer based on the FTRL Optimizer which supports the Group Lasso regularizer. The relevant paper list for this optimizer: 1. About the FTRL Optimizer: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/41159.pdf, 2. About the group lasso regularizer solver: http://www.cse.cuhk.edu.hk/~king/PUB/ICML2010-Yang-473.pdf Differential Revision: D8623146 fbshipit-source-id: 40e08aa6319d1ad7aa95e8716e3de83b9cfb8452	2018-07-06 13:41:00 -07:00
Shaoliang Nie	da39c24971	Add GroupL1Norm regularizer (#9115 ) Summary: Closes https://github.com/pytorch/pytorch/pull/9115 As desc Reviewed By: hlu1 Differential Revision: D8718011 fbshipit-source-id: c9d750662064dd6e6362b6b13d9d0175e93e60e4	2018-07-06 13:26:09 -07:00
Xiaomeng Yang	21c420c32c	Remove unused RowwiseArgMaxOp (#9119 ) Summary: Closes https://github.com/pytorch/pytorch/pull/9119 Remove unused RowwiseArgMaxOp Reviewed By: houseroad Differential Revision: D8719826 fbshipit-source-id: 57d78c8b93bc94a4634d806c7c2041f8c18678a5	2018-07-05 15:25:28 -07:00
Yan Zhu	8364470e5c	fix empty batch for softmax (#9075 ) Summary: Closes https://github.com/pytorch/pytorch/pull/9075 as title Reviewed By: QueryConnectionException Differential Revision: D8710616 fbshipit-source-id: ca505e1a733cc24db9e2ab83a5395c64fa8360c4	2018-07-01 16:40:14 -07:00
Xiaomeng Yang	03e7953a98	Use FixedDivisor in Reduce and Broadcast CUDA kernels (#9072 ) Summary: Closes https://github.com/pytorch/pytorch/pull/9072 Use FixedDivisor in Reduce and Broadcast CUDA kernels Reviewed By: houseroad Differential Revision: D8710243 fbshipit-source-id: 6f1da12234898594a1be8c979d942aa515832aeb	2018-07-01 00:25:34 -07:00
Yan Zhu	b07ea04e23	empty batch for spatialBN (#8933 ) Summary: Closes https://github.com/pytorch/pytorch/pull/8933 The spatialBN implementation cannot deal with an empty batch; this diff tries to enable the zero-batch setting. During training, when batch_size = 0: in forward, the output's saved_mean and saved_var are zeros; in backward, the gradients for SCALE_GRAD and BIAS_GRAD are zeros. Reviewed By: pjh5 Differential Revision: D8644699 fbshipit-source-id: 599ea687329d68699c987e05f56f409f4e729d1c	2018-06-29 18:40:41 -07:00
Lu Fang	863754c722	Update the ONNX op coverage in C2 Summary: Closes https://github.com/pytorch/pytorch/pull/9051 Reviewed By: pjh5 Differential Revision: D8704583 Pulled By: houseroad fbshipit-source-id: 186e8b62378ab4f7cdef5fa77dc08c6b9ddc9cc0	2018-06-29 17:25:19 -07:00
Lu Fang	b75490414c	Bump up the C2 onnx frontend opset to 8 (#9006 ) Summary: ONNX master has now bumped up to opset 8. Closes https://github.com/pytorch/pytorch/pull/9006 Reviewed By: yinghai Differential Revision: D8685417 Pulled By: houseroad fbshipit-source-id: f0c0a3682417b8803a856e232c2740cf3e68e554	2018-06-29 11:56:11 -07:00
Xiaomeng Yang	838fdd6f99	Add Cube and Cbrt Ops (#8991 ) Summary: Closes https://github.com/pytorch/pytorch/pull/8991 Add Cube and Cbrt Ops Reviewed By: houseroad Differential Revision: D8678848 fbshipit-source-id: 051dd475e45ad9f1d11a8b32ae3acd1f7459b930	2018-06-28 14:55:30 -07:00
Xiaomeng Yang	93cc7d1923	Add in_place test for binary ops Summary: Closes https://github.com/pytorch/pytorch/pull/8973 Reviewed By: houseroad Differential Revision: D8674216 Pulled By: BIT-silence fbshipit-source-id: bde1ff7b47dbc8a48d1ff72b345c767af698a09b	2018-06-28 11:45:35 -07:00
Lu Fang	63233f98ad	Bump up opset version to 7 in Caffe2 ONNX exporter (#8854 ) Summary: Will bump up to opset 8 in another PR to match the current opset version. Already tested through generating the models in current model zoo. Closes https://github.com/pytorch/pytorch/pull/8854 Reviewed By: ezyang Differential Revision: D8666437 Pulled By: houseroad fbshipit-source-id: feffdf704dd3136aa59c0f1ff1830c14d1bd20aa	2018-06-28 07:39:02 -07:00
Yinghai Lu	346de2535d	Workaround lack of 0-dim support in ideep (#8959 ) Summary: Closes https://github.com/pytorch/pytorch/pull/8959 MKL-DNN doesn't support 0-dim tensors. As a workaround, we produce a CPUTensor instead of an Ideep tensor in the fallback ops. And for those tensors, we don't need the Ideep copy op anymore. Reviewed By: viswanathgs Differential Revision: D8665168 fbshipit-source-id: 59678de2c5aed8c691ab5caaadede6d6c000dd7b	2018-06-27 20:24:28 -07:00
Duc Ngo	f52c2ca1c6	net_async tracing use enable_profile arg from NetDef (#8927 ) Summary: Closes https://github.com/pytorch/pytorch/pull/8927 Closes https://github.com/pytorch/pytorch/pull/8855 - Add parameter `enable_tracing` to the Arg field of NetDef. `net_async_tracing` will only enable Tracer for Net instances that have this field set (unless the command line argument also includes the net name). - Append a unique id to the json profiling result file because there could be multiple instances of the same net running. - Dump the json profiling file regularly instead of just when the Tracer object is destroyed Reviewed By: ilia-cher Differential Revision: D8372378 fbshipit-source-id: 8adc9d59f48b67456beed2e3a88235c298fdfd01	2018-06-27 16:24:57 -07:00
Mingzhe Li	c4744cfafa	bilinear upsample operator on CPU Summary: Add support for bilinear upsample operator on CPU. Reviewed By: BIT-silence Differential Revision: D7853215 fbshipit-source-id: 9043c95f9eb4e1f6df324e8f7a4e8fdb0c758f66	2018-06-27 10:12:06 -07:00
Orion Reblitz-Richardson	9ec0a2aef4	fbshipit-source-id: ba600fcd2b5cefc7621357bdeb05e24cea02e5af	2018-06-27 04:50:56 -07:00
Orion Reblitz-Richardson	edb88b5f3a	Update from Facebook (#8887 ) * add opencl + fpga context adds an opencl context inside caffe2/fb which can be used for fpga access * [Caffe2] Force tensor inference checks to be triggered during testing We've started to rely on TensorInference functions more for different analysis. This diff ensures that the TensorInference function's result matches what is expected from the definition of the operator. * Enable building //caffe2:torch with @mode/opt In @mode/opt, python runs out of a PAR, which breaks a lot of assumptions in the code about where templates/ folders live relative to __file__. Rather than introduce hacks with parutil, I simply turn template_path into a parameter for all the relevant functions and thread it through from the top level. * [Caffe2] Fix cost models for DotProduct and Div. Update Tensor Inference for dot product As title. DotProduct states that output is a 1-D tensor (https://caffe2.ai/docs/operators-catalogue.html#dotproduct) though code suggests it is either 0- or 1-D depending on inputs. TensorInference defined to support implementation. * [SG-MoE] Add an option to make the experts NOT as components * [nomnigraph] Rename and fixup convertToNeuralNetOperator API This will make things a bit cleaner * no longer symlink THNN.h and THCUNN.h * forced decoder network (onnx export) Closes https://github.com/pytorch/translate/pull/95 Add networks in ensemble_export.py to create a forced decoding network from PyTorch NMT checkpoints. This network takes an arbitrary numberized (source, target) pair and returns the model score for the translation, including penalties. 
Vocabulary reduction networks are also supported, but note that target indices which are not in the possible_translation_tokens generated for the source input will be trea * Revert schema change to fix production models Revert schema change to fix production models * MockLogDeviceReader - rebase on FIX # Goal 1), Build a make_mock_log_device_reader using make_mock_reader 2), Replace the real log_device_reader here: https://fburl.com/raihwf1p # Log by D8151734 Real log_device_reader: ``` I0529 20:29:05.373108 954994 tensor.h:839] Tensor print_net/log of type std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >. Dims: (): read_net/ParseOpenTrainingRow:0 I0529 20:29:05.373244 954994 tensor.h:839] Tensor read_net/ParseOpenTrainin * [C2/D2][1/n]: Nonnegative-Constrained Optimization -- log barrier implement log barrier as a regularization method * Add teacher weight screening. Add teacher weight sceening according to teacher labels. If teacher label is zero, we do not use the distill loss in the objective function. * Add NormalizerContext See task for more detail. This implementation is a copy of what exists for RegularizerContext except for how the parameters are defined in the model_definition thrift file. I'll try an alternative implementation which overrides the default arguments of functions instead like for argscopes in tensorflow. https://github.com/pytorch/pytorch/compare/master...MaximeBoucher:update-from-facebook-0939578c068c?expand=1 * Adding cosine similarity option in dot processor Add pairwise cosine similarity option in dot product. Add an option to concate dot product and cosine similarity. Add test cases. 
* [nomnigraph][redo] Concat elim for sparseNN Same as D7962948, which was reverted because Operator Schema was not defined * [pytorch] Revert pytorch/pytorch#7918 'Release GIL when copying to shared memory', breaks ASAN Revert this pytorch diff that breaks ASAN when running Filament in dev mode; in opt mode it gives "bad file descriptor" errors. Looks like a race when copying tensors to shared memory in multiple mp.Queue's (which spawn separate threads). https://github.com/pytorch/pytorch/pull/7918/files * [nomnigraph][mobile] Enable nomnigraph by default, use -Oz on nomnigraph related code to reduce code size enables nomnigraph and reduces codesize * [Warmup] Allow both offline incremental training and online training Change plan name on saving side and reading side to support both training type This diff depends on D8128530 and D8168651. * Revert D7802642: [Warmup] Allow both offline incremental training and online training This reverts commit afc213cf9b36cecf75333a788391c4d09f4afccc @bypass-lint An infra SEV is better than not reverting this diff. If you copy this password, see you in SEV Review! @cause_a_sev_many_files * Add legacy grad logic to fix div op on old graphs. Add legacy grad logic to fix div op on old graphs. * Correctly propagate operator failures Propagate errors from operators that throw exceptions and return false * Revert D8374829: [caffe2][nomnigraph][redo] Concat elim for sparseNN This reverts commit 6dda028c463e54bb5c32188bbbe9202107e188a5 @bypass-lint An infra SEV is better than not reverting this diff. If you copy this password, see you in SEV Review! @cause_a_sev_many_files * [Caffe2] Added extra_info to core.DeviceOption(), enforced extra_info to be inherited in scope.DeviceScope extra_info is a newly defined field in DeviceOption proto. This diff added extra_info to the core.DeviceOption(). And, In scope.DeviceScope(), this diff enforce the new scope to inherit the extra_info from old scope. 
* [opt] hgdirsync wasn't enabled, merge diverged code Here's the damage, P59732616 basically xplat was left behind but had the change from assert to CAFFE_ENFORCE * OMP parallelism over RoIs for RoIAlign op Simpler to parallelize over RoIs. Shouldn't affect other uses as it relies on the number of OMP threads set during startup. PR: https://github.com/pytorch/pytorch/pull/8562 * Use int64_t for shape in FillOps to avoid overflow of int32 * Implement Rotated RoIAlign op Based on Rotated RPNs as explained in https://arxiv.org/abs/1703.01086. The idea is simple - orientation/angle is added as an RPN anchor parameter and then the angle is further regressed similar to bbox coords. There are some additional changes related to NMS and IoU, but besides that it's a direct extension to Faster-RCNN. Further details in https://fb.quip.com/sZHlA1iMfWPZ. RoIs are represented in [center_x, center_y, width, height, angle] format. `angle` repre * Rotated RoIAlign op CUDA forward implementation CUDA forward impl for D8415490 * RoIAlignRotated op CUDA backward pass implementation TSIA * All remaining fixes to eliminate process_github.sh Most of this diff has already been reviewed separately, except for the parts relating to _thnn/utils.py and _utils._internal.py remove skipIf(True, 'Fbcode') line from process_github.sh replace sed of cpp file with #ifdef to control cudnnDestroy use undo sync-time deletion of .gitattributes, remove process_github.sh switch to using _utils._internal rather than try-import-except This diff also fixes the open-source bug where rebuilds have * Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training" Original commit changeset: 7707d2efe60e The original diff is backout becuase the online trainer package is backed out. 
This code would only work with new online trainer package * [easy] improve error log in adagrad op as title * re-allow use of thnn_h_path This fixes cffi usage in OSS * [4/4] [tum] paralyzing layerNorm for GPU full sync as title * add compile=False to pytorch tests, remove hack with pyc * Add shape and type inference for RowWiseArgMax operator See title * Revert D8515341: Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training" This reverts commit 78167eeef0af16b60f72c82f9dcdda9b41b4dcbd @bypass-lint An infra SEV is better than not reverting this diff. If you copy this password, see you in SEV Review! @cause_a_sev_many_files * [fix-flaky-test] mock_hive_reader_test flaky, because GlobalCounter collects local counts intervally # Problem `MockHiveReader` uses `GlobalCounter` to limit `max_examples`. GlobalCounter on server node collect local counts from worker nodes every 1 sec. This 1 sec delay makes it impossible to limit exactly to the `max_examples`, it will definitely exceed `max_examples`. # Plan Given, ``` Expected num_examples = max_examples + num_examples/sec (Read Speed) x 1 sec (GlobalCounter Sync Int * [Caffe2] Fix FCGradient cost inference. Prevent overflow in cost inference FCGradient missed a factor 2 in the `num_outputs == 3` case. Overflow was occurring with flop calculation for FC. Changed types to `uint64_t` to prevent future problems. * Fix binary ops with empty inputs Fix binary ops with empty inputs * Support the filling of input blob with provided data as title for Biz Integrity case * Back out "Revert D8515341: Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training"" Original commit changeset: 30c55dd38816 Original diff is reverted due to introducing bad integration test. Fixed the integration test. * [c2][easy] improve pack ops error loggings as desc. 
* Add ShapeTypeInference for LpNorm operator As desc * Shard test_nn to reduce runtime for each test target Closes https://github.com/pytorch/pytorch/pull/8793 The current test_nn would time out and be disabled in GreenWarden, and we need to have an option to split it up in order to pass the stress test. Right now GreenWarden roughly allows running 100 test cases in test_nn before timing out, and here we have an option to divide test_nn into 30 shards (with ~40 tests in each shard) to allow for some test suite growth in the future. * Change default caffe2_streams_per_gpu to 1 * Remove IN_SANDCASTLE from common.py and test_nn.py We prefer to disable the failing tests through Sandcastle UI instead. * Add a new class for an updated prof_dag.proto This diff contains: - An updated prof_dag.proto that contains blob profiles. - A class to deserialize this information (serialization is in a follow up diff) - Update to separate profiling information from NeuralNet (and use it as part of the class above). - Unit tests * Lambdarank for SparseNN This diff adds a lambda_rank_layer for SparseNN. changes include 1) Adds support for multi sessions in c2 op 2) Adds support for two different loss functions in c2 op 3) Unit tests for op * Revert D8586950: Back out "Revert D8515341: Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training"" This reverts commit 012220ed63eccc35659a57b31d16a3625da6317b @bypass-lint An infra SEV is better than not reverting this diff. If you copy this password, see you in SEV Review! @cause_a_sev_many_files * [easy] A few fixups to multithread predictor benchmark (1) support perf on T6 server (2) remove dead code * fix a bug about the map size as title * Fix reduce sum on in-place case. Fix reduce sum on in-place case. * [Warmup] Reland reverted diff Allow both offline incremental training and online training Closes https://github.com/pytorch/pytorch/pull/8827 fix net transform integration test. 
Allow offline and online trainer to coexist D7802642. * Add StoreHandlerNotAvailableException Add an exception for a store that is not available or has been deleted. * Use exception handling for fault tolerance, missing KV store Remove status blobs to communication ops so that exceptions propagate on failure. * [C2/D2][2/n]: Nonnegative-Constrained Optimization -- bounded grad proj for simple bounded constrained optimization, incl non-negative box constraints. * [GanH]: Adaptive Weighting with More Estimations With implemented postivity optimization, we now learn adaptive weights with different parameterizations. This improves parameter estimation and training stability. * Revert some changes for landing * Remove AutoNoGIL in StorageSharing * Temporarily disable net_tests * Revert "[Caffe2] Force tensor inference checks to be triggered during testing" This reverts commit 67ef05c22b2f71b4a489695384932f968384a2a4. * Revert "Fix reduce sum on in-place case." This reverts commit 6cb8a8e1b3db7b6d20941b0053e3f3836068eb64. * Revert "Revert "Fix reduce sum on in-place case."" This reverts commit 130a257c0893dc09f4bd6e6a45d112261807fd2c.	2018-06-26 14:55:48 -07:00
Xiaomeng Yang	288d37998a	[Caffe2] Fix gradient_check on in-place ops (#8828 ) * Fix gradient_check on in-place ops * Fix hsm_test * Fix SplitByLengthOp test * Fix input_device_options for gradient_checker * Fix hypothesis_test_util.py	2018-06-25 15:25:56 -07:00
Lu Fang	9c426797a8	Expose is_compatible function (#8783 )	2018-06-21 23:37:54 -07:00
Hexus (Shihao Xu)	bd95f8f948	Resolve name conflict of ContextManager (#7244 ) * Resolve conflicting name, ContextManager Concept name `Context Manager` is taken by Python. See https://docs.python.org/3.6/reference/datamodel.html#with-statement-context-managers It says, A context manager is an object that defines the runtime context to be established when executing a with statement. The context manager handles the entry into, and the exit from, the desired runtime context for the execution of the block of code. The `ContextManager` here is more like a registry. And there is a C++ registry in caffe2 codebase `caffe2/caffe2/core/registry.h`. There is also a Caffe2DBRegistry, declared by calling `CAFFE_DECLARE_REGISTRY(Caffe2DBRegistry, DB, const string&, Mode);` in `caffe2/caffe2/core/db.h`. I think we can follow the concept name `Registry`, calling it `ContextRegistry`. * Make Classes and Functions internal to this module start with "_" Make Classes and Functions internal to this module start with "_" * Update context.py * Update context.py	2018-06-22 00:41:51 -04:00
Jinghui	0e0031e204	Fix build error in pybind_state_ideep (#8684 ) Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>	2018-06-20 08:29:48 -07:00
kittipatv	32bc28dd18	caffe2 export (#8642 )	2018-06-19 00:50:33 -07:00
zrphercule	c44c95fd0b	New operator 'expand' (#8263 ) * operator 'expand' * updated operator with a simple testcase * Revert "updated operator with a simple testcase" This reverts commit 1ce9f8ac567b525677254b0dce5735d7fea133d7. * updated operator with a simple testcase * expand operator with a passed testcase * typo * GPU full support added * GPU support testing... * GPU full supported * formatted * nits repaired * gpu parameters fixed * Expander removed * nits fixed, document added * formatted * new testcases added & nits repaired	2018-06-18 16:33:47 -07:00
bddppq	a8bf30d7a5	caffe2 hip python binding (#8491 ) * caffe2 hip python binding * Change back onnx submodule	2018-06-14 19:56:56 -07:00
Sebastian Meßmer	384936f73e	TypeId improvements (#8350 ) * Improve TypeId: - move it to c10 namespace to allow for easy extraction from caffe2 into c10 (i.e. reuseability from aten) - Use unordered_map/unordered_set instead of map/set for performance - Make TypeId a type safe class (i.e. no implicit casts from/to int) - Make TypeId constexpr - Some readability improvements (e.g. using instead of typedef) - Don't explicitly implement TypeMeta copy assignment and construction - let the compiler do that for us. - Add TypeMeta move constructor - Make TypeMeta members noexcept - Implement TypeMeta::operator== and operator!= as free functions instead of in-class * CR comments * fix * fix windows * Rename back to CaffeTypeId * Remove c10::TypeId/TypeMeta * remove C10_KNOWN_TYPE * code review	2018-06-14 09:16:26 -07:00
sf-wind	5b86c3af4a	Update from facebook (#8384 ) * [fix] fixup the bias multiplier data access issue Hotfix for failues in conv_transpose * [D2][Easy]: lint regularizer lint with black * [GanH]: Split mu in adaptive weight for diagnose * [Dper] Add the ability to split FC weights into multiple smaller ones * fix SumReduceLikeOp for empty blob as desc. * add ctc_greedy_decoder for caffe2 ctc_greedy_decoder same as tf's * Update event callback handling Allow multiple callbacks per event * Add WeightedSum layer The motivation is to do weighted sum in HoNet/crossnet, in the next diff, I'll replace model.Add with model.WeightedSum in honet: https://fburl.com/f4rmolg2 crossnet: https://fburl.com/v7awn8se, https://fburl.com/63filbnm * Replicate DAG's behavior Some callers expect RunAsync to block, replicate that behavior in case of explicit 'dag' net type * [dper] layernorm layer as title * Override dag, async_dag, async_polling Overriding dag, async_dag and async_polling with async_scheduling * Name the thread pools Caffe thread pools currently inherit the thread names from the thread that starts them, which can be misleading. Give them an explicit name instead. * [Caffe2] FilleOp should support int64_t dimensions Change argument type to int64_t for shape argument of FillerOp (used in ConstantFill, XavierFill, etc) * Remove caffe2/caffe2/contrib/torch/ It's not used anywhere and depends on old lua torch that conflicts with Aten. Given PT1 it's not relevant any more (though it was nice and clever code!) #accept2ship * Fix linearWarmup multiplier check The multiplier needs to be non-negative, not strictly positive. * Revert D3314316 This is after 2 years and we do not seem to have a use case for this one, so for the sake of clean API design we should potentially remove this. 
This would allow us to potentially pass in arguments to optionally construct an object, although it is indeed a little bit unclear how we can reuse existing objects if constructor arguments are passed in. In any case, we may want to remove this dangling feature. * Speedup generate proposals by partial_sort. Speedup generate proposals by partial_sort. FACEBOOK: - Saw speed improvement for training with this op. - Yanghan benchmarked the op on a small dataset and see consistent 100% improvement on speed (6ms -> 3ms) on 420 input resolution. See next diff for details. * More parallel processing friendly for CPP version of GenerateProposals. More parallel processing friendly for CPP version of GenerateProposals. * [DT] [43/n] Lift stop conditions inside reader code back to flow control 1. Split multi_reader function into local_reader and remote_reader 2. Lifted stop conditions inside Limiter back to flow control 3. Split epoch flow building logic into 3 cases: - single machine (1 reader, 1 trainer on trainer0 node, no PS) - (1 reader + 1 trainer) on trainer0 node, has PS - multiple readers, readers do not share nodes with trainers, might have PS or not * Resolve conflicts for torch/_thnn/utils.py * [Caffe2] Handle image decoding errors Image decoding errors can make the whole training fail. This diff is to handle them 1.Catch imdecode exceptions and check if decoded image has zero columns or rows. This is counted as decoding errors. 2.Replace the image with empty in case of error 3.Count the number of errors and throw runtime exception if the rate reaches given number The empty image data is kept. It might introduce noise in the training data. * Update MKL exporter to IDEEP ops TSIA * [Caffe2] GlobalInit is thread safe, fixing the comment With the mutex and lock, GlobalInit is thread safe. Update the comments. 
* Back out "Add support for generating ATen files during fbcode build" Original commit changeset: 28970ddba353 @override-unit-failures (Note: this ignores all push blocking failures!) * [DT]: fix predictor save similar to D6610058, here we add the fix for distributed online training * Remove net_singlethread_async_gpu.cc Closes https://github.com/caffe2/caffe2/pull/2528 This removes net_singlethread_async_gpu.cc as part of our effort to clean CUDAContext and the net executors. * Inline DFS task execution Add a DFS inline task execution mode in executor * Add c10 folder to fbcode This adds the c10 folder and its test cases to fbcode. Build flags are mostly taken from aten. * add dependencies for online trainer Add some dependencies so that the online model can use DataPipeline and PredictionTransform operators Relevent post: https://fb.intern.facebook.com/groups/1324375037655677/permalink/1740993462660497/ * Resolve conflicts for tools/jit/gen_jit_dispatch.py * [Fix] sparse regularization in distributed training * Support advanced pooling options in sum processor * support advanced pooling options in sum processor * remove redundant code * support attention in sum processor * Improve shard logging in net tracing code Make it handle arbitrary shard ids instead of just one digit ids. * [Caffe2] Call GlobalInit in predictor only in mobile FACEBOOK: Calling GlobalInit long after the program starts may not be safe. There are issues if the following happens: User does not call GlobalInit and initFacebook after program starts User sets a flag manually: https://fburl.com/mcsumw7d User calls OSS predictor. OSS predictor calls GlobalInit GlobalInit calls initFacebook initFacebook resets all flags: https://fburl.com/tolszha1 Thus, the user manually set flags are overwritten This would happen anytime GlobalInit is called long after the program starts. 
I suppose the intention of the user in this case is not to call GlobalInit throughout the program, but use Caffe2 regardless (is that desired?) But adding GlobalInit in the OSS predictor would automatically call GlobalInit when using Caffe2. This issue doesn't exist in mobile, since initFacebook is not called on mobile. For now, guard the GlobalInit in predictor for mobile only. May want to ensure the GlobalInit is always called at the start of the program. @[3501714:kutta] has seen weird issues when not calling GlobalInit at the start of the program on server side. He has made some progress on this. * resolve conflicts for caffe2/core/logging_is_google_glog.h and test/test_torch.py * Add empty fix for SumLikeReduceOp Add empty fix for SumLikeReduceOp * Revert D7962948: [caffe2][nomnigraph] Concat elim for sparseNN This reverts commit f7f434dc5c34ca6058b9765d2ef615453d2276a9 @bypass-lint An infra SEV is better than not reverting this diff. If you copy this password, see you in SEV Review! @cause_a_sev_many_files * Remove Declarations.yaml * Include common.h * Change std::stoi to caffe2::stoi * Add thread_name.cc to the CMake file * No need to subtract 1. Fix test segfaults * Fix NetTest, ObserverTest Fix tests (cherry picked from commit 3767e66c3f365596cba3d46d3e7322c933a0ab41) * CTCGreedyDecoderOp only has CPU implementation, test should only run on CPU * Add a variable to avoid conversion resizing issue * [fix] fixup the bias multiplier data access issue Hotfix for failues in conv_transpose * [D2][Easy]: lint regularizer lint with black * [GanH]: Split mu in adaptive weight for diagnose * [Dper] Add the ability to split FC weights into multiple smaller ones * fix SumReduceLikeOp for empty blob as desc. 
* add ctc_greedy_decoder for caffe2 ctc_greedy_decoder same as tf's * Update event callback handling Allow multiple callbacks per event * Add WeightedSum layer The motivation is to do weighted sum in HoNet/crossnet, in the next diff, I'll replace model.Add with model.WeightedSum in honet: https://fburl.com/f4rmolg2 crossnet: https://fburl.com/v7awn8se, https://fburl.com/63filbnm * Replicate DAG's behavior Some callers expect RunAsync to block, replicate that behavior in case of explicit 'dag' net type * [dper] layernorm layer as title * Override dag, async_dag, async_polling Overriding dag, async_dag and async_polling with async_scheduling * Name the thread pools Caffe thread pools currently inherit the thread names from the thread that starts them, which can be misleading. Give them an explicit name instead. * [Caffe2] FilleOp should support int64_t dimensions Change argument type to int64_t for shape argument of FillerOp (used in ConstantFill, XavierFill, etc) * Remove caffe2/caffe2/contrib/torch/ It's not used anywhere and depends on old lua torch that conflicts with Aten. Given PT1 it's not relevant any more (though it was nice and clever code!) #accept2ship * Fix linearWarmup multiplier check The multiplier needs to be non-negative, not strictly positive. * Revert D3314316 This is after 2 years and we do not seem to have a use case for this one, so for the sake of clean API design we should potentially remove this. This would allow us to potentially pass in arguments to optionally construct an object, although it is indeed a little bit unclear how we can reuse existing objects if constructor arguments are passed in. In any case, we may want to remove this dangling feature. * Speedup generate proposals by partial_sort. Speedup generate proposals by partial_sort. FACEBOOK: - Saw speed improvement for training with this op. - Yanghan benchmarked the op on a small dataset and see consistent 100% improvement on speed (6ms -> 3ms) on 420 input resolution. 
See next diff for details. * More parallel processing friendly for CPP version of GenerateProposals. More parallel processing friendly for CPP version of GenerateProposals. * [DT] [43/n] Lift stop conditions inside reader code back to flow control 1. Split multi_reader function into local_reader and remote_reader 2. Lifted stop conditions inside Limiter back to flow control 3. Split epoch flow building logic into 3 cases: - single machine (1 reader, 1 trainer on trainer0 node, no PS) - (1 reader + 1 trainer) on trainer0 node, has PS - multiple readers, readers do not share nodes with trainers, might have PS or not * Resolve conflicts for torch/_thnn/utils.py * [Caffe2] Handle image decoding errors Image decoding errors can make the whole training fail. This diff is to handle them 1.Catch imdecode exceptions and check if decoded image has zero columns or rows. This is counted as decoding errors. 2.Replace the image with empty in case of error 3.Count the number of errors and throw runtime exception if the rate reaches given number The empty image data is kept. It might introduce noise in the training data. * Update MKL exporter to IDEEP ops TSIA * [Caffe2] GlobalInit is thread safe, fixing the comment With the mutex and lock, GlobalInit is thread safe. Update the comments. * Back out "Add support for generating ATen files during fbcode build" Original commit changeset: 28970ddba353 @override-unit-failures (Note: this ignores all push blocking failures!) * [DT]: fix predictor save similar to D6610058, here we add the fix for distributed online training * Remove net_singlethread_async_gpu.cc Closes https://github.com/caffe2/caffe2/pull/2528 This removes net_singlethread_async_gpu.cc as part of our effort to clean CUDAContext and the net executors. * Inline DFS task execution Add a DFS inline task execution mode in executor * Add c10 folder to fbcode This adds the c10 folder and its test cases to fbcode. Build flags are mostly taken from aten. 
* add dependencies for online trainer Add some dependencies so that the online model can use DataPipeline and PredictionTransform operators Relevent post: https://fb.intern.facebook.com/groups/1324375037655677/permalink/1740993462660497/ * Resolve conflicts for tools/jit/gen_jit_dispatch.py * [Fix] sparse regularization in distributed training * Support advanced pooling options in sum processor * support advanced pooling options in sum processor * remove redundant code * support attention in sum processor * Improve shard logging in net tracing code Make it handle arbitrary shard ids instead of just one digit ids. * [Caffe2] Call GlobalInit in predictor only in mobile FACEBOOK: Calling GlobalInit long after the program starts may not be safe. There are issues if the following happens: User does not call GlobalInit and initFacebook after program starts User sets a flag manually: https://fburl.com/mcsumw7d User calls OSS predictor. OSS predictor calls GlobalInit GlobalInit calls initFacebook initFacebook resets all flags: https://fburl.com/tolszha1 Thus, the user manually set flags are overwritten This would happen anytime GlobalInit is called long after the program starts. I suppose the intention of the user in this case is not to call GlobalInit throughout the program, but use Caffe2 regardless (is that desired?) But adding GlobalInit in the OSS predictor would automatically call GlobalInit when using Caffe2. This issue doesn't exist in mobile, since initFacebook is not called on mobile. For now, guard the GlobalInit in predictor for mobile only. May want to ensure the GlobalInit is always called at the start of the program. @[3501714:kutta] has seen weird issues when not calling GlobalInit at the start of the program on server side. He has made some progress on this. 
* resolve conflicts for caffe2/core/logging_is_google_glog.h and test/test_torch.py * Add empty fix for SumLikeReduceOp Add empty fix for SumLikeReduceOp * Revert D7962948: [caffe2][nomnigraph] Concat elim for sparseNN This reverts commit f7f434dc5c34ca6058b9765d2ef615453d2276a9 @bypass-lint An infra SEV is better than not reverting this diff. If you copy this password, see you in SEV Review! @cause_a_sev_many_files * Remove Declarations.yaml * Include common.h * Change std::stoi to caffe2::stoi * Add thread_name.cc to the CMake file * No need to subtract 1. Fix test segfaults * Fix NetTest, ObserverTest Fix tests (cherry picked from commit 3767e66c3f365596cba3d46d3e7322c933a0ab41) * CTCGreedyDecoderOp only has CPU implementation, test should only run on CPU * Add a variable to avoid conversion resizing issue * Remove the code per soumith's comments * Remove the code per soumith's comments * Remove blank lines in the end of file * Resolve conflicts for torch/_thnn/utils.py * Update MKL exporter to IDEEP ops TSIA * Back out "Add support for generating ATen files during fbcode build" Original commit changeset: 28970ddba353 @override-unit-failures (Note: this ignores all push blocking failures!) * add dependencies for online trainer Add some dependencies so that the online model can use DataPipeline and PredictionTransform operators Relevent post: https://fb.intern.facebook.com/groups/1324375037655677/permalink/1740993462660497/ * Resolve conflicts for tools/jit/gen_jit_dispatch.py * Support advanced pooling options in sum processor * support advanced pooling options in sum processor * remove redundant code * support attention in sum processor * resolve conflicts for caffe2/core/logging_is_google_glog.h and test/test_torch.py * Revert D7962948: [caffe2][nomnigraph] Concat elim for sparseNN This reverts commit f7f434dc5c34ca6058b9765d2ef615453d2276a9 @bypass-lint An infra SEV is better than not reverting this diff. If you copy this password, see you in SEV Review! 
@cause_a_sev_many_files * Remove Declarations.yaml * Include common.h * Change std::stoi to caffe2::stoi * [caffe2] upgrade IDEEP and hotfix for conv op accuracy issue (#8364) * [IDEEP] Upgrade IDEEP version Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com> * [IDEEP] Fix accuracy issue in conv op Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com> * Fix build error due to lack of src in CMakeLists Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com> * Remove the code per soumith's comments * [ONNX] Add an ATen fallback pathway for ONNX export (#8273) * ATen fallback for ONNX export * Move to enum * Fix model test * Add comment * Address comments BC interface * Remove imaginary file (#8415) * [Caffe2] Enable AMD/MIOPEN ops for Caffe2 (#8306) * Add hip support for caffe2 core * Add MIOPEN header/wrapper to caffe2 core * Add HIP device into caffe2 PB * top level makefile change for rocm/hip * makefile scaffolding for AMD/RocM/HIP * Makefile scaffolding for AMD/RocM/HIP; add makefile/utility for HIP files * caffe2 PB update for AMD/ROCM HIP device * Add AMD/RocM/Thrust dependency * HIP threadpool update * Fix makefile macro * makefile fix: duplicate test/binary name * makefile clean-up * makefile clean-up * add HIP operator registry * add utilities for hip device * Add USE_HIP to config summary * makefile fix for BUILD_TEST * merge latest * Fix indentation * code clean-up * Guard builds without HIP and use the same cmake script as PyTorch to find HIP * Setup rocm environment variables in build.sh (ideally should be done in the docker images) * setup locale * set HIP_PLATFORM * Revert "set HIP_PLATFORM" This reverts commit 8ec58db2b390c9259220c49fa34cd403568300ad. 
* continue the build script environment variables mess * HCC_AMDGPU_TARGET * Cleanup the mess, has been fixed in the latest docker images * Assign protobuf field hip_gpu_id a new field number for backward compatibility * change name to avoid conflict * Fix duplicated thread pool flag * Refactor cmake files to not add hip includes and libs globally * Fix the wrong usage of environment variables detection in cmake * Add MIOPEN CNN operators * Revert "Add MIOPEN CNN operators" This reverts commit 6e89ad4385b5b8967a7854c4adda52c012cee42a. * Add MIOPEN pooling operator * Add MIOPEN activation operator * Add MIOPEN softmax operator * Add MIOPEN spatial batch norm operator * Add MIOPEN local response normalization operator * Add MIOPEN conv operator * Clean-up LRN ops * enable fp16 in MIOPEN pool ops * Enable fp16 for MIOPEN relu op * Enable fp16 for MIOPEN spatial batch norm op * code clean-up * revert float16 support * Create Caffe2 python binding for AMD/ROCM/HIP * Add op fallback for HIP operator * add hip src/test files in cmake * exclude hip src/test files * fix python binding for hip backend * fix MIOPEN pooling op workspace * hack to compile miopen operators * fix include path for MIOPEN ops * Fix include path * Add HIP math utilities * Fix path for HIP math utils * cmake fix * Cmake fix / hipcc for hip files * suppress hipcc warning * cmake fix /replace USE_HIP with USE_ROCM * revert LoadHIP.cmake change * fix include for thrust/cub-hip * include path fix for conversion.h * Updated with latest upstream changes * clang format fixes * Context_hip updates * Fixed typo in rocblas handle get function * Updated hipified math utils * Updated math hip test util * Updated context hip test * Updated common_hip * Updated net async dag for HIP * Added MIOPEN in operator hip test * fix * C2 dependencies clean-up * fix include path for building custom protobuf * Decouple miopen pool op and conv_pool_op base * cmake refactor * fix operator_hip_test * move all hip/miopen ops 
files into caffe2/operators/hip * sanitize cmake * permission issue * remove extra parenthesis * remove artifact from resolving merge conflict * cont. sanitize cmake files * fix syntax error * sanitize conversion.h * . * Revert "." This reverts commit 56020cb0e996a31ae27bf1f8f491955ed0b121b9. * clang-format * Enable some reduce operators' ONNX backend tests (#8418) * fix old comment to point to the right file (#8416) * Stop pinning nccl version. (#8421) Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Expose logsumexp docs and mark log_sum_exp in distributions for internal use (#8428) * Enable some of the ONNX backend test on broadcasting (#8423) * Enable some of the ONNX backend test on broadcasting * enable gemm broadcast * Expose proto utils and ONNX (#8073) * Expose proto utils and ONNX from PyTorch libcaffe2.so * Try to use protobuf from _C.so * Fix ONNX proto header include * Adjust order of imports for ONNX until nanopb goes away * Set and use ONNX_NAMESPACE for PyTorch builds * Show protobuf summary for all builds * Add ONNX_NAMESPACE for cpp_build * Statically link libprotobuf.a into libtorch.so * Set ONNX_NAMESPACE on Windows build * Move core/dispatch up as well * Add /MD flag for Windows build of _C * Potential Windows fix for ONNX and protobuf * Add direct linkage from _C to ONNX on Windows * Only include protobuf wrapper for PyTorch * Pass extra_compile_args to _nvrtc ext build * Remove installation of .a files * Rebase creates some weird situations, revert them manually * Remove more weird changes due to rebase * Need to add thread_name.cc after merge	2018-06-13 13:10:45 -07:00
Lu Fang	7543d0f794	Enable some of the ONNX backend test on broadcasting (#8423 ) * Enable some of the ONNX backend test on broadcasting * enable gemm broadcast	2018-06-13 10:15:56 -07:00
Lu Fang	a42c12bb11	Enable some reduce operators' ONNX backend tests (#8418 )	2018-06-13 21:32:50 +08:00
Peter Yeh	c37e5b7137	[Caffe2] Enable AMD/MIOPEN ops for Caffe2 (#8306 ) * Add hip support for caffe2 core * Add MIOPEN header/wrapper to caffe2 core * Add HIP device into caffe2 PB * top level makefile change for rocm/hip * makefile scaffolding for AMD/RocM/HIP * Makefile scaffolding for AMD/RocM/HIP; add makefile/utility for HIP files * caffe2 PB update for AMD/ROCM HIP device * Add AMD/RocM/Thrust dependency * HIP threadpool update * Fix makefile macro * makefile fix: duplicate test/binary name * makefile clean-up * makefile clean-up * add HIP operator registry * add utilities for hip device * Add USE_HIP to config summary * makefile fix for BUILD_TEST * merge latest * Fix indentation * code clean-up * Guard builds without HIP and use the same cmake script as PyTorch to find HIP * Setup rocm environment variables in build.sh (ideally should be done in the docker images) * setup locale * set HIP_PLATFORM * Revert "set HIP_PLATFORM" This reverts commit 8ec58db2b390c9259220c49fa34cd403568300ad. * continue the build script environment variables mess * HCC_AMDGPU_TARGET * Cleanup the mess, has been fixed in the latest docker images * Assign protobuf field hip_gpu_id a new field number for backward compatibility * change name to avoid conflict * Fix duplicated thread pool flag * Refactor cmake files to not add hip includes and libs globally * Fix the wrong usage of environment variables detection in cmake * Add MIOPEN CNN operators * Revert "Add MIOPEN CNN operators" This reverts commit 6e89ad4385b5b8967a7854c4adda52c012cee42a. 
* Add MIOPEN pooling operator * Add MIOPEN activation operator * Add MIOPEN softmax operator * Add MIOPEN spatial batch norm operator * Add MIOPEN local response normalization operator * Add MIOPEN conv operator * Clean-up LRN ops * enable fp16 in MIOPEN pool ops * Enable fp16 for MIOPEN relu op * Enable fp16 for MIOPEN spatial batch norm op * code clean-up * revert float16 support * Create Caffe2 python binding for AMD/ROCM/HIP * Add op fallback for HIP operator * add hip src/test files in cmake * exclude hip src/test files * fix python binding for hip backend * fix MIOPEN pooling op workspace * hack to compile miopen operators * fix include path for MIOPEN ops * Fix include path * Add HIP math utilities * Fix path for HIP math utils * cmake fix * Cmake fix / hipcc for hip files * suppress hipcc warning * cmake fix /replace USE_HIP with USE_ROCM * revert LoadHIP.cmake change * fix include for thrust/cub-hip * include path fix for conversion.h * Updated with latest upstream changes * clang format fixes * Context_hip updates * Fixed typo in rocblas handle get function * Updated hipified math utils * Updated math hip test util * Updated context hip test * Updated common_hip * Updated net async dag for HIP * Added MIOPEN in operator hip test * fix * C2 dependencies clean-up * fix include path for building custom protobuf * Decouple miopen pool op and conv_pool_op base * cmake refactor * fix operator_hip_test * move all hip/miopen ops files into caffe2/operators/hip * sanitize cmake * permission issue * remove extra parenthesis * remove artifact from resolving merge conflict * cont. sanitize cmake files * fix syntax error * sanitize conversion.h * . * Revert "." This reverts commit 56020cb0e996a31ae27bf1f8f491955ed0b121b9. * clang-format	2018-06-13 04:00:39 -07:00
Xiaomeng Yang	44973a06ba	Add affine_channel_op (#8356 ) Add affine_channel_op	2018-06-11 20:51:11 -07:00
bddppq	3521cd54af	Fix dividing by zero segfault in Reshape (#8302 ) when inferring a dimension of a zero-size new shape	2018-06-09 09:48:22 -07:00
Yinghai Lu	2ed03898cd	Add depthwise convolution test for IDEEP (#8301 )	2018-06-09 08:44:13 -07:00
Viswanath Sivakumar	d301d9df7a	[ideep] Fuse Conv-Relu after IDEEP graph rewrite, skip group conv (#8233 ) IDEEP supports fusion for non-group conv	2018-06-08 10:29:15 -07:00
Viswanath Sivakumar	832c88a766	[ideep] Add IDEEP Squeeze op (#8227 ) Similar to MKLSqueezeOp at caffe2/mkl/operators/squeeze_op.cc	2018-06-06 21:58:51 -07:00
Viswanath Sivakumar	4df86b6547	Update MKL exporter to IDEEP ops (#8228 ) IDEEP exporter support	2018-06-06 21:43:43 -07:00
sunnieshang	b2dac08049	Fix a corner case for ReShapeOp (#8178 ) In my use case, in the backward propagation pass, the reshape needs to change a [0] tensor into a [0,0]-shaped tensor. The original implementation would cause an out-of-index issue. This diff fixes this problem.	2018-06-05 19:06:10 -07:00
Xiao Yang	ffde23d45e	use the correct datatype format (#8144 )	2018-06-05 22:01:59 -04:00
Xiaomeng Yang	9243b64bff	[Caffe2] Update elementwise ops to support numpy style broadcast (#8070 ) * Update elementwise ops to support numpy style broadcast Update elementwise ops to support numpy style broadcast * Fix sqrt_op * Fix compare ops * Fix gradient test * Fix optimizer legacy broadcast * Fix legacy broadcast for elementwise ops * Skip flaky test * Fix eigen simple binary op * Fix attention test * Fix rnn test * Fix LSTM test * Fix tan grad * Fix schema check	2018-06-05 15:49:16 -07:00

1866 commits