Summary:
- To give HIP files a unique filename extension, we change them from _hip.cc to .hip (it's the only blessed option other than .cu in hipcc 3d51a1fb01/bin/hipcc (L552)).
- Use the host compiler to compile .cc|.cpp files. Previously we used hcc to compile them, which was unnecessary.
- Change the hipify script to no longer replace "gpu" with "hip" in the filenames of the generated hipified files. Previously we did this because hcc had a bug when linking files with the same filename. Since we now use the host linker to do the linking, this workaround is no longer necessary.
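The renaming change above can be sketched as follows. This is an illustrative sketch only, not the actual hipify/build_amd script; function names and the exact old/new rules are assumptions based on the description.

```python
def hipify_filename_old(path):
    # Old behavior (assumed): rename e.g. conv_op_gpu.cc -> conv_op_hip.cc
    # to work around an hcc linker bug with duplicate basenames.
    return path.replace("gpu", "hip")

def hipify_filename_new(path):
    # New behavior (assumed): keep the basename, switch the extension to
    # .hip -- the only extension hipcc accepts besides .cu.
    base = path[: -len(".cc")] if path.endswith(".cc") else path
    return base + ".hip"

print(hipify_filename_old("conv_op_gpu.cc"))  # conv_op_hip.cc
print(hipify_filename_new("conv_op_gpu.cc"))  # conv_op_gpu.hip
```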
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14036
Reviewed By: xw285cornell
Differential Revision: D13091813
Pulled By: bddppq
fbshipit-source-id: ea3d887751d8abb39d75f5d5104aa66ce66b9ee0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13795
Add the ability to generate a dot string for a single subgraph, plus Python bindings (which is pretty useful for model exploration in Python).
Restructure the DotGenerator class a bit to make this feature easy to implement.
Reviewed By: bwasti
Differential Revision: D13010512
fbshipit-source-id: 825665438394b7e6968ab6da167b477af82a7b62
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14074
Expose inducesEdges and addNode to Python's NNSubgraph. This makes it easy to manually construct an NNSubgraph in Python.
Reviewed By: bwasti
Differential Revision: D13092885
fbshipit-source-id: a94ed0b318162e27e3a4b5a4954eb6d169da7405
Summary:
Currently, after export, the predict_net proto contains two external_input
entries for the input data, because external_input is extended twice: once
separately with the input blob, and once by extending with all the entries
of external_input from the proto, which already includes the input blob.
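The duplication described above can be reproduced with plain lists standing in for the proto's repeated external_input field (names here are illustrative, not the actual export code):

```python
def external_inputs_old(proto_external_input, input_blob):
    # Buggy path: append the input blob, then extend with the proto's
    # external_input, which already contains that blob -> duplicate entry.
    out = [input_blob]
    out.extend(proto_external_input)
    return out

def external_inputs_new(proto_external_input, input_blob):
    # Fixed path: extend from the proto only; the input blob is already
    # included there.
    return list(proto_external_input)

old = external_inputs_old(["data", "w", "b"], "data")
new = external_inputs_new(["data", "w", "b"], "data")
print(old.count("data"), new.count("data"))  # 2 1
```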
Signed-off-by: Parth Raichura <parth.raichura@softnautics.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12979
Differential Revision: D12916349
Pulled By: soumith
fbshipit-source-id: 4d4a1c68c0936f8de3f4e380aea1393fe193cd2d
Summary:
Avoid false failure by checking for the presence of the test data in setup.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13627
Differential Revision: D13090324
Pulled By: ezyang
fbshipit-source-id: e85571943d168c0007212d7b1a5b99ffa0c39235
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13825
async_polling was an intermediate step towards async_scheduling and is no longer used.
Reviewed By: yinghai
Differential Revision: D13019059
fbshipit-source-id: eee6ba53e7f476ddb481afba3bf1768303864d32
Summary:
This pull request contains changes for:
1. Removing ConvTranspose related changes from caffe2/operators/hip/conv_op_miopen.cc
2. Adding the file caffe2/operators/hip/conv_transpose_op_miopen.cc
3. Modifying the tests to run convTranspose op using MIOpen engine
Differential Revision: D13055099
Pulled By: bddppq
fbshipit-source-id: ca284f8f9a073005b22013c375cc958257815865
Summary: Currently LambdaRank applies exponential emphasis to relevance, i.e., g = 2^rel when calculating DCG. This diff adds an option that supports g = rel in the loss function.
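The two gain choices can be illustrated with a minimal DCG computation (a sketch of the formula in the summary, not the actual LambdaRank operator code):

```python
import math

def dcg(rels, exponential_gain=True):
    # DCG with either exponential gain g = 2^rel (the current behavior)
    # or linear gain g = rel (the new option), discounted by log2(rank+1).
    total = 0.0
    for i, rel in enumerate(rels):
        gain = 2.0 ** rel if exponential_gain else float(rel)
        total += gain / math.log2(i + 2)
    return total

# Exponential gain emphasizes highly relevant items much more strongly:
print(dcg([3, 2]))                          # 2^3/log2(2) + 2^2/log2(3)
print(dcg([3, 2], exponential_gain=False))  # 3/log2(2) + 2/log2(3)
```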
Reviewed By: itomatik
Differential Revision: D9891514
fbshipit-source-id: 64730d467a665670edd37e6dc1c077987991d1a8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13641
The FeedTensor function used to take a pointer to a Tensor and feed the content using Resize
and mutable_data, but since Tensor is a pointer now, we can just return a Tensor instead.
Reviewed By: ezyang
Differential Revision: D12873145
fbshipit-source-id: 653735c20d611ff6ac9e380d8b3c721cb396a28f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13798
The semantics of C2 and ONNX Concat differ slightly. C2 Concat accepts an "add_axis" arg and, when it is set, raises the dimension. This is equivalent to attaching a Reshape after a plain Concat in ONNX.
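The equivalence can be checked with NumPy standing in for both runtimes (illustration only; variable names are not from the converter):

```python
import numpy as np

x, y = np.zeros((2, 3)), np.ones((2, 3))
axis = 1

# Caffe2 Concat with add_axis=1 stacks the inputs along a new dimension:
c2_out = np.stack([x, y], axis=axis)        # shape (2, 2, 3)

# ONNX has no add_axis, so the equivalent is a plain Concat along `axis`
# followed by a Reshape that raises the dimension:
concat = np.concatenate([x, y], axis=axis)  # shape (2, 6)
onnx_out = concat.reshape(x.shape[:axis] + (2,) + x.shape[axis:])

print(np.array_equal(c2_out, onnx_out))     # True
```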
Reviewed By: rdzhabarov
Differential Revision: D13012867
fbshipit-source-id: da23e555bae709fd2a373b04dcb9db4e984ae315
Summary:
There was a bug in the uniqueness check that made only the first run
unique.
Reviewed By: duc0
Differential Revision: D13013504
fbshipit-source-id: ecf7526d0fafd7968f1301734123f93968efef46
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13812
Original commit changeset: 2cf95bdc5ed8
Looks like in iOS, `uint64_t` is not the same as `size_t`. :( Fixed it here.
Reviewed By: houseroad
Differential Revision: D13017390
fbshipit-source-id: d33854ce341225aba372fb945c3704edc14f9411
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13745
We need to support types beside `int64` and `float`.
Reviewed By: bddppq, rdzhabarov
Differential Revision: D12967258
fbshipit-source-id: 688076e6f504b2bf24bba89714df87a678c5638a
Summary:
Add a markdown document summarizing the coverage of serialized operator tests. This currently only takes into account what has been covered by the tests with respect to the entire registry of c2 operators.
Next, we will break down the coverage by which operators have unit tests associated with them, which have hypothesis tests, and which have tests more specifically calling assertReferenceChecks.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13703
Reviewed By: dzhulgakov
Differential Revision: D12970810
Pulled By: ajyu
fbshipit-source-id: 4f0cd057b1cf734371333e24d26cbab630a170e1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13377
* Enable junk fill for the default CPU allocator. This first diff only enables it for the tests. A second diff will change the default of zero-fill to false.
* Fix tests to use the 64-bit counters that IterOp and LearningRateOp demand.
* Fix kernels that use uninitialized memory.
Reviewed By: salexspb
Differential Revision: D10866512
fbshipit-source-id: 17860e77e63a203edf46d0da0335608f77884821
Summary:
I was hitting this error:
caffe2/caffe2/operators/stats_put_ops.h:66:25: runtime error: 9.22337e+18 is outside the range of representable values of type 'long'
So the assignment from int64_t to float loses some precision, and because of that we overflow.
Reproduced this issue with diff D12945013.
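The precision loss is easy to demonstrate: a floating-point type has fewer mantissa bits than int64, so the maximum int64 value rounds up past the representable range. (Python's float is a 64-bit double; a 32-bit C float rounds the same way here, only more coarsely.)

```python
INT64_MAX = 2**63 - 1

# Converting INT64_MAX to floating point rounds *up* to 2**63, a value
# outside the representable range of int64 -- the same 9.22337e+18
# overflow reported by stats_put_ops.h.
as_float = float(INT64_MAX)
print(as_float)             # 9.223372036854776e+18
print(as_float > INT64_MAX) # True
```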
Reviewed By: mlappelbaum, jdshi-fb
Differential Revision: D12927086
fbshipit-source-id: 7eae7fe25ab49d5ac15279335bd5b1fa89d6e683
Summary: Add fetching of the real-number representation for int8 tensors in workspace.py.
Reviewed By: harouwu
Differential Revision: D12936556
fbshipit-source-id: f8756a37bce21c93d44d52faf5da9c9bd6473f4a
Summary:
We updated the description of upsample_op in onnx: https://github.com/onnx/onnx/pull/1467
Therefore, we need to support the new upsample_op in caffe2-onnx backend as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13272
Reviewed By: houseroad
Differential Revision: D12833656
Pulled By: zrphercule
fbshipit-source-id: 21af5282abaae12d2d044e4018a2b152aff79917
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12733
Conv in NHWC layout only works for 2D images. This has been a pain point when implementing quantized 3D convolution, because we need NHWC layout for best performance (note that NHWC layout in general gives better performance on CPU, not just for quantized operators). For example, our quantized ops can measure quantization error operator by operator, but this requires running a shadow fp32 operator, which is not easy when no 3D conv in NHWC layout is available (currently we do layout conversion on the fly for the shadow fp32 operator, which is error-prone). Some Caffe2 frameworks like brew generate an error when we try to create a 3D conv op in NHWC layout. This was also a blocker for using aibench, because aibench uses brew.
i-am-not-moving-c2-to-c10
Reviewed By: houseroad
Differential Revision: D10333829
fbshipit-source-id: 2d203ee1db833cd3f9d39353219e3894b46c4389
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13554
D10233252 broke the ROCm test.
We don't have group conv in NHWC for HIP yet, and this diff omits the related tests.
Reviewed By: hyuen
Differential Revision: D12917880
fbshipit-source-id: 9baf36a8cb061ee8cf393b2c438a2d1460ce5cd8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12428
Group conv in NHWC layout was enabled on CPU after D7547497.
In D7547497, the unit test for group conv in NHWC layout on CPU was enabled in group_conv_test.py but not in conv_test.py. This diff also enables it in conv_test.py.
Reviewed By: BIT-silence
Differential Revision: D10233252
fbshipit-source-id: aeeaf3eedc60e1cf6321b5a1dbe6a561e3aacbde
Summary:
Essentially makes cuDNN treat those kernels as Nx1 ones.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12902
Reviewed By: BIT-silence
Differential Revision: D10852862
Pulled By: soumith
fbshipit-source-id: 7416cf6d131177340d21cbf1d42c1daa6c7cad8c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13437
Revert: transform the NCHW Convolution operators to NHWC and the tensors around these operators
Reviewed By: bwasti
Differential Revision: D12871789
fbshipit-source-id: 6509a29fa1654424d22904df0d3e60f8cd9c0ec7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13436
Revert: Add a utility function to convert a list of caffe2_pb2.Argument to a dictionary.
Reviewed By: bwasti
Differential Revision: D12871811
fbshipit-source-id: 486ad09f3f37723c92a946c486ce3e24a649b4e6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13429
Made the SSA transformation idempotent. This ensures that if a caffe2 graph is already in SSA form, the names of the ONNX model's inputs/outputs match those of the caffe2 graph.
Avoid evaluating the model by running it when the shapes of all the blobs are present in the value_info map. This speeds up the conversion and decreases its memory usage for medium to large nets.
Reviewed By: abadams
Differential Revision: D12873354
fbshipit-source-id: d695b28e610562afa9a41c2d4da05be212ccb488
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13332
Add a utility function to convert a list of caffe2_pb2.Argument to a dictionary.
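A helper like the one described can be sketched as follows. The `Argument` class here is a toy stand-in for `caffe2_pb2.Argument` used only for illustration; the real proto stores its value in one of several typed fields (i, f, s, ints, floats, strings), and the actual utility's name and behavior may differ.

```python
class Argument:
    """Toy stand-in for caffe2_pb2.Argument (illustration only)."""
    def __init__(self, name, i=None, f=None, s=None):
        self.name, self.i, self.f, self.s = name, i, f, s

def args_to_dict(args):
    # Map each argument's name to whichever typed field is set.
    out = {}
    for a in args:
        for field in ("i", "f", "s"):
            value = getattr(a, field)
            if value is not None:
                out[a.name] = value
                break
    return out

d = args_to_dict([Argument("kernel", i=3), Argument("order", s="NHWC")])
print(d)  # {'kernel': 3, 'order': 'NHWC'}
```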
Reviewed By: bwasti
Differential Revision: D10861211
fbshipit-source-id: da2fcc3e3b4dbf8decbe14a8e2d5621b3fcc377f
Summary: Made the clangr rule more robust; it discovered more call sites.
Reviewed By: smessmer
Differential Revision: D12825017
fbshipit-source-id: 3be1eeb7ea697b36ef89e78ba64c0ee1259439c4