Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70248
Modified loops in files under fbsource/fbcode/caffe2/ from the format
```
for(TYPE var=x0;var<x_max;x++)
```
to the format
```
for(const auto var: irange(xmax))
```
This was achieved by running r-barnes's loop upgrader script (D28874212), with some modifications to exclude all files under /torch/jit, plus a number of hand-written reversions and unused-variable suppression warnings.
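The idea behind the transformation is that a range-based loop over an integer range makes the index immutable and removes the narrow-vs-wide comparison in the loop condition. A minimal self-contained sketch of the pattern, using a hypothetical `irange` stand-in rather than the real `c10::irange`:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal stand-in for c10::irange: an iterable integer range [0, end).
template <typename T>
class irange {
 public:
  class iterator {
   public:
    explicit iterator(T v) : v_(v) {}
    T operator*() const { return v_; }
    iterator& operator++() { ++v_; return *this; }
    bool operator!=(const iterator& o) const { return v_ != o.v_; }
   private:
    T v_;
  };
  explicit irange(T end) : end_(end) {}
  iterator begin() const { return iterator(T(0)); }
  iterator end() const { return iterator(end_); }
 private:
  T end_;
};

// Before: for (size_t i = 0; i < v.size(); i++) sum += v[i];
// After: the loop variable is const, so the body cannot mutate it.
long long sum_with_irange(const std::vector<int>& v) {
  long long sum = 0;
  for (const auto i : irange<std::size_t>(v.size())) {
    sum += v[i];
  }
  return sum;
}
```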
Test Plan: Sandcastle
Reviewed By: malfet
Differential Revision: D32813863
fbshipit-source-id: 527244b4a2b220fdfe7f17dee3599603f492a2ca
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66743
Modified loops in files under fbsource/fbcode/caffe2/ from the format
`for(TYPE var=x0;var<x_max;x++)`
to the format
`for(const auto var: irange(xmax))`
This was achieved by running r-barnes's loop upgrader script (D28874212), with some modifications to exclude all files under /torch/jit, plus a number of hand-written reversions and unused-variable suppression warnings.
Test Plan: Sandcastle
Reviewed By: malfet
Differential Revision: D31705359
fbshipit-source-id: c9ea2fbc0f9cd29e97a52dcb203addc5f2abb09b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66234
Modified loops in files under fbsource/fbcode/caffe2/ from the format
`for(TYPE var=x0;var<x_max;x++)`
to the format
`for(const auto var: irange(xmax))`
This was achieved by running r-barnes's loop upgrader script (D28874212), with some modifications to exclude all files under /torch/jit, plus a number of hand-written reversions and unused-variable suppression warnings.
bypass_size_limit
allow-large-files
Test Plan: Sandcastle
Reviewed By: ngimel
Differential Revision: D30652629
fbshipit-source-id: 0ae6c4bbbb554bad42e372792a6430e1acf15e3e
Summary:
This PR deletes some code in `MiscCheck.cmake` that performs the exact
same functionality as `FindAVX.cmake`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61748
Reviewed By: ejguan
Differential Revision: D29791282
Pulled By: malfet
fbshipit-source-id: 6595fd1b61c8ae12b821fad8c9a34892dd52d213
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60677
Add a rule to wrap conversions.h and depend on that, rather than
relying on a glob which violates package boundaries.
Test Plan: `buck2 build fbcode//caffe2/caffe2:caffe2_core`
Reviewed By: mzlee
Differential Revision: D29370841
fbshipit-source-id: d4dd383eb8457d4f5118574e34e6f17c32fde647
Summary:
This is an automatic change generated by the following script:
```
#!/usr/bin/env python3
from subprocess import check_output, check_call
import os

def get_compiled_files_list():
    import json
    with open("build/compile_commands.json") as f:
        data = json.load(f)
    files = [os.path.relpath(node['file']) for node in data]
    for idx, fname in enumerate(files):
        if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'):
            files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')]
    return files

def run_clang_tidy(fname):
    check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname, "-s"])
    changes = check_output(["git", "ls-files", "-m"])
    if len(changes) == 0:
        return
    check_call(["git", "commit", "--all", "-m", f"NOLINT stubs for {fname}"])

def main():
    git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n")
    compiled_files = get_compiled_files_list()
    for idx, fname in enumerate(git_files):
        if fname not in compiled_files:
            continue
        if fname.startswith("caffe2/contrib/aten/"):
            continue
        print(f"[{idx}/{len(git_files)}] Processing {fname}")
        run_clang_tidy(fname)

if __name__ == "__main__":
    main()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892
Reviewed By: H-Huang
Differential Revision: D27991944
Pulled By: malfet
fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179
Summary:
* `#if` with an undefined name is a warning when `-Wundef` is specified (which it is in ovrsource, for example)
* identifiers starting with two underscores are [reserved for compiler internals](https://en.cppreference.com/w/cpp/language/identifiers)
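A short illustrative sketch of both points (`MY_FEATURE_FLAG` is a hypothetical macro name, not one from the codebase). Under `-Wundef`, a bare `#if SOME_UNDEFINED_NAME` warns because the undefined name silently evaluates to 0; guarding with `defined()` is warning-free:

```cpp
#include <cassert>

// Warning-free under -Wundef: test definedness explicitly instead of
// relying on an undefined macro evaluating to 0.
#if defined(MY_FEATURE_FLAG) && MY_FEATURE_FLAG
constexpr bool kFeatureEnabled = true;
#else
constexpr bool kFeatureEnabled = false;
#endif

// Names like __my_flag are reserved for the implementation; a
// project-prefixed name such as MYLIB_MY_FLAG avoids the reserved space.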
Test Plan: CI
Reviewed By: ezyang
Differential Revision: D27318070
fbshipit-source-id: 4989fc6a3bf3c176eddd7c25aca47414e4973edd
Summary:
Follow up PR of https://github.com/pytorch/pytorch/issues/53951.
This PR fixes the remaining Semmle warnings: comparison of narrow type with wide type in loop condition.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54471
Reviewed By: bdhirsh
Differential Revision: D27262493
Pulled By: malfet
fbshipit-source-id: 05765758da79699936af11de237c3ff3d34373d6
Summary:
Fix the Semmle warning: comparison of narrow type with wide type in loop condition.
For example, consider the following piece of code:
for (int i=0; i<array.size(); ++i) {}
The problem is that `array.size()` returns `size_t`, which can be a wider type than `int` depending on the implementation, so there is a chance that `i` overflows (for a very large array whose size is beyond the range of `int`) and the loop never terminates.
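A minimal sketch of the fix, which is to give the loop index the container's own size type so the comparison never mixes widths (`count_nonzero` is an illustrative function, not one from the codebase):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Before: `for (int i = 0; i < array.size(); ++i)` compares an int
// against a size_t; for containers larger than INT_MAX, i overflows.
// After: the index uses std::size_t, matching array.size().
std::size_t count_nonzero(const std::vector<int>& array) {
  std::size_t n = 0;
  for (std::size_t i = 0; i < array.size(); ++i) {
    if (array[i] != 0) ++n;
  }
  return n;
}
```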
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53951
Reviewed By: zou3519
Differential Revision: D27181495
Pulled By: malfet
fbshipit-source-id: 0612c5cedcdc656c193085e7fbb87dd163f20688
Summary:
When building libtorch as a static library, these three static libraries are generated but are not installed to CMAKE_INSTALL_LIBDIR:
- libCaffe2_perfkernels_avx2.a
- libCaffe2_perfkernels_avx512.a
- libCaffe2_perfkernels_avx.a
This PR will fix this issue.
Note that after this fix there are still static libraries missing from CMAKE_INSTALL_LIBDIR, but they belong to third-party repos and need to be fixed in the corresponding repos:
- libfoxi_loader.a
- libonnx.a
- libonnx_proto.a
- libfmt.a
- libnccl_static.a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53825
Reviewed By: ngimel
Differential Revision: D27013844
Pulled By: malfet
fbshipit-source-id: 8a84cc72b6ae87393ca26c4e474f5526a7b18ab2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46758
It is generally helpful to support int32 indices and offsets, especially when such tensors are large and need to be transferred to accelerator backends. Since it may not be very useful to support the combination of int32 indices and int64 offsets, here we enforce that the two must have the same type.
Test Plan: unit tests
Reviewed By: ngimel
Differential Revision: D24470808
fbshipit-source-id: 94b8a1d0b7fc9fe3d128247aa042c04d7c227f0b
Summary: I think this preprocessor check is incorrect. The fused multiply-add (FMA) instructions are not part of AVX2.
Test Plan: CI
Reviewed By: jspark1105
Differential Revision: D24237836
fbshipit-source-id: 44f9b9179918332eb85ac087827726300f56224e
Summary:
The `2to3` tool has a fixer called `future` that specifically removes these imports; the `caffe2` directory has the most redundant imports:
```2to3 -f future -w caffe2```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45033
Reviewed By: seemethere
Differential Revision: D23808648
Pulled By: bugra
fbshipit-source-id: 38971900f0fe43ab44a9168e57f2307580d36a38
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/387
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39985
AVX2-optimized 2/4-bit row-wise quantization/dequantization in perfkernels.
This diff slightly changes the numerics of quantization by multiplying by the inverse of the scale instead of dividing by the scale.
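The numeric change amounts to computing one reciprocal per row and multiplying, instead of dividing per element; the two can differ in the last bit. A hypothetical scalar sketch of the idea (not the actual AVX2 kernel; the clamping and scale handling here are illustrative):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Quantize a row to bit_rate-bit integers. Instead of dividing each
// element by scale, precompute inv_scale once and multiply, which is
// faster but may round differently in the last bit.
std::vector<std::uint8_t> quantize_row(const std::vector<float>& row, int bit_rate) {
  const float qmax = static_cast<float>((1 << bit_rate) - 1);
  const float minv = *std::min_element(row.begin(), row.end());
  const float maxv = *std::max_element(row.begin(), row.end());
  const float scale = (maxv - minv) / qmax;
  const float inv_scale = scale == 0.f ? 1.f : 1.f / scale;
  std::vector<std::uint8_t> out;
  out.reserve(row.size());
  for (float x : row) {
    // (x - minv) * inv_scale instead of (x - minv) / scale.
    const float q = std::nearbyint((x - minv) * inv_scale);
    out.push_back(static_cast<std::uint8_t>(std::min(qmax, std::max(0.f, q))));
  }
  return out;
}
```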
Test Plan:
On my devserver:
```
for i in 2 4 8; do echo $i; buck run mode/opt :fused_rowwise_nbit_conversion_bench -- --bit-rate=$i; done
```
Before this diff:
- 2-bit: 3.35394 ms. 100%. FloatToFused2BitRowwiseQuantized
- 4-bit: 3.60351 ms. 100%. FloatToFused4BitRowwiseQuantized
- 8-bit: 0.434467 ms. 100%. FloatToFused8BitRowwiseQuantized
After this diff:
- 2-bit: 0.606386 ms. 100%. FloatToFused2BitRowwiseQuantized
- 4-bit: 0.446683 ms. 100%. FloatToFused4BitRowwiseQuantized
- 8-bit: 0.4349 ms. 100%. FloatToFused8BitRowwiseQuantized
Reviewed By: choudharydhruv, jianyuh
Differential Revision: D22033195
fbshipit-source-id: d3a219e47b8345268d90a160c9314ed0d5b71467
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37705
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37372
Posted note: [Regularizing SparseNN Against Over-fitting](https://fb.workplace.com/notes/taiqing-wang/regularizing-sparsenn-against-over-fitting/220306075902708/)
**Problem formulation**
L(w) = J(w) + lambda/2 * ||w||^2
J(w) is the empirical loss, and ||w||^2 is the squared L2 norm of the parameters, a.k.a. L2 regularizer.
dL(w)/dw_i = dJ(w)/dw_i + lambda * w_i
dL(w)/dw_i is the gradient of L(w) w.r.t. w_i.
To implement the L2 regularizer, lambda * w_i is added to the gradient of J(w) w.r.t. w_i. lambda is called weight decay in this implementation.
**Code changes**
* In the initialization method of AdagradOptimizer, a new input argument, weight_decay, is added.
* In the _run function of AdagradOptimizer, the weight decay will be skipped for 1d bias vectors.
* In the parameter update functions of Adagrad, the gradient is updated by weight_decay * w_i. The default value for weight_decay is zero.
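A hypothetical scalar sketch of the resulting update (standalone code, not the actual dper/caffe2 implementation; function and parameter names are illustrative):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One Adagrad step with L2 regularization: the gradient of J(w) is
// augmented with weight_decay * w_i before the usual moment
// accumulation and update. weight_decay defaults to zero upstream,
// and is skipped for 1d bias vectors by the caller.
void adagrad_step_with_weight_decay(
    std::vector<float>& w, std::vector<float>& moment,
    const std::vector<float>& grad,
    float lr, float epsilon, float weight_decay) {
  for (std::size_t i = 0; i < w.size(); ++i) {
    const float g = grad[i] + weight_decay * w[i];  // dL/dw_i
    moment[i] += g * g;
    w[i] -= lr * g / (std::sqrt(moment[i]) + epsilon);
  }
}
```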
Test Plan:
```
buck build caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test_weight_decay
```
```
./buck-out/gen/caffe2/caffe2/fb/dper/layer_models/tests/split_1/sparse_nn_test_weight_decay#binary.par
```
Reviewed By: jspark1105
Differential Revision: D21258652
fbshipit-source-id: d2366ddcd736a03205a2d16f914703b16d9fce8f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36371
It allows us to drop a circular dependency and remove unknown_symbols in the Buck build.
It'd be good to get rid of GetCpuId altogether in favor of cpuinfo, but it's not really blocking anything.
Reviewed By: malfet
Differential Revision: D20958000
fbshipit-source-id: ed17a2a90a51dc1adf9e634af56c85f0689f8f29
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35556
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35542
Apply explicit vectorization to the lstm_unit operator.
Enabled by -DENABLE_VECTORIZATION=1
This optimization requires vector-library support and was tested with Intel SVML & clang.
However, compilers which support OpenMP 4.5 with the `omp simd` extension should also benefit.
After the code changes:
```
In file included from caffe2/caffe2/operators/lstm_unit_op.cc:1:
caffe2/caffe2/operators/lstm_unit_op.h:60:1: remark: vectorized loop (vectorization width: 8, interleaved count: 1) [-Rpass=loop-vectorize]
VECTOR_LOOP for (int d = 0; d < D; ++d) {
caffe2/caffe2/operators/lstm_unit_op.h:60:1: remark: vectorized loop (vectorization width: 8, interleaved count: 1) [-Rpass=loop-vectorize]
caffe2/caffe2/operators/lstm_unit_op.h:112:1: remark: vectorized loop (vectorization width: 8, interleaved count: 1) [-Rpass=loop-vectorize]
VECTOR_LOOP for (int d = 0; d < D; ++d) {
```
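A hedged sketch of how such a loop annotation macro could be defined (the actual caffe2 macro may differ; this is only one plausible shape, gated on OpenMP 4.5 support):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical VECTOR_LOOP: with OpenMP 4.5 (_OPENMP >= 201511) the
// annotation lowers to `#pragma omp simd`; otherwise it expands to
// nothing and the loop is left to the compiler's auto-vectorizer.
#if defined(_OPENMP) && _OPENMP >= 201511
#define VECTOR_LOOP _Pragma("omp simd")
#else
#define VECTOR_LOOP
#endif

// Illustrative elementwise body standing in for the lstm_unit loops.
void elementwise_body(std::vector<float>& h, const std::vector<float>& x) {
  const int D = static_cast<int>(x.size());
  VECTOR_LOOP for (int d = 0; d < D; ++d) {
    h[d] = 1.f / (1.f + x[d] * x[d]);
  }
}
```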
Test Plan:
Check failures at OSS CI:
- No build failures related to this change
- Failing tests are:
  - py3.6-clang7-rocmdeb-ubuntu16.04-test2
    > RuntimeError: fft: ATen not compiled with MKL support
  - caffe2_onnx_ort2_py3_6_clang7_ubuntu16_04_test
    > gradient_check_test.py::TestMakeTwo exited with exit status 1
  - pytorch_macos_10_13_py3_test, test errors like:
    > ERROR [0.014s]: test_boolean_indexing_weirdness_cpu (__main__.NumpyTestsCPU)
    > RuntimeError: shape mismatch: indexing tensors could not be broadcast together with shapes [0], [2]
  - caffe2_onnx_ort1_py3_6_clang7_ubuntu16_04_test
    - No failure info
Reviewed By: jspark1105
Differential Revision: D20484640
fbshipit-source-id: 8fb82dbd6698c8de3e0bbbc0b48d15b70e36ca94
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32974
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/286
Re-attempt of D18805426. Decided to be consistent with PyTorch Adagrad.
There was an inconsistency in the order of operations between the scalar and SIMD code when we compute Adagrad. This diff makes them consistent by doing w += lr * grad / (sqrt(moment) + epsilon) in Adagrad and w += lr / (sqrt(moment) + epsilon) * grad in RowWiseSparseAdagrad.
The Adagrad order is consistent with PyTorch (see the addcmul_cpu_kernel function in aten/src/ATen/native/cpu/PointwiseOpsKernel.cpp). The RowWiseSparseAdagrad order makes the computation more efficient: in RowWiseSparseAdagrad, lr / (sqrt(moment) + epsilon) is shared among all elements in the row.
And, we're not going to use FMA to be consistent with PyTorch (even though it provides a little accuracy benefit)
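The two orderings are mathematically identical but can differ in the last float bits, which is exactly why scalar and SIMD paths must pick one and agree. A standalone sketch of both (illustrative helper functions, not the FBGEMM kernels):

```cpp
#include <cassert>
#include <cmath>

// Adagrad path: multiply by grad first, then divide.
float adagrad_update(float w, float lr, float grad, float moment, float eps) {
  return w + lr * grad / (std::sqrt(moment) + eps);
}

// RowWiseSparseAdagrad path: the factor lr / (sqrt(moment) + eps) is
// shared by the whole row, so it is computed once and reused.
float rowwise_update(float w, float lr, float grad, float moment, float eps) {
  const float effective_lr = lr / (std::sqrt(moment) + eps);
  return w + effective_lr * grad;
}
```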
Test Plan: CI
Reviewed By: wx1988
Differential Revision: D19342865
fbshipit-source-id: e950c16f2e1c4a2f2a3ef53b1705db373c67f341
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32683
Pull Request resolved: https://github.com/pytorch/glow/pull/4079
Similar to D17768404, we changed the 8-bit fused version of the EmbeddingBag operator to add the option to include the last offset and to parallelize the op.
ghstack-source-id: 97404645
Test Plan:
To generate the AVX2 code (`embedding_lookup_fused_8bit_rowwise_idx_avx2.cc`):
```
python hp_emblookup_codegen.py --fused --use-offsets
```
To test the correctness:
```
buck test //caffe2/torch/fb/sparsenn:test -- test_embedding_bag_byte_rowwise_offsets --print-passing-details
```
Reviewed By: yinghai
Differential Revision: D19592761
fbshipit-source-id: f009d675ea3f2228f62e9f86b7ccb94700a0dfe0
Summary:
Pull Request resolved: https://github.com/pytorch/glow/pull/4049
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27477
We would like to add the intra-op parallelization support for the EmbeddingBag operator.
This should bring speedup for the DLRM benchmark:
https://github.com/pytorch/pytorch/pull/24385
Benchmark code:
```
from __future__ import absolute_import, division, print_function, unicode_literals
import torch
import time
eb = torch.nn.EmbeddingBag(1000000, 64, mode='sum')
input = torch.LongTensor(1500).random_(0, 1000000)
offsets = torch.zeros(64, dtype=torch.int64)
niter = 10000
s = time.time()
for _ in range(niter):
    out = eb(input, offsets)
time_per_iter = (time.time() - s) / niter
print('time_per_iter', time_per_iter)
print('GB/s', (input.numel() * 64 * 4 + out.numel() * 4) / time_per_iter / 1e9)
```
The following results are single-core on Skylake T6:
- Before our change (with the original caffe2::EmbeddingLookup):
  time_per_iter 6.313693523406982e-05
  GB/s 6.341517821789133
- After our change, using the EmbeddingLookupIdx API which takes offsets instead of lengths:
  time_per_iter 5.7627105712890626e-05
  GB/s 6.947841559053659
- With Intel's PR: https://github.com/pytorch/pytorch/pull/24385
  time_per_iter 7.393271923065185e-05
  GB/s 5.415518381664018
As for multi-core performance: because Clang doesn't work with OpenMP, I can only measure single-core performance on SKL T6.
ghstack-source-id: 97124557
Test Plan:
With D16990830:
```
buck run mode/dev //caffe2/caffe2/perfkernels:embedding_bench
```
With D17750961:
```
buck run mode/opt //experimental/jianyuhuang/embeddingbag:eb
buck run mode/opt-lto //experimental/jianyuhuang/embeddingbag:eb
```
OSS test
```
python run_test.py -i nn -- TestNNDeviceTypeCPU.test_EmbeddingBag_per_sample_weights_and_new_offsets_cpu
```
Buck test
```
buck test mode/dev-nosan //caffe2/test:nn -- "test_EmbeddingBag_per_sample_weights_and_new_offsets_cpu"
OMP_NUM_THREADS=3 buck test mode/opt -c pytorch.parallel_backend=tbb //caffe2/test:nn -- "test_EmbeddingBag_per_sample_weights_and_new_offsets" --print-passing-details
```
Generate the AVX2 code for embedding_lookup_idx_avx2.cc:
```
python hp_emblookup_codegen.py --use-offsets
```
Differential Revision: D17768404
fbshipit-source-id: 8dcd15a62d75b737fa97e0eff17f347052675700
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31470
Optimize performance of these two operators.
Additionally, use nearbyint instead of round to be consistent with the 4-bit embedding table quantization.
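The difference between the two is only in how ties are broken: `std::round` rounds halfway cases away from zero, while `std::nearbyint` honors the current rounding mode (round-to-nearest-even by default). A minimal illustration:

```cpp
#include <cassert>
#include <cmath>

// round: ties away from zero. nearbyint: ties to even under the
// default FE_TONEAREST rounding mode, matching the 4-bit
// embedding-table quantization behavior.
float round_ties_away(float x) { return std::round(x); }
float round_ties_even(float x) { return std::nearbyint(x); }
```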
Reviewed By: hyuen
Differential Revision: D19072103
fbshipit-source-id: efe96f14aeff7958cceb453ed625d3fd693891ff
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30737
Original commit changeset: 2a8b2a3f5401
Reverting this to be safe until we address test failures in T58528495
Test Plan: CI
Reviewed By: wx1988
Differential Revision: D18812384
fbshipit-source-id: 2a3ac554024773022ec827f259127e4c8cffe6e2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30449
There was an inconsistency in the order of operation between scalar and SIMD code when we compute Adagrad.
In this diff we first compute effective_lr = lr / (sqrt(moment) + epsilon) and then multiply with gradient.
Test Plan: CI
Reviewed By: protonu
Differential Revision: D18703416
fbshipit-source-id: 2a8b2a3f5401466549561412bd22f07abac3c598
Summary:
We (me, fnabulsi, bmcdb) have a handful of fixes used locally to build and run with clang-cl. I am aware of https://github.com/pytorch/pytorch/issues/8784, but it has not been touched in almost a year.
It may be more practical to upstream the non-controversial fixes piecewise. For example, this one.
Here, the dummy version of `_cvtsh_ss` for MSVC is not required (and hence causes conflicts) when using clang-cl, so it can be #ifdef'd out.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29726
Differential Revision: D18478120
Pulled By: ezyang
fbshipit-source-id: cdcd94251e68347446f2ad1ac5a0e71089f7d0ab
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27635
PyTorch uses `offsets` instead of `lengths` for embedding table lookup. This adds support for that in the fused quantized version.
AVX2 version is generated with
```
python caffe2/caffe2/perfkernels/hp_emblookup_codegen.py --fused --use-offsets
```
Test Plan:
```
buck test caffe2/torch/fb/sparsenn:test
```
Reviewed By: jianyuh
Differential Revision: D17826873
fbshipit-source-id: 23c4a96d92521deaebc02b688ad735d76a4476df
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25650
This PR removes protobuf dependencies from mobile build altogether:
- caffe2/proto: protobuf files, including caffe2.proto and torch.proto;
- caffe2 components that depend on caffe2.proto, including most part of
caffe2/core, caffe2/utils;
- libprotobuf / libprotobuf-lite dependencies;
- protobuf compiler;
- some utils class, e.g.: netdef_converter.cpp;
- introduce a macro to disable third_party/onnx which depends on protobuf;
Test Plan:
- builds;
- link with demo app to make sure it can load and run a model in pickle format;
Differential Revision: D17183548
Pulled By: ljk53
fbshipit-source-id: fe60b48674f29c4a9b58fd1cf8ece44191491531
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25670
This is part of the effort to get rid of the protobuf dependency for
the libtorch mobile build.
embedding_lookup_idx.cc is used by ATen/EmbeddingBag.cpp. It indirectly
includes caffe2.pb.h but doesn't really need it. Clean up the headers to
unblock the no-protobuf mobile build.
The broader problem is that many common headers in pytorch/caffe2 directly
or indirectly include caffe2.pb.h. After landing the stack of changes to
remove protobuf from the OSS libtorch mobile build, this is going to constrain
how ATen and other parts of pytorch use caffe2 components: it will break
OSS mobile CI if a PR introduces a dependency on a caffe2 file that
indirectly includes caffe2.pb.h. We will need to tease out caffe2.pb.h
dependencies like in this diff, or do a refactor to replace the protobuf-generated
types.
Chatted with gchanan and ezyang to confirm that there is no plan to
add more dependencies to caffe2 components from ATen in the near future,
so this should be fine.
Test Plan: - build locally with stacked diffs
Differential Revision: D17191913
Pulled By: ljk53
fbshipit-source-id: 1248fe6424060a8bedcf20e73942b7500ae5e815
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24944
As the title says, we would like to make the EmbeddingLookup APIs take offsets rather than lengths, to match PyTorch's EmbeddingBag.
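The relationship between the two layouts: `offsets` is the exclusive prefix sum of `lengths`, e.g. bag sizes {2, 3, 1} become bag starts {0, 2, 5}. A hypothetical converter sketching the mapping (not code from the diff):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert a lengths-based layout (bag sizes) into the offsets-based
// layout (bag start positions) used by PyTorch's EmbeddingBag.
std::vector<std::int64_t> lengths_to_offsets(const std::vector<std::int64_t>& lengths) {
  std::vector<std::int64_t> offsets(lengths.size());
  std::int64_t running = 0;  // exclusive prefix sum
  for (std::size_t i = 0; i < lengths.size(); ++i) {
    offsets[i] = running;
    running += lengths[i];
  }
  return offsets;
}
```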
ghstack-source-id: 88883902
Test Plan:
python hp_emblookup_codegen.py --use-offsets
Check the benchmark in D16990830.
Reviewed By: jspark1105
Differential Revision: D16924271
fbshipit-source-id: 7fac640c8587db59fd2304bb8e8d63c413f27cb8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21169
We should minimize dependencies in perfkernels (we were including Eigen header files only in .cc files not compiled with AVX or AVX2 options, but it is better to be very strict, because it is easy to introduce illegal-instruction errors in perfkernels).
Reviewed By: salexspb
Differential Revision: D15563839
fbshipit-source-id: d4b1bca22d7f2e6f20f23664d4b99498e5984586