Commit graph

1457 commits

Author SHA1 Message Date
Martin Yuan
d8c3d555e4 [Delegate] Support composite of lowered sub modules of the same backend (#59921)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59921

Test Plan: Imported from OSS

Reviewed By: raziel

Differential Revision: D29091143

Pulled By: iseeyuan

fbshipit-source-id: 9ffcd18681917ece8ec73a34866c53701bdee1bc
2021-06-25 07:18:32 -07:00
Luca Wehrstedt
a016150163 Move torch/lib/c10d to torch/csrc/distributed/c10d (#60543)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60543

Since now c10d is part of libtorch, it would also be nice if the sources lived all in one place.
ghstack-source-id: 132306292

Test Plan: It builds

Reviewed By: cbalioglu

Differential Revision: D29062002

fbshipit-source-id: d9e1301e9d73e1643fa0f0119cd2d618f1ad52e6
2021-06-24 12:38:51 -07:00
Raghavan Raman
d3a8505ee1 [jit] Added a pass to transform aten::cat ops to prim::Concat op with variable number of inputs (#59881)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59881

This pass is not included in the JIT flow or anywhere else at this point. The idea is that, once this lands, everyone can use it to test their workflows with this transformation, and once we are convinced it is useful and/or improves performance, we can include it in the appropriate workflow.

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29277876

Pulled By: navahgar

fbshipit-source-id: b5be7bdcc98dced59295bd7b8f6627619cb58d41
2021-06-24 01:27:41 -07:00
Hui Guo
d867340c7b [nnc] Add LoopNest::getLoopAt to retrieve a specified inner For-stmt (#60569)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60569

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D29337767

Pulled By: huiguoo

fbshipit-source-id: e3ae23c1b290739c03d1fa5d7da25de878eb1d4c
2021-06-23 15:53:29 -07:00
Hui Guo
c0d08dc10f [NNC] Add tile transformation in loopnest (fixed #52785) (#57758)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57758

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D28260744

Pulled By: huiguoo

fbshipit-source-id: 6b5591850aaf46455bf3c2d776fa930654839a63
2021-06-23 15:52:19 -07:00
Eli Uriegas
2dedd96dd2 cmake: Prefer CMAKE_CURRENT_SOURCE_DIR to TORCH_SRC_DIR (#60493)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60493

TORCH_SRC_DIR appears to be a bit bugged when it comes to identifying
include directories, so let's try using CMAKE_CURRENT_SOURCE_DIR
instead.

<details>
<summary>Logs for builds with torchaudio</summary>

```
-- Building version 0.10.0a0+9e36281
running bdist_wheel
running build
running build_py
copying torchaudio/version.py -> build/lib.linux-x86_64-3.6/torchaudio
running build_ext
-- Configuring done
-- Generating done
-- Build files have been written to: /home/eliuriegas/work/audio/build/temp.linux-x86_64-3.6
[1/11] /usr/lib64/ccache/c++ -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -I../../third_party/kaldi/src -I../../third_party/kaldi/submodule/src -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/breakpad -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility=hidden -O3 -DNDEBUG -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -std=gnu++14 -MD -MT third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/base/kaldi-error.cc.o -MF third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/base/kaldi-error.cc.o.d -o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/base/kaldi-error.cc.o -c ../../third_party/kaldi/submodule/src/base/kaldi-error.cc
[2/11] /usr/lib64/ccache/c++ -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -I../../third_party/kaldi/src -I../../third_party/kaldi/submodule/src -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/breakpad -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility=hidden -O3 -DNDEBUG -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -std=gnu++14 -MD -MT third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/base/kaldi-math.cc.o -MF third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/base/kaldi-math.cc.o.d -o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/base/kaldi-math.cc.o -c ../../third_party/kaldi/submodule/src/base/kaldi-math.cc
[3/11] /usr/lib64/ccache/c++ -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -I../../third_party/kaldi/src -I../../third_party/kaldi/submodule/src -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/breakpad -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility=hidden -O3 -DNDEBUG -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -std=gnu++14 -MD -MT third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/feature-functions.cc.o -MF third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/feature-functions.cc.o.d -o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/feature-functions.cc.o -c ../../third_party/kaldi/submodule/src/feat/feature-functions.cc
[4/11] /usr/lib64/ccache/c++ -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -I../../third_party/kaldi/src -I../../third_party/kaldi/submodule/src -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/breakpad -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility=hidden -O3 -DNDEBUG -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -std=gnu++14 -MD -MT third_party/kaldi/CMakeFiles/kaldi.dir/src/matrix/kaldi-matrix.cc.o -MF third_party/kaldi/CMakeFiles/kaldi.dir/src/matrix/kaldi-matrix.cc.o.d -o third_party/kaldi/CMakeFiles/kaldi.dir/src/matrix/kaldi-matrix.cc.o -c ../../third_party/kaldi/src/matrix/kaldi-matrix.cc
[5/11] /usr/lib64/ccache/c++ -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -I../../third_party/kaldi/src -I../../third_party/kaldi/submodule/src -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/breakpad -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility=hidden -O3 -DNDEBUG -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -std=gnu++14 -MD -MT third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/resample.cc.o -MF third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/resample.cc.o.d -o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/resample.cc.o -c ../../third_party/kaldi/submodule/src/feat/resample.cc
[6/11] /usr/lib64/ccache/c++ -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -I../../third_party/kaldi/src -I../../third_party/kaldi/submodule/src -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/breakpad -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility=hidden -O3 -DNDEBUG -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -std=gnu++14 -MD -MT third_party/kaldi/CMakeFiles/kaldi.dir/src/matrix/kaldi-vector.cc.o -MF third_party/kaldi/CMakeFiles/kaldi.dir/src/matrix/kaldi-vector.cc.o.d -o third_party/kaldi/CMakeFiles/kaldi.dir/src/matrix/kaldi-vector.cc.o -c ../../third_party/kaldi/src/matrix/kaldi-vector.cc
[7/11] /usr/lib64/ccache/c++ -DINCLUDE_KALDI -DTORCH_API_INCLUDE_EXTENSION_H -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -D_torchaudio_EXPORTS -I../../ -I/tmp/tmp.GKeM3KKcFi/include/python3.6m -I../../third_party/kaldi/src -I../../third_party/kaldi/submodule/src -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/breakpad -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -O3 -DNDEBUG -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -std=gnu++14 -MD -MT torchaudio/csrc/CMakeFiles/_torchaudio.dir/kaldi.cpp.o -MF torchaudio/csrc/CMakeFiles/_torchaudio.dir/kaldi.cpp.o.d -o torchaudio/csrc/CMakeFiles/_torchaudio.dir/kaldi.cpp.o -c ../../torchaudio/csrc/kaldi.cpp
[8/11] /usr/lib64/ccache/c++ -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -I../../third_party/kaldi/src -I../../third_party/kaldi/submodule/src -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/breakpad -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility=hidden -O3 -DNDEBUG -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -std=gnu++14 -MD -MT third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/pitch-functions.cc.o -MF third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/pitch-functions.cc.o.d -o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/pitch-functions.cc.o -c ../../third_party/kaldi/submodule/src/feat/pitch-functions.cc
../../third_party/kaldi/submodule/src/feat/pitch-functions.cc: In member function ‘void kaldi::OnlinePitchFeatureImpl::UpdateRemainder(const kaldi::VectorBase<float>&)’:
../../third_party/kaldi/submodule/src/feat/pitch-functions.cc:814:11: warning: unused variable ‘full_frame_length’ [-Wunused-variable]
  814 |     int32 full_frame_length = opts_.NccfWindowSize() + nccf_last_lag_;
      |           ^~~~~~~~~~~~~~~~~
../../third_party/kaldi/submodule/src/feat/pitch-functions.cc: In member function ‘void kaldi::OnlineProcessPitch::UpdateNormalizationStats(kaldi::int32)’:
../../third_party/kaldi/submodule/src/feat/pitch-functions.cc:1504:35: warning: comparison of integer expressions of different signedness: ‘std::vector<kaldi::OnlineProcessPitch::NormalizationStats>::size_type’ {aka ‘long unsigned int’} and ‘kaldi::int32’ {aka ‘int’} [-Wsign-compare]
 1504 |   if (normalization_stats_.size() <= frame)
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~
[9/11] : && /usr/bin/cmake -E rm -f third_party/kaldi/libkaldi.a && /usr/bin/ar qc third_party/kaldi/libkaldi.a  third_party/kaldi/CMakeFiles/kaldi.dir/src/matrix/kaldi-vector.cc.o third_party/kaldi/CMakeFiles/kaldi.dir/src/matrix/kaldi-matrix.cc.o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/base/kaldi-error.cc.o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/base/kaldi-math.cc.o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/feature-functions.cc.o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/pitch-functions.cc.o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/resample.cc.o && /usr/bin/ranlib third_party/kaldi/libkaldi.a && :
[10/11] : && /usr/lib64/ccache/c++ -fPIC -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -O3 -DNDEBUG   -shared -Wl,-soname,_torchaudio.so -o torchaudio/csrc/_torchaudio.so torchaudio/csrc/CMakeFiles/_torchaudio.dir/pybind.cpp.o torchaudio/csrc/CMakeFiles/_torchaudio.dir/lfilter.cpp.o torchaudio/csrc/CMakeFiles/_torchaudio.dir/overdrive.cpp.o torchaudio/csrc/CMakeFiles/_torchaudio.dir/utils.cpp.o torchaudio/csrc/CMakeFiles/_torchaudio.dir/kaldi.cpp.o  -Wl,-rpath,/tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/lib:  /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/lib/libc10.so  /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/lib/libtorch_python.so  third_party/kaldi/libkaldi.a  /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/lib/libtorch.so  -Wl,--no-as-needed,"/tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so" -Wl,--as-needed  /usr/local/lib/libbreakpad_client.a  /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/lib/libc10.so  -lpthread  -Wl,--no-as-needed,"/tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/lib/libtorch.so" -Wl,--as-needed  /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/lib/libc10.so && :
[10/11] cd /home/eliuriegas/work/audio/build/temp.linux-x86_64-3.6 && /usr/bin/cmake -P cmake_install.cmake
-- Install configuration: "Release"
-- Installing: /home/eliuriegas/work/audio/build/lib.linux-x86_64-3.6/torchaudio/./_torchaudio.so
-- Set runtime path of "/home/eliuriegas/work/audio/build/lib.linux-x86_64-3.6/torchaudio/./_torchaudio.so" to ""
installing to build/bdist.linux-x86_64/wheel
running install
running install_lib
creating build/bdist.linux-x86_64/wheel
creating build/bdist.linux-x86_64/wheel/torchaudio
copying build/lib.linux-x86_64-3.6/torchaudio/kaldi_io.py -> build/bdist.linux-x86_64/wheel/torchaudio
copying build/lib.linux-x86_64-3.6/torchaudio/transforms.py -> build/bdist.linux-x86_64/wheel/torchaudio
copying build/lib.linux-x86_64-3.6/torchaudio/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio
creating build/bdist.linux-x86_64/wheel/torchaudio/compliance
copying build/lib.linux-x86_64-3.6/torchaudio/compliance/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/compliance
copying build/lib.linux-x86_64-3.6/torchaudio/compliance/kaldi.py -> build/bdist.linux-x86_64/wheel/torchaudio/compliance
creating build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/cmuarctic.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/librispeech.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/libritts.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/vctk.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/commonvoice.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/gtzan.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/ljspeech.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/speechcommands.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/tedlium.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/utils.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/yesno.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
creating build/bdist.linux-x86_64/wheel/torchaudio/_internal
copying build/lib.linux-x86_64-3.6/torchaudio/_internal/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/_internal
copying build/lib.linux-x86_64-3.6/torchaudio/_internal/fft.py -> build/bdist.linux-x86_64/wheel/torchaudio/_internal
copying build/lib.linux-x86_64-3.6/torchaudio/_internal/module_utils.py -> build/bdist.linux-x86_64/wheel/torchaudio/_internal
creating build/bdist.linux-x86_64/wheel/torchaudio/backend
copying build/lib.linux-x86_64-3.6/torchaudio/backend/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/backend
copying build/lib.linux-x86_64-3.6/torchaudio/backend/common.py -> build/bdist.linux-x86_64/wheel/torchaudio/backend
copying build/lib.linux-x86_64-3.6/torchaudio/backend/no_backend.py -> build/bdist.linux-x86_64/wheel/torchaudio/backend
copying build/lib.linux-x86_64-3.6/torchaudio/backend/soundfile_backend.py -> build/bdist.linux-x86_64/wheel/torchaudio/backend
copying build/lib.linux-x86_64-3.6/torchaudio/backend/sox_io_backend.py -> build/bdist.linux-x86_64/wheel/torchaudio/backend
copying build/lib.linux-x86_64-3.6/torchaudio/backend/utils.py -> build/bdist.linux-x86_64/wheel/torchaudio/backend
creating build/bdist.linux-x86_64/wheel/torchaudio/extension
copying build/lib.linux-x86_64-3.6/torchaudio/extension/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/extension
copying build/lib.linux-x86_64-3.6/torchaudio/extension/extension.py -> build/bdist.linux-x86_64/wheel/torchaudio/extension
creating build/bdist.linux-x86_64/wheel/torchaudio/models
copying build/lib.linux-x86_64-3.6/torchaudio/models/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/models
copying build/lib.linux-x86_64-3.6/torchaudio/models/conv_tasnet.py -> build/bdist.linux-x86_64/wheel/torchaudio/models
copying build/lib.linux-x86_64-3.6/torchaudio/models/deepspeech.py -> build/bdist.linux-x86_64/wheel/torchaudio/models
copying build/lib.linux-x86_64-3.6/torchaudio/models/wav2letter.py -> build/bdist.linux-x86_64/wheel/torchaudio/models
copying build/lib.linux-x86_64-3.6/torchaudio/models/wavernn.py -> build/bdist.linux-x86_64/wheel/torchaudio/models
creating build/bdist.linux-x86_64/wheel/torchaudio/models/wav2vec2
copying build/lib.linux-x86_64-3.6/torchaudio/models/wav2vec2/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/models/wav2vec2
copying build/lib.linux-x86_64-3.6/torchaudio/models/wav2vec2/components.py -> build/bdist.linux-x86_64/wheel/torchaudio/models/wav2vec2
copying build/lib.linux-x86_64-3.6/torchaudio/models/wav2vec2/model.py -> build/bdist.linux-x86_64/wheel/torchaudio/models/wav2vec2
creating build/bdist.linux-x86_64/wheel/torchaudio/models/wav2vec2/utils
copying build/lib.linux-x86_64-3.6/torchaudio/models/wav2vec2/utils/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/models/wav2vec2/utils
copying build/lib.linux-x86_64-3.6/torchaudio/models/wav2vec2/utils/import_fairseq.py -> build/bdist.linux-x86_64/wheel/torchaudio/models/wav2vec2/utils
copying build/lib.linux-x86_64-3.6/torchaudio/models/wav2vec2/utils/import_huggingface.py -> build/bdist.linux-x86_64/wheel/torchaudio/models/wav2vec2/utils
creating build/bdist.linux-x86_64/wheel/torchaudio/sox_effects
copying build/lib.linux-x86_64-3.6/torchaudio/sox_effects/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/sox_effects
copying build/lib.linux-x86_64-3.6/torchaudio/sox_effects/sox_effects.py -> build/bdist.linux-x86_64/wheel/torchaudio/sox_effects
creating build/bdist.linux-x86_64/wheel/torchaudio/utils
copying build/lib.linux-x86_64-3.6/torchaudio/utils/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/utils
copying build/lib.linux-x86_64-3.6/torchaudio/utils/sox_utils.py -> build/bdist.linux-x86_64/wheel/torchaudio/utils
creating build/bdist.linux-x86_64/wheel/torchaudio/functional
copying build/lib.linux-x86_64-3.6/torchaudio/functional/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/functional
copying build/lib.linux-x86_64-3.6/torchaudio/functional/filtering.py -> build/bdist.linux-x86_64/wheel/torchaudio/functional
copying build/lib.linux-x86_64-3.6/torchaudio/functional/functional.py -> build/bdist.linux-x86_64/wheel/torchaudio/functional
creating build/bdist.linux-x86_64/wheel/torchaudio/prototype
copying build/lib.linux-x86_64-3.6/torchaudio/prototype/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/prototype
copying build/lib.linux-x86_64-3.6/torchaudio/prototype/rnnt_loss.py -> build/bdist.linux-x86_64/wheel/torchaudio/prototype
copying build/lib.linux-x86_64-3.6/torchaudio/version.py -> build/bdist.linux-x86_64/wheel/torchaudio
copying build/lib.linux-x86_64-3.6/torchaudio/_torchaudio.so -> build/bdist.linux-x86_64/wheel/torchaudio
running install_egg_info
running egg_info
writing torchaudio.egg-info/PKG-INFO
writing dependency_links to torchaudio.egg-info/dependency_links.txt
writing requirements to torchaudio.egg-info/requires.txt
writing top-level names to torchaudio.egg-info/top_level.txt
reading manifest file 'torchaudio.egg-info/SOURCES.txt'
writing manifest file 'torchaudio.egg-info/SOURCES.txt'
Copying torchaudio.egg-info to build/bdist.linux-x86_64/wheel/torchaudio-0.10.0a0+9e36281-py3.6.egg-info
running install_scripts
adding license file "LICENSE" (matched pattern "LICEN[CS]E*")
creating build/bdist.linux-x86_64/wheel/torchaudio-0.10.0a0+9e36281.dist-info/WHEEL
creating 'dist/torchaudio-0.10.0a0+9e36281-cp36-cp36m-linux_x86_64.whl' and adding 'build/bdist.linux-x86_64/wheel' to it
adding 'torchaudio/__init__.py'
adding 'torchaudio/_torchaudio.so'
adding 'torchaudio/kaldi_io.py'
adding 'torchaudio/transforms.py'
adding 'torchaudio/version.py'
adding 'torchaudio/_internal/__init__.py'
adding 'torchaudio/_internal/fft.py'
adding 'torchaudio/_internal/module_utils.py'
adding 'torchaudio/backend/__init__.py'
adding 'torchaudio/backend/common.py'
adding 'torchaudio/backend/no_backend.py'
adding 'torchaudio/backend/soundfile_backend.py'
adding 'torchaudio/backend/sox_io_backend.py'
adding 'torchaudio/backend/utils.py'
adding 'torchaudio/compliance/__init__.py'
adding 'torchaudio/compliance/kaldi.py'
adding 'torchaudio/datasets/__init__.py'
adding 'torchaudio/datasets/cmuarctic.py'
adding 'torchaudio/datasets/commonvoice.py'
adding 'torchaudio/datasets/gtzan.py'
adding 'torchaudio/datasets/librispeech.py'
adding 'torchaudio/datasets/libritts.py'
adding 'torchaudio/datasets/ljspeech.py'
adding 'torchaudio/datasets/speechcommands.py'
adding 'torchaudio/datasets/tedlium.py'
adding 'torchaudio/datasets/utils.py'
adding 'torchaudio/datasets/vctk.py'
adding 'torchaudio/datasets/yesno.py'
adding 'torchaudio/extension/__init__.py'
adding 'torchaudio/extension/extension.py'
adding 'torchaudio/functional/__init__.py'
adding 'torchaudio/functional/filtering.py'
adding 'torchaudio/functional/functional.py'
adding 'torchaudio/models/__init__.py'
adding 'torchaudio/models/conv_tasnet.py'
adding 'torchaudio/models/deepspeech.py'
adding 'torchaudio/models/wav2letter.py'
adding 'torchaudio/models/wavernn.py'
adding 'torchaudio/models/wav2vec2/__init__.py'
adding 'torchaudio/models/wav2vec2/components.py'
adding 'torchaudio/models/wav2vec2/model.py'
adding 'torchaudio/models/wav2vec2/utils/__init__.py'
adding 'torchaudio/models/wav2vec2/utils/import_fairseq.py'
adding 'torchaudio/models/wav2vec2/utils/import_huggingface.py'
adding 'torchaudio/prototype/__init__.py'
adding 'torchaudio/prototype/rnnt_loss.py'
adding 'torchaudio/sox_effects/__init__.py'
adding 'torchaudio/sox_effects/sox_effects.py'
adding 'torchaudio/utils/__init__.py'
adding 'torchaudio/utils/sox_utils.py'
adding 'torchaudio-0.10.0a0+9e36281.dist-info/LICENSE'
adding 'torchaudio-0.10.0a0+9e36281.dist-info/METADATA'
adding 'torchaudio-0.10.0a0+9e36281.dist-info/WHEEL'
adding 'torchaudio-0.10.0a0+9e36281.dist-info/top_level.txt'
adding 'torchaudio-0.10.0a0+9e36281.dist-info/RECORD'
removing build/bdist.linux-x86_64/wheel

```

</details>

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: walterddr

Differential Revision: D29316372

Pulled By: seemethere

fbshipit-source-id: 02be64df6197c0d4bad5a5bfb3cef336c11f53ed
2021-06-23 14:08:19 -07:00
Bert Maher
10e11dbdcd Reland D29190420: [nnc][tests] Tests and benchmarks for computeSum (#60550)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60550

Original commit changeset: ed655497a981

Whatever gcc version OSS Bazel uses wasn't happy move-constructing the
SimpleIREvaluator, so use a unique_ptr instead.

Test Plan:
CI.  Hope that the gcc version used by OSS Bazel build is
happier with this (it should be), since actually testing it locally is
an intractable pain.

Reviewed By: navahgar

Differential Revision: D29333116

fbshipit-source-id: c3e4b5d8c91eb96a43ae5315a01ca0c0f4d4a99d
2021-06-23 10:50:03 -07:00
Anjali Chourdia
b14f19b6fe Revert D29190420: [nnc][tests] Tests and benchmarks for computeSum
Test Plan: revert-hammer

Differential Revision:
D29190420 (21479ad20c)

Original commit changeset: 86246df82098

fbshipit-source-id: ed655497a981783da4c8f13e2d7fec104e3cb184
2021-06-23 06:59:37 -07:00
Bert Maher
21479ad20c [nnc][tests] Tests and benchmarks for computeSum (#60160)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60160

Adds a few simple tests and benchmarks for the `computeSum` op
(equivalent to `at::sum`).

The benchmarks test 1D reduction and 2D row and column reduction.  Performance
is in the ballpark of aten (14-15 GB/s) on my skylake devserver for all cases,
and occasionally better (e.g. 256k * 64 row reduction goes from 9 GB/s to 13).

Results (on my skylake-avx512, with turbo disabled):
```
------------------------------------------------------------------------------------------
Benchmark                                   Time           CPU Iterations UserCounters...
------------------------------------------------------------------------------------------
Reduce1D/Torch/16777216               4746995 ns    4746722 ns        150 BYTES=14.1379G/s
Reduce1D/Naive/16777216              34063215 ns   34061388 ns         21 BYTES=1.97023G/s
Reduce1D/NativeRfactor/16777216       5057175 ns    5057167 ns        139 BYTES=13.2701G/s
Reduce1D/TeNaive/16777216            33868945 ns   33868851 ns         21 BYTES=1.98143G/s
Reduce1D/TeSplitTail/16777216        33902786 ns   33900436 ns         21 BYTES=1.97959G/s
Reduce1D/TeSplitMask/16777216        33922509 ns   33920604 ns         21 BYTES=1.97841G/s
Reduce1D/TeRfactorV1/16777216         5141150 ns    5141002 ns        135 BYTES=13.0537G/s
Reduce1D/Op/16777216                  5140390 ns    5140091 ns        135 BYTES=13.056G/s
Reduce2DCol/Torch/8/2097152          12824403 ns   12823563 ns         55 BYTES=5.8874G/s
Reduce2DCol/Torch/64/262144           8306873 ns    8306743 ns         83 BYTES=8.20507G/s
Reduce2DCol/Torch/4096/4096           7992364 ns    7992239 ns         87 BYTES=8.3988G/s
Reduce2DCol/OpSchedule/8/2097152/0    4866144 ns    4865766 ns        138 BYTES=15.5161G/s
Reduce2DCol/OpSchedule/64/262144/0   36668978 ns   36666415 ns         19 BYTES=1.85885G/s
Reduce2DCol/OpSchedule/4096/4096/0  155862459 ns  155801266 ns          4 BYTES=430.839M/s
Reduce2DCol/OpSchedule/8/2097152/1    8067683 ns    8061117 ns         85 BYTES=9.36563G/s
Reduce2DCol/OpSchedule/64/262144/1    7496686 ns    7496562 ns         93 BYTES=9.09183G/s
Reduce2DCol/OpSchedule/4096/4096/1    5262821 ns    5262186 ns        131 BYTES=12.7562G/s
Reduce2DCol/OpSchedule/8/2097152/2    6237899 ns    6237210 ns        109 BYTES=12.1044G/s
Reduce2DCol/OpSchedule/64/262144/2    5258012 ns    5257655 ns        127 BYTES=12.9635G/s
Reduce2DCol/OpSchedule/4096/4096/2    5231686 ns    5228241 ns        132 BYTES=12.839G/s
Reduce2DCol/OpSchedule/8/2097152/3   11088573 ns   11087557 ns         62 BYTES=6.80921G/s
Reduce2DCol/OpSchedule/64/262144/3    5338843 ns    5338326 ns        127 BYTES=12.7676G/s
Reduce2DCol/OpSchedule/4096/4096/3    4311617 ns    4308102 ns        162 BYTES=15.5812G/s
Reduce2DRow/Torch/8/2097152           4642244 ns    4641794 ns        151 BYTES=14.4575G/s
Reduce2DRow/Torch/64/262144           4628311 ns    4628245 ns        151 BYTES=14.4999G/s
Reduce2DRow/Torch/4096/4096           4894012 ns    4893316 ns        143 BYTES=13.7177G/s
Reduce2DRow/Torch/262144/64          10469098 ns   10468027 ns         68 BYTES=6.51101G/s
Reduce2DRow/Hand/262144/64            5554380 ns    5554059 ns        126 BYTES=12.2716G/s
Reduce2DRow/OpSchedule/8/2097152/0   33890363 ns   33888931 ns         21 BYTES=1.98026G/s
Reduce2DRow/OpSchedule/64/262144/0   33901317 ns   33899436 ns         21 BYTES=1.97965G/s
Reduce2DRow/OpSchedule/4096/4096/0   33500358 ns   33498815 ns         21 BYTES=2.00381G/s
Reduce2DRow/OpSchedule/262144/64/0   13132231 ns   13131049 ns         53 BYTES=5.19056G/s
Reduce2DRow/OpSchedule/8/2097152/1    5200423 ns    5200025 ns        134 BYTES=12.9055G/s
Reduce2DRow/OpSchedule/64/262144/1    5204428 ns    5204327 ns        133 BYTES=12.8949G/s
Reduce2DRow/OpSchedule/4096/4096/1    8724355 ns    8723370 ns         80 BYTES=7.69488G/s
Reduce2DRow/OpSchedule/262144/64/1 1811861280 ns 1811352083 ns          1 BYTES=37.6279M/s
Reduce2DRow/OpSchedule/8/2097152/2    9169829 ns    9168946 ns         76 BYTES=7.31915G/s
Reduce2DRow/OpSchedule/64/262144/2    9159901 ns    9158560 ns         76 BYTES=7.32747G/s
Reduce2DRow/OpSchedule/4096/4096/2    9217398 ns    9215557 ns         76 BYTES=7.28391G/s
Reduce2DRow/OpSchedule/262144/64/2   10820450 ns   10818998 ns         66 BYTES=6.29979G/s
Reduce2DRow/OpSchedule/8/2097152/3    5227921 ns    5226544 ns        133 BYTES=12.84G/s
Reduce2DRow/OpSchedule/64/262144/3    5194362 ns    5194082 ns        133 BYTES=12.9203G/s
Reduce2DRow/OpSchedule/4096/4096/3    5196080 ns    5195349 ns        134 BYTES=12.9203G/s
Reduce2DRow/OpSchedule/262144/64/3    5235189 ns    5234728 ns        133 BYTES=13.0202G/s
```

ghstack-source-id: 131753875

Test Plan: these tests

Reviewed By: navahgar

Differential Revision: D29190420

fbshipit-source-id: 86246df82098da4f5493d6c4f34a40016d95a9f0
2021-06-22 23:04:09 -07:00
Jiakai Liu
b0c9762e2d [pytorch][nnc] external function call to xnnpack ops (#59525)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59525

This PR added NNC external function call binding for two XNNPack ops:
- prepacked::linear_clamp_run
- prepacked::conv2d_clamp_run

Both ops take two arguments: a regular input tensor and a prepacked context
object that contains other parameters like weights/bias/etc. The prepacked
context object's type is a custom class.

NNC doesn't generate assembly code that reads the content of the prepacked
object directly. It simply passes it into the XNNPack ops wrapper, so both
NNC and the generated assembly code don't need to know the custom class type.

At compilation time, we use a size-1 dummy tensor as the placeholder for the
prepacked XNNPack context object.

At runtime, we pass in the raw pointer of the XNNPack context object as if it
were a regular tensor storage data pointer.

Inside the external function call wrapper, we reinterpret_cast the raw pointer
back to the custom class type before dispatching to the XNNPack ops.
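A rough sketch of that calling convention follows. The type and function names here are illustrative stand-ins, not the actual NNC symbols, and the argument layout is simplified:

```cpp
#include <cstdint>

// Illustrative stand-in; the real class lives in ATen's XNNPACK integration.
struct LinearOpContext { /* prepacked weights, bias, output clamp bounds */ };

// The generated code passes the context's raw pointer in the slot where a
// tensor's storage pointer would normally go; the wrapper casts it back
// before dispatching to the XNNPACK op.
extern "C" void nnc_prepacked_linear_clamp_run(void** buf_data) {
  float* out = static_cast<float*>(buf_data[0]);
  float* input = static_cast<float*>(buf_data[1]);
  auto* context = reinterpret_cast<LinearOpContext*>(buf_data[2]);
  (void)out; (void)input; (void)context;  // ...run the XNNPACK kernel here
}
```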
ghstack-source-id: 132135512

Test Plan: unit test

Reviewed By: bertmaher

Differential Revision: D28924934

fbshipit-source-id: 15326b35dc6c022f4c3f247a2037c361e06e80b4
2021-06-22 21:29:31 -07:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
fca931d181 List striding with arbitrary step size (#58537)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58537
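The commit body carries no description; as a hedged illustration, this is the kind of TorchScript that now compiles (the header and `run_method` usage follow the documented `torch::jit::compile` example):

```cpp
#include <torch/jit.h>
#include <torch/torch.h>
#include <iostream>

int main() {
  // Slicing a list with a non-default step is now accepted by the compiler.
  auto cu = torch::jit::compile(R"JIT(
def every_other(xs: List[int]) -> List[int]:
    return xs[::2]
)JIT");
  auto out = cu->run_method("every_other", c10::List<int64_t>({1, 2, 3, 4, 5}));
  std::cout << out << std::endl;  // [1, 3, 5]
}
```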

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D28531721

Pulled By: tugsbayasgalan

fbshipit-source-id: 8c8ed32ca00366603bfb5086e87dfa62736ff4b2
2021-06-22 11:25:23 -07:00
Michael Dagitses
91451369ed require non-empty inputs to grad() calls in the API (#52016)
Summary:
The grad() function needs to return the updated values, and hence
needs non-empty inputs to populate.
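A minimal C++ sketch of the enforced behavior, using the standard libtorch autograd API:

```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
  auto x = torch::ones({2}, torch::requires_grad());
  auto y = (x * x).sum();
  // `inputs` must be non-empty; the returned list is populated one-to-one.
  auto grads = torch::autograd::grad(/*outputs=*/{y}, /*inputs=*/{x});
  std::cout << grads[0] << std::endl;  // 2 * x
}
```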

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52016

Test Plan:
Passes Python and C++ unit tests, and added new tests to catch this behavior.

Fixes https://github.com/pytorch/pytorch/issues/47061

Reviewed By: albanD

Differential Revision: D26406444

Pulled By: dagitses

fbshipit-source-id: 023aeca9a40cd765c5bad6a1a2f8767a33b75a1a
2021-06-22 10:10:58 -07:00
Hariom Narang
9d1d799034 Added API to change logging levels for JIT (#58821)
Summary:
Description:
- Before this, the logging level could only be changed via the env
variable "PYTORCH_JIT_LOG_LEVEL"
    - The level can now be changed from Python
- Have not added stream configuration for now
- Configuration is stored in a singleton class managing the options

Issue Link: https://github.com/pytorch/pytorch/issues/54188

Gotchas:
- Created separate functions
`::torch::jit::get_jit_logging_levels/set_jit_logging_levels` instead of
using the singleton class's method directly
    - This is because when running test cases, two different instances
    of the singleton are created for the test suite and the actual code
    (`jit_log.cpp`)
    - On using these methods directly, `is_enabled` calls the singleton
    in `jit_log.cpp` while we are setting the config using another
    singleton
    - See: https://stackoverflow.com/questions/55467246/my-singleton-can-be-called-multiple-times

API (sketched below):
- To set the level: `torch._C._jit_set_logging_option("level")`
- To get the level: `torch._C._jit_get_logging_option()`
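A short sketch of the C++ entry points named above (the header path is an assumption based on `jit_log.cpp`):

```cpp
#include <torch/csrc/jit/jit_log.h>
#include <iostream>

int main() {
  // Same effect as the PYTORCH_JIT_LOG_LEVEL env variable, but at runtime:
  torch::jit::set_jit_logging_levels(">dead_code_elimination");
  std::cout << torch::jit::get_jit_logging_levels() << std::endl;
}
```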

Testing:
- UTs were added for C++
- A very simple UT was added for python to just check if the API is
being called correctly
- The API was checked by running trace in a sample python file
    - Set env variable to "" and used `_jit_set_logging_option` in python to set the variable to `>dead_code_elimination`
    - The error output had logs of form [DUMP..] [UPDATE...] etc

Fixes https://github.com/pytorch/pytorch/issues/54188

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58821

Reviewed By: soulitzer

Differential Revision: D29116712

Pulled By: ZolotukhinM

fbshipit-source-id: 8f2861ee2bd567fb63b405953d035ca657a3200f
2021-06-21 16:10:49 -07:00
Thomas J. Fan
c16f87949f ENH Adds nn.ReflectionPad3d (#59791)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/27655

This PR adds a C++ and Python version of ReflectionPad3d with structured kernels. The implementation uses lambdas extensively to better share code from the backward and forward pass.
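A quick usage sketch of the new module on the C++ side, assuming the options type follows the existing ReflectionPad2d pattern:

```cpp
#include <torch/torch.h>

int main() {
  // Pad all six sides of an (N, C, D, H, W) input by 1, reflecting at the
  // borders rather than zero-filling.
  torch::nn::ReflectionPad3d pad(torch::nn::ReflectionPad3dOptions(1));
  auto x = torch::arange(8, torch::kFloat).reshape({1, 1, 2, 2, 2});
  auto y = pad(x);  // y.sizes() == {1, 1, 4, 4, 4}
}
```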

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59791

Reviewed By: gchanan

Differential Revision: D29242015

Pulled By: jbschlosser

fbshipit-source-id: 18e692d3b49b74082be09f373fc95fb7891e1b56
2021-06-21 10:53:14 -07:00
Raghavan Raman
d0c4ace00f [jit] Added a transformation to move consumers of aten::cat to its inputs, in the fused subgraphs (#59580)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59580

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28955318

Pulled By: navahgar

fbshipit-source-id: 7504d5aea441920f4eb9234cdfa17077161ab13c
2021-06-18 14:32:07 -07:00
Luca Wehrstedt
08ce5eedf5 [reland] Move RPC agents to libtorch (#60170)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60170

Reland of #59939.

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D29193234

fbshipit-source-id: ee2a90d6be961c10f91361512bdd4cadca43dd60
2021-06-18 05:15:09 -07:00
Bin Bao
3dc8112187 [NNC] Handle int64 indices and loop bounds (#59769)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59769

Allow loop bounds and tensor indices to be either int32 or int64, and avoid unnecessary cast ops.

Test Plan:
```
build/bin/test_tensorexpr
```

Reviewed By: H-Huang

Differential Revision: D29173970

Pulled By: desertfire

fbshipit-source-id: 859a876ddb1b41535b2266089aa1222884295c78
2021-06-17 09:35:59 -07:00
Mikhail Zolotukhin
eb36f67dcc [TensorExpr] Minor cleanup in TensorExprKernel::computeValue (#60041)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60041

Differential Revision: D29146709

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: 49ac919c18f669d7fda1a26c5a74e62ea752df4f
2021-06-17 01:23:24 -07:00
Mike Ruberry
f233274f30 Revert D28875276: Move RPC agents to libtorch
Test Plan: revert-hammer

Differential Revision:
D28875276 (fc50f91929)

Original commit changeset: f2f6970fd74d

fbshipit-source-id: 3c52af652579733ebea8ddfb06576a0ce262bf78
2021-06-17 00:48:58 -07:00
Hao Lu
eda2ddb5b0 [ATen] Fix aten::to schema (#60001)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60001

Fix the aten::to schema to reflect that the output may alias the input.
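The aliasing in question is easy to observe; a minimal sketch:

```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
  auto x = torch::ones({3});
  auto y = x.to(torch::kFloat);  // no-op: x is already float
  // The corrected schema records that the output may alias the input:
  std::cout << (y.data_ptr() == x.data_ptr()) << std::endl;  // prints 1
}
```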

Test Plan: Added new unit tests.

Reviewed By: ezyang

Differential Revision: D29121620

fbshipit-source-id: c29b6aa22d367ffedf06e47116bc46b3e188c39c
2021-06-15 20:04:20 -07:00
Brian Hirsh
27a3204982 generate C++ API for meta functions using at::meta:: (#58570)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58570

**What the PR does**
Generate a fast-path `at::meta::{op}` API for calling meta functions without having to go through the dispatcher. This will be important for perf for external backends that want to use meta functions for shape checking (which seems likely to be what we end up doing for LazyTensorCore).

**Details**
In order to avoid naming collisions I had to make two small changes:
- rename `MetaFunctions.h` template -> `NativeMetaFunctions.h` (this is the file that declares the impl() function for every structured operator).
- rename the meta class: `at::meta::{op}::meta()` -> `at::meta::structured_{op}::meta()`

I also deleted a few unnecessary includes, since any file that includes NativeFunctions.h will automatically include NativeMetaFunctions.h.

**Why I made the change**
This change isn't actually immediately used anywhere; I already started writing it because I thought it would be useful for structured composite ops, but that isn't actually true (see [comment](https://github.com/pytorch/pytorch/pull/58266#issuecomment-843213147)). The change feels useful and unambiguous though so I think it's safe to add. I added explicit tests for C++ meta function calls just to ensure that I wrote it correctly - which is actually how I hit the internal linkage issue in the PR below this in the stack.

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D28711299

Pulled By: bdhirsh

fbshipit-source-id: d410d17358c2b406f0191398093f17308b3c6b9e
2021-06-15 16:54:46 -07:00
Luca Wehrstedt
fc50f91929 Move RPC agents to libtorch (#59939)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59939

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28875276

fbshipit-source-id: f2f6970fd74de5f112636e78edaa4410c61d8c45
2021-06-15 16:20:53 -07:00
Raghavan Raman
b822928e33 [nnc] Removed setGPUBlockIndex and setGPUThreadIndex methods from LoopNest (#59495)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59495

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D28915960

Pulled By: navahgar

fbshipit-source-id: 20a4032b031aba6e43d85433ade5f0680c65fbc0
2021-06-15 10:37:46 -07:00
Raghavan Raman
aa163aeff5 [nnc] Made several LoopNest APIs static (#59494)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59494

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D28915959

Pulled By: navahgar

fbshipit-source-id: bf52e30d893f4d86812219b538a14307f347f10b
2021-06-15 10:36:31 -07:00
Luca Wehrstedt
a1780432fa Move c10d to libtorch(_cuda) (#59563)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59563

ghstack-source-id: 131331264

Test Plan: CI

Reviewed By: malfet

Differential Revision: D28932239

fbshipit-source-id: 5df6cdfa5253b15cbbc97039fe672d6d97321e34
2021-06-15 02:01:31 -07:00
Raghavan Raman
b83ac0cc4e [nnc] Added a check to vectorize only those loops that are normalized. (#59423)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59423

Test Plan: Imported from OSS

Reviewed By: huiguoo

Differential Revision: D28886979

Pulled By: navahgar

fbshipit-source-id: edfc61feaf5efe22d4f367ac718b83b3d0f47cb3
2021-06-11 12:03:34 -07:00
Raghavan Raman
30e24b2d2b [nnc] Modified vectorize API to return bool (#59422)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59422

Test Plan: Imported from OSS

Reviewed By: huiguoo

Differential Revision: D28886980

Pulled By: navahgar

fbshipit-source-id: 58cc3ecd86564a312a132f8260d836b096505095
2021-06-11 12:02:19 -07:00
Rohan Varma
d433a55c94 Replace throw std::runtime_error with TORCH_CHECK in torch/csrc/distributed (#59683)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59683

Replaces usages of throw std::runtime_error("foo") with the better
TORCH_CHECK(false, "foo"), which allows C++ stacktraces to show up when
TORCH_SHOW_CPP_STACKTRACES=1 is set. This will hopefully provide much better
debugging information when debugging crashes/flaky tests.
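A before/after sketch of the pattern being replaced:

```cpp
#include <c10/util/Exception.h>

void check_rank(int rank, int world_size) {
  // Before: throw std::runtime_error("Invalid rank");
  // After: TORCH_CHECK attaches a C++ stacktrace to the error when
  // TORCH_SHOW_CPP_STACKTRACES=1 is set in the environment.
  TORCH_CHECK(rank >= 0 && rank < world_size, "Invalid rank: ", rank);
}
```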
ghstack-source-id: 131167210

Test Plan: CI

Reviewed By: cbalioglu

Differential Revision: D28981327

fbshipit-source-id: 677f569e28600263cab18759eb1b282e0391aa7b
2021-06-11 11:15:49 -07:00
Kimish Patel
2ce21b2e61 [Pytorch backend delegation] Preprocess to accept (#58873)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58873

BackendDebugInfoRecorder

Prior to this PR:
In order to generate debug handles corresponding to the graph being
lowered, the backend's preprocess calls generate_debug_handles and gets a
map of Node*-to-debug_handles.
To facilitate this, to_backend owns a BackendDebugInfoRecorder and
initializes a thread-local pointer to it. The generate_debug_handles
function queries the thread-local pointer to see if there is a valid
BackendDebugInfoRecorder for the context. If there is, it generates debug
handles.

After this PR:
The signature of preprocess is changed so that backends have to register a
preprocess function that accepts an instance of BackendDebugInfoRecorder by
reference. generate_debug_handles is no longer a free function but becomes
part of the API of BackendDebugInfoRecorder. A backend's preprocess
function now calls generate_debug_handles on the BackendDebugInfoRecorder
instead of the free function.

Reason for this change:
The RAII approach of initializing a thread-local pointer results in a loose
contract with backends, which may result in backends not storing debug
information. Making the recorder part of the API (sketched below) means
backends have to be aware of BackendDebugInfoRecorder and must explicitly
choose not to generate/store debug information if they wish to skip it.
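A sketch of the new contract. The signature is abbreviated and partly assumed from the description above; the recorder's header and exact method spelling may differ:

```cpp
#include <torch/script.h>

// Hypothetical backend preprocess under the new contract: the recorder is an
// explicit parameter rather than hidden thread-local state, and debug-handle
// generation is a method on the recorder itself.
c10::IValue preprocess(
    const torch::jit::Module& mod,
    const c10::Dict<c10::IValue, c10::IValue>& method_compile_spec,
    torch::jit::BackendDebugInfoRecorder& debug_info_recorder) {
  auto graph = mod.get_method("forward").graph();
  auto handles = debug_info_recorder.generate_debug_handles(graph);
  // ...lower `graph`, storing `handles` alongside the compiled payload.
  return mod._ivalue();
}
```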

Test Plan:
backend tests

Imported from OSS

Reviewed By: jbschlosser, raziel

Differential Revision: D28648613

fbshipit-source-id: c9b7e7bf0f78e87023ea7bc08612cf893b08cb98
2021-06-11 10:16:00 -07:00
Mikhail Zolotukhin
daa35141e8 Reland: "[TensorExpr] Fix handling of 0-dim tensors." (#59508)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59508

An assert that was triggering in a previous version is now relaxed to
take 0-dim tensors into account.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28918342

Pulled By: ZolotukhinM

fbshipit-source-id: c09b62c9725d1603b0ec11fcc051e7c932af06ae
2021-06-08 22:48:17 -07:00
Jeffrey Wan
f52e202840 Add warning when accessing Tensor::grad() in the C++ API (#59362)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/35379

 - Adds `retains_grad` attribute backed by cpp as a native function (see the sketch below). The python bindings for the function are skipped to be consistent with `is_leaf`.
   - Tried writing it without native function, but the jit test `test_tensor_properties` seems to require that it be a native function (or alternatively maybe it could also work if we manually add a prim implementation?).
 - Python API now uses `retain_grad` implementation from cpp
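A minimal C++ sketch of the attribute:

```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
  auto x = torch::ones({2}, torch::requires_grad());
  auto y = x * 2;  // non-leaf tensor
  y.retain_grad();
  std::cout << y.retains_grad() << std::endl;  // prints 1
  // Accessing grad() on a non-leaf that does not retain its grad now warns
  // instead of silently returning an undefined tensor.
}
```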

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59362

Reviewed By: jbschlosser

Differential Revision: D28969298

Pulled By: soulitzer

fbshipit-source-id: 335f2be50b9fb870cd35dc72f7dadd6c8666cc02
2021-06-08 19:43:21 -07:00
Jeffrey Wan
1733d10399 Warn when backward() is called with create_graph=True (#59412)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/4661
- Add warnings in engine's `execute` function so it can be triggered through both cpp and python codepaths
- Adds an RAII guard version of `c10::Warning::set_warnAlways` and replaces all prior usages of set_warnAlways with the new one (see the sketch below)
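A minimal sketch of the call that now warns:

```cpp
#include <torch/torch.h>

int main() {
  auto x = torch::ones({2}, torch::requires_grad());
  auto y = (x * x).sum();
  // Triggers the new warning about the potential reference cycle between
  // the graph and its grad tensors:
  y.backward(/*gradient=*/{}, /*retain_graph=*/c10::nullopt,
             /*create_graph=*/true);
}
```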

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59412

Reviewed By: jbschlosser

Differential Revision: D28969294

Pulled By: soulitzer

fbshipit-source-id: b03369c926a3be18ce1cf363b39edd82a14245f0
2021-06-08 17:19:04 -07:00
Can Balioglu
cf408c3743 [1/n] [c10d] Introduce a new TCPStore constructor (#58328)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58328

This PR is part of a stack that addresses the GitHub issue #41614; it introduces a new `TCPStore` constructor that takes its optional parameters via a newly introduced `TCPStoreOptions` structure. This gives the API callers the flexibility to specify only the desired options while skipping the rest.

The main motivation behind this change is the introduction of the `multiTenant` constructor option in the second PR of this stack.
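A sketch of the new call shape. The field names follow this description and the current c10d API; the header path assumes the post-move torch/csrc/distributed/c10d layout (see the move commit above):

```cpp
#include <torch/csrc/distributed/c10d/TCPStore.hpp>

int main() {
  c10d::TCPStoreOptions opts;  // defaults for everything we don't set
  opts.port = 29500;
  opts.isServer = true;
  opts.numWorkers = 2;
  auto store = c10::make_intrusive<c10d::TCPStore>("127.0.0.1", opts);
}
```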
ghstack-source-id: 130676384

Test Plan: Run the existing tests since there are no behavioral changes.

Reviewed By: H-Huang

Differential Revision: D28417742

fbshipit-source-id: e6ac2a057f7ad1908581176ee6d2c2554c3c74a9
2021-06-05 07:50:02 -07:00
Nikita Shulga
ba3a90b55e Revert D28819780: [TensorExpr] Fix handling of 0-dim tensors.
Test Plan: revert-hammer

Differential Revision:
D28819780

Original commit changeset: f3feff35a1ce

fbshipit-source-id: 1dca4ac9cea0b67e9f02800f6d5b3c7e4ae1d81a
2021-06-04 19:25:30 -07:00
David Reiss
a682ff7ef1 Add kMaxSupportedBytecodeVersion for Lite Interpreter (#59472)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59472

Previously, the lite interpreter would refuse to load any model
with a version greater than kProducedBytecodeVersion.  Now, we're
able to independently advance the loading and saving code, so we
can roll out changes without breaking forward compatibility.
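A sketch of the decoupled gating (the constant values here are illustrative, not the real ones):

```cpp
#include <cstdint>

// Writer and reader versions can now advance independently.
constexpr uint64_t kProducedBytecodeVersion = 4;      // what we save
constexpr uint64_t kMaxSupportedBytecodeVersion = 5;  // what we can load

bool can_load_bytecode(uint64_t model_version) {
  // Loading is gated on the max *supported* version, so a v5 model loads
  // even while the writer still produces v4.
  return model_version <= kMaxSupportedBytecodeVersion;
}
```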

Test Plan:
CI.
Loaded a bytecode v5 model even with setting kProducedBytecodeVersion
to v4.

Reviewed By: raziel

Differential Revision: D28904350

fbshipit-source-id: 598c22f0adf47d4ed3e976bcbebdf3959dacb1df
2021-06-04 17:55:02 -07:00
Mikhail Zolotukhin
d60efd8207 [TensorExpr] Fix handling of 0-dim tensors. (#59279)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59279

There were some issues with how we handle 0-dim cases in lowerings and
also in how we generate reductions in that special case. This PR fixes
those issues and reenables a bunch of tests.

Differential Revision: D28819780

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: f3feff35a1ce11821ada2f8d04ae9d4be10dc736
2021-06-04 13:58:15 -07:00
Jeffrey Wan
4ae5764d47 Add is_inference to native functions (#58729)
Summary:
Adds `is_inference` as a native function w/ manual cpp bindings.
Also changes instances of `is_inference_tensor` to `is_inference` to be consistent with other properties such as `is_complex`.
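A minimal C++ sketch:

```cpp
#include <torch/torch.h>
#include <c10/core/InferenceMode.h>
#include <iostream>

int main() {
  torch::Tensor t;
  {
    c10::InferenceMode guard;  // tensors created here are inference tensors
    t = torch::ones({3});
  }
  std::cout << t.is_inference() << std::endl;  // prints 1
}
```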

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58729

Reviewed By: mruberry

Differential Revision: D28874507

Pulled By: soulitzer

fbshipit-source-id: 0fa6bcdc72a4ae444705e2e0f3c416c1b28dadc7
2021-06-04 08:59:11 -07:00
Luca Wehrstedt
8f4cfaa9db Fix race condition in TP agent (#58753)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58753

TSAN was (rightfully!) detecting and complaining about a race due to the fact that upon init the TP agent exchanges the device maps between nodes using RPC requests (and by doing so it accesses the device maps) and then sets the reverse device maps (thus possibly modifying the set of devices). This resulted in a data race, i.e., simultaneously reading and writing the set of devices without synchronizing.

One solution is to add a mutex around the devices, which works, but is "annoying". An alternative solution is to make the set of devices immutable (i.e., `const`). For that to work, we need to exchange the device maps without using RPC calls. We can do so using the process group that we need to create anyways.
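A sketch of the design choice with illustrative types: immutability removes the race by construction instead of guarding it with a lock.

```cpp
#include <string>
#include <vector>

struct Agent {
  explicit Agent(std::vector<std::string> devices)
      : devices_(std::move(devices)) {}
  // Fully initialized before any RPC thread can observe it, and never
  // mutated afterwards, so concurrent readers need no mutex.
  const std::vector<std::string> devices_;
};
```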

Since now there's a lot more logic in Python, I've moved (and restructured) all safety checks over there, and removed them from C++.
ghstack-source-id: 130583775

Test Plan: Unit tests

Reviewed By: mrshenli

Differential Revision: D28603754

fbshipit-source-id: 88533e65d72d1eb806dc41bec8d55def5082e290
2021-06-04 06:53:42 -07:00
johnlu
db90533b9e Make JIT not assume that the device is CUDA. (#54238)
Summary:
Decouple the JIT argument spec and shape analysis with CUDA.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54238

Reviewed By: ngimel

Differential Revision: D28802085

Pulled By: Krovatkin

fbshipit-source-id: 4068c9460cdec2d80733f001ca90ea3f5e6d3a7e
2021-06-03 22:21:27 -07:00
Hui Guo
7c4ac9e3ee [NNC] Fix loopnest.cache_accesses for reduce ops (fixed #59002) (#59136)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59136

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D28768598

Pulled By: huiguoo

fbshipit-source-id: 99ab8430bc0ba395e2a041b03a7761de335ddda5
2021-06-03 21:04:14 -07:00
Bin Bao
add291cf66 [JIT] Add a phase to perform inplace<->functional conversion for activation operators (#57477)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57477

Currently the conversion only deals with activation operators. The legality check is somewhat strict for now.

Test Plan:
```
python test/test_jit.py -k test_functional_to_inplace_activation
python test/test_jit.py -k test_inplace_to_functional_activation
```

Reviewed By: mrshenli

Differential Revision: D28155153

Pulled By: desertfire

fbshipit-source-id: df092830c4dff3ce9578ff76285eb7a566b7d81b
2021-06-03 06:43:23 -07:00
Luca Wehrstedt
3a2149a4ce [reland] Make TP agent use streams from Future when sending response (#59212)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59212

Reland of https://github.com/pytorch/pytorch/pull/58428

Until now, the TP agent expected the output of a remote function to be on the same streams as the inputs. In other words, it used the lazy stream context of the inputs to synchronize the output tensors. This was true in the most common case of a synchronous remote function. However it wasn't true for async functions, for fetching RRefs, ... The more generic way is to use the CUDA events held by the Future to perform this synchronization. (These events may be on the input streams, or they may not be!).
ghstack-source-id: 130202842

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28623885

fbshipit-source-id: 29333bcb75d077ab801eac92017d0e381e8f5569
2021-06-02 05:46:05 -07:00
Luca Wehrstedt
5ec169b4c3 [reland] Always use intrusive_ptr for Message (1 out of 2) (#59205)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59205

Reland of https://github.com/pytorch/pytorch/pull/58422

Similar to Future (which I tackled recently), Message is an ivalue type (a "custom class" one), and the natural way to represent it is inside an intrusive_ptr. However in the RPC code we had a mix of usages, often passing Message by value. This has undesirable consequences, as it could easily trigger a copy by accident, which I believe is why in many places we accepted _rvalue references_ to Message, in order to force the caller to move. In my experience this is non-idiomatic in C++ (normally a function signature specifies how the function consumes its arguments, and it's up to the caller to then decide whether to copy or move).

By moving to intrusive_ptr everywhere I think we eliminate and simplify many of the problems above.

In this PR I do half of the migration, by updating everything except the `toMessageImpl` methods, which will come in the next PR.
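A sketch of the idiom (the Message fields are elided):

```cpp
#include <c10/util/intrusive_ptr.h>
#include <utility>

struct Message : c10::intrusive_ptr_target { /* type, payload, tensors... */ };

// Before: void send(Message&& m) forced every caller to std::move.
// After: ownership is explicit, and a "copy" is just a refcount bump.
void send(c10::intrusive_ptr<Message> message) { /* ... */ }

int main() {
  auto m = c10::make_intrusive<Message>();
  send(m);             // share it
  send(std::move(m));  // or hand it off outright
}
```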
ghstack-source-id: 130202849

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28623891

fbshipit-source-id: c9aeea3440679a11741ca78c06b03c57cb815a5e
2021-06-02 05:44:49 -07:00
Joel Schlosser
ef32a29c97 Back out "[pytorch][PR] ENH Adds dtype to nn.functional.one_hot" (#59080)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59080

Original commit changeset: 3686579517cc

Test Plan: None; reverting diff

Reviewed By: albanD

Differential Revision: D28746799

fbshipit-source-id: 75a7885ab0bf3abadde9a42b56d479f71f57c89c
2021-05-27 15:40:52 -07:00
Bert Maher
617b74aa35 [nnc] LLVMCodeGen for any target (#58713)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58713

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D28585722

Pulled By: bertmaher

fbshipit-source-id: 82885b9780dc1a8610660a90969d8d2baad97920
2021-05-27 09:25:15 -07:00
Scott Wolchok
de22657e1c [PyTorch] Replace RecordFunction shouldRun callback with atomic bools (#56504)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56504

Having callbacks registered but disabled via their
`shouldRun` callback defeats the `shouldRunRecordFunction`
optimization (no relation between the two things, despite the
shared prefix on the names) that aims to skip `RecordFunction`
construction.

This diff attempts to safely rectify this issue: we drop support for
`shouldRun` callbacks (this is bc-breaking; does anything use these
externally? do I need to add the support back and just stop using it
internally?), add support for enabling and disabling callbacks, and
(for global callbacks) make doing so thread-safe.

There is an interesting subtlety with `std::atomic` that came up: it
is neither copyable nor movable, which precludes putting it into
`std::vector`. I manually overrode this because the thread safety
reasons it is neither copyable nor movable don't apply here; we
already state that adding or removing callbacks (the operations that
might copy/move an atomic) are not thread-safe and should be done at
initialization time.
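Concretely, the workaround looks something like this sketch:

```cpp
#include <atomic>
#include <vector>

// std::atomic is neither copyable nor movable, so a plain
// std::vector<std::atomic<bool>> cannot be copied or grown through
// reallocation. A wrapper that copies via load()/store() is safe here
// because callbacks are only added or removed at initialization time,
// never concurrently with toggling.
struct CopyableAtomicBool {
  std::atomic<bool> value{false};
  CopyableAtomicBool() = default;
  CopyableAtomicBool(const CopyableAtomicBool& other)
      : value(other.value.load()) {}
  CopyableAtomicBool& operator=(const CopyableAtomicBool& other) {
    value.store(other.value.load());
    return *this;
  }
};

int main() {
  std::vector<CopyableAtomicBool> enabled(4);
  enabled[1].value.store(true);  // enabling/disabling a callback is atomic
}
```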
ghstack-source-id: 129614296

Test Plan:
Existing CI should cover correctness, right?  Inspected
perf report of a simple benchmark that runs nn.Linear in a loop on
CUDA, where internally have Kineto initialized and thus had a
shouldRun observer previously; we are no longer going through the
dispatcher's slow RecordFunction path or spending measurable time
constructing RecordFunction instances.

Reviewed By: ilia-cher

Differential Revision: D27834944

fbshipit-source-id: 93db1bc0a28b5372f7307490c908457e7853fa92
2021-05-26 14:31:33 -07:00
Chen Lai
9ba9a16700 [PyTorch Edge] Use stream as backport_vi_to_vi-1 interface (#58790)
Summary:
Two main changes:
1. Change the arguments of the backport_v{i}_to_v{i-1} collection from (reader, writer) to (input_model_stream, output_model_stream), so it's easier to backport a model in option 2.

>  2) [Both format and content change] Use torch.jit.load() to load the stream,
 and save it to output_model_stream.

2. Fix an issue in the test `backportAllVersionCheck`. Previously it declared `std::ostringstream oss` and used `oss.clear()` to reset the stringstream. However, `clear()` doesn't reset the stream content, which produced a problematic stream. As a mitigation, checks are added to prevent a corrupted stream on each iteration of the while loop (see the sketch below).
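The `clear()` pitfall is a standard-library gotcha worth spelling out:

```cpp
#include <sstream>
#include <cassert>

int main() {
  std::ostringstream oss;
  oss << "model bytes";
  oss.clear();  // only resets the error flags...
  assert(oss.str() == "model bytes");  // ...the buffer still holds the data
  oss.str("");  // this is what actually empties the stream content
  assert(oss.str().empty());
}
```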

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58790

ghstack-source-id: 129929960

Test Plan:
CI
```
buck test mode/dev //caffe2/test/cpp/jit:jit
```

Reviewed By: raziel, iseeyuan

Differential Revision: D28620961

fbshipit-source-id: b0cbe0e88645ae278eb3999e2a84800702b5f985
2021-05-26 02:07:46 -07:00
Chen Lai
60af6e928a [PyTorch Edge][Version] Fix torchscript model after backport (#58892)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58892

The torchscript model after backport is missing the `constants` archive. Add it back, and extend the unit test to run the torchscript part.
ghstack-source-id: 129853819

Test Plan:
```
buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit
- LiteInterpreterTest.BackPortByteCodeModelAllVersions'
```

Reviewed By: raziel, iseeyuan

Differential Revision: D28664507

fbshipit-source-id: 5f98723231cc64ed203c062ee6f00d8adbdccf77
2021-05-25 15:36:56 -07:00
Kimish Patel
ede3f5421f [Pytorch Delegated Backend] Save function name in debug info (#57481)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57481

This diff introduces the function name to InlinedCallStack.
Since we use InlinedCallStack for debug information in the lite
interpreter as well as in delegate backends, where InlinedCallStack cannot
be constructed from model source code, we need to save the function name.
In the absence of a saved function name, Function* is used to get the name
of the function; that works when the JIT compiles code at runtime. When
that is not possible, this diff introduces a way to obtain the function
name.

Test Plan:
test_backend
test_cs_debug_info_serialization

Imported from OSS

Differential Revision: D28159097

Reviewed By: raziel, ZolotukhinM

Pulled By: kimishpatel

fbshipit-source-id: deacaea3325e27273f92ae96cf0cd0789bbd6e72
2021-05-25 13:19:02 -07:00
Kimish Patel
813adf1076 [Pytorch Delegated Backend] Save operator name and function name in (#57441)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57441

debug info

Previous diffs did not save the operator name in debug info. For delegated
backends that identify an op for profiling only via its debug handle, the
operator name should be stored as well.
Furthermore, to complete the debug information, the function name is also
serialized.

Test Plan:
Existing lite interpreter and backend tests

Imported from OSS

Differential Revision:
D28144581
D28144581

Reviewed By: raziel

Pulled By: kimishpatel

fbshipit-source-id: 415210f147530a53b444b07f1d6ee699a3570d99
2021-05-25 13:17:54 -07:00