Commit graph

320 commits

Author SHA1 Message Date
Orion Reblitz-Richardson
5a7b4840d9 Move nanopb-generated ONNX to unique file name (#8773)
* Move nanopb-generated ONNX to unique file name

* fix other places
2018-06-22 09:51:56 -04:00
Richard Zou
8489c4cc6e
Better support for literals in jit script (#8687)
Addresses #8177

A design doc can be found here: [gist](https://gist.github.com/zou3519/4b7f13f03cc9f3612bd9363e6405fa0a) version or [quip](https://fb.quip.com/azL1AqUckBdo) version

General approach:
- Add NumberType, FloatType, IntType to represent Python numbers, floats and ints.
- Emit these types for python literals
- Change aten_schema such that Scalars are NumberType, int64_t and bool are IntType.
- Emit aten::type_as, prim::NumToTensor, and prim::TensorToNum nodes for tensor-number math. (see examples below)
- Erase NumberType,  prim::NumToTensor, and prim::TensorToNum for ONNX export

### Tensor/number math
```
import torch
@torch.jit.script
def fn(x):
    return x + 1
```
```
graph(%x : Dynamic) {
  %1 : int = prim::Constant[value={1}]()
  %2 : Dynamic = prim::NumToTensor(%1)
  %3 : Dynamic = aten::type_as(%2, %x)
  %4 : Dynamic = aten::add[alpha={1}](%x, %4)
  return (%5);
}
```

### Number/Number Math
```
import torch
@torch.jit.script
def fn(zero):
    c = 1 + 1
    return zero + c
```
```
graph(%zero : Dynamic) {
  %1 : int = prim::Constant[value={1}]()
  %2 : int = prim::Constant[value={1}]()
  %3 : Dynamic = prim::num_to_tensor(%1)
  %4 : Dynamic = prim::num_to_tensor(%2)
  %5 : Dynamic = aten::add[alpha={1}](%3, %4)
  %c : int = prim::TensorToNum(%6)  # this is the result of the addition
  ...
  return (%13);
}
```

List of squashed commits:

* Introduce Python Number types

Added: IntType, FloatType, NumberType with
IntType <: NumberType
FloatType <: NumberType

Changed aten_schema so arguments have corresponding types

* Emit a NumberType for python literals.

Also emit a NumberType for Scalar default values.

* Add prim::NumToTensor and prim::TensorToNum

* Add DynamicType -> NumberType implicit cast for bc

* Better ensureTensor error message

* Add ensureTensorOrNumber. Allow passing Number to some functions

Like the range() construct and slices

* Patch IntList to work.

IntList is still a DynamicType in the frontend: a tensor gets built from
a List[int].

Also, IntList[1] is a "union between int and IntList" the way it is
implemented. If the frontend sees an int being passed for an IntList[1]
arg, it converts it to a tensor as well.

* Enforce some order on schemas to avoid overload ambiguity

add(Tensor, Tensor) should appear earlier than add(Tensor, Scalar). This
matches the order in which python_arg_parser parses its arguments.

* Disable std_dim and var_dim tests.

With the new schema information, std(input, keepdim) and std(input, dim)
are ambiguous. This will need to be fixed at a later date.

* Add NumberType erasure pass.

This is used for ONNX export and to ensure that NumberType information
doesn't reach the interpreter

* Add support for mixed tensor/number math ops.

* Tests for new functionality.

Includes:
- Tensor/number math
- number/number math
- EraseNumberTypes pass test

* Patch tests

Update expect tests for:
- decompose_addmm
- loop unrolling tests

Because python numbers are now NumberType, they cannot be returned by
functions anymore. Work around this by using "torch.full", or by adding
a tensor([0]) (taken from FIXME_zerol()). Both approaches are used
because torch.full is more readable, but it is broken in some cases.

* Add erase_number_types to torch/CMakeLists.txt

* Move math back to emitSimpleExpr from emitSugaredExpr

* Remove some dead lines

* Renable some excluded script/trace tests that are fixed.

* Move some tests to expected failure

* Address some comments (more addressing to come)

* Erase relevant aten::type_as nodes in EraseNumberTypes

I also changed it so that EraseNumberTypes is only called for ONNX
export. It is no longer used to prevent
prim::NumToTensor/prim::TensorToNum from reaching shape_analysis or
interpreter.cpp.

shape_analysis infers the type of the output of these nodes to be the
same as their input.

intepreter.cpp treats both of these nodes as no-ops.

* Add reminder to fix std/var

* Call EraseNumberTypes only when exporting a script module

* Update expects after rebase
2018-06-21 15:43:38 -04:00
anderspapitto
48e90e3339 Build system changes (#8627)
* All changes needed to get rid of process_github.sh

* allow thnn_h_path
2018-06-20 17:45:26 -04:00
Teng Li
61c96811be
[c10d] NCCL python binding and CI test, with bug fixes (#8357)
* [c10d] NCCL python binding and CI test, with bug fixes

* Addressed comments and further bug fix

* Made NCCL build optional, made C10D libc10d.a only

* Fixed tests so that NCCL pg won't run when not neeeded

* Addressed comments
2018-06-19 13:02:39 -07:00
cpuhrsch
05c473b85c
Temporarily remove TBB (#8255) 2018-06-18 19:31:57 -04:00
Peter Goldsborough
372d1d6735
Create ATen tensors via TensorOptions (#7869)
* Created TensorOptions

Storing the type in TensorOptions to solve the Variable problem

Created convenience creation functions for TensorOptions and added tests

Converted zeros to TensorOptions

Converted rand to TensorOptions

Fix codegen for TensorOptions and multiple arguments

Put TensorOptions convenience functions into torch namespace too

All factory functions except *_like support TensorOptions

Integrated with recent JIT changes

Support *_like functions

Fix in place modification

Some cleanups and fixes

Support sparse_coo_tensor

Fix bug in Type.cpp

Fix .empty calls in C++ API

Fix bug in Type.cpp

Trying to fix device placement

Make AutoGPU CPU compatible

Remove some auto_gpu.h uses

Fixing some headers

Fix some remaining CUDA/AutoGPU issues

Fix some AutoGPU uses

Fixes to dispatch_tensor_conversion

Reset version of new variables to zero

Implemented parsing device strings

Random fixes to tests

Self review cleanups

flake8

Undo changes to variable.{h,cpp} because they fail on gcc7.2

Add [cuda] tag to tensor_options_cuda.cpp

Move AutoGPU::set_index_from into .cpp file because Windows is stupid and sucks

Fix linker error in AutoGPU.cpp

Fix bad merge conflict in native_functions.yaml

Fixed caffe2/contrib/aten

Fix new window functions added to TensorFactories.cpp

* Removed torch::TensorOptions

Added code to generate wrapper functions for factory methods

Add implicit constructor from Backend to TensorOptions

Remove Var() from C++ API and use torch:: functions

Use torch:: functions more subtly in C++ API

Make AutoGPU::set_device more exception safe

Check status directly in DynamicCUDAHooksInterface

Rename AutoGPU to DeviceGuard

Removed set_requires_grad from python_variables.h and warn appropriately in Variable::set_requires_grad

remove python_default_init: self.type()

Add back original factory functions, but with deprecation warnings

Disable DeviceGuard for a couple functions in ATen

Remove print statement

Fix DeviceGuard construction from undefined tensor

Fixing CUDA device compiler issues

Moved as many methods as possible into header files

Dont generate python functions for deprecated factories

Remove merge conflict artefact

Fix tensor_options_cuda.cpp

Fix set_requires_grad not being checked

Fix tensor_new.h

TEMPORARILY put some methods in .cpp files to see if it solves issues on windows and mac

Fix bug in DeviceGuard.h

Missing includes

TEMPORARILY moving a few more methods into .cpp to see if it fixes windows

Fixing linker errors

* Fix up SummaryOps to use new factories

Undo device agnostic behavior of DeviceGuard

Use -1 instead of optional for default device index

Also move DeviceGuard methods into header

Fixes around device index after optional -> int32_t switch

Fix use of DeviceGuard in new_with_tensor_copy

Fix tensor_options.cpp

* Fix Type::copy(

* Remove test_non_float_params from ONNX tests

* Set requires_grad=False in ONNX tests that use ints

* Put layout/dtype/device on Tensor

* Post merge fixes

* Change behavior of DeviceGuard to match AutoGPU

* Fix C++ API integration tests

* Fix flip functions
2018-06-16 00:40:35 -07:00
Tongzhou Wang
c537fd7432 fix lint (#8567) 2018-06-15 17:34:39 -04:00
Soumith Chintala
dc186cc9fe
Remove NO_* and WITH_* across codebase, except in setup.py (#8555)
* remove legacy options from CMakeLists

* codemod WITH_ to USE_ for WITH_CUDA, WITH_CUDNN, WITH_DISTRIBUTED, WITH_DISTRIBUTED_MW, WITH_GLOO_IBVERBS, WITH_NCCL, WITH_ROCM, WITH_NUMPY

* cover SYSTEM_NCCL, MKLDNN, NNPACK, C10D, NINJA

* removed NO_* variables and hotpatch them only in setup.py

* fix lint
2018-06-15 12:29:48 -04:00
Orion Reblitz-Richardson
edd4e2c5d1
Expose proto utils and ONNX (#8073)
* Expose proto utils and ONNX from PyTorch libcaffe2.so

* Try to use protobuf from _C.so

* Fix ONNX proto header include

* Adjust order of imports for ONNX until nanopb goes away

* Set and use ONNX_NAMESPACE for PyTorch builds

* Show protobuf summary for all builds

* Add ONNX_NAMESPACE for cpp_build

* Statically link libprotobuf.a into libtorch.so

* Set ONNX_NAMESPACE on Windows build

* Move core/dispatch up as well

* Add /MD flag for Windows build of _C

* Potential Windows fix for ONNX and protobuf

* Add direct linkage from _C to ONNX on Windows

* Only include protobuf wrapper for PyTorch

* Pass extra_compile_args to _nvrtc ext build

* Remove installation of .a files
2018-06-13 10:25:32 -07:00
Jorghi12
81b92f7515 Get ROCm building again on master (#8343)
Billing of changes:

- New Jenkins script for building on rocm. For now it is a bit hacked together, but we can improve it once CI is running
- New ROCM docker image for nightly HIP, and also some legacy packages that we need temporarily
- New enabled config py2-clang3.8-rocmnightly-ubuntu16.04-build based off of the existing Caffe2 image (not built yet)
- A big pile of cmake fixes, mostly to turn bits on/off when ROCM build is involved
- Switch from hiprng to hcrng
- Apply some patches directly in code, eliminating the patches
- Use __hdiv instead of hdiv, it's more portable
- THCNumerics<T>::gt doesn't work in HIP, so simulate it with sub
- Add a few more overloads HIP needs
- Turn off use of hcc to link (we plan to turn this back on to get tests running)
- Search for hiprand, hiprng, hipblas, hipsparse
- Better Python 2 portability
2018-06-12 23:05:21 -04:00
albanD
78e3259bbe Add autograd automatic anomaly detection (#7677)
* add autograd automatic anomaly detection

* python 3 string support

* Fix non python build

* fix typo in doc

* better test and naming fix

* fix no python build and python object handling

* fix missing checks

* clean NO_PYTHON build

* Remove unwanted changes
2018-06-11 21:26:17 -04:00
Pieter Noordhuis
695d40efc2
Create initial Python bindings for c10d (#8119)
* Build and install c10d from tools/build_pytorch_libs.sh

* Create initial Python bindings for c10d

* clang-format

* Switch link order to include more symbols

* Add bindings and tests for ProcessGroupGloo

* Add broadcast test

* Separate build flag for c10d

* Explicit PIC property

* Skip c10d tests if not available

* Remove c10d from Windows blacklist

Let it skip by itself because it won't be available anyway.

* Make lint happy

* Comments

* Move c10d module into torch.distributed

* Close tempfile such that it is deleted
2018-06-08 12:59:51 -07:00
Soumith Chintala
5e372c7106 fix lint 2018-06-06 12:53:58 -04:00
Paul Jesse Hellemn
8e6f7a1382
[Caffe2] Merging setup.py with setup_caffe2.py (#8129)
* Mergine setup.pys, torch works, caffe2 works up to other KP

* Fix to super call for python 2

* Works on python2 on mac

* Consolidating Caffe2 flags
2018-06-06 08:31:31 -07:00
Adam Paszke
f45a3d5558
Add a loop unrolling pass to PyTorch JIT (#7672) 2018-06-06 09:36:12 +02:00
Zachary DeVito
23dd033b51 Factor python dependency out of interpreter (#7970)
* Factor python dependency out of interpreter

* Remove NO_PYTHON for the autograd engine

If there is no python bindings, then a default Engine is constructed
the first time it is requested.

If the python libraries are loaded, then they override the default
accessor and the default engine becomes a python Engine.

Note: it is possible for two engines to be generated if a non-python
one gets created before the python bindings are loaded. This case
is rare, and just results in additional threads being spawned.

* Fixing AlexNet test which is skipped in CI
2018-06-01 16:07:21 -04:00
James Reed
1f94a6eab3 [JIT] Fission and fusion passes for addmm (#7938)
* Addmm decomposition pass

* Addmm peephole pass

* Fix handling of output shape in fusion pass

* Add DCE to the peephole passes

* add comments

* maybe bugfix?

* Fix GPU tests

* fix py2/3 test issue
2018-05-30 18:06:58 -04:00
Orion Reblitz-Richardson
4bf0202cac
[build] Have PyTorch depend on minimal libcaffe2.so instead of libATen.so (#7399)
* Have PyTorch depend on minimal libcaffe2.so instead of libATen.so

* Build ATen tests as a part of Caffe2 build

* Hopefully cufft and nvcc fPIC fixes

* Make ATen install components optional

* Add tests back for ATen and fix TH build

* Fixes for test_install.sh script

* Fixes for cpp_build/build_all.sh

* Fixes for aten/tools/run_tests.sh

* Switch ATen cmake calls to USE_CUDA instead of NO_CUDA

* Attempt at fix for aten/tools/run_tests.sh

* Fix typo in last commit

* Fix valgrind call after pushd

* Be forgiving about USE_CUDA disable like PyTorch

* More fixes on the install side

* Link all libcaffe2 during test run

* Make cuDNN optional for ATen right now

* Potential fix for non-CUDA builds

* Use NCCL_ROOT_DIR environment variable

* Pass -fPIC through nvcc to base compiler/linker

* Remove THCUNN.h requirement for libtorch gen

* Add Mac test for -Wmaybe-uninitialized

* Potential Windows and Mac fixes

* Move MSVC target props to shared function

* Disable cpp_build/libtorch tests on Mac

* Disable sleef for Windows builds

* Move protos under BUILD_CAFFE2

* Remove space from linker flags passed with -Wl

* Remove ATen from Caffe2 dep libs since directly included

* Potential Windows fixes

* Preserve options while sleef builds

* Force BUILD_SHARED_LIBS flag for Caffe2 builds

* Set DYLD_LIBRARY_PATH and LD_LIBRARY_PATH for Mac testing

* Pass TORCH_CUDA_ARCH_LIST directly in cuda.cmake

* Fixes for the last two changes

* Potential fix for Mac build failure

* Switch Caffe2 to build_caffe2 dir to not conflict

* Cleanup FindMKL.cmake

* Another attempt at Mac cpp_build fix

* Clear cpp-build directory for Mac builds

* Disable test in Mac build/test to match cmake
2018-05-24 07:47:27 -07:00
Zachary DeVito
286cd04a20
JIT cleanup (#7631)
Cleans up dead code in the JIT:

* Remove interpreter_autograd_function
* Remove Handles
* Remove HandleBuilder
* Remove creates_handles, and tracing_autograd_python_function flags
* Remove unused var_args
* Fix submodules
2018-05-21 10:06:29 -07:00
Adam Paszke
b45f2ff1ae
Remove CompiledFunction + clean up JIT tests (#7421) 2018-05-16 20:03:04 +02:00
Jorghi12
cd86d4c554
PyTorch AMD Build Scripts (#6625)
* PyTorch AMD Build Script.

* Python invocation for hipify

* Adding individual hip fles.

* Updating CWD

Use the actual path for the file instead of the current working directory, which depends on where the script is invoked.

* Updating folder path for amd_build

* Removing previous amd_build directory

* Updated setup.py to support WITH_ROCM

* Renaming the files for CuDNN BatchNorm & Conv since having two .cpp files with the same name results in a linking error in the HCC compiler used for ROCm/AMD.

* Removing old BatchNorm & Conv files since they've been renamed.

* Updating build path to handle ROCM

* Cleaned up the build path and created a FindHIP cmake file for setting up relevant hip paths.

* Seperated the individual patch files to make it easier to detect issues while building.

* Removed CMakeLists hip files and fixed directory structure

* Adding build pytorch amd script

* Merged setup patch into PyTorch setup.py & cleaned a few issues

* Added information on where to download the hipify-python script.

* Resolved linting issues inside of build_pytorch_amd.py

* Removing many unnecessary patch files. Removing unnecessary .hip files. Fixing up the build process.

* Refactored the PR for supporting HIP

* Minimizing the number of changes inside individual patches.

* Cleaned up patch files.

* Removed patch files.

* Updating patches

* Removing HIP change from file.

* Cleaned up patches

* Added AVX/SSE avoidance due to bug with ROCms stack. Just temporary for now.

* Removing the other HIP file

* Removed patch file + merged ROCm into Aten/test

* Removed ATen tests patch file and updated disbale_features yaml to remove headers that don't exist on the HIP stack.

* Reduced the number of patches down to 14 after Edward's suggestions.

* Transferred deletion of certain functions from patch to yaml file.

* Set default Thrust path

* Fixed aten files so we now use the templated pow/abs instead of std:: directly.

* Removed error from aten/src/THCUNN/Abs.cu

* Updated the locations of the cmake build files. Moved THCTensorRandom from a hip to a patch file. Added executable/library commands that can successfully handle either CUDA or HIP.

* Removed hip extraction from the build script and removed the old hip file.

* Replaced MACRO with function in upper level cmake.

* Added empty ELSE() block to prevent the loading of a command without CUDA or HIP. Also added IF guards around torch_cuda_based_add_executable in Aten tests.

* Updated aten tests.

* Removed the hip include from the ATen header.

* Can't throw exceptions on C++ AMP, using abort

* Missing IF guards for cuda/hip executables in aten tests.

* Removed a series of patch files.

* Added template keyword to help out the HCC compiler.

* Rebased the specific files displayed in the PR

* Fixing typo.

* Change flag from "WITH_CUDA" to "NOT NO_CUDA"

Replacing "WITH_CUDA" with "NOT NO_CUDA" after the rebase.

* Fix LoadHIP path

* Updating build files after rebasing.

* Reorganization after cpu/gpu separation.

* Removed HIPCC from setup.py & removed -shared extra linking args.

* Updated CMake / Setup build to correctly link when under ROCm stack.

* Removed the unnecessary argument from Extension constructor.

* Adding another test to be included with ROCm building.

* Updated the setup_helpers scripts in order to get around linter error

* Fix syntax issue

* Solving lint issue: line too long
2018-05-15 18:38:01 -07:00
Zachary DeVito
ce69d3110b
Improve script builtin checking using schema (#7311)
Improve script builtin checking using schema

* This add aten_schema.h which provides a barebones amount of type and
  argument information about each builtin operator
* emitBuiltinCall is updated to use this information rather than
  aten_dispatch to ensure the operator is correct.
* handling of keyword and position arguments now matches python behavior
* There is no longer a requirement that kwargs be constant or that the
  attributes of an op must be entirely constant or non-constant
* compiler now constructs a non-attributed version of the op first and
  then turns it into the constant-attribute version if all attributes
  are constants.
* default arguments for builtins now work
* SugaredValue::call and similar functions now have SourceRange information
  for their arguments so that error reporting is more accurate

Notes:
* This does not try to merge the builtin checking with python arg parser.
  Given that we will eventually have C10 schema which will replace aten_schema,
  we will eventually have a C++ description of the schema and working of that
  description directly will be the easiest form to understand.
* python function calls and script method calls do not support keyword arguments yet.
  When we add this support we should refactor the handling in tryEmitSchema
  that resolves keywords into a common function.

* default arguments work
* keyword arguments to builtins work (still need to extend to calling python and other script methods)
* much better error reporting for incorrect builtins

Lift any constants to attributes on nodes when possible

* Schema  is usable internally in the compiler as
  the function signatures of script functions as well as for builtin
  operators.
* Adds a List[T] class to better represent the arguments to cat/stack
  as a type rather than with custom checking.
* Support kwargs for calls of script methods

A future commit will be needed to add support for:
* calls to script _functions_ which are currently are GraphExecutors without schema info.
* kwargs to python functions, which will require refactoring python op
2018-05-14 14:46:36 -07:00
Zachary DeVito
93eb50c103
Mark expand nodes as implicit/explicit in trace (#7303)
When tracing we record expand nodes. This is useful in some cases because
it makes it clear a broadcast happened. However, in future runs
the broadcast may be different or not needed. This change adds an
attribute to expand to track if it was implicitly added. This
takes the form of an unused input to expand with a default value.

The execution engine then removes implicit expands before execution.
Note that shape_analysis will re-add expands when it can prove by
shape analysis that they will exist and this is useful for the fuser,
so this change should not affect fusion passes.
2018-05-10 10:47:43 -07:00
Edward Z. Yang
64834f6fb8
Split libATen.so into libATen_cpu.so and libATen_cuda.so (#7275)
* Split libATen.so into libATen_cpu.so and libATen_cuda.so

Previously, ATen could be built with either CPU-only support, or
CPU/CUDA support, but only via a compile-time flag, requiring
two separate builds.  This means that if you have a program which
indirectly uses a CPU-only build of ATen, and a CPU/CUDA-build of
ATen, you're gonna have a bad time.  And you might want a CPU-only
build of ATen, because it is 15M (versus the 300M of a CUDA build).

This commit splits libATen.so into two libraries, CPU/CUDA, so
that it's not necessary to do a full rebuild to get CPU-only
support; instead, if you link against libATen_cpu.so only, you
are CPU-only; if you additionally link/dlopen libATen_cuda.so,
this enables CUDA support.  This brings ATen's dynamic library
structure more similar to Caffe2's.  libATen.so is no more
(this is BC BREAKING)

The general principle for how this works is that we introduce
a *hooks* interface, which introduces a dynamic dispatch indirection
between a call site and implementation site of CUDA functionality,
mediated by a static initialization registry.  This means that we can continue
to, for example, lazily initialize CUDA from Context (a core, CPU class) without
having a direct dependency on the CUDA bits.  Instead, we look up
in the registry if, e.g., CUDA hooks have been loaded (this loading
process happens at static initialization time), and if they
have been we dynamic dispatch to this class.  We similarly use
the hooks interface to handle Variable registration.

We introduce a new invariant: if the backend of a type has not
been initialized (e.g., it's library has not been dlopened; for
CUDA, this also includes CUDA initialization), then the Type
pointers in the context registry are NULL.  If you access the
registry directly you must maintain this invariant.

There are a few potholes along the way.  I document them here:

- Previously, PyTorch maintained a separate registry for variable
  types, because no provision for them was made in the Context's
  type_registry.  Now that we have the hooks mechanism, we can easily
  have PyTorch register variables in the main registry.  The code
  has been refactored accordingly.

- There is a subtle ordering issue between Variable and CUDA.
  We permit libATen_cuda.so and PyTorch to be loaded in either
  order (in practice, CUDA is always loaded "after" PyTorch, because
  it is lazily initialized.)  This means that, when CUDA types are
  loaded, we must subsequently also initialize their Variable equivalents.
  Appropriate hooks were added to VariableHooks to make this possible;
  similarly, getVariableHooks() is not referentially transparent, and
  will change behavior after Variables are loaded.  (This is different
  to CUDAHooks, which is "burned in" after you try to initialize CUDA.)

- The cmake is adjusted to separate dependencies into either CPU
  or CUDA dependencies.  The generator scripts are adjusted to either
  generate a file as a CUDA (cuda_file_manager) or CPU file (file_manager).

- I changed all native functions which were CUDA-only (the cudnn functions)
  to have dispatches for CUDA only (making it permissible to not specify
  all dispatch options.)  This uncovered a bug in how we were handling
  native functions which dispatch on a Type argument; I introduced a new
  self_ty keyword to handle this case.  I'm not 100% happy about it
  but it fixed my problem.

  This also exposed the fact that set_history incompletely handles
  heterogenous return tuples combining Tensor and TensorList.  I
  swapped this codegen to use flatten() (at the possible cost of
  a slight perf regression, since we're allocating another vector now
  in this code path).

- thc_state is no longer a public member of Context; use getTHCState() instead

- This PR comes with Registry from Caffe2, for handling static initialization.
  I needed to make a bunch of fixes to Registry to make it more portable

  - No more ##__VA_ARGS__ token pasting; instead, it is mandatory to pass at
    least one argument to the var-args. CUDAHooks and VariableHooks pass a nullary
    struct CUDAHooksArgs/VariableHooksArgs to solve the problem. We must get rid of
    token pasting because it does not work with MSVC.

  - It seems MSVC is not willing to generate code for constructors of template
    classes at use sites which cross DLL boundaries. So we explicitly instantiate
    the class to get around the problem. This involved tweaks to the boilerplate
    generating macros, and also required us to shuffle around namespaces a bit,
    because you can't specialize a template unless you are in the same namespace as
    the template.
  - Insertion of AT_API to appropriate places where the registry must be exported

- We have a general problem which is that on recent Ubuntu distributions,
  --as-needed is enabled for shared libraries, which is (cc @apaszke who was
  worrying about this in #7160 see also #7160 (comment)). For now, I've hacked
  this up in the PR to pass -Wl,--no-as-needed to all of the spots necessary to
  make CI work, but a more sustainable solution is to attempt to dlopen
  libATen_cuda.so when CUDA functionality is requested.

    - The JIT tests somehow manage to try to touch CUDA without loading libATen_cuda.so. So
      we pass -Wl,--no-as-needed when linking libATen_cuda.so to _C.so

- There is a very subtle linking issue with lapack, which is solved by making sure libATen_cuda.so links against LAPACK. There's a comment in aten/src/ATen/CMakeLists.txt about htis as well as a follow up bug at #7353

- autogradpp used AT_CUDA_ENABLED directly. We've expunged these uses and added
  a few more things to CUDAHooks (getNumGPUs)

- Added manualSeedAll to Generator so that we can invoke it polymorphically (it
  only does something different for CUDAGenerator)

- There's a new cuda/CUDAConfig.h header for CUDA-only ifdef macros (AT_CUDNN_ENABLED, most prominently)

- CUDAHooks/VariableHooks structs live in at namespace because Registry's
  namespace support is not good enough to handle it otherwise (see Registry
  changes above)

- There's some modest moving around of native functions in ReduceOps and
  UnaryOps to get the CUDA-only function implementations into separate files, so
  they are only compiled into libATen_cuda.so. sspaddmm needed a separate CUDA
  function due to object linkage boundaries.

- Some direct uses of native functions in CUDA code has to go away, since these
  functions are not exported, so you have to go through the dispatcher
  (at::native::empty_like to at::empty_like)

- Code in THC/THCS/THCUNN now properly use THC_API macro instead of TH_API
  (which matters now that TH and THC are not in the same library)

- Added code debt in torch/_thnn/utils.py and other THNN parsing code to handle
  both TH_API and THC_API

- TensorUtils.h is now properly exported with AT_API

- Dead uses of TH_EXPORTS and co expunged; we now use ATen_cpu_exports and
  ATen_cuda_exports (new, in ATenCUDAGeneral.h) consistently

- Fix some incorrect type annotations on _cudnn_rnn_backward, where we didn't
  declare a type as possibly undefined when we should have. We didn't catch this
  previously because optional annotations are not tested on "pass-through" native
  ATen ops (which don't have dispatch). Upstream issue at #7316

- There's a new cmake macro aten_compile_options for applying all of our
  per-target compile time options. We use this on the cpu and cuda libraries.

- test/test_cpp_extensions.py can be run directly by invoking in Python,
  assuming you've setup your PYTHONPATH setup correctly

- type_from_string does some new funny business to only query for all valid CUDA
  types (which causes CUDA initialization) when we see "torch.cuda." in the
  requested string

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Last mile libtorch fixes

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* pedantic fix

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-05-10 10:28:33 -07:00
Peter Goldsborough
54a4867675
Bring back C++ extension torch.h (#7310)
* Bring back C++ extension torch.h

* Fix python.h include in python_tensor.cpp
2018-05-05 14:06:27 -07:00
Peter Goldsborough
feb64b5291
Add -Wno-unknown-pragmas (#7291) 2018-05-04 13:44:13 -07:00
Peter Goldsborough
67d0d14908
Rename autograd namespace to torch and change torch.h into python.h (#7267)
* Rename autograd namespace to torch and change torch.h into python.h

* Include torch.h instead of python.h in test/cpp/api

* Change some mentions of torch.h to python.h in C++ extensions

* Set paths directly, without find_path
2018-05-04 08:04:57 -07:00
Soumith Chintala
92f54e1f01
remove static libstdc++ linking and PYTORCH_BINARY_BUILD env variable (#7259) 2018-05-03 12:32:57 -07:00
Luca Antiga
5d3c3c53aa
Add raw IR serialization/deserialization (#6392) 2018-05-01 20:21:29 +02:00
Luca Antiga
0703357723 Don't build THD/master_worker if not explicitly requested (#7081) 2018-04-29 13:17:09 -04:00
James Reed
4667983f0f
Fixes for interpreter and ONNX export for translation (#7044)
Fixes for interpreter and ONNX export for translation

Address comments
2018-04-27 22:23:57 -07:00
Peter Goldsborough
7b09bc72a5
[WIP] Enable WERROR in tests (#6539)
* Enable WERROR in tests

* Also set WERROR=1 for cpp_build in CI

* Enable Werror after the compiler checks

* Remove -DWERROR because its picked up from the env var

* Had to fix some errors in aten/contrib/data

* Allow an uninitialized variable in ReduceOpsKernel.cpp

* Use CUDNN_DATA_UINT8 in cuDNN type string conversion

* Fixes and use target_compile_options

* Fix uninitialized variables in THNN

* Include Python.h earlier in tensor_types.cpp

* Use CUDNN_VERSION 7100 instead of 7000?

* More Python.h includes

* Make switch case in common_subexpression_elimination.cpp exhaustive

* Build with WERROR=0 just to see all the warnings

* Remove some Python includes

* Enable WERROR=1 again

* Bring back switch case default
2018-04-28 01:51:16 +01:00
Soumith Chintala
bd14d8e8f8
add additional caffe/caffe2 paths to exclude list in pytorch setup.py (#6891) 2018-04-25 22:10:38 -04:00
Orion Reblitz-Richardson
dec5e99e99 [aten] Move submodules to third_party (#6866)
* [aten] Move submodules to third_party

* [aten] Update aten_mirror.sh script for third_party

* [aten] Move ATen submodules def to root and rename

* [aten] Update cpuinfo cmake build

* [aten] Fix cpuinfo cmake build

* Update third_party/cpuinfo to d03d5d296063063c66877fb559cf34469734e3e1

* [aten] Fix JIT test reference to catch
2018-04-24 23:33:46 -04:00
gchanan
1c7b0c1020 Update version string to 0.5. (#6795) 2018-04-22 13:57:48 -04:00
bddppq
c43c911662
Export onnx protobuf bindings to python (#6651)
* Export onnx protobuf bindings to python

* rename native onnx module to _onnx
2018-04-17 16:38:57 -07:00
srib
53d2612b55 Fix a typo in the setup.py script (#6632) 2018-04-16 15:29:45 -04:00
Zachary DeVito
825ce7f196
[jit][script] Allow tuples to be re-assigned (#6538)
* Allow tuples to be re-assigned

This commit improves our support of tuples by making them more first-class.
In particular, it allows tuples to be re-assigned across loops and ifs.
It does this by making them first-class values in the Graph IR, and then
removing the tuples in a LowerTuples pass.

An alternative approach would have added more support for desugaring tuples
in the Environment object as they were emitted. Instead,
the current approach was chosen anticipating a future when tuples are
fully supported (including the interpreter). In that future, the current
code can be completly reused with the LowerTuples pass just becoming
a optimization that removes unneeded tuple allocations.
2018-04-13 17:34:50 -07:00
Peter Goldsborough
e4f1d3b538
Better warnings (#6428)
* Better warnings

* Remove -Wc++14-extensions because gcc does not know it

* Warning fix in input_buffer.cpp

* Remove pedantic for torch/csrc/

* Also use Wextra and Wall for ATen

* Use check_env_flag

* Undo changes in shape_analysis.cpp

* Remove C linkage flag
2018-04-10 23:34:25 -07:00
peterjc123
5651695a99 Fixes #6386, Use copies instead of symbolic files (#6396)
* Use copies instead of symbolic files

* bug fix

* Remove useless item
2018-04-09 13:54:10 -04:00
Soumith Chintala
108f5c197f
[pytorch] add static linkage support for CuDNN and NCCL (#6410)
* when linking static CUDA libs, additional dep on culibos.a

* add USE_STATIC_NCCL option

* add USE_STATIC_CUDNN option

* remove libATen soversion

* add caffe, caffe2 folders to setup.py exclude list
2018-04-08 22:54:18 -04:00
Ben
119ea39021 add cuda headers (#6401) 2018-04-08 10:50:20 -04:00
gchanan
87e369111a
Add string-style devices to all tensors. (#6283)
* Add string-style devices to all tensors.

Previously, tensors only had a 'get_device' method which would throw an exception on a CPU tensor.   This made it necessary to if/else code that
was meant to be device agnostic.

This PR implements the following:
1) Adds a 'device' property to all tensors that returns a string representation of the device for all tensors.
For cpu tensors this is 'cpu'.  For cuda tensors this is 'cuda:X', where X is the cuda device ordinal.

2) Adds a DeviceSpec class.  This is just a helper class for separating device_type and device_index specification and to allow partial specification.
For example, you can call DeviceSpec('cuda'), DeviceSpec('cuda:0'), DeviceSpec('cuda', 1).
Also has backwards compatibility support for specifying integers, which are treated as cuda devices.

DeviceSpecs have the following properties:
a) device_type: string representation of the device type (i.e. 'cpu' or 'cuda')
b) device_index: integer for the device index (None if not specified)
c) cuda_device_index: for backwards compatibility; behaves roughly like `get_device` did previously.  I.e. if a function previously took integers for cuda devices,
it can now take DeviceSpecs (or strings), and can maintain the old functionality by calling `old_index = DeviceSpec(old).cuda_device_index`.

3) tensor methods and torch. functions that took integer devices can now take integers, strings, or DeviceSpecs.  For example:
torch.randn((2,3), dtype=torch.cuda.float32, device='cuda:1')

TODO in future PRs:
A) Split out cuda from dtype so you don't need to overspecify cuda-ness
B) We currently only support strings/DeviceSpecs in tensor methods and torch. functions.  We should have equivalents torch.cuda.device(...), torch.cuda.device_of, etc.
at the torch. level that work on strings/DeviceSpecs

* Add deviceInt64 to python arg parser.

* device_str.

* Remove device_str.

* remove device prefix from attributes.

* Use const char * instead of string.

* Move autogpu index out of Device.

* comment on is_default.

* Rename torch.DeviceSpec to torch.device.

* comment.

* Fix tests.

* Fix flake8.

* Fix sparse_coo_tensor parameter name.

* Improve error message.

* Remove device_ prefix from C++ device object.

* Allocate static strings.

* Return not implemented from rich compare.

* Move torch::Device to THPDevice.

* Remove cuda index.

* Py_RETURN_NOTIMPLEMENTED doesn't exist in python2.
2018-04-06 15:12:05 -04:00
Sam Gross
6b3a4637d6
Make the tensor type torch.Tensor instead of torch.autograd.Variable (#5785)
This changes type(tensor) to return `torch.Tensor` instead of
`torch.autograd.Variable`.

This requires a few implementation changes:

 - torch.Tensor is now a regular Python class instead of a
   pseudo-factory like torch.FloatTensor/torch.DoubleTensor
 - torch.autograd.Variable is just a shell with a __new__ function.
   Since no instanes are constructed it doesn't have any methods.
 - Adds torch.get_default_dtype() since torch.Tensor.dtype returns
   <attribute 'dtype' of 'torch._C._TensorBase' objects>
2018-04-03 16:29:25 -04:00
gchanan
4c81282c33
Introduce torch.layout and split layout from dtypes. (#6145)
* Introduce torch.layout and split layout from dtypes.

Tensors (and tensor types) now have a 'layout' attribute that returns either 'torch.strided' or 'torch.sparse_coo'.

Previously, dtypes were 1-to-1 with ATen types/PyTensorTypes; the impetus behind this decision was to make things easy in the common case
(i.e. specifying a type in a factory function).  But this doesn't really follow for sparity, which isn't a common case.

It also doesn't properly represent the concept or a dtype, which in numpy are proper scalar types (i.e. roughly the type returned from indexing the
last dimension of an n-d array).  But this should be the same whether or not the tensor is represented via strides, sparsity, etc.

This is accomplished by:
1) having the dtype of tensor return the (device-type, scalar-type) combination, i.e. torch.cuda.float32, so both
   torch.cuda.FloatTensor and torch.cuda.sparse.FloatTensor have the same dtype
2) Adding a layout parameter to python functions, where the combination of (dtype, layout) maps to an ATen type that is used for dispatch.

* Formatting, make init throw python_error.

* Fix cuda not enabled error message.

* Fix test.
2018-04-02 14:07:50 -04:00
peterjc123
63af898d46 Fix extension test on Windows (#5548)
* Change cpp_extensions.py to make it work on Windows

* Fix linting

* Show python paths

* Debug

* Debug 1

* set PYTHONPATH

* Add ATen into library

* expose essential libs and functions, and copy _C.lib

* Specify dir in header

* Update check_abi for MSVC

* Activate cl environment to compile cpp extensions

* change version string

* Redirect stderr to stdout

* Add monkey patch for windows

* Remove unnecessary self

* Fix various issues

* Append necessary flags

* add /MD flag to cuda

* Install ninja

* Use THP_API instead of THP_CLASS

* Beautify the paths

* Revert "Use THP_API instead of THP_CLASS"

This reverts commit dd7e74c44db48e4c5f85bb8e3c698ff9de71ba2d.

* Use THP_API instead of THP_CLASS(new)
2018-04-02 13:53:25 -04:00
Richard Zou
7355f5cd8d Tell source users about TORCH_CUDA_ARCH_LIST (#6185)
Put it into the comments about env vars in setup.py.
Also put in a line in the README about where to find this info.
2018-04-02 13:35:14 -04:00
Ma Mingfei
f8270c0225 Enable MKLDNN convolution forward and backward (#6062)
* Enable MKLDNN convolution forward and backward

* minor change

* fix mkldnn build error when building ATen standalone
2018-03-29 15:25:07 -07:00
Simeon Monov
a90aa5d818 Fix small typo in setup.py (#6091)
Fixed small typo in setup.py
2018-03-28 16:51:08 -07:00
Edward Z. Yang
eb18a2f26c
Reorganize third-party libraries into top-level third_party directory (#6025)
- gloo, pybind11, nanopb and nccl now live in third_party.
- ATen builds in aten/build rather than torch/lib/build/aten
- A bit of faffing about in the scripts was necessary, because they used to assume that everything lived in the same directory. Now you are expected to cd into the correct directory before calling one of the build functions. The actual builder script lives in tools
- Lint now just unconditionally ignores third_party, rather than enumerating folders explicitly
2018-03-27 22:09:20 -04:00