Commit graph

11 commits

Author SHA1 Message Date
Tao Xu
8294db8f15 [iOS][CI] Remove org-member from iOS Simulator Builds (#34410)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34410

### Summary

Currently, the iOS jobs are not being run on PRs anymore. This is because all iOS jobs have specified the `org-member` as a context which used to include all pytorch members. But seems like recently this rule has changed. It turns out that only users from the admin group or builder group can have access right to the context values. https://circleci.com/gh/organizations/pytorch/settings#contexts/2b885fc9-ef3a-4b86-8f5a-2e6e22bd0cfe

This PR will remove `org-member` from the iOS simulator build which doesn't require code signing. For the arm64 builds, they'll only be run on master, not on PRs anymore.

### Test plan

- The iOS simulator job should be able to appear in the PR workflow

Test Plan: Imported from OSS

Differential Revision: D20347270

Pulled By: xta0

fbshipit-source-id: 23f37d40160c237dc280e0e82f879c1d601f72ac
2020-03-09 13:22:54 -07:00
Ashkan Aliabadi
6aecfd1e80 Mobile Backend: NHWC memory layout + XNNPACK integration. (#33722)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33722

In order to improve CPU performance on floating-point models on mobile, this PR introduces a new CPU backend for mobile that implements the most common mobile operators with NHWC memory layout support through integration with XNNPACK.

XNNPACK itself, and this codepath, are currently only included in the build, but the actual integration is gated with USE_XNNPACK preprocessor guards.  This preprocessor symbol is intentionally not passed on to the compiler, so as to enable this rollout in multiple stages in follow up PRs.  This changeset will build XNNPACK as part of the build if the identically named USE_XNNPACK CMAKE variable, defaulted to ON, is enabled, but will not actually expose or enable this code path in any other way.

Furthermore, it is worth pointing out that in order to efficiently map models to these operators, some front-end method of exposing this backend to the user is needed.  The less efficient implementation would be to hook these operators into their corresponding native implementations, granted that a series of XNNPACK-specific conditions are met, much like how NNPACK is integrated with PyTorch today for instance.

Having said that, while the above implementation is still expected to outperform NNPACK based on the benchmarks I ran, the above integration would be leave a considerable gap between the performance achieved and the maximum performance potential XNNPACK enables, as it does not provide a way to compute and factor out one-time operations out of the inner most forward() loop.

The more optimal solution, and one we will  decide on soon, would involve either providing a JIT pass that maps nn operators onto these newly introduced operators, while allowing one-time calculations to be factored out, much like quantized mobile models.  Alternatively, new eager-mode modules can also be introduced that would directly call into these implementations either through c10 or some other mechanism, also allowing for decoupling of op creation from op execution.

This PR does not include any of the front end changes  mentioned above.  Neither does it include the mobile threadpool unification present in the original https://github.com/pytorch/pytorch/issues/30644.  Furthermore, this codepath seems to be faster than NNPACK in a good number of use cases, which can potentially allow us to remove NNPACK from aten to make the codebase a little simpler, granted that there is widespread support for such a move.

Regardless, these changes will be introduced gradually and in a more controlled way in subsequent PRs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/32509

Test Plan:
Build: CI
Functionality: Not exposed

Reviewed By: dreiss

Differential Revision: D20069796

Pulled By: AshkanAliabadi

fbshipit-source-id: d46c1c91d4bea91979ea5bd46971ced5417d309c
2020-02-24 21:58:56 -08:00
Ashkan Aliabadi
039dc90854 Revert D19521853: [pytorch][PR] Mobile Backend: NHWC memory layout + XNNPACK integration.
Test Plan: revert-hammer

Differential Revision:
D19521853

Original commit changeset: 99a1fab31d0e

fbshipit-source-id: 76dfc1f481797ba2386997533cf19957637687d6
2020-02-23 22:07:19 -08:00
Ashkan Aliabadi
941b42428a Mobile Backend: NHWC memory layout + XNNPACK integration. (#32509)
Summary:
In order to improve CPU performance on floating-point models on mobile, this PR introduces a new CPU backend for mobile that implements the most common mobile operators with NHWC memory layout support through integration with XNNPACK.

XNNPACK itself, and this codepath, are currently only included in the build, but the actual integration is gated with USE_XNNPACK preprocessor guards.  This preprocessor symbol is intentionally not passed on to the compiler, so as to enable this rollout in multiple stages in follow up PRs.  This changeset will build XNNPACK as part of the build if the identically named USE_XNNPACK CMAKE variable, defaulted to ON, is enabled, but will not actually expose or enable this code path in any other way.

Furthermore, it is worth pointing out that in order to efficiently map models to these operators, some front-end method of exposing this backend to the user is needed.  The less efficient implementation would be to hook these operators into their corresponding **native** implementations, granted that a series of XNNPACK-specific conditions are met, much like how NNPACK is integrated with PyTorch today for instance.

Having said that, while the above implementation is still expected to outperform NNPACK based on the benchmarks I ran, the above integration would be leave a considerable gap between the performance achieved and the maximum performance potential XNNPACK enables, as it does not provide a way to compute and factor out one-time operations out of the inner most forward() loop.

The more optimal solution, and one we will  decide on soon, would involve either providing a JIT pass that maps nn operators onto these newly introduced operators, while allowing one-time calculations to be factored out, much like quantized mobile models.  Alternatively, new eager-mode modules can also be introduced that would directly call into these implementations either through c10 or some other mechanism, also allowing for decoupling of op creation from op execution.

This PR does not include any of the front end changes  mentioned above.  Neither does it include the mobile threadpool unification present in the original https://github.com/pytorch/pytorch/issues/30644.  Furthermore, this codepath seems to be faster than NNPACK in a good number of use cases, which can potentially allow us to remove NNPACK from aten to make the codebase a little simpler, granted that there is widespread support for such a move.

Regardless, these changes will be introduced gradually and in a more controlled way in subsequent PRs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32509

Reviewed By: dreiss

Differential Revision: D19521853

Pulled By: AshkanAliabadi

fbshipit-source-id: 99a1fab31d0ece64961df074003bb852c36acaaa
2020-02-23 19:08:42 -08:00
Brian Wignall
f326045b37 Fix typos, via a Levenshtein-type corrector (#31523)
Summary:
Should be non-semantic.

Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos, with https://github.com/bwignall/typochecker to help automate the checking.

Uses an updated version of the tool used in https://github.com/pytorch/pytorch/pull/30606 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31523

Differential Revision: D19216749

Pulled By: mrshenli

fbshipit-source-id: 7fd489cb9a77cd7e4950c1046f925d57524960ea
2020-01-17 16:03:19 -08:00
Edward Yang
38986e1dea Split libtorch.so back into libtorch_{cpu,cuda,hip} (#30315)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30315

The new structure is that libtorch_cpu contains the bulk of our
code, and libtorch depends on libtorch_cpu and libtorch_cuda.
This is a reland of https://github.com/pytorch/pytorch/pull/29731 but
I've extracted all of the prep work into separate PRs which can be
landed before this one.

Some things of note:

* torch/csrc/cuda/nccl.cpp was added to the wrong list of SRCS, now fixed (this didn't matter before because previously they were all in the same library)
* The dummy file for libtorch was brought back from the dead; it was previously deleted in #20774
In an initial version of the patch, I forgot to make torch_cuda explicitly depend on torch_cpu. This lead to some very odd errors, most notably "bin/blob_test: hidden symbol `_ZNK6google8protobuf5Arena17OnArenaAllocationEPKSt9type_infom' in lib/libprotobuf.a(arena.cc.o) is referenced by DSO"
* A number of places in Android/iOS builds have to add torch_cuda explicitly as a library, as they do not have transitive dependency calculation working correctly
* I had to torch_cpu/torch_cuda caffe2_interface_library so that they get whole-archived linked into torch when you statically link. And I had to do this in an *exported* fashion because torch needs to depend on torch_cpu_library. In the end I exported everything and removed the redefinition in the Caffe2Config.cmake. However, I am not too sure why the old code did it in this way in the first place; however, it doesn't seem to have broken anything to switch it this way.
* There's some uses of `__HIP_PLATFORM_HCC__` still in `torch_cpu` code, so I had to apply it to that library too (UGH). This manifests as a failer when trying to run the CUDA fuser. This doesn't really matter substantively right now because we still in-place HIPify, but it would be good to fix eventually. This was a bit difficult to debug because of an unrelated HIP bug, see https://github.com/ROCm-Developer-Tools/HIP/issues/1706

Fixes #27215 (as our libraries are smaller), and executes on
part of the plan in #29235.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18790941

Pulled By: ezyang

fbshipit-source-id: 01296f6089d3de5e8365251b490c51e694f2d6c7
2019-12-04 08:04:57 -08:00
Junjie Bai
352731bd6e Revert D18632773: Split libtorch.so back into libtorch_{cpu,cuda,hip}
Test Plan: revert-hammer

Differential Revision:
D18632773

Original commit changeset: ea717c81e0d7

fbshipit-source-id: 18601439f9f81c9f389020e5a0e4e04adb21772d
2019-11-21 15:01:09 -08:00
Edward Yang
ec30d9028a Split libtorch.so back into libtorch_{cpu,cuda,hip} (#29731)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29731

The new structure is that libtorch_cpu contains the bulk of our
code, and libtorch depends on libtorch_cpu and libtorch_cuda.

Some subtleties about the patch:
- There were a few functions that crossed CPU-CUDA boundary without API macros. I just added them, easy enough. An inverse situation was aten/src/THC/THCTensorRandom.cu where we weren't supposed to put API macros directly in a cpp file.
- DispatchStub wasn't getting all of its symbols related to static members on DispatchStub exported properly. I tried a few fixes but in the end I just moved everyone off using DispatchStub to dispatch CUDA/HIP (so they just use normal dispatch for those cases.) Additionally, there were some mistakes where people incorrectly were failing to actually import the declaration of the dispatch stub, so added includes for those cases.
- torch/csrc/cuda/nccl.cpp was added to the wrong list of SRCS, now fixed (this didn't matter before because previously they were all in the same library)
- The dummy file for libtorch was brought back from the dead; it was previously deleted in #20774
- In an initial version of the patch, I forgot to make torch_cuda explicitly depend on torch_cpu. This lead to some very odd errors, most notably "bin/blob_test: hidden symbol `_ZNK6google8protobuf5Arena17OnArenaAllocationEPKSt9type_infom' in lib/l
ibprotobuf.a(arena.cc.o) is referenced by DSO"
- A number of places in Android/iOS builds have to add torch_cuda explicitly as a library, as they do not have transitive dependency calculation working correctly. This situation also happens with custom C++ extensions.
- There's a ROCm compiler bug where extern "C" on functions is not respected. There's a little workaround to handle this.
- Because I was too lazy to check if HIPify was converting TORCH_CUDA_API into TORCH_HIP_API, I just made it so HIP build also triggers the TORCH_CUDA_API macro. Eventually, we should translate and keep the nature of TORCH_CUDA_API constant in all cases.

Fixes #27215 (as our libraries are smaller), and executes on
part of the plan in #29235.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18632773

Pulled By: ezyang

fbshipit-source-id: ea717c81e0d7554ede1dc404108603455a81da82
2019-11-21 11:27:33 -08:00
Tao Xu
636fbcdd0a add benchmark code to iOS TestApp (#28405)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28405

### Summary

As discussed with AshkanAliabadi  and ljk53, the iOS TestApp will share the same benchmark code with Android's speed_benchmark_torch.cpp. This PR is the first part which contains the Objective-C++ code.

The second PR will include the scripts to setup and run the benchmark project. The third PR will include scripts that can automate the whole "build - test - install" process.

There are many ways to run the benchmark project. The easiest way is to use cocoapods. Simply run `pod install`. However, that will pull the 1.3 binary which is not what we want, but we can still use this approach to test the benchmark code. The second PR will contain scripts to run custom builds that we can tweak.

### Test Plan
- Don't break any existing CI jobs  (except for those flaky ones)

Test Plan: Imported from OSS

Differential Revision: D18064187

Pulled By: xta0

fbshipit-source-id: 4cfbb83c045803d8b24bf6d2c110a55871d22962
2019-10-22 12:52:30 -07:00
Tao Xu
52985a3501 Install developer certificate for code signing (#27593)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27593

## Summary

Since the nightly jobs are lack of  testing phases, we don't really have a way to test the binary before uploading it to AWS. To make the work more solid, we need to figure out a way to verify the binary.

Fortunately, the XCode tool chain offers a way to build your app without XCode app, which is the [xcodebuild](https://developer.apple.com/library/archive/technotes/tn2339/_index.html) command. Now we can link our binary to a testing app and run `xcodebuild` to to see if there is any linking error. The PRs below have already done some of the preparation jobs

- [#26261](https://github.com/pytorch/pytorch/pull/26261)
- [#26632](https://github.com/pytorch/pytorch/pull/26632)

The challenge comes when testing the arm64 build as we don't have a way to code-sign our TestApp. Circle CI has a  [tutorial](https://circleci.com/docs/2.0/ios-codesigning/)  but is too complicated to implement. Anyway, I figured out an easier way to do it

1. Disable automatically code sign in XCode
2. Export the encoded developer certificate and provisioning profile to org-context in Circle CI (done)
3. Install the developer certificate to the key chain store on CI machines via Fastlane.
4. Add the testing code to PR jobs and verify the result.
5. Add the testing code to nightly jobs and verify the result.

## Test Plan

- Both PR jobs and nightly jobs can finish successfully.
- `xcodebuild` can finish successfully

Test Plan: Imported from OSS

Differential Revision: D17848814

Pulled By: xta0

fbshipit-source-id: 48353f001c38e61eed13a43943253cae30d8831a
2019-10-09 20:07:30 -07:00
Tao Xu
736c754739 add sdk support for xcodebuild script
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27358

Test Plan: Imported from OSS

Differential Revision: D17757389

Pulled By: xta0

fbshipit-source-id: ed8e470b9c6329b96297ee7c65ba08759251baad
2019-10-03 20:11:08 -07:00
Renamed from scripts/xcode_ios_x86_build.rb (Browse further)