Commit graph

11997 commits

Author SHA1 Message Date
Dmitri Smirnov
871af477d7
Fix outputs of Sequences and Maps exposure. (#5743)
Fix outputs of Sequences and Maps exposure.
  Add more test conditions.
  Make sure RunWithBinding calls the right function.
2020-11-11 10:21:22 -08:00
liqunfu
1416d12f0b
Liqun/merge e2e pipelines (#5702)
* Create an Azure Pipeline to merge cpp and python e2e pipelines into one. Still keep cpp e2e pipeline until this new pipeline is stable.

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-11-11 09:42:08 -08:00
Yufeng Li
2ba637c558
Implement Scale function for quant gemm (#5632)
* Implement a Scale function for quantization

Quantized GEMM is always followed by scaling (PerTensor or PerColumn), and the result often needs to be accumulated into an existing matrix. This PR implements a post-processor that scales the quantized GEMM result and accumulates it into another matrix.
2020-11-10 23:34:38 -08:00
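As a rough illustration of the post-processing step described in 2ba637c558, the sketch below scales an integer GEMM result per tensor or per column and accumulates it into an existing float matrix. It is plain Python written for this listing; the name `scale_and_accumulate` and its parameters are hypothetical and are not the API added by the PR.

```python
def scale_and_accumulate(acc, out, scale):
    """Add scale * acc into out, in place, and return out.

    acc   -- integer GEMM result as a list of rows
    out   -- float matrix of the same shape to accumulate into
    scale -- one float (per-tensor) or a list with one float per column
    """
    n_cols = len(acc[0]) if acc else 0
    per_column = isinstance(scale, (list, tuple))
    if per_column and len(scale) != n_cols:
        raise ValueError("per-column scale needs one entry per column")
    for i, row in enumerate(acc):
        for j, value in enumerate(row):
            s = scale[j] if per_column else scale
            out[i][j] += s * value
    return out
```

Passing a single float models the per-tensor case, and passing a list models the per-column case.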
George Wu
cca8cd849a
update native build instructions for ACL on Jetson. (#5764) 2020-11-10 23:10:59 -08:00
Changming Sun
79350a642a
Update install_deps.sh: remove the unnecessary data generating step (#5758)
We install onnx python package from this script, so python tests can run the tests for the latest commit which we are importing.
2020-11-10 22:19:03 -08:00
Guoyu Wang
0767c4fdfb
Fix x86 build break (#5759) 2020-11-10 20:33:27 -08:00
Guoyu Wang
042365029f
[NNAPI] Split OPBuilder IsOpSupported into a separated class (#5746)
* init change

* Split opbuilder into opbuilder and opsupportchecker

* Update code comments

* Address CR comments, some minor code updates
2020-11-10 15:00:38 -08:00
Scott McKay
6803e4ab44
Fix BatchNormalization registrations. (#5750)
Add diatribe on how to correctly update registrations.
2020-11-11 07:32:26 +10:00
Dwayne Robinson
7d98596bfa Merged PR 5390213: Tile allow 0 in repeats
0 is valid in Tile in "repeats" parameter. The CPU kernel handles it fine. So should the DML EP.

Related work items: #29970551
2020-11-10 21:05:59 +00:00
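To see why a 0 in "repeats" is well defined for Tile (7d98596bfa), note that each output dimension is the input dimension multiplied by its repeat count, so a 0 simply yields an empty tensor. The helper names below are hypothetical and only model the shape arithmetic, not the DML EP code.

```python
def tile_output_shape(input_shape, repeats):
    """Output shape of Tile: each dimension is multiplied by its repeat count."""
    if len(input_shape) != len(repeats):
        raise ValueError("repeats must have one entry per input dimension")
    if any(r < 0 for r in repeats):
        raise ValueError("repeats must be non-negative")
    return [d * r for d, r in zip(input_shape, repeats)]


def element_count(shape):
    """Number of elements in a tensor of the given shape."""
    count = 1
    for d in shape:
        count *= d
    return count
```

For example, tiling a 2x3 input with repeats [0, 2] gives shape [0, 6], which holds no elements.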
Alberto Magni
c75b7c5c47
[CMake] Enable NCCL only when enabling CUDA or ROCm support (#5516)
Conditionally enable NCCL depending on CUDA and ROCM

Before this change NCCL support was enabled unconditionally, even
when building without CUDA or ROCM support.
This caused the command:
$ ./build.sh --enable_training

To trigger the following cmake warning
-- Could NOT find NCCL (missing: NCCL_INCLUDE_DIR NCCL_LIBRARY)
CMake Warning at CMakeLists.txt:1282 (message):
NCCL is not found. Please use --nccl_home to specify the path of NCCL.
Otherwise, NCCL is disabled.

This is a spurious warning because the user did not ask to search for NCCL.
2020-11-10 12:39:23 -08:00
Tim Harris
48b14b52b8
Remove Env::Task wrapper around std::function (#5753)
This is a small perf / clean-up change. It removes the Env::Task abstraction which wraps a single std::function field, and adds at least one virtual method call overhead when creating a Task and when executing it. The POSIX and Windows implementations are now identical.
2020-11-10 20:22:07 +00:00
leqiao-1
2b1ebbc286
update MCR images table (#5509)
Add tag 1.5.2 for images. 
Remove tensorRT image from table.
2020-11-10 11:47:59 -08:00
edgchen1
4c6118eb49
Update get_applicable_matrix_reduction() to combine dimensions of 1 with the given reduction axes. (#5734) 2020-11-10 10:32:50 -08:00
Hariharan Seshadri
63b85fc696
Fix VS 2017 build break (#5745) 2020-11-10 10:25:43 -08:00
Xavier Dupré
d59f057db3
enable string for operator Shape (#5742) 2020-11-10 18:38:36 +01:00
Xavier Dupré
8c74df2068
Add support for string with operator Expand (#5751) 2020-11-10 18:38:20 +01:00
Changming Sun
4094a09a56
Merge pull request #5731 from microsoft/snnn/rtti
Disable RTTI in Windows GPU CI pipeline
2020-11-10 09:02:59 -08:00
Changming Sun
00b18d9dc5 Update InferenceTest.cs to exclude one more model in x86 mode 2020-11-10 09:02:43 -08:00
Tim Harris
5e44d25c5a
Support multi-loop parallel sections, use multi-loop sections in GRU (#5602)
This PR updates the ThreadPool API to support multi-loop parallel sections. As with the OpenMP "parallel" construct, this allows per-loop work to be amortized over a series of loops. For ORT, it also promotes locality between successive loops in the sense that iteration X of one loop will tend to run on the same worker thread as iteration X of preceding loops.

The change was developed while optimizing the implementation of a model that performed better with OpenMP. Profiling indicated that OpenMP was providing lower loop entry/exit costs and that, via OpenMP's static scheduling, it was leading to a lower L2 miss rate in the series of parallel loops used in GRU.

The main changes are:

- Addition of ThreadPool::ParallelSection and underlying support in the modified Eigen thread pool.

- In EigenNonBlockingThreadPool.h, refactoring the RunInParallel method to support two variants: one that takes an existing parallel section object created by the caller, and another (used by default) that creates its own parallel section.

- Simplify ThreadPool::LoopCounter (used by worker threads to claim loop iterations), basing it on an ID supplied by the underlying Eigen thread pool for affinity in a series of loops.

- Fix a possible perf issue where a loop with iterations scheduled in batches would have more threads than batches available.

- Use of parallel sections in the GRU operator.

- Additional test cases in threadpool_test.h.

- Additional comments at the top of threadpool.h and EigenNonBlockingThreadPool.h.
2020-11-10 12:24:57 +00:00
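The locality argument in 5e44d25c5a can be pictured with a toy model: if iterations are split among workers the same way in every loop, iteration X lands on the same worker each time. The sketch below is an assumption-laden illustration only. `static_owner` and `workers_for_batches` are hypothetical names, and the real ThreadPool claims iterations dynamically using a worker ID rather than a fixed static split.

```python
def static_owner(iteration, num_iterations, num_workers):
    """Worker index owning `iteration` under a contiguous static split."""
    block = max(1, -(-num_iterations // num_workers))  # ceiling division
    return iteration // block


def workers_for_batches(num_threads, num_batches):
    """Never use more threads than there are batches of iterations to hand out."""
    return max(1, min(num_threads, num_batches))
```

With 10 iterations and 4 workers, every loop in a series maps iterations 0-2 to worker 0, 3-5 to worker 1, and so on, which is the kind of reuse that keeps data warm in a worker's cache. The second helper mirrors the fix for loops that had more threads than batches.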
Edward Chen
919c270f3c Increase build timeouts. 2020-11-09 22:26:27 -08:00
ISS Build Account
de85638543 Merge remote-tracking branch 'upstream/master' into DmlDev 2020-11-10 00:46:04 +00:00
Nick Feeney
838bc77f3b Merged PR 5386132: Update 8D BatchNorm
Update 8D BatchNorm

Related work items: #27678610
2020-11-09 22:45:46 +00:00
edgchen1
2acdc3cd82
Move GetUseDeterministicCompute() to OpKernelContext to avoid need to downcast to OpKernelContextInternal. (#5729) 2020-11-09 11:37:06 -08:00
ashbhandare
12d39ef4ed
Remove onnx backend test filters for updated ops (#5718)
* remove unneeded filters

* block openvino tests
2020-11-09 10:57:58 -08:00
Weixing Zhang
bb1af718b5
fix build failures due to recent change(858040fa) in CUDA EP (#5736)
Some of the code for reduction kernels was changed in 858040fa,
which causes failures in the ROCm build since the ROCm EP shares some code
with the CUDA EP. This PR is a quick fix that stops sharing two files
for now, to unblock enabling CI on the ROCm EP. Another PR that leverages
858040fa for the ROCm EP will follow later.
2020-11-09 08:41:30 -08:00
Scott McKay
c0c9ab4d81
Fix kernel registrations for Equal, Greater and Less (#5730) 2020-11-08 07:33:49 +10:00
ISS Build Account
81b2cb9714 Merge remote-tracking branch 'upstream/master' into DmlDev 2020-11-07 02:27:13 +00:00
Dwayne Robinson
1e13ecabf7 Merged PR 5380534: BatchNormalization failure in autopilot - fix output size
New validation [here](https://microsoft.visualstudio.com/DefaultCollection/WindowsAI/_git/WindowsAI/pullrequest/5354070?_a=files&path=%2Fdml%2FSharedValidation%2FDmlBatchNormalizationOperatorValidator.h) causes some BatchNorm cases to fail (e.g. OnnxConformanceTestsTaef::BatchNormalization (BatchNormalization_2x2x2)). I'm unsure how long this bug existed, but based on Nick's investigation, it apparently still worked anyway.

Related work items: #27678610
2020-11-07 02:13:42 +00:00
Dmitri Smirnov
2bf5046d4e
Add tag types for Ort::Float16_t and Ort::Bfloat16_t structs (#5716)
Add tag types for Ort::Float16_t and Ort::Bfloat16_t structs
  that contain uint16_t values for float16 and bfloat16.
  These will serve as type dispatching types for C++ API.
  They are of uint16_t size and arrays of these types can be used
  to create Tensors of the corresponding types.
  Make documentation Doxygen compliant.
2020-11-06 16:41:26 -08:00
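For readers unfamiliar with why a uint16_t can carry a float16 or bfloat16 value (2bf5046d4e), the sketch below computes the 16-bit patterns in Python. The function names are hypothetical, and the bfloat16 conversion shown is plain truncation of a float32, which is an assumption made for brevity; real converters may round instead.

```python
import struct


def float16_bits(value):
    """16-bit pattern of `value` encoded as IEEE 754 half precision."""
    return struct.unpack("<H", struct.pack("<e", value))[0]


def bfloat16_bits(value):
    """16-bit pattern of `value` as bfloat16, taken as the upper half of float32."""
    (bits32,) = struct.unpack("<I", struct.pack("<f", value))
    return bits32 >> 16
```

An array of such 16-bit values has the same element size as the tag types described above, which is what allows it to back a tensor of the corresponding type.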
Weixing Zhang
fff85a6a35
Add GPU kernels for ROCm EP (#5655)
* Add kernels for AMD GPU.

This PR is mostly about GPU kernels for the ROCm EP. Because the GPU programming languages (CUDA and HIP) and the math library calls are similar, one principle in the ROCm EP design is to share CUDA kernels with ROCm as much as possible. Thus, the script amd_hipify.py has been created to convert CUDA kernels to ROCm HIP kernels automatically during the compilation phase. However, for reasons such as perf issues and syntax differences, some converted kernels need manual intervention. These kernels will be checked into the repo for now. To avoid manual intervention, the plan is to refactor CUDA kernels to make them portable between the CUDA EP and the ROCm EP as much as possible.

Please refer to "HIP Porting Guide" for details.

* Like LAMB, multi-tensor-apply needs to be disabled for IsAllFiniteOp and ReduceAllL2: the current AMD GPU compiler has a perf issue with kernel parameters that are structures passed by value.

* Use hipMemsetAsync and add checks on HIP calls.

* move the generated files to build folder.

Co-authored-by: Jesse Benson <jesseb@microsoft.com>
2020-11-06 16:11:06 -08:00
Ryan Lai
697e8faa9e
Skip failing x86 winml tests and update testData environment variable path mechanism (#5719)
* Skip failing x86 winml tests

* fix gpt2 rename typo

* there are actually 2 gpt model tests
2020-11-06 13:59:29 -08:00
Johannes Bannhofer
9ec6da1e27
added missing flag ORT_TENSORRT_DUMP_SUBGRAPHS (#5724)
[DOCUMENTATION]
added description of the function ORT_TENSORRT_DUMP_SUBGRAPHS to the documentation
2020-11-06 12:33:18 -08:00
Johannes Bannhofer
6f6dd0b869
added missing flag ORT_TENSORRT_DUMP_SUBGRAPHS (#5724)
[DOCUMENTATION]
added description of the function ORT_TENSORRT_DUMP_SUBGRAPHS to the documentation
2020-11-06 12:32:12 -08:00
Chi Lo
92292de135
Tensorrt perf tool (#5436)
* Add YAML file for pipeline

* Modify typo

* Add working directory

* Modify and test

* Modify and test

* Modify and test

* Modify and test

* Modify

* Modify

* Modify

* Modify

* Make sure to copy all the result files

* Add clean up

* Modify

* Modify agent pool name

* Upload only specific artifacts

* Modify

* Integrated CI Pipeline for running TRT perf as well as added the “large amount of models” into perf model target

* Fix bug

* Fix bug

* Add reading the information regarding previously known failing models
and then skip testing them during benchmark/validation

* Modify the script file for CI

* Replace print with logger.info

* Fix bug

* Fix bug

* Refine the code

* Modify the script so that it can capture script segmentation fault while
running ORT

* Fix bug

* fix bug

* fix bug

* Add debug info

* fix bug

* Refine perf code

* Refine the code

* fix bug

* Code refactoring

* change many-models path

* remove metadata after validation/benchmark are done

* Update README.md

* Fix bug so that metadata doesn't hold stale value

* Remove hardcode and update README

* Add arguments to the script to make it run correctly

* Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines

* Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines

* Fix bug so that metadata doesn't hold stale value

* Fix small bug of finding test dataset directory for FP16 test data, as
well as modification of some output information

* use -i random for perf test of TRT changes

Co-authored-by: Olivia Jain <oljain@microsoft.com>
2020-11-06 12:27:42 -08:00
Ye Wang
95e6da7957
Revert saving optimized model as external data (#5690)
* revert and add support for saving external data

* review comments

* update
2020-11-06 11:54:19 -08:00
RandySheriffH
71f90e08f1
Nuget packaging no omp (#5666)
* create new nuget packaging pipeline without openmp

* rename package

* update image name

* rename package name

* rename managed package

* reset project attribute

* merge master

* set package name

* set NoOpenMP as cpu build

* shorten line length

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2020-11-06 11:43:35 -08:00
Zhang Lei
77b1eea9cf
Add option to allow quantize_input() use input_qtype for initializers. (#5721) 2020-11-06 09:33:24 -08:00
George Wu
f666c3d7d7
update jetson build instructions (#5725) 2020-11-06 09:33:04 -08:00
Zhang Lei
24016a517b
Prepacking in Gemm with merged logic for Matmul and Gemm on PackingB. (#5693)
Prepacking in Gemm with merged logic for Matmul and Gemm on PackingB.
2020-11-05 22:35:24 -08:00
Nat Kershaw (MSFT)
479ed740ef
Add link to survey to README (#5685)
* Add survey request to README

* Remove period

* Fix #5681 - broken link
2020-11-05 18:01:08 -08:00
Maajid khan
d6f9cc181d
Modify logic to determine OV Version (#5701)
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
2020-11-05 15:12:02 -08:00
Adam Pocock
d1d82065b9
[Java] Fixes an error allocating large direct byte buffers during OnnxTensor creation (#5619)
* Fixing an error with allocating large direct byte buffers during tensor creation.

* Removing the redundant overflow check.
2020-11-05 15:02:41 -08:00
Pranav Sharma
28197b1460
Register opset13 flatten, LRN for cuda. (#5694)
* Register opset13 flatten, LRN, ArgMax and ArgMin for cuda.

* Fix build
2020-11-05 14:13:15 -08:00
Scott McKay
11fe683471
Partition full graph one execution provider at a time (#5635)
* Partition full graph one EP at a time, bottom-up. Nuphar requires this and it makes life simpler for an EP as they can just check if all nodes in a subgraph are assigned to it when processing the control flow node containing the subgraph.

Make a couple of nuphar error messages more meaningful.
2020-11-06 07:26:00 +10:00
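The partitioning order described in 11fe683471 can be sketched as follows: each execution provider, in priority order, walks the whole graph and visits nested subgraphs before the control flow node that contains them. This is a toy model under stated assumptions. The dict-based node layout, the function names, and the `needs_whole_subgraph` flag are all hypothetical and do not reflect the actual ORT graph partitioner.

```python
def assign_for_provider(graph, ep_name, supported_ops, needs_whole_subgraph):
    """Let one provider claim unassigned nodes, handling subgraphs first (bottom-up)."""
    for node in graph:
        subgraph = node.get("subgraph")
        if subgraph:
            assign_for_provider(subgraph, ep_name, supported_ops, needs_whole_subgraph)
        if node.get("ep") is not None or node["op"] not in supported_ops:
            continue
        # A provider that wants the whole subgraph only takes the control flow
        # node when every node inside the subgraph is already assigned to it.
        if subgraph and needs_whole_subgraph:
            if not all(n.get("ep") == ep_name for n in subgraph):
                continue
        node["ep"] = ep_name


def partition(graph, providers):
    """providers: list of (name, supported_ops, needs_whole_subgraph) in priority order."""
    for ep_name, supported_ops, needs_whole_subgraph in providers:
        assign_for_provider(graph, ep_name, supported_ops, needs_whole_subgraph)
    return graph
```

Because subgraphs are processed before their parent node, the check on the control flow node reduces to inspecting assignments that already exist, which is the simplification the commit message points to.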
edgchen1
858040faaa
Implement reduce_matrix_columns() to optimize ReduceSum (#5639)
Implement reduce_matrix_columns() to optimize ReduceSum.
2020-11-05 10:25:00 -08:00
George Wu
c46515cd56
[TensorRT EP] Remove cudaDeviceSynchronize and use cudaAllocator for scratch buffers (#5714)
* use cuda allocator, remove cudaDeviceSync call

* use unique_ptr for scratch buffers
2020-11-05 09:45:27 -08:00
Dmitri Smirnov
fd9d0c4ee0
Remove redundant const_cast (#5705)
Signed-off-by: Dmitri Smirnov <dmitrism@microsoft.com>
2020-11-05 09:43:22 -08:00
Tiago Koji Castro Shibata
9e68e98423
Add static CRT DLLs to Nuget package (#5661)
* Add static runtime yaml option

* Add to WAI Nuget build matrix

* Support empty build flags

* Add DML to x64

* Bundle static rt

* Bundle after Nugets are built

* Fix typo

* Skip static tests

* Pack test artifact only in x64 dynamic

* No DML static runtime

* Add Store static

* Revert "Add Store static"

This reverts commit 69133e5838.

* Static subfolder
2020-11-05 09:26:17 -08:00
Tim Harris
ff23083de2
Unbreak microbenchmark build (#5710)
Minor updates to the microbenchmarks built optionally with "--build_micro_benchmarks". These are not built as part of CI, and builds started to fail. There are three changes:

- I updated the threading-related benchmarks to use the static-method ThreadPool API, and to expose control over the thread pool configuration via constexpr int variables.

- Disable GCC warnings seen with recent compiler versions when including parts of the Eigen headers in batchnorm.cc and eigen.cc files.

- Flush std::cerr on error conditions to avoid buffered messages being lost.

I tested manual builds with Linux (GCC) and Windows (MSVC).
2020-11-05 10:46:59 +00:00
Yufeng Li
5c4543e194
Calibrate float tensor only (#5704) 2020-11-04 23:55:48 -08:00