Commit graph

6281 commits

Author SHA1 Message Date
Sunghoon
6076a262dc
upgrade react-native packages to latest (#10454) 2022-02-02 15:19:40 -08:00
Viswanath Boga
ad9d2e2e89
Prefix match in first iteration of beam search OP (#10231)
* Add BeamSearch op schema

* Add ONNX conversion for beam search

* remove attention_mask and change input order

* add option to run baseline

* add check data type NULL

* applies VerifyNodeAndOpMatch to subgraph

* update input_ids shape

* Add node name for Cast node

* expose API for topk

* parse parameters

* Add beam search scorer

* output results

* fix typo

* use c++ template and format python

* fix build pipeline errors

* symbolic shape infer of input onnx

* output scores

* add kernel def hash

* Handle vocab_mask; move CheckSubgraph

* undo insert_cast_transformer.cc and fusion_utils.py

* fix typo

* fix merge

* update doc

* add repetition penalty

* refactoring: add GptSubgraph class

* move BeamSearchState from .h to .cc file

* adjust logits processor order

* add batch generation example

* fix repetition penalty for dup words in sequence

* Add test

* Add no repeat ngram processor

* refactoring: move logits processor to classes

* fix build warning

* show latency

* use allocator in beam state

* use allocator in sequences

* fix build error

* move next_positions to beam state

* Changes for prefix matching

* removing debugs

* removing more debugs

* clean up

* clean up

* cpu doc updated

* Updated docs

* updated prefix_vocab_mask dimension in convert script

* changes to support bxs prefix_vocab_mask in beamsearchop kernel

* doc update

* OperatorKernels.md updated

* matching docs from artifacts

* minor change in logits processor

* Addressing comments

* Updated the prefix vocab mask usage properly

Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
2022-02-03 00:14:39 +05:30
Yufeng Li
1aa0789691
add qdq support for QGemm (#10414)
* add qgemm in quantization tool

* add qdq support for QGemm

* fix build break

* fix OperatorKernels.md
2022-02-02 10:35:29 -08:00
Guoyu Wang
7318361645
[NNAPI QDQ] Add QDQ Resize support (#10442)
* Add NNAPI support of QDQ Resize

* minor update to UT

* fix build break

* fix android UT failure

* address cr comments
2022-02-01 18:14:58 -08:00
Dmitri Smirnov
91b8ad5ee7
Allow users to bind arbitrary memory using raw pointers (#10428)
Add binding external allocation
  Add negative tests
  Add missing return status check
2022-02-01 18:09:24 -08:00
Weixing Zhang
3c96760192
support rocm/migraphx EP in perftest tool (#10449)
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
2022-02-01 16:12:01 -08:00
Shucai Xiao
062129a5c4
Update rocm_ep and migraphx_ep to rocm4.5.2 and fix dockerfiles to build docker images correctly (#10445)
* fix build errors for the migraphx and rocm dockerfile

* add the numpy package in the migraphx and rocm dockerfile
2022-02-01 16:11:39 -08:00
Olivia Jain
a1d9a71b8b
Improve Perf System (#10404)
* move table names to one location

* remove session metadata

* reload trt inputs

* fix posting names

* Update linux-gpu-tensorrt-daily-perf-pipeline.yml for Azure Pipelines

* remove comments

* Split up anubis job and perf run

* add trt environ variables

* No embedded links
2022-02-01 16:01:34 -08:00
Chi Lo
a7c67860a5
Reduce test time for TensorRT EP CI (#10408)
* expand model tests name

* skip cpu/cuda for trt when running onnxruntime_test_all

* only run trt ep for c++ unit test

* Update CMAKE_CUDA_ARCHITECTURES for T4

* Use new t4 agent pool

* Update YAML for run T4 on Windows

* revert code

* Update CMAKE_CUDA_ARCHITECTURES

* fix wrong value

* Remove cpu/cuda directly in model tests

* add only CMAKE_CUDA_ARCHITECTURES=75

* remove expanding model test name to see difference

* revert code

* Add fallback execution provider for unit test

* Add fallback execution provider for unit test (cont)

* add conditional to add fallback cuda ep

* Reduction op takes much longer time for TRT 8.2, so we test smaller range of inputs

* use M60

* revert code

* revert code

* add comments

* Modify code and add comment

* modify comment

* update comment

* add comment
2022-02-01 15:56:33 -08:00
Yi-Hong Lyu
ef7b4dc05c
Add test quantization of ArgMax for TensorRT (#10325)
Make sure quantize_static would insert DQ -> Q before ArgMax.
2022-01-31 16:22:16 -08:00
Guoyu Wang
68262cce86
[NNAPI QDQ] Add QDQ Conv support (#10418)
* Add qdq conv to NNAPI

* fix build warning

* addressed CR comments

* fix a minor bug in my previous merge
2022-01-31 14:36:31 -08:00
Edward Chen
c43c1691ad
Enable transpose optimizer in minimal extended build (#10349)
Enable transpose optimizer and infrastructure it depends on in a minimal extended build.
2022-01-31 09:41:04 -08:00
Scott McKay
baa1767922
Allow for an optional subgraph input to have no type info. (#10379)
Add a test for a missing optional input to Loop.
2022-01-30 08:10:13 +10:00
ytaous
85cbe8367e
[ROCm] BFloat16 support (#10416)
* reducesum bf16 support

* bf16 for add/sub/mul/div

* fix build

* bf16 for Cast

* bf16 for softmax

Co-authored-by: root <root@GCRAMDRR1-MI100-087.redmond.corp.microsoft.com>
2022-01-28 22:43:27 -08:00
Dwayne Robinson
b02f4ece5e
Remove cbegin and cend calls which do not exist in std::span or gsl::span (#10426) 2022-01-28 14:25:12 -08:00
Guoyu Wang
5f0ba31890
Remove coremltools submodule *security vulnerability* and copy the coreml model schema (#10424)
* remove coremltools submodule

* update cgmanifest

* Copy proto files directly from coremltools
2022-01-28 12:48:48 -08:00
Chen Fu
c4f1dfcfaa
Cfu s8s8 (#10413)
Adding S8S8 kernels for symmetric quantized indirect conv and depthwise conv.

Perf number with single thread:

Model              Nokia G10 (baseline / new) in ms   Pixel 4 (baseline / new) in ms
mobilenet_edgetpu  220 / 213                          18.5 / 17.6
cartoongan         8537 / 8521                        967 / 928

Co-authored-by: Chen Fu <fuchen@microsoft.com>
2022-01-28 09:26:52 -08:00
Nat Kershaw (MSFT)
1a2925acce
Add sympy package as a dependency (#10406) 2022-01-28 09:19:08 -08:00
Sheil Kumar
2dd5e75ba8
Incorrect output after GPU to GPU inference via VideoFrame and Gray8 models (#10425)
* If the tensor is of gray8 format, we should call the gray8 shader

* other check (which resolves to unknown in this case) is incorrectly being compared to constant and not DXGI_FORMAT

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2022-01-28 08:45:57 -08:00
Changming Sun
feae842a7c
Update pytorch-lightning (#10421) 2022-01-27 21:15:00 -08:00
Changming Sun
b14da94fc1
Exclude CETCOMPAT from Windows ARM build (#10417) 2022-01-27 17:57:01 -08:00
RandySheriffH
ce081fe655
Fix TopK with NAN on Cuda (#10314)
* reset MIN for float/double

* better logic for float/double equality comparison
2022-01-27 16:19:55 -08:00
Rachel Guo
ff2057a817
Add sample qdq unit test case for nnapi ep qdq integration (#10358)
* add sample unit test case and make qdq modeltestubuilder shared

* update

* address pr comments

* modify redundant funcs impl

* update

* update

* address pr comments

* update

* update

* update

* fix build breaks

* minor update

* fix bad_alloc in UT

* address pr comments

Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
Co-authored-by: Guoyu Wang <wanggy@outlook.com>
2022-01-27 15:10:41 -08:00
Edward Chen
0e951d7d6b
Add some more documentation for the C/C++ API tensor creation functions. (#10394) 2022-01-27 13:19:11 -08:00
Xavier Dupré
481b96d32a
STVM, NUPHAR, remove tvm from submodules list, checks pointers are not null. (#10211)
* STVM, checks pointers are not null.
* removes submodules tvm
* add missing include(FetchContent)
* add target tvm
* fix stvm test
* extend cgmanifest with dependencies of tvm
2022-01-27 20:31:13 +01:00
Changming Sun
ec4362f8f3
Enable more static analysis warnings and enable the analyzer for training cpu (#10176) 2022-01-27 11:17:20 -08:00
Edward Chen
66acf50488
Document C/C++ API documentation version info conventions. (#10396) 2022-01-27 10:20:13 -08:00
Dmitri Smirnov
3367ddc5ba
Add abseil cgmanifest declaration. Update coding standards. (#10374)
Add abseil cgmanifest declaration. Update coding standards for InlinedContainers
  Adjust coding guidelines. Add default N calculation for InlinedVector<T, N> for general use.
  Rename T from InlinedShapeVectorT. Fix Eager build
  Add LLVM Copyright with modified derived code notice.
2022-01-27 08:32:05 -08:00
ytaous
4d305282da
[ROCm] Enable BFloat16 for Gemm and MatMul Op (#10398)
* gemm-bf16

* gemm bf16

* gemm bf16

* matmul bf16

* minor style change

Co-authored-by: Ethan Tao <ettao@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
Co-authored-by: root <root@GCRAMDRR1-MI100-087.redmond.corp.microsoft.com>
2022-01-27 00:09:16 -08:00
dependabot[bot]
5f49f40fa5
Bump log4js from 6.3.0 to 6.4.0 in /js/web
Bumps [log4js](https://github.com/log4js-node/log4js-node) from 6.3.0 to 6.4.0.
- [Release notes](https://github.com/log4js-node/log4js-node/releases)
- [Changelog](https://github.com/log4js-node/log4js-node/blob/master/CHANGELOG.md)
- [Commits](https://github.com/log4js-node/log4js-node/compare/v6.3.0...v6.4.0)

---
updated-dependencies:
- dependency-name: log4js
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-01-26 20:51:49 -08:00
Hariharan Seshadri
27a4af6074
Fix some BinSkim defects (#10400) 2022-01-26 20:22:22 -08:00
Guoyu Wang
c6ef465011
minor fix in node unit change (#10405) 2022-01-26 16:42:38 -08:00
Weixing Zhang
ea9c8a7cdc
support MIGraphXEP to work with ROCMEP for inference on AMD GPU (#10368)
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>

Support MIGraphXEP to work with ROCMEP for inference on AMD GPU
2022-01-26 15:52:56 -08:00
Chi Lo
389d2db1ce
Make model tests name clear (#10220)
* add clear test name for model tests

* handle remove character

* modify for test

* Modify for correct test name

* Remove test code

* add comments

* make it only on Linux

* change function name

* Convert from wchar_t to char
2022-01-26 15:08:27 -08:00
Yulong Wang
847801f5be
[wasm] update emscripten v2.0.34 (#10391) 2022-01-26 14:46:02 -08:00
ashbhandare
cf13b9dd5e
Symbolic export for numpy_T (#10390)
* Export numpy_T as onnx transpose

* further fixes, test

Co-authored-by: Aishwarya Bhandare <aibhanda@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2022-01-26 14:14:42 -08:00
RandySheriffH
a27503ebe4
use strict mode (#10397) 2022-01-26 10:27:05 -08:00
Changming Sun
5576e3553d
Remove python 3.6 from our python packaging pipeline (#10395) 2022-01-26 10:21:57 -08:00
Guoyu Wang
4af116649c
[QDQ] Hookup NNAPI GetCapability/Compile with shared QDQ selectors (#10347)
* add qdqgroup as input for NodeUnit

* minor update

* hookup nnapi_ep

* minor update

* update compiler setting

* Add a simple UT

* Pipeline change to add build minimal extended with NNAPI for Android

* move GetAllNodeUnits to node_unit.h, add UT for NodeUnits, minor updates

* minor updates

* address CR comments

Co-authored-by: gwang0000 <62914304+gwang0000@users.noreply.github.com>
2022-01-25 17:13:46 -08:00
Tang, Cheng
9aa51379c9
[eager mode]: add configuration for ort virtual device count (#10346)
* add configuration for ort virtual device count

* fix build break

* fix ci build break

Co-authored-by: Cheng Tang <chenta@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2022-01-25 16:15:54 -08:00
Edward Chen
5eafbb50f9
Fix possible null pointer dereference. (#10373)
NodeInfo::p_node was used directly but it can be null from here:
2afce4830c/onnxruntime/core/framework/session_state_utils.cc (L381-L382)

Add an additional check that it is not null before use.
2022-01-25 14:48:51 -08:00
sumitsays
e1012a8662
Added OnRunEnd and Sync method in ExecutionProvider (#10362)
Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com>
2022-01-25 13:00:44 -08:00
Edward Chen
df16c605e8
Add "available since" message for C API additions since v1.10.0. (#10348) 2022-01-25 10:15:34 -08:00
Alexey Gladyshev
a0fe4a7c1c
[TVM EP] Improved usability of TVM EP (#10241)
* improved usability of TVM EP
* moved technical import under a condition related to TVM EP only
* Revert "moved technical import under a condition related to TVM EP only"
* add conditional _ld_preload.py file extension for TVM EP
* improve readability of inserted code
2022-01-25 18:48:08 +01:00
Xavier Dupré
6e95c0316d
Builds onnxruntime + eager mode with the same value for _GLIBCXX_USE_CXX11_ABI as pytorch (#10114)
* add _GLIBCXX_USE_CXX11_ABI
* restrict to eager mode
2022-01-25 11:25:31 +01:00
pallavides
790c3be7e9
Fix Reshape issue when shape size is -1 (#10356)
* Fix Reshape issue (in_place) when shape size is -1
2022-01-24 19:30:52 -08:00
Edward Chen
4b87d2c172
Fix dockerfiles/Dockerfile.arm32v7 build. (#10360)
Install CMake, ignore some Eigen warnings.
2022-01-24 19:06:09 -08:00
Chen Fu
df0c819850
fix compilation error due to semantic conflict with another PR (#10370)
Resolve PR conflicts between: #10289 and #10334
Co-authored-by: Chen Fu <fuchen@microsoft.com>
2022-01-24 16:32:05 -08:00
Chen Fu
2afce4830c
Symmetric QGEMM (#10289)
Adding code for symmetric quantized matrix multiplication. Used in quantized convolution, achieving significant perf gain.

TODO, use Symmetric Quantized GEMM in other operators!

TODO address activation buffer overread in custom allocators and tensors supplied by users.

DOT kernel perf test:

Pixel 5a:

Cartoongan   513.539 ms   471.786 ms
Efficient    57.5169 ms   56.4174 ms
Edgetpu      14.6673 ms   13.5959 ms

NEON kernel perf test:

Pixel 3a:

Cartoongan   1423.53 ms   1069.92 ms
Efficient    114.086 ms   107.968 ms
Edgetpu      39.2632 ms   36.9839 ms


Co-authored-by: Chen Fu <fuchen@microsoft.com>
2022-01-24 10:49:04 -08:00
Dmitri Smirnov
7e092a7e3f
Reduce number of memory allocations based on a customer profiling case (#10193)
Add abseil and inlined containers typedefs
Introduce TensorShapeVector for shape building.
Use gsl::span<const T> to make interfaces accept different types of vector like args.
Introduce InlinedShapeVectorT for shape capacity typed instantiations
Refactor cuda slice along with provider shared interfaces
Refactor Concat, Conv, Pad
Build with Conv Einsum and ConvTranspose refactored.
Remove TensorShape::GetDimsAsVector()
Refactor SliceIterator and SliceIteratorBase
Refactor broadcast
Refactor Pads for twice as long
Remove memory planner intermediate shapes vector
Refactor orttraining
Fix passing TensorShapeVector to tests
Remove abseil copy and submodule, use FetchContent_Declare/Fetch
Patch with separate command
Make RocmAsyncBuffer accept anything convertible to span. Adjust Linux GPU pipeline.
2022-01-24 10:40:46 -08:00