Commit graph

6286 commits

Author SHA1 Message Date
Ye Wang
bb09acffed
Transformer model CUDA EP align with CPU on corner case (#9889)
* align with cpu on no input data

* review comments and add tests

Co-authored-by: Ubuntu <wy@linux-v100.aidmrjtolptuzevavgwhrapqcd.jx.internal.cloudapp.net>
2022-02-03 12:58:49 -08:00
ytaous
63198a6566
[ROCm] BFloat16 support (#10447)
* bf16 support

* bf16 support

* UTs

* fix build

* fix UTs

Co-authored-by: root <root@GCRAMDRR1-MI100-087.redmond.corp.microsoft.com>
2022-02-03 11:31:14 -08:00
zhangyaobit
239c6ad3f0
Support specifying an execution provider in benchmark script (#10453)
* Support specifying execution providers.

* Change default provider setting to None.

* Add support for bert_perf_test script.

* Fall back to ROCM/CUDA EP for MIGraphX/Tensorrt EP.

* Assert fall back EPs are included.

* Add model class AutoModelForCausalLM and other minor updates.

Co-authored-by: Yao Zhang <zhanyao@microsoft.com>
2022-02-02 19:11:31 -08:00
Yi-Hong Lyu
a405658370
Fuse Clip->Q to Q (#10434)
* Fuse Clip->Q to Q

* Remove unused variable argmax_node

* Remove braces around scalar initializer

* Move GetClipConstantMinMax under ORT_MINIMAL_BUILD

* Consider epsilon so we can fuse more cases
2022-02-02 18:29:30 -08:00
Rachel Guo
97b8f6f394
Add logic to NNAPI EP to exclude pre-processing involving dynamic shapes when partitioning (#10452)
* wip

* wip

* wip

* save

* address pr comments

* address pr comments

Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
2022-02-02 15:54:19 -08:00
Sunghoon
6076a262dc
upgrade react-native packages to latest (#10454)
2022-02-02 15:19:40 -08:00
Viswanath Boga
ad9d2e2e89
Prefix match in first iteration of beam search OP (#10231)
* Add BeamSearch op schema

* Add ONNX conversion for beam search

* remove attention_mask and change input order

* add option to run baseline

* add check data type NULL

* applies VerifyNodeAndOpMatch to subgraph

* update input_ids shape

* Add node name for Cast node

* expose API for topk

* parse parameters

* Add beam search scorer

* output results

* fix typo

* use c++ template and format python

* fix build pipeline errors

* symbolic shape infer of input onnx

* output scores

* add kernel def hash

* Handle vocab_mask; move CheckSubgraph

* undo insert_cast_transformer.cc and fusion_utils.py

* fix typo

* fix merge

* update doc

* add repetition penalty

* refactoring: add GptSubgraph class

* move BeamSearchState from .h to .cc file

* adjust logits processor order

* add batch generation example

* fix repetition penalty for dup words in sequence

* Add test

* Add no repeat ngram processor

* refactoring: move logits processor to classes

* fix build warning

* show latency

* use allocator in beam state

* use allocator in sequences

* fix build error

* move next_positions to beam state

* Changes for prefix matching

* removing debugs

* removing more debugs

* clean up

* clean up

* cpu doc updated

* Updated docs

* updated prefix_vocab_mask dimension in convert script

* changes to support bxs prefix_vocab_mask in beamsearchop kernel

* doc update

* OperatorKernels.md updated

* matching docs from artifacts

* minor change in logits processor

* Addressing comments

* Updated the prefix vocab mask usage properly

Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
2022-02-03 00:14:39 +05:30
Yufeng Li
1aa0789691
add qdq support for QGemm (#10414)
* add qgemm in quantization tool

* add qdq support for QGemm

* fix build break

* fix OperatorKernels.md
2022-02-02 10:35:29 -08:00
Guoyu Wang
7318361645
[NNAPI QDQ] Add QDQ Resize support (#10442)
* Add NNAPI support of QDQ Resize

* minor update to UT

* fix build break

* fix android UT failure

* address cr comments
2022-02-01 18:14:58 -08:00
Dmitri Smirnov
91b8ad5ee7
Allow users to bind arbitrary memory using raw pointers (#10428)
Add binding external allocation
  Add negative tests
  Add missing return status check
2022-02-01 18:09:24 -08:00
Weixing Zhang
3c96760192
support rocm/migraphx EP in perftest tool (#10449)
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
2022-02-01 16:12:01 -08:00
Shucai Xiao
062129a5c4
Update rocm_ep and migraphx_ep to rocm4.5.2 and fix dockerfiles to build docker images correctly (#10445)
* fix build errors for the migraphx and rocm dockerfile

* add the numpy package in the migraphx and rocm dockerfile
2022-02-01 16:11:39 -08:00
Olivia Jain
a1d9a71b8b
Improve Perf System (#10404)
* move table names to one location

* remove session metadata

* reload trt inputs

* fix posting names

* Update linux-gpu-tensorrt-daily-perf-pipeline.yml for Azure Pipelines

* remove comments

* Split up anubis job and perf run

* add trt environ variables

* No embedded links
2022-02-01 16:01:34 -08:00
Chi Lo
a7c67860a5
Reduce test time for TensorRT EP CI (#10408)
* expand model tests name

* skip cpu/cuda for trt when running onnxruntime_test_all

* only run trt ep for c++ unit test

* Update CMAKE_CUDA_ARCHITECTURES for T4

* Use new t4 agent pool

* Update YAML to run T4 on Windows

* revert code

* Update CMAKE_CUDA_ARCHITECTURES

* fix wrong value

* Remove cpu/cuda directly in model tests

* add only CMAKE_CUDA_ARCHITECTURES=75

* remove expanding model test name to see difference

* revert code

* Add fallback execution provider for unit test

* Add fallback execution provider for unit test (cont)

* add conditional to add fallback cuda ep

* Reduction op takes much longer with TRT 8.2, so we test a smaller range of inputs

* use M60

* revert code

* revert code

* add comments

* Modify code and add comment

* modify comment

* update comment

* add comment
2022-02-01 15:56:33 -08:00
Yi-Hong Lyu
ef7b4dc05c
Add test quantization of ArgMax for TensorRT (#10325)
Make sure quantize_static inserts DQ -> Q before ArgMax.
2022-01-31 16:22:16 -08:00
Guoyu Wang
68262cce86
[NNAPI QDQ] Add QDQ Conv support (#10418)
* Add qdq conv to NNAPI

* fix build warning

* addressed CR comments

* fix a minor bug in my previous merge
2022-01-31 14:36:31 -08:00
Edward Chen
c43c1691ad
Enable transpose optimizer in minimal extended build (#10349)
Enable transpose optimizer and infrastructure it depends on in a minimal extended build.
2022-01-31 09:41:04 -08:00
Scott McKay
baa1767922
Allow for an optional subgraph input to have no type info. (#10379)
Add a test for a missing optional input to Loop.
2022-01-30 08:10:13 +10:00
ytaous
85cbe8367e
[ROCm] BFloat16 support (#10416)
* reducesum bf16 support

* bf16 for add/sub/mul/div

* fix build

* bf16 for Cast

* bf16 for softmax

Co-authored-by: root <root@GCRAMDRR1-MI100-087.redmond.corp.microsoft.com>
2022-01-28 22:43:27 -08:00
Dwayne Robinson
b02f4ece5e
Remove cbegin and cend calls which do not exist in std::span or gsl::span (#10426)
2022-01-28 14:25:12 -08:00
Guoyu Wang
5f0ba31890
Remove coremltools submodule *security vulnerability* and copy the coreml model schema (#10424)
* remove coremltools submodule

* update cgmanifest

* Copy proto files directly from coremltools
2022-01-28 12:48:48 -08:00
Chen Fu
c4f1dfcfaa
Cfu s8s8 (#10413)
Adding S8S8 kernels for symmetric quantized indirect conv and depthwise conv.

Perf numbers with a single thread:

Model               Nokia G10 (baseline / new), ms   Pixel 4 (baseline / new), ms
mobilenet_edgetpu   220 / 213                        18.5 / 17.6
cartoongan          8537 / 8521                      967 / 928

Co-authored-by: Chen Fu <fuchen@microsoft.com>
2022-01-28 09:26:52 -08:00
Nat Kershaw (MSFT)
1a2925acce
Add sympy package as a dependency (#10406)
2022-01-28 09:19:08 -08:00
Sheil Kumar
2dd5e75ba8
Incorrect output after GPU to GPU inference via VideoFrame and Gray8 models (#10425)
* If the tensor is of gray8 format, we should call the gray8 shader

* other check (which resolves to unknown in this case) is incorrectly being compared to constant and not DXGI_FORMAT

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2022-01-28 08:45:57 -08:00
Changming Sun
feae842a7c
Update pytorch-lightning (#10421)
2022-01-27 21:15:00 -08:00
Changming Sun
b14da94fc1
Exclude CETCOMPAT from Windows ARM build (#10417)
2022-01-27 17:57:01 -08:00
RandySheriffH
ce081fe655
Fix TopK with NAN on Cuda (#10314)
* reset MIN for float/double

* better logic for float/double comparison for equality
2022-01-27 16:19:55 -08:00
Rachel Guo
ff2057a817
Add sample qdq unit test case for nnapi ep qdq integration (#10358)
* add sample unit test case and make qdq modeltestbuilder shared

* update

* address pr comments

* modify redundant funcs impl

* update

* update

* address pr comments

* update

* update

* update

* fix build breaks

* minor update

* fix bad_alloc in UT

* address pr comments

Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
Co-authored-by: Guoyu Wang <wanggy@outlook.com>
2022-01-27 15:10:41 -08:00
Edward Chen
0e951d7d6b
Add some more documentation for the C/C++ API tensor creation functions. (#10394)
2022-01-27 13:19:11 -08:00
Xavier Dupré
481b96d32a
STVM, NUPHAR, remove tvm from submodules list, checks pointers are not null. (#10211)
* STVM, checks pointers are not null.
* removes submodules tvm
* add missing include(FetchContent)
* add target tvm
* fix stvm test
* extend cgmanifest with dependencies of tvm
2022-01-27 20:31:13 +01:00
Changming Sun
ec4362f8f3
Enable more static analysis warnings and enable the analyzer for training cpu (#10176)
2022-01-27 11:17:20 -08:00
Edward Chen
66acf50488
Document C/C++ API documentation version info conventions. (#10396)
2022-01-27 10:20:13 -08:00
Dmitri Smirnov
3367ddc5ba
Add abseil cgmanifest declaration. Update coding standards. (#10374)
Add abseil cgmanifest declaration. Update coding standards for InlinedContainers
  Adjust coding guidelines. Add default N calculation for InlinedVector<T, N> for general use.
  Rename T from InlinedShapeVectorT. Fix Eager build
  Add LLVM Copyright with modified derived code notice.
2022-01-27 08:32:05 -08:00
ytaous
4d305282da
[ROCm] Enable BFloat16 for Gemm and MatMul Op (#10398)
* gemm-bf16

* gemm bf16

* gemm bf16

* matmul bf16

* minor style change

Co-authored-by: Ethan Tao <ettao@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
Co-authored-by: root <root@GCRAMDRR1-MI100-087.redmond.corp.microsoft.com>
2022-01-27 00:09:16 -08:00
dependabot[bot]
5f49f40fa5
Bump log4js from 6.3.0 to 6.4.0 in /js/web
Bumps [log4js](https://github.com/log4js-node/log4js-node) from 6.3.0 to 6.4.0.
- [Release notes](https://github.com/log4js-node/log4js-node/releases)
- [Changelog](https://github.com/log4js-node/log4js-node/blob/master/CHANGELOG.md)
- [Commits](https://github.com/log4js-node/log4js-node/compare/v6.3.0...v6.4.0)

---
updated-dependencies:
- dependency-name: log4js
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-01-26 20:51:49 -08:00
Hariharan Seshadri
27a4af6074
Fix some BinSkim defects (#10400)
2022-01-26 20:22:22 -08:00
Guoyu Wang
c6ef465011
minor fix in node unit change (#10405)
2022-01-26 16:42:38 -08:00
Weixing Zhang
ea9c8a7cdc
support MIGraphXEP to work with ROCMEP for inference on AMD GPU (#10368)
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>

Support MIGraphXEP to work with ROCMEP for inference on AMD GPU
2022-01-26 15:52:56 -08:00
Chi Lo
389d2db1ce
Make model tests name clear (#10220)
* add clear test name for model tests

* handle remove character

* modify for test

* Modify for correct test name

* Remove test code

* add comments

* make it only on Linux

* change function name

* Convert from wchar_t to char
2022-01-26 15:08:27 -08:00
Yulong Wang
847801f5be
[wasm] update emscripten v2.0.34 (#10391)
2022-01-26 14:46:02 -08:00
ashbhandare
cf13b9dd5e
Symbolic export for numpy_T (#10390)
* Export numpy_T as onnx transpose

* further fixes, test

Co-authored-by: Aishwarya Bhandare <aibhanda@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2022-01-26 14:14:42 -08:00
RandySheriffH
a27503ebe4
use strict mode (#10397)
2022-01-26 10:27:05 -08:00
Changming Sun
5576e3553d
Remove python 3.6 from our python packaging pipeline (#10395)
2022-01-26 10:21:57 -08:00
Guoyu Wang
4af116649c
[QDQ] Hookup NNAPI GetCapability/Compile with shared QDQ selectors (#10347)
* add qdqgroup as input for NodeUnit

* minor update

* hookup nnapi_ep

* minor update

* update compiler setting

* Add a simple UT

* Pipeline change to add build minimal extended with NNAPI for Android

* move GetAllNodeUnits to node_unit.h, add UT for NodeUnits, minor updates

* minor updates

* address CR comments

Co-authored-by: gwang0000 <62914304+gwang0000@users.noreply.github.com>
2022-01-25 17:13:46 -08:00
Tang, Cheng
9aa51379c9
[eager mode]: add configuration for ort virtual device count (#10346)
* add configuration for ort virtual device count

* fix build break

* fix ci build break

Co-authored-by: Cheng Tang <chenta@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2022-01-25 16:15:54 -08:00
Edward Chen
5eafbb50f9
Fix possible null pointer dereference. (#10373)
NodeInfo::p_node was used directly but it can be null from here:
2afce4830c/onnxruntime/core/framework/session_state_utils.cc (L381-L382)

Add an additional check that it is not null before use.
2022-01-25 14:48:51 -08:00
sumitsays
e1012a8662
Added OnRunEnd and Sync method in ExecutionProvider (#10362)
Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com>
2022-01-25 13:00:44 -08:00
Edward Chen
df16c605e8
Add "available since" message for C API additions since v1.10.0. (#10348)
2022-01-25 10:15:34 -08:00
Alexey Gladyshev
a0fe4a7c1c
[TVM EP] Improved usability of TVM EP (#10241)
* improved usability of TVM EP
* moved technical import under a condition related to TVM EP only
* Revert "moved technical import under a condition related to TVM EP only"
* add conditional _ld_preload.py file extension for TVM EP
* improve readability of inserted code
2022-01-25 18:48:08 +01:00
Xavier Dupré
6e95c0316d
Builds onnxruntime + eager mode with the same value for _GLIBCXX_USE_CXX11_ABI as pytorch (#10114)
* add _GLIBCXX_USE_CXX11_ABI
* restrict to eager mode
2022-01-25 11:25:31 +01:00