Commit graph

7863 commits

Author SHA1 Message Date
ytaous
399ffc9700
Fix Windows GPU CI (#10499)
* fix build

* fix win build

Co-authored-by: Ethan Tao <ettao@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2022-02-08 22:06:23 -08:00
Guoyu Wang
e4dc4e4d3c
[NNAPI QDQ] Add QDQ Add/Mul, update to NNAPI QDQ handling, update some test settings (#10483)
* Squashed commit of the following:

commit 12380491a9
Author: Guoyu Wang <wanggy@outlook.com>
Date:   Mon Feb 7 12:59:04 2022 -0800

    Add qdq mul support

commit 9cadda7f2c
Merge: 7a32847761 0f5d0a091a
Author: Guoyu Wang <wanggy@outlook.com>
Date:   Mon Feb 7 11:24:47 2022 -0800

    Merge remote-tracking branch 'origin/master' into gwang-msft/qdq_mul

commit 7a32847761
Author: Guoyu Wang <wanggy@outlook.com>
Date:   Mon Feb 7 00:41:30 2022 -0800

    move test case to util

commit c1a8f0d81e
Author: Guoyu Wang <wanggy@outlook.com>
Date:   Fri Feb 4 13:04:26 2022 -0800

    update input/output check

commit a6f0a0d504
Author: Guoyu Wang <wanggy@outlook.com>
Date:   Thu Feb 3 18:37:21 2022 -0800

    update quantized io check functions

commit 87f4d1dcfe
Merge: 7849f07109 97b8f6f394
Author: Guoyu Wang <wanggy@outlook.com>
Date:   Wed Feb 2 17:22:58 2022 -0800

    Merge remote-tracking branch 'origin/master' into gwang-msft/qdq_mul

commit 7849f07109
Author: Guoyu Wang <wanggy@outlook.com>
Date:   Wed Feb 2 17:22:55 2022 -0800

    minor update

commit 7196cdf419
Author: Guoyu Wang <wanggy@outlook.com>
Date:   Wed Feb 2 10:50:10 2022 -0800

    init change

commit 84c00772a1
Merge: a8c7dce22f 7318361645
Author: Guoyu Wang <wanggy@outlook.com>
Date:   Tue Feb 1 18:21:17 2022 -0800

    Merge remote-tracking branch 'origin/master' into gwang-msft/qdq_mul

commit a8c7dce22f
Merge: 55e536c182 ef7b4dc05c
Author: Guoyu Wang <wanggy@outlook.com>
Date:   Tue Feb 1 13:51:04 2022 -0800

    Merge remote-tracking branch 'origin/master' into gwang-msft/qdq_mul

commit 55e536c182
Author: Guoyu Wang <wanggy@outlook.com>
Date:   Tue Feb 1 11:44:34 2022 -0800

    address cr comments

commit d460f5b776
Author: Guoyu Wang <wanggy@outlook.com>
Date:   Tue Feb 1 00:33:54 2022 -0800

    fix android UT failure

commit 52146cf06f
Author: Guoyu Wang <wanggy@outlook.com>
Date:   Mon Jan 31 16:01:13 2022 -0800

    fix build break

commit ec6d07df8b
Author: Guoyu Wang <wanggy@outlook.com>
Date:   Mon Jan 31 15:41:52 2022 -0800

    minor update to UT

commit 8ec8490b4f
Author: Guoyu Wang <wanggy@outlook.com>
Date:   Mon Jan 31 15:01:30 2022 -0800

    Add NNAPI support of QDQ Resize

* Update qdq add/mul test case, fix build break

* Address CR comments

* Add QLinearMul support

* remove unused params

* Address CR comments
2022-02-08 20:44:15 -08:00
Vincent Wang
655f490c95
Remove BFloat16 Specialized Code for ReduceSum (#10476)
2022-02-09 07:39:57 +08:00
Ryan Lai
4388eaed1b
Merged PR 6937750: Restore history to dmldev. Merge without squash
Related work items: #37712737
2022-02-08 23:24:02 +00:00
Ryan Lai
b14944f9f8
Merge commit 'b02f4ece5e4f48f5d303d6be0170c03d60b24efb' into user/rylai/restore_history
2022-02-08 14:58:23 -08:00
ashbhandare
7e5d68eea6
gradient and test (#10455)
Co-authored-by: Aishwarya Bhandare <aibhanda@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2022-02-08 10:18:22 -08:00
ytaous
435e14d60a
[ROCm] BFloat16 support (#10465)
* bf16 support

* minor clean up

* UTs

* fix build

* UTs

* UTs

* merge commit 6b5504c

* minor

* ROCm code cleanup

* fix build

* fix build

* minor

Co-authored-by: Ethan Tao <ettao@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
Co-authored-by: root <root@GCRAMDRR1-MI100-087.redmond.corp.microsoft.com>
2022-02-07 22:55:15 -08:00
Yufeng Li
c696da36c7
fix unit test of quant gemm (#10469)
2022-02-07 09:14:37 -08:00
Chi Lo
0f5d0a091a
Make user capable of adding new field in OrtTensorRTProviderOptionsV2 as new provider option (#10450)
* modify code for add additional field in OrtTensorRTProviderOptionsV2

* add include file

* fix typo

* fix bug

* add comment

* fix code

* revert change
2022-02-05 11:15:12 -08:00
Rachel Guo
927f1f18c9
[NNAPI QDQ] Add QDQ AveragePool op support (#10464)
* wip

* save

* address pr comments

* update

* revert minor changes

Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
2022-02-04 17:04:48 -08:00
wraveane
d0ab881d07
Contrib ops for TRT plugins: EfficientNMS and Pyramid ROI Align (#9486)
* Contrib ops for TRT plugins: EfficientNMS and Pyramid ROI Align

* Contrib ops for TRT plugins: Multilevel Crop and Resize
2022-02-04 12:10:04 -08:00
Dwayne Robinson
6fd7ba5b7e
Merged PR 6917440: ONNX Runtime update from GitHub master
Just RI.

Related work items: #38034064
2022-02-04 10:13:38 +00:00
Ye Wang
0d09dd5d20
Support fusion for TNLR based model (#10432)
* support tnlr based offensive V4 model

* Update onnx_model_tnlr.py

Co-authored-by: Ubuntu <wy@linux-v100.aidmrjtolptuzevavgwhrapqcd.jx.internal.cloudapp.net>
2022-02-03 23:59:05 -08:00
Changming Sun
4f13c8ac39
Update orttraining-linux-ci-pipeline.yml (#10462)
2022-02-03 13:46:16 -08:00
Maxiwell S. Garcia
6bbf016dc4
cmake: disable 'attributes' error to fix the build with GCC < 9.x
This patch fixes the error "requested alignment X is larger than Y" in older GCC's

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89357
2022-02-03 13:38:19 -08:00
Ye Wang
bb09acffed
Transformer model CUDA EP align with CPU on corner case (#9889)
* align with cpu on no input data

* review comments and add tests

Co-authored-by: Ubuntu <wy@linux-v100.aidmrjtolptuzevavgwhrapqcd.jx.internal.cloudapp.net>
2022-02-03 12:58:49 -08:00
ytaous
63198a6566
[ROCm] BFloat16 support (#10447)
* bf16 support

* bf16 support

* UTs

* fix build

* fix UTs

Co-authored-by: root <root@GCRAMDRR1-MI100-087.redmond.corp.microsoft.com>
2022-02-03 11:31:14 -08:00
zhangyaobit
239c6ad3f0
Support specifying an execution provider in benchmark script (#10453)
* Support specifying execution providers.

* Change default provider setting to None.

* Add support for bert_perf_test script.

* Fall back to ROCM/CUDA EP for MIGraphX/Tensorrt EP.

* Assert fall back EPs are included.

* Add model class AutoModelForCausalLM and other minor updates.

Co-authored-by: Yao Zhang <zhanyao@microsoft.com>
2022-02-02 19:11:31 -08:00
Yi-Hong Lyu
a405658370
Fuse Clip->Q to Q (#10434)
* Fuse Clip->Q to Q

* Remove unused variable argmax_node

* Remove braces around scalar initializer

* Move GetClipConstantMinMax under ORT_MINIMAL_BUILD

* Consider epsilon so we can fuse more cases
2022-02-02 18:29:30 -08:00
Rachel Guo
97b8f6f394
Add logic to NNAPI EP to exclude pre-processing involving dynamic shapes when partitioning (#10452)
* wip

* wip

* wip

* save

* address pr comments

* address pr comments

Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
2022-02-02 15:54:19 -08:00
Sunghoon
6076a262dc
upgrade react-native packages to latest (#10454)
2022-02-02 15:19:40 -08:00
Viswanath Boga
ad9d2e2e89
Prefix match in first iteration of beam search OP (#10231)
* Add BeamSearch op schema

* Add ONNX conversion for beam search

* remove attention_mask and change input order

* add option to run baseline

* add check data type NULL

* applies VerifyNodeAndOpMatch to subgraph

* update input_ids shape

* Add node name for Cast node

* expose API for topk

* parse parameters

* Add beam search scorer

* output results

* fix typo

* use c++ template and format python

* fix build pipeline errors

* symbolic shape infer of input onnx

* output scores

* add kernel def hash

* Handle vocab_mask; move CheckSubgraph

* undo insert_cast_transformer.cc and fusion_utils.py

* fix typo

* fix merge

* update doc

* add repetition penalty

* refactoring: add GptSubgraph class

* move BeamSearchState from .h to .cc file

* adjust logits processor order

* add batch generation example

* fix repetition penalty for dup words in sequence

* Add test

* Add no repeat ngram processor

* refactoring: move logits processor to classes

* fix build warning

* show latency

* use allocator in beam state

* use allocator in sequences

* fix build error

* move next_positions to beam state

* Changes for prefix matching

* removing debugs

* removing more debugs

* clean up

* clean up

* cpu doc updated

* Updated docs

* updated prefix_vocab_mask dimension in convert script

* changes to support bxs prefix_vocab_mask in beamsearchop kernel

* doc update

* OperatorKernels.md updated

* matching docs from artifacts

* minor change in logits processor

* Addressing comments

* Updated the prefix vocab mask usage properly

Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
2022-02-03 00:14:39 +05:30
Yufeng Li
1aa0789691
add qdq support for QGemm (#10414)
* add qgemm in quantization tool

* add qdq support for QGemm

* fix build break

* fix OperatorKernels.md
2022-02-02 10:35:29 -08:00
Guoyu Wang
7318361645
[NNAPI QDQ] Add QDQ Resize support (#10442)
* Add NNAPI support of QDQ Resize

* minor update to UT

* fix build break

* fix android UT failure

* address cr comments
2022-02-01 18:14:58 -08:00
Dmitri Smirnov
91b8ad5ee7
Allow users to bind arbitrary memory using raw pointers (#10428)
Add binding external allocation
  Add negative tests
  Add missing return status check
2022-02-01 18:09:24 -08:00
Weixing Zhang
3c96760192
support rocm/migraphx EP in perftest tool (#10449)
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
2022-02-01 16:12:01 -08:00
Shucai Xiao
062129a5c4
Update rocm_ep and migraphx_ep to rocm4.5.2 and fix dockerfiles to build docker images correctly (#10445)
* fix build errors for the migraphx and rocm dockerfile

* add the numpy package in the migraphx and rocm dockerfile
2022-02-01 16:11:39 -08:00
Olivia Jain
a1d9a71b8b
Improve Perf System (#10404)
* move table names to one location

* remove session metadata

* reload trt inputs

* fix posting names

* Update linux-gpu-tensorrt-daily-perf-pipeline.yml for Azure Pipelines

* remove comments

* Split up anubis job and perf run

* add trt environ variables

* No embedded links
2022-02-01 16:01:34 -08:00
Chi Lo
a7c67860a5
Reduce test time for TensorRT EP CI (#10408)
* expand model tests name

* skip cpu/cuda for trt when running onnxruntime_test_all

* only run trt ep for c++ unit test

* Update CMAKE_CUDA_ARCHITECTURES for T4

* Use new t4 agent pool

* Update YAML for run T4 on Windows

* revert code

* Update CMAKE_CUDA_ARCHITECTURES

* fix wrong value

* Remove cpu/cuda directly in model tests

* add only CMAKE_CUDA_ARCHITECTURES=75

* remove expanding model test name to see difference

* revert code

* Add fallback execution provider for unit test

* Add fallback execution provider for unit test (cont)

* add conditional to add fallback cuda ep

* Reduction op takes much longer time for TRT 8.2, so we test smaller range of inputs

* use M60

* revert code

* revert code

* add comments

* Modify code and add comment

* modify comment

* update comment

* add comment
2022-02-01 15:56:33 -08:00
Yi-Hong Lyu
ef7b4dc05c
Add test quantization of ArgMax for TensorRT (#10325)
Make sure quantize_static would insert DQ -> Q before ArgMax.
2022-01-31 16:22:16 -08:00
Guoyu Wang
68262cce86
[NNAPI QDQ] Add QDQ Conv support (#10418)
* Add qdq conv to NNAPI

* fix build warning

* addressed CR comments

* fix a minor bug in my previous merge
2022-01-31 14:36:31 -08:00
Edward Chen
c43c1691ad
Enable transpose optimizer in minimal extended build (#10349)
Enable transpose optimizer and infrastructure it depends on in a minimal extended build.
2022-01-31 09:41:04 -08:00
Scott McKay
baa1767922
Allow for an optional subgraph input to have no type info. (#10379)
Add a test for a missing optional input to Loop.
2022-01-30 08:10:13 +10:00
ytaous
85cbe8367e
[ROCm] BFloat16 support (#10416)
* reducesum bf16 support

* bf16 for add/sub/mul/div

* fix build

* bf16 for Cast

* bf16 for softmax

Co-authored-by: root <root@GCRAMDRR1-MI100-087.redmond.corp.microsoft.com>
2022-01-28 22:43:27 -08:00
Dwayne Robinson
b02f4ece5e
Remove cbegin and cend calls which do not exist in std::span or gsl::span (#10426)
2022-01-28 14:25:12 -08:00
Guoyu Wang
5f0ba31890
Remove coremltools submodule *security vulnerability* and copy the coreml model schema (#10424)
* remove coremltools submodule

* update cgmanifest

* Copy proto files directly from coremltools
2022-01-28 12:48:48 -08:00
Chen Fu
c4f1dfcfaa
Cfu s8s8 (#10413)
Adding S8S8 kernels for symmetric quantized indirect conv and depthwise conv.

Perf number with single thread:

Model              Nokia G10 (baseline / new), ms   Pixel 4 (baseline / new), ms
mobilenet_edgetpu  220 / 213                        18.5 / 17.6
cartoongan         8537 / 8521                      967 / 928

Co-authored-by: Chen Fu <fuchen@microsoft.com>
2022-01-28 09:26:52 -08:00
Nat Kershaw (MSFT)
1a2925acce
Add sympy package as a dependency (#10406)
2022-01-28 09:19:08 -08:00
Sheil Kumar
2dd5e75ba8
Incorrect output after GPU to GPU inference via VideoFrame and Gray8 models (#10425)
* If the tensor is of gray8 format, we should call the gray8 shader

* other check (which resolves to unknown in this case) is incorrectly being compared to constant and not DXGI_FORMAT

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2022-01-28 08:45:57 -08:00
Changming Sun
feae842a7c
Update pytorch-lightning (#10421)
2022-01-27 21:15:00 -08:00
Changming Sun
b14da94fc1
Exclude CETCOMPAT from Windows ARM build (#10417)
2022-01-27 17:57:01 -08:00
RandySheriffH
ce081fe655
Fix TopK with NAN on Cuda (#10314)
* reset MIN for float/double

* better logic for float/double comparison for equality
2022-01-27 16:19:55 -08:00
Rachel Guo
ff2057a817
Add sample qdq unit test case for nnapi ep qdq integration (#10358)
* add sample unit test case and make qdq modeltestbuilder shared

* update

* address pr comments

* modify redundant funcs impl

* update

* update

* address pr comments

* update

* update

* update

* fix build breaks

* minor update

* fix bad_alloc in UT

* address pr comments

Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
Co-authored-by: Guoyu Wang <wanggy@outlook.com>
2022-01-27 15:10:41 -08:00
Edward Chen
0e951d7d6b
Add some more documentation for the C/C++ API tensor creation functions. (#10394)
2022-01-27 13:19:11 -08:00
Xavier Dupré
481b96d32a
STVM, NUPHAR, remove tvm from submodules list, checks pointers are not null. (#10211)
* STVM, checks pointers are not null.
* removes submodules tvm
* add missing include(FetchContent)
* add target tvm
* fix stvm test
* extend cgmanifest with dependencies of tvm
2022-01-27 20:31:13 +01:00
Changming Sun
ec4362f8f3
Enable more static analysis warnings and enable the analyzer for training cpu (#10176)
2022-01-27 11:17:20 -08:00
Edward Chen
66acf50488
Document C/C++ API documentation version info conventions. (#10396)
2022-01-27 10:20:13 -08:00
Dmitri Smirnov
3367ddc5ba
Add abseil cgmanifest declaration. Update coding standards. (#10374)
Add abseil cgmanifest declaration. Update coding standards for InlinedContainers
  Adjust coding guidelines. Add default N calculation for InlinedVector<T, N> for general use.
  Rename T from InlinedShapeVectorT. Fix Eager build
  Add LLVM Copyright with modified derived code notice.
2022-01-27 08:32:05 -08:00
ytaous
4d305282da
[ROCm] Enable BFloat16 for Gemm and MatMul Op (#10398)
* gemm-bf16

* gemm bf16

* gemm bf16

* matmul bf16

* minor style change

Co-authored-by: Ethan Tao <ettao@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
Co-authored-by: root <root@GCRAMDRR1-MI100-087.redmond.corp.microsoft.com>
2022-01-27 00:09:16 -08:00
dependabot[bot]
5f49f40fa5
Bump log4js from 6.3.0 to 6.4.0 in /js/web
Bumps [log4js](https://github.com/log4js-node/log4js-node) from 6.3.0 to 6.4.0.
- [Release notes](https://github.com/log4js-node/log4js-node/releases)
- [Changelog](https://github.com/log4js-node/log4js-node/blob/master/CHANGELOG.md)
- [Commits](https://github.com/log4js-node/log4js-node/compare/v6.3.0...v6.4.0)

---
updated-dependencies:
- dependency-name: log4js
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-01-26 20:51:49 -08:00