ytaous
399ffc9700
Fix Windows GPU CI ( #10499 )
...
* fix build
* fix win build
Co-authored-by: Ethan Tao <ettao@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2022-02-08 22:06:23 -08:00
Guoyu Wang
e4dc4e4d3c
[NNAPI QDQ] AddQDQAdd/Mul, update to NNAPI QDQ handling, update some test settings ( #10483 )
...
* Squashed commit of the following:
commit 12380491a9
Author: Guoyu Wang <wanggy@outlook.com>
Date: Mon Feb 7 12:59:04 2022 -0800
Add qdq mul support
commit 9cadda7f2c
Merge: 7a32847761 0f5d0a091a
Author: Guoyu Wang <wanggy@outlook.com>
Date: Mon Feb 7 11:24:47 2022 -0800
Merge remote-tracking branch 'origin/master' into gwang-msft/qdq_mul
commit 7a32847761
Author: Guoyu Wang <wanggy@outlook.com>
Date: Mon Feb 7 00:41:30 2022 -0800
move test case to util
commit c1a8f0d81e
Author: Guoyu Wang <wanggy@outlook.com>
Date: Fri Feb 4 13:04:26 2022 -0800
update input/output check
commit a6f0a0d504
Author: Guoyu Wang <wanggy@outlook.com>
Date: Thu Feb 3 18:37:21 2022 -0800
update quantized io check functions
commit 87f4d1dcfe
Merge: 7849f07109 97b8f6f394
Author: Guoyu Wang <wanggy@outlook.com>
Date: Wed Feb 2 17:22:58 2022 -0800
Merge remote-tracking branch 'origin/master' into gwang-msft/qdq_mul
commit 7849f07109
Author: Guoyu Wang <wanggy@outlook.com>
Date: Wed Feb 2 17:22:55 2022 -0800
minor update
commit 7196cdf419
Author: Guoyu Wang <wanggy@outlook.com>
Date: Wed Feb 2 10:50:10 2022 -0800
init change
commit 84c00772a1
Merge: a8c7dce22f 7318361645
Author: Guoyu Wang <wanggy@outlook.com>
Date: Tue Feb 1 18:21:17 2022 -0800
Merge remote-tracking branch 'origin/master' into gwang-msft/qdq_mul
commit a8c7dce22f
Merge: 55e536c182 ef7b4dc05c
Author: Guoyu Wang <wanggy@outlook.com>
Date: Tue Feb 1 13:51:04 2022 -0800
Merge remote-tracking branch 'origin/master' into gwang-msft/qdq_mul
commit 55e536c182
Author: Guoyu Wang <wanggy@outlook.com>
Date: Tue Feb 1 11:44:34 2022 -0800
address cr comments
commit d460f5b776
Author: Guoyu Wang <wanggy@outlook.com>
Date: Tue Feb 1 00:33:54 2022 -0800
fix android UT failure
commit 52146cf06f
Author: Guoyu Wang <wanggy@outlook.com>
Date: Mon Jan 31 16:01:13 2022 -0800
fix build break
commit ec6d07df8b
Author: Guoyu Wang <wanggy@outlook.com>
Date: Mon Jan 31 15:41:52 2022 -0800
minor update to UT
commit 8ec8490b4f
Author: Guoyu Wang <wanggy@outlook.com>
Date: Mon Jan 31 15:01:30 2022 -0800
Add NNAPI support of QDQ Resize
* Update qdq add/mul test case, fix build break
* Address CR comments
* Add QLinearMul support
* remove unused params
* Address CR comments
2022-02-08 20:44:15 -08:00
Vincent Wang
655f490c95
Remove BFloat16 Specialized Code for ReduceSum ( #10476 )
2022-02-09 07:39:57 +08:00
Ryan Lai
4388eaed1b
Merged PR 6937750: Restore history to dmldev. Merge without squash
...
Related work items: #37712737
2022-02-08 23:24:02 +00:00
Ryan Lai
b14944f9f8
Merge commit 'b02f4ece5e4f48f5d303d6be0170c03d60b24efb' into user/rylai/restore_history
2022-02-08 14:58:23 -08:00
ashbhandare
7e5d68eea6
gradient and test ( #10455 )
...
Co-authored-by: Aishwarya Bhandare <aibhanda@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2022-02-08 10:18:22 -08:00
ytaous
435e14d60a
[ROCm] BFloat16 support ( #10465 )
...
* bf16 support
* minor clean up
* UTs
* fix build
* UTs
* UTs
* merge commit 6b5504c
* minor
* ROCm code cleanup
* fix build
* fix build
* minor
Co-authored-by: Ethan Tao <ettao@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
Co-authored-by: root <root@GCRAMDRR1-MI100-087.redmond.corp.microsoft.com>
2022-02-07 22:55:15 -08:00
Yufeng Li
c696da36c7
fix unit test of quant gemm ( #10469 )
2022-02-07 09:14:37 -08:00
Chi Lo
0f5d0a091a
Make user capable of adding new field in OrtTensorRTProviderOptionsV2 as new provider option ( #10450 )
...
* modify code to add an additional field in OrtTensorRTProviderOptionsV2
* add include file
* fix typo
* fix bug
* add comment
* fix code
* revert change
2022-02-05 11:15:12 -08:00
Rachel Guo
927f1f18c9
[NNAPI QDQ] Add QDQ AveragePool op support ( #10464 )
...
* wip
* save
* address pr comments
* update
* revert minor changes
Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
2022-02-04 17:04:48 -08:00
wraveane
d0ab881d07
Contrib ops for TRT plugins: EfficientNMS and Pyramid ROI Align ( #9486 )
...
* Contrib ops for TRT plugins: EfficientNMS and Pyramid ROI Align
* Contrib ops for TRT plugins: Multilevel Crop and Resize
2022-02-04 12:10:04 -08:00
Dwayne Robinson
6fd7ba5b7e
Merged PR 6917440: ONNX Runtime update from GitHub master
...
Just RI.
Related work items: #38034064
2022-02-04 10:13:38 +00:00
Ye Wang
0d09dd5d20
Support fusion for TNLR based model ( #10432 )
...
* support tnlr based offensive V4 model
* Update onnx_model_tnlr.py
Co-authored-by: Ubuntu <wy@linux-v100.aidmrjtolptuzevavgwhrapqcd.jx.internal.cloudapp.net>
2022-02-03 23:59:05 -08:00
Changming Sun
4f13c8ac39
Update orttraining-linux-ci-pipeline.yml ( #10462 )
2022-02-03 13:46:16 -08:00
Maxiwell S. Garcia
6bbf016dc4
cmake: disable 'attributes' error to fix the build with GCC < 9.x
...
This patch fixes the error "requested alignment X is larger than Y" in older GCC versions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89357
2022-02-03 13:38:19 -08:00
Ye Wang
bb09acffed
Transformer model CUDA EP align with CPU on corner case ( #9889 )
...
* align with cpu on no input data
* review comments and add tests
Co-authored-by: Ubuntu <wy@linux-v100.aidmrjtolptuzevavgwhrapqcd.jx.internal.cloudapp.net>
2022-02-03 12:58:49 -08:00
ytaous
63198a6566
[ROCm] BFloat16 support ( #10447 )
...
* bf16 support
* bf16 support
* UTs
* fix build
* fix UTs
Co-authored-by: root <root@GCRAMDRR1-MI100-087.redmond.corp.microsoft.com>
2022-02-03 11:31:14 -08:00
zhangyaobit
239c6ad3f0
Support specifying an execution provider in benchmark script ( #10453 )
...
* Support specifying execution providers.
* Change default provider setting to None.
* Add support for bert_perf_test script.
* Fall back to ROCM/CUDA EP for MIGraphX/Tensorrt EP.
* Assert fall back EPs are included.
* Add model class AutoModelForCausalLM and other minor updates.
Co-authored-by: Yao Zhang <zhanyao@microsoft.com>
2022-02-02 19:11:31 -08:00
Yi-Hong Lyu
a405658370
Fuse Clip->Q to Q ( #10434 )
...
* Fuse Clip->Q to Q
* Remove unused variable argmax_node
* Remove braces around scalar initializer
* Move GetClipConstantMinMax under ORT_MINIMAL_BUILD
* Consider epsilon so we can fuse more cases
2022-02-02 18:29:30 -08:00
Rachel Guo
97b8f6f394
Add logic to NNAPI EP to exclude pre-processing involving dynamic shapes when partitioning ( #10452 )
...
* wip
* wip
* wip
* save
* address pr comments
* address pr comments
Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
2022-02-02 15:54:19 -08:00
Sunghoon
6076a262dc
upgrade react-native packages to latest ( #10454 )
2022-02-02 15:19:40 -08:00
Viswanath Boga
ad9d2e2e89
Prefix match in first iteration of beam search OP ( #10231 )
...
* Add BeamSearch op schema
* Add ONNX conversion for beam search
* remove attention_mask and change input order
* add option to run baseline
* add check data type NULL
* applies VerifyNodeAndOpMatch to subgraph
* update input_ids shape
* Add node name for Cast node
* expose API for topk
* parse parameters
* Add beam search scorer
* output results
* fix typo
* use c++ template and format python
* fix build pipeline errors
* symbolic shape infer of input onnx
* output scores
* add kernel def hash
* Handle vocab_mask; move CheckSubgraph
* undo insert_cast_transformer.cc and fusion_utils.py
* fix typo
* fix merge
* update doc
* add repetition penalty
* refactoring: add GptSubgraph class
* move BeamSearchState from .h to .cc file
* adjust logits processor order
* add batch generation example
* fix repetition penalty for dup words in sequence
* Add test
* Add no repeat ngram processor
* refactoring: move logits processor to classes
* fix build warning
* show latency
* use allocator in beam state
* use allocator in sequences
* fix build error
* move next_positions to beam state
* Changes for prefix matching
* removing debugs
* removing more debugs
* clean up
* clean up
* cpu doc updated
* Updated docs
* updated prefix_vocab_mask dimension in convert script
* changes to support bxs prefix_vocab_mask in beamsearchop kernel
* doc update
* OperatorKernels.md updated
* matching docs from artifacts
* minor change in logits processor
* Addressing comments
* Updated the prefix vocab mask usage properly
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
2022-02-03 00:14:39 +05:30
Yufeng Li
1aa0789691
add qdq support for QGemm ( #10414 )
...
* add qgemm in quantization tool
* add qdq support for QGemm
* fix build break
* fix OperatorKernels.md
2022-02-02 10:35:29 -08:00
Guoyu Wang
7318361645
[NNAPI QDQ] Add QDQ Resize support ( #10442 )
...
* Add NNAPI support of QDQ Resize
* minor update to UT
* fix build break
* fix android UT failure
* address cr comments
2022-02-01 18:14:58 -08:00
Dmitri Smirnov
91b8ad5ee7
Allow users to bind arbitrary memory using raw pointers ( #10428 )
...
Add binding external allocation
Add negative tests
Add missing return status check
2022-02-01 18:09:24 -08:00
Weixing Zhang
3c96760192
support rocm/migraphx EP in perftest tool ( #10449 )
...
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
2022-02-01 16:12:01 -08:00
Shucai Xiao
062129a5c4
Update rocm_ep and migraphx_ep to rocm4.5.2 and fix dockerfiles to build docker images correctly ( #10445 )
...
* fix build errors for the migraphx and rocm dockerfile
* add the numpy package in the migraphx and rocm dockerfile
2022-02-01 16:11:39 -08:00
Olivia Jain
a1d9a71b8b
Improve Perf System ( #10404 )
...
* move table names to one location
* remove session metadata
* reload trt inputs
* fix posting names
* Update linux-gpu-tensorrt-daily-perf-pipeline.yml for Azure Pipelines
* remove comments
* Split up anubis job and perf run
* add trt environ variables
* No embedded links
2022-02-01 16:01:34 -08:00
Chi Lo
a7c67860a5
Reduce test time for TensorRT EP CI ( #10408 )
...
* expand model tests name
* skip cpu/cuda for trt when running onnxruntime_test_all
* only run trt ep for c++ unit test
* Update CMAKE_CUDA_ARCHITECTURES for T4
* Use new t4 agent pool
* Update YAML for run T4 on Windows
* revert code
* Update CMAKE_CUDA_ARCHITECTURES
* fix wrong value
* Remove cpu/cuda directly in model tests
* add only CMAKE_CUDA_ARCHITECTURES=75
* remove expanding model test name to see difference
* revert code
* Add fallback execution provider for unit test
* Add fallback execution provider for unit test (cont)
* add conditional to add fallback cuda ep
* Reduction op takes much longer time for TRT 8.2, so we test smaller range of inputs
* use M60
* revert code
* revert code
* add comments
* Modify code and add comment
* modify comment
* update comment
* add comment
2022-02-01 15:56:33 -08:00
Yi-Hong Lyu
ef7b4dc05c
Add test quantization of ArgMax for TensorRT ( #10325 )
...
Make sure quantize_static would insert DQ -> Q before ArgMax.
2022-01-31 16:22:16 -08:00
Guoyu Wang
68262cce86
[NNAPI QDQ] Add QDQ Conv support ( #10418 )
...
* Add qdq conv to NNAPI
* fix build warning
* addressed CR comments
* fix a minor bug in my previous merge
2022-01-31 14:36:31 -08:00
Edward Chen
c43c1691ad
Enable transpose optimizer in minimal extended build ( #10349 )
...
Enable transpose optimizer and infrastructure it depends on in a minimal extended build.
2022-01-31 09:41:04 -08:00
Scott McKay
baa1767922
Allow for an optional subgraph input to have no type info. ( #10379 )
...
Add a test for a missing optional input to Loop.
2022-01-30 08:10:13 +10:00
ytaous
85cbe8367e
[ROCm] BFloat16 support ( #10416 )
...
* reducesum bf16 support
* bf16 for add/sub/mul/div
* fix build
* bf16 for Cast
* bf16 for softmax
Co-authored-by: root <root@GCRAMDRR1-MI100-087.redmond.corp.microsoft.com>
2022-01-28 22:43:27 -08:00
Dwayne Robinson
b02f4ece5e
Remove cbegin and cend calls which do not exist in std::span or gsl::span ( #10426 )
2022-01-28 14:25:12 -08:00
Guoyu Wang
5f0ba31890
Remove coremltools submodule *security vulnerability* and copy the coreml model schema ( #10424 )
...
* remove coremltools submodule
* update cgmanifest
* Copy proto files directly from coremltools
2022-01-28 12:48:48 -08:00
Chen Fu
c4f1dfcfaa
Cfu s8s8 ( #10413 )
...
Adding S8S8 kernels for symmetric quantized indirect conv and depthwise conv.
Perf number with single thread:
Model             | Nokia G10 (baseline / new) in ms | Pixel 4 (baseline / new) in ms
mobilenet_edgetpu | 220 / 213                        | 18.5 / 17.6
cartoongan        | 8537 / 8521                      | 967 / 928
Co-authored-by: Chen Fu <fuchen@microsoft.com>
2022-01-28 09:26:52 -08:00
Nat Kershaw (MSFT)
1a2925acce
Add sympy package as a dependency ( #10406 )
2022-01-28 09:19:08 -08:00
Sheil Kumar
2dd5e75ba8
Incorrect output after GPU to GPU inference via VideoFrame and Gray8 models ( #10425 )
...
* If the tensor is of gray8 format, we should call the gray8 shader
* other check (which resolves to unknown in this case) is incorrectly being compared to constant and not DXGI_FORMAT
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2022-01-28 08:45:57 -08:00
Changming Sun
feae842a7c
Update pytorch-lightning ( #10421 )
2022-01-27 21:15:00 -08:00
Changming Sun
b14da94fc1
Exclude CETCOMPAT from Windows ARM build ( #10417 )
2022-01-27 17:57:01 -08:00
RandySheriffH
ce081fe655
Fix TopK with NAN on Cuda ( #10314 )
...
* reset MIN for float/double
* better logic for float/double equality comparison
2022-01-27 16:19:55 -08:00
Rachel Guo
ff2057a817
Add sample qdq unit test case for nnapi ep qdq integration ( #10358 )
...
* add sample unit test case and make qdq modeltestubuilder shared
* update
* address pr comments
* modify redundant funcs impl
* update
* update
* address pr comments
* update
* update
* update
* fix build breaks
* minor update
* fix bad_alloc in UT
* address pr comments
Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
Co-authored-by: Guoyu Wang <wanggy@outlook.com>
2022-01-27 15:10:41 -08:00
Edward Chen
0e951d7d6b
Add some more documentation for the C/C++ API tensor creation functions. ( #10394 )
2022-01-27 13:19:11 -08:00
Xavier Dupré
481b96d32a
STVM, NUPHAR, remove tvm from submodules list, checks pointers are not null. ( #10211 )
...
* STVM, checks pointers are not null.
* removes submodules tvm
* add missing include(FetchContent)
* add target tvm
* fix stvm test
* extend cgmanifest with dependencies of tvm
2022-01-27 20:31:13 +01:00
Changming Sun
ec4362f8f3
Enable more static analysis warnings and enable the analyzer for training cpu ( #10176 )
2022-01-27 11:17:20 -08:00
Edward Chen
66acf50488
Document C/C++ API documentation version info conventions. ( #10396 )
2022-01-27 10:20:13 -08:00
Dmitri Smirnov
3367ddc5ba
Add abseil cgmanifest declaration. Update coding standards. ( #10374 )
...
Add abseil cgmanifest declaration. Update coding standards for InlinedContainers
Adjust coding guidelines. Add default N calculation for InlinedVector<T, N> for general use.
Rename T from InlinedShapeVectorT. Fix Eager build
Add LLVM Copyright with modified derived code notice.
2022-01-27 08:32:05 -08:00
ytaous
4d305282da
[ROCm] Enable BFloat16 for Gemm and MatMul Op ( #10398 )
...
* gemm-bf16
* gemm bf16
* gemm bf16
* matmul bf16
* minor style change
Co-authored-by: Ethan Tao <ettao@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
Co-authored-by: root <root@GCRAMDRR1-MI100-087.redmond.corp.microsoft.com>
2022-01-27 00:09:16 -08:00
dependabot[bot]
5f49f40fa5
Bump log4js from 6.3.0 to 6.4.0 in /js/web
...
Bumps [log4js](https://github.com/log4js-node/log4js-node ) from 6.3.0 to 6.4.0.
- [Release notes](https://github.com/log4js-node/log4js-node/releases )
- [Changelog](https://github.com/log4js-node/log4js-node/blob/master/CHANGELOG.md )
- [Commits](https://github.com/log4js-node/log4js-node/compare/v6.3.0...v6.4.0 )
---
updated-dependencies:
- dependency-name: log4js
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
2022-01-26 20:51:49 -08:00