7218 commits

Author SHA1 Message Date
Yulong Wang
bfdd191eec
[wasm] use same export name for SIMD/NOSIMD build (#12545) 2022-08-19 18:17:50 -07:00
Dwayne Robinson
aa85092b51
DML EP squeeze all axes when empty (#12649)
DML EP squeeze empty axes
2022-08-19 08:56:03 -07:00
Changming Sun
b270334e1e
Update numpy version from 1.21.0 to 1.21.6 to avoid building it from source (#12644) 2022-08-18 22:11:48 -07:00
Chen Fu
56dd0176a1
QDQ debugger - Adding Error Calculator (#12632)
QDQ debugger - Adding Error Calculator
2022-08-18 09:30:43 -07:00
Cheng
81b128b5e9
Qlinearsoftmax take FLOAT lookup-table (#12574)
* [lookuptable] float-type

* typed y-scale

* round to nearest even
2022-08-18 09:54:39 +08:00
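Note on #12574: the "round to nearest even" item can be illustrated with a small quantization helper. This is a minimal sketch, not the code from the commit; the name QuantizeU8 and its parameters are hypothetical. It relies on std::nearbyint, which follows the current floating-point rounding mode; the default mode rounds halfway cases to the nearest even integer.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: quantizes a float to uint8 with ties rounded to even.
// Assumes the default rounding mode (FE_TONEAREST) is in effect.
std::uint8_t QuantizeU8(float value, float scale, int zero_point) {
  assert(scale > 0.0f);
  // nearbyint(0.5) is 0 and nearbyint(1.5) is 2 under FE_TONEAREST,
  // unlike std::round, which rounds halfway cases away from zero.
  const float rounded =
      std::nearbyint(value / scale) + static_cast<float>(zero_point);
  // Saturate to the representable uint8 range before converting.
  return static_cast<std::uint8_t>(std::clamp(rounded, 0.0f, 255.0f));
}
```

Rounding ties to even avoids the small upward bias that always rounding halves away from zero would introduce.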
Erick Muñoz
82b724fa5e
[oneDNN] Improve DequantizeLinear operator performance. (#12611)
* Detect when ZeroPoint = 0 and avoid sub op.

* Added tests to verify constant initializer behaviour.
2022-08-17 12:31:10 -07:00
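Note on #12611: dequantization computes y = (x - zero_point) * scale, so when the zero point is 0 the subtraction can be skipped. The scalar sketch below only illustrates that idea; it is not the oneDNN execution provider's implementation, and the function name and signature are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference dequantization: y = (x - zero_point) * scale.
std::vector<float> DequantizeLinear(const std::vector<std::uint8_t>& x,
                                    float scale, std::uint8_t zero_point) {
  assert(scale > 0.0f);
  std::vector<float> y(x.size());
  if (zero_point == 0) {
    // Fast path: subtracting 0 is a no-op, so only the multiply remains.
    for (std::size_t i = 0; i < x.size(); ++i) {
      y[i] = static_cast<float>(x[i]) * scale;
    }
    return y;
  }
  // General path: widen to int first so the subtraction can go negative.
  for (std::size_t i = 0; i < x.size(); ++i) {
    const int shifted = static_cast<int>(x[i]) - static_cast<int>(zero_point);
    y[i] = static_cast<float>(shifted) * scale;
  }
  return y;
}
```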
Thiago Crepaldi
d1ba801570
Add BuildError for --gen_doc and --enable_training (#12630) 2022-08-17 14:18:37 -04:00
Dmitri Smirnov
9481893b58
Replace with lock_guard as a lighter class for locking (#12616)
Replace with lock_guard as a lighter class
2022-08-17 11:08:31 -07:00
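Note on #12616: the commit message does not say which lock type was replaced; the sketch below assumes it was something like std::unique_lock. std::lock_guard only locks in its constructor and unlocks in its destructor, which is enough for a plain critical section. The Counter class and IncrementMany function are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical counter guarded by a mutex.
class Counter {
 public:
  void Increment() {
    // lock_guard cannot be unlocked early, moved, or deferred, so it
    // carries no extra "owns lock" state the way std::unique_lock does.
    std::lock_guard<std::mutex> guard(mutex_);
    ++value_;
  }

  int Get() const {
    std::lock_guard<std::mutex> guard(mutex_);
    return value_;
  }

 private:
  mutable std::mutex mutex_;  // mutable so Get() can lock in a const method
  int value_ = 0;
};

// Calls Increment() n times; each call takes and releases the lock.
void IncrementMany(Counter& counter, int n) {
  assert(n >= 0);
  for (int i = 0; i < n; ++i) {
    counter.Increment();
  }
}
```

std::unique_lock remains the right choice when the lock must be released early or passed to a condition variable.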
Chen Fu
f2db6bb293
weight matching (#12607)
QDQ loss debug - Weights Matching

Part 2 of QDQ loss debugging tool: given a float model and its qdq model, return the matching of all weight tensors and their corresponding dequantized weights from the qdq model.
2022-08-17 11:01:10 -07:00
Haoming Chen
8a038b9b0c
Fix a build error (#12600)
The LLVM compiler complains about std::hash<const char*> and suggests std::hash<const void*>. But the intention is to hash the name string, not the pointer, so use std::hash<std::string> to be explicit.
2022-08-17 10:49:54 -07:00
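Note on #12600: the difference between hashing a pointer and hashing the characters it points to can be sketched as follows. HashName and HashAddress are hypothetical helpers, not functions from the repository.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hashes the characters of `name`, so equal strings give equal hashes
// no matter where they are stored.
std::size_t HashName(const char* name) {
  assert(name != nullptr);
  return std::hash<std::string>{}(std::string(name));
}

// Hashes only the pointer value. Two equal strings stored at different
// addresses are treated as unrelated keys.
std::size_t HashAddress(const char* name) {
  return std::hash<const void*>{}(name);
}
```

Building a std::string costs a copy per call; the sketch accepts that in exchange for making the intent explicit.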
Tianlei Wu
ce01ed02da
Improve LongformerAttention performance: AddBiasTranspose and New weight format (#12448)
* add AddBiasTranspose kernel, new format of weights
* Use compact global_q in GEMM
* sequence_index from BxS to S; new stream for copy
* merge input and output pointers in scratch2
* update default benchmark tests
* add new format 0 for weight and bias
* avoid integer overflow
* check gpu memory
* output summary in benchmark
* add logging
* update unit tests with non empty bias value
* add rocblasGemmHelper and rocblasGemmStridedBatchedHelper for Rocm
2022-08-17 09:36:48 -07:00
pengwa
7df2e8c5cc
Refactor with std::variant (on device training) (#12383)
* use std::variant for synthetic data storage.

* use std::variant to replace TypedCheckpointProperty

* Remove shared ptr for checkpoint property

* fix tests

* refine std::variant usage a bit

* remove CheckpointProperty data abstraction

* use InlinedVector and InlinedHashMap if possible

* fix comments

* fix build and test

* fix some comments

* use gsl::span

* fix tests

* refine based on comments

* fix win build

* fix build
2022-08-17 08:31:23 +08:00
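Note on #12383: replacing a typed property class hierarchy with std::variant might look roughly like the sketch below. The alias PropertyValue, its set of alternatives, and both functions are assumptions made for illustration; they are not the types used in the commit. C++17 is required.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical property value: exactly one of a fixed set of types,
// stored inline without a base class or heap allocation for the wrapper.
using PropertyValue = std::variant<std::int64_t, float, std::string>;

// Returns a printable form of whichever alternative is held.
std::string ToString(const PropertyValue& value) {
  return std::visit(
      [](const auto& v) -> std::string {
        using T = std::decay_t<decltype(v)>;
        if constexpr (std::is_same_v<T, std::string>) {
          return v;
        } else {
          return std::to_string(v);
        }
      },
      value);
}

// Reads an integer property; the caller must know the stored type.
std::int64_t GetInt(const PropertyValue& value) {
  assert(std::holds_alternative<std::int64_t>(value));
  return std::get<std::int64_t>(value);
}
```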
Edward Chen
caabfcd920
Replace references to onnxruntime 'master' with 'main' in Dockerfiles. (#12550)
* Replace references to onnxruntime 'master' with 'main' in Dockerfiles.

* update dockerfiles/README.md
2022-08-16 14:13:05 -07:00
yf711
9d10badc55
Add build option to link TensorRT prebuilt parser (#12602)
* Add build option to link prebuilt TensorRT parser

* Test without the build option to link prebuilt TRTParser

* Minor: update name of build option

* Minor: update name of build option
2022-08-16 14:09:58 -07:00
Adam Pocock
733db31420
[Java] JNI refactor for OrtSession (#12496)
Refactor JNI error reporting
2022-08-16 13:43:06 -07:00
Chen Fu
eb6aa861cf
QDQ debugger - activations compare (#12544)
Debugger for QDQ loss - activation matching

This is the first part of the QDQ debugger tool: activation matching, where we identify and match corresponding activations from the float model and the qdq model. The idea is that during quantization, we have an original float model and a qdq model. The debugger can run the two models side by side using the same input data. By comparing intermediate activations, we can help the model author figure out where the values differ, and take steps to reduce precision loss.
2022-08-15 17:03:28 -07:00
Yufeng Li
30ee5a4f79
release calibrator before deleting temporary files (#12601) 2022-08-15 16:03:46 -07:00
Maxiwell S. Garcia
19a9690885
ppc64le: fix MlasQLinearMulKernel's VSX code to work with inputs of 32 bits (#12441) 2022-08-15 16:03:07 -07:00
Dmitri Smirnov
616677104a
ONNX Protobuf natvis with some google::protobuf (#12580)
ONNX Protobuf natvis with some google::protobuf structures
  Add leading underscore to local Intrinsic
2022-08-15 09:59:07 -07:00
Baiju Meswani
f5e3517c39
Add Learning Rate Scheduler C API (#11957) 2022-08-15 09:10:25 -07:00
Kevin Chen
73da3f3705
Add TRT uint8 support (#12570)
* uint8 support

Signed-off-by: Kevin Chen <kevinch@nvidia.com>

* Handle outputs as well

Signed-off-by: Kevin Chen <kevinch@nvidia.com>
2022-08-15 08:22:50 -07:00
Yulong Wang
95f2a3e7e0
[js/web] update branch name for pull:wasm (#12548)
* [js/web] update branch name for pull:wasm

* revise message
2022-08-12 15:46:36 -07:00
Nat Kershaw (MSFT)
cc9b3e1c37
Automate generation of javadocs and create PR with changes (#12515) 2022-08-12 12:03:38 -07:00
Scott McKay
0b0c51e028
Support direct usage of ORT format model flatbuffer for initializers (#12465)
* Add ability to use ORT format model flatbuffer directly for initializers by leveraging the TensorProto external data infrastructure.

Requires user to provide ORT format model bytes when creating the session, and set both `session.use_ort_model_bytes_directly` and `session.use_ort_model_bytes_for_initializers` to 1 in SessionOptions config entries (AddSessionConfigEntry in C API).
2022-08-12 18:31:43 +10:00
Xinya Zhang
bc353c7afe
Add FusedConv Op to ROCm (#11792)
* [ROCm] Add FusedConv Op.

* Enable ROCm for FusedConvTest

* [ROCm] Implement FusedConv Op. with Fusion API

The old code path was left as the fallback since some combinations are
not supported (e.g., FusedConvTest.Conv2D_Bias_Z_Relu as of ROCM 5.1,
where two bias layers are needed).

* [ROCM] Suppress duplicated warnings in unsupported Fusion API usage.

Known limitation of the current MIOpen (verified with ROCM 5.2): only one
bias layer may be present in the Fusion Plan. Adding a second bias
operation to the Fusion Plan ends with miopenStatusUnsupportedOp.
In this case the fallback code path is taken to complete the required
FusedConv operation.

However, previously this failure was not detected and cached, and
applications that create multiple FusedConv Ops with both z and bias
will keep printing error messages, which is annoying to end users
while this message is mainly for developers.

This commit will let it print the first error message as a reminder, and
skip the Fusion API code path in following calls if both z and bias are
present. (Note: the skipping applies to all newly created FusedConv Ops).

* [ROCM] Add cache mechanism for FusedConv Op.

Now the operator with the same configuration will share the same Fusion
Plan object, and the creation result will also be cached.

Two benefits:
1. No duplicated Fusion Plan creation, which is presumably a very costly
process.
2. Failures due to MIOpen limitations (like z and b cannot be present at
the same time) will only be triggered once.

Known limits:
Due to the limitation of MIOpen Interface, the tensor order of the
convolution operator can only be guessed.
2022-08-11 23:04:01 -07:00
Xinya Zhang
eb827bd3e5
[ROCm] NGramRepeatBlock, LongformerAttention and DecoderAttention Ops (#11971)
* [ROCm] enable NGramRepeatBlock Op

* [ROCm] Enable testing ROCm in NGramRepeatBlockTest.NGramSize_3

Also link onnxruntime_test_all with amdhip64 when USE_ROCM=1

* [ROCm] add LongformerAttention Op

* [ROCm] Enable LongformerAttentionTest

* [ROCm] Add DecoderAttention Op

* Enable DecoderAttention Test for ROCm.

* [ROCM] Updates according to reviews
2022-08-11 19:32:08 -07:00
Yufeng Li
95df5dac51
do not quantize Relu/Clip if their inputs are not quantized (#12565) 2022-08-11 16:16:10 -07:00
Sheil Kumar
67f6b7ce29
DirectML GEMM broken in opset 11 and 13 when optional tensor C not provided (#12568)
Set kernel input indices to be fixed to 0,1,2. C input is now optional, so last tensor must be specified.
2022-08-11 16:01:27 -07:00
Jian Chen
580f2294bc
Adding w_zero_point to conv_integer_test.cc (#12423)
* Adding w_zero_point to conv_integer_test.cc

* Reformatting code
2022-08-11 17:40:26 -04:00
Wil Brady
3d009cdde3
Updating binary ops in eager mode to support broadcasting. (#12560)
* Updating binary ops in eager mode to support broadcasting.
2022-08-11 17:00:12 -04:00
pengwa
24eab921be
Enable PythonOp for --enable_training_torch_interop build (#12539)
* enable PythonOp by default when --enable_training_torch_interop is enabled during build

* clean up

* fix

* fix comment

* fix

* fix tests

* fix fallback test

* pylint format

* refine based on comments
2022-08-12 00:49:30 +08:00
Scott McKay
b59ccbc75b
Add big endian support to murmurhash3 (#12549) 2022-08-11 18:39:39 +10:00
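Note on #12549: MurmurHash3 consumes its input in 4-byte blocks, and reading a block through a native integer load gives different values on little-endian and big-endian hosts. One common way to make the read host-independent is to assemble the value byte by byte, as sketched below. This is an assumption about the general technique, not the code from the commit, and ReadBlockLE is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: reads the block_index-th 4-byte block of `data`
// as a little-endian 32-bit value, regardless of host byte order.
std::uint32_t ReadBlockLE(const unsigned char* data, std::size_t block_index) {
  assert(data != nullptr);
  const unsigned char* p = data + block_index * 4;
  // Byte 0 is the least significant byte on every platform.
  return static_cast<std::uint32_t>(p[0]) |
         (static_cast<std::uint32_t>(p[1]) << 8) |
         (static_cast<std::uint32_t>(p[2]) << 16) |
         (static_cast<std::uint32_t>(p[3]) << 24);
}
```

With reads like this, the same input bytes produce the same hash on both kinds of host.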
Vincent Wang
018fba9b74
Fix Compile Warning (#12552)
* fix warning

* more fix
2022-08-11 16:00:35 +08:00
Changming Sun
ac7538b909
Remove CUDA 10.2 support (#12541) 2022-08-10 22:46:41 -07:00
Cheng
819c36701f
[xnnpack] basic QDQ operators support (#11912)
* basic ops for mobilenet,qconv,qsoftmax,qavgpool

update Xnnpack to latest

unit test

* NodeUnit: use outputedge to replace output-node

* qdq model e2e test

* use inlinedvector to replace vector

* conv bias check

* tensorshape helpers

* Refactor xnn_op minmax

* Qlinearsoftmax schema update

* Remove qlinearsoftmax registration

Co-authored-by: Jicheng Wen <jicwen@microsoft.com>
2022-08-11 10:12:51 +08:00
Baiju Meswani
3e78f3cf1f
Add win-ci pipeline for on-device training (#12513) 2022-08-10 14:45:39 -07:00
Chen Fu
b2382dc43a
fix qdq relu removal bug (#12542)
Fix minor bug in qdq quantization tool

Motivation and Context
Relu node is removed in qdq quantization tool if it can be merged to its input node. When performing the removal, we forgot to check whether the input is actually the graph input
2022-08-10 14:06:51 -07:00
Dmitri Smirnov
c10704a501
Use alignas instead of naive padding to avoid false cache sharing (#12514)
PerThread and ChildThreadStat alignas
2022-08-10 11:23:20 -07:00
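Note on #12514: the contrast between manual padding and alignas can be sketched as follows. The struct names are hypothetical (they are not PerThread or ChildThreadStat), and 64 bytes is an assumed cache-line size that is typical of x86-64 but not universal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed cache-line size; adjust for the target hardware.
constexpr std::size_t kCacheLine = 64;

// Manual padding: the size is one cache line, but the alignment is still
// that of uint64_t, so an instance may straddle two cache lines.
struct PaddedStat {
  std::uint64_t count;
  char pad[kCacheLine - sizeof(std::uint64_t)];
};

// alignas: the compiler aligns every instance to a cache-line boundary
// and rounds the size up, so array neighbors never share a line.
struct alignas(kCacheLine) AlignedStat {
  std::uint64_t count = 0;
};

static_assert(alignof(AlignedStat) == kCacheLine,
              "each instance starts on a cache-line boundary");
static_assert(sizeof(AlignedStat) % kCacheLine == 0,
              "size is a whole number of cache lines");
```

When each thread updates its own AlignedStat in an array, writes by one thread no longer invalidate the cache line holding another thread's counter.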
Kevin Chen
25032f1756
Add default to TRT datatype switch statement (#12533)
Signed-off-by: Kevin Chen <kevinch@nvidia.com>
2022-08-10 09:12:08 -07:00
Changming Sun
c0d396d176
Restrict "Component Detection" task to Lotus project only (#12536)
It is related to PR #12426
2022-08-10 03:25:29 -07:00
Changming Sun
e810480403
Replace the occurrences of "master" to "main" in yaml files (#12534) 2022-08-09 22:03:21 -07:00
Cheng
64e991a9fc
[Qlinearsoftmax] contrib cpu (#12177)
* [Qlinearsoftmax] contrib cpu

* int8 implementation

* contrib operator md

* qdq transformer test

* new attribute: opset

* doc

* quantized tool

* remove template to reduce Binary size

* doc of contrib operators

* enforce x_shape is valid

* fix reduce_size if input-shape is dynamic

* add UT

* register one op for reducing binarysize

* kernel hash update

* docs/ContribOperators.md
2022-08-10 10:52:02 +08:00
Vincent Wang
0c6037b5ab
Bugfix for BiasSoftmax Fusion (#12517) 2022-08-10 07:20:13 +08:00
msftlincoln
0d9a02e647
Eager Mode - Support Concatenation via aten::cat.out (#12527)
* support concatenation via aten::cat.out

* wrap dims

* rename vars in tests, test wrapped dims
2022-08-09 17:16:18 -04:00
Chen Fu
47b787c28f
Python module for dumping activation tensors when running an ONNX model (#12474)
Python module for dumping activation tensors when running an ONNX model

This is the first step towards a quantization debugging tool. We dump the activation tensors. Next step would be to compare them: original model vs quantized model (running with same input) to see where the difference becomes significant.
2022-08-09 13:15:45 -07:00
Adam Louly
2681648f5b
Load checkpoint in cpp (#12352)
* Load checkpoint in cpp

* removed unused imports

* throw error on invalid name and change function name

* inplace model assignment, change name and other comments resolved

* name change on import

* Added unit test, resolved comments

* remove unused imports

* resolved comments

* refactoring to reduce memory allocation

* resolved extra comments

* changed file hierarchy and force-added onnx model

* solved order of function argument

* used gtest macros on test cases

Co-authored-by: Adam Louly <adamlouly@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2022-08-09 12:30:50 -07:00
Faith Xu
ee3b757492
Add codeowners for requirement files (#12512)
* Add Codeowners for dependency files

* Fix team @s
2022-08-09 09:46:47 -07:00
Vincent Wang
2bed0d4abb
[CUDA] SoftmaxCrossEntropy Kernels Refactor (#12482)
* sce refactor

* refactor

* remove unnecessary memset
2022-08-09 16:48:44 +08:00
Vincent Wang
cfa09d16d9
[CUDA] Mod Op Kernel (#12499)
* mod for cuda and rocm

* fix bfloat16 ut

* change bf16 ut number

* fix opset version

* fix op kernel doc
2022-08-09 13:05:40 +08:00
pengwa
a2dc3e9eac
Improve the compilation speed when compiling for multiple architectures. (#12490)
* improve the compilation speed when compiling for multiple architectures.

* formatting

* fix

* use 0 by default

* fix comments
2022-08-09 11:52:26 +08:00