Commit graph

7187 commits

Author SHA1 Message Date
Scott McKay
b59ccbc75b
Add big endian support to murmurhash3 (#12549) 2022-08-11 18:39:39 +10:00
Vincent Wang
018fba9b74
Fix Compile Warning (#12552)
* fix warning

* more fix
2022-08-11 16:00:35 +08:00
Changming Sun
ac7538b909
Remove CUDA 10.2 support (#12541) 2022-08-10 22:46:41 -07:00
Cheng
819c36701f
[xnnpack] basic QDQ operators support (#11912)
* basic ops for mobilenet,qconv,qsoftmax,qavgpool

update Xnnpack to latest

unit test

* NodeUnit: use outputedge to replace output-node

* qdq model e2e test

* use inlinedvector to replace vector

* conv bias check

* tensorshape helpers

* Refactor xnn_op minmax

* Qlinearsoftmax schema update

* Remove qlinearsoftmax registration

Co-authored-by: Jicheng Wen <jicwen@microsoft.com>
2022-08-11 10:12:51 +08:00
Baiju Meswani
3e78f3cf1f
Add win-ci pipeline for on-device training (#12513) 2022-08-10 14:45:39 -07:00
Chen Fu
b2382dc43a
fix qdq relu removal bug (#12542)
Fix minor bug in qdq quantization tool

Motivation and Context
The Relu node is removed in the qdq quantization tool if it can be merged into its input node. When performing the removal, we forgot to check whether the input is actually a graph input.
2022-08-10 14:06:51 -07:00
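The check described in #12542 above can be illustrated with a small sketch. This is not the quantization tool's code: the `Node` class, the `removable_relus` helper, and the toy graph are all hypothetical, and the only rule taken from the commit message is that a Relu fed directly by a graph input has no producer node to merge into and must be kept.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical minimal node: operator type plus input/output tensor names.
    op_type: str
    inputs: list
    outputs: list


def removable_relus(nodes, graph_inputs):
    """Return the Relu nodes whose input is produced by another node.

    A Relu whose input is a graph input is skipped, because there is
    no upstream node it could be merged into.
    """
    produced = {out for n in nodes for out in n.outputs}
    result = []
    for n in nodes:
        if n.op_type != "Relu":
            continue
        src = n.inputs[0]
        # The missing check from the bug report: the input must not be a graph input.
        if src in graph_inputs or src not in produced:
            continue
        result.append(n)
    return result


# Toy graph: x -> Relu -> Conv -> Relu
nodes = [
    Node("Relu", ["x"], ["r0"]),
    Node("Conv", ["r0", "w"], ["c0"]),
    Node("Relu", ["c0"], ["r1"]),
]
removable = [n.outputs[0] for n in removable_relus(nodes, graph_inputs={"x"})]
```

In this toy graph only the second Relu (output `r1`) is a removal candidate; the first one reads the graph input `x` and stays.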
Dmitri Smirnov
c10704a501
Use alignas instead of naive padding to avoid false cache sharing (#12514)
PerThread and ChildThreadStat alignas
2022-08-10 11:23:20 -07:00
Kevin Chen
25032f1756
Add default to TRT datatype switch statement (#12533)
Signed-off-by: Kevin Chen <kevinch@nvidia.com>
2022-08-10 09:12:08 -07:00
Changming Sun
c0d396d176
Restrict "Component Detection" task to Lotus project only (#12536)
It is related to PR #12426
2022-08-10 03:25:29 -07:00
Changming Sun
e810480403
Replace the occurrences of "master" with "main" in yaml files (#12534) 2022-08-09 22:03:21 -07:00
Cheng
64e991a9fc
[Qlinearsoftmax] contrib cpu (#12177)
* [Qlinearsoftmax] contrib cpu

* int8 implementation

* contrib operator md

* qdq transformer test

* new attribute: opset

* doc

* quantized tool

* remove template to reduce binary size

* doc of contrib operators

* enforce x_shape is valid

* fix reduce_size if input-shape is dynamic

* add UT

* register one op for reducing binary size

* kernel hash update

* docs/ContribOperators.md
2022-08-10 10:52:02 +08:00
Vincent Wang
0c6037b5ab
Bugfix for BiasSoftmax Fusion (#12517) 2022-08-10 07:20:13 +08:00
msftlincoln
0d9a02e647
Eager Mode - Support Concatenation via aten::cat.out (#12527)
* support concatenation via aten::cat.out

* wrap dims

* rename vars in tests, test wrapped dims
2022-08-09 17:16:18 -04:00
Chen Fu
47b787c28f
Python module for dumping activation tensors when running an ONNX model (#12474)
Python module for dumping activation tensors when running an ONNX model

This is the first step towards a quantization debugging tool. We dump the activation tensors. Next step would be to compare them: original model vs quantized model (running with same input) to see where the difference becomes significant.
2022-08-09 13:15:45 -07:00
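The comparison step that #12474 above names as future work could look roughly like the following. This is a hedged sketch, not the module added by the commit: `relative_error`, `first_divergence`, the threshold value, and the tensor names are all assumptions, and activations are represented as flat lists of floats to keep the example dependency-free.

```python
import math


def relative_error(ref, test):
    """L2 norm of (ref - test), divided by the L2 norm of ref when it is non-zero."""
    diff = math.sqrt(sum((r - t) ** 2 for r, t in zip(ref, test)))
    norm = math.sqrt(sum(r * r for r in ref))
    return diff / norm if norm > 0 else diff


def first_divergence(float_acts, quant_acts, threshold=0.1):
    """Return (tensor_name, error) for the first tensor whose error exceeds threshold.

    Both arguments map tensor names to flattened activation values, and
    float_acts is assumed to be in execution order (dicts keep insertion order).
    Returns None when no tensor exceeds the threshold.
    """
    for name, ref in float_acts.items():
        if name not in quant_acts:
            continue
        err = relative_error(ref, quant_acts[name])
        if err > threshold:
            return name, err
    return None


# Hypothetical dumps from the original and the quantized model on the same input.
float_acts = {"conv1": [1.0, 2.0, 2.0], "relu1": [0.0, 3.0, 4.0]}
quant_acts = {"conv1": [1.0, 2.0, 2.0], "relu1": [0.0, 3.0, 1.0]}
divergence = first_divergence(float_acts, quant_acts)
```

Here `conv1` matches exactly and `relu1` is the first tensor reported, which is the kind of signal a quantization debugging tool would use to locate where accuracy is lost.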
Adam Louly
2681648f5b
Load checkpoint in cpp (#12352)
* Load checkpoint in cpp

* removed unused imports

* throw error on invalid name and change function name

* inplace model assignment, change name and other comments resolved

* name change on import

* Added unit test, resolved comments

* remove unused imports

* resolved comments

* refactoring to reduce memory allocation

* resolved extra comments

* changed files hierarchy and force added onnx model

* solved order of function argument

* used gtest macros on test cases

Co-authored-by: Adam Louly <adamlouly@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2022-08-09 12:30:50 -07:00
Faith Xu
ee3b757492
Add codeowners for requirement files (#12512)
* Add Codeowners for dependency files

* Fix team @s
2022-08-09 09:46:47 -07:00
Vincent Wang
2bed0d4abb
[CUDA] SoftmaxCrossEntropy Kernels Refactor (#12482)
* sce refactor

* refactor

* remove unnecessary memset
2022-08-09 16:48:44 +08:00
Vincent Wang
cfa09d16d9
[CUDA] Mod Op Kernel (#12499)
* mod for cuda and rocm

* fix bfloat16 ut

* change bf16 ut number

* fix opset version

* fix op kernel doc
2022-08-09 13:05:40 +08:00
pengwa
a2dc3e9eac
Improve the compilation speed when compiling for multiple architectures. (#12490)
* improve the compilation speed when compiling for multiple architectures.

* formatting

* fix

* use 0 by default

* fix comments
2022-08-09 11:52:26 +08:00
Scott McKay
56bd96a3f5
Incrementally free initializers while saving to OrtValue instances (#12485)
* Free initializer TensorProto instances as they're converted to OrtValue to reduce peak memory usage.

Co-authored-by: Pranav Sharma <prs@microsoft.com>
2022-08-09 10:59:10 +10:00
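The idea behind #12485 above, releasing each source object as soon as its converted form exists so both copies never coexist for the whole set, can be sketched generically. The code below is only an illustration of that pattern: `convert_incrementally` and the toy data are hypothetical, and plain Python lists and tuples stand in for TensorProto and OrtValue instances.

```python
def convert_incrementally(initializers, convert):
    """Convert every entry of `initializers`, removing each source entry as it is consumed.

    Because entries are popped from the source dict one at a time, at most one
    source object and its converted copy are alive together, instead of the
    full source set plus the full converted set.
    """
    converted = {}
    while initializers:
        name, proto = initializers.popitem()
        converted[name] = convert(proto)
        del proto  # drop the last reference to the source object before the next iteration
    return converted


# Toy stand-ins: lists play the role of the source protos, tuples the converted values.
protos = {"w": [1, 2, 3], "b": [4]}
values = convert_incrementally(protos, tuple)
```

After the call the source dictionary is empty, which is the property that lowers peak memory when the sources are large.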
Hector Li
730240d2a5
remove the link in the comments (#12510) 2022-08-08 15:20:40 -07:00
Adam Pocock
8a86b346a5
[Java] JNI refactor for ONNX Tensor (#12281)
Working on JNI refactor for OnnxTensor.
  Simplifying the error handling logic in createTensor.
  Collapsing casting branches and migrating to ONNX element type enum.
  Disable cpplint for JNI C files.
2022-08-08 12:48:30 -07:00
Jian Chen
8c5c283471
new quantized operators split (#12495)
* adding conditional variable again

* Adding split test cases in python

* Adding python cases for split

* Enable s8s8 split

* Optimize input

* Revert "Remove git and python packages from the docker images used by Zip-Nuget-Java-Nodejs Packaging Pipeline (#11651)"

This reverts commit d5e34acb

* Revert "Revert "Remove git and python packages from the docker images used by Zip-Nuget-Java-Nodejs Packaging Pipeline (#11651)""

This reverts commit 3c1a330dd3afeb55aa7eabb8ebea39b6deb37bad.

* format file

* Update c-api-linux-cpu.yml

* Update c-api-linux-cpu.yml

* Update c-api-linux-cpu.yml

* Reformat file

* Reformat file

* format file

* Optimize input

* Remove unused import

* Remove useless init

* Format split.py with black
2022-08-08 15:12:09 -04:00
cloudhan
9c05577021
Fix various warnings in kernel explorer (#12501)
Fix various warnings
2022-08-08 11:15:41 -07:00
Yufeng Li
bdd6b00c9a
set zero point to 0 if all values are 0.0 (#12470)
* set zero point to 0 if all values are 0.0

* fix bug: lower version of numpy.finfo doesn't have smallest_subnormal

* check scale to make sure it is not subnormal
2022-08-07 21:34:58 -07:00
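The behaviour described in #12470 above can be sketched as follows. This is an assumption-laden illustration rather than the tool's implementation: `compute_quant_params` is a hypothetical name, the uint8 range and the fallback scale of 1.0 are assumptions, and the float32 smallest normal value is hard-coded as `2 ** -126` instead of being read from `numpy.finfo`.

```python
FLOAT32_TINY = 2.0 ** -126  # smallest positive normal float32 value


def compute_quant_params(rmin, rmax, qmin=0, qmax=255):
    """Return (scale, zero_point) for asymmetric quantization of [rmin, rmax].

    If the range collapses (for example all values are 0.0) or the scale
    would be subnormal in float32, fall back to zero_point 0 with an
    assumed scale of 1.0.
    """
    # The representable range must contain 0.0.
    rmin = min(rmin, 0.0)
    rmax = max(rmax, 0.0)
    scale = (rmax - rmin) / (qmax - qmin)
    if scale < FLOAT32_TINY:
        return 1.0, 0
    zero_point = round(qmin - rmin / scale)
    return scale, int(zero_point)


all_zero = compute_quant_params(0.0, 0.0)
tiny_range = compute_quant_params(0.0, 1e-40)
normal_range = compute_quant_params(-1.0, 3.0)
```

With these inputs, the all-zero tensor and the near-zero range both take the fallback path, while the ordinary range gets a scale of 4/255 and a zero point inside the uint8 range.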
cloudhan
ddea1e48df
Avoid false-positive dependent name lookup error by not depending on auto keyword (#12483)
* Workaround false positive error produced by clang

ROCm's hip clang complains: "use 'template' keyword to treat 'Foo' as a dependent template name",
where Foo is not a dependent template name. Avoiding the auto keyword fixes the error
here.
2022-08-08 10:32:01 +08:00
Dwayne Robinson
eb90b52a75
DML EP fix training build error (#12461)
Fix onnxruntime_training.cmake missing linkage issue
2022-08-05 16:01:25 -07:00
Vincent Wang
e85e31ee80
Update ORTModule Default Opset Version to 15 (#12419)
* update ortmodule opset to 15

* update torch version

* fix ut

* fix ut

* rollback

* rollback for orttrainer
2022-08-05 16:55:04 +08:00
Baiju Meswani
a7d6290774
CUDA kernel for ClipGradNorm for TensorSeq gradients (#12412) 2022-08-04 22:28:28 -07:00
PeixuanZuo
3e1b0ac4b3
[DELETE] delete python package rocm4.3.1 (#12480)
[delete] delete rocm4.3.1
2022-08-05 13:27:42 +08:00
ytaous
b879dca51c
Fix Python Packaging CI (Rocm) (#12477)
Fix Python Packaging CI

Co-authored-by: Ethan Tao <ettao@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2022-08-04 20:40:09 -07:00
Scott McKay
8d830adf24
Rework parts of Graph::Resolve to reduce memory usage (#12176)
* Rework some aspects of Graph::Resolve to reduce memory usage.
2022-08-05 13:20:25 +10:00
cloudhan
f39354d7cb
Add composable kernel GEMM baseline for kernel explorer (#12364)
* Split GemmBase RocBlasGemm

* Add composable kernel GEMM baseline

* Make linter happy

* Address review comment

* Update bert cases with batchsize

* Adjust includes to fix IWYU lint

* Only build and link the ck kernels in use to improve build time

* Remove warmup run on SelectImpl

* Add comment to utility function

* Mute cpplint

* Make RocBlasGemm<T>::SelectImpl semantically correct

* Add reduced basic test cases for ck gemm

* More robust gemm testing

* Fix warnings

* Fix grammar
2022-08-04 17:32:20 -07:00
Vincent Wang
37995a7245
[CUDA] BiasSoftmax Supporting New Pattern (#12361) 2022-08-05 06:59:24 +08:00
LironKesem
d452462b5e
Lironkesem/unsqueeze_and_squeeze (#12421) 2022-08-04 15:12:34 -04:00
Dmitri Smirnov
a4ef0e7f7b
Remove dynamic allocation for ThreadPool ParallelSection (#12429)
Use InlinedVector in a TP
Store per thread parallel section in std::optional and avoid memory allocation
2022-08-04 09:46:16 -07:00
Yufeng Li
ac10f33d2d
Enable quant op to share quantization parameter between input and output (#12408)
* share quant param between tensors
2022-08-03 21:25:35 -07:00
Ryan Hill
52d4699788
Minor doc fixes (#12388) 2022-08-03 19:47:36 -07:00
Edward Chen
3efd9a73bb
Refactor InferenceSession Load member functions. (#12430)
Fix comparison of path characters when checking for ".ort" suffix.

Some clean up of InferenceSession Load functions.
- Reduce duplication between std::string/std::wstring versions.
- Renaming for clarity.
2022-08-03 16:28:26 -07:00
Ashwini Khade
97268e023c
dev notes for layout transformer (#12396)
* first draft

* plus fixes

* plus more links

* Plus updates per review

* plus more clarifications

* plus updates

* plus more nit fixes

* plus some additions
2022-08-03 15:15:59 -07:00
Scott McKay
a3de1bbf7d
Update script to find optimizers that potentially need supported opset updates (#12330)
* Update to handle multiline declarations for the kernels which are typical these days.
* Update to new path for the cpu contrib_op kernel registrations.
* Update tools/python/find_optimizer_opset_version_updates_required.py

Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>
2022-08-04 07:37:27 +10:00
Xinya Zhang
77cab7a3a5
[ROCm] Add AveragePool, GlobalAveragePool, MaxPool, GlobalMaxPool Ops (#11968)
* [ROCm] disable expected failure tests PoolTest.MaxPool_10_DilationPadding_?d

* [ROCm] Add AveragePool, GlobalAveragePool, MaxPool, GlobalMaxPool Ops

* (To squash after review) Replace rocm/nn/pool.cc with amd_hipify.py changes

* [ROCM] Replace miCompat with Helper functions

* (to squash) fix the compiling error of SetPoolingNdDescriptorHelper
2022-08-03 14:36:36 -07:00
Erick Muñoz
d1497bdf62
[oneDNN EP] Optimized DynamicQuantizeLinear operator (#12403)
* Removed unnecessary reorders
* Removed unnecessary element wise clip
2022-08-03 12:36:42 -07:00
Baiju Meswani
7f58bd7236
Perform graph transformations during offline tooling (#12422) 2022-08-03 11:27:12 -07:00
Dmitri Smirnov
dc984a03d5
Container and memory allocation guidelines (#12387)
Container and memory allocation guidelines
  Re-org and add code samples
  Clarify the wording on returning gsl::span
2022-08-03 10:31:59 -07:00
Tianlei Wu
97a340bf48
Fix integer overflow in LongformerAttention (#12435)
fix integer overflow
2022-08-03 10:29:07 -07:00
Changming Sun
44ec2cf088
Update publish-python-apidocs.yml (#12433) 2022-08-03 10:17:00 -07:00
Ye Wang
b622e5fa9b
Support vocab_mask/prefix_vocab_mask/no_repeat_number in greedysearch op (#12327)
* support more inputs for greedy search

* fix docs

* refactor test

* lint

* review comments
2022-08-03 10:10:08 -07:00
Xinya Zhang
01f3a197d7
[ROCm] InstanceNormalization, BatchNormalization and LRN Ops (#11972)
* [ROCm] Add InstanceNormalization Op

* Enable InstanceNormBatch1_fp16 and InstanceNormBatch2_fp16 for ROCm

* [ROCm] Add BatchNormalization for fp32 and fp16

* Enable BatchNormTest for ROCm

* [ROCm] Add LRN Op

* [ROCM] replace miCompat functions with Helper functions
2022-08-02 23:14:26 -07:00
Vincent Wang
99d2a63e1a
Set Fixed Seed For SoftmaxCrossEntropyLoss Related UTs (#12432)
add seed
2022-08-03 13:29:30 +08:00