Commit graph

7023 commits

Author SHA1 Message Date
Juan Paez
9b6ef17c5f
Eager opgen support for in-place operations with variadic args (#12125)
* use torch library binding frontend for tensorlist

* fix test

* allow in-place modification of variadic args

* fix lint issues

* update ORT eager readme

Co-authored-by: Juan Paez <juanpaez@microsoft.com>
2022-07-19 21:01:00 -07:00
Xinya Zhang
5e2109f7ef
[ROCm] Enable GridSample Op. (#11969) 2022-07-19 20:44:30 -07:00
Dmitri Smirnov
4f106d2b3b
Eliminate unnecessary status lock acquisition in TP (#12196)
Eliminate unnecessary status lock acquisition in the Thread Pool
2022-07-19 14:16:12 -07:00
Tianlei Wu
972e5e7300
Improve symbolic shape inference in transformers tools (#12217)
Improve symbolic shape inference handling in transformers tools: avoid infinite loop and suppress duplicated warnings
2022-07-19 13:27:35 -07:00
Jameson Miller
975bb56e8c
Eager mode - argmax_out: set output tensor (#12233)
This change updates the implementation of the argmax_out operator to 1)
set the output tensor correctly and 2) remove the unnecessary use of a
temporary tensor to store intermediate result of onnx ArgMax operation.

Previously, the argmax_out operator did not correctly update the out
tensor - it replaced the OrtValue instead of the memory backing the
OrtValue. To properly update the output tensor, we need to calculate
the expected shape of the out tensor.

We add the helper function calculate_reduction_shape to calculate the
shape of the reduced tensor from the input tensor, dimension to reduce,
and option to keep the reduced dimension or not. This is based on the
utility functions in aten/src/ATen/native/ReduceOpsUtils.h in the
PyTorch repository, but is tailored to be a bit more specific to our
current needs.

Notes:

We considered just directly leveraging PyTorch's utility functions (e.g.
get_reduction_shape) to calculate the shape of the reduced tensor from
aten/src/ATen/native/ReduceOpsUtils.h in the PyTorch repository, but
including this header file resulted in warnings around unused functions
that we would need to handle. As we only need limited functionality at the
moment, we instead implemented our own utility function to calculate the
reduction shape for our specific current needs. If we need a utility
function to more generally calculate the reduction shape, we could
consider switching to leveraging the utility methods in PyTorch.
2022-07-19 14:37:03 -04:00
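The reduction-shape calculation described in the argmax_out commit above can be illustrated with a short Python sketch. The helper name calculate_reduction_shape comes from the commit message, but the signature and body below are assumptions made for illustration; the actual helper is part of the ORT eager C++ code and may differ.

```python
from typing import List


def calculate_reduction_shape(shape: List[int], dim: int, keepdim: bool) -> List[int]:
    """Return the shape left after reducing `shape` along `dim`.

    Hypothetical Python rendering; the argument names are assumptions.
    """
    rank = len(shape)
    if not -rank <= dim < rank:
        raise IndexError(f"dim {dim} is out of range for rank {rank}")
    dim %= rank  # normalize a negative dim such as -1
    if keepdim:
        # The reduced dimension is kept with size 1.
        return shape[:dim] + [1] + shape[dim + 1:]
    # The reduced dimension is dropped entirely.
    return shape[:dim] + shape[dim + 1:]


print(calculate_reduction_shape([2, 3, 4], 1, keepdim=False))  # [2, 4]
print(calculate_reduction_shape([2, 3, 4], -1, keepdim=True))  # [2, 3, 1]
```

Knowing the expected shape up front is what allows an out tensor to be resized and written in place rather than replaced.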
Dmitri Smirnov
555e88982f
Fix GH issue 12208 (#12224) 2022-07-19 10:03:43 -07:00
Changming Sun
2cb642927b
Simplify get_docker_image.py (#12166)
Simplify get_docker_image.py by leveraging Docker's own remote cache functionality.
2022-07-19 09:53:01 -07:00
Tianlei Wu
0c319d6e94
Exclude implicit inputs from dump of encoder feeds in beam search (#12222)
fix encoder feeds dump
2022-07-19 09:44:12 -07:00
Alexey Gladyshev
66978c7ef5
[TVM EP][CI] Added TVMso EP testing into CI (#12188)
* refactor test for model with undefined shapes

* add test for TVMso EP

* update build script for TVM EP tests

* fix pylint

* disable test for Windows

* fix black

* fix python format

* fix pylint

* fix python format

* replace Path.resolve with os.path.join

* fix python path issue

Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
2022-07-19 16:05:28 +02:00
Wil Brady
4235ebc161
Add eager mode support for mm.out (matrix multiplication). (#12214)
* Add eager mode support for mm.out (matrix multiplication).

* Fallback to cpu when mm requirements not met so cpu can print error message.
2022-07-19 07:28:48 -04:00
Michael Melesse
bb5bd08545
[ROCM] Navi21 fixes pr (#11368)
* add scripts

* update docker scripts

* update build script

* create run script

* add test script

* add log 3 flags

* use the right build function

* build navi

* add clean script

* add pytorch like soln

* only build gfx 1030

* use HOST side var

* ignore logs

* update scripts

* GPU_WARP_SIZE_HOST

* update scripts

* remove scripts/amd

* match main

* add GPU_WARP_SIZE_HOST on cuda side

* match main

* correct gfx1030

* remove print

* move gfx add to rocm5.0

* remove inline

* make constexpr on cuda side
2022-07-18 22:26:57 -07:00
Vincent Wang
173bcdbc71
[CUDA] Split/Concat Kernel Optimization (#12175)
* split concat optimization

* bugfix

* fix ut

* deprecate LooseVersion
2022-07-19 08:10:46 +08:00
Yulong Wang
ced7c2deac
[js/web] use windowed Chrome for perf mode (#12157) 2022-07-18 14:04:27 -07:00
Tianlei Wu
b81b652608
Add --disable_shape_inference option to optimizer.py (#12215) 2022-07-18 13:52:02 -07:00
Sean Murray
93229949d4
Fix bug where onnxruntime_USE_NCCL flag would default to ON (#12195)
Fix bug where the onnxruntime_USE_NCCL flag would default to ON, causing ORT to not build properly. New behavior: the flag is ON when training is enabled and NCCL is not disabled, and OFF otherwise.
2022-07-18 12:13:08 -07:00
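The flag rule stated in the NCCL commit above reduces to a single boolean expression. The sketch below only restates that rule in Python; the function and argument names are hypothetical, and the real logic lives in the ORT build configuration, which is not shown in this log.

```python
def use_nccl(training_enabled: bool, nccl_disabled: bool) -> bool:
    """ON only when training is enabled and NCCL is not disabled (names are assumptions)."""
    return training_enabled and not nccl_disabled


# Enumerate all four combinations of the two inputs.
for training_enabled in (False, True):
    for nccl_disabled in (False, True):
        state = "ON" if use_nccl(training_enabled, nccl_disabled) else "OFF"
        print(f"training={training_enabled} nccl_disabled={nccl_disabled} -> {state}")
```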
Tianlei Wu
17b84c78f7
remove identity in transformers model graph fusion (#12194)
* remove identity in fusion
2022-07-18 09:59:42 -07:00
caoting-dotcom
4d38b84e26
Add file mapping for windows platform. (#12183)
* Add file mapping for windows platform.

* Add unit test for file mapping for windows. Also add an error message for mis-aligned offset

* Add unit test for file mapping for windows. Also add an error message for mis-aligned offset

* Update data type to avoid warnings

* Compatible data type to avoid warnings. Update CreateFileMapping2 condition for winml compiling.

* Add type conversion to avoid warnings for X86 release build.

Co-authored-by: Ting Cao <ticao@microsoft.com>
2022-07-18 09:24:12 -07:00
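The file-mapping commit above mentions an error message for a mis-aligned offset. As a rough illustration of that kind of check, the sketch below validates an offset against the allocation granularity and shows how an arbitrary offset can be rounded down to an aligned one. It uses Python's mmap.ALLOCATIONGRANULARITY as the default; the function names are hypothetical and this is not the ORT implementation.

```python
import mmap


def check_mapping_offset(offset: int, granularity: int = mmap.ALLOCATIONGRANULARITY) -> None:
    """Raise if `offset` cannot be used as the start of a mapped view."""
    if offset % granularity != 0:
        raise ValueError(
            f"offset {offset} is not a multiple of the allocation granularity {granularity}"
        )


def align_down(offset: int, granularity: int = mmap.ALLOCATIONGRANULARITY) -> int:
    """Largest aligned offset that is less than or equal to `offset`."""
    return offset - offset % granularity


GRANULARITY = 65536  # assumed value for the example; query the platform in real code
print(align_down(70000, GRANULARITY))  # 65536
check_mapping_offset(65536, GRANULARITY)  # passes silently
```

A caller that needs data at an unaligned offset can map from the aligned offset and skip the remaining bytes inside the view.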
leqiao-1
09af4a7fdd
remove wrong placed libs (#12201) 2022-07-18 09:22:22 -07:00
Alexey Gladyshev
d31db1aa57
[TVM EP][CI] Integrate TVM EP into ORT public CI on Windows (#12161)
* Integrate TVM EP into ORT public CI on Windows

* empty commit for restart pylint

* empty commit for restart pylint
2022-07-18 11:12:16 +02:00
msftlincoln
52095fb042
Fix line spacing/break issue, extend existing tests (#12191)
* fix line length

* extend test cases

* lint
2022-07-15 19:32:34 -04:00
msftlincoln
a2dc6d32fc
OnnxRuntime Eager: Implement log_softmax with ONNX Ops (#12190)
* share CHECK_STATUS

* log_softmax
2022-07-15 15:03:08 -04:00
msftlincoln
9bca8405aa
bitwise_and ONNX support (#12189)
* bitwise_and ONNX support

* whitespace lint
2022-07-15 12:59:56 -04:00
Wil Brady
89bf6c9b5d
Simple eager training models (#12180)
* Simple NN using ort, and added or modified ort op support.
2022-07-15 09:18:00 -04:00
msftlincoln
fafb24142f
add comment to explain local scalar dense (#12179)
* add comment to explain local scalar dense

* spacing
2022-07-15 09:03:43 -04:00
Viswanath Boga
05c31a036d
fixing positions for beam search gpt2 (#12156)
* fixing positions for beam search gpt2
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
2022-07-14 13:31:59 -07:00
Wil Brady
9ebef91a6f
Update eager Readme.md (#12170) 2022-07-14 06:05:50 -04:00
PeixuanZuo
7b53b223b8
[UPDATE] update AMD CI pipeline to Rocm5.2 with torch1.11 (#12162)
* [UPDATE] update ci to rocm5.2 + torch1.11

* [Revert] disable ort module test

* [DELETE] delete Rocm5.1.1 ci test result

* [UPDATE] update the comments
2022-07-14 16:38:16 +08:00
Vincent Wang
a7eb9fe3ac
Remove Apex Dependency For Deepspeed FP16_Optimizer (#12077)
* remove apex dependency

* fix amd build
2022-07-14 11:15:53 +08:00
Wil Brady
5da1e5d36d
Eager mode: Fix some python warnings. (#12167) 2022-07-13 20:24:42 -04:00
Maxiwell S. Garcia
51f8456c4d
ppc64le: Optimizing the MlasQLinearMulKernel() to use VSX instructions (#12051) 2022-07-13 11:11:29 -07:00
Chen Fu
040c2f4517
x86/64 U8S8 Gemm Precision Fix (#12088)
Add a graph optimization that converts u8s8 matrix multiplication to u8u8 if needed

On x86/64 platforms, SSE4.1, AVX2 and AVX512 CPUs provide better performance when computing u8s8 matrix multiplications. Unfortunately, the higher performance comes with value overflow problems, as described in:
https://www.intel.com/content/www/us/en/develop/documentation/onednn-developer-guide-and-reference/top/advanced-topics/nuances-of-int8-computations.html

In this change we added a session option "session.x64quantprecision" (default off). For operators that call u8s8 matrix multiplications, e.g. QAttention, we convert them to u8u8 when the following conditions are all satisfied:

1. Current CPU is SSE4.1, AVX2 or AVX512 with no VNNI support
2. Session option "session.x64quantprecision" is on.
3. Constant weight tensor contains values outside of [-64, 63] range

Note that when weight tensor is not constant, QDQS8ToU8Transformer should already convert it to u8.
2022-07-13 10:12:25 -07:00
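The three conditions listed in the U8S8 commit above, and the reason the [-64, 63] range matters, can be sketched in Python. The sketch assumes the overflow in question is the signed 16-bit saturation that occurs when two adjacent u8*s8 products are summed, as discussed in the linked Intel document; that model, and all function and argument names, are assumptions for illustration and not the ORT implementation.

```python
from typing import Iterable

INT16_MIN, INT16_MAX = -32768, 32767


def saturating_pair_sum(a0: int, w0: int, a1: int, w1: int) -> int:
    """Sum of two u8*s8 products clamped to int16 (assumed overflow model)."""
    exact = a0 * w0 + a1 * w1
    return max(INT16_MIN, min(INT16_MAX, exact))


def has_out_of_range_weight(weights: Iterable[int]) -> bool:
    """Condition 3: some weight lies outside [-64, 63]."""
    return any(w < -64 or w > 63 for w in weights)


def should_convert_to_u8u8(cpu_affected: bool, option_on: bool, weights: Iterable[int]) -> bool:
    """All three conditions from the commit message must hold."""
    return cpu_affected and option_on and has_out_of_range_weight(weights)


# With weights inside [-64, 63] the worst-case pair sums stay within int16.
print(saturating_pair_sum(255, 63, 255, 63))    # 32130
print(saturating_pair_sum(255, -64, 255, -64))  # -32640
# With full-range s8 weights the exact sum 64770 is clamped.
print(saturating_pair_sum(255, 127, 255, 127))  # 32767
print(should_convert_to_u8u8(True, True, [10, -100, 5]))  # True
```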
Wil Brady
48647bc7d7
Fix NonZero eager impl. (#12143) 2022-07-13 05:50:33 -04:00
Valery Chernov
3b0aaa9e0e
[TVM EP] support build on Windows (#11851)
* add description of build ORT+TVM EP on Windows

* fix cmake error related to symlink creation on Windows

* add llvm config path to build flags for correct build on Windows

* update TVM_EP.md for llvm_config build arg

* fix warnings skipping during build on Windows

* fix using string or wstring for model path to correct build on Windows (MSVC error)

* fix error in custom logger for correct build on Windows

* implement glob algorithm for Windows

* additional build fixes

* update TVM with export of VM symbols for dll

* description of nasm issue and workaround

* update TVM with export of Executable from VM symbols for dll

* description of installation of ipp-crypto dependencies on Windows

* cmake key for ipp-crypto build

* fix wstring for TVMso EP

* fix ipp-crypto build

* cmake key onnxruntime_TVM_USE_HASH switches off the full hash functionality, not just specific methods

* fix absolute path to compiled lib

* update TVM_EP.md, fix lint warnings

* update TVM_EP.md

* small fixes after review

* switch on handshake functionality for Linux workflow

Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru>
2022-07-13 10:48:42 +02:00
Scott McKay
75cf5dc2c9
Fix GH issue 12151 by using inverse perms for updating DQ axis attribute (#12158)
* Fix GH issue 12151.

Need to use inverse perms for updating that axis to what is used for transposing the input. This only applies if the DQ node is doing per-axis dequantization.
2022-07-13 18:02:58 +10:00
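The relationship between a transpose permutation and its inverse, which the DQ axis fix above relies on, can be shown with a small Python sketch. It only demonstrates the general rule for tracking an axis through a transpose; the helper names are hypothetical, and which direction the transpose optimizer applies the permutation is not stated in the commit message.

```python
from typing import List


def invert_perm(perm: List[int]) -> List[int]:
    """Return inverse such that inverse[perm[i]] == i for every i."""
    inverse = [0] * len(perm)
    for new_axis, old_axis in enumerate(perm):
        inverse[old_axis] = new_axis
    return inverse


def transpose_shape(shape: List[int], perm: List[int]) -> List[int]:
    """Shape after a transpose: output axis i takes input axis perm[i]."""
    return [shape[p] for p in perm]


shape = [2, 3, 4]
perm = [2, 0, 1]
axis = 1  # per-axis quantization along the dimension of size 3

transposed = transpose_shape(shape, perm)  # [4, 2, 3]
new_axis = invert_perm(perm)[axis]         # 2

print(transposed, new_axis)
# The dimension of size 3 is found at the inverse-permuted index.
assert transposed[new_axis] == shape[axis]
# Indexing perm directly would point at the wrong dimension here.
assert transposed[perm[axis]] != shape[axis]
```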
cloudhan
785f74979b
Rework cmake for kernel_explorer (#12079)
Improve CMake for deep integration with ORT, so that we can easily hook ORT functions for microbenchmarking purposes.
2022-07-13 15:43:32 +08:00
PeixuanZuo
5579d81fc8
[add] Add operator gemmfastgelu for ROCM (#12101)
* [ADD] add gemm fast gelu

* [UPDATE] refunction matmul_impl

* [Update] delete tuning_ in this pr

* [FIX] code format

* [FIX] compiler warning

* [Update] update doc
2022-07-13 15:40:16 +08:00
jingyanwangms
a9d0d3323e
Use updated symbolic_helper.check_training_mode (#11900)
Co-authored-by: Jingyan Wang, Baiju Meswani
2022-07-12 17:26:06 -07:00
RandySheriffH
178a413ca1
List 3.10 as supported python version and remove 3.6 (#12141)
list 3.10 as supported python version and remove 3.6

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2022-07-12 15:28:30 -07:00
Adam Pocock
e0ed9f0f2f
[java] First part of the JNI error handling rewrite (#12013)
**Description**: This fixes error handling in the JNI code in OnnxMap, OnnxSequence, OnnxRuntime, RunOptions. SessionOptions and OrtEnvironment are correct as is.

The bulk of the work will be in rewriting OnnxTensor, OnnxSparseTensor (after the merge of #10653) and OrtSession, along with the helper methods in OrtJniUtil. I plan to tackle those in separate PRs to reduce the amount of code to review.

**Motivation and Context**
- Why is this change required? What problem does it solve? The current native interop code doesn't return control to Java immediately on throwing an exception from an ORT error code, which can cause incorrect interactions with native ORT, and issues with exception propagation on the Java side.
- If it fixes an open issue, please link to the issue here. Partial work towards solving #11451.
2022-07-12 15:16:54 -07:00
msftlincoln
a6fd1a3b85
Eager mode generator improvements for multiple onnx operators and extra test cases (#12111)
* test case for masked_select

* isolate variables per onnx_op, include line numbers for ORT errors

* format errors

* correct masked_select impl, broadcast test

* node attrs naming fixed
2022-07-12 16:05:09 -04:00
Edward Chen
6e051016c1
Add Python package to perf test pipeline. (#12135) 2022-07-12 10:50:24 -07:00
LironKesem
9647a3be40
Add tests for all unary aten ops supported in eager mode (#12087)
* Add tests for all unary aten ops supported in eager mode

* fixing the PR draft

* fixing the merge

* changing eval to be at compile time

* adding requirements for eager

* 1.adding function to {ops}_out
2.cleaning the code
  and adding comments

* editing the code according to code review

Co-authored-by: root <root@AHA-LIRONKESE-1>
2022-07-12 08:53:19 -04:00
Hariharan Seshadri
73310b2a0f
Fix Reduced Ops build pipeline (#12144)
Fix ReducedOps build pipeline
2022-07-11 19:02:38 -07:00
Carson Swope
c675c4750a
include coreml_provider_factory.h in macos build instead of coreml_ex… (#12138)
include coreml_provider_factory.h in macos build instead of coreml_execution_provider.h
2022-07-11 18:27:01 -07:00
Dwayne Robinson
742f843efc
RoiAlign CPU EP add warning for max mode with samples != 1 (#12136)
* RoiAlign add warning about incorrect max summation when sample size not 1
2022-07-11 17:44:41 -07:00
Wil Brady
f1047e0456
Fix minor python and cpp warnings from previous PR. (#12140)
Description: PR 12018 introduced a few fixable Python and C++ warnings that this PR cleans up. It also adds a comment on the intent of test_mul_bool and adds out testing to test_ones.

Motivation and Context

* When iterating in Python, use a list instead of a set and don't use reserved words.
* Fix long line in cpp.
* Clarify test_mul_bool intent for future developers.
* fill_ implements torch.ones under the covers, but the previous PR did not add verification of the out param, so it is added here.
2022-07-11 16:18:40 -04:00
Preetha Veeramalai
99a370dd02
Update readme for OVEP (#12122)
* Add changes for training module in Readme

* Update ReadMeOV.rst
2022-07-11 10:54:12 -07:00
Wil Brady
418cfdc766
Update create_ort_attribute to set the tensor dimension and value correctly. Implement eager fill_ (#12018)
* Update create_ort_attribute to set the tensor dimension and value correctly.

* Eager mode support for fill_ and mm.out (mm uses mm.out).
2022-07-11 11:18:04 -04:00
PeixuanZuo
1c39d22f4e
[ADD] Rocm5.2 for Rocm python packaging pipeline (#12129)
[ADD] rocm5.2
2022-07-11 11:10:45 +08:00
Ashwini Khade
c6732c079b
pin protobuf version to be compatible with onnx (#12132)
Co-authored-by: Ashwini Khade <askhade@microsoft.com@orttrainingdev10.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2022-07-08 15:01:27 -07:00