7533 commits

Author SHA1 Message Date
Yi Zhang
8a3407d54f
update file name in the comment (#13275)
### Description
Correct the file name in the comments of the generated yaml.

2022-10-12 08:35:42 +08:00
cloudhan
1e55949a70
Fix unsound hipify in ROCm EP (#13269)
Some CUDA-related code is still left in the statically hipified ROCm EP
code. Eliminate it to avoid confusion.
2022-10-12 08:32:42 +08:00
PeixuanZuo
b2353fa737
[ROCm] Add ROCm5.3 to python package pipeline (#13249)
### Description

1. Remove ROCm5.1.1 and ROCm5.2 from ROCm python package pipeline
2. Add ROCm5.3 to ROCm python package pipeline
pipeline:

https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=237172&view=results

2022-10-12 07:23:42 +08:00
Nat Kershaw (MSFT)
fb86edb19f
Update publish-c-apidocs.yml to use main instead of master (#13281) 2022-10-11 15:37:59 -07:00
Prathik Rao
93e0a15117
implement cos gradient as a function op (#13227)
### Description
Implemented gradient of cos as per the function below.

![image](https://user-images.githubusercontent.com/31260940/193900310-b62a3e77-06d5-45af-ad28-a1d41920bad0.png)
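For reference, the derivative of cos(x) is -sin(x), so the backward pass scales the upstream gradient by -sin(x). The following is a minimal standalone sketch of that rule. The name `CosGradReference` is hypothetical, and this is not the function-op definition added by the PR.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference helper, not ONNX Runtime code.
// Given the upstream gradient dY and the forward input X (same length),
// returns dX = -dY * sin(X), element by element.
std::vector<double> CosGradReference(const std::vector<double>& dy,
                                     const std::vector<double>& x) {
  assert(dy.size() == x.size());
  std::vector<double> dx(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    dx[i] = -dy[i] * std::sin(x[i]);
  }
  return dx;
}
```

Calling `CosGradReference({1.0}, {0.0})` yields a gradient of zero magnitude, since sin(0) = 0.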

### Motivation and Context
Cos gradient required for [huggingface's diffusers
library](https://github.com/huggingface/diffusers)

### Testing
built ORT from source: `./build.sh --config RelWithDebInfo
--enable_training --use_cuda --cuda_home /usr/local/cuda --cudnn_home
/usr/local/cuda --build_wheel --parallel --skip_tests`
tested CosGrad implementation: `cd build/Linux/RelWithDebInfo/ &&
./onnxruntime_test_all --gtest_filter=GradientCheckerTest.CosGrad`

Co-authored-by: Prathik Rao <prathikrao@microsoft.com>
2022-10-11 10:11:19 -07:00
Prathik Rao
05acd20a88
convert singrad to function op and remove cpu kernel (#13263)
### Description
Implemented gradient of sin as a function op.

### Motivation and Context
Sin gradient is currently implemented as a CPU op, which could hurt
performance.

### Testing
built ORT from source: `./build.sh --config RelWithDebInfo
--enable_training --use_cuda --cuda_home /usr/local/cuda --cudnn_home
/usr/local/cuda --build_wheel --parallel --skip_tests`
tested SinGrad implementation: `cd build/Linux/RelWithDebInfo/ &&
./onnxruntime_test_all --gtest_filter=GradientCheckerTest.SinGrad`
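The idea behind a gradient check can be sketched as comparing the analytic rule (the derivative of sin(x) is cos(x)) against a central finite difference. All names below are hypothetical, and this is not how `GradientCheckerTest` itself is implemented.

```cpp
#include <cassert>
#include <cmath>

// Analytic rule: d/dx sin(x) = cos(x), so dX = dY * cos(X).
double SinGradReference(double dy, double x) { return dy * std::cos(x); }

// Central finite difference of sin at x; the step must be positive.
double SinNumericalDerivative(double x, double step = 1e-5) {
  assert(step > 0.0);
  return (std::sin(x + step) - std::sin(x - step)) / (2.0 * step);
}

// True when the analytic and numerical gradients agree within the tolerance.
bool SinGradMatches(double x, double tolerance = 1e-6) {
  return std::fabs(SinGradReference(1.0, x) - SinNumericalDerivative(x)) < tolerance;
}
```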

Co-authored-by: Prathik Rao <prathikrao@microsoft.com>
Co-authored-by: Baiju Meswani <bmeswani@microsoft.com>
2022-10-11 10:11:08 -07:00
Yi Zhang
cd2e8b306c
Replace or remove some characters to meet gtest name convention (#13266)
### Description
To construct the test name, replace whitespace with underscores and remove
parentheses.

### Motivation and Context
gtest names accept only '_' and alphanumeric characters.
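A minimal sketch of the renaming rule described above, assuming a plain `std::string` input. The helper name `SanitizeTestName` is hypothetical and is not taken from the PR; other characters are passed through unchanged.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: whitespace becomes '_', parentheses are dropped,
// every other character is copied as-is.
std::string SanitizeTestName(const std::string& raw) {
  std::string name;
  name.reserve(raw.size());
  for (unsigned char c : raw) {
    if (std::isspace(c)) {
      name.push_back('_');
    } else if (c == '(' || c == ')') {
      continue;
    } else {
      name.push_back(static_cast<char>(c));
    }
  }
  // Characters are only replaced or dropped, never added.
  assert(name.size() <= raw.size());
  return name;
}
```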
2022-10-11 16:23:54 +08:00
petermcaughan
febd5facce
Change head_size parameter dependent on qkv_hidden_size (#12933)
**Description**: Add qkv_hidden_size support in CUDA Attention Layer
implementation.

Changes include:

- Modify UT to test GPU and CPU implementation
- Add overload for CUDA kernel `AddBiasTransposeQKV` to support scenario
where V_HIDDEN_SIZE != QK_HIDDEN_SIZE
- Update variable names from `head_size` to `qkv_head_sizes[0]` or
`qkv_head_sizes[2]`
- Modify function definitions to allow communication of
`qkv_hidden_sizes` or `qkv_head_sizes`

Note that this feature is not supported in the ROCm EP or quantized
attention right now.

**Motivation and Context**
- Why is this change required? What problem does it solve? The current
CUDA implementation of attention layer doesn't support the parameter
qkv_hidden_size added in the CPU implementation in PR
[8039](https://github.com/microsoft/onnxruntime/pull/8039)
- If it fixes an open issue, please link to the issue here.

Co-authored-by: Peter Mcaughan <petermca@microsoft.com>
2022-10-11 00:25:47 -07:00
Vincent Wang
b9e23bd086
[ORTModule] Fix Custom Op Registry for Torch 1.13+ (#13250)
This PR has two fixes:
- https://github.com/pytorch/pytorch/pull/85636 changes the behavior of
register_custom_op_symbolic to register the symbolic function at a
single opset version only. For ORTModule we need to pass the opset
version when calling it.
- Since torch 1.13 the signature of einsum has a new argument, so our
custom op symbolic registry code needs to change accordingly.

Without these fixes, ORTModule will not work with nightly torch, nor
with the new torch version once it is released.
2022-10-11 15:20:51 +08:00
Yi Zhang
6b499db7e1
increase ios pipeline timeout limit (#13268)
### Motivation and Context
Timeout issues have increased.
2022-10-11 14:07:04 +08:00
Yi Zhang
ea128cdb18
skip windows GPU check if changes only in doc (#13248)
### Description
Use a path filter and a fake workflow to skip the Windows GPU check when
a PR only changes docs.
Refs:

https://docs.github.com/en/repositories/configuring-branches-and-merges-in-your-repository/defining-the-mergeability-of-pull-requests/troubleshooting-required-status-checks#handling-skipped-but-required-checks

The fake github yaml is generated by code.


### Verifications
In this PR:
Because win-gpu-ci-pipeline.yml and .github are updated, the real
Windows GPU workflows are always triggered.

In #13256:
To avoid updating win-gpu-ci-pipeline.yml, I added the path filter on the
DevOps page. The fake Windows GPU workflows were triggered, and the real
workflows were skipped.
2022-10-11 13:51:44 +08:00
PeixuanZuo
4d25b9c8f0
[ROCm] Update ROCm and MIGraphX CI pipeline to ROCm5.3 (#13257)
### Description

1. Update ROCm pipeline and MIGraphX pipeline to ROCm5.3
The ROCm pipeline ran the ortmodule test once, after which the test was disabled:
https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=777794&view=logs&j=48b14a85-ff1a-5ca4-53fa-8ea420d27feb&t=9c199f35-fc50-565d-6c65-5162c9bb1b04
2. Add `workspace: clean: all `.


2022-10-11 13:47:22 +08:00
cloudhan
2cf5d04e3d
Fix clang-tidy(cppcoreguidelines-pro-bounds-array-to-pointer-decay) (#13241)
clang-tidy says "Do not implicitly decay an array into a pointer; consider using gsl::array_view or an explicit cast instead"

This is a false positive scattered throughout our codebase wherever
helper macros are used. It occurs because `__FUNCTION__` and
`__PRETTY_FUNCTION__` are character arrays; for a function with a
4-character name, say `main`, the type of `__FUNCTION__` is `char [5]`.
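The array type can be observed with the standard `__func__` identifier, used here in place of the compiler-specific `__FUNCTION__`. This is a small sketch, and the function name `demo` is a hypothetical 4-character name chosen to mirror the `main` example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical function with a 4-character name.
std::size_t demo() {
  // Inside this function __func__ is an array holding 'd','e','m','o','\0'.
  const char* explicit_ptr = &__func__[0];  // explicit conversion, no implicit decay
  assert(explicit_ptr[4] == '\0');
  return sizeof(__func__);  // size of the array, not of a pointer
}
```

Passing `__func__` directly to a parameter of type `const char*` is the implicit array-to-pointer decay that the check reports.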
2022-10-11 13:16:48 +08:00
Edward Chen
00146b2541
Add onnxruntime_BUILD_UNIT_TESTS=OFF definition to iOS package build options. (#13238)
Add onnxruntime_BUILD_UNIT_TESTS=OFF definition to iOS package build options. The `--skip_tests` option is already specified.
2022-10-10 18:00:17 -07:00
Dmitri Smirnov
25c0a66934
Natvis adjustments to make debugging bearable (#13237)
### Description

- Fix Abseil::InlinedVector inlined storage visualization
- Fix typo in protobuf natvis.
- Add basic gsl.natvis


### Motivation and Context
Debugging is hard.
2022-10-10 10:06:55 -07:00
pengwa
0668600255
Share scalar constant initializer (#12878)
**Description**: 
1. Share scalar constant for same data type, value and shape. 
2. Fix the order of Graph resolve context clear and
CleanUnusedInitializersAndNodeArgs().

**Share initializers that hold the same value with the same type and
shape. Currently only scalar values and 1-D single-value arrays are handled.**

The transformation itself does not have much impact on memory or perf;
instead it helps simplify the graph, making common subexpression
elimination (CSE) easier. Imagine graphs like this:


![image](https://user-images.githubusercontent.com/10530022/188895598-e06f9bf9-5466-4009-a68c-6b339133936c.png)

After the CSE transformation, the Add nodes feeding Clip are NOT shared,
because each Add's second (constant) input is a different NodeArg*. If we
make all such constant initializers share the same NodeArg*, only one Add
is preserved after the CSE transformation. There are a few other similar
cases in one of the 1P DeBERTa models.
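The sharing rule can be sketched as a lookup keyed on (data type, shape, value). Everything below is a simplified, hypothetical stand-in: `ScalarInitializer` and `ShareScalarInitializers` are not ONNX Runtime types or functions, and values are reduced to a single `double` for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical, simplified stand-in for a constant initializer.
struct ScalarInitializer {
  std::string name;
  int data_type;               // element type id
  std::vector<int64_t> shape;  // {} for a scalar, {1} for a 1-D single value
  double value;
};

// Maps each initializer name to the name of the first initializer seen with
// the same data type, shape and value. Consumers would then be rewired to it.
std::map<std::string, std::string> ShareScalarInitializers(
    const std::vector<ScalarInitializer>& inits) {
  using Key = std::tuple<int, std::vector<int64_t>, double>;
  std::map<Key, std::string> canonical;
  std::map<std::string, std::string> replacement;
  for (const auto& init : inits) {
    // Only scalars and 1-D single-value arrays are in scope.
    assert(init.shape.empty() || (init.shape.size() == 1 && init.shape[0] == 1));
    const Key key{init.data_type, init.shape, init.value};
    const auto it = canonical.emplace(key, init.name).first;
    replacement[init.name] = it->second;
  }
  return replacement;
}
```

Once two Add nodes read the same shared constant, their inputs compare equal and a CSE pass can merge them.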

In an E2E measurement on a 1P DeBERTa model, we see an increase from
SamplesPerSec=562.041593991271 to 568.0106130440271, a gain of about 1.06%.

**Fix the order of Graph resolve context clear and
CleanUnusedInitializersAndNodeArgs().**

The graph resolve context is cleared at the end of every
Graph.Resolve() call. One of the things cleared is
"inputs_and_initializers", which holds a string_view for every
initializer. Because CleanUnusedInitializersAndNodeArgs() removes some
initializers, some string_views in "inputs_and_initializers" remain but
refer to strings that are in an invalid state.

2022-10-10 13:32:33 +08:00
sumitsays
e01a8519e0
[DML EP] Re-architect | Partitioning as Transformer (#13131)
### Description
Re-architect the DML EP to allow ORT L2/L3 transformers. This change
includes:
- During ORT graph partitioning, the DML EP only assigns
dmlExecutionProvider to all eligible nodes.
- Moved the DML-specific operator transformer to be an L2 transformer.
- Introduced a new DMLGraphFusionTransformer, applicable only to the DML
EP, which is responsible for:
    - partitioning the graph
    - fusing each partition into an IDMLCompiledOperator
    - registering the kernel for each partition


### Motivation and Context
- Why is this change required? What problem does it solve? 
It enables ORT L2/L3 transformers for DML EP, which will increase the
perf of Transformer-based models.
- If it fixes an open issue, please link to the issue here. N/A

Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com>
2022-10-07 22:35:47 -07:00
garanews
38906625a3
fix some typo in docs (#13212)
### Description
fix some typo in docs


### Motivation and Context
singed vs signed
succeding vs succeeding 
fileter vs filter
kernal vs kernel
libary vs library
2022-10-07 15:58:18 -07:00
Edward Chen
d411bd277e
Increase iOS packaging pipeline timeout. (#13233)
Increase iOS packaging pipeline timeout to 300 minutes.
2022-10-07 14:49:16 -07:00
Dmitri Smirnov
bb1c133245
[MicroGraph] Address ROCM warning and build failure (#13234)
### Description
Address build failures after Public API refactoring

### Motivation and Context
Make pipelines healthy.
2022-10-07 14:30:19 -07:00
Jian Chen
6662ece4a1
increase timeout to 5 hours (#13226)
### Description
Increase macOS pipeline timeout to 5 hours



### Motivation and Context
It blocks Release pipeline
2022-10-07 13:02:48 -04:00
Baiju Meswani
04ba8a7e6e
Introduce Training C++ Apis (#12994) 2022-10-06 20:13:37 -07:00
cloudhan
51ac6617f5
Fix warnings and enable dev mode for ROCm CI (#13223)
Fix warnings and enable dev mode for ROCm CI:

* Fix ROCm headers complaining "This file is deprecated. Use the header file from ..."
* Disable the signed/unsigned comparison warning for kernel explorer
* Fix unused and nodiscard warnings
* Enable dev mode for ROCm CI
* Work around error "unknown warning option '-Wno-nonnull-compare'" in kernel explorer by using '-Wno-unknown-warning-option' to ignore the unknown option
* Fix error "unused parameter 'mask'"
* Fix warning "instantiation of variable 'onnxruntime::rocm::Consts<float>::One' required here, but no definition is available", etc. Fixed by using C++17's inline (implied by constexpr) static initialization.
* Remove unused variable
* Add the missing `override` specifier
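The "no definition is available" fix listed above relies on a C++17 rule: a `constexpr` static data member is implicitly `inline`, so its in-class initializer is also its definition. A minimal sketch, using a hypothetical `Consts` template that only mirrors the name quoted in the warning:

```cpp
#include <cassert>

// Hypothetical stand-in; not the onnxruntime::rocm::Consts template.
template <typename T>
struct Consts {
  // Since C++17 these members are implicitly inline, so no separate
  // out-of-class definition is needed.
  static constexpr T Zero = T(0);
  static constexpr T One = T(1);
};

// Uses the constant by value for any arithmetic T.
template <typename T>
T AddOne(T x) {
  return x + Consts<T>::One;
}
```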
2022-10-07 09:45:01 +08:00
Dmitri Smirnov
5dae0c477d
Deprecate CustomApi and refactor public API for better safety and consistency (#13215)
### Description
Deprecate CustomOpApi and refactor dependencies for exception safety and
to eliminate memory leaks.
Refactor API classes for clear ownership and semantics.
Introduce `InitProviderOrtApi()`.

### Motivation and Context
Make public API better and safer.

Special note about `Ort::Unowned`. The class suffers from the following
problems:

1. It is not able to hold const pointers to the underlying C objects.
This forces users to `const_cast` and circumvent constness of the
returned object. The user is now able to call mutating interfaces on the
object, which violates invariants and may be a thread-safety issue. It
also makes it possible to take ownership of the pointer and destroy it
unintentionally (see examples below).
2. The objects that are unowned cannot be copied and that makes coding
inconvenient and at times unsafe.
3. It directly inherits from the type it `unowns`.

All of the above create great conditions for inadvertent unowned object
mutations and destructions. Consider the following examples of object
slicing, one of them is from a real customer issue and the other one I
accidentally coded myself (and I am supposed to know how this works).
None of the below can be solved by aftermarket patches and can be hard
to diagnose.

#### Example 1 slicing of argument
```cpp
#include <utility>  // std::move

#include "onnxruntime_cxx_api.h"

void SlicingOnArgument(Ort::Value& value) {
  // This takes possession of the input. If the argument is actually an
  // Ort::Unowned<Ort::Value>, the ptr is double freed, regardless of
  // whether it was const or not, since we cast the constness away.
  Ort::Value output_values[] = {std::move(value)};
}

int main() {
  const OrtValue* ptr = nullptr;  // some value, does not matter
  Ort::Unowned<Ort::Value> unowned{const_cast<OrtValue*>(ptr)};
  // The value moved out of unowned is destroyed when the call returns.
  SlicingOnArgument(unowned);
}
```

#### Example 2 slicing of return value
```cpp
// The return value is sliced to an Ort::Value that owns and releases (double frees) the ptr.
Ort::Value SlicingOnReturn() {
  const OrtValue* ptr = nullptr; // some value does not matter
  Ort::Unowned<Ort::Value> unowned{const_cast<OrtValue*>(ptr)};
  return unowned;
}
```
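The same hazard can be reproduced without ONNX Runtime. The sketch below uses hypothetical stand-in types (`Resource`, `Owner`, `Unowned`), not the `Ort::` classes, and counts releases instead of freeing memory so the effect is observable.

```cpp
#include <utility>

// Hypothetical stand-in for a C object that somebody else owns.
struct Resource {
  int release_count = 0;
};

// Owning wrapper: "releases" the resource when destroyed.
class Owner {
 public:
  explicit Owner(Resource* p) : p_(p) {}
  Owner(Owner&& other) noexcept : p_(other.p_) { other.p_ = nullptr; }
  Owner(const Owner&) = delete;
  Owner& operator=(const Owner&) = delete;
  ~Owner() {
    if (p_ != nullptr) ++p_->release_count;
  }

 protected:
  Resource* p_;
};

// Non-owning view that inherits from the owning type it wraps.
template <typename T>
class Unowned : public T {
 public:
  explicit Unowned(Resource* p) : T(p) {}
  ~Unowned() { this->p_ = nullptr; }  // forget the pointer so ~T() skips the release
};

// Accepts the base type by reference, so an Unowned<Owner> binds to it.
void TakeByReference(Owner& value) {
  Owner stolen(std::move(value));  // slices: the base part is moved out
}  // stolen is destroyed here and releases the borrowed resource

int CountReleasesAfterSlicing() {
  Resource borrowed;
  {
    Unowned<Owner> view(&borrowed);
    TakeByReference(view);
  }
  return borrowed.release_count;
}
```

`CountReleasesAfterSlicing()` returns 1 even though the resource was only ever borrowed. With a real resource, the true owner would release it a second time.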
2022-10-06 14:57:37 -07:00
Ti-Tai Wang
87f55505b3
[ONNX] Support huggingface BART to ONNX (#12779)
Add BART to transformer support, specifically for
`BartForConditionalGeneration`

**Motivation and Context**
- fixes #11210 

Currently, the beam search custom op does not work in nightly, so this PR
should be run with a [custom
commit](10f3d46d92)
2022-10-06 12:20:03 -07:00
Rachel Guo
814e5cfa4c
[rn] Support UINT8 type for onnxruntime-react-native on iOS (#13210)
### Description

As title.


### Motivation and Context

The UINT8 type might be required for some models used in sample
applications. This also matches the data types supported by
onnxruntime-react-native on Android.

Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>
Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
2022-10-06 11:35:25 -07:00
ashari4
b09dd11ece
BFP schemas: Change block dimension type to Int (#13169)
* Change block dimension type to Int from Ints.
* In response to feedback that the block dimension corresponds to the
reduction dimension of the consuming matrix multiplication. There is
always only 1 reduction dimension.
2022-10-06 11:11:43 -07:00
Scott McKay
cf075fcbad
Handle edge case in CumSum causing overflow (#13174)
### Description
Add special case handling for exclusive + reverse where axis has dim
value of 1.
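To make the edge case concrete, here is a 1-D reference sketch of cumulative sum with the `exclusive` and `reverse` options. `CumSum1D` is a hypothetical helper and not the ONNX Runtime kernel. With both options set and a single element, the output is just zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 1-D reference: exclusive leaves the current element out of its
// own sum; reverse accumulates from the last element towards the first.
std::vector<int64_t> CumSum1D(const std::vector<int64_t>& x, bool exclusive,
                              bool reverse) {
  const std::size_t n = x.size();
  std::vector<int64_t> y(n, 0);
  int64_t running = 0;
  for (std::size_t step = 0; step < n; ++step) {
    const std::size_t i = reverse ? n - 1 - step : step;
    assert(i < n);  // the index stays in range even when n == 1
    if (exclusive) {
      y[i] = running;
      running += x[i];
    } else {
      running += x[i];
      y[i] = running;
    }
  }
  return y;
}
```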

### Motivation and Context
See #13165.
2022-10-06 07:18:02 +10:00
Edward Chen
4e37464cc5
Add build configuration to binary size checks pipeline. (#13208)
Add another build configuration to binary size checks pipeline. Enable additional configurations to be added more easily.
2022-10-05 12:39:19 -07:00
Tony Xia
c7522e547a
Fixed a minor typo (#13194)
### Description
binraries ==> binaries



2022-10-05 12:10:14 -07:00
Zhang Lei
dca941795e
Fix prefast bugs: 1944959 1997925 1997926 1997927 1997928 (#13203) 2022-10-05 08:59:40 -07:00
cloudhan
72076b1eb2
Update ROCm CI to use HIP LANGUAGE (#13214)
Update ROCm CI before relanding tunable GEMM (#12853). This PR also updates
composable kernel to use CMake's HIP language support so that we can
mix C/C++ compilers with the HIP compiler instead of being locked to hip-clang.
2022-10-05 16:15:16 +08:00
Ashwini Khade
4fc8f7139a
Bug Fix - C# API order incompatibile with C API (#13191)
### Description
Training C# bindings (ReleaseTrainingSession and ReleaseCheckpointState)
broke after an API order change in Training C API. This PR fixes this
issue.



### Motivation and Context
Bug Fix for Training C# bindings
2022-10-04 09:29:20 -07:00
Justin Chu
595a0c8658
Disable clang-tidy CI (#13207)
Disable clang-tidy CI for now because it is creating a lot of false
positives like in https://github.com/microsoft/onnxruntime/pull/12998
2022-10-04 07:37:49 -07:00
Tianlei Wu
b6c04f48c1
Fix reshape fusion (#13150)
(1) Hotfix for reshape fusion, which made the Stable Diffusion UNet model invalid.
(2) Update remove_cascaded_cast_nodes to make it faster.
2022-10-04 00:26:29 -07:00
Faith Xu
2d50d4be24
Update TSA path to new ADO project (#12902)
Updates TSA item path to new ADO project area paths
2022-10-03 22:54:42 -07:00
Ashwini Khade
c780c4a2b9
Fix two prefast warnings (#13211) 2022-10-03 20:00:57 -07:00
Tony Xia
962fee5fe5
Fix typo enviroment => environment (#13195) 2022-10-03 17:02:26 -07:00
Justin Stoecker
9cf98dacb3
Enable command list reuse for Xbox (#13173)
Removes the workaround introduced in #12063, which disabled DML command
list reuse for Xbox builds.

The ID3D12CommandList created in FusedGraphKernel points to an
ID3D12CommandAllocator that is local to the `BuildReusableCommandList`
function. On PC it would seem the command list is keeping the command
allocator alive, but this is highly suspect logic that definitely
doesn't work on Xbox. I find no documentation indicating this logic
should work (a section on [reference
counting](https://learn.microsoft.com/en-us/windows/win32/direct3d12/recording-command-lists-and-bundles#reference-counting)
makes it clear command lists take no refs on D3D objects passed as args
to its APIs; however, it's unclear if this also applies to its
construction).

A second (small) change is constructing the command list straight into
`ID3D12GraphicsCommandList` and removing an unnecessary QI.
2022-10-03 16:03:29 -07:00
Ryan Hill
81a4efee6c
Prefast Fixes (#12952)
**Description**: Fixes these TSA issues (no actual bugs fixed, but just
changing code to make TSA happy)

To fix 1944982 and 1944973 I changed DeleteOnUnloadPtr to not use 'new'
and to just use placement new to go into a fixed buffer. This required
changing the rocm usage of it also (probably a separate TSA bug on that
one that I don't have)
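The placement-new approach can be sketched as a holder that constructs its object inside a fixed, suitably aligned buffer, so no `operator new` heap allocation is involved. `InPlaceHolder` is a hypothetical illustration and does not reproduce the actual `DeleteOnUnloadPtr` interface.

```cpp
#include <cassert>
#include <new>
#include <utility>

// Hypothetical holder: the object lives in storage_ rather than on the heap.
template <typename T>
class InPlaceHolder {
 public:
  template <typename... Args>
  explicit InPlaceHolder(Args&&... args)
      : ptr_(new (storage_) T(std::forward<Args>(args)...)) {}  // placement new

  // Placement new has no matching delete; destroy the object explicitly.
  ~InPlaceHolder() { ptr_->~T(); }

  InPlaceHolder(const InPlaceHolder&) = delete;
  InPlaceHolder& operator=(const InPlaceHolder&) = delete;

  T& operator*() {
    assert(ptr_ != nullptr);
    return *ptr_;
  }
  T* operator->() { return ptr_; }

 private:
  alignas(T) unsigned char storage_[sizeof(T)];
  T* ptr_;
};
```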

1944982 Ryan Hill [prefast:Warning]: C26426 (in
onnxruntime/core/providers/cuda/tensor/cast_op.cc)
Global initializer calls a non-constexpr function 'operator new' (i.22).

1944973 Ryan Hill [prefast:Warning]: C26426 (in
onnxruntime/core/providers/cuda/cuda_execution_provider_info.cc)
Global initializer calls a non-constexpr function 'operator new' (i.22).

1944929 Ryan Hill [prefast:Warning]: C26436 (in
onnxruntime/core/providers/cuda/cuda_provider_factory.cc)
The type 'struct onnxruntime::ProviderInfo_CUDA_Impl' with a virtual
function needs either public virtual or protected non-virtual destructor
(c.35).
2022-10-03 15:50:44 -07:00
Yulong Wang
82786baed1
[js/web] add 'xnnpack' to EP list (#12723)
**Description**: This PR adds support for "XNNPACK EP" in ORTWeb and
changes the behavior of how ORTWeb deals with "backends", or "EPs" in
API.

**Background**: The term "backend" was introduced in ONNX.js to represent
a TypeScript type which implements a "backend" interface, a concept
similar to but different from ORT's EP (execution provider). There
were 3 backends in ONNX.js: "cpu", "wasm" and "webgl".

When ORT Web was launched, the concept was carried over to help users
integrate smoothly. Technically, when the "wasm" backend is used, users
also need to specify the "EP" in the session options. Considering that it
may be complicated and confusing for users to figure out the difference
between "backend" and "EP", the JS API hides the "backend" concept and
maps names to backends and EPs:
"webgl" (Name) <==> "onnxjsBackend" (Backend)
"wasm" (Name) <==> "wasmBackend" (Backend) <==> "CPU" (EP)

**Details**:
The following changes are applied in this PR:
1. allow multi-registration for backends using the same name. This is
for use scenarios where both "onnxruntime-node" and "onnxruntime-web"
are consumed in a Node.js App ( so "cpu" will be registered twice in
this scenario. )
2. re-assign priority values to backends. I give 100 as base to "cpu"
for node and react_native, and 10 as base to "cpu" in web.
3. add "cpu", "xnnpack" as new names of backends.
4. update onnxruntime wasm exported functions to support EP
registration.
5. update implementations in ort web to handle execution providers in
session options.
6. add '--use_xnnpack' as default build flag for ort-web
2022-10-03 10:38:45 -07:00
Yufeng Li
1342baf1c7
refine QuantConfig (#13155)
Refine the QuantConfig: 1. Remove the default EP config. 2. Pass
QuantConfig to the quantize API directly.
2022-10-03 08:34:49 -07:00
Baiju Meswani
0cf17b1921
Add linux debug training package to nightly pipeline (#13192) 2022-10-01 06:58:43 -07:00
Nat Kershaw (MSFT)
68218935b9
Fix syntax error in labeler.yml (#13193) 2022-09-30 23:10:21 -07:00
Edward Chen
a86b8329d9
Update unsupported ORT format version error message to link to doc on rel-1.13.0 branch. (#13187) 2022-09-30 17:13:52 -07:00
Yulong Wang
054464dce2
fix XNNPACK on WebAssembly SIMD (#13161)
### Description

fix XNNPACK on WebAssembly SIMD.

The flag "-msimd128" needs to be applied to every source file when
compiling WASM SIMD. Currently only some of the source files are compiled
with this flag, so we get inconsistent results for
`sizeof(xnn_f32_minmax_params)` because the type definition includes an
`#ifdef` for `__wasm_simd128__`. The inconsistency causes garbage data
to be written to a stack variable and eventually causes the crash.

XNNPACK libraries are C libraries, so the build flags need to be applied
not only to `CMAKE_CXX_FLAGS` but also to `CMAKE_C_FLAGS`.
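The size mismatch can be illustrated with a struct whose layout depends on a preprocessor macro. `DemoMinMaxParams` and `DEMO_WASM_SIMD128` below are hypothetical; the real `xnn_f32_minmax_params` layout is not reproduced here.

```cpp
#include <cstddef>

// Hypothetical struct whose layout depends on a macro, standing in for a
// type that is guarded by __wasm_simd128__.
struct DemoMinMaxParams {
#ifdef DEMO_WASM_SIMD128
  float min[4];  // SIMD layout: one value per lane
  float max[4];
#else
  float min;  // scalar layout
  float max;
#endif
};

// Every translation unit must see the same value here. If only some of them
// define the macro, they disagree about how many bytes the struct occupies.
constexpr std::size_t kDemoParamsSize = sizeof(DemoMinMaxParams);
```

A file compiled with the macro defined would write 32 bytes into a variable that a file compiled without it reserved only 8 bytes for.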
2022-09-30 16:34:15 -07:00
Nat Kershaw (MSFT)
0bf0991fa2
Update labeler.yml (#13186) 2022-09-30 15:43:33 -07:00
Numfor Tiapo
56387c3c31
Fix SDL Unmatched Annotation Errors (#13162)
Fixes 3 SDL unmatched annotation errors.

Co-authored-by: Numfor Mbiziwo-Tiapo <numform@microsoft.com>
2022-09-30 15:36:30 -07:00
Edward Chen
aae35f2759
Binary size reduction in KernelTypeStrResolver and GraphPartitioner (#13172)
Reduce binary size for minimal Android builds.
- reduce places where Status objects are created in KernelTypeStrResolver::LoadFromOrtFormat()
- remove some unused parameters (in a base minimal build) and code in graph_partitioner.cc
2022-09-30 13:50:39 -07:00
George Nash
b76a65c784
Upgrade the oneDNN ep to use oneDNNv2.7 (#13175)
### Description
This updates the oneDNN library used by oneDNN ep from version 2.6 to
version 2.7



### Motivation and Context
This brings in the many improvements incorporated into the oneDNN
library to the oneDNN execution provider.

Signed-off-by: George Nash <george.nash@intel.com>
2022-09-30 12:29:17 -07:00