Commit graph

231 commits

Author SHA1 Message Date
RandySheriffH
75584c5fa8
Enabling thread pool to be numa-aware (#13778)
The PR enables ort thread pool to be numa-aware, so that threads could
be evenly created and distributed among numa nodes.
In addition, to facilitate performance tuning, the PR opens a new API
allowing customers to attach threads to certain logical processors.
Please check the API
[definition](https://github.com/microsoft/onnxruntime/pull/13778/files#diff-5845a5c76fb64abdc8f0cffe21b37f8da1712674eb3abc4cd87190891be1bd48)
for details.
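
A rough sketch of what distributing threads evenly among NUMA nodes could look like. The node layout and the `distribute_threads` helper are hypothetical, made up for illustration; this is not the ORT thread pool code or its API.

```python
from typing import Dict, List


def distribute_threads(num_threads: int, numa_nodes: Dict[int, List[int]]) -> List[List[int]]:
    """Assign each thread the logical processors of one NUMA node, round-robin.

    numa_nodes maps a (hypothetical) node id to the logical processor ids on it.
    Returns one affinity list per thread.
    """
    if not numa_nodes:
        raise ValueError("at least one NUMA node is required")
    node_ids = sorted(numa_nodes)
    return [list(numa_nodes[node_ids[i % len(node_ids)]]) for i in range(num_threads)]


# Hypothetical machine: two nodes with four logical processors each.
layout = {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}
affinities = distribute_threads(4, layout)
```

With four threads and two nodes, threads 0 and 2 land on node 0 and threads 1 and 3 on node 1.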

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2022-12-12 10:33:55 -08:00
Abhishek Udupa
83c59d2594
Session-aware and thread-safe CUDA profiler (#13706)
### Description
The existing CUDA profiler is neither session-aware, nor thread-safe.
This PR ensures both.

### Motivation and Context
[PR 13549](https://github.com/microsoft/onnxruntime/pull/13549) brought
thread-safety and session-awareness to the ROCm profiler. This PR brings
the same goodness to the CUDA profiler as well.

Sample outputs of a profiling run from the StableDiffusion model (this
model was chosen because it requires orchestration of multiple sessions,
and verifies that the profilers are now indeed session-aware) on both
CUDA and ROCm EPs are attached, along with a script that checks that the
trace files generated by the profile are well-formed.

Update 11/29: Updated the profile outputs. The older profile outputs
exhibited an issue where some timestamps were wildly out of range,
leading to problems visualizing the traces. The bug has been fixed and
the profile outputs have been updated, along with an update to the check
script to ensure that timestamps are monotonically increasing.


[sd_profile_outputs_cuda.tar.gz](https://github.com/microsoft/onnxruntime/files/10118088/sd_profile_outputs_cuda.tar.gz)

[sd_profile_outputs_rocm.tar.gz](https://github.com/microsoft/onnxruntime/files/10118089/sd_profile_outputs_rocm.tar.gz)

[check_profile_output_well_formedness.zip](https://github.com/microsoft/onnxruntime/files/10118090/check_profile_output_well_formedness.zip)
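
A minimal sketch of the kind of timestamp check described above. It is not the attached script. It assumes the trace is a JSON array of event objects that carry a numeric `ts` field, and it treats "monotonic" as "never goes backwards" (equal timestamps are allowed).

```python
import json
from typing import List


def timestamps_are_monotonic(trace_json: str) -> bool:
    """Return True if the 'ts' values of the events never decrease in file order."""
    events = json.loads(trace_json)
    stamps: List[float] = [e["ts"] for e in events if "ts" in e]
    return all(a <= b for a, b in zip(stamps, stamps[1:]))


# Hand-written example traces, not real profiler output.
good = '[{"name": "a", "ts": 10}, {"name": "b", "ts": 12}, {"name": "c", "ts": 12}]'
bad = '[{"name": "a", "ts": 10}, {"name": "b", "ts": 99999}, {"name": "c", "ts": 11}]'
```

Parsing with `json.loads` also doubles as a basic well-formedness check, since a malformed file raises an exception.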

Co-authored-by: Abhishek Udupa <abhishek.udupa@microsoft.com>
2022-12-09 13:22:12 -08:00
Sumit Agarwal
5b16593192
[DML EP] Attention Kernel bug fix (#13879)
### Description
- Use same data type as input for mask_index tensor which is used as DML
GEMM API's C parameter.
- Remove gsl header include as it already gets included transitively.



### Motivation and Context
- Why is this change required? What problem does it solve?
Bug found in internal conformance testing.
- If it fixes an open issue, please link to the issue here.
N/A
2022-12-07 15:24:27 -08:00
Yi Zhang
ae2a9373ab
reenable quant model tests (#13871)
### Description

### Motivation and Context
Test data in the image has been fixed.
2022-12-07 23:33:22 +08:00
Numfor Tiapo
e0dcbc3832
Fix C26436 prefast errors (#13774)
Fixes errors 9196, 9214, 9255, and 9314.

Co-authored-by: Numfor Mbiziwo-Tiapo <numform@microsoft.com>
2022-12-01 09:07:44 -08:00
Numfor Tiapo
aa1390e963
Fix Prefast Errors (#13675)
Fixes all C28204, C6031, and C26814 prefast errors.

Co-authored-by: Numfor Mbiziwo-Tiapo <numform@microsoft.com>
2022-11-28 09:16:22 -08:00
Yi Zhang
a9a9c34d98
Fix WinML Test Case: create LearningModelBinding for every testcase (#13587)
### Description
Fix #13509

### Motivation and Context
The exception was caused by incorrect fetches, which came from the
binding of the previous test case.

efcbdac58e/onnxruntime/core/session/onnxruntime_c_api.cc (L809-L815)
2022-11-09 11:20:48 +08:00
Numfor Tiapo
49e5a11ccd
Fix SDL and Prefast Errors (#13465)
Fixes Errors 1978844, 1978870, 1978850, 1978855, and 9245

Co-authored-by: Numfor Mbiziwo-Tiapo <numform@microsoft.com>
2022-10-28 09:41:18 -07:00
Yi Zhang
e160688a9b
Skip some failed models winml and training workflows on Windows CPU (#13407)
### Description
1. Update the model name structure in model_tests.cpp to include the source name, to avoid
`Condition test_param_names.count(param_name) == 0 failed. Duplicate
parameterized test name 'BERT_Squad_opset10_CPU'`
2. Skip some failed models https://github.com/onnx/models/issues/568


### Motivation and Context
2022-10-25 10:05:04 +08:00
Numfor Tiapo
56387c3c31
Fix SDL Unmatched Annotation Errors (#13162)
Fixes 3 SDL unmatched annotation errors.

Co-authored-by: Numfor Mbiziwo-Tiapo <numform@microsoft.com>
2022-09-30 15:36:30 -07:00
Brian Martin
c20abcab87
User/brianma/eo (#13152)
Fixing SDL issues. One was a SAL mismatch; the other was the handling of an
optional null pointer.
2022-09-30 09:43:56 -07:00
Edward Chen
454f77cd94
Update kernel matching logic: decouple from op schemas and remove kernel def hashes (#12791)
# Motivation
Currently, ORT minimal builds use kernel def hashes to map from nodes to
kernels to execute when loading the model. As the kernel def hashes must
be known ahead of time, this works for statically registered kernels.
This works well for the CPU EP.
For this approach to work, the kernel def hashes must also be known at
ORT format model conversion time, which means the EP with statically
registered kernels must also be enabled then. This is not an issue for
the always-available CPU EP. However, we do not want to require that any
EP which statically registers kernels is always available too.
Consequently, we explore another approach to match nodes to kernels that
does not rely on kernel def hashes. An added benefit of this is the
possibility of moving away from kernel def hashes completely, which
would eliminate the maintenance burden of keeping the hashes stable.

# Approach
In a full build, ORT uses some information from the ONNX op schema to
match a node to a kernel. We want to avoid including the ONNX op schema
in a minimal build to reduce binary size. Essentially, we take the
necessary information from the ONNX op schema and make it available in a
minimal build.
We decouple the ONNX op schema from the kernel matching logic. The
kernel matching logic instead relies on per-op information which can
either be obtained from the ONNX op schema or another source.
This per-op information must be available in a minimal build when there
are no ONNX op schemas. We put it in the ORT format model.
Existing uses of kernel def hashes to look up kernels are replaced
with the updated kernel matching logic. We no longer store
kernel def hashes in the ORT format model’s session state and runtime
optimization representations. We no longer keep the logic to
generate and ensure stability of kernel def hashes.
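
The approach above could be sketched roughly as follows. All names here (`KernelDef`, `Node`, `PerOpInfo`, `find_kernel`) are hypothetical and the version check is simplified; this is an illustration of matching on per-op information instead of hashes, not the ORT implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

OpKey = Tuple[str, str, int]  # (domain, op type, since version)
# Per-op information: type-constraint name -> indices of the inputs it covers.
# It could come from an op schema or be stored alongside the model.
PerOpInfo = Dict[OpKey, Dict[str, List[int]]]


@dataclass
class KernelDef:
    domain: str
    op_type: str
    start_version: int
    end_version: Optional[int]             # None: no upper bound
    type_constraints: Dict[str, Set[str]]  # constraint name -> supported types


@dataclass
class Node:
    domain: str
    op_type: str
    since_version: int
    input_types: List[str]


def find_kernel(node: Node, kernels: List[KernelDef],
                per_op_info: PerOpInfo) -> Optional[KernelDef]:
    """Return the first kernel whose op, version range and types fit the node."""
    constraint_inputs = per_op_info.get((node.domain, node.op_type, node.since_version))
    if constraint_inputs is None:
        return None
    for k in kernels:
        if (k.domain, k.op_type) != (node.domain, node.op_type):
            continue
        if node.since_version < k.start_version:
            continue
        if k.end_version is not None and node.since_version > k.end_version:
            continue
        # Every input covered by a type constraint must have a supported type.
        if all(
            node.input_types[i] in allowed
            for name, allowed in k.type_constraints.items()
            for i in constraint_inputs.get(name, [])
        ):
            return k
    return None


info: PerOpInfo = {("", "Add", 14): {"T": [0, 1]}}
kernels = [
    KernelDef("", "Add", 14, None, {"T": {"float"}}),
    KernelDef("", "Add", 14, None, {"T": {"int64"}}),
]
match = find_kernel(Node("", "Add", 14, ["int64", "int64"]), kernels, info)
```

Here the lookup needs no precomputed hash: the node's op identity, version, and input types are enough to pick the second registration.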
2022-09-20 14:24:59 -07:00
Sumit Agarwal
f78ed1388a Fixed build break: inbox version of WindowsAI repo 2022-09-09 18:25:01 -07:00
Sumit Agarwal
bcdddb47ba Merge remote-tracking branch 'origin/main' into WindowsAI 2022-09-09 17:34:48 -07:00
sumitsays
05c65a54b3
[DML EP] Contrib Op: FusedMatMul (#12898)
* Contrib Op: FusedMatMul for DML EP

* Added relevant comments and extra validation

* Polish

* More polish

* Last polish

* Addressed comment on the PR

* Addressed comment on the PR

* Removed un-necessary comments

* Used c++ standard function

* used std::c++ algorithms function

* Removed unused code

Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com>
Co-authored-by: Dwayne Robinson <fdwr@hotmail.com>
2022-09-09 09:37:38 -07:00
Sheil Kumar
535b0835f2
User/sheilk/dft fixes (#12862)
* DirectML DFT Tests and Fixes

* Dynamically allocate temporaries using the allocator...

* Allocate during compute

* wrong dims

* CR feedback
2022-09-07 13:21:56 -07:00
Yulong Wang
1a402a3f25
replace 'master' branch ref to 'main' for onnx repo (#12678) 2022-08-30 13:41:42 -07:00
Chun-Wei Chen
6246662b1d
[Dup] Fix SAME_UPPER/SAME_LOWER (auto_pad attribute) in ConvTranspose (#12537)
* Fix SAME_UPPER/SAME_LOWER (auto_pad attribute) in ConvTranspose

* Bump ONNX 1.10.2 globally

* load ONNX_VERSION from VERSION_NUMBER

* /

* revert deprecate warning in ORT 1.12

* add a comment about why removing cntk_simple_seg

* correct the implementation in DML as well
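
For reference, a small sketch of how the padding split for SAME_UPPER and SAME_LOWER works for one spatial axis, following the ConvTranspose formulas in the ONNX operator documentation as I understand them. The function name is made up, and the sketch assumes the total padding is non-negative.

```python
from typing import Tuple


def conv_transpose_same_pads(input_size: int, stride: int, kernel: int,
                             dilation: int = 1, output_padding: int = 0,
                             auto_pad: str = "SAME_UPPER") -> Tuple[int, int]:
    """Return (pad_begin, pad_end) for one axis of ConvTranspose with auto_pad."""
    # With SAME_* the output size is input_size * stride.
    output_size = input_size * stride
    total = (stride * (input_size - 1) + output_padding
             + ((kernel - 1) * dilation + 1) - output_size)
    if total < 0:
        raise ValueError("this sketch only handles non-negative total padding")
    half = total // 2
    if auto_pad == "SAME_UPPER":
        return half, total - half      # extra padding goes at the end
    if auto_pad == "SAME_LOWER":
        return total - half, half      # extra padding goes at the beginning
    raise ValueError(f"unsupported auto_pad: {auto_pad}")


upper = conv_transpose_same_pads(4, stride=2, kernel=3, auto_pad="SAME_UPPER")
lower = conv_transpose_same_pads(4, stride=2, kernel=3, auto_pad="SAME_LOWER")
```

The two modes only differ when the total padding is odd, which is the case this example exercises.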
2022-08-22 15:35:34 -07:00
Edward Chen
3efd9a73bb
Refactor InferenceSession Load member functions. (#12430)
Fix comparison of path characters when checking for ".ort" suffix.

Some clean up of InferenceSession Load functions.
- Reduce duplication between std::string/std::wstring versions.
- Renaming for clarity.
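
As an illustration only, a suffix check of this kind might look like the sketch below. It assumes the comparison is meant to be case-insensitive; `has_ort_suffix` is a hypothetical helper, not the InferenceSession code.

```python
import os


def has_ort_suffix(path: str) -> bool:
    """Return True if the file extension is '.ort', ignoring case."""
    _, ext = os.path.splitext(path)
    return ext.lower() == ".ort"
```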
2022-08-03 16:28:26 -07:00
Sheil Kumar
7d712c8f8b
Fix WinML Tests are still targeting deprecated (deleted) experimental signal op definitions (#12006)
* fix winml tests

* remove legacy test

* switch idft -> dft+inverse attr

* upgrade opset 13->17 for signal ops tests
2022-06-27 16:35:50 -07:00
Dmitri Smirnov
267a424e52
Retry Rework execution frame to reduce memory allocations (#11897)
* Revert "Revert "Refactor ExecutionFrame and SessionState to reduce memory all… (#11888)"

This reverts commit d2cbae3a04.

* Revert prepacked_weights to avoid indirect inclusion in CUDA and TRT code that breaks the build.
2022-06-20 10:29:43 -07:00
Yi Zhang
d2cbae3a04
Revert "Refactor ExecutionFrame and SessionState to reduce memory all… (#11888)
Revert "Refactor ExecutionFrame and SessionState to reduce memory allocations and improve data locality (#11804)"

This reverts commit 2ecba6fd25.
2022-06-17 17:07:21 +08:00
Dmitri Smirnov
2ecba6fd25
Refactor ExecutionFrame and SessionState to reduce memory allocations and improve data locality (#11804)
Refactor ExecutionFrame and SessionState for better data locality and fewer memory allocations.
2022-06-16 16:50:48 -07:00
Gary Miguel
e8b0d24071
Support per-test tolerances for ONNX tests (#11775)
Prior to this every test shared the same tolerances. This meant
that if an ONNX test failed due to a small but acceptable difference in
output, the only alternative was to disable the test entirely.

In op set 17, the DFT operator is being added. Without this change, the
tests for that operator fail because the output is off by about 5e-5.
It's better to keep test coverage for this new op rather than disable
the test entirely.

Also prior to this change, the global tolerances were not shared between
C++, JavaScript, and Python tests. Now they are.

Also fix various minor issues raised by linters.

Unblocks https://github.com/microsoft/onnxruntime/issues/11640.
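
A minimal sketch of the idea of per-test tolerances with a shared default. The numeric values, the test names, and the helper names are hypothetical; the actual tolerance values and file format used by the tests are not shown in this message.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Tolerance:
    absolute: float
    relative: float


# Hypothetical values for illustration only.
DEFAULT = Tolerance(absolute=1e-5, relative=1e-5)
OVERRIDES: Dict[str, Tolerance] = {
    "test_dft": Tolerance(absolute=1e-4, relative=1e-5),
}


def tolerance_for(test_name: str) -> Tolerance:
    """Use the per-test override if one exists, else the shared default."""
    return OVERRIDES.get(test_name, DEFAULT)


def close_enough(expected: float, actual: float, tol: Tolerance) -> bool:
    """Combined absolute/relative comparison."""
    return abs(expected - actual) <= tol.absolute + tol.relative * abs(expected)
```

With these made-up numbers, an output that is off by about 5e-5 fails under the default but passes for the test that has a looser override, so the test can stay enabled.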
2022-06-14 15:12:23 -07:00
Sheil Kumar
22739137c4
Update signal op defs to match onnx17 defs, and add more tests (#11631) 2022-05-28 16:00:09 -07:00
Sheil Kumar
6255194659
All LearningModelSessions created from a common LearningModelDevice should share the same thread pool (#11457)
* Share thread pools between devices

* make tests reuse device

* Change cpu thread pool options for dml sessions to use 1 thread with no spinning

* fix test failure

* Update missing type constraints for dft

* Add comment and rename inference session parameter

* default missing causing inconsistent test behavior

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2022-05-13 11:12:43 -07:00
Sheil Kumar
85fa168dc1
Add optional dft_length input to the DFT and IDFT operators. (#11427)
* Add optional dft_length input.

* CR Feedback

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2022-05-03 16:17:43 -07:00
Sheil Kumar
027565b3b2
Add multi-dim dft test, and fix complex idft (#10947)
* fix complex multi-dim dft

* Add multi-dim dft test, and fix complex idft

* remove incorrect inplace specification

* Add DFT tests

* update epsilon to 1000ths place

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2022-03-22 10:08:12 -07:00
Sheil Kumar
810c18e809
fix complex multi-dim dft (#10896)
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2022-03-17 12:45:51 -07:00
Sheil Kumar
860f28254e
Update DFT definition to more closely align with PyTorch by enabling axis attribute, and arbitrary tensor rank. (#10842)
* Add axis attribute

* fix breaks

* Enable axis-specified DFT

* remove static cast

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2022-03-15 15:27:12 -07:00
Jingqiao Fu
f4fd67cc2c
Revert "add load from buffer (#10162)" (#10590)
This reverts commit 5cd57bb726.
2022-03-08 13:35:23 -08:00
Numfor Tiapo
9ad95bf068
Skip SetName test on inbox build (#10699) 2022-03-02 10:28:58 -08:00
Numfor Tiapo
5fbfca3d58
Add Experimental API for setting model name (#10518)
* Add experimental API for editing model name

* Change EditModelName to 'SetName'

* Change API to pass c_string

* Update SetName to edit the proto

* Test that the model proto gets changed

* Remove comments

* Skip inbox tests

* Use filehelper path

Co-authored-by: Numfor Mbiziwo-Tiapo <numform@microsoft.com>
2022-02-25 14:23:49 -08:00
Dwayne Robinson
ea7f773a6e
Merge pull request #10619 from microsoft/user/dwayner/DmlDev20220221
Update DirectML EP for ORT 1.11
2022-02-23 01:09:26 -08:00
Dwayne Robinson
6db6ee5710 Merged PR 6973543: ORT DML EP Opset 13 more complete
Extend opset 13 support for:
- Split-13
- Squeeze-13
- Unsqueeze-13
- Reshape-13
- QuantizeLinear-13
- DequantizeLinear-13
- ReduceSum-13
- Resize-13

Also:
- Rename the file where all the opset versions are stored from "OperatorRegistration.h" to "OperatorVersions.h", which will make it much less confusing in the future, given there's another file called "OperatorRegistration.h" that corresponds to "OperatorRegistration.cpp".
- Detemplatize many of the OperatorHelper.h constructors, which duplicate multiple instantiations due to the operator helper classes not sharing a common base class, by wrapping them with an adapter. Ideally there would be a common COM base interface that both IMLOperatorKernelCreationContext and IMLOperatorShapeInferenceContext implementation objects would implement, which a wrapper in MLOperatorAuthorHelper.h could QI for.
- Fix style formatting issues in OperatorHelper.h (sorry for the noise).

```
Summary: Total=4679, Passed=4355, Failed=0, Blocked=0, Not Run=0, Skipped=324
```

Corresponding WindowsAI PR:
https://microsoft.visualstudio.com/WindowsAI/_git/WindowsAI/pullrequest/6973645

Related work items: #36672908, #36672926
2022-02-18 01:41:07 +00:00
Jingqiao Fu
2fa333443a
Add telemetry for device kind (#10431)
Add telemetry for device kind
2022-02-17 13:56:22 -08:00
Dwayne Robinson
6fd7ba5b7e Merged PR 6917440: ONNX Runtime update from GitHub master
Just RI.

Related work items: #38034064
2022-02-04 10:13:38 +00:00
Sheil Kumar
2dd5e75ba8
Incorrect output after GPU to GPU inference via VideoFrame and Gray8 models (#10425)
* If the tensor is of gray8 format, we should call the gray8 shader

* other check (which resolves to unknown in this case) is incorrectly being compared to constant and not DXGI_FORMAT

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2022-01-28 08:45:57 -08:00
Ryan Lai
c07e251cec Merged PR 6835169: RI 12/9/21 - 01/12/22
Build is green https://microsoft.visualstudio.com/WindowsAI/_build/results?buildId=43713985&view=results

![image.png](https://microsoft.visualstudio.com/274e76ac-6b29-4f77-a85d-7914c77cabd5/_apis/git/repositories/853d2ddc-663c-4fe8-8036-dbf0d50db2d9/pullRequests/6835169/attachments/image.png)

Related work items: #37712737
2022-01-13 00:25:51 +00:00
Jingqiao Fu
5cd57bb726
add load from buffer (#10162)
* Add LoadFromBuffer API
2022-01-10 10:51:48 -08:00
Dwayne Robinson
0f5e82c294
DirectML EP remove stale code for int64 via int32 double strides (#9959) 2022-01-10 02:07:22 -08:00
Dwayne Robinson
4ff78aae45
Merge pull request #9917 from microsoft/user/dwayner/FnsCandyTolerance30696168
Update WinML model tests for FNS candy and Inception float16
2021-12-02 22:45:45 -08:00
Sheil Kumar
5edaa75ef6
Fix LoadFromStream to not use wss::Buffer internally (#9918)
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2021-12-02 21:29:06 -08:00
Dwayne Robinson
6e4c534ce2 Relax tolerance slightly more for Intel after autopilot run 2021-12-02 19:42:31 -08:00
Dwayne Robinson
77e67a6de7 Add one more example line 2021-12-02 13:34:01 -08:00
Dwayne Robinson
ef7671b938 Comment out old lines 2021-12-02 13:30:34 -08:00
Dwayne Robinson
7a3abd863f Update WinML model test tolerances for tiny_yolov2 and FNS_Candy 2021-12-02 00:48:54 -08:00
Ryan Lai
d8a7e1d159 Merged PR 6718335: RI 11/30 from github
Pipeline green https://microsoft.visualstudio.com/WindowsAI/_build/results?buildId=42142807&view=results

![image.png](https://microsoft.visualstudio.com/274e76ac-6b29-4f77-a85d-7914c77cabd5/_apis/git/repositories/853d2ddc-663c-4fe8-8036-dbf0d50db2d9/pullRequests/6718335/attachments/image.png)

Related work items: #37220320
2021-11-30 21:29:25 +00:00
Sheil Kumar
53c43e9949
WinML RT API: Add PixelRange Metadata to Bind() call PropertySet (#9827)
* Enable Normalization Binding Metadata

* copy paste error

* Small fix.

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2021-11-24 13:44:25 -08:00
nums11
533b20c6ca Merge remote-tracking branch 'upstream/master' into dmldev_temp 2021-11-18 14:21:34 -08:00