Commit graph

7386 commits

Author SHA1 Message Date
cloudhan
0ddf4efbd9
Make PythonOp report dtype mismatch by name, instead of by using enum index (#13007) 2022-09-20 12:29:30 +08:00
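A minimal sketch of the idea behind #13007, not the actual PythonOp code: the `ElemType` enum and the `check_dtype` helper are hypothetical names introduced here for illustration. The error message names both dtypes rather than printing their enum indices.

```python
from enum import IntEnum


class ElemType(IntEnum):
    # Hypothetical subset of tensor element types; values are illustrative only.
    FLOAT = 1
    INT64 = 7
    FLOAT16 = 10


def check_dtype(expected: ElemType, actual: ElemType) -> None:
    """Raise a TypeError that names both dtypes instead of their enum indices."""
    if expected != actual:
        raise TypeError(
            f"dtype mismatch: expected {expected.name}, got {actual.name}"
        )
```

With this shape of message, a reader sees `expected FLOAT, got INT64` and does not have to look up what index 7 means.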
Chen Fu
77b567df66
test qdq loss presence (#12928)
**Description**: Change qdq debugger test oracle

Instead of testing against a threshold, which occasionally fails, we just test
that the loss value is present.
2022-09-19 15:58:27 -07:00
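The oracle change described in #12928 can be sketched roughly as follows. The `report` dictionary, the `"loss"` key, and the `assert_loss_present` helper are assumptions made for illustration; the real qdq debugger test may structure its output differently.

```python
def assert_loss_present(report: dict, key: str = "loss") -> float:
    """Check that a numeric loss value exists, without comparing it to a threshold."""
    if key not in report:
        raise AssertionError(f"'{key}' is missing from the report")
    loss = report[key]
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(loss, bool) or not isinstance(loss, (int, float)):
        raise AssertionError(f"'{key}' is not a number: {loss!r}")
    return float(loss)
```

A presence check like this cannot fail because of small numeric fluctuations, which is the flakiness the commit message mentions.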
Prathik Rao
3cd2d4a7a1
Merge pull request #13013 from microsoft/prathikrao/setuptools-version-bug-fix
downgrade setuptools
2022-09-19 15:50:48 -07:00
Prathik Rao
8ea742b507 downgrade setuptools 2022-09-19 12:39:35 -07:00
Justin Chu
14eb3cf485
Ignore settings.json in git (#12988)
**Description**: Remove the `settings.json` line in gitignore.

**Motivation and Context**

Having `settings.json` tracked in git has created annoying diffs when it
is modified locally. This PR removes the entry in gitignore but
maintains the `settings.json` in the repo so that we have a good
default.
2022-09-19 12:05:43 -07:00
cloudhan
14365b67a0
Fix hipify due to CUDA EP tensorrt_fused_multihead_attention optimization (#12990)
A recent change in the CUDA EP (#12814) makes hipify extremely slow and breaks the build. This PR fixes it by c

onnxruntime/contrib_ops/rocm/bert/attention.h is checked out from the version before #12814 and hipified manually.
amd_hipify.py is slightly extended to allow wildcard file matching, and all `tensorrt_fused_multihead_attention/*` files are excluded from hipify.
2022-09-19 15:29:23 +08:00
Changming Sun
e02bea2e3f
Fix some warnings (#12918) 2022-09-18 10:55:33 -07:00
Baiju Meswani
4ed5a5b2a8
Disable local versions based on environment variable (#12997) 2022-09-16 22:51:18 -07:00
Yufeng Li
b48f71fcfc
fix bug: quantization shape inference (#12983)
The model path passed to onnx.shape_inference.infer_shapes_path and the external
data need to be under the same directory, as documented here:
f4dea9e68b/docs/PythonAPIOverview.md (shape-inference-a-large-onnx-model-2gb)
2022-09-16 10:17:22 -07:00
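The directory constraint from #12983 can be checked before running shape inference. This is a sketch under stated assumptions: `external_data_colocated` and `infer_shapes_for_large_model` are hypothetical helpers, not part of onnx or onnxruntime, and the list of external data files is assumed to be known to the caller. Only `onnx.shape_inference.infer_shapes_path` is the real API named in the commit.

```python
from pathlib import Path


def external_data_colocated(model_path, external_data_paths):
    """Return True if every external data file sits in the model's own directory."""
    model_dir = Path(model_path).resolve().parent
    return all(Path(p).resolve().parent == model_dir for p in external_data_paths)


def infer_shapes_for_large_model(model_path, output_path, external_data_paths):
    """Run path-based shape inference only when the layout requirement holds."""
    if not external_data_colocated(model_path, external_data_paths):
        raise ValueError("model and external data must be in the same directory")
    # Imported lazily so the layout check can be used without onnx installed.
    from onnx import shape_inference

    shape_inference.infer_shapes_path(str(model_path), str(output_path))
```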
Wei-Sheng Chin
1a684152cc
Fix C6011: dereferencing NULL pointer with_data (and external_data) (#12982)
As title. For pattern like
```cpp
foo(*ptr)
```

we change them to
```cpp
if (ptr)
  foo(*ptr)
else
  throw
```
2022-09-16 09:49:36 -07:00
Wei-Sheng Chin
12aab3c01d
Fix TSA warnings (#12950)
Fix two warnings:
1. Warning: Avoid calling new and delete explicitly, use
std::make_unique<T> instead (r.11).
   Fix: new is replaced by creating a unique_ptr and calling unique_ptr.release;
delete is replaced by unique_ptr.reset and unique_ptr's destructor.
2. Warning: Buffer overrun while writing to 'cpu_buffers_info->buffers':
the writable size is 'buffers.public: unsigned __int64 __cdecl
std::vector<void *,class std::allocator<void*> >::size(void)const ()
* 8' bytes, but '16' bytes might be written.
   Fix: Replace void* with cudaStream_t and void** with
std::vector<cudaStream_t>.
2022-09-16 09:43:48 -07:00
Adam Louly
268bfe2a5d
python training api bindings (#12610)
**Description**: **Python API bindings for on-device training.**
**Motivation and Context**
- This PR contains API bindings so Python users can perform a whole
training loop.

Co-authored-by: Adam Louly <adamlouly@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
Co-authored-by: Baiju Meswani <bmeswani@microsoft.com>
2022-09-16 09:38:24 -07:00
Alexey Gladyshev
2b5b11d373
[C#][TVM EP] Fix issues related to using TVM EP in C# front-end (#12958)
Changes in this PR:
* Update building of Nuget package for TVM EP
* Update documentation for using TVM EP in C#
2022-09-16 16:04:59 +02:00
Jake Mathern
85546255c4
make nhwc transformer only apply to CPU ep. (#11882)
QLinearConv does not work with DML EP because this optimizer, intended for the CPU EP, is wrongly applied to it.

Limit NHWC optimizer to nodes assigned to the CPU EP
2022-09-16 18:46:28 +10:00
sumitsays
ab45ac311f
Merge pull request #12980 from microsoft/WindowsAI
[DML EP] Merge ORT/WindowsAI to ORT/main
2022-09-15 22:24:14 -07:00
Pranav Sharma
b935524e22
Revert reverse setup of allocators + create/register allocator in CPU EP only when needed. (#12954)
* Revert reverse setup of allocators + create/register allocator in CPU EP only when needed.
2022-09-15 17:54:32 -07:00
Faith Xu
94d9e9ad6d
[Issue labeler] Separate out C# api as separate label (#12951)
Separate out C# api as separate label
2022-09-15 17:36:57 -07:00
sumitsays
709254949a
DML EP temporarily fall back to CPU for LayerNorm when Bias is not present (#12987)
* Temporarily fall back to CPU for LayerNorm

* Build fix

* Typo

* TYPO

Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com>
2022-09-15 16:13:18 -07:00
Ye Wang
3c427a8946
Fix an arithmetic overflow warning (#12961) 2022-09-15 15:53:57 -07:00
Tang, Cheng
739b5675c8
remove legacy compile api (#12932)
Co-authored-by: Cheng Tang <chenta@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2022-09-15 13:18:40 -07:00
Changming Sun
203f63c224
Publish WinML Nuget package to ORT-Nightly ADO feed (#12904) 2022-09-15 12:10:27 -07:00
Sumit Agarwal
9f6646f11d Merge branch 'master' into WindowsAI 2022-09-15 10:55:08 -07:00
sumitsays
363c695dad
Update DML 1.9.0 to 1.9.1 (#12966)
Update DML to 1.9.1

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>
2022-09-15 10:54:22 -07:00
Yi Zhang
08af88e3e2
Assign generate document job to CPU pool. (#12973) 2022-09-15 10:42:12 -07:00
PeixuanZuo
647e09cc39
[FIX] skip layer norm for ROCm EP (#12803)
* [FIX] fix skiplayernorm
2022-09-15 09:07:37 -07:00
cloudhan
d2aa2109c0
Make TunableOp follow stream semantics (#12856) 2022-09-15 21:11:27 +08:00
Cheng
248f72e972
fix VC++ Static Code Analysis warnings (#12940)
* fix VC++ Static Code Analysis warnings

* fix warning
2022-09-15 16:33:13 +08:00
cloudhan
10f9a69707
Use CMake EXCLUDE_FROM_ALL for composable kernels to avoid building of conv related kernels (#12855) 2022-09-14 22:11:31 -07:00
Chun-Wei Chen
d819b56fba
Consume ONNX 1.12.1 to prevent vulnerability issue while loading external file (#12915)
* consume ONNX 1.12.1 to prevent vulnerability issue while loading external tensors

* update ONNX 1.12.1

* test updated PR

* use official rel-1.12.1 commit
2022-09-14 21:10:24 -07:00
PeixuanZuo
3f456a1847
[Update] update rocm5.2.3 (#12942)
* [Update] update rocm5.2.3

* [Update] use rocm docker image as base
2022-09-15 10:41:49 +08:00
Cassie Breviu
5099dda42f
Lint updates csharp docs (#12962)
* fix lint issues on docfx.vendor.js file

* fix ci

* remove submodule

* fix ci

* Update var name to AcceptedList

* remove test branch from ci
2022-09-14 17:56:41 -05:00
Dmitri Smirnov
bc2df1bf95
Remove previously deprecated API (#12935)
Remove previously deprecated API
Format JS code, address review comments
NPM Formatting
2022-09-14 10:58:03 -07:00
Yi Zhang
1ef1029163
Skip 2 tests in windows gpu workflow (#12956) 2022-09-14 09:43:38 -07:00
cloudhan
b8e34fbd91
Split topk implementation into per-type translation units to speed up compilation (#12861) 2022-09-14 19:36:54 +08:00
Vincent Wang
da07c83948
SoftmaxCrossEntropyLossInternalGrad and Sum Fusion (#12746)
* fuse scegrad and sum

* add yield output shapes to value_info

* resolve comments

* fix merge main
2022-09-14 14:45:51 +08:00
Dwayne Robinson
568950e28c
Warn on node EP silent fallback from preferred provider (#10831)
* Warn on node EP fallback from preferred provider
* Clarify with comment
* Update to ORT's weird coding style for ragged parameter wrap
* Android build error: unused parameter ‘providers’
* Update logic to be more robust
* Updates from Pranav's feedback about messaging to rerun with verbose and respecting explicit vs implicit EP addition. Also merge from main.
* brace style patch up
* Update with feedback from Pranav and Scott McKay
* Restore node_placement_set after realizing it only applies when is_verbose is true
* Fix build warning on Android
* Renamed to node_placement_provider_set per Pranav's suggestion
2022-09-13 15:53:17 -07:00
Yulong Wang
78bc53f91d
fix prefast:Warning C26814 in non_max_suppression.cc (#12934) 2022-09-13 15:22:55 -07:00
Changming Sun
bb98922cc8
Delete nuphar docker file (#12944) 2022-09-13 15:22:07 -07:00
Tianlei Wu
95c4fc6877
[CUDA] Add TensorRT fused attention fp16 v2 kernels (#12814)
* Add TensorRT fused attention fp16 kernels
* drop sm 72; seq 512 for sm75; and head_size 32 kernels
* Add env variable ORT_DISABLE_FUSED_ATTENTION
* exclude files in hipify
* update AttentionPastState_dynamic test threshold
* fix --use_mask_index in benchmark
2022-09-13 15:16:12 -07:00
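The environment variable added in #12814 could be set from Python along these lines. The variable name comes from the commit message; that the value `"1"` means "disabled" and that it must be set before the inference session is created are assumptions, and `disable_fused_attention` is a hypothetical helper.

```python
import os


def disable_fused_attention(env=None):
    """Set ORT_DISABLE_FUSED_ATTENTION in the given mapping (default: os.environ)."""
    env = os.environ if env is None else env
    # Assumption: "1" turns the fused attention kernels off.
    env["ORT_DISABLE_FUSED_ATTENTION"] = "1"
    return env
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to test without touching the process environment.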
Scott McKay
1016c33519
Fix prefast warning in upsample.cc. (#12938)
* Fix prefast warning.
* Fix some other static analysis warnings.
2022-09-14 08:14:33 +10:00
Changming Sun
626d94aa23
Refactor python packaging pipeline and nuget packaging pipeline (#12945)
1. Move the Linux ARM64 part of python packaging pipeline to a real ARM64 machine pool
2. Refactor the Linux CPU build jobs of python packaging pipeline into two parts: build and test. The test part will be exempted from Cyber EO compliance requirements as it won't affect the final bits we publish. This refactoring is to reduce dependencies in the build part. For example, this PR removes PyTorch from the build dependencies.
3. Combine DML nuget packaging pipeline with "Zip-Nuget-Java-Nodejs Packaging Pipeline" as they all produce ORT nuget packages. Also, publish DML nuget packages and ORT GPU nuget packages to https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/ORT-Nightly feed.
2022-09-13 14:50:31 -07:00
Hariharan Seshadri
9edc9465f0
Fix some prefast warnings (#12936) 2022-09-13 13:04:37 -07:00
RandySheriffH
64466c2d62
Remove nuphar provider folder (#12939) 2022-09-13 09:10:52 -07:00
madurais
28e27ee7f7
Changes for AIX compilation to get CPU of running thread. hz is inter… (#12744)
* Changes for AIX compilation to get CPU of running thread. hz is an internal variable in AIX, so it is renamed to hz1 in window_functions.cc so that it works on all OSes

Co-authored-by: madurais <root@telesto10.in.ibm.com>
Co-authored-by: tvkai <vamshikrishna@in.ibm.com>
2022-09-13 11:06:35 +10:00
Edward Chen
31a1403e06
Add --output_dir option to convert_onnx_models_to_ort.py. (#12844)
Add --output_dir option to convert_onnx_models_to_ort.py.
Allows one to optionally specify an output directory for the converted model files.
2022-09-12 15:36:03 -07:00
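How an optional output directory might be resolved can be sketched as below. This is not the code of convert_onnx_models_to_ort.py: `build_parser`, `converted_path`, the positional `model_path` argument, and the `.ort` suffix handling are illustrative assumptions; only the `--output_dir` option name is taken from #12844.

```python
import argparse
from pathlib import Path
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for a model conversion script's argument parser."""
    parser = argparse.ArgumentParser(description="Convert a model file.")
    parser.add_argument("model_path", type=Path, help="Path to the input model.")
    parser.add_argument(
        "--output_dir",
        type=Path,
        default=None,
        help="Optional directory for converted files; defaults to the input's directory.",
    )
    return parser


def converted_path(model_path: Path, output_dir: Optional[Path]) -> Path:
    """Place the converted file in output_dir if given, else next to the input."""
    target_dir = output_dir if output_dir is not None else model_path.parent
    return target_dir / model_path.with_suffix(".ort").name
```

Defaulting to the input's directory keeps the behavior unchanged for callers who do not pass the new option.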
Joseph Groenenboom
a433f22f17
Softmax interface update (#12469)
* Template datatype for SoftmaxWithRawMaskSmallKernel in ROCm EP

* Remove valid_items usage from SoftmaxWithRawMaskSmallKernel for ROCm EP

The kernel already masks off invalid items and this gives a much
faster implementation in hipCUB.

* Update accumulator type in ROCm EP for SoftmaxWithRawMaskSmallKernel

Hard code accumulator to fp32 for hipCUB in indicated kernel.

* Reset casting to old behavior

* Document steps to optimize SoftMax kernel on ROCm EP

Usage of the hipCUB valid_items interface on reduction operations
has a significant performance impact. Mask all thread data to
avoid the need to use the hipCUB valid_items interface.
2022-09-12 13:02:31 -07:00
Tianlei Wu
30ebc9e00a
Useless Cast removal after converting model from float32 to float16 (#12871) 2022-09-12 11:07:33 -07:00
Yi Zhang
d8636c2be8
Add enable_onnx_tests in windows nuget test step (#12926) 2022-09-12 10:08:24 -07:00
Sumit Agarwal
f78ed1388a Fixed build break: inbox version of WindowsAI repo 2022-09-09 18:25:01 -07:00
Sumit Agarwal
bcdddb47ba Merge remote-tracking branch 'origin/main' into WindowsAI 2022-09-09 17:34:48 -07:00