Commit graph

7262 commits

Author SHA1 Message Date
Cassie Breviu
3e57cd88fc
Csharp docfx update (#12755)
* update dest to csharp folder, update ci to remove unused files, update git ignore

* add test branch to ci
2022-08-29 08:13:45 -05:00
Baiju Meswani
80c8d934b8
Add debug option to packaging pipeline (#12685) 2022-08-26 20:25:52 -07:00
mwootton
817dc94345
Add first pass of rocm kernel profiler (#10911)
* Add first pass of rocm kernel profiler

* Clean up rocm_profiler. Format args. Demangle kernel names.
Add Api EventRecords

* Remove debug output

* Temporarily disable profiling unit test 'api record check' for cupti

* Fix compile error for non-gpu builds

* Use common file for demangle and pid/tid.  Namespace ThreadUtil.  Fix gpu buffer clearing.

* Merge demangle into profiler_common

* Merge demangle into profiler_common part 2

* Style cleanup

* Resolve linking issues via ProviderHost interface

* Demangle cuda kernel names

* Clean up comments

* Fix formatting

* Fix anal retentive formatting
2022-08-26 19:38:03 -07:00
Adam Louly
ee543a47f6
upgrade cuda version on ci pipelines (training CI pipelines) (#12708)
* upgrade cuda version on ci pipelines

* keeping folder name same

* keeping folder name same

* setting manual seed for primitive test case

* resolving comments

* changing atol and rtol only for test case

Co-authored-by: Adam Louly <adamlouly@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2022-08-26 16:51:19 -07:00
edgchen1
64e8806148 Address some static analysis warnings. 2022-08-26 15:05:53 -07:00
edgchen1
c270ea148a Move 'using common::Status;' from common.h to status.h. 2022-08-26 15:05:53 -07:00
Dmitri Smirnov
3ff75fa05f
Address static analysis warnings (#12711)
Address static analysis warnings
2022-08-26 14:24:14 -07:00
Baiju Meswani
34d90dd5bd
mac-objc-static-analysis-ci-pipeline increase timeout (#12737) 2022-08-26 12:49:49 -07:00
Chi Lo
c9fd193ef6
Make TRT EP fully support control flow op and its subgraphs (#12692)
* sync graph proto in node's attributes

* Don't fuse nodes of control flow op until later in control flow op level

* remove unnecessary ep functions

* remove unnecessary ep functions

* remove unnecessary ep functions

* missing 'override' keyword which makes MacOS/Web CI fail

* Add one more test run for Test3LayerNestedSubgraph with disabling graph optimization

* Update the comments to better understand the 4 cases
2022-08-26 12:45:47 -07:00
Yi-Hong Lyu
a972db06bf
Disable SYMMQGEMM benchmark for CPU other than ARM (#12739)
Also, use MlasSymmQgemmPackBSize instead of MlasGemmPackBSize
2022-08-26 01:47:21 -07:00
cloudhan
5bdb1d4146
Add Tunable GEMM composed from rocblas and composable kernels (#12599)
* Add tunable gemm
2022-08-26 14:32:56 +08:00
cloudhan
46c074a6c8
Update composable kernel and enable experimental inter wave scheduling (#12626)
Update ck to latest master and enable interwave scheduling
2022-08-25 22:19:41 -07:00
Adam Louly
3bb5fb0f90
moving training pipelines from cuda 11.5 to 11.6 and deprecating 11.3 (packaging pipeline) (#12688)
* moving training pipelines from cuda 11.5 to 11.6 and deprecating cuda 11.3

* change to cuda 11.6.2

* change pytorch's & torchvision's cuda version to 11.6

* specify deps version to 11.6.2

* update pytorch and torch text version

* torch 1.12.1

* change torchvision and torchtext version to be compatible with torch 1.12.1

* change cuda to 11.6 for cuda_home compatibility

Co-authored-by: Adam Louly <adamlouly@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2022-08-25 22:12:01 -07:00
cloudhan
f76b40aa5b
Change TunableOp to use a type erased interface (#12597)
* Change to type erased interface, so that there is no need to implement a class for a simple kernel launch function
2022-08-25 19:46:04 -07:00
Cheng
baf141a084
Enable xnnpack EP in Android AAR package (#12720)
* take new features to export symbols

* comments to explain why
2022-08-26 10:29:23 +08:00
Scott McKay
8483b9c6e3
MacOS pipeline and MAUI CoreML fixes (#12724)
* Add asm statement to model.mm to force linker to link against CoreML.Framework.

Update targets.xml as per Rolf's suggestions

* Remove explicit numpy version from macos build. We don't specify it for other CIs and the version specified doesn't have a pre-built 3.10 wheel. This leads to the CI attempting to build numpy which fails.
2022-08-26 08:51:37 +10:00
abhi-ort
ebff15d743
Pinning manual seed (#12714) 2022-08-25 10:09:02 -07:00
Cassie Breviu
e85dce8cea
Add csharp docfx (#12596)
* add docfx and gh action to build docs

* kick off build from feature branch

* Fix LGTM linting

* update az pipeline to win22 & remove nuget install

* remove azure ci changes

* fix implicit using to support 5.0

* fix more js issues

* remove resource designer changes

* remove space

* fix linting misspellings in autogenerated js temp

* fix misspellings in generated code

* delete log file
2022-08-25 09:51:32 -05:00
Vincent Wang
5104c7dbd3
Fix Prefast Warnings (#12717)
fix prefast warnings
2022-08-25 17:09:37 +08:00
Yulong Wang
5be3e87c71
[js] upgrade minimist@1.2.6 (#12689) 2022-08-25 01:40:42 -07:00
Hariharan Seshadri
cde504ebbf
Fix/Suppress some VC static analyzer warnings (#12713) 2022-08-24 23:39:40 -07:00
Yi Zhang
dee2fdffb0
Remove debug build/test in Mac CPU training (#12698)
* run mac training in parallel

* update jobname

* remove debug build/test
2022-08-25 13:38:53 +08:00
Yi Zhang
d91f017da1
remove redundant publish unit test results (#12697)
rm redundant publish unit test results
2022-08-25 11:18:07 +08:00
Cheng
eba4f77d00
enable xnnpack in default_full_aar_build_settings (#12682) 2022-08-25 10:41:06 +08:00
Pranav Sharma
f1528ea50f
Fix arithmetic overflow warning. (#12712)
Fix arithmetic overflow warning. Suggested fix by static analysis tool
Arithmetic overflow: Using operator '+' on a 4 byte value and then casting the result to a 8 byte value.
Cast the value to the wider type before calling operator '+' to avoid overflow (io.2).
2022-08-24 18:27:30 -07:00
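The static analysis warning quoted in the commit above (f1528ea50f) can be illustrated with a small sketch. It is not the code touched by that commit. It uses Python's `ctypes` fixed-width integers to emulate a 4-byte add that wraps before the result is widened to 8 bytes. The function names are hypothetical, and the wrapping behaviour is an assumption: in C++ signed overflow is undefined, and wrapping is merely what typical hardware does.

```python
import ctypes


def add_then_widen(a: int, b: int) -> int:
    """Add as 4-byte signed values, then widen the (possibly wrapped) sum."""
    narrow_sum = ctypes.c_int32(a + b).value  # emulates a wrapping 32-bit add
    return ctypes.c_int64(narrow_sum).value   # widening cannot undo the wrap


def widen_then_add(a: int, b: int) -> int:
    """Widen each operand to 8 bytes first, then add (the suggested fix)."""
    wide_a = ctypes.c_int64(a).value
    wide_b = ctypes.c_int64(b).value
    return ctypes.c_int64(wide_a + wide_b).value


if __name__ == "__main__":
    largest_int32 = 2**31 - 1
    print(add_then_widen(largest_int32, 1))  # wrapped result
    print(widen_then_add(largest_int32, 1))  # mathematically correct result
```

The two functions agree for small operands and differ only once the 4-byte sum no longer fits, which is why the analyzer asks for the cast to happen before the addition.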
Changming Sun
7927d525a7
Remove CUDNN path from CI build scripts (#12671) 2022-08-24 18:21:50 -07:00
Dwayne Robinson
3f47119f33
DML EP Fix InstanceNormalization with 3D tensors (#12693)
Fix InstanceNormalization with 3D tensors
2022-08-24 14:58:38 -07:00
Adam Louly
94f76b944e
nightly pipeline build using PTCA image. (#12605)
* nightly pipeline yaml and requirements files

* changed names, removed torchvision installing

* delete old file

Co-authored-by: Adam Louly <adamlouly@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2022-08-24 10:40:55 -07:00
Nat Kershaw (MSFT)
0757d51334
Fix Java api docs broken link (#12686) 2022-08-24 09:56:51 -07:00
Vincent Wang
53ecb9e635
Update Supporting DS Version to 0.7.1 for ORTModule (#12696)
update ds version support for fp16_optimizer
2022-08-24 14:56:12 +08:00
Yi Zhang
de3d772995
Check GCC version (#12680)
* check gcc version
2022-08-24 12:10:08 +08:00
Edward Chen
8d657de4b2
Update Newtonsoft.Json version to 13.0.1. (#12691) 2022-08-23 18:45:38 -07:00
abhi-ort
73e5741a9a
Enabling softmax grad and logsoftmax grad on ORT (#12614)
* Enabling softmax grad and logsoftmax grad on ORT

* formatting changes

* formatting changes

* reverting changes

* Changing the OpType
2022-08-23 15:49:02 -07:00
Changming Sun
cb2601c5ea
Update mac-ci.yml to increase macOS build jobs' timeout value to 3 hours (#12675) 2022-08-22 21:31:30 -07:00
Tianlei Wu
8d78f96dfe
[CUDA] Fuse add bias and transpose into one kernel in Attention (#12670)
* fuse add bias and transpose in attention
2022-08-22 15:46:13 -07:00
Chun-Wei Chen
6246662b1d
[Dup] Fix SAME_UPPER/SAME_LOWER (auto_pad attribute) in ConvTranspose (#12537)
* Fix SAME_UPPER/SAME_LOWER (auto_pad attribute) in ConvTranspose

* Bump ONNX 1.10.2 globally

* load ONNX_VERSION from VERSION_NUMBER

* /

* revert deprecate warning in ORT 1.12

* add a comment about why removing cntk_simple_seg

* correct the implementation in DML as well
2022-08-22 15:35:34 -07:00
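For readers unfamiliar with the `auto_pad` modes named in the commit above (6246662b1d), the following is a minimal sketch of how SAME_UPPER and SAME_LOWER split padding along one axis of ConvTranspose. It follows the convention in the ONNX operator documentation: the output size is input size times stride, and an odd amount of padding puts the extra element at the end for SAME_UPPER and at the beginning for SAME_LOWER. The function name is hypothetical, clamping a negative total to zero is an assumption, and none of this is taken from the ORT or DML sources.

```python
from typing import Tuple


def conv_transpose_same_pads(
    input_size: int,
    stride: int,
    kernel: int,
    auto_pad: str,
    dilation: int = 1,
    output_padding: int = 0,
) -> Tuple[int, int]:
    """Return (pad_begin, pad_end) for one spatial axis."""
    if auto_pad not in ("SAME_UPPER", "SAME_LOWER"):
        raise ValueError(f"unsupported auto_pad: {auto_pad}")

    output_size = input_size * stride  # target size for the SAME_* modes
    effective_kernel = (kernel - 1) * dilation + 1
    total = stride * (input_size - 1) + output_padding + effective_kernel - output_size
    total = max(total, 0)  # assumption: never report negative padding

    smaller = total // 2
    larger = total - smaller
    # SAME_UPPER puts the extra element at the end, SAME_LOWER at the beginning.
    return (smaller, larger) if auto_pad == "SAME_UPPER" else (larger, smaller)


if __name__ == "__main__":
    print(conv_transpose_same_pads(3, 2, 3, "SAME_UPPER"))
    print(conv_transpose_same_pads(3, 2, 3, "SAME_LOWER"))
```

The two modes give different results only when the total padding is odd, so a mix-up between them goes unnoticed on many shapes.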
Yulong Wang
c144acc534
Replace 'master' branch ref to 'main' in the code (#12547) 2022-08-22 10:48:12 -07:00
Tianlei Wu
d93e6533b7
Format bert or transformers code (#12646)
(1) Modify some lines to fit line length limit 120
(2) Adjust parameter order of LaunchAttentionKernel
(3) Format code with Clang-Format in VS Code
(4) Fix spelling errors
2022-08-22 10:18:52 -07:00
Wei-Sheng Chin
dc486d146b
Make ORT callable from various Pytorch compilers (LazyTensor, TorchDynamo, etc) (#10460)
* Make ORT as Pytorch JIT backend

LORT likely doesn't work with aten fallback so we only test LORT in its own CI.

* Revert changes to enable external CUDA allocator. Will add it later.

Revert "Revert changes to enable external CUDA allocator. Will add it later."

This reverts commit d5487f2e193014c805505afae8fb577c53667658.

Fix external allocator

* Relax tolerance and remove commented code

* Print more information in CI

* Fix pointer

* Address comments.
1. Reuse ORT-eager mode's environment.
2. Remove unused ctor.

* Use Pytorch master branch as all PRs are merged

Fix

* Refine based on cpplint feedback

* Revert changes to allow custom CUDA allocator in public APIs

* Use torch.testing.assert_close

* Use unittest framework

* Switch docker repo

* Rename *.cpp to *.cc

* Address comments

* Add comment

* Use same pipeline file for eager and lort pipelines

* Address comments

* Add yaml comment

* Fix cmake files

* Address comments

* Rename flags, remove printing code, remove dead comment
2022-08-22 09:40:40 -07:00
G. Ramalingam
53090f620e
Fix attribute renaming bug in function inliner (#12445)
* Fix attribute renaming bug in function inliner

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Fix attr name

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
2022-08-22 08:19:42 -07:00
Vincent Wang
a078c8d99b
Update Supporting Deepspeed Version of ORTModule's FP16_Optimizer (#12668) 2022-08-22 22:22:53 +08:00
Chen Fu
8456f5fd97
qdq_util bug fix (#12647)
bugfix: when creating a temp infer file, an existing file may be accidentally deleted
2022-08-22 09:32:43 -04:00
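The commit above (8456f5fd97) does not say how the fix works. As a general illustration of the hazard it describes, the sketch below creates its temporary file with `tempfile.mkstemp`, which always picks a fresh unique name, so the later cleanup cannot remove a file that already existed. The function name, the suffix, and the placeholder for model loading are all hypothetical and are not taken from qdq_util.

```python
import os
import tempfile


def run_with_temp_model(directory: str, payload: bytes) -> str:
    """Write payload to a uniquely named temp file, then delete only that file.

    Returns the path that was used, for inspection by the caller.
    """
    # mkstemp never reuses an existing name, unlike a hand-built fixed path.
    fd, path = tempfile.mkstemp(suffix="-infer.onnx", dir=directory)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(payload)
        # ... a real caller would load the temporary model from `path` here ...
        return path
    finally:
        os.remove(path)  # removes the file created above, nothing else
```

Deriving the temporary name from the input model's name, by contrast, risks colliding with a file the user already has in the same directory.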
Scott McKay
2102b8f67c
Avoid duplicate symbol error between ONNX and ORT for ostream operator<< with TensorShapeProto (#12651)
* Remove ostream operator<< definitions for TensorShapeProto and TensorProto as they clash with ONNX definitions in onnx/defs/printer.h/cc.

Currently printer.h (unnecessarily) pulls in a number of other ONNX headers which causes naming clashes with parts of ORT. It is also excluded in a minimal build.

Instead convert the onnx::TensorShapeProto to onnxruntime::TensorShape so we use the existing ostream operator<< for TensorShape.

Make GetTensorShapeFromTensorProto consistent with GetTensorShapeFromTensorShapeProto so both return a TensorShape (as the name implies).
2022-08-22 17:20:52 +10:00
Yulong Wang
f40e90c33f
[js/web] fix incorrect shader for 'Resize' (#12588) 2022-08-21 21:47:28 -07:00
Yulong Wang
bfdd191eec
[wasm] use same export name for SIMD/NOSIMD build (#12545) 2022-08-19 18:17:50 -07:00
Dwayne Robinson
aa85092b51
DML EP squeeze all axes when empty (#12649)
DML EP squeeze empty axes
2022-08-19 08:56:03 -07:00
Changming Sun
b270334e1e
Update numpy version from 1.21.0 to 1.21.6 to avoid building it from source (#12644) 2022-08-18 22:11:48 -07:00
Chen Fu
56dd0176a1
QDQ debugger - Adding Error Calculator (#12632)
QDQ debugger - Adding Error Calculator
2022-08-18 09:30:43 -07:00
Cheng
81b128b5e9
Qlinearsoftmax take FLOAT lookup-table (#12574)
* [lookuptable] float-type

* typed y-scale

* round to nearest even
2022-08-18 09:54:39 +08:00
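The "round to nearest even" step listed in the commit above (81b128b5e9) refers to the tie-breaking rule in which a value exactly halfway between two integers goes to the even one. A minimal sketch of uint8 quantization with that rule follows. It relies on Python's built-in `round`, which already rounds ties to even. The function name and the 0 to 255 clamp are assumptions for illustration, not the kernel's actual code.

```python
def quantize_uint8(x: float, scale: float, zero_point: int = 0) -> int:
    """Quantize one float to uint8, rounding ties to the nearest even integer."""
    rounded = round(x / scale)          # Python's round() breaks ties to even
    shifted = rounded + zero_point
    return max(0, min(255, shifted))    # saturate to the uint8 range


if __name__ == "__main__":
    for value in (0.5, 1.5, 2.5, 3.5):
        print(value, quantize_uint8(value, scale=1.0))
```

Compared with always rounding halves up, this rule avoids a systematic upward bias when many values land exactly on a tie.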
Erick Muñoz
82b724fa5e
[oneDNN] Improve DequantizeLinear operator performance. (#12611)
* Detect when ZeroPoint = 0 and avoid sub op.

* Added tests to verify constant initializer behaviour.
2022-08-17 12:31:10 -07:00
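The optimization in the commit above (82b724fa5e) rests on the DequantizeLinear formula y = (x - zero_point) * scale: when the zero point is 0, the subtraction changes nothing and can be skipped. A minimal sketch of that shortcut on plain Python lists follows. The function name is hypothetical, and the real change operates on oneDNN primitives rather than Python code.

```python
from typing import List, Sequence


def dequantize_linear(x: Sequence[int], scale: float, zero_point: int = 0) -> List[float]:
    """Compute (x - zero_point) * scale, skipping the subtraction when possible."""
    if zero_point == 0:
        # Fast path: one multiply per element, no subtraction.
        return [value * scale for value in x]
    return [(value - zero_point) * scale for value in x]


if __name__ == "__main__":
    print(dequantize_linear([0, 2, 4], scale=0.5))
    print(dequantize_linear([128, 130], scale=0.5, zero_point=128))
```

Both paths produce the same values whenever the zero point is 0, so the shortcut affects only the amount of work done.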