Commit graph

5567 commits

Author SHA1 Message Date
Guoyu Wang
bee5c26580
Add CPU_ONLY runtime option to NNAPI EP (#9066)
* Add NNAPI cpu only option

* update java

* Update comments
2021-09-15 15:50:18 -07:00
Suffian Khan
e758870b18
Upgrade ROCm CI pipeline for ROCm 4.3.1 and permit run inside container (#9070)
* try to run inside 4.3.1 container

* no \ in container run command

* remove networking options

* try with adding video render groups

* add job to build docker image

* try without 1st stage

* change alpha, beta to float

* try adding service connection

* retain huggingface directory

* static video and render gid

* use runtime expression for variables

* install torch-ort

* pin sacrebleu==1.5.1

* update curves for rocm 4.3.1

* try again

* disable determinism and only check tail of loss curve and with a much larger threshold of 0.05

* disable RoBERTa due to high run variability on ROCm 4.3.1

* put reduction unit tests back in
2021-09-15 12:32:02 -07:00
austinpagan
a05e32803a
Fixing MORE mlas unittest failures in POWER (#8673) 2021-09-15 11:39:46 -07:00
Sheil Kumar
273494ee9e
Ensure ms-experimental domain Audio Ops build in mac pipeline (#8857)
* Globally enable ms-experimental ops

* change meaning of ms_experimental to mean *all* ms_experimental ops. Some experimental ops will still be enabled globally without this flag like audio ops.

* add cmath

* add cmath to signal_defs.cc

* move audio back into experimental, verify on mac

* remove experimental from mac builds

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2021-09-15 10:59:32 -07:00
ashbhandare
98ac341c5b
Filter nones from ctx saved tensors (#9063)
Co-authored-by: Aishwarya Bhandare <aibhanda@5cb7a9c3931a4b19a66ae028b49221a6000001.ahkw4qp232huflxlm4gmpq4nbh.jx.internal.cloudapp.net>
2021-09-15 10:13:45 -07:00
Changming Sun
4930320647
Delete linux-pytorch-custom-ops-ci-pipeline.yml (#9023) 2021-09-14 21:51:21 +00:00
Changming Sun
0270ab17c5
Set onnxruntime_DISABLE_RTTI to default OFF (#9049) 2021-09-14 13:53:02 -07:00
Edward Chen
32366fea02
[Objective-C API] Ignore clang documentation warnings from C/C++ header usage. (#9057) 2021-09-14 13:03:48 -07:00
Tianlei Wu
3ec3e9f705
Add t-test to compare experiments in GPT-2 mixed precision conversion (#9042)
* Add t-test to compare two experiments
* Ranking based on pair-wise T-test results and a custom scoring function
2021-09-14 12:40:25 -07:00
G. Ramalingam
7d28b596f4
Add function-body to opschema of FastGeluGrad (#9028)
* Add function body to FastGeluGrad

* Add test case
2021-09-14 12:27:55 -07:00
Suffian Khan
4322f7e647
Fix ROCm wheels CI pipeline break by installing latest protobuf from source (#9047)
* install protobuf from source

* fix rm command in Dockerfile

* fix options on rm command

* fix cd into protobuf source directory

* try again

* remove strip step

* debug list the files

* ls on /usr

* more debug

* more debug

* adjust LD_LIBRARY_PATH

* try remove protobuf before ORT build
2021-09-14 12:07:00 -07:00
Guoyu Wang
cf70635d2a
Add Android executable drop in the Package pipeline (#9050)
* add copy executable for android job

* minor fix

* Variable fix

* Move to use tgz because zip is not part of the docker image

* update compression
2021-09-14 11:45:33 -07:00
Yulong Wang
be80698698
[js/web] a bugfix and add tests for wasm proxy worker (#9048)
* [js/web] add tests for wasm proxy worker

* fix script src override
2021-09-14 10:38:58 -07:00
Edward Chen
e574be4a53
[C API Docs] Add docs for run options tag/log level accessors/modifiers. (#9045)
Add documentation for these C API functions:
RunOptionsGetRunLogSeverityLevel
RunOptionsGetRunLogVerbosityLevel
RunOptionsGetRunTag
RunOptionsSetRunLogSeverityLevel
RunOptionsSetRunLogVerbosityLevel
RunOptionsSetRunTag

Update some existing documentation.
2021-09-14 08:53:35 -07:00
mindest
6036a6b915
Add type int64 for Equal, float types for ReduceSum (ROCm) (#9010) 2021-09-14 00:07:30 -07:00
Sherlock
9174cbe3d5
Optimize CUDA Kernel for 3D and 4D Transpose (#8928)
* Optimize Transpose120 and Transpose102

* Generalize Transpose0123 for more input shapes

* Add Transpose3D test cases

* update rocm kernel
2021-09-13 23:00:53 -07:00
Tianlei Wu
5969d576e5
Revert "disable half2 kernel by default (#9034)" (#9044)
This reverts commit 289999af35.
2021-09-13 17:25:25 -07:00
baijumeswani
34f37d2920
Disable fallback for ortmodule api tests (#9018) 2021-09-13 16:00:13 -07:00
Guoyu Wang
c709380c52
Add full iOS job in package pipeline (#9036)
* Add full ios xcframework job

* create zip file of the xcframework
2021-09-13 15:54:11 -07:00
baijumeswani
1422a9ba6b
Remove previous temporary fixes and address TODOs (#9020) 2021-09-13 10:10:07 -07:00
Edward Chen
011cb8fd48
Fix Where op type reduction processing (#9033)
* Update type reduction script to track Where Op's second input type.

* Clean up op_kernel_type_control.h includes.

* Use more maintainable include.
2021-09-13 08:37:58 -07:00
mindest
a1021a1cf4
Add BatchNorm kernel for ROCm (#9014)
* Add BatchNorm kernel for ROCm, update BN test

* correct epsilon_ setting; limit min epsilon
2021-09-13 15:15:05 +08:00
Rajalakshmi Srinivasaraghavan
e83cc534d4
Fix cmake POWER10 detection
Recent commit 60c98a8 changed variable mlas_common_srcs which affects
POWER10 detection.
2021-09-12 11:56:55 -07:00
Hariharan Seshadri
c674343d94
Remove document text from error message in a couple of ops (#9003) 2021-09-11 08:37:52 -07:00
Ryan Hill
c3321b1778
Fix NVTX profiling so it can run in the shared CUDA provider (#9035)
* Move NVTX profiling so it can run in the shared provider properly
2021-09-11 00:35:54 -07:00
Tianlei Wu
289999af35
disable half2 kernel by default (#9034) 2021-09-10 20:09:21 -07:00
Tang, Cheng
8eb6546e8e
enable eager mode with ortmodule (#8961)
* initial change for eager/ortmodule integration

* update to latest pytorch api

* add test model;fix torch version issue

* fix comments in pr

* fix python test break

* fix api change

* fix comments in PR

* pass device into the fw function
2021-09-10 15:09:23 -07:00
Edward Chen
29d6573f3d
Increase timeouts for Mac CI builds. (#9024)
Increase timeouts for "orttraining-mac-ci-pipeline" and "iOS CI Pipeline" CI builds.
2021-09-10 12:57:08 -07:00
Chen Fu
b3c2725862
fix cpuinfo compilation flag usage (#9029)
Co-authored-by: Chen Fu <fuchen@microsoft.com>
Bug was introduced from PR #8716

When restricting cpuinfo to only known platforms, compilation flag change was not thorough, which accidentally turned off hybrid core detection for ARM systems.

This PR fixes this bug
2021-09-10 12:43:38 -07:00
satyajandhyala
ce7b12bf5d
Added new fp16 allow/safe opcodes in PropagateCastOps (#8964)
* Removed RemoveInputOutputUpDownCasts strategy in PropagateCastOps.

* Added Expand, Squeeze and Unsqueeze ops to fp16 allow ops

* Added onnx models for squeeze/unsqueeze tests.
2021-09-10 11:53:26 -07:00
Bowen Bao
31af88c0bc
Update cross_entropy_loss symbolic for new argument from upstream torch (#9007)
In torch 1.10, `label_smoothing` is added as additional input to `cross_entropy_loss`. Update the symbolic function to handle this change.
2021-09-10 10:32:59 -07:00
Zuwei Zhao
ff66cfdfa6
Enable linking in exception throwing support library when build onnxruntime wasm. (#8973)
* Enable linking in exception throwing support library when build onnxruntime webassembly containing onnxruntime-extensions.

* Add flag in build.py to enable linking exceptions throwing library.

* Update onnxruntime-extensions document and bind custom_ops build flag with use_extensions.

* Update doc.

* Update cgmanifest.json.

Co-authored-by: Zuwei Zhao <zuzhao@microsoft.com>
2021-09-10 22:09:16 +08:00
Tianlei Wu
e5ee0b435d
Attention Fusion for GPT-2 from Megatron (#8987)
(1) Attention Fusion for gpt-2 model from Megatron.
(2) Update symbolic shape inference of Attention to support 4D mask.
(3) Add an option in save_model_to_file to save external data in one file or not, and a warning about existing external data
(4) Fix deprecation: logger.warn => logger.warning
(5) Add model loader to test model without external data
(6) Add an API of optimize_by_fusion, and topological sort after optimization.
2021-09-10 00:29:40 -07:00
Du Li
57b7ab56cd
Adding async fetching for webgl backend (#8951)
* Adding async fetching for webgl backend

* fix PR comments and CI failure.

* fixing a bug

* adding a flag
2021-09-09 22:17:42 -07:00
Yulong Wang
5145fa236f
[js/web] fix ort web e2e test (#9025) 2021-09-09 22:08:27 -07:00
Ryan Hill
2439ced3ec
API Documentation (#8948)
* Make help information compile properly
2021-09-09 22:04:51 -07:00
liqun Fu
6412c6a362
do not add pkg wheel entry to the index html file if it already exists (#9004)
* do not add pkg wheel entry to the index html file if it already exists
2021-09-09 16:20:19 -07:00
Gary Miguel
e357022362
Remove onnxruntime team from CODEOWNERS (#8954)
There are currently 98 members in the team. Requesting review from
all of them for every PR is too noisy.
2021-09-09 15:26:59 -07:00
Spike Curtis
00fbc3b0bc
Instruct dockerfile users to do submodule updates
Signed-off-by: Spike Curtis <spike@lodestar.ai>
2021-09-09 11:17:21 -07:00
baijumeswani
d78e90d1af
Adding preprocessor checks for torch version during torch cpp extensions compilation (#8989) 2021-09-09 10:26:38 -07:00
Chi Lo
0367e1f1c2
Update Nuget Package Pipeline to CUDA11.4 and TensorRT8 on Windows (#9000)
* Update to CUDA11.4 and TensorRT-8.0.3.4

* update trt pool, remove cudnn from setup_env_gpu.bat

* revert pool

* test gpu package pipeline on t4

* back out changes

* back out changes

Co-authored-by: George Wu <jywu@microsoft.com>
2021-09-09 06:56:37 -07:00
pengwa
d209fe29b9
custom autograd func memory refinement (#8993)
* Release torch tensor referenced by torch gradient graph (created in PythonOp)

* Update orttraining/orttraining/python/training/ortmodule/torch_cpp_extensions/torch_interop_utils/torch_interop_utils.cc

* refine with comments

Co-authored-by: Wei-Sheng Chin <wschin@outlook.com>
2021-09-09 18:37:24 +08:00
Pranav Sharma
d39959172f
Fix fuzz testing build blocking release. (#9008) 2021-09-09 00:44:40 -07:00
Guoyu Wang
1533f574e4
Add full Android job in package pipeline (#9009)
* Add full Android job in package pipeline

* Address CR comments
2021-09-08 21:12:59 -07:00
Hariharan Seshadri
c20cb766be
Optimize sequence type usage on CUDA [3/n] (#9002) 2021-09-08 16:01:38 -07:00
Yulong Wang
2e8792ca42
[js/web] fix karma launch with chrome headless (#8998) 2021-09-08 11:52:41 -07:00
Ashwini Khade
ec63d10303
add model local function support (#8540)
* updates for picking onnx commit

* add tests filter to c# tests

* plus test fixes

* fix versioning for contrib ops

* fix tests

* test filter for optional ops

* more versioning related updates

* fix test

* fix layernorm spec

* more updates

* update docs

* add more test filters

* more filters

* update binary size threshold

* update docs

* draft - enable model local function

* enable model local functions in ORT

* update to latest rel onnx commit

* plus tests

* plus more updates

* plus updates

* test updates

* Fix for nested functions + shape inference

* plus bug fix and updates per review

* plus fixes per review

* plus test updates

* plus updates per review

* plus fixes

* fix a test
2021-09-08 11:47:01 -07:00
Vincent Wang
b7b42e0c5d
fast reduction for reducemean (#8976) 2021-09-08 10:28:57 -07:00
stevenlix
1c872f9d74
Fix issues in TensorRT EP (#8996)
* fix big engine load issue and add cuda_cpu_alloc

* remove redundancy

* fix minor issues
2021-09-08 10:28:16 -07:00
Olivia Jain
6fbd0a8233
Change cmake_cuda_architectures to double quotes (#8990) 2021-09-08 09:41:52 -07:00