Edward Chen
32366fea02
[Objective-C API] Ignore clang documentation warnings from C/C++ header usage. ( #9057 )
2021-09-14 13:03:48 -07:00
Tianlei Wu
3ec3e9f705
Add t-test to compare experiments in GPT-2 mixed precision conversion ( #9042 )
...
* Add t-test to compare two experiments
* Ranking based on pair-wise T-test results and a custom scoring function
2021-09-14 12:40:25 -07:00
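Illustration for #9042: the pair-wise t-test ranking described above can be sketched as follows. This is a minimal stdlib-only sketch, not the code from the commit. The function names, the Welch variant of the t-test, the fixed `t_threshold` cut-off and the "count of wins" score are all assumptions made for illustration; the commit's custom scoring function is not shown in this log.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and approximate degrees of freedom.

    Assumes each sample has at least two values and non-zero variance.
    """
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def rank_by_pairwise_wins(results, t_threshold=2.0):
    """Rank experiments by how many others they beat in a pair-wise test.

    `results` maps an experiment name to a list of metric values
    (higher is better). `t_threshold` is a hypothetical stand-in for
    a proper p-value cut-off.
    """
    wins = {name: 0 for name in results}
    for x in results:
        for y in results:
            if x != y:
                t, _ = welch_t(results[x], results[y])
                if t > t_threshold:
                    wins[x] += 1
    return sorted(wins, key=wins.get, reverse=True)


if __name__ == "__main__":
    experiments = {
        "fp32": [0.90, 0.91, 0.92, 0.93],
        "mixed_a": [0.80, 0.81, 0.82, 0.83],
        "mixed_b": [0.70, 0.71, 0.72, 0.73],
    }
    print(rank_by_pairwise_wins(experiments))
```

A real comparison would normally convert the statistic to a p-value (for example with a statistics library) instead of using a fixed threshold.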
G. Ramalingam
7d28b596f4
Add function-body to opschema of FastGeluGrad ( #9028 )
...
* Add function body to FastGeluGrad
* Add test case
2021-09-14 12:27:55 -07:00
Suffian Khan
4322f7e647
Fix ROCm wheels CI pipeline break by installing latest protobuf from source ( #9047 )
...
* install protobuf from source
* fix rm command in Dockerfile
* fix options on rm command
* fix cd into protobuf source directory
* try again
* remove strip step
* debug list the files
* ls on /usr
* more debug
* more debug
* adjust LD_LIBRARY_PATH
* try remove protobuf before ORT build
2021-09-14 12:07:00 -07:00
Guoyu Wang
cf70635d2a
Add Android executable drop in the Package pipeline ( #9050 )
...
* add copy executable for android job
* minor fix
* Variable fix
* Move to use tgz because zip is not part of the docker image
* update compression
2021-09-14 11:45:33 -07:00
Yulong Wang
be80698698
[js/web] a bugfix and add tests for wasm proxy worker ( #9048 )
...
* [js/web] add tests for wasm proxy worker
* fix script src override
2021-09-14 10:38:58 -07:00
Edward Chen
e574be4a53
[C API Docs] Add docs for run options tag/log level accessors/modifiers. ( #9045 )
...
Add documentation for these C API functions:
RunOptionsGetRunLogSeverityLevel
RunOptionsGetRunLogVerbosityLevel
RunOptionsGetRunTag
RunOptionsSetRunLogSeverityLevel
RunOptionsSetRunLogVerbosityLevel
RunOptionsSetRunTag
Update some existing documentation.
2021-09-14 08:53:35 -07:00
mindest
6036a6b915
Add type int64 for Equal, float types for ReduceSum (ROCm) ( #9010 )
2021-09-14 00:07:30 -07:00
Sherlock
9174cbe3d5
Optimize CUDA Kernel for 3D and 4D Transpose ( #8928 )
...
* Optimize Transpose120 and Transpose102
* Generalize Transpose0123 for more input shapes
* Add Transpose3D test cases
* update rocm kernel
2021-09-13 23:00:53 -07:00
Tianlei Wu
5969d576e5
Revert "disable half2 kernel by default ( #9034 )" ( #9044 )
...
This reverts commit 289999af35 .
2021-09-13 17:25:25 -07:00
baijumeswani
34f37d2920
Disable fallback for ortmodule api tests ( #9018 )
2021-09-13 16:00:13 -07:00
Guoyu Wang
c709380c52
Add full iOS job in package pipeline ( #9036 )
...
* Add full ios xcframework job
* create zip file of the xcframework
2021-09-13 15:54:11 -07:00
baijumeswani
1422a9ba6b
Remove previous temporary fixes and address TODOs ( #9020 )
2021-09-13 10:10:07 -07:00
Edward Chen
011cb8fd48
Fix Where op type reduction processing ( #9033 )
...
* Update type reduction script to track Where Op's second input type.
* Clean up op_kernel_type_control.h includes.
* Use more maintainable include.
2021-09-13 08:37:58 -07:00
mindest
a1021a1cf4
Add BatchNorm kernel for ROCm ( #9014 )
...
* Add BatchNorm kernel for ROCm, update BN test
* correct epsilon_ setting; limit min epsilon
2021-09-13 15:15:05 +08:00
Rajalakshmi Srinivasaraghavan
e83cc534d4
Fix cmake POWER10 detection
...
Recent commit 60c98a8 changed variable mlas_common_srcs which affects
POWER10 detection.
2021-09-12 11:56:55 -07:00
Hariharan Seshadri
c674343d94
Remove document text from error message in a couple of ops ( #9003 )
2021-09-11 08:37:52 -07:00
Ryan Hill
c3321b1778
Fix NVTX profiling so it can run in the shared CUDA provider ( #9035 )
...
* Move NVTX profiling so it can run in the shared provider properly
2021-09-11 00:35:54 -07:00
Tianlei Wu
289999af35
disable half2 kernel by default ( #9034 )
2021-09-10 20:09:21 -07:00
Tang, Cheng
8eb6546e8e
enable eager mode with ortmodule ( #8961 )
...
* initial change for eager/ortmodule integration
* update to latest pytorch api
* add test model;fix torch version issue
* fix comments in pr
* fix python test break
* fix api change
* fix comments in PR
* pass device into the fw function
2021-09-10 15:09:23 -07:00
Edward Chen
29d6573f3d
Increase timeouts for Mac CI builds. ( #9024 )
...
Increase timeouts for "orttraining-mac-ci-pipeline" and "iOS CI Pipeline" CI builds.
2021-09-10 12:57:08 -07:00
Chen Fu
b3c2725862
fix cpuinfo compilation flag usage ( #9029 )
...
Co-authored-by: Chen Fu <fuchen@microsoft.com>
Bug was introduced from PR #8716
When restricting cpuinfo to only known platforms, compilation flag change was not thorough, which accidentally turned off hybrid core detection for ARM systems.
This PR fixes this bug
2021-09-10 12:43:38 -07:00
satyajandhyala
ce7b12bf5d
Added new fp16 allow/safe opcodes in PropagateCastOps ( #8964 )
...
* Removed RemoveInputOutputUpDownCasts strategy in PropagateCastOps.
* Added Expand, Squeeze and Unsqueeze ops to fp16 allow ops
* Added onnx models for squeeze/unsqueeze tests.
2021-09-10 11:53:26 -07:00
Bowen Bao
31af88c0bc
Update cross_entropy_loss symbolic for new argument from upstream torch ( #9007 )
...
In torch 1.10, `label_smoothing` is added as additional input to `cross_entropy_loss`. Update the symbolic function to handle this change.
2021-09-10 10:32:59 -07:00
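Illustration for #9007: the effect of the `label_smoothing` argument can be sketched in plain Python. This is not the symbolic function from the commit. It only shows the usual definition of label-smoothed cross entropy for a single sample, where the target distribution is a mix of the one-hot label and a uniform distribution over classes. The function name and signature here are hypothetical.

```python
import math


def cross_entropy(logits, target, label_smoothing=0.0):
    """Label-smoothed cross entropy for one sample.

    logits: list of raw class scores; target: index of the true class.
    The smoothed target is (1 - eps) * one_hot + eps / num_classes.
    """
    # Numerically stable log-softmax.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    log_probs = [x - log_z for x in logits]

    nll = -log_probs[target]                       # one-hot part
    uniform = -sum(log_probs) / len(log_probs)     # uniform part
    return (1.0 - label_smoothing) * nll + label_smoothing * uniform


if __name__ == "__main__":
    print(cross_entropy([2.0, 0.0, -1.0], target=0))
    print(cross_entropy([2.0, 0.0, -1.0], target=0, label_smoothing=0.1))
```

With `label_smoothing=0.0` the function reduces to ordinary cross entropy, which is why an exporter that ignores the new argument only stays correct for the default value.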
Zuwei Zhao
ff66cfdfa6
Enable linking in exception throwing support library when building onnxruntime wasm. ( #8973 )
...
* Enable linking in exception throwing support library when building onnxruntime webassembly containing onnxruntime-extensions.
* Add flag in build.py to enable linking the exception throwing library.
* Update onnxruntime-extensions document and bind custom_ops build flag with use_extensions.
* Update doc.
* Update cgmanifest.json.
Co-authored-by: Zuwei Zhao <zuzhao@microsoft.com>
2021-09-10 22:09:16 +08:00
Tianlei Wu
e5ee0b435d
Attention Fusion for GPT-2 from Megatron ( #8987 )
...
(1) Attention Fusion for gpt-2 model from Megatron.
(2) Update symbolic shape inference of Attention to support 4D mask.
(3) Add an option in save_model_to_file to save external data in one file or not, and warn about existing external data
(4) Fix deprecation: logger.warn => logger.warning
(5) Add model loader to test model without external data
(6) Add an API of optimize_by_fusion, and topological sort after optimization.
2021-09-10 00:29:40 -07:00
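Illustration for #8987, item (6): a topological sort after fusion restores an order in which every node appears after the nodes that produce its inputs. The sketch below uses Kahn's algorithm on a hypothetical list of `(name, inputs, outputs)` tuples. It does not use the onnx API and is not the implementation from the commit.

```python
from collections import deque


def topological_sort(nodes):
    """Return node names so that producers precede consumers.

    nodes: list of (name, input_tensors, output_tensors) tuples.
    Tensors with no producer (graph inputs, initializers) are ignored.
    """
    producer = {out: name for name, _, outputs in nodes for out in outputs}
    indegree = {name: 0 for name, _, _ in nodes}
    consumers = {name: [] for name, _, _ in nodes}

    for name, inputs, _ in nodes:
        for tensor in inputs:
            src = producer.get(tensor)
            if src is not None:
                consumers[src].append(name)
                indegree[name] += 1

    # Start from nodes that depend only on graph inputs.
    ready = deque(name for name, _, _ in nodes if indegree[name] == 0)
    order = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for nxt in consumers[current]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")
    return order


if __name__ == "__main__":
    # A fused node was appended after its consumer; sorting fixes the order.
    graph = [
        ("add", ["hidden", "bias"], ["y"]),
        ("attention", ["x"], ["hidden"]),
    ]
    print(topological_sort(graph))
```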
Du Li
57b7ab56cd
Adding async fetching for webgl backend ( #8951 )
...
* Adding async fetching for webgl backend
* fix PR comments and CI failure.
* fixing a bug
* adding a flag
2021-09-09 22:17:42 -07:00
Yulong Wang
5145fa236f
[js/web] fix ort web e2e test ( #9025 )
2021-09-09 22:08:27 -07:00
Ryan Hill
2439ced3ec
API Documentation ( #8948 )
...
* Make help information compile properly
2021-09-09 22:04:51 -07:00
liqun Fu
6412c6a362
do not add pkg wheel entry to the index html file if it already exists ( #9004 )
...
* do not add pkg wheel entry to the index html file if it already exists
2021-09-09 16:20:19 -07:00
Gary Miguel
e357022362
Remove onnxruntime team from CODEOWNERS ( #8954 )
...
There are currently 98 members in the team. Requesting review from
all of them for every PR is too noisy.
2021-09-09 15:26:59 -07:00
Spike Curtis
00fbc3b0bc
Instruct dockerfile users to do submodule updates
...
Signed-off-by: Spike Curtis <spike@lodestar.ai>
2021-09-09 11:17:21 -07:00
baijumeswani
d78e90d1af
Adding preprocessor checks for torch version during torch cpp extensions compilation ( #8989 )
2021-09-09 10:26:38 -07:00
Chi Lo
0367e1f1c2
Update Nuget Package Pipeline to CUDA11.4 and TensorRT8 on Windows ( #9000 )
...
* Update to CUDA11.4 and TensorRT-8.0.3.4
* update trt pool, remove cudnn from setup_env_gpu.bat
* revert pool
* test gpu package pipeline on t4
* back out changes
* back out changes
Co-authored-by: George Wu <jywu@microsoft.com>
2021-09-09 06:56:37 -07:00
pengwa
d209fe29b9
custom autograd func memory refinement ( #8993 )
...
* Release torch tensor referenced by torch gradient graph (created in PythonOp)
* Update orttraining/orttraining/python/training/ortmodule/torch_cpp_extensions/torch_interop_utils/torch_interop_utils.cc
* refine with comments
Co-authored-by: Wei-Sheng Chin <wschin@outlook.com>
2021-09-09 18:37:24 +08:00
Pranav Sharma
d39959172f
Fix fuzz testing build blocking release. ( #9008 )
2021-09-09 00:44:40 -07:00
Guoyu Wang
1533f574e4
Add full Android job in package pipeline ( #9009 )
...
* Add full Android job in package pipeline
* Address CR comments
2021-09-08 21:12:59 -07:00
Hariharan Seshadri
c20cb766be
Optimize sequence type usage on CUDA [3/n] ( #9002 )
2021-09-08 16:01:38 -07:00
Yulong Wang
2e8792ca42
[js/web] fix karma launch with chrome headless ( #8998 )
2021-09-08 11:52:41 -07:00
Ashwini Khade
ec63d10303
add model local function support ( #8540 )
...
* updates for picking onnx commit
* add tests filter to c# tests
* plus test fixes
* fix versioning for contrib ops
* fix tests
* test filter for optional ops
* more versioning related updates
* fix test
* fix layernorm spec
* more updates
* update docs
* add more test filters
* more filters
* update binary size threshold
* update docs
* draft - enable model local function
* enable model local functions in ORT
* update to latest rel onnx commit
* plus tests
* plus more updates
* plus updates
* test updates
* Fix for nested functions + shape inference
* plus bug fix and updates per review
* plus fixes per review
* plus test updates
* plus updates per review
* plus fixes
* fix a test
2021-09-08 11:47:01 -07:00
Vincent Wang
b7b42e0c5d
fast reduction for reducemean ( #8976 )
2021-09-08 10:28:57 -07:00
stevenlix
1c872f9d74
Fix issues in TensorRT EP ( #8996 )
...
* fix big engine load issue and add cuda_cpu_alloc
* remove redundancy
* fix minor issues
2021-09-08 10:28:16 -07:00
Olivia Jain
6fbd0a8233
Change cmake_cuda_architectures to double quotes ( #8990 )
2021-09-08 09:41:52 -07:00
Chi Lo
5ae4c54ab8
Fix bug for validating GPU packages ( #8997 )
2021-09-08 02:06:53 -07:00
George Wu
a30d9f5317
fix windows gpu pipelines that use cuda 10.2 (training, reduced_ops and 10.2 validation) ( #8994 )
...
* build for arch 52
* arch 52
* gpu arch 52
2021-09-07 22:01:06 -07:00
Sunghoon
450524359e
[js/web] WebAssembly profiling ( #8932 )
...
* add p50 in test
* Preallocate WebAssembly worker threads to minimize worker creation time
* WebAssembly profiling
* merge master
* merge with proxy changes
* disable profiling tests from WebAssembly build
* fix e2e test failure
Co-authored-by: Yulong Wang <yulongw@microsoft.com>
2021-09-07 17:18:08 -07:00
ytaous
0193490cbf
ReduceMin - add int64 cuda kernel support for opset12/13 ( #8966 )
...
* ReduceMin - int64 support
* fix doc
Co-authored-by: Ethan Tao <ettao@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-09-07 17:01:26 -07:00
Changming Sun
91c15843cd
Fix a directml python packaging error ( #8981 )
2021-09-07 16:29:33 -07:00
Ye Wang
e2194797a7
bumping up to version 1.9 ( #8982 )
...
* bump up version
* make the WindowsAI column align with ORT version
* update the hardcoded version string
* fix a typo
2021-09-07 14:30:55 -07:00
George Wu
00eca42413
cmake_policy(SET CMP0104 OLD) ( #8793 )
2021-09-07 13:12:50 -07:00