Commit graph

2718 commits

Author SHA1 Message Date
Tracy Sharpe
5d773ee57b
MLAS: add sgemv path for aarch64 builds (#4254)
Implement a fast path for GEMMs where M=1 and TransB=CblasNoTrans.
2020-06-17 20:10:35 -07:00
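A rough illustration of the M=1, TransB=CblasNoTrans case named in 5d773ee57b, not the MLAS kernel itself. When A has a single row, the GEMM collapses to a vector-matrix product, and a non-transposed row-major B can be read one contiguous row at a time. The function names `gemm_reference` and `sgemv_no_trans` are hypothetical and exist only for this sketch.

```python
def gemm_reference(a, b, m, k, n):
    """Reference row-major GEMM: C[m x n] = A[m x k] * B[k x n]."""
    c = [0.0] * (m * n)
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i * k + p] * b[p * n + j]
            c[i * n + j] = acc
    return c


def sgemv_no_trans(a, b, k, n):
    """M == 1 case: C[1 x n] = a[1 x k] * B[k x n].

    B is walked row by row, so every inner loop reads contiguous
    elements of B and accumulates into the single output row.
    """
    c = [0.0] * n
    for p in range(k):
        scale = a[p]
        row = b[p * n:(p + 1) * n]
        for j in range(n):
            c[j] += scale * row[j]
    return c


a = [1.0, 2.0, 3.0]                   # 1 x 3
b = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]    # 3 x 2, row-major
print(sgemv_no_trans(a, b, 3, 2))     # same values as gemm_reference(a, b, 1, 3, 2)
```

Both functions add the products in the same order for each output element, so the results match exactly; a vectorized kernel would typically process several columns of each row of B per instruction.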
Chih-Hsuan Yen
5da849b414
Fix detection of protobuf with onnxruntime_PREFER_SYSTEM_LIB on Linux (#4230)
The CMake module is FindProtobuf.cmake [1]. Thus the name should be
capitalized so that protobuf can be found on case-sensitive file
systems.

[1] https://github.com/Kitware/CMake/blob/v3.17.3/Modules/FindProtobuf.cmake
2020-06-17 17:34:47 -07:00
Changming Sun
43deec2174
Temporarily remove dnnl from Linux CI build to unblock the whole team (#4266) 2020-06-17 16:25:24 -07:00
Vincent Wang
b41fcf1570
Bugfix for shape inference and GetShape. (#4243)
Co-authored-by: Vincent Wang <weicwang@microsoft.com>
2020-06-17 15:11:02 +08:00
Yulong Wang
12367a6b11
[C#] enable string-typed FixedBufferOnnxValue in input (#4178) 2020-06-16 11:06:11 -07:00
Wei-Sheng Chin
189fb60ef9
Fix a bug and add code to profile memory (#4241)
* Fix a bug and add code to profile memory

1. Compile Send/Recv again (currently broken because of
   HOROVOD refactor).
2. Add code to print out initializer allocation size and
   activation memory size.

* Address comments

* Split memory counts per locations

* Fix a metric
2020-06-16 10:17:27 -07:00
edgchen1
63bf587623
Use azcopy to download test data (#4221)
Use azcopy from download_e2e_test_data.py, add helper function for downloading azcopy.
Update download_test_data.py to use helper function.
2020-06-16 10:14:34 -07:00
Tianlei Wu
61fa5476d5
Update PyTorch Bert notebooks (#4239)
Update PyTorch Bert SQuAD notebooks to use onnxruntime-tools and update usage of intra_op_num_threads.
rename python files according to coding style
Fix change_input_to_int32.
update keras notebook to copy script from rel-1.3.0 branch (Will update them later)
2020-06-16 09:36:51 -07:00
Weixing Zhang
7ccce4379e
Improve fast_divmod (#4224)
* improve fast_divmod

BERT-L throughput improves by about 1.8%.

* fix Win build.

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
2020-06-16 03:03:58 -07:00
Changming Sun
825392c25b
Fix ORT server CI build (#4165) 2020-06-15 21:26:19 -07:00
ytaous
5d28efd434
opset12 code cleanup (#4242)
* opset12 code cleanup

* opset12 code cleanup

Co-authored-by: Ethan Tao <ettao@microsoft.com>
2020-06-15 19:45:35 -07:00
ytaous
e0334f177c
Opset12 upgrade for existing models used by perf/e2e pipelines (#4238)
* opset12 support

* opset12 support

* on comments

Co-authored-by: Ethan Tao <ettao@microsoft.com>
2020-06-15 14:26:53 -07:00
Ashwini Khade
4486c66ed4
enable conv transpose 3D (#4218)
* enable convtranspose 3D

* test fix
2020-06-15 13:38:32 -07:00
Bowen Bao
b08771f00e
Add ONNX Training Post-Passes to Front-End - Cont (#4041)
* Add ONNX postpasses

* add flag + add bert test from onnx file

* address PR comments

* fix typo

* fix rebase

* address comments

* Fix test failures

* add new pass for expand for new pt version, add comments

* fix rebase

Co-authored-by: lahaidar <lahaidar@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-06-15 10:33:26 -07:00
Cecilia Liu
0b5bbb16b8
Benchmark With IO Binding (#4206)
* add io binding to benchmark.py
2020-06-15 10:06:33 -07:00
Weixing Zhang
b4b1c6440a
Enable ORT with CUDA 11 toolkit (#4168)
* ORT on CUDA 11

1. Separate HOROVOD and MPI
2. Separate NCCL from HOROVOD in CMakeLists.txt
3. Remove dependency on external cub
4. cudnnSetRNNDescriptor is changed in cuDNN 8.0

* polish the code about MPI/NCCL in CMakeLists.txt and build.py

* check CUDA version

* ${MPI_INCLUDE_DIRS} should be PUBLIC

* sm30, sm50 are deprecated in CUDA 11 Toolkit

* update change based on code review feedback.

* add sm_52

* improve MPI/NCCL build path

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
2020-06-15 08:47:03 -07:00
Emad El-Haraty
88a9cceb41
fix relative links in CONTRIBUTING.md (#4212)
* fix links to Engineering Design and API in CONTRIBUTING.md

* fix additional links in CONTRIBUTING.md

* correct the link to the public API in CONTRIBUTING.md

Co-authored-by: Emad El-Haraty <emad.elharaty@limebike.com>
2020-06-15 06:48:09 -07:00
Guoliang Hua
d0d31efd86
fix transformer doc format (#4003)
fix transformer doc format
2020-06-15 01:30:47 -07:00
Wei-Sheng Chin
ecc901717e
Use subset to release gradient tensors earlier (#4222) 2020-06-14 22:52:54 -07:00
Andrews548
886befaba1
Add BatchNorm and Concat to ACL EP (#4190)
* Fix acl padding

* Add BatchNormalization operator to ACL Execution Provider

* Add Concat operator to ACL Execution Provider

Co-authored-by: Andrei-Alexandru <andrei-alexandru.avram@nxp.com>
2020-06-14 21:48:22 -07:00
Hariharan Seshadri
877862184e
Fix subgraph based reshape fusion (#4185) 2020-06-14 21:10:08 -07:00
Tracy Sharpe
bf3c32166d
fix optional input/outputs (#4229) 2020-06-15 08:10:22 +10:00
Hariharan Seshadri
5708c4feaf
Handle corner case in Resize op (#4183)
* Handle corner case in Resize op

* Nit

* Fix build

* PR feedback
2020-06-13 18:05:25 -07:00
Tracy Sharpe
7a96cfc8f5
operator code cleanup (#4228)
Search/replace of the pattern "const auto foo = tensor.Shape()" to "const auto& foo = tensor.Shape()" to avoid unneeded copies at runtime and reduce code size (8KB drop for onnxruntime.dll). Remove some unnecessary header includes.
2020-06-13 14:47:44 -07:00
jornt-xilinx
c55f6d76be
[Vitis-AI EP] Fix to enable multi-output subgraphs inside Vitis-AI EP + edit docs (#4171) 2020-06-13 04:56:07 -07:00
Wei-Sheng Chin
de9da123cf
Enable static memory planning for pipeline. (#4204)
* Enable static memory planning for pipeline.
1. We fix a bug when resolving symbolic shape for scalars.
2. We pass the original inputs to all pipeline stages so that
   the symbolic shapes can be resolved.

* Further Improvements
1. Address comments.
2. Further reduce activation size by ~50% when pipeline is on.
   This is done by removing all but one gradient tensor from the last
   RecordEvent in the backward pass.

* Address a comment

* Fix Windows build
2020-06-12 21:43:50 -07:00
Hariharan Seshadri
b377266eb3
Fix Mac build linker warnings (#4155) 2020-06-12 21:10:12 -07:00
Hariharan Seshadri
91a41298cc
Fix ORT build when onnxruntime_PYBIND_EXPORT_OPSCHEMA is enabled (#3954) 2020-06-12 19:32:57 -07:00
Tracy Sharpe
155e22d1ab
MLAS: fuse float output into quantized GEMM (#4215)
Add more variants of MlasGemm that do a u8x8 GEMM with the output type as float. This fuses the common sequence of MatMulInteger + Cast + Mul(OutputScale) + optional Add(BiasVector).
2020-06-12 17:50:40 -07:00
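The fusion described in 155e22d1ab can be pictured as follows. This is a minimal sketch, assuming zero points of 0 and a single scalar output scale; it is not the MlasGemm code, and the names `matmul_integer`, `unfused_pipeline`, and `fused_gemm_float_output` are hypothetical. The unfused path materializes the int32 result, then runs Cast, Mul, and optional Add as separate passes. The fused path applies the same arithmetic to each accumulator before it is stored.

```python
def matmul_integer(a, b, m, k, n):
    """Integer GEMM with wide accumulation (zero points assumed to be 0)."""
    c = [0] * (m * n)
    for i in range(m):
        for j in range(n):
            acc = 0
            for p in range(k):
                acc += a[i * k + p] * b[p * n + j]
            c[i * n + j] = acc
    return c


def unfused_pipeline(a, b, m, k, n, scale, bias=None):
    """MatMulInteger, then Cast, Mul(scale), and optional Add(bias) as separate passes."""
    acc = matmul_integer(a, b, m, k, n)
    out = [float(v) for v in acc]              # Cast
    out = [v * scale for v in out]             # Mul(OutputScale)
    if bias is not None:                       # optional Add(BiasVector), one value per column
        out = [v + bias[idx % n] for idx, v in enumerate(out)]
    return out


def fused_gemm_float_output(a, b, m, k, n, scale, bias=None):
    """Same result in one pass: scale and bias are applied as each output is written."""
    out = [0.0] * (m * n)
    for i in range(m):
        for j in range(n):
            acc = 0
            for p in range(k):
                acc += a[i * k + p] * b[p * n + j]
            value = float(acc) * scale
            if bias is not None:
                value += bias[j]
            out[i * n + j] = value
    return out


a = [1, 2, 3, 4]   # 2 x 2, values in the u8 range
b = [5, 6, 7, 8]   # 2 x 2
print(fused_gemm_float_output(a, b, 2, 2, 2, 0.5, [1.0, -1.0]))
```

The point of fusing is that the intermediate int32 and float tensors never have to be written out and read back between operators.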
Tiago Koji Castro Shibata
2e3607c7cd
Remove hardcoded desktop lib (#4193) 2020-06-12 16:51:54 -07:00
Edward Chen
f74861841e Fix dangling pointer to local string variable in onnxruntime_pybind_state.cc. 2020-06-12 14:28:39 -07:00
Edward Chen
6b4f652017 Clean up status checks in gradient_graph_builder_test.cc. 2020-06-12 14:28:39 -07:00
Edward Chen
7096e6f5ef Reduce severity of GraphAugmenter logging statement. 2020-06-12 14:28:39 -07:00
Changming Sun
6f4320fb85
Fix the python package name issue (#4207)
Fix the package name issue. In my last change (#4197), which enabled code signing, I forgot to pass the additional flags to setup.py.
2020-06-12 08:32:59 -07:00
Yufeng Li
87d68d8531
matmul integer fusion (#4195)
* Introduce DynamicQuantizeMatMul
It fuses DynamicQuantizeLinear, MatMul, and the following Cast and multiplier. It takes float input and produces float output for a quantized matmul. The implementation uses an MLAS kernel for this op.
2020-06-11 21:42:09 -07:00
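To make the "float in, float out" description in 87d68d8531 concrete, here is a minimal sketch. It assumes the uint8 quantization rule from the ONNX DynamicQuantizeLinear operator definition (range widened to include 0, scale = range / 255, round half to even) and a B matrix that is already quantized with a scalar scale and zero point. The function names and the parameters `b_scale` and `b_zero_point` are hypothetical; the actual operator's inputs may differ.

```python
def dynamic_quantize_linear(x):
    """Quantize a float list to uint8; returns (values, scale, zero_point)."""
    lo = min(0.0, min(x))            # the range must include 0
    hi = max(0.0, max(x))
    scale = (hi - lo) / 255.0
    if scale == 0.0:                 # all-zero input: any scale works, pick 1.0
        scale = 1.0
    zero_point = int(min(255, max(0, round(0 - lo / scale))))
    q = [int(min(255, max(0, round(v / scale) + zero_point))) for v in x]
    return q, scale, zero_point


def dynamic_quantize_matmul(a, b_q, b_scale, b_zero_point, m, k, n):
    """Float A [m x k] times quantized B [k x n], returning float C [m x n]."""
    a_q, a_scale, a_zero_point = dynamic_quantize_linear(a)
    out = [0.0] * (m * n)
    for i in range(m):
        for j in range(n):
            acc = 0                  # integer accumulation, as in MatMulInteger
            for p in range(k):
                acc += (a_q[i * k + p] - a_zero_point) * (b_q[p * n + j] - b_zero_point)
            # Cast to float and apply the combined multiplier in the same step.
            out[i * n + j] = float(acc) * (a_scale * b_scale)
    return out


a = [-1.0, 0.0, 1.0, 254.0]          # 1 x 4
b_q = [1, 1, 1, 1]                   # 4 x 1, already quantized
print(dynamic_quantize_matmul(a, b_q, 0.5, 0, 1, 4, 1))
```

Python's built-in `round` rounds halves to the nearest even integer, which is why it is used here without adjustment.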
Tianlei Wu
2605faef88
Add past state support in Attention Op for GPT-2 (#4107)
Update Attention op to allow past state input and output.
Add fusion script and tests
2020-06-11 14:19:55 -07:00
pengwa
e6ccb1ac28
GatherNDGrad for CPU (#4123)
* GatherNDGrad on CPU

* Remove __CUDA_ARCH__ check in .cc files
2020-06-12 02:43:49 +08:00
Xueyun Zhu
65a682354b
enable pipeline to run with mixed precision (#4113)
* enable pipeline to run with mixed precision

* address feedback

* address feedback

* test log

* pipe information if test fails

* ci failure
2020-06-10 22:16:24 -07:00
Changming Sun
8f8d899bf2 Enable code sign in c api pipeline and python pipeline 2020-06-10 19:31:22 -07:00
Yulong Wang
73bc6be5d1
build: split nodejs binding build and test to avoid timeout issue (#4188)
* split nodejs binding build and test

* enable nodejs tests
2020-06-10 19:16:32 -07:00
Matthew Hill
117b2e7743
Fix GPU memory leak on TensorRT (#4172) 2020-06-10 16:56:51 -07:00
Dmitri Smirnov
af0750ba1b
Java GPU artifact naming (#4179)
Modify gradle build so artifactID has _gpu for GPU builds.
  Pass USE_CUDA flag on CUDA build
  Adjust publishing pipelines to extract POM from a correct path.

Co-Authored-By: @Craigacp
2020-06-10 11:15:48 -07:00
George Wu
e8ed14bcb3 disable MEMLEAK CHECKER for openvino 2020-06-10 11:12:17 -07:00
stevenlix
c296884fc3
bump up ORT version to 1.3.1 (#4181) 2020-06-10 08:44:03 -07:00
Changming Sun
c0bdbc0b39
Enable telemetry for the C API and python pipeline (#4174) 2020-06-10 00:07:46 -07:00
Tracy Sharpe
35d9f396c4
MLAS: refactor quantized GEMM loops (#4182) 2020-06-09 23:28:55 -07:00
George Wu
9d65ce53bc
move back to toolset 14.16 to possibly work around nvcc bug (#4180) 2020-06-09 19:36:30 -07:00
Changming Sun
a7366d82af
Disable nuphar large model test (#4173)
Disable the nuphar large model test because it takes too long (40+ minutes), while the default CPU provider takes about 5 minutes. After this change we still keep many other nuphar model tests, which I think should be enough.
2020-06-09 17:45:17 -07:00
Ashwini Khade
9eba9fba7c
Fix for BiasGelu fusion optimizer (#4160)
* Fix for BiasGelu fusion optimizer

* changes per review comments
2020-06-09 14:33:34 -07:00
Yulong Wang
2b3ce1b090
add script to support update nodejs binding version (#4164) 2020-06-09 13:12:55 -07:00