onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-03 03:58:54 +00:00

Author	SHA1	Message	Date
KeDengMS	b15e43a541	[NupharEP] Multiple optimizations (#2380 ) Fuse transpose into MatMul Implement Pow and constant scalar simplification Vectorize ReduceMean Improve symbolic shape inference Minor updates for better debugging in fused function name	2019-11-14 10:40:33 -08:00
Pranav Sharma	7e164eaa6a	Fix reuse logic in allocation planner. (#2393 ) * Fix reuse logic in allocation planner. * PR comments * Add helpful comments * Don't allow reuse across string tensors.	2019-11-13 22:51:12 -08:00
Ilya Lavrenov	b90d55b7ea	Fixed compilation with ngraph (#2388 )	2019-11-13 17:49:00 -08:00
nihui	dde410e073	fix BUILD.md typo (#2375 ) build.py: error: argument --config: invalid choice: 'RelWithDebugInfo' (choose from 'Debug', 'MinSizeRel', 'Release', 'RelWithDebInfo')	2019-11-13 17:48:08 -08:00
KeDengMS	51571030ef	Another try to stabilize CUDA CI (#2383 ) The root cause seems to be failure in CUDA dealloc when tear down. cudaFree return code was ignored before, so should the debug check.	2019-11-13 15:58:15 -08:00
liuziyue	ffa2812587	Skip layer norm transform (#2350 ) * skip layer normalization transformer	2019-11-13 13:46:09 -08:00
Yufeng Li	8ed2928dd5	Fuse Add + Gelu (#2360 ) Implement the transformer to fuse add + gelu Implement the accurate kernel	2019-11-13 09:26:00 -08:00
liuziyue	4b72fedbd5	Layer Norm Fusion Fix (#2379 ) * layer norm fusion fix * Add input shape check in code and unit tests	2019-11-12 17:19:51 -08:00
Scott McKay	8c733c8d82	Add opset 11 version of Split to CUDA ops (#2376 ) Organize the CUDA ops definitions so all the opset 10 and 11 parts are together (same setup used for CPU ops)	2019-11-13 07:40:13 +10:00
Scott McKay	c0d23d5ffe	Fix bug with Slice. Need to pass in flattened input dimensions so the initial offset into the input is calculated correctly. (#2372 )	2019-11-13 07:00:26 +10:00
KeDengMS	9e26f4de6f	Extend OneHot CPU kernel to support more types (#2311 ) * Extend OneHot CPU kernel to support input int64_t, depth int32_t, output float * Skip BERT before the test data fix is picked up	2019-11-12 11:54:06 -08:00
Ashwini Khade	437772d5bc	update output size calculation for resize (#2366 ) * change how output size is calculated for resize op * add tests for ver 10 resize	2019-11-12 10:06:17 -08:00
KeDengMS	192dcfaa8e	Fix a bug in TLS refcount that may have destabilized CUDA CI (#2374 )	2019-11-12 00:48:31 -08:00
Yang Chen	41b9f01e4c	test bidaf with nuphar for avx target (#2370 ) increase nuphar test coverage a bit	2019-11-12 00:47:13 -08:00
Changming Sun	fc6773a65b	Add Tracelogging for profiling (#1639 ) Enabled only if onnxruntime_ENABLE_INSTRUMENT is ON	2019-11-11 21:34:10 -08:00
George Wu	0c6e9f94d0	fix builds enabling onnxruntime_DEBUG_NODE_INPUTS_OUTPUTS (#2369 ) * fix builds enabling onnxruntime_DEBUG_NODE_INPUTS_OUTPUTS * update	2019-11-11 15:26:18 -08:00
Scott McKay	53ed36a3da	Add helper to create output to minimize binary size. (#2365 ) Add ConstEigenTensorMap typedef so we don't unnecessarily const_cast the const input Tensor.	2019-11-12 09:08:04 +10:00
Zhang Lei	aa37e2de8f	Direct use python numpy array's memory if already contiguous. (#2355 ) * Direct use python numpy array's memory if already contiguous. This could greatly improve performance for session with large input, like big image 1920x1080 fastrcnn, 30~40% speed up could be achieved. * Add test case enforce contiguous/non-contiguous numpy array as inputs.	2019-11-11 13:46:55 -08:00
Zhang Lei	ed6da0d191	Implement cuda nonzero op. (#2056 ) Implement cuda nonzero op.	2019-11-11 13:45:52 -08:00
avidiyal	3d3cf0e159	Openvino EP R3.1 onnxrt server (#2357 ) * onnxrt server with OVEP * onnxrt server with OVEP * Update Dockerfile.server.openvino * onnxrt server OVEP fix reviews * onnxrt server OVEP fix reviews	2019-11-11 12:22:19 -08:00
Scott McKay	599d72a94f	Fix/test dim value of 0 handling in a couple of places (#2337 ) * Update the CUDA Where implementation broadcasting logic to handle a dim with value of 0. Add unit test Also add unit test for unary op with dim value of 0 * Exclude ngraph from Where test with 0 dim.	2019-11-11 07:57:19 +10:00
Dmitri Smirnov	25b3c51661	Introduce PrimitiveType into a Type System along with an integer constant (#2307 ) Improve perf by avoiding GetType<T>() calls. Introduce MLTypeCallDispatcher to switch on Input Type. Add Tensor IsType<T>() fast method.	2019-11-08 17:47:06 -08:00
jignparm	fa30b1e758	Set ElementType to String type of node metadata, instead of byte[] (#2348 ) * Set ElementType to String type of node metadata, instead of byte[] * Fix spacing	2019-11-08 14:52:56 -08:00
Zhang Lei	7fcd752393	Cuda Reverse Sequence Op, mapping types of same size using same template function. (#2281 )	2019-11-08 13:52:26 -08:00
Changming Sun	080a0a3186	Nuget pipeline changes (#2305 ) 1. refactor the pipeline, remove some duplicated code 2. Move Windows_py_GPU_Wheels job to Win-GPU-CUDA10. We'll deprecate the "Win-GPU" pool 3. Delete cpu-nocontribops-esrp-pipeline.yml and cpu-nocontribops-pipeline.yml 4. In Linux nuget jobs, run "make install" before creating the package. So that extra RPATH info will be removed	2019-11-08 09:45:52 -08:00
Scott McKay	5a3ea7469a	Remove unused initializer from GraphProto as well as name_to_initial_tensor_ in CleanUnusedInitializers. (#2320 ) * Remove unused initializer from GraphProto as well as name_to_initial_tensor_ in CleanupUnusedInitializers. This means initializers that have been replaced during graph optimizations are not left in the GraphProto when we save an optimized model. * Handle edge case where a model has an unused initializer with matching graph input by also removing the graph input. * Use non-const iterators in std::find_if calls to make centos build happy.	2019-11-08 16:29:50 +10:00
Yulong Wang	da3c0ba14b	implement CPU contrib OP Attention (#2333 )	2019-11-07 17:14:59 -08:00
Tianlei Wu	b539cc74c7	Add FastGelu Cuda Op for Gelu and Add bias fusion (#2293 ) * Add FastGelu cuda op * Add AddBiasGelu for experiment * Revert "Add AddBiasGelu for experiment" This reverts commit 5c1ee019858c657e6bb75887265cb85675626e5b. * Add bias * Add unit tests * update comment * update script * fix build error * update coding style * update for CR feedback Enable half2 optimization only when cuda arch >= 7.0 * move _Tanh to common.cuh	2019-11-07 17:05:55 -08:00
liuziyue	259bff8cf1	Layer Normalization Fusion (#2319 ) basic layer normalization transform	2019-11-07 12:00:08 -08:00
Hariharan Seshadri	553537ed52	Add CUDA GatherElements kernel (#2310 ) * Updates * Update test * Update * Updates * nits * PR feedback * Update * Update * PR feedback * PR comments * Update * Fix build * Fix build * Nits * Fix	2019-11-07 10:54:20 -08:00
Yufeng Li	6651d2f662	Make elementwise op run 4 items per thread (#2335 ) Make elementwise op run 4 items per thread; unroll for loop to leverage ILP; remove unnecessary N==0 check inside elementwise GPU kernel. It can improve the performance of GPU elementwise ops. ~2% performance gain on popular NLP bert model.	2019-11-06 17:15:25 -08:00
George Wu	ba0e7daf20	update dockerfiles/README (#2336 )	2019-11-06 16:54:10 -08:00
baowenlei	0f1e24f4a9	[NupharEP] tensorize int8 GEMM for avx (#2142 ) * finish avx tensorization and save state * split tests for better debug * add missing avx option * update configure for AVX * update tensorize avx support * Merged PR 5327: Fix llvm cross compilation Fix llvm cross compilation Related work items: #4080	2019-11-06 14:35:13 -08:00
KeDengMS	58e6aaa414	Fix crash in releasing TLS from CUDA EP dtor (#2329 ) thread_local/global/static destruction order depends on implementation details of compilers and OS. The bug happens when thread_local is already out of scope while static EP being destructed, thus causing access violation in EP's destructor when accessing thread_local. The fix is to maintain ownership inside EP with a mapping from tid to ThreadLocalContext, to avoid accessing thread_local in EP's destructor. This way, no matter what the destruction order is, no access violation would be triggered.	2019-11-06 13:00:17 -08:00
Yulong Wang	c0b8926863	implement CPU contrib OP EmbedLayerNormalization (#2332 )	2019-11-06 12:27:08 -08:00
George Wu	06a6d74a67	update ngraph dockerfile. add python lib location to LD_LIBRARY_PATH for cuda/tensorrt Dockerfiles. (#2330 )	2019-11-06 11:29:55 -08:00
Vinitra Swamy	ace19129b9	MCR Docker Images v1.0.0 refresh (#2302 ) * update dockerfile table with new MCR tags * add new openvino dockerfiles to table	2019-11-05 22:06:47 -08:00
Patrick Foley	151075790d	[OpenVINO-EP] Update to latest version: OpenVINO 2019 R3.1 (#2308 ) * Updates OpenVINO EP to latest version: 2019 R3.1 * Reviews fixed * Update Dockerfile.openvino * Addressed PR comments and disabled model tests temporarily * Update Dockerfile.ubuntu_openvino	2019-11-05 19:55:46 -08:00
Dwayne Robinson	db454beacf	TensorDesc::Placement test failure - cherry pick Vibranium fix. (#2328 )	2019-11-05 18:18:31 -08:00
Scott McKay	67ec626d88	Copy blocks in Slice when possible (#2312 ) * Add logic to try and flatten inner dimensions being copied by Slice and do a block copy if they can be. Do a block copy for just the inner most dimension where possible (applies even if we don't flatten inner dimensions).	2019-11-06 10:53:30 +10:00
Changming Sun	104f3b2a59	Exclude candy from CUDA tests	2019-11-05 15:22:09 -08:00
Changming Sun	143ae98a37	Fix a bug in onnxruntime_pybind_state.cc when TENSORRT is enabled (#2326 )	2019-11-05 15:04:50 -08:00
George	8a102c6e99	apply eigen patch only for ACL.	2019-11-05 13:53:53 -08:00
Changming Sun	5ce4d4fc49	Fix a test failure when it runs on FreeBSD	2019-11-04 23:47:37 -08:00
Yufeng Li	035913d42f	Support int32_t for Reduction (#2317 )	2019-11-04 20:52:01 -08:00
manashgoswami	d5c36bfff2	Updated links in docs (#2303 ) * Update README.md * Update README.md * Update README.md	2019-11-03 09:10:56 -08:00
Faith Xu	556bae17a5	Fix versions table (#2309 ) * Update table values * Fix onnxml opset version	2019-11-03 08:58:21 -08:00
Yulong Wang	cba93f7c8d	fix Gelu CPU: remove MayInplace() declaration (#2306 )	2019-11-01 18:10:05 -07:00
Yulong Wang	204a6872d3	remove unused param 'input_count' in ConcatImpl (#2304 )	2019-11-01 15:50:11 -07:00
Tianlei Wu	a6b2c9fc09	Fix mask in EmbedLayerNormalization (#2300 )	2019-11-01 13:49:55 -07:00

1583 commits