onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-03 03:58:54 +00:00

Author	SHA1	Message	Date
Zhang Lei	f77ff1bc3d	Quantization support for split operator with its NHWC support (#6107 ) * Make split working for quantization. * NHWC transformer support for split operator * Refactor some according to Feedback. Will add test cases soon. * Fix build error on windows. * Add test case for split op on uint8_t support * Add nhwc_transformer_test for split uint8_t support * Some change according to PR feedbacks.	2021-01-13 10:05:34 -08:00
Dmitri Smirnov	6b73bae035	Java: add Semmle to Java publishing pipelines (#6326 ) Add Semmle to Java API pipeline Add security results publishing and add Java GPU.	2021-01-12 15:12:13 -08:00
Tim Harris	aacc8dbfa3	Remove false positive prefast warning from threadpool (#6324 )	2021-01-12 14:47:52 -08:00
Ashwini Khade	0ed56d491a	fix opset imports for function body (#6287 ) * fix function opsets * add tests and update onnx * changes per review comments * add comments * plus updates * build fix	2021-01-12 13:44:36 -08:00
Tim Harris	b491d7c179	Avoid false sharing on thread pool data structures (#6298 ) Description: This change adds alignment and padding to avoid false sharing on fields in the thread pool. It also adds a new microbenchmark to profile thread-pool performance over short loops. Motivation and Context: MobileNet on a 2*12-core system showed a performance gap between the ORT thread pool and OpenMP. One cause appeared to be false sharing on fields in the thread pool: ThreadPoolParallelSection::tasks_finished (which the main thread spins on waiting for workers to complete a loop), and the RunQueue::front_ and back_ fields (used respectively by the worker thread and the main thread). The additional micro-benchmark BM_ThreadPoolSimpleParallelFor tests performance of loops of different sizes at different thread counts. The results below are on a machine with 2*14-core processors (E5-2690 v4) running with 1, 14, 15, and 28 threads. For each test, the microbenchmark has N threads run a loop with N iterations; hence a perfect result is for the time taken to be constant as additional threads are added (although we will also see power management effects helping at very low thread counts). The loop durations (100000, 10000, 1000) correspond roughly to 200us, 20us, and 2us on this machine.
Before change:
BM_ThreadPoolSimpleParallelFor/1/1/100000/real_time 17153 us 17154 us 32
BM_ThreadPoolSimpleParallelFor/14/14/100000/real_time 22553 us 22553 us 30
BM_ThreadPoolSimpleParallelFor/15/15/100000/real_time 21521 us 21521 us 29
BM_ThreadPoolSimpleParallelFor/28/28/100000/real_time 24111 us 24111 us 24
BM_ThreadPoolSimpleParallelFor/1/1/10000/real_time 1719 us 1719 us 407
BM_ThreadPoolSimpleParallelFor/14/14/10000/real_time 3409 us 3409 us 200
BM_ThreadPoolSimpleParallelFor/15/15/10000/real_time 3541 us 3541 us 201
BM_ThreadPoolSimpleParallelFor/28/28/10000/real_time 4576 us 4576 us 151
BM_ThreadPoolSimpleParallelFor/1/1/1000/real_time 174 us 174 us 4017
BM_ThreadPoolSimpleParallelFor/14/14/1000/real_time 1586 us 1586 us 402
BM_ThreadPoolSimpleParallelFor/15/15/1000/real_time 1586 us 1586 us 397
BM_ThreadPoolSimpleParallelFor/28/28/1000/real_time 2864 us 2864 us 232
After change:
BM_ThreadPoolSimpleParallelFor/1/1/100000/real_time 17160 us 17160 us 33
BM_ThreadPoolSimpleParallelFor/14/14/100000/real_time 20989 us 20989 us 31
BM_ThreadPoolSimpleParallelFor/15/15/100000/real_time 22286 us 22286 us 31
BM_ThreadPoolSimpleParallelFor/28/28/100000/real_time 24631 us 24631 us 25
BM_ThreadPoolSimpleParallelFor/1/1/10000/real_time 1718 us 1718 us 407
BM_ThreadPoolSimpleParallelFor/14/14/10000/real_time 2868 us 2868 us 242
BM_ThreadPoolSimpleParallelFor/15/15/10000/real_time 2907 us 2907 us 240
BM_ThreadPoolSimpleParallelFor/28/28/10000/real_time 3872 us 3872 us 186
BM_ThreadPoolSimpleParallelFor/1/1/1000/real_time 175 us 175 us 3938
BM_ThreadPoolSimpleParallelFor/14/14/1000/real_time 933 us 933 us 659
BM_ThreadPoolSimpleParallelFor/15/15/1000/real_time 912 us 912 us 591
BM_ThreadPoolSimpleParallelFor/28/28/1000/real_time 1976 us 1976 us 317	2021-01-12 19:58:41 +00:00
Tianlei Wu	ec81e29c84	Add longformer to python package (#6314 ) * add longformer to python package * move test related script and data to a new folder	2021-01-12 10:38:39 -08:00
Zhang Lei	a8257666bd	Support 1D input for Conv + Mul/Add fusion optimizer with test (#6295 ) * Support 1D input (N C H) for Conv + Mul/Add fusion optimizer with test cases and test models.	2021-01-12 09:53:13 -08:00
Luyao Ren	3b3e698674	Remove abs in LpPool (#6303 )	2021-01-12 01:39:13 -08:00
Tianlei Wu	a038924bee	update transformers required package versions (#6315 )	2021-01-12 00:10:56 -08:00
Changming Sun	c43ca45c4f	Force reinstall onnx python package on Windows (#6309 )	2021-01-11 22:12:56 -08:00
Vincent Wang	ac5b5e5d1e	more dtype for Equal CUDA kernel (#6288 ) Co-authored-by: Vincent Wang <weicwang@microsoft.com>	2021-01-12 10:46:21 +08:00
Tianlei Wu	938e65d878	add --sequence_lengths option (#6285 )	2021-01-11 14:26:22 -08:00
Chun-Wei Chen	84024bdfa9	Enable ONNX backend test of SequenceProto input/output (#6043 ) * assert sequence tensor and remove skips * update testdata json * use ONNX 1.8 in cgmanifest.json * use previous commit to workaround * update ONNX commit ID in docker * skip test_maxpool_2d_dilations test for now * update function name	2021-01-11 11:30:33 -08:00
Changming Sun	5084ce0969	Update nuget build (#6297 ) 1. Update the ProtoSrc path. The old one is not used anymore. 2. Regenerate OnnxMl.cs 3. Delete some unused code in tools/ci_build/build.py 4. Avoid setting intra_op_param.thread_pool_size in ModelTests in OpenMP build. 5. Fix a typo in the C API pipeline.	2021-01-11 10:49:05 -08:00
Jesse Benson	fa851bff66	Add workaround to remove ROCm-specific binary-elementwise files.	2021-01-11 10:00:18 -08:00
Jesse Benson	1059bfaf75	Workaround for static_cast<double>(half)	2021-01-11 10:00:18 -08:00
Ye Wang	da952a9a20	A list of changes in transformers tool (#6224 ) * longformer fp16 e2e * add fp16/fp32 parity check helper file * excludes nodes with subgraph in profiling * use onnxconverter_common to do fp32->fp16 * add version check for onnxconverter_common * remove helper file * add pkg installation on notebooks and script	2021-01-08 11:11:14 -08:00
Tianlei Wu	ac5ca2bbe0	fix data_ptr assertion error for past_sequence_length=0 in GPT-2 (#6284 ) fix io binding crash for past_sequence_length=0	2021-01-07 23:43:50 -08:00
Hariharan Seshadri	7fc827a8a1	Fix Min/Max CPU kernels for float16 type (#6205 )	2021-01-07 23:32:52 -08:00
Ye Wang	a72fcbd5fc	Add helper to compare model with different precision (#6270 ) * add parity_check_helper.py * add real example * remove lines	2021-01-07 16:54:56 -08:00
Edward Chen	04287ec770	Increase timeout for Linux GPU CUDA11 build. (#6280 )	2021-01-07 15:44:42 -08:00
Edward Chen	c10948699b	Rename MakeString and ParseString functions. (#6272 ) Rename MakeString to MakeStringWithClassicLocale, MakeStringLite to MakeString, ParseString to ParseStringWithClassicLocale. Add missing pass-through versions of MakeStringWithClassicLocale for string types.	2021-01-07 15:43:42 -08:00
Tianlei Wu	b80e8ce6a5	rename past to past_key_values for GPT-2 (#6269 ) rename past to past_key_values for transformers 4.*	2021-01-07 11:12:04 -08:00
Xavier Dupré	481a2cdf61	Add script to preprocess python documentation before publishing (#6129 ) * add script to preprocess python documentation before publishing	2021-01-07 19:23:59 +01:00
Edward Chen	d761571afc	Deprecate Python global configuration functions [Part 2] (#6171 ) Update Python API to allow more flexibility for setting providers and provider options. The providers argument (InferenceSession/TrainingSession constructors, InferenceSession.set_providers()) now also accepts a tuple of (name, options dict). Fix get_available_providers() API (and the corresponding function in the C API) to return the providers in default priority order. Now it can be used as a starting point for the providers argument and maintain the default priority order. Convert some usages of the deprecated global configuration functions to use EP-specific options instead. Update some EP-specific option parsing to fail on unknown options. Other clean up.	2021-01-07 10:10:55 -08:00
Hariharan Seshadri	bbc9ed908a	Fix VS 2017 build break (#6276 )	2021-01-07 02:09:35 -08:00
Tang, Cheng	431604ef89	add bfloat16 to gathergrad type constraints (#6267 ) Co-authored-by: Cheng Tang <chenta@microsoft.com>	2021-01-06 15:04:14 -08:00
Hariharan Seshadri	2347de4a9e	Fix Linux/Mac error message on input type mismatch (#6256 )	2021-01-05 22:21:24 -08:00
Hariharan Seshadri	d42399e1b0	Allow querying a GraphProto's doc_string as part of ModelMetadata (#6248 )	2021-01-05 22:18:03 -08:00
pengwa	eea3806db1	model parallel refinement (#6244 ) * Megatron Transformation as a separate step * remove useless header * clang formatting * Re-Structure megatron transformer for subsequent changes * fix comments	2021-01-06 10:30:22 +08:00
liqunfu	addb4b8c2b	Liqun/speech model loop to scan (#6070 ) Provide a tool to convert Loop to Scan for Nuphar performance Fix Nuphar CI pipeline failures. Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2021-01-05 15:15:23 -08:00
Edward Chen	ce6161cf67	Add MakeStringLite which uses current locale, update some MakeString call sites to use it instead. (#6252 ) * Add MakeStringLite which uses current locale, update macros to use that to generate messages. * Convert calls to MakeStringLite().	2021-01-04 19:27:24 -08:00
ashbhandare	493bf931c5	Add the Concat Slice Elimination transform, fix constant_folding transform (#5457 ) * Add concat slice transform + test * Cosmetic improvements in concat slice transform * Remove unrelated file, fix comment, fix constant folding bug * Add test onnx graph * fix windows build * Review comments * review comment Co-authored-by: Aishwarya <aibhanda@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2021-01-04 16:18:33 -08:00
Changming Sun	6fd9d34bb0	Remove a debug log in provider_test_utils.cc (#6200 )	2021-01-04 13:58:11 -08:00
baijumeswani	93bf7c4d52	Documentation for distributed CI tests pipeline (#6140 )	2021-01-04 10:09:39 -08:00
Olivia Jain	c8de3f355a	Refactor EP Perf Tool (#6202 ) * merge master, keep postprocess status commit * download float16.py every time * using variables to reference eps * adding ACL EP to ep perf tool * accuracy with absolute tolerance configurable * add acl to dict + remove commented line	2021-01-04 08:50:41 -08:00
Suffian Khan	46e0e4e69f	Tune BiasGeluGradDx kernel in approximation mode to avoid tanh(...) on Rocm (#6239 ) * bias gelu grad use exp(...) instead * update cuda to rocm * missing semicolon * comment * remove dockerfile * missing factor of two	2021-01-02 08:54:16 -08:00
Hector Li	ffb4b62826	Fix allocator issue for TensorRT IOBinding (#6240 ) * Fix issue: https://github.com/microsoft/onnxruntime/issues/6094 Root cause: we didn't expose the OrtMemoryInfo for TRT, which causes problems when a user wants to use IOBinding with TensorRT. Short-term fix: add the OrtMemoryInfo for TRT. Long term, the allocators for CUDA and TRT should be unified.	2020-12-31 20:15:43 -08:00
Changming Sun	1685167e46	Update manylinux docker image to the latest (#6242 )	2020-12-31 19:57:04 -08:00
Changming Sun	d5cb17c679	Update BUILD.md	2020-12-31 17:20:00 -08:00
Xavier Dupré	cd14c1af29	Support double for operator ArgMin (#6222 ) * Support double for operator ArgMin * add test specifically for double * add new test on pai-excluded-tests.txt	2020-12-31 11:25:46 +01:00
Xavier Dupré	84addcd2cf	Support double for operator ReduceMean, ReduceLogSumExp (#6217 ) * Support double for operators ReduceMean, ReduceLogSumExp	2020-12-31 11:24:54 +01:00
Xavier Dupré	5968a91ea6	Support double for operator Gemm + fix bug in gemm implementation for cuda, rocm when sizeof(type) != sizeof(float) (#6223 ) * Support double for operator Gemm * fix type size while copying data in gemm operator for GPU * fix type in gemm implementation for rocm	2020-12-31 11:24:16 +01:00
Xavier Dupré	70e2f96ef4	Support double for operator TopK + fix one bug in TopK implementation for GPU for double (#6220 ) * Support double for operator TopK * add static classes for topk/double * fix cast issue in topk	2020-12-31 11:23:19 +01:00
Tracy Sharpe	ecb2e119e4	MLAS: handle MlasGemm(M/N/K==0) cases (#6238 )	2020-12-30 23:25:10 -08:00
Hariharan Seshadri	4cc2ffef21	Support MLFloat16 type in Pow opset-12 CUDA kernel (#6233 )	2020-12-30 20:41:59 -08:00
William Tambellini	39a988ce1c	Upgrade build.py to assert for python 3.6+, as python 3.5 can no longer build today's master.	2020-12-30 20:17:09 -08:00
Changming Sun	c15a858745	Update the readme file	2020-12-30 20:16:45 -08:00
Changming Sun	3911105f09	Remove python 3.5	2020-12-30 20:16:45 -08:00
Changming Sun	1b23b28706	Remove MKLML/openblas/jemalloc build config (#6212 )	2020-12-30 17:18:19 -08:00

4027 commits