onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-27 03:11:28 +00:00

Author	SHA1	Message	Date
Hariharan Seshadri	a1b5bfc4f8	Fix SDL warning (#6390)	2021-01-20 08:35:42 -08:00
Hariharan Seshadri	d7bdd96425	Refine auto_pad based pad computation in ConvTranspose (#6305)	2021-01-19 19:01:49 -08:00
Ye Wang	ac36596fb8	fix convert_common version retrival (#6382)	2021-01-19 13:56:34 -08:00
Tianlei Wu	baac7c91e2	Support MLFloat16 in CumSum Cuda op for Opset 14 (#6355) * Add CumSum-14 for Cuda	2021-01-19 09:55:44 -08:00
wezuo	5b6753ce27	Wezuo/memory analysis (#5658) * merged alloc_plan * pass compilation * Start running, incorrect allocation memory info * add in comments * fix a bug of recording pattern too early. * debugging lifetime * fix lifetime * passed mnist * in process of visualization * Add code to generate chrome trace for allocations. * in process of collecting fragmentation * before rebuild * passed mnist * passed bert tiny * fix the inplace reuse * fix the exception of weight in pinned memory * add guards to ensure the tensor is in AllocPlan * add customized profiling * debugging * debugging * fix the reuse of differnt location type * add rank * add the rank * add fragmentation * add time_step_trace * Add summary for each execution step (total bytes, used/free bytes). * add top k * change type of top k parameter * remove prints * change heap to set{ * add the name pattern * add the useage for pattern * add partition * change to static class * add custom group * remove const * update memory_info * in process of adding it as runtime config * change the memory profiling to be an argument * add some comments * add checks to recored meomry_info in traaining session * set the "local rank setting" to correct argument. * addressing comments * format adjustment * formatting * remove alloc_interval * update memory_info.cc to skip session when there is no tensor for a particular memory type * fix memory_info multiple iteration seg-fault * consolidate mainz changes * fixed some minor errors * guard by ORT_MINIMAL_BUILD * add ORT_MEMORY_PROFILE flag * added compiler flag to turn on/off memory profiling related code * clean up the code regarding comments * add comments * revoke the onnx version * clean up the code to match master * clean up the code to match master * clean up the code to match master Co-authored-by: Jesse Benson <benson.jesse@gmail.com> Co-authored-by: Wei Zuo <wezuo@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: wezuo <wezuo@az-eus-v100-32gb-5-worker-mgtbby.eastus.cloudapp.azure.com> Co-authored-by: wezuo <wezuo@az-eus-v100-32gb-5-worker-yclzsf.eastus.cloudapp.azure.com>	2021-01-19 08:30:55 -08:00
Ryan Lai	4db4982a5e	This added telemetry isn't needed (#6363)	2021-01-15 16:36:59 -08:00
stevenlix	eab164e1a5	Add python example of TensorRT INT8 inference on ResNet model (#6255) * add trt int8 example on resnet model * Update e2e_tensorrt_resnet_example.py * remove keras dependency and update class names * move ImageNetDataReader and ImageClassificationEvaluator to tensorrt resnet example * simplify e2e_tensorrt_resnet_example.py * Update preprocessing.py * merge tensorrt_calibrate * Update calibrate.py * Update calibrate.py * generalize calibrate * Update calibrate.py * fix issues * fix formating * remove augment_all	2021-01-15 09:59:56 -08:00
Ashwini Khade	f5a4f7fc2a	fix -Wdangling-gsl (#6357)	2021-01-15 09:30:41 -08:00
Pranav Sharma	c8e37e3a36	Fix one more SDL warning (#6359)	2021-01-15 09:22:41 -08:00
Ryan Lai	961bb62ae4	Add create session to WinML telemetry to track WinML Usage (#6356)	2021-01-14 22:42:55 -08:00
Wei-Sheng Chin	8ce252caa9	Pipeline Parallel Experimental Python API (#5815)	2021-01-15 12:07:28 +08:00
Dmitri Smirnov	6d0fb3ebb3	Java: Set C language warnings to W4 and adjust JNI code (#6347) Set /W3 for C language and fix up JNI warnings.	2021-01-14 15:04:47 -08:00
Scott McKay	e54e2f969d	Use readelf for minimal build binary size checks. (#6338) * Use readelf for minimal build binary size checks. The on-disk size grows in 4KB chunks which makes it hard to see how much growth an individual checkin causes. Only downside is that the sum of the sections is larger than the on-disk size (assumably things get packed smaller on disk and some of the section alignment constraints can be ignored) * Remove unused function	2021-01-15 07:46:02 +10:00
Ye Wang	5d9552cc8b	fix longformer benchmark io_binding output_buffers (#6345) * fix longformer benchmark io_binding output_buffers * format * import benchmark_helper from parent directory.	2021-01-14 11:29:31 -08:00
Changming Sun	ea6789b754	Add PREfast to python packaging pipeline (#6343) * Add PREfast to python packaging pipeline	2021-01-14 10:39:24 -08:00
ashbhandare	fd21c84eb8	Enable graph save for orttrainer (#6333) * Enable graph save for orttrainer * Fix CI * Update orttraining/orttraining/python/training/orttrainer_options.py * Update orttraining/orttraining/python/training/orttrainer_options.py * Update orttraining/orttraining/python/training/orttrainer_options.py * Update orttraining/orttraining/python/training/orttrainer_options.py * Update orttraining/orttraining/python/training/orttrainer_options.py Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>	2021-01-14 10:07:54 -08:00
Yufeng Li	c24f2950bf	update quantize to support basic optimization and e2e example for image classification (#6313) update the resnet50-v1 to standard one from onnx zoo. add an example for mobilenet run basic optimization before quantization fix a bug in Clip	2021-01-14 09:27:10 -08:00
Pranav Sharma	5b9d993a2e	Fix DerefNullPtr issues raised by SDLNativeRules. (#6348)	2021-01-14 08:36:07 -08:00
Vincent Wang	4df356d1c9	Train BERT Using BFloat16 on A100 (#6090) * traing bert using bf16 * Adam support bf16 * bugfix * add fusedmatmul support * fix after merge from master. * bugfix * bugfix after merge from master * fast reduction for bf16. * resolve comments * fix win build * bugfix * change header file. Co-authored-by: Vincent Wang <weicwang@microsoft.com>	2021-01-14 19:04:32 +08:00
Guoyu Wang	e35db194e3	fix the pipeline failure (#6346)	2021-01-14 00:33:22 -08:00
Edward Chen	042053c55e	Add support for running Android emulator from build.py on Windows. (#6317)	2021-01-13 19:21:49 -08:00
Guoyu Wang	b220feee2f	[NNAPI] Add pow support (#6310)	2021-01-13 17:15:05 -08:00
Tracy Sharpe	fcd9fc9b6d	remove gemmlowp submodule (#6341)	2021-01-13 15:54:37 -08:00
Scott McKay	cfd6f10098	Remove OpSchema dummy definition. Only needed for Function now, and we can just exclude the method in Function (#6321)	2021-01-14 09:39:31 +10:00
Tixxx	d367941cc4	changed wording. (#6337)	2021-01-13 15:12:04 -08:00
Ashwini Khade	f7034b9bca	add external data support to tensor proto utils (#6257) * update unpack tensor utilities to support loading external data * more updates * fix test * fix nuphar build * minor build fix * add tests * fix Android CI * fix warning * fix DML build failure and some warnings * more updates * more updates * plus few updates * plus some refactoring * changes per review * plus some change * remove temp code * plus updates to safeint usage * build fix * fix for safeint	2021-01-13 14:14:18 -08:00
Suffian Khan	62e404591a	Enable add + softmax fusion for Rocm platform (#6259) * add bias softmax; tests appear to pass * check fusion occurs for rocm as well * check for rocm provider compatible as well * build for cpu scenario as well * try again; broader cope * proper scope on kGpuExecutionProvider * been editing wrong file * remove commented #include lines * try again due to mac os ci error * try again * test fusion both cuda and rocm to avoid mac ci error	2021-01-13 14:09:09 -08:00
Olivia Jain	56ab2166e8	Delete float16.py (#6336) No longer needed. Also doesn't pass policheck.	2021-01-13 13:41:06 -08:00
Tracy Sharpe	87ec1f6208	MLAS: add fallback implementation for quantized GEMM (#6335) Add a non-vectorized version of the kernel used for the quantized version of MlasGemm.	2021-01-13 10:53:47 -08:00
Alberto Magni	5623cc6d17	Use onnxruntime_USE_FULL_PROTOBUF=OFF for the cuda execution provider (#6340) This removes a special case of the cuda EP.	2021-01-13 18:27:13 +00:00
liqunfu	aeca96caba	Liqun/enable pipeline parallel test (#6331) enable pipeline parallel test Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2021-01-13 10:24:04 -08:00
Zhang Lei	f77ff1bc3d	Quantization support for split operator with its NHWC support (#6107) * Make split working for quantization. * NHWC transformer support for split operator * Refactor some according to Feedback. Will add test cases soon. * Fix build error on windows. * Add test case for split op on uint8_t support * Add nhwc_transformer_test for split uint8_t support * Some change according to PR feedbacks.	2021-01-13 10:05:34 -08:00
Dmitri Smirnov	6b73bae035	Java: add Semmle to Java publishing pipelines (#6326) Add Semmle to Java API pipeline Add security results publishing and add Java GPU.	2021-01-12 15:12:13 -08:00
Tim Harris	aacc8dbfa3	Remove false positive prefast warning from threadpool (#6324)	2021-01-12 14:47:52 -08:00
Ashwini Khade	0ed56d491a	fix opset imports for function body (#6287) * fix function opsets * add tests and update onnx * changes per review comments * add comments * plus updates * build fix	2021-01-12 13:44:36 -08:00
Tim Harris	b491d7c179	Avoid false sharing on thread pool data structures (#6298) Description: This change adds alignment and padding to avoid false sharing on fields in the thread pool. It also adds a new microbenchmark to profile thread-pool performance over short loops. Motivation and Context: MobileNet on a 2×12-core system showed a performance gap between the ORT thread pool and OpenMP. One cause appeared to be false sharing on fields in the thread pool: ThreadPoolParallelSection::tasks_finished (which the main thread spins on waiting for workers to complete a loop), and the RunQueue::front_ and back_ fields (used respectively by the worker thread and the main thread). The additional micro-benchmark BM_ThreadPoolSimpleParallelFor tests performance of loops of different sizes at different thread counts. The results below are on a machine with 2×14-core processors (E5-2690 v4) running with 1, 14, 15, and 28 threads. For each test, the microbenchmark has N threads run a loop with N iterations; hence a perfect result is for the time taken to be constant as additional threads are added (although we will also see power management effects helping at very low thread counts). The loop durations (100000, 10000, 1000) correspond roughly to 200us, 20us, and 2us on this machine.	2021-01-12 19:58:41 +00:00
    Before change:
    Benchmark                                                  Time       CPU  Iterations
    BM_ThreadPoolSimpleParallelFor/1/1/100000/real_time    17153 us  17154 us          32
    BM_ThreadPoolSimpleParallelFor/14/14/100000/real_time  22553 us  22553 us          30
    BM_ThreadPoolSimpleParallelFor/15/15/100000/real_time  21521 us  21521 us          29
    BM_ThreadPoolSimpleParallelFor/28/28/100000/real_time  24111 us  24111 us          24
    BM_ThreadPoolSimpleParallelFor/1/1/10000/real_time      1719 us   1719 us         407
    BM_ThreadPoolSimpleParallelFor/14/14/10000/real_time    3409 us   3409 us         200
    BM_ThreadPoolSimpleParallelFor/15/15/10000/real_time    3541 us   3541 us         201
    BM_ThreadPoolSimpleParallelFor/28/28/10000/real_time    4576 us   4576 us         151
    BM_ThreadPoolSimpleParallelFor/1/1/1000/real_time        174 us    174 us        4017
    BM_ThreadPoolSimpleParallelFor/14/14/1000/real_time     1586 us   1586 us         402
    BM_ThreadPoolSimpleParallelFor/15/15/1000/real_time     1586 us   1586 us         397
    BM_ThreadPoolSimpleParallelFor/28/28/1000/real_time     2864 us   2864 us         232
    After change:
    Benchmark                                                  Time       CPU  Iterations
    BM_ThreadPoolSimpleParallelFor/1/1/100000/real_time    17160 us  17160 us          33
    BM_ThreadPoolSimpleParallelFor/14/14/100000/real_time  20989 us  20989 us          31
    BM_ThreadPoolSimpleParallelFor/15/15/100000/real_time  22286 us  22286 us          31
    BM_ThreadPoolSimpleParallelFor/28/28/100000/real_time  24631 us  24631 us          25
    BM_ThreadPoolSimpleParallelFor/1/1/10000/real_time      1718 us   1718 us         407
    BM_ThreadPoolSimpleParallelFor/14/14/10000/real_time    2868 us   2868 us         242
    BM_ThreadPoolSimpleParallelFor/15/15/10000/real_time    2907 us   2907 us         240
    BM_ThreadPoolSimpleParallelFor/28/28/10000/real_time    3872 us   3872 us         186
    BM_ThreadPoolSimpleParallelFor/1/1/1000/real_time        175 us    175 us        3938
    BM_ThreadPoolSimpleParallelFor/14/14/1000/real_time      933 us    933 us         659
    BM_ThreadPoolSimpleParallelFor/15/15/1000/real_time      912 us    912 us         591
    BM_ThreadPoolSimpleParallelFor/28/28/1000/real_time     1976 us   1976 us         317
Tianlei Wu	ec81e29c84	Add longformer to python package (#6314) * add longformer to python package * move test related script and data to a new folder	2021-01-12 10:38:39 -08:00
Zhang Lei	a8257666bd	Support 1D input for Conv + Mul/Add fusion optimizer with test (#6295) * Support 1D input (N C H) for Conv + Mul/Add fusion optimizer with test cases and test models.	2021-01-12 09:53:13 -08:00
Luyao Ren	3b3e698674	Remove abs in LpPool (#6303)	2021-01-12 01:39:13 -08:00
Tianlei Wu	a038924bee	update transformers required package versions (#6315)	2021-01-12 00:10:56 -08:00
Changming Sun	c43ca45c4f	Force reinstall onnx python package on Windows (#6309)	2021-01-11 22:12:56 -08:00
Vincent Wang	ac5b5e5d1e	more dtype for Equal CUDA kernel (#6288) Co-authored-by: Vincent Wang <weicwang@microsoft.com>	2021-01-12 10:46:21 +08:00
Tianlei Wu	938e65d878	add --sequence_lengths option (#6285)	2021-01-11 14:26:22 -08:00
Chun-Wei Chen	84024bdfa9	Enable ONNX backend test of SequenceProto input/output (#6043) * assert sequence tensor and remove skips * update testdata json * use ONNX 1.8 in cgmanifest.json * use previous commit to workaround * update ONNX commit ID in docker * skip test_maxpool_2d_dilations test for now * update function name	2021-01-11 11:30:33 -08:00
Changming Sun	5084ce0969	Update nuget build (#6297) 1. Update the ProtoSrc path. The old one is not used anymore. 2. Regenerate OnnxMl.cs 3. Delete some unused code in tools/ci_build/build.py 4. Avoid set intra_op_param.thread_pool_size in ModelTests in OpenMP build. 5. Fix a typo in the C API pipeline.	2021-01-11 10:49:05 -08:00
Jesse Benson	fa851bff66	Add workaround to remove ROCm-specific binary-elementwise files.	2021-01-11 10:00:18 -08:00
Jesse Benson	1059bfaf75	Workaround for static_cast<double>(half)	2021-01-11 10:00:18 -08:00
Ye Wang	da952a9a20	A list of changes in transformers tool (#6224) * longformer fp16 e2e * add fp16/fp32 parity check helper file * excludes nodes with subgraph in profiling * use onnxconverter_common to do fp32->fp16 * add version check for onnxconverter_common * remove helper file * add pkg installation on notebooks and script	2021-01-08 11:11:14 -08:00
Tianlei Wu	ac5ca2bbe0	fix data_ptr assertion error for past_sequence_length=0 in GPT-2 (#6284) fix io binding crash for past_sequence_length=0	2021-01-07 23:43:50 -08:00
Hariharan Seshadri	7fc827a8a1	Fix Min/Max CPU kernels for float16 type (#6205)	2021-01-07 23:32:52 -08:00


4058 commits