onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-11 00:49:31 +00:00

Author	SHA1	Message	Date
Zhang Lei	9992f0f812	Implement QLinear GlobalAveragePool with sse2/neon. (#5838 ) Add QLinear Global Average Pool for quantization for ARM and SSE2. Co-authored-by: Tracy Sharpe <tracysh@microsoft.com>	2020-11-23 19:23:58 -08:00
sfatimar	916410151c	Fix for hetero multi python binding with new shared library (#5895 ) Co-authored-by: sfatimar <sahar.fatima@intel/com>	2020-11-23 15:41:10 -08:00
Ye Wang	3d5b48a894	remove use_cdn when loading pretrained model (#5900 )	2020-11-23 14:26:55 -08:00
Hariharan Seshadri	d46dbeafd3	Expose knobs to create and share (CPU) allocators across sessions in C# and Python (#5634 )	2020-11-21 14:12:33 -08:00
Ryan Hill	ba739a8000	Convert OpenVINO into a shared provider (#5778 ) Same as Dnnl and TensorRT before it, now with more methods and more cleanup.	2020-11-20 17:39:57 -08:00
Olivia Jain	3738ca7e10	Improve perf testing (#5760 ) * build off a specific commit and archive wheel file * rename to fp32, prefix results w/ commit, add CPU col * rename 99th to 90 percentile * get symbolic_shape from master each time * add install archive wheel, parallel build * shortening hash	2020-11-20 16:03:09 -08:00
Scott McKay	f0142da59c	Add NNAPI to providers that can be used via the python bindings. (#5867 ) Update ORT model conversion script - add args for specifying optimization level and whether to use NNAPI - add logic to create a list of required ops and ORT format model that can be used with NNAPI	2020-11-21 09:18:35 +10:00
Takeshi Watanabe	a622533ecc	Support profile_file_prefix in python binding (#5864 )	2020-11-20 14:28:50 -08:00
S. Manohar Karlapalem	ff58f621fa	Remove nGraph Execution Provider (#5858 ) * Remove nGraph Execution Provider Pursuant to nGraph deprecation notice: https://github.com/microsoft/onnxruntime/blob/master/docs/execution_providers/nGraph-ExecutionProvider.md#deprecation-notice Deprecation Notice: Deprecation Begins June 1, 2020; Removal Date December 1, 2020. Starting with the OpenVINO™ toolkit 2020.2 release, all of the features previously available through nGraph have been merged into the OpenVINO™ toolkit. As a result, all the features previously available through ONNX RT Execution Provider for nGraph have been merged with ONNX RT Execution Provider for OpenVINO™ toolkit. Therefore, ONNX RT Execution Provider for nGraph will be deprecated starting June 1, 2020 and will be completely removed on December 1, 2020. Users are recommended to migrate to the ONNX RT Execution Provider for OpenVINO™ toolkit as the unified solution for all AI inferencing on Intel® hardware. * Remove nGraph Licence info from ThirdPartyNotices.txt * Use simple Test.Run() for tests without EP exclusions To be consistent with rest of test code. * Remove nGraph EP functions from Java code	2020-11-19 16:47:55 -08:00
Hariharan Seshadri	62508ef0e4	Revert "Remove MKLML build config (#5559 )" (#5855 )	2020-11-19 10:53:08 -08:00
Yufeng Li	6f86c4dbe3	Quantize LSTM (#5595 ) Quantize LSTM: 1. dynamically quantizes MatMul inside the LSTM. It doesn't quantize activation function. 2. support per-channel on the input weight and recurrent weight.	2020-11-18 11:21:49 -08:00
Peichen Xie	e8c0f5d0ff	Update the quantization script to support GEMM (transB==1) (#5432 ) * Modify onnx_quantizer.py * Fix topology order issues * Handle more cases	2020-11-17 21:24:48 -08:00
Scott McKay	7b76b57fc8	Support EPs that compile nodes in a minimal build. (#5776 ) * Support EPs that compile nodes in a minimal build. This enables NNAPI being used.	2020-11-17 13:52:22 +10:00
Maajid khan	a84a058f9e	[OpenVINO-EP] Enabling Multi Device support (#5740 ) * Enabling Multi Device support for UEP Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Minor fix added *Added a simple fix to determine OpenVINO version for Arm build as well Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>	2020-11-11 15:16:30 -08:00
Chi Lo	92292de135	Tensorrt perf tool (#5436 ) * Add YAML file for pipeline * Modify typo * Add working directory * Modify and test * Modify and test * Modify and test * Modify and test * Modify * Modify * Modify * Modify * Make sure to copy all the result files * Add cleanup * Modify * Modify agent pool name * Upload only specific artifacts * Modify * Integrated CI Pipeline for running TRT perf as well as added the “large amount of models” into perf model target * Fix bug * Fix bug * Add reading the information regarding previously known failing models and then skip testing them during benchmark/validation * Modify the script file for CI * Replace print with logger.info * Fix bug * Fix bug * Refine the code * Modify the script so that it can capture script segmentation fault while running ORT * Fix bug * fix bug * fix bug * Add debug info * fix bug * Refine perf code * Refine the code * fix bug * Code refactoring * change many-models path * remove metadata after validation/benchmark are done * Update README.md * Fix bug so that metadata doesn't hold stale value * Remove hardcode and update README * Add arguments to the script to make it run correctly * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Fix bug so that metadata doesn't hold stale value * Fix small bug of finding test dataset directory for FP16 test data, as well as modification of some output information * use -i random for perf test of TRT changes Co-authored-by: Olivia Jain <oljain@microsoft.com>	2020-11-06 12:27:42 -08:00
Ye Wang	95e6da7957	Revert saving optimized model as external data (#5690 ) * revert and add support for saving external data * review comments * update	2020-11-06 11:54:19 -08:00
Zhang Lei	77b1eea9cf	Add option to allow quantize_input() use input_qtype for initializers. (#5721 )	2020-11-06 09:33:24 -08:00
Yufeng Li	5c4543e194	Calibrate float tensor only (#5704 )	2020-11-04 23:55:48 -08:00
Ye Wang	a028ca41ec	Optimize flaubert (#5651 ) * optimize flaubert * fix an issue and format * revert non-relevant change * review comments	2020-11-03 09:51:42 -08:00
Wei-Sheng Chin	8856c2595b	Sync the two IDs in OrtMemoryInfo when calling ctor (#5663 ) * Sync the two IDs in OrtMemoryInfo when calling ctor * Also fix the same problem for output	2020-11-02 23:22:47 -08:00
Tianlei Wu	2c02530603	Bert Model Profiling Tool (#5654 ) * Add profiler tool for BERT models	2020-11-02 13:47:37 -08:00
Derek Murray	ff538b8d3a	Minor fixes in BERT Inference notebook (#5637 ) Add missing commas to the code example.	2020-11-02 09:49:23 -08:00
Maajid khan	d98062da0c	[OpenVINO-EP] Hetero support (#5627 ) * Implement Hetero in UEP * Added security checks to take valid Hetero combinations as device type * Integrating Hetero features * Get the statistics Report in Debug Mode Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Passing right device type for vadm_backend Added simple fix to pick the right device type when using vadm_backend with Hetero as well. Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fixed batching logic for 2020.4 and above * Fixed flake8 PEP8 errors Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Minor Fixes Added Added security checks for device_type passed in for Hetero build during run time code cleanup Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Minor changes Added Fixed batch_size bug in vadm_backend code cleanup * Documentation updated for Hetero Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> Co-authored-by: suryasidd <surya.siddharth.pemmaraju@intel.com>	2020-10-30 22:35:08 -07:00
KeDengMS	32bf6390ad	Some fixes to symbolic shape inference (#5642 ) * Some fixes to symbolic shape inference 1. Topological sort before iteration in graph 2. Fix a case in slice: start=100000, end=-100000, step=-1, dim=2 3. Fix Nuphar Gemm test's random seed 4. Slice opset 1 axes is optional	2020-10-30 19:28:47 -07:00
Weixing Zhang	aec4cb489e	ROCm EP for AMD GPU (#5480 ) The ROCm EP is designed and implemented based on AMD GPU software stack named ROCm. Here is the link for the details about ROCm: https://rocmdocs.amd.com/en/latest/ ROCm EP was created based on the following things: 1. AMD GPU programming language: HIP 2. AMD GPU HIP language runtime: amdhip64 3. BLAS: rocBLAS, hipBLAS 4. DNN: MIOpen 5. Collective Communication library: RCCL 6. cub: hipCub 7. … Current status: BERT-L and GPT2 training can be run on AMD GPU with data parallel. Next: 1. Make more GPU code sharable between ROCm EP and CUDA EP since HIP language and HIP runtime API are very close to CUDA. 2. Continue improving the implementation. 3. Continue GPU kernel optimization. 4. Support model parallelism on ROCm EP. …… The rocm kernels have been removed from this commit and will be in a separate PR. Since the original PR was too big (~180 files), it was suggested to split the PR into two parts, one is rocm-kernels, the other is non rocm kernels. Co-authored-by: Weixing Zhang <wezhan@microsoft.com> Co-authored-by: sabreshao <sabre.shao@amd.com> Co-authored-by: anghostcici <11013544+anghostcici@users.noreply.github.com> Co-authored-by: Suffian Khan <sukha@microsoft.com> Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2020-10-29 17:13:04 -07:00
Maajid khan	ddf83d1ace	Maajid/multi threading 2 (#5568 ) * Enabled multi-threading for OpenVino EP ->Enabled support for concurrent_session_runs Run UEP using concurrent_session_runs > 1 Enabled support for ORT_PARALLEL ExecutionMode ->Documentation Added for Enabling MultiThreading Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Minor Fixes added Configure the value of nireq during Runtime Documentation typos rectified and details added for Multi_Threaded Inference Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Some checks added for this fix Added checks to invalidate wrong nireq value and assigned it to default value of 8 Added new config options for enable_vpu_fast_compile which were changed w.r.t OpenVINO_2021.1 Release Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>	2020-10-27 14:48:12 -07:00
Tianlei Wu	1f304fbee7	Attention with past and no unidirectional mask (#5557 ) * Update fusions to support shared node, and mask of all ones	2020-10-21 20:12:02 -07:00
Changming Sun	5802fe1699	Remove MKLML build config (#5559 ) Remove MKLML build config	2020-10-21 13:11:25 -07:00
Hariharan Seshadri	4291c57322	[C# and Python APIs] Expose knobs to enable/disable platform telemetry collection (#5481 )	2020-10-21 10:32:13 -07:00
Yufeng Li	6c2162e97a	Fix quantization of Conv1D with bias (#5491 ) * Fix reshape for Conv with bias	2020-10-20 15:27:26 -07:00
KeDengMS	e1a54c4090	Symbolic shape inference: fix a bug in shape merge (#5519 ) * Symbolic shape inference: fix a bug in shape merge OpType Where: input0: ['mt_src_tokens_batch', 1, 1, 'mt_src_tokens_len'] input1: [] input2: ['mt_prev_output_tokens_batch', 12, 'mt_prev_output_tokens_len', 'floor(mt_src_tokens_batchmt_src_tokens_len/mt_prev_output_tokens_batch)'] 1 output: [None, 12, 'mt_prev_output_tokens_len', None] Undo unintended TRT change	2020-10-16 17:54:57 -07:00
Chun-Wei Chen	2b6b3a2ee6	Add GetProfilingStartTimeNs() to Python/C# APIs (#5280 ) * add Python API for getProfilingStartTime * debug for using Python API * add in C# api * use uint instead of uint64_t to prevent warning * typo for GetProfilingStartTimeNs * remove const * Update onnxruntime/python/session.py Co-authored-by: Pranav Sharma <emailpranav@gmail.com> * remove unnecessary return * Add Python unit test * Add C# unit test and refactor Python test * use ulong in C# for uint64_t in C++ * remove time.monotonic_ns * syntax: remove public for inner function * correct the API's order * getprofilingstarttime after run * Correct the right order in NativeMethod.cs * update order * nit: remove spaces * Update csharp/src/Microsoft.ML.OnnxRuntime/InferenceSession.cs Co-authored-by: Guoyu Wang <62914304+gwang-msft@users.noreply.github.com> * use the updated function * add comment about the precision * add more comments * add session.py back * fix flake8 * remove session.py * Add comments in C, C#, Python APIs about precision Co-authored-by: Pranav Sharma <emailpranav@gmail.com> Co-authored-by: Guoyu Wang <62914304+gwang-msft@users.noreply.github.com>	2020-10-14 05:32:43 -07:00
Ye Wang	67315d8ae0	Optimize openai-gpt/albert model and add fusion test (#5466 ) * optimize openai-gpt * add huggingface model fusion test * move albert's attention fusion here * add test for albert fusion	2020-10-13 19:24:14 -07:00
KeDengMS	c444b9d76a	Add CUDA option to run copy in default stream (#5445 ) * Add CUDA option to run copy in default stream This change fixes #4829. Thanks @maherzog for providing the repro! The bug is caused by memory reuse in the BFC arena, where the copy and compute streams in CUDA have a race condition. The BFC arena is an arena allocator on top of cudaMalloc/Free that reduces the cost of syncing CPU and GPU on alloc/free. This means that when the CPU allocs/frees memory, the GPU might not have finished previous work on that memory, so the CPU and GPU can run asynchronously. This is OK if there's only one stream, where the execution order on CPU and GPU is consistent. For example, if we have two kernels A and B and the CPU runs allocA->computeA->freeA->allocB->computeB->freeB, A and B can share the same memory, since computeA and computeB will not race as long as they run in the same GPU compute stream. However, if the CPU runs allocA->CopyA->freeA->allocB->computeB->freeB, the GPU could execute copyA after computeB if copy and compute happen in different GPU streams. This change makes copy run in the default compute stream, while adding an option to fall back to the previous behavior if there's a perf hit. This is a short term fix until the BFC arena can support multiple streams. Users may use the following options to revert to the previous behavior: C API: struct OrtCUDAProviderOptions cudaProviderOpt; cudaProviderOpt.do_copy_in_default_stream = false; C++ API: CUDAExecutionProviderInfo cudaEPInfo; cudaEPInfo.do_copy_in_default_stream = false; C# API: pending... Python: import onnxruntime onnxruntime.capi._pybind_state.set_do_copy_in_default_stream(False) * Confirmed the test fails in CI when doing copy in separate stream Revert the test to get CI pass now * Fix Windows test * Address CR	2020-10-12 22:12:05 -07:00
Hariharan Seshadri	b9f90e297e	Support sharing of initializers between session via the Python API (#5407 )	2020-10-09 20:26:28 -07:00
Ye Wang	90f976d060	Some improvements on transformers tool (#5383 ) * modify tensorflow benchmark gpu setting * add export from tf choice in script * fix typo * match more embedlayernorm pattern * format	2020-10-08 19:35:17 -07:00
Tianlei Wu	15696b8fce	bump version to 1.5.2 (#5420 )	2020-10-08 16:30:13 -07:00
Yufeng Li	b04cf2d229	Update ORT to 1.5.1 in Bert Quantization Notebook (#5396 ) * Update ORT to 1.5.1 in Bert Quantization Notebook	2020-10-08 09:55:01 -07:00
Tianlei Wu	8ee2b08325	Allow benchmark different threads (#5390 )	2020-10-07 11:13:01 -07:00
Tianlei Wu	094384781e	Add --use_external_data_format in convert_to_onnx.py (#5393 )	2020-10-07 09:42:02 -07:00
Hariharan Seshadri	6f54113a1b	Support OrtValue binding in Python to enable interesting IOBinding scenarios in Python (#5248 )	2020-10-06 21:14:41 -07:00
Du Li	323c4dfe02	Adding an option for cudnn conv algorithms. (#5159 ) * adding cudnn conv algorithm selection options. * adding cudnn conv algorithm selection options. * export the api * adding the perf test option. * accommodating pr comments. * Move OrtSessionOptionsAppendExecutionProvider_CUDA to onnxruntime_c_api.h * Accommodating PR comments.	2020-10-05 16:53:52 -07:00
Vlad Burlik	c20fcf26eb	Onnx GPU runtime fails to fallback to CPU when GPU is not available/busy (#5304 ) * ONNX GPU runtime fails to fallback to CPU when GPU is not available OR busy https://github.com/microsoft/onnxruntime/issues/5299 * comments * Init _fallback_providers before C.InferenceSession * As per review: Fallback providers order supersedes user's providers order, IF they are included into providers list. * Code convention fix * pep8	2020-10-02 22:45:14 -07:00
Tianlei Wu	f5e4c0ea04	Fix benchmark_gpt2 model verification (#5343 )	2020-10-02 13:53:02 -07:00
Sherlock	e71668f92c	Expose recompute configs to the frontend (#5318 ) * Expose recompute configs to the frontend * Add frontend test * Ensure recompute graph transformer is only applied once Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-10-02 09:49:47 -07:00
Tianlei Wu	e33de20861	Update gpt2 notebook for int8 quantization (#5346 ) * Update gpt2 notebook for ORT 1.5 * add sections for int8 quantization including QAT note	2020-10-02 09:41:52 -07:00
Yufeng Li	e8b9aa1f29	fix quantization of EmbeddingLayerNorm (#5321 )	2020-10-01 20:08:43 -07:00
KeDengMS	7495dc167a	Symbolic shape inference: fix a bug in auto_merge when broadcasting (#5349 ) The bug happens when merging following shapes: input0: [1, 1, 'Min(1024, input1_dynamic_axes_3)', 'Min(1024, input1_dynamic_axes_3)'] input1: ['input1_dynamic_axes_1*input1_dynamic_axes_2', 12, 'input1_dynamic_axes_3', 'input1_dynamic_axes_3'] input2: [] The fix is to avoid broadcasting merge on input2	2020-10-01 15:24:00 -07:00
Ye Wang	caed6c264c	Add tf2pytorch wrapper in transformers tool (#5316 ) * init checkin * format * refactor * review comments	2020-10-01 13:58:58 -07:00
Ye Wang	1a12f510fc	Support T5 benchmarking in transformers tool (#5133 ) * init checkin * review comments * modify according to transformers release	2020-09-29 22:58:28 -07:00


303 commits