onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-16 18:31:27 +00:00

Author	SHA1	Message	Date
Dmitri Smirnov	8ee4e8226e	Preserve relative order of the results and the tests. (#5225 )	2020-09-19 00:45:44 -07:00
Weixing Zhang	b49f6a5e2c	using GPU_WARP_SIZE to make kernel portable between AMD and Nvidia GPU (#5173 )	2020-09-18 14:56:16 -07:00
Suffian Khan	84589c7e05	Fuse softmax(a + b) in case of simple broadcast (#4937 ) * bias softmax kernel * bias softmax kernel * remove debug comments * remove debug comment * windows build doesnt handle unary minus on unsigned type * int64 => int treated as error * only support cuda * add bias softmax fusion tests * PR comments * more PR comments * use MLTypeCallDispatcher * break function into pieces * add loop unroll and add to list for inference as well * use std::min and move operator== * revert std::min (doesnt work ci pipeline) and fix int to size_t error * pr comments * fixes for windows ci * fix for windows ci * pr comments on consistency * p_model_ * fix formatting and add anonymous namespace Co-authored-by: suffian khan <sukha@OrtTrainingDev1.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-09-18 14:15:55 -07:00
Tang, Cheng	e0b49844e9	Provide option to let layernorm stash mean/var as fp32 or bfloat16 (#5215 ) * add option to set layernorm stash type * bug fix * fix merge error * fix win build error	2020-09-18 13:42:01 -07:00
Dmitri Smirnov	a90ab12589	Refactor onnx_test_runner (#5169 ) Refactor onnx_test_runner for better object ownership, code readability and maintainability.	2020-09-18 13:19:35 -07:00
Ryan Hill	13318ab0d4	Remove invalid install line (#5219 )	2020-09-18 11:58:40 -07:00
Shucai Xiao	a632dd2d3b	Amdmigraphx improvements (#5158 ) * code backup * remove unnecessary log info * code backup * code backup * merge changes from master branch * code backup * code backup * merge changes from master branch * code backup * code backup for constant folding enhancement * code backup * include more scenarios for constant folding * code backup * remove unnecessary code * remove unnecessary log information * fix an error in comments * update algorithm to do graph partition * code backup * remove unnecessary log information * remove an unused function * remove unnecessary changes	2020-09-18 11:56:50 -07:00
Weixing Zhang	f91248e0cc	remove curand_generator_ related code since it is not used. (#5220 )	2020-09-18 11:50:35 -07:00
KeDengMS	ce3b67e0cd	[Python] Move symbolic_shape_infer from nuphar to tools (#5162 ) * [Python] Move symbolic shape inference from nuphar to tools * Fix PEP8 ERROR	2020-09-18 09:31:06 -07:00
RRRachelllll555	f7c1e51810	Remove shape inference and fix save large model(>2g) issue (#5210 ) * remove shape inference and fix save large model problem * remove unnecessary import * refine code and add external format for quantize_qat * remove initializers in tensors_to_calibrate * small refine Co-authored-by: t-yguo <t-yguo@microsoft.com>	2020-09-18 08:46:31 -07:00
Scott McKay	c46a480306	Update conversion script and process to simplify creating ORT format models and a minimal build (#5217 ) * Update conversion script and process to simplify creating ORT format models and a minimal build.	2020-09-18 18:49:54 +10:00
George Wu	1b61dfaf69	fix _WIN32 (#5218 )	2020-09-18 00:23:17 -07:00
Pranav Prakash	f5df96256c	Fix order of returned values in quantize_weight_per_channel (#5205 ) Must match returned order of `quantize_inputs`	2020-09-17 17:57:46 -07:00
liqunfu	f37e1292a1	--shm-size=1024m to fix nccl shared memory issue (#5214 ) * --shm-size=256m to fix nccl shared memory issue Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-09-17 17:21:47 -07:00
Guoyu Wang	8156e0dd10	[ORT Mobile] Some updates to iOS/Android build settings (#5184 ) * Update android CI and build settings * add build_java to arm64 also * Add ios signing param * fix a small build warning * address pr comments	2020-09-17 15:53:14 -07:00
Tracy Sharpe	8698157112	NCHWc optimizer fixes for quantized models (#5203 ) This updates the NCHWc transformer to not interfere with quantized convolution models, based on observations from internal models. The tensor type for MaxPool must be float. The input to GlobalAveragePool/GlobalMaxPool must be in NCHWc format.	2020-09-17 09:52:21 -07:00
Pranav Sharma	d535894297	Add API to allow configuration of the global thread pools. (#5199 )	2020-09-17 09:19:18 -07:00
Suffian Khan	e01e0b2e40	Fix softmax_warp_backward math when is_log_softmax = True and register LogSoftmax CUDA kernel (#5160 ) * register logsoftmax cuda kernel; fix logsoftmaxgrad cuda kernal; fix tests to invoke dispatch_softmax_* * forgot to remove axis check * add tests all axis Co-authored-by: suffian khan <sukha@OrtTrainingDev1.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-09-17 07:15:25 -07:00
S. Manohar Karlapalem	584638e5d3	Corrects doc typos and formatting (#5201 )	2020-09-17 01:25:19 -07:00
Zhang Lei	cd0386b649	MaxPool versioning in quantization tools. (#5194 ) MaxPool versioning in quantization tools.	2020-09-16 22:52:24 -07:00
Ryan Hill	b11c106346	Remove almost all of the reinterpret_casts from the provider shared API (#5190 )	2020-09-16 17:00:15 -07:00
Vincent Wang	c37472a1aa	Mixed Precision Transformer and Gradient Builder Refactor (#4892 ) * transform mixed precision before build gradient * resolve comments Co-authored-by: Vincent Wang <weicwang@microsoft.com>	2020-09-17 02:44:50 +08:00
Tiago Koji Castro Shibata	f3f119a945	Use onecore umbrella lib in onecore builds (#5182 ) * delayload hack * Skip tests * Onecore uses onecore umbrella * Uncomment tests * cleanup * Disable dev mode for WinML	2020-09-16 10:46:27 -07:00
Tiago Koji Castro Shibata	1a2e289d2d	Fix nuget build (#5163 ) * Fix nuget content * Revert "Fix nuget content" This reverts commit e2cdcec4e39964c50eac2fb306c7a4bb84352443. * Nuget packaging * skip tests * msbuild path * Force msbuild version * Workaround https://github.com/NuGet/Home/issues/7621 * cleanup	2020-09-16 10:37:09 -07:00
Dmitri Smirnov	e6f85f338e	Refactor TensorAt, prepare for release (#5180 ) * Refactor TensorAt locations* must be const and int64_t since our dims are int64_t Remove unnecessary copy of locations. Remove unnecesary casting and C-casting. Simplify implementation. Add a check for string type. Make CXX api return T& to fully expose C API in C++, const std::vector& by value as it covers more ground and eliminate redundant copy. Eliminate inner loop, compute strides first.	2020-09-16 10:20:45 -07:00
edgchen1	a20f8037f6	Install ssh in builder image, fix segfault in TrainingRunnerTest.Basic. (#5186 )	2020-09-16 09:53:30 -07:00
Bowen Bao	400ac85565	Improve error message for FE model export checking (#5156 )	2020-09-16 09:22:37 -07:00
Changming Sun	965e2b095d	Update MCR CUDA docker image to 10.2 (#5181 )	2020-09-16 09:01:31 -07:00
Tracy Sharpe	79e27d937a	MLAS: add sgemm weight prepacking (#5183 ) Add support to MLAS to prepack weights for the float GEMM. Support for prepacking has been added to MatMul and Attention for this release.	2020-09-16 08:36:27 -07:00
Oliver Rausch	3afc2bfa73	Remove mutable arguments from symbolic_shape_infer (#5166 )	2020-09-16 00:25:51 -07:00
Chun-Wei Chen	7f3aa3a163	Add GetStartTime() for profiler to get private profiling_start_time_ (#4994 ) * add GetStartTime() for profiler * add function in inference_session * remove qualified name * add the api in cxx_api.h * rename starttime to StartTimeNs, expost profiling object * rename GetProfilingStartTime * move Ortapis to the right place * move to the end * add const for session * const the right place * use const auto instead of const auto* for session * remove const for auto getstarttime * remove const for auto getstarttime add unit tests * nit: update test name and add comments	2020-09-16 00:17:04 -07:00
Changming Sun	a0a435abc6	Add sympy==1.1.1 to Linux docker image (#5177 )	2020-09-15 16:08:49 -07:00
Tianlei Wu	0752fd7425	change version number from 1.4.0 to 1.5.0 (#5178 )	2020-09-15 15:50:25 -07:00
Chi Lo	9f526f45ac	TensorRT Perf Tool (#4900 ) * Initialize tensorrt perf script * Add bert-squad dependencies * Modified code to make ort inference with CUDA/Tensorrt * Add get CUDA/TRT version * uncomment bert-squad * Add BERT-SQUAD inputs.json * Add FastRCNN * Make preprocess/validation in to common functions * Add MaskRCNN and SSD and consolidate the code * Add dependencies for MaskRCNN * following modifications are made: - create common fetch function to get inputs/outputs of model from ONNX model zoo. - create common validation function to compare inference outputs with reference outputs from ONNX model zoo. - move run/repeat time to argument list. (still working on other arguments, like fp16 or fp32, latency percentile). - generate table in csv file to show the latency comparison (TRT vs CUDA) side by side. * Add approache to analyze profling file and also update model related settings * Add models * Add most of models from ONNX model zoo * Add model input name and print all the model names at the end of run * Add system info * Add TRT fp16 support * Refine the code * Handle TRT fall back and modify the way to get input data * Refine code * Modify code * Add more precise approach to measure inference * Add io-binding * Add YoLoV4 * Refine the code * Refine the code * Add models * Add yolov4 notebook for jetson device * Update notebook * Update notebook * Add CVS models * Add missing model * Add support of float16 * Add new way to get trt version * Add "validate" and "benchmark" mode * Add randomly generated input * Refine perf script * Refine the code. * Add README * Refine the code * Update README.md * Refine code * Update README.md * Remove all the model related python and instead using model_list.json as models configuration. Refine the benchmark.py * Refine the code Co-authored-by: Chi Lo <lochi@microsoft.com>	2020-09-15 10:06:01 -07:00
Changming Sun	ef496d36ea	Build: Add missing EXCLUDE_FROM_ALL to ONNX submodule (#5161 ) Avoid building unnecessary things	2020-09-15 09:22:09 -07:00
Wenbing Li	de6e3fb61d	Reduce IOS shared library size by symbol file. (#5171 )	2020-09-14 23:59:41 -07:00
Ryan Hill	8fa427b264	Ryanunderhill/backout 5014 (#5167 ) * Revert 5014	2020-09-14 22:48:00 -07:00
Scott McKay	089789c135	Revert change to disable support for loading ORT format models in the packaging pipelines. (#5168 )	2020-09-15 15:11:06 +10:00
Sheil Kumar	c0d7c8bc44	Add docs indicating that the onnxruntime engine from other distributions can be compatible with the WinRT NuGet (#5009 ) * add docs for mix and matching * typos Co-authored-by: Sheil Kumar <sheilk@microsoft.com>	2020-09-14 21:15:51 -07:00
RandySheriffH	1dde215d96	promote cuda version on packacking pipelines (#5154 ) * promote cuda version on packacking pipelines * fix cudnn version in py packaing template Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2020-09-14 21:09:09 -07:00
Yufeng Li	3068a835f1	Fix quantization of 1-D conv with bias (#5157 )	2020-09-14 18:07:14 -07:00
Andrei Shadrikov	82b25e1731	Fix datasize call in calibrate (#5110 ) * Moving datasize to the interface. * Reverting changes and adressing the comment	2020-09-14 18:06:23 -07:00
S. Manohar Karlapalem	f7edf0aa57	[OpenVINO-EP] Enable EP config options for VPU hardware (#5119 ) * Added config flags for VPU Fast Recompile * clean-up ifdefs * Add VPU Fast compile config option Adds an option that enables Fast compilation of models to VPU hardware specific format. * Add config option to choose specific device id for inference Inference of all subgraphs will be scheduled only on this device even if other devices of the same type are available. * Add Python API to list available device IDs * code cleanup * Add second C/C++ API with settings string parameter Adds an additional C/C++ API that allows passing multiple key-value pairs for settings as a single string. Multiple settings are delimited by '\n' while the key and value within a setting are delimited by '\|'. * Append 'Ex' to the extended C/C++ API * Use set_providers Py API to set config options. Uses Session.set_providers Python API to set EP runtime config options as key/val pairs Deprecated older module function definitions for config settings. Updates documentation. * avoid globals for py config options where possible Co-authored-by: intel <you@example.com>	2020-09-14 15:46:14 -07:00
Zhang Lei	d45e49dd2b	Add LeakyRelu and Sigmoid QLinear Quantization support (#5116 ) * Add LeakyRelu and Sigmoid QLinear Quantization support * Change due to reflect master changes.	2020-09-14 14:46:24 -07:00
Changming Sun	8946d212bf	Remove the dependency on CUDA SDk's version.txt (#5155 )	2020-09-14 14:25:28 -07:00
Yufeng Li	20b2f45b24	Support per-channel quantization of weight tensor (#5057 ) * Support per-channel quantization of weight tensor * rename util functions * fix bugs in calibrate * add support of reduce_range * refine opset check	2020-09-14 11:53:50 -07:00
Wenbing Li	2a456d16c0	Enable onnxruntime iOS shared library build. (#5148 )	2020-09-14 10:32:39 -07:00
ashbhandare	cc3212f9d5	Add fp16 pow kernel (#5016 ) * Add fp16 pow kernel * Fix test added for non-cuda runs	2020-09-14 10:01:39 -07:00
Moshe David	1d6a21fd08	[TensorRT] Add slightly faster hash computation for `vector<int>` (#5142 ) * w * w Co-authored-by: modav <modav@microsoft.com>	2020-09-14 09:01:59 -07:00
sfatimar	0c7e9fb52a	changes to ensure compilation issues in windows is fixed by disabling the level 3 warning 4267 (#5147 ) while a more permanent fix is found Co-authored-by: sfatimar <sahar.fatima@intel/com>	2020-09-14 08:59:41 -07:00

3404 commits