onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-17 18:40:28 +00:00

Author	SHA1	Message	Date
Jeff Bloomfield	389cca7a45	Handle missing initializers in allocation planner to fix crashes with DML provider (#5244 ) * Fix memory planning bug with DML EP * Address PR comments * Fix typo	2020-09-23 16:50:58 -07:00
Dwayne Robinson	b648fe5f74	ORT DirectML EP for Iron release, ONNX 1.5 (part 2) (#5263 ) * Merged PR 5195856: Fix broken cases of zero size tensors in Cast/Reduce MaskRCNN failed when `Cast` tried to execute `Xor` with emptiness (zero in dimensions). This is perfectly legal and should be treated as a nop. Ultimately DML itself should treat this case as a nop, just like how C's `memcpy` treats 0 count as a nop, but I'm just addressing it in ORT now, as enabling it in DML would impact more operators to be consistent (probably should incrementally add a flag to tensor validation so operators can be opted in gradually). Corresponding WindowsAI PR: https://microsoft.visualstudio.com/WindowsAI/_git/WindowsAI/pullrequest/5195850 Related work items: #27469839, #28761382 * Merged PR 5201369: Remove copy of initializers added in DMLXP refactor When used in ORT, a common method shouldn't copy and return initializer data Related work items: #29514403 Co-authored-by: Justin Stoecker <justoeck@microsoft.com> Co-authored-by: Jeff Bloomfield <jeffbloo@microsoft.com>	2020-09-23 16:50:58 -07:00
Yufeng Li	eb75b492cc	Fix bug in the back to back quantization of matmul and conv (#5264 ) * fix bug in the back to back quantization of matmul and conv * fix bug in back to back gather	2020-09-23 16:50:58 -07:00
Tianlei Wu	47447da4fd	bump version to 1.5.1 (#5258 )	2020-09-23 16:50:58 -07:00
Ye Wang	87b15f32ef	Fix reshape fusion crash (#5252 ) * fix reshape fusion crash * handling start_node statelessly * fix	2020-09-23 16:50:58 -07:00
Guoyu Wang	fc259de3bc	Fix possible ios build break after update to Xcode 12 (#5246 ) * Fix possible ios build break after update to Xcode 12 * Address comments	2020-09-23 16:50:58 -07:00
Sherlock	9fd76c8693	Place Shape's output in CPU memory (#5245 ) Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-09-23 16:50:58 -07:00
edgchen1	9158679c43	Update BUILD.md training dependency info. (#5240 ) Update training dependency versions based on Dockerfile.training.	2020-09-23 16:50:58 -07:00
Changming Sun	b9b7c279fa	Update BUILD.md for CUDA versions (#5239 )	2020-09-23 16:50:58 -07:00
George Wu	0cbe240ea3	update TensorRT docs (#5238 ) * doc updates TensorRT * update * update * fix warning * newline * format	2020-09-23 16:50:58 -07:00
Scott McKay	c93f292d1f	Revert to using release SafeInt repo now that it supports a build with exceptions disabled. (#5233 )	2020-09-23 16:50:58 -07:00
edgchen1	6371ad61c5	Fix TransposeScaleMatMul and MatMulScaleFusion issues (#5230 ) - Rename TransposeScaleMatMul back to TransposeMatMul for backwards compatibility - Fix MatMulScaleFusion issues: - Add check for supported execution providers - Add check for supported MatMul input types	2020-09-23 16:50:58 -07:00
stevenlix	c27f461c1d	Create profile for all dynamic shape input tensors (#5229 )	2020-09-23 16:50:58 -07:00
Adam Pocock	4427b1e2a3	[java] Fixing the buffer semantics. (#5223 ) * [java] Fixing the buffer semantics. * Renaming bufferCapacity to bufferRemaining. * Adding a cast to char* so the pointer arithmetic works on Windows.	2020-09-23 16:50:58 -07:00
George Wu	c909c67701	fix _WIN32 (#5218 )	2020-09-23 16:50:58 -07:00
Scott McKay	95b2e31659	Update conversion script and process to simplify creating ORT format models and a minimal build (#5217 ) * Update conversion script and process to simplify creating ORT format models and a minimal build.	2020-09-23 16:50:58 -07:00
liqunfu	21a7afb2c6	--shm-size=1024m to fix nccl shared memory issue (#5214 ) * --shm-size=256m to fix nccl shared memory issue Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-09-23 16:50:58 -07:00
RRRachelllll555	b791402f84	Remove shape inference and fix save large model(>2g) issue (#5210 ) * remove shape inference and fix save large model problem * remove unnecessary import * refine code and add external format for quantize_qat * remove initializers in tensors_to_calibrate * small refine Co-authored-by: t-yguo <t-yguo@microsoft.com>	2020-09-23 16:50:58 -07:00
Pranav Prakash	0a31b9ed3c	Fix order of returned values in quantize_weight_per_channel (#5205 ) Must match returned order of `quantize_inputs`	2020-09-23 16:50:58 -07:00
Tracy Sharpe	f726af34e0	NCHWc optimizer fixes for quantized models (#5203 ) This updates the NCHWc transformer to not interfere with quantized convolution models, based on observations from internal models. The tensor type for MaxPool must be float. The input to GlobalAveragePool/GlobalMaxPool must be in NCHWc format.	2020-09-23 16:50:58 -07:00
S. Manohar Karlapalem	84ffdbc467	Corrects doc typos and formatting (#5201 )	2020-09-23 16:50:58 -07:00
Pranav Sharma	24d111c342	Add API to allow configuration of the global thread pools. (#5199 )	2020-09-23 16:50:58 -07:00
Zhang Lei	498483b464	MaxPool versioning in quantization tools. (#5194 ) MaxPool versioning in quantization tools.	2020-09-23 16:50:58 -07:00
Suffian Khan	39a7f96a44	Fix softmax_warp_backward math when is_log_softmax = True and register LogSoftmax CUDA kernel (#5160 ) * register logsoftmax cuda kernel; fix logsoftmaxgrad cuda kernel; fix tests to invoke dispatch_softmax_* * forgot to remove axis check * add tests all axis Co-authored-by: suffian khan <sukha@OrtTrainingDev1.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-09-23 16:50:58 -07:00
Shucai Xiao	8e650c5384	Amdmigraphx improvements (#5158 ) * code backup * remove unnecessary log info * code backup * code backup * merge changes from master branch * code backup * code backup * merge changes from master branch * code backup * code backup for constant folding enhancement * code backup * include more scenarios for constant folding * code backup * remove unnecessary code * remove unnecessary log information * fix an error in comments * update algorithm to do graph partition * code backup * remove unnecessary log information * remove an unused function * remove unnecessary changes	2020-09-23 16:50:58 -07:00
Ye Wang	b693cb1370	Fix a bug in EmbedLayerNorm fusion (#5150 ) * fix embedlayernorm bug * review comments * interim checkin * review comments * Fix core dump in MacOS * remove unnecessary lines * update document * Update graph_utils.cc * Update onnx_exporter.py * resolve comments	2020-09-23 16:50:58 -07:00
Changming Sun	5b5bcba9e3	Update MCR CUDA docker image to 10.2 (#5181 )	2020-09-17 08:39:47 -07:00
Dmitri Smirnov	ece9a7c1fc	Refactor TensorAt, prepare for release (#5180 ) * Refactor TensorAt locations* must be const and int64_t since our dims are int64_t Remove unnecessary copy of locations. Remove unnecessary casting and C-casting. Simplify implementation. Add a check for string type. Make CXX api return T& to fully expose C API in C++, const std::vector& by value as it covers more ground and eliminate redundant copy. Eliminate inner loop, compute strides first.	2020-09-17 08:39:47 -07:00
Tracy Sharpe	b2994492af	MLAS: add sgemm weight prepacking (#5183 ) Add support to MLAS to prepack weights for the float GEMM. Support for prepacking has been added to MatMul and Attention for this release.	2020-09-17 08:39:47 -07:00
Tiago Koji Castro Shibata	ecf04d23c4	Fix nuget build (#5163 ) * Fix nuget content * Revert "Fix nuget content" This reverts commit e2cdcec4e39964c50eac2fb306c7a4bb84352443. * Nuget packaging * skip tests * msbuild path * Force msbuild version * Workaround https://github.com/NuGet/Home/issues/7621 * cleanup	2020-09-17 08:39:47 -07:00
Tiago Koji Castro Shibata	b523fa08bc	Use onecore umbrella lib in onecore builds (#5182 ) * delayload hack * Skip tests * Onecore uses onecore umbrella * Uncomment tests * cleanup * Disable dev mode for WinML	2020-09-17 08:39:47 -07:00
Chun-Wei Chen	393ff2f434	Add GetStartTime() for profiler to get private profiling_start_time_ (#4994 ) * add GetStartTime() for profiler * add function in inference_session * remove qualified name * add the api in cxx_api.h * rename starttime to StartTimeNs, expose profiling object * rename GetProfilingStartTime * move Ortapis to the right place * move to the end * add const for session * const the right place * use const auto instead of const auto* for session * remove const for auto getstarttime * remove const for auto getstarttime add unit tests * nit: update test name and add comments	2020-09-17 08:39:47 -07:00
edgchen1	5d3c962481	Install ssh in builder image, fix segfault in TrainingRunnerTest.Basic. (#5186 )	2020-09-17 08:39:47 -07:00
Bowen Bao	53d8779dbc	Improve error message for FE model export checking (#5156 )	2020-09-17 08:39:47 -07:00
Changming Sun	a0a435abc6	Add sympy==1.1.1 to Linux docker image (#5177 )	2020-09-15 16:08:49 -07:00
Tianlei Wu	0752fd7425	change version number from 1.4.0 to 1.5.0 (#5178 )	2020-09-15 15:50:25 -07:00
Chi Lo	9f526f45ac	TensorRT Perf Tool (#4900 ) * Initialize tensorrt perf script * Add bert-squad dependencies * Modified code to make ort inference with CUDA/Tensorrt * Add get CUDA/TRT version * uncomment bert-squad * Add BERT-SQUAD inputs.json * Add FastRCNN * Make preprocess/validation into common functions * Add MaskRCNN and SSD and consolidate the code * Add dependencies for MaskRCNN * following modifications are made: - create common fetch function to get inputs/outputs of model from ONNX model zoo. - create common validation function to compare inference outputs with reference outputs from ONNX model zoo. - move run/repeat time to argument list. (still working on other arguments, like fp16 or fp32, latency percentile). - generate table in csv file to show the latency comparison (TRT vs CUDA) side by side. * Add approach to analyze profiling file and also update model related settings * Add models * Add most of models from ONNX model zoo * Add model input name and print all the model names at the end of run * Add system info * Add TRT fp16 support * Refine the code * Handle TRT fall back and modify the way to get input data * Refine code * Modify code * Add more precise approach to measure inference * Add io-binding * Add YoLoV4 * Refine the code * Refine the code * Add models * Add yolov4 notebook for jetson device * Update notebook * Update notebook * Add CVS models * Add missing model * Add support of float16 * Add new way to get trt version * Add "validate" and "benchmark" mode * Add randomly generated input * Refine perf script * Refine the code. * Add README * Refine the code * Update README.md * Refine code * Update README.md * Remove all the model related python and instead using model_list.json as models configuration. Refine the benchmark.py * Refine the code Co-authored-by: Chi Lo <lochi@microsoft.com>	2020-09-15 10:06:01 -07:00
Changming Sun	ef496d36ea	Build: Add missing EXCLUDE_FROM_ALL to ONNX submodule (#5161 ) Avoid building unnecessary things	2020-09-15 09:22:09 -07:00
Wenbing Li	de6e3fb61d	Reduce IOS shared library size by symbol file. (#5171 )	2020-09-14 23:59:41 -07:00
Ryan Hill	8fa427b264	Ryanunderhill/backout 5014 (#5167 ) * Revert 5014	2020-09-14 22:48:00 -07:00
Scott McKay	089789c135	Revert change to disable support for loading ORT format models in the packaging pipelines. (#5168 )	2020-09-15 15:11:06 +10:00
Sheil Kumar	c0d7c8bc44	Add docs indicating that the onnxruntime engine from other distributions can be compatible with the WinRT NuGet (#5009 ) * add docs for mix and matching * typos Co-authored-by: Sheil Kumar <sheilk@microsoft.com>	2020-09-14 21:15:51 -07:00
RandySheriffH	1dde215d96	promote cuda version on packaging pipelines (#5154 ) * promote cuda version on packaging pipelines * fix cudnn version in py packaging template Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2020-09-14 21:09:09 -07:00
Yufeng Li	3068a835f1	Fix quantization of 1-D conv with bias (#5157 )	2020-09-14 18:07:14 -07:00
Andrei Shadrikov	82b25e1731	Fix datasize call in calibrate (#5110 ) * Moving datasize to the interface. * Reverting changes and addressing the comment	2020-09-14 18:06:23 -07:00
S. Manohar Karlapalem	f7edf0aa57	[OpenVINO-EP] Enable EP config options for VPU hardware (#5119 ) * Added config flags for VPU Fast Recompile * clean-up ifdefs * Add VPU Fast compile config option Adds an option that enables Fast compilation of models to VPU hardware specific format. * Add config option to choose specific device id for inference Inference of all subgraphs will be scheduled only on this device even if other devices of the same type are available. * Add Python API to list available device IDs * code cleanup * Add second C/C++ API with settings string parameter Adds an additional C/C++ API that allows passing multiple key-value pairs for settings as a single string. Multiple settings are delimited by '\n' while the key and value within a setting are delimited by '\|'. * Append 'Ex' to the extended C/C++ API * Use set_providers Py API to set config options. Uses Session.set_providers Python API to set EP runtime config options as key/val pairs Deprecated older module function definitions for config settings. Updates documentation. * avoid globals for py config options where possible Co-authored-by: intel <you@example.com>	2020-09-14 15:46:14 -07:00
Zhang Lei	d45e49dd2b	Add LeakyRelu and Sigmoid QLinear Quantization support (#5116 ) * Add LeakyRelu and Sigmoid QLinear Quantization support * Change due to reflect master changes.	2020-09-14 14:46:24 -07:00
Changming Sun	8946d212bf	Remove the dependency on CUDA SDK's version.txt (#5155 )	2020-09-14 14:25:28 -07:00
Yufeng Li	20b2f45b24	Support per-channel quantization of weight tensor (#5057 ) * Support per-channel quantization of weight tensor * rename util functions * fix bugs in calibrate * add support of reduce_range * refine opset check	2020-09-14 11:53:50 -07:00
Wenbing Li	2a456d16c0	Enable onnxruntime iOS shared library build. (#5148 )	2020-09-14 10:32:39 -07:00


3407 commits