onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-26 03:00:54 +00:00

Author	SHA1	Message	Date
Changming Sun	e4f71abd90	Exclude GPT2_LM_HEAD from OpenVino's model test list (#5356) GPT2_LM_HEAD is a new ONNX model zoo model that OpenVino doesn't support. Error message:1: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running OpenVINO-EP-subgraph_1162 node. Name:'OpenVINOExecutionProvider_OpenVINO-EP-subgraph_1162_1' Status Message: _Map_base::at	2020-10-12 13:27:37 -07:00
Tiago Koji Castro Shibata	d77241e7f1	Fix WinML warnings (#5228)	2020-10-12 13:27:37 -07:00
Tianlei Wu	4227dd7df5	Revert "Move flatbuffers to 1.12 release (#5392)" This reverts commit 0294d9aa5624b463ef34aa3ad8458e983715e2a4.	2020-10-12 13:27:37 -07:00
Tianlei Wu	fc0fc80db2	Revert "Add flatbuffers verifier for ORT format buffer (#5378)" This reverts commit e8bf3ba2bb383055b5974e8904324c33e0f4cbb1.	2020-10-12 13:27:37 -07:00
Guoyu Wang	6782866529	Mitigate pybind11 build break using Xcode 12 on macOS (#5381)	2020-10-12 13:27:37 -07:00
Guoyu Wang	fda4363992	Add flatbuffers verifier for ORT format buffer (#5378)	2020-10-12 13:27:37 -07:00
Pranav Sharma	5f331af157	Include config keys header file in the release packages for Linux and Mac. (#5388)	2020-10-12 13:27:37 -07:00
Guoyu Wang	8adfa7ac70	Move flatbuffers to 1.12 release (#5392)	2020-10-12 13:27:37 -07:00
Tiago Koji Castro Shibata	8283526541	Fix com ptr refcount (#5404)	2020-10-12 13:27:37 -07:00
Tianlei Wu	58bf508ce0	bump version to 1.5.2 (#5420)	2020-10-12 13:27:37 -07:00
Tianlei Wu	4e983634ff	clear cudaDelayLoadedLibs since delayload is disabled (#5386)	2020-10-12 13:27:37 -07:00
Yufeng Li	5de47affb1	fix quantization of EmbeddingLayerNorm (#5321)	2020-09-29 01:00:47 -07:00
Tianlei Wu	c00e13a291	Cherry pick (batch 2) to rel-1.5.1 (#5290) * remove implicit linking of tensorrt and dnnl ep shared libs (#5262) * Update DirectML Nuget to 1.3.0 (#5274) * Update PyTorch TransformerModel sample (#5275) * Insert telemetry template into GPU build, add telemetry build switches. (#5278) * Synchronize training dependency versions between Docker image and Python wheel (#5261) * Downgrade GCC (#5269) * Remove --enable_symbolic_shape_infer_tests to fix linux ci pipeline build error. Co-authored-by: Edward Chen Co-authored-by: George Wu <jywu@microsoft.com> Co-authored-by: Dwayne Robinson <dwayner@microsoft.com> Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com> Co-authored-by: edgchen1 <18449977+edgchen1@users.noreply.github.com> Co-authored-by: Changming Sun <chasun@microsoft.com>	2020-09-25 09:26:40 -07:00
Jeff Bloomfield	389cca7a45	Handle missing initializers in allocation planner to fix crashes with DML provider (#5244) * Fix memory planning bug with DML EP * Address PR comments * Fix typo	2020-09-23 16:50:58 -07:00
Dwayne Robinson	b648fe5f74	ORT DirectML EP for Iron release, ONNX 1.5 (part 2) (#5263) * Merged PR 5195856: Fix broken cases of zero size tensors in Cast/Reduce MaskRCNN failed when `Cast` tried to execute `Xor` with emptiness (zero in dimensions). This is perfectly legal and should be treated as a nop. Ultimately DML itself should treat this case as a nop, just like how C's `memcpy` treats 0 count as a nop, but I'm just addressing it in ORT now, as enabling it in DML would impact more operators to be consistent (probably should incrementally add a flag to tensor validation so operators can be opted in gradually). Corresponding WindowsAI PR: https://microsoft.visualstudio.com/WindowsAI/_git/WindowsAI/pullrequest/5195850 Related work items: #27469839, #28761382 * Merged PR 5201369: Remove copy of initializers added in DMLXP refactor When used in ORT, a common method shouldn't copy and return initializer data Related work items: #29514403 Co-authored-by: Justin Stoecker <justoeck@microsoft.com> Co-authored-by: Jeff Bloomfield <jeffbloo@microsoft.com>	2020-09-23 16:50:58 -07:00
Yufeng Li	eb75b492cc	Fix bug in the back to back quantization of matmul and conv (#5264) * fix bug in the back to back quantization of matmul and conv * fix bug in back to back gather	2020-09-23 16:50:58 -07:00
Tianlei Wu	47447da4fd	bump version to 1.5.1 (#5258)	2020-09-23 16:50:58 -07:00
Ye Wang	87b15f32ef	Fix reshape fusion crash (#5252) * fix reshape fusion crash * handling start_node statelessly * fix	2020-09-23 16:50:58 -07:00
Guoyu Wang	fc259de3bc	Fix possible ios build break after update to Xcode 12 (#5246) * Fix possible ios build break after update to Xcode 12 * Address comments	2020-09-23 16:50:58 -07:00
Sherlock	9fd76c8693	Place Shape's output in CPU memory (#5245) Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-09-23 16:50:58 -07:00
edgchen1	9158679c43	Update BUILD.md training dependency info. (#5240) Update training dependency versions based on Dockerfile.training.	2020-09-23 16:50:58 -07:00
Changming Sun	b9b7c279fa	Update BUILD.md for CUDA versions (#5239)	2020-09-23 16:50:58 -07:00
George Wu	0cbe240ea3	update TensorRT docs (#5238) * doc updates TensorRT * update * update * fix warning * newline * format	2020-09-23 16:50:58 -07:00
Scott McKay	c93f292d1f	Revert to using release SafeInt repo now that it supports a build with exceptions disabled. (#5233)	2020-09-23 16:50:58 -07:00
edgchen1	6371ad61c5	Fix TransposeScaleMatMul and MatMulScaleFusion issues (#5230) - Rename TransposeScaleMatMul back to TransposeMatMul for backwards compatibility - Fix MatMulScaleFusion issues: - Add check for supported execution providers - Add check for supported MatMul input types	2020-09-23 16:50:58 -07:00
stevenlix	c27f461c1d	Create profile for all dynamic shape input tensors (#5229)	2020-09-23 16:50:58 -07:00
Adam Pocock	4427b1e2a3	[java] Fixing the buffer semantics. (#5223) * [java] Fixing the buffer semantics. * Renaming bufferCapacity to bufferRemaining. * Adding a cast to char* so the pointer arithmetic works on Windows.	2020-09-23 16:50:58 -07:00
George Wu	c909c67701	fix _WIN32 (#5218)	2020-09-23 16:50:58 -07:00
Scott McKay	95b2e31659	Update conversion script and process to simplify creating ORT format models and a minimal build (#5217) * Update conversion script and process to simplify creating ORT format models and a minimal build.	2020-09-23 16:50:58 -07:00
liqunfu	21a7afb2c6	--shm-size=1024m to fix nccl shared memory issue (#5214) * --shm-size=256m to fix nccl shared memory issue Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-09-23 16:50:58 -07:00
RRRachelllll555	b791402f84	Remove shape inference and fix save large model(>2g) issue (#5210) * remove shape inference and fix save large model problem * remove unnecessary import * refine code and add external format for quantize_qat * remove initializers in tensors_to_calibrate * small refine Co-authored-by: t-yguo <t-yguo@microsoft.com>	2020-09-23 16:50:58 -07:00
Pranav Prakash	0a31b9ed3c	Fix order of returned values in quantize_weight_per_channel (#5205) Must match returned order of `quantize_inputs`	2020-09-23 16:50:58 -07:00
Tracy Sharpe	f726af34e0	NCHWc optimizer fixes for quantized models (#5203) This updates the NCHWc transformer to not interfere with quantized convolution models, based on observations from internal models. The tensor type for MaxPool must be float. The input to GlobalAveragePool/GlobalMaxPool must be in NCHWc format.	2020-09-23 16:50:58 -07:00
S. Manohar Karlapalem	84ffdbc467	Corrects doc typos and formatting (#5201)	2020-09-23 16:50:58 -07:00
Pranav Sharma	24d111c342	Add API to allow configuration of the global thread pools. (#5199)	2020-09-23 16:50:58 -07:00
Zhang Lei	498483b464	MaxPool versioning in quantization tools. (#5194) MaxPool versioning in quantization tools.	2020-09-23 16:50:58 -07:00
Suffian Khan	39a7f96a44	Fix softmax_warp_backward math when is_log_softmax = True and register LogSoftmax CUDA kernel (#5160) * register logsoftmax cuda kernel; fix logsoftmaxgrad cuda kernel; fix tests to invoke dispatch_softmax_* * forgot to remove axis check * add tests all axis Co-authored-by: suffian khan <sukha@OrtTrainingDev1.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-09-23 16:50:58 -07:00
Shucai Xiao	8e650c5384	Amdmigraphx improvements (#5158) * code backup * remove unnecessary log info * code backup * code backup * merge changes from master branch * code backup * code backup * merge changes from master branch * code backup * code backup for constant folding enhancement * code backup * include more scenarios for constant folding * code backup * remove unnecessary code * remove unnecessary log information * fix an error in comments * update algorithm to do graph partition * code backup * remove unnecessary log information * remove an unused function * remove unnecessary changes	2020-09-23 16:50:58 -07:00
Ye Wang	b693cb1370	Fix a bug in EmbedLayerNorm fusion (#5150) * fix embedlayernorm bug * review comments * interim checkin * review comments * Fix core dump in MacOS * remove unnecessary lines * update document * Update graph_utils.cc * Update onnx_exporter.py * resolve comments	2020-09-23 16:50:58 -07:00
Changming Sun	5b5bcba9e3	Update MCR CUDA docker image to 10.2 (#5181)	2020-09-17 08:39:47 -07:00
Dmitri Smirnov	ece9a7c1fc	Refactor TensorAt, prepare for release (#5180) * Refactor TensorAt locations* must be const and int64_t since our dims are int64_t Remove unnecessary copy of locations. Remove unnecessary casting and C-casting. Simplify implementation. Add a check for string type. Make CXX api return T& to fully expose C API in C++, const std::vector& by value as it covers more ground and eliminate redundant copy. Eliminate inner loop, compute strides first.	2020-09-17 08:39:47 -07:00
Tracy Sharpe	b2994492af	MLAS: add sgemm weight prepacking (#5183) Add support to MLAS to prepack weights for the float GEMM. Support for prepacking has been added to MatMul and Attention for this release.	2020-09-17 08:39:47 -07:00
Tiago Koji Castro Shibata	ecf04d23c4	Fix nuget build (#5163) * Fix nuget content * Revert "Fix nuget content" This reverts commit e2cdcec4e39964c50eac2fb306c7a4bb84352443. * Nuget packaging * skip tests * msbuild path * Force msbuild version * Workaround https://github.com/NuGet/Home/issues/7621 * cleanup	2020-09-17 08:39:47 -07:00
Tiago Koji Castro Shibata	b523fa08bc	Use onecore umbrella lib in onecore builds (#5182) * delayload hack * Skip tests * Onecore uses onecore umbrella * Uncomment tests * cleanup * Disable dev mode for WinML	2020-09-17 08:39:47 -07:00
Chun-Wei Chen	393ff2f434	Add GetStartTime() for profiler to get private profiling_start_time_ (#4994) * add GetStartTime() for profiler * add function in inference_session * remove qualified name * add the api in cxx_api.h * rename starttime to StartTimeNs, expose profiling object * rename GetProfilingStartTime * move Ortapis to the right place * move to the end * add const for session * const the right place * use const auto instead of const auto* for session * remove const for auto getstarttime * remove const for auto getstarttime add unit tests * nit: update test name and add comments	2020-09-17 08:39:47 -07:00
edgchen1	5d3c962481	Install ssh in builder image, fix segfault in TrainingRunnerTest.Basic. (#5186)	2020-09-17 08:39:47 -07:00
Bowen Bao	53d8779dbc	Improve error message for FE model export checking (#5156)	2020-09-17 08:39:47 -07:00
Changming Sun	a0a435abc6	Add sympy==1.1.1 to Linux docker image (#5177)	2020-09-15 16:08:49 -07:00
Tianlei Wu	0752fd7425	change version number from 1.4.0 to 1.5.0 (#5178)	2020-09-15 15:50:25 -07:00
Chi Lo	9f526f45ac	TensorRT Perf Tool (#4900) * Initialize tensorrt perf script * Add bert-squad dependencies * Modified code to make ort inference with CUDA/Tensorrt * Add get CUDA/TRT version * uncomment bert-squad * Add BERT-SQUAD inputs.json * Add FastRCNN * Make preprocess/validation into common functions * Add MaskRCNN and SSD and consolidate the code * Add dependencies for MaskRCNN * following modifications are made: - create common fetch function to get inputs/outputs of model from ONNX model zoo. - create common validation function to compare inference outputs with reference outputs from ONNX model zoo. - move run/repeat time to argument list. (still working on other arguments, like fp16 or fp32, latency percentile). - generate table in csv file to show the latency comparison (TRT vs CUDA) side by side. * Add approach to analyze profiling file and also update model related settings * Add models * Add most of models from ONNX model zoo * Add model input name and print all the model names at the end of run * Add system info * Add TRT fp16 support * Refine the code * Handle TRT fall back and modify the way to get input data * Refine code * Modify code * Add more precise approach to measure inference * Add io-binding * Add YoLoV4 * Refine the code * Refine the code * Add models * Add yolov4 notebook for jetson device * Update notebook * Update notebook * Add CVS models * Add missing model * Add support of float16 * Add new way to get trt version * Add "validate" and "benchmark" mode * Add randomly generated input * Refine perf script * Refine the code. * Add README * Refine the code * Update README.md * Refine code * Update README.md * Remove all the model related python and instead using model_list.json as models configuration. Refine the benchmark.py * Refine the code Co-authored-by: Chi Lo <lochi@microsoft.com>	2020-09-15 10:06:01 -07:00

3420 commits