onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-26 03:00:54 +00:00

Author	SHA1	Message	Date
Adam Louly	cf8bf0c141	add on device training to the packaging pipelines (#13446 ) ### Description enabling on device training apis in the packaging pipelines. ### Motivation and Context adding on device training flag so we can enable the on-device training apis for Federated learning scenarios Co-authored-by: Adam Louly <adamlouly@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2022-10-25 15:03:34 -07:00
Tianlei Wu	7aafd86229	Update Attention operator to support separated Q/K/V inputs (#13410 ) ### Description Allow separated Q, K and V inputs to support cross attention: * Q: [batch_size, sequence_length, hidden_size] * K: [batch_size, kv_sequence_length, hidden_size] * V: [batch_size, kv_sequence_length, v_hidden_size] * Output: [batch_size, sequence_length, v_hidden_size] To use separated Q/K/V inputs, the input tensor is for query, and two optional inputs are added for key and value. Weights for input projection is not included for now, so the MatMul of input projection shall be done out of Attention operator, but Add bias is included for performance consideration.	2022-10-25 11:51:06 -07:00
Changming Sun	a396a91c9a	Move build machines with Nvidia M60 GPUs to Nvidia T4 (#13170 )	2022-10-25 11:21:13 -07:00
Dwayne Robinson	0201cd75e1	Document generation for operator kernels, enable internal overload of DML EP to initialize on software-only devices (#13428 ) ### Description The documentation pipeline does not require an actual GPU, and running on GPU-capable agents costs more. So to enable running on CPU-only devices and to potentially consolidate future pipelines, and since the tests are not actually executed on this device anyway (it just needs to initialize the EP for the sake of operator kernel enumeration), add an initialization flag to skip the software device check - this is only an internal overload not exposed in the public API. See https://github.com/microsoft/onnxruntime/pull/13308. ### Motivation and Context - If it fixes an open issue, please link to the issue here. NA	2022-10-25 11:14:43 -07:00
Tianlei Wu	d80212d42c	Add script for question answering (SQuAD) accuracy evaluation of BERT model (#12947 ) Add script to evaluate accuracy of BERT/DistilBERT/Roberta models on question-answering task. By default, pretrained model `bert-large-uncased-whole-word-masking-finetuned-squad` will be used if model name is not specified. If onnx path is not specified, optimum will be used to export an ONNX model for testing. Example usage: * Evaluate with CPU execution provider: `python eval_squad.py` * Evaluate with CUDA execution provider: `python eval_squad.py --use_gpu` * Evaluate an optimized onnx model for 'distilbert-base-cased-distilled-squad' with sequence lengths 128/192/256/384 on first 100 samples: `python eval_squad.py -m distilbert-base-cased-distilled-squad --use_gpu -s 128 192 256 384 --onnx_path ./optimized_fp16.onnx -t 100`	2022-10-25 09:21:01 -07:00
cloudhan	d82036dbbd	Add Pre- and Post-tuning API to allow pre- and postprocessing of params (#13411 ) Some ops use the same buffer for input and output, so they update it in place. If we blindly tune over the `params`, updates accumulate in that buffer during FindFastest, which is an undesired side effect. In this case, we use a proxy params struct for the tuning to avoid this side effect.	2022-10-25 17:44:28 +08:00
Vincent Wang	b6a3562ffb	[ORTModule] Add Env Variable to Control Disabling Custom AutoGrad Function Support (#13430 ) Add an env variable to control disabling custom autograd function support. When using ORTModule, if the torch model has a torch.autograd.Function and the user confirms that it can be exported to ONNX (for example, by inlining PythonOp) and that the backward implementation matches the forward implementation, the user can export "ORTMODULE_DISABLE_CUSTOM_AUTOGRAD_SUPPORT=1" to disable the custom autograd support so that it won't use ORT's PythonOp to fall back to PyTorch. Exporting to ONNX can sometimes leverage graph optimizations in ORT so that perf is better.	2022-10-25 16:58:04 +08:00
Cheng	ea1bdb162f	[NNAPI] Refactor `Resize` as layout insensitive (#13412 )	2022-10-25 16:50:05 +08:00
cloudhan	93f7a97a6d	Exclude hipify option from policheck (#13431 )	2022-10-25 16:35:16 +08:00
PeixuanZuo	28f470c26c	[ROCm] Use SkipLayerNorm original implementation in kernel explorer (#13382 ) ### Description Wrap the original SkipLayerNorm implementation as a function. Use it as part of SkipLayerNormTunableOp. Use it in Kernel explorer to compare the gap between the TunableOp and the original implementation. The profile output looks like this: `float16 8 512 768 <class '_kernel_explorer.SkipLayerNorm_half_Original'> 23.48 us 804.04 GB/s float16 8 512 768 <class '_kernel_explorer.SkipLayerNorm_half_Tunable'> 20.41 us 925.00 GB/s ...` Co-authored-by: peixuanzuo <peixuanzuo@linmif39a000004.zvflicr54joexhdgnhvmxrxygg.phxx.internal.cloudapp.net>	2022-10-24 22:00:24 -07:00
cloudhan	2748f38362	Drop hip_add_library (#13406 ) Switching to use CMake's builtin hip language support.	2022-10-25 12:57:48 +08:00
Yi Zhang	e160688a9b	Skip some failed models winml and training workflows on Windows CPU (#13407 ) ### Description 1. update model name structure in model_tests.cpp with source name. To avoid `Condition test_param_names.count(param_name) == 0 failed. Duplicate parameterized test name 'BERT_Squad_opset10_CPU'` 2. skip some failed models https://github.com/onnx/models/issues/568	2022-10-25 10:05:04 +08:00
sumitsays	24818cfd73	[DML EP] Attention Kernel (#13371 ) ### Description DML EP kernel for com.microsoft.attention operator. It has been implemented via DML_Graph. References for this implementation: 1. [Hugging Face Attention for BERT](`310340d0d0/src/transformers/models/bert/modeling_bert.py (L245-L284)`) 2. Chapter 3 of the O'Reilly book Natural Language Processing with Transformers, Revised Edition This PR also - includes a very tiny fix for QLinearSigmoid kernel, which is storing the temporary object into a named variable. - enables 4 L2 transformers LayerNorm, Gelu, MatMulScale, Attention. ### Motivation and Context - Why is this change required? What problem does it solve? It is one of the main operators used in Transformer-based models. It contributes to the overall perf of DML EP for Transformer models. - If it fixes an open issue, please link to the issue here. N/A Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com> Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>	2022-10-24 14:32:37 -07:00
Yi Zhang	1885460776	skip some models failed in dynamic shape infer (#13400 ) ### Motivation and Context Some models from model zoo failed in the Linux CPU workflow. https://github.com/onnx/models/issues/562 Skip them temporarily. ### Verification Linux CPU CI passed with beta image https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=789772&view=results 2022-10-21T13:31:17.6740348Z Skip symbolic shape inference on : /mnt/vss/_work/1/b/Release/../models/zoo/opset12/Inception-1-int8/inception-v1-12-int8.onnx 2022-10-21T13:31:17.6740998Z Running symbolic shape inference on : /mnt/vss/_work/1/b/Release/../models/zoo/opset12/DenseNet-121-12-int8/densenet-12-int8.onnx 2022-10-21T13:31:17.6741618Z Running symbolic shape inference on : /mnt/vss/_work/1/b/Release/../models/zoo/opset12/MNIST-12/mnist-12.onnx 2022-10-21T13:31:17.6742207Z Skip symbolic shape inference on : /mnt/vss/_work/1/b/Release/../models/zoo/opset12/SSD-int8/ssd-12-int8.onnx 2022-10-21T13:31:17.6742898Z Running symbolic shape inference on : /mnt/vss/_work/1/b/Release/../models/zoo/opset12/ResNet50_fp32/resnet50-v1-12.onnx 2022-10-21T13:31:17.6743544Z Running symbolic shape inference on : /mnt/vss/_work/1/b/Release/../models/zoo/opset12/MobileNet v2-1.0-fp32/mobilenetv2-12.onnx 2022-10-21T13:31:17.6744259Z Running symbolic shape inference on : /mnt/vss/_work/1/b/Release/../models/zoo/opset12/ResNet101_DUC_HDC-12/ResNet101-DUC-12.onnx 2022-10-21T13:31:17.6744891Z Running symbolic shape inference on : /mnt/vss/_work/1/b/Release/../models/zoo/opset12/YOLOv3-12-int8/yolov3-12-int8.onnx 2022-10-21T13:31:17.6745501Z Running symbolic shape inference on : /mnt/vss/_work/1/b/Release/../models/zoo/opset12/AlexNet/bvlcalexnet-12.onnx 2022-10-21T13:31:17.6746114Z Running symbolic shape inference on : /mnt/vss/_work/1/b/Release/../models/zoo/opset12/ZFNet-512-int8/zfnet512-12-int8.onnx 2022-10-21T13:31:17.6746768Z Skip symbolic shape inference on : /mnt/vss/_work/1/b/Release/../models/zoo/opset12/SSD-MobilenetV1-12-int8/ssd_mobilenet_v1_12-int8.onnx	2022-10-25 01:48:46 +08:00
Yi Zhang	143725604e	Skip some models failed in Windows CPU C# tests (#13395 ) ### Description For models from model zoo, in C# tests of Windows CPU CI skip models whose name contains int8 or qdq. skip some models (VGG16, VGG19) in x86 workflow ### Motivation and Context These models always failed in Windows CPU C# tests (https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=789442&view=results) ### verified https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=789861&view=results C# tests passed	2022-10-22 13:54:24 +08:00
Jian Chen	397edf9918	Bumping up version number to 1.14.0 on main branch (#13401 ) ### Description Bumping up version number to 1.14.0	2022-10-21 19:16:44 -04:00
Ye Wang	928c9889a3	A few fixes for generative model ops (#13363 ) ### Description Fix a bug in GreedySearch Op when batch > 1 Support custom attention mask in GreedySearch and BeamSearch with GPT2	2022-10-21 15:00:18 -07:00
sumitsays	62cc927f05	[ORT+DML] Validate DML EP header files in ORT+DML NuGet package (#13359 ) ### Description Today, the ORT+DML NuGet package does not validate the existence of the DML EP header files and DML dlls. This change extends the existing python script to verify the existence of DML EP related headers. For DML as a dependent package, we will be using another task and it will be a separate PR. ### Motivation and Context - Why is this change required? What problem does it solve? Proactively verifies the ORT+DML release candidate, rather than having a customer raise an issue after it gets published to NuGet. - If it fixes an open issue, please link to the issue here. N/A Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com>	2022-10-21 11:10:26 -07:00
cloudhan	a8701c2a59	Test TunableOp GEMM and MatMul (#13378 ) 1. Extends `OpTester` class with builder pattern to ease the parameter passing. 2. Add run option `kOpTesterRunOptionsConfigTestTunableOp` for testing purpose and let rocm ep subscribe to it. 3. Use the new builder pattern interface to launch test, with tunable op tests enabled.	2022-10-21 16:44:41 +08:00
cloudhan	928c9fc348	Hipify during build instead of before cmake config (#13333 ) ### Description Currently, hipify happens before cmake is configured, and then cmake globs the directories. This gets rid of that customized python threading logic and opts for the build system itself to generate the files. This also supersedes the half-baked branch [sukha/hipify-with-cmake](https://github.com/microsoft/onnxruntime/tree/sukha/hipify-with-cmake)	2022-10-20 22:46:22 -07:00
Yi Zhang	bb16ee712e	skip 2 models in C# test (#13384 ) ### Motivation and Context these 2 models are also skipped in gtest `fc12abf6b1/onnxruntime/test/providers/cpu/model_tests.cc (L119-L122)`	2022-10-21 09:01:34 +08:00
George Wu	7a3486c3ee	enable arm32/arm64 target for .net apps built against OnnxRuntime.ML.OnnxRuntime (#13385 ) couldn't build arm64 .net app due to target file not allowing it.	2022-10-20 15:34:36 -04:00
Adam Louly	bed169192d	Windows build fix for on device training. (#13354 ) ### Description This is a fix for the on device training wheel build. ### Motivation and Context When building the Linux wheel, PathString is treated the same as std::string, but building the wheel on Windows fails because the std::string needs to be cast to a PathString. This error was found manually because there is no pipeline that uses the --enable_training_on_device for windows. Co-authored-by: Adam Louly <adamlouly@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2022-10-20 09:58:02 -07:00
Jian Chen	ac5948cb48	Fix bug for percentile calibration module. (#13376 ) ### Description Fix bug for percentile calibration module.	2022-10-20 12:33:07 -04:00
cloudhan	fc12abf6b1	Enable/Disable tunable GEMM by using tunable switch in provider options and env var (#13116 ) Related PRs #12853 This allows the user to enable/disable tunable GEMM on demand.	2022-10-19 22:35:08 -07:00
PeixuanZuo	4b2b588895	[ROCm] Fix azcopy issue on ROCm ci pipeline (#13365 ) ### Description Use a SAS token to fix the error `failed to perform copy command due to error: no SAS token or OAuth token is present and the resource is not public`. Generate a SAS token for the target data, add it to Key Vault, and use it as a pipeline variable. Co-authored-by: peixuanzuo <peixuanzuo@linmif39a000004.zvflicr54joexhdgnhvmxrxygg.phxx.internal.cloudapp.net>	2022-10-20 12:08:57 +08:00
cloudhan	24b25df641	Add verbose level log for TunableOp (#13369 )	2022-10-19 20:59:48 -07:00
PeixuanZuo	665fb346ab	[ROCm] set parallel=16 when build on ROCm CI (#13368 ) ### Description The ROCm CI build step takes more than one hour. Set parallel=16 when building on ROCm CI to reduce build time. Co-authored-by: peixuanzuo <peixuanzuo@linmif39a000004.zvflicr54joexhdgnhvmxrxygg.phxx.internal.cloudapp.net>	2022-10-20 11:36:00 +08:00
Vincent Wang	67150baa8d	[ORTModule] ATen Support for aten::upsample_nearest (#13364 ) ATen support for aten::upsample_nearest, which is required for Huggingface's diffusers model training using ORTModule.	2022-10-20 08:30:04 +08:00
Vincent Wang	b6b3f41636	Fixes of Hierarchical ORTModule and ORTModule PythonOp (#13347 ) The PR applies some fixes to Hierarchical ORTModule and ORTModule PythonOp. For Hierarchical ORTModule: - Don't wrap a module if the caller calls a function other than forward() - Support a single module instance being called multiple times with different types of inputs - Check whether a module can be wrapped from top to bottom instead of from bottom to top For ORTModule PythonOp: - Add env variable control to allow using torch.utils.checkpoint.CheckpointFunction - Add env variable control to skip registering some autograd functions so that there is no conflict for some models.	2022-10-20 08:16:03 +08:00
Adrian Lizarraga	418304743d	[EP-Perf-Dashboard] Update table schemas (#13327 ) Updates EP perf benchmarking scripts to upload new data with an improved table schema. In order to preserve compatibility with the current benchmarking pipeline, we still upload data that uses the old schema as well. These changes are required in order to improve data filtering capabilities and general UX in dashboards that visualize this data. Details: - EP names no longer hardcoded as columns for tables that store inference latency, session creation times, memory usage, and model/EP status. - Add explicit branch, commit ID, and commit date columns to all tables - Improvements to the docker image building scripts (simplify docker image build; support installing binary TensorRT packages) - Remove use of deprecated DataFrame.append in favor of pandas.concat.	2022-10-19 16:15:05 -07:00
Chi Lo	86c5c07ea4	TRT EP race condition fix during ep compile time (#13356 ) ### Description TRT EP can hit a race condition when multiple threads are doing engine serialization/deserialization during EP compile time. Say one thread is serializing the engine and has not yet written all the data to the file; if another thread finds that the engine file exists at this moment and begins to deserialize it, it will end up deserializing a corrupt file. The fix is to put a lock around engine deserialization/serialization, engine build and context build. ### Motivation and Context The TensorRT EP Windows CI sometimes fails because the `TensorrtExecutionProviderTest.MultiThreadsTestWithOneSessionSingleThreadInference` unit test fails (this PR changes the name to SessionCreationWithMultiThreadsAndInferenceWithMultiThreads). This is most likely due to the race condition. The TensorRT CI failure has also been reported [here](https://github.com/microsoft/onnxruntime/issues/13030)	2022-10-19 11:19:10 -07:00
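The locking fix described in the TRT EP commit above can be illustrated with a minimal Python sketch. This is not the TensorRT EP code (which is C++); `load_or_build_engine`, `run_demo`, and the cache file name are hypothetical names chosen for illustration. The sketch only shows the general idea: the "does the cache file exist?" check, the read, and the write all happen under one lock, so no thread can read a half-written file.

```python
import os
import tempfile
import threading

# One process-wide lock guarding both serialization and deserialization
# (hypothetical name; stands in for the lock added in the commit).
_engine_lock = threading.Lock()


def load_or_build_engine(cache_path, build):
    """Return cached engine bytes, building and caching them if absent."""
    with _engine_lock:
        if os.path.exists(cache_path):
            # Safe: no other thread can be mid-write while we hold the lock.
            with open(cache_path, "rb") as f:
                return f.read()
        data = build()
        with open(cache_path, "wb") as f:
            f.write(data)
        return data


def run_demo(num_threads=8):
    """Start several threads that all request the same cached engine."""
    results = []
    builds = []

    def build():
        builds.append(1)  # count how many times the engine is built
        return b"engine-bytes" * 1000

    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "model.engine")

        def worker():
            results.append(load_or_build_engine(path, build))

        threads = [threading.Thread(target=worker) for _ in range(num_threads)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return results, len(builds)
```

With the lock in place, the engine is built once and every thread observes the complete file contents. A single coarse lock serializes all engine loads, which trades some parallelism for simplicity.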
Scott McKay	565da71275	Make 'env' argument to Session const (#13362 ) ### Description The Env argument does not need to be mutable to call the underlying C API. Update the Ort::Session ctor to have a const Env. All other changes are from clang-format running. ### Motivation and Context Cleanup	2022-10-19 14:23:24 +10:00
Vincent Wang	9efa8e20bb	Add Symbolic Shape and Type Infer for aten::group_norm (#13348 ) Add symbolic shape and type infer for aten::group_norm.	2022-10-19 10:37:33 +08:00
Edward Chen	2fa18ea77e	[React Native CI] Record more info to debug E2E test (#13329 ) Record more info from the React Native CI E2E test. In particular, log the view hierarchy when exiting the test and dump logs from Android emulator to the build output.	2022-10-18 17:21:28 -07:00
Dmitri Smirnov	9189ebb415	Optimize slicing when possible by copying bigger blocks at once (#13261 ) ### Description Currently, SliceIterator copies inner dimension size at once at best. However, there are many slices when several inner dimensions can be copied at once. Furthermore, even if a dimension is sliced, it may employ step 1 and, therefore, has a continuous block of inner dimensions that can be copied at once. ### Motivation and Context For example, `[N, C, H, W]` with slice `[:, :, i:, :]` and `[N, C, H-i, W]`. Meaning, we slice along single axis, with step = 1. Current implementation does `C * (H-i) memcpy` with W elements each. With this change we can do `C memcpy with (H-i)*W` elements each. The optimization produces ~11% savings on certain internal models.	2022-10-18 14:41:46 -07:00
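The copy-count argument in the slicing commit above can be checked with a small pure-Python sketch. It is not the SliceIterator implementation; `slice_rows` and `slice_blocks` are hypothetical helpers operating on a flat, row-major list, and each list-slice copy stands in for one `memcpy`. The counts below include the batch dimension N, whereas the commit message quotes them per batch element.

```python
def slice_rows(src, shape, i):
    """Baseline: copy one innermost row (W elements) at a time."""
    n, c, h, w = shape
    out, copies = [], 0
    for b in range(n):
        for ch in range(c):
            for row in range(i, h):
                start = ((b * c + ch) * h + row) * w
                out.extend(src[start:start + w])  # one "memcpy" of W elements
                copies += 1
    return out, copies


def slice_blocks(src, shape, i):
    """Optimized: rows i..H-1 of a channel are contiguous, so copy them at once."""
    n, c, h, w = shape
    block = (h - i) * w
    out, copies = [], 0
    for b in range(n):
        for ch in range(c):
            start = ((b * c + ch) * h + i) * w
            out.extend(src[start:start + block])  # one "memcpy" of (H-i)*W elements
            copies += 1
    return out, copies


shape = (2, 3, 5, 4)  # N, C, H, W
data = list(range(2 * 3 * 5 * 4))
rows_out, rows_copies = slice_rows(data, shape, 2)
blocks_out, blocks_copies = slice_blocks(data, shape, 2)
```

Both functions produce the same output for the slice `[:, :, 2:, :]`; the row-wise version issues N * C * (H - i) copies and the block-wise version N * C.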
Dmitri Smirnov	f5e3165cc3	Fix move Base::operator= (#13355 ) ### Description Base::operator= move is broken, loses a valid ptr. ### Motivation and Context Address https://github.com/microsoft/onnxruntime/pull/13215#discussion_r997814275	2022-10-18 13:07:40 -07:00
Jake Mathern	f96f222526	Change CPU EP behavior with auto_pad when ConvTranspose output shape is specified. (#13311 ) ### Description Based on the ORT spec for ConvTranspose: ``` output_shape can also be explicitly specified in which case pads values are auto generated using these equations: total_padding[i] = stride[i] * (input_size[i] - 1) + output_padding[i] + ((kernel_shape[i] - 1) * dilations[i] + 1) - output_shape[i] If (auto_pads == SAME_UPPER): pads[start_i] = total_padding[i]/2; pads[end_i] = total_padding[i] - (total_padding[i]/2) Else: pads[start_i] = total_padding[i] - (total_padding[i]/2); pads[end_i] = (total_padding[i]/2). ``` However the CPU EP logic differs. Basically, unless SAME_UPPER is specified, the default behavior (for VALID,NOTSET,SAME_LOWER) should be SAME_LOWER. I think this is the pragmatic fix, however it's perhaps still not totally up to standard. In the case tested, the operator is actually only valid if padding is inserted. Perhaps it "should" throw some error then, if auto_pad is not SAME_UPPER or SAME_LOWER, as the spec also mentions: "VALID mean no padding." (For convtranspose-1 but this was removed in convtranspose-11, making it less clear.) "NOTSET, which means explicit padding is used" (should technically require explicit padding then, and not generate it) HOWEVER, changing it to throw errors could do more harm than good. For now, probably just best to make it consistent. ### Motivation and Context We noticed that there was a discrepancy in one of the DML tests between CPU and DML. auto_pad is not specified, and DML is doing SAME_LOWER behavior by default, where CPU EP is doing SAME_UPPER behavior. 
```json { "graph_name": "ConvTranspose output_shape with even strides odd kernel autopad NOTSET", "op_type": "ConvTranspose", "dilations": [1,1], "group": 1, "strides": [2,2], "kernel_shape": [3,3], "output_shape": [1,1,4,4], "X": {"dims": [1,1,2,2], "function": "iota"}, "W": {"dims": [1,1,3,3], "value": [1,2,3,4,5,6,7,8,9]}, "B": [1], "Y": {"dims": [1,1,4,4], "value": [1,5,6,7,5,17,15,19,11,25,16,19,17,40,25,28]}, "T": "float32" } ```	2022-10-18 12:57:47 -07:00
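The padding equations quoted in the ConvTranspose commit above can be evaluated directly. The following is a minimal Python sketch of those equations for a single spatial axis, not the CPU EP code; `conv_transpose_pads` is a hypothetical helper, and it assumes `total_padding` is non-negative and `output_padding` defaults to 0.

```python
def conv_transpose_pads(input_size, stride, kernel, dilation, output_size,
                        auto_pad="NOTSET", output_padding=0):
    """Return (pad_start, pad_end) for one axis when output_shape is given."""
    total_padding = (stride * (input_size - 1) + output_padding
                     + ((kernel - 1) * dilation + 1) - output_size)
    half = total_padding // 2
    if auto_pad == "SAME_UPPER":
        # Extra padding (if total is odd) goes at the end.
        return half, total_padding - half
    # Every other mode behaves like SAME_LOWER: extra padding at the start.
    return total_padding - half, half


# Values from the test case in the commit: X is 2x2, strides 2, kernel 3,
# dilations 1, requested output 4x4.
upper = conv_transpose_pads(2, 2, 3, 1, 4, auto_pad="SAME_UPPER")
notset = conv_transpose_pads(2, 2, 3, 1, 4, auto_pad="NOTSET")
```

For this case `total_padding` is 2 * (2 - 1) + 0 + 3 - 4 = 1, so SAME_UPPER gives pads (0, 1) while NOTSET gives (1, 0). An odd total is exactly the situation in which the two behaviors diverge, which is why the CPU and DML results differed.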
Hariharan Seshadri	15673b4537	Revert "Fix shape-related issues in FuseConv (#12410 )" (#13353 ) The commit causes subtle perf regressions in image models (caught by Anubis). Since we are close to the release, reverting this change for now so that the regression cause analysis doesn't push the release timeline. Once the PR is merged, I will re-open the GH issues that the original PR closed. ### Motivation and Context Fix regression in ORT 1.13 RC	2022-10-18 12:30:38 -07:00
Jian Chen	e3982416d3	Fix Bug where zero point isn't correct under entropy calibration (#13346 ) ### Description Fix Bug where zero point isn't correct under entropy calibration	2022-10-18 12:05:40 -04:00
Adam Louly	61ee5585b2	update the nightly build to use the latest ptca image. (#13309 ) ### Description updating the ptca image used in the nightly pipeline Co-authored-by: Adam Louly <adamlouly@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2022-10-17 14:12:03 -07:00
Adam Louly	68eff69ab1	Add Utils for federated learning scenarios (#13014 ) Description: utils for federated learning. Motivation and Context - This PR includes utils that will be used on federated learning scenarios. - Exposing python bindings to some utils, and added a util to calculate the difference between two buffers. Co-authored-by: Adam Louly <adamlouly@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net> Co-authored-by: Baiju Meswani <bmeswani@microsoft.com>	2022-10-17 12:39:43 -07:00
PeixuanZuo	b4853a978a	[ROCm] add rocm python package pipeline with --use_rocm_profiling (#13068 ) ### Description ROCm developers always need to build the onnxruntime whl with `--enable_rocm_profiling`. Add a ROCm dev python package pipeline which produces a .whl with build args `--enable_rocm_profiling`. The dev whl is uploaded to Azure storage and can be downloaded from https://download.onnxruntime.ai/onnxruntime_nightly_rocm53.profiling.html	2022-10-17 10:11:20 +08:00
cloudhan	c4d3c7003f	Refactor provider test utils (#13272 ) Refactor the OpTester core logic to make adding more code easier.	2022-10-17 09:46:42 +08:00
Dmitri Smirnov	4a63cd0290	Improve thread pool creation failure handling. (#13313 ) ### Description Detect and report thread creation failure on Windows. Do not throw out of constructor after the thread is created, the thread handle is lost and cannot be joined, resulting in a deadlock. Make setting a thread priority on Linux consistent with windows. Set thread priority in the thread itself. Log failure properly, but do not exit the thread. ### Motivation and Context Address issues https://github.com/microsoft/onnxruntime/issues/13291 And https://github.com/microsoft/onnxruntime/issues/13285#issuecomment-1278063223	2022-10-15 17:57:19 -07:00
Maxiwell S. Garcia	1ab11a111c	ppc64le: mlas: fix both MaximumFloat and MinimumFloat to return NAN (#12628 ) Avoid using vec_max/vec_min because their behaviors are undefined if one of the elements is NAN. The Power Vector Intrinsic Programming Reference says: "For floating-point types, if both source elements contain signed zeros, or if either source element contains a NaN, it is undefined which of the two source elements is copied into the corresponding result element." As the unittest Activation.ShortExecute expects NAN, this patch uses vec_sel and vec_cmpgt to return NAN if one of the elements is NAN. https://git.openpower.foundation/systemsoftware/Programming-Guides/src/branch/master/Intrinsics_Reference/ch_vec_reference.xml#L26808	2022-10-14 14:43:58 -07:00
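The compare-and-select idea in the ppc64le commit above has a simple scalar analogue. The sketch below is plain Python, not the MLAS vector code, and `nan_propagating_max` / `nan_propagating_min` are hypothetical names. It contrasts Python's built-in `max`, whose result with a NaN operand depends on argument order, with a version that returns NaN whenever either operand is NaN.

```python
import math

NAN = float("nan")


def nan_propagating_max(a, b):
    """Maximum of a and b that returns NaN if either operand is NaN."""
    if a != a:  # NaN is the only value not equal to itself
        return a
    if b != b:
        return b
    return a if a > b else b  # plain compare-and-select for ordinary values


def nan_propagating_min(a, b):
    """Minimum of a and b that returns NaN if either operand is NaN."""
    if a != a:
        return a
    if b != b:
        return b
    return a if a < b else b


# The built-in max is order-dependent when a NaN is involved:
builtin_first = max(1.0, NAN)   # comparison NAN > 1.0 is False, so 1.0 is kept
builtin_second = max(NAN, 1.0)  # comparison 1.0 > NAN is False, so NaN is kept
```

The explicit NaN checks make the result independent of operand order, which is the property the unit test mentioned in the commit expects.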
fxmarty	4fe6b23699	Fix typo OpTypesToExcludeOutputQuantizatioin (#13096 ) Change all occurrences of `OpTypesToExcludeOutputQuantizatioin` into `OpTypesToExcludeOutputQuantization`	2022-10-14 14:11:37 -07:00
donglinb	c4a52820a5	bug fix for symbolic shape infer (#13067 )	2022-10-14 14:06:31 -07:00
Jeff Daily	65c67764ae	remove line "ADD model ${WORKSPACE_DIR}/model" in the amdgpu Dockerfile (#12914 ) Follow-up to #12707. docker build is broken otherwise; model dir is gone.	2022-10-14 13:17:28 -07:00
Ted Themistokleous	a561fde126	MIGraphX Execution Provider: Stream Synchronization (#12899 ) Description: Changes to the MIGraphX execution provider code to allow for stream synchronization on the GPU side Motivation and Context Performance boost by removing redundant host to device synchronizations The current implementation of the execution provider continuously calls hipDeviceSynchronize() between computations which adds overhead and an idle wait between the GPU's computations. This is noticeable during device This change leverages new functionality that's been added to MIGraphX to allow for GPU side synchronization which avoids the need for host->device waits. To maintain backwards compatibility with older MIGraphX versions, the compile time define MIGRAPHX_STREAM_SYNC has been added to the API to allow older versions to operate with newer builds of onnxruntime without loss of functionality to the current feature set as of (08/09/22) Co-authored-by: Ted Themistokleous <tthemist@amd.com>	2022-10-14 10:23:51 -07:00


7600 commits