onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-16 01:33:39 +00:00

Author	SHA1	Message	Date
BoarQing	e951f837e4	[VITISAI] fix out of bound error on graph with loop (#17065) ### Description Check the bounds of node_get_inputs to prevent an out-of-bounds error. ### Motivation and Context Models with a loop would encounter this error. Currently we do not support custom ops for loops, so ideally it should throw an error and fall back to CPU evaluation.	2023-08-09 18:38:30 -07:00
Hector Li	555f346923	[QNN EP] Enable DepthToSpace & SpaceToDepth Ops (#17038) ### Description [QNN EP] Enable DepthToSpace & SpaceToDepth Ops	2023-08-09 16:52:15 -07:00
Adrian Lizarraga	d793e239b0	[QNN EP] Increase tolerance for ReduceProd test on x64 Windows (#17078) ### Description Slightly increases the allowable error tolerance for ReduceProd tests on x64 Windows/Linux with the QNN CPU backend. ### Motivation and Context A recent [PR](https://github.com/microsoft/onnxruntime/pull/16916) updated the input range for ReduceProd tests, which uncovered an inaccuracy for ReduceProd on x64 Windows/Linux with the QNN CPU backend. This PR updates the allowable error tolerance and adds a TODO for investigation. This is needed to ensure the QNN_Nuget_Windows pipeline runs successfully.	2023-08-09 13:52:14 -07:00
Patrice Vignola	4bc2287a85	Fix GroupNorm tests failing when no providers are supported (#17054)	2023-08-09 13:14:13 -07:00
RandySheriffH	a7542f48d6	Make AzureEP default for python and c# packaging (#17025) Make AzureEP default for Python and C# packaging, with unit tests. --------- Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2023-08-09 12:36:52 -07:00
sfatimar	2c5d4dce77	Openvino ep ort 5.1 (#17042) OpenVINO EP ORT 5.1 branch changes for the new API to take in OpenVINO Provider Options, and compatibility with OV 2023.1. ### Motivation and Context The change is required for the new API to take in OpenVINO Provider Options and make it seamless. --------- Signed-off-by: MaajidKhan <n.maajid.khan@intel.com> Co-authored-by: saurabhintel0 <saurabh1.kale@intel.com> Co-authored-by: MaajidKhan <n.maajid.khan@intel.com> Co-authored-by: Suryaprakash Shanmugam <suryaprakash.shanmugam@intel.com> Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>	2023-08-09 11:50:10 -07:00
Chi Lo	7361c283c7	Add API for updating CUDA EP provider option user compute stream (#17037) Add a generic `UpdateCUDAProviderOptionsWithValue()` C API to update CUDA EP provider options whose data type is a pointer that can't be represented as a string. Note: Please see some comments on the similar [PR](https://github.com/microsoft/onnxruntime/pull/16965) for TRT EP.	2023-08-09 09:24:19 -07:00
cloudhan	a4902ee65b	[CUDA][ROCm] Allow allocating ScratchBuffer from TuningContext (#17028) By switching to the ORT native stream, we can allocate scratch buffers directly from the tuning context.	2023-08-10 00:05:10 +08:00
pengwa	6e6f582e08	Use full qualified name for PythonOp export (#17021) ### Use full qualified name for PythonOp export Originally, when there are duplicate-named torch.autograd.Function classes in different modules, for example `a.b.c.Gelu` vs. `d.e.func.<locals>.Gelu`, we by default throw an exception to make the user aware that we cannot distinguish the two Gelu classes, because during model export we did not record the module path. The workaround was `ORTMODULE_SKIPPED_AUTOGRAD_FUNCTIONS`, which we introduced to ignore duplicate-named Gelu classes that are not used by the model run. This obviously has limitations, for example when both Gelus are used in training. This PR finds a way to construct a fully qualified name. `def _export_pt_1_10(g, n, *args, **kwargs):` 1. In the exporter function, kwargs contains `name` and `module`. In the above example: `a.b.c.Gelu` --> name: `Gelu`, module: `a.b.c`; `d.e.func.<locals>.Gelu` --> name: `Gelu`, module: `d.e`. Using name and module is not enough to get a fully qualified name. In the second case, `d.e` is the module path, it contains a function called `func`, and inside that function there is a local autograd.Function named `Gelu` (many of our unit tests look like this). We can only get `d.e.Gelu`, which is not the correct fully qualified name. The reason is that `kwargs[name]` or `n.name` only returns the class's name, not the class's qualified name (note that `kwargs[module]` is correct). 2. `n` is a torch.Node: we can access `pyobj` to get the torch.autograd.Function's apply method instance, then use `._self` to get the torch.autograd.Function class. From the class we can get the `module` and the class's qualified name; joined together, they give the fully qualified name. With the above change, we no longer need to use `kwargs[name]` and `kwargs[module]`, and no longer need to check for naming conflicts or the `ORTMODULE_SKIPPED_AUTOGRAD_FUNCTIONS` env var.	2023-08-09 10:58:33 +08:00
Dmitri Smirnov	c424e42594	[C++] Correctly handle scalar inputs in reduction ops, enforce Transpose perm attribute matches input rank. (#17041) ### Description This PR addresses the following issues related to the use of the functions in ORT. - https://github.com/microsoft/onnxruntime/issues/16492 - https://github.com/microsoft/onnxruntime/issues/16997 - https://github.com/microsoft/onnxruntime/issues/14678 - Partially addresses https://github.com/microsoft/onnxruntime/issues/16813 The optimization case for a scalar input did not correctly recognize it as such. The Transpose kernel assumed that the `perm` attribute would always match the input tensor rank. ### Motivation and Context These issues cause crashes and erratic behavior.	2023-08-08 14:47:01 -07:00
Tianlei Wu	fb11c67368	Fix SkipLayerNorm for 2D input (#17014) Fix an obvious bug and two minor issues: (1) In packing mode, the input for SLN has two dimensions (introduced by #15283): [token_count, hidden_size]. The current code `element_count = input_dims[0] * sequence_length * hidden_size` computes element_count = token_count * hidden_size * hidden_size, which causes an invalid memory write in the CUDA kernel and crashes ORT. (2) Potential integer overflow in `static_cast<int>(element_count)`. (3) Some dead code after `return LaunchSkipLayerNormKernel` that never has a chance to run.	2023-08-08 14:04:03 -07:00
Chi Lo	73037978f8	Add PerThreadContext for TRT EP (#16599) The TRT [doc](https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#threading) suggests maintaining one execution context per thread to avoid synchronization issues. With the previous TRT EP, we did see synchronization issues when running with multiple threads on some models, for example FasterRCNN. This PR leverages the per-thread context implementation from the CUDA EP. The modifications are as follows: - Move CUDA graph and IExecutionContext objects to the per-thread context. - Remove the lock_guard previously placed around the whole compute_func(), and put lock_guards in the blocks where multiple threads may update kernel function state, access one builder, create/serialize/save an engine, save a profile, and serialize/save the timing cache. - On CentOS, don't unload the TRT EP shared library and leave it loaded, so that the destructor of thread-local data is still accessible when threads exit. Note: Tested this PR with onnxruntime_perf_test; the overhead of PerThreadContext is small.	2023-08-08 13:02:34 -07:00
Arthur Islamov	c3f04251c7	[js/web] JSEP LayerNormalization and InstanceNormalizations kernels (#16830) ### Description Added two kernels, for LayerNormalization and InstanceNormalization. Also added maximum limits for `maxBufferSize` when requesting the GPU device, because by default it is limited to 256 MB and allocating a 600 MB buffer fails when running fp32 StableDiffusion weights. ### Motivation and Context These two ops are used in StableDiffusion and many other networks.	2023-08-08 09:09:37 -07:00
Chi Lo	5b9bf8b663	[TensorRT EP] Fix bug for using correct device id for EP allocator (#17036) The code always used device id 0. Fixed to use the provider option `device_id_`.	2023-08-08 09:06:44 -07:00
Xavier Dupré	d0316ee768	Updating QDQ to support Float8E4M3FN (#16550) ### Description Naive update of the quantization tools to support Float8E4M3FN for Gemm.	2023-08-08 12:18:48 +02:00
Yi-Hong Lyu	e48dc3b281	Parallelize Transpose (#16854) It gives up to a 5.6% improvement for the prompt and a 2.3% improvement for token generation in the LLaMA 7B case.	2023-08-07 14:25:53 -07:00
Chen Fu	3c10f027de	4b quantization for weights of LLMs (#16833) ### Description Blockwise 4b quantization for LLMs. 1. Introduce 4b block-wise quantization for linear layer weights. 2. Implement a matrix multiplication kernel for fp32 x int4. 3. Implement a special operator, MatMulFpQ4. 4. Implement a quantization tool that converts the MatMul operator to MatMulFpQ4 when the right-hand side is a 2D constant tensor. ### Motivation and Context Compress and accelerate LLMs. Q4GEMM benchmarks (time in ns): Q4GEMM/Q4Sym/M:1/N:4096/K:4096/Threads:8 = 218054; Q4GEMM/Q4Sym/M:1024/N:4096/K:4096/Threads:8 = 35830155; Q4GEMM/Q4Sym/M:2048/N:4096/K:4096/Threads:8 = 73479790; Q4GEMM/Q4Zp8/M:1/N:4096/K:4096/Threads:8 = 270152; Q4GEMM/Q4Zp8/M:1024/N:4096/K:4096/Threads:8 = 35826721; Q4GEMM/Q4Zp8/M:2048/N:4096/K:4096/Threads:8 = 73021200; Q4GEMM/Q4Sym128/M:1/N:4096/K:4096/Threads:8 = 213832; Q4GEMM/Q4Sym128/M:1024/N:4096/K:4096/Threads:8 = 36749874; Q4GEMM/Q4Sym128/M:2048/N:4096/K:4096/Threads:8 = 72618120. SGEMM benchmarks (time in ns): SGEMM/LLM/M:1/N:4096/K:4096/Threads:8 = 522610; SGEMM/LLM/M:1024/N:4096/K:4096/Threads:8 = 39237689; SGEMM/LLM/M:2048/N:4096/K:4096/Threads:8 = 75983467. --------- Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-08-07 12:23:55 -07:00
Khalia Spear	4e6ea730d6	Broadcasting for SLN for CPU and CUDA (#16510) ### Description Enhanced SkipLayerNorm by implementing broadcasting for both CPU and CUDA. ### Motivation and Context The input and skip tensors no longer have to be the same size: the skip tensor can have the same shape as the input, a shape of {1, sequence_length, hidden_size}, or a shape of {sequence_length, hidden_size}. --------- Co-authored-by: Tianlei Wu <tlwu@microsoft.com>	2023-08-07 09:55:42 -07:00
pengwa	3649376f09	Fix few small bugs (#17019) ### Fix few bugs 1. Symbolic shape inference: there was no None check before getting the length. 2. Rename the PythonOp/PythonOpGrad attribute `name` to `func_name`; otherwise, when we use onnx.helper.make_node to create the node, the `name` attribute conflicts with the node name. 3. Filter shape inference warnings for PythonOp for torch 2.0 or newer. 4. Close file descriptors for log suppression. Without the fix, two extra fds are left open after log suppression exits its context. Before entering log suppression (left), before exiting log suppression (right): ![image](https://github.com/microsoft/onnxruntime/assets/10530022/3cd3057a-59f9-4c89-8359-d9b32c49a17e) With the fix, no fd is added after the context exits. ![image](https://github.com/microsoft/onnxruntime/assets/10530022/03454a8f-ab48-4552-bb9b-293a4f51be67)	2023-08-07 14:01:36 +08:00
Chi Lo	a451318820	Refactor TRT EP error message with details (#17007) If users use `trt_profile_min_shapes`, `trt_profile_max_shapes` and `trt_profile_opt_shapes`, they need to provide all the dynamic shape inputs with associated shape profiles. If the main graph is partitioned into TRT/CUDA subgraphs and the input of a TRT subgraph is also a dynamic shape input, users need to provide its shape profiles as well. Users might not notice this, so the TRT EP now tells them which input shape profiles need to be provided. The new error message is: ``` Traceback (most recent call last): File "/home/azureuser/disk2/debug/optional_inputs.py", line 218, in <module> test_optional_input_dynamic(trt_profile=True, optional=True) File "/home/azureuser/disk2/debug/optional_inputs.py", line 195, in test_optional_input_dynamic session = ort.InferenceSession( File "/home/azureuser/anaconda3/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 419, in __init__ self._create_inference_session(providers, provider_options, disabled_optimizers) File "/home/azureuser/anaconda3/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 471, in _create_inference_session sess.initialize_session(providers, provider_options, disabled_optimizers) onnxruntime.capi.onnxruntime_pybind11_state.EPFail: [ONNXRuntimeError] : 11 : EP_FAIL : User needs to provide all the dynamic shape inputs with associated profiles if they want to explicitly set profiles through provider options. Please note that main graph could be partitioned into TRT/CUDA/CPU subgraphs, in this case, user also needs to provide shape profiles for the TRT subgraph's input if it's dynamic shape input. Following input(s) has no associated shape profiles provided: x1 ``` See this GitHub issue: https://github.com/microsoft/onnxruntime/issues/16600	2023-08-06 09:04:21 -07:00
Sheil Kumar	78a5f049f4	[DML] Model corrupter during layernorm fusion and DmlNonZeroOperator crashes (#16918) [DML] Model corrupter during layernorm fusion and DmlNonZeroOperator crashes Two issues fixed in this PR: 1) Changes to layernorm fusion regressed DirectML. The fusion has been disabled for DML to unblock models. 2) DmlNonZero needs to create an operator call that needs to know the number of non-zero elements (size in bytes). Therefore it needs to be allocated during compute, but it was being allocated during initialization. This causes the output tensor size to mismatch the operator's expectations. --------- Co-authored-by: Sheil Kumar <sheilk@microsoft.com>	2023-08-04 17:44:54 -07:00
Yifan Li	d6ce43db5e	[EP Perf] MemTest: Add Valgrind and fix addressSanitizer (#16930) ### Description 1. Add Valgrind to the existing ep_perf CI MemTest and parse ORT-TRT memory leak details 1. General Valgrind logs and logs related to ORT-TRT will be parsed in [CI artifacts](https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=334122&view=artifacts&pathAsName=false&type=publishedArtifacts) 1. Logic: 1. Run Valgrind with `onnxruntime-perf-test -e tensorrt` and export the log to `valgrind.log` 2. Check whether any `definitely lost` memory leak happened 1. For log paragraphs that show `definitely lost`, check whether they contain the keyword `TensorrtExecutionProvider`. 2. If so, extract these details to `ort_trt_memleak_detail.log`, and return `build failure` to EP Perf CI 3. Fix the existing addressSanitizer and sync the squeezenet testcase with the latest update from [ort-inference-example](https://github.com/microsoft/onnxruntime-inference-examples/blob/main/c_cxx/squeezenet/main.cpp) 1. Updates in short: upgrade main.cpp to use OrtTensorRTProviderOptionsV2 4. Reorder the 7-min MemTest to run ahead of the 9-hr model tests, and enable MemTest by default	2023-08-04 16:58:57 -07:00
Chi Lo	fc8003349e	Add API for updating TRT EP provider option user compute stream (#16965) Add a generic `UpdateTensorRTProviderOptionsWithValue()` C API to update TensorRT provider options whose data type is a pointer that can't be represented as a string.	2023-08-04 15:14:43 -07:00
satyajandhyala	7ad43d9564	[JS/Web] Fixed ArgMin and ArgMax and refactored (#17002) Fixed ArgMin and ArgMax and refactored them using functionality from the Reduce operator code. ### Description Removed code/functionality duplication and fixed some issues.	2023-08-04 12:59:36 -07:00
Adrian Lizarraga	191f98a00e	[QNN EP] Improve QDQ model accuracy tests (#16916) ### Description - Improves how unit tests measure the accuracy of QDQ models on QNN EP. - Adds tests for ops: Add, Mul, Abs<sup>1</sup>, And<sup>1</sup>, Or<sup>1</sup>, Ceil<sup>1</sup>, Cos<sup>1</sup> <sup>1</sup>: Not previously supported due to missing node unit handling. ### Motivation and Context The new approach for testing QDQ operator accuracy requires running 3 inferences: 1. float model on CPU EP (baseline) 2. QDQ model on CPU EP 3. QDQ model on QNN EP The unit tests check that running the QDQ model on QNN EP (3) is at least as accurate (+- small tolerance) as running the QDQ model on CPU EP (2). We measure accuracy by comparing to the baseline (1). This is essentially what we care about: is QNN EP as accurate as CPU EP? If not, it is worth investigating as a potential bug.	2023-08-04 12:15:27 -07:00
Edward Chen	f98d3f8a23	[CoreML EP] Enable inputs with dynamic shape (#16915) Enable node inputs with dynamic shape to be handled by the CoreML EP.	2023-08-03 18:15:00 -07:00
Tianlei Wu	a25d0d296b	Add --mask_type option to generate different format of attention mask in bert_perf_test.py (#16976) ### Description Add an option to generate different formats of attention_mask for testing transformers models: 1 - 1D mask index, the actual sequence length excluding padding; 2 - 2D attention mask, where value 0 means padding and 1 otherwise; 3 - 1D key lengths and cumulative sequence lengths of query and key.	2023-08-03 15:24:20 -07:00
Tianlei Wu	bda012a4b2	Scripts to convert model with MultiHeadAttention to packing mode (#16925) ### Description Update scripts for converting models with MultiHeadAttention to packing mode. - [x] Update symbolic shape inference for PackedMultiHeadAttention and GatedRelativePositionBias - [x] Update convert_to_packing_mode to handle models with MultiHeadAttention	2023-08-03 15:23:55 -07:00
Arthur Islamov	ea55700e1c	[js/web] JSEP Gather OP (#16855) ### Description Added a Gather op that works with both i32 and i64 indices, assuming that the values fall within the i32 range. The assumption is safe because it's not possible to allocate a buffer larger than 2 GB for inputs. It treats all data from the input tensor as u32, copying 1 element per value, or 2 elements for i64, u64 and double. --------- Co-authored-by: Guenther Schmuelling <guschmue@microsoft.com>	2023-08-03 14:09:37 -07:00
Arthur Islamov	c11cffb565	[js/web] Fix typo in JSEP ConvTranspose (#16884) ### Description A typo fix in JSEP ConvTranspose. It used $12 as the output shape pointer, but it should be $13, as $12 holds the shape size.	2023-08-03 09:46:18 -07:00
Wei-Sheng Chin	e6c9ed0606	More element types in AllGather and AllToAll (#16941) Three things are done in this PR. - [2nd commit] More tensor element types are supported, because in distributed computation we need to re-shard tensors of many different types. - [1st commit] We now specify the opset version in test models. Without this change, those models get opset=20 with the latest ONNX, which results in test errors. - [3rd commit] Tests are modified to test `AllGather` and `AllToAll` for boolean tensors. Several graph patterns were tried for tests. We found that `int64_tensor -> Cast -> bool_tensor -> AllToAll -> bool_tensor -> Cast -> int64_tensor` always generates random results. My guess is that `AllToAll` needs to synchronize all GPUs before calling `ncclSend` and `ncclRecv`, since `AllGather` doesn't hit this problem. To reproduce the error, search for `TODO` in this PR. Note that this PR doesn't fix it.	2023-08-03 09:31:55 -07:00
BoarQing	b8bbc898c6	fix errors for node with empty name for vitis ai (#16949) ### Description Fixed the issue of finding nodes with empty names for Vitis AI. ### Motivation and Context It is required because we encountered this error when testing newly created models.	2023-08-02 19:08:49 -07:00
Dmitri Smirnov	246cb3a197	Simplify Shrink, replace Eigen in Sign implementation (#16975) ### Description Simplify Shrink. In Sign, replace the Eigen code with code that does not require fp16 conversion.	2023-08-02 18:24:38 -07:00
Guenther Schmuelling	0df2e14038	js/webgpu: argmax,argmin,softmax support (#16882) ArgMax and ArgMin are similar to Reduce. Eventually we need to add optimized flavors of the shader. Softmax is optimized but only works on the last axis for now, which should be the common use case. TODO: enable more unit tests for ArgMax/ArgMin.	2023-08-02 18:16:19 -07:00
Hariharan Seshadri	506ddb3d5d	[js/WebGPU] Support int32 Transpose in WebGPU (#16952)	2023-08-02 16:27:24 -07:00
BoarQing	6361b22103	vitis ai support generic data type (#16902) ### Description Support more data types for Vitis AI. ### Motivation and Context It is required because the models we are testing now have the uint8 data type. To solve this once and for all, we changed the code to support generic data types.	2023-08-02 15:56:39 -07:00
satyajandhyala	d399648869	[JS/Web] Added Resize kMSInternalNHWCDomain domain registration. (#16946) ### Description Added Resize NHWC domain kernel registration.	2023-08-02 14:16:21 -07:00
Tianlei Wu	76aff63f37	Update bert_perf_test to test inputs with different padding ratio (#16963) Add --average_sequence_length and --random_sequence_length so that we can test the performance of the model at different padding ratios.	2023-08-02 10:28:39 -07:00
RandySheriffH	c392fdeb1b	RunAsync Python API (#16760) Implement python binding for RunAsync API. --------- Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2023-08-02 10:15:34 -07:00
satyajandhyala	f8d933df31	[JS/Web] Register JSEP contrib ops only once per process. (#16950) ### Description Register contrib ops only once. ### Motivation and Context Fixes the earlier commit that added the Gelu contrib op to JSEP.	2023-08-02 00:27:11 -07:00
Wanming Lin	ba49d64f67	[WebNN EP] Support LpPool, GlobalLpPool, and Log ops (#16954) Also reset the minimal supported opset to 1, because a minimal supported opset of 7 ignores all ops whose latest since-version is less than 7, e.g. GlobalLpPool, which only has two opset versions: 1 and 2.	2023-08-01 22:35:10 -07:00
zesongw	5912837791	[WebNN EP] Fix bug when Pad has negative padding value. (#16878) Padding values in ONNX Pad can be negative, which indicates removing pixels. The WebNN EP cannot support such an operation, so it uses slice to handle this case.	2023-08-01 19:41:02 -07:00
Tianlei Wu	50bf310dea	[CUDA] RelativePositionBias supports input with padding removed (#16923) Update RelativePositionBias to support input with padding removed. - [x] add bias transpose kernel - [x] add test - [x] update operator document	2023-08-01 16:39:09 -07:00
Tianlei Wu	1fbd1ed179	[CUDA] PackedMultiHeadAttention support Bias and separated Q, K and V inputs (#16913) ### Description Follow-up change for PackedMultiHeadAttention added in https://github.com/microsoft/onnxruntime/pull/16779: - [x] Add Bias input - [x] Add CUDA kernels to support separate query, key and value inputs. - [x] Update operator documents - [x] Add unit tests	2023-08-01 15:30:41 -07:00
Patrice Vignola	49512e558a	[DML EP] Add I/O binding and `If` operator (#16859) Being able to leverage I/O binding for DML and registering `If` for the DML EP allows us to avoid copying the past/present key/values back and forth between the CPU and the GPU after every token. This gives us a 25% performance increase for Dolly V2 with 128 tokens on an RTX 4090.	2023-07-31 19:45:59 -07:00
kunal-vaishnavi	3c72f43f78	Extend saving models optimized by inference session (#16912) ### Description This PR adds support for saving model optimizations after loading a model that contains external data into an `InferenceSession`. ### Motivation and Context This PR is a follow-up to a [previous PR](https://github.com/microsoft/onnxruntime/pull/16716) for saving a model optimized by an `InferenceSession`.	2023-07-31 16:39:35 -07:00
satyajandhyala	77b2b618b2	[JS/WebGPU] Add Resize operator (#16680) ### Description Implemented Resize operator support in JSEP.	2023-07-31 09:35:06 -07:00
Hector Li	3fd1d3b9bd	Improve graph transformer DoubleQDQPairsRemover (#16910) Improve graph transformer DoubleQDQPairsRemover ### Description Improve DoubleQDQPairsRemover to not reset the scale & zero point if the existing values are the same on the target DQ & Q nodes. ### Motivation and Context Fix a bug where DoubleQDQPairsRemover reset the scale value while removing unnecessary DQ & Q nodes.	2023-07-31 09:24:46 -07:00
satyajandhyala	dd24d52737	[JS/Web] Added Gelu contrib operator support to JSEP (#16909) ### Description Added Gelu operator to JSEP.	2023-07-31 09:18:58 -07:00
Tianlei Wu	92b6e10d37	skip test_smooth_quant to unblock Python Package Pipeline (#16914) ### Description The Python Package Pipeline failed because an exception was raised in test_smooth_quant (from #16288): ``` File "/home/cloudtest/.local/lib/python3.8/site-packages/onnxruntime/quantization/quantize.py", line 384, in quantize_static importlib.import_module("neural_compressor.adaptor.ox_utils.smooth_quant") File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/__init__.py", line 24, in <module> from .contrib import * File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/contrib/__init__.py", line 19, in <module> from .strategy import * File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/contrib/strategy/__init__.py", line 26, in <module> __import__(basename(f)[:-3], globals(), locals(), level=1) File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/contrib/strategy/sigopt.py", line 22, in <module> from neural_compressor.strategy.strategy import strategy_registry, TuneStrategy File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/strategy/__init__.py", line 20, in <module> from .strategy import STRATEGIES File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/strategy/strategy.py", line 41, in <module> from ..algorithm import AlgorithmScheduler, ALGORITHMS File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/algorithm/__init__.py", line 20, in <module> from .algorithm import ALGORITHMS, Algorithm, AlgorithmScheduler, algorithm_registry File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/algorithm/algorithm.py", line 21, in <module> from neural_compressor.utils.create_obj_from_config import get_algorithm File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/utils/create_obj_from_config.py", line 20, in <module> from neural_compressor.metric import METRICS File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/metric/__init__.py", line 30, in <module> __import__(basename(f)[:-3], globals(), locals(), level=1) File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/metric/coco_tools.py", line 54, in <module> from pycocotools import coco File "/usr/local/lib/python3.8/dist-packages/pycocotools/coco.py", line 52, in <module> from . import mask as maskUtils File "/usr/local/lib/python3.8/dist-packages/pycocotools/mask.py", line 3, in <module> import pycocotools._mask as _mask File "pycocotools/_mask.pyx", line 1, in init pycocotools._mask ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject ``` The cause is that the pycocotools package uses "oldest-supported-numpy", which might result in an older numpy version being used to build pycocotools: `9e9164f979/PythonAPI/pyproject.toml (L4)` Related issue: https://github.com/cocodataset/cocoapi/issues/248	2023-07-29 11:24:28 -07:00

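The fully qualified naming described in "Use full qualified name for PythonOp export" (#17021) rests on standard Python class attributes. As a minimal sketch, with plain classes standing in for torch.autograd.Function subclasses and `full_qualified_name` as a hypothetical helper (not ORT's exporter code), `__module__` plus `__qualname__` tells a module-level `Gelu` apart from one defined inside a function, while `__name__` alone does not:

```python
def full_qualified_name(cls: type) -> str:
    """Join the defining module with the class's qualified name.

    __qualname__ keeps enclosing scopes such as 'func.<locals>.Gelu',
    which the short __name__ ('Gelu') drops.
    """
    return f"{cls.__module__}.{cls.__qualname__}"


class Gelu:  # stands in for a module-level autograd.Function subclass
    pass


def func():
    class Gelu:  # same short name, defined locally inside func()
        pass

    return Gelu


LocalGelu = func()

# Both classes report the same short name...
print(Gelu.__name__, LocalGelu.__name__)
# ...but their fully qualified names differ.
print(full_qualified_name(Gelu))
print(full_qualified_name(LocalGelu))
```

The second name ends in `func.<locals>.Gelu`, which is exactly the part that the short name and the module path cannot recover on their own.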
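The element-count bug fixed in "Fix SkipLayerNorm for 2D input" (#17014) is easy to see with concrete shapes. The sketch below is a Python illustration only (the real kernel code is C++). `buggy_element_count` is a hypothetical reconstruction of the pre-fix formula, assuming that for a 2D packed input the value used as `sequence_length` ends up being `hidden_size`; `element_count` and `checked_int32` are likewise hypothetical helpers:

```python
from math import prod

INT32_MAX = 2**31 - 1


def buggy_element_count(input_dims, hidden_size):
    # Hypothetical reconstruction of the pre-fix logic: it assumes a 3D
    # [batch, sequence_length, hidden_size] input and reads dim 1 as the
    # sequence length, which for a 2D packed input is actually hidden_size.
    sequence_length = input_dims[1]
    return input_dims[0] * sequence_length * hidden_size


def element_count(input_dims):
    # Count elements from the real shape, so 2D [token_count, hidden_size]
    # and 3D [batch, sequence_length, hidden_size] inputs both work.
    return prod(input_dims)


def checked_int32(count):
    # Refuse counts that would overflow a 32-bit signed int instead of
    # letting a narrowing cast silently wrap around.
    if count > INT32_MAX:
        raise OverflowError(f"element count {count} does not fit in int32")
    return count


packed_dims = [8, 768]  # [token_count, hidden_size]
print(buggy_element_count(packed_dims, hidden_size=768))
print(element_count(packed_dims))
```

For a packed [8, 768] input the pre-fix formula yields `hidden_size` (768) times the real element count, which is why the CUDA kernel wrote out of bounds; for 3D inputs both formulas agree.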
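The blockwise 4-bit weight quantization introduced in "4b quantization for weights of LLMs" (#16833) can be illustrated with a generic symmetric scheme in plain Python. This is not MLAS's actual data layout: the block size, the signed [-8, 7] range, the scale rule (largest magnitude in a block maps to 7), and the function names are all assumptions for illustration. Each block of weights gets its own scale:

```python
def quantize_blockwise_int4(values, block_size=32):
    """Quantize a flat list of floats to int4 codes with one scale per block."""
    codes, scales = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        max_abs = max(abs(v) for v in block)
        # Map the largest magnitude in the block to 7; an all-zero block
        # gets scale 1.0 so that we never divide by zero.
        scale = max_abs / 7 if max_abs > 0 else 1.0
        scales.append(scale)
        # Round to the nearest integer and clamp to the signed 4-bit range.
        codes.extend(max(-8, min(7, round(v / scale))) for v in block)
    return codes, scales


def dequantize_blockwise_int4(codes, scales, block_size=32):
    """Reconstruct approximate floats: each code times its block's scale."""
    return [q * scales[i // block_size] for i, q in enumerate(codes)]


weights = [0.1 * i - 3.2 for i in range(64)]
codes, scales = quantize_blockwise_int4(weights, block_size=32)
restored = dequantize_blockwise_int4(codes, scales, block_size=32)
worst = max(abs(w - r) for w, r in zip(weights, restored))
print(len(scales), worst)
```

Each weight's reconstruction error is bounded by half of its own block's scale, so a large value in one block does not coarsen the quantization of the other blocks.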
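The three-inference accuracy check from "[QNN EP] Improve QDQ model accuracy tests" (#16916) reduces to one comparison. In this minimal sketch the error metric (maximum absolute difference), the tolerance value, and the function names are assumptions, not the actual ORT test utilities; only the structure (float baseline on CPU EP, QDQ on CPU EP, QDQ on QNN EP) comes from the commit:

```python
def max_abs_error(reference, outputs):
    """Largest absolute difference between two equally sized output lists."""
    if len(reference) != len(outputs):
        raise ValueError("output lengths differ")
    return max(abs(r - o) for r, o in zip(reference, outputs))


def qnn_at_least_as_accurate(baseline, cpu_qdq, qnn_qdq, tolerance=1e-4):
    """Compare both QDQ runs against the float baseline (inference 1).

    Passes if the QNN EP error (inference 3) is no worse than the CPU EP
    error (inference 2), up to a small tolerance.
    """
    cpu_error = max_abs_error(baseline, cpu_qdq)
    qnn_error = max_abs_error(baseline, qnn_qdq)
    return qnn_error <= cpu_error + tolerance


baseline = [1.0, 2.0, 3.0]    # float model on CPU EP
cpu_qdq = [1.02, 1.99, 3.01]  # QDQ model on CPU EP
print(qnn_at_least_as_accurate(baseline, cpu_qdq, [1.01, 2.0, 3.0]))
print(qnn_at_least_as_accurate(baseline, cpu_qdq, [1.2, 2.0, 3.0]))
```

Measuring both EPs against the float baseline, rather than comparing the two QDQ outputs to each other, means the test does not fail when QNN EP happens to be more accurate than CPU EP.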
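The negative-padding fix in "[WebNN EP] Fix bug when Pad has negative padding value." (#16878) can be illustrated on a single axis. This is a minimal Python sketch of ONNX constant-mode Pad semantics, where a negative pad removes that many elements from the corresponding edge; `pad_1d` is a hypothetical helper, not WebNN or WebNN EP code, and applying the pad before the slice is an assumed ordering:

```python
def pad_1d(values, pad_begin, pad_end, constant=0):
    """Constant-mode pad along one axis, where negative pads crop elements."""
    # Step 1: apply only the non-negative pads, which a pad op that rejects
    # negative values can still handle.
    padded = (
        [constant] * max(pad_begin, 0)
        + list(values)
        + [constant] * max(pad_end, 0)
    )
    # Step 2: express negative pads as a slice that drops elements from the
    # corresponding end of the axis.
    start = -min(pad_begin, 0)
    stop = len(padded) + min(pad_end, 0)
    return padded[start:stop]


print(pad_1d([1, 2, 3, 4], 1, -2))  # [0, 1, 2]
print(pad_1d([1, 2, 3, 4], -1, 2))  # [2, 3, 4, 0, 0]
```

For an N-D tensor the same two steps would be applied per axis.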
5538 commits