onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-11 17:48:34 +00:00

Author	SHA1	Message	Date
Changming Sun	b129f425fc	Fix test model URL issue (#18823 ) ### Description ONNX model zoo changed their dir structure. So some our pipelines are failing. In prevent such things happening again, we'd better to read the test data for a cache from local disk instead of downloading it remotely every time.	2023-12-14 13:06:08 -08:00
Chi Lo	afe5cdc938	[TensorRT EP] Switch to enqueueV3 with support DDS output (copy version) (#18714 ) It's branched off from https://github.com/microsoft/onnxruntime/pull/17751 but removes KernelContext_SetOutput() API. It copies output allocation buffer to kernel context. --------- Co-authored-by: George Wu <jywu@microsoft.com>	2023-12-14 11:10:58 -08:00
Changming Sun	7386e21121	Replace some ORT_ENFORCE with ORT_THROW_IF_ERROR (#18812 ) ### Description Replace some ORT_ENFORCE with ORT_THROW_IF_ERROR to get better error messages.	2023-12-14 10:14:22 -08:00
Changming Sun	95193cb440	Set NDK version in Linux CPU Minimal Build E2E CI Pipeline (#18810 ) ### Description To upgrade the clang version in preparation for PR #17031 .	2023-12-14 08:08:41 -08:00
Yi Zhang	7dade5d05b	Readd basetargets in Microsoft.ML.OnnxRuntime.csproj (#18789 ) ### Description <!-- Describe your changes. --> ### Motivation and Context Now, the nightly Microsoft.ML.Onnxruntime.Managed Nuget Packag couldn't be added in dotnet console program in VS2022 with target framework .NET 6.0. I just restore it to previous setting to make it work.	2023-12-14 14:44:11 +08:00
Changming Sun	7047d13c68	Update windowsai-steps.yml: enable "/profile" linker flag (#18022 ) ### Description Update windowsai-steps.yml: enable "/profiling" linker flag for an internal requirement.	2023-12-13 19:47:04 -08:00
Suryaprakash Shanmugam	0723dcb8b5	OpenVINO Execution Provider with 2023.2 support (#18596 ) - Add support for OpenVINO 2023.2 - num_of_threads provider option is mapped to the CPU device property inference_num_threads of the CPU plugin, so users can control the #threads used for inference by the CPU - Logging in Debug mode now includes the runtime properties set for devices - Fix issue in using external weights through OpenVINO --------- Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>	2023-12-13 15:56:43 -08:00
Rachel Guo	f3fa045681	Enable MacOS build in ORT Objc Pod (#18786 ) ### Description <!-- Describe your changes. --> Add macos build for objc pod. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Follow up pr for #18550 --------- Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>	2023-12-13 13:50:42 -08:00
Ashwini Khade	487abcd25e	Update gradient ops tests (#18783 ) ### Description <!-- Describe your changes. --> TrainingSession has been deprecated for a while now, but the gradient ops tests are still using training session. This PR updates these tests to use inference session instead of training session. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> This will enable us to remove all the training session related deprecated code from the repo.	2023-12-13 11:26:52 -08:00
Changming Sun	17eaf9b053	Fix a build warning in SparseTensor code for 32-bit build configs (#18766 ) ### Description The warning is: ``` C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(88,54): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 2023-12-08T20:58:48.1812949Z with 2023-12-08T20:58:48.2144272Z [ 2023-12-08T20:58:48.2145285Z Derived=Eigen::Map<const Eigen::SparseMatrix<uint64_t,1,int64_t>,0,Eigen::Stride<0,0>> 2023-12-08T20:58:48.2801935Z ] 2023-12-08T20:58:48.2804047Z C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(82,8): message : while compiling class template member function 'void onnxruntime::contrib::`anonymous-namespace'::SparseToDenseCsr<uint64_t>::operator ()(const onnxruntime::contrib::`anonymous-namespace'::ComputeCtx &,const onnxruntime::SparseTensor &,const onnxruntime::Tensor &,onnxruntime::Tensor &) const' [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 2023-12-08T20:58:48.2806197Z C:\a\_work\1\s\include\onnxruntime\core/framework/data_types_internal.h(302,27): message : see the first reference to 'onnxruntime::contrib::`anonymous-namespace'::SparseToDenseCsr<uint64_t>::operator ()' in 'onnxruntime::utils::mltype_dispatcher_internal::CallableDispatchableHelper::Invoke' (compiling source file C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc) [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 2023-12-08T20:58:48.2871783Z C:\a\_work\1\s\include\onnxruntime\core/framework/data_types_internal.h(438,100): message : see reference to class template instantiation 'onnxruntime::contrib::`anonymous-namespace'::SparseToDenseCsr<uint64_t>' being compiled (compiling source file C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc) [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 2023-12-08T20:58:48.2893010Z C:\a\_work\1\s\include\onnxruntime\core/framework/data_types_internal.h(414,5): message : see reference to function template instantiation 'void onnxruntime::utils::MLTypeCallDispatcher<float,double,int32_t,uint32_t,int64_t,uint64_t>::InvokeWithLeadingTemplateArgs<Fn,onnxruntime::TypeList<>,onnxruntime::contrib::`anonymous-namespace'::ComputeCtx&,const T&,const onnxruntime::Tensor&,onnxruntime::Tensor&>(onnxruntime::contrib::`anonymous-namespace'::ComputeCtx &,const T &,const onnxruntime::Tensor &,onnxruntime::Tensor &) const' being compiled [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 2023-12-08T20:58:48.2894476Z with 2023-12-08T20:58:48.2911521Z [ 2023-12-08T20:58:48.2912457Z Fn=onnxruntime::contrib::`anonymous-namespace'::SparseToDenseCsr, 2023-12-08T20:58:48.3067840Z T=onnxruntime::SparseTensor 2023-12-08T20:58:48.3068863Z ] (compiling source file C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc) 2023-12-08T20:58:48.3195854Z C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(198,11): message : see reference to function template instantiation 'void onnxruntime::utils::MLTypeCallDispatcher<float,double,int32_t,uint32_t,int64_t,uint64_t>::Invoke<onnxruntime::contrib::`anonymous-namespace'::SparseToDenseCsr,onnxruntime::contrib::`anonymous-namespace'::ComputeCtx&,const T&,const onnxruntime::Tensor&,onnxruntime::Tensor&>(onnxruntime::contrib::`anonymous-namespace'::ComputeCtx &,const T &,const onnxruntime::Tensor &,onnxruntime::Tensor &) const' being compiled [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 2023-12-08T20:58:48.3197946Z with 2023-12-08T20:58:48.3198565Z [ 2023-12-08T20:58:48.3199093Z T=onnxruntime::SparseTensor 2023-12-08T20:58:48.3905678Z ] 2023-12-08T20:58:48.3907275Z C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(198,36): message : see the first reference to 'onnxruntime::utils::MLTypeCallDispatcher<float,double,int32_t,uint32_t,int64_t,uint64_t>::Invoke' in 'onnxruntime::contrib::SparseToDenseMatMul::Compute' [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 2023-12-08T20:58:48.3910999Z ##[warning]onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(88,43): Warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data 2023-12-08T20:58:48.3912734Z 182>C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(88,43): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 2023-12-08T20:58:48.3913414Z with 2023-12-08T20:58:48.3913660Z [ 2023-12-08T20:58:48.3914001Z Derived=Eigen::Map<const Eigen::SparseMatrix<uint64_t,1,int64_t>,0,Eigen::Stride<0,0>> 2023-12-08T20:58:48.3914499Z ] 2023-12-08T20:58:48.3914743Z qlinear_concat.cc 2023-12-08T20:58:48.3917082Z ##[warning]onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(92,74): Warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data 2023-12-08T20:58:48.3918624Z 182>C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(92,74): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 2023-12-08T20:58:48.5534583Z with 2023-12-08T20:58:48.5541266Z [ 2023-12-08T20:58:48.5542401Z Derived=Eigen::Map<const Eigen::Matrix<uint64_t,-1,-1,1,-1,-1>,0,Eigen::Stride<0,0>> 2023-12-08T20:58:48.5544914Z ] 2023-12-08T20:58:48.5548670Z ##[warning]onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(92,63): Warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data 2023-12-08T20:58:48.5552099Z 182>C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(92,63): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 2023-12-08T20:58:48.5553712Z with 2023-12-08T20:58:48.5555569Z [ 2023-12-08T20:58:48.5556779Z Derived=Eigen::Map<const Eigen::Matrix<uint64_t,-1,-1,1,-1,-1>,0,Eigen::Stride<0,0>> 2023-12-08T20:58:48.5558707Z ] 2023-12-08T20:58:48.5561428Z ##[warning]onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(93,90): Warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data 2023-12-08T20:58:48.5565624Z 182>C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(93,90): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 2023-12-08T20:58:48.5566354Z with 2023-12-08T20:58:48.5568185Z [ 2023-12-08T20:58:48.5569305Z Derived=Eigen::Map<Eigen::Matrix<uint64_t,-1,-1,1,-1,-1>,0,Eigen::Stride<0,0>> 2023-12-08T20:58:48.5571339Z ] 2023-12-08T20:58:48.5574864Z ##[warning]onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(93,77): Warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data 2023-12-08T20:58:48.5577866Z 182>C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(93,77): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 2023-12-08T20:58:48.5578562Z with 2023-12-08T20:58:48.5580399Z [ 2023-12-08T20:58:48.5581503Z Derived=Eigen::Map<Eigen::Matrix<uint64_t,-1,-1,1,-1,-1>,0,Eigen::Stride<0,0>> 2023-12-08T20:58:48.5583465Z ] 2023-12-08T20:58:48.5587661Z ##[warning]onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(88,54): Warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data 2023-12-08T20:58:48.5590705Z 182>C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(88,54): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 2023-12-08T20:58:48.5591396Z with 2023-12-08T20:58:48.5593220Z [ 2023-12-08T20:58:48.5593693Z Derived=Eigen::Map<const Eigen::SparseMatrix<int64_t,1,int64_t>,0,Eigen::Stride<0,0>> 2023-12-08T20:58:48.5595955Z ] ``` And the warning in #18195 ### Motivation and Context AB#22894 --------- Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>	2023-12-13 11:11:13 -08:00
Changming Sun	44054e7508	Move NuGet nightly package publishing job to a separated pipeline (#18801 ) ### Description Move NuGet nightly package publishing job to a separated pipeline. Before this change, it runs at the end of 'Zip-Nuget-Java-Nodejs Packaging Pipeline'. This PR moves it to a separate pipeline so that we can manually trigger this step for any branch(e.g. release branches).	2023-12-13 11:10:50 -08:00
Jiajia Qin	b30e721dc8	[js/webgpu] Provide a naive vectorized matmul algorithm (#18758 ) ### Description This PR provided a vectorized matmul algorithm. In most situations, we still go to the workgroup memory optimized matmul. But for some situations, like N and K are very small, using workgroup optimized matmul can't fully utilize the underlying hardware due to the 32x32 tile size. So for very small N/K, we switch to the naive vectorized matmul algorithm to improve the hardware execution unit usage. With this PR, matmul with input0: [1, 36864, 3], input1: [1, 3, 3], input2: [3] becomes less than 1 ms from 4.34 ms on Intel Gen9 GPUs.	2023-12-13 09:03:23 -08:00
Ted Themistokleous	1ad6eb1359	Add DynamicQuantizeLinear as supported OP (#18798 ) Supported added in MIGraphX. should be in operator list ### Description Simple change to add support to EP for DynamicQuantizeLinear ### Motivation and Context Changes added in MIGraphX. Should also be available in the EP to run models that are int8 quantized. Currently we fail and fallback ops to ROCm->CPU EPs Co-authored-by: Ted Themistokleous <tedthemistokleous@amd.com>	2023-12-13 16:25:56 +08:00
pengwa	dbe886abb3	Disable test_bert_result_with_layerwise_recompute (#18800 ) ### Disable test_bert_result_with_layerwise_recompute <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-12-13 12:16:39 +08:00
cloudhan	3940ef20be	[ROCm] Refactor to hide ck layout (Row/Col) from ORT interface (#18777 ) Previously, we use `ck::tensor_layout::gemm::RowMajor` or `ColumnMajor` to tag the template for correct dispatch. This is cumbersome in the case of CK is disabled. Switch to use the ORT BlasOp to tag the template and use `CKBlasOpAdaptor` to adapt between ORT BlasOp enum and ck's Col/Row. Just like what we have done for ORT datatype and ck datatype with `CKDataTypeAdaptor`.	2023-12-13 11:37:26 +08:00
satyajandhyala	0ca84549ab	[JS/Web] Added uniforms to Reduce, Resize and Split Ops. (#18727 ) ### Description <!-- Describe your changes. --> Added uniforms to Reduce op ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Improve perforamnce.	2023-12-12 11:12:23 -08:00
Adrian Lizarraga	81796a3081	[QNN EP Quantization] Add fusion preprocessing to QNN quantization (#18719 ) ### Description - Adds graph fusions to preprocessing step that can be called before creating a QDQ model for QNN EP. - Fuse Erf sequence to Gelu (adapted from [optimizer.py](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/fusion_gelu.py)). Required by QNN EP. - Fuse ReduceMean sequence to LayerNormaliation (adapted from [optimizer.py](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/fusion_layernorm.py)). Not required by QNN EP. - Fuse ReduceL2 sequence to LpNormalization (new, specific to QNN EP). Required by QNN EP. Example use: ```python3 from quantization.execution_providers.qnn import get_qnn_qdq_config, qnn_preprocess_model # Added by this PR: model_updated = qnn_preprocess_model("model.fp32.onnx", "model.fp32.preprocessed.onnx", fuse_layernorm=True) model_to_quantize = "model.fp32.preprocessed.onnx" if model_updated else "model.fp32.onnx" # Quantize model ... qnn_config = get_qnn_qdq_config(model_to_quantize, data_reader, activation_type=QuantType.QUInt16) quantize(model_to_quantize, "model.qdq.onnx", qnn_config) ``` ### Motivation and Context Allow more models to be quantized for use with QNN EP --------- Signed-off-by: adrianlizarraga <adlizarraga@microsoft.com>	2023-12-12 08:43:04 -08:00
BODAPATIMAHESH	65300610e2	[PowerPC] Type casting the output operand of vec_xst. (#18057 ) This fix resolves the build error “error: invalid parameter combination for AltiVec intrinsic ‘__builtin_vec_vsx_st’” which is coming up with the commit `dea425e7c1`.	2023-12-12 07:55:48 -08:00
satyajandhyala	d673e39ad8	[JS/WebGPU] Added uniforms to Tile and Where Ops (#18768 ) ### Description <!-- Describe your changes. --> Added uniforms to Tile and Where Ops ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Improve performance.	2023-12-11 20:58:52 -08:00
Jiajia Qin	b4be9e1bbb	[js/webgpu] Fix shader compilation errors in cumsum (#18779 ) ### Description This PR fixes below shader compilation errors: ``` Tint WGSL reader failure: :39:31 error: no matching overload for operator + (f32, i32) 5 candidate operators: operator + (T, T) -> T where: T is abstract-float, abstract-int, f32, i32, u32 or f16 operator + (vecN<T>, T) -> vecN<T> where: T is abstract-float, abstract-int, f32, i32, u32 or f16 operator + (T, vecN<T>) -> vecN<T> where: T is abstract-float, abstract-int, f32, i32, u32 or f16 operator + (vecN<T>, vecN<T>) -> vecN<T> where: T is abstract-float, abstract-int, f32, i32, u32 or f16 operator + (matNxM<T>, matNxM<T>) -> matNxM<T> where: T is abstract-float, f32 or f16 sum = sum + get_inputByIndices(inputIndices); ^ - While validating [ShaderModuleDescriptor "CumSum"] - While calling [Device].CreateShaderModule([ShaderModuleDescriptor "CumSum"]).	2023-12-11 18:11:38 -08:00
ivberg	a85ef652ed	Log out ORT session options (#16259 ) ### Description Logs out ORT session options as INFO if LogSeverityLevel is set high enough. Also log out ORT session options on Windows if the provider is enabled. The events are not Telemetry are will be emitted for local analysis (if enabled). [Microsoft.ML.ONNXRuntime](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/platform/windows/telemetry.cc#L47) - 3a26b1ff-7484-7484-7484-15261f42614d ### Motivation and Context ORT session options are key to understanding ORT behavior. This allows better diagnosability to see what the options are set to.	2023-12-11 17:56:27 -08:00
Caroline Zhu	eb03032925	[js/web/training] lazyResetGrad implementation (#18711 ) ### Description * implemented lazyResetGrad function ### Motivation and Context * we are in the process of adding language bindings to enable training on web * lazyresetgrad ensures that the gradients are calculated correctly after the first runTrainStep call --------- Co-authored-by: Ashwini Khade <askhade@microsoft.com>	2023-12-11 17:36:54 -08:00
pengwa	ccf3b2054b	Allow layer-wise recompute (#18566 ) ### Allow layer-wise recompute Early, we need users/developers to specify the subgraphs to recompute, now we introduced a more user-friendly way to enable recompute for all detected stashed activation recomputation subgraphs. This scarifies getting the best configs while makes it easier to support user requirements when they switches from PyTorch per-layer gradient checkpoint to ORTModule. `ORTMODULE_MEMORY_OPT_LEVEL` is introduced to control the usage, by default, it is 0, e.g. `USER_SPECIFIED`, all subgraphs definedin `ORTMODULE_MEMORY_OPT_CONFIG` will be recomputed. So this is compatible to existing recompute usage in ORTModule integrated models. Using `ORTMODULE_MEMORY_OPT_LEVEL=1`, we will enable all recompute plans detected, so those configs in `ORTMODULE_MEMORY_OPT_CONFIG` will not be respected any more. Add Unit Tests using 3 layer blooms. https://github.com/microsoft/onnxruntime/blob/pengwa/add_aggresive_recompute/docs/Memory_Optimizer.md	2023-12-12 08:44:05 +08:00
Chen Fu	68c832d53b	Fix buffer overrun in 4b dequant cuda (#18780 ) ### Description Bugfix: Dequantize4BitsKernel buffer overrun when the input matrix has less than the number of blocks that a single thread block can handle. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-12-11 15:05:41 -08:00
Jian Chen	ce1fed6ddf	Adding a new pipeline for publishing to Python Cuda 12 packages. (#18712 ) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-12-11 14:17:46 -08:00
Jian Chen	bfa5eb4591	Adding a new pipeline for pubilshing cuda 12 nuget packages (#18713 ) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-12-11 13:07:05 -08:00
Ashwini Khade	16df8377d3	Update transformers package to fix the security issue (#18730 ) ### Description Updating transformers package in test pipeline to fix a security vulnerability. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-12-11 09:15:23 -08:00
Patrice Vignola	8d641229e6	Fix GQA shape inference (#18723 ) The shape inference is always returning before getting the chance to infer the key/value outputs.	2023-12-10 21:36:19 -08:00
cloudhan	de32baeeef	[ROCm] Add GemmFloat8 (#18488 )	2023-12-11 11:37:29 +08:00
Xavier Dupré	d41dd77241	Extend API page on the python documentation (#18762 )	2023-12-09 15:33:57 -08:00
Abhishek Jindal	2f93d97fd0	Add cuda visible devices for Mistral benchmark (#18764 ) ### Description <!-- Describe your changes. --> Add cuda visible devices for Mistral benchmark as it is not working for Torch compile and throwing an error. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Error: File "/opt/conda/envs/ptca/lib/python3.8/site-packages/torch/_inductor/triton_heuristics.py", line 556, in run return launcher( File "<string>", line 8, in launcher RuntimeError: Triton Error [CUDA]: invalid device context	2023-12-08 23:12:48 -08:00
Changming Sun	c7799d7058	Build fixes for Windows ARM32 desktop build (#18752 ) ### Description Fix a link error: ``` onnxruntime_common.lib(cpuid_info.obj) : error LNK2019: unresolved external symbol __imp_RegGetValueA referenced in function "privat e: void __cdecl onnxruntime::CPUIDInfo::ArmWindowsInit(void)" (?ArmWindowsInit@CPUIDInfo@onnxruntime@@AAAXXZ) [C:\Users\snnn\src\on nxruntime\build\ARM32\RelWithDebInfo\onnx_test_runner.vcxproj] onnxruntime_common.lib(telemetry.cc.obj) : error LNK2019: unresolved external symbol __imp_EventRegister referenced in function "pub lic: __cdecl onnxruntime::WindowsTelemetry::WindowsTelemetry(void)" (??0WindowsTelemetry@onnxruntime@@QAA@XZ) [C:\Users\snnn\src\on nxruntime\build\ARM32\RelWithDebInfo\onnx_test_runner.vcxproj] onnxruntime_common.lib(telemetry.cc.obj) : error LNK2019: unresolved external symbol __imp_EventUnregister referenced in function "p ublic: virtual __cdecl onnxruntime::WindowsTelemetry::~WindowsTelemetry(void)" (??1WindowsTelemetry@onnxruntime@@UAA@XZ) [C:\Users\y ilyu\src\onnxruntime\build\ARM32\RelWithDebInfo\onnx_test_runner.vcxproj] onnxruntime_common.lib(telemetry.cc.obj) : error LNK2019: unresolved external symbol __imp_EventSetInformation referenced in functio n "public: __cdecl onnxruntime::WindowsTelemetry::WindowsTelemetry(void)" (??0WindowsTelemetry@onnxruntime@@QAA@XZ) [C:\Users\snnn\ src\onnxruntime\build\ARM32\RelWithDebInfo\onnx_test_runner.vcxproj] onnxruntime_common.lib(telemetry.cc.obj) : error LNK2019: unresolved external symbol __imp_EventWriteTransfer referenced in function _tlgWriteTransfer_EventWriteTransfer [C:\Users\snnn\src\onnxruntime\build\ARM32\RelWithDebInfo\onnx_test_runner.vcxproj] C:\Users\snnn\src\onnxruntime\build\ARM32\RelWithDebInfo\RelWithDebInfo\onnx_test_runner.exe : fatal error LNK1120: 5 unresolved ex ternals [C:\Users\snnn\src\onnxruntime\build\ARM32\RelWithDebInfo\onnx_test_runner.vcxproj] ```	2023-12-08 12:45:06 -08:00
pengwa	44b5843740	Fix gemm_float8 build failure on CUDA 11.3-11.7 (#18760 ) ### Fix gemm_float8 build failure on CUDA 11.3 ~ 11.7 User env: CUDA 11.3, build option include "--disable_types float8" ``` /tmp/onnxruntime/onnxruntime/contrib_ops/cuda/math/gemm_float8.cu(256): error: identifier "CUBLASLT_MATMUL_DESC_SM_COUNT_TARGET" is undefined /tmp/onnxruntime/onnxruntime/contrib_ops/cuda/math/gemm_float8.cu(264): error: enum "cublasLtMatmulDescAttributes_t" has no member "CUBLASLT_MATMUL_DESC_FAST_ACCUM" /tmp/onnxruntime/onnxruntime/contrib_ops/cuda/math/gemm_float8.cu(268): error: identifier "CUBLASLT_MATMUL_DESC_A_SCALE_POINTER" is undefined /tmp/onnxruntime/onnxruntime/contrib_ops/cuda/math/gemm_float8.cu(271): error: identifier "CUBLASLT_MATMUL_DESC_B_SCALE_POINTER" is undefined /tmp/onnxruntime/onnxruntime/contrib_ops/cuda/math/gemm_float8.cu(274): error: identifier "CUBLASLT_MATMUL_DESC_D_SCALE_POINTER" is undefined 5 errors detected in the compilation of "/tmp/onnxruntime/onnxruntime/contrib_ops/cu ``` Here is a versions (major version) diff on the requested attributes: ``` cuda 11.5.1 no CUBLASLT_MATMUL_DESC_SM_COUNT_TARGET cuda 11.6 https://docs.nvidia.com/cuda/archive/11.6.0/pdf/CUBLAS_Library.pdf has CUBLASLT_MATMUL_DESC_SM_COUNT_TARGET cuda 11.7 no CUBLASLT_MATMUL_DESC_FAST_ACCUM no CUBLASLT_MATMUL_DESC_A_SCALE_POINTER no CUBLASLT_MATMUL_DESC_B_SCALE_POINTER no CUBLASLT_MATMUL_DESC_D_SCALE_POINTER cuda 11.8 https://docs.nvidia.com/cuda/archive/11.8.0/pdf/CUBLAS_Library.pdf has CUBLASLT_MATMUL_DESC_FAST_ACCUM has CUBLASLT_MATMUL_DESC_A_SCALE_POINTER has CUBLASLT_MATMUL_DESC_A_SCALE_POINTER has CUBLASLT_MATMUL_DESC_B_SCALE_POINTER has CUBLASLT_MATMUL_DESC_D_SCALE_POINTER ``` ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-12-08 21:01:34 +08:00
Wanming Lin	e8f33b54ba	[WebNN EP] Don't covert all inputs except the 0th input for Resize (#18687 ) Currently all the inputs of Resize node will be converted to NHWC if the preferred layout is NHWC, and the ORT will call `IsOpSupportedImpl` twice, first time the inputs are NCHW, and the second time the inputs have been converted to NHWC. This would make the validation for scales input complicated and difficult to identify the height and width values.	2023-12-07 18:18:35 -08:00
Edward Chen	7ed48a299a	Objective-C API updates (#18738 ) - Add ORTSession and ORTTrainingSession strong references to ORTEnv. - Make ORTTrainingSession session options parameter optional.	2023-12-07 16:47:46 -08:00
Changming Sun	bf33919afb	Update absl and gtest to fix an ARM64EC build error (#18735 ) ### Description Update absl and gtest to fix an ARM64EC build error ### Motivation and Context We need to get an important fix into ORT. The fix is: `8028a87c96`	2023-12-07 15:55:17 -08:00
Rachel Guo	305db31301	fix build aar error in Zip-Nuget-Java-Nodejs Packaging pipeline (#18745 ) ### Description <!-- Describe your changes. --> [Pipeline failure info](https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=387310&view=logs&j=0aae05c9-1dc0-5099-eb4a-4cbb949c7458&t=71450a55-3e84-511c-7394-a06145376912&l=1044) ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Fix packaging pipeline brought by pr. Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>	2023-12-07 14:48:55 -08:00
Yulong Wang	efbef5f611	[js/webgpu] allow to specify callback for profiling data (#18732 ) ### Description This PR is a replacement of #17820. allow to specify callback for profiling data Previous: ```js ort.env.webgpu.profilingMode = 'default'; // enable profiling // profiling data will output to console. ``` Now: ```js ort.env.webgpu.profiling = { mode: 'default'; // enable profiling ondata: (data) => { // .. process the profiling data } }; //for each kernel, "ondata" will be called once. only output to console if ondata is not specified. ```	2023-12-07 14:10:28 -08:00
junchao-loongson	4abec9749e	[mlas] add loongarch lsx and lasx optimize code (#17937 ) ### Description Hello we(@lixing-star) are the developers of loongson team. We add 128 (lsx), 256 (lasx) vector optimization code for the loongarch architecture [100% tests passed, 0 tests failed out of 7](https://cloud.a-boat.cn:2021/api/public/dl/6831z1Bi?inline=true) ### Development Environments1 ``` CPU: Loongson-3C5000L uname -a: Linux localhost.localdomain 4.19.190-6.4.lns8.loongarch64 #1 SMP Thu Jul 14 12:08:04 CST 2022 loongarch64 loongarch64 loongarch64 GNU/Linux ``` ### LonngArch Documents - [LoongArch Reference Manual - Volume 1: Basic Architecture: This manual describes the basic part of the LoongArch architecture.](https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html) - [LoongArch ELF psABI: This manual describes the LoongArch ELF psABI.](https://loongson.github.io/LoongArch-Documentation/LoongArch-ELF-ABI-EN.html) - [more](https://loongson.github.io/LoongArch-Documentation/README-EN.html)	2023-12-07 11:15:59 -08:00
Yi Zhang	a045be335b	use EO pool for windows web_cpu stage (#18737 ) ### Description reuse EO pool in NPM pipeline. ### Motivation and Context build_web_debug failed in onnxruntime-Win-CPU-2022 but it works in EO pool. Reuse EO pool to make the pipeline work now. When I'm free, I'll try upgrading the chrome in the custom image.	2023-12-07 10:10:00 -08:00
Hector Li	e469de65f5	Re-enable Sign op int64 test for QNN CPU test (#18734 ) ### Description Re-enable Sign op int64 test for QNN CPU test	2023-12-07 08:42:25 -08:00
Wanming Lin	3d8af6eb65	[WebNN EP] Skip split initializer (#18729 )	2023-12-07 08:09:49 -08:00
Tianlei Wu	49470f06e8	Add benchmark script for control net (#18717 ) Add script to benchmark PyTorch and StableFast for control net. Add an option --max-batch-size in demo for benchmark purpose.	2023-12-06 21:54:51 -08:00
Dmitri Smirnov	e603e78627	Enforce If condition size == 1 (#18733 ) ### Description <!-- Describe your changes. --> ### Motivation and Context https://github.com/microsoft/onnxruntime/issues/18549	2023-12-06 21:04:18 -08:00
moyo1997	9479ba525b	Build onnxruntime.dll as arm64x (#18633 ) Build onnxruntime.dll as arm64x Added a .cmake file to generate a link repro of the onnxruntime.dll during arm64 build. This provides us a directory containing all the arm64 objs, def file and libs to link to when it is time to building arm64x onnxruntime.dll during the arm64ec build by passing the /machine:arm64x flag to the linker along with the arm64 artifacts. If other dlls wanted to be built as x, setting the ARM64X_TARGETS variable in the toplevel cmakelists.txt to include these other targets is all that will be needed. Added build_arm64x.bat as a wrapper for the multiple (rm64, then arm64ec) cmake calls needed to build as arm64x. AB#22533	2023-12-06 16:49:00 -08:00
Rachel Guo	7762f3f7c5	[NNAPI EP] Add NNAPI Split (#18702 ) ### Description <!-- Describe your changes. --> As title. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> yolo-v8 model missing operator support. --------- Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net> Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-12-06 15:11:15 -08:00
Wanming Lin	c4b8120c5b	Rename op elementwiseIf to where (#18657 ) WebNN latest spec uses `where`.	2023-12-06 14:56:26 -08:00
Hector Li	9768a727e1	[QNN EP] Fix a bug that can't create context binary if the model has inputs/outputs with different data type (#18722 ) Fix a bug that can't create context binary if the model has inputs/outputs with different data type ### Description Update EPContext op schema to unblock nodes with different data type among inputs & outputs	2023-12-06 13:07:09 -08:00
Adrian Lizarraga	559bd52252	[QNN EP] Update QNN SDK to version 2.17.0 (#18684 ) ### Description - Update QNN CI Pipelines to use QNN SDK version 2.17.0 - Print warning if unit test requires adjusted tolerance to pass - Temporarily disable unloading QnnCpu.dll for windows x64 due to crash when calling FreeLibrary - Enable fixed HTP tests - QnnHTPBackendTests.LayerNorm1D_LastAxis_DynamicScale - QnnHTPBackendTests.GlobalMaxPool_LargeInput2_u8 - QnnHTPBackendTests.ReduceSumS8Opset13_Rank5 - QnnHTPBackendTests.ReduceSumU8Opset13_Rank5_LastAxis - QnnHTPBackendTests.WhereLargeDataBroadcastU8 - QnnHTPBackendTests.WhereLargeDataBroadcastTransformedU8 - Enabled fixed CPU tests - QnnCPUBackendTests.Resize_DownSample_Linear_AlignCorners_scales - Increased tolerance for HTP tests that are less accurate on QNN SDK 2.17.0 - QnnHTPBackendTests.AveragePool_CountIncludePad_HTP_u8 - QnnHTPBackendTests.AveragePool_AutopadSameUpper_HTP_u8 - QnnHTPBackendTests.AveragePool_AutopadSameLower_HTP_u8 - QnnHTPBackendTests.ConvU8U8S32_bias_dynamic_input - QnnHTPBackendTests.ConvU8U8S32_bias_initializer - QnnHTPBackendTests.ConvU8U8S32_large_input1_padding_bias_initializer - QnnHTPBackendTests.LRNSize3 - QnnHTPBackendTests.LRNSize5 - QnnHTPBackendTests.MaxPool_Large_Input_HTP_u8 - QnnHTPBackendTests.MaxPool_LargeInput_1Pads - QnnHTPBackendTests.Resize_DownSample_Linear_HalfPixel - QnnHTPBackendTests.ResizeU8_2xLinearPytorchHalfPixel - QnnHTPBackendTests.ResizeU8_2xLinearHalfPixel - QnnHTPBackendTests.ResizeU8_2xLinearAlignCorners - QnnHTPBackendTests.ResizeU8_2xLinearAsymmetric - Disabled ONNX model tests - averagepool_2d_ceil: Accuracy issues only on Windows x64 QnnCpu.dll - Disabled QDQ model tests (onnx_test_runner) - facedetection_op8_qdq: Accuracy issues - Disabled CPU EP tests (these use QnnCpu.dll) - ActivationOpTest.Relu: QNN SDK 2.17 Relu treats inf as FLT_MAX - GemmOpTypedTests/0.TestGemmBroadcast: Inaccuracy when weight is initializer and bias is not - MathOpTest.MatMulFloatType "test padding and broadcast B > A": Inaccuracy (only linux) - Fix Gemm translation bugs in QNN EP: - Do not skip processing of inputs that need to be transposed. ### Motivation and Context - Allow testing with newest QNN SDK version - Take advantage of improvements to enable new models.	2023-12-06 11:05:41 -08:00
Ye Wang	c012e41f93	MoE with Expert Slicing (#18565 ) ### Description <!-- Describe your changes. --> Registered Sharded MoE op under contrib_op/cuda/collective with expert slicing. The broadcast process happens just before adding second bias(if has) and permutation undoing. Tensor slicing is planned but not included in this PR. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-12-05 16:56:38 -08:00

1 2 3 4 5 ...

10161 commits