onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-17 18:40:28 +00:00

Author	SHA1	Message	Date
Changming Sun	71da0824f3	Upgrade binskim and fix an error in nuget packaging pipeline (#17340 ) ### Description Upgrade binskim and fix an error in nuget packaging pipeline.	2023-08-30 07:52:06 -07:00
Adrian Lizarraga	21ae86e405	[QNN EP] Fix test zero-point calculation and flaky MatMul test (#17338 ) ### Description - Fix incorrect zero-point calculation in unit tests. Affects int8(signed) QDQ models. - Replace flaky MatMul test that occasionally fails on main branch with a version that uses explicit inputs. ### Motivation and Context Fix bug and improve test accuracy and stability.	2023-08-29 23:16:57 -07:00
Jian Chen	922629aad8	Upgrade Centos7 to AlmaLinux8 (#16907 ) ### Motivation and Context Get the latest gcc 12 by default --------- Co-authored-by: Changming Sun <chasun@microsoft.com>	2023-08-29 21:05:36 -07:00
Tianlei Wu	c961f67b5e	Handle dtype attribute in float16 conversion script (#17321 ) Some operators have a dtype attribute (search `dtype` in https://github.com/onnx/onnx/blob/main/docs/Operators.md). This change makes sure the dtype attribute is handled correctly in float16 conversion.	2023-08-29 18:41:56 -07:00
Adam Louly	8224891236	add logits option to generate artifacts (#17276 ) ### Description Adds the ability to export logits as an output of the train and eval graphs in generate_artifacts. This output remains optional.	2023-08-29 16:55:31 -07:00
cloudhan	f3682eee3b	Fix log color; otherwise, the line immediately following a colored log line is tainted (#17329 )	2023-08-30 07:46:04 +08:00
Ryan Hill	c438360c1e	Noticed a simple simplification in beam_search_topk (#17275 ) ### Description There was an Init() method that does exactly what the lines I replaced did, so I switched to it. ### Motivation and Context Simpler with no drawbacks.	2023-08-29 15:17:33 -07:00
Yi Zhang	d4a61ac71f	PR triggers generated by code (#17247 ) ### Description 1. Refactor the trigger rules generation. 2. Skip all doc changes in PR pipelines. ### Motivation and Context Generate all trigger rules by running set-trigger-rules.py to reduce inconsistencies. It is easy to make mistakes when copying and pasting manually. For example, these 2 excludes are different. Why? `4e6cec4d09/tools/ci_build/github/azure-pipelines/linux-ci-pipeline.yml (L16-L18)` `4e6cec4d09/tools/ci_build/github/azure-pipelines/linux-gpu-ci-pipeline.yml (L27-L29)` ### Note All changes in workflow yamls are generated by code. Please review the skip-js.yml, skip-docs.yml and set-trigger-rules.py. @fs-eire, please double check the filter rules in skip-js.yml and the skipped workflows `7023c2edff/tools/ci_build/set-trigger-rules.py (L14-L41)`	2023-08-30 05:57:03 +08:00
AtanasDimitrovQC	fd0917b27b	Propagate noop_with_empty_axes in reduce operators. (#16845 )	2023-08-29 14:15:03 -07:00
kushalpatil07	7b92057376	EvalStep called with wrong inputs in onnxruntime_training_cxx_inline.h (#17331 )	2023-08-29 14:14:35 -07:00
Yulong Wang	e5ca3f3dcb	[js/api] introducing IO binding for tensor (#16452 ) ### Description This PR adds a few properties, methods and factories to the Tensor type to support the IO-binding feature. This allows users to create tensors from GPU/CPU-bound data without forcing a transfer of data between CPU and GPU. This change is one way to resolve #15312 ### Change Summary 1. Add properties to `Tensor` type: a. `location`: indicates where the data resides. Valid values are `cpu`, `cpu-pinned`, `texture`, `gpu-buffer`. b. `texture`: alongside `data`, a read-only property of `WebGLTexture` type, available only when `location === 'texture'` c. `gpuBuffer`: alongside `data`, a read-only property of `GPUBuffer` type, available only when `location === 'gpu-buffer'` 2. Add methods to `Tensor` type (usually dealing with inference outputs): - async function `getData()` allows the user to download data from GPU to CPU manually. - function `dispose()` allows the user to release GPU resources manually. 3. Add factories for creating `Tensor` instances: a. `fromTexture()` to create a tensor bound to WebGL texture data b. `fromGpuBuffer()` to create a tensor bound to WebGPU buffer data c. `fromPinnedBuffer()` to create a tensor using a CPU pinned buffer ### Examples: create tensors from texture and pass to inference session as inputs ```js // when create session, specify we prefer 'image_output:0' to be stored on GPU as texture const session = await InferenceSession.create('./my_model.onnx', { executionProviders: [ 'webgl' ], preferredOutputLocation: { 'image_output:0': 'texture' } }); ... const myImageTexture = getTexture(); // user's function to get a texture const myFeeds = { input0: Tensor.fromTexture(myImageTexture, { width: 224, height: 224 }) }; // shape [1, 224, 224, 4], RGBA format. const results = await session.run(myFeeds); const myOutputTexture = results['image_output:0'].texture; ```	2023-08-29 12:58:26 -07:00
Chen Fu	8827363fd2	Bugfixes: dangling pointers and python property typo (#17285 ) ### Description Bug fixes ### Motivation and Context Fixing one dangling pointer, and one python property name typo	2023-08-29 12:50:15 -07:00
Jiajia Qin	fffefb1c22	[js/webgpu] Optimize matmul (#16969 ) ### Description Changes in this PR: 1) use the optimized version `makeMatMulPacked[Vec4]Source` to support matmul. 2) enable the conv2dByMatMul path. 3) support broadcast 4) use IndicesHelper. MatMul with M = 512, K = 512, N = 512 drops from 15ms to 2ms with profilingMode enabled on my ADL machine.	2023-08-29 12:40:57 -07:00
Patrice Vignola	4880f1da46	Fix attention fusion for UNet onnx model export when using LoRA weights (#17249 ) ### Description Tested with stable diffusion unet models exported by both pytorch 2.1.0 (nightly) and pytorch 1.13.1, with and without LoRA weights. ### Motivation and Context LoRA weights modify the unet model by adding matmul and scale operations to every q/k/v/out tensor, which breaks the current MHA pattern recognition.	2023-08-29 11:59:30 -07:00
Hector Li	761c4333b5	[QNN EP] GridSample op support (#17317 ) ### Description QNN EP GridSample op support	2023-08-29 11:41:59 -07:00
Hector Li	742b192a34	[QNN EP] Enable GlobalMaxPool op (#17304 ) ### Description [QNN EP] Enable GlobalMaxPool op	2023-08-29 11:25:34 -07:00
Artem Shilkin	6e60dba726	Fix compilation with newer flatbuffers (#17164 ) flatbuffers@v23.5.9 broke forward declarations of FlatBufferBuilder. Compiling onnxruntime fails with the following error: ``` flatbuffers/include/flatbuffers/flatbuffer_builder.h:1420:38: error: typedef redefinition with different types ('FlatBufferBuilderImpl<false>' vs 'flatbuffers::FlatBufferBuilder') typedef FlatBufferBuilderImpl<false> FlatBufferBuilder; ^ onnx_runtime/include/onnxruntime/core/graph/graph.h:47:11: note: previous definition is here class FlatBufferBuilder; ``` This PR removes these forward declarations and uses includes instead	2023-08-29 10:28:26 -07:00
Yi Zhang	0e9e9b2a67	Fix one exception in post merge (#17327 )	2023-08-29 19:24:50 +08:00
Baiju Meswani	5d2c57363f	Sign CUDA Kernel (#17293 )	2023-08-28 21:03:58 -07:00
Baiju Meswani	38ea8c3931	Increase max error tolerance for ConvTransposeGrad test (#17315 )	2023-08-28 17:05:40 -07:00
Tianlei Wu	ee9d046112	Fix model serialization with external data in current directory (#17311 ) When the original model has external data in the current directory, saving the optimized model raises a file-not-found exception while looking for the external data file under the root directory "/". This fix looks under the current directory in that case. I manually tested an extra case and it works: the original model has external data in the root directory ("/"), and the optimized model is saved to the current directory. BTW, another bug was found: when "session.optimized_model_external_initializers_min_size_in_bytes" is set to a large value, some tensors still point to the original external data file. Added a TODO in the unit test for this bug. Possible solution: load external data into memory before saving the model.	2023-08-28 16:06:04 -07:00
Caroline	228db24317	Add training API functions to WASM API (#16521 ) ### Description * Created `wasm/training_api` source and header files & modified WebAssembly CMake to include training flags * The `wasm/training_api` files use an `OrtTrainingManager` handle which is a struct of an OrtCheckpointState and an OrtTrainingSession, rather than creating a CheckpointState handle & a separate TrainingSession handle. * This is so that the TypeScript side only has to manage one handle that will be passed between TrainingSession & CheckpointState representations, rather than the TypeScript side managing separate CheckpointStateHandle and TrainingSessionHandle. ### Motivation and Context WASM API needs to be updated with ORT training API function calls so that ORT training web bindings can be added for on-device training. --------- Co-authored-by: Baiju Meswani <bmeswani@microsoft.com> Co-authored-by: carzh <carolinezhu@microsoft.com> Co-authored-by: Ashwini Khade <askhade@microsoft.com>	2023-08-28 11:05:02 -07:00
Hariharan Seshadri	cbd97515cd	[JS/WebGPU] Support GatherElements kernel (#17243 ) ### Description As title ### Motivation and Context Improve WebGPU kernel coverage	2023-08-28 09:55:25 -07:00
mindest	53169f59e5	[ROCm] Sort candidate solutions in rocBLAS/hipBLASLt for deterministic offline tuning (#17297 ) ### Description Sort the candidates in rocBLAS/hipBLASLt to make sure that they are properly ordered and can be correctly fetched by saved indices in offline tuning cases.	2023-08-28 16:34:21 +08:00
cloudhan	bf8b1681f9	Build nuget pkg for ROCm (#16791 ) Add nuget pkg building and publishing for ROCm EP --------- Co-authored-by: Yi Zhang <zhanyi@microsoft.com>	2023-08-28 13:35:08 +08:00
Yulong Wang	bb1871332f	[js/webgpu] add kernel Not and Equal (#17306 ) ### Description This PR adds kernel implementations for the operators "Not" and "Equal". It also removes the download cache in the GPU data manager. Why remove the download cache: the following test case failed. ("Or" is on CPU, "Greater" and "Equal" are on JSEP) ![image](https://github.com/microsoft/onnxruntime/assets/7679871/8d9798ad-2703-4fb9-907e-ff716c67d0b2) After debugging, I found that both "Equal" and "Greater" use the same output GPU Data ID. This is because when ORT executes the graph, it first runs "Equal", letting its shader write into GPU Data ID 2; then a Gpu2Cpu copy is issued for it (because "Or" is currently on the CPU EP); at this point, ORT considers GPU Data ID=2 free, so it reuses it as the output for "Greater". This means there is no allocation for the output of the "Greater" kernel, and both kernels write to GPU Data ID=2. The GPU data manager therefore performs 2 downloads from the same GPU buffer. Previously I thought this was a waste of resources, so I cached the data. But this case shows that we need to perform both downloads because the GPU data has changed in between. The download data cache should be removed.	2023-08-27 19:50:17 -07:00
simonjub	4eedd3bb46	[TRT EP] Fix logic to reach cache encryption code. (#17111 ) ### Description This is a followup to PR #15519, which is closed in favor of this one. ### Motivation and Context In the current implementation of the TRT cache, no code execution path can create an encrypted TRT engine cache when the flags engine_cache_enable and engine_decryption_enable are true. This was originally raised in issue #12551.	2023-08-26 20:09:03 -07:00
Scott McKay	ca0159b45d	Various test infra updates from testing Azure ops with MAUI test app (#17262 ) ### Description - fix issue with handling string input - set minSdkVersion - otherwise defaults to 19 which we don't support and the build breaks - comment out the debug logging hook - enabling it breaks the Android native logging - can be enabled if you need to debug C# code - update test data tools to allow creating input data for raw file contents (e.g. audio) and from strings (e.g. auth token value) - fix some warnings ### Motivation and Context Improve test setup	2023-08-27 09:35:00 +10:00
Yulong Wang	ddcd46174e	[js/webgpu] fix jsepOnRunEnd (#17300 ) ### Description fix jsepOnRunEnd: jsepOnRunEnd() needs to run after runPromise is resolved.	2023-08-26 00:30:28 -07:00
Yifan Li	808215366d	Fix Multi GPU TensorRT tests (#17269 ) ### Description * Integrate `trt_multi_gpu` test stage in ORT post merge CI (Win-2xA10 vm) * Deprecate Linux MultiGPU TRT CI (This vm will be deprecated soon) * Add multi gpu support to existing C# test cases * Deprecate non-functional flag `--enable_multi_device_tests` ### Motivation and Context * Two reasons for replacing the Linux MultiGPU TRT CI: * Flag `--enable_multi_device_tests` is not functional and cannot detect issues like #17036 * The Linux-2xM60 VM of this CI pool is about to be deprecated on 9/6/23, so this test needs to be enabled in another dual-GPU VM pool.	2023-08-25 20:30:45 -07:00
Arthur Islamov	c262879214	Added DML and CUDA provider support in onnxruntime-node (#16050 ) ### Description I've added changes to support CUDA and DML (only on Windows, on other platforms it will throw an error) ### Motivation and Context It fixes this feature request https://github.com/microsoft/onnxruntime/issues/14127 which is tracked here https://github.com/microsoft/onnxruntime/issues/14529 I was working on StableDiffusion implementation for node.js and it is very slow on CPU, so GPU support is essential. Here is a working demo with a patched and precompiled version https://github.com/dakenf/stable-diffusion-nodejs ---------	2023-08-25 16:57:06 -07:00
Jiajia Qin	873ef8b8f0	[js/webgpu] add label for some webgpu APIs (#17291 ) ### Description With the label, it's easier to identify which op causes the error. Without the label, the error message looks like this: ``` Tint WGSL reader failure: :12:5 error: return statement type must match its function return type, returned 'vec4<f32>', expected 'f32' return W[i2o_W(indices)]; ^^^^^^ - While validating [ShaderModuleDescriptor] - While calling [Device].CreateShaderModule([ShaderModuleDescriptor]). ``` With the label, the error message looks like this: ``` Tint WGSL reader failure: :12:5 error: return statement type must match its function return type, returned 'vec4<f32>', expected 'f32' return W[i2o_W(indices)]; ^^^^^^ - While validating [ShaderModuleDescriptor "ConvTranspose2D"] - While calling [Device].CreateShaderModule([ShaderModuleDescriptor "ConvTranspose2D"]). ``` ### Motivation and Context This change is mainly for debugging. With it, the message above makes it easy to see that `ConvTranspose2D`'s shader has a problem.	2023-08-25 12:12:56 -07:00
xhcao	5e8d94cec8	[js/webgpu] support Greater and Less operators (#17296 )	2023-08-25 12:11:25 -07:00
Adrian Lizarraga	5a83a67f32	Support QDQ transformations with com.microsoft.Quantize/Dequantize ops (#17127 ) ### Description - Enables int32 support for com.microsoft.DequantizeLinear (contrib op) - Makes the `zero_point` input optional for Quantize/Dequantize contrib ops - Enables QDQ transformations with the Quantize/Dequantize contrib ops - Update tests: EnsureUniqueDQForNodeUnitTests, QDQTransformerTests, TransposeOptimizerTests ### Testing List of tested graph transformations: - [x] QDQSelectorActionTransformer - qdq_transformer_test.cc - [x] QDQS8ToU8Transformer - qdq_transformer_test.cc - [x] DoubleQDQPairsRemover - qdq_transformer_test.cc - [x] IdenticalChildrenConsolidation - qdq_transformer_test.cc - [x] QDQPropagation - qdq_transformer_test.cc - [x] QDQFinalCleanup - qdq_transformer_test.cc - [x] ClipQuantFusion - qdq_transformer_test.cc - [x] ReluQuantFusion - qdq_transformer_test.cc - [x] EnsureUniqueDQForNodeUnit - ensure_unique_dq_for_node_unit_test.cc - [x] TransposeOptimizer - transpose_optimizer_test.cc - [x] CommonSubexpressionElimination - graph_transform_test.cc - [x] ConstantFolding - graph_transform_test.cc ### Motivation and Context We need to [support mixed 16-bit/8-bit precision QDQ models](https://github.com/microsoft/onnxruntime/pull/17015). This PR is the first step in achieving this goal: we need to make QDQ contrib ops work with our optimizations/transformations. --------- Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com> Co-authored-by: Scott McKay <skottmckay@gmail.com>	2023-08-25 09:57:51 -07:00
Yulong Wang	79c4ed9a45	[js/webgpu] support error pop and kernel name (#17260 ) ### Description This PR contains changes to support error pop and kernel name. - Add a function `JsepGetNodeName` to allow reading kernel name from JS to C++ - When in debug mode ( `env.debug = true;` ) or in profiling mode ( `env.webgpu.profilingMode = 'default';` ), kernel name will be read from ORT; otherwise use the kernel pointer ( a number ) as kernel name to save calls from JS to C++. - When in debug mode, WebGPU validation errors will be recorded and if any error occurs, `inferenceSession.run()` will fail (the Promise is rejected). Behavior when not in debug mode is not changed. This is because recording errors is not zero-overhead, and GPU validation errors should occur consistently whether or not debug mode is enabled. - Add `jsepOnRunStart()` and `jsepOnRunEnd()` hook to: - allow implementation of the features mentioned above. - pass session ID to backend.	2023-08-25 08:08:15 -07:00
satyajandhyala	da180b20fa	[JS/Web] Fix ConvTranspose shader code compilation errors. (#17232 ) ### Description Fix JSEP ConvTranspose shader code errors.	2023-08-25 06:25:54 -07:00
guyang3532	401129d484	Add support for more ops for padding elimination (#17217 ) Add support for Gelu/ReduceMean/SimplifiedLayerNormalization for padding elimination	2023-08-25 18:02:15 +08:00
mindest	735cc8e6c8	[ROCm] enable If op for ROCm EP. (#17279 ) ### Description Enable If op for ROCm EP.	2023-08-25 17:49:49 +08:00
Yi Zhang	9cd33e07b4	Re-add Tests in Windows GPU Reduced Ops workflow (#17294 ) ### Description Add a single test step in the Windows GPU Reduced Ops workflow ### Motivation and Context The old workflow's building and testing ran in one command. In PR #17263, the test step was removed by mistake, so this re-adds it. How to consolidate the test step is under consideration.	2023-08-25 15:56:59 +08:00
Yi Zhang	4a0f8f6672	Skip one flaky Test (#17290 ) ### Motivation and Context It's skipped in the PR ``` 2023-08-25T02:37:48.7772670Z 1: [ RUN ] ModelTests/ModelTest.Run/cuda__models_opset9_Candy_candy 2023-08-25T02:37:48.7824755Z 1: D:\a\_work\1\s\onnxruntime\test\providers\cpu\model_tests.cc(91): Skipped 2023-08-25T02:37:48.7825343Z 1: Skipping single test It's in broken_tests ```	2023-08-25 14:48:41 +08:00
Changming Sun	3e934030f4	nodejs: Release Ort Env before main function returns (#17288 ) ### Description Release OrtEnv before main function returns. Before this change, OrtEnv is deleted when C/C++ runtime destructs all global variables in ONNX Runtime's core framework. The callstack is like this: ``` * frame #0: 0x00007fffee39f5a6 libonnxruntime.so.1.16.0`onnxruntime::Environment::~Environment(this=0x00007fffee39fbf2) at environment.h:20:7 frame #1: 0x00007fffee39f614 libonnxruntime.so.1.16.0`std::default_delete<onnxruntime::Environment>::operator()(this=0x00007ffff4c30e50, __ptr=0x0000000005404b00) const at unique_ptr.h:85:2 frame #2: 0x00007fffee39edca libonnxruntime.so.1.16.0`std::unique_ptr<onnxruntime::Environment, std::default_delete<onnxruntime::Environment>>::~unique_ptr(this=0x5404b00) at unique_ptr.h:361:17 frame #3: 0x00007fffee39e2ab libonnxruntime.so.1.16.0`OrtEnv::~OrtEnv(this=0x00007ffff4c30e50) at ort_env.cc:43:1 frame #4: 0x00007fffee39fa96 libonnxruntime.so.1.16.0`std::default_delete<OrtEnv>::operator()(this=0x00007fffefff8f78, __ptr=0x00007ffff4c30e50) const at unique_ptr.h:85:2 frame #5: 0x00007fffee39f394 libonnxruntime.so.1.16.0`std::unique_ptr<OrtEnv, std::default_delete<OrtEnv>>::~unique_ptr(this=0x7ffff4c30e50) at unique_ptr.h:361:17 frame #6: 0x00007ffff78574b5 libc.so.6`__run_exit_handlers + 261 frame #7: 0x00007ffff7857630 libc.so.6`exit + 32 frame #8: 0x00007ffff783feb7 libc.so.6`__libc_start_call_main + 135 frame #9: 0x00007ffff783ff60 libc.so.6`__libc_start_main@@GLIBC_2.34 + 128 frame #10: 0x0000000000abbdee node`_start + 46 ``` After this change, OrtEnv will be deleted before the main function returns and nodejs is still alive.	2023-08-24 23:07:02 -07:00
mindest	93ae17d1bb	[ROCm] Add hipBLASLt workspace support (#17096 ) ### Description * hipBLASLt extra workspace for split-k * type update (due to extra support for fp8 in hipBLASLt) * minor changes	2023-08-25 13:08:57 +08:00
pengwa	7c98f45928	Fix layernorm and softmax axis after upstream (#17255 ) ### Fix layernorm and softmax axis after upstream For Gather (when the slicing index is a scalar), the output rank is smaller than the input rank. When we upstream this kind of Gather before softmax or layernorm, we should also update the axis attribute. Otherwise, the axis might be out of date and incorrect for the updated rank. ``` File "/opt/conda/envs/ptca/lib/python3.8/site-packages/onnxruntime/training/ortmodule/_fallback.py", line 157, in handle_exception raise exception File "/opt/conda/envs/ptca/lib/python3.8/site-packages/onnxruntime/training/ortmodule/_training_manager.py", line 280, in forward self._build_graph(graph_transformer_config) File "/opt/conda/envs/ptca/lib/python3.8/site-packages/onnxruntime/training/ortmodule/_logger.py", line 158, in wrapper result = func(graph_execution_manager, *args, **kwargs) File "/opt/conda/envs/ptca/lib/python3.8/site-packages/onnxruntime/training/ortmodule/_logger.py", line 273, in wrapper result = func(graph_execution_manager, *args, **kwargs) File "/opt/conda/envs/ptca/lib/python3.8/site-packages/onnxruntime/training/ortmodule/_training_manager.py", line 361, in _build_graph super()._build_graph(graph_transformer_config) File "/opt/conda/envs/ptca/lib/python3.8/site-packages/onnxruntime/training/ortmodule/_graph_execution_manager.py", line 184, in _build_graph self._graph_builder.build(config) RuntimeError: /onnxruntime/orttraining/orttraining/python/orttraining_pybind_state.cc:823 onnxruntime::python::addObjectMethodsForTraining(pybind11::module&, onnxruntime::python::ExecutionProviderRegistrationFn)::<lambda(onnxruntime::training::OrtModuleGraphBuilder*, const onnxruntime::training::TrainingGraphTransformerConfiguration&)> [ONNXRuntimeError] : 1 : FAIL : Node (Softmax_2904) Op (Softmax) [ShapeInferenceError] 'axis' must be in [-3 , 2]. Its actual value is: 3 ```	2023-08-25 12:26:22 +08:00
Faith Xu	86238fb507	[Docs] Auto generate JS API (#17271 ) ### Description Adds new workflow to generate js docs with latest changes so the API page can stay up to date [Test page of latest js docs](https://faxu.github.io/onnxruntime/docs/api/js/modules/InferenceSession.html)	2023-08-24 17:35:37 -07:00
Yi Zhang	756eda2cc4	Windows CI build steps template (#17263 ) ### Description 1. New windows ci build steps template. 2. Remove useless variables. ### Motivation and Context 1. Make it easier to apply build cache to all windows CIs. 2. Developers on other teams only need to take care of build options ### Comparison Before: `9f21f694cf/tools/ci_build/github/azure-pipelines/win-gpu-tensorrt-ci-pipeline.yml (L19-L82)` After: `b4c1f2261b/tools/ci_build/github/azure-pipelines/win-gpu-tensorrt-ci-pipeline.yml (L35-L54)`	2023-08-25 05:58:49 +08:00
Hector Li	680fac64ed	[QNN EP] Support non-quantized Op on HTP (#17194 ) ### Description [QNN EP] Support non-quantized Op on HTP 1. Remove the limitation in GetCapability that always requires a QDQ node unit group to partition the node on the NPU backend, so that we can support a non-quantized Slice op with int32 data input on HTP. 2. Enable Where QDQ node unit 3. Separate out the flags is_npu_backend & is_quantized_node to make them clear 4. Separate out QuantizeLinear and DequantizeLinear into QdqOpBuilder to better identify quantized/un-quantized input/output tensors 5. Separate out a TransposeOpBuilder to simplify Transpose node processing, especially for a single Transpose node in a QDQ model. For a case like Q->Transpose->DQ, the Transpose is not a QDQ node unit group; it's a single node, but we should treat it as a quantized node. The output should have the same data type and quantization parameters as the input. Another case is to support non-quantized data for Transpose in a QDQ model. 6. Remove is_npu_backend flag from OpBuilder interface. Set the backend type in QnnBackendManager, QnnModel & QnnModelWrapper, so that OpBuilders can always get it from QnnModelWrapper. 7. Add unit tests for quantized/non-quantized Transpose (int32, float32) on HTP backend	2023-08-24 14:57:16 -07:00
pengwa	18d5cfdb85	Fix build - redefinition of default argument for ‘long unsigned int Extent’ (#17281 ) ### Fix build - redefinition of default argument for ‘long unsigned int Extent’ Building ORT in one training customer's environment fails with the build error below. The GCC versions are ``` aiscuser@node-0:/tmp/onnxruntime$ gcc --version gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0 aiscuser@node-0:/tmp/onnxruntime$ g++ --version g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0 ``` On our dev node, using the same GCC/G++ versions, there is no build issue. It is unclear what the difference is, but giving an explicit type when creating `gsl::span` fixed the problem. ``` /tmp/onnxruntime/build/Linux/RelWithDebInfo/_deps/gsl-src/include/gsl/span:394:7: error: redefinition of default argument for ‘long unsigned int Extent’ 394 | class span | ^~~~ /tmp/onnxruntime/build/Linux/RelWithDebInfo/_deps/gsl-src/include/gsl/span_ext:46:51: note: original definition appeared here 46 | template <class ElementType, std::size_t Extent = dynamic_extent> | ^~~~~~~~~~~~~~~ /tmp/onnxruntime/include/onnxruntime/core/common/span_utils.h:82:93: error: return type ‘class gsl::span<const std::byte>’ is incomplete 82 | [[nodiscard]] inline gsl::span<const std::byte> AsByteSpan(const void* data, size_t length) { | ^ /tmp/onnxruntime/include/onnxruntime/core/common/span_utils.h: In function ‘void onnxruntime::AsByteSpan(const void*, size_t)’: /tmp/onnxruntime/include/onnxruntime/core/common/span_utils.h:83:68: error: class template argument deduction failed: 83 | return gsl::span(reinterpret_cast<const std::byte*>(data), length); | ^ /tmp/onnxruntime/include/onnxruntime/core/common/span_utils.h:83:68: error: no matching function for call to ‘span(const std::byte*, size_t&)’ /tmp/onnxruntime/build/Linux/RelWithDebInfo/_deps/gsl-src/include/gsl/span:740:1: note: candidate: ‘template<class Type, long unsigned int Extent> gsl::span(Type (&)[Extent])-> gsl::span<ElementType, FirstExtent>’ 740 | span(Type (&)[Extent]) -> span<Type, Extent>; | ^~~~ /tmp/onnxruntime/build/Linux/RelWithDebInfo/_deps/gsl-src/include/gsl/span:740:1: note: template argument deduction/substitution failed: /tmp/onnxruntime/include/onnxruntime/core/common/span_utils.h:83:68: note: mismatched types ‘Type [Extent]’ and ‘const std::byte*’ 83 | return gsl::span(reinterpret_cast<const std::byte*>(data), length); | ^ /tmp/onnxruntime/build/Linux/RelWithDebInfo/_deps/gsl-src/include/gsl/span:743:1: note: candidate: ‘template<class Type, long unsigned int Size> gsl::span(std::array<_Tp, _Nm>&)-> gsl::span<ElementType, FirstExtent>’ 743 | span(std::array<Type, Size>&) -> span<Type, Size>; | ^~~~ /tmp/onnxruntime/build/Linux/RelWithDebInfo/_deps/gsl-src/include/gsl/span:743:1: note: template argument deduction/substitution failed: /tmp/onnxruntime/include/onnxruntime/core/common/span_utils.h:83:68: note: mismatched types ‘std::array<_Tp, _Nm>’ and ‘const std::byte*’ 83 | return gsl::span(reinterpret_cast<const std::byte*>(data), length); | ^ ```	2023-08-25 00:40:40 +08:00
pengwa	d90afc697b	Introduce ZeROOffloadSubscriber for ORTModule (#17006 ) ### Introduce ZeROOffloadSubscriber for ORTModule As part of the work to integrate ORTModule with DeepSpeed stage3, this PR mainly focuses on moving the original PyTorch-based (hook-based) param partition/offload implementation to an ORTModule-compatible implementation. Changes include: 1. Refactor `SubscriberBase`/`SubscriberManager` to support pre-forward/post-forward hooks. 2. Implement a new `ZeROOffloadSubscriber` by reusing DeepSpeed hook functions as much as possible. All hook functions are defined in `DeepSpeedZeRoOffload._register_hooks_recursively` and `DeepSpeedZeRoOffload.setup_zero_stage3_hooks`, and fortunately the closure is not complex: all hooks reference the owning `DeepSpeedZeRoOffload` instance. So we can create new hook functions with `FunctionType` by binding the owning `DeepSpeedZeRoOffload` instance, then call the newly created functions in the subscriber's `pre_forward_module_apply_impl` and `post_forward_module_apply_impl` interfaces. 3. Monkey patch `DeepSpeedZeRoOffload.setup_zero_stage3_hooks` to register the `ZeROOffloadSubscriber` for the model, so we don't need to change any code in the DeepSpeed repo (at least so far). 4. Fix the ATen embedding custom symbolic exporter function by tolerating weights of size (0) (changed by DeepSpeed zero stage 3). Unit tests will be added once stage3 is fully supported.	2023-08-25 00:15:22 +08:00
Baiju Meswani	fca81cc5d5	ConvTransposeGrad CUDA Kernel (#17201 )	2023-08-24 09:08:06 -07:00
Jian Chen	33415b9da4	Removing 10.14 suffix from osx nuget package (#17277 )	2023-08-24 08:51:54 -07:00
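Commit 21ae86e405 (#17338) fixes the zero-point calculation that the QNN EP unit tests use for signed int8 QDQ models, but the log does not show the formula. The sketch below is a hedged illustration that uses the common asymmetric min/max scheme. It assumes that scheme is what the tests compute, and `compute_qparams` and `quantize` are hypothetical helpers, not the test code the commit changed. The point it shows is that the zero point depends on `qmin`. A formula written for uint8 (`qmin = 0`) is therefore off by 128 when applied to int8 (`qmin = -128`) over the same real range.

```python
from typing import Tuple


def compute_qparams(rmin: float, rmax: float, qmin: int, qmax: int) -> Tuple[float, int]:
    """Hypothetical helper: asymmetric (min/max) scale and zero point.

    Maps the real range [rmin, rmax] onto [qmin, qmax], e.g. [0, 255] for
    uint8 or [-128, 127] for signed int8.
    """
    # Extend the range to include 0.0 so that real zero is exactly representable.
    rmin = min(rmin, 0.0)
    rmax = max(rmax, 0.0)
    if rmax == rmin:
        # Degenerate all-zero range: any positive scale works; 1.0 is an arbitrary choice.
        return 1.0, max(qmin, min(qmax, 0))
    scale = (rmax - rmin) / (qmax - qmin)
    # The zero point is the integer that real 0.0 maps to; note that it depends on qmin.
    zero_point = round(qmin - rmin / scale)
    # Clamp into the representable integer range.
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(x: float, scale: float, zero_point: int, qmin: int, qmax: int) -> int:
    """Quantize one value: clamp(round(x / scale) + zero_point); round() is half-to-even."""
    return max(qmin, min(qmax, round(x / scale) + zero_point))


if __name__ == "__main__":
    for name, (qmin, qmax) in {"uint8": (0, 255), "int8": (-128, 127)}.items():
        scale, zp = compute_qparams(-1.0, 3.0, qmin, qmax)
        # For the range [-1.0, 3.0]: uint8 zero point 64, int8 zero point -64.
        print(name, scale, zp, quantize(0.0, scale, zp, qmin, qmax))
```

With the range [0.0, 2.0], for example, the uint8 zero point is 0 but the int8 zero point is -128. Reusing the unsigned value for signed data would shift every quantized value by 128.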

9508 commits
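Commit 7c98f45928 (#17255) explains that moving a Gather with a scalar index above Softmax or LayerNorm lowers the rank by one, so the axis attribute must be remapped. The traceback ends with Softmax rejecting axis 3 on an input that now has rank 3. The sketch below shows one way to do that remapping under stated assumptions. It assumes opset-13+ Softmax semantics (normalization along a single axis) and a Gather applied on `gather_axis` of the original rank-`input_rank` tensor. `remap_axis_after_scalar_gather` is a hypothetical name, not the function in the ORT optimizer.

```python
def remap_axis_after_scalar_gather(axis: int, input_rank: int, gather_axis: int) -> int:
    """Hypothetical helper: Softmax axis after a scalar-index Gather is moved before it.

    A Gather with a scalar index removes dimension `gather_axis`, so the tensor
    that reaches Softmax has rank `input_rank - 1`. Returns a non-negative axis.
    """
    if not -input_rank <= axis < input_rank:
        raise ValueError(f"axis {axis} out of range for rank {input_rank}")
    if not -input_rank <= gather_axis < input_rank:
        raise ValueError(f"gather_axis {gather_axis} out of range for rank {input_rank}")
    # Normalize negative axes against the original (pre-Gather) rank.
    axis %= input_rank
    gather_axis %= input_rank
    if axis == gather_axis:
        # Softmax normalizes over the dimension being sliced away; the Gather cannot move above it.
        raise ValueError("cannot upstream a Gather on the normalized axis")
    # Dimensions after the removed one shift left by one; earlier ones keep their index.
    return axis - 1 if axis > gather_axis else axis


if __name__ == "__main__":
    # Softmax on axis 3 of a rank-4 input, Gather slicing axis 0: the new axis is 2.
    print(remap_axis_after_scalar_gather(3, 4, 0))
```

Take a rank-4 input whose Gather slices axis 0: a Softmax on axis 3 needs axis 2 afterwards, which is inside the [-3, 2] range quoted in the error. LayerNormalization normalizes over every dimension from `axis` onward. The same re-indexing therefore applies to it only when `gather_axis < axis`; a Gather at or after `axis` slices a normalized dimension and should not be moved.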