onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-19 19:00:47 +00:00

Author	SHA1	Message	Date
Baiju Meswani	465540d29b	Update training api python documentation (#19287 )	2024-01-29 14:14:15 -08:00
Changming Sun	e91d91ae4f	Fix a build issue: /MP was not enabled correctly (#19190 ) ### Description In PR #19073 I misunderstood the value of "--parallel". Instead of testing whether args.parallel is None, I should test the value returned by the number_of_parallel_jobs function. If build.py was invoked without --parallel, then args.parallel equals 1, because that is the default value. In that case we should not add "/MP". However, the current code adds it, because `if args.parallel` is evaluated as `if 1`, which is True. If build.py was invoked with --parallel but without an additional number, then args.parallel equals 0, because the job count is unspecified. In that case we should add "/MP". However, the current code does not add it, because `if args.parallel` is evaluated as `if 0`, which is False. This also adds a new build flag: use_binskim_compliant_compile_flags, which is intended to be used only in the ONNX Runtime team's build pipelines for compliance reasons. ### Motivation and Context	2024-01-29 12:45:38 -08:00
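The truthiness bug described in the commit above can be reproduced with a small sketch. The parser setup and the helper names `buggy_adds_mp` and `fixed_adds_mp` are illustrative assumptions, not the actual build.py code; only `--parallel`, `args.parallel` and `number_of_parallel_jobs` are named in the commit message.

```python
import argparse
import os


def parse_args(argv):
    # Assumed stand-in for build.py's parser: no --parallel gives 1 (default),
    # a bare --parallel gives 0 ("unspecified"), and --parallel N gives N.
    parser = argparse.ArgumentParser()
    parser.add_argument("--parallel", nargs="?", const=0, default=1, type=int)
    return parser.parse_args(argv)


def number_of_parallel_jobs(args):
    # 0 means "let the tool decide": fall back to the CPU count.
    if args.parallel == 0:
        return os.cpu_count() or 1
    return args.parallel


def buggy_adds_mp(args):
    # Tests the raw flag value, so 1 (serial) is truthy and 0 (auto) is falsy.
    return bool(args.parallel)


def fixed_adds_mp(args):
    # Tests the resolved job count instead (hypothetical form of the fix).
    return number_of_parallel_jobs(args) != 1


if __name__ == "__main__":
    for argv in ([], ["--parallel"], ["--parallel", "4"]):
        args = parse_args(argv)
        print(argv, "buggy:", buggy_adds_mp(args), "fixed:", fixed_adds_mp(args))
```

With no `--parallel` the buggy check enables /MP even though the build is serial, and with a bare `--parallel` it disables /MP even though a parallel build was requested.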
Changming Sun	4ee222413f	Update OneBranch.Nuget-WindowsAI-Pipeline.Official.yml for Azure Pipelines (#19293 ) To fix a pipeline issue.	2024-01-29 12:00:42 -08:00
Guenther Schmuelling	9e69606360	fix f16 for attention, enable slice and flatten for more types (#19262 )	2024-01-29 10:13:46 -08:00
Yi Zhang	e96a038f01	Add VP test in Stable diffusion pipeline (#19300 ) ### Description 1. Add visual parity test based on openai clip model 2. Add trigger rules ### Motivation and Context 1. check generated image is expected 2. reduce unnecessary triggers	2024-01-29 09:33:58 -08:00
PeixuanZuo	82c1cb416b	[CUDA] Refactor GroupNorm and add common vectorize implementation (#19158 ) Co-authored-by: Peixuan Zuo <peixuanzuo@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2024-01-29 09:15:10 +08:00
Adrian Lizarraga	6d7ac9c93a	Support general session config entries in perf test tool (#19289 ) ### Description Adds the ability to specify general session configuration entries via the `-C` command-line option. Example: `-C "session.disable_cpu_ep_fallback|1 ep.context_enable|1"` Some session config entries can already be set via dedicated command-line options. If the user uses multiple command-line options to set the same session config entry, we'll print a warning. Note that the dedicated command-line options will take precedence. ### Motivation and Context Allows setting session configurations when testing EPs. QNN EP, for example, uses the `session.disable_cpu_ep_fallback` and `ep.context_*` options.	2024-01-26 19:51:48 -08:00
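As a rough illustration of the `-C` argument format in the commit above (space-separated entries, each a key and a value joined by a pipe), here is a minimal Python sketch. The perf test tool itself is not written in Python, and the function names `parse_session_configs` and `merge_with_dedicated` are hypothetical.

```python
def parse_session_configs(arg):
    """Parse 'key1|value1 key2|value2' into a dict of config entries."""
    entries = {}
    for token in arg.split():
        key, sep, value = token.partition("|")
        if not sep or not key or not value:
            raise ValueError(f"expected 'key|value', got {token!r}")
        entries[key] = value
    return entries


def merge_with_dedicated(general, dedicated):
    """Combine both sources; dedicated command-line options take precedence."""
    merged = dict(general)
    for key, value in dedicated.items():
        if key in merged:
            print(f"warning: '{key}' set more than once; using dedicated option")
        merged[key] = value
    return merged


if __name__ == "__main__":
    general = parse_session_configs(
        "session.disable_cpu_ep_fallback|1 ep.context_enable|1"
    )
    print(merge_with_dedicated(general, {"ep.context_enable": "0"}))
```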
Tianlei Wu	d7ff81dfb7	[CUDA] support user_compute_stream in python API (#19229 ) ### Description It is an important feature to pass user cuda stream to avoid synchronization in python API. Here we allow user to pass cuda stream for CUDA provider. Note that TRT or ROCm provider need similar change, which are not included in this pull request. Note that we will set `has_user_compute_stream` automatically based on whether there is cuda stream passed, so setting `has_user_compute_stream` through python API has no effect. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> https://github.com/microsoft/onnxruntime/issues/19094	2024-01-26 10:34:43 -08:00
cao lei	7d4dc66846	ExecutionProvider API refactor - make GenerateMetaDefId a standalone function, decouple it from EP (#18977 ) ### Description <!-- Describe your changes. --> Make EP's member function, GenerateMetaDefId, a standalone function which decouples from EP ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> This change is for ExecutionProvider API refactoring, we will make a clean ExecutionProvider API first for later EPv2 work	2024-01-26 07:39:08 -08:00
Baiju Meswani	fc44f96ad5	Add support for a collection of OrtValue as inputs and outputs to C# TrainingSession (#19048 )	2024-01-25 21:55:36 -08:00
Tianlei Wu	358650d441	Fix BigModel stable diffusion pipeline (#19277 ) ### Description Fix two issues: (1) We can only use single quotes inside `bash -c "..."`. The current pipeline job stopped at `python3 demo_txt2img.py astronaut` and skipped the following commands. In this change, we remove the remaining commands to get the same effect (otherwise, the pipeline runtime might be 2 hours instead of 15 minutes). (2) Fix a typo of Stable.	2024-01-25 17:19:04 -08:00
Xu Xing	a3f0e2422b	[js/webgpu] Support f16 uniform (#19098 ) ### Description ### Motivation and Context	2024-01-25 16:58:22 -08:00
Tianlei Wu	8b4517218b	Remove USE_CUTLASS flag (#19271 ) ### Description Since Cutlass can be built with CUDA 11.4 (The minimum CUDA version for onnxruntime CUDA build), there is no need to have a flag to disable cutlass. Changes: (1) Reverted https://github.com/microsoft/onnxruntime/pull/18761 (2) remove the condition to build cutlass. (3) Fix a few build errors or warnings during testing CUDA 11.4 build. Note that SM 89 and 90 (including fp8) requires CUDA 11.8 or later. Flash attention and cutlass fused multihead attention will not be built for CUDA < 11.6. It is recommended to use CUDA 11.8 or above to build if you want to support latest GPUs. It is better to include it in 1.17.0 (otherwise, the release branch might encounter build failure with CUDA 11.4). Tests: (1) Build with flash attention and efficient attention off: passed (2) Build with CUDA 11.4: passed Example build command used in Ubuntu 20.04: ``` export CUDA_HOME=/usr/local/cuda-11.4 export CUDNN_HOME=/usr/lib/x86_64-linux-gnu/ export CUDACXX=/usr/local/cuda-11.4/bin/nvcc sh build.sh --config Release --build_shared_lib --parallel --use_cuda --cuda_version 11.4 \ --cuda_home $CUDA_HOME --cudnn_home $CUDNN_HOME --build_wheel --skip_tests \ --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES=80 \ --disable_types float8 ``` ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-01-25 16:57:58 -08:00
Xu Xing	656ca66186	[js/webgpu] Support uniforms for conv, conv transpose, conv grouped (#18753 )	2024-01-25 15:37:05 -08:00
Chi Lo	a2867b911e	[TensorRT EP] Fix mem leak for TRT plugins custom ops (#19248 ) TRT EP's GetTensorRTCustomOpDomainList() creates a vector of OrtCustomOpDomain objects and releases ownership of those objects. However, those objects are never freed. At the session level, we need to make the TRT EP remember which OrtCustomOpDomain objects it created and release them at EP destruction time.	2024-01-25 11:51:39 -08:00
Tianlei Wu	2b285cd78a	[CUDA] Add functions to dump bfloat16 tensors (#19266 ) ### Description GroupQueryAttention added BFloat16 support in https://github.com/microsoft/onnxruntime/pull/19095, and there is a build error when dumping is enabled. This change supports printing bfloat16 tensors to the console.	2024-01-25 09:30:15 -08:00
Jiajie Hu	5b06505073	[js/webgpu] Fix Tanh explosion (#19201 ) ### Description ```math \tanh(x)=\frac{e^x-e^{-x}}{e^x+e^{-x}}= \left\{ \begin{array}{cc} -\frac{1-e^{-2\cdot(-x)}}{1+e^{-2\cdot(-x)}}, & x<0 \\ 0, & x=0 \\ \frac{1-e^{-2x}}{1+e^{-2x}}, & x>0 \end{array} \right. ``` ### Motivation and Context On some platforms, $$\tanh(1000)=\frac{e^{1000}-e^{-1000}}{e^{1000}+e^{-1000}}$$ would produce NaN instead of 0.999... or 1 (imagine $e^{1000}=\infty$ and $\frac{\infty}{\infty}$ explodes).	2024-01-25 08:25:35 -08:00
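The piecewise formula in the commit above only ever exponentiates a non-positive argument, so the exponential stays in (0, 1] and cannot overflow. A minimal Python sketch of the same idea follows; the function name `stable_tanh` is an assumption, and the actual change is a WebGPU shader, not Python.

```python
import math


def stable_tanh(x):
    """tanh(x) computed as sign(x) * (1 - e^(-2|x|)) / (1 + e^(-2|x|))."""
    if x == 0:
        return 0.0
    # exp of a non-positive number lies in (0, 1]; for large |x| it simply
    # underflows to 0.0 instead of overflowing to infinity.
    e = math.exp(-2.0 * abs(x))
    return math.copysign((1.0 - e) / (1.0 + e), x)


if __name__ == "__main__":
    for x in (-1000.0, -0.5, 0.0, 0.5, 1000.0):
        print(x, stable_tanh(x), math.tanh(x))
```

For large inputs the quotient becomes (1 - 0) / (1 + 0), so the result saturates at 1 or -1 rather than turning into infinity divided by infinity.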
PeixuanZuo	1c92e56dc0	[Cuda] Refactor GroupNorm (#19146 ) Split the GroupNorm implementation into multiple files so that the ROCm EP can reuse the CUDA code. Related PR: https://github.com/microsoft/onnxruntime/pull/19158 --------- Co-authored-by: Peixuan Zuo <peixuanzuo@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2024-01-25 22:28:47 +08:00
Vincent Wang	2b87dd373a	[ORTModule] Remove Mod from Hash to Avoid Conflict for Triton Code-gen (#19256 ) Remove mod (10**8) from hash to avoid conflict for Triton code-gen.	2024-01-25 10:16:41 +08:00
Dmitri Smirnov	7dd1f4b8e2	Pad-18 Cuda implementation (#19211 ) ### Description Implement Pad-18 for Cuda. ### Motivation and Context Latest models converted by Dynamo fall back on CPU for Pad with performance degradation. This contributes to https://github.com/microsoft/onnx-rewriter/issues/126	2024-01-24 18:12:04 -08:00
Phoebe Chen	4477f57ee3	Enable RISC-V 64-bit Cross-Compiling Support for ONNX Runtime on Linux (#19238 ) ### Description This pull request introduces the necessary changes to enable RISC-V 64-bit cross-compiling support for the ONNX Runtime on Linux. The RISC-V architecture has gained popularity as an open standard instruction set architecture, and this contribution aims to extend ONNX Runtime's compatibility to include RISC-V, thereby broadening the reach of ONNX models to a wider range of devices. ### Motivation and Context RISC-V is a free and open-source instruction set architecture (ISA) based on established RISC principles. It is provided under open licenses without fees. Due to its extensibility and freedom in both software and hardware, RISC-V is poised for widespread adoption in the future, especially in applications related to AI, parallel computing, and data centers. ### Example Build Command ``` ./build.sh --parallel --config Debug --rv64 --riscv_toolchain_root=/path/to/toolchain/root --skip_tests ``` ### Documentation Updates Relevant sections of the documentation will be updated to reflect the newly supported RISC-V 64-bit cross-compilation feature. https://github.com/microsoft/onnxruntime/pull/19239 --------- Signed-off-by: Phoebe Chen <phoebe.chen@sifive.com>	2024-01-24 16:27:05 -08:00
Wanming Lin	0c2f0ba90d	[WebNN EP] Support conv1d by reshaping with prepended 1's (#18857 ) WebNN only supports 4-D inputs for conv2d and convTranspose2d, this PR supports 3-D inputs (i.e. conv1d) by prepending a 1 size dimension and several reshape operations.	2024-01-24 15:53:10 -08:00
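The reshaping trick in the commit above can be illustrated in a reduced setting. The sketch below is not the WebNN EP code: it drops the batch and channel dimensions, uses plain Python lists, and the names `conv2d_valid` and `conv1d_via_conv2d` are hypothetical. It only shows that a 1-D convolution equals a 2-D convolution over an input with an extra dimension of size 1.

```python
def conv2d_valid(image, kernel):
    """Single-channel 2-D cross-correlation, stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def conv1d_via_conv2d(signal, taps):
    """Run a 1-D convolution through the 2-D routine."""
    image = [list(signal)]  # reshape: length L -> 1 x L
    kernel = [list(taps)]   # reshape: length K -> 1 x K
    out = conv2d_valid(image, kernel)  # result has shape 1 x (L - K + 1)
    return out[0]           # reshape back to a 1-D result


if __name__ == "__main__":
    print(conv1d_via_conv2d([1, 2, 3, 4], [1, 0, -1]))
```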
Wanming Lin	7252c6e747	[WebNN EP] Support WebNN async API with Asyncify (#19145 )	2024-01-24 15:37:35 -08:00
Yufeng Li	c456f19dba	remove old quantization tool file (#19247 ) ### Description <!-- Describe your changes. --> remove old python files ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> We have a new op MatMulNBits and this one is deprecated.	2024-01-24 15:20:36 -08:00
Yang Gu	591f90c0b9	[js/webgpu] Fix issue of timestamp query (#19258 ) When we enable webgpu profiling mode between session.create and session.run, the current implementation fails to create the querySet (and also the queryResolveBuffer) if we share the commandEncoder with the inputs upload. This PR fixes this by moving the querySet creation to the place where we set queryType.	2024-01-24 14:49:37 -08:00
Changming Sun	bc54ad3f03	Update abseil to a release tag and register neural_speed (#19255 ) ### Description Update abseil to a release tag and register neural_speed to CG. ### Motivation and Context We are currently using a non-released version of abseil. Using a tag is better.	2024-01-24 14:37:39 -08:00
Changming Sun	a28abeb241	Change "#ifdef WIN32" to "#ifdef _WIN32" (#19254 ) ### Description `_WIN32` is a standard macro listed at https://learn.microsoft.com/en-us/cpp/preprocessor/predefined-macros?view=msvc-170 . But `WIN32` is not.	2024-01-24 14:35:44 -08:00
satyajandhyala	a33b5bd1fa	[JS/WebGPU] Added Uniforms to SkipLayerNorm. (#18788 ) ### Description Added Uniforms to SkipLayerNorm ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Improve performance --------- Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>	2024-01-25 01:12:21 +05:30
Sheil Kumar	a39ac4a979	[DirectML] Register Pad19 (#19175 ) ### Description Register Pad19 in DirectML --------- Co-authored-by: Sheil Kumar <sheilk@microsoft.com>	2024-01-24 10:06:31 -08:00
Yi Zhang	d7aebf9ea8	Move Nuget Test from T4 to A10 to reduce release duration (#19253 ) ### Description <!-- Describe your changes. --> ### Motivation and Context Running the release process is very painful and boring because some GPU jobs have to wait a long time. ![image](https://github.com/microsoft/onnxruntime/assets/16190118/1c5c981e-68d4-4678-9758-443fbf362802) ![image](https://github.com/microsoft/onnxruntime/assets/16190118/ba0d79ba-1554-4c7a-93dd-6ea8144c9295) ![image](https://github.com/microsoft/onnxruntime/assets/16190118/36cab833-71c1-4ff5-bca5-f4caa9aee0c9) On the one hand, we could move some T4s out of the PR process since some jobs no longer use T4; on the other hand, we can continue to change some jobs' agents from T4 to A4 too. In the future, T4 will mainly be used for scenarios that need large GPU memory or multiple GPU cards, or for some special cases. Test runs: https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=401786&view=logs&j=8048494c-e6eb-5e47-5e87-ff0aa863325d cc @YUNQIUGUO @snnn	2024-01-24 14:15:07 +08:00
Chi Lo	c10be1848c	[TensorRT EP] Avoid calling unavailable function with cpu python package (#19251 ) C.register_tensorrt_plugins_as_custom_ops() is only available in gpu python package. Add condition to avoid calling it in cpu python package.	2024-01-23 21:30:22 -08:00
Ye Wang	6a424ccf8c	Fix AMD pipeline test failures (#19250 ) ### Description Fix amd test failure ### Motivation and Context	2024-01-23 19:33:49 -08:00
aciddelgado	cbb29d80ff	GQA Rotary and Packed QKV with Flash (#18906 ) ### Description These changes add rotary embedding and packed qkv input to gqa. As of now, the changes are only supported with Flash-Attention (SM >= 80) but should soon be supported with Memory Efficient Attention as well. ### Motivation and Context With the fusion of rotary embedding into this Attention op, we hope to observe some perf gain. The packed QKV should also provide some perf gain in the context of certain models, like Llama2, that would benefit from running ops on the fused QKV matrix, rather than the separate Q, K, and V. --------- Co-authored-by: Yufeng Li <liyufeng1987@gmail.com>	2024-01-23 16:34:26 -08:00
Wei-Sheng Chin	532f8c642c	Fix a backend test by using local backend (#19230 ) The decomposition pass (e.g., converting torch.add to aten.add) in DORT no longer exists. Therefore, we have to use `use_aot_autograd=True` to enable Dynamo's built-in operator decomposition. I think we need to add the decomposition pass back to DORT or remove `use_aot_autograd` (remove because it will always be `true`).	2024-01-23 14:57:30 -08:00
petermcaughan	f53068446e	Add Temperature to WhisperBeamSearch input (#19188 ) ### Description <!-- Describe your changes. --> Add `temperature` as an input to WhisperBeamSearch op and initialize correctly in parameter setup. ### Motivation and Context Currently, temperature is included as an attribute to the BeamSearch op, which doesn't let the model act dynamically in a single inference session. By including this variable as an input, the temperature value can be altered in any inference call (important for 1P teams) --------- Co-authored-by: Peter McAughan <petermca@microsoft.com> Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com> Co-authored-by: Kunal Vaishnavi <kvaishnavi@microsoft.com>	2024-01-23 13:44:34 -08:00
Yi Zhang	54871a2773	Replace T4 to A10 in Linux GPU workflow (#19205 ) ### Description 1. Update Linux GPU machine from T4 to A10, sm=8.6 2. Update the tolerance ### Motivation and Context 1. Free more T4 and test with higher compute capability. 2. ORT enables TF32 in GEMM for A10/100. TF32 will cause precision loss and fail this test ``` 2024-01-19T13:27:18.8302842Z [ RUN ] ModelTests/ModelTest.Run/cuda__models_zoo_opset12_SSD_ssd12 2024-01-19T13:27:25.8438153Z /onnxruntime_src/onnxruntime/test/providers/cpu/model_tests.cc:347: Failure 2024-01-19T13:27:25.8438641Z Expected equality of these values: 2024-01-19T13:27:25.8438841Z COMPARE_RESULT::SUCCESS 2024-01-19T13:27:25.8439276Z Which is: 4-byte object <00-00 00-00> 2024-01-19T13:27:25.8439464Z ret.first 2024-01-19T13:27:25.8445514Z Which is: 4-byte object <01-00 00-00> 2024-01-19T13:27:25.8445962Z expected 0.145984 (3e157cc1), got 0.975133 (3f79a24b), diff: 0.829149, tol=0.0114598 idx=375. 20 of 388 differ 2024-01-19T13:27:25.8446198Z 2024-01-19T13:27:25.8555736Z [ FAILED ] ModelTests/ModelTest.Run/cuda__models_zoo_opset12_SSD_ssd12, where GetParam() = "cuda_../models/zoo/opset12/SSD/ssd-12.onnx" (7025 ms) 2024-01-19T13:27:25.8556077Z [ RUN ] ModelTests/ModelTest.Run/cuda__models_zoo_opset12_YOLOv312_yolov312 2024-01-19T13:27:29.3174318Z /onnxruntime_src/onnxruntime/test/providers/cpu/model_tests.cc:347: Failure 2024-01-19T13:27:29.3175144Z Expected equality of these values: 2024-01-19T13:27:29.3175389Z COMPARE_RESULT::SUCCESS 2024-01-19T13:27:29.3175812Z Which is: 4-byte object <00-00 00-00> 2024-01-19T13:27:29.3176080Z ret.first 2024-01-19T13:27:29.3176322Z Which is: 4-byte object <01-00 00-00> 2024-01-19T13:27:29.3178431Z expected 4.34958 (408b2fb8), got 4.51324 (40906c80), diff: 0.16367, tol=0.0534958 idx=9929. 22 of 42588 differ ``` 3. Some other tests, like SSD, throw other exceptions, so skip them ``` 2024-01-22T09:07:40.8446910Z [ RUN ] ModelTests/ModelTest.Run/cuda__models_zoo_opset12_SSD_ssd12 2024-01-22T09:07:51.5587571Z /onnxruntime_src/onnxruntime/test/providers/cpu/model_tests.cc:358: Failure 2024-01-22T09:07:51.5588512Z Expected equality of these values: 2024-01-22T09:07:51.5588870Z COMPARE_RESULT::SUCCESS 2024-01-22T09:07:51.5589467Z Which is: 4-byte object <00-00 00-00> 2024-01-22T09:07:51.5589953Z ret.first 2024-01-22T09:07:51.5590462Z Which is: 4-byte object <01-00 00-00> 2024-01-22T09:07:51.5590841Z expected 1, got 63 ```	2024-01-23 10:49:24 -08:00
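To give a feel for why TF32 GEMM needs a looser tolerance, the sketch below emulates TF32's reduced mantissa (10 explicit bits instead of float32's 23) in plain Python. This is only an approximation under stated assumptions: it truncates the low bits, whereas real hardware may round, and the helper names `to_tf32` and `dot` are hypothetical. It is not how ORT or CUDA implement TF32.

```python
import struct


def to_tf32(x):
    """Truncate a float32 value to TF32 precision by clearing 13 mantissa bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= 0xFFFFE000  # keep sign, 8 exponent bits, top 10 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def dot(a, b, convert=lambda v: v):
    """Dot product with an optional per-element precision conversion."""
    return sum(convert(x) * convert(y) for x, y in zip(a, b))


if __name__ == "__main__":
    a = [1.0 + 2.0 ** -11] * 1024
    b = [1.0] * 1024
    exact = dot(a, b)
    approx = dot(a, b, convert=to_tf32)
    # Each input loses its 2**-11 term, so the error grows with the length.
    print(exact, approx, exact - approx)
```

In this example every element loses a small term, and the errors accumulate across the reduction, which is the kind of drift a tolerance tuned for full float32 GEMM may not absorb.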
Heflin Stephen Raj	0ea48fc73e	Modified the condition to load the optimiser model (#18891 )	2024-01-23 10:10:54 -08:00
Xu Xing	61610ff986	[js/webgpu] Add FusedConv clip test case (#18900 ) Bug: https://github.com/microsoft/onnxruntime/issues/18899	2024-01-23 08:25:05 -08:00
Tianlei Wu	6ca7c1a933	unet fusion for stable diffusion webui (#19227 ) ### Description Update unet fusion for [stable diffusion webui extension](https://github.com/tianleiwu/Stable-Diffusion-WebUI-OnnxRuntime): (1) Update fusion pattern to support fp16 unet model. (2) Add progress bar (3) Use a cached map to speed up dtype or shape lookup in shape inference result. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-01-22 20:42:30 -08:00
Jeff Daily	b2aec41a83	[ROCm] enable hipGraph (#18382 ) This ports the cudaGraph support from the CUDA EP to the ROCM EP's hipGraph.	2024-01-23 11:17:04 +08:00
Adrian Lizarraga	37d14d7896	[QNN EP] Create Windows ARM64 nightly python package (#19128 ) ### Description Adds a job to create a nightly python package for ORT/QNN on Windows ARM64. Must build onnxruntime-qnn with python 3.11 and numpy 1.25. Note: pipeline run may take up to 3 hrs ### Motivation and Context Make it possible to get a nightly python package with the latest updates to QNN EP. Issue #19161	2024-01-22 18:14:41 -08:00
Jiajia Qin	d226e40856	[js/webgpu] set query type in onRunStart (#19202 ) ### Description <!-- Describe your changes. --> `env.webgpu.profiling` is a global flag. It may change before each session.run. So the best place is to update it in `onRunStart` event. After this, we can directly check `this.queryType`'s value. Without this pr, we need to make sure that `getCommandEncoder()` is called before checking `this.queryType`. Otherwise, it may happen that `pendingKernels`'s length is not equal to `pendingDispatchNumber`'s length. See the two ugly workarounds [1)](`e630dbf528 (diff-006fc84d3997f96a29b8033bd2075d6a0a9509211bd5812a6b934fc74fedfd9dR267-R268)`) and [2)](`e630dbf528 (diff-618fe297fbe7a1da586380163b8fd2627311ccc217640a3c5cdc9c17a33472c1R73-R80)`) if we don't introduce `onRunStart`. Or we need to call `setQueryType` in each kernel run.	2024-01-22 16:08:55 -08:00
Jiajia Qin	2e0a388c36	[js/webgpu] Add HardSigmoid support (#19215 ) ### Description This op is required in mobilenetv3-small-100. With this PR, the mobilenetv3-small-100 model drops from over 100 ms to less than 10 ms on ADL.	2024-01-22 15:53:26 -08:00
Yifan Li	e283cdb218	Fix Fuzz Testing CI (#19228 ) ### Description <!-- Describe your changes. --> Add BuildArch To verify: https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=400952&view=logs&j=5b022bb4-70a7-5401-8766-a8a7802c7150&t=291e85c7-5547-590b-50de-4e01fcd4eba3&l=14 ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-01-22 15:44:57 -08:00
Linnea May	24b74aebcb	[DML] Register DML operators for opset 19 (#16939 ) ### Description <!-- Describe your changes. --> Register DML operators for opset 19. - Cast19 - Castlike19 - Constant19 - Equal19 - Identity19 - QuantizeLinear19 - DequantizeLinear19 - Reshape19 - Shape19 - Size ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> --------- Co-authored-by: linnealovespie <linneamay@microsoft.com>	2024-01-22 15:37:09 -08:00
snadampal	77da2ef278	[aarch64] Add Sbgemm kernel to accelerate fp32 tensor matmul with bfloat16 (#17031 ) ### Description This PR adds SbgemmKernel for aarch64. This includes the Sbgemm kernel, which implements matrix multiplication with bfloat16 SIMD instructions (bfmmla), and MatMul operator changes to invoke the Sbgemm kernel. To enable the Sbgemm kernel, set the following session option: "kOrtSessionOptionsGemmFastMathMode" The PR also adds new test cases for mlas and ort. ### Motivation and Context This is to improve MatMul performance on the aarch64 platform. I have run the benchmarking script below (bert, roberta and gpt2 model inference) on an AWS Graviton3 based c7g.4xl instance and observed a 1.2x-1.76x performance improvement compared to the sgemm (fp32) kernel. ``` cd onnxruntime/python/tools/transformers python3 benchmark.py ``` The unit test precision results match the sgemm kernel results. `./build.sh --config RelWithDebInfo --build_shared_lib --parallel --compile_no_warning_as_error --skip_submodule_sync `	2024-01-22 14:43:06 -08:00
Yi Zhang	780acda7b4	Add Big models pipeline (#19222 ) ### Description 2 models are added in CI. The Stable Diffusion model stage is based on https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/models/stable_diffusion/README.md LLama2 FP16 is based on https://github.com/microsoft/Llama-2-Onnx. 12G GPU memory is not enough, so I chose T4 to run it. ### Motivation and Context Add regular E2E tests for big models. They will be triggered in the main build, that is, they'll run after a PR is merged. More models will be added later. ### Test Runs ### https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1275191&view=results	2024-01-22 14:02:56 -08:00
Adrian Lizarraga	8d9d751179	[QNN EP] Expose device-level session options (#19212 ) ### Description - Adds the following session options to configure the device: - `soc_model`: The SoC model number. Refer to the QNN SDK documentation for valid values. Defaults to "0" (unknown). - `htp_arch`: The minimum HTP architecture the driver will use to select compatible QNN operators. - `device_id`: The ID of the device to use when setting 'htp_arch'. Defaults to "0" (for single device). ### Motivation and Context Allow more configuration.	2024-01-22 12:47:42 -08:00
Zhang Lei	373ebac167	Zhalei/fix seqoutput type (#18765 ) After refactoring beamsearch, all scores became fp32. However, fp16 still needs to be supported according to the original specs.	2024-01-22 10:40:48 -08:00
Ye Wang	21034a2c37	phi2 contrib ops changes (#19112 ) ### Description <!-- Describe your changes. --> 1. support causal mask in MHA cpu 2. support custom rotary_dim in rotary_emb 3. add bf16 for rotary_emb 4. fix a bug in attention rotary ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-01-22 10:17:11 -08:00


10451 commits