Commit graph

11127 commits

Author SHA1 Message Date
Yueqing Zhang
59b13b7bbd
[VitisAI] update version and api & bug fix (#20851)
### Description
1. Use a defined macro to check the version number.
2. Add a new API.
3. Fix a bug in attr_proto.


### Motivation and Context
These are some problems we need to address for the final delivery to
Microsoft.
2024-05-30 07:36:53 -07:00
Xu Xing
25ac65375c
[js/webgpu] Fix mha name (#20860)
2024-05-30 00:01:06 -07:00
Jian Chen
228713f635
adding publishing stage to publish java CUDA 12 pkg to ado (#20834) 2024-05-29 16:24:23 -07:00
Carson M
5bfca1dc57
[Build] Change onnxruntime_NVCC_THREADS from option to cache entry (#20768)
### Description
Changes the `onnxruntime_NVCC_THREADS` CMake variable from an
[`option`](https://cmake.org/cmake/help/latest/command/option.html) to a
[cache
entry](https://cmake.org/cmake/help/latest/command/set.html#set-cache-entry).

### Motivation and Context
Fixes #19833.

`option` in CMake (confusingly, IMHO) always defines a *boolean* option.
The original definition of `onnxruntime_NVCC_THREADS` specified a
default of `1`, which I presume is coerced to `ON`. Thus, if the option
is not overridden with a value of another type, NVCC will receive a
malformed option `--threads ON` (rather than the expected `--threads
1`), which causes the error reported in #19833.

This error occurred only when compiling ONNX Runtime directly via CMake with
`-Donnxruntime_USE_CUDA=ON`; the CI build script always overrode
`onnxruntime_NVCC_THREADS` with a string value:

f1fef19b6e/tools/ci_build/build.py (L1152-L1154)
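The failure mode can be modeled in a few lines of Python. This is a hypothetical sketch, not CMake itself: it assumes, as the author presumes above, that the `BOOL` cache entry created by `option` stores a truthy initial value such as `1` as `ON`, while a `STRING` cache entry keeps the value verbatim. The function names are illustrative, and the `ValueError` stands in for NVCC rejecting the malformed flag.

```python
# Hypothetical model of how onnxruntime_NVCC_THREADS reaches NVCC.
# Assumption (as the PR author presumes): option() creates a BOOL cache
# entry, and a truthy initial value such as "1" is stored as "ON".


def is_cmake_true(value: str) -> bool:
    """Approximate CMake constant truthiness: ON, YES, TRUE, Y, or a non-zero number."""
    if value.upper() in {"ON", "YES", "TRUE", "Y"}:
        return True
    try:
        return float(value) != 0
    except ValueError:
        return False


def cache_value(initial: str, cache_type: str) -> str:
    """Return the value stored for a cache entry of the given type."""
    if cache_type == "BOOL":
        return "ON" if is_cmake_true(initial) else "OFF"
    return initial  # A STRING cache entry keeps the value as written.


def nvcc_threads_args(initial: str, cache_type: str) -> list:
    """Build the NVCC arguments, rejecting a non-numeric thread count."""
    value = cache_value(initial, cache_type)
    if not value.isdigit():
        raise ValueError(f"malformed NVCC option: --threads {value}")
    return ["--threads", value]


print(nvcc_threads_args("1", "STRING"))  # ['--threads', '1']
try:
    nvcc_threads_args("1", "BOOL")
except ValueError as err:
    print(err)  # malformed NVCC option: --threads ON
```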
2024-05-29 12:28:33 -07:00
Wanming Lin
798cea2350
[WebNN EP] Remove legacy MLOperandDescriptor.type (#20783)
The latest Chrome supports `MLOperandDescriptor.dataType`, so remove the legacy
`MLOperandDescriptor.type`.
2024-05-29 10:20:17 -07:00
Wanming Lin
9ea9f9e46a
[WebNN EP] Add data type constraint (#20779)
The WebNN spec has added data type constraints for every op, and its CPU
backend (currently TFLite) has additional constraints. Add the
corresponding constraints to each op in the WebNN EP.

Note: fp16 is temporarily disabled for the CPU backend; fp16 support there is
planned to be ready in Chromium next month.
2024-05-29 10:19:51 -07:00
Vincent Wang
e77f238dc6
Update Torch Version to Fix ATen CPU Pipeline Failure (#20845)
2024-05-29 16:04:18 +08:00
Adrian Lizarraga
3044aa8743
[Quant tool] Extend support for QDQ type conversion at graph output (#20841)
### Description
Allows mixed-precision overrides that add a QDQ quantization type
conversion sequence at a graph output that **is not** consumed by other
nodes. This is not a common use case, but the tool should handle it instead
of raising an error.

#### Example
Original model

![image](https://github.com/microsoft/onnxruntime/assets/19691973/4c9c3bb0-4ca1-4213-9259-9d0506ed22f2)

mixed-precision overrides:
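```python
import onnx
from onnxruntime.quantization import QuantFormat, QuantType, quantize_static

# Assumes float_model_path, qdq_model_path, and a CalibrationDataReader
# instance named data_reader are defined elsewhere.
float_model = onnx.load(float_model_path)

mixed_prec_overrides = {
    "input_0": [{"quant_type": QuantType.QUInt16}],
    "op_0_out": [
        {
            # Quantize op_0_out as uint16 and add a conversion to uint8.
            "quant_type": QuantType.QUInt16,
            "convert": {"quant_type": QuantType.QUInt8},
        }
    ],
}
quantize_static(
    float_model_path,
    qdq_model_path,
    data_reader,
    quant_format=QuantFormat.QDQ,
    activation_type=QuantType.QUInt8,
    op_types_to_quantize=[node.op_type for node in float_model.graph.node],
    extra_options={
        "TensorQuantOverrides": mixed_prec_overrides,
    },
)
```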

QDQ model:

![image](https://github.com/microsoft/onnxruntime/assets/19691973/804fc89b-4a00-43bc-a4ff-21edd6f27e98)

### Motivation and Context
This scenario arises for certain quantization configurations, and the tool
should handle it gracefully.
2024-05-28 21:27:54 -07:00
Yifan Li
d44be41e1c
[TensorRT EP] Support engine hardware compatibility (#20669)
### Description
- Introduce the option `trt_engine_hw_compatible` to support engine hardware
compatibility for Ampere+ GPUs.
- This enables the `nvinfer1::HardwareCompatibilityLevel::kAMPERE_PLUS` flag
when generating engines.
- This option has been validated on sm80/86 GPUs, as engines can be
reused across different Ampere+ architectures:
  - The client side needs to enable this option as well to leverage existing
    sm80+ engines.
- If users enable this option with TRT < 8.6 or sm < 80, a warning is shown
that the option is not supported.

Engine naming:

| GPU | `trt_engine_hw_compatible=false` | `trt_engine_hw_compatible=true` |
| --- | --- | --- |
| A100 (sm80) | TensorrtExecutionProvider_TRTKernel_graph_torch-jit-export_9454133937466702238_0_0_sm**80**.engine | TensorrtExecutionProvider_TRTKernel_graph_torch-jit-export_9454133937466702238_0_0_sm**80+**.engine |
| RTX3080 (sm86) | TensorrtExecutionProvider_TRTKernel_graph_torch-jit-export_9454133937466702238_0_0_sm**86**.engine | TensorrtExecutionProvider_TRTKernel_graph_torch-jit-export_9454133937466702238_0_0_sm**80+**.engine |


### Motivation and Context
Reference:
https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#hardware-compat
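The naming rule in the engine-naming table can be summarized as a small function. This is a hypothetical helper, not the EP's code: `engine_file_name` and its parameters are illustrative, and the fallback to a per-architecture suffix when sm < 80 is an assumption (the PR only states that a warning is shown in that case).

```python
def engine_file_name(prefix: str, sm: int, hw_compatible: bool) -> str:
    """Return the engine cache file name for a GPU with compute capability `sm`.

    Hypothetical helper mirroring the naming table: with hardware
    compatibility enabled, Ampere+ GPUs (sm >= 80) share one "sm80+" engine.
    """
    if hw_compatible and sm >= 80:
        suffix = "sm80+"
    else:
        # Option disabled, or (assumed) a pre-Ampere GPU where the option
        # is unsupported: the engine stays tied to the exact architecture.
        suffix = f"sm{sm}"
    return f"{prefix}_{suffix}.engine"


prefix = "TensorrtExecutionProvider_TRTKernel_graph_torch-jit-export_9454133937466702238_0_0"
for gpu, sm in [("A100", 80), ("RTX3080", 86)]:
    print(gpu, engine_file_name(prefix, sm, False), engine_file_name(prefix, sm, True))
```

Because both GPUs map to the same `sm80+` name when the option is on, an engine built on the A100 can be found and reused on the RTX3080. On the client side, the option would be passed like other TensorRT EP provider options, for example `("TensorrtExecutionProvider", {"trt_engine_hw_compatible": True})` in the `providers` list of `onnxruntime.InferenceSession`.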

---------

Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com>
2024-05-28 18:12:56 -07:00
Edward Chen
535e9d7114
Update package_release_tasks.py (#20835)
1. Move azcopy environment variables out of script and into an Azure DevOps variable group. Move towards consolidating the managed identity client ID definition in one place.
2. Disable azcopy overwrite. We don't want to accidentally change the files for a released package.
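A minimal sketch of an upload step with overwrite disabled, assuming the script shells out to `azcopy copy`. The function names are hypothetical and this does not reproduce `package_release_tasks.py`; it only shows the `--overwrite=false` flag and leaves authentication settings (for example `AZCOPY_AUTO_LOGIN_TYPE` and `AZCOPY_MSI_CLIENT_ID`) to the environment provided by the pipeline's variable group.

```python
import subprocess
from typing import List


def build_azcopy_command(source: str, destination: str) -> List[str]:
    """Build an `azcopy copy` command that refuses to replace existing files."""
    return [
        "azcopy",
        "copy",
        source,
        destination,
        # Do not overwrite files that already exist at the destination.
        "--overwrite=false",
    ]


def upload(source: str, destination: str) -> None:
    """Run the upload; authentication comes from the pipeline environment."""
    # Auth settings are expected to be supplied by an Azure DevOps variable
    # group rather than hard-coded in this script.
    subprocess.run(build_azcopy_command(source, destination), check=True)
```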
2024-05-28 17:50:25 -07:00
Ye Wang
362a623905
fix a build error with cuda 12.5 (#20770)
### Motivation and Context
https://github.com/microsoft/onnxruntime/issues/20765
2024-05-28 10:46:24 -07:00
Adrian Lizarraga
e78b18a2fb
Increase ComponentDetection timeout for React Native CI (#20800)
### Description
Runs of the React Native CI are timing out during ComponentDetection
after 8 minutes. This increases the timeout value.



### Motivation and Context
Runs of the React Native CI are timing out during ComponentDetection.
2024-05-28 08:36:38 -07:00
Jian Chen
b1b8cb05dc
Adding java build and packaging stage to cuda-packaging-pipeline.yml (#20812)
### Description
Adding java build/packaging stage to `cuda-packaging-pipeline.yml`



### Motivation and Context
This allows the Java CUDA 12 package to be published along with the NuGet
CUDA 12 package.
2024-05-27 07:59:19 -07:00
Chi Lo
454fcdde00
[TensorRT EP] Weightless API integration (#20412)
This PR includes the weight-stripped engine feature (thanks @moraxu for
#20214), which is the major feature of the TRT 10 integration.

Two TRT EP options are added:

- `trt_weight_stripped_engine_enable`: Enable weight-stripped engine
build and refit.
- `trt_onnx_model_folder_path`: In the quick load case using an embedded
engine model / EPContext mode, the original ONNX filename is stored in the
node's attribute, and this option specifies the directory of that ONNX
file if needed.

Normal weight-stripped engine workflow:

![image](https://github.com/microsoft/onnxruntime/assets/54722500/9f314865-cbda-4979-a7ac-b31c7a553b56)
Weight-stripped engine and quick load workflow:

![image](https://github.com/microsoft/onnxruntime/assets/54722500/9f31db51-a7a8-495b-ba25-54c7f904cbad)

See the doc [here](https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html#tensorrt-ep-caches) for
more information about the EPContext model.
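A minimal sketch of how the two new options might be supplied from Python, using the `(provider_name, options_dict)` form accepted by the `providers` argument of `onnxruntime.InferenceSession`. The helper name, the model file name, and the folder path are hypothetical, and other TensorRT EP options (such as engine caching) may also be needed; see the linked documentation.

```python
from typing import List, Optional, Tuple, Union


def trt_weightless_providers(
    model_folder: Optional[str] = None,
) -> List[Union[str, Tuple[str, dict]]]:
    """Return a providers list enabling weight-stripped TensorRT engines."""
    trt_options = {
        # Build weight-stripped engines and refit them with the model's weights.
        "trt_weight_stripped_engine_enable": True,
    }
    if model_folder is not None:
        # Quick-load / EPContext case: the original ONNX file name is stored in
        # the node's attribute; this option says which directory it lives in.
        trt_options["trt_onnx_model_folder_path"] = model_folder
    return [("TensorrtExecutionProvider", trt_options), "CPUExecutionProvider"]


# Usage (requires an onnxruntime build with the TensorRT EP):
# import onnxruntime as ort
# session = ort.InferenceSession("model_ctx.onnx", providers=trt_weightless_providers("/models"))
```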

---------

Co-authored-by: yf711 <yifanl@microsoft.com>
Co-authored-by: Ye Wang <52801275+wangyems@users.noreply.github.com>
Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com>
Co-authored-by: pengwa <pengwa@microsoft.com>
Co-authored-by: wejoncy <wejoncy@163.com>
Co-authored-by: Yi Zhang <zhanyi@microsoft.com>
Co-authored-by: Yi Zhang <your@email.com>
Co-authored-by: Pranav Sharma <prs@microsoft.com>
Co-authored-by: Adam Pocock <adam.pocock@oracle.com>
Co-authored-by: cao lei <jslhcl@gmail.com>
Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com>
Co-authored-by: inisis <46103969+inisis@users.noreply.github.com>
Co-authored-by: Jeff Bloomfield <38966965+jeffbloo@users.noreply.github.com>
Co-authored-by: mo-ja <60505697+mo-ja@users.noreply.github.com>
Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com>
Co-authored-by: Sumit Agarwal <sumitagarwal330@gmail.com>
Co-authored-by: Atanas Dimitrov <70822030+neNasko1@users.noreply.github.com>
Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>
Co-authored-by: Yufeng Li <liyufeng1987@gmail.com>
Co-authored-by: Dhruv Matani <dhruvbird@gmail.com>
Co-authored-by: Dhruv Matani <dhruv.matani@grammarly.com>
Co-authored-by: wangshuai09 <391746016@qq.com>
Co-authored-by: Xiaoyu <85524621+xiaoyu-work@users.noreply.github.com>
Co-authored-by: Xu Xing <xing.xu@intel.com>
Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>
Co-authored-by: Rachel Guo <35738743+YUNQIUGUO@users.noreply.github.com>
Co-authored-by: Sai Kishan Pampana <sai.kishan.pampana@intel.com>
Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
Co-authored-by: Jian Chen <cjian@microsoft.com>
Co-authored-by: Shubham Bhokare <32080845+shubhambhokare1@users.noreply.github.com>
Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
Co-authored-by: Andrew Fantino <15876180+afantino951@users.noreply.github.com>
Co-authored-by: Thomas Boby <thomas@boby.uk>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
Co-authored-by: Michal Guzek <mguzek@nvidia.com>
Co-authored-by: George Wu <jywu@microsoft.com>
2024-05-26 12:24:17 -07:00
Changming Sun
439ed92b96
Remove TVM EP's pipeline (#20813)
### Description
Temporarily remove TVM EP's pipeline until someone helps us upgrade TVM
to a newer version which is compatible with the latest ONNX.

### Motivation and Context
The ONNX version that TVM EP uses has a known security vulnerability. We
cannot continue using it in our hosted build environment. This change is temporary.
2024-05-25 20:42:41 -07:00
Adrian Lizarraga
5bae32eb34
Extend DoubleQDQPairsRemover to handle sequences that end in duplicate DQ nodes (#20759)
### Description
Extend the DoubleQDQPairsRemover optimizer to also handle sequences that
end in duplicate DQ nodes.

For example, the following sequence:
```
 Q1 --> DQ1 --> Q2 --+--> DQ2
                     |
                     +--> DQ2'
```
is now simplified to:
```
 Q1 ---+--> DQ2
       |
       +--> DQ2'
```


### Motivation and Context
The EnsureUniqueDQNodeUnits pass may add duplicate DQ nodes to ensure
valid QDQ node units. The DoubleQDQPairsRemover should still be able to
remove unnecessary QDQ ops if the target sequence ends in duplicate DQ
nodes.
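The rewiring can be illustrated on a toy graph. This is a hypothetical Python sketch, not the DoubleQDQPairsRemover implementation: the graph is a dict of node name to `{"op", "input"}`, and it omits the checks on quantization types and parameters that a real optimizer must perform before removing the pair.

```python
from typing import Dict, List

# Toy graph: node name -> {"op": op type, "input": producer node name or None}.
Graph = Dict[str, dict]


def consumers(graph: Graph, name: str) -> List[str]:
    """Names of the nodes that read the output of `name`."""
    return [n for n, node in graph.items() if node["input"] == name]


def remove_double_qdq_pair(graph: Graph, q1: str) -> Graph:
    """Drop the DQ1 -> Q2 pair after Q1 and reconnect all DQ consumers of Q2.

    Assumes the quantization types and parameters were already checked.
    """
    dq1s = consumers(graph, q1)
    if len(dq1s) != 1 or graph[dq1s[0]]["op"] != "DequantizeLinear":
        return graph
    q2s = consumers(graph, dq1s[0])
    if len(q2s) != 1 or graph[q2s[0]]["op"] != "QuantizeLinear":
        return graph
    dq2s = consumers(graph, q2s[0])
    # Every consumer of Q2 must be a DQ, e.g. DQ2 plus duplicates like DQ2'.
    if not dq2s or any(graph[n]["op"] != "DequantizeLinear" for n in dq2s):
        return graph
    removed = (dq1s[0], q2s[0])
    result = {n: dict(node) for n, node in graph.items() if n not in removed}
    for dq in dq2s:
        result[dq]["input"] = q1  # Reconnect each duplicate DQ directly to Q1.
    return result
```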

---------

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2024-05-24 18:30:15 -07:00
Chi Lo
a7bc49a565
[TensorRT EP] Use latest commit of onnx-tensorrt parser (#20758)
The 10.0-GA branch has been updated with fixes for several issues.
https://github.com/onnx/onnx-tensorrt/commits/10.0-GA/
2024-05-24 16:44:16 -07:00
Suryaprakash Shanmugam
1765da17e4
QDQ transformations in the OpenVINO EP for the NPU device (#20622)
We introduce rulesets for the NPU device that eliminate QDQ nodes of
unsupported types and QDQ nodes around unsupported quantized operators.
This leads to improved performance and accuracy on critical client AI models.

Here's a summary of the changes:

- Introduces the provider option `enable_qdq_optimizer` which when set
to `True` enables stripping of QDQ nodes on the NPU device for models
with `QuantizeLinear` and `DequantizeLinear` layers in them.
`enable_qdq_optimizer` defaults to `False`.
- Always strip out int16/uint16 QDQ layers as these types are not
supported by the NPU compiler.
- Only the supported ops `Conv`, `MatMul`, and `Add`, which were specifically
identified for optimal inference performance, retain the QDQ layers around them.
OpenVINO EP achieves this by iterating through NodeUnits in the QDQ
model, and reconstructing the graph only with the required layers.
- Added provider APIs to manipulate node units from EP code by
@adrianlizarraga
- Added capability rule for the Pad operator when it takes DQ layers as
input
- Fixes from static code analysis tool
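The retention rules listed above can be condensed into one predicate. This is a hypothetical Python sketch, not OpenVINO EP code: it assumes `enable_qdq_optimizer` is set to `True`, and the constant and function names are illustrative.

```python
# Hypothetical summary of the NPU QDQ rules, assuming enable_qdq_optimizer=True.
QDQ_RETAINING_OPS = {"Conv", "MatMul", "Add"}
NPU_UNSUPPORTED_QDQ_TYPES = {"int16", "uint16"}


def keep_qdq_around(op_type: str, quant_type: str) -> bool:
    """Decide whether the QDQ layers around a node unit survive on the NPU."""
    if quant_type in NPU_UNSUPPORTED_QDQ_TYPES:
        return False  # 16-bit QDQ is not supported by the NPU compiler.
    # Only the ops identified for optimal performance keep their QDQ layers.
    return op_type in QDQ_RETAINING_OPS
```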

---------

Co-authored-by: adrianlizarraga <adlizarraga@microsoft.com>
Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>
Co-authored-by: sfatimar <sahar.fatima@intel.com>
Co-authored-by: saurabhkale17 <saurabh1.kale@intel.com>
2024-05-24 16:25:05 -07:00
Adam Louly
ed8275883a
[Training] Add bf16 support to GatherElementsGrad. (#20796)
### Description
Adding bf16 support to GatherElementsGrad.

---------

Co-authored-by: Adam Louly <adamlouly@microsoft.com@h100vm-ort.kxelwkzfzxguje5bxvwxxs135a.gvxx.internal.cloudapp.net>
2024-05-24 15:55:14 -07:00
Suryaprakash Shanmugam
76e1a06986
Fix ordering of value info in GraphProto creation (#20691)
### Description
The Graph member `value_info_` (an `unordered_set`) is now ordered before its
values are added to the graph proto.


### Motivation and Context

- Without this ordering, the model proto used by the OpenVINO EP is not
deterministic and varies across runs.
- Since the model proto varies, it affects caching attempts by OpenVINO.

Q: If creating a vector to have ordered elements is costly, should we
make value_info_ a std::set that is sorted according to NodeArg names?


Related PR about ordering initializers:
https://github.com/microsoft/onnxruntime/pull/14631
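The same idea in Python, where string hashing is randomized per process so set iteration order can also differ between runs. This is a hypothetical sketch: `ValueInfo` is a stand-in for the real value-info entries, and sorting by name is an assumption that matches the question raised above rather than a description of the PR's ordering key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValueInfo:
    """Stand-in for a value_info entry (hypothetical)."""

    name: str
    elem_type: str


def ordered_value_info(value_info: set) -> list:
    """Return value_info entries in a stable, name-sorted order."""
    # Hash-set iteration order is not stable across runs, so sort before
    # serializing to make the resulting proto deterministic.
    return sorted(value_info, key=lambda vi: vi.name)
```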
2024-05-24 10:49:32 -07:00
Peishen Yan
cfe68e489e
[WebNN EP] Support Trilu op (#20730)
Adds support for Trilu via the WebNN `triangular` op.
2024-05-24 10:46:54 -07:00
Guenther Schmuelling
33a68d221f
add missing file for pr20791 (#20811)
This file should have been included in PR #20791 to allow fp16 in the Tile
implementation.
2024-05-24 09:59:13 -07:00
Jian Chen
10c425a4d5
Fix Onnx >= to == (#20798)
2024-05-24 09:16:23 -07:00
Jian Chen
fe24006425
Fix Nuget Cuda pipeline package pipeline (#20741)
### Description

This PR adds protoc.exe to the NuGet CUDA pipeline, which also allows it to
build Java for various CUDA versions.

2024-05-24 09:15:57 -07:00
Satya Kumar Jandhyala
bab5037eab
Eliminate explicit Concat operations in Attention (#20556)
### Description
Remove the explicit concatenation of pastKey with Key and of pastValue with
Value.
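One way to avoid materializing the concatenation is to index into the two buffers by position. This is a hypothetical Python sketch of that indexing idea only, not the WebGPU shader: the names are illustrative, it handles a single query row, and scaling and softmax are omitted.

```python
from typing import List, Sequence

Vector = Sequence[float]


def key_at(past_key: Sequence[Vector], key: Sequence[Vector], j: int) -> Vector:
    """Return row j of the logical concat(past_key, key) without building it."""
    past_len = len(past_key)
    return past_key[j] if j < past_len else key[j - past_len]


def attention_scores(query: Vector, past_key: Sequence[Vector], key: Sequence[Vector]) -> List[float]:
    """Dot products of one query row against every past and present key row."""
    total = len(past_key) + len(key)
    return [
        sum(q * k for q, k in zip(query, key_at(past_key, key, j)))
        for j in range(total)
    ]
```

The same `key_at` indexing applies to pastValue and Value when the scores are later multiplied by the values.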



2024-05-24 09:07:57 -07:00
Changming Sun
535a030b1e
Remove manylinux build scripts from python packaging pipeline (#20786)
### Description
Use a common set of prebuilt manylinux base images to build the
packages, to avoid building the manylinux part again and again. The base
images can be used in GenAI and other projects too.
This PR also updates the GCC version for inference python CUDA11/CUDA12
builds from 8 to 11. Later on I will update all other CUDA pipelines to
use GCC 11, to avoid the issue described in
https://github.com/onnx/onnx/issues/6047 and
https://github.com/microsoft/onnxruntime-genai/issues/257 .

### Motivation and Context
To extract the common part as a reusable build infra among different
ONNX Runtime projects.
2024-05-24 08:18:22 -07:00
Jian Chen
884acd4598
Fix Nuget-Cuda pubish pipeline (#20794)
### Description
Previously, all feeds were set to nightly, and the official release feed ID
was not set.


2024-05-23 18:27:46 -07:00
Guenther Schmuelling
0cf7caaff2
[js/webgpu] enable fp16 for tile (#20791) 2024-05-23 16:59:39 -07:00
Edward Chen
d1af19db9d
Add some CPU feature detection for Apple platforms. (#20769)
Add CPUIDInfo::ArmAppleInit() to detect CPU features on Apple platforms. This initial implementation is not comprehensive.
2024-05-23 15:59:46 -07:00
Changming Sun
b522df0ae4
Update RE2 to the latest (#20775)
Update RE2 to the latest.

To keep the components up to date.
2024-05-23 14:30:15 -07:00
Yulong Wang
0996d6e19e
[tools] update pipeline list for run_CIs_for_external_pr.py (#20776)
### Description
Add the required pipeline "Linux Android Emulator QNN CI Pipeline".
2024-05-23 10:38:42 -07:00
Yi Zhang
fa8670fe5b
Add a test image for stable diffusion (#20780) 2024-05-23 08:50:23 -07:00
Wanming Lin
2c39d0c502
[WebNN EP] Disable ConvTranspose for WebNN CPU (#20762)
The WebNN CPU backend implementation has been migrated from XNNPack to
TFLite. TFLite does not yet support WebNN's convTranspose2d, so disable it
for now.
2024-05-22 20:59:37 -07:00
Adam Louly
529feb01f4
Add BF16 for Scale Op. (#20753)
Add bfloat16 support to the Scale op.

---------

Co-authored-by: Adam Louly <adamlouly@microsoft.com@h100vm-ort.kxelwkzfzxguje5bxvwxxs135a.gvxx.internal.cloudapp.net>
2024-05-22 17:01:17 -07:00
Edward Chen
a39f8862fd
SQNBitGemm - move workspace size calculation functions to hardware-specific implementations (#20757)
The workspace usage may be hardware-specific. Moving away from a common workspace size calculation allows more flexibility in the hardware-specific implementations.
2024-05-22 15:12:17 -07:00
Jian Chen
d4fe4b5b51
Replace ubuntu-latest with onnxruntime-Ubuntu2204-AMD-CPU (#20736)
2024-05-22 13:36:02 -07:00
Jian Chen
0a10a3003a
component-governance fix round 4 (#20754)
2024-05-22 11:05:24 -07:00
Yulong Wang
e412bc1919
[doc] update file size table for ORT Web (#20755)
2024-05-22 11:04:57 -07:00
Xu Xing
f1fef19b6e
[js/webgpu] Support shared memory for transpose 2d (#19267)
For a 1024x1024 transpose, the runtime is 18.7 ms without shared memory and 13.2 ms with shared memory.
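The reported numbers correspond to about a 29% reduction ((18.7 - 13.2) / 18.7 ≈ 0.29). The tiling technique behind a shared-memory transpose can be sketched on the CPU in Python. This is a hypothetical illustration, not the WGSL shader: the per-tile `buf` stands in for workgroup shared memory, and the tile size of 16 is an assumption.

```python
from typing import List

Matrix = List[List[float]]


def tiled_transpose(a: Matrix, tile: int = 16) -> Matrix:
    """Transpose `a` tile by tile, staging each tile in a small local buffer.

    On a GPU the buffer would be workgroup shared memory, so that both the
    reads from `a` and the writes to `out` walk memory row by row.
    """
    rows, cols = len(a), len(a[0])
    out = [[0.0] * rows for _ in range(cols)]
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            r1, c1 = min(r0 + tile, rows), min(c0 + tile, cols)
            # Load phase: copy the tile row by row (coalesced reads on a GPU).
            buf = [a[r][c0:c1] for r in range(r0, r1)]
            # Store phase: emit the tile's columns as output rows (coalesced writes).
            for c in range(c0, c1):
                out[c][r0:r1] = [buf[r - r0][c - c0] for r in range(r0, r1)]
    return out
```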
2024-05-22 08:15:44 -07:00
Yulong Wang
068bb3d5ee
[js/webgpu] add missing space in build script (#20752) 2024-05-21 16:24:34 -07:00
Chi Lo
df01e0d497
[TensorRT EP] Update ORT kernel output with TRT DDS int64 output for TRT 10 (#20738)
TRT 10 now natively supports int64 tensors, so the code that binds the ORT
kernel output to the DDS int64 output needs to be updated.
2024-05-21 09:03:48 -07:00
pengwa
8a98874e7e
Flash attention recompute (#20603)
### Flash attn recompute

1. Allow PythonOp(FlashAttn) to be recomputed correctly.
45879ff5c2
2. Use JSON to pass the selected-to-recompute subgraphs.
3c374da678

#### Better Memory Efficiency 

The customer model can run with either PyTorch SDPA or Flash Attn; this PR
makes it possible for the Flash Attn path to work with ORTModule layerwise
recompute. Comparing only the layers (excluding other pieces, which a few
more upcoming optimizations will target), peak memory drops from 45.xGB to
32.xGB.

#### Better Perf

Using Flash Attn brings an additional 16% reduction in end-to-end time, with
a closely aligned loss curve.


![image](https://github.com/microsoft/onnxruntime/assets/10530022/bb63894a-f281-49bc-a8e6-ff818439be38)

#### Use JSON File to pass Recompute Plans

This overcomes the maximum length limit on strings defined in session
options.

2024-05-21 13:38:19 +08:00
Adrian Lizarraga
8acf60f35c
Layout transform: Fix-up QDQ units and add constant folding (#20685)
### Description

#### Problem 1: Broken Transpose QDQ unit
Layout transform's specialized cost function aggressively pushes down
transposes with channel-first or channel-last perms. This can lead to a
situation where a channel-first/last Transpose gets stuck after being
pushed through an Unsqueeze node that makes the Transpose's perm no
longer channel-first/last. At this point, the specialized cost function
defers to the default cost function, which does not see a need to
continue pushing this Transpose node. This breaks the QDQ node units for
both the Unsqueeze and the Transpose: DQ -> Unsqueeze -> Transpose -> Q.

<img width="266" alt="image"
src="https://github.com/microsoft/onnxruntime/assets/19691973/82f8432d-ca27-451b-8c36-c8d87b806e30">


The transpose optimizer should insert a Q -> DQ pair between the
Unsqueeze and Transpose nodes to fix both QDQ node units: DQ ->
Unsqueeze -> Q[new] -> DQ[new] -> Transpose -> Q

<img width="198" alt="image"
src="https://github.com/microsoft/onnxruntime/assets/19691973/5a584bdf-e5db-4622-b3bb-83c060e09261">


#### Problem 2: Inserted Squeeze/Transpose nodes should be constant folded when possible.
The transpose optimizer inserts Squeeze (and Transpose) ops between an
initializer and a DQ to counteract the effect of Unsqueezing that
initializer if it is consumed by multiple nodes. This results in a graph
where the inserted nodes are not in valid node units:

Original graph where two Mul nodes share a common initializer input:
<img width="456" alt="image"
src="https://github.com/microsoft/onnxruntime/assets/19691973/4b9155ae-e32f-41fc-9136-f953b73e92e7">

Resulting graph after transpose optimization without constant folding:
<img width="452" alt="image"
src="https://github.com/microsoft/onnxruntime/assets/19691973/3c1bfef1-d45f-4d6e-aa19-1c2929eae3f5">

Here, the circled Transpose and Squeeze nodes operate on a quantized
integer type but are not in valid QDQ node units. The solution is to run
constant folding, which results in:
<img width="405" alt="image"
src="https://github.com/microsoft/onnxruntime/assets/19691973/aebdb91f-f38f-4583-adec-33e46126365f">


### Motivation and Context
Improve the layout transformation to allow more models to run on EPs
that prefer the channel-last layout.
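The constant-folding step from Problem 2 can be illustrated with a toy Python sketch. This is not ONNX Runtime's constant-folding pass: the `(op_type, input, output)` node tuples, the initializer names, and the tiny evaluator set (2-D Transpose only, Squeeze of a leading size-1 dimension only) are assumptions. A Squeeze or Transpose whose input is an initializer is evaluated once and replaced by a new initializer, so the DQ reads a constant directly.

```python
from typing import Callable, Dict, List, Tuple

Node = Tuple[str, str, str]  # (op_type, input_name, output_name)

# Hypothetical evaluators for the two ops the transpose optimizer inserts.
# Values are nested lists: Transpose is 2-D only, Squeeze drops a leading
# dimension that is assumed to have size 1.
EVALUATORS: Dict[str, Callable[[list], list]] = {
    "Transpose": lambda t: [list(col) for col in zip(*t)],
    "Squeeze": lambda t: t[0],
}


def constant_fold(nodes: List[Node], initializers: Dict[str, list]) -> List[Node]:
    """Replace foldable nodes whose input is constant with new initializers."""
    remaining = []
    for op_type, inp, out in nodes:  # Nodes are assumed to be topologically sorted.
        if op_type in EVALUATORS and inp in initializers:
            initializers[out] = EVALUATORS[op_type](initializers[inp])
        else:
            remaining.append((op_type, inp, out))
    return remaining
```

The original initializer is left in place, since in the scenario above it is still shared with other consumers.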

---------

Co-authored-by: Scott McKay <skottmckay@gmail.com>
2024-05-20 20:19:06 -07:00
Jian Chen
372974e5d6
Using CPU pool to build Linux GPU C API Package (#20648)
2024-05-20 15:25:14 -07:00
Wanming Lin
87d49e3dda
[WebNN EP] Add WebNN operators doc to README.md (#20734) 2024-05-20 14:57:40 -07:00
Wanming Lin
0399d1b12d
[WebNN EP] Update chromium flag (#20732)
WebNN is currently enabled behind "Enables WebNN API" flag.
2024-05-20 14:57:30 -07:00
Jian Chen
ddafbf2224
Component Governance fix round 3 (#20689)
2024-05-20 13:39:09 -07:00
Jian Chen
11df22b59b
Reenabling Nuget Cuda Packaging Pipeline (#20688)
2024-05-20 10:37:15 -07:00
Edward Chen
fefae0cd04
Add Mac CI GitHub Actions workflow (#20717)
Add a new GitHub Actions workflow, `.github/workflows/mac.yml`. It contains these jobs:
- ARM64 MacOS CI build.
- Objective-C static analysis build. This was moved over from another Azure DevOps pipeline to make it more visible.
2024-05-20 10:27:03 -07:00
Preetha Veeramalai
ebed2c3785
Unified OV compile_model API in OVEP (#20700)
### Description
Use a unified API in OVEP that passes the ONNX graph proto from ORT to OV
for compilation.


### Motivation and Context
The earlier implementation used two different flows depending on whether an
ONNX model path was present or the model was loaded from memory.
The former passed the ONNX model path directly to OV when the graph was
fully supported by the EP, while the latter passed the ORT model proto to OV.

This caused a difference in results when ORT optimizations were enabled.
This PR addresses this issue.
2024-05-20 10:20:28 -07:00