Commit graph

7751 commits

Author SHA1 Message Date
Baiju Meswani
2c29938846
[QAT] Introduce FakeQuant op (#13649) 2022-11-29 08:43:37 -08:00
sfatimar
49c3768985
Enabled ops for DeBERTa model (#13690)
### Description
Enabled the GatherElements op to support the DeBERTa model.



### Motivation and Context
- This change is required to enable the DeBERTa model, which is relevant to
MSFT.

Co-authored-by: mayavijx <mayax.vijayan@intel.com>
2022-11-28 22:39:32 -08:00
pengwa
7c53b6eee8
Skip the tests of saving tensor in backward (#13767)
### skip the tests of saving tensor in backward

The test fails randomly; skip it until the issue is fixed, to unblock
the CIs.
2022-11-29 13:02:26 +08:00
Vincent Wang
3c258c878c
[CUDA] Optimize Slice Kernel (#13641)
The PR optimizes the Slice CUDA kernel in two ways:
- Coalesce dimensions so that fewer divmod operations are needed during the kernel compute
- Split data load and write for better memory throughput
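
The first idea can be sketched in plain Python. This is only an illustration, assuming that "coalesce dimensions" means merging the dimensions before and after the sliced axis into one outer and one inner dimension; the function names are hypothetical and this is not the CUDA kernel's code.

```python
from math import prod


def coalesce_slice_dims(shape, axis):
    """Merge the dims before and after the sliced axis: (outer, sliced, inner)."""
    return (prod(shape[:axis]), shape[axis], prod(shape[axis + 1:]))


def slice_reference(data, shape, axis, start, end):
    """Slice a flat row-major buffer with one divmod per original dimension."""
    out_shape = list(shape)
    out_shape[axis] = end - start
    out = []
    for flat in range(prod(out_shape)):
        # Decompose the output offset into one index per dimension.
        idx, rem = [], flat
        for dim in reversed(out_shape):
            rem, i = divmod(rem, dim)
            idx.append(i)
        idx.reverse()
        idx[axis] += start
        # Recompose the input offset from the shifted index.
        src = 0
        for i, dim in zip(idx, shape):
            src = src * dim + i
        out.append(data[src])
    return out


def slice_coalesced(data, shape, axis, start, end):
    """Same slice, but only two divmods per element after coalescing."""
    outer, mid, inner = coalesce_slice_dims(shape, axis)
    out_mid = end - start
    out = []
    for flat in range(outer * out_mid * inner):
        rem, k = divmod(flat, inner)
        o, j = divmod(rem, out_mid)
        out.append(data[(o * mid + j + start) * inner + k])
    return out
```

For the first shape in the results, `coalesce_slice_dims((8, 12, 2048, 1024), 2)` gives `(96, 2048, 1024)`, so each element needs two divmods instead of four.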

The table below shows performance results (cycle counts from Nsight
Compute) on a V100, using real cases from Huggingface's XLNet model:

Input shape and slice | Old (cycles) | New (cycles)
-- | -- | --
[8,12,2048,1024], axis=2, start=1, end=2048 | 1838687 | 1539846
[8,12,1024,2047], axis=3, start=0, end=1024 | 951383 | 722203
2022-11-29 09:18:03 +08:00
JiCheng
47780b7f3b
[XNNPACK] add more computation heavy ops (#13270)
### Description
This is the first PR adding the remaining ops for the XNNPACK EP.
It will add:
- [x] ConvTranspose f32 qu8 qs8
- [x] ~~UnMaxpool   f32 qu8 qs8~~
- [x] Resize f32 qu8 qs8
- [ ]  GEMM see https://github.com/microsoft/onnxruntime/pull/13126

Support for the remaining operations will be separated into another PR.

### Motivation and Context
2022-11-29 09:09:26 +08:00
Dmitri Smirnov
4fbe16e493
Ifdef cpuinfo code on platforms we do not set affinity (#13486)
### Description
Remove code that invokes the cpuinfo library on platforms where we do
not set affinity.

### Motivation and Context
`cpuinfo` library increases binary size.
2022-11-28 13:44:16 -08:00
Guenther Schmuelling
2d523c507e
for wasm catch exceptions at top level api (#13644)
fix for https://github.com/microsoft/onnxruntime/issues/13383,
https://github.com/microsoft/onnxruntime/issues/13408

Currently ort-web doesn't catch exceptions because turning on exception
catching increases the binary size by 3MB (~30%).
But ORT can throw (e.g. ONNX errors or ORT_ENFORCE), and in that case
there is no usable error message.

Turning on exception catching only for the top-level API file that is
released fixes the error messages with a minimal increase in binary size.
2022-11-28 10:24:34 -08:00
Faith Xu
b7c3862330
Update resource section in readme (#13724)
### Description
- adds link to release plans page
- adds link to youtube channel
2022-11-28 09:42:31 -08:00
Jicheng Tang
b4a4fa5aac
Fix compile error with protobuf RepeatedIterator (#13731)
### Description
There are some compile errors with
google::protobuf::internal::RepeatedIterator.
Replace reinterpret_cast with &(*iter), where iter is of type
RepeatedIterator.


### Motivation and Context
My protobuf and compiler versions are:
- libprotoc 3.21.5
- g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0

When I use this build command:
```
./build.sh --use_cuda --cudnn_home /usr --cuda_home /usr/local/cuda --config Debug --build_shared_lib --parallel 
```

There are some compile errors like this:

- error 1
onnxruntime/test/util/test_utils.cc:186:105: error: no matching function
for call to ‘make_span(google::protobuf::RepeatedField<long
int>::const_iterator, google::protobuf::RepeatedField<long
int>::const_iterator)’
186 | ind_span = gsl::make_span(indices_proto.int64_data().cbegin(),
indices_proto.int64_data().cend());

- error 2
onnxruntime/test/onnx/tensorprotoutils.cc:101:56: error: invalid cast
from type ‘google::protobuf::internal::RepeatedIterator<const long
unsigned int>’ to type ‘const uint32_t*’ {aka ‘const unsigned int*’}
  101 |       *p_data++ = *reinterpret_cast<const T*>(data_iter);
2022-11-28 09:33:53 -08:00
Numfor Tiapo
aa1390e963
Fix Prefast Errors (#13675)
Fixes all C28204, C6031, and C26814 prefast errors.

Co-authored-by: Numfor Mbiziwo-Tiapo <numform@microsoft.com>
2022-11-28 09:16:22 -08:00
Ted Themistokleous
c6bea4f02f
Modify MIGraphX EP for Accuracy tests (#13455)
Allows MIGraphX EP to run the following additional tests. Also adds support to get MIGraphX to run eval_squad.py

Reference to the Rocm EP changes: https://github.com/microsoft/onnxruntime/pull/13306

Co-authored-by: Joseph Groenenboom <joseph.groenenboom@amd.com>
Co-authored-by: Ted Themistokleous <tthemist@amd.com>
2022-11-27 18:26:49 +08:00
Yufeng Li
4ca62b9ee8
fix build break in test/beam_search_topk.cc (#13739) 2022-11-23 21:20:51 -08:00
Vincent Wang
47e7630378
[CUDA] Transpose3DImpl Supporting more Cases (#13611)
CUDA's Transpose3DImpl transposes [batch, m, n] to [batch, n, m].
Currently it requires that both m and n be divisible by 32 or 16.
Otherwise the compute falls back to the general implementation,
which is slow. This PR removes that limitation.

Profiling on a V100 with the tensor sizes below gave the following cycle
counts from Nsight Compute:
Tensor shape | Old (cycles) | New (cycles)
-- | -- | --
[3072,64,512] | 760793 | 727140
[3072,16,2048] | 854303 | 851146
[3072,2048,12] | 986924 | 737884
[3072,1024,24] | 1212427 | 495117

This shows that even though we added extra IF statements to the kernel
implementation, there is nearly no impact on the cases the old version
already handled (cases 1 and 2). Cases 3 and 4, which previously fell
back to the general implementation, are much faster.

The data above was collected using FP16 tensors; similar results were
observed for float tensors.

This PR improves the performance of ORT training of Huggingface's XLNet
model, which has [8,1024,1024,12].permute(0,3,1,2).
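
As an illustration of why such a permute can be served by a 3D transpose, the sketch below checks on a small tensor that `permute(0,3,1,2)` of a [B, H, W, C] tensor equals transposing [batch, m, n] to [batch, n, m] with m = H*W and n = C. How ORT actually maps the permute onto Transpose3DImpl is an assumption here, and the helper names are hypothetical.

```python
def permute_0312(data, b, h, w, c):
    """Reference permute: out[b, c, h, w] = in[b, h, w, c] on flat row-major lists."""
    out = [0] * len(data)
    for ib in range(b):
        for ih in range(h):
            for iw in range(w):
                for ic in range(c):
                    src = ((ib * h + ih) * w + iw) * c + ic
                    dst = ((ib * c + ic) * h + ih) * w + iw
                    out[dst] = data[src]
    return out


def transpose_3d(data, batch, m, n):
    """[batch, m, n] -> [batch, n, m] on flat row-major lists."""
    out = [0] * len(data)
    for ib in range(batch):
        for im in range(m):
            for jn in range(n):
                out[(ib * n + jn) * m + im] = data[(ib * m + im) * n + jn]
    return out


# Small stand-in for [8,1024,1024,12]: B=2, H=3, W=4, C=5.
small = list(range(2 * 3 * 4 * 5))
same = permute_0312(small, 2, 3, 4, 5) == transpose_3d(small, 2, 3 * 4, 5)
```

Here n = C is small and not a multiple of 16, which is the kind of shape the old version could not handle with the fast path.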
2022-11-24 09:40:48 +08:00
Yi Zhang
87d5703b14
skip TestCUDAProviderOptions in End2EndTest (#13737)
### Description
Skip the test with --filter in runtest.sh

### Motivation and Context
Recently, the Zip-Nuget-Java-Nodejs Packaging Pipeline has always failed
in Nuget_Test_Linux_GPU.
To unblock the packaging workflow, skip the test in Nuget_Test_Linux_GPU
temporarily.
The exception message is below.
```
[xUnit.net 00:07:26.28]     TestCUDAProviderOptions [FAIL]
  Failed TestCUDAProviderOptions [1 m 19 s]
  Error Message:
   Microsoft.ML.OnnxRuntime.OnnxRuntimeException : [ErrorCode:RuntimeException] Non-zero status code returned while running FusedConv node. Name:'' Status Message: /onnxruntime_src/onnxruntime/core/framework/bfc_arena.cc:342 void* onnxruntime::BFCArena::AllocateRawInternal(size_t, bool) Available memory of 11416064 is smaller than requested bytes of 134217728

  Stack Trace:
     at Microsoft.ML.OnnxRuntime.NativeApiStatus.VerifySuccess(IntPtr nativeStatus)
   at Microsoft.ML.OnnxRuntime.InferenceSession.RunImpl(RunOptions options, IntPtr[] inputNames, IntPtr[] inputValues, IntPtr[] outputNames, DisposableList`1 cleanupList)
   at Microsoft.ML.OnnxRuntime.InferenceSession.Run(IReadOnlyCollection`1 inputs, IReadOnlyCollection`1 outputNames, RunOptions options)
   at Microsoft.ML.OnnxRuntime.InferenceSession.Run(IReadOnlyCollection`1 inputs, IReadOnlyCollection`1 outputNames)
   at Microsoft.ML.OnnxRuntime.InferenceSession.Run(IReadOnlyCollection`1 inputs)
   at Microsoft.ML.OnnxRuntime.Tests.CUDATest.TestCUDAProviderOptions() in /mnt/vss/_work/1/s/csharp/test/Microsoft.ML.OnnxRuntime.Tests.NetCoreApp/InferenceTest.netcore.cs:line 93

Failed!  - Failed:     1, Passed:     0, Skipped:     0, Total:     1, Duration: < 1 ms - /mnt/vss/_work/1/s/csharp/test/Microsoft.ML.OnnxRuntime.EndToEndTests/bin/Debug/netcoreapp3.1/Microsoft.ML.OnnxRuntime.EndToEndTests.dll (netcoreapp3.1)
       Done executing task "Microsoft.TestPlatform.Build.Tasks.VSTestTask" -- FAILED.
     1>Done building target "VSTest" in project "Microsoft.ML.OnnxRuntime.EndToEndTests.csproj" -- FAILED.
     1>Done Building Project "/mnt/vss/_work/1/s/csharp/test/Microsoft.ML.OnnxRuntime.EndToEndTests/Microsoft.ML.OnnxRuntime.EndToEndTests.csproj" (VSTest target(s)) -- FAILED.
```
2022-11-23 14:56:04 -08:00
Ye Wang
c1bda4c1cc
fix buffer overuse in addtofeed() (#13733)
### Description



### Motivation and Context
2022-11-23 10:53:53 -08:00
Tianlei Wu
e306b44e98
Improve coverage of fused MHA in Attention (#13732)
Previously, fused attention was applied only to a limited set of sequence
lengths (64, 96, 128, 256, 384, 512). This change expands support to all
sequence lengths <= 384 for V100 and T4, or <= 512 for A100.

Previously, fused attention only worked for batch_size=1. After this
change, fused MHA has no limit on batch_size.

## Accuracy Tests on SQuAD

Using optimized fp16 onnx model of
distilbert-base-cased-distilled-squad, we test the CUDA EP with IO
Binding using eval_squad.py:

disable_fused_attention | batch_size | sequence_length | exact | f1 | samples_per_second | latency_in_ms
-- | -- | -- | -- | -- | -- | --
TRUE | 1 | 384 | 79.6 | 86.8 | 283.5 | 3.5
TRUE | 2 | 384 | 79.6 | 86.8 | 308.3 | 3.2
FALSE | 1 | 384 | 79.6 | 86.8 | 313.2 | 3.2
FALSE | 2 | 384 | 79.6 | 86.8 | 340.9 | 2.9
TRUE | 1 | 300 | 79.3 | 86.6 | 278.5 | 3.6
TRUE | 2 | 300 | 79.4 | 86.6 | 301.8 | 3.3
FALSE | 1 | 300 | 79.4 | 86.6 | 305.8 | 3.3
FALSE | 2 | 300 | 79.4 | 86.6 | 335.9 | 3.0

This shows that the same accuracy is achieved with and without fused
attention.

Note that the latency numbers here are just for reference (eval_squad.py
has not been optimized for speed). Inference is about 10% faster with
fused attention than without it.

Package versions used: onnx 1.12.0, torch 1.13.0, transformers 4.24.0,
optimum 1.5.0, datasets 2.7.0, evaluate 0.3.0

## Performance Test of bert-base-cased on T4 GPU
```
sudo nvidia-smi -rgc
export ORT_DISABLE_FUSED_ATTENTION=0
python benchmark.py -m bert-base-cased -e onnxruntime -g -p fp16 -o by_script -i 3 -t 1000 -b 1 8  -s 8 16 32 64 80 96 120 128 --use_mask_index --overwrite
```

Disable_Fused_Attention | b1_s8 | b1_s16 | b1_s32 | b1_s64 | b1_s80 | b1_s96 | b1_s120 | b1_s128 | b8_s8 | b8_s16 | b8_s32 | b8_s64 | b8_s80 | b8_s96 | b8_s120 | b8_s128
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
FALSE | 1.32 | 1.28 | 1.33 | 1.51 | 1.71 | 1.79 | 1.99 | 2.04 | 1.56 | 1.99 | 2.85 | 4.88 | 6.03 | 7.03 | 9.2 | 9.34
TRUE | 1.37 | 1.34 | 1.44 | 1.68 | 1.89 | 1.99 | 2.15 | 2.21 | 1.63 | 2.31 | 3.19 | 5.48 | 6.98 | 8.14 | 10.54 | 10.66
Latency Reduction | 3.6% | 4.5% | 7.6% | 10.1% | 9.5% | 10.1% | 7.4% | 7.7% | 4.3% | 13.9% | 10.7% | 10.9% | 13.6% | 13.6% | 12.7% | 12.4%

Perf gain is observed in all sequence lengths tested.
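
The "Latency Reduction" row is consistent with the relative difference between the two latency rows. The formula below is inferred from the numbers rather than stated in the PR, so treat it as an assumption; the function name is hypothetical.

```python
def latency_reduction(without_fused, with_fused):
    """Relative latency saved by fused attention, in percent."""
    return (without_fused - with_fused) / without_fused * 100.0


# (Disable_Fused_Attention=TRUE, Disable_Fused_Attention=FALSE) latencies
# copied from a few columns of the table above.
samples = {
    "b1_s8": (1.37, 1.32),
    "b1_s16": (1.34, 1.28),
    "b8_s16": (2.31, 1.99),
    "b8_s128": (10.66, 9.34),
}
reductions = {name: round(latency_reduction(*pair), 1) for name, pair in samples.items()}
```

For these four columns the rounded values match the table: 3.6, 4.5, 13.9 and 12.4 percent.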
2022-11-23 10:19:04 -08:00
Changming Sun
87e6a26c5d
Enforce Prefast check in Windows CPU CI pipeline (#13735)
Right now we fix the warnings in an ad-hoc way: we run static analysis
in nightly builds, then create work items for the findings. Our CI build
pipelines run the same scan but do not break the build. This PR fixes
the remaining findings in the CPU EP (including the training part) and
enforces the check. Later on we can continue to expand the scope.

We still have some warnings left in the JNI part. I will try to address
them next month.
2022-11-23 09:25:02 -08:00
Ted Themistokleous
9168e25738
Patch eval_squad.py script for Python < 3.8 and multiple Execution Providers (#13524)
Need this for benchmarks to function correctly with older containers

This fixes import errors when attempting to run eval_squad.py to
evaluate bert distilled models

Adds a change to the previously merged #12947 which fails when using
Python version < 3.8 to run this script.

Co-authored-by: Ted Themistokleous <tthemist@amd.com>
2022-11-23 15:37:39 +08:00
PeixuanZuo
977da6635b
[ROCm] Remove tuning options on transformerOptions (#13689)
### Description

Remove tuning options on transformerOptions, use IsTunableOpEnabled from
provider in the future.

### Motivation and Context

Co-authored-by: peixuanzuo <peixuanzuo@linmif39a000004.zvflicr54joexhdgnhvmxrxygg.phxx.internal.cloudapp.net>
2022-11-23 15:36:09 +08:00
Yufeng Li
c43ce64795
Beam search TopK improvement (#13594)
### Description

TopK in BeamSearch retrieves top 2*beam next tokens based on logit
score, specifically computing top [batch, 2*beam] tokens based on score
[batch, beam, vocab_size].

### Motivation and Context
The current implementation uses batch as the grid, and each thread block
computes the top 2*beam from [beam, vocab_size]. It is inefficient because:
1. The batch size is usually small (< 32) and cannot fully leverage the
GPU's SMs.
2. vocab_size is usually more than 50k, and it is inefficient to compute
50k * beam in one thread block.

This PR splits the TopK computation into multiple stages:
- For small beam sizes, split [batch, beam, vocab_size] into [batch, beam,
parts_of_vocab, vocab_size_per_part]:
  - 1st stage: each thread block computes the top 2*beam from
vocab_size_per_part and produces [batch, beam, parts_of_vocab, 2*beam].
  - 2nd stage: each thread block computes the top 2*beam from
parts_of_vocab * (2*beam) and produces [batch, beam, 2*beam].
  - Last stage: compute [batch, 2*beam] from [batch, beam, 2*beam].
- For large beam sizes, the 1st stage computes [batch, beam, 2*beam] from
[batch, beam, vocab_size], and the 2nd stage computes [batch, 2*beam] from
[batch, beam, 2*beam].
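
The staged reduction for the small-beam path can be sketched on the CPU for a single batch entry. This is a minimal illustration of the idea, not the CUDA implementation; `staged_topk`, `beam_search_topk` and the `parts` parameter are hypothetical names.

```python
import heapq


def staged_topk(scores, k, parts):
    """Top-k (score, token) pairs of one vocabulary row, computed in two stages."""
    vocab_size = len(scores)
    per_part = (vocab_size + parts - 1) // parts  # vocab_size_per_part
    # 1st stage: every vocabulary chunk keeps its own k best candidates.
    candidates = []
    for p in range(parts):
        lo, hi = p * per_part, min((p + 1) * per_part, vocab_size)
        chunk = [(scores[t], t) for t in range(lo, hi)]
        candidates.extend(heapq.nlargest(k, chunk))
    # 2nd stage: reduce parts * k candidates to the final k.
    return heapq.nlargest(k, candidates)


def beam_search_topk(beam_scores, parts):
    """[beam][vocab_size] scores -> top 2*beam (score, beam, token) triples."""
    k = 2 * len(beam_scores)
    merged = []
    for b, row in enumerate(beam_scores):
        merged.extend((s, b, t) for s, t in staged_topk(row, k, parts))
    # Last stage: reduce beam * 2*beam candidates to 2*beam.
    return heapq.nlargest(k, merged)
```

Each stage only ever keeps 2*beam candidates per group, which is why the result matches a single top-k over the whole [beam, vocab_size] block while exposing much more parallel work.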

With the change, performance improves a lot, it reduces ~100us from 2ms
for batch:4, beam:4, vocab_size:~50k.
2022-11-22 21:24:27 -08:00
apsonawane
7857f59d2b
Use sequences to create initial feeds for decoder subgraph (#13719)
Use sequences to create initial feeds for decoder subgraph instead of
beam_next_tokens

### Description
For TuLG models, the decoder is exported differently from the BART model.
Passing beam_next_tokens to the decoder during ORT inference generated
results that differed from PyTorch inference.
This change will use sequences as inputs for the first iteration as well


### Motivation and Context
PyTorch and ORT inference results for TuLG models did not match. Treating
the PyTorch result as correct, we modified ORT to match it.
2022-11-22 18:00:58 -08:00
Baiju Meswani
fb85b31fac
Remove protobuf pin from training requirements (#13695) 2022-11-22 12:27:18 -08:00
Yulong Wang
2bebe6189a
set node schema when apply NHWC transformer (#13660)
### Description
set node schema when apply NHWC transformer

### Motivation and Context
The implementation in `IExecutionProvider::GetCapability()` checks the
node schema to determine the capability of the current EP. When the NHWC
graph transformer creates a new channels-last `Conv` node to replace a
channels-first `Conv` node, the schema must be assigned to the new
node.
2022-11-22 12:26:52 -08:00
Patrice Vignola
ce460f9cdb
[DML EP] Return device removal reason when D3D12 device gets removed (#13727)
### Description
Before this change, when the D3D12 device was getting removed, we were
returning a generic device removed error, which can be harder to
investigate.



### Motivation and Context
It makes it easier to debug and investigate device removal failures.
2022-11-22 10:38:56 -08:00
Patrice Vignola
6c5333e1a7
[DML EP] Enable more DML tests (#13726)
### Description
Enables more DML tests.



### Motivation and Context
It increases test coverage that was missing for the DML EP
2022-11-22 10:35:16 -08:00
Adam Pocock
dd2c031d95
[java] Sparse tensor support (#10653)
**Description**:

Adds support for creating and receiving sparse tensors in the ORT Java
API.

CSRC and COO tensors as inputs are tested, but there is no op which
accepts a block sparse tensor to test. COO tensors are tested as
outputs, but there is no op which emits a CSRC or block sparse tensor to
test.

**Motivation and Context**
- This change addresses a request to expose ORT sparse tensor support in
Java.

cc @yuslepukhin
2022-11-22 10:29:24 -08:00
Tianlei Wu
8b0e0f4927
Add RemovePadding and RestorePadding for BERT model (#13701)
Add two operators, RemovePadding and RestorePadding, based on the idea of
Effective Transformer (https://github.com/bytedance/effective_transformer), to improve large
batch size inference for the BERT model.
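
The idea can be sketched as follows: pack the non-padding tokens of a batch into one dense list before the expensive layers, then scatter them back afterwards. The commit does not describe the operators' actual inputs and outputs, so the signatures below are hypothetical and only illustrate the concept.

```python
def remove_padding(batch, mask):
    """Pack the non-padding tokens of a [batch, seq] input into one flat list.

    Returns the packed tokens and the flat offsets they were taken from.
    """
    seq_len = len(batch[0])
    packed, offsets = [], []
    for b, (row, row_mask) in enumerate(zip(batch, mask)):
        for s, (token, keep) in enumerate(zip(row, row_mask)):
            if keep:
                packed.append(token)
                offsets.append(b * seq_len + s)
    return packed, offsets


def restore_padding(packed, offsets, batch_size, seq_len, pad=0):
    """Scatter packed tokens back into a padded [batch, seq] layout."""
    flat = [pad] * (batch_size * seq_len)
    for token, offset in zip(packed, offsets):
        flat[offset] = token
    return [flat[b * seq_len:(b + 1) * seq_len] for b in range(batch_size)]


batch = [[5, 6, 0, 0], [7, 8, 9, 0]]
mask = [[1, 1, 0, 0], [1, 1, 1, 0]]
packed, offsets = remove_padding(batch, mask)
restored = restore_padding(packed, offsets, batch_size=2, seq_len=4)
```

In this toy batch only 5 of the 8 positions carry real tokens, so work done on the packed form touches 5 elements instead of 8.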
2022-11-22 10:00:23 -08:00
guyang3532
ba9a585fcc
Fix the tensor save for backward release problem (#13679)
Motivation:
PythonOp saves its inputs for backward. This is risky because the ONNX
Runtime backend is not aware of it: the tensor buffer may be "released"
by ORT and then potentially modified by other operators before the
backward function executes.

Fix:
This PR clones all inputs of PythonOp before forward is invoked. This
may add high overhead; it is only a workaround until a better fix is
available.
2022-11-22 17:32:19 +08:00
pengwa
947aab0ae0
Make HF converge with lighting native amp (#13616)
### Fix training convergence issues 

#### Problem:

Huggingface Transformers: 4.22.0
PyTorch Lightning: 1.6.3 
PyTorch: v1.12.1, cuda 11.6
ORT: main branch, cuda 11.6

Model: RobertaForSequenceClassification @
models/roberta/modeling_roberta.py
Mixed Precision training with `torch.autocast`:
a64e1dfd7d/pytorch_lightning/plugins/precision/native_amp.py (L99)

Under this amp autocast context, forward + loss computation run. Here is
a snippet of loss computation.

```
        if labels is not None:
            ...
            if self.config.problem_type == "regression":
                loss_fct = MSELoss()
                if self.num_labels == 1:
                    ...
            elif self.config.problem_type == "single_label_classification":
                loss_fct = CrossEntropyLoss()
                # The line of interest: the loss is computed here, under autocast.
                loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
            elif self.config.problem_type == "multi_label_classification":
                ...

        return SequenceClassifierOutput(
            loss=loss,
            logits=logits,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
        )
```

After the forward run, the loss is 1.0850 in float16, which looks good.
It is then scaled up here:
a64e1dfd7d/pytorch_lightning/plugins/precision/native_amp.py (L62),
where the scaler is 65536. This gives a scaled loss of 71104 in float32
(a float16 loss multiplied by an fp32 scaler is promoted to fp32).
Backward then starts with an initial gradient of 1; the first backward
step computes 1 (float32) * 65536 (float32) and generates a float16
gradient, which is `inf`. This is where the problem occurs: backward
feeds the `inf` into the crossentropygradient op, which generates `nan`s,
and all gradients then become `nan` during back propagation.
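
The numbers above can be checked with the standard library alone. This sketch uses Python's `struct` half-precision format as a stand-in for fp16 tensors; unlike a tensor cast, which would produce `inf`, `struct` raises an error when a value does not fit, and the helper names are hypothetical.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fits_in_fp16(x):
    """True if x can be stored as a finite half-precision value."""
    try:
        struct.pack("<e", x)
        return True
    except (OverflowError, struct.error):
        return False


loss_fp16 = to_fp16(1.0850)       # the fp16 loss after forward
scale = 65536.0                   # the fp32 loss scaler
scaled_loss = loss_fp16 * scale   # promoted to higher precision, as in fp32
grad_fits = fits_in_fp16(1.0 * scale)  # the first backward gradient, cast to fp16
```

`scaled_loss` comes out as 71104.0, matching the value above, and `grad_fits` is false because the largest finite fp16 value is 65504.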

As a result, training with ORTModule almost always reports `overflow`,
and the loss does not drop as much as it does with PyTorch.

#### Analysis for the UT (when autocast enabled)

PyTorch trace graph looks like this :

```
graph(%0 : Float(16, 3, strides=[3, 1], requires_grad=0, device=cuda:0),
      %target : Long(16, strides=[1], requires_grad=0, device=cuda:0),
      %2 : Float(3, 3, strides=[3, 1], requires_grad=1, device=cuda:0)):
  %9 : int = prim::Constant[value=5]() # /opt/conda/envs/ptca/lib/python3.8/site-packages/torch/nn/modules/linear.py:114:0
  %10 : bool = prim::Constant[value=0]() # /opt/conda/envs/ptca/lib/python3.8/site-packages/torch/nn/modules/linear.py:114:0
  %11 : bool = prim::Constant[value=0]() # /opt/conda/envs/ptca/lib/python3.8/site-packages/torch/nn/modules/linear.py:114:0
  %12 : NoneType = prim::Constant()
  %13 : Half(3, 3, strides=[3, 1], requires_grad=0, device=cuda:0) = aten::to(%2, %9, %10, %11, %12) # /opt/conda/envs/ptca/lib/python3.8/site-packages/torch/nn/modules/linear.py:114:0
  %14 : int = prim::Constant[value=5]() # /opt/conda/envs/ptca/lib/python3.8/site-packages/torch/nn/modules/linear.py:114:0
  %15 : bool = prim::Constant[value=0]() # /opt/conda/envs/ptca/lib/python3.8/site-packages/torch/nn/modules/linear.py:114:0
  %16 : bool = prim::Constant[value=0]() # /opt/conda/envs/ptca/lib/python3.8/site-packages/torch/nn/modules/linear.py:114:0
  %17 : NoneType = prim::Constant()
  %18 : Half(16, 3, strides=[3, 1], requires_grad=0, device=cuda:0) = aten::to(%0, %14, %15, %16, %17) # /opt/conda/envs/ptca/lib/python3.8/site-packages/torch/nn/modules/linear.py:114:0
  %19 : NoneType = prim::Constant()
  %input : Half(16, 3, strides=[3, 1], requires_grad=0, device=cuda:0) = aten::linear(%18, %13, %19) # /opt/conda/envs/ptca/lib/python3.8/site-packages/torch/nn/modules/linear.py:114:0
  %21 : NoneType = prim::Constant()
  %22 : int = prim::Constant[value=1]() # /opt/conda/envs/ptca/lib/python3.8/site-packages/torch/nn/functional.py:3,014:0
  %23 : int = prim::Constant[value=-100]() # /opt/conda/envs/ptca/lib/python3.8/site-packages/torch/nn/functional.py:3,014:0
  %24 : float = prim::Constant[value=0.]() # /opt/conda/envs/ptca/lib/python3.8/site-packages/torch/nn/functional.py:3,014:0
  %data : Float(requires_grad=0, device=cuda:0) = aten::cross_entropy_loss(%input, %target, %21, %22, %23, %24) # /opt/conda/envs/ptca/lib/python3.8/site-packages/torch/nn/functional.py:3,014:0
  %27 : Float(requires_grad=0, device=cuda:0) = ^_OutputIdentityOp()(%data) # /opt/conda/envs/ptca/lib/python3.8/site-packages/onnxruntime/training/ortmodule/_io.py:430:0
  return (%27)
```

The most important lines:

%target : Long(16, strides=[1], requires_grad=0, device=cuda:0),
%input : **_Half_**(16, 3, strides=[3, 1], requires_grad=0,
device=cuda:0) = aten::linear(%18, %13, %19) #
/opt/conda/envs/ptca/lib/python3.8/site-packages/torch/nn/modules/linear.py:114:0
%data : **_Float_**(requires_grad=0, device=cuda:0) =
aten::cross_entropy_loss(**%_input_**, %target, %21, %22, %23, %24) #
/opt/conda/envs/ptca/lib/python3.8/site-packages/torch/nn/functional.py:3,014:0


`aten::cross_entropy_loss` takes a Half input and returns a Float output.
As stated in the doc
https://pytorch.org/docs/stable/amp.html#cuda-ops-that-can-autocast-to-float32,
`cross_entropy` in autocast mode runs in fp32, i.e. it converts its
input to fp32 (if it is not already), does the compute, and returns an
fp32 result. On the other hand, ORT's `SoftmaxCrossEntropyLossInternal`
takes the same type for input and output, and our code
31cb3cb254/orttraining/orttraining/python/training/ortmodule/_custom_op_symbolic_registry.py (L68)
assumed this when exporting `aten::cross_entropy_loss` and set the
output to fp16 as well. This is the cause of the problem.

#### Possible Fixes
1. Enhance `SoftmaxCrossEntropyLossInternal` to support different types
for input and output.
2. Check the input and output types when exporting, and add the input
cast explicitly if there is type promotion from input to output.

This PR uses the 2nd approach. The 1st approach can be started later if
needed.

TODO: revisit all other exporter functions, add the checks, etc.


### Motivation and Context
2022-11-22 15:08:30 +08:00
Changming Sun
67e46a873a
Add '-DCMAKE_OSX_ARCHITECTURES=x86_64;arm64' when build protobuf from source on MacOS (#13720)
### Description
Add '-DCMAKE_OSX_ARCHITECTURES=x86_64;arm64' when building protobuf from
source on MacOS. Later on we link the built library with the other parts
of onnxruntime to generate libonnxruntime.dylib, and if the target CPU
ARCH of libonnxruntime.dylib is not x86_64, the link will fail.

### Motivation and Context
To fix a packaging pipeline failure, which was introduced from #13694
2022-11-21 21:59:34 -08:00
PeixuanZuo
8f3c6ea0df
[ROCm] Add GemmFastGelu TunableOp (#13589)
### Description

1. Update the rules for GemmFastGelu fusion: the MatMul input x must have
at least two dimensions, and the input weight must have exactly two.
2. Add GemmFastGelu fusion test.
3. Add GemmFastGelu TunableOp, which contains only the original
implementation (Gemm + FastGelu).


### Motivation and Context

Co-authored-by: peixuanzuo <peixuanzuo@linmif39a000004.zvflicr54joexhdgnhvmxrxygg.phxx.internal.cloudapp.net>
2022-11-22 12:58:01 +08:00
PeixuanZuo
45a895cdc3
[ROCm] Fix static TunableOp (#13668)
### Description

1. Re-add StaticSelectionOp for FastGelu.
2. Call TunableOp when tuning is enabled. Call StaticSelectionOp when
tuning is disabled.


### Motivation and Context

Co-authored-by: peixuanzuo <peixuanzuo@linmif39a000004.zvflicr54joexhdgnhvmxrxygg.phxx.internal.cloudapp.net>
2022-11-22 10:51:54 +08:00
Yulong Wang
f1b5e4f1c9
[js] [deps] upgrade @xmldom/xmldom@0.7.9 (#13705)
### Description
upgrade @xmldom/xmldom@0.7.9



### Motivation and Context
```
yarn audit
yarn audit v1.22.19
┌───────────────┬──────────────────────────────────────────────────────────────┐
│ critical      │ xmldom allows multiple root nodes in a DOM                   │
├───────────────┼──────────────────────────────────────────────────────────────┤
│ Package       │ @xmldom/xmldom                                               │
├───────────────┼──────────────────────────────────────────────────────────────┤
│ Patched in    │ >=0.7.7                                                      │
├───────────────┼──────────────────────────────────────────────────────────────┤
│ Dependency of │ @expo/config-plugins                                         │
├───────────────┼──────────────────────────────────────────────────────────────┤
│ Path          │ @expo/config-plugins > @expo/plist > @xmldom/xmldom          │
├───────────────┼──────────────────────────────────────────────────────────────┤
│ More info     │ https://www.npmjs.com/advisories/1084900                     │
└───────────────┴──────────────────────────────────────────────────────────────┘
1 vulnerabilities found - Packages audited: 952
Severity: 1 Critical
Done in 3.51s.
```
2022-11-21 17:01:42 -08:00
Seungwon Jeong
307ad1413a
[js/web] support 'pytorch_half_pixel' mode for WebGL kernel 'Resize' (#11208)
**Description**: 
1. add pytorch_half_pixel interpolation mode in resize-packed.ts
Changes: add the following case in createPackedResizeProgramInfo
function:
```
case 'pytorch_half_pixel':
          getSourceFracIndex = `
                    vec4 getSourceFracIndex(ivec4 coords) {
                        vec4 fcoords = vec4(coords);
                        return vec4(
                            ${outputWidth}.0 > 1.0 ? (fcoords.x + 0.5) / scaleWHWH.x - 0.5 : 0.0,
                            ${outputHeight}.0 > 1.0 ? (fcoords.y + 0.5) / scaleWHWH.y - 0.5 : 0.0,
                            ${outputWidth}.0 > 1.0 ? (fcoords.z + 0.5) / scaleWHWH.z - 0.5 : 0.0,
                            ${outputHeight}.0 > 1.0 ? (fcoords.w + 0.5) / scaleWHWH.w - 0.5 : 0.0
                          );
                    }
                `;
          break;
```
2. fix "unrecognized input '' for node: Resize_$num" error when inputs
like [input_tensor, None, scale_factor] (roiInput not given) are fed
into the resize layer.
Changes: change in input handling logic in upsample.ts & node scanning
logic in graph.ts
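
For reference, the coordinate mapping that the `pytorch_half_pixel` shader case in item 1 applies to each component can be written as a scalar function. This is a minimal sketch; `scale` is assumed to be the output-to-input size ratio, as the name `scaleWHWH` suggests, and the function name is hypothetical.

```python
def pytorch_half_pixel(out_coord, out_length, scale):
    """Map an output coordinate to a fractional source coordinate.

    Mirrors one component of the shader: when the output has a single
    element along the axis, the source coordinate is pinned to 0.
    """
    if out_length > 1:
        return (out_coord + 0.5) / scale - 0.5
    return 0.0


# Upsampling a length-2 axis to length 4 (scale = 2.0).
source_coords = [pytorch_half_pixel(x, 4, 2.0) for x in range(4)]
```

With these inputs `source_coords` is `[-0.25, 0.25, 0.75, 1.25]`, and any axis with `out_length == 1` maps to `0.0` regardless of scale.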

**Motivation and Context**
Before this fix, we were not able to use the WebGL backend when the neural
network contained PyTorch resize layers. This fix adds
'pytorch_half_pixel' interpolation mode support and makes it possible to
use the WebGL backend for more kinds of computer vision networks.

This commit solves:
#10430

Co-authored-by: neo <neo@icode-lab.com>
Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
2022-11-21 12:03:48 -08:00
shalvamist
3119381011
ORT Web build script (#12643)
**Description**:
Adds a few scripts that let users build ORT Web in a simpler way.

**Instructions**:
The ROOT\js folder contains 2 scripts:
1. "Build_web.bat" - for Windows users
2. "Build_web.sh" - for Linux users

The default build configuration is "Release". To change the build configuration, add the flag "--config <Desired configuration>" to the script call. For example:
```
build_web.bat --config Debug
```

Co-authored-by: shalvamist <shalva.mist@microsoft.com>
2022-11-21 11:08:39 -08:00
Changming Sun
a5c2047dd1
Fix the remaining Prefast warnings in CPU EP (#13707)
### Description

Fix the remaining Prefast warnings in CPU EP.
2022-11-21 10:21:38 -08:00
cloudhan
8de5381e84
Add IsSupported support to Op functor (#13692)
Sometimes it is a bit risky to call the Op directly to check whether the
impl supports consuming the param. This gives the user a way to actually
implement `IsSupported` for checking in a non-compact way.
2022-11-21 19:22:00 +08:00
shalvamist
4a2a857030
Bug Fix - WASM build break (#13699)
### Description
Using the build flag "--cmake_extra_defines
onnxruntime_DEBUG_NODE_INPUTS_OUTPUTS=1" with WASM results in a
build break, because we compare a const and a non-const T type. The
added cast resolves the issue.
2022-11-20 23:30:31 -08:00
PeixuanZuo
da2bd3ad4d
[ROCm] Build ROCm CI with Release config and enable kernel explorer test (#13687)
### Description
1. Build ROCm CI with Release config to save time.
2. use 32 threads to build, we have 256 threads on new CI machine.
3. enable ROCm kernel explorer test.


### Motivation and Context

Co-authored-by: peixuanzuo <peixuanzuo@linmif39a000004.zvflicr54joexhdgnhvmxrxygg.phxx.internal.cloudapp.net>
2022-11-21 10:04:10 +08:00
dependabot[bot]
8472876155
Bump socket.io-parser from 4.0.4 to 4.0.5 in /js/web (#13608)
Bumps [socket.io-parser](https://github.com/socketio/socket.io-parser)
from 4.0.4 to 4.0.5.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/socketio/socket.io-parser/releases">socket.io-parser's
releases</a>.</em></p>
<blockquote>
<h2>4.0.5</h2>
<h3>Bug Fixes</h3>
<ul>
<li>check the format of the index of each attachment (<a
href="b559f050ee">b559f05</a>)</li>
</ul>
<h4>Links</h4>
<ul>
<li>Diff: <a
href="https://github.com/socketio/socket.io-parser/compare/4.0.4...4.0.5">https://github.com/socketio/socket.io-parser/compare/4.0.4...4.0.5</a></li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/socketio/socket.io-parser/blob/main/CHANGELOG.md">socket.io-parser's
changelog</a>.</em></p>
<blockquote>
<h2><a
href="https://github.com/socketio/socket.io-parser/compare/4.0.4...4.0.5">4.0.5</a>
(2022-06-27)</h2>
<h3>Bug Fixes</h3>
<ul>
<li>check the format of the index of each attachment (<a
href="b559f050ee">b559f05</a>)</li>
</ul>
<h1><a
href="https://github.com/socketio/socket.io-parser/compare/4.1.2...4.2.0">4.2.0</a>
(2022-04-17)</h1>
<h3>Features</h3>
<ul>
<li>allow the usage of custom replacer and reviver (<a
href="https://github-redirect.dependabot.com/socketio/socket.io-parser/issues/112">#112</a>)
(<a
href="b08bc1a93e">b08bc1a</a>)</li>
</ul>
<h2><a
href="https://github.com/socketio/socket.io-parser/compare/4.1.1...4.1.2">4.1.2</a>
(2022-02-17)</h2>
<h3>Bug Fixes</h3>
<ul>
<li>allow objects with a null prototype in binary packets (<a
href="https://github-redirect.dependabot.com/socketio/socket.io-parser/issues/114">#114</a>)
(<a
href="7f6b262ac8">7f6b262</a>)</li>
</ul>
<h2><a
href="https://github.com/socketio/socket.io-parser/compare/4.1.0...4.1.1">4.1.1</a>
(2021-10-14)</h2>
<h1><a
href="https://github.com/socketio/socket.io-parser/compare/4.0.4...4.1.0">4.1.0</a>
(2021-10-11)</h1>
<h3>Features</h3>
<ul>
<li>provide an ESM build with and without debug (<a
href="388c616a92">388c616</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="f3329eb5a4"><code>f3329eb</code></a>
chore(release): 4.0.5</li>
<li><a
href="b559f050ee"><code>b559f05</code></a>
fix: check the format of the index of each attachment</li>
<li>See full diff in <a
href="https://github.com/socketio/socket.io-parser/compare/4.0.4...4.0.5">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=socket.io-parser&package-manager=npm_and_yarn&previous-version=4.0.4&new-version=4.0.5)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
- `@dependabot use these labels` will set the current labels as the
default for future PRs for this repo and language
- `@dependabot use these reviewers` will set the current reviewers as
the default for future PRs for this repo and language
- `@dependabot use these assignees` will set the current assignees as
the default for future PRs for this repo and language
- `@dependabot use this milestone` will set the current milestone as the
default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/microsoft/onnxruntime/network/alerts).

</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-11-19 12:55:21 -08:00
Nat Kershaw (MSFT)
43a7b520e4
Convert label config to one line regexes (#13702) 2022-11-19 11:38:29 -08:00
Yulong Wang
2d732e9729
[js] [deps] upgrade minimatch@3.1.2 (#13703)
### Description
upgrade minimatch@3.1.2



### Motivation and Context
```
# npm audit report

minimatch  <3.0.5
Severity: high
minimatch ReDoS vulnerability - https://github.com/advisories/GHSA-f8q6-p94x-37v3
```
2022-11-18 22:27:57 -08:00
Hariharan Seshadri
c7329e004d
Improve fp16 performance of GPT-2's logits MatMul while using BeamSearch (#13686) 2022-11-18 18:50:19 -08:00
dependabot[bot]
c358d64b0e
Bump loader-utils from 2.0.0 to 2.0.4 in /js/web (#13666)
Bumps [loader-utils](https://github.com/webpack/loader-utils) from 2.0.0
to 2.0.4.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/webpack/loader-utils/releases">loader-utils's
releases</a>.</em></p>
<blockquote>
<h2>v2.0.4</h2>
<h3><a
href="https://github.com/webpack/loader-utils/compare/v2.0.3...v2.0.4">2.0.4</a>
(2022-11-11)</h3>
<h3>Bug Fixes</h3>
<ul>
<li>ReDoS problem (<a
href="https://github-redirect.dependabot.com/webpack/loader-utils/issues/225">#225</a>)
(<a
href="ac09944dfa">ac09944</a>)</li>
</ul>
<h2>v2.0.3</h2>
<h3><a
href="https://github.com/webpack/loader-utils/compare/v2.0.1...v2.0.3">2.0.3</a>
(2022-10-20)</h3>
<h3>Bug Fixes</h3>
<ul>
<li><strong>security:</strong> prototype pollution exploit (<a
href="https://github-redirect.dependabot.com/webpack/loader-utils/issues/217">#217</a>)
(<a
href="a93cf6f470">a93cf6f</a>)</li>
</ul>
<h2>v2.0.2</h2>
<h3><a
href="https://github.com/webpack/loader-utils/compare/v2.0.1...v2.0.2">2.0.2</a>
(2021-11-04)</h3>
<h3>Bug Fixes</h3>
<ul>
<li>base64 generation and unicode characters (<a
href="https://github-redirect.dependabot.com/webpack/loader-utils/issues/197">#197</a>)
(<a
href="8c2d24ee40">8c2d24e</a>)</li>
</ul>
<h2>v2.0.1</h2>
<h3><a
href="https://github.com/webpack/loader-utils/compare/v2.0.0...v2.0.1">2.0.1</a>
(2021-10-29)</h3>
<h3>Bug Fixes</h3>
<ul>
<li>md4 support on Node.js v17 (<a
href="https://github-redirect.dependabot.com/webpack/loader-utils/issues/193">#193</a>)
(<a
href="1069f61284">1069f61</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/webpack/loader-utils/blob/v2.0.4/CHANGELOG.md">loader-utils's
changelog</a>.</em></p>
<blockquote>
<h3><a
href="https://github.com/webpack/loader-utils/compare/v2.0.3...v2.0.4">2.0.4</a>
(2022-11-11)</h3>
<h3>Bug Fixes</h3>
<ul>
<li>ReDoS problem (<a
href="https://github-redirect.dependabot.com/webpack/loader-utils/issues/225">#225</a>)
(<a
href="ac09944dfa">ac09944</a>)</li>
</ul>
<h3><a
href="https://github.com/webpack/loader-utils/compare/v2.0.1...v2.0.3">2.0.3</a>
(2022-10-20)</h3>
<h3>Bug Fixes</h3>
<ul>
<li><strong>security:</strong> prototype pollution exploit (<a
href="https://github-redirect.dependabot.com/webpack/loader-utils/issues/217">#217</a>)
(<a
href="a93cf6f470">a93cf6f</a>)</li>
</ul>
<h3><a
href="https://github.com/webpack/loader-utils/compare/v2.0.1...v2.0.2">2.0.2</a>
(2021-11-04)</h3>
<h3>Bug Fixes</h3>
<ul>
<li>base64 generation and unicode characters (<a
href="https://github-redirect.dependabot.com/webpack/loader-utils/issues/197">#197</a>)
(<a
href="8c2d24ee40">8c2d24e</a>)</li>
</ul>
<h3><a
href="https://github.com/webpack/loader-utils/compare/v2.0.0...v2.0.1">2.0.1</a>
(2021-10-29)</h3>
<h3>Bug Fixes</h3>
<ul>
<li>md4 support on Node.js v17 (<a
href="https://github-redirect.dependabot.com/webpack/loader-utils/issues/193">#193</a>)
(<a
href="1069f61284">1069f61</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="6688b50281"><code>6688b50</code></a>
chore(release): 2.0.4</li>
<li><a
href="ac09944dfa"><code>ac09944</code></a>
fix: ReDoS problem (<a
href="https://github-redirect.dependabot.com/webpack/loader-utils/issues/225">#225</a>)</li>
<li><a
href="7162619fb9"><code>7162619</code></a>
chore(release): 2.0.3</li>
<li><a
href="a93cf6f470"><code>a93cf6f</code></a>
fix(security): prototype polution exploit (<a
href="https://github-redirect.dependabot.com/webpack/loader-utils/issues/217">#217</a>)</li>
<li><a
href="90c7c4be17"><code>90c7c4b</code></a>
chore(release): 2.0.2</li>
<li><a
href="8c2d24ee40"><code>8c2d24e</code></a>
fix: base64 generation and unicode characters (<a
href="https://github-redirect.dependabot.com/webpack/loader-utils/issues/197">#197</a>)</li>
<li><a
href="5fb5562084"><code>5fb5562</code></a>
chore(release): 2.0.1</li>
<li><a
href="1069f61284"><code>1069f61</code></a>
fix: md4 support on Node.js v17 (<a
href="https://github-redirect.dependabot.com/webpack/loader-utils/issues/193">#193</a>)</li>
<li>See full diff in <a
href="https://github.com/webpack/loader-utils/compare/v2.0.0...v2.0.4">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=loader-utils&package-manager=npm_and_yarn&previous-version=2.0.0&new-version=2.0.4)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-11-18 18:01:25 -08:00
Edward Chen
4901987d1d
Remove SafeInt dependency from Objective-C API. (#13698) 2022-11-18 17:06:12 -08:00
Changming Sun
3e9e5e9d6d
Patch Protobuf and ONNX's cmake files and enforce BinSkim check (#13694)
Patch Protobuf and ONNX's cmake files and enforce BinSkim check.

This PR overlaps with #13523. I would prefer to get this one merged
first so that we can finish the BinSkim work, and I have tried to keep
this PR as small as possible.
2022-11-18 10:09:47 -08:00
Wei-Sheng Chin
6160ba0692
Fix aten::_to_copy in DORT (#13682)
`aten::_to_copy` is not exportable to ONNX, so in DORT it's replaced in
`_replace_to_copy_with_to`. This replacement logic became incorrect with the latest PyTorch
commit, and this PR is a fix.

Basically, we examine more keyword attributes passed to
`aten::_to_copy`, and if they lead to a type-casting operator (i.e., one
mapped to ONNX's Cast), we replace that `aten::_to_copy` with
`aten::to`. Unsupported attributes are removed (with a low risk of
breaking the FX graph's assumptions).
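
The rewrite rule described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: the function name `rewrite_to_copy`, the `CAST_KEYS` tuple, and the use of plain strings and dicts in place of real FX nodes are all assumptions made to keep the example self-contained, and it assumes `dtype` is the only attribute that maps to ONNX's Cast.

```python
# Hypothetical sketch: none of these names come from the DORT source.
from typing import Any, Dict, Tuple

# Assumption: only `dtype` maps to ONNX's Cast; every other keyword
# attribute is treated as unsupported and dropped.
CAST_KEYS = ("dtype",)


def rewrite_to_copy(target: str, kwargs: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Return the (target, kwargs) pair a node would carry after the rewrite."""
    if target != "aten::_to_copy":
        # Nodes other than aten::_to_copy are left untouched.
        return target, dict(kwargs)
    kept = {k: v for k, v in kwargs.items() if k in CAST_KEYS and v is not None}
    if not kept:
        # No type cast is requested, so this sketch leaves the node alone.
        return target, dict(kwargs)
    # A cast is requested: switch to aten::to and keep only supported attributes.
    return "aten::to", kept
```

In this sketch, a call such as `rewrite_to_copy("aten::_to_copy", {"dtype": "float16", "layout": "strided"})` keeps only the `dtype` attribute and switches the target to `aten::to`.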
2022-11-18 09:31:18 -08:00
Vincent Wang
07812a2fa6
Fix UT Failure on AMD for ORTModule's Conv Test (#13688)
Currently the provider option `conv_algo_search` is CUDA-only, so remove
the check for the ROCm EP.
2022-11-18 17:52:22 +08:00
Changming Sun
7a57976d1a
Make natvis files work better (#13665)
### Description
After this change, you will see that the GSL.natvis and wil.natvis files
are added to every onnxruntime_xxx project.

Like this:

![image](https://user-images.githubusercontent.com/856316/202081013-314145a8-7a0f-4f45-bf85-f9ed0e247c63.png)

This is because in onnxruntime_common.cmake we have:

```cmake
if (MSVC)
  set(ABSEIL_NATVIS_FILE "abseil-cpp.natvis")
  target_sources(
      onnxruntime_common
      INTERFACE $<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}/external/${ABSEIL_NATVIS_FILE}>)
endif()
```
It sets a property, INTERFACE_SOURCES, on the target
"onnxruntime_common".

Then if anyone else uses:
```cmake
target_link_libraries(mytarget PRIVATE onnxruntime_common)
```
The natvis file will be added to `mytarget`.

However, in this project we don't use this mechanism for targets that
are static libraries. For example, onnxruntime_graph is a static
library.

Instead, we use the `onnxruntime_add_include_to_target` function to
explicitly control what we want to propagate. The function was written
before we started to have natvis files, so it doesn't pass a source
file from one static library to another. Now we need that, probably
only on Windows.

### Motivation and Context

Add natvis files to every project.
2022-11-17 19:13:40 -08:00
Ye Wang
38a74af45d
Support position_ids broadcasting in EmbedLayerNorm (#13677)
### Description


### Motivation and Context

fix https://github.com/microsoft/onnxruntime/issues/13508
2022-11-17 17:56:27 -08:00