10380 commits

Author SHA1 Message Date
moyo1997
c935c8fbd2
remove unnecessary environment variable (#19166)
remove unnecessary environment variable when building as arm64x
2024-01-16 16:24:37 -08:00
Jian Chen
8e272b9cac
Update build.py to remove unused functions and update python to 3.8 (#19164)
2024-01-16 13:53:15 -08:00
Patrice Vignola
80f274ca6f
Fix SkipLayerNormalization shape inference (#18724)
SkipLayerNorm has more than one input, so `propagateShapeAndTypeFromFirstInput` is not enough.
2024-01-16 09:42:59 -08:00
Changming Sun
e2e488d6f8
Revert "iOS packaging pipeline stability" (#19135)
Reverts microsoft/onnxruntime#19097 because it broke the Android CI
pipeline.
2024-01-16 09:18:35 -08:00
Jian Chen
c92f72ebeb
Merge Linux Nuget GPU pipeline with zip-nuget (#19120)
2024-01-16 08:59:03 -08:00
Jeff Bloomfield
8d4369b77e
Update DirectML nuget version to 1.13.1 (#19122)
### Description
Update DML version to 1.13.1



2024-01-15 19:04:41 -08:00
Wanming Lin
1bab98988b
[WebNN EP] Fixed bug in int8 data type processing (#19134)
2024-01-15 18:44:25 -08:00
Guenther Schmuelling
9dee543bed
fix gemm beta for fp16 (#19153)
per onnx spec beta is always fp32 so we need to cast it
2024-01-15 18:40:38 -08:00
Jeff Bloomfield
9f87c5c41d
Fix build error due to merge with DML adapter enumeration macro defined (#19121)
### Description
Fix build error when ENABLE_NPU_ADAPTER_ENUMERATION is defined


2024-01-15 17:10:58 -08:00
pengwa
1150b1f81e
ORTModule memory improvement (#18924)
## Dependency

https://github.com/microsoft/onnxruntime/pull/19007

## ORTModule memory efficient gradient management

Previously I tried to solve the coarse-grained gradient
accumulation/update problem in ORTModule with
https://github.com/microsoft/onnxruntime/pull/8979, but that
solution was not fully validated with DDP or with user hooks on
gradient accumulation for torch parameters.

This PR addresses the problem with an approach similar to PR 8979,
i.e. it triggers gradient accumulation as soon as ORT has computed the
grad. Instead of using an AccumulateGrad op, this time it uses an ONNX
operator, PythonOp, which internally calls param.backward(grad) and so
handles all related hooks correctly.


## Design

Check the details from


https://microsoftapc-my.sharepoint.com/:p:/g/personal/pengwa_microsoft_com/EaaBq4EzsFhOmsDEXCG7Ba4Bb9bwd0O2sFV_JXJ4jBLYLA?e=7Sz2g8&nav=eyJzSWQiOjI3MSwiY0lkIjozMjE4NzI1NDIzfQ

## Convergence Validation:


![image](https://github.com/microsoft/onnxruntime/assets/10530022/ccf3a213-e815-4b23-b759-165033b2d9fe)

Differences are mostly on the order of 0.000x and sometimes 0.00x, which
may come from the different order in which gradients are applied before
and after this change (on DeepSpeed ZeRO stage 2).


## TODO

Consolidate the logic with Stage3's similar logic.
2024-01-16 08:57:37 +08:00
Adam Pocock
191525301f
[java] Updating TensorInfo so it contains the named dimensions (#18962)
### Description
The Java `TensorInfo` object which is used to describe a tensor's shape,
along with the input and output placeholders for a model couldn't show
any symbolic/named dimensions in that tensor. Now this information is
stored in Java strings on construction and included in the toString.

### Motivation and Context
Setting symbolic dimensions required external information in Java; the
names were not discoverable from within the API.
2024-01-15 14:42:50 -08:00
Ben Niu
a97199c62d
Fix Arm64EC build for test_q4qdq.cpp (#18523)
### Description
Fix ifdef guards in test_q4qdq.cpp to exclude code blocks intended only
for native x64 compilation instead of x64 + Arm64EC.
2024-01-15 14:29:19 -08:00
Yi Zhang
922a2f00e3
Extend timeout in Nuget-CUDA-Packaging-Pipeline (#19138)
### Motivation and Context
The Linux_GPU_x64 job in the pipeline has been canceled due to timeouts
since 01/12.
2024-01-15 14:37:22 +08:00
Scott McKay
b2ce3eedb9
Fix build error for CoreML Split op (#19099)
### Description
The `split` input of the Split op is int64_t. Fixing that resolves a
type mismatch build error on Windows when CoreML is enabled (for
debugging the partitioning code).

### Motivation and Context
Fix build error

---------

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2024-01-15 15:09:49 +10:00
Adam Pocock
71657d1eb8
[java] Fix double close (#19133)
### Description
The `OnnxValue` and `OrtProviderOptions` implementations now check to
see if they've been closed before accessing the native pointer, and also
before close is called.

### Motivation and Context
Previously they could be closed twice, which SIGSEGV'd the JVM. Fixes #19125.
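
The guard described above can be sketched as follows. This is a minimal illustration written in JavaScript to match the other examples in this log, not the Java implementation. The `NativeHandle` class, its `release` callback, and the choice to make a second `close()` a silent no-op are all assumptions made for the sketch.

```js
// Hypothetical wrapper around a native resource; not the ORT Java API.
class NativeHandle {
  constructor(pointer, release) {
    this.pointer = pointer;   // stand-in for a native pointer
    this.release = release;   // callback that frees the native resource
    this.closed = false;
  }

  // Check the closed flag before handing out the native pointer.
  getPointer() {
    if (this.closed) {
      throw new Error('Handle has been closed');
    }
    return this.pointer;
  }

  // Check the closed flag before releasing, so the resource is freed once.
  close() {
    if (this.closed) {
      return; // assumption: a repeated close is a no-op
    }
    this.closed = true;
    this.release(this.pointer);
  }
}

let releaseCount = 0;
const handle = new NativeHandle(0x1234, () => { releaseCount += 1; });
handle.close();
handle.close(); // second call does not release again
console.log(releaseCount);
```

Without the flag, the second `close()` would pass an already freed pointer to the release routine, which is the kind of double free that crashed the JVM.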
2024-01-14 14:53:26 -08:00
Jian Chen
c3ce9df80c
Disabling python3.12 on training python packaging pipelines (#19123)
2024-01-14 14:51:00 -08:00
Jian Chen
76797127d6
Always download cuda and trt libraries from Azure blob (#19118)
### Description
This way, we will not need to update the Windows images constantly, and
we gain more flexibility to choose the CUDA version in the future.
2024-01-14 11:37:26 -08:00
Changming Sun
bb4011b2b1
Set default flags for nvcc and do not set default compile flags for ROCM EP (#19124)
### Description
Set default flags for nvcc and do not set the flags for ROCM EP.


### Motivation and Context
1. To meet a BinSkim requirement for CUDA EP.

https://github.com/microsoft/binskim/blob/main/docs/BinSkimRules.md#rule-BA2024EnableSpectreMitigations

2. The ROCM EP's pipeline has been broken since PR #19073. Unit tests failed
to load the EP with the following error message:

Failed to load library libonnxruntime_providers_rocm.so with error:
/build/Release/libonnxruntime_providers_rocm.so: undefined symbol:
vtable for onnxruntime::InsertMaxPoolOutput .

This PR is a hot fix to bring the pipeline back. So far I don't know why
the error happened. The symbol "InsertMaxPoolOutput" is in
onnxruntime_optimizers. I don't see any EP code references it directly.
2024-01-14 11:36:49 -08:00
Yulong Wang
f917dde717
[web] remove xnnpack from web backends (#19116)
### Description
XNNPACK is already disabled in web assembly build. This change removes
the xnnpack backend registration in JS.
2024-01-13 23:04:02 -08:00
Edward Chen
e1e45901e2
iOS packaging pipeline stability (#19097)
- Remove protoc build step which sometimes times out. Download protoc instead.
- Use macOS-12 image in the set variables stage. It seems more stable.
2024-01-13 19:27:44 -08:00
Changming Sun
5558912d7b
Disable ccache in Windows CPU CI pipeline (#19131)
### Description
Disable ccache for all the jobs in the Windows CPU CI pipeline.
Before disabling it, the build has a warning that:

"MSIL .netmodule or module compiled with /GL found; restarting link with
/LTCG; add /LTCG to the link command line to improve linker performance"

After disabling it, the warning is gone and the build doesn't use /GL or
/LTCG.

ccache itself should not cause this difference.

2024-01-13 18:40:43 -08:00
Adrian Lizarraga
65893ef382
Add --parallel to QNN EP NuGet pipeline build command (#19126)
### Description
Add --parallel to QNN EP NuGet pipeline build command

### Motivation and Context
Improve build times for pipeline.
2024-01-13 02:38:40 -08:00
Yang Gu
e803f8eb0f
[js/webgpu] Refactor timestamp-query and introduce timestamp-query-inside-passes (#18894)
We submit kernels in batches (a fixed size of 16 is used, except for the
last batch) for better performance. However, timestamp query support is
at the pass level, so the previous implementation disabled batch
execution in profiling mode. In fact, a batch can contain multiple
passes, so batch execution does not have to be disabled; this is the
first enhancement in this PR.
Furthermore, WebGPU has an extension that supports timestamp queries
inside passes, which isn't supported on all platforms (e.g., Windows
supports it, while macOS doesn't). It is expected to cost less than the
multiple-passes solution, so this PR also introduces this support when
available.
This PR also refactors some of the implementation related to kernelInfo
and tries to unify the related kernel names.
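
The batching rule described above (fixed batches of 16, with only the last batch allowed to be shorter) can be sketched as below. This is only an illustration of the grouping. The `splitIntoBatches` helper and the kernel names are hypothetical, and the sketch does not touch the WebGPU API or the actual submission code.

```js
// Batch size taken from the commit message; the helper itself is hypothetical.
const BATCH_SIZE = 16;

// Group kernels into consecutive batches of `batchSize`; the last batch
// holds whatever is left over.
function splitIntoBatches(kernels, batchSize = BATCH_SIZE) {
  const batches = [];
  for (let i = 0; i < kernels.length; i += batchSize) {
    batches.push(kernels.slice(i, i + batchSize));
  }
  return batches;
}

// 40 placeholder kernels give two full batches and one batch of 8.
const kernels = Array.from({ length: 40 }, (_, i) => `kernel_${i}`);
const batches = splitIntoBatches(kernels);
console.log(batches.map((batch) => batch.length)); // [ 16, 16, 8 ]
```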
2024-01-13 00:23:17 -08:00
Jian Chen
78e796bb27
Fixing issue where unzipped package from 'onnxruntime-win-x64-gpu' was also uploaded. (#19096)
### Description
Fixing issue where the unzipped package from 'onnxruntime-win-x64-gpu' was
also uploaded.


For example,
https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=396440&view=artifacts&pathAsName=false&type=publishedArtifacts
2024-01-12 22:30:43 -08:00
Yulong Wang
07cfc56538
[js] enable external data loading for ort-web (#19087)
### Description
enable external data loading for ort-web.

### Why
The ORT external data design depends heavily on the file system,
especially synchronous file I/O APIs. Those are not available on web
platforms, so we need extra code to make external data work on the
web.

### How
Since there is no file system on the web, the web implementation
supports external data through pre-loaded data. Assuming model file
a.onnx includes initializers that are linked to ./b.bin, we require users
to pass a full data file list when creating the session. The user code
will look like:
```js
const mySess = await ort.InferenceSession.create('./path/model/a.onnx', {
  // session options
  externalData: [
    {
      // relative or absolute path/URL of the file,
      // or a pre-loaded Uint8Array containing the data of the external data file
      data: './path/data/b.bin', 

      // the relative path of the external data. Should match initializers' "location" value defined in the model file
      path: './b.bin'
    },
    // { ... } more entries if there are multiple external data files
  ]
});
```

Currently, this feature only works with JSEP build enabled.
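
As a further sketch of the same option, the `externalData` list can be built from a map of locations to data. The `buildExternalData` helper and the `./c.bin` entry are hypothetical. Only the `{ data, path }` entry shape and the fact that `data` may be a path/URL or a pre-loaded `Uint8Array` come from the description above.

```js
// Hypothetical helper: turns a { location: data } map into the `externalData`
// array shape shown above. `location` should match the initializer's
// "location" value in the model file.
function buildExternalData(files) {
  return Object.entries(files).map(([path, data]) => ({ data, path }));
}

const sessionOptions = {
  externalData: buildExternalData({
    // pre-loaded bytes (placeholder content, not real weights)
    './b.bin': new Uint8Array([1, 2, 3, 4]),
    // or a relative/absolute path or URL that will be loaded
    './c.bin': './path/data/c.bin',
  }),
};

// With ort-web loaded, the options would be passed like this:
// const mySess = await ort.InferenceSession.create('./path/model/a.onnx', sessionOptions);
console.log(sessionOptions.externalData.map((entry) => entry.path));
```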
2024-01-12 19:24:24 -08:00
Jian Chen
e5eacc6d11
Fix cuda-packaging-pipeline.yml (#19115)
2024-01-12 19:09:25 -08:00
Hector Li
62a4e9103e
Add extreme_power_saver for htp_performance_mode (#19111)
### Description
Add extreme_power_saver mode for htp_performance_mode
2024-01-12 19:07:02 -08:00
Yifan Li
443aeb851c
[TensorRT EP] Customizable engine cache prefix (#19083)
### Description
Add new option `trt_engine_cache_prefix` to customize TRTEP engine cache
prefix.

For example:
- If the user specifies `trt_engine_cache_prefix|FRCNN
trt_engine_cache_enable|true` when running the FRCNN model, the cache
will be saved/loaded as
`FRCNN_2068723788287043730_*_sm80.engine`. The engine profile follows
the same pattern.

- If this option is skipped, the engine will be saved/loaded as
`TensorrtExecutionProvider_TRTKernel_graph_torch-jit-export_2068723788287043730_*_*_sm80.engine`
by default.

### Motivation and Context
https://github.com/microsoft/onnxruntime/issues/16708

---------

Co-authored-by: Chi Lo <Chi.Lo@microsoft.com>
Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com>
2024-01-12 18:10:05 -08:00
Edward Chen
150c4cb8fe
[MLAS AArch64] SQNBitGemm CompInt8 kernel (#18953)
Implement ARM NEON SQNBitGemm kernel that first block quantizes A to int8 and then does int8 multiplication.
2024-01-12 17:58:08 -08:00
Guenther Schmuelling
a756017e9f
[js/webgpu] more fixes for access above 2GB (#19065)
When jsep calls JavaScript with an index into HEAP8 or HEAP32, the index
is negative when the heap is above 2GB; even if we pass it as uint32_t it
remains negative. So in JavaScript, use `>>> 0` to make it unsigned.
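
The effect of `>>> 0` can be seen in a small standalone sketch. The `toUnsignedOffset` helper and the sample offset are hypothetical; only the `>>> 0` idiom comes from the message above.

```js
// Hypothetical helper; only the `>>> 0` idiom comes from the commit message.
function toUnsignedOffset(offset) {
  // Unsigned right shift by zero reinterprets the low 32 bits as a uint32.
  return offset >>> 0;
}

// A byte offset 16 bytes above 2 GiB, as it looks when read as a signed
// 32-bit integer.
const signedOffset = 0x80000010 | 0;
const unsignedOffset = toUnsignedOffset(signedOffset);

console.log(signedOffset, unsignedOffset); // -2147483632 2147483664
```

Values that already fit in the non-negative int32 range pass through unchanged, so the conversion is safe to apply to every offset.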
2024-01-12 17:47:37 -08:00
Adrian Lizarraga
8deeba3ad0
[Quantization] Fix get_qnn_qdq_config to use new scale/zp np.array data types (#19114)
### Description
- Updates `get_qnn_qdq_config()` to use new scale/zp np.array data
types.
- Adds missing unit test to help prevent future regression.



### Motivation and Context
https://github.com/microsoft/onnxruntime/pull/18043 changed the usage of
`extra_options["TensorQuantizationOverrides"]`. We need to update its
use in quantization/execution_providers/qnn/quant_config.py
2024-01-12 17:02:32 -08:00
Guenther Schmuelling
96dbac6e4b
update to emsdk-3.1.51 (#18844)
2024-01-12 16:04:33 -08:00
Scott McKay
8f2e57f5d0
Make session configuration options available to kernels via OpKernelInfo (#18897)
### Description
Pass through the ConfigOptions from the session via OpKernelInfo so that
kernel behavior can be configured.

Initial usage would be to optionally enable a fast path for ARM64 bfloat16 GEMM - see #17031
Other usages could be things like selecting the exact implementations of the activation functions for RNN operators instead of the default approximations (e.g. use [sigmoid_exact instead of sigmoid](2d6e2e243d/onnxruntime/core/providers/cpu/rnn/rnn_helpers.h (L379-L382)))

OpKernelInfo is already passing through things from the session state, and adding a new member of ConfigOptions
is the simpler update. It's also a more natural fit given it's providing state/info to the kernel.

2024-01-13 10:02:43 +10:00
Jiangzhuo
a503561d0c
[js] using OffscreenCanvas when DOM is not available (#19033)
### Description
When the DOM API is not available, use OffscreenCanvas.


### Motivation and Context
In some environments, such as service workers or web workers, the DOM API
is not available; we can use the OffscreenCanvas API to replace
`document.createElement('canvas')`.
Most of the APIs of OffscreenCanvas and HTMLCanvasElement are the same,
except that `toDataUrl` is missing.
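
A minimal sketch of such a fallback is shown below. The `createCanvas` helper and its `scope` parameter are hypothetical (the parameter exists only so the sketch can be exercised outside a browser); `document.createElement('canvas')` and the `OffscreenCanvas(width, height)` constructor are the standard web APIs named above.

```js
// Hypothetical helper: prefer the DOM when it exists, otherwise fall back to
// OffscreenCanvas (e.g. inside a web worker or service worker).
function createCanvas(width, height, scope = globalThis) {
  if (typeof scope.document !== 'undefined') {
    const canvas = scope.document.createElement('canvas');
    canvas.width = width;
    canvas.height = height;
    return canvas;
  }
  if (typeof scope.OffscreenCanvas !== 'undefined') {
    return new scope.OffscreenCanvas(width, height);
  }
  throw new Error('Neither the DOM nor OffscreenCanvas is available');
}
```

Callers that need a data URL still have to handle the OffscreenCanvas case separately, since that method is missing there.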

Fixes issue #19032.
2024-01-12 13:54:05 -08:00
Guenther Schmuelling
4a5f13b681
fix resize for fp16 (#19110)
Resize for fp16 has two issues: scales are always f32, and roi can be
f32 or f16.
scales:
This is fixed.

roi:
This is fixed for the case where roi is not passed as an optional input
with f16. Fixing the remaining case requires a much larger change, and I
did not want to risk it shortly before a release. For all practical
purposes, passing roi as an input with f16 should be rare, and we can
fix it in the near future.
2024-01-12 13:44:28 -08:00
Caroline Zhu
4dbaa73738
[js/web/training] added end-to-end tests (#18700)
## Summary
* following inference's [set-up for end-to-end
tests](https://github.com/microsoft/onnxruntime/tree/main/js/web/test/e2e),
created an end-to-end test runner for training
* this test runner copies testdata from the [trainingapi
folder](https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/test/testdata/training_api)
* then runs two tests (training session with evalModel & optimizer
model, and training session with the minimum options), and tests if the
ORT-web training package encompasses inference
  * these tests check
    * createTrainingSession
    * runTrainStep
    * runOptimizerStep if applicable
    * the parameters methods (getParametersSize, loadParametersBuffer, and
      getContiguousParameters)

## TL;DR
*
[`js/web/test/training/e2e/run.js`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-c1359c4d401f9ba69e937814219cefe5fd11b151a6ffd084c641af3c82e8216c)
is responsible for setting up and running the end to end tests
*
[`js/web/test/training/e2e/common.js`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-ee5452491b7b2563d175d13d81d10f2323b12b18589aa4c5798962a8b904a4a8)
contains the test function definitions (`testInferenceFunction`,
`testTrainingFunctionMin`, `testTrainingFunctionAll`)

## Flow
* entrypoint: user runs the following command in the terminal: `npm run
test:training:e2e`
*
[`js/web/package.json`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-79275844e75c3c410bb3a71c7f59b2b633e5a3e975c804ffc47220025084da28)
was modified to include an npm script that will run `run.js` which will
run the end to end tests
*
[`js/web/test/training/e2e/run.js`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-c1359c4d401f9ba69e937814219cefe5fd11b151a6ffd084c641af3c82e8216c)
is responsible for
  * detecting and installing local tarball packages of ORT-web
  * copying training data to the `js/web/training/e2e/data` folder
  * starting two Karma processes. Karma is a test runner framework that
    simulates testing in the browser.
    * In this case, the tests happen in Chrome. We can configure the tests
      to run in Edge and other browsers in the future.
    * one of these karma processes is self-hosted, meaning it pulls the
      ORT-web package from local
    * the other karma process is not self-hosted, meaning it pulls the
      ORT-web package from another source. In this case, we start an http
      server that serves the ORT-web binaries.
*
[`js/web/test/training/e2e/simple-http-server.js`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-f798ab485f3ec26c299fe5b2923574c9e4b090200ba20d490bbf6c183286993c)
is responsible for starting the HTTP server and serving the ORT binary
files. This code is almost identical to the same code in the inference E2E
tests.
*
[`js/web/test/training/e2e/karma.conf.js`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-436cfe8f670c768a04895bd4a1874a5e033f85e0e2d84941c62ff1f7c30a9f28)
Karma configuration file that specifies what happens when a karma
process is started. The config specifies Mocha as the testing framework,
which will go through all the loaded files and run any tests that exist
*
[`js/web/test/training/e2e/browser-test-wasm.js`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-13b6155e106dddc7b531ef671186e69b2aadb8a0f4b2f3001db0991567d78221)
File that contains the tests that Mocha will pick up on and run.
* The test functions (such as testInference and testTrainingFunctionAll)
are defined in
[`js/web/test/training/e2e/common.js`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-ee5452491b7b2563d175d13d81d10f2323b12b18589aa4c5798962a8b904a4a8).

## Notes
* I followed the [tests for training
core](b023de0bfc/orttraining/orttraining/test/training_api/core/training_api_tests.cc)
where they randomly generated input for the training session
* E2E tests are triggered by running `npm run test:training:e2e` --
suggestions for alternative script names are appreciated!!!

## Motivation and Context
- adding training bindings for web
2024-01-12 13:33:33 -08:00
Preetha Veeramalai
c340bf08f6
Openvino EP code changes for 1.17 update (#19023)
### Description
Introduce AppendExecutionProvider_OpenVINO_V2 API and support for OV
2023.3.


### Context

- The API is added to facilitate customers in using published official
Microsoft onnxruntime libraries with OVEP libraries.
- Add support for OpenVINO 2023.3 official release.
- Extend operator coverage 
- GH fixes

---------

Co-authored-by: Suryaprakash Shanmugam <suryaprakash.shanmugam@intel.com>
2024-01-12 13:20:51 -08:00
Aditya Goel
dcd6d4cad6
Label encoder opset4 (#17977)
### Description
Implements LabelEncoder as per `ai.onnx.ml` opset 4 for the upcoming
ONNX 1.15 release. ~~This currently depends on a new ONNX release
candidate and so is marked as draft in the meantime.~~


### Motivation and Context
Closes https://github.com/microsoft/onnxruntime/issues/17602
2024-01-12 12:43:44 -08:00
Changming Sun
55b046e97e
Remove enable_mac_silicon settings (#19108)
### Description
Remove enable_mac_silicon settings from two packaging pipelines.

### Motivation and Context
Now we build universal2 packages instead.
2024-01-12 11:01:39 -08:00
RandySheriffH
4520b76417
Exclude TP custom API from minimal (#19086)
Exclude TP custom API from minimal.

---------

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2024-01-12 10:40:47 -08:00
Numfor Tiapo
3c0a6b505a
Update transformers module to 4.36 (#18993)
Update transformers module to fix security vulnerabilities in our
internal pipeline
2024-01-12 10:37:48 -08:00
zesongw
3eec1592bd
[WebNN EP] Update WebNN unit test list (#19103)
Update WebNN test list in suite-test-list.jsonc so all test cases
pass with the WebNN CPU backend on Chrome Stable (although some cases
may fall back to the CPU EP).
Enable int64 support for WebNN in unit tests.
2024-01-12 10:22:38 -08:00
Aditya Goel
c23410a182
StringSplit operator (#18016)
### Motivation and Context
Closes https://github.com/microsoft/onnxruntime/issues/17596
2024-01-12 09:46:23 -08:00
Changming Sun
2cb5781b43
Remove two tests from test_logging_apis.cc (#19100)
### Description
In some environments the test code has undefined behavior. To prove it, save the following code as
test.cpp
```c++
#include <iostream>
#include <stdio.h>

int main(){
  char buf[1024];
  int ret = snprintf(buf, sizeof(buf), "%ls","abc");
  if(ret <0){
    std::cout<< ret<< std::endl;
  } else{
    std::cout<< "OK: ret="<<ret<< std::endl;
  }
  return 0;
}
```
Then compile it as 
```
g++   -DNDEBUG -std=gnu++17    test.cpp -o /tmp/t
```
Or 
```
g++   -O2 -DNDEBUG -std=gnu++17    test.cpp -o /tmp/t
```
The first command is without optimization. The second one turns on
optimization. Then the outputs are different.
When optimization is enabled, the output might be:
```
OK: ret=-1
```
You cannot explain why it would go to this branch when ret is "-1". It
might be a bug of a specific version of GCC. However, at this moment we
cannot change the version. It was found in GCC version 8.5.0 20210514
(Red Hat 8.5.0-18) (GCC) that is provided by UBI8. RHEL9 doesn't have
the problem. snprintf is a builtin function of GCC. So the problem was
not related to glibc.

2024-01-12 09:26:28 -08:00
Xavier Dupré
c8399a81fe
Quantization tool: support float 8 with MatMul, support float 16 weights (#18043)
### Description

Whenever a QuantizeLinear or DequantizeLinear node is added, the type of
the weights before quantization must be known in order to create the
scale with the expected type. Another option would be to add many
CastLike operators, but that would push the burden onto the onnxruntime
optimizer.

The PR tries to avoid changing the signature. To do so, it modifies the
scale computation to store the result in a numpy array rather than a
python float. The numpy array must be of the same type as the weights
to quantize.

The PR adds many `assert` statements to check that the type of the scale
is not a python type or a float64. These were added to make sure all the
code follows the same logic, and were kept for the first review.

DequantizeLinear and QuantizeLinear cannot be tested with onnx==1.15: it
lacks PR https://github.com/onnx/onnx/pull/5709, which fixes shape
inference, and PR https://github.com/onnx/onnx/pull/5473, which supports
QLinearMatMul with float 16. That explains why some tests are
disabled with float 16.

### Motivation and Context

The current quantization tool assumes every weight is float 32. For
large models such as LLAMA, weights are usually float 16. The
quantization tool needs to be able to quantize such weights.
2024-01-12 17:54:55 +01:00
Changming Sun
0e8d4c3d21
Enable Address Sanitizer in CI (#19073)
### Description
1. Add two build jobs for enabling Address Sanitizer in CI: one for
Windows CPU and one for Linux CPU.
2. Set default compiler flags/linker flags in build.py for normal
Windows/Linux/MacOS build. This can help control compiler flags in a
more centralized way.
3. All Windows binaries in our official packages will be built with
"/PROFILE" flag. Symbols of onnxruntime.dll can be found at [Microsoft
public symbol
server](https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/microsoft-public-symbols).

Limitations:
1. On Linux Address Sanitizer ignores RPATH settings in ELF binaries.
Therefore once Address Sanitizer is enabled, before running tests we
need to manually set LD_LIBRARY_PATH properly; otherwise
libonnxruntime.so may not be able to find custom ops and shared EPs.
2. On Linux we also need to set LD_PRELOAD before running some tests (if
the main executable, like python, is not built with Address Sanitizer).
On Windows we do not need to.
3. On Windows, before running python tests we should manually copy the
Address Sanitizer DLL to the onnxruntime/capi directory, because python
3.8 and above has enabled "Safe DLL Search Mode" that wouldn't use the
information provided by PATH env.
4. On Linux Address Sanitizer found a lot of memory leaks from our
python binding code. Therefore right now we cannot enable Address
Sanitizer when building ONNX Runtime with python binding.
5. Address Sanitizer itself uses a lot of memory address space and
delays memory deallocations, which can easily cause OOM issues in 32-bit
applications. For this reason we cannot run all the tests in
onnxruntime_test_all in 32-bit mode with Address Sanitizer. However, we
can still run individual tests that way. We just cannot run all of them
in one process.

### Motivation and Context
To catch memory issues.
2024-01-12 07:24:40 -08:00
Changming Sun
285606108a
Set pythonInterpreter in set-python-manylinux-variables-step.yml (#19105)
### Description
Set pythonInterpreter in set-python-manylinux-variables-step.yml to fix
a build error:

```
Starting: Set Python manylinux variables
==============================================================================
Task         : Python script
Description  : Run a Python file or inline script
Version      : 0.231.1
Author       : Microsoft Corporation
Help         : https://docs.microsoft.com/azure/devops/pipelines/tasks/utility/python-script
==============================================================================
##[error]Parameter 'toolPath' cannot be null or empty.
Finishing: Set Python manylinux variables
```
The error occurred because today I deleted a bunch of software from the VM
image. The task might fail if no Python versions are found in
$(Agent.ToolsDirectory).
2024-01-12 07:22:02 -08:00
Changming Sun
e3ee255950
Remove the references to CreateFileMapping2 (#19102)
### Description
Remove the references to CreateFileMapping2 because the function is
mainly for system services. To use the function, we need to link to one
of the four [Windows umbrella
libraries](https://learn.microsoft.com/en-us/windows/win32/apiindex/windows-umbrella-libraries).
It's tricky because a custom build might want to use any of the four. So
I cannot just choose one and add that one to our CMakeLists.txt.
Given it's so complicated and the code is not actually used now, I will
remove it. It is not used because it requires NTDDI_VERSION >=
NTDDI_WIN10_RS5 but in our top level CMakeLists.txt we set the version
to the first Windows 10 release which is lower than RS5.
2024-01-12 07:21:12 -08:00
zesongw
e1db44b4f0
[WebNN EP] Add quantize Ops (#18011)
### Description
Add four quantize Ops: MatmulInteger, ConvInteger, DynamicQuantizeLinear
and DequantizeLinear.
Add datatype TensorProto_DataType_INT8 and TensorProto_DataType_UINT8.

### Motivation and Context
Support quantized models.
2024-01-12 02:25:09 -08:00
Jiajie Hu
acba63c36a
[js/webgpu] Change A/sqrt(B) to A*inverseSqrt(B) in normalization ops (#19101)
### Description
Change `A / sqrt(B)` to `A * inverseSqrt(B)` in BatchNormalization,
InstanceNormalization, LayerNormalization and SkipLayerNormalization.

### Motivation and Context
For the same reason as the existence of the `inverseSqrt` built-in in
WebGPU spec.
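
The rewrite is an algebraic identity, which the following sketch checks numerically. JavaScript has no `inverseSqrt`, so a stand-in is defined here to mirror the WGSL built-in; the function names and sample values are hypothetical, and the sketch says nothing about the performance of either form on a GPU.

```js
// Stand-in for the WGSL `inverseSqrt` built-in (not a JavaScript API).
function inverseSqrt(x) {
  return 1 / Math.sqrt(x);
}

// Original form: A / sqrt(B)
function divideBySqrt(a, b) {
  return a / Math.sqrt(b);
}

// Rewritten form: A * inverseSqrt(B)
function multiplyByInverseSqrt(a, b) {
  return a * inverseSqrt(b);
}

// Both forms agree up to floating-point rounding.
const a = 2.5;
const b = 4.00001;
const difference = Math.abs(divideBySqrt(a, b) - multiplyByInverseSqrt(a, b));
console.log(difference < 1e-12);
```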
2024-01-12 00:08:16 -08:00