Commit graph

10388 commits

Author SHA1 Message Date
Yulong Wang
146ebaf91e
[js/web] allow proxy to load model with 1GB <= size < 2GB (#19178)
### Description

allow proxy to load model with 1GB <= size < 2GB

resolves #19157.
2024-01-17 15:03:43 -08:00
Maximilian Müller
bc219ed553
[TensorRT EP] Enable a minimal CUDA EP compilation without kernels (#19052)
Adresses https://github.com/microsoft/onnxruntime/issues/18542.
I followed the advice given by @RyanUnderhill
[here](https://github.com/microsoft/onnxruntime/pull/18731#issuecomment-1848261925)
and went with a minimal CUDA EP for now.
2024-01-17 11:33:34 -08:00
Rachel Guo
bd9d8fb2a5
[ORT 1.17.0 release] Bump up version to 1.18.0 (#19170)
### Description
<!-- Describe your changes. -->

Bump up version to 1.18.0 since the release branch has been cut.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
2024-01-17 11:18:32 -08:00
Xavier Dupré
63dd605d33
Fix untyped float values in quantization tool missing from PR #18043 (#19182)
### Description
Extends the code coverage to Entroy, Histogram and Distribution
calibration method, fix bugs while doing it.



### Motivation and Context
Bugs detected in [Olive](https://github.com/microsoft/OLive).
2024-01-17 19:00:36 +01:00
wejoncy
9876cc7c4f
more inputs support for LLM exporter (#19005)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-01-17 15:46:19 +08:00
Wanming Lin
07d3aed3aa
[WebNN EP] Fixed build issue with disable_rtti (#19173)
Previously building webnn ep with --disable_rtti will throw
unboundTypeError since unbound type names are illegal with RTTI disabled
in Embind API, we can fix it by adding a
-DEMSCRIPTEN_HAS_UNBOUND_TYPE_NAMES=0 flag.
2024-01-16 21:35:13 -08:00
Changming Sun
81d363045b
Upgrade Ubuntu machine pool from 20.04 to 22.04 (#19117)
### Description
Upgrade Ubuntu machine pool from 20.04 to 22.04
2024-01-16 17:25:18 -08:00
Hector Li
e61861b0a1
Clean up generated files in QNN UTs (#19127)
### Description
Clean up generated files in QNN UTs
2024-01-16 16:36:28 -08:00
moyo1997
c935c8fbd2
remove unnecessary environment variable (#19166)
remove unnecessary environment variable when building as arm64x
2024-01-16 16:24:37 -08:00
Jian Chen
8e272b9cac
Update build.py to remove unused functions and update python to 3.8 (#19164)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-01-16 13:53:15 -08:00
Patrice Vignola
80f274ca6f
Fix SkipLayerNormalization shape inference (#18724)
SkipLayerNorm has more than one input, so `propagateShapeAndTypeFromFirstInput` is not enough.
2024-01-16 09:42:59 -08:00
Changming Sun
e2e488d6f8
Revert "iOS packaging pipeline stability" (#19135)
Reverts microsoft/onnxruntime#19097 because it broken Android CI
pipeline.
2024-01-16 09:18:35 -08:00
Jian Chen
c92f72ebeb
Merge Linux Nuget GPU pipeline with zip-nuget (#19120)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-01-16 08:59:03 -08:00
Jeff Bloomfield
8d4369b77e
Update DirectML nuget version to 1.13.1 (#19122)
### Description
Update DML version to 1.13.1



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-01-15 19:04:41 -08:00
Wanming Lin
1bab98988b
[WebNN EP] Fixed bug in int8 data type processing (#19134) 2024-01-15 18:44:25 -08:00
Guenther Schmuelling
9dee543bed
fix gemm beta for fp16 (#19153)
per onnx spec beta is always fp32 so we need to cast it
2024-01-15 18:40:38 -08:00
Jeff Bloomfield
9f87c5c41d
Fix build error due to merge with DML adapter enumeration macro defined (#19121)
### Description
Fix build error when ENABLE_NPU_ADAPTER_ENUMERATION is defined


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-01-15 17:10:58 -08:00
pengwa
1150b1f81e
ORTModule memory improvement (#18924)
## Dependency

https://github.com/microsoft/onnxruntime/pull/19007

## ORTModule memory efficient gradient management

Previously I have tried to solve the coarsed-grained gradient
accumulation/update problem in ORTModule with
https://github.com/microsoft/onnxruntime/pull/8979, while that
resolution somehow is not fully validated with DDP or there is user
hooks on the gradient accumulation on torch parameter.

This PR is addressing the problem in the similar approach as PR 8979,
e.g. trigger gradient accumulation once ORT computed the grad, but
instead of use a AccumulateGrad op, this time with a ONNX operator
PythonOp, internally it will call param.backward(grad), which will help
handle all related hooks correctly.


## Design

Check the details from


https://microsoftapc-my.sharepoint.com/:p:/g/personal/pengwa_microsoft_com/EaaBq4EzsFhOmsDEXCG7Ba4Bb9bwd0O2sFV_JXJ4jBLYLA?e=7Sz2g8&nav=eyJzSWQiOjI3MSwiY0lkIjozMjE4NzI1NDIzfQ

## Convergence Validation:


![image](https://github.com/microsoft/onnxruntime/assets/10530022/ccf3a213-e815-4b23-b759-165033b2d9fe)

differences are on mostly 0.000x, sometimes 0.00x, which may comes from
the different order gradient apply happens before or after this change
(on deepspeed zero stage 2)


## TODO

Consolidate the logic with Stage3's similar logic.
2024-01-16 08:57:37 +08:00
Adam Pocock
191525301f
[java] Updating TensorInfo so it contains the named dimensions (#18962)
### Description
The Java `TensorInfo` object which is used to describe a tensor's shape,
along with the input and output placeholders for a model couldn't show
any symbolic/named dimensions in that tensor. Now this information is
stored in Java strings on construction and included in the toString.

### Motivation and Context
Setting symbolic dimensions required external information in Java, the
names were not discoverable from within the API.
2024-01-15 14:42:50 -08:00
Ben Niu
a97199c62d
Fix Arm64EC build for test_q4qdq.cpp (#18523)
### Description
Fix ifdef guards in test_q4qdq.cpp to exclude code blocks intended only
for native x64 compilation instead of x64 + Arm64EC.
2024-01-15 14:29:19 -08:00
Yi Zhang
922a2f00e3
Extend timeout in Nuget-CUDA-Packaging-Pipeline (#19138)
### Description
<!-- Describe your changes. -->



### Motivation and Context
Linux_GPU_x64 job in the pipeline has been canceled due to timeout since
0112.
2024-01-15 14:37:22 +08:00
Scott McKay
b2ce3eedb9
Fix build error for CoreML Split op (#19099)
### Description
<!-- Describe your changes. -->
The `split` input of the Split op is int64_t. Fixing that resolves a
type mismatch build error on Windows when CoreML is enabled (for
debugging the partitioning code).

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Fix build error

---------

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2024-01-15 15:09:49 +10:00
Adam Pocock
71657d1eb8
[java] Fix double close (#19133)
### Description
The `OnnxValue` and `OrtProviderOptions` implementations now check to
see if they've been closed before accessing the native pointer, and also
before close is called.

### Motivation and Context
Before they could be closed twice which SIGSEGV'd the JVM. Fixes #19125.
2024-01-14 14:53:26 -08:00
Jian Chen
c3ce9df80c
Disabling python3.12 on training python packaging pipleines (#19123) 2024-01-14 14:51:00 -08:00
Jian Chen
76797127d6
Always download cuda and trt libraries from Azure blob (#19118)
### Description
This way, we will not need to update the windows images constantly and
allow more flexibility to choose the cuda version in the future.
2024-01-14 11:37:26 -08:00
Changming Sun
bb4011b2b1
Set default flags nvcc and do not set default compile flags for ROCM EP (#19124)
### Description
Set default flags nvcc and do not set the flags for ROCM EP. 


### Motivation and Context
1. To meet a BinSkim requirement for CUDA EP.

https://github.com/microsoft/binskim/blob/main/docs/BinSkimRules.md#rule-BA2024EnableSpectreMitigations

2. The ROCM EP's pipeline is broken since PR #19073 . Unit tests failed
to load the EP with the following error message:

Failed to load library libonnxruntime_providers_rocm.so with error:
/build/Release/libonnxruntime_providers_rocm.so: undefined symbol:
vtable for onnxruntime::InsertMaxPoolOutput .

This PR is a hot fix to bring the pipeline back. So far I don't know why
the error happened. The symbol "InsertMaxPoolOutput" is in
onnxruntime_optimizers. I don't see any EP code references it directly.
2024-01-14 11:36:49 -08:00
Yulong Wang
f917dde717
[web] remove xnnpack from web backends (#19116)
### Description
XNNPACK is already disabled in web assembly build. This change removes
the xnnpack backend registration in JS.
2024-01-13 23:04:02 -08:00
Edward Chen
e1e45901e2
iOS packaging pipeline stability (#19097)
- Remove protoc build step which sometimes times out. Download protoc instead.
- Use macOS-12 image in the set variables stage. It seems more stable.
2024-01-13 19:27:44 -08:00
Changming Sun
5558912d7b
Disable ccache in Windows CPU CI pipeline (#19131)
### Description
Disable ccache for all the jobs in in Windows CPU CI pipeline.
Before disabling it, the build has a warning that:

"MSIL .netmodule or module compiled with /GL found; restarting link with
/LTCG; add /LTCG to the link command line to improve linker performance"

After disabling it, the warning is gone and the build doesn't use /GL or
/LTCG.

Cache itself should not cause this difference. 

### Motivation and Context
2024-01-13 18:40:43 -08:00
Adrian Lizarraga
65893ef382
Add --parallel to QNN EP NuGet pipeline build command (#19126)
### Description
Add --parallel to QNN EP NuGet pipeline build command

### Motivation and Context
Improve build times for pipeline.
2024-01-13 02:38:40 -08:00
Yang Gu
e803f8eb0f
[js/webgpu] Refactor timestamp-query and introduce timestamp-query-inside-passes (#18894)
We submit kernels in a batch (a fixed number 16 is used except for the
last batch) for better performance. However, timestamp query support is
at pass level so we disable the batch execution in profiling mode in
previous implementation. Actually we can have multiple passes in a batch
so that we don't have to disable batch execution, which is the first
enhancement of this PR.
Furthermore, WebGPU has an extension to support timestamp query inside
passes, which isn't supported by all the platforms (e.g., Windows
supports it, while macOS doesn't). This is expected to have lower cost
compared with multiple passes solution. So this PR also introduce this
support when available.
This PR also refactors some implementation related to kernelInfo, and
try to unify the related kernel names.
2024-01-13 00:23:17 -08:00
Jian Chen
78e796bb27
Fixing issue where unzip package froim 'onnxruntime-win-x64-gpu' was also uploaded. (#19096)
### Description
Fixing issue where unzip package froim 'onnxruntime-win-x64-gpu' was
also uploaded.


For example,
https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=396440&view=artifacts&pathAsName=false&type=publishedArtifacts
2024-01-12 22:30:43 -08:00
Yulong Wang
07cfc56538
[js] enable external data loading for ort-web (#19087)
### Description
enable external data loading for ort-web.

### Why
The ORT external data design is highly depending on the file system,
especially synchronous file I/O APIs. Those are not available in web
platforms. We need to have extra code to make external data working on
web.

### How
Considering there is no file system in web, an implementation for web to
support external data is to use pre-loaded data. Assume model file
a.onnx includes initializers that linked to ./b.bin, we require users to
pass a full data file list when creating the session. The user code will
be look like:
```js
const mySess = await ort.InferenceSession.create('./path/model/a.onnx', {
  // session options
  externalData: [
    {
      // relative or absolute path/URL of the file,
      // or a pre-loaded Uint8Array containing the data of the external data file
      data: './path/data/b.bin', 

      // the relative path of the external data. Should match initializers' "location" value defined in the model file
      path: './b.bin'
    },
    // { } if multiple external data file
  ]
});
```

Currently, this feature only works with JSEP build enabled.
2024-01-12 19:24:24 -08:00
Jian Chen
e5eacc6d11
Fix cuda-packaging-pipeline.yml (#19115)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-01-12 19:09:25 -08:00
Hector Li
62a4e9103e
Add extreme_power_saver for htp_performance_mode (#19111)
### Description
Add extreme_power_saver mode for htp_performance_mode
2024-01-12 19:07:02 -08:00
Yifan Li
443aeb851c
[TensorRT EP] Customizable engine cache prefix (#19083)
### Description
<!-- Describe your changes. -->
Add new option `trt_engine_cache_prefix` to customize TRTEP engine cache
prefix.

i.e:
- If user specifies `trt_engine_cache_prefix|FRCNN
trt_engine_cache_enable|true` when running FRCNN model
- the cache will be saved/loaded:
`FRCNN_2068723788287043730_*_sm80.engine`. Engine profile follows same
pattern.

- If skipping this option, the engine will be saved/loaded:
`TensorrtExecutionProvider_TRTKernel_graph_torch-jit-export_2068723788287043730_*_*_sm80.engine`
as default case.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
https://github.com/microsoft/onnxruntime/issues/16708

---------

Co-authored-by: Chi Lo <Chi.Lo@microsoft.com>
Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com>
2024-01-12 18:10:05 -08:00
Edward Chen
150c4cb8fe
[MLAS AArch64] SQNBitGemm CompInt8 kernel (#18953)
Implement ARM NEON SQNBitGemm kernel that first block quantizes A to int8 and then does int8 multiplication.
2024-01-12 17:58:08 -08:00
Guenther Schmuelling
a756017e9f
[js/webgpu] more fixes for access above 2GB (#19065)
when jsep calls javascript with an index to HEAP8 or HEAP32 the index is
negative when the heap is above 2GB, even if we pass it as uint32_t it
remains negative. So in javascript use >>> 0 to make it unsigned.
2024-01-12 17:47:37 -08:00
Adrian Lizarraga
8deeba3ad0
[Quantization] Fix get_qnn_qdq_config to use new scale/zp np.array data types (#19114)
### Description
- Updates `get_qnn_qdq_config()` to use new scale/zp np.array data
types.
- Adds missing unit test to help prevent future regression.



### Motivation and Context
https://github.com/microsoft/onnxruntime/pull/18043 changed the usage of
`extra_options["TensorQuantizationOverrides"]`. We need to update its
use in quantization/execution_providers/qnn/quant_config.py
2024-01-12 17:02:32 -08:00
Guenther Schmuelling
96dbac6e4b
update to emsdk-3.1.51 (#18844) 2024-01-12 16:04:33 -08:00
Scott McKay
8f2e57f5d0
Make session configuration options available to kernels via OpKernelInfo (#18897)
### Description
<!-- Describe your changes. -->
Pass through the ConfigOptions from the session via OpKernelInfo so that
kernel behavior can be configured.

Initial usage would be to optionally enable a fast path for ARM64 bloat16 GEMM - see #17031
Other usages could be things like selected the exact implementations of the activation functions for RNN operators instead of the default approximations (e.g. use [sigmoid_exact instead of sigmoid](2d6e2e243d/onnxruntime/core/providers/cpu/rnn/rnn_helpers.h (L379-L382)))

OpKernelInfo is already passing through things from the session state, and adding a new member of ConfigOptions
is the simpler update. It's also a more natural fit given it's providing state/info to the kernel.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-01-13 10:02:43 +10:00
Jiangzhuo
a503561d0c
[js] using OffscreenCanvas when DOM is not available (#19033)
### Description
when DOM API is not avaiable, using OffscreenCanvas


### Motivation and Context
In some environment like service worker or web worker, the DOM API is
not avaiable, we can use OffscreenCanvas API to replace
`document.createElement('canvas')`.
Most of the APIs of OffscreenCanvas and HTMLCanvasElement are the same,
except that `toDataUrl` is missing.

It fix this issues #19032
2024-01-12 13:54:05 -08:00
Guenther Schmuelling
4a5f13b681
fix resize for fp16 (#19110)
resize for fp16 has 2 issues: scales are always f32 and roi can be f32
or f16.
scales:
this is fixed.

roi
this is fixed for the case where roi is not passed as optional input
with f16. To fix this it requires a much larger change and I did not
want to risk this short before a release. For all practical purpose
passing roi as input with f16 should be rare and we can fix it in the
near future.
2024-01-12 13:44:28 -08:00
Caroline Zhu
4dbaa73738
[js/web/training] added end-to-end tests (#18700)
## Summary
* following inference's [set-up for end-to-end
tests](https://github.com/microsoft/onnxruntime/tree/main/js/web/test/e2e),
created an end-to-end test runner for training
* this test runner copies testdata from the [trainingapi
folder](https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/test/testdata/training_api)
* then runs two tests (training session with evalModel & optimizer
model, and training session with the minimum options), and tests if the
ORT-web training package encompasses inference
  * these tests check 
    * createTrainingSession
    * runTrainStep
    * runOptimizerStep if applicable
* the parameters methods (getParametersSize, loadParametersBuffer, and
getContiguousParameters)

## TL;DR
*
[`js/web/test/training/e2e/run.js`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-c1359c4d401f9ba69e937814219cefe5fd11b151a6ffd084c641af3c82e8216c)
is responsible for setting up and running the end to end tests
*
[`js/web/test/training/e2e/common.js`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-ee5452491b7b2563d175d13d81d10f2323b12b18589aa4c5798962a8b904a4a8)
contains the test function definitions (`testInferenceFunction`,
`testTrainingFunctionMin`, `testTrainingFunctionAll`)

## Flow
* entrypoint: user runs the following command in the terminal: `npm run
test:training:e2e`
*
[`js/web/package.json`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-79275844e75c3c410bb3a71c7f59b2b633e5a3e975c804ffc47220025084da28)
was modified to include an npm script that will run `run.js` which will
run the end to end tests
*
[`js/web/test/training/e2e/run.js`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-c1359c4d401f9ba69e937814219cefe5fd11b151a6ffd084c641af3c82e8216c)
is responsible for
  * detecting and installing local tarball packages of ORT-web
  * copying training data to the `js/web/training/e2e/data` folder
* starting two Karma processes. Karma is a test runner framework that
simulates testing in the browser.
* In this case, the tests happen in Chrome. We can configure the tests
to run in Edge and other browsers in the future.
* one of these karma processes is self-hosted, meaning it pulls the
ORT-web package from local
* the other karma process is not self-hosted, meaning it pulls the
ORT-web package from another source. In this case, we start an http
server that serves the ORT-web binaries.
*
[`js/web/test/training/e2e/simple-http-server.js`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-f798ab485f3ec26c299fe5b2923574c9e4b090200ba20d490bbf6c183286993c)
is responsible for starting the HTTP server and serving the ORT binary
files. This code almost identical to the same code in the inference E2E
tests.
*
[`js/web/test/training/e2e/karma.conf.js`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-436cfe8f670c768a04895bd4a1874a5e033f85e0e2d84941c62ff1f7c30a9f28)
Karma configuration file that specifies what happens when a karma
process is started. The config specifies Mocha as the testing framework,
which will go through all the loaded files and run any tests that exist
*
[`js/web/test/training/e2e/browser-test-wasm.js`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-13b6155e106dddc7b531ef671186e69b2aadb8a0f4b2f3001db0991567d78221)
File that contains the tests that Mocha will pick up on and run.
* The test functions (such as testInference and testTrainingFunctionAll)
are defined in
[`js/web/test/training/e2e/common.js`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-ee5452491b7b2563d175d13d81d10f2323b12b18589aa4c5798962a8b904a4a8).

## Notes
* I followed the [tests for training
core](b023de0bfc/orttraining/orttraining/test/training_api/core/training_api_tests.cc)
where they randomly generated input for the training session
* E2E tests are triggered by running `npm run test:training:e2e` --
suggestions for alternative script names are appreciated!!!

## Motivation and Context
- adding training bindings for web
2024-01-12 13:33:33 -08:00
Preetha Veeramalai
c340bf08f6
Openvino EP code changes for 1.17 update (#19023)
### Description
Introduce AppendExecutionProvider_OpenVINO_V2 API and support for OV
2023.3.


### Context

- The API is added to facilitate customers in using published official
Microsoft onnxruntime libraries with OVEP libraries.
- Add support for OpenVINO 2023.3 official release.
- Extend operator coverage 
- GH fixes

---------

Co-authored-by: Suryaprakash Shanmugam <suryaprakash.shanmugam@intel.com>
2024-01-12 13:20:51 -08:00
Aditya Goel
dcd6d4cad6
Label encoder opset4 (#17977)
### Description
<!-- Describe your changes. -->
Implements LabelEncoder as per `ai.onnx.ml` opset 4 for the upcoming
ONNX 1.15 release. ~~This currently depends on a new ONNX release
candidate and so is marked as draft in the meantime.~~


### Motivation and Context
Closes https://github.com/microsoft/onnxruntime/issues/17602
2024-01-12 12:43:44 -08:00
Changming Sun
55b046e97e
Remove enable_mac_silicon settings (#19108)
### Description
Remove enable_mac_silicon settings from two packaging pipelines.

### Motivation and Context
Now we build universal2 packages instead.
2024-01-12 11:01:39 -08:00
RandySheriffH
4520b76417
Exclude TP custom API from minimal (#19086)
Exclude TP custom API from minimal.

---------

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2024-01-12 10:40:47 -08:00
Numfor Tiapo
3c0a6b505a
Update transformers module to 4.36 (#18993)
Update transformers module to fix security vulnerabilities in our
internal pipeline
2024-01-12 10:37:48 -08:00
zesongw
3eec1592bd
[WebNN EP] Update WebNN unit test list (#19103)
Update WebNN test list in suite-test-list.jsonc so all test cases are
passed behind WebNN CPU backend on Chrome Stable (Although some cases
may fall back to CPU EP).
Enable int64 support for WebNN in unit tests.
2024-01-12 10:22:38 -08:00