Commit graph

1394 commits

Author SHA1 Message Date
Yifan Li
e2c214d81f
[TensorRT EP] TRT 8.6 minor version update (#16475)
### Description
* Minor version update: TRT 8.6.0.12->8.6.1.6
  * CI pipeline ymls/dockerfiles are updated
* cgmanifest.json/deps.txt/download-deps.yml are updated; Win trt
binaries uploaded to [win img
307029](https://aiinfra.visualstudio.com/AI%20Infra%20Management/_build/results?buildId=307029&view=results)
* Re-enable unit tests which were failed in 8.6.0 and re-gained support
in 8.6.1
2023-06-26 10:44:27 -07:00
Scott McKay
48eff09664
Fix file list for test of build with IO debug (#16474)
### Description
<!-- Describe your changes. -->
Update file list to adjust for recent changes to test infra. 


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-06-26 16:36:22 +10:00
Chen Fu
5c125b4366
Cfu revertamx (#16455)
### Description

This is to revert two PRs that aim at reducing AMX toolchain
requirements. Unfortunately we still have some pipeline issues.

https://github.com/microsoft/onnxruntime/pull/16390
https://github.com/microsoft/onnxruntime/pull/16086

### Motivation and Context

Looks like gcc link time optimization does not work very well with
inline assembly in the above PRs.
2023-06-23 09:20:23 -07:00
Baiju Meswani
10ba1e270c
Minimal Build for On-Device Training (#16326)
🛠️ __Changes in this pull request:__

This pull request introduces two significant changes to the project:

- Changing on device training checkpoint format: The current
implementation stores the on device training checkpoint as a sequence of
tensors in multiple files inside a checkpoint folder, which can be
inefficient in terms of storage and performance. In this PR, I have
modified the checkpoint format to utilize the flatbuffer table to save
the checkpoint to a single file, providing a more compact and efficient
representation. The changes around this are twofold:
- Add the checkpoint flatbuffer schema that will generate the necessary
checkpoint source files.
- Update the checkpoint saving and loading functionality to use the new
format.

- Adding support for onnxruntime minimal build: To support scenarios
where binary size is a constraint, I made changes to ensure that the
training build can work well with the minimal build.

🔍 __Open Issues:__
- In order to extract the optimizer type, the existing implementation
re-loaded the onnx optimizer model and parsed it. This is no longer
possible, since the model format can either be onnx or ort. One idea is
to do the same for ort format optimizer model. This needs some
investigation.
- Changes to the offline tooling to generate ort format training
artifacts.
- End-to-end training example showcasing the use of the minimal training
build.
- Add support for export model for inferencing in a minimal build.
2023-06-22 12:27:23 -07:00
RandySheriffH
6e29e185f3
Clean AzureEP logics (#16367)
Moving out AzureEP invokers out of core runtime.

---------

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-06-21 09:38:52 -07:00
Chi Lo
4e3cff60fd
CUDA graph support for TRT EP (#16081)
CUDA EP already supports [CUDA
graph](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#cuda-graphs),
also we observed some models can benefit from using CUDA graph with
`trtexec`. Therefore, this PR enables the CUDA graph support for TRT EP.

The implementation is based on
https://github.com/microsoft/onnxruntime/pull/9978 with the same
[constraints](https://github.com/microsoft/onnxruntime/pull/9978) as
below:

- Models with control-flow ops (i.e. If, Loop and Scan ops) are not
supported.
- Usage of CUDA Graphs is limited to models where-in all the model ops
(graph nodes) can be partitioned to the TRT EP.
- The input/output types of models need to be tensors.
- Shapes of inputs/outputs cannot change across inference calls.
- IObinding is required.
2023-06-21 09:36:45 -07:00
Yuhong Guo
48e6186b1a
Move tests from core/providers/cuda/test/* to test/providers/cuda/ and refactor CUDA UT (#16161)
### Description
<!-- Describe your changes. -->

1. Add a new test lib `onnxruntime_providers_cuda_ut` which is similar
to `onnxruntime_providers_cuda` but `onnxruntime_providers_cuda_ut` is
only built if `onnxruntime_BUILD_UNIT_TESTS` is set. We can call all
CUDA UTs through this ut lib without affecting production lib
`onnxruntime_providers_cuda`.
2. Move all test cases from `core/providers/cuda/test/` to
`test/providers/cuda/`. These test cases are built into lib
`onnxruntime_providers_cuda_ut` and run by `./onnxruntime_test_all
--gtest_filter="*CUDA_EP_Unittest*"`. Since the lib is only for test, we
can use gtest macros in the test cases. Previous implementation do not
support using gtest lib in the CUDA UT cases.
3. The cmake code in `cmake/onnxruntime_providers.cmake` is refactored a
bit. A new function `onnxruntime_add_object_library` is to build a
object target. The 2 libs `onnxruntime_providers_cuda_ut` &
`onnxruntime_providers_cuda` share most of the code, so the object files
can be used in both libs, which helps reduce build time. Another
function `config_cuda_provider_shared_module` is used to configure all 3
similar
targets(onnxruntime_providers_cuda_obj/onnxruntime_providers_cuda/onnxruntime_providers_cuda_ut).
4. Refactored the test to call `testing::InitGoogleTest` &
`RUN_ALL_TESTS` in `libonnxruntime_providers_cuda_ut.so`'s `TestAll`.
After this change, we can see all the cases running in
`CUDA_EP_Unittest.All`:

![image](https://github.com/microsoft/onnxruntime/assets/19584326/8ff80df6-060b-4ef0-90b7-657e68d3db87)




### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

After https://github.com/microsoft/onnxruntime/pull/13016, there are
still test files in test/providers/cuda/ that are not moved to
core/providers/cuda/test/ and the test cases are disabled. This PR helps
to clean the unfinished TODOs.

Even through onnxruntime_shared_lib_test covers some test for CUDA
provider. onnxruntime_shared_lib_test works like a coarse grain
end-to-end test for CUDA provider. If CUDA unittest can run cases for a
single component, this wound be helpful for CUDA developers.

---------

Co-authored-by: Yuhong Guo <yuhong.gyh@antgroup.com>
2023-06-20 14:54:55 -07:00
Prateek Chokse
12dffef768
added support for cmake "find_package" (#8919)
**Description**: 
Adds support for cmake find_package.

**Motivation and Context**
As mentioned in issue #7150 onnxruntime doesn't have support for CMake
find_package, this PR adds that and also adds the CMake package version
file. Now anyone can link onnxruntime like this:
```cmake
find_package(onnxruntime)
add_executable(test Source.cpp)
target_link_libraries(test PRIVATE onnxruntime::onnxruntime)
```
this also simplifies #3124
2023-06-19 22:20:31 -07:00
Dipanjan Sengupta
35fa6af428
Fix for the build break in AMX feature on Mac OS. (#16390)
### Description
Fixing the build break issue in Apple pipeline due to AMX flag removal.
2023-06-16 21:00:41 -07:00
Scott McKay
8fdfd20191
Separate out operator vs model testing. (#16228)
### Description
<!-- Describe your changes. -->
Split up OpTester to separate out operator vs model testing. This led to
a lot of other cleanups/refactoring.

- create BaseTester class and derived OpTester/ModelTester classes to
limit APIs to what is applicable for each test type
  - e.g. adding an attribute isn't relevant to a model test
- cleanup structure
- don't expose member variables either directly or via public methods
returning them
  - split out checkers so they can be easily re-used
- refactor so there's one public Check method for comparing two OrtValue
instances containing any data type
  - refactor the GradientOpTester usage
- it required a lot of OpTester internals to be exposed and no other
tests needed this
- it also returned Status through various parts which prevented the
usage of the google test macros which provide better output. change to
return void and use the macros.
- fix some other minor issues
  - update some cmake files so all the source files are included
  - remove some low value helpers (FetchTensor and GetShapeVector)
- remove some outdated code to allow unreleased opset versions from when
onnx opset 15 wasn't released
  - move files from test/util/include/test to test/util/include
- doesn't seem to be any reason for the additional subdirectory given
they're not files use to test the code in test/util
    - files were moved with no changes
    
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Cleanup test infrastructure.

---------

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-06-17 12:58:57 +10:00
saurabh
a6ce7b339f
Enable model subgraph execution in OVEP and setting the OpenVINO dll's to the path from the OpenVINO pypi packge in OVEP and fix OVEP windows io buffer sample (#16147)
### Description
This PR enables execution of subgraphs in OVEP and currently, when OVEP
developers install the onnxruntime-openvino package on windows from
pypi, they would have to additionally download OpenVINO windows binaries
and run the setupvars.bat script which sets the environment PATH to
locate the OV dll's. Also this PR fixes issues of OVEP windows io buffer
sample.



### Motivation and Context
Fix: We want to make the user experience easy for OVEP Python developers
on windows platform.
This fix, introduces a function add_openvino_libs_to_path at the
location tools/python/util/add_openvino_win_libs.py.
The above function, can be called by OVEP python users in the
application code and that takes care of setting
the OpenVINO dll's to the path from the OpenVINO pypi packge (openvino)
which was installed.
This change also makes sure that add_openvino_libs_to_path() function is
added to onnxruntime python package
only when it is build for OpenVINO Execution Provider for ONNXRuntime
and not for default ORT python package builds.

New user experience for Python OVEP developers on windows platform:
step 1: pip install onnxruntime-openvino
step 2: pip install openvino
step 3: <Add these 2 lines in the application code>
import onnxruntime.tools.add_openvino_win_libs as utils
utils.add_openvino_libs_to_path()

---------

Signed-off-by: MaajidKhan <n.maajid.khan@intel.com>
Co-authored-by: MaajidKhan <n.maajid.khan@intel.com>
Co-authored-by: Suryaprakash Shanmugam <suryaprakash.shanmugam@intel.com>
2023-06-16 19:47:09 -07:00
Silvio Traversaro
4915191e63
Fix build of Python wheel on Windows with single-config generator (#16337)
### Description

Before this PR, the CMake code assumed that when on Windows a
multiple-config CMake generator was used, while on non-Windows there was
the assumption of a single-config CMake generator. After this PR this
information is obtained from the
[`GENERATOR_IS_MULTI_CONFIG`](https://cmake.org/cmake/help/latest/prop_gbl/GENERATOR_IS_MULTI_CONFIG.html)
global CMake propery.



### Motivation and Context

I discovered this problem when building with Ninja generator on Windows,
but I guess this should fix problems also on non-Windows platforms when
using a multiple-config generator (such as Xcode on macOS or "Ninja
Multi-Config" on all platforms).

See
https://cmake.org/cmake/help/latest/prop_gbl/GENERATOR_IS_MULTI_CONFIG.html
for more info.
2023-06-16 09:17:49 -07:00
Dipanjan Sengupta
681a0d084d
Removing AMX build flag (#16086)
### Description
1. Replacing AMX intrinsics with machine code macro instructions in
QGEMM kernel.
2. Removing AMX build flags for GCC in cmake file.



### Motivation and Context
The additional AMX flag in cmake adds an extra layer of dependency on
GCC version to use the feature.These changes should allow the usage of
the AMX feature with just the CPU ID check.
2023-06-15 11:22:59 -07:00
satyajandhyala
889f80082f
[js/web] Added Reduce operators support (#16122)
### Description
Added support for ReduceL1, ReduceL2, ReduceMean, ReduceMin, ReduceMax,
ReduceSum, ReduceLogSum, ReduceLogSumExp, ReduceProd and
ReduceSquareSum.



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

---------

Co-authored-by: Satya Jandhyala <sajandhy@microsoft.com>
Co-authored-by: guschmue <guschmue@microsoft.com>
2023-06-12 07:46:27 -07:00
Vrajang Parikh
67f4a4fd16
Objective-C binding for ORT training (#16127)
### Description
Implement Objective-C binding for `ORTCheckPoint`. Additionally, 
- Modify `onnxruntime_objectivec.cmake` to only include training header
and sources when training flag is enabled
- Enable objective-c binding for `orttraining-mac-ci-pipeline`

### Motivation and Context
This PR is part of implementing Objective-C bindings for training API.
It implements objective-c binding for ORTCheckPoint class. The
objective-C API closely resembles the C++ API.

**Note**: The test for saving checkpoint is skipped as it requires use
of training session. It will be added when the objective-c binding for
`ORTTrainingSession` is added.
2023-06-07 14:01:30 -07:00
Edward Chen
1261d0b8ba
Fix some build issues on MacOS with Xcode 14.3. (#15878)
- Fix flatbuffers flatc warning, unused-but-set-variable.
- Address `-Wshorten-64-to-32` warnings (fix in our code, allow in dependencies' code).
- Update CI builds to use Xcode 14.3.
- Update minimum iOS version to 12.0.
- Update Mac hosted agents to MacOS 13 where possible.
2023-06-07 12:07:11 -07:00
Wanming Lin
a8c2f24ae0
[WebNN EP] Merge support for segment anything into main branch (#16208)
We implemented a number of new ops and data types to support running
segment anything model on Chromium WebNN DML backend (POC) in a forked
branch https://github.com/honry/onnxruntime/tree/stable-diffusion

In this PR, we migrate the changes in the forked branch to main branch,
includes:
 - 22 new ops
- New tensor data types: bool, int32, uint32, uint64, int64, float16 (As
JavaScript hasn't shipped Float16Array, we use Uint16Array as a
workaound)
 - Handle empty input tensors and duplicated outputs
 - Fixed some nits
2023-06-07 09:56:37 -07:00
Changming Sun
7686193c40
Fix DNNL build (#16201) 2023-06-02 09:46:03 +08:00
Xavier Dupré
e726151b5c
Introduce float 8 types (#14731)
### Description
The PR implements FloatE4M3FN, FloatE5M2, FloatE4MEFNUZ, FloatE5M2FNUZ
as described in PR https://github.com/onnx/onnx/pull/4805. It uses CUDA
API to cast float/half to float8 if CUDA>=11.8, a custom implementation
if CUDA<11.8.

* It implements, Cast, QuantizeLinear, DequantizeLinear for all types on
CPU, only for types FloatE4M3FN, FloatE5M2 on CUDA.
* It extends the supported types for control flow operator, Shape,
Reshape, Identity, If, Loop, Scan, Reshape
* It implements Equal(19).
* Cast, QuantizeLinear, DequantizeLinear operators now support a
parameter `saturate` only valid for float 8 types. It is true by
default. In that case, any value out of range is converted into the
maximum float 8 value. If false, it is infinite.
* QuantizeLinear, DequantizeLinear now supports multiple scales on CUDA
(and ROCm by extension), scale = 1D tensor with one scale per channel

### Motivation and Context
Supports latest onnx version.

Fixes
[AB#15395](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/15395)

---------

Co-authored-by: Xavier Dupre <xadupre@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
Co-authored-by: Randy Shuai <rashuai@microsoft.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: Scott McKay <Scott.McKay@microsoft.com>
2023-05-30 13:25:58 -07:00
神楽坂帕琪
abd94b65b7
eigen.cmake use url info from deps.txt (#16129)
### Description

`eigen.cmake` use url info provided by deps.txt instead of using raw
url.
2023-05-30 11:07:20 -07:00
Yuhong Guo
04a8f50674
New configuration to limit the arena extension (#15983)
Add a configuration `max_power_of_two_extend_bytes ` to limit the arena extension size.


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
In our real scenario, we observe that if the model is big enough the
BfcArena will extend uncontrollable.
As showed by the following figures, if a model uses more than 16GB
memory, the BfcArena will totally apply for 32GB memory according to the
`kNextPowerOfTwo` strategy. With the new strategy, the extension is
limited. The default maximum extension size is 1GB.

#### Without the new configuration
After loading the model, ORT uses 32G GPU memory.

![image](https://github.com/microsoft/onnxruntime/assets/19584326/42b93c66-b957-4f20-a13b-d34cb390afff)

#### With the new configuration
After loading the model, ORT uses 23G GPU memory.

![image](https://github.com/microsoft/onnxruntime/assets/19584326/5abffeff-9ca3-4187-a262-37fd2764fe1b)

Co-authored-by: Yuhong Guo <yuhong.gyh@antgroup.com>
2023-05-25 02:19:07 -07:00
Sumit Agarwal
70d2dc8209
[DML EP] Fix issue with --dml_path build option (#15972)
### Description
DML_PACKAGE_DIR cmake variable is not getting set properly when dml_path
build options is used.


### Motivation and Context
- Why is this change required? What problem does it solve?
It is required for DML Perf dashboard.
<!--- If it fixes an open issue, please link to the issue here. -->
2023-05-24 19:20:40 -05:00
yf711
105f5f0f20
Avoid trt deprecated api warnings shown as errors during ORT-TRT build (#16035)
### Description
Avoid trt deprecated api warnings shown as errors when building
onnxruntime_test_all
This issue is only visible when installing trt via binaries, rather than
deb/rpm pkg (CI pipelines)


The change is similar to existing set_property for
onnxruntime_providers_tensorrt

89ea503024/cmake/onnxruntime_providers.cmake (L421)

### Motivation and Context

onnxruntime/test/unittest_main/[test_main.cc](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/test/unittest_main/test_main.cc#L32)
includes nvinfer.h, which includes deprecated trt apis and and generates
warnings.
When building onnxruntime_test_all, it will show warnings as errors and
block the build.

### Doubts
Although this issue is visible on trt tar binaries but not on trt
deb/rpm pkgs,
Their file size&hash are the same (creation time vary), regarding
headers/libs installing in different ways.
| tarBin | pkg |
| ------------------------------------------------------------ |
------------------------------------------------------------ |
| 997284784 Apr 26 15:15 libnvinfer_builder_resource.so.8.6.1 |
997284784 Apr 26 22:21 libnvinfer_builder_resource.so.8.6.1 |
| 235369632 Apr 26 15:14 libnvinfer.so.8.6.1 | 235369632 Apr 26 22:21
libnvinfer.so.8.6.1 |
2023-05-24 13:19:27 -07:00
PeixuanZuo
2fddc65c8c
[ROCm] add hipblaslt into GemmFastGelu TunableOp (#15945)
add hipblaslt into GemmFastGelu TunableOp.
2023-05-23 11:07:09 +08:00
RandySheriffH
d35361bf9d
Fix python pipeline for AzureEP without using root (#16023)
Fix python pipeline for AzureEP without using root, this is for 1.15.

---------

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-05-22 16:38:47 -07:00
Changming Sun
0204594f90
Cleanup WASM cmake code (#15996)
### Description
Remove the "onnxruntime_BUILD_WEBASSEMBLY" cmake option. Use `if
(CMAKE_SYSTEM_NAME STREQUAL "Emscripten")` instead. It makes some code
look more nature.
For example,

```cmake
if (CMAKE_SYSTEM_NAME STREQUAL "iOS" OR CMAKE_SYSTEM_NAME STREQUAL "Android" OR onnxruntime_BUILD_WEBASSEMBLY)
```
becomes
```cmake
if (CMAKE_SYSTEM_NAME STREQUAL "iOS" OR CMAKE_SYSTEM_NAME STREQUAL "Android" OR CMAKE_SYSTEM_NAME STREQUAL "Emscripten")
```
2023-05-20 18:07:39 -07:00
RandySheriffH
4dfb89b3ad
Implement mutex-free spin lock for task queue (#14834)
Implemented "lock-free" spinlock to save CPU usage on context switching.
The change has been tested on queene service of Ads team, the lock-free
version of ort (40 threads) saves CPU usage on gen8 (128 logical
processors on 8 numa nodes) windows by nearly half, from 65% to 35%.

For 32 cores, the curve is flat:

Anubis, 32 vCPU, windows, hugging face models,
95 percentile E2E latency in ms:

model | mutex(ms) | mutex-free
--- | --- | ---
 alvert_base_v2 | 34.21 | 34.09
 bert_large_uncased | 116.27| 117.84
 bart_base | 72.06 | 71.99
 distilgpt2 | 25.43 | 25.02
 vit_base_patch16_224 | 37.33 | 37.76

Anubis, 32 vCPU win, Linux, 1st party models,
95 percentile E2E latency in ms:

model | mutex(ms) | mutex-free
--- | --- | ---
deepthink_v2 | 24.35 | 22.95
bing_feeds |  36.96 | 36.48
deep_writes |  14.46 | 14.32
keypoints |  9.34 | 7.69
model11 |  1.71 | 1.66
model12 |  1.82 | 1.44
model2 |  4.21 | 3.95
model6 |  1.08 | 1.05
agiencoder |  0.99 | 0.93
geminet_transformer |  5.32 | 5.24

---------

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-05-19 10:12:10 -07:00
Patrice Vignola
310b22aa0c
[DML EP] Update DirectML version to 1.12.0 (#16011) 2023-05-18 19:37:12 -07:00
Ashwini Khade
0c815a95b7
android package fix (#15999)
### Description
This PR adds the training headers to the training android packages.


### Motivation and Context
Training headers need to be added as part of the training android
packages, however because of the typo in the cmake these headers were
not being added. This PR fixes the issue.
2023-05-18 09:21:03 -07:00
Changming Sun
842b1a3472
Revert a change in #15797: restore the correct version of emsdk (#15995)
### Description
Revert a change in #15797: restore the correct version of emsdk


### Motivation and Context
Without change, when you build it on Windows you will see:
```
2023-05-17 19:41:30,093 build [INFO] - Activating emsdk...
2023-05-17 19:41:30,093 util.run [INFO] - Running subprocess in 'C:\src\onnxruntime2\cmake\external\emsdk'
  'C:\src\onnxruntime2\cmake\external\emsdk\emsdk.bat' activate 3.1.37
error: tool or SDK not found: '3.1.37'
```
2023-05-18 07:41:38 -07:00
kailums
f62f722c70
integrate triton into ort (#15862)
### Description
In some scenarios, the triton written kernels are more performant than
CK or other handwritten kernels, so we implement a framework that
onnxruntime can use these triton written kernels.

This PR is to integrate triton into ort, so that ort can use kernels
that written and compiled by triton.

The main change focus on two part:
1. a build part to compile triton written kernel and combine these
kernels into libonnxruntime_providers_rocm.so
2. a loader and launcher in c++, for loading and launch triton written
kernels.

#### Build

To compile triton written kernel, add a script
`tools/ci_build/compile_triton.py`. This script will dynamic load all
kernel files, compile them, and generate `triton_kernel_infos.a` and
`triton_kernel_infos.h`.

`triton_kernel_infos.a` contains all compiled kernel instructions, this
file will be combined into libonnxruntime_providers_rocm.so, using
--whole-archive flag.

`triton_kernel_infos.h` defines a const array that contains all the
metadata for each compiled kernel. These metadata will be used for load
and launch. So this header file is included by 'triton_kernel.cu' which
defines load and launch functions.

Add a build flag in build.py and CMakeList.txt, when building rocm
provider, it will call triton_kernel build command, and generate all
necessary files.

#### C++ Load and Launch

On c++ part, we implement load and launch functions in triton_kernel.cu
and triton_kernel.h.

These two files located in `providers/cuda`, and when compiling rocm,
they will be hipified. so this part supports both cuda and rocm. But
currently we only call triton kernel in rocm.

We also implement a softmax triton op for example. Because there will
generate many kernels for different input shape of softmax, we use
TunableOp to select the best one.


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-05-17 09:35:28 +08:00
cloudhan
dc383ed4ce
Basic CSharp packaging support for ROCm EP (#15535)
This PR mainly fixes building errors when trying to build nupkg for ROCm EP.
It also slighly improve the packaging logic so that devlopers can
produce the nupkg on linux natively.
2023-05-16 07:27:38 +08:00
Dmitri Smirnov
896a963492
Adust GetVersionString() GetBuildInfoString() signatures and move them to OrtApi (#15921)
### Description

This PR partially reverts changes introduced in
https://github.com/microsoft/onnxruntime/pull/15643

We make two API return std::string always in UTF-8.

We also move the entry points from OrtApiBase to OrtApi to make them
versioned.

### Motivation and Context

`GetVersionString` always returns x.y.z numbers that are not subject to
internationalization.
`GetBuildInfoString` can hold international chars, but UTF-8 should be
fine to contain those.
We prefix them with u8"" in case the compiler default charset is not
UTF-8.
Furthermore, creating platform dependent APIs is discouraged.
`ORTCHAR_T` is platform dependent and was created for paths only.
On non-unix platforms would still produce `std::string` that can only
contain UTF-8

The API was introduced after the latest release, and can still be
adjusted.
2023-05-13 13:45:07 -07:00
RandySheriffH
7c4e8267e7
Implement openAI endpoint invoker for nuget (#15797)
Implement openAI audio endpoint, and enable nuget packaging.

---------

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-05-11 22:04:02 -07:00
Jian Chen
1a73d61829
Update eigen to 3.4 and remove the eigen from git submodule (#15875)
### Description
Update eigen to 3.4 and remove the eigen from git submodule

### Motivation and Context
We need to have eigen 3.4 for c++20
2023-05-11 11:56:59 -07:00
Changming Sun
7c58d013aa
Remove Ubuntu 18.04 usages (#15781)
### Description
Remove Ubuntu 18.04 usages because it will be EOL this month.

### Motivation and Context
2023-05-11 11:44:00 -07:00
sdegrande
cf062dbdb1
FlatBuffers fails to compile with gcc13. (#15787)
When building the FlatBuffers dependencies, gcc13 emits a
stringop-overflow warning. All warnings being turned into errors, that
fails the compilation of FlatBuffers, and as a consequence also fails
the build of onnxruntime.

This commit adds the application of a patch to FlatBuffers's
CMakeList.txt, to add -Wno-error=stringop-overflow to the
CMAKE_CXX_FLAGS.
2023-05-11 11:20:19 -07:00
liqun Fu
ac9ae9f7c5
update onnx release 1.14 for docker files (#15680)
### Description
this is for ort 1.15 release to work with onnx 1.14
It shall be merged after onnx 1.14 release and before ort 1.15 release.


### Motivation and Context

---------

Signed-off-by: Liqun Fu <liqfu@microsoft.com>
2023-05-10 13:15:56 -07:00
Sumit Agarwal
b473e3f3c6
[DML EP] Update DirectML version to 1.11.0 (#15858)
### Description
- Update DML version to 1.11.0
- Disable Gemm+Softmax fusion



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-05-09 12:48:15 -07:00
Wanming Lin
00b1e79e04
Support WebNN EP (#15698)
**Description**: 

This PR intends to enable WebNN EP in ONNX Runtime Web. It translates
the ONNX nodes by [WebNN
API](https://webmachinelearning.github.io/webnn/), which is implemented
in C++ and uses Emscripten [Embind
API](https://emscripten.org/docs/porting/connecting_cpp_and_javascript/embind.html#).
Temporarily using preferred layout **NHWC** for WebNN graph partitions
since the restriction in WebNN XNNPack backend implementation and the
ongoing
[discussion](https://github.com/webmachinelearning/webnn/issues/324) in
WebNN spec that whether WebNN should support both 'NHWC' and 'NCHW'
layouts. No WebNN native EP, only for Web.

**Motivation and Context**:
Allow ONNXRuntime Web developers to access WebNN API to benefit from
hardware acceleration.

**WebNN API Implementation Status in Chromium**:
- Tracked in Chromium issue:
[#1273291](https://bugs.chromium.org/p/chromium/issues/detail?id=1273291)
- **CPU device**: based on XNNPack backend, and had been available on
Chrome Canary M112 behind "#enable-experimental-web-platform-features"
flag for Windows and Linux platforms. Further implementation for more
ops is ongoing.
- **GPU device**: based on DML, implementation is ongoing.

**Open**:
- GitHub CI: WebNN currently is only available on Chrome Canary/Dev with
XNNPack backend for Linux and Windows. This is an open to reviewers to
help identify which GitHub CI should involved the WebNN EP and guide me
to enable it. Thanks!
2023-05-08 21:25:10 -07:00
Yulong Wang
0457fd0b40
upgrade emsdk to 3.1.37 (#15817)
### Description
upgrade emsdk to 3.1.37

WIP branch to debug the mystery memory issue in web assembly
multi-thread build.
2023-05-08 16:49:47 -07:00
Guenther Schmuelling
5a43828b3d
update ort extensions to 94142d8391c9791ec71c38336436319a2d4ac7a0 (#15688)
needed to get tokenizers/decode for whisper

---------

Co-authored-by: Shalva Mist <shalvamist@microsoft.com>
2023-05-05 09:48:07 -07:00
cloudhan
412d05a1d2
[ROCm] Update cmake (#15807)
Followup of #15775
2023-05-04 11:20:56 -07:00
Yulong Wang
33d1372729
[wasm] revert emsdk to v3.1.19 (#15793)
### Description
latest emsdk generated multi-thread version sometimes crash with unknown
reason ( error: memory access out of bounds ).

we don't want to break existing ort-web users, so revert emsdk back to
3.1.19 (same to what ort v1.14.0 uses)
2023-05-04 01:15:01 -07:00
Baiju Meswani
ba7b83ff3c
Remove onnxruntime_PYBIND_EXPORT_OPSCHEMA definition from onnxruntime (#15776) 2023-05-03 13:08:35 -07:00
Changming Sun
41c082fdde
Add a Github workflow for Prefast (#15763) 2023-05-03 11:42:51 -07:00
Changming Sun
328cabb194
Download protoc from Github Release instead of Nuget (#15731)
### Description
Download protoc from Github Release instead of Nuget to avoid having
dependency on nuget.exe on Linux

### Motivation and Context
To avoid having dependency on nuget.exe on Linux. Many users' build
environment do not have nuget or dotnet.
2023-05-02 12:18:59 -07:00
Changming Sun
5352f6d9b0
Make "--cuda_version" build arg optional (#15758)
### Description
This change will allow us building CUDA EP without installing CUDA SDK
on Windows.

### Motivation and Context
Nvidia's CUDA installer comes with a VS extension. In the past, we
require installing the extension. It is a little bit inconvenient since:
1. Visual Studio must be installed before CUDA SDK. CUDA's installer
will not install the extension if your machine doesn't have Visual
Studio.
2. We need to install CUDA SDK on our build machines, instead of just
downloading it and using it.

After this change, we will not need to install CUDA SDK on our build
machines. So it will be easier to add a support for a different CUDA
version.

Also, fix two PreFast warnings.
2023-05-01 18:00:47 -07:00
Ashwini Khade
0ffae8073b
Creating Nuget and Android packages for Training (#15712)
### Description
This PR creates Nuget and Android for Training. 


### Motivation and Context
These packages are intended to be released in ORT 1.15 to enable
On-Device Training Scenarios.

## Packaging Story for Learning On The Edge Release
### Nuget Packages:
1. New Native package -> **Microsoft.ML.OnnxRuntime.Training** (Native
package will contain binaries for: win-x86, win-x64, win-arm, win-arm64,
linux-x64, linux-arm64, android)
2. C# bindings will be added to existing package ->
**Microsoft.ML.OnnxRuntime.Managed**

### Android Package published to Maven:
1. New package for training (full build) ->
**onnxruntime-training-android-full-aar**

### Python Package published to PyPi:
1. Python bindings and offline tooling will be added to the existing ort
training package -> **onnxruntime-training**
2023-05-01 12:59:56 -07:00
Sumit Agarwal
4c4f688a93
[DML EP] Fix dml_external_project (#15656)
### Description
While building ORT for DML EP with `dml_EXTERNAL_PROJECT` flag, 2
variables (`DML_SHARED_LIB`, `DML_PACKAGE_DIR`) value is not set
properly.



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-05-01 12:02:56 -07:00