Commit graph

45 commits

Author SHA1 Message Date
Dmitri Smirnov
d9de054eb5
Multi-Lora support (#22046)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-09-30 15:59:07 -07:00
Julius Tischbein
1391354265
Adding CUDNN Frontend and use for CUDA NN Convolution (#19470)
### Description
Added CUDNN Frontend and used it for NHWC convolutions, and optionally
fuse activation.

#### Backward compatible 
- For model existed with FusedConv, model can still run. 
- If ORT is built with cuDNN 8, cuDNN frontend will not be built into
binary. Old kernels (using cudnn backend APIs) are used.

#### Major Changes
- For cuDNN 9, we will enable cudnn frontend to fuse convolution and
bias when a provider option `fuse_conv_bias=1`.
- Remove the fusion of FusedConv from graph transformer for CUDA
provider, so there will not be FusedConv be added to graph for CUDA EP
in the future.
- Update cmake files regarding to cudnn settings. The search order of
CUDNN installation in build are like the following:
  * environment variable `CUDNN_PATH`
* `onnxruntime_CUDNN_HOME` cmake extra defines. If a build starts from
build.py/build.sh, user can pass it through `--cudnn_home` parameter, or
by environment variable `CUDNN_HOME` if `--cudnn_home` not used.
* cudnn python package installation directory like
python3.xx/site-packages/nvidia/cudnn
  * CUDA installation path

#### Potential Issues

- If ORT is built with cuDNN 8, FusedConv fusion is no longer done
automatically, so some model might have performance regression. If user
still wants FusedConv operator for performance reason, they can still
have multiple ways to walkaround: like use older version of onnxruntime;
or use older version of ORT to save optimized onnx, then run with latest
version of ORT. We believe that majority users have moved to cudnn 9
when 1.20 release (since the default in ORT and PyTorch is cudnn 9 for 3
months when 1.20 release), so the impact is small.
- cuDNN graph uses TF32 by default, and user cannot disable TF32 through
the use_tf32 cuda provider option. If user encounters accuracy issue
(like in testing), user has to set environment variable
`NVIDIA_TF32_OVERRIDE=0` to disable TF32. Need update the document of
use_tf32 later.

#### Follow ups
This is one of PRs that target to enable NHWC convolution in CUDA EP by
default if device supports it. There are other changes will follow up to
make it possible.
(1) Enable `prefer_nhwc` by default for device with sm >= 70. 
(2) Change `fuse_conv_bias=1` by default after more testing.
(3) Add other NHWC operators (like Resize or UpSample).

### Motivation and Context

The new CUDNN Frontend library provides the functionality to fuse
operations and provides new heuristics for kernel selection. Here it
fuses the convolution with the pointwise bias operation. On the [NVIDIA
ResNet50](https://pytorch.org/hub/nvidia_deeplearningexamples_resnet50/)
we get a performance boost from 49.1144 ms to 42.4643 ms per inference
on a 2560x1440 input (`onnxruntime_perf_test -e cuda -I -q -r 100-d 1 -i
'prefer_nhwc|1' resnet50.onnx`).

---------

Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
Co-authored-by: Maximilian Mueller <maximilianm@nvidia.com>
2024-08-02 15:16:42 -07:00
Prateek Chokse
12dffef768
added support for cmake "find_package" (#8919)
**Description**: 
Adds support for cmake find_package.

**Motivation and Context**
As mentioned in issue #7150 onnxruntime doesn't have support for CMake
find_package, this PR adds that and also adds the CMake package version
file. Now anyone can link onnxruntime like this:
```cmake
find_package(onnxruntime)
add_executable(test Source.cpp)
target_link_libraries(test PRIVATE onnxruntime::onnxruntime)
```
this also simplifies #3124
2023-06-19 22:20:31 -07:00
Ashwini Khade
ea7bbd667d
fix headers for training apis (#14350)
### Description
Minor refactor PR for fixing header placement for training apis
2023-01-19 10:26:53 -08:00
Ashwini Khade
d92c663f28
Create dedicated build for training api (#14136)
### Description
Enable creating dedicated build for on device training. With this PR we
can build a lean binary for on device training using flag
--enable_training_apis. This binary includes only the essentials like
training ops, optimizers etc and NOT features like Aten fallback,
strided tensors, gradient builders etc . This binary also removes all
the deprecated components like training::TrainingSession and OrtTrainer
etc

### Motivation and Context
This enables our partners to create a lean binary for on device
training.
2023-01-10 20:58:04 -08:00
Ashwini Khade
68b5b2d7d3
Refactor training build options (#13964)
### Description
1. Renames all references of on device training to training apis. This
is to keep the naming general. Nothing really prevents us from using the
same apis on servers\non-edge devices.
2. Update ENABLE_TRAINING option: With this PR when this option is
enabled, training apis and torch interop is also enabled.
3. Refactoring for onnxruntime_ENABLE_TRAINING_TORCH_INTEROP option: 
   -  Removed user facing option
- Setting onnxruntime_ENABLE_TRAINING_TORCH_INTEROP to ON when
onnxruntime_ENABLE_TRAINING is ON as we always build with torch interop.

Once this PR is merged when --enable_training is selected we will do a
"FULL Build" for training (with all the training entry points and
features).
Training entry points include:
1. ORTModule
2. Training APIs

Features include:
1. ATen Fallback
2. All Training OPs includes communication and collectives
3. Strided Tensor Support
4. Python Op (torch interop)
5. ONNXBlock (Front end tools for training artifacts prep when using
trianing apis)

### Motivation and Context
Intention is to simply the options for building training enabled builds.
This is part of the larger work item to create dedicated build for
learning on the edge scenarios with just training apis enabled.
2023-01-03 13:28:16 -08:00
Changming Sun
05137e6ec4
Use target name for flatbuffers (#13991)
### Description

Use target name for flatbuffers.
Add version range for flatbuffers. It is similar to #13870 
### Motivation and Context
To fix a build error:
```
CMake Error at onnxruntime_graph.cmake:88 (add_dependencies):
  The dependency target "flatbuffers" of target "onnxruntime_graph" does not
  exist.
Call Stack (most recent call first):
  CMakeLists.txt:1490 (include)
```

It happens when flatbuffers library is already installed. For example,
on Ubuntu people may get it from apt-get. But, the one provided by
Ubuntu 20.04 is not compatible with our code. The one in Ubuntu 22.04
works fine.
2022-12-20 11:44:02 -08:00
Ashwini Khade
6090d8cd6e
Fix usage of enable_training_ops and reduce ifdef complexity for training builds (#13888)
### Description
Fix usage of enable_training_ops and reduce ifdef complexity for
training builds.




### Motivation and Context
This is the second refactoring PR towards creating a dedicated build for
on device training. This PR aims to reduce some complexity. We can set
ENABLE_TRAINING_OPS in cmake when either ENABLE_TRAINING or
ENABLE_TRAINING_ON_DEVICE is selected, this way we dont have to use if
defined(ENABLE_TRAINING) || defined(ENABLE_TRAINING_ON_DEVICE )
everywhere in the code.

- If it fixes an open issue, please link to the issue here. -->
2022-12-14 08:32:46 -08:00
cloudhan
f79d38181b
Fix hipify to avoid nccl_service.h: No such file or directory (#13852)
Fix various flaky build error due to onnxruntime_session missing dependencies on hipify generated files.
2022-12-07 09:10:37 +08:00
Changming Sun
04900f96c1
Improve dependency management (#13523)
## Description
1. Convert some git submodules to cmake external projects
2. Update nsync from
[1.23.0](https://github.com/google/nsync/releases/tag/1.23.0) to
[1.25.0](https://github.com/google/nsync/releases/tag/1.25.0)
3. Update re2 from 2021-06-01 to 2022-06-01
4. Update wil from an old commit to 1.0.220914.1 tag
5. Update gtest to a newer commit so that it can optionally leverage
absl/re2 for parsing command line flags.

The following git submodules are deleted:

1. FP16
2. safeint
3. XNNPACK
4. cxxopts
5. dlpack
7. flatbuffers
8. googlebenchmark
9. json
10. mimalloc
11. mp11
12. pthreadpool

More will come.

## Motivation and Context
There are 3 ways of integrating 3rd party C/C++ libraries into ONNX
Runtime:
1. Install them to a system location, then use cmake's find_package
module to locate them.
2.  Use git submodules 
6.  Use cmake's external projects(externalproject_add). 

At first when this project was just started, we considered both option 2
and option 3. We preferred option 2 because:

1. It's easier to handle authentication. At first this project was not
open source, and it had some other non-public dependencies. If we use
git submodule, ADO will handle authentication smoothly. Otherwise we
need to manually pass tokens around and be very careful on not exposing
them in build logs.
2. At that time, cmake fetched dependencies after "cmake" finished
generating vcprojects/makefiles. So it was very difficult to make cflags
consistent. Since cmake 3.11, it has a new command: FetchContent, which
fetches dependencies when it generates vcprojects/makefiles just before
add_subdirectories, so the parent project's variables/settings can be
easily passed to the child projects.

And when the project went on,  we had some new concerns:
1. As we started to have more and more EPs and build configs, the number
of submodules grew quickly. For more developers, most ORT submodules are
not relevant to them. They shouldn't need to download all of them.
2. It is impossible to let two different build configs use two different
versions of the same dependency. For example, right now we have protobuf
3.18.3 in the submodules. Then every EP must use the same version.
Whenever we have a need to upgrade protobuf, we need to coordinate
across the whole team and many external developers. I can't manage it
anymore.
3. Some projects want to manage the dependencies in a different way,
either because of their preference or because of compliance
requirements. For example, some Microsoft teams want to use vcpkg, but
we don't want to force every user of onnxruntime using vcpkg.
7. Someone wants to dynamically link to protobuf, but our build script
only does static link.
8. Hard to handle security vulnerabilities. For example, whenever
protobuf has a security patch, we have a lot of things to do. But if we
allowed people to build ORT with a different version of protobuf without
changing ORT"s source code, the customer who build ORT from source will
be able to act on such things in a quicker way. They will not need to
wait ORT having a patch release.
9. Every time we do a release, github will also publish a source file
zip file and a source file tarball for us. But they are not usable,
because they miss submodules.
 
### New features

After this change, users will be able to:
1. Build the dependencies in the way they want, then install them to
somewhere(for example, /usr or a temp folder).
2. Or download the dependencies by using cmake commands from these
dependencies official website
3. Similar to the above, but use your private mirrors to migrate supply
chain risks.
4. Use different versions of the dependencies, as long as our source
code is compatible with them. For example, you may use you can't use
protobuf 3.20.x as they need code changes in ONNX Runtime.
6.  Only download the things the current build needs.
10. Avoid building external dependencies again and again in every build.

### Breaking change
The onnxruntime_PREFER_SYSTEM_LIB build option is removed you could think from now 
it is default ON. If you don't like the new behavior, you can set FETCHCONTENT_TRY_FIND_PACKAGE_MODE to NEVER.
Besides, for who relied on the onnxruntime_PREFER_SYSTEM_LIB build
option, please be aware that this PR will change find_package calls from
Module mode to Config mode. For example, in the past if you have
installed protobuf from apt-get from ubuntu 20.04's official repo,
find_package can find it and use it. But after this PR, it won't. This
is because that protobuf version provided by Ubuntu 20.04 is too old to
support the "config mode". It can be resolved by getting a newer version
of protobuf from somewhere.
2022-12-01 09:51:59 -08:00
cloudhan
2de883c592
Update CK and fix performance issue on dev machine (#13531)
1. Update CK to its latest develop branch
2. `-mllvm -amdgpu-early-inline-all=true` is critical to CK's
performance, ensure it is properly configured.
- The flags are propagated from target `hip-lang::device`'s
`INTERFACE_COMPILE_OPTIONS`, we must not manually add the flags.
- Instead, we must ensure this target is properly configured by checking
_CMAKE_HIP_DEVICE_RUNTIME_TARGET is set.

TL,DR

`hip-lang::device` sometime will be not be properly configured if our
`CMAKE_PREFIX_PATH` is not configured carefully. In the CI docker, the
configuration is in good state, but on dev machine it is not, which then
silently result poor performance for kernels. We fixed it in this PR and
add a guard to avoid unsuccessful future editing and to prevent
convoluted debugging process.

`_CMAKE_HIP_DEVICE_RUNTIME_TARGET ` is shared in
`/opt/rocm/lib/cmake/hip-lang/hip-lang-config.cmake` and it is internal
to
[CMake](https://gitlab.kitware.com/cmake/cmake/-/merge_requests/6121/diffs),
the variable name will not be changed in the foreseeable future.
2022-11-03 19:32:30 +08:00
Ashwini Khade
f63e28c92f
C API version 0.001 (#11758)
* C API version 0.001

* fix linker issues

* fixes for save checkpoint api

* plus fixes based on tests

* plus test_runner and other changes

* Plus cosmetic updates

* remove unnecessary headers

* plus some updates

* plus more changes

Co-authored-by: Ashwini Khade <askhade@microsoft.com@orttrainingdev10.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2022-06-15 11:13:35 -07:00
Jack·Boos·Yu
ea004e953f
[cmake] Export multi targets in static build (#11063)
* [cmake] Export multi targets in static build

* Install more components in static build, format some code

* Fix code pos
2022-04-03 22:37:18 -07:00
Jeff Daily
c8789d3047
[ROCm] static re-hipify of CUDA EP to ROCm EP, now a shared provider (#8877)
* re-hipify all rocm EP sources

* fix all other files affected by re-hipify

* add cuda_provider_factory.h to amd_hipify.py

* do not use cudnn_conv_algo_search in ROCm EP, missing reduce min registration

* Fix ReduceConsts template specialization introduced in #9101.

Fixes the error when building for ROCm 4.3.1:

error: too many template headers for onnxruntime::rocm::ReduceConsts<__half>::One (should be 0)

* fix flake8 error in amd_hipify.py

* speed up hipify with concurrent.futures

* flake8 fix in amd_hipify.py
2021-10-14 15:15:51 -07:00
Zuwei Zhao
89e8bff121
Enable selecting custom ops in onnxruntime-extensions. (#8826)
* Enable selecting custom ops in onnxruntime-extensions.

* Move cmake_helper.py.

* Remove over-indented spaces.

* Add doc.

* Remove onnxruntime-extensions from git submodules, and user should pass path of onnxruntime-extensions for build.

* Modify doc.

* Remove argument --enable_onnxruntime_extensions and use --onnxruntime_extensions_path.

* Fix build error.

* Fix build error.

* Use onnxruntime_extensions_path.

* support both submodule and external source folders

* refinement

* Update cgmanifest.json

* Support building onnxruntime-extensions from either git submodule or pre-pulled path.

* Update doc.

* more standard name

* update docs

* add the copyright header

Co-authored-by: Zuwei Zhao <zuzhao@microsoft.com>
Co-authored-by: Wenbing Li <wenbingl@outlook.com>
Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
2021-08-27 21:45:52 -07:00
Scott McKay
0034ad72e6
Minimize changes to fix missing symbols used from C# (#8867)
* Revert "Cleanup C# bindings to add EP (#8810)"

This reverts commit b21ea00020.

* Add back in a minimal set of changes.
Provide stubs in for a limited set of things
  - things called from C# using a static lib of ORT built for mac/ios
  - things in OrtApis that are not included in the build by default
  - things in OrtApis that are excluded in a minimal build

* Cleanup order or EPs in test

* Fix unused function in ROCM build
2021-08-28 07:10:14 +10:00
Scott McKay
b21ea00020
Cleanup C# bindings to add EP (#8810)
Fix C# add EP bindings.
Add stubs to ORT so that if EP is not included in the build we return a graceful error message.
Move declaration of stubs into C API and out for EP so they're in one place and are easier to use (no extra header required in the C/C++ world and consistent with the CUDA EP setup).
Fix inconsistency in ROCM EP.
Cleanup a few other things.
2021-08-26 13:59:40 +10:00
Suffian Khan
9fa0d8392a
Extend node debugging utilities to push tensors and node placement to SQL database (#8672)
* adding support for tracing to sqldb instead of files

* use compiled statements

* script to pull tensors from db

* link sqlite3

* remove node info redundant with onnx graph

* addressing PR comments

* address PR comments and include program counter

* third party notice

* use find_pacakge

* add to cgmanifests.json

* address thread safety and add pid suffix

* build fi

* python script to select on devicetype

* remove unpopulated and redundant Shape and Type fields

* comment

* comment

* PR comments

* add graph execution counter to session state

* move increment to inference session

* std::endl to \n

* ifdef on graph execution counter

* add ifdef to inference session

* move DEBUG_NODE_INPUTS_OUTPUTS to CMakeLists.txt
2021-08-21 00:40:12 -07:00
pengwa
39059f2539
enable torch interop build (#8493)
* fix build - python.h not found

* disable --build_shared_lib for ortmodule tests

* fix

* fix the build flag

* disable --build_shared_lib for training path (not only for ortmodule)

* fix missing test model files

* disable test CApiTest.test_custom_op_library when ENABLE_TRAINING_TORCH_INTEROP is ON

* enable custom_op_library build

* fix build

* fix

* merge master and fix build failure

* build onnx_test_runner when onnxruntime_ENABLE_TRAINING_TORCH_INTEROP is ON

* resolve comments

* use --enable_training_torch_interop to replace "onnxruntime_ENABLE_TRAINING_TORCH_INTEROP=ON"
2021-08-19 09:16:32 +08:00
Wei-Sheng Chin
47b3ecb53b
Packaging pipeline now builds with PythonOp (aka running autograd.Function) (#8652)
This PR disable UTs in training's package pipelines 
for building packages with PythonOp (torch.autograd.Function).
2021-08-17 10:55:13 -07:00
Changming Sun
0510688411
Update compliance tasks in python packaging pipeline and fix some compile warnings (#8471)
1. Update SDLNativeRules from v2 to v3. The new one allows us setting excluded paths.
2. Update TSAUpload from v1 to v2. And add a config file ".gdn/.gdntsa" for it.
3. Fix some parentheses warnings
4. Update cmake to the latest.
5. Remove "--x86" build option from pipeline yaml files. Now we can auto-detect cpu architecture from python. So we don't need to ask user to specify it.
2021-07-30 17:16:37 -07:00
Ryan Hill
e04e1d5ce0
Move shared providers CPU providers into separate file (#8293) 2021-07-19 15:19:32 -07:00
Zuwei Zhao
b46310b349
Integrate onnxruntime-extensions into onnxruntime. (#8143)
Co-authored-by: Zuwei Zhao <zuzhao@microsoft.com>
2021-07-01 09:34:03 -07:00
Changming Sun
b854f2399d
Update manylinux build scripts and GPU CUDA version from 11.0 to 11.1 (#7632)
1. Update manylinux build scripts. This will add [PEP600](https://www.python.org/dev/peps/pep-0600/)(manylinux2 tags) support. numpy has adopted this new feature, we should do the same. The old build script files were copied from https://github.com/pypa/manylinux, but they has been deleted and replaced in the upstream repo. The manylinux repo doesn't have a manylinux2014 branch anymore. So I'm removing the obsolete code, sync the files with the latest master.
2. Update GPU CUDA version from 11.0 to 11.1(after a discussion with PMs). 
3. Delete tools/ci_build/github/linux/docker/Dockerfile.manylinux2014_cuda10_2.  (Merged the content to tools/ci_build/github/linux/docker/Dockerfile.manylinux2014_cuda11)
4. Modernize the cmake code of how to locate python devel files. It was suggested in https://github.com/onnx/onnx/pull/1631 .
5. Remove `onnxruntime_MSVC_STATIC_RUNTIME` and `onnxruntime_GCC_STATIC_CPP_RUNTIME` build options. Now cmake has builtin support for it. Starting from cmake 3.15, we can use `CMAKE_MSVC_RUNTIME_LIBRARY` cmake variable to choose which MSVC runtime library we want to use. 
6. Update Ubuntu docker images that used in our CI build from Ubuntu 18.04 to Ubuntu 20.04.
7. Update GCC version in CUDA 11.1 pipelines from 8.x to 9.3.1
8. Split Linux GPU CI pipeline to two jobs: build the code on a CPU machine then run the tests on another GPU machines.  In the past we didn't test our python packages. We only tested the pre-packed files. So we didn't catch the rpath issue in CI build. 
9. Add a CentOS machine pool and test our Linux GPU build on real CentOS machines. 
10. Rework ARM64 Linux GPU python packaging pipeline. Previously it uses cross-compiling therefore we must static link to C Runtime. But now have pluggable EP API and it doesn't support static link. So I changed to use qemu emulation instead. Now the build is 10x slower than before. But it is more extensible.
2021-06-02 23:36:49 -07:00
Sunghoon
1fbc04d691
Enable training ops in inference (#7783)
* Enable training ops in inference

* fix a build error

* relu test name is the same as trainig test
2021-05-21 13:06:14 -07:00
Changming Sun
7b003967b1
Add static code analyzer to Windows CPU/GPU CI builds and fix the warnings (#7489) 2021-04-29 11:54:57 -07:00
Edward Chen
0ebeaf529d
Check kernel def hashes (#7120)
Add unit test for verifying kernel def hashes.
Add way to add new types to kernel definition without changing hash.
2021-04-01 17:42:58 -07:00
gwang-msft
7ca8388dc9
[ORT Mobile] file format schema and file I/O code (#4973)
* ort mobile file format schema and [de]serializing code
2020-09-01 11:51:31 +10:00
edgchen1
a715d55bcc
Training Python package fixes (#4063)
- Add support for ENABLE_LANGUAGE_INTEROP_OPS in training build which is enabled for nightly builds
- Fix passing of environment variables to `sudo docker run` in build definitions
- Fix setup.py package naming logic
2020-06-01 09:30:56 -07:00
edgchen1
0ec90f7019
Put safeint_interface include directory into onnxruntime_common interface include directories to simplify usage by other targets. (#3546) 2020-04-16 10:34:32 -07:00
Edward Chen
e542cfd0e0 Introduce training changes. 2020-03-11 14:39:03 -07:00
Scott McKay
a1db87b382
Add SafeInt bounds checking to memory allocation size calculations. (#3022)
* Add SafeInt bounds checking to memory allocation size calculations.

* Fix TensorRT library includes
2020-02-20 11:41:03 -08:00
Hariharan Seshadri
5c2e474751
Add provision in ORT for session options to be parsed when available via model file (#2449)
* Initial commit

* Fix gitmodules

* Nits

* Nits

* Updates

* Update

* More changes

* Updates

* Update

* Some updates

* More changes

* Update

* Update

* Merge

* Update

* Updates

* More changes

* Update

* Fix nits

* Updates

* Fix warning

* Fix build

* Add comment

* PR feedback

* PR feedback

* Updates

* Updates

* Update

* More changes

* Fix build break

* Comment test for now

* Updates

* Updates

* PR feedback

* Updates

* Nits

* Add tests

* Fix build

* Fix build

* Fix build

* Fix build break

* Fix build

* Nits

* PR feedback

* More change

* Expose GetSessionOptions in pybind logic and add unit test for python

* Fix build

* PR feedback

* PR feedback
2019-12-03 16:56:07 -08:00
Changming Sun
fc6773a65b
Add Tracelogging for profiling (#1639)
Enabled only if onnxruntime_ENABLE_INSTRUMENT is ON
2019-11-11 21:34:10 -08:00
Changming Sun
a5da5ff6f4 Remove onnxruntime_USE_EIGEN_THREADPOOL cmake option 2019-10-30 21:51:54 -07:00
Changming Sun
021073b5e5
Update python packaging pipelines (#2167) 2019-10-19 07:42:54 -07:00
Dmitri Smirnov
d1b1cdc5c4
Replace GSL with GSL-LITE submodule and fix up refs (#1920)
Remove gsl subodule and replace with a local copy of gsl-lite
  Refactor for onnxruntime::make_unique
  gsl::span size and index are now size_t
  Remove lambda auto argument type detection.
  Remove constexpr from fail_fast in gsl due to Linux not being happy.
  Comment out std::stream support due to MacOS std lib broken.
  Move make_unique into include/core/common so it is accessible for server builds.
  Relax requirements for onnxruntime/test/providers/cpu/ml/write_scores_test.cc
  due to x86 build.
  Add ONNXRUNTIME_ROOT to Server Lib includes so gsl is recognized
2019-10-01 12:43:29 -07:00
Ke Zhang
3bf0e364e2
Move CopyTensor out of IExecutionProvider interface. (#1268)
* add ortdevice class

* add data transfer manager for copying tensors.

* update

* add data trasnfer for gpu

* fix constexpr build break.

* update

* remove unnecessary header files.

* remove unnecessary header files.

* add dependency

* add dependency

* add dependency

* add dependency

* fix linux build break.

* update

* fix build break

* fix build break

* fix build break

* update

* update

* update c api.

* update to not use OrtCreateAllocatorInfo

* change to all eps .

* fix linux build break

* remove useless codes.

* update

* move datatransfermanager in session state

* update

* fix cuda build break.

* fix comments

* fix windows GPU build.

* fix comments

* fix build break

* fix comments

* fix test failure

* update

* fix comments

* fix onnx runtime server.

* update

* fix test failure.

* fix comments

* fix comment
2019-07-11 14:49:20 -07:00
RandySheriffH
cdb27de090
implement python opeartor (#1045)
* implement python opeartor

* format code

* remove dup

* limit type

* format code

* cancel default logging func

* add comment

* fix compile err

* fix comments

* switch to c++ style cast

* implement interop framework

* fix format

* move includes

* cancel needless linking

* fix comment

* add UT

* exclude def
2019-06-07 11:50:23 -07:00
Maik Riechert
ded7eeb033 make builds more robust (#906) (#932) 2019-04-29 12:58:20 -07:00
Ashwin Kumar
492d9fd6cc
Use Eigen ThreadPool in OnnxRuntime (#323)
* switch to nonblocking threadpool in inference session and sessions state

* switch to eigen threadpool - first draft

* refine

* refine

* add a switch to easily revert back to windows thread pool

* switch thread pool in test runner and turn on leak checker

* remove unncessary files

* fix build error

* more build fixes

* catch exceptions in parallel executor

* fix mac build error

* fix mac build error

* more build fixes

* more mac build fixes

* fix cv issue

* change macro to include cuda compiler for  disabled compiler warning

* try switching the macro to win32 only

* test #error

* move #disable warning to the top

* Update onnxruntime_framework.cmake

* move eigen include to public scope

* turn off eigenthreadpool by default and add todo comment
2019-01-15 15:19:30 -08:00
Changming Sun
8cfe8d33a3 Add nsync (#292)
* Add nsync

* nsync2

* nsync3

* fix build

* update comments

* fix build option
2019-01-09 10:40:55 -08:00
Changming Sun
5e113661a9 Build system upgrades (#281)
* update

* runas normal user
2019-01-07 13:15:24 -08:00
Pranav Sharma
7aef8a1cca Sync with internal master. 2018-11-22 20:56:43 -08:00
Pranav Sharma
89618e8f1e Initial bootstrap commit. 2018-11-19 16:48:22 -08:00