->unsetting the CMAKE_MAP_IMPORTED_CONFIG that was
set for OpenVINO EP for Relwithdebinfo build on
windows.
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
Co-authored-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
Update Objective-C API to be more usable from Swift. E.g., to allow conversion from Objective-C methods with trailing NSError** parameter to throwing Swift methods.
Update CMake Objective-C framework setup.
* test
* [gwang] make cmake compile work
* [gwang] enble build apks
* some build update
* add simple sigmoid test android project and cmake
* add build.py
* refine and remove unused import lib
* address CR comments
* remove unnecessary files
* add README.md
* minor update
* remove
* minor change
* fix ci failure and minor update
* fix typo in project folder
* remove
* remove and minor update
* refine
* minor fix
* fix
* fix typo
* add gradle spotlessApply task to fix CI failure
* fix
* enable spotlessApply in build gradle
* revert some changes
* minor fix
* run spotless apply for format
* address CR comments and fix CI version and format
* refine
* Refine
* address comments
* refine
* refine
* modify
* reformat
* resolve version conflicts
* minor update
* minor update
* address comments
* minor update
Co-authored-by: Guoyu Wang <wanggy@outlook.com>
* Check whether nvcc supports -Wstrict-aliasing before adding the compiler flag in CMakeList.txt.
* Removed reinterpret_cast to not cause strict aliasing violation errors or require -Wno-strict-aliasing when it is not available.
* initial draft for kernel invoke api
* initial implementation of kernel invoker
* [eager] fix build on Mac
* [eager] increment input name in kernel invoker
* temp fix for type in eager mode
* use global default log manager
* rollback the previous commit since it break linux build
* Revert "rollback the previous commit since it break linux build"
This reverts commit 58c2c3423a.
* Eager Mode: fix linking on macOS
* optimizer_execution_frame: ignore unused lambda capture (model_path)
* fix link issue
* ORTInvoker: set correct input argument tensor element proto types
Do not set a type proto on output arguments to allow ORT to deduce them
* ORTInvoker: create only one logging manager
* Minor fix to set execution provider type correctly. (#7000)
Co-authored-by: Chandru Ramakrishnan <chandru-r@github.com>
* training fix
* support config output ml values in frame, so we can use it to implement inplace update
* Fix range loop error while building. (#7087)
Co-authored-by: Chandru Ramakrishnan <chandru-r@github.com>
* Conditionally link with nsync_cpp if not windows. (#7151)
Co-authored-by: Chandru Ramakrishnan <chandru-r@github.com>
* Fixed initialization order in ORT kernel invoker (#7342)
* Updated constructor of ort_kernel_invoker to take a logger.
* Changed linking order.
* Updated test.
* add inplace ut
* add build option
* Update include/onnxruntime/core/eager/ort_kernel_invoker.h
Co-authored-by: Derek Murray <Derek.Murray@microsoft.com>
* resolve comments in pr
* fix build break;merge from master
* fix build break
Co-authored-by: Cheng Tang <chenta@microsoft.com>
Co-authored-by: Aaron Bockover <abock@microsoft.com>
Co-authored-by: Chandru Ramakrishnan <41447659+chandru-r@users.noreply.github.com>
Co-authored-by: Chandru Ramakrishnan <chandru-r@github.com>
Co-authored-by: Derek Murray <Derek.Murray@microsoft.com>
* Include ORT format model conversion scripts and infrastructure in ORT python package.
- tweak existing script setup so it can be easily run directly and from the ORT python package
Add config file and readme for Android minimal build package
Update ORT Mobile doco
Disable warning if 'all' optimizations are enabled but NCHWc transformer is excluded (device specific optimizations don't apply in this scenario so the warning is moot).
* Address PR comments
This patch helps to set architecture as power, when processor
check output matches ppc64le*.
Co-authored-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>
* initial dynamic load example
* support load EP in the provider options
* support dynamic load EP in orttrainer
* split the provider interface; fix comments in pr
* remove experiment code
* add test
* remove useless file
* add test model file;fix linux brewak
* fix linux build and missing file
* fix python build
* fix python build
* fix python binding
* fix python test
* fix runtime path for posix env
* exclude the shared library from minimal build
* fix comments in pr;
* seperate the provider shared lib loading
* excluded from minimal / macos / ios build
* skip copy the provider shared lib for minimal build and mac os
* fix macos build
* exclude the test for macos build
* exclude from andorid build
* exclude from web assembly build
* enable the invalid ep test
Co-authored-by: Cheng Tang <chenta@microsoft.com>
* working on re-organizing js code for ortweb
* remove dup files
* move folder
* fix common references
* fix common es5
* add webpack to common
* split interfact/impl
* use cjs for node
* add npmignore for common
* update sourcemap config for common
* update node
* adjust folder/path in CI and build
* update folder
* nit: readme
* add bundle for dev
* correct nodejs paths
* enable ORT_API_MANUAL_INIT
* set name for umd library
* correct name for commonjs export
* add priority into registerBackend()
* fix npm ci pwd
* update eslintrc
* revise code
* revert package-lock lockfileVersion 2->1
* update prebuild
* resolve comments
* update document
* revise eslint config
* update eslint for typescript rules
* revert changes by mistake in backend.ts
* add env
* resolve comments
Parallelize MinMax, Quantize and batched quantize GEMM
Performance problem identified in T5 decoder model (quantized). DynamicMatMul operator is identified as the culprit. This operator spend time on getting MinMax of a Tensor, quantize a tensor, and perform a batched qgemm. All of these can be parallelized.
Currently GEMM is parallelized. However, in batched GEMM, we sequentially call GEMM multiple times. This causes multiple starting and ending of parallel sections, which can be slow sometimes. So we made the following changes:
Parallel task partition no longer depends on degree of parallelism, only on shape of the matrices.
In a single GEMM, perform 2D partition of the multiplication, along panel lines, to reduce repeated packing.
For batched GEMM, all parallel tasks are executed in a single parallel section, reducing the cost of starting threads and waiting for them to finish.