onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-05-23 22:13:38 +00:00

Author	SHA1	Message	Date
Changming Sun	360e2ae11b	Update eigen to the latest to support C++20 (#4817 )	2020-08-17 10:19:48 -07:00
Thiago Crepaldi	42408aa3ed	Add new PyTorch front-end (#4815 ) * Add ORTTrainerOptions class for the new pytorch frontend (#4382) Add ORTTrainerOptions class and some placeholders * Add _ORTTrainerModelDesc to perform validation for model description (#4416) * Add Loss Scaler classes to the new frontend (#4306) * Add TrainStepInfo used on the new frontend API (#4256) * Add Optimizer classes to the new frontend (#4280) * Add LRScheduler implementation (#4357) * Add basic ORTTrainer API (#4435) This PR presents the public API for ORTTrainer for the short term development. It also validates and saves input parameters, which will be used in the next stages, such as building ONNX model, post processing the model and configuring the training session * Add opset_version into ORTTrainerOptions and change type of ORTTrainer.loss_fn (#4592) * Update ModelDescription and minor fix on ORTTrainer ctor (#4605) * Update ModelDescription and minor fix on ORTTrainer/ORTTrainerOptions This PR keeps the public API intact, but changes how model description is stored on the backend. Currently, users create a dict with two lists of tuples. One list is called 'inputs' and each tuple has the following format tuple(name, shape). The second list is called 'outputs' and each tuple can be either tuple(name, shape) or tuple(name, shape, is_loss). With this PR, when this dict is passed in to ORTTrainer, it is fully validated as usual. However, tuples are internally replaced by namedtuples and all output tuples will have tuple(name, shape, is_loss) format instead of is_loss being optionally present. In addition to that normalization in the internal representation (which eases coding), two internal methods were created to replace a namedtuple(name, shape) with namedtuple(name, shape, dtype) or namedtuple(name, shape, is_loss, dtype) depending on whether the tuple is an input or output. This is necessary as ORTTrainer finds out data types of each input/output during model export to onnx. Finally, a minor fix was done on ORTTrainer. It could initialize ORTTrainerOptions incorrectly when options=None * Rename input name for test * Add ONNX Model Export to New Frontend (#4612) Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * Create training session + minor improvements (#4668) Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> * Save ONNX model in file (#4671) Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> * Add eval step (#4674) Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> * Add train_step (#4677) Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> * Add LR Scheduler (#4694) Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * Add deterministic compute tests (#4716) Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * Add legacy vs experimental ORTTrainer accuracy comparison (#4727) Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * Add Mixed precision/LossScaler + several fixes (#4739) In addition to the mixed precision/loss scaler code, this PR includes: * Fix CUDA training * Add optimization_step into TrainStepInfo class * Refactor LRScheduler to use optimization_step instead of step * Updated several default values at ORTTrainerOptions * Add initial Gradient Accumulation support (untested) * Fix ONNX model post processing * Refactor unit tests * Add ONNX BERT example + minor fixes (#4757) * Fix training issue when passing ONNX file into ORTTrainer Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> * Add Dynamic Shape support (#4758) * Update DeepSpeed Zero Stage option to a separate option group (#4772) * Add support to fetches (#4777) * Add Gradient Accumulation Steps support (#4793) * Fix Dynamic Axes feature and add unit test (#4795) * Add frozen weights test (#4807) * Move new pytorch front-end to 'experimental' namespace (#4814) * Fix build Co-authored-by: Rayan-Krishnan <rayankrishnan@live.com> Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-08-17 09:45:25 -07:00
Changming Sun	5eec4f66ed	Refactor manylinux docker image and the related pipelines (#4751 ) 1. Publish the image to ACR, instead of building it every time for every PR 2. Make USE_MKLML and USE_OPENMP able to co-exist. Currently both of them are enabled in our Linux CI build but only one of them actually takes effect. 3. Split nuphar and DNNL into separate pipelines. 4. Fix two warnings in onnxruntime/core/optimizer/matmul_scale_fusion.cc and onnxruntime/test/tvm/tvm_basic_test.cc. 5. Update the manylinux2010_x86_64 image to the latest.	2020-08-17 09:40:31 -07:00
Tang, Cheng	1b1a6a4ca9	Bump onnx to get bfloat16 in ops, and some update in ort to support bfloat16 (#4791 ) * bump onnx to support bfloat16 * sign test code * fix ut failures * add bfloat type in gradient schema * add bfloat16 to gathernd * add bfloat16 into grad op defs * temp disable gpu fusing transformers * bfloat16 support fix * more fix to bfloat * bug fix * add bfloat16 to transpose matmul * fix sce loss * fix cast opset13 and other missing part of bfloat16 * Revert "temp disable gpu fusing transformers" This reverts commit `b627bc9019`. * add SCEloss back * fix build break * fix gpu failure due to missing kernel in opset13 * add tile opset 13 kernel * Revert "fix gpu failure due to missing kernel in opset13" This reverts commit 661d63d0599029757f240d29afd64b197b76b880. * fix comments in pr * fix cuda break due to opset13 * fix missing msdomain * add nll loss tests into android build's broken list; disable bfloat16 cast tests due to the wrong type saved in onnx test data, will fix it in onnx first Co-authored-by: Cheng Tang <chenta@microsoft.com>	2020-08-16 17:05:40 -07:00
stevenlix	7acef875bb	Fix bugs in TensorRT (#4780 ) * fix bugs * Move -Wno-deprecated-declarations to target compile flag	2020-08-13 16:09:27 -07:00
Yulong Wang	aa993e95c9	enable build flag '--use_openmp' on MacOS (#4774 ) * enable build flag '--use_openmp' on MacOS * cmake 3.16.1 to enable find_package(OpenMP) on mac	2020-08-13 15:56:42 -07:00
George Wu	f12e9de111	build fixes for https://github.com/microsoft/onnxruntime/pull/4721 (#4784 ) * test * test * add missing CUDA header include * debug * fix * fix python package for dnnl and tensorrt. * fix * fix windows build. * revert * target_link_directories for tensorrt shared lib.	2020-08-14 06:24:44 +08:00
Sheil Kumar	8a66ad79a6	Add Experimental WinRT API IDL as placeholder for adding new winrt features (#4736 ) * Add experimental winrt api idl with dummy type to satisfy the build * remove experimental from the api_lib target * make experimental api available on windows builds also * remove /y /d * revert some pathing changes * remove experimental api call from tests * revert cppwinrt cmake changes * switch to stdapi Co-authored-by: Sheil Kumar <sheilk@microsoft.com>	2020-08-12 12:45:19 -07:00
Ryan Hill	ac725b53f6	Convert TensorRT provider into a shared library (#4721 ) Lots of changes to shared library interfaces, new lighter weight design.	2020-08-10 21:17:16 -07:00
stevenlix	77c69a0325	Upgrade TensorRT to v7.1.3.4 (#4704 ) * upgrade to TensorRT 7.1.3.4 * Upgrade onnx-tensorrt parser for TensorRT 7.1.3.4 * fix format issue * fix format issue * fix format issue * Update tensorrt_execution_provider.cc * change cmake version to 3.14 * Remove --msvc_toolset 14.16 * change to onnxruntime::make_unique * use onnxruntime::make_unique * disable some tests for TensorRT * disable some tests for TensorRT * Update upsample_op_test.cc * Update tile_op_test.cc * disable some tests for TensorRT * Update constant_of_shape_test.cc * update parser * Update Dockerfile.ubuntu_tensorrt	2020-08-07 17:43:56 -07:00
gwang-msft	8507bc1f48	[Android NNAPI EP] Enable test for BatchNormalization, enable dev_mode for Android, fix some issues in concat (#4715 ) * update batch_norm test, enable dev_mode for nnapi, ignore onnx protobuf warning for nnapi ep * fix some issues in concat and mark input without shape as not supported for now * address review comments * addressed comments	2020-08-06 14:11:59 -07:00
Boris Fomitchev	6958f49dae	Added Dockerfile and build instructions for Jetson. Also set CUDA arch set automatically. (#4637 ) * Revert "Remove docstrigs if __ONNX_NO_DOC_STRINGS" (#4495) This reverts commit bb4d331fa7bf1fe8d68b1527dda56e4739c80800. * Bump version to 1.4.0 (#4496) * Create N-1 threads in intra-op pool, given main thread now active (#4493) Create N-1 threads in a thread pool when configured with intra-op parallelism of N. This ensures we have N active threads, given that the main thread also runs work. To avoid ambiguity on the value returned, rename ThreadPool::NumThreads method to ThreadPool::DegreeOfParallelism, and make corresponding updates in MLAS and operators. * Conditionally compile without std::is_trivially_copyable to satisfy old GCC versions. (#4510) * Adding CUDA arch flags for NVIDIA Jetson Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> * Added Dockerfile for Jetson and instructions to build wheel and image Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> * Removing guess about nvcc location Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> * Restoring pip3 setuptools install order Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> * Updated README with links and notes re NVIDIA Docker runtime Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> * Added mention of nvidia-docker Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> * Addressing code review comments Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> * Addressing code review comments Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> Co-authored-by: Tiago Koji Castro Shibata <ticastro@microsoft.com> Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com> Co-authored-by: Tim Harris <tiharr@microsoft.com> Co-authored-by: edgchen1 <18449977+edgchen1@users.noreply.github.com>	2020-07-31 23:49:23 -07:00
RandySheriffH	948a33bdfc	FixPyOpSegFault&MakeItStaticLib (#4600 ) * remove pyop wrapper * add py threading logic * fix doc * fix doc * fix doc * format doc * format doc * format doc * reenable test Co-authored-by: RandySheriffH <rashuai@microsoft.com>	2020-07-28 11:45:25 -07:00
Tiago Koji Castro Shibata	73c99f8269	Set WINVER (#4636 )	2020-07-27 20:24:11 -07:00
Sheil Kumar	efa393e596	WinML should dynamically link against onnxruntime.dll and only system32 for inbox builds (#4615 ) * Dynamically link onnxruntime.dll * fixes * add preceding backslash to onnxruntime.dll for inbox builds * remove /d * loadlibrary -> loadlibraryex * use loadlibrary system32 option Co-authored-by: Sheil Kumar <sheilk@microsoft.com>	2020-07-27 09:56:49 -07:00
Sheil Kumar	222fd08f20	DirectML.dll is loaded via LoadLibraryW but should use LoadLibraryExA (#4616 ) * create dml device via loadlibraryexa * add build_INBOX flag to adapter Co-authored-by: Sheil Kumar <sheilk@microsoft.com>	2020-07-25 21:29:46 -07:00
gwang-msft	c2ec3b734b	[Android NNAPI EP] Remove dependency on external JD/DNNLibrary (#4576 ) * remove dependency of external jd-dnnlibrary * remove extra variables not used any more * update /cgmanifest.json	2020-07-22 14:08:12 -07:00
Andrews548	f20afc4991	Update ACL/ArmNN EP (#4571 ) * Add BN to ArmNN EP * Add Concat to ArmNN EP * ACL logging improvements * ArmNN logging improvements * Fallback to CPU for 9x9 convolution in ACL EP * Fallback to CPU for 9x9 convolution in ArmNN EP * Enable python support for ACL and ArmNN EPs when compiled with BSP toolchain * Removed the matmul operator * Fix conv infer shape function * Fix provider_names list for armnn Co-authored-by: Andrei-Alexandru <andrei-alexandru.avram@nxp.com>	2020-07-21 22:25:58 -07:00
Chi Lo	affdeb53c2	Add Python API for specifying device options. (#4205 ) * Add python API for specifying CUDA device id * Modification for providing session based python api for specifying device id * When the header file pybind11/stl.h is included, conversion between C++ containers and Python list, vector and dict data structures is automatically enabled. https://pybind11.readthedocs.io/en/stable/advanced/cast/stl.html# Therefore, refactor the code to better leverage this advantage. * Make struct CudaDeviceOptions as default cuda device options * Implement sess.set_providers(list_of_providers, list_of_provider_option_dicts) But still stay consistent with existing sess.set_providers(list_of_provider) * Add cuda provider option default setting * Add support for setting cuda_mem_limit and arena_extend_strategy for CUDA. Also resolved the merge conflict on session.py * Use python ctypes to call cuda library to help python unittest * Refine the code with reviewer's suggestions * Add the capability of getting execution provider's configuration - Once we introduced the capability to set execution provider's configuration, it makes sense to add capability of getting ep's configuration. * Modify the code with reviewer's suggestions. * Using stoull() and stoul() depends on 32/64-bits architecture. * Rewrite the testcases for testing setting CUDA device id Note: We need to make sure every ORT process is run on one CUDA device at a time. * Make sure old session object is destroyed by python gc before new session object is being created * Move testcases to original onnxruntime_test_python.py * Fix bugs to pass CI build * Make it pass CI build (cont.) * Make it pass CI build (cont.)	2020-07-21 07:28:13 -07:00
Changming Sun	c2c4e6760b	Fix code sign validation errors in nuget and nodejs pipeline (#4527 )	2020-07-20 14:18:47 -07:00
Wei-Sheng Chin	21d2728974	Revise pipeline schedule to consider communication ops (#4524 ) * Revise pipeline schedule to consider communication ops * Add test * Fix warning * inline some short functions * Fix warnings * Rename a class * Add comment for test * op renamed to task * Fix NVTX wrapper's bug	2020-07-17 10:04:56 -07:00
Changming Sun	8ada440961	Move model tests to onnxruntime_test_all (#4521 ) 1. Move model tests to onnxruntime_test_all 2. Publish TestResults of Windows CI build.	2020-07-15 16:46:18 -07:00
stevenlix	0ebe2fab51	Refactor TensorRT EP code to better handle dynamic shape subgraphs (#4504 ) * build engine in runtime for dynamic shape subgraphs * Update TensorRT-ExecutionProvider.md * Update TensorRT-ExecutionProvider.md * fix build issue * Add more instructions on how to use engine caching * add precision to trt node name * Update tensorrt_execution_provider.cc * Update tensorrt_execution_provider.cc	2020-07-15 02:35:42 -07:00
Yufeng Li	5dc7339be6	Add quantization tool to python package (#4458 ) * Add quantization tool to python package	2020-07-08 21:42:53 -07:00
Tracy Sharpe	aa06d308a6	Build new AVX file with /ARCH:AVX (#4442 ) Build new file with /ARCH:AVX on Windows to ensure correct vzeroupper behavior.	2020-07-07 12:00:12 -07:00
EronsJ	632b2896f3	Onnxruntime fuzzing (#4341 ) * Add protobuf mutator library as a git submodule * Added files and instructions to build the protobuf mutator library in CMake * Added fuzzing flag to build system and added fuzzing dependency library. To run fuzzing test use the flags --fuzz_testing --build_shared_lib --use_full_protobuf --cmake_generator 'Visual Studio 16 2019' * Added src files and build instructions for the main fuzzing engine * Removed Random number generation test from inside the engine * Added license header to files * Removed all pep8 violations introduced by this change and other E501 violations	2020-07-06 16:34:34 -07:00
Christian Goll	3588484336	use system libnsync (#4377 ) * use system libnsync	2020-07-06 07:53:22 -07:00
suffiank	f6bf66c8cf	Adjustments to MPI and NCCL library discovery on build (#4407 ) * cmake edits for mpi_home and nccl_home * cmake syntax error on else	2020-07-02 12:03:42 -07:00
Tiago Koji Castro Shibata	7fea332f93	Support builds without RTTI (#4333 ) * Support builds without RTTI * Disable RTTI in all builds	2020-07-01 13:05:35 -07:00
Zhang Lei	94c98aa0a7	qlinearadd for arm/sse2/avx2 using intrinsic, enable binary broadcasting parallel (#4216 ) * Support quantization linear binary element wise math ops, implement QLinearAdd. Support tests for quantization linear binary element wise math ops, implement test for QLinearAdd. Add QLinearAdd with SSE2 intrinsic implementation, Avx2 assembly implementation, Neon intrinsic support. QLinearAdd support VectorOnVector, VectorOnScalar, ScalarOnVector. Generalized QlinearBinaryOp parallel related with broadcasting. * Modify according to PR feedbacks. Mainly: * template helper for generalize the qladd logic on v2v, s2v, v2s * remove GetKernel related. * change mixed legacy MM/SSE code in the AVX code * formatter, typos, conventions, etc. * Utilize MlasSubtractInt32x4 in MlasDequantizeLinearVector(). * Some format fix. * More natural parallel parameter type. * Fix build break for x86. * Comment goes to 80 before wrap. * Many changes on assembly, macro related. Using vminps rather than vpminsd to handle NaN. tested on windows. * Using CLang Format to format the file. * Fix arm32 build error. * Remove some duplicate in different #if defined * working add.u8.vector to vector * Fix runtime bus error on real arm32 linux. * fix typo in store last one lane. * arm32 qlinearadd handle scalar. * Move qladd to separate c++ file * Add neon64 qladd. * refactor some, enhance two instructions on arm64 only instructions * Fix typo for arm64 * use strict op in pure c++ (min/max on float value) * sse2 new version. * merge arm/sse2/avx2 * pass arm/sse/avx2 linux test * remove non-used assembly file. * Remove unused data definition and trailing spaces. * Fix broadcasting parallel issue. * Enhance broadcasting scenarios. Allow testing result diff due to round on half. * Add Mlas or MLAS_ prefix for namespace safety. * Handle alignment issue for arm32 for GCC/MSVC. remove some unused signed/unsigned int ops. * Specify /arch:AVX2 for qladd_avx2.cpp * Fix type during copy/paste when unrolling. Better one GreatEqual condition. Better formatter by splitting two statements on single line. * Arm neon alignment parameter is bits rather than bytes, change it. * Move qladd_avx2.cpp to intrinsics/avx2/ folder * Formatting using mlas style. * Double check mlas style for these files. * change indent 2 to 4 for qladd_avx2.cpp * Fix windows x86 build error due to sse2 no _mm_cvtsi128_si64 * To re-trigger all as old failed pipeline updated. Co-authored-by: Lei Zhang <phill.zhang@gmail.com>	2020-07-01 11:54:44 -07:00
Weixing Zhang	2601f8e1b4	Support to build CUDA EP for NV Ampere GPU (#4345 ) Co-authored-by: Weixing Zhang <wezhan@microsoft.com>	2020-06-29 21:46:13 -07:00
gwang-msft	9e0f5fc7af	The initial PR for NNAPI EP (#4287 ) * Move nnapi dnnlib to subfolder * dnnlib compile settings * add nnapi buildin build.py * add onnxruntime_USE_NNAPI_BUILTIN * compile using onnxruntime_USE_NNAPI_BUILTIN * remove dnnlib from built in code * Group onnxruntime_USE_NNAPI_BUILTIN sources * add file stubs * java 32bit compile error * built in nnapi support 5-26 * init working version * initializer support * fix crash on free execution * add dynamic input support * bug fixes for dynamic input shape, add mul support, working on conv and batchnorm * Add batchnormalization, add overflow check for int64 attributes * add global average/max pool and reshape * minor changes * minor changes * add skip relu and options to use different type of memory * small bug fix for in operator relu * bug fix for nnapi * add transpose support, minor bug fix * Add transpose support * minor bug fixes, depthwise conv weight fix * fixed the bug where the onnx model input has mismatch order than the nnapi model input * add helper to add scalar operand * add separated opbuilder to handle single operator * add cast operator * fixed reshape, moved some logs to verbose * Add softmax and identity support, change shaper calling signature, and add support for int32 output * changed the way to execute the NNAPI * move NNMemory and InputOutputInfo into Model class * add limited support for input dynamic shape * add gemm support, fixed crash when allocating big array on stack * add abs/exp/floor/log/sigmoid/neg/sin/sqrt/tanh support * better dynamic input shape support; * add more check for IsOpSupportedImpl, refactored some code * some code style fix, switch to safeint * Move opbuilders to a map with single instance, minor bug fixes * add GetUniqueName for new temp tensors * change from throw std to ort_throw * build settings change and 3rd party notice update * add readme for nnapi_lib, move to ort log, add comments to public functions, clean the code * add android log sink and more logging changes, add new string for NnApiErrorDescription * add nnapi execution options/fp16 relax * fix a dnnlibrary build break * addressed review comments * address review comments, changed adding output for subgraph in NnapiExecutionProvider::GetCapability, minor issue fixes * formatting in build.py * more formatting fix in build.py, return fail status instead of throw in compute_func * moved android_log_sink to platform folder, minor coding style changes * addressed review comments	2020-06-26 00:02:39 -07:00
Changming Sun	deea945f80	Remove openmp and scipy from build pipelines (#4305 ) 1. Remove openmp because the default thread pool is already good enough. 2. Remove scipy from build pipelines because it dropped support for Python 3.5.	2020-06-23 20:18:16 -07:00
Yufeng Li	867ba846f7	Implement MinMax with SIMD (#4285 ) * Implement MinMax with SIMD	2020-06-23 20:07:53 -07:00
Pranav Sharma	2204d39a06	Add build option to disable traditional ML ops from the binary. (#4272 ) * Add build option to disable traditional ML ops from the binary. * Fix python tests by splitting tests for ML ops to a separate file. Exclude ML tests from onnx_test_runner and C# tests. Exclude ML op sources. * Update Edge pkg pipelines with new MLops env variable and fix C# packaging pipeline tests to skip ML ops.	2020-06-20 06:36:06 -07:00
Yang Chen	a490beedf1	update tvm submodule (#4284 )	2020-06-19 14:51:18 -07:00
goloskokovic	478b923e19	Expose ACL/ARMNN providers to Python (#4260 ) * expose ACL/ARMNN providers to python * add -acl / -armnn to package name when use_acl / use_armnn is specified * build python wheel for ARMNN EP * link ACL/ARMNN EPs into onnxruntime_pybind11_state * wrong argument order in build_python_wheel for wheel_name_suffix	2020-06-18 20:24:14 +05:30
Tracy Sharpe	5d773ee57b	MLAS: add sgemv path for aarch64 builds (#4254 ) Implement a fast path for GEMMs where M=1 and TransB=CblasNoTrans.	2020-06-17 20:10:35 -07:00
Chih-Hsuan Yen	5da849b414	Fix detection of protobuf with onnxruntime_PREFER_SYSTEM_LIB on Linux (#4230 ) The CMake module is FindProtobuf.cmake [1]. Thus the name should be capitalized so that protobuf can be found on case-sensitive file systems. [1] https://github.com/Kitware/CMake/blob/v3.17.3/Modules/FindProtobuf.cmake	2020-06-17 17:34:47 -07:00
Wei-Sheng Chin	189fb60ef9	Fix a bug and add code to profile memory (#4241 ) * Fix a bug and add code to profile memory 1. Compile Send/Recv again (currently broken because of HOROVOD refactor). 2. Add code to print out initializer allocation size and activation memory size. * Address comments * Split memory counts per locations * Fix a metric	2020-06-16 10:17:27 -07:00
Weixing Zhang	b4b1c6440a	Enable ORT with CUDA 11 toolkit (#4168 ) * ORT on CUDA 11 1. Separate HOROVOD and MPI 2. Separate NCCL from HOROVOD in CMakeLists.txt 3. Remove dependency on external cub 4. cudnnSetRNNDescriptor is changed in cuDNN 8.0 * polish the code about MPI/NCCL in CMakeLists.txt and build.py * check CUDA version * ${MPI_INCLUDE_DIRS} should be PUBLIC * sm30, sm50 are deprecated in CUDA 11 Toolkit * update change based on code review feedback. * add sm_52 * improve MPI/NCCL build path Co-authored-by: Weixing Zhang <wezhan@microsoft.com>	2020-06-15 08:47:03 -07:00
Hariharan Seshadri	b377266eb3	Fix Mac build linker warnings (#4155 )	2020-06-12 21:10:12 -07:00
Tiago Koji Castro Shibata	2e3607c7cd	Remove hardcoded desktop lib (#4193 )	2020-06-12 16:51:54 -07:00
Yulong Wang	73bc6be5d1	build: split nodejs binding build and test to avoid timeout issue (#4188 ) * split nodejs binding build and test * enable nodejs tests	2020-06-10 19:16:32 -07:00
Dmitri Smirnov	af0750ba1b	Java GPU artifact naming (#4179 ) Modify gradle build so artifactID has _gpu for GPU builds. Pass USE_CUDA flag on CUDA build. Adjust publishing pipelines to extract POM from the correct path. Co-Authored-By: @Craigacp	2020-06-10 11:15:48 -07:00
Tiago Koji Castro Shibata	8eb6a539bd	Hardcode WinML tests umbrella lib (#4161 )	2020-06-08 15:24:08 -07:00
Tiago Koji Castro Shibata	6bbd18efd0	Hardcode WinML umbrella lib to windowsapp.lib (#4133 )	2020-06-08 11:04:44 -07:00
Andrews548	62b44527e5	Add ArmNN Execution Provider (#3714 ) * Add ArmNN Execution Provider Add a new execution provider targeting Arm architecture based on ArmNN. Validated on NXP i.MX8QM CPU with ResNet50, MobileNetv2 and VGG models. reviewed-by: mike.caraman@nxp.com * Minor fixes - renamed onnxruntime_ARMNN_RELU_USECPU to onnxruntime_ARMNN_RELU_USE_CPU - fixed acl typo * remove extra includes. added exception for ArmNN in test * fix indentation * Separated the activation implementation from the cpu and fixed the blockage from the endif Co-authored-by: Andrei-Alexandru <andrei-alexandru.avram@nxp.com>	2020-06-03 22:57:51 +05:30
Dmitri Smirnov	afca0d15ee	Create Java publishing pipeline (#3944 ) Create CPU and GPU Java publishing pipelines. Final jars are tested on all platforms. However, signing and publishing to maven are manual steps.	2020-06-01 18:18:57 -07:00
Pranav Sharma	6c1b2f33b7	Fix crash reported in #4070 . (#4091 ) * Fix crash reported in #4070. * Add newline to warning message * Add comment for using cout instead of the logger	2020-06-01 15:27:14 -07:00
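
Note on commit 42408aa3ed: its message describes how a user-supplied model description dict (lists of 'inputs' and 'outputs' tuples) is validated and normalized so that every output carries an explicit is_loss field. The following is only a minimal sketch of that idea, not the code from the commit; the names InputDesc, OutputDesc and normalize_model_desc are hypothetical and chosen here for illustration.

```python
from collections import namedtuple

# Hypothetical names for illustration; the real internal types may differ.
InputDesc = namedtuple("InputDesc", ["name", "shape"])
OutputDesc = namedtuple("OutputDesc", ["name", "shape", "is_loss"])


def normalize_model_desc(model_desc):
    """Validate a model description dict and return (inputs, outputs)
    as lists of namedtuples, with is_loss always present on outputs."""
    if not isinstance(model_desc, dict):
        raise TypeError("model_desc must be a dict")
    for key in ("inputs", "outputs"):
        if key not in model_desc:
            raise ValueError(f"model_desc is missing '{key}'")

    inputs = []
    for entry in model_desc["inputs"]:
        if len(entry) != 2:
            raise ValueError("each input must be (name, shape)")
        inputs.append(InputDesc(*entry))

    outputs = []
    for entry in model_desc["outputs"]:
        if len(entry) == 2:
            # is_loss was omitted by the user; default it to False.
            name, shape = entry
            is_loss = False
        elif len(entry) == 3:
            name, shape, is_loss = entry
        else:
            raise ValueError(
                "each output must be (name, shape) or (name, shape, is_loss)"
            )
        outputs.append(OutputDesc(name, shape, bool(is_loss)))

    return inputs, outputs


# Example usage with made-up names and shapes.
desc = {
    "inputs": [("x", [1, 784])],
    "outputs": [("loss", [], True), ("logits", [1, 10])],
}
inputs, outputs = normalize_model_desc(desc)
```

After normalization, downstream code can read outputs[i].is_loss without checking the tuple length, which is the simplification the commit message refers to as easing coding.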


497 commits