onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-07 04:39:07 +00:00

Author	SHA1	Message	Date
Cheng Tang	b627bc9019	temp disable gpu fusing transformers	2020-07-03 17:29:03 -07:00
Cheng Tang	a176c29948	Merge branch 'master' into chenta/bfloat16	2020-07-01 15:51:17 -07:00
Yufeng Li	473cd5545f	Simple support of MatMul U8S8 on ARM to pass tests (#4392 )	2020-07-01 15:18:02 -07:00
Bowen Bao	7ec9a73202	deprecate frontend layernorm postpass (#4372 )	2020-07-01 13:06:03 -07:00
Tiago Koji Castro Shibata	7fea332f93	Support builds without RTTI (#4333 ) * Support builds without RTTI * Disable RTTI in all builds	2020-07-01 13:05:35 -07:00
liqunfu	5dcb9b4858	Liqun/backprop deterministic graph (#4315 ) make gradient graph deterministic add to session option use_deterministic_compute.	2020-07-01 12:39:10 -07:00
Zhang Lei	94c98aa0a7	qlinaradd for arm/sse2/avx2 using intrinsic, enable binary broadcasting parallel (#4216 ) * Support quantization linear binary element wise math ops, implement QLinearAdd. Support tests for quantization linear binary element wise math ops, implement test for QLinearAdd. Add QlinearAdd with SSE2 intrisinc implemntation, Avx2 assembly implemntation, Neon intrisinc support. QLinearAdd support VectorOnVector, VectorOnScalar, ScalarOnVector. Generalized QlinearBinaryOp parallel related with broadcasting. * Modify according to PR feedbacks. Mainly: * template helper for generalize the qladd logic on v2v, s2v, v2s * remove GetKernel related. * change mixed lagecy MM/SSE code in the AVX code * formater, typos, convensions, etc. * Utilize MlasSubtractInt32x4 in MlasDequantizeLinearVector(). * Some format fix. * More nature parallel parameter type. * Fix build break for x86. * Comment goes to 80 before wrap. * Many change on assembly on Marco related. Using vminps than vpminsd to handle NaN. tested on windows. * Using CLang Format to format the file. * Fix arm32 build error. * Remove some duplicate in different #if defined * working add.u8.vector to vector * Fix runtime bus error on real arm32 linux. * fix typo in store last one lane. * arm32 qlinearadd handle scalar. * Move qladd to seperate c++ file * Add neon64 qladd. * refactor some, enhance two instructions on arm64 only instructions * Fix typo for arm64 * use strict op in pure c++ (min/max on float value) * sse2 new version. * mrege arm/sse2/avx2 * pass arm/sse/avx2 linux test * remove non-used assembly file. * Remove unused data definition and tailing spaces. * Fix broadcasting parallel issue. * Enhance broadcasting scenarios. Allow testing result diff due to round on half. * Add Mlas or MLAS_ prefix for namespace safety. * Handle alignment issue for arm32 for GCC/MSVC. remove some unused signed/unsigned int ops. * Specify /arch:AVX2 for qladd_avx2.cpp * Fix type during copy/paste when unrolling. 
Better one GreatEqual condition. Better formater by splitting two statements on single line. * Arm neon alignment parameter is bits rather than bytes, change it. * Move qladd_avx2.cpp to intrinsics/avx2/ folder * Formatting using mlas style. * Double check mlas style for these files. * change indent 2 to 4 for qladd_avx2.cpp * Fix windows x86 build error due to sse2 no _mm_cvtsi128_si64 * To re-trigger all as old failed pipeline updated. Co-authored-by: Lei Zhang <phill.zhang@gmail.com>	2020-07-01 11:54:44 -07:00
Dmitri Smirnov	49268c42da	Change the way java home is set on Mac OS for CI and Java publishing pipeline (#4385 ) * Change the way java_home is set on Mac. * Change the way JAVA_HOME is set on Mac OS	2020-07-01 07:37:14 -07:00
Sherlock	6365760906	BiasDropoutFusion (#4167 ) * Implement BiasDropout Fusion and Kernel Dropout kernel for residual input BiasDropout Fusion to take residual input Fix BiasDropout Kernel Optimize DropoutGrad with 4 elements per thread * Add graph transformer UT * MLTypeCallDispatcher for RatioData * Use MLTypeDispatcher for ratio tensor * Handle traing_mode input for BiasDropout fusion * Add test case for missing ratio input * Replace using FinalizeNodeFusion * Make BiasDropout kernel template-less * Make DropoutGrad template-less * Make Dropout and TrainableDropout template-less * Regenerate onnx file for UT * Minior fix on divmod in BiasDropoutKernel * Adjust pt frontend test due to dropout randomnesss * Make dropout kernel opeartion in fp32 Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-06-30 15:43:14 -07:00
Ashwini Khade	0404763f23	Update function body initialization for ONNX functions (#4332 ) * Update function body initialization * minor fix * changes per review comments * minor fix * format fix * add function initialization in mixed precision transformer * more updates * more fixes	2020-06-30 14:30:59 -07:00
Cecilia Liu	37b624b688	Match More EmbedLayerNormalization Patterns for Bert Model Graph Fusion (#4354 ) match more embed patterns for bert base cased	2020-06-30 13:12:50 -07:00
Tracy Sharpe	755675541a	NCHWc + Sigmoid optimization (#4360 ) Add support to avoid reordering NCHWc tensors due to the Swish activation (x * sigmoid(x)) in EfficientNet/EfficientDet models.	2020-06-30 10:50:58 -07:00
ytaous	4380b8ba68	adjust bs size (#4375 ) Co-authored-by: Ethan Tao <ettao@microsoft.com>	2020-06-30 10:29:48 -07:00
Ashwini Khade	89c6da99b5	fix output shape calc for matmul (#4362 )	2020-06-30 08:21:20 -07:00
Faith Xu	a4127fc185	Add stale bot (#4323 ) * Add stalebot * Update exemptLabels	2020-06-30 01:51:09 -07:00
Tianlei Wu	55f25a4bbf	Update Attention op to support attention mask for GPT-2 (#4330 ) * Support another two format of mask_index input: 2D attention mask, or 1D mask index with end and start positions. * Update dynamic axes of gpt2 with past state * Update script to fuse model with attention mask	2020-06-29 23:26:23 -07:00
Weixing Zhang	2601f8e1b4	Support to build CUDA EP for NV Ampere GPU (#4345 ) Co-authored-by: Weixing Zhang <wezhan@microsoft.com>	2020-06-29 21:46:13 -07:00
Hariharan Seshadri	465140b384	Misc fixes to Conv and ConvTranspose CUDA kernels (#4281 )	2020-06-29 16:07:42 -07:00
Changming Sun	35a048ef9b	Ignore one failed test in DML (#4366 ) 2020-06-29 08:51:32.9157882 [E:onnxruntime:Default, runner.cc:452 DataRunner::RunTaskImpl] keras2coreml_Dense_ImageNet:output=output1:expected 0.233292 (3e6ee400), got 0.231783 (3e6d587b), diff: 0.00150879, tol=0.00123329 idx=52. 1 of 255 differ	2020-06-29 14:27:06 -07:00
gwang-msft	5f4e63ede6	Add nhwc support for NNAPI EP, add concat op, handle concurrent calls to NNAPI model (#4356 ) * add support to internally transpose nchw input to nhwc and only transpose back if it is necessary * more changes in nchw<->nhc, fixed small issue in concat * Add option for NNAPI to run on [all device]s/[cpu onl]y/[non-cpu only] * minor code style changes	2020-06-29 11:55:45 -07:00
Tiago Koji Castro Shibata	88402f5293	Make DML operator registration constexpr (#4219 ) * Make DML operator registration constexpr * Refactor requiredConstantCpuInputs template * Revert "Refactor requiredConstantCpuInputs template" MSVC crashes compiling the new constexpr with "Internal compiler error" * Fix braces style	2020-06-29 00:54:43 -07:00
Hariharan Seshadri	012aaa6491	Minor optimization in CUDA Reduction ops (#4353 )	2020-06-28 01:14:28 -07:00
Scott McKay	274e6b4153	Cleanup SessionState. Move allocator lookup to SessionState. (#4194 ) * Move allocators to SessionState so they're decoupled from ExecutionProviders - when looking up an allocator it's based on OrtMemoryInfo not the EP so SessionState is a more natural place for that infromation to be stored - add device based lookup - simplifies logic for copying feeds/fetches across devices Cleanup SessionState and SessionStateInitializer - provide more things to SessionState at construction time so we don't construct and instance and immediately after call a bunch of setters - simplify SessionStateInitializer - reduced down to FinalizeSessionState method	2020-06-28 14:55:42 +10:00
S. Manohar Karlapalem	4a1ecd9879	[OpenVINO] Upgrade OpenVINO docker base to Ubuntu 18.04 (#4346 ) * update deps installer to ov 2020.3 * Upgrade docker base to Ubuntu 18.04	2020-06-27 01:57:47 -07:00
Du Li	d1777910a8	fix onnx server build failure. (#4347 )	2020-06-26 15:15:58 -07:00
liqunfu	c3c4ce5ceb	refactor prototypes into headers (#4337 ) * refactor prototypes into headers	2020-06-26 12:02:14 -07:00
Yufeng Li	fc5e65a22d	Add quantization support for GPT2 past state and use model to generate outputs in OpTester (#4340 ) * Make quantization support GPT2 past state * Make OpTester to be able to generate reference outputs with a model. With it, there is no need to compute outputs manually, which are impossible for some cases.	2020-06-26 09:29:29 -07:00
S. Manohar Karlapalem	ceedf126a2	[nGraph] Deprecation notice for nGraph EP (#4344 )	2020-06-26 01:15:34 -07:00
ytaous	381f4c442a	LayerNormFusion - Cast support (#4320 ) * cast support for layernormfusion * cast support for layernormfusion * bug fix * fix build * bug fix * fix test * minor refactor * on comments * on comments * on comments * on comments Co-authored-by: Ethan Tao <ettao@microsoft.com>	2020-06-26 00:04:12 -07:00
gwang-msft	9e0f5fc7af	The initial PR for NNAPI EP (#4287 ) * Move nnapi dnnlib to subfolder * dnnlib compile settings * add nnapi buildin build.py * add onnxruntime_USE_NNAPI_BUILTIN * compile using onnxruntime_USE_NNAPI_BUILTIN * remove dnnlib from built in code * Group onnxruntime_USE_NNAPI_BUILTIN sources * add file stubs * java 32bit compile error * built in nnapi support 5-26 * init working version * initializer support * fix crash on free execution * add dynamic input support * bug fixes for dynamic input shape, add mul support, working on conv and batchnorm * Add batchnormalization, add overflow check for int64 attributes * add global average/max pool and reshape * minor changes * minor changes * add skip relu and options to use different type of memory * small bug fix for in operator relu * bug fix for nnapi * add transpose support, minor bug fix * Add transpose support * minor bug fixes, depthwise conv weight fix * fixed the bug where the onnx model input has mismatch order than the nnapi model input * add helper to add scalar operand * add separated opbuilder to handle single operator * add cast operator * fixed reshape, moved some logs to verbose * Add softmax and identity support, change shaper calling signature, and add support for int32 output * changed the way to execute the NNAPI * move NNMemory and InputOutputInfo into Model class * add limited support for input dynamic shape * add gemm support, fixed crash when allocating big array on stack * add abs/exp/floor/log/sigmoid/neg/sin/sqrt/tanh support * better dynamic input shape support; * add more check for IsOpSupportedImpl, refactored some code * some code style fix, switch to safeint * Move opbuilders to a map with single instance, minor bug fixes * add GetUniqueName for new temp tensors * change from throw std to ort_throw * build settings change and 3rd party notice update * add readme for nnapi_lib, move to ort log, add comments to public functions, clean the code * add android log sink and 
more logging changes, add new string for NnApiErrorDescription * add nnapi execution options/fp16 relax * fix a dnnlibrary build break * addressed review comments * address review comments, changed adding output for subgraph in NnapiExecutionProvider::GetCapability, minor issue fixes * formatting in build.py * more formatting fix in build.py, return fail status instead of throw in compute_func * moved android_log_sink to platform folder, minor coding style changes * addressed review comments	2020-06-26 00:02:39 -07:00
Negin Raoof	37cbe8551d	Adding export registration and tests for custom ops (#4248 )	2020-06-25 22:29:02 -07:00
Josh Bradley	990b43ddf2	Add modern C++ standards to the C++ API (#4217 ) As a zero-cost wrapper around the C API, the current state of the C++ API is still pretty low-level and requires programmers to use C-style standards to interact with ONNX.	2020-06-25 22:28:00 -07:00
Tracy Sharpe	72fb5183d4	Fix Windows ARM64 break (#4343 )	2020-06-25 21:06:18 -07:00
Chih-Hsuan Yen	a37e2e33b4	Add compatibility with Protobuf 3.12 (#4291 ) In Protobuf 3.12, classes generated from protobuf files are declared as `final`, so use those classes as members rather than base classes. Ref: https://github.com/protocolbuffers/protobuf/releases/tag/v3.12.0	2020-06-25 20:34:08 -07:00
Changming Sun	5db67ec000	Fix python package issue and upgrade the linux image to 2010 (#4342 ) 1. Increase job timeout, while we are investigating why the tests take much longer 2. Upgrade the linux docker image to manylinux2010, by request from Tianlei. (We had an offline discussion with Pranav and Tracy) 3. Remove the installation of "devtoolset-7" in the CUDA image. It was added for CUDA 10.0, it is not needed for CUDA 10.1. We have moved to CUDA 10.1.	2020-06-25 20:22:39 -07:00
Shucai Xiao	bfc888613f	Migraphx improvements (#4328 ) * Add amd migraphx execution provider to onnx runtime * rename MiGraphX to MIGraphX * add migraphx EP to tests * support multiple program output * disable more tests * backup changes related to program multiple outputs * remove logging code * remove unnecessary changes in migraphx_execution_provider.cc * add migraphx EP to tests * add input requests of the batchnorm operator * add to support an onnx operator PRelu * update migrapx dockerfile and removed one unused line * chagnes related to support dynamic input shape * fix build error * code backup * code backup * version that has 106 models run correctly * code backup * code backup * remove unnecessary print info * code backup * code backup * code backup * code backup * code backup * code backup * changes corresponding to migraphx change * fix merge conflict * minor code cleanup * code cleanup * remove unnecessary code * remove unnecessary code * add to support more constant folding analysis * more constant folding checking for shape input * add env var to control whether fp16 is enabled. Modify docker file to use ROCM3.3 * fix function name to avoid build error * add build and execution instruction for migraphx execution provider * added more build instructions * fixed a small format error * a minor change * fix review comments * another minor change * additional refinement of the documents * additional changes * remove unnecessary changes in the dockfile * additional changes for the dockerfile * code change backup * fix errors related to a few unit tests * fix a build error related to api change * fix unit test errors by either disabling the test or fix related isssues * remove unnecessary log info * sync submodule tvm with master * remove unnecessary changes * remove an unnecessary code line * refine documents for addition example	2020-06-25 19:22:57 -07:00
edgchen1	0b450dcd9f	Enable BiasGelu fusion for training (#4146 ) Add gradient for BiasGelu and FastGelu with bias. Enable BiasGeluFusion and GeluApproximation transformers in TrainingSession.	2020-06-25 17:48:12 -07:00
Faith Xu	b544f5c83c	Sample updates (#4303 ) * Add section for product integrations * Wording updates	2020-06-25 16:09:17 -07:00
Du Li	645a988c04	Support binding input only for IOBinding in python api. (#4079 ) * Support binding input only in python api. * Addressing PR comments. * fixing build issues	2020-06-25 12:20:02 -07:00
Dmitri Smirnov	a08805daf9	Fix a minor typon in POM file name (#4250 ) Co-authored-by: Changming Sun <chasun@microsoft.com>	2020-06-25 11:15:14 -07:00
Tim Harris	3fc68cb150	Remove non-trivially-destructible thread-local from thread pool state, blocking ARM64 builds (#4336 ) - Move thread hint vectors from thread-local struct - Add static_assert that the per-thread state in the thread pool is trivially-destructible - Rename "thread_data" to "worker_data" (only allocated for workers in the pool, not threads calling into the pool)	2020-06-25 19:04:31 +01:00
George Wu	a3b466cdf1	fix python ep default ordering. (#4335 ) * fix python ep default ordering. cpu provider should be last. * add comment. * add test case to ensure no regressions for get_all_providers(). * expand on get_all_providers() api documentation	2020-06-25 04:25:43 -07:00
Prabhat	151ef1c8a5	Add C++ wrapper for GetAvailableProviders() C API (#4313 )	2020-06-25 13:11:55 +05:30
edgchen1	a6d10376df	Fix build error when USE_NCCL is defined. (#4334 )	2020-06-24 23:32:31 -07:00
Josh Bradley	0d9db2b28d	add informative error message regarding symbolic dimensions (#4297 ) * add informative error message regarding symbolic dimensions * fix code format and move negative value check in for loop	2020-06-25 11:56:14 +10:00
Aaron Bockover	64264c3846	Allow --cmake_generator to work on macOS (#4278 )	2020-06-24 16:30:33 -07:00
S. Manohar Karlapalem	15c07c75f8	[OpenVINO-EP] Upgrade version info to 2020.3 in docs (#4304 ) * Upgrade version to 2020.3 in docs * update online installer size for 2020.3 * update OV 2020.3 install dir path	2020-06-24 15:01:55 -07:00
Tim Harris	a241eb0bbe	Renaming --partition_optimizer to --deepspeed_zero_stage (#4312 ) * Rename partition_optimizer -> deepspeed_zero * Use ZeROConfig in orttraining_pybind_state.cc * deepspeed_zero -> deepspeed_zero_stage for clarity * Expose as deepspeed_zero_stage in pybind	2020-06-24 22:05:03 +01:00
Cecilia Liu	7e71ff2a1f	Match Reshape Subgraph Pattern For GPT2 (#4279 ) Reshape fusion for one element subgraph patterns.	2020-06-24 10:07:30 -07:00
Tim Harris	5c6a27408a	Remove signed/unsigned compiler warnings, add additional pipeline test case (#4314 ) * Avoid signed/unsigned warning on loops * Report sizes when distributed world configuration is inconsistent * Add DistributedRunContextTest for pipeline stage configuration	2020-06-24 11:36:18 +01:00

2799 commits