onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-03 03:58:54 +00:00

Author	SHA1	Message	Date
Weixing Zhang	5b7dc5aeee	fix build failure for ROCm EP (#5816) The kernel declaration of Identity needs to be updated in ROCm EP since ROCm EP shares the implementation of Identity with CUDA EP in which it has been changed due to opset 13 support.	2020-11-15 10:36:15 -08:00
Jesse Benson	ced5b66306	Re-enable multi-tensor-apply for LAMB optimizer	2020-11-15 09:35:00 -08:00
Weixing Zhang	fc614ad050	revert the code change which was based on `b4869926` The change `b4869926` which was to remove per-thread allocator would cause seg fault for distributed training. In addition, add dockerfile for ROCm3.9	2020-11-15 00:24:32 -08:00
RandySheriffH	c23fbba463	Fix reduce pipeline by replacing model (#5813) * update model and better comment * fix parameter Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2020-11-14 20:17:23 -08:00
Scott McKay	3269e59b2c	Add opset 13 registration for Identity. (#5800) * Add opset 13 registration for Identity.	2020-11-14 21:40:24 +10:00
Ori Levari	157d1844fb	Named Dimension Override internals test and experimental API (#5805)	2020-11-13 21:21:11 -08:00
Ye Wang	262e9ef21d	Support input dimension swap in Attention op (#5774) * checkin cpu * checkin cpu * add test * cuda * update comments * review comments * update * modify var name * remove unnecessary error msg * fix comments Co-authored-by: wangye <wangye@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-11-13 18:29:08 -08:00
sfatimar	dfbf6d78be	OpenVino: fix allocation failure on Windows for RelWithDebInfo build (#5713) * ng_supported_ops * Remove ng_supported_ops * Revert "Remove ng_supported_ops" This reverts commit 3c27385b2d88c6e8cf7ac4e8c290a367ad5d0bd8. * Revert "ng_supported_ops" This reverts commit 650721ae2913b79739521d58838298e031abdac1. * cmake changes to ensure that the debug build on Windows links to debug builds of openvino and does not result in a bad allocation error Co-authored-by: sfatimar <sahar.fatima@intel/com>	2020-11-13 07:59:52 -08:00
Vincent Wang	0c8902cbbe	Update Gradient Builder of Some Ops for OpSet13 (#5748) * gradient builder for opset13 * code clean. * resolve comments * stop grad for axes input * add split to stop grad list. Co-authored-by: Vincent Wang <weicwang@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-11-13 16:20:34 +08:00
Yufeng Li	1f722863b2	Scale Bias post processor for ARM (#5795)	2020-11-12 21:12:23 -08:00
jeyblu	435b904f0e	add dnnl gpu engine (#5788)	2020-11-12 20:17:54 -08:00
Ryan Lai	0ea998134a	Skip new x86 tests in ort model tests (#5789)	2020-11-12 18:08:11 -08:00
Dmitri Smirnov	2f35e65135	Add Float16 and BFloat16 support to C# API (#5775) Add Float16 and BFloat16 support.	2020-11-12 17:57:08 -08:00
edgchen1	4d517c68a3	Fix reference to old download_e2e_test_data.py script. It was renamed to download_azure_blob.py. (#5790)	2020-11-12 15:48:06 -08:00
Alberto Magni	88c3704257	Add shape inference for additional ops. This commit adds shape inference support for the following ops: SoftmaxCrossEntropy, SoftmaxCrossEntropyLossGrad, SoftmaxCrossEntropyGrad, LayerNormalizationGrad.	2020-11-12 20:18:54 +00:00
Ryan Lai	4e29f48010	skip gpt2 test on x86 (#5787)	2020-11-12 11:49:47 -08:00
pengwa	49288de17c	Fix memory planning issues (#5752) * Fix memory planning issues * fix build * fix the wrong line...	2020-11-13 03:07:59 +08:00
alexzakv	44d3c31200	Winml_principles_change (#5727) * Contributing page change * Update WinML_principles.md * Update WinML_principles.md * Update WinML_principles.md * Updated * Update WinML_principles.md * Update WinML_principles.md * Update WinML_principles.md * Update WinML_principles.md	2020-11-12 10:39:24 -08:00
Guoyu Wang	dc0f7b8f82	Remove onnxruntime_session_options_config_keys.h from c_api (#5772) * Remove session config keys header from c_api * remove copy session config header in release package * Keep the session option config header in the package	2020-11-12 09:12:13 -08:00
stevenlix	54de618c2e	Improve TensorRT engine caching (#5737) * add profile caching to improve engine caching feature * Add comments * fix typo * add decryption for engine caching * Update tensorrt_execution_provider.cc * Update tensorrt_execution_provider.cc * Update tensorrt_execution_provider.cc * Update tensorrt_execution_provider.cc * Update tensorrt_execution_provider.cc * update onnx-tensorrt submodule * set opt profile to max value of the range * add hash to engine/profile name * Add calibration based INT8 quantization * add an option to enable both FP16 and INT8 * Update tensorrt_execution_provider.cc * add env variable to specify calibration file name * clean up code * Add comments and update TRT document * enable tensorrt basic test and add EngineCachingTest * clean up * update environment variable in the test * clean up	2020-11-12 08:56:45 -08:00
Vincent Wang	2a87108431	SoftmaxCrossEntropyLoss OpSet13. (#5777) Co-authored-by: Vincent Wang <weicwang@microsoft.com>	2020-11-12 15:50:34 +08:00
Hariharan Seshadri	b92fc66ea1	Support opset-13 specs of controlflow ops (Loop, If) (#5665)	2020-11-11 23:44:14 -08:00
Sherlock	07dc25e939	Compute global gradient norm according to 'enable_grad_norm_clip' (#5728) * Introduce PassThrough op to wait for all gradient ready before weight update * Compute gradient norm for fp32 runs * Update FE UT expected value * Respect enable_grad_norm_clip	2020-11-11 21:10:34 -08:00
Pranav Sharma	1ae58c960c	Allow turning off printing of shape when compiled with onnxruntime_DEBUG_NODE_INPUTS_OUTPUTS. (#5768) * Allow turning off printing of shape when compiled with onnxruntime_DEBUG_NODE_INPUTS_OUTPUTS.	2020-11-11 18:59:04 -08:00
ashbhandare	5aec34500d	Add megatron transforms for BART (#5521) * Large model export and run ORT Python support * Megatron change refine a bit workaround self attention issue use partitioned name for weights when megatron model parallel is enabled Fix Megatron Transformer Issue (caused by the renaming) Add UTs for T5 model parallel Fix megatron seed issue fix log a bit checkpointing changes + rebase Unintended reshape transform change t5 layer norm changes add t5 layer norm kernel use template for t5 layer norm template definition changes no build error add CPU cuda kernel first unit test other forward unit tests add T5LayerNormGrad Add c++ transform and test for T5 LN minor fix BART MLP Megatron transform Add concat slice transform + test Cosmetic improvements in concat slice transform Constant folding bug fix + megatron attention transform for BART Undo unnecessary changes * Cleanup * Remove unnecessary changes * Cleanup megatron * Windows build * Add self attention test graph * Correcting transforms + cleanup * review comments * review comments * fix build and test failures * Fix CI * fix windows CI Co-authored-by: Peng Wang <pengwa@microsoft.com> Co-authored-by: Aishwarya <aibhanda@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-11-11 16:21:36 -08:00
Hariharan Seshadri	a14cd6267b	Support opset-13 specs of softmax family ops (Softmax, LogSoftmax, Hardmax) (#5707)	2020-11-11 15:45:03 -08:00
Xavier Dupré	e5c8040c52	Improves performance of operator Transpose (#5550) * Improves implementation of transpose operator * Simplifies transposition when it is not really needed.	2020-11-12 00:25:25 +01:00
Maajid khan	a84a058f9e	[OpenVINO-EP] Enabling Multi Device support (#5740) * Enabling Multi Device support for UEP Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Minor fix added * Added a simple fix to determine OpenVINO version for Arm build as well Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>	2020-11-11 15:16:30 -08:00
Guoyu Wang	4207e99be3	[NNAPI EP] Move GetCapability independent of ModelBuilder (#5767) * Move GetCapability independent of ModelBuilder * minor code style fix * Move ort_enforce for same number of op_builders and op_support_checkers * minor code fix	2020-11-11 13:33:38 -08:00
Xueyun Zhu	d8ace07ad7	Add CPU send/recv for pipeline (#5315) * cpu send/recv * clean up send/recv * remove unused code * assert and nccl option for mnist * add build option to enable build with only cpu. Without this, nccl is always enabled which will break build on machine that only contains cpu * Add USE_MPI distinct from USE_NCCL/USE_HOROVOD * fix * fix * exclude cpu send/recv for machines without mpi Co-authored-by: Tim Harris <tiharr@microsoft.com>	2020-11-11 12:41:39 -08:00
Ashwini Khade	496fa18c96	fix graph partitioning for nested functions (#5755) * fix graph partitioning for nested functions * enable broken test for SCE	2020-11-11 11:38:27 -08:00
Derek Murray	bc1768c7f1	Stop gradient flowing to the `k` input of TopK (#5762)	2020-11-11 10:24:44 -08:00
Dmitri Smirnov	871af477d7	Fix outputs of Sequences and Maps exposure. (#5743) Fix outputs of Sequences and Maps exposure. Add more test conditions. Make sure RunWithBinding calls the right function.	2020-11-11 10:21:22 -08:00
liqunfu	1416d12f0b	Liqun/merge e2e pipelines (#5702) * Create an Azure Pipeline to merge cpp and python e2e pipelines into one. Still keep cpp e2e pipeline until this new pipeline is stable. Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-11-11 09:42:08 -08:00
Yufeng Li	2ba637c558	Implement Scale function for quant gemm (#5632) * Implement a Scale function for quantization. Quantized GEMM is always followed by scaling (PerTensor or PerColumn), and the result often needs to be accumulated into an existing matrix. This PR implements a post-processor that scales the quantized GEMM result and accumulates it into another matrix.	2020-11-10 23:34:38 -08:00
George Wu	cca8cd849a	update native build instructions for ACL on Jetson. (#5764)	2020-11-10 23:10:59 -08:00
Changming Sun	79350a642a	Update install_deps.sh: remove the unnecessary data generating step (#5758) We install onnx python package from this script, so python tests can run the tests for the latest commit which we are importing.	2020-11-10 22:19:03 -08:00
Guoyu Wang	0767c4fdfb	Fix x86 build break (#5759)	2020-11-10 20:33:27 -08:00
Guoyu Wang	042365029f	[NNAPI] Split OPBuilder IsOpSupported into a separated class (#5746) * init change * Split opbuilder into opbuilder and opsupportchecker * Update code comments * Address CR comments, some minor code updates	2020-11-10 15:00:38 -08:00
Scott McKay	6803e4ab44	Fix BatchNormalization registrations. (#5750) Add diatribe on how to correctly update registrations.	2020-11-11 07:32:26 +10:00
Alberto Magni	c75b7c5c47	[CMake] Enable NCCL only when enabling CUDA or ROCm support (#5516) Conditionally enable NCCL depending on CUDA and ROCm. Before this change, NCCL support was enabled unconditionally, even when building without CUDA or ROCm support. This caused the command `./build.sh --enable_training` to trigger the following CMake warning: "-- Could NOT find NCCL (missing: NCCL_INCLUDE_DIR NCCL_LIBRARY) CMake Warning at CMakeLists.txt:1282 (message): NCCL is not found. Please use --nccl_home to specify the path of NCCL. Otherwise, NCCL is disabled." This is a spurious warning because the user did not ask to search for NCCL.	2020-11-10 12:39:23 -08:00
Tim Harris	48b14b52b8	Remove Env::Task wrapper around std::function (#5753) This is a small perf / clean-up change. It removes the Env::Task abstraction which wraps a single std::function field, and adds at least one virtual method call overhead when creating a Task and when executing it. The POSIX and Windows implementations are now identical.	2020-11-10 20:22:07 +00:00
leqiao-1	2b1ebbc286	update MCR images table (#5509) Add tag 1.5.2 for images. Remove tensorRT image from table.	2020-11-10 11:47:59 -08:00
edgchen1	4c6118eb49	Update get_applicable_matrix_reduction() to combine dimensions of 1 with the given reduction axes. (#5734)	2020-11-10 10:32:50 -08:00
Hariharan Seshadri	63b85fc696	Fix VS 2017 build break (#5745)	2020-11-10 10:25:43 -08:00
Xavier Dupré	d59f057db3	enable string for operator Shape (#5742)	2020-11-10 18:38:36 +01:00
Xavier Dupré	8c74df2068	Add support for string with operator Expand (#5751)	2020-11-10 18:38:20 +01:00
Changming Sun	4094a09a56	Merge pull request #5731 from microsoft/snnn/rtti Disable RTTI in Windows GPU CI pipeline	2020-11-10 09:02:59 -08:00
Changming Sun	00b18d9dc5	Update InferenceTest.cs to exclude one more model in x86 mode	2020-11-10 09:02:43 -08:00
Tim Harris	5e44d25c5a	Support multi-loop parallel sections, use multi-loop sections in GRU (#5602) This PR updates the ThreadPool API to support multi-loop parallel sections. As with the OpenMP "parallel" construct, this allows per-loop work to be amortized over a series of loops. For ORT, it also promotes locality between successive loops in the sense that iteration X of one loop will tend to run on the same worker thread as iteration X of preceding loops. The change was developed while optimizing the implementation of a model that performed better with OpenMP. Profiling indicated that OpenMP was providing lower loop entry/exit costs and that, via OpenMP's static scheduling, it was leading to a lower L2 miss rate in the series of parallel loops used in GRU. The main changes are: - Addition of ThreadPool::ParallelSection and underlying support in the modified Eigen thread pool. - In EigenNonBlockingThreadPool.h, refactoring the RunInParallel method to support two variants: one that takes an existing parallel section object created by the caller, and another (used by default) that creates its own parallel section. - Simplify ThreadPool::LoopCounter (used by worker threads to claim loop iterations), basing it on an ID supplied by the underlying Eigen thread pool for affinity in a series of loops. - Fix a possible perf issue where a loop with iterations scheduled in batches would have more threads than batches available. - Use of parallel sections in the GRU operator. - Additional test cases in threadpool_test.h. - Additional comments at the top of threadpool.h and EigenNonBlockingThreadPool.h.	2020-11-10 12:24:57 +00:00

3750 commits