onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-03 03:58:54 +00:00

Author	SHA1	Message	Date
Hariharan Seshadri	cffa1b7bf2	Fix (#3812 )	2020-05-05 02:08:13 -07:00
Changming Sun	bd78364411	Parallel all the activations ops (#3722 ) 1. Parallel all the activations ops. 2. Parallel the performance critical path of the LRN op, which makes the ONNX model zoo googlenet model runs 60% faster(latency reduced from 21ms to 13ms). 3. Make the Gemm-Activation fusion support with all the activations ops. Before this change, it only supports LeakyRelu/Relu/Sigmoid/Tanh. 4. Delete onnxruntime/test/framework/op_kernel_test.cc because the file is almost empty. 5. Remove the loggings in KernelRegistry::TryFindKernel, return Status with error message instead.	2020-05-05 01:18:17 -07:00
Changming Sun	c11fbf68e4	Publish gpu package to nuget feed (#3816 )	2020-05-04 21:49:19 -07:00
Scott McKay	b386b41703	Fix bug in GRU when linear_before_reset is true and no bias input is provided (#3797 ) * Allocate linear_output_ when linear_before_reset is true and there is no bias input. Add test for this combination.	2020-05-05 13:15:57 +10:00
M. Zeeshan Siddiqui	a24c71af40	Update Dropout(12) forward kernel with training_mode input. (#3805 ) * Update Dropout(12) forward and backward kernel with training_mode input. * Revert deleted assert. * clean up. * PR feedback.	2020-05-04 20:05:42 -07:00
Dmitri Smirnov	111469728f	Make Java build and run tests on Windows the box (#3811 ) Incorporate .DLL symbolic link names fix. Make unit tests run. Make gradle run on Windows.	2020-05-04 18:19:35 -07:00
M. Zeeshan Siddiqui	6f95cdfa68	Use new cost based threadpool abstractions in CPU gradient operators. (#3807 ) * Use ThreadPool abstractions instead of OpenMP. * PR feedback.	2020-05-04 15:23:10 -07:00
Yufeng Li	156368b67f	Quantize attention with Cuda (#3693 ) * Add definition of QAttention * implemention of QAttention on GPU	2020-05-04 14:20:38 -07:00
Tianlei Wu	49f0610447	Add options --disable_layer_norm, --disable_gelu and --enable_gelu_approximation (#3750 )	2020-05-04 14:06:57 -07:00
Sherlock	2f8a2364c3	Fix loss function builder (#3801 )	2020-05-04 10:41:15 -07:00
Hariharan Seshadri	785b45124d	Add CPU kernel for Einsum op (#3575 )	2020-05-03 23:48:38 -07:00
Yulong Wang	c8269e4b89	move backend test filters into data file (#3798 ) * move backend test filters into data file * update data * update data * update document * fix list for current_failing_tests_OPENVINO_CPU_FP32	2020-05-02 19:05:58 -07:00
Changming Sun	2684d47fc5	Disable data downloading in linux-nocontribops-ci-pipeline (#3803 ) * Disable data downloading in linux-nocontribops-ci-pipeline * update * update	2020-05-02 12:59:24 -07:00
Sheil Kumar	37b60251ca	test packaging (#3756 ) Co-authored-by: Sheil Kumar <sheilk@microsoft.com> Co-authored-by: Changming Sun <chasun@microsoft.com>	2020-05-02 12:23:33 -07:00
Changming Sun	ee8900e21a	Update centos-ci-pipeline.yml (#3800 ) * Update centos-ci-pipeline.yml	2020-05-02 11:04:23 -07:00
Jeff Bloomfield	d5b2cd7493	Add performance best practices to DML EP doc (#2859 ) * Add performance best practices to DML EP doc Co-authored-by: Jeff <38966965+jeffbloo@users.noreply.github.com>	2020-05-02 09:53:33 -07:00
Scott McKay	42cf971ca2	Add a couple of utility scripts to tools/python (#3621 ) * Add a helper script to more easily create a test directory for use with onnx_test_runner or onnxruntime_perf_test. Add example script that can be used as a base for performance testing a model with a variety of input sizes. Add __init__.py so files in this directory can be imported in other scripts. * Fix some flake8 warnings. Add example of specifying attribute for op. * Add ability for test dir creation to fill in all missing input data with random values. Add example of using test dir creation this way	2020-05-02 17:35:43 +10:00
edgchen1	440f361363	Remove orttraining-linux-gpu-inference-only-ci-pipeline.yml. (#3788 )	2020-05-02 00:35:08 -07:00
Sheil Kumar	43a828f0a2	Add tests for WinRT Projection Raw ABI consumption (#3718 ) Add tests for WinRT Projection Raw ABI consumption Co-authored-by: Sheil Kumar <sheilk@microsoft.com>	2020-05-02 00:33:17 -07:00
Tianlei Wu	3fab8ebfe9	(MaximKalininMS) Fix Reshape Fusion and Crash in Reshape (#3777 ) * Fix a crash in Reshape Reshape doesn't handle 0 input dimension properly, which leads to a division by zero * Fix reshape fusion https://github.com/microsoft/onnxruntime/pull/3554 introduced a bug: initializers can now come before Shape->Gather->Unsqueeze chains; if those initializers have more than 1 element, expected dimensions in the chains are now incorrect. Authored-by: Max Kalinin <makalini@microsoft.com>	2020-05-02 00:20:00 -07:00
Scott McKay	15eca74d15	Make ThreadPool::PartitionWork a bit more user friendly. Update a few places to use PartitionWork. (#3795 )	2020-05-02 17:09:55 +10:00
Pranav Sharma	2b8d9ef0fd	Refactor scatter/gather ops to use the new cost based threadpool abstractions. (#3776 ) * Update Scatter and Gather ops by replacing pragma omp invocations with the new threadpool abstractions. * Use forward declarations * PR comments	2020-05-02 17:09:31 +10:00
M. Zeeshan Siddiqui	4f9f6aedea	CUDA/CPU test for NegativeLogLikelihoodLoss(12) function based loss operator. (#3793 )	2020-05-01 21:36:29 -07:00
Sheil Kumar	b1c4d6ff4e	bump dml version (#3792 ) Co-authored-by: Sheil Kumar <sheilk@microsoft.com>	2020-05-01 18:16:00 -07:00
David Brownell	4b8fad214a	Initial checkin (#3791 )	2020-05-01 14:58:49 -07:00
Scott McKay	11b819054b	Fix tree ensemble threading bug (#3778 ) * Fix first instance where the calculation for TryBatchParallelFor was incorrect. Other usages need to be validated. * Fix some other usages of the threadpool.	2020-05-02 07:50:35 +10:00
Scott McKay	2fc3984e70	Add test that C is unidirectionally broadcast-able before fusing the MatMul with Add. (#3780 ) Addresses #3764	2020-05-02 07:36:21 +10:00
Xueyun Zhu	e8e95110d3	add pipeline to distributed context config (#3789 ) * add pipeline to distributed context * white space	2020-05-01 13:49:51 -07:00
M. Zeeshan Siddiqui	517bff9675	Function expansion support and Update ONNX to 1.7 release candidate 1. (#3782 ) * Function expansion support, Update ONNX to 1.7 release candidate 1. * Renable disabled tests.	2020-05-01 10:35:16 -07:00
edgchen1	047975e404	Address flaky test ReduceApiTest.Sum. (#3716 ) Increase test comparison tolerance. Add output of random seed value for easier debugging later. Unify RandomValueGenerator::Uniform() to consistently use [min, max) interval.	2020-05-01 09:18:26 -07:00
Changming Sun	edd5855fb7	Remove eigen device from thread pool	2020-05-01 02:21:57 -07:00
George Wu	dcb1a21552	fix python package linux gpu failure (#3786 ) * pin base image for manylinux2010_gpu * pin base image for Dockerfile.manylinux2010	2020-05-01 17:04:59 +08:00
stevenlix	99ec93ea42	Apply onnx-tensorrt bug fixes (#3785 ) * merge latest onnx-tensorrt parser * differentiate kernel names between graph and subgraph * merge more TRT parser bug fixes * merge more onnx-tensorrt bug fixes * fix merge issue Co-authored-by: stevenlix <stevenlix>	2020-05-01 16:51:48 +08:00
Pranav Sharma	e42e0d4787	Update documentation + Update mlas threading lib to use the new TrySimpleParallelFor. (#3779 )	2020-05-01 00:23:06 -07:00
pengwa	29234458af	disable cublasHgemm for training (#3769 ) * disable cublasHgemm for training	2020-05-01 13:57:37 +08:00
pengwa	98b97be635	collect the last few iteration latency for throuput calculation (#3766 )	2020-05-01 13:24:17 +08:00
Ori Levari	ad63e2593d	avoid using LocalFree on FormatMessageA buffer (#3772 ) * avoid using localfree for FormatMessageA buffer because it is only supported on windows 10 Co-authored-by: Ori Levari <orlevari@microsoft.com>	2020-04-30 20:51:09 -07:00
Dmitri Smirnov	f68a326bd9	Implement Pow(12) for cpu and cuda (#3727 ) * Implement Pow(12) cpu and cuda.	2020-04-30 15:29:39 -07:00
Pranav Sharma	027a364922	Remove usage of openmp in reverse seq impl. (#3754 )	2020-04-30 14:44:25 -07:00
Changming Sun	62c730a8df	Revert softmax kernel implementation (#3753 ) Revert softmax kernel implementation to the previous version.(commit `4d26f2ce86`) For 1. Reducing the binary size. Currently the single kernel is 500KB. 2. Removing dependency on Eigen::ThreadPoolDevice	2020-04-30 14:38:41 -07:00
RandySheriffH	86eaa71ec6	sync threads before calling next cub function (#3758 ) Co-authored-by: RandySheriffH <rashuai@microsoft.com>	2020-04-30 14:16:46 -07:00
liqunfu	af3988198c	Liqun/e2e transformer test (#3540 ) * initial change to transformer.py * prepare e2e transformer tests * refactor transformer tests * put test python files in a flat folder * fix typo pip install transform(s) * python 3.6 * python version to 3.6 in install_ubuntu.sh * remove argparser * to use opset ver 12 * workaround loss_scale naming patch in case of loss_fn_ * assign self.loss_fn_ so it can be checked * skip a few un-needed post-process steps * fix loss_scale_input_name, clean up post process steps * skip non-frontend tests * move cpu/cuda related files to coresponding cpu/cuda folder (#3668) Co-authored-by: Weixing Zhang <wezhan@microsoft.com> * type cast for ratio is not necessary for dropout (#3682) Co-authored-by: Weixing Zhang <wezhan@microsoft.com> * thrustallocator is not needed since cub is used directly for gather now. (#3683) Co-authored-by: Weixing Zhang <wezhan@microsoft.com> * GatherND-12 Implementation (#3645) * Renamed, UT passing * Move GatherND CUDA Kerenl into onnxruntime * Merge GatherNDOpTest * Refactor Test code * Merge CPU Kernel Impl * Handle Negative Indice, Fix UT * Improve CUDA kernel to handle negative index * Minor Fixes * Preserve GatherND-1 Cuda kernel * Fix Mac build * fix UT * Fix Build * fix GatherNDOpTest.double > CUDA error cudaErrorInvalidDeviceFunction:invalid device function Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Peng Wang (pengwa) <pengwa@microsoft.com> * update with reviewers' comments * testBertTrainingGradientAccumulation was not using rtol and may fail occasionally with small (e-06) difference * fix merge mistakes Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Weixing Zhang <weixingzhang@users.noreply.github.com> Co-authored-by: Weixing Zhang <wezhan@microsoft.com> Co-authored-by: Sherlock <baihan.huang@gmail.com> Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Peng Wang (pengwa) <pengwa@microsoft.com>	2020-04-30 12:26:38 -07:00
pengwa	177c1357f4	Use cublasHgemm "back" for fp16 computation with Volta GPU (#3765 ) * Use cublasHgemm for fp16 computation with Volta GPU	2020-05-01 00:36:07 +08:00
Scott McKay	3421ec1110	Add Threadpool::TrySimpleParallelFor (#3759 ) * Add TrySimpleParallerFor so that there's a path with OpenMP awareness for SimpleParallelFor. Makes it consistent with [Try]BatchParallelFor and [Try]ParallelFor. Update TopK to check for the number of threads better, and to use TrySimpleParallelFor. * Update doco to mention TrySimpleParallelFor	2020-04-30 20:03:33 +10:00
M. Zeeshan Siddiqui	b9a5ed1fe2	Add SoftmaxCrossEntropyLoss to mixed-precision-transformer. (#3760 )	2020-04-30 02:48:21 -07:00
Scott McKay	9f72752397	Fix 'Install ONNX' CI failure (#3761 ) * Disable flaky test temporarily * turn off pip upgrade warning Co-authored-by: Aishwarya Bhandare <aibhanda@microsoft.com> Co-authored-by: Zeeshan Siddiqui <mzs@microsoft.com>	2020-04-30 18:18:58 +10:00
pengwa	0531acccc5	Refine GatherND CPU/CUDA Kernels & Add UTs (#3688 ) * Refactor GatherND CPU Kernel (Renaming & Simplify) * Add batch_dim=1 or 2, negative slices tests * Rename gather_nd_gard_impl.cu * Use dispatcher to refactor CUDA GatherND/GatherNDGrad * Change GatherNDBase::CommonComputeKernel --> GatherNDBase::PrepareCompute * Use HasCudaEnvironment instead of __CUDA_ARCH__ for some double type tests	2020-04-30 10:17:54 +08:00
ashbhandare	58f53966d3	Add Distributed Checkpointing support (#3639 ) * Change naming of moments to Moment_x_<weight_name> * Add checkpointing code and zero checkpoint aggregation * Correct aggregation for LAMB, cleanup * Add simple checkpointing test * Add test for zero checkpoint aggregation * Fix tests * fix test * Review changes * Fix test after review comment fix * Fix API, test * Fix test after API change * Decouple save load from ORTTrainer * Add flag to not break checkpointing with ORTModel' Co-authored-by: aishwarya bhandare <aibhanda@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-04-29 14:52:21 -07:00
David Brownell	7296e06dd5	Properly creating arguments to pass to setup.py (#3744 )	2020-04-29 09:47:51 -07:00
suffiank	ea0e2d1dde	fix warning treated as error due to ignoring return status (#3739 ) Co-authored-by: suffian khan <sukha@microsoft.com>	2020-04-29 02:38:53 -07:00

2450 commits