onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-17 18:40:28 +00:00

Author	SHA1	Message	Date
Edward Chen	cd3a5acca0	Update get_docker_image.py to enable use without image cache container registry. (#6177 ) Update get_docker_image.py to enable use without image cache container registry.	2020-12-18 19:01:02 -08:00
Derek Murray	11b0a5401e	Fix typo in BERT pretraining script (#6175 ) A misplaced `}` meant that the `'enable_adasum'` option was interpreted incorrectly, causing the test to fail.	2020-12-18 16:38:14 -08:00
Guoyu Wang	bbb52e9274	[NNAPI EP] Enable per-channel quantization for QlinearConv (#6155 ) * Enable qlinearconv per-channel quantization * Fix the android CI test failure * Add Android Version Check for Per-Channel Quant * Address PR comments * Fix some minor issues * Add verification of per-channel zero points * Make the error tolerance configurable	2020-12-18 16:13:22 -08:00
baijumeswani	39aedbc97f	aggregate model states only for the case when mixed precision was true (#6176 )	2020-12-18 14:09:32 -08:00
Pranav Sharma	86493e6d0c	Update documentation for contributing a PR and add deprecation notices for PyOp and ORT server. (#6172 )	2020-12-18 02:00:42 -08:00
Sergii Dymchenko	824ef9a1de	Don't try to bind unused inputs in the Training frontend (#6166 )	2020-12-17 21:41:28 -08:00
baijumeswani	adc2071043	save_checkpoint, load_checkpoint and aggregate_checkpoints (#6136 ) * save_checkpoint and load_checkpoint implementations * checkpoint aggregation logic * unit tests for save_checkpoint, load_checkpoint and aggregate_checkpoints	2020-12-17 21:01:36 -08:00
Guoyu Wang	c339bb2da9	Remove ignored build warnings for pybind on Mac (#6165 )	2020-12-17 19:54:28 -08:00
Yufeng Li	98d8a3e335	Revert "Fuse MatMulIntegerToFloat only when scales are scalar (#6008 )" (#6169 ) This reverts commit `f2dcba7afe`.	2020-12-17 19:53:50 -08:00
Du Li	34725ae520	Bugfix for topk cuda kernel (#6164 ) * fix the issue that std::numeric_limits cannot handle half type * adding a test Co-authored-by: Du Li <duli@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-12-17 17:59:37 -08:00
Jay Rodge	dec703b62d	Update TensorRT-ExecutionProvider.md (#6161 )	2020-12-17 17:10:40 -08:00
Tixxx	32c67c2944	Deprecating Horovod and refactored Adasum computations (#5468 ) deprecated horovod submodule refactored adasum logic to be ort-native added tests for native kernel and e2e tests	2020-12-17 16:21:33 -08:00
Pranav Sharma	efa1b0d864	Minor fix to satisfy c++14 (#6162 )	2020-12-17 13:53:24 -08:00
Juliana Franco	36c03b32e9	Using a map of of ops to stages as input of partition function. (#5940 ) * New partition algorithm running before AD * Convert cut_group_info into device map. Work in progress -- works for bert-tiny with pp=2 * Removing code for partition of bwd graphs * Remove old code * Adding some verification code * Handle Shared Initializer * Renaming rank with stage * Added first unit test * new test * redundant check * undo change in bert * Moved cut-based partition to testing utils file Co-authored-by: xzhu1900 Co-authored-by: wschin * New conversion function and tests * minor * remove test that is not needed2 * improve GetDeviceAssignment and PR comments * minor changes * PR comments * improving documentation and variable naming * add documentation * Variable naming and docs * more doc improvements * more doc improvements * missing static cast * Fix test file for windows * Fix test file for windows * Fix test file for windows * stage id is not the same as rank id * PR comments * PR comments * More comments * More comments	2020-12-17 09:03:33 -08:00
Tracy Sharpe	503b61d897	MLAS: add NEON version of int8 depthwise convolution (#6152 )	2020-12-16 18:39:10 -08:00
Edward Chen	0fa04bdc50	Fix clean_docker_image_cache.py detection of image pushes. (#6151 ) Fix clean_docker_image_cache.py detection of image pushes. They were being ignored because the expected HTTP status code was wrong. For pushes, it's 201 instead of 200.	2020-12-16 17:25:22 -08:00
Changming Sun	344a2a8ee5	Revert "work around of the build break in mac (#6069 )" (#6150 ) This reverts commit `3cae28699b`.	2020-12-16 14:41:18 -08:00
Scott McKay	7250562271	Fix edge case in BFCArena where allocation failures could lead to an infinite loop. (#6145 ) #4656	2020-12-17 07:52:31 +10:00
ashbhandare	82690486c1	Partition initial optimizer state for Zero-1 (#6093 ) * Initial changes * Working changes * Working changes * Cleanup * fix windows CI * Review comments * review comments	2020-12-16 15:27:42 -05:00
Derek Murray	8fd085801a	Add gradient registration for Abs. (#6139 )	2020-12-16 08:32:10 -08:00
stevenlix	aa49e476b0	Fix TensorRT kernel conflict issue for subgraphs of control flow operators (#6115 ) * add static subgraph kernel index * change kernel naming to avoid conflicts	2020-12-16 00:04:53 -08:00
Yateng Hong	0978d2bfbe	Fix CUDA test hang: (#6138 ) - Make condition check in `CUDAAllocatorTest` to ensure CUDA device is present.	2020-12-16 16:32:56 +10:00
Guoyu Wang	b648bf641f	nnapi add min max support (#6117 )	2020-12-15 22:31:28 -08:00
George Nash	939cc9b410	Enable running the mnist_training sample without cuda (#6085 ) Signed-off-by: George Nash <george.nash@intel.com>	2020-12-15 17:06:54 -08:00
Ryan Hill	ac62cf8058	Unify IExecutionProvider and IExecutionProviderFactory interfaces (#6108 ) * Remove Provider_IExecutionProvider and make the internal IExecutionProvider usable by shared providers * Change Provider_IExecutionProviderFactory to be the core version.	2020-12-15 16:45:53 -08:00
Cecilia Liu	980a93c164	Model Fusion For Bart (#6105 ) Fusion fix for Bart models	2020-12-15 14:30:15 -08:00
George Wu	297c824807	remove dnnl_dll_path from post build copy (#6142 )	2020-12-15 13:47:39 -08:00
Edward Chen	64709b1335	Deprecate Python global configuration functions [Part 1] (#5923 ) Enable options to be set via execution provider (EP)-specific options and log deprecation warning from current global configuration functions.	2020-12-15 11:32:43 -08:00
Jesse Benson	a8d549e181	Minor changes to AMD element-wise kernels to converge with CUDA element-wise kernels.	2020-12-15 08:46:36 -08:00
Pranav Sharma	a9548283d0	Don't mark issues that are marked as enhancement as stale (#6134 )	2020-12-14 18:57:40 -08:00
Edward Chen	9810b9e02b	Reduce amount of compiled CUDA device code (#6118 ) Move CudaKernel from cuda_common.h to a new separate header, cuda_kernel.h. Update include sites to use cuda_kernel.h instead if they need CudaKernel. Inclusions of cuda_common.h are now more lightweight. Make corresponding changes for ROCM execution provider code. Other minor cleanup.	2020-12-14 15:27:40 -08:00
Sheil Kumar	a6a23db130	Enable C# .NET5 for WinML (#6120 ) * build for .net5 * only reference cswinrt for .net5 * remove netstandard2.0 references * upgrade language version * net5 * remove extra comment closure * add targetframework * set target framework * remove net* * pep8 errors * make test project build with .net windows SDK projection * disable c# builds for non-x64 builds * fix pep8 errors * disable for store build * fix tests * remove cswinrt and sdk references from package * bump cswinrt down to 1.0.1 * fix bin path Co-authored-by: Sheil Kumar <sheilk@microsoft.com>	2020-12-14 15:05:15 -08:00
Sherlock	eb5c1f0fcc	Unify activation and initializer alignment value (#6109 ) * Unify activation and initializer alignment value * Fix VerifyInputTensorsAllocatedContiguously	2020-12-14 13:13:41 -08:00
liqunfu	cde723a136	Liqun/move nightly pl to linux multi gpu v100 (#6024 ) * move e2e nightly pipeline to azure devop Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-12-14 12:43:41 -08:00
baijumeswani	dd2e5a1a05	state_dict and load_state_dict for ORTTrainer (#6095 ) * add functions state_dict and load_state_dict to ORTTrainer * unit tests for state_dict and load_state_dict for ORTTrainer	2020-12-14 11:55:52 -08:00
dependabot[bot]	d4dddd99d9	Bump ini from 1.3.5 to 1.3.8 in /nodejs Bumps [ini](https://github.com/isaacs/ini) from 1.3.5 to 1.3.8. - [Release notes](https://github.com/isaacs/ini/releases) - [Commits](https://github.com/isaacs/ini/compare/v1.3.5...v1.3.8) Signed-off-by: dependabot[bot] <support@github.com>	2020-12-12 13:06:43 -08:00
Hariharan Seshadri	c755ca0b71	Honor auto_pad attribute in ConvTranspose (#4271 )	2020-12-11 22:30:17 -08:00
Suffian Khan	6cb5d3ac09	Fix multi-tensor LAMB reduction to be deterministic (#6028 ) * define ordering of reduction across blocks * save state * remove debug code * remove debug code * review comments * significant correction for reduction only over blocks on same tensor * addressing ocmments * update rocm/lamb.cc to build as well * remove times 2048size in multitensor test until threshold error in rocm resolved convert tuple => struct as per recomendation * update comment * apply perfect forwarding for launch_multitensor to permit passing ref rather than pointer * remove excess template arguments from rocm lamb.cc launch_multitensor as well * fixes for AMD build * pr comments * run formatter from vscode * formatter on cuda files	2020-12-11 13:13:05 -08:00
Edward Chen	c8ac34d6a5	Fix DEBUG_NODE_INPUTS_OUTPUTS test by putting it in a separate process, clean up unused test_main.cc files. (#5949 ) Move the DEBUG_NODE_INPUTS_OUTPUTS test into its own process. The implementation uses static variables which do not interact well with other tests. Clean up old test_main.cc files which are no longer used.	2020-12-11 11:36:58 -08:00
Sherlock	a53f4dd379	Introduce VariadicAlias, remove hardcoded alias limits (#6106 ) * Introduce VariadicAlias, remove hardcoded alias limits * Include optional-lite in winml build Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-12-11 10:47:08 -08:00
Jesse Benson	38c49c2483	Make ROCM and CUDA reduction_all code more similar.	2020-12-11 09:35:07 -08:00
Ryan Lai	1eb146f561	Implement conversion from ORT String to WinML Tensor String (#6097 ) * Implement conversion from ort string to winml string * NIT:comment	2020-12-10 17:47:50 -08:00
Ryan Lai	8bcb5fd119	Add skip test reason for onnx model zoo models and tier 2 models (#6081 )	2020-12-10 14:41:17 -08:00
Ryan Lai	753af576c4	If building inbox, hook up winrt_activation_handler for WinML Tests (#6074 ) * If building inbox, hook up winrt_activation_handler with what is already defined in in dllload.cpp * Add base.h header * Missed custom ops test	2020-12-10 14:41:01 -08:00
Du Li	e945b5fcf6	adding fp16 support for topk cuda kernel (#6082 ) * adding fp16 support for topk. * disable fp16 tests for cpu ep Co-authored-by: Du Li <duli@OrtTrainingDev0.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-12-10 11:04:19 -08:00
Vincent Wang	7ddeafdfcc	Add ReduceL2Grad and ClipGrad (#5970 ) * ReduceL2Grad and ClipGrad. * fix win build and amd ci pipeline * resolve comments. Co-authored-by: Vincent Wang <weicwang@AiFramework2080ti2.corp.microsoft.com>	2020-12-10 11:03:26 +08:00
RandySheriffH	404982ded5	Enable varied input type for custom op (#6066 ) * allow custom op taking varied types * refactor test case * add test model * refactor test case * enable copy elision * update test case * fix issue in ToString function	2020-12-09 15:10:42 -08:00
Jesse Benson	cc47cfcb31	Update AMD transpose to match CUDA transpose.	2020-12-09 11:00:18 -08:00
Edward Chen	abdbb5fc84	Reduction kernel optimization (#6088 ) Optimize reduction kernel code by moving loads from global memory before computation. Add CMake option to build CUDA code with --generate-line-info option.	2020-12-09 10:20:23 -08:00
Sergii Dymchenko	9e26e59a37	Deprecate opsets <12 for training. (#6027 )	2020-12-09 00:15:27 -08:00

3953 commits