onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-02 03:55:34 +00:00

Author	SHA1	Message	Date
suffiank	0e12d05cd2	fixes for ort_trainer.py to resume from checkpoint (#3510 ) * fixes for ort_trainer.py to resume from checkpoint * define self.state_dict_ during init * add comment of explanation * add unit test for restore from checkpoint * fix file not found Co-authored-by: suffian khan <sukha@microsoft.com>	2020-04-22 16:33:58 -07:00
Weixing Zhang	e4fc83252d	Refactoring code related to WARP_SIZE. (#3623 ) 1. Centralize its definition in common.cuh. 2. Rename it to GPU_WARP_SIZE which can be extended to AMD GPU later. 3. Centralize warp shuffle functions. Co-authored-by: Weixing Zhang <wezhan@microsoft.com>	2020-04-22 15:19:06 -07:00
edgchen1	bb9b0ba5b3	Merge pull request #3607 from microsoft/edgchen1/merge_from_master Merge from master to ort_training	2020-04-22 13:22:32 -07:00
Wei-Sheng Chin	ab70625b29	Add Lamb shape inference (#3634 )	2020-04-22 11:32:28 -07:00
Edward Chen	8df5076d96	Merge remote-tracking branch 'origin/master' into edgchen1/merge_from_master	2020-04-22 17:16:00 +00:00
Edward Chen	8d09cefafc	Merge remote-tracking branch 'origin/ort_training' into edgchen1/merge_from_master	2020-04-22 16:56:15 +00:00
edgchen1	b518cb2a7a	Clean up OPTIONAL name conflict workarounds in ort_training. (#3622 ) * Clean up OPTIONAL name conflict workarounds. * Cleanup unnecessary header files onnx_protobuf.h Co-authored-by: Sherlock Huang	2020-04-22 09:07:55 -07:00
Vincent Wang	d3a2ac5c5c	Eliminate Useless Cast during Transformer. (#3606 ) * Remove Useless Cast during Transformer. * Resolve comments. * Check if graph can remove the node. Co-authored-by: Vincent Wang <weicwang@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-04-22 16:36:46 +08:00
Tianlei Wu	d69bc31309	Refine BERT optimization script options (#3618 ) * Remove parameters like --gpu_only --sequence_length. Update bert GPU notebook accordingly. * Remove input_int32 and float16 parameters from constructors of BertOnnxModel class and other classes derived from it. * Update gpt2 benchmark. Add comments in gpt2 notebook to indicate work in progress. Clear notebook output before official 1.3.0 release is ready.	2020-04-21 21:28:06 -07:00
Scott McKay	b4508dbdc6	Improve TopK performance. (#3612 ) * Update TopK implementation. - add faster heap - special case k=1 - update selector for when to use heap and when to use nth_element based on performance testing - parallelize if enough work to do - reduce templatized code - add some extra unit tests. Perf tested vs. master. Average speedup is 3.75x using this combination of input sizes: ``` batches = [10, 25, 50] batch_size = [8, 16, 32, 64, 128, 256, 512, 1024, 2048] k = [1, 2, 4, 6, 8, 16, 24, 32, 48, 64, 128] ``` For larger batches (e.g. 50x2048) the speedup is over 20x.	2020-04-22 10:05:13 +10:00
edgchen1	5492d02c4e	Remove Windows CUDA 9 build definition and helper scripts. (#3615 )	2020-04-21 15:22:27 -07:00
Sherlock	d66d5bb86a	Update Optimizer Domain and Opset (#3602 ) * Update Domain and Opset for SGD * Update Adam Domain and Opset * Update Lamb Domain and Opset	2020-04-21 15:06:02 -07:00
Edward Chen	47f1758fdc	Add --skip_onnx_tests to orttraining Windows builds.	2020-04-21 21:50:35 +00:00
Edward Chen	297ab43b0c	Add --enable_onnx_tests to Windows builds to allow set up of test data directory.	2020-04-21 20:34:55 +00:00
Edward Chen	2e4b9b1d0e	Disable CudaKernelTest.SoftmaxCrossEntropyLoss_LargeSizeTensor because it's flaky.	2020-04-21 20:30:45 +00:00
Edward Chen	28a0c863b1	Revert "Convert Gelu to use TryParallelFor (#3599 )" This reverts commit `2579a72a88`.	2020-04-21 18:45:20 +00:00
Edward Chen	d50c3e7a71	Fix GraphTransformationTests tests.	2020-04-21 18:43:49 +00:00
Pranav Sharma	9636da3951	Threadpool related changes. (#3564 ) Threadpool related changes. Don't create ORT threadpool if openmp is enabled (except for inter op threadpool). Created a new static function ThreadPool::NumThreads to account for openmp settings and null threadpool ptr. Log a warning when using SetIntraOpNumThreads when openmp is enabled. Added a document for ORT devs. Fix LSTM to use the new threadpool abstractions. Rename GetNumCpuCores to GetThreadAffinityMasks and move it to the Env class. Co-authored-by: Tracy Sharpe <tracysh@microsoft.com>	2020-04-21 09:57:39 -07:00
Adam Pocock	3dd3f84116	[Java] Adding model metadata support (#3573 ) * java - adding deployment information to build.gradle. * java - adding support for model metadata.	2020-04-21 02:28:15 -07:00
George Wu	1c37d5e6ec	debug option for dumping tensorrt subgraphs. (#3604 )	2020-04-21 11:55:30 +08:00
Edward Chen	87fad09c7b	Fix merge issue.	2020-04-21 03:44:44 +00:00
Edward Chen	daa14b64e3	Merge remote-tracking branch 'origin/master' into edgchen1/merge_from_master	2020-04-21 03:31:32 +00:00
edgchen1	ead00f97f3	Sync onnx_backend_test_series.py disabled tests (#3603 ) Make the set of disabled tests consistent between ort_training and master. Fix some regex patterns.	2020-04-20 18:00:53 -07:00
pengwa	e233e6ba45	Refactor - ScatterElements (#3559 ) Refactor ScatterElements using MLTypeCallDispatcherRet to refactor	2020-04-21 08:58:42 +08:00
Changming Sun	2579a72a88	Convert Gelu to use TryParallelFor (#3599 )	2020-04-20 17:32:39 -07:00
Changming Sun	911d125323	Remove openmp from gpu build	2020-04-20 17:13:54 -07:00
liqunfu	781e1c36be	Add front-end MNIST test (#3231 ) * add frontend mnist test * to use torch nightly with torchvision * remove incorrect comment per reviewer's comment * experiment torchvision import failure * experiment install_deps.sh * more experiment install_deps.sh * experiment install_deps.sh with --upgrade * Experiment with install_deps.sh. * Experiment with install_ubuntu.sh. * Use Ubuntu 18.04 and Python 3.6 for CI. * Update cmake version for CI. * Install MPI on Ubuntu 18.04 for CI. * Increase tolerance for MNIST test. * Go back to Ubuntu 16.04 for CI, fix installing from deadsnakes ppa. * Clean-up. * Update ort_trainer.py from ort_training. * Get default Ubuntu Python ver back to 3.5. * Add underscore to opset_version parameter name in ORTTrainer constructor. * Move loss/model wrap before the call for sample output. * Update expected values for MNIST test. Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Sergii Dymchenko <sedymche@microsoft.com>	2020-04-20 11:19:31 -07:00
edgchen1	f180b71f27	Support ONNX test version parsing from path on Windows in onnx_test_runner. (#3588 )	2020-04-20 10:02:51 -07:00
Sheil Kumar	31b6629e99	Fork WinML IDL Guids (#3591 ) Co-authored-by: Sheil Kumar <sheilk@microsoft.com>	2020-04-20 09:17:07 -07:00
Prabhat	381fee47ab	Added support to build onnxruntime with ACL (#3586 ) * Added support to build onnxruntime with ACL * Added ACL build instructions	2020-04-20 13:35:28 +05:30
Changming Sun	75426a3091	Fix build break	2020-04-19 18:32:46 -07:00
Zhang Lei	422266c445	Support conv transpose 1D in cuda provider. (#3300 ) * Support conv transpose 1D in cuda provider. * Clear some old comment. Enable conv_transpose_1d onnx test for cuda.	2020-04-19 22:07:34 +08:00
Scott McKay	7d5348f87e	Add ability to batch device copy for graph inputs and outputs. (#3580 ) * Add ability to batch device copy for graph inputs and outputs.	2020-04-19 17:51:07 +10:00
Prabhat	ea62b3435a	Clean up build.py code (#3466 )	2020-04-18 20:48:30 -07:00
Maxim Kalinin	fcf0f6ee9f	Generalize reshape fusion (#3554 ) * Generalize reshape fusion * Allow arbitrary number of Concat arguments * Apply fusion even when an output of an internal node is used elsewhere * Fix a bug when an internal node's output is the subgraph output * Simplify code	2020-04-18 20:47:23 -07:00
Tiago Koji Castro Shibata	14e387aa1a	Fix WinML namespace build break (#3583 ) * Add missing winrt namespace * Conditional compilation of dxcore code * Fix TAEF macros	2020-04-18 20:46:01 -07:00
Sherlock	56b223bc60	Implement OneHot CUDA Kernels (#3390 ) * Implement OneHot CUDA Kernels * Support fp16 * Use HandleNegativeAxis * Make MLFloat16 test GPU only	2020-04-18 17:41:39 -07:00
Hariharan Seshadri	1599562016	Fix BatchNorm CUDA kernel definition	2020-04-18 17:21:29 -07:00
Zhang Lei	c365822808	Refactor some for the calibate.py. Add QLinearAdd and QLinearMul support. Fix bugs loading jpgs not strict RGB, and typos in load_batch call. (#3542 )	2020-04-18 17:10:55 -07:00
Dmitri Smirnov	db9566f70d	Implement Inverse(12) for CPU and CUDA (#3485 )	2020-04-18 17:10:21 -07:00
Dmitri Smirnov	38a18023c7	Fix some too popular warnings. (#3578 ) Some pointless and noisy warnings either fixed or disabled.	2020-04-18 17:05:05 -07:00
Changming Sun	d68245853e	Disable downloading test data on Linux (#3581 )	2020-04-18 15:54:58 -07:00
Sergii Dymchenko	3e884b4b6b	Fix some typos. (#3582 ) * Fix some typos. * Fix a typo.	2020-04-18 14:18:05 -07:00
suryasidd	6fe688c732	Disabled failed maxpool test on GPU (#3549 )	2020-04-18 13:49:42 -07:00
edgchen1	52cfc98ec4	Merge pull request #3557 from microsoft/havenka/master-merge Merge from master	2020-04-18 09:40:32 -07:00
edgchen1	811bd67872	Clean up docs. (#3579 ) * Fix orttraining/README.md formatting. * Delete ORT_TRAINING_BUILDS.md. * Fix typo.	2020-04-17 22:13:11 -07:00
ytaous	ca1bbff5d4	subgraph type override handling and unit test (#3560 ) * unit test for subgraph type override * unit test - re-wire input properly to subgraph * update args Co-authored-by: Ethan Tao <ettao@microsoft.com>	2020-04-17 19:33:34 -07:00
Tianlei Wu	7f46f347db	Add GPT2 Attention Fusion in optimization script (#3488 ) * Add Attention fusion for GPT2 * Support distilgpt2 in benchmark_gpt2.py * Add options to disable Attention/SkipLayerNormalization/EmbedLayerNormalization/BiasGelu fusions * Add logging at the beginning of each fusion * Update notebooks: Add Gpt2OnnxModel.py to list of script files. * Add test for gpt2 model optimization * Add optional parameters (--input_ids --segment_ids --input_mask) for graph inputs * Fuse BiasGelu * Handle model that does not have segment_ids input. * Allow fuse embed layer without mask	2020-04-17 16:23:53 -07:00
Tianlei Wu	5d3b217039	Update Attention operator for GPT2 (#3474 ) Add unidirectional mask for Attention operator. Update mask_index to mask broadcast from B->BxS->BxNxSxS to B->BxSxS->BxNxSxS.	2020-04-17 16:20:40 -07:00
edgchen1	2cb8cb816f	Disable or update flaky tests, improve test random seed accessibility. (#3495 ) - Add output of test random seed - Allow setting of test random seed with environment variable - Disable / relax tolerance for flaky tests	2020-04-17 15:57:32 -07:00


2237 commits