onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-19 19:00:47 +00:00

Author	SHA1	Message	Date
Weixing Zhang	ca9b3f18e9	Explicitly pass cuda stream to thrust function rather than use cuda default stream implicitly (#7414) * Pass cuda stream to thrust function to not use default stream. In the commit `299ace0`, ORT has been changed to not use cuda default stream. * update amd_hipify.py * remove unnecessary stream sync Co-authored-by: Weixing Zhang <wezhan@microsoft.com>	2021-04-25 01:18:56 -07:00
jeyblu	b9cbbc41ff	dnnl matmul tensor dimension check (#7383)	2021-04-23 23:17:22 -07:00
RandySheriffH	afe912d47c	Reduce perf gap between thread pool and omp (#7333) * add async dispatch * minor renamings * build py38 * restore yml * fix sync up issue between dispatch thread and main * fix comments * refactor SummonWorker and rename to RunInParallelInternal	2021-04-23 18:36:36 -07:00
Thiago Crepaldi	410a81b21b	Add support for ORTModule to execute the graph when ONNX drops unused… (#7424)	2021-04-23 18:10:57 -07:00
Chen Fu	f4f2cc1a00	Add batch interface to floating point GEMM (#7323) Currently in high dimension matmul, we call multiple GEMMs sequentially. In this change we execute these GEMMs in parallel, removing barriers between two adjacent GEMM operations. Performance tested with Bert and T5 models. The Bert model shows no noticeable perf differences, as the heavy lifting is done by the attention operator, which is not changed in this PR. In the T5 model, we see no regression at low thread counts (x4), and the performance improvement is more pronounced at higher thread counts (8-16). T5 shows a 10% speedup with 16 threads. With profiling, we can see the most expensive MatMul operators in T5 achieve around 20% speedup with 16 threads. Co-authored-by: Chen Fu <fuchen@microsoft.com>	2021-04-23 17:34:22 -07:00
Suffian Khan	7a3c1787af	Add CI pipeline to publish Python training package targeting Rocm (#7417) * first attempt rocm training wheel * modifications needed to python packaging pipeline for Rocm 4.1 * changes to not conflict with cuda missed stage1 changes remove package push add option r to getopt try again without python install try again without python install try again without python install split pipelines and add back push to remote storage try on cuda gpu pool try again try again try running without az subscription set try again on original pipeline change pool passing AMD Rocm whl on AMD-GPU pool split rocm pipeline from cuda pipeline remove comments * try adding Rocm tests as well * try with tests in place * fix trailing ws * add training data * try again as root for tests * use python3 * typo * try to map video, render group into container * try again * try again * try to avoid yum error code * make UID 1001 * try without yum downgrade * define rocm_version=None * remove CUDA related comments for Rocm Dockerfile * Don't pin nightly torch torchvision torchtext versions as they expire (for now nightly is required for Rocm 4.1) * missed requirements-rocm.txt from last commit * fix whitespace	2021-04-23 17:22:31 -07:00
M. Zeeshan Siddiqui	34ebf7d3dd	Partial graph execution made simple. (#7324) * Python changes. * C++ changes. * fixes/hacks. * more hacks. * perf. * changes. * changes. * re-architect partial graph execution and remove iobinding. * changes. * refactor. * prevent copies from python to c++. * perf. * merge conflicts. * misc. * fix merge conflicts and tests. * Ifdef partial executor. * PR feedback. * Delete ORT Task et al. * Clean up. * clean up. * Restore SetOutputMLValue(). * PR feedback. * Re-enable disabled ORTModule tests. * PR feedback. * PR feedback.	2021-04-23 15:09:18 -07:00
Changming Sun	5208231126	Fix some warnings in our CUDA code (#7436)	2021-04-23 14:56:20 -07:00
Suffian Khan	8889e717eb	add gather elements (#7435)	2021-04-23 14:05:17 -07:00
Weixing Zhang	ef72764960	Build would fail when nccl is not under standard path (--nccl_home) (#7402) * Build would fail when nccl is not under standard path (--nccl_home) * fix build for ROCm EP	2021-04-23 14:04:22 -07:00
Changming Sun	9f683bae78	Revert the TRT change and move the build to a new pool (#7434)	2021-04-23 14:00:26 -07:00
satyajandhyala	979d63159b	Add level two optimizations for constant propagation transformation. (#7410) * Made the python script generating the testcases modular. * Modified RemoveBackToBackCasts function to remove cast even if the parent node has other consumers. * Modified InsertCastNodes to update the graph consistently for other functions to work. * Moved ConcatNames function to the top. * PropagateBackward/SearchUpstream and PropagateFP16CastsFromOutputsToInputs insert FP32 casts if the level >1 in order to propagate FP16 casts backwards. * Added new testcases for level two setting.	2021-04-23 13:25:54 -07:00
Chi Lo	f1c3f3fcc1	TRT EP memory leak fix (#7415) * fix memory leak * small refactor * code refactor	2021-04-23 12:04:23 -07:00
Guoyu Wang	043883b52d	[CoreML EP] Add Gemm/MatMul support (#7403) * [CoreML EP]Add gemm/matmul support * remove changes in get_execution_providers * Address CR comments * Switch to list initialization * Minor update	2021-04-23 11:54:59 -07:00
Yufeng Li	e7912736b9	Add qdq propagation support (#7404) * Add qdq propagation support * add more unit tests	2021-04-23 11:17:44 -07:00
Tang, Cheng	1fa6d8fe1c	support loading external execution provider from python frontend (#7332) * initial dynamic load example * support load EP in the provider options * support dynamic load EP in orttrainer * split the provider interface; fix comments in pr * remove experiment code * add test * remove useless file * add test model file; fix linux break * fix linux build and missing file * fix python build * fix python build * fix python binding * fix python test * fix runtime path for posix env * exclude the shared library from minimal build * fix comments in pr; * separate the provider shared lib loading * excluded from minimal / macos / ios build * skip copy the provider shared lib for minimal build and mac os * fix macos build * exclude the test for macos build * exclude from android build * exclude from web assembly build * enable the invalid ep test Co-authored-by: Cheng Tang <chenta@microsoft.com>	2021-04-23 09:54:09 -07:00
Ashwini Khade	75e054cd33	pick onnx release candidate (#7177) * pick onnx release candidate * fix typo * filter batchnorm tests * add implementation for reshape 14 * add identity op kernel for opset 14 * fix typo * update onnx commit * update commit to latest master * add hashes for new kernel registrations and update 1 * TEST commit * update onnx back to right commit * Update onnx to latest in rel-1.9.0 * temp fix * remove nonzeroshapesetter transformer * pick rel branch latest commit * fix build failures * fix build failures * fix build failures * update the commit to latest in release branch * add test filters for not implemented op14 ops in c# tests * plus review comments	2021-04-22 23:57:09 -07:00
Guoyu Wang	d414039189	Add ios coreml ci, and speedup ios ci run (#7420)	2021-04-22 23:41:58 -07:00
sumitsays	d67c86265b	Enabled fp16-inception-v1 test (#7406) Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com>	2021-04-22 23:05:03 -07:00
Yulong Wang	b56dd037d3	increase timeout for nodejs binding test (#7422) Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2021-04-22 21:40:40 -07:00
raviskolli	4c8513a627	SimplifiedLayerNormalization kernel for ROCM EP (#7409) * Add SimplifiedLayerNormalization kernel to ROCM ep.	2021-04-22 21:25:09 -07:00
Changming Sun	6822ae95ec	Reduce the number of TensorRT tests needed to run (#7419)	2021-04-22 19:14:39 -07:00
Thiago Crepaldi	771a6d235b	Fix IsContiguousTensor check on backend (#7391)	2021-04-21 17:01:17 -07:00
Changming Sun	afa7b23609	Update docs/ContribOperators.md and the script that generates it. (#7399)	2021-04-21 16:20:56 -07:00
Brian Popow	1bbe538379	Update references	2021-04-21 13:36:10 -07:00
Brian Popow	aa1ce726aa	Remove unnecessary encoding step	2021-04-21 13:36:10 -07:00
Changming Sun	65b2b87f83	Update CI build docker images (#7386) Update CI build docker images: delete ubuntu 16.04 support.	2021-04-21 13:18:34 -07:00
raviskolli	09313d9e1f	Added GreaterOrEqual and LessOrEqual Ops to RocmEP (#7398) * Added GreaterOrEqual and LessOrEqual Ops to Rocm EP	2021-04-21 11:44:24 -07:00
Changming Sun	b4cfa88bf7	Update protobuf to the latest version (#7396)	2021-04-21 10:30:06 -07:00
Changming Sun	243713c464	Upload detailed code coverage result to azure blob storage (#7392)	2021-04-21 08:24:44 -07:00
Sherlock	16ca7677e6	Relax ConvGrad Test tol (#7393) Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2021-04-21 08:06:00 -07:00
Changming Sun	b5493d724c	Update rnn_helpers.cc: add #ifdef to DumpMatrixImpl (#7389)	2021-04-20 22:11:38 -07:00
Hariharan Seshadri	7b11283af0	Add ability to allocate initialized tensor memory from non-arena memory (#7267)	2021-04-20 20:27:48 -07:00
Thiago Crepaldi	8421124344	Add support to **kwargs in ORTModule forward() method (#7360)	2021-04-20 16:21:52 -07:00
ashbhandare	76cc118dbe	Gemm transpose fusion (#7306) * Gemm transpose fusion * Correct rewrite rule effect * Add to inference transforms to trigger on gradient graph	2021-04-20 09:35:05 -07:00
Xiaoyu Liu	913ea8264b	GPT2 with one step beam search (#7163) * beam search refactoring checkin * add factory class and deduplicate code * one step beam search works on gpu Co-authored-by: Xiaoyu Liu <xiaoyu@xiaoyu-VM.z4vh1dzj5eoevgybsksdpz2izh.jx.internal.cloudapp.net>	2021-04-20 06:23:52 -07:00
mindest	1a3ddf0714	Add gradient registration and tests for Min/Max (#7217) * Add gradient registration and tests for Min/Max * Add helper function for min/max grad test * limit Min/Max Grad to accept at most two inputs; modify test case accordingly * resolve merge error	2021-04-20 18:14:31 +08:00
Sherlock	ce7ff27bac	Fix perf issue in Conv CUDA kernel (#7348) * Fix perf issue in Conv CUDA kernel * Read available memory from device * assuming 10% fragmentation Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2021-04-19 23:37:05 -07:00
ashbhandare	ac346a1b90	Modify SimplifiedLayerNormFusion to allow fusion in the presence of Casts optionally (#7352) * LN transform partial changes * LN transform fix * Make transform optional, remove unnecessary code * Fix windows build * review comment, windows CI fix * review comments	2021-04-19 19:59:23 -07:00
ytaous	7abe1fd392	Identity elimination with graph output (#7312) * Identity removal * fix build * fix build * fix build * fix build * UTs * fix UT * fix UTs * per comments * fix UTs * fix UTs * per comments Co-authored-by: Ethan Tao <ettao@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2021-04-19 16:36:35 -07:00
Sheil Kumar	265db2ad96	Fix Microsoft.AI.MachineLearning .NET5 publishing and C# Store Release build (#7373) * fix .net publishing * make experimental api build with microsoft.ai.machinelearning.idl import Co-authored-by: Sheil Kumar <sheilk@microsoft.com>	2021-04-19 15:36:43 -07:00
satyajandhyala	bb1e417da0	Add logging support to Cast Propagation transformation from python (#7353) * Fixes needed to PropagateCast transformation. * Added number of passes to the logs. * Added logging support to OrtModuleGraphBuilder. * Added new testcases. * Added NodeArgToConsumerMap	2021-04-19 12:14:30 -07:00
M. Zeeshan Siddiqui	6dda1e0681	Flag for tensor memory re-use in allocation planner. (#7359)	2021-04-16 17:53:25 -07:00
Guoyu Wang	96cdc65d57	Fix android CI failure after gradle updated to 7.0 (#7364) * Fix android ci failure after gradle updated to 7.0 * minor update	2021-04-16 15:28:28 -07:00
Yulong Wang	009f342caf	[JS] refactor Javascript/Typescript libraries in ONNX Runtime (#7308) * working on re-organizing js code for ortweb * remove dup files * move folder * fix common references * fix common es5 * add webpack to common * split interface/impl * use cjs for node * add npmignore for common * update sourcemap config for common * update node * adjust folder/path in CI and build * update folder * nit: readme * add bundle for dev * correct nodejs paths * enable ORT_API_MANUAL_INIT * set name for umd library * correct name for commonjs export * add priority into registerBackend() * fix npm ci pwd * update eslintrc * revise code * revert package-lock lockfileVersion 2->1 * update prebuild * resolve comments * update document * revise eslint config * update eslint for typescript rules * revert changes by mistake in backend.ts * add env * resolve comments	2021-04-16 01:33:10 -07:00
Sunghoon	ded2b08380	WebAssembly multi-threads support. (#7326) * WebAssembly multi-threads support. * PROXY_TO_PTHREAD is not required for wasm library * Remove an unnecessary line commented out	2021-04-15 21:46:11 -07:00
Guoyu Wang	28e229ac4c	Enable build dynamic framework for macOS/iOS (#7343) * Enable build dynamic framework for macOS/iOS * Address CR comments	2021-04-15 16:47:53 -07:00
Chen Fu	ef1aaa367a	Adding interface for batched integer gemm (#7249) Parallelize MinMax, Quantize and batched quantize GEMM. Performance problem identified in T5 decoder model (quantized). DynamicMatMul operator is identified as the culprit. This operator spends time computing the MinMax of a tensor, quantizing a tensor, and performing a batched qgemm. All of these can be parallelized. Currently GEMM is parallelized. However, in batched GEMM, we sequentially call GEMM multiple times. This causes parallel sections to be started and ended multiple times, which can sometimes be slow. So we made the following changes: Parallel task partition no longer depends on degree of parallelism, only on shape of the matrices. In a single GEMM, perform 2D partition of the multiplication, along panel lines, to reduce repeated packing. For batched GEMM, all parallel tasks are executed in a single parallel section, reducing the cost of starting threads and waiting for them to finish.	2021-04-15 10:25:31 -07:00
Changming Sun	f1c1c38d44	Delete an unused var in nuget pipelines (#7345)	2021-04-15 07:29:52 -07:00
Tianlei Wu	aa9ab565f5	FastGelu fusion for Megatron model (#7344) * add a fastgelu pattern from Megatron model * update comment * add test	2021-04-15 00:39:33 -07:00


4683 commits