onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-05 04:17:53 +00:00

Author	SHA1	Message	Date
Joseph Groenenboom	a433f22f17	Softmax interface update (#12469 ) * Template datatype for SoftmaxWithRawMaskSmallKernel in ROCm EP * Remove valid_items usage from SoftmaxWithRawMaskSmallKernel for ROCm EP The kernel already masks off invalid items and this gives a much faster implementation in hipCUB. * Update accumulator type in ROCm EP for SoftmaxWithRawMaskSmallKernel Hard code accumulator to fp32 for hipCUB in indicated kernel. * Reset casting to old behavior * Document steps to optimize SoftMax kernel on ROCm EP Usage of the hipCUB valid_items interface on reduction operations has a significant performance impact. Masking all thread data to avoid need to use the valid_items interface to hipCUB.	2022-09-12 13:02:31 -07:00
Tianlei Wu	30ebc9e00a	Useless Cast removal after converting model from float32 to float16 (#12871 )	2022-09-12 11:07:33 -07:00
Yi Zhang	d8636c2be8	Add enable_onnx_tests in windows nuget test step (#12926 )	2022-09-12 10:08:24 -07:00
Tianlei Wu	1e34440c37	Fix ORT crash when loading BeamSearch model (#12872 ) * add subgraph verification in VerifyNodeAndOpMatch * add regression tests * update comments * update test	2022-09-09 12:48:32 -07:00
Scott McKay	022d9e2d0c	Get files for XNNPACK wasm build from BUILD.bazel. (#12892 ) Get files for wasm build from BUILD.bazel.	2022-09-09 12:38:57 -07:00
Jian Chen	e561a7cf29	Adding QuantConfig Class (#12810 ) * Initial commit for testing * Adding DynamicQuantConfig * Adding DynamicQuantConfig * Format file * Adding Default configuration placeholder. * Update onnxruntime/python/tools/quantization/quantize.py Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com> * Reformat file * Reformat Rest Docstring style to google * Update set to frozenset * Update Quant Config * Updates Quant Config * Update enum comparison * Update onnxruntime/python/tools/quantization/quantize.py Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com> * Update Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>	2022-09-09 14:08:47 -04:00
Dwayne Robinson	8e4eb24648	Update operator kernel table to include DML operators (#12887 ) * Fix bug in pybind get_all_operator_schema due to premature reference dropping * Add updated operator kernels markdown table * Update build.py to include documentation generation for DML operators too * Update GPU pipeline to include DML in the build so operators can be generated. * Use a separate pipeline stage, feedback from Changming and Scott * Appease annoying Python linter * Add onnxruntime_BUILD_UNIT_TESTS=OFF and remove stale --use_dml in cuda stage	2022-09-09 10:21:25 -07:00
Hariharan Seshadri	0b235b2763	Disable QOrderedMatMul with bias tests on Windows (#12901 )	2022-09-08 17:57:37 -07:00
pengwa	b5327595f3	Fix [prefast:Warning]: C26814 (#12897 ) fix C26814	2022-09-09 08:26:48 +08:00
Adam Pocock	5d55b0730e	[Java] JNI refactor for OrtJniUtil (#12516 ) Refactoring more JNI methods in OrtJniUtil. Make the strings const. Removing unnecessary use of OrtAllocator.	2022-09-08 17:04:42 -07:00
Scott McKay	60e4d012e0	Fix unused variable warning from reduced ops build (#12889 )	2022-09-09 08:08:56 +10:00
Wei-Sheng Chin	28f2e57de5	Use CUDA callback to release deferred-release buffers (#12883 ) * Use CUDA callback to release deferred-release buffers Polish * Minor improvements. 1. Reorder an if-else so that frequent cases are checked first. 2. More documentation. * Fix tests. Previously, in CUDAExecutionProvider::OnRunStart, we called GetPerThreadContext in auto& current_deferred_release_event = GetPerThreadContext().GetCurrentDeferredReleaseEvent(); so that a CUDAExecutionProvider always owned an active PerThreadContext and the ReleasePerThreadContext in CUDAExecutionProvider::OnRunEnd was always valid. However, this isn't true after we drop event-based deferred-release code, so we need to check whether CUDAExecutionProvider really owns a PerThreadContext and then call ReleasePerThreadContext if it does. * Follow up for AMD GPU and improve CUDA part's return value.	2022-09-08 14:23:48 -07:00
Thiago Crepaldi	55c745eefd	Add support for ORTModule Torch cpp CUDA extension build within docker (#12868 ) Currently, CUDA hardware is not available to the build during `docker build`. Because of that, CUDA-capable hardware would not have CUDA support. This PR adds an env var, ONNXRUNTIME_FORCE_CUDA, which allows CUDA extensions to be compiled even when CUDA support is not detected.	2022-09-08 15:30:44 -04:00
pallavides	6ebb7b91eb	Re-apply fix for mkl issue for eager mode (#12881 ) * reapply fix for mkl issue for eager mode * add comment, update link libs	2022-09-08 12:29:24 -07:00
Changming Sun	ff52d6a6bf	Delete Dockerfile.ubuntu (#12888 ) The file was solely for Nuphar.	2022-09-08 10:26:40 -07:00
Changming Sun	a811c7629f	Remove "Build Python Documentation" from py-packaging-stage.yml (#12890 ) Remove "Build Python Documentation" from py-packaging-stage.yml because the task has been moved to Github actions by @natke in PR #10116 .	2022-09-08 09:56:54 -07:00
sophies927	b1984278d9	Enable blank issues (#12885 )	2022-09-07 23:28:17 -07:00
guyang3532	4765e5c382	Using ORTModule to wrap an evaluation model should not change the mode (#12747 ) Using ORTModule to wrap an evaluation model should not change the mode of the model	2022-09-08 10:54:59 +08:00
RandySheriffH	d3b684cd9e	Drop nuphar (#11555 ) * drop nuphar code and configs * refactor test case * format python * remove nuphar from training test * remove commented nuphar logics * restore llvm setting * drop nuphar ci * fix compile err * fix compile err Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2022-09-07 15:11:18 -07:00
Jian Chen	acc8bdc6c5	Splitting quantize_tensor and quantize_input (#12873 ) * Splitting quantize_tensor and quantize_input * Reformat code * Reformat code * Update is_input_a_weight to is_input_a_initializer	2022-09-07 18:05:42 -04:00
Sheil Kumar	535b0835f2	User/sheilk/dft fixes (#12862 ) * DirectML DFT Tests and Fixes * Dynamically allocate temporaries using the allocator... * Allocate during compute * wrong dims * CR feedback	2022-09-07 13:21:56 -07:00
sophies927	f63bd0765d	New GitHub templates (#12777 ) * Create 01-build.yml * Create 02-documentation.yml * Create 03-mobile.yml * Create 04-web.yml * Create 05-performance.yml * Create 06-training.yml * Create 07-feature_request.yml * Create 08-general.yml * Create config.yml * Delete bug-performance-issue.md * Delete feature_request.md * Create labeler.yml * Create labeler.yml * Update Performance template to make model info optional. * Update feature request description placeholder	2022-09-07 11:59:56 -07:00
Hariharan Seshadri	ad69aac491	Introduce ordered quantization ops for the CUDA EP [1/n] (#12582 ) Initial core small set for the ordered quantization ops for cuda EP.	2022-09-07 11:58:15 -07:00
petermcaughan	69f7cc6494	Add pybind support for all memory config options in OrtArenaCfg (#12658 ) * Add support for initial_growth_chunk_size_bytes setting in OrtArenaCfg pybind * Add overloaded constructor for KVP, UT still in progress * Fix class member access in pybind, fix unit test * Resolve linter warnings * Improve formatting * Simplify UT * Fix linter formatting Co-authored-by: Peter Mcaughan <petermca@microsoft.com>	2022-09-07 11:15:00 -07:00
Chen Fu	8004db4bf1	fix python import sequence warning (#12864 ) fix python import sequence warning	2022-09-07 09:53:39 -07:00
Xavier Dupré	400195a10a	raise an exception when TreeEnsemble requests a feature out of bounds (#12859 ) * Catch a potential error when the number of features is lower than the features referenced in TreeEnsemble * add unit test * remove extra spaces	2022-09-07 10:05:32 +02:00
Guenther Schmuelling	f856be162e	fix xnnpack wasm build (#12845 )	2022-09-06 19:20:07 -07:00
Jan Tilly	437409c343	Add DONT_VECTORIZE flag to cmake (#12169 ) Add DONT_VECTORIZE flag.	2022-09-07 12:14:14 +10:00
Scott McKay	706e03c63d	Add azp run helper (#12832 ) * Add helper to add azp run comments to a PR.	2022-09-07 11:48:31 +10:00
Yi Zhang	c571b99336	Refactor setup_test_data (#12818 ) * refactor setup_test_data * mv setup test data to test stage * model link for C# test * add comment	2022-09-07 08:33:27 +08:00
Yulong Wang	726251609a	increase max memory to 4G for wasm (#12798 )	2022-09-06 17:07:13 -07:00
Tianlei Wu	d19955fd89	fix transformers script issues (#12802 ) Fix a few obvious issues: (1) bert_perf_test.py creates a session without a provider in line 65. (2) compare_bert_results.py misses a parameter in create_session in line 37. (3) onnx_exporter.py has a return value mismatch in lines 667, 690. (4) remove some imports not used in the scripts. (5) fusion_utils need not print "Removed 0 cast nodes" or "Removed 0 Identity nodes"... (6) update requirements for numpy version since the gpt2 parity tool uses equal_nan in numpy v1.19+	2022-09-06 16:15:16 -07:00
Xavier Dupré	54360c88d2	Disable two warnings raised by tensorboard on Visual Studio (#12773 )	2022-09-06 20:42:52 +02:00
Chen Fu	9ad5b95e4f	Fix math domain error with log10 (#12841 ) fix math domain error with log10	2022-09-06 08:54:41 -07:00
Cheng	8cedafe250	[xnnpack] Have `Initializer` in Mobile related EPs in Minimal_build and creating EP specific dynamic-schema (#12555 ) * Remove the dependence of Qlinearsoftmax schema * refactor initializerview && create shared schema * Dynamic Create EP specific schema * Have Initializer in minimal_build * address comments * remove CancelFuseSubGraph	2022-09-06 14:32:15 +08:00
Scott McKay	ac4f1bf960	Update max opset for NNAPI and CoreML. (#12831 ) Update max opset for NNAPI and CoreML. Changes in opsets 16 and 17 don't require any updates.	2022-09-05 09:37:14 +10:00
Baiju Meswani	9e47eb68e0	Remove unused orttraining amd dockerfiles and scripts (#12707 )	2022-09-02 18:43:21 -07:00
Cheng	76d17b0f48	Add java API for xnnpack (#12788 ) * Add java API for xnnpack * provider option support * a more general interface for creating EP	2022-09-03 08:29:40 +08:00
Baiju Meswani	295bd26980	Remove orttraining-distributed CI pipeline (#12738 )	2022-09-02 14:34:26 -07:00
ashbhandare	27dde0b51f	Csharp bindings for on-device training APIs (#12404 )	2022-09-02 13:13:48 -07:00
Jian Chen	2fe919c3ad	Adding Split Fusion (#12732 ) * Adding Split Fusion * Make changes to comments * Format files and change typo * Format files and change typo * Format files and change typo * Format files and change typo * Format file * Format files * Format files * Format files * Format files	2022-09-02 14:17:10 -04:00
Baiju Meswani	56bae3b196	Use InplaceClipGradNorm for offline processing for on-device training (#12603 )	2022-09-02 07:47:17 -07:00
Cassie Breviu	98b2b7f5bb	Update csharp documentation (#12830 )	2022-09-01 22:14:03 -07:00
sophies927	548938fb97	Update stale.yml (#12813 ) * Update stale.yml Change the number of days of inactivity before an issue becomes stale from 60 to 5 and the number of days of inactivity before a stale issue is closed from 7 to 5. Update the exempt labels based on the redefined set of GH labels. * Implement stale.yml feedback.	2022-09-01 20:50:46 -07:00
Changming Sun	ca5af24765	Update Sdl.ruleset to remove C26812 from the rules (#12695 )	2022-09-01 20:05:20 -07:00
Hariharan Seshadri	931c8b0147	Resolve GH issue 12706 (#12815 )	2022-09-01 18:30:57 -07:00
Justin Chu	6fe712b587	Create codeql.yml to replace LGTM (#12790 ) Description: Create codeql.yml to replace LGTM. Motivation and Context: LGTM.com is shutting down and moving to GitHub code scanning. This PR enables GitHub code scanning. cpp and c# support will be added in a separate PR.	2022-09-01 16:37:43 -07:00
ashbhandare	349469c381	Enable way to extract all parameters to and from a contiguous buffer. (#12674 ) * implementation * review comments * review comment * lint error	2022-09-01 15:23:30 -07:00
Hariharan Seshadri	52ce6a90b4	Props file cleanup (#12782 )	2022-09-01 11:05:46 -07:00
George Nash	0125e15281	Fix include order build failure training build (#12425 ) Signed-off-by: George Nash <george.nash@intel.com>	2022-09-01 10:48:40 -07:00


7337 commits