onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-23 02:38:28 +00:00

Author	SHA1	Message	Date
Vincent Wang	5104c7dbd3	Fix Prefast Warnings (#12717 ) fix prefast warnings	2022-08-25 17:09:37 +08:00
Vincent Wang	53ecb9e635	Update Supporting DS Version to 0.7.1 for ORTModule (#12696 ) update ds version support for fp16_optimizer	2022-08-24 14:56:12 +08:00
abhi-ort	73e5741a9a	Enabling softmax grad and logsoftmax grad on ORT (#12614 ) * Enabling softmax grad and logsoftmax grad on ORT * formatting changes * formatting changes * reverting changes * Changing the OpType	2022-08-23 15:49:02 -07:00
Yulong Wang	c144acc534	Replace 'master' branch ref to 'main' in the code (#12547 )	2022-08-22 10:48:12 -07:00
Wei-Sheng Chin	dc486d146b	Make ORT callable from various Pytorch compilers (LazyTensor, TorchDynamo, etc) (#10460 ) * Make ORT as Pytorch JIT backend LORT likely doesn't work with aten fallback so we only test LORT in its own CI. * Revert changes to enable external CUDA allocator. Will add it later. Revert "Revert changes to enable external CUDA allocator. Will add it later." This reverts commit d5487f2e193014c805505afae8fb577c53667658. Fix external allocator * Relax tolerance and remove commented code * Print more information in CI * Fix pointer * Address comments. 1. Reuse ORT-eager mode's environment. 2. Remove unused ctor. * Use Pytorch master branch as all PRs are merged Fix * Refine based on cpplint feedbacks * Revert changes to allow custom CUDA allocator in public APIs * Use torch.testing.assert_close * Use unittest framework * Switch docker repo * Rename .cpp to .cc * Address comments * Add comment * Use same pipeline file for eager and lort pipelines * Address comments * Add yaml comment * Fix cmake files * Address comments * Rename flags, remove printing code, remove dead comment	2022-08-22 09:40:40 -07:00
Vincent Wang	a078c8d99b	Update Supporting Deepspeed Version of ORTModule's FP16_Optimizer (#12668 )	2022-08-22 22:22:53 +08:00
Scott McKay	2102b8f67c	Avoid duplicate symbol error between ONNX and ORT for ostream operator<< with TensorShapeProto (#12651 ) * Remove ostream operator<< definitions for TensorShapeProto and TensorProto as they clash with ONNX definitions in onnx/defs/printer.h/cc. Currently printer.h (unnecessarily) pulls in a number of other ONNX headers which causes naming clashes with parts of ORT. It is also excluded in a minimal build. Instead convert the onnx::TensorShapeProto to onnxruntime::TensorShape so we use the existing ostream operator<< for TensorShape. Make GetTensorShapeFromTensorProto consistent with GetTensorShapeFromTensorShapeProto so both return a TensorShape (as the name implies).	2022-08-22 17:20:52 +10:00
pengwa	7df2e8c5cc	Refactor with std::variant (on device training) (#12383 ) * use std::variant for synthetic data storage. * use std::variant to replace TypedCheckpointProperty * Remove shared ptr for checkpoint property * fix tests * refine std::variant usage a bit * remove CheckpointProperty data abstraction * use InlinedVector and InlinedHashMap if possible * fix comments * fix build and test * fix some comments * use gsl::span * fix tests * refine based on comments * fix win build * fix build	2022-08-17 08:31:23 +08:00
Baiju Meswani	f5e3517c39	Add Learning Rate Scheduler C API (#11957 )	2022-08-15 09:10:25 -07:00
Wil Brady	3d009cdde3	Updating binary ops in eager mode to support broadcasting. (#12560 ) * Updating binary ops in eager mode to support broadcasting.	2022-08-11 17:00:12 -04:00
pengwa	24eab921be	Enable PythonOp for --enable_training_torch_interop build (#12539 ) * enable PythonOp by default when --enable_training_torch_interop is enabled during build * clean up * fix * fix comment * fix * fix tests * fix fallback test * pylint format * refine based on comments	2022-08-12 00:49:30 +08:00
Baiju Meswani	3e78f3cf1f	Add win-ci pipeline for on-device training (#12513 )	2022-08-10 14:45:39 -07:00
msftlincoln	0d9a02e647	Eager Mode - Support Concatenation via aten::cat.out (#12527 ) * support concatenation via aten::cat.out * wrap dims * rename vars in tests, test wrapped dims	2022-08-09 17:16:18 -04:00
Adam Louly	2681648f5b	Load checkpoint in cpp (#12352 ) * Load checkpoint in cpp * removed unused imports * throw error on invalid name and change function name * inplace model assignment, change name and other comments resolved * name change on import * Added unit test, resolved comments * remove unused imports * resolved comments * refactoring to reduce memory allocation * resolved extra comments * changed files hierarchy and force added onnx model * solved order of function argument * used gtest macros on test cases Co-authored-by: Adam Louly <adamlouly@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2022-08-09 12:30:50 -07:00
Vincent Wang	2bed0d4abb	[CUDA] SoftmaxCrossEntropy Kernels Refactor (#12482 ) * sce refactor * refactor * remove unnecessary memset	2022-08-09 16:48:44 +08:00
pengwa	a2dc3e9eac	Improve the compilation speed when compiling for multiple architectures. (#12490 ) * improve the compilation speed when compiling for multiple architectures. * formatting * fix * use 0 by default * fix comments	2022-08-09 11:52:26 +08:00
Vincent Wang	e85e31ee80	Update ORTModule Default Opset Version to 15 (#12419 ) * update ortmodule opset to 15 * update torch version * fix ut * fix ut * rollback * rollback for orttrainer	2022-08-05 16:55:04 +08:00
Baiju Meswani	a7d6290774	CUDA kernel for ClipGradNorm for TensorSeq gradients (#12412 )	2022-08-04 22:28:28 -07:00
LironKesem	d452462b5e	Lironkesem/unsqueeze_and_squeeze (#12421 )	2022-08-04 15:12:34 -04:00
Baiju Meswani	7f58bd7236	Perform graph transformations during offline tooling (#12422 )	2022-08-03 11:27:12 -07:00
Vincent Wang	99d2a63e1a	Set Fixed Seed For SoftmaxCrossEntropyLoss Related UTs (#12432 ) add seed	2022-08-03 13:29:30 +08:00
smrkatte	54d5e86981	Add cast before copy for dissimilar scalar type (#12391 ) * Add proper cast/copy callflow for ORT and non-ORT devices	2022-08-02 18:32:58 -07:00
LironKesem	315e006532	adding a comment on nll_loss_forward.output that can not be implemented (#12406 ) adding a comment on nll_loss_forward.output that can not be implemented	2022-08-01 19:12:35 -04:00
msftlincoln	62922f4c3c	Eager Mode generator: add comments, rename functions (#12385 ) * eager generator: add comments, rename functions * lint	2022-08-01 15:52:47 -04:00
pengwa	6d1eb9509e	Refine gradient accumulation (on device training) (#12363 ) * a (cherry picked from commit 43909cdd6e3daf30a82d584292286806d1172a0b) * optimize inplace accumulator a bit * fix inputs * revert logging * minor fix * tune perf and resolve comments * typo * fix * fix tests * move threshold to constexpr.	2022-07-30 10:24:01 +08:00
msftlincoln	9559d25da9	ORT Eager Mode Generator - make smaller functions (#12371 ) These changes resulted in no change to the generated outputs ort_aten.g.cpp and ort_customops.g.cpp.	2022-07-29 10:12:34 -04:00
pengwa	6514069749	Make memory profiler work with multiple session runs. (#12317 ) * make memory profiler work with multiple session runs. (cherry picked from commit 5b636b4dd6fe91b75c063696dc73eda33ec36c8d) * minor fix * fix build * fix window build * 1. fix cpplint issues; 2. give unique filesname for each session profiler result.	2022-07-29 18:36:31 +08:00
msftlincoln	9cf6912bba	Fix ORT Eager Mode to work with Pytorch 1.12 (#12323 )	2022-07-27 16:24:46 -04:00
Wil Brady	1163294699	Fixing up some python warnings. (#12319 )	2022-07-27 07:24:37 -04:00
Adam Louly	f3dcbf539a	Checkpoint load inference (#12168 ) * LoadCheckPoint to tensor cpp functions (draft) * Load Checkpoint into inference model * fix python lint * fix python lint * Fixing lint and some unused imports * added assert for zero weights model, resolved other issues * resolved issues * Solved issues * changed variable names for get_models * parameter names mismatch fix Co-authored-by: Adam Louly <adamlouly@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2022-07-26 11:08:50 -05:00
Wil Brady	de57daaab0	Eager mode: binary ops more complete behavior and testing. (#12293 ) * Remove hand written add_.Tensor as it can now be generated. * Generate .out for tensor version of basic math ops. Add.out testing added too. * Remove sin tests as they are covered by parameterized tests. Also, moved all parameterized tests to the end in their own section. * Add binary ops tests for tensors. Scalar tests are calling the aten .out which is for tensor. * Add support for scalar input to add, div, mul, and sub.	2022-07-26 09:14:57 -04:00
Vincent Wang	c40f73ae0c	Remove aten::binary_cross_entropy_with_logits from ATen Fallback (#12301 )	2022-07-26 07:29:56 +08:00
Dmitri Smirnov	3bf614fd47	Eliminate memory allocations per recent profiling (#12225 ) * Alloc begin FeedsFetches refactoring Refactor Tensor class Fix buffer deletor Remove new/delete deleted Adjust alloc move Fix up xnnpack provider Clarifying the comment on Create()	2022-07-25 14:14:38 -07:00
Baiju Meswani	ddb45e9126	On device training CI pipeline (#11987 )	2022-07-25 10:07:17 -07:00
Jameson Miller	8d0e86dec8	Apply project formatting rules to ort_aten.cpp (#12294 ) * Apply project formatting rules to ort_aten.cpp Formatting applied by formatting the file in VS Code. This file is under active development and the inconsistent formatting was causing friction due to: 1. cpplint job on Pipeline was flagging a lot of style issues, resulting in a lot of noisy annotations. 2. local edits would result in changes that are not part of the core change. While there are other files in this part of the source tree with inconsistent formatting, this file was causing the most friction. We can come back and address the other files later, which would be a much larger change. * Apply consistent pattern for invoker.Invoke(...)	2022-07-25 07:26:35 -04:00
Vincent Wang	0fa3aeb65c	[CUDA] Add Strided Tensor Support for Expand->GatherElements for Training (#11976 ) * strided tensor for expand and gather_elements * bugfix * simplify CoalesceDimensions * resolve comments * resolve more comments.	2022-07-25 16:05:26 +08:00
pengwa	75bda9f267	CPU AdamW implementation (#11978 ) * cpu adamwoptimizer implementation * unit tests for cpu kernel pass * refine based on comments * parallelize the weights loop in PrepareForCompute. * fix wrong test data path * fix kernel hash * fix rocm ci pipeline	2022-07-25 09:43:52 +08:00
Juan Paez	4f57da78cf	OrtModule fix pytorch version comparison (#12280 ) * fix torch version comparison * remove patchfile Co-authored-by: Juan Paez <juanpaez@microsoft.com>	2022-07-22 09:11:28 -07:00
pengwa	feabafe58b	Fix memory consumption discrepancy (#12266 ) * release cached cuda memory after temp model_copy run * op schema change only: remove PythonOp forward output from PythonOpGrad inputs. * always export model using torch.no_grad * 1. update PythonOP's "input_requires_grads" attribute according to ORT gradient graph. 2. remove PythonOp's "output_tensor_requires_grads" attribute because in torch.no_grad mode, the exported value is not correct. 3. [related to 2] remove PythonOPGrad's "input_tensor_requires_grads" because it comes from corresponding PythonOP's "output_tensor_requires_grads". * fix uts * refine based on wschin's comments && fix pylint * fix comments * fix unused variable	2022-07-22 16:55:50 +08:00
Ashwini Khade	ceb76429db	Merge pull request #12056 from microsoft/bmeswani/merge-training_dev/on_device_poc Merge On-Device-Training Offline Tooling and C/C++ APIs	2022-07-21 15:09:48 -07:00
Wil Brady	45c0be8a25	Modify generator for eager to use all inputs for determining promote type. (#12268 ) * Sort supported types order so we get a consistently generated order of types. * Fix promote type to include all the input types and not just the first one.	2022-07-21 17:21:10 -04:00
Baiju Meswani	cbf08c7a7b	Make GetTrainingApi as a part of the OrtApis, add Training API documentation and address other pull request review comments	2022-07-21 18:11:48 +00:00
LironKesem	7dc45bc311	Implementing aten::gt.Scalar_out and aten::lt.Scalar_out (#12181 ) * Implementing aten::gt.Scalar_out and aten::lt.Scalar_out * modified the code according to code review	2022-07-21 10:36:43 -04:00
msftlincoln	424120d0fa	cpplint & Eager mode: refactor and add comments to empty_* functions, general lint cleanup in ort_aten (#12238 ) * empty* comments and code reuse * lint * more cpplint * add cpplint settings * test empty	2022-07-20 11:47:57 -04:00
Vincent Wang	72c689a502	[CUDA] Use dim3.z to Handle Large Input For GatherGrad (#12250 ) * use dim3.z to handle large input size * less blocks	2022-07-20 18:42:52 +08:00
pengwa	ebfd81e67e	Fix BiasGeluGrad bug (#12200 ) * use 3D grid to avoid the upper limit of grid dimension * enrich tests * Revert "use 3D grid to avoid the upper limit of grid dimension" This reverts commit 2d5badf2fe8cd985f3f29ee2cb18fff13d07c2ab. * change to a fix: switch the 1st and 2nd dim	2022-07-20 17:59:29 +08:00
Vincent Wang	3cdc6d7775	[ORTModule] Bugfix of torch.chunk's Custom Symbolic when chunks==1 (#12249 ) handle custom chunk with chunks==1	2022-07-20 17:00:41 +08:00
Juan Paez	9b6ef17c5f	Eager opgen support for in-place operations with variadic args (#12125 ) * use torch library binding frontend for tensorlist * fix test * allow in-place modification of variadic args * fix lint issues * update ORT eager readme Co-authored-by: Juan Paez <juanpaez@microsoft.com>	2022-07-19 21:01:00 -07:00
Jameson Miller	975bb56e8c	Eager mode - argmax_out: set output tensor (#12233 ) This change updates the implementation of the argmax_out operator to 1) set the output tensor correctly and 2) remove the unnecessary use of a temporary tensor to store intermediate result of onnx ArgMax operation. Previously, the argmax_out operator did not correctly update the out tensor - it replaced the OrtValue instead of the memory backing the OrtValue. To properly update the output tensor, we need to calculate the expected shape of the out tensor. We add the helper function calculate_reduction_shape to calculate the shape of the reduced tensor from the input tensor, dimension to reduce, and option to keep the reduced dimension or not. This is based on the utility functions in aten/src/ATen/native/ReduceOpsUtils.h in the PyTorch repository, but is tailored to be a bit more specific to our current needs. Notes: We considered just directly leveraging PyTorch's utility functions (e.g. get_reduction_shape) to calculate the shape of the reduced tensor from aten/src/ATen/native/ReduceOpsUtils.h in the PyTorch repository, but including this header file resulted in warnings around unused functions that we need to handle. As we only need a limited functionality at the moment, we instead implemented our own utility function to calculate the reduction shape for our specific current needs. If we need a utility function to more generally calculate the reduction shape, we could consider switching to leveraging the utility methods in PyTorch.	2022-07-19 14:37:03 -04:00
Wil Brady	4235ebc161	Add eager mode support for mm.out (matrix multiplication). (#12214 ) * Add eager mode support for mm.out (matrix multiplication). * Fallback to cpu when mm requirements not met so cpu can print error message.	2022-07-19 07:28:48 -04:00

1065 commits