onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-03 23:49:44 +00:00

Author	SHA1	Message	Date
ashari4	c4a7e88fc8	QuantizeBFP and DequantizeBFP (#12833 ) * `QuantizeBFP` and `DequantizeBFP` schemas - similar to `QuantizeLinear` and `DeQuantizeLinear`. * BFP datatype is represented as a `uint8` tensor with shape and stride metadata. This is preferable to adding a new datatype for BFP, which is more disruptive and [discouraged by PyTorch](https://discuss.pytorch.org/t/training-with-custom-quantized-datatype/152132/2). Context: The Microsoft Floating Point (BFP) datatype shares one exponent across each group of n numbers, called a “bounding box.” Each number still has its own mantissa and sign bits. BFP has been shown to incur 3-4x less cost (energy and area) than BFloat16 and INT8 counterparts without reductions in accuracy for the ImageNet benchmark as described in [Rouhani 2020](https://proceedings.neurips.cc/paper/2020/file/747e32ab0fea7fbd2ad9ec03daa3f840-Paper.pdf). Requirements: * There are many variants of BFP (number of mantissa bits, number of shared exponent bits, size of bounding box, custom bit fields, etc.) * The size and layout of a BFP variant varies across hardware * bounding box can be over arbitrary dimensions; for example, for the channel "C" dimension in a N x C x H x W tensor for convolution Goals of this PR: * Add initial versions of QuantizeBFP and DequantizeBFP operators to enable QDQ-style quantization with BFP. Once the schemas stabilize, we can consider upstreaming to ONNX. * Add some basic type and shape inferencing tests; tests that run on an EP will be a follow-up.	2022-09-22 14:02:55 -07:00
Weixing Zhang	4113df0e21	use constexpr (#12953 )	2022-09-20 14:34:33 -07:00
Edward Chen	454f77cd94	Update kernel matching logic: decouple from op schemas and remove kernel def hashes (#12791 ) # Motivation Currently, ORT minimal builds use kernel def hashes to map from nodes to kernels to execute when loading the model. As the kernel def hashes must be known ahead of time, this works for statically registered kernels. This works well for the CPU EP. For this approach to work, the kernel def hashes must also be known at ORT format model conversion time, which means the EP with statically registered kernels must also be enabled then. This is not an issue for the always-available CPU EP. However, we do not want to require that any EP which statically registers kernels is always available too. Consequently, we explore another approach to match nodes to kernels that does not rely on kernel def hashes. An added benefit of this is the possibility of moving away from kernel def hashes completely, which would eliminate the maintenance burden of keeping the hashes stable. # Approach In a full build, ORT uses some information from the ONNX op schema to match a node to a kernel. We want to avoid including the ONNX op schema in a minimal build to reduce binary size. Essentially, we take the necessary information from the ONNX op schema and make it available in a minimal build. We decouple the ONNX op schema from the kernel matching logic. The kernel matching logic instead relies on per-op information which can either be obtained from the ONNX op schema or another source. This per-op information must be available in a minimal build when there are no ONNX op schemas. We put it in the ORT format model. Existing uses of kernel def hashes to look up kernels are replaced with the updated kernel matching logic. We no longer store kernel def hashes in the ORT format model’s session state and runtime optimization representations. We no longer keep the logic to generate and ensure stability of kernel def hashes.	2022-09-20 14:24:59 -07:00
Pranav Sharma	a8b0f57d1a	Fix eager mode pipeline to accommodate recent allocator change. (#13000 )	2022-09-20 12:53:46 +08:00
cloudhan	0ddf4efbd9	Make PythonOp report dtype mismatch by name, instead of by using enum index (#13007 )	2022-09-20 12:29:30 +08:00
Adam Louly	268bfe2a5d	python training api bindings (#12610 ) Description: Python API Bindings for on device training. Motivation and Context - This PR contains api bindings so python users can perform a whole training loop. Co-authored-by: Adam Louly <adamlouly@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net> Co-authored-by: Baiju Meswani <bmeswani@microsoft.com>	2022-09-16 09:38:24 -07:00
Vincent Wang	da07c83948	SoftmaxCrossEntropyLossInternalGrad and Sum Fusion (#12746 ) * fuse scegrad and sum * add yield output shapes to value_info * resolve comments * fix merge main	2022-09-14 14:45:51 +08:00
pengwa	b5327595f3	Fix [prefast:Warning]: C26814 (#12897 ) fix C26814	2022-09-09 08:26:48 +08:00
Thiago Crepaldi	55c745eefd	Add support for ORTModule Torch cpp CUDA extension build within docker (#12868 ) Currently, CUDA hardware is not available to the build during `docker build`; because of that, the extensions would be built without CUDA support even for CUDA-capable hardware. This PR adds an env var, ONNXRUNTIME_FORCE_CUDA, which allows CUDA extensions to be compiled even when CUDA support is not detected.	2022-09-08 15:30:44 -04:00
guyang3532	4765e5c382	Using ORTModule to wrap an evaluation model should not change the mode (#12747 ) Using ORTModule to wrap an evaluation model should not change the mode of the model	2022-09-08 10:54:59 +08:00
RandySheriffH	d3b684cd9e	Drop nuphar (#11555 ) * drop nuphar code and configs * refactor test case * format python * remove nuphar from training test * remove commented nuphar logics * restore llvm setting * drop nuphar ci * fix compile err * fix compile err Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2022-09-07 15:11:18 -07:00
Baiju Meswani	9e47eb68e0	Remove unused orttraining amd dockerfiles and scripts (#12707 )	2022-09-02 18:43:21 -07:00
Baiju Meswani	295bd26980	Remove orttraining-distributed CI pipeline (#12738 )	2022-09-02 14:34:26 -07:00
ashbhandare	27dde0b51f	Csharp bindings for on-device training APIs (#12404 )	2022-09-02 13:13:48 -07:00
Baiju Meswani	56bae3b196	Use InplaceClipGradNorm for offline processing for on-device training (#12603 )	2022-09-02 07:47:17 -07:00
ashbhandare	349469c381	Enable way to extract all parameters to and from a contiguous buffer. (#12674 ) * implementation * review comments * review comment * lint error	2022-09-01 15:23:30 -07:00
George Nash	0125e15281	Fix include order build failure training build (#12425 ) Signed-off-by: George Nash <george.nash@intel.com>	2022-09-01 10:48:40 -07:00
Cheng	5dd9afe75a	python lint (#12825 )	2022-09-01 22:38:25 +08:00
PeixuanZuo	adbc0757ad	[UPDATE] update ROCm ci pipeline to ROCm5.2.3 (#12799 ) * [Update] update to rocm5.2.3 * [Fix] cmake version * [Fix] disable ortmodule tests * [revert] revert performance number	2022-09-01 10:32:24 +08:00
Vincent Wang	262a597e2a	[CUDA] BiasSoftmax and Dropout Fusion (#12667 ) * bias softmax dropout fusion * fix rocm build * move some files	2022-09-01 10:01:44 +08:00
Justin Chu	a48b115540	Remove reference to the deprecated variable in `torch.onnx.symbolic_helper` (#12452 ) Description: Remove reference to the deprecated variable in `torch.onnx.symbolic_helper` pytorch/pytorch#81953 - Removed unused imports - Changed BANNED_AUTOGRAD_FUNCTION_NAMES to a frozenset Motivation and Context The cast_pytorch_to_onnx variable is deprecated and removed in `torch.onnx.symbolic_helper`. Since there is still a need for converting scalar types to onnx type, I copied the mapping to `_CAST_PYTORCH_TO_ONNX` in the module.	2022-08-31 11:55:56 -07:00
Yulong Wang	1a402a3f25	replace 'master' branch ref to 'main' for onnx repo (#12678 )	2022-08-30 13:41:42 -07:00
cloudhan	9907b59a1e	Change cuda and rocm error checking helpers to return Status (#12699 ) * CudaCall returns Status in non-throw and void in throw * RocmCall returns Status in non-throw and void in throw	2022-08-30 13:18:47 +08:00
pengwa	a0c25e5c2f	Fix segment fault for alltoall (#12701 ) * fix segment fault * formatting	2022-08-30 11:27:14 +08:00
Baiju Meswani	b83ea3c2ff	Address prefast static analysis warnings (#12756 )	2022-08-29 10:09:32 -07:00
Adam Louly	ee543a47f6	upgrade cuda version on ci pipelines (training CI pipelines) (#12708 ) * upgrade cuda version on ci pipelines * keeping folder name same * keeping folder name same * setting manual seed for primitive test case * resolving comments * changing atol and rtol only for test case Co-authored-by: Adam Louly <adamlouly@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2022-08-26 16:51:19 -07:00
edgchen1	64e8806148	Address some static analysis warnings.	2022-08-26 15:05:53 -07:00
abhi-ort	ebff15d743	Pinning manual seed (#12714 )	2022-08-25 10:09:02 -07:00
Vincent Wang	5104c7dbd3	Fix Prefast Warnings (#12717 ) fix prefast warnings	2022-08-25 17:09:37 +08:00
Vincent Wang	53ecb9e635	Update Supporting DS Version to 0.7.1 for ORTModule (#12696 ) update ds version support for fp16_optimizer	2022-08-24 14:56:12 +08:00
abhi-ort	73e5741a9a	Enabling softmax grad and logsoftmax grad on ORT (#12614 ) * Enabling softmax grad and logsoftmax grad on ORT * formatting changes * formatting changes * reverting changes * Changing the OpType	2022-08-23 15:49:02 -07:00
Yulong Wang	c144acc534	Replace 'master' branch ref to 'main' in the code (#12547 )	2022-08-22 10:48:12 -07:00
Wei-Sheng Chin	dc486d146b	Make ORT callable from various Pytorch compilers (LazyTensor, TorchDynamo, etc) (#10460 ) * Make ORT as Pytorch JIT backend LORT likely doesn't work with aten fallback so we only test LORT in its own CI. * Revert changes to enable external CUDA allocator. Will add it later. Revert "Revert changes to enable external CUDA allocator. Will add it later." This reverts commit d5487f2e193014c805505afae8fb577c53667658. Fix external allocator * Relax tolerance and remove commented code * Print more information in CI * Fix pointer * Address comments. 1. Reuse ORT-eager mode's environment. 2. Remove unused ctor. * Use Pytorch master branch as all PRs are merged Fix * Refine based on cpplint feedback * Revert changes to allow custom CUDA allocator in public APIs * Use torch.testing.assert_close * Use unittest framework * Switch docker repo * Rename .cpp to .cc * Address comments * Add comment * Use same pipeline file for eager and lort pipelines * Address comments * Add yaml comment * Fix cmake files * Address comments * Rename flags, remove printing code, remove dead comment	2022-08-22 09:40:40 -07:00
Vincent Wang	a078c8d99b	Update Supporting Deepspeed Version of ORTModule's FP16_Optimizer (#12668 )	2022-08-22 22:22:53 +08:00
Scott McKay	2102b8f67c	Avoid duplicate symbol error between ONNX and ORT for ostream operator<< with TensorShapeProto (#12651 ) * Remove ostream operator<< definitions for TensorShapeProto and TensorProto as they clash with ONNX definitions in onnx/defs/printer.h/cc. Currently printer.h (unnecessarily) pulls in a number of other ONNX headers which causes naming clashes with parts of ORT. It is also excluded in a minimal build. Instead convert the onnx::TensorShapeProto to onnxruntime::TensorShape so we use the existing ostream operator<< for TensorShape. Make GetTensorShapeFromTensorProto consistent with GetTensorShapeFromTensorShapeProto so both return a TensorShape (as the name implies).	2022-08-22 17:20:52 +10:00
pengwa	7df2e8c5cc	Refactor with std::variant (on device training) (#12383 ) * use std::variant for synthetic data storage. * use std::variant to replace TypedCheckpointProperty * Remove shared ptr for checkpoint property * fix tests * refine std::variant usage a bit * remove CheckpointProperty data abstraction * use InlinedVector and InlinedHashMap if possible * fix comments * fix build and test * fix some comments * use gsl::span * fix tests * refine based on comments * fix win build * fix build	2022-08-17 08:31:23 +08:00
Baiju Meswani	f5e3517c39	Add Learning Rate Scheduler C API (#11957 )	2022-08-15 09:10:25 -07:00
Wil Brady	3d009cdde3	Updating binary ops in eager mode to support broadcasting. (#12560 ) * Updating binary ops in eager mode to support broadcasting.	2022-08-11 17:00:12 -04:00
pengwa	24eab921be	Enable PythonOp for --enable_training_torch_interop build (#12539 ) * enable PythonOp by default when --enable_training_torch_interop is enabled during build * clean up * fix * fix comment * fix * fix tests * fix fallback test * pylint format * refine based on comments	2022-08-12 00:49:30 +08:00
Baiju Meswani	3e78f3cf1f	Add win-ci pipeline for on-device training (#12513 )	2022-08-10 14:45:39 -07:00
msftlincoln	0d9a02e647	Eager Mode - Support Concatenation via aten::cat.out (#12527 ) * support concatenation via aten::cat.out * wrap dims * rename vars in tests, test wrapped dims	2022-08-09 17:16:18 -04:00
Adam Louly	2681648f5b	Load checkpoint in cpp (#12352 ) * Load checkpoint in cpp * removed unused imports * throw error on invalid name and change function name * inplace model assignment, change name and other comments resolved * name change on import * Added unit test, resolved comments * remove unused imports * resolved comments * refactoring to reduce memory allocation * resolved extra comments * changed file hierarchy and force-added onnx model * solved order of function argument * used gtest macros on test cases Co-authored-by: Adam Louly <adamlouly@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2022-08-09 12:30:50 -07:00
Vincent Wang	2bed0d4abb	[CUDA] SoftmaxCrossEntropy Kernels Refactor (#12482 ) * sce refactor * refactor * remove unnecessary memset	2022-08-09 16:48:44 +08:00
pengwa	a2dc3e9eac	Improve the compilation speed when compiling for multiple architectures. (#12490 ) * improve the compilation speed when compiling for multiple architectures. * formatting * fix * use 0 by default * fix comments	2022-08-09 11:52:26 +08:00
Vincent Wang	e85e31ee80	Update ORTModule Default Opset Version to 15 (#12419 ) * update ortmodule opset to 15 * update torch version * fix ut * fix ut * rollback * rollback for orttrainer	2022-08-05 16:55:04 +08:00
Baiju Meswani	a7d6290774	CUDA kernel for ClipGradNorm for TensorSeq gradients (#12412 )	2022-08-04 22:28:28 -07:00
LironKesem	d452462b5e	Lironkesem/unsqueeze_and_squeeze (#12421 )	2022-08-04 15:12:34 -04:00
Baiju Meswani	7f58bd7236	Perform graph transformations during offline tooling (#12422 )	2022-08-03 11:27:12 -07:00
Vincent Wang	99d2a63e1a	Set Fixed Seed For SoftmaxCrossEntropyLoss Related UTs (#12432 ) add seed	2022-08-03 13:29:30 +08:00
smrkatte	54d5e86981	Add cast before copy for dissimilar scalar type (#12391 ) * Add proper cast/copy callflow for ORT and non-ORT devices	2022-08-02 18:32:58 -07:00


1093 commits
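The shared-exponent idea from the QuantizeBFP/DequantizeBFP commit (#12833) can be sketched in plain Python. This is a minimal illustration and not the ORT contrib operators. It assumes one hypothetical BFP variant with the following properties:

- Each bounding box is a run of `box_size` consecutive values in a flat list.
- The shared exponent is the largest exponent in the box.
- Each value keeps a sign and a `mantissa_bits`-bit magnitude, stored as a signed Python int.

The sketch does not model packing into a `uint8` tensor, shape/stride metadata, or any hardware layout. `quantize_bfp` and `dequantize_bfp` are hypothetical names.

```python
import math
from typing import List, Tuple


def quantize_bfp(values: List[float], box_size: int = 4,
                 mantissa_bits: int = 7) -> Tuple[List[int], List[int]]:
    """Hypothetical BFP quantizer: one shared exponent per box of box_size values."""
    exponents: List[int] = []
    mantissas: List[int] = []
    max_mag = (1 << mantissa_bits) - 1  # largest representable magnitude
    for start in range(0, len(values), box_size):
        box = values[start:start + box_size]
        largest = max(abs(v) for v in box)
        # frexp gives largest = m * 2**e with 0.5 <= m < 1 (and e == 0 for 0.0),
        # so every |v| in the box is below 2**e.
        shared_exp = math.frexp(largest)[1]
        exponents.append(shared_exp)
        # Scale so the box maximum maps just below 2**mantissa_bits.
        scale = 2.0 ** (mantissa_bits - shared_exp)
        for v in box:
            q = round(v * scale)
            mantissas.append(max(-max_mag, min(max_mag, q)))  # clamp after rounding
    return exponents, mantissas


def dequantize_bfp(exponents: List[int], mantissas: List[int],
                   box_size: int = 4, mantissa_bits: int = 7) -> List[float]:
    """Inverse of quantize_bfp: rescale each mantissa by its box's shared exponent."""
    return [q * 2.0 ** (exponents[i // box_size] - mantissa_bits)
            for i, q in enumerate(mantissas)]
```

Values that are small relative to the largest value in their box lose precision or flush to zero. For example, `quantize_bfp([3.0, 0.001], box_size=2)` produces mantissas `[96, 0]`. This is the trade-off that the BFP variant parameters listed in the commit (mantissa width, bounding-box size) control.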
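The hash-free kernel matching described in the kernel-matching commit (#12791) can be sketched as a lookup over per-op information instead of precomputed kernel def hashes. This is a rough sketch that assumes a node exposes four pieces of information:

- its op type and domain;
- the since-version of the op schema it resolves to;
- the concrete type bound to each type constraint.

That last binding stands in for the per-op information the commit says is stored in the ORT format model when no ONNX op schema is available. `KernelDef`, `NodeInfo`, `KernelRegistry`, and all their fields are hypothetical. They do not mirror ORT's C++ classes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class KernelDef:  # hypothetical stand-in, not ORT's C++ KernelDef
    name: str
    op_type: str
    domain: str
    since_version: int  # first opset version the kernel supports
    end_version: int  # last opset version the kernel supports (inclusive)
    type_constraints: Dict[str, Set[str]]  # constraint name -> allowed types


@dataclass
class NodeInfo:  # hypothetical per-node view used for matching
    op_type: str
    domain: str
    since_version: int  # since-version of the op schema the node resolves to
    # Per-op information: the concrete type bound to each type constraint.
    # A full build could derive it from the ONNX op schema; a minimal build
    # would have to read it from the serialized model instead.
    type_bindings: Dict[str, str]


class KernelRegistry:  # hypothetical, not ORT's KernelRegistry API
    def __init__(self) -> None:
        self._kernels: Dict[Tuple[str, str], List[KernelDef]] = {}

    def register(self, kdef: KernelDef) -> None:
        self._kernels.setdefault((kdef.domain, kdef.op_type), []).append(kdef)

    def find(self, node: NodeInfo) -> Optional[KernelDef]:
        """Return the first kernel whose version range and type constraints fit the node."""
        for kdef in self._kernels.get((node.domain, node.op_type), []):
            if not kdef.since_version <= node.since_version <= kdef.end_version:
                continue
            # Every constraint the kernel declares must be bound to an allowed type.
            if all(node.type_bindings.get(name) in allowed
                   for name, allowed in kdef.type_constraints.items()):
                return kdef
        return None  # no match; no precomputed hash is consulted anywhere
```

The lookup depends only on data that can travel with the model. Whether `type_bindings` comes from an op schema or from the serialized model, the matching code stays the same, and there are no hashes whose stability must be maintained.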