onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-06 00:03:22 +00:00

Author	SHA1	Message	Date
Suffian Khan	225439193e	Optimize Concat and Split on CUDA to eliminate host-to-device copies when sizes are all the same (#8833 ) * special case concat and split when sizes are equal * add tests for 16 and 32 inputs with same dim * add tests for 16/64 inputs on concat or 16/64 outputs on split * try eliminate windows warning * outter => outer	2021-09-01 15:25:45 -07:00
Suffian Khan	00b0a9c127	Add hugging-face models loss curve and performance guards to ROCm CI pipeline. (#8915 ) * test running hf bert-large * try again * try again * include other models * correct names * disable deberta-v2-xxlarge * avoid torch.distributed * add compare json loss and perf for bert-large to test * fix sed expression * remove pytest * add more models * move unit tests u * display samples/sec	2021-09-01 09:03:10 -07:00
Tang, Cheng	4dc0ddf606	support register external ep lib information (#8897 ) * support register external ep lib information; make eager mode share the same ep pools with training workloads * fix inference code * fix build break * fix the message	2021-08-31 20:51:22 -07:00
pengwa	3eb08d4dc7	custom autograd func memory (#8901 ) * remove PythonOpGrad control dependency && avoid segmentation fault * comment alignment * fix bugs	2021-09-01 09:29:26 +08:00
baijumeswani	70ca03d491	Correctly set the skip check flags for ORTModule (#8891 )	2021-08-31 15:28:04 -07:00
George Nash	dc75a135c8	Add elementwise operators to DNNL execution provider (#8899 ) The following ops have been added to the DNNL execution provider: Abs, Elu, Exp, Log, Relu, Round, Sigmoid, Softplus, Sqrt, and Tanh. The Relu op was moved from its individual file to the elementwise operators. The error tolerance for the LogGrad unit test had to be decreased slightly when using OneDNN. Still investigating why a different tolerance value is needed. The DnnlSubgraph::AddKernels() member function was moved to the top of the file: since it is edited every time a new operator is added to the execution provider, this places the code at the top, which means less scrolling when adding new kernels. Signed-off-by: George Nash <george.nash@intel.com>	2021-08-31 12:20:49 -07:00
satyajandhyala	84f9271a8d	Enable registering external custom op schemas on Linux (#8889 ) * Use manylinux instead of Ubuntu to run external custom ops build pipeline.	2021-08-30 10:13:47 -07:00
pengwa	36fa0de8b7	fix regression and enable custom autograd func tests in CIs (#8868 ) * fix regression and enable tests in CIs * Update orttraining/orttraining/python/training/ortmodule/_custom_autograd_function.py Co-authored-by: Wei-Sheng Chin <wschin@outlook.com> * fix Co-authored-by: Wei-Sheng Chin <wschin@outlook.com>	2021-08-30 09:34:18 +08:00
Sherlock	6e20eb7eb3	Stop gradient for Multinomial, RandomNormalLike, RandomUniformLike and EyeLike (#8836 )	2021-08-28 16:21:34 -07:00
baijumeswani	df9438192a	Re-introduce saving of optimized onnx model (#8860 ) * Re-introduce saving of optimized onnx model	2021-08-28 14:27:25 -07:00
satyajandhyala	31926176ac	Support external custom operator schemas on Ubuntu (#8807 ) * Expose symbols in onnx and protobuf namespaces in python when building with --enable_external_custom_op_schemas * Add external onnx and protobuf files to wheel * Added an example to demonstrate external custom ops use-case * Added a Linux build pipeline to test external custom ops	2021-08-28 11:05:21 -07:00
Tang, Cheng	ae7f2d824d	Share the execution provider instance for training (#8719 ) * separate the training python module; share the execution provider instance * fix build break * fix cuda test crash; reorg the python module code base * se correct env * use provider customized hash func * fix build break * fix rocm break * use const ref in argument * rename the file * move hash func to training module	2021-08-27 16:23:35 -07:00
Sherlock	c325207f7a	Optimize MatmulGrad (#8846 ) Optimize two special cases of MatmulGrad using FusedMatMul.	2021-08-25 23:36:40 -07:00
Hariharan Seshadri	cee79526fd	Add opset 15 kernels for Pow, BatchNorm, and Shape (#8442 )	2021-08-25 12:04:20 -07:00
Sherlock	73fe7bfa0f	Add ATenOp at::diagonal (#8838 ) * Register at::diagonal for ATenOp	2021-08-25 09:45:53 -07:00
Chandru Ramakrishnan	98ed235fc7	Removed MSNPU code from eager. (#8832 )	2021-08-25 09:40:25 -04:00
ashari4	4251e04eae	Removed assert (#8779 )	2021-08-24 20:26:08 -07:00
ashari4	7f1e880649	Reorder ORT eager headers (#8813 )	2021-08-24 14:48:43 -07:00
Changming Sun	4bfff45859	Downgrade Eigen (#8817 )	2021-08-23 18:06:23 -07:00
Chandru Ramakrishnan	2693af9799	Ported changes / bug fixes from torch/ort. (#8784 ) * Ported changes / bug fixes from torch/ort. * Fixed formatting * Renamed function * Renamed module_ to module. * Revert "Renamed module_ to module." This reverts commit b17fc114b3db20d174283811d90592b5b8154c19. * Include pybind common header to fix linker errors on windows debug. * Fix to generation of > 1 custom op. Co-authored-by: Ashwin Hari <ashari@microsoft.com>	2021-08-23 17:45:40 -04:00
George Nash	d4a88cfe3f	Add Gemm op to DNNL Execution provider (#8799 ) * Implement Gemm op for DNNL execution provider Signed-off-by: George Nash <george.nash@intel.com> * Remove KernelRegistry and Gemm op for dnnl ep The KernelRegistry for the dnnl execution provider only registered a Gemm op that as best we can tell was never actually used and also was not using the dnnl library. We have implemented a Gemm op in the DNNL execution provider subgraph code and thus are removing the unused Gemm op that was in the dnnl KernelRegistry. Signed-off-by: George Nash <george.nash@intel.com> * Fix duplicated output and kernelshape inference fix getcapability to make sure subgraph outputs do not have duplicates fix kernelshape inference in pool Signed-off-by: Wang <zhaoyang.wang@intel.com> * Removed most dnnl specialized ifdefs from gradient_ops_test code Re-enable GlobalAveragePoolGrad test for dnnl ep The bugs that were exposed by the GlobalAveragePoolGrad test have been fixed and this test no longer needs to be disabled for DNNL. Removed the ReluGradDnnl test. We are getting the testing from the already existing ReluGrad test. MaxPoolGrad test no longer has specialized execution provider enabling for DNNL execution provider. It will now run without the extra enabling. ConvGrad is the only test that still has dnnl specialized ifdefs However, the ConvGrad code was not being executed by the code unless it was listed first in the list of execution providers. Signed-off-by: George Nash <george.nash@intel.com> * Fix transpose issue on Gemm On transposing square matrices, getmemoryandreshape will fail to reshape fix by adding a bool Signed-off-by: Wang <zhaoyang.wang@intel.com> * Save memory space by reusing internal tensor for output The intermediate matmul output tensor can be used as the output tensor for the binary calculation. Remove the unused IsAttributeSupported from the DnnlGemmNodeCapability class since we now support all of the Gemm attributes in our implementation. Signed-off-by: George Nash <george.nash@intel.com> Co-authored-by: Wang <zhaoyang.wang@intel.com>	2021-08-23 08:45:34 -07:00
Suffian Khan	9fa0d8392a	Extend node debugging utilities to push tensors and node placement to SQL database (#8672 ) * adding support for tracing to sqldb instead of files * use compiled statements * script to pull tensors from db * link sqlite3 * remove node info redundant with onnx graph * addressing PR comments * address PR comments and include program counter * third party notice * use find_package * add to cgmanifests.json * address thread safety and add pid suffix * build fi * python script to select on devicetype * remove unpopulated and redundant Shape and Type fields * comment * comment * PR comments * add graph execution counter to session state * move increment to inference session * std::endl to \n * ifdef on graph execution counter * add ifdef to inference session * move DEBUG_NODE_INPUTS_OUTPUTS to CMakeLists.txt	2021-08-21 00:40:12 -07:00
Sherlock	81889a1cf6	Invertible ReluGrad (#8773 ) * Invertible Relu Grad	2021-08-19 11:29:05 -07:00
Aaron Bockover	b2813656f5	eager: fix build against latest PyTorch master (#8745 ) Improve README as well.	2021-08-18 14:27:21 -04:00
pengwa	0983d61969	refine glue code and tests (#8510 )	2021-08-18 11:38:00 +08:00
ashbhandare	cc275e7529	Gradient Accumulation optimization verified for correctness (#8273 ) * Fetching frontier tensors to frontend * Move before session initialize call * Fetch tensor and add to cache * Rest of the changes for using cache * Review comments * Review changes * Review comments * switch to shared_ptr * Fix bug after rebase * FE docstring change	2021-08-17 16:24:44 -07:00
baijumeswani	871eeb4dbd	Support dicts as inputs to ORTModule (#8718 )	2021-08-17 13:40:55 -07:00
Thiago Crepaldi	ed254c283f	Add support for experimental json config for fallback (#8759 )	2021-08-17 13:35:42 -07:00
Thiago Crepaldi	419834d285	Add PyTorch fallback for ORTModule forward exceptions (#8346 )	2021-08-17 10:41:15 -07:00
M. Zeeshan Siddiqui	0fb82f0f8a	Memory aware gradient builder. (#8582 )	2021-08-16 19:01:22 -07:00
Nat Kershaw (MSFT)	aa12d68c37	Update ORTModule API docstrings (#8309 )	2021-08-16 16:53:01 -07:00
George Nash	e695cd304a	Dnnl refactor (#8627 ) * dnnl ep rework rework DnnlTensor,DnnlNode,DnnlSubgraph to support arbitrary graph topology and tensor data types rework GetCapability to claim nodes in graph greedily from node topological ordering and delay creation of DnnlSubgraph until Compile rework compile to have DnnlSubgraphPrimitive as the object to handle primitive creation and execution instead of thread local primitive pool which duplicates intermediate memory allocated by the EP across threads DnnlSubgraphPrimitive provides helpers to handle many common functions for each dnnl primitive builder and become the centralized place to store input, output, intermediate memories, initializer memories and etc it provides functions to obtain input memories with automatic reordering/reshaping and moving between engines it provides interfaces to add primitive, set output memory for single node and etc add CONCURRENT_EXEC compile flag for dnnl library as without it, convolution primitive cannot be created and executed on different threads enable unit tests to run on dnnl ep as well if built with dnnl ep add dnnl ep support for Matmulinteger * Add Relu to the DNNL refactor Signed-off-by: George Nash <george.nash@intel.com> * Add Convolution op to the DNNL rework Signed-off-by: George Nash <george.nash@intel.com> * Add Pooling ops to the DNNL rework This adds the following ops: - AveragePool - GlobalAveragePool - GlobalMaxPool - MaxPool Note: Pooling with dilation is not yet supported. Note: GlobalLpPool, LpPool, MaxRoiPool, and MaxUnpool are not supported yet. 
Signed-off-by: George Nash <george.nash@intel.com> * Add Sum op to the DNNL rework Signed-off-by: George Nash <george.nash@intel.com> * Add ConvGrad op to the DNNL rework Signed-off-by: George Nash <george.nash@intel.com> * Add MaxPoolGrad and AveragePoolGrad ops to DNNL rework Signed-off-by: George Nash <george.nash@intel.com> * Added lrn operator to the refactored code Signed-off by chethan.palangoutu.keshava@intel.com * Added ReduceMean DNNL op to the refactor code Signed-off-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com> * Added Softmax DNNL op for the refactored code Signed-off-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com> * Added BatchNorm DNNL op inference-only for refactored code Signed-off-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com> * Added Binary Ops to DNNL rework Signed-off-by: Wang <zhaoyang.wang@intel.com> * Added ReluGrad to DNNL Rework Signed-off-by: Wang <zhaoyang.wang@intel.com> * Update OneDNN tag to v2.3 Signed-off-by: Wang <zhaoyang.wang@intel.com> * Added support for memory upto dim size 12 this is to fix the CI test cases that contain binary ops of input dim size > 5 Signed-off-by: Wang <zhaoyang.wang@intel.com> * Prevent claiming support for float16 and bfloat16 when only float is supported The string.find that was used caused the code to claim support for float16 and bfloat16 when we only supported float. 
We now explicitly check for the data type, or for the data type with the 7-letter prefix "tensor(" Signed-off-by: George Nash <george.nash@intel.com> * Disable uint8 mul and div, improve type conversion Disable mul_uint8 and div_uint8 test cases as they use modulo for overflow handling while onednn uses saturation improve type conversion using enum instead of string comparison as well as adding more types Signed-off-by: Wang <zhaoyang.wang@intel.com> Co-authored-by: Wang <zhaoyang.wang@intel.com> Co-authored-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>	2021-08-13 14:15:43 -07:00
Changming Sun	436ac6dd5f	Rename ml_value.h to ort_value.h (#8726 )	2021-08-13 07:04:56 -07:00
baijumeswani	217b2c9f93	Removing filelock import from ORTModule (#8722 )	2021-08-12 21:19:49 -07:00
Tang, Cheng	de2a53e46d	[eager mode] fix build and support customize shared provider entry point (#8680 ) * fix build break * support customize the name of shared provide lib's entry point * fix non training build * check error code * check return code	2021-08-11 15:10:35 -07:00
harshithapv	c24335246b	Support bool type for Pad Op and fix Unsqueeze in Tile grad for Opset 13 (#8602 ) * changes * tile grad unsqueeze fix for opset 13 * clean up * remove bool support for opset 2 to 12 for Pad as it is not supported. * Copy OperatorKernels.md from artifacts of Windows CI build.	2021-08-11 11:21:02 -07:00
mindest	a56e325eb8	constrain inputs for min/max grad UT (#8632 ) * fix inputs for min/max grad UT * use random inputs (truncated)	2021-08-07 18:29:06 +08:00
Tang, Cheng	6d3c2c85ef	Integrate eager mode source code into onnxruntime repo (#8584 ) * integrate eager mode source code; build with cmake and integrate the python test * Adding the python path for importing libraries in the Eager mode * fix clang break;check if training and python enabled * handling the linking of torch libraries across multiple platforms * merge and fix the naming * add build instruction Co-authored-by: Abhishek Jindal <abjindal@OrtTrainingDev0.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: ajindal1 <abjindal@microsoft.com>	2021-08-06 08:30:27 -07:00
Ashwini Khade	96eb9810ba	Update onnx (#8458 ) * updates for picking pnnx commit * add tests filter to c# tests * plus test fixes * fix versioning for contrib ops * fix tests * test filter for optional ops * more versioning related updates * fix test * fix layernorm spec * more updates * update docs * add more test filters * more filters * update binary size threshold * update docs * plus more fixes * updates per review * update to release commit * add filters for optional type tests * plus updates	2021-08-05 09:21:44 -07:00
Changming Sun	0510688411	Update compliance tasks in python packaging pipeline and fix some compile warnings (#8471 ) 1. Update SDLNativeRules from v2 to v3. The new one allows us setting excluded paths. 2. Update TSAUpload from v1 to v2. And add a config file ".gdn/.gdntsa" for it. 3. Fix some parentheses warnings 4. Update cmake to the latest. 5. Remove "--x86" build option from pipeline yaml files. Now we can auto-detect cpu architecture from python. So we don't need to ask user to specify it.	2021-07-30 17:16:37 -07:00
baijumeswani	816ad86d14	Configuring ORTModule - Internal Options (#8537 )	2021-07-30 13:05:32 -07:00
satyajandhyala	5e2f4263db	Enable cast propagation in the frontend. (#8517 )	2021-07-28 17:06:49 -07:00
baijumeswani	2e28cbaa64	Configuring ORTModule - End User Facing Options (#8470 )	2021-07-28 10:51:43 -07:00
Sherlock	1370cbe256	[ORTModule] Extract output schema in module's true train/eval mode (#8516 ) * Extract output schema in module's true train/eval mode	2021-07-28 09:55:07 -07:00
mindest	a71dab691d	Implement BatchNormInternal for cuda (#8172 ) * correct batchnorm replacement output order; remove bn replacement in grad graph builder * update op defs and kernel class * implement batch norm internal and grad. * change saved_var into saved_inv_std * cuda test case: bn internal * remove redundant include * fix comment; add support and UT for 1d input. * exclude batch_norm_internal in amd_hipify * run BNInternal UT for CUDA only * fix CI error * fix comment errors * fix error * add comment for inconsistency with cudnnBN doc * additional comments for cudnnBN inconsistency	2021-07-28 16:04:49 +08:00
Vincent Wang	1798698545	avgpool2d atenop (#8507 )	2021-07-28 14:04:55 +08:00
Sherlock	686f9b530b	ORTModule set_seed in int (#8511 )	2021-07-27 15:43:13 -07:00
Oliver Rausch	1685ab8138	Implement Concat with Strided copy (#8336 ) Adds a StridedCopy function that implements a copy from strided tensor to another. This parallelizes the Concat operator, and can also be used in the future to parallelize many other data movement operators (e.g. Transpose, Split, etc.). This operation is also required for the proposed data layout extensions to ORT.	2021-07-27 18:27:56 +02:00
ytaous	1ae32655b3	fix t5 assert error (#8501 ) Co-authored-by: Ethan Tao <ettao@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2021-07-27 09:04:01 -07:00
ytaous	ab5289f109	Performance: enable faster training with skip checks config (#8411 ) * freeze/fastpath support * more comments on _fast_path * per comments * minor fix * IntFlag improve * address comments Co-authored-by: Ethan Tao <ettao@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2021-07-23 10:23:13 -07:00


745 commits