onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-07 04:39:07 +00:00

Author	SHA1	Message	Date
Shucai Xiao	e7d7fa8fa2	Update migraphx to rocm4.2 (#7994 ) * update dockerfile for migraphx ep * update to rocm4.2 * code cleanup * fix error related to onnx unit tests	2021-06-22 13:39:51 -07:00
Changming Sun	5809890ba2	Fix a compile error in InferenceTest.cs (#8119 )	2021-06-22 13:01:35 -07:00
Sunghoon	8cacb26946	remove debug.keystore from repository due to a credential issue report (#8113 )	2021-06-22 10:15:10 -07:00
Chi Lo	27d1784d44	Add TRT 7.1 Pipeline (#8073 ) * Revert for testing TensorRT 7.1 * change to original googletest version * change machine * remove build arg * change back machine * revert back googletest version * Make it ready to merge to master * revert onnx-tensorrt to v7.1 * rename yml * use [[ ]] in bash command * add sudo * add chmod * add correct path * change another way to revert onnx-tensorrt * change docker image to manylinux build	2021-06-21 20:57:04 -07:00
chethanpk	3cd06cb38c	Added support for ReduceMean on DNNL EP for CPU and GPU (#7902 ) * Added support for ReduceMean on DNNL EP for CPU and GPU Signed-off-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com> * Added fix for a resnet model failure where it was failing to create dst shape for reducemean when it was part of a subgraph with other ops Signed-off-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com> * Removing the DNNL EP from these unit tests. This is in anticipation of two changes: - DNNL EP unit tests would be added in a different location later on, so addition of EP individually to these tests will not be necessary - This was causing a memory leak fail in debug build. The bug is in the EP itself and not in the code added for reducemean. The fix for this is in the i/o handling overhaul which will be added later. * Update reduction_ops_test.cc Had accidentally deleted a new line. Making sure there are no unnecessary changes in this file	2021-06-21 17:15:46 -07:00
Du Li	352d560fd5	Adding Conv+Clip fusion (#8102 )	2021-06-21 16:30:12 -07:00
Chandru Ramakrishnan	10b7ed6430	Added op_name to message when we are missing a kernel. (#8110 ) * Added op_name to message when we are missing a kernel. * Added domain and version * Added missing ,	2021-06-21 14:45:53 -04:00
Changming Sun	cba4bc11c7	Split Linux CPU CI pipeline (#8097 )	2021-06-21 10:52:30 -07:00
Bowen Bao	51c12a715b	Add NGramRepeatBlock contrib op (#8078 ) Description: Enforce no repetition of n-grams. Scores are set to `-inf` for tokens that form a repeated n-gram if added to the back of the input_ids. Motivation and Context Needed by transformer models in sequence generation algorithms (greedy search and beam search). This module has heavy impact on performance, and can be highly parallelized.	2021-06-21 10:21:48 -07:00
Sherlock	5ac06bad61	Relax test tolerance to make CI more reliable (#8100 )	2021-06-21 07:41:54 -07:00
Tang, Cheng	059d705988	support pass in custom op registry for eager mode (#8087 ) * support pass in custom op registry for eager mode * fix the comments	2021-06-20 13:38:09 -07:00
pengwa	9f5969693a	clean up builds for interop_torch (#8017 ) * clean up builds for interop_torch * add python dependency for executables * disable onnxruntime_ENABLE_TRAINING_TORCH_INTEROP by default; enable it in ortmodule GPU training pipeline only * disable training unrelated tests when torch interop is enabled * simplify the python dependency. * clean up and fix	2021-06-19 13:41:07 +08:00
Thiago Crepaldi	5c2e1bbb0a	Fix input schema extractor for ORTModule (#8098 )	2021-06-18 21:47:49 -07:00
baijumeswani	7701c8703e	Add module attribute to ORTModule to support HuggingFace Trainer save_model (#8088 )	2021-06-18 13:13:45 -07:00
Hariharan Seshadri	08eeb8763d	Loosen validation checks in Concat to unblock execution of model in #8020 (#8080 )	2021-06-18 11:14:36 -07:00
Olivia Jain	b2247ece25	Make Perf Test Configurable (#7836 ) - Allow anyone to kick off a perf test here. Customize: branch, eps, model selection, cuda version. - Only run shape inference when required. - Kill errored out memory processes. - Remove warmup run. - Clean up script. - Standalone_TRT is its own "EP" vs as an additional run with TRT EP	2021-06-18 11:11:19 -07:00
Edward Chen	aa68157c3d	[Mobile package] Update required operator config with additional ops for wav2vec2. (#8079 ) Add some additional ops to the mobile package that are needed for the wav2vec2 model.	2021-06-17 13:08:15 -07:00
Guoyu Wang	d83f7fd4aa	[NNAPI EP] Enable Slice support (#8031 ) * Enable slice for NNAPI EP * Add ANEURALNETWORKS_STRIDED_SLICE support * Addressed CR comments * Addressed CR comments, rename PrepareForCompute to PrepareForComputeHelper to avoid confusion	2021-06-17 12:36:12 -07:00
Changming Sun	96989b83ee	Create python packages for DML (#8061 )	2021-06-16 16:59:12 -07:00
Nick Kreeger	d924fd205b	Create and move quantization tests to a shared Quantized utils file. (#8054 ) * Create a shared quantization util for all unit tests. * Cleanup qlinear_binary_op_test.cc * save * save * save * cleanup * save * cleanup for linux build	2021-06-16 17:00:36 -05:00
Guoyu Wang	32ef39be58	[Android] Move add header files into AAR to using Gradle (#8068 ) * Move add header files into AAR to using Gradle * fix gradle format violation	2021-06-16 12:03:42 -07:00
Ryan Hill	1d8edd0b5b	Fix missing files on linux (#8066 )	2021-06-16 11:05:03 -07:00
Wei-Sheng Chin	c76172fab6	Fix PythonOp with input which has no gradient (#8011 ) * Fix PythonOp with input has no gradient * Fix another bug which happens when inputs require gradient * Remove comments Co-authored-by: Peng Wang <pengwa@microsoft.com>	2021-06-17 00:19:41 +08:00
Vincent Wang	de8f2ecda9	Reduce Kernel Optimization (#8067 ) * reduce optimization * bug fix * add a check * add ut * refactor * add ut cases for keepdims=true	2021-06-16 19:53:46 +08:00
Ryan Hill	0ebaa71f49	Improve Windows Platform system error messages (#8063 )	2021-06-15 22:17:35 -07:00
Chen Fu	32e118bef0	Fix microbenchmark build failure (#8064 ) Co-authored-by: Chen Fu <fuchen@microsoft.com>	2021-06-15 20:49:39 -07:00
Tang, Cheng	e31784b6cf	decouple the python module construction from pybind_state (#8060 ) * fix broken tests * decouple the module construction to a separate file	2021-06-15 18:52:26 -07:00
Changming Sun	96cf533c76	Remove DML from Windows GPU CUDA 10.2 pipeline	2021-06-15 16:53:24 -07:00
George Wu	25c49a5fe0	fix issue with cmake path (#8055 )	2021-06-15 15:09:15 -07:00
iperov	07b166bb1b	fix PATH addition in windows should set PATH, not add to the tail the copy of PATH	2021-06-15 14:18:00 -07:00
Sunghoon	887c3149e3	[js/react_native] Use a mobile ORT instead of a full ORT (#8042 ) * Change full ort to mobile ort * Update Android example to load mobile ort * Change the format of test models to ort * update ios to use mobile ort * revise README * use onnxruntime-mobile-c CocoaPods in a npm package	2021-06-15 13:36:05 -07:00
Nick Kreeger	6a1b000125	Fix unit test typo in test_op_embed_layernorm.py (#8056 )	2021-06-15 15:27:44 -05:00
Changming Sun	07788e082e	Enable python GPU tests (#7854 )	2021-06-15 10:24:58 -07:00
G. Ramalingam	8079c76383	Create ORT opschema library (#7903 ) * Op schema library * Create ORT opschema library and sample app * delete message in cmake * Fix cmake * Address PR feedback and add dependency * Add cmake dependency * Cmake fix * Add dependency for nsync * Add dependency for nsync * Reorder dependencies * Testing for dependencies on all platforms * Resolve dependencies on GetStackTrace, floatToHalf * Compiler strict-aliasing warning * Merge with master * Minor cleanup	2021-06-14 14:02:33 -07:00
Olivia Jain	c72a8c7ff4	Upgrade tf 2.4.1 to 2.4.2 for component governance (#8036 ) * Upgrade tf 2.4.1 to 2.4.2 for component governance * Trial run with tf 2.5.0	2021-06-14 09:30:58 -07:00
George Nash	9acf93b90a	Take graph topology into account when creating dnnl subgraphs (#7910 ) Check the inputs of all nodes are part of the subgraph for all operators. Previously the code assumed all operators only had a single input except for the "Sum" operator. This resolves issue seen when adding new operators that a subgraph was incorrectly accepting a node when the subgraph should not have because it was not following the topology of the nodes. Signed-off-by: George Nash <george.nash@intel.com>	2021-06-13 19:23:37 -07:00
Xavier Dupré	6d7461795f	Update Version.md (#8021 ) Fix the correct supported opset 1.8.0.	2021-06-13 18:52:40 +02:00
Pranav Sharma	ad6a306a7f	Add pragma once (#8040 )	2021-06-11 23:47:26 -07:00
Scott McKay	96ead2be91	Avoid hashing the operator type in the GraphViewer priority node check unless the string has a chance of matching. (#7972 ) * Avoid hashing the operator type in the GraphViewer priority node check unless the string has a chance of matching. Below are perf numbers from a test that loads 16 models multiple times. I was checking that some unrelated changes didn't have unexpected perf cost and found the PriorityNodeCompare overwhelmed any contribution the other changes were making. Before CPU Time:74.678s CPU Time for relevant Top Hotspots std::_Hash_array_representation<char> 20.834s onnxruntime::PriorityNodeCompare::IsHighPri 7.589s onnxruntime::Graph::KahnsTopologicalSort 4.487s After CPU Time:47.103s CPU Time for relevant Top Hotspots onnxruntime::Graph::KahnsTopologicalSort 4.465s onnxruntime::PriorityNodeCompare::IsHighPri 2.873s	2021-06-12 14:11:33 +10:00
Edward Chen	6e134c2cc3	[Objective-C API] Add support for documentation generation (#7999 ) Adding support for generating API documentation with the Jazzy tool. It's a manual process now, but we can eventually make it a part of the release pipeline.	2021-06-11 17:49:00 -07:00
Nick Kreeger	1d7f44a832	Add unit test for EmbedLayerNormalization quantization op. (#8033 )	2021-06-11 17:33:55 -05:00
Ye Wang	e6225c62a5	transformers test CI pipeline fix (#8016 ) * init checkin * Restore initial environment * -y * testtest * fix * fix indent	2021-06-11 12:57:52 -07:00
sumitsays	43c45ddd66	Update DirectML EP changes from DmlDev as of 2021-06-07 (#7987 ) * Merged PR 6093117: Fix test_DynamicQuantizedLinear_max_adjusted_expanded by allowing Identity operator to run on non-float inputs Motivation: As part of the OnnxConformance Backend tests, DynamicQuantizedLinear_max_adjusted_expanded is failing. Root Cause: - The test model has `Identity` operator as one of the nodes. The input of this node is of non-float data type. - In DML, `Identity` operator is registered as an operator which requires floating input. - As per `DirectMLSchema.h`, support for non-float input has been added for `Identity` operator in DML but the same has not been reflected in the `OperatorRegistration.cpp`. Changes: - Removed all traces of the requiresFloatFormatsForGraph flag from its definition and usage. This flag was only used for Identity and its related operator. - Added null check for the graphOutput nodeArg in GraphDescBuilder.cpp to stop the crash of the test. Related work items: #33076298 * Merged PR 6103324: Remove usage of non-generic error code (FWP_E_NULL_POINTER) Motivation: Addressing Dwayne comment on the previous PR. [Ref: [6093117](https://dev.azure.com/microsoft/WindowsAI/_git/onnxruntime/pullrequest/6093117?discussionId=44292162&path=%2Fonnxruntime%2Fcore%2Fproviders%2Fdml%2FDmlExecutionProvider%2Fsrc%2FGraphPartitioner.cpp)] Changes: Inside the DML EP, we should not use some other platform specific error codes. Instead we should use an appropriate generic error code. Related work items: #33076298 Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com>	2021-06-11 11:09:48 -07:00
Vincent Wang	2f2aaf2cf6	Fix Memory Leak from DlpackToOrtValue (#8029 )	2021-06-11 15:48:13 +08:00
Jeff Daily	d02de9c1bc	[ROCm] dockerfile updates (#7955 ) * do not remove onnxruntime build directory in Dockerfile.rocm4.1.pytorch * restore ONNX Runtime Training Examples to rocm 4.2 dockerfile	2021-06-10 23:50:19 -07:00
Scott McKay	00d48d9c30	Add enhanced partitioning utils for use by compiling EPs (#7991 ) * Add enhanced partitioning utils and convert internal testing EP to use it. Will convert NNAPI EP once checked in. Background: Currently most EPs do their partitioning by iterating the model in the topologically sorted order. Whilst this works, it doesn’t ensure that all nodes which could possibly be added to the current group are, as the group is closed when the first unsupported node is seen. Changes: - Ask EP for all nodes it supports first - Do partitioning aware topological sort - Groups nodes and flips between processing supported and unsupported nodes to maximise inputs that will be available for each partition - Create groups of nodes for the partition using the new order of nodes - Create ComputeCapability for each group There’s also an additional ability to specify operators to stop at. The processing will find all downstream nodes from ‘stop at’ operators and exclude them. If NonMaxSuppression is specified we can prevent the post-processing from SSD Mobilenet and MobileDet attempting to use NNAPI (so easy way to have parity with the TF Lite behavior). I don’t think there’s an automated way to determine what if any ‘stop at’ operators are required for a model, so this will need to be a configuration parameter for the EP and we’ll need to document recommended values for popular models.	2021-06-11 15:23:21 +10:00
Suffian Khan	35ca3c99d1	Fix ROCm wheels pipeline after changes to manylinux scripts (#8026 ) * update * try fix rocm pipeline * avoid already installed error * ignore python3.10 since build fails * fix * try setting user * try again * try again * try again * fix script * disable inference docs generation * try print device id * fix name qual * try again * try again * try again * provider_options * add device verify * try again * try again * try again * print video/render gid * try again * run as root * try again with uid, gid * cleanup * run as root * temp fix * add /bin/bash Co-authored-by: Changming Sun <chasun@microsoft.com>	2021-06-10 21:01:28 -07:00
Scott McKay	20579595c8	Make logic in InsertCastTransformer around forcing a node to fp32 more precise. (#8018 ) * Address #7981 Reworked the logic around forcing a node to run on fp32 even if it was supported on fp16. The github issue had multiple factors. In ORT 1.8 we remove Identity nodes that produce graph outputs as they're not needed. That resulted in a Loop node no longer having output nodes (it produces graph outputs instead), which meant the check in IsSingleInputNodeFloat16Node returned true as there was no longer a downstream Identity node processing fp16 data. We should only force a node to fp32 in very specific circumstances, and the changes hopefully check for those more precisely.	2021-06-11 13:54:40 +10:00
Nat Kershaw (MSFT)	0237225117	Add @file annotation to support doxygen generation of C API docs (#7458 )	2021-06-10 16:10:32 -07:00
baijumeswani	b2ed4fb0a4	Merge orttraining and ortmodule single gpu ci pipelines (#8022 ) * Merge orttraining and ortmodule single gpu ci pipelines * Remove Debug from orttrainer build config	2021-06-10 15:58:23 -07:00
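
The NGramRepeatBlock commit (51c12a715b) describes setting scores to `-inf` for tokens that would form a repeated n-gram if appended to `input_ids`. The sketch below illustrates that rule in plain Python for a single sequence. It is not the contrib op's implementation: the function name `block_repeated_ngrams`, its parameters, and the unbatched list-based layout are assumptions made for illustration only.

```python
import math


def block_repeated_ngrams(input_ids, scores, ngram_size):
    """Return a copy of `scores` with -inf at tokens that would repeat an n-gram.

    Hypothetical helper: `input_ids` is one sequence of token ids and
    `scores` is a list indexed by token id (both assumptions, not the op's API).
    """
    blocked = list(scores)
    # Not enough history to form an n-gram prefix: nothing to block.
    if ngram_size <= 0 or len(input_ids) < ngram_size - 1:
        return blocked

    # The last (n - 1) tokens are the prefix of any n-gram the next token completes.
    prefix = tuple(input_ids[len(input_ids) - (ngram_size - 1):])

    # Scan every n-gram already present in the sequence.
    for start in range(len(input_ids) - ngram_size + 1):
        ngram = input_ids[start:start + ngram_size]
        # If its first (n - 1) tokens match the current prefix, emitting its
        # last token again would repeat the n-gram, so ban that token.
        if tuple(ngram[:-1]) == prefix:
            blocked[ngram[-1]] = -math.inf
    return blocked


# Usage: with history [1, 2, 3, 1, 2] and trigram blocking, token 3 is banned
# because appending it would repeat the trigram (1, 2, 3).
example = block_repeated_ngrams([1, 2, 3, 1, 2], [0.0] * 5, ngram_size=3)
```

Each sequence is processed independently of the others, which is consistent with the commit's remark that the operation can be highly parallelized.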


5089 commits