onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-02 03:55:34 +00:00

Author	SHA1	Message	Date
Changming Sun	4bfff45859	Downgrade Eigen (#8817 )	2021-08-23 18:06:23 -07:00
Chandru Ramakrishnan	2693af9799	Ported changes / bug fixes from torch/ort. (#8784 ) * Ported changes / bug fixes from torch/ort. * Fixed formatting * Renamed function * Renamed module_ to module. * Revert "Renamed module_ to module." This reverts commit b17fc114b3db20d174283811d90592b5b8154c19. * Include pybind common header to fix linker errors on windows debug. * Fix to generation of > 1 custom op. Co-authored-by: Ashwin Hari <ashari@microsoft.com>	2021-08-23 17:45:40 -04:00
Chandru Ramakrishnan	f51f2bad66	Fix for doxygen doc errors. (#8814 )	2021-08-23 15:52:15 -04:00
Tiago Koji Castro Shibata	62c0d24340	Fix Windows Store build (#8753 ) * Remove APIs unavailable in Store in #8349, #8178, #8065 * Add UWP stubs of C runtime functions * Remove UWP incompatible tests from UWP build * Remove incompatible tests from Store * Use UWP stubs in store only * Skip partition check outside of Windows * Remove unused WRL include * Workaround Windows header not including what it uses * Fix precompiled header name clash * Workaround SDK bugs * DXCore workaround in Win7 * Fix warning * Fix more warnings * Bump WinML to target Windows 8 * Fix more warnings * Remove unnecessary workarounds * Remove Desktop only APIs from DML adapter	2021-08-23 11:19:03 -07:00
Edward Chen	ea68955c71	Add more info to kernel registry manager hash lookup error message. (#8801 )	2021-08-23 11:09:30 -07:00
George Nash	d4a88cfe3f	Add Gemm op to DNNL Exectution provider (#8799 ) * Implement Gemm op for DNNL execution provider Signed-off-by: George Nash <george.nash@intel.com> * Remove KernelRegistry and Gemm op for dnnl ep The KernelRegistry for the dnnl execution provider only registered a Gemm op that as best we can tell was never actually used and also was not using the dnnl library. We have implemented a Gemm op in the DNNL execution provider subgraph code and thus are removing the unused Gemm op that was in the dnnl KernelRegistry. Signed-off-by: George Nash <george.nash@intel.com> * Fix duplicated output and kernelshape inference fix getcapability to make sure subgraph outputs do not have duplicates fix kernelshape inference in pool Signed-off-by: Wang <zhaoyang.wang@intel.com> * Removed most dnnl specialized ifdefs from gradient_ops_test code Re-enable GlobalAveragePoolGrad test for dnnl ep The bugs that were exposed by the GlobalAveragePoolGrad test have been fixed and this test no longer needs to be disabled for DNNL. Removed the ReluGradDnnl test. We are getting the testing from the already existing ReluGrad test. MaxPoolGrad test no longer has specialized execution provider enabling for DNNL execution provider. It will now run without the extra enabling. ConvGrad is the only test that still has dnnl specialized ifdefs However, the ConvGrad code was not being executed by the code unless it was listed first in the list of execution providers. Signed-off-by: George Nash <george.nash@intel.com> * Fix transpose issue on Gemm On transposing square matrices, getmemoryandreshape will fail to reshape fix by adding a bool Signed-off-by: Wang <zhaoyang.wang@intel.com> * Save memory space by reusing internal tensor for output The intermediat matmul output tensor can be used as the output tensor for the binary calculation. Remove the unused IsAttributeSupported from the DnnlGemmNodeCapability class since we now support all of the Gemm attributes in our implementation. 
Signed-off-by: George Nash <george.nash@intel.com> Co-authored-by: Wang <zhaoyang.wang@intel.com>	2021-08-23 08:45:34 -07:00
Guoyu Wang	89656bb712	[CoreML/NNAPI EPs] Move direct use of initializer data to unpacked tensor data (#8780 )	2021-08-21 14:58:41 -07:00
KeDengMS	0c5a305742	Bump up Nuphar cache version (#8806 ) To avoid confusion with 2.3.0	2021-08-21 12:05:05 -07:00
Suffian Khan	9fa0d8392a	Extend node debugging utilities to push tensors and node placement to SQL database (#8672 ) * adding support for tracing to sqldb instead of files * use compiled statements * script to pull tensors from db * link sqlite3 * remove node info redundant with onnx graph * addressing PR comments * address PR comments and include program counter * third party notice * use find_pacakge * add to cgmanifests.json * address thread safety and add pid suffix * build fi * python script to select on devicetype * remove unpopulated and redundant Shape and Type fields * comment * comment * PR comments * add graph execution counter to session state * move increment to inference session * std::endl to \n * ifdef on graph execution counter * add ifdef to inference session * move DEBUG_NODE_INPUTS_OUTPUTS to CMakeLists.txt	2021-08-21 00:40:12 -07:00
Olivia Jain	4666a49106	Add Component Governance (#8794 ) * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines	2021-08-20 17:41:18 -07:00
XiyinOSS	19b82b438b	GridSample OP implementation for CPU and CUDA (#8551 ) * GridSample OP implementation for CPU and CUDA Description: This change contains implementation for torch grid_sample OP. Cuda implementation contains contribution from Muscle Wu. * Use interpolation for out-of-bound points in zero padding mode Out-of-bound points in zeros padding mode changed from constant 0 to interpolation of surrounding pixels. This aligns with Pytorch implementation. A bug in CUDA batch offset calculation is fixed. Custom op exporter type is added. * Fix nearest bug in CPU * Update per CI build finding and review comments * Force float to avoid potential integer T issue * Style update * PR update * Remove c++17 feature from cuda code	2021-08-20 12:37:38 -07:00
Thiago Crepaldi	6f2f4721ec	Update Python setuptools classifiers to remove windows and mac (#8776 )	2021-08-20 08:53:25 -07:00
Chen Fu	c117ac57b7	New S8S8 Neon kernel for arm64 only (#8783 ) Co-authored-by: Chen Fu <fuchen@microsoft.com>	2021-08-19 15:20:57 -07:00
Edward Chen	94c3e2048b	[convert_onnx_models_to_ort.py] Add option to specify NNAPI EP partitioning stop ops. (#8668 ) Add option to specify NNAPI EP partitioning stop ops from the ORT format model conversion script.	2021-08-19 13:02:28 -07:00
Sherlock	81889a1cf6	Invertible ReluGrad (#8773 ) * Invertible Relu Grad	2021-08-19 11:29:05 -07:00
liqun Fu	2beb873c6b	move training CI agent pools to 1ES hosted (#8775 )	2021-08-18 18:36:19 -07:00
pengwa	39059f2539	enable torch interop build (#8493 ) * fix build - python.h not found * disable --build_shared_lib for ortmodule tests * fix * fix the build flag * disable --build_shared_lib for training path (not only for ortmodule) * fix missing test model files * disable test CApiTest.test_custom_op_library when ENABLE_TRAINING_TORCH_INTEROP is ON * enable custom_op_library build * fix build * fix * merge master and fix build failure * build onnx_test_runner when onnxruntime_ENABLE_TRAINING_TORCH_INTEROP is ON * resolve comments * use --enable_training_torch_interop to replace "onnxruntime_ENABLE_TRAINING_TORCH_INTEROP=ON"	2021-08-19 09:16:32 +08:00
Chi Lo	51152e1aaa	Integrate TensorRT EP libs into existing GPU Nuget Package (Approach#1) (#8727 ) * Merge CPU/GPU nuget pipeline * Include TensorRT EP libraries into existing GPU nuget package pipeline * modify to use correct YAML * Modify for test * modify for test * Add depedance * Add depedance (cont.) * modify for test * Add create TensorRT nuget package * modify for test * modify for test * Merge CPU/GPU nuget pipeline * Include TensorRT EP libraries into existing GPU nuget package pipeline * modify to use correct YAML * Modify for test * modify for test * Add depedance * Add depedance (cont.) * modify for test * Add create TensorRT nuget package * modify for test * fix merge bug * code refactor * code refactor * modify for test * modify for test * modify for test * modify for test * modify for test * modify for test * cleanup * modify for test * fix bug * modify for test * refactor * fix bug and test * Modify for test * Modify for test * Modify for test * Modify for test * Prepare for PR * Prepare for PR * code refacotr from review * Remove naming 'Microsoft.ML.OnnxRuntime.TensorRT' to avoid confusion * Add linux TensorRT libraries * Remove redundant variable in YMAL * revert file * undo revert file * Modify regular expression so that it can capture the correct file * Remove newline at end of file * small fix * Revert to CUDA11.1 on Windows * Add unit tests for nuget package on Linux Co-authored-by: Changming Sun <chasun@microsoft.com>	2021-08-18 17:26:34 -07:00
Dmitri Smirnov	fe5046f48e	Add SparseToDenseMatMul to the list of required by test ops (#8774 )	2021-08-18 14:08:07 -07:00
liqun Fu	fa9fcb5634	fixed the link (#8757 )	2021-08-18 11:45:42 -07:00
Aaron Bockover	b2813656f5	eager: fix build against latest PyTorch master (#8745 ) Improve README as well.	2021-08-18 14:27:21 -04:00
Yulong Wang	cb67fca738	[js/web] enable 'use_ort_model_bytes_directly' by default (#8734 )	2021-08-18 11:18:58 -07:00
Changming Sun	401de5911b	Remove CUDNN dir from setup_env_cuda.bat (#8762 )	2021-08-18 10:32:57 -07:00
KeDengMS	b0c707caa8	[Nuphar] Do not handle MatMulInteger with zero-points (#8760 ) MatMulInteger can take zero-points as input, and Nuphar does not handle that yet. Fall back to CPU EP in that case.	2021-08-18 10:32:42 -07:00
Chen Fu	00b345eb7b	ARM Neon S8S8 kernel for QGemm (#8695 ) Using signed int, qgemm kernel avoids extending uint8 to int16 while computing matrix multiplication, achieving higher performance. We also find that by using only lower 64b of vector registers to load A and B matrix, we can get further performance improvements. We also experimented with using ldp to load two 64b in one shot, vs using two ldr to load one 64b at a time, in both Big and little cores, there is no noticeable differences. Submitting the LDP version. At this point we don't need to choose kernel based on micro-architecture. Inference time of resnet50, thread count 2 Big Core on Pixel 3a Current master: 292.947 ms First iteration S8S8: 188.239 ms LDP load two 64b reg: 178.715 ms LDR load one 64b reg: 179.536 ms Little Core Master: 546.317 ms S8S8: 513.332 ms LDP: 489.19 ms LDR: 497.865 ms Raspberry Pi 3B+ Master: 660.08 ms S8S8: 608.577 ms LDP: 603.675 ms LDR 602.075 ms	2021-08-18 09:58:47 -07:00
Rachel Guo	78759059f1	[CoreML EP]Make coreml ep build on non-macOS platform (#8677 ) * wip * wip * wip * wip * wip * wip * wip * wip * wip * wip * wip * wip * wip * wip * wip * wip * wip * wip * wip * wip * wip * clean * remove unused defs * correct typo * remove onnxruntime_coreml_proto * cr comments * enablie nnapi/coreml in minimal build * enable nnapi/coreml in one build * refine dependencies * fix nnapi build failure and remove onnxruntime_coreml_proto dependencies in unit tests cmake files * small fix * fix * fix build * revert * fix build Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>	2021-08-18 09:35:32 -07:00
pengwa	0983d61969	refine glue code and tests (#8510 )	2021-08-18 11:38:00 +08:00
Guoyu Wang	3406b7b528	Simplify UnpackInitializerData API (#8736 ) * Move UnpackInitializerData to use vector * minor update * minor update * Update getclipminmax * Change uint8_t -> std::byte * fix build break * Revert "fix build break" This reverts commit 1ffa284ac54fd605c0651954ea4fb2cab0464526. * Revert "Change uint8_t -> std::byte" This reverts commit 764a656ebac6610cdf1f25e63770330c3aedece6. * Add todo notes for extra vector alignment * add check result size	2021-08-17 18:11:46 -07:00
Olivia Jain	60089f7093	Cuda11.4 (#8709 ) * initial update from 11.1 to 11.4 * change 11.4.1 to 11.4.0 * adjusting to match nvidia/cuda image tags * adjusting to match nvidia/cuda image tags centos7 * correction to 11.4.0 * correction to 11.4.0 * update to cuda 11.4 * change training back to 11.1 * change training back to 11.1 * point to correct nvcr.io/nvidia/cuda 11.4.1 image * change centos8 to centos7 * correct cudnn path * Update linux-gpu-ci-pipeline.yml for Azure Pipelines * Update c-api-noopenmp-packaging-pipelines.yml * need to resolve centos images but remove space and change to 11.4 * Update linux-gpu-ci-pipeline.yml * add cudnn to docker image * bump devtoolset to 10 * revert cuda 11.4 change to setup_env_trt * orttraining back to 11.1 * use nvcr.io * Fix previous change back to cuda 11.1 * update cudnn path * use cudnn image (revert if failure)	2021-08-17 16:36:26 -07:00
ashbhandare	cc275e7529	Gradient Accumulation optimization verified for correctness (#8273 ) * Fetching frontier tensors to frontend * Move before session initialize call * Fetch tensor and add to cache * Rest of the changes for using cache * Review comments * Review changes * Review comments * switch to shared_ptr * Fix bug after rebase * FE docstring change	2021-08-17 16:24:44 -07:00
Chen Fu	224380448d	Expand Qgemm UDOT kernel to 8x8 block (#8562 ) Create a new M8 loop processing A[8x8] B[8x8] per iteration. Avoid saving registers on paths that are not needed. Adjusted M2 and M1 loop, using more registers to relax the loop carrying dependencies. Nearly 7% improvement observed on Surface Pro X 2 with model ssd_mobilenet_v2_300 About 4.5% improvement on resnet50 on Surface Pro X 2.	2021-08-17 14:36:46 -07:00
baijumeswani	871eeb4dbd	Support dicts as inputs to ORTModule (#8718 )	2021-08-17 13:40:55 -07:00
Thiago Crepaldi	ed254c283f	Add support for experimental json config for fallback (#8759 )	2021-08-17 13:35:42 -07:00
KeDengMS	6ecf626a9c	[Nuphar] Parse node doc_string for quantize info (#8746 )	2021-08-17 11:29:03 -07:00
Wei-Sheng Chin	47b3ecb53b	Packaging pipeline now builds with PythonOp (aka running autograd.Function) (#8652 ) This PR disable UTs in training's package pipelines for building packages with PythonOp (torch.autograd.Function).	2021-08-17 10:55:13 -07:00
liqun Fu	2b1f0816f8	to build cpu training packages for multiple python versions (#8750 ) Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2021-08-17 10:49:44 -07:00
Thiago Crepaldi	419834d285	Add PyTorch fallback for ORTModule forward exceptions (#8346 )	2021-08-17 10:41:15 -07:00
stevenlix	11a618b2ec	Add engine encryption in TensorRT EP (#8732 ) * add engine encryption * Update tensorrt_execution_provider.cc * Update tensorrt_execution_provider.h * Update tensorrt_execution_provider.cc * Update tensorrt_execution_provider.h * clean up * update encryption signature	2021-08-17 08:34:22 -07:00
Yulong Wang	f668a79532	[js/web] fix perf mode in test (#8748 )	2021-08-16 23:18:42 -07:00
Yulong Wang	4ceedbe933	[js/web] add SharedArrayBuffer check for wasm multi-thread (#8749 )	2021-08-16 23:17:54 -07:00
Changming Sun	ae6fdd3333	Bring code coverage dashboard back (#8394 )	2021-08-16 20:54:39 -07:00
M. Zeeshan Siddiqui	0fb82f0f8a	Memory aware gradient builder. (#8582 )	2021-08-16 19:01:22 -07:00
Nat Kershaw (MSFT)	aa12d68c37	Update ORTModule API docstrings (#8309 )	2021-08-16 16:53:01 -07:00
Dmitri Smirnov	8713d76dd1	Introduce C and C++ APIs for Sparse Tensors (#8621 ) Add IsSparseTensor Add CreateSparseTensor Add utilities and test fully sparse instantiation Fully sparse blocksparse Add test and docs for fully sparse tensor instantiation Rework creation API Use API Non string API Retrofit of existing String API Add tests Add documentation Address build issues (Winml pending) Add inference test Bump binary size Add ifdef DISABLE CONTRIB	2021-08-16 16:33:47 -07:00
Changming Sun	8335d3dc0b	Fix Python Packaging Pipeline (Training Torch 1.9.0 Cuda 11.4) (#8738 )	2021-08-16 14:46:43 -07:00
Olivia Jain	9cefd1303b	Integrate Anubis (#8603 ) * copy over changes * Update build_image.sh * allow for configurable trt * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * reflect previous changes * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * model_list.json * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * checkout trt 7.1 * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update 
linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update post.py * Update post.py * Update post.py * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update model_list.json * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update post.py * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update post.py * Update post.py * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update model_list.json * Update post.py * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update 
linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update post.py * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update post.py * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update start_job.ps1 * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update run_mem_test_docker.sh * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Separate anubis files * revert to old pipeline * Update post.py * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * build off master Dockerfile * Delete Dockerfile.custom-trt-perf * Delete install_common_deps.sh * uncomment * Update linux-gpu-tensorrt-ci-perf-pipeline.yml * pass in trt container version * Update linux-gpu-tensorrt-ci-perf-pipeline.yml * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update post.py * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update post.py * remove sudo * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * add back build number * allow python 3.8 * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * python 3.8 fix trtexec * Update linux-gpu-tensorrt-ci-perf-pipeline.yml 
for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * remove prev py38 * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * add perf dependencies * Update start_job.ps1	2021-08-16 13:20:28 -07:00
Nick Kreeger	93e1e1dfa1	Drop quant_util.h and move helper function into quantization.h (#8747 )	2021-08-16 15:08:25 -05:00
KeDengMS	d0ff2621ee	[Nuphar] Fix Windows build in VS 2019 (#8728 ) Update TVM to fix c++17 build break in VS 2019 Remove tvm::nnvm from build	2021-08-13 16:13:34 -07:00
Chen Fu	8f7422be69	Limiting platforms where cpuinfo is included (#8716 ) * Limiting platforms where cpuinfo is included * Suppress strncpy warning during msvc build Co-authored-by: Chen Fu <fuchen@microsoft.com>	2021-08-13 14:46:21 -07:00
George Nash	e695cd304a	Dnnl refactor (#8627 ) * dnnl ep rework rework DnnlTensor,DnnlNode,DnnlSubgraph to support arbitrary graph topology and tensor data types rework GetCapability to claim nodes in graph greedily from node topological ordering and delay creation of DnnlSubgraph until Compile rework compile to have DnnlSubgraphPrimitive as the object to handle primitive creation and execution instead of thread local primitive pool which duplicates intermediate memory allocated by the EP across threads DnnlSubgraphPrimitive provides helpers to handle many common functions for each dnnl primitive builder and become the centralized place to store input, output, intermediate memories, initializer memories and etc it provides functions to obtain input memories with automatic reordering/reshaping and moving between engines it provides interfaces to add primitive, set output memory for single node and etc add CONCURRENT_EXEC compile flag for dnnl library as without it, convolution primitive cannot be created and executed on different threads enable unit tests to run on dnnl ep as well if built with dnnl ep add dnnl ep support for Matmulinteger * Add Relu to the DNNL refactor Signed-off-by: George Nash <george.nash@intel.com> * Add Convolution op to the DNNL rework Signed-off-by: George Nash <george.nash@intel.com> * Add Pooling ops to the DNNL rework This adds the following ops: - AveragePool - GlobalAveragePool - GlobalMaxPool - MaxPool Note: Pooling with dilation is not yet supported. Note: GlobalLpPool, LpPool, MaxRoiPool, and MaxUnpool are not supported yet. 
Signed-off-by: George Nash <george.nash@intel.com> * Add Sum op to the DNNL rework Signed-off-by: George Nash <george.nash@intel.com> * Add ConvGrad op to the DNNL rework Signed-off-by: George Nash <george.nash@intel.com> * Add MaxPoolGrad and AveragePoolGrad ops to DNNL rework Signed-off-by: George Nash <george.nash@intel.com> * Added lrn operator to the refactored code Signed-off by chethan.palangoutu.keshava@intel.com * Added ReduceMean DNNL op to the refactor code Signed-off-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com> * Added Softmax DNNL op for the refactored code Signed-off-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com> * Added BatchNorm DNNL op inference-only for refactored code Signed-off-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com> * Added Binary Ops to DNNL rework Signed-off-by: Wang <zhaoyang.wang@intel.com> * Added ReluGrad to DNNL Rework Signed-off-by: Wang <zhaoyang.wang@intel.com> * Update OneDNN tag to v2.3 Signed-off-by: Wang <zhaoyang.wang@intel.com> * Added support for memory upto dim size 12 this is to fix the CI test cases that contain binary ops of input dim size > 5 Signed-off-by: Wang <zhaoyang.wang@intel.com> * Prevent claiming support for float16 and bfloat16 when only float is suppoted By using The string.find used was causing the code to claiming support for float16 and bfloat16 when we only supported float. 
We now explicitly check the code for the data type or the data type with a 7 letter prefix basically prefixed with "tensor(" Signed-off-by: George Nash <george.nash@intel.com> * Disable uint8 mul and div, improve type conversion Disable mul_uint8 and div_uint8 test cases as they use modulo for overflow handling while onednn uses saturation improve type conversion using enum instead of string comparison as well as adding more types Signed-off-by: Wang <zhaoyang.wang@intel.com> Co-authored-by: Wang <zhaoyang.wang@intel.com> Co-authored-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>	2021-08-13 14:15:43 -07:00
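
The resnet50 timings quoted in commit 00b345eb7b (ARM Neon S8S8 kernel for QGemm) are easier to compare as speedups over master. The sketch below is not part of the repository. It uses only the millisecond figures from that commit message, and the dictionary keys and the `speedups` helper are names invented here for illustration.

```python
# Inference time of resnet50 (ms), thread count 2, copied from the
# message of commit 00b345eb7b. The labels are illustrative.
timings_ms = {
    "Pixel 3a big core": {"master": 292.947, "S8S8": 188.239, "LDP": 178.715, "LDR": 179.536},
    "little core": {"master": 546.317, "S8S8": 513.332, "LDP": 489.19, "LDR": 497.865},
    "Raspberry Pi 3B+": {"master": 660.08, "S8S8": 608.577, "LDP": 603.675, "LDR": 602.075},
}


def speedups(row):
    """Return master_time / variant_time for every non-master variant."""
    base = row["master"]
    return {name: base / ms for name, ms in row.items() if name != "master"}


for device, row in timings_ms.items():
    ratios = ", ".join(f"{name} {ratio:.2f}x" for name, ratio in speedups(row).items())
    print(f"{device}: {ratios}")
```

Computed this way, the LDP and LDR variants differ by less than 2% on every device listed, in line with the commit's remark that there is no noticeable difference between them.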

5403 commits