onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-08 00:23:03 +00:00

Author	SHA1	Message	Date
pengwa	49288de17c	Fix memory planning issues (#5752 ) * Fix memory planning issues * fix build * fix the wrong line...	2020-11-13 03:07:59 +08:00
Guoyu Wang	dc0f7b8f82	Remove onnxruntime_session_options_config_keys.h from c_api (#5772 ) * Remove session config keys header from c_api * remove copy session config header in release package * Keep the session option config header in the package	2020-11-12 09:12:13 -08:00
stevenlix	54de618c2e	Improve TensorRT engine caching (#5737 ) * add profile caching to improve engine caching feature * Add comments * fix typo * add decryption for engine caching * Update tensorrt_execution_provider.cc * Update tensorrt_execution_provider.cc * Update tensorrt_execution_provider.cc * Update tensorrt_execution_provider.cc * Update tensorrt_execution_provider.cc * update onnx-tensorrt submodule * set opt profile to max value of the range * add hash to engine/profile name * Add calibration based INT8 quantization * add an option to enable both FP16 and INT8 * Update tensorrt_execution_provider.cc * add env variable to specify calibration file name * clean up code * Add comments and update TRT document * enable tensorrt basic test and add EngineCachingTest * clean up * update environment variable in the test * clean up	2020-11-12 08:56:45 -08:00
Hariharan Seshadri	b92fc66ea1	Support opset-13 specs of controlflow ops (Loop, If) (#5665 )	2020-11-11 23:44:14 -08:00
Pranav Sharma	1ae58c960c	Allow turning off printing of shape when compiled with onnxruntime_DEBUG_NODE_INPUTS_OUTPUTS. (#5768 ) * Allow turning off printing of shape when compiled with onnxruntime_DEBUG_NODE_INPUTS_OUTPUTS.	2020-11-11 18:59:04 -08:00
ashbhandare	5aec34500d	Add megatron transforms for BART (#5521 ) * Large model export and run ORT Python support * Megatron change refine a bit workaround self attention issue use partitioned name for weights when megatron model parallel is enabled Fix Megatron Transformer Issue (caused by the renaming) Add UTs for T5 model parallel Fix megatron seed issue fix log a bit checkpointing changes + rebase Unintended reshape transform change t5 layer norm changes add t5 layer norm kernel use template for t5 layer norm template definition changes no build error add CPU cuda kernel first unit test other forward unit tests add T5LayerNormGrad Add c++ transform and test for T5 LN minor fix BART MLP Megatron transform Add concat slice transform + test Cosmetic improvements in concat slice transform Constant folding bug fix + megatron attention transform for BART Undo unnecessary changes * Cleanup * Remove unnecessary changes * Cleanup megatron * Windows build * Add self attention test graph * Correcting transforms + cleanup * review comments * review comments * fix build and test failures * Fix CI * fix windows CI Co-authored-by: Peng Wang <pengwa@microsoft.com> Co-authored-by: Aishwarya <aibhanda@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-11-11 16:21:36 -08:00
Hariharan Seshadri	a14cd6267b	Support opset-13 specs of softmax family ops (Softmax, LogSoftmax, Hardmax) (#5707 )	2020-11-11 15:45:03 -08:00
Xavier Dupré	e5c8040c52	Improves performance of operator Transpose (#5550 ) * Improves implementation of transpose operator * Simplifies transposition when it is not really needed.	2020-11-12 00:25:25 +01:00
Maajid khan	a84a058f9e	[OpenVINO-EP] Enabling Multi Device support (#5740 ) * Enabling Multi Device support for UEP Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Minor fix added *Added a simple fix to determine OpenVINO version for Arm build as well Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>	2020-11-11 15:16:30 -08:00
Guoyu Wang	4207e99be3	[NNAPI EP] Move GetCapability independent of ModelBuilder (#5767 ) * Move GetCapability independent of ModelBuilder * minor code style fix * Move ort_enforce for same number of op_builders and op_support_checkers * minor code fix	2020-11-11 13:33:38 -08:00
Xueyun Zhu	d8ace07ad7	Add CPU send/recv for pipeline (#5315 ) * cpu send/recv * clean up send/recv * remove unused code * assert and nccl option for mnist * add build option to enable build with only cpu. Without this, nccl is always enabled which will break build on machine that only contains cpu * Add USE_MPI distinct from USE_NCCL/USE_HOROVOD * fix * fix * exclude cpu send/recv for machines without mpi Co-authored-by: Tim Harris <tiharr@microsoft.com>	2020-11-11 12:41:39 -08:00
Ashwini Khade	496fa18c96	fix graph partitioning for nested functions (#5755 ) * fix graph partitioning for nested functions * enable broken test for SCE	2020-11-11 11:38:27 -08:00
Yufeng Li	2ba637c558	Implement Scale function for quant gemm (#5632 ) * Implement a Scale function for quantization Quantized GEMM is always followed by Scaling (PerTensor Or PerColumn), and often needs to be accumulated to an existing matrix. This PR implements a post-processor for the quantized GEMM result and accumulates it to another matrix.	2020-11-10 23:34:38 -08:00
Guoyu Wang	0767c4fdfb	Fix x86 build break (#5759 )	2020-11-10 20:33:27 -08:00
Guoyu Wang	042365029f	[NNAPI] Split OPBuilder IsOpSupported into a separated class (#5746 ) * init change * Split opbuilder into opbuilder and opsupportchecker * Update code comments * Address CR comments, some minor code updates	2020-11-10 15:00:38 -08:00
Scott McKay	6803e4ab44	Fix BatchNormalization registrations. (#5750 ) Add diatribe on how to correctly update registrations.	2020-11-11 07:32:26 +10:00
Tim Harris	48b14b52b8	Remove Env::Task wrapper around std::function (#5753 ) This is a small perf / clean-up change. It removes the Env::Task abstraction which wraps a single std::function field, and adds at least one virtual method call overhead when creating a Task and when executing it. The POSIX and Windows implementations are now identical.	2020-11-10 20:22:07 +00:00
edgchen1	4c6118eb49	Update get_applicable_matrix_reduction() to combine dimensions of 1 with the given reduction axes. (#5734 )	2020-11-10 10:32:50 -08:00
Hariharan Seshadri	63b85fc696	Fix VS 2017 build break (#5745 )	2020-11-10 10:25:43 -08:00
Xavier Dupré	d59f057db3	enable string for operator Shape (#5742 )	2020-11-10 18:38:36 +01:00
Xavier Dupré	8c74df2068	Add support for string with operator Expand (#5751 )	2020-11-10 18:38:20 +01:00
Tim Harris	5e44d25c5a	Support multi-loop parallel sections, use multi-loop sections in GRU (#5602 ) This PR updates the ThreadPool API to support multi-loop parallel sections. As with the OpenMP "parallel" construct, this allows per-loop work to be amortized over a series of loops. For ORT, it also promotes locality between successive loops in the sense that iteration X of one loop will tend to run on the same worker thread as iteration X of preceding loops. The change was developed while optimizing the implementation of a model that performed better with OpenMP. Profiling indicated that OpenMP was providing lower loop entry/exit costs and that, via OpenMP's static scheduling, it was leading to a lower L2 miss rate in the series of parallel loops used in GRU. The main changes are: - Addition of ThreadPool::ParallelSection and underlying support in the modified Eigen thread pool. - In EigenNonBlockingThreadPool.h, refactoring the RunInParallel method to support two variants: one that takes an existing parallel section object created by the caller, and another (used by default) that creates its own parallel section. - Simplify ThreadPool::LoopCounter (used by worker threads to claim loop iterations), basing it on an ID supplied by the underlying Eigen thread pool for affinity in a series of loops. - Fix a possible perf issue where a loop with iterations scheduled in batches would have more threads than batches available. - Use of parallel sections in the GRU operator. - Additional test cases in threadpool_test.h. - Additional comments at the top of threadpool.h and EigenNonBlockingThreadPool.h.	2020-11-10 12:24:57 +00:00
edgchen1	2acdc3cd82	Move GetUseDeterministicCompute() to OpKernelContext to avoid need to downcast to OpKernelContextInternal. (#5729 )	2020-11-09 11:37:06 -08:00
ashbhandare	12d39ef4ed	Remove onnx backend test filters for updated ops (#5718 ) * remove unneeded filters * block openvino tests	2020-11-09 10:57:58 -08:00
Weixing Zhang	bb1af718b5	fix build failures due to recent change(`858040fa`) in CUDA EP (#5736 ) Some of the code for reduction kernels was changed in `858040fa`, which causes failures in the ROCm build since ROCm EP shares some code with CUDA EP. This PR is a quick fix for this failure by not sharing two files for now, to unblock CI enabling on ROCm EP. Another PR for leveraging `858040fa` for ROCm EP will be done later.	2020-11-09 08:41:30 -08:00
Scott McKay	c0c9ab4d81	Fix kernel registrations for Equal, Greater and Less (#5730 )	2020-11-08 07:33:49 +10:00
Dmitri Smirnov	2bf5046d4e	Add tag types for Ort::Float16_t and Ort::Bfloat16_t structs (#5716 ) Add tag types for Ort::Float16_t and Ort::Bfloat16_t structs that contain uint16_t values for float16 and bfloat16. These will serve as type dispatching types for C++ API. They are of uint16_t size and arrays of these types can be used to create Tensors of the corresponding types. Make documentation Doxygen compliant.	2020-11-06 16:41:26 -08:00
Weixing Zhang	fff85a6a35	Add GPU kernels for ROCm EP (#5655 ) * Add kernels for AMD GPU. This PR is mostly about GPU kernels for ROCm EP. Due to the similar GPU programming languages (CUDA and HIP) and similar math library calls, one principle in ROCm EP design is to share CUDA kernels as much as possible for ROCm. Thus, the script amd_hipify.py has been created for converting CUDA kernels to ROCm HIP kernels automatically during the compilation phase. But, for some reasons such as perf issues, syntax differences..., some converted kernels need some manual intervention. These kernels will be checked in the repo physically for now. In order to avoid manual intervention, the plan is to refactor CUDA kernels to make them portable between CUDA EP and ROCm EP as much as possible. Please refer to "HIP Porting Guide" for details. * like lamb, multi-tensor-apply needs to be disabled for IsAllFiniteOp and ReduceAllL2, current AMD GPU compiler has perf issue for kernel parameter which is a structure with "pass by value". * Use hipMemsetAsync and add checks on HIP calls. * move the generated files to build folder. Co-authored-by: Jesse Benson <jesseb@microsoft.com>	2020-11-06 16:11:06 -08:00
Chi Lo	92292de135	Tensorrt perf tool (#5436 ) * Add YAML file for pipeline * Modify typo * Add working directory * Modify and test * Modify and test * Modify and test * Modify and test * Modify * Modify * Modify * Modify * Make sure to copy all the result files * Add clean up * Modify * Modify agent pool name * Upload only specific artifacts * Modify * Integrated CI Pipeline for running TRT perf as well as added the “large amount of models” into perf model target * Fix bug * Fix bug * Add reading the information regarding previously known failing models and then skip testing them during benchmark/validation * Modify the script file for CI * Replace print with logger.info * Fix bug * Fix bug * Refine the code * Modify the script so that it can capture script segmentation fault while running ORT * Fix bug * fix bug * fix bug * Add debug info * fix bug * Refine perf code * Refine the code * fix bug * Code refactoring * change many-models path * remove metadata after validation/benchmark are done * Update README.md * Fix bug so that metadata doesn't hold stale value * Remove hardcode and update README * Add arguments to the script to make it run correctly * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines * Fix bug so that metadata doesn't hold stale value * Fix small bug of finding test dataset directory for FP16 test data, as well as modification of some output information * use -i random for perf test of TRT changes Co-authored-by: Olivia Jain <oljain@microsoft.com>	2020-11-06 12:27:42 -08:00
Ye Wang	95e6da7957	Revert saving optimized model as external data (#5690 ) * revert and add support for saving external data * review comments * update	2020-11-06 11:54:19 -08:00
Zhang Lei	77b1eea9cf	Add option to allow quantize_input() use input_qtype for initializers. (#5721 )	2020-11-06 09:33:24 -08:00
Zhang Lei	24016a517b	Prepacking in Gemm with merged logic for Matmul and Gemm on PackingB. (#5693 ) Prepacking in Gemm with merged logic for Matmul and Gemm on PackingB.	2020-11-05 22:35:24 -08:00
Maajid khan	d6f9cc181d	Modify logic to determine OV Version (#5701 ) Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>	2020-11-05 15:12:02 -08:00
Pranav Sharma	28197b1460	Register opset13 flatten, LRN for cuda. (#5694 ) * Register opset13 flatten, LRN, ArgMax and ArgMin for cuda. * Fix build	2020-11-05 14:13:15 -08:00
Scott McKay	11fe683471	Partition full graph one execution provider at a time (#5635 ) * Partition full graph one EP at a time, bottom-up. Nuphar requires this and it makes life simpler for an EP as they can just check if all nodes in a subgraph are assigned to it when processing the control flow node containing the subgraph. Make a couple of nuphar error messages more meaningful.	2020-11-06 07:26:00 +10:00
edgchen1	858040faaa	Implement reduce_matrix_columns() to optimize ReduceSum (#5639 ) Implement reduce_matrix_columns() to optimize ReduceSum.	2020-11-05 10:25:00 -08:00
George Wu	c46515cd56	[TensorRT EP] Remove cudaDeviceSynchronize and use cudaAllocator for scratch buffers (#5714 ) * use cuda allocator, remove cudaDeviceSync call * use unique_ptr for scratch buffers	2020-11-05 09:45:27 -08:00
Dmitri Smirnov	fd9d0c4ee0	Remove redundant const_cast (#5705 ) Signed-off-by: Dmitri Smirnov <dmitrism@microsoft.com>	2020-11-05 09:43:22 -08:00
Tim Harris	ff23083de2	Unbreak microbenchmark build (#5710 ) Minor updates to the microbenchmarks built optionally with "--build_micro_benchmarks". These are not built as part of CI, and builds started to fail. There are three changes: - I updated the threading-related benchmarks to use the static-method ThreadPool API, and to expose control over the thread pool configuration via constexpr int variables. - Disable GCC warnings seen with recent compiler versions when including parts of the Eigen headers in batchnorm.cc and eigen.cc files. - Flush std::cerr on error conditions to avoid buffered messages being lost. I tested manual builds with Linux (GCC) and Windows (MSVC).	2020-11-05 10:46:59 +00:00
Yufeng Li	5c4543e194	Calibrate float tensor only (#5704 )	2020-11-04 23:55:48 -08:00
Scott McKay	2127a229d7	The IndexedSubGraph is used to create the Function body, but after that is invalid as the nodes it referred to have been removed from the main Graph. As such there's no need to store it in the FunctionImpl instance. (#5669 )	2020-11-05 17:21:56 +10:00
Ryan Hill	941e3a69f9	Test a build break fix (#5706 )	2020-11-04 21:15:38 -08:00
ashbhandare	6d8e81cb08	Update Squeeze, Unsqueeze, Split and ReduceSum kernel for Opset13 (#5691 ) * Split change * ReduceSum and Split change * Other op changes, Grad builder, tests, registering required opset 13 ops * Rebase fixes * Fix tests, add some more * Review changes, rebase * Fix windows build * Disable new tests for TensorRT EP * Disable unsupported for OpenVINO Co-authored-by: Aishwarya <aibhanda@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-11-04 20:00:27 -08:00
S. Manohar Karlapalem	e49f7a8b71	Fix build error due to unused variable (#5698 ) Fixes build error due to unused variable when building with OpenVINO 2020.2 and 2020.3.	2020-11-04 12:12:16 -08:00
Changming Sun	0445473dc1	Add ssd to x86_disabled_tests	2020-11-04 11:39:49 -08:00
Guoyu Wang	a2b551ff08	Add runtime options for NNAPI EP (#5576 ) * Add options for nnapi ep * Add nnapi flags test * add comments * Add flag comments * Make the flags bitset const * Fix build break * Add stub changes to java and c# api * Fix java related build break * Fix java build break * Switch to bit flags instead of bitset	2020-11-04 10:08:43 -08:00
Guoyu Wang	2ad7bcb766	NNAPI add opset version check (#5687 ) * nnapi add opset support	2020-11-04 21:48:00 +10:00
Changming Sun	67d7e3967d	Disable some model tests	2020-11-03 14:42:45 -08:00
Scott McKay	c9f44276da	Add ability to filter GraphViewer using IndexedSubGraph. (#5614 ) * Add ability to filter GraphViewer using IndexedSubGraph. This is to support compiling execution providers in a minimal build.	2020-11-04 07:08:18 +10:00
Hariharan Seshadri	db9c1308a5	Fix Resize kernel registration (#5677 )	2020-11-03 10:43:41 -08:00

2405 commits