onnxruntime/onnxruntime/test/framework
Scott McKay 9790e19424
Handle mem pattern allocation failure better. Make BFCArena behavior more consistent (#4062)
* Fixes from investigating an issue running the BERT-Squad model with larger batch sizes. When the batch size gets large enough, the initial run succeeds (no memory pattern in use) but the second run fails to allocate the memory pattern block.

The cause of this failure is that we still have the smaller blocks from the first run allocated, as BFCArena has no logic to free those. This essentially results in 2x the memory being required to run the model.

BFCArena::Extend was inconsistent: on one path it threw an exception if it couldn't do the allocation, and on another it just returned false (resulting in Alloc returning a nullptr). Make the behavior consistent by always throwing if BFCArena fails to find a buffer to return. There are a huge number of places in the code where we assume Alloc returns a valid pointer, so throwing results in more correct behavior overall. It's also consistent with what happens when CUDA or the standard library fails to allocate memory.
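
To illustrate the "always throw, never return nullptr" contract described above, here is a minimal sketch. `ToyArena` is a hypothetical fixed-capacity bump allocator written for this example; it is not the real BFCArena, and the use of `std::runtime_error` is an assumption (the real code may throw a different exception type).

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical toy arena (NOT the real BFCArena): a fixed-capacity bump
// allocator that never frees, used only to show the failure contract.
class ToyArena {
 public:
  explicit ToyArena(std::size_t capacity) : buffer_(capacity), used_(0) {}

  // Returns a valid pointer or throws. It never returns nullptr, so
  // callers that assume a valid pointer cannot silently misbehave.
  void* Alloc(std::size_t size) {
    assert(size > 0);  // zero-sized requests are out of scope for this sketch
    if (size > buffer_.size() - used_) {
      // Assumption: std::runtime_error stands in for the real exception type.
      throw std::runtime_error("ToyArena: failed to allocate " +
                               std::to_string(size) + " bytes");
    }
    void* p = buffer_.data() + used_;
    used_ += size;
    return p;
  }

  std::size_t Used() const { return used_; }

 private:
  std::vector<unsigned char> buffer_;
  std::size_t used_;
};
```

With this shape, a caller either gets usable memory or sees an exception at the point of failure, rather than discovering a null pointer later.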

Next, update ExecutionFrame to check for this failure and not insert a memory block entry if it happens. With the existing code, if BFCArena Alloc returned a nullptr we happily inserted that into the blocks, delaying detection of the failure until we attempted to use the block in AllocateMLValueTensorSelfOwnBufferHelper.

Finally, update AllocateMLValueTensorSelfOwnBufferHelper to expect that a location may not have a block. A log message is emitted when the block allocation fails, so there is no need to log again on each individual allocation that would have used the block. Those allocations fall through to the default behavior of doing a normal allocation.
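
The last two steps can be sketched roughly as follows. Every name here (`ThrowingAllocator`, `ToyFrame`, `TryReservePatternBlock`, `AllocateTensor`) is hypothetical and chosen for illustration; this is not the actual ExecutionFrame code, only the pattern the message describes: log once when the block allocation fails, record no entry, and let individual tensor allocations fall back to a normal allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <map>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical allocator that throws on failure (simplified: a per-request limit).
struct ThrowingAllocator {
  std::size_t limit;
  std::unique_ptr<unsigned char[]> Alloc(std::size_t size) const {
    if (size > limit) throw std::runtime_error("allocation failed");
    return std::make_unique<unsigned char[]>(size);
  }
};

// Hypothetical stand-in for the frame that owns memory pattern blocks.
class ToyFrame {
 public:
  struct Buffer {
    unsigned char* data;
    bool from_block;  // true if carved out of a pattern block
  };

  explicit ToyFrame(ThrowingAllocator alloc) : alloc_(alloc) {}

  // Try to reserve the pattern block for a location. On failure, log once
  // and insert nothing, so no null entry can be picked up later.
  bool TryReservePatternBlock(int location, std::size_t size) {
    try {
      auto block = alloc_.Alloc(size);  // may throw before the map is touched
      blocks_[location] = std::move(block);
      return true;
    } catch (const std::runtime_error& e) {
      std::cerr << "Pattern block of " << size << " bytes unavailable for location "
                << location << ": " << e.what() << "\n";
      return false;
    }
  }

  // Use the pattern block if the location has one; otherwise fall through
  // to a normal allocation. Bounds checks on offset are omitted for brevity.
  Buffer AllocateTensor(int location, std::size_t offset, std::size_t size) {
    assert(size > 0);
    auto it = blocks_.find(location);
    if (it != blocks_.end()) return {it->second.get() + offset, true};
    owned_.push_back(alloc_.Alloc(size));
    return {owned_.back().get(), false};
  }

 private:
  ThrowingAllocator alloc_;
  std::map<int, std::unique_ptr<unsigned char[]>> blocks_;
  std::vector<std::unique_ptr<unsigned char[]>> owned_;
};
```

In this sketch a failed reservation costs one log line, and later tensor allocations for that location still succeed as long as each individual request fits.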
2020-06-05 18:54:01 +10:00
..
cuda Merge remote-tracking branch 'origin/ort_training' into edgchen1/merge_from_master 2020-04-22 16:56:15 +00:00
allocation_planner_test.cc Parallel all the activations ops (#3722) 2020-05-05 01:18:17 -07:00
allocator_test.cc Clean up OPTIONAL name conflict workarounds in ort_training. (#3622) 2020-04-22 09:07:55 -07:00
bfc_arena_test.cc Handle mem pattern allocation failure better. Make BFCArena behavior more consistent (#4062) 2020-06-05 18:54:01 +10:00
data_types_test.cc
distance_test.cc Fixes GTest deprecation warnings 2020-03-17 16:38:55 -07:00
dummy_allocator.cc
dummy_allocator.h
dummy_provider.cc
dummy_provider.h
endian_test.cc
execution_frame_test.cc Fix bug in handling of an initializer that provides a graph output. (#3912) 2020-05-12 20:42:58 +10:00
float_16_test.cc Add support for sessions to share a global threadpool. (#3177) 2020-03-18 15:42:46 -07:00
inference_session_test.cc Merge remote-tracking branch 'origin/master' into edgchen1/merge_from_master 2020-04-21 03:31:32 +00:00
insert_cast_transformer_test.cc Clean up OPTIONAL name conflict workarounds in ort_training. (#3622) 2020-04-22 09:07:55 -07:00
kernel_registry_test.cc
local_kernel_registry_test.cc Add support for sessions to share a global threadpool. (#3177) 2020-03-18 15:42:46 -07:00
math_test.cc Threadpool related changes. (#3564) 2020-04-21 09:57:39 -07:00
mem_pattern_planner_test.cc
memcpy_transformer_test.cc Fix static analysis warnings found by VC++ (#3530) 2020-04-16 01:46:47 -07:00
model_builder_utils.h
opaque_kernels_test.cc Add support for sessions to share a global threadpool. (#3177) 2020-03-18 15:42:46 -07:00
parallel_executor_test.cc Thread pool changes (#3153) 2020-03-30 12:18:40 -07:00
random_test.cc Add Python API to set random seed: onnxruntime.seed(<seed>) 2020-04-15 09:44:48 -07:00
session_state_test.cc Threadpool related changes. (#3564) 2020-04-21 09:57:39 -07:00
shape_inference_test.cc
sparse_kernels_test.cc Constant-12 support (#3304) 2020-03-30 23:13:52 -07:00
tensor_test.cc Add SafeInt bounds checking to memory allocation size calculations. (#3022) 2020-02-20 11:41:03 -08:00
tensorutils_test.cc Constant-12 support (#3304) 2020-03-30 23:13:52 -07:00
test_main.cc CMake changes (#2961) 2020-02-03 19:33:14 -08:00
test_tensor_loader.cc CMake changes (#2961) 2020-02-03 19:33:14 -08:00
test_utils.cc Initial PR for RKNPU execution provider (#3609) 2020-05-05 20:36:47 -07:00
test_utils.h Initial PR for RKNPU execution provider (#3609) 2020-05-05 20:36:47 -07:00
TestAllocatorManager.cc CMake changes (#2961) 2020-02-03 19:33:14 -08:00
TestAllocatorManager.h