[9] RFR (S): 8173151: Code heap corruption due to incorrect inclusion test
dean.long at oracle.com
Mon Feb 6 23:14:08 UTC 2017
When do we allocate a CodeBlob with a code size of 0? Is it really
useful? Would having a minimum code size of 1 fix the problem?
dl
On 2/6/17 7:29 AM, Zoltán Majó wrote:
> Hi,
>
>
> please review the fix for 8173151.
>
> https://bugs.openjdk.java.net/browse/JDK-8173151
> http://cr.openjdk.java.net/~zmajo/8173151/webrev.00/
>
> The crash reported in the bug is caused by the corruption of the
> 'non-profiled nmethods' code heap. CodeHeap::_freelist for that code
> heap points to an address one segment before the heap's address space.
> The sweeper starts iterating through the code heap from the beginning
> of the heap's address space. Thus, the sweeper assumes that the first
> item in the code heap is a HeapBlock/FreeBlock (with the appropriate
> length and usage information). However, that is not the case, as the
> first item in the heap is actually *before* the heap. So the sweeper
> crashes.
>
> This is a hard-to-reproduce problem (I managed to reproduce it only
> once in 350 iterations, each iteration taking ~25 minutes). So the fix
> I propose is based on core file debugging and source code
> investigation. But I managed to write a regression test that triggers
> a problem similar to the original problem.
>
> I think the problem happens because a CodeBlob allocated in one code
> heap (A) is returned to a different code heap (B). When the CodeBlob
> is returned to B, it is added to B's freelist. However, as the CodeBlob
> was allocated in A, the freelist of B now points into A (i.e., B is
> corrupted).
>
> The code cache uses CodeCache::get_code_heap(const CodeBlob* cb) to
> determine the code heap to which 'cb' is supposed to be returned.
> Since 8171008 (AOT) [1], the check is:
>
> CodeHeap* CodeCache::get_code_heap(const CodeBlob* cb) {
>   assert(cb != NULL, "CodeBlob is null");
>   FOR_ALL_HEAPS(heap) {
> -   if ((*heap)->contains(cb)) {
> +   if ((*heap)->contains(cb->code_begin())) {
>       return *heap;
>     }
>   }
>
> The blob 'cb' can be returned to the wrong heap if, for example:
> - 'cb' is at the end of code heap A and
> - the size of the code contained in 'cb' is 0 (therefore code_begin()
> returns the address after 'cb', i.e., the first address in code heap B).
>
> The fix proposes to restore CodeCache::get_code_heap() to its pre-AOT
> state (as I'm not aware of the reasons why AOT changed that check). I
> also propose to add some guarantees after allocation/deallocation in
> the code heap to make it easier to catch this or related problems in
> the future.
>
> The regression test I propose achieves the above condition and results
> in a crash. The regression test works only with product builds,
> because in a product build a BufferBlob fits into one segment whereas
> in a fastdebug build it does not.
>
> The test needs to set the CodeCacheMinBlockLength flag to 1. The flag
> is currently a develop flag and we would need to make it a product flag
> for the test to work. (Other flags controlling the code cache, e.g.,
> CodeCacheExpansionSize, are also product flags.) I could also experiment
> with reproducing the problem with different block lengths/segment
> sizes, but that would most likely make the test more fragile (and
> CodeCacheSegmentSize is a develop flag as well anyway).
>
> I tested the fix with JPRT; RBT is in progress.
>
> Thank you!
>
> Best regards,
>
>
> Zoltan
>
> [1] http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/777aaa19c4b1#l116.71
>
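
For reference, the boundary condition described in the quoted message can be
sketched as a toy model. This is only an illustration under stated
assumptions: ToyHeap, ToyBlob, place_at_end and the two lookup helpers are
made-up names, not HotSpot code, the sizes are arbitrary, and segment
rounding and alignment in the real code heap are ignored.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a code heap: the half-open range [low, high).
struct ToyHeap {
    const char* low;
    const char* high;
    bool contains(const void* p) const {
        const char* c = static_cast<const char*>(p);
        return low <= c && c < high;
    }
};

// Hypothetical stand-in for a CodeBlob: a header followed by code bytes.
struct ToyBlob {
    const char* start;        // address of the blob (its header)
    std::size_t header_size;
    std::size_t code_size;
    const char* code_begin() const { return start + header_size; }
};

// Backing storage shared by two adjacent toy heaps (assumed layout).
static char toy_memory[128];

// Place a blob so that its last byte is the last byte of 'heap'.
inline ToyBlob place_at_end(const ToyHeap& heap,
                            std::size_t header_size,
                            std::size_t code_size) {
    std::size_t total = header_size + code_size;
    assert(total <= static_cast<std::size_t>(heap.high - heap.low));
    return ToyBlob{heap.high - total, header_size, code_size};
}

// Lookup that tests the address of the blob itself (pre-AOT style check).
inline int heap_by_blob_start(const ToyHeap* heaps, int n, const ToyBlob& b) {
    for (int i = 0; i < n; i++) {
        if (heaps[i].contains(b.start)) return i;
    }
    return -1;
}

// Lookup that tests the address of the blob's code (post-AOT style check).
inline int heap_by_code_begin(const ToyHeap* heaps, int n, const ToyBlob& b) {
    for (int i = 0; i < n; i++) {
        if (heaps[i].contains(b.code_begin())) return i;
    }
    return -1;
}
```

With heap A covering the first 64 bytes of toy_memory and heap B the next 64,
a blob with a 16-byte header and a code size of 0 placed at the end of A has
code_begin() equal to the first address of B, so the code_begin()-based
lookup selects B while the blob-address lookup selects A. In this toy model a
code size of 1 keeps code_begin() inside A; whether that carries over to the
real code heap is not something the sketch can show.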