[9] RFR (S): 8173151: Code heap corruption due to incorrect inclusion test
Rickard Bäckman
rickard.backman at oracle.com
Fri Feb 10 10:34:42 UTC 2017
Sorry for being a bit late here...
The reason for the change in the contains() check from cb to
cb->code_begin() is that with the changes for AOT we no longer require
the metadata (AOTCompiledMethod *) and code to be in one blob.
AOT does things like:
code-heap-start
code-method-1
code-method-2
code-method-3
code-method-4
code-heap-end
and when we link those methods we create AOTCompiledMethods which just
have pointers to code-method-1.
The AOTCodeHeap then knows the boundaries of the code heap.
Checking if the heap contains the metadata (AOTCompiledMethod *) will
fail. While checking if it contains the code will succeed.
/R
On 02/06, Zoltán Majó wrote:
> Hi,
>
>
> please review the fix for 8173151.
>
> https://bugs.openjdk.java.net/browse/JDK-8173151
> http://cr.openjdk.java.net/~zmajo/8173151/webrev.00/
>
> The crash reported in the bug is caused by the corruption of the
> 'non-profiled nmethods' code heap. CodeHeap::_freelist for that code
> heap points to an address one segment before the heap's address
> space. The sweeper starts iterating through the code heap from the
> beginning of the heap's address space. Thus, the sweeper assumes
> that the first item in the code heap is a HeapBlock/FreeBlock (with
> the appropriate length and usage information). However, that is not
> the case, as the first item in the heap is actually *before* the
> heap. So the sweeper crashes.
>
> This is a hard-to-reproduce problem (I managed to reproduce it only
> once in 350 iterations, each iteration taking ~25 minutes). So the
> fix I propose is based on core file debugging and source code
> investigation. But I managed to write a regression test that
> triggers a problem similar to the original problem.
>
> I think that problem happens because a CodeBlob allocated in one
> code heap (A) is returned to a different code heap (B). When the
> CodeBlob is returned B, it is added to B's freelist. However, as the
> CodeBlob was allocated in A, the freelist of B now points into A
> (i.e., B is corrupted).
>
> The code cache uses CodeCache::get_code_heap(const CodeBlob* cb) to
> determine to which code heap a 'cb' is supposed to be returned to.
> Since 8171008 (AOT) [1], the check is:
>
> CodeHeap* CodeCache::get_code_heap(const CodeBlob* cb) {
> assert(cb != NULL, "CodeBlob is null");
> FOR_ALL_HEAPS(heap) {
> - if ((*heap)->contains(cb)) {
> + if ((*heap)->contains(cb->code_begin())) {
> return *heap;
> }
> }
>
> The blob 'cb' can be returned to the wrong heap if, for example:
> - 'cb' is at the end code heap A and
> - the size of the code contained in 'cb' is 0 (therefore
> code_begin() returns the address after 'cb', i.e., the first address
> in code heap B).
>
> The fix proposes to restore CodeCache::get_code_heap() to its
> pre-AOT state (as I'm not aware of the reasons why AOT changed that
> check). I also propose to add some guarantees after
> allocation/deallocation in the code heap to possibly easier catch
> this or related problems in the future.
>
> The regression test I propose achieves the above condition and
> results in a crash. The regression test works only with product
> builds, because in a product build a BufferBlob fits into one
> segment whereas in a fastdebug build it does not.
>
> The test needs to set the CodeCacheMinBlockLength flag to 1. The
> flag is currently develop and we would need to make it product for
> the test to work. (Other flags controlling the code cache, e.g.,
> CodeCacheExpansionSize, are also product.) I could also experiment
> with reproducing the problem with different block lengths/segment
> sizes, but that would most likely make the test more fragile (and
> CodeCacheSegmentSize is anyway develop as well).
>
> I tested the fix with JPRT, RBT is in progress.
>
> Thank you!
>
> Best regards,
>
>
> Zoltan
>
> [1] http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/777aaa19c4b1#l116.71
>
More information about the hotspot-compiler-dev
mailing list