RFR: 8062063: Usage of UseHugeTLBFS, UseLargePagesInMetaspace and huge SurvivorAlignmentInBytes cause crashes in CMBitMapClosure::do_bit

Stefan Johansson stefan.johansson at oracle.com
Wed Jan 7 22:50:16 UTC 2015


Thanks for looking at this Kim,

On 2015-01-07 21:02, Thomas Schatzl wrote:
> Hi,
>
> On Wed, 2015-01-07 at 13:12 -0500, Kim Barrett wrote:
>> On Jan 7, 2015, at 10:14 AM, Stefan Johansson <stefan.johansson at oracle.com> wrote:
>>> Hi,
>>>
>>> Please review this fix for:
>>> https://bugs.openjdk.java.net/browse/JDK-8062063
>>>
>>> Webrev:
>>> http://cr.openjdk.java.net/~sjohanss/8062063/hotspot.00
>>>
>>> Summary:
>>> When using large pages on Linux we never actually uncommit memory; we
>>> just mark it as currently not used. When later re-committing those
>>> pages we currently only mark them as in use again. This works fine
>>> until someone expects to get cleared memory back from a commit, which
>>> for example is expected for the memory backing certain bitmaps. This
>>> fix makes sure that we always clear large pages when they are
>>> re-committed.
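
To make that concrete: conceptually, the uncommit side just records
which pages now hold stale data, and the commit side zeroes exactly
those pages (that part is the webrev snippet Kim quotes below). A
simplified sketch of the uncommit side, with a hypothetical signature
rather than the exact webrev code:

    // Sketch only: with pinned large pages nothing is returned to the
    // OS on "uncommit", so just remember that these pages hold stale
    // data and must be zeroed before the next commit hands them out.
    void G1PageBasedVirtualSpace::uncommit(uintptr_t start, size_t size_in_pages) {
      _needs_clear_on_commit.set_range(start, start + size_in_pages);
      // ... existing bookkeeping that marks the range as not in use ...
    }
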
>> Ouch!
>>
>> Generally looks good.  I have one question:
>>
>> src/share/vm/gc_implementation/g1/g1PageBasedVirtualSpace.cpp
>>   137     for (uintptr_t page_index = start; page_index < start + size_in_pages; page_index++) {
>>   138       if (_needs_clear_on_commit.at(page_index)) {
>>   139         Copy::zero_to_bytes((HeapWord*)page_start(page_index), _page_size);
>>   140         _needs_clear_on_commit.clear_bit(page_index);
>>   141       }
>>   142     }
>>
>> I'm not sure how large the size_in_pages argument for commit can be /
>> tends to be.  Nor do I know how often or in what circumstances the
>> commit operation gets called.  With those caveats, would it be worth
> size_in_pages depends on the G1 heap sizing, which is rather
> aggressive at the moment, so it tends to be high, I think.
> Also, we only shrink after a full GC, which means the next expansion
> can potentially be large.
>
> However, this code path is (afaik) only used with large pages on
> Linux, and only when the Xms and Xmx settings differ. So the page size
> is always >= 2M (or whatever the current large page size is).
>
>> making the scan for pages that need to be cleared, and their
>> clearing, chunkier by using BitMap::get_next_[zero,one]_offset to
>> search for ranges that need to be cleared? It makes the code a little
>> more complicated than the present bit-at-a-time iteration, but is
>> probably faster if there are long runs in the bitmap, which seems
>> plausible but should be tested. Then again, it might not be worth
>> doing if performance isn't important here.
> It would be a nice addition, but I do not feel it is required for this
> particular bug fix. As you mention, it is slightly more complicated. The
> overhead of the per-bit scan seems small compared to the cost of
> zero-filling a large page.
I agree that this would probably improve the clearing, but for this fix 
I would prefer to leave the code as is. The goal is to get the fix 
backported to 8u40, so I want to avoid adding more complexity.
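
For the record, if we revisit this later, a chunkier variant along the
lines Kim suggests could look roughly like the sketch below. It is
untested, and it assumes the usual [beg, end) semantics of
BitMap::get_next_one_offset, BitMap::get_next_zero_offset and
BitMap::clear_range:

    // Untested sketch: clear whole runs of dirty pages instead of
    // going page by page.
    BitMap::idx_t end = start + size_in_pages;
    BitMap::idx_t cur = _needs_clear_on_commit.get_next_one_offset(start, end);
    while (cur < end) {
      // Find where the current run of pages needing a clear ends.
      BitMap::idx_t run_end = _needs_clear_on_commit.get_next_zero_offset(cur, end);
      // Zero-fill the whole run with a single call ...
      Copy::zero_to_bytes((HeapWord*)page_start(cur),
                          (run_end - cur) * _page_size);
      // ... and mark those pages as clean again.
      _needs_clear_on_commit.clear_range(cur, run_end);
      cur = _needs_clear_on_commit.get_next_one_offset(run_end, end);
    }

Whether the extra complexity pays off would of course have to be
measured, as Kim notes.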

Thanks,
Stefan
> Either way is fine for me.
>
> Thanks,
>    Thomas
>
>



