RFR: 8062063: Usage of UseHugeTLBFS, UseLargePagesInMetaspace and huge SurvivorAlignmentInBytes cause crashes in CMBitMapClosure::do_bit

Wed Jan 7 20:02:05 UTC 2015

Hi,

On Wed, 2015-01-07 at 13:12 -0500, Kim Barrett wrote:
> On Jan 7, 2015, at 10:14 AM, Stefan Johansson <stefan.johansson at oracle.com> wrote:
> > 
> > Hi,
> > 
> > Please review this fix for:
> > https://bugs.openjdk.java.net/browse/JDK-8062063
> > 
> > Webrev:
> > http://cr.openjdk.java.net/~sjohanss/8062063/hotspot.00
> > 
> > Summary:
> > When using large pages on Linux we never actually uncommit memory; we just mark
> > it as currently not used. When later re-committing those pages we
> > currently only mark them in use again. This works fine until someone
> > expects to get cleared memory back when doing a commit, which for
> > example is expected for the memory backing certain bitmaps. This fix
> > makes sure that we always clear large pages when they are
> > re-committed.
> 
> Ouch!
> 
> Generally looks good.  I have one question:
> 
> src/share/vm/gc_implementation/g1/g1PageBasedVirtualSpace.cpp
>  137     for (uintptr_t page_index = start; page_index < start + size_in_pages; page_index++) {
>  138       if (_needs_clear_on_commit.at(page_index)) {
>  139         Copy::zero_to_bytes((HeapWord*)page_start(page_index), _page_size);
>  140         _needs_clear_on_commit.clear_bit(page_index);
>  141       }
>  142     }
> 
> I'm not sure how large the size_in_pages argument for commit can be /
> tends to be.  Nor do I know how often or in what circumstances the
> commit operation gets called.  With those caveats, would it be worth

size_in_pages depends on the G1 heap sizing, which is rather
aggressive at the moment, so I think size_in_pages tends to be high.
Also, we only shrink after full gc, which means the next expansion can
potentially be large.

However this code path is (afaik) used with large pages on Linux only,
and only occurs when you use different Xms and Xmx settings. So page
size is always >= 2M (or whatever the current large page size is).

> making the scan for pages that need to be cleared and their clearing
> chunkier by using BitMap::get_next_[zero,one]_offset to search for
> ranges that need to be cleared?  It makes the code a little more
> complicated than the present bit at a time iteration, but is probably
> faster if there are long runs in the bitmap, which seems plausible,
> but should probably be tested.  But it might not be worth doing if
> performance isn't important here.

It would be a nice addition, but I do not feel it is required for this
particular bug fix. As you mention, it is slightly more complicated. The
overhead seems small compared to the effort to zero-fill a large page.
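
For reference, here is a rough sketch of what the range-based variant could look like. It is not HotSpot code: PageSpaceSketch, next_offset and zero_calls are hypothetical names, std::vector<bool> stands in for the BitMap behind _needs_clear_on_commit, and std::memset stands in for Copy::zero_to_bytes. The only point is that each run of dirty pages is zeroed with one call instead of one call per page.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the page-based virtual space; not HotSpot code.
struct PageSpaceSketch {
  size_t page_size;
  std::vector<char> memory;       // backing store: page_size * number of pages
  std::vector<bool> needs_clear;  // one bit per page, like _needs_clear_on_commit
  size_t zero_calls = 0;          // number of zeroing calls made, for illustration

  PageSpaceSketch(size_t pages, size_t psize)
    : page_size(psize), memory(pages * psize, 0), needs_clear(pages, false) {}

  // First index in [from, limit) whose bit equals 'value', or limit if none.
  // Plays the role of BitMap::get_next_one_offset / get_next_zero_offset.
  size_t next_offset(size_t from, size_t limit, bool value) const {
    while (from < limit && needs_clear[from] != value) {
      from++;
    }
    return from;
  }

  // Zero every dirty page in [start, start + size_in_pages), one call per run.
  void clear_on_commit(size_t start, size_t size_in_pages) {
    size_t end = start + size_in_pages;
    assert(end <= needs_clear.size());
    size_t run_start = next_offset(start, end, true);
    while (run_start < end) {
      // The run of dirty pages is [run_start, run_end).
      size_t run_end = next_offset(run_start, end, false);
      std::memset(memory.data() + run_start * page_size, 0,
                  (run_end - run_start) * page_size);
      zero_calls++;
      for (size_t i = run_start; i < run_end; i++) {
        needs_clear[i] = false;
      }
      run_start = next_offset(run_end, end, true);
    }
  }
};
```

With dirty pages 1, 2, 3 and 6 in an 8-page space, this makes two zeroing calls where the per-page loop makes four. Whether that matters next to the cost of zero-filling the large pages themselves would need to be measured.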

Either way is fine for me.

Thanks,
  Thomas