RFR(S): 8215893: Add better abstraction for pinning G1 concurrent marking bitmaps.

Stefan Johansson stefan.johansson at oracle.com
Fri Oct 18 14:31:48 UTC 2019


Hi Kishor,

> 17 okt. 2019 kl. 23:28 skrev Kharbas, Kishor <kishor.kharbas at intel.com>:
> 
> Hi Stefan,
> 
>> -----Original Message-----
>> From: Stefan Johansson [mailto:stefan.johansson at oracle.com]
>> Sent: Thursday, October 17, 2019 4:34 AM
>> To: Kharbas, Kishor <kishor.kharbas at intel.com>; sangheon.kim at oracle.com
>> Cc: hotspot-gc-dev at openjdk.java.net
>> Subject: Re: RFR(S): 8215893: Add better abstraction for pinning G1
>> concurrent marking bitmaps.
>> 
>> Hi Kishor,
>> 
>> On 2019-10-17 03:39, Kharbas, Kishor wrote:
>>> Hi Sangheon,
>>> 
>>> From: sangheon.kim at oracle.com [mailto:sangheon.kim at oracle.com]
>>> Sent: Wednesday, October 16, 2019 11:03 AM
>>> To: Kharbas, Kishor <kishor.kharbas at intel.com>
>>> Cc: hotspot-gc-dev at openjdk.java.net; Stefan Johansson
>>> <stefan.johansson at oracle.com>
>>> Subject: Re: RFR(S): 8215893: Add better abstraction for pinning G1
>>> concurrent marking bitmaps.
>>> 
>>>> Hi Kishor,
>>>> 
>>>> Before reviewing webrev.02, could you remind us what the motivation
>>>> for pinning the bitmap mappers here was?
>>>> In addition to an explanation of the problematic situation, any logs or
>>>> stack traces may also help.
>>>> 
>>>> We think that understanding the root cause should come first.
>>> 
>>> Unfortunately, I do not have a log or stack trace of the problem I had faced.
>>> 
>>> I am trying to reproduce it by running the SPECjbb workload over and over
>>> again.
>>> 
>>> I haven't looked at the GC code since the end of last year, so I am having a
>>> difficult time pinning down what the problem was.
>>> 
>>> I am looking at G1ClearBitMapTask, which iterates over the bitmap for all
>>> available regions. I am not sure when this task is performed.
>>> 
>>> There is a comment in HeapRegionManager::par_iterate() as shown below:
>>> 
>>> // This also (potentially) iterates over regions newly allocated during GC. This
>>> // is no problem except for some extra work.
>>> 
>>> This method is eventually called from G1ClearBitMapTask. The comment
>>> suggests that regions are allocated concurrently while the function is
>>> running. This also means that with the AllocateOldGenAt flag enabled,
>>> regions can also be un-committed concurrently.
>> 
>> I don't understand how AllocateOldGenAt would make any difference;
>> regions can be un-committed without it as well, and there are mechanisms in
>> place to make sure only the correct parts of the side structures are
>> un-committed when that happens.
> 
> In the regular code, un-commit is only done by the VM thread during a safepoint. Un-committing a region also causes its corresponding bitmap to be un-committed.
> But it never happens that CM threads are iterating over the bitmap while regions are being un-committed concurrently.
> 
> Whereas when AllocateOldGenAt is used, because of the way regions are managed between
> dram and nvdimm, regions can be un-committed by mutator threads and GC threads.
> 1. Mutator threads - during mutator region allocation and humongous region allocation.

This is the problem. I managed to reproduce it by adding a short sleep in the clearing code and forcing back-to-back concurrent cycles in SPECjvm2008 with a 2g heap. I think this is only a problem for humongous allocations, because we should never allocate more young regions than we have already made available at the end of the previous GC. But humongous allocations can very well happen while we clear the bitmaps in the concurrent cycle, so that is probably why the pinning was added.
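
To illustrate the interleaving, here is a small stand-alone C++ sketch (not HotSpot code, all names are placeholders): one thread walks the regions and clears their part of the bitmap while another thread un-commits a region mid-walk. In the VM the un-commit unmaps the bitmap's backing store, so the clearing thread would touch unmapped memory; the sketch just counts the overlap instead of crashing.

#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>
#include <vector>

int main() {
  constexpr int kRegions = 64;
  // One flag per region: true = committed (bitmap backing store is mapped).
  std::vector<std::atomic<bool>> committed(kRegions);
  for (auto& c : committed) c.store(true);

  std::atomic<int> overlaps{0};

  // "Concurrent mark" thread: clears the bitmap part of every region.
  std::thread clearer([&] {
    for (int i = 0; i < kRegions; i++) {
      if (!committed[i].load()) {
        overlaps.fetch_add(1);  // in the VM: an access to an unmapped bitmap
      }
      std::this_thread::sleep_for(std::chrono::milliseconds(1));  // the "short sleep"
    }
  });

  // "Mutator" thread: a humongous allocation that un-commits a DRAM region
  // (and its bitmap) while the clearing is still in progress.
  std::thread mutator([&] {
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    committed[40].store(false);
  });

  clearer.join();
  mutator.join();
  std::printf("regions cleared after being un-committed: %d\n", overlaps.load());
  return 0;
}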

Thinking more about this, a different solution would be to not un-commit memory in this situation. This all depends on how one sees the amount of committed memory when using AllocateOldGenAt: should the amount committed on dram + nvdimm never be more than Xmx, or is the important thing that the number of regions in use never exceeds Xmx? I think I’m leaning towards the latter, but there might be reasons I haven’t thought about here. This would break the current invariant:
assert(total_committed_before == total_regions_committed(), "invariant not met");

But that might be ok. With that approach, instead of un-committing (shrink_dram), we would just remove the same number of regions from the free list as we expand on nvdimm. The removed, unused regions need to be kept track of so we can add them back during the GC. To me this is more or less the same concept we use when borrowing regions during the GC. There might be issues with this approach, but I think it would be interesting to explore.
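
To make that a bit more concrete, here is a rough stand-alone sketch of the bookkeeping I have in mind. The type and method names are placeholders, not the real region manager API; the point is only that nothing gets unmapped outside a pause, so the concurrent bitmap clearing can never race with an un-commit.

#include <deque>

// Stand-in for HeapRegion*; everything here is a placeholder, not VM code.
struct Region { int index; };

struct RegionManagerSketch {
  std::deque<Region> dram_free_list;  // committed but currently unused DRAM regions
  std::deque<Region> borrowed;        // DRAM regions hidden from the free list while
                                      // the same number of regions is in use on nvdimm

  // Instead of shrink_dram(num) when expanding on nvdimm: keep the DRAM regions
  // committed and only take them out of circulation, so the number of usable
  // regions never exceeds the Xmx worth of regions, even though the total
  // committed memory (dram + nvdimm) temporarily does.
  void borrow_dram_instead_of_uncommit(int num) {
    for (int i = 0; i < num && !dram_free_list.empty(); i++) {
      borrowed.push_back(dram_free_list.front());
      dram_free_list.pop_front();
    }
    // ... the nvdimm expansion of 'num' regions would happen here ...
  }

  // At the next GC pause (where un-commit is safe in regular G1 as well) the
  // borrowed regions are handed back, or actually un-committed if we shrink.
  void return_borrowed_at_gc() {
    while (!borrowed.empty()) {
      dram_free_list.push_back(borrowed.front());
      borrowed.pop_front();
    }
  }
};

int main() {
  RegionManagerSketch mgr;
  for (int i = 0; i < 8; i++) mgr.dram_free_list.push_back(Region{i});
  mgr.borrow_dram_instead_of_uncommit(3);  // "expand" 3 regions on nvdimm
  mgr.return_borrowed_at_gc();             // next GC pause: regions come back
  return 0;
}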

I also wonder if we should ever need to expand_dram during allocate_new_region. I see that it happens now during GC, and that is probably because we do this at the end of the GC:
_manager->adjust_dram_regions((uint)young_list_target_length() …

If this adjustment included the expected number of survivors as well, we should have enough DRAM regions, and if we then end up getting an NVDIMM region when asking for a survivor we should return NULL, signaling that the survivor space is full.
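
Roughly, and again only as a sketch with made-up names rather than the real allocation path:

// Placeholders only; the real logic lives in the G1 allocator / region manager.
enum class Tier { DRAM, NVDIMM };
struct RegionSketch { Tier tier; };

// If adjust_dram_regions() at the end of the previous GC is given the young
// target length *plus* the expected number of survivor regions (that survivor
// prediction term is made up here), a survivor allocation during GC should
// never need expand_dram. Getting an nvdimm region back then simply means
// survivor is full, so return NULL instead of expanding.
RegionSketch* allocate_survivor_region_sketch(RegionSketch* candidate) {
  if (candidate == nullptr || candidate->tier != Tier::DRAM) {
    return nullptr;  // signal "survivor space full" rather than expanding DRAM
  }
  return candidate;
}

int main() {
  RegionSketch dram{Tier::DRAM};
  RegionSketch nv{Tier::NVDIMM};
  // A DRAM candidate is accepted; an nvdimm candidate makes the caller treat
  // survivor as full (those objects would then be promoted instead).
  return (allocate_survivor_region_sketch(&dram) != nullptr &&
          allocate_survivor_region_sketch(&nv) == nullptr) ? 0 : 1;
}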

What do you think about that approach?

Thanks,
Stefan

> 2. GC worker threads - during survivor region and old region allocation.
> 3. VMThread - heap size adjustment as in default G1, and after a full GC to allocate enough regions in dram for the young gen (this may require un-committing some regions from nvdimm).
> 
> Could any of these be running concurrently when CM threads are iterating over the bitmap?
> 
>> 
>> I want to reiterate what Sangheon said about identifying the root cause.
>> If we don't know why this is needed and can't reproduce any failures without
>> the special pinning of the bitmaps, I would rather see that we remove the
>> pinning code to make things work more like normal G1.
> 
> I am trying to reproduce it, but as you can imagine it is a very rare and hard-to-reproduce bug, if it is one.
> 
> Thanks,
> Kishor
>> 
>> Thanks,
>> Stefan
>> 
>> 
>>> 
>>> Pardon me if my understanding is incorrect.
>>> 
>>> Regards,
>>> 
>>> Kishor



