RFR: 8232588: G1 concurrent System.gc can return early or late

Thu Nov 7 21:02:37 UTC 2019

> On Nov 7, 2019, at 4:58 AM, Thomas Schatzl <thomas.schatzl at oracle.com> wrote:
> On 07.11.19 01:14, Kim Barrett wrote:
>>> On Nov 6, 2019, at 10:24 AM, Thomas Schatzl <thomas.schatzl at oracle.com> wrote:
>>> 
>>> On 31.10.19 21:53, Kim Barrett wrote:
> 
>>> - Not sure, but maybe this logging should be moved to trace level? In either case I would suggest to improve it to cover the try_collect call too, i.e. with appropriate "attempt"/"complete"/"discard" messages.
>> Agreed on trace vs debug level.  There are already “completion" messages.  Maybe
>> what you are asking for is more detail on those?
> 
> One case I am missing is a "discard" message in the case described in g1CollectedHeap.cpp:2266.

That's one of the parts that I'm minimally touching as out of scope
for this change.  New RFE:
https://bugs.openjdk.java.net/browse/JDK-8233821

>>> Otherwise the logging looks a bit incomplete to me, i.e. only attempts to initiate a concurrent collection cauuse messages. I understand that this might generate too many messages for gc+debug, that is the reason to potentially move to gc+trace.
>> I'm intentionally only minimally touching the other cases in
>> try_collect(). Or am I misunderstanding your comment? (There are
>> logging messages for completion, waiting, and retries, in addition to
>> the attempts to initiate messages.)  Or maybe this is more about
>> wanting additional detail in “completion” messages?
> 
> In this case I was looking at the log messages only (on gc+debug/trace level), sorry. I found the other messages at gc+alloc=trace in the code.
> 
> It would be nice to think about consolidating these messages a bit (in a separate CR) as they are somewhat different in style now (and use different log tags).

Can I leave it to you to file any needed RFEs for this?  I think you
have a clearer idea of what you want.

> While looking through the code, I found another minor issue I think: to schedule a standard evacuation pause (in line 2270+, wouldn't it be better to call do_collection_pause() instead of executing the VMOps directly?
> 
> The existing code there seems to miss the check whether the VMOps gc_prologue succeeded.

I don't understand the prologue_suceeded part of do_collection_pause.
If the prologue fails then the doit won't even be run so won't update
succeeded from it's initial false value.

The code at line 2270 would be very slightly simpler if using
do_collection_pause rather than directly using the VMOp. Though I
wonder if it might be better to have another VMOp that doesn't have a
word-size and doesn't conditionally do an allocation.

>>> - feel free to ignore: in such log messages, if I add thread id for debugging purposes I tend to put the thread id value first so that if you have a log in front of you, and ask your viewer to highlight a particular id, all highlights are in the same column.
>>> 
>>> I.e.
>>> 
>>> .... stuff from unified logging] <thread-id> Message
>>> 
>>> instead of having the thread-id somewhere located in the message like in
>>> this case where it is somewhere in the middle.
>>> 
>>> (I kind of also prefer the raw Thread::current() value instead of the pretty name, fwiw)
>> I don't care that much how the thread is identified; I used the name
>> as something human readable in a log file. The address of the Thread
>> doesn't seem that useful though. (Or are you saying the logging system
>> provides something for threads that I'm not aware of? Wouldn't be the
>> first time I've overlooked some feature of logging.) I'll move it to
>> the front of the log message though.
> 
> Nah, it's fine to use the name, just some (obvious) rambling thought.
> 
> And you're right with your suspicion, there is a decorattor for the thread id…

Yes, I found it. But it seems like log decorators are hard to use in a
local fashion. At least, I didn't figure out a way to do that.

>> But I wonder if there might be a pre-existing bug here. The call to
>> should_upgrade was introduced as part of JDK-8211425. This changed the
>> previous check to add policy()->force_upgrade_to_full(), which has a
>> default false that is overridden by G1HeterogeneousHeapPolicy to
>> return true for _manager->has_borrowed_regions(). From reading the RFR
>> thread discussion, the point is that if we were to proceed without the
>> full GC here we could apparently have a heap size that is greater than
>> MaxHeapSize.
>> But I don't see why we couldn't be in that situation and still be able
>> to successfully perform the requested allocation and return.
>> So I wonder if the should_upgrade should be called unconditionally,
>> and then attempt the requested allocation (if any).
> 
> I agree here. The should_upgrade_to_full() check should be first.

https://bugs.openjdk.java.net/browse/JDK-8233822

>> New webrevs:
>> full: https://cr.openjdk.java.net/~kbarrett/8232588/open.01/
>> incr: https://cr.openjdk.java.net/~kbarrett/8232588/open.01.inc/
> 
> Looks good (obviously missing the items discussed here).

Thanks.