RFR: JDK-8133706: Kitchensink hanged
Bengt Rutisson
bengt.rutisson at oracle.com
Mon Sep 28 10:39:25 UTC 2015
Thanks, Per!
Bengt
On 2015-09-28 12:44, Per Liden wrote:
> Hi Bengt,
>
> On 2015-09-28 10:33, Bengt Rutisson wrote:
>>
>> Hi Jon,
>>
>> Thanks for looking at this!
>>
>> Sorry for not getting back to you earlier. I got busy with other issues
>> and forgot about this review.
>>
>> On 2015-09-21 20:08, Jon Masamitsu wrote:
>>> Bengt,
>>>
>>> It's correct that if the else is taken
>>>
>>> 178 } else {
>>> 179 // We don't want to update the marking status if a GC
>>> pause
>>> 180 // is already underway.
>>> 181 SuspendibleThreadSetJoiner sts_join;
>>> 182 g1h->collector_state()->set_mark_in_progress(false);
>>> 183 }
>>>
>>> then free_regions_coming() will always return false?
>>> If yes, then the change looks good.
>>
>> Yes, this is correct. The flag for free_regions_coming is only set to
>> true by the cleanup pause. And that is what is executed in the positive
>> branch of the if-statement that you are looking at. So, in the
>> else-branch free_regions_coming() will always return false. Not very
>> nice and kind of fragile. But that's why it works.
>>
>>>
>>> Also if you choose to move the block
>>> of code (lines 185-214)
>>>
>>> 185 // Check if cleanup set the free_regions_coming flag. If it
>>> 186 // hasn't, we can just skip the next step.
>>> 187 if (g1h->free_regions_coming()) {
>>> ...
>>>
>>> 214 }
>>>
>>> after
>>>
>>> 177 VMThread::execute(&op);
>>>
>>> I'll gladly code review the move of that code (but it is not
>>> necessary).
>>
>> I agree with that code refactoring, but I'd prefer not to do it as part
>> of this change.
>>
>> I discussed my proposed change with Per and we came to the conclusion
>> that it will be confusing to have the "start concurrent" phase logging
>> at the end of the stopped cleanup phase.
>>
>> It is instead better to leave it where it was before and just accept
>> that the logging can be interleaved with the young GC logging. This is
>> also a much smaller change. Just changing the join_sts parameter for the
>> call to cm_log() to false.
>>
>> Here's an updated webrev:
>> http://cr.openjdk.java.net/~brutisso/8133706/webrev.01/
>
> Looks good.
>
> /Per
>
>>
>> With the upcoming logging changes for adapting the GC logging to the
>> unified logging framework this will be a bit cleaner. Then the
>> concurrent logging will be logged on separate lines.
>>
>> Thanks,
>> Bengt
>>
>>
>>>
>>> Jon
>>>
>>> On 09/19/2015 04:19 AM, Bengt Rutisson wrote:
>>>>
>>>> Hi everyone,
>>>>
>>>> Could I have a couple of reviews for this change?
>>>>
>>>> http://cr.openjdk.java.net/~brutisso/8133706/webrev.00/
>>>> https://bugs.openjdk.java.net/browse/JDK-8133706
>>>>
>>>> The bug report contains some good analysis from Eric Caspole, David
>>>> Holmes and Yumin Qi.
>>>>
>>>> Thanks for this valuable investigation! Basically the problem is that
>>>> the heap regions that can be reclaimed by the cleanup phase are not
>>>> made available until they have been cleaned up in a concurrent phase.
>>>> If a GC happens while we are doing the concurrent cleaning up of the
>>>> free regions, the GC will wait for the concurrent cleaning to finish,
>>>> either by calling new_region_try_secondary_free_list() or
>>>> wait_while_free_regions_coming(). But since the logging before we
>>>> start the concurrent cleaning is now joining the STS, that cleaning
>>>> is getting blocked by the GC safepoint. So, we have a deadlock.
>>>>
>>>> This was actually documented in the ConcurrentMarkThread::run()
>>>> method, but I missed that when I added the cm_log() calls.
>>>>
>>>> // Notify anyone who's waiting that there are no more free
>>>> // regions coming. We have to do this before we join the STS
>>>> // (in fact, we should not attempt to join the STS in the
>>>> // interval between finishing the cleanup pause and clearing
>>>> // the free_regions_coming flag) otherwise we might deadlock:
>>>> // a GC worker could be blocked waiting for the notification
>>>> // whereas this thread will be blocked for the pause to finish
>>>> // while it's trying to join the STS, which is conditional on
>>>> // the GC workers finishing.
>>>>
>>>>
>>>> The simplest fix that I could come up with was to move the logging
>>>> from the ConcurrentMarkThread::run() method in to the very end of
>>>> the stopped part of the cleanup phase. This ensures that we don’t mix
>>>> the log output with any logging that a GC does but id does not
>>>> require joining the STS since we are already at a safepoint.
>>>>
>>>> I left the timing (logged as part of the “GC concurrent-cleanup-end”
>>>> entry) unchanged. This means that there could be a slight mismatch
>>>> between the timestamps for “concurrent-cleanup-start” and
>>>> “concurrent-cleanup-end” and the time logged by
>>>> “concurrent-cleanup-end”. I hope the simplicity of the change
>>>> outweighs the disadvantage of this mismatch.
>>>>
>>>> Thanks,
>>>> Bengt
>>>
>>
More information about the hotspot-gc-dev
mailing list