RFR: 8225702(13): Safepoint counter can't be used for safepoint detection

Tue Jun 25 19:08:49 UTC 2019

On 6/25/19 2:49 PM, Robbin Ehn wrote:
> Hi David, (and Dan)
>
>>> But since this is JDK 13 bug I did the simplest thing.
>>
>> Yes this is best. To summarise what you and I discussed:
>> - older usage of safepoint_counter only does "trivial" actions 
>> between reading the counts
>> - the new usage allows more elaborate code between the reading of the 
>> counts and so hits the current problem
>> - the new global safepoint-id will fix that code and not disrupt the 
>> other safepoint-counter users.
>>
>> Okay I'm fine with changes and your explanations.
>
> Thanks!
>
> David and Dan, here is v2 update from Dan's comments.
> Full: http://cr.openjdk.java.net/~rehn/8225702/v2/full/
> Inc : http://cr.openjdk.java.net/~rehn/8225702/v2/inc/

src/hotspot/share/runtime/safepoint.hpp
     No comments.

src/hotspot/share/runtime/safepoint.cpp
     No comments.

Thumbs up!

Dan

>
> Passed t1-3 on jdk13.
>
> Thanks, Robbin
>
>>
>> Thanks,
>> David
>> -----
>>
>>>>
>>>>> Also it makes no sense for a normal user that JFR/JMC safepoint id 
>>>>> are only odd,
>>>>> we instead use this new id which is just incremented for each 
>>>>> safepoint.
>>>>>
>>>>> Code:
>>>>> http://cr.openjdk.java.net/~rehn/8225702/v1/webrev/index.html
>>>>
>>>> src/hotspot/share/runtime/safepoint.hpp
>>>>
>>>> Why did you make thread_not_running an instance method?
>>>
>>> Since it reads the _safepoint_counter now that I made it private it 
>>> needed access.
>>>
>>>>
>>>> src/hotspot/share/runtime/safepoint.cpp
>>>>
>>>>   418   // Record state
>>>>   419   _state = _synchronized;
>>>>   420
>>>>   421   OrderAccess::fence();
>>>>   422
>>>>   423   // Set the new id
>>>>   424   ++_safepoint_id;
>>>>
>>>> Under the assumption that the threads/code that uses the new 
>>>> SafepointStateTracker do not participate in the safepoint protocol, 
>>>> is there a risk of seeing a stale _safepoint_id, or racing with the 
>>>> increment? It's not clear to me under what conditions the 
>>>> SafepointStateTracker::safepoint_state_changed() calls will be made.
>>>
>>> We write as:
>>> state = synch
>>> id++
>>> state = unsynch  // read state here
>>> state = synch
>>> id++             // read id here
>>> state = unsynch  // re-read state here
>>>                   // re-read id here
>>>
>>> This would give a false negative, so it would not work for a JT in
>>> native/blocked or a NonJavaThread. With doing a 'stable load' thus 
>>> reading id
>>> twice. So we should assert this.
>>>
>>> Encoding state and counter in the same primitive which can be 
>>> atomically
>>> updated, so we never see the same value in a long enough period as 
>>> you suggested
>>> fixes this. I believe doing the same for JavaThread state can also 
>>> simplify
>>> somethings.
>>>
>>> But I don't think we should do that as a bug fix for 13.
>>> And I was going to use the safepoint id in logging as gc does.
>>> So even if we don't use _safepoint_id to track safepoint state in 
>>> later JDK
>>> we have pretty numbers in logs and JFR.
>>>
>>> Thanks, Robbin
>>>
>>>>
>>>> Thanks,
>>>> David
>>>> -----
>>>>
>>>>> Issue:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8225702
>>>>>
>>>>> Passes JFR test and t1-3.
>>>>>
>>>>> Thanks, Robbin