RFR (XS) 8174734: Safepoint sync time did not increase

David Holmes david.holmes at oracle.com
Wed Feb 7 23:13:50 UTC 2018


Hi Coleen,

Okay, I will investigate this further on Linux and OS X, as I think there 
is an underlying problem. I'm very dubious that 300 safepoint-inducing 
getAllStackTraces() calls can occur in less than 1 ms. Theoretically 
possible, yes, but it seems unlikely based on my measurements. Can you 
ping me directly with details of the machines you used to repro this, 
please? Thanks.

As far as the test goes, I'm not sure where you left things, but 0 ms 
elapsed is okay, while 0 safepoints occurring is not (the count has to 
be >= 300!).
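
For illustration, the pass condition described above could be sketched 
roughly as follows. This is only a sketch under assumptions: the class 
and method names are hypothetical, and the four arguments stand in for 
before/after snapshots of the safepoint count and sync time that the 
test would read from the MBean.

```java
class SafepointStatsCheck {
    // Number of thread dumps performed; each one is expected to induce a safepoint.
    static final int NUM_THREAD_DUMPS = 300;

    /**
     * Returns null if the statistics look reasonable, otherwise a failure message.
     * All arguments are hypothetical snapshots taken before and after the dumps.
     */
    static String check(long countBefore, long countAfter,
                        long syncMillisBefore, long syncMillisAfter) {
        long safepoints = countAfter - countBefore;
        long elapsedMillis = syncMillisAfter - syncMillisBefore;
        if (safepoints < NUM_THREAD_DUMPS) {
            return "expected at least " + NUM_THREAD_DUMPS
                    + " safepoints, saw " + safepoints;
        }
        if (elapsedMillis < 0) {
            return "sync time went backwards by " + (-elapsedMillis) + " ms";
        }
        return null; // 0 ms elapsed is acceptable at millisecond resolution
    }

    public static void main(String[] args) {
        System.out.println(check(10, 310, 5, 5)); // 300 safepoints, 0 ms elapsed
        System.out.println(check(10, 10, 5, 5));  // no safepoints seen
    }
}
```

With a check of this shape, a zero elapsed time passes, but a broken 
safepoint counter would still be caught.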

Thanks,
David

On 7/02/2018 11:28 PM, coleen.phillimore at oracle.com wrote:
> 
> 
> On 2/7/18 4:56 AM, David Holmes wrote:
>> Hi Coleen,
>>
>> I've just updated the bug report with a patch for you to test, if you 
>> are able to (I don't have any access to a Mac, unfortunately :( ). It's 
>> possible the underlying problem on OS X is an intermediate overflow in 
>> calculating the elapsed time via (a*b)/c.
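
To make the (a*b)/c concern concrete, here is a minimal Java sketch of 
how such an intermediate overflow can arise and one common way to avoid 
it. The message above does not say what a, b and c are; treating them as 
a tick count, a nanoseconds-per-second scale and a tick frequency is an 
assumption, and all names are hypothetical. It is not the HotSpot code 
or the patch attached to the bug.

```java
class ElapsedTimeOverflow {
    static final long NANOS_PER_SEC = 1_000_000_000L;

    // Naive (a*b)/c: the product can exceed Long.MAX_VALUE before the divide
    // and silently wraps around.
    static long naive(long ticks, long ticksPerSec) {
        return (ticks * NANOS_PER_SEC) / ticksPerSec;
    }

    // Split into whole seconds plus a remainder so the intermediate product
    // stays small. Assumes ticksPerSec is below roughly 9.2e9.
    static long split(long ticks, long ticksPerSec) {
        long secs = ticks / ticksPerSec;
        long rem = ticks % ticksPerSec;
        return secs * NANOS_PER_SEC + (rem * NANOS_PER_SEC) / ticksPerSec;
    }

    public static void main(String[] args) {
        long ticksPerSec = 1_000_000_000L; // hypothetical 1 GHz tick source
        long ticks = 10_000_000_000L;      // ten seconds worth of ticks
        System.out.println("naive: " + naive(ticks, ticksPerSec)); // wrapped, negative
        System.out.println("split: " + split(ticks, ticksPerSec)); // 10 s in nanoseconds
    }
}
```

For small inputs both methods agree; they diverge only once the product 
no longer fits in 64 bits.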
> 
> I doubt this is the problem, since I can reproduce the failure on 
> Linux. Maybe this is a different problem and you should file a separate 
> bug for it.
> 
> Coleen
>>
>> Thanks,
>> David
>>
>> On 7/02/2018 9:29 AM, coleen.phillimore at oracle.com wrote:
>>>
>>>
>>> On 2/6/18 4:06 PM, coleen.phillimore at oracle.com wrote:
>>>>
>>>>
>>>> On 2/6/18 12:13 AM, David Holmes wrote:
>>>>> Hi Coleen,
>>>>>
>>>>> On 6/02/2018 7:37 AM, coleen.phillimore at oracle.com wrote:
>>>>>> Summary: allow safepoint time to be zero in the test
>>>>>>
>>>>>> See bug for more details.
>>>>>>
>>>>>> open webrev at http://cr.openjdk.java.net/~coleenp/8174734.01/webrev
>>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8174734
>>>>>
>>>>> I guess I'm still surprised that 300 thread dumps can take less 
>>>>> than a millisecond! There's always more than one thread running. I 
>>>>> did some basic benchmarking and dumpAllStacks() from main takes at 
>>>>> least 150us on the Linux box I tested on. I just can't see 300 
>>>>> dumps taking less than 1ms ... though I can see them taking < 10ms 
>>>>> if we're measuring time using a coarse clock - where do these times 
>>>>> come from?
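
As a rough worked example of the arithmetic in the paragraph above: 300 
dumps at 150us each come to 45 ms, whereas a total below one millisecond 
is reported as 0 once truncated to whole milliseconds. The sketch below 
only illustrates that truncation. The 150us figure is the benchmark 
number quoted above; the 3us per-event cost is a made-up value, and the 
class and method names are hypothetical. Note that the time for a whole 
dump is not the same thing as the safepoint sync time the test reads.

```java
import java.util.concurrent.TimeUnit;

class CoarseClockDemo {
    // Total time for `events` events costing `perEventMicros` each,
    // reported in whole milliseconds (fractions are truncated).
    static long totalMillis(long perEventMicros, int events) {
        long totalMicros = perEventMicros * events;
        return TimeUnit.MICROSECONDS.toMillis(totalMicros);
    }

    public static void main(String[] args) {
        System.out.println(totalMillis(150, 300)); // 45000 us -> 45
        System.out.println(totalMillis(3, 300));   // 900 us -> 0
    }
}
```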
>>>>>
>>>>
>>>> I think the thread dump only covers the actual JavaThreads that are 
>>>> not "hidden_from_view". There are lots of threads, but they were all 
>>>> GC and compiler threads when I ran this test.
>>>>
>>>>> That aside, this change seems unnecessary:
>>>>>
>>>>>       // Careful with these values.
>>>>> !     private static final long MIN_VALUE_FOR_PASS = 0;
>>>>>       private static final long MAX_VALUE_FOR_PASS = Long.MAX_VALUE;
>>>>
>>>> This was another one of the failure modes, so we need this change 
>>>> to make this test more reliable.
>>>>>
>>>>> This is for the minimum number of safepoints that need to be seen, 
>>>>> which I think should still be 1. By allowing 0 here (and for the 
>>>>> elapsed time), the test could actually fail to do anything related 
>>>>> to safepoints and still pass - and that seems wrong. Or the 
>>>>> safepoint stat code could be completely broken and we'd never 
>>>>> notice. Basically the test just wants to check that we get 
>>>>> reasonable-looking statistics from the MBean.
>>>>>
>>>>> Maybe we need to be measuring the time at a higher resolution than 
>>>>> milliseconds - though that would be a non-trivial RFE, I expect?
>>>>>
>>>>
>>>> So, looking at and debugging the runtimeService.cpp code, it appears 
>>>> to be doing the thing that it's supposed to be doing. I agree that 
>>>> it's not a particularly useful test when changing the times to zero, 
>>>> although I traced through and it does exercise the code, and logging 
>>>> makes it non-zero.
>>>>
>>>> What you're suggesting would be a lot more work. I guess my goal 
>>>> was to get the test off ProblemList.txt, but if you'd prefer 
>>>> doing more work, I'll reassign it and withdraw this RFR. Honestly, 
>>>> I thought getting it running without failures was more worthwhile 
>>>> than writing a new test for this feature.
>>>
>>> Just rereading this.  It might be more useful to add the check that 
>>> the safepoint count is non-zero.
>>>
>>> thanks,
>>> Coleen
>>>>
>>>> thanks,
>>>> Coleen
>>>>
>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>>> Thanks,
>>>>>> Coleen
>>>>
>>>
> 


More information about the hotspot-runtime-dev mailing list