RFR (XS) 8174734: Safepoint sync time did not increase

Wed Feb 7 09:56:04 UTC 2018

Hi Coleen,

I've just updated the bug report with a patch to test if you are able to 
(I don't have any access to a mac unfortunately :( ). It's possible the 
underlying problem on OS X is an intermediate overflow in calculating 
the elapsed time via (a*b)/c

Thanks,
David

On 7/02/2018 9:29 AM, coleen.phillimore at oracle.com wrote:
> 
> 
> On 2/6/18 4:06 PM, coleen.phillimore at oracle.com wrote:
>>
>>
>> On 2/6/18 12:13 AM, David Holmes wrote:
>>> Hi Coleen,
>>>
>>> On 6/02/2018 7:37 AM, coleen.phillimore at oracle.com wrote:
>>>> Summary: allow safepoint time to be zero in the test
>>>>
>>>> See bug for more details.
>>>>
>>>> open webrev at http://cr.openjdk.java.net/~coleenp/8174734.01/webrev
>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8174734
>>>
>>> I guess I'm still surprised that 300 thread dumps can take less than 
>>> a millisecond! There's always more than one thread running. I did 
>>> some basic benchmarking and dumpAllStacks() from main takes at least 
>>> 150us on the Linux box I tested on. I just can't see 300 dumps taking 
>>> less than 1ms ... though I can see them taking < 10ms if we're 
>>> measuring time using a coarse clock - where do these times come from?
>>>
>>
>> I think the thread dumps only the actual JavaThread which is not 
>> "hidden_from_view".  There are lots of threads but they're all GC and 
>> compiler threads when I ran this test.
>>
>>> That aside this change seem unnecessary:
>>>
>>>       // Careful with these values.
>>> !     private static final long MIN_VALUE_FOR_PASS = 0;
>>>       private static final long MAX_VALUE_FOR_PASS = Long.MAX_VALUE;
>>
>> This was another one of the failures modes, so we need this change to 
>> make this test more reliable.
>>>
>>> this is for the minimum number of safepoints that need to be seen, 
>>> which I think should still be 1. By allowing 0 here (and for the 
>>> elapsed time), the test could actually fail to do anything related to 
>>> safepoints and still pass - and that seems wrong. Or the safepoint 
>>> stat code could be completely broken and we'd never notice. Basically 
>>> the test just wants to check that we get reasonable looking 
>>> statistics from the MBean
>>>
>>> Maybe we need to be measuring the time at a higher resolution than 
>>> milliseconds - though that would be a non-trivial RFE I expect. ?
>>>
>>
>> So, looking at and debugging the runtimeService.cpp code, it appears 
>> to be doing the thing that it's supposed to be doing.  I agree that 
>> it's not a particularly useful test when changing the times to zero, 
>> although I traced through and it does exercise the code, and logging 
>> makes it non-zero.
>>
>> What you're suggesting would be a lot more work.  I guess my work was 
>> to get the test off the ProblemList.txt but if you'd prefer doing more 
>> work, I'll reassign it and withdraw this RFR.  I thought getting it 
>> running without failure is more worth doing than writing a new test 
>> for this feature honestly.
> 
> Just rereading this.  It might be more useful to add the check that the 
> safepoint count is non-zero.
> 
> thanks,
> Coleen
>>
>> thanks,
>> Coleen
>>
>>
>>> Thanks,
>>> David
>>>
>>>> Thanks,
>>>> Coleen
>>
>