RFR (XS) 8174734: Safepoint sync time did not increase

Wed Feb 7 23:24:11 UTC 2018

Okay, I've reassigned this to you.
thanks,
Coleen

On 2/7/18 6:13 PM, David Holmes wrote:
> Hi Coleen,
>
> Okay, I will investigate this further on Linux and OSX as I think there 
> is an underlying problem. I'm very dubious that 300 safepoint-inducing 
> getAllStackTraces() calls can occur in less than 1 ms. Theoretically 
> possible, yes, but it seems unlikely based on my measurements. Can you 
> ping me directly with details of the machines you used to repro this 
> please - thanks.
>
> As far as the test goes I'm not sure where you left things, but 0ms 
> elapsed is okay, while 0 safepoints occurred is not (it has to be 
> >=300!).
>
> Thanks,
> David
>
> On 7/02/2018 11:28 PM, coleen.phillimore at oracle.com wrote:
>>
>>
>> On 2/7/18 4:56 AM, David Holmes wrote:
>>> Hi Coleen,
>>>
>>> I've just updated the bug report with a patch for you to test, if 
>>> you are able to (I don't have any access to a mac unfortunately :( ). 
>>> It's possible the underlying problem on OS X is an intermediate 
>>> overflow in calculating the elapsed time via (a*b)/c.
>>
>> I doubt this is the problem since I can reproduce this problem on 
>> Linux. Maybe this is a different problem and you should file a bug 
>> for it.
>>
>> Coleen
>>>
>>> Thanks,
>>> David
>>>
>>> On 7/02/2018 9:29 AM, coleen.phillimore at oracle.com wrote:
>>>>
>>>>
>>>> On 2/6/18 4:06 PM, coleen.phillimore at oracle.com wrote:
>>>>>
>>>>>
>>>>> On 2/6/18 12:13 AM, David Holmes wrote:
>>>>>> Hi Coleen,
>>>>>>
>>>>>> On 6/02/2018 7:37 AM, coleen.phillimore at oracle.com wrote:
>>>>>>> Summary: allow safepoint time to be zero in the test
>>>>>>>
>>>>>>> See bug for more details.
>>>>>>>
>>>>>>> open webrev at 
>>>>>>> http://cr.openjdk.java.net/~coleenp/8174734.01/webrev
>>>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8174734
>>>>>>
>>>>>> I guess I'm still surprised that 300 thread dumps can take less 
>>>>>> than a millisecond! There's always more than one thread running. 
>>>>>> I did some basic benchmarking and dumpAllStacks() from main takes 
>>>>>> at least 150us on the Linux box I tested on. I just can't see 300 
>>>>>> dumps taking less than 1ms ... though I can see them taking < 
>>>>>> 10ms if we're measuring time using a coarse clock - where do 
>>>>>> these times come from?
>>>>>>
>>>>>
>>>>> I think the thread dump only covers actual JavaThreads that are 
>>>>> not "hidden_from_view".  There are lots of threads but they're all 
>>>>> GC and compiler threads when I ran this test.
>>>>>
>>>>>> That aside, this change seems unnecessary:
>>>>>>
>>>>>>       // Careful with these values.
>>>>>> !     private static final long MIN_VALUE_FOR_PASS = 0;
>>>>>>       private static final long MAX_VALUE_FOR_PASS = Long.MAX_VALUE;
>>>>>
>>>>> This was another one of the failure modes, so we need this change 
>>>>> to make this test more reliable.
>>>>>>
>>>>>> This is for the minimum number of safepoints that need to be 
>>>>>> seen, which I think should still be 1. By allowing 0 here (and 
>>>>>> for the elapsed time), the test could actually fail to do 
>>>>>> anything related to safepoints and still pass - and that seems 
>>>>>> wrong. Or the safepoint stat code could be completely broken and 
>>>>>> we'd never notice. Basically the test just wants to check that we 
>>>>>> get reasonable looking statistics from the MBean.
>>>>>>
>>>>>> Maybe we need to be measuring the time at a higher resolution 
>>>>>> than milliseconds - though that would be a non-trivial RFE, I 
>>>>>> expect?
>>>>>>
>>>>>
>>>>> So, looking at and debugging the runtimeService.cpp code, it 
>>>>> appears to be doing the thing that it's supposed to be doing. I 
>>>>> agree that it's not a particularly useful test when changing the 
>>>>> times to zero, although I traced through and it does exercise the 
>>>>> code, and logging makes it non-zero.
>>>>>
>>>>> What you're suggesting would be a lot more work.  I guess my goal 
>>>>> was to get the test off the ProblemList.txt, but if you'd prefer 
>>>>> doing more work, I'll reassign it and withdraw this RFR.  Honestly, 
>>>>> I thought getting it running without failure was more worth doing 
>>>>> than writing a new test for this feature.
>>>>
>>>> Just rereading this.  It might be more useful to add the check that 
>>>> the safepoint count is non-zero.
>>>>
>>>> thanks,
>>>> Coleen
>>>>>
>>>>> thanks,
>>>>> Coleen
>>>>>
>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>>> Thanks,
>>>>>>> Coleen
>>>>>
>>>>
>>
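
For readers unfamiliar with the "intermediate overflow in (a*b)/c" hypothesis raised above, here is a minimal sketch of how such an overflow can arise when converting a tick count to another time unit. It is not the HotSpot elapsed-time code, and the thread does not confirm that this was the cause (Coleen doubted it, since the failure also reproduced on Linux). The class name ScaledTicks and the method names naive, split and exact are hypothetical, chosen only for illustration.

```java
import java.math.BigInteger;

// Hypothetical illustration only; not taken from the JDK sources.
final class ScaledTicks {

    // Naive (a*b)/c: the product a*b silently wraps if it exceeds Long.MAX_VALUE.
    static long naive(long ticks, long scale, long freq) {
        return (ticks * scale) / freq;
    }

    // Split form: divide first, then scale the quotient and the remainder
    // separately. For non-negative inputs this matches floor(ticks*scale/freq)
    // provided (freq - 1) * scale and the final result both fit in a long.
    static long split(long ticks, long scale, long freq) {
        long whole = ticks / freq;
        long rem = ticks % freq;
        return whole * scale + (rem * scale) / freq;
    }

    // Arbitrary-precision reference value, for comparison.
    static BigInteger exact(long ticks, long scale, long freq) {
        return BigInteger.valueOf(ticks)
                .multiply(BigInteger.valueOf(scale))
                .divide(BigInteger.valueOf(freq));
    }

    public static void main(String[] args) {
        long ticks = 10_000_000_000_000L; // assumed tick count (1e13)
        long scale = 1_000_000L;          // assumed target units per second
        long freq = 1_000_000_000L;       // assumed ticks per second
        System.out.println("naive = " + naive(ticks, scale, freq));
        System.out.println("split = " + split(ticks, scale, freq));
        System.out.println("exact = " + exact(ticks, scale, freq));
    }
}
```

With these assumed inputs the product ticks * scale is 1e19, which is larger than Long.MAX_VALUE, so the naive form wraps while the split form agrees with the BigInteger reference. Whether the real computation can reach values of that size depends on the clock frequency and uptime involved, which the thread does not state.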