RFR (S): 8076995: gc/ergonomics/TestDynamicNumberOfGCThreads.java failed with java.lang.RuntimeException: 'new_active_workers' missing from stdout/stderr

Bengt Rutisson bengt.rutisson at oracle.com
Fri Apr 24 07:57:33 UTC 2015


Hi Jon and Derek,

On 2015-04-23 19:13, Jon Masamitsu wrote:
>
>
> On 04/23/2015 12:46 AM, Bengt Rutisson wrote:
>> On 22/04/15 17:45, Jon Masamitsu wrote:
>>>
>>>
>>> On 4/21/2015 2:57 PM, bill pittore wrote:
>>>>
>>>>
>>>> On 4/21/2015 4:56 PM, Derek White wrote:
>>>>> Thanks, Jon!
>>>>>
>>>>> On 4/21/15 1:23 PM, Jon Masamitsu wrote:
>>>>>> Derek,
>>>>>>
>>>>>> Thanks for fixing this.
>>>>>>
>>>>>> Fix looks good.
>>>>>>
>>>>>> What do you think about always making testDynamicNumberOfGCThread()
>>>>>> check for the uniprocessor case (as opposed to passing in a flag 
>>>>>> to explicitly
>>>>>> check it)?
>>>>> This may not catch all of the failures. What I couldn't pin down 
>>>>> was why some 2, 3(!), or 4 core ARM machines would end up 
>>>>> defaulting to ParallelGCThreads=1. Now these were embedded 
>>>>> machines, with potentially "odd" versions of Linux, possibly with 
>>>>> "odd" errata. Or perhaps there were some dynamic differences 
>>>>> between "installed" and "on-line" cores.
>>>> There is definitely a difference between the processor count and 
>>>> the online processor count.  It seems that the calculation of 
>>>> ParallelGCThreads uses the online count, which could easily be 1 on 
>>>> some embedded platform since the kernel does active power 
>>>> management by shutting off cores.  The comment in os.hpp for 
>>>> active_processor_count() says "Returns the number of CPUs this 
>>>> process is currently allowed to run on".  On Linux at least I don't 
>>>> think that's correct: cores could be powered down just because the 
>>>> kernel is in some low-power state, not because of some affinity 
>>>> property of this particular Java process. I'd change the 
>>>> calculation to call processor_count() instead of 
>>>> active_processor_count().
>>>
>>> An early implementation used processor_count() and there was some 
>>> issue with virtualization.  I forget exactly what the virtualization 
>>> was, but it was something like Solaris containers or zones.  Let me 
>>> call them containers.  A container on an 8-processor machine might 
>>> only get 1 processor, but processor_count() would return 8.  It may 
>>> also have been on a system where there were 8 processors but 7 were 
>>> disabled.  Only 1 processor was available to execute the JVM but 
>>> processor_count() returned 8.  Anyway, if anyone thinks it should be 
>>> processor_count() instead of active_processor_count(), check those 
>>> types of situations.
>>
>> Jon,
>>
>> In the hg repo it has always been active_processor_count(). I was not 
>> able to figure out exactly when it was changed from 
>> processor_count(), but back in 2003 when JDK-4804915 was pushed it 
>> was already active_processor_count(). So, maybe it is worth 
>> re-evaluating processor_count() again. I don't pretend to know what 
>> the correct answer is here; it just feels like a lot has happened in 
>> the virtualization area over the past 10+ years, so maybe we should 
>> reconsider how we calculate the number of worker threads, especially 
>> if it causes problems on embedded.
>
> No argument there.  I just wanted to point out situations where it
> might matter.

I didn't mean to start an argument. Sorry if it came across that way. I 
just don't want us to be afraid of investigating a change like this. It 
is great to know the historical reason for a particular choice, so 
thanks for providing it, Jon!
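
For reference, on Linux the two counts roughly correspond to the two 
sysconf() queries below. This is only a minimal standalone sketch of 
how they can disagree, not the actual os_linux.cpp code:

  #include <unistd.h>
  #include <cstdio>

  int main() {
      // CPUs configured in the system, online or not -- roughly what
      // os::processor_count() reports.
      long configured = sysconf(_SC_NPROCESSORS_CONF);

      // CPUs online right now -- roughly what the Linux port of
      // active_processor_count() reports; this can drop when the
      // kernel powers cores down.
      long online = sysconf(_SC_NPROCESSORS_ONLN);

      printf("configured=%ld online=%ld\n", configured, online);
      return 0;
  }

On the embedded machines Bill describes, the online count could easily 
be 1 while the configured count is 2-4, which would explain the 
defaulted ParallelGCThreads=1.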

>
>>
>> Also, I find the comment for active_processor_count() a bit worrying.
>>
>>   // Returns the number of CPUs this process is currently allowed to 
>> run on.
>>   // Note that on some OSes this can change dynamically.
>>   static int active_processor_count();
>>
>> We read it only once and set the static value for ParallelGCThreads 
>> based on it. But apparently it can change over time, so why do we 
>> think we get a good value to start with?
>
> At the time the number of parallel GC threads could not change so
> we were stuck with the value at the start.  Even today increasing
> beyond the original maximum GC threads would take some work
> (arrays sized for the maximum number of GC threads, for example).
> There's plenty of ergonomics work like that to do.

Right, and the current implementation of the dynamic GC thread count 
does not re-read active_processor_count(), so we don't reduce the 
number of GC threads if that value drops. But that would be a simpler 
fix than increasing beyond the initial value.
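
Something along these lines is what I have in mind. This is purely a 
hypothetical sketch, not the existing dynamic worker calculation 
(choose_active_workers() is a made-up name; MIN2/MAX2 are the usual 
HotSpot helpers):

  // Hypothetical: clamp the dynamically chosen worker count by the
  // number of processors that are online *now*, instead of only by
  // the ParallelGCThreads value computed once at startup.
  uint choose_active_workers(uint desired_by_heuristics) {
      uint startup_max = (uint) ParallelGCThreads;  // fixed at VM start
      uint online_now  = (uint) os::active_processor_count();

      // Never grow beyond the startup maximum (worker data structures
      // are sized for it), but do shrink if fewer CPUs are online than
      // there were at startup.
      return MAX2(1u, MIN2(MIN2(desired_by_heuristics, startup_max),
                           online_now));
  }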

Thanks,
Bengt

>
> Jon
>
>
>>
>> Thanks,
>> Bengt
>>
>>>
>>> Jon
>>>
>>>>
>>>> bill
>>>>
>>>>>
>>>>> In any case the safest test seemed to be to force 
>>>>> ParallelGCThreads=1 and see if it works.
>>>>>> ForceDynamicNumberOfGCThreads is a diagnostic flag
>>>>>>
>>>>>>   diagnostic(bool, ForceDynamicNumberOfGCThreads, false,            \
>>>>>>           "Force dynamic selection of the number of "                \
>>>>>>           "parallel threads parallel gc will use to aid debugging")  \
>>>>>>
>>>>>> so I think you need +UnlockDiagnosticVMOptions.
>>>>> OK.
>>>>>> On 04/21/2015 06:53 AM, Derek White wrote:
>>>>>>> Hi All,
>>>>>>>
>>>>>>> Please review this fix for:
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8076995
>>>>>>> Webrev:
>>>>>>> http://cr.openjdk.java.net/~drwhite/8076995/webrev.00/
>>>>>>>
>>>>>>> Summary:
>>>>>>>
>>>>>>> Part 1 is a test bug where the test tries to run G1 on embedded SE builds. Not changed by this webrev.
>>>>>
>>>>> Looking into changing TEST.group...
>>>>>
>>>>> BTW, I tested with jprt earlier, but I'll try to get an Aurora run in.
>>>>>
>>>>>
>>>>>  - Derek
>>>>>>> Part two is an assertion failure that is being fixed by this webrev.
>>>>>>>
>>>>>>> This is a fix for a bug that triggered an assert when running CMS on very
>>>>>>> small machines: 1-core x86, or 1-4 core ARM. This may seem unlikely, but
>>>>>>> it can easily happen when running virtual instances.
>>>>>>>
>>>>>>> Failure stack traces also show a crash while printing the stack trace, but that is being tracked in another bug.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> - Derek
>>>>>>>
>>>>
>>>
>>
>
