RFR (S): 8076995: gc/ergonomics/TestDynamicNumberOfGCThreads.java failed with java.lang.RuntimeException: 'new_active_workers' missing from stdout/stderr

Bengt Rutisson bengt.rutisson at oracle.com
Thu Apr 23 07:46:09 UTC 2015


On 22/04/15 17:45, Jon Masamitsu wrote:
>
>
> On 4/21/2015 2:57 PM, bill pittore wrote:
>>
>>
>> On 4/21/2015 4:56 PM, Derek White wrote:
>>> Thanks, Jon!
>>>
>>> On 4/21/15 1:23 PM, Jon Masamitsu wrote:
>>>> Derek,
>>>>
>>>> Thanks for fixing this.
>>>>
>>>> Fix looks good.
>>>>
>>>> What do you think about always making testDynamicNumberOfGCThread()
>>>> check for the uniprocessor case (as opposed to passing in a flag to
>>>> explicitly check it)?
>>> This may not catch all of the failures. What I couldn't pin down was
>>> why some 2, 3(!), or 4 core ARM machines would end up with
>>> ParallelGCThreads defaulting to 1. Now these were embedded machines,
>>> with potentially "odd" versions of Linux, possibly with "odd" errata.
>>> Or perhaps there were some dynamic differences between "installed"
>>> and "on-line" cores.
>> There is definitely a difference between the processor count and the 
>> online processor count.  It seems that the calculation of 
>> ParallelGCThreads uses the online count which could easily be 1 on 
>> some embedded platform since the kernel does do active power 
>> management by shutting off cores.  The comment in os.hpp for 
>> active_processor_count() says "Returns the number of CPUs this 
>> process is currently allowed to run on".  On linux at least I don't 
>> think that's correct. Cores could be powered down just because the 
>> kernel is in some low power state and not because of some affinity 
>> property for this particular Java process. I'd change the calculation 
>> to call processor_count() instead of active_processor_count().
>
> An early implementation used processor_count() and there was some issue
> with virtualization. I forget what the virtualization was, but it was
> something like Solaris containers or zones. Let me call them containers.
> A container on an 8 processor machine might only get 1 processor, but
> processor_count() would return 8. It may also have been on a system where
> there were 8 processors but 7 were disabled. Only 1 processor was
> available to execute the JVM but processor_count() returned 8. Anyway, if
> anyone thinks it should be processor_count() instead of
> active_processor_count(), check those types of situations.

Jon,

In the hg repo it has always been active_processor_count(). I was not
able to figure out exactly when it was changed from processor_count(),
but back in 2003 when JDK-4804915 was pushed it was already
active_processor_count(). So maybe it is worth evaluating
processor_count() again. I don't pretend to know what the correct
answer is here. It just feels like a lot has happened in the
virtualization area over the past 10+ years, so maybe we should
reconsider how we calculate the number of worker threads, especially if
it causes problems on embedded.
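
To make the "installed" versus "online" distinction concrete, here is a
minimal Java sketch, assuming a Linux system with sysfs mounted. It compares
the kernel's "present" and "online" CPU lists with what
Runtime.availableProcessors() reports. The class CpuCounts and its methods
are hypothetical names for illustration only. The sketch makes no claim
about how processor_count() or active_processor_count() are implemented.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of HotSpot or the test.
final class CpuCounts {
    // Counts the CPUs named by a Linux cpulist string such as "0-3" or "0,2-3".
    static int countCpuList(String list) {
        int count = 0;
        for (String part : list.trim().split(",")) {
            if (part.isEmpty()) {
                continue; // empty list, nothing to count
            }
            String[] range = part.trim().split("-");
            int lo = Integer.parseInt(range[0]);
            int hi = range.length > 1 ? Integer.parseInt(range[1]) : lo;
            count += hi - lo + 1;
        }
        return count;
    }

    // Returns the CPU count from a sysfs cpulist file, or -1 if unavailable.
    static int readCpuList(String file) {
        Path path = Paths.get(file);
        try {
            byte[] raw = Files.readAllBytes(path);
            return countCpuList(new String(raw, StandardCharsets.UTF_8));
        } catch (IOException | RuntimeException e) {
            return -1; // not Linux, sysfs not mounted, or unexpected format
        }
    }

    public static void main(String[] args) {
        System.out.println("present:   " + readCpuList("/sys/devices/system/cpu/present"));
        System.out.println("online:    " + readCpuList("/sys/devices/system/cpu/online"));
        System.out.println("available: " + Runtime.getRuntime().availableProcessors());
    }
}
```

On a board where the kernel has taken cores offline, "online" would be
expected to be smaller than "present". Whether "available" follows one or
the other is exactly the question raised in this thread.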

Also, I find the comment for active_processor_count() a bit worrying.

   // Returns the number of CPUs this process is currently allowed to run on.
   // Note that on some OSes this can change dynamically.
   static int active_processor_count();

We read it only once and set the static value for ParallelGCThreads 
based on this. But apparently it can change over time so why do we think 
that we get a good value to start with?
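
As a rough illustration of that concern, the sketch below fixes a worker
count from a single startup sample and then keeps re-reading the processor
count to see whether the two drift apart. The class name and
defaultParallelWorkers() are hypothetical. The constants in the formula (all
CPUs up to 8, then 5/8 of the remainder) are an assumption about the default
heuristic and should be checked against the source.

```java
// Hypothetical sketch; not HotSpot code.
final class WorkerCountSketch {
    // Assumed default heuristic: use every CPU up to 8, then 5/8 of the rest.
    static int defaultParallelWorkers(int ncpus) {
        return ncpus <= 8 ? ncpus : 8 + ((ncpus - 8) * 5) / 8;
    }

    public static void main(String[] args) throws InterruptedException {
        // Sampled once, the way a static setting would be.
        int startupCpus = Runtime.getRuntime().availableProcessors();
        int fixedWorkers = defaultParallelWorkers(startupCpus);

        // Re-sample a few times to see whether the reported count moves.
        for (int i = 0; i < 5; i++) {
            int nowCpus = Runtime.getRuntime().availableProcessors();
            System.out.printf("startup: cpus=%d workers=%d | now: cpus=%d workers=%d%n",
                    startupCpus, fixedWorkers, nowCpus, defaultParallelWorkers(nowCpus));
            Thread.sleep(1000);
        }
    }
}
```

If the startup sample happens to be 1, the fixed worker count stays at 1
regardless of how many cores come online later.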

Thanks,
Bengt

>
> Jon
>
>>
>> bill
>>
>>>
>>> In any case the safest test seemed to be to force 
>>> ParallelGCThreads=1 and see if it works.
>>>> ForceDynamicNumberOfGCThreads is a diagnostic flag
>>>>
>>>>   diagnostic(bool, ForceDynamicNumberOfGCThreads, false,                    \
>>>>           "Force dynamic selection of the number of "                       \
>>>>           "parallel threads parallel gc will use to aid debugging")         \
>>>>
>>>> so I think you need -XX:+UnlockDiagnosticVMOptions.
>>> OK.
>>>> On 04/21/2015 06:53 AM, Derek White wrote:
>>>>> Hi All,
>>>>>
>>>>> Please review this fix for:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8076995
>>>>> Webrev:
>>>>> http://cr.openjdk.java.net/~drwhite/8076995/webrev.00/
>>>>>
>>>>> Summary:
>>>>>
>>>>> Part 1 is a test bug that tries to run G1 on embedded SE builds. Not changed by this webrev.
>>>
>>> Looking into changing TEST.group...
>>>
>>> BTW, I tested with jprt earlier, but I'll try to get an Aurora run in.
>>>
>>>
>>>  - Derek
>>>>> Part 2 is an assertion failure that is fixed by this webrev.
>>>>>
>>>>> This is a fix for a bug that triggered an assert when running CMS on
>>>>> very small machines - 1 core x86, or 1-4 core ARM. This may seem
>>>>> unlikely but can easily happen when running virtual instances.
>>>>>
>>>>> The failure stack traces also show a crash while printing a stack
>>>>> trace, but that is being tracked in another bug.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> - Derek
>>>>>
>>
>
