RFR (S): 8076995: gc/ergonomics/TestDynamicNumberOfGCThreads.java failed with java.lang.RuntimeException: 'new_active_workers' missing from stdout/stderr
Bengt Rutisson
bengt.rutisson at oracle.com
Thu Apr 23 07:46:09 UTC 2015
On 22/04/15 17:45, Jon Masamitsu wrote:
>
>
> On 4/21/2015 2:57 PM, bill pittore wrote:
>>
>>
>> On 4/21/2015 4:56 PM, Derek White wrote:
>>> Thanks Jon!
>>>
>>> On 4/21/15 1:23 PM, Jon Masamitsu wrote:
>>>> Derek,
>>>>
>>>> Thanks for fixing this.
>>>>
>>>> Fix looks good.
>>>>
>>>> What do you think about always making testDynamicNumberOfGCThread()
>>>> check for the uniprocessor case (as opposed to passing in a flag to
>>>> explicitly check it)?
>>> This may not catch all of the failures. What I couldn't pin down was
>>> why some 2-, 3(!)-, or 4-core ARM machines would end up defaulting to
>>> ParallelGCThreads=1. These were embedded machines, with
>>> potentially "odd" versions of Linux, possibly with "odd" errata. Or
>>> perhaps there were some dynamic differences between "installed" and
>>> "online" cores.
>> There is definitely a difference between the processor count and the
>> online processor count. It seems that the calculation of
>> ParallelGCThreads uses the online count, which could easily be 1 on
>> some embedded platforms, since the kernel does active power
>> management by shutting off cores. The comment in os.hpp for
>> active_processor_count() says "Returns the number of CPUs this
>> process is currently allowed to run on". On Linux, at least, I don't
>> think that's correct. Cores could be powered down just because the
>> kernel is in some low-power state, not because of some affinity
>> property of this particular Java process. I'd change the calculation
>> to call processor_count() instead of active_processor_count().
>
> An early implementation used processor_count(), and there was some
> issue with virtualization. I forget what the virtualization was, but it
> was something like Solaris containers or zones; let me call them
> containers. A container on an 8-processor machine might only get 1
> processor, but processor_count() would return 8. It may also have been
> on a system where there were 8 processors but 7 were disabled: only 1
> processor was available to execute the JVM, but processor_count()
> returned 8. Anyway, if anyone thinks it should be processor_count()
> instead of active_processor_count(), check those types of situations.
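For anyone who wants to check the situations Jon mentions, here is a minimal Linux-only sketch that reports the three CPU counts under discussion: configured CPUs, online CPUs, and CPUs in the thread's affinity mask (what taskset or a cpuset narrows). It uses only sysconf() and sched_getaffinity(). It is not HotSpot code; the helper names are hypothetical, and the mapping of processor_count() to the configured count and active_processor_count() to the online count on Linux is my assumption, not something I have verified against the sources.

```cpp
// Linux/glibc-only sketch: the three CPU counts discussed in this thread.
// Not HotSpot code; the correspondence to HotSpot's os:: functions is assumed.
#include <cassert>
#include <sched.h>   // sched_getaffinity, cpu_set_t, CPU_ZERO, CPU_COUNT
#include <unistd.h>  // sysconf

// CPUs configured in the system, including ones currently offline
// (assumed to be what processor_count() is based on).
long configured_cpus() { return sysconf(_SC_NPROCESSORS_CONF); }

// CPUs currently online; the kernel may take cores offline to save power
// (assumed to be what active_processor_count() is based on on Linux).
long online_cpus() { return sysconf(_SC_NPROCESSORS_ONLN); }

// CPUs in the calling thread's affinity mask, e.g. as narrowed by taskset.
// Returns -1 if the mask cannot be read.
long affinity_cpus() {
  cpu_set_t set;
  CPU_ZERO(&set);
  if (sched_getaffinity(0, sizeof(set), &set) != 0) {
    return -1;
  }
  long n = CPU_COUNT(&set);
  assert(n > 0 && "a running thread must be allowed on at least one CPU");
  return n;
}
```

Calling these from a small main() on a board with cores powered down, or under `taskset -c 0`, should show which of the numbers diverge in each situation.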
Jon,
In the hg repo it has always been active_processor_count(). I was not
able to figure out exactly when it was changed from processor_count(),
but back in 2003, when JDK-4804915 was pushed, it was already
active_processor_count(). So maybe it is worth re-evaluating
processor_count(). I don't pretend to know the correct answer here;
it just feels like a lot has happened in the virtualization area over
the past 10+ years, so maybe we should reconsider how we calculate the
number of worker threads, especially if the current calculation causes
problems on embedded.
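To make the embedded concern concrete: whatever count we feed the ergonomics, a report of one CPU collapses the default to a single GC worker. The hypothetical helper below assumes the commonly cited default of one worker per CPU up to 8 CPUs and 5/8 of the CPUs beyond that; treat those constants as an assumption for illustration, not as what every platform uses.

```cpp
// Hypothetical sketch of a ParallelGCThreads-style default. Assumption:
// one worker per CPU up to 8 CPUs, then 5/8 of the CPUs beyond 8.
#include <cassert>

unsigned default_gc_workers(unsigned ncpus) {
  assert(ncpus >= 1 && "the OS should report at least one CPU");
  const unsigned switch_point = 8;  // up to here: one worker per CPU
  const unsigned num = 5;           // above it: num/den of the extra CPUs
  const unsigned den = 8;
  if (ncpus <= switch_point) {
    return ncpus;
  }
  return switch_point + ((ncpus - switch_point) * num) / den;
}
```

Under that assumption, default_gc_workers(1) is 1 and default_gc_workers(4) is 4, so on a 4-core ARM board with three cores powered down, switching from the online count to the configured count would move the default from 1 worker to 4.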
Also, I find the comment for active_processor_count() a bit worrying:
// Returns the number of CPUs this process is currently allowed to run on.
// Note that on some OSes this can change dynamically.
static int active_processor_count();
We read it only once and set ParallelGCThreads statically from it. But
if it can change over time, why do we assume that the value we read at
startup is a good one?
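A quick way to see that the "allowed to run on" count is not fixed for the life of a process is the Linux-only sketch below: it narrows the calling thread's own affinity mask to a single CPU and re-reads the count. Nothing here needs root. It only demonstrates affinity changes; CPU hotplug and power management are a separate source of change. The function names are hypothetical.

```cpp
// Linux/glibc-only sketch: the "allowed CPUs" count can change while a
// process runs. Function names are hypothetical.
#include <cassert>
#include <sched.h>  // sched_getaffinity, sched_setaffinity, CPU_* macros

// CPUs in the calling thread's affinity mask, or -1 if it cannot be read.
int allowed_cpus() {
  cpu_set_t set;
  CPU_ZERO(&set);
  if (sched_getaffinity(0, sizeof(set), &set) != 0) {
    return -1;
  }
  int n = CPU_COUNT(&set);
  assert(n > 0 && "a running thread must be allowed on at least one CPU");
  return n;
}

// Narrow the calling thread's mask to the lowest-numbered CPU it is
// currently allowed on. Returns true if the new mask was applied.
bool pin_to_one_allowed_cpu() {
  cpu_set_t current;
  CPU_ZERO(&current);
  if (sched_getaffinity(0, sizeof(current), &current) != 0) {
    return false;
  }
  for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
    if (CPU_ISSET(cpu, &current)) {
      cpu_set_t one;
      CPU_ZERO(&one);
      CPU_SET(cpu, &one);
      return sched_setaffinity(0, sizeof(one), &one) == 0;
    }
  }
  return false;  // empty mask; should not happen for a running thread
}
```

A count captured once at startup, the way ParallelGCThreads is, would still hold the old value after pin_to_one_allowed_cpu() succeeds, while allowed_cpus() now returns 1.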
Thanks,
Bengt
>
> Jon
>
>>
>> bill
>>
>>>
>>> In any case the safest test seemed to be to force
>>> ParallelGCThreads=1 and see if it works.
>>>> ForceDynamicNumberOfGCThreads is a diagnostic flag:
>>>>
>>>>   diagnostic(bool, ForceDynamicNumberOfGCThreads, false,                \
>>>>           "Force dynamic selection of the number of "                   \
>>>>           "parallel threads parallel gc will use to aid debugging")     \
>>>>
>>>> so I think you need -XX:+UnlockDiagnosticVMOptions.
>>> OK.
>>>> On 04/21/2015 06:53 AM, Derek White wrote:
>>>>> Hi All,
>>>>>
>>>>> Please review this fix for:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8076995
>>>>> Webrev:
>>>>> http://cr.openjdk.java.net/~drwhite/8076995/webrev.00/
>>>>>
>>>>> Summary:
>>>>>
>>>>> Part 1 is a test bug: the test tries to run G1 on embedded SE builds. It is not changed by this webrev.
>>>
>>> Looking into changing TEST.group...
>>>
>>> BTW, I tested with jprt earlier, but I'll try to get an Aurora run in.
>>>
>>>
>>> - Derek
>>>>> Part 2 is an assertion failure, which this webrev fixes.
>>>>>
>>>>> This fixes a bug that triggered an assert when running CMS on very
>>>>> small machines: 1-core x86, or 1-4-core ARM. This may seem unlikely, but
>>>>> it can easily happen when running virtual instances.
>>>>>
>>>>> The failure stack traces also show the VM crashing while printing a stack trace, but that is being tracked in another bug.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> - Derek
>>>>>
>>
>
More information about the hotspot-gc-dev mailing list