RFR (S): 8076995: gc/ergonomics/TestDynamicNumberOfGCThreads.java failed with java.lang.RuntimeException: 'new_active_workers' missing from stdout/stderr
Bengt Rutisson
bengt.rutisson at oracle.com
Fri Apr 24 07:57:33 UTC 2015
Hi Jon and Derek,
On 2015-04-23 19:13, Jon Masamitsu wrote:
>
>
> On 04/23/2015 12:46 AM, Bengt Rutisson wrote:
>> On 22/04/15 17:45, Jon Masamitsu wrote:
>>>
>>>
>>> On 4/21/2015 2:57 PM, bill pittore wrote:
>>>>
>>>>
>>>> On 4/21/2015 4:56 PM, Derek White wrote:
>>>>> Thanks Jon!
>>>>>
>>>>> On 4/21/15 1:23 PM, Jon Masamitsu wrote:
>>>>>> Derek,
>>>>>>
>>>>>> Thanks for fixing this.
>>>>>>
>>>>>> Fix looks good.
>>>>>>
>>>>>> What do you think about always making testDynamicNumberOfGCThread()
>>>>>> check for the uniprocessor case (as opposed to passing in a flag
>>>>>> to explicitly
>>>>>> check it)?
>>>>> This may not catch all of the failures. What I couldn't pin down
>>>>> was why some 2-, 3(!)-, or 4-core ARM machines would end up
>>>>> defaulting to ParallelGCThreads=1. Now these were embedded
>>>>> machines, with potentially "odd" versions of Linux, possibly with
>>>>> "odd" errata. Or perhaps there were some dynamic differences
>>>>> between "installed" and "on-line" cores.
>>>> There is definitely a difference between the processor count and
>>>> the online processor count. It seems that the calculation of
>>>> ParallelGCThreads uses the online count which could easily be 1 on
>>>> some embedded platform since the kernel does do active power
>>>> management by shutting off cores. The comment in os.hpp for
>>>> active_processor_count() says "Returns the number of CPUs this
>>>> process is currently allowed to run on". On Linux at least I don't
>>>> think that's correct. Cores could be powered down just because the
>>>> kernel is in some low power state and not because of some affinity
>>>> property for this particular Java process. I'd change the
>>>> calculation to call processor_count() instead of
>>>> active_processor_count().
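For reference, the installed vs. on-line distinction Bill describes is
directly visible on Linux through sysconf(). A minimal standalone sketch
(plain POSIX, not HotSpot code):

    #include <unistd.h>
    #include <cstdio>

    int main() {
      // CPUs configured in the system, whether or not they are on-line.
      long configured = sysconf(_SC_NPROCESSORS_CONF);
      // CPUs currently on-line; this can drop when the kernel powers
      // cores down, independent of any per-process affinity mask.
      long online = sysconf(_SC_NPROCESSORS_ONLN);
      printf("configured=%ld online=%ld\n", configured, online);
      return 0;
    }

On an embedded board with aggressive power management the on-line count
can legitimately report 1 even though several cores are installed.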
>>>
>>> An early implementation used processor_count() and there was some
>>> issue with virtualization. I forget what the virtualization was,
>>> but it was something like Solaris containers or zones. Let me call
>>> them containers. A container on an 8-processor machine might only
>>> get 1 processor, but processor_count() would return 8. It may also
>>> have been on a system where there were 8 processors but 7 were
>>> disabled. Only 1 processor was available to execute the JVM, but
>>> processor_count() returned 8. Anyway, if anyone thinks it should be
>>> processor_count() instead of active_processor_count(), check those
>>> types of situations.
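To make the container scenario concrete: on Linux, one place where a
per-process restriction shows up is the CPU affinity mask (this is just
an illustration, not necessarily what HotSpot queries; Solaris zones use
different interfaces):

    #include <sched.h>
    #include <cstdio>

    int main() {
      cpu_set_t mask;
      // CPUs this process may run on, e.g. as restricted by taskset,
      // cgroups/containers, or similar partitioning.
      if (sched_getaffinity(0, sizeof(mask), &mask) == 0) {
        printf("allowed CPUs: %d\n", CPU_COUNT(&mask));
      }
      return 0;
    }

Run under "taskset -c 0" on an 8-processor machine this prints 1, while
a plain installed-processor query would still report 8.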
>>
>> Jon,
>>
>> In the hg repo it has always been active_processor_count(). I was not
>> able to figure out exactly when it was changed from
>> processor_count(), but back in 2003 when JDK-4804915 was pushed it
>> was already active_processor_count(). So maybe it is worth
>> re-evaluating processor_count(). I don't pretend to know what the
>> correct answer is here; it just feels like a lot has happened in the
>> virtualization area over the past 10+ years, so maybe we should
>> reconsider how we calculate the number of worker threads. Especially
>> if it causes problems on embedded.
>
> No argument there. I just wanted to point out situations where it
> might matter.
I didn't mean to start an argument. Sorry if it came across that way. I
just don't want us to be afraid of investigating a change like this. It
is great to know the historical reason for a particular choice, so
thanks for providing it, Jon!
>
>>
>> Also, I find the comment for active_processor_count() a bit worrying.
>>
>> // Returns the number of CPUs this process is currently allowed to
>> // run on.
>> // Note that on some OSes this can change dynamically.
>> static int active_processor_count();
>>
>> We read it only once and set the static value for ParallelGCThreads
>> based on it. But apparently it can change over time, so why do we
>> think the value we get at startup is a good one?
>
> At the time the number of parallel GC threads could not change, so
> we were stuck with the value at the start. Even today, increasing
> beyond the original maximum number of GC threads would take some work
> (arrays sized for the maximum number of GC threads, for example).
> There's plenty of ergonomics work like that to do.
Right, and the current implementation of the dynamic GC thread count
does not re-read active_processor_count(), so we never reduce the number
of GC threads if that value drops. But that would be a simpler fix than
allowing growth beyond the initial value.
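To make that concrete, a simplified sketch of what such a reduction
might look like (a hypothetical helper inside HotSpot, not the actual
AdaptiveSizePolicy code; only os::active_processor_count() is real):

    // Hypothetical: re-sample the active processor count each GC and
    // never run more workers than are currently available.
    unsigned int choose_active_workers(unsigned int desired_workers,
                                       unsigned int max_workers) {
      unsigned int online = (unsigned int) os::active_processor_count();
      unsigned int active = desired_workers;
      if (active > online)      active = online;      // CPUs went away
      if (active > max_workers) active = max_workers; // boot-time maximum
      if (active < 1)           active = 1;
      return active;
    }

Shrinking like this stays within the arrays Jon mentioned, since we
never go above the boot-time maximum.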
Thanks,
Bengt
>
> Jon
>
>
>>
>> Thanks,
>> Bengt
>>
>>>
>>> Jon
>>>
>>>>
>>>> bill
>>>>
>>>>>
>>>>> In any case, the safest test seemed to be to force
>>>>> ParallelGCThreads=1 and see if it works.
>>>>>> ForceDynamicNumberOfGCThreads is a diagnostic flag
>>>>>>
>>>>>>   diagnostic(bool, ForceDynamicNumberOfGCThreads, false,          \
>>>>>>           "Force dynamic selection of the number of "             \
>>>>>>           "parallel threads parallel gc will use to aid "         \
>>>>>>           "debugging")                                            \
>>>>>>
>>>>>> so I think you need +UnlockDiagnosticVMOptions.
>>>>> OK.
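For reference, forcing the flag on would then look something like this
on the command line (the remaining test options are elided):

    java -XX:+UnlockDiagnosticVMOptions -XX:+ForceDynamicNumberOfGCThreads \
         -XX:ParallelGCThreads=1 ...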
>>>>>> On 04/21/2015 06:53 AM, Derek White wrote:
>>>>>>> Hi All,
>>>>>>>
>>>>>>> Please review this fix for:
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8076995
>>>>>>> Webrev:
>>>>>>> http://cr.openjdk.java.net/~drwhite/8076995/webrev.00/
>>>>>>>
>>>>>>> Summary:
>>>>>>>
>>>>>>> Part 1 is a test bug: the test tries to run G1 on embedded SE builds. Not changed by this webrev.
>>>>>
>>>>> Looking into changing TEST.group...
>>>>>
>>>>> BTW, I tested with jprt earlier, but I'll try to get an Aurora run in.
>>>>>
>>>>>
>>>>> - Derek
>>>>>>> Part 2 is an assertion failure that is fixed by this webrev.
>>>>>>>
>>>>>>> This is a fix for a bug that triggered an assert when running CMS on very
>>>>>>> small machines - 1-core x86, or 1-4 core ARM. This may seem unlikely, but
>>>>>>> it can easily happen when running virtual instances.
>>>>>>>
>>>>>>> Failure stack traces also show a crash while printing a stack trace, but that is being tracked in another bug.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> - Derek
>>>>>>>
>>>>
>>>
>>
>