RFR (S): 8076995: gc/ergonomics/TestDynamicNumberOfGCThreads.java failed with java.lang.RuntimeException: 'new_active_workers' missing from stdout/stderr

Jon Masamitsu jon.masamitsu at oracle.com
Thu Apr 23 17:13:28 UTC 2015



On 04/23/2015 12:46 AM, Bengt Rutisson wrote:
> On 22/04/15 17:45, Jon Masamitsu wrote:
>>
>>
>> On 4/21/2015 2:57 PM, bill pittore wrote:
>>>
>>>
>>> On 4/21/2015 4:56 PM, Derek White wrote:
>>>> Thanks Jon!
>>>>
>>>> On 4/21/15 1:23 PM, Jon Masamitsu wrote:
>>>>> Derek,
>>>>>
>>>>> Thanks for fixing this.
>>>>>
>>>>> Fix looks good.
>>>>>
>>>>> What do you think about always making testDynamicNumberOfGCThread()
>>>>> check for the uniprocessor case (as opposed to passing in a flag
>>>>> to explicitly check it)?
>>>> This may not catch all of the failures. What I couldn't pin down 
>>>> was why some 2, 3(!), or 4 core ARM machines would result in 
>>>> defaulting ParallelGCThreads=1. Now these were embedded machines, 
>>>> with potentially "odd" versions of linux, possibly with "odd" 
>>>> errata. Or perhaps there were some dynamic differences between
>>>> "installed" and "on-line" cores.
>>> There is definitely a difference between the processor count and the 
>>> online processor count.  It seems that the calculation of 
>>> ParallelGCThreads uses the online count which could easily be 1 on 
>>> some embedded platform since the kernel does do active power 
>>> management by shutting off cores.  The comment in os.hpp for 
>>> active_processor_count() says "Returns the number of CPUs this 
>>> process is currently allowed to run on".  On linux at least I don't 
>>> think that's correct. Cores could be powered down just because the 
>>> kernel is in some low power state and not because of some affinity 
>>> property for this particular Java process. I'd change the 
>>> calculation to call processor_count() instead of 
>>> active_processor_count().
>>
>> An early implementation used processor_count() and there was some
>> issue with virtualization. I forget what the virtualization was but
>> it was something like Solaris containers or zones. Let me call them
>> containers. A container on an 8 processor machine might only get 1
>> processor but processor_count() would return 8. It may also have
>> been on a system where there were 8 processors but 7 were disabled.
>> Only 1 processor was available to execute the JVM but
>> processor_count() returned 8. Anyway, if anyone thinks it should be
>> processor_count() instead of active_processor_count(), check those
>> types of situations.
>
> Jon,
>
> In the hg repo it has always been active_processor_count(). I was not 
> able to figure out exactly when it was changed from processor_count(), 
> but back in 2003 when JDK-4804915 was pushed it was already 
> active_processor_count(). So, maybe it is worth re-evaluating 
> processor_count() again. I don't pretend that I know what the correct 
> answer here is, it just feels like a lot has happened in the 
> virtualization area over the past 10+ years so maybe we should 
> reconsider how we calculate the number of worker threads. Especially 
> if it causes problems on embedded.

No argument there.  I just wanted to point out situations where it
might matter.
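
As a rough illustration of why the choice of count matters, here is a minimal Java sketch. It assumes the default ParallelGCThreads is derived from the active processor count with the commonly described rule (use every CPU up to 8, then 5/8 of the remainder); the exact HotSpot heuristic may differ by version and platform. The class and method names are hypothetical, and Runtime.availableProcessors() is used only as the Java-visible stand-in for the count the JVM sees.

```java
// Sketch only: approximates how a default ParallelGCThreads value could be
// derived from a processor count. Names here are hypothetical.
final class GcThreadDefaults {

    // Assumed rule: all CPUs up to 8, then 5/8 of the CPUs beyond 8.
    static int defaultParallelGcThreads(int cpuCount) {
        if (cpuCount <= 8) {
            // Never report fewer than one worker thread.
            return Math.max(1, cpuCount);
        }
        return 8 + ((cpuCount - 8) * 5) / 8;
    }

    public static void main(String[] args) {
        // Processors available to this JVM right now; it can change over time.
        int available = Runtime.getRuntime().availableProcessors();
        System.out.println("available processors: " + available);
        System.out.println("approximate default ParallelGCThreads: "
                + defaultParallelGcThreads(available));
    }
}
```

Under this assumed rule, a machine with 4 installed cores but only 1 online at startup would end up with a single GC thread, which matches the embedded behaviour described in the thread.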

>
> Also, I find the comment for active_processor_count() a bit worrying.
>
>   // Returns the number of CPUs this process is currently allowed to run on.
>   // Note that on some OSes this can change dynamically.
>   static int active_processor_count();
>
> We read it only once and set the static value for ParallelGCThreads 
> based on this. But apparently it can change over time so why do we 
> think that we get a good value to start with?

At the time the number of parallel GC threads could not change so
we were stuck with the value at the start.  Even today increasing
beyond the original maximum GC threads would take some work
(arrays sized for the maximum number of GC threads, for example).
There's plenty of ergonomics work like that to do.
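
The constraint described above can be sketched in Java, purely as an illustration. This is not HotSpot code; the class, field, and method names are hypothetical. It assumes per-worker data lives in an array sized once from the startup maximum, so the active worker count can move within that bound but cannot grow past it without resizing.

```java
// Sketch only: a worker pool whose per-worker storage is sized once at startup.
final class FixedWorkerPool {
    // Hypothetical per-worker data, allocated for the startup maximum.
    private final long[] perWorkerStats;
    private int activeWorkers;

    FixedWorkerPool(int maxWorkers) {
        if (maxWorkers < 1) {
            throw new IllegalArgumentException("maxWorkers must be >= 1");
        }
        this.perWorkerStats = new long[maxWorkers];
        this.activeWorkers = maxWorkers;
    }

    int maxWorkers() {
        return perWorkerStats.length;
    }

    int activeWorkers() {
        return activeWorkers;
    }

    // Clamp the request to [1, maxWorkers]; growing past the maximum would
    // require reallocating every structure sized from it.
    int setActiveWorkers(int requested) {
        activeWorkers = Math.min(Math.max(1, requested), perWorkerStats.length);
        return activeWorkers;
    }
}
```

With a pool created as new FixedWorkerPool(1), a later request for 4 workers is clamped back to 1, which is the situation a JVM is in when only one processor was online at startup.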

Jon


>
> Thanks,
> Bengt
>
>>
>> Jon
>>
>>>
>>> bill
>>>
>>>>
>>>> In any case the safest test seemed to be to force 
>>>> ParallelGCThreads=1 and see if it works.
>>>>> ForceDynamicNumberOfGCThreads is a diagnostic flag
>>>>>
>>>>>   diagnostic(bool, ForceDynamicNumberOfGCThreads, false,                    \
>>>>>           "Force dynamic selection of the number of "                       \
>>>>>           "parallel threads parallel gc will use to aid debugging")         \
>>>>>
>>>>> so I think you need -XX:+UnlockDiagnosticVMOptions.
>>>> OK.
>>>>> On 04/21/2015 06:53 AM, Derek White wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> Please review this fix for:
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8076995
>>>>>> Webrev:
>>>>>> http://cr.openjdk.java.net/~drwhite/8076995/webrev.00/
>>>>>>
>>>>>> Summary:
>>>>>>
>>>>>> Part 1 is a test bug that tries to run G1 on embedded SE builds. Not changed by this webrev.
>>>>
>>>> Looking into changing TEST.group...
>>>>
>>>> BTW, I tested with jprt earlier, but I'll try to get an Aurora run in.
>>>>
>>>>
>>>>  - Derek
>>>>>> Part two is an assertion failure that is being fixed by this webrev.
>>>>>>
>>>>>> This is a fix for a bug that triggered an assert when running CMS on very
>>>>>> small machines - 1 core x86, or 1-4 core ARM. This may seem unlikely but
>>>>>> can easily happen when running virtual instances.
>>>>>>
>>>>>> Failure stack traces also show a crash while printing a stack trace, but this is being tracked in another bug.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> - Derek
>>>>>>
>>>
>>
>
