Fwd: Better default for ParallelGCThreads and ConcGCThreads by using number of physical cores and CPU mask.

Mon Nov 25 00:24:13 PST 2013

Hi Jon,

On 25/11/2013 4:11 PM, Jon Masamitsu wrote:
> David,
>
> Thanks for taking a look at this.
>
> On 11/24/2013 6:19 PM, David Holmes wrote:
>> Hi Jon,
>>
>> On 23/11/2013 3:24 AM, Jon Masamitsu wrote:
>>> This is a contribution regarding the number of GC worker threads to
>>> use.  Part of the change queries /proc on Linux to get the number of
>>> active cores on the platform.  The changes are in
>>>
>>> http://cr.openjdk.java.net/~jmasa/8028554/webrev.00/hotspot/src/os_cpu/linux_x86/vm/os_linux_x86.cpp.frames.html
>>>
>>>
>>> Can someone familiar with this code take a look to see
>>> if it is reasonable and done in a way that is consistent
>>> with other /proc queries?
>>
>> I can't comment on that specifically but I do have reservations about
>> this proposed patch.
>>
>> First we have a general problem that "active processor count" doesn't
>> take into account the various resource management mechanisms that can
>> limit the actual "processors" available to the VM when it is running.
>> I would prefer to see that general problem solved. It also isn't clear
>> to me that the sched_getaffinity usage will correctly reflect the use
>> of tasksets/cpusets. (Note: on Solaris we try to handle some of these
>> mechanisms, e.g. pbind and psrsets, but still don't handle resource pools.)
>
> Is there any work being done on the general problem?  I also would
> like to see this solved.  I've always thought of it as runtime code that
> GC uses.  Do you see it as a GC responsibility?

No, this is a runtime issue.
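To illustrate the distinction, here is a minimal Linux-only sketch in Python (not HotSpot code; every name in it is hypothetical). It starts from the affinity mask, which on Linux generally reflects taskset and cpuset placement, and then caps the result by one mechanism the mask does not capture: a CFS bandwidth quota, assumed here to come from a cgroup v2 `cpu.max` line of the form `<quota> <period>` or `max <period>`.

```python
import os
from typing import Optional


def affinity_cpu_count() -> int:
    """Logical CPUs in this process's affinity mask, or the online total as a fallback."""
    # os.sched_getaffinity(0) returns the set of CPU ids the calling thread may
    # run on; it only exists on some platforms (e.g. Linux).
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def quota_cpu_limit(cpu_max: str) -> Optional[int]:
    """CPUs implied by a cgroup v2 cpu.max value, or None when unlimited."""
    quota, period = cpu_max.split()
    if quota == "max":
        return None  # no bandwidth limit configured
    # Round up so a fractional quota (e.g. 1.5 CPUs) is not truncated.
    return max(1, -(-int(quota) // int(period)))


def effective_cpu_count(cpu_max: Optional[str] = None) -> int:
    """Hypothetical runtime-side count: affinity mask first, then any quota cap."""
    count = affinity_cpu_count()
    limit = quota_cpu_limit(cpu_max) if cpu_max is not None else None
    return count if limit is None else min(count, limit)


if __name__ == "__main__":
    print("affinity mask:", affinity_cpu_count())
    print("with a 1.5-CPU quota:", effective_cpu_count("150000 100000"))
```

The quota file's location and format differ between cgroup v1 and v2, so that part is an assumption about the deployment; the point is only that the mask alone is not the whole answer.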

>>
>> Second, this feeds into future work on NUMA-awareness that will likely
>> need a more sophisticated set of APIs.
>
> Can you explain more?

NUMA-aware APIs need access to the underlying machine topology, so there
will have to be a VM interface that exposes that information in a suitable
way. This might involve information on sockets, cores, "hyper-threading",
processor IDs, etc.
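As a rough illustration of the topology information involved, a minimal Python sketch (hypothetical names, not a proposed VM interface) that parses the x86 Linux /proc/cpuinfo fields `processor`, `physical id` and `core id`, then counts sockets and physical cores, optionally restricted to the logical CPUs in an affinity mask. It assumes those fields are present; where they are missing, it treats each logical CPU as its own core on socket 0.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class LogicalCpu:
    processor: int  # logical CPU id, the numbering used by affinity masks
    socket: int     # "physical id": the package (socket) the CPU belongs to
    core: int       # "core id": unique only within its socket


def parse_cpuinfo(text: str) -> List[LogicalCpu]:
    """Parse /proc/cpuinfo text: one blank-line-separated block per logical CPU."""
    cpus = []
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" in fields:
            proc = int(fields["processor"])
            cpus.append(LogicalCpu(
                processor=proc,
                socket=int(fields.get("physical id", 0)),
                # Without "core id", assume every logical CPU is its own core.
                core=int(fields.get("core id", proc)),
            ))
    return cpus


def _selected(cpus: Iterable[LogicalCpu],
              allowed: Optional[Set[int]]) -> List[LogicalCpu]:
    return [c for c in cpus if allowed is None or c.processor in allowed]


def socket_count(cpus: Iterable[LogicalCpu],
                 allowed: Optional[Set[int]] = None) -> int:
    return len({c.socket for c in _selected(cpus, allowed)})


def physical_core_count(cpus: Iterable[LogicalCpu],
                        allowed: Optional[Set[int]] = None) -> int:
    # Hyper-threads of one core share (socket, core), so each pair counts once.
    return len({(c.socket, c.core) for c in _selected(cpus, allowed)})


# Hypothetical 1-socket, 2-core, 2-thread layout: logical CPUs 0 and 2 share
# core 0, and 1 and 3 share core 1 (a common Linux numbering, not universal).
SAMPLE = "\n\n".join(
    f"processor\t: {p}\nphysical id\t: 0\ncore id\t\t: {p % 2}" for p in range(4)
)

if __name__ == "__main__":
    cpus = parse_cpuinfo(SAMPLE)
    print(len(cpus), socket_count(cpus), physical_core_count(cpus))  # 4 1 2
    print(physical_core_count(cpus, allowed={0, 2}))  # 1: both threads of core 0
```

The same per-CPU information is also exposed under /sys/devices/system/cpu/cpuN/topology/ (for example `core_id` and `physical_package_id`), which may be simpler to read than /proc/cpuinfo.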

>>
>> Third, I dislike that this only really addresses linux-x86 and
>> leaves the other platforms to default to cores == processors. That just
>> causes unnecessary divergence in platform functionality.
>
> This is an interesting question with regard to OpenJDK contributions.
> Yes, Oracle should do its best to implement on all platforms, but
> Google is a Linux x86 shop and I personally don't expect them to implement
> and performance-test on all the supported platforms.   Should we be making
> that a requirement for OpenJDK contributions?

I can't say that we can make it a requirement for the original
contributor, but we should be advancing the platform, not individual ports.

>>
>> This is too late for JDK 8 and I think we will be doing more complete
>> work in this area during JDK 9 development.
>
> Agreed that it is too late for JDK 8.  I would think it would be suitable
> for an 8 update, however.  What is coming in JDK 9 that affects this?

Hopefully some NUMA-aware APIs :)

That aside, this has to go into 9 before it can be considered for a
backport to 8u.

David
-----

> Jon
>
>>
>> Thanks,
>> David
>> -----
>>
>>> Thanks.
>>>
>>>
>>> -------- Original Message --------
>>> Subject:     Better default for ParallelGCThreads and ConcGCThreads by
>>> using number of physical cores and CPU mask.
>>> Date:     Tue, 19 Nov 2013 15:35:22 -0800
>>> From:     Jungwoo Ha <jwha at google.com>
>>> To:     hotspot-gc-dev at openjdk.java.net
>>>
>>>
>>>
>>> Hi,
>>>
>>> I am sending this webrev for review.
>>> (Jon Masamitsu has uploaded it here on my behalf.)
>>> http://cr.openjdk.java.net/~jmasa/8028554/webrev.00/
>>>
>>> The feature is a new heuristic to calculate the default
>>> ParallelGCThreads and ConcGCThreads.
>>> On x86, hyper-threading is generally bad for GC because of cache
>>> contention, so using all the hyper-threaded logical cores slows down
>>> overall GC performance.
>>> Currently HotSpot uses the number of processors that Linux reports,
>>> which counts every hyper-thread as a separate processor.
>>> A second problem is that when a CPU mask is set, not all the cores are
>>> available to the GC.
>>>
>>> The patch improves the heuristic by determining the number of physical
>>> cores actually available, from the proc filesystem and the CPU mask, and
>>> using that as the basis for calculating ParallelGCThreads and ConcGCThreads.
>>>
>>> The improvement in GC pause time is significant. We evaluated on
>>> Nehalem, Westmere and Sandy Bridge as well as several AMD processors. We
>>> also evaluated various CPU mask configurations and single/dual socket
>>> configurations.
>>> In almost all cases, GC pause time improved by 10-50%.
>>>
>>> We primarily used the CMS collector for the evaluation, but also tested
>>> the other GCs.
>>> Please take a look and let me know if this patch can be accepted.
>>>
>>> Thanks,
>>> Jungwoo Ha
>>>
>>>
>>>
>
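For comparison, a minimal Python sketch of where such a count would plug in. `default_parallel_gc_threads` follows the existing HotSpot default as I understand it on x86 (every CPU up to 8, then 5/8 of the rest), and `default_conc_gc_threads` assumes a CMS-style default of roughly a quarter of the parallel workers, rounded up. Feeding in physical cores within the CPU mask instead of logical processors is one plausible reading of the proposal, not the webrev's actual code, and the machine sizes below are hypothetical.

```python
def default_parallel_gc_threads(ncpus: int, switch_pt: int = 8,
                                num: int = 5, den: int = 8) -> int:
    """Use every CPU up to switch_pt, then num/den of the remaining ones."""
    if ncpus <= switch_pt:
        return ncpus
    return switch_pt + (ncpus - switch_pt) * num // den


def default_conc_gc_threads(parallel_gc_threads: int) -> int:
    """Assumed CMS-style default: about a quarter of the workers, rounded up."""
    return (parallel_gc_threads + 3) // 4


if __name__ == "__main__":
    # Hypothetical 2-socket box: 8 physical cores, 16 hyper-threads in total.
    for label, n in [("logical processors", 16), ("physical cores", 8),
                     ("physical cores in a 4-core mask", 4)]:
        p = default_parallel_gc_threads(n)
        print(f"{label:32} ParallelGCThreads={p:2} ConcGCThreads={default_conc_gc_threads(p)}")
```

On that hypothetical machine the logical count yields 13 parallel and 4 concurrent workers, the physical-core count yields 8 and 2, and a mask covering 4 cores yields 4 and 1.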