Better default for ParallelGCThreads and ConcGCThreads by using number of physical cores and CPU mask.

Jungwoo Ha jwha at
Wed Nov 20 23:02:51 UTC 2013

We've tested it thoroughly with various kinds of workloads,
and here is a setup you can probably use to reproduce the results.

* Benchmark: DaCapo benchmarks
* Hardware : 2-socket Sandy Bridge machine
* JDK: JDK7u36 (not sure which exact build I used)
* Benchmark heap size: 2x the minimum heap size requirement of each benchmark
  (the minimum heap size is the point at which a benchmark starts to throw OOM)
* Measured execution time and GC pause time as reported by -XX:+PrintGCDetails
  on the 15th iteration
* Average of 30 runs per benchmark

* Results are normalized to the baseline (before the patch).
* Geomean: 0.58 (i.e., pause time is reduced by 42% on average)
* Full results: attached (open the html in your browser).
  * Tradesoap runs Full GCs all the time, so it doesn't get any benefit.
  * You can ignore avrora because it has very little GC activity.

@Vitaly: Nice catch!


On Tue, Nov 19, 2013 at 7:34 PM, Vitaly Davidovich <vitalyd at> wrote:

> I'd also be interested in a bit more detail.  As well, the 10-50% improvement
> is with respect to what absolute values?
> Small issue in the webrev (os_linux_x86.cpp):
> 973         os::set_core_count(os::active_processor_count());
> 974         return os::core_count();
> This bailout should fclose(fp) before returning.
> Sent from my phone
> On Nov 19, 2013 7:01 PM, "David Keenan" <dkeenan at> wrote:
>> Could you detail the workloads used to measure the reduction in CPU time?
>> Vagueness is ok. The magnitude of the improvement, albeit wonderful, is
>> surprising. I'm curious whether system load was a factor in the measurements.
>> On Tue, Nov 19, 2013 at 3:35 PM, Jungwoo Ha <jwha at> wrote:
>>> Hi,
>>> I am sending this webrev out for review.
>>> (I am uploading it here on behalf of Jon Masamitsu.)
>>> The feature is a new heuristic for calculating the default
>>> ParallelGCThreads and ConcGCThreads.
>>> On x86, hyperthreading is generally bad for GC because of cache
>>> contention, so using all the hyper-threaded cores slows down overall
>>> GC performance.
>>> Current HotSpot reads the number of processors that Linux reports,
>>> which treats all hyper-threaded cores equally.
>>> A second problem is that when a CPU mask is set, not all of the cores
>>> are available to the GC.
>>> The patch improves the heuristic by determining the actually available
>>> physical cores from the proc filesystem and the CPU mask, and uses
>>> that as the basis for calculating ParallelGCThreads and ConcGCThreads.
>>> The improvement in GC pause time is significant. We evaluated on
>>> Nehalem, Westmere, and Sandy Bridge, as well as several AMD processors.
>>> We also evaluated various CPU mask configurations and single/dual-socket
>>> configurations.
>>> In almost all cases, GC pause time improved by 10-50%.
>>> We primarily used the CMS collector for the evaluation, but we tested
>>> the other GCs as well.
>>> Please take a look and let me know if this patch can be accepted.
>>> Thanks,
>>> Jungwoo Ha
>> --
>> @dagskeenan
>> 1.617.755.8186
>> dkeenan at
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gcthreads_dacapo_sandybridge.html.gz
Type: application/x-gzip
Size: 1060 bytes
Desc: not available

More information about the hotspot-gc-dev mailing list