Better default for ParallelGCThreads and ConcGCThreads by using number of physical cores and CPU mask.

Bengt Rutisson bengt.rutisson at oracle.com
Thu Jan 16 13:35:52 UTC 2014


On 2014-01-15 22:29, Jungwoo Ha wrote:
> Sounds good to me.

Yes, I like ScaleGCThreadsByCores better too.

Bengt

>
>
> On Wed, Jan 15, 2014 at 10:52 AM, Jon Masamitsu 
> <jon.masamitsu at oracle.com> wrote:
>
>
>     On 1/15/2014 4:51 AM, Bengt Rutisson wrote:
>>
>>     On 2014-01-13 22:39, Jungwoo Ha wrote:
>>>
>>>
>>>         In CMSCollector there is still this code to change the value
>>>         for ConcGCThreads based on AdjustGCThreadsToCores.
>>>
>>>
>>>          639       if (AdjustGCThreadsToCores) {
>>>          640         FLAG_SET_DEFAULT(ConcGCThreads, ParallelGCThreads / 2);
>>>          641       } else {
>>>          642         FLAG_SET_DEFAULT(ConcGCThreads, (3 + ParallelGCThreads) / 4);
>>>          643       }
>>>
>>>         Do you think that is needed, or can we use the same logic in
>>>         both cases, given that ParallelGCThreads has a different
>>>         value if AdjustGCThreadsToCores is enabled?
>>>
>>>
>>>     I am happy to just use FLAG_SET_DEFAULT(ConcGCThreads,
>>>     ParallelGCThreads / 2);
>>>     The original hotspot code used FLAG_SET_DEFAULT(ConcGCThreads,
>>>     (3 + ParallelGCThreads) / 4); which I think is somewhat arbitrary.
>>>     Now that ParallelGCThreads will be reduced on some configurations,
>>>     dividing it by 4 seems to make ConcGCThreads too small.
>>
>>     Hm. Changing to FLAG_SET_DEFAULT(ConcGCThreads, ParallelGCThreads
>>     / 2) might be the way to go, but I think that should probably be
>>     done as a separate change. That way we can performance-test it
>>     more thoroughly.
>>
>>>
>>>
>>>         Also, I don't fully understand the name
>>>         AdjustGCThreadsToCores. In
>>>         VM_Version::calc_parallel_worker_threads() for x86 we simply
>>>         divide active_core_count by 2 if this flag is enabled. So, the
>>>         flag does not really adjust to the cores. It seems like it
>>>         reduces the number of GC threads. How about calling the
>>>         flag ReduceGCThreads or something like that?
>>>
>>>
>>>     The flag can be named better. However, ReduceGCThreads doesn't
>>>     seem to reflect what this flag does.
>>>     I am pretty bad at naming, so let me summarize what this flag is
>>>     actually doing.
>>>
>>>     The flag adjusts the GC threads to the number of "available"
>>>     physical cores reported by /proc filesystem and the CPU mask set
>>>     by sched_setaffinity.
>>>     For example, ParallelGCThreads will remain the same regardless
>>>     of whether hyperthreading is turned on/off.
>>>     Current hotspot code will have twice as many GC threads if
>>>     hyperthreading is on.
>>>     Usually, GC causes a huge number of cache misses, so having two
>>>     GC threads competing for the same physical core hurts GC
>>>     throughput.
>>>     Current hotspot code doesn't consider the CPU mask at all.
>>>     For example, even though the machine has 64 cores, if the CPU mask
>>>     is set to 2 cores, current hotspot calculates the number of GC
>>>     threads based on 64.
>>>     Thus, this flag actually sets the number of GC threads based on
>>>     the number of physical cores available to the JVM process.
>>
>>     Right. In VM_Version::calc_parallel_worker_threads() we take the
>>     value of os::active_core_count() and divide it by 2. I guess this
>>     is to reduce the cache issues. But if the flag is called
>>     AdjustGCThreadsToCores I would have expected that we set the
>>     number of GC threads to be equal to the core count. That's why I
>>     suggested "Reduce" in the name.
>>
>>     Naming is hard and I am not particularly fond of the name
>>     ReduceGCThreads either. But maybe we can try to come up with
>>     something else?
>
>     How about ScaleGCThreadsByCores?
>
>     Jon
>
>
>>
>>>
>>>         I think I pointed this out earlier, but I don't feel
>>>         comfortable reviewing the changes in os_linux_x86.cpp. I
>>>         hope someone from the Runtime team can review that.
>>>
>>>
>>>     Can you clarify what you meant? /proc & the CPU mask are specific to
>>>     Linux & x86, and I only tested on that platform.
>>>     The assumptions I used here are based on the x86 cache architecture.
>>
>>     What I was trying to say was that I don't know enough about Linux
>>     to be confident that your implementation of
>>     os::active_core_count() is the simplest and most stable way to
>>     retrieve that information. I'm sure it is good, I am just not the
>>     right person to review this piece of the code. That's why I think
>>     it would be good if someone from the Runtime team looked at this.
>>
>>     Thanks,
>>     Bengt
>>
>>
>>>
>>>     Jungwoo
>>>
>>
>
>



More information about the hotspot-gc-dev mailing list