Better default for ParallelGCThreads and ConcGCThreads by using number of physical cores and CPU mask.

Thu Nov 21 17:37:08 UTC 2013

On Thu, Nov 21, 2013 at 2:11 AM, Bengt Rutisson
<bengt.rutisson at oracle.com>wrote:

>
> Hi Jungwoo,
>
> Thanks for contributing this change. Nice work.
>
> First a couple general question.
>
> Why did you only make this change for CMS? Shouldn't the other collectors
> benefit from this as well? You said you tried the other GCs as well. What
> were the results for them?
>

The change will affect all collectors because it is setting
ParalellGCThreads and ConcGCThreads.
I mostly tested on CMS for convenience of evaluation and needs.

Do we need the flag AdjustGCThreadsToCores? It seems like this is a good
> change and it is always nice to reduce the number of flags we have. If you
> feel unsure about the change it may be good to be able to disable it, but
> in general I think it would be better to decide that this is the default
> behavior and go with it.
>

I have no strong opinion. If you guys can confirm the improvements and feel
like you don't need the flag, that would be ideal.

>
> os_linux_x86.cpp
>
> This is where the important part of your change is. I don't know enough
> about Linux internals to validate the changes you have made there. I will
> leave that out of my review and let others (Runtime team?) with more Linux
> knowledge review it.
>
>
> Some code comments:
>
> Since core_count() returns 0 for all platforms except linux_x86, couldn't
> the following code be moved to a platform independent place?
>
>  758 int os::active_core_count() {
>  759   if (os::core_count() <= 0) {
>  760     os::set_core_count(os::active_processor_count());
>  761   }
>  762   return os::core_count();
>  763 }
>
> That way this code would not have to be duplicated in all other platforms:
>
>  726 int os::active_core_count() {
>  727   return os::active_processor_count();
>  728 }
>
> We could just always use os::active_core_count() and rely on the value it
> returns.
>

Sounds good to me.

>
>
> VM_Version::calc_parallel_worker_threads()
>
>  862     unsigned int ncpus = (unsigned int) os::active_core_count();
>  863     return (ncpus <= 8) ? ncpus : ncpus / 2;
>
> Isn't it strange that if we have 8 cores we get 8 parallel threads, but if
> we have 12 cores we only get 6? Wouldn't it make more sense to treat this
> limit as a max value, so:
>
> return (ncpus <= 8) ? ncpus : MAX2(8, ncpus / 2);
>
>
I think that's better.

> concurrentMarkSweepGeneration.cpp
>
>  639       if (AdjustGCThreadsToCores) {
>  640         FLAG_SET_DEFAULT(ConcGCThreads, ParallelGCThreads / 2);
>  641       } else {
>  642         FLAG_SET_DEFAULT(ConcGCThreads, (3 + ParallelGCThreads) / 4);
>  643       }
>
> This looks a little odd to me. Why do we add 3? That does not seem to
> scale when we have a large number of parallel threads. Did you mean to
> multiply by 3?
>

(3 + ParallelGCThreads) / 4 is the upstream default, and I didn't change it.
ParallelGCThreads can be much smaller with the change, so I divided into 2
instead of 4.

Also, do we really need to use a different calculation for the
> ConcGCThreads? It would have been nice to be able to use the same logic for
> both cases. Did you measure any performance difference in the two cases?
>

Maybe not. Most of the pause time is on the young generation in CMS. I have
seen pause time reduction on old generation as well, but anyway that is
small.
As I mentioned earlier, ParallelGCThreads has changed when Hyperthreading
or CPU mask is turned on. Hence using the same heuristics can cause too
small ConcGCThreads.
I'd like to use something like "os::active_core_counts() / 4" and make it
independent of using ParallelGCThreads.

Jungwoo
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20131121/c4690c8e/attachment.htm>