Benchmark scenario with high G1 performance degradation

Thu Apr 27 08:29:01 UTC 2017

Hi Jens,

On 2017-04-25 05:25, Jens Wilke wrote:
> Hi Stefan,
>
> On Monday, 24 April 2017 21:16:20 ICT Stefan Johansson wrote:
>>>> I've tried with CMS, G1 and Parallel, both with 10g and 20g heap, but so
>>>> far I can't reproduce your problems. It would be great if you could
>>>> provide us with some more information. For example GC-logs and the
>>>> result files. We might be able to dig something out of them.
>>> The logs from the measurement on my notebook for the first mail (see
>>> below) are available at (only 30 days valid):
>>>
>>> http://ovh.to/FzKbgrb
>>>
>>> What environment are you testing on?
>> I only did some quick testing on my desktop, which has 12 cores and
>> hyper-threading, so the default is to use 18 parallel GC threads on my
>> system.
> The benchmarks I am conducting are using four workload threads on four CPU
> cores. The example I sent is with four workload threads, so in your
> environment you have enough spare cores for GC work and you don't see the
> performance difference to the CMS collector.
>
> The benchmark is designed to have a constrained core count and keep those
> cores maximally busy.
I see. Under those circumstances G1 will have a harder time keeping up
than the other collectors because of concurrent refinement. You might be
able to tune your way out of this, or at least improve the situation, but
I'm not sure that is what you're looking for.
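
To illustrate the difference between the two environments, here is a
minimal sketch of how the default number of parallel GC threads is
derived from the logical CPU count. It assumes the usual HotSpot
heuristic (all CPUs up to 8, then 5/8 of each additional CPU); the class
and method names are made up for illustration and are not HotSpot code.

```java
// Hypothetical helper, not part of the JDK: models the default
// ParallelGCThreads heuristic under the assumption stated above.
final class GcThreadHeuristic {

    // Up to 8 logical CPUs: one GC thread per CPU.
    // Above 8: 8 threads plus 5/8 of the CPUs beyond the first 8.
    static int defaultParallelGcThreads(int logicalCpus) {
        final int switchPoint = 8;
        if (logicalCpus <= switchPoint) {
            return logicalCpus;
        }
        return switchPoint + ((logicalCpus - switchPoint) * 5) / 8;
    }

    public static void main(String[] args) {
        // 12 cores with hyper-threading = 24 logical CPUs
        System.out.println("24 logical CPUs: " + defaultParallelGcThreads(24));
        // the constrained four-core benchmark setup
        System.out.println(" 4 logical CPUs: " + defaultParallelGcThreads(4));
        int here = Runtime.getRuntime().availableProcessors();
        System.out.println("this machine (" + here + "): "
                + defaultParallelGcThreads(here));
    }
}
```

On four cores this gives four GC threads, and as far as I know the
number of concurrent refinement threads defaults to the same value, so
they compete directly with the four workload threads. Tuning would
likely start with flags such as -XX:ParallelGCThreads and
-XX:G1ConcRefinementThreads, but the right values depend on the workload.
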
>> As I mentioned in my reply to your other mail, these calls are caused by
>> region to region pointers in G1. Adding those references can be done
>> either during a safepoint or concurrently. Looking at your profile it
>> seems that most calls come from the concurrent path, and since your
>> system has few cores, having the concurrent refinement threads do a
>> lot of work will have a larger impact on overall performance.
> Yes.
>
> I have the feeling that there is some kind of "tripping point" in the whole
> system, that causes the high "refinement" activity which would be interesting
> to understand.
>
> For the moment I will postpone digging into this deeper. It's "just" a benchmark
> scenario which triggers this effect. I believe that interactive applications
> that would make use of G1 and its low pause times don't have these large cache
> sizes.
I agree that this is not a benchmark or scenario where we expect G1 to
be the best choice. My impression is that this is a very
throughput-oriented benchmark, and especially when run in a constrained
environment it will be tough on G1. Still, as you said, it would be
interesting to understand at which point things start to go bad and to
work on improving that.
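
As a starting point for reasoning about when refinement work is
generated, here is a simplified sketch of the region-to-region check
mentioned above. It only models the idea that a reference store matters
to the remembered sets when the field and the object it points to lie in
different regions. The 4 MB region size and all names are assumptions
for illustration; the real write barrier does more than this.

```java
// Hypothetical model, not HotSpot code: decides whether a reference
// store crosses a region boundary and therefore may need refinement.
final class CrossRegionFilter {

    // Two addresses are in the same region when all bits above the
    // region-size bits are equal, i.e. their XOR shifted right by
    // log2(regionSize) is zero.
    static boolean crossesRegion(long fieldAddress, long targetAddress,
                                 int logRegionSize) {
        return ((fieldAddress ^ targetAddress) >>> logRegionSize) != 0;
    }

    public static void main(String[] args) {
        final int logRegionSize = 22;            // assumed 4 MB regions
        final long regionSize = 1L << logRegionSize;

        long field = 0x1000L;                    // field in region 0
        long sameRegionTarget = 0x2000L;         // object in region 0
        long otherRegionTarget = regionSize + 0x10L; // object in region 1

        System.out.println("same region:  "
                + crossesRegion(field, sameRegionTarget, logRegionSize));
        System.out.println("other region: "
                + crossesRegion(field, otherRegionTarget, logRegionSize));
    }
}
```

A large cache whose entries are replaced at random will tend to produce
many stores of the second kind, since old and new entries rarely share a
region.
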
> Using JMH to get some reliable benchmark results for scenarios with large
> heaps needs some more work, too. AFAIK I am the only one doing "not so micro"
> benchmarks with JMH.
>
> Thanks for looking into this!
Thanks again for sharing your findings. If you have more interesting
benchmarks or results to share, please do so.

Stefan

>
> Best,
>
> Jens
>