EpsilonGC and throughput.

Tue Dec 19 18:52:48 UTC 2017


On 12/19/2017 12:14 AM, Aleksey Shipilev wrote:
> I assume you have run SPECjvm2008.
Bingo. All of the benchmarks that were able to run with EpsilonGC for at 
least 30 seconds.
>
> Beware of what I call the Catch-22 of (GC) Performance Evaluation: "standard benchmarks" tend to be
> developed/tuned with existing GCs in mind.
You are partially right. Looking into some of the sources, I concluded 
that they were written with general Java style in mind, not tuned to 
particular GCs.

> For example, it would be hard to find the "standard
> benchmark" that exhibits large LDS, or otherwise experiences large GC pauses, or experiences GC
> problems in its steady state (ignoring transient hiccups in the warmups).
>
>
>> - EpsilonGC vs ParallelOld:
>>    -- only on 3 benchmarks overall throughput with Epsilon GC was higher than ParallelOld and speedup
>> was : 0.2%-0.6%
>>    -- on 6 benchmarks, ParallelOld (with barriers and pauses) was faster (faster means throughput!),
>> within 1%-10%.
>>
>> - EpsilonGC vs G1
>>    -- EpsilonGC has shown higher throughput on 4 benchmarks, within 2%-3%
>>    -  G1 was faster on 5 benchmarks, within 2%-10%.
> Oh! The throughput figures are actually pretty good for a non-compacting collector, and the performance
> improvements are in line with what is called out in the JEP as "Last-drop performance improvements" on
> special workloads.
For special cases, yes. I wrote about typical cases. And my message 
was: don't expect EpsilonGC to show you "ideal throughput" 
without GC overheads; sometimes the GC overhead is what buys higher 
performance.
>
> As noted above, it makes little sense to run Epsilon for throughput on "standard benchmarks" that do
> not suffer from GC issues. It is instructive, however, to run workloads that *do* suffer from them.
I have concerns here. I am afraid that if an application *does* suffer from 
GC issues, it will continue to suffer with EpsilonGC, just from different 
issues (OutOfMemory).
> For example, try this for a quick turn-around CLI workload that is supposed to do one thing very
> quickly:
>
> import java.util.ArrayList;
> import java.util.List;
>
> public class AL {
>      static List<Object> l;
>      public static void main(String... args) throws Throwable {
>          l = new ArrayList<>();
>          for (int c = 0; c < 100_000_000; c++) {
>              l.add(new Object());
>          }
>          System.out.println(l.hashCode());
>      }
> }
>
>
> $ time java -XX:+UseParallelGC AL
> -1907572722
>
> real	0m25.063s
> user	1m5.700s
> sys	0m1.084s
>
> $ time java -XX:+UseG1GC AL
> -1907572722
>
> real	0m14.908s
> user	0m33.264s
> sys	0m0.788s
>
> $ time java -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC AL
> -1907572722
>
> real	0m8.995s
> user	0m8.784s
> sys	0m0.260s
This doesn't look like a throughput benchmark; it measures startup. I am 
sorry, I should have been clearer in my previous email: I was writing 
about steady-state throughput.
Converting this into a throughput benchmark, I got:
G1: 12 seconds
ParallelOld: 24 seconds
EpsilonGC: 9.5 seconds
That is not such a huge difference, and EpsilonGC can't do more than a 
couple of iterations.
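For reference, here is a minimal sketch of how the AL loop might be turned 
into a repeated, timed measurement. This is an assumption about the shape 
of such a conversion, not the exact code behind the numbers above; the 
class name ALThroughput, the runIteration helper, and the command-line 
parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ALThroughput {
    // Static field keeps the current list reachable, as in the original AL example.
    static List<Object> l;

    // One iteration: the allocation loop from AL. Returns elapsed nanoseconds.
    static long runIteration(int count) {
        long start = System.nanoTime();
        l = new ArrayList<>();
        for (int c = 0; c < count; c++) {
            l.add(new Object());
        }
        return System.nanoTime() - start;
    }

    public static void main(String... args) {
        // Hypothetical parameters: number of iterations, objects per iteration.
        int iterations = args.length > 0 ? Integer.parseInt(args[0]) : 5;
        int count = args.length > 1 ? Integer.parseInt(args[1]) : 100_000_000;
        for (int i = 0; i < iterations; i++) {
            long nanos = runIteration(count);
            System.out.printf("iteration %d: %.2f s (%d objects)%n",
                    i, nanos / 1e9, l.size());
        }
    }
}
```

Each iteration drops the previous list, which a regular collector can 
reclaim. With EpsilonGC that garbage is never reclaimed, so the heap runs 
out after a few iterations unless it is sized for the whole run.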
> In workloads like these, having GC pauses does impact application throughput.
Nobody argued with this. I have just shown examples where GC pauses 
(with compaction) sometimes provide better overall throughput.
> When out-of-the-box GC
> performance is concerned, the difference is not even in single-digit percents. Of course, you can
> configure GC to avoid pauses in the timespan that is critical for you (e.g. setting -Xms8g -Xmx8g
> -Xmn7g for the workload above), and hope you got it right, but one of the points for Epsilon is not
> to guess about this, but actually have the guarantee GC never happens.
>
>
>> Compacting GCs have significant advantage over non-GC in terms of throughput (e.g.
>> https://shipilev.net/jvm-anatomy-park/11-moving-gc-locality/)
> True, and it is called out in JEP:
>
> "Locality considerations. Non-compacting GC implicitly means it maintains the object graph in its
> allocation order. This has impact on spatial locality, and regular applications may experience the
> throughput hit if allocations are random or generate lots of sparse garbage. While this may entail
> some throughput overhead, this is outside of GC control, and would affect most non-moving GCs.
> Locality-aware application coding would be required to mitigate this drawback, if locality proves to
> be a problem."
>
> Locality is something that users can control, especially when small contained applications are
> concerned, and/or (hopefully) Valhalla and other language features that help to flatten the memory.
Sure. I just have to note that such a specially tuned, locality-aware 
application could barely use the standard Java API, because the API's 
allocation behavior is out of the user's control.
Epsilon GC is not a silver bullet, and for *practical* usage it will 
require more effort than existing GCs to achieve benefits. I don't deny 
that such benefits exist.
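
To make the locality point concrete, here is a rough sketch that walks the 
same objects once in allocation order and once in shuffled order. It is an 
illustration only, not a rigorous benchmark (a harness such as JMH would be 
needed for trustworthy numbers); the LocalitySketch class, the Node type, 
and the default object count are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class LocalitySketch {
    static final class Node {
        final int value;
        Node(int value) { this.value = value; }
    }

    // Allocates nodes back-to-back, so list order matches allocation order.
    static List<Node> allocate(int count) {
        List<Node> nodes = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            nodes.add(new Node(i));
        }
        return nodes;
    }

    // Walks the list and sums the payloads; each step dereferences one Node.
    static long sum(List<Node> nodes) {
        long s = 0;
        for (Node n : nodes) {
            s += n.value;
        }
        return s;
    }

    public static void main(String... args) {
        int count = args.length > 0 ? Integer.parseInt(args[0]) : 10_000_000;
        List<Node> inOrder = allocate(count);
        // Same objects, but visited in an order unrelated to allocation order.
        List<Node> shuffled = new ArrayList<>(inOrder);
        Collections.shuffle(shuffled, new Random(42));
        for (int round = 0; round < 5; round++) {
            long t0 = System.nanoTime();
            long a = sum(inOrder);
            long t1 = System.nanoTime();
            long b = sum(shuffled);
            long t2 = System.nanoTime();
            System.out.printf("round %d: in-order %d ms, shuffled %d ms (sums %d/%d)%n",
                    round, (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, a, b);
        }
    }
}
```

A non-moving collector leaves the objects where they were allocated, so any 
difference between the two walks stays for the lifetime of the data. 
Objects allocated inside library code cannot be laid out this way by the 
application.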
>
> Thanks,
> -Aleksey
>

-- 
Best regards,
Sergey Kuksenko