RFC: Epsilon GC JEP

Tue Jul 18 13:20:04 UTC 2017

Hi Aleksey,

If I understand this correctly, the motivation for EpsilonGC is to be 
able to measure the overheads due to GC pauses and GC barriers and 
measure only the application throughput without GC jitter, and then use 
that as a baseline for measuring performance of an actual GC 
implementation compared to EpsilonGC.

However, automatic memory management is quite complicated when you 
think about it. Will EpsilonGC allocate all memory up-front, or expand 
the heap? In the case where it expands on demand until it runs out of 
memory, what consequences does that potential expansion have on 
throughput? In the case where it is allocated up-front, will pages be 
pre-touched? If so, which NUMA nodes will the pre-touched memory map 
to? Will mutators try to allocate NUMA-local memory? What consequences 
will the larger heap footprint have on throughput because of 
decreased memory locality, and as a result increased last-level cache 
misses and suddenly having to spread to more NUMA nodes? Does the larger 
footprint change the requirements on compressed oops, and what 
encoding/decoding of compressed oops is required? In the case of an 
expanding heap, can it even use compressed oops? In the case of a 
non-expanding heap allocated up-front, does comparing a GC that uses 
compressed oops with a baseline that inherently cannot use them make 
sense? Will the lack of compaction, and the resulting possibly worse 
locality of memory accesses, affect performance?
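To make the compressed-oops question concrete, here is a simplified model of how a compressed-oops mode can be picked from the heap's placement and size. It assumes the default 8-byte object alignment (a shift of 3) and that the mode is chosen from the reserved (maximum) heap range rather than the currently committed part; the class and method names are hypothetical, and this is a sketch, not HotSpot's actual code.

```java
/**
 * Simplified model of compressed-oops modes (hypothetical, not HotSpot code).
 * Assumptions: default 8-byte object alignment, and the mode is chosen from
 * the reserved heap range [heapBase, heapBase + reservedBytes).
 */
final class CompressedOopsModel {
    static final int SHIFT = 3;                                   // log2(8-byte alignment)
    static final long UNSCALED_LIMIT = 1L << 32;                  // 4 GB
    static final long ZERO_BASED_LIMIT = UNSCALED_LIMIT << SHIFT; // 32 GB

    enum Mode { UNSCALED, ZERO_BASED, HEAP_BASED, DISABLED }

    static Mode modeFor(long heapBase, long reservedBytes) {
        long heapEnd = heapBase + reservedBytes;
        if (heapEnd <= UNSCALED_LIMIT) return Mode.UNSCALED;           // raw 32-bit address
        if (heapEnd <= ZERO_BASED_LIMIT) return Mode.ZERO_BASED;       // shift only
        if (reservedBytes <= ZERO_BASED_LIMIT) return Mode.HEAP_BASED; // base + shift
        return Mode.DISABLED;                                           // heap too large
    }

    // Encode an aligned address as a 32-bit offset from base, scaled down by shift.
    static int encode(long address, long base, int shift) {
        return (int) ((address - base) >>> shift);
    }

    // Decode: treat the narrow value as unsigned, scale it up, add the base.
    static long decode(int narrow, long base, int shift) {
        return base + (Integer.toUnsignedLong(narrow) << shift);
    }

    public static void main(String[] args) {
        long gb = 1L << 30;
        System.out.println(modeFor(2 * gb, 1 * gb));   // UNSCALED
        System.out.println(modeFor(2 * gb, 16 * gb));  // ZERO_BASED
        System.out.println(modeFor(2 * gb, 31 * gb));  // HEAP_BASED
        System.out.println(modeFor(2 * gb, 64 * gb));  // DISABLED
    }
}
```

Under this model, the decode cost depends on the reserved heap range: a zero-based heap decodes with a shift only, while a heap-based one also needs the base added on every decode.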

I am not convinced that we can just remove GC-induced overheads from the 
picture and measure the application throughput without the GC by using 
an EpsilonGC as proposed. At least I do not think I would use it to draw 
conclusions about GC-induced throughput loss. It seems like an 
apples-to-oranges comparison to me. Or perhaps I have missed something?

Thanks,
/Erik

On 2017-07-18 13:23, Aleksey Shipilev wrote:
> Hi Erik,
>
> Thanks for looking into this!
>
> On 07/18/2017 12:09 PM, Erik Helin wrote:
>> first of all, thanks for trying this out and starting a discussion. Regarding
>> the JEP, I have a few questions/comments:
>> - the JEP specifies "last-drop performance improvements" as a
>>    motivation. However, I think you also know that taking a pause and
>>    compacting a heap that is mostly filled with garbage most likely
>>    results in higher throughput*. So are you thinking in terms of pauses
>>    here when you say performance?
> This cuts both ways: while it is true that moving GC improves locality [1], it
> is also true that the runtime overhead from barriers can be quite high [2, 3,
> 4]. So, "performance" in that section is tied to both throughput (no barriers)
> and pauses (no pauses).
>
> [1] https://shipilev.net/jvm-anatomy-park/11-moving-gc-locality
> [2] https://shipilev.net/jvm-anatomy-park/13-intergenerational-barriers
> [3] Also, remember the reason for UseCondCardMark
> [4] Also, remember the whole thing about G1 barriers
>
>> - why do you think Epsilon GC is a good baseline? IMHO, no barriers is
>>    not the perfect baseline, since it is just a theoretical exercise.
>>    Just cranking up the heap and using Serial is a more realistic
>>    baseline, but even using that as a baseline is questionable.
> It sometimes is. A non-generational GC is a good baseline for some workloads.
> Even Serial does not cut it, because even if you crank up old and trim down
> young, there is no way to disable the reference write barrier store that
> maintains the card table.
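
The card-table cost can be illustrated with a toy simulation of a card-marking post-write barrier, in both its unconditional form and the conditional form behind UseCondCardMark. This is a hedged sketch, not HotSpot code: it assumes 512-byte cards with dirty encoded as 0 and clean as 0xff, models heap addresses as plain offsets, and the class name CardTableSketch is hypothetical. Either way, every reference store pays at least an extra load or store on the card table, which is the per-store cost a collector without barriers avoids.

```java
import java.util.Arrays;

/**
 * Toy card-marking post-write barrier (hypothetical, not HotSpot code).
 * Assumptions: 512-byte cards, dirty == 0, clean == 0xff; heap addresses
 * are modeled as offsets from the heap start.
 */
final class CardTableSketch {
    static final int CARD_SHIFT = 9;          // 2^9 = 512-byte cards
    static final byte CLEAN = (byte) 0xff;
    static final byte DIRTY = 0;

    final byte[] cards;
    long cardStores;                          // stores actually issued to the card table

    CardTableSketch(int heapBytes) {
        cards = new byte[heapBytes >>> CARD_SHIFT];
        Arrays.fill(cards, CLEAN);
    }

    // Unconditional barrier: every reference store also stores to the card table.
    void postWriteUnconditional(long fieldOffset) {
        cards[(int) (fieldOffset >>> CARD_SHIFT)] = DIRTY;
        cardStores++;
    }

    // Conditional barrier (the UseCondCardMark idea): load first, store only if not yet dirty.
    void postWriteConditional(long fieldOffset) {
        int card = (int) (fieldOffset >>> CARD_SHIFT);
        if (cards[card] != DIRTY) {
            cards[card] = DIRTY;
            cardStores++;
        }
    }

    public static void main(String[] args) {
        CardTableSketch uncond = new CardTableSketch(1 << 20);
        CardTableSketch cond = new CardTableSketch(1 << 20);
        for (int i = 0; i < 1000; i++) {
            long field = 4096 + (i % 64) * 8L; // 64 fields, all inside one 512-byte card
            uncond.postWriteUnconditional(field);
            cond.postWriteConditional(field);
        }
        System.out.println(uncond.cardStores + " vs " + cond.cardStores); // 1000 vs 1
    }
}
```

The conditional variant was motivated by false sharing on card-table cache lines between threads; the sketch only counts stores and does not model caches or concurrency.
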
>
>> - the JEP specifies this as an experimental feature, meaning that you
>>    intend non-JVM developers to be able to run this. Have you considered
>>    the cost of supporting this option? You say "New jtreg tests under
>>    hotspot/gc/epsilon would be enough to assert correctness". For which
>>    platforms? How often should these tests be run, every night?
> I think for all platforms, somewhere in hs-tier3? IMO, the current test set in
> hotspot/gc/epsilon is fairly complete, and it takes less than a minute on my
> 4-core i7.
>
>> Whenever we want to do large changes, like updating logging, tracing, etc,
>> will we have to take Epsilon GC into account? Will there be serviceability
>> support for Epsilon GC, like jstat, MXBeans, perf counters etc?
> I tried to address the maintenance costs in the JEP. It is unlikely to cause
> trouble, since it mostly calls into the shared code. And the GC interface work
> would hopefully turn BarrierSet into a more shareable chunk of the interface,
> which makes the whole thing even more self-contained. There is some new code in
> MemoryPools that handles the minimal diagnostics. MXBeans still work, at least
> the ThreadMXBean that reports allocation pressure, although I'd need to add a
> test to assert that.
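
For reference, per-thread allocation pressure can be read through the HotSpot-specific com.sun.management.ThreadMXBean extension, roughly as below. This is a sketch, not the test mentioned above; the class name AllocationProbe is hypothetical, and the probe returns -1 when the extension is unavailable or allocated-memory tracking is disabled.

```java
import java.lang.management.ManagementFactory;

/** Hypothetical probe for bytes allocated by the current thread. */
final class AllocationProbe {
    static byte[] sink;  // keeps the test allocation reachable so it cannot be optimized away

    // Returns bytes allocated so far by the current thread, or -1 if unsupported.
    static long allocatedBytes() {
        java.lang.management.ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!(bean instanceof com.sun.management.ThreadMXBean)) return -1;
        com.sun.management.ThreadMXBean hs = (com.sun.management.ThreadMXBean) bean;
        if (!hs.isThreadAllocatedMemorySupported() || !hs.isThreadAllocatedMemoryEnabled()) return -1;
        return hs.getThreadAllocatedBytes(Thread.currentThread().getId());
    }

    // Allocates `bytes` bytes and reports the delta the MXBean observed (at least `bytes` if supported).
    static long measure(int bytes) {
        long before = allocatedBytes();
        sink = new byte[bytes];
        long after = allocatedBytes();
        return (before < 0 || after < 0) ? -1 : after - before;
    }

    public static void main(String[] args) {
        System.out.println("observed allocation: " + measure(1 << 20) + " bytes");
    }
}
```

The delta includes anything else the thread allocated between the two reads, so it is a lower-bound check rather than an exact count.
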
>
> To me, if the no-op GC requires much maintenance whenever something in the JVM
> changes, that points to the insanity of the GC interface. A no-op GC is a good
> canary in the coal mine for this. This is why one of the motivations is seeing
> what exactly a minimal GC should support to be functional.
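
As a rough illustration of how little a minimal collector has to do, here is a toy bump-pointer heap in which allocation either succeeds or fails and "collection" never reclaims anything. The NoOpHeap class and its methods are hypothetical; this is not Epsilon's code or HotSpot's heap interface, and a real VM would throw OutOfMemoryError where the sketch returns -1.

```java
/**
 * Toy bump-pointer heap with a no-op "collector" (hypothetical, not Epsilon's code).
 * Addresses are simulated as offsets from the start of the heap.
 */
final class NoOpHeap {
    private final long capacity;   // bytes available in the heap
    private final long alignment;  // object alignment, must be a power of two
    private long top;              // offset of the next free byte

    NoOpHeap(long capacity, long alignment) {
        if (Long.bitCount(alignment) != 1) throw new IllegalArgumentException("alignment must be a power of two");
        this.capacity = capacity;
        this.alignment = alignment;
    }

    // Bump-pointer allocation: round up, check the remaining space, advance top.
    // Returns the new object's offset, or -1 where a real VM would throw OutOfMemoryError.
    long allocate(long size) {
        long aligned = (size + alignment - 1) & -alignment;
        if (aligned > capacity - top) return -1;
        long obj = top;
        top += aligned;
        return obj;
    }

    // The "collector": nothing is ever reclaimed, so used() never goes down.
    void collect() { }

    long used() { return top; }

    public static void main(String[] args) {
        NoOpHeap heap = new NoOpHeap(64, 8);
        System.out.println(heap.allocate(10));  // 0 (rounded up to 16 bytes)
        System.out.println(heap.allocate(16));  // 16
        System.out.println(heap.allocate(40));  // -1: only 32 bytes left
        heap.collect();
        System.out.println(heap.used());        // 32: collect() freed nothing
    }
}
```

Everything a real VM adds on top of this (TLABs, heap sizing, diagnostics, serviceability hooks) is exactly the surface the minimal-GC experiment is meant to expose.
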
>
>
>> - You quote "The experience, however, tells that many players in the
>>    Java ecosystem already did this exercise with expunging GC from their
>>    custom-built JVMs". So it seems that those users that want something
>>    like Epsilon GC are fine with building OpenJDK themselves? Having
>>    -XX:+UseEpsilonGC as a developer flag is much different compared to
>>    exposing it (and supporting, even if in experimental mode) to users.
> There is a fair share of survivorship bias: we know about people who succeeded,
> but do we know how many failed or gave up? I think developers who do day-to-day
> Hotspot development grossly underestimate the effort required to even build a
> custom JVM. Most power users I know have gone through this exercise with great
> pain. I used to sing the same song to them: just build OpenJDK yourself, but then
> pesky details pour in. Like: oh, Windows, oh, Cygwin, oh MacOS, oh XCode, oh
> FreeType, oh new compilers that build OpenJDK with warnings while the build
> treats warnings as errors, oh actual API mismatches against msvcrt, glibc,
> whatever, etc. etc. etc. As much as the OpenJDK build has improved over the
> years, I am not audacious enough to claim it will ever be a completely smooth
> experience :) Now I just willingly hand them binary builds.
>
> So I think having the experimental feature available in the actual product build
> extends the feature exposure. For example, suppose you are an academic writing
> a paper on GC: would you accept a custom-built JVM in your results, or would you
> rather pick up the "gold" binary build from a standard distribution and run with it?
>
>
>> I guess most of my questions can be summarized as: this seems like it could
>> perhaps be a useful tool for JVM GC developers, so why do you want to expose the flag
>> to non-JVM developers (given all the work/support/maintenance that comes with
>> that)?
> My initial thought was that the discussion about the costs should involve
> discussing the actual code. This is why there is a complete implementation in
> the Sandbox, and also the webrev posted.
>
> In the months following my initial (crazy) experiments, I had multiple people
> coming to me and asking when Epsilon is going to be in JDK, because they want to
> use it. And those were the ultra-power-users who actually know what they are
> doing with their garbage-free applications.
>
> So the short answer to why Epsilon is good to have in the product is that the
> cost seems low, the benefits are present, and so the cost/benefit ratio is low.
>
>
>> It is _great_ that you are experimenting and trying out new ideas in the VM,
>> please continue doing that! Please don't interpret my questions/comments as
>> too grumpy, this is just my experience from maintaining 5-6 different GC
>> algorithms for more than five years that is speaking. There is _always_ a
>> maintenance cost :)
> Yeah, I know how that feels. Look at the actual Epsilon changes, do they look
> scary to you, given your experience maintaining the related code?
>
> Thanks,
> -Aleksey
>