RFC: Epsilon GC JEP

Tue Jul 18 15:22:50 UTC 2017

On 07/18/2017 03:26 PM, Aleksey Shipilev wrote:
> On 07/18/2017 02:37 PM, Erik Helin wrote:
>>> [1] https://shipilev.net/jvm-anatomy-park/11-moving-gc-locality
>>> [2] https://shipilev.net/jvm-anatomy-park/13-intergenerational-barriers
>>> [3] Also, remember the reason for UseCondCardMark
>>> [4] Also, remember the whole thing about G1 barriers
>>
>> Absolutely, barriers can come with an overhead. But a barrier that consists of
>> dirtying a card does not come with a very high overhead. In fact, it comes with
>> a very low overhead :)
>
> Mhm! "Low" is in the eye of the beholder. You can't beat zero overhead. And there
> are people who literally count instructions on their hot paths, while still
> developing in Java.
>
> Let me ask you a trick question: how do you *know* the card mark overhead is
> small, if you don't have a no-barrier GC to compare against?

There is no need for trick questions. Aleksey, we are working towards 
the same goal: making OpenJDK's GCs better. That doesn't mean we can't 
have different opinions on a few topics.

You of course know the cost of a GC barrier by measuring it. You measure it 
by constructing a build where you do not emit the barriers and comparing 
it to a build where you do. Again, I have already said that I can see 
your work being useful for other JVM developers.
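
For readers less familiar with what "dirtying a card" means, here is a 
toy Java model of the three store flavors being compared: no barrier, 
an unconditional card mark, and a conditional card mark (the idea 
behind -XX:+UseCondCardMark). This is not HotSpot code. The 512-byte 
card size matches HotSpot's default, but the class, method names, slot 
array and CLEAN/DIRTY encoding are hypothetical and exist only for 
illustration.

```java
// Toy model of a card-marking post-write barrier, for illustration only.
// Not HotSpot code: the names, slot array and CLEAN/DIRTY encoding are hypothetical.
final class CardTableSketch {
    static final int CARD_SHIFT = 9;   // 2^9 = 512-byte cards (HotSpot's default card size)
    static final byte CLEAN = 0;       // hypothetical encoding; new byte[] starts all CLEAN
    static final byte DIRTY = 1;

    final long[] slots;   // the "heap": one 8-byte reference slot per field
    final byte[] cards;   // one card-table byte per 512 bytes of heap
    int cardWrites = 0;   // how many times a barrier wrote to the card table

    CardTableSketch(int heapBytes) {
        slots = new long[heapBytes / 8];
        cards = new byte[((heapBytes - 1) >>> CARD_SHIFT) + 1];
    }

    // No-barrier store: all a collector without barriers (like Epsilon) needs.
    void storeNoBarrier(int fieldOffset, long ref) {
        slots[fieldOffset / 8] = ref;
    }

    // Unconditional card mark: store, then always dirty the card covering the field.
    void storeWithCardMark(int fieldOffset, long ref) {
        slots[fieldOffset / 8] = ref;
        cards[fieldOffset >>> CARD_SHIFT] = DIRTY;
        cardWrites++;
    }

    // Conditional card mark: load the card first and only write it if it is still clean,
    // trading an extra load for fewer writes to (possibly shared) card-table cache lines.
    void storeWithCondCardMark(int fieldOffset, long ref) {
        slots[fieldOffset / 8] = ref;
        int card = fieldOffset >>> CARD_SHIFT;
        if (cards[card] == CLEAN) {
            cards[card] = DIRTY;
            cardWrites++;
        }
    }

    boolean isDirty(int fieldOffset) {
        return cards[fieldOffset >>> CARD_SHIFT] == DIRTY;
    }

    int dirtyCards() {
        int n = 0;
        for (byte b : cards) {
            if (b == DIRTY) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        int[] offsets = {0, 8, 16, 600};   // three fields on card 0, one on card 1

        CardTableSketch plain = new CardTableSketch(4096);
        CardTableSketch cond = new CardTableSketch(4096);
        for (int off : offsets) {
            plain.storeWithCardMark(off, 42L);
            cond.storeWithCondCardMark(off, 42L);
        }
        plain.storeNoBarrier(2048, 7L);    // no barrier: card 4 stays clean

        System.out.println("dirty cards: " + plain.dirtyCards() + " / " + cond.dirtyCards());
        System.out.println("card writes: " + plain.cardWrites + " vs " + cond.cardWrites);
        System.out.println("card for offset 2048 dirty? " + plain.isDirty(2048));
    }
}
```

Counting card-table writes only shows what each barrier does, not what 
it costs; the cost still has to be measured on a real JVM by comparing 
builds with and without the barriers, as described above.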

>>>> - why do you think Epsilon GC is a good baseline? IMHO, no barriers is
>>>>   not the perfect baseline, since it is just a theoretical exercise.
>>>>   Just cranking up the heap and using Serial is more realistic
>>>>   baseline, but even using that as a baseline is questionable.
>>>
>>> It sometimes is. Non-generational GC is a good baseline for some workloads. Even
>>> Serial does not cut it, because even if you crank up old and trim down young,
>>> there is no way to disable the reference write barrier store that maintains the card
>>> table.
>>
>> I will still point out, though, that a GC without a barrier is just a
>> theoretical baseline. One could imagine a single-gen mark-compact GC for OpenJDK
>> (that would require no barriers), but AFAIK almost all users prefer the slight
>> overhead of dirtying a card (and in return get a generational GC) for the use
>> cases where a single-gen mark-compact algorithm would be applicable.
>
> Mark-compact, maybe. But single-gen mark-sweep algorithms are plentiful, see e.g. the
> Go runtime. I have a hard time seeing how that is theoretical.

That is not what I said. As I wrote above:

 > but AFAIK almost all users prefer the slight
 > overhead of dirtying a card (and in return get a generational GC) for
 > the use cases where a single-gen mark-compact algorithm would be
 > applicable.

There are of course use cases for single-gen mark-sweep algorithms, and 
as I wrote above, for single-gen mark-compact algorithms as well. But 
for Java and OpenJDK, it is at least my understanding that most users 
prefer a generational algorithm like Serial over a single-gen 
mark-compact algorithm (at least I have not seen a lot of users asking 
for that). But maybe I'm missing something here?

This is why I wrote, and still think, that a GC without a barrier for 
Java seems more like a theoretical baseline. There are of course 
single-generational GC algorithms that use a barrier, and it would be 
very interesting to see those implemented in OpenJDK (including the 
great work that you and others are doing with Shenandoah).

>> However, again, this might be useful for someone who wants to try to make some
>> changes to the JVM GC code. But that, to me, is not enough to expose it to
>> non-JVM developers. It could be useful to have in the source code though, maybe
>> like a --with-jvm-feature kind of thing?
>
> That would go against the maintainability argument, no? Because you will still
> have to maintain the code, *and* it will require building a special JVM flavor.
> So it is a lose-lose: users don't get it, and maintainers don't get simpler lives.

No, I don't view it that way. Having the code in the upstream repository 
and having it exposed in binary builds are two very different things to 
me, and they come with very different requirements in terms of maintenance. 
If the code is in the upstream repository, then it is a tool for 
developers working in OpenJDK and for integrators building OpenJDK. We 
have a much easier time changing such code compared to code that users 
have come to rely on (and expect certain behavior from).

>> [snip] Such users will still be able to get binary builds if someone is willing to
>> produce them with Epsilon GC. There are plenty of OpenJDK binary builds
>> available from various organizations/companies.
>
> Well, yes. I actually happen to know the company which can distribute this in
> the downstream OpenJDK builds, and reap the loyalty of ultra-power-users. But I
> maintain that having the code upstream is beneficial, even if that company is
> going to do maintenance work either way.
>
>
>>> So the short answer about why Epsilon is good to have in product is because the
>>> cost seems low, the benefits are present, and so cost/benefit is still low.
>>
>> And it is here that our opinions differ :) For you the maintenance cost is low,
>> whereas for me, having yet another command-line flag, yet another code path,
>> gets in the way. You have to respect that we have different backgrounds and
>> experiences here.
>
> I am not trying to challenge your background or experience here, I am
> challenging the cost estimates though. Because, taken ad absurdum, we could shoot
> down any feature change coming into the JVM just because it introduces yet another flag,
> yet another code path, etc.

Do you see me doing that? I at least hope I am welcoming to everyone 
that wants to contribute a patch to OpenJDK, big or small (please let me 
know otherwise).

> I cannot see where the Epsilon maintenance would be a burden: it comes with
> automated tests that run fast, its implementation seems trivial, its exposure
> to VM code seems trivial too (apart from the BarrierSet thing that would be
> trimmed down with GC interface work).

In my experience, there is always maintenance work (documentation, 
support, a larger testing matrix, etc.) involved in supporting a new kind 
of collector. You and I just arrive at different cost/benefit conclusions 
about exposing this behavior to non-JVM developers.

>>> Yeah, I know how that feels. Look at the actual Epsilon changes, do they look
>>> scary to you, given your experience maintaining the related code?
>>
>> I don't like taking the role of the grumpy open source maintainer :) No, the
>> code is not scary, code is rarely scary IMO, it is just code. Running tests,
>> making sure that a test with -Xmx1g isn't run on an RPi, having additional code
>> paths, more cases to take into consideration when refactoring, is burdensome. And
>> to me, the benefits of benchmarking against Epsilon vs benchmarking against
>> Serial/Parallel aren't that high.
>>
>> But, I can understand that it is useful when trying to evaluate for example the
>> cost of stores into a HashMap. Which is why I'm not against the code, but I'm
>> not keen on exposing this to non-JVM developers.
>
> I hear you, but the thing is, Epsilon does not seem to be a mere coding exercise
> anymore. Epsilon is useful for GC performance work, especially when readily
> available, and there are willing users to adopt it. Just as we respect maintainers'
> burden in the product, we also have to see what benefits users, especially the
> ones who are championing our project's performance, even by cutting corners with
> e.g. no-op GCs.

Yes, you always have to weigh the benefits against the costs, and in 
this case, at least for now and to me, the benefits of exposing Epsilon 
GC to non-JVM developers do not seem to outweigh the costs. Who knows, 
maybe this will change and we will redo the cost/benefit analysis? It is 
very easy to go from a developer flag to an experimental flag, while it 
is way, way harder to go from an experimental flag to a developer flag.

Thanks,
Erik

> Thanks,
> -Aleksey
>