Epsilon GC JEP updates (was: Re: EpsilonGC and throughput.)

Mon Mar 12 13:12:37 UTC 2018

Hey Thomas,

Updated the JEP here:
 https://bugs.openjdk.java.net/browse/JDK-8174901

On 01/08/2018 05:45 PM, Thomas Schatzl wrote:
> I apologize for my somewhat inappropriate words, this has been due to
> some frustration; also for the long delay that were due to the winter
> holidays.

It took me quite some time to recover from this, and I am still slightly bitter. If we want
contributions to OpenJDK, then we all have to understand this kind of thing really dissuades people
from contributing. I have got quite a few personal "raised eyebrows" replies on this, with one person
calling the whole thing outright hostile.

I can imagine much less thick-skinned contributors just walking away. I know the intent wasn't that,
but what matters here is not only the intent, but the appearance too (I fell into the same trap
numerous times before, and learned to keep it together -- call this thread karmic justice).

<burying the hatchet here>

> It also helps if the JEP is written in a way to make it interesting for
> the community to read it, and respond. The less thinking a reader has
> to do to answer whether he is impacted or not, and whether and by how
> much it would simplify the life of himself or in general Java users,
> the more people will feel urged to get this in (or at least not
> deterred).

Come to think of it, the public discussion this JEP is getting is positive, and there is no need
to urge more people to get it in at this point. Lots of people read it, and surprisingly many
of them tried the prototype, some for fun, but some for performance work too.

> "Motivation
> ----------

> JEP text: "Java implementations are well known for a broad choice of
> highly configurable GC implementations."
> 
> Potential answer to "Why should this work be done?". Or does the
> sentence indicate we need another GC because we already have so many,
> and another does not hurt? I am asking this in full seriousness, I
> really do not know. Or is this only an introductory sentence without
> meaning?

This statement underlines that there is no single all-purpose GC in OpenJDK.

"The variety of available collectors caters for different needs in the end, even if their
configurability makes their functionality intersect. It is sometimes easier to maintain a separate
implementation, rather than piling another configuration option onto an existing GC implementation."

> Let's go into these benefits in more detail:
> 
> JEP text: "Performance testing. ..."
> 
> Benefit. Maybe it would be useful to list a few of these performance
> artifacts here ("... , e.g. barrier code, concurrent threads"). 

I added some. We (in Shenandoah development land), and others (in Shenandoah/ZGC/Zing comparison
land) have used Epsilon as the ultimate latency baseline. New text captures that bit:

"Having a GC that does almost nothing is a useful tool to do differential performance analysis for
other, real GCs. Having a no-op GC can help to filter out GC-induced performance artifacts, like GC
worker scheduling, GC barrier costs, GC cycles triggered at unfortunate times, locality changes,
etc. Moreover, there are latency artifacts that are not GC-induced (e.g. scheduling hiccups,
compiler transition hiccups, etc), and removing the GC-induced artifacts helps to contrast those. For
example, having the no-op GC allows us to estimate the natural "background" latency baseline for
low-latency GC work."

> An alternative could be a developer just nop'ing out the relevant GC
> interface section. That is somewhat cumbersome, but for how many users
> is this a problem? Spell that out in the appropriate Alternatives
> section.

Spelled: "Developers might just no-op out an existing GC implementation to get a baseline
implementation for testing. The problem with this is inconvenience: the developers would need to
make sure such an implementation is still correct, that it provides enough performance to be a good
baseline, and that it is hooked up to the other runtime facilities (heap dumping, thread stack
walking, MXBeans) to support the differential analysis. The implementations for other platforms would
require much more work. Having a ready-to-go no-op implementation in the mainline solves this
inconvenience."

> JEP text: "Functional testing. For Java code testing, a way to
> establish a threshold for allocated memory is useful to assert memory
> pressure invariants. Today, we have to pick up the allocation data from
> MXBeans, or even resort to parsing GC logs. Having a GC that accepts
> only the bounded number of allocations, and fails on heap exhaustion,
> simplifies testing."
> 
> Benefit. For regression testing, in how many cases do you think it is
> sufficient (or in what circumstances) to get a fail/no-fail answer
> only?
> This seems to pass work on a failure to the dev, them needing to write
> another test that also prints and monitors the memory usage increases
> over time anyway.
> How much work, given that you already need to monitor memory usage is
> the test to fail when heap usage goes above a threshold then?

I don't quite believe debugging a test like this would involve tracking the memory allocated so
far, because that is not readily actionable. Even if it does, Epsilon prints messages when n% of
the heap has been allocated, which actually improves the developer's experience, because as a dev I
don't need to copy-paste MXBeans blocks anymore.
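
For context, the kind of MXBeans block meant here looks roughly like the sketch below. It is only
an illustration, not code from the JEP or the prototype: the class name AllocationProbe and the
helper allocatedBy are made up for this example, and it assumes a HotSpot JVM where the platform
ThreadMXBean can be cast to com.sun.management.ThreadMXBean.

```java
import java.lang.management.ManagementFactory;

// Hypothetical helper: measures how many bytes the current thread allocates
// while running a task, using the HotSpot-specific ThreadMXBean extension.
class AllocationProbe {

    static long allocatedBy(Runnable task) {
        // Assumption: running on HotSpot, where this cast succeeds.
        com.sun.management.ThreadMXBean bean =
            (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        long id = Thread.currentThread().getId();
        long before = bean.getThreadAllocatedBytes(id);
        task.run();
        return bean.getThreadAllocatedBytes(id) - before;
    }

    public static void main(String[] args) {
        long bytes = allocatedBy(() -> {
            // Allocate 64 arrays of 1 MiB each and keep them reachable.
            byte[][] sink = new byte[64][];
            for (int i = 0; i < sink.length; i++) {
                sink[i] = new byte[1024 * 1024];
            }
        });
        System.out.println("allocated bytes: " + bytes);
    }
}
```

A test would then have to compare that number against its own threshold, which is the boilerplate
a bounded no-op heap makes unnecessary.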

But what is more actionable is the actual heap dump. And this is where Epsilon comes in handy: you just
set -Xmx1g -XX:+HeapDumpOnOutOfMemoryError, and run the test. If the test fails, you get a ready
heap dump that tells you what the test had ever allocated to blow the allocation limit, down to every
single object.

I added the example: "For example, knowing that the test should allocate no more than 1 GB of memory, we
can configure the no-op GC with -Xmx1g, and let it crash with a heap dump if that constraint is violated."
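
A minimal sketch of how such a run could look. The class AllocationBudgetDemo and its churn method
are hypothetical, invented for this illustration. The Epsilon flag names in the comment are the
ones from the JDK 11 integration (JEP 318); the prototype discussed in this thread may have used
different ones.

```java
// Hypothetical test workload: allocates a given number of MiB as short-lived garbage.
//
// Possible invocation (flag names as integrated in JDK 11, an assumption for the prototype):
//   javac AllocationBudgetDemo.java
//   java -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC \
//        -Xmx1g -XX:+HeapDumpOnOutOfMemoryError AllocationBudgetDemo 2048
//
// With a no-op GC nothing is reclaimed, so requesting more than the heap size
// is expected to end in OutOfMemoryError and a heap dump.
class AllocationBudgetDemo {

    // Allocates `megabytes` arrays of 1 MiB each and returns the total bytes requested.
    static long churn(int megabytes) {
        long total = 0;
        for (int i = 0; i < megabytes; i++) {
            byte[] chunk = new byte[1024 * 1024];
            chunk[0] = 1; // touch the array so the allocation is actually used
            total += chunk.length;
        }
        return total;
    }

    public static void main(String[] args) {
        int mb = args.length > 0 ? Integer.parseInt(args[0]) : 64;
        System.out.println("requested bytes: " + churn(mb));
    }
}
```

Under a regular collector the same program simply completes, because the garbage is reclaimed
along the way; the pass/fail signal comes only from the bounded, never-collected heap.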

> "VM interface testing. For VM development purposes, having a simple GC
> helps to understand the absolute minimum required from the VM-GC
> interface to have a functional allocator. This serves as proof that the
> VM-GC interface is sane, which is important in lieu of JEP 304
> ("Garbage Collector Interface")."
> 
> Benefit. Who are the (main) benefactors for that - probably developers?
> For a developer, how much is that benefit if there are already 5 or 6
> implementations of that interface?

The benefit is simple: for a no-op GC, the implemented GC interface should be in the epsilon-neighborhood
of zero. It is not there right now, by the way, in both native and codegen parts:
  https://builds.shipilev.net/patch-openjdk-epsilon-jdk/

I added: "For a no-op GC, the interface should not have anything implemented, and a good interface means
Epsilon's BarrierSet would just use no-op barrier implementations from the default implementation".

> "Last-drop performance improvements. For ultra-latency-sensitive
> applications, where developers are conscious about memory allocations
> and know the application memory footprint exactly, or even have
> (almost) completely garbage-free applications. In those applications,
> GC cycles may be considered an implementation bug that wastes CPU
> cycles for no good reason."

I split this section in three: "Extremely short lived jobs", "Last-drop latency improvements", and
"Last-drop throughput improvements", because it seems more logical that way.

> It may be useful to investigate the problem of these power users in
> more detail, and see if we could provide a (more?) complete solution
> for them.

The problem with this is, those power users consider whatever tricks they used to tame the
misbehaving GC to be their competitive advantage (think HFT), and are not really inclined to share. (Some
did not manage, and they moved out of Java, to our disadvantage). The bits and pieces I got are
"give us the no-op GC, and we shall figure out how to manage our memory ourselves, thank you very
much". And in many cases, from the little glimpses that were shared behind the curtains, those guys
seem to really know what they are doing. All I'm saying here is that we have no alternative but to take
it on faith, which I am willing to do.

> "Extremely short lived jobs are one example of this."
> 
> I do not understand the use of Epsilon in such use case. The
> alternative I can see would be to restart the VM after every short
> lived job (something for the Alternatives section). That seems strange
> to me, depending on the definition of a "short lived job", particularly
> if nothing survives after execution of that short lived job, a GC will
> be extremely fast.

This relies on the presupposition that GC is fast, because the heap is a graveyard. It is not always the case.

I have demonstrated one example:
  http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2017-December/021042.html

Added: "A short-lived job might rely on exiting quickly to free the resources (e.g. heap memory). In
this case, accepting the GC cycle to futilely clean up the heap is a waste of time, because the heap
would be freed on exit anyway. Note that the GC cycle might take a while, because it would depend on
the amount of live data in the heap, which can be a lot."

> "There are also cases when restarting the JVM -- letting load balancers
> figure out failover -- is sometimes a better recovery strategy than
> accepting a GC cycle."
> 
> I really can't find a good example where a GC, particularly in the
> situation that has been described so far, also for these short-lived
> jobs, where a GC (on an almost empty heap) is not at least as fast as a
> restart.

Again, this relies on the presupposition that GC is fast, because the heap is a graveyard. It is, again,
not always the case.

In the real-world systems I know of, the latency of a node restart does not matter as much: it is more
important to reliably fail the node to let the balancer act. In other words, the "global"
detect-and-evade latency is more important than the "local" restart latency.

Accepting the GC cycle makes the availability logic harder: you now have to disambiguate between the
normal 100ms execution in the business logic, and the first 100ms of a multi-second GC pause. That
probably means the timeout has to be several sigma larger than the usual business logic wait time, which
prolongs the recovery. Instead, you might just crash, and let the balancer figure out where to restart
the processing right away.

I understand this goes against our own intuition about how systems should be built. Of course, we want
our GCs to never push users to come up with these contraptions, but the sad reality is that they are
doing that, because the world and the GC implementations in this world are not perfect.

And you would not find that in our spectrum of well-behaved workloads. Talking to people who
maintain large real-world systems can be sobering for understanding what they have to deal with. For
example, one customer asked me to come up with this contraption for their high-availability
in-memory grid -- they are ready for the JVM to crash, and in fact they would like it to crash instead
of stalling!
  https://bugs.openjdk.java.net/browse/JDK-8181143

> It would make for a very good paragraph explaining this use case in the
> alternatives section.
> 
> Another problem with these two sentences to me is (and I am by no means
> a "FaaS power user") that I believe that waiting for the VM to
> crash/shut down to steer the load balancers is not a good strategy.
> Maybe you can give some more information about this use case?

The JEP does not advocate for using this strategy. It just reports what users are doing in the wild. So,
discussion around this seems to go outside the scope of the JEP.

> In the earlier email I only directly asked for performance numbers
> because in order to streamline this discussion, and given that you are
> a well-known performance and benchmark guru (afaik you were "R"eviewer
> long before me joining) it seemed a logical request. If you can't find
> numbers, there is also the reference ("Barriers, Friendlier Still" or
> so from Blackburn et al I think) I got that is also mentioned iirc in
> the very good Jones GC book.
> "Real" newbies I would just ask to perform this test.

See here:
 https://shipilev.net/jvm-anatomy-park/13-intergenerational-barriers/

I considered it bad taste to link to my blog from the JEP, but I can do this anyway.

> "Alternatives
> ------------
> Again, try to make these alternative review balanced, and in context of
> the users the benefit is for.

I have rewritten that part with most things we have discussed, and tried to discuss why those
alternatives are not exactly better. I might still be missing some salient alternatives, because I
ran out of steam...

Thanks,
-Aleksey
