EpsilonGC and throughput.

Thu Jan 11 01:04:36 UTC 2018

Thomas,

Thank you for bringing up these questions and comments. While I think it would be great to get some additional data and use case information for this feature added to the JEP, the isolated nature of the feature, along with the fact that it is experimental, means that the impact of adding it is relatively small. With that in mind, I suggest that we move forward with this JEP/feature, and that more information can be added if/when it’s available. In line with that, I will be endorsing the JEP shortly.

Cheers,
Mikael

> On Jan 8, 2018, at 8:45 AM, Thomas Schatzl <thomas.schatzl at oracle.com> wrote:
> 
> Hi Aleksey,
> 
>  I apologize for my somewhat inappropriate words, which were due to
> some frustration, and also for the long delay, which was due to the
> winter holidays.
> 
> Let's try to start all over with this... I will try to be constructive
> this time. Feel free to remind me if needed.
> 
> One purpose of the JEP is to share a problem and propose an idea (often
> already accompanied by a solution) to solve it. The problem and the
> idea are then discussed by the community, which eventually refines
> them along the way.
> 
> The community then evaluates that idea based on its contents, of course
> starting with the people trying to determine whether there is a
> problem, what the problem is, and whether the proposed idea will fix
> the problem.
> 
> For this evaluation to happen, the JEP needs to clearly state the
> problem, its seriousness, and the proposed idea.
> 
> It also helps if the JEP is written in a way that makes it interesting
> for the community to read and respond to. The less thinking a reader
> has to do to answer whether he is impacted or not, and whether and by
> how much it would simplify his own life or that of Java users in
> general, the more people will feel urged to get this in (or at least
> not be deterred).
> 
> Finally, I assume you do understand that, in general, although there is
> always a certain level of duplication in the VM, if a change only
> solves the problems that existing code already solves, or solves
> problems almost nobody has, or does not give enough benefit (also
> dependent on the complexity of the change), it is a hard(er) sell?
> 
> So the JEP template (http://openjdk.java.net/jeps/2) provides some
> questions on how to structure this idea proposal and what to put into
> the various sections.
> 
> In general this is to help you provide the relevant information to
> the community. While this might be onerous for a writer at first
> glance, it saves everyone else lots of time trying to find out what and
> how you want to solve something.
> 
> 
> I am going over the Motivation section in detail in the remainder of
> this email, with some comments at the end about the Alternatives one;
> these two sections seem to be the most important here.
> 
> The JEP template states under the Motivation section:
> 
> "Motivation
> ----------
> 
> // Why should this work be done?  What are its benefits?  Who's asking
> // for it?  How does it compare to the competition, if any?"
> 
> 
> Now let me try to associate these questions to the relevant parts of
> the existing JEP 318 (http://openjdk.java.net/jeps/318) text.
> 
> And please, before reading below, I really do not want to shoot down
> the proposal if you see a question mark. It should indicate just that
> there is a question to which I honestly do not know the answer, but
> hope that you do. Similarly if I raise some concerns about some
> statements I expect you to notice that there may be something missing
> here, nothing else. I.e. not necessarily that I am "right" about
> something. You said you already talked about it many times with other
> people in the field, thought it over for a long time, so hopefully
> these questions can be answered quickly, and in the future the JEP also
> contains this information for other people too.
> 
> Some may not need an answer as they only try to make you think about
> the seriousness of a stated problem.
> 
> JEP text: "Java implementations are well known for a broad choice of
> highly configurable GC implementations."
> 
> Potential answer to "Why should this work be done?". Or does the
> sentence indicate we need another GC because we already have so many,
> and another does not hurt? I am asking this in full seriousness, I
> really do not know. Or is this only an introductory sentence without
> meaning?
> 
> JEP text: "There are four use cases where a trivial no-op GC proves
> useful."
> 
> This seems to be a transition sentence, but is fine to make it flow
> better.
> 
> Reading this, and given that only a list of benefits follows, I assume
> that these two sentences were supposed to answer the "Why should this
> work be done? Who's asking for it?" questions from the JEP.
> 
> In the earlier email you mentioned these power users that want full
> control. Mention them here. Define them. Also mention other user groups
> that might be interested. Particularly groups the benefits list could
> refer to.
> 
> Let's go into these benefits in more detail:
> 
> JEP text: "Performance testing. Having a GC that does almost nothing is
> a useful tool to do differential performance analysis for other, real
> GCs. Having a no-op GC can help to filter out GC-induced performance
> artifacts."
> 
> Benefit. Maybe it would be useful to list a few of these performance
> artifacts here ("... , e.g. barrier code, concurrent threads"). 
> 
> Who are the beneficiaries of this? Not sure about these "power users"
> (see M. Berger's response in this exact thread). Probably developers of
> new GC algorithms?
> 
> An alternative could be a developer just nop'ing out the relevant GC
> interface section. That is somewhat cumbersome, but for how many users
> is this a problem? Spell that out in the appropriate Alternatives
> section.
> 
> Also mention that using Epsilon GC for barrier testing may not be an ideal
> tool, because all other existing collectors are generational (but in
> the future it might apply to Shenandoah unless it goes generational
> too, idk), and testing generational barriers on a non-generational heap
> may not give a complete picture of barrier overhead.
> 
> JEP text: "Functional testing. For Java code testing, a way to
> establish a threshold for allocated memory is useful to assert memory
> pressure invariants. Today, we have to pick up the allocation data from
> MXBeans, or even resort to parsing GC logs. Having a GC that accepts
> only the bounded number of allocations, and fails on heap exhaustion,
> simplifies testing."
> 
> Benefit. For regression testing, in how many cases (or in what
> circumstances) do you think it is sufficient to get only a
> fail/no-fail answer?
> On a failure, this seems to pass the work on to the developer, who
> then needs to write another test that also prints and monitors how
> memory usage increases over time anyway.
> Given that you already need to monitor memory usage, how much extra
> work is it to make the test fail when heap usage goes above a
> threshold?
> 
> "VM interface testing. For VM development purposes, having a simple GC
> helps to understand the absolute minimum required from the VM-GC
> interface to have a functional allocator. This serves as proof that the
> VM-GC interface is sane, which is important in lieu of JEP 304
> ("Garbage Collector Interface")."
> 
> Benefit. Who are the (main) beneficiaries of that - probably developers?
> For a developer, how large is that benefit if there are already 5 or 6
> implementations of that interface?
> 
> "Last-drop performance improvements. For ultra-latency-sensitive
> applications, where developers are conscious about memory allocations
> and know the application memory footprint exactly, or even have
> (almost) completely garbage-free applications. In those applications,
> GC cycles may be considered an implementation bug that wastes CPU
> cycles for no good reason."
> 
> This is the only benefit in this list that actually mentions its target
> group. I assume it is those power users (not necessarily developers
> only?) who are ultra-latency aware. This paragraph further
> characterizes them as also being throughput conscious.
> The earlier discussion also characterized them as very conscious about
> memory layout etc.; they do not want object reordering because it is
> inconsistent between GCs (which is a different issue, and I do not
> want to discuss it here).
> 
> From what I gathered so far, they want absolute control over memory
> management - but the real question is whether this is their real or
> only problem with the Java VM in achieving consistent VM behavior.
> There are certainly more components in the VM that introduce
> potentially more significant jitter (now assuming that that power user
> can set heap sizes accordingly to use e.g. Serial GC).
> 
> This execution consistency is maybe another goal that is even more
> important than last-drop performance.
> 
> It may be useful to investigate the problem of these power users in
> more detail, and see if we could provide a (more?) complete solution
> for them.
> 
> 
> "Extremely short lived jobs are one example of this."
> 
> I do not understand the use of Epsilon in such a use case. The
> alternative I can see would be to restart the VM after every short
> lived job (something for the Alternatives section). That seems strange
> to me: depending on the definition of a "short lived job", and
> particularly if nothing survives after execution of that short lived
> job, a GC will be extremely fast.
> 
> Further, I assume this example is about FaaS (Function-as-a-Service)
> and its users. While there may be an overlap with those "power users",
> I would expect the "regular Java users" to be a far larger group than
> the power users, and power users probably would not want to incur the
> associated loss of control.
> 
> "There are also cases when restarting the JVM -- letting load balancers
> figure out failover -- is sometimes a better recovery strategy than
> accepting a GC cycle."
> 
> I really can't find a good example, particularly in the situation that
> has been described so far and also for these short-lived jobs, where a
> GC (on an almost empty heap) is not at least as fast as a restart.
> 
> It would make for a very good paragraph explaining this use case in the
> alternatives section.
> 
> Another problem with these two sentences to me is (and I am by no means
> a "FaaS power user") that I believe that waiting for the VM to
> crash/shut down to steer the load balancers is not a good strategy.
> Maybe you can give some more information about this use case?
> 
> "Even for non-allocating workloads, the choice of GC means choosing the
> set of GC barriers that the workload has to use, even if no GC cycle
> actually happens. Most JDK GCs are generational, and they emit at least
> one reference write barrier. Avoiding this barrier brings the last bit
> of performance improvement."
> 
> (_All_ JDK GCs are currently generational)
> 
> Now, as mentioned earlier in the thread, when talking about performance
> improvements, it would be nice to mention the potential gains that can
> be made (or elsewhere, like in the alternatives section). There is
> already an implementation, and so you can measure this too.
> 
> Please make your comparison in context: since this whole paragraph is
> about last-drop performance improvements for power users, a balanced
> comparison would probably only be a comparison that such a power user
> would do - i.e. not running the VM with randomly selected default
> options that arbitrarily penalize your competition.
> 
> In the earlier email I only asked directly for performance numbers in
> order to streamline this discussion, and because, given that you are
> a well-known performance and benchmark guru (afaik you were a
> "R"eviewer long before I joined), it seemed a logical request. If you
> can't find numbers, there is also the reference I got ("Barriers,
> Friendlier Still" or so, from Blackburn et al. I think), which is also
> mentioned, iirc, in the very good Jones GC book.
> "Real" newbies I would just ask to perform this test.
> 
> 
> In our discussion we found at least one more, actually unique benefit
> (the one about getting correct heap dumps on failure).
> 
> 
> Of course there is a limit on the length of that section and others
> (i.e. considering the attention span of your readership), but all
> questions asked by the JEP template should be answered in the
> corresponding section. There is some intentional overlap in the JEP,
> particularly in the first three sections, similar to a scientific paper
> so that different groups of readers need only read the sections they
> are interested in to see whether this change is actually affecting them
> (and interesting to follow).
> 
> It shouldn't be as long as a scientific paper though, so if you think a
> section is too long, drop the less impactful benefits, and other parts
> of the JEP will automatically follow.
> 
> Again, given your experience with the VM I assume you know the
> alternatives as well as or even better than I do, and can make a
> balanced assessment here.
> 
> Otherwise, keep them and please raise specific questions.
> 
> As for the Alternatives section, it is the same procedure: start by
> answering the questions raised in the template:
> 
> "Alternatives
> ------------
> 
> // Did you consider any alternative approaches or technologies?  If so
> // then please describe them here and explain why they were not 
> // chosen."
> 
> I would assume that for all of these benefits we can easily come up
> with alternative ways of doing the same or a similar thing (I already
> stated a few alternatives that I think are very valid in this or
> previous emails; some valid ones are already in the JEP), and explain
> why we would particularly want to do it this way given the context of
> that benefit (e.g. the user group). If there is no alternative, add a
> sentence that says so in that section.
> 
> Again, try to make this review of alternatives balanced, and in the
> context of the users the benefit is for.
> 
> This section should imho also include a discussion of "mostly complete
> alternatives", as suggested in this email thread already, e.g. adding a
> -XX:+DieOnFirstGC switch, and reasons for and against it.
> 
> Please understand that the JEP will be the reference to talk about, not
> some email or private offline discussions. Keeping that in mind I think
> discussions will go much smoother.
> 
> I hope I have now made clear why I suggested, unfortunately not in a
> very friendly way (apologies again), that the current JEP text lacks
> the required answers to the questions stated in the JEP template, so
> that we can (re-)start a hopefully more focused discussion.
> 
> Thanks,
>  Thomas
>
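
A footnote on the functional-testing point quoted above, where the JEP text says that allocation data has to be picked up from MXBeans today. The following is only a minimal sketch of what such a check might look like; it is not code from the JEP or from this thread. The names AllocationBudget, allocatedBytesDuring and withinBudget, and the 4 MB budget, are hypothetical. com.sun.management.ThreadMXBean is a HotSpot-specific extension, so the sketch returns -1 where it is unavailable or disabled.

```java
import java.lang.management.ManagementFactory;

public class AllocationBudget {
    // Keeps the test allocation reachable so the JIT cannot optimize it away.
    static volatile byte[] sink;

    /**
     * Returns the number of bytes allocated by the current thread while
     * running the task, or -1 if the VM does not expose that counter.
     */
    static long allocatedBytesDuring(Runnable task) {
        java.lang.management.ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!(bean instanceof com.sun.management.ThreadMXBean)) {
            return -1; // assumption: non-HotSpot VMs may lack this extension
        }
        com.sun.management.ThreadMXBean hs = (com.sun.management.ThreadMXBean) bean;
        if (!hs.isThreadAllocatedMemorySupported() || !hs.isThreadAllocatedMemoryEnabled()) {
            return -1;
        }
        long id = Thread.currentThread().getId();
        long before = hs.getThreadAllocatedBytes(id);
        task.run();
        return hs.getThreadAllocatedBytes(id) - before;
    }

    /** True only if the counter is available and the task stayed within the budget. */
    static boolean withinBudget(Runnable task, long budgetBytes) {
        long allocated = allocatedBytesDuring(task);
        return allocated >= 0 && allocated <= budgetBytes;
    }

    public static void main(String[] args) {
        Runnable task = () -> sink = new byte[1 << 20]; // allocates about 1 MB
        long budget = 4L << 20;                         // hypothetical 4 MB budget
        System.out.println("allocated bytes: " + allocatedBytesDuring(task));
        System.out.println("within budget:   " + withinBudget(task, budget));
    }
}
```

Unlike a fail/no-fail answer on heap exhaustion, a check of this kind also reports how much was allocated, which is the information a developer would need once a threshold test fails.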