Reviving JEP 230: Microbenchmark Suite

Claes Redestad claes.redestad at oracle.com
Thu Sep 6 17:30:27 UTC 2018


Hi,

On 2018-09-06 18:28, Aleksey Shipilev wrote:
> Hi,
>
> I think this proposal tries hard to push for co-location without properly considering the drawbacks
> of it.

I've considered many more pros and cons than have been outlined in this introductory
"Hey, I intend to work on this thing again" proposal. But I digress... :-)

>
> The major drawback is: pushing corpus to JDK tree means we effectively tie the corpus versioning to
> JDK versioning. Instead of golden corpus that could be used for cross-JDK tests (which is _much
> more_ important with faster release cadence), we now would have the similar benchmarks carried over
> in jdk12, in jdk13, in jdk14, etc.

Microbenchmark bundles would effectively be released and versioned with their respective
JDK. If I want to compare back to jdk12, I'd use the "golden" bundle released with that
build (or an earlier one).

The number of times we've decided to upgrade an older release to a newer version of the
microbenchmark corpus is, to date, zero, while the number of times we've added new
configurations for benchmarks under development is significant. Optimizing the process
for ongoing development seems sensible, but of course we should try to keep backporting
etc. as manageable as possible. I think the drawbacks you're hypothesizing about below
will turn out to be more manageable in reality than you make them out to be.

>
> And this gives us the cluster of problems:
>
>   1. What happens when JMH updates are needed (e.g. infra bugs fixed, more features in APIs, or
> profilers)? Are we pushing the JMH version update to all jdk repositories at once? How does that
> work with the notion that some repositories get abandoned within months after inception?

On a case-by-case basis.

The main purpose of "golden" bundles is to avoid making changes and to preserve
predictable results over a longer time trend, so I see backports as something to be
avoided. This is in line with the new release model: we backport stability and critical
bug fixes, not features.

When they are needed, they should follow the typical path of being pushed to jdk/jdk
first, then backported as far back as necessary.

>
>   2. What happens when JMH API changes? The golden corpus is good for coordinated change like that:
> we update the JMH APIs, and we fix all benchmarks at once. Co-located repository means you would
> need to do this work N times, for each supported tree.

I think this would continue to be a rare occurrence, but if a coordinated change has to
be made, I'm sure it can be done.

>   3. What happens when benchmark is added *after* feature development? For example, when we add
> benchmarks for jdk11 features when jdk12 is in development? How do these benchmarks get from jdk/jdk
> to jdk-updates/jdkXu? Are we now looking into backporting the benchmarks too?

I'd avoid it, but it seems reasonable to allow backporting benchmarks, or to otherwise
ensure we can build bundles that are runnable on update releases.

>   4. What happens when benchmark has a testbug? Do we sift the changes to that benchmark down to
> every jdk-updates/jdkXu?
>
> Notice "when", not "if". Benchmarks are much more fluid than jtreg regression tests, and require
> much more care during their lifetime. The separate jcstress repository was very handy when solving
> all these problems wholesale.

Arguably, it also goes the other way: certain benchmarks need to be different to capture
the underlying behavior of the JDK under test, especially if what you're testing provokes
some indirect behavior (GC, reference processing). I don't think there's a
one-size-fits-all solution here, and the "golden" standard that is jmh-jdk-microbenchmarks
might not always be doing the "right thing" on older and newer JDKs alike.

>
> While we can argue that some repositories would be abandoned for a reason (e.g. "we" do not care
> about jdk10 once jdk11 is released), it is a kick in the gut for community maintainers that pick up
> what Oracle abandoned. In other words, this argument is a moral hazard.
>
> I guess we can make the argument that the "golden" corpus is the one in jdk/jdk, and that corpus
> should be used for all benchmarks. But it also comes with a bunch of questions:
>
>   a. Philosophical. This is jmh-jdk-microbenchmarks in disguise, which we already have, why bother?

I think there are several benefits outside of those outlined here.

>
>   b. Educational. What exactly prevents the user who runs 8-feature-specific benchmark, from taking
> jdk-updates/jdk11 corpus while running 11 binaries, and taking jdk/jdk while running 12 binaries? I
> would guess build system changes for benchmark co-location would make that a natural thing to do.
> How would users know that any of (1)...(4) pitfalls might be in effect? If we argue that internal
> Oracle test systems are aware of this possibility, and act correctly to resolve it -- not only this
> is a moral hazard, it is also a bad design, when a natural way is the faulty one.

Documenting best practices, and why comparisons using benchmarks picked from different
places are hazardous, is part of the plan for this JEP.

I think a much more common use-case is that you have a change against the repo you're
working on and you want to test it locally. Having a "natural way" to build and run said
changes using the benchmarks available and under development in the same repo would be
great for many.

>
>   c. Technical. When users want to run benchmarks against already existing binary, for example,
> 8u181, 10.0.2, 11, and current dev, what do users do? Okay, checking out the entire jdk/jdk is a
> bearable hassle. What then? Does OpenJDK build system produce a JAR somewhere in build/ that we need
> to pick up from, and it runs normal JMH from there? This piles on "why bother" question above. Or
> does OpenJDK build system know enough to accept outside JDK with its build target option that it
> could be executed from "make ..."? This piles on "natural way" question above.

A build target to produce a JAR somewhere is planned, yes. I think tools to help pick up
the right JDKs and benchmark JARs are out of scope for this JEP, but definitely something
that could be worked on and contributed separately.
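
To make that concrete, here's a rough sketch of the kind of workflow I'm aiming for; the
target name, output path and benchmark filter below are placeholders, nothing is decided
yet:

    # build the co-located benchmarks against the JDK sources in this repo
    # (hypothetical target name)
    make build-microbenchmark

    # run the resulting bundle like any other JMH jar
    # (hypothetical output location and benchmark filter)
    java -jar build/<conf>/images/test/micro/benchmarks.jar java.lang.Foo -f 1 -wi 5 -i 5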

>
> On 09/06/2018 05:16 PM, Claes Redestad wrote:
>> A side-effect of the more rapid release cadence is that the bulk of new feature development is being
>> done in project repositories. This creates some very specific challenges when developing benchmarks
>> out-of-tree, especially those that aim to track new and emerging APIs.
> So, why take the entire corpus hostage for a handful of benchmarks against APIs that are not yet
> stable? Can we put the {valhalla, amber, whatever} benchmarks in their specific JDK trees, and only
> them? Once the feature graduates into the released JDK, its benchmarks get contributed to the
> benchmark corpus, wherever it is.

I have outlined this as an alternative, which you replied to below.

>
>> For starters we're pushed toward setting up branches that mirror the project layout of the various
>> OpenJDK projects (valhalla, amber, ...), then we need to set up automated builds from each such
>> branch, then have some automated means to match these artifacts with appropriate builds of the
>> project-under-test etc. And how do we even deal with the case when the changes we really want to
>> test are in javac? Nothing is impossible, of course, but the workarounds and added work is turning
>> out to be costly.
> Saying as the guy who supported j.u.c/VarHandles tests in jcstress, it did not feel costly.
>
> Take a step back. Once benchmarks are done against the published APIs, they are just like any other
> 3rd party test suite that is run against JDK, be that jcstress, Lucene tests, bigapps, etc. Painting
> the infrastructure need as additional hassle misses that this part has to be done efficiently for
> 3rd party suites anyway.

The general experience around here is that co-locating test suites has greatly increased
productivity in several ways. There are, naturally, valid reasons not to do so in
specific cases, be they practical or legal.

>
>
>> By co-locating microbenchmarks, matching the JDK-under-test with an appropriate microbenchmark
>> bundle would be trivial and automatic in most cases, and while one always need to be wary of subtle
>> changes creeping into benchmark bundles and the JDK between builds, this is something we already
>> test for automatically as regressions are detected.
> It sounds that this co-location proposal tries to simplify operational issues for Oracle testing
> systems, right?

It's a motivating factor, yes, but not the sole or even the main one. Making it simple
for *any* dev to contribute microbenchmarks for the feature they are working on is
higher up on my list.
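
For illustration, a co-located microbenchmark would be a plain JMH benchmark living
alongside the other tests; a minimal sketch of what such a contribution could look like
(package, class and benchmark names below are placeholders, not something from the actual
suite):

    package org.openjdk.bench.java.lang; // placeholder package

    import java.util.concurrent.TimeUnit;

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.BenchmarkMode;
    import org.openjdk.jmh.annotations.Mode;
    import org.openjdk.jmh.annotations.OutputTimeUnit;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.State;

    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    @State(Scope.Thread)
    public class StringConcatBench {

        private String prefix = "foo";
        private int value = 42;

        @Benchmark
        public String concat() {
            // exercises whatever concatenation strategy the JDK under test picks
            return prefix + value;
        }
    }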

>
> I remember solving the matching problem in current corpus by splitting the corpus by minimal JDK
> version required to run the benchmark, and building the parts separately with different
> source/target, then trivially matching "ha, this is JDK 9, better run benchmarks-9.jar". This
> mechanism seems very local, and does not give versioning headaches outlined above. I cannot remember
> at which point all that was lumped together into the single module in jmh-jdk-microbenchmarks
> project, which gives us the matching problem we are now trying to resolve.

I think the corpus was cleaned up into a single module before it was open sourced, but I
wasn't part of the decision process for that, so I can't say what the reasoning was at
the time.

>
>> A standalone project can be considered a good enough fit for that case, so one alternative to
>> moving all of jmh-jdk-microbenchmarks into the JDK would be to keep maintaining the standalone project
>> for benchmarks that are considered mature and stable.
> I prefer to keep separate jmh-jdk-microbenchmarks project, and do whatever JDK compatibility work
> there. Putting the in-flight unstable benchmarks into the relevant non-mainline feature trees seems
> to be the good compromise to shield the jmh-jdk-microbenchmarks from the need to address
> feature-specific troubles with benchmarks under heavy development.

Something along these lines could be a reasonable compromise, yes. It does carry some
questions with it, such as: do we keep the benchmarks we've developed in the jdk repo
forever, or remove them as we migrate them to jmh-jdk-microbenchmarks?

/Claes

