Reviving JEP 230: Microbenchmark Suite

Aleksey Shipilev shade at
Thu Sep 6 16:28:42 UTC 2018


I think this proposal tries hard to push for co-location without properly considering its drawbacks.

The major drawback is: pushing the corpus into the JDK tree effectively ties the corpus versioning to
JDK versioning. Instead of a golden corpus that could be used for cross-JDK tests (which is _much
more_ important with the faster release cadence), we would have near-identical benchmarks carried
over in jdk12, in jdk13, in jdk14, etc.

And this gives us a cluster of problems:

 1. What happens when JMH updates are needed (e.g. infra bugs fixed, more features in APIs, or
profilers)? Are we pushing the JMH version update to all jdk repositories at once? How does that
square with the fact that some repositories get abandoned within months after inception?

 2. What happens when the JMH API changes? The golden corpus is good for a coordinated change like
that: we update the JMH APIs and fix all benchmarks at once. Co-located repositories mean you would
need to do this work N times, once for each supported tree.

 3. What happens when a benchmark is added *after* feature development? For example, when we add
benchmarks for jdk11 features while jdk12 is in development? How do these benchmarks get from jdk/jdk
to jdk-updates/jdkXu? Are we now looking at backporting the benchmarks too?

 4. What happens when a benchmark has a test bug? Do we sift the changes to that benchmark down to
every jdk-updates/jdkXu?

Notice "when", not "if". Benchmarks are much more fluid than jtreg regression tests, and require
much more care over their lifetime. The separate jcstress repository was very handy for solving
all these problems wholesale.

While we can argue that some repositories would be abandoned for a reason (e.g. "we" do not care
about jdk10 once jdk11 is released), it is a kick in the gut for the community maintainers who pick
up what Oracle abandoned. In other words, this argument is a moral hazard.

I guess we can make the argument that the "golden" corpus is the one in jdk/jdk, and that that
corpus should be used for all benchmarking. But this also comes with a bunch of questions:

 a. Philosophical. This is jmh-jdk-microbenchmarks in disguise, which we already have; why bother?

 b. Educational. What exactly prevents a user who runs an 8-feature-specific benchmark from taking
the jdk-updates/jdk11 corpus when running 11 binaries, and taking jdk/jdk when running 12 binaries?
I would guess the build system changes for benchmark co-location would make that a natural thing to
do. How would users know that any of the (1)...(4) pitfalls might be in effect? If we argue that
internal Oracle test systems are aware of this possibility, and act correctly to resolve it -- not
only is this a moral hazard, it is also bad design, when the natural way is the faulty one.

 c. Technical. When users want to run benchmarks against an already existing binary, for example
8u181, 10.0.2, 11, or the current dev build, what do they do? Okay, checking out the entire jdk/jdk
is a bearable hassle. What then? Does the OpenJDK build system produce a JAR somewhere under build/
that we need to pick up, and run normal JMH from there? That piles onto the "why bother" question
above. Or does the OpenJDK build system know enough to accept an outside JDK via some build target
option, so that benchmarks could be executed from "make ..."? That piles onto the "natural way"
question above.
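To make the contrast concrete, here is a dry-run sketch of the standalone-suite workflow: one
prebuilt JMH benchmarks.jar, run unchanged against several installed JDK binaries. The JDK install
paths, the jar name, and the flags are illustrative assumptions, not actual OpenJDK build outputs.

```shell
# Build the launch command for one installed JDK; only the java
# launcher changes, the benchmark jar stays the same across JDKs.
run_suite() {
  jdk_home=$1
  # Echo instead of exec, so this stays a dry run; replace echo
  # with eval to actually launch the benchmarks.
  echo "$jdk_home/bin/java -jar benchmarks.jar -rf json -rff results-$(basename "$jdk_home").json"
}

# Hypothetical install locations for the binaries mentioned above.
for jdk in /opt/jdk-8u181 /opt/jdk-10.0.2 /opt/jdk-11 /opt/jdk-dev; do
  run_suite "$jdk"
done
```

The point is that nothing here needs a jdk/jdk checkout or a "make ..." invocation.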

On 09/06/2018 05:16 PM, Claes Redestad wrote:
> A side-effect of the more rapid release cadence is that the bulk of new feature development is being
> done in project repositories. This creates some very specific challenges when developing benchmarks
> out-of-tree, especially those that aim to track new and emerging APIs.

So, why take the entire corpus hostage for a handful of benchmarks against APIs that are not yet
stable? Can we put the {valhalla, amber, whatever} benchmarks into their specific JDK trees, and only
them? Once a feature graduates into a released JDK, its benchmarks get contributed to the benchmark
corpus, wherever that lives.

> For starters we're pushed toward setting up branches that mirror the project layout of the various
> OpenJDK projects (valhalla, amber, ...), then we need to set up automated builds from each such
> branch, then have some automated means to match these artifacts with appropriate builds of the
> project-under-test etc. And how do we even deal with the case when the changes we really want to
> test are in javac? Nothing is impossible, of course, but the workarounds and added work is turning
> out to be costly.

Speaking as the guy who supported the j.u.c/VarHandles tests in jcstress, it did not feel costly.

Take a step back. Once benchmarks are written against published APIs, they are just like any other
3rd party test suite that runs against the JDK, be it jcstress, Lucene tests, bigapps, etc. Painting
the infrastructure needs as an additional hassle misses that this part has to be done efficiently
for 3rd party suites anyway.

> By co-locating microbenchmarks, matching the JDK-under-test with an appropriate microbenchmark
> bundle would be trivial and automatic in most cases, and while one always needs to be wary of subtle
> changes creeping into benchmark bundles and the JDK between builds, this is something we already
> test for automatically as regressions are detected.

It sounds like this co-location proposal tries to simplify operational issues for Oracle testing
systems, right?

I remember solving the matching problem in the current corpus by splitting the corpus by the minimal
JDK version required to run each benchmark, building the parts separately with different
source/target, and then trivially matching: "ha, this is JDK 9, better run benchmarks-9.jar". This
mechanism is very local, and does not give the versioning headaches outlined above. I cannot remember
at which point all that was lumped together into a single module in the jmh-jdk-microbenchmarks
project, which gives us the matching problem we are now trying to resolve.
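That per-version matching can be sketched in a few lines of shell. The jar names, the split points
(8/9/11), and the version-probing pipeline are assumptions for illustration, not the actual
jmh-jdk-microbenchmarks layout.

```shell
# Extract the JDK feature version ("8", "9", "11", ...) from the
# `java -version` banner of the launcher passed as $1.
jdk_feature_version() {
  "$1" -version 2>&1 | awk -F '"' '/version/ { print $2 }' \
    | sed -E 's/^1\.([0-9]+).*/\1/; s/^([0-9]+).*/\1/'
}

# Pick the highest benchmarks-N.jar this JDK can still execute,
# assuming jars were built with -source/-target N for N in 8, 9, 11.
pick_benchmark_jar() {
  feature=$1
  for min in 11 9 8; do
    if [ "$feature" -ge "$min" ]; then
      echo "benchmarks-$min.jar"
      return 0
    fi
  done
  return 1   # JDK too old for any jar in the corpus
}
```

With this split, "this is JDK 9, better run benchmarks-9.jar" is a one-liner, and no versioning of
the corpus against JDK trees is needed.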

> A standalone project can be considered a good enough fit for that case, so one alternative to
> moving all of jmh-jdk-microbenchmarks into the JDK would be to keep maintaining the standalone project
> for benchmarks that are considered mature and stable. 

I prefer to keep the separate jmh-jdk-microbenchmarks project, and do whatever JDK compatibility
work there. Putting the in-flight, unstable benchmarks into the relevant non-mainline feature trees
seems to be a good compromise: it shields jmh-jdk-microbenchmarks from the need to address
feature-specific troubles with benchmarks under heavy development.


More information about the jdk-dev mailing list