RFR: Allow setting of opsPerInvocation from setup functions
bbrehm
duke at openjdk.org
Fri Mar 24 16:57:33 UTC 2023
On Thu, 23 Mar 2023 10:27:03 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:
> Interesting example!
Thanks! I think the example is fun, the issue is pervasive in micro-benchmarking, and "branch history table stitching" (a term I'm making up here) is not a very well-known effect. So I think some form of this example would be worthwhile to add, even without dynamic opsPerInvocation.
I could change the example to work with static opsPerInvocation, following the variant in https://discourse.julialang.org/t/psa-microbenchmarks-remember-branch-history/17436
This has the advantage of obviously excluding memory as the source of the effect (no one can claim the working set is still in L1 from the last benchmark iteration), and the disadvantage of being less relatable: iterating over a 100k-element array that contains 100 copies of the same random length-1000 pattern looks unnatural. It also obscures the fact that this is a genuine benchmarking artifact -- the CPU cheats because the branch history table remembers previous iterations of the hidden jmh-generated benchmarking loop.
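For concreteness, a minimal sketch of what that static variant could look like (class and method names are mine; the array layout is the one described above):

```java
import java.util.Random;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class BranchHistoryBench {
    static final int PATTERN = 1_000; // length of the random pattern
    static final int COPIES  = 100;   // copies of the pattern in the array

    int[] data;

    @Setup(Level.Trial)
    public void setup() {
        // 100k-element array: 100 copies of the same random 0/1 pattern.
        int[] pattern = new Random(42).ints(PATTERN, 0, 2).toArray();
        data = new int[PATTERN * COPIES];
        for (int i = 0; i < COPIES; i++) {
            System.arraycopy(pattern, 0, data, i * PATTERN, PATTERN);
        }
    }

    // The branch below is data-dependent. Because the same pattern repeats
    // (both within the array and across the hidden jmh benchmark loop), the
    // branch history table can learn it, making the branch look far cheaper
    // than it would be on truly unpredictable input.
    @Benchmark
    @OperationsPerInvocation(PATTERN * COPIES)
    public int countOnes() {
        int count = 0;
        for (int v : data) {
            if (v == 1) {
                count++;
            }
        }
        return count;
    }
}
```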
> Things to think about: Do we allow to set opsPerInvocation multiple times during the run? Do we allow doing this in @Setup(Level.Invocation)?
True enough. We'd need to document the specifics of the implementation (which are: you can overwrite opsPerInvocation as often as you want, but only the last write counts). That issue alone could make this PR non-workable as a stable public API :(
> The usual answer for things that cannot be captured in declarative form via annotations is going to Java API ...
I'd consider doing that instead if there were a sensible API to configure benchmarks individually and combine them into a single `Runner` for output formatting. Alas, almost all of the relevant functions in `Runner` are `private` instead of `protected`.
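For illustration, here is roughly what the Java API allows today: one opsPerInvocation per `Runner`, fixed before the run starts. `Corpus.load` and `LogParseBench` are made-up names for this sketch:

```java
import java.util.Collection;
import org.openjdk.jmh.results.RunResult;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class RunBench {
    public static void main(String[] args) throws Exception {
        // Hypothetical corpus loader; the size is only known at run time.
        int corpusBytes = Corpus.load("webserver.log").sizeInBytes();

        Options opts = new OptionsBuilder()
                .include(LogParseBench.class.getSimpleName())
                .operationsPerInvocation(corpusBytes) // fixed for the whole run
                .build();

        // One Runner per configuration; there is no supported way to merge
        // several runs into a single formatted report.
        Collection<RunResult> results = new Runner(opts).run();
    }
}
```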
Nevertheless, I think figuring out or documenting a low-effort way to do proper rescaling / dynamic opsPerInvocation would be worthwhile.
Maybe the rescaling that happens in generated functions like `public BenchmarkTaskResult Foo_Throughput(InfraControl control, ThreadParams threadParams) throws Throwable` could get a hook?
How would e.g. _you_ use jmh to benchmark a piece of complex code whose performance is meaningful only relative to a corpus of test data (> 100 MB), like parsing a stream of webserver logs?
Obviously the test data cannot be known at compile time, and obviously some kind of normalization is necessary; otherwise the time will only reflect the size of the corpus. (It's up to the user to select a corpus that matches the intended deployment, and to experiment to decide between cycles-per-byte, messages-per-second, or whatever metric turns out to be most meaningful.)
Using reflection to modify opsPerInvocation is good enough for my needs at the moment, now that I have figured out how to do it. But I'd feel very dirty recommending that to other people.
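For the record, the hack looks roughly like this -- a sketch only, which assumes the params object injected into @Setup carries an int field named `opsPerInvocation` somewhere up its class hierarchy (an undocumented implementation detail that may break at any time); `loadCorpus` is a stand-in for whatever loads the test data:

```java
import java.lang.reflect.Field;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.BenchmarkParams;

@State(Scope.Benchmark)
public class CorpusState {
    byte[] corpus;

    @Setup(Level.Trial)
    public void setup(BenchmarkParams params) {
        corpus = loadCorpus(); // stand-in for loading the real test data
        setOpsPerInvocation(params, corpus.length);
    }

    // Fragile: pokes a private field of BenchmarkParams (declared in one of
    // its superclasses), relying on an undocumented implementation detail.
    private static void setOpsPerInvocation(BenchmarkParams params, int ops) {
        for (Class<?> c = params.getClass(); c != null; c = c.getSuperclass()) {
            try {
                Field f = c.getDeclaredField("opsPerInvocation");
                f.setAccessible(true);
                f.setInt(params, ops);
                return;
            } catch (NoSuchFieldException e) {
                // declared further up the hierarchy; keep walking
            } catch (IllegalAccessException e) {
                throw new RuntimeException(e);
            }
        }
        throw new IllegalStateException("opsPerInvocation field not found");
    }

    private static byte[] loadCorpus() {
        return new byte[0]; // placeholder
    }
}
```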
-------------
PR Comment: https://git.openjdk.org/jmh/pull/97#issuecomment-1483118104