RFR: 8259074: regex benchmarks and tests

Mon Feb 1 22:27:04 UTC 2021

On 2021-02-01 21:54, Martin Buchholz wrote:
> On Mon, 1 Feb 2021 10:22:14 GMT, Claes Redestad <redestad at openjdk.org> wrote:
> 
>>> @amalloy - you are invited to comment on regex content
>>> @cl4es @shipilev - you are invited to point out my jmh bad practices
>>
>> The assertion discussion aside, the micros look fine to me.
>>
>> With an eye towards reducing total run time I'd ask you to consider if all parameter combinations are useful or if we can get the same value after some pruning.
> 
> @cl4es I agree pruning is a good idea. I settled on 3 data points with 16x separations as good enough to clearly show the difference between O(1), O(N), O(N^2) and O(2^N) (although O(2^N) would "run forever").
> 
> (although ... please tell me you're not actually running these benchmarks in an automated fashion ... too expensive, and needs a human to interpret the results)

No, we don't automatically pick up and run new microbenchmarks, but I
still think it's good if the checked in parameter combinations are a
reasonable approximation of what _someone_ should be running every now
and then.

A manual exploration of a new set of micros would naturally start with
the default config, so if such a config runs forever, that would be poor
ergonomics IMHO. I don't think such configurations should be checked in
in an active state. I'd opt for a comment on how to produce a regex that
would exponentially backtrack in such ways, for those who might want to
explore such corner cases.
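A sketch of what such a comment could show. This is not what the PR checks in. A pattern such as `(a|aa)*b`, applied to a run of n 'a's with no trailing 'b', forces a naive backtracking matcher to try every way of splitting the run into "a" and "aa" pieces before it can fail. The number of such splits grows like the Fibonacci numbers. The class `Backtrack` and its methods are hypothetical. The toy matcher stands in for a naive backtracking engine and is not java.util.regex. How the real engine behaves on this pattern may depend on the JDK version, which is exactly what a benchmark would measure.

```java
import java.util.regex.Pattern;

public class Backtrack {
    // Number of matcher states visited by the most recent call to naiveMatch.
    private static long steps;

    // Hypothetical hand-written backtracking matcher for the regex (a|aa)*b.
    // It mimics a greedy loop: try one more iteration (alternative "a" first,
    // then "aa"), and only when both fail, leave the loop and try the final 'b'.
    private static boolean loop(String s, int pos) {
        steps++;
        if (pos < s.length() && s.charAt(pos) == 'a' && loop(s, pos + 1)) {
            return true;
        }
        if (pos + 1 < s.length() && s.charAt(pos) == 'a' && s.charAt(pos + 1) == 'a'
                && loop(s, pos + 2)) {
            return true;
        }
        // Exit the loop: the input must end with exactly one 'b' at this position.
        return pos == s.length() - 1 && s.charAt(pos) == 'b';
    }

    static boolean naiveMatch(String s) {
        steps = 0;
        return loop(s, 0);
    }

    static long lastSteps() {
        return steps;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("(a|aa)*b");
        for (int n : new int[] {4, 8, 16, 32}) {
            // No trailing 'b', so the naive matcher must try every split before failing.
            String input = "a".repeat(n);
            boolean naive = naiveMatch(input);
            // Cross-check the toy matcher's answer (not its cost) against java.util.regex.
            boolean jdk = p.matcher(input).matches();
            System.out.printf("n=%2d match=%b (jdk=%b) states=%d%n", n, naive, jdk, lastSteps());
        }
    }
}
```

For n = 4, 8, 16 and 32, the toy matcher visits 12, 88, 4180 and 9227464 states. That is F(n+3) - 1, where F is the Fibonacci sequence with F(1) = F(2) = 1. Each extra input character therefore multiplies the work by about 1.6. Under 16x separations, a data point at n = 256 would indeed never finish.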

(It would be nice to have some JMH analogue of jtreg's "manual", so
that one can mark benchmarks that should be excluded from automatic or
wildcard executions. Maybe there exists some way to do this already?)

> 
> -------------
> 
> PR: https://git.openjdk.java.net/jdk/pull/1940
>