RFR: 8282047: Enhance StringDecode/Encode microbenchmarks [v2]

Andrew Haley aph at openjdk.java.net
Fri Feb 25 09:44:03 UTC 2022


On Fri, 25 Feb 2022 09:19:33 GMT, Claes Redestad <redestad at openjdk.org> wrote:

>> Splitting out these micro changes from #7231
>> 
>> - Clean up and simplify setup and code
>> - Add variants with different inputs with varying lengths and encoding weights, but also relevant mixes of each so that we both cover interesting corner cases but also verify that performance behave when there's a multitude of input shapes. Both simple and mixed variants are interesting diagnostic tools.
>> - Drop all charsets from the default run configuration except UTF-8. Motivation: Defaults should give good coverage but try to keep runtimes at bay. Additionally if the charset tested can't encode the higher codepoints used in these micros the results might be misleading. If you for example test using ISO-8859-1 the UTF-16 bytes in StringDecode.decodeUTF16 will have all been replaced by `?`, so the test is effectively the same as testing ASCII-only.
>
> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Brent review

test/micro/org/openjdk/bench/java/lang/StringDecode.java line 72:

> 70:     public void setup() {
> 71:         charset = Charset.forName(charsetName);
> 72:         asciiString = LOREM.substring(0, 32).getBytes(charset);

This is problematic IMO in that it's missing short strings such as "Claes". Average Java strings are about 32 bytes long AFAICR, and people writing (vectorized) ijntrinsics have a nasty habit of optimizing for long strings, to the detriment of typical-length ones.
Whether we like it or not, people will optimize for benchmarks, so it's important that benchmark data is realistic. The shortest here is 15 bytes, as far as I can see. I'd certainly include a short string of just a few bytes so that intrinsics don't cause regressions in important cases.

-------------

PR: https://git.openjdk.java.net/jdk/pull/7516


More information about the core-libs-dev mailing list