RFR: 8315098: Improve URLEncodeDecode microbenchmark

Wed Aug 30 13:48:10 UTC 2023

On Mon, 28 Aug 2023 13:33:46 GMT, Claes Redestad <redestad at openjdk.org> wrote:

> The `URLEncodeDecode` microbenchmark accidentally generates strings with a lot of `'\u0000'` chars, heavily skewing towards strings that need to be encoded in a rather unrealistic way. To be more realistic, the benchmark should test a mix of inputs.
> 
> This patch fixes these inadvertent cases and sets up the benchmark for a healthier mix by default, adding controls to allow testing some mixed scenarios.
> 
> #15354 explores a few optimizations to `URLEncoder`, but due to the nature of this microbenchmark, a trivial fast-path scan for chars that need no encoding shows underwhelming results. With the modifications to this benchmark, a simple fast-path in `URLEncoder.encode` shows a decent win when some or all of the inputs remain unchanged:
> 
> 
> Name                           (encodeChars) (maxLength) (unchanged) Cnt  Base   Error   Test   Error  Unit  Diff%
> URLEncodeDecode.testDecodeUTF8             6        1024           0  15 3,307 ± 0,507  3,010 ± 0,048 ms/op   9,0% (p = 0,030 )
> URLEncodeDecode.testDecodeUTF8             6        1024          75  15 2,296 ± 0,003  2,313 ± 0,017 ms/op  -0,7% (p = 0,001*)
> URLEncodeDecode.testDecodeUTF8             6        1024         100  15 0,812 ± 0,010  0,819 ± 0,017 ms/op  -0,8% (p = 0,201 )
> URLEncodeDecode.testDecodeUTF8            35        1024           0  15 6,909 ± 0,065  7,192 ± 0,415 ms/op  -4,1% (p = 0,014 )
> URLEncodeDecode.testDecodeUTF8            35        1024          75  15 3,346 ± 0,206  3,320 ± 0,270 ms/op   0,8% (p = 0,753 )
> URLEncodeDecode.testDecodeUTF8            35        1024         100  15 0,794 ± 0,034  0,818 ± 0,015 ms/op  -3,0% (p = 0,016 )
> URLEncodeDecode.testEncodeUTF8             6        1024           0  15 2,434 ± 0,019  2,579 ± 0,120 ms/op  -6,0% (p = 0,000*)
> URLEncodeDecode.testEncodeUTF8             6        1024          75  15 1,764 ± 0,014  0,937 ± 0,012 ms/op  46,9% (p = 0,000*)
> URLEncodeDecode.testEncodeUTF8             6        1024         100  15 1,227 ± 0,008  0,401 ± 0,001 ms/op  67,4% (p = 0,000*)
> URLEncodeDecode.testEncodeUTF8            35        1024           0  15 6,177 ± 0,062  6,057 ± 0,199 ms/op   1,9% (p = 0,029 )
> URLEncodeDecode.testEncodeUTF8            35        1024          75  15 2,716 ± 0,023  1,876 ± 0,012 ms/op  30,9% (p = 0,000*)
> URLEncodeDecode.testEncodeUTF8            35        1024         100  15 1,220 ± 0,003  0,401 ± 0,001 ms/op  67,2% (p = 0,000*)
> 
> 
> A potential future improvement would be to extend test data with varying amounts of surrogate pairs, e.g....
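For readers unfamiliar with the "fast-path scan" mentioned above, the following is a minimal sketch, assuming only the documented behavior of `java.net.URLEncoder`: the characters `a`-`z`, `A`-`Z`, `0`-`9`, `.`, `-`, `*` and `_` are left unchanged, while everything else (including space, which becomes `+`) is encoded. It is not the code from #15354; the class `FastPathEncode` and its methods are hypothetical names used only for illustration. The idea is to scan the input once and hand it back untouched when no character needs encoding, which is the case the `(unchanged)` parameter (presumably the percentage of such inputs) exercises.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical illustration of a fast-path in front of URLEncoder.encode.
 * Not taken from the JDK or from PR #15354; names are made up.
 */
public final class FastPathEncode {

    private FastPathEncode() {}

    // Characters URLEncoder never changes: 'a'-'z', 'A'-'Z', '0'-'9', '.', '-', '*', '_'.
    // Space is deliberately excluded because URLEncoder turns it into '+'.
    static boolean isUnreserved(char c) {
        return (c >= 'a' && c <= 'z')
            || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9')
            || c == '.' || c == '-' || c == '*' || c == '_';
    }

    /** Returns true if URLEncoder.encode would return s unchanged. */
    static boolean needsNoEncoding(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!isUnreserved(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    /** Skips the general encoder entirely when nothing needs encoding. */
    public static String encode(String s, Charset charset) {
        if (needsNoEncoding(s)) {
            return s; // fast path: same instance, no copying
        }
        return URLEncoder.encode(s, charset); // slow path: full encoding
    }

    public static void main(String[] args) {
        // Fast path: prints Hello-World_1.0*
        System.out.println(encode("Hello-World_1.0*", StandardCharsets.UTF_8));
        // Slow path: prints a+b%26c%C3%A9
        System.out.println(encode("a b&c\u00e9", StandardCharsets.UTF_8));
    }
}
```

Whether such a scan pays off depends on the input mix: for inputs that mostly need encoding, it is pure overhead, which is consistent with the flat or slightly negative results in the `unchanged = 0` rows above.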

Thanks for improving this microbenchmark, Claes. Changes look good to me. Using 1024 strings should ensure that at least some of them have some characters that need to be encoded/decoded.

-------------

Marked as reviewed by dfuchs (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/15448#pullrequestreview-1602752757