RFR: 8315098: Improve URLEncodeDecode microbenchmark [v3]

Wed Aug 30 15:28:22 UTC 2023

> The `URLEncodeDecode` microbenchmark accidentally generates strings with a lot of `'\u0000'` chars, heavily skewing towards strings that need to be encoded in a rather unrealistic way. To be more realistic, the benchmark should test a mix of inputs.
> 
> This patch fixes these inadvertent cases, and sets up the benchmark for a healthier mix by default - adding controls to allow testing some mixed scenarios.
> 
> #15354 explores a few optimizations to `URLEncoder`, but due to the nature of this microbenchmark a trivial fast-path scan for chars that need no encoding shows underwhelming results. With the modifications to this benchmark, a simple fast-path in `URLEncoder.encode` shows a decent win when some or all of the inputs remain unchanged:
> 
> 
> Name                           (encodeChars) (maxLength) (unchanged) Cnt  Base   Error   Test   Error  Unit  Diff%
> URLEncodeDecode.testDecodeUTF8             6        1024           0  15 3,307 ± 0,507  3,010 ± 0,048 ms/op   9,0% (p = 0,030 )
> URLEncodeDecode.testDecodeUTF8             6        1024          75  15 2,296 ± 0,003  2,313 ± 0,017 ms/op  -0,7% (p = 0,001*)
> URLEncodeDecode.testDecodeUTF8             6        1024         100  15 0,812 ± 0,010  0,819 ± 0,017 ms/op  -0,8% (p = 0,201 )
> URLEncodeDecode.testDecodeUTF8            35        1024           0  15 6,909 ± 0,065  7,192 ± 0,415 ms/op  -4,1% (p = 0,014 )
> URLEncodeDecode.testDecodeUTF8            35        1024          75  15 3,346 ± 0,206  3,320 ± 0,270 ms/op   0,8% (p = 0,753 )
> URLEncodeDecode.testDecodeUTF8            35        1024         100  15 0,794 ± 0,034  0,818 ± 0,015 ms/op  -3,0% (p = 0,016 )
> URLEncodeDecode.testEncodeUTF8             6        1024           0  15 2,434 ± 0,019  2,579 ± 0,120 ms/op  -6,0% (p = 0,000*)
> URLEncodeDecode.testEncodeUTF8             6        1024          75  15 1,764 ± 0,014  0,937 ± 0,012 ms/op  46,9% (p = 0,000*)
> URLEncodeDecode.testEncodeUTF8             6        1024         100  15 1,227 ± 0,008  0,401 ± 0,001 ms/op  67,4% (p = 0,000*)
> URLEncodeDecode.testEncodeUTF8            35        1024           0  15 6,177 ± 0,062  6,057 ± 0,199 ms/op   1,9% (p = 0,029 )
> URLEncodeDecode.testEncodeUTF8            35        1024          75  15 2,716 ± 0,023  1,876 ± 0,012 ms/op  30,9% (p = 0,000*)
> URLEncodeDecode.testEncodeUTF8            35        1024         100  15 1,220 ± 0,003  0,401 ± 0,001 ms/op  67,2% (p = 0,000*)
> 
> 
> A potential future improvement would be to extend test data with varying amounts of surrogate pairs, e.g....
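
For readers unfamiliar with the idea, the "trivial fast-path scan for chars that need no encoding" mentioned in the description can be sketched roughly as below. This is only an illustration and not the code from #15354 or from this PR: the class `FastPathEncodeSketch` and the helper `isUnchanged` are hypothetical names, and the set of unchanged characters is taken from the documented behavior of `java.net.URLEncoder` (alphanumerics plus `.`, `-`, `*` and `_`).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical wrapper, for illustration only; not the JDK implementation.
public final class FastPathEncodeSketch {
    private FastPathEncodeSketch() {}

    // Characters that URLEncoder documents as left untouched.
    // A space is deliberately excluded: URLEncoder converts it to '+'.
    private static boolean isUnchanged(char c) {
        return (c >= 'a' && c <= 'z')
            || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9')
            || c == '.' || c == '-' || c == '*' || c == '_';
    }

    public static String encode(String s) {
        // Fast path: scan once; if nothing needs encoding, return the
        // input itself and avoid allocating a new string.
        for (int i = 0; i < s.length(); i++) {
            if (!isUnchanged(s.charAt(i))) {
                // Slow path: defer to the regular encoder.
                return URLEncoder.encode(s, StandardCharsets.UTF_8);
            }
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(encode("plain-text_123")); // prints "plain-text_123"
        System.out.println(encode("a b&c"));          // prints "a+b%26c"
    }
}
```

A scan like this only pays off when a meaningful share of inputs pass through unchanged, which is consistent with the `testEncodeUTF8` rows at 75 and 100 in the `(unchanged)` column being the ones that improve.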

Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:

  Print out distribution of generated strings, improve precision at extremes (short strings, very low percentages), fix issue with decodable not getting the expected distribution

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/15448/files
  - new: https://git.openjdk.org/jdk/pull/15448/files/6706826a..665666a0

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=15448&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15448&range=01-02

  Stats: 64 lines in 1 file changed: 49 ins; 6 del; 9 mod
  Patch: https://git.openjdk.org/jdk/pull/15448.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/15448/head:pull/15448

PR: https://git.openjdk.org/jdk/pull/15448