RFR: 8316681: Rewrite URLEncoder.encode to use small reusable buffers
Claes Redestad
redestad at openjdk.org
Thu Sep 21 15:26:52 UTC 2023
On Thu, 21 Sep 2023 14:32:46 GMT, Claes Redestad <redestad at openjdk.org> wrote:
> `URLEncoder` currently appends chars that need encoding to a `java.io.CharArrayWriter`, converts that to a `String`, uses `String::getBytes` to get the encoded bytes, and then appends these bytes in escaped form to the output. This is somewhat inefficient.
>
> This PR replaces the `CharArrayWriter` with a reusable `CharBuffer` + `ByteBuffer` pair. This allows us to encode to the output `StringBuilder` in small chunks, with greatly reduced allocation as a result.
>
> The exact size of the buffers is an open question, but generally it seems that a tiny buffer wins by virtue of allocating less, and that the per-chunk overheads are relatively small.
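
For readers who want to see the shape of the approach described above, here is a minimal sketch of chunked encoding through a reusable `CharBuffer` + `ByteBuffer` pair. It is not the code from the PR: the class name `ChunkedUrlEncodeSketch`, the helper `isSafe`, and the chunk size of 8 chars are all assumptions made for illustration, and charsets that emit a byte-order mark (such as UTF-16) are not handled.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

// Illustrative sketch only; names and sizes are hypothetical, not taken from the JDK.
final class ChunkedUrlEncodeSketch {
    private static final int CHUNK = 8; // assumed chunk size, in chars
    private static final String HEX = "0123456789ABCDEF";

    // Characters that pass through unescaped in application/x-www-form-urlencoded.
    static boolean isSafe(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9')
                || c == '-' || c == '_' || c == '.' || c == '*';
    }

    static String encode(String s, Charset cs) {
        StringBuilder out = new StringBuilder(s.length());
        CharsetEncoder enc = cs.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        // One small buffer pair, allocated once and reused for every chunk.
        CharBuffer cb = CharBuffer.allocate(CHUNK);
        ByteBuffer bb = ByteBuffer.allocate((int) Math.ceil(CHUNK * enc.maxBytesPerChar()));
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (isSafe(c)) { out.append(c); i++; continue; }
            if (c == ' ') { out.append('+'); i++; continue; }
            // Collect a run of chars that need escaping, at most CHUNK at a time.
            cb.clear();
            while (i < s.length() && cb.hasRemaining()) {
                c = s.charAt(i);
                if (isSafe(c) || c == ' ') break;
                if (Character.isHighSurrogate(c) && i + 1 < s.length()
                        && Character.isLowSurrogate(s.charAt(i + 1))) {
                    if (cb.remaining() < 2) break; // never split a surrogate pair
                    cb.put(c).put(s.charAt(i + 1));
                    i += 2;
                } else {
                    cb.put(c);
                    i++;
                }
            }
            cb.flip();
            bb.clear();
            enc.reset();
            enc.encode(cb, bb, true); // bb is sized so the whole chunk always fits
            enc.flush(bb);
            bb.flip();
            // Append each encoded byte as %XX directly to the output.
            while (bb.hasRemaining()) {
                int b = bb.get() & 0xFF;
                out.append('%').append(HEX.charAt(b >> 4)).append(HEX.charAt(b & 0xF));
            }
        }
        return out.toString();
    }
}
```

With this sketch, `ChunkedUrlEncodeSketch.encode("a b", StandardCharsets.UTF_8)` yields `a+b`. No intermediate `String` or `byte[]` is created per escaped run; the only per-call allocations are the encoder, the two small buffers, and the output builder.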
Micros show a small throughput win and a large allocation reduction for variants that need to change the URL either partially or completely, and no regression when the URL remains unchanged:
Name                              (unchanged)    Cnt         Base     Error         Test     Error  Unit     Diff%
URLEncodeDecode.testEncodeLatin1            0     15        3.471   ± 0.103        2.796   ± 0.078  ms/op    19.5% (p = 0.000*)
  :gc.alloc.rate                          N/A     15      828.462  ± 25.054      673.090  ± 19.214  MB/sec  -18.8% (p = 0.000*)
  :gc.alloc.rate.norm                     N/A     15  3013680.062   ± 0.721  1972347.384   ± 0.540  B/op    -34.6% (p = 0.000*)
  :gc.count                               N/A     15       20.000                 17.000            counts
  :gc.time                                N/A     15       16.000                 15.000            ms
URLEncodeDecode.testEncodeLatin1           75     15        1.269   ± 0.028        1.132   ± 0.029  ms/op    10.8% (p = 0.000*)
  :gc.alloc.rate                          N/A     15      606.924  ± 12.993      443.802  ± 11.184  MB/sec  -26.9% (p = 0.000*)
  :gc.alloc.rate.norm                     N/A     15   807656.807   ± 0.191   526711.840   ± 0.197  B/op    -34.8% (p = 0.000*)
  :gc.count                               N/A     15       16.000                 11.000            counts
  :gc.time                                N/A  15/11       16.000                 10.000            ms
URLEncodeDecode.testEncodeLatin1          100     15        0.542   ± 0.000        0.542   ± 0.000  ms/op    -0.0% (p = 0.932 )
  :gc.alloc.rate                          N/A     15        0.007   ± 0.000        0.007   ± 0.000  MB/sec    0.0% (p = 0.358 )
  :gc.alloc.rate.norm                     N/A     15        3.730   ± 0.004        3.731   ± 0.001  B/op      0.0% (p = 0.356 )
  :gc.count                               N/A     15        0.000                  0.000            counts
URLEncodeDecode.testEncodeUTF8              0     15        3.469   ± 0.137        2.678   ± 0.023  ms/op    22.8% (p = 0.000*)
  :gc.alloc.rate                          N/A     15      843.593  ± 32.562      711.747   ± 6.147  MB/sec  -15.6% (p = 0.000*)
  :gc.alloc.rate.norm                     N/A     15  3065136.041   ± 0.948  1999098.562   ± 0.161  B/op    -34.8% (p = 0.000*)
  :gc.count                               N/A     15       22.000                 18.000            counts
  :gc.time                                N/A     15       20.000                 15.000            ms
URLEncodeDecode.testEncodeUTF8             75     15        1.337   ± 0.031        1.192   ± 0.107  ms/op    10.8% (p = 0.000*)
  :gc.alloc.rate                          N/A     15      586.133  ± 13.420      429.661  ± 36.595  MB/sec  -26.7% (p = 0.000*)
  :gc.alloc.rate.norm                     N/A     15   821529.273   ± 0.216   533888.255   ± 0.744  B/op    -35.0% (p = 0.000*)
  :gc.count                               N/A     15       15.000                 11.000            counts
  :gc.time                                N/A  15/11       10.000                 10.000            ms
URLEncodeDecode.testEncodeUTF8            100     15        0.542   ± 0.000        0.541   ± 0.000  ms/op     0.1% (p = 0.000*)
  :gc.alloc.rate                          N/A     15        0.007   ± 0.000        0.007   ± 0.000  MB/sec    0.0% (p = 0.771 )
  :gc.alloc.rate.norm                     N/A     15        3.731   ± 0.001        3.727   ± 0.002  B/op     -0.1% (p = 0.000*)
  :gc.count                               N/A     15        0.000                  0.000            counts
* = significant
Invariant parameters used by above microbenchmarks:
encodeChars: 6
maxLength : 1024
-------------
PR Comment: https://git.openjdk.org/jdk/pull/15865#issuecomment-1729715342