RFR: 8314774: Optimize URLEncoder [v8]

Thu Aug 24 12:14:27 UTC 2023

On Thu, 24 Aug 2023 10:38:57 GMT, Glavo <duke at openjdk.org> wrote:

>> I mainly made these optimizations:
>> 
>> * Avoid allocating `StringBuilder` when there are no characters in the URL that need to be encoded;
>> * Implement a fast path for UTF-8.
>> 
>> In addition to improving performance, these optimizations also reduce temporary objects:
>> 
>> * It no longer allocates any object when there are no characters in the URL that need to be encoded;
>> * The initial size of StringBuilder is larger to avoid expansion as much as possible;
>> * For UTF-8, the temporary `CharArrayWriter`, strings and byte arrays are no longer needed.
>> 
>> The results of the `URLEncodeDecode` benchmark:
>> 
>> 
>> Before:
>> Benchmark                       (count)  (maxLength)  (mySeed)  Mode  Cnt  Score   Error  Units
>> URLEncodeDecode.testEncodeUTF8     1024         1024         3  avgt   15  5.587 ± 0.010  ms/op
>> 
>> After:
>> Benchmark                       (count)  (maxLength)  (mySeed)  Mode  Cnt  Score   Error  Units
>> URLEncodeDecode.testEncodeUTF8     1024         1024         3  avgt   15  3.582 ± 0.054  ms/op
>> 
>> 
>> I also updated the tests to add more test cases.
>
> Glavo has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Remove UTF-8 fast path

Does your benchmark test a healthy mix of strings? Some that need encoding and some that don't, perhaps weighted so that most inputs need encoding only in the latter half. That case is common, since the protocol and host seldom need encoding.

For strings that don't need encoding at all, this optimization alone should get you close to the numbers for the full thing.
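
To illustrate the no-encoding case, here is a minimal sketch, not the code from the PR. The class and method names (`FastPathSketch`, `isSafe`, `firstUnsafe`) are made up for this example. It assumes the usual `URLEncoder` rules: ASCII letters, digits and `.`, `-`, `*`, `_` are copied as-is, and everything else (including space, which becomes `+`) changes the output.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class FastPathSketch {
    // Hypothetical helper: true for chars that URLEncoder copies unchanged.
    static boolean isSafe(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9')
            || c == '.' || c == '-' || c == '*' || c == '_';
    }

    // Index of the first char that would change, or -1 if there is none.
    static int firstUnsafe(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!isSafe(s.charAt(i))) {
                return i;
            }
        }
        return -1;
    }

    static String encode(String s) {
        if (firstUnsafe(s) < 0) {
            return s; // nothing to encode: return the input, allocate nothing
        }
        // Slow path: delegate to the existing JDK implementation.
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }
}
```

With this shape, an input such as `"example.com"` comes back as the same `String` instance, so the cost is a single scan.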

The heuristic to size the `StringBuilder` could perhaps discount the chars we copy 1:1 to reduce allocation pressure (`i + ((s.length() - i) << 1)`).
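
A small sketch of that sizing idea, assuming `i` is the index of the first char that needs encoding, so the first `i` chars are copied unchanged and only the remainder is given room to grow. `CapacitySketch` and `initialCapacity` are hypothetical names, not part of the PR.

```java
class CapacitySketch {
    // Capacity guess: i chars copied 1:1, plus twice the remaining length.
    static int initialCapacity(String s, int i) {
        return i + ((s.length() - i) << 1);
    }

    public static void main(String[] args) {
        String s = "example.com/a b";   // 15 chars, first unsafe char '/' at index 11
        int guess = initialCapacity(s, 11);
        int doubled = s.length() << 1;  // sizing that ignores the safe prefix
        System.out.println(guess + " vs " + doubled); // prints "19 vs 30"
    }
}
```

The longer the untouched prefix, the closer the estimate stays to the input length instead of doubling it. It is only an initial capacity: the builder still grows if the tail expands by more than a factor of two.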

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15354#issuecomment-1691559924