RFR: 8314774: Optimize URLEncoder [v8]

Thu Aug 24 23:18:09 UTC 2023

On Thu, 24 Aug 2023 10:38:57 GMT, Glavo <duke at openjdk.org> wrote:

>> I mainly made these optimizations:
>> 
>> * Avoid allocating `StringBuilder` when there are no characters in the URL that need to be encoded;
>> * Implement a fast path for UTF-8.
>> 
>> In addition to improving performance, these optimizations also reduce temporary objects:
>> 
>> * It no longer allocates any object when there are no characters in the URL that need to be encoded;
>> * The initial size of StringBuilder is larger to avoid expansion as much as possible;
>> * For UTF-8, the temporary `CharArrayWriter`, strings and byte arrays are no longer needed.
>> 
>> The results of the `URLEncodeDecode` benchmark:
>> 
>> 
>> Before:
>> Benchmark                       (count)  (maxLength)  (mySeed)  Mode  Cnt  Score   Error  Units
>> URLEncodeDecode.testEncodeUTF8     1024         1024         3  avgt   15  5.587 ± 0.010  ms/op
>> 
>> After:
>> Benchmark                       (count)  (maxLength)  (mySeed)  Mode  Cnt  Score   Error  Units
>> URLEncodeDecode.testEncodeUTF8     1024         1024         3  avgt   15  3.582 ± 0.054  ms/op
>> 
>> 
>> I also updated the tests to add more test cases.
>
> Glavo has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Remove UTF-8 fast path
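The UTF-8 fast path is gone in this revision, but the idea in the quoted description (no temporary `CharArrayWriter`, `String` or `byte[]` for UTF-8) can be illustrated roughly as follows. This is a sketch under stated assumptions, not the PR's code: the class and method names are made up, and it assumes the `java.net.URLEncoder` rules that `a-z`, `A-Z`, `0-9`, `-`, `_`, `.` and `*` pass through, space becomes `+`, and everything else becomes uppercase `%XX` escapes of its UTF-8 bytes.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class Utf8PercentEncoder {

    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Characters URLEncoder passes through unchanged: a-z, A-Z, 0-9, '-', '_', '.', '*'.
    static boolean passesThrough(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
                || c == '-' || c == '_' || c == '.' || c == '*';
    }

    // Appends one byte (0..255) as "%XX" with uppercase hex digits.
    private static void appendPercent(StringBuilder sb, int b) {
        sb.append('%').append(HEX[(b >> 4) & 0xF]).append(HEX[b & 0xF]);
    }

    // Hypothetical helper: computes the UTF-8 bytes by hand and escapes them straight
    // into one StringBuilder, instead of CharArrayWriter -> String -> getBytes().
    static String encodeUtf8(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16); // some slack for escapes
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i++);
            if (passesThrough(c)) {
                sb.append(c);
                continue;
            }
            if (c == ' ') {
                sb.append('+');
                continue;
            }
            int cp = c;
            if (Character.isHighSurrogate(c) && i < s.length()
                    && Character.isLowSurrogate(s.charAt(i))) {
                cp = Character.toCodePoint(c, s.charAt(i++));
            } else if (Character.isSurrogate(c)) {
                cp = '?'; // unpaired surrogate: String.getBytes(UTF_8) also substitutes '?'
            }
            if (cp < 0x80) {
                appendPercent(sb, cp);
            } else if (cp < 0x800) {
                appendPercent(sb, 0xC0 | (cp >> 6));
                appendPercent(sb, 0x80 | (cp & 0x3F));
            } else if (cp < 0x10000) {
                appendPercent(sb, 0xE0 | (cp >> 12));
                appendPercent(sb, 0x80 | ((cp >> 6) & 0x3F));
                appendPercent(sb, 0x80 | (cp & 0x3F));
            } else {
                appendPercent(sb, 0xF0 | (cp >> 18));
                appendPercent(sb, 0x80 | ((cp >> 12) & 0x3F));
                appendPercent(sb, 0x80 | ((cp >> 6) & 0x3F));
                appendPercent(sb, 0x80 | (cp & 0x3F));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "h\u00e9llo w\u00f6rld [\uD83D\uDE00]";
        System.out.println(encodeUtf8(s)); // h%C3%A9llo+w%C3%B6rld+%5B%F0%9F%98%80%5D
        System.out.println(encodeUtf8(s).equals(URLEncoder.encode(s, StandardCharsets.UTF_8))); // true
    }
}
```

Comparing against `URLEncoder.encode(s, StandardCharsets.UTF_8)` is the easy way to check such a hand-rolled path stays behavior-compatible.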

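I took a look at the `URLEncodeDecode` micro and noticed that it was generating strings that required a lot of encoding, seemingly by accident: the token table is not filled in the way a quick read of the code suggests. The lowercase loop runs from `'a'` to `'<'`, which is an empty range because `'<'` (60) comes before `'a'` (97), so 26 of the 66 slots in `tokens` are never set and stay `'\0'`, which `URLEncoder` encodes as `%00`.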
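A quick way to confirm the skew is to rebuild the unpatched token table and ask `URLEncoder` which tokens it changes. This is a standalone sketch; the class and method names are made up for illustration, and the loop bounds are copied from the original `setupStrings()`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class TokenTableCheck {

    // Rebuilds the token table the way the unpatched setupStrings() did.
    static char[] originalTokens() {
        char[] tokens = new char[((int) 'Z' - (int) 'A' + 1) + ((int) 'z' - (int) 'a' + 1)
                + ((int) '9' - (int) '1' + 1) + 5];
        int n = 0;
        tokens[n++] = '0';
        for (int i = (int) '1'; i <= (int) '9'; i++) {
            tokens[n++] = (char) i;
        }
        for (int i = (int) 'A'; i <= (int) 'Z'; i++) {
            tokens[n++] = (char) i;
        }
        for (int i = (int) 'a'; i <= (int) '<'; i++) { // 'a' is 97, '<' is 60: never runs
            tokens[n++] = (char) i;
        }
        tokens[n++] = '-';
        tokens[n++] = '_';
        tokens[n++] = '.';
        tokens[n++] = '*';
        return tokens; // slots n..length-1 are still '\0'
    }

    // Counts tokens that URLEncoder does not pass through unchanged.
    static int countNeedingEncoding(char[] tokens) {
        int count = 0;
        for (char c : tokens) {
            String s = String.valueOf(c);
            if (!URLEncoder.encode(s, StandardCharsets.UTF_8).equals(s)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        char[] tokens = originalTokens();
        int bad = countNeedingEncoding(tokens);
        System.out.printf(Locale.ROOT, "%d of %d tokens need encoding (%.1f%%)%n",
                bad, tokens.length, 100.0 * bad / tokens.length);
        // prints: 26 of 66 tokens need encoding (39.4%)
    }
}
```

Since chars are drawn uniformly from `tokens`, roughly 39% of every generated string ends up as `%00` escapes.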

I patched this up as follows, letting a random subset of strings be generated with chars that need encoding while most stay plain:

```diff
diff --git a/test/micro/org/openjdk/bench/java/net/URLEncodeDecode.java b/test/micro/org/openjdk/bench/java/net/URLEncodeDecode.java
index 1bd98f9ed52..6b4ea3aaaca 100644
--- a/test/micro/org/openjdk/bench/java/net/URLEncodeDecode.java
+++ b/test/micro/org/openjdk/bench/java/net/URLEncodeDecode.java
@@ -37,6 +37,8 @@

 import java.io.UnsupportedEncodingException;
 import java.net.URLDecoder;
+import java.nio.charset.StandardCharsets;
+import java.util.Arrays;
 import java.util.Random;
 import java.util.concurrent.TimeUnit;

@@ -66,22 +68,22 @@ public class URLEncodeDecode {

     @Setup
     public void setupStrings() {
-        char[] tokens = new char[((int) 'Z' - (int) 'A' + 1) + ((int) 'z' - (int) 'a' + 1) + ((int) '9' - (int) '1' + 1) + 5];
+        char[] tokens = new char[((int) 'Z' - (int) 'A' + 1) + ((int) 'z' - (int) 'a' + 1) + ((int) '9' - (int) '0' + 1) + 4];
         int n = 0;
-        tokens[n++] = '0';
-        for (int i = (int) '1'; i <= (int) '9'; i++) {
+        for (int i = (int) '0'; i <= (int) '9'; i++) {
             tokens[n++] = (char) i;
         }
         for (int i = (int) 'A'; i <= (int) 'Z'; i++) {
             tokens[n++] = (char) i;
         }
-        for (int i = (int) 'a'; i <= (int) '<'; i++) {
+        for (int i = (int) 'a'; i <= (int) 'z'; i++) {
             tokens[n++] = (char) i;
         }
         tokens[n++] = '-';
         tokens[n++] = '_';
         tokens[n++] = '.';
         tokens[n++] = '*';
+        System.out.println(Arrays.toString(tokens));

         Random r = new Random(mySeed);
         testStringsEncode = new String[count];
@@ -89,10 +91,15 @@ public void setupStrings() {
         toStrings = new String[count];
         for (int i = 0; i < count; i++) {
             int l = r.nextInt(maxLength);
+            boolean needEncoding = r.nextInt(100) <= 15;
             StringBuilder sb = new StringBuilder();
             for (int j = 0; j < l; j++) {
                 int c = r.nextInt(tokens.length);
-                sb.append(tokens[c]);
+                if (needEncoding && r.nextInt(100) <= 3) {
+                    sb.append('[');
+                } else {
+                    sb.append(tokens[c]);
+                }
             }
             testStringsEncode[i] = sb.toString();
         }
@@ -115,14 +122,14 @@ public void setupStrings() {
     @Benchmark
     public void testEncodeUTF8(Blackhole bh) throws UnsupportedEncodingException {
         for (String s : testStringsEncode) {
-            bh.consume(java.net.URLEncoder.encode(s, "UTF-8"));
+            bh.consume(java.net.URLEncoder.encode(s, StandardCharsets.UTF_8));
         }
     }

     @Benchmark
     public void testDecodeUTF8(Blackhole bh) throws UnsupportedEncodingException {
         for (String s : testStringsDecode) {
-            bh.consume(URLDecoder.decode(s, "UTF-8"));
+            bh.consume(URLDecoder.decode(s, StandardCharsets.UTF_8));
         }
     }
```


With this, a simple fast path for strings that need no encoding is enough to give roughly a 2.7x speedup and a more than 50% reduction in allocations:

Name                            (count)  (maxLength)  (mySeed)  Cnt   Base   Error   Test   Error  Unit   Diff%
URLEncodeDecode.testEncodeUTF8     1024         1024         3   15  2.862 ± 0.395  1.068 ± 0.028  ms/op  62.7% (p = 0.000*)


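Before:
Benchmark                                           (count)  (maxLength)  (mySeed)  Mode  Cnt        Score        Error  Units
URLEncodeDecode.testEncodeUTF8:·gc.alloc.rate.norm     1024         1024         3  avgt   15  1072598.651 ± 12806.440   B/op

After:
Benchmark                                           (count)  (maxLength)  (mySeed)  Mode  Cnt        Score        Error  Units
URLEncodeDecode.testEncodeUTF8:·gc.alloc.rate.norm     1024         1024         3  avgt   15   492086.107 ±  1751.752   B/op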
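The no-encoding-needed fast path mentioned above boils down to one scan before any allocation. Here is a minimal sketch of the idea as a wrapper around the public `URLEncoder.encode(String, Charset)`; the class name is hypothetical and this is not the PR's code.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FastPathEncoder {

    // Characters URLEncoder passes through unchanged. Space is excluded on purpose:
    // it is rewritten to '+', so a string containing one still needs a new String.
    static boolean passesThrough(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
                || c == '-' || c == '_' || c == '.' || c == '*';
    }

    // Scan once; only fall back to the full encoder (StringBuilder, buffers, ...)
    // if some char actually needs work.
    static String encode(String s, Charset charset) {
        for (int i = 0; i < s.length(); i++) {
            if (!passesThrough(s.charAt(i))) {
                return URLEncoder.encode(s, charset);
            }
        }
        return s; // nothing to encode: return the input itself, allocating nothing
    }

    public static void main(String[] args) {
        String plain = "abc-123_XYZ.*";
        System.out.println(encode(plain, StandardCharsets.UTF_8) == plain); // true (same instance)
        System.out.println(encode("a b[c]", StandardCharsets.UTF_8));       // a+b%5Bc%5D
    }
}
```

A version inside `URLEncoder` itself would more likely reuse the already-scanned clean prefix instead of rescanning from the start; the wrapper just shows where the allocation savings for plain strings come from.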


Fixing the microbenchmark of course means we do much less actual encoding, which downplays both the cost of encoding and the potential win from speeding it up. What balance is realistic is a good question, but the unfixed micro only saw strings that needed encoding, and roughly 40% of the chars in those strings needed a three-char `%XX` encoding, which is extremely skewed.
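For comparison, a back-of-the-envelope view of the patched mix: `r.nextInt(100) <= 15` is true for 16 of its 100 outcomes and `r.nextInt(100) <= 3` for 4, so about 16% of strings are eligible for encoding and about 0.16 × 0.04 ≈ 0.64% of all chars need it. The sketch below (hypothetical class name) mirrors the patched generation loop with the benchmark's parameters (count = 1024, maxLength = 1024, mySeed = 3) to check this empirically.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Random;

public class PatchedMixEstimate {

    // Fixed token table from the patch: 0-9, A-Z, a-z, then "-_.*" (66 chars).
    private static final String TOKENS =
            "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-_.*";

    // Mirrors the patched loop that fills testStringsEncode.
    static String[] generate(int count, int maxLength, long seed) {
        Random r = new Random(seed);
        String[] out = new String[count];
        for (int i = 0; i < count; i++) {
            int l = r.nextInt(maxLength);
            boolean needEncoding = r.nextInt(100) <= 15; // true for 16 of 100 values
            StringBuilder sb = new StringBuilder();
            for (int j = 0; j < l; j++) {
                int c = r.nextInt(TOKENS.length());
                if (needEncoding && r.nextInt(100) <= 3) { // true for 4 of 100 values
                    sb.append('[');
                } else {
                    sb.append(TOKENS.charAt(c));
                }
            }
            out[i] = sb.toString();
        }
        return out;
    }

    // Returns {share of strings URLEncoder changes, share of chars that need escaping}.
    static double[] mix(String[] strings) {
        int changed = 0;
        long chars = 0;
        long escaped = 0;
        for (String s : strings) {
            if (!URLEncoder.encode(s, StandardCharsets.UTF_8).equals(s)) {
                changed++;
            }
            chars += s.length();
            for (int k = 0; k < s.length(); k++) {
                if (s.charAt(k) == '[') { // the only generated char that needs escaping
                    escaped++;
                }
            }
        }
        return new double[] {(double) changed / strings.length, (double) escaped / chars};
    }

    public static void main(String[] args) {
        double[] m = mix(generate(1024, 1024, 3));
        System.out.printf(Locale.ROOT, "strings needing encoding: %.1f%%, chars needing encoding: %.2f%%%n",
                100 * m[0], 100 * m[1]);
    }
}
```

The string share comes out a little under 16%, because a short eligible string may not draw any `[` at all.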

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15354#issuecomment-1692531267