RFR: 8314774: Optimize URLEncoder [v8]
Claes Redestad
redestad at openjdk.org
Thu Aug 24 23:18:09 UTC 2023
On Thu, 24 Aug 2023 10:38:57 GMT, Glavo <duke at openjdk.org> wrote:
>> I mainly made these optimizations:
>>
>> * Avoid allocating `StringBuilder` when there are no characters in the URL that need to be encoded;
>> * Implement a fast path for UTF-8.
>>
>> In addition to improving performance, these optimizations also reduce temporary objects:
>>
>> * It no longer allocates any object when there are no characters in the URL that need to be encoded;
>> * The initial size of StringBuilder is larger to avoid expansion as much as possible;
>> * For UTF-8, the temporary `CharArrayWriter`, strings and byte arrays are no longer needed.
>>
>> The results of the `URLEncodeDecode` benchmark:
>>
>>
>> Before:
>> Benchmark (count) (maxLength) (mySeed) Mode Cnt Score Error Units
>> URLEncodeDecode.testEncodeUTF8 1024 1024 3 avgt 15 5.587 ± 0.010 ms/op
>>
>> After:
>> Benchmark (count) (maxLength) (mySeed) Mode Cnt Score Error Units
>> URLEncodeDecode.testEncodeUTF8 1024 1024 3 avgt 15 3.582 ± 0.054 ms/op
>>
>>
>> I also updated the tests to add more test cases.
>
> Glavo has updated the pull request incrementally with one additional commit since the last revision:
>
> Remove UTF-8 fast path
I took a look at the `URLEncodeDecode` micro and noticed that it was generating strings that require a lot of encoding, seemingly by accident: most of the tokens are not actually set, contrary to what a quick read of the code would suggest.
I patched this up as follows, so that a random subset of the strings is generated with chars that need encoding while most remain plain:
```diff
diff --git a/test/micro/org/openjdk/bench/java/net/URLEncodeDecode.java b/test/micro/org/openjdk/bench/java/net/URLEncodeDecode.java
index 1bd98f9ed52..6b4ea3aaaca 100644
--- a/test/micro/org/openjdk/bench/java/net/URLEncodeDecode.java
+++ b/test/micro/org/openjdk/bench/java/net/URLEncodeDecode.java
@@ -37,6 +37,8 @@
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
+import java.nio.charset.StandardCharsets;
+import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.TimeUnit;
@@ -66,22 +68,22 @@ public class URLEncodeDecode {
@Setup
public void setupStrings() {
- char[] tokens = new char[((int) 'Z' - (int) 'A' + 1) + ((int) 'z' - (int) 'a' + 1) + ((int) '9' - (int) '1' + 1) + 5];
+ char[] tokens = new char[((int) 'Z' - (int) 'A' + 1) + ((int) 'z' - (int) 'a' + 1) + ((int) '9' - (int) '0' + 1) + 4];
int n = 0;
- tokens[n++] = '0';
- for (int i = (int) '1'; i <= (int) '9'; i++) {
+ for (int i = (int) '0'; i <= (int) '9'; i++) {
tokens[n++] = (char) i;
}
for (int i = (int) 'A'; i <= (int) 'Z'; i++) {
tokens[n++] = (char) i;
}
- for (int i = (int) 'a'; i <= (int) '<'; i++) {
+ for (int i = (int) 'a'; i <= (int) 'z'; i++) {
tokens[n++] = (char) i;
}
tokens[n++] = '-';
tokens[n++] = '_';
tokens[n++] = '.';
tokens[n++] = '*';
+ System.out.println(Arrays.toString(tokens));
Random r = new Random(mySeed);
testStringsEncode = new String[count];
@@ -89,10 +91,15 @@ public void setupStrings() {
toStrings = new String[count];
for (int i = 0; i < count; i++) {
int l = r.nextInt(maxLength);
+ boolean needEncoding = r.nextInt(100) <= 15;
StringBuilder sb = new StringBuilder();
for (int j = 0; j < l; j++) {
int c = r.nextInt(tokens.length);
- sb.append(tokens[c]);
+ if (needEncoding && r.nextInt(100) <= 3) {
+ sb.append('[');
+ } else {
+ sb.append(tokens[c]);
+ }
}
testStringsEncode[i] = sb.toString();
}
@@ -115,14 +122,14 @@ public void setupStrings() {
@Benchmark
public void testEncodeUTF8(Blackhole bh) throws UnsupportedEncodingException {
for (String s : testStringsEncode) {
- bh.consume(java.net.URLEncoder.encode(s, "UTF-8"));
+ bh.consume(java.net.URLEncoder.encode(s, StandardCharsets.UTF_8));
}
}
@Benchmark
public void testDecodeUTF8(Blackhole bh) throws UnsupportedEncodingException {
for (String s : testStringsDecode) {
- bh.consume(URLDecoder.decode(s, "UTF-8"));
+ bh.consume(URLDecoder.decode(s, StandardCharsets.UTF_8));
}
}
```
With this, a simple no-encoding-needed fast path is enough to give a roughly 2.7x speedup and a roughly 54% allocation reduction:
Name (count) (maxLength) (mySeed) Cnt Base Error Test Error Unit Diff%
URLEncodeDecode.testEncodeUTF8 1024 1024 3 15 2.862 ± 0.395 1.068 ± 0.028 ms/op 62.7% (p = 0.000*)
URLEncodeDecode.testEncodeUTF8:·gc.alloc.rate.norm 1024 1024 3 avgt 15 1072598.651 ± 12806.440 B/op
URLEncodeDecode.testEncodeUTF8:·gc.alloc.rate.norm 1024 1024 3 avgt 15 492086.107 ± 1751.752 B/op
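For readers who want to check the arithmetic behind the table, here is a small sketch that derives the time reduction, the speedup factor and the allocation reduction from the reported scores. It assumes the first `gc.alloc.rate.norm` row is the baseline and the second is the patched build, which the table does not state explicitly; the class and method names are made up for illustration.

```java
import java.util.Locale;

// Hypothetical helper, not part of the benchmark: recomputes the deltas from the reported scores.
final class BenchmarkDeltaSketch {

    // Percentage by which `test` is lower than `base`.
    static double percentReduction(double base, double test) {
        return (base - test) / base * 100.0;
    }

    // How many times faster `test` is than `base` (for time-per-op scores).
    static double speedupFactor(double base, double test) {
        return base / test;
    }

    public static void main(String[] args) {
        double baseMs = 2.862, testMs = 1.068;                 // ms/op from the table
        double baseBytes = 1072598.651, testBytes = 492086.107; // B/op; baseline/patched order is assumed

        System.out.println(String.format(Locale.ROOT, "time reduction:  %.1f%%",
                percentReduction(baseMs, testMs)));       // 62.7%
        System.out.println(String.format(Locale.ROOT, "speedup factor:  %.2fx",
                speedupFactor(baseMs, testMs)));          // 2.68x
        System.out.println(String.format(Locale.ROOT, "alloc reduction: %.1f%%",
                percentReduction(baseBytes, testBytes))); // 54.1%
    }
}
```

The 62.7% figure matches the `Diff%` column reported for `testEncodeUTF8`.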
Fixing the microbenchmark of course means we don't do as much actual encoding, which downplays both the cost and the potential win. What balance is realistic is a good question, but the unfixed micro only saw strings that needed encoding, and more than 40% of the chars in those strings needed a triple-char encoding, which is extremely skewed.
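To make the idea of a no-encoding-needed fast path concrete, here is a minimal sketch. It is not the code from the PR: it is a hypothetical wrapper (`FastPathSketch` and `isUnreserved` are invented names) around the public `URLEncoder.encode(String, Charset)` API that scans the input once and returns the same `String` instance when nothing needs encoding.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical wrapper for illustration only; the actual change lives inside URLEncoder itself.
final class FastPathSketch {

    // Characters that URLEncoder passes through unchanged:
    // 'a'-'z', 'A'-'Z', '0'-'9', '-', '_', '.', '*'.
    // A space is deliberately excluded because it must be rewritten to '+'.
    static boolean isUnreserved(char c) {
        return (c >= 'a' && c <= 'z')
            || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9')
            || c == '-' || c == '_' || c == '.' || c == '*';
    }

    static String encode(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!isUnreserved(s.charAt(i))) {
                // Slow path: at least one char needs encoding, so delegate to the JDK.
                return URLEncoder.encode(s, StandardCharsets.UTF_8);
            }
        }
        // Fast path: nothing to encode, so return the input without allocating anything.
        return s;
    }

    public static void main(String[] args) {
        String plain = "abc-XYZ_019.*";
        System.out.println(encode(plain) == plain); // true: same instance returned
        System.out.println(encode("a[b"));          // a%5Bb
    }
}
```

With the patched benchmark most strings contain no `[`, so in a scheme like this most calls would return from the scan loop without touching a `StringBuilder`.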
-------------
PR Comment: https://git.openjdk.org/jdk/pull/15354#issuecomment-1692531267