Performance degradation due to probable (?) C2 issue
Сергей Цыпанов
sergei.tsypanov at yandex.ru
Tue Jul 28 10:35:35 UTC 2020
Hello,
I've run into a strange issue while trying to improve java.net.URLEncoder.encode() for the case where the URL contains non-ASCII (UTF-8) characters.
The idea of the fix is to replace the contents of line 276
String str = new String(charArrayWriter.toCharArray());
with
String str = charArrayWriter.toString();
CharArrayWriter.toCharArray() allocates a copy of the underlying char[] which is then passed into the String constructor,
while CharArrayWriter.toString() passes the char[] to the String constructor directly. In theory this should give us
a certain improvement in both time and memory, as we don't allocate a redundant char[]. To verify it I've used a benchmark
encoding the link to the article about the UN in the Russian Wikipedia:
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Fork(jvmArgsAppend = {"-Xms2g", "-Xmx2g", "-XX:+UseParallelGC"})
public class UrlEncoderBenchmark {
    private final Charset charset = Charset.defaultCharset();
    private final String utf8Url = "https://ru.wikipedia.org/wiki/Организация_Объединённых_Наций";

    @Benchmark
    public String encodeUtf8() {
        return URLEncoder.encode(utf8Url, charset);
    }
}
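For clarity, the allocation difference between the two lines boils down to the following minimal standalone sketch (the comments describe the JDK implementation as I read it):

```java
import java.io.CharArrayWriter;

public class CopyDemo {
    public static void main(String[] args) {
        CharArrayWriter writer = new CharArrayWriter();
        writer.write("Организация", 0, 11);

        // toCharArray() returns Arrays.copyOf(buf, count): one extra char[]
        // allocation before the String constructor copies the chars again.
        String viaCharArray = new String(writer.toCharArray());

        // toString() is implemented as new String(buf, 0, count): the internal
        // buffer goes straight to the String constructor, so the intermediate
        // char[] copy is skipped.
        String viaToString = writer.toString();

        System.out.println(viaCharArray.equals(viaToString)); // prints "true"
    }
}
```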
In practice it turned out that we win only in the interpreter and at tier 1:
Benchmark Mode Cnt Score Error Units
-Xint before
UrlEncoderBenchmark.encodeUtf8 avgt 100 179.905 ± 2.498 us/op
UrlEncoderBenchmark.encodeUtf8:·gc.alloc.rate.norm avgt 100 1712.752 ± 0.542 B/op
-Xint after
UrlEncoderBenchmark.encodeUtf8 avgt 100 173.323 ± 3.459 us/op
UrlEncoderBenchmark.encodeUtf8:·gc.alloc.rate.norm avgt 100 1552.409 ± 0.339 B/op
-XX:TieredStopAtLevel=1 before
UrlEncoderBenchmark.encodeUtf8 avgt 100 3.846 ± 0.021 us/op
UrlEncoderBenchmark.encodeUtf8:·gc.alloc.rate.norm avgt 100 1712.271 ± 0.011 B/op
-XX:TieredStopAtLevel=1 after
UrlEncoderBenchmark.encodeUtf8 avgt 100 3.732 ± 0.013 us/op
UrlEncoderBenchmark.encodeUtf8:·gc.alloc.rate.norm avgt 100 1552.246 ± 0.014 B/op
Here we see that we indeed consume less time and memory. However, in the case of full compilation we have a severe degradation (+30%)
in time consumption, while for memory we still have the same improvement:
before
UrlEncoderBenchmark.encodeUtf8 avgt 100 1108.668 ± 6.226 ns/op
UrlEncoderBenchmark.encodeUtf8:·gc.alloc.rate.norm avgt 100 1712.202 ± 0.003 B/op
after
UrlEncoderBenchmark.encodeUtf8 avgt 100 1454.647 ± 6.067 ns/op
UrlEncoderBenchmark.encodeUtf8:·gc.alloc.rate.norm avgt 100 1528.219 ± 0.007 B/op
As the inlining log shows, in the second case something goes wrong:
Compilation before
@ 186 java.io.CharArrayWriter::flush (1 bytes) inline (hot)
!m @ 195 java.io.CharArrayWriter::toCharArray (26 bytes) inline (hot)
@ 15 java.util.Arrays::copyOf (19 bytes) inline (hot)
@ 11 java.lang.Math::min (11 bytes) (intrinsic)
@ 14 java.lang.System::arraycopy (0 bytes) (intrinsic)
@ 198 java.lang.String::<init> (10 bytes) inline (hot)
@ 6 java.lang.String::<init> (74 bytes) inline (hot)
@ 1 java.lang.Object::<init> (1 bytes) inline (hot)
@ 36 java.lang.StringUTF16::compress (20 bytes) inline (hot)
@ 9 java.lang.StringUTF16::compress (50 bytes) (intrinsic)
@ 67 java.lang.StringUTF16::toBytes (34 bytes) (intrinsic)
Compilation after
@ 186 java.io.CharArrayWriter::flush (1 bytes) inline (hot)
!m @ 191 java.io.CharArrayWriter::toString (31 bytes) already compiled into a big method <----------------
@ 199 java.lang.String::getBytes (25 bytes) inline (hot)
@ 14 java.lang.String::coder (15 bytes) inline (hot)
! @ 21 java.lang.StringCoding::encode (324 bytes) inline (hot)
@ 10 java.lang.StringCoding::encodeUTF8 (132 bytes) inline (hot)
@ 7 java.lang.StringCoding::encodeUTF8_UTF16 (369 bytes) hot method too big <----------------
@ 15 java.lang.StringCoding::hasNegatives (25 bytes) (intrinsic)
@ 24 java.util.Arrays::copyOf (19 bytes) inline (hot)
@ 11 java.lang.Math::min (11 bytes) (intrinsic)
@ 14 java.lang.System::arraycopy (0 bytes) (intrinsic)
And in the compilation log for the patched case I have this entry:
<method id='1166' holder='1154' name='toString' return='1032' flags='1' bytes='31' compile_id='1062' compiler='c2' level='4' iicount='11163'/>
<dependency type='unique_concrete_method' ctxk='1154' x='1166'/>
<call method='1166' count='75859' prof_factor='1,000000' inline='1'/>
<inline_fail reason='already compiled into a big method'/>
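If I read that right, 'already compiled into a big method' means C2 refused to inline toString() because its standalone compiled code exceeded the -XX:InlineSmallCode threshold (default around 2000-2500 depending on JDK version), and 'hot method too big' for encodeUTF8_UTF16 (369 bytes) points at -XX:FreqInlineSize (default 325). One way to test this hypothesis would be to re-run the patched benchmark with the thresholds raised; a sketch, assuming the usual JMH uber-jar name:

```shell
# Raise C2 inlining thresholds to check whether the regression disappears
# (flag defaults vary by JDK version and platform; JMH forks inherit the
# host JVM arguments by default).
java -XX:InlineSmallCode=4000 -XX:FreqInlineSize=400 \
     -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining \
     -jar benchmarks.jar UrlEncoderBenchmark.encodeUtf8
```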
The inlining failure is consistent with the perfasm profiling results:
- for the original code we have only 1 hot region
....................................................................................................
62.29% <total for region 1>
....[Hottest Regions]...............................................................................
62.29% c2, level 4 java.net.URLEncoder::encode, version 1032 (1487 bytes)
- for the patched code we have 2 hot regions:
....[Hottest Region 1]..............................................................................
c2, level 4, java.net.URLEncoder::encode, version 1019 (1467 bytes)
....................................................................................................
61.44% <total for region 1>
....[Hottest Region 2]..............................................................................
c2, level 4, java.net.URLEncoder::encode, version 1019 (1048 bytes)
....................................................................................................
10.90% <total for region 2>
So my question is: is there something wrong with the compiler, or was the original idea of the improvement wrong?
Here are some attachments in case they are useful:
1. Output of LinuxPerfAsmProfiler for original code: https://gist.github.com/stsypanov/6bcd95fd9fbe79afc5f29db929e517f1
2. Output of LinuxPerfAsmProfiler for patched code: https://gist.github.com/stsypanov/794c0b4fdb13bad9fcb7fc890cec3dc8
Regards,
Sergey Tsypanov