Performance degradation due to probable (?) C2 issue
Сергей Цыпанов
sergei.tsypanov at yandex.ru
Tue Jul 28 10:35:35 UTC 2020
Hello,
I've run into a strange issue while trying to improve java.net.URLEncoder.encode() for the case where the URL contains non-ASCII (UTF-8) characters.
The idea of the fix is to replace the contents of line 276
String str = new String(charArrayWriter.toCharArray());
with
String str = charArrayWriter.toString();
CharArrayWriter.toCharArray() allocates a copy of the underlying char[] which is then passed into the String constructor,
while CharArrayWriter.toString() passes the char[] to the String constructor directly. In theory this should give us
a certain improvement in both time and memory, as we don't allocate a redundant char[]. To verify it I've used a benchmark
encoding the link to the article about the UN in the Russian Wikipedia:
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Fork(jvmArgsAppend = {"-Xms2g", "-Xmx2g", "-XX:+UseParallelGC"})
public class UrlEncoderBenchmark {
    private final Charset charset = Charset.defaultCharset();
    private final String utf8Url = "https://ru.wikipedia.org/wiki/Организация_Объединённых_Наций";

    @Benchmark
    public String encodeUtf8() {
        return URLEncoder.encode(utf8Url, charset);
    }
}
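For clarity, the allocation difference between the two lines boils down to the following minimal standalone sketch (the comments describe the JDK implementation as I read it):

```java
import java.io.CharArrayWriter;

public class CopyDemo {
    public static void main(String[] args) {
        CharArrayWriter writer = new CharArrayWriter();
        writer.write("Организация", 0, 11);

        // toCharArray() returns Arrays.copyOf(buf, count): one extra char[]
        // allocation before the String constructor copies the chars again.
        String viaCharArray = new String(writer.toCharArray());

        // toString() is implemented as new String(buf, 0, count): the internal
        // buffer goes straight to the String constructor, so the intermediate
        // char[] copy is skipped.
        String viaToString = writer.toString();

        System.out.println(viaCharArray.equals(viaToString)); // prints "true"
    }
}
```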
In practice it turned out that we win only in the interpreter and at tier 1:
Benchmark Mode Cnt Score Error Units
-Xint before
UrlEncoderBenchmark.encodeUtf8 avgt 100 179.905 ± 2.498 us/op
UrlEncoderBenchmark.encodeUtf8:·gc.alloc.rate.norm avgt 100 1712.752 ± 0.542 B/op
-Xint after
UrlEncoderBenchmark.encodeUtf8 avgt 100 173.323 ± 3.459 us/op
UrlEncoderBenchmark.encodeUtf8:·gc.alloc.rate.norm avgt 100 1552.409 ± 0.339 B/op
-XX:TieredStopAtLevel=1 before
UrlEncoderBenchmark.encodeUtf8 avgt 100 3.846 ± 0.021 us/op
UrlEncoderBenchmark.encodeUtf8:·gc.alloc.rate.norm avgt 100 1712.271 ± 0.011 B/op
-XX:TieredStopAtLevel=1 after
UrlEncoderBenchmark.encodeUtf8 avgt 100 3.732 ± 0.013 us/op
UrlEncoderBenchmark.encodeUtf8:·gc.alloc.rate.norm avgt 100 1552.246 ± 0.014 B/op
Here we see that we indeed consume less time and memory. However, in the case of full compilation we have a severe degradation (+30%)
in time consumption, while for memory we still have the same improvement:
before
UrlEncoderBenchmark.encodeUtf8 avgt 100 1108.668 ± 6.226 ns/op
UrlEncoderBenchmark.encodeUtf8:·gc.alloc.rate.norm avgt 100 1712.202 ± 0.003 B/op
after
UrlEncoderBenchmark.encodeUtf8 avgt 100 1454.647 ± 6.067 ns/op
UrlEncoderBenchmark.encodeUtf8:·gc.alloc.rate.norm avgt 100 1528.219 ± 0.007 B/op
As the inlining log shows, in the second case something goes wrong:
Compilation before
@ 186 java.io.CharArrayWriter::flush (1 bytes) inline (hot)
!m @ 195 java.io.CharArrayWriter::toCharArray (26 bytes) inline (hot)
@ 15 java.util.Arrays::copyOf (19 bytes) inline (hot)
@ 11 java.lang.Math::min (11 bytes) (intrinsic)
@ 14 java.lang.System::arraycopy (0 bytes) (intrinsic)
@ 198 java.lang.String::<init> (10 bytes) inline (hot)
@ 6 java.lang.String::<init> (74 bytes) inline (hot)
@ 1 java.lang.Object::<init> (1 bytes) inline (hot)
@ 36 java.lang.StringUTF16::compress (20 bytes) inline (hot)
@ 9 java.lang.StringUTF16::compress (50 bytes) (intrinsic)
@ 67 java.lang.StringUTF16::toBytes (34 bytes) (intrinsic)
Compilation after
@ 186 java.io.CharArrayWriter::flush (1 bytes) inline (hot)
!m @ 191 java.io.CharArrayWriter::toString (31 bytes) already compiled into a big method <----------------
@ 199 java.lang.String::getBytes (25 bytes) inline (hot)
@ 14 java.lang.String::coder (15 bytes) inline (hot)
! @ 21 java.lang.StringCoding::encode (324 bytes) inline (hot)
@ 10 java.lang.StringCoding::encodeUTF8 (132 bytes) inline (hot)
@ 7 java.lang.StringCoding::encodeUTF8_UTF16 (369 bytes) hot method too big <----------------
@ 15 java.lang.StringCoding::hasNegatives (25 bytes) (intrinsic)
@ 24 java.util.Arrays::copyOf (19 bytes) inline (hot)
@ 11 java.lang.Math::min (11 bytes) (intrinsic)
@ 14 java.lang.System::arraycopy (0 bytes) (intrinsic)
And in the compilation log for the patched case I have this entry:
<method id='1166' holder='1154' name='toString' return='1032' flags='1' bytes='31' compile_id='1062' compiler='c2' level='4' iicount='11163'/>
<dependency type='unique_concrete_method' ctxk='1154' x='1166'/>
<call method='1166' count='75859' prof_factor='1,000000' inline='1'/>
<inline_fail reason='already compiled into a big method'/>
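If I read that right, 'already compiled into a big method' means C2 refused to inline toString() because its standalone compiled code exceeded the -XX:InlineSmallCode threshold (default around 2000-2500 depending on JDK version), and 'hot method too big' for encodeUTF8_UTF16 (369 bytes) points at -XX:FreqInlineSize (default 325). One way to test this hypothesis would be to re-run the patched benchmark with the thresholds raised; a sketch, assuming the usual JMH uber-jar name:

```shell
# Raise C2 inlining thresholds to check whether the regression disappears
# (flag defaults vary by JDK version and platform; JMH forks inherit the
# host JVM arguments by default).
java -XX:InlineSmallCode=4000 -XX:FreqInlineSize=400 \
     -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining \
     -jar benchmarks.jar UrlEncoderBenchmark.encodeUtf8
```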
The inlining failure is consistent with the perfasm profiling results:
- for the original code we have only 1 hot region
....................................................................................................
62.29% <total for region 1>
....[Hottest Regions]...............................................................................
62.29% c2, level 4 java.net.URLEncoder::encode, version 1032 (1487 bytes)
- for the patched code we have 2 hot regions:
....[Hottest Region 1]..............................................................................
c2, level 4, java.net.URLEncoder::encode, version 1019 (1467 bytes)
....................................................................................................
61.44% <total for region 1>
....[Hottest Region 2]..............................................................................
c2, level 4, java.net.URLEncoder::encode, version 1019 (1048 bytes)
....................................................................................................
10.90% <total for region 2>
So my question is: is there something wrong with the compiler, or was the original idea of the improvement wrong?
Here are some attachments in case they are useful:
1. Output of LinuxPerfAsmProfiler for original code: https://gist.github.com/stsypanov/6bcd95fd9fbe79afc5f29db929e517f1
2. Output of LinuxPerfAsmProfiler for patched code: https://gist.github.com/stsypanov/794c0b4fdb13bad9fcb7fc890cec3dc8
Regards,
Sergey Tsypanov