Dismal performance of String.intern()
Ioi Lam
ioi.lam at oracle.com
Mon Aug 5 18:15:43 UTC 2013
Hi Steven,
This looks like a promising patch. If all goes well I can sponsor this
patch into hotspot-rt.
I have problems building your test. Could you send a stand-alone test
that can be build with plain JDK (i.e., exclude things like
com.google.common. and org.openjdk.jmh)?
Also, I saw that you tested with strings that are the same length as
UUID, which are 36 chars long:
2544f85f-1845-42c4-aba1-7a431e6edb99
eff61458-32f8-45cd-a77d-fddb14951cab
d685f1a2-0b95-4021-b9b3-b0fb3c02a17f
We do need to handle strings with shorter length, and making calls to
memcmp() may cause degradation with lots of short strings. Can you add a
sub-test that tests for strings length <= 4?
I would also recommend running a few non-trivial apps (such as eclipse)
and dump the interned string table to see the distribution of lengths. I
know this is a hassle, but it would be a good last step to wrap up your
work.
Meanwhile, I will try to run our benchmarks (aka "refworkload") to check
for any performance impact.
Thanks
- Ioi
On 08/04/2013 10:21 PM, Steven Schlansker wrote:
> On Tue, 11 Jun 2013 10:28:14 +0100
> Alan Bateman <Alan.Bateman at oracle.com> wrote:
>
>> On 10/06/2013 19:06, Steven Schlansker wrote:
>>> Hi core-libs-dev,
>>>
>>> While doing performance profiling of my application, I discovered
>>> that nearly 50% of the time deserializing JSON was spent within
>>> String.intern(). I understand that in general interning Strings is
>>> not the best approach for things, but I think I have a decent use
>>> case -- the value of a certain field is one of a very limited
>>> number of valid values (that are not known at compile time, so I
>>> cannot use an Enum), and is repeated many millions of times in the
>>> JSON stream.
>>>
>> Have you run with -XX:+PrintStringTableStatistics? Might be
>> interesting if you can share the output (it is printed just before
>> the VM terminates).
>>
>> There are also tuning knobs such as StringTableSize and would be
>> interesting to know if you've experimented with.
>>
>> -Alan.
> Hi everyone,
>
> Thanks again for your useful advice. I definitely misjudged the
> difficulty and complexity of working with methods that directly bridge
> the Java <-> C++ "gap"! As such, it took me way longer to get to this
> than I expected...
>
> That said, I think I have very good results to report. OpenJDK8 is
> already *significantly* better than OpenJDK 7 was, but still can be
> better.
>
> Here is the patch I have at the moment:
> https://gist.github.com/stevenschlansker/6153643
>
> I used oprofile with oprofile-jit to identify the hot spots in the
> benchmark code as being java_lang_String::equals() and
> java_lang_String::as_unicode_string.
>
> Both of these methods have hand-written loops that copy or compare
> jchar arrays.
>
> The problem is that in -fastdebug or -slowdebug releases, this is
> one method call per character (the function is not inlined). Even in
> -release builds, it seems that this is significantly slower than the
> libc memcpy() or memcmp() functions which can use SSE4 (or other
> related technologies).
>
> My patch adds new methods, char_cmp and char_cpy, which delegate to the
> mem* functions instead of using hand-written loops.
>
> The micro-benchmark results are very good for such a small change.
>
> On fastdebug, before:
>
> Benchmark Mode Thr Cnt
> Sec Mean Mean error Units
> o.s.b.InternBenchmark.testLongStringChmIntern sample 1 2819
> 5 1.780 0.184 msec/op
> o.s.b.InternBenchmark.testLongStringJdkIntern sample 1 343
> 5 14.571 0.310 msec/op
> o.s.b.InternBenchmark.testLongStringNoIntern sample 1 8712
> 5 0.526 0.138 msec/op
> o.s.b.InternBenchmark.testShortStringChmIntern sample 1 4427
> 5 1.133 0.121 msec/op
> o.s.b.InternBenchmark.testShortStringJdkIntern sample 1 603
> 5 8.319 0.161 msec/op
> o.s.b.InternBenchmark.testShortStringNoIntern sample 1 17185
> 5 0.274 0.048 msec/op
>
> After:
>
> Benchmark Mode Thr Cnt
> Sec Mean Mean error Units
> o.s.b.InternBenchmark.testLongStringChmIntern sample 1 2898
> 5 1.812 0.208 msec/op
> o.s.b.InternBenchmark.testLongStringJdkIntern sample 1 1138
> 5 4.397 0.136 msec/op
> o.s.b.InternBenchmark.testLongStringNoIntern sample 1 9035
> 5 0.519 0.146 msec/op
> o.s.b.InternBenchmark.testShortStringChmIntern sample 1 4538
> 5 1.094 0.107 msec/op
> o.s.b.InternBenchmark.testShortStringJdkIntern sample 1 1363
> 5 3.686 0.100 msec/op
> o.s.b.InternBenchmark.testShortStringNoIntern sample 1 16686
> 5 0.316 0.081 msec/op
>
> On release, before:
>
> Benchmark Mode Thr Cnt
> Sec Mean Mean error Units
> o.s.b.InternBenchmark.testLongStringChmIntern sample 1 4030
> 5 1.240 0.002 msec/op
> o.s.b.InternBenchmark.testLongStringJdkIntern sample 1 1024
> 5 4.894 0.042 msec/op
> o.s.b.InternBenchmark.testLongStringNoIntern sample 1 20000
> 5 0.185 0.002 msec/op
> o.s.b.InternBenchmark.testShortStringChmIntern sample 1 6143
> 5 0.814 0.005 msec/op
> o.s.b.InternBenchmark.testShortStringJdkIntern sample 1 1852
> 5 2.702 0.016 msec/op
> o.s.b.InternBenchmark.testShortStringNoIntern sample 1 20000
> 5 0.102 0.001 msec/op
>
> After:
>
> Benchmark Mode Thr Cnt
> Sec Mean Mean error Units
> o.s.b.InternBenchmark.testLongStringChmIntern sample 1 4040
> 5 1.236 0.002 msec/op
> o.s.b.InternBenchmark.testLongStringJdkIntern sample 1 2733
> 5 1.832 0.010 msec/op
> o.s.b.InternBenchmark.testLongStringNoIntern sample 1 20000
> 5 0.181 0.002 msec/op
> o.s.b.InternBenchmark.testShortStringChmIntern sample 1 6170
> 5 0.809 0.001 msec/op
> o.s.b.InternBenchmark.testShortStringJdkIntern sample 1 3577
> 5 1.396 0.007 msec/op
> o.s.b.InternBenchmark.testShortStringNoIntern sample 1 20000
> 5 0.102 0.000 msec/op
>
>
> This is almost a 3.5x improvement on fastdebug builds, and more than a
> 2.5x improvement on release builds. It is now only ~50% slower than
> ConcurrentHashMap, at least for the low-contention case! (I did not
> benchmark higher numbers of threads thoroughly, but I do not think that
> my changes could make that worse...)
>
> Finally, the benchmark code:
> https://github.com/stevenschlansker/jvm-intern-benchmark/blob/master/src/main/java/org/sugis/benchmark/InternBenchmark.java
>
> It is not the best ever benchmark, but I'm hopeful that it's "good
> enough" to demonstrate the wins I have. Please let me know if you
> believe the benchmark invalidates my conclusions. It is run with JMH,
> as that seems to be the standard way of doing things around here.
>
> Thank you again for your time and input; I am hopeful that I have not
> erred terribly :-)
>
> Best,
> Steven
More information about the core-libs-dev
mailing list