Dismal performance of String.intern()

Remi Forax forax at univ-mlv.fr
Tue Jun 11 08:31:27 UTC 2013


On 06/10/2013 08:06 PM, Steven Schlansker wrote:
> Hi core-libs-dev,

Hi Steven,
the main issue is that intern() doesn't work in isolation.
For example, if you write
   "foo" == new String("foo").intern()
the result must always be true (string literals are interned by the VM
itself), so the String cache has to be accessible not only from the
Java side but also from the VM side.
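
A minimal sketch of that guarantee, for illustration only; the class and
method names are made up for this example:

```java
public class InternIdentityDemo {
    // An explicitly allocated copy, once interned, must be the very same
    // object as the literal that the VM interned on its own.
    static boolean internedCopyMatchesLiteral() {
        String copy = new String("foo"); // a distinct heap object
        return "foo" == copy.intern();
    }

    // Without intern(), the copy remains a different object.
    static boolean rawCopyMatchesLiteral() {
        return "foo" == new String("foo");
    }

    public static void main(String[] args) {
        System.out.println(internedCopyMatchesLiteral()); // true
        System.out.println(rawCopyMatchesLiteral());      // false
    }
}
```

The first check can only hold if the Java-level intern() and the VM's own
literal handling share one table.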

Given that we can switch on a String since Java 7,
we don't really need to compare strings with == in code anymore.
I think it's better to change the JSON parser implementation to use its
own cache (or none at all) rather than rely on String.intern().
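
A rough sketch of what such a parser-local cache could look like, assuming
a ConcurrentHashMap as in your experiment. The names LocalStringCache,
canonicalize and describe are hypothetical, and the map is unbounded, so
this only suits a small set of distinct values like the one you describe:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class LocalStringCache {
    // Hypothetical parser-owned cache; it is independent of the VM's
    // intern table and never evicts entries.
    private final ConcurrentMap<String, String> cache =
            new ConcurrentHashMap<>();

    // Returns one canonical instance per distinct value.
    public String canonicalize(String s) {
        String existing = cache.get(s);      // fast path for repeated values
        if (existing != null) {
            return existing;
        }
        existing = cache.putIfAbsent(s, s);  // another thread may have won the race
        return existing != null ? existing : s;
    }

    // A switch on String compares by value (hashCode plus equals),
    // so the argument does not need to be interned or canonicalized.
    static String describe(String status) {
        switch (status) {
            case "active":  return "running";
            case "pending": return "waiting";
            default:        return "unknown";
        }
    }

    public static void main(String[] args) {
        LocalStringCache cache = new LocalStringCache();
        String a = cache.canonicalize(new String("pending"));
        String b = cache.canonicalize(new String("pending"));
        System.out.println(a == b);      // true: same canonical instance
        System.out.println(describe(b)); // waiting
    }
}
```

The cache saves memory for the repeated values, while the switch shows
that value comparison no longer depends on identity.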

cheers,
Rémi

>
> While doing performance profiling of my application, I discovered that nearly 50% of the time deserializing JSON was spent within String.intern().  I understand that in general interning Strings is not the best approach for things, but I think I have a decent use case -- the value of a certain field is one of a very limited number of valid values (that are not known at compile time, so I cannot use an Enum), and is repeated many millions of times in the JSON stream.
>
> I discovered that replacing String.intern() with a ConcurrentHashMap improved performance by almost an order of magnitude.
>
> I'm not the only person that discovered this and was surprised: http://stackoverflow.com/questions/10624232/performance-penalty-of-string-intern
>
> I've been excited about starting to contribute to OpenJDK, so I am thinking that this might be a fun project for me to take on and then contribute back.  But I figured I should check in on the list before spending a lot of time tracking this down.  I have a couple of preparatory questions:
>
> * Has this bottleneck been examined thoroughly before?  Am I wishing too hard for performance here?
>
> * String.intern() is a native method currently.  My understanding is that there is a nontrivial penalty to invoking native methods (at least via JNI, not sure if this is also true for "built ins"?).  I assume the reason that this is native is so the Java intern is the same as C++-invoked interns from within the JVM itself.  Is this an actual requirement, or could String.intern be replaced with Java code?
>
> * If the interning itself must be handled by a symbol table in C++ land as it is today, would a "second level cache" in Java land that invokes a native "intern0" method be acceptable, so that there is a low-penalty "fast path"?  If so, this would involve a nonzero memory cost, although I assume that a few thousand references inside of a Map is an OK price to pay for a (for example) 5x speedup.
>
> * I assume the String class itself is loaded at a very sensitive time during VM initialization.  Having String initialization trigger (for example) ConcurrentHashMap class initialization may cause problems or circularities.  If this is the case, would triggering such a load lazily on the first intern() call be "late enough" as to not cause problems?
>
> I'm sure that if I get anywhere with this I will have more questions, but this should get me started. Thank you for any advice / insight you may be able to provide!
>
> Steven
>



