Dismal performance of String.intern()
Steven Schlansker
stevenschlansker at gmail.com
Wed Jun 12 18:27:33 UTC 2013
Thank you everyone for the valuable input!
On Jun 11, 2013, at 1:52 AM, Aleksey Shipilev <aleksey.shipilev at oracle.com> wrote:
> On 06/11/2013 12:31 PM, Remi Forax wrote:
>> On 06/10/2013 08:06 PM, Steven Schlansker wrote:
>> Hi Steven,
>> the main issue is that intern() doesn't work in isolation,
>>
>> I think it's better to change the JSON Parser implementation to use its
>> own cache (or not) and not rely on String.intern().
>
> +1.
>
> IMO, String.intern() is the gateway into VM symbol table, and should be
> regarded as such. The improvements for String.intern(), if any, then
> should be on the VM (native) side.
>
> Also, I think most people confuse String interning and String
> de-duplication. Using interning to improve memory footprint is
> overkill. Smart deduplicators may carefully balance the overheads of
> deduplication vs. the memory footprint.
Yes, maybe this is in fact the real problem here. The JavaDoc for String does not in any way reflect what you and the other JDK developers seem to assume -- that intern() is mostly a "for JVM use" method and is not really intended for use by end users. Maybe a documentation update to reflect that fact would be appropriate? Something indicating that the implementation is specialized for VM usage and is not optimal for end user code might help clear up confusion. Does that sound like a good idea?
I understand that this is confusing the contract of the method with the implementation a bit. I just feel that the sentiment I get here ("Why would you do that? Don't use intern, just do it yourself!") is mismatched with the implicit fit-for-purpose I expect from core Java classes, and a warning might help reduce confusion.
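For reference, here is a minimal sketch of the kind of application-level cache suggested above, as an alternative to String.intern(). The class name StringDeduplicator, the dedup() method, and the maxEntries bound are all hypothetical; nothing here is part of the JDK or of any JSON library, and it assumes the set of distinct values is small, as in the use case described in this thread.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: an on-heap canonicalizing cache for a small set of
// repeated String values. It never touches the VM string table.
final class StringDeduplicator {
    private final ConcurrentHashMap<String, String> cache =
            new ConcurrentHashMap<String, String>();
    private final int maxEntries;

    StringDeduplicator(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    // Returns the canonical instance equal to s, caching s if it is new.
    String dedup(String s) {
        if (s == null) {
            return null;
        }
        String existing = cache.get(s);
        if (existing != null) {
            return existing;
        }
        // Soft bound (assumption): once the cache is roughly full, stop
        // adding entries and hand back the caller's own instance.
        if (cache.size() >= maxEntries) {
            return s;
        }
        String previous = cache.putIfAbsent(s, s);
        return previous != null ? previous : s;
    }

    int size() {
        return cache.size();
    }
}
```

A parser would call dedup() on each value of the repetitive field instead of intern(). The bound is only approximate under concurrent use, and whether this beats a tuned string table would have to be measured on the actual workload.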
On Jun 11, 2013, at 2:28 AM, Alan Bateman <Alan.Bateman at oracle.com> wrote:
> On 10/06/2013 19:06, Steven Schlansker wrote:
>> Hi core-libs-dev,
>>
>> While doing performance profiling of my application, I discovered that nearly 50% of the time deserializing JSON was spent within String.intern(). I understand that in general interning Strings is not the best approach for things, but I think I have a decent use case -- the value of a certain field is one of a very limited number of valid values (that are not known at compile time, so I cannot use an Enum), and is repeated many millions of times in the JSON stream.
>>
> Have you run with -XX:+PrintStringTableStatistics? Might be interesting if you can share the output (it is printed just before the VM terminates).
>
> There are also tuning knobs such as StringTableSize, and it would be interesting to know if you've experimented with them.
>
> -Alan.
I have not experimented with any such tunings. I will do so and report back before spending a lot of time changing things. Thank you for the pointer!
Best,
Steven