String Deduplication in JEP192

Sat Mar 1 13:42:15 UTC 2014

Hi Kirk,

Thanks for the good questions. :-)

At the risk of jumping in ahead of the JEP authors and reviewers ...

 > where did you get the statistics from?

900+ profiles

If you or others have a repository of profiles, please share your
observations if you can.

 > If the weak generational hypothesis holds will you not be spending 
more time deduping soon to be garbage which potentially would place more 
load on the GC threads?

As mentioned in the JEP, you can manipulate the age at which a String 
becomes a candidate with -XX:StringDeduplicationAgeThreshold.
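
To see the effect of the age threshold for yourself, a small test program along these lines might help. This is only a sketch: the class name DedupAgeDemo, the string values, and the counts are made up for illustration, and the command line in the comment assumes the flag names given in the JEP (the threshold value 3 is just an example). It keeps one large set of long-lived duplicate Strings and churns through many short-lived ones, which is roughly the situation Kirk is asking about.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the JEP or the JDK.
// Assumed invocation (flag names as described in the JEP):
//   java -XX:+UseG1GC -XX:+UseStringDeduplication \
//        -XX:StringDeduplicationAgeThreshold=3 DedupAgeDemo
public class DedupAgeDemo {

    // Builds `count` String objects that are equal in content but are
    // distinct objects, each created from its own fresh char array.
    static List<String> makeDuplicates(String value, int count) {
        List<String> out = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            out.add(new String(value.toCharArray()));
        }
        return out;
    }

    public static void main(String[] args) {
        // Long-lived duplicates: these stay reachable across many young
        // collections, so they can reach the age threshold.
        List<String> longLived = makeDuplicates("frequently-repeated-value", 100_000);

        // Short-lived duplicates: these become garbage almost immediately
        // and should rarely get old enough to become candidates.
        for (int i = 0; i < 1_000; i++) {
            makeDuplicates("short-lived-value", 1_000);
        }

        System.out.println("long-lived strings kept: " + longLived.size());
    }
}
```

Raising the threshold should mean fewer soon-to-be-garbage Strings are considered at all, at the cost of duplicates staying around longer before they are deduplicated.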

hths,

charlie

On 03/01/2014 01:25 AM, Kirk Pepperdine wrote:
> Hello,
>
> I’ve been looking at the JEP and have been wondering 1) where did you get the statistics from and 2) is this really going to be a big win? If the weak generational hypothesis holds will you not be spending more time deduping soon to be garbage which potentially would place more load on the GC threads?
>
> Regards,
> Kirk
>
> On Mar 1, 2014, at 12:08 AM, Bernd Eckenfels <bernd-2014 at eckenfels.net> wrote:
>
>> Hello,
>>
>> not sure what the proper process is, but I notice that Dalibor
>> retweeted a JEP link to JEP192 - String Deduplication in G1.
>>
>> http://openjdk.java.net/jeps/192
>>
>> The most obvious thing I noticed is that the JEP goes into detail to
>> describe how a String object is constructed out of hashcode and char
>> array. But it somehow totally misses the offset+count fields (substrings).
>>
>> One can say, it is not the scope of the JEP to be so detailed, but then
>> the other details of the string object should be removed as well.
>>
>> What I also miss somewhat is a detailed description of how this is
>> integrated with the GC. I mean there are some interactions around the
>> topic of aging, dereferencing and atomic replacement, but most of the
>> JEP deals with functionality outside the GC.
>>
>> It looks a bit like it will suffer from similar scalability problems
>> to the already existing string pool. Maybe it would be better to
>> re-design the string pool in a way that solves both problems with less
>> work for the GC phases. This could go so far as to even have a (new)
>> string intern API which could be used by things like XML parsers or
>> network decoders - which are typically a source of lots of string
>> duplications in apps.
>>
>> (And I am not sure if this should be so G1-specific; after all, the
>> adoption rate of G1 is still lower than it could be.)
>>
>> Gruss
>> Bernd