String Deduplication in JEP192

Mon Mar 3 09:30:19 UTC 2014

Hi,

On 03/01/2014 12:08 AM, Bernd Eckenfels wrote:
> Hello,
>
> not sure what the proper process is, but I notice that Dalibor
> retweeted a JEP link to JEP192 - String Deduplication in G1.
>
> http://openjdk.java.net/jeps/192
>
> The most obvious thing I noticed is that the JEP goes into detail to
> describe how a String object is constructed out of hashcode and char
> array. But it somehow totally misses the offset+count fields (substrings).

As Thomas already mentioned, the offset and count fields were removed
from String quite some time ago.
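
A rough sketch of what that change means for substrings. This is a
simplified model, not the real java.lang.String source; the class names
OldLayoutString, NewLayoutString and LayoutDemo are made up for
illustration only.

```java
import java.util.Arrays;

// Hypothetical demo: compares the two layouts on the same input.
final class LayoutDemo {
    public static void main(String[] args) {
        char[] chars = "deduplication".toCharArray();
        OldLayoutString oldSub =
            new OldLayoutString(chars, 0, chars.length).substring(2, 8);
        NewLayoutString newSub = new NewLayoutString(chars).substring(2, 8);
        System.out.println(oldSub.value == chars); // true: shares the parent's array
        System.out.println(newSub.value == chars); // false: owns a private copy
    }
}

// Model of the old layout: a substring is a window (offset, count)
// into the parent's char array.
final class OldLayoutString {
    final char[] value;
    final int offset;
    final int count;

    OldLayoutString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    OldLayoutString substring(int begin, int end) {
        return new OldLayoutString(value, offset + begin, end - begin);
    }

    @Override public String toString() {
        return new String(value, offset, count);
    }
}

// Model of the layout without offset/count: the whole array is the string,
// so a substring has to copy the characters it needs.
final class NewLayoutString {
    final char[] value;

    NewLayoutString(char[] value) {
        this.value = value;
    }

    NewLayoutString substring(int begin, int end) {
        return new NewLayoutString(Arrays.copyOfRange(value, begin, end));
    }

    @Override public String toString() {
        return new String(value);
    }
}
```

With the second layout, two strings are equal exactly when their arrays
have equal contents, so describing a String as a hash code plus a char
array is enough for the purposes of the JEP.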

>
> One can say, it is not the scope of the JEP to be so detailed, but then
> the other details of the string object should be removed as well.
>
> What I somewhat also miss is a detailed description of how this is
> integrated with the GC. I mean there are some interactions around the
> topic of aging, dereferencing and atomic replacement, but most of the
> JEP deals with functionality outside the GC.

What's inside vs. outside of the GC is of course open for different 
interpretations. The deduplication thread can conceptually be seen as 
just another concurrent GC thread, which adds a new concurrent GC phase 
to G1. The interactions with the existing GC phases are described in the
"Implementation Overview" and "Candidate Selection" sections of the JEP.

>
> It looks a bit like it will suffer from similar scalability problems
> than the already existing string pool. Maybe it would be better to

Note that deduplication is done concurrently with the application, so 
the app is not directly affected in that sense. The deduplication thread 
is the main user of the deduplication table and, unlike the current 
StringTable, the deduplication hashtable is dynamically resized at runtime.
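
To make the idea concrete, here is a minimal sketch of a table that
maps character contents to one canonical array. It is only a model
under stated assumptions: ModelString, DedupTable and DedupDemo are
hypothetical names, a plain java.util.HashMap stands in for the
dynamically resized hashtable, and the real work is done inside the VM
by the deduplication thread, not in Java code like this.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo driving the model below.
final class DedupDemo {
    public static void main(String[] args) {
        DedupTable table = new DedupTable();
        ModelString a = new ModelString("duplicate");
        ModelString b = new ModelString("duplicate");
        System.out.println(table.deduplicate(a)); // false: first occurrence
        System.out.println(table.deduplicate(b)); // true: b now shares a's array
        System.out.println(a.value == b.value);   // true: one array, two objects
        System.out.println(a == b);               // false: objects stay distinct
    }
}

// Simplified stand-in for a String object: just the backing char array.
final class ModelString {
    char[] value;

    ModelString(String s) {
        this.value = s.toCharArray();
    }
}

// Hypothetical deduplication table keyed by array contents.
final class DedupTable {
    // Wrapper that gives a char[] content-based equals/hashCode.
    private static final class Key {
        final char[] chars;

        Key(char[] chars) {
            this.chars = chars;
        }

        @Override public boolean equals(Object o) {
            return o instanceof Key && Arrays.equals(chars, ((Key) o).chars);
        }

        @Override public int hashCode() {
            return Arrays.hashCode(chars);
        }
    }

    // HashMap grows as entries are added; assumed here as a stand-in
    // for a table that is resized at runtime.
    private final Map<Key, char[]> table = new HashMap<>();

    // Returns true if s was redirected to an already known, equal array.
    boolean deduplicate(ModelString s) {
        char[] existing = table.putIfAbsent(new Key(s.value), s.value);
        if (existing == null || existing == s.value) {
            return false; // first time these contents are seen
        }
        s.value = existing; // drop the duplicate array, share the canonical one
        return true;
    }
}
```

Only the arrays are shared in this model; the two objects keep their
own identity, which is why the application does not notice anything.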

> re-design the string pool in a way it solves both problems with less
> work for the GC phases. This could go so far as to even have a (new)
> string intern API which could be used by things like XML parsers or
> network decoders - which are typically a source of lots of string
> duplications in apps.

I guess you're suggesting using (a better) String.intern(). The 
alternative of using String.intern() is mentioned briefly under 
"Alternatives".

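For comparison, a small example of what explicit interning looks like
from the application side. It uses only the standard String.intern()
method; the class name InternDemo is made up for this example.

```java
// Hypothetical demo class; String.intern() is the standard JDK method.
final class InternDemo {
    public static void main(String[] args) {
        // Two distinct String objects with equal contents, as a parser
        // or network decoder would typically produce.
        String a = new String("hello");
        String b = new String("hello");

        System.out.println(a == b);                   // false: different objects
        System.out.println(a.equals(b));              // true: same contents
        System.out.println(a.intern() == b.intern()); // true: one canonical object
    }
}
```

The difference is that intern() has to be called by the application and
returns a canonical String object, whereas deduplication needs no code
changes and leaves the String objects themselves distinct.
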
cheers,
/Per

>
> (And I am not sure if this should be so G1 specific, after all the
> adoption rate of G1 is still lower than it could be)
>
> Gruss
> Bernd
>