Understanding Name and Name.Table

Thu Apr 11 11:51:41 UTC 2019

I'm a bit skeptical about String::intern as the cure. From many sources 
I've seen that the performance of String::intern degrades rather quickly 
as the number of interned strings grows (at which point a 
HashMap-based implementation beats String::intern any day of the week). 
Look, for instance, at this great post from Aleksey Shipilev:

https://shipilev.net/jvm/anatomy-quarks/10-string-intern/

Maurizio
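
For comparison with String::intern, the HashMap-based alternative mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the linked post: the class name MapInterner is hypothetical, the table is not thread-safe, and entries are strongly held, so nothing is ever evicted.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a HashMap-based interner (not the code from the linked post).
// Not thread-safe, and entries are strongly held, so nothing is ever evicted.
class MapInterner {
    private final Map<String, String> table = new HashMap<>();

    /** Returns the canonical instance equal to s, recording s if it is new. */
    String intern(String s) {
        String existing = table.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    int size() {
        return table.size();
    }

    public static void main(String[] args) {
        MapInterner interner = new MapInterner();
        // new String(...) yields distinct objects with equal contents.
        String a = interner.intern(new String("java.lang.Object"));
        String b = interner.intern(new String("java.lang.Object"));
        System.out.println(a == b);          // true: both are the first instance
        System.out.println(interner.size()); // 1
    }
}
```

The design choice here is that the table is an ordinary heap object that a tool can discard when it is done; the linked post discusses the costs of the JVM's native String::intern table in detail.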

On 10/04/2019 19:37, Jonathan Gibbons wrote:
>
> Some comments inline.
>
>
> On 4/10/19 10:56 AM, Ron Shapiro wrote:
>> Hi,
>>
>> I keep seeing Name and Name.Table show up in profiles of 
>> annotation processors, and I have a few questions regarding the 
>> design of these classes. I first brought this up back in this thread 
>> <https://mail.openjdk.java.net/pipermail/compiler-dev/2018-October/012529.html> discussing 
>> the performance of Name.contentEquals(). That conversation stalled 
>> around this comment from Jon 
>> <https://mail.openjdk.java.net/pipermail/compiler-dev/2018-October/012533.html>.
>> I believe that javac itself does a good job of using the interned 
>> Names, so there aren't many cases where extra Strings are being 
>> recreated internally. But annotation processors (and maybe also 
>> compiler plugins?) don't have access to Names or the Name.Table, 
>> so they must resort to many of these suboptimal methods that create Strings.
>
> Historical note: the use of the internal javac Name class (as compared 
> to the JSR 269 interface of the same name) goes all the way back to 
> the beginning of this version of javac, around JDK 1.4 or so. 
> Back then, the design center for javac was to be more independent of 
> JDK APIs than it is today.
>
>
>>
>> The sense that I get is that most of the Name methods that create 
>> String instances do so because class files use a modified version of 
>> UTF-8, and so the use of Convert.utf2string() greatly simplifies the 
>> implementation of these methods. Is that a correct assumption?
> No, I don't think so. The use of modified UTF-8 is an unrelated (but 
> nevertheless important) implementation detail.  I think you're just 
> seeing the result of retrofitting interfaces that require methods to 
> be implemented.  I'm guessing that the methods you are looking at are 
> not used internally within javac itself.
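
The modified UTF-8 format mentioned here can be observed directly with the JDK: DataOutputStream.writeUTF emits the same modified UTF-8 that class files use, which differs from standard UTF-8 in how it encodes U+0000 and supplementary characters. The sketch below compares the two encodings of one string; the helper name modifiedUtf8 is hypothetical, and it does not use javac's internal Convert class.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ModifiedUtf8Demo {
    /**
     * Encodes s as modified UTF-8 using DataOutputStream.writeUTF, which
     * writes a 2-byte length prefix followed by the encoded bytes.
     * The helper name is hypothetical; javac has its own conversion code.
     */
    static byte[] modifiedUtf8(String s) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        }
        byte[] withLength = buffer.toByteArray();
        return Arrays.copyOfRange(withLength, 2, withLength.length); // drop the prefix
    }

    public static void main(String[] args) throws IOException {
        // 'a', U+0000, and U+1F600 (a surrogate pair in Java's UTF-16 strings)
        String s = "a\u0000\uD83D\uDE00";
        byte[] standard = s.getBytes(StandardCharsets.UTF_8);
        byte[] modified = modifiedUtf8(s);
        System.out.println(standard.length); // 6 = 1 + 1 (0x00) + 4
        System.out.println(modified.length); // 9 = 1 + 2 (0xC0 0x80) + 3 + 3
    }
}
```

For ASCII-only identifiers the two encodings are byte-for-byte identical; they diverge only for U+0000 and for characters outside the Basic Multilingual Plane.
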
>>
>> Separately, I have some general questions about Name.Table.
>> - Can someone explain its goals (I presume they're performance-related)? 
>> Is it to limit memory usage, since Strings are UTF-16 
>> and strings in class files are stored as UTF-8, or something else?
>
> My understanding is that originally it was to save space, and to 
> provide a context for "interned strings" that could be compared with 
> referential equality.
>
>
>> - Are the initial goals still relevant in 2019?
>
> Maybe not as relevant as originally the case.
>
>
>> - What is the purpose of interning Name instances?
>
> To reduce the size of the name table, and to support referential equality.
>
>
>> - Are there any predefined benchmarks that are run to verify any of 
>> these goals?
>
> No.
>
>
>> - What is the rationale of having the SharedNameTable data as one 
>> large array? I see the documentation states that "This avoids the 
>> overhead incurred by using an array of bytes for each name" but I'm 
>> not sure what that overhead is referring to.
>
> The overhead is that of having one String instance per name, 
> and/or of having a separate array of bytes for each name. 
> That is, the overhead of the "one big array" is amortized over all names.
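
To make the "one big array" idea concrete, here is a minimal sketch of a name table that appends every name's bytes to a single shared byte[] and returns one canonical Name object per distinct name, so names can be compared with ==. Everything here is hypothetical and simplified, not javac's SharedNameTable: it uses standard UTF-8 rather than modified UTF-8, a fixed bucket count, and no thread safety. Each Name is one small object holding an offset and a length, rather than an object plus a separate byte[] with its own array header.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch, not javac's SharedNameTable: all name bytes live in one
// shared array, and each distinct name maps to exactly one Name object.
// Simplifications: standard UTF-8 (javac uses modified UTF-8), fixed bucket
// count, no thread safety.
class BigArrayNameTable {
    /** A name is an (offset, length) window into the shared byte array. */
    final class Name {
        final int index;
        final int length;
        Name next; // next name in the same hash bucket

        Name(int index, int length) {
            this.index = index;
            this.length = length;
        }

        @Override
        public String toString() {
            // Decoding allocates a String; comparisons should use == instead.
            return new String(bytes, index, length, StandardCharsets.UTF_8);
        }
    }

    private byte[] bytes = new byte[1024];         // all names, back to back
    private int used = 0;                          // first free byte in 'bytes'
    private final Name[] buckets = new Name[256];  // power of two, so we can mask

    /** Returns the canonical Name for s, adding its bytes on first sight. */
    Name fromString(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        int h = Arrays.hashCode(utf8) & (buckets.length - 1);
        // Hit: an equal name already exists, so return that same object.
        for (Name n = buckets[h]; n != null; n = n.next) {
            if (Arrays.equals(bytes, n.index, n.index + n.length, utf8, 0, utf8.length)) {
                return n;
            }
        }
        // Miss: append to the shared array, at least doubling it when full.
        if (used + utf8.length > bytes.length) {
            bytes = Arrays.copyOf(bytes, Math.max(bytes.length * 2, used + utf8.length));
        }
        System.arraycopy(utf8, 0, bytes, used, utf8.length);
        Name n = new Name(used, utf8.length);
        used += utf8.length;
        n.next = buckets[h];
        buckets[h] = n;
        return n;
    }

    int bytesUsed() {
        return used;
    }

    public static void main(String[] args) {
        BigArrayNameTable table = new BigArrayNameTable();
        Name a = table.fromString("toString");
        Name b = table.fromString(new String("toString"));
        System.out.println(a == b);            // true: one canonical Name
        System.out.println(table.bytesUsed()); // 8: bytes stored only once
    }
}
```
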
>
>
>>
>> -----
>>
>> I have a prototype of a Name.Table implementation that stores Names 
>> as Strings instead of their UTF-8 byte[]. This seems to address the 
>> performance concerns from my previous thread (it saves 1.2s/build on 
>> some of our larger annotation processing builds at Google) without 
>> affecting the performance of javac itself (i.e. non-annotation 
>> processing). There should be no more overhead than with the current Name 
>> implementations, as it only needs to store a single field for the 
>> String, apart from the UTF-8/UTF-16 distinction.
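
The prototype itself is not shown in the thread, so the following is only a hypothetical sketch of the idea: a table whose names wrap a String and implement the JSR 269 javax.lang.model.element.Name interface, which extends CharSequence and declares contentEquals(CharSequence). With a String behind each name, contentEquals and toString can delegate to that String without decoding bytes or allocating. The class and method names (StringNameTable, fromString) are illustrative assumptions, not javac's API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: names backed by java.lang.String instead of a slice of a
// UTF-8 byte[]. Class and method names are illustrative, not javac's API.
class StringNameTable {
    /** Implements the JSR 269 Name interface; equals/hashCode stay identity-based. */
    static final class Name implements javax.lang.model.element.Name {
        private final String value;

        private Name(String value) {
            this.value = value;
        }

        @Override
        public boolean contentEquals(CharSequence cs) {
            return value.contentEquals(cs); // no decoding, no new String
        }

        @Override public int length() { return value.length(); }
        @Override public char charAt(int index) { return value.charAt(index); }
        @Override public CharSequence subSequence(int start, int end) {
            return value.subSequence(start, end);
        }
        @Override public String toString() { return value; } // returns the stored String
    }

    private final Map<String, Name> names = new HashMap<>();

    /** Returns the canonical Name for s, so equal names are also ==. */
    Name fromString(String s) {
        return names.computeIfAbsent(s, Name::new);
    }

    public static void main(String[] args) {
        StringNameTable table = new StringNameTable();
        Name n = table.fromString("getValue");
        System.out.println(n == table.fromString(new String("getValue"))); // true
        System.out.println(n.contentEquals(new StringBuilder("getValue"))); // true
    }
}
```

The space trade-off raised in the reply still applies: each name in a sketch like this costs a String object plus its backing array, so the space side would need measuring.
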
>
> What are the space comparisons? It is certainly the case that Strings 
> are managed more efficiently in the JVM these days, and interned 
> strings are handled better as well.  Your number of 1.2s doesn't mean 
> much without more context than "our larger annotation processing 
> builds."  What percentage is that of the overall build time, and/or of 
> the total time in Name/Name.Table?
>
>
>>
>> Would something like this be a worthwhile contribution? Are there 
>> ways I can evaluate whether this satisfies the performance goals of 
>> either of the other implementations so we don't need to have 
>> yet-another-Name-implementation?
>
>
> What is now SharedNameTable is the original impl. It used to be plain 
> NameTable until the advent of clients, such as IDEs and other tools, for 
> which the single shared name table was inconvenient; at that point 
> NameTable was forked into SharedNameTable and UnsharedNameTable.
>
> If we can come up with an impl that is as fast as the current impl, and 
> as space efficient as the current impl, that would be great, but 
> typically you have to choose one or the other to optimize.  I could 
> believe a new impl might at least replace the current UnsharedNameTable.
>
> I think this is a worthwhile direction to investigate further.
>
>
>>
>> Thanks for your help!
>> Ron
>>