Understanding Name and Name.Table
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Thu Apr 11 11:51:41 UTC 2019
I'm a bit skeptical about String::intern as the cure. From many sources
I've seen that the performance of String::intern degrades rather quickly
as the number of interned strings grows (at which point a
HashMap-based implementation beats String::intern any day of the week).
Look for instance at this great post from Aleksey Shipilev:
https://shipilev.net/jvm/anatomy-quarks/10-string-intern/
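
To make the comparison concrete, a HashMap-based interner can be as small
as the sketch below. MapInterner is a hypothetical name used only for
illustration (nothing like it exists in javac), and this version is not
thread-safe:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper, not part of javac: a user-level string interner. */
final class MapInterner {
    // Maps each distinct string to its canonical instance.
    private final Map<String, String> pool = new HashMap<>();

    /** Returns the canonical instance equal to s, registering s if it is new. */
    String intern(String s) {
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    /** Number of distinct strings currently held. */
    int size() {
        return pool.size();
    }
}
```

Strings returned by the same MapInterner can be compared with ==, which
is the property a name table relies on, and the pool becomes collectable
along with the interner instead of living in the JVM-wide string table.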
Maurizio
On 10/04/2019 19:37, Jonathan Gibbons wrote:
>
> Some comments inline.
>
>
> On 4/10/19 10:56 AM, Ron Shapiro wrote:
>> Hi,
>>
>> I am continually seeing Name and Name.Table show up in profiles of
>> annotation processors, and I have a few questions regarding the
>> design of these classes. I first brought this up back in this thread
>> <https://mail.openjdk.java.net/pipermail/compiler-dev/2018-October/012529.html> discussing
>> the performance of Name.contentEquals(). That conversation stalled
>> around this comment from Jon
>> <https://mail.openjdk.java.net/pipermail/compiler-dev/2018-October/012533.html>.
>> I believe that javac itself does a good job of using the interned
>> Names, so there aren't many cases where extra Strings are being
>> recreated internally. But annotation processors (and maybe also
>> compiler plugins?), which don't have access to Names/the Name.Table,
>> must resort to many of these suboptimal methods that create Strings.
>
> Historical note: the use of the internal javac Name class (as compared
> to the JSR 269 interface of the same name) goes all the way back to
> the beginning of this version of javac, round about JDK 1.4 or so.
> Back then, the design center for javac was to be more independent of
> JDK API than it is today.
>
>
>>
>> The sense that I get is that most of the Name methods that create
>> String instances do so because class files use a modified version of
>> UTF-8, and so the use of Convert.utf2string() greatly simplifies the
>> implementation of these methods. Is that a correct assumption?
> No, I don't think so. The use of modified UTF-8 is an unrelated (but
> nevertheless important) implementation detail. I think you're just
> seeing the result of retrofitting interfaces that require methods to
> be implemented. I'm guessing that the methods you are looking at are
> not used internally within javac itself.
>>
>> Separately, I have some general questions about Name.Table.
>> - Can someone explain the performance goals of it (I presume it's for
>> performance)? Is it to limit memory usage since Strings are UTF-16
>> and strings in class files are stored as UTF-8, or something else?
>
> My understanding is that originally it was to save space, and to
> provide a context for "interned strings" that could be compared with
> referential equality.
>
>
>> - Are the initial goals still relevant in 2019?
>
> Maybe not as relevant as was originally the case.
>
>
>> - What is the purpose of interning Name instances?
>
> To reduce the size of the name table, and to support referential equality.
>
>
>> - Are there any predefined benchmarks that are run to verify any of
>> these goals?
>
> No.
>
>
>> - What is the rationale of having the SharedNameTable data as one
>> large array? I see the documentation states that "This avoids the
>> overhead incurred by using an array of bytes for each name" but I'm
>> not sure what that overhead is referring to.
>
> The overhead is the overhead of having one String instance per name,
> and/or the overhead of having a separate array of bytes for each name.
> That is, the overhead of the "one big array" is amortized over all names.
>
>
>>
>> -----
>>
>> I have a prototype of a Name.Table implementation that stores Names
>> as Strings instead of their UTF-8 byte[]. This seems to address the
>> performance concerns from my previous thread (it saves 1.2s/build on
>> some of our larger annotation processing builds at Google) without
>> affecting the performance of internal javac (i.e. non-annotation
>> processing). There should be no more overhead than in the current Name
>> implementations, as it only needs to store a single field for the
>> String, aside from the UTF-8/UTF-16 distinction.
>
> What are the space comparisons? It is certainly the case that Strings
> are managed more efficiently in the JVM these days, and interned
> strings are handled better as well. Your number of 1.2s doesn't mean
> much without more context than "our larger annotation processing
> builds." What percent is that of the overall build time, and/or total
> time in Name/Name.Table?
>
>
>>
>> Would something like this be a worthwhile contribution? Are there
>> ways I can evaluate whether this satisfies the performance goals of
>> either of the other implementations so we don't need to have
>> yet-another-Name-implementation?
>
>
> What is now SharedNameTable is the original impl. It used to be plain
> NameTable, until the advent of clients, like IDEs and other tools, for
> whom the single shared name table was inconvenient, which is when
> NameTable was forked into SharedNameTable and UnsharedNameTable.
>
> If we can come up with an impl that is as fast as the current impl, and
> as space efficient as the current impl, that would be great, but
> typically you have to choose one or the other to optimize. I could
> believe a new impl might at least replace the current UnsharedNameTable.
>
> I think this is a worthwhile direction to investigate further.
>
>
>>
>> Thanks for your help!
>> Ron
>>