Understanding Name and Name.Table

Wed Apr 10 18:37:02 UTC 2019

Some comments inline.

On 4/10/19 10:56 AM, Ron Shapiro wrote:
> Hi,
>
> I continuously am seeing Name and Name.Table show up in profiles of 
> annotation processors, and I have a few questions regarding the design 
> of these classes. I first brought this up back in this thread 
> <https://mail.openjdk.java.net/pipermail/compiler-dev/2018-October/012529.html> discussing 
> the performance of Name.contentEquals(). That conversation stalled 
> around this comment from Jon 
> <https://mail.openjdk.java.net/pipermail/compiler-dev/2018-October/012533.html>.
> I believe that Javac itself does a good job using the interned Names 
> itself, so there aren't many cases where extra Strings are being 
> recreated internally. But for annotation processors (and maybe also 
> compiler plugins?), which don't have access to Names/the Name.Table, 
> they must resort to many of these suboptimal methods that create Strings.

Historical note: the use of the internal javac Name class (as compared 
to the JSR 269 interface of the same name) goes all the way back to the 
beginning of this version of javac, round about JDK 1.4 or so. Back 
then, the design center for javac was to be more independent of JDK API 
than it is today.

>
> The sense that I get is that most of the Name methods that create 
> String instances do so because class files use a modified version of 
> UTF-8, and so the use of Convert.utf2string() greatly simplifies the 
> implementation of these methods. Is that a correct assumption?
No, I don't think so. The use of modified UTF-8 is an unrelated (but 
nevertheless important) implementation detail.  I think you're just 
seeing the result of retrofitting interfaces that require methods to be 
implemented.  I'm guessing that the methods you are looking at are not 
used internally within javac itself.
>
> Separately, I have some general questions about Name.Table.
> - Can someone explain the performance goals of it (I presume it's for 
> performance)? Is it to limit memory usage since Strings are UTF-16 and 
> strings in class files are stored as UTF-8, or something else?

My understanding is that originally it was to save space, and to provide 
a context for "interned strings" that could be compared with referential 
equality.
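
To make the "interned strings" point concrete, here is a minimal sketch of 
a per-context intern table. It is not javac's Name.Table: the class names 
(InternTableSketch, SimpleName) and the String-keyed HashMap are 
assumptions chosen for illustration. It only shows why a table gives 
you referential equality within one context and not across contexts.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not javac's Name.Table:
// one canonical object per distinct spelling, per table.
final class InternTableSketch {
    /** A name handle; only the enclosing table creates these. */
    static final class SimpleName {
        final String text;
        private SimpleName(String text) { this.text = text; }
        @Override public String toString() { return text; }
    }

    private final Map<String, SimpleName> names = new HashMap<>();

    /** Returns the canonical SimpleName for s, creating it on first use. */
    SimpleName fromString(String s) {
        return names.computeIfAbsent(s, SimpleName::new);
    }

    int size() { return names.size(); }

    public static void main(String[] args) {
        InternTableSketch table = new InternTableSketch();
        SimpleName a = table.fromString("toString");
        SimpleName b = table.fromString(new String("toString"));
        System.out.println(a == b);   // prints true: same table, same object

        InternTableSketch other = new InternTableSketch();
        // A different table is a different context, so == no longer holds.
        System.out.println(a == other.fromString("toString"));   // prints false
    }
}
```

Within one table, == is enough to compare names; names from two 
different tables have to be compared by content.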

> - Are the initial goals still relevant in 2019?

Maybe not as relevant as was originally the case.

> - What is the purpose of interning Name instances?

To reduce the size of the name table, and to support referential equality.

> - Are there any predefined benchmarks that are run to verify any of 
> these goals?

No.

> - What is the rationale of having the SharedNameTable data as one 
> large array? I see the documentation states that "This avoids the 
> overhead incurred by using an array of bytes for each name" but I'm 
> not sure what that overhead is referring to.

The overhead is the overhead of having one String instance per name, 
and/or the overhead of having a separate array of bytes for each name. 
I.e., the overhead of the "one big array" is amortized over all names.
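
As a rough illustration of the "one big array" layout, here is a sketch 
in which each name is only an (offset, length) pair into a shared byte 
array. It is not the real SharedNameTable: the names SharedBytesSketch 
and Entry are hypothetical, it uses standard UTF-8 rather than the 
modified class-file encoding, and it skips hashing and de-duplication.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of the "one big array" layout; not the real SharedNameTable.
final class SharedBytesSketch {
    private byte[] bytes = new byte[64];   // backing store shared by all names
    private int used = 0;                  // number of bytes currently occupied

    /** A name is just (offset, length) into the shared array. */
    final class Entry {
        final int offset;
        final int length;
        Entry(int offset, int length) { this.offset = offset; this.length = length; }

        /** Decoding back to a String allocates a new object on every call. */
        @Override public String toString() {
            return new String(bytes, offset, length, StandardCharsets.UTF_8);
        }
    }

    /** Appends the encoded bytes of s; no de-duplication in this sketch. */
    Entry add(String s) {
        // Assumption: standard UTF-8 here, not the modified class-file variant.
        byte[] encoded = s.getBytes(StandardCharsets.UTF_8);
        if (used + encoded.length > bytes.length) {
            // Grow the single shared array instead of allocating one per name.
            bytes = Arrays.copyOf(bytes, Math.max(bytes.length * 2, used + encoded.length));
        }
        System.arraycopy(encoded, 0, bytes, used, encoded.length);
        Entry e = new Entry(used, encoded.length);
        used += encoded.length;
        return e;
    }

    int bytesUsed() { return used; }
}
```

The array's object header and length field are paid once for the whole 
table rather than once per name. The cost is on the other side: every 
toString() call has to decode bytes into a fresh String.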

>
> -----
>
> I have a prototype of a Name.Table implementation that stores Names as 
> Strings instead of their UTF-8 byte[]. This seems to address the 
> performance concerns from my previous thread (it saves 1.2s/build on 
> some of our larger annotation processing builds at Google) without 
> affecting the performance of internal javac (i.e. non-annotation 
> processing). There should be no less overhead than the current Name 
> implementations as it only needs to store a single field for the 
> String, besides for the UTF-8/UTF-16 distinction.

What are the space comparisons? It is certainly the case that Strings 
are managed more efficiently in the JVM these days, and interned strings 
are handled better as well.  Your number of 1.2s doesn't mean much 
without more context than "our larger annotation processing builds."  
What percent is that of the overall build time, and/or of the total time 
spent in Name/Name.Table?

>
> Would something like this be a worthwhile contribution? Are there ways 
> I can evaluate whether this satisfies the performance goals of either 
> of the other implementations so we don't need to have 
> yet-another-Name-implementation?

What is now SharedNameTable is the original impl. It used to be plain 
NameTable, until the advent of clients, like IDEs and other tools, for 
whom the single shared name table was inconvenient, which is when 
NameTable was forked into SharedNameTable and UnsharedNameTable.

If we can come up with an impl that is as fast as the current impl, and as 
space efficient as the current impl, that would be great, but typically 
you have to choose one or the other to optimize.  I could believe a new 
impl might at least replace the current UnsharedNameTable.

I think this is a worthwhile direction to investigate further.

>
> Thanks for your help!
> Ron
>