Understanding Name and Name.Table
Jonathan Gibbons
jonathan.gibbons at oracle.com
Wed Apr 10 18:37:02 UTC 2019
Some comments inline.
On 4/10/19 10:56 AM, Ron Shapiro wrote:
> Hi,
>
> I am continually seeing Name and Name.Table show up in profiles of
> annotation processors, and I have a few questions regarding the design
> of these classes. I first brought this up back in this thread
> <https://mail.openjdk.java.net/pipermail/compiler-dev/2018-October/012529.html> discussing
> the performance of Name.contentEquals(). That conversation stalled
> around this comment from Jon
> <https://mail.openjdk.java.net/pipermail/compiler-dev/2018-October/012533.html>.
> I believe that javac itself does a good job of using the interned Names,
> so there aren't many cases where extra Strings are being
> recreated internally. But for annotation processors (and maybe also
> compiler plugins?), which don't have access to Names/the Name.Table,
> they must resort to many of these suboptimal methods that create Strings.
Historical note: the use of the internal javac Name class (as compared
to the JSR 269 interface of the same name) goes all the way back to the
beginning of this version of javac, round about JDK 1.4 or so. Back
then, the design center for javac was to be more independent of JDK API
than it is today.
>
> The sense that I get is that most of the Name methods that create
> String instances do so because class files use a modified version of
> UTF-8, and so the use of Convert.utf2string() greatly simplifies the
> implementation of these methods. Is that a correct assumption?
No, I don't think so. The use of modified UTF-8 is an unrelated (but
nevertheless important) implementation detail. I think you're just
seeing the result of retrofitting interfaces that require methods to be
implemented. I'm guessing that the methods you are looking at are not
used internally within javac itself.
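As a rough illustration of the retrofitting point, here is a minimal sketch of a name that is stored as bytes and later made to implement CharSequence. The class ByteBackedName and its decodeCount field are hypothetical and are not javac's Name class; the sketch also uses standard UTF-8 rather than modified UTF-8. It only shows why interface methods added after the fact tend to be implemented by decoding a fresh String on each call.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; this is not javac's Name class.
final class ByteBackedName implements CharSequence {
    private final byte[] utf8;   // the name is stored as encoded bytes
    static int decodeCount = 0;  // counts how many Strings get created

    ByteBackedName(String s) {
        this.utf8 = s.getBytes(StandardCharsets.UTF_8);
    }

    // Every call decodes the bytes into a fresh String.
    @Override public String toString() {
        decodeCount++;
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Retrofitted interface methods: the simplest implementation
    // goes through toString(), so each call allocates a String.
    @Override public int length() { return toString().length(); }
    @Override public char charAt(int i) { return toString().charAt(i); }
    @Override public CharSequence subSequence(int start, int end) {
        return toString().subSequence(start, end);
    }

    boolean contentEquals(CharSequence cs) {
        return toString().contentEquals(cs);
    }
}
```

A caller that only has the interface, such as an annotation processor, pays for one decode per comparison in this sketch, whereas code that can compare interned instances directly never calls these methods.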
>
> Separately, I have some general questions about Name.Table.
> - Can someone explain the performance goals of it (I presume it's for
> performance)? Is it to limit memory usage since Strings are UTF-16 and
> strings in class files are stored as UTF-8, or something else?
My understanding is that originally it was to save space, and to provide
a context for "interned strings" that could be compared with referential
equality.
> - Are the initial goals still relevant in 2019?
Maybe not as relevant as was originally the case.
> - What is the purpose of interning Name instances?
To reduce the size of the name table, and to support referential equality.
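A minimal sketch of what interning buys, assuming a table that maps each distinct text to one canonical object. InternTable and InternedName are hypothetical names and this is not how Name.Table is implemented; it only shows that equal text from the same table yields the same instance, so == is a valid comparison within that table.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of an interning table; not javac's Name.Table.
final class InternedName {
    final String text;
    InternedName(String text) { this.text = text; }
}

final class InternTable {
    private final Map<String, InternedName> names = new HashMap<>();

    // Returns the single canonical instance for this text in this table,
    // creating it on first use.
    InternedName fromString(String s) {
        return names.computeIfAbsent(s, InternedName::new);
    }

    // Number of distinct names stored, regardless of how often each was requested.
    int size() { return names.size(); }
}
```

Two lookups of the same text return the same object and the table stores it only once. Instances from different tables are distinct objects, so referential equality is only meaningful within one table.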
> - Are there any predefined benchmarks that are run to verify any of
> these goals?
No.
> - What is the rationale of having the SharedNameTable data as one
> large array? I see the documentation states that "This avoids the
> overhead incurred by using an array of bytes for each name" but I'm
> not sure what that overhead is referring to.
The overhead is the overhead of having one String instance per name,
and/or the overhead of having a separate array of bytes for each name.
i.e. the overhead of the "one big array" is amortized over all names.
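The "one big array" idea can be sketched as follows. PackedNames is a hypothetical class, not the actual SharedNameTable code, and it uses standard UTF-8 for simplicity. Each Java array is an object with its own header and length field, so packing all names into one growing byte[] and identifying each name by an offset and a length pays that per-object cost once rather than once per name.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch; not the actual SharedNameTable code.
final class PackedNames {
    private byte[] bytes = new byte[64]; // one backing array shared by all names
    private int used = 0;                // number of bytes currently in use

    // Appends the name's UTF-8 bytes and returns the start offset.
    // The caller keeps (start, length) as the handle for the name.
    int add(String s) {
        byte[] encoded = s.getBytes(StandardCharsets.UTF_8);
        if (used + encoded.length > bytes.length) {
            // Grow the shared array; existing offsets stay valid.
            int newSize = Math.max(bytes.length * 2, used + encoded.length);
            bytes = Arrays.copyOf(bytes, newSize);
        }
        System.arraycopy(encoded, 0, bytes, used, encoded.length);
        int start = used;
        used += encoded.length;
        return start;
    }

    // Decodes the name stored at [start, start + length).
    String get(int start, int length) {
        return new String(bytes, start, length, StandardCharsets.UTF_8);
    }

    int bytesUsed() { return used; }
}
```

The trade-off visible here is the one discussed in this thread: storage is compact, but reading a name back as a String requires decoding it.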
>
> -----
>
> I have a prototype of a Name.Table implementation that stores Names as
> Strings instead of their UTF-8 byte[]. This seems to address the
> performance concerns from my previous thread (it saves 1.2s/build on
> some of our larger annotation processing builds at Google) without
> affecting the performance of internal javac (i.e. non-annotation
> processing). There should be no more overhead than the current Name
> implementations, as it only needs to store a single field for the
> String, aside from the UTF-8/UTF-16 distinction.
What are the space comparisons? It is certainly the case that Strings
are managed more efficiently in the JVM these days, and interned strings
are handled better as well. Your number of 1.2s doesn't mean much
without more context than "our larger annotation processing builds."
What percentage is that of the overall build time, and/or of the total
time in Name/Name.Table?
>
> Would something like this be a worthwhile contribution? Are there ways
> I can evaluate whether this satisfies the performance goals of either
> of the other implementations so we don't need to have
> yet-another-Name-implementation?
What is now SharedNameTable is the original impl. It used to be plain
NameTable, until the advent of clients, like IDEs and other tools, for
whom the single shared name table was inconvenient, which is when
NameTable was forked into SharedNameTable and UnsharedNameTable.
If we can come up with an impl that is as fast as the current impl, and as
space efficient as the current impl, that would be great, but typically
you have to choose one or the other to optimize. I could believe a new
impl might at least replace the current UnsharedNameTable.
I think this is a worthwhile direction to investigate further.
>
> Thanks for your help!
> Ron
>