Understanding Name and Name.Table

Wed Apr 10 17:56:50 UTC 2019

Hi,

I keep seeing Name and Name.Table show up in profiles of
annotation processors, and I have a few questions about the design of
these classes. I first brought this up in an earlier thread
<https://mail.openjdk.java.net/pipermail/compiler-dev/2018-October/012529.html>
about the performance of Name.contentEquals(). That conversation
stalled around this comment from Jon:
<https://mail.openjdk.java.net/pipermail/compiler-dev/2018-October/012533.html>

I believe that javac itself does a good job of using interned Names,
so there aren't many cases where extra Strings are created
internally. But annotation processors (and maybe also compiler
plugins?) don't have access to Names or the Name.Table, so they must
resort to the suboptimal methods that create Strings.

My sense is that most of the Name methods that create String
instances do so because class files use a modified version of UTF-8, and
going through Convert.utf2string() greatly simplifies the implementation of
these methods. Is that a correct assumption?
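
To illustrate what I mean by "modified UTF-8", here is a small standalone
sketch. It does not use javac's Convert class; it relies on
DataOutputStream.writeUTF(), which is documented to write modified UTF-8.
The class name ModifiedUtf8Demo and its helper methods are made up for this
example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ModifiedUtf8Demo {
    /** Encodes s as modified UTF-8, dropping writeUTF's 2-byte length prefix. */
    public static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] withLength = bytes.toByteArray();
        return Arrays.copyOfRange(withLength, 2, withLength.length);
    }

    /** Encodes s as standard UTF-8. */
    public static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Plain ASCII names encode identically in both forms.
        String ascii = "java.lang.String";
        System.out.println(Arrays.equals(modifiedUtf8(ascii), standardUtf8(ascii)));

        // U+0000 takes two bytes in modified UTF-8, one in standard UTF-8.
        String withNul = "a\0b";
        System.out.println(modifiedUtf8(withNul).length + " vs " + standardUtf8(withNul).length);

        // A supplementary character is written as two 3-byte surrogates
        // in modified UTF-8, but as one 4-byte sequence in standard UTF-8.
        String supplementary = "\uD83D\uDE00";
        System.out.println(modifiedUtf8(supplementary).length + " vs " + standardUtf8(supplementary).length);
    }
}
```

For typical identifiers the two encodings are byte-for-byte the same, so
the difference only matters for NUL and for characters outside the BMP.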

Separately, I have some general questions about Name.Table.
- Can someone explain its performance goals (I presume it exists for
performance)? Is it to limit memory usage, since Strings are UTF-16 and
strings in class files are stored as UTF-8, or something else?
- Are the initial goals still relevant in 2019?
- What is the purpose of interning Name instances?
- Are there any predefined benchmarks that are run to verify any of these
goals?
- What is the rationale for storing the SharedNameTable data as one large
array? I see the documentation states that "This avoids the overhead
incurred by using an array of bytes for each name" but I'm not sure what
overhead that refers to.
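
For the last question, my best guess (which may well be wrong) is that the
overhead is the per-object cost of a separate byte[] for every name: each
array carries its own object header and needs its own reference, whereas a
shared array only needs an offset and a length per name. The sketch below
shows that layout in miniature. SharedByteStore and its methods are
hypothetical names for this example and are not the SharedNameTable code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SharedByteStore {
    // One growable array holds the bytes of every name back to back.
    private byte[] bytes = new byte[64];
    private int used = 0;

    /** Appends the UTF-8 bytes of s and returns the offset where they start. */
    public int append(String s) {
        byte[] encoded = s.getBytes(StandardCharsets.UTF_8);
        if (used + encoded.length > bytes.length) {
            // Grow by doubling, or further if one name needs more room.
            bytes = Arrays.copyOf(bytes, Math.max(bytes.length * 2, used + encoded.length));
        }
        System.arraycopy(encoded, 0, bytes, used, encoded.length);
        int offset = used;
        used += encoded.length;
        return offset;
    }

    /** Decodes the name stored at [offset, offset + length). Allocates a new String. */
    public String read(int offset, int length) {
        return new String(bytes, offset, length, StandardCharsets.UTF_8);
    }

    /** Number of bytes currently occupied by stored names. */
    public int used() {
        return used;
    }
}
```

With this layout a name is just two ints into the shared array, but every
conversion back to a String has to decode and allocate, which is the cost I
am seeing in annotation processor profiles.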

-----

I have a prototype of a Name.Table implementation that stores Names as
Strings instead of as UTF-8 byte[]. This seems to address the
performance concerns from my previous thread (it saves 1.2s/build on some
of our larger annotation processing builds at Google) without affecting the
performance of javac's internals (i.e. non-annotation processing). It
should have no more overhead than the current Name implementations, since it
only needs to store a single field for the String, aside from the
UTF-8/UTF-16 distinction.
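
To make the idea concrete, here is a minimal sketch of the approach. It is
not the prototype itself: StringNameTable and StringName are hypothetical
names, and the sketch implements the public javax.lang.model.element.Name
interface rather than extending javac's internal Name class.

```java
import java.util.HashMap;
import java.util.Map;
import javax.lang.model.element.Name;

public class StringNameTable {
    /** Hypothetical String-backed name; the String is its only field. */
    public static final class StringName implements Name {
        private final String value;

        private StringName(String value) {
            this.value = value;
        }

        @Override
        public boolean contentEquals(CharSequence cs) {
            // Compares characters directly; no byte decoding, no new String.
            return value.contentEquals(cs);
        }

        @Override
        public int length() {
            return value.length();
        }

        @Override
        public char charAt(int index) {
            return value.charAt(index);
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            return value.subSequence(start, end);
        }

        @Override
        public String toString() {
            // Returns the stored String instead of building a new one.
            return value;
        }

        // equals() and hashCode() are inherited from Object. Identity
        // comparison is sufficient because each table interns its names.
    }

    private final Map<String, StringName> names = new HashMap<>();

    /** Returns the unique StringName for s within this table. */
    public StringName fromString(String s) {
        return names.computeIfAbsent(s, StringName::new);
    }
}
```

In this sketch both toString() and contentEquals() avoid allocation, which
is where the time goes for annotation processors today.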

Would something like this be a worthwhile contribution? Are there ways I
can evaluate whether it satisfies the performance goals of either of the
existing implementations, so that we don't end up with
yet-another-Name-implementation?

Thanks for your help!
Ron