End position storage in javac

Liam Miller-Cushon cushon at google.com
Tue Dec 2 15:59:17 UTC 2025


Thanks for the thoughts on this! I have filed JDK-8372948 to track the
potential change to end positions.

> That being said, it is a testament to that early design that it has
> endured so long with many pervasive incremental upgrades.

I think this is well put, and the linked lists in List and Scope that
Maurizio raised seem like a good example of those decisions holding
up.
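
As a rough illustration of why a linked list suits that style, here is a
minimal sketch of an immutable cons list. ConsList and ConsListSketch are
hypothetical names, not javac's own List or Scope classes. It only shows
the general technique: prepending creates one new cell and shares the
existing list as its tail, so an "extended" list reuses the older entries.

```java
// Hypothetical immutable cons list; a sketch, not javac's implementation.
final class ConsList<T> {
    final T head;            // element at the front (null for the empty list)
    final ConsList<T> tail;  // rest of the list (null only for the empty list)

    private ConsList(T head, ConsList<T> tail) {
        this.head = head;
        this.tail = tail;
    }

    static <T> ConsList<T> empty() {
        return new ConsList<>(null, null);
    }

    boolean isEmpty() {
        return tail == null;
    }

    // O(1): allocates a single cell and shares 'this' as the tail.
    ConsList<T> prepend(T value) {
        return new ConsList<>(value, this);
    }

    // Recursion follows the head/tail structure directly.
    int length() {
        return isEmpty() ? 0 : 1 + tail.length();
    }
}

class ConsListSketch {
    public static void main(String[] args) {
        ConsList<String> base = ConsList.<String>empty().prepend("b").prepend("a");
        ConsList<String> extended = base.prepend("x");

        System.out.println(extended.length());      // 3
        System.out.println(base.length());          // 2, base is unchanged
        System.out.println(extended.tail == base);  // true, the cells are shared
    }
}
```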

I did some more archeology, and storing end positions in a separate map was
present in the first version of the OpenJDK sources that there's git
history for, which I think predates the introduction of annotation
processing. I think it can both be the case that it was a good decision at
the time (memory was more constrained, and the compiler was primarily a
batch compiler), and that it's a reasonable time to revisit it.
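
To make the trade-off concrete, here is a minimal sketch of the two
storage strategies. Node, SideTable, and EndPosSketch are hypothetical
names and are not javac's JCTree or EndPosTable. The node count is the
rough java.base figure from the original message quoted below, and the
byte estimate ignores object alignment and padding, so it is only an
approximation.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical stand-in for a tree node; not javac's actual JCTree.
class Node {
    final int startPos;
    int endPos = -1; // Option B: end position stored on the node (-1 = unset)

    Node(int startPos) {
        this.startPos = startPos;
    }
}

// Option A: end positions kept in a separate table keyed by node identity.
class SideTable {
    private final Map<Node, Integer> endPositions = new IdentityHashMap<>();

    void storeEnd(Node node, int endPos) {
        endPositions.put(node, endPos);
    }

    // Returns -1 when no end position was recorded for the node.
    int getEnd(Node node) {
        return endPositions.getOrDefault(node, -1);
    }
}

class EndPosSketch {
    // Approximate cost of one extra int field per node.
    static long fieldCostBytes(long nodeCount) {
        return nodeCount * Integer.BYTES;
    }

    public static void main(String[] args) {
        Node n = new Node(10);

        SideTable table = new SideTable();
        table.storeEnd(n, 25); // Option A: hash lookup on every read
        n.endPos = 25;         // Option B: plain field access

        System.out.println(table.getEnd(n) + " " + n.endPos);
        System.out.println(fieldCostBytes(3_000_000) / 1_000_000 + " MB");
    }
}
```

With about 3 million nodes and a 4-byte int, the estimate comes to about
12 MB, matching the figure in the original message.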

> Then there's the general lack of data-orientedness of the javac design.

I have worked on some compiler-adjacent tools that lean more into being
data oriented, representing symbols and types as immutable data, then
putting information computed during compilation passes in separate tables
keyed off symbols or types. I have found it to be pleasant to reason about,
compared to having more mutable and lazily computed state directly in the
symbol and type representations.

Going through separate maps doesn't provide as nice an object oriented API
as having getters directly on Symbol/Type, and I don't think that approach
necessarily makes sense for javac, but it does seem interesting to consider
areas where javac could benefit from being more data-oriented.
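
A minimal sketch of that data-oriented style, under stated assumptions:
Sym, Ty, TypeTable, and SideTableSketch are hypothetical names invented
for illustration and do not correspond to javac's Symbol or Type classes.
Symbols and types are immutable records, and information computed by a
pass lives in a separate table keyed off the symbol.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical immutable symbol; records provide value-based equals/hashCode.
record Sym(String owner, String name) {}

// Hypothetical immutable type representation.
record Ty(String descriptor) {}

// Information computed by a pass, kept outside the symbol itself.
final class TypeTable {
    private final Map<Sym, Ty> types = new HashMap<>();

    void put(Sym sym, Ty type) {
        types.put(sym, type);
    }

    // Empty when the pass has not computed a type for this symbol.
    Optional<Ty> typeOf(Sym sym) {
        return Optional.ofNullable(types.get(sym));
    }
}

class SideTableSketch {
    public static void main(String[] args) {
        TypeTable table = new TypeTable();
        Sym length = new Sym("java.lang.String", "length");

        System.out.println(table.typeOf(length).isPresent()); // false
        table.put(length, new Ty("()I"));
        System.out.println(table.typeOf(length).get().descriptor()); // ()I
    }
}
```

The lookup goes through the table rather than a getter on the symbol,
which is the API trade-off described above.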

On Tue, Dec 2, 2025 at 4:07 PM Jan Lahoda <jan.lahoda at oracle.com> wrote:

> Hi,
>
>
> Yes, I think it would make sense to look into moving the end positions
> into the trees, as the types of compilations that don't use end
> positions are, I think, getting rarer. I took a peek at the draft PR,
> and overall it seems reasonable to me. (I'd need a more detailed pass to
> fully review, though.)
>
>
> Thanks,
>
>      Jan
>
>
> On 11/27/25 14:45, Maurizio Cimadamore wrote:
> > Hi Liam, Archie
> > I believe this came up also in the recent discussions on lint
> > warnings, where we needed to expand the set of end positions retained
> > by default.
> >
> > I think the long term solution here is, as you say (and I think Jan
> > supports that too) that end positions should be just stored by default
> > in the trees.
> >
> > There's a lot of things javac does "its own way" for reasons sometimes
> > good, sometimes less good. For instance, javac had to use its own
> > `List` because when it was written generics were not yet available.
> > That said, one issue with using just a plain j.u.List (like ArrayList)
> > in javac is that (a) javac typically operates on very small lists,
> > where the overhead of array lists might be too big and (b) j.u.List is
> > very bad for recursive algorithms, which is what javac is all about.
> > So, I believe using a custom List impl there seems to be a good trade
> > off.
> >
> > Other areas that are brought up from time to time are:
> >
> > * use of special data structures for scopes -- why not just maps?
> > * use of special data structures for names -- why not just strings?
> >
> > I believe we did some experiments on the former, and concluded that
> > the javac implementation was still better than a hashmap (as javac's
> > requirements are specialized, and scopes need to be traversed in
> > different ways, and sometimes a new scope needs to be "pushed" on top
> > of an old one -- reusing the underlying entries). For names I'm less
> > sure, but maybe somebody else knows the answer there.
> >
> > Then there's the general lack of data-orientedness of the javac
> > design. Lots of classes with lots of visitors everywhere, and various
> > ways to query "are you a T". I would like very much, one day, to make
> > the Type/Symbol/Tree hierarchies sealed, and get rid of all the
> > various kinds/tags, etc. and maybe even see if we can get rid of
> > visitors and just use plain code with pattern matching.
> >
> > Maurizio
> >
> >
> > On 26/11/2025 13:01, Liam Miller-Cushon wrote:
> >> Hi,
> >>
> >> I wanted to discuss how javac handles end positions, and get input on
> >> the possibility of having the compiler unconditionally store end
> >> positions in a field on JCTree instead of in a separate map.
> >>
> >> Currently end position information is not stored by default, but is
> >> enabled in certain modes: if -Xjcov is set, or if the compilation
> >> includes diagnostic listeners, task listeners, or annotation
> >> processors (since they may want end positions).
> >>
> >> The hash table used to store the end positions was optimized in JDK 9
> >> in JDK-8033287, and there was some related discussion on
> >> compiler-dev@ about the motivation for making end positions optional
> >> at that time.
> >>
> >> As I understand it, the goal is to save memory in the case where end
> >> positions aren't needed. That savings comes with a trade-off when end
> >> positions are needed, though, since the map is less efficient than
> >> storing the position directly in JCTree.
> >>
> >> Today, many invocations of javac will need end position information
> >> (annotation processing is common, and when javac is used
> >> programmatically in IDEs, end positions will be enabled). For the
> >> invocations that do not need end positions, typical developer
> >> machines are less memory constrained than they were when the
> >> optimization for end positions was first introduced.
> >>
> >> Looking at the compilation of java.base, it contains about 3000 files
> >> and creates about 3 million AST nodes, so adding an int field to
> >> JCTree to store end positions would take about 12MB.
> >>
> >> What do you think? Would it make sense to consider adding a field to
> >> JCTree to store end positions, instead of using EndPosTable?
> >>
> >> I have a draft PR of the approach here:
> >> https://github.com/openjdk/jdk/pull/28506
> >>
> >> Having end position information always available might enable some
> >> potential improvements to javac. For example, some compilers indicate
> >> a span of source text for some diagnostics, such as the 'range
> >> highlighting' described in these clang docs:
> >> https://clang.llvm.org/diagnostics.html
>