End position storage in javac

Thu Nov 27 13:45:56 UTC 2025

Hi Liam, Archie
I believe this came up also in the recent discussions on lint warnings, 
where we needed to expand the set of end positions retained by default.

I think the long term solution here is, as you say (and I think Jan 
supports that too) that end positions should be just stored by default 
in the trees.
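
To make the two storage options concrete, here is a rough sketch. `Node`, `SideTable` and `EndPosDemo` are hypothetical names for illustration only; they are not javac's `JCTree` or `EndPosTable`. The size estimate reuses the figure from Liam's mail (about 3 million nodes) and ignores object alignment, so treat it as a back-of-the-envelope number.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical stand-in for a tree node; this is NOT javac's JCTree.
class Node {
    final int startPos;
    int endPos = -1; // option B: end position stored on the node (-1 = unknown)

    Node(int startPos) { this.startPos = startPos; }
}

// Option A: end positions kept in a separate table, filled only on request.
class SideTable {
    private final Map<Node, Integer> ends = new IdentityHashMap<>();

    void store(Node n, int endPos) { ends.put(n, endPos); }

    int endPos(Node n) { return ends.getOrDefault(n, -1); }
}

class EndPosDemo {
    // Extra bytes for one int field per node, ignoring alignment and padding.
    static long fieldCostBytes(long nodeCount) {
        return nodeCount * Integer.BYTES;
    }

    public static void main(String[] args) {
        Node n = new Node(10);
        SideTable table = new SideTable();
        table.store(n, 25); // option A: goes through a lookup structure
        n.endPos = 25;      // option B: plain field write
        System.out.println(table.endPos(n) + " " + n.endPos);        // 25 25
        System.out.println(fieldCostBytes(3_000_000L) + " bytes");   // 12000000 bytes
    }
}
```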

There are a lot of things javac does "its own way" for reasons sometimes 
good, sometimes less good. For instance, javac had to use its own `List` 
because when it was written generics were not yet available. That said, 
one issue with using just a plain j.u.List (like ArrayList) in javac is 
that (a) javac typically operates on very small lists, where the 
overhead of array lists might be too big and (b) j.u.List is very bad 
for recursive algorithms, which is what javac is all about. So, I 
believe using a custom List impl there is a good trade-off.
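
As an illustration of point (b), here is a minimal sketch of an immutable singly linked list. `ConsList` is a hypothetical class written for this mail, not the actual javac `List`; it only shows why a head/tail shape suits recursion: each step takes the head and recurses on the tail, with no copying and no index bookkeeping.

```java
// Hypothetical immutable "cons" list; a sketch of the idea only,
// not the real javac List implementation.
final class ConsList<A> {
    final A head;           // null only for the empty list
    final ConsList<A> tail; // null only for the empty list

    private ConsList(A head, ConsList<A> tail) {
        this.head = head;
        this.tail = tail;
    }

    static <A> ConsList<A> nil() { return new ConsList<>(null, null); }

    boolean isEmpty() { return tail == null; }

    // O(1): allocates one cell and shares the existing list as its tail.
    ConsList<A> prepend(A x) { return new ConsList<>(x, this); }

    // Structural recursion: handle the head, recurse on the tail.
    static int sum(ConsList<Integer> l) {
        return l.isEmpty() ? 0 : l.head + sum(l.tail);
    }
}

class ConsListDemo {
    public static void main(String[] args) {
        ConsList<Integer> xs =
            ConsList.<Integer>nil().prepend(3).prepend(2).prepend(1);
        System.out.println(ConsList.sum(xs)); // 6
    }
}
```

With an array-backed list the same recursion would need either an explicit index parameter or a `subList` view at every step.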

Other areas that are brought up from time to time are:

* use of special data structures for scopes -- why not just maps?
* use of special data structures for names -- why not just strings?

I believe we did some experiments on the former, and concluded that the 
javac implementation was still better than a hashmap (as javac's 
requirements are specialized, and scopes need to be traversed in 
different ways, and sometimes a new scope needs to be "pushed" on top of 
an old one -- reusing the underlying entries). For names I'm less sure, 
but maybe somebody else knows the answer there.
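
To show what "pushing" a scope on top of another means, here is a small sketch. `ChainedScope` is a hypothetical class and deliberately naive (it uses a `HashMap` per level, and strings instead of symbols); javac's real scope implementation is different. The only point is that the pushed scope shares the outer entries by reference rather than copying them.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical chained scope for illustration; not javac's Scope class.
final class ChainedScope {
    private final ChainedScope outer; // enclosing scope, or null at the top
    private final Map<String, String> local = new HashMap<>();

    ChainedScope(ChainedScope outer) { this.outer = outer; }

    void enter(String name, String symbol) { local.put(name, symbol); }

    // Pushing reuses the outer scope's entries; nothing is copied.
    ChainedScope push() { return new ChainedScope(this); }

    // Innermost declaration wins; otherwise fall back to enclosing scopes.
    String lookup(String name) {
        for (ChainedScope s = this; s != null; s = s.outer) {
            String sym = s.local.get(name);
            if (sym != null) return sym;
        }
        return null;
    }
}

class ChainedScopeDemo {
    public static void main(String[] args) {
        ChainedScope classScope = new ChainedScope(null);
        classScope.enter("x", "field x");
        ChainedScope methodScope = classScope.push();
        methodScope.enter("x", "local x");
        System.out.println(methodScope.lookup("x")); // local x
        System.out.println(classScope.lookup("x"));  // field x
    }
}
```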

Then there's the general lack of data-orientedness of the javac design. 
Lots of classes with lots of visitors everywhere, and various ways to 
query "are you a T". I would like very much, one day, to make the 
Type/Symbol/Tree hierarchies sealed, and get rid of all the various 
kinds/tags, etc. and maybe even see if we can get rid of visitors and 
just use plain code with pattern matching.
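
A toy version of what that could look like, assuming Java 21 or later. The `Expr` hierarchy and `Printer` below are hypothetical and far smaller than javac's real trees; the sketch only shows a sealed hierarchy replacing kind/tag checks and a `switch` replacing a visitor.

```java
// Hypothetical miniature tree hierarchy; these are not javac's JCTree classes.
sealed interface Expr permits Literal, Ident, Binary {}
record Literal(int value) implements Expr {}
record Ident(String name) implements Expr {}
record Binary(char op, Expr left, Expr right) implements Expr {}

final class Printer {
    // Because Expr is sealed, the compiler checks this switch for
    // exhaustiveness: no default branch and no tag field are needed.
    static String show(Expr e) {
        return switch (e) {
            case Literal l -> Integer.toString(l.value());
            case Ident i -> i.name();
            case Binary(char op, Expr left, Expr right) ->
                "(" + show(left) + " " + op + " " + show(right) + ")";
        };
    }
}

class PrinterDemo {
    public static void main(String[] args) {
        Expr e = new Binary('+', new Ident("a"), new Literal(1));
        System.out.println(Printer.show(e)); // (a + 1)
    }
}
```

Adding a new `Expr` subtype would then turn every non-exhaustive `switch` into a compile error, which is roughly the safety net abstract visitor methods provide today.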

Maurizio

On 26/11/2025 13:01, Liam Miller-Cushon wrote:
> Hi,
>
> I wanted to discuss how javac handles end positions, and get input on 
> the possibility of having the compiler unconditionally store end 
> positions in a field on JCTree instead of in a separate map.
>
> Currently end position information is not stored by default, but is 
> enabled in certain modes: if -Xjcov is set, or if the compilation 
> includes diagnostic listeners, task listeners, or annotation 
> processors (since they may want end positions).
>
> The hash table used to store the end positions was optimized in JDK 9 
> in JDK-8033287, and there was some related discussion on compiler-dev@ 
> about the motivation for making end positions optional at that time.
>
> As I understand it, the goal is to save memory in the case where end 
> positions aren't needed. That savings comes with a trade-off when end 
> positions are needed, though, since the map is less efficient than 
> storing the position directly in JCTree.
>
> Today, many invocations of javac will need end position information 
> (annotation processing is common, and when javac is used 
> programmatically in IDEs end positions will be enabled). For the 
> invocations that do 
> not need end positions, typical developer machines are less memory 
> constrained than they were when the optimization for end positions was 
> first introduced.
>
> Looking at the compilation of java.base, it contains about 3000 files 
> and creates about 3 million AST nodes, so adding an int field to 
> JCTree to store end positions would take about 12MB.
>
> What do you think? Would it make sense to consider adding a field to 
> JCTree to store end positions, instead of using EndPosTable?
>
> I have a draft PR of the approach here: 
> https://github.com/openjdk/jdk/pull/28506
>
> Having end position information always available might enable some 
> potential improvements to javac. For example, some compilers indicate 
> a span of source text for some diagnostics, such as the 'range 
> highlighting' described in these clang docs: 
> https://clang.llvm.org/diagnostics.html