End position storage in javac
Jan Lahoda
jan.lahoda at oracle.com
Tue Dec 2 15:06:54 UTC 2025
Hi,
Yes, I think it would make sense to look into moving the end positions
into the trees, as the types of compilations that don't use end
positions are, I think, getting rarer. I took a peek at the draft PR,
and overall it seems reasonable to me. (I'd need a more detailed pass to
fully review, though.)
Thanks,
Jan
On 11/27/25 14:45, Maurizio Cimadamore wrote:
> Hi Liam, Archie
> I believe this came up also in the recent discussions on lint
> warnings, where we needed to expand the set of end positions retained
> by default.
>
> I think the long term solution here is, as you say (and I think Jan
> supports that too) that end positions should be just stored by default
> in the trees.
>
> There's a lot of things javac does "its own way" for reasons sometimes
> good, sometimes less good. For instance, javac had to use its own
> `List` because when it was written generics were not yet available.
> That said, one issue with using just a plain j.u.List (like ArrayList)
> in javac is that (a) javac typically operates on very small lists,
> where the overhead of array lists might be too big and (b) j.u.List is
> very bad for recursive algorithms, which is what javac is all about.
> So, I believe using a custom List impl there is a good trade-off.
>
> Other areas that are brought up from time to time are:
>
> * use of special data structures for scopes -- why not just maps?
> * use of special data structures for names -- why not just strings?
>
> I believe we did some experiments on the former, and concluded that
> javac's implementation was still better than a hashmap (as javac's
> requirements are specialized, and scopes need to be traversed in
> different ways, and sometimes a new scope needs to be "pushed" on top
> of an old one -- reusing the underlying entries). For names I'm less
> sure, but maybe somebody else knows the answer there.
>
> Then there's the general lack of data-orientedness of the javac
> design. Lots of classes with lots of visitors everywhere, and various
> ways to query "are you a T". I would like very much, one day, to make
> the Type/Symbol/Tree hierarchies sealed, and get rid of all the
> various kinds/tags, etc. and maybe even see if we can get rid of
> visitors and just use plain code with pattern matching.
>
> Maurizio
>
>
> On 26/11/2025 13:01, Liam Miller-Cushon wrote:
>> Hi,
>>
>> I wanted to discuss how javac handles end positions, and get input on
>> the possibility of having the compiler unconditionally store end
>> positions in a field on JCTree instead of in a separate map.
>>
>> Currently end position information is not stored by default, but is
>> enabled in certain modes: if -Xjcov is set, or if the compilation
>> includes diagnostic listeners, task listeners, or annotation
>> processors (since they may want end positions).
>>
>> The hash table used to store the end positions was optimized in JDK 9
>> in JDK-8033287, and there was some related discussion on
>> compiler-dev@ about the motivation for making end positions optional
>> at that time.
>>
>> As I understand it, the goal is to save memory in the case where end
>> positions aren't needed. That savings comes with a trade-off when end
>> positions are needed, though, since the map is less efficient than
>> storing the position directly in JCTree.
>>
>> Today, many invocations of javac will need end position information
>> (annotation processing is common, and when javac is used
>> programmatically in IDEs, end positions will be enabled). For the
>> invocations that do not need end positions, typical developer
>> machines are less memory constrained than they were when the
>> optimization for end positions was first introduced.
>>
>> Looking at the compilation of java.base, it contains about 3000 files
>> and creates about 3 million AST nodes, so adding an int field to
>> JCTree to store end positions would take about 12MB.
>>
>> What do you think? Would it make sense to consider adding a field to
>> JCTree to store end positions, instead of using EndPosTable?
>>
>> I have a draft PR of the approach here:
>> https://github.com/openjdk/jdk/pull/28506
>>
>> Having end position information always available might enable some
>> potential improvements to javac. For example, some compilers indicate
>> a span of source text for some diagnostics, such as the 'range
>> highlighting' described in these clang docs:
>> https://clang.llvm.org/diagnostics.html
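
As a rough illustration of the trade-off discussed in this thread, the
sketch below contrasts a side table keyed by node with an int field stored
directly on the node, and reproduces the memory estimate for java.base.
It is not javac's code: Node, EndPosMap and extraBytes are hypothetical
names invented for the example, the side table is a plain IdentityHashMap
rather than javac's specialized hash table, and the estimate ignores
object alignment and padding. The figure of about 3 million nodes comes
from the message above.

```java
import java.util.IdentityHashMap;
import java.util.Map;

public class EndPosSketch {
    /** Hypothetical stand-in for an AST node; this is not javac's JCTree. */
    static class Node {
        final int startPos;
        int endPos = -1; // field-based storage; -1 means "no end position recorded"

        Node(int startPos) {
            this.startPos = startPos;
        }
    }

    /** Side-table approach: end positions are kept outside the nodes. */
    static class EndPosMap {
        private final Map<Node, Integer> table = new IdentityHashMap<>();

        void store(Node node, int endPos) {
            table.put(node, endPos);
        }

        /** Returns the stored end position, or -1 if none was recorded. */
        int get(Node node) {
            return table.getOrDefault(node, -1);
        }
    }

    /** Rough cost of one extra int field per node (alignment ignored). */
    static long extraBytes(long nodeCount) {
        return nodeCount * Integer.BYTES;
    }

    public static void main(String[] args) {
        Node node = new Node(10);

        // Side table: a lookup goes through a hash map keyed by the node.
        EndPosMap map = new EndPosMap();
        map.store(node, 25);

        // Field: the end position is read directly from the node.
        node.endPos = 25;

        System.out.println(map.get(node) == node.endPos); // prints "true"
        System.out.println(extraBytes(3_000_000L));       // prints "12000000"
    }
}
```

With the field, every node pays for the int whether or not end positions
are used; with the side table, nothing is paid when the table is unused,
but each stored entry costs more than four bytes and each lookup goes
through the map.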