<div dir="ltr">Thanks for the thoughts on this! I have filed JDK-8372948 to track the potential change to end positions.<div><br></div><div>> <span style="color:rgb(0,0,0)">That being said, it is a testament to that early design that it has endured so long with many pervasive incremental upgrades.<br><br>I think this is well put, and the linked lists in List and Scope that </span><span style="color:rgb(0,0,0)">Maurizio raised </span><span style="color:rgb(0,0,0)">seem like a good example of those decisions holding up.</span></div><div><span style="color:rgb(0,0,0)"><br></span></div><div><span style="color:rgb(0,0,0)">I did some more </span>archeology, and<span style="color:rgb(0,0,0)"> storing end positions in a separate map was present in the first version of the OpenJDK sources that there's git history for, which I think predates the introduction of annotation processing. I think it can both be the case that it was a good decision at the time (memory was more constrained, and the compiler was primarily a batch compiler), and that it's a reasonable time to revisit it.</span></div><div><br></div><div>> <span style="color:rgb(0,0,0)">Then there's the general lack of data-orientedness of the javac design.</span></div><div><span style="color:rgb(0,0,0)"><br></span></div><div><span style="color:rgb(0,0,0)">I have worked on some compiler-adjacent tools that lean more into being data-oriented, representing symbols and types as immutable data, then putting information computed during compilation passes in separate tables keyed off symbols or types. 
I have found it to be pleasant to reason about, compared to having more mutable and lazily computed state directly in the symbol and type representations.</span></div><div><span style="color:rgb(0,0,0)"><br></span></div><div><span style="color:rgb(0,0,0)">Going through separate maps doesn't provide as nice an object-oriented API as having getters directly on Symbol/Type, and I don't think that approach necessarily makes sense for javac, but it does seem interesting to consider areas where javac could benefit from being more data-oriented.</span></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Dec 2, 2025 at 4:07 PM Jan Lahoda <<a href="mailto:jan.lahoda@oracle.com" target="_blank">jan.lahoda@oracle.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi,<br>
<br>
<br>
Yes, I think it would make sense to look into moving the end positions <br>
into the trees, as the types of compilations that don't use end <br>
positions are, I think, getting rarer. I took a peek at the draft PR, <br>
and overall it seems reasonable to me. (I'd need a more detailed pass to <br>
fully review, though.)<br>
<br>
<br>
Thanks,<br>
<br>
Jan<br>
<br>
<br>
On 11/27/25 14:45, Maurizio Cimadamore wrote:<br>
> Hi Liam, Archie<br>
> I believe this came up also in the recent discussions on lint <br>
> warnings, where we needed to expand the set of end positions retained <br>
> by default.<br>
><br>
> I think the long term solution here is, as you say (and I think Jan <br>
> supports that too) that end positions should be just stored by default <br>
> in the trees.<br>
><br>
> There's a lot of things javac does "its own way" for reasons sometimes <br>
> good, sometimes less good. For instance, javac had to use its own <br>
> `List` because when it was written generics were not yet available. <br>
> That said, one issue with using just a plain j.u.List (like ArrayList) <br>
> in javac is that (a) javac typically operates on very small lists, <br>
> where the overhead of array lists might be too big and (b) j.u.List is <br>
> very bad for recursive algorithms, which is what javac is all about. <br>
> So, I believe a custom List impl is a good trade-off there.<br>
><br>
> Other areas that are brought up from time to time are:<br>
><br>
> * use of special data structures for scopes -- why not just maps?<br>
> * use of special data structures for names -- why not just strings?<br>
><br>
> I believe we did some experiments on the former, and concluded that <br>
> javac's implementation was still better than a hashmap (as javac's <br>
> requirements are specialized, and scopes need to be traversed in <br>
> different ways, and sometimes a new scope needs to be "pushed" on top <br>
> of an old one -- reusing the underlying entries). For names I'm less <br>
> sure, but maybe somebody else knows the answer there.<br>
><br>
> Then there's the general lack of data-orientedness of the javac <br>
> design. Lots of classes with lots of visitors everywhere, and various <br>
> ways to query "are you a T". I would like very much, one day, to make <br>
> the Type/Symbol/Tree hierarchies sealed, and get rid of all the <br>
> various kinds/tags, etc. and maybe even see if we can get rid of <br>
> visitors and just use plain code with pattern matching.<br>
><br>
> Maurizio<br>
><br>
><br>
> On 26/11/2025 13:01, Liam Miller-Cushon wrote:<br>
>> Hi,<br>
>><br>
>> I wanted to discuss how javac handles end positions, and get input on <br>
>> the possibility of having the compiler unconditionally store end <br>
>> positions in a field on JCTree instead of in a separate map.<br>
>><br>
>> Currently end position information is not stored by default, but is <br>
>> enabled in certain modes: if -Xjcov is set, or if the compilation <br>
>> includes diagnostic listeners, task listeners, or annotation <br>
>> processors (since they may want end positions).<br>
>><br>
>> The hash table used to store the end positions was optimized in JDK 9 <br>
>> in JDK-8033287, and there was some related discussion on <br>
>> compiler-dev@ about the motivation for making end positions optional <br>
>> at that time.<br>
>><br>
>> As I understand it, the goal is to save memory in the case where end <br>
>> positions aren't needed. That savings comes with a trade-off when end <br>
>> positions are needed, though, since the map is less efficient than <br>
>> storing the position directly in JCTree.<br>
>><br>
>> Today, many invocations of javac will need end position information <br>
>> (annotation processing is common, and when javac is used programmatically <br>
>> in IDEs, end positions will be enabled). For the invocations that do <br>
>> not need end positions, typical developer machines are less memory <br>
>> constrained than they were when the optimization for end positions <br>
>> was first introduced.<br>
>><br>
>> Looking at the compilation of java.base, it contains about 3000 files <br>
>> and creates about 3 million AST nodes, so adding an int field to <br>
>> JCTree to store end positions would take about 12MB.<br>
>><br>
>> What do you think? Would it make sense to consider adding a field to <br>
>> JCTree to store end positions, instead of using EndPosTable?<br>
>><br>
>> I have a draft PR of the approach here: <br>
>> <a href="https://github.com/openjdk/jdk/pull/28506" rel="noreferrer" target="_blank">https://github.com/openjdk/jdk/pull/28506</a><br>
>><br>
>> Having end position information always available might enable some <br>
>> potential improvements to javac. For example, some compilers indicate <br>
>> a span of source text for some diagnostics, such as the 'range <br>
>> highlighting' described in these clang docs: <br>
>> <a href="https://clang.llvm.org/diagnostics.html" rel="noreferrer" target="_blank">https://clang.llvm.org/diagnostics.html</a><br>
</blockquote></div>
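<div dir="ltr">To make the trade-off in this thread concrete, here is a minimal sketch of the two layouts: an end position stored in a field on every node, versus a side table that only exists when end positions are wanted. Everything in it is hypothetical and for illustration only. The class names, the integer node ids, and the array-backed table are assumptions, not javac's actual JCTree or EndPosTable. The last method reproduces the rough 12MB estimate for java.base, ignoring object alignment and padding.</div>
<pre>
```java
import java.util.Arrays;

/** Hypothetical sketch; none of these types are javac's real classes. */
public class EndPosSketch {

    /** Layout A: the end position lives in a field on every node. */
    static final class NodeWithField {
        final int startPos;
        int endPos = -1; // -1 stands for "no position" in this sketch
        NodeWithField(int startPos) { this.startPos = startPos; }
    }

    /** Layout B: the node carries only an id (an assumption of this sketch). */
    static final class NodeWithId {
        final int id;
        final int startPos;
        NodeWithId(int id, int startPos) { this.id = id; this.startPos = startPos; }
    }

    /** Side table keyed by node id, created only when end positions are wanted. */
    static final class EndPosSideTable {
        private int[] endPosById = new int[0];

        void store(NodeWithId node, int endPos) {
            if (node.id >= endPosById.length) {
                // Grow the table and mark the new slots as "no position".
                int oldLength = endPosById.length;
                endPosById = Arrays.copyOf(endPosById, Math.max(node.id + 1, oldLength * 2));
                Arrays.fill(endPosById, oldLength, endPosById.length, -1);
            }
            endPosById[node.id] = endPos;
        }

        int endPos(NodeWithId node) {
            return node.id < endPosById.length ? endPosById[node.id] : -1;
        }
    }

    /** Extra bytes for one int field per node, ignoring alignment and padding. */
    static long extraBytes(long nodeCount) {
        return nodeCount * Integer.BYTES;
    }

    public static void main(String[] args) {
        NodeWithField a = new NodeWithField(10);
        a.endPos = 25; // direct field write, no lookup on read

        EndPosSideTable table = new EndPosSideTable();
        NodeWithId b = new NodeWithId(7, 10);
        table.store(b, 25); // every read goes through the table

        System.out.println(a.endPos == table.endPos(b));
        System.out.println(extraBytes(3_000_000L)); // about 3 million nodes in java.base
    }
}
```
</pre>
<div dir="ltr">With the field layout every node pays for the int whether or not anything reads it. With the side table, compilations that never ask for end positions allocate nothing, at the cost of an indirection on every lookup when they do.</div>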