RFR: 8346785: Potential infinite loop in JavadocTokenizer.ensure
Hannes Wallnöfer
hannesw at openjdk.org
Wed Apr 23 12:21:54 UTC 2025
On Tue, 15 Apr 2025 14:12:49 GMT, Vicente Romero <vromero at openjdk.org> wrote:
>> Please review this patch to fix a potential infinite loop in `JavadocTokenizer.ensure` when `map.length` and `size + need` approach `Integer.MAX_VALUE`.
>>
>> While I couldn't reproduce the issue even with large inputs (a ~1.9 GB Java file where almost the entire file is one javadoc comment), the fix is about correctness and preventing undefined behavior in extreme cases.
>>
>> TIA
>
> src/jdk.compiler/share/classes/com/sun/tools/javac/parser/JavadocTokenizer.java line 309:
>
>> 307:
>> 308: while (need > grow && grow <= Integer.MAX_VALUE/2) {
>> 309: grow <<= 1;
>
> I wonder why `map` is not an `ArrayList`, which already has a growth strategy. If this is because of performance concerns, I think that stopping the growth as soon as `grow > Integer.MAX_VALUE / 2` is a bit premature, as there is still plenty of room for a less aggressive growth strategy.
`OffsetMap` instances holding an `int[]` array are created and held in memory for each processed doc comment, so I think the optimized implementation is justified by the boxing and memory overhead an `ArrayList` would add. However, the individual arrays are rather small, usually 2 elements per line and/or Unicode escape. This amounts to a few dozen elements for a typical doc comment, with the longest doc comments in the JDK sources reaching the low thousands. So growth by doubling should be fine for all practical uses of this class, and supporting huge sizes (or overflowing the capacity) is a rather theoretical problem.
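For completeness, here is a small, stand-alone demonstration of that theoretical overflow (the values are contrived and this is not the actual `OffsetMap` code):

```java
// Hypothetical, stand-alone illustration of the hazard being fixed; this is
// not the actual JavadocTokenizer.OffsetMap code.
public class GrowOverflowDemo {
    public static void main(String[] args) {
        int need = Integer.MAX_VALUE - 10; // requested size near the int limit
        int grow = 1 << 30;                // current capacity, already large

        // Without an overflow guard the doubling wraps around: 2^30 << 1 is
        // Integer.MIN_VALUE, MIN_VALUE << 1 is 0, and 0 << 1 stays 0, so
        // "need > grow" never becomes false. The counter below exists only
        // to make this demo terminate.
        int iterations = 0;
        while (need > grow && iterations++ < 5) {
            grow <<= 1;
            System.out.println("grow = " + grow);
        }
    }
}
```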
Regarding the proposed fix, I think I would prefer a one-step solution over the two-step approach: let the overflow happen and check for `grow < 0` directly afterwards, here in the loop.
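A minimal sketch of what I mean, with illustrative class and field names (not the actual `OffsetMap` code):

```java
import java.util.Arrays;

// Hypothetical sketch of the single-check idea described above; names are
// illustrative and not the actual JavadocTokenizer.OffsetMap implementation.
class OffsetMapSketch {
    private int[] map = new int[16];

    void ensure(int need) {
        int grow = map.length;
        while (need > grow) {
            grow <<= 1;
            if (grow < 0) {
                // The doubling overflowed past Integer.MAX_VALUE: clamp to
                // the largest length we are willing to allocate (VMs cap
                // array lengths slightly below Integer.MAX_VALUE) and stop.
                grow = Integer.MAX_VALUE - 8;
                break;
            }
        }
        if (grow > map.length) {
            map = Arrays.copyOf(map, grow);
        }
    }
}
```

Checking the sign bit after the shift keeps the loop condition simple and still guarantees termination, since the loop either reaches a sufficient capacity or clamps and breaks on the first overflow.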
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/24620#discussion_r2055923696