Re: Error examples page

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Thu Aug 17 14:11:00 UTC 2023


Hi Ethan,
thanks for taking the time to do this exploration on javac (but not 
only!) diagnostics.

As you have noted, javac does have its own idiomatic way to add "hints" 
to an error message - long ago we settled on the rule that the very 
first line of the error message should "stand out" and be 
self-contained. After all, that is the line that is most likely to be 
searched online. (as you notice, this rule is not followed everywhere, 
and that's a bug!)

After the first line, javac will typically (depending on formatter 
options) display the line at which the error was reported. After that, 
we have some indented space where we can provide more information. 
Surely, this is not structural (as in the approach you describe), but we 
have been using this framework to generate diagnostics that are, in 
spirit, pretty close to the ones you are aiming for.
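
To make that layout concrete, here is a minimal sketch that uses the standard javax.tools API to compile a small in-memory source and print the pieces separately. The class names DiagnosticShape, StringSource and Bar, and the sample source, are made up for illustration; the sketch assumes it runs on a full JDK, where ToolProvider.getSystemJavaCompiler() is non-null.

```java
import java.net.URI;
import java.util.List;
import java.util.Locale;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

class DiagnosticShape {
    /** In-memory source file, so the sketch needs no files on disk. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Compiles one source string and returns whatever diagnostics javac reported. */
    static List<Diagnostic<? extends JavaFileObject>> compile(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("no system Java compiler (running on a JRE?)");
        }
        DiagnosticCollector<JavaFileObject> collector = new DiagnosticCollector<>();
        // -proc:none switches annotation processing off to keep the output focused.
        compiler.getTask(null, null, collector, List.of("-proc:none"), null,
                List.of(new StringSource(className, code))).call();
        return collector.getDiagnostics();
    }

    /** The first line of a message: the part that is meant to stand alone. */
    static String firstLine(Diagnostic<?> d) {
        return d.getMessage(Locale.ENGLISH).lines().findFirst().orElse("");
    }

    public static void main(String[] args) {
        // Hypothetical input: class Bar refers to an undeclared name foo.
        String code = "class Bar {\n    int f() { return foo; }\n}\n";
        for (Diagnostic<? extends JavaFileObject> d : compile("Bar", code)) {
            System.out.println("kind:       " + d.getKind());
            System.out.println("line:       " + d.getLineNumber());
            System.out.println("first line: " + firstLine(d));
            System.out.println("full text:");
            System.out.println(d.getMessage(Locale.ENGLISH));
        }
    }
}
```

With an English locale, the first line here should be the familiar "cannot find symbol", and the remaining text carries the indented symbol/location details.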

For instance, around 2008, when we implemented the new formatter-based 
architecture for diagnostics, we had the following goals:

* explain what occurrences of types were about (e.g. T is a 
type-variable with bounds X, Y and Z - or I is an intersection type with 
components A, B and C) - this is done in the RichDiagnosticFormatter
* capture near-misses - this was particularly an issue with overload 
resolution, where javac just said "cannot apply method", but didn't say 
"why", or which alternative overloads were available
* drop package qualifier in unambiguous contexts - this is also done in 
the RichDiagnosticFormatter

When working on Project Lambda, it became clear that the "more helpful" 
diagnostics around overload resolution were backfiring. It was not 
unusual to get a pretty long error message, with nested diagnostics, and 
little clue as to what went wrong. So, as part of the Java 8 effort, we 
tried to apply some heuristics to _rewrite_ some of the more 
complex diagnostics into simpler ones (e.g. by speculating on which 
overload candidate the user was most likely to want to hear about). 
This helped, but I think it is telling that, when used in anger, some 
of the "improvements" listed above (such as listing all overload 
candidates) turned out to be liabilities.

All this to say that the problem of generating good diagnostics is 
rather tricky - and, most of all, a subjective one. Some developers 
prefer terse diagnostics (this is especially the case when working on a 
big build system). Other developers would prefer more help. But in a 
real language like Java, there is a delicate balance between help and 
"too much information". For instance, some of the stuff we did to spell 
out "this is a type variable", or "this is a union type", is good on one 
hand, as developers can search for the terms and learn what they are about - 
but on the other hand, it exposes developers to the type-system rabbit 
hole, which can be pretty deep. I'm sure that, with more complex error 
messages, developers will simply try to "make the compiler happy" rather 
than read all the messages and try to understand why a type 
with 2 nested wildcard type arguments cannot be assigned to this other 
generic type.

Now, I'm sure that Rust gets its share of "complex error messages" when 
you start tinkering with lifetime variables and all that. But one might 
argue that stuff like that, in Rust, is for "advanced developers". For 
better or worse, that's not the case with Java generics (or other 
features leading to complex messages such as platform modules). That is, 
even a relatively casual user can trigger pretty complex error messages 
when using simple methods in the standard library:

```
jshell> (List<Object>)List.of(1, "");
|  Error:
|  incompatible types: java.util.List<java.lang.Object&java.io.Serializable&java.lang.Comparable<? extends java.lang.Object&java.io.Serializable&java.lang.Comparable<?>>> cannot be converted to java.util.List<java.lang.Object>
|  (List<Object>)List.of(1, "");
|                ^------------^
```
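
For readers wondering where the intersection type comes from: in a cast context the call List.of(1, "") is typed on its own, so the element type is inferred from the arguments alone (the common supertypes of Integer and String). In an assignment context, or with an explicit type argument, the element type is simply Object. A small sketch of the two forms that do compile (the class and method names are made up for illustration):

```java
import java.util.List;

class ListOfInference {
    /** Assignment context: the target type List<Object> takes part in inference. */
    static List<Object> viaTargetType() {
        List<Object> xs = List.of(1, "");
        return xs;
    }

    /** Explicit type argument: nothing is left to infer. */
    static List<Object> viaTypeWitness() {
        return List.<Object>of(1, "");
    }

    public static void main(String[] args) {
        // The cast form from the jshell session above is rejected by javac:
        // List<Object> bad = (List<Object>) List.of(1, "");
        System.out.println(viaTargetType());
        System.out.println(viaTypeWitness());
    }
}
```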

The distribution of errors you report is also interesting, and might 
suggest that (as is often the case in this industry) focussing 
on specific fixes to specific error messages might provide a lot of 
relief for casual developers with relatively minor effort. For instance, 
we know well that all parser errors are quite bad (recovering from 
parsing errors is a science in itself), and they are likely the ones 
developers will see first when interacting with the compiler. Then there 
is "cannot find symbol", which I think is bad because it talks about a 
"symbol", which is not a term that Java developers understand (Java has 
fields, local variables, methods, classes, enums - but not "symbols"). 
I think it would improve things if the error message was more targeted 
(e.g. "no field foo in class Bar"). As to whether we should list near 
misses - given past 
experiences, I guess I'm a little biased against - but that's 
subjective. Anyway, even if we wanted to do all that, no big 
infrastructure change is needed - using a diagnostic formatter, together 
with multi-line diagnostics (useful for showing a table of nested 
diagnostics; we use these for overload diagnostics), should be good 
enough to achieve decent results.
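
As a rough illustration of the "more targeted" first line, here is a toy sketch. It is not javac code: the Missing enum, the method names and the exact wording are all hypothetical, and a real change would live in javac's message resources and formatter.

```java
import java.util.List;

class TargetedMessage {
    /** Hypothetical classification of what was being looked up; not a javac type. */
    enum Missing {
        FIELD("field"), METHOD("method"), CLASS("class"), VARIABLE("variable");

        final String label;

        Missing(String label) {
            this.label = label;
        }
    }

    /** Self-contained first line that names the kind of thing that was not found. */
    static String firstLine(Missing kind, String name, String enclosingClass) {
        return "no " + kind.label + " " + name + " in class " + enclosingClass;
    }

    /** First line followed by indented detail lines, in the multi-line style. */
    static String render(String firstLine, List<String> details) {
        StringBuilder sb = new StringBuilder(firstLine);
        for (String detail : details) {
            sb.append("\n").append("  ").append(detail);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String first = firstLine(Missing.FIELD, "foo", "Bar");
        // The detail lines here are made-up examples of what could follow.
        System.out.println(render(first, List.of("symbol:   variable foo", "location: class Bar")));
    }
}
```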

Cheers
Maurizio



On 15/08/2023 00:48, Ethan McCue wrote:
> Also coming out of some of the other discussions[1][2] I think there 
> is one table from a paper[3] that feels relevant to the discussion.
>
> | Compiler Error Message                | n     | %     |
> |---------------------------------------|-------|-------|
> | cannot find symbol                    | 4,614 | 16.0% |
> | ‘)’ expected                          | 3,317 | 11.5% |
> | ‘;’ expected                          | 3,076 | 10.7% |
> | not a statement                       | 2,142 | 7.4%  |
> | illegal start of expression           | 1,825 | 6.3%  |
> | reached end of file while parsing     | 1,406 | 4.9%  |
> | illegal start of type                 | 1,316 | 4.6%  |
> | ‘else’ without ‘if’                   | 1,141 | 4.0%  |
> | bad operand types for binary operator | 1,138 | 3.9%  |
> | <identifier> expected                 | 1,091 | 3.8%  |
>
> This shows the frequency at which their sample of students ran into 
> different errors.[4]
>
>
>
> [1]: 
> https://www.reddit.com/r/programming/comments/15qzny5/better_java_compiler_error_messages/ 
> [2]: 
> https://www.reddit.com/r/java/comments/15qzkh9/better_java_compiler_error_messages/ 
> [3]: 
> https://drive.google.com/file/d/1GUj-KQMzWhuWTk7ksAIgABKiGkHseHQL/view?usp=sharing 
> [4]: I don't know whether this generalizes to the language as it will 
> exist in the future or whether that particular paper is reliable, but 
> it's a start.
>