Re: Error examples page
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Thu Aug 17 14:11:00 UTC 2023
Hi Ethan,
thanks for taking the time to do this exploration on javac (but not
only!) diagnostics.
As you have noted, javac does have its own idiomatic way to add "hints"
to an error message - long ago we settled on the rule that the very
first line of the error message should "stand out" and be
self-contained. After all, that is the line that is most likely to be
searched online. (As you noticed, this rule is not followed everywhere,
and that's a bug!)
After the first line, javac will typically (depending on formatter
options) display the line at which the error was reported. After that,
we have some indented space where we can provide more information.
Surely, this is not structural (as in the approach you describe), but we
have been using this framework to generate diagnostics that are, in
spirit, pretty close to the ones you are aiming for.
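The split between the first line and the detail lines can be observed through the standard javax.tools API. What follows is only a minimal sketch, not part of javac itself: the class names DiagnosticLayoutDemo and StringSource and the sample source are made up for illustration, and it assumes it runs on a JDK (not a bare JRE) so that the system compiler is available.

```java
import java.net.URI;
import java.util.List;
import java.util.Locale;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

class DiagnosticLayoutDemo {
    // In-memory source file; name and contents are hypothetical.
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    // Compiles the given source in memory and returns whatever diagnostics were reported.
    static List<Diagnostic<? extends JavaFileObject>> diagnose(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("no system Java compiler available (JDK required)");
        }
        DiagnosticCollector<JavaFileObject> collector = new DiagnosticCollector<>();
        compiler.getTask(null, null, collector, null, null,
                List.of(new StringSource(className, code))).call();
        return collector.getDiagnostics();
    }

    public static void main(String[] args) {
        // "foo" is deliberately undefined so that the compiler reports an error.
        String code = "class Bar {\n    int test() {\n        return foo;\n    }\n}\n";
        for (Diagnostic<? extends JavaFileObject> d : diagnose("Bar", code)) {
            String[] lines = d.getMessage(Locale.ENGLISH).split("\\R");
            // The first line is the self-contained summary.
            System.out.println("line " + d.getLineNumber() + ", first line: " + lines[0]);
            // Any remaining lines carry the additional, indented information.
            for (int i = 1; i < lines.length; i++) {
                System.out.println("  detail: " + lines[i].trim());
            }
        }
    }
}
```

Running main prints the summary line of each diagnostic separately from its detail lines, which makes it easy to check whether a given message follows the "self-contained first line" rule.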
For instance, around 2008, when we implemented the new formatter-based
architecture for diagnostics, we had the following goals:
* explain what occurrences of types were about (e.g. T is a
type-variable with bounds X, Y and Z - or I is an intersection type with
components A, B and C) - this is done in the RichDiagnosticFormatter
* capture near-misses - this was particularly an issue with overload
resolution, where javac just said "cannot apply method", but didn't say
"why", or which alternative overloads were available
* drop package qualifier in unambiguous contexts - this is also done in
the RichDiagnosticFormatter
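To make the last goal concrete, here is a minimal sketch of the idea of dropping package qualifiers only where that is unambiguous. It is not the RichDiagnosticFormatter code; the class QualifierDemo and its methods are hypothetical, and it assumes the names mentioned in one message are available as plain strings.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class QualifierDemo {
    // Returns the part after the last dot, e.g. "java.util.List" -> "List".
    static String simpleName(String qualified) {
        return qualified.substring(qualified.lastIndexOf('.') + 1);
    }

    // Maps each qualified name to the form to display: the simple name when no other
    // name in the same message shares it, the fully qualified name otherwise.
    static Map<String, String> displayNames(Collection<String> qualifiedNames) {
        Map<String, Set<String>> bySimpleName = new HashMap<>();
        for (String q : qualifiedNames) {
            bySimpleName.computeIfAbsent(simpleName(q), k -> new TreeSet<>()).add(q);
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (String q : qualifiedNames) {
            boolean unambiguous = bySimpleName.get(simpleName(q)).size() == 1;
            result.put(q, unambiguous ? simpleName(q) : q);
        }
        return result;
    }

    public static void main(String[] args) {
        // The two List classes clash, so they stay qualified; Object is shortened.
        System.out.println(displayNames(
                List.of("java.util.List", "java.awt.List", "java.lang.Object")));
    }
}
```

The design choice here is that ambiguity is decided per message, so the same type may be printed in short form in one diagnostic and in qualified form in another.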
When working on Project Lambda, it became clear that the "more helpful"
diagnostics around overload resolution were backfiring. It was not
unusual to get a pretty long error message, with nested diagnostics, and
little clue as to what went wrong. So, as part of the Java 8 effort, we
tried to apply some heuristics to _rewrite_ some of the more
complex diagnostics into simpler ones (e.g. by speculating on which
overload candidate the user was likely to want to hear more about).
This helped, but I think it is telling that, when used in anger, some
of the "improvements" listed above (such as listing all overload
candidates) turned out to be liabilities.
All this to say that the problem of generating good diagnostics is
rather tricky - and, most of all, a subjective one. Some developers
prefer terse diagnostics (this is especially the case when working on a
big build system). Other developers would prefer more help. But in a
real language like Java, there is a delicate balance between help and
"too much information". For instance, some of the stuff we did to spell
out "this is a type variable", or "this is a union type", is good on one
hand, as developers can search for the terms and learn what they are
about - but on the other hand, it exposes developers to the type-system
rabbit hole, which can be pretty deep. I'm sure that, with more complex
error messages, developers will simply try to "make the compiler happy"
rather than reading all the messages and trying to understand why a type
with 2 nested wildcard type arguments cannot be assigned to this other
generic type.
Now, I'm sure that Rust gets its share of "complex error messages" when
you start tinkering with lifetime variables and all that. But one might
argue that stuff like that, in Rust, is for "advanced developers". For
better or worse, that's not the case with Java generics (or other
features leading to complex messages such as platform modules). That is,
even a relatively casual user can trigger pretty complex error messages
when using simple methods in the standard library:
```
jshell> (List<Object>)List.of(1, "");
| Error:
| incompatible types: java.util.List<java.lang.Object&java.io.Serializable&java.lang.Comparable<? extends java.lang.Object&java.io.Serializable&java.lang.Comparable<?>>> cannot be converted to java.util.List<java.lang.Object>
| (List<Object>)List.of(1, "");
|               ^------------^
```
The distribution of errors you report is also interesting, and might
suggest that perhaps (as it's often the case in this industry) focussing
on specific fixes to specific error messages might provide a lot of
relief for casual developers with relatively minor effort. For instance,
we know well that all parser errors are quite bad (recovering from
parsing errors is a science in itself), and are likely the ones developers
will see first when interacting with the compiler. Then, I think "cannot
find symbol" is bad because it talks about a "symbol", which is not a term
that Java developers understand (Java has fields, local variables,
methods, classes, enums - but not "symbols"). I think it would improve
things if the error message was more targeted (e.g. "no field foo in
class Bar"). As to whether we should list near misses - given past
experiences, I guess I'm a little biased against - but that's
subjective. Anyway, even if we wanted to do all that, no big
infrastructure change is needed - using the diagnostic formatter, together
with multi-line diagnostics (useful to show a table of nested diagnostics;
we use these for overload diagnostics), should be good enough to achieve
decent results.
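As a rough illustration of what a more targeted first line could look like, here is a small sketch. Everything in it is hypothetical: the class TargetedMessageDemo, the Kind enum and the wording of the message are made up for this example and are not taken from javac.

```java
import java.util.Locale;

class TargetedMessageDemo {
    // Hypothetical classification of what the unresolved name was expected to be.
    enum Kind { FIELD, METHOD, CLASS }

    // Builds a self-contained first line that names the Java concept involved
    // instead of the generic term "symbol".
    static String targetedMessage(Kind kind, String name, String enclosingClass) {
        String what = kind.name().toLowerCase(Locale.ROOT);
        return "no " + what + " " + name + " in class " + enclosingClass;
    }

    public static void main(String[] args) {
        System.out.println(targetedMessage(Kind.FIELD, "foo", "Bar"));
    }
}
```

The only point of the sketch is that the kind of the missing member ends up in the first line, so the searchable part of the message uses terms that a Java developer already knows.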
Cheers
Maurizio
On 15/08/2023 00:48, Ethan McCue wrote:
> Also coming out of some of the other discussions[1][2] I think there
> is one table from a paper[3] that feels relevant to the discussion.
>
> | Compiler Error Message | n | % |
> |---------------------------------------|-------|-------|
> | cannot find symbol | 4,614 | 16.0% |
> | ‘)’ expected | 3,317 | 11.5% |
> | ‘;’ expected | 3,076 | 10.7% |
> | not a statement | 2,142 | 7.4% |
> | illegal start of expression | 1,825 | 6.3% |
> | reached end of file while parsing | 1,406 | 4.9% |
> | illegal start of type | 1,316 | 4.6% |
> | ‘else’ without ‘if’ | 1,141 | 4.0% |
> | bad operand types for binary operator | 1,138 | 3.9% |
> | <identifier> expected | 1,091 | 3.8% |
>
> This shows the frequency at which their sample of students ran into
> different errors.[4]
>
>
>
> [1]:
> https://www.reddit.com/r/programming/comments/15qzny5/better_java_compiler_error_messages/
> [2]:
> https://www.reddit.com/r/java/comments/15qzkh9/better_java_compiler_error_messages/
> [3]:
> https://drive.google.com/file/d/1GUj-KQMzWhuWTk7ksAIgABKiGkHseHQL/view?usp=sharing
> [4]: I don't know whether this generalizes to the language as it will
> exist in the future or whether that particular paper is reliable, but
> it's a start.
>