<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<p>Hi Ethan,<br>
thanks for taking the time to do this exploration of javac (but
not only javac!) diagnostics.</p>
<p>As you have noted, javac does have its own idiomatic way to add
"hints" to an error message - long ago we settled on the rule that
the very first line of the error message should "stand out" and be
self-contained. After all, that is the line that is most likely to
be searched online (as you note, this rule is not followed
everywhere, and that's a bug!).<br>
</p>
<p>After the first line, javac will typically (depending on
formatter options) display the line at which the error was
reported. After that, we have some indented space where we can
provide more information. Admittedly, this is not structural (as in
the approach you describe), but we have been using this framework
to generate diagnostics that are, in spirit, pretty close to the
ones you are aiming for.</p>
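<p>To make that layout concrete, here is a minimal sketch that uses
the standard javax.tools API to compile a small broken class in
memory and split each message into its first line and the details
that follow. The names DiagnosticLayoutDemo, StringSource and Demo
are made up for illustration, and the assumption that getMessage
returns the summary line followed by the detail lines reflects how
javac tends to behave, not something the API promises.</p>
<pre>
```java
import java.net.URI;
import java.util.List;
import java.util.Locale;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class DiagnosticLayoutDemo {

    /** In-memory source file, so the demo needs no files on disk (hypothetical helper). */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Compiles the given source and returns every diagnostic the compiler reported. */
    static List<Diagnostic<? extends JavaFileObject>> compile(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            // Happens when running on a runtime image without the compiler module.
            throw new IllegalStateException("no system Java compiler available");
        }
        DiagnosticCollector<JavaFileObject> collector = new DiagnosticCollector<>();
        compiler.getTask(null, null, collector, null, null,
                List.of(new StringSource(className, code))).call();
        return collector.getDiagnostics();
    }

    /** First line of the message: the part meant to stand out and be searchable. */
    static String summaryLine(Diagnostic<?> d) {
        return d.getMessage(Locale.ENGLISH).split("\\R", 2)[0];
    }

    /** Everything after the first line (assumed to be the extra detail), or "" if none. */
    static String details(Diagnostic<?> d) {
        String[] parts = d.getMessage(Locale.ENGLISH).split("\\R", 2);
        return parts.length > 1 ? parts[1] : "";
    }

    public static void main(String[] args) {
        // 'y' is deliberately undeclared so that the compiler reports an error.
        String code = "class Demo {\n"
                    + "    int twice(int x) { return y * 2; }\n"
                    + "}\n";
        for (Diagnostic<? extends JavaFileObject> d : compile("Demo", code)) {
            System.out.println("line    : " + d.getLineNumber());
            System.out.println("summary : " + summaryLine(d));
            System.out.println("details :\n" + details(d));
        }
    }
}
```
</pre>
<p>Running main prints the reported line number, the first line of
the message, and whatever extra detail the compiler attached to
it.</p>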
<p>For instance, around 2008, when we implemented the new
formatter-based architecture for diagnostics, we had the following
goals:</p>
<p>* explain what occurrences of types were about (e.g. T is a
type-variable with bounds X, Y and Z - or I is an intersection
type with components A, B and C) - this is done in the
RichDiagnosticFormatter<br>
* capture near-misses - this was particularly an issue with
overload resolution, where javac just said "cannot apply method",
but didn't say "why", or which alternative overloads were
available<br>
* drop package qualifier in unambiguous contexts - this is also
done in the RichDiagnosticFormatter</p>
<p>When working on Project Lambda, it became clear that the "more
helpful" diagnostics around overload resolution were backfiring.
It was not unusual to get a pretty long error message, with nested
diagnostics, and little clue as to what went wrong. So, as part of
the Java 8 effort, we tried to apply some heuristics to
_rewrite_ some of the more complex diagnostics into simpler ones
(e.g. by speculating on which overload candidate the user was
most likely to care about). This helped, but I think it
is telling that, when used in anger, some of the "improvements"
listed above (such as listing all overload candidates) turned out
to be liabilities.</p>
<p>All this to say that the problem of generating good diagnostics
is rather tricky - and, most of all, a subjective one. Some
developers prefer terse diagnostics (this is especially the case
when working on a big build system). Other developers would prefer
more help. But in a real language like Java, there is a delicate
balance between help and "too much information". For instance,
some of the stuff we did to spell out "this is a type variable",
or "this is a union type", is good on one hand, as developers can
search for the terms and learn what they are about - but on the other
hand, it exposes developers to the type-system rabbit hole, which
can be pretty deep. I'm sure that, with more complex error
messages, developers will simply try to "make the compiler happy"
rather than read all the messages and try to understand
why a type with 2 nested wildcard type arguments cannot be
assigned to this other generic type.</p>
<p>Now, I'm sure that Rust gets its share of "complex error
messages" when you start tinkering with lifetime variables and all
that. But one might argue that stuff like that, in Rust, is for
"advanced developers". For better or worse, that's not the case
with Java generics (or other features leading to complex messages
such as platform modules). That is, even a relatively casual user
can trigger pretty complex error messages when using simple
methods in the standard library:</p>
<p>```<br>
jshell> (List<Object>)List.of(1, "");<br>
| Error:<br>
| incompatible types:
java.util.List<java.lang.Object&java.io.Serializable&java.lang.Comparable<?
extends
java.lang.Object&java.io.Serializable&java.lang.Comparable<?>>>
cannot be converted to java.util.List<java.lang.Object><br>
| (List<Object>)List.of(1, "");<br>
| ^------------^<br>
```<br>
</p>
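<p>For what it's worth, the long intersection type in that message
shows up because the cast gives inference no target type, so the
element type is inferred from the arguments alone. A minimal sketch
of two ways to get a List of Object from the same call (the class
and method names here are made up for illustration):</p>
<pre>
```java
import java.util.List;

public class TargetTypingDemo {

    static List<Object> viaAssignment() {
        // The declared type acts as the target type, so E is inferred as Object.
        List<Object> values = List.of(1, "");
        return values;
    }

    static List<Object> viaTypeWitness() {
        // An explicit type argument bypasses inference altogether.
        return List.<Object>of(1, "");
    }

    public static void main(String[] args) {
        System.out.println(viaAssignment());
        System.out.println(viaTypeWitness());
    }
}
```
</pre>
<p>Both variants compile without a cast, which is probably what a
casual user wanted in the first place, but nothing in the error
message points in that direction.</p>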
<p>The distribution of errors you report is also interesting, and
might suggest that perhaps (as is often the case in this
industry) focussing on specific fixes to specific error messages
might provide a lot of relief for casual developers with relatively
minor effort. For instance, we know well that all parser errors
are quite bad (recovering from parse errors is a science in
itself), and they are likely the ones developers will see first when
interacting with the compiler. Then there is "cannot find symbol",
which I think is bad because it talks about a "symbol", which is not
a term that the Java developer understands (Java has fields, local
variables, methods, classes, enums - but not "symbols"). I think it
would improve things if the error message was more targeted (e.g. "no
field foo in class Bar"). As to whether we should list near misses
- given past experiences, I guess I'm a little biased against -
but that's subjective. Anyway, even if we wanted to do all that,
no big infrastructure change is needed - using the diagnostic
formatters, together with multi-line diagnostics (useful to show
a table of nested diagnostics, we use these for overload
diagnostics) should be good enough to achieve decent results.</p>
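<p>As a rough illustration of the "more targeted" wording, here is a
small sketch that picks the first line of the message from the kind
of thing that could not be found. This is not javac code: the
MissingKind enum, the describe method and the exact phrasings are
all hypothetical.</p>
<pre>
```java
public class TargetedMessageSketch {

    /** Hypothetical classification of what the compiler failed to resolve. */
    enum MissingKind { FIELD, METHOD, CLASS, LOCAL_VARIABLE }

    /**
     * Builds a first line that names the Java construct involved
     * instead of the generic word "symbol".
     */
    static String describe(MissingKind kind, String name, String ownerClass) {
        switch (kind) {
            case FIELD:
                return "no field " + name + " in class " + ownerClass;
            case METHOD:
                return "no method " + name + " in class " + ownerClass;
            case CLASS:
                return "no class named " + name;
            case LOCAL_VARIABLE:
                return "no variable named " + name + " in scope";
            default:
                throw new IllegalArgumentException("unknown kind: " + kind);
        }
    }

    public static void main(String[] args) {
        // prints "no field foo in class Bar"
        System.out.println(describe(MissingKind.FIELD, "foo", "Bar"));
    }
}
```
</pre>
<p>The point of the sketch is only that the first line can stay short
and searchable while using vocabulary the developer already
knows.</p>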
<p>Cheers<br>
Maurizio<br>
</p>
<div class="moz-cite-prefix">On 15/08/2023 00:48, Ethan McCue wrote:<br>
</div>
<blockquote type="cite" cite="mid:CA+NR86h6_wcBV4BxvPx6xQBWwbN1v5Gi_gbhykRFwO-U_KcPsg@mail.gmail.com">
<div dir="ltr">
<div>Also coming out of some of the other discussions[1][2] I
think there is one table from a paper[3] that feels relevant
to the discussion.<br>
<font face="monospace"><br>
| Compiler Error Message | n | % |<br>
|---------------------------------------|-------|-------|<br>
| cannot find symbol | 4,614 | 16.0% |<br>
| ‘)’ expected | 3,317 | 11.5% |<br>
| ‘;’ expected | 3,076 | 10.7% |<br>
| not a statement | 2,142 | 7.4% |<br>
| illegal start of expression | 1,825 | 6.3% |<br>
| reached end of file while parsing | 1,406 | 4.9% |<br>
| illegal start of type | 1,316 | 4.6% |<br>
| ‘else’ without ‘if’ | 1,141 | 4.0% |<br>
| bad operand types for binary operator | 1,138 | 3.9% |<br>
| <identifier> expected | 1,091 | 3.8% |<br>
</font><br>
This shows the frequency at which their sample of students ran
into different errors.[4]<br>
<br>
<br>
<br>
[1]: <a href="https://urldefense.com/v3/__https://www.reddit.com/r/programming/comments/15qzny5/better_java_compiler_error_messages/__;!!ACWV5N9M2RV99hQ!OJ4RfE35tRDXZM0iDFEEndNd4Gz5aeGGI09BczVXBUmshIXcZPhj3s5z6c_maHzCNjOoFXUrWUJe4--MJ6p8Bvg$" moz-do-not-send="true">https://www.reddit.com/r/programming/comments/15qzny5/better_java_compiler_error_messages/</a><br>
[2]: <a href="https://urldefense.com/v3/__https://www.reddit.com/r/java/comments/15qzkh9/better_java_compiler_error_messages/__;!!ACWV5N9M2RV99hQ!OJ4RfE35tRDXZM0iDFEEndNd4Gz5aeGGI09BczVXBUmshIXcZPhj3s5z6c_maHzCNjOoFXUrWUJe4--MOliwT-U$" moz-do-not-send="true">https://www.reddit.com/r/java/comments/15qzkh9/better_java_compiler_error_messages/</a><br>
[3]: <a href="https://urldefense.com/v3/__https://drive.google.com/file/d/1GUj-KQMzWhuWTk7ksAIgABKiGkHseHQL/view?usp=sharing__;!!ACWV5N9M2RV99hQ!OJ4RfE35tRDXZM0iDFEEndNd4Gz5aeGGI09BczVXBUmshIXcZPhj3s5z6c_maHzCNjOoFXUrWUJe4--MPxtS2C0$" moz-do-not-send="true">https://drive.google.com/file/d/1GUj-KQMzWhuWTk7ksAIgABKiGkHseHQL/view?usp=sharing</a><br>
[4]: I don't know whether this generalizes to the language as
it will exist in the future or whether that particular paper
is reliable, but it's a start.</div>
</div>
</blockquote>
</body>
</html>