New pattern matching doc
Brian Goetz
brian.goetz at oracle.com
Thu Jan 14 21:48:37 UTC 2021
I hear you on this, I have had some concerns about this too. But also,
after looking at typical client code of APIs like JSONP, getting good
error messages there, while possible, is also pretty rare. Far more
common is not checking errors at all, and just assuming that the key is
present, mapped to an integer, the result will surely parse with
Integer.parseInt, etc. In those cases you get an exception, whose stack
trace might point you to the right line number, but you're not really
getting validation there either.
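
To make the "not checking errors at all" case concrete, here is a minimal sketch. It uses a plain Map<String, Object> as a stand-in for a parsed JSON object (an assumption for illustration, not the JSONP API), and the class and method names are hypothetical.

```java
import java.util.Map;

class OptimisticExtraction {
    // Assumes "age" is present and maps to a String holding an integer.
    // None of those assumptions is checked.
    static int ageOf(Map<String, Object> person) {
        return Integer.parseInt((String) person.get("age"));
    }

    public static void main(String[] args) {
        System.out.println(ageOf(Map.of("age", "42")));
        try {
            ageOf(Map.of("name", "Bob")); // key is absent
        } catch (RuntimeException e) {
            // The exception type says what blew up, not what the document
            // was supposed to look like.
            System.out.println(e.getClass().getSimpleName());
        }
    }
}
```

With the key missing, this fails with a NumberFormatException from parseInt; with the key mapped to an Integer instead of a String, it would fail with a ClassCastException. Neither tells you which expectation about the document was violated.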
If you use an XPath-like API, you are more likely to get a sensible
error (that the path didn't lead to what you expected), because you're
basically handing a checkable schema to the library, but use of XPath in
Java is rare and XPath-like stuff for JSON in Java is even more rare.
So while I worry that using complex patterns to extract lots of goop
from a JSON document could turn into a debugging nightmare ("document
failed to match"), I worry even more that what we do today isn't even
correct (in addition to being painful), because the programming model
rarely retains the information with which to check the result, and even
when it does, it is often hard to get the checks and their order right, even with
good intentions. And if you get it right, the code is a nightmare to
read and maintain.
Stepping back, why am I trying to apply pattern matching here? Not
because it's cool (though it is). It's because code at the boundary
(which describes more code, as programs get smaller) has to deal with
untyped, semi-structured or unstructured data from the outside world
(JSON, XML, HTML, SQL result sets, etc.). That leaves Java developers
with a few choices:
- Program in a bad, dynamically-typed dialect of Java, where
everything is a String or List or Map or Object. I think we can agree
that we don't want to encourage this.
- Use a schema-driven tool for translating from the external
representation to Java types, such as JAXB, O/R mappers, etc. This
works OK when your schemas are under your control and stable, but
doesn't deal well with schema changes (let alone "no schema"), and often
has high performance costs (because of eager, expensive full
translations of the data in each direction).
- Use a parsing library, where you pick apart an input document in an
ad-hoc manner to extract the bits you want. This is pretty common, but
is unpleasant, verbose, error-prone, and hard to maintain. The number
of error cases you have to handle scales with the number of navigation
points and extractions. 80% of your code is paying tribute to the
parsing library, and usually the output is still a relatively lightly
typed bag of variables.
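
As a rough illustration of the third option, here is what a checked ad-hoc extraction can look like. Nested maps stand in for a parsed document, and the person/address/zip shape and all names are made up for the example.

```java
import java.util.Map;

class AdHocExtraction {
    // Hypothetical target: the integer at person.address.zip.
    // Every navigation point needs its own test and its own error case.
    static int zipOf(Map<String, Object> doc) {
        Object person = doc.get("person");
        if (!(person instanceof Map)) {
            throw new IllegalArgumentException("'person' is missing or not an object");
        }
        Object address = ((Map<?, ?>) person).get("address");
        if (!(address instanceof Map)) {
            throw new IllegalArgumentException("'person.address' is missing or not an object");
        }
        Object zip = ((Map<?, ?>) address).get("zip");
        if (!(zip instanceof Integer)) {
            throw new IllegalArgumentException("'person.address.zip' is missing or not an integer");
        }
        return (Integer) zip;
    }

    public static void main(String[] args) {
        Map<String, Object> doc =
                Map.of("person", Map.of("address", Map.of("zip", 12345)));
        System.out.println(zipOf(doc)); // 12345
    }
}
```

Three navigation steps cost three tests, three casts, and three error messages, and the only thing that comes out the other end is a single int.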
None of these choices are so happy. My observation here is that (a)
parsing and pattern matching have a lot in common (structural test +
conditional extraction) and (b) much of the pain associated with the
third option comes from the lack of composition. If we had a
test + conditional extraction that composed cleanly, could deal with
unstructured data on input, and produced clean, strongly typed Java
variables on output, then this would be much more pleasant.
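
To show what "test + conditional extraction that composes" might mean, here is a library-level sketch using Optional. This is not language-level pattern matching, only an analogue of the idea; the Extractor interface and the key/type helpers are hypothetical names invented for the example.

```java
import java.util.Map;
import java.util.Optional;

class ComposedExtraction {
    // Hypothetical "pattern": a test plus a conditional extraction.
    interface Extractor<T> {
        Optional<T> match(Object input);

        // Composition: if this matches, feed its result to the next extractor.
        default <R> Extractor<R> then(Extractor<R> next) {
            return input -> match(input).flatMap(next::match);
        }
    }

    // Matches a map that has a value under the given key, extracting that value.
    static Extractor<Object> key(String name) {
        return input -> {
            if (input instanceof Map) {
                Object value = ((Map<?, ?>) input).get(name);
                return Optional.ofNullable(value);
            }
            return Optional.empty();
        };
    }

    // Matches a value of the given class, extracting it with that static type.
    static <T> Extractor<T> type(Class<T> cls) {
        return input -> cls.isInstance(input)
                ? Optional.of(cls.cast(input))
                : Optional.empty();
    }

    static final Extractor<Integer> ZIP =
            key("person").then(key("address")).then(key("zip")).then(type(Integer.class));

    public static void main(String[] args) {
        Object doc = Map.of("person", Map.of("address", Map.of("zip", 12345)));
        System.out.println(ZIP.match(doc));                      // Optional[12345]
        System.out.println(ZIP.match(Map.of("person", "Bob")));  // Optional.empty
    }
}
```

The per-step tests have not gone away, but they are now stated once, in the pieces being composed, and the result is a typed Integer rather than an Object. Note that a failed match yields only an empty Optional, with no indication of which step failed.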
I think your concern amounts to "well, that might be better than what we
have now, but today's problems come from yesterday's solutions, so
tomorrow's problem will probably be one of visibility into why something
failed." Right? And, unlike exposing this as a parsing library, where
you could pass in an error context that could accumulate debug
information, there is no obvious clean way to do this in the language.
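
For comparison, here is roughly what the "pass in an error context" approach could look like in a library. This is only a sketch: ErrorContext and navigate are hypothetical names, and nested maps again stand in for a parsed document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

class ErrorContextExtraction {
    // Hypothetical error context: collects a note for each failed navigation.
    static final class ErrorContext {
        final List<String> failures = new ArrayList<>();
    }

    // Walks a path of keys through nested maps; on failure, records how far it got.
    static Optional<Object> navigate(Object doc, ErrorContext ctx, String... path) {
        Object current = doc;
        StringBuilder seen = new StringBuilder();
        for (String key : path) {
            if (seen.length() > 0) {
                seen.append('.');
            }
            seen.append(key);
            Object next = (current instanceof Map) ? ((Map<?, ?>) current).get(key) : null;
            if (next == null) {
                ctx.failures.add("no value at '" + seen + "'");
                return Optional.empty();
            }
            current = next;
        }
        return Optional.of(current);
    }

    public static void main(String[] args) {
        Object doc = Map.of("person", Map.of("name", "Bob"));
        ErrorContext ctx = new ErrorContext();
        System.out.println(navigate(doc, ctx, "person", "address", "zip")); // Optional.empty
        System.out.println(ctx.failures); // [no value at 'person.address']
    }
}
```

The extra parameter is what carries the "why"; a pattern in the language has no equivalent side channel to thread through.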
> The issue, as I see it, is that I'm not entirely sure if a failure to
> match in such a large nested structure is going to help me construct
> a usable _error message_. As you're certainly aware, about 80% of the
> code in any good compiler is devoted to giving error messages that are
> actually useful to users. If I get a parse error, for example, I want to
> know - down to the level of lines and columns - which part of the input
> failed to match expectations.
>
> Is matching a structure like that going to be able to provide useful
> error messages if input _doesn't_ match? It seems like it just provides
> a binary true/false answer. If it's the case that it won't actually
> help with giving useful error messages, then I think that reduces the
> applicability of patterns to this particular class of problems. It
> follows that it also might mean that the nice things we're putting on
> top (such as the composition of patterns) won't actually see practical
> use, because people end up writing very simple patterns with at most
> one level of nesting.
>
> Now, you know me, I'm the first to try to apply pattern matching and
> algebraic data types to any and every problem. I'm a little concerned
> about possible over-engineering though.
>
> [0]https://github.com/openjdk/amber-docs/blob/master/site/design-notes/pattern-match-object-model.md#a-possible-approach-for-parsing-apis
>