JEP-198 - Let's start talking about JSON

Ethan McCue ethan at
Thu Dec 15 23:26:43 UTC 2022

> The 95%+ use case for working with JSON for your average java coder is
> best done with data binding.

To be brave yet controversial: I'm not sure this is necessarily true.

I will elaborate and respond to the other points after a hot cocoa, but the
last point is part of why I think that tree-crawling needs _something_
better as an API to fit the bill.

With my sketch that set of requirements would be represented as

    record Thing(
        List<Long> xs
    ) {
        static Thing fromJson(Json json)
            var defaultList = List.of(0L);
            return new Thing(Decoder.optionalNullableField(
                        x -> Long.parseLong(Decoder.string(x)),
                    x -> List.of(Decoder.long_(x))

Which isn't amazing at first glance, but also

   {"xs": null}
   {"xs": 5}
   {"xs": [5]}
   {"xs": ["5"]}
   {"xs": [1, "2", "3"]}

These are some wildly varied structures. You could make a solid argument
that something which silently treats them all the same is
a bad API for all the reasons you would consider it a good one.
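The trade-off can be made concrete. Below is a minimal Java sketch that accepts exactly the shapes listed above (missing, null, a number, a numeric string, or an array of numbers and numeric strings) and rejects anything else loudly. The `Json` records and `XsDecoder` are hypothetical stand-ins written for this illustration, not the `Decoder` API from the sketch above or any existing library; the default of `List.of(0L)` follows Reinier's rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical minimal JSON model, just enough for this example.
sealed interface Json {
    record Null() implements Json {}
    record Num(long value) implements Json {}
    record Str(String value) implements Json {}
    record Arr(List<Json> items) implements Json {}
    record Obj(Map<String, Json> fields) implements Json {}
}

// Hypothetical decoder for the "xs" field; every accepted shape is spelled out.
final class XsDecoder {
    private XsDecoder() {}

    static List<Long> decodeXs(Json.Obj obj, List<Long> defaultList) {
        Json value = obj.fields().get("xs");
        // Missing key or explicit null: fall back to the default.
        if (value == null || value instanceof Json.Null) {
            return defaultList;
        }
        // An array: decode every element with the same number-or-string rule.
        if (value instanceof Json.Arr arr) {
            List<Long> out = new ArrayList<>();
            for (Json item : arr.items()) {
                out.add(decodeLong(item));
            }
            return List.copyOf(out);
        }
        // A single scalar becomes a one-element list.
        return List.of(decodeLong(value));
    }

    // Numbers are taken as-is; strings are parsed (the "big ids sent as strings" case).
    // Anything else, including null inside an array, is an error.
    static long decodeLong(Json json) {
        if (json instanceof Json.Num n) {
            return n.value();
        }
        if (json instanceof Json.Str s) {
            return Long.parseLong(s.value());
        }
        throw new IllegalArgumentException("expected a number or a numeric string, got " + json);
    }
}
```

Each accepted shape is a visible branch, so `{"xs": [1, "2", "3"]}` is allowed on purpose rather than by accident, and a `null` inside the array is still rejected.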

On Thu, Dec 15, 2022 at 6:18 PM Johannes Lichtenberger <
lichtenberger.johannes at> wrote:

> I'll have to read the whole thing, but are pure JSON parsers really the
> go-to for most people? I'm a big advocate of providing also something
> similar to XPath/XQuery and that's IMHO JSONiq (90% XQuery). I might be
> biased, of course, as I'm working on Brackit[1] in my spare time (which is
> also a query compiler and intended to be used with proven optimizations by
> document stores / JSON stores), but it can also be used as an in-memory query
> engine.
> kind regards
> Johannes
> [1]
> On Thu, 15 Dec 2022 at 23:03, Reinier Zwitserloot <
> reinier at> wrote:
>> A recent Advent-of-Code puzzle also made me double-check the support of
>> JSON in the java core libs and it is indeed a curious situation that the
>> java core libs don’t cater to it particularly well.
>> However, I’m not seeing an easy way forward to try to close this hole in
>> the core library offerings.
>> If you need to stream huge swaths of JSON, generally there’s a clear unit
>> size that you can just databind. Something like:
>> String jsonStr = """
>>     { "version": 5, "data": [
>>       -- 1 million relatively small records in this list --
>>     ] }
>>     """;
>> The usual swath of JSON parsers tend to support this (giving you a stream
>> of java instances created by databinding those small records one by one),
>> or if not, the best move forward is presumably to file a pull request with
>> those projects; the java.util.logging experiment shows that trying to
>> ‘core-librarize’ needs that the community at large already fulfills with
>> third-party deps isn’t a good move, especially if the core library variant
>> tries to oversimplify to avoid the trap of being too opinionated (which
>> core libs shouldn’t be). In other words, the need for ‘stream this JSON for
>> me’ style APIs is even more exotic than Ethan is suggesting.
>> I see a fundamental problem here:
>>    - The 95%+ use case for working with JSON for your average java coder
>>    is best done with data binding.
>>    - core libs doesn’t want to provide it, partly because it’s got a
>>    large design space, partly because the field’s already covered by GSON and
>>    Jackson-json; java.util.logging proves this doesn’t work. At least, I gather
>>    that’s what Ethan thinks and I agree with this assessment.
>>    - A language that claims to be “batteries included” that doesn’t ship
>>    with a JSON parser in this era is dubious, to say the least.
>> I’m not sure how to square this circle. Hence it feels like core-libs
>> needs to hold some more fundamental debates first:
>>    - Maybe it’s time to state in a more or less official decree that
>>    well-established, large design space jobs will remain the purview of
>>    dependencies no matter how popular they are, unless being part of the
>>    core-libs adds something more fundamental the third party deps cannot bring
>>    to the table (such as language integration), or the community standardizes
>>    on a single library (JSR310’s story, more or less). JSON parsing would
>>    qualify as ‘well-established’ (GSON and Jackson) and ‘large design space’
>>    as Ethan pointed out.
>>    - Given that 99% of java projects, even really simple ones, start
>>    with maven/gradle and a list of deps, is that really a problem?
>> I’m honestly not sure what the right answer is. On one hand, the npm
>> ecosystem seems to be doing very well even though their ‘batteries
>> included’ situation is an utter shambles. Then again, the notion that your
>> average nodejs project includes 10x+ more dependencies than other languages
>> is likely a significant part of the security clown fiesta going on over
>> there as far as 3rd-party deps are concerned, so by no means should java
>> just blindly emulate their solutions.
>> I don’t like the idea of shipping a non-data-binding JSON API in the core
>> libs. The root issue with JSON is that you just can’t tell how to interpret
>> any given JSON token, because that’s not how JSON is used in practice. What
>> does 5 mean? Could be that I’m to take that as an int, or as a double,
>> or perhaps even as a j.t.Instant (epoch-millis), and defaulting
>> behaviour (similar to j.u.Map’s .getOrDefault) is *very* convenient to
>> parse most JSON out there in the real world - omitting k/v pairs whose
>> value is still on default is very common). That’s what makes those databind
>> libraries so enticing: Instead of trying to pattern match my way into this
>> behaviour:
>>    - If the element isn’t there at all or null, give me a list-of-longs
>>    with a single 0 in it.
>>    - If the element is a number, make me a list-of-longs with 1 value in
>>    it, that is that number, as long.
>>    - If the element is a string, parse it into a long, then get me a
>>    list with this one long value (because IEEE double rules mean sometimes you
>>    have to put these things in string form or they get mangled by javascript-
>>    eval style parsers).
>> And yet the above is quite common, and can easily be done by a
>> databinder, which sees you want a List<Long> for a field whose default
>> value is List.of(0L), and, armed with that knowledge, can translate the
>> JSON into java in that way.
>> You don’t *need* databinding to cater to this idea: You could for
>> example have a jsonNode.asLong(123) method that would parse a string if
>> need be, even. But this has nothing to do with pattern matching either.
>>  --Reinier Zwitserloot
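Reinier's `asLong(123)` idea can be sketched without any databinding. In this minimal version `JsonNode` and its three record cases are hypothetical and cover only what the example needs; numbers are kept as `BigDecimal` so that large integers are not rounded the way an IEEE double would round them. For comparison, Jackson's `JsonNode.asLong(long defaultValue)` also parses textual values, though its exact coercion rules may differ from this sketch.

```java
import java.math.BigDecimal;

// Hypothetical tree node type with a lenient, defaulting accessor.
sealed interface JsonNode {
    record NullNode() implements JsonNode {}
    record NumberNode(BigDecimal value) implements JsonNode {}
    record TextNode(String value) implements JsonNode {}

    // Numbers drop any fractional part; numeric strings are parsed;
    // null and unparseable text fall back to defaultValue.
    default long asLong(long defaultValue) {
        if (this instanceof NumberNode n) {
            return n.value().longValue();
        }
        if (this instanceof TextNode t) {
            try {
                return Long.parseLong(t.value().trim());
            } catch (NumberFormatException e) {
                return defaultValue;
            }
        }
        return defaultValue;
    }
}
```

The convenience comes at the cost Ethan raises in his reply: a string that fails to parse silently becomes the default instead of an error.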
>> On 15 Dec 2022 at 21:30:17, Ethan McCue <ethan at> wrote:
>>> I'm writing this to drive some forward motion and to nerd-snipe those
>>> who know better than I do into putting their thoughts into words.
>>> There are three ways to process JSON[1]:
>>> - Streaming (Push or Pull)
>>> - Traversing a Tree (Realized or Lazy)
>>> - Declarative Databind (N ways)
>>> Of these, JEP-198 explicitly ruled out providing "JAXB style type safe
>>> data binding."
>>> No justification is given, but if I had to insert my own: mapping the
>>> Json model to/from the Java/JVM object model is a cursed combo of
>>> - Huge possible design space
>>> - Unpalatably large surface for backwards compatibility
>>> - Serialization! Boo![2]
>>> So for an artifact like the JDK, it probably doesn't make sense to
>>> include. That tracks.
>>> It won't make everyone happy, people like databind APIs, but it tracks.
>>> So for the "read flow" these are the things to figure out.
>>>                 | Should Provide? | Intended User(s) |
>>> ----------------+-----------------+------------------+
>>>  Streaming Push |                 |                  |
>>> ----------------+-----------------+------------------+
>>>  Streaming Pull |                 |                  |
>>> ----------------+-----------------+------------------+
>>>  Realized Tree  |                 |                  |
>>> ----------------+-----------------+------------------+
>>>  Lazy Tree      |                 |                  |
>>> ----------------+-----------------+------------------+
>>> At which point, we should talk about what "meets needs of Java
>>> developers using JSON" implies.
>>> JSON is ubiquitous. Most kinds of software us schmucks write could have
>>> a reason to interact with it.
>>> The full set of "user personas" therefore isn't practical for me to
>>> talk about.[3]
>>> JSON documents, however, are not so varied.
>>> - There are small ones (1-10kb)
>>> - There are medium ones (10-1000kb)
>>> - There are big ones (1000kb-???)
>>> - There are shallow ones
>>> - There are deep ones
>>> So that feels like an easier direction to talk about it from.
>>> This repo[4] has some convenient toy examples of how some of those APIs
>>> look in libraries
>>> in the ecosystem. Specifically the Streaming Pull and Realized Tree
>>> models.
>>>         User r = new User();
>>>         while (true) {
>>>             JsonToken token = reader.peek();
>>>             switch (token) {
>>>                 case BEGIN_OBJECT:
>>>                     reader.beginObject();
>>>                     break;
>>>                 case END_OBJECT:
>>>                     reader.endObject();
>>>                     return r;
>>>                 case NAME:
>>>                     String fieldname = reader.nextName();
>>>                     switch (fieldname) {
>>>                         case "id":
>>>                             r.setId(reader.nextString());
>>>                             break;
>>>                         case "index":
>>>                             r.setIndex(reader.nextInt());
>>>                             break;
>>>                         ...
>>>                         case "friends":
>>>                             r.setFriends(new ArrayList<>());
>>>                             Friend f = null;
>>>                             boolean carryOn = true;
>>>                             while (carryOn) {
>>>                                 token = reader.peek();
>>>                                 switch (token) {
>>>                                     case BEGIN_ARRAY:
>>>                                         reader.beginArray();
>>>                                         break;
>>>                                     case END_ARRAY:
>>>                                         reader.endArray();
>>>                                         carryOn = false;
>>>                                         break;
>>>                                     case BEGIN_OBJECT:
>>>                                         reader.beginObject();
>>>                                         f = new Friend();
>>>                                         break;
>>>                                     case END_OBJECT:
>>>                                         reader.endObject();
>>>                                         r.getFriends().add(f);
>>>                                         break;
>>>                                     case NAME:
>>>                                         String fn = reader.nextName();
>>>                                         switch (fn) {
>>>                                             case "id":
>>>                                                 f.setId(reader.nextString());
>>>                                                 break;
>>>                                             case "name":
>>>                                                 f.setName(reader.nextString());
>>>                                                 break;
>>>                                         }
>>>                                         break;
>>>                                 }
>>>                             }
>>>                             break;
>>>                     }
>>>             }
>>> I think it's not hard to argue that the streaming APIs are brutalist. The
>>> above is Gson, but Jackson, Moshi, etc. seem at least morally equivalent.
>>> It's hard to write, hard to write *correctly*, and there is a curious
>>> propensity towards pairing it with anemic, mutable models.
>>> That being said, it handles big documents and deep documents really
>>> well. It also performs
>>> pretty darn well and is good enough as a "fallback" when the intended
>>> user experience
>>> is through something like databind.
>>> So what could we do meaningfully better with the language we have
>>> today/will have tomorrow?
>>> - Sealed interfaces + Pattern matching could give a nicer model for
>>> tokens
>>>         sealed interface JsonToken {
>>>             record Field(String name) implements JsonToken {}
>>>             record BeginArray() implements JsonToken {}
>>>             record EndArray() implements JsonToken {}
>>>             record BeginObject() implements JsonToken {}
>>>             record EndObject() implements JsonToken {}
>>>             // ...
>>>         }
>>>         // ...
>>>         User r = new User();
>>>         while (true) {
>>>             JsonToken token = reader.peek();
>>>             switch (token) {
>>>                 case BeginObject __:
>>>                     reader.beginObject();
>>>                     break;
>>>                 case EndObject __:
>>>                     reader.endObject();
>>>                     return r;
>>>                 case Field("id"):
>>>                     r.setId(reader.nextString());
>>>                     break;
>>>                 case Field("index"):
>>>                     r.setIndex(reader.nextInt());
>>>                     break;
>>>                 // ...
>>>                 case Field("friends"):
>>>                     r.setFriends(new ArrayList<>());
>>>                     Friend f = null;
>>>                     boolean carryOn = true;
>>>                     while (carryOn) {
>>>                         token = reader.peek();
>>>                         switch (token) {
>>>                 // ...
>>> - Value classes can make it all more efficient
>>>         sealed interface JsonToken {
>>>             value record Field(String name) implements JsonToken {}
>>>             value record BeginArray() implements JsonToken {}
>>>             value record EndArray() implements JsonToken {}
>>>             value record BeginObject() implements JsonToken {}
>>>             value record EndObject() implements JsonToken {}
>>>             // ...
>>>         }
>>> - (Fun One) We can transform a simpler-to-write push parser into a pull
>>> parser with Coroutines
>>>     This is just a toy we could play with while making something in the
>>> JDK. I'm pretty sure
>>>     we could make a parser which feeds into something like
>>>         interface Listener {
>>>             void onObjectStart();
>>>             void onObjectEnd();
>>>             void onArrayStart();
>>>             void onArrayEnd();
>>>             void onField(String name);
>>>             // ...
>>>         }
>>>     and invert a loop like
>>>         while (true) {
>>>             char c = next();
>>>             switch (c) {
>>>                 case '{':
>>>                     listener.onObjectStart();
>>>                     // ...
>>>                 // ...
>>>             }
>>>         }
>>>     by putting a Coroutine.yield in the callback.
>>>     That might be a meaningful simplification in code structure, I don't
>>> know enough to say.
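The sealed-token idea from the first bullet can already be tried with today's Java. In this minimal sketch the token records, `User`, and `UserReader` are hypothetical, and an `Iterator<JsonToken>` stands in for a real pull reader. It sticks to Java 17 `instanceof` patterns; on Java 21 the same dispatch could be written as a `switch` with type patterns and `when` guards. Matching a constant inside a record pattern, as in the speculative `case Field("id")`, isn't supported by current Java, so the field name is switched on separately.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical token model in the spirit of the sealed JsonToken sketch.
sealed interface JsonToken {
    record BeginObject() implements JsonToken {}
    record EndObject() implements JsonToken {}
    record Field(String name) implements JsonToken {}
    record Str(String value) implements JsonToken {}
    record Num(long value) implements JsonToken {}
}

record User(String id, int index) {}

final class UserReader {
    private UserReader() {}

    // Pulls the tokens of one flat object and builds an immutable User.
    // Unknown fields are skipped; this sketch assumes their values are scalars.
    static User read(Iterator<JsonToken> tokens) {
        if (!(tokens.next() instanceof JsonToken.BeginObject)) {
            throw new IllegalStateException("expected '{'");
        }
        String id = null;
        int index = 0;
        while (true) {
            JsonToken token = tokens.next();
            if (token instanceof JsonToken.EndObject) {
                return new User(id, index);
            } else if (token instanceof JsonToken.Field field) {
                JsonToken value = tokens.next();
                switch (field.name()) {
                    case "id" -> id = expectString(value, "id");
                    case "index" -> index = Math.toIntExact(expectNumber(value, "index"));
                    default -> { } // ignore unknown fields
                }
            } else {
                throw new IllegalStateException("unexpected token " + token);
            }
        }
    }

    private static String expectString(JsonToken token, String field) {
        if (token instanceof JsonToken.Str s) {
            return s.value();
        }
        throw new IllegalStateException(field + ": expected a string, got " + token);
    }

    private static long expectNumber(JsonToken token, String field) {
        if (token instanceof JsonToken.Num n) {
            return n.value();
        }
        throw new IllegalStateException(field + ": expected a number, got " + token);
    }
}
```

Collecting values into locals and constructing the record at `EndObject` is what removes the need for the mutable setter-based model.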
>>> But, I think there are some hard questions like
>>> - Is the intent[5] to make a backing parser for ecosystem databind APIs?
>>> - Is the intent that users who want to handle big/deep documents fall
>>> back to this?
>>> - Are those new language features / conveniences enough to offset the
>>> cost of committing to a new API?
>>> - To whom exactly does a low-level API provide value?
>>> - What is the benefit of standardization in the JDK?
>>> and just generally - who would be the consumer(s) of this?
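Coroutines aren't a public Java API, so the "(Fun One)" inversion can't be shown literally. As a swapped-in technique, this sketch gets the same push-to-pull shape by running the push loop on its own thread and handing each event across a `SynchronousQueue`: `put` blocks the producer until the consumer calls `next()`, much like a `yield`. The class name and the string event names are hypothetical, and the toy "parser" only reports structural characters (it does not understand string literals).

```java
import java.util.concurrent.SynchronousQueue;
import java.util.function.Consumer;

// Push-to-pull inversion using a thread handoff instead of a coroutine.
final class PushToPull {
    private final SynchronousQueue<String> handoff = new SynchronousQueue<>();

    // Toy push "parser": reports structural characters to a listener, then "eof".
    // Braces inside string literals would confuse it.
    static void push(String json, Consumer<String> listener) {
        for (char c : json.toCharArray()) {
            switch (c) {
                case '{' -> listener.accept("objectStart");
                case '}' -> listener.accept("objectEnd");
                case '[' -> listener.accept("arrayStart");
                case ']' -> listener.accept("arrayEnd");
                default -> { }
            }
        }
        listener.accept("eof");
    }

    PushToPull(String json) {
        Thread producer = new Thread(() -> push(json, event -> {
            try {
                handoff.put(event); // blocks until the consumer pulls: the "yield" point
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }));
        producer.setDaemon(true); // don't keep the JVM alive if the consumer stops early
        producer.start();
    }

    // Pull side: blocks until the push loop hands over its next event.
    String next() {
        try {
            return handoff.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Calling `next()` after "eof" would block forever in this sketch. On Java 21+, starting the producer with `Thread.ofVirtual().start(...)` would make the extra thread cheap, though a real parser would still need a story for cancellation.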
>>> The other kind of API still on the table is a Tree. There are two ways
>>> to handle this
>>> 1. Load it into `Object`. Use a bunch of instanceof checks/casts to
>>> confirm what it actually is.
>>>         Object v;
>>>         User u = new User();
>>>         if ((v = jso.get("id")) != null) {
>>>             u.setId((String) v);
>>>         }
>>>         if ((v = jso.get("index")) != null) {
>>>             u.setIndex(((Long) v).intValue());
>>>         }
>>>         if ((v = jso.get("guid")) != null) {
>>>             u.setGuid((String) v);
>>>         }
>>>         if ((v = jso.get("isActive")) != null) {
>>>             u.setIsActive(((Boolean) v));
>>>         }
>>>         if ((v = jso.get("balance")) != null) {
>>>             u.setBalance((String) v);
>>>         }
>>>         // ...
>>>         if ((v = jso.get("latitude")) != null) {
>>>             u.setLatitude(v instanceof BigDecimal ? ((BigDecimal)
>>> v).doubleValue() : (Double) v);
>>>         }
>>>         if ((v = jso.get("longitude")) != null) {
>>>             u.setLongitude(v instanceof BigDecimal ? ((BigDecimal)
>>> v).doubleValue() : (Double) v);
>>>         }
>>>         if ((v = jso.get("greeting")) != null) {
>>>             u.setGreeting((String) v);
>>>         }
>>>         if ((v = jso.get("favoriteFruit")) != null) {
>>>             u.setFavoriteFruit((String) v);
>>>         }
>>>         if ((v = jso.get("tags")) != null) {
>>>             List<Object> jsonarr = (List<Object>) v;
>>>             u.setTags(new ArrayList<>());
>>>             for (Object vi : jsonarr) {
>>>                 u.getTags().add((String) vi);
>>>             }
>>>         }
>>>         if ((v = jso.get("friends")) != null) {
>>>             List<Object> jsonarr = (List<Object>) v;
>>>             u.setFriends(new ArrayList<>());
>>>             for (Object vi : jsonarr) {
>>>                 Map<String, Object> jso0 = (Map<String, Object>) vi;
>>>                 Friend f = new Friend();
>>>                 f.setId((String) jso0.get("id"));
>>>                 f.setName((String) jso0.get("name"));
>>>                 u.getFriends().add(f);
>>>             }
>>>         }
>>> 2. Have an explicit model for Json, and helper methods that do said
>>> casts[6]
>>>         this.setSiteSetting(readFromJson(jsonObject.getJsonObject("site")));
>>>         JsonArray groups = jsonObject.getJsonArray("group");
>>>         if (groups != null) {
>>>             int len = groups.size();
>>>             for (int i = 0; i < len; i++) {
>>>                 JsonObject grp = groups.getJsonObject(i);
>>>                 SNMPSetting grpSetting = readFromJson(grp);
>>>                 String grpName = grp.getString("dbgroup", null);
>>>                 if (grpName != null && grpSetting != null)
>>>                     this.groupSettings.put(grpName, grpSetting);
>>>             }
>>>         }
>>>         JsonArray hosts = jsonObject.getJsonArray("host");
>>>         if (hosts != null) {
>>>             int len = hosts.size();
>>>             for (int i = 0; i < len; i++) {
>>>                 JsonObject host = hosts.getJsonObject(i);
>>>                 SNMPSetting hostSetting = readFromJson(host);
>>>                 String hostName = host.getString("dbhost", null);
>>>                 if (hostName != null && hostSetting != null)
>>>                     this.hostSettings.put(hostName, hostSetting);
>>>             }
>>>         }
>>> I think what has become easier to represent in the language nowadays is
>>> that explicit model for Json.
>>> It's the 101 lesson of sealed interfaces.[7] It feels nice and clean.
>>>         sealed interface Json {
>>>             final class Null implements Json {}
>>>             final class True implements Json {}
>>>             final class False implements Json {}
>>>             final class Array implements Json {}
>>>             final class Object implements Json {}
>>>             final class String implements Json {}
>>>             final class Number implements Json {}
>>>         }
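Giving each case a payload turns that outline into something usable. In this minimal sketch the record names are assumptions of the example: a single `Bool` replaces separate `True`/`False` classes, and `Str`/`Num`/`Obj` avoid nested names that would shadow `java.lang.String`, `Number`, and `Object` inside the interface. The `write` helper is hypothetical and only shows that any consumer of the model is a walk over the sealed cases.

```java
import java.math.BigDecimal;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// One possible payload-carrying version of the sealed Json outline.
sealed interface Json {
    record Null() implements Json {}
    record Bool(boolean value) implements Json {}
    record Num(BigDecimal value) implements Json {}
    record Str(String value) implements Json {}
    record Arr(List<Json> items) implements Json {}
    record Obj(Map<String, Json> fields) implements Json {}

    // Serializes the tree by handling each sealed case in turn.
    static String write(Json json) {
        if (json instanceof Null) {
            return "null";
        } else if (json instanceof Bool b) {
            return Boolean.toString(b.value());
        } else if (json instanceof Num n) {
            return n.value().toPlainString();
        } else if (json instanceof Str s) {
            return quote(s.value());
        } else if (json instanceof Arr a) {
            return a.items().stream()
                    .map(Json::write)
                    .collect(Collectors.joining(",", "[", "]"));
        } else if (json instanceof Obj o) {
            return o.fields().entrySet().stream()
                    .map(e -> quote(e.getKey()) + ":" + write(e.getValue()))
                    .collect(Collectors.joining(",", "{", "}"));
        }
        throw new AssertionError("unreachable: " + json);
    }

    // Minimal JSON string escaping: quotes, backslashes, and control characters.
    private static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }
}
```

On Java 21 the `if`/`else` chain could become a `switch` over the sealed type, which lets the compiler check that no case is missed.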
>>> And the cast-and-check approach is now more viable on account of pattern
>>> matching.
>>>         if (jso.get("id") instanceof String v) {
>>>             u.setId(v);
>>>         }
>>>         if (jso.get("index") instanceof Long v) {
>>>             u.setIndex(v.intValue());
>>>         }
>>>         if (jso.get("guid") instanceof String v) {
>>>             u.setGuid(v);
>>>         }
>>>         // or
>>>         if (jso.get("id") instanceof String id &&
>>>                 jso.get("index") instanceof Long index &&
>>>                 jso.get("guid") instanceof String guid) {
>>>             return new User(id, index, guid, ...); // look ma, no setters!
>>>         }
>>> And on the horizon, again, is value types.
>>> But there are problems with this approach beyond the performance
>>> implications of loading into
>>> a tree.
>>> For one, all the code samples above have different behaviors around null
>>> values and missing keys
>>> that are not obvious at first glance.
>>> This won't accept any null or missing fields
>>>         if (jso.get("id") instanceof String id &&
>>>                 jso.get("index") instanceof Long index &&
>>>                 jso.get("guid") instanceof String guid) {
>>>             return new User(id, index, guid, ...);
>>>         }
>>> This will accept individual null or missing fields, but also will
>>> silently ignore
>>> fields with incorrect types
>>>         if (jso.get("id") instanceof String v) {
>>>             u.setId(v);
>>>         }
>>>         if (jso.get("index") instanceof Long v) {
>>>             u.setIndex(v.intValue());
>>>         }
>>>         if (jso.get("guid") instanceof String v) {
>>>             u.setGuid(v);
>>>         }
>>> And, compared to databind where there is information about the expected
>>> structure of the document
>>> and it's the job of the framework to assert that, I posit that the errors
>>> that would be encountered
>>> when writing code against this would be more like
>>>     "something wrong with user"
>>> than
>>>     "problem at users[5].name, expected string or null. got 5"
>>> Which feels unideal.
>>> One approach I find promising is something close to what Elm does with
>>> its decoders[8]. Not just combining assertion
>>> and binding like what pattern matching with records allows, but
>>> including a scheme for bubbling/nesting errors.
>>>     static String string(Json json) throws JsonDecodingException {
>>>         if (!(json instanceof Json.String jsonString)) {
>>>             throw JsonDecodingException.of(
>>>                     "expected a string",
>>>                     json
>>>             );
>>>         } else {
>>>             return jsonString.value();
>>>         }
>>>     }
>>>     static <T> T field(Json json, String fieldName, Decoder<? extends T>
>>> valueDecoder) throws JsonDecodingException {
>>>         var jsonObject = object(json);
>>>         var value = jsonObject.get(fieldName);
>>>         if (value == null) {
>>>             throw JsonDecodingException.atField(
>>>                     fieldName,
>>>                     JsonDecodingException.of(
>>>                             "no value for field",
>>>                             json
>>>                     )
>>>             );
>>>         }
>>>         else {
>>>             try {
>>>                 return valueDecoder.decode(value);
>>>             } catch (JsonDecodingException e) {
>>>                 throw JsonDecodingException.atField(
>>>                         fieldName,
>>>                         e
>>>                 );
>>>             }  catch (Exception e) {
>>>                 throw JsonDecodingException.atField(fieldName,
>>> JsonDecodingException.of(e, value));
>>>             }
>>>         }
>>>     }
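Here is a self-contained sketch of how that bubbling can produce a message like the `users[5].name` one. It assumes an unchecked `JsonDecodingException` (so decoders can be plain lambdas), a `Decoder<T>` functional interface holding the static helpers, and a tiny record-based `Json` model; all of these are hypothetical and simplified from the snippet above. Each helper that descends into a field or an array element catches the child's exception and prepends its own path segment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical minimal JSON model for this example.
sealed interface Json {
    record Null() implements Json {}
    record Num(long value) implements Json {}
    record Str(String value) implements Json {}
    record Arr(List<Json> items) implements Json {}
    record Obj(Map<String, Json> fields) implements Json {}
}

// Unchecked so decoders compose as ordinary lambdas; path segments are prepended while unwinding.
final class JsonDecodingException extends RuntimeException {
    private final String path;
    private final String problem;

    private JsonDecodingException(String path, String problem) {
        super("problem at " + (path.isEmpty() ? "<root>" : path.replaceFirst("^\\.", "")) + ": " + problem);
        this.path = path;
        this.problem = problem;
    }

    static JsonDecodingException of(String problem) {
        return new JsonDecodingException("", problem);
    }

    static JsonDecodingException of(String expected, Json got) {
        return of(expected + ", got " + describe(got));
    }

    JsonDecodingException atField(String name) {
        return new JsonDecodingException("." + name + path, problem);
    }

    JsonDecodingException atIndex(int index) {
        return new JsonDecodingException("[" + index + "]" + path, problem);
    }

    private static String describe(Json json) {
        if (json instanceof Json.Num n) return Long.toString(n.value());
        if (json instanceof Json.Str s) return "\"" + s.value() + "\"";
        if (json instanceof Json.Null) return "null";
        return json instanceof Json.Arr ? "an array" : "an object";
    }
}

@FunctionalInterface
interface Decoder<T> {
    T decode(Json json);

    static String string(Json json) {
        if (json instanceof Json.Str s) return s.value();
        throw JsonDecodingException.of("expected a string", json);
    }

    static <T> T field(Json json, String name, Decoder<? extends T> valueDecoder) {
        if (!(json instanceof Json.Obj obj)) throw JsonDecodingException.of("expected an object", json);
        Json value = obj.fields().get(name);
        if (value == null) throw JsonDecodingException.of("no value for field").atField(name);
        try {
            return valueDecoder.decode(value);
        } catch (JsonDecodingException e) {
            throw e.atField(name); // add this field to the path of the nested failure
        }
    }

    static <T> List<T> array(Json json, Decoder<? extends T> itemDecoder) {
        if (!(json instanceof Json.Arr arr)) throw JsonDecodingException.of("expected an array", json);
        List<T> out = new ArrayList<>();
        for (int i = 0; i < arr.items().size(); i++) {
            try {
                out.add(itemDecoder.decode(arr.items().get(i)));
            } catch (JsonDecodingException e) {
                throw e.atIndex(i); // add this index to the path of the nested failure
            }
        }
        return out;
    }
}
```

The caller never builds the path by hand; it falls out of how the helpers nest.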
>>> Which I think has some benefits over the ways I've seen of working with
>>> trees.
>>> - It is declarative enough that folks who prefer databind might be happy
>>> enough.
>>>         static User fromJson(Json json) {
>>>             return new User(
>>>                 Decoder.field(json, "id", Decoder::string),
>>>                 Decoder.field(json, "index", Decoder::long_),
>>>                 Decoder.field(json, "guid", Decoder::string)
>>>             );
>>>         }
>>>         // ...
>>>         List<User> users = Decoder.array(json, User::fromJson);
>>> - Handling null and optional fields could be less easily conflated
>>>     Decoder.field(json, "id", Decoder::string);
>>>     Decoder.nullableField(json, "id", Decoder::string);
>>>     Decoder.optionalField(json, "id", Decoder::string);
>>>     Decoder.optionalNullableField(json, "id", Decoder::string);
>>> - It composes well with user defined classes
>>>     record Guid(String value) {
>>>         Guid {
>>>             // some assertions on the structure of value
>>>         }
>>>     }
>>>     Decoder.field(json, "guid", guid -> new Guid(Decoder.string(guid)));
>>>     // or even
>>>     record Guid(String value) {
>>>         Guid {
>>>             // some assertions on the structure of value
>>>         }
>>>         static Guid fromJson(Json json) {
>>>             return new Guid(Decoder.string(json));
>>>         }
>>>     }
>>>     Decoder.field(json, "guid", Guid::fromJson);
>>> - When something goes wrong, the API can handle the fiddlyness of
>>> capturing information for feedback.
>>>     In the code I've sketched out, it's just which field/index things went
>>>     wrong at. Capturing metadata like row/col numbers of the source could
>>>     be sensible too. It's not reasonable to expect devs to do extra work to
>>>     get that, and it's really nice to give it.
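To pin down how the four field helpers could differ, here is a minimal sketch. The semantics are an assumption of this example, since the sketch above does not spell them out: `field` requires the key and lets the value decoder reject null, `nullableField` requires the key but maps JSON null to Java `null`, `optionalField` allows the key to be absent but still rejects null, and `optionalNullableField` treats absent and null alike. `Fields` and the reduced `Json` model are hypothetical, and a plain `IllegalArgumentException` stands in for a richer decoding error.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical reduced JSON model for this example.
sealed interface Json {
    record Null() implements Json {}
    record Str(String value) implements Json {}
    record Obj(Map<String, Json> fields) implements Json {}
}

// Hypothetical helpers; each one makes a different promise about missing keys and nulls.
final class Fields {
    private Fields() {}

    static String string(Json json) {
        if (json instanceof Json.Str s) {
            return s.value();
        }
        throw new IllegalArgumentException("expected a string, got " + json);
    }

    // Key must be present; the value decoder decides what null means (string() rejects it).
    static <T> T field(Json.Obj obj, String name, Function<Json, T> decoder) {
        Json value = obj.fields().get(name);
        if (value == null) {
            throw new IllegalArgumentException("no value for field " + name);
        }
        return decoder.apply(value);
    }

    // Key must be present; an explicit JSON null becomes Java null.
    static <T> T nullableField(Json.Obj obj, String name, Function<Json, T> decoder) {
        return field(obj, name, v -> v instanceof Json.Null ? null : decoder.apply(v));
    }

    // Key may be absent (empty Optional); a present null is still rejected by the decoder.
    static <T> Optional<T> optionalField(Json.Obj obj, String name, Function<Json, T> decoder) {
        Json value = obj.fields().get(name);
        return value == null ? Optional.empty() : Optional.of(decoder.apply(value));
    }

    // Key may be absent or null; both become an empty Optional.
    static <T> Optional<T> optionalNullableField(Json.Obj obj, String name, Function<Json, T> decoder) {
        Json value = obj.fields().get(name);
        if (value == null || value instanceof Json.Null) {
            return Optional.empty();
        }
        return Optional.of(decoder.apply(value));
    }
}
```

Because the choice is in the method name, a reader can tell from the call site alone which of `{}`, `{"id": null}`, and `{"id": "x"}` a decoder accepts.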
>>> There are also some downsides like
>>> - I do not know how compatible it would be with lazy trees.
>>>      Lazy trees are the only way that a tree API could handle big or
>>>      deep documents.
>>>      The general concept as applied in libraries like json-tree[9] is to
>>> navigate without
>>>      doing any work, and that clashes with wanting to instanceof check
>>> the info at the
>>>      current path.
>>> - It *almost* gives enough information to be a general schema approach
>>>     In this model, if one field fails, an exception is thrown immediately.
>>>     If an API should return "errors": [...], that is inconvenient to
>>>     construct.
>>> - None of the existing popular libraries are doing this
>>>      The only mechanism strictly required to give this sort of API is
>>>      lambdas. Those have been out for a decade. Yes, sealed interfaces make
>>>      the data model prettier, but in concept you can build the same thing
>>>      on top of anything.
>>>      I could argue that this is because of the "cultural momentum" of
>>>      databind or some other reason, but the fact remains that it isn't a
>>>      proven-out approach.
>>>      Writing Json libraries is a todo list[10]. There are a lot of bad
>>>      ideas and this might be one of them.
>>> - Performance impact of so many instanceof checks
>>>     I've gotten a 4.2% slowdown compared to the "regular" tree code
>>>     without the repeated casts. But that was with a parser that is 5x
>>>     slower than Jackson's (using the same benchmark project as for the
>>>     snippets).
>>>     I think there could be reason to believe that the JIT does well
>>> enough with repeated instanceof
>>>     checks to consider it.
>>> My current thinking is that - despite not solving for large or deep
>>> documents - starting with a really "dumb" realized tree api
>>> might be the right place to start for the read side of a potential
>>> incubator module.
>>> But regardless - this feels like a good time to start more concrete
>>> conversations. I feel I should cap this email since I've reached the point
>>> of decoherence and haven't even mentioned the write side of things.
>>> [1]:
>>> [2]:
>>> [3]: I only know like 8 people
>>> [4]:
>>> [5]: When I say "intent", I do so knowing full well no one has been
>>> actively thinking of this for an entire Game of Thrones
>>> [6]:
>>> [7]:
>>> [8]:
>>> [9]:
>>> [10]:
>>> [11]: In 30 days, JEP-198 will be recognizably PI days old for the 2nd
>>> time in its history.
>>> [12]: To me, the fact that it is still an open JEP is more a social
>>> convenience than anything. I could just as easily be writing this exact
>>> same email about TOML.
