JEP-198 - Let's start talking about JSON
Ethan McCue
ethan at mccue.dev
Tue Feb 28 17:16:08 UTC 2023
As an update to my character arc, I documented and wrote up an explanation
for the prototype library I was working on.[1]
And I've gotten a good deal of feedback on reddit[2] and in private.
I think it's relevant to the conversation here in the sense of
- There are more of rzwitserloot's objections to read on the general
concept of JSON as a built-in.[3]
- There are a lot of well-reasoned objections to the manner in which I am
interpreting a JSON tree, as well
as objections to the usage of a tree as the core. JEP 198's current writeup
(which I know is subject to a rewrite/retraction)
presumes that an immutable tree would be the core data structure.
- The peanut gallery might be interested in a "base" on which to implement
whatever their take on an API should be.
For that last category, I have a method-handle proxy written up for those
who want to try the "push parser into a pull parser"
transformation I alluded to in my first email of this thread.
[1]: https://mccue.dev/pages/2-26-23-json
[2]:
https://www.reddit.com/r/java/comments/11cyoh1/please_try_my_json_library/
[3]: Including one that reddit took down, but can be seen through reveddit
https://www.reveddit.com/y/rzwitserloot/?after=t1_jacpsj6&limit=1&sort=new&show=t1_jaa3x0q&removal_status=all
On Fri, Dec 16, 2022 at 6:23 PM Ethan McCue <ethan at mccue.dev> wrote:
> Sidenote about "Project Galahad" - I know Graal uses JSON for a few things
> including a reflection-config.json. Food for thought.
>
> > the java.util.log experiment shows that trying to ‘core-librarize’ needs
> that the community at large already fulfills with third party deps isn’t a
> good move,
>
> I, personally, do not have much historical context for java.util.log. What
> feels distinct about providing a JSON API is that
> logging is an implicitly global thing. If a JSON API doesn't fill all
> ecosystem niches, multiple can be used alongside
> each other.
>
> > The root issue with JSON is that you just can’t tell how to interpret
> any given JSON token
>
> The point where this could be an issue is numbers. Once something is
> identified as a number we can
>
> 1. Parse it immediately, using a long and falling back to a BigInteger.
> For decimals it's harder to know
> whether to use a double or BigDecimal internally. In the library I've been
> copy-pasting from to build
> a prototype, that last one is an explicit option and it defaults to doubles
> for the whole parse.
> 2. Store the string and parse it upon request. We can still model it as a
> Json.Number, but the
> work of interpreting is deferred. (A sketch of both options follows below.)
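>
> A minimal sketch of both options (the names here are illustrative, not a
> real API):
>
> import java.math.BigDecimal;
> import java.math.BigInteger;
>
> record LazyNumber(String literal) {
>     // Option 1's integral flavor: parse eagerly as a long,
>     // falling back to BigInteger when the literal overflows.
>     static Object parseIntegral(String literal) {
>         try {
>             return Long.parseLong(literal);
>         } catch (NumberFormatException e) {
>             return new BigInteger(literal);
>         }
>     }
>
>     // Option 2: keep the literal and defer interpretation;
>     // callers pick the numeric view they want.
>     long asLong() {
>         return Long.parseLong(literal);
>     }
>
>     double asDouble() {
>         return Double.parseDouble(literal);
>     }
>
>     BigDecimal asBigDecimal() {
>         return new BigDecimal(literal);
>     }
> }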
>
> But in general, making a tree of JSON values doesn't particularly affect
> our ability to interpret it
> in a certain way. That interpretation is just positional. That's just as
> true when making assertions
> in the form of class structure and field types as it is when making
> assertions in the form of code.[1]
>
> record Thing(Instant a) {}
>
> // vs.
>
> Decoder.field(json, "a", a -> Instant.ofEpochSecond(Decoder.long_(a)))
>
> If anything, using a named type as a lookup key for a deserialization
> function is the less obvious
> way to do this.
>
> > I’m not sure how to square this circle
> > I don’t like the idea of shipping a non-data-binding JSON API in the
> core libs.
>
> I think the way to cube this rhombus is to find ways to like the idea of a
> non-data-binding JSON API. ¯\_(ツ)_/¯
>
> My personal journey with that is reaching its terminus here I think.
>
> Look on the bright side though - there are legit upsides to explicit tree
> plucking!
>
> Yeah, the friction per field is slightly higher, but the relative
> friction of custom types, or multiple construction methods for a
> particular type, or maintaining compatibility with
> legacy representations, or even just handling a top-level list of things -
> it's much lower.
>
> And all that complexity - that an Instant is made by looking for a long or
> that it is parsed from a string in a
> particular format - it lives in Java code you can see, touch, feel and
> taste.
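>
> A sketch of that flexible construction, reusing the Decoder helpers from
> my prototype sketches (assuming oneOf hands back a Decoder<T> you can
> call directly) - a java.time.Instant accepted as either an epoch second
> or an ISO-8601 string:
>
> static Instant instant(Json json) {
>     return Decoder.oneOf(
>         x -> Instant.ofEpochSecond(Decoder.long_(x)),
>         x -> Instant.parse(Decoder.string(x))
>     ).decode(json);
> }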
>
> I know "nobody does this"[2] but it's not that bad, actually.
>
> [1]: I do apologize for the code sketches consistently being "what I think
> an interaction with a tree api should look like."
> That is what I have been thinking about for a while so it's hard to resist.
> [2]: https://youtu.be/dOgfWXw9VrI?t=1225
>
> On Thu, Dec 15, 2022 at 6:34 PM Ethan McCue <ethan at mccue.dev> wrote:
>
>> > are pure JSON parsers really the go-to for most people?
>>
>> Depends on what you mean by JSON parsers and it depends on what you mean
>> by people.
>>
>> To the best of my knowledge, both Python and JavaScript do not include
>> streaming, databinding, or path navigation capabilities in their JSON
>> parsers.
>>
>>
>> On Thu, Dec 15, 2022 at 6:26 PM Ethan McCue <ethan at mccue.dev> wrote:
>>
>>> > The 95%+ use case for working with JSON for your average java coder is
>>> best done with data binding.
>>>
>>> To be brave yet controversial: I'm not sure this is necessarily true.
>>>
>>> I will elaborate and respond to the other points after a hot cocoa, but
>>> the last point is part of why I think that tree-crawling needs _something_
>>> better as an API to fit the bill.
>>>
>>> With my sketch that set of requirements would be represented as
>>>
>>> record Thing(
>>>     List<Long> xs
>>> ) {
>>>     static Thing fromJson(Json json) {
>>>         var defaultList = List.of(0L);
>>>         return new Thing(Decoder.optionalNullableField(
>>>             json,
>>>             "xs",
>>>             Decoder.oneOf(
>>>                 Decoder.array(Decoder.oneOf(
>>>                     x -> Long.parseLong(Decoder.string(x)),
>>>                     Decoder::long_
>>>                 )),
>>>                 Decoder.null_(defaultList),
>>>                 x -> List.of(Decoder.long_(x))
>>>             ),
>>>             defaultList
>>>         ));
>>>     }
>>> }
>>>
>>> Which isn't amazing at first glance, but also
>>>
>>> {}
>>> {"xs": null}
>>> {"xs": 5}
>>> {"xs": [5]}
>>> {"xs": ["5"]}
>>> {"xs": [1, "2", "3"]}
>>>
>>> these are some wildly varied structures. You could make a solid argument
>>> that something which silently treats these all the same is
>>> a bad API for all the reasons you would consider it a good one.
>>>
>>> On Thu, Dec 15, 2022 at 6:18 PM Johannes Lichtenberger <
>>> lichtenberger.johannes at gmail.com> wrote:
>>>
>>>> I'll have to read the whole thing, but are pure JSON parsers really the
>>>> go-to for most people? I'm a big advocate of also providing something
>>>> similar to XPath/XQuery, and that's IMHO JSONiq (90% XQuery). I might be
>>>> biased, of course, as I'm working on Brackit[1] in my spare time (which is
>>>> also a query compiler and intended to be used with proven optimizations by
>>>> document stores / JSON stores), but it can also be used as an in-memory query
>>>> engine.
>>>>
>>>> kind regards
>>>> Johannes
>>>>
>>>> [1] https://github.com/sirixdb/brackit
>>>>
>>>> On Thu, Dec 15, 2022 at 11:03 PM Reinier Zwitserloot <
>>>> reinier at zwitserloot.com> wrote:
>>>>
>>>>> A recent Advent-of-Code puzzle also made me double check the support
>>>>> of JSON in the java core libs and it is indeed a curious situation that the
>>>>> java core libs don’t cater to it particularly well.
>>>>>
>>>>> However, I’m not seeing an easy way forward to try to close this hole
>>>>> in the core library offerings.
>>>>>
>>>>> If you need to stream huge swaths of JSON, generally there’s a clear
>>>>> unit size that you can just databind. Something like:
>>>>>
>>>>> String jsonStr = """
>>>>>     {
>>>>>       "version": 5,
>>>>>       "data": [
>>>>>         -- 1 million relatively small records in this list --
>>>>>       ]
>>>>>     }
>>>>>     """;
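>>>>>
>>>>> With Jackson, say, that looks roughly like this sketch (DataRecord
>>>>> standing in for whatever the small unit type is; imports from
>>>>> com.fasterxml.jackson.core and com.fasterxml.jackson.databind):
>>>>>
>>>>> ObjectMapper mapper = new ObjectMapper();
>>>>> try (JsonParser p = mapper.getFactory().createParser(jsonStr)) {
>>>>>     while (p.nextToken() != null) {
>>>>>         if (p.currentToken() == JsonToken.FIELD_NAME
>>>>>                 && "data".equals(p.currentName())) {
>>>>>             p.nextToken(); // move onto START_ARRAY
>>>>>             while (p.nextToken() == JsonToken.START_OBJECT) {
>>>>>                 DataRecord r = mapper.readValue(p, DataRecord.class);
>>>>>                 // handle r, then move on to the next element
>>>>>             }
>>>>>         }
>>>>>     }
>>>>> }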
>>>>>
>>>>>
>>>>> The usual swath of JSON parsers tend to support this (giving you a
>>>>> stream of java instances created by databinding those small records one by
>>>>> one), or if not, the best move forward is presumably to file a pull request
>>>>> with those projects; the java.util.log experiment shows that trying to
>>>>> ‘core-librarize’ needs that the community at large already fulfills with
>>>>> third party deps isn’t a good move, especially if the core library variant
>>>>> tries to oversimplify to avoid the trap of being too opinionated (which
>>>>> core libs shouldn’t be). In other words, the need for ’stream this JSON for
>>>>> me’ style APIs is even more exotic than Ethan is suggesting.
>>>>>
>>>>> I see a fundamental problem here:
>>>>>
>>>>>
>>>>> - The 95%+ use case for working with JSON for your average java
>>>>> coder is best done with data binding.
>>>>> - core libs doesn’t want to provide it, partly because it’s got a
>>>>> large design space, partly because the field’s already covered by GSON and
>>>>> Jackson-json; java.util.log proves this doesn’t work. At least, I gather
>>>>> that’s what Ethan thinks and I agree with this assessment.
>>>>> - A language that claims to be “batteries included” that doesn’t
>>>>> ship with a JSON parser in this era is dubious, to say the least.
>>>>>
>>>>>
>>>>> I’m not sure how to square this circle. Hence it feels like core-libs
>>>>> needs to hold some more fundamental debates first:
>>>>>
>>>>>
>>>>> - Maybe it’s time to state in a more or less official decree that
>>>>> well-established, large design space jobs will remain the purview of
>>>>> dependencies no matter how popular they get, unless being part of the
>>>>> core-libs adds something more fundamental that third-party deps cannot bring
>>>>> to the table (such as language integration), or the community standardizes
>>>>> on a single library (JSR310’s story, more or less). JSON parsing would
>>>>> qualify as ‘well-established’ (GSON and Jackson) and ‘large design space’
>>>>> as Ethan pointed out.
>>>>> - Given that 99% of java projects, even really simple ones, start
>>>>> with maven/gradle and a list of deps, is that really a problem?
>>>>>
>>>>>
>>>>> I’m honestly not sure what the right answer is. On one hand, the npm
>>>>> ecosystem seems to be doing very well even though their ‘batteries
>>>>> included’ situation is an utter shambles. Then again, the notion that your
>>>>> average nodejs project includes 10x+ more dependencies than other languages
>>>>> is likely a significant part of the security clown fiesta going on over
>>>>> there as far as 3rd party deps are concerned, so by no means should Java
>>>>> just blindly emulate their solutions.
>>>>>
>>>>> I don’t like the idea of shipping a non-data-binding JSON API in the
>>>>> core libs. The root issue with JSON is that you just can’t tell how to
>>>>> interpret any given JSON token, because that’s not how JSON is used in
>>>>> practice. What does 5 mean? Could be that I’m to take that as an int,
>>>>> or as a double, or perhaps even as a j.t.Instant (epoch-millis), and
>>>>> defaulting behaviour (similar to j.u.Map’s .getOrDefault) is *very*
>>>>> convenient to parse most JSON out there in the real world - omitting k/v
>>>>> pairs whose value is still on default is very common). That’s what makes
>>>>> those databind libraries so enticing: Instead of trying to pattern match my
>>>>> way into this behaviour:
>>>>>
>>>>>
>>>>> - If the element isn’t there at all or null, give me a
>>>>> list-of-longs with a single 0 in it.
>>>>> - If the element is a number, make me a list-of-longs with 1 value
>>>>> in it, that is that number, as long.
>>>>> - If the element is a string, parse it into a long, then get me a
>>>>> list with this one long value (because IEEE double rules mean sometimes you
>>>>> have to put these things in string form or they get mangled by javascript-
>>>>> eval style parsers).
>>>>>
>>>>>
>>>>> And yet the above is quite common, and can easily be done by a
>>>>> databinder, which sees you want a List<Long> for a field whose
>>>>> default value is List.of(1L), and, armed with that knowledge, can
>>>>> transit the JSON into Java in that way.
>>>>>
>>>>> You don’t *need* databinding to cater to this idea: You could for
>>>>> example have a jsonNode.asLong(123) method that would parse a string
>>>>> if need be, even. But this has nothing to do with pattern matching either.
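>>>>>
>>>>> A sketch of that convenience (the node representation here is
>>>>> hypothetical, not any particular library's API):
>>>>>
>>>>> static long asLong(Object node, long defaultValue) {
>>>>>     if (node == null) return defaultValue; // key omitted, value on default
>>>>>     if (node instanceof Number n) return n.longValue();
>>>>>     if (node instanceof String s) return Long.parseLong(s); // string-wrapped long
>>>>>     throw new IllegalArgumentException("not a number: " + node);
>>>>> }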
>>>>>
>>>>> --Reinier Zwitserloot
>>>>>
>>>>>
>>>>> On 15 Dec 2022 at 21:30:17, Ethan McCue <ethan at mccue.dev> wrote:
>>>>>
>>>>>> I'm writing this to drive some forward motion and to nerd-snipe those
>>>>>> who know better than I do into putting their thoughts into words.
>>>>>>
>>>>>> There are three ways to process JSON[1]
>>>>>> - Streaming (Push or Pull)
>>>>>> - Traversing a Tree (Realized or Lazy)
>>>>>> - Declarative Databind (N ways)
>>>>>>
>>>>>> Of these, JEP-198 explicitly ruled out providing "JAXB style type
>>>>>> safe data binding."
>>>>>>
>>>>>> No justification is given, but if I had to insert my own: mapping the
>>>>>> Json model to/from the Java/JVM object model is a cursed combo of
>>>>>> - Huge possible design space
>>>>>> - Unpalatably large surface for backwards compatibility
>>>>>> - Serialization! Boo![2]
>>>>>>
>>>>>> So for an artifact like the JDK, it probably doesn't make sense to
>>>>>> include. That tracks.
>>>>>> It won't make everyone happy, people like databind APIs, but it
>>>>>> tracks.
>>>>>>
>>>>>> So for the "read flow" these are the things to figure out.
>>>>>>
>>>>>>                 | Should Provide? | Intended User(s) |
>>>>>> ----------------+-----------------+------------------+
>>>>>> Streaming Push  |                 |                  |
>>>>>> ----------------+-----------------+------------------+
>>>>>> Streaming Pull  |                 |                  |
>>>>>> ----------------+-----------------+------------------+
>>>>>> Realized Tree   |                 |                  |
>>>>>> ----------------+-----------------+------------------+
>>>>>> Lazy Tree       |                 |                  |
>>>>>> ----------------+-----------------+------------------+
>>>>>>
>>>>>> At which point, we should talk about what "meets needs of Java
>>>>>> developers using JSON" implies.
>>>>>>
>>>>>> JSON is ubiquitous. Most kinds of software us schmucks write could
>>>>>> have a reason to interact with it.
>>>>>> The full set of "user personas" therefore isn't practical for me to
>>>>>> talk about.[3]
>>>>>>
>>>>>> JSON documents, however, are not so varied.
>>>>>>
>>>>>> - There are small ones (1-10kb)
>>>>>> - There are medium ones (10-1000kb)
>>>>>> - There are big ones (1000kb-???)
>>>>>>
>>>>>> - There are shallow ones
>>>>>> - There are deep ones
>>>>>>
>>>>>> So that feels like an easier direction to talk about it from.
>>>>>>
>>>>>>
>>>>>> This repo[4] has some convenient toy examples of how some of those
>>>>>> APIs look in libraries
>>>>>> in the ecosystem. Specifically the Streaming Pull and Realized Tree
>>>>>> models.
>>>>>>
>>>>>> User r = new User();
>>>>>> while (true) {
>>>>>>     JsonToken token = reader.peek();
>>>>>>     switch (token) {
>>>>>>         case BEGIN_OBJECT:
>>>>>>             reader.beginObject();
>>>>>>             break;
>>>>>>         case END_OBJECT:
>>>>>>             reader.endObject();
>>>>>>             return r;
>>>>>>         case NAME:
>>>>>>             String fieldname = reader.nextName();
>>>>>>             switch (fieldname) {
>>>>>>                 case "id":
>>>>>>                     r.setId(reader.nextString());
>>>>>>                     break;
>>>>>>                 case "index":
>>>>>>                     r.setIndex(reader.nextInt());
>>>>>>                     break;
>>>>>>                 ...
>>>>>>                 case "friends":
>>>>>>                     r.setFriends(new ArrayList<>());
>>>>>>                     Friend f = null;
>>>>>>                     boolean carryOn = true;
>>>>>>                     while (carryOn) {
>>>>>>                         token = reader.peek();
>>>>>>                         switch (token) {
>>>>>>                             case BEGIN_ARRAY:
>>>>>>                                 reader.beginArray();
>>>>>>                                 break;
>>>>>>                             case END_ARRAY:
>>>>>>                                 reader.endArray();
>>>>>>                                 carryOn = false;
>>>>>>                                 break;
>>>>>>                             case BEGIN_OBJECT:
>>>>>>                                 reader.beginObject();
>>>>>>                                 f = new Friend();
>>>>>>                                 break;
>>>>>>                             case END_OBJECT:
>>>>>>                                 reader.endObject();
>>>>>>                                 r.getFriends().add(f);
>>>>>>                                 break;
>>>>>>                             case NAME:
>>>>>>                                 String fn = reader.nextName();
>>>>>>                                 switch (fn) {
>>>>>>                                     case "id":
>>>>>>                                         f.setId(reader.nextString());
>>>>>>                                         break;
>>>>>>                                     case "name":
>>>>>>                                         f.setName(reader.nextString());
>>>>>>                                         break;
>>>>>>                                 }
>>>>>>                                 break;
>>>>>>                         }
>>>>>>                     }
>>>>>>                     break;
>>>>>>             }
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>> I think it's not hard to argue that the streaming APIs are brutalist.
>>>>>> The above is Gson, but Jackson, Moshi, etc.
>>>>>> seem at least morally equivalent.
>>>>>>
>>>>>> It's hard to write, hard to write *correctly*, and there is a curious
>>>>>> propensity towards pairing it
>>>>>> with anemic, mutable models.
>>>>>>
>>>>>> That being said, it handles big documents and deep documents really
>>>>>> well. It also performs
>>>>>> pretty darn well and is good enough as a "fallback" when the intended
>>>>>> user experience
>>>>>> is through something like databind.
>>>>>>
>>>>>> So what could we do meaningfully better with the language we have
>>>>>> today/will have tomorrow?
>>>>>>
>>>>>> - Sealed interfaces + Pattern matching could give a nicer model for
>>>>>> tokens
>>>>>>
>>>>>> sealed interface JsonToken {
>>>>>> record Field(String name) implements JsonToken {}
>>>>>> record BeginArray() implements JsonToken {}
>>>>>> record EndArray() implements JsonToken {}
>>>>>> record BeginObject() implements JsonToken {}
>>>>>> record EndObject() implements JsonToken {}
>>>>>> // ...
>>>>>> }
>>>>>>
>>>>>> // ...
>>>>>>
>>>>>> User r = new User();
>>>>>> while (true) {
>>>>>> JsonToken token = reader.peek();
>>>>>> switch (token) {
>>>>>> case BeginObject __:
>>>>>> reader.beginObject();
>>>>>> break;
>>>>>> case EndObject __:
>>>>>> reader.endObject();
>>>>>> return r;
>>>>>> case Field("id"):
>>>>>> r.setId(reader.nextString());
>>>>>> break;
>>>>>> case Field("index"):
>>>>>> r.setIndex(reader.nextInt());
>>>>>> break;
>>>>>>
>>>>>> // ...
>>>>>>
>>>>>> case Field("friends"):
>>>>>> r.setFriends(new ArrayList<>());
>>>>>> Friend f = null;
>>>>>> carryOn = true;
>>>>>> while (carryOn) {
>>>>>> token = reader.peek();
>>>>>> switch (token) {
>>>>>> // ...
>>>>>>
>>>>>> - Value classes can make it all more efficient
>>>>>>
>>>>>> sealed interface JsonToken {
>>>>>> value record Field(String name) implements JsonToken {}
>>>>>> value record BeginArray() implements JsonToken {}
>>>>>> value record EndArray() implements JsonToken {}
>>>>>> value record BeginObject() implements JsonToken {}
>>>>>> value record EndObject() implements JsonToken {}
>>>>>> // ...
>>>>>> }
>>>>>>
>>>>>> - (Fun One) We can transform a simpler-to-write push parser into a
>>>>>> pull parser with Coroutines
>>>>>>
>>>>>> This is just a toy we could play with while making something in
>>>>>> the JDK. I'm pretty sure
>>>>>> we could make a parser which feeds into something like
>>>>>>
>>>>>> interface Listener {
>>>>>> void onObjectStart();
>>>>>> void onObjectEnd();
>>>>>> void onArrayStart();
>>>>>> void onArrayEnd();
>>>>>> void onField(String name);
>>>>>> // ...
>>>>>> }
>>>>>>
>>>>>> and invert a loop like
>>>>>>
>>>>>> while (true) {
>>>>>> char c = next();
>>>>>> switch (c) {
>>>>>> case '{':
>>>>>> listener.onObjectStart();
>>>>>> // ...
>>>>>> // ...
>>>>>> }
>>>>>> }
>>>>>>
>>>>>> by putting a Coroutine.yield in the callback.
>>>>>>
>>>>>> That might be a meaningful simplification in code structure, I
>>>>>> don't know enough to say.
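>>>>>>
>>>>>> As a rough sketch of the inversion - a plain thread plus a
>>>>>> SynchronousQueue standing in for the coroutine (illustrative only;
>>>>>> a real version needs end-of-input and error signalling):
>>>>>>
>>>>>> import java.util.concurrent.SynchronousQueue;
>>>>>> import java.util.function.Consumer;
>>>>>>
>>>>>> final class PullAdapter {
>>>>>>     private final SynchronousQueue<JsonToken> queue = new SynchronousQueue<>();
>>>>>>
>>>>>>     PullAdapter(Consumer<Listener> pushParser) {
>>>>>>         var thread = new Thread(() -> pushParser.accept(new Listener() {
>>>>>>             public void onObjectStart() { put(new JsonToken.BeginObject()); }
>>>>>>             public void onObjectEnd()   { put(new JsonToken.EndObject()); }
>>>>>>             public void onArrayStart()  { put(new JsonToken.BeginArray()); }
>>>>>>             public void onArrayEnd()    { put(new JsonToken.EndArray()); }
>>>>>>             public void onField(String name) { put(new JsonToken.Field(name)); }
>>>>>>         }));
>>>>>>         thread.setDaemon(true);
>>>>>>         thread.start();
>>>>>>     }
>>>>>>
>>>>>>     private void put(JsonToken token) {
>>>>>>         try {
>>>>>>             queue.put(token); // blocks until next() takes it - the "yield"
>>>>>>         } catch (InterruptedException e) {
>>>>>>             Thread.currentThread().interrupt();
>>>>>>             throw new RuntimeException(e);
>>>>>>         }
>>>>>>     }
>>>>>>
>>>>>>     JsonToken next() throws InterruptedException {
>>>>>>         return queue.take(); // the pull side
>>>>>>     }
>>>>>> }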
>>>>>>
>>>>>> But, I think there are some hard questions like
>>>>>>
>>>>>> - Is the intent[5] to make the backing parser for ecosystem databind
>>>>>> APIs?
>>>>>> - Is the intent that users who want to handle big/deep documents fall
>>>>>> back to this?
>>>>>> - Are those new language features / conveniences enough to offset the
>>>>>> cost of committing to a new api?
>>>>>> - To whom exactly does a low level api provide value?
>>>>>> - What benefit is standardization in the JDK?
>>>>>>
>>>>>> and just generally - who would be the consumer(s) of this?
>>>>>>
>>>>>> The other kind of API still on the table is a Tree. There are two
>>>>>> ways to handle this
>>>>>>
>>>>>> 1. Load it into `Object`. Use a bunch of instanceof checks/casts to
>>>>>> confirm what it actually is.
>>>>>>
>>>>>> Object v;
>>>>>> User u = new User();
>>>>>>
>>>>>> if ((v = jso.get("id")) != null) {
>>>>>> u.setId((String) v);
>>>>>> }
>>>>>> if ((v = jso.get("index")) != null) {
>>>>>> u.setIndex(((Long) v).intValue());
>>>>>> }
>>>>>> if ((v = jso.get("guid")) != null) {
>>>>>> u.setGuid((String) v);
>>>>>> }
>>>>>> if ((v = jso.get("isActive")) != null) {
>>>>>> u.setIsActive(((Boolean) v));
>>>>>> }
>>>>>> if ((v = jso.get("balance")) != null) {
>>>>>> u.setBalance((String) v);
>>>>>> }
>>>>>> // ...
>>>>>> if ((v = jso.get("latitude")) != null) {
>>>>>> u.setLatitude(v instanceof BigDecimal ? ((BigDecimal)
>>>>>> v).doubleValue() : (Double) v);
>>>>>> }
>>>>>> if ((v = jso.get("longitude")) != null) {
>>>>>> u.setLongitude(v instanceof BigDecimal ? ((BigDecimal)
>>>>>> v).doubleValue() : (Double) v);
>>>>>> }
>>>>>> if ((v = jso.get("greeting")) != null) {
>>>>>> u.setGreeting((String) v);
>>>>>> }
>>>>>> if ((v = jso.get("favoriteFruit")) != null) {
>>>>>> u.setFavoriteFruit((String) v);
>>>>>> }
>>>>>> if ((v = jso.get("tags")) != null) {
>>>>>> List<Object> jsonarr = (List<Object>) v;
>>>>>> u.setTags(new ArrayList<>());
>>>>>> for (Object vi : jsonarr) {
>>>>>> u.getTags().add((String) vi);
>>>>>> }
>>>>>> }
>>>>>> if ((v = jso.get("friends")) != null) {
>>>>>> List<Object> jsonarr = (List<Object>) v;
>>>>>> u.setFriends(new ArrayList<>());
>>>>>> for (Object vi : jsonarr) {
>>>>>> Map<String, Object> jso0 = (Map<String, Object>) vi;
>>>>>> Friend f = new Friend();
>>>>>> f.setId((String) jso0.get("id"));
>>>>>> f.setName((String) jso0.get("name"));
>>>>>> u.getFriends().add(f);
>>>>>> }
>>>>>> }
>>>>>>
>>>>>> 2. Have an explicit model for Json, and helper methods that do said
>>>>>> casts[6]
>>>>>>
>>>>>>
>>>>>> this.setSiteSetting(readFromJson(jsonObject.getJsonObject("site")));
>>>>>> JsonArray groups = jsonObject.getJsonArray("group");
>>>>>> if(groups != null)
>>>>>> {
>>>>>> int len = groups.size();
>>>>>> for(int i=0; i<len; i++)
>>>>>> {
>>>>>> JsonObject grp = groups.getJsonObject(i);
>>>>>> SNMPSetting grpSetting = readFromJson(grp);
>>>>>> String grpName = grp.getString("dbgroup", null);
>>>>>> if(grpName != null && grpSetting != null)
>>>>>> this.groupSettings.put(grpName, grpSetting);
>>>>>> }
>>>>>> }
>>>>>> JsonArray hosts = jsonObject.getJsonArray("host");
>>>>>> if(hosts != null)
>>>>>> {
>>>>>> int len = hosts.size();
>>>>>> for(int i=0; i<len; i++)
>>>>>> {
>>>>>> JsonObject host = hosts.getJsonObject(i);
>>>>>> SNMPSetting hostSetting = readFromJson(host);
>>>>>> String hostName = host.getString("dbhost", null);
>>>>>> if(hostName != null && hostSetting != null)
>>>>>> this.hostSettings.put(hostName, hostSetting);
>>>>>> }
>>>>>> }
>>>>>>
>>>>>> I think what has become easier to represent in the language nowadays
>>>>>> is that explicit model for Json.
>>>>>> It's the 101 lesson of sealed interfaces.[7] It feels nice and clean.
>>>>>>
>>>>>> sealed interface Json {
>>>>>> final class Null implements Json {}
>>>>>> final class True implements Json {}
>>>>>> final class False implements Json {}
>>>>>> final class Array implements Json {}
>>>>>> final class Object implements Json {}
>>>>>> final class String implements Json {}
>>>>>> final class Number implements Json {}
>>>>>> }
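>>>>>>
>>>>>> Fleshed out a touch with records as the carriers (still a sketch;
>>>>>> java.util and java.math imports elided):
>>>>>>
>>>>>> sealed interface Json {
>>>>>>     record Null() implements Json {}
>>>>>>     record True() implements Json {}
>>>>>>     record False() implements Json {}
>>>>>>     record Array(List<Json> values) implements Json {}
>>>>>>     record Object(Map<java.lang.String, Json> fields) implements Json {}
>>>>>>     record String(java.lang.String value) implements Json {}
>>>>>>     record Number(BigDecimal value) implements Json {}
>>>>>> }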
>>>>>>
>>>>>> And the cast-and-check approach is now more viable on account of
>>>>>> pattern matching.
>>>>>>
>>>>>> if (jso.get("id") instanceof String v) {
>>>>>> u.setId(v);
>>>>>> }
>>>>>> if (jso.get("index") instanceof Long v) {
>>>>>> u.setIndex(v.intValue());
>>>>>> }
>>>>>> if (jso.get("guid") instanceof String v) {
>>>>>> u.setGuid(v);
>>>>>> }
>>>>>>
>>>>>> // or
>>>>>>
>>>>>> if (jso.get("id") instanceof String id &&
>>>>>> jso.get("index") instanceof Long index &&
>>>>>> jso.get("guid") instanceof String guid) {
>>>>>> return new User(id, index, guid, ...); // look ma, no
>>>>>> setters!
>>>>>> }
>>>>>>
>>>>>>
>>>>>> And on the horizon, again, are value types.
>>>>>>
>>>>>> But there are problems with this approach beyond the performance
>>>>>> implications of loading into
>>>>>> a tree.
>>>>>>
>>>>>> For one, all the code samples above have different behaviors around
>>>>>> null keys and missing keys
>>>>>> that are not obvious at first glance.
>>>>>>
>>>>>> This won't accept any null or missing fields
>>>>>>
>>>>>> if (jso.get("id") instanceof String id &&
>>>>>> jso.get("index") instanceof Long index &&
>>>>>> jso.get("guid") instanceof String guid) {
>>>>>> return new User(id, index, guid, ...);
>>>>>> }
>>>>>>
>>>>>> This will accept individual null or missing fields, but also will
>>>>>> silently ignore
>>>>>> fields with incorrect types
>>>>>>
>>>>>> if (jso.get("id") instanceof String v) {
>>>>>> u.setId(v);
>>>>>> }
>>>>>> if (jso.get("index") instanceof Long v) {
>>>>>> u.setIndex(v.intValue());
>>>>>> }
>>>>>> if (jso.get("guid") instanceof String v) {
>>>>>> u.setGuid(v);
>>>>>> }
>>>>>>
>>>>>> And, compared to databind where there is information about the
>>>>>> expected structure of the document
>>>>>> and it's the job of the framework to assert that, I posit that the
>>>>>> errors that would be encountered
>>>>>> when writing code against this would be more like
>>>>>>
>>>>>> "something wrong with user"
>>>>>>
>>>>>> than
>>>>>>
>>>>>> "problem at users[5].name, expected string or null. got 5"
>>>>>>
>>>>>> Which feels unideal.
>>>>>>
>>>>>>
>>>>>> One approach I find promising is something close to what Elm does
>>>>>> with its decoders[8]. Not just combining assertion
>>>>>> and binding like what pattern matching with records allows, but
>>>>>> including a scheme for bubbling/nesting errors.
>>>>>>
>>>>>> static String string(Json json) throws JsonDecodingException {
>>>>>> if (!(json instanceof Json.String jsonString)) {
>>>>>> throw JsonDecodingException.of(
>>>>>> "expected a string",
>>>>>> json
>>>>>> );
>>>>>> } else {
>>>>>> return jsonString.value();
>>>>>> }
>>>>>> }
>>>>>>
>>>>>> static <T> T field(Json json, String fieldName, Decoder<? extends T> valueDecoder)
>>>>>>         throws JsonDecodingException {
>>>>>> var jsonObject = object(json);
>>>>>> var value = jsonObject.get(fieldName);
>>>>>> if (value == null) {
>>>>>> throw JsonDecodingException.atField(
>>>>>> fieldName,
>>>>>> JsonDecodingException.of(
>>>>>> "no value for field",
>>>>>> json
>>>>>> )
>>>>>> );
>>>>>> }
>>>>>> else {
>>>>>> try {
>>>>>> return valueDecoder.decode(value);
>>>>>> } catch (JsonDecodingException e) {
>>>>>> throw JsonDecodingException.atField(
>>>>>> fieldName,
>>>>>> e
>>>>>> );
>>>>>> } catch (Exception e) {
>>>>>> throw JsonDecodingException.atField(fieldName,
>>>>>> JsonDecodingException.of(e, value));
>>>>>> }
>>>>>> }
>>>>>> }
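>>>>>>
>>>>>> So errors accumulate a path as decoding unwinds. Decoding
>>>>>> [{"name": 5}] with an array-of-user decoder could then surface as
>>>>>> something like "at [0].name: expected a string" rather than a bare
>>>>>> ClassCastException (the exact rendering is up for debate).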
>>>>>>
>>>>>> Which I think has some benefits over the ways I've seen of working
>>>>>> with trees.
>>>>>>
>>>>>>
>>>>>>
>>>>>> - It is declarative enough that folks who prefer databind might be
>>>>>> happy enough.
>>>>>>
>>>>>> static User fromJson(Json json) {
>>>>>>     return new User(
>>>>>>         Decoder.field(json, "id", Decoder::string),
>>>>>>         Decoder.field(json, "index", Decoder::long_),
>>>>>>         Decoder.field(json, "guid", Decoder::string)
>>>>>>     );
>>>>>> }
>>>>>>
>>>>>> // ...
>>>>>>
>>>>>> List<User> users = Decoder.array(json, User::fromJson);
>>>>>>
>>>>>> - Handling null and optional fields could be less easily conflated
>>>>>>
>>>>>> Decoder.field(json, "id", Decoder::string);
>>>>>>
>>>>>> Decoder.nullableField(json, "id", Decoder::string);
>>>>>>
>>>>>> Decoder.optionalField(json, "id", Decoder::string);
>>>>>>
>>>>>> Decoder.optionalNullableField(json, "id", Decoder::string);
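>>>>>>
>>>>>> (The intended split, at least in my sketch: "optional" means the key
>>>>>> may be absent entirely, "nullable" means the key may be present but
>>>>>> mapped to null, and plain field() requires a present, non-null value.)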
>>>>>>
>>>>>>
>>>>>> - It composes well with user-defined classes
>>>>>>
>>>>>> record Guid(String value) {
>>>>>> Guid {
>>>>>> // some assertions on the structure of value
>>>>>> }
>>>>>> }
>>>>>>
>>>>>> Decoder.string(json, "guid", guid -> new
>>>>>> Guid(Decoder.string(guid)));
>>>>>>
>>>>>> // or even
>>>>>>
>>>>>> record Guid(String value) {
>>>>>> Guid {
>>>>>> // some assertions on the structure of value
>>>>>> }
>>>>>>
>>>>>> static Guid fromJson(Json json) {
>>>>>> return new Guid(Decoder.string(json));
>>>>>> }
>>>>>> }
>>>>>>
>>>>>> Decoder.field(json, "guid", Guid::fromJson);
>>>>>>
>>>>>>
>>>>>> - When something goes wrong, the API can handle the fiddliness of
>>>>>> capturing information for feedback.
>>>>>>
>>>>>> In the code I've sketched out it's just what field/index things
>>>>>> went wrong at. Potentially
>>>>>> capturing metadata like row/col numbers of the source would be
>>>>>> sensible too.
>>>>>>
>>>>>> It's just not reasonable to expect devs to do extra work to get
>>>>>> that, and it's really nice to give it.
>>>>>>
>>>>>> There are also some downsides like
>>>>>>
>>>>>> - I do not know how compatible it would be with lazy trees.
>>>>>>
>>>>>> Lazy trees being the only way that a tree API could handle big
>>>>>> or deep documents.
>>>>>> The general concept as applied in libraries like json-tree[9] is
>>>>>> to navigate without
>>>>>> doing any work, and that clashes with wanting to instanceof
>>>>>> check the info at the
>>>>>> current path.
>>>>>>
>>>>>> - It *almost* gives enough information to be a general schema approach
>>>>>>
>>>>>> If one field fails, the model as sketched throws an exception
>>>>>> immediately. If an API should
>>>>>> return "errors": [...], that is inconvenient to construct.
>>>>>>
>>>>>> - None of the existing popular libraries are doing this
>>>>>>
>>>>>> The only mechanics that are strictly required to give this sort
>>>>>> of API are lambdas. Those have
>>>>>> been out for a decade. Yes sealed interfaces make the data model
>>>>>> prettier but in concept you
>>>>>> can build the same thing on top of anything.
>>>>>>
>>>>>> I could argue that this is because of "cultural momentum" of
>>>>>> databind or some other reason,
>>>>>> but the fact remains that it isn't a proven out approach.
>>>>>>
>>>>>> Writing JSON libraries is a todo list[10]. There are a lot of
>>>>>> bad ideas and this might be one of them.
>>>>>>
>>>>>> - Performance impact of so many instanceof checks
>>>>>>
>>>>>> I've gotten a 4.2% slowdown compared to the "regular" tree code
>>>>>> without the repeated casts.
>>>>>>
>>>>>> But that was with a parser that is 5x slower than Jackson's.
>>>>>> (using the same benchmark project as for the snippets).
>>>>>> I think there could be reason to believe that the JIT does well
>>>>>> enough with repeated instanceof
>>>>>> checks to consider it.
>>>>>>
>>>>>>
>>>>>> My current thinking is that - despite not solving for large or deep
>>>>>> documents - starting with a really "dumb" realized tree API
>>>>>> might be the right place to start for the read side of a potential
>>>>>> incubator module.
>>>>>>
>>>>>> But regardless - this feels like a good time to start more concrete
>>>>>> conversations. I feel I should cap this email since I've reached the point
>>>>>> of decoherence and haven't even mentioned the write side of things.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> [1]: http://www.cowtowncoder.com/blog/archives/2009/01/entry_131.html
>>>>>> [2]: https://security.snyk.io/vuln/maven?search=jackson-databind
>>>>>> [3]: I only know like 8 people
>>>>>> [4]:
>>>>>> https://github.com/fabienrenaud/java-json-benchmark/blob/master/src/main/java/com/github/fabienrenaud/jjb/stream/UsersStreamDeserializer.java
>>>>>> [5]: When I say "intent", I do so knowing full well no one has been
>>>>>> actively thinking of this for an entire Game of Thrones
>>>>>> [6]:
>>>>>> https://github.com/yahoo/mysql_perf_analyzer/blob/master/myperf/src/main/java/com/yahoo/dba/perf/myperf/common/SNMPSettings.java
>>>>>> [7]: https://www.infoq.com/articles/data-oriented-programming-java/
>>>>>> [8]:
>>>>>> https://package.elm-lang.org/packages/elm/json/latest/Json-Decode
>>>>>> [9]: https://github.com/jbee/json-tree
>>>>>> [10]: https://stackoverflow.com/a/14442630/2948173
>>>>>> [11]: In 30 days JEP-198 it will be recognizably PI days old for the
>>>>>> 2nd time in its history.
>>>>>> [12]: To me, the fact that it is still an open JEP is more a social
>>>>>> convenience than anything. I could just as easily be writing this exact same
>>>>>> email about TOML.
>>>>>>
>>>>>