JEP-198 - Let's start talking about JSON
Ethan McCue
ethan at mccue.dev
Tue Feb 28 17:16:08 UTC 2023
As an update to my character arc, I documented and wrote up an explanation
for the prototype library I was working on.[1]
And I've gotten a good deal of feedback on reddit[2] and in private.
I think it's relevant to the conversation here in the sense of
- There are more of rzwitserloot's objections to read on the general
concept of JSON as a built-in.[3]
- There are a lot of well-reasoned objections to the manner in which I am
interpreting a JSON tree, as well
as objections to the usage of a tree as the core. JEP 198's current writeup
(which I know is subject to a rewrite/retraction)
presumes that an immutable tree would be the core data structure.
- The peanut gallery might be interested in a "base" on which to implement
whatever their take on an API should be.
For that last category, I have a method-handle proxy written up for those
who want to try the "push parser into a pull parser"
transformation I alluded to in my first email of this thread.
[1]: https://mccue.dev/pages/2-26-23-json
[2]:
https://www.reddit.com/r/java/comments/11cyoh1/please_try_my_json_library/
[3]: Including one that reddit took down, but can be seen through reveddit
https://www.reveddit.com/y/rzwitserloot/?after=t1_jacpsj6&limit=1&sort=new&show=t1_jaa3x0q&removal_status=all
On Fri, Dec 16, 2022 at 6:23 PM Ethan McCue <ethan at mccue.dev> wrote:
> Sidenote about "Project Galahad" - I know Graal uses JSON for a few things
> including a reflection-config.json. Food for thought.
>
> > the java.util.log experiment shows that trying to ‘core-librarize’ needs
> that the community at large already fulfills with third party deps isn’t a
> good move,
>
> I, personally, do not have much historical context for java.util.log. What
> feels distinct about providing a JSON API is that
> logging is an implicitly global thing. If a JSON API doesn't fill all
> ecosystem niches, multiple can be used alongside
> each other.
>
> > The root issue with JSON is that you just can’t tell how to interpret
> any given JSON token
>
> The point where this could be an issue is numbers. Once something is
> identified as a number we can
>
> 1. Parse it immediately, using a long and falling back to a BigInteger.
> For decimals it's harder to know
> whether to use a double or BigDecimal internally. In the library I've been
> copy-pasting from to build
> a prototype, that last one is an explicit option and it defaults to doubles
> for the whole parse.
> 2. Store the string and parse it upon request. We can still model it as a
> Json.Number, but the
> work of interpreting is deferred. (A sketch of both options follows below.)
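>
> A minimal sketch of both options (the names here are illustrative, not a
> real API):
>
> import java.math.BigDecimal;
> import java.math.BigInteger;
>
> record LazyNumber(String literal) {
>     // Option 1's integral flavor: parse eagerly as a long,
>     // falling back to BigInteger when the literal overflows.
>     static Object parseIntegral(String literal) {
>         try {
>             return Long.parseLong(literal);
>         } catch (NumberFormatException e) {
>             return new BigInteger(literal);
>         }
>     }
>
>     // Option 2: keep the literal and defer interpretation;
>     // callers pick the numeric view they want.
>     long asLong() {
>         return Long.parseLong(literal);
>     }
>
>     double asDouble() {
>         return Double.parseDouble(literal);
>     }
>
>     BigDecimal asBigDecimal() {
>         return new BigDecimal(literal);
>     }
> }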
>
> But in general, making a tree of JSON values doesn't particularly affect
> our ability to interpret it
> in a certain way. That interpretation is just positional. That's just as
> true when making assertions
> in the form of class structure and field types as it is when making
> assertions in the form of code.[1]
>
> record Thing(Instant a) {}
>
> // vs.
>
> Decoder.field(json, "a", a -> Instant.ofEpochSecond(Decoder.long_(a)))
>
> If anything, using a named type as a lookup key for a deserialization
> function is the less obvious
> way to do this.
>
> > I’m not sure how to square this circle
> > I don’t like the idea of shipping a non-data-binding JSON API in the
> core libs.
>
> I think the way to cube this rhombus is to find ways to like the idea of a
> non-data-binding JSON API. ¯\_(ツ)_/¯
>
> My personal journey with that is reaching its terminus here I think.
>
> Look on the bright side though - there are legit upsides to explicit tree
> plucking!
>
> Yeah, the friction per field is slightly higher, but the relative
> friction of custom types, or multiple construction methods for a
> particular type, or maintaining compatibility with
> legacy representations, or even just handling a top-level list of things -
> it's much lower.
>
> And all that complexity - that an Instant is made by looking for a long or
> that it is parsed from a string in a
> particular format - it lives in Java code you can see, touch, feel and
> taste.
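>
> A sketch of that flexible construction, reusing the Decoder helpers from
> my prototype sketches (assuming oneOf hands back a Decoder<T> you can
> call directly) - a java.time.Instant accepted as either an epoch second
> or an ISO-8601 string:
>
> static Instant instant(Json json) {
>     return Decoder.oneOf(
>         x -> Instant.ofEpochSecond(Decoder.long_(x)),
>         x -> Instant.parse(Decoder.string(x))
>     ).decode(json);
> }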
>
> I know "nobody does this"[2] but it's not that bad, actually.
>
> [1]: I do apologize for the code sketches consistently being "what I think
> an interaction with a tree api should look like."
> That is what I have been thinking about for a while so it's hard to resist.
> [2]: https://youtu.be/dOgfWXw9VrI?t=1225
>
> On Thu, Dec 15, 2022 at 6:34 PM Ethan McCue <ethan at mccue.dev> wrote:
>
>> > are pure JSON parsers really the go-to for most people?
>>
>> Depends on what you mean by JSON parsers and it depends on what you mean
>> by people.
>>
>> To the best of my knowledge, both Python and JavaScript do not include
>> streaming, databinding, or path navigation capabilities in their JSON
>> parsers.
>>
>>
>> On Thu, Dec 15, 2022 at 6:26 PM Ethan McCue <ethan at mccue.dev> wrote:
>>
>>> > The 95%+ use case for working with JSON for your average java coder is
>>> best done with data binding.
>>>
>>> To be brave yet controversial: I'm not sure this is necessarily true.
>>>
>>> I will elaborate and respond to the other points after a hot cocoa, but
>>> the last point is part of why I think that tree-crawling needs _something_
>>> better as an API to fit the bill.
>>>
>>> With my sketch that set of requirements would be represented as
>>>
>>> record Thing(
>>>     List<Long> xs
>>> ) {
>>>     static Thing fromJson(Json json) {
>>>         var defaultList = List.of(0L);
>>>         return new Thing(Decoder.optionalNullableField(
>>>             json,
>>>             "xs",
>>>             Decoder.oneOf(
>>>                 Decoder.array(Decoder.oneOf(
>>>                     x -> Long.parseLong(Decoder.string(x)),
>>>                     Decoder::long_
>>>                 )),
>>>                 Decoder.null_(defaultList),
>>>                 x -> List.of(Decoder.long_(x))
>>>             ),
>>>             defaultList
>>>         ));
>>>     }
>>> }
>>>
>>> Which isn't amazing at first glance, but also
>>>
>>> {}
>>> {"xs": null}
>>> {"xs": 5}
>>> {"xs": [5]}
>>> {"xs": ["5"]}
>>> {"xs": [1, "2", "3"]}
>>>
>>> these are some wildly varied structures. You could make a solid argument
>>> that something which silently treats these all the same is
>>> a bad API for all the reasons you would consider it a good one.
>>>
>>> On Thu, Dec 15, 2022 at 6:18 PM Johannes Lichtenberger <
>>> lichtenberger.johannes at gmail.com> wrote:
>>>
>>>> I'll have to read the whole thing, but are pure JSON parsers really the
>>>> go-to for most people? I'm a big advocate of also providing something
>>>> similar to XPath/XQuery, and that's IMHO JSONiq (90% XQuery). I might be
>>>> biased, of course, as I'm working on Brackit[1] in my spare time (which is
>>>> also a query compiler and intended to be used with proven optimizations by
>>>> document stores / JSON stores), but it can also be used as an in-memory query
>>>> engine.
>>>>
>>>> kind regards
>>>> Johannes
>>>>
>>>> [1] https://github.com/sirixdb/brackit
>>>>
>>>> On Thu, Dec 15, 2022 at 11:03 PM Reinier Zwitserloot <
>>>> reinier at zwitserloot.com> wrote:
>>>>
>>>>> A recent Advent-of-Code puzzle also made me double check the support
>>>>> of JSON in the java core libs and it is indeed a curious situation that the
>>>>> java core libs don’t cater to it particularly well.
>>>>>
>>>>> However, I’m not seeing an easy way forward to try to close this hole
>>>>> in the core library offerings.
>>>>>
>>>>> If you need to stream huge swaths of JSON, generally there’s a clear
>>>>> unit size that you can just databind. Something like:
>>>>>
>>>>> String jsonStr = """
>>>>>     {
>>>>>       "version": 5,
>>>>>       "data": [
>>>>>         -- 1 million relatively small records in this list --
>>>>>       ]
>>>>>     }
>>>>>     """;
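>>>>>
>>>>> With Jackson, say, that looks roughly like this sketch (DataRecord
>>>>> standing in for whatever the small unit type is; imports from
>>>>> com.fasterxml.jackson.core and com.fasterxml.jackson.databind):
>>>>>
>>>>> ObjectMapper mapper = new ObjectMapper();
>>>>> try (JsonParser p = mapper.getFactory().createParser(jsonStr)) {
>>>>>     while (p.nextToken() != null) {
>>>>>         if (p.currentToken() == JsonToken.FIELD_NAME
>>>>>                 && "data".equals(p.currentName())) {
>>>>>             p.nextToken(); // move onto START_ARRAY
>>>>>             while (p.nextToken() == JsonToken.START_OBJECT) {
>>>>>                 DataRecord r = mapper.readValue(p, DataRecord.class);
>>>>>                 // handle r, then move on to the next element
>>>>>             }
>>>>>         }
>>>>>     }
>>>>> }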
>>>>>
>>>>>
>>>>> The usual swath of JSON parsers tend to support this (giving you a
>>>>> stream of java instances created by databinding those small records one by
>>>>> one), or if not, the best move forward is presumably to file a pull request
>>>>> with those projects; the java.util.log experiment shows that trying to
>>>>> ‘core-librarize’ needs that the community at large already fulfills with
>>>>> third party deps isn’t a good move, especially if the core library variant
>>>>> tries to oversimplify to avoid the trap of being too opinionated (which
>>>>> core libs shouldn’t be). In other words, the need for ’stream this JSON for
>>>>> me’ style APIs is even more exotic than Ethan is suggesting.
>>>>>
>>>>> I see a fundamental problem here:
>>>>>
>>>>>
>>>>> - The 95%+ use case for working with JSON for your average java
>>>>> coder is best done with data binding.
>>>>> - core libs doesn’t want to provide it, partly because it’s got a
>>>>> large design space, partly because the field’s already covered by GSON and
>>>>> Jackson-json; java.util.log proves this doesn’t work. At least, I gather
>>>>> that’s what Ethan thinks and I agree with this assessment.
>>>>> - A language that claims to be “batteries included” that doesn’t
>>>>> ship with a JSON parser in this era is dubious, to say the least.
>>>>>
>>>>>
>>>>> I’m not sure how to square this circle. Hence it feels like core-libs
>>>>> needs to hold some more fundamental debates first:
>>>>>
>>>>>
>>>>> - Maybe it’s time to state in a more or less official decree that
>>>>> well-established, large design space jobs will remain the purview of
>>>>> dependencies no matter how popular they get, unless being part of the
>>>>> core-libs adds something more fundamental that third-party deps cannot bring
>>>>> to the table (such as language integration), or the community standardizes
>>>>> on a single library (JSR310’s story, more or less). JSON parsing would
>>>>> qualify as ‘well-established’ (GSON and Jackson) and ‘large design space’
>>>>> as Ethan pointed out.
>>>>> - Given that 99% of java projects, even really simple ones, start
>>>>> with maven/gradle and a list of deps, is that really a problem?
>>>>>
>>>>>
>>>>> I’m honestly not sure what the right answer is. On one hand, the npm
>>>>> ecosystem seems to be doing very well even though their ‘batteries
>>>>> included’ situation is an utter shambles. Then again, the notion that your
>>>>> average nodejs project includes 10x+ more dependencies than other languages
>>>>> is likely a significant part of the security clown fiesta going on over
>>>>> there as far as 3rd party deps are concerned, so by no means should Java
>>>>> just blindly emulate their solutions.
>>>>>
>>>>> I don’t like the idea of shipping a non-data-binding JSON API in the
>>>>> core libs. The root issue with JSON is that you just can’t tell how to
>>>>> interpret any given JSON token, because that’s not how JSON is used in
>>>>> practice. What does 5 mean? Could be that I’m to take that as an int,
>>>>> or as a double, or perhaps even as a j.t.Instant (epoch-millis), and
>>>>> defaulting behaviour (similar to j.u.Map’s .getOrDefault) is *very*
>>>>> convenient to parse most JSON out there in the real world - omitting k/v
>>>>> pairs whose value is still on default is very common). That’s what makes
>>>>> those databind libraries so enticing: Instead of trying to pattern match my
>>>>> way into this behaviour:
>>>>>
>>>>>
>>>>> - If the element isn’t there at all or null, give me a
>>>>> list-of-longs with a single 0 in it.
>>>>> - If the element is a number, make me a list-of-longs with 1 value
>>>>> in it, that is that number, as long.
>>>>> - If the element is a string, parse it into a long, then get me a
>>>>> list with this one long value (because IEEE double rules mean sometimes you
>>>>> have to put these things in string form or they get mangled by javascript-
>>>>> eval style parsers).
>>>>>
>>>>>
>>>>> And yet the above is quite common, and can easily be done by a
>>>>> databinder, which sees you want a List<Long> for a field whose
>>>>> default value is List.of(1L), and, armed with that knowledge, can
>>>>> transit the JSON into Java in that way.
>>>>>
>>>>> You don’t *need* databinding to cater to this idea: You could for
>>>>> example have a jsonNode.asLong(123) method that would parse a string
>>>>> if need be, even. But this has nothing to do with pattern matching either.
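>>>>>
>>>>> A sketch of that convenience (the node representation here is
>>>>> hypothetical, not any particular library's API):
>>>>>
>>>>> static long asLong(Object node, long defaultValue) {
>>>>>     if (node == null) return defaultValue; // key omitted, value on default
>>>>>     if (node instanceof Number n) return n.longValue();
>>>>>     if (node instanceof String s) return Long.parseLong(s); // string-wrapped long
>>>>>     throw new IllegalArgumentException("not a number: " + node);
>>>>> }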
>>>>>
>>>>> --Reinier Zwitserloot
>>>>>
>>>>>
>>>>> On 15 Dec 2022 at 21:30:17, Ethan McCue <ethan at mccue.dev> wrote:
>>>>>
>>>>>> I'm writing this to drive some forward motion and to nerd-snipe those
>>>>>> who know better than I do into putting their thoughts into words.
>>>>>>
>>>>>> There are three ways to process JSON[1]
>>>>>> - Streaming (Push or Pull)
>>>>>> - Traversing a Tree (Realized or Lazy)
>>>>>> - Declarative Databind (N ways)
>>>>>>
>>>>>> Of these, JEP-198 explicitly ruled out providing "JAXB style type
>>>>>> safe data binding."
>>>>>>
>>>>>> No justification is given, but if I had to insert my own: mapping the
>>>>>> Json model to/from the Java/JVM object model is a cursed combo of
>>>>>> - Huge possible design space
>>>>>> - Unpalatably large surface for backwards compatibility
>>>>>> - Serialization! Boo![2]
>>>>>>
>>>>>> So for an artifact like the JDK, it probably doesn't make sense to
>>>>>> include. That tracks.
>>>>>> It won't make everyone happy, people like databind APIs, but it
>>>>>> tracks.
>>>>>>
>>>>>> So for the "read flow" these are the things to figure out.
>>>>>>
>>>>>>                 | Should Provide? | Intended User(s) |
>>>>>> ----------------+-----------------+------------------+
>>>>>> Streaming Push  |                 |                  |
>>>>>> ----------------+-----------------+------------------+
>>>>>> Streaming Pull  |                 |                  |
>>>>>> ----------------+-----------------+------------------+
>>>>>> Realized Tree   |                 |                  |
>>>>>> ----------------+-----------------+------------------+
>>>>>> Lazy Tree       |                 |                  |
>>>>>> ----------------+-----------------+------------------+
>>>>>>
>>>>>> At which point, we should talk about what "meets needs of Java
>>>>>> developers using JSON" implies.
>>>>>>
>>>>>> JSON is ubiquitous. Most kinds of software us schmucks write could
>>>>>> have a reason to interact with it.
>>>>>> The full set of "user personas" therefore isn't practical for me to
>>>>>> talk about.[3]
>>>>>>
>>>>>> JSON documents, however, are not so varied.
>>>>>>
>>>>>> - There are small ones (1-10kb)
>>>>>> - There are medium ones (10-1000kb)
>>>>>> - There are big ones (1000kb-???)
>>>>>>
>>>>>> - There are shallow ones
>>>>>> - There are deep ones
>>>>>>
>>>>>> So that feels like an easier direction to talk about it from.
>>>>>>
>>>>>>
>>>>>> This repo[4] has some convenient toy examples of how some of those
>>>>>> APIs look in libraries
>>>>>> in the ecosystem. Specifically the Streaming Pull and Realized Tree
>>>>>> models.
>>>>>>
>>>>>> User r = new User();
>>>>>> while (true) {
>>>>>>     JsonToken token = reader.peek();
>>>>>>     switch (token) {
>>>>>>         case BEGIN_OBJECT:
>>>>>>             reader.beginObject();
>>>>>>             break;
>>>>>>         case END_OBJECT:
>>>>>>             reader.endObject();
>>>>>>             return r;
>>>>>>         case NAME:
>>>>>>             String fieldname = reader.nextName();
>>>>>>             switch (fieldname) {
>>>>>>                 case "id":
>>>>>>                     r.setId(reader.nextString());
>>>>>>                     break;
>>>>>>                 case "index":
>>>>>>                     r.setIndex(reader.nextInt());
>>>>>>                     break;
>>>>>>                 ...
>>>>>>                 case "friends":
>>>>>>                     r.setFriends(new ArrayList<>());
>>>>>>                     Friend f = null;
>>>>>>                     boolean carryOn = true;
>>>>>>                     while (carryOn) {
>>>>>>                         token = reader.peek();
>>>>>>                         switch (token) {
>>>>>>                             case BEGIN_ARRAY:
>>>>>>                                 reader.beginArray();
>>>>>>                                 break;
>>>>>>                             case END_ARRAY:
>>>>>>                                 reader.endArray();
>>>>>>                                 carryOn = false;
>>>>>>                                 break;
>>>>>>                             case BEGIN_OBJECT:
>>>>>>                                 reader.beginObject();
>>>>>>                                 f = new Friend();
>>>>>>                                 break;
>>>>>>                             case END_OBJECT:
>>>>>>                                 reader.endObject();
>>>>>>                                 r.getFriends().add(f);
>>>>>>                                 break;
>>>>>>                             case NAME:
>>>>>>                                 String fn = reader.nextName();
>>>>>>                                 switch (fn) {
>>>>>>                                     case "id":
>>>>>>                                         f.setId(reader.nextString());
>>>>>>                                         break;
>>>>>>                                     case "name":
>>>>>>                                         f.setName(reader.nextString());
>>>>>>                                         break;
>>>>>>                                 }
>>>>>>                                 break;
>>>>>>                         }
>>>>>>                     }
>>>>>>                     break;
>>>>>>             }
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>> I think it's not hard to argue that the streaming APIs are brutalist.
>>>>>> The above is Gson, but Jackson, Moshi, etc.
>>>>>> seem at least morally equivalent.
>>>>>>
>>>>>> It's hard to write, hard to write *correctly*, and there is a curious
>>>>>> propensity towards pairing it
>>>>>> with anemic, mutable models.
>>>>>>
>>>>>> That being said, it handles big documents and deep documents really
>>>>>> well. It also performs
>>>>>> pretty darn well and is good enough as a "fallback" when the intended
>>>>>> user experience
>>>>>> is through something like databind.
>>>>>>
>>>>>> So what could we do meaningfully better with the language we have
>>>>>> today/will have tomorrow?
>>>>>>
>>>>>> - Sealed interfaces + Pattern matching could give a nicer model for
>>>>>> tokens
>>>>>>
>>>>>> sealed interface JsonToken {
>>>>>> record Field(String name) implements JsonToken {}
>>>>>> record BeginArray() implements JsonToken {}
>>>>>> record EndArray() implements JsonToken {}
>>>>>> record BeginObject() implements JsonToken {}
>>>>>> record EndObject() implements JsonToken {}
>>>>>> // ...
>>>>>> }
>>>>>>
>>>>>> // ...
>>>>>>
>>>>>> User r = new User();
>>>>>> while (true) {
>>>>>> JsonToken token = reader.peek();
>>>>>> switch (token) {
>>>>>> case BeginObject __:
>>>>>> reader.beginObject();
>>>>>> break;
>>>>>> case EndObject __:
>>>>>> reader.endObject();
>>>>>> return r;
>>>>>> case Field("id"):
>>>>>> r.setId(reader.nextString());
>>>>>> break;
>>>>>> case Field("index"):
>>>>>> r.setIndex(reader.nextInt());
>>>>>> break;
>>>>>>
>>>>>> // ...
>>>>>>
>>>>>> case Field("friends"):
>>>>>> r.setFriends(new ArrayList<>());
>>>>>> Friend f = null;
>>>>>> carryOn = true;
>>>>>> while (carryOn) {
>>>>>> token = reader.peek();
>>>>>> switch (token) {
>>>>>> // ...
>>>>>>
>>>>>> - Value classes can make it all more efficient
>>>>>>
>>>>>> sealed interface JsonToken {
>>>>>> value record Field(String name) implements JsonToken {}
>>>>>> value record BeginArray() implements JsonToken {}
>>>>>> value record EndArray() implements JsonToken {}
>>>>>> value record BeginObject() implements JsonToken {}
>>>>>> value record EndObject() implements JsonToken {}
>>>>>> // ...
>>>>>> }
>>>>>>
>>>>>> - (Fun One) We can transform a simpler-to-write push parser into a
>>>>>> pull parser with Coroutines
>>>>>>
>>>>>> This is just a toy we could play with while making something in
>>>>>> the JDK. I'm pretty sure
>>>>>> we could make a parser which feeds into something like
>>>>>>
>>>>>> interface Listener {
>>>>>> void onObjectStart();
>>>>>> void onObjectEnd();
>>>>>> void onArrayStart();
>>>>>> void onArrayEnd();
>>>>>> void onField(String name);
>>>>>> // ...
>>>>>> }
>>>>>>
>>>>>> and invert a loop like
>>>>>>
>>>>>> while (true) {
>>>>>> char c = next();
>>>>>> switch (c) {
>>>>>> case '{':
>>>>>> listener.onObjectStart();
>>>>>> // ...
>>>>>> // ...
>>>>>> }
>>>>>> }
>>>>>>
>>>>>> by putting a Coroutine.yield in the callback.
>>>>>>
>>>>>> That might be a meaningful simplification in code structure, I
>>>>>> don't know enough to say.
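>>>>>>
>>>>>> As a rough sketch of the inversion - a plain thread plus a
>>>>>> SynchronousQueue standing in for the coroutine (illustrative only;
>>>>>> a real version needs end-of-input and error signalling):
>>>>>>
>>>>>> import java.util.concurrent.SynchronousQueue;
>>>>>> import java.util.function.Consumer;
>>>>>>
>>>>>> final class PullAdapter {
>>>>>>     private final SynchronousQueue<JsonToken> queue = new SynchronousQueue<>();
>>>>>>
>>>>>>     PullAdapter(Consumer<Listener> pushParser) {
>>>>>>         var thread = new Thread(() -> pushParser.accept(new Listener() {
>>>>>>             public void onObjectStart() { put(new JsonToken.BeginObject()); }
>>>>>>             public void onObjectEnd()   { put(new JsonToken.EndObject()); }
>>>>>>             public void onArrayStart()  { put(new JsonToken.BeginArray()); }
>>>>>>             public void onArrayEnd()    { put(new JsonToken.EndArray()); }
>>>>>>             public void onField(String name) { put(new JsonToken.Field(name)); }
>>>>>>         }));
>>>>>>         thread.setDaemon(true);
>>>>>>         thread.start();
>>>>>>     }
>>>>>>
>>>>>>     private void put(JsonToken token) {
>>>>>>         try {
>>>>>>             queue.put(token); // blocks until next() takes it - the "yield"
>>>>>>         } catch (InterruptedException e) {
>>>>>>             Thread.currentThread().interrupt();
>>>>>>             throw new RuntimeException(e);
>>>>>>         }
>>>>>>     }
>>>>>>
>>>>>>     JsonToken next() throws InterruptedException {
>>>>>>         return queue.take(); // the pull side
>>>>>>     }
>>>>>> }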
>>>>>>
>>>>>> But, I think there are some hard questions like
>>>>>>
>>>>>> - Is the intent[5] to make the backing parser for ecosystem databind
>>>>>> APIs?
>>>>>> - Is the intent that users who want to handle big/deep documents fall
>>>>>> back to this?
>>>>>> - Are those new language features / conveniences enough to offset the
>>>>>> cost of committing to a new api?
>>>>>> - To whom exactly does a low level api provide value?
>>>>>> - What benefit is standardization in the JDK?
>>>>>>
>>>>>> and just generally - who would be the consumer(s) of this?
>>>>>>
>>>>>> The other kind of API still on the table is a Tree. There are two
>>>>>> ways to handle this
>>>>>>
>>>>>> 1. Load it into `Object`. Use a bunch of instanceof checks/casts to
>>>>>> confirm what it actually is.
>>>>>>
>>>>>> Object v;
>>>>>> User u = new User();
>>>>>>
>>>>>> if ((v = jso.get("id")) != null) {
>>>>>> u.setId((String) v);
>>>>>> }
>>>>>> if ((v = jso.get("index")) != null) {
>>>>>> u.setIndex(((Long) v).intValue());
>>>>>> }
>>>>>> if ((v = jso.get("guid")) != null) {
>>>>>> u.setGuid((String) v);
>>>>>> }
>>>>>> if ((v = jso.get("isActive")) != null) {
>>>>>> u.setIsActive(((Boolean) v));
>>>>>> }
>>>>>> if ((v = jso.get("balance")) != null) {
>>>>>> u.setBalance((String) v);
>>>>>> }
>>>>>> // ...
>>>>>> if ((v = jso.get("latitude")) != null) {
>>>>>> u.setLatitude(v instanceof BigDecimal ? ((BigDecimal)
>>>>>> v).doubleValue() : (Double) v);
>>>>>> }
>>>>>> if ((v = jso.get("longitude")) != null) {
>>>>>> u.setLongitude(v instanceof BigDecimal ? ((BigDecimal)
>>>>>> v).doubleValue() : (Double) v);
>>>>>> }
>>>>>> if ((v = jso.get("greeting")) != null) {
>>>>>> u.setGreeting((String) v);
>>>>>> }
>>>>>> if ((v = jso.get("favoriteFruit")) != null) {
>>>>>> u.setFavoriteFruit((String) v);
>>>>>> }
>>>>>> if ((v = jso.get("tags")) != null) {
>>>>>> List<Object> jsonarr = (List<Object>) v;
>>>>>> u.setTags(new ArrayList<>());
>>>>>> for (Object vi : jsonarr) {
>>>>>> u.getTags().add((String) vi);
>>>>>> }
>>>>>> }
>>>>>> if ((v = jso.get("friends")) != null) {
>>>>>> List<Object> jsonarr = (List<Object>) v;
>>>>>> u.setFriends(new ArrayList<>());
>>>>>> for (Object vi : jsonarr) {
>>>>>> Map<String, Object> jso0 = (Map<String, Object>) vi;
>>>>>> Friend f = new Friend();
>>>>>> f.setId((String) jso0.get("id"));
>>>>>> f.setName((String) jso0.get("name"));
>>>>>> u.getFriends().add(f);
>>>>>> }
>>>>>> }
>>>>>>
>>>>>> 2. Have an explicit model for Json, and helper methods that do said
>>>>>> casts[6]
>>>>>>
>>>>>>
>>>>>> this.setSiteSetting(readFromJson(jsonObject.getJsonObject("site")));
>>>>>> JsonArray groups = jsonObject.getJsonArray("group");
>>>>>> if(groups != null)
>>>>>> {
>>>>>> int len = groups.size();
>>>>>> for(int i=0; i<len; i++)
>>>>>> {
>>>>>> JsonObject grp = groups.getJsonObject(i);
>>>>>> SNMPSetting grpSetting = readFromJson(grp);
>>>>>> String grpName = grp.getString("dbgroup", null);
>>>>>> if(grpName != null && grpSetting != null)
>>>>>> this.groupSettings.put(grpName, grpSetting);
>>>>>> }
>>>>>> }
>>>>>> JsonArray hosts = jsonObject.getJsonArray("host");
>>>>>> if(hosts != null)
>>>>>> {
>>>>>> int len = hosts.size();
>>>>>> for(int i=0; i<len; i++)
>>>>>> {
>>>>>> JsonObject host = hosts.getJsonObject(i);
>>>>>> SNMPSetting hostSetting = readFromJson(host);
>>>>>> String hostName = host.getString("dbhost", null);
>>>>>> if(hostName != null && hostSetting != null)
>>>>>> this.hostSettings.put(hostName, hostSetting);
>>>>>> }
>>>>>> }
>>>>>>
>>>>>> I think what has become easier to represent in the language nowadays
>>>>>> is that explicit model for Json.
>>>>>> It's the 101 lesson of sealed interfaces.[7] It feels nice and clean.
>>>>>>
>>>>>> sealed interface Json {
>>>>>> final class Null implements Json {}
>>>>>> final class True implements Json {}
>>>>>> final class False implements Json {}
>>>>>> final class Array implements Json {}
>>>>>> final class Object implements Json {}
>>>>>> final class String implements Json {}
>>>>>> final class Number implements Json {}
>>>>>> }
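>>>>>>
>>>>>> Fleshed out a touch with records as the carriers (still a sketch;
>>>>>> java.util and java.math imports elided):
>>>>>>
>>>>>> sealed interface Json {
>>>>>>     record Null() implements Json {}
>>>>>>     record True() implements Json {}
>>>>>>     record False() implements Json {}
>>>>>>     record Array(List<Json> values) implements Json {}
>>>>>>     record Object(Map<java.lang.String, Json> fields) implements Json {}
>>>>>>     record String(java.lang.String value) implements Json {}
>>>>>>     record Number(BigDecimal value) implements Json {}
>>>>>> }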
>>>>>>
>>>>>> And the cast-and-check approach is now more viable on account of
>>>>>> pattern matching.
>>>>>>
>>>>>> if (jso.get("id") instanceof String v) {
>>>>>> u.setId(v);
>>>>>> }
>>>>>> if (jso.get("index") instanceof Long v) {
>>>>>> u.setIndex(v.intValue());
>>>>>> }
>>>>>> if (jso.get("guid") instanceof String v) {
>>>>>> u.setGuid(v);
>>>>>> }
>>>>>>
>>>>>> // or
>>>>>>
>>>>>> if (jso.get("id") instanceof String id &&
>>>>>> jso.get("index") instanceof Long index &&
>>>>>> jso.get("guid") instanceof String guid) {
>>>>>> return new User(id, index, guid, ...); // look ma, no
>>>>>> setters!
>>>>>> }
>>>>>>
>>>>>>
>>>>>> And on the horizon, again, are value types.
>>>>>>
>>>>>> But there are problems with this approach beyond the performance
>>>>>> implications of loading into
>>>>>> a tree.
>>>>>>
>>>>>> For one, all the code samples above have different behaviors around
>>>>>> null keys and missing keys
>>>>>> that are not obvious at first glance.
>>>>>>
>>>>>> This won't accept any null or missing fields
>>>>>>
>>>>>> if (jso.get("id") instanceof String id &&
>>>>>> jso.get("index") instanceof Long index &&
>>>>>> jso.get("guid") instanceof String guid) {
>>>>>> return new User(id, index, guid, ...);
>>>>>> }
>>>>>>
>>>>>> This will accept individual null or missing fields, but also will
>>>>>> silently ignore
>>>>>> fields with incorrect types
>>>>>>
>>>>>> if (jso.get("id") instanceof String v) {
>>>>>> u.setId(v);
>>>>>> }
>>>>>> if (jso.get("index") instanceof Long v) {
>>>>>> u.setIndex(v.intValue());
>>>>>> }
>>>>>> if (jso.get("guid") instanceof String v) {
>>>>>> u.setGuid(v);
>>>>>> }
>>>>>>
>>>>>> And, compared to databind where there is information about the
>>>>>> expected structure of the document
>>>>>> and it's the job of the framework to assert that, I posit that the
>>>>>> errors that would be encountered
>>>>>> when writing code against this would be more like
>>>>>>
>>>>>> "something wrong with user"
>>>>>>
>>>>>> than
>>>>>>
>>>>>> "problem at users[5].name, expected string or null. got 5"
>>>>>>
>>>>>> Which feels unideal.
>>>>>>
>>>>>>
>>>>>> One approach I find promising is something close to what Elm does
>>>>>> with its decoders[8]. Not just combining assertion
>>>>>> and binding like what pattern matching with records allows, but
>>>>>> including a scheme for bubbling/nesting errors.
>>>>>>
>>>>>> static String string(Json json) throws JsonDecodingException {
>>>>>> if (!(json instanceof Json.String jsonString)) {
>>>>>> throw JsonDecodingException.of(
>>>>>> "expected a string",
>>>>>> json
>>>>>> );
>>>>>> } else {
>>>>>> return jsonString.value();
>>>>>> }
>>>>>> }
>>>>>>
>>>>>> static <T> T field(Json json, String fieldName, Decoder<? extends T> valueDecoder)
>>>>>>         throws JsonDecodingException {
>>>>>> var jsonObject = object(json);
>>>>>> var value = jsonObject.get(fieldName);
>>>>>> if (value == null) {
>>>>>> throw JsonDecodingException.atField(
>>>>>> fieldName,
>>>>>> JsonDecodingException.of(
>>>>>> "no value for field",
>>>>>> json
>>>>>> )
>>>>>> );
>>>>>> }
>>>>>> else {
>>>>>> try {
>>>>>> return valueDecoder.decode(value);
>>>>>> } catch (JsonDecodingException e) {
>>>>>> throw JsonDecodingException.atField(
>>>>>> fieldName,
>>>>>> e
>>>>>> );
>>>>>> } catch (Exception e) {
>>>>>> throw JsonDecodingException.atField(fieldName,
>>>>>> JsonDecodingException.of(e, value));
>>>>>> }
>>>>>> }
>>>>>> }
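>>>>>>
>>>>>> So errors accumulate a path as decoding unwinds. Decoding
>>>>>> [{"name": 5}] with an array-of-user decoder could then surface as
>>>>>> something like "at [0].name: expected a string" rather than a bare
>>>>>> ClassCastException (the exact rendering is up for debate).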
>>>>>>
>>>>>> Which I think has some benefits over the ways I've seen of working
>>>>>> with trees.
>>>>>>
>>>>>>
>>>>>>
>>>>>> - It is declarative enough that folks who prefer databind might be
>>>>>> happy enough.
>>>>>>
>>>>>> static User fromJson(Json json) {
>>>>>>     return new User(
>>>>>>         Decoder.field(json, "id", Decoder::string),
>>>>>>         Decoder.field(json, "index", Decoder::long_),
>>>>>>         Decoder.field(json, "guid", Decoder::string)
>>>>>>     );
>>>>>> }
>>>>>>
>>>>>> // ...
>>>>>>
>>>>>> List<User> users = Decoder.array(json, User::fromJson);
>>>>>>
>>>>>> - Handling null and optional fields could be less easily conflated
>>>>>>
>>>>>> Decoder.field(json, "id", Decoder::string);
>>>>>>
>>>>>> Decoder.nullableField(json, "id", Decoder::string);
>>>>>>
>>>>>> Decoder.optionalField(json, "id", Decoder::string);
>>>>>>
>>>>>> Decoder.optionalNullableField(json, "id", Decoder::string);
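>>>>>>
>>>>>> (The intended split, at least in my sketch: "optional" means the key
>>>>>> may be absent entirely, "nullable" means the key may be present but
>>>>>> mapped to null, and plain field() requires a present, non-null value.)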
>>>>>>
>>>>>>
>>>>>> - It composes well with user-defined classes
>>>>>>
>>>>>> record Guid(String value) {
>>>>>> Guid {
>>>>>> // some assertions on the structure of value
>>>>>> }
>>>>>> }
>>>>>>
>>>>>> Decoder.string(json, "guid", guid -> new
>>>>>> Guid(Decoder.string(guid)));
>>>>>>
>>>>>> // or even
>>>>>>
>>>>>> record Guid(String value) {
>>>>>> Guid {
>>>>>> // some assertions on the structure of value
>>>>>> }
>>>>>>
>>>>>> static Guid fromJson(Json json) {
>>>>>> return new Guid(Decoder.string(json));
>>>>>> }
>>>>>> }
>>>>>>
>>>>>> Decoder.field(json, "guid", Guid::fromJson);
>>>>>>
>>>>>>
>>>>>> - When something goes wrong, the API can handle the fiddliness of
>>>>>> capturing information for feedback.
>>>>>>
>>>>>> In the code I've sketched out it's just what field/index things
>>>>>> went wrong at. Potentially
>>>>>> capturing metadata like row/col numbers of the source would be
>>>>>> sensible too.
>>>>>>
>>>>>> It's just not reasonable to expect devs to do extra work to get
>>>>>> that, and it's really nice to give it.
>>>>>>
>>>>>> There are also some downsides like
>>>>>>
>>>>>> - I do not know how compatible it would be with lazy trees.
>>>>>>
>>>>>> Lazy trees being the only way that a tree API could handle big
>>>>>> or deep documents.
>>>>>> The general concept as applied in libraries like json-tree[9] is
>>>>>> to navigate without
>>>>>> doing any work, and that clashes with wanting to instanceof
>>>>>> check the info at the
>>>>>> current path.
>>>>>>
>>>>>> - It *almost* gives enough information to be a general schema approach
>>>>>>
>>>>>> If one field fails, the model as sketched throws an exception
>>>>>> immediately. If an API should
>>>>>> return "errors": [...], that is inconvenient to construct.
>>>>>>
>>>>>> - None of the existing popular libraries are doing this
>>>>>>
>>>>>> The only mechanics that are strictly required to give this sort
>>>>>> of API are lambdas. Those have
>>>>>> been out for a decade. Yes sealed interfaces make the data model
>>>>>> prettier but in concept you
>>>>>> can build the same thing on top of anything.
>>>>>>
>>>>>> I could argue that this is because of "cultural momentum" of
>>>>>> databind or some other reason,
>>>>>> but the fact remains that it isn't a proven out approach.
>>>>>>
>>>>>> Writing JSON libraries is a todo list[10]. There are a lot of
>>>>>> bad ideas and this might be one of them.
>>>>>>
>>>>>> - Performance impact of so many instanceof checks
>>>>>>
>>>>>> I've gotten a 4.2% slowdown compared to the "regular" tree code
>>>>>> without the repeated casts.
>>>>>>
>>>>>> But that was with a parser that is 5x slower than Jackson's.
>>>>>> (using the same benchmark project as for the snippets).
>>>>>> I think there could be reason to believe that the JIT does well
>>>>>> enough with repeated instanceof
>>>>>> checks to consider it.
>>>>>>
>>>>>>
>>>>>> My current thinking is that - despite not solving for large or deep
>>>>>> documents - starting with a really "dumb" realized tree API
>>>>>> might be the right place to start for the read side of a potential
>>>>>> incubator module.
>>>>>>
>>>>>> But regardless - this feels like a good time to start more concrete
>>>>>> conversations. I feel I should cap this email since I've reached the point
>>>>>> of decoherence and haven't even mentioned the write side of things.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> [1]: http://www.cowtowncoder.com/blog/archives/2009/01/entry_131.html
>>>>>> [2]: https://security.snyk.io/vuln/maven?search=jackson-databind
>>>>>> [3]: I only know like 8 people
>>>>>> [4]:
>>>>>> https://github.com/fabienrenaud/java-json-benchmark/blob/master/src/main/java/com/github/fabienrenaud/jjb/stream/UsersStreamDeserializer.java
>>>>>> [5]: When I say "intent", I do so knowing full well no one has been
>>>>>> actively thinking of this for an entire Game of Thrones
>>>>>> [6]:
>>>>>> https://github.com/yahoo/mysql_perf_analyzer/blob/master/myperf/src/main/java/com/yahoo/dba/perf/myperf/common/SNMPSettings.java
>>>>>> [7]: https://www.infoq.com/articles/data-oriented-programming-java/
>>>>>> [8]:
>>>>>> https://package.elm-lang.org/packages/elm/json/latest/Json-Decode
>>>>>> [9]: https://github.com/jbee/json-tree
>>>>>> [10]: https://stackoverflow.com/a/14442630/2948173
>>>>>> [11]: In 30 days JEP-198 it will be recognizably PI days old for the
>>>>>> 2nd time in its history.
>>>>>> [12]: To me, the fact that it is still an open JEP is more a social
>>>>>> convenience than anything. I could just as easily be writing this exact same
>>>>>> email about TOML.
>>>>>>
>>>>>