Towards a JSON API for the JDK
Lars Bruun-Hansen
lbruun at apache.org
Fri May 16 14:28:38 UTC 2025
Ahh, as expected some great thoughts have gone into this. :-)
Allowing the ignore of comments when parsing is not "inventing a new
format", IMO. More so because the RFC allows it (section 9) and because
it would be strictly opt-in. So the parser can still be said to be
strictly RFC-8259 while optionally supporting such feature. I'm
speculating the authors of the RFC has thought of it as a form a
pre-processing when they used the word "extension" and that is indeed a
good way to think about it.
In any case my goal was purely comments in the JSON, not about allowing
other forms of leniency, like trailing commas, flexible quotes, etc. I
feel those are in another category and not something that should (ever)
be supported. I can see why someone might ask for that if comments where
allowed to be ignored by the lib. The argument about slippery slope can
be made. I just fear that newbies will try to build their own
pre-processor, using regexp or whatever .. and get it wrong.
I agree with you that JSON5 sneaks in too many odd things.
Keep up the good work.
/Lars
On 16/05/2025 15.17, Brian Goetz wrote:
> At first, we were hopeful that we could jump right to JSON5, which
> appears at first glance to be a strictly lexical, more permissive
> grammar for JSON (supporting comments, trailing commas, more flexible
> quoting, white space, etc.) If that were actually true, this would
> have been a slam dunk, since all these lexical niceties don't have an
> impact on the parsed results. And JSON5 has been gaining some
> traction, so it probably could have been a justifiable move to jump
> right to that.
>
> But then we discovered that JSON5 also sneaks in some semantics, by
> also supporting the exotic numeric values (NaN, infinities, signed
> zero), which now has consequences for "what is a number", the numeric
> representation, the API for unpacking numeric values, etc. (Having
> multiple parsers is one thing; having multiple parsers that produce
> different semantics is entirely another.) And inventing a new "JSON5
> but not quite" subset would be doing no one any favors.
>
> Jsonc seems to be entirely a MS-ecosystem thing; it does not have
> broad enough traction to be the "one grammar" we accept. So pure
> JSON, as specification-challenged as it is, is the logical, though
> sad, conclusion (for now.)
>
>
>
> On 5/16/2025 9:02 AM, Lars Bruun-Hansen wrote:
>>
>>
>> Great work.
>>
>>
>> I feel the elephant in the room needs to addressed: JSON comments. I
>> haven't tested the proposed lib but I cannot see it mentioned so I'm
>> assuming that comments are not supported.
>>
>> For better or worse, the use of jsonc (JSON with comments) is
>> everywhere in some ecosystems. Unsurprisingly this happens often when
>> JSON is used as a config file format. Looking at you, Microsoft.
>>
>> It would be nice if the JDK's build-in JSON parser at least
>> recognized this.
>>
>> I'm well aware that comments are frowned upon in JSON and not part of
>> neither the spec at www.json.org nor the RFC-8259.
>>
>> Yet, I advocate the JDK JSON library should optionally allow comments
>> to be ignored when PARSING. This should be an opt-in feature that
>> would technically treat comments as whitespace during the parsing
>> process.
>>
>> This would also be in line with what many other parsers do. For
>> example, Jackson has "ALLOW_COMMENTS" feature [1]. Also, by
>> comparison, the build-in parser in the .NET world, known as
>> System.Text.Json, also supports this [2].
>>
>>
>>
>> The "discoverer" of JSON, Douglas Crowford, had this to say [3] on
>> the topic:
>>
>>
>> [QUOTE]
>>
>> I removed comments from JSON because I saw people were using them to
>> hold parsing directives, a practice which would have destroyed
>> interoperability. I know that the lack of comments makes some people
>> sad, but it shouldn't.
>>
>> Suppose you are using JSON to keep configuration files, which you
>> would like to annotate. Go ahead and insert all the comments you
>> like. Then pipe it through JSMin before handing it to your JSON parser.
>>
>> [/QUOTE]
>>
>>
>> By not having the ability to ignore comments when parsing we would
>> effectively force users to use another parser first or a minifier. I
>> doubt beginners would appreciate that.
>>
>>
>> BTW: The test suite already has tests for comments.
>>
>>
>> /Lars
>>
>>
>> [1]:
>> https://www.javadoc.io/static/com.fasterxml.jackson.core/jackson-core/2.19.0/com/fasterxml/jackson/core/JsonParser.Feature.html#ALLOW_COMMENTS
>>
>> [2]:
>> https://learn.microsoft.com/en-us/dotnet/api/system.text.json.jsonreaderoptions?view=net-9.0#properties
>>
>> [3]: https://plus.google.com/118095276221607585885/posts/RK8qyGVaGSr
>>
>>
>>
>>
>> On 16/05/2025 01.44, Ethan McCue wrote:
>>> I present for your consideration the library I made when spiraling
>>> about this problem space a few years ago
>>>
>>> https://github.com/bowbahdoe/json
>>>
>>> https://javadoc.io/doc/dev.mccue/json/latest/dev.mccue.json/dev/mccue/json/package-summary.html
>>>
>>> Notably missing during the design process here were patterns, hence
>>> the JsonDecoder design. I haven't been able to evaluate how patterns
>>> affect that on account of them not being out.
>>>
>>> I will more thoroughly peruse the draft of java.util.json at a later
>>> date, but my initial observations/comments:
>>>
>>> * I am not sure having JsonValue be distinct from Json has value.
>>> * toUntyped feels a little strange to me - the only type information
>>> presumably lost is the sealed-ness of the hierarchy. The interplay
>>> between that and toNumber is also a little unnerving.
>>> * One notion that I found helpful was that a class could be "json
>>> encodable," meaning there is a method to call to obtain a canonical
>>> json representation.
>>>
>>> record Person(String name) implements JsonEncodable {
>>> @Override
>>> public Json toJson() {
>>> return Json.objectBuilder()
>>> .put("namen", name)
>>> .build();
>>> }
>>> }
>>>
>>> Which helper methods like Json#of(List<? extends
>>> JsonEncodable>) could make use of. Json itself (JsonValue in your
>>> prototype) could then have a vacuous implementation.
>>>
>>> * Terminology wise - I went with reading/writing for the actual
>>> parsing/generation of json and encoding/decoding for the mapping of
>>> those representations to/from specific classes. The merits are not
>>> top of mind, just noting the difference. read/write vs
>>> parse/toString+toDisplayString
>>> * One thing I did was make the helper methods in Json null tolerant
>>> and the ones in the specific subtypes like JsonString not. This was
>>> because from what I saw of usages of javax.json/jakarta.json that
>>> nullability was a footgun and correcting for it required changes to
>>> code structure (breaking up builder chains with if (x != null) checks)
>>> * The functionality you want from JsonNumber could be achieved by
>>> making it just extend Number
>>> (https://github.com/bowbahdoe/json/blob/main/src/main/java/dev/mccue/json/JsonNumber.java)
>>> instead of a bespoke toNumber. You need the extra methods to go to
>>> big decimal and co, but it's just an extension to the behavior of
>>> Number at that point.
>>> * JsonObject and JsonArray could implement Map<String, Json> and
>>> List<Json> respectively. This lowers the need for toUntyped() -
>>> since presumably one of the use cases for that is turning the json
>>> tree into something that more generic map/list traversal code can
>>> handle. It also complicates any lazy loading somewhat.
>>> * Assuming patterns can be placed on interfaces, you might want to
>>> consider something similar to JsonDecoder, but with a pattern
>>> instead of a method that throws an exception.
>>>
>>> // Where here fromJson would box up the logic for testing and
>>> extracting from each element in the array.
>>> List<Person> people = array(json, Person::fromJson);
>>>
>>> * I don't think there is sufficient cause for anything to be
>>> non-sealed at this point.
>>> * JsonBoolean and JsonNull do not have reasonable alternative
>>> implementations - as far as I can imagine, maybe i'm wrong - so
>>> maybe those can just be final classes?
>>> * If you seal up the whole hierarchy then its pretty trivial to make
>>> it serializable
>>> (https://github.com/bowbahdoe/json/blob/main/src/main/java/dev/mccue/json/serialization/JsonSerializationProxy.java)
>>>
>>>
>>>
>>>
>>> On Thu, May 15, 2025 at 11:29 PM Remi Forax <forax at univ-mlv.fr> wrote:
>>>
>>> Hi Paul,
>>> yes, not having a simple JSON API in Java is an issue for beginners.
>>>
>>> It's not clear to me why JsonArray (for example) has to be an
>>> interface instead of a record ?
>>>
>>> I understand why Json.parse() only works on String and char[]
>>> but the API make it too easy to have many performance issues.
>>> I think you need versions using a Reader and a Path.
>>> Bonus point, if there is a method walk() that also returns a
>>> JsonValue but the List/Map inside JsonArray/JsonObject are
>>> populated lazily.
>>>
>>> Minor point: Json.toDisplayString() should takes a second
>>> parameters indicating the number of spaces used for the
>>> indentation (like JSON.stringify in JS).
>>>
>>> regards,
>>> Rémi
>>>
>>> ----- Original Message -----
>>> > From: "Paul Sandoz" <paul.sandoz at oracle.com>
>>> > To: "core-libs-dev" <core-libs-dev at openjdk.org>
>>> > Sent: Thursday, May 15, 2025 10:30:42 PM
>>> > Subject: Towards a JSON API for the JDK
>>>
>>> > Hi,
>>> >
>>> > We would like to share with you our thoughts and plans towards
>>> a JSON API for
>>> > the JDK.
>>> > Please see the document below.
>>> >
>>> > -
>>> >
>>> > We have had the pleasure of using a clone of this API in some
>>> experiments we are
>>> > conducting with
>>> > ONNX and code reflection [1]. Using the API we were able to
>>> quickly write code
>>> > to ingest and convert
>>> > a JSON document representing ONNX operation schema into
>>> instances of records
>>> > modeling the schema
>>> > (see here [2]).
>>> >
>>> > The overall out-of-box experience with such a minimal
>>> "batteries included” API
>>> > has so far been positive.
>>> >
>>> > Thanks,
>>> > Paul.
>>> >
>>> > [1] https://openjdk.org/projects/babylon/
>>> > [2]
>>> >
>>> https://github.com/openjdk/babylon/blob/code-reflection/cr-examples/onnx/opgen/src/main/java/oracle/code/onnx/opgen/OpSchemaParser.java#L87
>>> >
>>> > # Towards a JSON API for the JDK
>>> >
>>> > One of the most common requests for the JDK is an API for
>>> parsing and generating
>>> > JSON. While JSON originated as a text-based serialization
>>> format for JSON
>>> > objects ("JSON" stands for "JavaScript Object Notation"),
>>> because of its simple
>>> > and flexible syntax, it eventually found use outside the
>>> JavaScript ecosystem as
>>> > a general data interchange format, such as framework
>>> configuration files and web
>>> > service requests/response formats.
>>> >
>>> > While the JDK cannot, and should not, provide libraries for
>>> every conceivable
>>> > file format or protocol, the JDK philosophy is one of
>>> "batteries included",
>>> > which is to say we should be able to write basic programs that
>>> use common
>>> > protocols such as HTTP, without having to appeal to third
>>> party libraries.
>>> > The Java ecosystem already has plenty of JSON libraries, so
>>> inclusion in
>>> > the JDK is largely meant to be a convenience, rather than
>>> needing to be the "one
>>> > true" JSON library to meet the needs of all users. Users with
>>> specific needs
>>> > are always free to select one of the existing third-party
>>> libraries.
>>> >
>>> > ## Goals and requirements
>>> >
>>> > Our primary goal is that the library be simple to use for
>>> parsing, traversing,
>>> > and generating conformant JSON documents. Advanced features,
>>> such as data
>>> > binding or path-based traversal should be possible to
>>> implement as layered
>>> > features, but for simplicity are not included in the core API.
>>> We adopt a goal
>>> > that the performance should be "good enough", but where
>>> performance
>>> > considerations conflict with simplicity and usability, we will
>>> choose in favor
>>> > of the latter.
>>> >
>>> > ## API design approach
>>> >
>>> > The description of JSON at `https:://json.org
>>> <http://json.org>` describes a JSON document using
>>> > the familiar "railroad diagram":
>>> > 
>>> >
>>> > This diagram describes an algebraic data type (a sum of
>>> products), which we
>>> > model directly with a set of Java interfaces:
>>> >
>>> > ```
>>> > interface JsonValue { }
>>> > interface JsonArray extends JsonValue { List<JsonValue>
>>> values(); }
>>> > interface JsonObject extends JsonValue { Map<String,
>>> JsonValue> members(); }
>>> > interface JsonNumber extends JsonValue { Number toNumber(); }
>>> > interface JsonString extends JsonValue { String value(); }
>>> > interface JsonBoolean extends JsonValue { boolean value(); }
>>> > interface JsonNull extends JsonValue { }
>>> > ```
>>> >
>>> > These interfaces have (hidden) companion implementation
>>> classes that admit
>>> > greater flexibility of implementation than modeling them
>>> directly with records
>>> > would permit.
>>> > Further, these interfaces are unsealed. We compromise on the
>>> sealed sum of
>>> > products to enable
>>> > alternative implementations, for example to support
>>> alternative formats that
>>> > encode the same information in a JSON document but in a more
>>> efficient form than
>>> > text.
>>> >
>>> > The API has static methods for parsing strings into a
>>> `JsonValue`, conversion to
>>> > and from purely untyped representations (lists and maps), and
>>> factory methods
>>> > for building JSON documents. We apply composition
>>> consistently, e.g, a
>>> > JsonString has a string, a JsonObject has a map of string to
>>> JsonValue, as
>>> > opposed to extension for structural JSON values.
>>> >
>>> > It turns out that this simple API is almost all we need for
>>> traversal. It gives
>>> > us an immutable representation of a document, and we can use
>>> pattern matching to
>>> > answer the myriad questions that will come up (Does this
>>> object have key X? Does
>>> > it map to a number? Is that number representable as an
>>> integer?) when going
>>> > from an untyped format like JSON to a more strongly typed
>>> domain model.
>>> > Given a simple document like:
>>> >
>>> > ```
>>> > {
>>> > "name": "John”,
>>> > "age": 30
>>> > }
>>> > ```
>>> >
>>> > we can parse and traverse the document as follows:
>>> >
>>> > ```
>>> > JsonValue doc = Json.parse(inputString);
>>> > if (doc instanceof JsonObject o
>>> > && o.members().get("name") instanceof JsonString s
>>> > && s.value() instanceof String name
>>> > && o.members().get("age") instanceof JsonNumber n
>>> > && n.toNumber() instanceof Long l && l instanceof int age) {
>>> > // use "name" and "age"
>>> > }
>>> > ```
>>> >
>>> > Later, when the language acquires the ability to expose
>>> deconstruction patterns
>>> > for arbitrary interfaces (similar to today's record patterns, see
>>> >
>>> https://openjdk.org/projects/amber/design-notes/patterns/towards-member-patterns),
>>> > this will be simplifiable to:
>>> >
>>> > ```
>>> > JsonValue doc = Json.parse(inputString);
>>> > if (doc instanceof JsonObject(var members)
>>> > && members.get("name") instanceof JsonString(String name)
>>> > && members.get("age") instanceof JsonNumber(int age)) {
>>> > // use "name" and "age"
>>> > }
>>> > ```
>>> >
>>> > So, overtime, as more pattern matching features are introduced
>>> we anticipate
>>> > improved use of the API. This is a primary reason why the API
>>> is so minimal.
>>> > Convenience methods we add today, such as a method that
>>> accesses a JSON
>>> > object component as say a JSON string or throws an exception,
>>> will become
>>> > redundant in the future.
>>> >
>>> > ## JSON numbers
>>> >
>>> > The specification of JSON number makes no explicit distinction
>>> between integral
>>> > and decimal numbers, nor specifies limits on the size of those
>>> numbers.
>>> > This is a common source of interoperability issues when
>>> consuming JSON
>>> > documents. Generally users cannot always but often do assume
>>> JSON numbers are
>>> > parsable, without loss of precision, to IEEE double-precision
>>> floating point
>>> > numbers or 32-bit signed integers.
>>> >
>>> > In this respect the API provides three means to operate on the
>>> JSON number,
>>> > giving the user full control:
>>> >
>>> > 1. Underlying string representation can be obtained, if
>>> preserving syntactic
>>> > details such as leading or trailing zeros is important.
>>> > 2. The string representation can be parsed to an instance of
>>> `BigDecimal`, using
>>> > `toBigDecimal` if preserving decimal numbers is important.
>>> > 3. The string representation can be parsed into an instance of
>>> `Long`, `Double`,
>>> > `BigInteger`, or `BigDecimal`, using `toNumber`. The result
>>> of this method
>>> > depends on how the representation can be parsed, possibly
>>> losing precision,
>>> > choosing a suitably convenient numeric type that can then be
>>> pattern
>>> > matched on.
>>> >
>>> > Primitive pattern matching will help as will further pattern
>>> matching features
>>> > enabling the user to partially match.
>>> >
>>> > ## Prototype implementation
>>> >
>>> > The prototype implementation is currently located into the JDK
>>> sandbox
>>> > repository
>>> > under the `json` branch, see
>>> > here
>>> >
>>> https://github.com/openjdk/jdk-sandbox/tree/json/src/java.base/share/classes/java/util/json
>>> > The prototype API javadoc generated from the repository is
>>> also available at
>>> >
>>> https://cr.openjdk.org/~naoto/json/javadoc/api/java.base/java/util/json/package-summary.html
>>> >
>>> > ### Testing and conformance
>>> >
>>> > The prototype implementation passes all conformance test cases
>>> but two,
>>> > available
>>> > on https://github.com/nst/JSONTestSuite. The two exceptions
>>> are the ones which
>>> > the
>>> > prototype specifically prohibits, i.e, duplicated names in
>>> JSON objects
>>> >
>>> (https://cr.openjdk.org/~naoto/json/conformance/results/parsing.html#35).
>>> >
>>> > ### Performance
>>> >
>>> > Our main focus so far has been on the API design and a functional
>>> > implementation.
>>> > Hence, there has been less focus on performance even though we
>>> know there are a
>>> > number of performance enhancements we can make eventually.
>>> > We are reasonably happy with the current performance. The
>>> > implementation performs well when compared to other JSON
>>> implementations
>>> > parsing from string instances and traversing documents.
>>> >
>>> > An example of where we may choose simplicity and usability
>>> over performance
>>> > is the rejection of JSON documents containing objects that in
>>> turn contain
>>> > members
>>> > with duplicate names. That may increase the cost of parsing,
>>> but simplifies the
>>> > user
>>> > experience for the majority of cases since if we reasonably
>>> assume JsonObjects
>>> > are
>>> > map-like, what should the user do with such members, pick one
>>> the last one?
>>> > merge
>>> > the values? or reject?
>>> >
>>> > ## A JSON JEP?
>>> >
>>> > We plan to draft JEP when we are ready. Attentive readers will
>>> observe that
>>> > a JEP already exists, JEP 198: Light-Weight JSON API
>>> > (https://openjdk.org/jeps/198). We will
>>> > either update this JEP, or withdraw it and draft a new one.
>>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/core-libs-dev/attachments/20250516/44f686fe/attachment-0001.htm>
More information about the core-libs-dev
mailing list