NPE throwing behavior of immutable collections

Remi Forax forax at univ-mlv.fr
Tue Jan 31 21:28:50 UTC 2023


> From: "Glavo" <zjx001202 at gmail.com>
> To: "Stuart Marks" <stuart.marks at oracle.com>
> Cc: "John Hendrikx" <john.hendrikx at gmail.com>, "core-libs-dev"
> <core-libs-dev at openjdk.org>
> Sent: Tuesday, January 31, 2023 10:12:24 PM
> Subject: Re: NPE throwing behavior of immutable collections

> I understand that null is prohibited by default, but can we also provide a set
> of factory methods that accept null?
> They can be named like List.ofNullable/copyOfNullable.

It is actually spelt, Arrays.asList() and collection.stream().toList(). 

Rémi 

> On Tue, Jan 31, 2023 at 10:13 AM Stuart Marks < [ mailto:stuart.marks at oracle.com
> | stuart.marks at oracle.com ] > wrote:

>> In this reply I'll focus on the null handling issues in collections.

>> As you've noted, there are really (at least) two distinct issues here: whether a
>> collection permits nulls, and the behavior of contains(null) queries. There have
>> been continual complaints about both, and the issues are somewhat distinct.

>> The collection implementations in the JDK are all over the place with regard to
>> nulls. I believe this is because the original collections and the concurrent
>> collections up through JDK 1.6 or so were mostly done by Josh Bloch and Doug
>> Lea,
>> who disagreed about null handling. They had this exchange in 2006:

>> Doug Lea wrote: [1]

>> > The main reason that nulls aren't allowed in ConcurrentMaps
>> > (ConcurrentHashMaps, ConcurrentSkipListMaps) is that
>> > ambiguities that may be just barely tolerable in non-concurrent
>> > maps can't be accommodated. The main one is that if
>> > map.get(key) returns null, you can't detect whether the
>> > key explicitly maps to null vs the key isn't mapped.
>> > In a non-concurrent map, you can check this via map.contains(key),
>> > but in a concurrent one, the map might have changed between calls.

>> > Further digressing: I personally think that allowing
>> > nulls in Maps (also Sets) is an open invitation for programs
>> > to contain errors that remain undetected until
>> > they break at just the wrong time. (Whether to allow nulls even
>> > in non-concurrent Maps/Sets is one of the few design issues surrounding
>> > Collections that Josh Bloch and I have long disagreed about.)

>> Josh Bloch wrote: [2]

>> > I have moved towards your position over the years. It was probably a
>> > mistake to allow null keys in Maps and null elements in Sets. I'm
>> > still not sure about Map values and List elements.

>> > In other words, Doug hates null more than I do, but over the years
>> > I've come to see it as quite troublesome.

>> (I haven't talked to Josh or Doug about this particular issue recently, so I
>> suppose
>> it's possible their opinions have changed in the intervening time.)

>> I had to decide what to do about nulls when I added the unmodifiable collections
>> in
>> JDK 9. Given that Doug "hates" nulls, and Josh finds them at least quite
>> troublesome, I started from a position of disallowing null members in all the
>> new
>> collections. Most of the unmodifiable collections are still this way (but see
>> below).

>> What about contains(null)? Surely it should be ok to query for a null, and
>> return
>> false, even if the collection itself can't contain null. However, a bunch of
>> collections and collection views in the JDK throw NPE on contains(null), so the
>> unmodifiable collections don't set a precedent here.

>> If you're dealing with all non-null values, and a null creeps in somehow, then
>> calling contains(null) might be an error, and it would be good for the library
>> to
>> inform you of that instead of just returning false and letting the program
>> continue
>> until it fails later (fail-fast). A similar issue arises with the non-generic
>> contains() method. If you have a Collection<Integer>, then shouldn't
>> contains("abc")
>> be an error? It's not, and it simply returns false, because the signature is
>> contains(Object). But there have been complaints that this should have been an
>> error
>> because a Collection of Integer cannot possibly contain a String. Similar
>> reasoning
>> applies to contains(null) on a collection that can't contain null. (Yes, the
>> error
>> occurs at runtime instead of compile time, but better late than never.)

>> A meta-issue is: should a new thing correct a perceived design flaw in the older
>> thing, or should it be consistent with the old thing? Either way, you lose.

>> Another factor in my original decision was that it's much easier to start with a
>> restriction and relax it later than it is to go the other way. We have some
>> flexibility to allow nulls and contains(null) if we want; but it's essentially a
>> non-starter to disallow nulls once they've been allowed somewhere.

>> That's why my starting position for the design prohibited nulls everywhere.

>> Over time we've relaxed some restrictions around null handling in the
>> unmodifiable
>> collections. Streams permit nulls, and the List returned from Stream.toList()
>> also
>> permits nulls, and it also allows contains(null). So there's a different
>> unmodifiable List implementation under there, accessible only via streams. (Note
>> that the null-permitting unmodifiable lists lose the optimizations for sizes 0,
>> 1,
>> and 2, which are in part based on nulls being disallowed.)

>> I'm fairly sympathetic to the case for changing the unmodifiable collections to
>> allow contains(null), given that the current NPE-throwing behavior has generated
>> a
>> fair amount of complaints over the years. And the situation with having two
>> flavors
>> of unmodifiable list is quite odd and is an uncomfortable point in the design
>> space.

>> I should point out however that even if the unmodifiable collections were
>> changed to
>> allow contains(null), it's still kind of unclear as to whether this code is
>> safe:

>> public void addShoppingItems(Collection<String> shoppingItems) {
>> if (shoppingItems.contains(null)) { // this looks quite reasonable and safe...
>> throw new IllegalArgumentException("shoppingItems should not contain nulls");
>> }
>> ...
>> }

>> If you're only ever passing ArrayList, Arrays.asList(), and List.of() lists
>> here,
>> then sure, this would work great. If the source collection is of unknown
>> provenance,
>> you can still run into trouble:

>> 1. contains(null) calls on the keySet and values collections of
>> ConcurrentHashMap
>> and Hashtable all throw NPE. So does contains(null) on a set from
>> Collections.newSetFromMap(), which is built from those maps' keySets. (I suppose
>> Doug Lea might be convinced to relax the CHM behavior.)

>> 2. Speaking of concurrency, if the source collection is being modified
>> concurrently,
>> then a null could be introduced after the check (TOCTOU) so you'd need to make a
>> defensive copy before checking it.

>> 3. NavigableSet and friends delegate contains() to the comparison method, which
>> might or might not accept nulls. There's no way the collection implementation
>> can
>> tell in advance.

>> You might consider List.copyOf() as an alternative, as that makes a null-free
>> defensive copy in one shot, and avoids copying if the source is an unmodifiable
>> list. If you don't care about concurrency, then

>> shoppingItems.forEach(Objects::requireNonNull)

>> might be suitable.

>> If unmodifiable collections were to change to allow contains(null), this
>> probably
>> would make them somewhat less annoying, but it wouldn't necessarily solve yours
>> or
>> everyone's problems. There's simply no way at this point to make the
>> collections'
>> treatment of nulls uniform.

>> s'marks

>> [1]
>> [
>> https://web.archive.org/web/20210510190135/https://cs.oswego.edu/pipermail/concurrency-interest/2006-May/002485.html
>> |
>> https://web.archive.org/web/20210510190135/https://cs.oswego.edu/pipermail/concurrency-interest/2006-May/002485.html
>> ]

>> [2]
>> [
>> https://web.archive.org/web/20200930213909/http://cs.oswego.edu/pipermail/concurrency-interest/2006-May/002486.html
>> |
>> https://web.archive.org/web/20200930213909/http://cs.oswego.edu/pipermail/concurrency-interest/2006-May/002486.html
>> ]

>> On 1/29/23 7:28 AM, John Hendrikx wrote:
>>> TLDR; why does contains(null) not just return false for collections that don't
>> > allow
>> > nulls. Just because the interface allows it, does not mean it should do it as it
>> > devalues the usefulness of the abstraction provided by the interface.

>> > Background:

>>> I'm someone that likes to check correctness of any constructor or method
>> > parameter
>>> before allowing an object to be constructed or to be modified, in order to
>> > maintain
>> > invariants that are provided by the class to its users.

>>> This ranges from simple null checks, to range checks on numeric values, to
>> > complete
>>> checks like "is a collection sorted" or is a list of nodes acyclic. Anything I
>> > can
>>> check in the constructor that may avoid problems further down the line or that
>> > may
>>> affect what I can guarantee on its own API methods. For example, if I check in
>> > the
>>> constructor that something is not null, then the associated getter will
>> > guarantee
>>> that the returned value is not null. If I check that a List doesn't contain
>> > nulls,
>>> then the associated getter will reflect that as well (assuming it is immutable
>> > or
>> > defensivily copied).

>>> For collections, this is currently becoming harder and harder because more and
>> > more
>>> new collections are written to be null hostile. It is fine if a collection
>> > doesn't
>> > accept nulls, but throwing NPE when I ask such a collection if it contains null
>> > seems to be counter productive, and reduces the usefulness of the collection
>> > interfaces.

>> > This strict interpretation makes the collection interfaces harder to use than
>>> necessary. Interfaces are only useful when their contract is well defined. The
>> > more
>>> things an interface allows or leaves unspecified, the less useful it is for its
>> > users.

>>> I know that the collection interfaces allow this, but you have to ask yourself
>> > this
>>> important question: how useful is an interface that makes almost no guarantees
>> > about
>> > what its methods will do? Interfaces like `List`, `Map` and `Set` are passed as
>> > method parameters a lot, and to make these useful, implementations of these
>> > interfaces should do their best to provide as consistent an experience as is
>>> reasonably possible. The alternative is that these interfaces will slowly
>> > decline in
>>> usefulness as methods will start asking for `ArrayList` instead of `List` to
>> > avoid
>> > having to deal with a too lenient specification.

>>> With the collection interfaces I get the impression that recently there has been
>> > too
>>> much focus on what would be easy for the collection implementation instead of
>> > what
>>> would be easy for the users of said interfaces. In my opinion, the concerns of
>> > the
>>> user of interfaces almost always far outweigh the concerns of the implementors
>> > of
>> > said interfaces.

>>> In the past, immutable collections were rare, but they get are getting more and
>> > more
>>> common all the time. For example, in unit testing. Unfortunately, these
>> > immutable
>>> collections differ quite a bit from their mutable counterparts. Some parts are
>> > only
>>> midly annoying (not accepting `null` as the **value** in a `Map` for example),
>> > but
>> > other parts require code to be rewritten for it to be able to work as a drop-in
>> > replacement for the mutable variant. A simple example is this:

>> > public void addShoppingItems(Collection<String> shoppingItems) {
>> > if (shoppingItems.contains(null)) { // this looks quite reasonable and
>> > safe...
>> > throw new IllegalArgumentException("shoppingItems should not contain
>> > nulls");
>> > }

>> > this.shoppingItems.addAll(shoppingItems);
>> > }

>> > Testing this code is annoying:

>> > x.addShoppingItems(List.of("a", null")); // can't construct immutable
>> > collection with null

>> > x.addShoppingItems(Arrays.asList("a", null")); // fine, go back to what we
>> > did before then...

>>> The above problems, I suppose we can live with it; immutable collections don't
>> > want
>> > `null` although I don't see any reason to not allow it as I can write a similar
>>> `List.of` that returns immutable collections that do allow `null`. For JDK code
>> > this
>>> is a bit disconcerting, as it is supposed to be as flexible as is reasonable
>> > without
>> > having too much of an opinion about what is good or bad.

>> > This next one however:

>> > assertNoExceptionThrown(() -> x.addShoppingItems(List.of("a", "b"))); // not
>> > allowed, contains(null) in application code throws NPE

>>> This is much more insidious; the `contains(null)` check has been a very
>> > practical
>>> way to check if collections contain null, and this works for almost all
>> > collections
>> > in common use, so there is no problem. But now, more and more collections are
>>> starting to just throw NPE immediately even just to **check** if a null element
>> > is
>> > present. This only serves to distinguish themselves loudly from other similar
>> > collections that will simply return `false`.

>>> I think this behavior is detrimental to the java collection interfaces in
>> > general,
>> > especially since there is a clear answer to the question if such a collection
>>> contains null or not. In fact, why `contains` was ever allowed to throw an
>> > exception
>>> aside from "UnsupportedOperationException" is a mystery to me, and throwing one
>> > when
>>> one could just as easily return the correct and expected answer seems very
>> > opiniated
>> > and almost malicious -- not behavior I'd expect from JDK core libraries.

>>> Also note that there is no way to know in advance if `contains(null)` is going
>> > to be
>>> throwing the NPE. If interfaces had a method `allowsNulls()` that would already
>> > be
>> > an improvement.

>> > --John


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/core-libs-dev/attachments/20230131/40bd91b1/attachment-0001.htm>


More information about the core-libs-dev mailing list