NPE throwing behavior of immutable collections
Tagir Valeev
amaembo at gmail.com
Tue Jan 31 17:29:05 UTC 2023
Hello!
I agree that collection factories should not throw on contains(null).
This issue actually reduces their
adoption. Imagine that I always returned from some method
Collections.unmodifiableSet(new HashSet<>(Arrays.asList(non, null, values)));
It's a verbose construct that creates inefficient data structure with
tons of garbage in memory. But I cannot simply replace it
with Set.of(non, null, values) in a compatible way, as I don't know
whether clients rely on contains(null) or not.
When we migrated our codebase from Java 8 to Java 11, we eagerly
replaced dozens of uses of unmodifiableList/Set/Map to
collection factories. We even have an inspection for this. Most of
them were ok, but we also introduced new bugs, as contains(null)
is definitely called in some cases. Sometimes, this happens rarely,
and was not noticed during testing.
So we had new bugs in production and unhappy customers. Now we try not
to touch old code, unless we are totally sure that
contains(null) is never called.
I see that concurrent collections were implemented in the same way,
but the case when you want to replace
a normal collection with a concurrent one occurs rarely, and it's
likely that you still need to revisit all the uses of this collection,
to ensure that they are thread-safe, so you can fix potential problems
with contains(). On the other hand, collection factories
are very appealing to use, but this unpleasant behavior really gets in the way.
With best regards,
Tagir Valeev.
On Tue, Jan 31, 2023 at 3:14 AM Stuart Marks <stuart.marks at oracle.com> wrote:
>
> In this reply I'll focus on the null handling issues in collections.
>
> As you've noted, there are really (at least) two distinct issues here: whether a
> collection permits nulls, and the behavior of contains(null) queries. There have
> been continual complaints about both, and the issues are somewhat distinct.
>
> The collection implementations in the JDK are all over the place with regard to
> nulls. I believe this is because the original collections and the concurrent
> collections up through JDK 1.6 or so were mostly done by Josh Bloch and Doug Lea,
> who disagreed about null handling. They had this exchange in 2006:
>
> Doug Lea wrote: [1]
>
> > The main reason that nulls aren't allowed in ConcurrentMaps
> > (ConcurrentHashMaps, ConcurrentSkipListMaps) is that
> > ambiguities that may be just barely tolerable in non-concurrent
> > maps can't be accommodated. The main one is that if
> > map.get(key) returns null, you can't detect whether the
> > key explicitly maps to null vs the key isn't mapped.
> > In a non-concurrent map, you can check this via map.contains(key),
> > but in a concurrent one, the map might have changed between calls.
> >
> > Further digressing: I personally think that allowing
> > nulls in Maps (also Sets) is an open invitation for programs
> > to contain errors that remain undetected until
> > they break at just the wrong time. (Whether to allow nulls even
> > in non-concurrent Maps/Sets is one of the few design issues surrounding
> > Collections that Josh Bloch and I have long disagreed about.)
>
>
> Josh Bloch wrote: [2]
>
> > I have moved towards your position over the years. It was probably a
> > mistake to allow null keys in Maps and null elements in Sets. I'm
> > still not sure about Map values and List elements.
> >
> > In other words, Doug hates null more than I do, but over the years
> > I've come to see it as quite troublesome.
>
>
> (I haven't talked to Josh or Doug about this particular issue recently, so I suppose
> it's possible their opinions have changed in the intervening time.)
>
> I had to decide what to do about nulls when I added the unmodifiable collections in
> JDK 9. Given that Doug "hates" nulls, and Josh finds them at least quite
> troublesome, I started from a position of disallowing null members in all the new
> collections. Most of the unmodifiable collections are still this way (but see below).
>
> What about contains(null)? Surely it should be ok to query for a null, and return
> false, even if the collection itself can't contain null. However, a bunch of
> collections and collection views in the JDK throw NPE on contains(null), so the
> unmodifiable collections don't set a precedent here.
>
> If you're dealing with all non-null values, and a null creeps in somehow, then
> calling contains(null) might be an error, and it would be good for the library to
> inform you of that instead of just returning false and letting the program continue
> until it fails later (fail-fast). A similar issue arises with the non-generic
> contains() method. If you have a Collection<Integer>, then shouldn't contains("abc")
> be an error? It's not, and it simply returns false, because the signature is
> contains(Object). But there have been complaints that this should have been an error
> because a Collection of Integer cannot possibly contain a String. Similar reasoning
> applies to contains(null) on a collection that can't contain null. (Yes, the error
> occurs at runtime instead of compile time, but better late than never.)
>
> A meta-issue is: should a new thing correct a perceived design flaw in the older
> thing, or should it be consistent with the old thing? Either way, you lose.
>
> Another factor in my original decision was that it's much easier to start with a
> restriction and relax it later than it is to go the other way. We have some
> flexibility to allow nulls and contains(null) if we want; but it's essentially a
> non-starter to disallow nulls once they've been allowed somewhere.
>
> That's why my starting position for the design prohibited nulls everywhere.
>
> Over time we've relaxed some restrictions around null handling in the unmodifiable
> collections. Streams permit nulls, and the List returned from Stream.toList() also
> permits nulls, and it also allows contains(null). So there's a different
> unmodifiable List implementation under there, accessible only via streams. (Note
> that the null-permitting unmodifiable lists lose the optimizations for sizes 0, 1,
> and 2, which are in part based on nulls being disallowed.)
>
> I'm fairly sympathetic to the case for changing the unmodifiable collections to
> allow contains(null), given that the current NPE-throwing behavior has generated a
> fair amount of complaints over the years. And the situation with having two flavors
> of unmodifiable list is quite odd and is an uncomfortable point in the design space.
>
> I should point out however that even if the unmodifiable collections were changed to
> allow contains(null), it's still kind of unclear as to whether this code is safe:
>
> public void addShoppingItems(Collection<String> shoppingItems) {
> if (shoppingItems.contains(null)) { // this looks quite reasonable and safe...
> throw new IllegalArgumentException("shoppingItems should not contain nulls");
> }
> ...
> }
>
> If you're only ever passing ArrayList, Arrays.asList(), and List.of() lists here,
> then sure, this would work great. If the source collection is of unknown provenance,
> you can still run into trouble:
>
> 1. contains(null) calls on the keySet and values collections of ConcurrentHashMap
> and Hashtable all throw NPE. So does contains(null) on a set from
> Collections.newSetFromMap(), which is built from those maps' keySets. (I suppose
> Doug Lea might be convinced to relax the CHM behavior.)
>
> 2. Speaking of concurrency, if the source collection is being modified concurrently,
> then a null could be introduced after the check (TOCTOU) so you'd need to make a
> defensive copy before checking it.
>
> 3. NavigableSet and friends delegate contains() to the comparison method, which
> might or might not accept nulls. There's no way the collection implementation can
> tell in advance.
>
> You might consider List.copyOf() as an alternative, as that makes a null-free
> defensive copy in one shot, and avoids copying if the source is an unmodifiable
> list. If you don't care about concurrency, then
>
> shoppingItems.forEach(Objects::requireNonNull)
>
> might be suitable.
>
> If unmodifiable collections were to change to allow contains(null), this probably
> would make them somewhat less annoying, but it wouldn't necessarily solve yours or
> everyone's problems. There's simply no way at this point to make the collections'
> treatment of nulls uniform.
>
> s'marks
>
>
> [1]
> https://web.archive.org/web/20210510190135/https://cs.oswego.edu/pipermail/concurrency-interest/2006-May/002485.html
>
> [2]
> https://web.archive.org/web/20200930213909/http://cs.oswego.edu/pipermail/concurrency-interest/2006-May/002486.html
>
>
>
>
> On 1/29/23 7:28 AM, John Hendrikx wrote:
> > TLDR; why does contains(null) not just return false for collections that don't allow
> > nulls. Just because the interface allows it, does not mean it should do it as it
> > devalues the usefulness of the abstraction provided by the interface.
> >
> > Background:
> >
> > I'm someone that likes to check correctness of any constructor or method parameter
> > before allowing an object to be constructed or to be modified, in order to maintain
> > invariants that are provided by the class to its users.
> >
> > This ranges from simple null checks, to range checks on numeric values, to complete
> > checks like "is a collection sorted" or is a list of nodes acyclic. Anything I can
> > check in the constructor that may avoid problems further down the line or that may
> > affect what I can guarantee on its own API methods. For example, if I check in the
> > constructor that something is not null, then the associated getter will guarantee
> > that the returned value is not null. If I check that a List doesn't contain nulls,
> > then the associated getter will reflect that as well (assuming it is immutable or
> > defensivily copied).
> >
> > For collections, this is currently becoming harder and harder because more and more
> > new collections are written to be null hostile. It is fine if a collection doesn't
> > accept nulls, but throwing NPE when I ask such a collection if it contains null
> > seems to be counter productive, and reduces the usefulness of the collection
> > interfaces.
> >
> > This strict interpretation makes the collection interfaces harder to use than
> > necessary. Interfaces are only useful when their contract is well defined. The more
> > things an interface allows or leaves unspecified, the less useful it is for its users.
> >
> > I know that the collection interfaces allow this, but you have to ask yourself this
> > important question: how useful is an interface that makes almost no guarantees about
> > what its methods will do? Interfaces like `List`, `Map` and `Set` are passed as
> > method parameters a lot, and to make these useful, implementations of these
> > interfaces should do their best to provide as consistent an experience as is
> > reasonably possible. The alternative is that these interfaces will slowly decline in
> > usefulness as methods will start asking for `ArrayList` instead of `List` to avoid
> > having to deal with a too lenient specification.
> >
> > With the collection interfaces I get the impression that recently there has been too
> > much focus on what would be easy for the collection implementation instead of what
> > would be easy for the users of said interfaces. In my opinion, the concerns of the
> > user of interfaces almost always far outweigh the concerns of the implementors of
> > said interfaces.
> >
> > In the past, immutable collections were rare, but they get are getting more and more
> > common all the time. For example, in unit testing. Unfortunately, these immutable
> > collections differ quite a bit from their mutable counterparts. Some parts are only
> > midly annoying (not accepting `null` as the **value** in a `Map` for example), but
> > other parts require code to be rewritten for it to be able to work as a drop-in
> > replacement for the mutable variant. A simple example is this:
> >
> > public void addShoppingItems(Collection<String> shoppingItems) {
> > if (shoppingItems.contains(null)) { // this looks quite reasonable and
> > safe...
> > throw new IllegalArgumentException("shoppingItems should not contain
> > nulls");
> > }
> >
> > this.shoppingItems.addAll(shoppingItems);
> > }
> >
> > Testing this code is annoying:
> >
> > x.addShoppingItems(List.of("a", null")); // can't construct immutable
> > collection with null
> >
> > x.addShoppingItems(Arrays.asList("a", null")); // fine, go back to what we
> > did before then...
> >
> > The above problems, I suppose we can live with it; immutable collections don't want
> > `null` although I don't see any reason to not allow it as I can write a similar
> > `List.of` that returns immutable collections that do allow `null`. For JDK code this
> > is a bit disconcerting, as it is supposed to be as flexible as is reasonable without
> > having too much of an opinion about what is good or bad.
> >
> > This next one however:
> >
> > assertNoExceptionThrown(() -> x.addShoppingItems(List.of("a", "b"))); // not
> > allowed, contains(null) in application code throws NPE
> >
> > This is much more insidious; the `contains(null)` check has been a very practical
> > way to check if collections contain null, and this works for almost all collections
> > in common use, so there is no problem. But now, more and more collections are
> > starting to just throw NPE immediately even just to **check** if a null element is
> > present. This only serves to distinguish themselves loudly from other similar
> > collections that will simply return `false`.
> >
> > I think this behavior is detrimental to the java collection interfaces in general,
> > especially since there is a clear answer to the question if such a collection
> > contains null or not. In fact, why `contains` was ever allowed to throw an exception
> > aside from "UnsupportedOperationException" is a mystery to me, and throwing one when
> > one could just as easily return the correct and expected answer seems very opiniated
> > and almost malicious -- not behavior I'd expect from JDK core libraries.
> >
> > Also note that there is no way to know in advance if `contains(null)` is going to be
> > throwing the NPE. If interfaces had a method `allowsNulls()` that would already be
> > an improvement.
> >
> > --John
> >
> >
More information about the core-libs-dev
mailing list