NPE throwing behavior of immutable collections
Glavo
zjx001202 at gmail.com
Tue Jan 31 21:12:24 UTC 2023
I understand that null is prohibited by default, but can we also provide a
set of factory methods that accept null?
They can be named like List.ofNullable/copyOfNullable.
On Tue, Jan 31, 2023 at 10:13 AM Stuart Marks <stuart.marks at oracle.com>
wrote:
> In this reply I'll focus on the null handling issues in collections.
>
> As you've noted, there are really (at least) two distinct issues here:
> whether a
> collection permits nulls, and the behavior of contains(null) queries.
> There have
> been continual complaints about both, and the issues are somewhat distinct.
>
> The collection implementations in the JDK are all over the place with
> regard to
> nulls. I believe this is because the original collections and the
> concurrent
> collections up through JDK 1.6 or so were mostly done by Josh Bloch and
> Doug Lea,
> who disagreed about null handling. They had this exchange in 2006:
>
> Doug Lea wrote: [1]
>
> > The main reason that nulls aren't allowed in ConcurrentMaps
> > (ConcurrentHashMaps, ConcurrentSkipListMaps) is that
> > ambiguities that may be just barely tolerable in non-concurrent
> > maps can't be accommodated. The main one is that if
> > map.get(key) returns null, you can't detect whether the
> > key explicitly maps to null vs the key isn't mapped.
> > In a non-concurrent map, you can check this via map.contains(key),
> > but in a concurrent one, the map might have changed between calls.
> >
> > Further digressing: I personally think that allowing
> > nulls in Maps (also Sets) is an open invitation for programs
> > to contain errors that remain undetected until
> > they break at just the wrong time. (Whether to allow nulls even
> > in non-concurrent Maps/Sets is one of the few design issues surrounding
> > Collections that Josh Bloch and I have long disagreed about.)
>
>
> Josh Bloch wrote: [2]
>
> > I have moved towards your position over the years. It was probably a
> > mistake to allow null keys in Maps and null elements in Sets. I'm
> > still not sure about Map values and List elements.
> >
> > In other words, Doug hates null more than I do, but over the years
> > I've come to see it as quite troublesome.
>
>
> (I haven't talked to Josh or Doug about this particular issue recently, so
> I suppose
> it's possible their opinions have changed in the intervening time.)
>
> I had to decide what to do about nulls when I added the unmodifiable
> collections in
> JDK 9. Given that Doug "hates" nulls, and Josh finds them at least quite
> troublesome, I started from a position of disallowing null members in all
> the new
> collections. Most of the unmodifiable collections are still this way (but
> see below).
>
> What about contains(null)? Surely it should be ok to query for a null, and
> return
> false, even if the collection itself can't contain null. However, a bunch
> of
> collections and collection views in the JDK throw NPE on contains(null),
> so the
> unmodifiable collections don't set a precedent here.
>
> If you're dealing with all non-null values, and a null creeps in somehow,
> then
> calling contains(null) might be an error, and it would be good for the
> library to
> inform you of that instead of just returning false and letting the program
> continue
> until it fails later (fail-fast). A similar issue arises with the
> non-generic
> contains() method. If you have a Collection<Integer>, then shouldn't
> contains("abc")
> be an error? It's not, and it simply returns false, because the signature
> is
> contains(Object). But there have been complaints that this should have
> been an error
> because a Collection of Integer cannot possibly contain a String. Similar
> reasoning
> applies to contains(null) on a collection that can't contain null. (Yes,
> the error
> occurs at runtime instead of compile time, but better late than never.)
>
> A meta-issue is: should a new thing correct a perceived design flaw in the
> older
> thing, or should it be consistent with the old thing? Either way, you lose.
>
> Another factor in my original decision was that it's much easier to start
> with a
> restriction and relax it later than it is to go the other way. We have
> some
> flexibility to allow nulls and contains(null) if we want; but it's
> essentially a
> non-starter to disallow nulls once they've been allowed somewhere.
>
> That's why my starting position for the design prohibited nulls everywhere.
>
> Over time we've relaxed some restrictions around null handling in the
> unmodifiable
> collections. Streams permit nulls, and the List returned from
> Stream.toList() also
> permits nulls, and it also allows contains(null). So there's a different
> unmodifiable List implementation under there, accessible only via streams.
> (Note
> that the null-permitting unmodifiable lists lose the optimizations for
> sizes 0, 1,
> and 2, which are in part based on nulls being disallowed.)
>
> I'm fairly sympathetic to the case for changing the unmodifiable
> collections to
> allow contains(null), given that the current NPE-throwing behavior has
> generated a
> fair amount of complaints over the years. And the situation with having
> two flavors
> of unmodifiable list is quite odd and is an uncomfortable point in the
> design space.
>
> I should point out however that even if the unmodifiable collections were
> changed to
> allow contains(null), it's still kind of unclear as to whether this code
> is safe:
>
> public void addShoppingItems(Collection<String> shoppingItems) {
> if (shoppingItems.contains(null)) { // this looks quite reasonable
> and safe...
> throw new IllegalArgumentException("shoppingItems should not
> contain nulls");
> }
> ...
> }
>
> If you're only ever passing ArrayList, Arrays.asList(), and List.of()
> lists here,
> then sure, this would work great. If the source collection is of unknown
> provenance,
> you can still run into trouble:
>
> 1. contains(null) calls on the keySet and values collections of
> ConcurrentHashMap
> and Hashtable all throw NPE. So does contains(null) on a set from
> Collections.newSetFromMap(), which is built from those maps' keySets. (I
> suppose
> Doug Lea might be convinced to relax the CHM behavior.)
>
> 2. Speaking of concurrency, if the source collection is being modified
> concurrently,
> then a null could be introduced after the check (TOCTOU) so you'd need to
> make a
> defensive copy before checking it.
>
> 3. NavigableSet and friends delegate contains() to the comparison method,
> which
> might or might not accept nulls. There's no way the collection
> implementation can
> tell in advance.
>
> You might consider List.copyOf() as an alternative, as that makes a
> null-free
> defensive copy in one shot, and avoids copying if the source is an
> unmodifiable
> list. If you don't care about concurrency, then
>
> shoppingItems.forEach(Objects::requireNonNull)
>
> might be suitable.
>
> If unmodifiable collections were to change to allow contains(null), this
> probably
> would make them somewhat less annoying, but it wouldn't necessarily solve
> yours or
> everyone's problems. There's simply no way at this point to make the
> collections'
> treatment of nulls uniform.
>
> s'marks
>
>
> [1]
>
> https://web.archive.org/web/20210510190135/https://cs.oswego.edu/pipermail/concurrency-interest/2006-May/002485.html
>
> [2]
>
> https://web.archive.org/web/20200930213909/http://cs.oswego.edu/pipermail/concurrency-interest/2006-May/002486.html
>
>
>
>
> On 1/29/23 7:28 AM, John Hendrikx wrote:
> > TLDR; why does contains(null) not just return false for collections that
> don't allow
> > nulls. Just because the interface allows it, does not mean it should do
> it as it
> > devalues the usefulness of the abstraction provided by the interface.
> >
> > Background:
> >
> > I'm someone that likes to check correctness of any constructor or method
> parameter
> > before allowing an object to be constructed or to be modified, in order
> to maintain
> > invariants that are provided by the class to its users.
> >
> > This ranges from simple null checks, to range checks on numeric values,
> to complete
> > checks like "is a collection sorted" or is a list of nodes acyclic.
> Anything I can
> > check in the constructor that may avoid problems further down the line
> or that may
> > affect what I can guarantee on its own API methods. For example, if I
> check in the
> > constructor that something is not null, then the associated getter will
> guarantee
> > that the returned value is not null. If I check that a List doesn't
> contain nulls,
> > then the associated getter will reflect that as well (assuming it is
> immutable or
> > defensivily copied).
> >
> > For collections, this is currently becoming harder and harder because
> more and more
> > new collections are written to be null hostile. It is fine if a
> collection doesn't
> > accept nulls, but throwing NPE when I ask such a collection if it
> contains null
> > seems to be counter productive, and reduces the usefulness of the
> collection
> > interfaces.
> >
> > This strict interpretation makes the collection interfaces harder to use
> than
> > necessary. Interfaces are only useful when their contract is well
> defined. The more
> > things an interface allows or leaves unspecified, the less useful it is
> for its users.
> >
> > I know that the collection interfaces allow this, but you have to ask
> yourself this
> > important question: how useful is an interface that makes almost no
> guarantees about
> > what its methods will do? Interfaces like `List`, `Map` and `Set` are
> passed as
> > method parameters a lot, and to make these useful, implementations of
> these
> > interfaces should do their best to provide as consistent an experience
> as is
> > reasonably possible. The alternative is that these interfaces will
> slowly decline in
> > usefulness as methods will start asking for `ArrayList` instead of
> `List` to avoid
> > having to deal with a too lenient specification.
> >
> > With the collection interfaces I get the impression that recently there
> has been too
> > much focus on what would be easy for the collection implementation
> instead of what
> > would be easy for the users of said interfaces. In my opinion, the
> concerns of the
> > user of interfaces almost always far outweigh the concerns of the
> implementors of
> > said interfaces.
> >
> > In the past, immutable collections were rare, but they get are getting
> more and more
> > common all the time. For example, in unit testing. Unfortunately, these
> immutable
> > collections differ quite a bit from their mutable counterparts. Some
> parts are only
> > midly annoying (not accepting `null` as the **value** in a `Map` for
> example), but
> > other parts require code to be rewritten for it to be able to work as a
> drop-in
> > replacement for the mutable variant. A simple example is this:
> >
> > public void addShoppingItems(Collection<String> shoppingItems) {
> > if (shoppingItems.contains(null)) { // this looks quite
> reasonable and
> > safe...
> > throw new IllegalArgumentException("shoppingItems should
> not contain
> > nulls");
> > }
> >
> > this.shoppingItems.addAll(shoppingItems);
> > }
> >
> > Testing this code is annoying:
> >
> > x.addShoppingItems(List.of("a", null")); // can't construct
> immutable
> > collection with null
> >
> > x.addShoppingItems(Arrays.asList("a", null")); // fine, go back
> to what we
> > did before then...
> >
> > The above problems, I suppose we can live with it; immutable collections
> don't want
> > `null` although I don't see any reason to not allow it as I can write a
> similar
> > `List.of` that returns immutable collections that do allow `null`. For
> JDK code this
> > is a bit disconcerting, as it is supposed to be as flexible as is
> reasonable without
> > having too much of an opinion about what is good or bad.
> >
> > This next one however:
> >
> > assertNoExceptionThrown(() -> x.addShoppingItems(List.of("a",
> "b"))); // not
> > allowed, contains(null) in application code throws NPE
> >
> > This is much more insidious; the `contains(null)` check has been a very
> practical
> > way to check if collections contain null, and this works for almost all
> collections
> > in common use, so there is no problem. But now, more and more
> collections are
> > starting to just throw NPE immediately even just to **check** if a null
> element is
> > present. This only serves to distinguish themselves loudly from other
> similar
> > collections that will simply return `false`.
> >
> > I think this behavior is detrimental to the java collection interfaces
> in general,
> > especially since there is a clear answer to the question if such a
> collection
> > contains null or not. In fact, why `contains` was ever allowed to throw
> an exception
> > aside from "UnsupportedOperationException" is a mystery to me, and
> throwing one when
> > one could just as easily return the correct and expected answer seems
> very opiniated
> > and almost malicious -- not behavior I'd expect from JDK core libraries.
> >
> > Also note that there is no way to know in advance if `contains(null)` is
> going to be
> > throwing the NPE. If interfaces had a method `allowsNulls()` that would
> already be
> > an improvement.
> >
> > --John
> >
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/core-libs-dev/attachments/20230201/9a160cf4/attachment-0001.htm>
More information about the core-libs-dev
mailing list