Collection::getAny discussion

Donald Raab donraab at gmail.com
Sat May 1 00:07:25 UTC 2021


To clarify, RichIterable is not a subclass of Collection. As we discovered in JDK 15, a problem exists when we add default methods to interfaces that might get “mixed” with other interfaces that already have those methods. There are a few potential issues with adding zero argument default methods to common interfaces. The two easiest to reason about that we have experience with are:

1. Signatures don’t match (e.g. getAny() returns Optional) - very bad - lots of pain caused - forces backwards incompatible change to library APIs - this happened when default sort() was added to List and returned void
2. Signatures match, but no concrete implementations when mixing competing default implementations - work for library developers - forces clients to upgrade to new version of library to use new version of JDK (e.g. EC 10.4 upgrade to work with JDK 15 for CharSequence.isEmpty())

https://stuartmarks.wordpress.com/2020/09/22/incompatibilities-with-jdk-15-charsequence-isempty/ 

I think adding getAny() to Collection makes sense, that’s why we have defined the method on RichIterable. For Eclipse Collections, option #2 is the only option that works (getAny() returns E). It will cause us OSS library maintainers a bunch of work and a release upgrade. Knowing the direction and testing with early access versions (as we do) prior to our next release will help so we can start coding out any necessary concrete implementations wherever they don’t exist in the hierarchy.

The default getAny() implementation on RichIterable is different than just calling iterator.next() though. The getAny() method is currently defined as calling getFirst() which will return null in the case an iterable is empty. 

So this creates a new kind of potential issue we haven’t experienced where we could wind up with two getAny() methods with the same signature but with different specifications and results on empty. This will be potentially confusing for Eclipse Collections users as they could get different behavior from any JDK collections for getAny(), or libraries they use could get different behavior for Eclipse Collections types that extend JDK types. Do we change the Eclipse Collections specification to match the new JDK specification? Will this break anyone that currently uses it by causing new unexpected Runtime exceptions? I honestly don’t know.
 
For additional reference, there is also a getOnly() method on RichIterable which behaves differently than getAny(). 

https://www.eclipse.org/collections/javadoc/10.4.0/org/eclipse/collections/api/RichIterable.html#getOnly()

There are also getFirst() and getLast() methods on RichIterable which were deprecated in 6.0 but will never be removed. They were added to OrderedIterable where they make more sense.

 

> On Apr 30, 2021, at 4:14 PM, Brian Goetz <brian.goetz at oracle.com> wrote:
> 
> While I agree that we should be careful, let's not paint ourselves into an either/or strawman.  The choice is not "never add anything to Collection" vs "let's dump every silly idea that comes to anyone's mind into Collection"; it is, as always, going to involve tradeoffs between stability and evolution.  
> 
> We cannot constrain ourselves so hard that we cannot evolve the core libraries because it might collide with someone else's subclass.  That's not reasonable, nor is that good for Java.
> 
> On the other hand, we must continue to exercise care in many dimensions when adding to libraries that are widely used and implemented -- which we already do (so much so, in fact, that people are often frustrated by our seeming conservatism.)  
> 
> 
> 
> 
> 
> 
> 
> On 4/30/2021 4:02 PM, Donald Raab wrote:
>> There is a default method getAny defined on the RichIterable interface in Eclipse Collections. Adding a getAny with the same signature to Collection is bound to cause a break similar to CharSequence.isEmpty did with JDK 15 but possibly more extensive since RichIterable is the parent interface for all collection types in Eclipse Collections. Adding it with a different signature (returns Optional) could cause extensive damage.
>> 
>> https://www.eclipse.org/collections/javadoc/10.4.0/org/eclipse/collections/api/RichIterable.html#getAny() <https://www.eclipse.org/collections/javadoc/10.4.0/org/eclipse/collections/api/RichIterable.html#getAny()> 
>> 
>> I highly recommend we stop looking to add new zero-argument default methods to 20+ year Collection interfaces and hope that we don’t break anything. There seems to be desire to breathe life into the old Collection interfaces. IMO, we should just start planning and focusing on a Collections 2.0 design.
>> 
>> 
>>> On Apr 30, 2021, at 2:49 PM, Stuart Marks <stuart.marks at oracle.com> <mailto:stuart.marks at oracle.com> wrote:
>>> 
>>> Hi Henri,
>>> 
>>> I've changed the subject of this thread because I think it's out of scope of the ReversibleCollection proposal. I don't mean to say that we can't talk about it, but I would like it to be decoupled from ReversibleCollection. I'm somewhat arbitrarily calling it "Collection::getAny" because something similar to that was mentioned by both Remi and Peter elsewhere in this thread. There is also a bunch of history in the bug database that contains related ideas.
>>> 
>>> Before we dive in, I want to explain why this is separate from ReversibleCollection. Most of the ideas, including yours, involve an implementation that does something like `iterator().next()`, in other words, getting the "first" element from an Iterator. Hey, there's getFirst() in ReversibleCollection, let's use that! No. The "first" element of an iterator is in general an arbitrary element, which is different from the "first" element in the structural ordering of elements provided by a ReversibleCollection. The "arbitrary" notion is captured by "getAny" so that's what I'll use as a working term. (Of course any actual API we might add would have a better name if we can find one.)
>>> 
>>> For a historical perspective, let's dig into the bug database and take a look at this bug:
>>> 
>>> https://bugs.openjdk.java.net/browse/JDK-4939317 <https://bugs.openjdk.java.net/browse/JDK-4939317>
>>> 
>>> This requests a method Collection.get(Object). This searches the collection for an element that equals() the argument and returns the element, or it returns null if the element isn't found. (Recall in those days it was impossible to add a method to an interface without breaking compatibility, so it also proposes various workarounds that are no longer necessary.)
>>> 
>>> There's a comment from Josh Bloch saying that Collection should have had a get() method as well as a no-arg remove() method. Well ok then! And he points to the then-new Queue interface that was delivered in Java SE 5. Queue adds the following methods that seem relevant to this discussion:
>>> 
>>> * E element() -- gets the head element, throws NSEE if empty
>>> * E remove() -- removes and returns the head element, throws NSEE if empty
>>> 
>>> (It also adds peek() and poll(), which are similar to the above except they return null if empty.)
>>> 
>>> This is kind of odd, in that none of these methods satisfy what the bug's submitter was requesting, which is a one-arg get() method. Also, these methods are on Queue, which doesn't really help with collections in general.
>>> 
>>> You're asking for something that's somewhat different, which you called the "find the first element when there is only one" problem. Here, there's a precondition that the collection have a single element. (It's not clear to me what should happen if the collection has zero or more than one element.)
>>> 
>>> To throw a couple more variations into the mix, Guava has a couple Collectors (for streams) that do interesting things. The class is MoreCollectors:
>>> 
>>> https://guava.dev/releases/30.1.1-jre/api/docs/com/google/common/collect/MoreCollectors.html <https://guava.dev/releases/30.1.1-jre/api/docs/com/google/common/collect/MoreCollectors.html>
>>> 
>>> and the collectors are:
>>> 
>>> * onlyElement -- if source has 1 element, returns it; throws NSEE if empty, IAE if > 1
>>> * toOptional -- if source has 0 or 1 elements, returns an Optional; otherwise throws
>>> 
>>> These apply to streams, but I think you can see the applicability to Collection as well. In particular, your proposal is similar to what onlyElement would look like if it were on Collection.
>>> 
>>> Let's summarize the variations so far:
>>> 
>>> * preconditions: exactly one element, at-most-one, at-least-one
>>> * behavior if preconditions not met: return null, return empty Optional, throw
>>>   exception
>>> * remove element or just peek
>>> * match a particular element, or return an arbitrary element
>>> 
>>> That's a lot of variations!
>>> 
>>> Before we talk about specific APIs, though, I wanted to talk about use cases. Which of these variations are more useful or less useful? Which are likely to appear in code? Henri gave a fairly specific example with a reasonable "success" case (preconditions met) but it's not clear what should happen if the preconditions aren't met. Clearly an API would have to choose. What would the use site look like? In particular, does the use site always have to check for null, or catch an exception, or something?
>>> 
>>> Answers to these questions will help determine what APIs, if any, we might want to add.
>>> 
>>> ***
>>> 
>>> There's another thing looming in the distance, which is pattern matching. You might have seen one of Brian Goetz's talks on pattern matching futures. You can get a glimpse of some upcoming pattern matching features in JEP 405.
>>> 
>>>     http://openjdk.java.net/jeps/405 <http://openjdk.java.net/jeps/405>
>>> 
>>> In particular, this JEP extends pattern matching with an /extraction/ step, where, if there is a match, record or array components can be extracted and bound to local variables. This is a step closer to /deconstruction patterns/, where arbitrary classes and interfaces (not just records or arrays) can participate in pattern matching. (See discussion of this at the end of the JEP.)
>>> 
>>> Deconstruction patterns apply directly to this discussion. For example, Henri's example could be rewritten like this, using pattern matching:
>>> 
>>>     Set<String> s = ... ;
>>>     if (s instanceof Set.containing(String string)) {
>>>         // use string, which is the only element
>>>     } else {
>>>         // handle failure
>>>     }
>>> 
>>> This "containing" thing would be a deconstruction pattern declared on Set or Collection (or possibly elsewhere), and as I've written here, it implicitly matches sets that contain exactly one element. Taking a nod from JEP 405's varargs-style pattern, extracting an arbitrary element from a set that contains one or more elements might look like this:
>>> 
>>>     Set<String> s = ... ;
>>>     if (s instanceof Set.containing(String string, ...)) { // varargs style
>>>         // use string, which is an arbitrary element
>>>     } else {
>>>         // handle failure
>>>     }
>>> 
>>> This deconstruction pattern addresses a bunch of the design decisions all at once. It handles one or at-least-one element. The preconditions-not-met case is handled by matching failure. Matching failure also sidesteps the issue of whether we return null, return an Optional, or throw an exception. Since pattern matching is inherently conditional, you can't forget to check for null or catch an exception (and you won't have to deal with that darned Optional API). The only thing this doesn't handle is removing an element. I don't think we want pattern matching to have side effects, and element removal is probably pretty rare. But we can still discuss whether it's a valuable case to support.
>>> 
>>> In summary, I think it's important and useful to have a conversation about use cases, what problems we are trying to achieve, and what we think the code at the call site should look like. That can be used to drive API discussions, such as adding new methods. It can /also/ be used to drive discussion about deconstruction patterns that would be useful to add to collections. Furthermore, since the pattern matching language feature is being designed right now, this discussion can help by providing information of the form "libraries need to be able to do this kind of pattern matching".
>>> 
>>> Thanks.
>>> 
>>> s'marks
>>> 
>>> 
>>> 
>>> On 4/28/21 6:23 AM, Henri Tremblay wrote:
>>>> Hi,
>>>> 
>>>> (I think the first version of this message never went through, so here it goes just in case)
>>>> 
>>>> I just read quickly the proposal and it made me think about another common issue. I wonder if we could tackle it as well.
>>>> I will call it the "find the first element when there is only one" problem.
>>>> It happens on unordered collections like a HashSet. It forces to do
>>>> 
>>>> Set<String> s = new HashSet<>();
>>>> if (s.size() == 1) {
>>>>    return s.iterator().next();
>>>>    // or
>>>>    return s.stream().findFirst().get();
>>>> }
>>>> 
>>>> Which is a lot of ugliness and object instantiations just to get the first element.
>>>> I would be nice to have a public T findFirst() directly on Iterable<T>. With a default implementation returning iterator().next(). Things like ArrayList will want to override will return elementData[0]. It would return null when the list is empty. Or NoSuchElementException.
>>>> It needs to be polished of course but will that be acceptable?
>>>> 
>>>> Thanks,
>>>> Henri
> 



More information about the core-libs-dev mailing list