Addition of Predicate-based findIndex and findLastIndex methods to java.util.List

Holo The Sage Wolf holo3146 at gmail.com
Sat Apr 20 11:09:59 UTC 2024


Hello,

I want to challenge the idea that every stream may be indexed for 3 reasons:

Any sensible type to use for an index (int/long) will be bounded, but
streams need not, e.g.
    new Random().ints().withIndex().forEach(v -> ...);
May result (after a very long time) non sensical values. Using bigInteger
will solve this problem but would be silly as in practice 99.99999% of the
times long is more than enough (and 90% int is probably enough)

The two other problems have to do with the semantics of "index":
Index gives the impression that the "index" is a real position, but the
stream may be truly non deterministic (using e.g. Hardware Random Stream)
Index gives the impression that the "index" respects the canonical order of
the underlying collection (if exists), that is:
var x = myList.stream().parallel()
                .withIndex().findAny().get();
assert x.value() == myList.get(x.index()); // implementation dependent

Which I don't think is possible without enormous performance problems.

---

To be clear about my position: I like and want the idea. The first problem,
in my opinion, shouldn't block it, but we should document it. That being
said, I don't think we should be calling this "index".

On Sat, 20 Apr 2024, 11:54 -, <liangchenblue at gmail.com> wrote:

> Hi Rotan',
> Interface methods are both exposed to callers and to implementations. Is
> there any scenario where the implementation can be more efficient than the
> stream/gatherer scanning approach?
>
> indexOf or lastIndexOf may be faster if a list has some tricks like a
> hashmap to quickly look up the index of an item. In contrast, the predicate
> looks too generic and barely optimizable by list implementations; even if
> we can stream in parallel, this is already covered by the stream API, so we
> can simply declare a Gatherer to perform this findIndex operation on a
> stream of elements (also note that a list can be reversed with
> List::reversed() since JDK 21, so the same Gatherer can be reused for
> findLastIndex)
>
> I think if there is no scenario where implementation can do better with
> findIndex, the findIndex should become a Gatherer, which has better
> compatibility.
>
> Best,
> Chen Liang
>
> On Fri, Apr 19, 2024 at 8:51 PM ІП-24 Олександр Ротань <
> rotan.olexandr at gmail.com> wrote:
>
>> Turned out this whole time I was only replying to David. I have to thank
>> him for his patience explaining me how this whole mailing thing work.
>>
>> To summarize what has been missed out, besides what have you seen in
>> quotes. I have pointed out that, while flexibility and simplicity is
>> important, the main reason why I have decided to propose this feature is to
>> enhance developer experience.
>>
>> I know that conciseness is not a main goal of Java, and therefore my
>> proposal is kind of contradicts the general direction of development of
>> Java as language, but my desire to make Java more user friendly is based on
>> personal experience of introducing fresh devs or switchers to java and
>> their feedback. Therefore, while it is not a priority for community, I
>> really hope you guys don't mind me working at a backrank of Java making it
>> a bit prettier.
>>
>> On Sat, Apr 20, 2024, 02:56 David Alayachew <davidalayachew at gmail.com>
>> wrote:
>>
>>> > That was what i referred to as "Map-solvable" problems,
>>> > but if this way of doing it is a thing, then I agree with
>>> > the point.
>>>
>>> Doing it the Map way would not only be slower, but it would force you to
>>> include more concepts than you need. Your identifier is your index.
>>> Lists/arrays have index. Bringing a map in only complicates things.
>>>
>>> As for the rest of your response, I think I understand the trouble here.
>>> You put it best when you said that you are introducing this feature with
>>> the intent of improving the developer experience, specifically through
>>> making code more concise.
>>>
>>> I can understand that goal, and it is a good goal to strive for, but
>>> conciseness is one of the last things Java aims to provide as a language or
>>> through its standard library. I won't tell you not to do it, but I will
>>> tell you that there will be significant push back from folks here, and for
>>> good reason -- conciseness is not the priority here at Java. At Java, the
>>> lack of conciseness is not a weakness. Sure, unneeded verbosity can be a
>>> weakness, but the solution I suggested above is basically the tempo that
>>> Java strives for -- compose simple concepts so that you can create what you
>>> need.
>>>
>>> All of that is to say, you have made your point clear. You goal is
>>> conciseness and ease of access. Understanding that, it's clear that my
>>> solution is not concise enough to meet that goal, so I won't push it
>>> further.
>>>
>>> On Fri, Apr 19, 2024 at 7:26 PM ІП-24 Олександр Ротань <
>>> rotan.olexandr at gmail.com> wrote:
>>>
>>>> " 1. Fetching data from same-sized lists -- This is a trick from
>>>> Data-Oriented Design (not to be confused with Data-Oriented Programming)
>>>> where you filter down to the indexes that you want, and use the indexes to
>>>> fetch related values from the respective lists. So, I would filter down to
>>>> Player1, Player4, and Player6, then I would use their indexes to do lookups
>>>> in PlayerItemList, PlayerSkinList, etc. This tactic is especially useful
>>>> when working with structs (or value classes when Java gets them)."
>>>>
>>>> That was what i referred to as "Map-solvable" problems, but if this way
>>>> of doing it is a thing, then I agree with the point.
>>>>
>>>> "By this logic, many of the methods in the Stream library should
>>>> instead be in the Collection library. There are several emails on the
>>>> lambda mailing list that address why this wasn't done. I imagine that Rémi
>>>> himself could point them out to you."
>>>>
>>>> It may be just my opinion, but I don't see anything harmful in some
>>>> shortcuts that could be optimized, especially if they still converge to a
>>>> single point under the hood. I would love to read mails about maintainers'
>>>> opinions on why some features of Streams couldnt be also a part of
>>>> Collection library if it would enhance developer experience, so, if it
>>>> wouldn't be hard for you, could you attach some links? Also I guess a good
>>>> compromise would be extension methods, I am currently exploring
>>>> possibilities of their implementation.
>>>>
>>>> "Well hold on. I am not at all saying that it is redundant. I am saying
>>>> that the functionality that you desire can be easily recreated by using a
>>>> more far-reaching feature, the .withIndex() idea that Rémi suggested. I am
>>>> just trying to say that your idea probably should not be done on the List
>>>> interface."
>>>>
>>>> I am not saying that .withIndex() is a bad idea in any way. I can
>>>> remember several times when I was desperately looking for something like
>>>> python enumerate(), but all I found were some IntStream-based approaches.
>>>> What I am saying is, while there is definitely room for both, withIndex
>>>> does not solve the initial problem that motivated me to propose such
>>>> changes: redundant (emphasis on this) verbosity. The main reason I decided
>>>> to become an openjdk contributor is because I love to create tools that
>>>> make development easier and more pleasant, and I feel more sophisticated,
>>>> although a more specific API would deal much better with this task, and as
>>>> this is my personal priority, I would stand my ground here.
>>>>
>>>> "Regardless, Valhalla is not here yet. So at least for now, I will
>>>> concede your point about performance."
>>>>
>>>> Valhalla, at least for now, is nowhere near to a release even as a
>>>> preview, would be great if it will get to Java releases until the next LTS.
>>>> I think that even, even for a 2-4 years, this point is valid, then it at
>>>> least should be taken into consideration.
>>>>
>>>> PS: As you might notice, I am a developer experience guy rather than a
>>>> flexible guy. My opinion is a result of some personal beliefs, but also is
>>>> an aggregation of opinions of numbers of other programmers that I
>>>> introduced to Java, especially switchers. Many of then feel those
>>>> "wordiness" of Java and it repulses potential developers, especially when
>>>> there is C# just by the corner. I think that Java has really strong sides,
>>>> but I would love to cover its "weaknesses" as well, especially if it
>>>> doesn't come with any maintenance burdens.
>>>>
>>>> сб, 20 апр. 2024 г. в 02:00, David Alayachew <davidalayachew at gmail.com
>>>> >:
>>>>
>>>>> No confusion. You spoke clearly enough to be understood the first time.
>>>>>
>>>>> On Fri, Apr 19, 2024 at 6:28 PM ІП-24 Олександр Ротань <
>>>>> rotan.olexandr at gmail.com> wrote:
>>>>>
>>>>>> Regarding the first point in my last message. "Them" is referring to
>>>>>> the developer, and "evaluate" means to be processed by the developer. Sorry
>>>>>> for the potential confusion
>>>>>>
>>>>>> сб, 20 апр. 2024 г. в 00:52, ІП-24 Олександр Ротань <
>>>>>> rotan.olexandr at gmail.com>:
>>>>>>
>>>>>>> I think that, while finding the index of a matching element is not a
>>>>>>> very common task, especially among commercial developers, when discussing
>>>>>>> indexed streams, this would be the most common use case. To be honest, I
>>>>>>> don't even see any other use cases right now, besides maybe some processing
>>>>>>> based on divisibility of index (odd/even as the most common one). I have a
>>>>>>> pretty strong background in competitive programming, more than 5 years, and
>>>>>>> can remember anything more complex than that that could not be replaced
>>>>>>> with Map usage.
>>>>>>>
>>>>>>> Regarding clarity, I fully agree with you. I am not a fan of those
>>>>>>> unreadable abbreviations in C/C++ or even in python. However, while it's
>>>>>>> hard to argue that your approach is more flexible, it has two downsides:
>>>>>>>
>>>>>>> 1. Although indexed streams are flexible, the flexibility might come
>>>>>>> at the cost of readability. Syntax like list.FindIndex is pretty
>>>>>>> straightforward and speaks for itself, while indexed-stream-based approach
>>>>>>> requires developer to explicitly evaluate what that chain of method does,
>>>>>>> which would distract them from understanding of general program logic flow,
>>>>>>> while usually being no more then utility. The reason for existence of
>>>>>>> findIndex is the same as for indexOf, so saying one is redundant means
>>>>>>> saying other one is redundant too.
>>>>>>>
>>>>>>> 2. Streams are performance-costy, thats the point that is hard to
>>>>>>> argue with. While list implementations could provide specified
>>>>>>> implementations of findIndex based on their internal structure, streams
>>>>>>> work only with generic interfaces. Moreover, even default implementation
>>>>>>> that uses ListIterator would be times faster and less resource consuming
>>>>>>> then generic stream-based approach. While I understand that performance is
>>>>>>> not as valuable as flexibility and clarity, it is still crucial, at least
>>>>>>> because of it being the main factor server maintenance cost after all.
>>>>>>>
>>>>>>> сб, 20 апр. 2024 г. в 00:32, David Alayachew <
>>>>>>> davidalayachew at gmail.com>:
>>>>>>>
>>>>>>>> It is not the goal of Java to be verbose, but it is also not the
>>>>>>>> goal of Java to be concise either. It is the goal of Java to be clear,
>>>>>>>> correct, and readable, in order to maximize maintainability, correctness,
>>>>>>>> and flexibility. When you focus on accomplishing that goal first, you will
>>>>>>>> find that conciseness takes care of itself.
>>>>>>>>
>>>>>>>> I understand that my approach is wordier than yours. Please also
>>>>>>>> consider that my suggestion does exactly one task -- stream an element with
>>>>>>>> its respective index. And yet, the number of ways that that can be used
>>>>>>>> covers hundreds of use cases while still being incredibly easy to maintain
>>>>>>>> and optimize. Far more than what you suggest. My feature easily composes
>>>>>>>> with others, with anything that can take a Stream<T> in fact.
>>>>>>>>
>>>>>>>> The bar to adding a feature to Java, even to its libraries, is
>>>>>>>> high. In fact, there is even a breaking point where a language/library can
>>>>>>>> just be too dense to work with. Too many methods, too much complexity.
>>>>>>>>
>>>>>>>> Java makes it a priority to limit adding features or functionality
>>>>>>>> wherever possible, only relenting when the benefit significantly outweighs
>>>>>>>> the cost. Adding a method like yours that only serves a tiny slice of use
>>>>>>>> cases just does not meet the goal of what Java is aiming for -- to get the
>>>>>>>> biggest bang for buck when adding a feature to language/library.
>>>>>>>>
>>>>>>>> I think that your pain point is significant and worth addressing.
>>>>>>>> But I think it should be done in a way that services a much larger group of
>>>>>>>> users and use cases. I think my idea does that.
>>>>>>>>
>>>>>>>> On Fri, Apr 19, 2024 at 5:14 PM ІП-24 Олександр Ротань <
>>>>>>>> rotan.olexandr at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> streamWithIndex() is indeed an interesting approach, however,
>>>>>>>>> working with pairs of values is not always comfortable as this may make
>>>>>>>>> code too wordy, which may be fixed using things like properties and tuples
>>>>>>>>> (which I am currently working on right now btw), but still code like
>>>>>>>>> list.streamWithIndex().filter(Predicate).findFirst().orElse(-1) is still
>>>>>>>>> long, thats why i think there is room for both in Java.
>>>>>>>>>
>>>>>>>>> Also maybe instead of returning traditional -1 it would be more
>>>>>>>>> suitable to use OptionalInt instead, but I would like to hear community
>>>>>>>>> opinion about that.
>>>>>>>>>
>>>>>>>>> сб, 20 апр. 2024 г. в 00:02, David Alayachew <
>>>>>>>>> davidalayachew at gmail.com>:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> No Rémi, I don't think your idea is the right approach. You are
>>>>>>>>>> working on the wrong level of abstraction.
>>>>>>>>>>
>>>>>>>>>> Many users ask requests like this all the time, and what you are
>>>>>>>>>> suggesting would be even more error-prone than the equivalent for loop or the
>>>>>>>>>> IntStream suggestion that the user above requested. Not to
>>>>>>>>>> mention that getting it to parallelize would be a task many users are
>>>>>>>>>> likely to mess up -- either in correctness or in performance.
>>>>>>>>>>
>>>>>>>>>> I think you would get far more mileage from adding 2 methods on
>>>>>>>>>> the list interface streamWithIndex() and parallelStreamWithIndex() that
>>>>>>>>>> would return a Stream<WithIndex<T>>, as opposed to just Stream<T>.
>>>>>>>>>>
>>>>>>>>>> That way, users are not writing a custom Gatherer each time they
>>>>>>>>>> want to work with the index. They just have the index be a field in the
>>>>>>>>>> object. They can work with it the same way they would any other object
>>>>>>>>>> field.
>>>>>>>>>>
>>>>>>>>>> Furthermore, doing it this way makes the correct answer obvious.
>>>>>>>>>> If I need to do something with an index, stream with the index.
>>>>>>>>>>
>>>>>>>>>> On top of that, it significantly enhances readability by making
>>>>>>>>>> it clear to the reader that, whatever this stream is doing will require use
>>>>>>>>>> of the index, so watch out for that.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Apr 19, 2024, 1:47 PM Remi Forax <forax at univ-mlv.fr>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>> for me, it seems what you want is  Collector on Stream which is
>>>>>>>>>>> able to short-circuit,
>>>>>>>>>>> so you can write
>>>>>>>>>>>   list.stream().collect(Collectors.findFirst(s ->
>>>>>>>>>>> s.contains("o")))
>>>>>>>>>>> and in reverse
>>>>>>>>>>>   list.reversed().stream().collect(Collectors.findFirst(s ->
>>>>>>>>>>> s.contains("o")))
>>>>>>>>>>>
>>>>>>>>>>> Using a Stream here is more general and will work with other
>>>>>>>>>>> collections like a LinkedHashSet for example.
>>>>>>>>>>> Sadly, there is no way to define a short-circuiting collector :(
>>>>>>>>>>>
>>>>>>>>>>> You can have a short-circuiting Gatherer like this
>>>>>>>>>>>   <T> Gatherer<T, ?, Integer> findIndex(Predicate<? super T>
>>>>>>>>>>> predicate) {
>>>>>>>>>>>     return Gatherer.ofSequential(
>>>>>>>>>>>       () -> new Object() { int index; },
>>>>>>>>>>>       Integrtor.ofGreedy((state, element, downstream) -> {
>>>>>>>>>>>         var index = state.index++;
>>>>>>>>>>>         if (predicate.test(element)) {
>>>>>>>>>>>           return downstream.push(index);
>>>>>>>>>>>         }
>>>>>>>>>>>         return true;
>>>>>>>>>>>       }));
>>>>>>>>>>>   }
>>>>>>>>>>>
>>>>>>>>>>> and use it like this:
>>>>>>>>>>>   list.stream().gather(findIndex(s ->
>>>>>>>>>>> s.contains("o"))).findFirst().orElse(-1);
>>>>>>>>>>>
>>>>>>>>>>> But it's more verbose.
>>>>>>>>>>>
>>>>>>>>>>> I wonder if at the same time that the Gatherer API is
>>>>>>>>>>> introduced, the Collector API should be enhanced to support
>>>>>>>>>>> short-circuiting collectors ?
>>>>>>>>>>>
>>>>>>>>>>> regards,
>>>>>>>>>>> Rémi
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ------------------------------
>>>>>>>>>>>
>>>>>>>>>>> *From: *"ІП-24 Олександр Ротань" <rotan.olexandr at gmail.com>
>>>>>>>>>>> *To: *"core-libs-dev" <core-libs-dev at openjdk.org>
>>>>>>>>>>> *Sent: *Friday, April 19, 2024 5:59:39 PM
>>>>>>>>>>> *Subject: *Addition of Predicate-based findIndex and
>>>>>>>>>>> findLastIndex methods to java.util.List
>>>>>>>>>>>
>>>>>>>>>>> Subject
>>>>>>>>>>> Addition of Predicate-based findIndex and findLastIndex methods
>>>>>>>>>>> to java.util.List
>>>>>>>>>>>
>>>>>>>>>>> Motivation
>>>>>>>>>>> The motivation behind this proposal is to enhance the
>>>>>>>>>>> functionality of the List interface by providing a more flexible way to
>>>>>>>>>>> find the index of an element. Currently, the indexOf and lastIndexOf
>>>>>>>>>>> methods only accept an object as a parameter. This limits the flexibility
>>>>>>>>>>> of these methods as they can only find the index of exact object matches.
>>>>>>>>>>>
>>>>>>>>>>> Here I want to propose methods that would accept a Predicate as
>>>>>>>>>>> a parameter, allowing users to define a condition that the desired element
>>>>>>>>>>> must meet. This would provide a more flexible and powerful way to find the
>>>>>>>>>>> index of an element in a list.
>>>>>>>>>>>
>>>>>>>>>>> The changes I am proposing are implemented in this PR:
>>>>>>>>>>> https://github.com/openjdk/jdk/pull/18639. Here is a brief
>>>>>>>>>>> overview of the changes made in this pull request:
>>>>>>>>>>>
>>>>>>>>>>> Added the findIndex  (Predicate<? super E> filter) method to the
>>>>>>>>>>> List interface.
>>>>>>>>>>> Added the findLastIndex  (Predicate<? super E> filter) method to
>>>>>>>>>>> the List interface.
>>>>>>>>>>> Implemented these methods in all non-abstract classes that
>>>>>>>>>>> implement the List interface, as well as List itself (default impl).
>>>>>>>>>>> The changes have been thoroughly tested to ensure they work as
>>>>>>>>>>> expected and do not introduce any regressions. The test cases cover a
>>>>>>>>>>> variety of scenarios to ensure the robustness of the implementation.
>>>>>>>>>>>
>>>>>>>>>>> For example, consider the following test case:
>>>>>>>>>>>
>>>>>>>>>>> List<String> list = new ArrayList<>();
>>>>>>>>>>> list.add("Object one");
>>>>>>>>>>> list.add("NotObject two");
>>>>>>>>>>> list.add("NotObject three");
>>>>>>>>>>>
>>>>>>>>>>> int index1 = list.findIndex(s -> s.contains("ct t"));
>>>>>>>>>>> System.out.println(index1); // Expected output: 1
>>>>>>>>>>> int index2 = list. findLastIndex(s -> s.startsWith("NotObject"));
>>>>>>>>>>> System.out.println(index2); // Expected output: 2
>>>>>>>>>>> Currently, to achieve the same result, we would have to use a
>>>>>>>>>>> more verbose approach:
>>>>>>>>>>>
>>>>>>>>>>> int index1 = IntStream.range(0, list.size())
>>>>>>>>>>>                      .filter(i -> list.get(i).contains("ct t"))
>>>>>>>>>>>                      .findFirst()
>>>>>>>>>>>                      .orElse(-1);
>>>>>>>>>>> System.out.println(index1); // Output: 1
>>>>>>>>>>> int index2 = IntStream.range(0, list.size())
>>>>>>>>>>>                          .filter(i ->
>>>>>>>>>>> list.get(i).startsWith("NotObject"))
>>>>>>>>>>>                          .reduce((first, second) -> second)
>>>>>>>>>>>                          .orElse(-1);
>>>>>>>>>>> System.out.println(index2); // Output: 2
>>>>>>>>>>> Or other approaches that require additional instructions and,
>>>>>>>>>>> therefore, can`t be used in all scopes (like passing argument to method).
>>>>>>>>>>>
>>>>>>>>>>> I believe these additions would greatly enhance the
>>>>>>>>>>> functionality and flexibility of the List interface, making it more
>>>>>>>>>>> powerful and user-friendly. I look forward to your feedback and am open to
>>>>>>>>>>> making any necessary changes based on your suggestions.
>>>>>>>>>>>
>>>>>>>>>>> The main reason I am publishing this proposal in the mailing
>>>>>>>>>>> system is to gather feedback from the Java developers community, especially
>>>>>>>>>>> about possible caveats related to backward compatibility of your projects.
>>>>>>>>>>> Would appreciate every opinion!
>>>>>>>>>>>
>>>>>>>>>>> Best regards
>>>>>>>>>>>
>>>>>>>>>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/core-libs-dev/attachments/20240420/37d7d8ca/attachment-0001.htm>


More information about the core-libs-dev mailing list