Generalizing binary search

Louis Wasserman lowasser at google.com
Wed May 15 22:27:43 UTC 2024


If you forget to implement RandomAccess, you'll get a very subtle
performance bug.

Just because it's easy if you know how to do it doesn't make it easy for a
random developer to do correctly.

On Wed, May 15, 2024 at 3:21 PM Mateusz Romanowski <
romanowski.mateusz at gmail.com> wrote:

> Hi,
> I would say it is not worth any effort.
>
> One can easily write an ad-hoc *local* adapter extending
> `java.util.AbstractList`..
> .. and immediately invoke existing `java.util.Collections::binarySearch`
> method.
>
> Cheers,
> MR
>
> On Thu, Apr 25, 2024 at 9:09 PM Pavel Rappo <pavel.rappo at oracle.com>
> wrote:
>
>>
>>
>> > On 25 Apr 2024, at 19:41, David Lloyd <david.lloyd at redhat.com> wrote:
>> >
>> > The JDK contains a few collection- and array-oriented implementations
>> of binary search. For the most part, they all work the same way: search the
>> target "thing" by index using the obvious bisection strategy, returning
>> either /index/ or /-index-1/ depending on whether it was found at the end
>> of the search.
>> >
>> > However, if you're doing a binary search on a thing that is not a list
>> or an array, you have two choices: try to make the thing you're searching
>> on implement List (often infeasible) or write your own binary search.
>> >
>> > I'm really tired of writing my own binary search. I've probably done it
>> 50 times by now, each one using a slightly different access and/or compare
>> strategy, and every time is a new chance to typo something or get something
>> backwards, leading to irritating debugging sessions and higher dentist
>> bills.
>>
>> Can we safely say that it sets your teeth on edge?
>>
>> > It got me to thinking that it wouldn't be too hard to make a "general"
>> binary search which can search on anything, so that's what I did. I was
>> thinking that it might be interesting to try adding this into the JDK
>> somehow.
>> >
>> > This implementation is more or less what I now copy & paste to
>> different projects at the moment:
>> >
>> >     public static <C, T> int binarySearch(C collection, int start, int
>> end, T key, Comparator<? super T> cmp, IntGetter<C, T> getter) {
>> >         int low = start;
>> >         int high = end - 1;
>> >         while (low <= high) {
>> >             int mid = low + high >>> 1;
>> >             int res = cmp.compare(getter.get(collection, mid), key);
>> >             if (res < 0) {
>> >                 low = mid + 1;
>> >             } else if (res > 0) {
>> >                 high = mid - 1;
>> >             } else {
>> >                 return mid;
>> >             }
>> >         }
>> >         return -low - 1;
>> >     }
>> >     // (Plus variations for `Comparable` keys and long indices)
>> >
>> > A key problem with this approach is that in the JDK, there is no
>> `ObjIntFunction<T, R>` or `ObjLongFunction<T, R>` which would be necessary
>> to implement the "getter" portion of the algorithm (despite the existence
>> of the analogous `ObjIntConsumer<T>` and `ObjLongConsumer<T>`). So, I also
>> have to copy/paste a `IntGetter<T, R>`/`LongGetter<T, R>` as well, which is
>> annoying.
>> >
>> > A slightly less-good approach is for the general `binarySearch` method
>> to accept a `IntFunction<T>`/`LongFunction<T>`, and drop the `C collection`
>> parameter, like this:
>> >
>> >     public static <T> int binarySearch(int start, int end, T key,
>> Comparator<? super T> cmp, IntFunction<T> getter) { ... }
>> >
>> > In this case, we can use the function types that the JDK already
>> provides, but we would very likely have to also create a capturing lambda
>> (e.g. `myList::get` instead of `List::get`). Maybe this isn't that bad of a
>> compromise.
>> >
>> > It would be possible to replace the existing `binarySearch`
>> implementations with delegating calls to a generalized implementation. For
>> `Collections`, the indexed version uses `List::get` and the iterator
>> version uses a helper lambda to move the iterator and get the result. For
>> arrays, a lambda would be provided which gets the corresponding array
>> element. If the non-capturing variant is used, then (on paper at least)
>> this version should perform similarly to the existing implementations, with
>> less code needed overall. However, if a capturing lambda is required (due
>> to the aforementioned lack of `ObjXFunction`), then this could be slightly
>> worse-performing than the existing implementation due to the construction
>> (and maybe dispatch) overhead of the lambda. Some real-world benchmarks
>> would have to be written with various-sized data sets.
>> >
>> > It would also be possible to produce primitive variations which operate
>> on int, float, long, and double values, using existing functions if
>> capturing is deemed "OK". It is also possible to produce a variation which
>> uses a `long` for the index, for huge data sets (for example, very large
>> mapped files using `MemorySegment`).
>> >
>> > Also unclear is: where would it live? `Collections`? Somewhere else?
>> >
>> > Any thoughts/opinions would be appreciated (even if they are along the
>> lines of "it's not worth the effort"). Particularly, any insight would be
>> appreciated as to whether or not this kind of hypothetical enhancement
>> would warrant a JEP (I wouldn't think so, but I'm no expert at such
>> assessments).
>> >
>> > --
>> > - DML • he/him
>>
>> Have a look at this recently filed issue:
>> https://bugs.openjdk.org/browse/JDK-8326330
>>
>> -Pavel
>>
>>
>>

-- 
Louis Wasserman (he/they)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/core-libs-dev/attachments/20240515/cedf0731/attachment.htm>


More information about the core-libs-dev mailing list