Generalizing binary search
Pavel Rappo
pavel.rappo at oracle.com
Thu Apr 25 19:09:04 UTC 2024
> On 25 Apr 2024, at 19:41, David Lloyd <david.lloyd at redhat.com> wrote:
>
> The JDK contains a few collection- and array-oriented implementations of binary search. For the most part, they all work the same way: search the target "thing" by index using the obvious bisection strategy, returning either /index/ or /-index-1/ depending on whether it was found at the end of the search.
>
> However, if you're doing a binary search on a thing that is not a list or an array, you have two choices: try to make the thing you're searching on implement List (often infeasible) or write your own binary search.
>
> I'm really tired of writing my own binary search. I've probably done it 50 times by now, each one using a slightly different access and/or compare strategy, and every time is a new chance to typo something or get something backwards, leading to irritating debugging sessions and higher dentist bills.
Can we safely say that it sets your teeth on edge?
> It got me to thinking that it wouldn't be too hard to make a "general" binary search which can search on anything, so that's what I did. I was thinking that it might be interesting to try adding this into the JDK somehow.
>
> This implementation is more or less what I now copy & paste to different projects at the moment:
>
> public static <C, T> int binarySearch(C collection, int start, int end, T key, Comparator<? super T> cmp, IntGetter<C, T> getter) {
> int low = start;
> int high = end - 1;
> while (low <= high) {
> int mid = low + high >>> 1;
> int res = cmp.compare(getter.get(collection, mid), key);
> if (res < 0) {
> low = mid + 1;
> } else if (res > 0) {
> high = mid - 1;
> } else {
> return mid;
> }
> }
> return -low - 1;
> }
> // (Plus variations for `Comparable` keys and long indices)
>
> A key problem with this approach is that in the JDK, there is no `ObjIntFunction<T, R>` or `ObjLongFunction<T, R>` which would be necessary to implement the "getter" portion of the algorithm (despite the existence of the analogous `ObjIntConsumer<T>` and `ObjLongConsumer<T>`). So, I also have to copy/paste a `IntGetter<T, R>`/`LongGetter<T, R>` as well, which is annoying.
>
> A slightly less-good approach is for the general `binarySearch` method to accept a `IntFunction<T>`/`LongFunction<T>`, and drop the `C collection` parameter, like this:
>
> public static <T> int binarySearch(int start, int end, T key, Comparator<? super T> cmp, IntFunction<T> getter) { ... }
>
> In this case, we can use the function types that the JDK already provides, but we would very likely have to also create a capturing lambda (e.g. `myList::get` instead of `List::get`). Maybe this isn't that bad of a compromise.
>
> It would be possible to replace the existing `binarySearch` implementations with delegating calls to a generalized implementation. For `Collections`, the indexed version uses `List::get` and the iterator version uses a helper lambda to move the iterator and get the result. For arrays, a lambda would be provided which gets the corresponding array element. If the non-capturing variant is used, then (on paper at least) this version should perform similarly to the existing implementations, with less code needed overall. However, if a capturing lambda is required (due to the aforementioned lack of `ObjXFunction`), then this could be slightly worse-performing than the existing implementation due to the construction (and maybe dispatch) overhead of the lambda. Some real-world benchmarks would have to be written with various-sized data sets.
>
> It would also be possible to produce primitive variations which operate on int, float, long, and double values, using existing functions if capturing is deemed "OK". It is also possible to produce a variation which uses a `long` for the index, for huge data sets (for example, very large mapped files using `MemorySegment`).
>
> Also unclear is: where would it live? `Collections`? Somewhere else?
>
> Any thoughts/opinions would be appreciated (even if they are along the lines of "it's not worth the effort"). Particularly, any insight would be appreciated as to whether or not this kind of hypothetical enhancement would warrant a JEP (I wouldn't think so, but I'm no expert at such assessments).
>
> --
> - DML • he/him
Have a look at this recently filed issue: https://bugs.openjdk.org/browse/JDK-8326330
-Pavel
More information about the core-libs-dev
mailing list