Generalizing binary search
David Lloyd
david.lloyd at redhat.com
Thu Apr 25 18:41:16 UTC 2024
The JDK contains a few collection- and array-oriented implementations of
binary search. For the most part, they all work the same way: search the
target "thing" by index using the obvious bisection strategy, returning
either /index/ or /-index-1/ depending on whether it was found at the end
of the search.
However, if you're doing a binary search on a thing that is not a list or
an array, you have two choices: try to make the thing you're searching on
implement List (often infeasible) or write your own binary search.
I'm really tired of writing my own binary search. I've probably done it 50
times by now, each one using a slightly different access and/or compare
strategy, and every time is a new chance to typo something or get something
backwards, leading to irritating debugging sessions and higher
dentist bills.
It got me to thinking that it wouldn't be too hard to make a "general"
binary search which can search on anything, so that's what I did. I was
thinking that it might be interesting to try adding this into the JDK
somehow.
This implementation is more or less what I now copy & paste to different
projects at the moment:
public static <C, T> int binarySearch(C collection, int start, int end,
T key, Comparator<? super T> cmp, IntGetter<C, T> getter) {
int low = start;
int high = end - 1;
while (low <= high) {
int mid = low + high >>> 1;
int res = cmp.compare(getter.get(collection, mid), key);
if (res < 0) {
low = mid + 1;
} else if (res > 0) {
high = mid - 1;
} else {
return mid;
}
}
return -low - 1;
}
// (Plus variations for `Comparable` keys and long indices)
A key problem with this approach is that in the JDK, there is no
`ObjIntFunction<T, R>` or `ObjLongFunction<T, R>` which would be necessary
to implement the "getter" portion of the algorithm (despite the existence
of the analogous `ObjIntConsumer<T>` and `ObjLongConsumer<T>`). So, I also
have to copy/paste a `IntGetter<T, R>`/`LongGetter<T, R>` as well, which is
annoying.
A slightly less-good approach is for the general `binarySearch` method to
accept a `IntFunction<T>`/`LongFunction<T>`, and drop the `C collection`
parameter, like this:
public static <T> int binarySearch(int start, int end, T key,
Comparator<? super T> cmp, IntFunction<T> getter) { ... }
In this case, we can use the function types that the JDK already provides,
but we would very likely have to also create a capturing lambda (e.g.
`myList::get` instead of `List::get`). Maybe this isn't that bad of a
compromise.
It would be possible to replace the existing `binarySearch` implementations
with delegating calls to a generalized implementation. For `Collections`,
the indexed version uses `List::get` and the iterator version uses a helper
lambda to move the iterator and get the result. For arrays, a lambda would
be provided which gets the corresponding array element. If the
non-capturing variant is used, then (on paper at least) this version should
perform similarly to the existing implementations, with less code needed
overall. However, if a capturing lambda is required (due to the
aforementioned lack of `ObjXFunction`), then this could be slightly
worse-performing than the existing implementation due to the construction
(and maybe dispatch) overhead of the lambda. Some real-world benchmarks
would have to be written with various-sized data sets.
It would also be possible to produce primitive variations which operate on
int, float, long, and double values, using existing functions if capturing
is deemed "OK". It is also possible to produce a variation which uses a
`long` for the index, for huge data sets (for example, very large mapped
files using `MemorySegment`).
Also unclear is: where would it live? `Collections`? Somewhere else?
Any thoughts/opinions would be appreciated (even if they are along the
lines of "it's not worth the effort"). Particularly, any insight would be
appreciated as to whether or not this kind of hypothetical enhancement
would warrant a JEP (I wouldn't think so, but I'm no expert at such
assessments).
--
- DML • he/him
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/core-libs-dev/attachments/20240425/f5b89ba3/attachment.htm>
More information about the core-libs-dev
mailing list