Math.clamp method?

Wed Jan 25 23:47:30 UTC 2023

Dealing with numeric ranges is error-prone, so I welcome these.

Another operation on ranges is membership.  This is easy to derive
from the clamp methods, I think:

```
   checkRange(x,min,max) := clamp(x,min,max) == x
```

That idiom works even for NaNs, which are never in any range.
I wonder about having a separate method for this. The idiom does
fail for empty ranges, so it’s almost worth making a method that
returns false with empty ranges instead of throwing.  Probably
not, though.

But I think there is no harm and real benefit from widening
the value parameters, so value is always either long or double.
That will save casts (which also cause bugs) for some use
cases, such as doing 32x32-to-64-bit arithmetic followed
by overflow checks and/or saturation.  That’s a common use
case for clamping.

```
public static int clamp(long value, int min, int max)
public static long clamp(long value, long min, long max)
public static double clamp(double value, double min, double max)
public static float clamp(double value, float min, float max)
```

I wonder if there is a use case for these methods (either clamp
or the range-membership method), with 64-bit overflow detection
for complex expressions.  The idea is that you compute the
expression in two domains, double and long.  The double domain
correctly tracks magnitude (but not precision) while long
tracks 64 bits of precision.  At the end, if the double value
is in the range representable by longs, you take the long
value, with full confidence.

```
   long x = …, y = a*x*x + b*x + c;
   double xf = x, yf = a*xf*xf + b*xf + c;
   double xc = clamp(xf, Long.MIN_VALUE, Long.MAX_VALUE);
   boolean saturated = false;
   if (xc != xf) {   // error processing
      saturated = true;  // or throw an exception
      x = xc;  // saturate towards the required magnitude
   }
```

That use case doesn’t require a membership operation per se,
just clamp.

I thought about cross-domain clamps, which preserve 64 bit
precision and double range at the same time in different places:

```
public static long clamp(long value, double min, double max)
public static long clamp(double value, long min, long max)
```

Those are perhaps overly subtle.  I think it would be OK for
nearly all use cases to cast the min/max values to the appropriate
type (same as value) and then call one of the clamp functions
originally proposed.  Note that (long)(double)Long.MAX_VALUE
is the same as Long.MAX_VALUE, and the same is true for MIN_VALUE.
So these critical parameters would be processed as the same
limits by both the regular and cross-typed overloading.

Now for another topic.

There is another common range operation which is very often the
source of bugs, and I’d like us to consider it for this group.

That is the midpoint operation, naively coded as (min+max)/2,
and more subtly as min+(max-min)/2 and even more subtly what
our friend Mr. Stackoverflow recommends, (min+max)>>>1.
All three of these idioms hide bugs, for some values of
max or min.

The fully correct idiom (for 2s complement numbers) is:

```
  midpoint(min, max) := min + ((max - min) >>> 1)
```

The subexpression max-min never overflows, when it
is regarded as an unsigned number, and the “>>>1”
immediately converts that unsigned number into the
correctly signed positive number.  That’s the only
way to avoid overflow for all inputs.  The third
idiom above (the “smartest” of the three) will fail,
for example, if you feed it negative values.
For some uses that is OK, but for others it isn’t.

```
public static int midpoint(int min, int max)
public static long midpoint(long min, long max)
public static double midpoint(double min, double max)
public static float midpoint(float min, float max)
```

All kinds of divide and conquer algorithms need a robust
midpoint operation.  And many of those are coded wrongly,
because affine ranges are hard to reason about.

The int and long midpoints are more useful than the floating
point ones, because ints and longs are used as indexes into
arrays (including Panama native arrays), where subdivision
and binary search is common (and very often buggy).

More generally, this midpoint operation performs an affine
weighted sum within the given numeric range, so one might
think that a floating point “alpha” parameter would be
interesting.  The midpoint operations hardwire alpha=0.5,
and they use clever fast fixpoint division to divide by 2.

There are applications for using alpha other than 0.5,
such as a 10x subdivision of a range using alpha=0.1, 0.2,
and so on.  I don’t recommend an “affine midpoint” method,
because I think the clamp methods combined with floating
point math can get the right answers.  For example, given
a 64-bit range, you can use double arithmetic to compute
affine inner points and then clamp them, like this:

```
final double ALPHA = 0.1;  //float would be OK too
final long min = …, max = …;
final double incr = ((double)max - min) * ALPHA;
for (int i = 1; i < 10; i++)
  addPoint(clamp((long)(min + i*incr), min, max));
```

Doing the grid measurements in double (instead of long) is good
enough for most purposes, and doesn’t cause anomalies even if
the difference between min and max is large.  Using a long
for the difference would lead to bugs when max-min cannot be
represented as a long.

Anyway, the above examples tell me that we don’t need a
general affine range splitter if we can use floating point
to compute the splits and use clamp to prevent bugs at the
range boundaries.  But we do need the optimized midpoint
operations, because people always insist on them taking
a couple of cycles, and then they insist on coding them
dangerously.

Besides clamp and midpoint is there another operation on
affine ranges that is commonly coded, and with bugs?
I don’t think so.  Range algebra such as union, intersection,
difference, convex hull: those come to mind, but I don’t
think they are as bug-prone (and as frequent) as clamp
and midpoint.  Range translation (moving a range to another
base point) is subject to overflow, but I think that can
be handled by using higher precision and/or overflow checks,
and/or clamping.

— John
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/core-libs-dev/attachments/20230125/6ded18a3/attachment-0001.htm>