Optimizing saturated casts
Florian Weimer
fw at deneb.enyo.de
Sat Dec 27 13:07:57 PST 2008
* Clemens Eisserer:
> I currently use a utility class heavily for the XRender Java2D
> backend, which performs saturated casts:
>
> 1.) return (short) (x > Short.MAX_VALUE ? Short.MAX_VALUE : (x <
> Short.MIN_VALUE ? Short.MIN_VALUE : x));
> 2.) return (short) (x > 65535 ? 65535 : (x < 0) ? 0 : x);
>
> I spent quite some time benchmarking/tuning the
> protocol-generation-methods, and a lot of cycles are spent in those
> saturated casts, even if the utility methods are static.
> E.g. XRenderFillRectangle takes 40 cycles without clamping, but
> already 70 cycles with it on my Core 2 Duo with hotspot-server/jdk 14.0.
> Hotspot seems to solve the problem always with conditional jumps,
> although well predictable ones.
Have you tried Math.min/Math.max? They should avoid those jumps using
cmov. Using SSE2 is probably only efficient if the arguments are
already in SSE registers (which means some form of vectorization, I
guess).
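
For illustration, the two casts quoted above could be written with
Math.min/Math.max roughly as follows. This is only a sketch: the class
and method names are made up here, and whether HotSpot actually emits
cmov for it depends on the JIT and the CPU, so it needs measuring.

```java
// Hypothetical utility class; names are illustrative, not an existing JDK API.
final class SaturatedCasts {
    private SaturatedCasts() {}

    // Clamp x to [Short.MIN_VALUE, Short.MAX_VALUE] before narrowing.
    static short toShort(int x) {
        return (short) Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, x));
    }

    // Clamp x to [0, 65535]; the result carries the unsigned 16-bit
    // pattern in a short, as in the quoted code. Read it back with & 0xFFFF.
    static short toUnsignedShort(int x) {
        return (short) Math.max(0, Math.min(65535, x));
    }
}
```

The unsigned variant returns the raw bit pattern, so a caller that
wants the numeric value has to mask it with 0xFFFF.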
> Modern processors seem to have support for this kind of operation, in
> x86 there's packssdw in MMX/SSE2.
> I think something like a saturated cast could be quite useful, there
> are already cast-methods in Long/Integer/Short - what do you think
> about adding saturated casts to that API?
> Those could be intrinsified to use MMX/SSE2 if available.
I would also like to see (taken from Hacker's Delight):
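    // Unsigned carry out of a + b.
    static boolean carry(int a, int b) {
        return ((a & b) | ((a | b) & ~(a + b))) < 0;
    }
    // Unsigned borrow out of a - b, i.e. a < b as unsigned values.
    static boolean borrow(int a, int b) {
        return ((~a & b) | (~(a ^ b) & (a - b))) < 0;
    }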
    // Signed overflow of a + b.
    static boolean overflow(int a, int b) {
        return (~(a ^ b) & ((a + b) ^ a)) < 0;
    }
    // Signed overflow of a - b.
    static boolean underflow(int a, int b) {
        return ((a ^ b) & ((a - b) ^ a)) < 0;
    }
All properly intrinsified (in both int and long variants), so that
    int c = a + b;
    if (Integer.carry(a, b))
        throw new OverflowException();
performs the addition just once on x86 and uses jc. This would enable
reasonably cost-effective integer overflow checking for languages
running on top of the JVM.
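
As a rough sketch of what a language runtime could do in plain Java in
the meantime, here is a checked signed addition built on the overflow
test. The class and method names are hypothetical, and
ArithmeticException stands in for the exception type, which is an
assumption. Without an intrinsic the JIT is free to compute the sum
twice, so this shows only the semantics, not the jo/jc code generation.

```java
// Hypothetical helper class; names are illustrative, not an existing JDK API.
final class OverflowChecks {
    private OverflowChecks() {}

    // True if a + b does not fit in a signed 32-bit int: the operands
    // share a sign and the wrapped sum has the opposite sign.
    static boolean addOverflows(int a, int b) {
        int sum = a + b;
        return (~(a ^ b) & (sum ^ a)) < 0;
    }

    // True if a - b does not fit in a signed 32-bit int: the operands
    // differ in sign and the wrapped difference's sign differs from a's.
    static boolean subOverflows(int a, int b) {
        int diff = a - b;
        return ((a ^ b) & (diff ^ a)) < 0;
    }

    // Addition that throws instead of wrapping around.
    static int checkedAdd(int a, int b) {
        if (addOverflows(a, b)) {
            throw new ArithmeticException("int overflow: " + a + " + " + b);
        }
        return a + b;
    }

    // Reference check using 64-bit arithmetic, handy for testing the bit trick.
    static boolean addOverflowsReference(int a, int b) {
        long wide = (long) a + (long) b;
        return wide != (int) wide;
    }
}
```

The 64-bit reference version is only there to cross-check the bit
manipulation; it is not meant as the fast path.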
Browsing through Hacker's Delight, unsigned int multiplication and
access to the upper half of the multiplication result might be useful,
too.
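
To make the request concrete, the operations meant here can be
emulated for int by widening to long, roughly as below. The class and
method names are invented for this sketch; an intrinsic would
presumably map to a single mul/imul and read the upper register
directly.

```java
// Hypothetical helper class; names are illustrative, not an existing JDK API.
final class UnsignedMultiply {
    private UnsignedMultiply() {}

    // Full 64-bit product of a and b, both treated as unsigned 32-bit values.
    // The result is the unsigned bit pattern stored in a long.
    static long multiplyUnsigned(int a, int b) {
        return (a & 0xFFFFFFFFL) * (b & 0xFFFFFFFFL);
    }

    // Upper 32 bits of the unsigned 32x32 -> 64 bit product.
    static int multiplyHighUnsigned(int a, int b) {
        return (int) (multiplyUnsigned(a, b) >>> 32);
    }

    // Upper 32 bits of the signed 32x32 -> 64 bit product.
    static int multiplyHighSigned(int a, int b) {
        return (int) (((long) a * (long) b) >> 32);
    }
}
```

The same widening trick does not work for the long variants, which is
where access to the upper half of the hardware result matters most.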