Long multiplication and BigInteger.mulAdd on x86_32

Thu Jan 28 16:38:37 PST 2010

Hi Tom, Christian, and others,

Here's a patch I'd like to contribute:
http://cr.openjdk.java.net/~rasbold/69XXXXX/webrev.00/

With it, C2 generates shorter long multiplication sequences on x86_32
when the high 32 bits of the operands are known to be zero.

In particular, this applies to the loop in BigInteger.mulAdd():

    private final static long LONG_MASK = 0xffffffffL;

    static int mulAdd(int[] out, int[] in, int offset, int len, int k) {
        long kLong = k & LONG_MASK;
        long carry = 0;

        offset = out.length-offset - 1;
        for (int j=len-1; j >= 0; j--) {
            long product = (in[j] & LONG_MASK) * kLong +
                           (out[offset] & LONG_MASK) + carry;
            out[offset--] = (int)product;
            carry = product >>> 32;
        }
        return (int)carry;
    }
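
To spell out why the masking matters, here is a minimal sketch of the
arithmetic. It is an illustration only, not code from the webrev and not
the instruction sequence C2 emits; the class and method names
(MulHighZeroSketch, mulGeneral, mulHighZero) are made up for this example.
A 64-bit multiply built from 32-bit halves needs two cross terms in
general, and both vanish when the high halves are zero, as they are after
"& LONG_MASK":

```java
public class MulHighZeroSketch {
    private static final long LONG_MASK = 0xffffffffL;

    // General 64x64->64 multiply expressed through 32-bit halves
    // (hypothetical helper, for illustration only).
    static long mulGeneral(long a, long b) {
        long aLo = a & LONG_MASK, aHi = a >>> 32;
        long bLo = b & LONG_MASK, bHi = b >>> 32;
        // Cross terms: only their low 32 bits survive the shift by 32.
        long cross = aHi * bLo + aLo * bHi;
        return aLo * bLo + (cross << 32);
    }

    // When both inputs are zero-extended ints, the high halves are zero,
    // the cross terms drop out, and one unsigned 32x32->64 multiply remains.
    static long mulHighZero(int a, int b) {
        return (a & LONG_MASK) * (b & LONG_MASK);
    }

    public static void main(String[] args) {
        int a = 0xdeadbeef, b = 0xcafebabe;
        long x = a & LONG_MASK, y = b & LONG_MASK;
        // Both forms agree with the plain long multiply.
        System.out.println(mulGeneral(x, y) == x * y);
        System.out.println(mulHighZero(a, b) == x * y);
    }
}
```

In mulAdd() both (in[j] & LONG_MASK) and kLong are of this zero-extended
form, which is why the loop benefits.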

In my measurements, one of our internal microbenchmarks that uses
BigInteger.mulAdd sped up by about 12%. Also, SPECjvm2008's crypto.rsa
and crypto.signverify improved by about 7% and 2.3%, respectively.

It's been reviewed by Chuck. I thank Chuck for uploading the webrev on
his account.

Thanks,
Hiroshi