Long multiplication and BigInteger.mulAdd on x86_32

Fri Jan 29 01:52:59 PST 2010

On 01/29/10 01:38 AM, Hiroshi Yamauchi wrote:
> Hi Tom, Christian, and others,
>
> Here's a patch I'd like to contribute:
> http://cr.openjdk.java.net/~rasbold/69XXXXX/webrev.00/
>
> With it, C2 generates shorter long multiplication sequences on x86_32
> when the high 32 bits are known to be zero.
>
> Particularly, this applies to the loop in BigInteger.mulAdd():
>
>      private final static long LONG_MASK = 0xffffffffL;
>
>      static int mulAdd(int[] out, int[] in, int offset, int len, int k) {
>          long kLong = k & LONG_MASK;
>          long carry = 0;
>
>          offset = out.length - offset - 1;
>          for (int j = len - 1; j >= 0; j--) {
>              long product = (in[j] & LONG_MASK) * kLong +
>                             (out[offset] & LONG_MASK) + carry;
>              out[offset--] = (int)product;
>              carry = product >>> 32;
>          }
>          return (int)carry;
>      }
>
> In my measurements, one of our internal microbenchmarks that uses
> BigInteger.mulAdd sped up about 12%. Also, SPECjvm2008's crypto.rsa
> and crypto.signverify improved about 7% and 2.3%, respectively.

I think that's a good change.  I have two comments.  First, I personally
prefer using assembler instructions directly in the ins_encode over
writing very-hard-to-read enc_class methods.  Second, the predicates are
kind of ugly, but I don't know whether that could be done any better.
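
For anyone following along, here is a minimal sketch of why the shorter
sequence is valid.  It is not taken from the patch; the class and method
names (MulHiZeroSketch, mulGeneral, mulHiZero) are made up for
illustration.  A 64x64->64 multiply built from 32-bit halves needs the
low-word product plus two cross terms.  When both high words are known to
be zero, as with the (x & LONG_MASK) operands in mulAdd, the cross terms
vanish and a single unsigned 32x32->64 multiply is enough.

```java
final class MulHiZeroSketch {
    private static final long LONG_MASK = 0xffffffffL;

    // General 64x64->64 multiply assembled from 32-bit halves,
    // roughly the work a 32-bit target has to do.
    static long mulGeneral(long a, long b) {
        long aLo = a & LONG_MASK, aHi = a >>> 32;
        long bLo = b & LONG_MASK, bHi = b >>> 32;
        // Only the low 32 bits of the cross terms survive the shift;
        // aHi * bHi would land entirely above bit 63 and is dropped.
        long cross = (aHi * bLo + aLo * bHi) << 32;
        return aLo * bLo + cross;
    }

    // Shortcut that is valid only when both operands are zero-extended
    // 32-bit values: the cross terms are zero, so one multiply remains.
    static long mulHiZero(int a, int b) {
        return (a & LONG_MASK) * (b & LONG_MASK);
    }

    public static void main(String[] args) {
        int x = 0xdeadbeef, y = 0x12345678;
        long general = mulGeneral(x & LONG_MASK, y & LONG_MASK);
        // Both paths should agree for zero-extended operands.
        System.out.println(general == mulHiZero(x, y));
    }
}
```

The sketch only shows the arithmetic identity; how C2 matches the masked
operands and which instructions it emits is up to the patch itself.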

-- Christian