Long multiplication and BigInteger.mulAdd on x86_32

Fri Jan 29 02:03:12 PST 2010

On 01/29/10 10:52 AM, Christian Thalinger wrote:
> On 01/29/10 01:38 AM, Hiroshi Yamauchi wrote:
>> Hi Tom, Christian, and others,
>>
>> Here's a patch I'd like to contribute:
>> http://cr.openjdk.java.net/~rasbold/69XXXXX/webrev.00/
>>
>> With it, C2 generates shorter long multiplication sequences on x86_32
>> when the high 32 bits are known to be zero.
>>
>> Particularly, this applies to the loop in BigInteger.mulAdd():
>>
>>     private final static long LONG_MASK = 0xffffffffL;
>>
>>     static int mulAdd(int[] out, int[] in, int offset, int len, int k) {
>>         long kLong = k & LONG_MASK;
>>         long carry = 0;
>>
>>         offset = out.length - offset - 1;
>>         for (int j = len - 1; j >= 0; j--) {
>>             long product = (in[j] & LONG_MASK) * kLong +
>>                            (out[offset] & LONG_MASK) + carry;
>>             out[offset--] = (int)product;
>>             carry = product >>> 32;
>>         }
>>         return (int)carry;
>>     }
>>
>> In my measurements, one of our internal microbenchmarks that uses
>> BigInteger.mulAdd sped up about 12%. Also, SPECjvm2008's crypto.rsa
>> and crypto.signverify improved about 7% and 2.3%, respectively.
>
> I think that's a good change. I have two comments: personally, I prefer
> using assembler instructions directly in the ins_encode to writing
> very-hard-to-read enc_class methods, and the predicates are kind of ugly,
> but I don't know if that could be done any better.

Maybe, given that we will probably support more 32-bit architectures in
the future, we could model such instructions in ideal (e.g. in a pass
that's only used on 32-bit).

-- Christian
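
For reference, the arithmetic behind the shorter sequence can be sketched
in plain Java. This is only an illustration, not the patch itself and not
the code C2 emits: the class and method names (MulHighZero, mulGeneral,
mulHighZero) are made up for this sketch. A 64x64->64 multiply on a 32-bit
target has to be assembled from 32-bit halves, and the two cross terms
drop out when both high halves are known to be zero, leaving a single
32x32->64 multiply (which x86_32 can do with one mul into EDX:EAX).

```java
class MulHighZero {
    private static final long LONG_MASK = 0xffffffffL;

    // General case: low 64 bits of a * b built from 32-bit halves.
    // a*b = aLo*bLo + ((aHi*bLo + aLo*bHi) << 32)  (mod 2^64);
    // the aHi*bHi term is shifted out entirely.
    static long mulGeneral(long a, long b) {
        long aLo = a & LONG_MASK, aHi = a >>> 32;
        long bLo = b & LONG_MASK, bHi = b >>> 32;
        long cross = (aHi * bLo + aLo * bHi) << 32;
        return aLo * bLo + cross;
    }

    // Special case: both operands have zero high halves, so the cross
    // terms vanish and one 32x32->64 multiply is enough.
    static long mulHighZero(long a, long b) {
        return (a & LONG_MASK) * (b & LONG_MASK);
    }

    public static void main(String[] args) {
        // Same shape as the operands in mulAdd: ints masked to unsigned.
        long x = -2 & LONG_MASK;   // 0xfffffffe
        long y = -1 & LONG_MASK;   // 0xffffffff
        System.out.println(mulHighZero(x, y) == x * y);
        System.out.println(mulGeneral(x, y) == x * y);
    }
}
```

In mulAdd both factors come from (int & LONG_MASK), so they always fall
into the second case.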