Long multiplication and BigInteger.mulAdd on x86_32

Mon Feb 1 15:34:12 PST 2010

I think it's a little more readable this way:

http://cr.openjdk.java.net/~rasbold/69XXXXX/webrev.01/

Hiroshi

On Mon, Feb 1, 2010 at 2:01 PM, Tom Rodriguez <Thomas.Rodriguez at sun.com> wrote:
>
> On Feb 1, 2010, at 10:18 AM, Hiroshi Yamauchi wrote:
>
>> Hi Christian,
>>
>>> I think that's a good change.  I have two comments: personally I prefer to
>>> use assembler instructions directly in the ins_encode than writing
>>> very-hard-to-read enc_class methods and the predicates are kind of ugly, but
>>> I don't know if that could be done any better.
>
> The predicates are ugly but we don't currently have a better mechanism.  The cleaner way this could be expressed would be:
>
> predicate(phase->type(n->in(1))->higher_equals(TypeLong::UINT));
>
> but the Matcher doesn't capture the type table from the last phase of GVN.  The reason we need that is that bottom_type for the AndL doesn't return the narrow type that results from the use of a constant.  It returns the worst case type TypeLong::LONG.  Only the type table captures the improved type that we computed though iter GVN.  If we captured the type table we'd also need to deal with having both old and new nodes in the graph simultaneously which wouldn't be a huge deal but it does point at a potential problem with the predicate approach.  Depending on the graph shape n->in(1) and n->in(2) might already be machine nodes and their Opcode won't be the ideal opcode we're looking for.  I think in most interesting cases the predicate will work correctly though.  It's only if there are multiple users of in(1) or in(2) that it might fail to trigger.  Anyway, I think the predicates are fine for this purpose.
>
>>
>> I'll see if I can do it better that way.
>>
>>> Maybe, given that we probably support more 32-bit architectures in the
>>> future, we could model such instructions in ideal (e.g. in a pass that's
>>> only used on 32-bit).
>>
>> I agree that it'd be easier and probably more effective to apply the
>> same optimization at the ideal graph level.
>
> I agree that it would be nice if these optimizations kind of fell out more automatically but I'm leery of adding yet more one off ideal nodes to describe machine dependent operations.  Maybe there is a reasonable set that could be used for decomposing 64-bit operations into a clean set of 32 bit operations, though I think that might have ramifications for the register allocator since we represent longs as pairs.  We're definitely emitting less than optimal sequences for 64 bit operations in 32 mode.
>
> tom
>
>>
>> Thanks,
>> Hiroshi
>
>