Long multiplication and BigInteger.mulAdd on x86_32

Wed Feb 3 15:32:40 PST 2010

Tom,

Yes, it looks good to me. Thanks for applying the changes. I reran the
test and verified the speedup in a benchmark with that.

I'm not sure about the copyright, but if that follows a previous test
file that Chuck created before, then it's probably fine.

Thanks.

Hiroshi

On Wed, Feb 3, 2010 at 1:39 PM, Tom Rodriguez <Thomas.Rodriguez at sun.com> wrote:
> Is http://cr.openjdk.java.net/~never/6921969 ok?  I made the changes I proposed and also adjusted the copyright on the test to include the GPL language and removed your email address from it.  I believe that's the correct format but someone correct me if it isn't.  If it's ok then I will push it.
>
> tom
>
> On Feb 3, 2010, at 12:07 PM, Tom Rodriguez wrote:
>
>> The cast is unneeded and I don't like the use of a bool for input selection.  Would you mind if it were like this:
>>
>> +source_hpp %{
>> +// Must be visible to the DFA in dfa_x86_32.cpp
>> +extern bool is_operand_hi32_zero(Node* n);
>> +%}
>> +
>>
>> +// Returns true if the high 32 bits of the value is known to be zero.
>> +bool is_operand_hi32_zero(Node* n) {
>> +  int opc = n->Opcode();
>> +  if (opc == Op_LoadUI2L) {
>> +    return true;
>> +  }
>> +  if (opc == Op_AndL) {
>> +    Node* o2 = n->in(2);
>> +    if (o2->is_Con() && (o2->get_long() & 0xFFFFFFFF00000000LL) == 0LL) {
>> +      return true;
>> +    }
>> +  }
>> +  return false;
>> +}
>> +
>>
>> +  predicate(is_operand_hi32_zero(n->in(1)));
>>
>> +  predicate(is_operand_hi32_zero(n->in(2)));
>>
>> +  predicate(is_operand_hi32_zero(n->in(1)) && is_operand_hi32_zero(n->in(2)));
>>
>> tom
>>
>> On Feb 3, 2010, at 11:52 AM, Hiroshi Yamauchi wrote:
>>
>>> OK. I moved is_operand_hi32_zero to the ad file.
>>>
>>> http://cr.openjdk.java.net/~rasbold/6921969/webrev.02/
>>>
>>> On Tue, Feb 2, 2010 at 1:15 PM, Hiroshi Yamauchi <yamauchi at google.com> wrote:
>>>> I will work on this.
>>>>
>>>> Hiroshi
>>>>
>>>> On Mon, Feb 1, 2010 at 4:18 PM, Tom Rodriguez <Thomas.Rodriguez at sun.com> wrote:
>>>>> The assembler changes look good but I'd rather have the predicate stuff in the ad file instead of over in MulLNode.  The normal way make ad helper functions is to put it into the ad file itself in the source block.  There's a section of the ad file which begins with "source %{" and it's included verbatim in the main generated ad file.  So just add is_operand_hi32_zero as a static function in the source block.
>>>>>
>>>>> tom
>>>>>
>>>>> On Feb 1, 2010, at 3:34 PM, Hiroshi Yamauchi wrote:
>>>>>
>>>>>> I think it's a little more readable this way:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~rasbold/69XXXXX/webrev.01/
>>>>>>
>>>>>> Hiroshi
>>>>>>
>>>>>> On Mon, Feb 1, 2010 at 2:01 PM, Tom Rodriguez <Thomas.Rodriguez at sun.com> wrote:
>>>>>>>
>>>>>>> On Feb 1, 2010, at 10:18 AM, Hiroshi Yamauchi wrote:
>>>>>>>
>>>>>>>> Hi Christian,
>>>>>>>>
>>>>>>>>> I think that's a good change.  I have two comments: personally I prefer to
>>>>>>>>> use assembler instructions directly in the ins_encode than writing
>>>>>>>>> very-hard-to-read enc_class methods and the predicates are kind of ugly, but
>>>>>>>>> I don't know if that could be done any better.
>>>>>>>
>>>>>>> The predicates are ugly but we don't currently have a better mechanism.  The cleaner way this could be expressed would be:
>>>>>>>
>>>>>>> predicate(phase->type(n->in(1))->higher_equals(TypeLong::UINT));
>>>>>>>
>>>>>>> but the Matcher doesn't capture the type table from the last phase of GVN.  The reason we need that is that bottom_type for the AndL doesn't return the narrow type that results from the use of a constant.  It returns the worst case type TypeLong::LONG.  Only the type table captures the improved type that we computed though iter GVN.  If we captured the type table we'd also need to deal with having both old and new nodes in the graph simultaneously which wouldn't be a huge deal but it does point at a potential problem with the predicate approach.  Depending on the graph shape n->in(1) and n->in(2) might already be machine nodes and their Opcode won't be the ideal opcode we're looking for.  I think in most interesting cases the predicate will work correctly though.  It's only if there are multiple users of in(1) or in(2) that it might fail to trigger.  Anyway, I think the predicates are fine for this purpose.
>>>>>>>
>>>>>>>>
>>>>>>>> I'll see if I can do it better that way.
>>>>>>>>
>>>>>>>>> Maybe, given that we probably support more 32-bit architectures in the
>>>>>>>>> future, we could model such instructions in ideal (e.g. in a pass that's
>>>>>>>>> only used on 32-bit).
>>>>>>>>
>>>>>>>> I agree that it'd be easier and probably more effective to apply the
>>>>>>>> same optimization at the ideal graph level.
>>>>>>>
>>>>>>> I agree that it would be nice if these optimizations kind of fell out more automatically but I'm leery of adding yet more one off ideal nodes to describe machine dependent operations.  Maybe there is a reasonable set that could be used for decomposing 64-bit operations into a clean set of 32 bit operations, though I think that might have ramifications for the register allocator since we represent longs as pairs.  We're definitely emitting less than optimal sequences for 64 bit operations in 32 mode.
>>>>>>>
>>>>>>> tom
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Hiroshi
>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>>
>>
>
>