[9] 8154122: Intrinsify fused mac operations on x86
Vladimir Kozlov
vladimir.kozlov at oracle.com
Wed Aug 31 01:33:05 UTC 2016
Great! Thank you, Vivek.
Vladimir
On 8/30/16 12:07 PM, Deshpande, Vivek R wrote:
> Hi Vladimir
>
> I used a matrix multiplication micro-benchmark with -XX:+UseFMA and -XX:-UseFMA and observed a significant speedup of 2500x with FMA instructions.
> Please find the micro-benchmark attached to this mail.
>
> Regards,
> Vivek
>
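The attached benchmark is not reproduced in this archive. As a rough illustration of the kind of kernel involved, the following is a minimal C++ sketch of a matrix multiplication whose inner loop is a single fused multiply-add per element. It is an assumption-laden stand-in: the benchmark in the thread would be Java code toggled with -XX:+UseFMA / -XX:-UseFMA, and the name matmul_fma and the row-major layout are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not taken from the attached benchmark.
// Computes C = A * B for square n x n matrices stored row-major,
// accumulating every term with a fused multiply-add.
std::vector<double> matmul_fma(const std::vector<double>& a,
                               const std::vector<double>& b,
                               std::size_t n) {
    std::vector<double> c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < n; ++k) {
            const double aik = a[i * n + k];
            for (std::size_t j = 0; j < n; ++j) {
                // c[i][j] = aik * b[k][j] + c[i][j], rounded once.
                c[i * n + j] = std::fma(aik, b[k * n + j], c[i * n + j]);
            }
        }
    }
    return c;
}
```

For example, multiplying the 2 x 2 matrices {1, 2, 3, 4} and {5, 6, 7, 8} this way yields {19, 22, 43, 50}.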
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Tuesday, August 30, 2016 10:14 AM
> To: Deshpande, Vivek R; hotspot-compiler-dev at openjdk.java.net
> Subject: Re: [9] 8154122: Intrinsify fused mac operations on x86
>
> Hi Vivek,
>
> Can you write a micro-benchmark to show the performance improvement for this intrinsic? It will help to get approval for the "FC Extension Request".
>
> Thanks,
> Vladimir
>
> On 8/26/16 12:33 PM, Vladimir Kozlov wrote:
>> I forgot that we need an "FC Extension Request" for this RFE :( Have to
>> wait for approval.
>>
>> Regards,
>> Vladimir
>>
>> On 8/26/16 12:14 PM, Vladimir Kozlov wrote:
>>> Change subject to match JBS.
>>>
>>> Tests passed. I'm pushing these changes.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 8/25/16 2:45 PM, Vladimir Kozlov wrote:
>>>> Looks good! I will run the tests and push if they pass.
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 8/25/16 10:51 AM, Deshpande, Vivek R wrote:
>>>>> Hi Vladimir
>>>>>
>>>>> I have updated the hotspot webrev as per your suggestions.
>>>>> Could you please review it.
>>>>> The webrev is at this location:
>>>>> http://cr.openjdk.java.net/~vdeshpande/FMA/8154122/hotspot/webrev.02/
>>>>>
>>>>> Regards,
>>>>> Vivek
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>>> Sent: Tuesday, August 16, 2016 7:35 PM
>>>>> To: Deshpande, Vivek R; Andrew Haley;
>>>>> hotspot-compiler-dev at openjdk.java.net
>>>>> Cc: Viswanathan, Sandhya
>>>>> Subject: Re: x86 Intrinsics for fma in Math Library
>>>>>
>>>>> Hi, Vivek
>>>>>
>>>>> You can't use UseFMA in shared code if you define it only in
>>>>> globals_x86.hpp. It should be in globals.hpp
>>>>>
>>>>> I would suggest adding new MacroAssembler methods fmad() and
>>>>> fmaf() instead of calling vfmadd231 directly.
>>>>>
>>>>> Pass them 4 registers, and if dst == op3 skip the move. It will
>>>>> simplify the code (in .ad files, pass dst 2 times). Use the movflt() and
>>>>> movdbl() macro instructions.
>>>>>
>>>>> The argument order for these methods should be the same as for the Java
>>>>> fma() method (currently it is confusing since you shuffle them for
>>>>> vfmadd231).
>>>>>
>>>>> I would also guard the UseFMA setting in vm_version with UseSSE >= 2
>>>>> (needed for 32-bit VM). There is a lot of code which checks it when
>>>>> we work with float values to keep them in XMM registers.
>>>>>
>>>>> In templateInterpreterGenerator_x86_<>.cpp files use movflt for
>>>>> fmaF to load float arguments.
>>>>>
>>>>> In .ad files add comment into format // a * b + c
>>>>>
>>>>> subnode.cpp - move the TOP checks before #ifndef. Also, we don't
>>>>> indent #ifdef.
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
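The fmad() suggestion in the review above (four registers, arguments in a * b + c order, no move when dst is the addend register) can be sketched with a toy register model. This is only an illustration under stated assumptions: ToyRegs, its moves counter, and the choice to accumulate into the addend register before copying to dst are hypothetical and are not HotSpot's MacroAssembler.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Toy register file standing in for XMM registers (hypothetical model).
struct ToyRegs {
    std::array<double, 8> xmm{};
    int moves = 0;  // number of register-to-register moves performed

    // Models a movdbl between registers.
    void movdbl(std::size_t dst, std::size_t src) {
        xmm[dst] = xmm[src];
        ++moves;
    }

    // Models vfmadd231sd: acc = x * y + acc, with a single rounding.
    void vfmadd231sd(std::size_t acc, std::size_t x, std::size_t y) {
        xmm[acc] = std::fma(xmm[x], xmm[y], xmm[acc]);
    }

    // dst = a * b + c, arguments in the same order as Java's Math.fma(a, b, c).
    // The result is accumulated in c (which is therefore overwritten) and
    // copied to dst only when dst and c are different registers.
    void fmad(std::size_t dst, std::size_t a, std::size_t b, std::size_t c) {
        vfmadd231sd(c, a, b);
        if (dst != c) {
            movdbl(dst, c);
        }
    }
};
```

With xmm1 = 2, xmm2 = 3 and xmm3 = 4, fmad(3, 1, 2, 3) leaves 10 in xmm3 without any move, which corresponds to the "pass dst 2 times" case for the .ad files.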
>>>>> On 8/15/16 1:29 PM, Deshpande, Vivek R wrote:
>>>>>> Hi All
>>>>>>
>>>>>> I have updated the patch with suggested changes.
>>>>>> Please find the webrevs at this location:
>>>>>> http://cr.openjdk.java.net/~vdeshpande/FMA/8154122/hotspot/webrev.01/
>>>>>> and
>>>>>> http://cr.openjdk.java.net/~vdeshpande/FMA/8154122/jdk/webrev.01/
>>>>>>
>>>>>> Regards,
>>>>>> Vivek
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Andrew Haley [mailto:aph at redhat.com]
>>>>>> Sent: Wednesday, August 03, 2016 3:03 PM
>>>>>> To: Deshpande, Vivek R; Vladimir Kozlov;
>>>>>> hotspot-compiler-dev at openjdk.java.net
>>>>>> Subject: Re: x86 Intrinsics for fma in Math Library
>>>>>>
>>>>>> On 03/08/16 22:37, Deshpande, Vivek R wrote:
>>>>>>> I can do that along with rest of the suggested changes in the patch.
>>>>>>> Could you please also give me some more information on using
>>>>>>> #ifdef __STDC_IEC_559__
>>>>>>
>>>>>> Maybe do this:
>>>>>>
>>>>>> //------------------------------Value------------------------------------------
>>>>>> const Type* FmaDNode::Value(PhaseGVN* phase) const {
>>>>>> #ifndef __STDC_IEC_559__
>>>>>>   return Type::DOUBLE;
>>>>>> #else
>>>>>>   const Type *t1 = phase->type(in(1));
>>>>>>   if (t1 == Type::TOP) return Type::TOP;
>>>>>>   if (t1->base() != Type::DoubleCon) return Type::DOUBLE;
>>>>>>   const Type *t2 = phase->type(in(2));
>>>>>>   if (t2 == Type::TOP) return Type::TOP;
>>>>>>   if (t2->base() != Type::DoubleCon) return Type::DOUBLE;
>>>>>>   const Type *t3 = phase->type(in(3));
>>>>>>   if (t3 == Type::TOP) return Type::TOP;
>>>>>>   if (t3->base() != Type::DoubleCon) return Type::DOUBLE;
>>>>>>   double d1 = t1->getd();
>>>>>>   double d2 = t2->getd();
>>>>>>   double d3 = t3->getd();
>>>>>>   return TypeD::make(fma(d1, d2, d3));
>>>>>> #endif
>>>>>> }
>>>>>>
>>>>>> Perhaps this is too simple, and you should return TOP if any of
>>>>>> the operands are of type TOP; I'm not sure.
>>>>>>
>>>>>> But the point is that if __STDC_IEC_559__ is defined, then you are
>>>>>> guaranteed that the libm fma() is the same as Java fma().
>>>>>>
>>>>>> Andrew.
>>>>>>
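The folding rule discussed in this message can be tried outside HotSpot with a toy type lattice. The sketch below is a simplified stand-in, not the code from the webrev: Kind, ToyType and fma_value are hypothetical names. It applies the ordering requested in the review reply (all TOP checks first, then the __STDC_IEC_559__ guard) and folds constants with the libm fma() only when the macro is defined.

```cpp
#include <cmath>

// Toy stand-in for C2's type lattice (hypothetical, far simpler than Type).
enum class Kind { Top, DoubleCon, Double };

struct ToyType {
    Kind kind;
    double value;  // meaningful only when kind == Kind::DoubleCon
};

// Result type of fma(t1, t2, t3) = t1 * t2 + t3 in the toy lattice.
ToyType fma_value(const ToyType& t1, const ToyType& t2, const ToyType& t3) {
    // A TOP input makes the result TOP, whatever the platform guarantees.
    if (t1.kind == Kind::Top || t2.kind == Kind::Top || t3.kind == Kind::Top) {
        return {Kind::Top, 0.0};
    }
#ifndef __STDC_IEC_559__
    // No IEC 559 guarantee for libm: never fold, just report "some double".
    return {Kind::Double, 0.0};
#else
    // Fold only when all three inputs are compile-time constants.
    if (t1.kind != Kind::DoubleCon || t2.kind != Kind::DoubleCon ||
        t3.kind != Kind::DoubleCon) {
        return {Kind::Double, 0.0};
    }
    return {Kind::DoubleCon, std::fma(t1.value, t2.value, t3.value)};
#endif
}
```

Checking TOP for all three inputs up front also covers the case where the first input is a non-constant double and a later input is TOP, which the version quoted above would report as DOUBLE.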
>>>>>>