[9] 8154122: Intrinsify fused mac operations on x86
Deshpande, Vivek R
vivek.r.deshpande at intel.com
Tue Aug 30 19:07:57 UTC 2016
Hi Vladimir
I ran a matrix multiplication micro-benchmark with -XX:+UseFMA and -XX:-UseFMA and observed a significant speedup of 2500x with FMA instructions.
Please find the micro-benchmark attached to this mail.
Regards,
Vivek
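
[Editorial sketch] The attached MatMult.java is not reproduced in this archive. The following is a minimal, hypothetical illustration of the kind of benchmark described above, not the author's actual file. The class name MatMultSketch, the matrix size, and the warm-up count are assumptions; Math.fma requires JDK 9 or later.

```java
import java.util.Random;

// Hypothetical sketch; NOT the MatMult.java attached to the original mail.
public class MatMultSketch {

    // Multiplies two n x n row-major matrices, accumulating with Math.fma.
    static double[] multiply(double[] a, double[] b, int n) {
        double[] c = new double[n * n];
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < n; k++) {
                double aik = a[i * n + k];
                for (int j = 0; j < n; j++) {
                    // c[i][j] = aik * b[k][j] + c[i][j], rounded once
                    c[i * n + j] = Math.fma(aik, b[k * n + j], c[i * n + j]);
                }
            }
        }
        return c;
    }

    public static void main(String[] args) {
        int n = 256;                       // assumed size, chosen arbitrarily
        Random rnd = new Random(42);
        double[] a = rnd.doubles(n * n).toArray();
        double[] b = rnd.doubles(n * n).toArray();

        // Warm up so the JIT compiles multiply() before timing.
        for (int i = 0; i < 10; i++) {
            multiply(a, b, n);
        }

        long start = System.nanoTime();
        double[] c = multiply(a, b, n);
        long elapsed = System.nanoTime() - start;
        System.out.println("c[0] = " + c[0] + ", elapsed ms = " + elapsed / 1_000_000.0);
    }
}
```

Running it once with `java -XX:+UseFMA MatMultSketch` and once with `java -XX:-UseFMA MatMultSketch` on FMA-capable hardware should show the difference between the intrinsic and the software fallback. A single timed run like this is only a rough indication; a harness such as JMH would give more trustworthy numbers.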
-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
Sent: Tuesday, August 30, 2016 10:14 AM
To: Deshpande, Vivek R; hotspot-compiler-dev at openjdk.java.net
Subject: Re: [9] 8154122: Intrinsify fused mac operations on x86
Hi Vivek,
Can you write a micro-benchmark to show the performance improvement for this intrinsic? It will help to get approval for the "FC Extension Request".
Thanks,
Vladimir
On 8/26/16 12:33 PM, Vladimir Kozlov wrote:
> I forgot that we need an "FC Extension Request" for this RFE :( We have to
> wait for approval.
>
> Regards,
> Vladimir
>
> On 8/26/16 12:14 PM, Vladimir Kozlov wrote:
>> Change subject to match JBS.
>>
>> Tests passed. I'm pushing these changes.
>>
>> Thanks,
>> Vladimir
>>
>> On 8/25/16 2:45 PM, Vladimir Kozlov wrote:
>>> Looks good! I will run tests and will push if they pass.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 8/25/16 10:51 AM, Deshpande, Vivek R wrote:
>>>> Hi Vladimir
>>>>
>>>> I have updated the hotspot webrev as per your suggestions.
>>>> Could you please review it.
>>>> The webrev is at this location:
>>>> http://cr.openjdk.java.net/~vdeshpande/FMA/8154122/hotspot/webrev.02/
>>>>
>>>> Regards,
>>>> Vivek
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>> Sent: Tuesday, August 16, 2016 7:35 PM
>>>> To: Deshpande, Vivek R; Andrew Haley;
>>>> hotspot-compiler-dev at openjdk.java.net
>>>> Cc: Viswanathan, Sandhya
>>>> Subject: Re: x86 Intrinsics for fma in Math Library
>>>>
>>>> Hi, Vivek
>>>>
>>>> You can't use UseFMA in shared code if you define it only in
>>>> globals_x86.hpp. It should be in globals.hpp
>>>>
>>>> I would suggest to have new MacroAssembler methods fmad() and
>>>> fmaf() instead of calling vfmadd231 directly.
>>>>
>>>> Pass it 4 registers and if dst == op3 don't do move. It will
>>>> simplify code (in .ad files pass dst 2 times). Use movflt() and
>>>> movdbl() macro instructions.
>>>>
>>>> The argument order for these methods should be the same as for the Java
>>>> fma() method (currently it is confusing since you shuffle them for
>>>> vfmadd231).
>>>>
>>>> I would also guard the UseFMA setting in vm_version with UseSSE >= 2
>>>> (needed for 32-bit VM). There is a lot of code that checks it when
>>>> we work with float values to keep them in XMM registers.
>>>>
>>>> In templateInterpreterGenerator_x86_<>.cpp files use movflt for
>>>> fmaF to load float arguments.
>>>>
>>>> In .ad files add comment into format // a * b + c
>>>>
>>>> subnode.cpp - move TOP checks before #ifndef. And we don't indent
>>>> #ifdef.
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 8/15/16 1:29 PM, Deshpande, Vivek R wrote:
>>>>> Hi All
>>>>>
>>>>> I have updated the patch with suggested changes.
>>>>> Please find the webrevs at this location:
>>>>> http://cr.openjdk.java.net/~vdeshpande/FMA/8154122/hotspot/webrev.01/
>>>>> and
>>>>> http://cr.openjdk.java.net/~vdeshpande/FMA/8154122/jdk/webrev.01/
>>>>>
>>>>> Regards,
>>>>> Vivek
>>>>>
>>>>> -----Original Message-----
>>>>> From: Andrew Haley [mailto:aph at redhat.com]
>>>>> Sent: Wednesday, August 03, 2016 3:03 PM
>>>>> To: Deshpande, Vivek R; Vladimir Kozlov;
>>>>> hotspot-compiler-dev at openjdk.java.net
>>>>> Subject: Re: x86 Intrinsics for fma in Math Library
>>>>>
>>>>> On 03/08/16 22:37, Deshpande, Vivek R wrote:
>>>>>> I can do that along with rest of the suggested changes in the patch.
>>>>>> Could you please also give me some more information on using
>>>>>> #ifdef __STDC_IEC_559__
>>>>>
>>>>> Maybe do this:
>>>>>
>>>>> //------------------------------Value------------------------------------------
>>>>> const Type* FmaDNode::Value(PhaseGVN* phase) const {
>>>>> #ifndef __STDC_IEC_559__
>>>>>   return Type::DOUBLE;
>>>>> #else
>>>>>   const Type *t1 = phase->type(in(1));
>>>>>   if (t1 == Type::TOP) return Type::TOP;
>>>>>   if (t1->base() != Type::DoubleCon) return Type::DOUBLE;
>>>>>   const Type *t2 = phase->type(in(2));
>>>>>   if (t2 == Type::TOP) return Type::TOP;
>>>>>   if (t2->base() != Type::DoubleCon) return Type::DOUBLE;
>>>>>   const Type *t3 = phase->type(in(3));
>>>>>   if (t3 == Type::TOP) return Type::TOP;
>>>>>   if (t3->base() != Type::DoubleCon) return Type::DOUBLE;
>>>>>   double d1 = t1->getd();
>>>>>   double d2 = t2->getd();
>>>>>   double d3 = t3->getd();
>>>>>   return TypeD::make(fma(d1, d2, d3));
>>>>> #endif
>>>>> }
>>>>>
>>>>> Perhaps this is too simple, and you should return TOP if any of
>>>>> the operands are of type TOP; I'm not sure.
>>>>>
>>>>> But the point is that if __STDC_IEC_559__ is defined, then you are
>>>>> guaranteed that the libm fma() is the same as Java fma().
>>>>>
>>>>> Andrew.
>>>>>
>>>>>
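
[Editorial sketch] Andrew's point about constant folding depends on fma() rounding only once: a folded constant must equal what Math.fma would return at run time, and that can differ from the two-step a * b + c. The following small Java example illustrates the difference. The class and method names are hypothetical, and the inputs are chosen so that the product's low-order bit is lost when it is rounded separately.

```java
// Hypothetical illustration of fused vs. unfused rounding; not from the patch.
public class FmaRounding {

    // Two roundings: the product is rounded to double, then the sum is rounded.
    static double unfused(double a, double b, double c) {
        return a * b + c;
    }

    // One rounding: a * b + c is computed exactly, then rounded once.
    static double fused(double a, double b, double c) {
        return Math.fma(a, b, c);
    }

    public static void main(String[] args) {
        double a = 1.0 + 0x1.0p-27;      // a * a = 1 + 2^-26 + 2^-54 exactly
        double c = -(1.0 + 0x1.0p-26);   // cancels everything except the 2^-54 term
        System.out.println("unfused: " + unfused(a, a, c));
        System.out.println("fused:   " + fused(a, a, c));
    }
}
```

With these inputs the two-step expression yields 0.0, because the 2^-54 term is rounded away in the product, while Math.fma yields 2^-54. A compiler that folds FmaD on constants therefore needs a host fma() with the same single-rounding behavior, which is what the __STDC_IEC_559__ check is meant to guarantee.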
-------------- next part --------------
A non-text attachment was scrubbed...
Name: MatMult.java
Type: application/octet-stream
Size: 3505 bytes
Desc: MatMult.java
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160830/1914cfe3/MatMult.java>