RFR 8076276 support for AVX512

Wed Apr 29 21:11:26 UTC 2015

For the record, I reviewed it and I think it is good.

Thanks,
Vladimir

On 4/23/15 12:24 PM, Vladimir Kozlov wrote:
> Updated webrev:
>
> http://cr.openjdk.java.net/~kvn/8076276/webrev.02
>
> Passed JPRT testing.
>
> Changes:
>
> The assembler layer now handles KNL as well for EVEX; it is a target that
> will be available earlier than Skylake server.  This is done by
> carefully managing cpuid information and applying each machine's
> characteristics to its code generation model.  I also added support
> for 32-bit compilation via the machine description, which manages many of
> the same things as in 64-bit, with some additions for instruction size
> calculations, such as a static function which answers the question of
> displacement size for memory offsets.  You will see two versions: one
> which modifies the offset and answers the question of size range, and
> another which statically takes all the equivalent object data of its
> dynamic counterpart as input to determine whether the displacement fits
> the motif.  One is made to be run statically and one dynamically, as part
> of assembler processing in its allocated object.  There is also a dummy
> region in the 32-bit register description of floating point registers,
> which is used to stage regmask alignment for the xmm register bank on
> that target.  I do this so that I can use the same code for both compiler
> models wrt register mask handling of vector components.  Please also note
> the new long Java tests in superword.  The aforementioned zmm save region
> for OS vector testing was ported to run in KNL mode.  The call save
> regions have been extended for both compilation models to handle their
> respective register banks and are working correctly.
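
The displacement-size question described above can be illustrated with a
minimal sketch. It assumes the EVEX compressed displacement rule (disp8*N):
an offset fits in one byte when it is a multiple of N and the quotient is in
[-128, 127]. The function names and the choice to pass N in directly are
assumptions made for illustration; they are not taken from the webrev.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not the function from the patch.
// Returns true when `disp` can be encoded as an EVEX compressed 8-bit
// displacement. `n` is the scale factor N in bytes; the ISA derives it from
// tuple type, vector length and operand size, but here the caller supplies it.
// On success `disp` is rewritten to the scaled value stored in the byte.
bool fits_compressed_disp8(int32_t& disp, int32_t n) {
  assert(n > 0 && (n & (n - 1)) == 0);  // N is a power of two
  if (disp % n != 0) return false;      // must be a multiple of N
  int32_t scaled = disp / n;
  if (scaled < -128 || scaled > 127) return false;  // must fit in int8
  disp = scaled;
  return true;
}

// Query-only variant: answers the same question without touching the
// caller's offset, because `disp` is a by-value copy here.
bool is_compressed_disp8(int32_t disp, int32_t n) {
  return fits_compressed_disp8(disp, n);
}
```

With N = 64 (a full 512-bit memory operand), an offset of 128 is accepted and
rewritten to 2, while an offset of 100 is rejected and left unchanged.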
>
> Thanks,
> Michael
>
> On 4/9/15 4:53 PM, Vladimir Kozlov wrote:
>> Michael,
>>
>> Thank you for the detailed explanation. I need to clarify my request:
>>
>> 1. I am fine with kmov and KRegister definitions and usage in assembler,
>> macroassembler and stubs.
>>
>> 2. I don't want KRegister and Kmove in C2 code (opto/ and .ad files)
>> until we have full support for them in RA and signal processing.
>>
>> Thanks,
>> Vladimir
>>
>> On 4/9/15 4:02 PM, Berg, Michael C wrote:
>>> Vladimir, some explanation of the EVEX encoding model is needed:
>>>
>>> Some instructions are agnostic to vector length and can take the
>>> implicit k0 definition in encoding.  Some instructions must have
>>> predication definitions for their mask application to SIMD, which
>>> explicitly exclude k0. The range usage of predication mask registers
>>> must be k1..k7 as a real definition which code must provide with a
>>> mask value.  The EVEX enabled machine environment does not
>>> automatically initialize any of the mask assignable registers
>>> (k1..k7), so we must emit kmov instructions which gather an immediate
>>> value from a gpr register.  You will see code such as this in the
>>> review.  This effectively means KRegister must stay in the
>>> implementation, but I can accommodate the lion's share of what you have
>>> indicated.  The places where KRegister is used via the assembler layer
>>> are:
>>>
>>> src/cpu/x86/vm/stubGenerator_x86_64.cpp: 265,
>>> src/cpu/x86/vm/stubGenerator_x86_32.cpp: 169 "not there yet, but it
>>> needs one too"
>>> src/cpu/x86/vm/macroAssembler_x86.cpp: 4550, 7046
>>>
>>> This is in place of formal register allocation for now as well as when
>>> we do more extravagant things with SIMD masks.  I will keep the webrev
>>> around so I can easily add these pieces back in as we are going to
>>> need them.
>>> Also there are many other mask register instructions in the ISA which
>>> we will need to make use of in the future.  If this is acceptable I will
>>> look into the other changes and resend the webrev, modified accordingly.
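
The masking rules explained above can be modeled in scalar code. This is a
toy sketch only: `MaskFile`, `kmov` and `masked_add` are hypothetical names,
the model zero-initializes the mask registers for determinism, and it assumes
merge masking on a 16 x 32-bit add. It is not code from the webrev.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Vec16 = std::array<int32_t, 16>;

// Toy model of the mask register file (hypothetical type).
struct MaskFile {
  std::array<uint16_t, 8> k{};

  // Models a 16-bit kmov from a general-purpose register value.
  // Hardware can also write k0, but as a predicate operand encoding 0 means
  // "no mask", so this model only loads the usable predicates k1..k7.
  void kmov(int idx, uint32_t gpr) {
    assert(idx >= 1 && idx <= 7);
    k[idx] = static_cast<uint16_t>(gpr);
  }
};

// Models a merge-masked add: lanes whose mask bit is 0 keep the old `dst`
// value. Predicate index 0 stands for the implicit "all lanes" encoding.
Vec16 masked_add(const MaskFile& mf, int pred,
                 const Vec16& dst, const Vec16& a, const Vec16& b) {
  assert(pred >= 0 && pred <= 7);
  uint16_t mask = (pred == 0) ? 0xFFFF : mf.k[pred];
  Vec16 out = dst;
  for (int i = 0; i < 16; ++i) {
    if ((mask >> i) & 1) out[i] = a[i] + b[i];
  }
  return out;
}
```

Loading k1 with 0xFFFF before a full-width operation mirrors the point made
above: the predicate must be given a value by the code before it is used.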
>>>
>>> Thanks,
>>> Michael
>>>
>>>
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>> Sent: Wednesday, April 08, 2015 1:33 PM
>>> To: Berg, Michael C
>>> Cc: hotspot-compiler-dev at openjdk.java.net
>>> Subject: Re: RFR 8076276 support for AVX512
>>>
>>> Michael, please make sure to include the mailing lists in replies; this
>>> is a review process.
>>>
>>> I understand that the K registers may be important, but I don't see the
>>> need to include them in these changes, which are huge already. We can do
>>> it as a separate change unless you point me to where they are critically
>>> needed for avx512 instructions.
>>> I don't see their use in the current changes, which simply widen
>>> vectors to 512 bits.
>>>
>>> I am concerned that the K reg implementation is incomplete, but it is
>>> hard to see and review it in the current changes.
>>>
>>> Regards,
>>> Vladimir
>>>
>>> On 4/8/15 1:09 PM, Berg, Michael C wrote:
>>>> Vladimir, RegK is needed as it frames the kmov instructions which
>>>> utilize KRegister and the enumerated k registers, which are critically
>>>> needed and used, although not yet matched (we use k1 and k0 now).  I
>>>> will look into the rest of the comments.  The plan is to register
>>>> allocate the k registers at some point though.
>>>>
>>>> Thanks,
>>>> Michael
>>>>
>>>> -----Original Message-----
>>>> From: hotspot-compiler-dev
>>>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of
>>>> Vladimir Kozlov
>>>> Sent: Wednesday, April 08, 2015 12:36 PM
>>>> To: hotspot-compiler-dev at openjdk.java.net
>>>> Subject: Re: RFR 8076276 support for AVX512
>>>>
>>>> I would suggest removing MoveK and RegK from these changes since
>>>> they are not used.
>>>> We can add them later when you have a use case.
>>>>
>>>> sharedRuntime_x86_64.* - you should have code and not a comment:
>>>> // TODO: add ZMM save code
>>>>
>>>> vm_version_x86.cpp - add code to verify that the system preserves Z
>>>> registers during interrupts. See the code after the comment:
>>>>
>>>> // Some OSs have a bug when upper 128bits of YMM
>>>>
>>>>
>>>> I see the following pattern repeated in C1 code. It should be moved
>>>> to a function in FrameMap:
>>>>
>>>> +        int num_caller_save_xmm_regs = FrameMap::nof_caller_save_xmm_regs;
>>>> +#if _LP64
>>>> +        if (UseAVX < 3) {
>>>> +          num_caller_save_xmm_regs = num_caller_save_xmm_regs / 2;
>>>> +        }
>>>> +#endif
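
A minimal sketch of what such a FrameMap helper could look like follows. The
stand-in `UseAVX` variable, the register count of 32 and the function name
`num_caller_save_xmm_regs()` are assumptions so the example compiles on its
own; they are not taken from the webrev.

```cpp
#include <cassert>

// Stand-in for the HotSpot UseAVX flag (assumed value, for illustration).
static int UseAVX = 2;

// Stand-in for the C1 FrameMap class, reduced to what the pattern needs.
struct FrameMap {
  // Assumed size of the xmm bank when AVX-512 is available in 64-bit mode.
  static constexpr int nof_caller_save_xmm_regs = 32;

  // Hypothetical helper that centralizes the repeated pattern: below
  // AVX-512, 64-bit code only has the lower half of the bank.
  static int num_caller_save_xmm_regs() {
    int n = nof_caller_save_xmm_regs;
#ifdef _LP64
    if (UseAVX < 3) {
      n = n / 2;
    }
#endif
    return n;
  }
};
```

Each C1 call site would then call the helper instead of repeating the
conditional block.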
>>>>
>>>>
>>>> In general we should avoid using #ifdef X86 in shared code:
>>>> matcher.cpp. This file will not be an issue if you remove RegK from
>>>> the changes.
>>>>
>>>> c2compiler.cpp - can you move that code to
>>>> Compile::pd_compiler2_init() which is platform specific?
>>>>
>>>> matcher.cpp - typo 'eno':
>>>>
>>>> +    // For VecZ we need eno alignment and 64 bytes (16 slots) for spills.
>>>>
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>>
>>>> On 4/6/15 6:35 PM, Berg, Michael C wrote:
>>>>> Hi Folks,
>>>>>
>>>>> We (Intel) would like to contribute initial support for AVX512 (EVEX
>>>>> encoding, new register support, new ISA support,
>>>>> etc) for EVEX enabled microarchitectures.
>>>>> The contribution is referenced as Bug ID 8076276 as a performance
>>>>> enhancement.
>>>>>
>>>>> Please review this patch and comment as needed:
>>>>>
>>>>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8076276
>>>>>
>>>>> webrev:
>>>>> http://cr.openjdk.java.net/~kvn/8076276/webrev
>>>>>
>>>>> Superword optimizations covered on the vectorization path experience
>>>>> as much as a 50% reduction in loop trace instruction count, which
>>>>> makes up the path length of EVEX encoded SIMD optimized loops.
>>>>>
>>>>> Vladimir Kozlov has offered to sponsor this patch.
>>>>>