RFR 8076276 support for AVX512

Thu Apr 23 19:24:37 UTC 2015

Updated webrev:

http://cr.openjdk.java.net/~kvn/8076276/webrev.02

Passed JPRT testing.

Changes:

The assembler layer now also handles KNL for EVEX; KNL is a target that
will be available earlier than Skylake server.  This is done by
carefully managing cpuid information and applying each machine's
characteristics to its code generation model.

I also added support for 32-bit compilation via the machine description,
which manages many of the same things as in 64-bit, with some additions
for instruction size calculations, such as a static function which
answers the question of displacement size for memory offsets.  You will
see two versions: one modifies the offset and answers the question of
size range; the other takes, as static inputs, the same data its
dynamic counterpart reads from the assembler object, and uses it to
decide whether the displacement fits the encoding.  One is meant to be
run statically, the other dynamically as part of assembler processing
in its allocated object.

There is also a dummy region in the 32-bit register description of the
floating point registers, which is used to stage regmask alignment for
the xmm register bank on that target.  I do this so that I can use the
same code for both compiler models with respect to register mask
handling of vector components.

Please also note the new long Java tests in superword.  The
aforementioned zmm save region for OS vector testing was ported to run
in KNL mode.  The call save regions have been extended for both
compilation models to handle their respective register banks and are
working correctly.
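
As a rough illustration of the displacement question above, here is a
minimal sketch (not the code in the webrev) of how a query-only version
and a modifying version could look.  It assumes the EVEX disp8*N rule:
a displacement can use the one-byte form when it is a multiple of the
scale factor N and the scaled value fits in a signed byte.  The function
names, and passing N in directly, are assumptions made for illustration;
in EVEX, N depends on the instruction's tuple type, vector length,
broadcast setting and input size.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers, not the HotSpot implementation.
// n is the EVEX disp8*N scale factor, e.g. 64 for a full 512-bit
// vector load or store without broadcast.

// Query-only version: reports whether disp can use the one-byte form.
inline bool fits_compressed_disp8(int32_t disp, int32_t n) {
  assert(n > 0 && (n & (n - 1)) == 0);  // N is always a power of two
  if (disp % n != 0) return false;      // must be a multiple of N
  int32_t scaled = disp / n;
  return scaled >= -128 && scaled <= 127;
}

// Modifying version: on success, rewrites disp to the scaled value
// that would actually be emitted in the displacement byte.
inline bool compress_disp8(int32_t& disp, int32_t n) {
  if (!fits_compressed_disp8(disp, n)) return false;
  disp /= n;
  return true;
}

// Displacement field size in bytes, as an instruction size calculation
// would need it.  Simplification: a zero displacement is treated as
// "no displacement", which ignores base registers that always require
// an explicit displacement.
inline int disp_size_in_bytes(int32_t disp, int32_t n) {
  if (disp == 0) return 0;
  return fits_compressed_disp8(disp, n) ? 1 : 4;
}
```

With N = 64, an offset of 128 is emitted as the single byte 2, while an
offset of 100 is not a multiple of 64 and falls back to four bytes.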

Thanks,
Michael

On 4/9/15 4:53 PM, Vladimir Kozlov wrote:
> Michael,
>
> Thank you for the detailed explanation. I need to clarify my request:
>
> 1. I am fine with kmov and KRegister definitions and usage in assembler,
> macroassembler and stubs.
>
> 2. I don't want KRegister and Kmove in C2 code (opto/ and .ad files)
> until we have full support for them in RA and signal processing.
>
> Thanks,
> Vladimir
>
> On 4/9/15 4:02 PM, Berg, Michael C wrote:
>> Vladimir, some explanation of the EVEX encoding model is needed:
>>
>> Some instructions are agnostic to vector length and can take the
>> implicit k0 definition in encoding.  Some instructions must have
>> predication definitions for their mask application to SIMD, which
>> explicitly exclude k0.  Predication mask registers must therefore
>> come from k1..k7, and the code must load the chosen register with a
>> mask value.  The EVEX enabled machine environment does not
>> automatically initialize any of the assignable mask registers
>> (k1..k7), so we must emit kmov instructions which take the mask value
>> from a GPR loaded with an immediate.  You will see code such as this
>> in the review.  This effectively means KRegister must stay in the
>> implementation, but I can accommodate the lion's share of what you
>> have indicated.  The places where KRegister is used via the assembler
>> layer are:
>>
>> src/cpu/x86/vm/stubGenerator_x86_64.cpp: 265,
>> src/cpu/x86/vm/stubGenerator_x86_32.cpp: 169 "not there yet, but it
>> needs one too"
>> src/cpu/x86/vm/macroAssembler_x86.cpp: 4550, 7046
>>
>> This is in place of formal register allocation for now as well as when
>> we do more extravagant things with SIMD masks.  I will keep the webrev
>> around so I can easily add these pieces back in as we are going to
>> need them.
>> Also there are many other mask register instructions in the ISA which
>> we will need to make use of in the future.  If this is acceptable, I
>> will look into the other changes and resend the webrev modified
>> accordingly.
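
To make the masking point above concrete, the following is a small
software model, not HotSpot code, of a merge-masked 512-bit add over 16
32-bit lanes.  Lanes whose mask bit is clear keep the old destination
value, so a mask register that was never loaded gives unpredictable
results, while an all-ones mask (0xFFFF here) behaves like the unmasked
operation.  All names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kLanes = 16;  // 16 x 32-bit lanes in a 512-bit register
using Vec512 = std::array<int32_t, kLanes>;

// Software model of merge-masking: bit i of mask selects whether lane i
// receives a[i] + b[i] or keeps the previous destination value.
inline Vec512 masked_add(const Vec512& dst, const Vec512& a,
                         const Vec512& b, uint16_t mask) {
  Vec512 out = dst;
  for (int i = 0; i < kLanes; ++i) {
    if (mask & (1u << i)) {
      out[i] = a[i] + b[i];
    }
  }
  return out;
}
```

Loading 0xFFFF into the mask before a loop of such operations plays the
role that the emitted kmov plays for k1 in the stubs.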
>>
>> Thanks,
>> Michael
>>
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Wednesday, April 08, 2015 1:33 PM
>> To: Berg, Michael C
>> Cc: hotspot-compiler-dev at openjdk.java.net
>> Subject: Re: RFR 8076276 support for AVX512
>>
>> Michael, please make sure to include the mailing lists in replies;
>> this is a review process.
>>
>> I understand that the K registers may be important, but I don't see
>> the need to include them in these changes, which are huge already. We
>> can do it as separate changes unless you point me to where they are
>> critically needed for avx512 instructions.
>> I don't see the use of them in the current changes, which simply widen
>> vectors to 512 bits.
>>
>> I am concerned that the K reg implementation is incomplete, but it is
>> hard to see and review it in the current changes.
>>
>> Regards,
>> Vladimir
>>
>> On 4/8/15 1:09 PM, Berg, Michael C wrote:
>>> Vladimir, RegK is needed as it frames the kmov instructions which
>>> utilize KRegister and the enumerated k registers, which are critically
>>> needed and used, although not yet matched (we use k1 and k0 now).  I
>>> will look into the rest of the comments.  The plan is to register
>>> allocate the k registers at some point though.
>>>
>>> Thanks,
>>> Michael
>>>
>>> -----Original Message-----
>>> From: hotspot-compiler-dev
>>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of
>>> Vladimir Kozlov
>>> Sent: Wednesday, April 08, 2015 12:36 PM
>>> To: hotspot-compiler-dev at openjdk.java.net
>>> Subject: Re: RFR 8076276 support for AVX512
>>>
>>> I would suggest removing MoveK and RegK from these changes since
>>> they are not used.
>>> We can add them later when you have a use case.
>>>
>>> sharedRuntime_x86_64.*: you should have code and not a comment:
>>> // TODO: add ZMM save code
>>>
>>> vm_version_x86.cpp: add code to verify that the system preserves ZMM
>>> registers during interrupts. See the code after the comment:
>>>
>>> // Some OSs have a bug when upper 128bits of YMM
>>>
>>>
>>> I see the following pattern repeated in C1 code. It should be moved
>>> to a function in FrameMap:
>>>
>>> +        int num_caller_save_xmm_regs = FrameMap::nof_caller_save_xmm_regs;
>>> +#if _LP64
>>> +        if (UseAVX < 3) {
>>> +          num_caller_save_xmm_regs = num_caller_save_xmm_regs / 2;
>>> +        }
>>> +#endif
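
A minimal sketch of what such a helper might look like, written as a
free function so it can stand alone.  The name and the parameters are
hypothetical: in HotSpot the register count, UseAVX and the _LP64
distinction would come from FrameMap, the VM flags and the preprocessor
rather than from arguments.

```cpp
#include <cassert>

// Hypothetical stand-alone version of the repeated pattern.
//   nof_regs : size of the caller-save xmm bank described for the target
//   use_avx  : value of the UseAVX flag (3 means AVX-512 is enabled)
//   is_lp64  : true for the 64-bit VM, replacing the #if _LP64 guard
inline int num_caller_save_xmm_regs_in_use(int nof_regs, int use_avx,
                                           bool is_lp64) {
  int num = nof_regs;
  if (is_lp64 && use_avx < 3) {
    // Without AVX-512 only the lower half of the bank exists.
    num = num / 2;
  }
  return num;
}
```

Each C1 call site would then ask the helper for the count instead of
repeating the #if block.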
>>>
>>>
>>> In general we should avoid using #ifdef X86 in shared code:
>>> matcher.cpp. This file will not be an issue if you remove RegK from
>>> the changes.
>>>
>>> c2compiler.cpp - can you move that code to
>>> Compile::pd_compiler2_init() which is platform specific?
>>>
>>> matcher.cpp - typo 'eno':
>>>
>>> +    // For VecZ we need eno alignment and 64 bytes (16 slots) for
>>> spills.
>>>
>>>
>>> Thanks,
>>> Vladimir
>>>
>>>
>>> On 4/6/15 6:35 PM, Berg, Michael C wrote:
>>>> Hi Folks,
>>>>
>>>> We (Intel) would like to contribute initial support for AVX512 (EVEX
>>>> encoding, new register support, new ISA support,
>>>> etc) for EVEX enabled microarchitectures.
>>>> The contribution is referenced as Bug ID 8076276 as a performance
>>>> enhancement.
>>>>
>>>> Please review this patch and comment as needed:
>>>>
>>>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8076276
>>>>
>>>> webrev:
>>>> http://cr.openjdk.java.net/~kvn/8076276/webrev
>>>>
>>>> Superword optimizations covered by the vectorization path see as
>>>> much as a 50% reduction in loop trace instruction count, which makes
>>>> up the path length of EVEX encoded SIMD optimized loops.
>>>>
>>>> Vladimir Kozlov has offered to sponsor this patch.
>>>>