RFR: 8208171: PPC64: Enrich SLP support

Gustavo Romero gromero at linux.vnet.ibm.com
Wed Sep 5 18:29:25 UTC 2018


Hi Martin,

On 09/05/2018 02:45 PM, Doerr, Martin wrote:
> Hi Gustavo,
> 
> thank you for your detailed explanation. I wonder what happens with the registers when VSX gets disabled, but the regs are read again many context switches later. But I guess this is solved somehow.

No problem!

Yes, kernel solves that too: once a VSX instruction is used again and VSX is
disabled it generates an exception and the exception handler calls
load_up_vsx():

https://github.com/torvalds/linux/blob/master/arch/powerpc/kernel/vector.S#L119

load_up_vsx() calls, by its turn, load_up_fpu() and load_up_vec() to load FP and
VEC registers from the thread struct associated to the task that wants to use
the VSX facility again. That thread struct contains the correct FP/VEC/VSX
registers saved many context switches before when the facilities where disabled.

The best description on what happens in this case (valid for VEC and VSX as
well) can be found in load_up_fpu() description:

https://github.com/torvalds/linux/blob/master/arch/powerpc/kernel/fpu.S#L79-L83 :

  * This task wants to use the FPU now.
  * On UP, disable FP for the task which had the FPU previously,
  * and save its floating-point registers in its thread_struct.
  * Load up this task's FP registers from its thread_struct,
  * enable the FPU for the current task and return to the task.

Here UP stand for Uni Processor (in case we are running a machine with only 1
CPU).

  
> I'm getting different incorrect results every time I run the test on some machines, while other machines always compute the correct result and the test passes.
> 
> But I found out that the problem shows up with different kernel versions (4.4.0-101-generic, 3.10.0-693.1.1.el7.ppc64le, 4.4.126-94.22-default). So I guess it's rather unlikely that the problem is only caused by the OS.
> 
> After more investigation, it rather looks like v0 is not preserved across the safepoint:
> 
> vs32 = v0, vs36 = v4, vs40 = v8
> 
>    0x00007fff6813e6d0: extsw   r15,r17
>    0x00007fff6813e6d4: rldic   r18,r17,2,30
>    0x00007fff6813e6d8: add     r18,r21,r18
>    0x00007fff6813e6dc: addi    r20,r18,16
>    0x00007fff6813e6e0: addi    r18,r18,16
>    0x00007fff6813e6e4: lxvd2x  vs36,0,r18
>    0x00007fff6813e6e8: vaddfp  v4,v4,v0
>    0x00007fff6813e6ec: rldicr  r15,r15,2,61
>    0x00007fff6813e6f0: add     r15,r21,r15
>    0x00007fff6813e6f4: addi    r18,r15,32
>    0x00007fff6813e6f8: addi    r15,r15,32
>    0x00007fff6813e6fc: lxvd2x  vs40,0,r15                    ;*faload {reexecute=0 rethrow=0 return_oop=0}
>                                                              ; - compiler.runtime.safepoints.TestRegisterRestoring::increment at 21 (line 62)
> 
>    0x00007fff6813e700: stxvd2x vs36,0,r20
>    0x00007fff6813e704: vaddfp  v4,v8,v0
>    0x00007fff6813e708: stxvd2x vs36,0,r18                    ;*fastore {reexecute=0 rethrow=0 return_oop=0}
>                                                              ; - compiler.runtime.safepoints.TestRegisterRestoring::increment at 24 (line 62)
> 
>    0x00007fff6813e70c: addi    r17,r17,8                     ;*iinc {reexecute=0 rethrow=0 return_oop=0}
>                                                              ; - compiler.runtime.safepoints.TestRegisterRestoring::increment at 25 (line 61)
> 
>    0x00007fff6813e710: cmpw    cr5,r17,r24
>    0x00007fff6813e714: blt     cr5,0x00007fff6813e6d0        ;*goto {reexecute=0 rethrow=0 return_oop=0}
>                                                              ; - compiler.runtime.safepoints.TestRegisterRestoring::increment at 28 (line 61)
> 
>   ;; B15: #      B14 B16 <- B14  Freq: 12356.3
> 
>    0x00007fff6813e718: ld      r15,288(r16)                  ; ImmutableOopMap{R21=Oop }
>                                                              ;*goto {reexecute=1 rethrow=0 return_oop=0}
>                                                              ; - compiler.runtime.safepoints.TestRegisterRestoring::increment at 28 (line 61)
> 
>    0x00007fff6813e71c: tdlgei  r15,8                         ;*goto {reexecute=0 rethrow=0 return_oop=0}
>                                                              ; - compiler.runtime.safepoints.TestRegisterRestoring::increment at 28 (line 61)
>                                                              ;   {poll}
>    0x00007fff6813e720: cmpw    cr6,r17,r24
>    0x00007fff6813e724: blt     cr6,0x00007fff6813e6d0
> 
> 
> At the end of the method, I see v4_float = {10000, 10000, 10000, 10000} on machines on which the test passes.
> On a machine on which it fails, e.g. v4_float = {0xffffffff, 0x8a296200, 0xffffffff, 0xffffffff}
> 
> I thought we had already checked saving and restoring vector registers at safepoints, but seems like we have missed something.

OK. So I was not able to reproduce yet... But looks like you pointed out a
solution to Michi already, so I'll stay tuned.

Thanks.


Best regards,
Gustavo

> Best regards,
> Martin
> 
> 
> -----Original Message-----
> From: Gustavo Romero <gromero at linux.vnet.ibm.com>
> Sent: Mittwoch, 5. September 2018 18:21
> To: Doerr, Martin <martin.doerr at sap.com>; Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; Michihiro Horie <HORIE at jp.ibm.com>
> Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>; hotspot-dev at openjdk.java.net
> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
> 
> Hi Martin,
> 
> On 09/03/2018 02:18 PM, Doerr, Martin wrote:
>> Hi Gustavo and Michihiro,
>>
>> we noticed jtreg test failures when using this change:
>> compiler/runtime/safepoints/TestRegisterRestoring.java
>> compiler/runtime/Test7196199.java
>>
>> TestRegisterRestoring is a simple test which returns arbitrary results instead of 10000.
> 
> Just to confirm I understood the description correctly:
> 
> Where you able to check it's returning random values for the
> array instead of 10_000 or you just checked that test failed?
> 
> Also, did you pass -XX:-SuperwordUseVSX when it failed? I'm
> asking because I'm able to fail that test due to a timeout, but not sure
> if it's the same you got there. Look (I'm using the same kernel as yours):
> 
> http://cr.openjdk.java.net/~gromero/logs/slp_failure0.txt
> 
> 
> Thank you.
> 
> Best regards,
> Gustavo
> 
>> We didn't see it on all machines, so it might be an issue with saving&restoring VR registers in the signal handler.
>> The machine which I have used has "SUSE Linux Enterprise Server 12 SP3" with kernel 4.4.126-94.22-default.
>>
>> That's what I found out so far. Maybe you have an idea?
>>
>> I also noticed that "-XX:-SuperwordUseVSX" crashes with bad ad file when your patch is applied. Looks like matching the vector nodes needs to be prevented.
>>
>> Best regards,
>> Martin
>>
>>
>> -----Original Message-----
>> From: hotspot-dev <hotspot-dev-bounces at openjdk.java.net> On Behalf Of Gustavo Romero
>> Sent: Montag, 3. September 2018 14:57
>> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; Michihiro Horie <HORIE at jp.ibm.com>
>> Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>; hotspot-dev at openjdk.java.net
>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
>>
>> Hi Goetz,
>>
>> On 09/03/2018 09:27 AM, Lindenmaier, Goetz wrote:
>>> Also, I can not find all of the mail traffic in
>>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/thread.html.
>>> Is this a problem of the pipermail server?
>>>
>>> For some reason this webrev lacks the links to browse the diffs.
>>> Do you need to use a more recent webrev?  You can obtain it with
>>> hg clone http://hg.openjdk.java.net/code-tools/webrev/ .
>>
>> Yes, probably it was a problem of the pipermail or in some relay.
>> I noted the same thing, i.e. at least one Michi reply arrived
>> to me but missed a ML.
>>
>> The initial discussion is here:
>> http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003613.html
>>
>> I understand Martin reviewed the last webrev in that thread, which is
>> http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/  (taken from
>> http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-July/003615.html)
>>
>> Martin's review of webrev.01:
>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-August/033958.html
>>
>> and Michi's reply to Martin's review of webrev.01:
>> http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html (with webrev.02,
>> taken from http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-August/003632.html).
>>
>> and your last review:
>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-September/030419.html
>>
>>
>> HTH.
>>
>> Best regards,
>> Gustavo
>>     
>>> Why do you rename vnoreg to vnoregi?
>>>
>>> Besides that the change is fine, thanks for implementing this!
>>>
>>> Best regards,
>>>      Goetz.
>>>
>>>
>>>> -----Original Message-----
>>>> From: Doerr, Martin
>>>> Sent: Dienstag, 28. August 2018 19:35
>>>> To: Gustavo Romero <gromero at linux.vnet.ibm.com>; Michihiro Horie
>>>> <HORIE at jp.ibm.com>
>>>> Cc: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-
>>>> dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker
>>>> <volker.simonis at sap.com>
>>>> Subject: RE: RFR: 8208171: PPC64: Enrich SLP support
>>>>
>>>> Hi Michihiro,
>>>>
>>>> thank you for implementing it. I have just taken a first look at your
>>>> webrev.01.
>>>>
>>>> It looks basically good. Only the Power version check seems to be incorrect.
>>>> VM_Version::has_popcntb() checks for Power5.
>>>> I believe most instructions are available with Power7.
>>>> Some ones (vsubudm, ..., vmmuluwm, vpopcntw) were introduced with
>>>> Power8?
>>>> We should check this carefully.
>>>>
>>>> Also, indentation in register_ppc.hpp could get improved.
>>>>
>>>> Thanks and best regard,
>>>> Martin
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Gustavo Romero <gromero at linux.vnet.ibm.com>
>>>> Sent: Donnerstag, 26. Juli 2018 16:02
>>>> To: Michihiro Horie <HORIE at jp.ibm.com>
>>>> Cc: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-
>>>> dev at openjdk.java.net; Doerr, Martin <martin.doerr at sap.com>; ppc-aix-
>>>> port-dev at openjdk.java.net; Simonis, Volker <volker.simonis at sap.com>
>>>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
>>>>
>>>> Hi Michi,
>>>>
>>>> On 07/26/2018 01:43 AM, Michihiro Horie wrote:
>>>>> I updated webrev:
>>>>> http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/
>>>>
>>>> Thanks for providing an updated webrev and for fixing indentation and
>>>> function
>>>> order in assembler_ppc.inline.hpp as well. I have no further comments :)
>>>>
>>>>
>>>> Best Regards,
>>>> Gustavo
>>>>
>>>>>
>>>>> Best regards,
>>>>> --
>>>>> Michihiro,
>>>>> IBM Research - Tokyo
>>>>>
>>>>> Inactive hide details for Gustavo Romero ---2018/07/25 23:05:32---Hi Michi,
>>>> On 07/25/2018 02:43 AM, Michihiro Horie wrote:Gustavo Romero ---
>>>> 2018/07/25 23:05:32---Hi Michi, On 07/25/2018 02:43 AM, Michihiro Horie
>>>> wrote:
>>>>>
>>>>> From: Gustavo Romero <gromero at linux.vnet.ibm.com>
>>>>> To: Michihiro Horie/Japan/IBM at IBMJP, ppc-aix-port-
>>>> dev at openjdk.java.net, hotspot-dev at openjdk.java.net
>>>>> Cc: goetz.lindenmaier at sap.com, volker.simonis at sap.com, "Doerr, Martin"
>>>> <martin.doerr at sap.com>
>>>>> Date: 2018/07/25 23:05
>>>>> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
>>>>>
>>>>> -------------------------------------------------------------------------------------------
>>>> ----------------------------------------------------------------------------------------------
>>>> ----------------------------------------------------------------------------------------------
>>>> ----------------------------------------------------------------------------------------------
>>>> ----------------------------------------------------------------------------------------------
>>>> ----------------------------------------------------------------------------------------------
>>>> ----------------------------------------------------------------------------------------------
>>>> ----------------------------------------------------------------------------------------------
>>>> ----------------------------------------------------------------------------------------------
>>>> ----------------------------------------------------------------------------------------------
>>>> -----------------------------------------------------
>>>>>
>>>>>
>>>>>
>>>>> Hi Michi,
>>>>>
>>>>> On 07/25/2018 02:43 AM, Michihiro Horie wrote:
>>>>>     > Dear all,
>>>>>     >
>>>>>     > Would you review the following change?
>>>>>     > Bug: https://bugs.openjdk.java.net/browse/JDK-8208171
>>>>>     > Webrev: http://cr.openjdk.java.net/~mhorie/8208171/webrev.00
>>>>>     >
>>>>>     > This change adds support for vectorized arithmetic calculation with SLP.
>>>>>     >
>>>>>     > The to_vr function is added to convert VSR to VR. Currently, vecX is
>>>> associated with a VSR class vs_reg that only defines VSR32-51 in ppc.ad,
>>>> which are exactly overlapped with VRs. Instruction APIs receiving VRs use the
>>>> to_vr via vecX. Another thing is the change in sqrtF_reg to enable the
>>>> matching with SqrtVF. I think the change in sqrtF_reg would be fine due to
>>>> the ConvD2FNode::Value in convertnode.cpp.
>>>>>
>>>>> Looks good. Just a few comments:
>>>>>
>>>>> - In vmul4F_reg() would it be reasonable to use xvmulsp instead of
>>>> vmaddfp in
>>>>>       order to avoid the splat?
>>>>>
>>>>> - Although all instructions added by your change where introduced in ISA
>>>> 2.06,
>>>>>       so POWER7 and above are OK, as I see probes for
>>>> PowerArchictecturePPC64=6|5 in
>>>>>       vm_version_ppc.cpp (line 64),  I'm wondering if there is any control point
>>>> to
>>>>>       guarantee that these instructions won't be emitted on a CPU that does
>>>> not
>>>>>       support them.
>>>>>
>>>>> - I think that in general string in format %{} are in upper case. For instance,
>>>>>       this the current output on optoassembly for vmul4F:
>>>>>
>>>>> 2941835 5b4     ADDI    R24, R24, #64
>>>>> 2941836 5b8     vmaddfp  VSR32,VSR32,VSR36      ! mul packed4F
>>>>> 2941837 5c0     STXVD2X     [R17], VSR32        // store 16-byte Vector
>>>>>
>>>>>       I think it would be better to be in upper case instead. I also think that if
>>>>>       the node match emits more than one instruction all instructions must be
>>>> listed
>>>>>       in format %{}, since it's meant for detailed debugging. Finally I think it
>>>>>       would be better to replace \t! by \t// in that string (unless I'm missing any
>>>>>       special meaning for that char). So for vmul4F it would be something like:
>>>>>
>>>>> 2941835 5b4     ADDI      R24, R24, #64
>>>>>                     VSPLTISW  VSR34, 0                 // Splat 0 imm in VSR34
>>>>> 2941836 5b8     VMADDFP   VSR32,VSR32,VSR36,VSR34  // Mul packed4F
>>>>> 2941837 5c0     STXVD2X   [R17], VSR32             // store 16-byte Vector
>>>>>
>>>>>
>>>>> But feel free to change anything just after you get additional reviews :)
>>>>>
>>>>>
>>>>>     > I confirmed this change with JTREG. In addition, I used attached micro
>>>> benchmarks.
>>>>>     > /(See attached file: slp_microbench.zip)/
>>>>>
>>>>> Thanks for sharing it.
>>>>> Btw, another option to host it would be in the CR
>>>>> server, in http://cr.openjdk.java.net/~mhorie/8208171
>>>>>
>>>>>
>>>>> Best regards,
>>>>> Gustavo
>>>>>
>>>>>     >
>>>>>     > Best regards,
>>>>>     > --
>>>>>     > Michihiro,
>>>>>     > IBM Research - Tokyo
>>>>>     >
>>>>>
>>>>>
>>>>>
>>>
>>
> 



More information about the hotspot-compiler-dev mailing list