RFR(M): 8216060: [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays

Doerr, Martin martin.doerr at sap.com
Fri Jan 4 09:30:43 UTC 2019


Hi Gustavo,

thank you very much for confirming. This makes sense. We use different frame headers depending on whether the frame is the top Java frame or not (and on whether it's a debug build or not). Setting R1_SP to sender_SP is a shortcut for leaf calls which relies on having an unmodified stack until this point. So the patch fixes the issue.

New webrev:
http://cr.openjdk.java.net/~mdoerr/8216060_PPC64_CRC/webrev.01/

Best regards,
Martin


-----Original Message-----
From: Gustavo Romero <gromero at linux.vnet.ibm.com> 
Sent: Donnerstag, 3. Januar 2019 19:36
To: Doerr, Martin <martin.doerr at sap.com>; 'hotspot-compiler-dev at openjdk.java.net' <hotspot-compiler-dev at openjdk.java.net>
Subject: Re: RFR(M): 8216060: [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays

Hi Martin,

On 01/03/2019 03:34 PM, Doerr, Martin wrote:
> Unfortunately, I can't reproduce the crash. TestCRC32C works stable on our machine (with fastdbg build).
> I guess that the frameless spills mess up the stack. Can you check if the patch below helps?

Thanks for providing a fix so I can try it.
Yes, I confirm the patch below indeed fixes the sigsegv crash when CRC32C update() method is used.
I also confirm that I don't observe the crash on the fastdebug build, only on the release build.
It also only affects the Interpreter mode, so passing -Xcomp avoids the crash on the release build.

Just as reference, I can reproduce it on the release build with the following trivial code:

import java.util.zip.CRC32C;

class CRC32C_v1 {
   public static void main(String[] arg) {
     byte[] b = new byte[1024];
   
     CRC32C crc32c = new CRC32C();
     crc32c.update(b, 0, b.length);

     System.out.println(crc32c.getValue());
   }
}

Thanks for fixing the typos.


Best regards,
Gustavo
  
> Best regards,
> Martin
> 
> 
> diff -r a33f49d5998c src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp
> --- a/src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp  Thu Jan 03 17:30:03 2019 +0100
> +++ b/src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp  Thu Jan 03 18:33:16 2019 +0100
> @@ -1924,6 +1924,9 @@
>         __ addi(data, data, arrayOopDesc::base_offset_in_bytes(T_BYTE));
>       }
> 
> +    // Restore caller sp for c2i case.
> +    __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the caller started.
> +
>       StubRoutines::ppc64::generate_load_crc_table_addr(_masm, table);
> 
>       if (!VM_Version::has_vpmsumb()) {
> @@ -1933,8 +1936,6 @@
>         __ kernel_crc32_vpmsum(crc, data, dataLen, table, t0, t1, t2, t3, tc0, tc1, tc2, true);
>       }
> 
> -    // Restore caller sp for c2i case and return.
> -    __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the caller started.
>       __ blr();
> 
>       // Generate a vanilla native entry as the slow path.
> @@ -2014,6 +2015,9 @@
>         __ addi(data, data, arrayOopDesc::base_offset_in_bytes(T_BYTE));
>       }
> 
> +    // Restore caller sp for c2i case.
> +    __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the caller started.
> +
>       StubRoutines::ppc64::generate_load_crc32c_table_addr(_masm, table);
> 
>       if (!VM_Version::has_vpmsumb()) {
> @@ -2023,8 +2027,6 @@
>         __ kernel_crc32_vpmsum(crc, data, dataLen, table, t0, t1, t2, t3, tc0, tc1, tc2, false);
>       }
> 
> -    // Restore caller sp for c2i case and return.
> -    __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the caller started.
>       __ blr();
> 
>       BLOCK_COMMENT("} CRC32C_update{Bytes|DirectByteBuffer}");
> 
> 
> -----Original Message-----
> From: Gustavo Romero <gromero at linux.vnet.ibm.com>
> Sent: Donnerstag, 3. Januar 2019 17:13
> To: Doerr, Martin <martin.doerr at sap.com>; 'hotspot-compiler-dev at openjdk.java.net' <hotspot-compiler-dev at openjdk.java.net>
> Subject: Re: RFR(M): 8216060: [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays
> 
> Hi Martin,
> 
> oh that's nice. You removed the 512-byte block constraint and also wired it up to the Interpreter :)
> 
> For the worst case, unaligned 512 byte array, I see the gap to aligned 512 byte array reduced by about ~5.7x.
> 
> On the Interpreter I see an improvement of at least 50% for 1024 bytes.
> 
> This is all for the CRC32 class.
> 
> On CRC32C I'm getting a SIGSEV that can be reproduced running against ./test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java.
> 
> I've upload a full log into http://cr.openjdk.java.net/~gromero/logs/crc32c_sigsegv/
> 
> I'm leaving for the lunch and I'll take a closer look when back. But probably you will figure it out before I sit to appreciate the meal :)
> 
> Finally, since the change does some cleanup, I wonder if it would be worth fixing the following typos:
> 
> I think it's Barrett const., not Barret. Probably 'barret' is used in the code as a short version
> for Barrett but it should be changed in
> 
> +  // Point to Barret constants
> +  add_const_optimized(cur_const, constants, outer_consts_size + inner_consts_size);
> +
> 
> ?
> 
> s/not/note/ in:
> cpu/ppc/macroAssembler_ppc.cpp:3977:// A not on the lookup table address(es):
> 
> d/lives/ in:
> cpu/ppc/macroAssembler_ppc.cpp:4265:  mtvrwz(VCRC, crc); // crc lives lives in VCRC, now
> 
> Best regards,
> Gustavo
> 
> On 01/03/2019 12:17 PM, Doerr, Martin wrote:
>> Hi,
>>
>> the JVM on PPC64 currently misses usage of the fast vector implementation in the interpreter code.
>>
>> In addition, performance is not good for short arrays (unaligned 512 byte arrays or shorter arrays) because the current vector implementation needs at least 512 bytes.
>>
>> Bug:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8216060
>>
>> I have addressed these 2 issues + some cleanup with the following webrev:
>>
>> http://cr.openjdk.java.net/~mdoerr/8216060_PPC64_CRC/webrev.00/ <http://cr.openjdk.java.net/%7Emdoerr/8216060_PPC64_CRC/webrev.00/>
>>
>> Please review.
>>
>> Best regards,
>>
>> Martin
>>
> 



More information about the hotspot-compiler-dev mailing list