Howto replicate failure of 8254790?
Vladimir Kozlov
vladimir.kozlov at oracle.com
Tue Oct 20 21:39:52 UTC 2020
Thank you, Sandhya
Very nice analysis.
I just finished running the dsig/GenerationTests.java test multiple times (to be sure) on our systems and confirmed your
proposed fix:
bsfl(ch, tmp);
+ if (UseNewCode) {
+ addptr(result, ch);
+ } else {
addl(result, ch);
+ }
It always fails with addl() and always passes with addptr(). I will assign the bug to myself and file a PR now.
I will also fix the Unicode string index intrinsic code.
Thanks,
Vladimir
On 10/20/20 10:27 AM, Viswanathan, Sandhya wrote:
> Hi Vladimir,
>
> I analyzed the instruction dump yesterday to find out where the issue is. I have attached it to the bug report as 8254790.asm:
> https://bugs.openjdk.java.net/browse/JDK-8254790
>
> The crash is reported at:
> 100: 450FB64C1810 movzx r9d, byte ptr [r8+rbx*1+0x10]
>
> This is just after the intrinsic and uses the rbx register (containing the char index returned by the intrinsic).
>
> RBX has the large value 0xfffffff900000008 instead of 8. The length of the string is 34 bytes. The match is found in the first 32 bytes, at index 8.
> After processing the first 32 bytes with the following instructions:
> 6b: C5FE6F13 vmovdqu ymm2, ymmword ptr [rbx]
> 6f: C5ED74D1 vpcmpeqb ymm2, ymm2, ymm1
> 73: C4E27D17C2 vptest ymm0, ymm2
> 78: 0F8369000000 jnb 0xe7
> Control then goes to 0xe7.
>
> The code snippet at 0xe7 is:
> e7: C5FDD7CA vpmovmskb ecx, ymm2
> eb: 0FBCC1 bsf eax, ecx
> ee: 03D8 add ebx, eax
> f0: 482BDF sub rbx, rdi
> f3: 0F1F4000 nop dword ptr [rax], eax
> f7: 413BDB cmp ebx, r11d
> fa: 0F83DF290000 jnb 0x2adf
> 100: 450FB64C1810 movzx r9d, byte ptr [r8+rbx*1+0x10]
>
> After vpmovmskb, the bit mask in ecx is 0x1100, whose lowest set bit is bit 8, so the first match is at byte 8.
> The register rbx at this point must be holding the address of the array base, 0x00000007e41d2700, the same as rdi.
> bsf puts 8 in eax.
> Then 8 is added to ebx instead of rbx using a 32-bit add, which zeroes the upper 32 bits and leaves rbx = 0xe41d2708.
> If the add had been a 64-bit add, everything would have worked.
> Then sub rbx, rdi computes 0xe41d2708 - 0x00000007e41d2700 = 0xFFFFFFF900000008 and leaves it in rbx.
> This is the value we see at the crash.
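The register arithmetic above can be replayed with plain integers. The following is a minimal sketch in Python, not HotSpot code: the helper names bsf and index_after_add are made up for illustration, and the model assumes that a 32-bit add zero-extends its result into the full 64-bit register, as described above.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def bsf(mask):
    """Index of the lowest set bit of a nonzero mask (models x86 bsf)."""
    return (mask & -mask).bit_length() - 1


def index_after_add(base, mask, wide_add):
    """Model the add + sub pair from the snippet at 0xe7.

    wide_add=False models `add ebx, eax` (32-bit, upper half of rbx zeroed).
    wide_add=True  models a 64-bit add on rbx.
    """
    rbx = rdi = base          # both registers hold the array base address
    eax = bsf(mask)           # position of the first matching byte
    if wide_add:
        rbx = (rbx + eax) & MASK64
    else:
        rbx = ((rbx & MASK32) + eax) & MASK32
    return (rbx - rdi) & MASK64   # sub rbx, rdi, wrapping at 64 bits


base = 0x00000007E41D2700  # array base address taken from the dump
print(hex(index_after_add(base, 0x1100, wide_add=False)))  # 0xfffffff900000008
print(hex(index_after_add(base, 0x1100, wide_add=True)))   # 0x8
```

With the base address from the dump, the 32-bit variant reproduces the value seen in rbx at the crash, while the 64-bit variant yields the expected index 8. In this model the two variants agree whenever the base address fits in 32 bits (for example 0xe41d2700), so the narrow add only goes wrong for an array located above 4 GB.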
>
> Best Regards,
> Sandhya
>
>
> -----Original Message-----
> From: Vladimir Kozlov <vladimir.kozlov at oracle.com>
> Sent: Tuesday, October 20, 2020 10:01 AM
> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; Tatton, Jason <jptatton at amazon.com>; David Holmes <david.holmes at oracle.com>; hotspot-compiler-dev at openjdk.java.net; core-libs-dev at openjdk.java.net; Hohensee, Paul <hohensee at amazon.com>
> Subject: Re: Howto replicate failure of 8254790?
>
> Yes, I saw it too, but I was not sure because we never hit the issue with the Unicode string index intrinsic.
> Another thing is that we see the failure only on macOS.
>
> I also want someone to decode the asm dump I provided in the bug to see the actual instructions where it happened.
>
> Vladimir K
>
> On 10/19/20 5:38 PM, Viswanathan, Sandhya wrote:
>> Hi Jason,
>>
>> I think I found the problem by looking at the error log from Vladimir Kozlov. In the stringL_indexof_char() function, the following snippet is the cause of the problem:
>>
>> bind(FOUND_CHAR);
>> if (UseAVX >= 2) {
>>   vpmovmskb(tmp, vec3);
>> } else {
>>   pmovmskb(tmp, vec3);
>> }
>> bsfl(ch, tmp);
>> addl(result, ch); // <==== The problem is here
>>
>> bind(FOUND_SEQ_CHAR);
>> subptr(result, str1);
>>
>> The line addl(result, ch) should have been addptr(result, ch).
>>
>> The same problem exists in the Unicode string index of char intrinsic as well and needs to be fixed.
>>
>> Hope this helps.
>>
>> Best Regards,
>> Sandhya
>>
>> -----Original Message-----
>> From: hotspot-compiler-dev
>> <hotspot-compiler-dev-retn at openjdk.java.net> On Behalf Of Vladimir
>> Kozlov
>> Sent: Thursday, October 15, 2020 3:59 PM
>> To: Tatton, Jason <jptatton at amazon.com>; David Holmes
>> <david.holmes at oracle.com>; hotspot-compiler-dev at openjdk.java.net;
>> core-libs-dev at openjdk.java.net
>> Subject: Re: Howto replicate failure of 8254790?
>>
>> Hi Jason,
>>
>> I added the surrounding instruction dump from the hs_err file we have, so you can reconstruct the x86 assembly from it.
>>
>> If you look at si_addr: 0x00000000e41d2718, which caused the memory map
>> failure, it looks like R8 = 0x00000007e41d2700 (an
>> oop: [B) with the upper 32 bits zeroed. It seems the upper 32 bits of the address were cut.
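The faulting address can be cross-checked against the register values quoted in this thread. This is a minimal sketch in Python, assuming the faulting load is the movzx r9d, byte ptr [r8+rbx*1+0x10] instruction and that rbx held 0xfffffff900000008, as reported in the follow-up analysis. effective_address is a hypothetical helper, and address arithmetic is modelled as wrapping at 64 bits.

```python
MASK64 = (1 << 64) - 1


def effective_address(base, index, disp):
    """Model [base + index*1 + disp] with 64-bit wraparound."""
    return (base + index + disp) & MASK64


r8 = 0x00000007E41D2700           # oop of the byte array, from the hs_err file
bad_rbx = 0xFFFFFFF900000008      # index register value reported at the crash
print(hex(effective_address(r8, bad_rbx, 0x10)))  # 0xe41d2718
print(hex(effective_address(r8, 8, 0x10)))        # 0x7e41d2718
```

The first result matches si_addr, whereas the intended index 8 would have produced 0x7e41d2718. That is consistent with the observation that the upper 32 bits of the address were lost.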
>>
>> But I don't see how it can happen in the stringL_indexof_char() sub. You correctly used movptr() and addptr() instructions.
>>
>> Vladimir K
>>
>> On 10/15/20 2:10 PM, Tatton, Jason wrote:
>>> Thanks Vladimir and David. I have access to a new MacBook with an Intel i7-9750H (supports AVX2), so I will try on that.
>>>
>>> -----Original Message-----
>>> From: Vladimir Kozlov <vladimir.kozlov at oracle.com>
>>> Sent: 15 October 2020 20:25
>>> To: David Holmes <david.holmes at oracle.com>; Tatton, Jason
>>> <jptatton at amazon.com>; hotspot-compiler-dev at openjdk.java.net;
>>> core-libs-dev at openjdk.java.net
>>> Subject: RE: [EXTERNAL] Howto replicate failure of 8254790?
>>>
>>> Note, we have old Mac machines in our testing environment, with these CPU features:
>>> cx8, cmov, fxsr, ht, mmx, 3dnowpref, sse, sse2, sse3, ssse3, sse4.1,
>>> sse4.2, popcnt, lzcnt, tsc, tscinvbit, avx, avx2, aes, erms, clmul,
>>> bmi1, bmi2, rtm, adx, fma, vzeroupper, clflush, clflushopt
>>>
>>> Use -XX:UseAVX=2
>>>
>>> But I was not able to reproduce the failure on my Skylake Linux machine even with -XX:UseAVX=2. Maybe there are other factors on macOS.
>>>
>>> Regards,
>>> Vladimir K
>>>
>>> On 10/14/20 5:48 PM, David Holmes wrote:
>>>> Hi Jason,
>>>>
>>>> On 15/10/2020 10:42 am, Tatton, Jason wrote:
>>>>> Hi all,
>>>>>
>>>>>
>>>>>
>>>>> I am trying to replicate the failure of the tier2 test mentioned in
>>>>> 8254790<https://bugs.openjdk.java.net/browse/JDK-8254790> but I am
>>>>> only seeing it pass on an x86 Linux machine. Are there any specific architectural constraints under which this test should be run in order to make it fail?
>>>>
>>>> It failed on a Mac, not Linux.
>>>>
>>>> Cheers,
>>>> David
>>>>
>>>>>
>>>>>
>>>>> I am running the test via: make test TEST="test/jdk/javax/xml/crypto/dsig/GenerationTests.java"
>>>>>
>>>>>
>>>>>
>>>>> Note that I am running the test against master without the commit:
>>>>> "8254792: Disable intrinsic StringLatin1.indexOf until 8254790 is fixed" which disables the intrinsic that is causing the test to fail.
>>>>>
>>>>>
>>>>>
>>>>> Thanks
>>>>> --
>>>>> Jason
>>>>>