[x86_64 AVX2] weird crash due to RAX in String.compareTo(Object)
Liu, Xin
xxinliu at amazon.com
Wed Mar 11 22:54:47 UTC 2020
Hi,
Thanks for looking into it. I filed a bug for it. https://bugs.openjdk.java.net/browse/JDK-8240913
Am I right that the label 8-pool means all jdk8u versions?
For JDK-8154896, I don't quite understand the description. It looks like a hardware glitch in some specific context. Does it only happen on Skylake-X?
Michael also mentioned HSW (64/32 bit) in the bug. Our EC2 instance in the crash reports is Broadwell-E. I found that Broadwell is architecturally the same as Haswell.
Thanks,
--lx
On 3/11/20, 12:09 PM, "Volker Simonis" <volker.simonis at gmail.com> wrote:
On Wed, Mar 11, 2020 at 3:02 PM Vladimir Ivanov
<vladimir.x.ivanov at oracle.com> wrote:
>
> Hi Liu,
>
> I wasn't able to find any similar reports in the JBS.
>
> > Forget about Test8005419.java because it can't reproduce this case either.
> > Have you seen this kind of crash before? May I file a new bug about it?
> > I have already permuted all cases of two 70-length compareTo calls, but I still can't trigger it.
> > What piece of information did I miss here?
Hi Vladimir,
thanks for looking into this issue. I've also tried to understand it,
but the code is quite sophisticated :)
> The value in RAX is evidently broken.
>
> str1 and str2 are already adjusted (+116 from 0th element) [1], so it
> seems the vector loop (COMPARE_WIDE_VECTORS_LOOP) has been run before
> and there was a mismatch encountered (VECTOR_NOT_EQUAL =>
> COMPARE_16_CHARS). But then somehow there's no mismatch detected and it
> falls into vector loop once again.
Yes, but in that case, shouldn't 'str1' and 'str2' be adjusted such
that they point to an element which is a multiple of 16?
In COMPARE_WIDE_VECTORS, 'str1' and 'str2' are set up to point past
the last character and 'result' is the minimal string length (the two
strings are of equal size here if we can trust the hs_error file).
Afterwards, we subtract 'stride' from 'result' (the string length) and
negate 'result'.
    bind(COMPARE_WIDE_VECTORS);
    lea(str1, Address(str1, result, scale));
    lea(str2, Address(str2, result, scale));
    subl(result, stride2);
    subl(cnt2, stride2);
    jccb(Assembler::zero, COMPARE_WIDE_TAIL);
    negptr(result);
So in COMPARE_WIDE_VECTORS_LOOP we will iterate from str1[16..31] to
str1[32..47] to str1[48..63]. If there was a mismatch in the last stride,
'str1' and 'str2' should be adjusted to point to the 48th element (i.e.
byte offset +96).
If the mismatch was in the COMPARE_WIDE_TAIL part, 'str1' and 'str2'
should point to the 70-16 = 54th element (i.e. byte offset +108).
So the offset +116 (i.e. the 58th character) looks suspicious to me.
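The offsets above are char indices scaled by two, because each element of a char array is two bytes wide. A minimal Java sketch of that arithmetic follows, assuming the 70-char strings and the 16-char stride from the discussion. The class and method names are hypothetical and exist only for this illustration.

```java
public class CompareOffsets {
    static final int BYTES_PER_CHAR = 2; // char[] elements are 2 bytes wide
    static final int STRIDE2 = 16;       // chars per 32-byte AVX2 vector

    /** Byte offset of a char index, measured from element 0. */
    static int byteOffset(int charIndex) {
        return charIndex * BYTES_PER_CHAR;
    }

    /** Char index where the final (tail) vector starts for a given length. */
    static int tailStart(int length) {
        return length - STRIDE2;
    }

    public static void main(String[] args) {
        int length = 70;
        System.out.println(byteOffset(48));                // 96: last full stride
        System.out.println(byteOffset(tailStart(length))); // 108: tail vector
        System.out.println(116 / BYTES_PER_CHAR);          // 58: observed offset as a char index
    }
}
```

Neither of the two expected byte offsets (96 or 108) matches the observed +116.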
Also, if we fall back into the vector loop again because the string
has been changed under the hood, 'result' (i.e. 'rax') should be either
-22 if we're coming from VECTOR_NOT_EQUAL or -16 if we're coming
from COMPARE_WIDE_TAIL.
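The expected values of 'result' can be traced with a small model. This is only a sketch: it assumes 'result' starts at the string length, that the first 16 chars were already compared before COMPARE_WIDE_VECTORS, and that the loop advances 'result' by the stride on each iteration, as the description above implies. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ResultRegisterTrace {
    static final int STRIDE2 = 16; // chars per 32-byte vector

    /** 'result' after the quoted setup: subl(result, stride2), then negptr(result). */
    static long resultAfterSetup(int length) {
        return -(long) (length - STRIDE2);
    }

    /** 'result' at each full-vector load, assuming it grows by STRIDE2 per iteration. */
    static List<Long> loopValues(int length) {
        List<Long> values = new ArrayList<>();
        // Assumption: chars 0..15 were handled before the loop, so the loop
        // covers the remaining whole 16-char vectors only.
        int vectors = length / STRIDE2 - 1;
        long result = resultAfterSetup(length);
        for (int i = 0; i < vectors; i++) {
            values.add(result);
            result += STRIDE2;
        }
        return values;
    }

    /** 'result' for the final overlapping tail vector. */
    static long tailValue() {
        return -STRIDE2;
    }

    /** Char index addressed by [str + result*2], where str points past the last char. */
    static long elementIndex(int length, long result) {
        return length + result;
    }

    public static void main(String[] args) {
        for (long r : loopValues(70)) {
            System.out.println(r + " -> element " + elementIndex(70, r));
        }
        System.out.println(tailValue() + " -> element " + elementIndex(70, tailValue()));
    }
}
```

For length 70 the model yields -54, -38 and -22 for the loop (elements 16, 32 and 48) and -16 for the tail (element 54), which agrees with the two values named above.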
The other strange thing is that we had about 6 different processes
(running for up to 100 days and more) all crashing within the same
minute in this very same code on the same host. The hs_err file Xin
posted was the simplest one. The others all claim to crash at the same
instruction within the 'compareTo' intrinsic, but looking at the
registers, they don't even point into character arrays any more (see
for example http://cr.openjdk.java.net/~simonis/tmp/hs_err.log).
I found one difference between 8u and jdk which was introduced in jdk9
with "8154896: xml.transform fails intermittently on SKX"
(https://bugs.openjdk.java.net/browse/JDK-8154896). It changed some
short branches to normal ones (e.g. 'jccb' to 'jcc'), but I must
confess that I don't really understand the explanation for the
change:
"There is a guarantee of isBit(imm8) for jccb which can sometimes fail
when upper bank marshaling is required for instructions without EVEX
or conditionally EVEX support, as the side effect code can push us
over the imm8 limit." I don't see a guarantee in jccb, just an
assertion which checks for the correct length.
Do you know why that change was necessary and do you think it can be a
reason why we see these errors?
Thank you and best regards,
Volker
> The only plausible explanation I have is there's patching of String
> backing char array happening and it breaks the intrinsic which doesn't
> expect any concurrent modifications (and there shouldn't be any).
>
> Best regards,
> Vladimir Ivanov
>
> RSI=0x00000000fe71487c is pointing into object: 0x00000000fe7147f8
> [C
> - klass: {type array char}
> - length: 70
> RDI=0x00000000fe711e44 is pointing into object: 0x00000000fe711dc0
> [C
> - klass: {type array char}
> - length: 70
>
> (lldb) p 0x00000000fe71487c - 0x00000000fe7147f8
> (unsigned int) $0 = 132
> (lldb) p 0x00000000fe711e44 - 0x00000000fe711dc0
> (unsigned int) $2 = 132
>
> > On 3/9/20, 1:00 AM, "hotspot-compiler-dev on behalf of Liu, Xin" <hotspot-compiler-dev-bounces at openjdk.java.net on behalf of xxinliu at amazon.com> wrote:
> >
> > Hi,
> >
> > I got some crash reports of the C2-generated method String.compareTo(Object) on x86_64. This method is an intrinsic and is defined in MacroAssembler::string_compare (macroAssembler_x86.cpp).
> > Yes, one interesting fact is the problem only happens on the bridge method compareTo(Object), deriving from the interface Comparable<String>.
> >
> > So far, I only see crashes in jdk8u because newer JDKs use the AVX3 version by default, but I read the tip of jdk and the AVX2 version is still the same. My concern is that the bug is still there. Have you seen this problem before?
> >
> > I found they all crash at an AVX instruction "0x00007ffb0d830235 vmovdqu ymm0, ymmword ptr [rdi + rax*2]", where RAX=0xffffffff00000036, RDI=0x00000000fe711e44.
> > The JVM got SIGSEGV because of an access violation. The faulting address is 0xfffffffefe711eb0, which is exactly (rax*2 + rdi). It looks like result (rax) has overflowed: as a signed 64-bit value it is -4294967242.
> >
> > AVX2 version comes from JDK-8005419. By changing the method signature a little bit in Test8005419.java, we can get String.compareTo(Object) AVX2 version as string_compare.S.
> > diff --git a/src/hotspot/test/compiler/8005419/Test8005419.java b/src/hotspot/test/compiler/8005419/Test8005419.java
> > index 201153e8a..1f8c57097 100644
> > --- a/src/hotspot/test/compiler/8005419/Test8005419.java
> > +++ b/src/hotspot/test/compiler/8005419/Test8005419.java
> > @@ -114,7 +114,7 @@ public class Test8005419 {
> >          System.out.println("PASSED");
> >      }
> >
> > -    private static int test(String str1, String str2) {
> > +    private static int test(Comparable<String>str1, String str2) {
> >          return str1.compareTo(str2);
> >      }
> >  }
> >
> > Because it's an intrinsic, there's no code shape issue, right? I can't figure out how RAX becomes 0xffffffff00000036. I attached the original error message. According to RSI and RDI, the method was comparing two 70-length strings.
> > Test.java permutes all cases of two 70-length strings. Why can't I hit this problem? Did I miss anything?
> >
> > Thanks in advance.
> > --lx
> >
> >
> >
> >
> >
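The register arithmetic quoted in the thread can be re-checked with plain 64-bit integer math. The sketch below uses only the register and object addresses from the hs_err excerpts above; the class and method names are hypothetical. Java's long arithmetic wraps modulo 2^64, which is used here to mimic the address computation of [rdi + rax*2].

```java
public class FaultAddressCheck {
    /** Effective address of [base + index*2], wrapping at 64 bits. */
    static long effectiveAddress(long base, long index) {
        return base + index * 2; // long overflow wraps silently in Java
    }

    /** Distance in bytes from an object's start to a pointer into it. */
    static long offsetInObject(long pointer, long objectStart) {
        return pointer - objectStart;
    }

    public static void main(String[] args) {
        long rax = 0xffffffff00000036L;
        long rdi = 0x00000000fe711e44L;
        long rsi = 0x00000000fe71487cL;

        System.out.println(rax);                                           // -4294967242
        System.out.println(Long.toHexString(effectiveAddress(rdi, rax)));  // fffffffefe711eb0
        System.out.println(offsetInObject(rsi, 0x00000000fe7147f8L));      // 132
        System.out.println(offsetInObject(rdi, 0x00000000fe711dc0L));      // 132
    }
}
```

Both pointers sit 132 bytes into their arrays. Taken together with the "+116 from 0th element" figure, that would put element 0 at 132 - 116 = 16 bytes from the start of each object.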