[aarch64-port-dev] Help debugging problem with large code cache
Edward Nevill
edward.nevill at gmail.com
Wed Dec 2 14:24:26 UTC 2015
Hi
I have been trying to debug a problem with large code caches on JDK 9 over the past week and could do with some help/advice on how to proceed.
I have filed a JIRA issue
https://bugs.openjdk.java.net/browse/JDK-8144498
Here is my analysis of the problem so far. Apologies if this is a bit stream of consciousness.
Running jtreg/langtools with -XX:ReservedCodeCacheSize=512m generates a number of failures due to SEGVs, whereas running without this option passes all tests.
The set of tests which fails is different each time. For example, on two back-to-back runs I get
FAILED: tools/javac/classfiles/attributes/annotations/RuntimeAnnotationsForInnerAnnotationTest.java
FAILED: tools/javac/T6410706.java
FAILED: tools/jdeps/DotFileTest.java
ed@arm64:~/jtreg/jtreg$ fgrep FAILED log_512m_2
FAILED: com/sun/javadoc/testSimpleTag/TestSimpleTag.java
FAILED: com/sun/javadoc/testWindowTitle/TestWindowTitle.java
FAILED: jdk/jshell/CompletionSuggestionTest.java
The command used to invoke jtreg was
/home/ed/images/jdk9-orig/bin/java -jar lib/jtreg.jar -vmoption:-XX:ReservedCodeCacheSize=512m -nr -conc:48 -timeout:99 -othervm -jdk:/home/ed/images/jdk9-orig -v1 -a -ignore:quiet /home/ed/new_jdk9/hs-comp/langtools/test
The problem can also be replicated with EEMBC GrinderBench, although it may require many hundreds of runs to trigger. The command I used to invoke GrinderBench is
/home/ed/images/jdk9-orig/bin/java -XX:ReservedCodeCacheSize=512m -classpath dist/fullset/bench1.jar org.eembc.grinderbench.CmdlineWrapper -r 1 -m 1 -t 4
For the purposes of the following I have chosen to investigate the GrinderBench failure because it is easier to debug than random failures in jtreg.
The SEGV occurs in a method which is called from SharedRuntime::resolve_opt_virtual_call_C. The call backtrace is about 20 frames long. The following are the oldest few frames.
....
#17 0x000003ff99717a44 in SharedRuntime::resolve_helper (thread=thread@entry=0x3ff94010000,
is_virtual=is_virtual@entry=true, is_optimized=is_optimized@entry=true,
__the_thread__=__the_thread__@entry=0x3ff94010000)
at /home/ed/new_jdk9/hs-comp/hotspot/src/share/vm/runtime/sharedRuntime.cpp:1186
#18 0x000003ff99718988 in SharedRuntime::resolve_opt_virtual_call_C (thread=0x3ff94010000)
at /home/ed/new_jdk9/hs-comp/hotspot/src/share/vm/runtime/sharedRuntime.cpp:1441
#19 0x000003ff70ab23a8 in ?? ()
#20 0x000003fdd59228f0 in ?? ()
Looking at frame #19
(gdb) x/10i $pc-20
0x3ff70ab2394: mov x0, x28
0x3ff70ab2398: mov x8, #0x8950 // #35152
0x3ff70ab239c: movk x8, #0x9971, lsl #16
0x3ff70ab23a0: movk x8, #0x3ff, lsl #32
0x3ff70ab23a4: blr x8
=> 0x3ff70ab23a8: isb
0x3ff70ab23ac: str xzr, [x28,#440]
0x3ff70ab23b0: str xzr, [x28,#448]
0x3ff70ab23b4: ldr x8, [x28,#8]
0x3ff70ab23b8: cbnz x8, 0x3ff70ab2454
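The call target here can be read straight off the mov/movk/movk immediates above; a minimal decode in plain C++ (illustration only, nothing to do with the HotSpot sources):

#include <cstdint>
#include <cstdio>

int main() {
  // Reassemble the absolute address built up by
  //   mov  x8, #0x8950            // bits 0..15
  //   movk x8, #0x9971, lsl #16   // bits 16..31
  //   movk x8, #0x3ff,  lsl #32   // bits 32..47
  uint64_t addr = 0x8950ULL | (0x9971ULL << 16) | (0x3ffULL << 32);
  printf("%#llx\n", (unsigned long long)addr);   // prints 0x3ff99718950
  return 0;
}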
This is a stub for resolve_opt_virtual_call. So here it calls 0x3ff99718950, and disassembling that address gives
(gdb) x/i 0x3ff99718950
0x3ff99718950 <SharedRuntime::resolve_opt_virtual_call_C(JavaThread*)>:
stp x29, x30, [sp,#-80]!
So it is calling SharedRuntime::resolve_opt_virtual_call_C, which is correct according to the stack trace above. However, looking at the previous frame
(gdb) x/2g $fp
0x3ff98dede60: 0x0000000000000138 0x000003ff7122469c
(gdb) x/12i 0x000003ff7122469c-40
0x3ff71224674: ret
0x3ff71224678: mov x8, #0x28f0 // #10480
0x3ff7122467c: movk x8, #0xd592, lsl #16
0x3ff71224680: movk x8, #0x3fd, lsl #32
0x3ff71224684: str x8, [sp,#8]
0x3ff71224688: mov x8, #0xffffffffffffffff // #-1
0x3ff7122468c: str x8, [sp]
0x3ff71224690: adrp x8, 0x3ff70ab2000 <<< HERE
0x3ff71224694: add x8, x8, #0x300 <<<
0x3ff71224698: blr x8 <<<
0x3ff7122469c: b 0x3ff712242f8 <<<
0x3ff712246a0: adrp x8, 0x3ff70adf000
The code marked HERE is an out-of-line stub which is calling the resolve_opt_virtual_call stub. So far so good.
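For reference, adrp/add works in 4K pages: the adrp materializes the target's page (gdb shows the resolved page, 0x3ff70ab2000) and the add supplies the low 12 bits, giving 0x3ff70ab2300, i.e. the resolve_opt_virtual_call stub from frame #19. A minimal sketch of the arithmetic (illustration only, not the HotSpot pd_patch_* code; the struct and helper names are made up):

#include <cstdint>
#include <cassert>

struct AdrpAdd {
  int64_t  page_delta;  // signed page count encoded in the adrp immediate (21 bits)
  uint32_t low12;       // low 12 bits of the target, encoded in the add immediate
};

// Encode an adrp/add pair sitting at insn_pc so that it reaches 'target'.
static AdrpAdd encode_adrp_add(uint64_t insn_pc, uint64_t target) {
  int64_t delta = (int64_t)(target >> 12) - (int64_t)(insn_pc >> 12);
  assert(delta >= -(1 << 20) && delta < (1 << 20));   // adrp reaches +/- 4GB
  return AdrpAdd{ delta, (uint32_t)(target & 0xfff) };
}

// Resolve the pair back to an absolute address, as the CPU does at run time.
static uint64_t decode_adrp_add(uint64_t insn_pc, AdrpAdd a) {
  uint64_t page = (uint64_t)((int64_t)(insn_pc >> 12) + a.page_delta) << 12;
  return page + a.low12;
}

// E.g. decode_adrp_add(0x3ff71224690, encode_adrp_add(0x3ff71224690,
// 0x3ff70ab2300)) gives back 0x3ff70ab2300 for the pair above.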
*** But this is not the correct code to call resolve_opt_virtual_call ***
This is in fact the code generated by the following from c1_CodeStubs_aarch64.cpp
void CounterOverflowStub::emit_code(LIR_Assembler* ce) {
  __ bind(_entry);
  Metadata *m = _method->as_constant_ptr()->as_metadata();
  __ mov_metadata(rscratch1, m);
  ce->store_parameter(rscratch1, 1);
  ce->store_parameter(_bci, 0);
  __ far_call(RuntimeAddress(Runtime1::entry_for(Runtime1::counter_overflow_id)));
  ce->add_call_info_here(_info);
  ce->verify_oop_map(_info);
  __ b(_continuation);
}
So this code is supposed to be calling Runtime1::counter_overflow. The -1 for the BCI is InvocationEntryBci, because this is an invocation-entry counter overflow, and it is this -1 which eventually causes the SEGV: it gets used as a genuine index into the bytecode to get a constant pool index for the invoke.
But it shouldn't be calling SharedRuntime::resolve_opt_virtual_call_C, it should be calling Runtime1::counter_overflow.
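To make the role of that -1 concrete, here is a rough illustration of the failure mode (not HotSpot code; bcp_for is a made-up helper):

// The stub stores InvocationEntryBci (-1) as the bci parameter.  If that
// value is ever treated as a real bytecode index, the computed bytecode
// pointer lands one byte before the method's bytecodes, and whatever sits
// there gets interpreted as the invoke's constant pool index.
const int InvocationEntryBci = -1;    // same value HotSpot uses for "not a real bci"
static unsigned char* bcp_for(unsigned char* code_base, int bci) {
  return code_base + bci;             // bci == -1 -> reads before the bytecodes
}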
Tracing back to where this out-of-line stub is called from
(gdb) x/10i 0x3ff712242f8-36
0x3ff712242d4: mov x0, #0xc250 // #49744
0x3ff712242d8: movk x0, #0xd592, lsl #16
0x3ff712242dc: movk x0, #0x3fd, lsl #32
0x3ff712242e0: ldr w6, [x0,#220]
0x3ff712242e4: add w6, w6, #0x8
0x3ff712242e8: str w6, [x0,#220]
0x3ff712242ec: and w6, w6, #0x1ff8
0x3ff712242f0: cmp w6, #0x0
0x3ff712242f4: b.eq 0x3ff71224678 <<<< HERE is the b to the out of line stub
0x3ff712242f8: str w5, [sp,#52]
(gdb)
So the above confirms that it is really doing a counter overflow but calling resolve_opt_virtual_call.
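For clarity, here is a rough C++ rendering of the inlined counter check above (illustration only; the +220 offset, the +8 increment and the 0x1ff8 mask are read off the disassembly, not taken from the C1 sources):

#include <cstdint>

// 'counter_cell' stands for the 32-bit word at 0x3fdd592c250 + 220.
// Returns true when the masked counter hits zero, i.e. when the b.eq to
// the out-of-line stub at 0x3ff71224678 is taken.
static bool bump_and_check_overflow(uint32_t& counter_cell) {
  counter_cell += 8;                     // ldr / add #0x8 / str
  return (counter_cell & 0x1ff8) == 0;   // and #0x1ff8 / cmp #0x0
}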
So I tried changing the 'far_call' method in macroAssembler_aarch64.cpp to use movz/movk/movk instead of adrp/add, i.e.
// We can use ADRP here because we know that the total size of
// the code cache cannot exceed 2Gb.
adrp(tmp, entry, offset);
add(tmp, tmp, offset);
becomes
// We can use ADRP here because we know that the total size of
// the code cache cannot exceed 2Gb.
movptr(tmp, (uintptr_t)entry.target());
//adrp(tmp, entry, offset);
//add(tmp, tmp, offset);
This caused GrinderBench to start working (at least, no failures after about 5000 runs).
So I changed this to read
// We can use ADRP here because we know that the total size of
// the code cache cannot exceed 2Gb.
movptr(tmp, (uintptr_t)entry.target());
adrp(tmp, entry, offset);
add(tmp, tmp, offset);
I.e. it generates both the movz/movk/movk version and the adrp/add version, but uses the adrp/add version, discarding the result of the movz/movk/movk version.
Now when I list the out-of-line stub in gdb I get
(gdb) x/10i 0x000003ff5521d5dc-32
0x3ff5521d5bc: mov x8, #0xffffffffffffffff // #-1
0x3ff5521d5c0: str x8, [sp]
0x3ff5521d5c4: mov x8, #0x9780 <<< movz/movk/movk -> 0x3ff54c89780
0x3ff5521d5c8: movk x8, #0x54c8, lsl #16
0x3ff5521d5cc: movk x8, #0x3ff, lsl #32
0x3ff5521d5d0: adrp x8, 0x3ff54ab2000 <<< adrp/add -> 0x3ff54ab2300
0x3ff5521d5d4: add x8, x8, #0x300
0x3ff5521d5d8: blr x8
0x3ff5521d5dc: b 0x3ff5521d308
0x3ff5521d5e0: mov x8, #0xfc80 // #64640
So the adrp/add and movz/movk/movk address different runtime stubs. Disassembling both shows that the adrp is addressing the resolve_opt_virtual_call stub and the movz/movk/movk is addressing the Runtime1::counter_overflow stub.
So it looks like the adrp is either not being relocated, or is being relocated incorrectly.
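Looking at it from the relocation side: when the generated code is copied to its final address, every adrp/add pair has to be re-encoded against the pc it will actually execute at. A hypothetical sketch of that invariant, reusing the encode_adrp_add/decode_adrp_add helpers from the sketch above (again, not the real pd_patch_* code):

// Re-encoding against the new pc keeps the blr pointing at 'target'.
static void relocate_adrp_add(uint64_t new_adrp_pc, uint64_t target, AdrpAdd& pair) {
  pair = encode_adrp_add(new_adrp_pc, target);
}

// If the pair is not re-encoded (or is re-encoded against the wrong pc),
// the call resolves to decode_adrp_add(new_adrp_pc, stale_pair), which is
// off by exactly (new page - old page) pages: in this case it hits the
// resolve_opt_virtual_call stub instead of Runtime1::counter_overflow.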
Any suggestions as to why it might be doing this??? I have had a long look at the pd_patch_* code and it seems correct to me.
Unfortunately it is difficult to debug because I cannot walk through it in gdb due to the infrequency of the failure (once every few hundred runs), so I can only debug as above by looking at the generated core files.
Thanks for your help,
Ed.