[aarch64-port-dev ] RFR: 8144498: aarch64: large code cache generates SEGV
Edward Nevill
edward.nevill at gmail.com
Mon Dec 7 12:22:14 UTC 2015
On Fri, 2015-12-04 at 17:38 +0000, Andrew Haley wrote:
> On 12/04/2015 04:14 PM, Andrew Haley wrote:
> I'm going to suggest this as a simpler fix:
>
> address Relocation::pd_call_destination(address orig_addr) {
>   assert(is_call(), "should be a call here");
>   if (NativeCall::is_call_at(addr())) { // is a BL instruction
>     address trampoline = nativeCall_at(addr())->get_trampoline();
>     if (trampoline) {
>       return nativeCallTrampolineStub_at(trampoline)->destination();
>     }
>   }
>   if (orig_addr != NULL) {
>     return MacroAssembler::pd_call_destination(orig_addr);
>   }
>   return MacroAssembler::pd_call_destination(addr());
> }
>
> I think it's right because this way we only follow real BL
> instructions, and if these point to trampolines they must be within
> the blob which is being relocated. I think this will fix your problem
> because such BL instructions cannot point to anywhere wild.
I am not sure this works.
Firstly, in the case that far_branches are not enabled (i.e. the code cache is <= 128 MB), there could be BL instructions to addresses outside the current code blob. These are generated by far_call as follows.
  if (far_branches()) {
    unsigned long offset;
    // We can use ADRP here because we know that the total size of
    // the code cache cannot exceed 2Gb.
    adrp(tmp, entry, offset);
    add(tmp, tmp, offset);
    if (cbuf) cbuf->set_insts_mark();
    blr(tmp);
  } else {
    if (cbuf) cbuf->set_insts_mark();
    bl(entry);
  }
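The 128 MB figure comes from the BL encoding: its immediate is a signed 26-bit word offset, so a BL reaches from -128 MB to +128 MB - 4 bytes relative to the branch itself. A minimal sketch of that range check follows. The names bl_reaches and kBranchRange are hypothetical and this is not HotSpot's implementation; it also ignores the 4-byte alignment requirement on the target.

```cpp
#include <cstdint>

// BL holds a signed 26-bit offset counted in 4-byte words, i.e. a byte
// offset in the half-open interval [-128 MB, +128 MB).
// (kBranchRange is a hypothetical name used only for this sketch.)
constexpr int64_t kBranchRange = 128LL * 1024 * 1024;

// Hypothetical helper: true if a BL placed at `branch` can encode `target`
// directly, without a trampoline or an ADRP/ADD/BLR sequence.
inline bool bl_reaches(uint64_t branch, uint64_t target) {
    // Unsigned subtraction wraps, then the result is read as a signed delta.
    const int64_t offset = static_cast<int64_t>(target - branch);
    return offset >= -kBranchRange && offset < kBranchRange;
}
```

For example, bl_reaches(pc, pc + 0x07fffffc) holds, while bl_reaches(pc, pc + 0x08000000) does not.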
I cannot see what prevents one of these BLs from being followed. Since they may have been copied but not yet relocated, they may end up pointing somewhere random in the code buffer which just happens to look like a trampoline. Admittedly, the probability of failure is vastly reduced because there are no genuine trampolines for it to latch on to.
This case can be avoided by adding a far_branches() predicate to pd_call_destination as follows.
if (far_branches() && NativeCall::is_call_at(addr())) { // is a BL instruction
Secondly, I am not sure that your assertion
> (When a trampoline call is first created it is a call to self; the
> reloc is the only way to find the trampoline. For this reason, you
> must use nativeCall_at(addr())->get_trampoline().)
is correct. In MacroAssembler::trampoline_call I see
  if (Assembler::reachable_from_branch_at(pc(), entry.target())) {
    bl(entry.target());
  } else {
    bl(pc());
  }
so it only creates a call to self if the branch does not reach and, as before, you could have a dangling BL when this is copied.
I believe it would be possible to replace the above code section with simply
bl(pc());
since it will always be relocated and therefore you can always generate the call to self.
All of this seems very fragile and I am wondering about the value of trampolines. The alternative to using trampolines would be to always generate
adrp Xn, target & ~0xfff
add Xn, Xn, target & 0xfff
blr Xn
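To make the address arithmetic behind that sequence concrete: ADRP adds a signed count of 4 KB pages to the page of the current PC, and the ADD supplies the low 12 bits of the target. The sketch below models this in plain C++. AdrpAdd, split_target and materialize are hypothetical names for illustration only, not assembler or HotSpot APIs, and the sketch does not check that the page delta fits in ADRP's 21-bit immediate.

```cpp
#include <cstdint>

// The two immediates carried by an ADRP/ADD pair (hypothetical type).
struct AdrpAdd {
    int64_t  page_delta;  // signed number of 4 KB pages, encoded in ADRP
    uint32_t low12;       // low 12 bits of the target, encoded in ADD
};

// Split `target` into the immediates needed by an ADRP located at `pc`.
inline AdrpAdd split_target(uint64_t pc, uint64_t target) {
    AdrpAdd parts;
    parts.page_delta = static_cast<int64_t>(target >> 12) -
                       static_cast<int64_t>(pc >> 12);
    parts.low12 = static_cast<uint32_t>(target & 0xfff);
    return parts;
}

// Recompute the value that would end up in Xn after ADRP followed by ADD.
inline uint64_t materialize(uint64_t pc, AdrpAdd parts) {
    // ADRP: page of the PC plus the page delta scaled by 4 KB.
    const uint64_t page =
        (pc & ~0xfffULL) + (static_cast<uint64_t>(parts.page_delta) << 12);
    // ADD: fill in the offset within the page.
    return page + parts.low12;
}
```

Under these assumptions, materialize(pc, split_target(pc, target)) gives back target for any pc, whether the target lies before or after the call site.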
On most modern, out-of-order, dual-issue implementations the ADRP and ADD will be folded into a single micro-op, which will then be dual issued with the BLR, so it doesn't end up costing us anything.
I did some experiments on 2 different implementations comparing the following 3 code fragments (where 'tramp_dest' is the final destination to be called).
1) Straight BL
tramp_test:
        mov     x2, x30
tramp1:
        bl      tramp_dest
        subs    x0, x0, #1
        bne     tramp1
        ret     x2
2) Straight ADRP/ADD
tramp_test:
        mov     x2, x30
tramp1:
        adr     x3, tramp_dest
        add     x3, x3, #0x0
        blr     x3
        subs    x0, x0, #1
        bne     tramp1
        ret     x2
3) Trampoline
tramp_test:
        mov     x2, x30
tramp1:
        bl      tramp
        subs    x0, x0, #1
        bne     tramp1
        ret     x2
tramp:
        ldr     x1, tramp_adcon
        br      x1
tramp_adcon:
        .dword  tramp_dest
I ran the above tests on 2 different implementations for 1E9 iterations. The results were:
Imp 1: Straight BL = 4.50157 sec, ADRP/ADD = 4.50157 sec, trampoline = 6.00209 sec
Imp 2: Straight BL = 3.00107 sec, ADRP/ADD = 3.00106 sec, trampoline = 4.16815 sec
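Put in relative terms, the trampoline loop takes about a third longer than the straight BL loop on Imp 1 and about 39% longer on Imp 2, while ADRP/ADD is indistinguishable from straight BL on both. The small sketch below shows the arithmetic; the function names are hypothetical and the inputs are just the timings quoted above.

```cpp
// Extra time taken by the trampoline loop relative to the straight BL loop,
// expressed as a percentage of the straight BL time.
inline double trampoline_overhead_percent(double bl_seconds,
                                          double trampoline_seconds) {
    return (trampoline_seconds - bl_seconds) / bl_seconds * 100.0;
}

// Average time per loop iteration in nanoseconds.
inline double ns_per_iteration(double seconds, double iterations) {
    return seconds / iterations * 1e9;
}
```

For instance, trampoline_overhead_percent(4.50157, 6.00209) covers Imp 1 and trampoline_overhead_percent(3.00107, 4.16815) covers Imp 2; ns_per_iteration(4.50157, 1e9) gives the per-iteration cost of the straight BL loop on Imp 1.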
Maybe we could just get rid of trampolines?
All the best,
Ed.