ARM: Reduce size of safepoint code, etc.
Andrew Haley
aph at redhat.com
Wed May 30 05:34:56 PDT 2012
On 05/30/2012 12:31 PM, Andrew Dinn wrote:
> On 29/05/12 16:31, Andrew Haley wrote:
>> A few miscellaneous improvements. These lead to reduced code size and
>> less memory traffic in the normal (i.e. non-safepoint) case.
>>
>> The result is like this:
>>
>> 0 : 10 24 bipush
>> 0x4085ec58: movs r3, #36 ; 0x24
>> 2 : ac ireturn
>> 0x4085ec5a: movw r0, #0
>> 0x4085ec5e: movt r0, #16670 ; 0x411e
>> 0x4085ec62: ldr r0, [r0, #8]
>> 0x4085ec64: b.n 0x4085ec78
>> 0x4085ec66: ; <UNDEFINED> instruction: 0xdead
>> 0x4085ec68: str.w r3, [r4, #-4]!
>> 0x4085ec6c: movs r1, #50 ; 0x32
>> 0x4085ec6e: adds r2, r4, #4
>> 0x4085ec70: bl 0x4072f782
>> 0x4085ec74: ldr.w r3, [r4], #4
>> 0x4085ec78:
>>
>> Note that r3 is only pushed to memory if we take a safepoint.
>
> Nice.
>
> The only thing I found mildly (very mildly) out of kilter was your
> definition of the SAVE_STACK and RESTORE_STACK macros. SAVE_STACK caches
> the current register set locally but expects an immediately succeeding
> call to Thumb2_Flush to generate the stack push instruction(s)
> (basically Thumb2_Flush just calls Thumb2_Push_Multiple). RESTORE_STACK
> explicitly calls Thumb2_Pop_Multiple to generate the stack pop(s) then
> repopulates the register set from the local cache. This slightly
> obfuscated what the macros were doing when I saw the definitions until I
> reconciled them with the point of use. Bundling the call to
> Thumb2_Push_Multiple into the definition of SAVE_STACK might be clearer.

I sorta supposed that it's possible someone might call something
other than Thumb2_Flush, but I guess that's unlikely.

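
The two macro shapes discussed above can be modelled as a minimal sketch. The patch itself is not shown in this thread, so everything here is an assumption for illustration: `RegSet`, `Emitter`, and the lower-case functions are hypothetical stand-ins that merely record what Thumb2_Flush, Thumb2_Push_Multiple and Thumb2_Pop_Multiple are described as doing, and the real signatures may differ.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the registers that currently cache Java stack values.
struct RegSet {
    std::vector<int> regs;
};

// Hypothetical emitter: records pseudo-instructions instead of encoding them.
struct Emitter {
    std::vector<std::string> out;
};

// Stand-in for Thumb2_Push_Multiple (signature assumed).
void push_multiple(Emitter &e, const RegSet &rs) {
    for (int r : rs.regs) e.out.push_back("push r" + std::to_string(r));
}

// Stand-in for Thumb2_Pop_Multiple (signature assumed); pops in reverse order.
void pop_multiple(Emitter &e, const RegSet &rs) {
    for (auto it = rs.regs.rbegin(); it != rs.regs.rend(); ++it)
        e.out.push_back("pop r" + std::to_string(*it));
}

// Stand-in for Thumb2_Flush: push every cached register, leaving none cached.
void flush(Emitter &e, RegSet &live) {
    push_multiple(e, live);
    live.regs.clear();
}

// Shape described in the review: SAVE_STACK only caches the register set and
// relies on the caller to flush immediately afterwards.
RegSet save_stack(const RegSet &live) { return live; }

// Suggested alternative: bundle the push into the save step.
RegSet save_stack_bundled(Emitter &e, RegSet &live) {
    RegSet saved = live;
    flush(e, live);
    return saved;
}

// RESTORE_STACK as described: emit the pops, then repopulate the register set.
void restore_stack(Emitter &e, RegSet &live, const RegSet &saved) {
    pop_multiple(e, saved);
    live = saved;
}
```

With the bundled form, save and restore are symmetric at the point of use; the unbundled form keeps the option of following `save_stack` with something other than a flush.
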
>> Also, the code to tear down the stack frame and place the return
>> value is only generated once, and all the returns jump to it.
>
> Also very nice. It might possibly pay to try to do this same
> optimisation for lreturn and dreturn -- with 132 combinations of
> registers there is maybe not a high chance of reusing a return segment
> but it would not cost much in execution time at compile time to save and
> check a compiled_2word_return array to see if an lreturn/dreturn with
> the same register pair has already been generated. The saved code space
> might help with cache pressure.

Maybe. I think we're in diminishing returns in this area, and the
time will be better spent elsewhere.

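
For reference, the lookup suggested for lreturn/dreturn could look roughly like the sketch below. It is not code from the patch: `TwoWordReturnCache` and `return_segment_for` are hypothetical names, and the choice of 12 candidate registers is an assumption made only because 12 x 11 ordered pairs of distinct registers gives the 132 combinations quoted above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 12 candidate registers, so 12 * 11 = 132 ordered pairs.
constexpr int kNumRegs = 12;
constexpr int kOrderedPairs = kNumRegs * (kNumRegs - 1);
static_assert(kOrderedPairs == 132, "matches the count quoted in the thread");

// Hypothetical per-method cache of already generated two-word return
// segments, indexed by the (low, high) register pair holding the value.
class TwoWordReturnCache {
public:
    // Returns the recorded code address, or 0 if none was generated yet.
    std::uint32_t lookup(int lo_reg, int hi_reg) const {
        return table_[index(lo_reg, hi_reg)];
    }

    void record(int lo_reg, int hi_reg, std::uint32_t addr) {
        assert(addr != 0);  // 0 is reserved to mean "not generated"
        table_[index(lo_reg, hi_reg)] = addr;
    }

private:
    static std::size_t index(int lo_reg, int hi_reg) {
        assert(lo_reg >= 0 && lo_reg < kNumRegs);
        assert(hi_reg >= 0 && hi_reg < kNumRegs);
        assert(lo_reg != hi_reg);
        return static_cast<std::size_t>(lo_reg) * kNumRegs + hi_reg;
    }

    std::array<std::uint32_t, kNumRegs * kNumRegs> table_{};  // zero-filled
};

// Returns the address to branch to. On first use of a register pair the
// segment would be generated at `here`; this sketch only records the address.
std::uint32_t return_segment_for(TwoWordReturnCache &cache, int lo_reg,
                                 int hi_reg, std::uint32_t here) {
    std::uint32_t addr = cache.lookup(lo_reg, hi_reg);
    if (addr == 0) {
        cache.record(lo_reg, hi_reg, here);
        addr = here;
    }
    return addr;
}
```

The compile-time cost is one array read per lreturn/dreturn; whether enough pairs repeat within a method to pay for it is the open question raised above.
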
> I think there is one further special case optimization possible although
> it is not critical to implement it. When reusing a generated return
> segment for a simple return the code ends up looking like
>
>     load safepoint page address
>     read safepoint page
>     branch forward
>     <
>       safepoint code
>     >
>     branch back to generated return code
>
> where everything up to the last branch is planted by Thumb2_Safepoint
> and the backwards branch is planted by Thumb2_Return. If the backwards
> branch target were passed to Thumb2_Safepoint in this special case then
> the skip round could be avoided. N.B. this does not apply to ireturn etc.,
> as they need to Fill and POP before taking the backward branch.
>
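
The special case described above can be illustrated with a small model. This is a sketch under stated assumptions, not the JIT's code: `plant_safepoint`, `plant_return` and `fast_path_branches` are hypothetical helpers that emit labelled pseudo-instructions as strings, and `shared_return` stands for the previously generated return segment.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Code = std::vector<std::string>;

// Hypothetical model of the safepoint sequence. If fast_target is non-empty,
// the non-safepoint path branches straight to it instead of to a local label.
void plant_safepoint(Code &code, const std::string &fast_target) {
    code.push_back("load safepoint page address");
    code.push_back("read safepoint page");
    code.push_back("b " + (fast_target.empty() ? std::string("skip") : fast_target));
    code.push_back("<safepoint code>");
    if (fast_target.empty()) code.push_back("skip:");
}

// Hypothetical model of a simple return that reuses a shared return segment.
Code plant_return(bool pass_target) {
    const std::string shared = "shared_return";
    Code code;
    plant_safepoint(code, pass_target ? shared : std::string());
    code.push_back("b " + shared);  // reached directly after the safepoint code
    return code;
}

// Counts the branches executed when no safepoint is taken, following
// branches to labels inside the fragment and stopping when one leaves it.
int fast_path_branches(const Code &code) {
    int branches = 0;
    std::size_t pc = 0;
    while (pc < code.size()) {
        const std::string &insn = code[pc];
        if (insn.rfind("b ", 0) == 0) {
            ++branches;
            const std::string label = insn.substr(2) + ":";
            auto it = std::find(code.begin(), code.end(), label);
            if (it == code.end()) break;  // target lies outside this fragment
            pc = static_cast<std::size_t>(it - code.begin());
        }
        ++pc;
    }
    return branches;
}
```

In this model the fragment is no smaller, but the normal path executes one branch instead of two when the target is passed in.
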
> I have built with this patch and tested it with a variety of simple
> programs employing loops and returns and also with the SpecJVM
> compiler.compiler. I also eyeballed the code generated for the test
> programs and verified that it is doing the more conservative
> save/restore and reusing generated return code where possible. All works
> and looks good.

Ok, thanks.

Andrew.