ARM: Reduce size of safepoint code, etc.

Andrew Haley aph at redhat.com
Wed May 30 05:34:56 PDT 2012


On 05/30/2012 12:31 PM, Andrew Dinn wrote:
> On 29/05/12 16:31, Andrew Haley wrote:
>> A few miscellaneous improvements.  These lead to reduced code size and
>> less memory traffic in the normal (i.e. non-safepoint) case.
>>
>> The result is like this:
>>
>>     0 : 10 24          bipush
>> 0x4085ec58:	movs	r3, #36	; 0x24
>>     2 : ac             ireturn
>> 0x4085ec5a:	movw	r0, #0
>> 0x4085ec5e:	movt	r0, #16670	; 0x411e
>> 0x4085ec62:	ldr	r0, [r0, #8]
>> 0x4085ec64:	b.n	0x4085ec78
>> 0x4085ec66:			; <UNDEFINED> instruction: 0xdead
>> 0x4085ec68:	str.w	r3, [r4, #-4]!
>> 0x4085ec6c:	movs	r1, #50	; 0x32
>> 0x4085ec6e:	adds	r2, r4, #4
>> 0x4085ec70:	bl	0x4072f782
>> 0x4085ec74:	ldr.w	r3, [r4], #4
>> 0x4085ec78:
>>
>> Note that r3 is only pushed to memory if we take a safepoint.
> 
> Nice.
> 
> The only thing I found mildly (very mildly) out of kilter was your
> definition of the SAVE_STACK and RESTORE_STACK macros. SAVE_STACK caches
> the current register set locally but expects an immediately succeeding
> call to Thumb2_Flush to generate the stack push instruction(s)
> (basically Thumb2_Flush just calls Thumb2_Push_Multiple). RESTORE_STACK
> explicitly calls Thumb2_Pop_Multiple to generate the stack pop(s) then
> repopulates the register set from the local cache. This slightly
> obfuscated what the macros were doing when I saw the definitions until I
> reconciled them with the point of use. Bundling the call to
> Thumb2_Push_Multiple into the definition of SAVE_STACK might be clearer.

I sorta supposed that it's possible someone might call something
other than Thumb2_Flush, but I guess that's unlikely.
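
For what it's worth, the bundled form would look roughly like the
sketch below. This is only an illustration: JitStack, CodeBuf,
SavedStack and the lower-case helper names are made up for the example
and are not the real definitions in the Thumb2 JIT; "instructions" are
just recorded as text.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for the JIT state (assumptions, not the real types).
struct JitStack { std::vector<int> regs; };          // registers caching stack slots
struct CodeBuf  { std::vector<std::string> insns; }; // emitted code, as text
struct SavedStack { std::vector<int> regs; };        // local copy of the register set

// Stand-in for Thumb2_Push_Multiple: one push per cached register.
void push_multiple(CodeBuf& cb, const std::vector<int>& regs) {
    for (int r : regs) cb.insns.push_back("push r" + std::to_string(r));
}

// Stand-in for Thumb2_Pop_Multiple: pops in the reverse order of the pushes.
void pop_multiple(CodeBuf& cb, const std::vector<int>& regs) {
    for (auto it = regs.rbegin(); it != regs.rend(); ++it)
        cb.insns.push_back("pop r" + std::to_string(*it));
}

// SAVE_STACK with the push bundled in: cache the register set, emit the
// pushes, and leave the JIT's view of the stack with nothing in registers.
SavedStack save_stack(JitStack& js, CodeBuf& cb) {
    SavedStack saved{js.regs};
    push_multiple(cb, js.regs);
    js.regs.clear();
    return saved;
}

// RESTORE_STACK: emit the pops, then repopulate the register set.
void restore_stack(JitStack& js, CodeBuf& cb, const SavedStack& saved) {
    assert(js.regs.empty());  // nothing should have been cached in between
    pop_multiple(cb, saved.regs);
    js.regs = saved.regs;
}
```

With the push inside save_stack the two halves are symmetric, at the
cost of not being able to substitute something other than the flush.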

>> Also, the code to tear down the stack frame and place the return
>> value is only generated once, and all the returns jump to it.
> 
> Also very nice. It might possibly pay to try to do this same
> optimisation for lreturn and dreturn -- with 132 combinations of
> registers there is maybe not a high chance of reusing a return segment
> but it would cost little at compile time to save and
> check a compiled_2word_return array to see if an lreturn/dreturn with
> the same register pair has already been generated. The saved code space
> might help with cache pressure.

Maybe.  I think we're in diminishing returns in this area, and the
time will be better spent elsewhere.
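
For the record, the check being suggested would be something like
the sketch below. It is only a sketch: ReturnCache and
lookup_or_record are invented names, the figure of 12 registers is
an assumption inferred from 132 = 12 * 11 ordered pairs, and it
assumes that code location 0 is never a valid return segment.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kRegs = 12;  // assumption: 12 allocatable registers

// Hypothetical compiled_2word_return table, indexed by (lo, hi) register pair.
struct ReturnCache {
    // 0 means "no return segment generated for this pair yet".
    std::array<std::array<std::uint32_t, kRegs>, kRegs> loc{};

    // Returns the location of an existing segment to branch to, or 0 if
    // the caller must generate one at `here` (which is then recorded).
    std::uint32_t lookup_or_record(int lo, int hi, std::uint32_t here) {
        assert(0 <= lo && lo < kRegs && 0 <= hi && hi < kRegs && lo != hi);
        std::uint32_t& slot = loc[lo][hi];
        if (slot == 0) {
            slot = here;
            return 0;
        }
        return slot;
    }
};
```

The lookup itself is a single array access, so the cost at compile
time is negligible; the open question is only how often a pair
repeats within one method.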

> I think there is one further special case optimization possible although
> it is not critical to implement it. When reusing a generated return
> segment for a simple return the code ends up looking like
> 
>   load safepoint page address
>   read safepoint page
>   branch forward
>   <
>    safepoint code
>   >
>   branch back to generated return code
> 
> where everything up to the last branch is planted by Thumb2_Safepoint
> and the backwards branch is planted by Thumb2_Return. If the backwards
> branch target were passed to Thumb2_Safepoint in this special case then
> the skip round could be avoided. N.b. this does not apply to ireturn
> etc., as they need to Fill and POP before taking the backward branch.
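
To make the layout concrete, here is a rough sketch of the special
case described above. The function names and the idea of passing the
target as a string are purely illustrative and are not the real
Thumb2_Safepoint or Thumb2_Return interfaces; instructions are
recorded as text.

```cpp
#include <cassert>
#include <string>
#include <vector>

using Code = std::vector<std::string>;  // emitted instructions, as text

// Hypothetical safepoint emitter. An empty return_target means the
// ordinary case, where the normal path skips round the safepoint code.
void emit_safepoint(Code& code, const std::string& return_target) {
    code.push_back("load safepoint page address");
    code.push_back("read safepoint page");
    const bool direct = !return_target.empty();
    // Normal (non-safepoint) path: branch straight to the shared return
    // code when its address is known, otherwise skip forward.
    code.push_back(direct ? "b " + return_target : std::string("b skip"));
    code.push_back("<safepoint code>");
    if (!direct) code.push_back("skip:");
}

// Hypothetical simple return that reuses an already generated segment.
void emit_simple_return(Code& code, const std::string& return_target) {
    assert(!return_target.empty());
    emit_safepoint(code, return_target);
    // Only reached after a safepoint has actually been taken.
    code.push_back("b " + return_target);
}
```

The code size is the same either way; the saving is one taken branch
on the normal path.
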
> 
> I have built with this patch and tested it with a variety of simple
> programs employing loops and returns and also with the SpecJVM
> compiler.compiler. I also eyeballed the code generated for the test
> programs and verified that it is doing the more conservative
> save/restore and reusing generated return code where possible. All works
> and looks good.

Ok, thanks.

Andrew.



