[aarch64-port-dev ] AArch64 register usage questions

Mon Mar 20 21:56:53 UTC 2017

Hi,

I've been looking at the AArch64 port's register usage, comparing it against Oracle's ARM port, and have a few questions and observations:

Comparing the enum in c1_Defs_aarch64.hpp vs. the enum in the ARM code (and doing some constant folding by hand), you get:

Enum constant                            ARM32/64  AArch64  Notes
---------------------------------------  --------  -------  -----
pd_nof_cpu_regs_frame_map                33        32       number of registers used during code emission
pd_nof_caller_save_cpu_regs_frame_map    27        17       number of registers killed by calls
pd_nof_cpu_regs_reg_alloc                27        17       number of registers that are visible to register allocator
pd_nof_cpu_regs_linearscan               33        32       number of registers visible to linear scan
pd_nof_cpu_regs_processed_in_linearscan  28        -0       number of registers processed in linear scan; includes LR (in ARM port)
pd_first_cpu_reg                         0         0
pd_last_cpu_reg                          32        16
pd_first_callee_saved_reg                -0        17
pd_last_callee_saved_reg                 -0        24
pd_last_allocatable_cpu_reg              -0        16
pd_first_byte_reg                        -0        0        unused!
pd_last_byte_reg                         -0        16       unused, except by unused last_byte_reg().
pd_nof_fpu_regs_frame_map                32        32       number of float registers used during code emission
pd_nof_caller_save_fpu_regs_frame_map    32        32       number of float registers killed by calls
pd_nof_fpu_regs_reg_alloc                32        8        number of float registers that are visible to register allocator
pd_nof_fpu_regs_linearscan               32        32       number of float registers visible to linear scan
pd_first_fpu_reg                         33        32       = pd_nof_cpu_regs_frame_map
pd_last_fpu_reg                          64        63
pd_first_callee_saved_fpu_reg            -0        40
pd_last_callee_saved_fpu_reg             -0        47
pd_nof_xmm_regs_linearscan               0         0
pd_nof_caller_save_xmm_regs              0         -0
pd_first_xmm_reg                         -1        -0
pd_last_xmm_reg                          -1        -0

I don't expect these values to match, but some items stand out:

-   AArch64 has fewer caller-saved registers, but more callee-saved registers, defined above.

    o   But the AArch64 code has comments like: // FIXME: There are no callee-saved.

    o   And C2 does not define any SOE registers.

    o   Is C1 using fewer registers than it could? Than it should?

    o   And/or is C2?

    o   There are other comments, such as the one above generate_call_stub(), that say:

        *   // we don't need to save r16-18 because Java does not use them

        *   The comment says r16, but that doesn't seem to match pd_first_callee_saved_reg.

        *   I don't see where r18 came from.

-   pd_first_byte_reg, pd_last_byte_reg and last_byte_reg() seem unused.
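The hand-folded values above can be cross-checked mechanically. The following is only a sketch: the constant names mirror the enum and the numbers are copied from the comparison above, but the namespaces and the range_size() helper are hypothetical and are not part of HotSpot.

```cpp
// Illustrative consistency check; not HotSpot source. The values are the
// hand-folded ones from the comparison above. range_size() and the
// namespace names are hypothetical helpers introduced for this sketch.

// Number of indices in the inclusive range [first, last].
constexpr int range_size(int first, int last) { return last - first + 1; }

namespace aarch64_c1 {
constexpr int pd_nof_cpu_regs_frame_map             = 32;
constexpr int pd_nof_caller_save_cpu_regs_frame_map = 17;
constexpr int pd_nof_cpu_regs_reg_alloc             = 17;
constexpr int pd_last_allocatable_cpu_reg           = 16;
constexpr int pd_first_callee_saved_reg             = 17;
constexpr int pd_last_callee_saved_reg              = 24;
constexpr int pd_nof_fpu_regs_frame_map             = 32;
constexpr int pd_first_fpu_reg                      = 32;
constexpr int pd_last_fpu_reg                       = 63;
constexpr int pd_first_callee_saved_fpu_reg         = 40;
constexpr int pd_last_callee_saved_fpu_reg          = 47;

// FPU indices start right after the CPU indices and cover every FPU register.
static_assert(pd_first_fpu_reg == pd_nof_cpu_regs_frame_map, "fpu base");
static_assert(range_size(pd_first_fpu_reg, pd_last_fpu_reg) ==
              pd_nof_fpu_regs_frame_map, "fpu span");

// The register allocator sees indices 0..pd_last_allocatable_cpu_reg.
static_assert(pd_nof_cpu_regs_reg_alloc == pd_last_allocatable_cpu_reg + 1,
              "allocatable count");

// The callee-saved range begins right after the caller-save block.
static_assert(pd_first_callee_saved_reg ==
              pd_nof_caller_save_cpu_regs_frame_map, "callee-saved base");

// Sizes of the callee-saved ranges implied by the enum.
constexpr int nof_callee_saved_cpu_regs =
    range_size(pd_first_callee_saved_reg, pd_last_callee_saved_reg);
constexpr int nof_callee_saved_fpu_regs =
    range_size(pd_first_callee_saved_fpu_reg, pd_last_callee_saved_fpu_reg);
}  // namespace aarch64_c1

namespace arm_c1 {
constexpr int pd_nof_cpu_regs_frame_map = 33;
constexpr int pd_nof_fpu_regs_frame_map = 32;
constexpr int pd_first_fpu_reg          = 33;
constexpr int pd_last_fpu_reg           = 64;

// Same layout rule as AArch64, shifted by the extra CPU index.
static_assert(pd_first_fpu_reg == pd_nof_cpu_regs_frame_map, "fpu base");
static_assert(range_size(pd_first_fpu_reg, pd_last_fpu_reg) ==
              pd_nof_fpu_regs_frame_map, "fpu span");
}  // namespace arm_c1
```

With these values the AArch64 enum describes 8 callee-saved CPU indices and 8 callee-saved FPU indices (offsets 8 to 15 from pd_first_fpu_reg), which is why the "There are no callee-saved" comment looks inconsistent with the definitions.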

Looking at the register definitions in aarch64.ad and other places:

-   AArch64 always allocates r27 to use for compressed oops (rheapbase). ARM32/64 only allocates the register if CompressedOops is enabled.

    o   In one sense the AArch64 approach seems reasonable. I think the default setting for CompressedOops will be true until heap sizes get huge (e.g. somewhere past 256GB), so there may not be much reason to optimize the non-CompressedOops path.

    o   If we really wanted to use r27 for compiled code, we could probably allocate r27 for rheapbase only if (Universe::narrow_oop_base() != NULL).
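To make the three options concrete, here is a minimal standalone model of the reservation decision. It is not HotSpot code: HeapBasePolicy, HeapConfig, reserves_heapbase(), allocatable() and is_allocatable() are hypothetical names, the register set is modelled as a 32-bit mask, and narrow_oop_base == 0 stands in for Universe::narrow_oop_base() == NULL.

```cpp
#include <cstdint>

// Hypothetical model of the r27 (rheapbase) reservation choices discussed
// above. None of these names exist in HotSpot.
constexpr int kHeapBaseReg = 27;  // r27, used as rheapbase on AArch64

enum class HeapBasePolicy {
  Always,            // AArch64 behaviour as described: r27 is always reserved
  IfCompressedOops,  // ARM32/64 behaviour as described
  IfNonNullBase      // suggested: reserve only when the narrow oop base is non-NULL
};

struct HeapConfig {
  bool compressed_oops;            // models the CompressedOops setting
  std::uintptr_t narrow_oop_base;  // 0 models Universe::narrow_oop_base() == NULL
};

// True if the policy takes r27 away from compiled code for this configuration.
constexpr bool reserves_heapbase(HeapBasePolicy p, HeapConfig c) {
  return p == HeapBasePolicy::Always             ? true
         : p == HeapBasePolicy::IfCompressedOops ? c.compressed_oops
                                                 : c.narrow_oop_base != 0;
}

// Removes r27 from a candidate register mask when the policy reserves it.
constexpr std::uint32_t allocatable(std::uint32_t candidates,
                                    HeapBasePolicy p, HeapConfig c) {
  return reserves_heapbase(p, c)
             ? (candidates & ~(std::uint32_t{1} << kHeapBaseReg))
             : candidates;
}

// True if register number reg is present in the mask.
constexpr bool is_allocatable(std::uint32_t mask, int reg) {
  return ((mask >> reg) & 1u) != 0;
}
```

For example, allocatable(0xFFFFFFFFu, HeapBasePolicy::IfNonNullBase, HeapConfig{true, 0}) leaves bit 27 set, so in this model a zero-based compressed-oops heap would free r27 for compiled code, while the Always policy never does.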

Thanks for any thoughts you might have...

- Derek