[aarch64-port-dev ] AArch64 register usage questions
White, Derek
Derek.White at cavium.com
Mon Mar 20 21:56:53 UTC 2017
Hi,
I've been looking at the aarch64 port's register usage, comparing it against Oracle's arm port, and have a few questions and observations:
Comparing the enum in c1_Defs_aarch64.hpp vs. the enum in the arm code (and doing some constant folding by hand), you get:
Enum constant                            ARM32/64  AArch64  Notes
pd_nof_cpu_regs_frame_map                33        32       number of registers used during code emission
pd_nof_caller_save_cpu_regs_frame_map    27        17       number of registers killed by calls
pd_nof_cpu_regs_reg_alloc                27        17       number of registers that are visible to register allocator
pd_nof_cpu_regs_linearscan               33        32       number of registers visible to linear scan
pd_nof_cpu_regs_processed_in_linearscan  28        -0       number of registers processed in linear scan; includes LR (in arm port)
pd_first_cpu_reg                         0         0
pd_last_cpu_reg                          32        16
pd_first_callee_saved_reg                -0        17
pd_last_callee_saved_reg                 -0        24
pd_last_allocatable_cpu_reg              -0        16
pd_first_byte_reg                        -0        0        unused!
pd_last_byte_reg                         -0        16       unused, except by unused last_byte_reg().
pd_nof_fpu_regs_frame_map                32        32       number of float registers used during code emission
pd_nof_caller_save_fpu_regs_frame_map    32        32       number of float registers killed by calls
pd_nof_fpu_regs_reg_alloc                32        8        number of float registers that are visible to register allocator
pd_nof_fpu_regs_linearscan               32        32       number of float registers visible to linear scan
pd_first_fpu_reg                         33        32       = pd_nof_cpu_regs_frame_map
pd_last_fpu_reg                          64        63
pd_first_callee_saved_fpu_reg            -0        40
pd_last_callee_saved_fpu_reg             -0        47
pd_nof_xmm_regs_linearscan               0         0
pd_nof_caller_save_xmm_regs              0         -0
pd_first_xmm_reg                         -1        -0
pd_last_xmm_reg                          -1        -0
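As a cross-check on the hand-folded values, here is a small sketch that restates some of the table's numbers as enums and lets the compiler verify the relationships between them. The namespaces and the range_size() helper are made up for illustration and are not HotSpot code; only the numbers come from the table above.

```cpp
// Values copied by hand from the table above; the namespaces and the
// range_size() helper are illustrative assumptions, not HotSpot code.
namespace aarch64_c1 {
enum {
  pd_nof_cpu_regs_frame_map     = 32,
  pd_nof_cpu_regs_reg_alloc     = 17,
  pd_first_cpu_reg              = 0,
  pd_last_cpu_reg               = 16,
  pd_first_callee_saved_reg     = 17,
  pd_last_callee_saved_reg      = 24,
  pd_nof_fpu_regs_frame_map     = 32,
  pd_first_fpu_reg              = 32,
  pd_last_fpu_reg               = 63,
  pd_first_callee_saved_fpu_reg = 40,
  pd_last_callee_saved_fpu_reg  = 47
};
}  // namespace aarch64_c1

namespace arm_c1 {
enum {
  pd_nof_cpu_regs_frame_map = 33,
  pd_nof_fpu_regs_frame_map = 32,
  pd_first_fpu_reg          = 33,
  pd_last_fpu_reg           = 64
};
}  // namespace arm_c1

// Number of indices in the inclusive range [first, last].
constexpr int range_size(int first, int last) { return last - first + 1; }

// FPU indices start right after the CPU indices in both ports.
static_assert(aarch64_c1::pd_first_fpu_reg == aarch64_c1::pd_nof_cpu_regs_frame_map,
              "AArch64: first FPU index follows the CPU registers");
static_assert(arm_c1::pd_first_fpu_reg == arm_c1::pd_nof_cpu_regs_frame_map,
              "ARM: first FPU index follows the CPU registers");

// The FPU index range covers exactly the frame-map FPU registers.
static_assert(range_size(aarch64_c1::pd_first_fpu_reg, aarch64_c1::pd_last_fpu_reg)
                  == aarch64_c1::pd_nof_fpu_regs_frame_map,
              "AArch64: 32 FPU indices");
static_assert(range_size(arm_c1::pd_first_fpu_reg, arm_c1::pd_last_fpu_reg)
                  == arm_c1::pd_nof_fpu_regs_frame_map,
              "ARM: 32 FPU indices");

// On AArch64 the allocatable CPU range matches pd_nof_cpu_regs_reg_alloc.
static_assert(range_size(aarch64_c1::pd_first_cpu_reg, aarch64_c1::pd_last_cpu_reg)
                  == aarch64_c1::pd_nof_cpu_regs_reg_alloc,
              "AArch64: 17 allocatable CPU indices");
```

By the same arithmetic, the AArch64 callee-saved ranges work out to 8 CPU indices (17 to 24) and 8 FPU indices (40 to 47).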
I don't expect these values to match, but some items stand out:
- AArch64 has fewer caller-saves registers, but more callee-saves registers defined above.
  o But the AArch64 code has comments like: // FIXME: There are no callee-saved.
  o And C2 does not define any SOE registers.
  o Is C1 using fewer registers than it could? Than it should?
  o And/or is C2?
  o There are other comments, such as the one above generate_call_stub(), that say:
    * // we don't need to save r16-18 because Java does not use them
    * The comment says r16, but that doesn't seem to match pd_first_callee_saved_reg.
    * I don't see where r18 came from.
- pd_first_byte_reg, pd_last_byte_reg and last_byte_reg() seem unused.
Looking at the register definitions in aarch64.ad and other places:
- AArch64 always allocates r27 to use for compressed oops (rheapbase). Arm32/64 only allocates the register if CompressedOops is enabled.
  o In one sense the AArch64 approach seems reasonable. I think the default setting for CompressedOops will be true until heap sizes get huge (e.g. somewhere past 256GB), so there may not be much reason to optimize the non-CompressedOops path.
  o If we really wanted to use r27 for compiled code, we could probably allocate r27 for rheapbase only if (Universe::narrow_oop_base() != NULL).
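To make the three options for r27 concrete, here is a toy model of the policies described above. Everything in it (HeapbasePolicy, VmState, reserves_r27) is a hypothetical name invented for illustration; it is not how either port actually decides, and it assumes a null narrow oop base can be represented as 0.

```cpp
#include <cstdint>

// Hypothetical model of three policies for reserving r27 (rheapbase).
// None of these names exist in HotSpot.
enum class HeapbasePolicy {
  Always,            // what the AArch64 port does today, per the text above
  IfCompressedOops,  // what the Arm32/64 port does, per the text above
  IfNonNullBase      // the possible refinement: only when a base is needed
};

// Assumed inputs: whether CompressedOops is enabled and the narrow oop base
// (0 stands in for NULL).
struct VmState {
  bool compressed_oops;
  std::uintptr_t narrow_oop_base;
};

// Returns true if r27 would be held back from compiled code under policy p.
constexpr bool reserves_r27(HeapbasePolicy p, VmState s) {
  return p == HeapbasePolicy::Always             ? true
         : p == HeapbasePolicy::IfCompressedOops ? s.compressed_oops
                                                 : s.narrow_oop_base != 0;
}
```

Under this model the third policy frees r27 in one extra case compared with the Arm32/64 rule: CompressedOops enabled but with a null base.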
Thanks for any thoughts you might have...
- Derek