C2, ThreadLocalNode, and Loom

Fri Nov 25 08:58:45 UTC 2022

> naked TLS addresses are stored, and the majority (all?) uses reference that thread register. (I am 
> not sure what protects us from accidentally "caching" thread register into ad hoc one. It would make
> little sense from performance/compiler standpoint, but I cannot yet see what theoretically prevents 
> it in C2 code.)

I don't think the thread register on, say, x64 can be spilled. tlsLoadP
has its own register mask (a single-register mask: r15) that doesn't
intersect with the mask used by spilling instructions (they never use
r15), so tlsLoadP can never be an input to a spill. tlsLoadP is
rematerializable, so when r15 is killed, rather than spilling it, the
register allocator adds a new tlsLoadP near its uses.
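To make the register-mask argument concrete, here is a toy model of the decision described above. It is not HotSpot code: the 16-register layout, the `Node` struct, `spill_mask`, and `on_register_killed` are all hypothetical, and the policy is deliberately simplified. The point it shows is that a value whose output mask does not intersect the spill mask cannot feed a spill, so if it is rematerializable the only option is to recompute it near its uses.

```cpp
#include <bitset>
#include <cassert>
#include <initializer_list>
#include <string>

// Toy model, not HotSpot code: 16 general-purpose registers numbered as on
// x86-64, so index 15 stands for r15, the register holding the current thread.
constexpr int kNumRegs = 16;
constexpr int kR15 = 15;
using RegMask = std::bitset<kNumRegs>;

// Build a mask from a list of register numbers.
RegMask mask_of(std::initializer_list<int> regs) {
  RegMask m;
  for (int r : regs) {
    assert(r >= 0 && r < kNumRegs);
    m.set(r);
  }
  return m;
}

// Registers that (hypothetical) spill copies may read from: everything
// except r15, mirroring "spilling instructions never use r15".
RegMask spill_mask() {
  RegMask m;
  m.set();        // all registers...
  m.reset(kR15);  // ...except r15
  return m;
}

// Hypothetical node: its name, the registers its result may live in,
// and whether it can be cheaply recomputed.
struct Node {
  std::string name;
  RegMask out_mask;
  bool rematerializable;
};

// Simplified policy when the register holding a node's value is killed
// before a use: spill if some spill copy could read the value, otherwise
// rematerialize if the node allows it.
std::string on_register_killed(const Node& n) {
  const bool can_feed_spill = (n.out_mask & spill_mask()).any();
  if (can_feed_spill) {
    return "spill";
  }
  // No spill copy can take this value as input, so the only remaining
  // option is to recompute it next to each use.
  return n.rematerializable ? "rematerialize" : "unallocatable";
}
```

For example, a thread-local load modeled as `Node{"ThreadLocal load", mask_of({kR15}), true}` yields `"rematerialize"`, while an ordinary value allowed in `rax`..`rbx` yields `"spill"`. As I understand it, the real allocator makes this choice while splitting live ranges rather than in a standalone function; the sketch only captures the mask-intersection reasoning.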

> 3) Some other easy way out I am overlooking?

Ideally, I think we want to avoid putting the burden of an uncommon case
on C2-generated code: can we have the runtime code that performs the
yield update the register that contains the thread (the way GC logic
updates the live oops at a safepoint)? That would require a way for
runtime code to know where the thread pointer lives, which is not
straightforward AFAICT.
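The proposal above implies something like an oop map, but for the thread pointer. Below is a toy model of that idea. Everything in it is hypothetical and not a HotSpot structure: `SavedFrame`, its `thread_slots` map, and `retarget_thread` are illustrative names. If, at each point where a virtual thread can yield, compiled code recorded which saved register slots hold the thread pointer, the runtime could rewrite exactly those slots when the virtual thread continues, possibly on a different carrier.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of a yielded frame (hypothetical, not HotSpot's frame layout):
// register values captured when the virtual thread yielded.
struct SavedFrame {
  std::vector<std::uintptr_t> saved_regs;
  // Hypothetical "thread map", analogous to an oop map: indices into
  // saved_regs that hold the carrier thread pointer at this yield point.
  std::vector<int> thread_slots;
};

// Rewrite every slot the map says holds the thread pointer, the way GC
// updates oops it finds through oop maps. Other slots are left untouched.
void retarget_thread(SavedFrame& f, std::uintptr_t new_thread) {
  for (int slot : f.thread_slots) {
    assert(slot >= 0 && static_cast<std::size_t>(slot) < f.saved_regs.size());
    f.saved_regs[static_cast<std::size_t>(slot)] = new_thread;
  }
}
```

Producing `thread_slots` is the hard part the message points at: the compiler would have to track every register that may hold the thread pointer at each yield point, much as it already tracks oops for GC.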

Roland.