C2, ThreadLocalNode, and Loom

Aleksey Shipilev shade at redhat.com
Thu Nov 24 17:25:30 UTC 2022


Hi,

In Loom x86_32 port, I am following up on the remaining C2 bugs.

I believe one bug can be summarized as follows. C2 models thread-locals with ThreadLocalNode (TLN). 
TLN is effectively a constant node: its only input is root, and it hashes like a normal node. This 
was logically sound for decades, because the code never switched the threads. Therefore, code is 
free to treat thread address as constant. In Loom, this guarantee no longer holds. If we ever store 
the "old" value of TLN node somewhere and reuse it past the potential yield-resume point, we end up 
using the *wrong* thread.

How is this not a problem we saw before? On most architectures, we have the dedicated thread 
register, and TLN matches to it. That dedicated thread register holds the true current thread 
already. AFAICS, most TLN uses go straight to various AddP-s, so we are reasonably safe that no 
naked TLS addresses are stored, and the majority (all?) uses reference that thread register. (I am 
not sure what protects us from accidentally "caching" thread register into adhoc one. It would make 
little sense from performance/compiler standpoint, but I cannot yet see what theoretically prevents 
it in C2 code.)

On x86_32, however, TLN matches to full MacroAssembler::get_thread call, and there storing the 
thread address into an adhoc register is a normal thing to do. Reusing that register over the 
continuation switch points visibly breaks x86_32. This usually manifests like a heap corruption 
because multiple threads stomp over foreign TLABs, or a failure in runtime GC code. Current failures 
in Loom x86_32 port, for example, can be easily reproduced by adding a simple assert in any G1 
runtime method that pulls the (wrong) thread (mis)loaded from C2 barrier:

// G1 pre write barrier slowpath
JRT_LEAF(void, G1BarrierSetRuntime::write_ref_field_pre_entry(oopDesc* orig, JavaThread* thread))
   assert(thread == JavaThread::current(), "write_ref_field_pre_entry sanity");

So, while this manifests on x86_32, I think this is symptom of a larger problem with assuming TLN 
const-ness. At this point, I am trying to come up with solutions:

1) Dodge the problem in x86_32 and then pretend all arches have dedicated thread registers

Sacrifice one x86_32 register for thread address. This would likely to penalize performance a little 
bit, because x86_32 does not have lots of general purpose registers to begin with. We can probably 
try and go to FS/GS instead of carrying the address in the register; but I don't know how much work 
would that entail.

This feels like a cowardly way out, and it would still break any future arch that does not have 
dedicated thread registers. And, it would break if we ever replace and ThreadLocalNode with the call 
to Thread::current().

2) Remodel ThreadLocalNode as non-constant

What partially solves the problem: saying that ThreadLocalNode::hash() is NO_HASH. AFAICS, this 
successfully prevents collapsing TLNs in the same compilation. This still does not solve the case 
where a single TLN gets yanked to the earliest block and its value cached in the register.

AFAIU, we only want to make sure that TLN is reloaded after the potential continuation yield, which 
also serves as the point of return. Since continuation yields are modeled as calls, and calls 
produce both control and memory, we might need to hook up TLN to either control or memory.

I tried to hook up the current control to every TLN node [1]. It works with a few wrinkles, but the 
patch shows there are ripple effects throughout C2 code, and it sometimes breaks the graph. Some 
pattern matching code (for example AddP matching code in EA) also asserts, probably assuming that 
TLNs have no inputs. I suspect other places might have implicit dependencies like these as well. 
This would be the inevitable consequence for any patch that changes ThreadLocalNode inputs/outputs.

3) Some other easy way out I am overlooking?

Any other ideas?

-- 
Thanks,
-Aleksey

[1] https://cr.openjdk.java.net/~shade/loom/x86_32/tln-ctrl-1.patch



More information about the hotspot-compiler-dev mailing list