C2, ThreadLocalNode, and Loom
Aleksey Shipilev
shade at redhat.com
Thu Nov 24 17:25:30 UTC 2022
Hi,
In Loom x86_32 port, I am following up on the remaining C2 bugs.
I believe one bug can be summarized as follows. C2 models thread-locals with ThreadLocalNode (TLN).
TLN is effectively a constant node: its only input is root, and it hashes like a normal node. This
was logically sound for decades, because the code never switched the threads. Therefore, code is
free to treat thread address as constant. In Loom, this guarantee no longer holds. If we ever store
the "old" value of TLN node somewhere and reuse it past the potential yield-resume point, we end up
using the *wrong* thread.
How is this not a problem we saw before? On most architectures, we have the dedicated thread
register, and TLN matches to it. That dedicated thread register holds the true current thread
already. AFAICS, most TLN uses go straight to various AddP-s, so we are reasonably safe that no
naked TLS addresses are stored, and the majority (all?) uses reference that thread register. (I am
not sure what protects us from accidentally "caching" thread register into adhoc one. It would make
little sense from performance/compiler standpoint, but I cannot yet see what theoretically prevents
it in C2 code.)
On x86_32, however, TLN matches to full MacroAssembler::get_thread call, and there storing the
thread address into an adhoc register is a normal thing to do. Reusing that register over the
continuation switch points visibly breaks x86_32. This usually manifests like a heap corruption
because multiple threads stomp over foreign TLABs, or a failure in runtime GC code. Current failures
in Loom x86_32 port, for example, can be easily reproduced by adding a simple assert in any G1
runtime method that pulls the (wrong) thread (mis)loaded from C2 barrier:
// G1 pre write barrier slowpath
JRT_LEAF(void, G1BarrierSetRuntime::write_ref_field_pre_entry(oopDesc* orig, JavaThread* thread))
assert(thread == JavaThread::current(), "write_ref_field_pre_entry sanity");
So, while this manifests on x86_32, I think this is symptom of a larger problem with assuming TLN
const-ness. At this point, I am trying to come up with solutions:
1) Dodge the problem in x86_32 and then pretend all arches have dedicated thread registers
Sacrifice one x86_32 register for thread address. This would likely to penalize performance a little
bit, because x86_32 does not have lots of general purpose registers to begin with. We can probably
try and go to FS/GS instead of carrying the address in the register; but I don't know how much work
would that entail.
This feels like a cowardly way out, and it would still break any future arch that does not have
dedicated thread registers. And, it would break if we ever replace and ThreadLocalNode with the call
to Thread::current().
2) Remodel ThreadLocalNode as non-constant
What partially solves the problem: saying that ThreadLocalNode::hash() is NO_HASH. AFAICS, this
successfully prevents collapsing TLNs in the same compilation. This still does not solve the case
where a single TLN gets yanked to the earliest block and its value cached in the register.
AFAIU, we only want to make sure that TLN is reloaded after the potential continuation yield, which
also serves as the point of return. Since continuation yields are modeled as calls, and calls
produce both control and memory, we might need to hook up TLN to either control or memory.
I tried to hook up the current control to every TLN node [1]. It works with a few wrinkles, but the
patch shows there are ripple effects throughout C2 code, and it sometimes breaks the graph. Some
pattern matching code (for example AddP matching code in EA) also asserts, probably assuming that
TLNs have no inputs. I suspect other places might have implicit dependencies like these as well.
This would be the inevitable consequence for any patch that changes ThreadLocalNode inputs/outputs.
3) Some other easy way out I am overlooking?
Any other ideas?
--
Thanks,
-Aleksey
[1] https://cr.openjdk.java.net/~shade/loom/x86_32/tln-ctrl-1.patch
More information about the hotspot-compiler-dev
mailing list