RFR: 8291555: Implement alternative fast-locking scheme [v21]

Sat Mar 11 16:14:30 UTC 2023

On Sat, 11 Mar 2023 15:57:53 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

> > Proposal for omitting the lockstack size check (at least in 75% of all times):
> > 
> > * We know that Thread as well as grown lockstack backing buffers start at malloc-aligned boundaries. In practice this is 16 bytes on 64-bit and 4-8 bytes on 32-bit, so at the very least 4.
> > * Make the initial lockstack this size, and define it so that the initial slot stack starts at offset 0.
> > * Load the current slot pointer as you do now and check its lowest 2 bits. If both are zero, take the slower path (load the current limit, compare against it, ...).
> > * If bit 0 or 1 is set, you can omit this check: the pointer cannot be at the limit yet, so there is still room.
> > * You can extend this proposal to any alignment you like. You need to declare the lockstack slots with `alignas(X)`, and the compiler will make sure that the _initial_ slot stack is always well aligned. Larger slot stacks would have to be allocated in an aligned fashion using posix_memalign (we would need an NMT-wrapped version of it, but that's trivial).
> 
> This would only work when pushing a single slot, right? Have you seen what we're doing in the compiled (C1 and C2) paths (on x86_64 and aarch64)? There we make a (conservative) estimate of how many lock slots the method needs, check once upon method entry that enough slots are available, and then elide the check altogether in the lock-enter implementation.

Yeah, I just realized this myself. I started working on the template interpreter first, where we push single stack slots. There it may still make sense.
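
For illustration, a minimal C++ sketch contrasting the two checks in this exchange: the alignment trick from the quoted proposal (a per-push check, as in the template interpreter) and the compiled-code approach of reserving slots once on method entry. This is a hypothetical model, not HotSpot's LockStack: the names (`LockStack`, `push_checked`, `reserve`, `push_unchecked`), the assumption of 4 slots per aligned unit, and the fixed in-object capacity are all illustrative. It relies on the invariant that the slot pointer never exceeds the limit, and that the limit is itself aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of a per-thread lock stack; names are illustrative,
// not HotSpot's actual LockStack API.
using oop = void*;

// Assumption: one aligned unit holds 4 slots, so the slot array is aligned
// to 4 * sizeof(oop) bytes (32 on 64-bit, 16 on 32-bit).
constexpr std::size_t kSlotsPerUnit = 4;
constexpr std::size_t kAlign = kSlotsPerUnit * sizeof(oop);

struct LockStack {
  // Capacity is a multiple of kSlotsPerUnit, so limit() is kAlign-aligned.
  // A growable buffer would need aligned allocation (e.g. posix_memalign).
  static constexpr std::size_t kCapacity = 2 * kSlotsPerUnit;
  alignas(kAlign) oop slots[kCapacity];
  oop* top = slots;     // next free slot; invariant: slots <= top <= limit()
  int slow_checks = 0;  // how often the full compare against the limit ran

  LockStack() = default;
  LockStack(const LockStack&) = delete;             // top points into slots
  LockStack& operator=(const LockStack&) = delete;

  oop* limit() { return slots + kCapacity; }

  // Interpreter-style single push using the alignment trick.
  bool push_checked(oop o) {
    auto bits = reinterpret_cast<std::uintptr_t>(top);
    if ((bits & (kAlign - 1)) == 0) {
      // top is on an aligned boundary and may equal limit(): full compare.
      ++slow_checks;
      if (top >= limit()) return false;  // full: caller takes the slow path
    }
    // Otherwise top is unaligned, so it cannot equal the aligned limit();
    // together with top <= limit() this guarantees a free slot.
    *top++ = o;
    return true;
  }

  // Compiled-code style: check once on method entry for n slots...
  bool reserve(std::size_t n) {
    return static_cast<std::size_t>(limit() - top) >= n;
  }
  // ...then push without any per-lock check.
  void push_unchecked(oop o) { *top++ = o; }
  oop pop() { return *--top; }
};
```

Under the 4-slots-per-unit assumption, only pushes that start on a unit boundary pay for the full compare (2 of the 8 pushes into this 8-slot stack), which is where a figure like "75% of all times" comes from. The `reserve` path shows why the trick buys nothing in compiled code: once enough slots are known to be free on method entry, the individual pushes need no check at all.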

-------------

PR: https://git.openjdk.org/jdk/pull/10907