Infinispan server issue - putting it all together

John Rose john.r.rose at oracle.com
Sat Oct 5 15:22:41 UTC 2024


On 2 Oct 2024, at 20:19, ioi.lam at oracle.com wrote:

> Hi Ashutosh,
>
> Thanks for putting together the summary!

Yes, thank you to everyone who has been working to tame this bug,
and extra thanks to Ashutosh for the very clear write-up.

I have seen a number of problems like this lurking in the JDK.
I suspect there may be many more as well.  Anything like Leyden
that touches the surface of the problem, by perturbing initialization
order, can stir up Balrogs.

The general form is this:  Class 1 and class 2 have a mutual dependency.
The APIs are organized so that class 1 is the “front door” and class 2
is the “back door”.  If your first entry is from the front door,
then the code paths get executed in such a way that class 1 touches
class 2, which then (as part of class 2 initialization) performs some
step that is necessary for both classes.  Often but not always this
is setting a static final reference variable, in class 2, to some
non-null value.

Class 2 also touches class 1 in the course of its own initialization,
but since class 1 is already on stack (in the current thread) the
“touch” by class 2 (a call from the back door to the front door)
is a no-op.  And that’s OK, since class 1 has already done enough
to satisfy class 2’s initialization requirements.  Eventually
the init logic for class 1 finishes as well.  The key feature
here is both class 1 and class 2 inits logic (<clinit> methods)
are cut in half, interrupted by an initializing “touch” of the
other class.  So there are four initialization actions, not two.
Let’s call them init[12][ab].

When the front door is entered, we have this order of inits:

(front door call) init1a init2a (nop for init1) init2b init1b

Let’s say init2b (not init2a) creates state needed by init1b.
(The state could be the setting of a static final field.
It could also be the populating of a cache somewhere.)
In any case, init2b must precede init1b in order for init1b
to observe a useful side effect from init2b.

All this works as long as it is very difficult, or impossible,
to go in by the “back door”, say by making an instance of
class 2 as the very first use of the API, or calling a static
method on class 2, before class 1 is touched.

But, if and when somebody does eventually discover the back door
and give it a try, then class 2 init logic will execute without
class 1 init partially done (and waiting to finish on stack).
This means that when class 2 touches class 1, class 1 init
logic runs uninterrupted, and the recursive touch to class 2
is a no-op, so that the latter part of class 2’s init logic
goes last.  Here is the order:

(back door call) init2a init1a (nop for init2) init1b init2b

So class 1 is likely to fail to initialize because it reaches
out for states populated by init2b, which has been pushed out
to happen later.

In the present Infinispan problem, the front door is
ConstantDescs (totally public API) and the back door is a
helper class in a non-exported package.  That package is
protected enough, on a normal day, by the module system.
But reflective tricks, or an over-active CDS, can open
up the back door.

When such a problem is found, it can be patched in a number
of ways.  My favorite would be to refactor the offending
state (final field values) into a private static-holder
class, and let the public classes pull values from the
private class.  This is the reason I have wished (for
a long time!) for lazy-statics, because if you can mark
a static “lazy” you get the same effect, but you don’t
have to refactor it into a separate static holder class.
See [1] for more.  For now, we can use static holders.

[1] https://cr.openjdk.org/~jrose/leyden/after-computed-constants.html

Instead of a private static holder class, you can also
centralize the “pulled values” into a class that already
exists.  I think that’s the essence of Dan’s proposal
for repairing ConstantDesc.

I prefer to make the problem clearer with a static holder
that has a special name (“foo static holder” or the like)
so that maintainers better understand its special role.
BTW, by “private static class” I include package-private
classes (either top level or static member of another class),
to cover those common cases where several classes need to
share initial state.  Note that a touch to a static member
class does not count as an initializing touch to its
containing class.  Note also that using “import static”
can make it easier to write.  But if you have a public
static final, you will need to write something like
this:  “public static final FOO = FooStatics.FOO”, where
“FooStatics” is the class that sequences the necessary
inits, and the extra copy of FOO is extracted after
all the inits are done (by the static holder).

Sometimes it also helps to add an extra “touch” in a third
class 0 (the “driveway”?), that touches class 1.  But that
only works if you can prove that, out of all three classes,
class 0 is always touched first.  You might find examples in the
JDK by searching for the unsafe method “ensureClassInitialized”,
although most uses are for other purposes.  I think the uses
in LambdaForm.java are nailing down circular bootstrap order.
BTW, outside the JDK there is also Lookup::ensureInitialized.

During development of java.lang.invoke I bumped into many NPEs
from init loops during development and had to “shake around” the
code in order to get the inits into a good order.  The private
classes MethodHandleStatics and MethodHandleNatives are both
examples of private static holder classes, for (respectively)
pure Java and native state pertaining to java.lang.invoke.
There were other loops not covered by those as well.

I found (to my dismay) that I hadn’t discovered all the possible
bad reorderings, and a few bugs came in from the field just like
the Infinispan bug, where some code threw an NPE due to some
surprising API invocation order, not covered by our pre-integration
tests.  Oops.

The NPE often occurs some distance away from the root cause.
(That is, the init-loop and the static reference that pulls
out a null from a static that “obviously cannot be null”.)
The null often wanders some distance away before somebody
trips over it.  We have some technical debt here, in that we
have no good way to pinpoint the cause of such an NPE, if
the null has “wandered away” before it trips the NPE.

It can get worse, too, if you have an initialization loop with
two public doors.  If two threads are racing to enter both doors
concurrently, you can get a deadlock on a bad day.  I’m amazed
we don’t see this more often.  A couple years ago I posted a
very detailed POC of that pathology[2].

[2] https://cr.openjdk.org/~jrose/jvm/ClassInitDeadlock.zip

Back to you Dan…

On 4 Oct 2024, at 11:34, Dan Heidinga wrote:

> Problem #3 is an interesting case (and many thanks for the simple reproduction test!) in that it shows an example of a class initialization order cycle – thankfully one that can be worked around.  It requires careful separation to break the cycle while still preserving the identity invariants expressed in the original code. As an example, look at https://github.com/openjdk/leyden/compare/premain...DanHeidinga:leyden:rework-constantdesc-init?expand=1 for how to detangle the ConstantDesc / PrimitiveClassDescImpl / ReerenceClassDescImpl design.
>
> I don’t know that we actually want to propose this to mainline but it shows the pattern that can be used to rework the <clinit>’s of similar cycles.

If I understand this correctly, ReferenceClassDescImpl is being given
the role of common holder of constants.  My preference would be to
(a) use a new private class such as ClassDescStatics (cf. MHStatics)
and (b) liberally sprinkle “import static ClassDescStatics.*” in
the constant desc implementation files, and (c ) do NOT mangle
the static field names (with “BOOTSTRAP” prefix or the like);
just use the natural names (and import them everywhere).  Also,
(d) if there is some name that is a public API point, then just
duplicate the name as noted above:

 public static final Thingy API_NAME = ClassDescStatics.API_NAME;

(This works out fine except for String constants, if you really
need them to be JLS-style constants.)  So:

 public static final PrimitiveClassDescImpl CD_int = ClassDescStatics.CD_int;

Pushing all the constants into one class might be too heavy a lift
in some cases, but it is the surest way to eliminate bootstrap NPEs.
Dan, would you mind trying it out my way and see if it can work?

In a followup mail I wish to address the tricky problem of
diagnosing bootstrap loops, which (as we know) can manifest
in the field as mysterious NPE errors.  Leyden has a fresh
requirement for tools to detect and tame the Balrogs before
they can rampage.

— John


More information about the leyden-dev mailing list