Possible subtle memory model error in ClassValue

Mon Aug 10 06:12:55 UTC 2020

Hi,

I'm David's colleague. I'm the one who's spotted this issue twice while
GraalVM is doing its points-to analysis with jdk11u-dev. The fuller
exception is here:

Caused by: java.lang.NullPointerException
        at
java.base/java.lang.ClassValue$ClassValueMap.loadFromCache(ClassValue.java:535)
        at
java.base/java.lang.ClassValue$ClassValueMap.probeHomeLocation(ClassValue.java:541)
        at java.base/java.lang.ClassValue.get(ClassValue.java:101)
        at
jdk.internal.vm.ci/jdk.vm.ci.hotspot.HotSpotJVMCIMetaAccessContext.fromClass(HotSpotJVMCIMetaAccessContext.java:163)
        at
jdk.internal.vm.ci/jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.fromClass(HotSpotJVMCIRuntime.java:339)
        at
jdk.internal.vm.ci/jdk.vm.ci.hotspot.HotSpotResolvedObjectTypeImpl.fromObjectClass(HotSpotResolvedObjectTypeImpl.java:83)
        at
jdk.internal.vm.ci/jdk.vm.ci.hotspot.HotSpotResolvedObjectTypeImpl.fromMetaspace(HotSpotResolvedObjectTypeImpl.java:97)
        at
jdk.internal.vm.ci/jdk.vm.ci.hotspot.CompilerToVM.resolveTypeInPool(Native
Method)
        at
jdk.internal.vm.ci/jdk.vm.ci.hotspot.HotSpotConstantPool.loadReferencedType(HotSpotConstantPool.java:727)
        at
java.base/jdk.internal.reflect.GeneratedMethodAccessor2.invoke(Unknown
Source)
        at
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at
com.oracle.graal.pointsto.infrastructure.WrappedConstantPool.loadReferencedType(WrappedConstantPool.java:88)
        at
com.oracle.graal.pointsto.infrastructure.WrappedConstantPool.loadReferencedType(WrappedConstantPool.java:105)
        at
com.oracle.svm.hosted.phases.SubstrateClassInitializationPlugin.loadReferencedType(SubstrateClassInitializationPlugin.java:71)
        at
jdk.internal.vm.compiler/org.graalvm.compiler.java.BytecodeParser.loadReferenceType(BytecodeParser.java:4384)
        at
jdk.internal.vm.compiler/org.graalvm.compiler.java.BytecodeParser.maybeEagerlyResolve(BytecodeParser.java:4366)
        at
com.oracle.svm.hosted.phases.SharedGraphBuilderPhase$SharedBytecodeParser.maybeEagerlyResolve(SharedGraphBuilderPhase.java:127)
        at
jdk.internal.vm.compiler/org.graalvm.compiler.java.BytecodeParser.lookupMethod(BytecodeParser.java:4316)
        at
jdk.internal.vm.compiler/org.graalvm.compiler.java.BytecodeParser.genInvokeStatic(BytecodeParser.java:1659)
        at
jdk.internal.vm.compiler/org.graalvm.compiler.java.BytecodeParser.processBytecode(BytecodeParser.java:5340)
        at
jdk.internal.vm.compiler/org.graalvm.compiler.java.BytecodeParser.iterateBytecodesForBlock(BytecodeParser.java:3423)
        at
jdk.internal.vm.compiler/org.graalvm.compiler.java.BytecodeParser.processBlock(BytecodeParser.java:3230)
        at
jdk.internal.vm.compiler/org.graalvm.compiler.java.BytecodeParser.build(BytecodeParser.java:1088)
        at
jdk.internal.vm.compiler/org.graalvm.compiler.java.BytecodeParser.buildRootMethod(BytecodeParser.java:982)
        at
jdk.internal.vm.compiler/org.graalvm.compiler.java.GraphBuilderPhase$Instance.run(GraphBuilderPhase.java:84)
        at
jdk.internal.vm.compiler/org.graalvm.compiler.phases.Phase.run(Phase.java:49)
        at
jdk.internal.vm.compiler/org.graalvm.compiler.phases.BasePhase.apply(BasePhase.java:214)
        at
jdk.internal.vm.compiler/org.graalvm.compiler.phases.Phase.apply(Phase.java:42)
        at
jdk.internal.vm.compiler/org.graalvm.compiler.phases.Phase.apply(Phase.java:38)
        at
com.oracle.graal.pointsto.flow.MethodTypeFlowBuilder.parse(MethodTypeFlowBuilder.java:224)
        at
com.oracle.graal.pointsto.flow.MethodTypeFlowBuilder.apply(MethodTypeFlowBuilder.java:351)
        at
com.oracle.graal.pointsto.flow.MethodTypeFlow.doParse(MethodTypeFlow.java:322)
        at
com.oracle.graal.pointsto.flow.MethodTypeFlow.ensureParsed(MethodTypeFlow.java:311)
        at
com.oracle.graal.pointsto.flow.MethodTypeFlow.addContext(MethodTypeFlow.java:112)
        at
com.oracle.graal.pointsto.flow.StaticInvokeTypeFlow.update(InvokeTypeFlow.java:437)
        at com.oracle.graal.pointsto.BigBang$2.run(BigBang.java:530)
        at
com.oracle.graal.pointsto.util.CompletionExecutor.lambda$execute$0(CompletionExecutor.java:173)
        at
java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1426)
        ... 5 more

On Fri, Aug 7, 2020 at 11:36 PM John Rose <john.r.rose at oracle.com> wrote:

> On Aug 7, 2020, at 9:52 AM, Andrew Haley <aph at redhat.com> wrote:
> >
> > On 8/7/20 4:39 PM, David Lloyd wrote:
> >
> > > However, I'm wondering if this isn't a side effect of what appears
> > > to be an incorrect double-checked lock at lines 374-378 of
> > > ClassValue.java [1].  In order for the write to the non-volatile
> > > `cache` field of ClassValueMap, it is my understanding that there
> > > must be some acquire/release edge from where the variable is
> > > published to guarantee that all of the writes are visible to the
> > > read site, and I'm not really sure that the exit-the-constructor
> > > edge is going to match up with with the synchronized block which
> > > protects a different non-volatile field.  And even if my feeling
> > > that this is dodgy is valid, I'm not sure whether this NPE is a
> > > possible outcome of that problem!
> >
> > It certainly doesn't look right to me. AFAICS this is classic broken
> > double-checked locking. It'd need some sort of fence after
> > constructing the ClassValueMap instance and before publishing it.
>
> Y’all are calling my baby ugly but he’s really cute if you look
> at him from the right angle.
>
> (First, a quick question:  This bug shows up on x86, right?
>

Yes, the issue is happening on x86.

> If so, we probably are not looking at a hardware reordering
> problem but something stemming from Graal’s reordering
> of operations compared to C1 and C2, perhaps after a different
> set of inlining decisions uncharacteristic of C2.)
>

To the best of my knowledge, GraalVM points-to analysis is just a JVM
process that figures out which code is used by the end-user application.
The result of this process is then used to create the native executable.
I'm not aware of the Graal compiler having an impact here.

>From what I've learnt looking at this code and more recent JVMCI
implementations, jdk11u-dev relies on reflection to understand the end-user
application, whereas more modern JVMCI stacks avoid the use of reflection
and query the JVM directly.

I've been trying to create a vanilla JDK stress test that would trigger
such NPE but I've not had luck so far. I don't know whether JVMCI has a
part to play here.

>
> The bug indeed looks like a dropped fence at the end of the
> constructor for ClassValueMap.  (I don’t see an easier way to
> get a null at the NPE point.)  The fence might not actually be
> dropped, but a write which the fence is supposed to fence
> might have been reordered after the fence, and after an
> unlock operation, allowing someone to see the initial
> null after the unlock.
>
> The double-check idiom breaks when the outer check
> (not synchronized) gets half-baked data and acts on it.
> This much-maligned idiom would not be broken in the
> present case under the assumption that a data dependency
> on type.classValueMap (set in initializeMap) will see
> a fully initialized value of ClassValueMap.cacheArray.
>
> Now we get into the fine print, and I agree there is a bug
> here.  The easy fix is to repair the logic under which that
> ugly everybody’s-hating-on-it double check idiom would
> be in fact correct.
>
> The fine print requires a couple different things that are not
> quite fulfilled by the present code, and Graal has either
> reordered the memory accesses to expose the problem, or
> it (itself) has a complementary bug.  (Or it has an unrelated
> bug, and you people are *staring* at my baby!)
>
> First, as Paul noted, you need a final variable in your class
> to get the benefit of a fence at the end of the constructor.
> Oops #1:  ClassValueMap doesn’t have *any* finals.  That’s
> awkward.  Two fixes for that:  (a) add a dummy final, and
> (b) add a real fence in the constructor.  For better luck,
> maybe put the fence at the end of sizeCache.
>
> On machines (not x86) which need fences *before* final
> reads, another explicit fence should be placed before the
> unsynchronized read (or else inside the getCache method).
>
> By the letter of the JMM, the helpful effects of the fence
> at the end of the constructor are only guaranteed for
> references which depend on final variables set by that
> constructor, because only those final variables get a
> “freeze” operation which stabilizes them (and values
> dependently loaded via them).  Throwing in a dummy
> final is therefore not guaranteed to work.  But throwing
> in a fence will work.  One downside of the fence is that
> the JMM is silent about fences, but the JMM is widely
> recognized as a partial story, not the full or final story.
>
> (Here’s a tidbit of JMM politics:  I once heard Doug Lea
> considering whether maybe all fields should be treated
> more like final fields.  I don’t know if this is still a live
> idea, but it would make this bug go way, since nearly all
> constructors would then get fences of some sort.)
>
> Here’s a bit of explanatory (non-normative) text from the draft
> POPL paper for JMM, where the parenthetic comment indicates
> (I think) the sort of thing that is happening here:
>
> > Let us now assume that the acquire and release do happen. As long as
> > these actions take place after object has been constructed (and there
> > is no code motion around the end of the constructor), the diffs that
> > the second processor acquires are guaranteed to reflect the correctly
> > constructed object.
>
> Basically, the synchronization statement is expected to do a
> release *after* the non-null value is stored.  It looks like this
> is failing due to a lost and/or reordered fence.
>
> Second, David says:
>
> > I’m not really sure that the exit-the-constructor edge is going to match
> > up with with the synchronized block which protects a different
> > non-volatile field.
>
> By the letter of the JMM (AFAIUI), this code is coloring way outside
> the lines.  (Yes, I admit it.)  The problem is pretty fundamental.
> I went and swapped in some JMM spec just now (don’t have it in
> short term memory; go figure), and here’s what I re-learned.
>
> In JMM terms, a “release” is the earlier end of any edge in relation
> called “synchronizes-with”.  An “acquire” is the latter end of such
> an edge.  In the JMM this relation applies only to lock, unlock,
> and volatile reads and writes.  Your guess is as good as mine whether
> other operations (CAS, fences, etc.) participate; the JMM is silent.
> Crucially, acquires and release do not touch plain fields.  This is
> counter-intuitive for those of us who like to reason with fences.
>
> The JMM prevents regular writes from floating past what we like
> to think of as “release points” (unlocks) by appealing to processor
> order inside lock/unlock pairs, and also by appealing to global
> ordering of multiple lock/unlock pairs (or stuff with volatiles
> and other corner cases which we will neglect here).  In order
> to work right, a normal write has to come before a release *and*
> a matching normal read has to come after an acquire, and such
> an acquire is nothing other than the lock of a later lock/unlock
> pair, typically in another thread.
>
> So the rules which reliably connect normal writes to normal
> reads in other threads work differently from acquire fences and
> release fences as we (or just I?) usually think of them.  Yes, there
> are “acquires” and “releases” in the JMM.  No, they are not
> primitives but rather descriptions of the way the happens-before
> relation is constrained by (among other things) lock and unlock
> actions.
>
> On x86, you don’t have to worry about acquires; you just set up
> the right release fences.  In the JMM, there’s no way to get either
> an acquire or a release without a synchronized statement (or
> other fancy stuff like volatiles).  That’s why double-check is
> broken in the pure JMM, and why it can be repaired if you
> add some (non-JMM) fences.
>
> So why is this showing up suddenly with Graal?  Possibly it’s
> got a really aggressive interpretation of the JMM, relative to
> the standard HotSpot JITs.  Possibly it’s got a bug.  Graal might
> be allowing the initialization of ClassValueMap.cacheArray
> to float down past publication of the ClassValueMap on
> type.classValueMap (in initializeMap).  Less likely, it is allowing
> the initialization to totally escape the lock/unlock pair, something
> the JMM might allow but the Cookbook advises against (in its
> “Can Reorder” table).
>
> Perhaps Graal is modeling the “freeze” operations on finals more
> accurately (and the Cookbook gives instructions for this).  That
> would allow the ClassValueMap to correctly initialize its frozen
> finals, but not (unfortunately) the not-frozen cacheArray field.
> It appears that the new JIT on the block is exposing my neglect
> for my poor baby, in not putting the right fences around its playpen.
>
> Bottom line:  Add a release fence before type.cVM is set, and add
> a (no-op on x86) acquire fence in getCache.  Quick test:  Add a dummy
> final to CVM, to see if that makes the bug let go.
>
> — John
>
>