Possible subtle memory model error in ClassValue
I'm helping a colleague debug a weird problem that occurs in ClassValue on jdk11u (and presumably on upstream as well, though it's presently impossible to verify). As a disclaimer, the problem manifests itself when building native images via GraalVM, so it's possible that something is simply broken there, but it seems at least feasible that it could be a plain old Java bug, so I thought I'd send up a flare here to see if this makes sense to anyone else.

The bug itself manifests (on jdk11u) as an NPE with the following exception trace:

java.lang.NullPointerException
    at java.base/java.lang.ClassValue$ClassValueMap.loadFromCache(ClassValue.java:535)
    at java.base/java.lang.ClassValue$ClassValueMap.probeHomeLocation(ClassValue.java:541)
    at java.base/java.lang.ClassValue.get(ClassValue.java:101)
    ...

Some basic analysis shows that this should be impossible under normal/naive circumstances: the initializer of java.lang.ClassValue.ClassValueMap sets the corresponding field to non-null during construction.

However, I'm wondering if this isn't a side effect of what appears to be an incorrect double-checked lock at lines 374-378 of ClassValue.java [1]. In order for the write to the non-volatile `cache` field of ClassValueMap to be visible, it is my understanding that there must be some acquire/release edge from where the variable is published to guarantee that all of the writes are visible to the read site, and I'm not really sure that the exit-the-constructor edge is going to match up with the synchronized block which protects a different non-volatile field. And even if my feeling that this is dodgy is valid, I'm not sure whether this NPE is a possible outcome of that problem!

Thoughts?

[1] https://github.com/openjdk/jdk11u-dev/blob/3789983e89c9de252ef546a1b98a732a7...

--
- DML • he/him
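For readers without ClassValue.java at hand, the shape of the code in question is roughly the following. This is a sketch only: the class and field names ("Holder", "CacheMap", "getMap") are simplifications, not the actual JDK source.

```java
// Sketch of the double-checked locking shape under discussion.
// "Holder" stands in for Class, "CacheMap" for ClassValueMap; the real
// JDK code differs in detail.
class Holder {
    static class CacheMap {
        Object[] cacheArray;                 // plain field: not final, not volatile
        CacheMap() { cacheArray = new Object[8]; }
    }

    private CacheMap classValueMap;          // also a plain field

    CacheMap getMap() {
        CacheMap map = classValueMap;        // first check: unsynchronized racy read
        if (map != null) return map;
        synchronized (this) {
            map = classValueMap;             // second check: under the lock
            if (map == null) {
                map = new CacheMap();
                classValueMap = map;         // publish; nothing orders the
            }                                // cacheArray store before this store
            return map;
        }
    }
}
```

A racing reader that takes the fast path may, in principle, see the published map but a stale null cacheArray, which is exactly the NPE shape above.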
Hi David,

This is subtle. ClassValue extends WeakHashMap, which has a few final fields. In such cases, for HotSpot at least, the compiler will place a fence between the stores to the fields of ClassValue and the store that publishes it in the field of Class. So it should not be possible to observe a partially constructed ClassValue where the field ClassValueMap.cacheArray is null (which seems to be the source of the NPE).

However, perhaps the Graal AoT compiler does not behave the same?

Paul.
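The distinction Paul is relying on can be shown in miniature (a sketch; by the JLS, the final-field guarantee covers only the final field itself and values reachable through it):

```java
// Even if an instance of this class is published via a data race, any
// reader that sees the instance is guaranteed f == 1 by final-field
// semantics (JLS 17.5). The plain field gets no such guarantee from the
// JMM itself, though HotSpot's end-of-constructor barrier happens to
// order both stores before publication.
class FinalCarrier {
    final int f;
    int plain;                 // not covered by the final-field guarantee
    FinalCarrier() {
        f = 1;
        plain = 2;
    }
}
```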
On 8/7/20 4:39 PM, David Lloyd wrote:
However, I'm wondering if this isn't a side effect of what appears to be an incorrect double-checked lock at lines 374-378 of ClassValue.java [1]. In order for the write to the non-volatile `cache` field of ClassValueMap to be visible, it is my understanding that there must be some acquire/release edge from where the variable is published to guarantee that all of the writes are visible to the read site, and I'm not really sure that the exit-the-constructor edge is going to match up with the synchronized block which protects a different non-volatile field. And even if my feeling that this is dodgy is valid, I'm not sure whether this NPE is a possible outcome of that problem!
It certainly doesn't look right to me. AFAICS this is classic broken double-checked locking. It'd need some sort of fence after constructing the ClassValueMap instance and before publishing it. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. <https://www.redhat.com> https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
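One way to spell such a fence explicitly is with the static fence methods on VarHandle (available since Java 9). This is a sketch of the repaired idiom, not the fix actually applied to the JDK; names are simplified stand-ins for the ClassValue code.

```java
import java.lang.invoke.VarHandle;

// Double-checked locking repaired with explicit fences: the release fence
// orders the constructor's stores before the publishing store, and the
// acquire fence (a no-op on x86) pairs with it on the reader side.
class FencedHolder {
    static class CacheMap {
        Object[] cacheArray;
        CacheMap() { cacheArray = new Object[8]; }
    }

    private CacheMap classValueMap;

    CacheMap getMap() {
        CacheMap map = classValueMap;
        if (map != null) {
            VarHandle.acquireFence();      // reader-side fence before using map
            return map;
        }
        synchronized (this) {
            map = classValueMap;
            if (map == null) {
                map = new CacheMap();
                VarHandle.releaseFence();  // order cacheArray store before publication
                classValueMap = map;
            }
            return map;
        }
    }
}
```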
On Aug 7, 2020, at 9:52 AM, Andrew Haley <aph@redhat.com> wrote:
On 8/7/20 4:39 PM, David Lloyd wrote:
However, I'm wondering if this isn't a side effect of what appears to be an incorrect double-checked lock at lines 374-378 of ClassValue.java [1]. In order for the write to the non-volatile `cache` field of ClassValueMap to be visible, it is my understanding that there must be some acquire/release edge from where the variable is published to guarantee that all of the writes are visible to the read site, and I'm not really sure that the exit-the-constructor edge is going to match up with the synchronized block which protects a different non-volatile field. And even if my feeling that this is dodgy is valid, I'm not sure whether this NPE is a possible outcome of that problem!
It certainly doesn't look right to me. AFAICS this is classic broken double-checked locking. It'd need some sort of fence after constructing the ClassValueMap instance and before publishing it.
Y’all are calling my baby ugly but he’s really cute if you look at him from the right angle.

(First, a quick question: This bug shows up on x86, right? If so, we probably are not looking at a hardware reordering problem but something stemming from Graal’s reordering of operations compared to C1 and C2, perhaps after a different set of inlining decisions uncharacteristic of C2.)

The bug indeed looks like a dropped fence at the end of the constructor for ClassValueMap. (I don’t see an easier way to get a null at the NPE point.) The fence might not actually be dropped, but a write which the fence is supposed to fence might have been reordered after the fence, and after an unlock operation, allowing someone to see the initial null after the unlock.

The double-check idiom breaks when the outer check (not synchronized) gets half-baked data and acts on it. This much-maligned idiom would not be broken in the present case under the assumption that a data dependency on type.classValueMap (set in initializeMap) will see a fully initialized value of ClassValueMap.cacheArray.

Now we get into the fine print, and I agree there is a bug here. The easy fix is to repair the logic under which that ugly everybody’s-hating-on-it double-check idiom would be in fact correct.

The fine print requires a couple of different things that are not quite fulfilled by the present code, and Graal has either reordered the memory accesses to expose the problem, or it (itself) has a complementary bug. (Or it has an unrelated bug, and you people are *staring* at my baby!)

First, as Paul noted, you need a final variable in your class to get the benefit of a fence at the end of the constructor. Oops #1: ClassValueMap doesn’t have *any* finals. That’s awkward. Two fixes for that: (a) add a dummy final, and (b) add a real fence in the constructor. For better luck, maybe put the fence at the end of sizeCache.

On machines (not x86) which need fences *before* final reads, another explicit fence should be placed before the unsynchronized read (or else inside the getCache method).

By the letter of the JMM, the helpful effects of the fence at the end of the constructor are only guaranteed for references which depend on final variables set by that constructor, because only those final variables get a “freeze” operation which stabilizes them (and values dependently loaded via them). Throwing in a dummy final is therefore not guaranteed to work. But throwing in a fence will work. One downside of the fence is that the JMM is silent about fences, but the JMM is widely recognized as a partial story, not the full or final story.

(Here’s a tidbit of JMM politics: I once heard Doug Lea considering whether maybe all fields should be treated more like final fields. I don’t know if this is still a live idea, but it would make this bug go away, since nearly all constructors would then get fences of some sort.)

Here’s a bit of explanatory (non-normative) text from the draft POPL paper for the JMM, where the parenthetic comment indicates (I think) the sort of thing that is happening here:

"Let us now assume that the acquire and release do happen. As long as these actions take place after object has been constructed (and there is no code motion around the end of the constructor), the diffs that the second processor acquires are guaranteed to reflect the correctly constructed object."

Basically, the synchronization statement is expected to do a release *after* the non-null value is stored. It looks like this is failing due to a lost and/or reordered fence.

Second, David says:

"I’m not really sure that the exit-the-constructor edge is going to match up with the synchronized block which protects a different non-volatile field."

By the letter of the JMM (AFAIUI), this code is coloring way outside the lines. (Yes, I admit it.) The problem is pretty fundamental. I went and swapped in some JMM spec just now (don’t have it in short-term memory; go figure), and here’s what I re-learned.

In JMM terms, a “release” is the earlier end of any edge in the relation called “synchronizes-with”. An “acquire” is the latter end of such an edge. In the JMM this relation applies only to lock, unlock, and volatile reads and writes. Your guess is as good as mine whether other operations (CAS, fences, etc.) participate; the JMM is silent. Crucially, acquires and releases do not touch plain fields. This is counter-intuitive for those of us who like to reason with fences.

The JMM prevents regular writes from floating past what we like to think of as “release points” (unlocks) by appealing to processor order inside lock/unlock pairs, and also by appealing to global ordering of multiple lock/unlock pairs (or stuff with volatiles and other corner cases which we will neglect here). In order to work right, a normal write has to come before a release *and* a matching normal read has to come after an acquire, and such an acquire is nothing other than the lock of a later lock/unlock pair, typically in another thread.

So the rules which reliably connect normal writes to normal reads in other threads work differently from acquire fences and release fences as we (or just I?) usually think of them. Yes, there are “acquires” and “releases” in the JMM. No, they are not primitives but rather descriptions of the way the happens-before relation is constrained by (among other things) lock and unlock actions.

On x86, you don’t have to worry about acquires; you just set up the right release fences. In the JMM, there’s no way to get either an acquire or a release without a synchronized statement (or other fancy stuff like volatiles). That’s why double-check is broken in the pure JMM, and why it can be repaired if you add some (non-JMM) fences.

So why is this showing up suddenly with Graal? Possibly it’s got a really aggressive interpretation of the JMM, relative to the standard HotSpot JITs. Possibly it’s got a bug. Graal might be allowing the initialization of ClassValueMap.cacheArray to float down past publication of the ClassValueMap on type.classValueMap (in initializeMap). Less likely, it is allowing the initialization to totally escape the lock/unlock pair, something the JMM might allow but the Cookbook advises against (in its “Can Reorder” table).

Perhaps Graal is modeling the “freeze” operations on finals more accurately (and the Cookbook gives instructions for this). That would allow the ClassValueMap to correctly initialize its frozen finals, but not (unfortunately) the not-frozen cacheArray field. It appears that the new JIT on the block is exposing my neglect for my poor baby, in not putting the right fences around its playpen.

Bottom line: Add a release fence before type.cVM is set, and add a (no-op on x86) acquire fence in getCache. Quick test: Add a dummy final to CVM, to see if that makes the bug let go.

— John
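John's quick test can be sketched like this (the field name DUMMY_FINAL is invented; by the letter of the JMM the guarantee covers only the final itself, but in practice HotSpot's end-of-constructor barrier orders all of the constructor's stores):

```java
// Quick test: add a dummy final so HotSpot emits its end-of-constructor
// barrier before the instance can be published. DUMMY_FINAL is never
// read; it exists only to trigger the barrier.
class CacheMapWithDummy {
    @SuppressWarnings("unused")
    private final boolean DUMMY_FINAL = true;

    Object[] cacheArray;

    CacheMapWithDummy() { cacheArray = new Object[8]; }
}
```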
Here’s some pertinent information from Doug’s excellent document of memory order modes [1]:

"As a delicate (but commonly exploited) special case of the above considerations, acquire-style reads of immutable fields of newly constructed objects do not require an explicit fence or moded access -- Plain mode reads suffice: If the consumer has not seen the new object before, it cannot have stale values that it must otherwise ignore or discard, and it cannot perform the read until it knows the address. On subsequent encounters, reusing old values is OK, because they cannot have changed. This reasoning rests on the only known defensible exception to the rule of never making assumptions about local precedence order: The reference (address) of a new object is assumed never to be known and impossible to speculate until creation. This assumption is also relied on by other Java security requirements.

The resulting techniques are used in Java final field implementations, and are the reason why specified guarantees for final fields are conditional upon constructors not leaking references to objects before constructor return. Classes with final fields are normally implemented by issuing a form of Release fence upon constructor return. Further, because nothing need be guaranteed about interactions with reads by the constructor, a StoreStoreFence suffices. Similar techniques may apply in other contexts, but can be unacceptably fragile. For example, code that works when the associated objects are always newly constructed may, without further safeguards, fail upon later changes to instead recycle the objects from pools."

I thought that by virtue of ClassValueMap extending WeakHashMap, which contains final fields, it would all work out. But I was uncertain, so I ran a jcstress test modifying the sample JMMSample_06_Finals [2].

See gist here: https://gist.github.com/PaulSandoz/3ae6d3f3e4027653d185f237eedfab72

I ran in tough mode and observed this failure on my MacBook:

*** FAILED tests
Strong asserts were violated. Correct implementations should have no assert failures here.

1 matching test results.
[FAILED] o.o.j.t.singletons.JMMSample_06_Finals.FinalInitExtendedNoFinal
(JVM args: [-XX:-TieredCompilation, -XX:+UnlockDiagnosticVMOptions, -XX:+StressLCM, -XX:+StressGCM])

  Observed state    Occurrences   Expectation   Interpretation
              -1  2,420,284,404    ACCEPTABLE   Object is not seen yet.
               3            320     FORBIDDEN   Seeing partially constructed object.
               4             53     FORBIDDEN   Seeing partially constructed object.
               5          3,486     FORBIDDEN   Seeing partially constructed object.
               6          8,409     FORBIDDEN   Seeing partially constructed object.
               7         11,282     FORBIDDEN   Seeing partially constructed object.
               8    181,774,476    ACCEPTABLE   Seen the complete object.

So for HotSpot it seems to occur only for C2 under certain stress options, -XX:+StressLCM and -XX:+StressGCM, which might explain why the issue is not generally being observed. I don’t understand enough about these options to know why this is so.

However, this makes me nervous enough that it might be best to add a dummy final field to ClassValueMap, and to review other similar code.

David, perhaps you could add a dummy final field to ClassValueMap and see if that fixes the problem with Graal?

Paul.

[1] http://gee.cs.oswego.edu/dl/html/j9mm.html
[2] http://hg.openjdk.java.net/code-tools/jcstress/file/1dd2cca36fa9/jcstress-samples/src/main/java/org/openjdk/jcstress/samples/JMMSample_06_Finals.java
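The shape of Paul's experiment, reduced to plain threads, is sketched below. Without jcstress's compilation stress modes this will almost never observe the forbidden state; it only illustrates what the test is probing, and all names are illustrative.

```java
// A racy-publication probe in the spirit of JMMSample_06_Finals: one
// thread publishes an object with no final fields through a plain field,
// another reads it and checks for a partially constructed instance.
class PublicationProbe {
    static class NoFinals {              // mirrors ClassValueMap: no finals
        int x;
        NoFinals() { x = 42; }
    }

    static NoFinals shared;              // plain field: unsafe publication

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> shared = new NoFinals());
        Thread reader = new Thread(() -> {
            NoFinals n = shared;         // may be null, complete, or (in
            if (n != null && n.x != 42)  // principle) partially constructed
                System.out.println("forbidden: partially constructed object");
        });
        writer.start();
        reader.start();
        writer.join();
        reader.join();
        System.out.println("done");
    }
}
```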
For DCL to be correct, Class.classValueMap should be declared volatile. Final field guarantees are there to support unsafe publication in special cases. In this case we should be using safe publication.

Cheers,
David

On 8/08/2020 9:22 am, Paul Sandoz wrote:
Here’s some pertinent information from Doug’s excellent document of memory order modes [1]:
"As a delicate (but commonly exploited) special case of the above considerations, acquire-style reads of immutable fields of newly constructed objects do not require an explicit fence or moded access -- Plain mode reads suffice: If the consumer has not seen the new object before, it cannot have stale values that it must otherwise ignore or discard, and it cannot perform the read until it knows the address. On subsequent encounters, reusing old values is OK, because they cannot have changed. This reasoning rests on the only known defensible exception to the rule of never making assumptions about local precedence order: The reference (address) of a new object is assumed never to be known and impossible to speculate until creation. This assumption is also relied on by other Java security requirements.
The resulting techniques are used in Java final field implementations, and are the reason why specified guarantees for final fields are conditional upon constructors not leaking references to objects before constructor return. Classes with final fields are normally implemented by issuing a form of Release fence upon constructor return. Further, because nothing need be guaranteed about interactions with reads by the constructor, a StoreStoreFence suffices. Similar techniques may apply in other contexts, but can be unacceptably fragile. For example, code that works when the associated objects are always newly constructed may, without further safeguards, fail upon later changes to instead recycle the objects from pools."
I thought that by virtue of ClassValueMap extending WeakHashMap, which contains final fields, it would all work out. But, I was uncertain so I ran a jcstress test modifying the sample JMMSample_06_Finals [1].
See gist here:
https://gist.github.com/PaulSandoz/3ae6d3f3e4027653d185f237eedfab72 <https://gist.github.com/PaulSandoz/3ae6d3f3e4027653d185f237eedfab72>
I ran in tough mode and observed this failure on my MacBook:
*** FAILED tests Strong asserts were violated. Correct implementations should have no assert failures here.
1 matching test results. [FAILED] o.o.j.t.singletons.JMMSample_06_Finals.FinalInitExtendedNoFinal (JVM args: [-XX:-TieredCompilation, -XX:+UnlockDiagnosticVMOptions, -XX:+StressLCM, -XX:+StressGCM]) Observed state Occurrences Expectation Interpretation -1 2,420,284,404 ACCEPTABLE Object is not seen yet. 3 320 FORBIDDEN Seeing partially constructed object. 4 53 FORBIDDEN Seeing partially constructed object. 5 3,486 FORBIDDEN Seeing partially constructed object. 6 8,409 FORBIDDEN Seeing partially constructed object. 7 11,282 FORBIDDEN Seeing partially constructed object. 8 181,774,476 ACCEPTABLE Seen the complete object.
So for HotSpot it seems to occur only for C2 under certain stress options, -XX:+StressLCM, -XX:+StressGCM, which might explain why an issue is not generally being observed. I don’t understand enough about these options to know why this is so.
However, this makes me nervous enough that it might be best to add a dummy final field to ClassMapValue, and to review other similar code.
David, perhaps you could add a dummy final field to ClassValueMap and see if that fixes the problem with Graal?
Paul.
[1] http://gee.cs.oswego.edu/dl/html/j9mm.html <http://gee.cs.oswego.edu/dl/html/j9mm.html>
[2] http://hg.openjdk.java.net/code-tools/jcstress/file/1dd2cca36fa9/jcstress-sa... <http://hg.openjdk.java.net/code-tools/jcstress/file/1dd2cca36fa9/jcstress-samples/src/main/java/org/openjdk/jcstress/samples/JMMSample_06_Finals.java>
On Aug 7, 2020, at 2:35 PM, John Rose <john.r.rose@oracle.com> wrote:
On Aug 7, 2020, at 9:52 AM, Andrew Haley <aph@redhat.com> wrote:
On 8/7/20 4:39 PM, David Lloyd wrote:
However, I'm wondering if this isn't a side effect of what appears to be an incorrect double-checked lock at lines 374-378 of ClassValue.java [1]. In order for the write to the non-volatile `cache` field of ClassValueMap, it is my understanding that there must be some acquire/release edge from where the variable is published to guarantee that all of the writes are visible to the read site, and I'm not really sure that the exit-the-constructor edge is going to match up with with the synchronized block which protects a different non-volatile field. And even if my feeling that this is dodgy is valid, I'm not sure whether this NPE is a possible outcome of that problem!
It certainly doesn't look right to me. AFAICS this is classic broken double-checked locking. It'd need some sort of fence after constructing the ClassValueMap instance and before publishing it.
Y’all are calling my baby ugly but he’s really cute if you look at him from the right angle.
(First, a quick question: This bug shows up on x86, right? If so, we probably are not looking at a hardware reordering problem but something stemming from Graal’s reordering of operations compared to C1 and C2, perhaps after a different set of inlining decisions uncharacteristic of C2.)
The bug indeed looks like a dropped fence at the end of the constructor for ClassValueMap. (I don’t see an easier way to get a null at the NPE point.) The fence might not actually be dropped, but a write which the fence is supposed to fence might have been reordered after the fence, and after an unlock operation, allowing someone to see the initial null after the unlock.
The double-check idiom breaks when the outer check (not synchronized) gets half-baked data and acts on it. This much-maligned idiom would not be broken in the present case under the assumption that a data dependency on type.classValueMap (set in initializeMap) will see a fully initialized value of ClassValueMap.cacheArray.
Now we get into the fine print, and I agree there is a bug here. The easy fix is to repair the logic under which that ugly everybody’s-hating-on-it double check idiom would be in fact correct.
The fine print requires a couple different things that are not quite fulfilled by the present code, and Graal has either reordered the memory accesses to expose the problem, or it (itself) has a complementary bug. (Or it has an unrelated bug, and you people are *staring* at my baby!)
First, as Paul noted, you need a final variable in your class to get the benefit of a fence at the end of the constructor. Oops #1: ClassValueMap doesn’t have *any* finals. That’s awkward. Two fixes for that: (a) add a dummy final, and (b) add a real fence in the constructor. For better luck, maybe put the fence at the end of sizeCache.
On machines (not x86) which need fences *before* final reads, another explicit fence should be placed before the unsynchronized read (or else inside the getCache method).
By the letter of the JMM, the helpful effects of the fence at the end of the constructor are only guaranteed for references which depend on final variables set by that constructor, because only those final variables get a “freeze” operation which stabilizes them (and values dependently loaded via them). Throwing in a dummy final is therefore not guaranteed to work. But throwing in a fence will work. One downside of the fence is that the JMM is silent about fences, but the JMM is widely recognized as a partial story, not the full or final story.
(Here’s a tidbit of JMM politics: I once heard Doug Lea considering whether maybe all fields should be treated more like final fields. I don’t know if this is still a live idea, but it would make this bug go away, since nearly all constructors would then get fences of some sort.)
Here’s a bit of explanatory (non-normative) text from the draft POPL paper for JMM, where the parenthetic comment indicates (I think) the sort of thing that is happening here:
Let us now assume that the acquire and release do happen. As long as these actions take place after object has been constructed (and there is no code motion around the end of the constructor), the diffs that the second processor acquires are guaranteed to reflect the correctly constructed object.
Basically, the synchronization statement is expected to do a release *after* the non-null value is stored. It looks like this is failing due to a lost and/or reordered fence.
Second, David says:
I’m not really sure that the exit-the-constructor edge is going to match up with the synchronized block which protects a different non-volatile field.
By the letter of the JMM (AFAIUI), this code is coloring way outside the lines. (Yes, I admit it.) The problem is pretty fundamental. I went and swapped in some JMM spec just now (don’t have it in short term memory; go figure), and here’s what I re-learned.
In JMM terms, a “release” is the earlier end of any edge in the relation called “synchronizes-with”. An “acquire” is the latter end of such an edge. In the JMM this relation applies only to lock, unlock, and volatile reads and writes. Your guess is as good as mine whether other operations (CAS, fences, etc.) participate; the JMM is silent. Crucially, acquires and releases do not touch plain fields. This is counter-intuitive for those of us who like to reason with fences.
The JMM prevents regular writes from floating past what we like to think of as “release points” (unlocks) by appealing to processor order inside lock/unlock pairs, and also by appealing to global ordering of multiple lock/unlock pairs (or stuff with volatiles and other corner cases which we will neglect here). In order to work right, a normal write has to come before a release *and* a matching normal read has to come after an acquire, and such an acquire is nothing other than the lock of a later lock/unlock pair, typically in another thread.
So the rules which reliably connect normal writes to normal reads in other threads work differently from acquire fences and release fences as we (or just I?) usually think of them. Yes, there are “acquires” and “releases” in the JMM. No, they are not primitives but rather descriptions of the way the happens-before relation is constrained by (among other things) lock and unlock actions.
On x86, you don’t have to worry about acquires; you just set up the right release fences. In the JMM, there’s no way to get either an acquire or a release without a synchronized statement (or other fancy stuff like volatiles). That’s why double-check is broken in the pure JMM, and why it can be repaired if you add some (non-JMM) fences.
So why is this showing up suddenly with Graal? Possibly it’s got a really aggressive interpretation of the JMM, relative to the standard HotSpot JITs. Possibly it’s got a bug. Graal might be allowing the initialization of ClassValueMap.cacheArray to float down past publication of the ClassValueMap on type.classValueMap (in initializeMap). Less likely, it is allowing the initialization to totally escape the lock/unlock pair, something the JMM might allow but the Cookbook advises against (in its “Can Reorder” table).
Perhaps Graal is modeling the “freeze” operations on finals more accurately (and the Cookbook gives instructions for this). That would allow the ClassValueMap to correctly initialize its frozen finals, but not (unfortunately) the not-frozen cacheArray field. It appears that the new JIT on the block is exposing my neglect for my poor baby, in not putting the right fences around its playpen.
Bottom line: Add a release fence before type.cVM is set, and add a (no-op on x86) acquire fence in getCache. Quick test: Add a dummy final to CVM, to see if that makes the bug let go.
— John
Catching up...

As implied in other posts, the minimal fix is to add a trailing release fence (using Unsafe?) to the constructor. Or less delicately, to access only using acquire/release (which will cost a bit on ARM/Power, but probably not noticeable on x86), or most simply (but expensively) to declare the field volatile.

Also, as Hans noted, the consensus seems to be that there is not enough to be gained by always adding a release fence to constructors. A few errors like this might never occur, but other related anomalies with non-final field accesses would remain.

-Doug
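Doug's acquire/release alternative can be sketched with a `VarHandle` over the plain field; the class and field names here are illustrative, not the actual ClassValue code.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Sketch of "access only using acquire/release": the field stays plain,
// but every publication uses setRelease and every racy read uses
// getAcquire. Names are illustrative.
public class ReleaseAcquireHolder {
    private Object[] cacheArray;

    private static final VarHandle CACHE;
    static {
        try {
            CACHE = MethodHandles.lookup().findVarHandle(
                    ReleaseAcquireHolder.class, "cacheArray", Object[].class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public ReleaseAcquireHolder() {
        // release write: all earlier stores are ordered before this one
        CACHE.setRelease(this, new Object[8]);
    }

    public Object[] cache() {
        // acquire read: pairs with the release write above
        return (Object[]) CACHE.getAcquire(this);
    }
}
```

Declaring the field `volatile` is the simpler (sequentially consistent, hence costlier) version of the same idea.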
On Mon, Aug 10, 2020 at 2:19 PM Doug Lea <dl@cs.oswego.edu> wrote:
Catching up...
As implied in other posts, the minimal fix is to add a trailing release fence (using Unsafe?) to the constructor.
FYI I have sent an RFR with the proposed fix ^ https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-August/068244.htm...
Or less delicately, to access only using acquire/release (which will cost a bit on ARM/Power, but probably not noticeable on x86), or most simply (but expensively) to declare the field volatile.
Also, as Hans noted, the consensus seems to be that there is not enough to be gained by always adding a release fence to constructors. A few errors like this might never occur, but other related anomalies with non-final field accesses would remain.
-Doug
If I understand the code correctly, adding the fence should fix this for current implementations. But it is not correct by current language rules, and may conceivably break in the future unless the implementation enforces stronger rules.

On Wed, Aug 19, 2020 at 6:13 AM Galder Zamarreno <galder@redhat.com> wrote:
On Mon, Aug 10, 2020 at 2:19 PM Doug Lea <dl@cs.oswego.edu> wrote:
Catching up...
As implied in other posts, the minimal fix is to add a trailing release fence (using Unsafe?) to the constructor.
FYI I have sent an RFR with the proposed fix ^
https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-August/068244.htm...
Or less delicately, to access only using acquire/release (which will cost a bit on ARM/Power, but probably not noticeable on x86), or most simply (but expensively) to declare the field volatile.
Also, as Hans noted, the consensus seems to be that there is not enough to be gained by always adding a release fence to constructors. A few errors like this might never occur, but other related anomalies with non-final field accesses would remain.
-Doug
Yes. Core library code close to the JVM in OpenJDK (that in the java.base module, commonly code in java.lang.* that sits close to HotSpot) often assumes implementation specifics of the JVM. The two are closely coupled. There are other such implementation-specific contracts, like @Stable, "final fields as really final", certain intrinsics, hidden fields, and much of direct Unsafe usage. Paul.
On Aug 19, 2020, at 9:55 AM, Hans Boehm <hboehm@google.com> wrote:
If I understand the code correctly, adding the fence should fix this for current implementations. But it is not correct by current language rules, and may conceivably break in the future unless the implementation enforces stronger rules.
On Wed, Aug 19, 2020 at 6:13 AM Galder Zamarreno <galder@redhat.com> wrote:
On Mon, Aug 10, 2020 at 2:19 PM Doug Lea <dl@cs.oswego.edu> wrote:
Catching up...
As implied in other posts, the minimal fix is to add a trailing release fence (using Unsafe?) to the constructor.
FYI I have sent an RFR with the proposed fix ^
https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-August/068244.htm...
Or less delicately, to access only using acquire/release (which will cost a bit on ARM/Power, but probably not noticeable on x86), or most simply (but expensively) to declare the field volatile.
Also, as Hans noted, the consensus seems to be that there is not enough to be gained by always adding a release fence to constructors. A few errors like this might never occur, but other related anomalies with non-final field accesses would remain.
-Doug
On Aug 7, 2020, at 2:35 PM, John Rose <john.r.rose@oracle.com> wrote:
(Here’s a tidbit of JMM politics: I once heard Doug Lea considering whether maybe all fields should be treated more like final fields. I don’t know if this is still a live idea, but it would make this bug go away, since nearly all constructors would then get fences of some sort.)
Paul helpfully pointed me at Aleksey’s excellent description and investigation of this proposal, which I had forgotten about:

https://shipilev.net/blog/2014/all-fields-are-final/
https://bugs.openjdk.java.net/browse/JDK-8031818

The JVM flag -XX:+AlwaysSafeConstructors might be useful to enable this proposed feature and see if it suppresses the bug. (Don’t forget to unlock.) Looks like the conversation is stalled, pending further insights and problem fixes. I had an idea for fixing one of the problems with chained constructors, commented on JDK-8032218.

— John
On 8/7/20 10:35 PM, John Rose wrote:
(Here’s a tidbit of JMM politics: I once heard Doug Lea considering whether maybe all fields should be treated more like final fields. I don’t know if this is still a live idea, but it would make this bug go away, since nearly all constructors would then get fences of some sort.)
I'd support that: accidental unsafe publication like this is a really counter-intuitive language feature.
Bottom line: Add a release fence before type.cVM is set, and add a (no-op on x86) acquire fence in getCache.
A StoreStore fence would be enough here, and cheaper on some systems. The memory dependency ordering from the load of classValueMap to the load of classValueMap.cache means we don't need an acquire fence. After all, final fields provide the needed guarantee with only StoreStore fences and no load fences.

-- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. <https://www.redhat.com> https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
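Andrew's cheaper variant can be sketched with the `VarHandle` fence methods (names illustrative, not the actual ClassValue source): only a StoreStore fence at the end of the constructor, with the reader relying on the address dependency from the published reference to the field loaded through it.

```java
import java.lang.invoke.VarHandle;

// Sketch of the StoreStore-only repair: order the cacheArray store
// before the publishing store, and rely on the reader's address
// dependency (map -> map.cacheArray) instead of an acquire fence.
// Names are illustrative, not the actual ClassValue source.
public class StoreStoreHolder {
    static class Map {
        Object[] cacheArray;
        Map() {
            cacheArray = new Object[8];
            VarHandle.storeStoreFence(); // cheaper than releaseFence on some CPUs
        }
    }

    private Map map;

    synchronized Map getOrCreate() {
        if (map == null) {
            map = new Map();
        }
        return map;
    }

    Map racyRead() {
        return map; // dependent loads through the result assumed ordered
    }
}
```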
There is no guarantee that the address dependency enforces ordering if there is no final field involved. This may matter in the future, since ARM's Scalable Vector Extension does not guarantee ordering for address-dependent loads, if both loads are vector loads. I expect there are cases in which it would be profitable and safe, by the actual JMM rules, to use these.

What bothers me most about effectively generalizing the JMM rules to reflect current implementations is that

    x = 1; // Initialization of non-final field in constructor
    assert x == 1;

can fail, as can more interesting code that uses x after initialization in the constructor. The current rules are messy, but generally kind of make sense for final fields. The generalization to non-final fields is significantly stranger.

Hans

On Sun, Aug 9, 2020 at 7:53 AM Andrew Haley <aph@redhat.com> wrote:
On 8/7/20 10:35 PM, John Rose wrote:
(Here’s a tidbit of JMM politics: I once heard Doug Lea considering whether maybe all fields should be treated more like final fields. I don’t know if this is still a live idea, but it would make this bug go away, since nearly all constructors would then get fences of some sort.)
I'd support that: accidental unsafe publication like this is a really counter-intuitive language feature.
Bottom line: Add a release fence before type.cVM is set, and add a (no-op on x86) acquire fence in getCache.
A StoreStore fence would be enough here, and cheaper on some systems. The memory dependency ordering from the load of classValueMap to the load of classValueMap.cache means we don't need an acquire fence.
After all, final fields provide the needed guarantee with only StoreStore fences and no load fences.
-- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. <https://www.redhat.com> https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
On 09/08/2020 18:55, Hans Boehm wrote:
There is no guarantee that the address dependency enforces ordering if there is no final field involved. This may matter in the future, since ARM's Scalable Vector Extension does not guarantee ordering for address-dependent loads, if both loads are vector loads.
Ouch. Thanks, I didn't know that.
I expect there are cases in which it would be profitable and safe, by the actual JMM rules, to use these.
But... we have to implement final fields somehow. And if dependency ordering isn't sufficient then we're going to have to use fences before SVE loads. Given that data in SVE registers may be security sensitive, such as crypto keys, we'd have to do that. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. <https://www.redhat.com> https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
On 11/08/2020 11:47, Andrew Haley wrote:
On 09/08/2020 18:55, Hans Boehm wrote:
There is no guarantee that the address dependency enforces ordering if there is no final field involved. This may matter in the future, since ARM's Scalable Vector Extension does not guarantee ordering for address-dependent loads, if both loads are vector loads.
Ouch. Thanks, I didn't know that.
You ought to look at the pdf Ningsheng linked in the RFR that was posted with the SVE patch. The pdf is available here: https://developer.arm.com/docs/ddi0584/latest The relevant text is in section 4.4. Memory Ordering. regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill
On 11/08/2020 12:03, Andrew Dinn wrote:
You ought to look at the pdf Ningsheng linked in the RFR that was posted with the SVE patch. The pdf is available here:
https://developer.arm.com/docs/ddi0584/latest
The relevant text is in section 4.4. Memory Ordering.
That looks better than I feared. The only relevant text here AFAIUI is:

"If an address dependency exists between two memory reads, and an SVE non-temporal vector load instruction generated the second read, then in the absence of any other barrier mechanism to achieve order, the memory accesses can be observed in any order by the other observers within the shareability domain of the memory addresses being accessed."

... but this is only about non-temporal vector load instructions. It does mean that if we want to use those we'll have to separate them from loads of base addresses with fences. So it'll be:

    Load base register
    load fence
    ... vector loop ...

-- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. <https://www.redhat.com> https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
I think the relevant statement is:

"An address dependency between two reads generated by SVE vector load instructions does not contribute to the Dependency-ordered-before relation."

This is only an issue if BOTH loads are SVE loads. In particular, reference loads have to be vectorized for this to matter, which I expect is not the common situation. I have not explored this in great detail, but it should suffice to put fences between two dependent vector reference loads, and between a reference load and a dependent final field load.

On Tue, Aug 11, 2020 at 6:17 AM Andrew Haley <aph@redhat.com> wrote:
On 11/08/2020 12:03, Andrew Dinn wrote:
You ought to look at the pdf Ningsheng linked in the RFR that was posted with the SVE patch. The pdf is available here:
https://developer.arm.com/docs/ddi0584/latest
The relevant text is in section 4.4. Memory Ordering.
That looks better than I feared. The only relevant text here AFAIUI is
"If an address dependency exists between two memory reads, and an SVE non-temporal vector load instruction generated the second read, then in the absence of any other barrier mechanism to achieve order, the memory accesses can be observed in any order by the other observers within the shareability domain of the memory addresses being accessed."
... but this is only about non-temporal vector load instructions. It does mean that if we want to use those we'll have to separate them from loads of base addresses with fences.
So it'll be
    Load base register
    load fence
    ... vector loop ...
-- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. <https://www.redhat.com> https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
On 11/08/2020 18:06, Hans Boehm wrote:
I think the relevant statement is:
"An address dependency between two reads generated by SVE vector load instructions does not contribute to the Dependency-ordered-before relation."
This is only an issue if BOTH loads are SVE loads. In particular, reference loads have to be vectorized for this to matter, which I expect is not the common situation. I have not explored this in great detail, but it should suffice to put fences between two dependent vector reference loads, and between a reference load and a dependent final field load.

Hmm, so this might perhaps be an issue with something like a deep copy of an int[][], where loading of successive int[] references might be vectorized using SVE instructions and processing of the contents of each referenced int[] might also be similarly vectorized. In that case the loading of the int contents would need to be ordered wrt the load of the int[] references using a LoadLoad barrier?
regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill
Hi,

There might be a way to fix this NPE without adding additional memory fences. The ClassValueMap.cacheArray field is not final because it can change during the lifetime of ClassValueMap: it holds an array of entries which can get resized (replaced with a bigger array), which is performed while holding a lock on the ClassValueMap instance. So I thought: why couldn't the field be null initially and its value be initialized in a similar way to how it is replaced with a bigger array now?

If I got this correct, it would take the following changes: https://github.com/openjdk/jdk/pull/7

The fast path now trades an explicit null check for a null check that throws NPE when dereferencing the cache array, so it should probably have no effect on performance (benchmarks pending).

So, what do you think?

Regards, Peter

On 8/12/20 11:08 AM, Andrew Dinn wrote:
On 11/08/2020 18:06, Hans Boehm wrote:
I think the relevant statement is:
"An address dependency between two reads generated by SVE vector load instructions does not contribute to the Dependency-ordered-before relation."
This is only an issue if BOTH loads are SVE loads. In particular, reference loads have to be vectorized for this to matter, which I expect is not the common situation. I have not explored this in great detail, but it should suffice to put fences between two dependent vector reference loads, and between a reference load and a dependent final field load.

Hmm, so this might perhaps be an issue with something like a deep copy of an int[][], where loading of successive int[] references might be vectorized using SVE instructions and processing of the contents of each referenced int[] might also be similarly vectorized. In that case the loading of the int contents would need to be ordered wrt the load of the int[] references using a LoadLoad barrier?
regards,
Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill
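Peter's scheme above (start with a null cacheArray and treat a null racy read as a miss) can be sketched as follows; the names and the probing scheme are illustrative, not the actual patch.

```java
// Sketch of the null-initially approach: the racy fast path tolerates
// null, and the array is created lazily under the same lock that
// already guards resizing. Illustrative names, not the actual patch.
public class LazyCacheHolder {
    private Object[] cacheArray; // intentionally null until first store

    // Racy fast path: a null array just means "not cached yet".
    public Object probe(int slot) {
        Object[] cache = cacheArray;
        if (cache == null) {
            return null;
        }
        return cache[slot & (cache.length - 1)];
    }

    // Slow path: all writes to cacheArray happen under the lock.
    public synchronized void store(int slot, Object value) {
        if (cacheArray == null) {
            cacheArray = new Object[8]; // first initialization, under lock
        }
        cacheArray[slot & (cacheArray.length - 1)] = value;
    }
}
```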
On 8/15/20 10:37 AM, Peter Levart wrote:
Oops, I forgot to remove the redundant initialization of the cacheArray field in the constructor, which revealed another place in the code where the check for a null value has to be made. Here's a modified patch:

https://github.com/openjdk/jdk/pull/9

Sorry for abusing the GitHub pull request mechanism but I don't have the bandwidth currently to clone the mercurial repository ;-)

Regards, Peter
On 15/08/2020 10:13, Peter Levart wrote:
https://github.com/openjdk/jdk/pull/9
Sorry for abusing GitHub pull request mechanism but I don't have bandwidth currently to clone the mercurial repository ;-)
That's a lot of work to avoid a simple fence. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. <https://www.redhat.com> https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
On 8/16/20 7:35 PM, Andrew Haley wrote:
On 15/08/2020 10:13, Peter Levart wrote:
https://github.com/openjdk/jdk/pull/9
Sorry for abusing GitHub pull request mechanism but I don't have bandwidth currently to clone the mercurial repository ;-) That's a lot of work to avoid a simple fence.
Two fences, mind you (the read fence is a no-op only on Intel). So take half of that work for each fence ;-) Still a lot?

No, really it was not much work to make the patch. The real work is yet to come - checking that it is correct. I leaned on the assumption that it is already correct when the cacheArray has to be swapped with a bigger array. All accesses to the cacheArray field but the fast-path read-only probing are done under lock, so it is not that hard to reason about most of the code changes. The only changes to code that accesses the cacheArray field without holding a lock are two additional null checks, in loadFromCache and probeBackupLocations, which basically just make the methods return null (not found) when the cache parameter (read from the cacheArray field) is found to be null. Now that the set of valid values for the cache parameter contains null, there's no need for EMPTY_CACHE.

I think that ClassValue was designed to cope with only one pair of fences (the ones that are used for initializing/accessing the final Entry.value field) and that all other accesses are performed with data races. So adding fences to accesses of the cacheArray field would mean giving up on the design (John?).

Regards, Peter
On 17/08/2020 15:24, Peter Levart wrote:
On 8/16/20 7:35 PM, Andrew Haley wrote:
On 15/08/2020 10:13, Peter Levart wrote:
https://github.com/openjdk/jdk/pull/9
Sorry for abusing GitHub pull request mechanism but I don't have bandwidth currently to clone the mercurial repository ;-) That's a lot of work to avoid a simple fence.
Two fences, mind you (the read fence is no-op only on Intel). So take half of that work for each fence ;-) Still a lot?
OK, but in HotSpot (and GraalVM AFAICR, but I'd need to check) finals only require a StoreStore fence and the address dependency does the rest, so if we can get away with using a final field then we can also get away with just using a StoreStore fence at the end of the constructor. I understand that this (as Hans pointed out) isn't strict Java but it depends on how much we care about strict Java inside the core libraries.
No, really it was not much work to make the patch. The real work is yet to come - checking that it is correct.
Indeed. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. <https://www.redhat.com> https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
On Wed, Aug 12, 2020 at 2:09 AM Andrew Dinn <adinn@redhat.com> wrote:
On 11/08/2020 18:06, Hans Boehm wrote:
I think the relevant statement is:
"An address dependency between two reads generated by SVE vector load instructions does not contribute to the Dependency-ordered-before
relation."
This is only an issue if BOTH loads are SVE loads. In particular reference loads have to be vectorized for this to matter, which I expect is not the common situation. I have not explored this in great detail, but it should suffice to put fences between two dependent vector reference loads, and between a reference load and a dependent final field load.
Hmm, so this might perhaps be an issue with something like a deep copy of an int[][], where loading of successive int[] references might be vectorized using SVE instructions and processing of the contents of each referenced int[] might also be similarly vectorized. In that case the loading of the int contents would need to be ordered wrt the load of the int[] references using a LoadLoad barrier?
I think it's potentially more common, though it clearly depends on the details of current and future implementations.

Let's say I define a MutableInteger class in the expected way, and then want to, say, copy every element of a MutableInteger[] to an int[]. At least if we naively ignore details like null checks (which I may be able to statically preclude, and may not affect this anyway), I think this can be fairly easily vectorized in SVE. If we assume that MutableInteger somehow has a StoreStore barrier in its constructor, initializing the MutableInteger and then storing a reference to it in the array, while my copy function is running, can still result in the copy seeing an uninitialized value.

If my example used Integer instead of MutableInteger, the compiler would have to either insert a fence, or not vectorize, to prevent this. With MutableInteger, there is no reason for it to do so.

Hans
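Hans's scenario can be sketched like this; the MutableInteger class and the copy loop are illustrative.

```java
// Sketch of Hans's example: a MutableInteger with a non-final field,
// published into an array while another thread copies field values out.
// Even with a StoreStore fence in the constructor, a vectorized copy
// whose two dependent loads (reference, then field) are both SVE loads
// could legally observe an uninitialized value. Illustrative code only.
class MutableInteger {
    int value; // non-final, unlike Integer.value
    MutableInteger(int v) { value = v; }
}

class Copier {
    // Writer side: construct, then publish into the shared array.
    static void publish(MutableInteger[] shared, int i, int v) {
        shared[i] = new MutableInteger(v);
    }

    // Reader side: the loop a compiler might vectorize; the reference
    // load and the field load are the dependent pair at issue.
    static void copy(MutableInteger[] shared, int[] out) {
        for (int i = 0; i < shared.length; i++) {
            MutableInteger m = shared[i];
            out[i] = (m == null) ? 0 : m.value;
        }
    }
}
```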
Hi, I'm David's colleague. I'm the one who's spotted this issue twice while GraalVM is doing its points-to analysis with jdk11u-dev. The fuller exception is here:

Caused by: java.lang.NullPointerException
  at java.base/java.lang.ClassValue$ClassValueMap.loadFromCache(ClassValue.java:535)
  at java.base/java.lang.ClassValue$ClassValueMap.probeHomeLocation(ClassValue.java:541)
  at java.base/java.lang.ClassValue.get(ClassValue.java:101)
  at jdk.internal.vm.ci/jdk.vm.ci.hotspot.HotSpotJVMCIMetaAccessContext.fromClass(HotSpotJVMCIMetaAccessContext.java:163)
  at jdk.internal.vm.ci/jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.fromClass(HotSpotJVMCIRuntime.java:339)
  at jdk.internal.vm.ci/jdk.vm.ci.hotspot.HotSpotResolvedObjectTypeImpl.fromObjectClass(HotSpotResolvedObjectTypeImpl.java:83)
  at jdk.internal.vm.ci/jdk.vm.ci.hotspot.HotSpotResolvedObjectTypeImpl.fromMetaspace(HotSpotResolvedObjectTypeImpl.java:97)
  at jdk.internal.vm.ci/jdk.vm.ci.hotspot.CompilerToVM.resolveTypeInPool(Native Method)
  at jdk.internal.vm.ci/jdk.vm.ci.hotspot.HotSpotConstantPool.loadReferencedType(HotSpotConstantPool.java:727)
  at java.base/jdk.internal.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
  at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.base/java.lang.reflect.Method.invoke(Method.java:566)
  at com.oracle.graal.pointsto.infrastructure.WrappedConstantPool.loadReferencedType(WrappedConstantPool.java:88)
  at com.oracle.graal.pointsto.infrastructure.WrappedConstantPool.loadReferencedType(WrappedConstantPool.java:105)
  at com.oracle.svm.hosted.phases.SubstrateClassInitializationPlugin.loadReferencedType(SubstrateClassInitializationPlugin.java:71)
  at jdk.internal.vm.compiler/org.graalvm.compiler.java.BytecodeParser.loadReferenceType(BytecodeParser.java:4384)
  at jdk.internal.vm.compiler/org.graalvm.compiler.java.BytecodeParser.maybeEagerlyResolve(BytecodeParser.java:4366)
  at com.oracle.svm.hosted.phases.SharedGraphBuilderPhase$SharedBytecodeParser.maybeEagerlyResolve(SharedGraphBuilderPhase.java:127)
  at jdk.internal.vm.compiler/org.graalvm.compiler.java.BytecodeParser.lookupMethod(BytecodeParser.java:4316)
  at jdk.internal.vm.compiler/org.graalvm.compiler.java.BytecodeParser.genInvokeStatic(BytecodeParser.java:1659)
  at jdk.internal.vm.compiler/org.graalvm.compiler.java.BytecodeParser.processBytecode(BytecodeParser.java:5340)
  at jdk.internal.vm.compiler/org.graalvm.compiler.java.BytecodeParser.iterateBytecodesForBlock(BytecodeParser.java:3423)
  at jdk.internal.vm.compiler/org.graalvm.compiler.java.BytecodeParser.processBlock(BytecodeParser.java:3230)
  at jdk.internal.vm.compiler/org.graalvm.compiler.java.BytecodeParser.build(BytecodeParser.java:1088)
  at jdk.internal.vm.compiler/org.graalvm.compiler.java.BytecodeParser.buildRootMethod(BytecodeParser.java:982)
  at jdk.internal.vm.compiler/org.graalvm.compiler.java.GraphBuilderPhase$Instance.run(GraphBuilderPhase.java:84)
  at jdk.internal.vm.compiler/org.graalvm.compiler.phases.Phase.run(Phase.java:49)
  at jdk.internal.vm.compiler/org.graalvm.compiler.phases.BasePhase.apply(BasePhase.java:214)
  at jdk.internal.vm.compiler/org.graalvm.compiler.phases.Phase.apply(Phase.java:42)
  at jdk.internal.vm.compiler/org.graalvm.compiler.phases.Phase.apply(Phase.java:38)
  at com.oracle.graal.pointsto.flow.MethodTypeFlowBuilder.parse(MethodTypeFlowBuilder.java:224)
  at com.oracle.graal.pointsto.flow.MethodTypeFlowBuilder.apply(MethodTypeFlowBuilder.java:351)
  at com.oracle.graal.pointsto.flow.MethodTypeFlow.doParse(MethodTypeFlow.java:322)
  at com.oracle.graal.pointsto.flow.MethodTypeFlow.ensureParsed(MethodTypeFlow.java:311)
  at com.oracle.graal.pointsto.flow.MethodTypeFlow.addContext(MethodTypeFlow.java:112)
  at com.oracle.graal.pointsto.flow.StaticInvokeTypeFlow.update(InvokeTypeFlow.java:437)
  at com.oracle.graal.pointsto.BigBang$2.run(BigBang.java:530)
  at com.oracle.graal.pointsto.util.CompletionExecutor.lambda$execute$0(CompletionExecutor.java:173)
  at java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1426)
  ... 5 more

On Fri, Aug 7, 2020 at 11:36 PM John Rose <john.r.rose@oracle.com> wrote:
On Aug 7, 2020, at 9:52 AM, Andrew Haley <aph@redhat.com> wrote:
On 8/7/20 4:39 PM, David Lloyd wrote:
However, I'm wondering if this isn't a side effect of what appears to be an incorrect double-checked lock at lines 374-378 of ClassValue.java [1]. In order for the write to the non-volatile `cache` field of ClassValueMap to be visible, it is my understanding that there must be some acquire/release edge from where the variable is published to guarantee that all of the writes are visible to the read site, and I'm not really sure that the exit-the-constructor edge is going to match up with the synchronized block which protects a different non-volatile field. And even if my feeling that this is dodgy is valid, I'm not sure whether this NPE is a possible outcome of that problem!
It certainly doesn't look right to me. AFAICS this is classic broken double-checked locking. It'd need some sort of fence after constructing the ClassValueMap instance and before publishing it.
Y’all are calling my baby ugly but he’s really cute if you look at him from the right angle.
(First, a quick question: This bug shows up on x86, right?
Yes, the issue is happening on x86.
If so, we probably are not looking at a hardware reordering problem but something stemming from Graal’s reordering of operations compared to C1 and C2, perhaps after a different set of inlining decisions uncharacteristic of C2.)
To the best of my knowledge, GraalVM points-to analysis is just a JVM process that figures out which code is used by the end-user application. The result of this process is then used to create the native executable. I'm not aware of the Graal compiler having an impact here.
From what I've learnt looking at this code and more recent JVMCI implementations, jdk11u-dev relies on reflection to understand the end-user application, whereas more modern JVMCI stacks avoid the use of reflection and query the JVM directly.
I've been trying to create a vanilla JDK stress test that would trigger such NPE but I've not had luck so far. I don't know whether JVMCI has a part to play here.
The bug indeed looks like a dropped fence at the end of the constructor for ClassValueMap. (I don’t see an easier way to get a null at the NPE point.) The fence might not actually be dropped, but a write which the fence is supposed to fence might have been reordered after the fence, and after an unlock operation, allowing someone to see the initial null after the unlock.
The double-check idiom breaks when the outer check (not synchronized) gets half-baked data and acts on it. This much-maligned idiom would not be broken in the present case under the assumption that a data dependency on type.classValueMap (set in initializeMap) will see a fully initialized value of ClassValueMap.cacheArray.
Now we get into the fine print, and I agree there is a bug here. The easy fix is to repair the logic under which that ugly everybody’s-hating-on-it double check idiom would be in fact correct.
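To make the idiom concrete, here is a minimal, hypothetical sketch of the shape of code being discussed. The names (`Holder`, `ExpensiveMap`, `cacheArray`, `getMap`) are illustrative stand-ins, not the actual ClassValue source:

```java
// Hypothetical sketch of the double-check idiom under discussion.
// Names are illustrative; this is NOT the real ClassValue code.
class Holder {
    private ExpensiveMap map; // not volatile, not final

    ExpensiveMap getMap() {
        ExpensiveMap m = map;             // outer check, no synchronization
        if (m != null) return m;          // danger: m may be half-published
        synchronized (this) {
            if (map == null)
                map = new ExpensiveMap(); // publish under the lock
            return map;
        }
    }

    static class ExpensiveMap {
        // Analogous to ClassValueMap.cacheArray: a plain (non-final) field
        // set in the constructor. With no final fields and no explicit
        // fence, the unsynchronized reader above may legally observe it
        // as null even after seeing a non-null map reference.
        Object[] cacheArray = new Object[8];
    }
}
```

Under the letter of the JMM, the data dependency through `map` does not by itself guarantee that the outer, unsynchronized reader sees `cacheArray` as non-null.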
The fine print requires a couple different things that are not quite fulfilled by the present code, and Graal has either reordered the memory accesses to expose the problem, or it (itself) has a complementary bug. (Or it has an unrelated bug, and you people are *staring* at my baby!)
First, as Paul noted, you need a final variable in your class to get the benefit of a fence at the end of the constructor. Oops #1: ClassValueMap doesn’t have *any* finals. That’s awkward. Two fixes for that: (a) add a dummy final, and (b) add a real fence in the constructor. For better luck, maybe put the fence at the end of sizeCache.
On machines (not x86) which need fences *before* final reads, another explicit fence should be placed before the unsynchronized read (or else inside the getCache method).
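A hedged sketch of fix (b), using the explicit fences that have been available since Java 9 via `java.lang.invoke.VarHandle` (class and field names here are again hypothetical, not the real ClassValue internals):

```java
import java.lang.invoke.VarHandle;

// Hypothetical repaired double-check: an explicit release fence before
// publication, plus a matching acquire fence before the unsynchronized
// read for machines that need it. Names are illustrative.
class FencedHolder {
    private ExpensiveMap map; // still plain: the fences do the ordering

    ExpensiveMap getMap() {
        ExpensiveMap m = map;             // outer, unsynchronized read
        if (m != null) {
            VarHandle.acquireFence();     // no-op on x86; needed elsewhere
            return m;
        }
        synchronized (this) {
            if (map == null) {
                ExpensiveMap fresh = new ExpensiveMap();
                VarHandle.releaseFence(); // constructor writes cannot sink
                                          // below this point
                map = fresh;              // publish only after the fence
            }
            return map;
        }
    }

    static class ExpensiveMap {
        Object[] cacheArray = new Object[8]; // protected by the fences
    }
}
```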
By the letter of the JMM, the helpful effects of the fence at the end of the constructor are only guaranteed for references which depend on final variables set by that constructor, because only those final variables get a “freeze” operation which stabilizes them (and values dependently loaded via them). Throwing in a dummy final is therefore not guaranteed to work. But throwing in a fence will work. One downside of the fence is that the JMM is silent about fences, but the JMM is widely recognized as a partial story, not the full or final story.
(Here’s a tidbit of JMM politics: I once heard Doug Lea consider whether maybe all fields should be treated more like final fields. I don’t know if this is still a live idea, but it would make this bug go away, since nearly all constructors would then get fences of some sort.)
Here’s a bit of explanatory (non-normative) text from the draft POPL paper for JMM, where the parenthetic comment indicates (I think) the sort of thing that is happening here:
Let us now assume that the acquire and release do happen. As long as these actions take place after object has been constructed (and there is no code motion around the end of the constructor), the diffs that the second processor acquires are guaranteed to reflect the correctly constructed object.
Basically, the synchronization statement is expected to do a release *after* the non-null value is stored. It looks like this is failing due to a lost and/or reordered fence.
Second, David says:
I’m not really sure that the exit-the-constructor edge is going to match up with the synchronized block which protects a different non-volatile field.
By the letter of the JMM (AFAIUI), this code is coloring way outside the lines. (Yes, I admit it.) The problem is pretty fundamental. I went and swapped in some JMM spec just now (don’t have it in short term memory; go figure), and here’s what I re-learned.
In JMM terms, a “release” is the earlier end of any edge in the relation called “synchronizes-with”. An “acquire” is the latter end of such an edge. In the JMM this relation applies only to lock, unlock, and volatile reads and writes. Your guess is as good as mine whether other operations (CAS, fences, etc.) participate; the JMM is silent. Crucially, acquires and releases do not touch plain fields. This is counter-intuitive for those of us who like to reason with fences.
The JMM prevents regular writes from floating past what we like to think of as “release points” (unlocks) by appealing to processor order inside lock/unlock pairs, and also by appealing to global ordering of multiple lock/unlock pairs (or stuff with volatiles and other corner cases which we will neglect here). In order to work right, a normal write has to come before a release *and* a matching normal read has to come after an acquire, and such an acquire is nothing other than the lock of a later lock/unlock pair, typically in another thread.
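A small illustration of that rule, with hypothetical names: a plain write made before an unlock is guaranteed visible to a plain read made after a later lock of the *same* monitor, and it is the lock/unlock pairing (not the fields themselves) that creates the happens-before edge:

```java
// Sketch: plain writes before an unlock happen-before plain reads after a
// later lock of the same monitor. Names are illustrative.
class HandOff {
    private int data;      // plain field
    private boolean ready; // plain field

    synchronized void produce(int v) {
        data = v;          // plain write...
        ready = true;      // ...both ordered before the unlock at method exit
    }

    synchronized Integer consume() {
        // The lock here is the "acquire": it pairs with the earlier unlock,
        // so if ready is seen as true, data is guaranteed to be v.
        return ready ? data : null;
    }
}
```

Without the `synchronized` on `consume`, or with a different monitor, no such edge exists and the read of `data` is a data race.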
So the rules which reliably connect normal writes to normal reads in other threads work differently from acquire fences and release fences as we (or just I?) usually think of them. Yes, there are “acquires” and “releases” in the JMM. No, they are not primitives but rather descriptions of the way the happens-before relation is constrained by (among other things) lock and unlock actions.
On x86, you don’t have to worry about acquires; you just set up the right release fences. In the JMM, there’s no way to get either an acquire or a release without a synchronized statement (or other fancy stuff like volatiles). That’s why double-check is broken in the pure JMM, and why it can be repaired if you add some (non-JMM) fences.
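For completeness, the pure-JMM repair is to make the published field volatile, so the publication is itself a release and the outer read an acquire; a hypothetical sketch (presumably the thread prefers fences precisely to keep volatile semantics off the hot read path):

```java
// Hypothetical volatile-based repair of the double-check idiom: the
// volatile write is a release and the volatile read an acquire, so no
// explicit fences are needed. Names are illustrative.
class VolatileHolder {
    private volatile ExpensiveMap map;

    ExpensiveMap getMap() {
        ExpensiveMap m = map;             // volatile read: acquire
        if (m != null) return m;          // safe: constructor writes visible
        synchronized (this) {
            if (map == null)
                map = new ExpensiveMap(); // volatile write: release
            return map;
        }
    }

    static class ExpensiveMap {
        Object[] cacheArray = new Object[8];
    }
}
```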
So why is this showing up suddenly with Graal? Possibly it’s got a really aggressive interpretation of the JMM, relative to the standard HotSpot JITs. Possibly it’s got a bug. Graal might be allowing the initialization of ClassValueMap.cacheArray to float down past publication of the ClassValueMap on type.classValueMap (in initializeMap). Less likely, it is allowing the initialization to totally escape the lock/unlock pair, something the JMM might allow but the Cookbook advises against (in its “Can Reorder” table).
Perhaps Graal is modeling the “freeze” operations on finals more accurately (and the Cookbook gives instructions for this). That would allow the ClassValueMap to correctly initialize its frozen finals, but not (unfortunately) the not-frozen cacheArray field. It appears that the new JIT on the block is exposing my neglect for my poor baby, in not putting the right fences around its playpen.
Bottom line: Add a release fence before type.cVM is set, and add a (no-op on x86) acquire fence in getCache. Quick test: Add a dummy final to CVM, to see if that makes the bug let go.
— John
participants (10)
- Andrew Dinn
- Andrew Haley
- David Holmes
- David Lloyd
- Doug Lea
- Galder Zamarreno
- Hans Boehm
- John Rose
- Paul Sandoz
- Peter Levart