RFR: 8316436: ContinuationWrapper uses unhandled nullptr oop
Stefan Karlsson
stefank at openjdk.org
Tue Sep 19 06:57:04 UTC 2023
The ZGC oop verification code in combination with CheckUnhandledOops finds an unhandled oop in ContinuationWrapper:
Test java/lang/Thread/virtual/stress/Skynet.java#ZGenerational with ' -XX:+CheckUnhandledOops' crashes with
#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (src/hotspot/share/gc/z/zAddress.inline.hpp:296), pid=986260, tid=986296
# assert(!assert_on_failure) failed: Has low-order bits set: 0xfffffffffffffff1
V [libjvm.so+0x1962fda] initialize_check_oop_function()::{lambda(oopDesc*)#1}::_FUN(oopDesc*)+0x5a (zAddress.inline.hpp:296)
V [libjvm.so+0xa6d484] ContinuationWrapper::~ContinuationWrapper()+0x24 (oopsHierarchy.hpp:89)
V [libjvm.so+0xa66c83] int freeze_internal<Config<(oop_kind)1, ZBarrierSet> >(JavaThread*, long*)+0x373 (continuationFreezeThaw.cpp:1584)
V [libjvm.so+0xa6711b] int freeze<Config<(oop_kind)1, ZBarrierSet> >(JavaThread*, long*)+0x5b (continuationFreezeThaw.cpp:272)
J 216 jdk.internal.vm.Continuation.doYield()I [java.base at 22-internal](mailto:java.base at 22-internal) (0 bytes) @ 0x00007f614c630875 [0x00007f614c630820+0x0000000000000055]
This is the scenario that triggers this bug:
1) ContinuationWrapper is created on the stack
2) We enter a JRT_BLOCK section
3) Call ContinuationWrapper::done()
4) Exit the JRT_BLOCK
5) ~ContinuationWrapper is called
(3) sets ContinuationWrapper::_continuation to nullptr
(4) hits a safepoint and sets ContinuationWrapper::_continuation to 0xfffffffffffffff1
(5) uses ContinuationWrapper::_continuation in `_continuation != nullptr`, which triggers ZGC's verification code that finds the broken oop.
So, this crashes with ZGC, but that's because ZGC finds a broken usage of _continuation. To show that this is still a problem with other GCs I added this assert:
diff --git a/src/hotspot/share/runtime/javaThread.hpp b/src/hotspot/share/runtime/javaThread.hpp
index 40205d324a6..80b60d0b7b8 100644
--- a/src/hotspot/share/runtime/javaThread.hpp
+++ b/src/hotspot/share/runtime/javaThread.hpp
@@ -258,7 +258,7 @@ class JavaThread: public Thread {
public:
void inc_no_safepoint_count() { _no_safepoint_count++; }
- void dec_no_safepoint_count() { _no_safepoint_count--; }
+ void dec_no_safepoint_count() { _no_safepoint_count--; assert(_no_safepoint_count >= 0, "Catch G1 in the act!"); }
#endif // ASSERT
public:
// These functions check conditions before possibly going to a safepoint.
To catch the broken nullptr check in:
void allow_safepoint() {
#ifdef ASSERT
// we could have already allowed safepoints in done
if (_continuation != nullptr && _current_thread->is_Java_thread()) {
JavaThread::cast(_current_thread)->dec_no_safepoint_count();
}
#endif
}
The assert is triggered when I run the test with G1.
I propose that we fix this by stop setting _continuation to nullptr as a way to indicate cancellation of the ContinuationWrapper, and instead use a bool for that.
I made some slight changes to remove all the code duplication in the constructors so that I only had to initialize the new _done variable in one place.
I've tested this with the original reproducer + assert with both ZGC and G1. I'll run this through our internal CI pipeline as well.
-------------
Commit messages:
- 8316436: ContinuationWrapper uses unhandled nullptr oop
Changes: https://git.openjdk.org/jdk/pull/15810/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15810&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8316436
Stats: 27 lines in 2 files changed: 8 ins; 11 del; 8 mod
Patch: https://git.openjdk.org/jdk/pull/15810.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/15810/head:pull/15810
PR: https://git.openjdk.org/jdk/pull/15810
More information about the hotspot-dev
mailing list