RFR: 8316436: ContinuationWrapper uses unhandled nullptr oop

Stefan Karlsson stefank at openjdk.org
Tue Sep 19 06:57:04 UTC 2023


The ZGC oop verification code in combination with CheckUnhandledOops finds an unhandled oop in ContinuationWrapper:


Test java/lang/Thread/virtual/stress/Skynet.java#ZGenerational with ' -XX:+CheckUnhandledOops' crashes with
#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (src/hotspot/share/gc/z/zAddress.inline.hpp:296), pid=986260, tid=986296
# assert(!assert_on_failure) failed: Has low-order bits set: 0xfffffffffffffff1 

V [libjvm.so+0x1962fda] initialize_check_oop_function()::{lambda(oopDesc*)#1}::_FUN(oopDesc*)+0x5a (zAddress.inline.hpp:296)
V [libjvm.so+0xa6d484] ContinuationWrapper::~ContinuationWrapper()+0x24 (oopsHierarchy.hpp:89)
V [libjvm.so+0xa66c83] int freeze_internal<Config<(oop_kind)1, ZBarrierSet> >(JavaThread*, long*)+0x373 (continuationFreezeThaw.cpp:1584)
V [libjvm.so+0xa6711b] int freeze<Config<(oop_kind)1, ZBarrierSet> >(JavaThread*, long*)+0x5b (continuationFreezeThaw.cpp:272)
J 216 jdk.internal.vm.Continuation.doYield()I [java.base at 22-internal](mailto:java.base at 22-internal) (0 bytes) @ 0x00007f614c630875 [0x00007f614c630820+0x0000000000000055]


This is the scenario that triggers this bug:
1) ContinuationWrapper is created on the stack
2) We enter a JRT_BLOCK section
3) Call ContinuationWrapper::done()
4) Exit the JRT_BLOCK
5) ~ContinuationWrapper is called

(3) sets ContinuationWrapper::_continuation to nullptr
(4) hits a safepoint and sets ContinuationWrapper::_continuation to 0xfffffffffffffff1
(5) uses ContinuationWrapper::_continuation in `_continuation != nullptr`, which triggers ZGC's verification code that finds the broken oop.

So, this crashes with ZGC, but that's because ZGC finds a broken usage of _continuation. To show that this is still a problem with other GCs I added this assert:

diff --git a/src/hotspot/share/runtime/javaThread.hpp b/src/hotspot/share/runtime/javaThread.hpp
index 40205d324a6..80b60d0b7b8 100644
--- a/src/hotspot/share/runtime/javaThread.hpp
+++ b/src/hotspot/share/runtime/javaThread.hpp
@@ -258,7 +258,7 @@ class JavaThread: public Thread {
 
  public:
   void inc_no_safepoint_count() { _no_safepoint_count++; }
- void dec_no_safepoint_count() { _no_safepoint_count--; }
+ void dec_no_safepoint_count() { _no_safepoint_count--; assert(_no_safepoint_count >= 0, "Catch G1 in the act!"); }
 #endif // ASSERT
  public:
   // These functions check conditions before possibly going to a safepoint.


To catch the broken nullptr check in:

   void allow_safepoint() {
     #ifdef ASSERT
       // we could have already allowed safepoints in done
      if (_continuation != nullptr && _current_thread->is_Java_thread()) {
         JavaThread::cast(_current_thread)->dec_no_safepoint_count();
       }
     #endif
   }


The assert is triggered when I run the test with G1.

I propose that we fix this by stop setting _continuation to nullptr as a way to indicate cancellation of the ContinuationWrapper, and instead use a bool for that.

I made some slight changes to remove all the code duplication in the constructors so that I only had to initialize the new _done variable in one place.

I've tested this with the original reproducer + assert with both ZGC and G1. I'll run this through our internal CI pipeline as well.

-------------

Commit messages:
 - 8316436: ContinuationWrapper uses unhandled nullptr oop

Changes: https://git.openjdk.org/jdk/pull/15810/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15810&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8316436
  Stats: 27 lines in 2 files changed: 8 ins; 11 del; 8 mod
  Patch: https://git.openjdk.org/jdk/pull/15810.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/15810/head:pull/15810

PR: https://git.openjdk.org/jdk/pull/15810


More information about the hotspot-dev mailing list