RFR: 8302779: HelidonAppTest.java fails with "assert(_cb == CodeCache::find_blob(pc())) failed: Must be the same" or SIGSEGV

Patricio Chilano Mateo pchilanomate at openjdk.org
Tue Mar 7 22:31:11 UTC 2023


Please review the following fix. The Method instance representing Continuation.enterSpecial() is replaced by a new Method during redefinition of the Continuation class. The already existing nmethod for it is not used, but a new one will be generated the first time enterSpecial() is resolved after redefinition. This means we could have more than one nmethod representing enterSpecial(), in particular, one generated before redefinition took place, and one after it. Now, when walking the stack, if we found a return barrier pc (as in Continuation::is_return_barrier_entry()) and we want to keep walking the physical stack then we know the sender will be the enterSpecial frame so we create it by calling ContinuationEntry::to_frame(). This method assumes there can only be one nmethod associated with enterSpecial() so we hit an assert later on. See the bug for more details of the crash.

As I mentioned in the bug we don't need to rely on this assumption since we can re-read the updated value from _enter_special. But reading both _enter_special and _return_pc means we would need some kind of synchronization since to_frame() could be called concurrently with set_enter_code(). To avoid that we could just read _return_pc and calculate the blob from it each time, but I'm also assuming that overhead is undesired and that's why the static variable was introduced. Alternatively _enter_special could be read and _return_pc could be derived from it (by adding an extra field in the nmethod class). But if we go this route I think we would need to do a small fix on thaw too. After redefinition and before a new call to resolve enterSpecial(), the last thaw call for some virtual thread would create an entry frame with an old _return_pc (see ThawBase::new_entry_frame() and ThawBase::patch_return()). I'm not sure about the lifecycle of the old CodeBlob but it seems it could have been 
 already removed if enterSpecial was not found while traversing everybody's stack. Maybe there are more issues.

The simple solution implemented here is just to disallow redefinition of the Continuation class altogether. Another less restrictive option would be to keep the already generated enterSpecial nmethod, if there is one. I can also investigate one of the routes mentioned previously if desired.

I tested the fix with the simple reproducer I added to the bug and also with the previously crashing HelidonAppTest.java test.

Thanks,
Patricio

-------------

Commit messages:
 - v1

Changes: https://git.openjdk.org/jdk/pull/12911/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12911&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8302779
  Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/12911.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12911/head:pull/12911

PR: https://git.openjdk.org/jdk/pull/12911


More information about the serviceability-dev mailing list