RFR: 8325469: Freeze/Thaw code can crash in the presence of OSR frames [v2]

Mon Apr 8 14:16:24 UTC 2024

> Freeze/thaw code assumes that a compiled frame for a method where num_stack_arg_slots() > 0 will always have the arguments set up above the metadata at the bottom of the frame. But when converting an interpreter frame to a compiled frame during OSR, we don't explicitly leave room for the stack arguments after popping the interpreter frame. All parameters needed will be read from the "buf" array and stored inside the frame before calling OSR_migration_end().
> 
> This mismatch between how the stack looks and what we assume can lead to different crashes. In particular, the issue occurs when the OSR conversion happens for the bottom-most frame in the stack. If the OSR frame has a caller in the stack, then there is no issue on freezing/thawing. I added more details about this in the bug comments.
> 
> When the OSR conversion happens for the bottom-most frame, a future freeze/thaw can lead to crashes in all cases: freeze_fast/thaw_fast, freeze_fast/thaw_slow, freeze_slow/thaw_slow. When freezing fast, thawing either fast or slow can lead to trying to read past the bottom of the stackChunk or writing below the allocated space in the stack. The freeze slow case is almost okay, except that it uncovered an invalid assert that is triggered if the size of the OSR frame plus all the other frames we freeze take less space than the size of locals minus parameters of the interpreter frame that was converted by OSR. I also added more details about these in the bug comments.
> 
> I tested different fixes, but I think the most straightforward one is to add _num_stack_arg_slots in the nmethod class and initialize it accordingly depending on whether the nmethod is an OSR one or not.
> 
> The patch includes a new test that exercises all these possible combinations of OSR frame at bottom of stack or not, and then freezing fast/slow and thawing fast/slow. The bottom case where we freeze fast and thaw slow reproduces the originally reported crash. There are actually two different failure modes depending on whether this is a thaw top or return barrier case. The other bottom cases lead to the other crashes described in the bug comments.
> The new test uncovers another bug besides the OSR issues, but since it's a different one I filed a separate JBS issue (JDK-8329665) and I made this a dependent PR.
> 
> I tested the current patch with the new test and also ran it through mach5 tiers1-6.
> 
> Thanks,
> Patricio
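
For readers skimming the description above, the proposed fix (a per-nmethod _num_stack_arg_slots that is initialized differently for OSR nmethods) can be pictured with a small standalone model. This is only a sketch under stated assumptions, not the code in the patch: MethodModel, NMethodModel, kInvocationEntryBci and frame_slots_with_args are hypothetical names, and the model ignores the slot/word distinction and everything else a real nmethod carries.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only; these are not HotSpot classes.
// Assumed sentinel meaning "entered through the normal entry point, not OSR".
constexpr int kInvocationEntryBci = -1;

struct MethodModel {
    int num_stack_arg_slots;  // slots the calling convention passes on the stack
};

class NMethodModel {
public:
    NMethodModel(const MethodModel& method, int entry_bci)
        : _entry_bci(entry_bci),
          // An OSR nmethod takes its parameters from the migration buffer,
          // so no stack-argument area sits above its frame: record 0 slots.
          // A normal nmethod keeps the method's own count.
          _num_stack_arg_slots(entry_bci != kInvocationEntryBci
                                   ? 0
                                   : method.num_stack_arg_slots) {}

    bool is_osr_method() const { return _entry_bci != kInvocationEntryBci; }
    int num_stack_arg_slots() const { return _num_stack_arg_slots; }

private:
    int _entry_bci;
    int _num_stack_arg_slots;
};

// Size a freeze/thaw-like copy would attribute to the frame, including the
// stack-argument area above it (simplified: everything is counted in slots).
int frame_slots_with_args(const NMethodModel& nm, int frame_size_slots) {
    return frame_size_slots + nm.num_stack_arg_slots();
}
```

In this model, asking the nmethod rather than the method for the argument count means an OSR frame at the bottom of the stack is never assumed to have an argument area that was not actually laid out, while a normally compiled frame of the same method still reports the full count.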

Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision:

  fix comment

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/18637/files
  - new: https://git.openjdk.org/jdk/pull/18637/files/07a9cb51..b35306f8

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18637&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18637&range=00-01

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/18637.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18637/head:pull/18637

PR: https://git.openjdk.org/jdk/pull/18637