RIP values like 0xffffffff94bf7f80 due to patched NMethod

Thu Nov 15 19:43:32 UTC 2018

The hs_err file is interesting regardless of the core file. It contains a lot of information from which we can find out
what happened. Most important are the VM flags that were used, CPU info, OS info, the state of the code cache (its
address range), which compilations and deoptimizations happened before the crash, etc.

You can strip out (or rename) class and method names and remove any other sensitive information if the user is concerned
about it. It would be nice to have it from JDK 11 runs.

You can include (copy/paste) the hs_err file content in the bug's description. That is what usually happens with
external bugs.

Note, I do not remember us ever having such an issue before. Did it happen only on one particular machine or on several
different ones? Is the OS up to date? The bug report says windows_7. It is old.

On 11/15/18 8:41 AM, Alexander Miloslavskiy wrote:
> Vladimir, thanks for your time!
> 
> It took a couple of days to receive permission from the user to share his hs_err.log files. I have submitted a bug
> report as you suggested (ID 9058132), but there was no way to attach those files there.
> 
> On the other hand, frankly, the hs_err.log files are quite useless in this specific case. Without a core dump there
> was no chance to get any of the key insights.
> 
> I understand that a few bugs were fixed recently, but I guess since it still happens on JRE11 it's a different problem.
> 
> I also studied the code to try to find how 0x90 can get written on top of jmp, but didn't find anything definite.

Patching consists of 2 stores. We write the 5th byte of the jump instruction and flush the cache line, then store the
first 4 bytes of the instruction and flush again.
Unless something happens to the cache line between the 2 stores, I don't understand how this corruption can happen at
the time of patching. Or the first store is discarded by the CPU, which is impossible.
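
To make the order of the 2 stores concrete, here is a minimal simulation on a plain byte buffer. It is only a sketch,
not the VM's code: encode_jmp, patch_entry and the flush callback are hypothetical names, the flush is a placeholder
that does nothing to a real instruction cache, and the addresses in the usage note are made up.

```python
import struct

JMP_REL32 = 0xE9  # x86 opcode for `jmp rel32`


def encode_jmp(from_addr, to_addr):
    """Encode a 5-byte `jmp rel32` located at from_addr that targets to_addr."""
    # The displacement is relative to the address of the next instruction.
    disp = to_addr - (from_addr + 5)
    return bytes([JMP_REL32]) + struct.pack("<i", disp)


def patch_entry(code, offset, jmp, flush=lambda pos: None):
    """Overwrite 5 bytes at code[offset:] using the 2-store order described above.

    `flush` is a stand-in (assumption) for the cache line flush; here it is
    only a hook that lets a caller observe the intermediate state.
    """
    if len(jmp) != 5:
        raise ValueError("expected a 5-byte jump")
    code[offset + 4] = jmp[4]            # store 1: 5th byte of the jump
    flush(offset + 4)
    code[offset:offset + 4] = jmp[:4]    # store 2: first 4 bytes, opcode included
    flush(offset)
```

For example, patching a buffer of 0x90 bytes with encode_jmp(0x1000, 0x2000) leaves the first 4 bytes untouched after
store 1, and the opcode 0xE9 appears only together with the low 3 displacement bytes in store 2.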

Can you decode what the instructions after 0x90 are?
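
As a rough aid for looking at those bytes, the sketch below checks whether 5 bytes form a `jmp rel32` and, if so,
computes its target. The helper name jmp_rel32_target is hypothetical and the addresses in the usage note are made up;
a real disassembler is the better tool for anything beyond this one instruction form.

```python
import struct


def jmp_rel32_target(insn_addr, raw):
    """Return the target of a 5-byte `jmp rel32` at insn_addr.

    Returns None when the bytes do not start with opcode 0xE9, for example
    when the first byte has been replaced by 0x90 (nop).
    """
    if len(raw) < 5 or raw[0] != 0xE9:
        return None
    # Signed little-endian 32-bit displacement, relative to the next instruction.
    (disp,) = struct.unpack_from("<i", raw, 1)
    return (insn_addr + 5 + disp) & 0xFFFFFFFFFFFFFFFF
```

With the bytes e9 fb 0f 00 00 at address 0x1000 this yields 0x2000; with 90 fb 0f 00 00 it yields None, since the
remaining 4 bytes would then be executed as instructions rather than read as a displacement.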

> 
> Following your lead, I have checked if crashing NMethod is freed.
> * Its 'HeapBlock::Header::_used' contains 1
> * NMethod fields match what they should be for old NMethod, so I think that this block is not owned by someone else yet.
> * There are no pointers to the old (crashing) NMethod anywhere except on the discarded (addresses < RSP) stack of the
> crashing thread. This worries me a bit, but I guess I simply don't know how the JVM works.
> 
> Since my last mail I debugged it a bit more and found one new fact: just after the crashing method was compiled with
> the new optimization settings, the calling method was also compiled, fully inlining the called method and therefore
> eliminating the need for the called Method's NMethod. My intuition says these should be related, because the
> coincidence is too obvious. However, the called Method's 'Method._code' still contains a reference to the new NMethod;
> it looks like the JVM didn't realize it's not needed.

No, it is not related. Your original assumption was correct. Recompilation of any method by a higher tier (tier 4,
CompLevel_full_optimization) leads to deoptimization of the previous version of the compiled code (tier 3,
CompLevel_full_profile).

> 
> This last fact is supported by one core dump so far, I still need to check others. Other facts I listed in my previous 
> mail are supported by multiple dumps (and all I have checked).
> 
> Unfortunately, the bug only reproduces for our customer.

If the customer can do test runs that reproduce the issue, you can try to build a fastdebug version of the VM and give
it to them to try.

Vladimir