RFR(S): Use Vectored Exception Handling on Windows

Ludovic Henry luhenry at microsoft.com
Tue Jul 14 16:34:49 UTC 2020


Hi Thomas,

This is where Windows exception handling and Unix/Linux signals differ. On Windows, you have VEH, SEH and Unhandled Exception Handling (I'll call it UEH here), while on Unix/Linux, you only have signals.

On Windows, thanks to this split, you can easily divide your exception handling into 1. handling expected exceptions (EXCEPTION_ILLEGAL_INSTRUCTION on a deoptimization, EXCEPTION_ACCESS_VIOLATION in an arraycopy stub, etc.), and 2. generating an hs_err file on an unexpected exception. You can do 1. with VEH and SEH, and 2. with UEH, and that's what I am proposing to do here.

Practically speaking, the existing `topLevelExceptionFilter` would be split into two: a `topLevelVectoredExceptionFilter`, which would be passed to `AddVectoredExceptionHandler`, and a `topLevelUnhandledExceptionHandler`, which would be passed to `SetUnhandledExceptionFilter`. This `topLevelUnhandledExceptionHandler` would contain (more or less) _only_ the call to `VMError::report_and_die`, and the `topLevelVectoredExceptionFilter` would contain _no_ call to `VMError::report_and_die` whatsoever.
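To make the proposed split concrete, here is a portable C++ model of the two filters rather than actual Win32 code. `ExceptionInfo` is a hypothetical, simplified stand-in for `EXCEPTION_POINTERS`, the `is_expected` predicate stands in for HotSpot's checks for deliberate faults, and the two return constants only mirror the numeric values of `EXCEPTION_CONTINUE_SEARCH` and `EXCEPTION_CONTINUE_EXECUTION`. The property it illustrates is that the vectored filter never reports; only the unhandled filter does.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Filter results; the numbers mirror EXCEPTION_CONTINUE_SEARCH and
// EXCEPTION_CONTINUE_EXECUTION from the Windows SDK.
constexpr long kContinueSearch = 0;
constexpr long kContinueExecution = -1;

// Hypothetical, simplified stand-in for EXCEPTION_POINTERS.
struct ExceptionInfo {
  std::uint32_t code;  // e.g. 0xC0000005 (EXCEPTION_ACCESS_VIOLATION)
  std::uintptr_t pc;   // address of the faulting instruction
};

// Hypothetical check for faults the VM triggers on purpose
// (deoptimization trap, arraycopy stub, SafeFetch, ...).
using ExpectedFaultCheck = bool (*)(const ExceptionInfo&);

// Would be registered with AddVectoredExceptionHandler: claims expected
// faults and declines everything else. It never writes a crash report.
long vectored_filter(const ExceptionInfo& info, ExpectedFaultCheck is_expected) {
  return is_expected(info) ? kContinueExecution : kContinueSearch;
}

// Would be registered with SetUnhandledExceptionFilter: reached only when no
// VEH or SEH handler claimed the exception. This is where
// VMError::report_and_die would run; the sketch just records a report.
std::string g_crash_report;

void unhandled_filter(const ExceptionInfo& info) {
  assert(g_crash_report.empty() && "a crash is reported only once");
  g_crash_report = "unexpected exception code " + std::to_string(info.code) +
                   " at pc " + std::to_string(info.pc);
}
```

In the actual proposal the first function would be handed to `AddVectoredExceptionHandler` and the second to `SetUnhandledExceptionFilter`; the sketch leaves out registration and any register-context manipulation.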

Keeping the call to `VMError::report_and_die` inside the VEH handler would, as you say, completely break any use of SEH, even in external libraries. That would be a breaking change and is therefore, IMO, not acceptable.

Thanks,

--
Ludovic

________________________________________
From: Thomas Stüfe <thomas.stuefe at gmail.com>
Sent: Monday, July 13, 2020 23:29
To: Ludovic Henry
Cc: hotspot-runtime-dev at openjdk.java.net
Subject: Re: RFR(S): Use Vectored Exception Handling on Windows

Hi Ludovic,

On Mon, Jul 13, 2020 at 11:55 PM Ludovic Henry <luhenry at microsoft.com> wrote:
Hi Thomas,

Thank you for your feedback!

Let me answer some of the cases you mention.

> A) this case exists today. An app getting signals via VEH would have to willingly ignore signals for us to get them. This does not change; your patch would just make it happen less often, so I do not see a backward compatibility problem here.

Exactly.

> B) this is a new case. We would have to ignore signals not meant for us. Technically by just ignoring them. Distinguishing this is a bit difficult though. Note the subtle difference to Unix: there we have signal chaining, so an application which is really really interested in signals for its own purposes uses it (e.g. by preloading libjsig) and then we know its handler and hand over the signal.

Today, through SEH and RtlAddFunctionTable, we only get a very clear subset of exceptions: those triggered in the code cache. If an exception is triggered from a PC outside of the code cache, SEH will not find the handler we registered with RtlAddFunctionTable, and we'll simply _not_ call into HandleExceptionFromCodeCache (the handler we register with RtlAddFunctionTable). That can be trivially reproduced in the VEH handler by checking that the PC is between CodeCache::low_bound() and CodeCache::high_bound().

This is what you mean by "we only can distinguish our crashes from their crashes via crash pc, rejecting any crash not in our code (dynamic or static). Well, arguably this would be just how it is today with our code scoped via SEH".
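The range check could look roughly like this. `CodeRange` and `pc_in_code_cache` are hypothetical names, and the sketch assumes the upper bound is exclusive (a half-open interval); in HotSpot the bounds would come from `CodeCache::low_bound()` and `CodeCache::high_bound()`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the bounds returned by CodeCache::low_bound()
// and CodeCache::high_bound(); assumed to describe [low, high).
struct CodeRange {
  std::uintptr_t low;
  std::uintptr_t high;
};

// Emulates the scoping RtlAddFunctionTable gives today: only faults whose PC
// lies inside the code cache are candidates for the VM's own handling.
bool pc_in_code_cache(std::uintptr_t pc, const CodeRange& cache) {
  assert(cache.low <= cache.high);
  return pc >= cache.low && pc < cache.high;
}
```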


Not sure we understand each other.

Today we get exceptions from two sides:
- via SEH, __try/__except, in threads attached to the VM. There the pc is either in our code or in third-party code below us that did not bother setting up SEH for itself
- via RtlAddFunctionTable for the code cache, where we specify code cache boundaries.

With VEH we would get all exceptions in the process, including exceptions from threads which have never seen the libjvm, or from caller code if HotSpot is embedded somewhere.

On Unix we handle all those crashes by writing hs-err crash logs, even if those crashes are not our responsibility, unless the user set up signal chaining, in which case we hand any crash signal over to the chained handler (which is also not perfect for clear error reporting).

With VEH I get all exceptions, but have to decide on my own whether an exception should result in an hs-err file or be handed to the next exception handler. The only way I can see is to examine the pc: iterate through all our binaries, compare the pc with their text segments, and also check the code cache.
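A sketch of that lookup, assuming the VM has already collected the text-segment ranges of its own binaries plus the code cache into a sorted, non-overlapping list (how to enumerate them on Windows is left out). `OwnedRange` and `owner_of_pc` are hypothetical; a binary search keeps the check cheap even though a VEH handler sees every exception in the process.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical description of one region the VM considers its own:
// a DLL's text segment or the code cache, as a half-open [start, end).
struct OwnedRange {
  std::uintptr_t start;
  std::uintptr_t end;
  std::string name;
};

// Returns the owning range's name, or "" if the PC belongs to foreign code.
// `ranges` must be sorted by start and non-overlapping.
std::string owner_of_pc(const std::vector<OwnedRange>& ranges, std::uintptr_t pc) {
  // First range whose start is greater than pc; the candidate precedes it.
  auto it = std::upper_bound(
      ranges.begin(), ranges.end(), pc,
      [](std::uintptr_t value, const OwnedRange& r) { return value < r.start; });
  if (it == ranges.begin()) return "";
  --it;
  assert(it->start <= pc);
  return pc < it->end ? it->name : "";
}
```

An empty result would mean "not ours": the handler returns without claiming the exception and lets the next handler decide.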

I may be missing something here.

> With the added safety net of the unhandled exception filter (what happens if multiple parties call this?).

Unhandled Exception Handling predates VEH and has no built-in chaining. The API is similar to signals on Linux/Unix: the last party to register has to save the previous filter (which `SetUnhandledExceptionFilter` returns) and call/chain it accordingly.
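A portable model of that save-and-chain pattern. The real `SetUnhandledExceptionFilter` returns the previously installed top-level filter; here `set_top_filter` imitates that with a plain global slot, and `Filter` is a simplified, hypothetical signature (the real one takes `EXCEPTION_POINTERS*`). The installer keeps the old filter, and the new filter forwards to it after doing its own work.

```cpp
#include <cassert>

// Simplified filter signature; the real one takes EXCEPTION_POINTERS*.
using Filter = long (*)(unsigned code);

constexpr long kContinueSearch = 0;  // mirrors EXCEPTION_CONTINUE_SEARCH

// Models the single process-wide slot behind SetUnhandledExceptionFilter:
// installing a filter returns whatever was installed before.
Filter g_top_filter = nullptr;

Filter set_top_filter(Filter f) {
  Filter previous = g_top_filter;
  g_top_filter = f;
  return previous;
}

Filter g_saved_previous = nullptr;  // saved by our installer
int g_our_calls = 0;

long our_filter(unsigned code) {
  ++g_our_calls;  // the VM's own reporting would happen here
  // Chain: give the previously installed filter its turn.
  if (g_saved_previous != nullptr) return g_saved_previous(code);
  return kContinueSearch;
}

void install_our_filter() {
  g_saved_previous = set_top_filter(&our_filter);
  assert(g_saved_previous != &our_filter && "installed twice");
}
```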

> My only very small personal gripe would be that I always liked how I can quickly use SEH to check if a pointer is valid without disturbing anyone. But within the hotspot at least I can just as well use SafeFetch.

Nothing in the Win32 API stops you from mixing and matching VEH and SEH. If you want to do a `__try { val = *ptr; } __except (EXCEPTION_EXECUTE_HANDLER) { success = false; }` in some C++ code (in vm or native), nothing stops you from doing so. My understanding of the exception handler logic in the OpenJDK on Windows is that the set of EXCEPTION_ACCESS_VIOLATIONs accepted in Java, VM, or native code is limited to a clear subset, and anything outside of these known cases is quickly treated as "an exception we cannot handle". SafeFetch is one such case: the exception handler matches the faulting instruction against the instructions that may legitimately trigger the EXCEPTION_ACCESS_VIOLATION.
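A simplified model of the SafeFetch-style matching described above: the handler only swallows a fault whose PC is in a table of known probe instructions and, if so, redirects execution to a matching continuation. `FaultTable` and `continuation_for` are hypothetical names, not HotSpot's API, and the addresses are plain integers rather than real code.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical table: PC of a load that is allowed to fault -> PC at which
// execution resumes (where the probe returns its "failed" result instead).
using FaultTable = std::map<std::uintptr_t, std::uintptr_t>;

// Returns the continuation PC if `pc` is a registered probe; otherwise
// std::nullopt, meaning the fault is not one the VM may swallow.
std::optional<std::uintptr_t> continuation_for(const FaultTable& table,
                                               std::uintptr_t pc) {
  auto it = table.find(pc);
  if (it == table.end()) return std::nullopt;
  assert(it->second != pc && "resuming at the faulting load would loop");
  return it->second;
}
```

A vectored handler built this way would, on a match, write the continuation into the saved PC and return `EXCEPTION_CONTINUE_EXECUTION`; on no match it would return `EXCEPTION_CONTINUE_SEARCH`, which is what leaves room for a caller's own `__try/__except`.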


Well, in your example, VEH would take precedence and get the exception first; our handler would recognize the exception as not allowed, treat it as a crash, and write an hs-err file. My `success = false;` handler would never execute.

But I admit this is really a minor point. I also dimly remember seeing some Win32 API to check pointers for readability, so maybe using SEH for these things is not necessary anyway.

Thanks, Thomas

--
Ludovic

________________________________________
From: Thomas Stüfe <thomas.stuefe at gmail.com>
Sent: Saturday, July 11, 2020 23:08
To: Ludovic Henry
Cc: hotspot-runtime-dev at openjdk.java.net
Subject: Re: RFR(S): Use Vectored Exception Handling on Windows

Hi Ludovic,

sorry for the delay, and thanks for the extensive answer. Please find remarks inline.

On Fri, Jun 26, 2020 at 12:11 AM Ludovic Henry <luhenry at microsoft.com> wrote:
Hi Thomas,

It seems that the problem you're describing stems from the current exception handler treating two cases: 1. any exception knowingly triggered by Java code and handled by HotSpot (e.g. safepoint polling, arraycopy stubs, stack overflow in Java code), and 2. exceptional cases leading to crashes (e.g. an uncaught C++ exception, an access violation in VM or native/external code, etc.). Unix has the same problem because it has only one mechanism (signal handling) for both cases. Fortunately, Windows offers several mechanisms, each with its own advantages.

The order in which Windows invokes each of these systems is the following:
 1. Vectored Exception Handler registered with `AddVectoredExceptionHandler`
 2. Structured Exception Handler
 3. Vectored Exception Handler registered with `AddVectoredContinueHandler`
 4. Unhandled Exception Handler
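The dispatch order above can be modeled in portable C++ to reason about who sees an exception first. This is a simplified sketch of the order as described in this message, not of the real OS dispatcher: `Dispatcher` and its members are hypothetical, SEH frames are a flat list, and any non-zero return counts as "handled". `add_vectored` imitates the `First` argument of `AddVectoredExceptionHandler`, which decides whether a handler goes to the front or the back of the vectored list.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

constexpr long kContinueSearch = 0;      // mirrors EXCEPTION_CONTINUE_SEARCH
constexpr long kContinueExecution = -1;  // mirrors EXCEPTION_CONTINUE_EXECUTION

using Handler = std::function<long(unsigned code)>;

// Simplified model of the four stages listed above.
struct Dispatcher {
  std::deque<Handler> vectored;    // 1. AddVectoredExceptionHandler
  std::vector<Handler> seh;        // 2. SEH frames, innermost first
  std::vector<Handler> continue_;  // 3. AddVectoredContinueHandler
  Handler unhandled;               // 4. SetUnhandledExceptionFilter

  // Mimics the `First` argument: true puts the handler at the front.
  void add_vectored(Handler h, bool first) {
    assert(h && "handler must be callable");
    if (first) vectored.push_front(std::move(h));
    else vectored.push_back(std::move(h));
  }

  // Returns the stage that claimed the exception.
  std::string dispatch(unsigned code) const {
    for (const auto& h : vectored) if (h(code) != kContinueSearch) return "veh";
    for (const auto& h : seh) if (h(code) != kContinueSearch) return "seh";
    for (const auto& h : continue_) if (h(code) != kContinueSearch) return "vch";
    if (unhandled) unhandled(code);
    return "unhandled";
  }
};
```

With this model, an application handler added with `first = true` runs before a HotSpot handler added with `first = false`, and an application SEH frame only runs once every vectored handler has declined.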

Today, Hotspot on x86/x86_64 catches the exception at 2. via a handler registered with `RtlAddFunctionTable`. This handler deals with both the Java-triggered exceptions and all other exceptions.

Now, from the point of view of an external library or application embedding the JVM inside their own process, they still have all the above options to register an exception handler, irrespective of how Hotspot does it. This creates the following cases:
 - If the application uses VEH: they will (with Hotspot using SEH) be called _before_ Hotspot's exception handler and must therefore be prepared to receive exceptions unrelated to them and ignore them accordingly
 - If the application uses SEH: they will only get exceptions related to their code area

If Hotspot were to use VEH, an exception would play out as follows:
 - If the application uses VEH and their registered handler executes _before_ Hotspot's one: same as above
 - If the application uses VEH and their registered handler executes _after_ Hotspot's one: Hotspot has to make sure that the exception was triggered by Hotspot and ignore it otherwise (a range check on the PC can be used here to emulate how it's done with RtlAddFunctionTable)
 - If the application uses SEH: same as the case where the application's handler executes _after_ Hotspot's one

This all assumes that Hotspot's VEH handler doesn't trigger a crash report (VMError::report_and_die) on any exception it doesn't know how to handle. The simplest way to ensure that is to _not_ report from Hotspot's VEH handler at all, and instead to report from a Win32 Unhandled Exception Filter (registered with SetUnhandledExceptionFilter [1]). This handler is _only_ called when no other exception handler has handled the exception (by returning EXCEPTION_CONTINUE_EXECUTION or EXCEPTION_EXECUTE_HANDLER). Its invocation means the application is "toast" and no longer in a runnable state, which fits nicely with the purpose of the Hotspot crash report.


Okay, if I understand this correctly:

Today:
  App uses VEH - they execute before us and have to handle this correctly (->A)
  App uses SEH - no interaction

With proposed switch:
  App uses VEH - they may or may not execute before us. If they come before us: (->A). If they come after us -> (B)
  App uses SEH -> (B)

A) this case exists today. An app getting signals via VEH would have to willingly ignore signals for us to get them. This does not change; your patch would just make it happen less often, so I do not see a backward compatibility problem here.

B) this is a new case. We would have to ignore signals not meant for us. Technically by just ignoring them. Distinguishing this is a bit difficult though. Note the subtle difference to Unix: there we have signal chaining, so an application which is really really interested in signals for its own purposes uses it (e.g. by preloading libjsig) and then we know its handler and hand over the signal.

On Windows we do not know this (?); we can only distinguish our crashes from their crashes via the crash pc, rejecting any crash not in our code (dynamic or static). Well, arguably this would be just how it is today with our code scoped via SEH. With the added safety net of the unhandled exception filter (what happens if multiple parties call this?).

Okay, this seems safe enough to at least try.

My only very small personal gripe would be that I always liked how I can quickly use SEH to check if a pointer is valid without disturbing anyone. But within the hotspot at least I can just as well use SafeFetch.

Thank you,

Thomas

I hope this sheds some light on possible solutions ahead of us.

Thank you,

--
Ludovic

[1] https://docs.microsoft.com/en-us/windows/win32/api/errhandlingapi/nf-errhandlingapi-setunhandledexceptionfilter
________________________________________
From: Thomas Stüfe <thomas.stuefe at gmail.com>
Sent: Sunday, June 21, 2020 05:55
To: Ludovic Henry
Cc: hotspot-runtime-dev at openjdk.java.net
Subject: Re: RFR(S): Use Vectored Exception Handling on Windows

Hi,

We at SAP had used VEH in our own Windows Itanium port, and I dimly remember it being a source of problems. That was many years ago and I realize that this is not worth much, but it makes me a bit apprehensive about this change.

The main problem I see is that this will be an observable change in behavior.

We currently use SEH, so our error handler is guaranteed to be invoked only for exceptions from within our own code. With VEH we would follow the Unix way of things, and our error handler would suddenly become a global resource.

We would suddenly be invoked for crashes outside the VM, e.g. in foreign launcher code above us or in non-Java side threads, generating whole new classes of hs-err files for crashes the VM is not responsible for. These are then perceived as VM crashes and sent to us vendors instead of going to the right people. This is how it works on Unix today; it is a constant annoyance and increases our support workload.

We may also introduce new problems, since we would suddenly interfere with the application's exception handling. At the very least, we have to think up a scheme for signal chaining (both ways: VM->foreign code and foreign code->VM). For the first, we probably need some form of libjsig preloading, or some other way to divert signal handler installation. That would also require cooperation from the application programmers and/or operators.
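For comparison, the Unix-side chaining mentioned here amounts to saving the previous `sigaction` and forwarding signals the VM does not claim. libjsig automates this by interposing the signal-installation functions; the sketch below does it by hand with standard POSIX calls. `g_claim_next` is a hypothetical stand-in for the VM's real "is this fault mine?" decision, and previous `SIG_DFL`/`SIG_IGN` dispositions are deliberately not emulated.

```cpp
#include <cassert>
#include <csignal>
#include <signal.h>

// Disposition that was installed before ours; filled in by sigaction().
static struct sigaction g_prev_action;

// Hypothetical stand-in for the VM's "is this signal mine?" decision
// (in reality based on the faulting PC, thread state, ...).
static volatile sig_atomic_t g_claim_next = 0;
static volatile sig_atomic_t g_ours_seen = 0;

static void chaining_handler(int sig, siginfo_t* info, void* ucontext) {
  if (g_claim_next) {
    g_ours_seen = 1;  // the VM handles it itself
    return;
  }
  // Not ours: hand the signal over to the previously installed handler.
  if (g_prev_action.sa_flags & SA_SIGINFO) {
    if (g_prev_action.sa_sigaction != nullptr) {
      g_prev_action.sa_sigaction(sig, info, ucontext);
    }
  } else if (g_prev_action.sa_handler != SIG_DFL &&
             g_prev_action.sa_handler != SIG_IGN) {
    g_prev_action.sa_handler(sig);
  }
  // SIG_DFL / SIG_IGN are not emulated in this sketch.
}

// Installs our handler and remembers the previous one for chaining.
bool install_chaining_handler(int sig) {
  assert(sig > 0);
  struct sigaction sa {};
  sa.sa_sigaction = chaining_handler;
  sa.sa_flags = SA_SIGINFO;
  sigemptyset(&sa.sa_mask);
  return sigaction(sig, &sa, &g_prev_action) == 0;
}
```

The Windows difficulty raised in this thread is that there is no equivalent interposition point, so a VEH handler cannot know which foreign handler "owns" an exception it declines.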

Matters are even more complicated since foreign code may use SEH instead of VEH: if a JNI library below me wants to use SEH, does that still work?

I feel this should not be rushed. Even if considered "brittle", SEH has served us well; I do not recall many problems in the past aside from having to add the occasional __try/__except. Are there actual bugs we have to solve?

Lastly, personally I always found SEH quite a neat concept, and one of the few places where Windows was superior to Unix :)

Thanks, Thomas


On Fri, Jun 19, 2020 at 5:23 PM Ludovic Henry <luhenry at microsoft.com> wrote:
Hello,

First, some context and definitions:
- when talking about exceptions here, I'm talking about Win32 exceptions, which are equivalent to signals on Linux and other Unix systems; I am _not_ talking about Java exceptions.
- an explanation of an _exception filter_ can be found at https://docs.microsoft.com/en-us/cpp/cpp/writing-an-exception-filter?view=vs-2019. Java has only a limited form of this concept, with type-based exception filters (e.g. `try { ... } catch (IOException ioe) { ... } catch (Throwable t) { ... }`).
- in Win32, there are two exception handling mechanisms:
  - Structured Exception Handling: the historical one, based on `__try {} __except (...) {}`
  - Vectored Exception Handling: introduced in Windows XP / Windows Server 2003, much more similar to signals on Linux

These exception handling mechanisms are used to catch exceptions such as Access Violation, Stack Overflow, Divide by Zero, Overflow, and more. These exceptions are equivalent to signals on Linux and are therefore core to many mechanisms in the OpenJDK.
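A rough correspondence between some of these exception codes and their closest Linux signals, written as portable C++. The numeric codes are the Windows SDK values of the `EXCEPTION_*` constants named in the comments; the mapping itself is an approximation for orientation, not a table taken from HotSpot, and `closest_signal` is a hypothetical helper.

```cpp
#include <cassert>
#include <csignal>
#include <cstdint>

// Windows SDK values, spelled out numerically so the sketch builds anywhere.
constexpr std::uint32_t kAccessViolation = 0xC0000005;     // EXCEPTION_ACCESS_VIOLATION
constexpr std::uint32_t kIllegalInstruction = 0xC000001D;  // EXCEPTION_ILLEGAL_INSTRUCTION
constexpr std::uint32_t kIntDivideByZero = 0xC0000094;     // EXCEPTION_INT_DIVIDE_BY_ZERO
constexpr std::uint32_t kStackOverflow = 0xC00000FD;       // EXCEPTION_STACK_OVERFLOW

// Closest Linux signal for a Win32 exception code; 0 if not covered here.
int closest_signal(std::uint32_t code) {
  switch (code) {
    case kAccessViolation: return SIGSEGV;
    case kStackOverflow: return SIGSEGV;  // a guard-page hit arrives as SIGSEGV on Linux
    case kIllegalInstruction: return SIGILL;
    case kIntDivideByZero: return SIGFPE;
    default: return 0;
  }
}
```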

Today, the OpenJDK uses Structured Exception Handling to catch such exceptions, which creates several requirements. First, all code that might trigger an exception on purpose (like an Access Violation / SIGSEGV in the arraycopy stub) needs to be wrapped in a __try / __except. Because it's not feasible to wrap every single instance of such code, these __try / __except blocks are put in the top-level function of every thread started by the runtime. Second, for code generated by Hotspot, `RtlAddFunctionTable` is used to simulate the use of __try / __except for a specific code area. This requires platform-specific code that generates a trampoline calling the exception filter declared in the runtime. That function is also meant to be used as a one-to-one mapping with try / catch in user code, not as a "catch all the exceptions in this code area" mechanism. Third, Structured Exception Handling expects to be able to unwind the stack. However, because Hotspot doesn't guarantee that it follows the platform-specific ABI internally, the platform-specific unwinder might break. Hotspot's use of `RtlAddFunctionTable` for the code cache relies on the assumption that Structured Exception Handling never tries to unwind the stack (which it would fail to do because of the different ABI) before calling the registered exception filter.

According to the Windows Kernel maintainers I discussed this with, this approach is highly discouraged and considered brittle, and the better solution is Vectored Exception Handling. Vectored Exception Handling is conceptually much closer to signal / sigaction on Linux and other Unix systems. It catches all exceptions happening across the process, and no __try / __except is required. It also removes the need to call `RtlAddFunctionTable`. The exception filter then behaves like a signal handler, with the ability to modify the registers at will, for example modifying the PC to step over an instruction after an expected Access Violation. Vectored Exception Handling is also already used for AOT code.

The changes can be found at http://cr.openjdk.java.net/~burban/ludovic_vecexc/. As I am not an author, I have not created a corresponding bug in JBS.

Thank you, and looking forward to your feedback!

--
Ludovic



