OpenJDK 18 Linux Bug when Wrapping Clang, Crashes Outside Native Frames
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Mon Jul 18 16:43:59 UTC 2022
Hi Joshua,
I know about coffi, and I'm very excited to see things like this
happening - kudos!
As for clang, my hunch is that you are hitting the dreaded "libclang
crash recovery" issue - e.g.
https://reviews.llvm.org/D23662
Basically, clang installs its own signal handlers, which end up
overriding (at least on Linux) the signal handlers installed by the JVM.
Our jextract implementation (which is based on a Panama port of
libclang) also has to work around this:
https://github.com/openjdk/jextract/blob/master/src/main/java/org/openjdk/jextract/clang/LibClang.java#L54
(On Windows, the recovery logic seems to work OK, but on Linux it causes
spurious crashes, pretty much all over the place.)
In principle, setting the LIBCLANG_DISABLE_CRASH_RECOVERY variable
should be a quick way for you to check if that's indeed the issue.
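
One way to run that check, as a rough sketch only: the variable has to be
in the environment before libclang is initialized, and a running JVM
cannot reliably change its own environment, so the sketch below relaunches
the reproducer in a child JVM with the variable set. The class name
CrashRecoveryCheck and the method childJvm are made up for illustration,
and the value "1" is an assumption (as far as I know libclang only checks
whether the variable is present). Whatever options your reproducer needs
are passed through unchanged.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of coffi or jextract.
class CrashRecoveryCheck {
    // Variable name taken from the message above.
    static final String DISABLE_VAR = "LIBCLANG_DISABLE_CRASH_RECOVERY";

    // Builds (but does not start) a child JVM whose environment has the
    // variable set. javaArgs is everything you would normally type after
    // "java": JVM options, classpath, main class, program arguments.
    static ProcessBuilder childJvm(List<String> javaArgs) {
        List<String> command = new ArrayList<>();
        // Reuse the JVM that is running this launcher.
        command.add(Path.of(System.getProperty("java.home"), "bin", "java").toString());
        command.addAll(javaArgs);

        ProcessBuilder builder = new ProcessBuilder(command);
        // Assumption: any value works; "1" is used here for readability.
        builder.environment().put(DISABLE_VAR, "1");
        // Show the child's stdout/stderr (including any crash log) directly.
        builder.inheritIO();
        return builder;
    }

    public static void main(String[] args) throws Exception {
        int exitCode = childJvm(List.of(args)).start().waitFor();
        System.out.println("child JVM exited with code " + exitCode);
    }
}
```

If the reproducer stops crashing when launched this way, that points at
the crash recovery handlers. Exporting the variable in the shell before
starting java should have the same effect.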
Cheers
Maurizio
On 18/07/2022 17:16, Joshua Suskalo wrote:
> This is the first time I've posted to the panama-dev mailing list, so
> if this isn't the right place for this, please forgive me.
>
> I've been working fairly happily with Panama (apart from not liking that
> Addressable was made sealed in JDK 18) since JDK 17 building a wrapper
> for it in Clojure called coffi[1], but I've recently run into a bug
> that's a bit outside of what I think I can solve. I'm fairly sure it's
> not an error in how I'm using Panama (although tracking it down has
> helped me find and fix some bugs in coffi), and I've been able to
> reproduce it in a short plain Java file.
>
> I have a full listing for a Clojure reproduction case using coffi, a
> Java reproduction case using Panama directly, and a C version that
> appears to work just fine, as well as some test cpp files to use as
> inputs, all available in a paste on sourcehut[2]. The first argument
> to running them is the filename to parse. To run the Clojure version
> you only need to have the Clojure CLI installed (available from most
> package managers), mark the file executable, and run it as a script.
>
> The problem is that when calling functions from clang's C API[3], the
> JVM appears to enter a corrupted state that will eventually crash with
> a SIGSEGV. Usually in my experience this happens outside of native
> stack frames, and when working locally in a REPL in Clojure the actual
> crash that occurred seemed non-deterministic, though that likely just
> had to do with slightly different inputs to the system, as the
> reproduction case I've included[2] appears to have deterministic
> behavior on my machine.
>
> Notably I have not observed similar behavior with other C libraries,
> including ones which use upcalls, which means that this *may* simply
> be an issue of clang corrupting memory through its normal use that
> causes problems with the JVM but which does not affect the C runtime.
> Unfortunately I don't know how to test that theory. I also believe I
> have determined that this is not caused by the native threads in
> clang, as disabling threading by passing the arguments `-mthread-model
> single` to the parsing does not appear to prevent a crash, although in
> my limited testing it *did* appear to increase the delay between
> running the native code and the JVM crashing.
>
> Thanks for your time and for reading,
> Joshua Suskalo
>
> [1]: https://github.com/IGJoshua/coffi
> [2]: https://paste.sr.ht/~srasu/80750a5513bb5e175169465875a155136aad44d7
> [3]: https://clang.llvm.org/doxygen/index.html