[foreign-jextract] Segmentation fault from generated code
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Fri Dec 18 14:51:17 UTC 2020
I managed to reproduce.
It seems to me that libucs is installing signal handlers - and that
these handlers interfere with the JVM's own signal handlers. The fact
that I get this on the command line:
```
Caught signal 11 (Segmentation fault: address not mapped to object at
address 0x14)
==== backtrace (tid: 558482) ====
0
/home/maurizio/Desktop/panama-test/ucx/build/lib/libucs.so.0(ucs_handle_error+0x2a4)
[0x7fe23e0b4c74]
1
/home/maurizio/Desktop/panama-test/ucx/build/lib/libucs.so.0(+0x27e4f)
[0x7fe23e0b4e4f]
2
/home/maurizio/Desktop/panama-test/ucx/build/lib/libucs.so.0(+0x28184)
[0x7fe23e0b5184]
3 [0x7fe27130735b]
=================================
```
Seems very suspicious. This is in the standard output and NOT in the
hotspot trace.
This suggests that libucs is installing a handler for signal 11
(SIGSEGV), and that this handler is accidentally triggered by the
signals Hotspot relies on in C1/C2-compiled code (Hotspot uses signals
to handle certain events, such as NPEs) - see this:
http://mail.openjdk.java.net/pipermail/hotspot-dev/2011-March/003981.html
It is likely that the UCX library installs a handler to detect issues
in memory access - but that doesn't go well with Hotspot.
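As a rough illustration of the Hotspot side of this (a minimal sketch of my own, not taken from the UCX or Hotspot sources; the class and method names are made up for the example): once a method like `read` below is JIT-compiled, Hotspot may drop the explicit null test and rely on the SIGSEGV raised by the faulting load to produce the NullPointerException. If another library has replaced the SIGSEGV handler, that fault can end up in the wrong handler. The faulting address 0x14 in the trace above would at least be consistent with a field access at a small offset from a null reference, but that is a guess.

```java
// Hypothetical demo class; nothing here depends on UCX or jextract.
final class ImplicitNullCheckDemo {

    static final class Box {
        int value = 1;
    }

    // A plain field read. When compiled by C1/C2, the null check for `box`
    // may be implicit, i.e. implemented via a trapped memory access.
    static int read(Box box) {
        return box.value;
    }

    // Calls read() in a hot loop and passes null on every 1000th iteration,
    // counting how many NullPointerExceptions are observed.
    static int countNullPointerExceptions(int iterations) {
        Box present = new Box();
        int caught = 0;
        for (int i = 0; i < iterations; i++) {
            Box box = (i % 1000 == 999) ? null : present;
            try {
                read(box);
            } catch (NullPointerException e) {
                caught++;
            }
        }
        return caught;
    }

    public static void main(String[] args) {
        System.out.println("NPEs caught: " + countNullPointerExceptions(1_000_000));
    }
}
```

On a plain JVM this simply counts the exceptions; whether a given null check is actually compiled as an implicit one is up to the JIT, so the sketch only shows the kind of code that is affected.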
There's an option in Hotspot to minimize signal usage (-Xrs) but that
will still install a handler for SIGSEGV, so it's useless in this
circumstance. Only running with "-Xint" (interpreted mode) allowed me to
run successfully.
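For reference, the "-Xint" workaround can be sketched as a small launcher. This is only an illustration: the class name is hypothetical, the other flags are the ones from Filip's command line quoted below, and it assumes `Main.java` sits in the working directory of a Panama build.

```java
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper that starts a child JVM in interpreted mode.
final class InterpretedLauncher {

    // Builds the command line for running a single-file program with -Xint,
    // using the java binary of the JVM that is currently running.
    static ProcessBuilder interpretedJvm(String mainSource) {
        String javaBin = Path.of(System.getProperty("java.home"), "bin", "java").toString();
        List<String> command = List.of(
                javaBin,
                "-Xint",                          // interpreter only, no C1/C2
                "-Dforeign.restricted=permit",
                "--add-modules", "jdk.incubator.foreign",
                mainSource);
        return new ProcessBuilder(command).inheritIO();
    }

    public static void main(String[] args) throws Exception {
        int exit = interpretedJvm("Main.java").start().waitFor();
        System.out.println("child JVM exited with " + exit);
    }
}
```

Interpreted mode is of course much slower, so this is a way to confirm the diagnosis rather than a fix.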
Maybe there's some configuration option that can be passed to UCX when
building to prevent this signal handler from being installed?
Cheers
Maurizio
On 18/12/2020 14:10, Maurizio Cimadamore wrote:
>
> Thanks - unfortunately libucx seems to be unavailable for Ubuntu
> 20.04/18.04 - I'll try to build the library from scratch and reproduce
> your issue.
>
> Cheers
> Maurizio
>
> On 18/12/2020 12:56, Filip Krakowski wrote:
>> Hi,
>>
>> the same code is running well inside a Docker container (Debian 11).
>> I use Debian as the base image because it provides packages for the
>> ucx development headers. For reference, here is my Dockerfile.
>>
>> FROM debian:bullseye
>>
>> # Use bash shell
>> SHELL ["/bin/bash", "-c"]
>>
>> # Install packages
>> RUN apt update && apt install -y libucx0 libucx-dev curl unzip zip wget llvm-9
>>
>> # Install SDKMAN!
>> RUN curl -s "https://get.sdkman.io" | bash
>>
>> # Initialize SDKMAN!
>> RUN source "$HOME/.sdkman/bin/sdkman-init.sh"
>>
>> # Install latest OpenJDK Panama nightly
>> RUN curl -s "https://coconucos.cs.hhu.de/forschung/jdk/install" | bash
>>
>>
>> Inside the built container I switch to the Panama JDK using "sdk
>> default java panama", jextract the ucp headers using "jextract -l ucp
>> -d . -t org.openucx /usr/include/ucp/api/ucp.h" and run a simple
>> program using "java -Dforeign.restricted=permit --add-modules
>> jdk.incubator.foreign Main.java".
>>
>> import org.openucx.ucx_h.ucp_params_t;
>>
>> public class Main {
>>
>>     public static void main(String[] args) {
>>         var layout = ucp_params_t.$LAYOUT();
>>         System.out.println(layout);
>>     }
>> }
>>
>>
>> The only difference from the other approach (which leads to a
>> segfault) is that there I run jextract locally on my machine (Arch
>> Linux), compile the code afterwards and upload it to our cluster
>> (CentOS). Both machines have the same version (1.9) of ucx installed.
>>
>> Best regards
>> Filip
>>
>> On 12/18/20 12:12 PM, Filip Krakowski wrote:
>>> Hi,
>>>
>>> I ran the code on Linux (CentOS Linux release 8.1.1911) after
>>> installing the "ucx" package (version 1.9). I will create a Docker
>>> container with the environment to reproduce this issue for easier
>>> debugging.
>>>
>>> Best regards
>>> Filip
>>>
>>> On 12/17/20 10:49 PM, Maurizio Cimadamore wrote:
>>>> Hi,
>>>> I haven't seen this particular one.
>>>>
>>>> What platform are you on? What do you need to reproduce?
>>>>
>>>> Thanks
>>>> Maurizio
>>>>
>>>> On 17/12/2020 18:43, Filip Krakowski wrote:
>>>>> Hi,
>>>>>
>>>>> I work on a simple wrapper for ucx
>>>>> (https://github.com/openucx/ucx) and am experiencing a
>>>>> segmentation fault when calling any generated method. The strange
>>>>> thing is that the segmentation fault disappears as soon as I
>>>>> attach a (remote) debugger and manually step through the code.
>>>>>
>>>>> * Screenshot - https://i.imgur.com/okl3epv.png
>>>>>
>>>>> My code only accesses a struct's layout. I don't create any
>>>>> additional threads.
>>>>>
>>>>> log.info("Starting");
>>>>> var layout = ucp_params_t.$LAYOUT();
>>>>> log.info("{}", layout);
>>>>>
>>>>> The generated layout looks like this.
>>>>>
>>>>> static final MemoryLayout ucp_params$struct$LAYOUT_ =
>>>>>     MemoryLayout.ofStruct(
>>>>>         C_LONG.withName("field_mask"),
>>>>>         C_LONG.withName("features"),
>>>>>         C_LONG.withName("request_size"),
>>>>>         C_POINTER.withName("request_init"),
>>>>>         C_POINTER.withName("request_cleanup"),
>>>>>         C_LONG.withName("tag_sender_mask"),
>>>>>         C_INT.withName("mt_workers_shared"),
>>>>>         MemoryLayout.ofPaddingBits(32),
>>>>>         C_LONG.withName("estimated_num_eps"),
>>>>>         C_LONG.withName("estimated_num_ppn")
>>>>>     ).withName("ucp_params");
>>>>>
>>>>>
>>>>> Is this a known issue? I use the latest build from last night.
>>>>>
>>>>> Best regards
>>>>> Filip
>>>
>>