RFR: 8261710: SA DSO objects have sizes that are too large
Chris Plummer
cjplummer at openjdk.java.net
Wed Feb 17 09:55:51 UTC 2021
On Wed, 17 Feb 2021 06:46:27 GMT, Yasumasa Suenaga <ysuenaga at openjdk.org> wrote:
>> If you run ClhsdbPmap.java, you can see pmap output for both core and live processes. The sizes of the maps are very large for both of them, and actually a bit bigger with the live process. Here's the output for a live process:
>>
>> 0x000014755360c000 4048K /usr/lib64/libnss_sss.so.2
>> 0x0000147553815000 4012K /usr/lib64/libnss_files-2.17.so
>> 0x0000147560208000 4064K /usr/lib64/libm-2.17.so
>> 0x000014756050a000 3032K /usr/lib64/librt-2.17.so
>> 0x0000147560712000 32892K /scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/lib/server/libjvm.so
>> 0x0000147562731000 4924K /usr/lib64/libc-2.17.so
>> 0x0000147562aff000 3076K /usr/lib64/libdl-2.17.so
>> 0x0000147562d03000 3060K /usr/lib64/libpthread-2.17.so
>> 0x0000147562f1f000 2948K /usr/lib64/libz.so.1.2.7
>> 0x0000147563135000 2860K /usr/lib64/ld-2.17.so
>> 0x0000147563164000 92K /scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/lib/libnet.so
>> 0x000014756317b000 80K /scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/lib/libnio.so
>> 0x00001475631e0000 156K /scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/lib/libjava.so
>> 0x0000147563207000 128K /scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/lib/libjimage.so
>> 0x000014756332c000 68K /scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/lib/libjli.so
>> 0x0000563c950bf000 16K /scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/bin/java
>> `/usr/lib64/libnss_files-2.17.so` is the one that turned up in the test failure. It's only a 68k file but has a 4064k map. It's second in the list. I'm not sure if this is the order we would always see on Linux systems. My assumption was that the library at the highest address was causing the problem, and that the interpreter was located right after it, but that might not be the case.
>>
>> The address in the interpreter that we are doing findpc on turned up at `libnss_files.so.2 + 0x21b116`, or at an offset of 2200k. I added a "pmap" command to ClhsdbFindPC, and from my test runs the interpreter seemed to always be just before the first library. However, maybe on some systems it is intermixed with the libraries.
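>>
>> (To show why the inflated size matters: findpc attributes an address to the first loaded object whose [base, base + size) range contains it, roughly like the sketch below, so a ~4MB range for libnss_files can swallow addresses that really belong to the interpreter. The class and names here are only illustrative, not the actual SA code.)
>>
>> // Illustrative only (not the actual SA code): findpc-style attribution of an
>> // address to a loaded object by checking each object's [base, base + size) range.
>> // If the size is inflated, an interpreter address can wrongly land inside a DSO.
>> public class FindPcSketch {
>>     record LoadObject(String path, long base, long size) {}
>>
>>     static String findpc(java.util.List<LoadObject> objs, long pc) {
>>         for (LoadObject lo : objs) {
>>             if (pc >= lo.base() && pc < lo.base() + lo.size()) {
>>                 return lo.path() + " + 0x" + Long.toHexString(pc - lo.base());
>>             }
>>         }
>>         return "unknown";
>>     }
>>
>>     public static void main(String[] args) {
>>         // Hypothetical data: a 68k library whose reported size has been inflated to ~4MB.
>>         var objs = java.util.List.of(
>>             new LoadObject("/usr/lib64/libnss_files-2.17.so", 0x147553815000L, 4012 * 1024L));
>>         System.out.println(findpc(objs, 0x147553815000L + 0x21b116L));
>>     }
>> }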
>
> I pushed a new change to use `ELF_PHDR.p_filesz` instead of `p_memsz`. It mostly works, but it is not a perfect solution.
> For example, let's consider libnss_sss (provided by Fedora 33). `/proc/<PID>/maps` shows libnss_sss as follows; there are 5 segments:
>
> 7f0ba6ec5000-7f0ba6ec7000 r--p 00000000 08:03 340133 /usr/lib64/libnss_sss.so.2
> 7f0ba6ec7000-7f0ba6ece000 r-xp 00002000 08:03 340133 /usr/lib64/libnss_sss.so.2
> 7f0ba6ece000-7f0ba6ed0000 r--p 00009000 08:03 340133 /usr/lib64/libnss_sss.so.2
> 7f0ba6ed0000-7f0ba6ed1000 r--p 0000a000 08:03 340133 /usr/lib64/libnss_sss.so.2
> 7f0ba6ed1000-7f0ba6ed2000 rw-p 0000b000 08:03 340133 /usr/lib64/libnss_sss.so.2
>
> However, I could see only 4 LOAD segments when I ran `readelf -l /usr/lib64/libnss_sss.so.2`:
>
> Program Headers:
> Type Offset VirtAddr PhysAddr
> FileSiz MemSiz Flags Align
> LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
> 0x0000000000001468 0x0000000000001468 R 0x1000
> LOAD 0x0000000000002000 0x0000000000002000 0x0000000000002000
> 0x0000000000006931 0x0000000000006931 R E 0x1000
> LOAD 0x0000000000009000 0x0000000000009000 0x0000000000009000
> 0x0000000000001110 0x0000000000001110 R 0x1000
> LOAD 0x000000000000ac78 0x000000000000bc78 0x000000000000bc78
> 0x000000000000044c 0x0000000000000658 RW 0x1000
>
> The Linux kernel seems to split the final segment (0xbc78) into separate RO and RW mappings when it loads the shared library (but I'm not sure).
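>
> Just to make the `p_filesz` idea concrete, here is a rough sketch (not the actual SA/libsaproc code; the class name is made up) that walks the ELF64 program headers of a little-endian DSO and compares the extent computed from `p_filesz` with the one from `p_memsz`:
>
> import java.io.RandomAccessFile;
> import java.nio.ByteBuffer;
> import java.nio.ByteOrder;
>
> // Illustration only: dump the PT_LOAD entries of an ELF64 (little-endian) DSO
> // and compare the library extent computed from p_filesz vs. p_memsz.
> public class DumpLoadSegments {
>     public static void main(String[] args) throws Exception {
>         try (RandomAccessFile raf = new RandomAccessFile(args[0], "r")) {
>             ByteBuffer buf = ByteBuffer.allocate((int) raf.length()).order(ByteOrder.LITTLE_ENDIAN);
>             raf.getChannel().read(buf);
>
>             long phoff     = buf.getLong(32);               // e_phoff (ELF64 layout)
>             int  phentsize = buf.getShort(54) & 0xffff;     // e_phentsize
>             int  phnum     = buf.getShort(56) & 0xffff;     // e_phnum
>
>             long minVaddr = Long.MAX_VALUE, maxFileEnd = 0, maxMemEnd = 0;
>             for (int i = 0; i < phnum; i++) {
>                 int base = (int) (phoff + (long) i * phentsize);
>                 if (buf.getInt(base) != 1) continue;        // PT_LOAD only
>                 long vaddr  = buf.getLong(base + 16);       // p_vaddr
>                 long filesz = buf.getLong(base + 32);       // p_filesz
>                 long memsz  = buf.getLong(base + 40);       // p_memsz
>                 System.out.printf("LOAD vaddr=0x%x filesz=0x%x memsz=0x%x%n", vaddr, filesz, memsz);
>                 minVaddr   = Math.min(minVaddr, vaddr);
>                 maxFileEnd = Math.max(maxFileEnd, vaddr + filesz);
>                 maxMemEnd  = Math.max(maxMemEnd, vaddr + memsz);
>             }
>             System.out.printf("extent from p_filesz: 0x%x, from p_memsz: 0x%x%n",
>                               maxFileEnd - minVaddr, maxMemEnd - minVaddr);
>         }
>     }
> }
>
> This only computes the span of the PT_LOAD segments; it does not model how the loader page-aligns or splits them.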
>
> I think we need to refactor shared library handling in a different way.
>
> For a live process, we can use `/proc/<PID>/maps`.
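>
> Something along these lines (just a sketch, not the SA code) would be enough to get each library's base address and total mapped size:
>
> import java.nio.file.*;
> import java.util.*;
>
> // Rough sketch: compute each mapped file's base address and overall mapped
> // extent from /proc/<PID>/maps on Linux.
> public class ProcMaps {
>     public static void main(String[] args) throws Exception {
>         Map<String, long[]> libs = new LinkedHashMap<>();   // path -> {lowest start, highest end}
>         for (String line : Files.readAllLines(Paths.get("/proc/" + args[0] + "/maps"))) {
>             String[] f = line.trim().split("\\s+");
>             if (f.length < 6 || !f[5].startsWith("/")) continue;   // file-backed mappings only
>             String[] range = f[0].split("-");
>             long start = Long.parseUnsignedLong(range[0], 16);
>             long end   = Long.parseUnsignedLong(range[1], 16);
>             libs.merge(f[5], new long[] {start, end},
>                        (a, b) -> new long[] {Math.min(a[0], b[0]), Math.max(a[1], b[1])});
>         }
>         libs.forEach((path, r) ->
>             System.out.printf("0x%016x %dK %s%n", r[0], (r[1] - r[0]) / 1024, path));
>     }
> }
>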
> For a coredump, we can use the `NT_FILE` note in the core file's note section. It has the valid segments, as shown below.
>
> $ readelf -n core
> :
> 0x00007f0ba6ec5000 0x00007f0ba6ec7000 0x0000000000000000
> 0x00007f0ba6ec7000 0x00007f0ba6ece000 0x0000000000000002
> 0x00007f0ba6ece000 0x00007f0ba6ed0000 0x0000000000000009
> 0x00007f0ba6ed0000 0x00007f0ba6ed1000 0x000000000000000a
> 0x00007f0ba6ed1000 0x00007f0ba6ed2000 0x000000000000000b
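>
> The `NT_FILE` payload itself is easy to walk once the note has been located in the core's PT_NOTE segment. A rough sketch for a 64-bit little-endian core (illustration only, not the SA code; the caller is assumed to pass in the raw note descriptor bytes):
>
> import java.nio.ByteBuffer;
> import java.nio.ByteOrder;
>
> // Rough sketch: decode the descriptor of an NT_FILE note from a 64-bit
> // little-endian core file. Locating the note inside PT_NOTE is left to the caller.
> public class NtFileNote {
>     public static void dump(byte[] desc) {
>         ByteBuffer buf = ByteBuffer.wrap(desc).order(ByteOrder.LITTLE_ENDIAN);
>         long count    = buf.getLong();     // number of mapped-file entries
>         long pageSize = buf.getLong();     // unit for the per-entry file offset
>         long[][] ranges = new long[(int) count][3];
>         for (int i = 0; i < count; i++) {
>             ranges[i][0] = buf.getLong();  // start address
>             ranges[i][1] = buf.getLong();  // end address
>             ranges[i][2] = buf.getLong();  // file offset, in units of pageSize
>         }
>         // The filenames follow as NUL-terminated strings, one per entry, in order.
>         for (int i = 0; i < count; i++) {
>             StringBuilder name = new StringBuilder();
>             for (byte b; (b = buf.get()) != 0; ) name.append((char) b);
>             System.out.printf("0x%016x-0x%016x fileofs=0x%x %s%n",
>                               ranges[i][0], ranges[i][1], ranges[i][2] * pageSize, name);
>         }
>     }
> }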
>
>
> But that would be a big change to the SA.
> As an option, we can integrate this change first, and then do the refactoring later.
> What do you think?
> (Of course I want to resolve this problem with a smaller fix if I can, so other solutions are welcome.)
@YaSuenag https://bugs.openjdk.java.net/browse/JDK-8250826 is the bug I was thinking of; it sounds like the RO/RW issue you were talking about.
-------------
PR: https://git.openjdk.java.net/jdk/pull/2563