RFR: 8261710: SA DSO objects have sizes that are too large
Chris Plummer
cjplummer at openjdk.java.net
Wed Feb 17 19:28:39 UTC 2021
On Wed, 17 Feb 2021 06:46:27 GMT, Yasumasa Suenaga <ysuenaga at openjdk.org> wrote:
>> If you run ClhsdbPmap.java, you can see pmap output for both core and live processes. The sizes of the maps are very large for both of them, and actually a bit bigger with the live process. Here's the output for a live process:
>>
>> 0x000014755360c000 4048K /usr/lib64/libnss_sss.so.2
>> 0x0000147553815000 4012K /usr/lib64/libnss_files-2.17.so
>> 0x0000147560208000 4064K /usr/lib64/libm-2.17.so
>> 0x000014756050a000 3032K /usr/lib64/librt-2.17.so
>> 0x0000147560712000 32892K /scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/lib/server/libjvm.so
>> 0x0000147562731000 4924K /usr/lib64/libc-2.17.so
>> 0x0000147562aff000 3076K /usr/lib64/libdl-2.17.so
>> 0x0000147562d03000 3060K /usr/lib64/libpthread-2.17.so
>> 0x0000147562f1f000 2948K /usr/lib64/libz.so.1.2.7
>> 0x0000147563135000 2860K /usr/lib64/ld-2.17.so
>> 0x0000147563164000 92K /scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/lib/libnet.so
>> 0x000014756317b000 80K /scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/lib/libnio.so
>> 0x00001475631e0000 156K /scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/lib/libjava.so
>> 0x0000147563207000 128K /scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/lib/libjimage.so
>> 0x000014756332c000 68K /scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/lib/libjli.so
>> 0x0000563c950bf000 16K /scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/bin/java
>> `/usr/lib64/libnss_files-2.17.so` is the one that turned up in the test failure. It's only a 68k file but has a 4012K map. It's second in the list. I'm not sure if this is the order we would always see on Linux systems. My assumption was that the library at the highest address was causing the problem, and that the interpreter was located right after it, but that might not be the case.
>>
>> The interpreter address we are running findpc on turned up at `libnss_files.so.2 + 0x21b116`, an offset of about 2156K. I added a "pmap" command to ClhsdbFindPC, and in my test runs the interpreter always seemed to be just before the first library. However, on some systems it may be intermixed with the libraries.
>
> I pushed a new change that uses `ELF_PHDR.p_filesz` instead of `p_memsz`. It mostly works, but it is not a perfect solution.
> For example, consider libnss_sss (provided by Fedora 33). `/proc/<PID>/maps` shows the following 5 segments for it:
>
> 7f0ba6ec5000-7f0ba6ec7000 r--p 00000000 08:03 340133 /usr/lib64/libnss_sss.so.2
> 7f0ba6ec7000-7f0ba6ece000 r-xp 00002000 08:03 340133 /usr/lib64/libnss_sss.so.2
> 7f0ba6ece000-7f0ba6ed0000 r--p 00009000 08:03 340133 /usr/lib64/libnss_sss.so.2
> 7f0ba6ed0000-7f0ba6ed1000 r--p 0000a000 08:03 340133 /usr/lib64/libnss_sss.so.2
> 7f0ba6ed1000-7f0ba6ed2000 rw-p 0000b000 08:03 340133 /usr/lib64/libnss_sss.so.2
>
> However, I see only 4 LOAD segments in libnss_sss.so when I run `readelf -l /usr/lib64/libnss_sss.so.2`:
>
> Program Headers:
> Type Offset VirtAddr PhysAddr
> FileSiz MemSiz Flags Align
> LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
> 0x0000000000001468 0x0000000000001468 R 0x1000
> LOAD 0x0000000000002000 0x0000000000002000 0x0000000000002000
> 0x0000000000006931 0x0000000000006931 R E 0x1000
> LOAD 0x0000000000009000 0x0000000000009000 0x0000000000009000
> 0x0000000000001110 0x0000000000001110 R 0x1000
> LOAD 0x000000000000ac78 0x000000000000bc78 0x000000000000bc78
> 0x000000000000044c 0x0000000000000658 RW 0x1000
>
> The Linux kernel seems to split the final segment (0xbc78) into RO and RW segments when it loads the shared library (but I'm not sure).
>
> I think we need to refactor how shared libraries are handled.
>
> For a live process, we can use `/proc/<PID>/maps`.
> For a core dump, we can use the `NT_FILE` note in the core file's note section, which has the valid segments, as below.
>
> $ readelf -n core
> :
> 0x00007f0ba6ec5000 0x00007f0ba6ec7000 0x0000000000000000
>
> 0x00007f0ba6ec7000 0x00007f0ba6ece000 0x0000000000000002
>
> 0x00007f0ba6ece000 0x00007f0ba6ed0000 0x0000000000000009
>
> 0x00007f0ba6ed0000 0x00007f0ba6ed1000 0x000000000000000a
>
> 0x00007f0ba6ed1000 0x00007f0ba6ed2000 0x000000000000000b
>
>
> But that would be a big change to SA.
> As an option, we could integrate this change first and do the refactoring afterwards.
> What do you think?
> (Of course, I would prefer to resolve this problem with a smaller fix if I can, so other solutions are welcome.)
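To make the segment comparison concrete, here is a minimal Python sketch (not SA code; the helper names `load_ranges`, `parse_maps_line`, `extent` and `covered_pages` are hypothetical, and 4K pages are assumed to match the 0x1000 Align column). It page-rounds each PT_LOAD header from the readelf output quoted above and compares the result with the quoted `/proc/<PID>/maps` entries. The four page-rounded LOAD ranges cover exactly the same 13 pages (52K) as the five mappings; the extra mapping comes only from the last LOAD range [0xb000, 0xd000) appearing as an r-- page followed by an rw- page.

```python
PAGE_SIZE = 0x1000  # assumption: 4K pages, matching the Align column above

def page_down(x: int) -> int:
    return x & ~(PAGE_SIZE - 1)

def page_up(x: int) -> int:
    return (x + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)

def load_ranges(phdrs, use_filesz=False):
    """Page-aligned [start, end) ranges, relative to the load base, for PT_LOAD headers.

    phdrs is a list of (p_vaddr, p_filesz, p_memsz) tuples.
    """
    ranges = []
    for vaddr, filesz, memsz in phdrs:
        size = filesz if use_filesz else memsz
        ranges.append((page_down(vaddr), page_up(vaddr + size)))
    return ranges

def parse_maps_line(line):
    """Split one /proc/<PID>/maps line into (start, end, perms, offset, path)."""
    # maxsplit=5 keeps any spaces inside the pathname intact
    fields = line.strip().split(maxsplit=5)
    start, end = (int(x, 16) for x in fields[0].split("-"))
    path = fields[5] if len(fields) > 5 else ""
    return start, end, fields[1], int(fields[2], 16), path

def extent(ranges):
    """Lowest start and highest end over a list of (start, end) ranges."""
    return min(s for s, _ in ranges), max(e for _, e in ranges)

def covered_pages(ranges):
    """Set of page-start offsets touched by the given ranges."""
    return {p for s, e in ranges for p in range(s, e, PAGE_SIZE)}

# (p_vaddr, p_filesz, p_memsz) of the four LOAD headers from `readelf -l` above
LIBNSS_SSS_LOADS = [
    (0x0000, 0x1468, 0x1468),
    (0x2000, 0x6931, 0x6931),
    (0x9000, 0x1110, 0x1110),
    (0xBC78, 0x044C, 0x0658),
]

# The five /proc/<PID>/maps entries for the same library, as quoted above
LIBNSS_SSS_MAPS = """\
7f0ba6ec5000-7f0ba6ec7000 r--p 00000000 08:03 340133 /usr/lib64/libnss_sss.so.2
7f0ba6ec7000-7f0ba6ece000 r-xp 00002000 08:03 340133 /usr/lib64/libnss_sss.so.2
7f0ba6ece000-7f0ba6ed0000 r--p 00009000 08:03 340133 /usr/lib64/libnss_sss.so.2
7f0ba6ed0000-7f0ba6ed1000 r--p 0000a000 08:03 340133 /usr/lib64/libnss_sss.so.2
7f0ba6ed1000-7f0ba6ed2000 rw-p 0000b000 08:03 340133 /usr/lib64/libnss_sss.so.2
"""

if __name__ == "__main__":
    maps = [parse_maps_line(line) for line in LIBNSS_SSS_MAPS.splitlines()]
    base = maps[0][0]  # assumption: the first mapping starts at the load base
    rel = [(s - base, e - base) for s, e, *_ in maps]
    loads = load_ranges(LIBNSS_SSS_LOADS)
    lo, hi = extent(loads)
    print(f"LOAD headers: {len(loads)} ranges, {(hi - lo) // 1024}K")  # 4 ranges, 52K
    lo, hi = extent(rel)
    print(f"/proc maps:   {len(rel)} ranges, {(hi - lo) // 1024}K")  # 5 ranges, 52K
    print("same pages:", covered_pages(rel) == covered_pages(loads))  # True
```

For this particular library, `use_filesz=True` yields the same page-rounded ranges as `p_memsz`, because the .bss tail of the last segment (0x44c vs 0x658) still ends inside the page at 0xc000.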
@YaSuenag I asked Dan to run a modified `ClhsdbFindPC` that also issues a `clhsdb pmap` command so we can see what the maps look like and compare them to the address being looked up. This is before your latest fix, so the sizes are still too big, but that's ok for this analysis. First, this is the `findpc` command that was supposed to show the address in the interpreter:
hsdb> + findpc 0x00002ab36ca942b6
Address 0x00002ab36ca942b6: /lib/x86_64-linux-gnu/libnss_files.so.2 + 0x21b2b6
And here's the pmap output. I had to manually sort it by address, and I also added the location of the interpreter address being looked up.
0x00002ab3692ae000 3400K /lib64/ld-linux-x86-64.so.2
0x00002ab3692e0000 12K <jdkdir>/test/hotspot/jtreg/native/libLingeredApp.so
0x00002ab3692ed000 84K <jdkdir>/jdk/bin/../lib/libjli.so
0x00002ab369406000 144K <jdkdir>/jdk/lib/libjimage.so
0x00002ab36942a000 200K <jdkdir>/jdk/lib/libjava.so
0x00002ab3694bc000 88K <jdkdir>/jdk/lib/libnio.so
0x00002ab3694d6000 3240K /lib/x86_64-linux-gnu/libz.so.1
0x00002ab3696f0000 3136K /lib/x86_64-linux-gnu/libpthread.so.0
0x00002ab36990d000 3020K /lib/x86_64-linux-gnu/libdl.so.2
0x00002ab369b11000 5052K /lib/x86_64-linux-gnu/libc.so.6
0x00002ab369edb000 31100K <jdkdir>/jdk/lib/server/libjvm.so
0x00002ab36bd3a000 2840K /lib/x86_64-linux-gnu/librt.so.1
0x00002ab36bf42000 4856K /lib/x86_64-linux-gnu/libm.so.6
0x00002ab36c24b000 3796K /lib/x86_64-linux-gnu/libnss_compat.so.2
0x00002ab36c454000 3760K /lib/x86_64-linux-gnu/libnsl.so.1
0x00002ab36c66d000 3660K /lib/x86_64-linux-gnu/libnss_nis.so.2
0x00002ab36c879000 3612K /lib/x86_64-linux-gnu/libnss_files.so.2
0x00002ab36ca942b6: /lib/x86_64-linux-gnu/libnss_files.so.2 + 0x21b2b6
0x00002ab38fc08000 112K <jdkdir>/jdk/lib/libnet.so
0x00002ab38fc55000 3756K /usr/lib/x86_64-linux-gnu/libstdc++.so.6
0x00002ab3bc000000 4096K /lib/x86_64-linux-gnu/libgcc_s.so.1
0x00005652c8fd0000 16K <jdkdir>/jdk/bin/java
There appears to be a very large gap (about 590MB) between `libnss_files.so.2` and `libnet.so`, so I assume many HotSpot memory allocations are located in this area, including the interpreter.
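The misattribution can be reproduced from the pmap numbers alone. The following is a minimal Python sketch (a hypothetical helper, not SA's actual findpc implementation) that treats each pmap entry as the half-open range [base, base + size), sorts the entries by base, and uses `bisect` to find the entry containing an address. It uses only a subset of the entries above, with sizes in K (1024 bytes). With the oversized 3612K reported for `libnss_files.so.2`, the interpreter address lands inside that library at offset 0x21b2b6, matching the findpc output; the same data gives the size of the gap up to `libnet.so`.

```python
import bisect

# Subset of the pmap entries above: (base address, size in K, path)
PMAP = [
    (0x00005652C8FD0000, 16, "<jdkdir>/jdk/bin/java"),
    (0x00002AB36C66D000, 3660, "/lib/x86_64-linux-gnu/libnss_nis.so.2"),
    (0x00002AB36C879000, 3612, "/lib/x86_64-linux-gnu/libnss_files.so.2"),
    (0x00002AB38FC08000, 112, "<jdkdir>/jdk/lib/libnet.so"),
    (0x00002AB38FC55000, 3756, "/usr/lib/x86_64-linux-gnu/libstdc++.so.6"),
]

def find_containing(sorted_entries, addr):
    """Return (path, offset) if addr lies in [base, base + size) of some entry, else None.

    sorted_entries must be sorted by base address.
    """
    bases = [base for base, _, _ in sorted_entries]
    # Index of the last entry whose base is <= addr
    i = bisect.bisect_right(bases, addr) - 1
    if i < 0:
        return None  # addr is below every entry
    base, size_k, path = sorted_entries[i]
    if addr < base + size_k * 1024:
        return path, addr - base
    return None  # addr is past the end of the nearest lower entry

if __name__ == "__main__":
    entries = sorted(PMAP)  # tuples sort by their first element, the base address
    path, offset = find_containing(entries, 0x00002AB36CA942B6)
    print(f"{path} + {offset:#x}")  # /lib/x86_64-linux-gnu/libnss_files.so.2 + 0x21b2b6

    by_path = {p: (base, size_k) for base, size_k, p in PMAP}
    nss_base, nss_size_k = by_path["/lib/x86_64-linux-gnu/libnss_files.so.2"]
    libnet_base, _ = by_path["<jdkdir>/jdk/lib/libnet.so"]
    print(libnet_base - nss_base)                         # 590934016 bytes from base to base
    print(libnet_base - (nss_base + nss_size_k * 1024))  # 587235328 bytes from the reported end
```

A containment lookup like this is only as good as the sizes it is given: any address in the gap that falls below base + size of the nearest lower entry is reported as part of that library, which is exactly what happens to the interpreter address here.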
-------------
PR: https://git.openjdk.java.net/jdk/pull/2563
More information about the serviceability-dev mailing list