RFR: 8261710: SA DSO objects have sizes that are too large

Chris Plummer cjplummer at openjdk.java.net
Wed Feb 17 19:28:39 UTC 2021


On Wed, 17 Feb 2021 06:46:27 GMT, Yasumasa Suenaga <ysuenaga at openjdk.org> wrote:

>> If you run ClhsdbPmap.java, you can see pmap output for both core and live processes. The sizes of the maps are very large for both of them, and actually a bit bigger with the live process. Here's the output for a live process:
>> 
>> 0x000014755360c000	4048K	/usr/lib64/libnss_sss.so.2
>> 0x0000147553815000	4012K	/usr/lib64/libnss_files-2.17.so
>> 0x0000147560208000	4064K	/usr/lib64/libm-2.17.so
>> 0x000014756050a000	3032K	/usr/lib64/librt-2.17.so
>> 0x0000147560712000	32892K	/scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/lib/server/libjvm.so
>> 0x0000147562731000	4924K	/usr/lib64/libc-2.17.so
>> 0x0000147562aff000	3076K	/usr/lib64/libdl-2.17.so
>> 0x0000147562d03000	3060K	/usr/lib64/libpthread-2.17.so
>> 0x0000147562f1f000	2948K	/usr/lib64/libz.so.1.2.7
>> 0x0000147563135000	2860K	/usr/lib64/ld-2.17.so
>> 0x0000147563164000	92K	/scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/lib/libnet.so
>> 0x000014756317b000	80K	/scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/lib/libnio.so
>> 0x00001475631e0000	156K	/scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/lib/libjava.so
>> 0x0000147563207000	128K	/scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/lib/libjimage.so
>> 0x000014756332c000	68K	/scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/lib/libjli.so
>> 0x0000563c950bf000	16K	/scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/bin/java
>> `/usr/lib64/libnss_files-2.17.so` is the one that turned up in the test failure. It's only a 68k file but has a 4064k map. It's second in the list. I'm not sure if this is the order we would always see on Linux systems. My assumption was that the library at the highest address was causing the problem, and that the interpreter was located right after it, but that might not be the case.
>> 
>> The address in the interpreter that we are doing findpc on turned up at `libnss_files.so.2 + 0x21b116`, or at an offset of 2200k. I added a "pmap" command to ClhsdbFindPC, and from my test runs the interpreter seemed to always be just before the first library. However, maybe on some systems it is intermixed with the libraries.
>
> I pushed a new change to use `ELF_PHDR.p_filesz` instead of `p_memsz`. It almost works, but it is not a perfect solution.
> For example, consider libnss_sss (provided by Fedora 33). `/proc/<PID>/maps` shows it as follows; there are 5 segments.
> 
> 7f0ba6ec5000-7f0ba6ec7000 r--p 00000000 08:03 340133                     /usr/lib64/libnss_sss.so.2
> 7f0ba6ec7000-7f0ba6ece000 r-xp 00002000 08:03 340133                     /usr/lib64/libnss_sss.so.2
> 7f0ba6ece000-7f0ba6ed0000 r--p 00009000 08:03 340133                     /usr/lib64/libnss_sss.so.2
> 7f0ba6ed0000-7f0ba6ed1000 r--p 0000a000 08:03 340133                     /usr/lib64/libnss_sss.so.2
> 7f0ba6ed1000-7f0ba6ed2000 rw-p 0000b000 08:03 340133                     /usr/lib64/libnss_sss.so.2
> 
> However, I could see only 4 segments in libnss_sss.so.2 when I ran `readelf -l /usr/lib64/libnss_sss.so.2`:
> 
> Program Headers:
>   Type           Offset             VirtAddr           PhysAddr
>                  FileSiz            MemSiz              Flags  Align
>   LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
>                  0x0000000000001468 0x0000000000001468  R      0x1000
>   LOAD           0x0000000000002000 0x0000000000002000 0x0000000000002000
>                  0x0000000000006931 0x0000000000006931  R E    0x1000
>   LOAD           0x0000000000009000 0x0000000000009000 0x0000000000009000
>                  0x0000000000001110 0x0000000000001110  R      0x1000
>   LOAD           0x000000000000ac78 0x000000000000bc78 0x000000000000bc78
>                  0x000000000000044c 0x0000000000000658  RW     0x1000
> 
> The Linux kernel seems to split the final segment (0xbc78) into RO and RW mappings when it loads the shared library (but I'm not sure).
> 
> I think we need to refactor how shared libraries are handled.
> 
> For a live process, we can use `/proc/<PID>/maps`.
> For a coredump, we can use `NT_FILE` in the note section of the core file. It has valid segments, as shown below.
> 
> $ readelf -n core
>   :
>     0x00007f0ba6ec5000  0x00007f0ba6ec7000  0x0000000000000000
> 
>     0x00007f0ba6ec7000  0x00007f0ba6ece000  0x0000000000000002
> 
>     0x00007f0ba6ece000  0x00007f0ba6ed0000  0x0000000000000009
> 
>     0x00007f0ba6ed0000  0x00007f0ba6ed1000  0x000000000000000a
> 
>     0x00007f0ba6ed1000  0x00007f0ba6ed2000  0x000000000000000b
> 
> 
> But that would be a big change to SA.
> As an option, we can integrate this change first and refactor later.
> What do you think?
> (Of course, I want to resolve this problem with a smaller fix if I can, so other solutions are welcome.)
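
To make the `/proc/<PID>/maps` idea from the quoted message concrete, here is a minimal sketch in Python rather than SA's own code. It takes the five libnss_sss lines quoted above and collapses them into one extent per file. The function name `library_extents` is hypothetical, and using the lowest start and highest end as the extent is an assumption; it would also cover any holes between mappings of the same file.

```python
# Sample lines copied from the /proc/<PID>/maps listing quoted above.
MAPS = """\
7f0ba6ec5000-7f0ba6ec7000 r--p 00000000 08:03 340133                     /usr/lib64/libnss_sss.so.2
7f0ba6ec7000-7f0ba6ece000 r-xp 00002000 08:03 340133                     /usr/lib64/libnss_sss.so.2
7f0ba6ece000-7f0ba6ed0000 r--p 00009000 08:03 340133                     /usr/lib64/libnss_sss.so.2
7f0ba6ed0000-7f0ba6ed1000 r--p 0000a000 08:03 340133                     /usr/lib64/libnss_sss.so.2
7f0ba6ed1000-7f0ba6ed2000 rw-p 0000b000 08:03 340133                     /usr/lib64/libnss_sss.so.2
"""


def library_extents(maps_text):
    """Return {path: (lowest_start, highest_end)} for file-backed mappings.

    Hypothetical helper: each maps line has the form
    'start-end perms offset dev inode path'.
    """
    extents = {}
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        # Skip anonymous mappings and pseudo-entries such as [heap] or [stack].
        if len(fields) < 6 or not fields[5].startswith("/"):
            continue
        start_text, end_text = fields[0].split("-")
        start, end = int(start_text, 16), int(end_text, 16)
        path = fields[5].strip()
        lo, hi = extents.get(path, (start, end))
        extents[path] = (min(lo, start), max(hi, end))
    return extents


extents = library_extents(MAPS)
for path, (lo, hi) in extents.items():
    # Same column layout as the pmap output: base address, size in K, path.
    print(f"0x{lo:016x}\t{(hi - lo) // 1024}K\t{path}")
```

For this sample the extent is 0xd000 bytes, so the printed size is 52K, far smaller than the multi-megabyte sizes in the pmap listings.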

@YaSuenag I asked Dan to run a modified `ClhsdbFindPC` that also issues a `clhsdb pmap` command so we can see what the maps look like, and compare them to the address being looked up. This is before your latest fix, so the sizes are still too big, but that's ok for this analysis. First, this is the `findpc` command that was supposed to show the address in the interpreter:

hsdb> + findpc 0x00002ab36ca942b6
Address 0x00002ab36ca942b6: /lib/x86_64-linux-gnu/libnss_files.so.2 + 0x21b2b6

And here's the pmap output. I had to manually sort it by address, and I also added the location of the interpreter address being looked up.

0x00005652c8fd0000	16K	<jdkdir>/jdk/bin/java
0x00002ab3692ae000	3400K	/lib64/ld-linux-x86-64.so.2
0x00002ab3692e0000	12K	<jdkdir>/test/hotspot/jtreg/native/libLingeredApp.so
0x00002ab3692ed000	84K	<jdkdir>/jdk/bin/../lib/libjli.so
0x00002ab369406000	144K	<jdkdir>/jdk/lib/libjimage.so
0x00002ab36942a000	200K	<jdkdir>/jdk/lib/libjava.so
0x00002ab3694bc000	88K	<jdkdir>/jdk/lib/libnio.so
0x00002ab3694d6000	3240K	/lib/x86_64-linux-gnu/libz.so.1
0x00002ab3696f0000	3136K	/lib/x86_64-linux-gnu/libpthread.so.0
0x00002ab36990d000	3020K	/lib/x86_64-linux-gnu/libdl.so.2
0x00002ab369b11000	5052K	/lib/x86_64-linux-gnu/libc.so.6
0x00002ab369edb000	31100K	<jdkdir>/jdk/lib/server/libjvm.so
0x00002ab36bd3a000	2840K	/lib/x86_64-linux-gnu/librt.so.1
0x00002ab36bf42000	4856K	/lib/x86_64-linux-gnu/libm.so.6
0x00002ab36c24b000	3796K	/lib/x86_64-linux-gnu/libnss_compat.so.2
0x00002ab36c454000	3760K	/lib/x86_64-linux-gnu/libnsl.so.1
0x00002ab36c66d000	3660K	/lib/x86_64-linux-gnu/libnss_nis.so.2
0x00002ab36c879000	3612K	/lib/x86_64-linux-gnu/libnss_files.so.2
0x00002ab36ca942b6: /lib/x86_64-linux-gnu/libnss_files.so.2 + 0x21b2b6
0x00002ab38fc08000	112K	<jdkdir>/jdk/lib/libnet.so
0x00002ab38fc55000	3756K	/usr/lib/x86_64-linux-gnu/libstdc++.so.6
0x00002ab3bc000000	4096K	/lib/x86_64-linux-gnu/libgcc_s.so.1

There appears to be a very large gap between `libnss_files.so.2` and `libnet.so` (about 590 MB), so I assume a lot of HotSpot memory allocations are located in this area, including the interpreter.
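
The arithmetic behind this can be checked with a short Python sketch using only the values shown above. The range test `base <= pc < base + size` is an assumption about how the lookup presumably attributes an address to a library; it is not taken from the SA source.

```python
KIB = 1024

# Values copied from the pmap and findpc output above.
nss_files_base = 0x00002AB36C879000
nss_files_size = 3612 * KIB  # size as reported by pmap (known to be too large)
libnet_base = 0x00002AB38FC08000
pc = 0x00002AB36CA942B6  # interpreter address passed to findpc

# Offset of the address from the start of libnss_files.so.2.
offset = pc - nss_files_base

# Assumed lookup rule: an address belongs to a library if it falls
# inside [base, base + size).
inside_reported_map = nss_files_base <= pc < nss_files_base + nss_files_size

# Distance between the two library base addresses, in decimal megabytes.
gap_mb = (libnet_base - nss_files_base) // 10**6

print(hex(offset))          # 0x21b2b6
print(inside_reported_map)  # True
print(gap_mb)               # 590
```

With the inflated 3612K size, the interpreter address falls inside the range reported for `libnss_files.so.2`, which matches the `findpc` result.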

-------------

PR: https://git.openjdk.java.net/jdk/pull/2563

