RFR(xs): 8251255: [linux] Add process-memory information to hs-err and VM.info

David Holmes david.holmes at oracle.com
Mon Aug 10 08:11:20 UTC 2020


Hi Thomas,

On 10/08/2020 4:13 pm, Thomas Stüfe wrote:
> On Mon, Aug 10, 2020 at 6:50 AM David Holmes <david.holmes at oracle.com 
> <mailto:david.holmes at oracle.com>> wrote:
> 
>     Hi Thomas,
> 
>     On 7/08/2020 12:01 am, Thomas Stüfe wrote:
>      > Hi,
>      >
>      > may I please have reviews for this addition:
>      >
>      > JBS: https://bugs.openjdk.java.net/browse/JDK-8251255
>      > Webrev:
>      >
>     http://cr.openjdk.java.net/~stuefe/webrevs/8251255-add-rss-to-hs-err/webrev.00/webrev/
>      >
>      > When analysing a customer memory leak recently I found for all
>      > the information we have in the error files we miss some really
>     fundamental
>      > statistics like process virtual size and rss. Either that or I am
>     blind :)
> 
>     This doesn't seem unreasonable, though I don't like to see continued
>     divergence between what gets reported on Linux versus other platforms.
> 
> 
> Neither do I but Linux is kind of the most important one so I do not 
> feel bad for enhancing only this; and people interested in other 
> platforms can always step in.

Ok.

> My problem here is while adding similar printouts for other platforms is 
> trivial, I do not like printing out stuff I do not have investigated and 
> fully understood / tested before. Which would take more time than I have 
> right now to do correctly for all platforms. See also below, my thoughts 
> about mallinfo.

Sure I understand.

>     Also is this something that is reasonable to always do or should it be
>     added to the set of things under ExtensiveErrorReports?
> 
> 
> Always, IMHO. There is a lot of other stuff less important which we may 
> shove into ExtensiveErrorReports, but this is very basic.

Ok.

> 
>      > In case I am not blind this patch adds those statistics. I
>     decided against
>      > dumping proc files wholesale to the hs-err file, I just print out
>     what I
>      > really find useful.
> 
>     +   FILE* f = ::fopen("/proc/self/status", "r");
>     +   char buf[256];
>     +   while (::fgets(buf, sizeof(buf), f) != NULL && num_found <
>     num_values) {
> 
>     But without counting the characters (and knowing what is fixed-width
>     and
>     what is variable) doesn't this risk getting a partial entry at the end
>     of the buffer such that you won't see it when trying to parse?
> 
> 
> Not sure if I understand your question. If you mean what happens should 
> the proc file contain a line longer than 256, that is not a problem: 
> fgets stops either on newline or when the buffer is full. We would read 
> the first 256 characters, scan, then the next characters (needlessly), 
> scan (needlessly, not finding anything) and so forth, until we hit a 
> newline. The subsequent call to fgets will start at the next line.

Sorry I was thinking fgets would read 256 chars at a time, not a line of 
up to 256 chars.

> Also, the proc file content is no mystery: it is clearly documented in 
> the kernel man pages (see section about /proc/[pid]/status). So I do not 
> have to assume an arbitrary input.
> 
> 
>      > Note that among other things I print out the total size of
>     outstanding heap
>      > allocations. Note that still the best way to do this is via
>     mallinfo, and
>      > that means the returned value is int and may wrap, see coding and
>     comment.
>      > Even with this caveat though this info is still very useful.
> 
>     I am less keen on this bit. Now I understand your query in another RFR
>     about how to flag such glibc'isms so they don't impact building against
>     different libs like musl. :)
> 
> 
> Funny enough, I read somewhere that mallinfo is a Posix function; I see 
> it mentioned in very old documents from the 80s. But OpenGroup has no 
> mention of it.

I read (Linux manpage) it is from System V Interface Definition but I 
don't see it in the 4th edition PDF.

> About muslc, it is highly probable that it would implement mallinfo(), 

No it doesn't and won't as it is considered fundamentally broken due to 
the int issue.

https://www.openwall.com/lists/musl/2018/01/17/2

> but the meaning of the content may be different, so the printout should 
> be adjusted for muslc. Same goes probably for all other *nix platforms - 
> we could print out all members, but one should investigate beforehand if 
> their content makes sense and what it means.
> 
>     Is mallinfo safe to call from a signal handling context? I mean that in
>     a practical sense (ie it doesn't try to acquire internal malloc locks)
>     rather than being designated as async-signal-safe.
> 
> 
> completely harmless, see int_malloc() in glibc malloc/malloc.c: 
> https://sources.debian.org/src/glibc/2.28-10/malloc/malloc.c/
> 
> Does no locks, no calldowns to other functions (check_malloc_state is 
> empty in glibc release builds);

Saw your update - ok.

> Also similar coding has been active in our proprietary VM since about 
> five years so I think this is safe.
> 
> --
> 
> One other question just occurred to me, I wonder whether long is 64bit 
> on all our 64bit linux builds. Or whether I should use an explicit 64bit 
> type.

"long" will be 32-bit on 32-bit Linux (ILP32) and 64-bit on 64-bit Linux 
(LP64). Does /proc/self/status specify what ranges the values may have? 
Perhaps size_t would be better (even though same size as long) ?

Thanks,
David
-----

> 
> Thanks, Thomas
> 
> 
>     Thanks,
>     David
>     -----
> 
>      > I did a number of manual tests, the numbers seem legit.
>      >
>      > Looks like this:
>      >
>      > Process Memory:
>      > Virtual Size: 7494372K (peak: 7559908K)
>      > Resident Set Size: 271264K (peak: 571348K) (anon: 226460K, file:
>     44804K,
>      > shmem: 0K)
>      > Swapped out: 0K
>      > C-Heap outstanding allocations: 37845K
>      >
>      >
>      > Here I simulate a C-Heap memory leak:
>      >
>      > Process Memory:
>      > Virtual Size: 9062260K (peak: 9118772K)
>      > Resident Set Size: 1941776K (peak: 1941776K) (anon: 1896872K,
>     file: 44904K,
>      > shmem: 0K)
>      > Swapped out: 0K
>      > C-Heap outstanding allocations: 1291984K
>      >
>      >
>      > Same leak, with the mallinfo value wrapped around:
>      >
>      > Process Memory:
>      > Virtual Size: 12343352K (peak: 12445556K)
>      > Resident Set Size: 5232084K (peak: 5281512K) (anon: 5187052K,
>     file: 45032K,
>      > shmem: 0K)
>      > Swapped out: 0K
>      > C-Heap outstanding allocations: 15454K (may have wrapped)
>      >
>      > Thanks, Thomas
>      >
> 


More information about the hotspot-runtime-dev mailing list