RFR(xs): 8251255: [linux] Add process-memory information to hs-err and VM.info
David Holmes
david.holmes at oracle.com
Mon Aug 10 08:11:20 UTC 2020
Hi Thomas,
On 10/08/2020 4:13 pm, Thomas Stüfe wrote:
> On Mon, Aug 10, 2020 at 6:50 AM David Holmes <david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>> wrote:
>
> Hi Thomas,
>
> On 7/08/2020 12:01 am, Thomas Stüfe wrote:
> > Hi,
> >
> > may I please have reviews for this addition:
> >
> > JBS: https://bugs.openjdk.java.net/browse/JDK-8251255
> > Webrev:
> >
> http://cr.openjdk.java.net/~stuefe/webrevs/8251255-add-rss-to-hs-err/webrev.00/webrev/
> >
> > When analysing a customer memory leak recently I found for all
> > the information we have in the error files we miss some really
> fundamental
> > statistics like process virtual size and rss. Either that or I am
> blind :)
>
> This doesn't seem unreasonable, though I don't like to see continued
> divergence between what gets reported on Linux versus other platforms.
>
>
> Neither do I but Linux is kind of the most important one so I do not
> feel bad for enhancing only this; and people interested in other
> platforms can always step in.
Ok.
> My problem here is while adding similar printouts for other platforms is
> trivial, I do not like printing out stuff I do not have investigated and
> fully understood / tested before. Which would take more time than I have
> right now to do correctly for all platforms. See also below, my thoughts
> about mallinfo.
Sure I understand.
> Also is this something that is reasonable to always do or should it be
> added to the set of things under ExtensiveErrorReports?
>
>
> Always, IMHO. There is a lot of other stuff less important which we may
> shove into ExtensiveErrorReports, but this is very basic.
Ok.
>
> > In case I am not blind this patch adds those statistics. I
> decided against
> > dumping proc files wholesale to the hs-err file, I just print out
> what I
> > really find useful.
>
> + FILE* f = ::fopen("/proc/self/status", "r");
> + char buf[256];
> + while (::fgets(buf, sizeof(buf), f) != NULL && num_found <
> num_values) {
>
> But without counting the characters (and knowing what is fixed-width
> and
> what is variable) doesn't this risk getting a partial entry at the end
> of the buffer such that you won't see it when trying to parse?
>
>
> Not sure if I understand your question. If you mean what happens should
> the proc file contain a line longer than 256, that is not a problem:
> fgets stops either on newline or when the buffer is full. We would read
> the first 256 characters, scan, then the next characters (needlessly),
> scan (needlessly, not finding anything) and so forth, until we hit a
> newline. The subsequent call to fgets will start at the next line.
Sorry I was thinking fgets would read 256 chars at a time, not a line of
up to 256 chars.
> Also, the proc file content is no mystery: it is clearly documented in
> the kernel man pages (see section about /proc/[pid]/status). So I do not
> have to assume an arbitrary input.
>
>
> > Note that among other things I print out the total size of
> outstanding heap
> > allocations. Note that still the best way to do this is via
> mallinfo, and
> > that means the returned value is int and may wrap, see coding and
> comment.
> > Even with this caveat though this info is still very useful.
>
> I am less keen on this bit. Now I understand your query in another RFR
> about how to flag such glibc'isms so they don't impact building against
> different libs like musl. :)
>
>
> Funny enough, I read somewhere that mallinfo is a Posix function; I see
> it mentioned in very old documents from the 80s. But OpenGroup has no
> mention of it.
I read (Linux manpage) it is from System V Interface Definition but I
don't see it in the 4th edition PDF.
> About muslc, it is highly probable that it would implement mallinfo(),
No it doesn't and won't as it is considered fundamentally broken due to
the int issue.
https://www.openwall.com/lists/musl/2018/01/17/2
> but the meaning of the content may be different, so the printout should
> be adjusted for muslc. Same goes probably for all other *nix platforms -
> we could print out all members, but one should investigate beforehand if
> their content makes sense and what it means.
>
> Is mallinfo safe to call from a signal handling context? I mean that in
> a practical sense (ie it doesn't try to acquire internal malloc locks)
> rather than being designated as async-signal-safe.
>
>
> completely harmless, see int_malloc() in glibc malloc/malloc.c:
> https://sources.debian.org/src/glibc/2.28-10/malloc/malloc.c/
>
> Does no locks, no calldowns to other functions (check_malloc_state is
> empty in glibc release builds);
Saw your update - ok.
> Also similar coding has been active in our proprietary VM since about
> five years so I think this is safe.
>
> --
>
> One other question just occurred to me, I wonder whether long is 64bit
> on all our 64bit linux builds. Or whether I should use an explicit 64bit
> type.
"long" will be 32-bit on 32-bit Linux (ILP32) and 64-bit on 64-bit Linux
(LP64). Does /proc/self/status specify what ranges the values may have?
Perhaps size_t would be better (even though same size as long) ?
Thanks,
David
-----
>
> Thanks, Thomas
>
>
> Thanks,
> David
> -----
>
> > I did a number of manual tests, the numbers seem legit.
> >
> > Looks like this:
> >
> > Process Memory:
> > Virtual Size: 7494372K (peak: 7559908K)
> > Resident Set Size: 271264K (peak: 571348K) (anon: 226460K, file:
> 44804K,
> > shmem: 0K)
> > Swapped out: 0K
> > C-Heap outstanding allocations: 37845K
> >
> >
> > Here I simulate a C-Heap memory leak:
> >
> > Process Memory:
> > Virtual Size: 9062260K (peak: 9118772K)
> > Resident Set Size: 1941776K (peak: 1941776K) (anon: 1896872K,
> file: 44904K,
> > shmem: 0K)
> > Swapped out: 0K
> > C-Heap outstanding allocations: 1291984K
> >
> >
> > Same leak, with the mallinfo value wrapped around:
> >
> > Process Memory:
> > Virtual Size: 12343352K (peak: 12445556K)
> > Resident Set Size: 5232084K (peak: 5281512K) (anon: 5187052K,
> file: 45032K,
> > shmem: 0K)
> > Swapped out: 0K
> > C-Heap outstanding allocations: 15454K (may have wrapped)
> >
> > Thanks, Thomas
> >
>
More information about the hotspot-runtime-dev
mailing list