RFR(xs): 8251255: [linux] Add process-memory information to hs-err and VM.info
Thomas Stüfe
thomas.stuefe at gmail.com
Mon Aug 10 06:13:20 UTC 2020
On Mon, Aug 10, 2020 at 6:50 AM David Holmes <david.holmes at oracle.com>
wrote:
> Hi Thomas,
>
> On 7/08/2020 12:01 am, Thomas Stüfe wrote:
> > Hi,
> >
> > may I please have reviews for this addition:
> >
> > JBS: https://bugs.openjdk.java.net/browse/JDK-8251255
> > Webrev:
> >
> http://cr.openjdk.java.net/~stuefe/webrevs/8251255-add-rss-to-hs-err/webrev.00/webrev/
> >
> > When analysing a customer memory leak recently I found for all
> > the information we have in the error files we miss some really
> fundamental
> > statistics like process virtual size and rss. Either that or I am blind
> :)
>
> This doesn't seem unreasonable, though I don't like to see continued
> divergence between what gets reported on Linux versus other platforms.
>
>
Neither do I but Linux is kind of the most important one so I do not feel
bad for enhancing only this; and people interested in other platforms can
always step in.
My problem here is while adding similar printouts for other platforms is
trivial, I do not like printing out stuff I do not have investigated and
fully understood / tested before. Which would take more time than I have
right now to do correctly for all platforms. See also below, my thoughts
about mallinfo.
> Also is this something that is reasonable to always do or should it be
> added to the set of things under ExtensiveErrorReports?
>
Always, IMHO. There is a lot of other stuff less important which we may
shove into ExtensiveErrorReports, but this is very basic.
> > In case I am not blind this patch adds those statistics. I decided
> against
> > dumping proc files wholesale to the hs-err file, I just print out what I
> > really find useful.
>
> + FILE* f = ::fopen("/proc/self/status", "r");
> + char buf[256];
> + while (::fgets(buf, sizeof(buf), f) != NULL && num_found < num_values)
> {
>
> But without counting the characters (and knowing what is fixed-width and
> what is variable) doesn't this risk getting a partial entry at the end
> of the buffer such that you won't see it when trying to parse?
>
Not sure if I understand your question. If you mean what happens should the
proc file contain a line longer than 256, that is not a problem: fgets
stops either on newline or when the buffer is full. We would read the first
256 characters, scan, then the next characters (needlessly), scan
(needlessly, not finding anything) and so forth, until we hit a newline.
The subsequent call to fgets will start at the next line.
Also, the proc file content is no mystery: it is clearly documented in the
kernel man pages (see section about /proc/[pid]/status). So I do not have
to assume an arbitrary input.
>
> > Note that among other things I print out the total size of outstanding
> heap
> > allocations. Note that still the best way to do this is via mallinfo, and
> > that means the returned value is int and may wrap, see coding and
> comment.
> > Even with this caveat though this info is still very useful.
>
> I am less keen on this bit. Now I understand your query in another RFR
> about how to flag such glibc'isms so they don't impact building against
> different libs like musl. :)
>
>
Funny enough, I read somewhere that mallinfo is a Posix function; I see it
mentioned in very old documents from the 80s. But OpenGroup has no mention
of it.
About muslc, it is highly probable that it would implement mallinfo(), but
the meaning of the content may be different, so the printout should be
adjusted for muslc. Same goes probably for all other *nix platforms - we
could print out all members, but one should investigate beforehand if their
content makes sense and what it means.
> Is mallinfo safe to call from a signal handling context? I mean that in
> a practical sense (ie it doesn't try to acquire internal malloc locks)
> rather than being designated as async-signal-safe.
>
completely harmless, see int_malloc() in glibc malloc/malloc.c:
https://sources.debian.org/src/glibc/2.28-10/malloc/malloc.c/
Does no locks, no calldowns to other functions (check_malloc_state is empty
in glibc release builds);
Also similar coding has been active in our proprietary VM since about five
years so I think this is safe.
--
One other question just occurred to me, I wonder whether long is 64bit on
all our 64bit linux builds. Or whether I should use an explicit 64bit type.
Thanks, Thomas
> Thanks,
> David
> -----
>
> > I did a number of manual tests, the numbers seem legit.
> >
> > Looks like this:
> >
> > Process Memory:
> > Virtual Size: 7494372K (peak: 7559908K)
> > Resident Set Size: 271264K (peak: 571348K) (anon: 226460K, file: 44804K,
> > shmem: 0K)
> > Swapped out: 0K
> > C-Heap outstanding allocations: 37845K
> >
> >
> > Here I simulate a C-Heap memory leak:
> >
> > Process Memory:
> > Virtual Size: 9062260K (peak: 9118772K)
> > Resident Set Size: 1941776K (peak: 1941776K) (anon: 1896872K, file:
> 44904K,
> > shmem: 0K)
> > Swapped out: 0K
> > C-Heap outstanding allocations: 1291984K
> >
> >
> > Same leak, with the mallinfo value wrapped around:
> >
> > Process Memory:
> > Virtual Size: 12343352K (peak: 12445556K)
> > Resident Set Size: 5232084K (peak: 5281512K) (anon: 5187052K, file:
> 45032K,
> > shmem: 0K)
> > Swapped out: 0K
> > C-Heap outstanding allocations: 15454K (may have wrapped)
> >
> > Thanks, Thomas
> >
>
More information about the hotspot-runtime-dev
mailing list