RFR(xs): 8251255: [linux] Add process-memory information to hs-err and VM.info
Thomas Stüfe
thomas.stuefe at gmail.com
Mon Aug 10 07:32:02 UTC 2020
On Mon, Aug 10, 2020 at 8:13 AM Thomas Stüfe <thomas.stuefe at gmail.com>
wrote:
>
>
> On Mon, Aug 10, 2020 at 6:50 AM David Holmes <david.holmes at oracle.com>
> wrote:
>
>> Hi Thomas,
>>
>> On 7/08/2020 12:01 am, Thomas Stüfe wrote:
>> > Hi,
>> >
>> > may I please have reviews for this addition:
>> >
>> > JBS: https://bugs.openjdk.java.net/browse/JDK-8251255
>> > Webrev:
>> >
>> http://cr.openjdk.java.net/~stuefe/webrevs/8251255-add-rss-to-hs-err/webrev.00/webrev/
>> >
>> > When analysing a customer memory leak recently I found for all
>> > the information we have in the error files we miss some really
>> fundamental
>> > statistics like process virtual size and rss. Either that or I am blind
>> :)
>>
>> This doesn't seem unreasonable, though I don't like to see continued
>> divergence between what gets reported on Linux versus other platforms.
>>
>>
> Neither do I but Linux is kind of the most important one so I do not feel
> bad for enhancing only this; and people interested in other platforms can
> always step in.
>
> My problem here is while adding similar printouts for other platforms is
> trivial, I do not like printing out stuff I do not have investigated and
> fully understood / tested before. Which would take more time than I have
> right now to do correctly for all platforms. See also below, my thoughts
> about mallinfo.
>
>
>> Also is this something that is reasonable to always do or should it be
>> added to the set of things under ExtensiveErrorReports?
>>
>
> Always, IMHO. There is a lot of other stuff less important which we may
> shove into ExtensiveErrorReports, but this is very basic.
>
>
>> > In case I am not blind this patch adds those statistics. I decided
>> against
>> > dumping proc files wholesale to the hs-err file, I just print out what I
>> > really find useful.
>>
>> + FILE* f = ::fopen("/proc/self/status", "r");
>> + char buf[256];
>> + while (::fgets(buf, sizeof(buf), f) != NULL && num_found <
>> num_values) {
>>
>> But without counting the characters (and knowing what is fixed-width and
>> what is variable) doesn't this risk getting a partial entry at the end
>> of the buffer such that you won't see it when trying to parse?
>>
>
> Not sure if I understand your question. If you mean what happens should
> the proc file contain a line longer than 256, that is not a problem: fgets
> stops either on newline or when the buffer is full. We would read the first
> 256 characters, scan, then the next characters (needlessly), scan
> (needlessly, not finding anything) and so forth, until we hit a newline.
> The subsequent call to fgets will start at the next line.
>
> Also, the proc file content is no mystery: it is clearly documented in the
> kernel man pages (see section about /proc/[pid]/status). So I do not have
> to assume an arbitrary input.
>
>
>>
>> > Note that among other things I print out the total size of outstanding
>> heap
>> > allocations. Note that still the best way to do this is via mallinfo,
>> and
>> > that means the returned value is int and may wrap, see coding and
>> comment.
>> > Even with this caveat though this info is still very useful.
>>
>> I am less keen on this bit. Now I understand your query in another RFR
>> about how to flag such glibc'isms so they don't impact building against
>> different libs like musl. :)
>>
>>
> Funny enough, I read somewhere that mallinfo is a Posix function; I see it
> mentioned in very old documents from the 80s. But OpenGroup has no mention
> of it.
>
> About muslc, it is highly probable that it would implement mallinfo(), but
> the meaning of the content may be different, so the printout should be
> adjusted for muslc. Same goes probably for all other *nix platforms - we
> could print out all members, but one should investigate beforehand if their
> content makes sense and what it means.
>
>
>> Is mallinfo safe to call from a signal handling context? I mean that in
>> a practical sense (ie it doesn't try to acquire internal malloc locks)
>> rather than being designated as async-signal-safe.
>>
>
> completely harmless, see int_malloc() in glibc malloc/malloc.c:
> https://sources.debian.org/src/glibc/2.28-10/malloc/malloc.c/
>
>
s/int_malloc/int_mallinfo
https://sources.debian.org/src/glibc/2.28-10/malloc/malloc.c/#L4878
> Does no locks, no calldowns to other functions (check_malloc_state is
> empty in glibc release builds);
>
> Also similar coding has been active in our proprietary VM since about five
> years so I think this is safe.
>
> --
>
> One other question just occurred to me, I wonder whether long is 64bit on
> all our 64bit linux builds. Or whether I should use an explicit 64bit type.
>
> Thanks, Thomas
>
>
>> Thanks,
>> David
>> -----
>>
>> > I did a number of manual tests, the numbers seem legit.
>> >
>> > Looks like this:
>> >
>> > Process Memory:
>> > Virtual Size: 7494372K (peak: 7559908K)
>> > Resident Set Size: 271264K (peak: 571348K) (anon: 226460K, file: 44804K,
>> > shmem: 0K)
>> > Swapped out: 0K
>> > C-Heap outstanding allocations: 37845K
>> >
>> >
>> > Here I simulate a C-Heap memory leak:
>> >
>> > Process Memory:
>> > Virtual Size: 9062260K (peak: 9118772K)
>> > Resident Set Size: 1941776K (peak: 1941776K) (anon: 1896872K, file:
>> 44904K,
>> > shmem: 0K)
>> > Swapped out: 0K
>> > C-Heap outstanding allocations: 1291984K
>> >
>> >
>> > Same leak, with the mallinfo value wrapped around:
>> >
>> > Process Memory:
>> > Virtual Size: 12343352K (peak: 12445556K)
>> > Resident Set Size: 5232084K (peak: 5281512K) (anon: 5187052K, file:
>> 45032K,
>> > shmem: 0K)
>> > Swapped out: 0K
>> > C-Heap outstanding allocations: 15454K (may have wrapped)
>> >
>> > Thanks, Thomas
>> >
>>
>
More information about the hotspot-runtime-dev
mailing list