A little more advice on the metaspace gc log

Fri Sep 28 08:31:54 UTC 2018

On Fri, Sep 28, 2018 at 8:24 AM, nijiaben <nijiaben at perfma.com> wrote:
> hi all,
>
> Thank you so much for the follow-up about the patch that I mentioned about
> the metaspace last week(A Bug Of Metaspace After Full GC).However,I still
> have the other suggestion about the metaspace. Please allow me to describe
> it briefly.
>
> Still in the Full GC scenario, we will print the size of the metaspace in
> the GC log. Suppose we set -XX:MetaspaceSize=100M -XX:MaxMetaspaceSize=100M,
> which means a metaspace full GC should be triggered once the metaspace usage
> reaches to 100M, but actually we find the problem from the gc log and only
> 70899KB is used in the metaspace.
>
> [Full GC (Metadata GC Threshold) [PSYoungGen: 5092K->0K(132096K)]
> [ParOldGen: 85822K->60451K(181760K)] 90914K->60451K(313856K), [Metaspace:
> 70899K->70899K(1136640K)], 0.1482072 secs] [Times: user=0.52 sys=0.00,
> real=0.15 secs]
>
> First let’s take a look at the code that prints this line of gc log which
> really depends on the size used by metaspace.
>
> void MetaspaceAux::print_metaspace_change(size_t prev_metadata_used) {
>   gclog_or_tty->print(", [Metaspace:");
>   if (PrintGCDetails && Verbose) {
>     gclog_or_tty->print(" "  SIZE_FORMAT
>                         "->" SIZE_FORMAT
>                         "("  SIZE_FORMAT ")",
>                         prev_metadata_used,
>                         used_bytes(),
>                         reserved_bytes());
>   } else {
>     gclog_or_tty->print(" "  SIZE_FORMAT "K"
>                         "->" SIZE_FORMAT "K"
>                         "("  SIZE_FORMAT "K)",
>                         prev_metadata_used/K,
>                         used_bytes()/K,
>                         reserved_bytes()/K);
>   }
>
>   gclog_or_tty->print("]");
> }
>
> However, the basis for triggering metaspace Full GC is not based on the real
> usage of metaspace, but the memory size of metaspace committed.
>
> size_t MetaspaceGC::allowed_expansion() {
>   size_t committed_bytes = MetaspaceAux::committed_bytes();
>   size_t capacity_until_gc = capacity_until_GC();
>
>   assert(capacity_until_gc >= committed_bytes,
>         err_msg("capacity_until_gc: " SIZE_FORMAT " < committed_bytes: "
> SIZE_FORMAT,
>                 capacity_until_gc, committed_bytes));
>
>   size_t left_until_max  = MaxMetaspaceSize - committed_bytes;
>   size_t left_until_GC = capacity_until_gc - committed_bytes;
>   size_t left_to_commit = MIN2(left_until_GC, left_until_max);
>
>   return left_to_commit / BytesPerWord;
> }
>
> Obviously, if we create a lot of classloaders, each of them will correspond
> to a metaspace data structure. The usage of our metaspace refers to the real
> in Metanchunk. The size of the memory is allocated, but Metanchunk is not
> all used up. This problem is simplified. Just like the fragmentation of the
> CMS GC, we have a certain gap between the memory committed and the memory
> used, and trigger the metaspace Full GC is based on the size of the
> committed, so the user feels like the metaspace doesn’t reach our threshold
> triggers the Full GC, which seems doubtful.
>
> Therefore, I would like to make a small suggestion that for the scenario of
> Full GC, the committed memory size instead of the used memory size will be
> printed more directly when we print the GC log of metaspace, which will make
> the user to see it more clearly and turn out that the threshold of metaspace
> has already reached the trigger of Full GC.
>
> The modified log is more conspicuous and indeed triggered when reaching
> 100M, but it will bring another drawback that we don‘t know how much
> metaspace really have been used.
>
> [Full GC (Metadata GC Threshold) [PSYoungGen: 5092K->0K(132096K)]
> [ParOldGen: 81395K->60429K(181248K)] 86488K->60429K(313344K), [Metaspace:
> 102400K->70599K(1136640K)], 0.1330421 secs] [Times: user=0.46 sys=0.00,
> real=0.13 secs]
>
> This issue will have some impact on various GC strategies.
>
> What do you think about this? I am looking forward to your reply.
>

FWIW, I agree this is confusing, and personally would prefer the
committed size be printed as you suggests, but I guess this may break
a lot of tooling.

Note that the gap between committed and used can be quite large in the
following pathological scenarios:

a) lots of class loaders which do not load many classes and stop
loading classes at an inconvenient time: when having been handed a
larger memory chunk after having used up four small chunks - which the
JVM takes as proof that this classloader is here to stay and therefore
may just as well get a large chunk to gnaw on.
This leads to fragmentation and there is no easy way to avoid it. We
can play around with chunk sizes and allocation strategies but there
will always be pathological scenarios since the JVM needs to guess and
may guess wrongly.

b) when class loaders died and their memory chunks are returned to the
internal metaspace freelists. They are not always uncommitted
immediately; in fact they never may be uncommitted. They will be
reused when other classloaders load classes, which may never happen.

Note: to analyze these situations better we (SAP) added a new command
to jcmd, "VM.metaspace". Among other things, the command outputs a
"Waste" section, which details both (a) and (b) and some minor other
waste scenarios. That command is now upstream in stock OpenJDK, so it
should be available in jdk11.

Thanks, Thomas

>
> Thanks,
>
> nijiaben