high StringTable scanning overhead during young GCs

Tue Mar 17 20:01:53 UTC 2020

Hi Kim,

Unfortunately, I’m not sure I can share exact numbers / logs from our
production servers (would a GC times chart without units help so you can
see the trend?). But let me give you a quick overview. These numbers are
from when a service used ParallelGC. We have detailed timers for certain GC
phases, so I know exactly how long each phase took (and let me know if
you’d like to know more about said timers :-) ):

* Immediately after a JVM started, StringTable scanning would take around
2ms per young GC.
* 3 days later, StringTable scanning would take around 45x longer than the
object copying time (this translate to around 97.5% of the total young GC
time). So, let’s say for the sake of argument that object copying took 10ms
(this number might or might not be close to the actual one), then
StringTable scanning took around 450ms.
* Immediately after a Full GC, StringTable scanning would drop back down to
around 2ms.

All this tells me that:

* I don’t think there’s a bug. I think the service just piles up a lot of
entries in the StringTable (unfortunately, I don’t know the actual number;
is there a flag that prints it in 8?).
* Parallelizing StringTable scanning would just postpone the problem. Let’s
say we use 4 threads. In the above example, StringTable scanning would
still take more than 10x longer than the object copying time after 3 days.

And, yes, this might be surprising but we do have services that do no Full
GCs / concurrent cycles over several days. Most of our services are
stateless that put little pressure on the old gen. And, yes, I know that
this is generally uncommon.

Tony

—————
Tony Printezis | @TonyPrintezis | tprintezis at twitter.com

On March 16, 2020 at 11:12:59 PM, Kim Barrett (kim.barrett at oracle.com)
wrote:

> On Mar 16, 2020, at 5:02 PM, Tony Printezis <tprintezis at twitter.com>
wrote:
>
> Hi all,
>
> We have seen the following issue a few times in our data centers: A
service
> is interning strings at a steady rate. The interned strings live long
> enough to be promoted to the old generation. They are reclaimed during
the
> following Full GC / concurrent cycle. However, until that happens, young
GC
> times are monotonically increasing as young GCs have to scan the entire
> StringTable, which keeps growing in size.
>
> I can reproduce this with the latest 15, with both G1 and ParallelGC.
>
> FWIW, we use mostly CMS in 8 and we addressed this by forcing more
frequent
> CMS cycles (we have a flag that starts a CMS cycle every N secs). This
> helps to keep young GC times mostly in check for our services that suffer
> from this issue.
>
> However, this overhead could be avoided if the StringTable was split into
> two parts, one for entries that could potentially point into the young
gen,
> the other for entries that definitely do not point into the young gen.
Each
> young GC will only have to scan the former (similar to what was done for
> nmethods).
>
> Has anyone looked into this?

All young collections treat the StringTable as weak. (I think only
Parallel used to, but that was changed a while ago.) Young collection
processing of a weak entry is relatively cheap when the referent is
old (determine that it's not in the collection set, so do nothing).

(since JDK 12?) G1 parallelizes StringTable processing (in conjunction
with several other kinds of off-heap weak references, such as JNI weak
references). So one would think it would require a pretty large
number of old StringTable entries to have a significant effect.

ParallelGC currently processes the StringTable single threaded; there's
an RFE to parallelize it (JDK-8210100).

Can you provide some numbers / log data? If there is a performance
problem there, maybe there's just a bug somewhere that needs to be
fixed. I think splitting the StringTable that way isn't entirely
trivial, and would need pretty strong justification.