<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">

<meta name="Generator" content="Microsoft Exchange Server">

<!-- converted from text --><style><!-- .EmailQuote { margin-left: 1pt; padding-left: 4pt; border-left: #800000 2px solid; } --></style>

</head>

<body>

<div>

<div id="x_compose-container" itemscope="" itemtype="https://schema.org/EmailMessage" style="direction:ltr">

<span itemprop="creator" itemscope="" itemtype="https://schema.org/Organization"><span itemprop="name"></span></span>

<div>

<div style="direction:ltr">

<div style="direction:ltr">Hello,</div>

<div><br>

</div>

<div style="direction:ltr">First of all, I wanted to mention that, as I understand it, the application is a graph query server, so creating and freeing large graphs is the workload the GC has to deal with. (I think Thomas described that allocation pattern: it would lead to spiky promotions if a query transaction lasts longer than one collection cycle, or several.)</div>

<div><br>

</div>

<div style="direction:ltr">Also, Thomas, did you remove your graph, or is

<a dir="ltr" href="http://Java.net">Java.net</a> refusing to serve pages now?</div>

<div><br>

</div>

<div style="direction:ltr">Regards</div>

<div style="direction:ltr">Bernd</div>

</div>

<div><br>

</div>

<div class="x_acompli_signature">


<div>-- </div>

<div><a dir="ltr" href="http://bernd.eckenfels.net">http://bernd.eckenfels.net</a></div>

</div>

</div>

</div>

<hr tabindex="-1" style="display:inline-block; width:98%">

<div id="x_divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> hotspot-gc-dev &lt;hotspot-gc-dev-bounces@openjdk.java.net&gt; on behalf of Thomas Schatzl &lt;thomas.schatzl@oracle.com&gt;<br>

<b>Sent:</b> Tuesday, October 17, 2017 10:22:54 AM<br>

<b>To:</b> Kirk Pepperdine<br>

<b>Cc:</b> hotspot-gc-dev@openjdk.java.net openjdk.java.net<br>

<b>Subject:</b> Re: Strange G1 behavior</font>

<div> </div>

</div>

</div>

<font size="2"><span style="font-size:10pt;">

<div class="PlainText">Hi Kirk,<br>

<br>

On Mon, 2017-10-16 at 20:07 +0200, Kirk Pepperdine wrote:<br>

> Hi Thomas,<br>

> <br>

> Again, thank you for the detailed response.<br>

> <br>

> <br>

> > On Oct 16, 2017, at 1:32 PM, Thomas Schatzl &lt;thomas.schatzl@oracle.com&gt; wrote:<br>

> > <br>

> > For the allocation rate, please compare the slopes of heap usage<br>

> > after (young) GCs during these spikes (particularly in that full GC<br>

> > case) with those during normal operation.<br>

> <br>

> Censum estimates allocation rates as this is a metric that I<br>

> routinely evaluate.<br>

> <br>

> This log shows a spike at 10:07 which correlates with the Full GC<br>

> event. However, the allocation rates, while high, are well within<br>

> values I’ve seen with many other applications that are well behaved.<br>

> Censum also estimates rates of promotion, and those seem exceedingly<br>

> high at 10:07. That said, there are spikes just after 10:10 and<br>

> around 10:30 which don’t trigger a Full GC. In both cases the estimates<br>

> for allocation rates are high, though the estimates for rates<br>

> of promotion, while high, are not as high as those seen at 10:07.<br>

><br>

> All in all, nothing here seems out of the ordinary, and while I want<br>

> you to be right about the waste and PLAB behaviors, these spikes feel<br>

> artificial, i.e. I still want to blame the collector for not being<br>

> able to cope with some aspect of application behavior that it should<br>

> be able to cope with, that is, something other than a high allocation<br>

> rate with low recovery due to data simply being referenced and therefore<br>

> not eligible for collection...<br>

<br>

By "allocation rate" I always meant the "promotion rate" here. For this<br>

discussion (and in general, in a generational collector), the<br>

application's actual allocation rate is usually not very interesting.<br>

<br>

Sorry for being imprecise.<br>

<br>

> > In this application, given the information I have, roughly every<br>

> > 1500s there seems to be some component in the application that<br>

> > allocates a lot of memory in a short time and holds onto most of<br>

> > it for its duration.<br>

> <br>

> Sorry, but I’m not seeing this pattern in either the occupancy-after-GC<br>

> or the allocation rate views. What I do see is a systematic loss of free<br>

> heap over time (slow memory leak??? Effects of caching???).<br>

<br>

Let's have a look at the heap usage after GC over time for a few<br>

collection cycles before that full GC event. Please look at<br>

<a href="http://cr.openjdk.java.net/~tschatzl/g1-strange-log/strange-g1-promo.png">http://cr.openjdk.java.net/~tschatzl/g1-strange-log/strange-g1-promo.png</a>,<br>

which shows just a few of these cycles.<br>

<br>

I added rough linear approximations of the heap usage after GC (so that<br>

the slope of these lines corresponds to the promotion rate). I can see<br>

a large, significant difference in slope between the collection<br>

cycles before the full GC event (black lines) and the full GC event<br>

(red line), while all the black ones are roughly the same. :)<br>

<br>

Note that the lines in my graph were drawn by hand without any actual<br>

calculation, and in particular the red one looks like an underestimation<br>

of the slope.<br>
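<br>
A minimal sketch of doing that calculation instead of drawing the lines by hand: given (time, heap used after young GC) samples for one collection cycle, the least-squares slope estimates the promotion rate, assuming no old generation space is reclaimed within that window. The class and method names are hypothetical, and the sample values are made up for illustration, not taken from this log.<br>
<pre>
```java
// Hypothetical helper (not from any tool): estimates the promotion rate as the
// least-squares slope of heap occupancy after young GCs within one cycle.
class PromotionRate {

    // timesSec[i]: time of the i-th young GC; heapAfterMb[i]: heap used after it.
    // Assumes no old-gen space is reclaimed between the samples.
    static double slopeMbPerSec(double[] timesSec, double[] heapAfterMb) {
        int n = timesSec.length;
        if (n != heapAfterMb.length || n < 2) {
            throw new IllegalArgumentException("need at least two paired samples");
        }
        double sumT = 0, sumH = 0;
        for (int i = 0; i < n; i++) {
            sumT += timesSec[i];
            sumH += heapAfterMb[i];
        }
        double meanT = sumT / n, meanH = sumH / n;
        double sxy = 0, sxx = 0;
        for (int i = 0; i < n; i++) {
            double dt = timesSec[i] - meanT;
            sxy += dt * (heapAfterMb[i] - meanH);
            sxx += dt * dt;
        }
        if (sxx == 0) {
            throw new IllegalArgumentException("all samples taken at the same time");
        }
        return sxy / sxx; // MB per second that stays live after young GCs
    }

    public static void main(String[] args) {
        // Made-up samples for illustration only: heap after GC grows 20 MB every 10 s.
        double[] t = {0, 10, 20, 30};
        double[] heap = {100, 120, 140, 160};
        System.out.println(slopeMbPerSec(t, heap) + " MB/s");
    }
}
```
</pre>
Running the same fit over each cycle before the full GC and over the full GC cycle itself would turn the black-versus-red comparison into numbers.<br>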

<br>

> As I look at all of the views in Censum, I see nothing outstanding<br>

> that leads me to believe that this Full GC is a by-product of some<br>

> interaction between the collector and the application (some form of<br>

> zombies????). Also, one certainly cannot rule out your speculation<br>

<br>

It does not look like there is an issue with, e.g., j.l.ref.References of<br>

any kind.<br>

<br>

> for heap fragmentation in PLABs. I simply don’t have the data to say<br>

> anything about that though I know it can be a huge issue. What I can<br>

> say is that even if there is 20% waste, it can’t account for the<br>

> amount of memory being recovered. I qualify that with: unless there<br>

> is a blast of barely humongous allocations taking place. I’d like to<br>

> think this is a waste issue, but I’m suspicious. I’m also suspicious<br>

> that it’s simply the application allocating in a burst and then<br>

> releasing. If that were the case, I’d expect a much more gradual<br>

> reduction in the live set size.<br>

> <br>

> I think the answer right now is: we need more data.<br>

<br>

Agree.<br>

<br>

> I’ll try to get <br>

> the “client” to turn on the extra flags and see what that yields. I<br>

> won’t play with PLAB sizing this go ’round, if you don’t mind. If<br>

> you’re right and it is a problem with waste, then the beer is on me<br>

> the next time we meet.<br>

> <br>

> The “don’t allocate arrays with power-of-2 sizes” remark is an interesting<br>

> one. While there are clear advantages to allocating arrays with<br>

> power-of-2 sizes, I believe these cases are specialized, and I<br>

> don’t generally see people dogmatically allocating this<br>

> way.<br>

<br>

There are cases where you probably want 2^n-sized buffers, but in many,<br>

many cases, like some serialization for data transfer, it does not<br>

matter a bit whether the output buffer can hold exactly 2^n bytes or<br>

just a bit less.<br>

<br>

Of course this is something G1 should handle better by itself, but for<br>

now that is what you can do about this.<br>
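<br>
A rough sketch of why an exact 2^n size hurts, assuming G1's rule that an object larger than half a region is humongous and is placed in its own contiguous region(s), an illustrative 1 MB region size, and a 16-byte array header with 8-byte alignment (typical for a 64-bit JVM with compressed class pointers). The class and method names are hypothetical; the code only does the arithmetic and does not query the JVM.<br>
<pre>
```java
// Hypothetical helper (not a JVM API): estimates how many bytes G1 leaves
// unused in the regions occupied by a humongous byte[] of a given length.
class HumongousWaste {
    // Assumptions: 16-byte array header and 8-byte alignment (64-bit JVM with
    // compressed class pointers); the region size is passed in by the caller.
    static final long ARRAY_HEADER_BYTES = 16;
    static final long ALIGNMENT = 8;

    // Size of a byte[length] object, rounded up to the object alignment.
    static long objectSize(long length) {
        long raw = ARRAY_HEADER_BYTES + length;
        return (raw + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    // Assumed G1 rule: objects larger than half a region are humongous.
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    // Unused bytes in the regions a humongous object occupies; 0 for regular objects.
    static long wastedBytes(long length, long regionBytes) {
        long size = objectSize(length);
        if (!isHumongous(size, regionBytes)) {
            return 0;
        }
        long regions = (size + regionBytes - 1) / regionBytes; // contiguous regions needed
        return regions * regionBytes - size;
    }

    public static void main(String[] args) {
        long region = 1L << 20; // illustrative 1 MB region size
        long[] lengths = {region / 2, region / 2 - ARRAY_HEADER_BYTES, region, region - ARRAY_HEADER_BYTES};
        for (long len : lengths) {
            System.out.printf("byte[%d]: humongous=%b, wasted=%d bytes%n",
                    len, isHumongous(objectSize(len), region), wastedBytes(len, region));
        }
    }
}
```
</pre>
Under these assumptions, a byte[1048576] needs two regions and leaves 1048560 bytes of the second one unused, while a byte[1048560] fills exactly one region, which is the "just a bit smaller" buffer described above.<br>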

<br>

Thanks,<br>

  Thomas<br>

<br>

</div>

</span></font>

</body>

</html>