<html><head><style>body{font-family:Helvetica,Arial;font-size:13px}</style></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;">BTW, could we get a reaction from the Oracle folks on this? Even though Jeremy and I are proposing different implementation approaches, we both agree (and Jeremy, please correct me on this) that having an allocation sampling mechanism that’s more flexible than what’s already in HotSpot (in particular: the sampling frequency not being tied to the TLAB size) will be a very helpful profiling feature. Is this something that we could pursue contributing?</div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;"><br></div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;">Tony</div> <br><p class="airmail_on" style="color:#000;">On June 24, 2015 at 1:57:44 PM, Tony Printezis (<a href="mailto:tprintezis@twitter.com">tprintezis@twitter.com</a>) wrote:</p> <blockquote type="cite" class="clean_bq"><span><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><div></div><div>
<div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;">
Hi Jeremy,</div>
<div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;">
<br></div>
<div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;">
Please see inline.</div>
<br>
<p class="airmail_on" style="color:#000;">On June 23, 2015 at
7:22:13 PM, Jeremy Manson (<a href="mailto:jeremymanson@google.com">jeremymanson@google.com</a>)
wrote:</p>
<div>
<blockquote type="cite" class="clean_bq" style="color: rgb(0, 0, 0); font-family: Helvetica, Arial; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">
<div>
<div>
<div dir="ltr"><span>I don't want the size of the TLAB, which is
ergonomically adjusted, to be tied to the sampling rate.
There is no reason to do that. I want reasonable statistical
sampling of the allocations. </span></div>
</div>
</div>
</blockquote>
</div>
<p><br></p>
<p>As I said explicitly in my e-mail, I totally agree with this,
which is why I never suggested resizing TLABs in order to vary the
sampling rate. (Apologies if my e-mail was not clear.)</p>
<p><br></p>
<div>
<div>
<blockquote type="cite" class="clean_bq" style="color: rgb(0, 0, 0); font-family: Helvetica, Arial; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">
<div>
<div>
<div dir="ltr">
<div><span><br class="Apple-interchange-newline">
All this requires is a separate counter that is set to the next
sampling interval, and decremented when an allocation happens,
which goes into a slow path when the decrement hits 0. Doing
a subtraction and a pointer bump in allocation instead of just a
pointer bump is basically free. <span class="Apple-converted-space"> </span></span></div>
</div>
</div>
</div>
</blockquote>
</div>
<p><br></p>
<p>Maybe it’s cheap on Intel, but it might not be on other platforms
that other folks care about.</p>
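<p>For concreteness, here is a minimal sketch of the counter-based scheme quoted above, assuming the counter is measured in bytes and is reset to a fixed interval after each sample. The class and method names are made up for illustration; this is a plain Java model of the idea, not HotSpot code.</p>

```java
// Hypothetical model of counter-based allocation sampling (not HotSpot code).
final class CountdownSampler {
    private final long intervalBytes;  // assumed fixed sampling interval, in bytes
    private long bytesUntilSample;     // the extra per-thread counter
    private long top = 0;              // models the TLAB bump pointer
    private int samples = 0;

    CountdownSampler(long intervalBytes) {
        this.intervalBytes = intervalBytes;
        this.bytesUntilSample = intervalBytes;
    }

    // Fast path: one pointer bump plus one subtraction and a test.
    long allocate(long size) {
        long addr = top;
        top += size;                   // the pointer bump
        bytesUntilSample -= size;      // the additional subtraction
        if (bytesUntilSample > 0) {
            return addr;               // common case: no sample taken
        }
        takeSample();                  // counter reached zero: slow path
        return addr;
    }

    // Slow path: a real implementation would record a stack trace here.
    private void takeSample() {
        samples++;
        bytesUntilSample = intervalBytes;
    }

    int samples() { return samples; }
    long top() { return top; }
}
```

<p>The point of contention is visible in <code>allocate</code>: the subtraction and test run on every allocation, whether or not anyone consumes the samples.</p>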
<p><br></p>
<div>
<div>
<blockquote type="cite" class="clean_bq" style="color: rgb(0, 0, 0); font-family: Helvetica, Arial; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">
<div>
<div>
<div dir="ltr">
<div><span>Note that it has been doing an additional addition (to
keep track of per thread allocation) as part of allocation since
Java 7,<span class="Apple-converted-space"> </span></span></div>
</div>
</div>
</div>
</blockquote>
</div>
<p><br></p>
<p>Interesting. I hadn’t realized that. Does that keep track of
the total size allocated per thread or the number of allocated objects per
thread? If it’s the former, why isn’t it possible to calculate that
from the TLAB information?</p>
<p><br></p>
<div>
<div>
<blockquote type="cite" class="clean_bq" style="color: rgb(0, 0, 0); font-family: Helvetica, Arial; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">
<div>
<div>
<div dir="ltr">
<div><span>and no one has complained.</span></div>
<div><span><br></span></div>
<div><span>I'm not worried about the ease of implementation here,
because we've already implemented it. <span class="Apple-converted-space"> </span></span></div>
</div>
</div>
</div>
</blockquote>
</div>
<p><br></p>
<p>Yeah, but someone will have to maintain it moving forward.</p>
<p><br></p>
<div>
<div>
<blockquote type="cite" class="clean_bq" style="color: rgb(0, 0, 0); font-family: Helvetica, Arial; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">
<div>
<div>
<div dir="ltr">
<div><span>It hasn't even been hard for us to do the forward port,
except when the relevant Hotspot code is significantly
refactored.<br></span>
<div>
<div>
<div><span><br></span></div>
<div><span>We can also turn the sampling off, if we want. We
can set the sampling rate to 2^32, have the sampling code do
nothing, and no one will ever notice. <span class="Apple-converted-space"> </span></span></div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
<p><br></p>
<p>You still have extra instructions in the allocation path, so
it’s not turned off (i.e., you have the tax without any
benefit).</p>
<p><br></p>
<div>
<div>
<blockquote type="cite" class="clean_bq" style="color: rgb(0, 0, 0); font-family: Helvetica, Arial; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">
<div>
<div>
<div dir="ltr">
<div>
<div>
<div>
<div><span>In fact, we could just have the sampling code do
nothing, and no one would ever notice.</span></div>
<div><span><br></span></div>
<div><span>Honestly, no one ever notices the overhead of the
sampling, anyway. JDK8 made it more expensive to grab a stack
trace (the cost became proportional to the number of loaded
classes), but we have a patch that mitigates that, which we would
also be happy to upstream.</span></div>
<div><span><br></span></div>
<div><span>As for the other concern: my concern about *just* having
the callback mechanism is that there is quite a lot you can't do
from user code during an allocation, because of lack of access to
JNI.</span></div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
<p><br></p>
<p>Maybe I missed something. Are the callbacks in Java? I.e., do
you call them using JNI from the slow path you call directly from
the allocation code?</p>
<p><br></p>
<div>
<div>
<blockquote type="cite" class="clean_bq" style="color: rgb(0, 0, 0); font-family: Helvetica, Arial; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">
<div>
<div>
<div dir="ltr">
<div>
<div>
<div>
<div><span> However, you can do pretty much anything from the
VM itself. Crucially (for us), we don't just log the stack
traces, we also keep track of which are live and which
aren't. We can't do this in a callback, if the callback can't
create weak refs to the object.</span></div>
<span><br>
What we do at Google is to have two methods: one that you pass a
callback to (the callback gets invoked with a StackTraceData
object, as I've defined above), and another that just tells you
which sampled objects are still live. We could also add a
third, which allowed a callback to set the sampling interval
(basically, the VM would call it to get the integer number of bytes
to be allocated before the next sample). </span></div>
<div><span><br></span></div>
<div><span>Would people be amenable to that? It makes the
code more complex, but, as I say, it's nice for detecting memory
leaks ("Hey! Where did that 1 GB object come
from?").</span></div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
<p><br></p>
<p>Well, that 1GB object would most likely have been allocated
outside a TLAB and you could have identified it by instrumenting
the “outside-of-TLAB allocation path” (just saying…).</p>
<p>But, seriously, why didn’t you like my proposal? It can do
anything your scheme can with fewer and simpler code changes. The
only thing that it cannot do is sample based on object count
(e.g., every 100 objects) instead of based on object size (e.g.,
every 1MB of allocations). But I think sampling based on size
is the right approach here (IMHO).</p>
<p>Tony</p>
<p><br></p>
<div>
<blockquote type="cite" class="clean_bq" style="color: rgb(0, 0, 0); font-family: Helvetica, Arial; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">
<div>
<div>
<div dir="ltr">
<div>
<div>
<div>
<div>
<div><span><br class="Apple-interchange-newline">
Jeremy</span></div>
<div><span><br></span></div>
</div>
</div>
</div>
</div>
</div>
<div class="gmail_extra"><span><br></span>
<div class="gmail_quote"><span>On Tue, Jun 23, 2015 at 1:06 PM,
Tony Printezis<span class="Apple-converted-space"> </span><span dir="ltr"><<a href="mailto:tprintezis@twitter.com" target="_blank">tprintezis@twitter.com</a>></span><span class="Apple-converted-space"> </span>wrote:<br></span>
<blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex;">
<div style="word-wrap: break-word;">
<div style="font-family: Helvetica, Arial; font-size: 13px; color: rgb(0, 0, 0); margin: 0px;">
Jeremy (and all),</div>
<div style="font-family: Helvetica, Arial; font-size: 13px; color: rgb(0, 0, 0); margin: 0px;">
<br></div>
<div style="font-family: Helvetica, Arial; font-size: 13px; color: rgb(0, 0, 0); margin: 0px;">
I’m not on the serviceability list so I won’t include the messages
so far. :-) Also CCing the hotspot GC list, in case they have some
feedback on this.</div>
<div style="font-family: Helvetica, Arial; font-size: 13px; color: rgb(0, 0, 0); margin: 0px;">
<br></div>
<div style="font-family: Helvetica, Arial; font-size: 13px; color: rgb(0, 0, 0); margin: 0px;">
Could I suggest a (much) simpler but at least as powerful and
flexible way to do this? (This is something we’ve been meaning to
do for a while now for TwitterJDK, the JDK we develop and deploy
here at Twitter.) You can force allocations to go into the slow
path periodically by artificially setting the TLAB top to a lower
value. So, imagine a TLAB is 4M. You can set top to (bottom+1M).
When an allocation thinks the TLAB is full (in this case, the first
1MB is full) it will call the allocation slow path. There, you can
intercept it, sample the allocation (and, like in your case, you’ll
also have the correct stack trace), notice that the TLAB is not
actually full, extend its top to, say, (bottom+2M), and you’re
done.</div>
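<div style="font-family: Helvetica, Arial; font-size: 13px; color: rgb(0, 0, 0); margin: 0px;">
A rough sketch of the artificial-limit idea described above, written as a plain Java model rather than HotSpot code. The names (SampledTlab, hardEnd, end) are hypothetical, and the policy of advancing the limit by a fixed interval is an assumption made for illustration.</div>

```java
// Hypothetical model of sampling via an artificially lowered TLAB limit.
final class SampledTlab {
    private final long hardEnd;   // real end of the TLAB
    private final long interval;  // assumed sampling interval, in bytes
    private long top;             // allocation pointer
    private long end;             // artificial limit checked by the fast path
    private int samples = 0;

    SampledTlab(long start, long size, long interval) {
        this.hardEnd = start + size;
        this.interval = interval;
        this.top = start;
        this.end = Math.min(hardEnd, start + interval);
    }

    // Fast path is unchanged: a single limit check and a pointer bump.
    // Returns the object's address, or -1 if the TLAB is genuinely full.
    long allocate(long size) {
        if (size > end - top) {
            return slowPath(size);
        }
        long addr = top;
        top += size;
        return addr;
    }

    private long slowPath(long size) {
        if (size > hardEnd - top) {
            return -1;            // really full: a VM would refill the TLAB here
        }
        samples++;                // artificial limit hit: sample this allocation
        // Move the limit forward, making sure the current object fits.
        end = Math.min(hardEnd, Math.max(end + interval, top + size));
        long addr = top;
        top += size;
        return addr;
    }

    int samples() { return samples; }
}
```

<div style="font-family: Helvetica, Arial; font-size: 13px; color: rgb(0, 0, 0); margin: 0px;">
With a 4M buffer and a 1M interval, this model takes the slow path (and a sample) three times before the buffer is actually full, and the fast path carries no extra instructions.</div>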
<div style="font-family: Helvetica, Arial; font-size: 13px; color: rgb(0, 0, 0); margin: 0px;">
<br></div>
<div style="font-family: Helvetica, Arial; font-size: 13px; color: rgb(0, 0, 0); margin: 0px;">
Advantages of this approach:</div>
<div style="font-family: Helvetica, Arial; font-size: 13px; color: rgb(0, 0, 0); margin: 0px;">
<br></div>
<div style="font-family: Helvetica, Arial; font-size: 13px; color: rgb(0, 0, 0); margin: 0px;">
* This is a much smaller, simpler, and self-contained change (no
compiler changes necessary to maintain...).</div>
<div style="font-family: Helvetica, Arial; font-size: 13px; color: rgb(0, 0, 0); margin: 0px;">
<br></div>
<div style="font-family: Helvetica, Arial; font-size: 13px; color: rgb(0, 0, 0); margin: 0px;">
* When it’s off, the overhead is only one extra test in the slow-path
TLAB allocation (i.e., negligible; we do some sampling on
TLABs in TwitterJDK using a similar mechanism and, when it’s off,
I’ve observed no performance overhead).</div>
<div style="font-family: Helvetica, Arial; font-size: 13px; color: rgb(0, 0, 0); margin: 0px;">
<br></div>
<div style="font-family: Helvetica, Arial; font-size: 13px; color: rgb(0, 0, 0); margin: 0px;">
* (most importantly) You can turn this on and off, and adjust the
sampling rate, dynamically. If you do the sampling based on JITed
code, you’ll have to recompile all methods with allocation sites to
turn the sampling on or off. (You can of course have it always on
and just discard the output; it’d be nice not to have to do that
though. IMHO, at least.)</div>
<div style="font-family: Helvetica, Arial; font-size: 13px; color: rgb(0, 0, 0); margin: 0px;">
<br></div>
<div style="font-family: Helvetica, Arial; font-size: 13px; color: rgb(0, 0, 0); margin: 0px;">
* You can also very cheaply turn this on and off (or adjust the
sampling frequency) per thread, if that’d be helpful in some way
(just add the appropriate info on the thread’s TLAB).</div>
<div style="font-family: Helvetica, Arial; font-size: 13px; color: rgb(0, 0, 0); margin: 0px;">
<br></div>
<div style="font-family: Helvetica, Arial; font-size: 13px; color: rgb(0, 0, 0); margin: 0px;">
A few extra comments on the previous discussion:</div>
<div style="font-family: Helvetica, Arial; font-size: 13px; color: rgb(0, 0, 0); margin: 0px;">
<br></div>
<div style="font-family: Helvetica, Arial; font-size: 13px; color: rgb(0, 0, 0); margin: 0px;">
<div style="margin: 0px;">* "JFR samples per new TLAB allocation.
It provides really very good picture and I haven't seen overhead
more than 2” : When TLABs get very large, I don’t think sampling
one object per TLAB is enough to get a good sample (IMHO, at
least). It’s probably OK for something like jbb which mostly
allocates instances of a handful of classes and has very few
allocation sites. But, a lot of the code we run at Twitter is a lot
more elaborate than that and, in our experience, sampling one
object per TLAB is not enough. You can, of course, decrease the
TLAB size to increase the sampling size. But it’d be good not to
have to do that given a smaller TLAB size could increase contention
across threads.</div>
<div style="margin: 0px;"><br></div>
<div style="margin: 0px;">* "Should it *just* take a stack trace,
or should the behavior be configurable?” : I think we’d have to
separate the allocation sampling mechanism from the consumption of
the allocation samples. Once the sampling mechanism is in,
different JVMs can take advantage of it in different ways. I assume
that the Oracle folks would like at least a JFR event for every
such sample. But in your build you can add extra code to collect
the information in the way you have now.</div>
<div style="margin: 0px;"><br></div>
<div style="margin: 0px;">* Talking of JFR, it’s a bit unfortunate
that the AllocObjectInNewTLAB event has both the new TLAB
information and the allocation information. It would have been nice
if that event was split into two, say NewTLAB and
AllocObjectInTLAB, and we’d be able to fire the latter for each
sample.</div>
<div style="margin: 0px;"><br></div>
<div style="margin: 0px;">* "Should the interval between samples be
configurable?” : Totally. In fact, it’d be helpful if it was
configurable dynamically. Imagine if a JVM starts misbehaving after
2-3 weeks of running. You can dynamically increase the sampling
rate to get a better profile if the default is not giving
fine-grain enough information.</div>
<div style="margin: 0px;"><br></div>
</div>
<div style="font-family: Helvetica, Arial; font-size: 13px; color: rgb(0, 0, 0); margin: 0px;">
* "As long of these features don’t contribute to sampling bias” :
If the sampling interval is fixed, sampling bias would be a very
real concern. In the above example, I’d increment top by 1M (the
sampling interval) + p% (a fudge factor). </div>
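<div style="font-family: Helvetica, Arial; font-size: 13px; color: rgb(0, 0, 0); margin: 0px;">
One possible reading of the fudge factor, sketched in Java: each new interval is the base interval perturbed by a uniformly random amount within plus or minus p%. The uniform distribution, the symmetric range, and the class name are all assumptions made for illustration; the text above only says that the interval should not be fixed.</div>

```java
import java.util.Random;

// Hypothetical helper that varies the sampling interval to reduce bias.
final class JitteredInterval {
    private final long baseBytes;       // nominal interval, e.g. 1M
    private final double fudgeFraction; // p% expressed as a fraction, e.g. 0.10
    private final Random rng;

    JitteredInterval(long baseBytes, double fudgeFraction, long seed) {
        this.baseBytes = baseBytes;
        this.fudgeFraction = fudgeFraction;
        this.rng = new Random(seed);
    }

    // Returns the number of bytes to allocate before the next sample.
    long next() {
        // Uniform jitter in [-fudgeFraction, +fudgeFraction).
        double jitter = (rng.nextDouble() * 2.0 - 1.0) * fudgeFraction;
        return Math.max(1L, Math.round(baseBytes * (1.0 + jitter)));
    }
}
```

<div style="font-family: Helvetica, Arial; font-size: 13px; color: rgb(0, 0, 0); margin: 0px;">
The value returned by next() would be used wherever the limit is advanced, so that samples do not line up with any periodic allocation pattern in the application.</div>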
<div style="font-family: Helvetica, Arial; font-size: 13px; color: rgb(0, 0, 0); margin: 0px;">
<br></div>
<div style="font-family: Helvetica, Arial; font-size: 13px; color: rgb(0, 0, 0); margin: 0px;">
* "Yes, a perhaps optional callbacks would be nice too.” : Oh, no.
:-) But, as I said, we should definitely separate the sampling
mechanism from the mechanism that consumes the samples.</div>
<div style="font-family: Helvetica, Arial; font-size: 13px; color: rgb(0, 0, 0); margin: 0px;">
<br></div>
<div style="font-family: Helvetica, Arial; font-size: 13px; color: rgb(0, 0, 0); margin: 0px;">
* "Another problem with our submitting things is that we can't
really test on anything other than Linux.” : Another reason to go
with as platform-independent a solution as possible. :-)</div>
<div style="font-family: Helvetica, Arial; font-size: 13px; color: rgb(0, 0, 0); margin: 0px;">
<br></div>
<div style="font-family: Helvetica, Arial; font-size: 13px; color: rgb(0, 0, 0); margin: 0px;">
Regards,</div>
<div style="font-family: Helvetica, Arial; font-size: 13px; color: rgb(0, 0, 0); margin: 0px;">
<br></div>
<div style="font-family: Helvetica, Arial; font-size: 13px; color: rgb(0, 0, 0); margin: 0px;">
Tony</div>
<br>
<div>
<div style="font-family: helvetica, arial; font-size: 13px;">
<div>-----</div>
<div><br></div>
<div>Tony Printezis | JVM/GC Engineer / VM Team | Twitter</div>
<div><br></div>
<div>@TonyPrintezis</div>
<div><a href="mailto:tprintezis@twitter.com" target="_blank">tprintezis@twitter.com</a></div>
<div><br></div>
</div>
</div>
</div>
</blockquote>
</div>
<br></div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div></div></span></blockquote> <div id="bloop_sign_1435266682967471104" class="bloop_sign"><div style="font-family:helvetica,arial;font-size:13px"><div>-----</div><div><br></div><div>Tony Printezis | JVM/GC Engineer / VM Team | Twitter</div><div><br></div><div>@TonyPrintezis</div><div><a href="mailto:tprintezis@twitter.com">tprintezis@twitter.com</a></div><div><br></div></div></div></body></html>