Low-Overhead Heap Profiling
JC Beyler
jcbeyler at google.com
Fri Jun 30 05:01:30 UTC 2017
Thanks Robbin,
This seems to have worked. When I have the next webrev ready, we will find
out but I'm fairly confident it will work!
Thanks again!
Jc
On Wed, Jun 28, 2017 at 11:46 PM, Robbin Ehn <robbin.ehn at oracle.com> wrote:
> Hi JC,
>
> On 06/29/2017 12:15 AM, JC Beyler wrote:
>
>> B) Incremental changes
>>
>
> I guess the most common workflow here is using mq:
> hg qnew fix_v1
> edit files
> hg qrefresh
> hg qnew fix_v2
> edit files
> hg qrefresh
>
> if you do hg log you will see 2 commits
>
> webrev.ksh -r -2 -o my_inc_v1_v2
> webrev.ksh -o my_full_v2
>
>
> In your .hgrc you might need:
> [extensions]
> mq =
>
> /Robbin
>
>
>> Again, another newbie question here...
>>
>> For showing the incremental changes, is there a link that explains how to
>> do that? I apologize for my newbie questions all the time :)
>>
>> Right now, I do:
>>
>> ksh ../webrev.ksh -m -N
>>
>> That generates a webrev.zip that I send to Chuck Rasbold. He then uploads
>> it to a new webrev.
>>
>> I tried committing my change and adding a small change. Then if I just do
>> ksh ../webrev.ksh without any options, it seems to produce a similar page
>> but now with only the changes I had (so the 06-07 comparison you were
>> talking about) and a changeset that has it all. I imagine that is what you
>> meant.
>>
>> Which means that my workflow would become:
>>
>> 1) Make changes
>> 2) Make a webrev without any options to show just the differences with
>> the tip
>> 3) Amend my changes into my local commit so that everything stays in one changeset
>> 4) Go to 1
>>
>> Does that seem correct to you?
>>
>> Note that when I do this, I only see the full change of a file in the
>> full change set (Side note here: now the page says change set and not
>> patch, which is maybe why Serguei was having issues?).
>>
>> Thanks!
>> Jc
>>
>>
>>
>> On Wed, Jun 28, 2017 at 1:12 AM, Robbin Ehn <robbin.ehn at oracle.com> wrote:
>>
>> Hi,
>>
>> On 06/28/2017 12:04 AM, JC Beyler wrote:
>>
>> Dear Thomas et al,
>>
>> Here is the newest webrev:
>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/
>>
>>
>>
>> You have some more bits to fix in there but generally this looks good and
>> really nice with more tests.
>> I'll do a deep dive and re-test this when I get back from my long
>> vacation with whatever patch version you have then.
>>
>> Also I think it's time you provide incremental (v06->07 changes) as
>> well as complete change-sets.
>>
>> Thanks, Robbin
>>
>>
>>
>>
>> Thomas, I "think" I have answered all your remarks. The summary
>> is:
>>
>> - The statistics system is up and provides insight into what the
>> heap sampler is doing
>> - I've noticed that, though the sampling rate is at the
>> right mean, we are missing some samples; I have not yet tracked down why
>> (details below)
>>
>> - I've run a tiny benchmark that is the worst case: it is a very
>> tight loop that allocates a small array
>> - In this case, I see no overhead when the system is off so
>> that is a good start :)
>> - I see right now a high overhead in this case when sampling
>> is on. This is not really too surprising but I'm going to see if this is
>> consistent with our
>> internal implementation. The benchmark is really allocation-heavy
>> so I'm not too surprised but I want to do the due diligence.
>>
>> - The statistics system is up and I have a new test:
>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatTest.java.patch
>> - I did a bit of a study about the random generator here,
>> more details are below but basically it seems to work well
>>
>> - I added a capability but since this is the first time doing
>> this, I was not sure I did it right
>> - I did add a test though for it and the test seems to do
>> what I expect (all methods are failing with the
>> JVMTI_ERROR_MUST_POSSESS_CAPABILITY error).
>> - http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorNoCapabilityTest.java.patch
>>
>> - I still need to figure out what to do about the multi-agent
>> vs single-agent issue
>>
>> - As far as measurements, it seems I still need to look at:
>> - Why do we do the 20 random calls first, and are they necessary?
>> - Look at the mean of the sampling rate that the random
>> generator does and also what is actually sampled
>> - What is the overhead in terms of memory/performance when
>> on?
>>
>> I have inlined my answers, I think I got them all in the new
>> webrev, let me know your thoughts.
>>
>> Thanks again!
>> Jc
>>
>>
>> On Fri, Jun 23, 2017 at 3:52 AM, Thomas Schatzl <thomas.schatzl at oracle.com> wrote:
>>
>> Hi,
>>
>> On Wed, 2017-06-21 at 13:45 -0700, JC Beyler wrote:
>> > Hi all,
>> >
>> > First off: Thanks again to Robbin and Thomas for their
>> reviews :)
>> >
>> > Next, I've uploaded a new webrev:
>> > http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/
>>
>> >
>> > Here is an update:
>> >
>> > - @Robbin, I forgot to say that yes I need to look at
>> implementing
>> > this for the other architectures and testing it before it
>> is all
>> > ready to go. Is it common to have it working on all
>> possible
>> > combinations or is there a subset that I should be doing
>> first and we
>> > can do the others later?
>> > - I've tested slowdebug, built and ran the JTreg tests I
>> wrote with
>> > slowdebug and fixed a few more issues
>> > - I've refactored a bit of the code following Thomas'
>> comments
>> > - I think I've handled all the comments from Thomas (I
>> put
>> > comments inline below for the specifics)
>>
>> Thanks for handling all those.
>>
>> > - Following Thomas' comments on statistics, I want to add
>> some
>> > quality assurance tests and find that the easiest way
>> would be to
>> > have a few counters of what is happening in the sampler
>> and expose
>> > that to the user.
>> > - I'll be adding that in the next version if no one
>> sees any
>> > objections to that.
>> > - This will allow me to add a sanity test in JTreg
>> about number of
>> > samples and average of sampling rate
>> >
>> > @Thomas: I had a few questions that I inlined below but I
>> will
>> > summarize the "bigger ones" here:
>> > - You mentioned constants are not using the right
>> conventions, I
>> > looked around and didn't see any convention except normal
>> naming then
>> > for static constants. Is that right?
>>
>> I looked through https://wiki.openjdk.java.net/display/HotSpot/StyleGuide
>> and the rule is to "follow an existing pattern and must have a
>> distinct appearance from other names". Which does not help a lot I
>> guess :/ The GC team started using upper camel case, e.g.
>> SomeOtherConstant, but very likely this is probably not
>> applied
>> consistently throughout. So I am fine with not adding
>> another style
>> (like kMaxStackDepth with the "k" in front with some unknown
>> meaning).
>>
>> (Chances are you will find that style somewhere used anyway
>> too,
>> apologies if so :/)
>>
>>
>> Thanks for that link, now I know where to look. I used the upper
>> camel case in my code as well then :) I should have gotten them all.
>>
>>
>> > PS: I've also inlined my answers to Thomas below:
>> >
>> > On Tue, Jun 13, 2017 at 8:03 AM, Thomas Schatzl
>> > <thomas.schatzl at oracle.com> wrote:
>> > > Hi all,
>> > >
>> > > On Mon, 2017-06-12 at 11:11 -0700, JC Beyler wrote:
>> > > > Dear all,
>> > > >
>> > > > I've continued working on this and have done the
>> following
>> > > webrev:
>> > > > http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/
>>
>> > >
>> > > [...]
>> > > > Things I still need to do:
>> > > > - Have to fix that TLAB case for the FastTLABRefill
>> > > > - Have to start looking at the data to see that it
>> is
>> > > consistent and does gather the right samples, right
>> frequency, etc.
>> > > > - Have to check the GC elements and what that
>> produces
>> > > > - Run a slowdebug run and ensure I fixed all those
>> issues you
>> > > saw > Robbin
>> > > >
>> > > > Thanks for looking at the webrev and have a great
>> week!
>> > >
>> > > scratching a bit on the surface of this change, so
>> apologies for
>> > > rather shallow comments:
>> > >
>> > > - macroAssembler_x86.cpp:5604: while this is compiler
>> code, and I
>> > > am not sure this is final, please avoid littering the
>> code with
>> > > TODO remarks :) They tend to be candidates for later
>> wtf moments
>> > > only.
>> > >
>> > > Just file a CR for that.
>> > >
>> > Newcomer question: what is a CR? And I'm not sure I have the
>> > rights to do that yet :)
>>
>> Apologies. CR is a change request; this suggests filing a
>> bug in the
>> bug tracker. And you are right, you can't just create a new
>> account in
>> the OpenJDK JIRA yourself. :(
>>
>>
>> Ok good to know, I'll continue with my own todo list but I'll
>> work hard on not letting it slip in the webrevs anymore :)
>>
>>
>> I was mostly referring to the "... but it is a TODO" part of
>> that
>> comment in macroassembler_x86.cpp. Comments about the why of
>> the code
>> are appreciated.
>>
>> [Note that I now understand that this is to some degree
>> still work in
>> progress. As long as the final changeset does not contain
>> TODOs I am
>> fine (and it's not a hard objection, rather their use in
>> "final" code
>> is typically limited in my experience)]
>>
>> 5603 // Currently, if this happens, just set back the actual end to where it was.
>> 5604 // We miss a chance to sample here.
>>
>> Would be okay, but explaining "this" and the "why" of missing
>> a chance
>> to sample here would be best.
>>
>> Like maybe:
>>
>> // If we needed to refill TLABs, just set the actual end point to
>> // the end of the TLAB again. We do not sample here although we could.
>>
>> Done with your comment, it works well in my mind.
>>
>> I am not sure whether "miss a chance to sample" meant "we
>> could, but
>> consciously don't because it's not that useful" or "it would
>> be
>> necessary but don't because it's too complicated to do.".
>>
>> Looking at the original comment once more, I am also not
>> sure if that
>> comment shouldn't be referring to the "end" variable (not
>> actual_end)
>> because that's the variable that is responsible for taking
>> the sampling
>> path? (Going from the member description of
>> ThreadLocalAllocBuffer).
>>
>>
>> I've moved this code and it no longer shows up here but the
>> rationale and answer was:
>>
>> So.. Yes, end is the variable provoking the sampling. Actual end
>> is the actual end of the TLAB.
>>
>> What was happening here is that the code is resetting _end to
>> point towards the end of the new TLAB. Because we now have the end for
>> sampling and _actual_end for
>> the actual end, we need to update the actual_end as well.
>>
>> Normally, were we to do the real work here, we would calculate
>> the (end - start) offset, then do:
>>
>> - Set the new end to: start + (old_end - old_start)
>> - Set the actual end like we do here now, because it is
>> the actual end.
>>
>> Why is this not done here now anymore?
>> - I was still debating which path to take:
>> - Do it in the fast refill code, it has its perks:
>> - In a world where fast refills are happening all the
>> time or a lot, we can augment the code there to do the sampling
>> - Remember what we had as an end before leaving the
>> slowpath and check on return
>> - This is what I'm doing now, it removes the need to
>> go fix up all fast refill paths but if you remain in fast refill paths, you
>> won't get sampling. I
>> have to think of the consequences of that, maybe a future change
>> later on?
>> - I have the statistics now so I'm going to study
>> that
>> -> By the way, though my statistics are showing
>> I'm missing some samples, if I turn off FastTlabRefill, it is the same loss
>> so for now, it seems
>> this does not occur in my simple test.
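>>
>> To make the end/_actual_end split concrete, the idea is roughly the
>> following sketch (illustrative only, not the webrev code; the member and
>> method names simply mirror the discussion above):
>>
>> #include <cstddef>
>>
>> // Minimal sketch: the TLAB keeps its true end in _actual_end and
>> // publishes a possibly smaller _end, so the inlined fast path "fills up"
>> // at the sample point and drops into the runtime where sampling happens.
>> class SampledTlabSketch {
>>   char*  _top;                 // current allocation pointer
>>   char*  _end;                 // what the fast path compares against
>>   char*  _actual_end;          // the real end of the buffer
>>   size_t _bytes_until_sample;  // distance to the next sample point
>>
>>  public:
>>   char* end() const      { return _end; }
>>   char* hard_end() const { return _actual_end; }
>>
>>   // Publish a smaller end so allocation takes the slow path at the
>>   // sample point instead of at the real end of the TLAB.
>>   void set_sample_end() {
>>     size_t left = static_cast<size_t>(_actual_end - _top);
>>     _end = (_bytes_until_sample < left) ? _top + _bytes_until_sample
>>                                         : _actual_end;
>>   }
>>
>>   // After sampling (or when giving up on sampling this TLAB), restore
>>   // the real end so allocation can continue normally.
>>   void set_back_actual_end() { _end = _actual_end; }
>> };
>>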
>>
>>
>>
>> But maybe I am only confused and it's best to just leave the
>> comment
>> away. :)
>>
>> Thinking about it some more, doesn't this not-sampling in
>> this case
>> mean that sampling does not work in any collector that does
>> inline TLAB
>> allocation at the moment? (Or is inline TLAB alloc
>> automatically
>> disabled with sampling somehow?)
>>
>> That would indeed be a bigger TODO then :)
>>
>>
>> Agreed, this remark made me think that perhaps as a first step
>> the new way of doing it is better but I did have to:
>> - Remove the const of the ThreadLocalAllocBuffer remaining and
>> hard_end methods
>> - Move hard_end out of the header file to have a bit more
>> logic there
>>
>> Please let me know what you think of that and if you prefer it
>> this way or changing the fast refills. (I prefer this way now because it is
>> more incremental).
>>
>>
>> > > - calling HeapMonitoring::do_weak_oops() (which should
>> probably be
>> > > called weak_oops_do() like other similar methods) only
>> if string
>> > > deduplication is enabled (in g1CollectedHeap.cpp:4511)
>> seems wrong.
>> >
>> > The call should be at least around 6 lines up outside the
>> if.
>> >
>> > Preferentially in a method like
>> process_weak_jni_handles(), including
>> > additional logging. (No new (G1) gc phase without minimal
>> logging
>> > :)).
>> > Done but really not sure because:
>> >
>> > I put for logging:
>> > log_develop_trace(gc, freelist)("G1ConcRegionFreeing
>> [other] : heap
>> > monitoring");
>>
>> I would think that "gc, ref" would be more appropriate log
>> tags for
>> this similar to jni handles.
>> (I am also not sure what weak reference handling has to do
>> with
>> G1ConcRegionFreeing, so I am a bit puzzled)
>>
>>
>> I was not sure what to put for the tags or really as the message.
>> I cleaned it up a bit now to:
>> log_develop_trace(gc, ref)("HeapSampling [other] : heap
>> monitoring processing");
>>
>>
>>
>> > Since weak_jni_handles didn't have logging for me to be
>> inspired
>> > from, I did that but I'm unconvinced this is what should be
>> done.
>>
>> The JNI handle processing does have logging, but only in
>> ReferenceProcessor::process_discovered_references(). In
>> process_weak_jni_handles() only overall time is measured (in
>> a G1
>> specific way, since only G1 supports disabling reference
>> processing) :/
>>
>> The code in ReferenceProcessor prints both time taken
>> referenceProcessor.cpp:254, as well as the count, but
>> strangely only in
>> debug VMs.
>>
>> I have no idea why this logging is that unimportant to only
>> print that
>> in a debug VM. However there are reviews out for changing
>> this area a
>> bit, so it might be useful to wait for that (JDK-8173335).
>>
>>
>> I cleaned it up a bit anyway and now it returns the count of
>> objects that are in the system.
>>
>>
>> > > - the change doubles the size of
>> > > CollectedHeap::allocate_from_tlab_slow() above the
>> "small and nice"
>> > > threshold. Maybe it could be refactored a bit.
>> > Done I think, it looks better to me :).
>>
>> In ThreadLocalAllocBuffer::handle_sample() I think the
>> set_back_actual_end()/pick_next_sample() calls could be
>> hoisted out of
>> the "if" :)
>>
>>
>> Done!
>>
>>
>> > > - referenceProcessor.cpp:261: the change should add
>> logging about
>> > > the number of references encountered, maybe after the
>> corresponding
>> > > "JNI weak reference count" log message.
>> > Just to double check, are you saying that you'd like to
>> have the heap
>> > sampler to keep in store how many sampled objects were
>> encountered in
>> > the HeapMonitoring::weak_oops_do?
>> > - Would a return of the method with the number of
>> handled
>> > references and logging that work?
>>
>> Yes, it's fine if HeapMonitoring::weak_oops_do() only
>> returned the
>> number of processed weak oops.
>>
>>
>> Done also (but I admit I have not tested the output yet) :)
>>
>>
>> > - Additionally, would you prefer it in a separate block
>> with its
>> > GCTraceTime?
>>
>> Yes. Both kinds of information are interesting: while the
>> time taken is
>> typically more important, the next question would be why,
>> and the
>> number of references typically goes a long way there.
>>
>> See above though, it is probably best to wait a bit.
>>
>>
>> Agreed that I "could" wait but, if it's ok, I'll just
>> refactor/remove this when we get closer to something final. Either
>> JDK-8173335
>> has gone in by then and I will notice it, or it soon will and I can
>> change it then.
>>
>>
>> > > - threadLocalAllocBuffer.cpp:331: one more "TODO"
>> > Removed it and added it to my personal todos to look at.
>> > > >
>> > > - threadLocalAllocBuffer.hpp: ThreadLocalAllocBuffer
>> class
>> > > documentation should be updated about the sampling
>> additions. I
>> > > would have no clue what the difference between
>> "actual_end" and
>> > > "end" would be from the given information.
>> > If you are talking about the comments in this file, I made
>> them more
>> > clear I hope in the new webrev. If it was somewhere else,
>> let me know
>> > where to change.
>>
>> Thanks, that's much better. Maybe a note in the comment of
>> the class
>> that ThreadLocalAllocBuffer provides some sampling facility by
>> modifying the
>> end() of the TLAB to cause "frequent" calls into the runtime,
>> where the
>> actual sampling takes place.
>>
>>
>> Done, I think it's better now. Added something about the
>> slow_path_end as well.
>>
>>
>> > > - in heapMonitoring.hpp: there are some random comments
>> about some
>> > > code that has been grabbed from
>> "util/math/fastmath.[h|cc]". I
>> > > can't tell whether this is code that can be used but I
>> assume that
>> > > Noam Shazeer is okay with that (i.e. that's all Google
>> code).
>> > Jeremy and I double checked and we can release that as I
>> thought. I
>> > removed the comment from that piece of code entirely.
>>
>> Thanks.
>>
>> > > - heapMonitoring.hpp/cpp static constant naming does not
>> correspond
>> > > to Hotspot's. Additionally, in Hotspot static methods
>> are cased
>> > > like other methods.
>> > I think I fixed the methods to be cased the same way as
>> all other
>> > methods. For static constants, I was not sure. I fixed a
>> few other
>> > variables but I could not seem to really see a consistent
>> trend for
>> > constants. I made them as variables but I'm not sure now.
>>
>> Sorry again, style is a kind of mess. The goal of my
>> suggestions here
>> is only to prevent yet another style creeping in.
>>
>> > > - in heapMonitoring.cpp there are a few cryptic comments
>> at the top
>> > > that seem to refer to internal stuff that should
>> probably be
>> > > removed.
>> > Sorry about that! My personal todos not cleared out.
>>
>> I am happy about comments, but I simply did not understand
>> any of that
>> and I do not know about other readers as well.
>>
>> If you think you will remember removing/updating them until
>> the review
>> proper (I misunderstood the review situation a little it
>> seems).
>>
>> > > I did not think through the impact of the TLAB changes
>> on collector
>> > > behavior yet (if there are). Also I did not check for
>> problems with
>> > > concurrent mark and SATB/G1 (if there are).
>> > I would love to know your thoughts on this, I think this
>> is fine. I
>>
>> I think so too now. No objects are made live out of thin air
>> :)
>>
>> > see issues with multiple threads right now hitting the
>> stack storage
>> > instance. Previous webrevs had a mutex lock here but we
>> took it out
>> > for simplicity (and only for now).
>>
>> :) When looking at this after some thinking I now assume for
>> this
>> review that this code is not MT safe at all. There seems to
>> be more
>> synchronization missing than just the one for the
>> StackTraceStorage. So
>> no comments about this here.
>>
>>
>> I double checked a bit (quickly, I admit) but it seems that
>> synchronization in StackTraceStorage is really all you need (all methods
>> lead to a StackTraceStorage one
>> and can be multithreaded outside of that).
>> There is a question about the initialization where the method
>> HeapMonitoring::initialize_profiling is not thread safe.
>> It would work (famous last words) and not crash if there was a
>> race but we could add a synchronization point there as well (and therefore
>> on the stop as well).
>>
>> But anyway I will really check and do this once we add back
>> synchronization.
>>
>>
>> Also, this would require some kind of specification of what
>> is allowed
>> to be called when and where.
>>
>>
>> Would we specify this with the methods in the jvmti.xml file? We
>> could start by specifying in each that they are not thread safe but I saw
>> no mention of that for
>> other methods.
>>
>>
>> One potentially relevant observation about locking here:
>> depending on
>> sampling frequency, StackTraceStore::add_trace() may be
>> rather
>> frequently called. I assume that you are going to do
>> measurements :)
>>
>>
>> Though we don't have the TLAB implementation in our code, the
>> compiler-generated sampler has a 2% overhead with a 512k sampling rate. I
>> can do real measurements
>> when the code settles and we can see how costly this is as a TLAB
>> implementation.
>> However, my theory is that if the rate is 512k, the
>> memory/performance overhead should be minimal since it is what we saw with
>> our code/workloads (though not called
>> the same way, we call it essentially at the same rate).
>> If you have a benchmark you'd like me to test, let me know!
>>
>> Right now, with my really small test, this does use a bit of
>> overhead even for a 512k sample size. I don't know yet why, I'm going to
>> see what is going on.
>>
>> Finally, I think it is not reasonable to suppose the overhead to
>> be negligible if the sampling rate used is too low. The user should know
>> that the lower the rate,
>> the higher the overhead (documentation TODO?).
>>
>>
>> I am not sure what the expected usage of the API is, but
>> StackTraceStore::add_trace() seems to be able to grow
>> without bounds.
>> Only a GC truncates them to the live ones. That in itself
>> seems to be
>> problematic (GCs can be *wide* apart), and of course some of
>> the API
>> methods add to that because they duplicate that unbounded
>> array. Do you
>> have any concerns/measurements about this?
>>
>>
>> So, the theory is that yes, add_trace can grow without
>> bounds but it grows at a sample per 512k of allocated space. The stacks it
>> gathers are currently
>> maxed at 64 (I'd like to expand that to an option to the user
>> though at some point). So I have no concerns because:
>>
>> - If really this is taking a lot of space, that means the job is
>> keeping a lot of objects in memory as well, therefore the entire heap is
>> getting huge
>> - If this is the case, you will be triggering a GC at some point
>> anyway.
>>
>> (I'm putting under the rug the issue of "What if we set the rate
>> to 1 for example" because as you lower the sampling rate, we cannot
>> guarantee low overhead; the
>> idea behind this feature is to have a means of having meaningful
>> allocated samples at a low overhead)
>>
>> I have no measurements really right now but since I now have some
>> statistics I can poll, I will look a bit more at this question.
>>
>> I have the same last sentence as above: the user should expect
>> this to happen if the sampling rate is too small. That probably can be
>> reflected in the
>> StartHeapSampling as a note : careful this might impact your
>> performance.
>>
>>
>> Also, these stack traces might hold on to huge arrays. Any
>> consideration of that? Particularly it might be the cause
>> for OOMEs in
>> tight memory situations.
>>
>>
>> There is a stack size maximum that is set to 64 so it should not
>> hold huge arrays. I don't think this is an issue but I can double check
>> with a test or two.
>>
>>
>> - please consider adding a safepoint check in
>> HeapMonitoring::weak_oops_do to prevent accidental misuse.
>>
>> - in struct StackTraceStorage, the public fields may also
>> need
>> underscores. At least some files in the runtime directory
>> have structs
>> with underscored public members (and some don't). The
>> runtime team
>> should probably comment on that.
>>
>>
>> Agreed I did not know. I looked around and a lot of structs did
>> not have them it seemed so I left it as is. I will happily change it if
>> someone prefers (I was not
>> sure if you really preferred or not, your sentence seemed to be
>> more a note of "this might need to change but I don't know if the runtime
>> team enforces that", let
>> me know if I read that wrongly).
>>
>>
>> - In StackTraceStorage::weak_oops_do(), when examining the
>> StackTraceData, maybe it is useful to consider having a
>> non-NULL
>> reference outside of the heap's reserved space an error.
>> There should
>> be no oop outside of the heap's reserved space ever.
>>
>> Unless you allow storing random values in
>> StackTraceData::obj, which I
>> would not encourage.
>>
>>
>> I suppose you are talking about this part:
>> if ((value != NULL && Universe::heap()->is_in_reserved(value)) &&
>> (is_alive == NULL || is_alive->do_object_b(value))) {
>>
>> What you are saying is that I could have something like:
>> if (value != my_non_null_reference &&
>> (is_alive == NULL || is_alive->do_object_b(value))) {
>>
>> Is that what you meant? Is there really a reason to do so? When I
>> look at the code, is_in_reserved seems like an O(1) method call. I'm not
>> even sure we can have a
>> NULL value to be honest. I might have to study that to see if
>> this was not a paranoid test to begin with.
>>
>> The is_alive code has now morphed due to the comment below.
>>
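>>
>> For reference, the stricter variant Thomas suggests would look roughly
>> like the fragment below (a sketch only; it reuses the names from the
>> snippet above, and the assert message is made up):
>>
>> if (value != NULL) {
>>   assert(Universe::heap()->is_in_reserved(value),
>>          "sampled oop outside the heap's reserved space");
>>   if (is_alive == NULL || is_alive->do_object_b(value)) {
>>     // process the live sampled object as before
>>   }
>> }
>>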
>>
>>
>> - HeapMonitoring::weak_oops_do() does not seem to use the
>> passed AbstractRefProcTaskExecutor.
>>
>>
>> It did use it:
>> size_t HeapMonitoring::weak_oops_do(
>>     AbstractRefProcTaskExecutor *task_executor,
>>     BoolObjectClosure* is_alive,
>>     OopClosure *f,
>>     VoidClosure *complete_gc) {
>>   assert(SafepointSynchronize::is_at_safepoint(), "must be at safepoint");
>>
>>   if (task_executor != NULL) {
>>     task_executor->set_single_threaded_mode();
>>   }
>>   return StackTraceStorage::storage()->weak_oops_do(is_alive, f, complete_gc);
>> }
>>
>> But due to the comment below, I refactored this, so this is no
>> longer here. Now I have an always true closure that is passed.
>>
>>
>> - I do not understand allowing this method to be called with a
>> NULL
>> complete_gc closure. This would mean that objects referenced
>> from the
>> object that is referenced by the StackTraceData are not
>> pulled, meaning
>> they would get stale.
>>
>> - same with is_alive parameter value of NULL
>>
>>
>> So these questions made me look a bit closer at this code. This
>> code I think was written this way to have a very small impact on the file
>> but you are right, there
>> is no reason for this here. I've simplified the code by adding a
>> process_HeapSampling method in referenceProcessor.cpp that handles
>> everything there.
>>
>> The code allowed NULLs because it depended on where you were
>> coming from and how the code was being called.
>>
>> - I added a static always_true variable and pass that now to be
>> more consistent with the rest of the code.
>> - I moved the complete_gc into process_phaseHeapSampling now (new
>> method) and handle the task_executor and the complete_gc there
>> - Newbie question: in our code we did a
>> set_single_threaded_mode but I see that process_phaseJNI does it right
>> before its call, do I need to do it for the
>> process_phaseHeapSample?
>> That API is much cleaner (in my mind) and is consistent with what
>> is done around it (again in my mind).
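>>
>> For illustration, the static always_true mentioned above is just a
>> BoolObjectClosure that never filters anything; roughly (a sketch, not
>> necessarily the exact code in the webrev):
>>
>> class AlwaysTrueClosure : public BoolObjectClosure {
>>  public:
>>   bool do_object_b(oop obj) { return true; }
>> };
>> static AlwaysTrueClosure always_true;
>>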
>>
>>
>> - heapMonitoring.cpp:590: I do not completely understand the
>> purpose of
>> this code: in the end this results in a fixed value directly
>> dependent
>> on the Thread address anyway?
>> IOW, what is special about exactly 20 rounds?
>>
>>
>> So we really want a fast random number generator that has a
>> specific mean (512k is the default we use). The code uses the thread
>> address as the start number of the
>> sequence (why not, it is random enough, is the rationale). Then
>> instead of just starting there, we prime the sequence and really only start
>> at the 21st number; this is
>> arbitrary and I have not done a study to see if we could do more
>> or less of that.
>>
>> As I have the statistics of the system up and running, I'll run
>> some experiments to see if this is needed, is 20 good, or not.
>>
>>
>> - also I would consider stripping a few bits of the threads'
>> address as
>> initialization value for your rng. The last three bits (and
>> probably
>> more, check whether the Thread object is allocated on special
>> boundaries) are always zero for them.
>> Not sure if the given "random" value is random enough
>> before/after,
>> this method, so just skip that comment if you think this is
>> not
>> required.
>>
>>
>> I don't know is the honest answer. I think what is important is
>> that we tend towards a mean and it is random "enough" to not fall in
>> pitfalls of only sampling a
>> subset of objects due to their allocation order. I added that as a
>> test to do, to see if it changes the mean in any way for the 512k default
>> value and/or if the first
>> 1000 elements look better.
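>>
>> As a rough illustration of the priming discussed here (a sketch only:
>> the generator and the exponential draw are stand-ins picked for the
>> example, not necessarily what the webrev uses):
>>
>> #include <cstddef>
>> #include <cstdint>
>> #include <cmath>
>>
>> class SamplerRngSketch {
>>   uint64_t _state;
>>
>>   uint64_t next() {          // any cheap generator works; xorshift64 here
>>     _state ^= _state << 13;
>>     _state ^= _state >> 7;
>>     _state ^= _state << 17;
>>     return _state;
>>   }
>>
>>  public:
>>   explicit SamplerRngSketch(const void* thread_addr) {
>>     // Seed from the Thread address, dropping the always-zero low bits.
>>     _state = static_cast<uint64_t>(reinterpret_cast<uintptr_t>(thread_addr)) >> 3;
>>     if (_state == 0) _state = 1;
>>     // Prime the sequence: discard the first 20 outputs so we really
>>     // start at the 21st number.
>>     for (int i = 0; i < 20; i++) next();
>>   }
>>
>>   // One common way to draw intervals around a requested mean (e.g. 512k):
>>   // an exponential draw from a uniform variate.
>>   size_t next_sample_interval(size_t mean_bytes) {
>>     double u = (next() >> 11) * (1.0 / 9007199254740992.0);  // in [0, 1)
>>     if (u <= 0.0) u = 1e-12;
>>     return static_cast<size_t>(-std::log(u) * mean_bytes);
>>   }
>> };
>>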
>>
>>
>> Some more random nits I did not find a place to put anywhere:
>>
>> - ThreadLocalAllocBuffer::_extra_space does not seem to be
>> used
>> anywhere?
>>
>>
>> Good catch :).
>>
>>
>> - Maybe indent the declaration of
>> ThreadLocalAllocBuffer::_bytes_until_sample to align below the other
>> members of that group.
>>
>>
>> Done, moved it up a bit to have non-static members together and
>> static separate.
>>
>> Thanks,
>> Thomas
>>
>>
>> Thanks for your review!
>> Jc
>>
>>
>>