[RFC] Enhanced Garbage Collection Probe Points

Wed Aug 29 18:19:16 PDT 2012

Hey Lukas,

Nice improvement.  I've snipped the stuff that looks all-good now.

> > 
> > + * @size: word size to be cleaned
> > 
> > (in several places)
> > This wording (pun not intended) is not entirely clear.  Playing the
> > ignorant reader, I am not sure if this means the wordsize that the
> > jvm uses, or the total space, in words, that will be collected on
> > this gc run.  Can this be clarified?
> I've changed this to: "Word size of the object to be collected."

So, this maybe is not a great improvement.  Does this mean that in the
gc run there will be a single object collected?  Is that single object
@size words big?  I think I know better that this is not what you
mean here, but it reads unclear.  Let me ask my question another way,
is @size a target for this collection run?  Or is it the size of the
region to be collected?

This also introduces a terminology collision for java-centric folks.
I think you maybe mean 'region' or 'generation' rather than 'object',
which has a specific meaning in Java-land.  This is also an issue
below...

If the meaning of this value is what I think it is, can I suggest:

"The number of words of memory that this collection will try to reclaim"

> > 
> > + * probe - gc_collect_parallel_scavenge
> > + *
> > + * @name: gc_collect_parallel_scavenge
> > + * @address: address of object being collected
> > + * @cause: cause of the collection
> > + *
> > + * Description: This is a parallel collection, where the jvm
> > process don't
> > + * have to halt while the gc is being completed
> > 
> > I want to make sure I understand this.  If I read this correctly,
> > this
> > probe fires many times between a begin and end of parallel gc run,
> > for
> > each object that is scavenged?  If I do understand this, then
> > saying
> > "this is a parallel collection" I think is the wrong wording.  This
> > is not a collection, but an event that is part of a collection run.
> >  If
> > I misunderstand, please clarify :)
> No, a scavenge is just a miniature collection, not a process within a
> collection.  The mechanics are the same, just not on the entire
> object
> space/heap.  That being said I've s/This is a parallel
> collection/This
> is a parallel scavenge/ to be specific.

I think this is another place where the use of 'object' causes confusion.
Maybe in this case you mean 'region' or 'generation'?

Another question I have about this; what relationship does this probe
have to the corresponding _begin and _end probes?  Does this probe
fire near the beginning or near the end or before the beginning or
after the end?  Why do we need this probe, if we already can say
when this process begins and ends?

Anyways, this is mainly me being picky about documentation.  On the
technical level, this is fine, so while I think it would be nice to get
the docs perfect on the first try I don't demand it.  If my understanding
above about semantics of these variables is correct and you'd like to
accept my suggestions, please do commit this with those tiny updates.  If
my understanding is incorrect, let's not hold this commit back; just put
it in the way it is, but then I would like to better understand what the
values those particular probes are telling me (both for if I try to use
them, and for documenting them for other users!).

That's all I have to say about the patch :)

> > 
> > These numbers do seem to raise some questions, though.  Why do we
> > see such variation between the runs?  I'd encourage you to explore
> > this further.  Another aspect to consider: looking at run time of
> The variations can most likely be attributed to the fact I was
> running this on a vm on my laptop and had other (non trivial)
> processes running on the host.

Ah, so these numbers really don't tell us anything at all then!

Maybe we can collaborate on some better methodology...

> > benchmark will show the impact aggregated over entire java process,
> > but it would be interesting to see the impact relative to time
> > actually spent in gc.  I'd be concerned if simply printing this
> > probestr at beginning and end of gc made the gc run last 10%
> > longer,
> > even if it looks relatively minor aggregated over full java process
> > run time.  I'm not sure how you'd go about trying to measure this,
> > however.
> I can look into running scripts that would simply increment a
> variable
> (ie gc_hit++) every time a probe is hit instead of logging the
> probestr,
> also I could run the same tests and start a timer when
> hotspot.gc_begin
> is hit and ends when a hotspot.gc_end is hit. (those probes are
> already
> there).

That sounds like a good place to start.

> > 
> > Have you given any thought to how you might add tests for these
> > probes to the existing set of probe tests?
> Ideally I'd have some java test programs that would be able to
> allocate
> enough objects to trigger collections.  As mentioned earlier,
> different
> gc algorithms can be specified with jvm options, but what would be
> tricky is controlling the gc invocation via jvm options in a reliable
> enough manner that we can reasonably expect scavenges to occur.  I
> would
> have to experiment with that to see if its possible, if anybody else
> knows how to easily do that please let me know.
> 

So, this is in some ways analogous to what I struggled with in testing
some of the existing probes.  For example, the probes that fire when
a method is JIT compiled.  This happens (depending on tuning options
of course) after a certain number invocations of a particular method,
so in order to test that the probe fires when expected, I wrote a
java program that did call a certain method the correct number of times,
and checked that the output indicated the probe firing showing that
particular method being JIT compiled.

For your problem, instead of controlling the number of times a method
is called, you need to cause certain amounts of space to be allocated
for Java objects.  You probably to start from some baseline (an empty
main()), learn about the tuning options that affect the GC behaviour,
and so forth.  I think this is rather non-trivial, another reason why
not to block the patch for testing.  And then, as Mark pointed out in
his reply, there is also the matter of integrating with the existing
tests.  My apologies in advance about the wrapper script, and please
consider me available to help when you get to that point.

> Is this ok to commit?
> 

With caveats noted above, I say yes :)

cheers,
jon