review request (M) - 6725714 add a table to speed up bitmap searches
John Coomes
John.Coomes at oracle.com
Thu May 30 01:06:43 UTC 2013
Thomas Schatzl (thomas.schatzl at oracle.com) wrote:
> Hi,
>
> On Thu, 2013-05-23 at 08:03 -0700, John Coomes wrote:
> > Please review the following change which adds a new table (the
> > 'BlockTable') to the parallel compacting collector:
> >
> > http://cr.openjdk.java.net/~jcoomes/6725714-blocktable/
> >
> > I've run a number of test suites (vm.regression, vm.gc, vm.runtime,
> > nsk.stress, nsk.monitoring, vm.compiler) in 32- and 64-bit on solaris
> > and/or linux.
>
> Some notes/questions; please understand that I do not know the parallel
> old algorithm that well, so there might be some misunderstandings on my
> side. The comments are a little out-of-order too:
Thanks for looking it over.
> - what is the motivation to increase
> ParallelScavengeHeap::intra_heap_alignment() in this change? This
> particular change does not seem to be related to this optimization.
It's necessary so that each space (eden, from, to, etc.) is a multiple
of the parallel old Region size, which has increased. It would be
much too complicated to deal with spaces that spanned only part of a
Region.
> - why is ParallelCompactData::RegionData::_blocks_filled a size_t? It
> is only ever used as a bool.
I had hoped to combine it with other fields, and whether it's a size_t
or bool, a Region ends up the same size because of padding. Anyway, to
combine it with another field would require detailed performance testing
which I can't do now, so I've changed it to bool.
> - ParallelCompactData::ParallelCompactData::_region_end is not
> initialized in the constructor. I guess it's benign, but for the same
> reason the code could skip initializing _region_start and others (which
> are set in the initialize() method.
>
> I'd appreciate the use of an initializer list here. :)
FWIW, _region_end is a debug-only member that's not used elsewhere in
the code. I believe it was added only as a convenience when in the
debugger. Initializing or deleting _region_end is better done in a
separate cleanup.
> - in ParallelCompactData::calc_new_pointer() there is this comment to
> "Run some performance tests to determine if this special case pays off".
> Does that mean that you *ran* these tests, or that you still have to run
> them, or is this comment has been intended to be some todo for you (that
> should be removed)?
It's a suggestion for future work. I think it (and the following
comment related to performance) are worth keeping.
> (There is a superfluous white space between the two sentences btw)
Sigh. Two spaces between sentences is common usage.
> - PSParallelCompact::fill_blocks() is only every called with the first
> paramter set to NULL (and only once). Could the first parameter be
> removed?
Sure, done.
> - the first comment in PSParallelCompact::fill_blocks() seems to be
> ambiguous:
>
> // Fill in the block table elements for a region. Each element holds
> // the number of live words in the region that are to the left of the
> ^^^^
> I think it is better to say "area"/"heap", i.e. something different than
> "region" here (if I understand the code correctly). "Region" is already
> defined in this context as the work unit for parallel gc. If the comment
> really meant region here, then why does the algorithm need multiple
> blocks per Region? (if they only ever contain the number of live words
> of regions to the left of this region?)
region is correct here; "area" or "heap" would be incorrect. Each block
table element holds the number of live words in the region (the region
that contains the block) that are to the left of the first obj that
starts in the block.
> // first object that starts in the block. Thus only blocks in which an
> // object starts need to be filled.
>
> I also think that the conclusion in the second sentence is not supported
> by the first sentence. I think the reason for only requiring to fill
> blocks where an object starts is that the code never asks for the number
> of live bytes of other blocks. I.e. the code in calc_new_pointer() only
> ever retrieves this value for blocks that have live objects in them.
See above.
> (the two sentences are also separated by two spaces too :)
>
> - maybe it is a good idea to add a comment in calc_new_pointer() that
> it is possible that multiple threads update the blocks of a region at
> the same time. ...
Sure, added a comment.
> ... Except that it is possible that multiple threads may
> update the block table at the same time it does not seem to harm (except
> for wasted cpu cycles; I guess in your tests you found that this is not
> an issue?).
Yes, exactly. Testing showed that only a few regions were updated
multiple times, and the counts were small for those regions. Of
course, I haven't tested that many applications.
> I was a bit thrown off by the use of the
> Atomic::inc_ptr(...->_blocks_filled_cnt) at first...
I had hoped the DEBUG_ONLY would provide a good clue. I've moved that
increment into set_blocks_filled() and added a comment.
> - ParallelCompactData::region_calc_new_pointer(HeapWord* addr) is
> within a complete ASSERT block, so the #ifdef ASSERT (or DEBUG_ONLY) may
> not be required.
>
> - I think the ParallelCompactData::calc_new_pointer(HeapWord* addr)
> method could be moved into the private section of the class.
It's called from MoveAndUpdateClosure, so can't be private.
> - Jon already mentioned this: region_calc_new_pointer() is unused. If it
> can be used for some verification, maybe rename it to
> calc_new_pointer_slow() or something like that to indicate that this is
> for verification purposes?
It's very (very) slow, so I think it's only useful in the debugger or
where it will be called infrequently. I don't see a simple way to call
it infrequently, so I've removed it. The code is in the history as the
old version of calc_new_pointer(), so can be resurrected.
> - would it be possible to rename Log2BitsPerBlock which indicates that
> it is a value that is not only relative to the block size, but also
> relative to the object granularity/alignment? The NOT_LP64(-1) in
> PSParallelCompact::fill_blocks(), line 3258 looks quite strange to me.
>
> Maybe something like Log2ObjStartsPerBlock?
>
> (Actually, did you test the code with different heap object alignments?
> I expect it to fail with e.g. -XX:MinObjAlignment=16, i.e. something
> different than the default ones)
Yes, the NOT_LP64(-1) is strange and can be wrong. I changed it to
Log2BlockSize - LogMinObjAlignment.
The name, however, seems reasonable (to me :-)). It's (the log of) the
number of bits in the marking bitmap required to cover a Block. Maybe
Log2MarkBitsPerBlock?
> This would probably make a good jtreg test, fill the heap, and issue a
> few full gcs with different settings that may affect the code.
>
> - this is just style: but I'd make fill_blocks() to set the blocks
> filled, i.e. call region_ptr->set_blocks_filled() within the method.
> I.e. the fill_blocks method does not set the block-is-filled flag
> itself. Such things tend to be forgotten, especially if it is not
> written down anywhere too.
> (Not sure anymore though, maybe some comment in the declaration of
> fill_blocks() in the hpp file?)
I prefer that fill_blocks() only touch the block table -- that method
currently does just one thing. The caller is already checking whether
the blocks need filling, so it's appropriate that the caller update
the filled status.
> - the opening brace in the implementation of fill_blocks() is on the
> next line, not in the same as the declaration; but this file contains a
> mixed style in that respect, so not sure if interesting.
>
> - ParallelCompactData::clear() seems unused.
I think you're right. Removing it is best done in a separate cleanup;
for now I'm going to keep the change so it's consistent.
> - in PSParallelCompact::invoke_no_policy, in the new code within the
> TraceParallelOldGCCompactionPhase block tty is used directly. Is it
> possible to change this to gclog_or_tty to properly redirect its output
> to the gclog if possible? Gclog_or_tty seems to be already initialized
> because it's already used a few lines above.
I moved the code to its own method and now avoid tty.
> - this block of code also only seems to make sense if the
> _blocks_filled_cnt variables of the regions are filled in. That code is
> DEBUG_ONLY, but this code is under ASSERT. Could these #if-s be matched?
> (I do know that ASSERT is always defined in debug mode, but still)
There's no ASSERT_ONLY, and #ifdef DEBUG should not be used (a legacy
from the distant past).
A new webrev with the updates is at the same URL:
http://cr.openjdk.java.net/~jcoomes/6725714-blocktable/
It includes a link to the original webrev as well as a link to just the
differences between the two.
-John
More information about the hotspot-gc-dev
mailing list