RFR(M/L): 7176479: G1: JVM crashes on T5-8 system with 1.5 TB heap
Thomas Schatzl
thomas.schatzl at oracle.com
Wed Apr 10 08:37:41 UTC 2013
Hi,
On Tue, 2013-04-09 at 14:51 -0700, John Cuthbertson wrote:
> Hi Thomas,
>
> Thanks for looking over the changes. Replies inline....
>
> On 4/9/2013 9:07 AM, Thomas Schatzl wrote:
> > Hi,
> >
> > On Mon, 2013-04-08 at 11:14 -0700, John Cuthbertson wrote:
> >> Hi Everyone,
> >>
> >> I've synced this webrev up with the latest changes in hotspot-gc and
> >> made one additional refactoring change that was suggested offline. The
> >> new webrev is at:
> >>
> >> http://cr.openjdk.java.net/~johnc/7176479/webrev.2/
> >>
> >> I'd like to use this webrev as the basis of some other refactoring work
> >> I'd like to do in RSet updating and this will make it easier.
> > Could you make the two new globals unsigned (e.g. "uintx")? I do not see
> > a reason to even suggest to the user that they may have negative values. I
> > saw the bounds check in the initialization code.
>
> Sure. No problem.
>
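Thanks. For the record, what I had in mind is something along these
lines, following the pattern of the existing entries in g1_globals.hpp
(a sketch only; the type changes to uintx, defaults and descriptions
stay as they are):

  product(uintx, G1ConcRSLogCacheSize, 10,
          "Log base 2 of the length of conc RS hot-card cache.")

  product(uintx, G1ConcRSHotCardLimit, 4,
          "The threshold that defines (>=) a hot card.")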
> > Minor: G1CardCounts::add_card_count() returns an int, while it cannot be
> > < 0. The count table elements are of type jbyte too (but then again it's
> > all private).
>
> Again, no problem. I've changed the element type to jubyte, and the
> return type of add_card_count() and the parameter of is_hot() to uint.
Thanks.
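For reference, the relevant declarations would then read something like
this (a sketch only; member names as in the webrev):

  jubyte* _card_counts;  // per-card refinement counts, now unsigned

  // Increments and returns the refinement count of the given card.
  uint add_card_count(jbyte* card_ptr);

  // A card is hot once it has been refined at least
  // G1ConcRSHotCardLimit times.
  bool is_hot(uint count) {
    return count >= (uint) G1ConcRSHotCardLimit;
  }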
> >>> The hot card cache can now exist independently of whether the counts
> >>> table exists. In this case refining a card once adds it to the hot
> >>> card cache, i.e. all cards are treated as 'hot'.
> > G1CardCounts::add_card_count() seems to return zero in this case, i.e.
> > not hot?
>
> Fair point. I was thinking of the case where the user specifies
> G1ConcRSHotCardLimit to be zero, so that all refined cards will be
> hot. In the case where we fail to allocate or expand the counts table,
> the card will be considered cold. I think that's OK.
It's okay; it's just that your description in the email to the mailing
list suggested otherwise to me.
The comment in the code reads "// If we failed to allocate the counts
table, return 0.". Maybe the comment could state explicitly that this
means the cards will always be cold in that case.
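Something like:

  // If we failed to allocate the counts table, return 0,
  // i.e. such cards are never considered hot.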
What does " 145 // If card_ptr is beyond the committed end of the
counts table, return 0." actually mean if card_ptr is beyond the end of
the count table? There was not enough memory?
In resize() the assert at line 132 indicates that this is actually a
failure, which seems somewhat inconsistent in the way this situation is
handled in debug/production mode.
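To illustrate the inconsistency I mean (a rough sketch only, not the
exact webrev code):

  // In add_card_count(), a product build silently treats a card beyond
  // the committed end of the counts table as cold...
  if (card_num >= _committed_max_card_num) {
    return 0;
  }

  // ...while in a debug build the corresponding assert in resize()
  // treats the same situation as a programming error.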
> > In G1CardCounts::resize() the code checks whether _card_counts == 0
> > (note that _card_counts is a pointer, so maybe the code should compare
> > to NULL instead), and in this method the code uses
> > _committed_max_card_num. Maybe extract this into a predicate
> > has_count_table() and use it uniformly?
>
> Not sure what you mean in your first sentence. I'm checking against NULL:
My oversight. It's fine. Sorry.
> There's a slight issue with your suggestion. Most of the time we can use
> _committed_max_card_num > 0 to determine if we have a count table (i.e.
> we have actually committed the memory for the counts table). The exception
> is the resize() routine and the destructor. The resize routine is called
> to initially commit the table (after committing the initial heap) and
> then to expand it (after the heap is expanded). In the first case
> _committed_max_card_num will be 0 and we can (and should) only check if
> the table has been reserved. The destructor doesn't need to care about
> whether the table was committed - only that it was reserved.
>
> So uniformity won't be possible. The best I think we can do is have a
> couple of predicate routines:
>
> // Returns true if the card counts table has been reserved.
> bool has_reserved_count_table() { return _card_counts != NULL; }
> // Returns true if the card counts table has been reserved and committed.
> bool has_count_table() { return has_reserved_count_table() &&
> _committed_max_card_num > 0; }
>
> and use has_reserved_count_table() in the resize() routine and
> destructor. In the others we use has_count_table(). Make sense?
Yes.
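With those two predicates, the refinement path would presumably look
roughly like this (a sketch under the assumptions above; the field and
helper names, e.g. ptr_2_card_num(), are taken from the webrev):

  uint G1CardCounts::add_card_count(jbyte* card_ptr) {
    uint count = 0;
    if (has_count_table()) {
      size_t card_num = ptr_2_card_num(card_ptr);
      if (card_num < _committed_max_card_num) {
        count = (uint) _card_counts[card_num];
        if (count < G1ConcRSHotCardLimit) {
          // Stop incrementing once the hot limit is reached so the
          // jubyte counter cannot overflow.
          _card_counts[card_num] += 1;
        }
      }
    }
    return count;
  }

while resize() and the destructor would guard with
has_reserved_count_table() only.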
> > Any idea about performance differences or efficiency of the hot card
> > cache? (Maybe you mentioned that already).
>
> Unfortunately I don't have much. The systems team only did functional
> testing with heaps larger than 1 TB. They didn't do before-and-after
> tests with heaps of less than 1 TB.
>
> I ran OpenDS just to verify that no threads were late to the party. I
> didn't see any gain or loss but there's such a large variability with
> OpenDS that it was impossible to tell with just a couple of runs. When
> the performance team ran SPECjbb2013 with a small (16 GB) heap, the
> change seemed to be performance neutral for both high bound IR and
> critical IR. This would only make a difference for workloads that mutate
> promoted data frequently and I don't know how much of that goes on in
> JBB2013.
>
> Thanks for the comments. Let me know if the proposal about the
> predicates is sufficient.
They're fine. Thanks.
One further nitpick: the general documentation for the new classes is
placed within the class body, which makes it look as if that
documentation belongs to the first member of the class.
E.g.
class XX {
  // documentation for XX
};

instead of the seemingly more common

// documentation for XX
class XX {
};
Not sure what the preference is here.
Thomas