RFR: 8069367: assert(_nextMarkBitMap->isMarked((HeapWord*) obj)) failed

Mon Mar 30 13:44:37 UTC 2015

On 2015-03-30 14:42, Kim Barrett wrote:
> On Mar 30, 2015, at 5:15 AM, Bengt Rutisson <bengt.rutisson at oracle.com> wrote:
>> On 2015-03-30 03:28, Kim Barrett wrote:
>>> The cost of the pre-filtering of the mark stack is small, and some of
>>> it may be recovered by avoiding the cost of moving the filtered
>>> objects through the mark stack, and by doing less work on the filtered
>>> objects than would otherwise be needed. Aurora GC performance tests
>>> show mostly insignificant differences, with a few small improvements
>>> compared to the baseline that has bug 8069367.
>> I like this approach. I think it makes sense even without the
>> humongous reclamation issues to handle type arrays specially.
> I've been waffling over this. I would guess it would be a rare
> application that had a sufficiently low typeArray / object ratio that
> this would negatively impact performance (by a small amount?), and
> common for it to be a net benefit.  But I don't have any data to back
> up that guess.  And the effect is dynamic, since it doesn't apply in
> the non-push case - I don't think we should add such a check in
> scan_object.

Right. Agreed.

>
>>> [While I was at it, I simplified the conditionalization of
>>> deal_with_references for _CHECK_BOTH_FINGERS_, eliminating some code
>>> duplication.]
>> Personally I would prefer to do a small cleanup fix before your change
>> to simply remove _CHECK_BOTH_FINGERS_. It has been in the code since
>> 2011 (JDK-7046558: G1: concurrent marking optimizations) and as I
>> understand it it was mostly added because there was not time to do
>> enough performance testing at the time. I doubt that anyone will pick
>> up the performance work here, so I'd rather remove the code.
> I would be fine with killing off _CHECK_BOTH_FINGERS_.  And looking at
> that code again, it seems like it would be worth trying to eliminate
> one or both of the _finger or _curr_region NULL checks.
>
> I'll take a look at that.

Great! Thanks!

>
>> A couple of questions about this code in concurrentMark.inline.hpp:
>>
>> 300           if (_CHECK_BOTH_FINGERS_ && _finger != NULL && objAddr < _finger) {
>> 301             if (obj->is_typeArray()) {
>> 302               // Immediately process arrays of binary data, rather
>> 303               // than pushing on the mark stack.  This keeps us from
>> 304               // adding humongous objects to the mark stack that might
>> 305               // be reclaimed before the entry is processed - see
>> 306               // G1EagerReclaimHumongousPreSnapshotTypeArrays. The
>> 307               // cost of the additional type test is mitigated by
>> 308               // avoiding a trip through the mark stack, and by only
>> 309               // doing bookkeeping update and avoiding the actual scan
>> 310               // of the object - a typeArray contains no references,
>> 311               // and the metadata is built-in.
>> 312               process_grey_object<false>(obj);
>> 313             } else {
>> 314               if (_cm->verbose_high()) {
>> 315                 gclog_or_tty->print_cr(
>> 316                   "[%u] below the local finger (" PTR_FORMAT "), pushing " PTR_FORMAT,
>> 317                   _worker_id, p2i(_finger), p2i(objAddr));
>> 318               }
>> 319               push(obj);
>> 320             }
>>
>> It looks to me like the logging (lines 314-318) is relevant for type
>> arrays too. Can we move it up before line 301? Possibly changing the
>> wording from "pushing" to something more generic - maybe simply "obj".
> I limited the logging to the "push" branch intentionally.
> process_grey_objects similarly logs, and to me it didn't seem useful
> to have both in the non-push case and to lose the "pushing"
> distinction in that case. I would be more inclined to add an explicit
> immediate processing vs pushing distinction in the log output than to
> merge away that distinction.  (That distinction is presently implicit
> from deal_with_reference logging each object that needs marking,
> followed by the process_grey_objects message, the pushing message, or
> silence because par_mark lost the race with some other task.)

Ok. Thanks for explaining this. I'm fine with the logging as you 
suggested it.

>
>> I think the comment (lines 302-311) is good but it takes up a lot of
>> space inside the deal_with_reference() method. How about breaking
>> lines 301-320 out into its own method? (Keeping the logging that I
>> mentioned in the previous paragraph inside deal_with_reference().)
> I thought about that, but put it aside when I failed to come up with a
> decent name for that helper function, and didn't got back to it.  I'll
> think about it some more.

Good. Thanks!

>
>> I'm not sure I understand the changes in the g1CollectedHeap.*
>> files. In particular the changes to
>> RegisterHumongousWithInCSetFastTestClosure:
>>
>> http://cr.openjdk.java.net/~kbarrett/8069367/webrev.03/src/share/vm/gc_implementation/g1/g1CollectedHeap.cpp.frames.html
>>
>> It looks like the G1EagerReclaimHumongousPreSnapshotTypeArrays flag
>> controls if you want to use the previous version (don't reclaim
>> objects that may be part of the SATB snapshot) and the new version
>> (don't push type arrays on the mark stack). Is this debugging code
>> that was left or is your intention that we should have both versions
>> in the code?
> We've encountered a number of bugs related to the eager reclaim of
> humongous objects feature, some of which might end up requiring the
> previous behavior.  (When I say "might", I mean I have little
> confidence either way right now.) And there might be more we haven't
> run into yet. This new flag is intended as a partial mitigation
> strategy for those, being weaker than simply turning off one or both
> of the other related flags.
>
> It also provides a searchable name to for the coupling between the
> this code and the is_typeArray check in deal_with_reference.  But
> maybe the comment in the latter is sufficient now?

I'm not too happy about adding code just in case we need it. I'm afraid 
that the code for the -XX:-G1EagerReclaimHumongousPreSnapshotTypeArrays 
case will be tested very rarely and when we suddenly need it it will 
contain some subtle bugs and just cause more problems for us.

Bengt

>