request for review (M): 6883834: ParNew: assert(!_g->to()->is_in_reserved(obj), "Scanning field twice?") with LargeObjects tests

Sat May 7 09:12:43 UTC 2011

Hi again ...

Testing revealed another long-standing and nicely hidden bug.
(The bug revealed itself when I was testing with smallish
values of ParGCCardsPerStrideChunk, which by the way
also reproduces the original crash very easily without
the fix that is here under review -- after all it's the relative
sizes of the non-array objects and the chunks that is the
cause of the original problem... while on-array objects
exceeding 128K in size may be rare, those exceeding 512 bytes
are clearly not!)
The flag ParallelGCRetainPLAB controls whether
so-called "promotion labs" in the old gen are retained
and reused across scavenges. This is useful when the PLAB
resizing isn't optimal or if there is a lot of volatility
in the amount that a GC thread allocates because of which its
promotion rate may fluctuate rapidly from one scavenge to
another, wasting space to internal fragmentation within
the promotion LAB allocated to it. However, a little thought
reveals that if PLAB's are retained across scavenges, then
GC worker threads may be promoting into cards that may be
subject to scanning by worker threads and this can cause,
e.g., BOT walks to go awry. The expedient solution
is to switch off the PLAB retention feature so that there
is no promotion activity happening into the "used_region_at_save_marks()"
portion of the old gen when using ParNewGC. (CMS is not
affected because the old gen, being a non-contig freelist space,
is already prepared, at some cost, for such promotion activity into
cards subject to scanning at the same time.)

I also declared certain local variables const in process_chunk_boundaries()
and made a few other minor changes related to assertions checking,
documentation comments, and so forth.

However, the only real semantic change since the last webrev
is simply in the flipping of the default value of ParallelGCRetainPLAB
and making it a diagnostic.

Please point your browser to:

   http://cr.openjdk.java.net/~ysr/6883834/webrev.3

I'll do some performance measurements to determine the
extent of performance loss, if any, from disabling this
feature and thus potentially losing space in the old gen
to internal fragmentation in these promotion LABs.
(The frequency of old gen collections would likely increase
by the extent to which space is being wasted in these
PLABs.)

thanks.
-- ramki

On 5/6/2011 2:59 PM, Y. Srinivas Ramakrishna wrote:
> In case you haven't started looking yet, an updated version is in:-
>
> http://cr.openjdk.java.net/~ysr/6883834/webrev.1
>
> with changes based on the review(s) so far.
> If you've already started reviewing, it's fine to
> stick to the old one if you want to.
>
> thanks!
> -- ramki
>
> On 5/5/2011 1:23 AM, Y. Srinivas Ramakrishna wrote:
>>
>> 6883834: ParNew: assert(!_g->to()->is_in_reserved(obj),"Scanning field twice?") with LargeObjects
>> tests
>>
>> Investigation of this crash led to several avenues
>> towards fixing the problem, including a
>> more scalable and performant rewrite of the parallelization
>> of card scanning which would have fixed the bug as well,
>> and which was indeed the direction in which I had been
>> initially headed. However, I realized last week that,
>> given the upcoming hs21 code freeze, and the subtleties
>> in the current code, that might not be the most sensible
>> thing to do at this time. At that point, I decided to fix
>> only the correctness problem(s) indicated by the assertion
>> that we hit and not address in any manner the relative
>> lack of scalability in the current implementation in
>> the presence of large non-array objects. Somewhat
>> offsetting the lack of that scalability is, however,
>> the fact that the situation in which scalability suffers,
>> namely very large sparsely dirty non-array objects are the exception
>> in most programs (not so, very large sparsely dirty
>> array objects which are by far the more common).
>>
>> The problem in the current code turned out to be
>> two-fold:
>> (1) it may scan too little and miss some pointers:
>> when a large non-array object straddles entirely over a
>> complete "chunk" (the parallel unit of work) of the card table,
>> its head is dirty (so we are forced to assume an imprecise
>> card mark and scan it in its entirety), then
>> we may miss any part of it that extends past the end
>> of its second chunk. Currently chunks are fixed at
>> 256 cards == 128K, so any object larger than that
>> size is potentially vulnerable.
>> (2) it may scan too much and thus scan some pointers twice:
>> when a large object straddles onto a subsequent chunk
>> its head is dirty, and the next card is not dirty,
>> the intention of the code was that
>> the thread scanning the chunk in which the head of the
>> object lay would scan the object in its entirety
>> and the thread scanning the next chunk would not.
>> However, there was a flaw in the way card-dirtiness
>> was being tested for an object straddling into a
>> chunk from the left and how it was being tested for
>> an object straddling past the end (or right).
>> This could, under the right conditions, result
>> in a suffix of the object in the
>> second of the two chunks above being scanned twice.
>>
>> While it was the second scenario that caused the
>> assertion violation, the first could lead to more
>> insidious problems with stale pointers left in old
>> generation or permanent generation objects.
>>
>> The fix for (1) was to not terminate the search for the
>> last card to scan to the right not at the end of the
>> next chunk, but go right to the end of the object in
>> question.
>>
>> The fix for (2) was to make sure the correct predicate
>> was being tested for the card-values in order to ensure
>> that the thread for the left chunk did not scan beyond
>> the end of a specific point, as exemplified in the
>> "LNC" value.
>>
>> Along the way, I added a few assertion checks, some verbose logging
>> (disabled in all builds but which can be enabled via a source
>> level change: I'd like to leave this in for future debuggability
>> and code-comprehensibility), elaborated and corrected the documentation
>> comments and changed the order of filling the "LNC array"
>> (which is useful to and consulted by leftward neighbouring workers)
>> and determining the rightward extent of the chunk scan
>> past the end of the chunk. I also tightened the set of
>> conditions under which we extend our scan to the right,
>> which can save some unnecessary card value examination
>> in some cases.
>>
>> http://cr.openjdk.java.net/~ysr/6883834/webrev.0/
>>
>> Note that this is a nearly minimal fix to the correctness
>> problem in parallel card-scanning with ParNew. A more
>> extensive rewrite of the code with an eye to simplifying
>> some of the convolutedness of its current form and of
>> improving its scalability when there are very large
>> objects in the heap (or, as in this case, large constant
>> pools in the perm gen), will be the subject of an RFE
>> post-hs21.
>>
>> PS: I also pushed some of the closure construction
>> closer to its use. This will also make it easier to
>> do implement the performance RFE to follow in hs22.
>> In the same vein, I changed some defined constants
>> related to chunk size and parallel chunk strides
>> into global diagnostic flags. I did leave the current
>> values unchanged, although it is quite likely that these
>> values could be tuned some, an activity that will be
>> part of the aforementioned performance RFE.
>>
>> Perfromance:
>> A quick performance run indicated no change in performance,
>> as one might expect given the rarity of very large
>> non-array objects in most applications. More thorough performance
>> testing (with reference server refworkload) is in progress.
>>
>> Correctness Testing:
>> The jck60001 test that tripped the assert; refworkload; JPRT;
>> all three with and without heap verification on a variety
>> of different platforms and with the chunk size varied for
>> good measure.
>>
>> thanks for your reviews!
>> -- ramki
>>
>>
>