sample_eden()

Mon Aug 12 16:31:20 UTC 2013

On 8/9/13 1:48 PM, Hiroshi Yamauchi wrote:
> Jon,
>
> Sorry for a late response.
>
> Your explanation of what's happening sounds plausible to me.
>
> I took a look at
>
>    http://cr.openjdk.java.net/~jmasa/8021809
>
> which I assume is related to this issue. Here's a thought: perhaps
> since work_on_young_gen_roots() also reads an (incorrect)
> _eden_chunk_index, it should also have a similar check to the one in
> initialize_sequential_subtasks_for_young_gen_rescan()?

work_on_young_gen_roots() calls do_young_space_rescan() and
do_young_space_rescan() checks _n_tasks before doing any work.
I think that's why the fix works.  I could guard the do_youg_space_rescan()
to eden

if (eden_space->is_empty()) {
   do_young_space_rescan(worker_id, cl, eden_space, eca, ect);
}

Is that what you mean?

Did you think both are needed?  I did my fix so that the
SequentialSubTasksDone would not be intiailzied with
bad values for eden.  It seems hazardous to have the

SequentialSubTasksDone _par_seq_tasks;

in eden initialized with bad values, even
if we avoided using it later.

>
> Here's another thought: how about resetting _eden_chunk_index to zero
> at the end of Def/ParNewGeneration::collect()? This way, it'd be the
> young generation that initiates both updating (via
> sample_eden_chunk()) and resetting of _eden_chunk_index. This might
> make sense.

_eden_chunk_index belongs to CMSCollector and is cleared
in the call to  CMSCollector::epilogue() which is called
from ConcurrentMarkSweepGeneration::gc_epilogue().  CMSCollector
and ConcurrentMarkSweepGeneration are so tightly
coupled that that seems OK.   Clearing
_eden_chunk_index from Def/ParNewGeneration::collect() seemed
like it was letting too much of the implementation of
ConcurrentMarkSweepGeneration escape into Def/ParNewGeneration.

Jon

>
> Hiroshi
>
> On Mon, Jul 29, 2013 at 4:06 PM, Jon Masamitsu <jon.masamitsu at oracle.com> wrote:
>> On 7/29/13 2:29 PM, Srinivas Ramakrishna wrote:
>>> ah, makes sense! :-)
>>>
>>> So, if the scavenge failed, leaving stuff in Eden and the survivor
>>> spaces, the chunks should still be valid if a CMS collection could
>>> happen.
>> I expect that would be the case although I'd have to at the promotion
>> failure handling code again.
>>
>>> (I wonder after asking that question, though, how CMS would deal with
>>> such a situation -- two active survivor spaces, i think it deals OK
>>> with it, but not sure if both scans today parallelized or not, or if
>>> the question is moot because the failed scavenge causes a bail out to
>>> a stop-world gc... perhaps the latter?) Probably an academic question,
>>> but still... :-)
>>
>> Worth looking into.
>>
>> Jon
>>
>>
>>> - ramki
>>>
>>> On Mon, Jul 29, 2013 at 1:46 PM, Jon Masamitsu <jon.masamitsu at oracle.com>
>>> wrote:
>>>> Ramki,
>>>>
>>>> This has gotten interesting.  When
>>>>
>>>> 1) a System.gc() is called
>>>>
>>>> and
>>>>
>>>> 2) UseCMSCompactAtFullCollection is set to false
>>>>
>>>> the CMS generation tells GenCollectedHeap that CMS
>>>> does not collect the young gen.  That's right but I hadn't
>>>> appreciated that in that circumstance a young gen GC
>>>> was done before the CMS gen is collected.  Makes sense
>>>> but the GC epilogue code is called after all the generations
>>>> have been collected by GenCollectedHeap so the
>>>> _eden_chunk_index is not reset between the young GC
>>>> and the CMS GC.
>>>>
>>>> I think the right fix is to use the occupancy of eden to
>>>> decide what work we need to do instead of _eden_chunk_index.
>>>>
>>>> Jon
>>>>
>>>>
>>>> On 7/26/13 6:41 PM, Srinivas Ramakrishna wrote:
>>>>> Don't have the code in front of me to check, but yes that would seem to
>>>>> be
>>>>> the thing to do. I thought it was reset in the young gen gc epilogue ...
>>>>>
>>>>> ysr1729
>>>>>
>>>>> On Jul 26, 2013, at 14:46, Jon Masamitsu <jon.masamitsu at oracle.com>
>>>>> wrote:
>>>>>
>>>>>> Hiroshi,
>>>>>>
>>>>>> I'm looking at an assertion failure with CMSParallelInitialMarkEnabled
>>>>>> and CMSEdenChunksRecordAlways both enabled.  The assertion
>>>>>> failure is in do_young_space_rescan()
>>>>>>
>>>>>>     5506      assert(mr.is_empty() || space->used_region().contains(mr),
>>>>>>     5507             "Should be in space");
>>>>>>
>>>>>> and the failure occurs because _eden_chunk_index is > 0 and
>>>>>> eden is empty.
>>>>>>
>>>>>> A young GC has just occurred and a System.gc() is in progress where the
>>>>>> System.gc() is executing the the usual phases of CMS in a
>>>>>> stop-the-world
>>>>>> fashion.  A rarely seen scenario I think.  That is, the initial mark is
>>>>>> being
>>>>>> executed.
>>>>>>
>>>>>> I was looking at the places where _eden_chunk_index is reinitialized to
>>>>>> 0.  I don't think you added any in you changes, right?
>>>>>>
>>>>>> I was thinking that _eden_chunk_index should be reset to 0 after
>>>>>> a young GC where we know that eden is empty.  What do you think?
>>>>>>
>>>>>> Jon
>>>>>>
>>>>>>
>>>>>>