RFR: 8139424: SIGSEGV, Problematic frame: # V [libjvm.so+0xd0c0cc] void InstanceKlass::oop_oop_iterate_oop_maps_specialized<true, oopDesc*, MarkAndPushClosure>
stefan.johansson at oracle.com
Thu Nov 12 07:46:20 UTC 2015
On 2015-11-11 22:07, Tom Benson wrote:
> Hi Stefan,
> The fix looks good to me, except for a typo in "some entries that
> needs" in g1CollectedHeap.cpp.
Thanks, will fix.
> About the test intermittently not crashing, is it worth looping to
> re-try some small number of times?
Could be, but late yesterday evening I got back test results in which
the JTREG test had been run on most supported platforms and crashed on
every single one of them. So the test seems to be more deterministic
than I believed.
Adding an outer loop might not help that much either since the
remembered set is already expanded for the humongous region. We could of
course work around this by adding another humongous object, but I fear
that it will only make the test harder to understand.
If you are ok with it, I prefer not to add an outer loop.
> On 11/11/2015 10:41 AM, Stefan Johansson wrote:
>> Please review this fix for:
>> The crash was caused by a faulty eager humongous reclaim. The reason
>> for reclaiming a live humongous object was an overlooked remembered
>> set entry when the object was treated as a candidate for humongous
>> reclamation. If the remembered set was expanded during the previous
>> GC, the code handling reclaim candidates would look at the old view
>> of the remembered set and due to that miss some entries. This was
>> caused by checking for eager reclaim candidates before calling
>> cleanupHRRS, which takes care of updating the remembered sets to be
>> ready for iteration.
>> The fix was to simply move the call to rem_set()->cleanupHRRS() to
>> before register_humongous_regions_with_cset(), which is where we check
>> for reclaim candidates.
>> I also added a test that provokes this and asserts when run with a
>> fastdebug build. The test is not 100% deterministic and needs to be
>> run with some special parameters set in the JTREG header. The test
>> will never fail intermittently, but could possibly pass even though
>> there are problems in the code. I still think it is worth adding, to
>> avoid making the same mistake again in the future.
>> The failure was reproducible within 1-2 hours on the sparc-host where it
>> occurred, and with this fix a 24 hour run was fine. The JTREG test
>> also fails without the fix but passes when it is applied.
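
The ordering problem described in the quoted message can be illustrated
with a small toy model. This is only a sketch, not HotSpot code:
ToyRememberedSet, evacuation_pause, run and the cleanup_first flag are
hypothetical names, and the real remembered set is far more involved. The
model assumes only what the message states, namely that entries recorded
during the previous GC become visible to iteration once the cleanupHRRS
step has run, and that the reclaim candidate check looks at that
iteration view.

```python
class ToyRememberedSet:
    """Hypothetical stand-in for a humongous region's remembered set."""

    def __init__(self):
        self._entries = set()           # everything recorded so far
        self._iter_view = frozenset()   # the view that readers iterate over

    def add(self, ref):
        # Assumption: an entry recorded after an expansion is stored, but
        # readers do not see it until cleanup() refreshes the view.
        self._entries.add(ref)

    def cleanup(self):
        # Plays the role of cleanupHRRS: make the set ready for iteration.
        self._iter_view = frozenset(self._entries)

    def visible_entries(self):
        return self._iter_view


def evacuation_pause(rem_set, cleanup_first):
    """Return True if the humongous object is chosen for eager reclaim."""
    if cleanup_first:
        rem_set.cleanup()               # fixed order: refresh the view first
    # Plays the role of the candidate check made in
    # register_humongous_regions_with_cset: no visible references means
    # the object looks dead.
    is_candidate = len(rem_set.visible_entries()) == 0
    if not cleanup_first:
        rem_set.cleanup()               # old order: the refresh comes too late
    return is_candidate


def run(cleanup_first):
    rem_set = ToyRememberedSet()
    # Entry recorded during the previous GC, so the object is still live.
    rem_set.add("reference from a live object")
    return evacuation_pause(rem_set, cleanup_first)


print("old order reclaims live object:", run(cleanup_first=False))
print("fixed order reclaims live object:", run(cleanup_first=True))
```

With the old order the candidate check sees the stale, empty view and
selects the live object for reclaim. With the cleanup moved first, the
entry is visible and the object is kept.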