RFR(s): 8055239: assert(_thread == Thread::current()->osthread()) failed: The PromotionFailedInfo should be thread local.

Mon Nov 24 07:32:35 UTC 2014

Hi,

I think the thread info can be useful in some cases, for instance to see if the 
same thread gets multiple promotion failures or if it is separate threads. If it 
is decided to remove this field please also consider how to treat it in G1 where 
it was supposed to be added. https://bugs.openjdk.java.net/browse/JDK-8012217

Also, consult Bengt before removing the field. I remember that he had opinions 
around the _threads field when I implemented this.
/Jesper

Sangheon Kim skrev 24/11/14 06:47:
> Hi Kim,
>
> Thank you for reviewing this!
> And this detailed answer.
>
> On 11/22/2014 12:45 PM, Kim Barrett wrote:
>> On Nov 21, 2014, at 12:35 AM, Sangheon Kim <sangheon.kim at oracle.com> wrote:
>>> Please review this change to reset PromotionFailedInfo before use:
>>> https://bugs.openjdk.java.net/browse/JDK-8055239
>>>
>>> Webrev:
>>> http://cr.openjdk.java.net/~sangheki/8055239/webrev.00/
>>>
>>> Summary:
>>> This issue happens only with ParallelRefProcEnabled.
>>> And this assert happens as the current thread (when second promotion failure 
>>> happened) is different from saved thread (when first promotion failure 
>>> happened).
>>>  From ParNewGeneration::collect(), there are two mt processing, for live 
>>> object processing (by ParNewGenTask) and reference processing (by 
>>> ParNewRefEnqueueTaskProxy). When first promotion failure happened from 
>>> ParNewGenTask, thread pointer will be stored by 
>>> PromotionFailedInfo::register_copy_failure(). And at second failure, assert 
>>> happens as thread pointer is not guaranteed to be same.
>>> As the thread counts increase, its frequency become higher.
>>>
>>> Fix:
>>> Added ParScanThreadStateSet::trace_promotion_failed() to trace and reset 
>>> promotion failure information instance.
>>>
>>> Test:
>>> - jprt
>>>
>>> Thanks,
>>> Sangheon
>> This proposed change doesn't address what I think is the root cause of
>> the problem, which appears to me to be the semantically questionable
>> use of a single ParScanThreadStateSet for what are conceptually two
>> distinct sets of threads - the original generation scanning threads
>> and the reference processing threads.  (The same underlying OS threads
>> are being used for both, but they are being run with completely
>> different "tasks", and the ThreadState info is really per-task)
> I think my proposal is same way as your second opinion.
>>
>> Besides the unexpected reassociation of PromotionFailedInfo objects
>> (from owning ParScanThreadState objects) with new OS threads, it looks
>> like there are probably other things that are "messed up" by this
>> reuse, e.g. timing information and TASKQUEUE_STATS.
> Thanks for catching up about TASKQUEUE_STATS.
> Reset is really needed.
>>
>> Creating a second state set for use by reference processing would
>> require changing the signature of handle_promotion_failed to deal with
>> two sets, but that function is private and only called in one place,
>> so that shouldn't be hard.
> I didn't think about creating second ThreadState as original implementation is 
> already considered about reusing it.
> ParScanThreadStateSet::reset() is called before second use.
>>
>> The real cost would be in the runtime construction of a second state
>> set.  I'm not sure whether that would be important or not.  It is also
>> only needed when the ParallelRefProcEnabled option is true, since
>> that's the case where the re-use occurs, which adds a bit of
>> complexity.
> Agree if second ThreadState is needed.
>>
>> One option might be to capture the information to be reported in a new
>> object and reinitialize the ThreadState objects, capture additional
>> information after the reuse (if needed), and pass both sets of
>> captured information to handle_promotion_failed.
> Yes, this a little bit same as mine.
> The difference is that I wanted to report promotion failure before reuse not 
> together.
> And I think ThreadState is already reset just above line of my change. 
> (ParScanThreadStateSet::reset())
> So I just added to reset/trace PromotionFailedInfo.
>
> Can you suggest some 'additional information'? It would be nice if add more 
> information.
>
>>
>> An entirely different approach to addressing just the proximate
>> failure would be to eliminate the _thread member from
>> PromotionFailureInfo and deal with the consequences of that change.  I
>> can't see any utility in reporting the OS thread associated with an
>> internal worker task when reporting promotion failure statistics.  The
>> relevant context is long gone, after all.  And really, the ThreadState
>> objects exist to avoid race conditions amongst the multiple
>> threads. [I think what *should* be reported is a combined summary
>> rather than per-thread promotion failure statistics.]
>>
>> Removing the _thread member has the added benefit of being a code
>> removal and simplficiation change.  The downside is that it affects a
>> messaging API; I'm not sure what would be involved in dealing with
>> that.  The needed changes to the (open) jdk look straightforward:
>>
>> remove thread
>> - src/share/vm/gc_implementation/shared/copyFailedInfo.hpp
>>    Remove _thread member and associated code from PromotionFailedInfo.
>>
>> - hotspot/src/share/vm/gc_implementation/share/gcTraceSend.cpp
>>    Remove thread field from constructed message.
>>
>> - hotspot/src/share/vm/trace/trace.xml
>>    Remove thread field from PromotionFailed event description.
>>    This is the API change that needs to be appropriately handled.
> Oh. I didn't think about this approach since I do believe there are some 
> reason of saving thread id and deliver it by 
> YoungGCTracer::report_promotion_failed.
> This issue is occurred because of thread id but it doesn't mean that this 
> thread id information is not needed.
>
> Thanks,
> Sangheon
>