RFR(s): 8055239: assert(_thread == Thread::current()->osthread()) failed: The PromotionFailedInfo should be thread local.

Mon Nov 24 18:06:06 UTC 2014

On Nov 24, 2014, at 4:10 AM, Bengt Rutisson <bengt.rutisson at oracle.com> wrote:
> The promotion failed events are not sent at the time or by the thread
> where the promotion failure happens. Instead the information about the
> promotion failure is collected into a PromotionFailedInfo object for
> each thread (stored in ParScanThreadState). The main GC thread then
> iterates over all thread states and sends the actual events at the end
> of a GC. See ParScanThreadStateSet::trace_promotion_failed().

Yes. My working hypothesis is that

(1) Reporting is being deferred because it is deemed too expensive to
report the individual occurrences as they happen.  [Promotion failures
are presumably supposed to be relatively rare, but once they happen at
all in a given collection, it seems like there's a pretty good chance
of (possibly many) more.  That would be especially true with the fix
being discussed for JDK-8061259.]

(2) The PromotionFailedInfo objects are per-thread to avoid update
races without requiring atomic counters or locks.
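To make hypothesis (2) and the deferred reporting concrete, here is a minimal C++ sketch of the pattern as I understand it. It assumes each worker owns one summary slot and the main thread reports only after all workers have joined. PromotionFailureSummary, run_workers and report_failures are hypothetical names for illustration, not HotSpot's actual classes or functions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical per-worker summary, loosely modeled on the idea behind
// PromotionFailedInfo; names and fields are illustrative only.
struct PromotionFailureSummary {
  std::size_t count = 0;        // number of failed promotions
  std::size_t total_words = 0;  // total size of the failed objects
  std::size_t smallest = 0;     // smallest failed object (0 = none yet)

  void register_failure(std::size_t words) {
    assert(words > 0);  // 0 is used as the "none yet" sentinel above
    if (count == 0 || words < smallest) smallest = words;
    ++count;
    total_words += words;
  }
  bool has_failed() const { return count != 0; }
};

// Each worker writes only to its own slot, so no atomics or locks are
// needed while the workers run.
std::vector<PromotionFailureSummary> run_workers(
    const std::vector<std::vector<std::size_t>>& failures_per_worker) {
  std::vector<PromotionFailureSummary> summaries(failures_per_worker.size());
  std::vector<std::thread> workers;
  for (std::size_t i = 0; i < failures_per_worker.size(); ++i) {
    workers.emplace_back([&summaries, &failures_per_worker, i] {
      for (std::size_t words : failures_per_worker[i]) {
        summaries[i].register_failure(words);
      }
    });
  }
  for (std::thread& t : workers) t.join();  // end of the parallel phase
  return summaries;
}

// After the join, the main thread iterates the summaries and "reports"
// one event per worker that saw at least one failure.
std::vector<std::string> report_failures(
    const std::vector<PromotionFailureSummary>& summaries) {
  std::vector<std::string> events;
  for (std::size_t i = 0; i < summaries.size(); ++i) {
    if (summaries[i].has_failed()) {
      events.push_back("worker " + std::to_string(i) +
                       ": count=" + std::to_string(summaries[i].count) +
                       ", words=" + std::to_string(summaries[i].total_words));
    }
  }
  return events;
}
```

Note that the thread doing the reporting is never the thread that saw the failures, which is why any per-event thread attribution has to be carried along in the summary itself.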

> This means that the thread that sends the events is the main GC
> thread, which is not the same as the thread that experienced the
> promotion failure (which is one or several of the GC worker threads).
> 
> So, removing the _thread instance variable cannot be done without
> affecting how JFR works. I think it may be useful information to have
> the thread information available, but one alternative would be to just
> not report a thread with the promotion failure.

Recall that it's not "the" promotion failure that is being reported.
It is a per-thread summary of promotion failures. There may be many
promotion failures being summarized.

I'm having a very hard time seeing how anyone would make use of
associating such summarized promotion (or evacuation) failure
information with an OS-level thread. The relevant context is long gone
by the time the event is sent, and the information actually captured
is pretty limited. I'd like to see a real use-case.

All that said, a more complete and cleaned-up fix for the
ParScanThreadState[Set] reuse problem should also suffice: either
don't reuse the thread states, or capture and report their data and
reinitialize them before reuse.
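As a rough illustration of the second option (capture / report, then reinitialize before reuse), here is a minimal C++ sketch. It assumes a reusable per-worker state that remembers which thread recorded its failures, similar in spirit to the ownership check in the subject line. ReusableScanState, Report and report_and_reset are hypothetical names, not existing HotSpot code.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <thread>

// Hypothetical stand-in for a reusable per-worker scan state. It remembers
// which OS thread recorded its failures, mirroring the kind of ownership
// check performed by the assert in the subject line.
class ReusableScanState {
 public:
  struct Report {
    std::size_t count;
    std::size_t total_words;
  };

  void register_failure(std::size_t words) {
    if (!owner_) {
      owner_ = std::this_thread::get_id();  // first failure claims the state
    }
    // Fires if the state is reused by a different thread without a reset.
    assert(*owner_ == std::this_thread::get_id() &&
           "state should be thread local");
    ++count_;
    total_words_ += words;
  }

  // Capture the data for reporting, then reinitialize everything, including
  // the owning thread, so the next collection may hand this state to any
  // worker thread.
  Report report_and_reset() {
    Report r{count_, total_words_};
    count_ = 0;
    total_words_ = 0;
    owner_.reset();
    return r;
  }

  std::size_t count() const { return count_; }

 private:
  std::optional<std::thread::id> owner_;
  std::size_t count_ = 0;
  std::size_t total_words_ = 0;
};
```

Without the reset of owner_, a second collection that hands the same state to a different worker would trip the assert, which is the shape of the failure reported here.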