RFR(s): 8055239: assert(_thread == Thread::current()->osthread()) failed: The PromotionFailedInfo should be thread local.
Kim Barrett
kim.barrett at oracle.com
Mon Nov 24 18:06:06 UTC 2014
On Nov 24, 2014, at 4:10 AM, Bengt Rutisson <bengt.rutisson at oracle.com> wrote:
> The promotion failed events are not sent at the time or by the thread
> where the promotion failure happens. Instead the information about the
> promotion failure is collected into a PromotionFailedInfo object for
> each thread (stored in ParScanThreadState). The main GC thread then
> iterates over all thread states and sends the actual events at the end
> of a GC. See ParScanThreadStateSet::trace_promotion_failed().
Yes. My working hypothesis is twofold:
(1) Reporting is being deferred because it is deemed too expensive to
report the individual occurrences as they happen. [Promotion failures
are presumably supposed to be relatively rare, but once they happen at
all in a given collection, it seems like there's a pretty good chance
of (possibly many) more. That would be especially true with the fix
being discussed for JDK-8061259.]
(2) The PromotionFailedInfo objects are per-thread to avoid update
races without requiring atomic counters or locks.
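
To illustrate point (2), here is a minimal standalone sketch of the per-thread pattern. It is not HotSpot code: FailureSummary, record() and collect() are hypothetical names standing in for PromotionFailedInfo and the worker/coordinator split, and std::thread stands in for the GC worker gang. Each worker writes only to its own slot, and the coordinating thread reads the slots only after joining the workers, so plain non-atomic counters are sufficient.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a per-thread failure summary
// (an assumption for illustration, not the HotSpot class).
struct FailureSummary {
    std::size_t count = 0;        // number of failures this worker recorded
    std::size_t total_words = 0;  // sum of the sizes of the failed objects

    void record(std::size_t words) {
        assert(words > 0);        // a failed object always has a non-zero size
        ++count;
        total_words += words;
    }
};

// Runs `workers` threads; worker i records `failures_per_worker` failures
// of sizes 1, 2, ... into slot i only. No atomics or locks are used because
// no two threads ever touch the same slot, and the caller reads the slots
// only after join(), which orders the workers' writes before the reads.
inline std::vector<FailureSummary> collect(std::size_t workers,
                                           std::size_t failures_per_worker) {
    std::vector<FailureSummary> slots(workers);
    std::vector<std::thread> pool;
    pool.reserve(workers);
    for (std::size_t i = 0; i < workers; ++i) {
        pool.emplace_back([&slots, i, failures_per_worker] {
            for (std::size_t j = 0; j < failures_per_worker; ++j) {
                slots[i].record(j + 1);
            }
        });
    }
    for (auto& t : pool) {
        t.join();
    }
    return slots;
}
```

The coordinating thread can then walk the returned slots and emit one summary per worker, which mirrors the deferred reporting described above.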
> This means that the thread that sends the events is the main GC
> thread, which is not the same as the thread that experienced the
> promotion failure (which is one or several of the GC worker threads).
>
> So, removing the _thread instance variable can not be done without
> affecting how JFR works. I think it may be useful information to have
> the thread information available, but one alternative would be to just
> not report a thread with the promotion failure.
Recall that it's not "the" promotion failure that is being reported.
It is a per-thread summary of promotion failures. There may be many
promotion failures being summarized.
I'm having a very hard time coming up with a way to make use of the
association of such summary promotion (or evacuation) failure
information with OS-level threads. The relevant context is long gone,
and the actual information being captured is pretty limited. I'd like
to see a real use-case.
All that said, a more complete and cleaned-up fix for the
ParScanThreadState[Set] reuse problem should also suffice, e.g. either
don't reuse the state at all, or capture and report its data and then
reinitialize it before reuse.
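
The "report and reinitialize before reuse" alternative could look roughly like the sketch below. Again, this is only an illustration under assumptions: WorkerState, FailureSummary and report_and_reset() are hypothetical names, and the `sink` vector stands in for whatever actually sends the events.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-worker failure summary (not the HotSpot class).
struct FailureSummary {
    std::size_t count = 0;
    std::size_t total_words = 0;

    void record(std::size_t words) { ++count; total_words += words; }
    bool has_failed() const { return count != 0; }
    void reset() { count = 0; total_words = 0; }
};

// Hypothetical per-worker state that is reused across collections.
struct WorkerState {
    FailureSummary failures;
};

// Called by the coordinating thread at the end of a collection.
// Copies every non-empty summary into `sink`, then reinitializes the
// summary so that a reused state starts the next collection clean.
// Returns the number of summaries reported.
inline std::size_t report_and_reset(std::vector<WorkerState>& states,
                                    std::vector<FailureSummary>& sink) {
    std::size_t emitted = 0;
    for (auto& state : states) {
        if (state.failures.has_failed()) {
            sink.push_back(state.failures);  // capture before clearing
            ++emitted;
        }
        state.failures.reset();
        assert(!state.failures.has_failed());
    }
    return emitted;
}
```

Because the reset happens in the same pass as the report, nothing recorded in one collection can leak into the next, regardless of which thread later picks up the reused state.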