RFR: 8062036: ConcurrentMarkThread::slt may be invoked before ConcurrentMarkThread::makeSurrogateLockerThread causing intermittent crashes

Mon Nov 3 21:31:27 UTC 2014

Kim,

Change looks good.  I can sponsor.  Let me know when the reviews
are complete.

Some suggestions.

http://cr.openjdk.java.net/~kbarrett/8062036/webrev/src/share/vm/gc_implementation/g1/vm_operations_g1.cpp.frames.html

>   216   if (SurrogateLockerThread* slt = ConcurrentMarkThread::slt()) {
>   217     slt->manipulatePLL(SurrogateLockerThread::acquirePLL);
>   218   } else {
>   219     SurrogateLockerThread::report_missing_slt();
>   220   }

I have a preference for

SurrogateLockerThread* slt = ConcurrentMarkThread::slt();
if (slt != NULL) {
  slt->manipulatePLL(SurrogateLockerThread::acquirePLL);
} else {
  SurrogateLockerThread::report_missing_slt();
}

Similarly with

http://cr.openjdk.java.net/~kbarrett/8062036/webrev/src/share/vm/gc_implementation/concurrentMarkSweep/vmCMSOperations.cpp.frames.html

>    45   if (SurrogateLockerThread* slt = ConcurrentMarkSweepThread::slt()) {
>    46     slt->manipulatePLL(SurrogateLockerThread::acquirePLL);
>    47   } else {
>    48     SurrogateLockerThread::report_missing_slt();
>    49   }

http://cr.openjdk.java.net/~kbarrett/8062036/webrev/src/share/vm/gc_implementation/shared/concurrentGCThread.hpp.frames.html

>    96   // Terminate VM with error message that SLT needed but not yet created.

I think it would read better as

"SLT needed but not yet created." => "SLT is needed but has not yet been created."

Jon

On 11/01/2014 09:12 PM, Kim Barrett wrote:
> Please review this fix for a nightly test failure:
>
> Webrev:
> http://cr.openjdk.java.net/~kbarrett/8062036/webrev/
>
> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8062036
>
> I'll also need a sponsor for this.
>
> The failing test is run with the -XX:+ScavengeALot option.  That leads
> to collections during VM initialization, and one of them might need
> the surrogate locking thread before that thread has been created.
>
> The proximate cause of the failure, use of -XX:+ScavengeALot (or
> -XX:+FullGCALot) leading to such problematic collections, is being
> addressed by suppressing "gc alot" until VM initialization is
> complete, conditionalizing it on Threads::is_vm_complete(), rather
> than the previously used is_init_completed().  The latter function was
> never really the proper predicate for this decision, but happened to
> work for collectors that don't use a Java thread as part of their
> implementation; that predicate has never been adequate for G1 or CMS,
> which (sometimes) involve the SLT (Java) thread.
>
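
To make the predicate change described above concrete, here is a minimal standalone sketch. It is not the HotSpot code: the two flags are hypothetical stand-ins for is_init_completed() and Threads::is_vm_complete(), and it assumes (as the description implies) that the first can become true before the second.

```cpp
#include <atomic>

// Hypothetical stand-ins for the two VM predicates; not the HotSpot declarations.
static std::atomic<bool> init_completed{false};  // models is_init_completed()
static std::atomic<bool> vm_complete{false};     // models Threads::is_vm_complete()

// Models -XX:+ScavengeALot / -XX:+FullGCALot being enabled.
static bool gc_alot_enabled = true;

// Old gating: a forced collection is allowed as soon as init is "completed",
// which can be before the surrogate locker thread exists.
bool should_force_gc_old() {
  return gc_alot_enabled && init_completed.load();
}

// New gating: forced collections are suppressed until the VM is fully up.
bool should_force_gc_new() {
  return gc_alot_enabled && vm_complete.load();
}
```

In the window where only the first flag is set, the old predicate permits a forced collection and the new one does not.
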
> However, this doesn't address the possibility of a collection with
> some other cause occurring before SLT creation in VM initialization,
> and failing because it requires the SLT.  This might happen if, for
> example, the initial memory configuration options are overly
> restrictive.  There are limited options to deal with this situation.
>
> * In some cases it might be possible to report OOME, but there are no
> application threads running yet that might do anything useful with it.
> It's also not clear this should be treated as an OOME; there may be
> lots of available memory, if only the collector could actually run.
>
> * Creating the SLT on demand isn't a reliable solution; such a
> collection could occur before it is possible to create and run a Java
> thread. (The SLT is created soon after Java thread creation is
> possible, but there is a period between when the heap supports
> allocation (which might trigger GC) and the point where Java thread
> creation is allowed.)
>
> * Instead we're changing the reporting of the SLT being needed before
> created situation to use vm_exit_during_initialization() with a
> message about what happened, instead of the previous segfault.
>
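
For reference, the check-and-report pattern discussed in this thread can be sketched outside of HotSpot as follows. Everything here is a stand-in and an assumption: Locker models SurrogateLockerThread, report_missing_locker() models SurrogateLockerThread::report_missing_slt(), and it throws instead of calling vm_exit_during_initialization() so that the behavior can be exercised in a test.

```cpp
#include <stdexcept>

// Hypothetical stand-in for the surrogate locker thread.
struct Locker {
  int acquired = 0;
  void acquire() { ++acquired; }
};

// Null until the "VM" has created the locker.
static Locker* g_locker = nullptr;

Locker* locker() { return g_locker; }

// Stand-in for the missing-SLT report. The real fix exits the VM with a
// message; this sketch throws so the failure path is observable.
[[noreturn]] void report_missing_locker() {
  throw std::runtime_error("locker is needed but has not yet been created");
}

// Check for the locker explicitly instead of dereferencing a null pointer.
void acquire_pending_list_lock() {
  Locker* l = locker();
  if (l != nullptr) {
    l->acquire();
  } else {
    report_missing_locker();
  }
}
```

Calling acquire_pending_list_lock() before g_locker is set takes the reporting path with a descriptive message rather than crashing on a null dereference.
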