RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java

David Holmes david.holmes at oracle.com
Tue Jun 28 04:04:20 UTC 2016


Hi Kim,

I like all the removal of the pending-list-locker related code, but 
can't really comment on the details of how it was replaced and interacts 
with all the GC code. The interface to the Reference class is fine, it 
is the GC code that pushes into the reference queue that I can't really 
comment on.

I have a couple of queries following up from Coleen's and Dan's reviews.

First, the synchronization of the pending-reference-list leaves me 
somewhat perplexed. As Coleen noted you require the heap_lock to be held 
but also use an Atomic::xchg. You have a comment:

+   // owns the lock.  Swap is used by parallel non-concurrent reference
+   // processing threads, where some higher level controller owns
+   // Heap_lock, so requires the lock is locked, but not necessarily by
+   // the current thread.

Can you clarify the higher-level protocol that is at play there. 
Asserting that "someone" owns a lock seems only marginally useful if you 
can't be sure who that owner is, or when they might release it. Some 
more direct detecting of being within this higher-level protocol would 
be better, if it is possible to detect.

---

As previously mentioned the change to the initialization order is a 
concern to me - there are always unexpected consequences of making such 
changes, no matter how straight-forward they seem at the time. 
Pre-initialization of exception classes is not really essential as the 
attempt to throw any of them from Java code will force initialization 
anyway - it's only for exceptions created and thrown from the VM code 
that direct initialization is needed. The problematic case is OOME 
because throwing the OOME may trigger a secondary OOME during that class 
initialization etc. Hence we try to deal with that by pre-allocating 
instances that do not have stacktraces etc, so they can be thrown 
safely. Hence my earlier comment that I think the real bug in your 
situation is that gen_out_of_memory_error() is not considering how far 
into the VM init sequence we are, and that it should be returning the 
pre-allocated instance with no backtrace.

---

A query on the JDK side:

src/java.base/share/classes/java/lang/ref/Reference.java

  private static native Reference<? super Object> 
popReferencePendingList(boolean wait);

I'm intrigued by the use of the lower-bounded wildcard using Object. As 
Object has no super-types this looks very strange and should be 
equivalent to the more common upper-bounded wildcard using Object ie 
Reference<? extends Object> or more concisely Reference<?>. I see this 
construct exists in the current code for the ReferenceQueue, but I am 
quite intrigued as to why? (Someone getting ready for Any-types? :) )

Thanks,
David
-----

On 25/06/2016 6:09 AM, Kim Barrett wrote:
> Please review this change which moves the Reference pending list and
> locking from the java.lang.ref.Reference class into the VM.  This
> includes elimination of the ReferencePendingListLocker mechanism in
> the VM. This fixes various fragility issues arising from having the
> management of this list split between Java and the VM (GC).
>
> This change addresses JDK-8156500 by eliminating the possibility of
> suspension while holding the lock. Because the locking is now done in
> the VM, we have full control over where suspension may occur.
>
> This change retains the existing interface between the reference
> processor and the nio.Bits package, e.g. Reference.tryHandlePending
> has the same signature and behavior; this change just pushes the
> pending list manipulation down into the VM.  There are open bugs
> reports, enhancement requests, and discussions related to that
> interface; see below. This change does not attempt to address them.
>
> This change additionally addresses or renders moot
> https://bugs.openjdk.java.net/browse/JDK-8055232
> (ref) Exceptions while processing Reference pending list
>
> and
> https://bugs.openjdk.java.net/browse/JDK-7103238
> Ensure pending list lock is held on behalf of GC before enqueuing references on to the pending list
>
> It is also relevant for followup discussion of
> https://bugs.openjdk.java.net/browse/JDK-8022321
> java/lang/ref/OOMEInReferenceHandler.java fails immediately
>
> and
> https://bugs.openjdk.java.net/browse/JDK-8149925
> We don't need jdk.internal.ref.Cleaner any more - part 1
>
> and has implications for:
> https://bugs.openjdk.java.net/browse/JDK-8066859
> java/lang/ref/OOMEInReferenceHandler.java failed with java.lang.Exception: Reference Handler thread died
>
> In addition to the obviously relevant changes, there are a couple of
> changes whose presence here might seem surprising and require further
> explanation.
>
> - Removal of a stale comment in create_vm, noticed because it was near
> some code being removed as part of this change.  The comment was
> rendered obsolete by JDK-8028275.
>
> - Moved initialization of exception classes earlier in
> initialize_java_lang_classes. The previous order only worked by
> accident, at least for OOME.  During the bulk of the library
> initialization, OOME may be thrown, perhaps due to poorly chosen
> command line options.  That attempts to use one of the pre-allocated
> OOME objects, and tries to fill in its stack trace.  If the Throwable
> class is not yet initialized, this leads to a failed assert in the VM
> because Throwable.UNASSIGNED_STACK still has a null value.  This
> initialization order issue was being masked by the old pending list
> implementation; the initialization of Reference ensured
> InterruptedException was initialized (thereby initializing its
> Throwable base class), and the initialization of Reference just
> happened to occur early enough that Throwable was initialized by the
> time it was needed when running certain tests.  The forced
> initialization of InterruptedException is no longer needed by
> Reference, but removal of it triggered test failures (and could
> trigger corresponding product failures) because of this ordering
> issue.
>
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8156500
>
> Webrev:
> http://cr.openjdk.java.net/~kbarrett/8156500/jdk.00/
> http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.00/
>
> Testing:
> RBT ad hoc nightly runtime & gc tests.
>
> With the proposed patch for JDK-8153711 applied, locally ran nearly
> 35K iterations of the OomDebugTest that is part of that patch, with no
> failures / deadlocks.  Without this change it would typically only take
> on the order of 100 iterations to hit a deadlock failure.
>
> Locally ran direct byte buffer allocation test:
> jdk/test/java/nio/Buffer/DirectBufferAllocTest.java
> This change reported 3% faster allocation times, and 1/2 the standard
> deviation for allocation times.
>
> Locally ran failing test from JDK-8022321 and JDK-8066859 a few
> thousand times.
> jdk/test/java/lang/ref/OOMEInReferenceHandler.java
> No errors, but failures were noted as hard to reproduce.
>


More information about the hotspot-dev mailing list