RFR(10): JDK-8178536: OOM ERRORS + SERVICE-THREAD TAKES A PROCESSOR TO 100%

Poonam Parhar poonam.bajaj at oracle.com
Fri Jun 16 17:58:54 UTC 2017


Thanks Dan! I will incorporate your suggestions.

Regards,
Poonam

> -----Original Message-----
> From: Daniel D. Daugherty
> Sent: Friday, June 16, 2017 10:13 AM
> To: Poonam Parhar; Mandy Chung
> Cc: serviceability-dev; hotspot-runtime-dev at openjdk.java.net
> Subject: Re: RFR(10): JDK-8178536: OOM ERRORS + SERVICE-THREAD TAKES A
> PROCESSOR TO 100%
> 
>  > Hotspot changes:
>  >
>  > http://cr.openjdk.java.net/~poonam/8178536/webrev.hotspot/
> 
> src/share/vm/services/lowMemoryDetector.cpp
>      nit - please update the copyright year before pushing.
> 
>      L307:     // call Sensor::trigger(int, MemoryUsage) to send
> notification to listeners.
>          nit - 'call' -> 'Call'
> 
>      L308:     // when OOME occurs and fails to allocate MemoryUsage
> object, call
>          nit - 'when' -> 'When'
> 
>      L328:        // we just clear the OOM pending exception that we
> might have encountered
>          nit 'we' -> 'We'
> 
>      L328, L330-32 - jcheck is going to complain about the trailing
> blanks on
>          these lines.
> 
> Thumbs up on the HotSpot changes, but these low memory situations are
> fraught with unexpected failure modes and code paths.
> 
> Dan
> 
> 
> On 6/14/17 1:58 PM, Poonam Parhar wrote:
> > Hello Mandy,
> >
> >
> >
> > Thanks for taking a look at the changes. My comments inlined below:
> >
> >
> >
> > From: Mandy Chung
> > Sent: Wednesday, June 14, 2017 10:18 AM
> > To: Poonam Parhar
> > Cc: hotspot-runtime-dev at openjdk.java.net; serviceability-dev
> > Subject: Re: RFR(10): JDK-8178536: OOM ERRORS + SERVICE-THREAD TAKES
> A
> > PROCESSOR TO 100%
> >
> >
> >
> >
> >
> > On Jun 7, 2017, at 8:53 AM, Poonam Parhar <HYPERLINK
> "mailto:poonam.bajaj at oracle.com"poonam.bajaj at oracle.com> wrote:
> >
> >
> >
> > Hello,
> >
> > Could I have reviews for the changes for the Bug HYPERLINK
> "https://bugs.openjdk.java.net/browse/JDK-8178536"JDK-8178536: OOM
> ERRORS + SERVICE-THREAD TAKES A PROCESSOR TO 100%.
> >
> > Problem: If there are listeners installed for MemoryMXBean then in
> the event of an OOME, JVM internal thread 'Service Thread' responsible
> for delivering notifications to the Java threads itself may encounter
> an OOM exception and get into a loop of processing its pending
> requests. This happens if and when the Service Thread executing the
> native code calls its corresponding java methods and faces an OOM
> exception, this pending exception makes the thread exit early from
> SensorInfo∷trigger() function before it can update its pending requests
> counter (_pending_trigger_count). This pending exception is never
> cleared and that makes the thread loop in
> LowMemoryDetector::process_sensor_changes().
> >
> > Hotspot changes:
> >
> > http://cr.openjdk.java.net/~poonam/8178536/webrev.hotspot/
> >
> > These changes check for the pending exception and clear it, and make
> sure that the pending requests counters are kept in sync on both the
> Java and the VM side.
> >
> > [Poonam] First of all apologies for this weird bold font. Not sure
> why my email client sent two emails one with this bold font and one
> without.
> >
> > This fix would drop any notification when it runs out of memory.
> This approach is reasonable.  Even we preallocate the notification
> object, the listener code may fail due to OOM.
> >
> >
> >
> > There are two places doing the object allocation - VM allocates
> MemoryUsage and the Sensor::trigger implementation.  Another idea is to
> change Sensor::trigger to take the values of init, used, committed, max
> instead MemoryUsage so that the VM code can be simplified and the Java
> code will catch all exception and return a boolean to indicate
> success/fail.   This will also prepare moving the invocation of the
> user listener code in a Java thread instead of ServiceThread to address
> the issue about ServiceThread is a  hidden thread that can’t be
> debugged.
> >
> >
> > [Poonam] yes, that could also be another way. But for this bug, I
> would like to keep the changes in sync with how the exceptions in the
> Java code called from the VM side of the Service Thread are handled.
> For example, ServiceThread::service_thread_entry() also calls
> GCNotifier::sendNotification() and here’s how the pending exception is
> handled.
> >
> >
> >
> >
> >
> >      if(has_gc_notification_event) {
> >
> >          GCNotifier::sendNotification(CHECK);
> >
> >      }
> >
> > void GCNotifier::sendNotification(TRAPS) {
> >
> >    GCNotifier::sendNotificationInternal(THREAD);
> >
> >    // Clearing pending exception to avoid premature termination of
> >
> >    // the service thread
> >
> >    if (HAS_PENDING_EXCEPTION) {
> >
> >      CLEAR_PENDING_EXCEPTION;
> >
> >    }
> >
> > }
> >
> >
> >
> > When we get to the issue of enabling the debugging of the user code
> currently being executed on the ServiceThread, we will have to be
> looking at these other notifications as well and I would prefer that we
> make those changes together under the same bug and not with this
> change.
> >
> >
> >
> > Please let me know what you think.
> >
> >
> >
> > Thanks,
> >
> > Poonam
> >
> >
> >
> > JDK changes:
> >
> > http://cr.openjdk.java.net/~poonam/8178536/webrev.jdk/
> >
> > These changes make the triggerAction() a no-op since we need to call
> this method if MemoryService::create_MemoryUsage_obj() encounters an
> OOM exception and we want to avoid further potential OOM exceptions in
> triggerAction(MemoryUsage).
> >
> >
> >
> > Testcase:
> >
> > I have written a testcase though I won't be integrating it along with
> the fix because the failure is unreliable; the Service Thread does not
> always fall into the situation that causes it to loop i.e. it has
> _pending_trigger_count>0 and hits an OOM in its Java calls.
> >
> > http://cr.openjdk.java.net/~poonam/8178536/webrev.hotspot.testcase/
> >
> >
> >
> > OK.
> >
> >
> >
> > Mandy
> >
> >
> >
> 


More information about the hotspot-runtime-dev mailing list