Request for review, 6766644: Redefinition of compiled method fails with assertion "Can not load classes with the Compiler thread"

David Holmes David.Holmes at oracle.com
Mon Jan 31 18:20:10 PST 2011


Hi Keith,

I've looked at the new webrev - thanks.

Keith McGuigan said the following on 02/01/11 00:32:
> On Jan 30, 2011, at 6:58 PM, David Holmes wrote:
> 
> Since Service_lock is a 'special' lock (see mutex.hpp), no locks can be 
> acquired when holding the service-lock.  I believe the locking code 
> asserts this and I was careful to not acquire new locks when holding the 
> Service_lock.  So deadlock should not be an issue but let me know if you 
> see somewhere I screwed this up.

Okay that makes me feel somewhat better.

> I'm trying to come up with some tests for this now.  The risk here 
> ought to be low, as the behavior of low-memory detection will be 
> completely unchanged unless JVMTI compiled-method events are enabled.  
> When they are enabled, the low-memory detection actions may be delayed 
> by the JVMTI event native code.  I don't suspect that this is a problem 
> but I'm investigating to make sure.

My concern is that we may introduce a previously impossible cycle 
between these various threads. Event processing can be delayed by 
low-memory processing and vice-versa, and I didn't check through to see 
if it was impossible to somehow get a cycle between the two.

>> I'm also a little concerned that changing the LowMemoryDetector (LMD) 
>> thread in to the Service thread is somewhat disruptive when doing 
>> stack analysis on running apps or core dumps or crash logs, because 
>> people have been used to seeing the LMD thread for many years and now 
>> it has changed its identity.
> 
> The thread "is_hidden_from_external_view" (which I know doesn't hide it 
> completely), so its visibility should be pretty limited.

But it is completely visible in the contexts I listed: stack dumps for 
live processes and core files. Keeping the name for JDK6 is a good 
compromise. You may want to give a heads-up to all JDK developers 
though. :)

>> Looking in jvmtiImpl.cpp
>>
>> I'd prefer to see the locking in all the JvmtiDeferredEventQueue 
>> methods rather than having the call-sites do the locking, as it is 
>> less error-prone. But I see that the main processing loop in the 
>> service thread makes this impossible due to the dual use of the 
>> "service lock".
> 
> It is possible to massage the service thread loop to make that work.  I 
> did have it coded up that way at one point but decided to change it back 
> and put the onus on the caller to acquire the lock instead.  I think 
> this is better since it makes it more obvious in the calling code that 
> the lock is being acquired.  Putting the lock acquisition inside the 
> calls instead would sort of "hide" the lock and I think it's safer to 
> make it obvious.

That "hiding" is called encapsulation and is generally considered a 
"good thing" :) "Client-side locking" is generally considered 
error-prone. That said, inside the VM, where deadlocks are a big 
concern, more visibility of locks isn't necessarily a bad thing.

>> Where we have:
>>
>> 961 void JvmtiDeferredEventQueue::enqueue(const JvmtiDeferredEvent& 
>> event) {
>> ...
>> 967   QueueNode* node = new QueueNode(event);
>>
>> and
>>
>> 1007 void JvmtiDeferredEventQueue::add_pending_event(
>> 1008     const JvmtiDeferredEvent& event) {
>> 1009
>> 1010   QueueNode* node = new QueueNode(event);
>>
>> Will an allocation failure here terminate the VM? I suspect it will 
>> but I don't think it should. Can we not add a link field to the event 
>> objects rather than have to create Nodes to hold them?
> 
> I can add a NULL check to those spots.  There's no need for us to segv 
> if we can't allocate memory for this.  We do need a C-heap allocated 
> object since we're passing these events from thread-to-thread.  Adding 
> the link in the event object itself wouldn't help - it would just mean 
> we have to allocate the event object in C-heap instead.

As per your other email, the NULL check is pointless, as "new" will 
never return null but will instead call vm_exit_out_of_memory to abort 
the VM - which is exactly my point. I don't believe that a failure to 
allocate in this code should be considered a fatal error for the VM.

I don't understand why we would have to allocate the event objects in 
C-heap, all we want is a link field so that they can form their own 
queue without the need for wrapper Nodes.

  447 class JvmtiDeferredEvent VALUE_OBJ_CLASS_SPEC {
  448   friend class JvmtiDeferredEventQueue;
  449  private:
        JvmtiDeferredEvent* _next; // for chaining events together
       public:
        JvmtiDeferredEvent* next() { return _next; }
        void set_next(JvmtiDeferredEvent* e) {
          assert(_next == NULL, "overwriting linked event!");
          _next = e;
        }

Further, this would allow the service thread to dequeue all the events 
at once instead of pulling them off one at a time.

>> In dequeue, this:
>>
>> 988   if (_queue_head == NULL) {
>> 989     // Just in case this happens in product; it shouldn't be let's 
>> not crash
>> 990     return JvmtiDeferredEvent();
>> 991   }
>>
>> doesn't look right. Are we returning a stack allocated instance here? 
>> (There's also a typo in the comment.)
> 
> Yes, we're returning a stack-allocated instance, which returns to the 
> caller as a temp and is copied into a local variable there.  I'll fix 
> the comment.

It isn't "legal" to return a stack-allocated object from a method. I 
expect we'll get away with it but still ...

Couldn't you just allocate a static instance to use in this case?

>> In jvmtiExport.cpp,  I don't understand why here:
>>
>> 832 void JvmtiExport::post_dynamic_code_generated(const char *name, 
>> const void *code_begin, const void *code_end) {
>> ...
>> 1841
>> 1842   JvmtiDeferredEventQueue::wait_for_empty_queue();
>> 1843
>>
>> we have to wait for an empty queue, particularly as the queue may 
>> become non-empty at any moment. Normally such a wait must be atomic 
>> with respect to an action that requires the queue to be empty, but 
>> that doesn't seem to be the case here.
> 
<snip>
> that's perfectly fine.  It's more of a 'flush_queue' call and I suppose 
> I could call it that explicitly (and maybe add a 'flush marker' in the 
> queue so that this won't wait for events that were generated while this 
> was waiting).  Actually if I do that I can get rid of the 'notify' in 
> dequeue and instead use a special flush event.  Hmmm... that might work 
> better.

OK, so in general I like the shape of the new architecture, but I'm a 
bit unclear on the details. Non-service threads will only call:

- enqueue() to add an event
- add_pending_event() to add an event in the special case of a safepoint 
being active
- flush_queue() to wait for earlier events to get posted

The service thread will:

- loop in wait() until there is an event (enqueue will do the notifyAll)
- dequeue() to pull off that event

Now it seems that flush_queue() is also called by the service thread, 
but I can't see where from.

I'm a little concerned about all the locking/unlocking that seems to 
happen, and about the potentially large number of spurious 
notifications. This is mitigated if we only expect a couple of threads 
to be involved, but there are ways to restructure things so that event 
processing can be batched, including a single notifyAll once all flush 
events are complete.

Cheers,
David


More information about the hotspot-runtime-dev mailing list