[RFC] [Plugin] Fixes 100% cpu load

Thu Sep 30 17:33:11 PDT 2010

* Andrew Su <asu at redhat.com> [2010-09-30 19:46]:
> 
> ----- "Deepak Bhole" <dbhole at redhat.com> wrote:
> 
> > From: "Deepak Bhole" <dbhole at redhat.com>
> > To: "Andrew Su" <asu at redhat.com>
> > Cc: distro-pkg-dev at openjdk.java.net
> > Sent: Thursday, September 30, 2010 7:15:55 PM GMT -05:00 US/Canada Eastern
> > Subject: Re: [RFC] [Plugin] Fixes 100% cpu load
> >
> > * Deepak Bhole <dbhole at redhat.com> [2010-09-30 18:55]:
> > > * Andrew Su <asu at redhat.com> [2010-09-30 17:57]:
> > > > 
> > > > ----- "Deepak Bhole" <dbhole at redhat.com> wrote:
> > > > 
> > > > > From: "Deepak Bhole" <dbhole at redhat.com>
> > > > > To: "Andrew Su" <asu at redhat.com>
> > > > > Cc: distro-pkg-dev at openjdk.java.net
> > > > > Sent: Thursday, September 30, 2010 5:01:44 PM GMT -05:00
> > US/Canada Eastern
> > > > > Subject: Re: [RFC] [Plugin] Fixes 100% cpu load
> > > > >
> > > > > Hi,
> > > > > 
> > > > > Sorry I couldn't get to this earlier.. wanted to study the
> > patch
> > > > > carefully before commenting.
> > > > > 
> > > > > * Andrew Su <asu at redhat.com> [2010-09-27 15:25]:
> > > > > > Hello,
> > > > > > 
> > > > > > This patch fixes the following:
> > > > > > *Plugin goes into a deadlock waiting for another thread to
> > > > > initialize applet, and no more threads are available to consume
> > > > > messages.
> > > > > > *Plugin handles "destroy" message when another thread trying
> > to
> > > > > create the applet is kicked off processor before setting the
> > status to
> > > > > PRE_INIT.
> > > > > > 
> > > > > 
> > > > > I am a bit confused about this part. As it stands now, the
> > sequence
> > > > > is
> > > > > as follows:
> > > > > 
> > > > > init message comes in -> start init
> > > > > destroy message comes in
> > > > >   if init is done -> destroy
> > > > >   if init is in progress -> wait, then destroy
> > > > > 
> > > > > Determination of if init is done or in progress is done via the
> > > > > status
> > > > > hashmap which contains one of PRE_INIT, IN_INIT or
> > INIT_COMPLETE.
> > > > > Destroy cannot be handled before PRE_INIT is set, as PRE_INIT is
> > the
> > > > > first thing set, and this code will prevent destroy from
> > proceeding
> > > > > until it is:
> > > > > 
> > > > >     // Wait till initialization finishes
> > > > >     while (!applets.containsKey(identifier) && 
> > > > >             (
> > > > >                !status.containsKey(identifier) || 
> > > > >               
> > > > > status.get(identifier).equals(PAV_INIT_STATUS.PRE_INIT)
> > > > >             )
> > > > >           );
> > > > > 
> > > > > > 
> > > > > > Example.
> > > > > > We have 2 MessageConsumer threads.
> > > > > > Passes "destroy" message to both of them for applet 1 and
> > applet 2.
> > > > > > Thread 1: Spinlock waiting for another thread to initialize
> > applet
> > > > > 1.
> > > > > > Thread 2: Spinlock waiting for another thread to initialize
> > applet
> > > > > 2.
> > > > > > Both threads are now in deadlock.
> > > > > > 
> > > > > > Proposed Patch:
> > > > > > When message is destroy, don't spinlock, handle it right
> > away.
> > > > > > -Problem with this: handles destroy right before we start
> > > > > initializing the applet.
> > > > > > -Fix to new problem: Synchronize the two blocks
> > > > > > --Case 1:If in process of initializing, finish initializing.
> > handle
> > > > > the destroy.
> > > > > > --Case 2:If destroy already handled, don't initialize.
> > > > > > 
> > > > > 
> > > > > I don't understand the destroy-before-init case. Are you seeing
> > an
> > > > > instance
> > > > > where destroy gets sent before init does? And if that's what
> > happens,
> > > > > the while
> > > > > loop above would still keep destroy from proceeding until it is
> > > > > initialized. 
> > > > > 
> > > > > Is there a test case for this problem?
> > > > 
> > > > You are indeed correct that that is the sequence of handling the
> > messages, but.. 
> > > > Let's say I load a page with 10 applets (we have 5 threads that
> > handle message). 
> > > > Now before the applets load, navigate to another page that loads
> > applets or not, then we will get destroy messages. Since the messages
> > are queued, when there is no workers free to do work, the messages are
> > sent to the back of the queue to be handled later. Assuming a better
> > case where each of our 5 threads are given the handle message (this is
> > the message we use to initialize our applets). Since now all our
> > workers are busy processing the init, the next 5 handle messages gets
> > dequeued then placed at the back of queue. Next the 5 threads finishes
> > initializing the applets.
> > > 
> > > Where did you see the 5 thread count? There should be 6. 4
> > non-priority,
> > > 2 priority. Priority threads do not handle blocking calls, and can
> > only
> > > be used for messages registered as priority.
> > > 
> > > > Each dequeuing the from the readQueue. If the destroy messages are
> > the ones initialized, works as intended, if not, all 5 threads will
> > now go into a never-ending loop. If the user waits until all the
> > applets load before navigating somewhere else, it'll work out since
> > the inits will be processed and destroy will be carried out properly,
> > otherwise it'll just take up the cpu.
> > > > 
> > > 
> > > Okay, so I have 4 free workers (let's ignore priority for now).
> > > Thread 0, Thread 1, Thread 2 and Thread 3.
> > > 
> > > 10 init messages come in. And we have:
> > > 
> > > Thread 0 - Processing init 0
> > > Thread 1 - Processing init 1
> > > Thread 2 - Processing init 2
> > > Thread 3 - Processing init 3
> > > 
> > > The remaining init messages get queued. So the queue now has:
> > > [init 3, init 4, init 5, init 6, init 7, init 8, init 9]
> > > 
> > > 10 destroy messages come in. There are no free workers, so they get
> > > queued and the readQueue is:
> > > [init 3, init 4, init 5, init 6, init 7, init 8, init 9, destroy 0,
> > > destroy 1, destroy 2, destroy 3, destroy 4, destroy 5, destroy 6,
> > > destroy 7, destroy 8, destroy 9]
> > > 
> > > At some point, as the system tests each message and requeues them,
> > the
> > > readqueue becomes:
> > > 
> > > [destroy 4, destroy 5, destroy 6, destroy 7, destroy 8, destroy 9
> > ....]
> > > 
> > > And those first 4 destroy messages take up the 4 available threads.
> > > 
> > > Is this the correct description of the issue?
> Yes.
> 
> > > 
> > > Assuming so, wouldn't just keeping messages at the front of the
> > queue
> > > instead of re-adding them to the back work? The fix would be a
> > one-liner
> > > and we wouldn't need any locks. 
> > > 
> > > I don't think it will make things worse either. Messages are
> > re-added to
> > > the queue only when no free workers are available. Adding to the
> > back
> > > of the queue does not guarantee FIFO processing order since the
> > order is
> > > constantly changing (as seen in your example). However at the end
> > of
> > > each cycle, the order resembles FIFO. This means that FIFO order
> > works
> > > as well. If FIFO works, we can just make that the default.
> > >
> > 
> > Doh! So close. Just remembered why it needs to add them to the back
> > of
> > the queue. It is so that priority messages can be processed while the
> > others wait.
> > 
> > Easy fix though. We would just have to scan the queue to see if there
> > are any priority messages, and if so, move them to the front and
> > retry
> > processing.
> I don't think having it sort the queue is worth the hit on performance. 
> Since adding to the head of a queue will require recopying the rest of the messages, or keep track of which element is the head.
> 

It is a linked list. Cost of addition/deletion is constant.

As for list traversal cost, the cost will be the same O(n) as now since
we append to the end of the queue and look at the next and keep doing it
until a priority message is found and processed. Even with locks, this
will stay at O(n).

e.g. Say the list contains 4 non-priority messages (N) and 1 priority
message (P) for n=5. If at the start, the queue is [N, N, N, N, P], with the
current code the list changes as:

[N, N, N, P, N]
[N, N, N, P, N]
[N, N, P, N, N]
[N, P, N, N, N]
[P, N, N, N, N]
<process p and remove it from queue>
[N, N, N, N]

For cost O(n)

With the new code, it'd be:
[N, N, N, N, P] -> <check each item till P is hit -- cost = n = 5>
<bring p to front, cost = constant>
[P, N, N, N, N]
<remove p, cost = constant>
[N, N, N, N]

The cost is still  O(n)

Additionally, such a case where processors are full would be encountered
rarely since a vast majority of pages have a single applet which
wouldn't max out the thread pool.

Cheers,
Deepak

> > 
> > Deepak
> >  
> > > This does no solve the problem of the applets 'hanging around in a
> > > separate window' until destroy is processed, 
> When we set the handle using reframe at the bottom of the handle block, that's where the applet shows up in a separate window. If we do not reframe given that a destroy message is sent, we can avoid the window showing up at all. (I am still looking into doing this)
> 
> > > but I don't imagine
> > too
> > > many regular users see that case. And even if they do, the applet
> > is
> > > always guaranteed to be destroyed eventually, ensuring a consistent
> > > state.
> Yes they will be eventually destroyed, but during that time do we we really want to have it using up their resources?
> 
> > > 
> > > Cheers,
> > > Deepak
> > > 
> > > > > 
> > > > > Sorry to be nitpicking.. I am trying to get rid of locks from
> > the code
> > > > > as much
> > > > > as possible by re-designing message sequencing... if a lock is
> > to be
> > > > > added, I
> > > > > want to make sure there is no other solution first :)
> > > > > 
> > > > > Cheers,
> > > > > Deepak
> > > > > 
> > > > > > Cheers,
> > > > > >   Andrew
> > > > >