RFR: 8231666: ThreadIdTable::grow() invokes invalid thread transition

Mon Oct 7 07:34:52 UTC 2019

Hi Daniil,

Yes, good, but:

>>      >> Testing:  Mach5 tier1, tier2, and tier3 tests successfully passed.
>>      And if you have not done so, you should test this with the benchmark you 
>> have as
>>      a stress test and see that this does what we think.

Can you please test it with your benchmark, if you have not done so?

/Robbin

>>      Thanks, Robbin
>>      >>
>>      >> Thank you!
>>      >>
>>      >> Best regards,
>>      >> Daniil
>>      >>
>>      >> On 10/2/19, 3:26 PM, "David Holmes" <david.holmes at oracle.com> wrote:
>>      >>
>>      >>      Hi Daniil,
>>      >>      On 3/10/2019 2:21 am, Daniil Titov wrote:
>>      >>      > Hi David and Robbin,
>>      >>      >
>>      >>      > Could we consider  making the ServiceThread responsible for the
>>      >> ThreadIdTable resizing in the similar way how
>>      >>      > it works for  StringTable  and ResolvedMethodTable, rather than 
>> having
>>      >> ThreadIdTable::add() method calling ThreadIdTable::grow()?
>>      >>      > As I understand It should solve  the current  issue and  
>> address the
>>      >> concern that  the doing the resizing could be a relatively long and
>>      >>      > doing it without polling  for safepoints or while the holding
>>      >> Threads_lock is not desirable.
>>      >>      I originally rejected copying that part of the code from the other
>>      >>      tables as it seems to introduce unnecessary complexity. Having a
>>      >>      separate thread trying to grow the table when it could be 
>> concurrently
>>      >>      having threads added and removed seems like it could introduce 
>> hard to
>>      >>      diagnose performance pathologies. It also adds what we know to be a
>>      >>      potentially long running action to the workload of the service 
>> thread,
>>      >>      which means it may also impact the other tasks the service thread is
>>      >>      doing, thus potentially introducing even more hard to diagnose
>>      >>      performance pathologies.
>>      >>      So this change does concern me. But go ahead and trial it.
>>      >>      Thanks,
>>      >>      David
>>      >>      > Thank you,
>>      >>      > Daniil
>>      >>      >
>>      >>      >
>>      >>      > On 10/2/19, 6:25 AM, "David Holmes" <david.holmes at oracle.com> 
>> wrote:
>>      >>      >
>>      >>      >      Hi Robbin,
>>      >>      >
>>      >>      >      On 2/10/2019 7:58 pm, Robbin Ehn wrote:
>>      >>      >      > Hi David,
>>      >>      >      >
>>      >>      >      >> What if the table is full and must be grown?
>>      >>      >      >
>>      >>      >      > The table uses chaining, it just means load factor tip 
>> over what is
>>      >>      >      > considered a good backing array size.
>>      >>      >
>>      >>      >      Coleen raised a good question in a separate discussion, 
>> which made me
>>      >>      >      realize that once the table has been initially populated all
>>      >> subsequent
>>      >>      >      additions, and hence all subsequent calls to grow() always 
>> happen
>>      >> with
>>      >>      >      the Threads_lock held. So we can't just defer the grow().
>>      >>      >
>>      >>      >      >> That aside, I'd like to know how expensive it is to 
>> grow this
>>      >> table.
>>      >>      >      >> What are we talking about here?
>>      >>      >      >
>>      >>      >      > We use global counter which on write_synchronize must 
>> scan all
>>      >>      >      > threads to make sure they have seen the update (there some
>>      >>      >      > optimization to avoid it if there is no readers at all). 
>> Since this
>>      >>      >      > table contains the threads, we get double penalized, for 
>> each new
>>      >>      >      > thread the synchronization cost increase AND the number 
>> of items.
>>      >>      >      >
>>      >>      >      > With concurrent reads you still need many thousands of 
>> threads, but
>>      >>      >      > I think I saw someone mentioning 100k threads, assuming 
>> concurrent
>>      >>      >      > queries the resize can take hundreds of ms to finish. 
>> Note that
>>      >> reads
>>      >>      >      > and inserts still in operate roughly at the same speed 
>> while
>>      >>      >      > resizing. So a longer resize is only problematic if we 
>> do not
>>      >>      >      > respect safepoints.
>>      >>      >      I think if anything were capable of running 100K threads 
>> we would be
>>      >>      >      hitting far worse scalability bottlenecks than this. But 
>> this does
>>      >> seem
>>      >>      >      problematic.
>>      >>      >
>>      >>      >      Thanks,
>>      >>      >      David
>>      >>      >      -----
>>      >>      >
>>      >>      >      > Thanks, Robbin
>>      >>      >      >
>>      >>      >      >>
>>      >>      >      >> David
>>      >>      >      >>
>>      >>      >      >>> /Robbin
>>      >>      >      >>>
>>      >>      >      >>> On 2019-10-02 08:46, David Holmes wrote:
>>      >>      >      >>>> Hi Daniil,
>>      >>      >      >>>>
>>      >>      >      >>>> On 2/10/2019 4:13 pm, Daniil Titov wrote:
>>      >>      >      >>>>> Please review a change that fixes the issue. The 
>> problem
>>      >> here is
>>      >>      >      >>>>> that that the thread is added to the ThreadIdTable
>>      >> (introduced in
>>      >>      >      >>>>> [3]) while the Threads_lock is held by
>>      >>      >      >>>>> JVM_StartThread. When new thread is added  to the 
>> thread
>>      >> table the
>>      >>      >      >>>>> table checks if its load factor is greater than 
>> required and
>>      >> if so
>>      >>      >      >>>>> it grows itself while polling for safepoints.
>>      >>      >      >>>>> After changes [4]  an attempt to block the thread while
>>      >> holding the
>>      >>      >      >>>>> Threads_lock  results in assertion in
>>      >>      >      >>>>> Thread::check_possible_safepoint().
>>      >>      >      >>>>>
>>      >>      >      >>>>> The fix  proposed by David Holmes ( thank you, 
>> David!)  is
>>      >> to skip
>>      >>      >      >>>>> the ThreadBlockInVM inside ThreadIdTable::grow() 
>> method if the
>>      >>      >      >>>>> current thread owns the Threads_lock.
>>      >>      >      >>>>
>>      >>      >      >>>> Sorry but looking at the fix in context now I think 
>> it would be
>>      >>      >      >>>> better to do this:
>>      >>      >      >>>>
>>      >>      >      >>>>      while (gt.do_task(jt)) {
>>      >>      >      >>>>        if (Threads_lock->owner() == jt) {
>>      >>      >      >>>>          gt.pause(jt);
>>      >>      >      >>>>          ThreadBlockInVM tbivm(jt);
>>      >>      >      >>>>          gt.cont(jt);
>>      >>      >      >>>>        }
>>      >>      >      >>>>      }
>>      >>      >      >>>>
>>      >>      >      >>>> This way we don't waste time with the pause/cont when 
>> there's no
>>      >>      >      >>>> safepoint pause going to happen - and the owner() 
>> check is
>>      >> quicker
>>      >>      >      >>>> than owned_by_self(). That partially addresses a general
>>      >> concern I
>>      >>      >      >>>> have about how long it may take to grow the table, as 
>> we are
>>      >>      >      >>>> deferring safepoints until it is complete in this
>>      >> JVM_StartThread
>>      >>      >      >>>> usecase.
>>      >>      >      >>>>
>>      >>      >      >>>> In the test you don't need all of:
>>      >>      >      >>>>
>>      >>      >      >>>>    32  * @run clean ThreadStartTest
>>      >>      >      >>>>    33  * @run build ThreadStartTest
>>      >>      >      >>>>    34  * @run main ThreadStartTest
>>      >>      >      >>>>
>>      >>      >      >>>> just the last @run suffices to build and run the test.
>>      >>      >      >>>>
>>      >>      >      >>>> Thanks,
>>      >>      >      >>>> David
>>      >>      >      >>>> -----
>>      >>      >      >>>>
>>      >>      >      >>>>> Testing : Mach 5 tier1 and tier2 completed 
>> successfully,
>>      >> tier3 is
>>      >>      >      >>>>> in progress.
>>      >>      >      >>>>>
>>      >>      >      >>>>> [1] Webrev:
>>      >> http://cr.openjdk.java.net/~dtitov/8231666/webrev.01/
>>      >>      >      >>>>> [2] Bug: 
>> https://bugs.openjdk.java.net/browse/JDK-8231666
>>      >>      >      >>>>> [3] https://bugs.openjdk.java.net/browse/JDK-8185005
>>      >>      >      >>>>> [4] https://bugs.openjdk.java.net/browse/JDK-8184732
>>      >>      >      >>>>>
>>      >>      >      >>>>> Best regards,
>>      >>      >      >>>>> Danill
>>      >>      >      >>>>>
>>      >>      >      >>>>>
>>      >>      >
>>      >>      >
>>      >>      >
>>      >>
>>      >>
>>
>>