RFR: 8231666: ThreadIdTable::grow() invokes invalid thread transition

Wed Oct 2 17:14:36 UTC 2019

Hi Daniil,

On 2019-10-02 18:21, Daniil Titov wrote:
> Hi David and Robbin,
> 
> Could we consider  making the ServiceThread responsible for the ThreadIdTable resizing in the similar way how
> it works for  StringTable  and ResolvedMethodTable, rather than having ThreadIdTable::add() method calling ThreadIdTable::grow()?
> As I understand It should solve  the current  issue and  address the concern that  the doing the resizing could be a relatively long and
> doing it without polling  for safepoints or while the holding Threads_lock is not desirable.

Yes, thanks.

/Robbin

> 
> Thank you,
> Daniil
> 
> 
> On 10/2/19, 6:25 AM, "David Holmes" <david.holmes at oracle.com> wrote:
> 
>      Hi Robbin,
>      
>      On 2/10/2019 7:58 pm, Robbin Ehn wrote:
>      > Hi David,
>      >
>      >> What if the table is full and must be grown?
>      >
>      > The table uses chaining, it just means load factor tip over what is
>      > considered a good backing array size.
>      
>      Coleen raised a good question in a separate discussion, which made me
>      realize that once the table has been initially populated all subsequent
>      additions, and hence all subsequent calls to grow() always happen with
>      the Threads_lock held. So we can't just defer the grow().
>      
>      >> That aside, I'd like to know how expensive it is to grow this table.
>      >> What are we talking about here?
>      >
>      > We use global counter which on write_synchronize must scan all
>      > threads to make sure they have seen the update (there some
>      > optimization to avoid it if there is no readers at all). Since this
>      > table contains the threads, we get double penalized, for each new
>      > thread the synchronization cost increase AND the number of items.
>      >
>      > With concurrent reads you still need many thousands of threads, but
>      > I think I saw someone mentioning 100k threads, assuming concurrent
>      > queries the resize can take hundreds of ms to finish. Note that reads
>      > and inserts still in operate roughly at the same speed while
>      > resizing. So a longer resize is only problematic if we do not
>      > respect safepoints.
>      I think if anything were capable of running 100K threads we would be
>      hitting far worse scalability bottlenecks than this. But this does seem
>      problematic.
>      
>      Thanks,
>      David
>      -----
>      
>      > Thanks, Robbin
>      >
>      >>
>      >> David
>      >>
>      >>> /Robbin
>      >>>
>      >>> On 2019-10-02 08:46, David Holmes wrote:
>      >>>> Hi Daniil,
>      >>>>
>      >>>> On 2/10/2019 4:13 pm, Daniil Titov wrote:
>      >>>>> Please review a change that fixes the issue. The problem here is
>      >>>>> that that the thread is added to the ThreadIdTable  (introduced in
>      >>>>> [3]) while the Threads_lock is held by
>      >>>>> JVM_StartThread. When new thread is added  to the thread table the
>      >>>>> table checks if its load factor is greater than required and if so
>      >>>>> it grows itself while polling for safepoints.
>      >>>>> After changes [4]  an attempt to block the thread while holding the
>      >>>>> Threads_lock  results in assertion in
>      >>>>> Thread::check_possible_safepoint().
>      >>>>>
>      >>>>> The fix  proposed by David Holmes ( thank you, David!)  is to skip
>      >>>>> the ThreadBlockInVM inside ThreadIdTable::grow() method if the
>      >>>>> current thread owns the Threads_lock.
>      >>>>
>      >>>> Sorry but looking at the fix in context now I think it would be
>      >>>> better to do this:
>      >>>>
>      >>>>      while (gt.do_task(jt)) {
>      >>>>        if (Threads_lock->owner() == jt) {
>      >>>>          gt.pause(jt);
>      >>>>          ThreadBlockInVM tbivm(jt);
>      >>>>          gt.cont(jt);
>      >>>>        }
>      >>>>      }
>      >>>>
>      >>>> This way we don't waste time with the pause/cont when there's no
>      >>>> safepoint pause going to happen - and the owner() check is quicker
>      >>>> than owned_by_self(). That partially addresses a general concern I
>      >>>> have about how long it may take to grow the table, as we are
>      >>>> deferring safepoints until it is complete in this JVM_StartThread
>      >>>> usecase.
>      >>>>
>      >>>> In the test you don't need all of:
>      >>>>
>      >>>>    32  * @run clean ThreadStartTest
>      >>>>    33  * @run build ThreadStartTest
>      >>>>    34  * @run main ThreadStartTest
>      >>>>
>      >>>> just the last @run suffices to build and run the test.
>      >>>>
>      >>>> Thanks,
>      >>>> David
>      >>>> -----
>      >>>>
>      >>>>> Testing : Mach 5 tier1 and tier2 completed successfully, tier3 is
>      >>>>> in progress.
>      >>>>>
>      >>>>> [1] Webrev: http://cr.openjdk.java.net/~dtitov/8231666/webrev.01/
>      >>>>> [2] Bug: https://bugs.openjdk.java.net/browse/JDK-8231666
>      >>>>> [3] https://bugs.openjdk.java.net/browse/JDK-8185005
>      >>>>> [4] https://bugs.openjdk.java.net/browse/JDK-8184732
>      >>>>>
>      >>>>> Best regards,
>      >>>>> Danill
>      >>>>>
>      >>>>>
>      
> 
>