RFR: 8231666: ThreadIdTable::grow() invokes invalid thread transition
Robbin Ehn
robbin.ehn at oracle.com
Wed Oct 2 17:14:36 UTC 2019
Hi Daniil,
On 2019-10-02 18:21, Daniil Titov wrote:
> Hi David and Robbin,
>
> Could we consider making the ServiceThread responsible for the ThreadIdTable resizing in the similar way how
> it works for StringTable and ResolvedMethodTable, rather than having ThreadIdTable::add() method calling ThreadIdTable::grow()?
> As I understand It should solve the current issue and address the concern that the doing the resizing could be a relatively long and
> doing it without polling for safepoints or while the holding Threads_lock is not desirable.
Yes, thanks.
/Robbin
>
> Thank you,
> Daniil
>
>
> On 10/2/19, 6:25 AM, "David Holmes" <david.holmes at oracle.com> wrote:
>
> Hi Robbin,
>
> On 2/10/2019 7:58 pm, Robbin Ehn wrote:
> > Hi David,
> >
> >> What if the table is full and must be grown?
> >
> > The table uses chaining, it just means load factor tip over what is
> > considered a good backing array size.
>
> Coleen raised a good question in a separate discussion, which made me
> realize that once the table has been initially populated all subsequent
> additions, and hence all subsequent calls to grow() always happen with
> the Threads_lock held. So we can't just defer the grow().
>
> >> That aside, I'd like to know how expensive it is to grow this table.
> >> What are we talking about here?
> >
> > We use global counter which on write_synchronize must scan all
> > threads to make sure they have seen the update (there some
> > optimization to avoid it if there is no readers at all). Since this
> > table contains the threads, we get double penalized, for each new
> > thread the synchronization cost increase AND the number of items.
> >
> > With concurrent reads you still need many thousands of threads, but
> > I think I saw someone mentioning 100k threads, assuming concurrent
> > queries the resize can take hundreds of ms to finish. Note that reads
> > and inserts still in operate roughly at the same speed while
> > resizing. So a longer resize is only problematic if we do not
> > respect safepoints.
> I think if anything were capable of running 100K threads we would be
> hitting far worse scalability bottlenecks than this. But this does seem
> problematic.
>
> Thanks,
> David
> -----
>
> > Thanks, Robbin
> >
> >>
> >> David
> >>
> >>> /Robbin
> >>>
> >>> On 2019-10-02 08:46, David Holmes wrote:
> >>>> Hi Daniil,
> >>>>
> >>>> On 2/10/2019 4:13 pm, Daniil Titov wrote:
> >>>>> Please review a change that fixes the issue. The problem here is
> >>>>> that that the thread is added to the ThreadIdTable (introduced in
> >>>>> [3]) while the Threads_lock is held by
> >>>>> JVM_StartThread. When new thread is added to the thread table the
> >>>>> table checks if its load factor is greater than required and if so
> >>>>> it grows itself while polling for safepoints.
> >>>>> After changes [4] an attempt to block the thread while holding the
> >>>>> Threads_lock results in assertion in
> >>>>> Thread::check_possible_safepoint().
> >>>>>
> >>>>> The fix proposed by David Holmes ( thank you, David!) is to skip
> >>>>> the ThreadBlockInVM inside ThreadIdTable::grow() method if the
> >>>>> current thread owns the Threads_lock.
> >>>>
> >>>> Sorry but looking at the fix in context now I think it would be
> >>>> better to do this:
> >>>>
> >>>> while (gt.do_task(jt)) {
> >>>> if (Threads_lock->owner() == jt) {
> >>>> gt.pause(jt);
> >>>> ThreadBlockInVM tbivm(jt);
> >>>> gt.cont(jt);
> >>>> }
> >>>> }
> >>>>
> >>>> This way we don't waste time with the pause/cont when there's no
> >>>> safepoint pause going to happen - and the owner() check is quicker
> >>>> than owned_by_self(). That partially addresses a general concern I
> >>>> have about how long it may take to grow the table, as we are
> >>>> deferring safepoints until it is complete in this JVM_StartThread
> >>>> usecase.
> >>>>
> >>>> In the test you don't need all of:
> >>>>
> >>>> 32 * @run clean ThreadStartTest
> >>>> 33 * @run build ThreadStartTest
> >>>> 34 * @run main ThreadStartTest
> >>>>
> >>>> just the last @run suffices to build and run the test.
> >>>>
> >>>> Thanks,
> >>>> David
> >>>> -----
> >>>>
> >>>>> Testing : Mach 5 tier1 and tier2 completed successfully, tier3 is
> >>>>> in progress.
> >>>>>
> >>>>> [1] Webrev: http://cr.openjdk.java.net/~dtitov/8231666/webrev.01/
> >>>>> [2] Bug: https://bugs.openjdk.java.net/browse/JDK-8231666
> >>>>> [3] https://bugs.openjdk.java.net/browse/JDK-8185005
> >>>>> [4] https://bugs.openjdk.java.net/browse/JDK-8184732
> >>>>>
> >>>>> Best regards,
> >>>>> Danill
> >>>>>
> >>>>>
>
>
>
More information about the serviceability-dev
mailing list