RFR: 8231666: ThreadIdTable::grow() invokes invalid thread transition
Daniil Titov
daniil.x.titov at oracle.com
Tue Oct 8 18:01:33 UTC 2019
Hi David and Robbin,
Thank you for reviewing this fix!
Best regards,
Daniil
On 10/8/19, 1:49 AM, "Robbin Ehn" <robbin.ehn at oracle.com> wrote:
Great, thanks!
/Robbin
On 2019-10-07 18:41, Daniil Titov wrote:
> Hi Robbin,
>
> Yes, I ran my benchmark [1]. Please see below the output showing that the table was grown.
>
> ../jdk/build/linux-x64-release/images/jdk/bin/java -cp . -Xlog:thread+table=info ThreadStartupTest
> Starting the test
> [0.185s][info][thread,table] Grown to size:512
> The test finished.
> Execution time:15673 ms
>
>
> [1] https://cr.openjdk.java.net/~dtitov/tests/ThreadStartupTest.java
>
> Thanks!
> Daniil
>
>
> On 10/7/19, 12:34 AM, "Robbin Ehn" <robbin.ehn at oracle.com> wrote:
>
> Hi Daniil,
>
> Yes, good, but:
>
> >> >> Testing: Mach5 tier1, tier2, and tier3 tests successfully passed.
> >> And if you have not done so, you should test this with the benchmark you
> >> have as
> >> a stress test and see that this does what we think.
>
> Can you please test it with your benchmark, if you have not done so?
>
> /Robbin
>
> >> Thanks, Robbin
> >> >>
> >> >> Thank you!
> >> >>
> >> >> Best regards,
> >> >> Daniil
> >> >>
> >> >> On 10/2/19, 3:26 PM, "David Holmes" <david.holmes at oracle.com> wrote:
> >> >>
> >> >> Hi Daniil,
> >> >> On 3/10/2019 2:21 am, Daniil Titov wrote:
> >> >> > Hi David and Robbin,
> >> >> >
> >> >> > Could we consider making the ServiceThread responsible for the
> >> >> ThreadIdTable resizing in the similar way how
> >> >> > it works for StringTable and ResolvedMethodTable, rather than
> >> having
> >> >> ThreadIdTable::add() method calling ThreadIdTable::grow()?
> >> >> > As I understand It should solve the current issue and
> >> address the
> >> >> concern that the doing the resizing could be a relatively long and
> >> >> > doing it without polling for safepoints or while the holding
> >> >> Threads_lock is not desirable.
> >> >> I originally rejected copying that part of the code from the other
> >> >> tables as it seems to introduce unnecessary complexity. Having a
> >> >> separate thread trying to grow the table when it could be
> >> concurrently
> >> >> having threads added and removed seems like it could introduce
> >> hard to
> >> >> diagnose performance pathologies. It also adds what we know to be a
> >> >> potentially long running action to the workload of the service
> >> thread,
> >> >> which means it may also impact the other tasks the service thread is
> >> >> doing, thus potentially introducing even more hard to diagnose
> >> >> performance pathologies.
> >> >> So this change does concern me. But go ahead and trial it.
> >> >> Thanks,
> >> >> David
> >> >> > Thank you,
> >> >> > Daniil
> >> >> >
> >> >> >
> >> >> > On 10/2/19, 6:25 AM, "David Holmes" <david.holmes at oracle.com>
> >> wrote:
> >> >> >
> >> >> > Hi Robbin,
> >> >> >
> >> >> > On 2/10/2019 7:58 pm, Robbin Ehn wrote:
> >> >> > > Hi David,
> >> >> > >
> >> >> > >> What if the table is full and must be grown?
> >> >> > >
> >> >> > > The table uses chaining, it just means load factor tip
> >> over what is
> >> >> > > considered a good backing array size.
> >> >> >
> >> >> > Coleen raised a good question in a separate discussion,
> >> which made me
> >> >> > realize that once the table has been initially populated all
> >> >> subsequent
> >> >> > additions, and hence all subsequent calls to grow() always
> >> happen
> >> >> with
> >> >> > the Threads_lock held. So we can't just defer the grow().
> >> >> >
> >> >> > >> That aside, I'd like to know how expensive it is to
> >> grow this
> >> >> table.
> >> >> > >> What are we talking about here?
> >> >> > >
> >> >> > > We use global counter which on write_synchronize must
> >> scan all
> >> >> > > threads to make sure they have seen the update (there some
> >> >> > > optimization to avoid it if there is no readers at all).
> >> Since this
> >> >> > > table contains the threads, we get double penalized, for
> >> each new
> >> >> > > thread the synchronization cost increase AND the number
> >> of items.
> >> >> > >
> >> >> > > With concurrent reads you still need many thousands of
> >> threads, but
> >> >> > > I think I saw someone mentioning 100k threads, assuming
> >> concurrent
> >> >> > > queries the resize can take hundreds of ms to finish.
> >> Note that
> >> >> reads
> >> >> > > and inserts still in operate roughly at the same speed
> >> while
> >> >> > > resizing. So a longer resize is only problematic if we
> >> do not
> >> >> > > respect safepoints.
> >> >> > I think if anything were capable of running 100K threads
> >> we would be
> >> >> > hitting far worse scalability bottlenecks than this. But
> >> this does
> >> >> seem
> >> >> > problematic.
> >> >> >
> >> >> > Thanks,
> >> >> > David
> >> >> > -----
> >> >> >
> >> >> > > Thanks, Robbin
> >> >> > >
> >> >> > >>
> >> >> > >> David
> >> >> > >>
> >> >> > >>> /Robbin
> >> >> > >>>
> >> >> > >>> On 2019-10-02 08:46, David Holmes wrote:
> >> >> > >>>> Hi Daniil,
> >> >> > >>>>
> >> >> > >>>> On 2/10/2019 4:13 pm, Daniil Titov wrote:
> >> >> > >>>>> Please review a change that fixes the issue. The
> >> problem
> >> >> here is
> >> >> > >>>>> that that the thread is added to the ThreadIdTable
> >> >> (introduced in
> >> >> > >>>>> [3]) while the Threads_lock is held by
> >> >> > >>>>> JVM_StartThread. When new thread is added to the
> >> thread
> >> >> table the
> >> >> > >>>>> table checks if its load factor is greater than
> >> required and
> >> >> if so
> >> >> > >>>>> it grows itself while polling for safepoints.
> >> >> > >>>>> After changes [4] an attempt to block the thread while
> >> >> holding the
> >> >> > >>>>> Threads_lock results in assertion in
> >> >> > >>>>> Thread::check_possible_safepoint().
> >> >> > >>>>>
> >> >> > >>>>> The fix proposed by David Holmes ( thank you,
> >> David!) is
> >> >> to skip
> >> >> > >>>>> the ThreadBlockInVM inside ThreadIdTable::grow()
> >> method if the
> >> >> > >>>>> current thread owns the Threads_lock.
> >> >> > >>>>
> >> >> > >>>> Sorry but looking at the fix in context now I think
> >> it would be
> >> >> > >>>> better to do this:
> >> >> > >>>>
> >> >> > >>>> while (gt.do_task(jt)) {
> >> >> > >>>> if (Threads_lock->owner() == jt) {
> >> >> > >>>> gt.pause(jt);
> >> >> > >>>> ThreadBlockInVM tbivm(jt);
> >> >> > >>>> gt.cont(jt);
> >> >> > >>>> }
> >> >> > >>>> }
> >> >> > >>>>
> >> >> > >>>> This way we don't waste time with the pause/cont when
> >> there's no
> >> >> > >>>> safepoint pause going to happen - and the owner()
> >> check is
> >> >> quicker
> >> >> > >>>> than owned_by_self(). That partially addresses a general
> >> >> concern I
> >> >> > >>>> have about how long it may take to grow the table, as
> >> we are
> >> >> > >>>> deferring safepoints until it is complete in this
> >> >> JVM_StartThread
> >> >> > >>>> usecase.
> >> >> > >>>>
> >> >> > >>>> In the test you don't need all of:
> >> >> > >>>>
> >> >> > >>>> 32 * @run clean ThreadStartTest
> >> >> > >>>> 33 * @run build ThreadStartTest
> >> >> > >>>> 34 * @run main ThreadStartTest
> >> >> > >>>>
> >> >> > >>>> just the last @run suffices to build and run the test.
> >> >> > >>>>
> >> >> > >>>> Thanks,
> >> >> > >>>> David
> >> >> > >>>> -----
> >> >> > >>>>
> >> >> > >>>>> Testing : Mach 5 tier1 and tier2 completed
> >> successfully,
> >> >> tier3 is
> >> >> > >>>>> in progress.
> >> >> > >>>>>
> >> >> > >>>>> [1] Webrev:
> >> >> http://cr.openjdk.java.net/~dtitov/8231666/webrev.01/
> >> >> > >>>>> [2] Bug:
> >> https://bugs.openjdk.java.net/browse/JDK-8231666
> >> >> > >>>>> [3] https://bugs.openjdk.java.net/browse/JDK-8185005
> >> >> > >>>>> [4] https://bugs.openjdk.java.net/browse/JDK-8184732
> >> >> > >>>>>
> >> >> > >>>>> Best regards,
> >> >> > >>>>> Danill
> >> >> > >>>>>
> >> >> > >>>>>
> >> >> >
> >> >> >
> >> >> >
> >> >>
> >> >>
> >>
> >>
>
>
>
More information about the serviceability-dev
mailing list