RFR: 8231666: ThreadIdTable::grow() invokes invalid thread transition
Robbin Ehn
robbin.ehn at oracle.com
Mon Oct 7 07:34:52 UTC 2019
Hi Daniil,
Yes, good, but:
>> >> Testing: Mach5 tier1, tier2, and tier3 tests successfully passed.
>> And if you have not done so, you should test this with the benchmark you
>> have as
>> a stress test and see that this does what we think.
Can you please test it with your benchmark, if you have not done so?
/Robbin
>> Thanks, Robbin
>> >>
>> >> Thank you!
>> >>
>> >> Best regards,
>> >> Daniil
>> >>
>> >> On 10/2/19, 3:26 PM, "David Holmes" <david.holmes at oracle.com> wrote:
>> >>
>> >> Hi Daniil,
>> >> On 3/10/2019 2:21 am, Daniil Titov wrote:
>> >> > Hi David and Robbin,
>> >> >
>> >> > Could we consider making the ServiceThread responsible for the
>> >> ThreadIdTable resizing in the similar way how
>> >> > it works for StringTable and ResolvedMethodTable, rather than
>> having
>> >> ThreadIdTable::add() method calling ThreadIdTable::grow()?
>> >> > As I understand It should solve the current issue and
>> address the
>> >> concern that the doing the resizing could be a relatively long and
>> >> > doing it without polling for safepoints or while the holding
>> >> Threads_lock is not desirable.
>> >> I originally rejected copying that part of the code from the other
>> >> tables as it seems to introduce unnecessary complexity. Having a
>> >> separate thread trying to grow the table when it could be
>> concurrently
>> >> having threads added and removed seems like it could introduce
>> hard to
>> >> diagnose performance pathologies. It also adds what we know to be a
>> >> potentially long running action to the workload of the service
>> thread,
>> >> which means it may also impact the other tasks the service thread is
>> >> doing, thus potentially introducing even more hard to diagnose
>> >> performance pathologies.
>> >> So this change does concern me. But go ahead and trial it.
>> >> Thanks,
>> >> David
>> >> > Thank you,
>> >> > Daniil
>> >> >
>> >> >
>> >> > On 10/2/19, 6:25 AM, "David Holmes" <david.holmes at oracle.com>
>> wrote:
>> >> >
>> >> > Hi Robbin,
>> >> >
>> >> > On 2/10/2019 7:58 pm, Robbin Ehn wrote:
>> >> > > Hi David,
>> >> > >
>> >> > >> What if the table is full and must be grown?
>> >> > >
>> >> > > The table uses chaining, it just means load factor tip
>> over what is
>> >> > > considered a good backing array size.
>> >> >
>> >> > Coleen raised a good question in a separate discussion,
>> which made me
>> >> > realize that once the table has been initially populated all
>> >> subsequent
>> >> > additions, and hence all subsequent calls to grow() always
>> happen
>> >> with
>> >> > the Threads_lock held. So we can't just defer the grow().
>> >> >
>> >> > >> That aside, I'd like to know how expensive it is to
>> grow this
>> >> table.
>> >> > >> What are we talking about here?
>> >> > >
>> >> > > We use global counter which on write_synchronize must
>> scan all
>> >> > > threads to make sure they have seen the update (there some
>> >> > > optimization to avoid it if there is no readers at all).
>> Since this
>> >> > > table contains the threads, we get double penalized, for
>> each new
>> >> > > thread the synchronization cost increase AND the number
>> of items.
>> >> > >
>> >> > > With concurrent reads you still need many thousands of
>> threads, but
>> >> > > I think I saw someone mentioning 100k threads, assuming
>> concurrent
>> >> > > queries the resize can take hundreds of ms to finish.
>> Note that
>> >> reads
>> >> > > and inserts still in operate roughly at the same speed
>> while
>> >> > > resizing. So a longer resize is only problematic if we
>> do not
>> >> > > respect safepoints.
>> >> > I think if anything were capable of running 100K threads
>> we would be
>> >> > hitting far worse scalability bottlenecks than this. But
>> this does
>> >> seem
>> >> > problematic.
>> >> >
>> >> > Thanks,
>> >> > David
>> >> > -----
>> >> >
>> >> > > Thanks, Robbin
>> >> > >
>> >> > >>
>> >> > >> David
>> >> > >>
>> >> > >>> /Robbin
>> >> > >>>
>> >> > >>> On 2019-10-02 08:46, David Holmes wrote:
>> >> > >>>> Hi Daniil,
>> >> > >>>>
>> >> > >>>> On 2/10/2019 4:13 pm, Daniil Titov wrote:
>> >> > >>>>> Please review a change that fixes the issue. The
>> problem
>> >> here is
>> >> > >>>>> that that the thread is added to the ThreadIdTable
>> >> (introduced in
>> >> > >>>>> [3]) while the Threads_lock is held by
>> >> > >>>>> JVM_StartThread. When new thread is added to the
>> thread
>> >> table the
>> >> > >>>>> table checks if its load factor is greater than
>> required and
>> >> if so
>> >> > >>>>> it grows itself while polling for safepoints.
>> >> > >>>>> After changes [4] an attempt to block the thread while
>> >> holding the
>> >> > >>>>> Threads_lock results in assertion in
>> >> > >>>>> Thread::check_possible_safepoint().
>> >> > >>>>>
>> >> > >>>>> The fix proposed by David Holmes ( thank you,
>> David!) is
>> >> to skip
>> >> > >>>>> the ThreadBlockInVM inside ThreadIdTable::grow()
>> method if the
>> >> > >>>>> current thread owns the Threads_lock.
>> >> > >>>>
>> >> > >>>> Sorry but looking at the fix in context now I think
>> it would be
>> >> > >>>> better to do this:
>> >> > >>>>
>> >> > >>>> while (gt.do_task(jt)) {
>> >> > >>>> if (Threads_lock->owner() == jt) {
>> >> > >>>> gt.pause(jt);
>> >> > >>>> ThreadBlockInVM tbivm(jt);
>> >> > >>>> gt.cont(jt);
>> >> > >>>> }
>> >> > >>>> }
>> >> > >>>>
>> >> > >>>> This way we don't waste time with the pause/cont when
>> there's no
>> >> > >>>> safepoint pause going to happen - and the owner()
>> check is
>> >> quicker
>> >> > >>>> than owned_by_self(). That partially addresses a general
>> >> concern I
>> >> > >>>> have about how long it may take to grow the table, as
>> we are
>> >> > >>>> deferring safepoints until it is complete in this
>> >> JVM_StartThread
>> >> > >>>> usecase.
>> >> > >>>>
>> >> > >>>> In the test you don't need all of:
>> >> > >>>>
>> >> > >>>> 32 * @run clean ThreadStartTest
>> >> > >>>> 33 * @run build ThreadStartTest
>> >> > >>>> 34 * @run main ThreadStartTest
>> >> > >>>>
>> >> > >>>> just the last @run suffices to build and run the test.
>> >> > >>>>
>> >> > >>>> Thanks,
>> >> > >>>> David
>> >> > >>>> -----
>> >> > >>>>
>> >> > >>>>> Testing : Mach 5 tier1 and tier2 completed
>> successfully,
>> >> tier3 is
>> >> > >>>>> in progress.
>> >> > >>>>>
>> >> > >>>>> [1] Webrev:
>> >> http://cr.openjdk.java.net/~dtitov/8231666/webrev.01/
>> >> > >>>>> [2] Bug:
>> https://bugs.openjdk.java.net/browse/JDK-8231666
>> >> > >>>>> [3] https://bugs.openjdk.java.net/browse/JDK-8185005
>> >> > >>>>> [4] https://bugs.openjdk.java.net/browse/JDK-8184732
>> >> > >>>>>
>> >> > >>>>> Best regards,
>> >> > >>>>> Danill
>> >> > >>>>>
>> >> > >>>>>
>> >> >
>> >> >
>> >> >
>> >>
>> >>
>>
>>
More information about the serviceability-dev
mailing list