RFR: 8231666: ThreadIdTable::grow() invokes invalid thread transition
Robbin Ehn
robbin.ehn at oracle.com
Fri Oct 4 06:30:36 UTC 2019
Hi Daniil,
>
> You might also want to put the max size check in the check_concurrent_work code:
>
> + // Resize if we have more items than preferred load factor
> + if ( load_factor > PREF_AVG_LIST_LEN && !_local_table->is_max_size_reached()) {
>
> so that we don't keep waking up the service thread for nothing if the table gets
> full.
Yes, that would be good; otherwise this seems fine.
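The resize trigger under discussion can be modelled stand-alone. This is an illustrative sketch only, not HotSpot code: `PREF_AVG_LIST_LEN` and the max-size flag are taken from the quoted snippet, but the class, method names, and constant value here are invented.

```java
// Toy model of the resize check quoted above. The idea: only schedule a
// grow() when the average chain length exceeds the preferred load factor
// AND the table can still grow, so a full table stops waking the
// service thread for nothing.
public class ResizeCheck {
    static final double PREF_AVG_LIST_LEN = 2.0; // invented value for illustration

    static boolean shouldGrow(long items, long buckets, boolean maxSizeReached) {
        double loadFactor = (double) items / buckets;
        return loadFactor > PREF_AVG_LIST_LEN && !maxSizeReached;
    }

    public static void main(String[] args) {
        System.out.println(shouldGrow(100, 16, false)); // over-loaded, can grow
        System.out.println(shouldGrow(100, 16, true));  // over-loaded, but at max size
        System.out.println(shouldGrow(10, 16, false));  // under-loaded
    }
}
```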
>
> Thanks,
> David
> -----
>
>> Testing: Mach5 tier1, tier2, and tier3 tests successfully passed.
And if you have not done so, you should test this with the benchmark you have as
a stress test and see that this does what we think.
Thanks, Robbin
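A stress test in the spirit Robbin suggests could look like the sketch below. This is not Daniil's actual benchmark; it is an assumed minimal shape: start many threads, then resolve each one by id through `ThreadMXBean.getThreadInfo(long[])`, which exercises the VM's thread-id lookup path that the ThreadIdTable backs.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class ThreadIdStress {
    // Start `count` threads, resolve each by id via ThreadMXBean, and
    // return how many ids resolved while the threads are still alive.
    static int resolveAll(int count) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(1);
        Thread[] threads = new Thread[count];
        long[] ids = new long[count];
        for (int i = 0; i < count; i++) {
            threads[i] = new Thread(() -> {
                try { done.await(); } catch (InterruptedException e) { }
            });
            threads[i].start();
            ids[i] = threads[i].getId();
        }
        ThreadMXBean mbean = ManagementFactory.getThreadMXBean();
        int found = 0;
        for (ThreadInfo info : mbean.getThreadInfo(ids)) {
            if (info != null) found++;
        }
        done.countDown();                  // let the workers finish
        for (Thread t : threads) t.join();
        return found;
    }

    public static void main(String[] args) throws InterruptedException {
        int n = 200; // scale this up for a real stress run
        System.out.println("resolved " + resolveAll(n) + " of " + n);
    }
}
```

Scaling `n` into the tens of thousands while querying concurrently is what would put pressure on the table-growth path discussed in this thread.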
>>
>> Thank you!
>>
>> Best regards,
>> Daniil
>>
>> On 10/2/19, 3:26 PM, "David Holmes" <david.holmes at oracle.com> wrote:
>>
>> Hi Daniil,
>> On 3/10/2019 2:21 am, Daniil Titov wrote:
>> > Hi David and Robbin,
>> >
>> > Could we consider making the ServiceThread responsible for the
>> > ThreadIdTable resizing, in a similar way to how it works for
>> > StringTable and ResolvedMethodTable, rather than having the
>> > ThreadIdTable::add() method call ThreadIdTable::grow()?
>> > As I understand it, this should solve the current issue and address
>> > the concern that the resizing could be a relatively long operation,
>> > and that doing it without polling for safepoints, or while holding
>> > the Threads_lock, is not desirable.
>> I originally rejected copying that part of the code from the other
>> tables as it seems to introduce unnecessary complexity. Having a
>> separate thread trying to grow the table when it could be concurrently
>> having threads added and removed seems like it could introduce hard to
>> diagnose performance pathologies. It also adds what we know to be a
>> potentially long running action to the workload of the service thread,
>> which means it may also impact the other tasks the service thread is
>> doing, thus potentially introducing even more hard to diagnose
>> performance pathologies.
>> So this change does concern me. But go ahead and trial it.
>> Thanks,
>> David
>> > Thank you,
>> > Daniil
>> >
>> >
>> > On 10/2/19, 6:25 AM, "David Holmes" <david.holmes at oracle.com> wrote:
>> >
>> > Hi Robbin,
>> >
>> > On 2/10/2019 7:58 pm, Robbin Ehn wrote:
>> > > Hi David,
>> > >
>> > >> What if the table is full and must be grown?
>> > >
>> > > The table uses chaining; it just means the load factor tips over
>> > > what is considered a good backing array size.
>> >
>> > Coleen raised a good question in a separate discussion, which made me
>> > realize that once the table has been initially populated, all
>> > subsequent additions, and hence all subsequent calls to grow(),
>> > always happen with the Threads_lock held. So we can't just defer
>> > the grow().
>> >
>> > >> That aside, I'd like to know how expensive it is to grow this
>> table.
>> > >> What are we talking about here?
>> > >
>> > > We use a global counter which on write_synchronize must scan all
>> > > threads to make sure they have seen the update (there is some
>> > > optimization to avoid it if there are no readers at all). Since
>> > > this table contains the threads, we get double penalized: for each
>> > > new thread the synchronization cost increases AND the number of
>> > > items grows.
>> > >
>> > > With concurrent reads you still need many thousands of threads, but
>> > > I think I saw someone mentioning 100k threads; assuming concurrent
>> > > queries, the resize can take hundreds of ms to finish. Note that
>> > > reads and inserts still operate at roughly the same speed while
>> > > resizing. So a longer resize is only problematic if we do not
>> > > respect safepoints.
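The double penalty Robbin describes can be sketched with a toy cost model. The constants and formula here are invented for illustration, not measured HotSpot behavior: each synchronize pass scans all threads, and the number of items to migrate also grows with the thread count, so total resize work grows roughly quadratically.

```java
// Back-of-the-envelope model of the "double penalty": write_synchronize
// must scan every thread, and the table holds one item per thread, so
// both factors grow together.
public class ResizeCostModel {
    // one synchronize pass scans all n threads
    static long synchronizeCost(long nThreads) { return nThreads; }

    // a resize migrates ~nThreads entries in batches, with one
    // synchronize per batch (batch size is an invented parameter)
    static long resizeCost(long nThreads, long batchSize) {
        long batches = (nThreads + batchSize - 1) / batchSize;
        return nThreads /* item moves */ + batches * synchronizeCost(nThreads);
    }

    public static void main(String[] args) {
        long small = resizeCost(1_000, 64);
        long big = resizeCost(100_000, 64);
        // 100x more threads costs far more than 100x the work
        System.out.println(big / small);
    }
}
```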
>> > I think if anything were capable of running 100K threads we would be
>> > hitting far worse scalability bottlenecks than this. But this does
>> > seem problematic.
>> >
>> > Thanks,
>> > David
>> > -----
>> >
>> > > Thanks, Robbin
>> > >
>> > >>
>> > >> David
>> > >>
>> > >>> /Robbin
>> > >>>
>> > >>> On 2019-10-02 08:46, David Holmes wrote:
>> > >>>> Hi Daniil,
>> > >>>>
>> > >>>> On 2/10/2019 4:13 pm, Daniil Titov wrote:
>> > >>>>> Please review a change that fixes the issue. The problem here
>> > >>>>> is that the thread is added to the ThreadIdTable (introduced
>> > >>>>> in [3]) while the Threads_lock is held by JVM_StartThread.
>> > >>>>> When a new thread is added to the thread table, the table
>> > >>>>> checks if its load factor is greater than required and, if so,
>> > >>>>> it grows itself while polling for safepoints.
>> > >>>>> After changes [4], an attempt to block the thread while
>> > >>>>> holding the Threads_lock results in an assertion in
>> > >>>>> Thread::check_possible_safepoint().
>> > >>>>>
>> > >>>>> The fix proposed by David Holmes (thank you, David!) is to
>> > >>>>> skip the ThreadBlockInVM inside the ThreadIdTable::grow()
>> > >>>>> method if the current thread owns the Threads_lock.
>> > >>>>
>> > >>>> Sorry but looking at the fix in context now I think it would be
>> > >>>> better to do this:
>> > >>>>
>> > >>>> while (gt.do_task(jt)) {
>> > >>>>   if (Threads_lock->owner() != jt) {
>> > >>>>     gt.pause(jt);
>> > >>>>     ThreadBlockInVM tbivm(jt);
>> > >>>>     gt.cont(jt);
>> > >>>>   }
>> > >>>> }
>> > >>>>
>> > >>>> This way we don't waste time with the pause/cont when there's no
>> > >>>> safepoint pause going to happen - and the owner() check is
>> > >>>> quicker than owned_by_self(). That partially addresses a general
>> > >>>> concern I have about how long it may take to grow the table, as
>> > >>>> we are deferring safepoints until it is complete in this
>> > >>>> JVM_StartThread usecase.
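The pattern of only blocking for a safepoint when the current thread does not hold the critical lock has a rough Java-level analogue. This is illustrative only, not HotSpot code: the class, the lock, and `Thread.yield()` standing in for `ThreadBlockInVM` are all assumptions.

```java
import java.util.concurrent.locks.ReentrantLock;

// Java analogue of the chunked-grow loop above: during a long chunked
// operation, only pause at a "safepoint" checkpoint when the current
// thread does NOT hold the critical lock.
public class ChunkedGrow {
    static final ReentrantLock tableLock = new ReentrantLock();

    // Returns how many times the loop paused for a "safepoint".
    static int grow(int chunks) {
        int yields = 0;
        for (int i = 0; i < chunks; i++) {
            // ... move one chunk of buckets here ...
            if (!tableLock.isHeldByCurrentThread()) {
                Thread.yield();   // stand-in for ThreadBlockInVM
                yields++;
            }
        }
        return yields;
    }

    public static void main(String[] args) {
        System.out.println(grow(4));      // lock not held: pauses every chunk
        tableLock.lock();
        try {
            System.out.println(grow(4));  // lock held: never pauses
        } finally {
            tableLock.unlock();
        }
    }
}
```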
>> > >>>>
>> > >>>> In the test you don't need all of:
>> > >>>>
>> > >>>> * @run clean ThreadStartTest
>> > >>>> * @run build ThreadStartTest
>> > >>>> * @run main ThreadStartTest
>> > >>>>
>> > >>>> just the last @run suffices to build and run the test.
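With only the final line kept, the trimmed jtreg test header might look like the sketch below. The `@bug` and `@summary` lines are assumed for illustration, not taken from the actual webrev.

```java
/*
 * @test
 * @bug 8231666
 * @summary Starting a thread must not trigger an invalid thread transition
 * @run main ThreadStartTest
 */
```

A bare `@run main` both builds and runs the test class, so the explicit `clean` and `build` actions are redundant here.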
>> > >>>>
>> > >>>> Thanks,
>> > >>>> David
>> > >>>> -----
>> > >>>>
>> > >>>>> Testing: Mach5 tier1 and tier2 completed successfully, tier3
>> > >>>>> is in progress.
>> > >>>>>
>> > >>>>> [1] Webrev: http://cr.openjdk.java.net/~dtitov/8231666/webrev.01/
>> > >>>>> [2] Bug: https://bugs.openjdk.java.net/browse/JDK-8231666
>> > >>>>> [3] https://bugs.openjdk.java.net/browse/JDK-8185005
>> > >>>>> [4] https://bugs.openjdk.java.net/browse/JDK-8184732
>> > >>>>>
>> > >>>>> Best regards,
>> > >>>>> Daniil
>> > >>>>>
>> > >>>>>
>> >
>> >
>> >
>>
>>