RFR: 8231666: ThreadIdTable::grow() invokes invalid thread transition
David Holmes
david.holmes at oracle.com
Sat Oct 5 03:58:06 UTC 2019
Hi Daniil,
On 5/10/2019 1:23 pm, Daniil Titov wrote:
> Hi David and Robbin,
>
> Please review a new version of the fix that adds the max size check to the check_concurrent_work code [1].
That change seems fine.
>> I don't think you need to repeat the load factor check here:
>>
>> void ThreadIdTable::do_concurrent_work(JavaThread* jt) {
>>   assert(_is_initialized, "Thread table is not initialized");
>>   _has_work = false;
>>   double load_factor = get_load_factor();
>>   log_debug(thread, table)("Concurrent work, load factor: %g", load_factor);
>>   if (load_factor > PREF_AVG_LIST_LEN &&
>>       !_local_table->is_max_size_reached()) {
>>     grow(jt);
>>   }
>> }
>>
>> as we will only execute this code if the load factor was seen to be too
>> high.
>
> I decided to leave it unchanged since, in my understanding, some threads could exit and
> be removed from the table after the work was triggered but before the service thread called the
> do_concurrent_work() method. In that case the load factor might be back to normal, and there would be
> no need to increase the size of the thread table.
True, but if new threads get added again you could just repeat the
process. This is a more stable process if you use an "edge trigger"
rather than a "level trigger". But either way we are making assumptions
about the pattern of adding and removing threads. So okay to leave as-is.
So, good to go.
Thanks,
David
> Testing: Mach5 tier1, tier2, and tier3 tests passed.
>
> [1] Webrev: http://cr.openjdk.java.net/~dtitov/8231666/webrev.03/
> [2] Bug: https://bugs.openjdk.java.net/browse/JDK-8231666
>
> Thank you,
> Daniil
>
> On 10/3/19, 11:30 PM, "Robbin Ehn" <robbin.ehn at oracle.com> wrote:
>
> Hi Daniil,
>
> >
> > You might also want to put the max size check in the check_concurrent_work code:
> >
> > + // Resize if we have more items than preferred load factor
> > + if ( load_factor > PREF_AVG_LIST_LEN && !_local_table->is_max_size_reached()) {
> >
> > so that we don't keep waking up the service thread for nothing if the table gets
> > full.
>
> Yes, that would be good; otherwise seems fine.
>
> >
> > Thanks,
> > David
> > -----
> >
> >> Testing: Mach5 tier1, tier2, and tier3 tests successfully passed.
>
> And if you have not done so, you should test this with the benchmark you have as
> a stress test and see that this does what we think.
>
> Thanks, Robbin
>
>
> >>
> >> Thank you!
> >>
> >> Best regards,
> >> Daniil
> >>
> >> On 10/2/19, 3:26 PM, "David Holmes" <david.holmes at oracle.com> wrote:
> >>
> >> Hi Daniil,
> >> On 3/10/2019 2:21 am, Daniil Titov wrote:
> >> > Hi David and Robbin,
> >> >
> >> > Could we consider making the ServiceThread responsible for the
> >> ThreadIdTable resizing in the similar way how
> >> > it works for StringTable and ResolvedMethodTable, rather than having
> >> ThreadIdTable::add() method calling ThreadIdTable::grow()?
> >> > As I understand it, this should solve the current issue and address the
> >> > concern that doing the resizing could be a relatively long operation, and
> >> > that doing it without polling for safepoints, or while holding the
> >> > Threads_lock, is not desirable.
> >> I originally rejected copying that part of the code from the other
> >> tables as it seems to introduce unnecessary complexity. Having a
> >> separate thread trying to grow the table when it could be concurrently
> >> having threads added and removed seems like it could introduce hard to
> >> diagnose performance pathologies. It also adds what we know to be a
> >> potentially long running action to the workload of the service thread,
> >> which means it may also impact the other tasks the service thread is
> >> doing, thus potentially introducing even more hard to diagnose
> >> performance pathologies.
> >> So this change does concern me. But go ahead and trial it.
> >> Thanks,
> >> David
> >> > Thank you,
> >> > Daniil
> >> >
> >> >
> >> > On 10/2/19, 6:25 AM, "David Holmes" <david.holmes at oracle.com> wrote:
> >> >
> >> > Hi Robbin,
> >> >
> >> > On 2/10/2019 7:58 pm, Robbin Ehn wrote:
> >> > > Hi David,
> >> > >
> >> > >> What if the table is full and must be grown?
> >> > >
> >> > > The table uses chaining, so it just means the load factor tips over what is
> >> > > considered a good backing array size.
> >> >
> >> > Coleen raised a good question in a separate discussion, which made me
> >> > realize that once the table has been initially populated all
> >> subsequent
> >> > additions, and hence all subsequent calls to grow() always happen
> >> with
> >> > the Threads_lock held. So we can't just defer the grow().
> >> >
> >> > >> That aside, I'd like to know how expensive it is to grow this
> >> table.
> >> > >> What are we talking about here?
> >> > >
> >> > > We use global counter which on write_synchronize must scan all
> >> > > threads to make sure they have seen the update (there some
> >> > > optimization to avoid it if there is no readers at all). Since this
> >> > > table contains the threads, we get double penalized, for each new
> >> > > thread the synchronization cost increase AND the number of items.
> >> > >
> >> > > With concurrent reads you still need many thousands of threads, but
> >> > > I think I saw someone mentioning 100k threads; assuming concurrent
> >> > > queries, the resize can take hundreds of ms to finish. Note that reads
> >> > > and inserts still operate at roughly the same speed while
> >> > > resizing. So a longer resize is only problematic if we do not
> >> > > respect safepoints.
> >> > I think if anything were capable of running 100K threads we would be
> >> > hitting far worse scalability bottlenecks than this. But this does seem
> >> > problematic.
> >> >
> >> > Thanks,
> >> > David
> >> > -----
> >> >
> >> > > Thanks, Robbin
> >> > >
> >> > >>
> >> > >> David
> >> > >>
> >> > >>> /Robbin
> >> > >>>
> >> > >>> On 2019-10-02 08:46, David Holmes wrote:
> >> > >>>> Hi Daniil,
> >> > >>>>
> >> > >>>> On 2/10/2019 4:13 pm, Daniil Titov wrote:
> >> > >>>>> Please review a change that fixes the issue. The problem here is
> >> > >>>>> that the thread is added to the ThreadIdTable (introduced in [3])
> >> > >>>>> while the Threads_lock is held by JVM_StartThread. When a new
> >> > >>>>> thread is added to the thread table, the table checks whether its
> >> > >>>>> load factor is greater than required and, if so, grows itself
> >> > >>>>> while polling for safepoints.
> >> > >>>>> After changes [4], an attempt to block the thread while holding
> >> > >>>>> the Threads_lock results in an assertion in
> >> > >>>>> Thread::check_possible_safepoint().
> >> > >>>>>
> >> > >>>>> The fix proposed by David Holmes ( thank you, David!) is
> >> to skip
> >> > >>>>> the ThreadBlockInVM inside ThreadIdTable::grow() method if the
> >> > >>>>> current thread owns the Threads_lock.
> >> > >>>>
> >> > >>>> Sorry but looking at the fix in context now I think it would be
> >> > >>>> better to do this:
> >> > >>>>
> >> > >>>> while (gt.do_task(jt)) {
> >> > >>>>   if (Threads_lock->owner() != jt) {
> >> > >>>>     gt.pause(jt);
> >> > >>>>     ThreadBlockInVM tbivm(jt);
> >> > >>>>     gt.cont(jt);
> >> > >>>>   }
> >> > >>>> }
> >> > >>>>
> >> > >>>> This way we don't waste time with the pause/cont when there's no
> >> > >>>> safepoint pause going to happen - and the owner() check is
> >> quicker
> >> > >>>> than owned_by_self(). That partially addresses a general
> >> concern I
> >> > >>>> have about how long it may take to grow the table, as we are
> >> > >>>> deferring safepoints until it is complete in this
> >> JVM_StartThread
> >> > >>>> usecase.
> >> > >>>>
> >> > >>>> In the test you don't need all of:
> >> > >>>>
> >> > >>>> 32 * @run clean ThreadStartTest
> >> > >>>> 33 * @run build ThreadStartTest
> >> > >>>> 34 * @run main ThreadStartTest
> >> > >>>>
> >> > >>>> just the last @run suffices to build and run the test.
> >> > >>>>
> >> > >>>> Thanks,
> >> > >>>> David
> >> > >>>> -----
> >> > >>>>
> >> > >>>>> Testing: Mach5 tier1 and tier2 completed successfully, tier3 is
> >> > >>>>> in progress.
> >> > >>>>>
> >> > >>>>> [1] Webrev:
> >> http://cr.openjdk.java.net/~dtitov/8231666/webrev.01/
> >> > >>>>> [2] Bug: https://bugs.openjdk.java.net/browse/JDK-8231666
> >> > >>>>> [3] https://bugs.openjdk.java.net/browse/JDK-8185005
> >> > >>>>> [4] https://bugs.openjdk.java.net/browse/JDK-8184732
> >> > >>>>>
> >> > >>>>> Best regards,
> >> > >>>>> Danill
> >> > >>>>>
> >> > >>>>>
> >> >
> >> >
> >> >
> >>
> >>
>
>
>
More information about the serviceability-dev
mailing list