From daniil.x.titov at oracle.com  Tue Oct  1 20:57:01 2019
From: daniil.x.titov at oracle.com (Daniil Titov)
Date: Tue, 01 Oct 2019 13:57:01 -0700
Subject: jmx-dev RFR: 8170299: Debugger does not stop inside the low
 memory notifications code
In-Reply-To: <194bd23a-0f16-19a9-a3e7-d02fa6d58369@oracle.com>
References: <75B5F778-DC49-494B-AC12-270F301677CA@oracle.com>
 <60639d41-735a-00d3-c9db-1955f581b89a@oracle.com>
 <E2BFE392-EF77-4072-9D7C-65C088D4F007@oracle.com>
 <9783ca89-0af8-2167-436a-e5ff2db631a3@oracle.com>
 <9a805686-57bf-c158-a777-c3cb7e38f09f@oracle.com>
 <194bd23a-0f16-19a9-a3e7-d02fa6d58369@oracle.com>
Message-ID: <4B9AB4C0-EA29-4D0E-9010-3A635454FE30@oracle.com>

Hello,

Please review a new version of the change [1] that fixes the problem [2] with the debugger not stopping in the low memory notification code. The fix moves the task of sending notifications from
the hidden ServiceThread to a new, visible NotificationThread. This version of the change also introduces a new VM option to opt out of the new behavior.

Previous email threads:
https://mail.openjdk.java.net/pipermail/serviceability-dev/2019-August/028863.html 
https://mail.openjdk.java.net/pipermail/serviceability-dev/2019-July/028608.html 

The proposed CSR [3] adds a new VM option, UseNotificationThread (default true), for opting out of the new behavior.

Testing: Mach5 tier1, tier2, tier3, and tier7 tests passed.

[1] Webrev:  http://cr.openjdk.java.net/~dtitov/8170299/webrev.05/
[2] Bug: https://bugs.openjdk.java.net/browse/JDK-8170299 
[3] CSR: https://bugs.openjdk.java.net/browse/JDK-8231593 

Thanks,
Daniil



From david.holmes at oracle.com  Wed Oct  2 03:20:22 2019
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 2 Oct 2019 13:20:22 +1000
Subject: jmx-dev RFR: 8170299: Debugger does not stop inside the low
 memory notifications code
In-Reply-To: <4B9AB4C0-EA29-4D0E-9010-3A635454FE30@oracle.com>
References: <75B5F778-DC49-494B-AC12-270F301677CA@oracle.com>
 <60639d41-735a-00d3-c9db-1955f581b89a@oracle.com>
 <E2BFE392-EF77-4072-9D7C-65C088D4F007@oracle.com>
 <9783ca89-0af8-2167-436a-e5ff2db631a3@oracle.com>
 <9a805686-57bf-c158-a777-c3cb7e38f09f@oracle.com>
 <194bd23a-0f16-19a9-a3e7-d02fa6d58369@oracle.com>
 <4B9AB4C0-EA29-4D0E-9010-3A635454FE30@oracle.com>
Message-ID: <c3ea02e1-bd4c-838b-c3c4-fef718319090@oracle.com>

Hi Daniil,

Thanks again for your perseverance with this one.

This looks fine to me.

Thanks,
David
-----

On 2/10/2019 6:57 am, Daniil Titov wrote:
> Hello,
> 
> Please review a new version of the change [1]  that fixes the problem with the  debugger not stopping in the low memory notification code. The fix moves the send notifications task from
> not visible ServiceThread to a new visible NotificationThread. This version of the  change also introduces  a new VM option to opt-out from the new behavior.
> 
> Previous email threads:
> https://mail.openjdk.java.net/pipermail/serviceability-dev/2019-August/028863.html
> https://mail.openjdk.java.net/pipermail/serviceability-dev/2019-July/028608.html
> 
> The proposed CSR [3] is for adding  a new VM option UseNotificationThread  (default true) to opt-out from the new behavior.
> 
> Testing: Mach5 tests tier1, tier2, tier3, and tier7 successfully passed.
> 
> [1] Webrev:  http://cr.openjdk.java.net/~dtitov/8170299/webrev.05/
> [2] Bug: https://bugs.openjdk.java.net/browse/JDK-8170299
> [3] CSR: https://bugs.openjdk.java.net/browse/JDK-8231593
> 
> Thanks,
> Daniil
> 
> 
> 

From daniil.x.titov at oracle.com  Wed Oct  2 06:13:52 2019
From: daniil.x.titov at oracle.com (Daniil Titov)
Date: Tue, 01 Oct 2019 23:13:52 -0700
Subject: jmx-dev RFR: 8231666: ThreadIdTable::grow() invokes invalid thread
	transition
Message-ID: <140927E0-55D7-490E-94BC-F294A8EE0901@oracle.com>

Please review a change that fixes the issue [2]. The problem here is that the thread is added to the ThreadIdTable (introduced in [3]) while the Threads_lock is held by
JVM_StartThread. When a new thread is added to the thread table, the table checks whether its load factor is greater than required and, if so, grows itself while polling for safepoints.
After changes [4], an attempt to block the thread while holding the Threads_lock results in an assertion failure in Thread::check_possible_safepoint().

The fix proposed by David Holmes (thank you, David!) is to skip the ThreadBlockInVM inside the ThreadIdTable::grow() method if the current thread owns the Threads_lock.

Testing: Mach5 tier1 and tier2 completed successfully; tier3 is in progress.

[1] Webrev: http://cr.openjdk.java.net/~dtitov/8231666/webrev.01/ 
[2] Bug: https://bugs.openjdk.java.net/browse/JDK-8231666 
[3] https://bugs.openjdk.java.net/browse/JDK-8185005 
[4] https://bugs.openjdk.java.net/browse/JDK-8184732 

Best regards,
Daniil


From david.holmes at oracle.com  Wed Oct  2 06:46:00 2019
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 2 Oct 2019 16:46:00 +1000
Subject: jmx-dev RFR: 8231666: ThreadIdTable::grow() invokes invalid
 thread transition
In-Reply-To: <140927E0-55D7-490E-94BC-F294A8EE0901@oracle.com>
References: <140927E0-55D7-490E-94BC-F294A8EE0901@oracle.com>
Message-ID: <176b5148-77fd-4793-c4eb-17b137a36f8e@oracle.com>

Hi Daniil,

On 2/10/2019 4:13 pm, Daniil Titov wrote:
> Please review a change that fixes the issue. The problem here is that the thread is added to the ThreadIdTable (introduced in [3]) while the Threads_lock is held by
> JVM_StartThread. When new thread is added  to the thread table the table checks if its load factor is greater than required and if so it grows itself while polling for safepoints.
> After changes [4]  an attempt to block the thread while holding the Threads_lock  results in assertion in Thread::check_possible_safepoint().
> 
> The fix  proposed by David Holmes ( thank you, David!)  is to skip the ThreadBlockInVM inside ThreadIdTable::grow() method if the current thread owns the Threads_lock.

Sorry but looking at the fix in context now I think it would be better 
to do this:

     while (gt.do_task(jt)) {
       if (Threads_lock->owner() != jt) {
         gt.pause(jt);
         ThreadBlockInVM tbivm(jt);
         gt.cont(jt);
       }
     }

This way we don't waste time with the pause/cont when no 
safepoint pause is going to happen - and the owner() check is quicker than 
owned_by_self(). That partially addresses a general concern I have about 
how long it may take to grow the table, as we are deferring safepoints 
until it is complete in this JVM_StartThread use case.
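
For illustration only, here is a minimal stand-alone sketch of that loop,
assuming mock stand-ins for the HotSpot types. Thread, Lock and GrowTask
below are hypothetical and are not the real classes; the point is only that
the pause/block/continue sequence is skipped when the calling thread owns
the lock.

```cpp
#include <cassert>

// Hypothetical stand-ins for HotSpot types; none of these are the real classes.
struct Thread { int id; };

struct Lock {
  const Thread* owner_ = nullptr;            // thread currently holding the lock
  const Thread* owner() const { return owner_; }
};

// Mock of a resumable grow task: do_task() returns true while work remains.
struct GrowTask {
  int steps_left;
  int pauses = 0;                            // how often we yielded to a safepoint
  explicit GrowTask(int steps) : steps_left(steps) {}
  bool do_task(Thread*) { return --steps_left > 0; }
  void pause(Thread*) { ++pauses; }
  void cont(Thread*) {}
};

// Runs the task to completion and returns the number of pauses taken.
int grow(GrowTask& gt, const Lock& threads_lock, Thread* self) {
  while (gt.do_task(self)) {
    if (threads_lock.owner() != self) {      // only block when the lock is not ours
      gt.pause(self);
      // The real code would construct a ThreadBlockInVM here.
      gt.cont(self);
    }
  }
  return gt.pauses;
}
```

With a four-step task, a caller that does not own the lock pauses after each
unfinished step, while a caller that owns it runs straight through.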

In the test you don't need all of:

   32  * @run clean ThreadStartTest
   33  * @run build ThreadStartTest
   34  * @run main ThreadStartTest

just the last @run suffices to build and run the test.

Thanks,
David
-----

> Testing : Mach 5 tier1 and tier2 completed successfully, tier3 is in progress.
> 
> [1] Webrev: http://cr.openjdk.java.net/~dtitov/8231666/webrev.01/
> [2] Bug: https://bugs.openjdk.java.net/browse/JDK-8231666
> [3] https://bugs.openjdk.java.net/browse/JDK-8185005
> [4] https://bugs.openjdk.java.net/browse/JDK-8184732
> 
> Best regards,
> Danill
> 
> 

From robbin.ehn at oracle.com  Wed Oct  2 09:15:25 2019
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Wed, 2 Oct 2019 11:15:25 +0200
Subject: jmx-dev RFR: 8231666: ThreadIdTable::grow() invokes invalid
 thread transition
In-Reply-To: <176b5148-77fd-4793-c4eb-17b137a36f8e@oracle.com>
References: <140927E0-55D7-490E-94BC-F294A8EE0901@oracle.com>
 <176b5148-77fd-4793-c4eb-17b137a36f8e@oracle.com>
Message-ID: <d2af2637-a38a-24d1-9ff9-a6444b0f1300@oracle.com>

Hi, since holding the Threads_lock while growing can block out safepoints for a
longer period, I would suggest simply skipping the grow when holding the Threads_lock.
E.g. return before creating the GrowTask.

/Robbin

On 2019-10-02 08:46, David Holmes wrote:
> Hi Daniil,
> 
> On 2/10/2019 4:13 pm, Daniil Titov wrote:
>> Please review a change that fixes the issue. The problem here is that the
>> thread is added to the ThreadIdTable (introduced in [3]) while the
>> Threads_lock is held by
>> JVM_StartThread. When new thread is added to the thread table the table
>> checks if its load factor is greater than required and if so it grows itself
>> while polling for safepoints.
>> After changes [4] an attempt to block the thread while holding the
>> Threads_lock results in assertion in Thread::check_possible_safepoint().
>>
>> The fix proposed by David Holmes ( thank you, David!) is to skip the
>> ThreadBlockInVM inside ThreadIdTable::grow() method if the current thread owns
>> the Threads_lock.
> 
> Sorry but looking at the fix in context now I think it would be better to do this:
> 
>      while (gt.do_task(jt)) {
>        if (Threads_lock->owner() != jt) {
>          gt.pause(jt);
>          ThreadBlockInVM tbivm(jt);
>          gt.cont(jt);
>        }
>      }
> 
> This way we don't waste time with the pause/cont when there's no safepoint pause 
> going to happen - and the owner() check is quicker than owned_by_self(). That 
> partially addresses a general concern I have about how long it may take to grow 
> the table, as we are deferring safepoints until it is complete in this 
> JVM_StartThread usecase.
> 
> In the test you don't need all of:
> 
>    32  * @run clean ThreadStartTest
>    33  * @run build ThreadStartTest
>    34  * @run main ThreadStartTest
> 
> just the last @run suffices to build and run the test.
> 
> Thanks,
> David
> -----
> 
>> Testing : Mach 5 tier1 and tier2 completed successfully, tier3 is in progress.
>>
>> [1] Webrev: http://cr.openjdk.java.net/~dtitov/8231666/webrev.01/
>> [2] Bug: https://bugs.openjdk.java.net/browse/JDK-8231666
>> [3] https://bugs.openjdk.java.net/browse/JDK-8185005
>> [4] https://bugs.openjdk.java.net/browse/JDK-8184732
>>
>> Best regards,
>> Danill
>>
>>

From david.holmes at oracle.com  Wed Oct  2 09:30:18 2019
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 2 Oct 2019 19:30:18 +1000
Subject: jmx-dev RFR: 8231666: ThreadIdTable::grow() invokes invalid
 thread transition
In-Reply-To: <d2af2637-a38a-24d1-9ff9-a6444b0f1300@oracle.com>
References: <140927E0-55D7-490E-94BC-F294A8EE0901@oracle.com>
 <176b5148-77fd-4793-c4eb-17b137a36f8e@oracle.com>
 <d2af2637-a38a-24d1-9ff9-a6444b0f1300@oracle.com>
Message-ID: <74c89be0-8d0c-3c5d-0038-b16f3a59de06@oracle.com>

On 2/10/2019 7:15 pm, Robbin Ehn wrote:
> Hi, since holding the Threads_lock while growing can block out 
> safepoints for a
> longer period, I would suggest just skip growing when holding Threads_lock.
> E.g. return before creating the GrowTask.

What if the table is full and must be grown?

That aside, I'd like to know how expensive it is to grow this table. 
What are we talking about here?

David

> /Robbin
> 
> On 2019-10-02 08:46, David Holmes wrote:
>> Hi Daniil,
>>
>> On 2/10/2019 4:13 pm, Daniil Titov wrote:
>>> Please review a change that fixes the issue. The problem here is
>>> that the thread is added to the ThreadIdTable (introduced in [3])
>>> while the Threads_lock is held by
>>> JVM_StartThread. When new thread is added to the thread table the
>>> table checks if its load factor is greater than required and if so it
>>> grows itself while polling for safepoints.
>>> After changes [4] an attempt to block the thread while holding the
>>> Threads_lock results in assertion in
>>> Thread::check_possible_safepoint().
>>>
>>> The fix proposed by David Holmes ( thank you, David!) is to skip
>>> the ThreadBlockInVM inside ThreadIdTable::grow() method if the
>>> current thread owns the Threads_lock.
>>
>> Sorry but looking at the fix in context now I think it would be better 
>> to do this:
>>
>>      while (gt.do_task(jt)) {
>>        if (Threads_lock->owner() != jt) {
>>          gt.pause(jt);
>>          ThreadBlockInVM tbivm(jt);
>>          gt.cont(jt);
>>        }
>>      }
>>
>> This way we don't waste time with the pause/cont when there's no 
>> safepoint pause going to happen - and the owner() check is quicker 
>> than owned_by_self(). That partially addresses a general concern I 
>> have about how long it may take to grow the table, as we are deferring 
>> safepoints until it is complete in this JVM_StartThread usecase.
>>
>> In the test you don't need all of:
>>
>>    32  * @run clean ThreadStartTest
>>    33  * @run build ThreadStartTest
>>    34  * @run main ThreadStartTest
>>
>> just the last @run suffices to build and run the test.
>>
>> Thanks,
>> David
>> -----
>>
>>> Testing : Mach 5 tier1 and tier2 completed successfully, tier3 is in 
>>> progress.
>>>
>>> [1] Webrev: http://cr.openjdk.java.net/~dtitov/8231666/webrev.01/
>>> [2] Bug: https://bugs.openjdk.java.net/browse/JDK-8231666
>>> [3] https://bugs.openjdk.java.net/browse/JDK-8185005
>>> [4] https://bugs.openjdk.java.net/browse/JDK-8184732
>>>
>>> Best regards,
>>> Danill
>>>
>>>

From robbin.ehn at oracle.com  Wed Oct  2 09:58:15 2019
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Wed, 2 Oct 2019 11:58:15 +0200
Subject: jmx-dev RFR: 8231666: ThreadIdTable::grow() invokes invalid
 thread transition
In-Reply-To: <74c89be0-8d0c-3c5d-0038-b16f3a59de06@oracle.com>
References: <140927E0-55D7-490E-94BC-F294A8EE0901@oracle.com>
 <176b5148-77fd-4793-c4eb-17b137a36f8e@oracle.com>
 <d2af2637-a38a-24d1-9ff9-a6444b0f1300@oracle.com>
 <74c89be0-8d0c-3c5d-0038-b16f3a59de06@oracle.com>
Message-ID: <fa1ac8de-1004-a1a8-83b0-92e1c8e6d9d0@oracle.com>

Hi David,

> What if the table is full and must be grown?

The table uses chaining; exceeding the load factor just means the table has
tipped past what is considered a good backing array size.
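
As a rough sketch of why a chained table is never "full": inserts simply
lengthen the per-bucket chains, so passing the load-factor threshold only
degrades lookup time. The ChainedTable class below is purely illustrative
(it is not the HotSpot ConcurrentHashTable, and it assumes non-negative
thread ids):

```cpp
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <utility>
#include <vector>

// Illustrative chained hash table keyed by thread id (assumed non-negative).
class ChainedTable {
  std::vector<std::forward_list<std::pair<long, int>>> buckets_;
  std::size_t size_ = 0;

  std::size_t index(long tid) const {
    return static_cast<std::size_t>(tid) % buckets_.size();
  }

 public:
  explicit ChainedTable(std::size_t n) : buckets_(n) {}

  // Always succeeds: a new entry is linked onto the front of its bucket's chain.
  void add(long tid, int value) {
    buckets_[index(tid)].emplace_front(tid, value);
    ++size_;
  }

  // Walks one chain; returns nullptr when the id is absent.
  const int* find(long tid) const {
    for (const auto& e : buckets_[index(tid)]) {
      if (e.first == tid) return &e.second;
    }
    return nullptr;
  }

  // Average chain length; values above the chosen threshold only mean slower lookups.
  double load_factor() const {
    return static_cast<double>(size_) / static_cast<double>(buckets_.size());
  }
};
```

Adding ten entries to a two-bucket table gives a load factor of 5, yet every
entry can still be inserted and found.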

> 
> That aside, I'd like to know how expensive it is to grow this table. What are we 
> talking about here?

We use a global counter, which on write_synchronize must scan all threads to make
sure they have seen the update (there is some optimization to avoid it if there are
no readers at all). Since this table contains the threads, we get penalized
twice: with each new thread, both the synchronization cost AND the number
of items increase.

With concurrent reads you still need many thousands of threads, but I think I
saw someone mentioning 100k threads; assuming concurrent queries, the resize can
take hundreds of ms to finish.
Note that reads and inserts still operate at roughly the same speed while
resizing. So a longer resize is only problematic if we do not respect
safepoints.

Thanks, Robbin

> 
> David
> 
>> /Robbin
>>
>> On 2019-10-02 08:46, David Holmes wrote:
>>> Hi Daniil,
>>>
>>> On 2/10/2019 4:13 pm, Daniil Titov wrote:
>>>> Please review a change that fixes the issue. The problem here is that
>>>> the thread is added to the ThreadIdTable (introduced in [3]) while the
>>>> Threads_lock is held by
>>>> JVM_StartThread. When new thread is added to the thread table the table
>>>> checks if its load factor is greater than required and if so it grows itself
>>>> while polling for safepoints.
>>>> After changes [4] an attempt to block the thread while holding the
>>>> Threads_lock results in assertion in Thread::check_possible_safepoint().
>>>>
>>>> The fix proposed by David Holmes ( thank you, David!) is to skip the
>>>> ThreadBlockInVM inside ThreadIdTable::grow() method if the current thread
>>>> owns the Threads_lock.
>>>
>>> Sorry but looking at the fix in context now I think it would be better to do 
>>> this:
>>>
>>>      while (gt.do_task(jt)) {
>>>        if (Threads_lock->owner() != jt) {
>>>          gt.pause(jt);
>>>          ThreadBlockInVM tbivm(jt);
>>>          gt.cont(jt);
>>>        }
>>>      }
>>>
>>> This way we don't waste time with the pause/cont when there's no safepoint 
>>> pause going to happen - and the owner() check is quicker than 
>>> owned_by_self(). That partially addresses a general concern I have about how 
>>> long it may take to grow the table, as we are deferring safepoints until it 
>>> is complete in this JVM_StartThread usecase.
>>>
>>> In the test you don't need all of:
>>>
>>>    32  * @run clean ThreadStartTest
>>>    33  * @run build ThreadStartTest
>>>    34  * @run main ThreadStartTest
>>>
>>> just the last @run suffices to build and run the test.
>>>
>>> Thanks,
>>> David
>>> -----
>>>
>>>> Testing : Mach 5 tier1 and tier2 completed successfully, tier3 is in progress.
>>>>
>>>> [1] Webrev: http://cr.openjdk.java.net/~dtitov/8231666/webrev.01/
>>>> [2] Bug: https://bugs.openjdk.java.net/browse/JDK-8231666
>>>> [3] https://bugs.openjdk.java.net/browse/JDK-8185005
>>>> [4] https://bugs.openjdk.java.net/browse/JDK-8184732
>>>>
>>>> Best regards,
>>>> Danill
>>>>
>>>>

From david.holmes at oracle.com  Wed Oct  2 13:25:48 2019
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 2 Oct 2019 23:25:48 +1000
Subject: jmx-dev RFR: 8231666: ThreadIdTable::grow() invokes invalid
 thread transition
In-Reply-To: <fa1ac8de-1004-a1a8-83b0-92e1c8e6d9d0@oracle.com>
References: <140927E0-55D7-490E-94BC-F294A8EE0901@oracle.com>
 <176b5148-77fd-4793-c4eb-17b137a36f8e@oracle.com>
 <d2af2637-a38a-24d1-9ff9-a6444b0f1300@oracle.com>
 <74c89be0-8d0c-3c5d-0038-b16f3a59de06@oracle.com>
 <fa1ac8de-1004-a1a8-83b0-92e1c8e6d9d0@oracle.com>
Message-ID: <d21397f4-1f94-3f9d-02b1-ef3b1e8852c0@oracle.com>

Hi Robbin,

On 2/10/2019 7:58 pm, Robbin Ehn wrote:
> Hi David,
> 
>> What if the table is full and must be grown?
> 
> The table uses chaining, it just means load factor tip over what is 
> considered a good backing array size.

Coleen raised a good question in a separate discussion, which made me 
realize that once the table has been initially populated all subsequent 
additions, and hence all subsequent calls to grow() always happen with 
the Threads_lock held. So we can't just defer the grow().

>> That aside, I'd like to know how expensive it is to grow this table. 
>> What are we talking about here?
> 
> We use global counter which on write_synchronize must scan all
> threads to make sure they have seen the update (there some
> optimization to avoid it if there is no readers at all). Since this
> table contains the threads, we get double penalized, for each new
> thread the synchronization cost increase AND the number of items.
> 
> With concurrent reads you still need many thousands of threads, but
> I think I saw someone mentioning 100k threads, assuming concurrent
> queries the resize can take hundreds of ms to finish. Note that reads
> and inserts still in operate roughly at the same speed while 
> resizing. So a longer resize is only problematic if we do not
> respect safepoints.

I think if anything were capable of running 100K threads we would be 
hitting far worse scalability bottlenecks than this. But this does seem 
problematic.

Thanks,
David
-----

> Thanks, Robbin
> 
>>
>> David
>>
>>> /Robbin
>>>
>>> On 2019-10-02 08:46, David Holmes wrote:
>>>> Hi Daniil,
>>>>
>>>> On 2/10/2019 4:13 pm, Daniil Titov wrote:
>>>>> Please review a change that fixes the issue. The problem here is
>>>>> that the thread is added to the ThreadIdTable (introduced in
>>>>> [3]) while the Threads_lock is held by
>>>>> JVM_StartThread. When new thread is added to the thread table the
>>>>> table checks if its load factor is greater than required and if so
>>>>> it grows itself while polling for safepoints.
>>>>> After changes [4] an attempt to block the thread while holding the
>>>>> Threads_lock results in assertion in
>>>>> Thread::check_possible_safepoint().
>>>>>
>>>>> The fix proposed by David Holmes ( thank you, David!) is to skip
>>>>> the ThreadBlockInVM inside ThreadIdTable::grow() method if the
>>>>> current thread owns the Threads_lock.
>>>>
>>>> Sorry but looking at the fix in context now I think it would be 
>>>> better to do this:
>>>>
>>>>      while (gt.do_task(jt)) {
>>>>        if (Threads_lock->owner() != jt) {
>>>>          gt.pause(jt);
>>>>          ThreadBlockInVM tbivm(jt);
>>>>          gt.cont(jt);
>>>>        }
>>>>      }
>>>>
>>>> This way we don't waste time with the pause/cont when there's no 
>>>> safepoint pause going to happen - and the owner() check is quicker 
>>>> than owned_by_self(). That partially addresses a general concern I 
>>>> have about how long it may take to grow the table, as we are 
>>>> deferring safepoints until it is complete in this JVM_StartThread 
>>>> usecase.
>>>>
>>>> In the test you don't need all of:
>>>>
>>>>    32  * @run clean ThreadStartTest
>>>>    33  * @run build ThreadStartTest
>>>>    34  * @run main ThreadStartTest
>>>>
>>>> just the last @run suffices to build and run the test.
>>>>
>>>> Thanks,
>>>> David
>>>> -----
>>>>
>>>>> Testing : Mach 5 tier1 and tier2 completed successfully, tier3 is 
>>>>> in progress.
>>>>>
>>>>> [1] Webrev: http://cr.openjdk.java.net/~dtitov/8231666/webrev.01/
>>>>> [2] Bug: https://bugs.openjdk.java.net/browse/JDK-8231666
>>>>> [3] https://bugs.openjdk.java.net/browse/JDK-8185005
>>>>> [4] https://bugs.openjdk.java.net/browse/JDK-8184732
>>>>>
>>>>> Best regards,
>>>>> Danill
>>>>>
>>>>>

From daniil.x.titov at oracle.com  Wed Oct  2 16:21:11 2019
From: daniil.x.titov at oracle.com (Daniil Titov)
Date: Wed, 02 Oct 2019 09:21:11 -0700
Subject: jmx-dev RFR: 8231666: ThreadIdTable::grow() invokes invalid
 thread transition
In-Reply-To: <d21397f4-1f94-3f9d-02b1-ef3b1e8852c0@oracle.com>
References: <140927E0-55D7-490E-94BC-F294A8EE0901@oracle.com>
 <176b5148-77fd-4793-c4eb-17b137a36f8e@oracle.com>
 <d2af2637-a38a-24d1-9ff9-a6444b0f1300@oracle.com>
 <74c89be0-8d0c-3c5d-0038-b16f3a59de06@oracle.com>
 <fa1ac8de-1004-a1a8-83b0-92e1c8e6d9d0@oracle.com>
 <d21397f4-1f94-3f9d-02b1-ef3b1e8852c0@oracle.com>
Message-ID: <F6F486C1-07A3-4825-99AA-B235C7D9A76B@oracle.com>

Hi David and Robbin,

Could we consider making the ServiceThread responsible for resizing the ThreadIdTable, in the same way it works for
StringTable and ResolvedMethodTable, rather than having the ThreadIdTable::add() method call ThreadIdTable::grow()?
As I understand it, this should solve the current issue and address the concern that the resizing could take a relatively long time, and that
doing it without polling for safepoints or while holding the Threads_lock is not desirable.
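
A minimal sketch of the deferral idea, under stated assumptions: the class
and method names below (DeferredGrowTable, has_work, do_concurrent_work) and
the load threshold are hypothetical, chosen only to illustrate the pattern,
and this is not the actual HotSpot code. add() never grows inline; it only
records that work is pending, and a background service thread later performs
the resize without holding the caller's lock.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical table that defers growing to a background "service" thread.
class DeferredGrowTable {
  std::size_t buckets_;
  std::size_t items_ = 0;
  std::atomic<bool> has_work_{false};
  static constexpr double kMaxLoad = 2.0;    // assumed load-factor threshold

  double load() const {
    return static_cast<double>(items_) / static_cast<double>(buckets_);
  }

 public:
  explicit DeferredGrowTable(std::size_t buckets) : buckets_(buckets) {}

  // Called on the thread-start path, possibly with a lock held:
  // only flags that a resize is wanted, never resizes here.
  void add() {
    ++items_;
    if (load() > kMaxLoad) has_work_.store(true);
  }

  // Polled by the service thread to decide whether to call do_concurrent_work().
  bool has_work() const { return has_work_.load(); }

  // Run by the service thread, which is free to poll for safepoints while growing.
  void do_concurrent_work() {
    while (load() > kMaxLoad) buckets_ *= 2;
    has_work_.store(false);
  }

  std::size_t buckets() const { return buckets_; }
};
```

In this sketch the ninth add() to a four-bucket table only sets the flag; the
bucket count changes once the service side calls do_concurrent_work().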

Thank you,
Daniil


On 10/2/19, 6:25 AM, "David Holmes" <david.holmes at oracle.com> wrote:

    Hi Robbin,
    
    On 2/10/2019 7:58 pm, Robbin Ehn wrote:
    > Hi David,
    > 
    >> What if the table is full and must be grown?
    > 
    > The table uses chaining, it just means load factor tip over what is 
    > considered a good backing array size.
    
    Coleen raised a good question in a separate discussion, which made me 
    realize that once the table has been initially populated all subsequent 
    additions, and hence all subsequent calls to grow() always happen with 
    the Threads_lock held. So we can't just defer the grow().
    
    >> That aside, I'd like to know how expensive it is to grow this table. 
    >> What are we talking about here?
    > 
    > We use global counter which on write_synchronize must scan all
    > threads to make sure they have seen the update (there some
    > optimization to avoid it if there is no readers at all). Since this
    > table contains the threads, we get double penalized, for each new
    > thread the synchronization cost increase AND the number of items.
    > 
    > With concurrent reads you still need many thousands of threads, but
    > I think I saw someone mentioning 100k threads, assuming concurrent
    > queries the resize can take hundreds of ms to finish. Note that reads
    > and inserts still in operate roughly at the same speed while 
    > resizing. So a longer resize is only problematic if we do not
    > respect safepoints.
    I think if anything were capable of running 100K threads we would be 
    hitting far worse scalability bottlenecks than this. But this does seem 
    problematic.
    
    Thanks,
    David
    -----
    
    > Thanks, Robbin
    > 
    >>
    >> David
    >>
    >>> /Robbin
    >>>
    >>> On 2019-10-02 08:46, David Holmes wrote:
    >>>> Hi Daniil,
    >>>>
    >>>> On 2/10/2019 4:13 pm, Daniil Titov wrote:
    >>>>> Please review a change that fixes the issue. The problem here is 
    >>>>> that the thread is added to the ThreadIdTable (introduced in 
    >>>>> [3]) while the Threads_lock is held by
    >>>>> JVM_StartThread. When new thread is added  to the thread table the 
    >>>>> table checks if its load factor is greater than required and if so 
    >>>>> it grows itself while polling for safepoints.
    >>>>> After changes [4]  an attempt to block the thread while holding the 
    >>>>> Threads_lock  results in assertion in 
    >>>>> Thread::check_possible_safepoint().
    >>>>>
    >>>>> The fix  proposed by David Holmes ( thank you, David!)  is to skip 
    >>>>> the ThreadBlockInVM inside ThreadIdTable::grow() method if the 
    >>>>> current thread owns the Threads_lock.
    >>>>
    >>>> Sorry but looking at the fix in context now I think it would be 
    >>>> better to do this:
    >>>>
    >>>>      while (gt.do_task(jt)) {
    >>>>        if (Threads_lock->owner() != jt) {
    >>>>          gt.pause(jt);
    >>>>          ThreadBlockInVM tbivm(jt);
    >>>>          gt.cont(jt);
    >>>>        }
    >>>>      }
    >>>>
    >>>> This way we don't waste time with the pause/cont when there's no 
    >>>> safepoint pause going to happen - and the owner() check is quicker 
    >>>> than owned_by_self(). That partially addresses a general concern I 
    >>>> have about how long it may take to grow the table, as we are 
    >>>> deferring safepoints until it is complete in this JVM_StartThread 
    >>>> usecase.
    >>>>
    >>>> In the test you don't need all of:
    >>>>
    >>>>    32  * @run clean ThreadStartTest
    >>>>    33  * @run build ThreadStartTest
    >>>>    34  * @run main ThreadStartTest
    >>>>
    >>>> just the last @run suffices to build and run the test.
    >>>>
    >>>> Thanks,
    >>>> David
    >>>> -----
    >>>>
    >>>>> Testing : Mach 5 tier1 and tier2 completed successfully, tier3 is 
    >>>>> in progress.
    >>>>>
    >>>>> [1] Webrev: http://cr.openjdk.java.net/~dtitov/8231666/webrev.01/
    >>>>> [2] Bug: https://bugs.openjdk.java.net/browse/JDK-8231666
    >>>>> [3] https://bugs.openjdk.java.net/browse/JDK-8185005
    >>>>> [4] https://bugs.openjdk.java.net/browse/JDK-8184732
    >>>>>
    >>>>> Best regards,
    >>>>> Danill
    >>>>>
    >>>>>
    

From robbin.ehn at oracle.com  Wed Oct  2 17:07:36 2019
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Wed, 2 Oct 2019 19:07:36 +0200
Subject: jmx-dev RFR: 8231666: ThreadIdTable::grow() invokes invalid
 thread transition
In-Reply-To: <d21397f4-1f94-3f9d-02b1-ef3b1e8852c0@oracle.com>
References: <140927E0-55D7-490E-94BC-F294A8EE0901@oracle.com>
 <176b5148-77fd-4793-c4eb-17b137a36f8e@oracle.com>
 <d2af2637-a38a-24d1-9ff9-a6444b0f1300@oracle.com>
 <74c89be0-8d0c-3c5d-0038-b16f3a59de06@oracle.com>
 <fa1ac8de-1004-a1a8-83b0-92e1c8e6d9d0@oracle.com>
 <d21397f4-1f94-3f9d-02b1-ef3b1e8852c0@oracle.com>
Message-ID: <ebe6770b-9767-76a3-59ca-4dff6939a23b@oracle.com>

Hi David,

On 2019-10-02 15:25, David Holmes wrote:
> Hi Robbin,
> 
> On 2/10/2019 7:58 pm, Robbin Ehn wrote:
>> Hi David,
>>
>>> What if the table is full and must be grown?
>>
>> The table uses chaining, it just means load factor tip over what is considered 
>> a good backing array size.
> 
> Coleen raised a good question in a separate discussion, which made me realize 
> that once the table has been initially populated all subsequent additions, and 
> hence all subsequent calls to grow() always happen with the Threads_lock held. 
> So we can't just defer the grow().

The other tables defer this to the service thread to 'avoid problems'.
Also note that if you are not blocking during the resize
and are resizing single-threaded, calling the normal:
bool grow(Thread* thread, size_t size_limit_log2 = 0);
is way faster, as it does the resize in one go.

Thanks, Robbin

> 
>>> That aside, I'd like to know how expensive it is to grow this table. What are 
>>> we talking about here?
>>
>> We use global counter which on write_synchronize must scan all
>> threads to make sure they have seen the update (there some
>> optimization to avoid it if there is no readers at all). Since this
>> table contains the threads, we get double penalized, for each new
>> thread the synchronization cost increase AND the number of items.
>>
>> With concurrent reads you still need many thousands of threads, but
>> I think I saw someone mentioning 100k threads, assuming concurrent
>> queries the resize can take hundreds of ms to finish. Note that reads
>> and inserts still in operate roughly at the same speed while resizing. So a 
>> longer resize is only problematic if we do not
>> respect safepoints.
> I think if anything were capable of running 100K threads we would be hitting far 
> worse scalability bottlenecks than this. But this does seem problematic.
> 
> Thanks,
> David
> -----
> 
>> Thanks, Robbin
>>
>>>
>>> David
>>>
>>>> /Robbin
>>>>
>>>> On 2019-10-02 08:46, David Holmes wrote:
>>>>> Hi Daniil,
>>>>>
>>>>> On 2/10/2019 4:13 pm, Daniil Titov wrote:
>>>>>> Please review a change that fixes the issue. The problem here is that
>>>>>> the thread is added to the ThreadIdTable (introduced in [3]) while the
>>>>>> Threads_lock is held by
>>>>>> JVM_StartThread. When new thread is added to the thread table the table
>>>>>> checks if its load factor is greater than required and if so it grows
>>>>>> itself while polling for safepoints.
>>>>>> After changes [4] an attempt to block the thread while holding the
>>>>>> Threads_lock results in assertion in Thread::check_possible_safepoint().
>>>>>>
>>>>>> The fix proposed by David Holmes ( thank you, David!) is to skip the
>>>>>> ThreadBlockInVM inside ThreadIdTable::grow() method if the current thread
>>>>>> owns the Threads_lock.
>>>>>
>>>>> Sorry but looking at the fix in context now I think it would be better to 
>>>>> do this:
>>>>>
>>>>>      while (gt.do_task(jt)) {
>>>>>        if (Threads_lock->owner() != jt) {
>>>>>          gt.pause(jt);
>>>>>          ThreadBlockInVM tbivm(jt);
>>>>>          gt.cont(jt);
>>>>>        }
>>>>>      }
>>>>>
>>>>> This way we don't waste time with the pause/cont when there's no safepoint 
>>>>> pause going to happen - and the owner() check is quicker than 
>>>>> owned_by_self(). That partially addresses a general concern I have about 
>>>>> how long it may take to grow the table, as we are deferring safepoints 
>>>>> until it is complete in this JVM_StartThread usecase.
>>>>>
>>>>> In the test you don't need all of:
>>>>>
>>>>>    32  * @run clean ThreadStartTest
>>>>>    33  * @run build ThreadStartTest
>>>>>    34  * @run main ThreadStartTest
>>>>>
>>>>> just the last @run suffices to build and run the test.
>>>>>
>>>>> Thanks,
>>>>> David
>>>>> -----
>>>>>
>>>>>> Testing : Mach 5 tier1 and tier2 completed successfully, tier3 is in 
>>>>>> progress.
>>>>>>
>>>>>> [1] Webrev: http://cr.openjdk.java.net/~dtitov/8231666/webrev.01/
>>>>>> [2] Bug: https://bugs.openjdk.java.net/browse/JDK-8231666
>>>>>> [3] https://bugs.openjdk.java.net/browse/JDK-8185005
>>>>>> [4] https://bugs.openjdk.java.net/browse/JDK-8184732
>>>>>>
>>>>>> Best regards,
>>>>>> Danill
>>>>>>
>>>>>>

From daniil.x.titov at oracle.com  Wed Oct  2 17:08:57 2019
From: daniil.x.titov at oracle.com (Daniil Titov)
Date: Wed, 02 Oct 2019 10:08:57 -0700
Subject: jmx-dev RFA: CSR: 8231593 : Add a command line option to control
 notification mechanism.
Message-ID: <C91A404E-1DE6-457E-8B09-8D8065AA8A01@oracle.com>

Please review/approve a CSR  request.

The proposed CSR [1] adds a new VM option, UseNotificationThread (default true), for opting out of
the new behavior introduced by the suggested fix [3] for the issue [2], which is under review in a separate
email thread [4].

 [1] CSR: https://bugs.openjdk.java.net/browse/JDK-8231593 
 [2] Issue: https://bugs.openjdk.java.net/browse/JDK-8170299 
 [3] http://cr.openjdk.java.net/~dtitov/8170299 
 [4]  https://mail.openjdk.java.net/pipermail/serviceability-dev/2019-October/029397.html 
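
For reference, HotSpot boolean options are enabled with -XX:+<Name> and disabled with -XX:-<Name>, so with this option the opt-out would be spelled -XX:-UseNotificationThread. The sketch below is only a stand-alone illustration of that syntax. parse_bool_flag and resolve_flag are hypothetical helpers, not the VM's argument-processing code, and the "last occurrence wins" rule is an assumption of the sketch.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not HotSpot's argument parser): recognizes
// "-XX:+<name>" and "-XX:-<name>". Returns true and stores the value in
// *out when arg sets the named flag; returns false and leaves *out alone otherwise.
bool parse_bool_flag(const std::string& arg, const std::string& name, bool* out) {
  assert(out != nullptr);
  const std::string prefix = "-XX:";
  if (arg.size() != prefix.size() + 1 + name.size()) return false;
  if (arg.compare(0, prefix.size(), prefix) != 0) return false;
  if (arg.compare(prefix.size() + 1, name.size(), name) != 0) return false;
  const char sign = arg[prefix.size()];
  if (sign != '+' && sign != '-') return false;
  *out = (sign == '+');
  return true;
}

// Starts from the default value and lets the last matching argument win
// (an assumption made for this sketch).
bool resolve_flag(int argc, const char* const* argv,
                  const std::string& name, bool default_value) {
  bool value = default_value;
  for (int i = 0; i < argc; ++i) {
    parse_bool_flag(argv[i], name, &value);
  }
  return value;
}
```

With a default of true, a command line without the option keeps the new behavior, and one containing -XX:-UseNotificationThread resolves to false.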

Thank you,
Daniil


From robbin.ehn at oracle.com  Wed Oct  2 17:14:36 2019
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Wed, 2 Oct 2019 19:14:36 +0200
Subject: jmx-dev RFR: 8231666: ThreadIdTable::grow() invokes invalid
 thread transition
In-Reply-To: <F6F486C1-07A3-4825-99AA-B235C7D9A76B@oracle.com>
References: <140927E0-55D7-490E-94BC-F294A8EE0901@oracle.com>
 <176b5148-77fd-4793-c4eb-17b137a36f8e@oracle.com>
 <d2af2637-a38a-24d1-9ff9-a6444b0f1300@oracle.com>
 <74c89be0-8d0c-3c5d-0038-b16f3a59de06@oracle.com>
 <fa1ac8de-1004-a1a8-83b0-92e1c8e6d9d0@oracle.com>
 <d21397f4-1f94-3f9d-02b1-ef3b1e8852c0@oracle.com>
 <F6F486C1-07A3-4825-99AA-B235C7D9A76B@oracle.com>
Message-ID: <b7361e9a-e47a-0ad3-0d2c-4bc3dcb23db4@oracle.com>

Hi Daniil,

On 2019-10-02 18:21, Daniil Titov wrote:
> Hi David and Robbin,
> 
> Could we consider making the ServiceThread responsible for resizing the ThreadIdTable, similar to how
> it works for StringTable and ResolvedMethodTable, rather than having ThreadIdTable::add() call ThreadIdTable::grow()?
> As I understand it, this should solve the current issue and address the concern that resizing could take a relatively long time and
> that doing it without polling for safepoints, or while holding the Threads_lock, is not desirable.

Yes, thanks.

/Robbin

> 
> Thank you,
> Daniil
> 
> 
> On 10/2/19, 6:25 AM, "David Holmes" <david.holmes at oracle.com> wrote:
> 
>      Hi Robbin,
>      
>      On 2/10/2019 7:58 pm, Robbin Ehn wrote:
>      > Hi David,
>      >
>      >> What if the table is full and must be grown?
>      >
>      > The table uses chaining; it just means the load factor tips over what is
>      > considered good for the backing array size.
>      
>      Coleen raised a good question in a separate discussion, which made me
>      realize that once the table has been initially populated all subsequent
>      additions, and hence all subsequent calls to grow() always happen with
>      the Threads_lock held. So we can't just defer the grow().
>      
>      >> That aside, I'd like to know how expensive it is to grow this table.
>      >> What are we talking about here?
>      >
>      > We use a global counter, which on write_synchronize must scan all
>      > threads to make sure they have seen the update (there is some
>      > optimization to avoid it if there are no readers at all). Since this
>      > table contains the threads, we get doubly penalized: for each new
>      > thread, both the synchronization cost AND the number of items increase.
>      >
>      > With concurrent reads you still need many thousands of threads, but
>      > I think I saw someone mentioning 100k threads; assuming concurrent
>      > queries, the resize can take hundreds of ms to finish. Note that reads
>      > and inserts still operate at roughly the same speed while
>      > resizing. So a longer resize is only problematic if we do not
>      > respect safepoints.
>      I think if anything were capable of running 100K threads we would be
>      hitting far worse scalability bottlenecks than this. But this does seem
>      problematic.
>      
>      Thanks,
>      David
>      -----
>      
>      > Thanks, Robbin
>      >
>      >>
>      >> David
>      >>
>      >>> /Robbin
>      >>>
>      >>> On 2019-10-02 08:46, David Holmes wrote:
>      >>>> Hi Daniil,
>      >>>>
>      >>>> On 2/10/2019 4:13 pm, Daniil Titov wrote:
>      >>>>> Please review a change that fixes the issue. The problem here is
>      >>>>> that the thread is added to the ThreadIdTable (introduced in
>      >>>>> [3]) while the Threads_lock is held by
>      >>>>> JVM_StartThread. When a new thread is added to the thread table, the
>      >>>>> table checks if its load factor is greater than required and, if so,
>      >>>>> it grows itself while polling for safepoints.
>      >>>>> After changes [4], an attempt to block the thread while holding the
>      >>>>> Threads_lock results in an assertion failure in
>      >>>>> Thread::check_possible_safepoint().
>      >>>>>
>      >>>>> The fix  proposed by David Holmes ( thank you, David!)  is to skip
>      >>>>> the ThreadBlockInVM inside ThreadIdTable::grow() method if the
>      >>>>> current thread owns the Threads_lock.
>      >>>>
>      >>>> Sorry but looking at the fix in context now I think it would be
>      >>>> better to do this:
>      >>>>
>      >>>>      while (gt.do_task(jt)) {
>      >>>>        if (Threads_lock->owner() != jt) {
>      >>>>          gt.pause(jt);
>      >>>>          ThreadBlockInVM tbivm(jt);
>      >>>>          gt.cont(jt);
>      >>>>        }
>      >>>>      }
>      >>>>
>      >>>> This way we don't waste time with the pause/cont when there's no
>      >>>> safepoint pause going to happen - and the owner() check is quicker
>      >>>> than owned_by_self(). That partially addresses a general concern I
>      >>>> have about how long it may take to grow the table, as we are
>      >>>> deferring safepoints until it is complete in this JVM_StartThread
>      >>>> usecase.
>      >>>>
>      >>>> In the test you don't need all of:
>      >>>>
>      >>>>    32  * @run clean ThreadStartTest
>      >>>>    33  * @run build ThreadStartTest
>      >>>>    34  * @run main ThreadStartTest
>      >>>>
>      >>>> just the last @run suffices to build and run the test.
>      >>>>
>      >>>> Thanks,
>      >>>> David
>      >>>> -----
>      >>>>
>      >>>>> Testing : Mach 5 tier1 and tier2 completed successfully, tier3 is
>      >>>>> in progress.
>      >>>>>
>      >>>>> [1] Webrev: http://cr.openjdk.java.net/~dtitov/8231666/webrev.01/
>      >>>>> [2] Bug: https://bugs.openjdk.java.net/browse/JDK-8231666
>      >>>>> [3] https://bugs.openjdk.java.net/browse/JDK-8185005
>      >>>>> [4] https://bugs.openjdk.java.net/browse/JDK-8184732
>      >>>>>
>      >>>>> Best regards,
>      >>>>> Daniil
>      >>>>>
>      >>>>>
>      
> 
> 

From david.holmes at oracle.com  Wed Oct  2 22:26:55 2019
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 3 Oct 2019 08:26:55 +1000
Subject: jmx-dev RFR: 8231666: ThreadIdTable::grow() invokes invalid
 thread transition
In-Reply-To: <F6F486C1-07A3-4825-99AA-B235C7D9A76B@oracle.com>
References: <140927E0-55D7-490E-94BC-F294A8EE0901@oracle.com>
 <176b5148-77fd-4793-c4eb-17b137a36f8e@oracle.com>
 <d2af2637-a38a-24d1-9ff9-a6444b0f1300@oracle.com>
 <74c89be0-8d0c-3c5d-0038-b16f3a59de06@oracle.com>
 <fa1ac8de-1004-a1a8-83b0-92e1c8e6d9d0@oracle.com>
 <d21397f4-1f94-3f9d-02b1-ef3b1e8852c0@oracle.com>
 <F6F486C1-07A3-4825-99AA-B235C7D9A76B@oracle.com>
Message-ID: <9082e266-45e1-4483-f7c7-d2da58203ed6@oracle.com>

Hi Daniil,

On 3/10/2019 2:21 am, Daniil Titov wrote:
> Hi David and Robbin,
> 
> Could we consider making the ServiceThread responsible for resizing the ThreadIdTable, similar to how
> it works for StringTable and ResolvedMethodTable, rather than having ThreadIdTable::add() call ThreadIdTable::grow()?
> As I understand it, this should solve the current issue and address the concern that resizing could take a relatively long time and
> that doing it without polling for safepoints, or while holding the Threads_lock, is not desirable.

I originally rejected copying that part of the code from the other 
tables as it seems to introduce unnecessary complexity. Having a 
separate thread trying to grow the table when it could be concurrently 
having threads added and removed seems like it could introduce hard to 
diagnose performance pathologies. It also adds what we know to be a 
potentially long running action to the workload of the service thread, 
which means it may also impact the other tasks the service thread is 
doing, thus potentially introducing even more hard to diagnose 
performance pathologies.

So this change does concern me. But go ahead and trial it.

Thanks,
David


> Thank you,
> Daniil
> 
> 
> On 10/2/19, 6:25 AM, "David Holmes" <david.holmes at oracle.com> wrote:
> 
>      Hi Robbin,
>      
>      On 2/10/2019 7:58 pm, Robbin Ehn wrote:
>      > Hi David,
>      >
>      >> What if the table is full and must be grown?
>      >
>      > The table uses chaining; it just means the load factor tips over what is
>      > considered good for the backing array size.
>      
>      Coleen raised a good question in a separate discussion, which made me
>      realize that once the table has been initially populated all subsequent
>      additions, and hence all subsequent calls to grow() always happen with
>      the Threads_lock held. So we can't just defer the grow().
>      
>      >> That aside, I'd like to know how expensive it is to grow this table.
>      >> What are we talking about here?
>      >
>      > We use a global counter, which on write_synchronize must scan all
>      > threads to make sure they have seen the update (there is some
>      > optimization to avoid it if there are no readers at all). Since this
>      > table contains the threads, we get doubly penalized: for each new
>      > thread, both the synchronization cost AND the number of items increase.
>      >
>      > With concurrent reads you still need many thousands of threads, but
>      > I think I saw someone mentioning 100k threads; assuming concurrent
>      > queries, the resize can take hundreds of ms to finish. Note that reads
>      > and inserts still operate at roughly the same speed while
>      > resizing. So a longer resize is only problematic if we do not
>      > respect safepoints.
>      I think if anything were capable of running 100K threads we would be
>      hitting far worse scalability bottlenecks than this. But this does seem
>      problematic.
>      
>      Thanks,
>      David
>      -----
>      
>      > Thanks, Robbin
>      >
>      >>
>      >> David
>      >>
>      >>> /Robbin
>      >>>
>      >>> On 2019-10-02 08:46, David Holmes wrote:
>      >>>> Hi Daniil,
>      >>>>
>      >>>> On 2/10/2019 4:13 pm, Daniil Titov wrote:
>      >>>>> Please review a change that fixes the issue. The problem here is
>      >>>>> that the thread is added to the ThreadIdTable (introduced in
>      >>>>> [3]) while the Threads_lock is held by
>      >>>>> JVM_StartThread. When a new thread is added to the thread table, the
>      >>>>> table checks if its load factor is greater than required and, if so,
>      >>>>> it grows itself while polling for safepoints.
>      >>>>> After changes [4], an attempt to block the thread while holding the
>      >>>>> Threads_lock results in an assertion failure in
>      >>>>> Thread::check_possible_safepoint().
>      >>>>>
>      >>>>> The fix  proposed by David Holmes ( thank you, David!)  is to skip
>      >>>>> the ThreadBlockInVM inside ThreadIdTable::grow() method if the
>      >>>>> current thread owns the Threads_lock.
>      >>>>
>      >>>> Sorry but looking at the fix in context now I think it would be
>      >>>> better to do this:
>      >>>>
>      >>>>      while (gt.do_task(jt)) {
>      >>>>        if (Threads_lock->owner() != jt) {
>      >>>>          gt.pause(jt);
>      >>>>          ThreadBlockInVM tbivm(jt);
>      >>>>          gt.cont(jt);
>      >>>>        }
>      >>>>      }
>      >>>>
>      >>>> This way we don't waste time with the pause/cont when there's no
>      >>>> safepoint pause going to happen - and the owner() check is quicker
>      >>>> than owned_by_self(). That partially addresses a general concern I
>      >>>> have about how long it may take to grow the table, as we are
>      >>>> deferring safepoints until it is complete in this JVM_StartThread
>      >>>> usecase.
>      >>>>
>      >>>> In the test you don't need all of:
>      >>>>
>      >>>>    32  * @run clean ThreadStartTest
>      >>>>    33  * @run build ThreadStartTest
>      >>>>    34  * @run main ThreadStartTest
>      >>>>
>      >>>> just the last @run suffices to build and run the test.
>      >>>>
>      >>>> Thanks,
>      >>>> David
>      >>>> -----
>      >>>>
>      >>>>> Testing : Mach 5 tier1 and tier2 completed successfully, tier3 is
>      >>>>> in progress.
>      >>>>>
>      >>>>> [1] Webrev: http://cr.openjdk.java.net/~dtitov/8231666/webrev.01/
>      >>>>> [2] Bug: https://bugs.openjdk.java.net/browse/JDK-8231666
>      >>>>> [3] https://bugs.openjdk.java.net/browse/JDK-8185005
>      >>>>> [4] https://bugs.openjdk.java.net/browse/JDK-8184732
>      >>>>>
>      >>>>> Best regards,
>      >>>>> Daniil
>      >>>>>
>      >>>>>
>      
> 
> 

From david.holmes at oracle.com  Wed Oct  2 22:36:46 2019
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 3 Oct 2019 08:36:46 +1000
Subject: jmx-dev RFA: CSR: 8231593 : Add a command line option to
 control notification mechanism.
In-Reply-To: <C91A404E-1DE6-457E-8B09-8D8065AA8A01@oracle.com>
References: <C91A404E-1DE6-457E-8B09-8D8065AA8A01@oracle.com>
Message-ID: <49790458-626a-bf03-fb4e-1b01eae88215@oracle.com>

Hi Daniil,

I reviewed the CSR 3 days ago. Only one review is needed for the CSR 
process. We don't need to do an email RFA for a CSR request.

Cheers,
David

On 3/10/2019 3:08 am, Daniil Titov wrote:
> Please review/approve a CSR  request.
> 
> The proposed CSR [1] is for adding  a new VM option UseNotificationThread  (default true) to opt-out from
> the new behavior introduced by the suggested fix [3] for the issue [2]  that is on review now in the separate
> email thread [4].
> 
>   [1] CSR: https://bugs.openjdk.java.net/browse/JDK-8231593
>   [2] Issue: https://bugs.openjdk.java.net/browse/JDK-8170299
>   [3] http://cr.openjdk.java.net/~dtitov/8170299
>   [4]  https://mail.openjdk.java.net/pipermail/serviceability-dev/2019-October/029397.html
> 
> Thank you,
> Daniil
> 
> 

From serguei.spitsyn at oracle.com  Wed Oct  2 23:24:31 2019
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Wed, 2 Oct 2019 16:24:31 -0700
Subject: jmx-dev RFR: 8170299: Debugger does not stop inside the low
 memory notifications code
In-Reply-To: <c3ea02e1-bd4c-838b-c3c4-fef718319090@oracle.com>
References: <75B5F778-DC49-494B-AC12-270F301677CA@oracle.com>
 <60639d41-735a-00d3-c9db-1955f581b89a@oracle.com>
 <E2BFE392-EF77-4072-9D7C-65C088D4F007@oracle.com>
 <9783ca89-0af8-2167-436a-e5ff2db631a3@oracle.com>
 <9a805686-57bf-c158-a777-c3cb7e38f09f@oracle.com>
 <194bd23a-0f16-19a9-a3e7-d02fa6d58369@oracle.com>
 <4B9AB4C0-EA29-4D0E-9010-3A635454FE30@oracle.com>
 <c3ea02e1-bd4c-838b-c3c4-fef718319090@oracle.com>
Message-ID: <84e1ee7b-da77-25b9-cd98-36e8bfc66032@oracle.com>

Hi Daniil,

+1
I also prefer (agree with) a new VM option to opt out of the new behavior.
Sorry for some latency in the review and discussion process.

Thanks,
Serguei


On 10/1/19 20:20, David Holmes wrote:
> Hi Daniil,
>
> Thanks again for your perseverance with this one.
>
> This looks fine to me.
>
> Thanks,
> David
> -----
>
> On 2/10/2019 6:57 am, Daniil Titov wrote:
>> Hello,
>>
>> Please review a new version of the change [1] that fixes the problem 
>> with the debugger not stopping in the low memory notification code. 
>> The fix moves the send notifications task from
>> not visible ServiceThread to a new visible NotificationThread. This 
>> version of the change also introduces a new VM option to opt-out 
>> from the new behavior.
>>
>> Previous email threads:
>> https://mail.openjdk.java.net/pipermail/serviceability-dev/2019-August/028863.html 
>>
>> https://mail.openjdk.java.net/pipermail/serviceability-dev/2019-July/028608.html 
>>
>>
>> The proposed CSR [3] is for adding a new VM option 
>> UseNotificationThread (default true) to opt-out from the new behavior.
>>
>> Testing: Mach5 tests tier1, tier2, tier3, and tier7 successfully passed.
>>
>> [1] Webrev: http://cr.openjdk.java.net/~dtitov/8170299/webrev.05/
>> [2] Bug: https://bugs.openjdk.java.net/browse/JDK-8170299
>> [3] CSR: https://bugs.openjdk.java.net/browse/JDK-8231593
>>
>> Thanks,
>> Daniil
>>
>>
>>
>>


From daniil.x.titov at oracle.com  Thu Oct  3 23:33:43 2019
From: daniil.x.titov at oracle.com (Daniil Titov)
Date: Thu, 03 Oct 2019 16:33:43 -0700
Subject: jmx-dev RFR: 8170299: Debugger does not stop inside the low
 memory notifications code
In-Reply-To: <84e1ee7b-da77-25b9-cd98-36e8bfc66032@oracle.com>
References: <75B5F778-DC49-494B-AC12-270F301677CA@oracle.com>
 <60639d41-735a-00d3-c9db-1955f581b89a@oracle.com>
 <E2BFE392-EF77-4072-9D7C-65C088D4F007@oracle.com>
 <9783ca89-0af8-2167-436a-e5ff2db631a3@oracle.com>
 <9a805686-57bf-c158-a777-c3cb7e38f09f@oracle.com>
 <194bd23a-0f16-19a9-a3e7-d02fa6d58369@oracle.com>
 <4B9AB4C0-EA29-4D0E-9010-3A635454FE30@oracle.com>
 <c3ea02e1-bd4c-838b-c3c4-fef718319090@oracle.com>
 <84e1ee7b-da77-25b9-cd98-36e8bfc66032@oracle.com>
Message-ID: <33905952-5CB9-4DAD-86CD-2FD57254144E@oracle.com>

Hi David and Serguei,

Thank you for reviewing this change!

Best regards,
Daniil


On 10/2/19, 4:24 PM, "serguei.spitsyn at oracle.com" <serguei.spitsyn at oracle.com> wrote:

    Hi Daniil,
    
    +1
    I also prefer (agree with) a new VM option to opt out of the new behavior.
    Sorry for some latency in the review and discussion process.
    
    Thanks,
    Serguei
    
    
    On 10/1/19 20:20, David Holmes wrote:
    > Hi Daniil,
    >
    > Thanks again for your perseverance with this one.
    >
    > This looks fine to me.
    >
    > Thanks,
    > David
    > -----
    >
    > On 2/10/2019 6:57 am, Daniil Titov wrote:
    >> Hello,
    >>
    >> Please review a new version of the change [1]  that fixes the problem 
    >> with the  debugger not stopping in the low memory notification code. 
    >> The fix moves the send notifications task from
    >> not visible ServiceThread to a new visible NotificationThread. This 
    >> version of the  change also introduces  a new VM option to opt-out 
    >> from the new behavior.
    >>
    >> Previous email threads:
    >> https://mail.openjdk.java.net/pipermail/serviceability-dev/2019-August/028863.html 
    >>
    >> https://mail.openjdk.java.net/pipermail/serviceability-dev/2019-July/028608.html 
    >>
    >>
    >> The proposed CSR [3] is for adding  a new VM option 
    >> UseNotificationThread  (default true) to opt-out from the new behavior.
    >>
    >> Testing: Mach5 tests tier1, tier2, tier3, and tier7 successfully passed.
    >>
    >> [1] Webrev: http://cr.openjdk.java.net/~dtitov/8170299/webrev.05/
    >> [2] Bug: https://bugs.openjdk.java.net/browse/JDK-8170299
    >> [3] CSR: https://bugs.openjdk.java.net/browse/JDK-8231593
    >>
    >> Thanks,
    >> Daniil
    >>
    >>
    >>
    >>
    
    
From daniil.x.titov at oracle.com  Fri Oct  4 03:38:01 2019
From: daniil.x.titov at oracle.com (Daniil Titov)
Date: Thu, 03 Oct 2019 20:38:01 -0700
Subject: jmx-dev RFR: 8231666: ThreadIdTable::grow() invokes invalid
 thread transition
In-Reply-To: <9082e266-45e1-4483-f7c7-d2da58203ed6@oracle.com>
References: <140927E0-55D7-490E-94BC-F294A8EE0901@oracle.com>
 <176b5148-77fd-4793-c4eb-17b137a36f8e@oracle.com>
 <d2af2637-a38a-24d1-9ff9-a6444b0f1300@oracle.com>
 <74c89be0-8d0c-3c5d-0038-b16f3a59de06@oracle.com>
 <fa1ac8de-1004-a1a8-83b0-92e1c8e6d9d0@oracle.com>
 <d21397f4-1f94-3f9d-02b1-ef3b1e8852c0@oracle.com>
 <F6F486C1-07A3-4825-99AA-B235C7D9A76B@oracle.com>
 <9082e266-45e1-4483-f7c7-d2da58203ed6@oracle.com>
Message-ID: <E9222DAC-6D64-45DD-8F1D-8A326EB1DDA5@oracle.com>

Hi David and Robbin,

Please review a new version of the fix that makes the service thread responsible for the thread table growth.

Webrev:  http://cr.openjdk.java.net/~dtitov/8231666/webrev.02/
Bug: https://bugs.openjdk.java.net/browse/JDK-8231666 

Testing:  Mach5 tier1, tier2, and tier3 tests successfully passed.
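
To make the intended hand-off concrete, here is a minimal stand-alone sketch of the idea, not the code in the webrev. The adding thread only raises a flag, and a separate service loop performs the growth later, outside the adder's critical section. ToyThreadTable, service_loop_pass and the threshold value are hypothetical, and the sketch runs sequentially, so it does not model the concurrency of the real table.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Assumed threshold for this sketch only.
const double kPreferredLoad = 2.0;

// Toy model with hypothetical names: add() never resizes, it only
// records that resizing is wanted.
class ToyThreadTable {
 public:
  explicit ToyThreadTable(std::size_t buckets)
      : _items(0), _buckets(buckets), _has_work(false) {}

  // Called by the thread that starts a new thread, possibly while it
  // holds a lock that forbids blocking.
  void add() {
    ++_items;
    if (load_factor() > kPreferredLoad) {
      _has_work.store(true);
    }
  }

  bool has_work() const { return _has_work.load(); }

  // Called from the service loop, where blocking is allowed.
  void do_concurrent_work() {
    _has_work.store(false);
    while (load_factor() > kPreferredLoad) {
      _buckets *= 2;
    }
  }

  double load_factor() const {
    return static_cast<double>(_items) / static_cast<double>(_buckets);
  }
  std::size_t buckets() const { return _buckets; }

 private:
  std::size_t _items;
  std::size_t _buckets;
  std::atomic<bool> _has_work;
};

// One pass of a service loop that polls its client for pending work.
// Returns true if growth was performed during this pass.
bool service_loop_pass(ToyThreadTable& table) {
  if (!table.has_work()) return false;
  table.do_concurrent_work();
  return true;
}
```

In this model, nine add() calls on a table with four buckets leave the bucket count unchanged and only set the flag. The next service_loop_pass() then doubles the table to eight buckets.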

Thank you!

Best regards,
Daniil 

On 10/2/19, 3:26 PM, "David Holmes" <david.holmes at oracle.com> wrote:

    Hi Daniil,
    
    On 3/10/2019 2:21 am, Daniil Titov wrote:
    > Hi David and Robbin,
    > 
    > Could we consider making the ServiceThread responsible for resizing the ThreadIdTable, similar to how
    > it works for StringTable and ResolvedMethodTable, rather than having ThreadIdTable::add() call ThreadIdTable::grow()?
    > As I understand it, this should solve the current issue and address the concern that resizing could take a relatively long time and
    > that doing it without polling for safepoints, or while holding the Threads_lock, is not desirable.
    
    I originally rejected copying that part of the code from the other 
    tables as it seems to introduce unnecessary complexity. Having a 
    separate thread trying to grow the table when it could be concurrently 
    having threads added and removed seems like it could introduce hard to 
    diagnose performance pathologies. It also adds what we know to be a 
    potentially long running action to the workload of the service thread, 
    which means it may also impact the other tasks the service thread is 
    doing, thus potentially introducing even more hard to diagnose 
    performance pathologies.
    
    So this change does concern me. But go ahead and trial it.
    
    Thanks,
    David
    
    
    > Thank you,
    > Daniil
    > 
    > 
    > On 10/2/19, 6:25 AM, "David Holmes" <david.holmes at oracle.com> wrote:
    > 
    >      Hi Robbin,
    >      
    >      On 2/10/2019 7:58 pm, Robbin Ehn wrote:
    >      > Hi David,
    >      >
    >      >> What if the table is full and must be grown?
    >      >
    >      > The table uses chaining; it just means the load factor tips over what is
    >      > considered good for the backing array size.
    >      
    >      Coleen raised a good question in a separate discussion, which made me
    >      realize that once the table has been initially populated all subsequent
    >      additions, and hence all subsequent calls to grow() always happen with
    >      the Threads_lock held. So we can't just defer the grow().
    >      
    >      >> That aside, I'd like to know how expensive it is to grow this table.
    >      >> What are we talking about here?
    >      >
    >      > We use a global counter, which on write_synchronize must scan all
    >      > threads to make sure they have seen the update (there is some
    >      > optimization to avoid it if there are no readers at all). Since this
    >      > table contains the threads, we get doubly penalized: for each new
    >      > thread, both the synchronization cost AND the number of items increase.
    >      >
    >      > With concurrent reads you still need many thousands of threads, but
    >      > I think I saw someone mentioning 100k threads; assuming concurrent
    >      > queries, the resize can take hundreds of ms to finish. Note that reads
    >      > and inserts still operate at roughly the same speed while
    >      > resizing. So a longer resize is only problematic if we do not
    >      > respect safepoints.
    >      I think if anything were capable of running 100K threads we would be
    >      hitting far worse scalability bottlenecks than this. But this does seem
    >      problematic.
    >      
    >      Thanks,
    >      David
    >      -----
    >      
    >      > Thanks, Robbin
    >      >
    >      >>
    >      >> David
    >      >>
    >      >>> /Robbin
    >      >>>
    >      >>> On 2019-10-02 08:46, David Holmes wrote:
    >      >>>> Hi Daniil,
    >      >>>>
    >      >>>> On 2/10/2019 4:13 pm, Daniil Titov wrote:
    >      >>>>> Please review a change that fixes the issue. The problem here is
    >      >>>>> that the thread is added to the ThreadIdTable (introduced in
    >      >>>>> [3]) while the Threads_lock is held by
    >      >>>>> JVM_StartThread. When a new thread is added to the thread table, the
    >      >>>>> table checks if its load factor is greater than required and, if so,
    >      >>>>> it grows itself while polling for safepoints.
    >      >>>>> After changes [4], an attempt to block the thread while holding the
    >      >>>>> Threads_lock results in an assertion failure in
    >      >>>>> Thread::check_possible_safepoint().
    >      >>>>>
    >      >>>>> The fix  proposed by David Holmes ( thank you, David!)  is to skip
    >      >>>>> the ThreadBlockInVM inside ThreadIdTable::grow() method if the
    >      >>>>> current thread owns the Threads_lock.
    >      >>>>
    >      >>>> Sorry but looking at the fix in context now I think it would be
    >      >>>> better to do this:
    >      >>>>
    >      >>>>      while (gt.do_task(jt)) {
    >      >>>>        if (Threads_lock->owner() != jt) {
    >      >>>>          gt.pause(jt);
    >      >>>>          ThreadBlockInVM tbivm(jt);
    >      >>>>          gt.cont(jt);
    >      >>>>        }
    >      >>>>      }
    >      >>>>
    >      >>>> This way we don't waste time with the pause/cont when there's no
    >      >>>> safepoint pause going to happen - and the owner() check is quicker
    >      >>>> than owned_by_self(). That partially addresses a general concern I
    >      >>>> have about how long it may take to grow the table, as we are
    >      >>>> deferring safepoints until it is complete in this JVM_StartThread
    >      >>>> usecase.
    >      >>>>
    >      >>>> In the test you don't need all of:
    >      >>>>
    >      >>>>    32  * @run clean ThreadStartTest
    >      >>>>    33  * @run build ThreadStartTest
    >      >>>>    34  * @run main ThreadStartTest
    >      >>>>
    >      >>>> just the last @run suffices to build and run the test.
    >      >>>>
    >      >>>> Thanks,
    >      >>>> David
    >      >>>> -----
    >      >>>>
    >      >>>>> Testing : Mach 5 tier1 and tier2 completed successfully, tier3 is
    >      >>>>> in progress.
    >      >>>>>
    >      >>>>> [1] Webrev: http://cr.openjdk.java.net/~dtitov/8231666/webrev.01/
    >      >>>>> [2] Bug: https://bugs.openjdk.java.net/browse/JDK-8231666
    >      >>>>> [3] https://bugs.openjdk.java.net/browse/JDK-8185005
    >      >>>>> [4] https://bugs.openjdk.java.net/browse/JDK-8184732
    >      >>>>>
    >      >>>>> Best regards,
    >      >>>>> Daniil
    >      >>>>>
    >      >>>>>
    >      
    > 
    > 
    

From david.holmes at oracle.com  Fri Oct  4 04:15:30 2019
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 4 Oct 2019 14:15:30 +1000
Subject: jmx-dev RFR: 8231666: ThreadIdTable::grow() invokes invalid
 thread transition
In-Reply-To: <E9222DAC-6D64-45DD-8F1D-8A326EB1DDA5@oracle.com>
References: <140927E0-55D7-490E-94BC-F294A8EE0901@oracle.com>
 <176b5148-77fd-4793-c4eb-17b137a36f8e@oracle.com>
 <d2af2637-a38a-24d1-9ff9-a6444b0f1300@oracle.com>
 <74c89be0-8d0c-3c5d-0038-b16f3a59de06@oracle.com>
 <fa1ac8de-1004-a1a8-83b0-92e1c8e6d9d0@oracle.com>
 <d21397f4-1f94-3f9d-02b1-ef3b1e8852c0@oracle.com>
 <F6F486C1-07A3-4825-99AA-B235C7D9A76B@oracle.com>
 <9082e266-45e1-4483-f7c7-d2da58203ed6@oracle.com>
 <E9222DAC-6D64-45DD-8F1D-8A326EB1DDA5@oracle.com>
Message-ID: <336e6baa-509a-1b92-5641-e793c9499ec2@oracle.com>

Hi Daniil,

On 4/10/2019 1:38 pm, Daniil Titov wrote:
> Hi David and Robbin,
> 
> Please review a new version of the fix that makes the service thread responsible for the thread table growth.
> 
> Webrev:  http://cr.openjdk.java.net/~dtitov/8231666/webrev.02/
> Bug: https://bugs.openjdk.java.net/browse/JDK-8231666

I don't think you need to repeat the load factor check here:

void ThreadIdTable::do_concurrent_work(JavaThread* jt) {
  assert(_is_initialized, "Thread table is not initialized");
  _has_work = false;
  double load_factor = get_load_factor();
  log_debug(thread, table)("Concurrent work, load factor: %g", load_factor);
  if (load_factor > PREF_AVG_LIST_LEN && !_local_table->is_max_size_reached()) {
    grow(jt);
  }
}

as we will only execute this code if the load factor was seen to be too 
high.

You might also want to put the max size check in the 
check_concurrent_work code:

+   // Resize if we have more items than preferred load factor
+   if ( load_factor > PREF_AVG_LIST_LEN && !_local_table->is_max_size_reached()) {

so that we don't keep waking up the service thread for nothing if the 
table gets full.
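
As a rough illustration of the two suggestions above, here is a self-contained toy model, not the HotSpot code. Both conditions are tested on the trigger side, the worker side grows unconditionally, and a counter shows that a table at its maximum size no longer generates wake-ups. ToyTable, its fields and the threshold value are assumptions made for the sketch.

```cpp
#include <cassert>
#include <cstddef>

// Assumed threshold for this sketch only.
const double kPrefAvgListLen = 2.0;

// Toy model with hypothetical names.
struct ToyTable {
  std::size_t items = 0;
  std::size_t buckets = 4;
  std::size_t max_buckets = 8;
  bool has_work = false;
  int wakeups = 0;  // how often the service thread would have been notified

  double load_factor() const {
    return static_cast<double>(items) / static_cast<double>(buckets);
  }
  bool is_max_size_reached() const { return buckets >= max_buckets; }

  // Trigger side: both conditions are checked here, so a table that
  // cannot grow any further never requests work.
  void check_concurrent_work() {
    if (has_work) return;  // a request is already pending
    if (load_factor() > kPrefAvgListLen && !is_max_size_reached()) {
      has_work = true;
      ++wakeups;
    }
  }

  void add() {
    ++items;
    check_concurrent_work();
  }

  // Worker side: no second load-factor check, because this only runs
  // after the trigger has seen the load factor exceed the threshold.
  void do_concurrent_work() {
    has_work = false;
    buckets *= 2;
  }
};
```

In this model, nine adds to the four-bucket table cause exactly one wake-up. After the table has grown to its maximum of eight buckets, further adds push the load factor above the threshold without requesting work again.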

Thanks,
David
-----

> Testing:  Mach5 tier1, tier2, and tier3 tests successfully passed.
> 
> Thank you!
> 
> Best regards,
> Daniil
> 
> On 10/2/19, 3:26 PM, "David Holmes" <david.holmes at oracle.com> wrote:
> 
>      Hi Daniil,
>      
>      On 3/10/2019 2:21 am, Daniil Titov wrote:
>      > Hi David and Robbin,
>      >
>      > Could we consider making the ServiceThread responsible for resizing the ThreadIdTable, similar to how
>      > it works for StringTable and ResolvedMethodTable, rather than having ThreadIdTable::add() call ThreadIdTable::grow()?
>      > As I understand it, this should solve the current issue and address the concern that resizing could take a relatively long time and
>      > that doing it without polling for safepoints, or while holding the Threads_lock, is not desirable.
>      
>      I originally rejected copying that part of the code from the other
>      tables as it seems to introduce unnecessary complexity. Having a
>      separate thread trying to grow the table when it could be concurrently
>      having threads added and removed seems like it could introduce hard to
>      diagnose performance pathologies. It also adds what we know to be a
>      potentially long running action to the workload of the service thread,
>      which means it may also impact the other tasks the service thread is
>      doing, thus potentially introducing even more hard to diagnose
>      performance pathologies.
>      
>      So this change does concern me. But go ahead and trial it.
>      
>      Thanks,
>      David
>      
>      
>      > Thank you,
>      > Daniil
>      >
>      >
>      > On 10/2/19, 6:25 AM, "David Holmes" <david.holmes at oracle.com> wrote:
>      >
>      >      Hi Robbin,
>      >
>      >      On 2/10/2019 7:58 pm, Robbin Ehn wrote:
>      >      > Hi David,
>      >      >
>      >      >> What if the table is full and must be grown?
>      >      >
>      >      > The table uses chaining; it just means the load factor tips over what is
>      >      > considered good for the backing array size.
>      >
>      >      Coleen raised a good question in a separate discussion, which made me
>      >      realize that once the table has been initially populated all subsequent
>      >      additions, and hence all subsequent calls to grow() always happen with
>      >      the Threads_lock held. So we can't just defer the grow().
>      >
>      >      >> That aside, I'd like to know how expensive it is to grow this table.
>      >      >> What are we talking about here?
>      >      >
>      >      > We use a global counter, which on write_synchronize must scan all
>      >      > threads to make sure they have seen the update (there is some
>      >      > optimization to avoid it if there are no readers at all). Since this
>      >      > table contains the threads, we get doubly penalized: for each new
>      >      > thread, both the synchronization cost AND the number of items increase.
>      >      >
>      >      > With concurrent reads you still need many thousands of threads, but
>      >      > I think I saw someone mentioning 100k threads; assuming concurrent
>      >      > queries, the resize can take hundreds of ms to finish. Note that reads
>      >      > and inserts still operate at roughly the same speed while
>      >      > resizing. So a longer resize is only problematic if we do not
>      >      > respect safepoints.
>      >      I think if anything were capable of running 100K threads we would be
>      >      hitting far worse scalability bottlenecks than this. But this does seem
>      >      problematic.
>      >
>      >      Thanks,
>      >      David
>      >      -----
>      >
>      >      > Thanks, Robbin
>      >      >
>      >      >>
>      >      >> David
>      >      >>
>      >      >>> /Robbin
>      >      >>>
>      >      >>> On 2019-10-02 08:46, David Holmes wrote:
>      >      >>>> Hi Daniil,
>      >      >>>>
>      >      >>>> On 2/10/2019 4:13 pm, Daniil Titov wrote:
>      >      >>>>> Please review a change that fixes the issue. The problem here is
>      >      >>>>> that the thread is added to the ThreadIdTable (introduced in
>      >      >>>>> [3]) while the Threads_lock is held by
>      >      >>>>> JVM_StartThread. When a new thread is added to the thread table, the
>      >      >>>>> table checks if its load factor is greater than required and, if so,
>      >      >>>>> it grows itself while polling for safepoints.
>      >      >>>>> After changes [4], an attempt to block the thread while holding the
>      >      >>>>> Threads_lock results in an assertion in
>      >      >>>>> Thread::check_possible_safepoint().
>      >      >>>>>
>      >      >>>>> The fix  proposed by David Holmes ( thank you, David!)  is to skip
>      >      >>>>> the ThreadBlockInVM inside ThreadIdTable::grow() method if the
>      >      >>>>> current thread owns the Threads_lock.
>      >      >>>>
>      >      >>>> Sorry but looking at the fix in context now I think it would be
>      >      >>>> better to do this:
>      >      >>>>
>      >      >>>>      while (gt.do_task(jt)) {
>      >      >>>>        if (Threads_lock->owner() != jt) {
>      >      >>>>          gt.pause(jt);
>      >      >>>>          ThreadBlockInVM tbivm(jt);
>      >      >>>>          gt.cont(jt);
>      >      >>>>        }
>      >      >>>>      }
>      >      >>>>
>      >      >>>> This way we don't waste time with the pause/cont when there's no
>      >      >>>> safepoint pause going to happen - and the owner() check is quicker
>      >      >>>> than owned_by_self(). That partially addresses a general concern I
>      >      >>>> have about how long it may take to grow the table, as we are
>      >      >>>> deferring safepoints until it is complete in this JVM_StartThread
>      >      >>>> usecase.
>      >      >>>>
>      >      >>>> In the test you don't need all of:
>      >      >>>>
>      >      >>>>    32  * @run clean ThreadStartTest
>      >      >>>>    33  * @run build ThreadStartTest
>      >      >>>>    34  * @run main ThreadStartTest
>      >      >>>>
>      >      >>>> just the last @run suffices to build and run the test.
>      >      >>>>
>      >      >>>> Thanks,
>      >      >>>> David
>      >      >>>> -----
>      >      >>>>
>      >      >>>>> Testing : Mach 5 tier1 and tier2 completed successfully, tier3 is
>      >      >>>>> in progress.
>      >      >>>>>
>      >      >>>>> [1] Webrev: http://cr.openjdk.java.net/~dtitov/8231666/webrev.01/
>      >      >>>>> [2] Bug: https://bugs.openjdk.java.net/browse/JDK-8231666
>      >      >>>>> [3] https://bugs.openjdk.java.net/browse/JDK-8185005
>      >      >>>>> [4] https://bugs.openjdk.java.net/browse/JDK-8184732
>      >      >>>>>
>      >      >>>>> Best regards,
>      >      >>>>> Daniil
>      >      >>>>>
>      >      >>>>>
>      >
>      >
>      >
>      
> 
> 

From robbin.ehn at oracle.com  Fri Oct  4 06:30:36 2019
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Fri, 4 Oct 2019 08:30:36 +0200
Subject: jmx-dev RFR: 8231666: ThreadIdTable::grow() invokes invalid
 thread transition
In-Reply-To: <336e6baa-509a-1b92-5641-e793c9499ec2@oracle.com>
References: <140927E0-55D7-490E-94BC-F294A8EE0901@oracle.com>
 <176b5148-77fd-4793-c4eb-17b137a36f8e@oracle.com>
 <d2af2637-a38a-24d1-9ff9-a6444b0f1300@oracle.com>
 <74c89be0-8d0c-3c5d-0038-b16f3a59de06@oracle.com>
 <fa1ac8de-1004-a1a8-83b0-92e1c8e6d9d0@oracle.com>
 <d21397f4-1f94-3f9d-02b1-ef3b1e8852c0@oracle.com>
 <F6F486C1-07A3-4825-99AA-B235C7D9A76B@oracle.com>
 <9082e266-45e1-4483-f7c7-d2da58203ed6@oracle.com>
 <E9222DAC-6D64-45DD-8F1D-8A326EB1DDA5@oracle.com>
 <336e6baa-509a-1b92-5641-e793c9499ec2@oracle.com>
Message-ID: <a3b8beae-f093-c011-236f-4ac4b72ce754@oracle.com>

Hi Daniil,

> 
> You might also want to put the max size check in the check_concurrent_work code:
> 
> +   // Resize if we have more items than preferred load factor
> +   if ( load_factor > PREF_AVG_LIST_LEN && !_local_table->is_max_size_reached()) {
> 
> so that we don't keep waking up the service thread for nothing if the table gets 
> full.

Yes, that would be good; otherwise it seems fine.
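
A minimal sketch of the combined condition suggested above, under stated
assumptions: TableState, needs_resize and the threshold value 2.0 are
hypothetical stand-ins for illustration only, not the HotSpot code. Only
the names PREF_AVG_LIST_LEN and is_max_size_reached() come from the quoted
patch.

```cpp
#include <cassert>
#include <cstddef>

// Assumed value standing in for PREF_AVG_LIST_LEN; the real constant
// lives in the ThreadIdTable sources and may differ.
constexpr double kPrefAvgListLen = 2.0;

// Hypothetical, simplified view of a chained hash table's size state.
struct TableState {
  std::size_t items;        // number of entries currently stored
  std::size_t buckets;      // current backing array size
  std::size_t max_buckets;  // upper limit for the backing array

  // Average chain length: entries per bucket.
  double load_factor() const {
    return static_cast<double>(items) / static_cast<double>(buckets);
  }
  bool is_max_size_reached() const { return buckets >= max_buckets; }
};

// True only if growing is both desirable (chains too long) and possible
// (the table can still grow). Without the second test, a full table
// would keep requesting work that can never be done.
bool needs_resize(const TableState& t) {
  return t.load_factor() > kPrefAvgListLen && !t.is_max_size_reached();
}
```

With this shape, a table that has hit its maximum size stops asking for
work even though its chains are longer than preferred.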

> 
> Thanks,
> David
> -----
> 
>> Testing:  Mach5 tier1, tier2, and tier3 tests successfully passed.

And if you have not done so, you should test this with the benchmark you have as 
a stress test and see that this does what we think.

Thanks, Robbin


>>
>> Thank you!
>>
>> Best regards,
>> Daniil
>>
>> On 10/2/19, 3:26 PM, "David Holmes" <david.holmes at oracle.com> wrote:
>>
>>      Hi Daniil,
>>      On 3/10/2019 2:21 am, Daniil Titov wrote:
>>      > Hi David and Robbin,
>>      >
>>      > Could we consider making the ServiceThread responsible for the
>>      > ThreadIdTable resizing, in a similar way to how it works for
>>      > StringTable and ResolvedMethodTable, rather than having the
>>      > ThreadIdTable::add() method call ThreadIdTable::grow()?
>>      > As I understand it, this should solve the current issue and address
>>      > the concern that the resizing could take relatively long and that
>>      > doing it without polling for safepoints, or while holding the
>>      > Threads_lock, is not desirable.
>>      I originally rejected copying that part of the code from the other
>>      tables as it seems to introduce unnecessary complexity. Having a
>>      separate thread trying to grow the table when it could be concurrently
>>      having threads added and removed seems like it could introduce hard to
>>      diagnose performance pathologies. It also adds what we know to be a
>>      potentially long running action to the workload of the service thread,
>>      which means it may also impact the other tasks the service thread is
>>      doing, thus potentially introducing even more hard to diagnose
>>      performance pathologies.
>>      So this change does concern me. But go ahead and trial it.
>>      Thanks,
>>      David
>>      > Thank you,
>>      > Daniil

From daniil.x.titov at oracle.com  Sat Oct  5 03:23:36 2019
From: daniil.x.titov at oracle.com (Daniil Titov)
Date: Fri, 04 Oct 2019 20:23:36 -0700
Subject: jmx-dev RFR: 8231666: ThreadIdTable::grow() invokes invalid
 thread transition
In-Reply-To: <a3b8beae-f093-c011-236f-4ac4b72ce754@oracle.com>
References: <140927E0-55D7-490E-94BC-F294A8EE0901@oracle.com>
 <176b5148-77fd-4793-c4eb-17b137a36f8e@oracle.com>
 <d2af2637-a38a-24d1-9ff9-a6444b0f1300@oracle.com>
 <74c89be0-8d0c-3c5d-0038-b16f3a59de06@oracle.com>
 <fa1ac8de-1004-a1a8-83b0-92e1c8e6d9d0@oracle.com>
 <d21397f4-1f94-3f9d-02b1-ef3b1e8852c0@oracle.com>
 <F6F486C1-07A3-4825-99AA-B235C7D9A76B@oracle.com>
 <9082e266-45e1-4483-f7c7-d2da58203ed6@oracle.com>
 <E9222DAC-6D64-45DD-8F1D-8A326EB1DDA5@oracle.com>
 <336e6baa-509a-1b92-5641-e793c9499ec2@oracle.com>
 <a3b8beae-f093-c011-236f-4ac4b72ce754@oracle.com>
Message-ID: <70DAA7FB-EC57-410A-A22F-4E6BB8B17CB8@oracle.com>

Hi David and Robbin,

Please review a new version of the fix that adds the max size check to the check_concurrent_work code [1].

>    I don't think you need to repeat the load factor check here:
>    
>    void ThreadIdTable::do_concurrent_work(JavaThread* jt) {
>      assert(_is_initialized, "Thread table is not initialized");
>      _has_work = false;
>      double load_factor = get_load_factor();
>      log_debug(thread, table)("Concurrent work, load factor: %g", load_factor);
>      if (load_factor > PREF_AVG_LIST_LEN && !_local_table->is_max_size_reached()) {
>        grow(jt);
>      }
>    }
>    
>    as we will only execute this code if the load factor was seen to be too 
>    high.

I decided to leave it unchanged since, in my understanding, some threads could exit and be removed
from the table after the work was triggered but before the service thread called the
do_concurrent_work() method. In that case the load factor might be back to normal, and there would
be no need to increase the size of the thread table.
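
The scenario above can be illustrated with a small single-threaded sketch.
Everything here is an assumption made for illustration: ThreadTableSketch,
its members, the initial size of 4 buckets and the threshold 2.0 are
hypothetical, and the max size check is left out for brevity. It only
mimics the order of events (trigger, thread exit, deferred work), not the
real concurrent table.

```cpp
#include <cassert>
#include <cstddef>

constexpr double kPrefAvgListLen = 2.0;  // assumed threshold value

// Hypothetical model of a table whose resize is deferred to another thread.
struct ThreadTableSketch {
  std::size_t items = 0;
  std::size_t buckets = 4;
  bool has_work = false;  // set when a resize has been requested
  int grow_count = 0;     // how many times the table actually grew

  double load_factor() const {
    return static_cast<double>(items) / static_cast<double>(buckets);
  }

  // Called on the adding thread: only requests work, never grows.
  void add() {
    ++items;
    if (!has_work && load_factor() > kPrefAvgListLen) {
      has_work = true;  // in HotSpot this would notify the service thread
    }
  }

  void remove() { --items; }

  // Called later on the service thread: re-checks the load factor,
  // because entries may have been removed since the request was made.
  void do_concurrent_work() {
    has_work = false;
    if (load_factor() > kPrefAvgListLen) {
      buckets *= 2;
      ++grow_count;
    }
  }
};
```

If two threads exit between the ninth add() and do_concurrent_work(), the
load factor drops from 2.25 to 1.75 and the table keeps its 4 buckets.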

Testing: Mach5 tier1, tier2, and tier3 tests passed.

[1] Webrev:  http://cr.openjdk.java.net/~dtitov/8231666/webrev.03/ 
[2] Bug: https://bugs.openjdk.java.net/browse/JDK-8231666 

Thank you,
Daniil


From david.holmes at oracle.com  Sat Oct  5 03:58:06 2019
From: david.holmes at oracle.com (David Holmes)
Date: Sat, 5 Oct 2019 13:58:06 +1000
Subject: jmx-dev RFR: 8231666: ThreadIdTable::grow() invokes invalid
 thread transition
In-Reply-To: <70DAA7FB-EC57-410A-A22F-4E6BB8B17CB8@oracle.com>
References: <140927E0-55D7-490E-94BC-F294A8EE0901@oracle.com>
 <176b5148-77fd-4793-c4eb-17b137a36f8e@oracle.com>
 <d2af2637-a38a-24d1-9ff9-a6444b0f1300@oracle.com>
 <74c89be0-8d0c-3c5d-0038-b16f3a59de06@oracle.com>
 <fa1ac8de-1004-a1a8-83b0-92e1c8e6d9d0@oracle.com>
 <d21397f4-1f94-3f9d-02b1-ef3b1e8852c0@oracle.com>
 <F6F486C1-07A3-4825-99AA-B235C7D9A76B@oracle.com>
 <9082e266-45e1-4483-f7c7-d2da58203ed6@oracle.com>
 <E9222DAC-6D64-45DD-8F1D-8A326EB1DDA5@oracle.com>
 <336e6baa-509a-1b92-5641-e793c9499ec2@oracle.com>
 <a3b8beae-f093-c011-236f-4ac4b72ce754@oracle.com>
 <70DAA7FB-EC57-410A-A22F-4E6BB8B17CB8@oracle.com>
Message-ID: <84f73dbf-1211-0031-b07a-343746a28994@oracle.com>

Hi Daniil,

On 5/10/2019 1:23 pm, Daniil Titov wrote:
> Hi David and Robbin,
> 
> Please review a new version of the fix that adds the max size check to the check_concurrent_work code [1].

That change seems fine.

>>     I don't think you need to repeat the load factor check here:
>>     
>>     void ThreadIdTable::do_concurrent_work(JavaThread* jt) {
>>       assert(_is_initialized, "Thread table is not initialized");
>>       _has_work = false;
>>       double load_factor = get_load_factor();
>>       log_debug(thread, table)("Concurrent work, load factor: %g", load_factor);
>>       if (load_factor > PREF_AVG_LIST_LEN && !_local_table->is_max_size_reached()) {
>>         grow(jt);
>>       }
>>     }
>>     
>>     as we will only execute this code if the load factor was seen to be too
>>     high.
> 
> I decided to leave it unchanged since, in my understanding, some threads could exit and be removed
> from the table after the work was triggered but before the service thread called the
> do_concurrent_work() method. In that case the load factor might be back to normal, and there would
> be no need to increase the size of the thread table.

True, but if new threads get added again you could just repeat the 
process. This is a more stable process if you use an "edge trigger" 
rather than a "level trigger". But either way we are making assumptions 
about the pattern of adding and removing threads. So okay to leave as-is.
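
One way to read the edge/level distinction is sketched below. This is an
interpretation under loudly stated assumptions, not a measurement: the
simulate() function, the Outcome struct, the starting sizes and the
workload (one thread added and one exiting before the service thread runs,
repeated each round) are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>

constexpr double kPrefAvgListLen = 2.0;  // assumed threshold value

struct Outcome {
  int wakeups;  // times the service thread was asked to do work
  int grows;    // times the table actually grew
};

// Each round: a thread is added (pushing the load factor over the
// threshold) and another exits before the deferred work runs.
// recheck_level == true  : "level trigger", grow only if still too high.
// recheck_level == false : "edge trigger", grow because the event fired.
Outcome simulate(bool recheck_level, int rounds) {
  std::size_t items = 8;
  std::size_t buckets = 4;
  Outcome out{0, 0};
  for (int i = 0; i < rounds; ++i) {
    ++items;  // thread added
    bool triggered =
        static_cast<double>(items) / static_cast<double>(buckets) > kPrefAvgListLen;
    --items;  // thread exits before the service thread runs
    if (!triggered) continue;
    ++out.wakeups;
    bool still_high =
        static_cast<double>(items) / static_cast<double>(buckets) > kPrefAvgListLen;
    if (!recheck_level || still_high) {
      buckets *= 2;
      ++out.grows;
    }
  }
  return out;
}
```

Under this particular workload the level-checked variant is woken every
round without ever growing, while the edge variant grows once and is not
woken again. A different add/remove pattern would favour the re-check,
which is the point about assumptions made above.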

So, good to go.

Thanks,
David

> Testing: Mach5 tier1, tier2, and tier3 tests passed.
> 
> [1] Webrev:  http://cr.openjdk.java.net/~dtitov/8231666/webrev.03/
> [2] Bug: https://bugs.openjdk.java.net/browse/JDK-8231666
> 
> Thank you,
> Daniil
> 
> On 10/3/19, 11:30 PM, "Robbin Ehn" <robbin.ehn at oracle.com> wrote:
> 
>      Hi Daniil,
>      
>      >
>      > You might also want to put the max size check in the check_concurrent_work code:
>      >
>      > +   // Resize if we have more items than preferred load factor
>      > +   if ( load_factor > PREF_AVG_LIST_LEN && !_local_table->is_max_size_reached()) {
>      >
>      > so that we don't keep waking up the service thread for nothing if the table gets
>      > full.
>      
>      Yes, that would be good; otherwise it seems fine.
>      
>      >
>      > Thanks,
>      > David
>      > -----
>      >
>      >> Testing:  Mach5 tier1, tier2, and tier3 tests successfully passed.
>      
>      And if you have not done so, you should test this with the benchmark you have as
>      a stress test and see that this does what we think.
>      
>      Thanks, Robbin
>      
>      
>      >>
>      >> Thank you!
>      >>
>      >> Best regards,
>      >> Daniil
>      >>
>      >> ?On 10/2/19, 3:26 PM, "David Holmes" <david.holmes at oracle.com> wrote:
>      >>
>      >>      Hi Daniil,
>      >>      On 3/10/2019 2:21 am, Daniil Titov wrote:
>      >>      > Hi David and Robbin,
>      >>      >
>      >>      > Could we consider  making the ServiceThread responsible for the
>      >> ThreadIdTable resizing in the similar way how
>      >>      > it works for  StringTable  and ResolvedMethodTable, rather than having
>      >> ThreadIdTable::add() method calling ThreadIdTable::grow()?
>      >>      > As I understand It should solve  the current  issue and  address the
>      >> concern that  the doing the resizing could be a relatively long and
>      >>      > doing it without polling  for safepoints or while the holding
>      >> Threads_lock is not desirable.
>      >>      I originally rejected copying that part of the code from the other
>      >>      tables as it seems to introduce unnecessary complexity. Having a
>      >>      separate thread trying to grow the table when it could be concurrently
>      >>      having threads added and removed seems like it could introduce hard to
>      >>      diagnose performance pathologies. It also adds what we know to be a
>      >>      potentially long running action to the workload of the service thread,
>      >>      which means it may also impact the other tasks the service thread is
>      >>      doing, thus potentially introducing even more hard to diagnose
>      >>      performance pathologies.
>      >>      So this change does concern me. But go ahead and trial it.
>      >>      Thanks,
>      >>      David
>      >>      > Thank you,
>      >>      > Daniil
>      >>      >
>      >>      >
>      >>      > ?On 10/2/19, 6:25 AM, "David Holmes" <david.holmes at oracle.com> wrote:
>      >>      >
>      >>      >      Hi Robbin,
>      >>      >
>      >>      >      On 2/10/2019 7:58 pm, Robbin Ehn wrote:
>      >>      >      > Hi David,
>      >>      >      >
>      >>      >      >> What if the table is full and must be grown?
>      >>      >      >
>      >>      >      > The table uses chaining, it just means load factor tip over what is
>      >>      >      > considered a good backing array size.
>      >>      >
>      >>      >      Coleen raised a good question in a separate discussion, which made me
>      >>      >      realize that once the table has been initially populated all
>      >> subsequent
>      >>      >      additions, and hence all subsequent calls to grow() always happen
>      >> with
>      >>      >      the Threads_lock held. So we can't just defer the grow().
>      >>      >
>      >>      >      >> That aside, I'd like to know how expensive it is to grow this
>      >> table.
>      >>      >      >> What are we talking about here?
>      >>      >      >
>      >>      >      > We use global counter which on write_synchronize must scan all
>      >>      >      > threads to make sure they have seen the update (there some
>      >>      >      > optimization to avoid it if there is no readers at all). Since this
>      >>      >      > table contains the threads, we get double penalized, for each new
>      >>      >      > thread the synchronization cost increase AND the number of items.
>      >>      >      >
>      >>      >      > With concurrent reads you still need many thousands of threads, but
>      >>      >      > I think I saw someone mentioning 100k threads, assuming concurrent
>      >>      >      > queries the resize can take hundreds of ms to finish. Note that
>      >> reads
>      >>      >      > and inserts still in operate roughly at the same speed while
>      >>      >      > resizing. So a longer resize is only problematic if we do not
>      >>      >      > respect safepoints.
>      >>      >      I think if anything were capable of running 100K threads we would be
>      >>      >      hitting far worse scalability bottlenecks than this. But this does
>      >> seem
>      >>      >      problematic.
>      >>      >
>      >>      >      Thanks,
>      >>      >      David
>      >>      >      -----
>      >>      >
>      >>      >      > Thanks, Robbin
>      >>      >      >
>      >>      >      >>
>      >>      >      >> David
>      >>      >      >>
>      >>      >      >>> /Robbin
>      >>      >      >>>
>      >>      >      >>> On 2019-10-02 08:46, David Holmes wrote:
>      >>      >      >>>> Hi Daniil,
>      >>      >      >>>>
>      >>      >      >>>> On 2/10/2019 4:13 pm, Daniil Titov wrote:
>      >>      >      >>>>> Please review a change that fixes the issue. The problem
>      >> here is
>      >>      >      >>>>> that that the thread is added to the ThreadIdTable
>      >> (introduced in
>      >>      >      >>>>> [3]) while the Threads_lock is held by
>      >>      >      >>>>> JVM_StartThread. When new thread is added  to the thread
>      >> table the
>      >>      >      >>>>> table checks if its load factor is greater than required and
>      >> if so
>      >>      >      >>>>> it grows itself while polling for safepoints.
>      >>      >      >>>>> After changes [4]  an attempt to block the thread while
>      >> holding the
>      >>      >      >>>>> Threads_lock  results in assertion in
>      >>      >      >>>>> Thread::check_possible_safepoint().
>      >>      >      >>>>>
>      >>      >      >>>>> The fix  proposed by David Holmes ( thank you, David!)  is
>      >> to skip
>      >>      >      >>>>> the ThreadBlockInVM inside ThreadIdTable::grow() method if the
>      >>      >      >>>>> current thread owns the Threads_lock.
>      >>      >      >>>>
>      >>      >      >>>> Sorry but looking at the fix in context now I think it would be
>      >>      >      >>>> better to do this:
>      >>      >      >>>>
>      >>      >      >>>>      while (gt.do_task(jt)) {
>      >>      >      >>>>        if (Threads_lock->owner() != jt) {
>      >>      >      >>>>          gt.pause(jt);
>      >>      >      >>>>          ThreadBlockInVM tbivm(jt);
>      >>      >      >>>>          gt.cont(jt);
>      >>      >      >>>>        }
>      >>      >      >>>>      }
>      >>      >      >>>>
>      >>      >      >>>> This way we don't waste time with the pause/cont when there's no
>      >>      >      >>>> safepoint pause going to happen - and the owner() check is
>      >> quicker
>      >>      >      >>>> than owned_by_self(). That partially addresses a general
>      >> concern I
>      >>      >      >>>> have about how long it may take to grow the table, as we are
>      >>      >      >>>> deferring safepoints until it is complete in this
>      >> JVM_StartThread
>      >>      >      >>>> usecase.
>      >>      >      >>>>
>      >>      >      >>>> In the test you don't need all of:
>      >>      >      >>>>
>      >>      >      >>>>    32  * @run clean ThreadStartTest
>      >>      >      >>>>    33  * @run build ThreadStartTest
>      >>      >      >>>>    34  * @run main ThreadStartTest
>      >>      >      >>>>
>      >>      >      >>>> just the last @run suffices to build and run the test.
>      >>      >      >>>>
>      >>      >      >>>> Thanks,
>      >>      >      >>>> David
>      >>      >      >>>> -----
>      >>      >      >>>>
>      >>      >      >>>>> Testing : Mach 5 tier1 and tier2 completed successfully,
>      >> tier3 is
>      >>      >      >>>>> in progress.
>      >>      >      >>>>>
>      >>      >      >>>>> [1] Webrev:
>      >> http://cr.openjdk.java.net/~dtitov/8231666/webrev.01/
>      >>      >      >>>>> [2] Bug: https://bugs.openjdk.java.net/browse/JDK-8231666
>      >>      >      >>>>> [3] https://bugs.openjdk.java.net/browse/JDK-8185005
>      >>      >      >>>>> [4] https://bugs.openjdk.java.net/browse/JDK-8184732
>      >>      >      >>>>>
>      >>      >      >>>>> Best regards,
>      >>      >      >>>>> Daniil
>      >>      >      >>>>>
>      >>      >      >>>>>
>      >>      >
>      >>      >
>      >>      >
>      >>
>      >>
>      
> 
> 

From robbin.ehn at oracle.com  Mon Oct  7 07:34:52 2019
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Mon, 7 Oct 2019 09:34:52 +0200
Subject: jmx-dev RFR: 8231666: ThreadIdTable::grow() invokes invalid
 thread transition
In-Reply-To: <84f73dbf-1211-0031-b07a-343746a28994@oracle.com>
References: <140927E0-55D7-490E-94BC-F294A8EE0901@oracle.com>
 <176b5148-77fd-4793-c4eb-17b137a36f8e@oracle.com>
 <d2af2637-a38a-24d1-9ff9-a6444b0f1300@oracle.com>
 <74c89be0-8d0c-3c5d-0038-b16f3a59de06@oracle.com>
 <fa1ac8de-1004-a1a8-83b0-92e1c8e6d9d0@oracle.com>
 <d21397f4-1f94-3f9d-02b1-ef3b1e8852c0@oracle.com>
 <F6F486C1-07A3-4825-99AA-B235C7D9A76B@oracle.com>
 <9082e266-45e1-4483-f7c7-d2da58203ed6@oracle.com>
 <E9222DAC-6D64-45DD-8F1D-8A326EB1DDA5@oracle.com>
 <336e6baa-509a-1b92-5641-e793c9499ec2@oracle.com>
 <a3b8beae-f093-c011-236f-4ac4b72ce754@oracle.com>
 <70DAA7FB-EC57-410A-A22F-4E6BB8B17CB8@oracle.com>
 <84f73dbf-1211-0031-b07a-343746a28994@oracle.com>
Message-ID: <f8ed404f-e4c7-7b6a-aa20-1f660e84449f@oracle.com>

Hi Daniil,

Yes, good, but:

>>      >> Testing:  Mach5 tier1, tier2, and tier3 tests successfully passed.
>>      And if you have not done so, you should test this with the benchmark you have as
>>      a stress test and see that this does what we think.

Can you please test it with your benchmark, if you have not done so?

/Robbin


From daniil.x.titov at oracle.com  Mon Oct  7 16:41:24 2019
From: daniil.x.titov at oracle.com (Daniil Titov)
Date: Mon, 07 Oct 2019 09:41:24 -0700
Subject: jmx-dev RFR: 8231666: ThreadIdTable::grow() invokes invalid
 thread transition
In-Reply-To: <f8ed404f-e4c7-7b6a-aa20-1f660e84449f@oracle.com>
References: <140927E0-55D7-490E-94BC-F294A8EE0901@oracle.com>
 <176b5148-77fd-4793-c4eb-17b137a36f8e@oracle.com>
 <d2af2637-a38a-24d1-9ff9-a6444b0f1300@oracle.com>
 <74c89be0-8d0c-3c5d-0038-b16f3a59de06@oracle.com>
 <fa1ac8de-1004-a1a8-83b0-92e1c8e6d9d0@oracle.com>
 <d21397f4-1f94-3f9d-02b1-ef3b1e8852c0@oracle.com>
 <F6F486C1-07A3-4825-99AA-B235C7D9A76B@oracle.com>
 <9082e266-45e1-4483-f7c7-d2da58203ed6@oracle.com>
 <E9222DAC-6D64-45DD-8F1D-8A326EB1DDA5@oracle.com>
 <336e6baa-509a-1b92-5641-e793c9499ec2@oracle.com>
 <a3b8beae-f093-c011-236f-4ac4b72ce754@oracle.com>
 <70DAA7FB-EC57-410A-A22F-4E6BB8B17CB8@oracle.com>
 <84f73dbf-1211-0031-b07a-343746a28994@oracle.com>
 <f8ed404f-e4c7-7b6a-aa20-1f660e84449f@oracle.com>
Message-ID: <137B65A2-5662-4569-A813-20510068DDF5@oracle.com>

Hi Robbin,

Yes, I ran my benchmark [1]. Please see below the output showing that the table was grown.

../jdk/build/linux-x64-release/images/jdk/bin/java -cp . -Xlog:thread+table=info ThreadStartupTest
Starting the test
[0.185s][info][thread,table] Grown to size:512
The test finished.
Execution time:15673 ms
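
For context on the "Grown to size" line above: a chained table grows once its load factor passes a threshold, and the idea discussed in this thread is to skip the pause between grow steps when the calling thread already holds Threads_lock. The Java sketch below only models that control flow. GrowSketch, THREADS_LOCK, MAX_LOAD, and pausesTaken are hypothetical names, ReentrantLock stands in for the VM lock, Thread.yield() stands in for the ThreadBlockInVM pause, and none of this is the HotSpot implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Single-threaded model of a chained table that grows on load factor.
// All names are illustrative; this is not HotSpot code.
class GrowSketch {
    // Hypothetical stand-in for Threads_lock.
    static final ReentrantLock THREADS_LOCK = new ReentrantLock();
    // Assumed threshold: average chain length above which the table grows.
    static final double MAX_LOAD = 2.0;

    private List<List<Long>> buckets = newBuckets(4);
    private int size = 0;
    // Counts how often grow() paused between chains.
    int pausesTaken = 0;

    private static List<List<Long>> newBuckets(int n) {
        List<List<Long>> b = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            b.add(new ArrayList<>());
        }
        return b;
    }

    private static int indexFor(long id, int capacity) {
        return Math.floorMod(Long.hashCode(id), capacity);
    }

    int capacity() {
        return buckets.size();
    }

    boolean contains(long id) {
        return buckets.get(indexFor(id, buckets.size())).contains(id);
    }

    void add(long id) {
        buckets.get(indexFor(id, buckets.size())).add(id);
        size++;
        // Chaining never "fills up"; growth is purely a load-factor decision.
        if ((double) size / buckets.size() > MAX_LOAD) {
            grow();
        }
    }

    private void grow() {
        List<List<Long>> old = buckets;
        List<List<Long>> bigger = newBuckets(old.size() * 2);
        for (List<Long> chain : old) {
            for (long id : chain) {
                bigger.get(indexFor(id, bigger.size())).add(id);
            }
            // Pause between chains only if the caller does NOT hold the lock;
            // a lock owner must not block here, so it skips the pause.
            if (!THREADS_LOCK.isHeldByCurrentThread()) {
                pausesTaken++;
                Thread.yield();
            }
        }
        buckets = bigger;
        System.out.println("Grown to size:" + buckets.size());
    }

    public static void main(String[] args) {
        GrowSketch table = new GrowSketch();
        for (long id = 0; id < 9; id++) {
            table.add(id);
        }
        System.out.println("pauses without lock: " + table.pausesTaken);
    }
}
```

The trade-off raised in the review still applies to this model: when the lock is held, the whole grow runs without any pause, so its duration matters.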


[1] https://cr.openjdk.java.net/~dtitov/tests/ThreadStartupTest.java
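
The linked ThreadStartupTest.java is not reproduced here. As a rough idea of what a thread-start stress test of this kind might look like, the sketch below starts many threads and looks each one up by id while it is still alive. ThreadStartSketch and startAndQuery are hypothetical names, the thread count is arbitrary, and the sketch assumes that id-based ThreadMXBean lookups are what exercise the thread table; it is not the benchmark used for the numbers above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical thread-start stress test; not the actual ThreadStartupTest.
class ThreadStartSketch {

    // Starts `count` threads that wait on a latch, looks each one up by id
    // while it is alive, then releases and joins them all.
    // Returns how many of the lookups found the thread.
    static int startAndQuery(int count) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        CountDownLatch release = new CountDownLatch(1);
        List<Thread> threads = new ArrayList<>();
        int found = 0;
        try {
            for (int i = 0; i < count; i++) {
                Thread t = new Thread(() -> {
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
                t.start();
                threads.add(t);
                // getThreadInfo(long) returns null for threads that are not alive.
                if (mx.getThreadInfo(t.getId()) != null) {
                    found++;
                }
            }
        } finally {
            // Always let the worker threads finish, even on failure.
            release.countDown();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int found = startAndQuery(1000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Threads found by id: " + found);
        System.out.println("Execution time: " + elapsedMs + " ms");
    }
}
```

Running it with -Xlog:thread+table=info, as in the command above, would be one way to check whether a grow was actually triggered on a given build.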

Thanks!
Daniil


On 10/7/19, 12:34 AM, "Robbin Ehn" <robbin.ehn at oracle.com> wrote:

    Hi Daniil,
    
    Yes, good, but:
    
    >>      >> Testing:  Mach5 tier1, tier2, and tier3 tests successfully passed.
    >>      And if you have not done so, you should test this with the benchmark you 
    >> have as
    >>      a stress test and see that this does what we think.
    
    Can you please test it with your benchmark, if you have not done so?
    
    /Robbin
    
    >>      Thanks, Robbin
    >>      >>
    >>      >> Thank you!
    >>      >>
    >>      >> Best regards,
    >>      >> Daniil
    >>      >>
    >>      >> On 10/2/19, 3:26 PM, "David Holmes" <david.holmes at oracle.com> wrote:
    >>      >>
    >>      >>      Hi Daniil,
    >>      >>      On 3/10/2019 2:21 am, Daniil Titov wrote:
    >>      >>      > Hi David and Robbin,
    >>      >>      >
    >>      >>      > Could we consider making the ServiceThread responsible for the
    >>      >>      > ThreadIdTable resizing, similar to how it works for StringTable and
    >>      >>      > ResolvedMethodTable, rather than having the ThreadIdTable::add()
    >>      >>      > method call ThreadIdTable::grow()?
    >>      >>      > As I understand it, this should solve the current issue and address
    >>      >>      > the concern that the resizing could take a relatively long time, and
    >>      >>      > that doing it without polling for safepoints or while holding the
    >>      >>      > Threads_lock is not desirable.
    >>      >>      I originally rejected copying that part of the code from the other
    >>      >>      tables as it seems to introduce unnecessary complexity. Having a
    >>      >>      separate thread trying to grow the table when it could be 
    >> concurrently
    >>      >>      having threads added and removed seems like it could introduce 
    >> hard to
    >>      >>      diagnose performance pathologies. It also adds what we know to be a
    >>      >>      potentially long running action to the workload of the service 
    >> thread,
    >>      >>      which means it may also impact the other tasks the service thread is
    >>      >>      doing, thus potentially introducing even more hard to diagnose
    >>      >>      performance pathologies.
    >>      >>      So this change does concern me. But go ahead and trial it.
    >>      >>      Thanks,
    >>      >>      David
    >>      >>      > Thank you,
    >>      >>      > Daniil
    >>      >>      >
    >>      >>      >
    >>      >>      > On 10/2/19, 6:25 AM, "David Holmes" <david.holmes at oracle.com> wrote:
    >>      >>      >
    >>      >>      >      Hi Robbin,
    >>      >>      >
    >>      >>      >      On 2/10/2019 7:58 pm, Robbin Ehn wrote:
    >>      >>      >      > Hi David,
    >>      >>      >      >
    >>      >>      >      >> What if the table is full and must be grown?
    >>      >>      >      >
    >>      >>      >      > The table uses chaining; it just means the load factor tips over
    >>      >>      >      > what is considered a good backing array size.
    >>      >>      >
    >>      >>      >      Coleen raised a good question in a separate discussion, 
    >> which made me
    >>      >>      >      realize that once the table has been initially populated all
    >>      >> subsequent
    >>      >>      >      additions, and hence all subsequent calls to grow() always 
    >> happen
    >>      >> with
    >>      >>      >      the Threads_lock held. So we can't just defer the grow().
    >>      >>      >
    >>      >>      >      >> That aside, I'd like to know how expensive it is to 
    >> grow this
    >>      >> table.
    >>      >>      >      >> What are we talking about here?
    >>      >>      >      >
    >>      >>      >      > We use a global counter, which on write_synchronize must scan all
    >>      >>      >      > threads to make sure they have seen the update (there is some
    >>      >>      >      > optimization to avoid it if there are no readers at all). Since
    >>      >>      >      > this table contains the threads, we get doubly penalized: for each
    >>      >>      >      > new thread, both the synchronization cost AND the number of items
    >>      >>      >      > increase.
    >>      >>      >      >
    >>      >>      >      > With concurrent reads you still need many thousands of threads, but
    >>      >>      >      > I think I saw someone mentioning 100k threads; assuming concurrent
    >>      >>      >      > queries, the resize can take hundreds of ms to finish. Note that
    >>      >>      >      > reads and inserts still operate at roughly the same speed while
    >>      >>      >      > resizing. So a longer resize is only problematic if we do not
    >>      >>      >      > respect safepoints.
    >>      >>      >      I think if anything were capable of running 100K threads 
    >> we would be
    >>      >>      >      hitting far worse scalability bottlenecks than this. But 
    >> this does
    >>      >> seem
    >>      >>      >      problematic.
    >>      >>      >
    >>      >>      >      Thanks,
    >>      >>      >      David
    >>      >>      >      -----
    >>      >>      >
    >>      >>      >      > Thanks, Robbin
    >>      >>      >      >
    >>      >>      >      >>
    >>      >>      >      >> David
    >>      >>      >      >>
    >>      >>      >      >>> /Robbin
    >>      >>      >      >>>
    >>      >>      >      >>> On 2019-10-02 08:46, David Holmes wrote:
    >>      >>      >      >>>> Hi Daniil,
    >>      >>      >      >>>>
    >>      >>      >      >>>> On 2/10/2019 4:13 pm, Daniil Titov wrote:
    >>      >>      >      >>>>> Please review a change that fixes the issue. The 
    >> problem
    >>      >> here is
    >>      >>      >      >>>>> that the thread is added to the ThreadIdTable
    >>      >> (introduced in
    >>      >>      >      >>>>> [3]) while the Threads_lock is held by
    >>      >>      >      >>>>> JVM_StartThread. When new thread is added  to the 
    >> thread
    >>      >> table the
    >>      >>      >      >>>>> table checks if its load factor is greater than 
    >> required and
    >>      >> if so
    >>      >>      >      >>>>> it grows itself while polling for safepoints.
    >>      >>      >      >>>>> After changes [4]  an attempt to block the thread while
    >>      >> holding the
    >>      >>      >      >>>>> Threads_lock  results in assertion in
    >>      >>      >      >>>>> Thread::check_possible_safepoint().
    >>      >>      >      >>>>>
    >>      >>      >      >>>>> The fix  proposed by David Holmes ( thank you, 
    >> David!)  is
    >>      >> to skip
    >>      >>      >      >>>>> the ThreadBlockInVM inside ThreadIdTable::grow() 
    >> method if the
    >>      >>      >      >>>>> current thread owns the Threads_lock.
    >>      >>      >      >>>>
    >>      >>      >      >>>> Sorry but looking at the fix in context now I think 
    >> it would be
    >>      >>      >      >>>> better to do this:
    >>      >>      >      >>>>
    >>      >>      >      >>>>      while (gt.do_task(jt)) {
    >>      >>      >      >>>>        if (Threads_lock->owner() != jt) {
    >>      >>      >      >>>>          gt.pause(jt);
    >>      >>      >      >>>>          ThreadBlockInVM tbivm(jt);
    >>      >>      >      >>>>          gt.cont(jt);
    >>      >>      >      >>>>        }
    >>      >>      >      >>>>      }
    >>      >>      >      >>>>
    >>      >>      >      >>>> This way we don't waste time with the pause/cont when 
    >> there's no
    >>      >>      >      >>>> safepoint pause going to happen - and the owner() 
    >> check is
    >>      >> quicker
    >>      >>      >      >>>> than owned_by_self(). That partially addresses a general
    >>      >> concern I
    >>      >>      >      >>>> have about how long it may take to grow the table, as 
    >> we are
    >>      >>      >      >>>> deferring safepoints until it is complete in this
    >>      >> JVM_StartThread
    >>      >>      >      >>>> usecase.
    >>      >>      >      >>>>
    >>      >>      >      >>>> In the test you don't need all of:
    >>      >>      >      >>>>
    >>      >>      >      >>>>    32  * @run clean ThreadStartTest
    >>      >>      >      >>>>    33  * @run build ThreadStartTest
    >>      >>      >      >>>>    34  * @run main ThreadStartTest
    >>      >>      >      >>>>
    >>      >>      >      >>>> just the last @run suffices to build and run the test.
    >>      >>      >      >>>>
    >>      >>      >      >>>> Thanks,
    >>      >>      >      >>>> David
    >>      >>      >      >>>> -----
    >>      >>      >      >>>>
    >>      >>      >      >>>>> Testing : Mach 5 tier1 and tier2 completed 
    >> successfully,
    >>      >> tier3 is
    >>      >>      >      >>>>> in progress.
    >>      >>      >      >>>>>
    >>      >>      >      >>>>> [1] Webrev:
    >>      >> http://cr.openjdk.java.net/~dtitov/8231666/webrev.01/
    >>      >>      >      >>>>> [2] Bug: 
    >> https://bugs.openjdk.java.net/browse/JDK-8231666
    >>      >>      >      >>>>> [3] https://bugs.openjdk.java.net/browse/JDK-8185005
    >>      >>      >      >>>>> [4] https://bugs.openjdk.java.net/browse/JDK-8184732
    >>      >>      >      >>>>>
    >>      >>      >      >>>>> Best regards,
    >>      >>      >      >>>>> Daniil
    >>      >>      >      >>>>>
    >>      >>      >      >>>>>
    >>      >>      >
    >>      >>      >
    >>      >>      >
    >>      >>
    >>      >>
    >>
    >>
    

From robbin.ehn at oracle.com  Tue Oct  8 08:49:14 2019
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Tue, 8 Oct 2019 10:49:14 +0200
Subject: jmx-dev RFR: 8231666: ThreadIdTable::grow() invokes invalid
 thread transition
In-Reply-To: <137B65A2-5662-4569-A813-20510068DDF5@oracle.com>
References: <140927E0-55D7-490E-94BC-F294A8EE0901@oracle.com>
 <176b5148-77fd-4793-c4eb-17b137a36f8e@oracle.com>
 <d2af2637-a38a-24d1-9ff9-a6444b0f1300@oracle.com>
 <74c89be0-8d0c-3c5d-0038-b16f3a59de06@oracle.com>
 <fa1ac8de-1004-a1a8-83b0-92e1c8e6d9d0@oracle.com>
 <d21397f4-1f94-3f9d-02b1-ef3b1e8852c0@oracle.com>
 <F6F486C1-07A3-4825-99AA-B235C7D9A76B@oracle.com>
 <9082e266-45e1-4483-f7c7-d2da58203ed6@oracle.com>
 <E9222DAC-6D64-45DD-8F1D-8A326EB1DDA5@oracle.com>
 <336e6baa-509a-1b92-5641-e793c9499ec2@oracle.com>
 <a3b8beae-f093-c011-236f-4ac4b72ce754@oracle.com>
 <70DAA7FB-EC57-410A-A22F-4E6BB8B17CB8@oracle.com>
 <84f73dbf-1211-0031-b07a-343746a28994@oracle.com>
 <f8ed404f-e4c7-7b6a-aa20-1f660e84449f@oracle.com>
 <137B65A2-5662-4569-A813-20510068DDF5@oracle.com>
Message-ID: <e333ec68-1ca6-8036-043c-e330252e71fe@oracle.com>

Great, thanks!

/Robbin

On 2019-10-07 18:41, Daniil Titov wrote:
> Hi Robbin,
> 
> Yes, I ran my benchmark [1]. Please see below the output showing that the table was grown.
> 
> ../jdk/build/linux-x64-release/images/jdk/bin/java -cp . -Xlog:thread+table=info ThreadStartupTest
> Starting the test
> [0.185s][info][thread,table] Grown to size:512
> The test finished.
> Execution time:15673 ms
> 
> 
> [1] https://cr.openjdk.java.net/~dtitov/tests/ThreadStartupTest.java
> 
> Thanks!
> Daniil
> 
> 
> On 10/7/19, 12:34 AM, "Robbin Ehn" <robbin.ehn at oracle.com> wrote:
> 
>      Hi Daniil,
>      
>      Yes, good, but:
>      
>      >>      >> Testing:  Mach5 tier1, tier2, and tier3 tests successfully passed.
>      >>      And if you have not done so, you should test this with the benchmark you
>      >> have as
>      >>      a stress test and see that this does what we think.
>      
>      Can you please test it with your benchmark, if you have not done so?
>      
>      /Robbin
>      
>      >>      Thanks, Robbin
>      >>      >>
>      >>      >> Thank you!
>      >>      >>
>      >>      >> Best regards,
>      >>      >> Daniil
>      >>      >>
>      >>      >> On 10/2/19, 3:26 PM, "David Holmes" <david.holmes at oracle.com> wrote:
>      >>      >>
>      >>      >>      Hi Daniil,
>      >>      >>      On 3/10/2019 2:21 am, Daniil Titov wrote:
>      >>      >>      > Hi David and Robbin,
>      >>      >>      >
>      >>      >>      > Could we consider making the ServiceThread responsible for the
>      >>      >>      > ThreadIdTable resizing, similar to how it works for StringTable and
>      >>      >>      > ResolvedMethodTable, rather than having the ThreadIdTable::add()
>      >>      >>      > method call ThreadIdTable::grow()?
>      >>      >>      > As I understand it, this should solve the current issue and address
>      >>      >>      > the concern that the resizing could take a relatively long time, and
>      >>      >>      > that doing it without polling for safepoints or while holding the
>      >>      >>      > Threads_lock is not desirable.
>      >>      >>      I originally rejected copying that part of the code from the other
>      >>      >>      tables as it seems to introduce unnecessary complexity. Having a
>      >>      >>      separate thread trying to grow the table when it could be
>      >> concurrently
>      >>      >>      having threads added and removed seems like it could introduce
>      >> hard to
>      >>      >>      diagnose performance pathologies. It also adds what we know to be a
>      >>      >>      potentially long running action to the workload of the service
>      >> thread,
>      >>      >>      which means it may also impact the other tasks the service thread is
>      >>      >>      doing, thus potentially introducing even more hard to diagnose
>      >>      >>      performance pathologies.
>      >>      >>      So this change does concern me. But go ahead and trial it.
>      >>      >>      Thanks,
>      >>      >>      David
>      >>      >>      > Thank you,
>      >>      >>      > Daniil
>      >>      >>      >
>      >>      >>      >
>      >>      >>      > On 10/2/19, 6:25 AM, "David Holmes" <david.holmes at oracle.com> wrote:
>      >>      >>      >
>      >>      >>      >      Hi Robbin,
>      >>      >>      >
>      >>      >>      >      On 2/10/2019 7:58 pm, Robbin Ehn wrote:
>      >>      >>      >      > Hi David,
>      >>      >>      >      >
>      >>      >>      >      >> What if the table is full and must be grown?
>      >>      >>      >      >
>      >>      >>      >      > The table uses chaining; it just means the load factor tips over
>      >>      >>      >      > what is considered a good backing array size.
>      >>      >>      >
>      >>      >>      >      Coleen raised a good question in a separate discussion,
>      >> which made me
>      >>      >>      >      realize that once the table has been initially populated all
>      >>      >> subsequent
>      >>      >>      >      additions, and hence all subsequent calls to grow() always
>      >> happen
>      >>      >> with
>      >>      >>      >      the Threads_lock held. So we can't just defer the grow().
>      >>      >>      >
>      >>      >>      >      >> That aside, I'd like to know how expensive it is to
>      >> grow this
>      >>      >> table.
>      >>      >>      >      >> What are we talking about here?
>      >>      >>      >      >
>      >>      >>      >      > We use a global counter, which on write_synchronize must scan all
>      >>      >>      >      > threads to make sure they have seen the update (there is some
>      >>      >>      >      > optimization to avoid it if there are no readers at all). Since
>      >>      >>      >      > this table contains the threads, we get doubly penalized: for each
>      >>      >>      >      > new thread, both the synchronization cost AND the number of items
>      >>      >>      >      > increase.
>      >>      >>      >      >
>      >>      >>      >      > With concurrent reads you still need many thousands of threads, but
>      >>      >>      >      > I think I saw someone mentioning 100k threads; assuming concurrent
>      >>      >>      >      > queries, the resize can take hundreds of ms to finish. Note that
>      >>      >>      >      > reads and inserts still operate at roughly the same speed while
>      >>      >>      >      > resizing. So a longer resize is only problematic if we do not
>      >>      >>      >      > respect safepoints.
>      >>      >>      >      I think if anything were capable of running 100K threads
>      >> we would be
>      >>      >>      >      hitting far worse scalability bottlenecks than this. But
>      >> this does
>      >>      >> seem
>      >>      >>      >      problematic.
>      >>      >>      >
>      >>      >>      >      Thanks,
>      >>      >>      >      David
>      >>      >>      >      -----
>      >>      >>      >
>      >>      >>      >      > Thanks, Robbin
>      >>      >>      >      >
>      >>      >>      >      >>
>      >>      >>      >      >> David
>      >>      >>      >      >>
>      >>      >>      >      >>> /Robbin
>      >>      >>      >      >>>
>      >>      >>      >      >>> On 2019-10-02 08:46, David Holmes wrote:
>      >>      >>      >      >>>> Hi Daniil,
>      >>      >>      >      >>>>
>      >>      >>      >      >>>> On 2/10/2019 4:13 pm, Daniil Titov wrote:
>      >>      >>      >      >>>>> Please review a change that fixes the issue. The
>      >> problem
>      >>      >> here is
>      >>      >>      >      >>>>> that the thread is added to the ThreadIdTable
>      >>      >> (introduced in
>      >>      >>      >      >>>>> [3]) while the Threads_lock is held by
>      >>      >>      >      >>>>> JVM_StartThread. When new thread is added  to the
>      >> thread
>      >>      >> table the
>      >>      >>      >      >>>>> table checks if its load factor is greater than
>      >> required and
>      >>      >> if so
>      >>      >>      >      >>>>> it grows itself while polling for safepoints.
>      >>      >>      >      >>>>> After changes [4]  an attempt to block the thread while
>      >>      >> holding the
>      >>      >>      >      >>>>> Threads_lock  results in assertion in
>      >>      >>      >      >>>>> Thread::check_possible_safepoint().
>      >>      >>      >      >>>>>
>      >>      >>      >      >>>>> The fix  proposed by David Holmes ( thank you,
>      >> David!)  is
>      >>      >> to skip
>      >>      >>      >      >>>>> the ThreadBlockInVM inside ThreadIdTable::grow()
>      >> method if the
>      >>      >>      >      >>>>> current thread owns the Threads_lock.
>      >>      >>      >      >>>>
>      >>      >>      >      >>>> Sorry but looking at the fix in context now I think
>      >> it would be
>      >>      >>      >      >>>> better to do this:
>      >>      >>      >      >>>>
>      >>      >>      >      >>>>      while (gt.do_task(jt)) {
>      >>      >>      >      >>>>        if (Threads_lock->owner() != jt) {
>      >>      >>      >      >>>>          gt.pause(jt);
>      >>      >>      >      >>>>          ThreadBlockInVM tbivm(jt);
>      >>      >>      >      >>>>          gt.cont(jt);
>      >>      >>      >      >>>>        }
>      >>      >>      >      >>>>      }
>      >>      >>      >      >>>>
>      >>      >>      >      >>>> This way we don't waste time with the pause/cont when
>      >> there's no
>      >>      >>      >      >>>> safepoint pause going to happen - and the owner()
>      >> check is
>      >>      >> quicker
>      >>      >>      >      >>>> than owned_by_self(). That partially addresses a general
>      >>      >> concern I
>      >>      >>      >      >>>> have about how long it may take to grow the table, as
>      >> we are
>      >>      >>      >      >>>> deferring safepoints until it is complete in this
>      >>      >> JVM_StartThread
>      >>      >>      >      >>>> usecase.
>      >>      >>      >      >>>>
>      >>      >>      >      >>>> In the test you don't need all of:
>      >>      >>      >      >>>>
>      >>      >>      >      >>>>    32  * @run clean ThreadStartTest
>      >>      >>      >      >>>>    33  * @run build ThreadStartTest
>      >>      >>      >      >>>>    34  * @run main ThreadStartTest
>      >>      >>      >      >>>>
>      >>      >>      >      >>>> just the last @run suffices to build and run the test.
>      >>      >>      >      >>>>
>      >>      >>      >      >>>> Thanks,
>      >>      >>      >      >>>> David
>      >>      >>      >      >>>> -----
>      >>      >>      >      >>>>
>      >>      >>      >      >>>>> Testing : Mach 5 tier1 and tier2 completed
>      >> successfully,
>      >>      >> tier3 is
>      >>      >>      >      >>>>> in progress.
>      >>      >>      >      >>>>>
>      >>      >>      >      >>>>> [1] Webrev:
>      >>      >> http://cr.openjdk.java.net/~dtitov/8231666/webrev.01/
>      >>      >>      >      >>>>> [2] Bug:
>      >> https://bugs.openjdk.java.net/browse/JDK-8231666
>      >>      >>      >      >>>>> [3] https://bugs.openjdk.java.net/browse/JDK-8185005
>      >>      >>      >      >>>>> [4] https://bugs.openjdk.java.net/browse/JDK-8184732
>      >>      >>      >      >>>>>
>      >>      >>      >      >>>>> Best regards,
>      >>      >>      >      >>>>> Daniil
>      >>      >>      >      >>>>>
>      >>      >>      >      >>>>>
>      >>      >>      >
>      >>      >>      >
>      >>      >>      >
>      >>      >>
>      >>      >>
>      >>
>      >>
>      
> 
> 

From daniel.daugherty at oracle.com  Tue Oct  8 14:49:26 2019
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Tue, 8 Oct 2019 10:49:26 -0400
Subject: jmx-dev RFR: 8170299: Debugger does not stop inside the low
 memory notifications code
In-Reply-To: <4B9AB4C0-EA29-4D0E-9010-3A635454FE30@oracle.com>
References: <75B5F778-DC49-494B-AC12-270F301677CA@oracle.com>
 <60639d41-735a-00d3-c9db-1955f581b89a@oracle.com>
 <E2BFE392-EF77-4072-9D7C-65C088D4F007@oracle.com>
 <9783ca89-0af8-2167-436a-e5ff2db631a3@oracle.com>
 <9a805686-57bf-c158-a777-c3cb7e38f09f@oracle.com>
 <194bd23a-0f16-19a9-a3e7-d02fa6d58369@oracle.com>
 <4B9AB4C0-EA29-4D0E-9010-3A635454FE30@oracle.com>
Message-ID: <9c8b3d50-bd75-5312-f15d-e584d1ae81f7@oracle.com>

Thumbs up!

Dan


On 10/1/19 4:57 PM, Daniil Titov wrote:
> Hello,
>
> Please review a new version of the change [1] that fixes the problem with the debugger not stopping in the low memory notification code. The fix moves the task of sending notifications from
> the non-visible ServiceThread to a new, visible NotificationThread. This version of the change also introduces a new VM option to opt out of the new behavior.
>
> Previous email threads:
> https://mail.openjdk.java.net/pipermail/serviceability-dev/2019-August/028863.html
> https://mail.openjdk.java.net/pipermail/serviceability-dev/2019-July/028608.html
>
> The proposed CSR [3] is for adding a new VM option, UseNotificationThread (default true), to opt out of the new behavior.
>
> Testing: Mach5 tests tier1, tier2, tier3, and tier7 successfully passed.
>
> [1] Webrev:  http://cr.openjdk.java.net/~dtitov/8170299/webrev.05/
> [2] Bug: https://bugs.openjdk.java.net/browse/JDK-8170299
> [3] CSR: https://bugs.openjdk.java.net/browse/JDK-8231593
>
> Thanks,
> Daniil
>
>
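
For readers who have not used the low memory notification API that this
change affects, here is a minimal sketch of a listener in which one might
want to set a breakpoint. The class name LowMemoryListenerSketch and the
0.8 threshold fraction are illustrative assumptions and are not taken from
the webrev; only standard java.lang.management and javax.management calls
are used.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryNotificationInfo;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import javax.management.Notification;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;

// Hypothetical example class; not part of the proposed change.
public class LowMemoryListenerSketch {

    // Sets a usage threshold on every heap pool that supports one.
    // Returns the number of pools that were armed.
    static int armHeapPools(double fraction) {
        if (fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in [0, 1]");
        }
        int armed = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP || !pool.isUsageThresholdSupported()) {
                continue;
            }
            MemoryUsage usage = pool.getUsage();
            if (usage != null && usage.getMax() > 0) {
                pool.setUsageThreshold((long) (usage.getMax() * fraction));
                armed++;
            }
        }
        return armed;
    }

    // Registers a listener for threshold-exceeded notifications.
    static NotificationListener install() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        NotificationListener listener = (Notification n, Object handback) -> {
            if (MemoryNotificationInfo.MEMORY_THRESHOLD_EXCEEDED.equals(n.getType())) {
                // A breakpoint on the next line is the scenario from the bug report.
                System.out.println("Low memory reported on thread "
                        + Thread.currentThread().getName());
            }
        };
        // The platform MemoryMXBean is documented to be a NotificationEmitter.
        ((NotificationEmitter) memory).addNotificationListener(listener, null, null);
        return listener;
    }

    public static void main(String[] args) {
        install();
        System.out.println("Heap pools armed: " + armHeapPools(0.8));
    }
}
```

Assuming the option lands as proposed in the CSR, running the same program
with -XX:-UseNotificationThread should restore the old behavior, which is a
convenient way to compare the two under a debugger.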


From daniil.x.titov at oracle.com  Tue Oct  8 18:01:33 2019
From: daniil.x.titov at oracle.com (Daniil Titov)
Date: Tue, 08 Oct 2019 11:01:33 -0700
Subject: jmx-dev RFR: 8231666: ThreadIdTable::grow() invokes invalid
 thread transition
In-Reply-To: <e333ec68-1ca6-8036-043c-e330252e71fe@oracle.com>
References: <140927E0-55D7-490E-94BC-F294A8EE0901@oracle.com>
 <176b5148-77fd-4793-c4eb-17b137a36f8e@oracle.com>
 <d2af2637-a38a-24d1-9ff9-a6444b0f1300@oracle.com>
 <74c89be0-8d0c-3c5d-0038-b16f3a59de06@oracle.com>
 <fa1ac8de-1004-a1a8-83b0-92e1c8e6d9d0@oracle.com>
 <d21397f4-1f94-3f9d-02b1-ef3b1e8852c0@oracle.com>
 <F6F486C1-07A3-4825-99AA-B235C7D9A76B@oracle.com>
 <9082e266-45e1-4483-f7c7-d2da58203ed6@oracle.com>
 <E9222DAC-6D64-45DD-8F1D-8A326EB1DDA5@oracle.com>
 <336e6baa-509a-1b92-5641-e793c9499ec2@oracle.com>
 <a3b8beae-f093-c011-236f-4ac4b72ce754@oracle.com>
 <70DAA7FB-EC57-410A-A22F-4E6BB8B17CB8@oracle.com>
 <84f73dbf-1211-0031-b07a-343746a28994@oracle.com>
 <f8ed404f-e4c7-7b6a-aa20-1f660e84449f@oracle.com>
 <137B65A2-5662-4569-A813-20510068DDF5@oracle.com>
 <e333ec68-1ca6-8036-043c-e330252e71fe@oracle.com>
Message-ID: <F5FE2A6E-9C6B-4601-97CC-8E546CA56E09@oracle.com>

Hi David and Robbin,

Thank you for reviewing this fix!

Best regards,
Daniil

On 10/8/19, 1:49 AM, "Robbin Ehn" <robbin.ehn at oracle.com> wrote:

    Great, thanks!
    
    /Robbin
    
    On 2019-10-07 18:41, Daniil Titov wrote:
    > Hi Robbin,
    > 
    > Yes, I ran my benchmark [1]. Please see below the output showing that the table was grown.
    > 
    > ../jdk/build/linux-x64-release/images/jdk/bin/java -cp . -Xlog:thread+table=info ThreadStartupTest
    > Starting the test
    > [0.185s][info][thread,table] Grown to size:512
    > The test finished.
    > Execution time:15673 ms
    > 
    > 
    > [1] https://cr.openjdk.java.net/~dtitov/tests/ThreadStartupTest.java
    > 
    > Thanks!
    > Daniil
    > 
    > 
    > On 10/7/19, 12:34 AM, "Robbin Ehn" <robbin.ehn at oracle.com> wrote:
    > 
    >      Hi Daniil,
    >      
    >      Yes, good, but:
    >      
    >      >>      >> Testing:  Mach5 tier1, tier2, and tier3 tests successfully passed.
    >      >>      And if you have not done so, you should test this with the benchmark you
    >      >> have as
    >      >>      a stress test and see that this does what we think.
    >      
    >      Can you please test it with your benchmark, if you have not done so?
    >      
    >      /Robbin
    >      
    >      >>      Thanks, Robbin
    >      >>      >>
    >      >>      >> Thank you!
    >      >>      >>
    >      >>      >> Best regards,
    >      >>      >> Daniil
    >      >>      >>
    >      >>      >> On 10/2/19, 3:26 PM, "David Holmes" <david.holmes at oracle.com> wrote:
    >      >>      >>
    >      >>      >>      Hi Daniil,
    >      >>      >>      On 3/10/2019 2:21 am, Daniil Titov wrote:
    >      >>      >>      > Hi David and Robbin,
    >      >>      >>      >
    >      >>      >>      > Could we consider making the ServiceThread responsible for the
    >      >>      >> ThreadIdTable resizing, in the same way as
    >      >>      >>      > it works for StringTable and ResolvedMethodTable, rather than
    >      >> having
    >      >>      >> the ThreadIdTable::add() method call ThreadIdTable::grow()?
    >      >>      >>      > As I understand it, this should solve the current issue and
    >      >> address the
    >      >>      >> concern that the resizing could take a relatively long time and that
    >      >>      >>      > doing it without polling for safepoints or while holding the
    >      >>      >> Threads_lock is not desirable.
    >      >>      >>      I originally rejected copying that part of the code from the other
    >      >>      >>      tables as it seems to introduce unnecessary complexity. Having a
    >      >>      >>      separate thread trying to grow the table when it could be
    >      >> concurrently
    >      >>      >>      having threads added and removed seems like it could introduce
    >      >> hard to
    >      >>      >>      diagnose performance pathologies. It also adds what we know to be a
    >      >>      >>      potentially long running action to the workload of the service
    >      >> thread,
    >      >>      >>      which means it may also impact the other tasks the service thread is
    >      >>      >>      doing, thus potentially introducing even more hard to diagnose
    >      >>      >>      performance pathologies.
    >      >>      >>      So this change does concern me. But go ahead and trial it.
    >      >>      >>      Thanks,
    >      >>      >>      David
    >      >>      >>      > Thank you,
    >      >>      >>      > Daniil
    >      >>      >>      >
    >      >>      >>      >
    >      >>      >>      > On 10/2/19, 6:25 AM, "David Holmes" <david.holmes at oracle.com>
    >      >> wrote:
    >      >>      >>      >
    >      >>      >>      >      Hi Robbin,
    >      >>      >>      >
    >      >>      >>      >      On 2/10/2019 7:58 pm, Robbin Ehn wrote:
    >      >>      >>      >      > Hi David,
    >      >>      >>      >      >
    >      >>      >>      >      >> What if the table is full and must be grown?
    >      >>      >>      >      >
    >      >>      >>      >      > The table uses chaining; a full table just means the load factor has tipped
    >      >> over what is
    >      >>      >>      >      > considered a good backing array size.
    >      >>      >>      >
    >      >>      >>      >      Coleen raised a good question in a separate discussion,
    >      >> which made me
    >      >>      >>      >      realize that once the table has been initially populated all
    >      >>      >> subsequent
    >      >>      >>      >      additions, and hence all subsequent calls to grow() always
    >      >> happen
    >      >>      >> with
    >      >>      >>      >      the Threads_lock held. So we can't just defer the grow().
    >      >>      >>      >
    >      >>      >>      >      >> That aside, I'd like to know how expensive it is to
    >      >> grow this
    >      >>      >> table.
    >      >>      >>      >      >> What are we talking about here?
    >      >>      >>      >      >
    >      >>      >>      >      > We use a global counter, which on write_synchronize must
    >      >> scan all
    >      >>      >>      >      > threads to make sure they have seen the update (there is some
    >      >>      >>      >      > optimization to avoid it if there are no readers at all).
    >      >> Since this
    >      >>      >>      >      > table contains the threads, we get doubly penalized: for
    >      >> each new
    >      >>      >>      >      > thread, both the synchronization cost AND the number
    >      >> of items increase.
    >      >>      >>      >      >
    >      >>      >>      >      > With concurrent reads you still need many thousands of
    >      >> threads, but
    >      >>      >>      >      > I think I saw someone mentioning 100k threads, assuming
    >      >> concurrent
    >      >>      >>      >      > queries the resize can take hundreds of ms to finish.
    >      >> Note that
    >      >>      >> reads
    >      >>      >>      >      > and inserts still operate roughly at the same speed
    >      >> while
    >      >>      >>      >      > resizing. So a longer resize is only problematic if we
    >      >> do not
    >      >>      >>      >      > respect safepoints.
    >      >>      >>      >      I think if anything were capable of running 100K threads
    >      >> we would be
    >      >>      >>      >      hitting far worse scalability bottlenecks than this. But
    >      >> this does
    >      >>      >> seem
    >      >>      >>      >      problematic.
    >      >>      >>      >
    >      >>      >>      >      Thanks,
    >      >>      >>      >      David
    >      >>      >>      >      -----
    >      >>      >>      >
    >      >>      >>      >      > Thanks, Robbin
    >      >>      >>      >      >
    >      >>      >>      >      >>
    >      >>      >>      >      >> David
    >      >>      >>      >      >>
    >      >>      >>      >      >>> /Robbin
    >      >>      >>      >      >>>
    >      >>      >>      >      >>> On 2019-10-02 08:46, David Holmes wrote:
    >      >>      >>      >      >>>> Hi Daniil,
    >      >>      >>      >      >>>>
    >      >>      >>      >      >>>> On 2/10/2019 4:13 pm, Daniil Titov wrote:
    >      >>      >>      >      >>>>> Please review a change that fixes the issue. The
    >      >> problem
    >      >>      >> here is
    >      >>      >>      >      >>>>> that the thread is added to the ThreadIdTable
    >      >>      >> (introduced in
    >      >>      >>      >      >>>>> [3]) while the Threads_lock is held by
    >      >>      >>      >      >>>>> JVM_StartThread. When a new thread is added to the
    >      >> thread
    >      >>      >> table the
    >      >>      >>      >      >>>>> table checks if its load factor is greater than
    >      >> required and
    >      >>      >> if so
    >      >>      >>      >      >>>>> it grows itself while polling for safepoints.
    >      >>      >>      >      >>>>> After changes [4], an attempt to block the thread while
    >      >>      >> holding the
    >      >>      >>      >      >>>>> Threads_lock results in an assertion failure in
    >      >>      >>      >      >>>>> Thread::check_possible_safepoint().
    >      >>      >>      >      >>>>>
    >      >>      >>      >      >>>>> The fix  proposed by David Holmes ( thank you,
    >      >> David!)  is
    >      >>      >> to skip
    >      >>      >>      >      >>>>> the ThreadBlockInVM inside ThreadIdTable::grow()
    >      >> method if the
    >      >>      >>      >      >>>>> current thread owns the Threads_lock.
    >      >>      >>      >      >>>>
    >      >>      >>      >      >>>> Sorry but looking at the fix in context now I think
    >      >> it would be
    >      >>      >>      >      >>>> better to do this:
    >      >>      >>      >      >>>>
    >      >>      >>      >      >>>>      while (gt.do_task(jt)) {
    >      >>      >>      >      >>>>        if (Threads_lock->owner() != jt) {
    >      >>      >>      >      >>>>          gt.pause(jt);
    >      >>      >>      >      >>>>          ThreadBlockInVM tbivm(jt);
    >      >>      >>      >      >>>>          gt.cont(jt);
    >      >>      >>      >      >>>>        }
    >      >>      >>      >      >>>>      }
    >      >>      >>      >      >>>>
    >      >>      >>      >      >>>> This way we don't waste time with the pause/cont when
    >      >> there's no
    >      >>      >>      >      >>>> safepoint pause going to happen - and the owner()
    >      >> check is
    >      >>      >> quicker
    >      >>      >>      >      >>>> than owned_by_self(). That partially addresses a general
    >      >>      >> concern I
    >      >>      >>      >      >>>> have about how long it may take to grow the table, as
    >      >> we are
    >      >>      >>      >      >>>> deferring safepoints until it is complete in this
    >      >>      >> JVM_StartThread
    >      >>      >>      >      >>>> usecase.
    >      >>      >>      >      >>>>
    >      >>      >>      >      >>>> In the test you don't need all of:
    >      >>      >>      >      >>>>
    >      >>      >>      >      >>>>    32  * @run clean ThreadStartTest
    >      >>      >>      >      >>>>    33  * @run build ThreadStartTest
    >      >>      >>      >      >>>>    34  * @run main ThreadStartTest
    >      >>      >>      >      >>>>
    >      >>      >>      >      >>>> just the last @run suffices to build and run the test.
    >      >>      >>      >      >>>>
    >      >>      >>      >      >>>> Thanks,
    >      >>      >>      >      >>>> David
    >      >>      >>      >      >>>> -----
    >      >>      >>      >      >>>>
    >      >>      >>      >      >>>>> Testing : Mach 5 tier1 and tier2 completed
    >      >> successfully,
    >      >>      >> tier3 is
    >      >>      >>      >      >>>>> in progress.
    >      >>      >>      >      >>>>>
    >      >>      >>      >      >>>>> [1] Webrev:
    >      >>      >> http://cr.openjdk.java.net/~dtitov/8231666/webrev.01/
    >      >>      >>      >      >>>>> [2] Bug:
    >      >> https://bugs.openjdk.java.net/browse/JDK-8231666
    >      >>      >>      >      >>>>> [3] https://bugs.openjdk.java.net/browse/JDK-8185005
    >      >>      >>      >      >>>>> [4] https://bugs.openjdk.java.net/browse/JDK-8184732
    >      >>      >>      >      >>>>>
    >      >>      >>      >      >>>>> Best regards,
    >      >>      >>      >      >>>>> Daniil
    >      >>      >>      >      >>>>>
    >      >>      >>      >      >>>>>
    >      >>      >>      >
    >      >>      >>      >
    >      >>      >>      >
    >      >>      >>
    >      >>      >>
    >      >>
    >      >>
    >      
    > 
    > 
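
The benchmark referenced in [1] above is not reproduced in this thread. As
a rough, hypothetical stand-in, the sketch below starts a batch of threads,
keeps them alive, and looks each one up by id through ThreadMXBean, which
is the kind of workload that populates the thread table. The class name
ThreadStartupSketch and the thread count are assumptions and are not taken
from ThreadStartupTest.java.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical stress sketch; not the benchmark linked in the thread.
public class ThreadStartupSketch {

    // Starts 'count' threads, keeps them parked on a latch, and queries
    // each one by id. Returns how many lookups found a live thread.
    static int startAndQuery(int count) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        CountDownLatch release = new CountDownLatch(1);
        List<Thread> threads = new ArrayList<>();
        int found = 0;
        try {
            for (int i = 0; i < count; i++) {
                Thread t = new Thread(() -> {
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
                t.start();
                threads.add(t);
                // Lookup by thread id while new threads keep being added.
                if (mx.getThreadInfo(t.getId()) != null) {
                    found++;
                }
            }
        } finally {
            // Let all worker threads finish and wait for them to exit.
            release.countDown();
            for (Thread t : threads) {
                t.join();
            }
        }
        return found;
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        int found = startAndQuery(2000);
        System.out.println("Lookups that found a live thread: " + found);
        System.out.println("Execution time: " + (System.currentTimeMillis() - start) + " ms");
    }
}
```

Running it with -Xlog:thread+table=info, as in the command line quoted
above, should show whether the table was grown during the run; the exact
sizes logged will depend on the build and the thread count.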
    

From daniil.x.titov at oracle.com  Tue Oct  8 18:07:09 2019
From: daniil.x.titov at oracle.com (Daniil Titov)
Date: Tue, 08 Oct 2019 11:07:09 -0700
Subject: jmx-dev RFR: 8170299: Debugger does not stop inside the low
 memory notifications code
In-Reply-To: <84e1ee7b-da77-25b9-cd98-36e8bfc66032@oracle.com>
References: <75B5F778-DC49-494B-AC12-270F301677CA@oracle.com>
 <60639d41-735a-00d3-c9db-1955f581b89a@oracle.com>
 <E2BFE392-EF77-4072-9D7C-65C088D4F007@oracle.com>
 <9783ca89-0af8-2167-436a-e5ff2db631a3@oracle.com>
 <9a805686-57bf-c158-a777-c3cb7e38f09f@oracle.com>
 <194bd23a-0f16-19a9-a3e7-d02fa6d58369@oracle.com>
 <4B9AB4C0-EA29-4D0E-9010-3A635454FE30@oracle.com>
 <c3ea02e1-bd4c-838b-c3c4-fef718319090@oracle.com>
 <84e1ee7b-da77-25b9-cd98-36e8bfc66032@oracle.com>
Message-ID: <AE4949F9-2714-4EB1-AFB4-DB54B456DF31@oracle.com>

Hi David and Serguei,

Thank you for reviewing this fix!

Best regards,
Daniil

On 10/2/19, 4:24 PM, "serguei.spitsyn at oracle.com" <serguei.spitsyn at oracle.com> wrote:

    Hi Daniil,
    
    +1
    I also prefer (agree with) a new VM option to opt out of the new behavior.
    Sorry for the latency in the review and discussion process.
    
    Thanks,
    Serguei
    
    
    On 10/1/19 20:20, David Holmes wrote:
    > Hi Daniil,
    >
    > Thanks again for your perseverance with this one.
    >
    > This looks fine to me.
    >
    > Thanks,
    > David
    > -----
    >
    > On 2/10/2019 6:57 am, Daniil Titov wrote:
    >> Hello,
    >>
    >> Please review a new version of the change [1] that fixes the problem
    >> with the debugger not stopping in the low memory notification code.
    >> The fix moves the task of sending notifications from
    >> the non-visible ServiceThread to a new, visible NotificationThread. This
    >> version of the change also introduces a new VM option to opt out
    >> of the new behavior.
    >>
    >> Previous email threads:
    >> https://mail.openjdk.java.net/pipermail/serviceability-dev/2019-August/028863.html 
    >>
    >> https://mail.openjdk.java.net/pipermail/serviceability-dev/2019-July/028608.html 
    >>
    >>
    >> The proposed CSR [3] is for adding a new VM option,
    >> UseNotificationThread (default true), to opt out of the new behavior.
    >>
    >> Testing: Mach5 tests tier1, tier2, tier3, and tier7 successfully passed.
    >>
    >> [1] Webrev: http://cr.openjdk.java.net/~dtitov/8170299/webrev.05/
    >> [2] Bug: https://bugs.openjdk.java.net/browse/JDK-8170299
    >> [3] CSR: https://bugs.openjdk.java.net/browse/JDK-8231593
    >>
    >> Thanks,
    >> Daniil
    >>
    >>