Re: JDK-8334085: Cannot reproduce failing test
Patricio Chilano Mateo
patricio.chilano.mateo at oracle.com
Tue Jul 23 23:13:23 UTC 2024
On 7/23/24 4:38 PM, Iñigo Mediavilla wrote:
> Thanks again Patricio,
>
> I guess I was struggling to make the connection between the
> synchronized blocks and the calls to thaw. Based on your last email, I
> think that I understand that connection, but feel free to correct me
> if I'm wrong:
>
> - There cannot be a call to thaw after a synchronized block has been
> entered and before its execution has completed. Thinking about why
> this is not possible: a thaw cannot happen after something like
> Thread.yield, because in that case the yield would fail and the
> virtual thread would be pinned. It also should not happen because of
> a return barrier, because a thaw of that kind only happens when we
> need to load a higher frame in the stack, which by definition cannot
> happen inside a synchronized method / block.
Right.
> - held_monitor_count is only affected by synchronized methods / blocks
> or by JNI's MonitorEnter / MonitorExit. Other synchronization
> primitives in the JVM do not affect that variable.
There is also ObjectLocker which is used to acquire a Java monitor from
inside the VM. Also note that the counter can be modified during
deoptimization if we need to relock objects for which synchronization
was eliminated.
Patricio
> On Tue, Jul 23, 2024 at 6:09 PM Patricio Chilano Mateo
> <patricio.chilano.mateo at oracle.com> wrote:
>
>
> On 7/23/24 5:59 AM, Iñigo Mediavilla wrote:
>> Hello Patricio,
>>
>> Thanks a lot for your explanation.
>>
>> Why is it safe for Thaw code to assume that all non-jni monitors
>> will be released at that point, but the same assumption cannot be
>> made for jni monitors ?
>>
>> What would happen in the following sequence?
>>
>> 1. A virtual thread is unmounted
>> 2. We thaw a few frames and execute code that acquires a non-JNI
>> monitor
>> 3. We call thaw again
>>
>> Or is that not possible ?
> That would not be possible unless there is a bug. All monitors
> acquired from synchronized methods/blocks should have been
> released once execution of the synchronized method/block
> completes, either normally or abruptly (see [1]).
> Monitors that are acquired through JNI function MonitorEnter
> though are not automatically exited and a call to JNI function
> MonitorExit is required, unless DetachCurrentThread is used to
> implicitly release them (see [2]).
>
> [1]
> https://docs.oracle.com/javase/specs/jls/se22/html/jls-17.html#jls-17.1
> [2]
> https://docs.oracle.com/en/java/javase/22/docs/specs/jni/functions.html#monitor-operations
>
> Patricio
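
A minimal sketch (not taken from the thread or from the JDK test) of the
rule stated in the reply above: a monitor acquired by a synchronized block
is released when the block completes, whether normally or abruptly. The
class and method names are hypothetical, and it only uses
Thread.holdsLock to observe ownership from plain Java. It does not touch
JNI MonitorEnter, which is the case that is not released automatically.

```java
class MonitorReleaseDemo {

    // True while the current thread is inside the synchronized block.
    static boolean holdsInside(Object lock) {
        synchronized (lock) {
            return Thread.holdsLock(lock);
        }
    }

    // Checks ownership after the block completed normally.
    static boolean holdsAfterNormalExit(Object lock) {
        synchronized (lock) {
            // nothing to do, the block completes normally
        }
        return Thread.holdsLock(lock);
    }

    // Checks ownership after the block completed abruptly (exception).
    static boolean holdsAfterAbruptExit(Object lock) {
        try {
            synchronized (lock) {
                throw new IllegalStateException("abrupt completion");
            }
        } catch (IllegalStateException e) {
            // The monitor was exited while the exception propagated.
            return Thread.holdsLock(lock);
        }
    }

    public static void main(String[] args) {
        Object lock = new Object();
        System.out.println("inside block:      " + holdsInside(lock));
        System.out.println("after normal exit: " + holdsAfterNormalExit(lock));
        System.out.println("after abrupt exit: " + holdsAfterAbruptExit(lock));
    }
}
```

Only the first check should report ownership. A monitor taken with the JNI
function MonitorEnter has no such scope, which is why it can still be held
when thaw runs again.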
>> Thanks
>> Íñigo
>>
>> On Mon, Jul 22, 2024 at 4:11 PM Patricio Chilano Mateo
>> <patricio.chilano.mateo at oracle.com> wrote:
>>
>> Hi Iñigo,
>>
>> The problem is that we can unmount a virtual thread, then
>> mount it again, thaw a few frames, execute code that acquires
>> a JNI monitor, and then call thaw again without releasing
>> that monitor. Thaw code assumes all monitors must be released
>> at that point but doesn't consider JNI acquired ones. In this
>> test this will happen if the vthread is unmounted
>> in System.out.println("Thread doing JNI call: " ...) because
>> of contention with the main thread
>> doing System.out.println("Main waiting for event."). You can
>> reproduce this issue by adding Thread.yield() before
>> jniMonitorEnterAndLetObjectDie().
>>
>> Thanks,
>> Patricio
>>
>> On 7/22/24 7:30 AM, Iñigo Mediavilla wrote:
>>> Hello Serguei,
>>>
>>> Thanks a lot for sharing the update and for solving the
>>> issue. Do you think that you could help me understand
>>> exactly what's happening ?
>>>
>>> Based on the DBG output shared in JBS, my understanding is
>>> that what happens in the test is the following:
>>>
>>> Main Thread
>>> ------------------------- ----------------------------
>>> 1. acquire java lock
>>> 2. starting thread
>>> 3. jni call
>>> 4. MonitorContendedEnter
>>> 5. release java lock
>>> 6. acquire java lock
>>> 7. MonitorContendedEntered
>>> 8. Thread in sync section
>>> 9. release java lock
>>> 10. why freeze doesn't pin ?
>>>
>>> What I'm struggling to understand is why, after the thread
>>> releases the java lock, the virtual thread is still frozen,
>>> and especially why it freezes while holding a JNI monitor.
>>> I've run tests locally trying to freeze a virtual thread
>>> holding a JNI lock and my virtual threads are always being
>>> pinned to the carrier with reason ("holding a lock").
>>>
>>> Thanks in advance
>>>
>>> Íñigo
>>>
>>> On Fri, Jul 19, 2024 at 10:41 PM Serguei Spitsyn
>>> <serguei.spitsyn at oracle.com> wrote:
>>>
>>> Hi Iñigo,
>>>
>>> Patricio helped to reproduce this issue and also
>>> identified the problem (please, see in the bug report).
>>> The fix is a one-liner. I’ll post a PR after some mach5
>>> testing.
>>>
>>> Thank you for your involvement in this issue!
>>>
>>> Thanks,
>>>
>>> Serguei
>>>
>>> *From: *Iñigo Mediavilla <imediava at gmail.com>
>>> *Date: *Saturday, July 13, 2024 at 12:08 AM
>>> *To: *Chris Plummer <chris.plummer at oracle.com>
>>> *Cc: *dholmes at openjdk.org <dholmes at openjdk.org>,
>>> loom-dev at openjdk.org <loom-dev at openjdk.org>,
>>> sspitsyn at openjdk.org <sspitsyn at openjdk.org>
>>> *Subject: *Re: JDK-8334085: Cannot reproduce failing test
>>>
>>> I see, in that case Serguei would you want to still own
>>> this JBS or would you be OK if I try to have a look at it ?
>>>
>>> Iñigo
>>>
>>> On Fri, Jul 12, 2024, 19:11, Chris Plummer
>>> <chris.plummer at oracle.com> wrote:
>>>
>>> Failures are very intermittent. We last saw a failure in our
>>> CI testing on 2024-07-03. What command are you using to run
>>> the test?
>>>
>>> Chris
>>>
>>> On 7/12/24 2:34 AM, Iñigo Mediavilla wrote:
>>> > Hello,
>>> >
>>> > While looking at possible JBS tickets to work on, I saw
>>> > JDK-8334085 where an assertion was reported to be failing for
>>> > the GetOwnedMonitorInfoTest. Before I even asked around to
>>> > wonder if this issue was already being looked at, I tried to
>>> > reproduce the failure locally, but I don't manage to make the
>>> > test fail. Is this still an issue in JDK-24 ? David can you
>>> > still reproduce the failing test ?
>>> >
>>> > Best
>>> >
>>> > Íñigo Mediavilla Saiz
>>> >
>>> >
>>>
>>
>