Experiencing an issue with ScheduledExecutorService alongside VT

Sun Jul 21 08:01:54 UTC 2024

I am rewriting my earlier message, as it lacked some context and information.

Hey team

We are experiencing some weird halting issues when scheduling tasks with
ScheduledExecutorService and virtual threads (VT).

We are observing *random* near-deadlocks (executors halting), which we are
not able to reproduce when using platform threads. They happen after a
random amount of time.

This is one example of a near-deadlock situation: the scheduling platform
thread successfully acquires a semaphore permit and presumably starts a
virtual thread, but the debug log "Scheduling..." is never printed. As a
result, neither the finally block nor the catch Throwable block releases
the acquired semaphore permit, and the semaphore is eventually drained.

A few notes:
- We are testing on EA build 23-loom+4-102
- The scheduler is running on a platform thread
- *Flight recorder does not indicate any pinned VT*
- This code is running within a Docker container based on the ubuntu:20.04
image
- Using G1 GC
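
For reference, pinning can also be watched from inside the process rather than from a recording file. The following is only a rough sketch, assuming the standard JFR streaming API and the `jdk.VirtualThreadPinned` event; the class name `PinnedEventWatcher` is made up for illustration and is not part of our code. It lowers the event threshold so that short pins are reported too.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicInteger;
import jdk.jfr.consumer.RecordingStream;

// Hypothetical helper: counts and prints jdk.VirtualThreadPinned events in-process.
public class PinnedEventWatcher implements AutoCloseable {
    private final RecordingStream stream = new RecordingStream();
    private final AtomicInteger pinnedCount = new AtomicInteger();

    public PinnedEventWatcher() {
        // Report every pin, however short, and include the stack trace.
        stream.enable("jdk.VirtualThreadPinned")
              .withThreshold(Duration.ZERO)
              .withStackTrace();
        stream.onEvent("jdk.VirtualThreadPinned", event -> {
            pinnedCount.incrementAndGet();
            System.out.println("Pinned for " + event.getDuration()
                    + " at " + event.getStackTrace());
        });
        // Consume events on a background thread so the caller is not blocked.
        stream.startAsync();
    }

    public int pinnedCount() {
        return pinnedCount.get();
    }

    @Override
    public void close() {
        stream.close();
    }
}
```

The watcher would be created once at startup and closed on shutdown; a count that stays at zero while the stall occurs would be consistent with the flight recorder observation above.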

Has anyone experienced this issue before? I hope we are doing something
wrong 😅

```
ScheduledFuture<?> deviceFutureTask = scheduler.scheduleAtFixedRate(() -> {
    try {
        logger.debug("[{}] Trying to acquire permit to schedule task [{}] for device. Number of available permits: [{}]",
                device, task, semaphore.availablePermits());
        if (semaphore.tryAcquire(waitingTimeout, TimeUnit.MILLISECONDS)) {
            logger.debug("[{}] Acquired permit to schedule [{}] task for device", device, task);
            Thread.ofVirtual().start(() -> {
                try {
                    logger.debug("[{}] Scheduling [{}] task for device", device, task);
                    // some I/O intensive work
                    logger.debug("[{}] Finished processing [{}] task for device", device, task);
                } catch (Exception e) {
                    logger.error("[{}] Failed to process [{}] task for device", device, task, e);
                } finally {
                    semaphore.release();
                }
            });
        } else {
            logger.error("Timed out while waiting for permit to schedule task [{}] for device [{}]", task, device);
        }
    } catch (Throwable t) {
        logger.error("Failed to execute task [{}] for device [{}]", task, deviceId, t);
        semaphore.release();
    }
}, (long) (entropy * schedulingInterval), schedulingInterval, TimeUnit.MILLISECONDS);
```
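
In case it helps anyone trying to reproduce this, here is a minimal self-contained sketch of the same pattern. It is not our production code: the class name `PermitGuardedScheduler`, the timings, and the `Thread.sleep` stand-in for the I/O work are all assumptions made for illustration. One deliberate difference from the snippet above is the `acquired` flag, so that the outer catch block releases a permit only when one was actually taken and not yet handed to a virtual thread.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical reproduction harness; names and timings are illustrative only.
public class PermitGuardedScheduler {

    // Runs the pattern for `millis` ms and reports whether every permit came back.
    public static boolean runFor(long millis, int permits, long periodMillis) {
        Semaphore semaphore = new Semaphore(permits);
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            boolean acquired = false; // true only while this task owns a permit
            try {
                acquired = semaphore.tryAcquire(50, TimeUnit.MILLISECONDS);
                if (!acquired) {
                    System.err.println("Timed out while waiting for permit");
                    return;
                }
                Thread.ofVirtual().start(() -> {
                    try {
                        Thread.sleep(5); // stand-in for the I/O intensive work
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        semaphore.release();
                    }
                });
                acquired = false; // the virtual thread now owns the permit
            } catch (Throwable t) {
                if (acquired) {
                    semaphore.release(); // release only a permit we still hold
                }
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);

        try {
            Thread.sleep(millis);
            scheduler.shutdown();
            scheduler.awaitTermination(1, TimeUnit.SECONDS);
            // If all permits can be re-acquired, no virtual thread leaked one.
            return semaphore.tryAcquire(permits, 5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("All permits returned: " + runFor(1_000, 4, 10));
    }
}
```

A `false` result from `runFor` would mean at least one permit was never released, which is the symptom described above. Whether this small harness triggers the stall at all is unknown.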

On Sun, 21 Jul 2024 at 10:16, Yuval Lombard <yuval.l at securithings.com>
wrote:

> I did not mention it, but I based the code on version 23-loom+4-102
>
> On Sun, 21 Jul 2024 at 10:05, Yuval Lombard <yuval.l at securithings.com>
> wrote:
>
>> Hey team
>>
>> We are experiencing some weird halting issues when scheduling tasks with
>> ScheduledExecutorService and VT.
>>
>> We are observing some near-deadlock issues / halting of executors, which
>> we are not able to reproduce when using platform threads.
>>
>> This is one example of a near-deadlock situation, where for some reason
>> virtual threads start executing a task but reach neither the finally
>> block nor the catch Throwable block to release the acquired semaphore
>> permit, eventually draining the semaphore.
>>
>> Are you familiar with this behavior? I hope we are doing something
>> wrong 😅
>>
>> Note - The scheduler is running on a platform thread
>>
>>
>>
>> We are
>> --
>>
>> Kind regards,
>>
>> *Yuval Lombard*
>>
>> *Lead Software Engineer*
>>
>> +972.50.548.0111
>>
>> yuval.l at securithings.com
>>
>> [image: logo_black.png]
>>
