Re: Virtual thread memory leak because TrackingRootContainer keeps threads

Alex Otenko oleksandr.otenko at gmail.com
Thu Aug 1 09:29:50 UTC 2024


I don't know if it helps, but Go has extra communication between the
routine closing the channel and the routine "stuck" reading from it. It is
not an implicit "this routine is stuck, let me behave as if the channel is
closed". In that vein one might ask why the queue became unreachable
without extra communication, e.g. via interrupting the thread. You'd need to
elaborate more on why that would be a perfectly good concurrent system, not
just puzzle over the contrived example.
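
For illustration, here is a minimal Java sketch of what such explicit
communication could look like: the consumer is stuck in take(), and only an
explicit interrupt moves it out of the try block, at which point the finally
block runs. The class and method names are made up for this example, and a
platform thread is used so that it runs on older JDKs; on JDK 21+ the consumer
could be started with Thread.ofVirtual().start(...) instead.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical example class; not taken from the thread under discussion.
final class ExplicitUnblock {

    /**
     * Returns {finallyRanBeforeInterrupt, finallyRanAfterInterrupt}.
     * The consumer blocks on a queue that nobody ever writes to.
     */
    static boolean[] interruptBlockedConsumer() {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        AtomicBoolean finallyRan = new AtomicBoolean(false);
        CountDownLatch started = new CountDownLatch(1);

        Thread consumer = new Thread(() -> {
            try {
                started.countDown();
                queue.take();                 // blocks: nothing is ever offered
            } catch (InterruptedException e) {
                // the explicit "nobody will write to this queue" signal
            } finally {
                finallyRan.set(true);         // runs only once the try block exits
            }
        });
        consumer.start();

        boolean before;
        try {
            started.await();
            before = finallyRan.get();        // try block has not exited yet
            consumer.interrupt();             // the extra communication
            consumer.join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return new boolean[] {false, false};
        }
        return new boolean[] {before, finallyRan.get() && !consumer.isAlive()};
    }

    public static void main(String[] args) {
        boolean[] r = interruptBlockedConsumer();
        System.out.println("finally ran before interrupt: " + r[0]);
        System.out.println("finally ran after interrupt:  " + r[1]);
    }
}
```

Without the interrupt() call the consumer simply stays parked in take(), which
is the situation the rest of this thread argues about.
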

On Thu, 1 Aug 2024, 10:12 robert engels, <robaho at icloud.com> wrote:

> Go is not Java, but Go routines are very similar to the ephemeral virtual
> threads it appears the Java team is striving for.
>
> Still, they use closable channels to implement this - with the goal of
> readability and predictability in mind.
>
> I am not certain but I don’t think Erlang processes have this behavior
> either.
>
> The software world is moving to “being explicit” above all else. This
> would go against that.
>
> Still, I would love to know how this would even be implemented. The easy
> solution would be just to remove the VT thread stack memory and all
> references would be cleaned up by GC - but I think if you don’t go up the
> stack with finally blocks you will have the same consistency problems as
> Thread.stop()
>
> On Aug 1, 2024, at 3:42 AM, robert engels <robaho at icloud.com> wrote:
>
> You are absolutely correct, but I thought it was obvious that A is an
> object with side effects, and that because of those side effects the
> runtime is not permitted to optimize it away.
>
> The fact that the runtime is permitted in some cases to omit calls to
> certain methods doesn’t change the aspect of consistency I am referring to.
>
> On Aug 1, 2024, at 12:23 AM, Alex Otenko <oleksandr.otenko at gmail.com>
> wrote:
>
>
> I think this is only because you haven't encountered the problems that you
> are describing elsewhere.
>
> Here's a modified example: no queues, no VT.
>
> a = new SomeLargeObject();
>
> if (false) {
>   a.doSomethingSideEffecting();
> }
>
> The code isn't written like that, but the condition is known to be false
> to the optimiser.
>
> What should happen to a? When should its finalizer be invoked? What
> guarantees that the object is even created?
>
> If you admit some things about this code, the same will apply to the stuck
> VT.
>
> On Thu, 1 Aug 2024, 00:15 Robert Engels, <robaho at icloud.com> wrote:
>
>> Also, I will share this, as it seems very similar to the Thread.stop()
>> deprecation:
>>
>> Why is Thread.stop deprecated?
>> Because it is inherently unsafe. Stopping a thread causes it to unlock
>> all the monitors that it has locked. (The monitors are unlocked as the
>> ThreadDeath exception propagates up the stack.) If any of the objects
>> previously protected by these monitors were in an inconsistent state, other
>> threads may now view these objects in an inconsistent state. Such objects
>> are said to be damaged. When threads operate on damaged objects, arbitrary
>> behavior can result. This behavior may be subtle and difficult to detect,
>> or it may be pronounced. Unlike other unchecked exceptions, ThreadDeath kills
>> threads silently; thus, the user has no warning that his program may be
>> corrupted. The corruption can manifest itself at any time after the actual
>> damage occurs, even hours or days in the future.
>>
>> Here we are talking about something even worse - no exception up the
>> stack - just object references vanishing - regardless of locks,
>> stack-levels, etc.
>>
>> Imagine an auxiliary thread that had a soft/weak reference to A and a
>> reference queue. It would see the A reference added to the queue - but no
>> review of any logs, etc. would allow the developer to determine why it
>> became unreachable.
>>
>> The logs would be inconsistent.
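
A rough Java sketch of that auxiliary-watcher scenario, under the assumption
that A is an ordinary object tracked through a WeakReference registered with a
ReferenceQueue. The class and method names here are hypothetical. The first
method shows the guaranteed part (a strongly held A is never cleared or
enqueued); the second shows what the watcher sees once A is released, which
depends on when the collector actually runs.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

// Hypothetical example class; not taken from the thread under discussion.
final class ReachabilityWatcher {

    /** Placeholder for the object "A" discussed above. */
    static final class A { }

    /** While a strong reference exists, the weak reference is neither cleared nor enqueued. */
    static boolean staysVisibleWhileStronglyHeld() {
        ReferenceQueue<A> queue = new ReferenceQueue<>();
        A strong = new A();
        WeakReference<A> weak = new WeakReference<>(strong, queue);
        System.gc();                              // only a hint to the JVM
        boolean ok = weak.get() != null && queue.poll() == null;
        Reference.reachabilityFence(strong);      // keep 'strong' reachable up to here
        return ok;
    }

    /**
     * Drops the only strong reference and waits for the weak reference to be
     * enqueued. Timing depends on the collector, so this may return false.
     */
    static boolean observedAfterRelease(long timeoutMillis) {
        ReferenceQueue<A> queue = new ReferenceQueue<>();
        WeakReference<A> weak = new WeakReference<>(new A(), queue);
        long deadline = System.currentTimeMillis() + timeoutMillis;
        try {
            while (System.currentTimeMillis() < deadline) {
                System.gc();                      // request a collection, no guarantee
                if (queue.remove(50) == weak) {
                    return true;                  // watcher learns *that* A died, not *why*
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("visible while held: " + staysVisibleWhileStronglyHeld());
        System.out.println("enqueued after release: " + observedAfterRelease(2_000));
    }
}
```

Note that the queue only reports that the referent became unreachable; nothing
in it records which thread or code path dropped the last strong reference.
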
>>
>>
>> On Jul 31, 2024, at 6:07 PM, Robert Engels <robaho at icloud.com> wrote:
>>
>> The reason it needs to be there is that a developer needs to be able to
>> reason about the code - and the state changes that moved the system to its
>> current state.
>>
>> If you know you created an A, but the A does not appear in the heap dump,
>> and you can determine that the finally block never executed - how are you
>> supposed to reason about and validate the system - remember the take() in
>> this case could be several levels deep - so it is nowhere near obvious to
>> say “it must be due to a virtual thread vanishing”.
>>
>>
>>
>> On Jul 31, 2024, at 6:03 PM, Alex Otenko <oleksandr.otenko at gmail.com>
>> wrote:
>>
>> I don't see objects missing from a heap dump as a problem. I don't think
>> there is any guarantee about that in the JVM spec. In fact, there is no
>> requirement to provide any heap dump. Is there?..
>>
>> So it really is just about the finalizers being executed or not. I
>> actually think not executing them is better (there are no guarantees of when
>> they should be executed, are there?), but maybe there are good reasons to get
>> them executed.
>>
>> On Wed, 31 Jul 2024, 23:59 robert engels, <robaho at icloud.com> wrote:
>>
>>> The program order - except in a case of abrupt termination - must hold.
>>> So having A not be present in a heap dump with the finally block never
>>> being executed is a problem. It is breaking the specification on
>>> reachability.
>>>
>>> So change the spec but then you have different behavior if a platform
>>> thread is executing the code versus a virtual thread - which would be
>>> another nightmare for readability and auditing.
>>>
>>> I am honestly baffled how this has proceeded so far without someone from
>>> the loom team saying - you’re right this would be crazy to do.
>>>
>>> It is as if people are writing systems code with no concern for the
>>> application code sitting on top.
>>>
>>> On Jul 31, 2024, at 5:28 PM, Alex Otenko <oleksandr.otenko at gmail.com>
>>> wrote:
>>>
>>>
>>> Like, imagine it is not GCed, but swapped out to /dev/null. Eh? What's
>>> the problem with that? We'll swap it back in, when the VT can continue.
>>> Deal? :)
>>>
>>> On Wed, 31 Jul 2024, 23:22 Alex Otenko, <oleksandr.otenko at gmail.com>
>>> wrote:
>>>
>>>> I think you may need to revisit heap dump format. It is all best effort
>>>> to capture JVM state, not actual state, and not true memory representation.
>>>> Like, the pointers are 64 bit, even if compressed oops are in use, ...
>>>>
>>>> I've seen heap dumps with missing stack traces and missing GC roots.
>>>> So, no, I don't think we should worry too much about objects that get
>>>> optimised away.
>>>>
>>>> Side-effecting finalizers bug me. But if you can reclaim memory without
>>>> executing them, I think it may be fine. After all, that's the behaviour of
>>>> the program where the object is still alive.
>>>>
>>>> On Wed, 31 Jul 2024, 23:03 robert engels, <robaho at icloud.com> wrote:
>>>>
>>>>> As a follow-up, even if the heap dump could show the A instance - what
>>>>> about inspection of its fields? Or if you trace its back references what is
>>>>> its root?
>>>>>
>>>>> A virtual thread may not be a GC root, but its stack data has to be
>>>>> or every monitoring/analysis technique is broken - and systems will be
>>>>> impossible to audit.
>>>>>
>>>>>
>>>>> On Jul 31, 2024, at 4:59 PM, robert engels <robaho at icloud.com> wrote:
>>>>>
>>>>> Yes, but it’s a graph, because if A doesn’t have a finalizer but
>>>>> references something that does you face the same problem.
>>>>>
>>>>> I think it will be very hard to profile / debug applications if this
>>>>> were to come to pass. So I open a profiler and expect to see an instance
>>>>> of A but I don’t, and I don’t see any log message (assume the finally block
>>>>> printed something) - am I just supposed to assume… well it was a virtual
>>>>> thread and it “vanished” because it couldn’t make progress.
>>>>>
>>>>> That is ludicrous imo. And also completely unnecessary and against 20
>>>>> plus years of Java design.
>>>>>
>>>>> On Jul 31, 2024, at 4:23 PM, Alex Otenko <oleksandr.otenko at gmail.com>
>>>>> wrote:
>>>>>
>>>>>
>>>>> I think your observation about finalizers is important. But without
>>>>> finalizers - how do you detect that the object is not alive?
>>>>>
>>>>> More broadly - it is not the object that matters; the code must behave
>>>>> as if the object were alive. (Think of unwrapped Optionals and Integers)
>>>>>
>>>>> On Wed, 31 Jul 2024, 22:03 robert engels, <robaho at icloud.com> wrote:
>>>>>
>>>>>> This analysis is incorrect. The guarantees about program order say
>>>>>> that ‘a’ must be alive OR the finally block must execute. There is no gray
>>>>>> area here. This MUST hold.
>>>>>>
>>>>>> You don’t even need to reference GC roots - the specification
>>>>>> considers reachability from an instance of Thread. A VirtualThread extends
>>>>>> Thread so proper OO means it must act as a Thread for all behaviors of
>>>>>> Thread.
>>>>>>
>>>>>> On Jul 31, 2024, at 1:29 PM, Michal Domagala <outsider404 at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> Documentation says: "The finally block always executes when the try
>>>>>> block exits"
>>>>>> Not "The finally block always executes"
>>>>>>
>>>>>> I think there are 3 areas to seek an answer about correct VT behavior.
>>>>>>
>>>>>> First is general, "sky-level" rules. GC rule says that "what is
>>>>>> unreferenced and is not GC root, is collectable". The rule says that VT
>>>>>> should be collectable, but many may contest that the rule - as very generic
>>>>>> - is not applicable for blocked VT. But the rule works: blocked VT is
>>>>>> successfully GC-able, if only observability is off.
>>>>>>
>>>>>> Second is habits. I agree that there is a habit that "finally block"
>>>>>> is always executed. But maybe it is just a habit? I think the VT GC case is
>>>>>> something fresh and cannot be matched to existing experience, but if you do
>>>>>> "kill -s SIGTERM", which is more cooperative than SIGKILL or unplugging power,
>>>>>> the "finally block" is not executed.
>>>>>>
>>>>>> Third is common sense. If something is unreachable, better reclaim
>>>>>> resources and pray it works than have a memory leak and countdown to OOM.
>>>>>>


More information about the loom-dev mailing list