RFR: 8264123: add ThreadsList.is_valid() support
Daniel D.Daugherty
dcubed at openjdk.java.net
Tue Mar 30 16:33:22 UTC 2021
On Tue, 30 Mar 2021 02:16:30 GMT, David Holmes <dholmes at openjdk.org> wrote:
>> ThreadsLists need an is_valid() function and checks in various
>> places to help catch bugs where a ThreadsList is dangling.
>>
>> Other minor changes:
>> - change raw `_threads_hazard_ptr` access to used `get_threads_hazard_ptr()`.
>> - `get_threads_hazard_ptr()` should be `const`.
>> - fix a couple of old typos.
>
> Hi Dan,
>
> I'm not quite seeing how this single-field checks constitutes a useful validity check - sorry.
>
> Any why is this all product code rather than debug?
>
> Thanks,
> David
@dholmes-ora - Thanks for the review!
> I'm not quite seeing how this single-field checks constitutes a useful validity check - sorry.
Please see my reply about the ThreadsList destructor code that you missed.
> Any why is this all product code rather than debug?
Please see my reply about guarantee() versus assert().
> src/hotspot/share/runtime/threadSMR.cpp line 285:
>
>> 283: }
>> 284:
>> 285: guarantee(ThreadsList::is_valid(current_list), "current_list="
>
> Why a guarantee and not an assert? What is the cost of doing this check?
This is a guarantee() instead of an assert() because this failure mode
is rarely caught with 'release' bits and very, very rarely caught with
'fastdebug' or 'slowdebug' bits. It's a very tight race.
My plan is to keep this as a guarantee() for April and switch it to an
assert in mid to late May.
I don't have data on what this correctness check costs, but it's just
a value check on the new `_magic` field value. Very much like other
"magic" field value checks that we have all over the VM...
> src/hotspot/share/runtime/threadSMR.cpp line 319:
>
>> 317: // We only validate hazard_ptrs that are not tagged since a tagged
>> 318: // hazard ptr can be deleted at any time.
>> 319: guarantee(ThreadsList::is_valid(hazard_ptr), "hazard_ptr=" INTPTR_FORMAT
>
> Same query about guarantee versus assert
Same answer and to reiterate the important part:
My plan is to keep this as a guarantee() for April and switch it to an
assert in mid to late May.
> src/hotspot/share/runtime/threadSMR.cpp line 383:
>
>> 381: if (hazard_ptr == NULL) return;
>> 382: // If the hazard ptr is unverified, then ignore it since it could
>> 383: // be deleted at any time now.
>
> Why the difference in comment compared to line 317/318 when the subsequent checks are identical?
The `ValidateHazardPtrsClosure` is called from places that will use
the collected hazard ptr for traversal. Deletion while being traversed
would lead to the crashes that we want to avoid.
The `ScanHazardPtrGatherThreadsListClosure` is used to gather
hazard ptrs where the only the address value of the hazard ptr is
used and no traversal takes place. This is why I added the comment
on L300-302 above.
> src/hotspot/share/runtime/threadSMR.hpp line 209:
>
>> 207: bool includes(const JavaThread * const p) const;
>> 208:
>> 209: static bool is_valid(ThreadsList* list) { return list->_magic == THREADS_LIST_MAGIC; }
>
> This seems like a necessary, but not sufficient, check for validity. Unless we zap memory when these things are freed I don't see how this ensures validity.
You missed L652 in the `ThreadsList::~ThreadsList()` above:
_magic = 0xDEADBEEF;
When the destructor runs, the magic value is stomped. I've captured
failures where the is_valid() check observes 0xDEADBEEF and also
failures where the ThreadsList has been recycled into something else
and the `_magic` value is some other value that doesn't match
`THREADS_LIST_MAGIC`.
-------------
PR: https://git.openjdk.java.net/jdk/pull/3255
More information about the hotspot-runtime-dev
mailing list