RFR: 8264123: add ThreadsList.is_valid() support

Daniel D.Daugherty dcubed at openjdk.java.net
Tue Mar 30 16:33:22 UTC 2021


On Tue, 30 Mar 2021 02:16:30 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> ThreadsLists need an is_valid() function and checks in various
>> places to help catch bugs where a ThreadsList is dangling.
>> 
>> Other minor changes:
>> - change raw `_threads_hazard_ptr` access to used `get_threads_hazard_ptr()`.
>> - `get_threads_hazard_ptr()` should be `const`.
>> - fix a couple of old typos.
>
> Hi Dan,
> 
> I'm not quite seeing how this single-field checks constitutes a useful validity check - sorry.
> 
> Any why is this all product code rather than debug?
> 
> Thanks,
> David

@dholmes-ora - Thanks for the review!

> I'm not quite seeing how this single-field checks constitutes a useful validity check - sorry.

Please see my reply about the ThreadsList destructor code that you missed.

> Any why is this all product code rather than debug?

Please see my reply about guarantee() versus assert().

> src/hotspot/share/runtime/threadSMR.cpp line 285:
> 
>> 283:     }
>> 284: 
>> 285:     guarantee(ThreadsList::is_valid(current_list), "current_list="
> 
> Why a guarantee and not an assert? What is the cost of doing this check?

This is a guarantee() instead of an assert() because this failure mode
is rarely caught with 'release' bits and very, very rarely caught with
'fastdebug' or 'slowdebug' bits. It's a very tight race.

My plan is to keep this as a guarantee() for April and switch it to an
assert in mid to late May.

I don't have data on what this correctness check costs, but it's just
a value check on the new `_magic` field value. Very much like other
"magic" field value checks that we have all over the VM...

> src/hotspot/share/runtime/threadSMR.cpp line 319:
> 
>> 317:       // We only validate hazard_ptrs that are not tagged since a tagged
>> 318:       // hazard ptr can be deleted at any time.
>> 319:       guarantee(ThreadsList::is_valid(hazard_ptr), "hazard_ptr=" INTPTR_FORMAT
> 
> Same query about guarantee versus assert

Same answer and to reiterate the important part:

My plan is to keep this as a guarantee() for April and switch it to an
assert in mid to late May.

> src/hotspot/share/runtime/threadSMR.cpp line 383:
> 
>> 381:     if (hazard_ptr == NULL) return;
>> 382:     // If the hazard ptr is unverified, then ignore it since it could
>> 383:     // be deleted at any time now.
> 
> Why the difference in comment compared to line 317/318 when the subsequent checks are identical?

The `ValidateHazardPtrsClosure` is called from places that will use
the collected hazard ptr for traversal. Deletion while being traversed
would lead to the crashes that we want to avoid.

The `ScanHazardPtrGatherThreadsListClosure` is used to gather
hazard ptrs where the only the address value of the hazard ptr is
used and no traversal takes place. This is why I added the comment
on L300-302 above.

> src/hotspot/share/runtime/threadSMR.hpp line 209:
> 
>> 207:   bool includes(const JavaThread * const p) const;
>> 208: 
>> 209:   static bool is_valid(ThreadsList* list) { return list->_magic == THREADS_LIST_MAGIC; }
> 
> This seems like a necessary, but not sufficient, check for validity. Unless we zap memory when these things are freed I don't see how this ensures validity.

You missed L652 in the `ThreadsList::~ThreadsList()` above:

 _magic = 0xDEADBEEF;

When the destructor runs, the magic value is stomped. I've captured
failures where the is_valid() check observes 0xDEADBEEF and also
failures where the ThreadsList has been recycled into something else
and the `_magic` value is some other value that doesn't match
`THREADS_LIST_MAGIC`.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3255


More information about the hotspot-runtime-dev mailing list