RFR: 8340103: Add internal set_flag function to VMATree [v11]

Mon Nov 4 15:50:32 UTC 2024

On Thu, 24 Oct 2024 11:45:41 GMT, Johan Sjölen <jsjolen at openjdk.org> wrote:

>> Hi!
>> 
>> The old VirtualMemoryTracker has a method set_reserved_region_type(address, flag). We implement this for the new VMATree implementation by altering the signature slightly to set_reserved_region_type(address, size, flag). This simplifies the implementation greatly for our new data structure and leads to trivial changes for the callers (all callers already know the size).
>> 
>> This PR implements the internal implementation along with tests, but does not change any callers.
>> 
>> I also do a few cleanups:
>> 
>> - Change `Node` to `TreeNode` in tests, we've seen build failures because of this (probably a precompiled headers issue)
>> - Add a few `print_on` methods for  easy debugging
>> - Add a `size` alias, it was a bit confusing that some functions took an argument `position sz`, so changed that to `size sz` 
>> 
>> Thanks.
>
> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix incorrect merge

I slowly am getting up for air after Lilliput. Sorry for the dragged-out reviews.

src/hotspot/share/nmt/nmtTreap.hpp line 334:

> 332:     TreapNode* start = closest_leq(addr);
> 333:     TreapNode* end = closest_gt(addr);
> 334:     return Range{start, end};

I am fine with curly bracket initializer, but does it have to be so dense? What does the styleguide say? I never wrote plain arrays like this, always with spaces interleaved (`int[] i = { 1, 2, 3 };`)

src/hotspot/share/nmt/vmatree.cpp line 229:

> 227: }
> 228: #endif
> 229: 

This coding would benefit from better (or actually, any :-) comments.

Please describe what this function does. Important points are:
- what is the supposed behavior when tagging over multiple ranges?
- what if there are address holes in that range?

src/hotspot/share/nmt/vmatree.cpp line 243:

> 241:   RegionData new_data = RegionData(out_state(range.start).stack(), tag);
> 242: 
> 243:   position end = MIN2(from + size, range.end->key());

I get that `key` is the address which is the key for the treap, but using "key" at places where we use the address is super confusing for the casual code reader. Can we give TreapNode an alias, for "key"? E.g. "address" or "position"?

src/hotspot/share/nmt/vmatree.cpp line 245:

> 243:   position end = MIN2(from + size, range.end->key());
> 244:   SummaryDiff diff = register_mapping(from, end, type, new_data);
> 245:   size = size - (end - from);

Can you please not modify the input parameters, instead use temporary local vars? E.g. "pos" and "remaining".

src/hotspot/share/nmt/vmatree.cpp line 250:

> 248:   // If end < from + sz then there are multiple ranges for which to set the flag.
> 249:   while (end < from + size) {
> 250:     range = _tree.find_enclosing_range(from);

This is a bit of a brain teaser.

I first wondered why we needed to find the "from" node again, since we already have it. But the last `register_mapping` call can actually have coalesced ranges and either the old `from` node or the old `to` node can have disappeared. 

So, we have two cases: 
- the old "to" node is still there, in which case we could just read its RegionData state and use that for the next segment (modulo tag)
- the old "to" node disappeared due to merging (meaning, tagging the last segment causes it to coalesce with the next segment), in which case there is nothing to do, since this segment is already correctly tagged.

But I am not 100% sure. So, re-querying the range from the treap is probably the safest way to go. 

But can you please add a comment stating that register_mapping may have modified the tree structure and caused segments to coalesce, just invalidating the old `range` ?

src/hotspot/share/nmt/vmatree.hpp line 151:

> 149:   IntervalState& out_state(TreapNode* node) {
> 150:     return node->val().out;
> 151:   }

Why are these instance methods? Also, TreapNode* should be const. But maybe we can do things even better.

In cases like these I sometimes just use an ephemeral utility class. E.g.

class VMATreeNode {
const TreapNode* const _n;
public:
VMATreeNode(const TreapNode* node) .. assign _n
MemTag mem_tag_in() const { return _n->val().in.mem_tag();  } 
MemTag mem_tag_out() const { return _n->val().out.mem_tag();  } 
... all accessors in accessible form to save writing out the many dereferences ..
};

and then just use it as I go like this:

TreapNode* node = ...
if (VMATreeNode(node).mem_tag_in() == blabla )....

Bonus for a short name for VMATreeNode, because the code that really does anything then becomes a lot more readable.

test/hotspot/gtest/nmt/test_vmatree.cpp line 296:

> 294:     VMATree tree;
> 295: 
> 296:     VMATree::SummaryDiff result = tree.reserve_mapping(0, 500, rd);

How does using '0' gel with your find_enclosing_range, where NULL for start or end is "did not find"?

test/hotspot/gtest/nmt/test_vmatree.cpp line 313:

> 311:     EXPECT_EQ(0, diff.tag[i(mtNone)].reserve);
> 312:     EXPECT_EQ(500, diff.tag[i(mtGC)].reserve);
> 313:     EXPECT_EQ(100, diff.tag[i(mtClassShared)].reserve);

As  it is now, the test only compares sums, and needs quite a bit of boilerplate coding for that.

My proposal would be:

- give VMATree a test function that takes an array of ranges (plain simple stupid POD structs) and compares them item by item with its internal state. Let it return true on success, print an error on an outputStream* we give the function on error.
- then, in the test, hardcode the test array and let VMATree compare it.

Example call for above test:

struct testrange { from, to, stackindex, tag };
const testrange[] expected_form = {
 { 0, 500, NCS::StackIndex(), mtGC },
 { 500, 600, NCS::StackIndex(), mtClassShared },
};
tree.test_against(expected_form, tty);

That would be easier on the eye in the test file, more succinct, and catch more errors. Because we compare actual addresses, and we also compare expected stack indices. As it is now, this test would not catch if we accidentally screw up stacks.

test/hotspot/gtest/nmt/test_vmatree.cpp line 347:

> 345:     EXPECT_EQ(20, diff.tag[i(mtClassShared)].commit);
> 346:   }
> 347: }

I would like to see the following cases explicitly tested:
1) set a tag over a range spanning multiple existing segments with the same stack but different tags - here, I would like to see that set_tag causes the tree to coalesce these different ranges to a single range
2) set a tag over a range spanning multiple existing segments with different stacks and different tags - here, I would like to see that the original ranges stay intact, including their stack information, only the tags should now all be the same
3) set a tag within a range - here, I would like to see three ranges now

for 1, also for mismatched starts and ends where either start or end are in the middle of an existing range, which should cause the tree to split that range

-------------

PR Review: https://git.openjdk.org/jdk/pull/20994#pullrequestreview-2303017817
PR Review Comment: https://git.openjdk.org/jdk/pull/20994#discussion_r1827850013
PR Review Comment: https://git.openjdk.org/jdk/pull/20994#discussion_r1827873311
PR Review Comment: https://git.openjdk.org/jdk/pull/20994#discussion_r1827884139
PR Review Comment: https://git.openjdk.org/jdk/pull/20994#discussion_r1827892704
PR Review Comment: https://git.openjdk.org/jdk/pull/20994#discussion_r1827897330
PR Review Comment: https://git.openjdk.org/jdk/pull/20994#discussion_r1827866598
PR Review Comment: https://git.openjdk.org/jdk/pull/20994#discussion_r1827922473
PR Review Comment: https://git.openjdk.org/jdk/pull/20994#discussion_r1827929610
PR Review Comment: https://git.openjdk.org/jdk/pull/20994#discussion_r1827947503