RFR: 8351661: NMT: VMATree should support separate call-stacks for reserve and commit operations [v13]

Johan Sjölen jsjolen at openjdk.org
Wed Apr 23 21:38:03 UTC 2025


On Wed, 23 Apr 2025 07:57:13 GMT, Afshin Zafari <azafari at openjdk.org> wrote:

>> In NMT detail mode, we need to have separate call-stacks for Reserve and Commit operations.
>> This PR adds a second stack to every node that will be used when committing (and uncommitting) the start node of a reserved region.
>
> Afshin Zafari has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 14 additional commits since the last revision:
> 
>  - Merge remote-tracking branch 'origin/master' into _8351661_separate_call_stack_reserve_commit
>  - one more test case for overlapping multiple regions
>  - 10 cases added for overlapping with multiple regions
>  - 24 test-cases added for full coverage of cases.
>  - fixed format of the missed Pre
>  - Pre conditions are also visualized.
>  - fixes. New check_tree impl and visualization.
>  - fix the GHA failures.
>  - improvements
>  - more fixes.
>  - ... and 4 more: https://git.openjdk.org/jdk/compare/e91e2160...b8b40862

Hi,

There have been many discussions regarding the semantics of these trees, which have caused the number of test cases to explode. I'd like to see a neat and understandable semantics of this, so I set out to write one.

This is the sketch of the semantics I wrote. The semantics are slightly wrong, so don't read into them too much. The comments starting with "In words:" show what I wanted to get out of each operation.


On the notation used
--------------------

An uppercase variable matches 0, 1 or many nodes, a lowercase variable matches at most one node (that means that it can also match 0 nodes).

A node has state (S, M, RS, CS) where
S \in Reserved | Committed | Released
M \in MemTag
RS \in Nil | Stack
CS \in Nil | stack
Position is implicit, as we are typically only interested in a node's position in relation to other nodes rather than their absolute position.

a = b means equality, not assignment, and a /= b inequality.
Comparison operators on nodes compare their position.
The state of a node is accessed using state(n).
The state of a node is equivalent to the `out` arrow in the VMATree, we avoid the in-arrows in this representation to keep the text succinct.

All nodes are always subject to *merging*, that is: Two nodes becoming one.Two nodes can be merged iff their states are the same.

We will write our semantics in terms of pre- and post-conditions on the tree with guard clauses.
The tree and nodes will be inspected with the use of pattern matching. The underscore, _, means "matches anything".
Let's look at an example.

Example of the notation
-----------------------

# RemoveNode a
Pre-condition:
T = X - a' - B
where state(a') = state(a)
Post-condition:
T = X - B

In words:
Before, we have a tree potentially containing an a'.
If that a' has the same state as a, then the post-condition will hold after this operation.
That is, the result is that the node is removed.

RightFit semantics
------------------

# Release [a, b)
Pre-condition:
T = X - a' - Y - b' - Z
where a' is >= a and b' < b

Post-condition:
T = X - a - b - Z
where state(a) = (Released, _, _, _)
      state(b) = state(b')

In words: When we release, we delete any nodes within the range, and add a node indicating where the released region starts and a node indicating where the released region ends.

# Reserve [a, b) with stack s, MemTag m
Pre-condition:
T = X - a' - Y - b' - Z
where a' is >= a and b' < b

Post-condition:
T = X - a - b - Z
where state(a) = (Reserved, m, s, Nil)
      state(b) = state(b')

In words:
When we reserve, we delete all nodes within the range and replace them with a - b, without changing the state between b' and Z.

# Commit [a, b) with stack s
Pre-condition:
T = X - a' - Y - b' - Z
where a' is >= a and b' < b
let E = every x in (a', Y) . state(x) = (Released, _, _, _) Comment: Let Z be every node which is released
Post-condition:
T = X - a' - Y - b' - Z
where forAll x in (a', Y) \ E . Comment: Every node which isn't Released
      state(x) = (Committed, _, _, s)
where state(b) = state(b')

In words:
When we commit we only change any non-Released nodes to Committed and add the callstack to the Committed Position.
This operation is implemented as if committing only ever changes the metainformation of pre-existing nodes,
without adding any nodes itself.
There are other ways of looking at Commit, where we explicitly commit any released regions. This can be done if a MemTag is supplied.

# Uncommit [a, b)
Pre-condition:
T = X - a' - Y - b' - Z
where a' is >= a and b' < b
let E = every x in (a', Y) . state(x) = (Released, _, _, _) Comment: Let Z be every node which is released
Post-condition:
T = X - a' - Y - b' - Z
where forAll x in (a', Y, b') \ E . Comment: Turn every Node that isn't Released into Reserved nodes
      state(x) = (Reserved, _, _, _)

In words:
Similar to Commit, except that it only changes nodes that aren't Released to Reserved.

Appendix:
# Commit [a, b) with stack s, MemTag m NOTE!!!  This also takes a MemTag
# Commit [a, b) with stack s
Pre-condition:
T = X - a' - Y - b' - Z
where a' is >= a and b' < b
let E = every x in (a', Y) . state(x) = (Released, _, _, _) Comment: Let E be every node which is released
Post-condition:
T = X - a' - Y - b' - Z
where forAll x in (a', Y) \ E . Comment: Every node which isn't Released
      state(x) = (Committed, _, _, s)
where state(b) = state(b')
where forAll x in E .
      state(x) = (Committed, m, s, s)

In words:
When we have a Commit which takes a MemTag, then we can switch all Released areas to Committed and provide the MemTag m and stack s.


I then set out to implement these semantics. I didn't want to be burdened by excessive language ceremony,so I turned to Python. Turns out, the sketch is slightly off at times, but it's very similar. The code is severely under-tested, but it shows a neat pattern of repetition. The VMA tree here is represented as a sorted array.

Note that we always extract three components out of the tree:

a) X: All nodes N < a
b) aYb: All nodes a >=  N <= b
c) Z: All nodes N > b

Also note that we only need the last element in X and the first element in Z, none other are relevant. For aYb, we only need to iterate it for Commit and Uncommit. I skipped the `in` parameter, it wasn't very necessary.

If you want to, feel free to port the test cases to Python and try some out. I think the semantics are nice. I can't promise there aren't any bugs, this was hammered out in a couple of hours.

I didn't do the summary accounting, that does require that we inspect everything we change in the tree.

NOTE! We can efficiently ( O(log n) ) grab ranges and merge them back together like I do with the array in a tree using split/merge operations. These are present in the `Treap` and can be implemented for the `RBTree`.


class State:
    def __init__(self, S, M, RS, CS):
        self.S = S
        self.M = M
        self.RS = RS
        self.CS = CS

class Node:
    def __init__(self, out, address):
        self.out = out
        self.address = address

    def __repr__(self):
        return str(self.address) + " " + str(self.out.S) + " " + str(self.out.M) + " [" + str(self.out.RS) + "," + str(self.out.CS) + "]"

def lt_r(address, vma):
    lt = []
    for n in vma:
        if n.address < address:
            lt.append(n)
    return lt

def gt_r(address, vma):
    gt = []
    for n in vma:
        if n.address > address:
            gt.append(n)
    return gt

def gte_le(gte, le, vma):
    r = []
    for n in vma:
        if n.address >= gte and n.address <= le:
            r.append(n)
    r.sort(key=lambda x: x.address)
    return r


def equiv(a, b):
    return a.out.S == b.out.S and a.out.M == b.out.M and a.out.RS == b.out.RS and a.out.CS == b.out.CS

def Release(vma, a, b):
    X = lt_r(a, vma)
    Z = gt_r(b, vma)
    aYb = gte_le(a, b, vma)

    n_a = []
    n_b = []

    if len(X) > 0:
        n_a = [Node(State('Released', None, None, None), a)]

    if len(aYb) > 0 and aYb[-1].out.S != 'Released':
        n_b = [Node(aYb[-1].out, b)]

    # Merge on the left
    if len(X) > 0 and equiv(X[-1], n_a[0]):
        addr = X[-1].address
        X = X[:-1]
        n_a[0].address = addr

    # Merge on the right
    if len(Z) > 0 and equiv(Z[0], n_b[0]):
        Z = Z[1:]
    if len(Z) > 0 and len(n_b) > 0 and equiv(Z[0], n_a[0]):
        n_b = []

    return X + n_a + n_b + Z

def Reserve(vma, a, b, M, RS):
    X = lt_r(a, vma)
    Z = gt_r(b, vma)
    aYb = gte_le(a, b, vma)

    n_a = [Node(State('Reserved', M, RS, None), a)]
    n_b = []
    if len(aYb) > 0:
        n_b = [Node(aYb[-1].out, b)]
    else:
        n_b = [Node(State('Released', None, None, None), b)]

    # Merge on the left
    if len(X) > 0 and equiv(X[-1], n_a[0]):
        addr = X[-1].address
        X = X[:-1]
        n_a[0].address = addr
    # Merge on the right
    if len(Z) > 0 and equiv(Z[0], n_b[0]):
        Z = Z[1:]
    if len(Z) > 0 and equiv(Z[0], n_a[0]):
        n_b = []

    return X + n_a + n_b + Z

def Commit(vma, a, b, CS):
    X = lt_r(a, vma)
    Z = gt_r(b, vma)
    aYb = gte_le(a, b, vma)

    # Now we have two choices:
    # 1. Just commit everything, that's equivalent to most of the Reserve stuff above
    # 2. Deal with each subregion, only committing what is Reserved and leaving the rest be.
    # Let's go with number 2, as that's the most painful.
    for n in aYb:
        if n.out.S != 'Released':
            n.out.S = 'Committed'
            n.out.CS = CS

    n_a = []
    n_b = []

    # Figure out n_a and n_b, that depends on aYb
    if len(aYb) > 0 and aYb[0].address == a:
        n_a = [aYb[0]]
        aYb = aYb[1:]
    else:
        n_a = [Node(State('Committed', None, None, CS), a)]
        if len(X) > 0:
            copy = X[-1]
            n_a[0].out.M = copy.out.M
            n_a[0].out.RS = copy.out.RS

    if len(aYb) > 0 and aYb[-1].address == b:
        n_b = [aYb[-1]]
        aYb = aYb[:-1]
    elif len(aYb) > 0 and aYb[-1].out.S == 'Released':
        n_b = [aYb[-1]]
        aYb = aYb[:-1]
    else:
        n_b = [Node(State('Released', None, None, None), b)]

    # Merge on the left
    if len(X) > 0 and equiv(X[-1], n_a[0]):
        addr = X[-1].address
        X = X[:-1]
        n_a[0].address = addr
    # Merge on the right
    if len(Z) > 0 and equiv(Z[0], n_b[0]):
        Z = Z[1:]
    if len(Z) > 0 and equiv(Z[0], n_a[0]):
        n_b = []

    return X + n_a + aYb + n_b + Z

def Uncommit(vma, a, b):
    X = lt_r(a, vma)
    Z = gt_r(b, vma)
    aYb = gte_le(a, b, vma)

    # This is almost identical to Commit, except we go to Reserved and remove the CS.
    for n in aYb:
        if n.out.S != 'Released':
            n.out.S = 'Reserved'
            n.out.CS = None

    n_a = []
    n_b = []

    # Figure out n_a and n_b, that depends on aYb
    if len(aYb) > 0 and aYb[0].address == a:
        n_a = [aYb[0]]
        aYb = aYb[1:]
    else:
        n_a = [Node(State('Reserved', None, None, CS), a)]
        if len(X) > 0:
            copy = X[-1]
            n_a[0].out.M = copy.out.M
            n_a[0].out.RS = copy.out.RS

    if len(aYb) > 0 and aYb[-1].address == b:
        n_b = [aYb[-1]]
        aYb = aYb[:-1]
    elif len(aYb) > 0 and aYb[-1].out.S == 'Released':
        n_b = [aYb[-1]]
        aYb = aYb[:-1]
    else:
        n_b = [Node(State('Released', None, None, None), b)]

    # Merge on the left
    if len(X) > 0 and equiv(X[-1], n_a[0]):
        addr = X[-1].address
        X = X[:-1]
        n_a[0].address = addr
    # Merge on the right
    if len(Z) > 0 and equiv(Z[0], n_b[0]):
        Z = Z[1:]
    if len(Z) > 0 and equiv(Z[0], n_a[0]):
        n_b = []

    return X + n_a + aYb + n_b + Z

# Release tests
#print(Release([], 0, 100))
#print(Release([Node(State('Reserved', None, None, None), 0), Node(State('Released', None, None, None), 100)], 0, 100))
# Reserve tests
#print(Reserve([], 0, 100, 'mtTest', 'STACK!'))
#r = Reserve(Reserve([], 0, 100, 'mtTest', 'STACK!'), 100, 200, 'mtTest', 'STACK!')
#print(Commit(r, 0, 100, 'STACK2!'))
r = Reserve(Reserve([], 0, 10, 'mtTest', 'STACK!'), 20, 30, 'mtTest', 'STACK!')
print(Commit(r, 0, 100, 'STACK2!'))
print(Uncommit(r, 0, 100))

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24028#issuecomment-2825552234


More information about the hotspot-runtime-dev mailing list