RFR: 8301136: Improve unlink() and unlink_all() of ResourceHashtableBase [v5]
Xin Liu
xliu at openjdk.org
Mon Jan 30 20:54:58 UTC 2023
On Sun, 29 Jan 2023 06:41:10 GMT, Ioi Lam <iklam at openjdk.org> wrote:
>> Xin Liu has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Add lambda API unlink(Function f) per reviewers' request.
>
> I updated [JDK-8301296](https://bugs.openjdk.org/browse/JDK-8301296) to clarify the problems with the current ResourceHashtable design and my proposal for fixing them.
>
> I think the above proposed fixes shouldn't block the progress of this PR, which is just an optimization that maintains the current behavior. So let's continue the discussions here.
>
> There are two parts of this PR:
>
> [1] For the optimization (counting the number of entries and exiting the loop early), do you have any performance data that shows this is beneficial?
>
> For this optimization to be beneficial, we must have two conditions -- (a) the table is too large so it's likely to have many unused entries, and (b) the hash function is bad so most of the unused entries are at the end of the table.
>
> For (a), maybe it's better to change the table to ResizeableResourceHashtable?
> For (b), maybe you can also print out the occupancy of the table in your traces like this one (in your earlier PR https://github.com/openjdk/jdk/pull/12016)
>
>
> [12.824s][debug][hashtables] table_size = 109, break at 68
>
>
> If we have many entries (e.g., 40) but they all pack to the front end of the table, that means we have a bad hash function.
>
> [2] As I suggested earlier, we should consolidate the code to use a single unlink loop, so you can apply this counting optimization in a single place.
>
> I am not quite sure why you would need the following in your "wrapper" functions:
>
>
> if (clean) {
> *ptr = node->_next;
> } else {
> ptr = &(node->_next);
> }
>
>
> and
>
>
> if (node->_next == nullptr) {
> // nullify the bucket when reach the end of linked list.
> *ptr = nullptr;
> }
>
>
> I wrote a version of the consolidated loop that's much simpler. It also aligns with the old code so the diff is more readable:
>
> https://github.com/openjdk/jdk/compare/master...iklam:jdk:8301136-resource-hash-unlink-all-suggestion
>
> Note that I have both `unlink_all()` and `unlink_all(Function function)`, that's because the current API allows the user function to do two things (a) check if the entry should be freed, (b) perform special clean up on the entry. So if you want to free all entries but need to perform special clean up, you'd call `unlink_all(Function function)`.
hi, @iklam
`unlink(Function f)` is the general API: it removes a selection of nodes, and we don't know in advance which ones will be deleted. To be conservative, we have to meticulously maintain the open chain. `unlink_all()` knows that all nodes are going away, so we don't need to maintain the linked list at all.
Here is an example where an open chain consists of 3 nodes. It explains why one `unlink_all()` is better than 3 `unlink()` calls in a row: we don't need to maintain intermediate states such as `*ptr = &B`.
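The difference can be sketched with a minimal, self-contained example (not the HotSpot code; `Node` and `unlink_if` here are simplified stand-ins for a single bucket of the table): a selective unlink must keep a pointer to the incoming link so the chain stays intact around every removed node.

```cpp
// Sketch only: one hashtable bucket modeled as a singly linked list.
struct Node {
  int   value;
  Node* next;
};

// Remove every node for which pred returns true. `ptr` always points at
// the link (the bucket head or some node's `next`) that refers to the
// current node, so splicing a node out is a single write: *ptr = node->next.
template <typename Pred>
int unlink_if(Node** bucket, Pred pred) {
  int removed = 0;
  Node** ptr = bucket;
  while (*ptr != nullptr) {
    Node* node = *ptr;
    if (pred(node)) {
      *ptr = node->next;   // splice the node out, keeping the chain intact
      delete node;
      removed++;
    } else {
      ptr = &node->next;   // keep the node; advance to its outgoing link
    }
  }
  return removed;
}
```

Every removal in this loop performs a chain-repairing write (`*ptr = node->next`), which is exactly the per-node bookkeeping that `unlink_all()` can avoid.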

For `unlink_all()`, we only need to update `*bucket` when we are at the last node of the open chain.
That's the last step in the figure.

    if (node->_next == nullptr) {
      // nullify the bucket when we reach the end of the linked list.
      *ptr = nullptr;
    }
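A hedged sketch of that bulk shape (again with a simplified stand-in `Node`, not the actual ResourceHashtableBase types): because every node is going away, the walk never repairs the chain mid-loop, and the bucket head is nullified exactly once at the end.

```cpp
// Sketch only: one hashtable bucket modeled as a singly linked list.
struct Node {
  int   value;
  Node* next;
};

// Remove every node, invoking a user cleanup hook on each one first.
// No intermediate links are rewritten; the bucket gets one final write.
template <typename F>
void unlink_all(Node** bucket, F cleanup) {
  Node* node = *bucket;
  while (node != nullptr) {
    Node* next = node->next;   // save before the node is freed
    cleanup(node);             // user-supplied special clean up
    delete node;
    node = next;
  }
  *bucket = nullptr;           // one write instead of one per removed node
}
```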
The destructor version is even simpler: because the object is dead anyway, we just cut out all of that overhead.
-------------
PR: https://git.openjdk.org/jdk/pull/12213
More information about the hotspot-dev mailing list