RFR: 8301136: Improve unlink() and unlink_all() of ResourceHashtableBase [v5]

Sat Mar 11 02:34:22 UTC 2023

On Sun, 29 Jan 2023 06:41:10 GMT, Ioi Lam <iklam at openjdk.org> wrote:

>> Xin Liu has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Add lambda API unlink(Function f) per reviewers' request.
>
> I updated [JDK-8301296](https://bugs.openjdk.org/browse/JDK-8301296) to clarify the problems with the current ResourceHashtable design and my proposal for fixing them.
> 
> I think the above proposed fixes shouldn't block the progress of this PR, which is just an optimization that maintains the current behavior. So let's continue the discussions here.
> 
> There are two parts of this PR:
> 
> [1] For the optimization (counting the number of entries and exit the loop early), do you have any performance data that shows this is beneficial?
> 
> For this optimization to be beneficial, we must have two conditions -- (a) the table is too large so it's likely to have many unused entries, and (b) the hash function is bad so most of the unused entries are at the end of the table.
> 
> For (a), maybe it's better to change the table to ResizeableResourceHashtable?
> For (b), maybe you can also print out the occupancy of the table in your traces like this one (in your earlier PR https://github.com/openjdk/jdk/pull/12016)
> 
> 
> [12.824s][debug][hashtables] table_size = 109, break at 68
> 
> 
> If we have many entries (e.g., 40) but they all pack to the front end of the table, that means we have a bad hash function.
> 
> [2] As I suggested earlier, we should consolidate the code to use a single unlink loop, so you can apply this counting optimization in a single place. 
> 
> I am not quite sure why you would need the following in your "wrapper" functions:
> 
> 
> if (clean) {
>   *ptr = node->_next;
> } else {
>   ptr = &(node->_next);
> }
> 
> 
> and
> 
> 
> if (node->_next == nullptr) {
>   // nullify the bucket when reach the end of linked list.
>   *ptr = nullptr;
> }
> 
> 
> I wrote a version of the consolidated loop that's much simpler. It also aligns with the old code so the diff is more readable:
> 
> https://github.com/openjdk/jdk/compare/master...iklam:jdk:8301136-resource-hash-unlink-all-suggestion
> 
> Note that I have both `unlink_all()` and `unlink_all(Function function)`, that's because the current API allows the user function to do two things (a) check if the entry should be freed, (b) perform special clean up on the entry. So if you want to free all entries but need to perform special clean up, you'd call  `unlink_all(Function function)`.

> @iklam
> 
> Like you said, I mixed up 2 things. My real intention is to introduce a new api `unlink_all()` because I used it in my project.
> 
> I understand that you want me to refactor unlink(Iterator) using lambda. In order to have an efficient unlink_all() and dtor, I have to factor out unlink_impl(). It's private and acts as the algorithmic template.
> 
> What should I do now? Should I split this JBS issue into 2? One is for the optimization and the other one for "unlink_all()"? that would make thing easier to review.
> 
> thanks, --lx

I think it’s best to split this PR into two. We can first address the unlink_all issue and do the refactoring in a subsequent PR. Thanks

-------------

PR: https://git.openjdk.org/jdk/pull/12213