RFR: 8332139: SymbolTableHash::Node allocations allocates twice the required memory

Wed Jun 5 08:22:00 UTC 2024

On Tue, 4 Jun 2024 15:37:15 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

>> The symbols are inline and allocated together with the ConcurrentHashTable (CHT) Nodes. The calculation used for the required size is `alloc_size = size + value.byte_size() + value.effective_length();`
>> 
>> Where
>>   * `size == sizeof(SymbolTableHash::Node) == sizeof(void*) + sizeof(Symbol)`
>>   * `value.byte_size() == dynamic_sizeof(Symbol) == sizeof(Symbol) + <bytes beyond object>`
>>   * `value.effective_length() == dynamic_sizeof(Symbol) - sizeof(Symbol) == <bytes beyond object>`
>> 
>> So `alloc_size` ends up being `sizeof(void*) /* node metadata */ + 2 * dynamic_sizeof(Symbol)`
>> 
>> Because using the CHT with dynamically sized (and inlined)  types requires knowing about its implementation details I chose to make the functionality for calculating the the allocation size a property of the CHT. It now queries the CHT for the node allocation size given the dynamic size required for the VALUE. 
>> 
>> The only current (implicit) restriction regarding using  dynamically sized (and inlined) types in CHT is that the _value field C++ object ends where the Node object ends, so there is not padding bytes where the dynamic payload is allocated. (effectively `sizeof(VALUE) % alignof(Node) == 0` as long as there are no non-standard alignment fields in the Node metadata).  I chose to test this as a runtime assert that the _value ends where the Node object ends, instead of a static assert with the alignment as it seemed to more explicitly show the intent of the check. 
>> 
>> Running testing tier1-7
>
> src/hotspot/share/utilities/concurrentHashTable.inline.hpp line 1065:
> 
>> 1063:   assert(value_size >= sizeof(VALUE), "must include the VALUE");
>> 1064:   return sizeof(Node) - sizeof(VALUE) + value_size;
>> 1065: }
> 
> I'm not sure if this makes it less error prone, if I'm right that the size passed in is sizeof(Node) + sizeof(VALUE).

`size_t value_size` is supposed to be the dynamic size of VALUE. What is returned is the allocation size required to construct a CHT node which contains a VALUE with that dynamic size.

For the only current use of dynamic sized VALUE types, i.e. the SymbolTable. We call this with `Symbol::byte_size()`. Which is the dynamic size of that symbol. 

Not sure I understand exactly what you mean by:
> if I'm right that the size passed in is sizeof(Node) + sizeof(VALUE).

The `sizeof(Node) - sizeof(VALUE)` part of the calculation is a CHT implementation detail. That is excluding the data / payload how big is the node, or more plainly the size of the node metadata. 

The we add on the size of the data / payload. To end up with the size required for the node metadata and the data / payload.

Any users of CHT with dynamic types needs to know about `CHT::get_dynamic_node_size(size_t)` so in that sense it is still error prone. But the users do not have to know about the implementation of CHT to allocate a node. All they need to know is how much memory will the instance of a dynamically sized type require. 

(Some of the CHT implementations leaks through the restrictions it imposes, such that `sizeof(VALUE) % alignof(void*)` / `offset_of(Node, _value) + sizeof(_value) == sizeof(Node)`.)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19214#discussion_r1627232232