RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26]

Fri Oct 4 10:44:53 UTC 2024

On Wed, 2 Oct 2024 21:29:28 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:

>> I changed the header<16 version to be a small loop: https://github.com/rkennke/jdk/commit/bcba264ea5c15581647933db1163ca1dae39b6c5
>> 
>> The idea is the same as before, except it's made as a small loop with a maximum of 4 iterations (backward-branches), and it copies 8 bytes at a time, such that 1. it may copy up to 7 bytes that precede the array and 2. doesn't run over the end of the array (which would potentially crash).
>> 
>> I am not sure if using XMM_TMP1 and XMM_TMP2 there is ok, or if it would encode better to use one of the regular registers.?
>> 
>> Also, this new implementation could simply replace the old one, instead of being an alternative. I am not sure if if would make any difference performance-wise.
>
> @rkennke The small loop looks to me that it will run over the end of the array.
> Say the haystack_len is 7, the index below would be 0 after the shrq instruction, and the  movq(XMM_TMP1, Address(haystack, index, Address::times_8)) in the loop will read 8 bytes i.e. one byte past the end of the array:
>        // num_words (zero-based) = (haystack_len - 1) / 8;
>       __ movq(index, haystack_len);
>       __ subq(index, 1);
>       __ shrq(index, LogBytesPerWord);
> 
>       __ bind(L_loop);
>       __ movq(XMM_TMP1, Address(haystack, index, Address::times_8));
>       __ movq(Address(rsp, index, Address::times_8), XMM_TMP1);
>       __ subq(index, 1);
>       __ jcc(Assembler::positive, L_loop);

Yes, and that is intentional.

Say, haystack_len is 7, then the first block computes the adjustment of the haystack, which is 8 - (7 % 8) = 1. We adjust the haystack pointer one byte down, so that when we copy (multiple of) 8 bytes, we land on the last byte. We do copy a few bytes that are preceding the array, which is part of the object header and guaranteed to be >= 8 bytes.

Then we compute the number of words to copy, but make it 0-based. That is '0' is 1 word, '1' is 2 words, etc. It makes the loop nicer. In this example we get 0, which means we copy one word from the adjusted haystack, which is correct.

Then comes the actual loop.

Afterwards we adjust the haystack pointer so that it points to the first array element that we just copied onto the stack, ignoring the few garbage bytes that we also copied.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1787528501