RFR: 8249893: AARCH64: optimize the construction of the value from the bits of the other two [v6]

Thu Nov 5 14:10:59 UTC 2020

On Thu, 5 Nov 2020 09:27:18 GMT, Boris Ulasevich <bulasevich at openjdk.org> wrote:

>> Let me revive the change request [3] to C2 and AArch64 that applies Bitfield Insert instruction in the expression "(v1 & 0xFF) | ((v2 & 0xFF) << 8)". 
>>  
>> Compared to the last round of review [2] I updated the transformation to apply BFI in more cases and added a jtreg test.
>> 
>> As before, compared to the original patch [1], the transformation logic is now in the common C2 code: a new BitfieldInsert node has been introduced to replace Or+Shift+And sequence when possible, on AARCH a single BFI instruction is emitted for the new node.
>>  
>> [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-July/039161.html 
>> [2] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039653.html
>> [3] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039792.html
>
> Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision:
> 
>   comprehensive comment and code cleanup for BitfieldInsert transformation

Once the requested comment change is implemented to explain why this transform is delayed I think this wil be ready to ship. That said ...

I am not 100% convinced that this change is properly justified. I agree with Andrew Haley's earlier observation that this is not going to accrue much benefit. It will only make a measurable difference for 1) heavy bit-twiddling apps which 2) are cpu bound rather than memory bound (i.e where the reduction in register -> register ops is not rendered irrelevant by memory transfer delays) and 3) spend a significant proportion of their time merging masked and shifted data. That's a pretty small target. Set against that limited potential gain is the cost of doing one or two extra checks at at every invocation of Or::ideal plus the once-per-compile cost of setting up and performing an extra compiler optimization step. So, it is not clear to me that this change will definitely be an improvement. I'm accepting it on the grounds that any degradation of compile-time performance is probably going to be marginal too (I hope so). In future I think it would be wise to spend time picking more reward
 ing candidates for optimization and, more importantly, discussing the likelihood of them being useful on the compiler-dev or aarch64-port lists before proceeding to implement them and present them for a review.  That should help to obtain a much better return on effort than that has resulted from this patch.

-------------

Marked as reviewed by adinn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/511