RFC: x86_64: Fixing the align() macro

Fri Aug 20 01:54:00 UTC 2021

Hi Scott,

On 8/19/21 5:10 PM, Gibbons, Scott wrote:
> Hi, everyone.  This is my first post to this forum, so please let me know if this type of discussion is welcome here, and an alternative forum if it's not :).
>
> I ran into an issue where align(64) breaks on the x86_64 platform within stubGenerator_x86_64.cpp.  I would like input on a fix *AND* how the community would like to see it tested.
>
> There are occasions where I need to align certain data to a 64-byte boundary for AVX-512 aligned instructions.  The issue that I uncovered was that the single-parameter align(int modulus) would sometimes not align properly with align(64).  Unwinding this, I found that align(int modulus) calls align(int modulus, int target) with the target parameter as offset() (i.e., align(modulus, offset());).  Further exploration showed that offset() was 'return _end - _start', which is effectively the size of the code segment and not the offset of the PC.  So align(64) would align the PC to a multiple of 64 bytes from the start of the segment, and not to a pure 64-byte boundary as requested.
>
> The workaround that I implemented was to use the two-parameter version of align(), passing pc() as the target.  This fixed the specific issue I was encountering, and now I'd like to implement a general solution.

The problem with using pc() is that the alignment only holds at the 
current address, so it can be lost when the code buffer gets copied to a 
different location.
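
To make the difference concrete, here is a minimal sketch in C++.  It is 
a toy model, not the HotSpot code: ToySection, padding_for(), 
align_by_offset(), align_by_pc() and relocate() are hypothetical names 
made up for illustration, and the addresses are arbitrary.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a code section (not the HotSpot class).
struct ToySection {
  std::uintptr_t start;  // address of the first byte of the section
  std::uintptr_t end;    // address one past the last emitted byte

  // Section-relative size, mirroring 'return _end - _start'.
  std::uintptr_t offset() const { return end - start; }
  // Absolute address where the next byte would be emitted.
  std::uintptr_t pc() const { return end; }
};

// Padding needed so that 'target' becomes a multiple of 'modulus'.
inline std::uintptr_t padding_for(std::uintptr_t modulus,
                                  std::uintptr_t target) {
  assert(modulus != 0 && (modulus & (modulus - 1)) == 0);  // power of two
  std::uintptr_t mask = modulus - 1;
  return (modulus - (target & mask)) & mask;
}

// Pad relative to the start of the section (what align(modulus) does
// according to the description in this thread).
inline void align_by_offset(ToySection& s, std::uintptr_t modulus) {
  s.end += padding_for(modulus, s.offset());
}

// Pad relative to the absolute address (the pc() workaround).
inline void align_by_pc(ToySection& s, std::uintptr_t modulus) {
  s.end += padding_for(modulus, s.pc());
}

// Model copying the section: same contents, different start address.
inline ToySection relocate(const ToySection& s, std::uintptr_t new_start) {
  return ToySection{new_start, new_start + s.offset()};
}
```

In this model, a section starting at 0x1020 with 10 bytes emitted ends up 
at offset 64 after align_by_offset(s, 64), but its pc() is 0x1060, which 
is not a multiple of 64.  With align_by_pc(s, 64) the pc() becomes 0x1040, 
yet relocate(s, 0x2000) moves that same point to 0x2020, so the 
alignment does not survive the copy unless the new start happens to 
have the same value modulo 64 as the old one.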

> One solution would be to change the single-parameter align() to use pc() instead of offset() in the call to the two-parameter align() function.  I believe this would solve the issue, however the single-parameter align() is used very frequently, and I'd like to minimize the number of potential issues this change could trigger.
>
> The second solution (my preferred) is to change the allocation alignment of code segments to 64-bytes.  This would effectively make the size equivalent to the PC for purposes of the modular arithmetic for alignment.  That is, _start % 64 is zero, so (_end - _start) % 64 is equivalent to (_end % 64), achieving the desired result.

The disadvantage of this is wasted space when the extra alignment isn't 
needed.  How about the following compromise:

1. keep the default alignment as 32 bytes

2. keep using offset() for alignment

3. if align(x) is used with x > default allocation alignment (32), then 
make a note of that in the code buffer

4. if the code buffer is copied, use the max required alignment from 3) 
and the PC of the old code buffer to compute the required allocation 
alignment of the new code buffer.  For align(64), no adjustment would be 
needed 50% of the time.
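
A rough sketch of steps 1-4, again as a toy model in C++ rather than 
actual HotSpot code.  ToyCodeBuffer, kDefaultAlignment, emit() and 
adjust_start() are hypothetical names.  How the old buffer's address 
enters the computation in step 4 is left open above, so the helper 
takes the desired residue as a parameter: 0 if the copy should start 
on a multiple of the max alignment, or the old start modulo the max 
alignment if the old layout should be preserved.

```cpp
#include <cassert>
#include <cstdint>

// Step 1: default allocation alignment (32 bytes, as stated above).
constexpr std::uintptr_t kDefaultAlignment = 32;

// Hypothetical stand-in for a code buffer (not the HotSpot class).
struct ToyCodeBuffer {
  std::uintptr_t start;
  std::uintptr_t end;
  std::uintptr_t max_alignment;  // largest alignment requested so far

  explicit ToyCodeBuffer(std::uintptr_t s)
      : start(s), end(s), max_alignment(kDefaultAlignment) {}

  std::uintptr_t offset() const { return end - start; }

  // Pretend to emit n bytes of code.
  void emit(std::uintptr_t n) { end += n; }

  // Step 2: keep aligning relative to offset().
  // Step 3: remember any request above the default alignment.
  void align(std::uintptr_t modulus) {
    assert(modulus != 0 && (modulus & (modulus - 1)) == 0);  // power of two
    std::uintptr_t mask = modulus - 1;
    end += (modulus - (offset() & mask)) & mask;
    if (modulus > max_alignment) max_alignment = modulus;
  }
};

// Step 4 (one possible reading): smallest address >= candidate whose
// value modulo 'alignment' equals 'residue'.
inline std::uintptr_t adjust_start(std::uintptr_t candidate,
                                   std::uintptr_t alignment,
                                   std::uintptr_t residue) {
  std::uintptr_t mask = alignment - 1;
  return candidate + ((residue - candidate) & mask);
}
```

With a 32-byte-aligned candidate address and a max alignment of 64, 
adjust_start() either returns the candidate unchanged or bumps it by 32 
bytes, which matches the "no adjustment 50% of the time" estimate.  A 
buffer that never asked for more than the default pays nothing.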

dl

> I would really like to hear others' opinions and possible alternative approaches.  I'm also not sure how this change would affect relocation, so I'd like to hear that as well.
>
> I also would like to know how the community would approach creating an appropriate test for this to be included in the regression suites.
>
> Thanks,
> --Scott Gibbons
> Software Development Engineer, Runtime Engineering
> DEVELOPER SOFTWARE ENGINEERING
> Ph: 1-503-456-7756
> Cell: 1-469-450-8390
> 2501 NE Century Blvd
> Hillsboro, OR 97124
> Intel Corporation | www.intel.com
>