RFC: x86_64: Fixing the align() macro

Fri Aug 20 16:49:56 UTC 2021

Thanks, Dean, for the comments.  I'm relatively new to this codebase and admittedly don't know the details, but the compromise you're suggesting seems like a lot of added complexity for a savings of ~16 bytes per code segment.  Let me explain, and you can point out where I misunderstand.

align() is used to manipulate the address bits of the data that follows, either to meet architectural requirements or to improve performance.  So if we copy one code buffer to another whose starting address has a different modulus (e.g., from addr % 64 == 0 to addr % 64 != 0), it would seem to me that structures within the code buffer with 64-byte alignment requirements would need to be moved so that their addresses are correct for the architecture.  This means that padding would need to be added or removed within the code buffer to account for the different address alignments.  It would also necessitate fixups to every instruction that uses such an address (say, as a literal).  Adding or removing padding also moves all other data at higher addresses, which cascades the effect.  There may be enough information available to make all these fixups, but this seems to be an extraordinary amount of effort and highly error-prone.
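
To make the copy concern concrete, here is a minimal sketch in C++. The helper names are hypothetical and are not part of HotSpot; the sketch only illustrates that a byte-for-byte copy keeps an item on a 64-byte boundary when the old and new start addresses agree modulo 64.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if the item located `item_offset` bytes into a
// buffer that begins at `start` lies on a `modulus`-byte boundary.
constexpr bool is_aligned(std::uintptr_t start, std::uintptr_t item_offset,
                          std::uintptr_t modulus) {
    return (start + item_offset) % modulus == 0;
}

// Hypothetical helper: a byte-for-byte copy keeps every address residue
// modulo `modulus` when both start addresses share the same residue.
constexpr bool copy_preserves_alignment(std::uintptr_t old_start,
                                        std::uintptr_t new_start,
                                        std::uintptr_t modulus) {
    return old_start % modulus == new_start % modulus;
}

// Example addresses are made up for illustration.
inline void copy_example() {
    // Item at offset 0x80 in a buffer that starts on a 64-byte boundary.
    assert(is_aligned(0x1000, 0x80, 64));
    // Same bytes copied to a start that is only 32-byte aligned.
    assert(!is_aligned(0x2020, 0x80, 64));
    assert(!copy_preserves_alignment(0x1000, 0x2020, 64));
    // A destination with the same residue keeps the item aligned.
    assert(copy_preserves_alignment(0x1000, 0x2040, 64));
}
```

If every code segment starts on a 64-byte boundary, the second check passes for any pair of segments, and no padding changes or fixups are needed.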

It feels more straightforward to me to sacrifice the address space than attack this complexity.  Are we really so tight on address space?  We could limit this to 64-bit architectures, if that would help.

Thanks,
--Scott Gibbons
Software Development Engineer, Runtime Engineering
  DEVELOPER SOFTWARE ENGINEERING
Ph: 1-503-456-7756
Cell: 1-469-450-8390
2501 NE Century Blvd
Hillsboro, OR 97124
Intel Corporation | www.intel.com

-----Original Message-----
From: hotspot-dev <hotspot-dev-retn at openjdk.java.net> On Behalf Of dean.long at oracle.com
Sent: Thursday, August 19, 2021 6:54 PM
To: hotspot-dev at openjdk.java.net
Subject: Re: RFC: x86_64: Fixing the align() macro

Hi Scott,

On 8/19/21 5:10 PM, Gibbons, Scott wrote:
> Hi, everyone.  This is my first post to this forum, so please let me know if this type of discussion is welcome here, and an alternative forum if it's not :).
>
> I ran into an issue where align(64) breaks on the x86_64 platform within stubGenerator_x86_64.cpp.  I would like input on a fix *AND* how the community would like to see it tested.
>
> There are occasions where I need to align certain data to a 64-byte boundary for AVX-512 aligned instructions.  The issue that I uncovered was that the single-parameter align(int modulus) would sometimes not align properly with align(64).  Unwinding this, I found that align(int modulus) calls align(int modulus, int target) with the target parameter as offset() (i.e., align(modulus, offset());).  Further exploration showed that offset() was 'return _end - _start', which is effectively the size of the code segment and not the offset of the PC.  So align(64) would align the PC to a multiple of 64 bytes from the start of the segment, and not to a pure 64-byte boundary as requested.
>
> The workaround that I implemented was to use the two-parameter version of align(), passing pc() as the target.  This fixed the specific issue I was encountering, and now I'd like to implement a general solution.
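
The difference between aligning on offset() and aligning on pc() can be shown with a small standalone sketch. The functions below are hypothetical and only mimic the behavior described above (offset() as _end - _start); they are not the HotSpot implementation, and the addresses are made up.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: padding bytes needed so that `target` becomes a
// multiple of `modulus`.
constexpr std::uintptr_t padding_for(std::uintptr_t modulus,
                                     std::uintptr_t target) {
    return (modulus - target % modulus) % modulus;
}

// Align on the section-relative offset (end - start), as the
// single-parameter align() is described in the message above.
constexpr std::uintptr_t align_by_offset(std::uintptr_t start,
                                         std::uintptr_t end,
                                         std::uintptr_t modulus) {
    return end + padding_for(modulus, end - start);
}

// Align on the absolute address, as in the workaround.
constexpr std::uintptr_t align_by_pc(std::uintptr_t end,
                                     std::uintptr_t modulus) {
    return end + padding_for(modulus, end);
}

inline void align_example() {
    // Section starts on a 32-byte (not 64-byte) boundary; 5 bytes emitted.
    const std::uintptr_t start = 0x1020;
    const std::uintptr_t end = 0x1025;
    // Offset-based: 64 bytes from the start, but 32 bytes off a boundary.
    assert(align_by_offset(start, end, 64) == 0x1060);
    assert(align_by_offset(start, end, 64) % 64 == 32);
    // PC-based: lands on a true 64-byte boundary.
    assert(align_by_pc(end, 64) == 0x1040);
    assert(align_by_pc(end, 64) % 64 == 0);
}
```

When the start address is itself a multiple of 64, both functions return the same result.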

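The problem with using pc() arises when the code buffer gets copied to a different location: padding computed from the absolute address is only correct at the new location if its start address has the same residue modulo the alignment.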

> One solution would be to change the single-parameter align() to use pc() instead of offset() in the call to the two-parameter align() function.  I believe this would solve the issue, however the single-parameter align() is used very frequently, and I'd like to minimize the number of potential issues this change could trigger.
>
> The second solution (my preferred) is to change the allocation alignment of code segments to 64 bytes.  This would effectively make the size equivalent to the PC for purposes of the modular arithmetic for alignment.  That is, if _start % 64 is zero, then (_end - _start) % 64 is equal to _end % 64, achieving the desired result.

The disadvantage to this is wasting space when extra alignment isn't 
needed.  How about the following compromise:

1. keep the default alignment as 32 bytes

2. keep using offset() for alignment

3. if align(x) is used with x > default allocation alignment (32), then 
make a note of that in the code buffer

4. if the code buffer is copied, use the max required alignment from 3) 
and the PC of the old code buffer to compute the required allocation 
alignment of the new code buffer.  For align(64), no adjustment would be 
needed 50% of the time.
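
A rough sketch of how steps 3 and 4 might look, in C++. This is only one possible reading of the compromise: the struct, the function, and the choice to match the old start address's residue are all assumptions made for illustration, not existing HotSpot code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

constexpr std::uintptr_t kDefaultAlignment = 32;  // step 1: default stays 32

// Hypothetical record kept with a code buffer (step 3): remembers the
// largest alignment ever requested through align(x).
struct AlignmentNote {
    std::uintptr_t max_alignment = kDefaultAlignment;
    void note_align(std::uintptr_t modulus) {
        max_alignment = std::max(max_alignment, modulus);
    }
};

// Step 4 (assumed interpretation): bytes to skip at the front of a new
// allocation so that the copy starts at the same residue, modulo the
// recorded maximum alignment, as the old buffer did.
inline std::uintptr_t copy_adjustment(const AlignmentNote& note,
                                      std::uintptr_t old_start,
                                      std::uintptr_t new_start) {
    const std::uintptr_t m = note.max_alignment;
    const std::uintptr_t want = old_start % m;
    const std::uintptr_t have = new_start % m;
    return (want + m - have) % m;
}
```

With a 32-byte-aligned allocator and a recorded align(64), the new start address either already matches the old residue (adjustment 0) or is off by 32 bytes, which corresponds to the "50% of the time" estimate above.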

dl

> I would really like to hear others' opinions and possible alternative approaches.  I'm also not sure how this change would affect relocation, so I'd like to hear that as well.
>
> I also would like to know how the community would approach creating an appropriate test for this to be included in the regression suites.
>
> Thanks,
> --Scott Gibbons