RFR: 8223089: Stack alignment for x86-32

Thu May 2 11:32:55 UTC 2019

* Andrew Haley:

> We've been seeing segfaults on 32-bit Linux x86.
>
> Recent Linux distributions' runtime libraries are compiled with SSE
> enabled; this means that the stack must be aligned on a 16-byte
> boundary when a function is called. GCC has defaulted to
> 16-byte-aligned code for many years but HotSpot does not, calling
> runtime routines with a misaligned stack.

This is not a review of your patch (sorry), but I'd like to share what
we figured out when setting the compiler flags for current Fedora and
downstream.

We have built with SSE2 for many, many years in the downstream
distribution, but with -O2, so there wasn't any vectorization in older
GCC versions.  Very few functions, even those performing double
arithmetic, had stack spills, so that wasn't a problem in the GCC 4.8
days.  As long as you used -O2.

With -O3, vectorization could happen even with GCC 4.8, leading to more
stack spills and requiring an aligned stack.  I see -O3 in Hotspot build
logs, so this is one of the issues you might be hitting.

Later GCC releases (those with the STV pass) also use SSE2 code for
integer code, even at -O2.  This resulted in practical problems (crashes
like the one you're trying to fix), which is why we now build everything
with -mstackrealign.  -mstackrealign has the nice property that it
preserves stack alignment if it encounters it (which is needed for
callback-based functions such as qsort_r), but does not actually require
it.  It disables GCC's STV pass, and the code isn't great in many cases.
Our main concern initially was that the flag had not seen much exposure
(which is probably good because its documentation is so misleading), but
we encountered only very few problems at the distribution level.  Most
of them were related to explicitly enabled fastcall/regparm calling
conventions leaving no registers available for stack realignment, stack
protector, and stack clash protection.
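
To make the realignment idea concrete, here is a rough sketch of the
arithmetic a realigning prologue performs.  It is not GCC's actual
output, and the helper names are made up for illustration.  Since the
stack grows downward on x86, realigning means rounding the stack
pointer down to the next 16-byte boundary; a pointer that is already
aligned is left unchanged, which is the "preserves alignment if it
encounters it" property.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if addr sits on a 16-byte boundary. */
bool is_aligned16(uintptr_t addr)
{
    return (addr & (uintptr_t)15) == 0;
}

/* Hypothetical helper: round addr down to a 16-byte boundary.
   This mirrors what masking the stack pointer with -16 does;
   an already aligned address comes back unchanged. */
uintptr_t align_down16(uintptr_t addr)
{
    return addr & ~(uintptr_t)15;
}
```

For example, align_down16(0x100c) yields 0x1000, while
align_down16(0x1000) stays at 0x1000.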

Anyway, we decided that at this point, i386 is for backwards
compatibility with legacy applications, not for performance, which is
why we went with the -mstackrealign approach.  For the same
compatibility reason, we didn't want to stop building with SSE2 instead
of the FPU.  Switching back to the FPU would have reintroduced excess
precision issues.

> There is some code in HotSpot to work around specific instances of
> this problem, but it is not applied consistently. If runtime code
> calls out to C library functions, the stack remains misaligned and a
> segfault can result. We can work around this by compiling the HotSpot
> runtime with -mrealign-stack but this causes all code generated by GCC
> to realign the stack, which is not efficient. It also prevents us from
> compiling HotSpot with SSE enabled.

If you want to compile Hotspot with SSE2, you will have to realign the
stack when JNI code calls back into Hotspot.  Otherwise, you lose
support for legacy JNI libraries which do not preserve stack alignment
(as required after that silent ABI bump).
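
As a sketch of what realigning only at the entry points could look
like in C (as opposed to building everything with -mstackrealign),
GCC's force_align_arg_pointer function attribute can be applied to
individual functions on x86.  The function name and the macro below
are hypothetical and not taken from Hotspot or from the patch under
review; whether this is practical for Hotspot's actual entry points
is a separate question.

```c
#include <stdint.h>

/* Only apply the attribute where the compiler and target support it.
   REALIGN_ON_ENTRY is a made-up macro name for this sketch. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define REALIGN_ON_ENTRY __attribute__((force_align_arg_pointer))
#else
#define REALIGN_ON_ENTRY
#endif

/* Hypothetical entry point that legacy code may call with a stack
   that is only 4-byte aligned.  The attribute asks the compiler to
   realign the stack in this function's prologue, so everything it
   calls sees an aligned stack. */
REALIGN_ON_ENTRY
int hypothetical_callback_entry(void)
{
    /* A 16-byte-aligned local, as an SSE spill slot would need.
       _Alignas already forces this particular object to be aligned;
       the attribute matters for callees that simply assume the
       incoming stack is aligned. */
    _Alignas(16) volatile unsigned char scratch[16] = {0};

    /* Returns 1 if the local really is 16-byte aligned. */
    return ((uintptr_t)scratch & 15u) == 0;
}
```

The point of doing this per function is that only the boundary
between legacy callers and SSE2-compiled code pays for the
realignment.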

Supporting such legacy code is the reason why the system libraries in
our distribution do not require stack realignment on i386, so compiling
Hotspot without SSE2 would also avoid the problem there, even if Hotspot
code calls functions such as malloc in response to requests from JNI
code.

Thanks,
Florian