RFC: 8216472: Stack overflow followed by crash
Alex Kashchenko
akashche at redhat.com
Fri Oct 25 11:31:37 UTC 2019
Hi Alan,
On 10/20/2019 09:12 AM, Alan Bateman wrote:
> On 08/10/2019 12:36, Alex Kashchenko wrote:
>> Hi,
>>
>> At Red Hat, we've got a windows-only crash in nio.dll with a JBoss
>> application, that appeared to be JDK-8216472 [1].
>>
>> After some investigation it was found, that
>> Java_sun_nio_ch_WindowsSelectorImpl_00024SubSelector_poll0 function
>> [2] allocates more than 50Kb on stack - because FD_SETSIZE is set to
>> 1024 [3], each fd_set structure takes 8200 bytes and 6 fd_set
>> structures are placed on stack [4][5]. If this function is called from
>> a deep recursive java call, it can cause stack overflow and JVM
>> process is either aborted or killed by OS.
>>
>> One of the solutions to this may be to set hotspot StackShadowPages
>> parameter to 14. Currently it is 7 on windows and 20 on linux.
>>
>> I'd like to solicit comments and suggestions about another solution,
>> that changes this native call in NIO moving fd_sets to heap:
>>
>> http://cr.openjdk.java.net/~akasko/jdk/8216472/webrev.00/
>>
>> PollOverflow reproducer there causes JVM abort (or silent kill)
>> without the patch.
>>
>> We tested this patch internally and haven't found any regressions;
>> tests included JCK and SPECjbb, though we haven't run any benchmarks
>> specifically oriented on networking performance.
> I finally got time to look at this a bit more.
>
> The Windows select based Selector does need some TLC. This issue with
> the FD_SET structures on the stack has been there since JDK 1.4 and I'm
> surprised it has been reported before this issue.
>
> The underlying issue is that this Selector wants poll and ends up
> translating a poll array to a set of FD_SET structures on each select.
> This shouldn't be needed, it should instead maintain the FD_SET in the
> java implementation so that the native method is a call to select,
> nothing more. Changing this would require a lot of work on the Selector
> (esp. the exceptfds) and is something we should do at some point.
>
> For the current issue I think we should do the following:
>
> 1. If select fails, throw the IOException and return IOS_THROWN. I don't
> think any of this code has been needed for many releases as the async
> close of a channel registered with a Selector will consistently delay
> closing the SOCKET until it is flushed from all registered Selectors.
> That should eliminate 3 of the 6 FD_SET structures.
>
> 2. Allocate the memory for the 3 FD_SET structures in the SubSelector.
> That is, manage the memory from the Java code, not the native method.
> That eliminates the 3 * malloc/free per select that is in the proposed
> patch. The memory can be freed when the sub selector thread terminates.
>
> My guess is that the changes will go through several iterations and
> happy to provide feedback on interim revisions as needed.
Thanks for your comments! Updated the patch including them:
http://cr.openjdk.java.net/~akasko/jdk/8216472/webrev.01/
--
-Alex
More information about the nio-dev
mailing list