RFC: 8216472: Stack overflow followed by crash

Alan Bateman Alan.Bateman at oracle.com
Sun Oct 20 08:12:45 UTC 2019


On 08/10/2019 12:36, Alex Kashchenko wrote:
> Hi,
>
> At Red Hat, we've got a windows-only crash in nio.dll with a JBoss 
> application,  that appeared to be JDK-8216472 [1].
>
> After some investigation it was found, that 
> Java_sun_nio_ch_WindowsSelectorImpl_00024SubSelector_poll0 function 
> [2] allocates more than 50Kb on stack - because FD_SETSIZE is set to 
> 1024 [3], each fd_set structure takes 8200 bytes and 6 fd_set 
> structures are placed on stack [4][5]. If this function is called from 
> a deep recursive java call, it can cause stack overflow and JVM 
> process is either aborted or killed by OS.
>
> One of the solutions to this may be to set hotspot StackShadowPages 
> parameter to 14. Currently it is 7 on windows and 20 on linux.
>
> I'd like to solicit comments and suggestions about another solution, 
> that changes this native call in NIO moving fd_sets to heap:
>
> http://cr.openjdk.java.net/~akasko/jdk/8216472/webrev.00/
>
> PollOverflow reproducer there causes JVM abort (or silent kill) 
> without the patch.
>
> We tested this patch internally and haven't found any regressions; 
> tests included JCK and SPECjbb, though we haven't run any benchmarks 
> specifically oriented on networking performance.
I finally got time to look at this a bit more.

The Windows select based Selector does need some TLC. This issue with 
the FD_SET structures on the stack has been there since JDK 1.4 and I'm 
surprised it has been reported before this issue.

The underlying issue is that this Selector wants poll and ends up 
translating a poll array to a set of FD_SET structures on each select. 
This shouldn't be needed, it should instead maintain the FD_SET in the 
java implementation so that the native method is a call to select, 
nothing more. Changing this would require a lot of work on the Selector 
(esp. the exceptfds) and is something we should do at some point.

For the current issue I think we should do the following:

1. If select fails, throw the IOException and return IOS_THROWN. I don't 
think any of this code has been needed for many releases as the async 
close of a channel registered with a Selector will consistently delay 
closing the SOCKET until it is flushed from all registered Selectors. 
That should eliminate 3 of the 6 FD_SET structures.

2. Allocate the memory for the 3 FD_SET structures in the SubSelector. 
That is, manage the memory from the Java code, not the native method. 
That eliminates the 3 * malloc/free per select that is in the proposed 
patch. The memory can be freed when the sub selector thread terminates.

My guess is that the changes will go through several iterations and 
happy to provide feedback on interim revisions as needed.

-Alan


More information about the nio-dev mailing list