RFC: 8216472: Stack overflow followed by crash

Alex Kashchenko akashche at redhat.com
Fri Oct 25 11:31:37 UTC 2019


Hi Alan,

On 10/20/2019 09:12 AM, Alan Bateman wrote:
> On 08/10/2019 12:36, Alex Kashchenko wrote:
>> Hi,
>>
>> At Red Hat, we've got a windows-only crash in nio.dll with a JBoss 
>> application,  that appeared to be JDK-8216472 [1].
>>
>> After some investigation it was found, that 
>> Java_sun_nio_ch_WindowsSelectorImpl_00024SubSelector_poll0 function 
>> [2] allocates more than 50Kb on stack - because FD_SETSIZE is set to 
>> 1024 [3], each fd_set structure takes 8200 bytes and 6 fd_set 
>> structures are placed on stack [4][5]. If this function is called from 
>> a deep recursive java call, it can cause stack overflow and JVM 
>> process is either aborted or killed by OS.
>>
>> One of the solutions to this may be to set hotspot StackShadowPages 
>> parameter to 14. Currently it is 7 on windows and 20 on linux.
>>
>> I'd like to solicit comments and suggestions about another solution, 
>> that changes this native call in NIO moving fd_sets to heap:
>>
>> http://cr.openjdk.java.net/~akasko/jdk/8216472/webrev.00/
>>
>> PollOverflow reproducer there causes JVM abort (or silent kill) 
>> without the patch.
>>
>> We tested this patch internally and haven't found any regressions; 
>> tests included JCK and SPECjbb, though we haven't run any benchmarks 
>> specifically oriented on networking performance.
> I finally got time to look at this a bit more.
> 
> The Windows select based Selector does need some TLC. This issue with 
> the FD_SET structures on the stack has been there since JDK 1.4 and I'm 
> surprised it has been reported before this issue.
> 
> The underlying issue is that this Selector wants poll and ends up 
> translating a poll array to a set of FD_SET structures on each select. 
> This shouldn't be needed, it should instead maintain the FD_SET in the 
> java implementation so that the native method is a call to select, 
> nothing more. Changing this would require a lot of work on the Selector 
> (esp. the exceptfds) and is something we should do at some point.
> 
> For the current issue I think we should do the following:
> 
> 1. If select fails, throw the IOException and return IOS_THROWN. I don't 
> think any of this code has been needed for many releases as the async 
> close of a channel registered with a Selector will consistently delay 
> closing the SOCKET until it is flushed from all registered Selectors. 
> That should eliminate 3 of the 6 FD_SET structures.
> 
> 2. Allocate the memory for the 3 FD_SET structures in the SubSelector. 
> That is, manage the memory from the Java code, not the native method. 
> That eliminates the 3 * malloc/free per select that is in the proposed 
> patch. The memory can be freed when the sub selector thread terminates.
> 
> My guess is that the changes will go through several iterations and 
> happy to provide feedback on interim revisions as needed.

Thanks for your comments! Updated the patch including them:

http://cr.openjdk.java.net/~akasko/jdk/8216472/webrev.01/

-- 
-Alex



More information about the nio-dev mailing list