RFR: 8016155: SIGBUS when running Kitchensink with ParallelScavenge and ParallelOld
Stefan Johansson
stefan.johansson at oracle.com
Mon Aug 26 04:40:11 PDT 2013
On 2013-08-23 20:58, Jon Masamitsu wrote:
>
> On 8/23/2013 5:30 AM, Stefan Johansson wrote:
>> Hi all,
>>
>> I would like some reviews on my fix for bug:
>> http://bugs.sun.com/view_bug.do?bug_id=8016155
>>
>> Webrev:
>> http://cr.openjdk.java.net/~sjohanss/8016155/webrev.00
>>
>> Summary:
>> On Linux we can hit a SIGBUS when one NUMA node runs out of large
>> pages but the system as a whole still has large pages left. To avoid
>> this we need to relax the requirement on which node the memory
>> should be allocated on. This can be done by using the memory policy
>> MPOL_PREFERRED, which prefers a certain node, instead of MPOL_BIND,
>> which requires a certain node.
>
> With your change what happens when the system as a whole
> runs out of large pages?
The change doesn't do anything specific for large pages; it just sets
the memory policy to MPOL_PREFERRED to guarantee that we don't
forcefully use a NUMA node that can't back the given mapping. If we run
out of large pages entirely, that is still handled the same way as
before: we prefer that the memory is allocated on a given NUMA node,
but if that isn't possible we'll use another.
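
To make the difference concrete, here is a minimal standalone sketch,
not the actual webrev change, of asking for memory on a particular
NUMA node via mbind(2). The node number (0) and the 2M page size are
assumptions made for the example:

  /* Minimal sketch, not the HotSpot patch: requests NUMA node 0 for a
   * large-page mapping.  With MPOL_BIND, touching the mapping SIGBUSes
   * if node 0 has no free large pages; with MPOL_PREFERRED the kernel
   * falls back to another node.  Build: gcc numa-demo.c -lnuma */
  #define _GNU_SOURCE
  #include <numaif.h>     /* mbind(), MPOL_PREFERRED, MPOL_BIND */
  #include <sys/mman.h>   /* mmap(), MAP_HUGETLB */
  #include <stdio.h>
  #include <string.h>

  int main(void) {
      size_t len = 2 * 1024 * 1024;   /* one 2M large page */
      void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
      if (p == MAP_FAILED) { perror("mmap"); return 1; }

      unsigned long nodemask = 1UL << 0;   /* node 0 */
      /* MPOL_PREFERRED is a hint: the kernel may allocate elsewhere.
       * MPOL_BIND here instead makes the node a hard requirement,
       * which is what caused the SIGBUS in this bug. */
      if (mbind(p, len, MPOL_PREFERRED, &nodemask,
                sizeof(nodemask) * 8, 0) != 0) {
          perror("mbind");
          return 1;
      }

      memset(p, 0, len);   /* fault the pages in */
      puts("allocation succeeded");
      return 0;
  }
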
I've verified that this is actually what happens by running SPECjbb
with an increasing heap and only a few large pages configured on the
system. When the large pages are all used up, we fall back to using
regular-sized pages and everything runs along just fine.
Thanks for highlighting this case, Jon.
Stefan
>
> Jon
>
>>
>> Testing:
>> To verify the fix I've run Kitchensink as described in the bug
>> report, but I've also done some manual testing. To sanity test
>> performance I've run SPECjbb2005 with and without UseNUMA, before
>> and after the fix, and I haven't seen any problems. I also ran
>> SPECjbb2005 on a system where one NUMA node was configured with no
>> large pages while the other had enough for the test. Without the fix
>> this crashes immediately, but with the fix the results are sane.
>>
>> Thanks,
>> Stefan
>