RFR: 8016155: SIGBUS when running Kitchensink with ParallelScavenge and ParallelOld

Tue Aug 27 06:34:11 PDT 2013

Thanks Per and Jon for the reviews.

Stefan

On 2013-08-26 13:40, Stefan Johansson wrote:
> On 2013-08-23 20:58, Jon Masamitsu wrote:
>>
>> On 8/23/2013 5:30 AM, Stefan Johansson wrote:
>>> Hi all,
>>>
>>> I would like some reviews on my fix for bug:
>>> http://bugs.sun.com/view_bug.do?bug_id=8016155
>>>
>>> Webrev:
>>> http://cr.openjdk.java.net/~sjohanss/8016155/webrev.00
>>>
>>> Summary:
>>> On Linux we can hit a SIGBUS when one NUMA node 
>>> runs out of large pages but the system as a whole has large pages 
>>> left. To avoid this we need to ease the requirement on which node 
>>> the memory should be allocated on. This can be done by using the 
>>> memory policy MPOL_PREFERRED, which prefers a certain node, instead 
>>> of MPOL_BIND, which requires a certain node.
>>
>> With your change what happens when the system as a whole
>> runs out of large pages?
>
> The change doesn't do anything specific for large pages; it just sets 
> the memory policy to MPOL_PREFERRED so that we don't force the use of 
> a NUMA node that can't back the given mapping. If we run out of large 
> pages, this is still handled in the same way: we prefer that the 
> memory is allocated on a given NUMA node, but if that isn't possible 
> we'll use another.
>
> I've verified that this is actually what happens by running SPECjbb 
> with an increasing heap and only a few large pages configured on the 
> system. When the large pages are all used, we fall back on using 
> regular sized pages and everything runs along just fine.
>
> Thanks for highlighting this case, Jon.
>
> Stefan
>
>>
>> Jon
>>
>>>
>>> Testing:
>>> To verify the fix I've run Kitchensink as described in the bug 
>>> report, and also done some manual testing. To sanity test 
>>> performance I've run SPECjbb2005 with and without UseNUMA before and 
>>> after the fix and I haven't seen any problem. I also ran SPECjbb2005 
>>> on a system where one NUMA node has been configured with no large 
>>> pages while the other has enough for the test. Without the fix this 
>>> crashes immediately, but with the fix the results are sane.
>>>
>>> Thanks,
>>> Stefan
>>
>
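
For readers following the thread, the difference between the two memory
policies discussed in the summary can be sketched with the Linux mbind
system call. This is a minimal illustration only, not the code from the
webrev: the helper names single_node_mask and place_on_node are
hypothetical, the mode values are copied from <linux/mempolicy.h>, and
the raw syscall is used here only to avoid depending on libnuma.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Mode values as defined in <linux/mempolicy.h>; renamed here so the
 * sketch does not clash with that header if it is also included. */
enum { SKETCH_MPOL_PREFERRED = 1, SKETCH_MPOL_BIND = 2 };

/* Hypothetical helper: build a one-word node mask with only `node` set.
 * Returns 0 (empty mask) if the node does not fit in one unsigned long. */
static unsigned long single_node_mask(int node) {
    if (node < 0 || (size_t)node >= 8 * sizeof(unsigned long)) {
        return 0UL;
    }
    return 1UL << node;
}

/* Hypothetical helper: set the memory policy for [addr, addr + len).
 *   strict != 0 -> MPOL_BIND:      pages must come from `node`.
 *   strict == 0 -> MPOL_PREFERRED: try `node` first, but allow the
 *                                  kernel to fall back to other nodes.
 * `addr` must be page aligned. Returns 0 on success, or -1 with errno
 * set (for example when the call is not permitted in a sandbox). */
static long place_on_node(void *addr, size_t len, int node, int strict) {
    unsigned long mask = single_node_mask(node);
    int mode = strict ? SKETCH_MPOL_BIND : SKETCH_MPOL_PREFERRED;
    /* Pass one more than the mask width as a precaution, since the
     * kernel may consider one bit fewer than maxnode. */
    unsigned long maxnode = 8 * sizeof(mask) + 1;
    return syscall(SYS_mbind, addr, (unsigned long)len, mode,
                   &mask, maxnode, 0U);
}
```

With strict set, a large-page mapping placed on a node that has no free
large pages cannot be backed when it is touched, which is the situation
described in the summary. With strict cleared, the node is only a hint,
so the kernel can satisfy the allocation from another node.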