Crash on super large heap size using CMS and its fix

Bengt Rutisson bengt.rutisson at
Thu Sep 13 13:00:06 UTC 2012

Hi John,

On 2012-09-12 20:39, John Cuthbertson wrote:
> Hi Bengt,
> Thanks for volunteering. I was until I saw your message.

I'm 9 hours ahead of you :-)

> Hal: Great catch.
> We should probably change the other casts/shifts  in 
> heapRegionRemSet.cpp and concurrentMark.cpp at the same time
> ./share/vm/gc_implementation/g1/heapRegionRemSet.cpp
>     _max_fine_entries = (size_t)(1 << max_entries_log);
> ./share/vm/gc_implementation/g1/concurrentMark.cpp
>   assert(((size_t)_bm.size() * (size_t)(1 << _shifter)) == _bmWordSize,
> even though they shouldn't overflow.

Good point. Hal sent out a review request on the mailing list. Maybe we 
should try to keep this conversation in that thread.

Hal, if you can prepare a patch that addresses John's comments I can 
help you with preparing an updated webrev that we can send out. If John 
thinks that looks good we will be all set to push the change.


> JohnC
> On 9/12/2012 12:50 AM, Bengt Rutisson wrote:
>> Hi Hal Mo,
>> Nice catch!
>> I filed "7197906:BlockOffsetArray::power_to_cards_back() needs to 
>> handle > 32 bit shifts" to track this issue. I can help you shepherd 
>> this fix in.
>> Thanks,
>> Bengt
>> On 2012-09-12 03:22, Jianhao Mo wrote:
>>> Hi all,
>>> This is Hal Mo <kungu.mjh at> from Alibaba Group (with OCA).
>>> Our Hadoop NameNode crashed when we set the heap size to 135G using 
>>> CMS GC.
>>> Attached please find the crash log (hs_err_pid.log).
>>> I can reliably reproduce the crash on a test machine with 190G of 
>>> physical memory, with a simple command:
>>> $ java -Xmx135g -XX:+UseConcMarkSweepGC
>>> I then built a debug JVM and used gdb to debug the problem.
>>> Call stack:
>>> C  []  memset+0x40
>>> V  [] 
>>>  BlockOffsetArray::set_remainder_to_point_to_start_incl(unsigned 
>>> long, unsigned long, bool)+0xce
>>> V  [] 
>>>  BlockOffsetArray::set_remainder_to_point_to_start(HeapWord*, 
>>> HeapWord*, bool)+0x71
>>> V  [] 
>>>  BlockOffsetArray::BlockOffsetArray(BlockOffsetSharedArray*, 
>>> MemRegion, bool)+0x9f
>>> V  [] 
>>>  BlockOffsetArrayNonContigSpace::BlockOffsetArrayNonContigSpace(BlockOffsetSharedArray*, 
>>> MemRegion)+0x37
>>> V  [] 
>>>  CompactibleFreeListSpace::CompactibleFreeListSpace(BlockOffsetSharedArray*, 
>>> MemRegion, bool, FreeBlockDictionary::DictionaryChoice)+0x9b
>>> V  [] 
>>>  ConcurrentMarkSweepGeneration::ConcurrentMarkSweepGeneration(ReservedSpace, 
>>> unsigned long, int, CardTableRS*, bool, 
>>> FreeBlockDictionary::DictionaryChoice)+0x1df
>>> V  []  GenerationSpec::init(ReservedSpace, int, 
>>> GenRemSet*)+0x37c
>>> V  []  GenCollectedHeap::initialize()+0x510
>>> V  []  Universe::initialize_heap()+0x31d
>>> V  []  universe_init()+0xa6
>>> V  []  init_globals()+0x34
>>> V  []  Threads::create_vm(JavaVMInitArgs*, 
>>> bool*)+0x23a
>>> V  []  JNI_CreateJavaVM+0x7a
>>> In the function BlockOffsetArray::set_remainder_to_point_to_start_incl, 
>>> inside the for loop:
>>>     size_t reach = start_card - 1 + (power_to_cards_back(i+1) - 1);
>>> when i = 7, the value of reach was 0. The loop then could not break, 
>>> and
>>>     _array->set_offset_array(start_card_for_region, reach, offset, 
>>> reducing);
>>> accessed the wrong address and crashed.
>>> The root cause was:
>>> static size_t power_to_cards_back(uint i) {
>>>     return (size_t)(1 << (LogBase * i));
>>> }
>>> The literal 1 is a 32-bit int, so 1 << 32 overflows before the cast 
>>> to size_t is applied.
>>> Here is my fix (it has been tested), also found in the attached file 
>>> cms_large_heap_crash.patch
>>> +++ b/src/share/vm/memory/blockOffsetTable.hpp
>>> @@ -289,7 +289,7 @@
>>> };
>>> static size_t power_to_cards_back(uint i) {
>>> - return (size_t)(1 << (LogBase * i));
>>> + return (size_t)1 << (LogBase * i);
>>> }
>>> static size_t power_to_words_back(uint i) {
>>> return power_to_cards_back(i) * N_words;
>>> Contributed-by: Hal Mo <kungu.mjh at>
>>> A similar situation exists in G1, but the size there is mega (2^20) 
>>> based, and 2^(32+20) is too large a size for the overflow to occur 
>>> in practice.
>>> Krystal reminded me that this changeset covers the same code: 
>>>  .
>>> I did not build it with Visual Studio; could someone please help 
>>> review the compatibility with VS?
>>> Regards,
>>> Hal
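
The difference between the two forms of power_to_cards_back() in Hal's
patch can be sketched in isolation. This is only an illustration, not
HotSpot code. It assumes LogBase = 4 (so that i = 8 gives a shift of
32) and a 64-bit size_t. The helper power_to_cards_back_32() is
hypothetical: it models what a 32-bit wide result can hold by
truncating the true value, because actually evaluating 1 << 32 on an
int is undefined behavior in C++ and cannot be relied on to produce
any particular number.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption for this sketch: LogBase = 4, so i = 8 shifts by 32 bits.
static const unsigned LogBase = 4;

// The sketch assumes an LP64-style platform where size_t is 64 bits wide.
static_assert(sizeof(size_t) == 8, "sketch assumes a 64-bit size_t");

// Fixed form from the patch: widen the literal to size_t first,
// then shift, so the shift is carried out in 64-bit arithmetic.
static size_t power_to_cards_back(unsigned i) {
  return (size_t)1 << (LogBase * i);
}

// Hypothetical helper: the value that survives if the result is only
// 32 bits wide. It computes the true power in 64 bits and truncates,
// which avoids executing an undefined 32-bit shift.
static uint32_t power_to_cards_back_32(unsigned i) {
  return (uint32_t)((uint64_t)1 << (LogBase * i));
}
```

Under these assumptions, power_to_cards_back(8) yields 4294967296
(2^32), while a 32-bit wide result cannot represent that value at all:
power_to_cards_back_32(8) is 0. For i = 7 both forms agree on
268435456 (2^28), which is why the problem only shows up once the
heap is large enough to need the higher powers.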

More information about the hotspot-gc-dev mailing list