Crash on super large heap size using CMS and its fix

Wed Sep 12 07:50:02 UTC 2012

Hi Hal Mo,

Nice catch!

I filed "7197906:BlockOffsetArray::power_to_cards_back() needs to handle 
 > 32 bit shifts" to track this issue. I can help you shepherd this fix in.

Thanks,
Bengt

On 2012-09-12 03:22, Jianhao Mo wrote:
> Hi all,
>
> This is Hal Mo <kungu.mjh at taobao.com> from Alibaba Group (with OCA).
>
> Our Hadoop NameNode crashed when we set the heap size to 135G using
> CMS GC.
> Please find the crash log (hs_err_pid.log) attached.
>
> I can reliably reproduce the crash on a test machine with 190G
> physical memory, using a simple command:
> $ java -Xmx135g -XX:+UseConcMarkSweepGC
>
> I then built a debug JVM and used gdb to debug the problem.
>
> Call stack:
>
> C  [libc.so.6+0x7a9b0]  memset+0x40
> V  [libjvm.so+0x2b6c42] 
>  BlockOffsetArray::set_remainder_to_point_to_start_incl(unsigned long, 
> unsigned long, bool)+0xce
> V  [libjvm.so+0x2b7043] 
>  BlockOffsetArray::set_remainder_to_point_to_start(HeapWord*, 
> HeapWord*, bool)+0x71
> V  [libjvm.so+0x2b728d] 
>  BlockOffsetArray::BlockOffsetArray(BlockOffsetSharedArray*, 
> MemRegion, bool)+0x9f
> V  [libjvm.so+0x3c089f] 
>  BlockOffsetArrayNonContigSpace::BlockOffsetArrayNonContigSpace(BlockOffsetSharedArray*, 
> MemRegion)+0x37
> V  [libjvm.so+0x3be56f] 
>  CompactibleFreeListSpace::CompactibleFreeListSpace(BlockOffsetSharedArray*, 
> MemRegion, bool, FreeBlockDictionary::DictionaryChoice)+0x9b
> V  [libjvm.so+0x3fd2e1] 
>  ConcurrentMarkSweepGeneration::ConcurrentMarkSweepGeneration(ReservedSpace, 
> unsigned long, int, CardTableRS*, bool, 
> FreeBlockDictionary::DictionaryChoice)+0x1df
> V  [libjvm.so+0x4dc03e]  GenerationSpec::init(ReservedSpace, int, 
> GenRemSet*)+0x37c
> V  [libjvm.so+0x4ced40]  GenCollectedHeap::initialize()+0x510
> V  [libjvm.so+0x7c23c3]  Universe::initialize_heap()+0x31d
> V  [libjvm.so+0x7c27ec]  universe_init()+0xa6
> V  [libjvm.so+0x5056e2]  init_globals()+0x34
> V  [libjvm.so+0x7ac926]  Threads::create_vm(JavaVMInitArgs*, bool*)+0x23a
> V  [libjvm.so+0x53f3d4]  JNI_CreateJavaVM+0x7a
>
> In the function BlockOffsetArray::set_remainder_to_point_to_start_incl,
> inside the for loop:
>     size_t reach = start_card - 1 + (power_to_cards_back(i+1) - 1);
> when i = 7, the value of reach was 0. The loop then could not break, and
>     _array->set_offset_array(start_card_for_region, reach, offset, reducing);
> accessed the wrong address and crashed.
>
> The root cause was
> static size_t power_to_cards_back(uint i) {
>     return (size_t)(1 << (LogBase * i));
> }
> The literal 1 is a 32-bit int, so 1 << 32 overflows.
>
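A minimal standalone sketch of the root cause described above; it is not HotSpot code. It assumes LogBase is 4 (inferred from the report, where i + 1 == 8 produces a 32-bit shift) and a 64-bit size_t. The helper low_32_bits is hypothetical and only models what a 32-bit result can hold; in real C++ a shift of a 32-bit int by 32 is undefined behavior, so the buggy expression itself is deliberately not executed here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

static_assert(sizeof(std::size_t) == 8, "sketch assumes a 64-bit size_t");

// Assumed value, inferred from the report: LogBase * 8 == 32.
static const unsigned LogBase = 4;

// Patched form: the literal 1 is widened to size_t before the shift,
// so the shift is carried out in 64 bits.
static std::size_t power_to_cards_back(unsigned i) {
    return static_cast<std::size_t>(1) << (LogBase * i);
}

// Hypothetical helper: the part of a value that survives in 32 bits.
static std::uint32_t low_32_bits(std::size_t v) {
    return static_cast<std::uint32_t>(v);
}

// Checks the boundary between i == 7 (shift by 28) and i == 8 (shift by 32).
static void self_check() {
    assert(power_to_cards_back(7) == (static_cast<std::size_t>(1) << 28));
    assert(low_32_bits(power_to_cards_back(7)) != 0);  // still fits in 32 bits
    assert(power_to_cards_back(8) == 4294967296ULL);   // 2^32, needs 64 bits
    assert(low_32_bits(power_to_cards_back(8)) == 0);  // nothing left in 32 bits
}
```

Calling self_check() from a test driver exercises both sides of the boundary: the value for i == 7 fits in 32 bits, while the value for i == 8 is representable only once the operand has been widened.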
>
> Here is my fix (tested), also in the attached file
> cms_large_heap_crash.patch:
>
> +++ b/src/share/vm/memory/blockOffsetTable.hpp
> @@ -289,7 +289,7 @@
> };
>
> static size_t power_to_cards_back(uint i) {
> - return (size_t)(1 << (LogBase * i));
> + return (size_t)1 << (LogBase * i);
> }
> static size_t power_to_words_back(uint i) {
> return power_to_cards_back(i) * N_words;
>
> Contributed-by: Hal Mo <kungu.mjh at taobao.com>
>
> A similar pattern is also found in G1, but there the size is megabyte
> (2^20) based, and 2^(32+20) is too large to be reached, so it will not
> overflow.
>
> Krystal reminded me that this changeset covers the same code:
> http://hg.openjdk.java.net/hsx/hotspot-gc/hotspot/rev/c3a720eefe82
> I have not built it with Visual Studio; could someone please help review
> its compatibility with VS?
>
> Regards,
>
> Hal
