Crash on super large heap size using CMS and its fix
John Cuthbertson
john.cuthbertson at oracle.com
Wed Sep 12 18:39:06 UTC 2012
Hi Bengt,
Thanks for volunteering. I was going to until I saw your message.
Hal: Great catch.
We should probably change the other casts/shifts in
heapRegionRemSet.cpp and concurrentMark.cpp at the same time:

./share/vm/gc_implementation/g1/heapRegionRemSet.cpp
    _max_fine_entries = (size_t)(1 << max_entries_log);
./share/vm/gc_implementation/g1/concurrentMark.cpp
    assert(((size_t)_bm.size() * (size_t)(1 << _shifter)) == _bmWordSize,

even though they shouldn't overflow.
JohnC
On 9/12/2012 12:50 AM, Bengt Rutisson wrote:
>
> Hi Hal Mo,
>
> Nice catch!
>
> I filed "7197906:BlockOffsetArray::power_to_cards_back() needs to
> handle > 32 bit shifts" to track this issue. I can help you shepherd
> this fix in.
>
> Thanks,
> Bengt
>
>
> On 2012-09-12 03:22, Jianhao Mo wrote:
>> Hi all,
>>
>> This is Hal Mo <kungu.mjh at taobao.com>
>> from Alibaba Group (with OCA).
>>
>> Our Hadoop NameNode crashed when we set the heap size to 135G using
>> CMS GC.
>> Please find the crash log (hs_err_pid.log) attached.
>>
>> I can reliably reproduce the crash on a test machine with 190G of
>> physical memory, using a simple command:
>> $ java -Xmx135g -XX:+UseConcMarkSweepGC
>>
>> I then built a debug JVM and used gdb to debug the problem.
>>
>> Call stack:
>>
>> C [libc.so.6+0x7a9b0] memset+0x40
>> V [libjvm.so+0x2b6c42]
>> BlockOffsetArray::set_remainder_to_point_to_start_incl(unsigned
>> long, unsigned long, bool)+0xce
>> V [libjvm.so+0x2b7043]
>> BlockOffsetArray::set_remainder_to_point_to_start(HeapWord*,
>> HeapWord*, bool)+0x71
>> V [libjvm.so+0x2b728d]
>> BlockOffsetArray::BlockOffsetArray(BlockOffsetSharedArray*,
>> MemRegion, bool)+0x9f
>> V [libjvm.so+0x3c089f]
>> BlockOffsetArrayNonContigSpace::BlockOffsetArrayNonContigSpace(BlockOffsetSharedArray*,
>> MemRegion)+0x37
>> V [libjvm.so+0x3be56f]
>> CompactibleFreeListSpace::CompactibleFreeListSpace(BlockOffsetSharedArray*,
>> MemRegion, bool, FreeBlockDictionary::DictionaryChoice)+0x9b
>> V [libjvm.so+0x3fd2e1]
>> ConcurrentMarkSweepGeneration::ConcurrentMarkSweepGeneration(ReservedSpace,
>> unsigned long, int, CardTableRS*, bool,
>> FreeBlockDictionary::DictionaryChoice)+0x1df
>> V [libjvm.so+0x4dc03e] GenerationSpec::init(ReservedSpace, int,
>> GenRemSet*)+0x37c
>> V [libjvm.so+0x4ced40] GenCollectedHeap::initialize()+0x510
>> V [libjvm.so+0x7c23c3] Universe::initialize_heap()+0x31d
>> V [libjvm.so+0x7c27ec] universe_init()+0xa6
>> V [libjvm.so+0x5056e2] init_globals()+0x34
>> V [libjvm.so+0x7ac926] Threads::create_vm(JavaVMInitArgs*, bool*)+0x23a
>> V [libjvm.so+0x53f3d4] JNI_CreateJavaVM+0x7a
>>
>> In function BlockOffsetArray::set_remainder_to_point_to_start_incl,
>> inside the for loop:
>> size_t reach = start_card - 1 + (power_to_cards_back(i+1) - 1);
>> when i = 7, the value of reach was 0. The loop then could not break, and
>> _array->set_offset_array(start_card_for_region, reach, offset,
>> reducing);
>> accessed the wrong address and crashed.
>>
>> The root cause was
>> static size_t power_to_cards_back(uint i) {
>> return (size_t)(1 << (LogBase * i));
>> }
>> The literal 1 is a 32-bit int, so 1 << 32 overflows before the cast to
>> size_t is applied.
>>
>>
>> Here is my fix (it has been tested), also available in the attached file
>> cms_large_heap_crash.patch
>>
>> +++ b/src/share/vm/memory/blockOffsetTable.hpp
>> @@ -289,7 +289,7 @@
>> };
>>
>> static size_t power_to_cards_back(uint i) {
>> - return (size_t)(1 << (LogBase * i));
>> + return (size_t)1 << (LogBase * i);
>> }
>> static size_t power_to_words_back(uint i) {
>> return power_to_cards_back(i) * N_words;
>>
>> Contributed-by: Hal Mo <kungu.mjh at taobao.com>
>>
>> A similar situation exists in G1, but there the size is mega (2^20) based.
>> Overflow would need a heap of 2^(32+20) bytes, which is too large to occur.
>>
>> Krystal reminded me that this changeset covers the same code:
>> http://hg.openjdk.java.net/hsx/hotspot-gc/hotspot/rev/c3a720eefe82 .
>> I did not build it with Visual Studio; could someone please help review
>> its compatibility with VS?
>>
>> Regards,
>>
>> Hal
>
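
The root cause and the one-line fix discussed in this thread can be sketched in isolation. This is a minimal illustration, not HotSpot code: it assumes LogBase is 4 (inferred from the report that the shift count reaches 32 when i + 1 is 8) and a 64-bit size_t. low_32_bits is a hypothetical helper that stands in for "the result does not fit in a 32-bit int", since actually shifting an int by 32 is undefined behavior in C++ and the sketch never does it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: LogBase is 4, inferred from the thread (shift of 32 at i + 1 == 8).
static const unsigned LogBase = 4;

// Fixed form from the patch: widen the literal to size_t first, then shift.
// Assumes a 64-bit build, where size_t is 64 bits wide.
static size_t power_to_cards_back(unsigned i) {
  static_assert(sizeof(size_t) == 8, "sketch assumes a 64-bit size_t");
  return (size_t)1 << (LogBase * i);
}

// Hypothetical helper: keeps only the low 32 bits of the 64-bit result.
// This models "what a 32-bit value could hold"; it is not a claim about
// what the undefined 32-bit shift actually produced in the crashing JVM.
static uint32_t low_32_bits(unsigned i) {
  return (uint32_t)power_to_cards_back(i);
}
```

With these definitions power_to_cards_back(8) is 4294967296, while its low 32 bits are 0. On the Visual Studio question, size_t is 64 bits wide on 64-bit Windows whereas long is only 32 bits there, so (size_t)1 should be a safer spelling than 1L.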
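
A rough back-of-the-envelope check suggests why the crash shows up only above roughly 128G. The sketch below assumes 512-byte cards and LogBase = 4 (neither is stated in the thread), treats the whole heap as a single region, and ignores the -1 adjustments in the reach computation. first_sufficient_power and max_shift_needed are hypothetical helpers introduced only for this estimate.

```cpp
#include <cassert>
#include <cstdint>

// Assumptions, not taken from the thread: LogBase is 4, one card is 512 bytes.
static const unsigned LogBase = 4;
static const uint64_t kCardBytes = 512;

// Hypothetical helper: smallest i for which 16^(i+1) cards cover a region
// of the given size, i.e. roughly the iteration at which the loop breaks.
static unsigned first_sufficient_power(uint64_t region_bytes) {
  const uint64_t cards = region_bytes / kCardBytes;
  unsigned i = 0;
  // 64-bit arithmetic throughout, so the shift itself cannot overflow here.
  while (((uint64_t)1 << (LogBase * (i + 1))) < cards) {
    ++i;
  }
  return i;
}

// Hypothetical helper: the largest shift count the loop would request.
static unsigned max_shift_needed(uint64_t region_bytes) {
  return LogBase * (first_sufficient_power(region_bytes) + 1);
}
```

Under these assumptions a 128G region needs a shift of at most 28, while a 135G region needs a shift of 32, which matches the failing iteration i = 7 reported in the thread. The real old generation is smaller than -Xmx, so the exact threshold depends on generation sizing.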