<html>
  <head>
    <meta content="text/html; charset=ISO-8859-1"
      http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <div class="moz-cite-prefix"><br>
      <br>
      Hi John,<br>
      <br>
      On 2012-09-12 20:39, John Cuthbertson wrote:<br>
    </div>
    <blockquote cite="mid:5050D6CA.6020200@oracle.com" type="cite">
      Hi Bengt,<br>
      <br>
      Thanks for volunteering. I was going to until I saw your message.<br>
    </blockquote>
    <br>
    I'm 9 hours ahead of you :-)<br>
    <br>
    <blockquote cite="mid:5050D6CA.6020200@oracle.com" type="cite"> <br>
      Hal: Great catch.<br>
      <br>
      We should probably change the other casts/shifts in
      heapRegionRemSet.cpp and concurrentMark.cpp at the same time:<br>
      <br>
      ./share/vm/gc_implementation/g1/heapRegionRemSet.cpp<br>
          _max_fine_entries = (size_t)(1 << max_entries_log);<br>
      <br>
      ./share/vm/gc_implementation/g1/concurrentMark.cpp<br>
        assert(((size_t)_bm.size() * (size_t)(1 << _shifter)) ==
      _bmWordSize,<br>
      <br>
      even though they shouldn't overflow.<br>
    </blockquote>
    <br>
    Good point. Hal sent out a review request on the mailing list. Maybe
    we should try to keep this conversation in that thread.<br>
    <br>
    Hal, if you can prepare a patch that addresses John's comments, I can
    help you prepare an updated webrev that we can send out. If
    John thinks it looks good, we will be all set to push the change.<br>
    <br>
    Thanks,<br>
    Bengt<br>
    <br>
    <blockquote cite="mid:5050D6CA.6020200@oracle.com" type="cite"> <br>
      JohnC<br>
      <br>
      On 9/12/2012 12:50 AM, Bengt Rutisson wrote:
      <blockquote cite="mid:50503EAA.30206@oracle.com" type="cite">
        <div class="moz-cite-prefix"><br>
          Hi Hal Mo,<br>
          <br>
          Nice catch!<br>
          <br>
          I filed "7197906:BlockOffsetArray::power_to_cards_back() needs
          to handle > 32 bit shifts" to track this issue. I can help
          you shepherd this fix in.<br>
          <br>
          Thanks,<br>
          Bengt<br>
          <br>
          <br>
          On 2012-09-12 03:22, Jianhao Mo wrote:<br>
        </div>
        <blockquote
cite="mid:CAKz_je4nJUcCFtV+24DtR3s7Xu4eiJ0=fyaHLhUX1-=OQmWV8Q@mail.gmail.com"
          type="cite">Hi all,<br>
          <br>
          This is Hal Mo &lt;<a moz-do-not-send="true"
            href="mailto:kungu.mjh@taobao.com">kungu.mjh@taobao.com</a>&gt;
          from Alibaba Group (with OCA).<br>
          <br>
          Our Hadoop NameNode crashed when we set the heap size to 135G
          using the CMS GC.<br>
          Please find the crash log (hs_err_pid.log) attached.<br>
          <br>
          I can reliably reproduce the crash on a test machine with 190G
          of physical memory, using a simple command:<br>
          $ java -Xmx135g -XX:+UseConcMarkSweepGC<br>
          <br>
          I then built a debug JVM and used gdb to debug the problem.<br>
          <br>
          Call stack:<br>
          <br>
          C  [libc.so.6+0x7a9b0]  memset+0x40<br>
          V  [libjvm.so+0x2b6c42]
           BlockOffsetArray::set_remainder_to_point_to_start_incl(unsigned
          long, unsigned long, bool)+0xce<br>
          V  [libjvm.so+0x2b7043]
           BlockOffsetArray::set_remainder_to_point_to_start(HeapWord*,
          HeapWord*, bool)+0x71<br>
          V  [libjvm.so+0x2b728d]
           BlockOffsetArray::BlockOffsetArray(BlockOffsetSharedArray*,
          MemRegion, bool)+0x9f<br>
          V  [libjvm.so+0x3c089f]
           BlockOffsetArrayNonContigSpace::BlockOffsetArrayNonContigSpace(BlockOffsetSharedArray*,
          MemRegion)+0x37<br>
          V  [libjvm.so+0x3be56f]
           CompactibleFreeListSpace::CompactibleFreeListSpace(BlockOffsetSharedArray*,
          MemRegion, bool, FreeBlockDictionary::DictionaryChoice)+0x9b<br>
          V  [libjvm.so+0x3fd2e1]
           ConcurrentMarkSweepGeneration::ConcurrentMarkSweepGeneration(ReservedSpace,
          unsigned long, int, CardTableRS*, bool,
          FreeBlockDictionary::DictionaryChoice)+0x1df<br>
          V  [libjvm.so+0x4dc03e]  GenerationSpec::init(ReservedSpace,
          int, GenRemSet*)+0x37c<br>
          V  [libjvm.so+0x4ced40]  GenCollectedHeap::initialize()+0x510<br>
          V  [libjvm.so+0x7c23c3]  Universe::initialize_heap()+0x31d<br>
          V  [libjvm.so+0x7c27ec]  universe_init()+0xa6<br>
          V  [libjvm.so+0x5056e2]  init_globals()+0x34<br>
          V  [libjvm.so+0x7ac926]  Threads::create_vm(JavaVMInitArgs*,
          bool*)+0x23a<br>
          V  [libjvm.so+0x53f3d4]  JNI_CreateJavaVM+0x7a<br>
          <br>
          In the function
          BlockOffsetArray::set_remainder_to_point_to_start_incl, inside
          the for loop:<br>
              size_t reach = start_card - 1 + (power_to_cards_back(i+1)
          - 1);<br>
          when i = 7, the value of reach was 0, so the loop did not
          break, and <br>
              _array->set_offset_array(start_card_for_region, reach,
          offset, reducing);<br>
          accessed an invalid address and crashed.<br>
          <br>
          The root cause is: <br>
          static size_t power_to_cards_back(uint i) {<br>
              return (size_t)(1 << (LogBase * i));<br>
          }<br>
          The literal 1 is a 32-bit int, so 1 << 32 overflows before the
          cast to size_t is applied. <br>
          <br>
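The type issue described above can be sketched in isolation. This is only an illustration, not the HotSpot source: LogBase = 4 is an assumption inferred from the report (i + 1 = 8 giving a shift of 32), the sketch assumes a 64-bit size_t, and check_power_to_cards_back is a hypothetical helper. The unpatched expression is not executed, because shifting a 32-bit int by 32 is undefined behavior in C++; the static assertions only show which type each form of the shift is evaluated in.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Assumption: LogBase is 4, inferred from the report (i + 1 == 8 gives a shift of 32).
constexpr unsigned LogBase = 4;

// The sketch assumes a platform where size_t is 64 bits wide.
static_assert(sizeof(std::size_t) == 8, "sketch assumes a 64-bit size_t");

// `1 << n` is evaluated as int; a cast applied afterwards cannot restore lost bits.
static_assert(std::is_same<decltype(1 << 4), int>::value,
              "shifting an int literal yields an int");

// Widening the left operand first makes the shift itself happen in size_t.
static_assert(std::is_same<decltype(static_cast<std::size_t>(1) << 4),
                           std::size_t>::value,
              "shifting a size_t yields a size_t");

// Same shape as the patched form: cast first, then shift.
constexpr std::size_t power_to_cards_back(unsigned i) {
    return static_cast<std::size_t>(1) << (LogBase * i);
}

// Hypothetical helper (not part of HotSpot) exercising the case from the report.
inline void check_power_to_cards_back() {
    assert(power_to_cards_back(7) == (static_cast<std::size_t>(1) << 28));
    assert(power_to_cards_back(8) == 4294967296ULL);  // 2^32, computed in 64 bits
}
```

Calling check_power_to_cards_back() from a test or from main should pass on a 64-bit build, since the shift by 32 is now carried out on a 64-bit operand.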
          <br>
          Here is my fix (it has been tested); it is also in the attached
          file cms_large_heap_crash.patch:<br>
          <br>
          +++ b/src/share/vm/memory/blockOffsetTable.hpp<br>
          @@ -289,7 +289,7 @@<br>
          };<br>
          <br>
          static size_t power_to_cards_back(uint i) {<br>
          - return (size_t)(1 << (LogBase * i));<br>
          + return (size_t)1 << (LogBase * i);<br>
          }<br>
          static size_t power_to_words_back(uint i) {<br>
          return power_to_cards_back(i) * N_words;<br>
          <br>
          Contributed-by: Hal Mo &lt;<a moz-do-not-send="true"
            href="mailto:kungu.mjh@taobao.com">kungu.mjh@taobao.com</a>&gt;
          <br>
          <br>
          A similar situation exists in G1, but there the size is mega
          (2^20) based; an overflow would need 2^(32+20), which is too
          large to occur in practice.<br>
          <br>
          Krystal reminded me that this changeset covers the same code: <a
            moz-do-not-send="true"
href="http://hg.openjdk.java.net/hsx/hotspot-gc/hotspot/rev/c3a720eefe82">http://hg.openjdk.java.net/hsx/hotspot-gc/hotspot/rev/c3a720eefe82</a>
           .<br>
          I have not built it with Visual Studio; could someone please
          help review its compatibility with VS?<br>
          <br>
          Regards,<br>
          <br>
          Hal<br>
        </blockquote>
        <br>
      </blockquote>
    </blockquote>
    <br>
  </body>
</html>