<html>
  <head>
    <meta content="text/html; charset=ISO-8859-1"
      http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <div class="moz-cite-prefix">On 9/11/13 10:46 PM, Aleksey Shipilev
      wrote:<br>
    </div>
    <blockquote cite="mid:5230D695.1060603@oracle.com" type="cite">
      <pre wrap="">(please disregard the previous message, if any, sent from un-subscribed
email address)

Hi there,

I did not believe my eyes at first. Please look at this and tell me I am
not imagining things because of too much green tea. I can submit the bug
after somebody else confirms.

There is some workload which runs significantly slower with G1 than with
Parallel GC. So, I profile this application, and what I see?

</pre>
      <blockquote type="cite">
        <pre wrap="">   |    |  +-  131.100   (60%)    ConcurrentG1RefineThread::run()
   |    |  |  +-  131.020   (60%)    DirtyCardQueueSet::apply_closure_to_completed_buffer(int, int, bool)
   |    |  |  |  +-  130.870   (60%)    DirtyCardQueueSet::apply_closure_to_completed_buffer_helper(CardTableEntryClosure*, int, BufferNode*)
   |    |  |  |  |  +-  128.650   (59%)    RefineCardTableEntryClosure::do_card_ptr(signed char*, int)
   |    |  |  |  |  |  +-  127.290   (59%)    G1RemSet::refine_card(signed char*, int, bool)
   |    |  |  |  |  |    +-  61.260   (28%)    G1HotCardCache::insert(signed char*)
   |    |  |  |  |  |    |  +-  59.650   (27%)    G1CardCounts::add_card_count(signed char*)
   |    |  |  |  |  |    |  |  +-  57.340   (26%)    G1CardCounts::ptr_2_card_num(const signed char*)
   |    |  |  |  |  |    |  |    +-  56.420   (26%)    FormatBuffer<256>::FormatBuffer(const char*, ...)
   |    |  |  |  |  |    |  |      +-  54.240   (25%)    vsnprintf
   |    |  |  |  |  |    |  |      |  +-  44.350   (20%)    vfprintf
   |    |  |  |  |  |    |  |      |  |  +-  8.950   (4%)    _IO_default_xsputn
   |    |  |  |  |  |    |  |      |  |  |  +-  1.660   (1%)    <static>@0x8b5f4 (<libc-2.15.so>)
   |    |  |  |  |  |    |  |      |  |  +-  5.160   (2%)    _IO_padn
   |    |  |  |  |  |    |  |      |  |  +-  4.470   (2%)    <static>@0x4872c (<libc-2.15.so>)
   |    |  |  |  |  |    |  |      |  |  +-  4.110   (2%)    free
   |    |  |  |  |  |    |  |      |  |  +-  1.780   (1%)    strchrnul
   |    |  |  |  |  |    |  |      |  +-  3.450   (2%)    <static>@0x7c3f7 (<libc-2.15.so>)
   |    |  |  |  |  |    |  |      |  +-  2.870   (1%)    <static>@0x7d270 (<libc-2.15.so>)
   |    |  |  |  |  |    |  |      |  +-  1.200   (1%)    strchrnul
   |    |  |  |  |  |    |  |      +-  0.290   (0%)    jio_vsnprintf
</pre>
      </blockquote>
      <pre wrap="">

Say what? FormatBuffer? In GC refinement thread (which supposedly drains
card marks)? In ptr_2_card_num() converter method?

Looking deeper:

  typedef FormatBuffer<> err_msg;
...
  void check_card_num(size_t card_num) {
    assert(card_num >= 0 && card_num < _committed_max_card_num,
            err_msg("card num out of range: "SIZE_FORMAT, card_num));
  }
...
  size_t ptr_2_card_num(const jbyte* card_ptr) {
    assert(card_ptr >= _ct_bot,
           err_msg("Inavalied card pointer: "
                   "card_ptr: " PTR_FORMAT ", "
                    "_ct_bot: " PTR_FORMAT,
                    card_ptr, _ct_bot));
     size_t card_num = pointer_delta(card_ptr, _ct_bot, sizeof(jbyte));
     check_card_num(card_num,
                   err_msg("card pointer out of range: " PTR_FORMAT,
card_ptr));
     return card_num;
   }

Well, d'uh! Even though the asserts are erased in the product build,
err_msg argument for check_card_num() call is still there, thrashing
out. So, I do this quick and safe change:
  <a class="moz-txt-link-freetext" href="http://cr.openjdk.java.net/~shade/scratch/g1-cardmark-wtf/webrev/">http://cr.openjdk.java.net/~shade/scratch/g1-cardmark-wtf/webrev/</a>

Also, I churn down the workload into the essential one, trashing the heap:
<a class="moz-txt-link-freetext" href="http://cr.openjdk.java.net/~shade/scratch/g1-cardmark-wtf/test/src/main/java/org/sample/MyBenchmark.java">http://cr.openjdk.java.net/~shade/scratch/g1-cardmark-wtf/test/src/main/java/org/sample/MyBenchmark.java</a>

Running with latest hsx/hotspot-gc on my 1x2x2 i5 laptop, Linux x86_64,
with "{baseline,patched}/bin/java -XX:+UseG1GC -jar microbenchmarks.jar
-t 4 -f 5", which means four threads, five forked JVM runs, I got:

  baseline:  125.3 +- 4.9 ns/op
   patched:   85.8 +- 3.5 ns/op

...which means 1.47x improvement!</pre>
    </blockquote>
    <br>
    Wow! Nice finding.<br>
    <br>
    Would it be possible to keep the assert message by turning
    check_card_num() into a #define instead? I don't know if that
    matters or not, though.<br>
    <br>
    Will you create a bug report?<br>
    <br>
    thanks,<br>
    StefanK<br>
    <br>
    <meta http-equiv="content-type" content="text/html;
      charset=ISO-8859-1">
    <blockquote cite="mid:5230D695.1060603@oracle.com" type="cite">
      <pre wrap="">

-Aleksey.
</pre>
    </blockquote>
    <br>
  </body>
</html>