<p dir="ltr">Maybe instead turn err_msg into a macro (e.g. ERR_MSG) that expands to NULL if assertions are disabled?</p>
<p dir="ltr">Sent from my phone</p>
<div class="gmail_quote">On Sep 11, 2013 5:16 PM, "Stefan Karlsson" <<a href="mailto:stefan.karlsson@oracle.com">stefan.karlsson@oracle.com</a>> wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

  
    
  
  <div bgcolor="#FFFFFF" text="#000000">
    <div>On 9/11/13 10:46 PM, Aleksey Shipilev
      wrote:<br>
    </div>
    <blockquote type="cite">
      <pre>(please disregard the previous message, if any, sent from un-subscribed
email address)

Hi there,

I did not believe my eyes at first. Please look at this and tell me I am
not imagining things because of too much green tea. I can submit the bug
after somebody else confirms.

There is some workload which runs significantly slower with G1 than with
Parallel GC. So, I profile this application, and what I see?

</pre>
      <blockquote type="cite">
        <pre>   |    |  +-  131.100   (60%)    ConcurrentG1RefineThread::run()
   |    |  |  +-  131.020   (60%)    DirtyCardQueueSet::apply_closure_to_completed_buffer(int, int, bool)
   |    |  |  |  +-  130.870   (60%)    DirtyCardQueueSet::apply_closure_to_completed_buffer_helper(CardTableEntryClosure*, int, BufferNode*)
   |    |  |  |  |  +-  128.650   (59%)    RefineCardTableEntryClosure::do_card_ptr(signed char*, int)
   |    |  |  |  |  |  +-  127.290   (59%)    G1RemSet::refine_card(signed char*, int, bool)
   |    |  |  |  |  |    +-  61.260   (28%)    G1HotCardCache::insert(signed char*)
   |    |  |  |  |  |    |  +-  59.650   (27%)    G1CardCounts::add_card_count(signed char*)
   |    |  |  |  |  |    |  |  +-  57.340   (26%)    G1CardCounts::ptr_2_card_num(const signed char*)
   |    |  |  |  |  |    |  |    +-  56.420   (26%)    FormatBuffer<256>::FormatBuffer(const char*, ...)
   |    |  |  |  |  |    |  |      +-  54.240   (25%)    vsnprintf
   |    |  |  |  |  |    |  |      |  +-  44.350   (20%)    vfprintf
   |    |  |  |  |  |    |  |      |  |  +-  8.950   (4%)    _IO_default_xsputn
   |    |  |  |  |  |    |  |      |  |  |  +-  1.660   (1%)    <static>@0x8b5f4 (<<a href="http://libc-2.15.so" target="_blank">libc-2.15.so</a>>)
   |    |  |  |  |  |    |  |      |  |  +-  5.160   (2%)    _IO_padn
   |    |  |  |  |  |    |  |      |  |  +-  4.470   (2%)    <static>@0x4872c (<<a href="http://libc-2.15.so" target="_blank">libc-2.15.so</a>>)
   |    |  |  |  |  |    |  |      |  |  +-  4.110   (2%)    free
   |    |  |  |  |  |    |  |      |  |  +-  1.780   (1%)    strchrnul
   |    |  |  |  |  |    |  |      |  +-  3.450   (2%)    <static>@0x7c3f7 (<<a href="http://libc-2.15.so" target="_blank">libc-2.15.so</a>>)
   |    |  |  |  |  |    |  |      |  +-  2.870   (1%)    <static>@0x7d270 (<<a href="http://libc-2.15.so" target="_blank">libc-2.15.so</a>>)
   |    |  |  |  |  |    |  |      |  +-  1.200   (1%)    strchrnul
   |    |  |  |  |  |    |  |      +-  0.290   (0%)    jio_vsnprintf
</pre>
      </blockquote>
      <pre>

Say what? FormatBuffer? In GC refinement thread (which supposedly drains
card marks)? In ptr_2_card_num() converter method?

Looking deeper:

  typedef FormatBuffer<> err_msg;
...
  void check_card_num(size_t card_num) {
    assert(card_num >= 0 && card_num < _committed_max_card_num,
            err_msg("card num out of range: "SIZE_FORMAT, card_num));
  }
...
  size_t ptr_2_card_num(const jbyte* card_ptr) {
    assert(card_ptr >= _ct_bot,
           err_msg("Inavalied card pointer: "
                   "card_ptr: " PTR_FORMAT ", "
                    "_ct_bot: " PTR_FORMAT,
                    card_ptr, _ct_bot));
     size_t card_num = pointer_delta(card_ptr, _ct_bot, sizeof(jbyte));
     check_card_num(card_num,
                   err_msg("card pointer out of range: " PTR_FORMAT,
card_ptr));
     return card_num;
   }

Well, d'uh! Even though the asserts are erased in the product build,
err_msg argument for check_card_num() call is still there, thrashing
out. So, I do this quick and safe change:
  <a href="http://cr.openjdk.java.net/~shade/scratch/g1-cardmark-wtf/webrev/" target="_blank">http://cr.openjdk.java.net/~shade/scratch/g1-cardmark-wtf/webrev/</a>

Also, I churn down the workload into the essential one, trashing the heap:
<a href="http://cr.openjdk.java.net/~shade/scratch/g1-cardmark-wtf/test/src/main/java/org/sample/MyBenchmark.java" target="_blank">http://cr.openjdk.java.net/~shade/scratch/g1-cardmark-wtf/test/src/main/java/org/sample/MyBenchmark.java</a>

Running with latest hsx/hotspot-gc on my 1x2x2 i5 laptop, Linux x86_64,
with "{baseline,patched}/bin/java -XX:+UseG1GC -jar microbenchmarks.jar
-t 4 -f 5", which means four threads, five forked JVM runs, I got:

  baseline:  125.3 +- 4.9 ns/op
   patched:   85.8 +- 3.5 ns/op

...which means 1.47x improvement!</pre>
    </blockquote>
    <br>
    Wow! Nice finding.<br>
    <br>
    Would it be possible to keep the assert message by turning
    check_card_num() into a #define instead? I don't know if that
    matters or not, though.<br>
    <br>
    Will you create a bug report?<br>
    <br>
    thanks,<br>
    StefanK<br>
    <br>
    
    <blockquote type="cite">
      <pre>

-Aleksey.
</pre>
    </blockquote>
    <br>
  </div>

</blockquote></div>