<p dir="ltr">Maybe instead turn err_msg into a macro (e.g. ERR_MSG) that expands to NULL if assertions are disabled?</p>
<p dir="ltr">Sent from my phone</p>
<div class="gmail_quote">On Sep 11, 2013 5:16 PM, "Stefan Karlsson" <<a href="mailto:stefan.karlsson@oracle.com">stefan.karlsson@oracle.com</a>> wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div>On 9/11/13 10:46 PM, Aleksey Shipilev
wrote:<br>
</div>
<blockquote type="cite">
<pre>(please disregard the previous message, if any, sent from un-subscribed
email address)
Hi there,
I did not believe my eyes at first. Please look at this and tell me I am
not imagining things because of too much green tea. I can submit the bug
after somebody else confirms.
There is some workload which runs significantly slower with G1 than with
Parallel GC. So, I profile this application, and what I see?
</pre>
<blockquote type="cite">
<pre> | | +- 131.100 (60%) ConcurrentG1RefineThread::run()
| | | +- 131.020 (60%) DirtyCardQueueSet::apply_closure_to_completed_buffer(int, int, bool)
| | | | +- 130.870 (60%) DirtyCardQueueSet::apply_closure_to_completed_buffer_helper(CardTableEntryClosure*, int, BufferNode*)
| | | | | +- 128.650 (59%) RefineCardTableEntryClosure::do_card_ptr(signed char*, int)
| | | | | | +- 127.290 (59%) G1RemSet::refine_card(signed char*, int, bool)
| | | | | | +- 61.260 (28%) G1HotCardCache::insert(signed char*)
| | | | | | | +- 59.650 (27%) G1CardCounts::add_card_count(signed char*)
| | | | | | | | +- 57.340 (26%) G1CardCounts::ptr_2_card_num(const signed char*)
| | | | | | | | +- 56.420 (26%) FormatBuffer<256>::FormatBuffer(const char*, ...)
| | | | | | | | +- 54.240 (25%) vsnprintf
| | | | | | | | | +- 44.350 (20%) vfprintf
| | | | | | | | | | +- 8.950 (4%) _IO_default_xsputn
| | | | | | | | | | | +- 1.660 (1%) <static>@0x8b5f4 (<<a href="http://libc-2.15.so" target="_blank">libc-2.15.so</a>>)
| | | | | | | | | | +- 5.160 (2%) _IO_padn
| | | | | | | | | | +- 4.470 (2%) <static>@0x4872c (<<a href="http://libc-2.15.so" target="_blank">libc-2.15.so</a>>)
| | | | | | | | | | +- 4.110 (2%) free
| | | | | | | | | | +- 1.780 (1%) strchrnul
| | | | | | | | | +- 3.450 (2%) <static>@0x7c3f7 (<<a href="http://libc-2.15.so" target="_blank">libc-2.15.so</a>>)
| | | | | | | | | +- 2.870 (1%) <static>@0x7d270 (<<a href="http://libc-2.15.so" target="_blank">libc-2.15.so</a>>)
| | | | | | | | | +- 1.200 (1%) strchrnul
| | | | | | | | +- 0.290 (0%) jio_vsnprintf
</pre>
</blockquote>
<pre>
Say what? FormatBuffer? In GC refinement thread (which supposedly drains
card marks)? In ptr_2_card_num() converter method?
Looking deeper:
typedef FormatBuffer<> err_msg;
...
void check_card_num(size_t card_num) {
assert(card_num >= 0 && card_num < _committed_max_card_num,
err_msg("card num out of range: "SIZE_FORMAT, card_num));
}
...
size_t ptr_2_card_num(const jbyte* card_ptr) {
assert(card_ptr >= _ct_bot,
err_msg("Inavalied card pointer: "
"card_ptr: " PTR_FORMAT ", "
"_ct_bot: " PTR_FORMAT,
card_ptr, _ct_bot));
size_t card_num = pointer_delta(card_ptr, _ct_bot, sizeof(jbyte));
check_card_num(card_num,
err_msg("card pointer out of range: " PTR_FORMAT,
card_ptr));
return card_num;
}
Well, d'uh! Even though the asserts are erased in the product build,
err_msg argument for check_card_num() call is still there, thrashing
out. So, I do this quick and safe change:
<a href="http://cr.openjdk.java.net/~shade/scratch/g1-cardmark-wtf/webrev/" target="_blank">http://cr.openjdk.java.net/~shade/scratch/g1-cardmark-wtf/webrev/</a>
Also, I churn down the workload into the essential one, trashing the heap:
<a href="http://cr.openjdk.java.net/~shade/scratch/g1-cardmark-wtf/test/src/main/java/org/sample/MyBenchmark.java" target="_blank">http://cr.openjdk.java.net/~shade/scratch/g1-cardmark-wtf/test/src/main/java/org/sample/MyBenchmark.java</a>
Running with latest hsx/hotspot-gc on my 1x2x2 i5 laptop, Linux x86_64,
with "{baseline,patched}/bin/java -XX:+UseG1GC -jar microbenchmarks.jar
-t 4 -f 5", which means four threads, five forked JVM runs, I got:
baseline: 125.3 +- 4.9 ns/op
patched: 85.8 +- 3.5 ns/op
...which means 1.47x improvement!</pre>
</blockquote>
<br>
Wow! Nice finding.<br>
<br>
Would it be possible to keep the assert message by turning
check_card_num() into a #define instead? I don't know if that
matters or not, though.<br>
<br>
Will you create a bug report?<br>
<br>
thanks,<br>
StefanK<br>
<br>
<blockquote type="cite">
<pre>
-Aleksey.
</pre>
</blockquote>
<br>
</div>
</blockquote></div>