<div dir="ltr">We have a faster implementation for card table scan, and on our implementation ~20s pause goes down to 1~2s with above-mentioned experiments. It doesn't solve the problem but it alleviates it (and actually shows that card table scan is faster).<div>If someone can sponsor this, I will create a patch for the review.<br><div><br></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, May 13, 2015 at 6:36 PM, Srinivas Ramakrishna <span dir="ltr"><<a href="mailto:ysr1729@gmail.com" target="_blank">ysr1729@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hi Jon --<div><br></div><div>thanks, just a comment inline below....<br><div class="gmail_extra"><br><div class="gmail_quote"><span class="">On Wed, May 13, 2015 at 4:06 PM, Jon Masamitsu <span dir="ltr"><<a href="mailto:jon.masamitsu@oracle.com" target="_blank">jon.masamitsu@oracle.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"><span>
<br>
<br>
<div>On 5/13/2015 11:40 AM, Srinivas
Ramakrishna wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Hi Jon --
<div><br>
</div>
<div>My expectation is that for any large block spanning
multiple cards, if you walk the BOT entries for that block
from lower addresses to higher addresses, the BOT entries
should be monotone increasing (albeit not strictly monotone)
until the last card for that block (but one). In the example
you gave below, is D > C > N_words, with the BOT entries
in some interior window of the same block/object going (low to
high addresses in the window) like " .... DDDDCCCCDDDD...." ?
In other words, there are lower-valued entries C within a
range of higher-valued entries D, with both C and D denoting
logarithmic entries for the block in question?</div>
</div>
</blockquote>
<br></span>
Yes, that's what I'm seeing.<span><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div><br>
</div>
<div>If you look in the BOT methods, there was a parameter
"reducing" passed to some of the mark_block() methods which
was supposed to do some kind of assertion checking towards
maintenance of this monotonicity, but looking at it now, that
may not be sufficient and the assert could be strengthened at
the "end" of the range being marked. More worryingly, I
believe I had turned off even that assert before I left (ca
6uXX?) because it was failing at that time, but the
corresponding bug hadn't been fixed, and seems as though it
still isn't. This might possibly have some bearing on the
non-monotonicity in the entries you see.</div>
</div>
</blockquote>
<br></span>
Thanks for that bit of information. So what I'm seeing is not<br>
necessarily the cause for the long pauses. But see my next comment.<span><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div><br>
</div>
<div>I'll hopefully get some time to go over this code again. I
was looking at the code on the train ride into work this
morning, and trying to remember the details, but have not
quite delved again into all of its details yet or why the
assert I mention above should be failing.</div>
</div>
</blockquote>
<br></span>
After the logarithmic stride back to find the block containing the<br>
start of the object, there is a object by object walk forward to
find the<br>
to find the start of the object<br>
<br>
<blockquote type="cite"> 558 while (n <= addr) {<br>
559 debug_only(HeapWord* last = q); // for debugging<br>
560 q = n;<br>
561 n += _sp->block_size(n);<br>
</blockquote>
[...]<br>
<br>
From instrumentation I think the long pauses originate in this
forward walk<br>
(mostly in the block_size() method). So maybe the BOT
"DDDDDCCCCC...CDDDD"<br>
indicates that BOT at the start of an objects is being trashed and
we're<br>
going way too far back. I'm trying to use VerifyBlockOffsetArray
but it seems to<br>
have bit rotted. I'll keep trying.</div></blockquote><div><br></div></span><div>Ah, that's what I was afraid of. This must likely mean that when a large block is split, the suffix from the split</div><div>(which is typically returned to the free list) isn't getting fixed up as was the intention, or when the sweeper</div><div>coalesces smaller blocks into a larger one, the new larger block isn't getting fixed correctly (although that seems</div><div>unlikely). Using VerifyBlockOffsetArray and asserting if the forward walk is more than a card's worth in length</div><div>would seem to be one way to catch that badness closer to when it happens...</div><div><br></div><div>Thanks for the update, Jon!</div><div>-- ramki</div><div><div class="h5"><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000"><span><font color="#888888"><br>
<br>
Jon</font></span><div><div><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div><br>
</div>
<div>It might be worthwhile (if it is possible) to do a bisect
on whatever the first known JDK (7uXX?) where the performance
regression appears and try to narrow down the change that may
have caused the regression, using James' test program. I have
a sneaking suspicion (because of the elided assert) that it
might well be a change I had made towards closing some
BOT-walking/card-scanning cracks when initializing CMS objects
ca 6uXX.</div>
<div><br>
</div>
<div>James, in case you have read thus far, can you provide the
exact version number of the 6uXX Oracle jdk you used that did
_not_ show the performance regression for your test program?</div>
<div><br>
</div>
<div>Thanks!</div>
<div>-- ramki</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Tue, May 12, 2015 at 1:50 PM, Jon
Masamitsu <span dir="ltr"><<a href="mailto:jon.masamitsu@oracle.com" target="_blank">jon.masamitsu@oracle.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"> Ramki,<br>
<br>
The logarithmic BOT is taking big steps back to<br>
find the start of a large object. But something is odd.<br>
<br>
Is there a reason why the BOT entries would decrease for a
while<br>
and then increase without a value that gives the start of
an object?<br>
<br>
&_array->_offset_array[index] = 0xffffffff6550f600
"DDDDDDDDDDDDDD ..."<br>
<br>
and then down the road.<br>
<br>
print &_array->_offset_array[0x110600U],
_array->_offset_array[0x10f600U ><br>
&_array->_offset_array[0x110600U] =
0xffffffff65510600
"CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC<br>
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDDDDDDDDD<br>
DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD<br>
DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD<br>
DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD<br>
DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD<br>
DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD<br>
DDDDDDDDDDDDDDDDDDDDDDDDDD<br>
<br>
'D' is 0x44<br>
<br>
N_words is 0x40<span><font color="#888888"><br>
<br>
Jon</font></span>
<div>
<div><br>
<br>
<div>On 05/12/2015 08:46 AM, Srinivas Ramakrishna
wrote:<br>
</div>
<blockquote type="cite">
<div>Thanks for that background Jon; I'll try and
look at the 8uxx code and run James's example when
I get a few spare cycles (which won't be soon,
unfortunately).</div>
<div><br>
</div>
<div>-- ramki<br>
<br>
ysr1729</div>
<div><br>
On May 8, 2015, at 14:25, Jon Masamitsu <<a href="mailto:jon.masamitsu@oracle.com" target="_blank">jon.masamitsu@oracle.com</a>>
wrote:<br>
<br>
</div>
<blockquote type="cite">
<div> Ramki,<br>
<br>
I compared the memory/blockOffsetTable.* files
between 6u35 and 7 and there <br>
were essentially no differences. The .cpp file
had some changes in assertions.<br>
The other two were the same except for comments.<br>
<br>
Between 6 and 6u35 (didn't pin point it), we
changed from the rather<br>
weak N_powers=3 :-) to N_powers=14.<br>
<br>
There seem to be two problematic cases.<br>
<br>
1) Spike in ParNew times that lasts for 2 (or a
few) GC's<br>
2) Continually growing ParNew times<br>
<br>
From just reading the CR<br>
<br>
<a href="https://bugs.openjdk.java.net/browse/JDK-8079274" target="_blank">https://bugs.openjdk.java.net/browse/JDK-8079274</a><br>
<br>
the test case seems to reproduce 1). The test
case<br>
has a large object that lives till the
application exits.<br>
If it were a broken logarithmic BOT, I would not
expect<br>
the ParNew pauses to shrink again. <br>
<br>
Jon<br>
<br>
<br>
<br>
<div>On 5/8/2015 11:49 AM, Srinivas Ramakrishna
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">This is indeed interesting...
<div><br>
</div>
<div>Jesper, did you figure out why (as
James seems to note) the problem is seen
in JDK 8 but not in JDK 6 (he's using
ParNew in both cases). Also, isn't the BOT
logarithmic, which will allow threads to
"quickly" traverse back to the start of
even a large object? Did something change
in the BOT that might have led to what
James sees as a regression, or did the
range that he characterizes as a "blind
spot" merely move from one place to
another?</div>
<div><br>
</div>
<div>Unfortunately, I have not been keeping
up with the changes in this area and am
not working with JDK 6 code much, so can't
offer any clues, but am surprised that
this would be a regression. (I would
expect the same performance pathologies in
this code to exist in 6 and forward -- has
it been established conclusively that this
is indeed a regression?)</div>
<div><br>
</div>
<div>thanks!</div>
<div>-- ramki</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Fri, May 8, 2015
at 6:40 AM, James Lent <span dir="ltr"><<a href="mailto:jlent@tivo.com" target="_blank">jlent@tivo.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
The explanation of the problem added
by Jesper Wilhelmsson on <span> <span title="2015-05-06 08:59">2015-05-06
08:59 </span></span>to
JDK-8079274 is interesting, but, not
very satisfying. <br>
<br>
1) The problem affects more than this
one specific breakpoint. It impacts
objects in at least the range of about
480M to 512M. The comment "A larger
object probably triggers some higher
level code that are able to skip
scanning of this part of the memory"
does not inspire confidence. It makes
even more concerned that we do not
understand this issue and that there
could be other triggers. It seems
like "the higher level code" has a 32M
blind spot. Further I can create
scenarios where the GC code takes
progressively longer to clear the
issue. Presumably it completes the
lower block scan during each GC
cycle. Why does it take longer and
longer each time (my guess is it has
something to do with the number of
objects moved to survivor space). If
I cache 1 in 100 instead of 1 in 10
then there are 11 progressively slower
ParNewGC (after several fast one) and
the issue takes minutes to clear: <br>
<br>
$ /usr/lib/jvm/java-7-oracle/bin/java
-verbose:gc -XX:+PrintGCTimeStamps
-XX:+PrintGCDetails -XX:+UseParNewGC
-XX:+UseConcMarkSweepGC -Xmx4G -Xms4G
LongParNewPause 536870384 100 100 <br>
0.308: [GC0.308: [ParNew:
272640K->3029K(306688K), 0.0113180
secs] 796927K->527316K(4160256K),
0.0114040 secs] [Times: user=0.05
sys=0.00, real=0.02 secs] <br>
0.367: [GC0.367: [ParNew:
275669K->7766K(306688K), 0.0140840
secs] 799956K->532053K(4160256K),
0.0141630 secs] [Times: user=0.04
sys=0.01, real=0.02 secs] <br>
0.430: [GC0.430: [ParNew:
280406K->10057K(306688K), 0.0120810
secs] 804693K->534344K(4160256K),
0.0121690 secs] [Times: user=0.05
sys=0.00, real=0.02 secs] <br>
0.490: [GC0.490: [ParNew:
282697K->15383K(306688K), 0.0128540
secs] 806984K->539670K(4160256K),
0.0129390 secs] [Times: user=0.05
sys=0.00, real=0.02 secs] <br>
0.551: [GC0.551: [ParNew:
288023K->21006K(306688K), 0.0146480
secs] 812310K->545294K(4160256K),
0.0147650 secs] [Times: user=0.05
sys=0.01, real=0.01 secs] <br>
0.613: [GC0.613: [ParNew:
293646K->26805K(306688K), 0.0141250
secs] 817934K->551092K(4160256K),
0.0142050 secs] [Times: user=0.05
sys=0.00, real=0.01 secs] <br>
0.675: [GC0.675: [ParNew:
299445K->24258K(306688K), 0.4816840
secs] 823732K->551436K(4160256K),
0.4817620 secs] [Times: user=1.90
sys=0.00, real=0.48 secs] <br>
1.205: [GC1.205: [ParNew:
296898K->17914K(306688K), 3.4216550
secs] 824076K->547788K(4160256K),
3.4217180 secs] [Times: user=13.50
sys=0.00, real=3.42 secs] <br>
4.674: [GC4.674: [ParNew:
290554K->17433K(306688K), 6.3406820
secs] 820428K->550031K(4160256K),
6.3407410 secs] [Times: user=25.00
sys=0.02, real=6.34 secs] <br>
11.062: [GC11.062: [ParNew:
290073K->17315K(306688K),
10.9493130 secs]
822671K->552636K(4160256K),
10.9493890 secs] [Times: user=42.98
sys=0.03, real=10.95 secs] <br>
22.059: [GC22.059: [ParNew:
289955K->17758K(306688K),
16.0539650 secs]
825276K->555798K(4160256K),
16.0540250 secs] [Times: user=62.44
sys=0.09, real=16.05 secs] <br>
38.161: [GC38.161: [ParNew:
290398K->17745K(306688K),
20.6900820 secs]
828438K->558512K(4160256K),
20.6901420 secs] [Times: user=80.79
sys=0.08, real=20.69 secs] <br>
58.898: [GC58.898: [ParNew:
290385K->21826K(306688K),
24.6781700 secs]
831152K->565310K(4160256K),
24.6782710 secs] [Times: user=95.58
sys=0.05, real=24.68 secs] <br>
83.625: [GC83.625: [ParNew:
294466K->21834K(306688K),
30.3501800 secs]
837950K->568044K(4160256K),
30.3502610 secs] [Times: user=115.80
sys=0.11, real=30.35 secs] <br>
114.022: [GC114.022: [ParNew:
294474K->21836K(306688K),
33.6648800 secs]
840684K->570772K(4160256K),
33.6649390 secs] [Times: user=127.38
sys=0.14, real=33.67 secs] <br>
147.736: [GC147.736: [ParNew:
294476K->18454K(306688K),
38.9828430 secs]
843412K->570117K(4160256K),
38.9829040 secs] [Times: user=149.14
sys=0.15, real=38.99 secs] <br>
186.766: [GC186.766: [ParNew:
291094K->26602K(306688K),
39.8726730 secs]
842757K->580977K(4160256K),
39.8727330 secs] [Times: user=156.35
sys=0.11, real=39.88 secs] <br>
226.686: [GC226.686: [ParNew:
299242K->26612K(306688K), 0.0174470
secs] 853617K->583714K(4160256K),
0.0175300 secs] [Times: user=0.06
sys=0.00, real=0.02 secs] <br>
<br>
2) The suggestion to use only 1 thread
is not acceptable to me. Use parallel
GC with one thread? Further it did
not work for me (in fact it made
things worse). "user" time does now
equal "real" time (and the new size is
1/4 as big) so I think the parameter
took effect: <br>
<br>
$ /usr/lib/jvm/java-8-oracle/bin/java
-verbose:gc -XX:+PrintGCTimeStamps
-XX:+PrintGCDetails -XX:+UseParNewGC
-XX:+UseConcMarkSweepGC -Xmx4G -Xms4G
-XX:ParallelGCThreads=1
LongParNewPause 536870384 100 10 <br>
0.226: [GC (Allocation Failure) 0.226:
[ParNew: 68160K->6980K(76672K),
0.0489756 secs]
592447K->531267K(4185792K),
0.0490718 secs] [Times: user=0.05
sys=0.00, real=0.05 secs] <br>
0.291: [GC (Allocation Failure) 0.291:
[ParNew: 75140K->8512K(76672K),
29.8707281 secs]
599427K->539479K(4185792K),
29.8707954 secs] [Times: user=29.80
sys=0.00, real=29.88 secs] <br>
30.180: [GC (Allocation Failure)
30.180: [ParNew:
76672K->8093K(76672K), 30.6357699
secs] 607639K->545772K(4185792K),
30.6358257 secs] [Times: user=30.57
sys=0.00, real=30.63 secs] <br>
60.837: [GC (Allocation Failure)
60.837: [ParNew:
76253K->6830K(76672K), 69.2115255
secs] 613932K->552169K(4185792K),
69.2115783 secs] [Times: user=69.05
sys=0.01, real=69.22 secs] <br>
130.068: [GC (Allocation Failure)
130.068: [ParNew:
74990K->8226K(76672K), 132.5118903
secs] 620329K->560381K(4185792K),
132.5119459 secs] [Times: user=132.21
sys=0.00, real=132.52 secs] <br>
262.603: [GC (Allocation Failure)
262.603: [ParNew:
76386K->8512K(76672K), 0.0400138
secs] 628541K->568748K(4185792K),
0.0400687 secs] [Times: user=0.04
sys=0.00, real=0.04 secs] <br>
<br>
3) Applications should not be expected
to tune the GC settings to account for
the size of one static object. The
connection between the size of this
object and the GC performance was not
easy to even find. Further the
parameter ParGCCardsPerStrideChunk is
pretty low level and not one I think
many application developers are
familiar with.<br>
<br>
Error: VM option
'ParGCCardsPerStrideChunk' is
diagnostic and must be enabled via
-XX:+UnlockDiagnosticVMOptions.<span><br>
Error: Could not create the Java
Virtual Machine.<br>
Error: A fatal exception has
occurred. Program will exit.<br>
<br>
</span> 4) I can not reproduce this
issue in Oracle 6. This appears to be
a regression. How can fixing a
regression be called an enhancement
request? <br>
<span> <br>
<br>
<br>
<br>
<br>
<br>
<br>
<hr> <font color="Gray" face="Arial" size="1"><br>
This email and any attachments may
contain confidential and
privileged material for the sole
use of the intended recipient. Any
review, copying, or distribution
of this email (or any attachments)
by others is prohibited. If you
are not the intended recipient,
please contact the sender
immediately and permanently delete
this email and any attachments. No
employee or agent of TiVo Inc. is
authorized to conclude any binding
agreement on behalf of TiVo Inc.
by email. Binding agreements with
TiVo Inc. may only be made by a
signed written agreement.<br>
</font> </span></div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</div>
</blockquote>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</div></div></div>
</blockquote></div></div></div><br></div></div></div>
</blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature"><div dir="ltr"><div><span style="font-size:12.8000001907349px">Jungwoo Ha | Java Platform Team | <a href="mailto:jwha@google.com" target="_blank">jwha@google.com</a></span><br></div><div><br></div></div></div>
</div>