<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<br>
<div class="moz-cite-prefix">On 04/28/2014 11:43 PM, Bengt Rutisson
wrote:<br>
</div>
<blockquote cite="mid:535F49F7.6060204@oracle.com" type="cite">
<div class="moz-cite-prefix"><br>
Hi Jon,<br>
<br>
On 4/28/14 11:17 PM, Jon Masamitsu wrote:<br>
</div>
<blockquote cite="mid:535EC568.7060207@oracle.com" type="cite">The
requirement that an evacuation failure not happen during this <br>
test is based on the expected behavior of the GC and is not a <br>
required behavior. In some instances the evacuation failure will
<br>
happen, but it is not a GC failure if it does; it is only an
<br>
unexpected path being followed. <br>
<br>
The test is not reliable but before removing it, I've made <br>
some changes to try and save it. I've modified the <br>
test to slow down the allocations and changed the allocation to
<br>
allocate smaller objects (which also has a side effect of
slowing <br>
allocations). The goal is to detect gross breakages of <br>
evacuation failure while risking only very, very rare spurious <br>
failures. <br>
<br>
I had reproduced the failure with the unmodified test and it <br>
would fail within 30 minutes. With the modifications, I haven't
<br>
seen the failure in a day of testing. <br>
<br>
If the modifications don't work, I'll remove the test. <br>
<br>
<a moz-do-not-send="true" class="moz-txt-link-freetext"
href="http://cr.openjdk.java.net/%7Ejmasa/8038928/webrev.00/">http://cr.openjdk.java.net/~jmasa/8038928/webrev.00/</a>
<br>
<br>
<a moz-do-not-send="true" class="moz-txt-link-freetext"
href="https://bugs.openjdk.java.net/browse/JDK-8038928">https://bugs.openjdk.java.net/browse/JDK-8038928</a>
<br>
</blockquote>
<br>
Slowing down the test does not seem like a stable solution, just
as you point out.<br>
<br>
What do you think about this instead?<br>
<br>
The original code does:<br>
<br>
// create 128MB of garbage. This should result in at least one GC<br>
for (int i = 0; i < 1024; i++) {<br>
garbage = new byte[128 * 1024];<br>
}<br>
<br>
We run with -Xmx10M but no -Xmn set. We should only ever promote one
object each GC, so I assume that what happens when we get an
evacuation failure is that we get so many GCs that the old space
fills up.<br>
<br>
How about specifying -Xmn and allocating only enough to fill the
young gen a few times? Instead of allocating 128MB we could maybe
run with -Xmn2M and allocate 8MB worth of objects. That should be
enough to get a few GCs but not enough to fill the old gen up. If
you want to be really safe you could also increase -Xmx to
something like 128M.<br>
</blockquote>
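<br>
The suggestion above (a fixed young gen and a bounded amount of garbage) could be sketched roughly as follows. This is only an illustration, not the actual test source: the class name BoundedGarbage, the createGarbage() helper and the constant names are assumptions; only -Xmn2M, the 8MB total and the optional -Xmx128M come from the suggestion itself.<br>
<pre>
```java
// Hypothetical stand-in for the test's allocation loop; not the actual test source.
final class BoundedGarbage {
    // Same 128KB chunk size as the original loop.
    static final int CHUNK_BYTES = 128 * 1024;
    // 8MB in total, as suggested above (enough to fill a 2MB young gen a few times).
    static final long TOTAL_BYTES = 8L * 1024 * 1024;

    // Stored in a static field so the allocation is not trivially optimized away.
    static byte[] garbage;

    // Allocates TOTAL_BYTES worth of short-lived arrays and returns the bytes requested.
    static long createGarbage() {
        long allocated = 0;
        while (allocated < TOTAL_BYTES) {
            garbage = new byte[CHUNK_BYTES];
            allocated += CHUNK_BYTES;
        }
        return allocated;
    }

    public static void main(String[] args) {
        System.out.println("Creating garbage");
        long bytes = createGarbage();
        System.out.println("Done, requested " + bytes + " bytes");
    }
}
```
</pre>
It would be run with something like: java -XX:+UseG1GC -Xmn2M -Xmx128M BoundedGarbage. Whether this still provokes enough GCs for the test's purpose would need to be checked against the GC log.<br>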
<br>
The new code for GCTest is<br>
<br>
public static void main(String [] args) {<br>
System.out.println("Creating garbage");<br>
// create 128MB of garbage. This should result in at least one
GC<br>
for (int i = 0; i < 1024; i++) {<br>
work(i);<br>
for (int k = 0; k < 1024; k++) {<br>
garbage = new byte[128];<br>
}<br>
}<br>
System.out.println("Done");<br>
}<br>
}<br>
<br>
This still does as many young GCs but promotes less, since each object
<br>
is only about 128 bytes. I think that puts less pressure on the old
gen,<br>
much as doing fewer GCs would, as you suggest. I ran this test (without
the<br>
work() method) for about half a day without any failures and then I<br>
got an evacuation failure. I looked at the heap and the old gen was<br>
full. I thought then that the allocation rate was just too high. I<br>
tried lowering the initiating occupancy but ran into another bug.<br>
I then added the work() method and it's been running (product and<br>
fastdebug) for a couple of days. <br>
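<br>
As a rough check on the sizes involved, the sketch below compares the two loops. It counts only the requested array payload (object headers are ignored), and the class and method names are made up for illustration. Both loops request the same 128MB in total; the modified one does so with 1024 times as many objects, each 1024 times smaller, so any single promoted object costs the old gen far less.<br>
<pre>
```java
// Illustrative arithmetic only; AllocationMath and totalBytes are hypothetical names.
final class AllocationMath {
    // Payload bytes requested when allocating `count` arrays of `size` bytes each.
    static long totalBytes(long count, long size) {
        return count * size;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;

        // Original loop: 1024 iterations of new byte[128 * 1024].
        long originalCount = 1024;
        long originalSize = 128 * 1024;

        // Modified loop: 1024 * 1024 iterations of new byte[128].
        long modifiedCount = 1024L * 1024;
        long modifiedSize = 128;

        System.out.println("original: " + originalCount + " objects, "
                + totalBytes(originalCount, originalSize) / mb + "MB total");
        System.out.println("modified: " + modifiedCount + " objects, "
                + totalBytes(modifiedCount, modifiedSize) / mb + "MB total");
    }
}
```
</pre>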
<br>
I could reshape the heap as you say and avoid the evacuation<br>
failure but I don't know how much value that has. It might be the<br>
same as removing the requirement for no evacuation failure.<br>
I settled on this because I thought I was close to an evacuation<br>
failure (if something in G1 changed, like slower mixed
collections<br>
or a concurrent marking cycle starting way too late, this might<br>
catch it) but not too close (so that it happened very rarely<br>
if it was just happenstance). <br>
<br>
My first thought was just to remove the requirement that no<br>
evacuation failure happen, but I vaguely felt the test might be worth
trying<br>
to save. <br>
<br>
Jon<br>
<br>
<blockquote cite="mid:535F49F7.6060204@oracle.com" type="cite"> <br>
Thanks,<br>
Bengt<br>
<br>
<blockquote cite="mid:535EC568.7060207@oracle.com" type="cite"> <br>
Thanks. <br>
<br>
Jon <br>
</blockquote>
<br>
</blockquote>
<br>
</body>
</html>