Request for review (S): 8005972: ParNew should not update the tenuring threshold when promotion failed has occurred

Mon Jan 14 13:13:09 UTC 2013

Hi Peter,

Thanks for looking at this!

On 1/11/13 11:18 PM, Peter B. Kessler wrote:
> I don't see -XX:+AlwaysPreTouch in your command line.  (Mostly because 
> I'm not sure I see the command line: for example, I don't see your 1GB 
> heap setting.)
>
> When you are watching GC performance before the first full collection, 
> you have to remember those faults for the OS to populate the old 
> generation.  If you do that during promotions, you do it one page at a 
> time.  -XX:+AlwaysPreTouch touches all of the committed old generation 
> during start-up, which turns out to be much faster.  (And doesn't bill 
> the time to GC. :-)
>
> Maybe those forced System.gc() calls at the beginning touch all of the 
> old generation?  Try a runs with and without -XX:+AlwaysPreTouch and 
> report the results, because I'm curious.

AlwaysPreTouch should not make a difference since SPECjbb does two 
System.gc()s in the begin. But I did do two runs with 
-XX:+AlwaysPreTouch just in case. The resulting log files are attached. 
As you can see the results are the same.

defnew:  56964
parnew:  59224

Here is the command line for the runs:

java -server -XX:+AlwaysPreTouch -XX:+PrintGCDetails 
-XX:+PrintGCTimeStamps -XX:-PrintGCCause -cp ./jbb.jar:./check.jar 
-Xms1g -Xmx1g spec.jbb.JBBmain -propfile SPECjbb.props

I add "-XX:+UseSerialGC" for the defnew runs and "-XX:+UseParNewGC 
-XX:ParallelGCThreads=1" for the parnew runs.

>
>
> I also see that the young generation spaces are different between the 
> runs.  Looking at the heap shape printed at th end of the GC logs
>
>     def new generation   total 314560K, used 232441K 
> [0x00000000c0000000, 0x00000000d5550000, 0x00000000d5550000)
>      eden space 279616K,  76% used [0x00000000c0000000, 
> 0x00000000cd0b4d38, 0x00000000d1110000)
>      from space 34944K,  53% used [0x00000000d1110000, 
> 0x00000000d2359a18, 0x00000000d3330000)
>      to   space 34944K,   0% used [0x00000000d3330000, 
> 0x00000000d3330000, 0x00000000d5550000)
>
>     par new generation   total 314560K, used 74831K 
> [0x00000000c0000000, 0x00000000d5550000, 0x00000000d5550000)
>      eden space 279616K,  18% used [0x00000000c0000000, 
> 0x00000000c33154f8, 0x00000000d1110000)
>      from space 34944K,  64% used [0x00000000d1110000, 
> 0x00000000d270e760, 0x00000000d3330000)
>      to   space 34944K,   0% used [0x00000000d3330000, 
> 0x00000000d3330000, 0x00000000d5550000)
>
>     PSYoungGen      total 329728K, used 93207K [0x00000000eaab0000, 
> 0x0000000100000000, 0x0000000100000000)
>      eden space 309952K, 23% used 
> [0x00000000eaab0000,0x00000000ef30dd08,0x00000000fd960000)
>      from space 19776K, 96% used 
> [0x00000000fd960000,0x00000000fec08000,0x00000000fecb0000)
>      to   space 19712K, 0% used 
> [0x00000000fecc0000,0x00000000fecc0000,0x0000000100000000)
>
> The PSYoung eden is 10% larger than the others (because the survivors 
> are smaller?).

Right. This is because the PS runs have UseAdaptiveSizePolicy enabled. I 
mostly included the PS runs for comparison. My main interest in in the 
difference between DefNew and ParNew. They have the same size eden and 
survivors.
> The sizes and occupancy of the survivors is different between DefNew 
> and ParNew if you look in the logs.
Yes, this is what triggered me to go and look at the code for how we 
update the tenuring threshold. It looks like we use the same ergonomics 
except for the difference that this review request is addressing. But 
this difference is not the cause of the performance difference since I 
don't get any promotion failures.

Thanks again for looking at this!
Bengt

>
>             ... peter
>
> Bengt Rutisson wrote:
>>
>>
>> Hi Ramki,
>>
>> On 1/11/13 6:50 PM, Srinivas Ramakrishna wrote:
>>> Hi Bengt --
>>>
>>> Try computing the GC overhead by normalizing wrt the work done (for 
>>> which the net allocation volume might be a good proxy). As you 
>>> state, the performance numbers will then likely make sense. Of 
>>> course, they still won't explain why ParNew does better.  As Vitaly 
>>> conjectures, the difference is likely in better object co-location 
>>> with ParNew's slightly more DFS-like evacuation compared with 
>>> DefNew's considerably more BFS-like evacuation because of the 
>>> latter's use of a pure Cheney scan compared with the use of (a) 
>>> marking stack(s) in the former, as far as i can remember the code. 
>>> One way to tell if that accounts for the difference is to measure 
>>> the cache-miss rates in the two cases (and may be use a good tool 
>>> like Solaris perf analyzer to show you where the misses are coming 
>>> from as well).
>>
>> Thanks for bringing the DFS/BFS difference up. This is exactly the 
>> kind of difference I was looking for. My guess is that this is what 
>> causes the difference in JBB score. I'll see if I can investigate 
>> this further.
>>> Also curious if you can share the two sets of GC logs, by chance? 
>>> (specJBB is a for-fee benchmark so is not freely available to the 
>>> individual developer.)
>>
>> I have a fairly large set of logs, but the runs are very stable so 
>> I'm just attaching logs for one run for each collector. For 
>> comparison I have also been running ParallelScavenge with one thread. 
>> I'm using separate gc logs and jbb logs. The log files called 
>> ".result" are the jbb output. The other logs are the gc logs.
>>
>> I'm running with a heap size of 1GB to avoid full GCs. All runs have 
>> the two System.gc() induced full GCs but no other. ParallelScavenge 
>> is performing even better than ParNew, but I am mostly interested in 
>> the difference between ParNew and DefNew.
>>
>> A quick summary of the data in the logs:
>>
>>         Score  #GCs  AverageGCTime
>> DefNew: 57903  2083  0.044053195391262644
>> ParNew: 61363  2213  0.05931835969272489
>> PS:     69697  2213  0.06117092860370538
>>
>> ParNew has a better score even though it does more GCs and they take 
>> longer.
>>
>> If you have any insights from looking at the logs I would be very 
>> happy to hear about it.
>>
>> Thanks,
>> Bengt
>>>
>>> thanks.
>>> -- ramki
>>>
>>> On Fri, Jan 11, 2013 at 4:57 AM, Bengt Rutisson 
>>> <bengt.rutisson at oracle.com <mailto:bengt.rutisson at oracle.com>> wrote:
>>>
>>>
>>>     Hi Vitaly,
>>>
>>>
>>>     On 1/11/13 1:45 PM, Vitaly Davidovich wrote:
>>>>
>>>>     Hi Bengt,
>>>>
>>>>     Regarding the benchmark score, are you saying ParNew has longer
>>>>     cumulative GC time or just the average is higher? If it's just
>>>>     average, maybe the total # of them (and cumulative time) is
>>>>     less.  I don't know the characteristics of this particular
>>>>     specjbb benchmark, but perhaps having fewer total GCs is better
>>>>     because of the overhead of getting all threads to a safe point,
>>>>     going go the OS to suspend them, and then restarting them.  After
>>>>     they're restarted, the CPU cache may be cold for it because the
>>>>     GC thread polluted it.  Or I'm entirely wrong in my speculation
>>>>     ... :).
>>>>
>>>
>>>     You have a good point about the number of GCs. The problem in my
>>>     runs is that ParNew does more GCs than DefNew. So there are both
>>>     more of them and their average time is higher, but the score is
>>>     still better. That ParNew does more GCs is not that strange. It
>>>     has a higher score, which means that it had higher throughput and
>>>     had time to create more objects. So, that is kind of expected. But
>>>     I don't understand how it can have higher throughput when the GCs
>>>     take longer. My current guess is that it does something
>>>     differently with how objects are copied in a way that is
>>>     beneficial for the execution time between GCs.
>>>
>>>     It also seems like ParNew keeps more objects alive for each GC.
>>>     That is either the reason why it does more and more frequent GCs
>>>     than DefNew, or it is an effect of the fact that more objects are
>>>     created due to the higher throughput. This is the reason I started
>>>     looking at the tenuring threshold.
>>>
>>>     Bengt
>>>
>>>
>>>>     Thanks
>>>>
>>>>     Sent from my phone
>>>>
>>>>     On Jan 11, 2013 6:02 AM, "Bengt Rutisson"
>>>>     <bengt.rutisson at oracle.com <mailto:bengt.rutisson at oracle.com>> 
>>>> wrote:
>>>>
>>>>
>>>>         Hi Ramki,
>>>>
>>>>         Thanks for looking at this!
>>>>
>>>>         On 1/10/13 9:28 PM, Srinivas Ramakrishna wrote:
>>>>>         Hi Bengt --
>>>>>
>>>>>         The change looks reasonable, but I have a comment and a
>>>>>         follow-up question.
>>>>>
>>>>>         Not your change, but I'd elide the "half the real survivor
>>>>>         size" since it's really a configurable parameter based on
>>>>>         TargetSurvivorRatio with default half.
>>>>>         I'd leave the comment as "set the new tenuring threshold and
>>>>>         desired survivor size".
>>>>
>>>>         I'm fine with removing this from the comment, but I thought
>>>>         the "half the real survivor size" aimed at the fact that we
>>>>         pass only the "to" capacity and not the "from" capacity in to
>>>>         compute_tenuring_threshold(). With that interpretation I
>>>>         think the comment is correct.
>>>>
>>>>         Would you like me to remove it anyway? Either way is fine
>>>>         with me.
>>>>
>>>>>         I'm curious though, as to what performance data prompted
>>>>>         this change,
>>>>         Good point. This change was preceded by an internal
>>>>         discussion in the GC team, so I should probably have
>>>>         explained the background more in my review request to the 
>>>> open.
>>>>
>>>>         I was comparing the ParNew and DefNew implementation since I
>>>>         am seeing some strange differences in some SPECjbb2005
>>>>         results. I am running ParNew with a single thread and get
>>>>         much better score than with DefNew. But I also get higher
>>>>         average GC times. So, I was trying to figure out what DefNew
>>>>         and ParNew does differently.
>>>>
>>>>         When I was looking at DefNewGeneration::collect() and
>>>>         ParNewGeneration::collect() I saw that they contain a whole
>>>>         lot of code duplication. It would be tempting to try to
>>>>         extract the common code out into DefNewGeneration since it is
>>>>         the super class. But there are some minor differences. One of
>>>>         them was this issue with how they handle the tenuring 
>>>> threshold.
>>>>
>>>>         We tried to figure out if there is a reason for ParNew and
>>>>         DefNew to behave different in this regard. We could not come
>>>>         up with any good reason for that. So, we needed to figure out
>>>>         if we should change ParNew or DefNew to make them consistent.
>>>>         The decision to change ParNew was based on two things. First,
>>>>         it seems wrong to use the data from a collection that got
>>>>         promotion failure. This collection will not have allowed the
>>>>         tenuring threshold to fulfill its purpose. Second,
>>>>         ParallelScavenge works the same way as DefNew.
>>>>
>>>>         BTW, the difference between DefNew and ParNew seems to have
>>>>         been there from the start. So, there is no bug or changeset
>>>>         in mercurial or TeamWare to explain why the difference was
>>>>         introduced.
>>>>
>>>>         (Just to be clear, this difference was not the cause of my
>>>>         performance issue. I still don't have a good explanation for
>>>>         how ParNew can have longer GC times but better SPECjbb score.)
>>>>
>>>>>         and whether it might make sense, upon a promotion failure to
>>>>>         do something about the tenuring threshold for the next
>>>>>         scavenge (i.e. for example make the tenuring threshold half
>>>>>         of its current value as a reaction to the fact that
>>>>>         promotion failed). Is it currently left at its previous
>>>>>         value or is it asjusted back to the default max value (which
>>>>>         latter may be the wrong thing to do) or something else?
>>>>
>>>>         As far as I can tell the tenuring threshold is left untouched
>>>>         if we get a promotion failure. It is probably a good idea to
>>>>         update it in some way. But I would prefer to handle that as a
>>>>         separate bug fix.
>>>>
>>>>         This change is mostly a small cleanup to make
>>>>         DefNewGeneration::collect() and ParNewGeneration::collect()
>>>>         be more consistent. We've done the thinking so, it's good to
>>>>         make the change in preparation for the next person that comes
>>>>         a long and has a few cycles over and would like to merge the
>>>>         two collect() methods in some way.
>>>>
>>>>         Thanks again for looking at this!
>>>>         Bengt
>>>>
>>>>>
>>>>>         -- ramki
>>>>>
>>>>>         On Thu, Jan 10, 2013 at 1:30 AM, Bengt Rutisson
>>>>>         <bengt.rutisson at oracle.com
>>>>>         <mailto:bengt.rutisson at oracle.com>> wrote:
>>>>>
>>>>>
>>>>>             Hi everyone,
>>>>>
>>>>>             Could I have a couple of reviews for this small change
>>>>>             to make DefNew and ParNew be more consistent in the way
>>>>>             they treat the tenuring threshold:
>>>>>
>>>>> http://cr.openjdk.java.net/~brutisso/8005972/webrev.00/
>>>>> <http://cr.openjdk.java.net/%7Ebrutisso/8005972/webrev.00/>
>>>>>
>>>>>             Thanks,
>>>>>             Bengt
>>>>>
>>>>>
>>>>
>>>
>>>
>>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: alwayspretouch.zip
Type: application/zip
Size: 130581 bytes
Desc: not available
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20130114/e7df33ca/alwayspretouch.zip>