Request for review (S): 8005972: ParNew should not update the tenuring threshold when promotion failed has occurred
Peter B. Kessler
Peter.B.Kessler at Oracle.COM
Tue Jan 15 17:43:27 UTC 2013
Good that the full GC's do the -XX:+AlwaysPreTouch thing. Still, something to remember for GC performance measurements in general.
... peter
Bengt Rutisson wrote:
>
> Hi Peter,
>
> Thanks for looking at this!
>
> On 1/11/13 11:18 PM, Peter B. Kessler wrote:
>> I don't see -XX:+AlwaysPreTouch in your command line. (Mostly because
>> I'm not sure I see the command line: for example, I don't see your 1GB
>> heap setting.)
>>
>> When you are watching GC performance before the first full collection,
>> you have to remember those faults for the OS to populate the old
>> generation. If you do that during promotions, you do it one page at a
>> time. -XX:+AlwaysPreTouch touches all of the committed old generation
>> during start-up, which turns out to be much faster. (And doesn't bill
>> the time to GC. :-)
>>
>> Maybe those forced System.gc() calls at the beginning touch all of the
>> old generation? Try a runs with and without -XX:+AlwaysPreTouch and
>> report the results, because I'm curious.
>
> AlwaysPreTouch should not make a difference since SPECjbb does two
> System.gc()s in the begin. But I did do two runs with
> -XX:+AlwaysPreTouch just in case. The resulting log files are attached.
> As you can see the results are the same.
>
> defnew: 56964
> parnew: 59224
>
> Here is the command line for the runs:
>
> java -server -XX:+AlwaysPreTouch -XX:+PrintGCDetails
> -XX:+PrintGCTimeStamps -XX:-PrintGCCause -cp ./jbb.jar:./check.jar
> -Xms1g -Xmx1g spec.jbb.JBBmain -propfile SPECjbb.props
>
> I add "-XX:+UseSerialGC" for the defnew runs and "-XX:+UseParNewGC
> -XX:ParallelGCThreads=1" for the parnew runs.
>
>>
>>
>> I also see that the young generation spaces are different between the
>> runs. Looking at the heap shape printed at th end of the GC logs
>>
>> def new generation total 314560K, used 232441K
>> [0x00000000c0000000, 0x00000000d5550000, 0x00000000d5550000)
>> eden space 279616K, 76% used [0x00000000c0000000,
>> 0x00000000cd0b4d38, 0x00000000d1110000)
>> from space 34944K, 53% used [0x00000000d1110000,
>> 0x00000000d2359a18, 0x00000000d3330000)
>> to space 34944K, 0% used [0x00000000d3330000,
>> 0x00000000d3330000, 0x00000000d5550000)
>>
>> par new generation total 314560K, used 74831K
>> [0x00000000c0000000, 0x00000000d5550000, 0x00000000d5550000)
>> eden space 279616K, 18% used [0x00000000c0000000,
>> 0x00000000c33154f8, 0x00000000d1110000)
>> from space 34944K, 64% used [0x00000000d1110000,
>> 0x00000000d270e760, 0x00000000d3330000)
>> to space 34944K, 0% used [0x00000000d3330000,
>> 0x00000000d3330000, 0x00000000d5550000)
>>
>> PSYoungGen total 329728K, used 93207K [0x00000000eaab0000,
>> 0x0000000100000000, 0x0000000100000000)
>> eden space 309952K, 23% used
>> [0x00000000eaab0000,0x00000000ef30dd08,0x00000000fd960000)
>> from space 19776K, 96% used
>> [0x00000000fd960000,0x00000000fec08000,0x00000000fecb0000)
>> to space 19712K, 0% used
>> [0x00000000fecc0000,0x00000000fecc0000,0x0000000100000000)
>>
>> The PSYoung eden is 10% larger than the others (because the survivors
>> are smaller?).
>
> Right. This is because the PS runs have UseAdaptiveSizePolicy enabled. I
> mostly included the PS runs for comparison. My main interest in in the
> difference between DefNew and ParNew. They have the same size eden and
> survivors.
>> The sizes and occupancy of the survivors is different between DefNew
>> and ParNew if you look in the logs.
> Yes, this is what triggered me to go and look at the code for how we
> update the tenuring threshold. It looks like we use the same ergonomics
> except for the difference that this review request is addressing. But
> this difference is not the cause of the performance difference since I
> don't get any promotion failures.
>
> Thanks again for looking at this!
> Bengt
>
>>
>> ... peter
>>
>> Bengt Rutisson wrote:
>>>
>>>
>>> Hi Ramki,
>>>
>>> On 1/11/13 6:50 PM, Srinivas Ramakrishna wrote:
>>>> Hi Bengt --
>>>>
>>>> Try computing the GC overhead by normalizing wrt the work done (for
>>>> which the net allocation volume might be a good proxy). As you
>>>> state, the performance numbers will then likely make sense. Of
>>>> course, they still won't explain why ParNew does better. As Vitaly
>>>> conjectures, the difference is likely in better object co-location
>>>> with ParNew's slightly more DFS-like evacuation compared with
>>>> DefNew's considerably more BFS-like evacuation because of the
>>>> latter's use of a pure Cheney scan compared with the use of (a)
>>>> marking stack(s) in the former, as far as i can remember the code.
>>>> One way to tell if that accounts for the difference is to measure
>>>> the cache-miss rates in the two cases (and may be use a good tool
>>>> like Solaris perf analyzer to show you where the misses are coming
>>>> from as well).
>>>
>>> Thanks for bringing the DFS/BFS difference up. This is exactly the
>>> kind of difference I was looking for. My guess is that this is what
>>> causes the difference in JBB score. I'll see if I can investigate
>>> this further.
>>>> Also curious if you can share the two sets of GC logs, by chance?
>>>> (specJBB is a for-fee benchmark so is not freely available to the
>>>> individual developer.)
>>>
>>> I have a fairly large set of logs, but the runs are very stable so
>>> I'm just attaching logs for one run for each collector. For
>>> comparison I have also been running ParallelScavenge with one thread.
>>> I'm using separate gc logs and jbb logs. The log files called
>>> ".result" are the jbb output. The other logs are the gc logs.
>>>
>>> I'm running with a heap size of 1GB to avoid full GCs. All runs have
>>> the two System.gc() induced full GCs but no other. ParallelScavenge
>>> is performing even better than ParNew, but I am mostly interested in
>>> the difference between ParNew and DefNew.
>>>
>>> A quick summary of the data in the logs:
>>>
>>> Score #GCs AverageGCTime
>>> DefNew: 57903 2083 0.044053195391262644
>>> ParNew: 61363 2213 0.05931835969272489
>>> PS: 69697 2213 0.06117092860370538
>>>
>>> ParNew has a better score even though it does more GCs and they take
>>> longer.
>>>
>>> If you have any insights from looking at the logs I would be very
>>> happy to hear about it.
>>>
>>> Thanks,
>>> Bengt
>>>>
>>>> thanks.
>>>> -- ramki
>>>>
>>>> On Fri, Jan 11, 2013 at 4:57 AM, Bengt Rutisson
>>>> <bengt.rutisson at oracle.com <mailto:bengt.rutisson at oracle.com>> wrote:
>>>>
>>>>
>>>> Hi Vitaly,
>>>>
>>>>
>>>> On 1/11/13 1:45 PM, Vitaly Davidovich wrote:
>>>>>
>>>>> Hi Bengt,
>>>>>
>>>>> Regarding the benchmark score, are you saying ParNew has longer
>>>>> cumulative GC time or just the average is higher? If it's just
>>>>> average, maybe the total # of them (and cumulative time) is
>>>>> less. I don't know the characteristics of this particular
>>>>> specjbb benchmark, but perhaps having fewer total GCs is better
>>>>> because of the overhead of getting all threads to a safe point,
>>>>> going go the OS to suspend them, and then restarting them. After
>>>>> they're restarted, the CPU cache may be cold for it because the
>>>>> GC thread polluted it. Or I'm entirely wrong in my speculation
>>>>> ... :).
>>>>>
>>>>
>>>> You have a good point about the number of GCs. The problem in my
>>>> runs is that ParNew does more GCs than DefNew. So there are both
>>>> more of them and their average time is higher, but the score is
>>>> still better. That ParNew does more GCs is not that strange. It
>>>> has a higher score, which means that it had higher throughput and
>>>> had time to create more objects. So, that is kind of expected. But
>>>> I don't understand how it can have higher throughput when the GCs
>>>> take longer. My current guess is that it does something
>>>> differently with how objects are copied in a way that is
>>>> beneficial for the execution time between GCs.
>>>>
>>>> It also seems like ParNew keeps more objects alive for each GC.
>>>> That is either the reason why it does more and more frequent GCs
>>>> than DefNew, or it is an effect of the fact that more objects are
>>>> created due to the higher throughput. This is the reason I started
>>>> looking at the tenuring threshold.
>>>>
>>>> Bengt
>>>>
>>>>
>>>>> Thanks
>>>>>
>>>>> Sent from my phone
>>>>>
>>>>> On Jan 11, 2013 6:02 AM, "Bengt Rutisson"
>>>>> <bengt.rutisson at oracle.com <mailto:bengt.rutisson at oracle.com>>
>>>>> wrote:
>>>>>
>>>>>
>>>>> Hi Ramki,
>>>>>
>>>>> Thanks for looking at this!
>>>>>
>>>>> On 1/10/13 9:28 PM, Srinivas Ramakrishna wrote:
>>>>>> Hi Bengt --
>>>>>>
>>>>>> The change looks reasonable, but I have a comment and a
>>>>>> follow-up question.
>>>>>>
>>>>>> Not your change, but I'd elide the "half the real survivor
>>>>>> size" since it's really a configurable parameter based on
>>>>>> TargetSurvivorRatio with default half.
>>>>>> I'd leave the comment as "set the new tenuring threshold and
>>>>>> desired survivor size".
>>>>>
>>>>> I'm fine with removing this from the comment, but I thought
>>>>> the "half the real survivor size" aimed at the fact that we
>>>>> pass only the "to" capacity and not the "from" capacity in to
>>>>> compute_tenuring_threshold(). With that interpretation I
>>>>> think the comment is correct.
>>>>>
>>>>> Would you like me to remove it anyway? Either way is fine
>>>>> with me.
>>>>>
>>>>>> I'm curious though, as to what performance data prompted
>>>>>> this change,
>>>>> Good point. This change was preceded by an internal
>>>>> discussion in the GC team, so I should probably have
>>>>> explained the background more in my review request to the
>>>>> open.
>>>>>
>>>>> I was comparing the ParNew and DefNew implementation since I
>>>>> am seeing some strange differences in some SPECjbb2005
>>>>> results. I am running ParNew with a single thread and get
>>>>> much better score than with DefNew. But I also get higher
>>>>> average GC times. So, I was trying to figure out what DefNew
>>>>> and ParNew does differently.
>>>>>
>>>>> When I was looking at DefNewGeneration::collect() and
>>>>> ParNewGeneration::collect() I saw that they contain a whole
>>>>> lot of code duplication. It would be tempting to try to
>>>>> extract the common code out into DefNewGeneration since it is
>>>>> the super class. But there are some minor differences. One of
>>>>> them was this issue with how they handle the tenuring
>>>>> threshold.
>>>>>
>>>>> We tried to figure out if there is a reason for ParNew and
>>>>> DefNew to behave different in this regard. We could not come
>>>>> up with any good reason for that. So, we needed to figure out
>>>>> if we should change ParNew or DefNew to make them consistent.
>>>>> The decision to change ParNew was based on two things. First,
>>>>> it seems wrong to use the data from a collection that got
>>>>> promotion failure. This collection will not have allowed the
>>>>> tenuring threshold to fulfill its purpose. Second,
>>>>> ParallelScavenge works the same way as DefNew.
>>>>>
>>>>> BTW, the difference between DefNew and ParNew seems to have
>>>>> been there from the start. So, there is no bug or changeset
>>>>> in mercurial or TeamWare to explain why the difference was
>>>>> introduced.
>>>>>
>>>>> (Just to be clear, this difference was not the cause of my
>>>>> performance issue. I still don't have a good explanation for
>>>>> how ParNew can have longer GC times but better SPECjbb score.)
>>>>>
>>>>>> and whether it might make sense, upon a promotion failure to
>>>>>> do something about the tenuring threshold for the next
>>>>>> scavenge (i.e. for example make the tenuring threshold half
>>>>>> of its current value as a reaction to the fact that
>>>>>> promotion failed). Is it currently left at its previous
>>>>>> value or is it asjusted back to the default max value (which
>>>>>> latter may be the wrong thing to do) or something else?
>>>>>
>>>>> As far as I can tell the tenuring threshold is left untouched
>>>>> if we get a promotion failure. It is probably a good idea to
>>>>> update it in some way. But I would prefer to handle that as a
>>>>> separate bug fix.
>>>>>
>>>>> This change is mostly a small cleanup to make
>>>>> DefNewGeneration::collect() and ParNewGeneration::collect()
>>>>> be more consistent. We've done the thinking so, it's good to
>>>>> make the change in preparation for the next person that comes
>>>>> a long and has a few cycles over and would like to merge the
>>>>> two collect() methods in some way.
>>>>>
>>>>> Thanks again for looking at this!
>>>>> Bengt
>>>>>
>>>>>>
>>>>>> -- ramki
>>>>>>
>>>>>> On Thu, Jan 10, 2013 at 1:30 AM, Bengt Rutisson
>>>>>> <bengt.rutisson at oracle.com
>>>>>> <mailto:bengt.rutisson at oracle.com>> wrote:
>>>>>>
>>>>>>
>>>>>> Hi everyone,
>>>>>>
>>>>>> Could I have a couple of reviews for this small change
>>>>>> to make DefNew and ParNew be more consistent in the way
>>>>>> they treat the tenuring threshold:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~brutisso/8005972/webrev.00/
>>>>>> <http://cr.openjdk.java.net/%7Ebrutisso/8005972/webrev.00/>
>>>>>>
>>>>>> Thanks,
>>>>>> Bengt
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>
More information about the hotspot-gc-dev
mailing list