Strange Long ParNew GC Pauses (Sample Code Included)
Jon Masamitsu
jon.masamitsu at oracle.com
Tue May 12 20:50:29 UTC 2015
Ramki,
The logarithmic BOT is taking big steps back to
find the start of a large object. But something is odd.
Is there a reason why the BOT entries would decrease for a while
and then increase without a value that gives the start of an object?
&_array->_offset_array[index] = 0xffffffff6550f600 "DDDDDDDDDDDDDD ..."
and then down the road.
print &_array->_offset_array[0x110600U], _array->_offset_array[0x10f600U >
&_array->_offset_array[0x110600U] = 0xffffffff65510600
"CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDDDDDDDDD
DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
DDDDDDDDDDDDDDDDDDDDDDDDDD
'D' is 0x44
N_words is 0x40
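For readers following along: the idea of the logarithmic BOT is that each card's entry is either a small word offset back to the object start (values below N_words), or an encoded back-skip whose size grows by a power per step, so a thread can hop back to the start of even a very large object in few steps. A minimal sketch of that idea follows; this is illustrative only, not the actual HotSpot encoding -- N_WORDS matches the 0x40 above, but LOG_BASE and the entry-to-skip mapping here are assumptions, and the real code in blockOffsetTable.cpp differs in its constants and off-by-one details.

```java
// Simplified, hypothetical sketch of a logarithmic block-offset-table
// lookup. NOT the real HotSpot encoding; constants and the
// entry-to-skip mapping are illustrative assumptions.
public class BotSketch {
    static final int N_WORDS = 0x40;  // words covered by one card (0x40 above)
    static final int LOG_BASE = 4;    // each power level skips 16x more cards

    final int[] entries;

    public BotSketch(int numCards) {
        entries = new int[numCards];
    }

    // Record a block starting exactly at the boundary of startCard and
    // covering cards [startCard, endCard).
    public void setBlockStart(int startCard, int endCard) {
        entries[startCard] = 0;  // start is 0 words back from this card
        for (int c = startCard + 1; c < endCard; c++) {
            int back = c - startCard;
            int i = 0;
            // pick the largest power-of-16 step that does not overshoot
            while (LOG_BASE * (i + 1) < 31
                   && (1 << (LOG_BASE * (i + 1))) <= back) {
                i++;
            }
            entries[c] = N_WORDS + i;  // values >= N_WORDS encode back-skips
        }
    }

    // Hop backwards through encoded skips until a card whose entry is a
    // plain word offset (< N_WORDS) marks the block start.
    public int blockStartCard(int card) {
        while (entries[card] >= N_WORDS) {
            card -= 1 << (LOG_BASE * (entries[card] - N_WORDS));
        }
        return card;
    }
}
```

With this scheme a lookup from the far end of a 65536-card block takes on the order of tens of hops rather than tens of thousands, which is why a broken or degenerate table (the pattern in the dump above) is so costly.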
Jon
On 05/12/2015 08:46 AM, Srinivas Ramakrishna wrote:
> Thanks for that background Jon; I'll try and look at the 8uxx code and
> run James's example when I get a few spare cycles (which won't be
> soon, unfortunately).
>
> -- ramki
>
> ysr1729
>
> On May 8, 2015, at 14:25, Jon Masamitsu <jon.masamitsu at oracle.com
> <mailto:jon.masamitsu at oracle.com>> wrote:
>
>> Ramki,
>>
>> I compared the memory/blockOffsetTable.* files between 6u35 and 7 and
>> there
>> were essentially no differences. The .cpp file had some changes in
>> assertions.
>> The other two were the same except for comments.
>>
>> Between 6 and 6u35 (I didn't pinpoint it), we changed from the rather
>> weak N_powers=3 :-) to N_powers=14.
>>
>> There seem to be two problematic cases.
>>
>> 1) Spike in ParNew times that lasts for 2 (or a few) GC's
>> 2) Continually growing ParNew times
>>
>> From just reading the CR
>>
>> https://bugs.openjdk.java.net/browse/JDK-8079274
>>
>> the test case seems to reproduce 1). The test case
>> has a large object that lives till the application exits.
>> If it were a broken logarithmic BOT, I would not expect
>> the ParNew pauses to shrink again.
>>
>> Jon
>>
>>
>>
>> On 5/8/2015 11:49 AM, Srinivas Ramakrishna wrote:
>>> This is indeed interesting...
>>>
>>> Jesper, did you figure out why (as James seems to note) the problem
>>> is seen in JDK 8 but not in JDK 6 (he's using ParNew in both cases).
>>> Also, isn't the BOT logarithmic, which will allow threads to
>>> "quickly" traverse back to the start of even a large object? Did
>>> something change in the BOT that might have led to what James sees
>>> as a regression, or did the range that he characterizes as a "blind
>>> spot" merely move from one place to another?
>>>
>>> Unfortunately, I have not been keeping up with the changes in this
>>> area and am not working with JDK 6 code much, so can't offer any
>>> clues, but am surprised that this would be a regression. (I would
>>> expect the same performance pathologies in this code to exist in 6
>>> and forward -- has it been established conclusively that this is
>>> indeed a regression?)
>>>
>>> thanks!
>>> -- ramki
>>>
>>> On Fri, May 8, 2015 at 6:40 AM, James Lent <jlent at tivo.com
>>> <mailto:jlent at tivo.com>> wrote:
>>>
>>> The explanation of the problem added by Jesper Wilhelmsson on
>>> 2015-05-06 08:59 to JDK-8079274 is interesting but not very
>>> satisfying.
>>>
>>> 1) The problem affects more than this one specific breakpoint.
>>> It impacts objects in at least the range of about 480M to 512M.
>>> The comment "A larger object probably triggers some higher level
>>> code that are able to skip scanning of this part of the memory"
>>> does not inspire confidence. It makes me even more concerned that
>>> we do not understand this issue and that there could be other
>>> triggers. It seems like "the higher level code" has a 32M blind
>>> spot. Further, I can create scenarios where the GC code takes
>>> progressively longer to clear the issue. Presumably it completes
>>> the lower block scan during each GC cycle. Why does it take
>>> longer and longer each time? (My guess is it has something to do
>>> with the number of objects moved to survivor space.) If I cache
>>> 1 in 100 instead of 1 in 10 then there are 11 progressively
>>> slower ParNew GCs (after several fast ones) and the issue takes
>>> minutes to clear:
>>>
>>> $ /usr/lib/jvm/java-7-oracle/bin/java -verbose:gc
>>> -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+UseParNewGC
>>> -XX:+UseConcMarkSweepGC -Xmx4G -Xms4G LongParNewPause 536870384
>>> 100 100
>>> 0.308: [GC0.308: [ParNew: 272640K->3029K(306688K), 0.0113180
>>> secs] 796927K->527316K(4160256K), 0.0114040 secs] [Times:
>>> user=0.05 sys=0.00, real=0.02 secs]
>>> 0.367: [GC0.367: [ParNew: 275669K->7766K(306688K), 0.0140840
>>> secs] 799956K->532053K(4160256K), 0.0141630 secs] [Times:
>>> user=0.04 sys=0.01, real=0.02 secs]
>>> 0.430: [GC0.430: [ParNew: 280406K->10057K(306688K), 0.0120810
>>> secs] 804693K->534344K(4160256K), 0.0121690 secs] [Times:
>>> user=0.05 sys=0.00, real=0.02 secs]
>>> 0.490: [GC0.490: [ParNew: 282697K->15383K(306688K), 0.0128540
>>> secs] 806984K->539670K(4160256K), 0.0129390 secs] [Times:
>>> user=0.05 sys=0.00, real=0.02 secs]
>>> 0.551: [GC0.551: [ParNew: 288023K->21006K(306688K), 0.0146480
>>> secs] 812310K->545294K(4160256K), 0.0147650 secs] [Times:
>>> user=0.05 sys=0.01, real=0.01 secs]
>>> 0.613: [GC0.613: [ParNew: 293646K->26805K(306688K), 0.0141250
>>> secs] 817934K->551092K(4160256K), 0.0142050 secs] [Times:
>>> user=0.05 sys=0.00, real=0.01 secs]
>>> 0.675: [GC0.675: [ParNew: 299445K->24258K(306688K), 0.4816840
>>> secs] 823732K->551436K(4160256K), 0.4817620 secs] [Times:
>>> user=1.90 sys=0.00, real=0.48 secs]
>>> 1.205: [GC1.205: [ParNew: 296898K->17914K(306688K), 3.4216550
>>> secs] 824076K->547788K(4160256K), 3.4217180 secs] [Times:
>>> user=13.50 sys=0.00, real=3.42 secs]
>>> 4.674: [GC4.674: [ParNew: 290554K->17433K(306688K), 6.3406820
>>> secs] 820428K->550031K(4160256K), 6.3407410 secs] [Times:
>>> user=25.00 sys=0.02, real=6.34 secs]
>>> 11.062: [GC11.062: [ParNew: 290073K->17315K(306688K), 10.9493130
>>> secs] 822671K->552636K(4160256K), 10.9493890 secs] [Times:
>>> user=42.98 sys=0.03, real=10.95 secs]
>>> 22.059: [GC22.059: [ParNew: 289955K->17758K(306688K), 16.0539650
>>> secs] 825276K->555798K(4160256K), 16.0540250 secs] [Times:
>>> user=62.44 sys=0.09, real=16.05 secs]
>>> 38.161: [GC38.161: [ParNew: 290398K->17745K(306688K), 20.6900820
>>> secs] 828438K->558512K(4160256K), 20.6901420 secs] [Times:
>>> user=80.79 sys=0.08, real=20.69 secs]
>>> 58.898: [GC58.898: [ParNew: 290385K->21826K(306688K), 24.6781700
>>> secs] 831152K->565310K(4160256K), 24.6782710 secs] [Times:
>>> user=95.58 sys=0.05, real=24.68 secs]
>>> 83.625: [GC83.625: [ParNew: 294466K->21834K(306688K), 30.3501800
>>> secs] 837950K->568044K(4160256K), 30.3502610 secs] [Times:
>>> user=115.80 sys=0.11, real=30.35 secs]
>>> 114.022: [GC114.022: [ParNew: 294474K->21836K(306688K),
>>> 33.6648800 secs] 840684K->570772K(4160256K), 33.6649390 secs]
>>> [Times: user=127.38 sys=0.14, real=33.67 secs]
>>> 147.736: [GC147.736: [ParNew: 294476K->18454K(306688K),
>>> 38.9828430 secs] 843412K->570117K(4160256K), 38.9829040 secs]
>>> [Times: user=149.14 sys=0.15, real=38.99 secs]
>>> 186.766: [GC186.766: [ParNew: 291094K->26602K(306688K),
>>> 39.8726730 secs] 842757K->580977K(4160256K), 39.8727330 secs]
>>> [Times: user=156.35 sys=0.11, real=39.88 secs]
>>> 226.686: [GC226.686: [ParNew: 299242K->26612K(306688K),
>>> 0.0174470 secs] 853617K->583714K(4160256K), 0.0175300 secs]
>>> [Times: user=0.06 sys=0.00, real=0.02 secs]
>>>
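James's LongParNewPause source is not included in this message. From the description above -- one roughly 512 MB array that lives until the application exits, plus a stream of small allocations of which 1 in N is cached -- a hypothetical reconstruction of its shape might look like the following; the class structure, method names, and per-round iteration count are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reconstruction of the LongParNewPause test; the actual
// source is not in this message.
// Usage: LongParNewPause <bigBytes> <smallBytes> <cacheOneInN>
public class LongParNewPause {
    static byte[] bigStatic;                       // large object, lives until exit
    static List<byte[]> cache = new ArrayList<>();

    // Allocate 'iterations' small objects, keeping 1 in 'cacheOneInN'
    // reachable; returns how many were kept (the fraction that survives
    // young collections and gets copied/promoted).
    static int churn(int iterations, int smallBytes, int cacheOneInN) {
        int kept = 0;
        for (int i = 0; i < iterations; i++) {
            byte[] b = new byte[smallBytes];
            if (i % cacheOneInN == 0) {
                cache.add(b);
                kept++;
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        bigStatic = new byte[Integer.parseInt(args[0])];  // e.g. 536870384
        int smallBytes = Integer.parseInt(args[1]);       // e.g. 100
        int cacheOneInN = Integer.parseInt(args[2]);      // e.g. 100 or 10
        while (true) {
            churn(1_000_000, smallBytes, cacheOneInN);    // churn the young gen
        }
    }
}
```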
>>> 2) The suggestion to use only 1 thread is not acceptable to me.
>>> Use parallel GC with one thread? Further, it did not work for me
>>> (in fact it made things worse). "user" time now equals
>>> "real" time (and the new size is 1/4 as big), so I think the
>>> parameter took effect:
>>>
>>> $ /usr/lib/jvm/java-8-oracle/bin/java -verbose:gc
>>> -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+UseParNewGC
>>> -XX:+UseConcMarkSweepGC -Xmx4G -Xms4G -XX:ParallelGCThreads=1
>>> LongParNewPause 536870384 100 10
>>> 0.226: [GC (Allocation Failure) 0.226: [ParNew:
>>> 68160K->6980K(76672K), 0.0489756 secs]
>>> 592447K->531267K(4185792K), 0.0490718 secs] [Times: user=0.05
>>> sys=0.00, real=0.05 secs]
>>> 0.291: [GC (Allocation Failure) 0.291: [ParNew:
>>> 75140K->8512K(76672K), 29.8707281 secs]
>>> 599427K->539479K(4185792K), 29.8707954 secs] [Times: user=29.80
>>> sys=0.00, real=29.88 secs]
>>> 30.180: [GC (Allocation Failure) 30.180: [ParNew:
>>> 76672K->8093K(76672K), 30.6357699 secs]
>>> 607639K->545772K(4185792K), 30.6358257 secs] [Times: user=30.57
>>> sys=0.00, real=30.63 secs]
>>> 60.837: [GC (Allocation Failure) 60.837: [ParNew:
>>> 76253K->6830K(76672K), 69.2115255 secs]
>>> 613932K->552169K(4185792K), 69.2115783 secs] [Times: user=69.05
>>> sys=0.01, real=69.22 secs]
>>> 130.068: [GC (Allocation Failure) 130.068: [ParNew:
>>> 74990K->8226K(76672K), 132.5118903 secs]
>>> 620329K->560381K(4185792K), 132.5119459 secs] [Times:
>>> user=132.21 sys=0.00, real=132.52 secs]
>>> 262.603: [GC (Allocation Failure) 262.603: [ParNew:
>>> 76386K->8512K(76672K), 0.0400138 secs]
>>> 628541K->568748K(4185792K), 0.0400687 secs] [Times: user=0.04
>>> sys=0.00, real=0.04 secs]
>>>
>>> 3) Applications should not be expected to tune the GC settings
>>> to account for the size of one static object. The connection
>>> between the size of this object and the GC performance was not
>>> easy to find in the first place. Further, the parameter
>>> ParGCCardsPerStrideChunk is pretty low level and not one I think
>>> many application developers are familiar with.
>>>
>>> Error: VM option 'ParGCCardsPerStrideChunk' is diagnostic and
>>> must be enabled via -XX:+UnlockDiagnosticVMOptions.
>>> Error: Could not create the Java Virtual Machine.
>>> Error: A fatal exception has occurred. Program will exit.
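For completeness, the error above goes away once diagnostic options are unlocked before the flag. The stride value below is an arbitrary example, not a tuning recommendation, and the invocation otherwise mirrors the first run above.

```shell
# ParGCCardsPerStrideChunk is a diagnostic option, so unlock it first.
# The stride value 32768 here is only an example, not a recommendation.
java -XX:+UnlockDiagnosticVMOptions -XX:ParGCCardsPerStrideChunk=32768 \
     -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails \
     -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Xmx4G -Xms4G \
     LongParNewPause 536870384 100 100
```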
>>>
>>> 4) I cannot reproduce this issue in Oracle 6. This appears to
>>> be a regression. How can fixing a regression be called an
>>> enhancement request?
>>>
>>