Strange Long ParNew GC Pauses (Sample Code Included)

Jon Masamitsu jon.masamitsu at oracle.com
Tue May 12 20:50:29 UTC 2015


Ramki,

The logarithmic BOT is taking big steps back to
find the start of a large object.  But something is odd.

Is there a reason why the BOT entries would decrease for a while
and then increase without a value that gives the start of an object?

&_array->_offset_array[index] = 0xffffffff6550f600 "DDDDDDDDDDDDDD ..."

and then down the road.

  print &_array->_offset_array[0x110600U], _array->_offset_array[0x10f600U >
&_array->_offset_array[0x110600U] = 0xffffffff65510600 
"CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDDDDDDDDD
DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
DDDDDDDDDDDDDDDDDDDDDDDDDD

'D' is 0x44

N_words is 0x40

Jon

On 05/12/2015 08:46 AM, Srinivas Ramakrishna wrote:
> Thanks for that background Jon; I'll try and look at the 8uxx code and 
> run James's example when I get a few spare cycles (which won't be 
> soon, unfortunately).
>
> -- ramki
>
> ysr1729
>
> On May 8, 2015, at 14:25, Jon Masamitsu <jon.masamitsu at oracle.com 
> <mailto:jon.masamitsu at oracle.com>> wrote:
>
>> Ramki,
>>
>> I compared the memory/blockOffsetTable.* files between 6u35 and 7 and 
>> there
>> were essentially no differences.  The .cpp file had some changes in 
>> assertions.
>> The other two were the same except for comments.
>>
>> Between 6 and 6u35 (didn't pin point it), we changed from the rather
>> weak N_powers=3  :-) to N_powers=14.
>>
>> There seem to be two problematic cases.
>>
>> 1)  Spike in ParNew times that lasts for 2 (or a few) GC's
>> 2) Continually growing ParNew times
>>
>> From just reading the CR
>>
>> https://bugs.openjdk.java.net/browse/JDK-8079274
>>
>> the test case seems to reproduce 1).  The test case
>> has a large object that lives till the application exits.
>> If it were a broken logarithmic BOT, I would not expect
>> the ParNew pauses to shrink again.
>>
>> Jon
>>
>>
>>
>> On 5/8/2015 11:49 AM, Srinivas Ramakrishna wrote:
>>> This is indeed interesting...
>>>
>>> Jesper, did you figure out why (as James seems to note) the problem 
>>> is seen in JDK 8 but not in JDK 6 (he's using ParNew in both cases). 
>>> Also, isn't the BOT logarithmic, which will allow threads to 
>>> "quickly" traverse back to the start of even a large object? Did 
>>> something change in the BOT that might have led to what James sees 
>>> as a regression, or did the range that he characterizes as a "blind 
>>> spot" merely move from one place to another?
>>>
>>> Unfortunately, I have not been keeping up with the changes in this 
>>> area and am not working with JDK 6 code much, so can't offer any 
>>> clues, but am surprised that this would be a regression. (I would 
>>> expect the same performance pathologies in this code to exist in 6 
>>> and forward -- has it been established conclusively that this is 
>>> indeed a regression?)
>>>
>>> thanks!
>>> -- ramki
>>>
>>> On Fri, May 8, 2015 at 6:40 AM, James Lent <jlent at tivo.com 
>>> <mailto:jlent at tivo.com>> wrote:
>>>
>>>     The explanation of the problem added by Jesper Wilhelmsson on
>>>     2015-05-06 08:59 to JDK-8079274 is interesting, but, not very
>>>     satisfying.
>>>
>>>     1) The problem affects more than this one specific breakpoint. 
>>>     It impacts objects in at least the range of about 480M to 512M. 
>>>     The comment "A larger object probably triggers some higher level
>>>     code that are able to skip scanning of this part of the memory"
>>>     does not inspire confidence. It makes even more concerned that
>>>     we do not understand this issue and that there could be other
>>>     triggers.  It seems like "the higher level code" has a 32M blind
>>>     spot. Further I can create scenarios where the GC code takes
>>>     progressively longer to clear the issue. Presumably it completes
>>>     the lower block scan during each GC cycle.  Why does it take
>>>     longer and longer each time (my guess is it has something to do
>>>     with the number of objects moved to survivor space).  If I cache
>>>     1 in 100 instead of 1 in 10 then there are 11 progressively
>>>     slower ParNewGC (after several fast one) and the issue takes
>>>     minutes to clear:
>>>
>>>     $ /usr/lib/jvm/java-7-oracle/bin/java -verbose:gc
>>>     -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+UseParNewGC
>>>     -XX:+UseConcMarkSweepGC -Xmx4G -Xms4G LongParNewPause 536870384
>>>     100 100
>>>     0.308: [GC0.308: [ParNew: 272640K->3029K(306688K), 0.0113180
>>>     secs] 796927K->527316K(4160256K), 0.0114040 secs] [Times:
>>>     user=0.05 sys=0.00, real=0.02 secs]
>>>     0.367: [GC0.367: [ParNew: 275669K->7766K(306688K), 0.0140840
>>>     secs] 799956K->532053K(4160256K), 0.0141630 secs] [Times:
>>>     user=0.04 sys=0.01, real=0.02 secs]
>>>     0.430: [GC0.430: [ParNew: 280406K->10057K(306688K), 0.0120810
>>>     secs] 804693K->534344K(4160256K), 0.0121690 secs] [Times:
>>>     user=0.05 sys=0.00, real=0.02 secs]
>>>     0.490: [GC0.490: [ParNew: 282697K->15383K(306688K), 0.0128540
>>>     secs] 806984K->539670K(4160256K), 0.0129390 secs] [Times:
>>>     user=0.05 sys=0.00, real=0.02 secs]
>>>     0.551: [GC0.551: [ParNew: 288023K->21006K(306688K), 0.0146480
>>>     secs] 812310K->545294K(4160256K), 0.0147650 secs] [Times:
>>>     user=0.05 sys=0.01, real=0.01 secs]
>>>     0.613: [GC0.613: [ParNew: 293646K->26805K(306688K), 0.0141250
>>>     secs] 817934K->551092K(4160256K), 0.0142050 secs] [Times:
>>>     user=0.05 sys=0.00, real=0.01 secs]
>>>     0.675: [GC0.675: [ParNew: 299445K->24258K(306688K), 0.4816840
>>>     secs] 823732K->551436K(4160256K), 0.4817620 secs] [Times:
>>>     user=1.90 sys=0.00, real=0.48 secs]
>>>     1.205: [GC1.205: [ParNew: 296898K->17914K(306688K), 3.4216550
>>>     secs] 824076K->547788K(4160256K), 3.4217180 secs] [Times:
>>>     user=13.50 sys=0.00, real=3.42 secs]
>>>     4.674: [GC4.674: [ParNew: 290554K->17433K(306688K), 6.3406820
>>>     secs] 820428K->550031K(4160256K), 6.3407410 secs] [Times:
>>>     user=25.00 sys=0.02, real=6.34 secs]
>>>     11.062: [GC11.062: [ParNew: 290073K->17315K(306688K), 10.9493130
>>>     secs] 822671K->552636K(4160256K), 10.9493890 secs] [Times:
>>>     user=42.98 sys=0.03, real=10.95 secs]
>>>     22.059: [GC22.059: [ParNew: 289955K->17758K(306688K), 16.0539650
>>>     secs] 825276K->555798K(4160256K), 16.0540250 secs] [Times:
>>>     user=62.44 sys=0.09, real=16.05 secs]
>>>     38.161: [GC38.161: [ParNew: 290398K->17745K(306688K), 20.6900820
>>>     secs] 828438K->558512K(4160256K), 20.6901420 secs] [Times:
>>>     user=80.79 sys=0.08, real=20.69 secs]
>>>     58.898: [GC58.898: [ParNew: 290385K->21826K(306688K), 24.6781700
>>>     secs] 831152K->565310K(4160256K), 24.6782710 secs] [Times:
>>>     user=95.58 sys=0.05, real=24.68 secs]
>>>     83.625: [GC83.625: [ParNew: 294466K->21834K(306688K), 30.3501800
>>>     secs] 837950K->568044K(4160256K), 30.3502610 secs] [Times:
>>>     user=115.80 sys=0.11, real=30.35 secs]
>>>     114.022: [GC114.022: [ParNew: 294474K->21836K(306688K),
>>>     33.6648800 secs] 840684K->570772K(4160256K), 33.6649390 secs]
>>>     [Times: user=127.38 sys=0.14, real=33.67 secs]
>>>     147.736: [GC147.736: [ParNew: 294476K->18454K(306688K),
>>>     38.9828430 secs] 843412K->570117K(4160256K), 38.9829040 secs]
>>>     [Times: user=149.14 sys=0.15, real=38.99 secs]
>>>     186.766: [GC186.766: [ParNew: 291094K->26602K(306688K),
>>>     39.8726730 secs] 842757K->580977K(4160256K), 39.8727330 secs]
>>>     [Times: user=156.35 sys=0.11, real=39.88 secs]
>>>     226.686: [GC226.686: [ParNew: 299242K->26612K(306688K),
>>>     0.0174470 secs] 853617K->583714K(4160256K), 0.0175300 secs]
>>>     [Times: user=0.06 sys=0.00, real=0.02 secs]
>>>
>>>     2) The suggestion to use only 1 thread is not acceptable to me. 
>>>     Use parallel GC with one thread? Further it did not work for me
>>>     (in fact it made things worse).  "user" time does now equal
>>>     "real" time (and the new size is 1/4 as big) so I think the
>>>     parameter took effect:
>>>
>>>     $ /usr/lib/jvm/java-8-oracle/bin/java -verbose:gc
>>>     -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+UseParNewGC
>>>     -XX:+UseConcMarkSweepGC -Xmx4G -Xms4G -XX:ParallelGCThreads=1
>>>     LongParNewPause 536870384 100 10
>>>     0.226: [GC (Allocation Failure) 0.226: [ParNew:
>>>     68160K->6980K(76672K), 0.0489756 secs]
>>>     592447K->531267K(4185792K), 0.0490718 secs] [Times: user=0.05
>>>     sys=0.00, real=0.05 secs]
>>>     0.291: [GC (Allocation Failure) 0.291: [ParNew:
>>>     75140K->8512K(76672K), 29.8707281 secs]
>>>     599427K->539479K(4185792K), 29.8707954 secs] [Times: user=29.80
>>>     sys=0.00, real=29.88 secs]
>>>     30.180: [GC (Allocation Failure) 30.180: [ParNew:
>>>     76672K->8093K(76672K), 30.6357699 secs]
>>>     607639K->545772K(4185792K), 30.6358257 secs] [Times: user=30.57
>>>     sys=0.00, real=30.63 secs]
>>>     60.837: [GC (Allocation Failure) 60.837: [ParNew:
>>>     76253K->6830K(76672K), 69.2115255 secs]
>>>     613932K->552169K(4185792K), 69.2115783 secs] [Times: user=69.05
>>>     sys=0.01, real=69.22 secs]
>>>     130.068: [GC (Allocation Failure) 130.068: [ParNew:
>>>     74990K->8226K(76672K), 132.5118903 secs]
>>>     620329K->560381K(4185792K), 132.5119459 secs] [Times:
>>>     user=132.21 sys=0.00, real=132.52 secs]
>>>     262.603: [GC (Allocation Failure) 262.603: [ParNew:
>>>     76386K->8512K(76672K), 0.0400138 secs]
>>>     628541K->568748K(4185792K), 0.0400687 secs] [Times: user=0.04
>>>     sys=0.00, real=0.04 secs]
>>>
>>>     3) Applications should not be expected to tune the GC settings
>>>     to account for the size of one static object.  The connection
>>>     between the size of this object and the GC performance was not
>>>     easy to even find.  Further the parameter
>>>     ParGCCardsPerStrideChunk is pretty low level and not one I think
>>>     many application developers are familiar with.
>>>
>>>     Error: VM option 'ParGCCardsPerStrideChunk' is diagnostic and
>>>     must be enabled via -XX:+UnlockDiagnosticVMOptions.
>>>     Error: Could not create the Java Virtual Machine.
>>>     Error: A fatal exception has occurred. Program will exit.
>>>
>>>     4) I can not reproduce this issue in Oracle 6.  This appears to
>>>     be a regression.  How can fixing a regression be called an
>>>     enhancement request?
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>     ------------------------------------------------------------------------
>>>
>>>     This email and any attachments may contain confidential and
>>>     privileged material for the sole use of the intended recipient.
>>>     Any review, copying, or distribution of this email (or any
>>>     attachments) by others is prohibited. If you are not the
>>>     intended recipient, please contact the sender immediately and
>>>     permanently delete this email and any attachments. No employee
>>>     or agent of TiVo Inc. is authorized to conclude any binding
>>>     agreement on behalf of TiVo Inc. by email. Binding agreements
>>>     with TiVo Inc. may only be made by a signed written agreement.
>>>
>>>
>>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20150512/19a6f8e3/attachment.htm>


More information about the hotspot-gc-dev mailing list