<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
Looks good.<br>
<br>
Would you consider<br>
<br>
PaddedArray::create_unfreeable()<br>
<br>
in place of<br>
<br>
PaddedArray::create_immortal() ?<br>
<br>
I think "unfreeable" better conveys the comment in the code<br>
that "The memory can't be deleted ...". When I see "unfreeable",<br>
it makes me a little uneasy (as it should).<br>
<br>
Jon<br>
<br>
<br>
<div class="moz-cite-prefix">On 8/13/13 4:38 AM, Stefan Karlsson
wrote:<br>
</div>
<blockquote cite="mid:520A1AD1.8050304@oracle.com" type="cite"><a class="moz-txt-link-freetext" href="http://cr.openjdk.java.net/~stefank/8022880/webrev.00/">http://cr.openjdk.java.net/~stefank/8022880/webrev.00/</a>
<br>
<br>
We've seen a couple of instances of false sharing when accessing
fields from the beginning and the end of the PSPromotionManager
instances. This both decreases the performance of the Parallel
Scavenge young GC and makes it hard to do reliable GC benchmarks
on bigger machines.
<br>
<br>
This was first seen in:
<br>
<a class="moz-txt-link-freetext" href="http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7196911">http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7196911</a> :
command line length affects performance
<br>
<br>
The patch makes sure that each PSPromotionManager starts at a
cache-line-aligned address and is padded to have a
cache-line-aligned size.
<br>
<br>
It doesn't use the existing Padded<T> class, since it
(unnecessarily) wastes too much memory, but instead introduces a
PaddedEnd<T> class. This class only pads enough to get the
cache-line-aligned size and it's up to the user to align the start
of the instance. This works well in this specific case, where all
the PSPromotionManagers are together in an Array. A
PaddedArray<T> class was added to hide the memory layout
code.
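<br>
<br>
A rough sketch of the layout idea described above, not the code in the
webrev. It assumes a 64-byte cache line, and every name in it
(kCacheLine, PaddedEndImpl, PaddedEndSketch, create_padded_array_sketch,
DemoManager) is hypothetical. The wrapper pads only the end of each
element, and the array helper over-allocates by one cache line so that
it can round the start address up:
<br>
<br>
```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Assumption: 64-byte cache lines. The real value is platform-specific.
constexpr std::size_t kCacheLine = 64;

constexpr std::size_t align_up(std::size_t n, std::size_t a) {
  return (n + a - 1) / a * a;
}

// Holds a T followed by just enough bytes to reach the next multiple
// of the alignment. Nothing is added in front of the T.
template <typename T, std::size_t PadBytes>
struct PaddedEndImpl {
  T value;
  char pad[PadBytes];
};

// No padding member when sizeof(T) is already a multiple
// (a zero-length array is not valid standard C++).
template <typename T>
struct PaddedEndImpl<T, 0> {
  T value;
};

template <typename T, std::size_t Alignment = kCacheLine>
using PaddedEndSketch =
    PaddedEndImpl<T, align_up(sizeof(T), Alignment) - sizeof(T)>;

// Allocates count padded elements starting at an aligned address.
// The raw malloc pointer is dropped, so the block can never be freed.
template <typename T, std::size_t Alignment = kCacheLine>
PaddedEndSketch<T, Alignment>* create_padded_array_sketch(std::size_t count) {
  using Elem = PaddedEndSketch<T, Alignment>;
  static_assert(sizeof(Elem) % Alignment == 0,
                "element size must be a multiple of the alignment");
  void* raw = std::malloc(count * sizeof(Elem) + Alignment);
  if (raw == nullptr) return nullptr;
  const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(raw);
  const std::uintptr_t start = (addr + Alignment - 1) / Alignment * Alignment;
  Elem* first = reinterpret_cast<Elem*>(start);
  for (std::size_t i = 0; i < count; ++i) {
    new (&first[i]) Elem();  // value-initialize each element in place
  }
  return first;
}

// Hypothetical stand-in for a per-thread manager object.
struct DemoManager {
  long claimed;
  char flag;
};
```
<br>
Because the first element starts on a cache-line boundary and every
element's size is a multiple of the cache line, no two elements returned
by create_padded_array_sketch<DemoManager>(n) share a line. The cost is
at most one extra cache line per array, rather than extra padding around
every element.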
<br>
<br>
Testing:
<br>
<br>
1) JPRT
<br>
<br>
2) SPECjbb2005 - 2 socket, 8 core, HT machine on JDK8-b57 + recent
HotSpot + the patch
<br>
<br>
Flags:
<br>
-showversion -Xmx29g -Xms29g -Xmn27g -XX:SurvivorRatio=60
-XX:TargetSurvivorRatio=90 -XX:ParallelGCThreads=16
-XX:AllocatePrefetchDistance=256 -XX:AllocatePrefetchLines=4
-XX:LoopUnrollLimit=45 -XX:InitialTenuringThreshold=12
-XX:MaxTenuringThreshold=15 -XX:InlineSmallCode=4300
-XX:MaxInlineSize=270 -XX:FreqInlineSize=2700 -XX:+AggressiveOpts
-XX:+UseParallelOldGC -XX:-UseAdaptiveSizePolicy -XX:+PrintGC
<br>
<br>
Young GC times without cache-line-aligned PSPromotionManager (ms):
<br>
36.1608
<br>
36.0164
<br>
36.3001
<br>
36.0763
<br>
36.2086
<br>
35.8151
<br>
<br>
Young GC times with cache-line-aligned PSPromotionManager (ms):
<br>
26.2168
<br>
26.9931
<br>
27.3672
<br>
26.5155
<br>
26.0182
<br>
26.8202
<br>
<br>
Extra thanks go to Claes Redestad for helping out with
performance analysis and implementation-detail discussions.
<br>
<br>
thanks,
<br>
StefanK
<br>
</blockquote>
<br>
</body>
</html>