[9] RFR(M): 8029799: vm/mlvm/anonloader/stress/oome prints warning: CodeHeap: # of free blocks > 10000
Albert Noll
albert.noll at oracle.com
Thu Feb 6 08:45:05 PST 2014
My previous mail contains an error. The size of a HeapBlock must be a multiple of CodeCacheSegmentSize and at least CodeCacheSegmentSize * CodeCacheMinBlockLength.
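For illustration, here is a minimal sketch of that sizing rule (the helper name and its parameters are made up for this example, not actual CodeHeap code):

  // Round a request up to a multiple of CodeCacheSegmentSize, but never
  // below CodeCacheSegmentSize * CodeCacheMinBlockLength.
  static size_t heap_block_size(size_t request_bytes,
                                size_t segment_size,        // CodeCacheSegmentSize
                                size_t min_block_length) {  // CodeCacheMinBlockLength
    size_t rounded = ((request_bytes + segment_size - 1) / segment_size) * segment_size;
    size_t minimum = segment_size * min_block_length;
    return rounded < minimum ? minimum : rounded;
  }
  // e.g. heap_block_size(100, 64, 4) == 256; heap_block_size(600, 64, 4) == 640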
Albert
Sent from my iPhone
> Am 06.02.2014 um 17:32 schrieb Albert <albert.noll at oracle.com>:
>
> Hi,
>
> I have done more experiments to see the impact of CodeCacheMinBlockLength and CodeCacheSegmentSize.
> Both factors have an impact on the length of the freelist as well as on the memory that is possibly wasted.
>
> The table below contains detailed results. Here is a description of the numbers and how they are
> calculated:
>
> * freelist length: number of HeapBlocks that are in the freelist when the program finishes.
> * freelist[kB]: total memory [kB] that is in the freelist when the program finishes.
> * unused bytes in cb: unused bytes in all CodeBlobs that are in the code cache when the program
>   finishes. This number is calculated by subtracting the nmethod size from the size of the
>   HeapBlock in which the nmethod is stored. Note that the HeapBlock size is a multiple of
>   CodeCacheMinBlockLength * CodeCacheSegmentSize.
> * segmap[kB]: size of the segment map that is used to map addresses to HeapBlocks (i.e., to find
>   the beginning of an nmethod). Increasing CodeCacheSegmentSize decreases the segmap size.
>   For example, a CodeCacheSegmentSize of 32 bytes requires 32 kB of segmap memory per
>   allocated MB in the code cache; a CodeCacheSegmentSize of 64 bytes requires 16 kB of
>   segmap memory per allocated MB in the code cache.
> * max_used: maximum allocated memory in the code cache.
> * wasted_memory: = SUM(freelist + unused bytes in cb + segmap)
> * memory overhead: = wasted_memory / max_used
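>
> (Worked example, using the first row of the failing test case table below: wasted_memory =
> 2299 + 902 + 274 = 3475 kB and memory overhead = 3475 / 16436 = 21.14%. The segmap factor
> follows from the map holding one one-byte entry per segment: with 32-byte segments,
> 1 MB / 32 B = 32768 entries, i.e. 32 kB of segmap per allocated MB; with 64-byte segments
> it is 16 kB.)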
>
> The executive summary of the results is that increasing CodeCacheSegmentSize has no negative
> impact on the memory overhead (and no positive impact either). Increasing CodeCacheSegmentSize
> reduces the freelist length, which makes searching the freelist faster.
>
> Note that the results were obtained with a modified freelist search algorithm. In the changed version,
> the compiler chooses the first block from the freelist that is large enough (first-fit). In the old version,
> the compiler looked for the smallest block in the freelist into which the code fits (best-fit).
> My experiments indicate that best-fit does not provide better results (less memory overhead) than
> first-fit.
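>
> (Below is a minimal first-fit sketch for reference; the struct and function names are
> simplified assumptions for illustration, not the actual CodeHeap::search_freelist() code.)
>
>   #include <cstddef>
>
>   struct FreeBlock { size_t length; FreeBlock* next; };   // length in segments
>
>   // First-fit: take the first block on the freelist that is large enough.
>   static FreeBlock* find_first_fit(FreeBlock* head, size_t needed_segments) {
>     for (FreeBlock* b = head; b != NULL; b = b->next) {
>       if (b->length >= needed_segments) {
>         return b;                // stop searching at the first block that fits
>       }
>     }
>     return NULL;                 // no fit: allocate fresh space from the code cache instead
>   }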
>
> To summarize, switching to a larger CodeCacheSegmentSize seems reasonable.
>
>
> Here are the detailed results:
>
> failing test case
>
> 4 Blocks, 64 bytes
>
>   freelist length   freelist[kB]   unused bytes in cb   segmap[kB]   max_used   wasted memory   memory overhead
>   3085              2299           902                  274          16436      3475            21.14%
>   3993              3366           887                  283          16959      4536            26.75%
>   3843              2204           900                  273          16377      3377            20.62%
>   3859              2260           898                  273          16382      3431            20.94%
>   3860              2250           897                  273          16385      3420            20.87%
>   Average memory overhead: 22.07%
>
> 4 Blocks, 128 bytes
>
>   freelist length   freelist[kB]   unused bytes in cb   segmap[kB]   max_used   wasted memory   memory overhead
>   474               1020           2073                 137          17451      3230            18.51%
>   504               1192           2064                 136          17413      3392            19.48%
>   484               1188           2064                 126          17414      3378            19.40%
>   438               1029           2061                 136          17399      3226            18.54%
>   Average memory overhead: 18.98%
>
>
> Nashorn
>
> 4 Blocks, 64 bytes
>
>   freelist length   freelist[kB]   unused bytes in cb   segmap[kB]   max_used   wasted memory   memory overhead
>   709               1190           662                  1198         76118      3050            4.01%
>   688               4200           635                  1234         78448      6069            7.74%
>   707               2617           648                  1178         74343      4443            5.98%
>   685               1703           660                  1205         76903      3568            4.64%
>   760               1638           675                  1174         74563      3487            4.68%
>   Average memory overhead: 5.41%
>
> 4 Blocks, 128 bytes
>
>   freelist length   freelist[kB]   unused bytes in cb   segmap[kB]   max_used   wasted memory   memory overhead
>   206               824            1253                 607          77469      2684            3.46%
>   247               2019           1265                 583          74017      3867            5.22%
>   239               958            1230                 641          81588      2829            3.47%
>   226               1477           1246                 595          76119      3318            4.36%
>   225               2390           1239                 596          76051      4225            5.56%
>   Average memory overhead: 4.41%
>
>
> compiler.compiler
>
> 4 Blocks, 64 bytes
>
>   freelist length   freelist[kB]   unused bytes in cb   segmap[kB]   max_used   wasted memory   memory overhead
>   440               943            263                  298          18133      1504            8.29%
>   458               480            272                  295          18443      1047            5.68%
>   536               1278           260                  306          18776      1844            9.82%
>   426               684            268                  304          18789      1256            6.68%
>   503               1430           258                  310          18872      1998            10.59%
>   Average memory overhead: 8.21%
>
> 4 Blocks, 128 bytes
>
>   freelist length   freelist[kB]   unused bytes in cb   segmap[kB]   max_used   wasted memory   memory overhead
>   163               984            510                  157          19233      1651            8.58%
>   132               729            492                  151          18614      1372            7.37%
>   187               1212           498                  152          18630      1862            9.99%
>   198               1268           496                  155          18974      1919            10.11%
>   225               1268           496                  152          18679      1916            10.26%
>   Average memory overhead: 9.26%
>
>> On 02/05/2014 07:57 PM, Vladimir Kozlov wrote:
>>> On 2/5/14 8:28 AM, Albert wrote:
>>> Hi Vladimir,
>>>
>>> thanks for looking at this. I've done the proposed measurements. The
>>> code which I used to
>>> get the data is included in the following webrev:
>>>
>>> http://cr.openjdk.java.net/~anoll/8029799/webrev.01/
>>
>> Good.
>>
>>>
>>> I think some people might be interested in getting that data, so we
>>> might want to keep
>>> that additional output. The exact output format can be changed later
>>> (JDK-8005885).
>>
>> I agree that it is useful information.
>>
>>>
>>> Here are the results:
>>>
>>> - failing test case:
>>> - original: allocated in freelist: 2168kB, unused bytes in CodeBlob:
>>> 818kB, max_used: 21983kB
>>> - patch : allocated in freelist: 1123kB, unused bytes in CodeBlob:
>>> 2188kB, max_used: 17572kB
>>> - nashorn:
>>> - original : allocated in freelist: 2426kB, unused bytes in CodeBlob:
>>> 1769kB, max_used: 201886kB
>>> - patch : allocated in freelist: 1150kB, unused bytes in CodeBlob:
>>> 3458kB, max_used: 202394kB
>>> - SPECJVM2008: compiler.compiler:
>>> - original : allocated in freelist: 168kB, unused bytes in
>>> CodeBlob: 342kB, max_used: 19837kB
>>> - patch : allocated in freelist: 873kB, unused bytes in
>>> CodeBlob: 671kB, max_used: 21184kB
>>>
>>> The minimum size that can be allocated from the code cache is
>>> platform-dependent.
>>> I.e., the minimum size depends on CodeCacheSegmentSize and
>>> CodeCacheMinBlockLength.
>>> On x86, for example, the min. allocatable size from the code cache is
>>> 64*4 = 256 bytes.
>>
>> There is this comment in CodeHeap::search_freelist():
>> // Don't leave anything on the freelist smaller than CodeCacheMinBlockLength.
>>
>> What happens if we scale down CodeCacheMinBlockLength when we increase CodeCacheSegmentSize to keep the same byte size of the minimum block?:
>>
>> + FLAG_SET_DEFAULT(CodeCacheSegmentSize, CodeCacheSegmentSize * 2);
>> + FLAG_SET_DEFAULT(CodeCacheMinBlockLength, CodeCacheMinBlockLength/2);
>>
>> Based on your table below, those small nmethods will use only 256-byte blocks instead of 512 (128*4).
>>
>> Note for C1 in Client VM CodeCacheMinBlockLength is 1. I don't know why for C2 it is 4. Could you also try CodeCacheMinBlockLength = 1?
>>
>> All above is with CodeCacheSegmentSize 128 bytes.
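>>
>> (Worked out, the minimum block is CodeCacheSegmentSize * CodeCacheMinBlockLength:
>> 64*4 = 256 bytes today, 128*4 = 512 bytes with the doubling alone, 128*2 = 256 bytes with the
>> scaled-down CodeCacheMinBlockLength, and 128*1 = 128 bytes with CodeCacheMinBlockLength = 1.)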
>>
>>> The size of adapters ranges from 400b to 600b.
>>> Here is the beginning of the nmethod size distribution of the failing
>>> test case:
>>
>> Is it possible that it is in number of segments and not in bytes? If it really is bytes, what do such (32-48 byte) nmethods look like?
>>
>> Thanks,
>> Vladimir
>>
>>>
>>> nmethod size distribution (non-zombie java)
>>> -------------------------------------------------
>>> 0-16 bytes 0
>>> 16-32 bytes 0
>>> 32-48 bytes 45
>>> 48-64 bytes 0
>>> 64-80 bytes 41
>>> 80-96 bytes 0
>>> 96-112 bytes 6247
>>> 112-128 bytes 0
>>> 128-144 bytes 249
>>> 144-160 bytes 0
>>> 160-176 bytes 139
>>> 176-192 bytes 0
>>> 192-208 bytes 177
>>> 208-224 bytes 0
>>> 224-240 bytes 180
>>> 240-256 bytes 0
>>> ...
>>>
>>>
>>> I do not see a problem with increasing the CodeCacheSegmentSize if tiered
>>> compilation is enabled.
>>>
>>> What do you think?
>>>
>>>
>>> Best,
>>> Albert
>>>
>>>
>>>> On 02/04/2014 05:52 PM, Vladimir Kozlov wrote:
>>>> I think the suggestion is reasonable since we increase CodeCache *5
>>>> for Tiered.
>>>> Albert, is it possible to collect data on how much space is wasted in %
>>>> before and after this change: free space in which we can't allocate +
>>>> unused bytes at the end of nmethods/adapters? Can we squeeze an
>>>> adapter into 64 bytes?
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>>> On 2/4/14 7:41 AM, Albert wrote:
>>>>> Hi,
>>>>>
>>>>> could I get reviews for this patch (nightly failure)?
>>>>>
>>>>> webrev: http://cr.openjdk.java.net/~anoll/8029799/webrev.00/
>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8029799
>>>>>
>>>>> problem:  The freelist of the code cache exceeds 10'000 items, which results in a VM warning.
>>>>>           The problem behind the warning is that the freelist is populated by a large number
>>>>>           of small free blocks. For example, in the failing test case (see header), the freelist
>>>>>           grows to more than 3500 items, where the largest item on the list is 9 segments (one
>>>>>           segment is 64 bytes). That experiment was done on my laptop. Such a large freelist
>>>>>           can indeed be a performance problem, since we use a linear search to traverse the
>>>>>           freelist.
>>>>> solution: One way to solve the problem is to increase the minimum allocation size in the code
>>>>>           cache. This can be done by two means: we can increase CodeCacheMinBlockLength
>>>>>           and/or CodeCacheSegmentSize. This patch follows the latter approach, since increasing
>>>>>           CodeCacheSegmentSize decreases the size that is required by the segment map. More
>>>>>           concretely, the patch doubles CodeCacheSegmentSize from 64 bytes to 128 bytes if
>>>>>           tiered compilation is enabled. The patch also contains an optimization of the freelist
>>>>>           search (stop searching once a block of the appropriate size is found) and some code
>>>>>           cleanups.
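>>>>>
>>>>>           (For illustration only -- a rough sketch of what the tiered-specific doubling could
>>>>>           look like in the VM argument setup, reusing the FLAG_SET_DEFAULT style quoted
>>>>>           above; this is not necessarily the exact code in the webrev:)
>>>>>
>>>>>             if (TieredCompilation && FLAG_IS_DEFAULT(CodeCacheSegmentSize)) {
>>>>>               // Larger segments halve the segment map and shorten the freelist,
>>>>>               // at the cost of a few more unused bytes per nmethod.
>>>>>               FLAG_SET_DEFAULT(CodeCacheSegmentSize, CodeCacheSegmentSize * 2);  // 64 -> 128 bytes
>>>>>             }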
>>>>> testing:  With the proposed change, the size of the freelist is reduced to 200 items. There is
>>>>>           only a slight increase in the memory required by the code cache, by at most 3% (all
>>>>>           data measured for the failing test case on a Linux 64-bit system, 4 cores).
>>>>>           To summarize, increasing the minimum allocation size in the code cache results in
>>>>>           potentially more unused memory in the code cache due to unused bytes at the end of
>>>>>           an nmethod. The advantage is that we potentially have less fragmentation.
>>>>>
>>>>> proposal: I think we could remove CodeCacheMinBlockLength without loss of generality or
>>>>>           usability and instead adapt the parameter CodeCacheSegmentSize at VM startup.
>>>>>           Any opinions?
>>>>>
>>>>> Many thanks in advance,
>>>>> Albert
>