[9] RFR(M): 8029799: vm/mlvm/anonloader/stress/oome prints warning: CodeHeap: # of free blocks > 10000

Albert albert.noll at oracle.com
Thu Feb 6 08:32:07 PST 2014


Hi,

I have done more experiments to see the impact of
CodeCacheMinBlockLength and CodeCacheSegmentSize.
Both factors affect the length of the freelist as well as
the amount of memory that is potentially wasted.

The table below contains detailed results. Here is a description of the
numbers and how they are calculated:

* freelist length: number of HeapBlocks that are in the freelist when
  the program finishes.
* freelist[kB]: total memory [kB] that is in the freelist when the
  program finishes.
* unused bytes in cb: unused bytes in all CodeBlobs that are in the code
  cache when the program finishes. This number is calculated by
  subtracting the nmethod size from the size of the HeapBlock in
  which the nmethod is stored. Note that the HeapBlock size is
  a multiple of CodeCacheMinBlockLength * CodeCacheSegmentSize.
* segmap[kB]: size of the segment map that is used to map addresses to
  HeapBlocks (i.e., to find the beginning of an nmethod). Increasing
  CodeCacheSegmentSize decreases the segmap size. For example, a
  CodeCacheSegmentSize of 32 bytes requires 32kB of segmap memory per
  allocated MB in the code cache; a CodeCacheSegmentSize of 64 bytes
  requires 16kB of segmap memory per allocated MB in the code cache.
* max_used: maximum allocated memory in the code cache.
* wasted_memory: freelist[kB] + unused bytes in cb + segmap[kB]
* memory overhead: wasted_memory / max_used

The executive summary of the results is that increasing
CodeCacheSegmentSize has no negative impact on the memory
overhead (and no positive one either). However, increasing
CodeCacheSegmentSize reduces the freelist length, which makes
searching the freelist faster.

Note that the results were obtained with a modified freelist search
algorithm. In the changed version,
the compiler chooses the first block in the freelist that is large
enough (first-fit). In the old version,
the compiler looked for the smallest block in the freelist into
which the code fits (best-fit).
My experiments indicate that best-fit does not provide better results
(less memory overhead) than
first-fit.

To summarize, switching to a larger CodeCacheSegmentSize seems reasonable.


Here are the detailed results:


failing test case, 4 Blocks, 64 bytes:

freelist length   freelist[kB]   unused bytes in cb   segmap[kB]   max_used   wasted   memory overhead
3085              2299           902                  274          16436      3475     21.14%
3993              3366           887                  283          16959      4536     26.75%
3843              2204           900                  273          16377      3377     20.62%
3859              2260           898                  273          16382      3431     20.94%
3860              2250           897                  273          16385      3420     20.87%
                                                                   average:            22.07%

failing test case, 4 Blocks, 128 bytes:

freelist length   freelist[kB]   unused bytes in cb   segmap[kB]   max_used   wasted   memory overhead
474               1020           2073                 137          17451      3230     18.51%
504               1192           2064                 136          17413      3392     19.48%
484               1188           2064                 126          17414      3378     19.40%
438               1029           2061                 136          17399      3226     18.54%
                                                                   average:            18.98%

Nashorn, 4 Blocks, 64 bytes:

freelist length   freelist[kB]   unused bytes in cb   segmap[kB]   max_used   wasted   memory overhead
709               1190           662                  1198         76118      3050     4.01%
688               4200           635                  1234         78448      6069     7.74%
707               2617           648                  1178         74343      4443     5.98%
685               1703           660                  1205         76903      3568     4.64%
760               1638           675                  1174         74563      3487     4.68%
                                                                   average:            5.41%

Nashorn, 4 Blocks, 128 bytes:

freelist length   freelist[kB]   unused bytes in cb   segmap[kB]   max_used   wasted   memory overhead
206               824            1253                 607          77469      2684     3.46%
247               2019           1265                 583          74017      3867     5.22%
239               958            1230                 641          81588      2829     3.47%
226               1477           1246                 595          76119      3318     4.36%
225               2390           1239                 596          76051      4225     5.56%
                                                                   average:            4.41%

compiler.compiler, 4 Blocks, 64 bytes:

freelist length   freelist[kB]   unused bytes in cb   segmap[kB]   max_used   wasted   memory overhead
440               943            263                  298          18133      1504     8.29%
458               480            272                  295          18443      1047     5.68%
536               1278           260                  306          18776      1844     9.82%
426               684            268                  304          18789      1256     6.68%
503               1430           258                  310          18872      1998     10.59%
                                                                   average:            8.21%

compiler.compiler, 4 Blocks, 128 bytes:

freelist length   freelist[kB]   unused bytes in cb   segmap[kB]   max_used   wasted   memory overhead
163               984            510                  157          19233      1651     8.58%
132               729            492                  151          18614      1372     7.37%
187               1212           498                  152          18630      1862     9.99%
198               1268           496                  155          18974      1919     10.11%
225               1268           496                  152          18679      1916     10.26%
                                                                   average:            9.26%


On 02/05/2014 07:57 PM, Vladimir Kozlov wrote:
> On 2/5/14 8:28 AM, Albert wrote:
>> Hi Vladimir,
>>
>> thanks for looking at this. I've done the proposed measurements. The
>> code which I used to
>> get the data is included in the following webrev:
>>
>> http://cr.openjdk.java.net/~anoll/8029799/webrev.01/
>
> Good.
>
>>
>> I think some people might be interested in getting that data, so we
>> might want to keep
>> that additional output. The exact output format can be changed later
>> (JDK-8005885).
>
> I agree that it is useful information.
>
>>
>> Here are the results:
>>
>> - failing test case:
>>     - original: allocated in freelist: 2168kB, unused bytes in CodeBlob:
>> 818kB,   max_used: 21983kB
>>     - patch   : allocated in freelist: 1123kB, unused bytes in CodeBlob:
>> 2188kB, max_used: 17572kB
>> - nashorn:
>>    - original : allocated in freelist: 2426kB, unused bytes in CodeBlob:
>> 1769kB, max_used: 201886kB
>>    - patch    : allocated in freelist: 1150kB, unused bytes in CodeBlob:
>> 3458kB, max_used: 202394kB
>> - SPECJVM2008: compiler.compiler:
>>    - original  : allocated in freelist:  168kB, unused bytes in
>> CodeBlob: 342kB, max_used: 19837kB
>>    - patch     : allocated in freelist:  873kB, unused bytes in
>> CodeBlob: 671kB, max_used: 21184kB
>>
>> The minimum size that can be allocated from the code cache is
>> platform-dependent.
>> I.e., the minimum size depends on CodeCacheSegmentSize and
>> CodeCacheMinBlockLength.
>> On x86, for example, the min. allocatable size from the code cache is
>> 64*4=256bytes.
>
> There is this comment in CodeHeap::search_freelist():
>   // Don't leave anything on the freelist smaller than 
> CodeCacheMinBlockLength.
>
> What happens if we scale down CodeCacheMinBlockLength when we increase 
> CodeCacheSegmentSize to keep the same bytes size of minimum block?:
>
> +     FLAG_SET_DEFAULT(CodeCacheSegmentSize, CodeCacheSegmentSize * 2);
> +     FLAG_SET_DEFAULT(CodeCacheMinBlockLength, 
> CodeCacheMinBlockLength/2);
>
> Based on your table below those small nmethods will use only 256 bytes 
> blocks instead of 512 (128*4).
>
> Note for C1 in Client VM CodeCacheMinBlockLength is 1. I don't know 
> why for C2 it is 4. Could you also try CodeCacheMinBlockLength = 1?
>
> All above is with CodeCacheSegmentSize 128 bytes.
>
>> The size of adapters ranges from 400b to 600b.
>> Here is the beginning of the nmethod size distribution of the failing
>> test case:
>>
>
> Is it possible that it is in number of segments and not in bytes? If it
> is really bytes, what do such (32-48 byte) nmethods look like?
>
> Thanks,
> Vladimir
>
>>
>> nmethod size distribution (non-zombie java)
>> -------------------------------------------------
>> 0-16 bytes                                0[bytes]
>> 16-32 bytes                                0
>> 32-48 bytes                                45
>> 48-64 bytes                                0
>> 64-80 bytes                                41
>> 80-96 bytes                                0
>> 96-112 bytes                               6247
>> 112-128 bytes                               0
>> 128-144 bytes                               249
>> 144-160 bytes                               0
>> 160-176 bytes                               139
>> 176-192 bytes                               0
>> 192-208 bytes                               177
>> 208-224 bytes                               0
>> 224-240 bytes                               180
>> 240-256 bytes                               0
>> ...
>>
>>
>> I do not see a problem for increasing the CodeCacheSegmentSize if tiered
>> compilation
>> is enabled.
>>
>> What do you think?
>>
>>
>> Best,
>> Albert
>>
>>
>> On 02/04/2014 05:52 PM, Vladimir Kozlov wrote:
>>> I think the suggestion is reasonable since we increase CodeCache *5
>>> for Tiered.
>>> Albert, is it possible to collect data how much space is wasted in %
>>> before and after this change: free space in which we can't allocate +
>>> unused bytes at the end of nmethods/adapters? Can we squeeze an
>>> adapter into 64 bytes?
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 2/4/14 7:41 AM, Albert wrote:
>>>> Hi,
>>>>
>>>> could I get reviews for this patch (nightly failure)?
>>>>
>>>> webrev: http://cr.openjdk.java.net/~anoll/8029799/webrev.00/
>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8029799
>>>>
>>>> problem: The freelist of the code cache exceeds 10'000 items, which
>>>> results in a VM warning.
>>>>                  The problem behind the warning is that the freelist
>>>> is populated by a large number
>>>> of small free blocks. For example, in the failing test
>>>> case (see header), the freelist grows
>>>>                  up to more than 3500 items where the largest item on
>>>> the list is 9 segments (one segment
>>>>                  is 64 bytes). That experiment was done on my laptop.
>>>> Such a large freelist can indeed be
>>>>                  a performance problem, since we use a linear search
>>>> to traverse the freelist.
>>>> solution:  One way to solve the problem is to increase the minimal
>>>> allocation size in the code cache.
>>>>                  This can be done by two means: we can increase
>>>> CodeCacheMinBlockLength and/or
>>>>                  CodeCacheSegmentSize. This patch follows the latter
>>>> approach, since increasing
>>>>                  CodeCacheSegmentSize decreases the size that is
>>>> required by the segment map. More
>>>>                  concretely, the patch doubles the
>>>> CodeCacheSegmentSize from 64 byte to 128 bytes
>>>>                  if tiered compilation is enabled.
>>>>                  The patch also contains an optimization in the
>>>> freelist search (stop searching if we found
>>>>                  the appropriate size) and contains some code 
>>>> cleanups.
>>>> testing:    With the proposed change, the size of the freelist is
>>>> reduced to 200 items. There is only
>>>>                  a slight increase in memory required by code cache
>>>> by at most 3% (all data measured
>>>>                  for the failing test case on a Linux 64-bit system,
>>>> 4 cores).
>>>>                  To summarize, increasing the minimum allocation size
>>>> in the code cache results in
>>>>                  potentially more unused memory in the code cache due
>>>> to unused bytes at the end of
>>>>                  an nmethod. The advantage is that we potentially
>>>> have less fragmentation.
>>>>
>>>> proposal: - I think we could remove CodeCacheMinBlockLength without
>>>> loss of generality or usability
>>>>                    and instead adapt the parameter
>>>> CodeCacheSegmentSize at Vm startup.
>>>>                    Any opinions?
>>>>
>>>> Many thanks in advance,
>>>> Albert
>>>>
>>
