[9] RFR(M): 8029799: vm/mlvm/anonloader/stress/oome prints warning: CodeHeap: # of free blocks > 10000

Christian Thalinger christian.thalinger at oracle.com
Sat Feb 8 10:24:05 PST 2014


On Feb 7, 2014, at 11:34 AM, Albert <albert.noll at oracle.com> wrote:

> Hi Vladimir,
> 
> thanks for the feedback. Please see comments inline:
> 
> On 02/07/2014 07:49 PM, Vladimir Kozlov wrote:
>> Albert,
>> 
>> Yes, please file an RFE to rework this code after the segmented code cache is integrated. I agree that we can set sizes per segment.
>> 
> Ok, I will do that.
>> In the new output you are mixing %dkB and %dK. Choose one.
>> 
> Done.

Almost:

+   tty->print_cr("Allocated in freelist:          %dkB", bytes_allocated_in_freelist()/K);
+   tty->print_cr("Unused bytes in CodeBlobs:      %dKB",  wasted_bytes/K);
+   tty->print_cr("Segment map size:               %dKB",  allocated_segments()/K); // 1 byte per segment

Use kB.

>> The next comment is misleading. It reads as if the freelist is ordered by size, but it is ordered by address, so it needs to say that:
>> 
>>   // Since the freelist is ordered (smaller->larger) and the element we want to insert
>>   // into the freelist is smaller than the first element, we can simply add 'b' as the
>>   // first element and we are done.
>> 
> Done.
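Since the corrected comment is about address order, here is a minimal sketch of inserting a free block into a freelist kept sorted by address (lower to higher). The FreeBlock struct and insert_by_address function are hypothetical simplifications, not the HotSpot CodeHeap code; in particular, this sketch does not merge adjacent free blocks.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical free-block header; HotSpot's real FreeBlock differs.
struct FreeBlock {
  std::size_t segments;  // length of this free block, in segments
  FreeBlock* next;       // next free block, at a higher address
};

// Inserts 'b' into a freelist sorted by address and returns the new head.
FreeBlock* insert_by_address(FreeBlock* head, FreeBlock* b) {
  std::less<FreeBlock*> lower;  // total order on pointers
  if (head == nullptr || lower(b, head)) {
    // 'b' lies below the first element, so it simply becomes the new head.
    b->next = head;
    return b;
  }
  // Otherwise walk forward until the next block lies above 'b'.
  FreeBlock* prev = head;
  while (prev->next != nullptr && lower(prev->next, b)) {
    prev = prev->next;
  }
  b->next = prev->next;
  prev->next = b;
  return head;
}
```

The fast path at the top is the case the comment describes; only blocks at higher addresses require a traversal.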
>> I am not sure about the changes in search_freelist. You may reduce the opportunity to find a block in the freelist for huge methods. Can you not do that now? You reduced the size of the table, so searching should not be a big problem now.
>> 
> It is not really clear which strategy is best. With the current approach, we end up with small items at the beginning of the freelist
> and larger items towards the end. I ran experiments with the failing test case and there was no difference in the freelist length (best fit vs. first fit).
> Especially with tiered compilation, we should have smaller items at the beginning and larger items at the end, since we compile C1 methods first.
> I can leave it as it is, do more experiments, or change it back to how it was. I would leave it as it is, but I have no problem with changing it back.
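To make the first-fit vs. best-fit question concrete, here is a simplified model that treats the freelist as a vector of block lengths (in segments) kept in address order. Both functions are hypothetical illustrations, not CodeHeap::search_freelist; best_fit stops early on an exact match, which is one reading of the "stop searching if we found the appropriate size" optimization.

```cpp
#include <cstddef>
#include <vector>

// Returns the index of the first block that is large enough, or -1.
int first_fit(const std::vector<std::size_t>& free_lengths, std::size_t request) {
  for (std::size_t i = 0; i < free_lengths.size(); ++i) {
    if (free_lengths[i] >= request) return static_cast<int>(i);
  }
  return -1;
}

// Returns the index of the smallest block that is large enough, or -1.
// An exact fit cannot be beaten, so the search stops there.
int best_fit(const std::vector<std::size_t>& free_lengths, std::size_t request) {
  int best = -1;
  for (std::size_t i = 0; i < free_lengths.size(); ++i) {
    std::size_t len = free_lengths[i];
    if (len < request) continue;
    if (len == request) return static_cast<int>(i);  // exact fit: stop early
    if (best == -1 || len < free_lengths[static_cast<std::size_t>(best)]) {
      best = static_cast<int>(i);
    }
  }
  return best;
}
```

With lengths {2, 9, 4, 3} and a 3-segment request, first fit takes the 9-segment block (index 1) and leaves a 6-segment remainder, while best fit takes the exact 3-segment block (index 3). Best fit keeps large blocks intact for large methods, at the cost of scanning further when no exact fit exists.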
> 
> Here is the new webrev:
> http://cr.openjdk.java.net/~anoll/8029799/webrev.03/

src/share/vm/runtime/globals.hpp:

!   notproduct(bool, VerifyCodeCache, false,                                  \
!           "Verify code code cache on memory allocation/deallocation")       \

Typo: “code code”

!   develop(uintx, CodeCacheSegmentSize, 64 PPC64_ONLY(+64) NOT_PPC64(TIERED_ONLY(+64)),\

I wonder if CodeCacheSegmentSize should be a platform-dependent flag...

Otherwise this looks good.

> 
> Best,
> Albert
> 
>> Thanks,
>> Vladimir
>> 
>> On 2/7/14 8:06 AM, Albert wrote:
>>> Vladimir, Chris, thanks for looking at this.
>>> 
>>> The measurement results are attached to the bug.
>>> https://bugs.openjdk.java.net/browse/JDK-8029799
>>> 
>>> I've tried various settings and the current configuration seems to
>>> be good for non-tiered compilation. In the current settings, the minimum
>>> size that can be allocated from the code cache is 64 bytes for C1 and
>>> 256 bytes for C2. This is fine, since C1-compiled code is typically smaller
>>> than C2-compiled code. The tradeoff we are facing here is that smaller
>>> sizes can lead to more fragmentation (a lot of small chunks are on the
>>> freelist), however, the memory wasted at the end of a method is smaller.
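The tradeoff can be sketched by rounding a request up to whole segments and enforcing the minimum block length. This is a simplified model, not HotSpot's allocator: it ignores the block header HotSpot adds, and the Allocation struct and allocate function are hypothetical.

```cpp
#include <cstddef>

struct Allocation {
  std::size_t segments;  // segments actually reserved for the request
  std::size_t wasted;    // unused bytes at the end of the block
};

// Rounds request_bytes up to whole segments and applies the minimum
// block length (in segments), as CodeCacheMinBlockLength does.
Allocation allocate(std::size_t request_bytes, std::size_t segment_size,
                    std::size_t min_block_length) {
  std::size_t segments = (request_bytes + segment_size - 1) / segment_size;
  if (segments < min_block_length) segments = min_block_length;
  return {segments, segments * segment_size - request_bytes};
}
```

For a 112-byte request with a minimum block length of 4, the block is 256 bytes with 64-byte segments (144 bytes unused) but 512 bytes with 128-byte segments (400 bytes unused). Larger segments waste more at the tail but leave fewer, larger free fragments.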
>>> 
>>> When tiered compilation is enabled, we have C1 and C2 methods stored
>>> in the same code heap, so we have to decide on one minimum allocatable
>>> size.
>>> The current implementation chooses the C2 setting. Since we compile C1
>>> methods when the application starts, we allocate small memory units that
>>> might be too small to fit a C2 version of the method. This is why the
>>> freelist can grow to > 10,000 items.
>>> 
>>> The current patch increases the minimum allocatable size to 512 bytes, but
>>> ONLY when tiered compilation is enabled. This leads to less memory
>>> overhead, since more of the memory units that are initially used for C1
>>> methods can later be reused by C2 methods.
>>> 
>>> I think that this problem can be solved by the segmented code cache that we
>>> plan to integrate into 9. If we have multiple code heaps, individual code
>>> heaps can use different values for CodeCacheSegmentSize and
>>> CodeCacheMinBlockLength. I think we should look at this, since a memory
>>> overhead of > 20%, as in the failing test case, seems unreasonably large.
>>> Shall I file an RFE that suggests looking into this once the segmented
>>> code cache patch is integrated?
>>> 
>>> For now, I think, there is not much more we can do.
>>> 
>>> Concerning the small method sizes:
>>> The size that is provided by PrintCodeCache2 is the instruction size
>>> (nm->insts_size) and not the size of the nmethod. I changed that in this
>>> patch, since the output label ("nmethod size distribution") suggests
>>> otherwise.
>>> 
>>> Here is the link to the webrev:
>>> http://cr.openjdk.java.net/~anoll/8029799/webrev.02/
>>> 
>>> Best,
>>> Albert
>>> 
>>> 
>>> On 02/06/2014 10:29 PM, Christian Thalinger wrote:
>>>> On Feb 5, 2014, at 10:57 AM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
>>>> 
>>>>> On 2/5/14 8:28 AM, Albert wrote:
>>>>>> Hi Vladimir,
>>>>>> 
>>>>>> thanks for looking at this. I've done the proposed measurements. The
>>>>>> code I used to get the data is included in the following webrev:
>>>>>> 
>>>>>> http://cr.openjdk.java.net/~anoll/8029799/webrev.01/
>>>>> Good.
>>>>> 
>>>>>> I think some people might be interested in getting that data, so we
>>>>>> might want to keep that additional output. The exact output format can
>>>>>> be changed later (JDK-8005885).
>>>>> I agree that it is useful information.
>>>>> 
>>>>>> Here are the results:
>>>>>> 
>>>>>> - failing test case:
>>>>>>   - original: allocated in freelist: 2168kB, unused bytes in CodeBlob:  818kB, max_used:  21983kB
>>>>>>   - patch   : allocated in freelist: 1123kB, unused bytes in CodeBlob: 2188kB, max_used:  17572kB
>>>>>> - nashorn:
>>>>>>   - original: allocated in freelist: 2426kB, unused bytes in CodeBlob: 1769kB, max_used: 201886kB
>>>>>>   - patch   : allocated in freelist: 1150kB, unused bytes in CodeBlob: 3458kB, max_used: 202394kB
>>>>>> - SPECjvm2008 compiler.compiler:
>>>>>>   - original: allocated in freelist:  168kB, unused bytes in CodeBlob:  342kB, max_used:  19837kB
>>>>>>   - patch   : allocated in freelist:  873kB, unused bytes in CodeBlob:  671kB, max_used:  21184kB
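One way to turn these numbers into the percentage Vladimir asks for further down the thread is (allocated in freelist + unused bytes in CodeBlob) / max_used. The sketch below computes that ratio from the values above. It assumes every freelist byte is wasted, which overstates the waste because free blocks can still satisfy later allocations; the Measurement struct and waste_percent function are illustrative only.

```cpp
#include <string>
#include <vector>

// One row of the measurements quoted above; all values are in kB.
struct Measurement {
  std::string name;
  double freelist_kb;  // "allocated in freelist"
  double unused_kb;    // "unused bytes in CodeBlob"
  double max_used_kb;  // "max_used"
};

// Assumption: counts every freelist byte as wasted, so this is an upper bound.
double waste_percent(const Measurement& m) {
  return 100.0 * (m.freelist_kb + m.unused_kb) / m.max_used_kb;
}

const std::vector<Measurement> kResults = {
    {"failing test case, original", 2168.0, 818.0, 21983.0},
    {"failing test case, patch", 1123.0, 2188.0, 17572.0},
    {"nashorn, original", 2426.0, 1769.0, 201886.0},
    {"nashorn, patch", 1150.0, 3458.0, 202394.0},
    {"compiler.compiler, original", 168.0, 342.0, 19837.0},
    {"compiler.compiler, patch", 873.0, 671.0, 21184.0},
};
```

Under that assumption, the ratio is about 13.58% vs. 18.84% for the failing test case, 2.08% vs. 2.28% for nashorn, and 2.57% vs. 7.29% for compiler.compiler (original vs. patch).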
>>>>>> 
>>>>>> The minimum size that can be allocated from the code cache is
>>>>>> platform-dependent, i.e., it depends on CodeCacheSegmentSize and
>>>>>> CodeCacheMinBlockLength. On x86, for example, the minimum allocatable
>>>>>> size from the code cache is 64*4 = 256 bytes.
>>>>> There is this comment in CodeHeap::search_freelist():
>>>>>  // Don't leave anything on the freelist smaller than CodeCacheMinBlockLength.
>>>>> 
>>>>> What happens if we scale down CodeCacheMinBlockLength when we
>>>>> increase CodeCacheSegmentSize, so that the minimum block keeps the
>>>>> same size in bytes?
>>>>> 
>>>>> +     FLAG_SET_DEFAULT(CodeCacheSegmentSize, CodeCacheSegmentSize * 2);
>>>>> +     FLAG_SET_DEFAULT(CodeCacheMinBlockLength, CodeCacheMinBlockLength/2);
>>>>> 
>>>>> Based on your table below, those small nmethods would then use only
>>>>> 256-byte blocks instead of 512-byte blocks (128*4).
>>>>> 
>>>>> Note that for C1 in the Client VM, CodeCacheMinBlockLength is 1. I don't
>>>>> know why it is 4 for C2. Could you also try CodeCacheMinBlockLength = 1?
>>>>> 
>>>>> All of the above assumes CodeCacheSegmentSize = 128 bytes.
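A minimal sketch of the proposed adjustment, using a hypothetical CodeCacheFlags struct in place of HotSpot's flag machinery and FLAG_SET_DEFAULT: doubling the segment size while halving the minimum block length keeps the smallest allocatable block at the same number of bytes.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical stand-in for the two flags; not HotSpot's flag machinery.
struct CodeCacheFlags {
  std::size_t segment_size;      // CodeCacheSegmentSize, in bytes
  std::size_t min_block_length;  // CodeCacheMinBlockLength, in segments
};

// Smallest block the heap hands out, in bytes.
constexpr std::size_t min_block_bytes(const CodeCacheFlags& f) {
  return f.segment_size * f.min_block_length;
}

// Doubles the segment size and halves the minimum block length.
CodeCacheFlags scale_for_tiered(CodeCacheFlags f) {
  f.segment_size *= 2;
  // Clamp to one segment so a length of 1 does not become 0.
  f.min_block_length = std::max<std::size_t>(1, f.min_block_length / 2);
  return f;
}
```

Starting from 64-byte segments and a minimum block length of 4, the result is 128-byte segments with length 2, still 256 bytes. With an odd length the integer division would not preserve the byte size exactly, and a length of 1 (the C1 setting) stays 1, which doubles the minimum to 128 bytes.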
>>>>> 
>>>>>> The size of adapters ranges from 400 to 600 bytes.
>>>>>> Here is the beginning of the nmethod size distribution for the failing
>>>>>> test case:
>>>>>> 
>>>>> Is it possible that this is the number of segments and not bytes? If it
>>>>> really is bytes, what do such (32-48 byte) nmethods look like?
>>>> This is just a guess, but these methods could be method handle
>>>> trampolines. They are very small.
>>>> 
>>>>> Thanks,
>>>>> Vladimir
>>>>> 
>>>>>> nmethod size distribution (non-zombie java)
>>>>>> -------------------------------------------------
>>>>>> 0-16 bytes                                0[bytes]
>>>>>> 16-32 bytes                               0
>>>>>> 32-48 bytes                               45
>>>>>> 48-64 bytes                               0
>>>>>> 64-80 bytes                               41
>>>>>> 80-96 bytes                               0
>>>>>> 96-112 bytes                              6247
>>>>>> 112-128 bytes                             0
>>>>>> 128-144 bytes                             249
>>>>>> 144-160 bytes                             0
>>>>>> 160-176 bytes                             139
>>>>>> 176-192 bytes                             0
>>>>>> 192-208 bytes                             177
>>>>>> 208-224 bytes                             0
>>>>>> 224-240 bytes                             180
>>>>>> 240-256 bytes                             0
>>>>>> ...
>>>>>> 
>>>>>> 
>>>>>> I do not see a problem with increasing CodeCacheSegmentSize if tiered
>>>>>> compilation is enabled.
>>>>>> 
>>>>>> What do you think?
>>>>>> 
>>>>>> 
>>>>>> Best,
>>>>>> Albert
>>>>>> 
>>>>>> 
>>>>>> On 02/04/2014 05:52 PM, Vladimir Kozlov wrote:
>>>>>>> I think the suggestion is reasonable since we increase the code cache
>>>>>>> size 5x for tiered compilation.
>>>>>>> Albert, is it possible to collect data on how much space (in %) is
>>>>>>> wasted before and after this change: free space in which we can't
>>>>>>> allocate + unused bytes at the end of nmethods/adapters? Can we
>>>>>>> squeeze an adapter into 64 bytes?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Vladimir
>>>>>>> 
>>>>>>> On 2/4/14 7:41 AM, Albert wrote:
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> could I get reviews for this patch (nightly failure)?
>>>>>>>> 
>>>>>>>> webrev: http://cr.openjdk.java.net/~anoll/8029799/webrev.00/
>>>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8029799
>>>>>>>> 
>>>>>>>> problem:  The freelist of the code cache exceeds 10,000 items, which
>>>>>>>>           results in a VM warning. The problem behind the warning is
>>>>>>>>           that the freelist is populated by a large number of small
>>>>>>>>           free blocks. For example, in the failing test case (see
>>>>>>>>           header), the freelist grows to more than 3500 items, where
>>>>>>>>           the largest item on the list is 9 segments (one segment is
>>>>>>>>           64 bytes). That experiment was done on my laptop. Such a
>>>>>>>>           large freelist can indeed be a performance problem, since we
>>>>>>>>           use a linear search to traverse the freelist.
>>>>>>>> solution: One way to solve the problem is to increase the minimum
>>>>>>>>           allocation size in the code cache. This can be done in two
>>>>>>>>           ways: we can increase CodeCacheMinBlockLength and/or
>>>>>>>>           CodeCacheSegmentSize. This patch follows the latter
>>>>>>>>           approach, since increasing CodeCacheSegmentSize decreases
>>>>>>>>           the size required by the segment map. More concretely, the
>>>>>>>>           patch doubles CodeCacheSegmentSize from 64 bytes to 128
>>>>>>>>           bytes if tiered compilation is enabled.
>>>>>>>>           The patch also contains an optimization in the freelist
>>>>>>>>           search (stop searching once a block of the appropriate size
>>>>>>>>           is found) and some code cleanups.
>>>>>>>> testing:  With the proposed change, the size of the freelist is
>>>>>>>>           reduced to 200 items. There is only a slight increase, of at
>>>>>>>>           most 3%, in the memory required by the code cache (all data
>>>>>>>>           measured for the failing test case on a Linux 64-bit system,
>>>>>>>>           4 cores).
>>>>>>>>           To summarize, increasing the minimum allocation size in the
>>>>>>>>           code cache potentially results in more unused memory in the
>>>>>>>>           code cache due to unused bytes at the end of an nmethod. The
>>>>>>>>           advantage is that we potentially have less fragmentation.
>>>>>>>> 
>>>>>>>> proposal: - I think we could remove CodeCacheMinBlockLength without
>>>>>>>>             loss of generality or usability and instead adapt the
>>>>>>>>             parameter CodeCacheSegmentSize at VM startup.
>>>>>>>>             Any opinions?
>>>>>>>> 
>>>>>>>> Many thanks in advance,
>>>>>>>> Albert
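The segment-map argument in the original request can be quantified with the "1 byte per segment" relation noted in the patch's output code: the map needs one byte per segment, so doubling CodeCacheSegmentSize halves it. A sketch, where segment_map_bytes is a hypothetical helper and the 64 MB heap size is chosen only for illustration:

```cpp
#include <cstddef>

constexpr std::size_t KiB = 1024;
constexpr std::size_t MiB = 1024 * KiB;

// Assumes one byte of segment map per segment, rounding a partial
// trailing segment up to a full one.
constexpr std::size_t segment_map_bytes(std::size_t heap_bytes,
                                        std::size_t segment_size) {
  return (heap_bytes + segment_size - 1) / segment_size;
}
```

For the hypothetical 64 MB heap, the map shrinks from 1 MB with 64-byte segments to 512 kB with 128-byte segments. Raising CodeCacheMinBlockLength instead would leave the map size unchanged.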
>>> 
> 



More information about the hotspot-compiler-dev mailing list