[9] RFR(M): 8029799: vm/mlvm/anonloader/stress/oome prints warning: CodeHeap: # of free blocks > 10000
Vladimir Kozlov
vladimir.kozlov at oracle.com
Fri Feb 7 12:05:22 PST 2014
On 2/7/14 11:34 AM, Albert wrote:
> Hi Vladimir,
>
> thanks for the feedback. Please see comments inline:
>
> On 02/07/2014 07:49 PM, Vladimir Kozlov wrote:
>> Albert,
>>
>> Yes, please, file RFE to rework this code after segmented code cache
>> is integrated. I agree that we can set sizes per segment.
>>
> Ok, I will do that.
>> In new output you are mixing %dkB and %dK. Choose one.
>>
> Done.
>> Next comment is misleading. It looks like it is ordered by size but it
>> is ordered by address so it needs to say that:
>>
>> // Since the freelist is ordered (smaller->larger) and the element we want to insert
>> // into the freelist is smaller than the first element, we can simply add 'b' as the
>> // first element and we are done.
>>
> Done.
>> I am not sure about changes in search_freelist. You may reduce
>> opportunity to find block in free list for huge methods. Can you not
>> do that now? You reduced size of table so searching should not be big
>> problem now.
>>
> It is not really clear which strategy is best. With the current
> approach, we will end up having small items at the beginning of the
> freelist and larger items towards the end. I ran experiments with the
> failing test case and there was no difference in the freelist length
> (best-fit vs. first-fit).
If there is no difference then leave it as it is in your changes. We can
at least get better search performance.
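
The difference between the two search strategies can be sketched as follows. This is a minimal illustration, not the HotSpot code: the free list is modeled as a plain Python list of block lengths (in segments) kept in address order, and the function names `first_fit` and `best_fit` are hypothetical.

```python
from typing import List, Optional


def first_fit(free_lengths: List[int], wanted: int) -> Optional[int]:
    """Return the index of the first block that is large enough, or None."""
    for i, length in enumerate(free_lengths):
        if length >= wanted:
            return i  # stop at the first usable block
    return None


def best_fit(free_lengths: List[int], wanted: int) -> Optional[int]:
    """Return the index of the smallest block that is large enough, or None."""
    best = None
    for i, length in enumerate(free_lengths):
        if length < wanted:
            continue
        if length == wanted:
            return i  # exact match, no better block can exist
        if best is None or length < free_lengths[best]:
            best = i
    return best


if __name__ == "__main__":
    free = [4, 9, 6, 5, 12]  # assumed example lengths, in segments
    print("first fit:", first_fit(free, 5))
    print("best fit: ", best_fit(free, 5))
```

First-fit can return early, which is where the search-time gain comes from. Best-fit has to walk the whole list unless it hits an exact match.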
Changes are good.
Thanks,
Vladimir
> Especially with tiered, we should have smaller items in the beginning
> and larger items in the end, since we first compile C1 methods.
> I can leave it as it is, do more experiments, or change it back to how it
> was. I would leave it as it is, but I have no problem with changing it back.
>
> Here is the new webrev:
> http://cr.openjdk.java.net/~anoll/8029799/webrev.03/
>
> Best,
> Albert
>
>> Thanks,
>> Vladimir
>>
>> On 2/7/14 8:06 AM, Albert wrote:
>>> Vladimir, Chris, thanks for looking at this.
>>>
>>> The measurement results are attached to the bug.
>>> https://bugs.openjdk.java.net/browse/JDK-8029799
>>>
>>> I've tried various settings and the current configuration seems to
>>> be good for non-tiered compilation. In the current settings, the minimum
>>> size that can be allocated from the code cache is 64 bytes for C1 and
>>> 256 bytes for C2. This is fine, since C1-compiled code is typically
>>> smaller than C2-compiled code. The tradeoff we are facing here is that
>>> smaller sizes can lead to more fragmentation (a lot of small chunks are
>>> on the freelist); however, the memory wasted at the end of a method is
>>> smaller.
>>>
>>> When tiered compilation is enabled, we have C1 and C2 methods stored
>>> in the same code heap, so we have to decide on one minimum-allocatable
>>> size. The current implementation chooses the C2 setting. Since we compile
>>> C1 methods when the application starts, we allocate small memory units
>>> that might be too small to fit a C2 version of the method. This is why
>>> the freelist can grow to > 10.000 items.
>>>
>>> The current patch increases the minimum-allocatable size to 512 bytes
>>> ONLY when tiered compilation is enabled. This leads to less memory
>>> overhead, since more memory units that are initially used for C1 methods
>>> can later be used by C2 methods.
>>>
>>> I think that this problem can be solved by the segmented code cache that
>>> we plan to integrate into 9. If we have multiple code heaps, individual
>>> code heaps can use different values for CodeCacheSegmentSize and
>>> CodeCacheMinBlockLength. I think we should look at this, since a memory
>>> overhead of > 20%, as in the failing test case, seems unreasonably large.
>>> Shall I file an RFE that suggests looking into this once the segmented
>>> code cache patch is integrated?
>>>
>>> For now, I think, there is not much more we can do.
>>>
>>> Concerning the small method sizes:
>>> The size that is provided by PrintCodeCache2 is the instruction size
>>> (nm->insts_size) and not the size of the nmethod. I changed that in this
>>> patch, since the output suggests something different ("nmethod size
>>> distribution").
>>>
>>> Here is the link to the webrev:
>>> http://cr.openjdk.java.net/~anoll/8029799/webrev.02/
>>>
>>> Best,
>>> Albert
>>>
>>>
>>> On 02/06/2014 10:29 PM, Christian Thalinger wrote:
>>>> On Feb 5, 2014, at 10:57 AM, Vladimir Kozlov
>>>> <vladimir.kozlov at oracle.com> wrote:
>>>>
>>>>> On 2/5/14 8:28 AM, Albert wrote:
>>>>>> Hi Vladimir,
>>>>>>
>>>>>> thanks for looking at this. I've done the proposed measurements. The
>>>>>> code which I used to get the data is included in the following webrev:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~anoll/8029799/webrev.01/
>>>>> Good.
>>>>>
>>>>>> I think some people might be interested in getting that data, so we
>>>>>> might want to keep that additional output. The exact output format can
>>>>>> be changed later (JDK-8005885).
>>>>> I agree that it is useful information.
>>>>>
>>>>>> Here are the results:
>>>>>>
>>>>>> - failing test case:
>>>>>>   - original: allocated in freelist: 2168kB, unused bytes in CodeBlob:  818kB, max_used:  21983kB
>>>>>>   - patch:    allocated in freelist: 1123kB, unused bytes in CodeBlob: 2188kB, max_used:  17572kB
>>>>>> - nashorn:
>>>>>>   - original: allocated in freelist: 2426kB, unused bytes in CodeBlob: 1769kB, max_used: 201886kB
>>>>>>   - patch:    allocated in freelist: 1150kB, unused bytes in CodeBlob: 3458kB, max_used: 202394kB
>>>>>> - SPECJVM2008: compiler.compiler:
>>>>>>   - original: allocated in freelist:  168kB, unused bytes in CodeBlob:  342kB, max_used:  19837kB
>>>>>>   - patch:    allocated in freelist:  873kB, unused bytes in CodeBlob:  671kB, max_used:  21184kB
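
To compare these numbers as percentages, here is a small sketch. It rests on an assumption not stated in the thread: that "wasted" space is the sum of the freelist bytes and the unused CodeBlob bytes, taken relative to max_used. The function name `overhead_percent` is hypothetical; the kB figures are copied from the results above.

```python
# (freelist_kB, unused_in_CodeBlob_kB, max_used_kB), copied from the results above.
MEASUREMENTS = {
    ("failing test case", "original"): (2168, 818, 21983),
    ("failing test case", "patch"): (1123, 2188, 17572),
    ("nashorn", "original"): (2426, 1769, 201886),
    ("nashorn", "patch"): (1150, 3458, 202394),
    ("SPECJVM2008 compiler.compiler", "original"): (168, 342, 19837),
    ("SPECJVM2008 compiler.compiler", "patch"): (873, 671, 21184),
}


def overhead_percent(freelist_kb: int, unused_kb: int, max_used_kb: int) -> float:
    """Assumed metric: (freelist + unused tail bytes) as a share of max_used."""
    return 100.0 * (freelist_kb + unused_kb) / max_used_kb


if __name__ == "__main__":
    for (benchmark, variant), values in MEASUREMENTS.items():
        print(f"{benchmark:32s} {variant:9s} {overhead_percent(*values):6.2f}%")
```

Under a different definition of wasted space, for example one that counts only freelist blocks too small to be reused, the percentages would differ.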
>>>>>>
>>>>>> The minimum size that can be allocated from the code cache is
>>>>>> platform-dependent, i.e., the minimum size depends on
>>>>>> CodeCacheSegmentSize and CodeCacheMinBlockLength. On x86, for example,
>>>>>> the min. allocatable size from the code cache is 64*4=256 bytes.
>>>>> There is this comment in CodeHeap::search_freelist():
>>>>> // Don't leave anything on the freelist smaller than CodeCacheMinBlockLength.
>>>>>
>>>>> What happens if we scale down CodeCacheMinBlockLength when we
>>>>> increase CodeCacheSegmentSize to keep the same bytes size of minimum
>>>>> block?:
>>>>>
>>>>> + FLAG_SET_DEFAULT(CodeCacheSegmentSize, CodeCacheSegmentSize * 2);
>>>>> + FLAG_SET_DEFAULT(CodeCacheMinBlockLength, CodeCacheMinBlockLength/2);
>>>>>
>>>>> Based on your table below those small nmethods will use only 256
>>>>> bytes blocks instead of 512 (128*4).
>>>>>
>>>>> Note for C1 in Client VM CodeCacheMinBlockLength is 1. I don't know
>>>>> why for C2 it is 4. Could you also try CodeCacheMinBlockLength = 1?
>>>>>
>>>>> All above is with CodeCacheSegmentSize 128 bytes.
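
The arithmetic behind these settings can be sketched as follows. This is a simplified model, not the CodeHeap implementation: it assumes the minimum allocatable size is simply CodeCacheSegmentSize times CodeCacheMinBlockLength, that requests are rounded up to whole segments, and that per-block header overhead can be ignored. The function names are hypothetical.

```python
def min_block_bytes(segment_size: int, min_block_length: int) -> int:
    """Smallest allocatable unit in bytes: segment size times minimum block length."""
    return segment_size * min_block_length


def allocated_bytes(request: int, segment_size: int, min_block_length: int) -> int:
    """Round a request up to whole segments, but never below the minimum block."""
    segments = -(-request // segment_size)  # ceiling division
    segments = max(segments, min_block_length)
    return segments * segment_size


if __name__ == "__main__":
    # (CodeCacheSegmentSize, CodeCacheMinBlockLength) combinations from the thread.
    configs = [(64, 4), (128, 4), (128, 2), (128, 1)]
    request = 100  # assumed example request size in bytes
    for seg, length in configs:
        used = allocated_bytes(request, seg, length)
        print(f"segment={seg:3d} min_len={length} "
              f"min_block={min_block_bytes(seg, length):3d} "
              f"{request}-byte request -> {used} bytes ({used - request} unused)")
```

In this model, doubling the segment size while halving the minimum block length keeps the minimum block at 256 bytes, and a minimum block length of 1 lowers it to 128 bytes.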
>>>>>
>>>>>> The size of adapters ranges from 400 to 600 bytes.
>>>>>> Here is the beginning of the nmethod size distribution of the failing
>>>>>> test case:
>>>>>>
>>>>> Is it possible that it is a number of segments and not bytes? If it
>>>>> really is bytes, what do such (32-48 bytes) nmethods look like?
>>>> This is just a guess but these methods could be method handle
>>>> trampolines. They are very small.
>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>>> nmethod size distribution (non-zombie java)
>>>>>> -------------------------------------------------
>>>>>> 0-16 bytes 0[bytes]
>>>>>> 16-32 bytes 0
>>>>>> 32-48 bytes 45
>>>>>> 48-64 bytes 0
>>>>>> 64-80 bytes 41
>>>>>> 80-96 bytes 0
>>>>>> 96-112 bytes 6247
>>>>>> 112-128 bytes 0
>>>>>> 128-144 bytes 249
>>>>>> 144-160 bytes 0
>>>>>> 160-176 bytes 139
>>>>>> 176-192 bytes 0
>>>>>> 192-208 bytes 177
>>>>>> 208-224 bytes 0
>>>>>> 224-240 bytes 180
>>>>>> 240-256 bytes 0
>>>>>> ...
>>>>>>
>>>>>>
>>>>>> I do not see a problem with increasing the CodeCacheSegmentSize if
>>>>>> tiered compilation is enabled.
>>>>>>
>>>>>> What do you think?
>>>>>>
>>>>>>
>>>>>> Best,
>>>>>> Albert
>>>>>>
>>>>>>
>>>>>> On 02/04/2014 05:52 PM, Vladimir Kozlov wrote:
>>>>>>> I think the suggestion is reasonable since we increase CodeCache *5
>>>>>>> for Tiered.
>>>>>>> Albert, is it possible to collect data on how much space is wasted in %
>>>>>>> before and after this change: free space in which we can't allocate +
>>>>>>> unused bytes at the end of nmethods/adapters? Can we squeeze an
>>>>>>> adapter into 64 bytes?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Vladimir
>>>>>>>
>>>>>>> On 2/4/14 7:41 AM, Albert wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> could I get reviews for this patch (nightly failure)?
>>>>>>>>
>>>>>>>> webrev: http://cr.openjdk.java.net/~anoll/8029799/webrev.00/
>>>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8029799
>>>>>>>>
>>>>>>>> problem: The freelist of the code cache exceeds 10'000 items, which
>>>>>>>>   results in a VM warning. The problem behind the warning is that the
>>>>>>>>   freelist is populated by a large number of small free blocks. For
>>>>>>>>   example, in the failing test case (see header), the freelist grows
>>>>>>>>   to more than 3500 items where the largest item on the list is
>>>>>>>>   9 segments (one segment is 64 bytes). That experiment was done on
>>>>>>>>   my laptop. Such a large freelist can indeed be a performance
>>>>>>>>   problem, since we use a linear search to traverse the freelist.
>>>>>>>> solution: One way to solve the problem is to increase the minimal
>>>>>>>>   allocation size in the code cache. This can be done by two means:
>>>>>>>>   we can increase CodeCacheMinBlockLength and/or CodeCacheSegmentSize.
>>>>>>>>   This patch follows the latter approach, since increasing
>>>>>>>>   CodeCacheSegmentSize decreases the size that is required by the
>>>>>>>>   segment map. More concretely, the patch doubles the
>>>>>>>>   CodeCacheSegmentSize from 64 bytes to 128 bytes if tiered
>>>>>>>>   compilation is enabled. The patch also contains an optimization in
>>>>>>>>   the freelist search (stop searching if we found the appropriate
>>>>>>>>   size) and contains some code cleanups.
>>>>>>>> testing: With the proposed change, the size of the freelist is
>>>>>>>>   reduced to 200 items. There is only a slight increase in memory
>>>>>>>>   required by the code cache, by at most 3% (all data measured for
>>>>>>>>   the failing test case on a Linux 64-bit system, 4 cores).
>>>>>>>>   To summarize, increasing the minimum allocation size in the code
>>>>>>>>   cache results in potentially more unused memory in the code cache
>>>>>>>>   due to unused bits at the end of an nmethod. The advantage is that
>>>>>>>>   we potentially have less fragmentation.
>>>>>>>>
>>>>>>>> proposal: - I think we could remove CodeCacheMinBlockLength without
>>>>>>>>   loss of generality or usability and instead adapt the parameter
>>>>>>>>   CodeCacheSegmentSize at VM startup.
>>>>>>>>   Any opinions?
>>>>>>>>
>>>>>>>> Many thanks in advance,
>>>>>>>> Albert
>>>
>