[9] RFR(M): 8029799: vm/mlvm/anonloader/stress/oome prints warning: CodeHeap: # of free blocks > 10000

Vladimir Kozlov vladimir.kozlov at oracle.com
Mon Feb 10 10:27:18 PST 2014


Good.

Vladimir

On 2/9/14 11:28 PM, Albert wrote:
> Christian, Vladimir, thanks for looking at this.
>
> Christian, I fixed the output in print_memory_overhead() and the typo
> in globals.hpp. Here is the webrev:
> http://cr.openjdk.java.net/~anoll/8029799/webrev.04/
>
> About making CodeCacheSegmentSize a platform-dependent flag. I am not
> sure either - I guess that's a matter of taste. I filed
> https://bugs.openjdk.java.net/browse/JDK-8034052,
> which aims at investigating using different CodeCacheSegmentSizes for
> different code heaps.
>
> Maybe we come back to this when we work on this issue.
>
>
> Best,
> Albert
>
> On 02/08/2014 07:24 PM, Christian Thalinger wrote:
>> On Feb 7, 2014, at 11:34 AM, Albert <albert.noll at oracle.com> wrote:
>>
>>> Hi Vladimir,
>>>
>>> thanks for the feedback. Please see comments inline:
>>>
>>> On 02/07/2014 07:49 PM, Vladimir Kozlov wrote:
>>>> Albert,
>>>>
>>>> Yes, please, file RFE to rework this code after segmented code cache
>>>> is integrated. I agree that we can set sizes per segment.
>>>>
>>> Ok, I will do that.
>>>> In new output you are mixing %dkB and %dK. Choose one.
>>>>
>>> Done.
>> Almost:
>>
>> +   tty->print_cr("Allocated in freelist:          %dkB",
>> bytes_allocated_in_freelist()/K);
>> +   tty->print_cr("Unused bytes in CodeBlobs:      %dKB",
>> wasted_bytes/K);
>> +   tty->print_cr("Segment map size:               %dKB",
>> allocated_segments()/K); // 1 byte per segment
>>
>> Use kB.
>>
>>>> Next comment is misleading. It looks like it is ordered by size but
>>>> it is ordered by address so it needs to say that:
>>>>
>>>>    // Since the freelist is ordered (smaller->larger) and the element we want to insert
>>>>    // into the freelist is smaller than the first element, we can simply add 'b' as the
>>>>    // first element and we are done.
>>>>
>>> Done.
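
A minimal sketch of the address-ordered insertion the comment describes, under stated assumptions: FreeBlock and insert_by_address are hypothetical names for illustration, not the HotSpot types, and coalescing of adjacent blocks is left out. The first branch is the "add 'b' as the first element" case.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a free block; not HotSpot's FreeBlock.
struct FreeBlock {
  std::size_t address;  // start address of the block
  std::size_t length;   // length in segments
  FreeBlock*  next;     // next block, at a higher address
};

// Insert 'b' into a singly linked list kept sorted by ascending address.
// Returns the (possibly new) head of the list.
FreeBlock* insert_by_address(FreeBlock* head, FreeBlock* b) {
  assert(b != nullptr);
  // Empty list, or 'b' lies below the first element: 'b' becomes the head.
  if (head == nullptr || b->address < head->address) {
    b->next = head;
    return b;
  }
  // Otherwise walk to the last element whose address is below b's address.
  FreeBlock* prev = head;
  while (prev->next != nullptr && prev->next->address < b->address) {
    prev = prev->next;
  }
  b->next = prev->next;
  prev->next = b;
  return head;
}
```

Ordering by address, not by size, is what makes the head-insertion shortcut valid: only the address comparison decides where 'b' goes.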
>>>> I am not sure about changes in search_freelist. You may reduce
>>>> opportunity to find block in free list for huge methods. Can you not
>>>> do that now? You reduced size of table so searching should not be
>>>> big problem now.
>>>>
>>> It is not really clear which strategy is best. With the current
>>> approach, we will end up having small items at the beginning of the
>>> freelist
>>> and larger items towards the end. I ran experiments with the failing
>>> test case and there was no difference in the freelist length
>>> (best-fit vs first fit).
>>> Especially with tiered, we should have smaller items in the beginning
>>> and larger items in the end, since we first compile C1 methods.
>>> I can leave it as it is, do more experiments, or change it back to what it
>>> was. I would leave it as it is, but I have no problem with changing
>>> back.
>>>
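
To make the first-fit versus best-fit distinction concrete, here is a rough sketch over a freelist held in address order. The FreeBlock struct and both function names are assumptions for illustration only; this is not the code in CodeHeap::search_freelist(). The best-fit variant stops as soon as it sees an exact match, in the spirit of the "stop searching if we found the appropriate size" optimization mentioned in the original request.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical free block: start and length, both counted in segments.
struct FreeBlock {
  std::size_t start;
  std::size_t length;
};

// First fit: index of the first block that is large enough, or -1.
int first_fit(const std::vector<FreeBlock>& list, std::size_t want) {
  assert(want > 0);
  for (std::size_t i = 0; i < list.size(); ++i) {
    if (list[i].length >= want) return static_cast<int>(i);
  }
  return -1;
}

// Best fit: index of the smallest block that is large enough, or -1.
// An exact match cannot be improved on, so the scan ends there.
int best_fit(const std::vector<FreeBlock>& list, std::size_t want) {
  assert(want > 0);
  int best = -1;
  for (std::size_t i = 0; i < list.size(); ++i) {
    if (list[i].length < want) continue;
    if (list[i].length == want) return static_cast<int>(i);
    if (best < 0 || list[i].length < list[static_cast<std::size_t>(best)].length) {
      best = static_cast<int>(i);
    }
  }
  return best;
}
```

Both variants are linear in the freelist length in the worst case; first fit simply tends to stop earlier, while best fit leaves larger blocks intact for larger requests.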
>>> Here is the new webrev:
>>> http://cr.openjdk.java.net/~anoll/8029799/webrev.03/
>> src/share/vm/runtime/globals.hpp:
>>
>> !   notproduct(bool, VerifyCodeCache,
>> false,                                  \
>> !           "Verify code code cache on memory
>> allocation/deallocation")       \
>>
>> Typo: “code code”
>>
>> !   develop(uintx, CodeCacheSegmentSize, 64 PPC64_ONLY(+64)
>> NOT_PPC64(TIERED_ONLY(+64)),\
>>
>> I wonder if CodeCacheSegmentSize should be a platform dependent flag...
>>
>> Otherwise this looks good.
>>
>>> Best,
>>> Albert
>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 2/7/14 8:06 AM, Albert wrote:
>>>>> Vladimir, Chris, thanks for looking at this.
>>>>>
>>>>> The measurement results are attached to the bug.
>>>>> https://bugs.openjdk.java.net/browse/JDK-8029799
>>>>>
>>>>> I've tried various settings and the current configuration seems to
>>>>> be good for non-tiered compilation. In the current settings, the
>>>>> minimum
>>>>> size that can be allocated from the code cache is 64 bytes for C1 and
>>>>> 256 bytes for C2. This is fine, since C1-compiled code is typically
>>>>> smaller
>>>>> than C2-compiled code. The tradeoff we are facing here is that smaller
>>>>> sizes can lead to more fragmentation (a lot of small chunks are on the
>>>>> freelist), however, the memory wasted at the end of a method is
>>>>> smaller.
>>>>>
>>>>> When tiered compilation is enabled, we have C1 and C2 methods stored
>>>>> in the same code heap, so we have to decide for one
>>>>> minimum-allocatable
>>>>> size.
>>>>> The current implementation chooses the C2 setting. Since we compile C1
>>>>> methods
>>>>> when the application starts, we allocate small memory units that might
>>>>> be too
>>>>> small to fit a C2 version of the method. This is why the freelist can
>>>>> grow to > 10.000
>>>>> items.
>>>>>
>>>>> The current patch increases the minimum-allocatable size ONLY when
>>>>> tiered
>>>>> compilation is enabled to 512 bytes. This leads to less memory
>>>>> overhead,
>>>>> since
>>>>> more memory units that are initially used for C1 methods can later be
>>>>> used by
>>>>> C2 methods.
>>>>>
>>>>> I think that this problem can be solved by segmented code cache
>>>>> that we
>>>>> plan to
>>>>> integrate into 9. If we have multiple code heaps, individual code
>>>>> heaps
>>>>> can use
>>>>> different values for CodeCacheSegmentSize and CodeCacheMinBlockLength.
>>>>> I think we should look at this, since a memory overhead of > 20%,
>>>>> as in the
>>>>> failing test case, seems unreasonably large. Shall I file an RFE that
>>>>> suggests to
>>>>> look into this, once the segmented code cache patch is integrated?
>>>>>
>>>>> For now, I think, there is not much more we can do.
>>>>>
>>>>> Concerning the small method sizes:
>>>>> The size that is provided by PrintCodeCache2 is the instruction size
>>>>> (nm->insts_size)
>>>>> and not the size of the nmethod. I changed that in this patch, since
>>>>> the output suggests something different ("nmethod size distribution").
>>>>>
>>>>> Here is the link to the webrev:
>>>>> http://cr.openjdk.java.net/~anoll/8029799/webrev.02/
>>>>>
>>>>> Best,
>>>>> Albert
>>>>>
>>>>>
>>>>> On 02/06/2014 10:29 PM, Christian Thalinger wrote:
>>>>>> On Feb 5, 2014, at 10:57 AM, Vladimir Kozlov
>>>>>> <vladimir.kozlov at oracle.com> wrote:
>>>>>>
>>>>>>> On 2/5/14 8:28 AM, Albert wrote:
>>>>>>>> Hi Vladimir,
>>>>>>>>
>>>>>>>> thanks for looking at this. I've done the proposed measurements.
>>>>>>>> The
>>>>>>>> code which I used to
>>>>>>>> get the data is included in the following webrev:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~anoll/8029799/webrev.01/
>>>>>>> Good.
>>>>>>>
>>>>>>>> I think some people might be interested in getting that data, so we
>>>>>>>> might want to keep
>>>>>>>> that additional output. The exact output format can be changed
>>>>>>>> later
>>>>>>>> (JDK-8005885).
>>>>>>> I agree that it is useful information.
>>>>>>>
>>>>>>>> Here are the results:
>>>>>>>>
>>>>>>>> - failing test case:
>>>>>>>>    - original: allocated in freelist: 2168kB, unused bytes in CodeBlob:  818kB, max_used:  21983kB
>>>>>>>>    - patch   : allocated in freelist: 1123kB, unused bytes in CodeBlob: 2188kB, max_used:  17572kB
>>>>>>>> - nashorn:
>>>>>>>>    - original: allocated in freelist: 2426kB, unused bytes in CodeBlob: 1769kB, max_used: 201886kB
>>>>>>>>    - patch   : allocated in freelist: 1150kB, unused bytes in CodeBlob: 3458kB, max_used: 202394kB
>>>>>>>> - SPECJVM2008: compiler.compiler:
>>>>>>>>    - original: allocated in freelist:  168kB, unused bytes in CodeBlob:  342kB, max_used:  19837kB
>>>>>>>>    - patch   : allocated in freelist:  873kB, unused bytes in CodeBlob:  671kB, max_used:  21184kB
>>>>>>>>
>>>>>>>> The minimum size that can be allocated from the code cache is
>>>>>>>> platform-dependent.
>>>>>>>> I.e., the minimum size depends on CodeCacheSegmentSize and
>>>>>>>> CodeCacheMinBlockLength.
>>>>>>>> On x86, for example, the min. allocatable size from the code
>>>>>>>> cache is
>>>>>>>> 64*4=256bytes.
>>>>>>> There is this comment in CodeHeap::search_freelist():
>>>>>>>   // Don't leave anything on the freelist smaller than
>>>>>>> CodeCacheMinBlockLength.
>>>>>>>
>>>>>>> What happens if we scale down CodeCacheMinBlockLength when we
>>>>>>> increase CodeCacheSegmentSize to keep the same bytes size of minimum
>>>>>>> block?:
>>>>>>>
>>>>>>> +     FLAG_SET_DEFAULT(CodeCacheSegmentSize, CodeCacheSegmentSize
>>>>>>> * 2);
>>>>>>> +     FLAG_SET_DEFAULT(CodeCacheMinBlockLength,
>>>>>>> CodeCacheMinBlockLength/2);
>>>>>>>
>>>>>>> Based on your table below those small nmethods will use only 256
>>>>>>> bytes blocks instead of 512 (128*4).
>>>>>>>
>>>>>>> Note for C1 in Client VM CodeCacheMinBlockLength is 1. I don't know
>>>>>>> why for C2 it is 4. Could you also try CodeCacheMinBlockLength = 1?
>>>>>>>
>>>>>>> All above is with CodeCacheSegmentSize 128 bytes.
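
The arithmetic behind this suggestion can be sketched as follows. The Config struct and both helpers are hypothetical and only mirror the two flags by name; clamping the block length at one segment is an assumption of this sketch, not something stated in the thread.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical pair mirroring CodeCacheSegmentSize (bytes) and
// CodeCacheMinBlockLength (segments).
struct Config {
  std::size_t segment_size;
  std::size_t min_block_length;
};

// Smallest allocatable unit in bytes: segment size times minimum block length.
constexpr std::size_t min_block_bytes(std::size_t segment_size,
                                      std::size_t min_block_length) {
  return segment_size * min_block_length;
}

// Double the segment size and halve the minimum block length.
// Assumption: the length is never allowed to drop below one segment.
constexpr Config rescale(Config c) {
  return Config{c.segment_size * 2,
                c.min_block_length / 2 > 0 ? c.min_block_length / 2 : 1};
}

// x86 figures quoted in the thread: 64 * 4 = 256 bytes.
static_assert(min_block_bytes(64, 4) == 256, "64-byte segments, length 4");
// Doubling only the segment size gives 128 * 4 = 512 bytes.
static_assert(min_block_bytes(128, 4) == 512, "128-byte segments, length 4");
```

With rescale applied to {64, 4} the result is {128, 2}, so the minimum block stays at 256 bytes while the segment map covers half as many segments.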
>>>>>>>
>>>>>>>> The size of adapters ranges from 400b to 600b.
>>>>>>>> Here is the beginning of the nmethod size distribution of the
>>>>>>>> failing
>>>>>>>> test case:
>>>>>>>>
>>>>>>> Is it possible it is a number of segments and not bytes? If it
>>>>>>> really is bytes, what do such (32-48 bytes) nmethods look like?
>>>>>> This is just a guess but these methods could be method handle
>>>>>> trampolines.  They are very small.
>>>>>>
>>>>>>> Thanks,
>>>>>>> Vladimir
>>>>>>>
>>>>>>>> nmethod size distribution (non-zombie java)
>>>>>>>> -------------------------------------------------
>>>>>>>> 0-16 bytes                                0[bytes]
>>>>>>>> 16-32 bytes                                0
>>>>>>>> 32-48 bytes                                45
>>>>>>>> 48-64 bytes                                0
>>>>>>>> 64-80 bytes                                41
>>>>>>>> 80-96 bytes                                0
>>>>>>>> 96-112 bytes                               6247
>>>>>>>> 112-128 bytes                               0
>>>>>>>> 128-144 bytes                               249
>>>>>>>> 144-160 bytes                               0
>>>>>>>> 160-176 bytes                               139
>>>>>>>> 176-192 bytes                               0
>>>>>>>> 192-208 bytes                               177
>>>>>>>> 208-224 bytes                               0
>>>>>>>> 224-240 bytes                               180
>>>>>>>> 240-256 bytes                               0
>>>>>>>> ...
>>>>>>>>
>>>>>>>>
>>>>>>>> I do not see a problem for increasing the CodeCacheSegmentSize if
>>>>>>>> tiered
>>>>>>>> compilation
>>>>>>>> is enabled.
>>>>>>>>
>>>>>>>> What do you think?
>>>>>>>>
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Albert
>>>>>>>>
>>>>>>>>
>>>>>>>> On 02/04/2014 05:52 PM, Vladimir Kozlov wrote:
>>>>>>>>> I think the suggestion is reasonable since we increase
>>>>>>>>> CodeCache *5
>>>>>>>>> for Tiered.
>>>>>>>>> Albert, is it possible to collect data how much space is wasted
>>>>>>>>> in %
>>>>>>>>> before and after this change: free space in which we can't
>>>>>>>>> allocate +
>>>>>>>>> unused bytes at the end of nmethods/adapters? Can we squeeze an
>>>>>>>>> adapter into 64 bytes?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Vladimir
>>>>>>>>>
>>>>>>>>> On 2/4/14 7:41 AM, Albert wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> could I get reviews for this patch (nightly failure)?
>>>>>>>>>>
>>>>>>>>>> webrev: http://cr.openjdk.java.net/~anoll/8029799/webrev.00/
>>>>>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8029799
>>>>>>>>>>
>>>>>>>>>> problem: The freelist of the code cache exceeds 10'000 items,
>>>>>>>>>> which
>>>>>>>>>> results in a VM warning.
>>>>>>>>>>                  The problem behind the warning is that the
>>>>>>>>>> freelist
>>>>>>>>>> is populated by a large number
>>>>>>>>>>                  of small free blocks. For example, in failing
>>>>>>>>>> test
>>>>>>>>>> case (see header), the freelist grows
>>>>>>>>>>                  up to more than 3500 items where the largest
>>>>>>>>>> item on
>>>>>>>>>> the list is 9 segments (one segment
>>>>>>>>>>                  is 64 bytes). That experiment was done on my
>>>>>>>>>> laptop.
>>>>>>>>>> Such a large freelist can indeed be
>>>>>>>>>>                  a performance problem, since we use a linear
>>>>>>>>>> search
>>>>>>>>>> to traverse the freelist.
>>>>>>>>>> solution:  One way to solve the problem is to increase the
>>>>>>>>>> minimal
>>>>>>>>>> allocation size in the code cache.
>>>>>>>>>>                  This can be done by two means: we can increase
>>>>>>>>>> CodeCacheMinBlockLength and/or
>>>>>>>>>>                  CodeCacheSegmentSize. This patch follows the
>>>>>>>>>> latter
>>>>>>>>>> approach, since increasing
>>>>>>>>>>                  CodeCacheSegmentSize decreases the size that is
>>>>>>>>>> required by the segment map. More
>>>>>>>>>>                  concretely, the patch doubles the
>>>>>>>>>> CodeCacheSegmentSize from 64 byte to 128 bytes
>>>>>>>>>>                  if tiered compilation is enabled.
>>>>>>>>>>                  The patch also contains an optimization in the
>>>>>>>>>> freelist search (stop searching if we found
>>>>>>>>>>                  the appropriate size) and contains some code
>>>>>>>>>> cleanups.
>>>>>>>>>> testing:    With the proposed change, the size of the freelist is
>>>>>>>>>> reduced to 200 items. There is only
>>>>>>>>>>                  a slight increase in memory required by code
>>>>>>>>>> cache
>>>>>>>>>> by at most 3% (all data measured
>>>>>>>>>>                  for the failing test case on a Linux 64-bit
>>>>>>>>>> system,
>>>>>>>>>> 4 cores).
>>>>>>>>>>                  To summarize, increasing the minimum
>>>>>>>>>> allocation size
>>>>>>>>>> in the code cache results in
>>>>>>>>>>                  potentially more unused memory in the code
>>>>>>>>>> cache due
>>>>>>>>>> to unused bits at the end of
>>>>>>>>>>                  an nmethod. The advantage is that we potentially
>>>>>>>>>> have less fragmentation.
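
The tradeoff summarized above can be illustrated with a little arithmetic. This is a simplified sketch: the helper names are made up, block headers and CodeCacheMinBlockLength are ignored, and the segment map is assumed to cost one byte per segment.

```cpp
#include <cassert>
#include <cstddef>

// Number of segments needed to hold 'bytes', rounding up.
constexpr std::size_t segments_for(std::size_t bytes, std::size_t segment_size) {
  return (bytes + segment_size - 1) / segment_size;
}

// Bytes left unused at the end of a block after rounding up to whole segments.
constexpr std::size_t tail_waste(std::size_t bytes, std::size_t segment_size) {
  return segments_for(bytes, segment_size) * segment_size - bytes;
}

// Size of the segment map, assuming one byte per segment.
constexpr std::size_t segment_map_bytes(std::size_t heap_bytes,
                                        std::size_t segment_size) {
  return heap_bytes / segment_size;
}

// A 130-byte blob wastes 62 bytes with 64-byte segments (3 segments)...
static_assert(tail_waste(130, 64) == 62, "3 * 64 - 130");
// ...and 126 bytes with 128-byte segments (2 segments).
static_assert(tail_waste(130, 128) == 126, "2 * 128 - 130");
```

For a 1 MB heap, segment_map_bytes gives 16384 bytes with 64-byte segments and 8192 bytes with 128-byte segments, which is the segment map saving the request refers to; the price is the larger tail waste per blob.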
>>>>>>>>>>
>>>>>>>>>> proposal: - I think we could remove CodeCacheMinBlockLength
>>>>>>>>>> without
>>>>>>>>>> loss of generality or usability
>>>>>>>>>>                    and instead adapt the parameter
>>>>>>>>>> CodeCacheSegmentSize at VM startup.
>>>>>>>>>>                    Any opinions?
>>>>>>>>>>
>>>>>>>>>> Many thanks in advance,
>>>>>>>>>> Albert
>

