RFR (M) 8164921: Memory leaked when instrumentation.retransformClasses() is called repeatedly

Mon Oct 10 08:28:55 UTC 2016

Hi Coleen,

On 2016-10-07 17:24, Coleen Phillimore wrote:
> Mikael, thanks again ...   embedded replies and 2 questions:
>
> On 10/7/16 10:39 AM, Mikael Gerdin wrote:
>> Hi Coleen,
>>
>> On 2016-10-06 19:54, Coleen Phillimore wrote:
>>>
>>> Here is an update to the Metadata leak change.  There was a bug
>>> introduced when cleaning this up, which Mikael also found.
>>>
>>> open webrev at http://cr.openjdk.java.net/~coleenp/8164921.02/webrev
>>>
>>> Changes include adding logging for report_metadata_oome, which
>>> necessitated removing ResourceMarks in ClassLoaderData::dump because the
>>> stream passed in already required a ResourceMark, so it got a nested
>>> ResourceMark message for the stream.
>>>
>>> I changed logging for tracing block allocations to log_trace(gc,
>>> metaspace, freelist, blocks).
>>>
>>> In BlockFreelist::get_block and BlockFreelist::return_block() assert
>>> that it's not called for a size smaller than the minimum allocation
>>> (which was the bug).  Renamed get_raw_word_size() to
>>> get_allocation_word_size().  This rounds up to the minimum allocation
>>> size which is the same as small_block_min_size.
>>>
>>> Also, I added a test that takes a long time to execute to verify this,
>>> and excluded it from JPRT.  I could skip adding this test if people
>>> don't want it.  Also, the test verifies that continuously redefining the
>>> same class gets memory for the new class that was released because the
>>> block sizes are the same.   When the test exits, it gets a metaspace OOM
>>> because loading new classes and allocating metadata can't use the blocks
>>> returned (wrong size). There is still fragmentation in this
>>> implementation, but it's better that things deallocated < 12 words are
>>> actually freed.  I'll file an RFE to work on a perfect algorithm, or to
>>> investigate finding a better one, although I consider this a stress test
>>> that uses all of metaspace to MaxMetaspaceSize, leaving allocation only
>>> to the block fragments left.  This isn't a typical use case.
>>>
>>> Some comments and corrections to my responses to Mikael below:
>>>
>>> On 10/4/16 12:15 PM, Coleen Phillimore wrote:
>>>>
>>>> Hi Mikael,
>>>>
>>>> Thanks for looking at this change.
>>>>
>>>> On 10/4/16 8:32 AM, Mikael Gerdin wrote:
>>>>> Hi Coleen,
>>>>>
>>>>> On 2016-09-30 21:02, Coleen Phillimore wrote:
>>>>>> Summary: Return Metablocks smaller than dictionary's dark matter.
>>>>>>
>>>>>> This change contributed by Jon Masamitsu and myself.  To reclaim
>>>>>> "dark
>>>>>> matter" this change adds an array of small blocks by size, created
>>>>>> lazily, to return Metablocks smaller than the BinaryTreeDictionary
>>>>>> entry's minimum size.   This change also fixed a bug in small object
>>>>>> double free and adds debugging code to check for this case.
>>>>>>
>>>>>> With this change, the submitted test case runs indefinitely. Also
>>>>>> passed rbt tier 1-5 testing.
>>>>>>
>>>>>> open webrev at http://cr.openjdk.java.net/~coleenp/8164921.01/webrev
>>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8164921
>>>>>
>>>>> I'd prefer it if SmallBlocks didn't expose its implementation by
>>>>> returning its FreeLists by reference, could you change it to have
>>>>> * return_chunk()
>>>>> * get_chunk()
>>>>> * num_chunks(word_size)
>>>>>
>>>>> and get rid of list_at?
>>>>
>>>> Okay, I refactored this so small_chunks()->get_block() and
>>>> return_block() are used rather than getting list_at.  I didn't see
>>>> where you got num_chunks, but list_at is hidden.
>>
>> Thanks, that's much better!
>>
>>>>>
>>>>> --
>>>>>
>>>>> For how long do you plan to keep BlockFreelist::_all_blocks?
>>>>> I see that it's debug only but I fear that it could case problems
>>>>> with running out of native memory in our internal testing of debug
>>>>> builds.
>>>>
>>>> I thought about taking it out, but it helped me find the double free
>>>> bug.   I think if we add new code to metadata and have to call
>>>> deallocate_contents on it, we risk re-introducting these double free
>>>> bugs.   I could take it out.  I don't think this gets that big but I'd
>>>> hate to introduce some sort of OOM bug in our testing.
>>
>> Perhaps a trade off could be to have a bound on the size of the array
>> and deallocate and disable it if it grows too large?
>
> I just removed it.  I had run all the tests with the debugging code and
> they passed.  If we have to, we can add it back again pretty easily.

Ok.

>>
>>>>>
>>>>> BlockFreelist::min_size() is a bit of a misnomer since it returns the
>>>>> minimum size of blocks to be put on the BlockTreeDictionary, not the
>>>>> minimum size of blocks which are reusable.
>>>>
>>>> How about min_dictionary_size() ?
>>
>> That's fine.
>>
>>>>
>>>>>
>>>>> Is there any particular reason behind _small_blocks being lazy
>>>>> allocated and _dictionary not?
>>>>
>>>> We lazily create the BlockFreelists with this change.
>>>>
>>>>   // Lazily create a block_freelist
>>>>   if (block_freelists() == NULL) {
>>>>     _block_freelists = new BlockFreelist();
>>>>   }
>>>>
>>>> So the small_blocks are lazily created in the BlockFreelists but the
>>>> dictionary is not.  I guess if we're going to create the
>>>> BlockFreelists here, we'll most likely need both and maybe
>>>> small_blocks need not be lazily created.  Was that your suggestion?
>>>>
>>>> My concern with this change was all the space used by the
>>>> small_blocks() but if we're doing any deallocation within the
>>>> metaspace, at least one of the things will be <12 words.
>>>>
>>>> I'll make small_blocks() not be lazily allocated since BlockFreelist
>>>> are.  These are pretty expensive, but should be limited to a few
>>>> metaspaces.
>>>
>>> Class SpaceManager doesn't need small_blocks() so I left small_blocks as
>>> lazily allocated.
>>
>> Ok, I agree that the size of small_blocks() motivates its lazy
>> allocation scheme.
>>
>>>>>
>>>>> I would prefer if BlockFreelist::return_block would perform the
>>>>> checks in reverse order instead of having a return inside the first
>>>>> if block, something like
>>>>>
>>>>> if (word_size > small_block_max_size) {
>>>>> dict()->return_chunk(c)
>>>>> } else if (word_size > small_block_min_size) {
>>>>> small_blocks()->return_chunk(c)
>>>>> } else {
>>>>>   // dark matter
>>>>> }
>>>>
>>>> Why?  We don't want to cast Metablock into dark matter so check if
>>>> word_size < small_block_min_size first.
>>>>
>>>>   Metablock* free_chunk = ::new (p) Metablock(word_size);
>>>>   if (word_size < SmallBlocks::small_block_max_size()) {
>>>>     small_blocks()->return_chunk(word_size);
>>>>   } else {
>>>>     dictionary()->return_chunk(free_chunk);
>>>>   }
>>>>
>>>
>>> There is no dark matter in these functions anymore.
>>
>> Oh! Right.
>>
>>
>>>>>
>>>>>
>>>>> For BlockFreelist::get_block I realize that the code is a bit more
>>>>> complex but this also raises a few questions.
>>>>> * When we allocate from the dictionary we search for a block which is
>>>>> at least as large as we ask for and split it, returning whatever was
>>>>> left back to the free list. We don't search for "large enough" blocks
>>>>> from the small blocks manager, is that intentional or just to keep
>>>>> the code simple (I like simple)?
>>>>
>>>> I'm glad you asked about this so I could give background.   It turns
>>>> out that we deallocate metaspace items better this way.  I had a
>>>> version that did exactly what you said.  It was a simple sorted linked
>>>> list of returned blocks < min_dictionary_size (12) where get_block
>>>> returned the first block where the item would fit.  It had some best
>>>> fit algorithm so if the block returned was a lot bigger, it wouldn't
>>>> pick it.
>>>>
>>>> My implementation could get through 69300 retransformations before the
>>>> list didn't work anymore (too many small block fragments of the wrong
>>>> size) and metaspace was exhausted (metaspace was limited to 12M in
>>>> this test).  Jon's implementation ran this test indefinitely. So it's
>>>> somewhat simple but it worked really well.
>>>>
>>>>>
>>>>> In the last part of get_block where we return the unused part of a
>>>>> block retrieved from the dictionary uses compares with
>>>>> BlockFreelist::min_size() which, as I mentioned above, is the min
>>>>> size for blocks in the dictionary, not the min size for blocks to be
>>>>> reusable.
>>>>> I think this code can just call return_block unconditionally for any
>>>>> nonzero value of "unused" and let return_block deal with dark matter.
>>>>>
>>>> yes, I'll make that change.
>>>
>>> This change has to be conditional because I assert that
>>> BlockFreelist::return_block() is never called for <
>>> small_block_min_size.
>>
>> Ok.
>>
>>
>> There's one more thing, though.
>> In SpaceManager::retire_current_chunk the last piece of the current
>> allocation chunk is made reusable through a call to
>> SpaceManager::deallocate. There are two problems with this:
>>
>> * retire_current_chunk only calls deallocate for wasted blocks larger
>> than the min dictionary size, this means that wasted blocks which
>> would fit in SmallBlocks will be wasted instead of reusable.
>
> Yeah.   I didn't know if this was worth doing for 12 words or less. Do you?

To be honest: I don't know.
We added the SmallBlocks free list for a reason, I guess, so maybe we 
should try a bit harder to conserve metaspace memory.

>
>> * the remaining_words parameter which retire_current_chunk passes to
>> deallocate is _exact_ and must not be aligned up through
>> get_allocation_word_size.
>>
>
> Since it's already greater than the allocation_word_size, this isn't a
> problem.

Well, if in a 32 bit VM a deallocation is done for an object of 13 words 
then get_allocation_word_size will align up by 8 to 14 and the block 
returned could be too large.
I just realized that this cannot happen since allocations are always 
aligned up by 8, but I wanted to share my reasoning at least.

>
>>
>> Also,
>> are you sure about logging the output of ClassLoaderData::dump on the
>> info level? Perhaps logging the cld dump on the debug level is a good
>> trade-off? It looks like the trace level causes the VSM to print the
>> entire free lists to the log so that's probably a bit much.
>
> It is a lot of stuff I didn't want to see.  I could change it back to
> trace though because the thing I really wanted to see in
> MetaspaceAux::dump() is guarded by trace
>
>   if (log_is_enabled(Trace, gc, metaspace, freelist)) {
>     if (block_freelists() != NULL) block_freelists()->print_on(out);
>   }
>
> It seems like debug would be a better level though for both.

Yeah, I think that's a good trade-off.

/Mikael

>>
>>
>>>
>>> Thanks,
>>> Coleen
>>>
>>>>>
>>>>>
>>>>> Are the changes to Method and ConstMethod the "bug fix to small
>>>>> object double free"?
>>>>
>>>> Yes.
>>>>> Is the problem that when a method was freed its annotations were
>>>>> deallocated as well? Could the annotations fields in the "old"
>>>>> ConstMethod be cleared instead so that the old annotation arrays
>>>>> could be kept or is that just needless complexity?
>>>>
>>>> I'd rather copy the annotations, because I don't know how the old
>>>> method, which could still be running, might use the annotations. I
>>>> don't want to mess with that but I see your point.
>>
>> That's fine I think, I wouldn't want to mess around with it either. I
>> just wanted to understand the reasoning behind the change, thanks for
>> confirming my suspicion.
>>
>
> thanks,
> Coleen
>
>> /Mikael
>>
>>>>
>>>> Coleen
>>>>
>>>>>
>>>>> Thanks
>>>>> /Mikael
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Coleen and Jon
>>>>
>>>
>