RFR: 8324890: C2 SuperWord: refactor out VLoop, make unrolling_analysis static, remove init/reset mechanism [v4]

Emanuel Peter epeter at openjdk.org
Mon Feb 5 13:42:34 UTC 2024


On Sat, 3 Feb 2024 20:33:53 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

>> @vnkozlov is there a way to measure memory fragmentation? I don't know how to answer that question.
>> And is there really a difference to how it was done before? Before we just put everything on the `comp_arena`, and never recover the memory until that arena is given up. Now we have a new `autovectorization_arena` for each AutoVectorization pass over all loops. I guess there could be multiple such passes in a single compilation. Before this patch, this means that each such AutoVectorization pass creates its SuperWord object, and allocates memory on the `comp_arena`, and the memory usage of all these passes adds up. With this patch, we at least are able to give up the memory after every pass.
>> 
>> Of course this is only helpful if the malloc/free'd chunks can properly be reused.
>> I think that chunks can be properly reused: that is what the `ChunkPool::take_from_pool/return_to_pool` are for. They are called from `ChunkPool::allocate_chunk/deallocate_chunk`, if there is a pool for the requested chunk-size/length.
>> 
>> I did some `rr` debugging. And I see that the first (`init_size`) chunk is allocated from a pool. But subsequent chunks (`grow`) have "non-standard" length, and are malloc/free'd. The reason is that `get_pool_for_size` does exact length comparison, and a random length grow will hit one of the pre-defined lengths only with a very low probability. I suppose that could eventually lead to fragmentation. We get "non-standard" lengths quickly when you for example pre-allocate a specific size, which is not a power of 2.
>> 
>> I wonder if we could not round up the chunk-size to the next bigger size for which we have a pool. Of course this would mean we have some padding in the chunks, but if they are short-lived chunks then at least the whole memory can be reclaimed. @jdksjolen you have done some work on Arenas. Do you have any wisdom to offer here?
>> 
>> An alternative: we can put the `autovectorization_arena` at `Compile`. That way, the chunks are kept until the end of compilation, and can be reused between the different AutoVectorization passes/phases.
>
>> And I see that the first (init_size) chunk is allocated from a pool. But subsequent chunks (grow) have "non-standard" length, and are malloc/free'd.
> 
> Yes, that was my concern.
> 
> There are chunks with different sizes: [arena.hpp#L66](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/memory/arena.hpp#L66). Are your allocation sizes > 32*K (`Chunk::size`, "Default size of an Arena chunk")? `Arena::grow()` uses `MAX2(ARENA_ALIGN(x), (size_t) Chunk::size);`.
> Which of the SuperWord allocations are big? Can we split them to fit into 32K?
> 
> I think this should not stop you from doing this refactoring. Yes, it will allow returning memory sooner, and it is up to the OS how it optimizes that. I read your offline discussion with Johan. He has an interesting suggestion for growable arrays (use the C heap).
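
To make the pool behaviour discussed in the quoted text concrete, here is a minimal standalone sketch. It is not the HotSpot code. The pool sizes, the 8-byte alignment, and the names `pool_for_exact_size`, `pool_for_rounded_size`, `padding_for` and `grow_length` are all assumptions for illustration. It contrasts an exact-length pool lookup with the round-up idea, and models the `MAX2(ARENA_ALIGN(x), Chunk::size)` sizing of a grow.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>

// Illustrative pool chunk lengths in bytes, sorted ascending.
// Assumed values, not the actual HotSpot constants.
constexpr std::array<std::size_t, 4> kPoolSizes = {256, 1024, 10 * 1024, 32 * 1024};
constexpr std::size_t kDefaultChunk = 32 * 1024;  // assumed default chunk size
constexpr std::size_t kAlign = 8;                 // assumed arena alignment

// Exact-match lookup: a pool is found only if `length` equals a pool size.
std::optional<std::size_t> pool_for_exact_size(std::size_t length) {
  for (std::size_t i = 0; i < kPoolSizes.size(); ++i) {
    if (kPoolSizes[i] == length) return i;
  }
  return std::nullopt;  // no pool: the chunk would be malloc/free'd
}

// Round-up lookup: the smallest pool whose size is >= `length`.
std::optional<std::size_t> pool_for_rounded_size(std::size_t length) {
  for (std::size_t i = 0; i < kPoolSizes.size(); ++i) {
    if (kPoolSizes[i] >= length) return i;
  }
  return std::nullopt;  // larger than every pool
}

// Bytes of padding the round-up lookup would add (0 if no pool fits).
std::size_t padding_for(std::size_t length) {
  std::optional<std::size_t> idx = pool_for_rounded_size(length);
  if (!idx.has_value()) return 0;
  assert(kPoolSizes[*idx] >= length);
  return kPoolSizes[*idx] - length;
}

// Models the grow sizing: align the request, but never go below the default.
std::size_t grow_length(std::size_t request) {
  std::size_t aligned = (request + kAlign - 1) / kAlign * kAlign;
  return std::max(aligned, kDefaultChunk);
}
```

In this model a pre-allocation of 5000 bytes misses every pool under exact matching, while rounding up would place it in the 10K pool at the cost of some padding. A grow request above the default size still finds no pool either way, so rounding up only helps up to the largest pool size.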

@vnkozlov After extensive discussions with @jdksjolen, I have now decided to create a `VShareData` class, which has its own arena that holds the really large array(s), so that they can be shared between the SuperWord / AutoVectorization runs over the different loops. This means fragmentation for large arrays is now as low as before this change. All the smaller arrays and small allocations are fine unshared, since they fit nicely into chunks anyway, so we don't have to worry much about fragmentation there.
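
As a rough illustration of the sharing idea (not the actual `VShareData` from the patch), the sketch below keeps one large array alive across several per-loop passes and only reallocates when a pass needs more space than any earlier one. The class names `SharedData` and `LoopPass`, the use of `std::vector` instead of an arena, and the zeroing on reuse are all assumptions made for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the shared data: owns one large array that
// outlives the individual per-loop passes.
class SharedData {
 public:
  // Returns a zeroed array of at least `n` ints. The backing storage grows
  // only when `n` exceeds what an earlier pass already requested.
  int* large_array(std::size_t n) {
    if (_storage.size() < n) {
      _storage.resize(n);
      ++_grow_count;
    }
    std::fill_n(_storage.begin(), n, 0);  // assumed: each pass starts clean
    return _storage.data();
  }

  std::size_t grow_count() const { return _grow_count; }
  std::size_t capacity() const { return _storage.size(); }

 private:
  std::vector<int> _storage;
  std::size_t _grow_count = 0;
};

// Hypothetical per-loop pass: borrows the large array instead of owning it,
// so destroying the pass does not release the large allocation.
class LoopPass {
 public:
  LoopPass(SharedData& shared, std::size_t node_count)
      : _data(shared.large_array(node_count)), _node_count(node_count) {}

  void mark(std::size_t idx, int value) {
    if (idx < _node_count) _data[idx] = value;
  }

  int get(std::size_t idx) const { return idx < _node_count ? _data[idx] : 0; }

 private:
  int* _data;
  std::size_t _node_count;
};
```

With this shape, running passes over loops of size 100, 50 and 200 in sequence would allocate the large array twice (once for 100, once when growing to 200) rather than three times.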

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17624#issuecomment-1927035458

