[foreign-jextract] RFR: MemorySegmentPool + Allocator [v7]
Maurizio Cimadamore
mcimadamore at openjdk.java.net
Thu Apr 22 00:27:30 UTC 2021
On Wed, 21 Apr 2021 23:54:24 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:
>> Radoslaw Smogura has updated the pull request incrementally with two additional commits since the last revision:
>>
>> - Better releasing resources on scopes close
>> New benchmarks for bulk allocations
>> - Minor fixes to SpinLockQueue
>> Remove redundant LOCK release (set to 0) in put entry.
>> Fixed maxSize comparison
>> Removed unused imports
>>
>> Q: Could we reduce setRelease to just set in a few places?
>> Q: Should this method be private and moved to Entry.close to prevent accidentally adding an element from another queue?
>
> test/micro/org/openjdk/bench/jdk/incubator/foreign/AllocatorsForLongRun.java line 149:
>
>> 147: private void readSegment(MemorySegment s) {
>> 148: final var size = s.byteSize();
>> 149: for (long l = 0; l < size; l += 256) {
>
> This is a loop on `long`; as things stand, this is going to defeat all optimizations. For the time being, replace it with an `int` loop (and then cast the coordinate back to `long`), or just use `MemoryAccess.setByteAtIndex`.
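A minimal sketch of the two loop shapes the review comment contrasts. It does not use the incubator `MemorySegment` API: `ByteSource`, `arrayBacked`, `sumLongLoop` and `sumIntLoop` are hypothetical names, and the array-backed stand-in exists only so the sketch runs on any JDK. Per the review comment, the JIT optimizes an `int`-counted loop far better than a `long`-counted one; this stand-in shows the shape of the rewrite, not the performance difference. The size guard is an added assumption so that `i += STRIDE` cannot overflow.

```java
import java.util.Arrays;

// Hypothetical stand-in for a memory segment: long-sized, long-addressed.
interface ByteSource {
    long byteSize();
    byte getByte(long offset);
}

final class IntLoopSketch {
    // Stride used by the benchmark under review: touch one byte every 256.
    static final int STRIDE = 256;

    // Array-backed ByteSource, for illustration only (hypothetical helper).
    static ByteSource arrayBacked(byte[] data) {
        return new ByteSource() {
            public long byteSize() { return data.length; }
            public byte getByte(long offset) { return data[Math.toIntExact(offset)]; }
        };
    }

    // Original shape: the induction variable is a long.
    static long sumLongLoop(ByteSource s) {
        final long size = s.byteSize();
        long acc = 0;
        for (long l = 0; l < size; l += STRIDE) {
            acc += s.getByte(l);
        }
        return acc;
    }

    // Suggested shape: int induction variable, widened back to long at each access.
    static long sumIntLoop(ByteSource s) {
        final long size = s.byteSize();
        // Guard so that i += STRIDE can never overflow past Integer.MAX_VALUE.
        if (size > Integer.MAX_VALUE - STRIDE) {
            throw new IllegalArgumentException("too large for an int-indexed loop: " + size);
        }
        final int limit = (int) size;
        long acc = 0;
        for (int i = 0; i < limit; i += STRIDE) {
            acc += s.getByte((long) i);
        }
        return acc;
    }

    public static void main(String[] args) {
        byte[] data = new byte[4096];
        Arrays.fill(data, (byte) 1);
        ByteSource src = arrayBacked(data);
        System.out.println(sumLongLoop(src) + " " + sumIntLoop(src));
    }
}
```

Both methods visit the same offsets; only the type of the loop counter changes.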
I tweaked the loop, and the numbers didn't change much from yours:
Benchmark                                     (allocations)  Mode  Cnt      Score     Error  Units
AllocatorsForLongRun.arena                                1  avgt   30    185.742 ±   4.349  ns/op
AllocatorsForLongRun.arena                               16  avgt   30    616.261 ±  14.974  ns/op
AllocatorsForLongRun.arena                              200  avgt   30   6574.496 ±  55.602  ns/op
AllocatorsForLongRun.malloc_free                          1  avgt   30     25.888 ±   0.272  ns/op
AllocatorsForLongRun.malloc_free                         16  avgt   30    602.258 ±  11.776  ns/op
AllocatorsForLongRun.malloc_free                        200  avgt   30  10126.972 ± 151.182  ns/op
AllocatorsForLongRun.pool_allocator                       1  avgt   30     35.907 ±   0.474  ns/op
AllocatorsForLongRun.pool_allocator                      16  avgt   30    378.874 ±   8.533  ns/op
AllocatorsForLongRun.pool_allocator                     200  avgt   30   4489.656 ±  40.615  ns/op
AllocatorsForLongRun.pool_allocator_exhausted             1  avgt   30     65.074 ±   3.399  ns/op
AllocatorsForLongRun.pool_allocator_exhausted            16  avgt   30    994.809 ±  22.971  ns/op
AllocatorsForLongRun.pool_allocator_exhausted           200  avgt   30  16247.051 ± 223.768  ns/op
AllocatorsForLongRun.pool_direct                          1  avgt   30     15.827 ±   0.398  ns/op
AllocatorsForLongRun.pool_direct                         16  avgt   30    269.499 ±   3.384  ns/op
AllocatorsForLongRun.pool_direct                        200  avgt   30   3491.204 ±  35.959  ns/op
It seems like 16 allocations is the break-even point for the arena: beyond it (at 200), the arena is better than malloc, and I would expect the arena's advantage to keep growing with the number of allocations. Malloc/free is still surprisingly good, all things considered, and especially hard to beat for single-shot allocations.
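The break-even point can be estimated from the table with a little arithmetic. This sketch assumes cost grows linearly between the two measured points (16 and 200 allocations); that linearity is an assumption, not something the benchmark measured. `BreakEven.crossing` is a hypothetical helper; the scores are copied from the results above.

```java
final class BreakEven {
    // Assumption: cost is linear in the allocation count between n0 and n1.
    // a0/a1 and b0/b1 are the scores (ns/op) of two strategies at n0 and n1.
    // Returns the allocation count n where the two interpolated lines cross.
    static double crossing(double n0, double a0, double b0,
                           double n1, double a1, double b1) {
        double slopeA = (a1 - a0) / (n1 - n0); // marginal ns per allocation, strategy A
        double slopeB = (b1 - b0) / (n1 - n0); // marginal ns per allocation, strategy B
        // Solve a0 + slopeA * (n - n0) == b0 + slopeB * (n - n0) for n.
        return n0 + (a0 - b0) / (slopeB - slopeA);
    }

    public static void main(String[] args) {
        // arena (A) vs malloc_free (B), at 16 and 200 allocations.
        double n = crossing(16, 616.261, 602.258, 200, 6574.496, 10126.972);
        System.out.printf("arena overtakes malloc_free at ~%.1f allocations%n", n);
    }
}
```

Under the linearity assumption the lines cross at roughly 16.7 allocations, with a marginal cost of about 32 ns per allocation for the arena against about 52 ns for malloc_free, which is consistent with the arena's lead widening as the count grows.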
The problem I see with the pool strategy is that it's faster than malloc, but not radically so (there's no 10x here; at 200 allocations it's roughly 2.3x). You also have to consider both the best case and the worst case: the best case is better than malloc, while the worst case (exhausted) is worse. So it looks like something that can be great if that's what a program needs, provided it's used as intended, but it doesn't (yet?) seem to deliver the kind of horizontal, across-the-board scaling that would justify its inclusion in the API (although it's great to see that such an allocator can be written on top of the API).
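To put "faster, but not radically" into numbers, this sketch divides the malloc_free score by each pool score from the table, so values above 1.0 mean faster than malloc_free. `PoolVsMalloc` and `speedupOverMalloc` are hypothetical names; the data is copied from the results above.

```java
final class PoolVsMalloc {
    static final int[] ALLOCATIONS = {1, 16, 200};
    // Scores in ns/op (avgt), copied from the JMH results table.
    static final double[] MALLOC_FREE = {25.888, 602.258, 10126.972};
    static final double[] POOL = {35.907, 378.874, 4489.656};
    static final double[] POOL_EXHAUSTED = {65.074, 994.809, 16247.051};

    // Ratio of malloc_free time to candidate time: > 1.0 means the candidate is faster.
    static double speedupOverMalloc(double[] candidate, int i) {
        return MALLOC_FREE[i] / candidate[i];
    }

    public static void main(String[] args) {
        for (int i = 0; i < ALLOCATIONS.length; i++) {
            System.out.printf("%4d allocations: pool %.2fx, exhausted %.2fx%n",
                    ALLOCATIONS[i],
                    speedupOverMalloc(POOL, i),
                    speedupOverMalloc(POOL_EXHAUSTED, i));
        }
    }
}
```

With these numbers, pool_allocator ranges from about 0.72x (a single allocation) to about 2.26x (200 allocations), while pool_allocator_exhausted stays below 1.0x at every allocation count.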
What I like about the pool, though, is the approach you took for the API. When we look at allocators again (as I said, we did have some other allocators we were looking at, not too different from what you are trying to do here), I think the API we offer will probably be very similar to what you have here: it's spot on, and it plays to the advantages of the new memory API.
-------------
PR: https://git.openjdk.java.net/panama-foreign/pull/509
More information about the panama-dev mailing list