[foreign-memaccess] Add direct access (DAX) support to MappedMemorySegment

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Tue Apr 6 13:59:56 UTC 2021


On 06/04/2021 14:46, Marcel Käufler wrote:
> Hi Maurizio,
>
> you're right, performance-wise slicing doesn't seem to be a problem as 
> it apparently gets optimized pretty well.
>
>
> Benchmark                                                Mode  Cnt         Score        Error   Units
> SliceBenchmark.measureIndexedForce                      thrpt    5  10478563.869 ± 115046.765   ops/s
> SliceBenchmark.measureIndexedForce:·gc.alloc.rate       thrpt    5        ≈ 10⁻⁵               MB/sec
> SliceBenchmark.measureIndexedForce:·gc.alloc.rate.norm  thrpt    5        ≈ 10⁻⁶                 B/op
> SliceBenchmark.measureIndexedForce:·gc.count            thrpt    5           ≈ 0               counts
> SliceBenchmark.measureSlicedForce                       thrpt    5  10670895.207 ±  19753.296   ops/s
> SliceBenchmark.measureSlicedForce:·gc.alloc.rate        thrpt    5        ≈ 10⁻⁵               MB/sec
> SliceBenchmark.measureSlicedForce:·gc.alloc.rate.norm   thrpt    5        ≈ 10⁻⁶                 B/op
> SliceBenchmark.measureSlicedForce:·gc.count             thrpt    5           ≈ 0               counts
>
>
> The only API-wise issue I see here is that it might not be obvious 
> that forcing without a slice can have serious performance 
> implications, and at first glance it looks like there's no way to 
> force changes precisely, as there is with MappedByteBuffer.
> This is also less of a problem with traditionally memory-mapped files, 
> as the file system keeps track of the dirty pages and only flushes 
> those on msync. Only when using DAX does force() take time linear 
> in the segment size, independent of the dirty cache lines.
> But I also see that extending the API might not be necessary if one is 
> aware of the MemorySegment philosophy: "Want to do something only on a 
> part of a segment -> slice it!".

I understand what you're saying. Perhaps we might consider adding 
something to the javadoc of MappedMemorySegments::force? E.g. "if 
working with big NVM-mapped files, please slice" ? :-)
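
For reference, the slice-then-force idiom being discussed corresponds to 
the ranged force() that MappedByteBuffer already has. Below is a minimal 
sketch using the MappedByteBuffer API on a regular file system (the class 
and method names are illustrative, not from the JDK); on a DAX mount you 
would map with jdk.nio.mapmode.ExtendedMapMode.READ_WRITE_SYNC instead, 
where a full-mapping force degrades to cache-line flushes over the whole 
segment and a ranged force (or a forced slice) keeps the cost 
proportional to the dirty region:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class RangedForce {
    // Map a file, dirty a small range, and flush only that range.
    // On a DAX file system you would pass ExtendedMapMode.READ_WRITE_SYNC
    // to map() instead of MapMode.READ_WRITE; the ranged force then
    // flushes only the cache lines covering [offset, offset + length).
    static byte writeAndForce(Path file, long size, int offset, int length)
            throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
            for (int i = 0; i < length; i++) {
                buf.put(offset + i, (byte) 42);   // dirty a small range
            }
            buf.force(offset, length);            // JDK 13+: flush only that range
            // The foreign-memory equivalent proposed in this thread would be
            // slicing first:
            //   MappedMemorySegments.force(segment.asSlice(offset, length));
            return buf.get(offset);
        }
    }
}
```

The segment version in the comment relies on asSlice() carrying the 
mapped-ness of the parent segment, which is exactly why slicing already 
covers the ranged-force use case.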

Cheers
Maurizio



>
>
> Best Regards
> Marcel
>
>
> On 06.04.21 12:04, Maurizio Cimadamore wrote:
>> Hi Marcel, replies inline
>>
>> On 03/04/2021 22:38, Marcel Käufler wrote:
>>> Hi all,
>>>
>>> I'm currently working with the Foreign Memory Access API and 
>>> (emulated) non-volatile RAM. With JDK 14, support for non-volatile 
>>> memory was added to MappedByteBuffer by mapping with 
>>> ExtendedMapMode.READ_ONLY_SYNC or ExtendedMapMode.READ_WRITE_SYNC.
>>> Calling force() on the MappedByteBuffer will then just flush caches 
>>> instead of invoking msync, and reading won't go through the page cache.
>>>
>>> MappedMemorySegment already builds on the same logic and would be 
>>> NVM-aware but unfortunately mapping with an ExtendedMapMode is 
>>> currently not supported. The only way to map a MemorySegment in sync 
>>> mode is to first map a ByteBuffer and then use 
>>> MemorySegment.ofByteBuffer() which of course comes with some 
>>> limitations.
>>>
>>> From my observation, the only issue is the openOptions() method in 
>>> MappedMemorySegmentImpl, which does not consider the two SYNC modes. 
>>> After adding the modes to the respective conditions, I was able to 
>>> call `MemorySegment.mapFile(path, offset, size, 
>>> ExtendedMapMode.READ_WRITE_SYNC)` and it worked just as expected.
>>>
>>>
>>>     private static OpenOption[] openOptions(FileChannel.MapMode mapMode) {
>>>         if (mapMode == FileChannel.MapMode.READ_ONLY ||
>>>             mapMode == ExtendedMapMode.READ_ONLY_SYNC) {
>>>             return new OpenOption[] { StandardOpenOption.READ };
>>>         } else if (mapMode == FileChannel.MapMode.READ_WRITE ||
>>>                    mapMode == FileChannel.MapMode.PRIVATE ||
>>>                    mapMode == ExtendedMapMode.READ_WRITE_SYNC) {
>>>             return new OpenOption[] { StandardOpenOption.READ,
>>>                                       StandardOpenOption.WRITE };
>>>         } else {
>>>             throw new UnsupportedOperationException("Unsupported map mode: " + mapMode);
>>>         }
>>>     }
>>>
>>> Is there anything against adding this?
>>
>> I agree there seems to be something odd here... this code was meant 
>> to replicate what was there in FileChannelImpl, but apparently 
>> something is amiss and the ExtendedMapMode modes have been left out.
>>
>> This should be fixed.
>>
>>>
>>>
>>> Additionally, MappedByteBuffer offers a `force(int index, int 
>>> length)` method, whereas for MappedMemorySegments there's only 
>>> `MappedMemorySegments.force(memorySegment)`.
>>> In DAX mode the latter is horribly slow because it iterates over the 
>>> whole segment in 64-byte steps to evict cache lines. A targeted 
>>> force can already be accomplished by slicing first and calling force 
>>> on the slice. When working on NVM and frequently flushing cache 
>>> lines, this creates a lot of throwaway MemorySegments for the GC to 
>>> collect. Admittedly, this overhead is probably negligible compared 
>>> to the NVM write, but a method with offset and length would be nice 
>>> to match the MappedByteBuffer API.
>>>
>>> Everything needed is also already present and it would be easy to 
>>> add a `force(MemorySegment segment, long offset, long length)`:
>>>
>>> In MappedMemorySegments:
>>>
>>>     public static void force(MemorySegment segment, long offset, 
>>> long length) {
>>>         toMappedSegment(segment).force(offset, length);
>>>     }
>>>
>>> In MappedMemorySegmentImpl:
>>>
>>>     public void force(long offset, long length) {
>>>         // uses checkBounds from AbstractMemorySegmentImpl if made
>>>         // protected (the out-of-bounds message with "new offset" and
>>>         // "new length" doesn't fit exactly, though)
>>>         checkBounds(offset, length);
>>>         SCOPED_MEMORY_ACCESS.force(scope, unmapper.fileDescriptor(),
>>>                                    min, unmapper.isSync(), offset, length);
>>>     }
>>>
>>> Thoughts on this?
>>
>> As discussed in other related topics [1], while I've nothing against 
>> the proposed method, do you have any benchmark showing that there is 
>> additional GC pressure, or slower throughput when using
>>
>> force(segment.asSlice(offset, length)) ?
>>
>> The reason I'm asking is that the API already has a way to create 
>> slices out of a segment, which supports all the possible overloads 
>> that a user might want (note that there are _four_ versions of 
>> asSlice). It would be sad to replicate all that into 
>> MappedMemorySegment, because what you are looking for here is, 
>> essentially, a slicing mechanism. Note also that, when Valhalla 
>> comes, the cost of creating slices should go down regardless of C2 
>> optimizations - so I'm wary of adding what looks like an 
>> "interim" API.
>>
>> Of course, if benchmarks show that, in this case, slice creation is 
>> a problem, I have no issue adding an escape hatch for the time being.
>>
>> (I suggest creating a JMH benchmark and then profiling with the JMH 
>> option "-prof gc", which shows the allocation rate.)
>>
>> Maurizio
>>
>> [1] - 
>> https://mail.openjdk.java.net/pipermail/panama-dev/2021-April/012897.html
>>
>>>
>>>
>>> Best Regards
>>> Marcel
>