[foreign-memaccess] Add direct access (DAX) support to MappedMemorySegment
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Tue Apr 6 10:04:18 UTC 2021
Hi Marcel, replies inline
On 03/04/2021 22:38, Marcel Käufler wrote:
> Hi all,
>
> I'm currently working with the Foreign Memory Access API and
> (emulated) non-volatile RAM. With JDK 14, support for non-volatile
> memory was added to MappedByteBuffers by mapping with
> ExtendedMapMode.READ_ONLY_SYNC or ExtendedMapMode.READ_WRITE_SYNC.
> Calling force() on such a MappedByteBuffer then just flushes cache
> lines instead of invoking msync, and reads also bypass the page cache.
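>
> A minimal sketch of that usage (assuming a file on a DAX-mounted
> filesystem; the path and sizes here are made up):
>
> // assumes: import jdk.nio.mapmode.ExtendedMapMode; (JEP 352, JDK 14+)
> Path path = Path.of("/mnt/pmem/data.bin");   // hypothetical DAX-backed file
> try (FileChannel fc = FileChannel.open(path,
>         StandardOpenOption.READ, StandardOpenOption.WRITE)) {
>     MappedByteBuffer mbb = fc.map(ExtendedMapMode.READ_WRITE_SYNC, 0, 4096);
>     mbb.putLong(0, 42L);
>     mbb.force();   // writes back the dirty cache lines, no msync
> }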
>
> MappedMemorySegment already builds on the same logic and would be
> NVM-aware but unfortunately mapping with an ExtendedMapMode is
> currently not supported. The only way to map a MemorySegment in sync
> mode is to first map a ByteBuffer and then use
> MemorySegment.ofByteBuffer() which of course comes with some limitations.
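>
> For reference, a minimal sketch of that workaround (path and size are
> made up; the buffer-backed segment inherits ByteBuffer limits such as
> the 2 GB size cap):
>
> // assumes: import jdk.incubator.foreign.MemorySegment;
> //          import jdk.nio.mapmode.ExtendedMapMode;
> try (FileChannel fc = FileChannel.open(Path.of("/mnt/pmem/data.bin"),
>         StandardOpenOption.READ, StandardOpenOption.WRITE)) {
>     MappedByteBuffer mbb = fc.map(ExtendedMapMode.READ_WRITE_SYNC, 0, 1 << 20);
>     MemorySegment segment = MemorySegment.ofByteBuffer(mbb);
>     // ... sync-mapped, but tied to the ByteBuffer it wraps ...
> }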
>
> From my observation, the only issue is the openOptions() method in
> MappedMemorySegmentImpl, which does not consider the two SYNC modes.
> After adding the modes to the respective conditions I was able to call
> `MemorySegment.mapFile(path, offset, size,
> ExtendedMapMode.READ_WRITE_SYNC)` and it worked just as expected.
>
>
> private static OpenOption[] openOptions(FileChannel.MapMode mapMode) {
>     if (mapMode == FileChannel.MapMode.READ_ONLY ||
>             mapMode == ExtendedMapMode.READ_ONLY_SYNC) {
>         return new OpenOption[] { StandardOpenOption.READ };
>     } else if (mapMode == FileChannel.MapMode.READ_WRITE ||
>             mapMode == FileChannel.MapMode.PRIVATE ||
>             mapMode == ExtendedMapMode.READ_WRITE_SYNC) {
>         return new OpenOption[] { StandardOpenOption.READ, StandardOpenOption.WRITE };
>     } else {
>         throw new UnsupportedOperationException("Unsupported map mode: " + mapMode);
>     }
> }
>
> Is there anything against adding this?
I agree there seems to be something odd here... this code was meant to
replicate what was there in FileChannelImpl, but apparently something is
amiss and the ExtendedMapMode modes have been left out.
This should be fixed.
>
>
> Additionally, MappedByteBuffer offers a `force(int index, int length)`
> method, whereas for MappedMemorySegments there's only
> `MappedMemorySegments.force(memorySegment)`.
> In DAX mode the latter is horribly slow because it iterates over the
> whole segment in 64-byte steps to evict cache lines. A targeted force
> can already be accomplished by slicing first and calling force on the
> slice. When working on NVM and frequently flushing cache lines, this
> creates a lot of throwaway MemorySegments for the GC to collect.
> Admittedly, this overhead is probably negligible compared to the NVM
> write, but a method with offset and length would be nice to match the
> MappedByteBuffer API.
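>
> As a concrete example of the pattern (a sketch with made-up offsets,
> using the incubator's MemoryAccess helpers), every targeted flush
> allocates a short-lived slice:
>
> MemoryAccess.setLongAtOffset(segment, offset, value);
> MappedMemorySegments.force(segment.asSlice(offset, Long.BYTES));  // throwaway slice per flush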
>
> Everything needed is also already present and it would be easy to add
> a `force(MemorySegment segment, long offset, long length)`:
>
> In MappedMemorySegments:
>
> public static void force(MemorySegment segment, long offset, long length) {
>     toMappedSegment(segment).force(offset, length);
> }
>
> In MappedMemorySegmentImpl:
>
> public void force(long offset, long length) {
>     // checkBounds could be reused from AbstractMemorySegmentImpl if made
>     // protected (its out-of-bounds message talks about "new offset" and
>     // "new length", which doesn't fit exactly here, though)
>     checkBounds(offset, length);
>     SCOPED_MEMORY_ACCESS.force(scope, unmapper.fileDescriptor(),
>             min, unmapper.isSync(), offset, length);
> }
>
> Thoughts on this?
As discussed in other related topics [1], while I've nothing against the
proposed method, do you have any benchmark showing that there is
additional GC pressure, or slower throughput, when using
force(segment.asSlice(offset, length))?
The reason I'm asking is that the API already has a way to create slices
out of a segment, which supports all the possible overloads that a user
might want to use (note that there are _four_ versions of asSlice). It
would be sad to replicate all that into MappedMemorySegment, because
what you are looking for here is, essentially, a slicing mechanism. Note
also that, when Valhalla comes, the cost of creating slices should go
down regardless of C2 optimizations - so I'm wary here of adding what
looks like an "interim" API.
Of course if benchmarks show that, in this case, slice creation is a
problem I have no issue adding an escape hatch for the time being.
(I suggest creating a JMH benchmark and then profiling with the JMH
option "-prof gc" which shows allocation rate).
Maurizio
[1] -
https://mail.openjdk.java.net/pipermail/panama-dev/2021-April/012897.html
>
>
> Best Regards
> Marcel