Question: ByteBuffer vs MemorySegment for binary (de)serialization and in-memory buffer pool
Gavin Ray
ray.gavin97 at gmail.com
Sat Sep 3 22:26:00 UTC 2022
Radosław, I tried to implement your advice, but I think I might have
implemented it incorrectly.
With JMH, I get very poor results:
Benchmark                                                Mode  Cnt       Score        Error  Units
DiskManagerBenchmarks._01_01_writePageDiskManager       thrpt    4  552595.393 ±  77869.814  ops/s
DiskManagerBenchmarks._01_02_writePageMappedDiskManager thrpt    4     174.588 ±    111.846  ops/s
DiskManagerBenchmarks._02_01_readPageDiskManager        thrpt    4  640469.183 ± 104851.381  ops/s
DiskManagerBenchmarks._02_02_readPageMappedDiskManager  thrpt    4  133564.674 ±  10693.985  ops/s
For writes the difference is ~550,000 vs. 174(!) ops/s;
for reads it is ~640,000 vs. ~130,000 ops/s.
This is the implementation code:
public void readPage(PageId pageId, MemorySegment pageBuffer) throws IOException {
    long pageOffset = (long) pageId.value() * Constants.PAGE_SIZE; // long math avoids int overflow
    MemorySegment mappedBuffer = raf.getChannel().map(
            FileChannel.MapMode.READ_WRITE, pageOffset, Constants.PAGE_SIZE, session);
    mappedBuffer.load();
    pageBuffer.copyFrom(mappedBuffer);
    mappedBuffer.unload();
}

public void writePage(PageId pageId, MemorySegment pageBuffer) throws IOException {
    long pageOffset = (long) pageId.value() * Constants.PAGE_SIZE;
    MemorySegment mappedBuffer = raf.getChannel().map(
            FileChannel.MapMode.READ_WRITE, pageOffset, Constants.PAGE_SIZE, session);
    mappedBuffer.copyFrom(pageBuffer);
    mappedBuffer.force(); // sync this page to disk on every write
    mappedBuffer.unload();
}
Am I doing something wrong here? (I think I probably am.)
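For comparison, here is a sketch that maps the file once up front and then
serves each page as a slice copy, avoiding the per-call map() + force()
that likely dominates the numbers above. It assumes the finalized FFM API
(JDK 22+); PAGE_SIZE, the page count, and the class name are made up for
illustration:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

public class MappedDiskManager implements AutoCloseable {
    static final int PAGE_SIZE = 4096; // illustrative

    private final RandomAccessFile raf;
    private final Arena arena = Arena.ofConfined();
    private final MemorySegment mapped; // one mapping for the file's lifetime

    public MappedDiskManager(Path file, long numPages) throws IOException {
        raf = new RandomAccessFile(file.toFile(), "rw");
        raf.setLength(numPages * (long) PAGE_SIZE); // size up front; touching past EOF risks SIGBUS
        mapped = raf.getChannel().map(
                FileChannel.MapMode.READ_WRITE, 0, numPages * (long) PAGE_SIZE, arena);
    }

    public void readPage(int pageId, MemorySegment pageBuffer) {
        long off = (long) pageId * PAGE_SIZE; // long math avoids int overflow
        pageBuffer.copyFrom(mapped.asSlice(off, PAGE_SIZE));
    }

    public void writePage(int pageId, MemorySegment pageBuffer) {
        long off = (long) pageId * PAGE_SIZE;
        mapped.asSlice(off, PAGE_SIZE).copyFrom(pageBuffer);
        // No force() per page: let the OS write back, force() only at commit points.
    }

    @Override
    public void close() throws IOException {
        mapped.force(); // flush once on close
        arena.close();  // unmaps the segment
        raf.close();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("pages", ".db");
        try (MappedDiskManager dm = new MappedDiskManager(tmp, 16)) {
            byte[] data = new byte[PAGE_SIZE];
            data[0] = 42;
            dm.writePage(3, MemorySegment.ofArray(data));
            byte[] back = new byte[PAGE_SIZE];
            dm.readPage(3, MemorySegment.ofArray(back));
            System.out.println(back[0]); // prints 42
        }
        Files.deleteIfExists(tmp);
    }
}
```

Whether batching force() is acceptable depends on the durability the
buffer pool needs at each commit.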
On Fri, Sep 2, 2022 at 1:16 PM Gavin Ray <ray.gavin97 at gmail.com> wrote:
> Thank you very much for the advice, I will implement these suggestions =)
>
> On Fri, Sep 2, 2022 at 12:12 PM Radosław Smogura <mail at smogura.eu> wrote:
>
>> Hi Gavin,
>>
>>
>>
>> I see you are making good progress.
>>
>>
>>
>> This is a good approach. A minor improvement would be to use
>> MemorySegment.ofBuffer() to create a memory segment from a _*direct*_
>> byte buffer. That way you get consistency (using only MemorySegment)
>> while still using FileChannel or other methods to manage the file size.
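A minimal sketch of the MemorySegment.ofBuffer() suggestion, assuming the
finalized FFM API (JDK 22+); the class name is made up:

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class OfBufferDemo {
    public static void main(String[] args) {
        // A direct buffer and a segment view of it share the same off-heap memory.
        ByteBuffer direct = ByteBuffer.allocateDirect(4096).order(ByteOrder.nativeOrder());
        MemorySegment seg = MemorySegment.ofBuffer(direct); // zero-copy view

        seg.set(ValueLayout.JAVA_INT, 0, 1234); // write through the segment...
        System.out.println(direct.getInt(0));   // ...read through the buffer: prints 1234
    }
}
```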
>>
>>
>>
>> Most probably you will want to use MappedByteBuffer.force() to flush
>> changes to disk (the equivalent of sync on Linux), i.e. to be sure a
>> transaction is persisted, or for a write-ahead log.
>>
>>
>>
>> In most cases, if you want zero-copy reads, you have to map the whole
>> file as a direct buffer / memory segment. You would then need to
>> enlarge the file (most probably using a FileChannel, or other methods)
>> if you want to append new data; otherwise a SIGBUS or SIGSEGV can be
>> generated, which can result in an exception or a crash.
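One way to grow the file before mapping, so appends never touch pages past
EOF, is sketched below; the helper name and sizes are made up:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class GrowBeforeMap {
    // Extend the file to newSize if it is smaller; never shrinks.
    static void ensureCapacity(FileChannel ch, long newSize) throws IOException {
        if (ch.size() < newSize) {
            // Writing a single byte at newSize - 1 extends the file to newSize.
            ch.write(ByteBuffer.wrap(new byte[1]), newSize - 1);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("grow", ".db");
        try (FileChannel ch = FileChannel.open(tmp,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            ensureCapacity(ch, 1 << 20); // 1 MiB
            System.out.println(ch.size()); // prints 1048576
        }
        Files.deleteIfExists(tmp);
    }
}
```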
>>
>>
>>
>> You can compare the different approaches using JMH, measuring read and
>> write performance.
>>
>>
>>
>> Kind regards,
>>
>> Rado Smogura
>>
>>
>>
>> *From: *Gavin Ray <ray.gavin97 at gmail.com>
>> *Sent: *Friday, September 2, 2022 5:50 PM
>> *To: *Johannes Lichtenberger <lichtenberger.johannes at gmail.com>
>> *Cc: *Maurizio Cimadamore <maurizio.cimadamore at oracle.com>;
>> panama-dev at openjdk.org
>> *Subject: *Re: Question: ByteBuffer vs MemorySegment for binary
>> (de)serialization and in-memory buffer pool
>>
>>
>>
>> On a related note, is there any way to do zero-copy reads from files
>> using MemorySegments for non-Memory-Mapped files?
>>
>>
>>
>> Currently I'm using "SeekableByteChannel" and wrapping the MemorySegment
>> using ".asByteBuffer()"
>>
>> Is this the most performant way?
>>
>>
>>
>> ========================
>>
>>
>>
>> class DiskManager {
>>     private final RandomAccessFile raf;
>>     private final SeekableByteChannel dbFileChannel;
>>
>>     public void readPage(PageId pageId, MemorySegment pageBuffer) {
>>         int pageOffset = pageId.value() * Constants.PAGE_SIZE;
>>         dbFileChannel.position(pageOffset);
>>         dbFileChannel.read(pageBuffer.asByteBuffer());
>>     }
>>
>>     public void writePage(PageId pageId, MemorySegment pageBuffer) {
>>         int pageOffset = pageId.value() * Constants.PAGE_SIZE;
>>         dbFileChannel.position(pageOffset);
>>         dbFileChannel.write(pageBuffer.asByteBuffer());
>>     }
>> }
>>
>>
>>
>> On Thu, Sep 1, 2022 at 6:13 PM Johannes Lichtenberger <
>> lichtenberger.johannes at gmail.com> wrote:
>>
>> I think it's a really good idea to use off-heap memory for the Buffer
>> Manager / the pages with the stored records. In my case, I'm working on
>> an immutable, persistent DBMS that currently stores JSON and XML, with
>> only one read-write trx per resource concurrently and, if desired, N
>> read-only trx in parallel, each bound to a specific revision (in the
>> relational world the term for a resource is a relation/table). During an
>> import of a close-to-4Gb JSON file with intermediate commits, I found
>> that, depending on the number of records/nodes accumulated in the trx
>> intent log (more or less a trx-private map) after which a commit, and
>> thus a sync to disk with removal of the pages from the log, is issued,
>> the GC runs are >= 100ms most of the time. The objects are long-lived
>> and obviously promoted to the old gen, which seems to account for these
>> >= 100ms. I'll have to study how Shenandoah works, but in this case it
>> brings no advantage regarding latency.
>>
>>
>>
>> Maybe it would make sense to store the data in the record instances
>> off-heap as well, as Gavin did with his simple Buffer Manager :-) That
>> said, lowering the max number of records after which to commit and sync
>> to disk also has a tremendous effect, and with Shenandoah the GC times
>> are at most a few ms.
>>
>>
>>
>> However, I'm already using the Foreign Memory API to store the data in
>> memory-mapped files: the pages (or page fragments) and the records
>> therein are serialized and then written to the memory segment after
>> compression and, hopefully soon, encryption.
>>
>>
>>
>> Kind regards
>>
>> Johannes
>>
>>
>>
>>
>>
>>
>>
>> Am Do., 1. Sept. 2022 um 22:52 Uhr schrieb Maurizio Cimadamore <
>> maurizio.cimadamore at oracle.com>:
>>
>>
>> On 01/09/2022 19:26, Gavin Ray wrote:
>> > I think this is where my impression of verbosity is coming from, in
>> > [1] I've linked a gist of ByteBuffer vs MemorySegment implementation
>> > of a page header struct,
>> > and it's the layout/varhandles that are the only difference, really.
>> >
>> Ok, I see what you mean, of course; thanks for the Gist.
>>
>> In this case I think the instance accessor we added on MemorySegment
>> will bring the code more or less to the same shape as what it used to be
>> with the ByteBuffer API.
>>
>> Using var handles is very useful when you want to access elements (e.g.
>> structs inside other structs inside arrays) as it takes all the offset
>> computation out of the way.
>>
>> If you're happy enough with hardwired offsets (and I agree that in this
>> case things might be good enough), then there's nothing wrong with using
>> the ready-made accessor methods.
>>
>> Maurizio
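To make the hardwired-offset vs. var-handle trade-off above concrete, here
is a minimal sketch assuming the finalized FFM API (JDK 22+); the header
fields are invented for illustration:

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.StructLayout;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.VarHandle;

public class HeaderAccess {
    public static void main(String[] args) {
        // A hypothetical page-header struct: two 4-byte ints.
        StructLayout header = MemoryLayout.structLayout(
                ValueLayout.JAVA_INT.withName("pageId"),
                ValueLayout.JAVA_INT.withName("numSlots"));

        try (Arena arena = Arena.ofConfined()) {
            MemorySegment seg = arena.allocate(header);

            // Var-handle style: the offset is computed from the layout.
            VarHandle numSlots = header.varHandle(
                    MemoryLayout.PathElement.groupElement("numSlots"));
            numSlots.set(seg, 0L, 42);

            // Instance-accessor style: hardwired offset 4.
            System.out.println(seg.get(ValueLayout.JAVA_INT, 4)); // prints 42
        }
    }
}
```

Both styles touch the same bytes; the layout/var-handle form pays off once
structs nest or sit inside arrays, while plain offsets stay shorter for a
flat header like this.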
>>
>>
>>
>