Question: ByteBuffer vs MemorySegment for binary (de)serialization and in-memory buffer pool
Gavin Ray
ray.gavin97 at gmail.com
Fri Sep 2 17:16:35 UTC 2022
Thank you very much for the advice, I will implement these suggestions =)
On Fri, Sep 2, 2022 at 12:12 PM Radosław Smogura <mail at smogura.eu> wrote:
> Hi Gavin,
>
>
>
> I see you're making good progress.
>
>
>
> This is a good approach. A minor improvement would be to use
> MemorySegment.ofBuffer() to create a memory segment from a _*direct*_ byte
> buffer. That way you get consistency (using only MemorySegment) while
> still using FileChannel or other methods to manage the file size.
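>
> For example, an untested sketch (MappedPage is an invented name; this
> assumes the JDK 19 preview API):
>
> import java.io.IOException;
> import java.lang.foreign.MemorySegment;
> import java.nio.MappedByteBuffer;
> import java.nio.channels.FileChannel;
>
> class MappedPage {
>     final MappedByteBuffer mapped; // kept around for force()
>     final MemorySegment segment;   // all reads/writes go through this
>
>     MappedPage(FileChannel channel, long offset, long size) throws IOException {
>         // The channel manages the file; ofBuffer() gives a MemorySegment
>         // view over the same off-heap mapped memory, so no copy is made.
>         this.mapped = channel.map(FileChannel.MapMode.READ_WRITE, offset, size);
>         this.segment = MemorySegment.ofBuffer(mapped);
>     }
> }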
>
>
>
> Most probably you will want to use MappedByteBuffer.force() to flush
> changes to disk (the equivalent of sync on Linux), i.e. to be sure a
> transaction is persisted, or for a write-ahead log.
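>
> Continuing the MappedPage sketch above, a commit path might look like
> this (commit() is an invented name):
>
>     void commit() {
>         // msync-like barrier: block until the dirty mapped pages
>         // have been written back to the file.
>         mapped.force();
>     }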
>
>
>
> In most cases, if you want to work with zero-copy reads, you have to map
> the whole file as a direct buffer / memory segment. If you want to append
> new data, you need to enlarge the file first (most probably via the file
> channel, or other methods); otherwise a SIGBUS or segfault can be
> generated when you touch pages beyond the end of the file, which can
> result in an exception or a crash.
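>
> A rough, untested sketch of the grow-then-remap dance (the class name
> and the doubling grow policy are invented for illustration):
>
> import java.io.IOException;
> import java.nio.ByteBuffer;
> import java.nio.MappedByteBuffer;
> import java.nio.channels.FileChannel;
>
> class GrowableMapping {
>     private final FileChannel channel; // must be open for read and write
>     private MappedByteBuffer mapped;
>     private long mappedSize;
>
>     GrowableMapping(FileChannel channel) throws IOException {
>         this.channel = channel;
>         remap(channel.size());
>     }
>
>     // Grow the file first, then remap: touching mapped pages past the
>     // real end of the file is what raises SIGBUS/SIGSEGV.
>     void ensureCapacity(long requiredSize) throws IOException {
>         if (requiredSize > mappedSize) {
>             long newSize = Math.max(requiredSize, mappedSize * 2); // illustrative policy
>             channel.write(ByteBuffer.allocate(1), newSize - 1);    // extend the file
>             remap(newSize);
>         }
>     }
>
>     private void remap(long size) throws IOException {
>         mapped = channel.map(FileChannel.MapMode.READ_WRITE, 0, size);
>         mappedSize = size;
>     }
> }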
>
>
>
> You can compare the different approaches using JMH to measure read and
> write performance.
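>
> For instance, a minimal JMH sketch comparing the two access paths
> (class and field names are made up; it reads longs through a direct
> ByteBuffer vs. through a MemorySegment view of the same memory):
>
> import java.nio.ByteBuffer;
> import java.util.concurrent.TimeUnit;
> import java.lang.foreign.MemorySegment;
> import java.lang.foreign.ValueLayout;
> import org.openjdk.jmh.annotations.*;
>
> @State(Scope.Thread)
> @BenchmarkMode(Mode.AverageTime)
> @OutputTimeUnit(TimeUnit.NANOSECONDS)
> public class PageAccessBench {
>     static final int PAGE_SIZE = 4096;
>     ByteBuffer buffer;
>     MemorySegment segment;
>
>     @Setup
>     public void setup() {
>         buffer = ByteBuffer.allocateDirect(PAGE_SIZE);
>         segment = MemorySegment.ofBuffer(buffer); // same off-heap memory
>     }
>
>     @Benchmark
>     public long readViaByteBuffer() {
>         long sum = 0;
>         for (int i = 0; i < PAGE_SIZE; i += Long.BYTES) sum += buffer.getLong(i);
>         return sum;
>     }
>
>     @Benchmark
>     public long readViaSegment() {
>         long sum = 0;
>         for (int i = 0; i < PAGE_SIZE; i += Long.BYTES)
>             sum += segment.get(ValueLayout.JAVA_LONG_UNALIGNED, i); // skips alignment check
>         return sum;
>     }
> }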
>
>
>
> Kind regards,
>
> Rado Smogura
>
>
>
> *From: *Gavin Ray <ray.gavin97 at gmail.com>
> *Sent: *Friday, September 2, 2022 5:50 PM
> *To: *Johannes Lichtenberger <lichtenberger.johannes at gmail.com>
> *Cc: *Maurizio Cimadamore <maurizio.cimadamore at oracle.com>;
> panama-dev at openjdk.org
> *Subject: *Re: Question: ByteBuffer vs MemorySegment for binary
> (de)serialization and in-memory buffer pool
>
>
>
> On a related note, is there any way to do zero-copy reads from files using
> MemorySegments for non-Memory-Mapped files?
>
>
>
> Currently I'm using a SeekableByteChannel and wrapping the MemorySegment
> with .asByteBuffer().
>
> Is this the most performant way?
>
>
>
> ========================
>
>
>
> import java.io.IOException;
> import java.io.RandomAccessFile;
> import java.lang.foreign.MemorySegment;
> import java.nio.channels.SeekableByteChannel;
>
> class DiskManager {
>
>     private final RandomAccessFile raf;
>     private final SeekableByteChannel dbFileChannel;
>
>     // Reads one page from disk into the caller-supplied off-heap buffer.
>     public void readPage(PageId pageId, MemorySegment pageBuffer) throws IOException {
>         long pageOffset = (long) pageId.value() * Constants.PAGE_SIZE; // long math avoids overflow
>         dbFileChannel.position(pageOffset);
>         dbFileChannel.read(pageBuffer.asByteBuffer());
>     }
>
>     // Writes one page from the off-heap buffer back to disk.
>     public void writePage(PageId pageId, MemorySegment pageBuffer) throws IOException {
>         long pageOffset = (long) pageId.value() * Constants.PAGE_SIZE;
>         dbFileChannel.position(pageOffset);
>         dbFileChannel.write(pageBuffer.asByteBuffer());
>     }
> }
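>
> For context, a hypothetical caller might look like this (using
> MemorySession/allocateNative from the JDK 19 preview API):
>
>     try (MemorySession session = MemorySession.openConfined()) {
>         MemorySegment page = MemorySegment.allocateNative(Constants.PAGE_SIZE, session);
>         diskManager.readPage(new PageId(0), page);
>     }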
>
>
>
> On Thu, Sep 1, 2022 at 6:13 PM Johannes Lichtenberger <
> lichtenberger.johannes at gmail.com> wrote:
>
> I think it's a really good idea to use off-heap memory for the Buffer
> Manager / the pages with the stored records. In my case, I'm working on
> an immutable, persistent DBMS that currently stores JSON and XML, with
> only one read-write trx per resource concurrently and, if desired, N
> read-only trx in parallel, bound to specific revisions (in the
> relational world the term for a resource is a relation/table). During an
> import of a close to 4Gb JSON file with intermediate commits, I found
> that, depending on the number of records/nodes accumulated in the trx
> intent log (more or less a trx-private map) before a commit, and thus a
> sync to disk that removes the pages from the log, is issued, the GC runs
> take >= 100ms most of the time. The objects are long-lived and obviously
> promoted to the old gen, which seems to be what takes these >= 100ms.
> I'll have to study how Shenandoah works, but in this case it brings no
> advantage regarding latency.
>
>
>
> Maybe it would make sense to store the data in the record instances
> off-heap as well, as Gavin did with his simple Buffer Manager :-) That
> said, lowering the maximum number of records after which to commit and
> sync to disk also has a tremendous effect, and with Shenandoah the GC
> times are then less than a few ms.
>
>
>
> However, I'm already using the Foreign Memory API to store the data in
> memory-mapped files: once the pages (or page fragments) and the records
> therein are serialized, they are written to the memory segment after
> compression and, hopefully soon, encryption.
>
>
>
> Kind regards
>
> Johannes
>
>
>
>
>
>
>
> On Thu, Sep 1, 2022 at 10:52 PM Maurizio Cimadamore <
> maurizio.cimadamore at oracle.com> wrote:
>
>
> On 01/09/2022 19:26, Gavin Ray wrote:
> > I think this is where my impression of verbosity is coming from: in
> > [1] I've linked a gist of a ByteBuffer vs MemorySegment implementation
> > of a page header struct, and it's the layouts/var handles that are
> > really the only difference.
> >
> Ok, I see what you mean, of course; thanks for the Gist.
>
> In this case I think the instance accessors we added on MemorySegment
> will bring the code more or less back to the shape it had with the
> ByteBuffer API.
>
> Using var handles is very useful when you want to access elements (e.g.
> structs inside other structs inside arrays) as it takes all the offset
> computation out of the way.
>
> If you're happy enough with hardwired offsets (and I agree that in this
> case things might be good enough), then there's nothing wrong with using
> the ready-made accessor methods.
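>
> To make the two styles concrete, a small sketch assuming the JDK 19
> preview API (the layout and field names are invented; var handle
> coordinate types differ in later JDKs):
>
> import java.lang.foreign.MemoryLayout;
> import java.lang.foreign.MemorySegment;
> import java.lang.foreign.ValueLayout;
> import java.lang.invoke.VarHandle;
>
> class AccessStyles {
>     // Layout + var handle: offsets are derived from the layout path.
>     static final MemoryLayout HEADER = MemoryLayout.structLayout(
>             ValueLayout.JAVA_INT.withName("pageId"),
>             ValueLayout.JAVA_INT.withName("numSlots"));
>     static final VarHandle NUM_SLOTS =
>             HEADER.varHandle(MemoryLayout.PathElement.groupElement("numSlots"));
>
>     static void demo(MemorySegment header) {
>         NUM_SLOTS.set(header, 12);                   // var-handle style, offset computed
>         int n = header.get(ValueLayout.JAVA_INT, 4); // ready-made accessor, hardwired offset
>     }
> }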
>
> Maurizio
>
>
>