<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1250">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
.MsoChpDefault
{mso-style-type:export-only;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style>
</head>
<body lang="EN-US" link="blue" vlink="#954F72" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal">Hi Gavin,</p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">I see you are making good progress.</p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">This is a good approach. A minor improvement would be to use MemorySegment.ofBuffer() to create a memory segment from a <i>direct</i> byte buffer. This way you would have consistency (working only with MemorySegment) while still using FileChannel
or other methods to manage the file size.</p>
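<p class="MsoNormal">As a minimal sketch of the suggestion above, assuming Java 22 or later (where java.lang.foreign is a final API): a direct ByteBuffer is wrapped with MemorySegment.ofBuffer() and then accessed only through the segment. The class name, PAGE_SIZE and the offsets are hypothetical and not taken from the thread.</p>
<pre>
```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.ByteBuffer;

// Sketch only: assumes Java 22+. Names and sizes are illustrative.
class DirectBufferSegmentDemo {
    static final int PAGE_SIZE = 4096;

    // Writes an int through the segment and reads it back through the segment.
    static int roundTrip(int value) {
        ByteBuffer direct = ByteBuffer.allocateDirect(PAGE_SIZE);
        // The segment is a view over the buffer's off-heap memory (no copy).
        MemorySegment page = MemorySegment.ofBuffer(direct);
        page.set(ValueLayout.JAVA_INT, 0L, value);
        return page.get(ValueLayout.JAVA_INT, 0L);
    }

    // Checks that a write through the segment is visible through the buffer.
    static boolean sharesMemoryWithBuffer() {
        ByteBuffer direct = ByteBuffer.allocateDirect(PAGE_SIZE);
        MemorySegment page = MemorySegment.ofBuffer(direct);
        page.set(ValueLayout.JAVA_BYTE, 8L, (byte) 7);
        if (direct.get(8) != 7) {
            return false;
        }
        // A segment created from a direct buffer is a native segment.
        return page.isNative();
    }
}
```
</pre>
<p class="MsoNormal">The buffer stays available for channel I/O, while the rest of the code can work with the segment only.</p>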
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">You will most probably want to use MappedByteBuffer.force() to flush changes to disk (the equivalent of msync on Linux), e.g. to be sure a transaction is persisted, or for a write-ahead log.</p>
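<p class="MsoNormal">A small sketch of that flush step, using only java.nio: a value is written through a mapping, force() is called, and the value is then read back with ordinary file I/O. The class name, the temporary file and PAGE_SIZE are assumptions made for illustration.</p>
<pre>
```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch only: names, file and sizes are illustrative.
class ForceDemo {
    static final int PAGE_SIZE = 4096;

    // Writes a value through a mapping, forces it, then reads it back from the file.
    static long writeForceAndReadBack(long value) {
        try {
            Path file = Files.createTempFile("force-demo", ".db");
            file.toFile().deleteOnExit();
            // Size the file up front so the mapping lies fully inside it.
            Files.write(file, new byte[PAGE_SIZE]);
            try (FileChannel ch = FileChannel.open(file,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                MappedByteBuffer page = ch.map(FileChannel.MapMode.READ_WRITE, 0, PAGE_SIZE);
                page.putLong(0, value);
                // Ask the OS to write the modified mapped pages to the storage device.
                page.force();
            }
            byte[] onDisk = Files.readAllBytes(file);
            return ByteBuffer.wrap(onDisk).getLong(0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```
</pre>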
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">In most cases, if you want to work with zero-copy reads, you have to map the whole file as a direct buffer / memory segment. If you want to append new data, you first need to enlarge the file (most probably through the file channel, or by other methods); otherwise
accessing the mapping beyond the end of the file can raise SIGBUS or SIGSEGV, which can result in an exception or a crash.</p>
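<p class="MsoNormal">The enlarge-then-remap step could look roughly like the following, using only java.nio. Growing the file by writing one byte at the new last position is just one possible method, and all names here (grow, mapWhole, PAGE_SIZE) are hypothetical.</p>
<pre>
```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch only: names, file and sizes are illustrative.
class GrowAndRemapDemo {
    static final int PAGE_SIZE = 4096;

    // Grows the file to newSize by writing one zero byte at its new last position.
    static void grow(FileChannel ch, long newSize) throws IOException {
        if (newSize > ch.size()) {
            ch.write(ByteBuffer.wrap(new byte[] {0}), newSize - 1);
        }
    }

    // Maps the whole file at its current size.
    static MappedByteBuffer mapWhole(FileChannel ch) throws IOException {
        return ch.map(FileChannel.MapMode.READ_WRITE, 0, ch.size());
    }

    // Appends a second page; returns {file size, first page value, second page value}.
    static long[] appendSecondPage() {
        try {
            Path file = Files.createTempFile("grow-demo", ".db");
            file.toFile().deleteOnExit();
            Files.write(file, new byte[PAGE_SIZE]);
            try (FileChannel ch = FileChannel.open(file,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                MappedByteBuffer map = mapWhole(ch);
                map.putInt(0, 1);
                // Enlarge first, then remap: the old mapping only covers the old size.
                grow(ch, 2L * PAGE_SIZE);
                map = mapWhole(ch);
                map.putInt(PAGE_SIZE, 2);
                return new long[] {ch.size(), map.getInt(0), map.getInt(PAGE_SIZE)};
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```
</pre>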
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">You can compare the different approaches using JMH to measure read and write performance.</p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Kind regards,</p>
<p class="MsoNormal">Rado Smogura</p>
<p class="MsoNormal"><o:p> </o:p></p>
<div style="mso-element:para-border-div;border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal" style="border:none;padding:0in"><b>From: </b><a href="mailto:ray.gavin97@gmail.com">Gavin Ray</a><br>
<b>Sent: </b>Friday, September 2, 2022 5:50 PM<br>
<b>To: </b><a href="mailto:lichtenberger.johannes@gmail.com">Johannes Lichtenberger</a><br>
<b>Cc: </b><a href="mailto:maurizio.cimadamore@oracle.com">Maurizio Cimadamore</a>;
<a href="mailto:panama-dev@openjdk.org">panama-dev@openjdk.org</a><br>
<b>Subject: </b>Re: Question: ByteBuffer vs MemorySegment for binary (de)serializiation and in-memory buffer pool</p>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<div>
<p class="MsoNormal">On a related note, is there any way to do zero-copy reads from files using MemorySegments for non-Memory-Mapped files?<o:p></o:p></p>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Currently I'm using "SeekableByteChannel" and wrapping the MemorySegment using ".asByteBuffer()"<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">Is this the most performant way?<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">========================<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<div>
<p class="MsoNormal">class DiskManager {<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"> private final RandomAccessFile raf;<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"> private final SeekableByteChannel dbFileChannel;<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal"> public void readPage(PageId pageId, MemorySegment pageBuffer) throws IOException {<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"> long pageOffset = (long) pageId.value() * Constants.PAGE_SIZE; // long avoids int overflow on large files<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"> dbFileChannel.position(pageOffset);<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"> dbFileChannel.read(pageBuffer.asByteBuffer());<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"> }<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal"> public void writePage(PageId pageId, MemorySegment pageBuffer) throws IOException {<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"> long pageOffset = (long) pageId.value() * Constants.PAGE_SIZE; // long avoids int overflow on large files<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"> dbFileChannel.position(pageOffset);<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"> dbFileChannel.write(pageBuffer.asByteBuffer());<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"> }<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">}<o:p></o:p></p>
</div>
</div>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<p class="MsoNormal">On Thu, Sep 1, 2022 at 6:13 PM Johannes Lichtenberger <<a href="mailto:lichtenberger.johannes@gmail.com">lichtenberger.johannes@gmail.com</a>> wrote:<o:p></o:p></p>
</div>
<blockquote style="border:none;border-left:solid #CCCCCC 1.0pt;padding:0in 0in 0in 6.0pt;margin-left:4.8pt;margin-right:0in">
<div>
<p class="MsoNormal">I think it's a really good idea to use off-heap memory for the Buffer Manager and the pages with the stored records. In my case, I'm working on an immutable, persistent DBMS that currently stores JSON and XML, with only one read-write trx per resource
at a time and, if desired, N read-only trx bound to specific revisions running in parallel (in the relational world the term for a resource is a relation/table). During an import of a JSON file of close to 4 GB with intermediate commits, I found that the GC pauses depend
on the number of records/nodes accumulated in the trx intent log (more or less a trx-private map) before a commit is issued, which syncs to disk and removes the pages from the log. Most of the time the GC runs take >= 100 ms; the objects are long-lived
and are obviously promoted to the old gen, which seems to account for these >= 100 ms. I'll have to study how Shenandoah works, but in this case it brings no advantage regarding latency.<o:p></o:p></p>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Maybe it would make sense to also store the data in the record instances off-heap, as Gavin did with his simple Buffer Manager :-) That said, lowering the maximum number of records after which to commit and sync to disk also has a tremendous effect,
and with Shenandoah the GC times are, at least, less than a few ms.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">However, I'm already using the Foreign Memory API to store the data in memory-mapped files: the pages (or page fragments) and the records therein are serialized and then written to the memory segment after compression and, hopefully soon,
encryption.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Kind regards<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">Johannes<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<p class="MsoNormal">On Thu, Sep 1, 2022 at 10:52 PM Maurizio Cimadamore <<a href="mailto:maurizio.cimadamore@oracle.com" target="_blank">maurizio.cimadamore@oracle.com</a>> wrote:<o:p></o:p></p>
</div>
</div>
</blockquote>
</div>
</div>
<p class="MsoNormal" style="mso-margin-top-alt:0in;margin-right:0in;margin-bottom:12.0pt;margin-left:9.6pt">
<br>
On 01/09/2022 19:26, Gavin Ray wrote:<br>
> I think this is where my impression of verbosity is coming from, in <br>
> [1] I've linked a gist of ByteBuffer vs MemorySegment implementation <br>
> of a page header struct,<br>
> and it's the layout/varhandles that are the only difference, really.<br>
><br>
Ok, I see what you mean, of course; thanks for the Gist.<br>
<br>
In this case I think the instance accessor we added on MemorySegment <br>
will bring the code more or less to the same shape as what it used to be <br>
with the ByteBuffer API.<br>
<br>
Using var handles is very useful when you want to access elements (e.g. <br>
structs inside other structs inside arrays) as it takes all the offset <br>
computation out of the way.<br>
<br>
If you're happy enough with hardwired offsets (and I agree that in this <br>
case things might be good enough), then there's nothing wrong with using <br>
the ready-made accessor methods.<br>
<br>
Maurizio<o:p></o:p></p>
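<p class="MsoNormal">To illustrate the two styles discussed above, here is a rough sketch assuming Java 22 or later, where var handles derived from a layout take an extra base-offset argument. The header layout with its pageId and slotCount fields is hypothetical and is not the struct from the linked gist.</p>
<pre>
```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.StructLayout;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.VarHandle;

// Sketch only: assumes Java 22+. The header fields are illustrative.
class PageHeaderDemo {
    static final StructLayout HEADER = MemoryLayout.structLayout(
            ValueLayout.JAVA_INT.withName("pageId"),
            ValueLayout.JAVA_INT.withName("slotCount"));

    // Var handle style: the field offset is derived from the layout.
    static final VarHandle SLOT_COUNT =
            HEADER.varHandle(MemoryLayout.PathElement.groupElement("slotCount"));

    // Hardwired style: the offset is kept in sync with the layout by hand.
    static final long SLOT_COUNT_OFFSET = 4L;

    // Writes with one style and reads with the other, in both directions.
    static int[] writeBothWays() {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment header = arena.allocate(HEADER);
            // Second argument is the base offset of the struct within the segment.
            SLOT_COUNT.set(header, 0L, 7);
            int viaAccessor = header.get(ValueLayout.JAVA_INT, SLOT_COUNT_OFFSET);
            header.set(ValueLayout.JAVA_INT, SLOT_COUNT_OFFSET, 9);
            int viaVarHandle = (int) SLOT_COUNT.get(header, 0L);
            return new int[] {viaAccessor, viaVarHandle};
        }
    }
}
```
</pre>
<p class="MsoNormal">Both styles address the same four bytes; the var handle mainly pays off once the struct is nested or repeated in an array.</p>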
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</body>
</html>