<div dir="auto"><p dir="ltr">So one of the "tricks" either way is to have a MemorySegment that spans the entire address space, with a size of Long.MAX_VALUE (MemorySegment.NULL.reinterpret(Long.MAX_VALUE)). But it seems unintuitive to have to allocate memory in a separate step (via an Arena, for instance), take its address, and then copy data to that address through the "ALL" MemorySegment. I almost lost track, but at least the VarHandle trick should not be needed in the future?</p>
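<p dir="ltr">To make the two-step pattern concrete, here is a minimal sketch, assuming Java 22 or later. The names EverythingSegmentDemo and roundTrip are made up for illustration. reinterpret is a restricted method, so the JVM may print a warning unless native access is enabled for the module.</p>

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Hypothetical demo class; the names are illustrative only.
final class EverythingSegmentDemo {
    // Zero-based view spanning 0..Long.MAX_VALUE. reinterpret only creates a
    // new Java-side view with a different size; it does not allocate or map memory.
    static final MemorySegment ALL = MemorySegment.NULL.reinterpret(Long.MAX_VALUE);

    static long roundTrip(long value) {
        try (Arena arena = Arena.ofConfined()) {
            // Step 1: real memory comes from an ordinary allocation.
            MemorySegment block = arena.allocate(Long.BYTES, Long.BYTES);
            // Step 2: use the raw address as an offset into the ALL segment.
            long address = block.address();
            ALL.set(ValueLayout.JAVA_LONG, address, value);
            // Read back through the bounded segment to show both views alias.
            return block.get(ValueLayout.JAVA_LONG, 0);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(42L));
    }
}
```

<p dir="ltr">Note that ALL has the global scope, so accesses through it are not checked against the lifetime of the arena that owns the memory. Touching an address after its arena is closed is undefined behaviour, much like with Unsafe.</p>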

<p dir="ltr">In my case I want to reserve virtual address ranges for different page size classes (for a database system), so that the pages actually allocated across all size classes never exceed a predefined threshold for the buffer pool. For example, we could have four 32 KB pages, one 128 KB page, two 64 KB pages, or any mix thereof, but never more than 128 KB in total (very simplified).</p>
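<p dir="ltr">The budget rule could be tracked roughly as follows. This is only a sketch: PageBudget, tryAcquire and release are hypothetical names, and it covers only the byte accounting, not the mapping or eviction.</p>

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: one shared byte budget for all page size classes.
final class PageBudget {
    private final long maxBytes;
    private final AtomicLong usedBytes = new AtomicLong();

    PageBudget(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    // Try to account for one page of the given size; fails if the cap would be exceeded.
    boolean tryAcquire(long pageSize) {
        while (true) {
            long used = usedBytes.get();
            if (used + pageSize > maxBytes) {
                return false;
            }
            if (usedBytes.compareAndSet(used, used + pageSize)) {
                return true;
            }
        }
    }

    // Give the bytes of an evicted page back to the budget.
    void release(long pageSize) {
        usedBytes.addAndGet(-pageSize);
    }

    long usedBytes() {
        return usedBytes.get();
    }
}
```

<p dir="ltr">With a 128 KB budget, one 64 KB page plus two 32 KB pages fit, and any further request is rejected until something is released.</p>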

<p dir="ltr">As described in the other thread, I would use the UmbraDB approach of mapping several regions via mmap (shown here with only one page size class as an example). I could also pre-touch some segments, but not all. I think in this case I'd still pay for all the bounds checks?</p><p dir="ltr">So: reserve virtual address space with an anonymous mmap and then slice it (I will have to map and slice once per page size class):</p><pre style="color:rgb(188,190,196);font-family:'jetbrains mono',monospace;font-size:9.8pt"><span style="color:rgb(122,126,133)">// Allocate the memory<br></span>MemorySegment reservedMemory = bufferManager.allocateMemory(totalMemorySize);<br><br><span style="color:rgb(122,126,133)">// Partition into page-sized chunks; &lt;= keeps the last full page<br></span>List&lt;MemorySegment&gt; pages = <span style="color:rgb(207,142,109)">new </span>ArrayList&lt;&gt;();<br><br><span style="color:rgb(207,142,109)">for </span>(<span style="color:rgb(207,142,109)">long </span>offset = <span style="color:rgb(42,172,184)">0</span>; offset + pageSize &lt;= totalMemorySize; offset += pageSize) {<br>  MemorySegment pageSegment = reservedMemory.asSlice(offset, pageSize);<br>  pages.add(pageSegment);<br>}</pre><p dir="ltr">I'm not sure, but would it be better to keep everything in the single segment (0 .. MAX), that is, as a few contiguous regions: allocate space for these regions and then slice that segment?
I guess there's no advantage if I'm going to slice anyway, and I might end up allocating far too much memory?</p><p dir="ltr">I'm also not sure what it means to instantiate a MemorySegment in this way: MemorySegment.NULL.reinterpret(Long.MAX_VALUE)</p><p dir="ltr">It's just a "wrapper" object on the Java heap and doesn't map the whole virtual address space, or does it?</p><p dir="ltr">Sorry for my general confusion at the moment (I'll blame my bad cold ;)). Once I have the energy and feel better, maybe sometime in the next few weeks, I really want to get this right and not spend many hours of my spare time implementing something that may be obviously stupid ;-)</p><p dir="ltr">Kind regards</p><p dir="ltr">Johannes</p></div>

<br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Maurizio Cimadamore &lt;<a href="mailto:maurizio.cimadamore@oracle.com" rel="noreferrer noreferrer" target="_blank">maurizio.cimadamore@oracle.com</a>&gt; wrote on Thu, Oct 31, 2024, 11:00:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><u></u>


  <div>

    <p><br>

    </p>

    <div>On 31/10/2024 09:45, Mike Hearn wrote:<br>

    </div>

    <blockquote type="cite">

      
      <div dir="ltr">

        <div class="gmail_quote">

          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">

            <div>

              <div>

                <p style="margin:0px 0px 1.2em">Hence my suggestion to

                  go back a little, and see what we can do to speed up

                  access for a segment created with:</p>

                <pre style="font-family:Consolas,Inconsolata,Courier,monospace;font-size:1em;line-height:1.2em;margin:1.2em 0px"><code style="font-size:0.85em;font-family:Consolas,Inconsolata,Courier,monospace;margin:0px 0.15em;white-space:pre-wrap;overflow:auto;border-radius:3px;border:1px solid rgb(204,204,204);padding:0.5em;color:rgb(51,51,51);background:repeat rgb(248,248,248);display:block">MemorySegment.NULL.reinterpret(Long.MAX_VALUE)

</code></pre>

                <p style="margin:0px 0px 1.2em">(which, as Ron correctly

                  points out, might not mean <em>exactly as fast as

                    Unsafe</em>)</p>

              </div>

            </div>

          </blockquote>

          <div>If a sign check is genuinely causing a meaningful

            slowdown, you could potentially re-spec such a NULL-&gt;MAX

            memory segment to not do it. In that case a negative number

            would be treated as unsigned. Alternatively, the sign bit

            could be masked out of the address which should be faster

            than a compare and branch. Given that such a memory segment

            is already requesting that safety checks be disabled, maybe

            the check for negative addresses isn't that important as

            there are already so many ways to segfault the VM with such

            a segment.</div>

          <div><br>

          </div>

        </div>

      </div>

    </blockquote>

    <p>That is a tempting path we have considered in the past. The

      drawback of that is that you have now obtained a new segment which

      doesn't behave like other segments. E.g. all the memory access

      operations, bulk copy operations, and even slicing will need to

      specify that some of the checks apply to all segments _but_ this

      weird one. Heck, even the size of this segment would be

      negative...<br>

    </p>

    <p>To be precise: the sign check is not causing the slowdown, but

      the fact that there is a sign check is the reason bounds checks _in

      loops_ cannot be completely eliminated, as C2 has to be mindful of

      overflows. But if you have a situation where memory access does

      not follow a pattern (which seems to be the case here), then bounds

      check elimination wouldn't kick in anyway.</p>
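    <p>As a rough illustration of the two access shapes discussed above (a sketch only; AccessPatterns, sumSequential and chase are made-up names, and what C2 actually does depends on the JDK version and on how hot the loop is):</p>

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Hypothetical helper contrasting a patterned loop with data-dependent access.
final class AccessPatterns {
    // Counted loop with a linear index: the shape in which the JIT can try
    // to hoist or eliminate bounds checks.
    static long sumSequential(MemorySegment seg) {
        long sum = 0;
        long count = seg.byteSize() / Long.BYTES;
        for (long i = 0; i < count; i++) {
            sum += seg.getAtIndex(ValueLayout.JAVA_LONG, i);
        }
        return sum;
    }

    // Pointer chasing: each index comes from the previous load, so there is
    // no induction variable for loop-based check elimination to work with.
    static long chase(MemorySegment seg, long start, int steps) {
        long idx = start;
        for (int s = 0; s < steps; s++) {
            idx = seg.getAtIndex(ValueLayout.JAVA_LONG, idx);
        }
        return idx;
    }
}
```

    <p>A buffer manager that looks up pages by identifier resembles the second method far more than the first.</p>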

    <p>I've read through an exchange I had with Roland last year on this. The

      reason why random access is slower has to do, fundamentally, with

      the fact that FFM has to execute more stuff than Unsafe -- there's

      no way around that. It used to be the case that, sometimes, C2

      would try to speculate and remove bounds checks, causing

      regressions when doing so (because the loop didn't run for long

      enough). But this has long been fixed:</p>

    <p><a href="https://bugs.openjdk.org/browse/JDK-8311932" rel="noreferrer noreferrer noreferrer" target="_blank">https://bugs.openjdk.org/browse/JDK-8311932</a></p>

    <p>The workaround I came up with in the past:</p>

    <p><a href="https://mail.openjdk.org/pipermail/panama-dev/2023-July/019478.html" rel="noreferrer noreferrer noreferrer" target="_blank">https://mail.openjdk.org/pipermail/panama-dev/2023-July/019478.html</a></p>

    <p>That workaround worked because it effectively changed the shape of the code,

      and caused C2 to back off, and not introduce an optimization that

      was causing more cost than benefit. That should no longer be a

      problem today -- as C2 should only optimize loops where the trip

      count is longer than a certain threshold.</p>

    <p>Stepping back... there are two ways to approach this problem. One is

      to add more heroics so that C2 can somehow do more of what it's

      already doing. That's what we tried in the past, it works -- but

      up to a point. A more robust solution, IMHO, would be to find ways

      to reduce the complexity of the implementation when accessing a

      segment whose span is 0..MAX_VALUE. Maybe we can't eliminate _all_

      checks (e.g. alignment and offset sign), but it seems to me that

      we can eliminate most of them.<br>

    </p>

    <p>Maurizio<br>

    </p>

  </div>


</blockquote></div>