<!DOCTYPE html>
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    Sorry, everyone. Wrong list :(<br>
    <br>
    <div class="moz-cite-prefix">On 09/08/2024 17:56, Davide Perini
      wrote:<br>
    </div>
    <blockquote type="cite"
      cite="mid:3e924999-43dc-41c9-ae19-08e4249351c5@dpsoftware.org">Hi
      there,<br>
      thanks for the opportunity to write to this mailing list.<br>
      <br>
      I am playing with the Vector API bundled in Java 22 and wow,
      it is amazing.<br>
      I see serious benefits from it even for simple tasks on
      my AMD Ryzen 9 7950X3D CPU, which uses the Zen 4 architecture.<br>
      <br>
      I can't wait to see how much bigger the benefits will be on
      upcoming processors with seriously optimized AVX-512 instructions
      (AMD Zen 5 architecture and Intel AVX10 instructions).<br>
      <br>
      I'll try to give you some context.<br>
      I am writing open source software that is basically a free
      clone of the Philips Ambilight effect.<br>
      <br>
      What is it?<br>
      Basically, you put an LED strip behind your monitor/TV; the software
      captures the screen, calculates the average colors of the screen,
      and sends those average values to a microcontroller (Arduino) that
      drives the strip and lights up the LEDs accordingly.<br>
      This effect is also known as dynamic bias light.<br>
      More info here if you are curious:<br>
      <a class="moz-txt-link-freetext"
        href="https://github.com/sblantipodi/firefly_luciferin"
        moz-do-not-send="true">https://github.com/sblantipodi/firefly_luciferin</a><br>
      <br>
      Most of the computations involved are on the GPU side but some
      intensive ones are on the CPU side.<br>
      <br>
      Let's go deeper into the Vector API.<br>
      The GPU acquires the screen image 60 times per second (or even
      more); every frame is a Buffer that contains the color information
      for each pixel of the frame. <br>
      This buffer is a Java direct IntBuffer, which for performance
      reasons has no backing array on the heap.<br>
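      <br>
      For illustration, here is a minimal sketch of such a buffer. The
      class and method names are hypothetical and not taken from the
      project, and in the real application the frame comes from the
      screen capture rather than from a manual allocation:<br>
      <div style="background-color:#1e1f22;color:#bcbec4">
        <pre style="font-family:'JetBrains Mono',monospace;font-size:9.8pt;">
```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

// Hypothetical helper, for illustration only.
final class DirectBufferSketch {

    // Allocates an off-heap buffer with room for pixelCount packed RGB ints.
    static IntBuffer allocateFrame(int pixelCount) {
        return ByteBuffer.allocateDirect(pixelCount * Integer.BYTES)
                .order(ByteOrder.nativeOrder())
                .asIntBuffer();
    }

    public static void main(String[] args) {
        IntBuffer frame = allocateFrame(4);
        frame.put(0, 0x112233); // absolute put, the buffer position is untouched
        // A direct buffer reports no accessible backing array,
        // so frame.array() is not available and any int[] must be filled by copying.
        System.out.println("direct=" + frame.isDirect() + " hasArray=" + frame.hasArray());
        System.out.println(Integer.toHexString(frame.get(0)));
    }
}
```
</pre>
      </div>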
      <br>
      Once I have this IntBuffer, I need to calculate the average colors
      of the screen. This can be done on the fly on the IntBuffer
      without copying it into an array. That kind of copy is really
      heavy and degrades performance.<br>
      <br>
      Here is a snippet that does this without using the Vector API:<br>
      <div style="background-color:#1e1f22;color:#bcbec4">
        <pre
        style="font-family:'JetBrains Mono',monospace;font-size:9.8pt;"><span
        style="color:#cf8e6d;">for </span>(<span style="color:#cf8e6d;">int </span>y = <span
        style="color:#2aacb8;">0</span>; y < pixelInUseY; y++) {
    <span style="color:#cf8e6d;">for </span>(<span
        style="color:#cf8e6d;">int </span>x = <span
        style="color:#2aacb8;">0</span>; x < pixelInUseX; x++) {
        <span style="color:#cf8e6d;">int </span>offsetX = (xCoordinate + x);
        <span style="color:#cf8e6d;">int </span>offsetY = (yCoordinate + y);
        <span style="color:#cf8e6d;">int </span>bufferOffset = (Math.<span
        style="font-style:italic;">min</span>(offsetX, <span
        style="color:#c77dbb;">widthPlusStride</span>)) + ((offsetY < <span
        style="color:#c77dbb;">height</span>) ? (offsetY * <span
        style="color:#c77dbb;">widthPlusStride</span>) : (<span
        style="color:#c77dbb;">height </span>* <span
        style="color:#c77dbb;">widthPlusStride</span>));
        <span style="color:#cf8e6d;">int </span>rgb = <span
        style="color:#c77dbb;">rgbBuffer</span>.get(Math.<span
        style="font-style:italic;">min</span>(<span
        style="color:#c77dbb;">rgbBuffer</span>.capacity() - <span
        style="color:#2aacb8;">1</span>, bufferOffset));
        r += rgb >> <span style="color:#2aacb8;">16 </span>& <span
        style="color:#2aacb8;">0xFF</span>;
        g += rgb >> <span style="color:#2aacb8;">8 </span>& <span
        style="color:#2aacb8;">0xFF</span>;
        b += rgb & <span style="color:#2aacb8;">0xFF</span>;
        pickNumber++;
    }
}
<span style="color:#c77dbb;">leds</span>[key - <span
        style="color:#2aacb8;">1</span>] = ImageProcessor.<span
        style="font-style:italic;">correctColors</span>(r, g, b, pickNumber);</pre>
      </div>
      <br>
      Now I'm trying to use the Vector API to accelerate these
      computations even more and hey, it works really well.<br>
      Using AVX-512 (SPECIES_512), the computation is 40%-80% faster than
      without the Vector API.<br>
      <div style="background-color:#1e1f22;color:#bcbec4">
        <pre
        style="font-family:'JetBrains Mono',monospace;font-size:9.8pt;"><span
        style="color:#cf8e6d;">int </span>firstLimit;
<span style="color:#cf8e6d;">int </span>secondLimit;
<span style="color:#7a7e85;">// Processing the buffer in the correct order is crucial for SIMD performance
</span><span style="color:#cf8e6d;">if </span>(pixelInUseX < pixelInUseY) {
    firstLimit = pixelInUseX;
    secondLimit = pixelInUseY;
} <span style="color:#cf8e6d;">else </span>{
    firstLimit = pixelInUseY;
    secondLimit = pixelInUseX;
}
<span style="color:#7a7e85;">// SIMD iteration
</span><span style="color:#cf8e6d;">for </span>(<span
        style="color:#cf8e6d;">int </span>x = <span
        style="color:#2aacb8;">0</span>; x < firstLimit; x++) {
    <span style="color:#cf8e6d;">for </span>(<span
        style="color:#cf8e6d;">int </span>y = <span
        style="color:#2aacb8;">0</span>; y < secondLimit; y += MainSingleton.<span
        style="font-style:italic;">getInstance</span>().<span
        style="color:#c77dbb;">SPECIES</span>.length()) {
        <span style="color:#cf8e6d;">int </span>offsetX;
        <span style="color:#cf8e6d;">int </span>offsetY;
        <span style="color:#cf8e6d;">if </span>(pixelInUseX < pixelInUseY) {
            offsetX = (xCoordinate + x);
            offsetY = (yCoordinate + y);
        } <span style="color:#cf8e6d;">else </span>{
            offsetX = (xCoordinate + y);
            offsetY = (yCoordinate + x);
        }
        <span style="color:#cf8e6d;">int </span>bufferOffset = (Math.<span
        style="font-style:italic;">min</span>(offsetX, <span
        style="color:#c77dbb;">widthPlusStride</span>)) + ((offsetY < <span
        style="color:#c77dbb;">height</span>) ? (offsetY * <span
        style="color:#c77dbb;">widthPlusStride</span>) : (<span
        style="color:#c77dbb;">height </span>* <span
        style="color:#c77dbb;">widthPlusStride</span>));
        <span style="color:#7a7e85;">// Load RGB values using SIMD
</span><span style="color:#7a7e85;">        </span><span
        style="color:#cf8e6d;">int</span>[] rgbArray = <span
        style="color:#cf8e6d;">new int</span>[MainSingleton.<span
        style="font-style:italic;">getInstance</span>().<span
        style="color:#c77dbb;">SPECIES</span>.length()];
        <span style="color:#c77dbb;">rgbBuffer</span>.position(bufferOffset);
        <span style="color:#c77dbb;">rgbBuffer</span>.get(rgbArray, <span
        style="color:#2aacb8;">0</span>, Math.<span
        style="font-style:italic;">min</span>(MainSingleton.<span
        style="font-style:italic;">getInstance</span>().<span
        style="color:#c77dbb;">SPECIES</span>.length(), <span
        style="color:#c77dbb;">rgbBuffer</span>.remaining()));
        IntVector rgbVector = IntVector.<span style="font-style:italic;">fromArray</span>(MainSingleton.<span
        style="font-style:italic;">getInstance</span>().<span
        style="color:#c77dbb;">SPECIES</span>, rgbArray, <span
        style="color:#2aacb8;">0</span>);
        r += rgbVector.lane(<span style="color:#2aacb8;">0</span>) >> <span
        style="color:#2aacb8;">16 </span>& <span
        style="color:#2aacb8;">0xFF</span>;
        g += rgbVector.lane(<span style="color:#2aacb8;">1</span>) >> <span
        style="color:#2aacb8;">8 </span>& <span
        style="color:#2aacb8;">0xFF</span>;
        b += rgbVector.lane(<span style="color:#2aacb8;">2</span>) & <span
        style="color:#2aacb8;">0xFF</span>;
        pickNumber++;
    }
}
<span style="color:#c77dbb;">leds</span>[key - <span
        style="color:#2aacb8;">1</span>] = ImageProcessor.<span
        style="font-style:italic;">correctColors</span>(r, g, b, pickNumber);</pre>
      </div>
      <br>
      The computation itself is at least ten times faster, but overall
      it is only 40%-80% faster because I'm not able to process the
      IntBuffer on the fly with the Vector API.<br>
      As you can see in the previous snippet, I need to copy part of the
      IntBuffer into an int[] array and then process it with the Vector
      API.<br>
      This copy alone is what takes the most time.<br>
      <br>
      Is it possible to process a direct IntBuffer with the Vector API
      without losing time on an array copy?<br>
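      <br>
      One possible direction, as a minimal sketch only: wrap the buffer
      with MemorySegment.ofBuffer and load vectors with
      IntVector.fromMemorySegment, so that no intermediate int[] is
      needed. The class and method names below are hypothetical, the
      code assumes Java 22 run with --add-modules jdk.incubator.vector,
      and it assumes a contiguous run of pixels inside a direct buffer
      (for example one row of a capture zone). It has not been measured
      against the snippets above, so no speedup is claimed. Unlike the
      snippet above, which reads lanes 0, 1 and 2 only, it sums every
      lane of each vector:<br>
      <div style="background-color:#1e1f22;color:#bcbec4">
        <pre style="font-family:'JetBrains Mono',monospace;font-size:9.8pt;">
```java
import java.lang.foreign.MemorySegment;
import java.nio.ByteOrder;
import java.nio.IntBuffer;
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorOperators;

// Hypothetical helper, for illustration only.
final class DirectBufferAverage {

    /** Returns {sumR, sumG, sumB} over count pixels starting at int index start. */
    static long[] sumChannels(IntBuffer rgbBuffer, int start, int count) {
        var species = IntVector.SPECIES_PREFERRED;
        ByteOrder order = rgbBuffer.order();
        // Zero-copy view of the whole buffer; duplicate() leaves the caller's position alone.
        MemorySegment segment = MemorySegment.ofBuffer(rgbBuffer.duplicate().clear());

        long r = 0, g = 0, b = 0;
        int lanes = species.length();
        int upper = start + species.loopBound(count);
        int end = start + count;
        int i = start;
        // Vector loop: each iteration loads "lanes" consecutive pixels from the segment.
        for (; i < upper; i += lanes) {
            long byteOffset = (long) i * Integer.BYTES;
            IntVector px = IntVector.fromMemorySegment(species, segment, byteOffset, order);
            r += px.lanewise(VectorOperators.LSHR, 16).and(0xFF).reduceLanes(VectorOperators.ADD);
            g += px.lanewise(VectorOperators.LSHR, 8).and(0xFF).reduceLanes(VectorOperators.ADD);
            b += px.and(0xFF).reduceLanes(VectorOperators.ADD);
        }
        // Scalar tail for the pixels that do not fill a whole vector.
        for (; i < end; i++) {
            int rgb = rgbBuffer.get(i);
            r += rgb >> 16 & 0xFF;
            g += rgb >> 8 & 0xFF;
            b += rgb & 0xFF;
        }
        return new long[] {r, g, b};
    }
}
```
</pre>
      </div>
      With a layout like the one in the snippets above, this would be
      called once per row of the zone, with start set to that row's
      bufferOffset and count set to pixelInUseX, and the three sums
      divided by the total pixel count at the end.<br>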
      <br>
      Thank you for this wonderful API.<br>
      <br>
      Kind regards<br>
      Davide</blockquote>
    <br>
  </body>
</html>