JEP Vector API (Incubator). A funny use case and a question.

Fri Aug 9 15:57:49 UTC 2024

Sorry, everyone. Wrong list :(

On 09/08/2024 17:56, Davide Perini wrote:
> Hi there,
> thanks for giving us the opportunity to write to this mailing list.
>
> I'm playing with the Vector API bundled with Java 22 and, wow, it's
> amazing.
> I'm seeing serious benefits even for simple tasks on my
> AMD Ryzen 9 7950X3D CPU, which uses the Zen 4 architecture.
>
> I can't wait to see how much bigger the benefits will be on upcoming
> processors with much better optimized AVX-512 instructions (AMD Zen 5
> architecture and Intel AVX10 instructions).
>
> I'll try to give you some context.
> I am writing open-source software that is basically a free clone of
> the Philips Ambilight effect.
>
> What is it?
> Basically, you put an LED strip behind your monitor/TV. The software
> captures the screen, calculates its average colors, and sends those
> average values to a microcontroller (an Arduino) that drives the strip
> and lights up the LEDs accordingly.
> This effect is also known as dynamic bias light.
> More info here if you are curious:
> https://github.com/sblantipodi/firefly_luciferin
>
> Most of the computation runs on the GPU, but some intensive parts
> run on the CPU.
>
> Let's go deeper into the Vector API.
> The GPU captures the screen 60 times per second (or even more), and
> every frame is a buffer that contains the color information for each
> pixel of the frame.
> This buffer is a direct Java IntBuffer, which, for performance reasons,
> has no corresponding array on the heap.
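For readers less familiar with NIO, here is a minimal sketch of such a buffer. It assumes the buffer is created with `ByteBuffer.allocateDirect` in native byte order; the real buffer in the project comes from the capture pipeline and may be created differently. The `DirectFrame` class and `allocateFrame` method are hypothetical:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

public class DirectFrame {
    // Hypothetical helper: an off-heap buffer holding width * height packed pixels
    static IntBuffer allocateFrame(int width, int height) {
        return ByteBuffer.allocateDirect(width * height * Integer.BYTES)
                .order(ByteOrder.nativeOrder())
                .asIntBuffer();
    }

    public static void main(String[] args) {
        IntBuffer frame = allocateFrame(4, 2);
        System.out.println(frame.isDirect()); // true: the int view of a direct ByteBuffer is direct
        System.out.println(frame.hasArray()); // false: there is no int[] on the heap behind it
        System.out.println(frame.capacity()); // 8 pixels
    }
}
```

Because `hasArray()` is false, calling `frame.array()` throws `UnsupportedOperationException`, so any API that needs an `int[]` forces a copy.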
>
> Once I have this IntBuffer, I need to calculate the average colors of
> the screen. This can be done on the fly, directly on the IntBuffer,
> without copying it into an array; that kind of copy is really
> heavy and degrades performance.
>
> Here is a snippet that does this without the Vector API:
> for (int y = 0; y < pixelInUseY; y++) {
>     for (int x = 0; x < pixelInUseX; x++) {
>         int offsetX = xCoordinate + x;
>         int offsetY = yCoordinate + y;
>         // Clamp x and y, then convert them to a linear buffer index
>         int bufferOffset = Math.min(offsetX, widthPlusStride)
>                 + ((offsetY < height) ? (offsetY * widthPlusStride) : (height * widthPlusStride));
>         // Absolute get: reads the direct buffer in place, no copy
>         int rgb = rgbBuffer.get(Math.min(rgbBuffer.capacity() - 1, bufferOffset));
>         r += (rgb >> 16) & 0xFF;
>         g += (rgb >> 8) & 0xFF;
>         b += rgb & 0xFF;
>         pickNumber++;
>     }
> }
> leds[key - 1] = ImageProcessor.correctColors(r, g, b, pickNumber);
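A self-contained version of the scalar loop above, simplified under stated assumptions: no clamping at the frame edges (the caller must pass a rectangle that lies inside the frame), packed 0xRRGGBB pixels, and a hypothetical `averageRegion` helper that returns the plain average instead of calling `ImageProcessor.correctColors`:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

public class ScalarAverage {
    // Hypothetical helper: average R, G, B over a w x h rectangle at (x0, y0) of a frame
    // whose rows are `stride` pixels apart. Absolute get(index) reads the direct buffer
    // in place, so nothing is copied to the heap. Assumes packed 0xRRGGBB pixels.
    static int[] averageRegion(IntBuffer buf, int stride, int x0, int y0, int w, int h) {
        long r = 0, g = 0, b = 0;
        int count = 0;
        for (int y = y0; y < y0 + h; y++) {
            int row = y * stride;
            for (int x = x0; x < x0 + w; x++) {
                int rgb = buf.get(row + x);
                r += (rgb >> 16) & 0xFF;
                g += (rgb >> 8) & 0xFF;
                b += rgb & 0xFF;
                count++;
            }
        }
        if (count == 0) {
            return new int[3]; // empty region: report black
        }
        return new int[] {(int) (r / count), (int) (g / count), (int) (b / count)};
    }

    public static void main(String[] args) {
        IntBuffer frame = ByteBuffer.allocateDirect(4 * 2 * Integer.BYTES)
                .order(ByteOrder.nativeOrder())
                .asIntBuffer();
        for (int i = 0; i < frame.capacity(); i++) {
            frame.put(i, (i % 2 == 0) ? 0xFF0000 : 0x0000FF); // alternate red and blue
        }
        int[] avg = averageRegion(frame, 4, 0, 0, 4, 2);
        System.out.println(avg[0] + " " + avg[1] + " " + avg[2]); // 127 0 127
    }
}
```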
>
> Now I'm trying to use the Vector API to accelerate these computations
> even further and, hey, it works great.
> Using AVX-512 (a 512-bit species), the computation is 40%-80% faster
> than without the Vector API.
> int firstLimit;
> int secondLimit;
> // Processing the buffer in the correct order is crucial for SIMD performance
> if (pixelInUseX < pixelInUseY) {
>     firstLimit = pixelInUseX;
>     secondLimit = pixelInUseY;
> } else {
>     firstLimit = pixelInUseY;
>     secondLimit = pixelInUseX;
> }
> // SIMD iteration
> for (int x = 0; x < firstLimit; x++) {
>     for (int y = 0; y < secondLimit; y += MainSingleton.getInstance().SPECIES.length()) {
>         int offsetX;
>         int offsetY;
>         if (pixelInUseX < pixelInUseY) {
>             offsetX = xCoordinate + x;
>             offsetY = yCoordinate + y;
>         } else {
>             offsetX = xCoordinate + y;
>             offsetY = yCoordinate + x;
>         }
>         int bufferOffset = Math.min(offsetX, widthPlusStride)
>                 + ((offsetY < height) ? (offsetY * widthPlusStride) : (height * widthPlusStride));
>         // Load RGB values using SIMD
>         int[] rgbArray = new int[MainSingleton.getInstance().SPECIES.length()];
>         rgbBuffer.position(bufferOffset);
>         rgbBuffer.get(rgbArray, 0, Math.min(MainSingleton.getInstance().SPECIES.length(), rgbBuffer.remaining()));
>         IntVector rgbVector = IntVector.fromArray(MainSingleton.getInstance().SPECIES, rgbArray, 0);
>         r += (rgbVector.lane(0) >> 16) & 0xFF;
>         g += (rgbVector.lane(1) >> 8) & 0xFF;
>         b += rgbVector.lane(2) & 0xFF;
>         pickNumber++;
>     }
> }
> leds[key - 1] = ImageProcessor.correctColors(r, g, b, pickNumber);
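Note that the loop above reads only lanes 0, 1 and 2 of each vector (red from the first pixel of the chunk, green from the second, blue from the third) and increments `pickNumber` once per vector, so it samples the region rather than averaging every pixel. A minimal sketch of a full lane-wise accumulation follows. It assumes packed 0xRRGGBB pixels that are already in an `int[]` (so it does not remove the copy), and it uses `SPECIES_PREFERRED` instead of the project's `MainSingleton` species; the class and method names are hypothetical. The average is then the sum divided by the number of pixels, `to - from`. It needs `--add-modules jdk.incubator.vector` at compile and run time:

```java
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

public class LaneWiseSum {
    static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_PREFERRED;

    // Sums the R, G and B channels of every packed 0xRRGGBB pixel in pixels[from, to).
    // Per-lane int accumulators are assumed not to overflow (fine for one screen region).
    static long[] sumChannels(int[] pixels, int from, int to) {
        IntVector rAcc = IntVector.zero(SPECIES);
        IntVector gAcc = IntVector.zero(SPECIES);
        IntVector bAcc = IntVector.zero(SPECIES);
        int i = from;
        int upper = from + SPECIES.loopBound(to - from);
        for (; i < upper; i += SPECIES.length()) {
            IntVector v = IntVector.fromArray(SPECIES, pixels, i);
            // Extract each channel in every lane at once
            rAcc = rAcc.add(v.lanewise(VectorOperators.LSHR, 16).and(0xFF));
            gAcc = gAcc.add(v.lanewise(VectorOperators.LSHR, 8).and(0xFF));
            bAcc = bAcc.add(v.and(0xFF));
        }
        // Horizontal sums across lanes, done once after the loop
        long r = rAcc.reduceLanes(VectorOperators.ADD);
        long g = gAcc.reduceLanes(VectorOperators.ADD);
        long b = bAcc.reduceLanes(VectorOperators.ADD);
        // Scalar tail for pixels that do not fill a whole vector
        for (; i < to; i++) {
            r += (pixels[i] >> 16) & 0xFF;
            g += (pixels[i] >> 8) & 0xFF;
            b += pixels[i] & 0xFF;
        }
        return new long[] {r, g, b};
    }

    public static void main(String[] args) {
        int[] pixels = new int[37]; // odd length exercises the scalar tail
        for (int i = 0; i < pixels.length; i++) {
            pixels[i] = (i % 2 == 0) ? 0xFF0000 : 0x0000FF;
        }
        long[] sum = sumChannels(pixels, 0, pixels.length);
        System.out.println(sum[0] + " " + sum[1] + " " + sum[2]); // 4845 0 4590
    }
}
```

Accumulating into vectors and reducing once after the loop keeps the horizontal `reduceLanes` out of the hot path.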
>
> The computation itself is at least ten times faster, but overall it's
> only 40%-80% faster because I'm not able to process the IntBuffer
> on the fly with the Vector API.
> As you can see in the previous snippet, I need to copy part of the
> IntBuffer into an int[] array and then process it with the Vector API.
> This copy alone is what takes most of the time.
>
> Is it possible to process a direct IntBuffer with the Vector API
> without losing time on an array copy?
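One approach that, as far as I know, works on JDK 22: wrap the direct buffer in a `java.lang.foreign.MemorySegment` with `MemorySegment.ofBuffer` (the FFM API is final in JDK 22) and load vectors straight from it with `IntVector.fromMemorySegment`, which takes a byte offset and a byte order. A minimal sketch follows. It assumes packed 0xRRGGBB pixels stored contiguously from index `from` to `to` (no per-row stride handling), and the class and method names are hypothetical. Compile and run it with `--add-modules jdk.incubator.vector`:

```java
import java.lang.foreign.MemorySegment;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

public class ZeroCopySum {
    static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_PREFERRED;

    // Sums R, G, B over pixels [from, to) of a direct IntBuffer without copying into an int[].
    static long[] sumChannels(IntBuffer buf, int from, int to) {
        // duplicate().clear() gives a view with position 0, so segment offset 0 is pixel 0;
        // ofBuffer wraps the same off-heap memory, nothing is copied
        MemorySegment seg = MemorySegment.ofBuffer(buf.duplicate().clear());
        ByteOrder order = buf.order();
        IntVector rAcc = IntVector.zero(SPECIES);
        IntVector gAcc = IntVector.zero(SPECIES);
        IntVector bAcc = IntVector.zero(SPECIES);
        int i = from;
        int upper = from + SPECIES.loopBound(to - from);
        for (; i < upper; i += SPECIES.length()) {
            // Offsets into a MemorySegment are in bytes, not ints
            IntVector v = IntVector.fromMemorySegment(SPECIES, seg, (long) i * Integer.BYTES, order);
            rAcc = rAcc.add(v.lanewise(VectorOperators.LSHR, 16).and(0xFF));
            gAcc = gAcc.add(v.lanewise(VectorOperators.LSHR, 8).and(0xFF));
            bAcc = bAcc.add(v.and(0xFF));
        }
        long r = rAcc.reduceLanes(VectorOperators.ADD);
        long g = gAcc.reduceLanes(VectorOperators.ADD);
        long b = bAcc.reduceLanes(VectorOperators.ADD);
        // Scalar tail with absolute get, still no copy
        for (; i < to; i++) {
            int rgb = buf.get(i);
            r += (rgb >> 16) & 0xFF;
            g += (rgb >> 8) & 0xFF;
            b += rgb & 0xFF;
        }
        return new long[] {r, g, b};
    }

    public static void main(String[] args) {
        IntBuffer frame = ByteBuffer.allocateDirect(37 * Integer.BYTES)
                .order(ByteOrder.nativeOrder())
                .asIntBuffer();
        for (int i = 0; i < frame.capacity(); i++) {
            frame.put(i, (i % 2 == 0) ? 0xFF0000 : 0x0000FF);
        }
        long[] sum = sumChannels(frame, 0, frame.capacity());
        System.out.println(sum[0] + " " + sum[1] + " " + sum[2]); // 4845 0 4590
    }
}
```

For a rectangle inside a frame with a row stride, the same call could be made once per row with `from = row * widthPlusStride + xCoordinate`.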
>
> Thank you for this wonderful API.
>
> Kind regards
> Davide
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/openjfx-dev/attachments/20240809/6de284ae/attachment-0001.htm>