JEP Vector API (Incubator). A funny use case and a question.
Davide Perini
perini.davide at dpsoftware.org
Fri Aug 9 15:57:49 UTC 2024
Sorry, everyone: wrong list :(
On 09/08/2024 17:56, Davide Perini wrote:
> Hi there,
> thanks for the opportunity to write on this mailing list.
>
> I am playing with the Vector API bundled in Java 22 and wow, it is
> amazing.
> I am seeing serious benefits from it, even for simple tasks, on my
> AMD Ryzen 9 7950X3D CPU, which uses the Zen 4 architecture.
>
> I can't wait to see how much bigger the benefits will be on upcoming
> processors with heavily optimized AVX-512 implementations (the AMD
> Zen 5 architecture and Intel AVX10).
>
> I'll try to give you some context.
> I am writing an open source application that is basically a free clone
> of the Philips Ambilight effect.
>
> What is it?
> Basically, you put an LED strip behind your monitor/TV; the software
> captures the screen,
> calculates the average colors of your screen, and sends those
> average values to a microcontroller (Arduino) that drives the strip
> and lights up the LEDs accordingly.
> This effect is also known as dynamic bias light.
> More info here if you are curious:
> https://github.com/sblantipodi/firefly_luciferin
>
> Most of the computation happens on the GPU, but some intensive parts
> run on the CPU.
>
> Let's go deeper into the Vector API.
> The GPU acquires the screen image 60 times per second (or even more);
> every frame is a buffer that contains the color information for each
> pixel of the frame.
> This buffer is a Java direct IntBuffer that, for performance reasons,
> has no corresponding array on the heap.
>
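For readers unfamiliar with direct buffers, a minimal sketch of how such a buffer can be created, assuming one packed 0xRRGGBB int per pixel; the class name and the `allocateFrame` helper are hypothetical, and the real capture code may obtain its buffer differently. Because the memory lives off-heap, `hasArray()` reports `false`, so there is no `int[]` that could be handed directly to `IntVector.fromArray`.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

public class DirectFrameBuffer {
    // Hypothetical helper: allocates an off-heap buffer with one packed int per pixel.
    static IntBuffer allocateFrame(int width, int height) {
        return ByteBuffer.allocateDirect(width * height * Integer.BYTES)
                .order(ByteOrder.nativeOrder())
                .asIntBuffer();
    }

    public static void main(String[] args) {
        IntBuffer frame = allocateFrame(1920, 1080);
        frame.put(0, 0x00FF8040); // absolute put: red 0xFF, green 0x80, blue 0x40
        System.out.println(frame.isDirect()); // true: a view of a direct ByteBuffer is direct
        System.out.println(frame.hasArray()); // false: no heap int[] behind it
    }
}
```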
> Once I have this IntBuffer, I need to calculate the average colors of
> the screen, and this can be done on the fly on the IntBuffer
> without copying it into an array. That kind of copy is
> really, really heavy and degrades performance.
>
> Here is a snippet that does this without the Vector API:
> for (int y = 0; y < pixelInUseY; y++) {
>     for (int x = 0; x < pixelInUseX; x++) {
>         int offsetX = (xCoordinate + x);
>         int offsetY = (yCoordinate + y);
>         // Clamp coordinates and index so regions at the frame edge never read past the buffer
>         int bufferOffset = (Math.min(offsetX, widthPlusStride)) + ((offsetY < height) ? (offsetY * widthPlusStride) : (height * widthPlusStride));
>         int rgb = rgbBuffer.get(Math.min(rgbBuffer.capacity() - 1, bufferOffset));
>         // Unpack the 0xRRGGBB channels (>> binds tighter than &)
>         r += rgb >> 16 & 0xFF;
>         g += rgb >> 8 & 0xFF;
>         b += rgb & 0xFF;
>         pickNumber++;
>     }
> }
> leds[key - 1] = ImageProcessor.correctColors(r, g, b, pickNumber);
>
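A self-contained version of the same scalar idea, simplified for illustration: the hypothetical `averageRegion` method assumes the rectangle lies entirely inside the frame (so the `Math.min` clamping is dropped), takes the row stride as a parameter, and replaces the project's `ImageProcessor.correctColors` with a plain integer average. It reads pixels with the absolute `IntBuffer.get(int)`, which neither copies data nor moves the buffer position.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

public class ScalarAverage {
    // Averages the packed 0xRRGGBB pixels of a w*h rectangle whose top-left corner is (x0, y0).
    // stride is the number of ints per row (width plus any padding).
    // Assumes the rectangle lies entirely inside the buffer, so no clamping is done.
    static int[] averageRegion(IntBuffer buf, int stride, int x0, int y0, int w, int h) {
        long r = 0, g = 0, b = 0;
        for (int y = y0; y < y0 + h; y++) {
            int rowStart = y * stride;
            for (int x = x0; x < x0 + w; x++) {
                int rgb = buf.get(rowStart + x); // absolute get: no copy, position untouched
                r += (rgb >> 16) & 0xFF;
                g += (rgb >> 8) & 0xFF;
                b += rgb & 0xFF;
            }
        }
        long n = (long) w * h;
        return new int[] {(int) (r / n), (int) (g / n), (int) (b / n)};
    }

    public static void main(String[] args) {
        IntBuffer buf = ByteBuffer.allocateDirect(4 * 2 * Integer.BYTES)
                .order(ByteOrder.nativeOrder()).asIntBuffer();
        for (int i = 0; i < buf.capacity(); i++) {
            buf.put(i, i % 2 == 0 ? 0x00FF0000 : 0x000000FF); // alternate red and blue pixels
        }
        int[] avg = averageRegion(buf, 4, 0, 0, 4, 2);
        System.out.println(avg[0] + " " + avg[1] + " " + avg[2]); // 127 0 127
    }
}
```

Row-by-row iteration with x in the inner loop matches the row-major layout implied by `offsetX + offsetY * widthPlusStride`, so consecutive reads touch consecutive ints.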
> Now I'm trying to use the Vector API to accelerate these computations
> even more and, hey, it works great.
> Using AVX-512 (SPECIES_512), the computation is 40%-80% faster than
> without the Vector API.
> int firstLimit;
> int secondLimit;
> // Processing the buffer in the correct order is crucial for SIMD performance
> if (pixelInUseX < pixelInUseY) {
>     firstLimit = pixelInUseX;
>     secondLimit = pixelInUseY;
> } else {
>     firstLimit = pixelInUseY;
>     secondLimit = pixelInUseX;
> }
> // SIMD iteration
> for (int x = 0; x < firstLimit; x++) {
>     for (int y = 0; y < secondLimit; y += MainSingleton.getInstance().SPECIES.length()) {
>         int offsetX;
>         int offsetY;
>         if (pixelInUseX < pixelInUseY) {
>             offsetX = (xCoordinate + x);
>             offsetY = (yCoordinate + y);
>         } else {
>             offsetX = (xCoordinate + y);
>             offsetY = (yCoordinate + x);
>         }
>         int bufferOffset = (Math.min(offsetX, widthPlusStride)) + ((offsetY < height) ? (offsetY * widthPlusStride) : (height * widthPlusStride));
>         // Load RGB values using SIMD: copy up to one vector's worth of pixels
>         // from the direct buffer into a heap array, then load the vector from it
>         int[] rgbArray = new int[MainSingleton.getInstance().SPECIES.length()];
>         rgbBuffer.position(bufferOffset);
>         rgbBuffer.get(rgbArray, 0, Math.min(MainSingleton.getInstance().SPECIES.length(), rgbBuffer.remaining()));
>         IntVector rgbVector = IntVector.fromArray(MainSingleton.getInstance().SPECIES, rgbArray, 0);
>         r += rgbVector.lane(0) >> 16 & 0xFF;
>         g += rgbVector.lane(1) >> 8 & 0xFF;
>         b += rgbVector.lane(2) & 0xFF;
>         pickNumber++;
>     }
> }
> leds[key - 1] = ImageProcessor.correctColors(r, g, b, pickNumber);
>
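A minimal sketch, not the project's code, of a full-width reduction. In the snippet above, `lane(0)`, `lane(1)` and `lane(2)` take red, green and blue from three different pixels, and the remaining lanes are never read. Here every lane holds one pixel, each channel is extracted in all lanes with `lanewise(VectorOperators.LSHR, n)` and `and(0xFF)`, and the per-lane sums are combined once after the loop with `reduceLanes(VectorOperators.ADD)`. The class and method names are hypothetical, the input is still a heap `int[]`, and compiling and running it assumes `--add-modules jdk.incubator.vector`.

```java
import java.util.Arrays;
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

// Requires: --add-modules jdk.incubator.vector (for javac and java)
public class VectorChannelSum {
    static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_PREFERRED;

    // Sums the red, green and blue channels of the packed 0xRRGGBB ints in pixels[from, to).
    // Per-lane int accumulators are fine for screen-sized regions; huge inputs could overflow them.
    static long[] sumChannels(int[] pixels, int from, int to) {
        IntVector rAcc = IntVector.zero(SPECIES);
        IntVector gAcc = IntVector.zero(SPECIES);
        IntVector bAcc = IntVector.zero(SPECIES);
        int i = from;
        int upper = from + SPECIES.loopBound(to - from);
        for (; i < upper; i += SPECIES.length()) {
            IntVector v = IntVector.fromArray(SPECIES, pixels, i);
            // Each lane holds one pixel, so each channel is extracted in every lane at once.
            rAcc = rAcc.add(v.lanewise(VectorOperators.LSHR, 16).and(0xFF));
            gAcc = gAcc.add(v.lanewise(VectorOperators.LSHR, 8).and(0xFF));
            bAcc = bAcc.add(v.and(0xFF));
        }
        // Combine the lanes once, after the loop.
        long r = rAcc.reduceLanes(VectorOperators.ADD);
        long g = gAcc.reduceLanes(VectorOperators.ADD);
        long b = bAcc.reduceLanes(VectorOperators.ADD);
        // Scalar tail for the pixels that do not fill a whole vector.
        for (; i < to; i++) {
            r += (pixels[i] >> 16) & 0xFF;
            g += (pixels[i] >> 8) & 0xFF;
            b += pixels[i] & 0xFF;
        }
        return new long[] {r, g, b};
    }

    public static void main(String[] args) {
        int[] pixels = new int[37]; // deliberately not a multiple of the vector length
        Arrays.fill(pixels, 0xFF010203); // alpha 0xFF, red 1, green 2, blue 3
        long[] sums = sumChannels(pixels, 0, pixels.length);
        System.out.println(sums[0] + " " + sums[1] + " " + sums[2]); // 37 74 111
    }
}
```

Dividing each sum by the number of pixels actually summed (`to - from`) then gives the average color of the range.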
> The computation itself is at least ten times faster, but overall
> it's only 40%-80% faster because I'm not able to process the IntBuffer
> on the fly using the Vector API.
> As you can see in the previous snippet, I need to copy part of the
> IntBuffer into an int[] array and then process it using the Vector API.
> This copy alone is what takes most of the time.
>
> Is it possible to process a direct IntBuffer with the Vector API
> without losing time on an array copy?
>
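One possible way to avoid the copy, sketched under the assumption of JDK 22, where the Foreign Function & Memory API is final: wrap the direct buffer with `MemorySegment.ofBuffer` (a view of the same memory, not a copy) and load vectors with `IntVector.fromMemorySegment`. The byte order passed to the load has to match the buffer's own order, which is why the sketch reads `buf.order()`. The class and method names are hypothetical; the sketch assumes the buffer's position is 0 (the segment starts at the position), uses a masked load so the last partial vector stays in bounds, and needs `--add-modules jdk.incubator.vector`. Whether it beats the array copy on a given CPU would need measuring.

```java
import java.lang.foreign.MemorySegment;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorMask;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

// Requires JDK 22+ and: --add-modules jdk.incubator.vector (for javac and java)
public class SegmentChannelSum {
    static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_PREFERRED;

    // Sums the 0xRRGGBB channels of `count` pixels starting at int index `start`,
    // loading vectors straight from the buffer's memory instead of copying to an int[].
    // Assumes buf.position() == 0, because the segment begins at the buffer's position.
    static long[] sumChannels(IntBuffer buf, int start, int count) {
        MemorySegment seg = MemorySegment.ofBuffer(buf); // a view of the same memory, no copy
        ByteOrder order = buf.order(); // vector loads must use the buffer's byte order
        IntVector rAcc = IntVector.zero(SPECIES);
        IntVector gAcc = IntVector.zero(SPECIES);
        IntVector bAcc = IntVector.zero(SPECIES);
        for (int i = 0; i < count; i += SPECIES.length()) {
            // Lanes at or past `count` are masked off: they cause no bounds error and load as 0.
            VectorMask<Integer> inRange = SPECIES.indexInRange(i, count);
            long byteOffset = (long) (start + i) * Integer.BYTES;
            IntVector v = IntVector.fromMemorySegment(SPECIES, seg, byteOffset, order, inRange);
            rAcc = rAcc.add(v.lanewise(VectorOperators.LSHR, 16).and(0xFF));
            gAcc = gAcc.add(v.lanewise(VectorOperators.LSHR, 8).and(0xFF));
            bAcc = bAcc.add(v.and(0xFF));
        }
        return new long[] {
            rAcc.reduceLanes(VectorOperators.ADD),
            gAcc.reduceLanes(VectorOperators.ADD),
            bAcc.reduceLanes(VectorOperators.ADD)
        };
    }

    public static void main(String[] args) {
        // ByteBuffer defaults to big-endian, which exercises the byte-order handling.
        IntBuffer buf = ByteBuffer.allocateDirect(37 * Integer.BYTES).asIntBuffer();
        for (int k = 0; k < buf.capacity(); k++) {
            buf.put(k, 0xFF010203); // alpha 0xFF, red 1, green 2, blue 3
        }
        long[] sums = sumChannels(buf, 0, buf.capacity());
        System.out.println(sums[0] + " " + sums[1] + " " + sums[2]); // 37 74 111
    }
}
```

For a rectangular LED zone, this would be called once per row with `start = y * widthPlusStride + xCoordinate` and `count = pixelInUseX`, so each load reads contiguous pixels along a row of the frame.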
> Thank you for this wonderful API.
>
> Kind regards
> Davide
More information about the openjfx-dev
mailing list