comments on performance / foreign-abi

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Tue Jan 21 09:52:36 UTC 2020


On 21/01/2020 03:47, Michael Zucchi wrote:
>
> I guess for an immutable from an immutable it's unexpected that it 
> creates one every call so that point is important to note, thanks.  In 
> this particular case on my machine it only gets called once so it won't 
> make any difference as yet.
I thought about caching the result of baseAddress() on memory segment 
creation - but... again, we run into C2 issues if we do that. I suspect 
that once Hotspot becomes more friendly to the new API, we will be 
able to remove some of the (redundant) allocation.
>
> For what it's worth I can get another 13% by creating a thread-local 
> "stack" allocator so it only uses one permanent segment and any 
> addresses are created only once.  It's not so much the perf but it 
> simplifies the code and exception handling - the only problem is there 
> is no way to ever free the stack memory since by definition it must be 
> performed on another thread! (unless i use malloc/unsafe/free, or 
> expose such detail to callers).
>
>             MemoryStack stack = mstacks.get();
>             long fp = stack.frame();
>
>             try {
>                 MemoryAddress lenp = stack.alloca(8); // keep alignment
>                 MemoryAddress list;
>                 int res, len;
>
>                 res = (int)clGetDeviceIDs.invokeExact(addr(), type, 0, 
> MemoryAddress.NULL, lenp);
>
>                 len = Native.getInt(lenp);
>                 list = stack.alloca(len * 8);
>
>                 res = (int)clGetDeviceIDs.invokeExact(addr(), type, 
> len, list, lenp);
>
>                 CLDevice[] out = new CLDevice[len];
>                 for (int i=0;i<out.length;i++) {
>                     MemoryAddress addr = 
> (MemoryAddress)addrVHandle.get(list, (long)i);
>
>                     out[i] = Native.resolve(addr, CLDevice::new);
>                 }
>
>                 return out;
>             } catch (Throwable t) {
>                 throw new RuntimeException(t);
>             } finally {
>                 stack.frame(fp);
>             }
>
> And thanks for your other detailed response.
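
The thread-local "stack" allocator described in the quoted message is used 
above but not shown. A minimal sketch of the idea follows; it is not 
Michael's MemoryStack. The class name FrameStack, its methods and the 
64 KiB capacity are all hypothetical, and it hands out slices of a direct 
ByteBuffer rather than MemoryAddress values so that it runs without the 
incubating memory-access API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/**
 * Hypothetical frame-based bump allocator over one permanent off-heap
 * buffer. Illustration only: names and sizes are assumptions, not the
 * MemoryStack class from the quoted message.
 */
final class FrameStack {
    // One stack per thread, created lazily on first use.
    private static final ThreadLocal<FrameStack> STACKS =
            ThreadLocal.withInitial(() -> new FrameStack(64 * 1024));

    private final ByteBuffer memory; // the single long-lived backing region
    private int top;                 // offset of the next free byte

    FrameStack(int capacity) {
        memory = ByteBuffer.allocateDirect(capacity).order(ByteOrder.nativeOrder());
    }

    /** Returns the calling thread's stack. */
    static FrameStack current() {
        return STACKS.get();
    }

    /** Returns a mark for the current top of the stack. */
    int frame() {
        return top;
    }

    /** Pops every allocation made since the given mark was taken. */
    void frame(int mark) {
        if (mark < 0 || mark > top) {
            throw new IllegalArgumentException("invalid frame mark: " + mark);
        }
        top = mark;
    }

    /** Reserves size bytes, with the start offset rounded up to 8 bytes. */
    ByteBuffer alloca(int size) {
        if (size < 0) {
            throw new IllegalArgumentException("negative size: " + size);
        }
        int start = (top + 7) & ~7; // keep 8-byte alignment
        if (size > memory.capacity() - start) {
            throw new IllegalStateException("stack exhausted");
        }
        top = start + size;
        // duplicate() shares the bytes but has its own position and limit.
        ByteBuffer view = memory.duplicate();
        view.position(start);
        view.limit(start + size);
        return view.slice().order(ByteOrder.nativeOrder());
    }
}
```

A caller would take a mark with frame(), allocate with alloca(), and 
restore the mark in a finally block, mirroring the try/finally shape of 
the quoted code. A direct ByteBuffer is reclaimed by the garbage collector 
once it is unreachable, so this variant avoids the explicit-free problem 
raised above, but whether that trade-off suits the memory-access API is 
not something this sketch settles.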

I see - I believe what you are seeing is caused by the performance gap 
between JNI and Panama, which at the moment is relatively big due to 
the absence of any optimizations. I'm sure Jorn has some more precise 
numbers (which I have seen in the past) comparing JNI and Panama for 
basic calls. It would be interesting to see if they fall in the same 
ballpark of 3-5x.

Maurizio


>
> As there's no immediate need I think I will keep working on the api 
> before providing a benchmark as that was my plan until i blocked 
> temporarily.  It's probably a few afternoons away from getting to a 
> functional level - depending on how much i redo based on what i've 
> learnt so far.  Beer today though.
>
>  Michael
>
> On 21/1/20 11:14 am, Maurizio Cimadamore wrote:
>> Btw - quick code comment; I think the best way to squeeze 
>> performance out of the memory API at the moment (because of the 
>> aforementioned C2 issues) is to use the indexed VarHandle (the 
>> commented one), but manually hoist the call to baseAddress() outside 
>> the loop, as in:
>>
>> MemoryAddress base = list.baseAddress();
>>  for (int i=0;i<out.length;i++) {
>>         MemoryAddress addr = (MemoryAddress)addrVHandle.get(base, 
>> (long)i);
>>         ...
>> }
>>
>> This issue also comes up here:
>> https://bugs.openjdk.java.net/browse/JDK-8237082
>>
>> Maurizio
>>
>>
>> On 21/01/2020 00:21, Maurizio Cimadamore wrote:
>>> try (MemorySegment lenp = MemorySegment.allocateNative(4, 4)) {
>>>                 int len;
>>>                 int res;
>>>
>>>                 res = (int)clGetPlatformIDs.invokeExact(0, 
>>> MemoryAddress.NULL, lenp.baseAddress());
>>>                 len = Native.getInt(lenp.baseAddress());
>>>                 try (MemorySegment list = 
>>> MemorySegment.allocateNative(len * 8, 8)) {
>>>                     res = (int)clGetPlatformIDs.invokeExact(len, 
>>> list.baseAddress(), lenp.baseAddress());
>>>
>>>                     CLPlatform[] out = new CLPlatform[len];
>>>                     for (int i=0;i<out.length;i++) {
>>>                         //MemoryAddress addr = 
>>> (MemoryAddress)addrVHandle.get(list.baseAddress(), (long)i);
>>>                         MemoryAddress addr = 
>>> Native.getAddr(list.baseAddress().addOffset(i*8));
>>>
>>>                         out[i] = Native.resolve(addr, CLPlatform::new);
>>>                     }
>>>
>>>                     return out;
>>>                 }
>>>             } catch (Throwable t) {
>>>                 throw new RuntimeException(t);
>>>             }
>>
>

