comments on performance / foreign-abi

Jorn Vernee jorn.vernee at oracle.com
Tue Jan 21 11:28:11 UTC 2020


On 21/01/2020 10:52, Maurizio Cimadamore wrote:
>
> On 21/01/2020 03:47, Michael Zucchi wrote:
>>
>> I guess that for an immutable object derived from another immutable 
>> object, it's unexpected that a new one is created on every call, so 
>> that point is important to note, thanks. In this particular case, on 
>> my machine, it only gets called once, so it won't make any difference yet.
> I thought about caching the result of baseAddress() on memory segment 
> creation, but, again, we run into C2 issues if we do that. I suspect 
> that once the HotSpot code becomes more friendly to the new API, we 
> will be able to remove some (redundant) allocation.
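To make the two options concrete: the plain-Java sketch below contrasts a segment that hands out a new address object on every baseAddress() call with one that creates it once, at construction, and reuses it. It does not use the Panama API; `Address`, `AllocatingSegment` and `CachingSegment` are hypothetical stand-ins, and the sketch says nothing about how C2 treats either shape, which is the actual concern above.

```java
// Hypothetical stand-in for the address object a segment hands out.
final class Address {
    final long base;
    final long offset;

    Address(long base, long offset) {
        this.base = base;
        this.offset = offset;
    }
}

// Creates a fresh Address on every baseAddress() call.
final class AllocatingSegment {
    int allocations = 0;          // number of Address objects created so far
    private final long base;

    AllocatingSegment(long base) {
        this.base = base;
    }

    Address baseAddress() {
        allocations++;
        return new Address(base, 0);
    }
}

// Creates the Address once, when the segment is created, and reuses it.
final class CachingSegment {
    int allocations = 0;          // stays at 1 no matter how often baseAddress() is called
    private final Address baseAddress;

    CachingSegment(long base) {
        this.baseAddress = new Address(base, 0);
        allocations++;
    }

    Address baseAddress() {
        return baseAddress;
    }
}
```

Both variants are immutable from the caller's point of view; the difference is only how many objects are allocated, which is the cost the JIT would otherwise have to optimize away.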
>>
>> For what it's worth, I can get another 13% by creating a thread-local 
>> "stack" allocator, so that it only uses one permanent segment and any 
>> addresses are created only once. It's not so much about the 
>> performance; it simplifies the code and the exception handling. The 
>> only problem is that there is no way to ever free the stack memory, 
>> since by definition that would have to be done from another thread 
>> (unless I use malloc/unsafe/free, or expose such detail to callers).
>>
>>             MemoryStack stack = mstacks.get();   // per-thread stack allocator
>>             long fp = stack.frame();             // remember the current frame pointer
>>
>>             try {
>>                 MemoryAddress lenp = stack.alloca(8); // keep alignment
>>                 MemoryAddress list;
>>                 int res, len;
>>
>>                 // First call: ask only for the number of devices
>>                 res = (int)clGetDeviceIDs.invokeExact(addr(), type, 0, MemoryAddress.NULL, lenp);
>>
>>                 len = Native.getInt(lenp);
>>                 list = stack.alloca(len * 8);     // one pointer-sized cl_device_id per device
>>
>>                 // Second call: fill the list
>>                 res = (int)clGetDeviceIDs.invokeExact(addr(), type, len, list, lenp);
>>
>>                 CLDevice[] out = new CLDevice[len];
>>                 for (int i=0;i<out.length;i++) {
>>                     MemoryAddress addr = (MemoryAddress)addrVHandle.get(list, (long)i);
>>
>>                     out[i] = Native.resolve(addr, CLDevice::new);
>>                 }
>>
>>                 return out;
>>             } catch (Throwable t) {
>>                 throw new RuntimeException(t);
>>             } finally {
>>                 stack.frame(fp);                  // pop everything allocated in this frame
>>             }
>>
>> And thanks for your other detailed response.
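The MemoryStack class in the snippet isn't shown in the thread. As a rough illustration only, here is a minimal bump-pointer stack allocator in plain Java. The class name `MemoryStackSketch` and its 64 KiB capacity are hypothetical; the `frame()`, `frame(long)` and `alloca(long)` names are borrowed from the snippet, and offsets into a single `ByteBuffer` stand in for native addresses, so the sketch does not address how native memory would eventually be freed.

```java
import java.nio.ByteBuffer;

// Hypothetical bump-pointer "stack" allocator; not the MemoryStack from the snippet.
final class MemoryStackSketch {
    // One stack per thread, like the mstacks thread-local in the snippet (size is arbitrary).
    static final ThreadLocal<MemoryStackSketch> STACKS =
            ThreadLocal.withInitial(() -> new MemoryStackSketch(64 * 1024));

    private final ByteBuffer memory; // the single permanent block owned by this stack
    private long top = 0;            // offset of the next free byte

    MemoryStackSketch(int capacity) {
        this.memory = ByteBuffer.allocate(capacity);
    }

    // Returns the current frame pointer so the caller can restore it later.
    long frame() {
        return top;
    }

    // Releases everything allocated since the matching frame() call.
    void frame(long fp) {
        if (fp < 0 || fp > top) {
            throw new IllegalArgumentException("invalid frame pointer: " + fp);
        }
        top = fp;
    }

    // Reserves size bytes starting at the next 8-byte boundary (relative to the
    // start of the block) and returns the offset of the reservation.
    long alloca(long size) {
        long start = (top + 7) & ~7L;
        if (size < 0 || start + size > memory.capacity()) {
            throw new IllegalStateException("stack overflow: " + size + " bytes requested");
        }
        top = start + size;
        return start;
    }

    ByteBuffer memory() {
        return memory;
    }
}
```

Callers follow the same shape as the snippet: save `frame()`, allocate, and restore the saved value in a `finally` block, so every allocation made during the call is released at once.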
>
> I see. I believe what you are seeing is caused by the performance gap 
> between JNI and Panama, which at the moment is relatively big due to 
> the absence of any optimizations. I'm sure Jorn has some more precise 
> numbers (I have seen them in the past) comparing JNI and Panama for 
> basic calls. It would be interesting to see if they fall in the same 
> ballpark of 3-5x.

For just the call overhead, Panama should be about 20-25x slower than 
JNI (though that gap will be closed completely once we get the 
optimizations in). It might be that we're catching up somewhat because 
the JNI version uses the JNI API from C (which I've generally found to 
be pretty slow), or maybe the call overhead is added on top of the time 
for the overall workload, which makes the relative difference seem smaller.
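The second explanation, that the call overhead is diluted by the rest of the workload, can be made concrete with a back-of-the-envelope model. Assume (an assumption, not something measured in the thread) that everything except the calls costs the same in both versions. If a fraction f of the JNI run time is call overhead and each Panama call is k times slower, the overall ratio is (1 - f) + f * k. With k between 20 and 25, an overall gap of 3-5x would mean calls account for very roughly 8-21% of the JNI run time. The class name `OverheadModel` is hypothetical.

```java
// Back-of-the-envelope model (hypothetical): assumes the non-call part of the
// workload costs the same with JNI and with Panama.
final class OverheadModel {
    // callFraction: share of the JNI run time spent in call overhead (0..1)
    // callSlowdown: how many times slower each Panama call is than a JNI call
    // returns: total Panama time divided by total JNI time
    static double overallSlowdown(double callFraction, double callSlowdown) {
        return (1.0 - callFraction) + callFraction * callSlowdown;
    }

    // Inverse of overallSlowdown: the call fraction that yields a given overall ratio.
    static double callFractionFor(double overallSlowdown, double callSlowdown) {
        return (overallSlowdown - 1.0) / (callSlowdown - 1.0);
    }
}
```

For example, `callFractionFor(3, 20)` is 2/19, so calls would make up about 10.5% of the JNI run time for a 20x call penalty to show up as only a 3x overall gap.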

Jorn

>
> Maurizio
>
>
>>
>> As there's no immediate need, I think I will keep working on the API 
>> before providing a benchmark, as that was my plan until I was 
>> temporarily blocked. It's probably a few afternoons away from getting 
>> to a functional level, depending on how much I redo based on what I've 
>> learnt so far. Beer today, though.
>>
>>  Michael
>>
>> On 21/1/20 11:14 am, Maurizio Cimadamore wrote:
>>> Btw, a quick code comment: I think the best way to squeeze 
>>> performance out of the memory API at the moment (because of the 
>>> aforementioned C2 issues) is to use the indexed VarHandle (the 
>>> commented-out one), but manually hoist the call to baseAddress() 
>>> outside the loop, as in:
>>>
>>> MemoryAddress base = list.baseAddress();
>>> for (int i=0;i<out.length;i++) {
>>>     MemoryAddress addr = (MemoryAddress)addrVHandle.get(base, (long)i);
>>>     ...
>>> }
>>>
>>> This issue also comes up here:
>>> https://bugs.openjdk.java.net/browse/JDK-8237082
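The hoisting advice can be illustrated without the incubator API. In the plain-Java sketch below, `HoistSketch` and `baseView` are hypothetical; `baseView` plays the role of `list.baseAddress()` by returning a new `ByteBuffer` view (via `duplicate()`) on every call, and an indexed `VarHandle` from `MethodHandles.byteBufferViewVarHandle` plays the role of `addrVHandle`. Note that this `VarHandle` takes a byte offset as its index, whereas the handle in the thread takes an element index.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Plain-Java analogy of hoisting a loop-invariant call; not the Panama API.
final class HoistSketch {
    // Views a ByteBuffer as longs; coordinates are (ByteBuffer, int byteOffset).
    static final VarHandle LONGS =
            MethodHandles.byteBufferViewVarHandle(long[].class, ByteOrder.nativeOrder());

    static int viewsCreated = 0; // counts how often baseView() allocated a new view

    // Hypothetical stand-in for list.baseAddress(): a new view object on every call.
    static ByteBuffer baseView(ByteBuffer list) {
        viewsCreated++;
        return list.duplicate();
    }

    // Calls baseView() once per element, as in the original loop.
    static long[] readNaive(ByteBuffer list, int len) {
        long[] out = new long[len];
        for (int i = 0; i < len; i++) {
            out[i] = (long) LONGS.get(baseView(list), i * Long.BYTES);
        }
        return out;
    }

    // Hoists the loop-invariant baseView() call out of the loop.
    static long[] readHoisted(ByteBuffer list, int len) {
        ByteBuffer base = baseView(list);
        long[] out = new long[len];
        for (int i = 0; i < len; i++) {
            out[i] = (long) LONGS.get(base, i * Long.BYTES);
        }
        return out;
    }
}
```

Both loops read the same values; the hoisted one creates one view instead of one per element, which is the allocation the C2 issues currently prevent the JIT from removing on its own.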
>>>
>>> Maurizio
>>>
>>>
>>> On 21/01/2020 00:21, Maurizio Cimadamore wrote:
>>>>             try (MemorySegment lenp = MemorySegment.allocateNative(4, 4)) {
>>>>                 int len;
>>>>                 int res;
>>>>
>>>>                 res = (int)clGetPlatformIDs.invokeExact(0, MemoryAddress.NULL, lenp.baseAddress());
>>>>                 len = Native.getInt(lenp.baseAddress());
>>>>                 try (MemorySegment list = MemorySegment.allocateNative(len * 8, 8)) {
>>>>                     res = (int)clGetPlatformIDs.invokeExact(len, list.baseAddress(), lenp.baseAddress());
>>>>
>>>>                     CLPlatform[] out = new CLPlatform[len];
>>>>                     for (int i=0;i<out.length;i++) {
>>>>                         //MemoryAddress addr = (MemoryAddress)addrVHandle.get(list.baseAddress(), (long)i);
>>>>                         MemoryAddress addr = Native.getAddr(list.baseAddress().addOffset(i*8));
>>>>
>>>>                         out[i] = Native.resolve(addr, CLPlatform::new);
>>>>                     }
>>>>
>>>>                     return out;
>>>>                 }
>>>>             } catch (Throwable t) {
>>>>                 throw new RuntimeException(t);
>>>>             }
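The platform and device snippets both use the usual OpenCL two-call pattern: call once with no output buffer to learn the count, allocate, then call again to fill the buffer. Neither snippet checks `res`. Below is a pure-Java sketch of that pattern with status checks added. `EnumerateFn` and `TwoCallQuery` are hypothetical; the function is a stand-in shaped like `clGetPlatformIDs(num_entries, platforms, num_platforms)`, and Java `long[]`/`int[]` arrays take the place of native memory, so no Panama calls are involved.

```java
import java.util.function.IntFunction;
import java.util.function.LongFunction;

// Hypothetical stand-in for an OpenCL-style enumeration call: writes the total
// count to numOut[0] and up to numEntries handles to out (null when numEntries is 0).
interface EnumerateFn {
    int call(int numEntries, long[] out, int[] numOut);
}

final class TwoCallQuery {
    static final int SUCCESS = 0; // CL_SUCCESS is 0 in OpenCL

    // First call learns the count, second call fills a buffer of exactly that size.
    static <T> T[] enumerate(EnumerateFn fn, IntFunction<T[]> newArray, LongFunction<T> wrap) {
        int[] count = new int[1];
        check(fn.call(0, null, count));
        int len = count[0];

        long[] handles = new long[len];
        check(fn.call(len, handles, count));

        T[] out = newArray.apply(len);
        for (int i = 0; i < len; i++) {
            out[i] = wrap.apply(handles[i]);
        }
        return out;
    }

    // Fails fast instead of silently using a count or handles from a failed call.
    static void check(int res) {
        if (res != SUCCESS) {
            throw new IllegalStateException("enumeration failed with status " + res);
        }
    }
}
```

Checking the status before trusting `count[0]` matters most for the first call: if it fails, the count is never written and the sketch would otherwise allocate a buffer sized from stale data.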
>>>
>>


More information about the panama-dev mailing list