comments on performance / foreign-abi
Michael Zucchi
notzed at gmail.com
Tue Jan 21 03:47:34 UTC 2020
I guess when one immutable is derived from another it's unexpected that
a new one is created on every call, so that point is important to note,
thanks. In this particular case, on my machine, it only gets called once
so it won't make any difference as yet.
For what it's worth I can get another 13% by creating a thread-local
"stack" allocator, so that it only uses one permanent segment and any
addresses are created only once. It's not so much the perf, but it
simplifies the code and exception handling. The only problem is that
there is no way to ever free the stack memory, since by definition that
would have to be performed on another thread! (Unless I use
malloc/unsafe/free, or expose such detail to callers.)
    MemoryStack stack = mstacks.get();
    long fp = stack.frame();
    try {
        MemoryAddress lenp = stack.alloca(8); // keep alignment
        MemoryAddress list;
        int res, len;

        // first call: query how many devices there are
        res = (int)clGetDeviceIDs.invokeExact(addr(), type, 0,
            MemoryAddress.NULL, lenp);
        len = Native.getInt(lenp);
        list = stack.alloca(len * 8);
        // second call: fetch the device handles themselves
        res = (int)clGetDeviceIDs.invokeExact(addr(), type,
            len, list, lenp);

        CLDevice[] out = new CLDevice[len];
        for (int i=0;i<out.length;i++) {
            MemoryAddress addr =
                (MemoryAddress)addrVHandle.get(list, (long)i);
            out[i] = Native.resolve(addr, CLDevice::new);
        }
        return out;
    } catch (Throwable t) {
        throw new RuntimeException(t);
    } finally {
        // unwind everything allocated since frame()
        stack.frame(fp);
    }
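For readers who want something concrete, here is a minimal sketch of what a thread-local "stack" allocator with the same frame()/alloca()/frame(fp) shape might look like. It is not the MemoryStack class used above, whose source is not shown in this message. The class name StackSketch, the 4096-byte capacity and the use of a direct ByteBuffer in place of a native memory segment are all assumptions made so the example runs on a stock JDK.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical stand-in for a thread-local stack allocator; names are illustrative. */
final class StackSketch {
    // One permanent backing allocation per thread, created lazily on first use.
    static final ThreadLocal<StackSketch> STACKS =
        ThreadLocal.withInitial(() -> new StackSketch(4096)); // capacity is an assumption

    private final ByteBuffer memory; // off-heap backing store
    private int top;                 // offset of the next free byte

    StackSketch(int capacity) {
        memory = ByteBuffer.allocateDirect(capacity);
    }

    /** Returns a mark for the current top of stack. */
    long frame() {
        return top;
    }

    /** Releases everything allocated since the matching frame() call. */
    void frame(long mark) {
        if (mark < 0 || mark > top) {
            throw new IllegalArgumentException("bad frame mark: " + mark);
        }
        top = (int) mark;
    }

    /** Bump-allocates size bytes; the stack advances by size rounded up to 8. */
    ByteBuffer alloca(int size) {
        if (size < 0) {
            throw new IllegalArgumentException("negative size: " + size);
        }
        int rounded = (size + 7) & ~7; // keep later allocations 8-byte aligned
        if (rounded < 0 || rounded > memory.capacity() - top) {
            throw new IllegalStateException("stack exhausted");
        }
        ByteBuffer window = memory.duplicate();
        window.position(top);
        window.limit(top + size);
        top += rounded;
        // slice() resets the byte order, so set it explicitly on the result
        return window.slice().order(ByteOrder.nativeOrder());
    }
}
```

Usage follows the same pattern as the code above: take a mark with frame(), allocate with alloca(), and restore the mark in a finally block. Note that the backing buffer here is reclaimed by the garbage collector once it is unreachable, so the sketch sidesteps rather than addresses the question of how to free the stack memory explicitly.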
And thanks for your other detailed response.
As there's no immediate need I think I will keep working on the API
before providing a benchmark, as that was my plan until I got blocked
temporarily. It's probably a few afternoons away from getting to a
functional level, depending on how much I redo based on what I've
learnt so far. Beer today though.
Michael
On 21/1/20 11:14 am, Maurizio Cimadamore wrote:
> Btw - quick code comment; I think the best way to squeeze performances
> out of the memory API at the moment (because of the aforementioned C2
> issues) is to use the indexed VarHandle (the commented one), but
> manually hoist the call to baseAddress() outside the loop, as in:
>
>     MemoryAddress base = list.baseAddress();
>     for (int i=0;i<out.length;i++) {
>         MemoryAddress addr =
>             (MemoryAddress)addrVHandle.get(base, (long)i);
>         ...
>     }
>
> This issue also comes up here:
> https://bugs.openjdk.java.net/browse/JDK-8237082
>
> Maurizio
>
>
> On 21/01/2020 00:21, Maurizio Cimadamore wrote:
>> try (MemorySegment lenp = MemorySegment.allocateNative(4, 4)) {
>>     int len;
>>     int res;
>>
>>     res = (int)clGetPlatformIDs.invokeExact(0,
>>         MemoryAddress.NULL, lenp.baseAddress());
>>     len = Native.getInt(lenp.baseAddress());
>>     try (MemorySegment list =
>>             MemorySegment.allocateNative(len * 8, 8)) {
>>         res = (int)clGetPlatformIDs.invokeExact(len,
>>             list.baseAddress(), lenp.baseAddress());
>>
>>         CLPlatform[] out = new CLPlatform[len];
>>         for (int i=0;i<out.length;i++) {
>>             //MemoryAddress addr =
>>             //    (MemoryAddress)addrVHandle.get(list.baseAddress(), (long)i);
>>             MemoryAddress addr =
>>                 Native.getAddr(list.baseAddress().addOffset(i*8));
>>
>>             out[i] = Native.resolve(addr, CLPlatform::new);
>>         }
>>
>>         return out;
>>     }
>> } catch (Throwable t) {
>>     throw new RuntimeException(t);
>> }
>
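The hoisting suggestion quoted above (calling baseAddress() once before the loop and indexing through the VarHandle) can be illustrated on a stock JDK with a rough analogue. In this sketch ByteBuffer.slice() merely stands in for baseAddress() as a call that produces a new object each time, and the class name HoistSketch and method readAll are hypothetical. It shows the shape of the transformation only and makes no claim about how C2 treats either API.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical illustration of hoisting a loop-invariant call out of a loop. */
final class HoistSketch {
    // Reads a long from a ByteBuffer at a given byte offset, in native byte order.
    static final VarHandle LONG_AT =
        MethodHandles.byteBufferViewVarHandle(long[].class, ByteOrder.nativeOrder());

    /** Reads count 8-byte values from list, starting at its current position. */
    static long[] readAll(ByteBuffer list, int count) {
        // Hoisted: derive the view once, not once per element.
        ByteBuffer base = list.slice();
        long[] out = new long[count];
        for (int i = 0; i < out.length; i++) {
            // The handle takes a byte offset, so scale the element index by 8.
            out[i] = (long) LONG_AT.get(base, i * 8);
        }
        return out;
    }
}
```

The unhoisted form would call list.slice() inside the loop body, allocating a fresh view on every iteration, which is the pattern the quoted advice recommends avoiding by hand.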
More information about the panama-dev mailing list