status of VM long loop optimizations - call for action
Rado Smogura
mail at smogura.eu
Thu Dec 16 17:08:25 UTC 2021
Hi,
I'm on the newest Ubuntu. Full sources are here:
https://github.com/rsmogura/panama-io
For testing I created a very simple main:
    public static void main(String[] args) throws Exception {
        byte[] sendBuff = new byte[8192 * 4];
        final var port = (short) (new Random().nextInt(1000) + 20000);
        var server = ServerSocketFactory.getDefault().createServerSocket(port);
        Arrays.fill(sendBuff, (byte) 1);
        var executorService = Executors.newSingleThreadExecutor();
        var serverThread = executorService.submit(() -> {
            final var conn = server.accept();
            final var out = conn.getOutputStream();
            while (true) {
                out.write(sendBuff);
            }
        });
        var sock = new PosixSocketFactory().createSocket("127.0.0.1", port);
        var in = sock.getInputStream();
        while (true)
            in.read(new byte[12]);
    }
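
For comparison, here is a minimal sketch of the same harness that uses only plain JDK sockets, so it can be run without the sources above. It is a baseline under stated assumptions, not the code from the repository: the class name LoopbackReadBaseline and the method run are made up for illustration, the read loop is bounded so that it terminates, and the OS picks the port.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical baseline: same traffic pattern as the main above,
// but with java.net.Socket in place of PosixSocketFactory.
class LoopbackReadBaseline {

    /** Does `reads` small reads over loopback; returns the total bytes received. */
    static long run(int reads) throws Exception {
        byte[] sendBuff = new byte[8192 * 4];
        Arrays.fill(sendBuff, (byte) 1);

        ExecutorService executor = Executors.newSingleThreadExecutor();
        // Port 0 asks the OS for a free port instead of guessing one.
        try (ServerSocket server =
                 new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            executor.submit(() -> {
                try (Socket conn = server.accept();
                     OutputStream out = conn.getOutputStream()) {
                    while (true) {
                        out.write(sendBuff);
                    }
                } catch (IOException e) {
                    // Expected once the client closes its end of the connection.
                }
            });

            long total = 0;
            try (Socket sock = new Socket(server.getInetAddress(),
                                          server.getLocalPort());
                 InputStream in = sock.getInputStream()) {
                for (int i = 0; i < reads; i++) {
                    // A fresh 12-byte array per read, as in the original loop.
                    int n = in.read(new byte[12]);
                    if (n < 0) {
                        break; // stream closed by the server
                    }
                    total += n;
                }
            }
            return total;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("bytes read: " + run(100_000));
    }
}
```

Profiling this next to the PosixSocketFactory version should make it easier to tell which allocations come from the downcall path and which come from the test loop itself.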
Kind regards,
Rado
On 16.12.2021 17:57, Maurizio Cimadamore wrote:
>
> On 16/12/2021 16:45, Rado Smogura wrote:
>> Hi,
>>
>>
>> Here's the signature from the man pages:
>> ssize_t read(int fd, void *buf, size_t count);
>>
>> And the one generated by jextract:
>> public static long read(int __fd, Addressable __buf, long __nbytes)
>>
>>
>> The buf comes from a pooled allocator, so it's a previously allocated
>> memory segment. I tried with buf passed as MemorySegment and as
>> MemoryAddress.
>
> Such a signature should never cause an allocator to be called - it's
> all primitives.
>
> Which OS/platform are you on?
>
> Maurizio
>
>
>>
>>
>> Kind regards,
>>
>> Rado
>>
>> On 16.12.2021 17:35, Maurizio Cimadamore wrote:
>>>
>>>
>>> On 16/12/2021 16:24, Rado Smogura wrote:
>>>>
>>>> Hi,
>>>>
>>>> I don't know the details of the underlying ABI, however I think
>>>> there should be no need to allocate additional structs.
>>>>
>>>> For POSIX read we pass 3 arguments, which should fit in registers,
>>>> and all are 32/64-bit values
>>>>
>>>> (long)mh$.invokeExact(__fd, __buf, __nbytes)
>>>>
>>>> This happens in both cases, where buf is a MemorySegment and where
>>>> it is a MemoryAddress; the rest are primitives.
>>>>
>>> What is the signature of the native function? Is the argument
>>> corresponding to __buf a struct or a pointer?
>>>
>>> Maurizio
>>>
>>>> Kind regards,
>>>>
>>>> Rado
>>>>
>>>> On 16.12.2021 12:12, Maurizio Cimadamore wrote:
>>>>>
>>>>> On 13/12/2021 22:10, Maurizio Cimadamore wrote:
>>>>>> That's odd - I mean, the BindingContext is used when setting up
>>>>>> downcall method handles, or upcall stubs. But should not be
>>>>>> invoked in the hot path.
>>>>>
>>>>> Correction: the ofAllocator call you see might in fact even be in
>>>>> a hot path. A downcall method handle sometimes has to allocate
>>>>> memory for the temp buffers it uses. When that happens, the
>>>>> invocation is wrapped with a try-with-resources (well, a MH chain
>>>>> equivalent to that is generated) and a new "binding context" with
>>>>> a SegmentAllocator is created. This should happen only when
>>>>> structs that are too big are passed by reference by the ABI (I
>>>>> think that happens on Windows) - so we have to create a temp
>>>>> segment holding the struct, and pass the segment pointer to the
>>>>> underlying native function. The temp struct is then destroyed
>>>>> after the call.
>>>>>
>>>>> Upcalls also need an allocator, in case they receive structs by
>>>>> value (again, a temp segment might need to be allocated for the
>>>>> duration of the upcall).
>>>>>
>>>>> So, even if your downcall is fully intrinsified, you might still
>>>>> see calls to BindingContext::ofAllocator, depending on the shape
>>>>> of the called function. It is possible that C2 might have an issue
>>>>> scalarizing the Binding.Context allocation - but that's a separate
>>>>> problem from the one we were discussing (the impact of long loop
>>>>> optimizations).
>>>>>
>>>>> On that topic, I see that Roland has submitted a PR for the
>>>>> remaining perf issue we have seen in our micro benchmarks:
>>>>>
>>>>> https://github.com/openjdk/jdk18/pull/35
>>>>>
>>>>> I expect that, once integrated, we should then have full
>>>>> performance parity with current workarounds.
>>>>>
>>>>> Maurizio
>>>>>
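
To make the quoted explanation of the per-call binding context more concrete, here is a toy model in plain Java. It is only a sketch of the shape of the problem and not the JDK's internal code: TempContext, callPrimitivesOnly and callWithLargeStruct are hypothetical names, and the "native call" is a placeholder. A call that takes only primitives needs no context, while a call that passes a large struct by reference creates a context, copies the struct into a temp buffer, and releases it after the call.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of "wrap the call in try-with-resources when temp memory is needed".
class TempContextSketch {

    /** Hypothetical stand-in for a per-call context that owns temp buffers. */
    static final class TempContext implements AutoCloseable {
        static int created = 0; // counts how many contexts were allocated

        private final List<byte[]> buffers = new ArrayList<>();

        TempContext() {
            created++;
        }

        byte[] allocate(int size) {
            byte[] buffer = new byte[size];
            buffers.add(buffer);
            return buffer;
        }

        @Override
        public void close() {
            // Temp buffers live only for the duration of one call.
            buffers.clear();
        }
    }

    /** All-primitive call shape (like read): no context is created. */
    static long callPrimitivesOnly(int fd, long bufAddress, long nbytes) {
        return nbytes; // placeholder for the native call
    }

    /** Struct passed by reference: a context is created for every call. */
    static long callWithLargeStruct(byte[] struct) {
        try (TempContext context = new TempContext()) {
            byte[] copy = context.allocate(struct.length);
            System.arraycopy(struct, 0, copy, 0, struct.length);
            return copy.length; // placeholder for the native call
        }
    }
}
```

In this model the per-call cost comes from the TempContext object and its buffer. For the all-primitive read signature neither should be needed, which is why an allocator call showing up in that path is surprising.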