status of VM long loop optimizations - call for action

Rado Smogura mail at smogura.eu
Thu Dec 16 17:08:25 UTC 2021


Hi,


It's the newest Ubuntu. The full sources are here:
https://github.com/rsmogura/panama-io


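For testing I created a very simple main method:

import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.Executors;
import javax.net.ServerSocketFactory;
// PosixSocketFactory comes from the panama-io repository above (its import is omitted).

public class ReadLoopMain { // illustrative class name
  public static void main(String[] args) throws Exception {
    // 32 KiB payload that the server writes in a loop
    byte[] sendBuff = new byte[8192 * 4];

    // random port in [20000, 21000)
    final var port = (short) (new Random().nextInt(1000) + 20000);
    var server = ServerSocketFactory.getDefault().createServerSocket(port);
    Arrays.fill(sendBuff, (byte) 1);

    // server side: accept one connection and stream data to it forever
    var executorService = Executors.newSingleThreadExecutor();
    var serverThread = executorService.submit(() -> {
      final var conn = server.accept();
      final var out = conn.getOutputStream();
      while (true) {
        out.write(sendBuff);
      }
    });

    // client side: connect through PosixSocketFactory and read 12 bytes at a time
    var sock = new PosixSocketFactory().createSocket("127.0.0.1", port);

    var in = sock.getInputStream();
    while (true)
      in.read(new byte[12]);
  }
}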
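
For comparison, here is a minimal sketch of a read binding written against the finalized java.lang.foreign API (Java 22+). This is not the JDK 18 incubator API used in this thread. In the finalized API, Addressable and MemoryAddress no longer exist and pointers are plain MemorySegment values.

The sketch assumes an LP64 POSIX platform, where size_t maps to a Java long. The class name ReadDowncallSketch, the roundTrip helper, and the use of pipe/write/close to get a readable file descriptor are illustrative assumptions, not part of panama-io. It shows that read's C signature maps to (int, pointer, long) -> long with no struct involved. That is consistent with the point made further down the thread that such a signature should not need an allocator.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

// Hypothetical sketch; assumes Java 22+ and an LP64 POSIX libc.
public class ReadDowncallSketch {
    private static final Linker LINKER = Linker.nativeLinker();

    // Looks up a libc symbol and links it with the given C signature.
    private static MethodHandle libc(String name, FunctionDescriptor descriptor) {
        MemorySegment symbol = LINKER.defaultLookup().find(name).orElseThrow();
        return LINKER.downcallHandle(symbol, descriptor);
    }

    // ssize_t read(int fd, void *buf, size_t count): int, pointer, long -> long (LP64).
    // Only primitives and a pointer, so no struct has to be copied into temp memory.
    static final MethodHandle READ = libc("read", FunctionDescriptor.of(
            ValueLayout.JAVA_LONG, ValueLayout.JAVA_INT, ValueLayout.ADDRESS, ValueLayout.JAVA_LONG));
    // ssize_t write(int fd, const void *buf, size_t count)
    static final MethodHandle WRITE = libc("write", FunctionDescriptor.of(
            ValueLayout.JAVA_LONG, ValueLayout.JAVA_INT, ValueLayout.ADDRESS, ValueLayout.JAVA_LONG));
    // int pipe(int fds[2])
    static final MethodHandle PIPE = libc("pipe", FunctionDescriptor.of(
            ValueLayout.JAVA_INT, ValueLayout.ADDRESS));
    // int close(int fd)
    static final MethodHandle CLOSE = libc("close", FunctionDescriptor.of(
            ValueLayout.JAVA_INT, ValueLayout.JAVA_INT));

    // Writes `count` bytes of value 1 into a pipe and reads them back through READ.
    static long roundTrip(int count) throws Throwable {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment fds = arena.allocate(8, 4); // int fds[2], 4-byte aligned
            int rc = (int) PIPE.invokeExact(fds);
            if (rc != 0) {
                throw new IllegalStateException("pipe failed");
            }
            int readFd = fds.getAtIndex(ValueLayout.JAVA_INT, 0);
            int writeFd = fds.getAtIndex(ValueLayout.JAVA_INT, 1);
            try {
                MemorySegment src = arena.allocate(count);
                src.fill((byte) 1);
                long written = (long) WRITE.invokeExact(writeFd, src, (long) count);
                if (written != count) {
                    throw new IllegalStateException("short write");
                }
                // pre-allocated native buffer, analogous to a segment from a pooled allocator
                MemorySegment buf = arena.allocate(count);
                long n = (long) READ.invokeExact(readFd, buf, (long) count);
                for (long i = 0; i < n; i++) {
                    if (buf.get(ValueLayout.JAVA_BYTE, i) != 1) {
                        throw new IllegalStateException("unexpected byte at " + i);
                    }
                }
                return n;
            } finally {
                // invokeExact needs the exact (int)int call-site type, so results are assigned
                int closedRead = (int) CLOSE.invokeExact(readFd);
                int closedWrite = (int) CLOSE.invokeExact(writeFd);
            }
        }
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(roundTrip(12)); // prints 12
    }
}
```

Running it on Java 22 may print a restricted-method warning unless --enable-native-access is given.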


Kind regards,

Rado


On 16.12.2021 17:57, Maurizio Cimadamore wrote:
>
> On 16/12/2021 16:45, Rado Smogura wrote:
>> Hi,
>>
>>
>> Here's the signature from the man page:
>>
>>     ssize_t read(int fd, void *buf, size_t count);
>>
>> And the one generated by jextract:
>>
>>     public static long read(int __fd, Addressable __buf, long __nbytes)
>>
>>
>> The buf comes from a pooled allocator, so it's a previously allocated
>> memory segment. I tried passing buf both as a MemorySegment and as a
>> MemoryAddress.
>
> Such a signature should never cause an allocator to be called - it's 
> all primitives.
>
> Which OS/platform are you on?
>
> Maurizio
>
>
>>
>>
>> Kind regards,
>>
>> Rado
>>
>> On 16.12.2021 17:35, Maurizio Cimadamore wrote:
>>>
>>>
>>> On 16/12/2021 16:24, Rado Smogura wrote:
>>>>
>>>> Hi,
>>>>
>>>> I don't know the details of the underlying ABI; however, I think
>>>> there should be no need to allocate additional structs.
>>>>
>>>> For POSIX read we pass 3 arguments, all 32/64-bit values, which
>>>> should fit in registers:
>>>>
>>>> (long)mh$.invokeExact(__fd, __buf, __nbytes)
>>>>
>>>> This happens in both cases, whether buf is a MemorySegment or a
>>>> MemoryAddress; the rest are primitives.
>>>>
>>> What is the signature of the native function? Is the argument 
>>> corresponding to __buf a struct or a pointer?
>>>
>>> Maurizio
>>>
>>>> Kind regards,
>>>>
>>>> Rado
>>>>
>>>> On 16.12.2021 12:12, Maurizio Cimadamore wrote:
>>>>>
>>>>> On 13/12/2021 22:10, Maurizio Cimadamore wrote:
>>>>>> That's odd - I mean, the BindingContext is used when setting up
>>>>>> downcall method handles, or upcall stubs, but it should not be
>>>>>> invoked in the hot path.
>>>>>
>>>>> Correction: the ofAllocator call you see might in fact be on a
>>>>> hot path. A downcall method handle sometimes has to allocate
>>>>> memory for the temp buffers it uses. When that happens, the
>>>>> invocation is wrapped in a try-with-resources (well, a MH chain
>>>>> equivalent to that is generated) and a new "binding context" with
>>>>> a SegmentAllocator is created. This should happen only when
>>>>> structs that are too big are passed by reference by the ABI (I
>>>>> think that happens on Windows), so we have to create a temp
>>>>> segment holding the struct and pass the segment pointer to the
>>>>> underlying native function. The temp struct is then destroyed
>>>>> after the call.
>>>>>
>>>>> Upcalls also need an allocator, in case they receive structs by
>>>>> value (again, a temp segment might need to be allocated for the
>>>>> duration of the upcall).
>>>>>
>>>>> So, even if your downcall is fully intrinsified, you might still
>>>>> see calls to BindingContext::ofAllocator, depending on the shape
>>>>> of the called function. It is possible that C2 has issues
>>>>> scalarizing the Binding.Context allocation, but that's a separate
>>>>> problem from the one we were discussing (the impact of long loop
>>>>> optimizations).
>>>>>
>>>>> On that topic, I see that Roland has submitted a PR for the 
>>>>> remaining perf issue we have seen in our micro benchmarks:
>>>>>
>>>>> https://github.com/openjdk/jdk18/pull/35
>>>>>
>>>>> I expect that, once integrated, we should then have full 
>>>>> performance parity with current workarounds.
>>>>>
>>>>> Maurizio
>>>>>
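
Maurizio's explanation of when a downcall allocates can be illustrated with a small, hypothetical model. His point is that a temp copy is needed only when the ABI passes a struct argument by reference, which he believes happens on Windows.

The model assumes the Windows x64 convention for arguments: a struct of exactly 1, 2, 4 or 8 bytes is passed like an integer of that size, and any other size is passed as a pointer to a caller-made copy. The Arg/Prim/Ptr/Struct types and the helper methods are invented for illustration; they are not JDK internals. callWithTempCopy mimics the try-with-resources shape he describes, using the Java 22+ Arena API, with a plain Java function standing in for the native callee. It does not model struct return values, System V rules, or upcalls.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.util.List;
import java.util.function.ToLongFunction;

// Hypothetical model; not how the JDK's linker is actually implemented.
public class TempAllocationModel {
    // Simplified description of one downcall argument (illustrative types).
    sealed interface Arg permits Prim, Ptr, Struct {}
    record Prim(int size) implements Arg {}    // int, long, double, ...
    record Ptr() implements Arg {}             // void *, passed as an address
    record Struct(long size) implements Arg {} // struct passed by value in C

    // Windows x64 rule for arguments: a struct of exactly 1, 2, 4 or 8 bytes is passed
    // like an integer of that size; any other size is copied by the caller and the
    // callee receives a pointer to the copy.
    static boolean passedByReferenceWin64(Struct s) {
        long n = s.size();
        return n != 1 && n != 2 && n != 4 && n != 8;
    }

    // Only by-reference structs force the downcall to allocate temporary memory.
    static boolean needsTempAllocation(List<Arg> args) {
        for (Arg a : args) {
            if (a instanceof Struct s && passedByReferenceWin64(s)) {
                return true;
            }
        }
        return false;
    }

    // Plain-Java analogue of the generated method-handle chain: copy the struct into
    // a temp segment, pass the copy to the callee, and free it when the call returns.
    static long callWithTempCopy(MemorySegment struct, ToLongFunction<MemorySegment> callee) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment tmp = arena.allocate(struct.byteSize());
            tmp.copyFrom(struct);
            return callee.applyAsLong(tmp);
        } // closing the arena frees the temp copy
    }

    public static void main(String[] args) {
        // read(int, void *, size_t): only primitives and a pointer
        System.out.println(needsTempAllocation(List.of(new Prim(4), new Ptr(), new Prim(8)))); // false
        // a 24-byte struct argument
        System.out.println(needsTempAllocation(List.of(new Struct(24)))); // true
    }
}
```

In this model, read's shape never reaches the allocation path. This matches the expectation in the thread that a signature made only of primitives and a pointer should not create a binding context per call.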


More information about the panama-dev mailing list