Implementation of IO with Panama
Radosław Smogura
mail at smogura.eu
Mon Apr 19 10:41:46 UTC 2021
Hi Maurizio,
> On Apr 19, 2021, at 11:33 AM, Maurizio Cimadamore <maurizio.cimadamore at oracle.com> wrote:
>
> I see,
> you need both allocation and deallocation; an arena works well when you need to allocate lots of resources at the same time (sometimes native apps will need to do that) and you want everything to be freed at once.
>
> It seems like you just need a better malloc/free. If that's the case we have a more general allocator we're working on which at some point we'll integrate under the SegmentAllocator umbrella. From what I can read from some of the javadoc in the allocator you use, what we're working on is exactly what you need here.
>
> Regarding your implementation, I see that clients need to obtain an allocator based on a given scope. This acquires your allocator's scope (so that memory cannot go away) and returns a new segment allocator which works using the client's scope, but which returns segments backed by your allocator impl. This is clever, and it allows you to avoid creating a ResourceScope for every returned segment - do you think that resource scope creation would have killed performance?
Good question.
In such a case we would have to (1) acquire the parent scope, and (2) add the segment to the clean list. Off the top of my head, I would say that creating a new resource scope for every segment would require (1) for each segment, whereas it can be done once when using an allocator with a scope.
Frankly, I came up with the idea of a child allocator when I realized that during some operations you need to allocate one or two structs or C strings, and it would be handy to manage them via try-with-resources.
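As a rough illustration of the child-allocator idea (not the code from the PR, and deliberately not using the Panama API), here is a minimal sketch in plain Java. All names (`PooledAllocatorSketch`, `Pool`, `ChildAllocator`, `forNewScope`) are hypothetical, and a direct `ByteBuffer` stands in for a memory segment. The only point is that the pool is acquired once per child allocator, and that closing the child returns every buffer it handed out.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical model of a pooled allocator; NOT the Panama API and NOT the PR's code. */
final class PooledAllocatorSketch {

    /** Pool of fixed-size off-heap buffers that are reused instead of freed. */
    static final class Pool {
        private final Deque<ByteBuffer> free = new ArrayDeque<>();
        private final int blockSize;
        private int created; // number of buffers allocated from scratch

        Pool(int blockSize) { this.blockSize = blockSize; }

        synchronized ByteBuffer take() {
            ByteBuffer b = free.poll();
            if (b == null) {
                created++;
                return ByteBuffer.allocateDirect(blockSize);
            }
            b.clear(); // reset position/limit before reuse
            return b;
        }

        synchronized void give(ByteBuffer b) { free.push(b); }

        synchronized int created() { return created; }

        synchronized int available() { return free.size(); }

        /** The pool is referenced once here, not once per allocated buffer. */
        ChildAllocator forNewScope() { return new ChildAllocator(this); }
    }

    /** Child allocator whose lifetime is managed with try-with-resources. */
    static final class ChildAllocator implements AutoCloseable {
        private final Pool pool;
        private final List<ByteBuffer> owned = new ArrayList<>(); // the "clean list"
        private boolean closed;

        ChildAllocator(Pool pool) { this.pool = pool; }

        ByteBuffer allocate() {
            if (closed) throw new IllegalStateException("scope already closed");
            ByteBuffer b = pool.take();
            owned.add(b); // remember it so close() can hand it back
            return b;
        }

        @Override
        public void close() {
            if (closed) return;
            closed = true;
            for (ByteBuffer b : owned) pool.give(b); // return everything at once
            owned.clear();
        }
    }

    public static void main(String[] args) {
        Pool pool = new Pool(64);
        try (ChildAllocator child = pool.forNewScope()) {
            child.allocate(); // e.g. a small struct
            child.allocate(); // e.g. a C string
        }
        try (ChildAllocator child = pool.forNewScope()) {
            child.allocate(); // reuses a pooled buffer, nothing new is created
        }
        // prints "created=2 available=2"
        System.out.println("created=" + pool.created() + " available=" + pool.available());
    }
}
```

In this model the caller never frees individual buffers; it works at the granularity of the child allocator, and a buffer must not be used after the child that produced it is closed (the sketch does not enforce that).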
> Then, on each allocator request you use `getSegmentForScope` which internally registers a callback (on the client scope) for `putSegmentEntry`, which returns the segment to the pool. It seems all very consistent. With this design, the client can use confined or shared segments and get the same performance, as clients are not expected to call "close" on each segment - but you work on a scope granularity instead (which the API encourages).
>
> So, if a client uses the SegmentAllocator API, everything should be safe, right? But you also expose a lower level API to, presumably, avoid the segment allocator. How much is gained by that?
The difference in performance between the low-level API and the allocator is small, around 5.6M vs 5.7M req/sec. However, this test allocated only a single segment.
> Thanks
> Maurizio
>
>
>
>
>> On 19/04/2021 10:08, Radosław Smogura wrote:
>> Hi Maurizio,
>>
>> I hope you have a good day.
>>
>> I’ve sent the allocator as a PR (mostly for review purposes).
>>
>> I think that even this one requires a lot of improvements. Maybe it’s a good idea to add a pool allocator bound to a scope, which can create a sub-allocator bound to a new scope?
>>
>> As for benchmarks, the socket benchmarks are a bit fragile, as results can differ slightly depending on the socket buffer.
>>
>> I ran a test with a file, and I get a consistent gain of around 15%, especially for small reads (16b) - on the link there’s a wiki with the benchmark results.
>>
>> The ArenaAllocator was very slow - it gave 50% of JNI I/O performance, and I think it can’t be used, because if I wrap the InputStream I can’t require the caller to open a scope.
>>
>>
>> Kind regards,
>> Rado
>>
>>>> On Apr 17, 2021, at 12:12 AM, Radosław Smogura <mail at smogura.eu> wrote:
>>>
>>> Hi Maurizio,
>>>
>>> I think I know what's happening there - however I would like to find this in some documentation.
>>>
>>> The link you shared shows that there's a call to __errno_location and then a move to/from the address in rax, i.e. the result of that call.
>>>
>>> But I ran objdump, and on my Linux errno is a symbol in the .tbss section (the thread-local one):
>>>
>>> [ /usr/include ]
>>> radek at radek-ubuntu # objdump -T /usr/lib/x86_64-linux-gnu/libc-2.32.so |grep errno
>>> 0000000000000010 g D .tbss 0000000000000004 GLIBC_PRIVATE errno
>>> 0000000000029030 g DF .text 0000000000000015 GLIBC_2.2.5 __errno_location
>>> 00000000001498d0 g DF .text 000000000000005f (GLIBC_2.2.5) clnt_sperrno
>>> 0000000000149930 g DF .text 0000000000000084 (GLIBC_2.2.5) clnt_perrno
>>> 000000000000006c g D .tbss 0000000000000004 GLIBC_PRIVATE __h_errno
>>> 0000000000129b60 g DF .text 0000000000000015 GLIBC_2.2.5 __h_errno_location
>>>
>>> D - dynamic or debugging symbol
>>> F - function
>>> g - global
>>>
>>> I don't know the internals of thread-locals and dynamic symbols (I would only guess the former is implemented by the kernel mapping a separate page for every thread), and I know linking can sometimes be more complicated (e.g. gcc has multiversioning - something like bootstrap methods).
>>>
>>> However, I've found this:
>>> https://lwn.net/Articles/5851/ (probably outdated)
>>>
>>> I think that only manipulation of the page map could make this work.
>>>
>>> It's definitely worth checking this on BSD and OS X, as well as checking what kind of approach should be used and how it can be determined at runtime, but it looks like __errno_location is what should be used.
>>>
>>> On the other hand, I'm working on improving the allocator, and I already see some good results; I'll send updates later.
>>>
>>> Kind regards,
>>> Rado
>>>
>>> ________________________________
>>> From: Maurizio Cimadamore <maurizio.cimadamore at oracle.com>
>>> Sent: Friday, April 16, 2021 16:30
>>> To: Radosław Smogura <mail at smogura.eu>
>>> Cc: panama-dev at openjdk.java.net <panama-dev at openjdk.java.net>
>>> Subject: Re: Implementation of IO with Panama
>>>
>>>
>>>> On 16/04/2021 15:10, Radosław Smogura wrote:
>>>> I could grab the address for it just by looking up the symbol from CLinker.
>>>>
>>>> But, AFAIK, errno is a thread-local variable, so maybe I should double check how this should be handled.
>>>
>>> Interesting reading from man page:
>>>
>>> ```
>>> errno is defined by the ISO C standard to be a modifiable lvalue of
>>> type int, and must not be explicitly declared; errno may be a
>>> macro. errno is thread-local; setting it in one thread does not
>>> affect its value in any other thread.
>>> ```
>>>
>>> this is subtle - it's an lvalue (e.g. it can be used for assignment) but
>>> is not explicitly declared - meaning it's not a variable in the proper
>>> sense.
>>>
>>> At this point I'm not sure that linking to it gives the expected results.
>>>
>>> In fact, firing up Godbolt shows that gcc uses __errno_location under
>>> the hood:
>>>
>>> https://godbolt.org/z/rcMPTKxva
>>>
>>> Maurizio
>>>
>>>