Ability to extend a MemorySegment

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Thu Jan 23 11:33:36 UTC 2020


On 23/01/2020 01:29, Samuel Audet wrote:
> Hi, Maurizio,
>
> On 1/22/20 9:33 PM, Maurizio Cimadamore wrote:
>> That said, I don't see anything in that list which points to 
>> something different as to what is being discussed here... Panama has 
>> always been about inventing new technologies which would allow Java 
>> programs to speak about foreign functions *and* foreign data. Most 
>> people in this mailing list are concerned with the former, and they 
>> can't see (or decide not to see) the connection with the latter.
>
> Since you mention this, could I pick your brains further on the point 
> about "foreign data"? For data manipulation, Arrow is all the rage 
> these days, so I think that what Panama comes up with needs to be 
> useful for this kind of framework. Arrow basically provides an 
> efficient in memory columnar representation of data (so that we can 
> apply vector operations on it) that is meant to share data between 
> libraries without copying it around, and therefore we also need an 
> efficient way to convert data to and from row-wise representations 
> from files, databases, etc, like this:
> https://github.com/bytedeco/javacpp-presets/tree/master/arrow#sample-usage 
>
>
> Have you given some thought to how Panama could help make this, or 
> anything else that Arrow offers for that matter, faster? That is, 
> excluding the overhead of native function calls? In my opinion, if 
> Panama supported inlining of inline native functions, that would 
> pretty much do all that we need for data as well, but maybe I'm 
> missing something...? Sure, it wouldn't be "safe", but as far as I 
> understand, we could add a safety layer on top of it, and in the end, 
> we would get the same thing.

Speeding up native functions is one way to look at the problem - if you 
are writing a Java binding for Arrow, you need (as your examples and 
others show) many native calls to set up column builders and then to 
create tables, which can be expensive. One move in that direction would 
be to remove Java -> native state transitions (we will likely provide 
unsafe knobs to do that), which should help quite a lot. Other 
strategies, like 'programmable intrinsics', can be used too (e.g. where 
a given Java method maps directly to a well-known piece of assembly, 
such as an inline function) - we can probably come up with ways to get 
MethodHandles for these too (although that's not something we are 
actively exploring, Paul and I noted in the past that the MethodHandle 
trick we're using in SystemABI can also be used for things other than, 
say, a native function).

All this said, I think from a Java perspective, the ultimate solution 
would be to code up what Arrow does in Java - so that e.g. the data it 
stores is allocated in segments, and is retrieved using handles. If you 
do that, then there's no more need for native calls (and C2 can optimize 
data motion pretty darn well). So one of the things that I'd be curious 
to see is if the memory access API will help some of these frameworks 
(not just Arrow, but I'm thinking also of things like Python ND-arrays) 
to be written completely in Java, and avoid native calls altogether. In 
principle, it's only a matter of coding...
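
To make the "segments plus handles" idea a bit more concrete, here is a 
minimal sketch of a fixed-width int column kept off-heap and accessed 
through a VarHandle, with no native calls involved. Note the 
assumptions: the IntColumn class and its methods are hypothetical names 
invented for illustration, and a direct ByteBuffer with 
MethodHandles.byteBufferViewVarHandle is used as a stand-in for a 
MemorySegment, since the segment API is still evolving. It is not how 
Arrow lays out its buffers (there is no validity bitmap, for instance).

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Objects;

/** Hypothetical off-heap column of 32-bit ints (illustration only). */
final class IntColumn {
    // Views a ByteBuffer as ints; coordinates are (ByteBuffer, byte offset).
    private static final VarHandle INT_HANDLE =
            MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.nativeOrder());

    private final ByteBuffer data; // stand-in for a memory segment
    private final int length;      // number of int elements

    IntColumn(int length) {
        this.length = length;
        // Direct buffers live outside the Java heap and start zero-filled.
        this.data = ByteBuffer.allocateDirect(length * Integer.BYTES);
    }

    /** Builds a column from values taken out of a row-wise source. */
    static IntColumn fromRows(int[] rowValues) {
        IntColumn column = new IntColumn(rowValues.length);
        for (int i = 0; i < rowValues.length; i++) {
            column.set(i, rowValues[i]);
        }
        return column;
    }

    void set(int index, int value) {
        Objects.checkIndex(index, length);
        INT_HANDLE.set(data, index * Integer.BYTES, value);
    }

    int get(int index) {
        Objects.checkIndex(index, length);
        return (int) INT_HANDLE.get(data, index * Integer.BYTES);
    }

    /** Simple column-wise reduction; a plain loop the JIT is free to optimize. */
    long sum() {
        long total = 0;
        for (int i = 0; i < length; i++) {
            total += (int) INT_HANDLE.get(data, i * Integer.BYTES);
        }
        return total;
    }
}
```

Something like IntColumn.fromRows(new int[] {1, 2, 3, 4}).sum() then 
stays entirely in Java, which is the kind of code where one would hope 
the JIT can do a good job.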

Maurizio


>
> Samuel

