Ability to extend a MemorySegment
Samuel Audet
samuel.audet at gmail.com
Fri Jan 24 09:02:53 UTC 2020
On 1/23/20 8:33 PM, Maurizio Cimadamore wrote:
>
> On 23/01/2020 01:29, Samuel Audet wrote:
>> Hi, Maurizio,
>>
>> On 1/22/20 9:33 PM, Maurizio Cimadamore wrote:
>>> That said, I don't see anything in that list which points to
>>> something different from what is being discussed here... Panama has
>>> always been about inventing new technologies which would allow Java
>>> programs to speak about foreign functions *and* foreign data. Most
>>> people in this mailing list are concerned with the former, and they
>>> can't see (or decide not to see) the connection with the latter.
>>
>> Since you mention this, could I pick your brain further on the point
>> about "foreign data"? For data manipulation, Arrow is all the rage
>> these days, so I think that what Panama comes up with needs to be
>> useful for this kind of framework. Arrow basically provides an
>> efficient in-memory columnar representation of data (so that we can
>> apply vector operations on it) that is meant to share data between
>> libraries without copying it around, and therefore we also need an
>> efficient way to convert data to and from row-wise representations
>> from files, databases, etc, like this:
>> https://github.com/bytedeco/javacpp-presets/tree/master/arrow#sample-usage
>>
>>
>> Have you given some thought to how Panama could help make this, or
>> anything else that Arrow offers for that matter, faster? That is,
>> excluding the overhead of native function calls? In my opinion, if
>> Panama supported inlining of inline native functions, that would
>> pretty much do all that we need for data as well, but maybe I'm
>> missing something...? Sure, it wouldn't be "safe", but as far as I
>> understand, we could add a safety layer on top of it, and in the end,
>> we would get the same thing.
>
> Speeding up native functions is one way to look at the problem - if you
> are writing a Java binding for Arrow, you need (as your examples and
> others show) many native calls to set up column builders and then to
> create tables, which can be expensive. One move in that direction would
> be to remove Java -> native state transitions (we will likely provide
> unsafe knobs to do that), which should help quite a lot. Other
> strategies, like 'programmable intrinsics', can be used (e.g. where a
> given Java method maps directly to a well known piece of assembly - e.g.
> an inline function) - we can probably come up with ways to get
> MethodHandles for these too (although that's not something we are
> actively exploring, Paul and I noted in the past that the MethodHandle
> trick we're using in SystemABI can also be used for things other than,
> say, a native function).
>
> All this said, I think from a Java perspective, the ultimate solution
> would be to code up what Arrow does in Java - so that e.g. the data it
> stores is allocated in segments, and is retrieved using handles. If you
> do that, then there's no more need for native calls (and C2 can optimize
> data motion pretty darn well). So one of the things that I'd be curious
> to see is if the memory access API will help some of these frameworks
> (not just Arrow, but I'm thinking also of things like Python ND-arrays)
> to be written completely in Java, and avoid native calls altogether. In
> principle, it's only a matter of coding...
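
To make the "data allocated in segments, retrieved using handles" idea
above concrete, here is a minimal sketch of a single off-heap int column
written purely in Java. It is only an illustration under assumptions: it
uses a direct ByteBuffer and a standard-library VarHandle as stand-ins
for a MemorySegment and its access handle, and IntColumn is a
hypothetical name, not an Arrow or Panama type.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical columnar container: one contiguous off-heap block of ints.
final class IntColumn {
    // Handle that reads/writes an int at a given byte offset of a ByteBuffer.
    private static final VarHandle INT_AT =
            MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.nativeOrder());

    private final ByteBuffer data; // off-heap storage (stand-in for a segment)
    private final int length;      // number of rows in the column

    IntColumn(int length) {
        this.data = ByteBuffer.allocateDirect(length * Integer.BYTES);
        this.length = length;
    }

    void set(int row, int value) {
        INT_AT.set(data, row * Integer.BYTES, value);
    }

    int get(int row) {
        return (int) INT_AT.get(data, row * Integer.BYTES);
    }

    // A simple column-wise operation with no native call involved.
    long sum() {
        long total = 0;
        for (int row = 0; row < length; row++) {
            total += get(row);
        }
        return total;
    }

    public static void main(String[] args) {
        IntColumn col = new IntColumn(4);
        for (int i = 0; i < 4; i++) {
            col.set(i, i * 10);
        }
        System.out.println(col.sum());
    }
}
```

The sketch keeps the column contiguous so that a loop like sum() is a
plain strided walk over memory; whether the JIT vectorizes it is not
something this example demonstrates.
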
Thanks for your thoughts!

Yes, we could rewrite everything in Java. That's one way to work around
the issues, but such an approach has serious limitations too. Java code
doesn't run on GPUs, for example, so we need C/C++ code to do work on
non-CPU devices anyway. Since we already need C/C++ code, why not reuse
it for the CPU as well? That's basically how GraalVM intends to solve
this. Also, let's not forget that Python code is creeping into
enterprise applications where typically only Java existed. We would need
a way to share those MemorySegments with CPython and still manage their
lifetimes correctly. Again, GraalVM is thinking about these issues, and
is trying to solve them by essentially providing a CPython-compatible
runtime that it controls. The more I think about it, the more it makes
sense, and the more I look at what people here are doing, the more
Panama appears to me as a sort of giant band-aid on C2 until we get
there with GraalVM.

In any case, that shouldn't prevent us from coming up with a
user-friendly high-level API to access C/C++ libraries, since GraalVM
will need one anyway. I understand that Panama is no longer planning to
work on that, which is fine, but it would have been nice if it had not
given the community false hopes about this for all these years.

Samuel
More information about the panama-dev mailing list