[foreign-abi] RFR: JDK-8243669: Improve library loading for Panama libraries

Mon May 11 10:11:15 UTC 2020

On 09/05/2020 03:16, Samuel Audet wrote:
> On 5/6/20 8:36 AM, Maurizio Cimadamore wrote:
>> On 05/05/2020 23:56, Samuel Audet wrote:
>>> I'm just trying to drive the point home that we need some sort of 
>>> solution. GPUs, FPGAs, DSPs, and other accelerators in general are 
>>> not going to become magically irrelevant simply because OpenJDK does 
>>> not consider them important! They are important, they are here to 
>>> stay, and their importance is only going to continue to grow.
>>
>> We are aware of that, and nobody has really mentioned that said 
>> devices are not considered as important (and I think you should 
>> really stop making absurd claims without any evidence to back them 
>> up). I think the 
>
> I'm sorry if I'm making absurd claims about information that you're 
> not making available publicly :) It would be nice to get a roadmap of 
> some sort, even if it's just to mention: "Hey, we're actually not 
> ignoring these things!"
>
>> memory access API makes it fairly easy to create an ad-hoc memory 
>> segment backed by e.g. GPU memory - I've demonstrated how easy it is 
>> to wire things up and create your own memory sources:
>>
>> https://gist.github.com/mcimadamore/128ee904157bb6c729a10596e69edffd
>>
>> Now, replace mmap/munmap with cudaMalloc/cudaFree and you will have a 
>> MemorySegment that can be used to model GPU memory. All the lifecycle 
>> aspects of "traditional", off-heap memory segments can in fact 
>> translate onto this ad-hoc segment, so that its use can be made safe.
>
> That looks like a good starting point, yes. Are you saying that this is 
> intended to be a public API that end users can use to replace 
> mmap/munmap with not only cudaMalloc/cudaFree but whatever they might 
> wish?
That's the spirit, yes. We have to figure out how to make this piece of 
"more unsafe API" coexist with the rest of the API, but that's the 
direction.
>
> Let's assume this is going to be all public. The next thing that 
> worries me is about simultaneous access from multiple threads. We have 
> no such restrictions in C++, so that is bound to cause issues down the 
> road. Does OpenJDK intend to force this onto the Java community in a 
> similar fashion to JPMS? Or are you open for debate on this, and other 
> points?
The above method already allows you to create unconfined segments. We 
are also working hard (in parallel) on ways to make these restrictions 
either disappear completely (by using some sort of GC-based handshake), 
or be less intrusive (by using a broader definition of confinement which 
spans not a single thread, but multiple, logically related, threads).
>
>> Of course the memory access API is a building block - together with 
>> ABI support (another building block) it allows you to model and 
>> manipulate memory sources (of all kinds, provided you have some 
>> native library to interact with it); if you are looking for a 
>> high-end Cuda-like GPU library port written in Java, Panama simply 
>> isn't the place to look for it. But it should be possible (and 
>> hopefully easier) to build one given the tools we're building.
>
> Right, that's how I see it, but your lack of reply to my query about 
> the intended usability of these APIs here concerns me:
> https://github.com/bytedeco/javacpp/issues/391#issuecomment-623030899
>
I didn't see that comment. In general you can attach whatever index 
pre-processing capability you want with MemoryHandles.filterCoordinates. 
Once you have a function that goes from a logical index (or a tuple of 
indices) to an index into the basic memory segment, you can insert that 
function as a filter of the coordinate - and you will get back a var 
handle which features the desired access coordinates, with the right 
behavior.

In your example the filtering function could be something like this 
(taken from your example):

     @Override
     public long index(long i, long j, long k) {
         return (offsets[0] + hyperslabStrides[0] * (i / blocks[0]) + (i % blocks[0])) * strides[0]
                 + (offsets[1] + hyperslabStrides[1] * (j / blocks[1]) + (j % blocks[1])) * strides[1]
                 + (offsets[2] + hyperslabStrides[2] * (k / blocks[2]) + (k % blocks[2])) * strides[2];
     }

So, assuming you have a plain indexed var handle whose only coordinate 
is a `long` (the offset of the element in the segment to be addressed), 
if you attach a method handle wrapping the above method to such a var 
handle, you will get back a var handle that takes three longs - in other 
words you will go from

VarHandle(MemoryAddress, long)

to

VarHandle(MemoryAddress, long, long, long)

where, on each access, the above function will be computed, yielding a 
long index value which can then be used to access the underlying memory 
region.
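
As a rough illustration of the same composition, here is a minimal sketch 
that sticks to the stable java.lang.invoke API rather than the incubating 
MemoryHandles class, whose methods have changed between releases. It is 
not the Panama API itself: a double[] stands in for the memory segment, 
the index function returns an int because array var handles take an int 
coordinate, and the function is attached with MethodHandles.collectArguments 
on the getter obtained from VarHandle.toMethodHandle. The class name 
HyperslabIndexSketch, the read helper and all the hyperslab parameter 
values are assumptions made up for the example.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.VarHandle;

final class HyperslabIndexSketch {
    // Hypothetical parameters: a 2x3x8 row-major array in which only every
    // second element of the innermost dimension is selected.
    static final long[] offsets = {0, 0, 0};
    static final long[] hyperslabStrides = {1, 1, 2};
    static final long[] blocks = {1, 1, 1};
    static final long[] strides = {24, 8, 1};

    // Same formula as in the mail, narrowed to int for array indexing.
    static int index(long i, long j, long k) {
        long flat =
              (offsets[0] + hyperslabStrides[0] * (i / blocks[0]) + (i % blocks[0])) * strides[0]
            + (offsets[1] + hyperslabStrides[1] * (j / blocks[1]) + (j % blocks[1])) * strides[1]
            + (offsets[2] + hyperslabStrides[2] * (k / blocks[2]) + (k % blocks[2])) * strides[2];
        return Math.toIntExact(flat);
    }

    // Getter of type (double[], long, long, long)double.
    static final MethodHandle GET;

    static {
        try {
            VarHandle elems = MethodHandles.arrayElementVarHandle(double[].class);
            // Plain getter of type (double[], int)double.
            MethodHandle plainGet = elems.toMethodHandle(VarHandle.AccessMode.GET);
            MethodHandle indexFn = MethodHandles.lookup().findStatic(
                HyperslabIndexSketch.class, "index",
                MethodType.methodType(int.class, long.class, long.class, long.class));
            // Replace the single int coordinate with the three longs taken by index().
            GET = MethodHandles.collectArguments(plainGet, 1, indexFn);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Convenience wrapper so callers do not have to deal with Throwable.
    static double read(double[] data, long i, long j, long k) {
        try {
            return (double) GET.invokeExact(data, i, j, k);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        double[] data = new double[48];
        for (int n = 0; n < data.length; n++) {
            data[n] = n;
        }
        // Logical element (1, 2, 3) maps to flat index 24 + 16 + 6 = 46.
        System.out.println(read(data, 1, 2, 3));
    }
}
```

The index function runs on every access, so the caller only ever sees the 
three logical coordinates; a setter could be built the same way from 
VarHandle.AccessMode.SET.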

Maurizio

> Samuel