[foreign-abi] RFR: JDK-8243669: Improve library loading for Panama libraries
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Mon May 11 10:11:15 UTC 2020
On 09/05/2020 03:16, Samuel Audet wrote:
> On 5/6/20 8:36 AM, Maurizio Cimadamore wrote:
>> On 05/05/2020 23:56, Samuel Audet wrote:
>>> I'm just trying to drive the point home that we need some sort of
>>> solution. GPUs, FPGAs, DSPs, and other accelerators in general are
>>> not going to become magically irrelevant simply because OpenJDK does
>>> not consider them important! They are important, they are here to
>>> stay, and their importance is only going to continue to grow.
>>
>> We are aware of that, and nobody has really mentioned that said
>> devices are not considered as important (and I think you should
>> really stop making absurd claims without any evidence to back them
>> up). I think the
>
> I'm sorry if I'm making absurd claims about information that you're
> not making available publicly :) It would be nice to get a roadmap of
> some sort, even if it's just to mention: "Hey, we're actually not
> ignoring these things!"
>
>> memory access API makes it fairly easy to create an ad-hoc memory
>> segment backed by e.g. GPU memory - I've demonstrated how easy it is
>> to wire things up and create your own memory sources:
>>
>> https://gist.github.com/mcimadamore/128ee904157bb6c729a10596e69edffd
>>
>> Now, replace mmap/munmap with cudaMalloc/cudaFree and you will have a
>> MemorySegment that can be used to model GPU memory. All the lifecycle
>> aspects of "traditional", off-heap memory segments can in fact
>> translate onto this ad-hoc segment, so that its use can be made safe.
>
> That looks like a good starting point, yes. Are you saying that this is
> intended to be a public API that end users can use to replace
> mmap/munmap with not only cudaMalloc/cudaFree but whatever they might
> wish?
That's the spirit, yes. We have to figure out how to make this piece of
"more unsafe API" coexist with the rest of the API, but that's the
direction.
>
> Let's assume this is going to be all public. The next thing that
> worries me is about simultaneous access from multiple threads. We have
> no such restrictions in C++, so that is bound to cause issues down the
> road. Does OpenJDK intend to force this onto the Java community in a
> similar fashion to JPMS? Or are you open for debate on this, and other
> points?
The above method already allows you to create unconfined segments. We
are also exploring, in parallel, ways to make these restrictions either
disappear completely (by using some sort of GC-based handshake), or be
less intrusive (by using a broader definition of confinement which spans
not a single thread, but multiple, logically related, threads).
>
>> Of course the memory access API is a building block - together with
>> ABI support (another building block) it allows you to model and
>> manipulate memory sources (of all kinds, provided you have some
>> native library to interact with it); if you are looking for a
>> high-end CUDA-like GPU library port written in Java, Panama simply
>> isn't the place to look for it. But it should be possible (and
>> hopefully easier) to build one given the tools we're building.
>
> Right, that's how I see it, but your lack of reply to my query about
> the intended usability of these APIs here concerns me:
> https://github.com/bytedeco/javacpp/issues/391#issuecomment-623030899
>
I didn't see that comment. In general you can attach whatever index
pre-processing capability you want with MemoryHandles.filterCoordinates.
Once you have a function that maps a logical index (or a tuple of
indices) to an index into the basic memory segment, you can insert that
function as a filter on the coordinate, and you will get back a var
handle which features the desired access coordinates, with the right
behavior.
In your example the filtering function could be something like this
(taken from your example):
    @Override
    public long index(long i, long j, long k) {
        return (offsets[0] + hyperslabStrides[0] * (i / blocks[0]) + (i % blocks[0])) * strides[0]
             + (offsets[1] + hyperslabStrides[1] * (j / blocks[1]) + (j % blocks[1])) * strides[1]
             + (offsets[2] + hyperslabStrides[2] * (k / blocks[2]) + (k % blocks[2])) * strides[2];
    }
So, assuming you have a plain indexed var handle whose only coordinate
is a `long` (the offset of the element in the segment to be addressed),
if you attach a method handle wrapping the above method to such a var
handle, you will get back a var handle that takes three longs. In other
words, you will go from

    VarHandle(MemoryAddress, long)

to

    VarHandle(MemoryAddress, long, long, long)

where, on each access, the above function will be computed, yielding a
long index value which can then be used to access the underlying memory
region.
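
The adaptation described above can be sketched with plain method handles.
This is only an illustration, not the Panama API itself: it uses the
standard java.lang.invoke.MethodHandles.collectArguments as a stand-in
for the var handle combinator, a long[] as a stand-in for the memory
segment, and a hypothetical dense 2 x 3 x 4 layout with unit blocks. The
class name IndexAdaptSketch and the helper methods get, adapted and read
are made up for this example.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class IndexAdaptSketch {
    // Hypothetical layout: a dense, row-major 2 x 3 x 4 array with unit blocks.
    static final long[] offsets = {0, 0, 0};
    static final long[] hyperslabStrides = {1, 1, 1};
    static final long[] blocks = {1, 1, 1};
    static final long[] strides = {12, 4, 1};

    // Same formula as the index function quoted in the message.
    static long index(long i, long j, long k) {
        return (offsets[0] + hyperslabStrides[0] * (i / blocks[0]) + (i % blocks[0])) * strides[0]
             + (offsets[1] + hyperslabStrides[1] * (j / blocks[1]) + (j % blocks[1])) * strides[1]
             + (offsets[2] + hyperslabStrides[2] * (k / blocks[2]) + (k % blocks[2])) * strides[2];
    }

    // Plain one-coordinate accessor: (storage, flat offset) -> element.
    static long get(long[] data, long offset) {
        return data[Math.toIntExact(offset)];
    }

    // Replaces the single long coordinate of 'get' with the three
    // coordinates consumed by 'index', giving
    // (long[], long, long, long) -> long.
    static MethodHandle adapted() throws ReflectiveOperationException {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodHandle get = lookup.findStatic(IndexAdaptSketch.class, "get",
                MethodType.methodType(long.class, long[].class, long.class));
        MethodHandle index = lookup.findStatic(IndexAdaptSketch.class, "index",
                MethodType.methodType(long.class, long.class, long.class, long.class));
        return MethodHandles.collectArguments(get, 1, index);
    }

    // Convenience wrapper that hides the checked Throwable of invoke.
    static long read(long[] data, long i, long j, long k) {
        try {
            return (long) adapted().invoke(data, i, j, k);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

With this layout the index function reduces to 12*i + 4*j + k, so
read(data, 1, 2, 3) reads flat element 23. The index function runs on
every access, as in the var handle case.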
Maurizio
> Samuel