[foreign-memaccess] RFR: JDK-8241772: MemorySegment should offer a spliterator

Fri Mar 27 21:45:11 UTC 2020

>> Sidebar: for those of you worried that the lack of acquire() will 
>> make native interop harder, we are planning to
>> _remove_ confinement restrictions from addresses obtained from native 
>> libraries - meaning that if you get a
>> MemoryAddress from a native library, you will be able to share it 
>> with any thread you want, and such threads will be
>> able to dereference the address w/o any need to work around the 
>> confinement guarantees.
>
>
> Is there going to be a way to tell if it's restricted? What about 
> MemoryAddress(s) that just *happens* to be in memory but not returned 
> by a function? They're basically the same in that they require 
> ForeignUnsafe(formally) in order to be usable, so I guess they are the 
> same?

The rules are going to be applied uniformly for addresses obtained from 
a foreign call as well as to addresses read from main memory (e.g. using 
a VarHandle).

There will be a way to tell if an address is restricted or not; one way 
to do that would be for a native address _not_ to have a corresponding 
segment (e.g. MemoryAddress::segment returns null). Another way would 
be to have some 'primordial' native segment you can check against.
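
To make the first option concrete, here is a minimal sketch of what the 
check could look like from the client side. SketchSegment and 
SketchAddress are hypothetical stand-ins invented for this example, not 
the actual MemorySegment/MemoryAddress classes, and the final API may 
well look different:

```java
/** Hypothetical stand-in for a memory segment: a base address plus a size. */
final class SketchSegment {
    final long base;
    final long byteSize;

    SketchSegment(long base, long byteSize) {
        this.base = base;
        this.byteSize = byteSize;
    }
}

/** Hypothetical stand-in for a memory address (NOT the real MemoryAddress). */
final class SketchAddress {
    private final long value;            // absolute address
    private final SketchSegment segment; // null when nothing is known about the address

    private SketchAddress(long value, SketchSegment segment) {
        this.value = value;
        this.segment = segment;
    }

    /** An address obtained from a native call or read from memory: no segment attached. */
    static SketchAddress ofNative(long rawValue) {
        return new SketchAddress(rawValue, null);
    }

    /** An address derived from a segment created by Java code. */
    static SketchAddress inSegment(SketchSegment segment, long offset) {
        if (offset < 0 || offset > segment.byteSize) {
            throw new IndexOutOfBoundsException("offset=" + offset);
        }
        return new SketchAddress(segment.base + offset, segment);
    }

    long value() {
        return value;
    }

    /** Returns the owning segment, or null for a native address. */
    SketchSegment segment() {
        return segment;
    }

    /** True when bounds (and, in the real API, confinement) checks apply. */
    boolean hasSegment() {
        return segment != null;
    }
}
```

With something like this, a client can branch on hasSegment() (or on 
segment() == null) to tell an address it got from Java code apart from 
one that came back from a native library.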

>
>
> Going along with the theme of choice, it would be nice if this was an 
> opt-in thing and not forced. There is a lot of variance being 
> introduced with this and it might not be clear what's what, especially 
> at larger scale.
There will be ways to add back restrictions, if people want them (I 
forgot to mention that). So, if you happen to know the bounds of an 
address, there will be an API to add those back. While it's 
theoretically possible to also add an owning thread, and all that, I 
don't think developers should try too hard to make up segments out of 
thin air. Memory segments are entities that arise from Java code, where 
everything is known, so it makes sense to speak about spatial, temporal 
and confinement bounds. An address you read from memory, or that you get 
from a native library, doesn't really have much info attached. The common 
case will be to attach some form of bounds check to the address; I 
believe in other cases the user is probably trying to use the 
MemorySegment API for something more ad hoc. For instance, it is 
theoretically possible to inject a custom cleanup function into a 
segment, so that you could call a certain function to clean up the 
memory backing a certain address. But here we're walking a fine line - 
if you have an abstraction with complex state which requires cleanup, it 
is perhaps more advisable to write your own abstraction, make it 
AutoCloseable and have it wrap a memory address, rather than trying to 
inject all relevant logic into a memory segment.
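
As a rough illustration of that last suggestion, here is a minimal 
sketch of such a wrapper. Everything in it is an assumption made up for 
the example: the NativeResource name, the use of a plain long for the 
address, and the Runnable standing in for whatever free function the 
native library exposes. It performs no real dereference; it only shows 
where the bounds check, the liveness check and the cleanup would live:

```java
import java.util.Objects;

/** Hypothetical wrapper around a raw native address; not part of any real API. */
final class NativeResource implements AutoCloseable {
    private final long address;     // raw address obtained from a native library
    private final long byteSize;    // bounds the caller happens to know
    private final Runnable cleanup; // e.g. a call to the library's own free function
    private boolean closed;

    NativeResource(long address, long byteSize, Runnable cleanup) {
        if (byteSize < 0) {
            throw new IllegalArgumentException("negative size");
        }
        this.address = address;
        this.byteSize = byteSize;
        this.cleanup = Objects.requireNonNull(cleanup);
    }

    /** Returns the absolute address of an access, after liveness and bounds checks. */
    long addressAt(long offset, long length) {
        if (closed) {
            throw new IllegalStateException("already closed");
        }
        if (offset < 0 || length < 0 || offset > byteSize - length) {
            throw new IndexOutOfBoundsException("offset=" + offset + ", length=" + length);
        }
        return address + offset;
    }

    /** Runs the cleanup action at most once. */
    @Override
    public void close() {
        if (!closed) {
            closed = true;
            cleanup.run();
        }
    }
}
```

Used in a try-with-resources block, the cleanup runs deterministically 
when the block exits, and the complex state stays in the client's own 
class rather than in the segment.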
>
>
>>
>> Moving forward, there are other things we need to do after this patch 
>> is done - as mentioned, providing a GC-backed
>> escape hatch would probably be a good addition; but we also have to 
>> add more ways to share a segment with other threads
>> in serial-confinement mode (e.g. one thread at a time); the main 
>> operation we're looking at is a 'handoff' operation
>> which replaces the owner thread with a different one (useful in 
>> producer/consumer use cases) and, possibly, also a
>> detach/attach pair of operations which can be used to temporarily 
>> remove ownership from a segment, and then have it
>> picked up by a second thread - this is effectively similar to 
>> handoff, but where the two threads don't know each
>> other.
>>
>> In addition to that, I'm planning to add a bunch of helper methods to 
>> SequenceLayout to help with reshaping and
>> flattening sequence layouts; this operation is very useful when a 
>> client needs/wants to define a spliterator which
>> works on multiple elements at a time (see the ParallelSum benchmark 
>> for an example). I'll be filing a PR for this
>> separately.
>
>
> Maybe it's just me, but it feels like Panama is taking on too much 
> responsibility here. I feel like thread management should be solely 
> the responsibility of the API user as there are just too many 
> alternate paths when managing a MemorySegment.

If we could have memory segments that offer both safety and 
multi-threading capabilities (e.g. so that any thread can use them) we 
would have done that long ago, and I'd be very happy not to do anything 
in that space and keep it minimal. But if we want the ability to release 
memory deterministically - and I think we do (direct buffers really 
don't work here), then we have to constrain the space a bit and 
introduce confinement restrictions. The alternative is not to have 
restrictions and to put heavy locks everywhere which will make segment 
code ~100x slower than byte buffers, at which point why even bother?

So, as much as I'd like not to look at the confinement problem, I'm 
afraid we have to stare it in the face, and come up with ways to make 
confinement less... confining :-) Some of the moves I anticipated above 
go in that direction.
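
To give a feel for the trade-off, here is a minimal sketch of a 
thread-confined resource with a handoff operation. ConfinedCell is a 
hypothetical model written for this mail, not the MemorySegment 
implementation: an int[] stands in for off-heap memory, and the names 
are made up. The point is that every access only needs a cheap 
owner-thread comparison instead of a lock, and that handoff kills the 
old handle and returns a new one owned by another thread:

```java
import java.util.Objects;

/** Hypothetical model of a thread-confined resource; not the real MemorySegment. */
final class ConfinedCell {
    private final Thread owner;
    private final int[] storage;  // stands in for the off-heap memory
    private boolean alive = true; // only ever read or written by the owner thread

    private ConfinedCell(Thread owner, int[] storage) {
        this.owner = owner;
        this.storage = storage;
    }

    /** Creates a cell confined to the calling thread. */
    static ConfinedCell allocate(int length) {
        return new ConfinedCell(Thread.currentThread(), new int[length]);
    }

    // The owner check comes first, so a foreign thread never touches 'alive'.
    private void checkAccess() {
        if (Thread.currentThread() != owner) {
            throw new IllegalStateException("accessed outside owner thread");
        }
        if (!alive) {
            throw new IllegalStateException("no longer alive");
        }
    }

    int get(int index) {
        checkAccess();
        return storage[index];
    }

    void set(int index, int value) {
        checkAccess();
        storage[index] = value;
    }

    /**
     * Kills this handle and returns a new one owned by newOwner. The caller must
     * publish the result safely (e.g. before Thread.start, or through a queue).
     */
    ConfinedCell handoff(Thread newOwner) {
        checkAccess();
        alive = false;
        return new ConfinedCell(Objects.requireNonNull(newOwner), storage);
    }

    /** Deterministic release: after this, every access through this handle fails. */
    void close() {
        checkAccess();
        alive = false;
    }
}
```

In a producer/consumer setup the producer would fill the cell, call 
handoff(consumerThread) and pass the returned handle along; from that 
point on the producer's own handle is dead, so at most one thread can 
touch the memory at any time.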
>
> On the flip side, it's often frustrating when APIs don't provide hooks 
> for these kinds of things, which then require more (potentially buggy) 
> code to get working... but even so, I don't know, just feels like a 
> lot more "on the plate" for Panama's API, if you will, that might 
> confuse API users who don't know the grainy details.
I think it is up to us to explain to users what this API is for and 
what the sweet spot is. It will never be as simple an API as a 
byte[]. But I also think that a lot of people will be able to survive 
w/o knowing the gory details of how you can move a segment from one 
thread owner to another. These are, in a way, advanced operations that 
only a few people will really need. But, for instance, "handoff" is not 
something that can easily be implemented outside, so it's a primitive we 
need to provide.
>
>
> It's probably safe to say that people don't read every last 
> letter of documentation before they use things, only when they run 
> into issues. At that point things could already be a confusing giant 
> web of MemoryAddress(s) which may be hard to debug.
>
>
>> Cheers
>> Maurizio
>>
>> -------------
>>
>> Commit messages:
>>   - Fix white spaces
>>   - * Fix javadoc
>>   - More fixes
>>   - Remove comments on TestSpliterator
>>   - Fix threshold in ParallelSum benchmark
>>   - Fix ParallelSum benchmark to always use default FJP
>>   - Fix semantics of tryAdvance
>>   - Add MemorySegment::spliterator
>>
>> Changes: https://git.openjdk.java.net/panama-foreign/pull/71/files
>>   Webrev: https://webrevs.openjdk.java.net/panama-foreign/71/webrev.00
>>    Issue: https://bugs.openjdk.java.net/browse/JDK-JDK-8241772
>>    Stats: 700 lines in 12 files changed: 618 ins; 29 del; 53 mod
>>    Patch: https://git.openjdk.java.net/panama-foreign/pull/71.diff
>>    Fetch: git fetch https://git.openjdk.java.net/panama-foreign 
>> pull/71/head:pull/71
>>
>> PR: https://git.openjdk.java.net/panama-foreign/pull/71