[foreign-abi] minimizing the surface of restricted native access

Ty Young youngty1997 at gmail.com
Mon Dec 2 14:59:40 UTC 2019


On 12/2/19 7:47 AM, Maurizio Cimadamore wrote:
>
> On 02/12/2019 13:36, Ty Young wrote:
>>
>> On 12/2/19 2:43 AM, Maurizio Cimadamore wrote:
>>> I think eventually, we will do that, but your idea of _everything_ 
>>> and ours (outlined in the writeup I've sent) seem to differ. Moving 
>>> forward, we will create a new foreign-jextract branch which will be 
>>> based off foreign-abi and add the minimal jextract/API bits. After 
>>> we get that branch in good shape (e.g. we can support all the use 
>>> cases/examples we can support now) we will gradually phase out the 
>>> 'foreign' branch (and hence also the 'foreign-linkToNative' branch).
>>
>>
>>
>>>
>>> The problem is that public APIs already _have defined_ abstractions 
>>> such as Pointer a dozen times (see my writeup) and with rather 
>>> different characteristics too. 
>>
>>
>> None of that is exactly new in the Java API. 
>> Collections.unmodifiableList(), for example, changes (by interception) 
>> the expected behavior of writing (adding) to the List 
>> implementation it returns.
>
> I take your point - but the approach you suggest doesn't really scale 
> as there are several axes around which the memory segment API could be 
> split - to name a few:
>
> - is this a native segment, a heap segment, an array segment or a 
> memory mapped segment?
>
> - is this a read-only segment? Or read-write? (or, not supported for 
> now, no-read/no-write)
>
> - is this a 'trusted' segment or not (this is the axis we're talking 
> about here)
>
> There are probably others I'm omitting now - plus new ones which might 
> be required moving forward (e.g. need to writeback, allows unaligned 
> access, ...)
>
> Since we can't possibly have a new API type for each new kind of 
> segment, lumping them together feels like quite a forced move, to be honest.


Completely understand.


>
> I also think that you need to keep in mind that, in many cases (look 
> at what Ioannis wrote) the binder generator will insert the "right" 
> conversions for you (where the meaning of "right" depends on the 
> context). So it is important that we focus on ways to provide the right 
> set of primitives for binder generators to do that (e.g. is there a 
> way for a binder generator to write a binding for strcat? The NOTHING 
> + rebase model addresses that use case quite well). So, for the 
> purpose of this discussion, let's not assume that the user doing the 
> 'rebase' will have to necessarily be the end user, because I do not 
> expect that to be the case.


It's the lack of API transparency that I take issue with. If you were to 
read what you quoted from strcat docs as a random Panama API user:



 > The strcat() function appends the src string to the dest string,
 > overwriting the terminating null byte ('\0') at the end of dest, and
 > then adds a terminating null byte. The strings may not overlap, and
 > the dest string must have enough space for the result. [...]
 > The strcat() [...] function return a pointer to the resulting string
 > dest.


You wouldn't think anything was special about the returned MemoryAddress 
until later when you surprisingly find out that it's a Nothing segment.


(You'd think people would read the documentation but... let's be honest, 
they won't.)


A solution (as I suggested earlier) is MemoryAddressDescriptor. All it 
does is act as an information provider (address, size, read/write state, 
heap type, etc.), and nothing more. This way there is no confusion as to 
what is being worked with. The only way to perform actions on the 
described MemoryAddress is to obtain it via the from() method.
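
A minimal sketch of what that could look like follows. Everything in it 
is hypothetical: MemoryAddressDescriptor does not exist in any Panama 
branch, and the MemoryAddress class here is a simplified stand-in 
written for this example, not the real API type. The field set 
(address, size, read-only flag) is only an assumption about what such a 
descriptor might carry.

```java
// Hypothetical sketch: these types are illustrative stand-ins,
// not part of the Panama API.

/** Immutable description of an address handed back by native code. */
final class MemoryAddressDescriptor {
    private final long address;     // raw native address
    private final long size;        // size in bytes, as reported by the binding
    private final boolean readOnly; // whether writes should be rejected

    MemoryAddressDescriptor(long address, long size, boolean readOnly) {
        this.address = address;
        this.size = size;
        this.readOnly = readOnly;
    }

    // Pure accessors: the descriptor only provides information.
    long address() { return address; }
    long size() { return size; }
    boolean isReadOnly() { return readOnly; }
}

/** Stand-in for a usable address (NOT the real Panama MemoryAddress). */
final class MemoryAddress {
    private final long address;
    private final long size;
    private final boolean readOnly;

    // Private constructor: from() is the only way to obtain an instance.
    private MemoryAddress(long address, long size, boolean readOnly) {
        this.address = address;
        this.size = size;
        this.readOnly = readOnly;
    }

    /** Explicit, visible conversion from a descriptor to a usable address. */
    static MemoryAddress from(MemoryAddressDescriptor desc) {
        return new MemoryAddress(desc.address(), desc.size(), desc.isReadOnly());
    }

    long address() { return address; }
    long size() { return size; }
    boolean isReadOnly() { return readOnly; }
}
```

With this shape, a binding for strcat would return a 
MemoryAddressDescriptor, and the caller would have to write 
MemoryAddress.from(desc) before doing anything with the memory, so the 
conversion step shows up in the method signature rather than only in 
the documentation.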


>
> Maurizio
>
>
>>
>>
>> Not that it's a good thing and should be repeated (IMO it's bad design 
>> and shouldn't be) - but the problem with the "Nothing" segment 
>> approach is that it violates the expected behavior of a function call 
>> in a way. Your example of strcat is the perfect example of this: the 
>> input "dest" buffer is expected to be the output but it isn't, it's 
>> in actuality the "Nothing" segment in disguise. Yes, you can use the 
>> rebase() method to restore it, but that isn't exactly transparent 
>> because it (presumably) uses the same interface but with a completely 
>> different implementation.
>>
>>
>> IMO, it would be better if a completely different object was returned 
>> instead - sidestepping the "Nothing" segment usage altogether in such 
>> situations.
>>
>>
>> Presumably the method call in Java's eyes looks like:
>>
>>
>> public MemoryAddress strcat(MemoryAddress dest, MemoryAddress src)
>>
>>
>> Maybe instead it should be something like:
>>
>>
>> public MemoryAddressDescriptor strcat(MemoryAddress dest, 
>> MemoryAddress src)
>>
>>
>> where MemoryAddressDescriptor is an object which contains descriptive 
>> data about the returned Pointer. You would then be able to get the 
>> actual MemoryAddress object using:
>>
>>
>> public static MemoryAddress from(MemoryAddressDescriptor addrDesc)
>>
>>
>> Of note is that this sort of thing is already done for enums from C 
>> where they are translated to int values in the jextract java.foreign 
>> API.
>>
>>
>> A bonus here is that MemoryAddressDescriptor could be a Record/Value 
>> Type since it was created in C and will never change.
>>
>>
>> Really, the problem IMO is just the lack of transparency. It's 
>> Collections.unmodifiableList() all over again.
>>
>>
>>> As for structs - adding a common super-interface doesn't really add 
>>> all that value - at the end of the day, jextract needed to generate 
>>> the concrete struct interface for each particular struct that was 
>>> encountered in the parsed headers. The bigger problem is that 
>>> creating bindings for a native library is a complex, creative 
>>> process with some bits of mechanical work in it. With the current 
>>> jextract we tried to automate it all - and this resulted in some API 
>>> choices being made for you by the tool; if you want/need to back out 
>>> from some of these choices (and many existing bindings will need to 
>>> do that), you need an infinite set of control knobs to tell jextract 
>>> what to do - and this doesn't scale.
>>
>>
>> Would be interesting to see when you would need to do that. I haven't 
>> for what I'm using jextract for, personally.
>>

