[foreign] RFR 8218772: Limit struct member pointers to size of the field

Wed Feb 13 18:44:56 UTC 2019

Henry Jen schreef op 2019-02-13 19:19:
> On Feb 13, 2019, at 9:42 AM, Jorn Vernee <jbvernee at xs4all.nl> wrote:
>> 
>> I don't think this bug is just a symptom of the bulk copy of Arrays. 
>> Note that you can also cause an overwrite when you have a struct with 
>> 2 ints, take a pointer to the first one, cast it to a long pointer, 
>> and write to it.
>> 
> 
> This is perfectly OK. The working theory is that as long as you
> operated in the allocated memory region, you should be able to cast as
> needed.
> 
> In you array case earlier, that’s not OK as the type of array has a
> length, so I would expect an exception. But if you cast the pointer to
> an Array of 3, then it should be able to overwrite the int. After all,
> [2i32]i32 and [3i32] and i32i32i32 are the same size of memory block.

The problem is that there is no way to know the size of the target array 
type once we're in References.OfArray::set, since the target pointer 
might not actually have that type. The earliest location where we do 
know this size is in RuntimeSupport, when retrieving the pointer to the 
field.

Jorn

> Cheers,
> Henry
> 
> 
>> It simply seems incorrect to me that a pointer to a struct's field 
>> provides access outside of that field's memory.
>> 
>> ---
>> 
>> FWIW, I think the Array abstraction is a useful one. It signals the 
>> difference between having a chunk of memory (Array) vs. having just a 
>> cursor into memory (Pointer), and as such, I think it's fine to say 
>> that writing an Array constitutes a copy of the array vs. just a copy 
>> of the pointer. If a bulk-copy is not wanted, users can use the 
>> elementPointer() instead (same with Struct::ptr).
>> 
>> The native types do not map perfectly into Java types, so there are 
>> some things that have to be learned when using the API. I think the 
>> pitfall here is the assumption that, since in C an array is just a 
>> pointer in a lot of cases, and setting a pointer does not incur a 
>> copy, the Array type in Java must also really be just a Pointer, and 
>> setting it should not incur a copy. But in the API we have 2 distinct 
>> types, Pointer and Array, so I don't think it's unreasonable to say 
>> that those 2 will behave differently.
>> 
>> Jorn
>> 
>> Maurizio Cimadamore schreef op 2019-02-13 17:57:
>>> Hi Jorn,
>>> I was looking at something very related to this - e.g. relationship
>>> between pointers and arrays, and, in general bulk-write operations 
>>> for
>>> structs and arrays (with Pointer::set) and I thought I might add
>>> something to this discussion, to see if the issues that you are
>>> running into are just bugs, or symptoms of something deeper.
>>> Over the last few weeks I've been toying with the idea of merging
>>> Array and Pointers - after all a BoundedPointer is expressive enough
>>> to represent both. We could e.g. setup an approach like the one 
>>> below:
>>> - Pointer has array-like accessors Pointer::get(long), 
>>> Pointer::set(long, X)
>>> - regular accessor defaults to zero offset - that is, Pointer::get()
>>> -> Pointer::get(0)
>>> - jextract does NOT generate setters for array struct fields, or 
>>> array
>>> global variables - only getters
>>> - an API is provided (well, one exists already) to do bulk copy
>>> between different arrays/structs
>>> While I like the unification this brings about (only one abstraction
>>> instead of two, Pointer and Array), and I also like the fact that we
>>> move towards a model where Pointer::get, Pointer::set are O(1)
>>> operations (with bulk operations left to higher level APIs), there is
>>> something that doesn't 100% convinces me: if we go down this path, 
>>> the
>>> following layouts:
>>> u64:u64:i32 (pointer to pointer to int)
>>> and
>>> u64:[5 i32] (pointer to array of int)
>>> will effectively have the same carrier type:
>>> Pointer<Pointer<Integer>>
>>> The difference will be visible only upon closer inspection: we could
>>> call Pointer::type() obtain a LayoutType, then do 
>>> LayoutType::layout()
>>> and see whether the pointee layout is a sequence or an address.
>>> Now, when we perform a get() on such a pointer, we can, given the
>>> layout, construct the right pointer with the right size info (if 
>>> any).
>>> But what about set() ? I see different options here, none of which
>>> seems particularly satisfying:
>>> 1) Pointer::set should throw if you are trying to set an array-like
>>> pointer (which would require bulk copy)
>>> 2) Pointer::set will silently perform bulk copy of the incoming
>>> pointer into the pointed region
>>> Of these, (2) is similar to what we have now, whereas (1) would be a
>>> stricter variant.
>>> The thing that puzzles me is that, looking at the code it will be
>>> absolutely impossible to understand what's going on - e.g. if native
>>> arrays and native pointers map to the same Java carrier (Pointer) it
>>> then becomes super hard to explain/understand/predict why a given
>>> operation (e.g. Pointer::set) resulted in bulk copy/exception.
>>> One of the principles I've been trying to adhere to when designing 
>>> the
>>> foreign API is to avoid surprises - e.g. the semantics of a given
>>> operation should be clear from the carrier types involved in the
>>> operation. In that respect, maybe a pointer with an optional length()
>>> is tolerable, but having Pointer::set behaving differently depending
>>> on the result of Pointer::type() is that acceptable or too subtle?
>>> This seems to suggest that the current Pointer vs. Array split 
>>> carries
>>> some weight. If carrier type are different then it's easy to see for
>>> the user which operation might or might not be supported (or might
>>> have bulk-logic).
>>> <sidebar>
>>> On the other hand, one might argue that this is already happening 
>>> with
>>> pointers to incomplete objects - if I have a Pointer<?> and I call
>>> get() I don't know if I'll get an exception or not. It again depends
>>> on whether the layout complete or not - e.g. if it's the void layout
>>> or, if it's partial layout (because it refers to an unresolved 
>>> struct)
>>> an exception will be thrown.
>>> </sidebar>
>>> If we do retain Pointer vs. Array then I think we are free to decide
>>> whether for Pointer::set we want (1) or (2) - that is, when we have
>>> Pointer<Array<Integer>>
>>> or
>>> Pointer<StructFoo>
>>> should Pointer::set do bulk copy, or should it throw? If the latter,
>>> should jextract even emit setters for struct fields of type
>>> array/struct? (or, perhaps, emulate such setters by performing
>>> explicit copy, rather than by using Pointer::set internally).
>>> These are all questions that are relevant, I believe, to the fix you
>>> brought up - I'm of course fine with the fix, but I'd like to also
>>> understand whether the bulk-copy on Pointer::set is one magic trick
>>> too far and also, more generally, what do Panama-land feels about the
>>> Array vs. Pointer split/lump.
>>> Feedback welcome.
>>> Cheers
>>> Maurizio
>>> On 13/02/2019 15:47, Jorn Vernee wrote:
>>>> Hi,
>>>> I found a bug where it was possible to overwrite trailing fields of 
>>>> a struct by writing an oversized array to a previous array field 
>>>> (see bug). Overwriting is also possible in other cases by forcing an 
>>>> oversized write to a struct field. This can be fixed (relatively 
>>>> easily) by limiting the size of memory regions of pointers to struct 
>>>> members to the size of their fields.
>>>> Please review the following.
>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8218772
>>>> Webrev: 
>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8218772/webrev.00/
>>>> Thanks,
>>>> Jorn