[foreign] RFR 8218772: Limit struct member pointers to size of the field

Fri Feb 15 00:41:37 UTC 2019

Maurizio Cimadamore schreef op 2019-02-14 22:52:
> On 14/02/2019 21:23, Jorn Vernee wrote:
> 
>> Yes, this is a good idea.
>> 
>> Update webrev:
>> 
> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8218772/webrev.02/
>> 
>> 
>> Struct::assign and Array::assign were pretty similar, so I factored
>> out the code into a Pointer::assign method. I also implemented
>> equals/hashCode/toString for LayoutTypeImpl for that, which seem
>> useful to have any ways. (I can imagine there are scenarios where a
>> user would want to have a Map<LayoutType, ...> for instance).
> 
> Not uber convinced about Pointer::assign, mostly because we already
> have a Pointer::copy - we should only have the latter, and perhaps
> that's the right name for the new method?

Sounds good. I've renamed 'assign' to 'copy'.

> Also, not greatly convinced by the need of the Array::ptr method -
> maybe hide that into the impl class?

Yeah, I'll hide that in the impl class. I originally added it to Array 
to avoid casting in the implementation, and I thought maybe if the 
method is useful for us, it's useful to users as well. But I guess it's 
a little too confusing to have both ptr() and elementPointer().

>> The new checking also uncovered a bug where allocation of a 0 length
>> array would return Pointer.nullPointer(), but this does not have the
>> right LayoutType. Besides, there's already a special case for 0
>> length regions by returning MemoryBoundInfo.NOTHING. I just removed
>> the nullPointer() return, so we end up with a pointer with the right
>> LayoutType, and perhaps more importantly, the right Scope.
>> 
>> However, this again caused a crash in ScopeTest, so I've also
>> updated the bounds checking code to remove the special checkBounds
>> case for accessing NOTHING at offset 0, and added a special case to
>> checkRange w.r.t. empty ranges, which fixes the issue.
> 
> Could you expand a bit on exactly who was calling checkRange with
> offset and length set to 0? I guess it was Pointer::addr?

I was seeing a segmentation fault with the following stack trace:

j  jdk.internal.foreign.memory.BoundedPointer.unsafeGetBits()J+111 
java.base at 13-internal
j  jdk.internal.foreign.memory.BoundedPointer.getBits()J+14 
java.base at 13-internal
j  
jdk.internal.foreign.memory.References$OfByte.getByte(Ljava/foreign/memory/Pointer;)B+4 
java.base at 13-internal
j  
java.lang.invoke.DirectMethodHandle$Holder.invokeStatic(Ljava/lang/Object;Ljava/lang/Object;)I+10 
java.base at 13-internal
j  
java.lang.invoke.LambdaForm$MH.invoke(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;+15 
java.base at 13-internal
j  
java.lang.invoke.LambdaForm$MH.invoke_MT(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;+18 
java.base at 13-internal
j  java.foreign.memory.Array.get(J)Ljava/lang/Object;+23 
java.base at 13-internal
j  
ScopeTest.assertEmptyArray(Ljava/foreign/memory/Array;Ljava/lang/Object;)V+26
j  ScopeTest.testNullAllocation()V+44

This is because checkBounds explicitly allows access at offset = 0 and 
regionLength = 0 (bad imho). I think this wasn't previously a problem 
since NOTHING was only every used with References.OfGrumpy, which 
explicitly throws an exception when trying to access.

> The changes in MemoryBoundInfo look odd - for instance:
> 
> if (length == 0) {
> +            if (offset < 0 || offset > this.length) {
> +                throw new IllegalStateException("offset: " + offset +
> ", region length: " + this.length);
> +            }
> +        }
> 
> This  stuff inside the braces is 99% the same as checkBounds(offset),
> except for a '<' instead of '<='.
> 
> Wouldn't it be better to special case checkBounds to deal with zero
> length gracefully, and then just call it from checkRange?

The problem is that we have 2 requirements, 1.) bounds check for 
accessing a byte in a memory region. 2.) checking if one region is a 
sub-region of another. For 1. the offset can not be equal to the length 
of the region, because then we are reading outside of the region, but 
for 2. this is fine, as long as the sub-region has a length of 0.

checkBounds works like an array index check, so it works well for 1., 
but not for 2. It seemed better to handle 2. in checkRange, since we 
actually know the length of the range being accessed there, which could 
be 0. I really don't think checkBounds is the right place to try and fix 
this problem. Tbh, it seems like a mistake that checkRange relies on 
checkBounds in the first place.

I agree that the current code looks kind of cryptic. I've tried to 
improve this by removing the dependency on checkBounds from checkRange.

> Also, and probably even better - if this stuff is there just to handle
> NOTHING - wouldn't we better off overriding the check methods for the
> NOTHING instance (as we do for everything) ? I'm not a fan of these
> hard-coded special cases, which will be nearly impossible to read 2
> months from now.

Actually, I think I spoke too soon. I think the actual use case was to 
allow passing a null pointer to a native function, since the unboxing 
incurs a bounds check. The re-write of checkRange takes care of that as 
well.

Update webrev: 
http://cr.openjdk.java.net/~jvernee/panama/webrevs/8218772/webrev.03/

(FWIW, re-writing the bounds check seems to have improved perf as well. 
My test runs are finishing about a minute faster).

Jorn

> Cheers
> Maurizio
> 
>> Jorn
>> 
>> Maurizio Cimadamore schreef op 2019-02-14 16:19:
>> On 14/02/2019 15:18, Maurizio Cimadamore wrote:
>> Why not using the Array.assign API point, which already performs all
>> these checks?
>> 
>> Similarly, I think References.OfStruct::set should use Struct.assign
>> -
>> as there's always a chance of mis-using the setter MH to write a
>> struct that is bigger or smaller than the expected one.
>> 
>> Maurizio
>> 
>> Maurizio
>> 
>> On 14/02/2019 14:31, Jorn Vernee wrote:
>> Okay, I have implemented the check in References.OfArray::set.
>> 
>> Updated webrev:
>> 
> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8218772/webrev.01/
>> 
>> 
>> I was also tinkering a long time with simplifying the original code,
>> but kept running into issues with bounds checks in MemoryBoundInfo.
>> The current bounds checking code isn't really suited for dealing
>> with 0 length range checks.
>> 
>> Jorn
>> 
>> Jorn Vernee schreef op 2019-02-13 19:49:
>> Jorn Vernee schreef op 2019-02-13 19:44:
>> Henry Jen schreef op 2019-02-13 19:19:
>> On Feb 13, 2019, at 9:42 AM, Jorn Vernee <jbvernee at xs4all.nl> wrote:
>> 
>> 
>> I don't think this bug is just a symptom of the bulk copy of Arrays.
>> Note that you can also cause an overwrite when you have a struct
>> with 2 ints, take a pointer to the first one, cast it to a long
>> pointer, and write to it.
>> 
>> This is perfectly OK. The working theory is that as long as you
>> operated in the allocated memory region, you should be able to cast
>> as
>> needed.
>> 
>> In you array case earlier, that’s not OK as the type of array has
>> a
>> length, so I would expect an exception. But if you cast the pointer
>> to
>> an Array of 3, then it should be able to overwrite the int. After
>> all,
>> [2i32]i32 and [3i32] and i32i32i32 are the same size of memory
>> block.
> 
> The problem is that there is no way to know the size of the target
> array type once we're in References.OfArray::set, since the target
> pointer might not actually have that type. The earliest location where
> 
> we do know this size is in RuntimeSupport, when retrieving the pointer
> 
> to the field.
> 
> Actually, maybe we do know the type... I'm wondering about use of
> unsafe set operations in boxing code. I will just try it out and see.
> 
> Jorn
> 
>> Jorn
>> 
>> Cheers,
>> Henry
>> 
>> It simply seems incorrect to me that a pointer to a struct's field
>> provides access outside of that field's memory.
>> 
>> ---
>> 
>> FWIW, I think the Array abstraction is a useful one. It signals the
>> difference between having a chunk of memory (Array) vs. having just
>> a cursor into memory (Pointer), and as such, I think it's fine to
>> say that writing an Array constitutes a copy of the array vs. just a
>> copy of the pointer. If a bulk-copy is not wanted, users can use the
>> elementPointer() instead (same with Struct::ptr).
>> 
>> The native types do not map perfectly into Java types, so there are
>> some things that have to be learned when using the API. I think the
>> pitfall here is the assumption that, since in C an array is just a
>> pointer in a lot of cases, and setting a pointer does not incur a
>> copy, the Array type in Java must also really be just a Pointer, and
>> setting it should not incur a copy. But in the API we have 2
>> distinct types, Pointer and Array, so I don't think it's
>> unreasonable to say that those 2 will behave differently.
>> 
>> Jorn
>> 
>> Maurizio Cimadamore schreef op 2019-02-13 17:57:
>> Hi Jorn,
>> I was looking at something very related to this - e.g. relationship
>> between pointers and arrays, and, in general bulk-write operations
>> for
>> structs and arrays (with Pointer::set) and I thought I might add
>> something to this discussion, to see if the issues that you are
>> running into are just bugs, or symptoms of something deeper.
>> Over the last few weeks I've been toying with the idea of merging
>> Array and Pointers - after all a BoundedPointer is expressive enough
>> 
>> to represent both. We could e.g. setup an approach like the one
>> below:
>> - Pointer has array-like accessors Pointer::get(long),
>> Pointer::set(long, X)
>> - regular accessor defaults to zero offset - that is, Pointer::get()
>> 
>> -> Pointer::get(0)
>> - jextract does NOT generate setters for array struct fields, or
>> array
>> global variables - only getters
>> - an API is provided (well, one exists already) to do bulk copy
>> between different arrays/structs
>> While I like the unification this brings about (only one abstraction
>> 
>> instead of two, Pointer and Array), and I also like the fact that we
>> 
>> move towards a model where Pointer::get, Pointer::set are O(1)
>> operations (with bulk operations left to higher level APIs), there
>> is
>> something that doesn't 100% convinces me: if we go down this path,
>> the
>> following layouts:
>> u64:u64:i32 (pointer to pointer to int)
>> and
>> u64:[5 i32] (pointer to array of int)
>> will effectively have the same carrier type:
>> Pointer<Pointer<Integer>>
>> The difference will be visible only upon closer inspection: we could
>> 
>> call Pointer::type() obtain a LayoutType, then do
>> LayoutType::layout()
>> and see whether the pointee layout is a sequence or an address.
>> Now, when we perform a get() on such a pointer, we can, given the
>> layout, construct the right pointer with the right size info (if
>> any).
>> But what about set() ? I see different options here, none of which
>> seems particularly satisfying:
>> 1) Pointer::set should throw if you are trying to set an array-like
>> pointer (which would require bulk copy)
>> 2) Pointer::set will silently perform bulk copy of the incoming
>> pointer into the pointed region
>> Of these, (2) is similar to what we have now, whereas (1) would be a
>> 
>> stricter variant.
>> The thing that puzzles me is that, looking at the code it will be
>> absolutely impossible to understand what's going on - e.g. if native
>> 
>> arrays and native pointers map to the same Java carrier (Pointer) it
>> 
>> then becomes super hard to explain/understand/predict why a given
>> operation (e.g. Pointer::set) resulted in bulk copy/exception.
>> One of the principles I've been trying to adhere to when designing
>> the
>> foreign API is to avoid surprises - e.g. the semantics of a given
>> operation should be clear from the carrier types involved in the
>> operation. In that respect, maybe a pointer with an optional
>> length()
>> is tolerable, but having Pointer::set behaving differently depending
>> 
>> on the result of Pointer::type() is that acceptable or too subtle?
>> This seems to suggest that the current Pointer vs. Array split
>> carries
>> some weight. If carrier type are different then it's easy to see for
>> 
>> the user which operation might or might not be supported (or might
>> have bulk-logic).
>> <sidebar>
>> On the other hand, one might argue that this is already happening
>> with
>> pointers to incomplete objects - if I have a Pointer<?> and I call
>> get() I don't know if I'll get an exception or not. It again depends
>> 
>> on whether the layout complete or not - e.g. if it's the void layout
>> 
>> or, if it's partial layout (because it refers to an unresolved
>> struct)
>> an exception will be thrown.
>> </sidebar>
>> If we do retain Pointer vs. Array then I think we are free to decide
>> 
>> whether for Pointer::set we want (1) or (2) - that is, when we have
>> Pointer<Array<Integer>>
>> or
>> Pointer<StructFoo>
>> should Pointer::set do bulk copy, or should it throw? If the latter,
>> 
>> should jextract even emit setters for struct fields of type
>> array/struct? (or, perhaps, emulate such setters by performing
>> explicit copy, rather than by using Pointer::set internally).
>> These are all questions that are relevant, I believe, to the fix you
>> 
>> brought up - I'm of course fine with the fix, but I'd like to also
>> understand whether the bulk-copy on Pointer::set is one magic trick
>> too far and also, more generally, what do Panama-land feels about
>> the
>> Array vs. Pointer split/lump.
>> Feedback welcome.
>> Cheers
>> Maurizio
>> On 13/02/2019 15:47, Jorn Vernee wrote:
>> Hi,
>> I found a bug where it was possible to overwrite trailing fields of
>> a struct by writing an oversized array to a previous array field
>> (see bug). Overwriting is also possible in other cases by forcing an
>> oversized write to a struct field. This can be fixed (relatively
>> easily) by limiting the size of memory regions of pointers to struct
>> members to the size of their fields.
>> Please review the following.
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8218772
>> Webrev:
>> 
> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8218772/webrev.00/
>> Thanks,
>> Jorn