[foreign] RFR 8218772: Limit struct member pointers to size of the field

Fri Feb 15 12:28:45 UTC 2019

Actually... I realized that checkBounds was only being used by 
checkRange, so at this point we could just remove it if we wanted.

Webrev: 
http://cr.openjdk.java.net/~jvernee/panama/webrevs/nocheck/webrev.00/

Jorn

Maurizio Cimadamore schreef op 2019-02-15 02:10:
> Pretty solid patch - thanks for the explanations (perhaps at some
> point it could be worth renaming checkBounds/checkRange to something
> else).
> 
> Go for it
> 
> Maurizio
> 
> On 15/02/2019 00:41, Jorn Vernee wrote:
>> Maurizio Cimadamore schreef op 2019-02-14 22:52:
>>> On 14/02/2019 21:23, Jorn Vernee wrote:
>>> 
>>>> Yes, this is a good idea.
>>>> 
>>>> Update webrev:
>>>> 
>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8218772/webrev.02/
>>>> 
>>>> 
>>>> Struct::assign and Array::assign were pretty similar, so I factored
>>>> out the code into a Pointer::assign method. I also implemented
>>>> equals/hashCode/toString for LayoutTypeImpl for that, which seem
>>>> useful to have any ways. (I can imagine there are scenarios where a
>>>> user would want to have a Map<LayoutType, ...> for instance).
>>> 
>>> Not uber convinced about Pointer::assign, mostly because we already
>>> have a Pointer::copy - we should only have the latter, and perhaps
>>> that's the right name for the new method?
>> 
>> Sounds good. I've renamed 'assign' to 'copy'.
>> 
>>> Also, not greatly convinced by the need of the Array::ptr method -
>>> maybe hide that into the impl class?
>> 
>> Yeah, I'll hide that in the impl class. I originally added it to Array 
>> to avoid casting in the implementation, and I thought maybe if the 
>> method is useful for us, it's useful to users as well. But I guess 
>> it's a little too confusing to have both ptr() and elementPointer().
>> 
>>>> The new checking also uncovered a bug where allocation of a 0 length
>>>> array would return Pointer.nullPointer(), but this does not have the
>>>> right LayoutType. Besides, there's already a special case for 0
>>>> length regions by returning MemoryBoundInfo.NOTHING. I just removed
>>>> the nullPointer() return, so we end up with a pointer with the right
>>>> LayoutType, and perhaps more importantly, the right Scope.
>>>> 
>>>> However, this again caused a crash in ScopeTest, so I've also
>>>> updated the bounds checking code to remove the special checkBounds
>>>> case for accessing NOTHING at offset 0, and added a special case to
>>>> checkRange w.r.t. empty ranges, which fixes the issue.
>>> 
>>> Could you expand a bit on exactly who was calling checkRange with
>>> offset and length set to 0? I guess it was Pointer::addr?
>> 
>> I was seeing a segmentation fault with the following stack trace:
>> 
>> j  jdk.internal.foreign.memory.BoundedPointer.unsafeGetBits()J+111 
>> java.base at 13-internal
>> j  jdk.internal.foreign.memory.BoundedPointer.getBits()J+14 
>> java.base at 13-internal
>> j 
>> jdk.internal.foreign.memory.References$OfByte.getByte(Ljava/foreign/memory/Pointer;)B+4 
>> java.base at 13-internal
>> j 
>> java.lang.invoke.DirectMethodHandle$Holder.invokeStatic(Ljava/lang/Object;Ljava/lang/Object;)I+10 
>> java.base at 13-internal
>> j 
>> java.lang.invoke.LambdaForm$MH.invoke(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;+15 
>> java.base at 13-internal
>> j 
>> java.lang.invoke.LambdaForm$MH.invoke_MT(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;+18 
>> java.base at 13-internal
>> j  java.foreign.memory.Array.get(J)Ljava/lang/Object;+23 
>> java.base at 13-internal
>> j 
>> ScopeTest.assertEmptyArray(Ljava/foreign/memory/Array;Ljava/lang/Object;)V+26
>> j  ScopeTest.testNullAllocation()V+44
>> 
>> This is because checkBounds explicitly allows access at offset = 0 and 
>> regionLength = 0 (bad imho). I think this wasn't previously a problem 
>> since NOTHING was only every used with References.OfGrumpy, which 
>> explicitly throws an exception when trying to access.
>> 
>>> The changes in MemoryBoundInfo look odd - for instance:
>>> 
>>> if (length == 0) {
>>> +            if (offset < 0 || offset > this.length) {
>>> +                throw new IllegalStateException("offset: " + offset 
>>> +
>>> ", region length: " + this.length);
>>> +            }
>>> +        }
>>> 
>>> This  stuff inside the braces is 99% the same as checkBounds(offset),
>>> except for a '<' instead of '<='.
>>> 
>>> Wouldn't it be better to special case checkBounds to deal with zero
>>> length gracefully, and then just call it from checkRange?
>> 
>> The problem is that we have 2 requirements, 1.) bounds check for 
>> accessing a byte in a memory region. 2.) checking if one region is a 
>> sub-region of another. For 1. the offset can not be equal to the 
>> length of the region, because then we are reading outside of the 
>> region, but for 2. this is fine, as long as the sub-region has a 
>> length of 0.
>> 
>> checkBounds works like an array index check, so it works well for 1., 
>> but not for 2. It seemed better to handle 2. in checkRange, since we 
>> actually know the length of the range being accessed there, which 
>> could be 0. I really don't think checkBounds is the right place to try 
>> and fix this problem. Tbh, it seems like a mistake that checkRange 
>> relies on checkBounds in the first place.
>> 
>> I agree that the current code looks kind of cryptic. I've tried to 
>> improve this by removing the dependency on checkBounds from 
>> checkRange.
>> 
>>> Also, and probably even better - if this stuff is there just to 
>>> handle
>>> NOTHING - wouldn't we better off overriding the check methods for the
>>> NOTHING instance (as we do for everything) ? I'm not a fan of these
>>> hard-coded special cases, which will be nearly impossible to read 2
>>> months from now.
>> 
>> Actually, I think I spoke too soon. I think the actual use case was to 
>> allow passing a null pointer to a native function, since the unboxing 
>> incurs a bounds check. The re-write of checkRange takes care of that 
>> as well.
>> 
>> Update webrev: 
>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8218772/webrev.03/
>> 
>> (FWIW, re-writing the bounds check seems to have improved perf as 
>> well. My test runs are finishing about a minute faster).
>> 
>> Jorn
>> 
>>> Cheers
>>> Maurizio
>>> 
>>>> Jorn
>>>> 
>>>> Maurizio Cimadamore schreef op 2019-02-14 16:19:
>>>> On 14/02/2019 15:18, Maurizio Cimadamore wrote:
>>>> Why not using the Array.assign API point, which already performs all
>>>> these checks?
>>>> 
>>>> Similarly, I think References.OfStruct::set should use Struct.assign
>>>> -
>>>> as there's always a chance of mis-using the setter MH to write a
>>>> struct that is bigger or smaller than the expected one.
>>>> 
>>>> Maurizio
>>>> 
>>>> Maurizio
>>>> 
>>>> On 14/02/2019 14:31, Jorn Vernee wrote:
>>>> Okay, I have implemented the check in References.OfArray::set.
>>>> 
>>>> Updated webrev:
>>>> 
>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8218772/webrev.01/
>>>> 
>>>> 
>>>> I was also tinkering a long time with simplifying the original code,
>>>> but kept running into issues with bounds checks in MemoryBoundInfo.
>>>> The current bounds checking code isn't really suited for dealing
>>>> with 0 length range checks.
>>>> 
>>>> Jorn
>>>> 
>>>> Jorn Vernee schreef op 2019-02-13 19:49:
>>>> Jorn Vernee schreef op 2019-02-13 19:44:
>>>> Henry Jen schreef op 2019-02-13 19:19:
>>>> On Feb 13, 2019, at 9:42 AM, Jorn Vernee <jbvernee at xs4all.nl> wrote:
>>>> 
>>>> 
>>>> I don't think this bug is just a symptom of the bulk copy of Arrays.
>>>> Note that you can also cause an overwrite when you have a struct
>>>> with 2 ints, take a pointer to the first one, cast it to a long
>>>> pointer, and write to it.
>>>> 
>>>> This is perfectly OK. The working theory is that as long as you
>>>> operated in the allocated memory region, you should be able to cast
>>>> as
>>>> needed.
>>>> 
>>>> In you array case earlier, that’s not OK as the type of array has
>>>> a
>>>> length, so I would expect an exception. But if you cast the pointer
>>>> to
>>>> an Array of 3, then it should be able to overwrite the int. After
>>>> all,
>>>> [2i32]i32 and [3i32] and i32i32i32 are the same size of memory
>>>> block.
>>> 
>>> The problem is that there is no way to know the size of the target
>>> array type once we're in References.OfArray::set, since the target
>>> pointer might not actually have that type. The earliest location 
>>> where
>>> 
>>> we do know this size is in RuntimeSupport, when retrieving the 
>>> pointer
>>> 
>>> to the field.
>>> 
>>> Actually, maybe we do know the type... I'm wondering about use of
>>> unsafe set operations in boxing code. I will just try it out and see.
>>> 
>>> Jorn
>>> 
>>>> Jorn
>>>> 
>>>> Cheers,
>>>> Henry
>>>> 
>>>> It simply seems incorrect to me that a pointer to a struct's field
>>>> provides access outside of that field's memory.
>>>> 
>>>> ---
>>>> 
>>>> FWIW, I think the Array abstraction is a useful one. It signals the
>>>> difference between having a chunk of memory (Array) vs. having just
>>>> a cursor into memory (Pointer), and as such, I think it's fine to
>>>> say that writing an Array constitutes a copy of the array vs. just a
>>>> copy of the pointer. If a bulk-copy is not wanted, users can use the
>>>> elementPointer() instead (same with Struct::ptr).
>>>> 
>>>> The native types do not map perfectly into Java types, so there are
>>>> some things that have to be learned when using the API. I think the
>>>> pitfall here is the assumption that, since in C an array is just a
>>>> pointer in a lot of cases, and setting a pointer does not incur a
>>>> copy, the Array type in Java must also really be just a Pointer, and
>>>> setting it should not incur a copy. But in the API we have 2
>>>> distinct types, Pointer and Array, so I don't think it's
>>>> unreasonable to say that those 2 will behave differently.
>>>> 
>>>> Jorn
>>>> 
>>>> Maurizio Cimadamore schreef op 2019-02-13 17:57:
>>>> Hi Jorn,
>>>> I was looking at something very related to this - e.g. relationship
>>>> between pointers and arrays, and, in general bulk-write operations
>>>> for
>>>> structs and arrays (with Pointer::set) and I thought I might add
>>>> something to this discussion, to see if the issues that you are
>>>> running into are just bugs, or symptoms of something deeper.
>>>> Over the last few weeks I've been toying with the idea of merging
>>>> Array and Pointers - after all a BoundedPointer is expressive enough
>>>> 
>>>> to represent both. We could e.g. setup an approach like the one
>>>> below:
>>>> - Pointer has array-like accessors Pointer::get(long),
>>>> Pointer::set(long, X)
>>>> - regular accessor defaults to zero offset - that is, Pointer::get()
>>>> 
>>>> -> Pointer::get(0)
>>>> - jextract does NOT generate setters for array struct fields, or
>>>> array
>>>> global variables - only getters
>>>> - an API is provided (well, one exists already) to do bulk copy
>>>> between different arrays/structs
>>>> While I like the unification this brings about (only one abstraction
>>>> 
>>>> instead of two, Pointer and Array), and I also like the fact that we
>>>> 
>>>> move towards a model where Pointer::get, Pointer::set are O(1)
>>>> operations (with bulk operations left to higher level APIs), there
>>>> is
>>>> something that doesn't 100% convinces me: if we go down this path,
>>>> the
>>>> following layouts:
>>>> u64:u64:i32 (pointer to pointer to int)
>>>> and
>>>> u64:[5 i32] (pointer to array of int)
>>>> will effectively have the same carrier type:
>>>> Pointer<Pointer<Integer>>
>>>> The difference will be visible only upon closer inspection: we could
>>>> 
>>>> call Pointer::type() obtain a LayoutType, then do
>>>> LayoutType::layout()
>>>> and see whether the pointee layout is a sequence or an address.
>>>> Now, when we perform a get() on such a pointer, we can, given the
>>>> layout, construct the right pointer with the right size info (if
>>>> any).
>>>> But what about set() ? I see different options here, none of which
>>>> seems particularly satisfying:
>>>> 1) Pointer::set should throw if you are trying to set an array-like
>>>> pointer (which would require bulk copy)
>>>> 2) Pointer::set will silently perform bulk copy of the incoming
>>>> pointer into the pointed region
>>>> Of these, (2) is similar to what we have now, whereas (1) would be a
>>>> 
>>>> stricter variant.
>>>> The thing that puzzles me is that, looking at the code it will be
>>>> absolutely impossible to understand what's going on - e.g. if native
>>>> 
>>>> arrays and native pointers map to the same Java carrier (Pointer) it
>>>> 
>>>> then becomes super hard to explain/understand/predict why a given
>>>> operation (e.g. Pointer::set) resulted in bulk copy/exception.
>>>> One of the principles I've been trying to adhere to when designing
>>>> the
>>>> foreign API is to avoid surprises - e.g. the semantics of a given
>>>> operation should be clear from the carrier types involved in the
>>>> operation. In that respect, maybe a pointer with an optional
>>>> length()
>>>> is tolerable, but having Pointer::set behaving differently depending
>>>> 
>>>> on the result of Pointer::type() is that acceptable or too subtle?
>>>> This seems to suggest that the current Pointer vs. Array split
>>>> carries
>>>> some weight. If carrier type are different then it's easy to see for
>>>> 
>>>> the user which operation might or might not be supported (or might
>>>> have bulk-logic).
>>>> <sidebar>
>>>> On the other hand, one might argue that this is already happening
>>>> with
>>>> pointers to incomplete objects - if I have a Pointer<?> and I call
>>>> get() I don't know if I'll get an exception or not. It again depends
>>>> 
>>>> on whether the layout complete or not - e.g. if it's the void layout
>>>> 
>>>> or, if it's partial layout (because it refers to an unresolved
>>>> struct)
>>>> an exception will be thrown.
>>>> </sidebar>
>>>> If we do retain Pointer vs. Array then I think we are free to decide
>>>> 
>>>> whether for Pointer::set we want (1) or (2) - that is, when we have
>>>> Pointer<Array<Integer>>
>>>> or
>>>> Pointer<StructFoo>
>>>> should Pointer::set do bulk copy, or should it throw? If the latter,
>>>> 
>>>> should jextract even emit setters for struct fields of type
>>>> array/struct? (or, perhaps, emulate such setters by performing
>>>> explicit copy, rather than by using Pointer::set internally).
>>>> These are all questions that are relevant, I believe, to the fix you
>>>> 
>>>> brought up - I'm of course fine with the fix, but I'd like to also
>>>> understand whether the bulk-copy on Pointer::set is one magic trick
>>>> too far and also, more generally, what do Panama-land feels about
>>>> the
>>>> Array vs. Pointer split/lump.
>>>> Feedback welcome.
>>>> Cheers
>>>> Maurizio
>>>> On 13/02/2019 15:47, Jorn Vernee wrote:
>>>> Hi,
>>>> I found a bug where it was possible to overwrite trailing fields of
>>>> a struct by writing an oversized array to a previous array field
>>>> (see bug). Overwriting is also possible in other cases by forcing an
>>>> oversized write to a struct field. This can be fixed (relatively
>>>> easily) by limiting the size of memory regions of pointers to struct
>>>> members to the size of their fields.
>>>> Please review the following.
>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8218772
>>>> Webrev:
>>>> 
>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8218772/webrev.00/
>>>> Thanks,
>>>> Jorn