From david.r.chase at oracle.com  Thu Nov  6 21:58:45 2014
From: david.r.chase at oracle.com (David Chase)
Date: Thu, 6 Nov 2014 16:58:45 -0500
Subject: Reference SUB/SEP Value question
In-Reply-To: 
References: <81818BB2-8C2D-4910-A83F-D03BB7B0AB49@oracle.com>
Message-ID: <4BF7F006-2846-4AC1-BA5E-DAB3DABECBF2@oracle.com>

Here's a first stab at clarifying the problem.
We might need to stab it some more before we are done.

>> 3) I spent some time experimenting with the relationship between a
>> so-called "Ref" type and a so-called
>> "Value" type, thought I had some answers, but after whacking on the
>
> I don't quite understand 3). Would it help to post some javadoc/test code
> snippets to illustrate? If you haven't got this handy, we can scrounge
> something out of our prototypes.

I think there are two choices, refSUBvalue and refSEPvalue.

For SUB, you might have a hierarchy that looks like this:

  interface ComplexValue
  class Complex implements ComplexValue
  interface ComplexRef extends ComplexValue, Ref
  class ComplexGeneratedProxy implements ComplexRef

You can do this because a Reference can support all the
methods of Value by fetching the relevant fields etc., plus
it has some more methods for setting those fields.

For SEP, you might have a hierarchy that looks like this:

  interface ComplexValue
  class Complex implements ComplexValue

  interface ComplexRef extends Ref {
    ComplexValue get(); // canonical connection between Ref and Value
  }
  class ComplexGeneratedProxy implements ComplexRef

The main difference is that in the refSUBvalue world, every ref is implicitly
also a value, and that might be convenient, but I think it has two problems:
it tricks people into shooting themselves in the foot with aliasing, and
ComplexValue is likely to be an interface with at least two implementations,
hence not falling-off-a-log easy to optimize with ClassHierarchyAnalysis.
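To make the SUB/SEP distinction concrete, here is a minimal Java sketch of both hierarchies. It is an illustration under stated assumptions, not a proposed API: the accessor names (re, im, setRe, setIm), the single size() method on Ref, and the use of a heap ByteBuffer as a stand-in for native memory are all hypothetical, and the SUB-side Complex class is omitted because the sketch does not need it.

```java
import java.nio.ByteBuffer;

// Hypothetical base type. In the thread, Ref supplies behaviour common to all
// layouts (address, size, align); only size() is modelled here.
interface Ref {
    long size();
}

// refSUBvalue: ComplexRef extends ComplexValue, so every ref is also a value.
final class Sub {
    interface ComplexValue {
        double re();
        double im();
    }

    interface ComplexRef extends ComplexValue, Ref {
        void setRe(double v);
        void setIm(double v);
    }

    // Stand-in for ComplexGeneratedProxy; a heap ByteBuffer plays the role of native memory.
    static final class ComplexGeneratedProxy implements ComplexRef {
        private final ByteBuffer mem;
        private final int base;

        ComplexGeneratedProxy(ByteBuffer mem, int base) { this.mem = mem; this.base = base; }

        public double re() { return mem.getDouble(base); }      // reads memory on every call
        public double im() { return mem.getDouble(base + 8); }
        public void setRe(double v) { mem.putDouble(base, v); }
        public void setIm(double v) { mem.putDouble(base + 8, v); }
        public long size() { return 16; }
    }
}

// refSEPvalue: ComplexRef is not a ComplexValue; get() is the canonical bridge.
final class Sep {
    interface ComplexValue {
        double re();
        double im();
    }

    // Immutable heap value produced by dereferencing a ref.
    static final class Complex implements ComplexValue {
        private final double re;
        private final double im;

        Complex(double re, double im) { this.re = re; this.im = im; }

        public double re() { return re; }
        public double im() { return im; }
    }

    interface ComplexRef extends Ref {
        ComplexValue get();   // canonical connection between Ref and Value
        double re();          // "implicit dereference" getters reusing the Value names
        double im();
        void setRe(double v);
        void setIm(double v);
    }

    static final class ComplexGeneratedProxy implements ComplexRef {
        private final ByteBuffer mem;
        private final int base;

        ComplexGeneratedProxy(ByteBuffer mem, int base) { this.mem = mem; this.base = base; }

        // Copies the fields into a new heap object: the "second memory allocation".
        public ComplexValue get() { return new Complex(re(), im()); }
        public double re() { return mem.getDouble(base); }
        public double im() { return mem.getDouble(base + 8); }
        public void setRe(double v) { mem.putDouble(base, v); }
        public void setIm(double v) { mem.putDouble(base + 8, v); }
        public long size() { return 16; }
    }
}
```

The difference shows up as soon as someone holds on to a value: in SUB, a ComplexValue obtained from a ref silently changes when the ref is written through (the aliasing hazard), while in SEP, get() copies the fields into a fresh Complex, so the snapshot is unaffected by later writes.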
In the world where they are separate, the native programmer will be forced to
notice the difference between refs and values, but ComplexRef (in particular)
is very likely to be a single-implementation interface (the proxy class) and
thus will be easy to optimize. On the other hand, getting your hands on a value
will require a second memory allocation (but perhaps it will be a short-lived
allocation amenable to escape analysis). I think SEP also helps us avoid some
overloaded-method-selection surprises.

TBD are the exact choices for field names in the Ref and Value interfaces when
they are separate; I am inclined to give Refs getter methods that have the same
name and signature as those in the Value interface (hence, an implicit
dereference: is this inconsistency okay?).

Did I at least explain what I think the issues are?
I need to test some coding in both models (I've tested "sub") to see how they
compare.

David

From david.r.chase at oracle.com  Mon Nov 10 21:57:33 2014
From: david.r.chase at oracle.com (David Chase)
Date: Mon, 10 Nov 2014 16:57:33 -0500
Subject: More detail than I had intended on Layout description language.
Message-ID: <7F2A3A29-CDE1-4AC4-8B21-ACC7874B9A66@oracle.com>

On 2014-10-31, at 1:21 PM, Angela Lin wrote:

> Then there are these divergent rabbit holes to follow:
> i) the tiny language specification  <-- Let's call this the "layout
> descriptor language"?
> ii) the runtime system
>
> There might be a 3rd part that glues between the first two:
> iii) generated interfaces that are provided to the Java programmer

At this end, Henry Jen is working on a tool based on libclang that takes header
files and creates appropriate Java interfaces. I think that the Java-level
inputs are (1) those interfaces, combined with (2) a layout descriptor (written
in LDL) that tells which interface methods are getters, which are setters, and
what the sizes/locations/alignments of the various fields are. I think the
output of the tool is still TBD.
In my experiments, I used a little string encoding, along the lines of
"foo, bar[11], baz:4,nyb:4", and I used naming conventions and typing
conventions from the supplied interface to infer the C types. I think this was
mostly wrong.

What I think now:

The language should not be one long string (but you could talk me back into
this; I am basing this purely on the difficulty of ruling out screwball string
inputs). I'm torn between the structure and non-ambiguity of defining a class
to describe fields and passing in an array, along the lines of (notice how this
is not Java or C, so it must not be source code...)

  NativeFieldDescriptor { // AKA NFD
    String field_accessor_name;
    ... stuff describing size/align that I am having trouble with ...
    long[] array_dimensions default=long[0] or maybe null
  }

The "stuff I am having trouble with" is difficult for two reasons, and I think
maybe Henry will have an opinion based on what comes out of this tool, or you
will have an opinion based on your experience.

Reason #1 is that I am trying to figure out where to set the "friendly to human
programmers" dial. I've become less and less worried about this, because of
reason #2 and related issues, and because we could surely define a "friendly
layer" if we needed one.

Reason #2 is that I'm trying to figure out how to deal gracefully with the
endianness issue on a platform that works very hard to hide such issues. It
seems to me that for a given "little language" input, the behavior of the Java
side (and the bytecodes that it generates) should not depend on the endianness
of the underlying platform.

The test case I've been playing with to try to figure out the right way to talk
about this is

  struct c {
    unsigned short x:1;
    unsigned short y:7;
  };

I think that the little language generated for that has to depend on the
endianness of the underlying platform, because it generates different results
on the Java side.
On a little-endian machine, storing 1 into both x and y would be something
along the lines of

  storeShort(address, ((1&1)<<0) + ((127 & 1)<<1)) // store 0x0003

but on a big-endian machine

  storeShort(address, ((1&1)<<15) + ((127&1)<<8)) // store 0x8100

So, since the Java-side behavior should be different, I think that the
little-language inputs (the output of the libclang-based tool) should be
different depending on platform; the offsets will need to be described in ways
that conform to Java's expectations.

One thing that might not be obvious is that the little language will need to
specify the size of the bit container into which bitfields are stored, because
the translation between pairs of byte offsets and short offsets differs on LE
and BE machines.

For example, the little-endian storeByte equivalent of

  storeShort(address, ((1&1)<<0) + ((127 & 1)<<1)) // store 0x0003

is

  storeByte(address, ((1&1)<<0) + ((127 & 1)<<1))   // store 0x__03
  storeByte(address+1, 0)                            // store 0x00__

but big-endian is

  storeByte(address, 0)                              // store 0x00__
  storeByte(address+1, ((1&1)<<0) + ((127 & 1)<<1)) // store 0x__03

So a simple offset-and-size formulation is not adequate; we need to know the
underlying container size for a given bit address, unless we blow everything
down to byte stores and then hack our compiler to reconsolidate adjacent
stores.

Again, input from Henry would be helpful here: this container-size information
may already be present because of how C deals with bitfields in general (a
char-typed bitfield may not span a byte boundary, but a short-typed bitfield
can, as long as that byte boundary is not also a short boundary). Another way to
think of container size is "alignment": a struct of short-typed bitfields will
have size and alignment 2 even if the bitfields all fit in a single byte.
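A small Java sketch of the struct c arithmetic above. It assumes the common ABI convention that little-endian targets allocate bitfields starting at the least significant bit of the container and big-endian targets starting at the most significant bit (C itself leaves bitfield allocation order implementation-defined); the class and method names are hypothetical. It packs x:1 and y:7 into a 16-bit container, then shows the byte image that container has in memory under each byte order.

```java
// Hypothetical helpers for struct c { unsigned short x:1; unsigned short y:7; }.
// Assumes the usual ABI bit-allocation order: LSB-first on little-endian targets,
// MSB-first on big-endian targets.
final class BitfieldPacking {
    private static final int X_MASK = 0x1;   // x is 1 bit wide
    private static final int Y_MASK = 0x7F;  // y is 7 bits wide

    // Little-endian: x occupies bit 0, y occupies bits 1..7.
    static short packLE(int x, int y) {
        return (short) (((x & X_MASK) << 0) | ((y & Y_MASK) << 1));
    }

    // Big-endian: x occupies bit 15, y occupies bits 14..8.
    static short packBE(int x, int y) {
        return (short) (((x & X_MASK) << 15) | ((y & Y_MASK) << 8));
    }

    // Bytes of a 16-bit container at address, address+1 on a little-endian machine.
    static byte[] bytesLE(short container) {
        return new byte[] { (byte) container, (byte) (container >>> 8) };
    }

    // Bytes of a 16-bit container at address, address+1 on a big-endian machine.
    static byte[] bytesBE(short container) {
        return new byte[] { (byte) (container >>> 8), (byte) container };
    }
}
```

Feeding the same 16-bit value 0x0003 through bytesLE and bytesBE gives {0x03, 0x00} versus {0x00, 0x03}, the two storeByte sequences in the text. That is the sense in which a bit offset alone is ambiguous: the byte that holds a given bit depends on the container size as well as the byte order.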
(I don't think we want to support full generality here, because we could end up
in places where we need to understand the underlying endianness; if we use byte
loads to obtain an int, where do we shuffle the respective bytes to?)

Is this wrong-headed? Do we like where the "same behavior on the Java side" rule
leads us?

------------------

Other opinions I have acquired recently, based on mistakes already made:

I think it is okay if we assume that the names in the LDL field descriptors
match methods defined in the corresponding interface in relatively obvious
ways. I think there are three accessors to consider, and maybe we should just
name them separately (and lack of a name means we don't want that access mode).
So

  NativeFieldDescriptor { // AKA NFD
    String field_getter_name;
    String field_setter_name;
    String field_reference_name;
    ... stuff describing size/align that I am having trouble with ...
    long[] array_dimensions default=long[0] or maybe null
  }

Getters and setters would probably default to returning something like a
"value", and reference generators would of course create proxy objects for
fields.

I suspect (given the direction that I've gone, specifying bitfield size, width,
and alignment) that signedness or unsignedness of integer fields should just be
a boolean in the NFD, and as long as the type fits into the interface-method
signature we use it with no error signaled. That still leaves the
integer-size-for-unsigned issue to be resolved, but moves it to the tool
generating LDL and interfaces; i.e., we just removed a little bit of policy from
the layout engine, and that might be fine.

David

From henry.jen at oracle.com  Tue Nov 11 01:35:09 2014
From: henry.jen at oracle.com (Henry Jen)
Date: Mon, 10 Nov 2014 17:35:09 -0800
Subject: More detail than I had intended on Layout description language.
In-Reply-To: <7F2A3A29-CDE1-4AC4-8B21-ACC7874B9A66@oracle.com>
References: <7F2A3A29-CDE1-4AC4-8B21-ACC7874B9A66@oracle.com>
Message-ID: <546167CD.8000403@oracle.com>

On 11/10/2014 01:57 PM, David Chase wrote:
>
> On 2014-10-31, at 1:21 PM, Angela Lin wrote:
>> Then there are these divergent rabbit holes to follow:
>> i) the tiny language specification  <-- Let's call this the "layout
>> descriptor language"?
>> ii) the runtime system
>>
>> There might be a 3rd part that glues between the first two:
>> iii) generated interfaces that are provided to the Java programmer
>
> The test case I've been playing with to try to figure out the right way to talk about
> this is
>
>   struct c {
>     unsigned short x:1;
>     unsigned short y:7;
>   };
>
> I think that the little language generated for that has to depend on the endianness of
> the underlying platform, because it generates different results on the Java side. On a
> little-endian machine, storing 1 into both x and y would be something along the lines of
>
>   storeShort(address, ((1&1)<<0) + ((127 & 1)<<1)) // store 0x0003
>
> but on a big-endian machine
>
>   storeShort(address, ((1&1)<<15) + ((127&1)<<8)) // store 0x8100
>
> So, since the Java-side behavior should be different, I think that the little-language inputs
> (the output of the libclang-based tool) should be different depending on platform; the
> offsets will need to be described in ways that conform to Java's expectations.
>
> One thing that might not be obvious is that the little language will need to specify the
> size of the bit container into which bitfields are stored, because the translation between
> pairs of byte offsets and short offsets differs on LE and BE machines.
>
> For example, the little-endian storeByte equivalent of
>
>   storeShort(address, ((1&1)<<0) + ((127 & 1)<<1)) // store 0x0003
>
> is
>
>   storeByte(address, ((1&1)<<0) + ((127 & 1)<<1))   // store 0x__03
>   storeByte(address+1, 0)                            // store 0x00__
>
> but big-endian is
>
>   storeByte(address, 0)                              // store 0x00__
>   storeByte(address+1, ((1&1)<<0) + ((127 & 1)<<1)) // store 0x__03
>
> So a simple offset-and-size formulation is not adequate; we need to know the
> underlying container size for a given bit address, unless we blow everything
> down to byte stores and then hack our compiler to reconsolidate adjacent stores.
>
> Again, input from Henry would be helpful here: this container-size
> information may already be present because of how C deals with bitfields
> in general (a char-typed bitfield may not span a byte boundary, but a short-typed
> bitfield can, as long as that byte boundary is not also a short boundary). Another way to
> think of container size is "alignment": a struct of short-typed bitfields will have size
> and alignment 2 even if the bitfields all fit in a single byte. (I don't think we want to
> support full generality here because we could end up in places where we need to understand
> the underlying endianness; if we use byte loads to obtain an int, where do
> we shuffle the respective bytes to?)
>

We need to consider when and where the libclang-based tool is used; that
dictates whether we can rely on libclang for such information. The information
is available for a target platform, given a header file.

It is possible to build some sort of abstraction layer to describe layout that
can cover general cases; we will see how the experiment goes.
Cheers,
Henry

From angela_lin at ca.ibm.com  Wed Nov 12 14:49:39 2014
From: angela_lin at ca.ibm.com (Angela Lin)
Date: Wed, 12 Nov 2014 09:49:39 -0500
Subject: Reference SUB/SEP Value question
In-Reply-To: <4BF7F006-2846-4AC1-BA5E-DAB3DABECBF2@oracle.com>
References: <81818BB2-8C2D-4910-A83F-D03BB7B0AB49@oracle.com>
 <4BF7F006-2846-4AC1-BA5E-DAB3DABECBF2@oracle.com>
Message-ID: 

Have I got this straight?

- ComplexGeneratedProxy is the runtime-generated thing that knows how to pull
  structured data out of some memory.
- ComplexValue is the interface generated by the groveller. This is the type
  that the Java programmer can use directly?
- Complex is the layout that is returned from the layout factory. This is also
  generated at runtime?
- Ref supplies base behaviour for all layouts.

Why the need for ComplexRef and ComplexValue to be separate entities? What are
the methods of each?

Thanks,
Angela

From david.r.chase at oracle.com  Wed Nov 12 19:44:17 2014
From: david.r.chase at oracle.com (David Chase)
Date: Wed, 12 Nov 2014 14:44:17 -0500
Subject: Reference SUB/SEP Value question
In-Reply-To: 
References: <81818BB2-8C2D-4910-A83F-D03BB7B0AB49@oracle.com>
 <4BF7F006-2846-4AC1-BA5E-DAB3DABECBF2@oracle.com>
Message-ID: 

On 2014-11-12, at 9:49 AM, Angela Lin wrote:

> Have I got this straight?
>
> - ComplexGeneratedProxy is the runtime-generated thing that knows how to pull structured data out of some memory.
> - ComplexValue is the interface generated by the groveller. This is the type that the Java programmer can use directly?

I think the Groveler creates both ComplexValue and ComplexRef, plus a
little-language description of the layout. Maybe that's wrong; perhaps the
little language should be encoded into static methods of the ComplexRef type
(another option, besides strings or data structures).

I'm not 100% sure of the need for ComplexValue. I know we need ComplexRef and
the generated proxy class.

> - Complex is the layout that is returned from the layout factory. This is also generated at runtime?

I think that the layout factory returns a ComplexGeneratedProxy, which is always
a ComplexRef.

> - Ref supplies base behaviour for all layouts.

Yes:
a lot of it is common to the unsafe machinery at the bottom, like "address",
"size", and "align".

> Why the need for ComplexRef and ComplexValue to be separate entities? What are the methods of each?

ComplexRef has more methods; in particular, it has setter methods and may have
ref methods to its own fields. ComplexValue only has getters.

We were discussing this last night, and I don't feel certain of my opinions,
which is why I laid out both "SEP" and "SUB" hierarchies. The problem I'm trying
to sort out is the tradeoff between too darn many entities, versus what we know
programmers will want to do with them, versus the downward-aimed aliasing
firearms that we hand out to programmers.

Making FooRef a subtype of FooValue guarantees value-aliasing problems;
separating them allows the implementation to create them in the obvious way
when you ask a FooRef for its FooValue, but does not require it.

It may be the case that we want to make a cleaner distinction between those
things that are "values" and those that are "references", and not have anything
be both. So, for example, maybe we regard Complex as a "value", in which case
there is no way to separately set the two fields within the layout; you can get
them, and you can get the whole Complex as a "value", but you cannot set them
separately.
 -> This implies a not-quite-the-same ComplexGeneratedProxy, since it will have
    no setters.
 -> And maybe we need a name for the "value" type that you get from
    dereferencing such a thing.

But for (say) an IP header, you probably do want to monkey with some of the
fields in place, so it would be an IPHeaderRef (we need to see the setter
methods), and the IPHeaderGeneratedProxy would also implement those. Maybe we
don't ever need to regard the IPHeaderRef as if it were a "value".

Have I stated my problem clearly enough?
David

From atobia at ca.ibm.com  Tue Nov 18 17:31:44 2014
From: atobia at ca.ibm.com (Tobi Ajila)
Date: Tue, 18 Nov 2014 12:31:44 -0500
Subject: More detail than I had intended on Layout description language.
In-Reply-To: 
References: 
Message-ID: 

Hi

> The language should not be one long string (but you could talk me back
> into this; I am basing this purely on the difficulty of ruling out screwball
> string inputs).
> I'm torn between the structure and non-ambiguity of defining
> a class to describe fields and passing in an array, along the lines of (notice
> how this is not Java or C, so it must not be source code...)

We are open to a class-based LD format. The downside is that the LDL is no
longer a compact representation. Our approach combines the two Java-level
inputs by placing LDL annotations in the generated interfaces. This is something
we can explore further, but for the remainder of this email I will continue to
use the string notation.

> Reason #2 is that I'm trying to figure out how to deal gracefully with the endianness
> issue on a platform that works very hard to hide such issues. It seems to me that
> for a given "little language" input, the behavior of the Java side (and the bytecodes
> that they generate) should not depend on the endianness of the underlying platform.
...
> Is this wrong-headed? Do we like where the "same behavior on the Java side"
> rule leads us?

By default we assume that the layout data has the same endianness as the
execution environment. We think this is what end users would expect. Automatic
endian conversion would be nice to have, but we don't want to bake that cost
into the JDK. Also, nothing stops us from being explicit in the LDL about the
endianness. We are certainly open to more discussion on this topic.

In regard to "... stuff describing size/align that I am having trouble
with ...", we think the LDL should represent exactly how the memory is laid
out. It should not make any assumptions about alignment.

I will go over some features of our prototype LDL. Hopefully that will clear up
some things.

The basic grammar of the LD format is the following.
  "Qualified Name[ ':' Qualified Super-Class Name]"
  { ',' "[Field Name]{'['Number of Elements']'} ':' (Size | 'pointer' | 'Layout' Name)" }

where:
- 'Qualified Name' is the fully qualified name of the layout being described,
  for example "com.ibm.shapes.Square"
- 'Qualified Super-Class Name' is the fully qualified name of the super class
- 'Field Name' is the name of the field defined in the layout
  - if the field name is omitted, no accessor is generated for it
- 'Number of Elements' specifies the number of elements in an array dimension
  (example later)
  - LDL assumes multidimensional arrays are arranged in row-major order
- 'Size' is the size of the field in bits
  - for arrays, this is the size of a single element of the leaf type
- the special keyword 'pointer' is used for pointer fields (example later)
- 'Name' specifies the name of a defined layout (example later)

The following is a basic example where we have a layout 'Point2D' with two
fields, 'x' and 'y'. The corresponding LD is shown below with the appropriate
size for the fields. The order in which the fields appear in the layout is
preserved in the LD. As you have probably noticed, the field type is not carried
over to the LD. Each layout field is treated as a typeless collection of bits.

Example 1:

  struct Point2D {
    uint32_t x;
    uint32_t y;
  };

  LD: "Point2D", "x:32", "y:32"

One can define a layout with array fields by using the following notation.

Example 2:

  struct A {
    uint32_t x;
    uint32_t y[10]; //this is an array of fields
  };

  LD: "A", "x:32", "y[10]:32"

Here is another example with a multi-dimensional array field.

Example 3:

  struct B {
    uint32_t x;
    uint32_t y[2][10]; //this is a 2-d array
  };

  LD: "B", "x:32", "y[2][10]:32"

The language assumes no implicit padding or start alignment for allocators of a
type. This is a pain point for native programmers, as the compiler may add
padding, and different compilers may add different padding. The language assumes
that fields are laid out in order and in a packed representation.
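Under the "in order, packed" rule, a field's bit offset is simply the sum of the sizes of the fields before it. The following is a hedged Java sketch of that computation for the string format above. It assumes only numeric sizes and the 'pointer' keyword (whose width is passed in, standing in for the JVM architecture); nested 'Layout Name' fields and super-layouts are deliberately not handled, and the class names LdField and LdParser are hypothetical, not part of the prototype.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// One field of a packed layout. All sizes and offsets are in bits.
final class LdField {
    final String name;        // null for an unnamed (padding) field: no accessor generated
    final long count;         // product of the array dimensions, 1 for a scalar
    final long elementBits;   // size of one element of the leaf type
    final long offsetBits;    // offset from the start of the layout

    LdField(String name, long count, long elementBits, long offsetBits) {
        this.name = name;
        this.count = count;
        this.elementBits = elementBits;
        this.offsetBits = offsetBits;
    }

    long totalBits() { return count * elementBits; }
}

// Hypothetical parser for the string LD format; handles numeric sizes and 'pointer' only.
final class LdParser {
    // [Field Name]{'['N']'} ':' (Size | 'pointer')
    private static final Pattern FIELD =
            Pattern.compile("([A-Za-z_][A-Za-z0-9_]*)?((?:\\[\\d+\\])*):(\\d+|pointer)");
    private static final Pattern DIM = Pattern.compile("\\[(\\d+)\\]");

    // parts[0] is the layout name (ignored here); parts[1..] are field descriptors.
    static List<LdField> parse(int pointerBits, String... parts) {
        List<LdField> fields = new ArrayList<>();
        long offset = 0;
        for (int i = 1; i < parts.length; i++) {
            Matcher m = FIELD.matcher(parts[i].trim());
            if (!m.matches()) {
                throw new IllegalArgumentException("unsupported field descriptor: " + parts[i]);
            }
            long count = 1;
            Matcher d = DIM.matcher(m.group(2));
            while (d.find()) {
                count *= Long.parseLong(d.group(1));  // row-major: only the product matters here
            }
            long bits = m.group(3).equals("pointer") ? pointerBits : Long.parseLong(m.group(3));
            fields.add(new LdField(m.group(1), count, bits, offset));
            offset += count * bits;                   // packed: no implicit padding or alignment
        }
        return fields;
    }

    static long sizeBits(List<LdField> fields) {
        if (fields.isEmpty()) return 0;
        LdField last = fields.get(fields.size() - 1);
        return last.offsetBits + last.totalBits();
    }
}
```

For Example 3, this places y at bit 32 with 20 elements and gives the layout a size of 672 bits. Because nothing is padded implicitly, an unnamed field such as ":24" is the only way to move the next field to an aligned offset; the descriptor, not the parser, is responsible for alignment.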
We will not automatically fix up field alignment or padding. To illustrate this
point, the following example shows a layout composed of two fields: the first a
'uint8_t', the second a 'uint32_t'. On a 32-bit platform, a compiler may add 3
bytes of padding to align the second field. The padding field has to be
explicitly notated in the LD, but its name can be omitted.

Example 4:

  struct C {
    uint8_t data1;
    uint32_t data2;
  };

  LD: "C", "data1:8", "data2:32"

on 32-bit would be compiled as:

  struct C {
    uint8_t data1;
    uint8_t padding1[3];
    uint32_t data2;
  };

  LD: "C", "data1:8", ":24", "data2:32"

Native pointer types are platform-specific. To address this, a new keyword,
'pointer', is introduced. The size of this type depends on the JVM architecture
(32- or 64-bit). The next example illustrates the use of pointers in a
linked-list node.

Example 5:

  struct Node {
    uint32_t data;
    struct Node *next;
  };

  LD: "Node", "data:32", "next:pointer"

This solution does have its drawbacks, as it would require the alignment to be
explicitly specified in the LDL depending on JVM architecture. The next example
illustrates this. We have chosen this solution as a starting point, but it is an
issue that requires more consideration.

Example 6: the Node struct from Example 5, on 64-bit, would be compiled as:

  struct Node {
    uint32_t data;
    uint8_t padding1[4];
    struct Node *next;
  };

  LD: "Node", "data:32", ":32", "next:pointer"

Another issue to consider is floating-point types, where the native
floating-point representation may not match Java's. Generated accessors for
layouts with float fields should give you access to the raw bits but not
automatically convert the data into a Java floating-point value.

The following example introduces a concept that was not discussed before: the
nesting of layouts. This poses some interesting challenges, as it introduces
dependencies between layouts. If this is something we wish to support, the
following is a syntax that can be used to describe it.
Example 7:

  struct Line2D {
    Point2D start;
    Point2D end;
  };

  LD: "Line2D", "start:Layout Point2D", "end:Layout Point2D"

Lastly, we have an example that shows layout inheritance. If this is something
that we wish to support, here is how we could do it.

Example 8:

  struct Point3D : Point2D {
    uint32_t z;
  };

  LD: "Point3D : Point2D", "z:32"

The proposed LD format does a good job of representing how the layout fields
appear in memory, but it does not convey any type information about the fields.
This makes it difficult for the end user, as it requires more effort on their
part to add meaning to the fields. The ideal solution is one that can represent
these two streams of data, in either an LDL or an NFD format.

Type Data
- integral type
  - signed vs unsigned
- float type
- pointer type
- raw type
  - if we need to represent data in Java without any type info associated
    with it

Layout Data
- name
- size
- offset

Tobi

From angela_lin at ca.ibm.com  Tue Nov 18 22:31:00 2014
From: angela_lin at ca.ibm.com (Angela Lin)
Date: Tue, 18 Nov 2014 17:31:00 -0500
Subject: More detail than I had intended on Layout description language.
In-Reply-To: 
References: 
Message-ID: 

Some aspects I'd like to highlight from Tobi's email:

1. LDL/NFD, by default, should describe the exact memory layout on the
execution platform. In particular:
- Fields must have explicitly specified width. So, if the LDL/NFD models a C
  struct that contains a void*, the size of the void* has to be known up front.
- No automatic padding or alignment.
- Fields either have an explicitly specified offset, or are laid out in the
  order of specification.
- Native endian.

This implies that the descriptor for, say, a C struct may not be portable across
all platforms. If the descriptor models a data structure that is shared with a
native library, then the descriptor is tied to that particular native library
binary.
I think this is consistent with David's earlier statement:

> So, since the Java-side behavior should be different, I think that
> the little-language inputs (the output of the libclang-based tool)
> should be different depending on platform; the offsets will need to
> be described in ways that conform to Java's expectations.

User-friendly descriptor options could certainly be added, as long as we agree
on the default behaviour.

2. Layouts should offer access to the raw field bits by value and by reference,
without conversion to a Java type (primitive int, float, or otherwise). We'd
like to be able to pass a field to a library that knows how to handle the
non-Java data type, and we'd like to avoid marshalling costs.

3. The descriptor should provide a way to specify how the raw field bits should
be interpreted as a Java entity. This could be as simple as casting the field
bits to a suitably sized Java integer type.

A more structured descriptor language does provide more flexibility for doing
the above.

Angela

From david.r.chase at oracle.com  Tue Nov 18 23:28:35 2014
From: david.r.chase at oracle.com (David Chase)
Date: Tue, 18 Nov 2014 18:28:35 -0500
Subject: More detail than I had intended on Layout description language.
In-Reply-To: 
References: 
Message-ID: 

On 2014-11-18, at 5:31 PM, Angela Lin wrote:

>
> Some aspects I'd like to highlight from Tobi's email:
>
> 1. LDL/NFD, by default, should describe exact memory layout on the
> execution platform. In particular:
> - Fields must have explicitly specified width. So, if the LDL/NFD models a
> C struct that contains a void*, the size of the void* has to be known
> up-front.
> - No automatic padding or alignment.
> - Fields either have explicitly specified offset, or are laid out in the
> order of specification.
> - Native endian.
>
> This implies that the descriptor for, say, a C struct, may not be portable
> across all platforms.
> If the descriptor models a data structure that is
> shared with a native library, then the descriptor is tied to the particular
> native library binary. I think this is consistent with David's earlier
> statement:
>> So, since the Java-side behavior should be different, I think that
>> the little-language inputs (the output of the libclang-based tool)
>> should be different depending on platform; the offsets will need to
>> be described in ways that conform to Java's expectations.

I think what you are saying is consistent, but I didn't get that from Tobi's
email, so I'll reread it more carefully. The language surrounding "pointer" in
particular made me think that he was approaching it differently.

Was I correct in seeing that it is proposed to attach LDL to interfaces with
annotations? My snap reaction is that's a good idea, even if it does perhaps
push the LDL back into text.

And I think (based on discussions at this end) that being able to treat these
things as native-resident references is priority #1, but I'm not sure that's
the only priority.

But I will look again. (A bug with a deadline is currently grabbing most of my
attention, unfortunately.)

Also, I think we might need alignment specification, especially considering
possible GPU applications. It's not necessarily for padding purposes, but it is
required to ensure that memory is properly allocated and placed; for example,
can I allocate a 64-bit quantity on an 8-bit boundary? How am I told that I
cannot do this?

David

From angela_lin at ca.ibm.com  Wed Nov 19 17:32:09 2014
From: angela_lin at ca.ibm.com (Angela Lin)
Date: Wed, 19 Nov 2014 12:32:09 -0500
Subject: More detail than I had intended on Layout description language.
In-Reply-To: 
References: 
Message-ID: 

David Chase wrote on 11/18/2014 06:28:35 PM:

> From: David Chase
> To: Angela Lin/Ottawa/IBM at IBMCA
> Cc: panama-spec-experts at openjdk.java.net, IBM Panama Spec Group
> Date: 11/18/2014 06:28 PM
> Subject: Re: More detail than I had intended on Layout description language.
>
> I think what you are saying is consistent, but I didn't get that
> from Tobi's email, so I'll reread it more carefully. The language
> surrounding "pointer" in particular made me think that he was
> approaching it differently.

Tobi's email refers to our attempt to implement a portable "pointer" type in the
LDL. It's an interesting option to consider, but I'm not sure we want to deal
with the knock-on effects of supporting this.
For example, how do we align the pointer field (in the LDL) if we don't know its size? What's the impact on the layout nesting or extension features?

If we decide that LDL is not platform-independent, then "pointer" doesn't seem as useful.

> Was I correct in seeing that it is proposed to attach LDL to
> interfaces with annotations?

Yes, we proposed this. However, we're certainly open to alternatives.

Since LDL is not platform-independent, if the LDLs are attached to the generated Java interfaces, then the Java interfaces are also not platform-independent. Is this good or bad?

> My snap reaction is that's a good idea, even if it does perhaps push
> the LDL back into text.
>
> And I think (based on discussions at this end) that being able to treat these
> things as native-resident references is priority #1, but I'm not
> sure that it's the only
> priority.

Could you clarify the terminology "native-resident reference"?

> But I will look again. (A bug with a deadline is currently grabbing most of my
> attention, unfortunately.)
>
> Also, I think we might need alignment specification, especially
> considering possible
> GPU applications. It's not necessarily for padding purposes, but it
> is required to ensure
> that memory is properly allocated and placed: for example, can I
> allocate a 64-bit quantity
> on an 8-bit boundary? How am I told that I cannot do this?

For start alignment, I think you're right. Again, I would propose starting with an explicit descriptor attribute, i.e. "the layout must start at an alignment of x".

Angela

From david.r.chase at oracle.com Fri Nov 21 22:34:41 2014
From: david.r.chase at oracle.com (David Chase)
Date: Fri, 21 Nov 2014 17:34:41 -0500
Subject: More detail than I had intended on Layout description language.
In-Reply-To: References: Message-ID:

On 2014-11-19, at 12:32 PM, Angela Lin wrote:

> Tobi's email refers to our attempt to implement a portable "pointer" type in the LDL.
> It's an interesting option to consider, but I'm not sure we want to deal with the knock-on effects of supporting this. For example, how do we align the pointer field (in the LDL) if we don't know its size? What's the impact on the layout nesting or extension features?
>
> If we decide that LDL is not platform-independent, then "pointer" doesn't seem as useful.
>
> > Was I correct in seeing that it is proposed to attach LDL to
> > interfaces with annotations?
>
> Yes, we proposed this. However, we're certainly open to alternatives.
>
> Since LDL is not platform-independent, if the LDLs are attached to the generated Java interfaces, then the Java interfaces are also not platform-independent. Is this good or bad?

Normally I'd say bad. It might be interesting to try to see what happens with network protocols, which are ultimately (at the byte-by-byte level) unambiguously specified. That gets in the way of the platform-independent Java-behavior story, because I can easily imagine that what you would like to have happen in the generated code is something like a loadInt and, on wrong-byte-order platforms, a byte-swap. I was thinking "network protocols" because those are something that we really do expect to see on more than one platform.

Alternatively, maybe we need a pair of intrinsics (potentially in sized versions) called toBigEndian() and toLittleEndian() so the generated code could always look the same but the behavior would vary. Right, that will sail right past the language and corelibs guardians-of-the-faith.

I think there's some ugliness-conservation at work here. Perhaps we could extend the Unsafe interface to include endianness-tagged loads and stores. (This is an application of the Alice's Restaurant Algorithm #1: one big pile of garbage is better than two little piles of garbage.)
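To make the two alternatives above concrete, here is a minimal Java sketch. It assumes `java.nio.ByteBuffer` as a stand-in for the proposed endianness-tagged `Unsafe` loads (no such `Unsafe` methods are assumed to exist), and `toBigEndian` is a hypothetical name borrowed from the intrinsic suggested above, implemented here as a conditional swap with `Integer.reverseBytes`. Either way, the same code yields the same value on big- and little-endian hosts, which is the property wanted for network-protocol layouts.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class EndianLoads {
    // Option A: an endianness-tagged load. The byte order is part of the
    // operation, so the result does not depend on the host platform.
    static int loadIntBE(ByteBuffer mem, int offset) {
        return mem.duplicate().order(ByteOrder.BIG_ENDIAN).getInt(offset);
    }

    static int loadIntLE(ByteBuffer mem, int offset) {
        return mem.duplicate().order(ByteOrder.LITTLE_ENDIAN).getInt(offset);
    }

    // Option B, step 1: a plain load in the host's native byte order.
    static int loadIntNative(ByteBuffer mem, int offset) {
        return mem.duplicate().order(ByteOrder.nativeOrder()).getInt(offset);
    }

    // Option B, step 2: hypothetical "toBigEndian" intrinsic, a conditional
    // swap that only does work on little-endian hosts.
    static int toBigEndian(int nativeOrderValue) {
        return ByteOrder.nativeOrder() == ByteOrder.BIG_ENDIAN
                ? nativeOrderValue
                : Integer.reverseBytes(nativeOrderValue);
    }

    public static void main(String[] args) {
        // Four bytes as they would arrive in a big-endian (network-order) header.
        ByteBuffer packet = ByteBuffer.allocateDirect(4);
        packet.put(0, (byte) 0x12).put(1, (byte) 0x34).put(2, (byte) 0x56).put(3, (byte) 0x78);

        System.out.printf("tagged BE load: 0x%08x%n", loadIntBE(packet, 0));                  // 0x12345678
        System.out.printf("tagged LE load: 0x%08x%n", loadIntLE(packet, 0));                  // 0x78563412
        System.out.printf("native + swap:  0x%08x%n", toBigEndian(loadIntNative(packet, 0))); // 0x12345678
    }
}
```

Option A keeps the swap next to the load, which is what a processor with byte-swapping loads could exploit; option B keeps the load uniform and isolates the platform dependence in a separate swap that a JIT could recognize.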
Another advantage of endianness-tagged loads and stores is that some processors (SPARC, I think) have options for swapping the data as it is loaded, so it may be expedient anyhow to push the byte-swapping as close to the loads and stores as possible.

Note that either of these options (introducing intrinsics with targeted platform-dependent behavior, either conditional swaps or conditionally-swapped loads/stores) would at least give us a prayer of saying that the same generated proxies would work across platforms for network-protocol layouts. Can we tag this as a "would be nice if possible"?

> > My snap reaction is that's a good idea, even if it does perhaps push
> > the LDL back into text.
> >
> > And I think (based on discussions at this end) that being able to treat these
> > things as native-resident references is priority #1, but I'm not
> > sure that it's the only
> > priority.
>
> Could you clarify the terminology "native-resident reference"?

I think I mean "what you guys want", meaning that on the Java side there is (at minimum) an interface full of getters/setters and a generated proxy class, which conspire to peek/poke at memory that is "native", meaning not on the Java heap (i.e., allocated by malloc or mmap or similar). This is the bare minimum for smooth, copy-less interoperation with native code in general, I think.

For this, and for alignment, I think things get more interesting when you start nesting things. For example, suppose you have

    struct Point { int x, y; };
    struct Triangle { struct Point vertices[3]; };

Considering alignment, there is the natural propagation from fields to containers: a container can never have smaller alignment than its fields (except in the case of some exotic addressing schemes that I don't think are normal for C programs).

If you are references-only, then it's all aliased. Suppose you wanted to rotate the vertices of the triangle, say

    Triangle tri = someNativeThing.getTriangle();
    Array<Point> points = tri.vertices();
    Point t = /* I need to make a copy here */ (points.get(0));
    points.get(0).set(points.get(1));
    points.get(1).set(points.get(2));
    points.get(2).set(t);

So what happens to make that copy? I think there are (at least) 3 choices, and they are not even mutually exclusive:

1) The interface includes a clone() method, and the proxy allocates that clone
   (a) on the Java heap, using the standard Unsafe.get/set(null, …) hack, or
   (b) on the native heap.
2) We have "values" that are different in that they are never aliased and not mutable.
3) It's an interface; there's nothing in particular stopping programmers from writing their own implementations.

1a) pro: will GC smoothly. con: cannot be passed to native code unless our wrappers are a little clever.
1b) pro: can definitely pass to native code. con: there may be GC issues (will it require finalizers to free temps?).
2) pro: will GC smoothly, will probably optimize better in Java, no aliasing issues. con: cannot pass to native code; multiplies entities.
3) pro: we get to be lazy and punt, so we can be sure that we did not commit to a mistake. con: the total amount of work is not saved by our being lazy and punting, and programmers can screw this up.

There's a slightly larger-picture question here, which is roughly "how far does this stuff propagate into surrounding code before it gets a wrapper slapped on it?"

> > But I will look again. (A bug with a deadline is currently grabbing most of my
> > attention, unfortunately.)
> >
> > Also, I think we might need alignment specification, especially
> > considering possible
> > GPU applications. It's not necessarily for padding purposes, but it
> > is required to ensure
> > that memory is properly allocated and placed: for example, can I
> > allocate a 64-bit quantity
> > on an 8-bit boundary? How am I told that I cannot do this?
>
> For start alignment, I think you're right. Again, I would propose starting with an explicit descriptor attribute, i.e. "the layout must start at an alignment of x".
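A minimal Java sketch of the triangle-rotation example, under stated assumptions: `PointRef`, `PointValue`, `PointProxy`, and `RotateTriangle` are hypothetical names, a direct `ByteBuffer` (typically allocated outside the Java heap) stands in for malloc'd native memory instead of `Unsafe`, `struct Point` is assumed to be 8 bytes with 4-byte alignment, and the copy uses choice 2 (an immutable heap value). The proxy constructor also enforces an explicit start alignment, in the spirit of the "layout must start at an alignment of x" attribute, so a misaligned placement is reported rather than silently accepted.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Read/write view of a native struct Point (hypothetical "generated" interface).
interface PointRef {
    int x();
    int y();
    void setX(int v);
    void setY(int v);

    // Copy fields from another reference (which may alias native memory).
    default void set(PointRef other) { setX(other.x()); setY(other.y()); }

    // Copy fields from an immutable value.
    default void set(PointValue v) { setX(v.x()); setY(v.y()); }

    // Choice 2: snapshot into an immutable, never-aliased heap value.
    default PointValue copy() { return new PointValue(x(), y()); }
}

record PointValue(int x, int y) {}

// Stand-in for a generated proxy that peeks/pokes at off-heap memory.
final class PointProxy implements PointRef {
    static final int SIZE = 8;   // assumed: two 4-byte ints, no padding
    static final int ALIGN = 4;  // assumed: alignment of int on the target ABI

    private final ByteBuffer mem;
    private final int base;

    PointProxy(ByteBuffer mem, int base) {
        // Explicit start-alignment check: reject a misaligned placement.
        if (mem.alignmentOffset(base, ALIGN) != 0) {
            throw new IllegalArgumentException(
                "Point at offset " + base + " is not " + ALIGN + "-byte aligned");
        }
        this.mem = mem;
        this.base = base;
    }

    public int x() { return mem.getInt(base); }
    public int y() { return mem.getInt(base + 4); }
    public void setX(int v) { mem.putInt(base, v); }
    public void setY(int v) { mem.putInt(base + 4, v); }
}

final class RotateTriangle {
    public static void main(String[] args) {
        // Off-heap storage for struct Triangle { struct Point vertices[3]; },
        // sliced so offset 0 is 8-byte aligned; native byte order, as in C.
        ByteBuffer mem = ByteBuffer.allocateDirect(64).alignedSlice(8)
                                   .order(ByteOrder.nativeOrder());
        PointRef[] v = new PointRef[3];
        for (int i = 0; i < 3; i++) {
            v[i] = new PointProxy(mem, i * PointProxy.SIZE);
            v[i].setX(i);
            v[i].setY(10 * i);
        }

        PointValue t = v[0].copy();  // the copy the rotation needs
        v[0].set(v[1]);
        v[1].set(v[2]);
        v[2].set(t);

        for (PointRef p : v) System.out.println(p.x() + "," + p.y()); // prints 1,10 / 2,20 / 0,0

        try {
            new PointProxy(mem, 2);  // a 4-byte-aligned struct placed at offset 2
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Replacing `v[0].copy()` with a plain `PointRef t = v[0];` reproduces the aliasing problem: `t` sees the write made by `v[0].set(v[1])`, so the rotation ends with two copies of the original vertex 1.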
I think that works, but we might want to consider how alignment specs interact with everything else. Have you guys thought much about bitfields? There are some peculiar interactions there with alignment and endianness.

The other point I wanted to toss in is that once you put alignment into the mix, and not just as an "oh yeah, we need to think about alignment", it might change how we talk about some of the other stuff. If you look at C bitfields, there is very much a model of "load a box this big, then shift the data towards the low-order bits some distance, then mask/sign-smear the part we don't care about". If that bleeds through into the groveler tool, maybe it bleeds through into the little language, too. Some of the memory-model issues demand the "load/store/CAS-a-box" model, too: if you treat a bitfield as something that is split across a pair of byte loads, that all gets nastier.

David
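The "load a box, shift, mask/sign-smear" model and the "load/store/CAS-a-box" requirement described above can be sketched in Java as follows. This is an illustration under assumptions: the box is a 32-bit `int` that has already been loaded (so its alignment and byte order are already settled), bit offsets are counted from the low-order end of that box, and because C compilers may order bitfields differently, real offsets would have to come from the groveler tool. `BitfieldBox` is a hypothetical name, and `AtomicInteger.updateAndGet` stands in for a CAS on the native box.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class BitfieldBox {
    // Mask with the low 'width' bits set (1 <= width <= 32).
    static int lowMask(int width) {
        return width == 32 ? -1 : (1 << width) - 1;
    }

    // Unsigned field: shift toward the low-order bits, then mask.
    static int getUnsigned(int box, int shift, int width) {
        return (box >>> shift) & lowMask(width);
    }

    // Signed field: move the field's top bit up to bit 31, then
    // arithmetic-shift back down so the sign bit is smeared over the high bits.
    static int getSigned(int box, int shift, int width) {
        return (box << (32 - shift - width)) >> (32 - width);
    }

    // Store: read-modify-write of the whole box; neighbouring fields survive.
    static int with(int box, int shift, int width, int value) {
        int mask = lowMask(width) << shift;
        return (box & ~mask) | ((value << shift) & mask);
    }

    // Concurrent store: CAS the entire box, never individual bytes.
    static void setAtomically(AtomicInteger box, int shift, int width, int value) {
        box.updateAndGet(old -> with(old, shift, width, value));
    }

    public static void main(String[] args) {
        int box = 0xABCD1234;
        System.out.println(Integer.toHexString(getUnsigned(box, 4, 8))); // 23
        System.out.println(getSigned(0x0000000F, 0, 4));                 // -1
        AtomicInteger shared = new AtomicInteger(box);
        setAtomically(shared, 4, 8, 0x7E);
        System.out.println(Integer.toHexString(shared.get()));          // abcd17e4
    }
}
```

Because every read and write goes through one aligned box, a single CAS covers the whole update; a field split across two separately loaded bytes has no such single unit to CAS.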