From atobia at ca.ibm.com Fri May 1 15:21:32 2015
From: atobia at ca.ibm.com (Tobi Ajila)
Date: Fri, 1 May 2015 11:21:32 -0400
Subject: State of the LDL
In-Reply-To: <553AB539.3000108@oracle.com>
References: <553AB539.3000108@oracle.com>
Message-ID:

> As an approach, I'd like to suggest that we separate the semantic
> aspects of the layout language from the proposal for a specific
> encoding; it will be better (and easier) to come to agreement on the
> abstract model and its semantics before trying to propose an encoding.
> History has shown that the latter can often get in the way of paying
> enough attention to the former.
>

I think separating the semantic aspects from the encoding is a good idea. We already have some kind of distinction: the goals (with the exception of #4) describe semantic aspects of the LDL, and the grammar defines the syntax. This will be clearer in the next version of the document.

> Separately, it would also be good to separate the concepts (Layout,
> Location, etc) from the implementation strategy (abstract classes.)
>

Agreed, these are two separate discussions. My intention there was to put out a straw man. We chose to implement our prototype with abstract classes, but we fully expect some debate in this area. We would prefer to discuss Layout concepts and implementation strategies in a separate thread, but there is some unavoidable overlap.

> First, some comments on the goals:
>
> > 2. The LD must specify the endianness of the layout. The bit and byte
> > endian must be consistent. Endian is specified at container granularity. A
> > shorthand notation can be provided to specify endian for all containers in
> > a layout.
>
> This "same for bits and bytes" restrictions seems like it would prohibit
> encodings of sequences of bytes encoded in machine-endianness, such as
> variable-length strings encoded with a length field.
>

One of the goals has been to allow portable bytecodes for protocols with a well-defined endian (e.g. network packets) while limiting the amount of ugliness that has to be added to sun.misc.Unsafe in terms of endian-specific read/write intrinsics. Not supporting native endian decreases the number of signatures {big, little, native} * {primitive type sizes} that need to be added. We've also viewed the existing native data as the "source of truth" {offsets, endian, etc} that Java needs to interop with. Given that explicit endianness is required, we didn't initially see enough value in native endian. Recently, we've started to come around on native endian for cases where Java wants to serialize data off-heap and read it back in the same process without there being some other consumer / user of that data.

That being said, I don't fully understand what this has to do with variable length strings or how our endianness specification prohibits variable length strings (more on this later). The LDL format allows us to make a distinction between the length field and the sequence of bytes, e.g.

LD: VLS, 10, <
{
    short, 2, length, //we can specify endian for length and characters separately
    char, 1[8], characters,
}

We do, however, have a restriction that any accessible memory must be explicitly stated in the LD. This would be the only restriction prohibiting variable length strings.

> Also, this is the first use of "container", which should be defined
> before first use.
>
> > 5. A container is a sequence of one or more adjacent fields.
>
> It seems we've defined fields and containers in terms of each other. At
> this point, an unfamiliar reader will not have a real understanding of
> either, or why there are two separate concepts. This should be
> clarified. It would help to make the motivation for this two-level
> hierarchy more explicit.
>

In our previous discussions it became clear that we need to define rules regarding atomicity and tearing.
The proposed memory model is inspired by the C++ memory model (http://www.hboehm.info/c++mm/); our concepts of "container" and "field" are analogous to the C++ "memory location" and "bit-field". The LDL requires this type of distinction because it is important to be able to provide behaviour equivalent to that of native languages. If a field access in C has a certain behaviour, I should be able to get equivalent behaviour in Java.

> > 6. Default alignment is the size of the largest container in the layout
> > rounded up to 2^n bits. In the case of arrays the container element size is
> > considered.
>
> It feels like arrays are tacked on as an ancillary concern. I can't
> imagine that this is true?
>

Ideally #6 would be something like this: "The default alignment of a Layout is the largest alignment of all the containers and unions that compose the Layout." But our proposal does not allow for container alignments, since containers are defined in terms of size and offset. So we need to define the default alignment in terms of container sizes. This definition works well, but it may be confusing when discussing arrays. For this reason we feel it necessary to call arrays out specifically.

> > Type Information Specification:
> > The following describes how native data is associated with Java Types.
> > First we will begin by defining the Base Layout Classes.
> >
> > //Base Layout class, all Layouts subclass this
> > abstract class Layout {
> >     private Location loc;
> > }
>
> Before diving into implementation, it would be useful to motivate these
> two key concepts, Layout and Location.
>

For the purposes of this document we need to mention the existence of a type called Layout. The specifics and motivation for this type can be discussed in another thread.

> > 2) Pointer
>
> Pointer or Object Reference?
>

This class represents a native pointer. It lets us take a native field and get what it is pointing at. A good example is a linked list node.

struct Node {
    uint64_t data;
    struct Node* next;
};

The "dereference" method in Pointer lets me get the next node.

> > 5) Primitive Arrays
>
> Valhalla will provide the ability to have generics over primitives; I
> think this means that you can merge (5) and (6) into "Array of T", and
> provide base types for each primitive layout. This should simplify
> things a fair bit.
>

Agreed.

> > Grammar:
>
> To be honest, I am kind of mystified at the design choices for the
> grammar; it seems to be chosen to be both hard to mechanically parse
> *and* hard for humans to read! I don't want to dwell on bikeshed issues
> like this, so I'll just say that this is definitely something that we're
> going to need to revisit before too much implementation happens.
>
> Perhaps we should take a step back:
> - Define an abstract model for the layout language, separate from syntax;
> - Identify some design goals to describe the properties of a desirable
> syntax.
>

Yes, the grammar is a strawman; we should have made that clearer. We are not committed to it, but it lets us create examples to discuss. We will make a distinction between the semantic elements and the syntax in the revised document.

> > {(containers | unions)}
>
> The descriptive text doesn't say anything about unions.
>

Unions are composed of containers, but with the property that the containers overlap with one another. You could think of them as C/C++ unions.

> The other thing I don't see in the grammar is any way of encoding
> variable-length arrays with the length field embedded as a field. This
> means that layouts cannot describe embedded strings or other repeating
> data, which is common (ASN.1, protocol buffers.)
>

Yes, this was purposefully left out. There are some concerns regarding security, but it is an interesting feature and it is something we need to address.
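
For illustration only, here is a minimal sketch of what reading such a length-prefixed string looks like when done by hand with java.nio.ByteBuffer, outside of any layout descriptor. One plausible reading of the security concern is that the length comes from the data itself, so it has to be checked against the memory that is actually accessible. The class name LengthPrefixedString, the little-endian 2-byte length, and the 1-byte ASCII characters are assumptions made for this example; none of them come from the LDL document or the prototype.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the LDL proposal or the prototype.
final class LengthPrefixedString {

    // Reads a string stored as: unsigned 16-bit length, then 'length' bytes.
    static String read(ByteBuffer memory, int offset) {
        // Work on a duplicate so the caller's position and byte order are untouched.
        // The endianness is stated explicitly (assumed little-endian here).
        ByteBuffer view = memory.duplicate().order(ByteOrder.LITTLE_ENDIAN);

        // getShort(int) throws IndexOutOfBoundsException if the length field
        // itself is not fully inside the buffer.
        int length = view.getShort(offset) & 0xFFFF;
        int start = offset + 2;

        // The length is data-dependent, so validate it before touching memory.
        if (length > view.limit() - start) {
            throw new IndexOutOfBoundsException(
                "declared length " + length + " exceeds accessible memory");
        }

        byte[] chars = new byte[length];
        view.position(start);
        view.get(chars);
        return new String(chars, StandardCharsets.US_ASCII);
    }
}
```

The bounds check is the part a fixed-size layout gets for free: when every accessible byte is stated in the descriptor, there is no data-dependent length to validate.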
We cannot encode all possible access patterns in a description of a memory layout; there will be many features that people want, but we will not be able to support all of them. We see that there is great interest in this feature, so it is worth including in our future discussions. However, we need to provide a mechanism that allows one to attach user defined behaviour to a generated Layout so that they can implement their own access patterns.

We plan on updating the "State of the LDL" and should have the next version out shortly.

Thanks
-Tobi

From angela_lin at ca.ibm.com Tue May 5 15:38:46 2015
From: angela_lin at ca.ibm.com (Angela Lin)
Date: Tue, 5 May 2015 11:38:46 -0400
Subject: Layout runtime interfaces
In-Reply-To:
References:
Message-ID:

To kick off a discussion about the runtime underpinnings of layouts, I'm going to describe some of the external interfaces of our functional prototype. This represents our attempt to interpret the whiteboard discussions about Layouts in a more concrete way. In particular, it illustrates the concepts of Layout stub interfaces, generated accessors, and Locations.

We've made a lot of assumptions about the design based on both our internal and external discussions. Do these assumptions reflect your current thinking as well?

In earlier discussions, there was interest in providing both immutable and mutable versions of each layout. In our local discussions, we leaned towards keeping mutability out of the type system and using the object state (isMutable style flags) to determine if a set operation should be permitted. In this approach, the mutability can be left to the programmer.

Given our experience with PackedObjects, we're keen to allow Layouts to modify both on- and off-heap memory. A Layout backed by a byte array (or other on-heap structure) should work just as well as native memory.
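
A minimal sketch of the two ideas above: mutability held as object state rather than in the type system, and one accessor that works the same over on-heap and off-heap memory. The names SketchLocation and SketchPoint, the fixed field offsets, and the big-endian choice are assumptions for illustration only; this is not the com.ibm.layout API from the prototype.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical stand-in for a Location: a memory range plus a mutability flag.
final class SketchLocation {
    private final ByteBuffer memory;
    private final boolean mutable; // object state, not a separate type

    private SketchLocation(ByteBuffer memory, boolean mutable) {
        this.memory = memory.order(ByteOrder.BIG_ENDIAN); // endian stated explicitly
        this.mutable = mutable;
    }

    // Backed by a Java byte array (on-heap).
    static SketchLocation onHeap(int size, boolean mutable) {
        return new SketchLocation(ByteBuffer.wrap(new byte[size]), mutable);
    }

    // Backed by native memory (off-heap).
    static SketchLocation offHeap(int size, boolean mutable) {
        return new SketchLocation(ByteBuffer.allocateDirect(size), mutable);
    }

    // Same memory, but set operations are refused.
    SketchLocation asReadOnly() {
        return new SketchLocation(memory.duplicate(), false);
    }

    int getInt(int offset) {
        return memory.getInt(offset);
    }

    void putInt(int offset, int value) {
        if (!mutable) {
            throw new UnsupportedOperationException("location is not mutable");
        }
        memory.putInt(offset, value);
    }
}

// Hypothetical hand-written accessor for a layout with two 4-byte fields, x and y.
final class SketchPoint {
    private SketchLocation loc;

    static long sizeof() { return 8; }

    void bindLocation(SketchLocation loc) { this.loc = loc; }

    int x() { return loc.getInt(0); }
    void x(int value) { loc.putInt(0, value); }

    int y() { return loc.getInt(4); }
    void y(int value) { loc.putInt(4, value); }
}
```

The accessor never needs to know which kind of memory it is bound to, and a set operation on a read-only Location fails at run time instead of being ruled out by the type.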
One detail that still needs discussion is whether / how to allow Layouts to access structured on-heap memory that contains Object references. There may be dragons lurking here, but we've seen a lot of interest in this piece with Packed.

Class Hierarchy

com.ibm.layout.Location - Encapsulates the memory range to be accessed using a layout, and security checks (TBD) associated with it
com.ibm.layout.LayoutType
com.ibm.layout.Layout - singleton layout
com.ibm.layout.Array1D - 1D array layout
com.ibm.layout.Array2D - 2D array layout

Layout, Array1D, Array2D are templates that provide APIs for data access patterns. The JDK would provide a tool that takes a layout descriptor as input, and generates a Java interface that extends one of these templates. I'll call the generated interface a "Layout stub interface", or just "stub interface". The stub interface must be generated before a layout can be used by a Java application. The stub interface defines methods for accessing fields of the structured data by name. Our prototype allows users to extend the stub interface to add their own behaviour.

In the context of interfacing Java with native libraries, layout descriptors would be metadata associated with native library binaries.

com.ibm.layout.ByteArray1D - examples of primitive array layouts, which would be obsoleted by Valhalla support for generics over primitives
com.ibm.layout.LongArray1D
com.ibm.layout.LongArray2D

Annotation Type Hierarchy

com.ibm.layout.LayoutDesc - Annotation for attaching a layout descriptor to a stub interface. We've obsoleted this idea, but the prototype hasn't yet been updated to reflect this. I've left in this reference because it was an interesting idea.

Usage

// "Point" is a stub interface, generated from a layout descriptor.
// getLayout() invokes a bytecode generator that implements the interface,
// using info from the layout descriptor. The implementation is the
// "accessor class", which implements access to named fields of the layout.
// The Point instance is only an accessor. It is not inherently attached to
// a particular data location.
Point p = Point.getLayout(Point.class);

// Allocate some memory (actual API TBD)
Location loc = new Location(new byte[(int)p.sizeof()]);

// Attach the Point accessor to the memory
p.bindLocation(loc);

// Modify the memory
p.x(10);
p.y(20);

For more specific API ideas, I have attached javadoc extracted from our prototype. It's very much a work-in-progress, so there are some obvious omissions and problems:
- no security model
- we haven't properly hidden private data
- was based on Java 7, so used abstract classes instead of interfaces so that we could provide default method implementations. We happened to use abstract classes for expediency; we aren't trying to dictate the choice of one over the other.

Also note that the prototype predates the latest revision of the LDL.

- Angela

(See attached file: PanamaLayoutPrototypeV1.zip)

From atobia at ca.ibm.com Mon May 11 20:47:27 2015
From: atobia at ca.ibm.com (Tobi Ajila)
Date: Mon, 11 May 2015 16:47:27 -0400
Subject: Making native calls from the JVM
Message-ID:

Hi,

I'd like to start off discussions on native function calls from the JVM. We've read (a slightly redacted version of) John Rose's paper, "Making native calls from the JVM". Using MethodHandles as the capability to call native functions is a great idea.

While I like the MHs.nativeInvoker / Lookup.findNativeAddress APIs, we should think long and hard before exposing the native entrypoints as raw longs. Apart from the obvious security issues touched on in the paper, there are issues if the native library gets unloaded while a user still has a long representing a function in that library. The address either needs to be embedded in the MH (which means no nativeInvoker API) or the address needs to be wrapped in some kind of 'NativeFunction' Object so that library unloading can invalidate the MH or NativeFunction.

Can you post an initial version (javadoc?)
of the raw API? It would be valuable to have both JVMs prototyping the same API.

The document mentions an "options" string that describes arguments for native calls. Is this an implementation detail that describes how to handle the wrapping / unwrapping of arguments and other calling sequence details? If this is something that describes the types, it seems to overlap with the LDL discussions. Can you provide some additional details on the "options" string and its purpose?

Our current mental model is that the MHs would provide a better way to call native functions while the LDL would describe the native arguments / return types. There are a lot of details in how the two pieces interact that still need to be worked out, but this approach sounds promising.

Thanks
-Tobi

From atobia at ca.ibm.com Fri May 22 20:43:02 2015
From: atobia at ca.ibm.com (Tobi Ajila)
Date: Fri, 22 May 2015 16:43:02 -0400
Subject: State of the LDL
Message-ID:

Hi

An updated version of the State of the LDL document can be viewed here: http://danheidinga.github.io/J9-Panama/StateOfTheLDL.html. It attempts to address the concerns about how containers were defined and provides further info on access types. The grammar hasn't been touched (yet) and continues to be an ugly strawman for discussing examples.

Regards,
--Tobi