From atobia at ca.ibm.com  Fri May  1 15:21:32 2015
From: atobia at ca.ibm.com (Tobi Ajila)
Date: Fri, 1 May 2015 11:21:32 -0400
Subject: State of the LDL
In-Reply-To: <553AB539.3000108@oracle.com>
References: <OF1E2419BC.708D3A9C-ON85257E31.005E876C-85257E31.005EB106@ca.ibm.com>
	<553AB539.3000108@oracle.com>
Message-ID: <OFEC2BFEB2.FC5B18DA-ON85257E38.0053BAC0-85257E38.00545DA7@ca.ibm.com>


> As an approach, I'd like to suggest that we separate the semantic
> aspects of the layout language from the proposal for a specific
> encoding; it will be better (and easier) to come to agreement on the
> abstract model and its semantics before trying to propose an encoding.
> History has shown that the latter can often get in the way of paying
> enough attention to the former.
>
I think separating the semantic aspects from the encoding is a good idea.
We already have some kind of distinction: the goals (with the exception of #4)
describe semantic aspects of the LDL, and the grammar defines the syntax.
This will be clearer in the next version of the document.

> Separately, it would also be good to separate the concepts (Layout,
> Location, etc) from the implementation strategy (abstract classes.)
>
Agreed, these are two separate discussions. My intention there was to put
out a straw man. We chose to implement our prototype with abstract classes
but we fully expect some debate in this area. We would prefer to discuss
Layout concepts and implementation strategies in a separate thread, but
there is some unavoidable overlap.

> First, some comments on the goals:
>
> > 2. The LD must specify the endianness of the layout. The bit and byte
> > endian must be consistent. Endian is specified at container granularity. A
> > shorthand notation can be provided to specify endian for all containers in
> > a layout.
>
> This "same for bits and bytes" restrictions seems like it would prohibit
> encodings of sequences of bytes encoded in machine-endianness, such as
> variable-length strings encoded with a length field.
>

One of the goals has been to allow portable bytecodes for protocols with a
well-defined endianness (e.g. network packets) while limiting the amount of
ugliness that has to be added to sun.misc.Unsafe in terms of
endian-specific read/write intrinsics. Not supporting native endian
shrinks the set of signatures that need to be added from
{big, little, native} * {primitive type sizes} to
{big, little} * {primitive type sizes}.

We've also viewed the existing native data as the "source of
truth" {offsets, endian, etc} that Java needs to interop with. Given that
explicit endianness is required, we didn't initially see enough value in
native endian. Recently, we've started to come around on native endian for
cases where Java wants to serialize data off-heap and read it back in the
same process without there being some other consumer / user of that data.
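To make the two cases concrete, here is a minimal sketch that uses java.nio.ByteBuffer as a stand-in for the proposed endian-specific intrinsics (which don't exist yet). A protocol field has a fixed byte order that the reader must state explicitly, while data written and read back by the same process can simply use the platform's native order. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration: ByteBuffer stands in for endian-specific
// read/write intrinsics.
final class EndianDemo {
    // A protocol field (e.g. in a network packet header) has a fixed byte
    // order, so the reader must state it explicitly.
    static short readShort(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getShort(0);
    }

    // Data that Java writes off-heap and reads back in the same process has
    // no other consumer, so native order is sufficient: the round trip is
    // correct on any platform.
    static long roundTripNative(long value) {
        ByteBuffer buf = ByteBuffer.allocateDirect(Long.BYTES).order(ByteOrder.nativeOrder());
        buf.putLong(0, value);
        return buf.getLong(0);
    }
}
```

The same two bytes {0x12, 0x34} read as 0x1234 with BIG_ENDIAN and 0x3412 with LITTLE_ENDIAN, which is why protocol data needs an explicit order while private off-heap data does not.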

That being said, I don't fully understand what this has to do with variable-length
strings or how our endianness specification prohibits variable-length
strings (more on this later). The LDL format allows us to make a
distinction between the length field and the sequence of bytes, e.g.
LD:
VLS, 10, < {
	short, 2, length, //we can specify endian for length and characters separately
	char, 1[8], characters,
}

We do, however, have a restriction that all accessible memory must be
explicitly stated in the LD. This is the only restriction that would
prohibit variable-length strings.
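As a rough illustration of the VLS layout above, here is a hand-written accessor (hypothetical, not generated by any tool) for its 10 bytes: a 2-byte length at offset 0 and 8 one-byte characters at offset 2. It assumes the length field's byte order is passed in, since the strawman grammar's `<` marker isn't pinned down here. Because only the 8 declared bytes are accessible, the stored length is clamped rather than trusted.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical accessor for the VLS layout:
//   offset 0: short length      (2 bytes, explicit byte order)
//   offset 2: char[8] characters (1 byte each, order-independent)
final class VLS {
    static final int SIZE = 10;
    static final int LENGTH_OFFSET = 0;
    static final int CHARS_OFFSET = 2;
    static final int CHARS_CAPACITY = 8;

    private final ByteBuffer mem;

    VLS(ByteBuffer mem, ByteOrder lengthOrder) {
        // The duplicate shares the underlying bytes but carries its own order.
        this.mem = mem.duplicate().order(lengthOrder);
    }

    short length() {
        return mem.getShort(LENGTH_OFFSET);
    }

    // Only the 8 bytes declared in the LD may be read, so the length field
    // is clamped to that capacity.
    String characters() {
        int n = Math.max(0, Math.min(length(), CHARS_CAPACITY));
        byte[] out = new byte[n];
        for (int i = 0; i < n; i++) {
            out[i] = mem.get(CHARS_OFFSET + i);
        }
        return new String(out, StandardCharsets.US_ASCII);
    }
}
```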

> Also, this is the first use of "container", which should be defined
> before first use.
>
> > 5. A container is a sequence of one or more adjacent fields.
>
> It seems we've defined fields and containers in terms of each other.  At
> this point, an unfamiliar reader will not have a real understanding of
> either, or why there are two separate concepts.  This should be
> clarified.  It would help to make the motivation for this two-level
> hierarchy more explicit.
>
In our previous discussions it became clear that we need to define rules
regarding atomicity and tearing. The proposed memory model is inspired by
the C++ memory model (http://www.hboehm.info/c++mm/): our concepts of
"container" and "field" are analogous to the C++ "memory location" and
"bit-field". The LDL requires this distinction because it is important
to be able to provide behaviour equivalent to native languages: if a field
access in C has a certain behaviour, I should be able to get equivalent
behaviour in Java.
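A small sketch of why the two levels matter: below, a 32-bit "container" holds two bit "fields", much like adjacent C bit-fields sharing one memory location. Writing either field means rewriting the whole container, so this sketch uses a compare-and-set loop to keep an update to one field from clobbering a concurrent update to its neighbour. This is one possible approach for illustration only, not the proposed memory model; the class name and bit positions are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical 32-bit container with two fields:
//   bits 0..3  : flags (4 bits)
//   bits 4..15 : id    (12 bits)
final class PackedContainer {
    private final AtomicInteger container = new AtomicInteger();

    private static int extract(int word, int shift, int width) {
        return (word >>> shift) & ((1 << width) - 1);
    }

    // A field write is a read-modify-write of the whole container. The CAS
    // retries if another thread changed the container in between.
    private void insert(int shift, int width, int value) {
        int mask = ((1 << width) - 1) << shift;
        int prev, next;
        do {
            prev = container.get();
            next = (prev & ~mask) | ((value << shift) & mask);
        } while (!container.compareAndSet(prev, next));
    }

    int flags()       { return extract(container.get(), 0, 4); }
    int id()          { return extract(container.get(), 4, 12); }
    void flags(int v) { insert(0, 4, v); }
    void id(int v)    { insert(4, 12, v); }
    int raw()         { return container.get(); }
}
```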

> > 6. Default alignment is the size of the largest container in the layout
> > rounded up to 2^n bits. In the case of arrays the container element size is
> > considered.
>
> It feels like arrays are tacked on as an ancillary concern.  I can't
> imagine that this is true?
>
Ideally #6 would be something like this: the default alignment of a Layout
is the largest alignment of all the containers and unions that compose the
Layout. But our proposal does not allow for container alignments, since
containers are defined in terms of size and offset, so we need to define
the default alignment in terms of container sizes. This definition works
well, but it may be confusing for arrays (does the whole array count, or a
single element?), so we felt it necessary to call them out specifically.
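Goal #6 as written boils down to a small computation. This is a minimal sketch assuming container sizes are given in bits and that an array container contributes its element size rather than its total size; the class and method names are hypothetical.

```java
// Hypothetical helper implementing goal #6: default alignment is the largest
// container size, rounded up to a power of two (in bits). For an array
// container, callers pass the element size, not the whole array size.
final class DefaultAlignment {
    // Smallest power of two >= bits, for bits > 0.
    static int roundUpToPowerOfTwo(int bits) {
        int highest = Integer.highestOneBit(bits);
        return highest == bits ? bits : highest << 1;
    }

    static int defaultAlignmentBits(int... containerBits) {
        int max = 0;
        for (int b : containerBits) {
            max = Math.max(max, b);
        }
        return roundUpToPowerOfTwo(max);
    }
}
```

For example, a layout made of a 16-bit length and an array of 8-bit characters gets a 16-bit default alignment; counting the whole 64-bit array instead would give 64, which is the ambiguity the array rule resolves.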

> > Type Information Specification:
> > The following describes how native data is associated with Java Types.
> > First we will begin by defining the Base Layout Classes.
> >
> > //Base Layout class, all Layouts subclass this
> > abstract class Layout {
> >    private Location loc;
> > }
>
> Before diving into implementation, it would be useful to motivate these
> two key concepts, Layout and Location.
>
For the purposes of this document we need to mention the existence of a
type called Layout. The specifics and motivation for this type can be
discussed in another thread.

> > 2) Pointer
>
> Pointer or Object Reference?
>
This class represents a native pointer. It lets us take a native field and
get what it points at. A good example is a linked list node.

struct Node {
	uint64_t data;
	struct Node* next;
};

The "dereference" method in Pointer lets me get the next node.
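Here is a rough sketch of how a Pointer could walk that list. A ByteBuffer stands in for native memory and a byte offset stands in for a native address; offset 0 is never allocated so it can play the role of NULL. Both classes are hypothetical and only mirror the shape of `struct Node` on a 64-bit platform (8-byte data, 8-byte pointer).

```java
import java.nio.ByteBuffer;

// Hypothetical accessor for struct Node { uint64_t data; struct Node* next; }
final class Node {
    static final int SIZE = 16;
    static final int DATA_OFFSET = 0;
    static final int NEXT_OFFSET = 8;

    private final ByteBuffer mem;
    private final int address;

    Node(ByteBuffer mem, int address) {
        this.mem = mem;
        this.address = address;
    }

    long data() {
        return mem.getLong(address + DATA_OFFSET);
    }

    // The 'next' field, wrapped as a Pointer to another Node.
    Pointer next() {
        return new Pointer(mem, mem.getLong(address + NEXT_OFFSET));
    }
}

// Hypothetical Pointer: holds an address and can be dereferenced to a Node.
final class Pointer {
    private final ByteBuffer mem;
    private final long address;

    Pointer(ByteBuffer mem, long address) {
        this.mem = mem;
        this.address = address;
    }

    boolean isNull() {
        return address == 0;
    }

    Node dereference() {
        if (isNull()) {
            throw new NullPointerException("null native pointer");
        }
        return new Node(mem, Math.toIntExact(address));
    }
}
```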

> > 5) Primitive Arrays
>
> Valhalla will provide the ability to have generics over primitives; I
> think this means that you can merge (5) and (6) into "Array of T", and
> provide base types for each primitive layout.  This should simplify
> things a fair bit.
>
Agreed.

> > Grammar:
>
> To be honest, I am kind of mystified at the design choices for the
> grammar; it seems to be chosen to be both hard to mechanically parse
> *and* hard for humans to read!  I don't want to dwell on bikeshed issues
> like this, so I'll just say that this is definitely something that we're
> going to need to revisit before too much implementation happens.
>
> Perhaps we should take a step back:
>   - Define an abstract model for the layout language, separate from syntax;
>   - Identify some design goals to describe the properties of a desirable
> syntax.
>
Yes, the grammar is a strawman, we should have made that more clear. We are
not committed to it, but it lets us create examples to discuss. We will
make a distinction between the semantic elements and the syntax in the
revised document.

> >    {(containers | unions)}
>
> The descriptive text doesn't say anything about unions.
>
Unions are composed of containers, with the property that those containers
overlap one another. You can think of them as C/C++ unions.
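A minimal sketch of the overlap, assuming a C-style `union { int32_t i; float f; }` where both members start at offset 0 and share the same 4 bytes; the accessor class is hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical accessor for union { int32_t i; float f; }: both members
// occupy the same 4 bytes, so writing one changes what the other reads.
final class IntFloatUnion {
    static final int SIZE = 4; // size of the largest member
    private final ByteBuffer mem = ByteBuffer.allocate(SIZE);

    int i()          { return mem.getInt(0); }
    void i(int v)    { mem.putInt(0, v); }
    float f()        { return mem.getFloat(0); }
    void f(float v)  { mem.putFloat(0, v); }
}
```

Writing `1.0f` through `f` and reading through `i` yields `0x3F800000`, the IEEE 754 bit pattern of 1.0.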

> The other thing I don't see in the grammar is any way of encoding
> variable-length arrays with the length field embedded as a field.  This
> means that layouts cannot describe embedded strings or other repeating
> data, which is common (ASN.1, protocol buffers.)
>
Yes, this was purposefully left out. There are some security concerns, but
it is an interesting feature and something we need to address. We cannot
encode all possible access patterns in a description of a memory layout;
people will want many features that we will not be able to support. We see
that there is great interest in this particular feature, so it is worth
including in our future discussions. More generally, we need to provide a
mechanism that allows one to attach user-defined behaviour to a generated
Layout so that users can implement their own access patterns.

We plan on updating the "State of the LDL" and should have the next version
out shortly.

Thanks
-Tobi

From angela_lin at ca.ibm.com  Tue May  5 15:38:46 2015
From: angela_lin at ca.ibm.com (Angela Lin)
Date: Tue, 5 May 2015 11:38:46 -0400
Subject: Layout runtime interfaces
In-Reply-To: <OF91FD3830.92E6380C-ON85257E26.0006942C-85257E26.004C10A4@ca.ibm.com>
References: <OF91FD3830.92E6380C-ON85257E26.0006942C-85257E26.004C10A4@ca.ibm.com>
Message-ID: <OFE58AE1D0.2415EAF2-ON85257E3C.0053F0EA-85257E3C.0055F27C@ca.ibm.com>


To kick off a discussion about the runtime underpinnings of layouts, I'm
going to describe some of the external interfaces of our functional
prototype. This represents our attempt to interpret the whiteboard
discussions about Layouts in a more concrete way. In particular, it
illustrates the concepts of Layout stub interfaces, generated accessors,
and Locations.

We've made a lot of assumptions about the design based on both our internal
and external discussions.  Do these assumptions reflect your current
thinking as well?

In earlier discussions, there was interest in providing both immutable and
mutable versions of each layout. In our local discussions, we leaned
towards keeping mutability out of the type system and using the object
state (isMutable-style flags) to determine whether a set operation should be
permitted. In this approach, the mutability can be left to the programmer.
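A minimal sketch of the "mutability as object state" approach: a single accessor type whose setters check a flag at run time, plus a read-only view over the same memory. The class and method names are hypothetical, not the prototype's API.

```java
import java.nio.ByteBuffer;

// Hypothetical accessor: one type, with mutability tracked as object state
// rather than as separate mutable/immutable types.
final class PointAccessor {
    private final ByteBuffer mem;
    private final boolean mutable;

    PointAccessor(ByteBuffer mem, boolean mutable) {
        this.mem = mem;
        this.mutable = mutable;
    }

    boolean isMutable() {
        return mutable;
    }

    int x() {
        return mem.getInt(0);
    }

    void x(int v) {
        // The set operation is permitted or rejected based on the flag.
        if (!mutable) {
            throw new UnsupportedOperationException("read-only layout");
        }
        mem.putInt(0, v);
    }

    // A read-only accessor over the same memory.
    PointAccessor asReadOnly() {
        return new PointAccessor(mem, false);
    }
}
```

The trade-off is that misuse is caught at run time rather than at compile time, in exchange for not doubling the number of generated types.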

Given our experience with PackedObjects, we're keen to be able to allow
Layouts to modify both on- and off-heap memory.  A Layout backed by a byte
array (or other on-heap structure) should work just as well as native
memory.
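As a rough sketch of what "backed by a byte array or native memory" could look like, the hypothetical `MemoryRegion` below hides whether its bytes live in a Java `byte[]` or in a direct (off-heap) ByteBuffer, and the same accessor code works on either. This is only an illustration with java.nio; it is not the prototype's `Location` class.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical memory region: on-heap (wrapping a byte[]) or off-heap
// (a direct buffer). Accessors see the same ByteBuffer interface either way.
final class MemoryRegion {
    final ByteBuffer buf;

    private MemoryRegion(ByteBuffer buf) {
        this.buf = buf.order(ByteOrder.nativeOrder());
    }

    static MemoryRegion onHeap(byte[] array) {
        return new MemoryRegion(ByteBuffer.wrap(array));
    }

    static MemoryRegion offHeap(int size) {
        return new MemoryRegion(ByteBuffer.allocateDirect(size));
    }

    boolean isOffHeap() {
        return buf.isDirect();
    }
}

// Accessor logic for a hypothetical { int32 a @0; int32 b @4 } layout that
// does not care where the memory lives.
final class IntPair {
    static void set(MemoryRegion r, int a, int b) {
        r.buf.putInt(0, a).putInt(4, b);
    }

    static int sum(MemoryRegion r) {
        return r.buf.getInt(0) + r.buf.getInt(4);
    }
}
```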

One detail that still needs discussion is whether / how to allow Layouts to
access structured on-heap memory that contains Object references.  There
may be dragons lurking here, but we've seen a lot of interest in this piece
with Packed.

Class Hierarchy
        com.ibm.layout.Location - Encapsulates the memory range to be accessed using a layout, and security checks (TBD) associated with it

        com.ibm.layout.LayoutType
            com.ibm.layout.Layout  - singleton layout
            com.ibm.layout.Array1D<T>  - 1D array layout
            com.ibm.layout.Array2D<T>  - 2D array layout

Layout, Array1D<T>, Array2D<T> are templates that provide APIs for data
access patterns. The JDK would provide a tool that takes a layout
descriptor as input, and generates a Java interface that extends one of
these templates. I'll call the generated interface a "Layout stub
interface", or just "stub interface". The stub interface must be generated
before a layout can be used by a Java application.

The stub interface defines methods for accessing fields of the structured
data by name. Our prototype allows users to extend the stub interface to
add their own behaviour.
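To illustrate the roles, here is a hand-written sketch: `PointStub` plays the part of a generated stub interface, `UserPoint` is a user extension adding a default method, and `UserPointAccessor` stands in for the accessor class a bytecode generator would emit (assumed here to implement the user's extended interface). A ByteBuffer stands in for a Location. All names are hypothetical.

```java
import java.nio.ByteBuffer;

// Stand-in for a generated stub interface for { int32 x @0; int32 y @4 }.
interface PointStub {
    long sizeof();
    void bindLocation(ByteBuffer location);
    int x();
    void x(int v);
    int y();
    void y(int v);
}

// A user extends the stub interface to add their own behaviour.
interface UserPoint extends PointStub {
    default long distanceSquared() {
        long dx = x(), dy = y();
        return dx * dx + dy * dy;
    }
}

// Stand-in for the generated accessor class, written by hand here.
final class UserPointAccessor implements UserPoint {
    private ByteBuffer loc;

    public long sizeof()                        { return 8; }
    public void bindLocation(ByteBuffer location) { this.loc = location; }
    public int x()                              { return loc.getInt(0); }
    public void x(int v)                        { loc.putInt(0, v); }
    public int y()                              { return loc.getInt(4); }
    public void y(int v)                        { loc.putInt(4, v); }
}
```

Since the accessor is not tied to one memory location, rebinding it to a different buffer makes it read that buffer's data instead.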

In the context of interfacing Java with native libraries, layout
descriptors would be metadata associated with native library binaries.

            com.ibm.layout.ByteArray1D  - examples of primitive array layouts, which would be obsoleted by Valhalla support for generics over primitives
            com.ibm.layout.LongArray1D
            com.ibm.layout.LongArray2D

Annotation Type Hierarchy
    com.ibm.layout.LayoutDesc - Annotation for attaching a layout descriptor to a stub interface. We've obsoleted this idea, but the prototype hasn't yet been updated to reflect this. I've left in this reference because it was an interesting idea.

Usage

// "Point" is a stub interface, generated from a layout descriptor.
// getLayout() invokes a bytecode generator that implements the interface,
// using info from the layout descriptor. The implementation is the "accessor
// class", which implements access to named fields of the layout.
// The Point instance is only an accessor. It is not inherently attached to
// a particular data location.
Point p = Point.getLayout(Point.class);

// Allocate some memory (actual API TBD)
Location loc = new Location(new byte[(int)p.sizeof()]);

// Attach the Point accessor to the memory
p.bindLocation(loc);

// Modify the memory
p.x(10);
p.y(20);


For more specific API ideas, I have attached javadoc extracted from our
prototype. It's very much a work-in-progress, so there are some obvious
omissions and problems:
- no security model
- we haven't properly hidden private data
- was based on Java 7, so used abstract classes instead of interfaces so
that we could provide default method implementations. We happened to use
abstract classes for expediency; we aren't trying to dictate the choice of
one over the other.

Also note that the prototype predates the latest revision of the LDL.

- Angela

(See attached file: PanamaLayoutPrototypeV1.zip)

From atobia at ca.ibm.com  Mon May 11 20:47:27 2015
From: atobia at ca.ibm.com (Tobi Ajila)
Date: Mon, 11 May 2015 16:47:27 -0400
Subject: Making native calls from the JVM
Message-ID: <OF968ED0E0.43768160-ON85257E42.006CE32F-85257E42.007234F7@ca.ibm.com>


Hi,

I'd like to start off discussions on native function calls from the JVM.
We've read (a slightly redacted version of) John Rose's paper, "Making
native calls from the JVM".  Using MethodHandles as the capability to call
native functions is a great idea.

While I like the MHs.nativeInvoker / Lookup.findNativeAddress APIs, we
should think long and hard before exposing the native entrypoints as raw
longs.  Apart from the obvious security issues touched on in the paper,
there are issues if the native library gets unloaded while a user still has
a long representing a function in that library.  The address either needs
to be embedded in the MH (which means no nativeInvoker api) or the address
needs to be wrapped in some kind of 'NativeFunction' Object so that library
unloading can invalidate the MH or NativeFunction.
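One way to get that invalidation with existing java.lang.invoke machinery is a SwitchPoint: every handle handed out is guarded by it, and "unloading" flips it. The sketch below is hypothetical: `NativeFunction` is an invented name, and a plain Java method handle (e.g. for `Math.max`) stands in for a native entrypoint. After invalidation, calls fail with IllegalStateException instead of jumping to a stale address.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.SwitchPoint;

// Hypothetical NativeFunction: wraps an entrypoint so that library unloading
// can invalidate every method handle that was handed out for it.
final class NativeFunction {
    private final SwitchPoint loaded = new SwitchPoint();
    private final MethodHandle handle;

    NativeFunction(MethodHandle target) throws ReflectiveOperationException {
        MethodType type = target.type();
        // ()IllegalStateException: builds the exception to throw.
        MethodHandle newException = MethodHandles.insertArguments(
                MethodHandles.lookup().findConstructor(IllegalStateException.class,
                        MethodType.methodType(void.class, String.class)),
                0, "library has been unloaded");
        // ()R: constructs and throws the exception.
        MethodHandle thrower = MethodHandles.foldArguments(
                MethodHandles.throwException(type.returnType(), IllegalStateException.class),
                newException);
        // Same type as the target: ignores the arguments, then throws.
        MethodHandle fallback = MethodHandles.dropArguments(thrower, 0, type.parameterList());
        this.handle = loaded.guardWithTest(target, fallback);
    }

    MethodHandle handle() {
        return handle;
    }

    // Called when the owning library is unloaded.
    void invalidate() {
        SwitchPoint.invalidateAll(new SwitchPoint[] { loaded });
    }
}
```

Because the check lives inside the handle itself, user code holding an old handle cannot bypass it, which a raw long address cannot offer.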

Can you post an initial version (javadoc?) of the raw API?  It would be
valuable to have both JVMs prototyping the same API.

The document mentions an "options" string that describes arguments for
native calls.  Is this an implementation detail that describes how to
handle the wrapping / unwrapping of arguments and other calling sequence
details?  If this is something that describes the types, it seems to
overlap with the LDL discussions.  Can you provide some additional details
on the "options" string and its purpose?

Our current mental model is that the MHs would provide a better way to call
native functions while the LDL would describe the native arguments / return
types. There are a lot of details in how the two pieces interact that still
need to be worked out but this approach sounds promising.

Thanks
-Tobi

From atobia at ca.ibm.com  Fri May 22 20:43:02 2015
From: atobia at ca.ibm.com (Tobi Ajila)
Date: Fri, 22 May 2015 16:43:02 -0400
Subject: State of the LDL
Message-ID: <OF2F0A4A59.139F8D8B-ON85257E4D.0071AE6A-85257E4D.0071CCDA@ca.ibm.com>


Hi

An updated version of the State of the LDL document can be viewed here:
http://danheidinga.github.io/J9-Panama/StateOfTheLDL.html.

It attempts to address the concerns about how containers were defined and
provides further information on access types. The grammar hasn't been touched
(yet) and continues to be an ugly strawman for discussing examples.

Regards,
--Tobi