constant pool futures

Mon Jun 25 17:57:19 UTC 2018

On Jun 25, 2018, at 9:37 AM, Remi Forax <forax at univ-mlv.fr> wrote:
> 
> Hi Peter,
> you can simulate the equivalent of a Constant_Bytes by base64 encoding your bytes
> and pass the resulting string to a Constant Dynamic which at runtime will load it,
> decode it and return it as a byte array.
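For concreteness, here is a minimal Java sketch of the bootstrap side of Remi's base64 trick. The class and method names (Base64Condy, decodeBase64) are hypothetical. Only the leading (Lookup, String, Class) parameters follow the standard shape of a CONSTANT_Dynamic bootstrap method. Emitting the actual ldc needs a bytecode library and is not shown, so main simply calls the BSM the way the JVM would on first resolution.

```java
import java.lang.invoke.MethodHandles;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64Condy {
    /**
     * Hypothetical condy BSM simulating a CONSTANT_Bytes entry: the static
     * argument is a base64 string, the resolved constant is a byte[].
     */
    public static byte[] decodeBase64(MethodHandles.Lookup lookup, String name,
                                      Class<?> type, String encoded) {
        // The decoder allocates a fresh byte[] on every call: one of the
        // throwaway copies a real CONSTANT_Bytes could avoid.
        return Base64.getDecoder().decode(encoded);
    }

    public static void main(String[] args) {
        // What a class file generator would store as the CONSTANT_String operand.
        String operand = Base64.getEncoder()
                .encodeToString("hello".getBytes(StandardCharsets.US_ASCII));
        // Stand-in for the JVM resolving the constant by invoking the BSM.
        byte[] bytes = decodeBase64(MethodHandles.lookup(), "_", byte[].class, operand);
        System.out.println(new String(bytes, StandardCharsets.US_ASCII)); // hello
    }
}
```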

You can also simulate a CONSTANT_Group by using a long array of arguments
to the bootstrap method.

As of 10, the limit of 255 arguments has been lifted; the class file format can handle
up to 2^16-1 extra arguments.  The BSM has to be a varargs method of course,
since no method can get more than 255 positional arguments on the stack,
including receiver and (if present) method handle.
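A sketch of such a varargs BSM, again with a hypothetical name (GroupCondy.group). Everything after the three standard leading parameters lands in one Object[], so the group size is not tied to the number of positional argument slots. Here main just calls it directly with a 1000-element array as a stand-in for the JVM resolving a long argument list.

```java
import java.lang.invoke.MethodHandles;
import java.util.Arrays;
import java.util.List;

public class GroupCondy {
    /**
     * Hypothetical BSM simulating a CONSTANT_Group. All static arguments after
     * the three standard leading parameters are collected into one Object[].
     */
    public static List<Object> group(MethodHandles.Lookup lookup, String name,
                                     Class<?> type, Object... elements) {
        // List.of copies the array and rejects null elements; a BSM that had
        // to accept null constants would need a different container.
        return List.of(elements);
    }

    public static void main(String[] args) {
        Object[] many = new Object[1000];  // far more than 255 arguments
        Arrays.setAll(many, i -> i);       // stand-ins for resolved constants
        List<Object> g = group(MethodHandles.lookup(), "_", List.class, many);
        System.out.println(g.size() + " " + g.get(999)); // 1000 999
    }
}
```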

As of 11, the CONSTANT_Dynamic constant allows "ldc" to load a constant
computed directly by a BSM (without an intermediate indy or CallSite).  The
constant can be of any type expressible by the JVM.  Crucially, such a constant
can be an input to a BSM for a larger constant or call site; this introduces something
like expression trees into the constant pool.  For example, if you combine List.of
with ConstantBootstraps.invoke plus a list of constants, you get a constant List
of anything expressible in the constant pool (including other Lists).
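Here is a rough Java 11 sketch of that List example. ConstantBootstraps.invoke and List.of are real APIs; the ConstantListDemo class and its constantList helper are illustrative. The helper calls ConstantBootstraps.invoke directly with the arguments the JVM would pass when resolving a CONSTANT_Dynamic whose BSM is ConstantBootstraps.invoke and whose static arguments are a method handle for List.of followed by the elements. Passing one resolved list as an element of another shows the nesting.

```java
import java.lang.invoke.ConstantBootstraps;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.List;

public class ConstantListDemo {
    // Handle for the varargs overload List.of(E...). findStatic keeps its
    // variable arity, so trailing arguments are collected into an array.
    static final MethodHandle LIST_OF;
    static {
        try {
            LIST_OF = MethodHandles.lookup().findStatic(List.class, "of",
                    MethodType.methodType(List.class, Object[].class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /** Invokes ConstantBootstraps.invoke as a condy resolution would. */
    static List<?> constantList(Object... elements) {
        try {
            return (List<?>) ConstantBootstraps.invoke(
                    MethodHandles.lookup(), "_", List.class, LIST_OF, elements);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        List<?> inner = constantList(1, "two", 3.0);
        List<?> outer = constantList(inner, "x"); // a List constant inside a List constant
        System.out.println(outer); // [[1, two, 3.0], x]
    }
}
```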

That's the present.  The future of Java is shaped by the people who work hard
to create it, but even they can't predict it accurately, although they try to write
roadmaps and discuss their aspirations with the community.  (We are not hiding
secret plans or schedules!)  I can say a few things about how constant pool
features fit into the big picture, but I can't predict which releases they will land in.

The future Bytes and Group constants are envisioned as helpers for scaling complex
class files toward lower overheads and higher limits.  The existing workarounds
have certain overheads and limits that we would probably like to remove in the
longer term.  For example, a base64-encoded string has a maximum payload
length of about 0.75 * 2^16 octets.  *Also*, using base64 requires one or more
throwaway intermediate copies of the envelope and payload; this is the overhead
of using a simulation instead of direct expression of a sequence of octets.
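The 0.75 * 2^16 figure comes from two facts: a CONSTANT_Utf8 entry records its length in a u2, so it holds at most 2^16-1 bytes, and base64 spends 4 ASCII characters on every 3 payload bytes. A small Java check of the arithmetic, assuming padded output from java.util.Base64 (Base64Limit is an illustrative name):

```java
import java.util.Base64;

public class Base64Limit {
    // CONSTANT_Utf8 length is a u2; base64 characters are ASCII, so each is
    // one byte in modified UTF-8.
    static final int MAX_UTF8_BYTES = (1 << 16) - 1;

    /** Largest payload whose padded base64 encoding fits in one CONSTANT_Utf8. */
    static int maxPayload() {
        // Every complete group of 4 characters carries 3 payload bytes.
        return (MAX_UTF8_BYTES / 4) * 3;
    }

    /** Length of the padded base64 encoding of a payload of the given size. */
    static int encodedLength(int payloadBytes) {
        return Base64.getEncoder().encodeToString(new byte[payloadBytes]).length();
    }

    public static void main(String[] args) {
        int max = maxPayload();
        System.out.println(max);                    // 49149
        System.out.println(encodedLength(max));     // 65532: fits
        System.out.println(encodedLength(max + 1)); // 65536: one byte too long
    }
}
```

Dropping the padding squeezes in two more bytes (49151), which does not change the picture.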

Personally, I'd like to see more *zero copy* data structures in classfiles, just
like Linux object files support zero copy read-only data, using file mapping.
But zero copy data requires every layer of the system to support either
immutability or (at worst) lazy copy on write, and also support "views" on
the original data.  This means we have to tune certain Java APIs to avoid
statefulness and avoid certain data types.  Java arrays can't do zero copy
views for the same reasons they can't do slices.  Value types will eventually
help reduce the overheads of views, and with them the overheads of zero copy APIs.
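A small Java illustration of the array point, using only standard APIs: Arrays.copyOfRange has to copy, while a read-only ByteBuffer slice is a true view over the same storage. Because the view is read-only rather than immutable, it also observes later writes to the array, and it still carries a mutable position and limit, the kind of statefulness mentioned above. readOnlyView is a hypothetical helper name.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class ViewVsCopy {
    /** Hypothetical helper: a read-only window onto data[from, to) sharing storage. */
    static ByteBuffer readOnlyView(byte[] data, int from, int to) {
        // wrap() sets position/limit to the window; slice() rebases it at index 0.
        return ByteBuffer.wrap(data, from, to - from).slice().asReadOnlyBuffer();
    }

    public static void main(String[] args) {
        byte[] data = {10, 20, 30, 40, 50};
        byte[] copy = Arrays.copyOfRange(data, 1, 4); // new array, detached from data
        ByteBuffer view = readOnlyView(data, 1, 4);   // no bytes copied

        data[1] = 99;                    // a later write to the original array
        System.out.println(copy[0]);     // 20: the copy is stale
        System.out.println(view.get(0)); // 99: the view sees the write

        // The view cannot write, but it has a cursor: a relative get()
        // moves the position that every other user of this buffer observes.
        view.get();
        System.out.println(view.position()); // 1
    }
}
```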

The Bytes constant is envisioned as scaling to at least 2^31-1 and perhaps beyond,
and will not require decoding or copying.  As such it is a potential replacement for
resource files, as well as a carrier for short "binary string" data.  Before we can do
this in a copy-free manner, we need either an interface like CharSequence for bytes,
or else a kind of ByteBuffer which has no state and is read-only.  IMO the ByteBuffer
improvements should play out a little longer before we decide what the type of
an "ldc" of a CONSTANT_Bytes constant should be.  The requirements on this structure are
close to (but not identical with) a ByteBuffer.  My money is on a simple ByteSequence
interface which ByteBuffer (and other types) will implement for interoperability,
but the "ldc" of a Bytes constant should load a flyweight object (perhaps a value
type) which does little more than hold the virtual address of a slice of a classfile
(perhaps mapped from disk); this is probably lighter than any ByteBuffer.
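To make the ByteSequence idea concrete, here is a Java sketch in which everything is hypothetical (no such interface exists in the JDK): a minimal read-only, cursor-free interface in the spirit of CharSequence, a flyweight array-backed view that is only a reference plus bounds, and an adapter showing how an existing ByteBuffer could interoperate using only absolute reads. Since ByteBuffer cannot be retrofitted here, the adapter stands in for "ByteBuffer implements ByteSequence".

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Objects;

public class ByteSequenceDemo {
    /** Hypothetical read-only, stateless byte analogue of CharSequence. */
    interface ByteSequence {
        int length();
        byte byteAt(int index);
        ByteSequence subSequence(int start, int end);
    }

    /** Flyweight view over an array region: a reference plus bounds, no copying. */
    static final class ArrayByteSequence implements ByteSequence {
        private final byte[] array; // assumed never written after construction
        private final int offset;
        private final int length;

        ArrayByteSequence(byte[] array, int offset, int length) {
            Objects.checkFromIndexSize(offset, length, array.length);
            this.array = array;
            this.offset = offset;
            this.length = length;
        }

        public int length() { return length; }

        public byte byteAt(int index) {
            return array[offset + Objects.checkIndex(index, length)];
        }

        public ByteSequence subSequence(int start, int end) {
            Objects.checkFromToIndex(start, end, length);
            return new ArrayByteSequence(array, offset + start, end - start);
        }
    }

    /** Hypothetical adapter for an existing ByteBuffer; uses absolute reads only. */
    static final class BufferByteSequence implements ByteSequence {
        private final ByteBuffer buf;

        BufferByteSequence(ByteBuffer source) {
            // slice() freezes the caller's current window, so later cursor moves
            // on 'source' do not affect this view; the contents stay shared.
            this.buf = source.slice().asReadOnlyBuffer();
        }

        public int length() { return buf.capacity(); }

        public byte byteAt(int index) { return buf.get(index); } // absolute: no cursor change

        public ByteSequence subSequence(int start, int end) {
            Objects.checkFromToIndex(start, end, length());
            ByteBuffer window = buf.duplicate(); // private cursor, shared contents
            window.position(start);
            window.limit(end);
            return new BufferByteSequence(window);
        }
    }

    /** Decodes a sequence as ASCII, for display. */
    static String ascii(ByteSequence s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) sb.append((char) s.byteAt(i));
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = "constant pool".getBytes(StandardCharsets.US_ASCII);
        ByteSequence a = new ArrayByteSequence(data, 0, data.length).subSequence(9, 13);
        ByteSequence b = new BufferByteSequence(ByteBuffer.wrap(data)).subSequence(9, 13);
        System.out.println(ascii(a) + " " + ascii(b)); // pool pool
    }
}
```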

Likewise, the simulation of groups using BSM arguments has a maximum payload
length of 2^16-1, and burns a constant pool index for each (distinct) BSM argument.
Since there are only 2^16-1 possible CP entries, this is a serious cost.  The Group
format is envisioned as expressing a range of CP entries which do *not* consume
global CP entry indexes.  Uses of Group constants which don't have a large number of
resolved constants might be better coded as serialized bundles of bits, wrapped
in a CONSTANT_Bytes envelope and deserialized from an ad hoc encoding.
You can see how the simulation overheads and limits stack up if, lacking
CONSTANT_Bytes, you then have to carry such a bundle in a base64 string.
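As an illustration of such an ad hoc encoding, here is a Java sketch using DataOutputStream and DataInputStream. The format (an int count followed by writeUTF strings) and the BundleCodec name are hypothetical. The whole bundle would occupy one constant pool entry instead of one per element, but every load pays for decoding into fresh objects, and the serialized bytes cannot refer to other resolved constants.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class BundleCodec {
    /** Hypothetical format: an int count, then each string in modified UTF-8. */
    static byte[] encode(List<String> items) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(items.size());
            for (String s : items) {
                out.writeUTF(s); // throws UTFDataFormatException past 65535 encoded bytes
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Decodes a bundle back into a fresh list (the per-load copy). */
    static List<String> decode(byte[] bundle) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bundle))) {
            int n = in.readInt();
            List<String> items = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                items.add(in.readUTF());
            }
            return items;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bundle = encode(List.of("java/lang/Object", "toString", "()Ljava/lang/String;"));
        System.out.println(bundle.length + " bytes -> " + decode(bundle));
    }
}
```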

It's not obvious at first glance, but if you think about it, a long group of constants
likely has a scalability requirement: the constants must not be resolved eagerly
for their bootstrap method.  I.e., the BSM should somehow be able to control
the sequencing of constant resolution, including even deferring some constants
to be resolved *after* the BSM returns (at some future point when the BSM
logic needs the resolved constant).  This requires another bit of VM functionality,
the BootstrapCallInfo API, which allows a BSM to fully control resolution.
This is still a WIP, although a (non-public) version is in the sources.
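The BootstrapCallInfo API itself is not public, so the following is not that API. It is a hypothetical Java sketch of the behavior described: the BSM receives a group of unresolved entries (modeled here as Suppliers), returns immediately, and each entry is resolved at most once, only when the returned logic first needs it. LazyGroup, group and bootstrap are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntFunction;
import java.util.function.Supplier;

public class LazyGroupDemo {
    /** Hypothetical group of constants resolved on demand, at most once each. */
    static final class LazyGroup {
        private final List<Supplier<?>> resolvers; // one unresolved entry per element
        private final Object[] values;
        private final boolean[] resolved;

        LazyGroup(List<Supplier<?>> resolvers) {
            this.resolvers = List.copyOf(resolvers);
            this.values = new Object[this.resolvers.size()];
            this.resolved = new boolean[this.resolvers.size()];
        }

        int size() { return resolvers.size(); }

        /** Resolves entry i on first use and caches the result. */
        synchronized Object get(int i) {
            if (!resolved[i]) {
                values[i] = resolvers.get(i).get();
                resolved[i] = true;
            }
            return values[i];
        }
    }

    /** Builds n entries; each resolution increments 'resolutions'. */
    static LazyGroup group(int n, AtomicInteger resolutions) {
        List<Supplier<?>> rs = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            int k = i;
            rs.add(() -> {
                resolutions.incrementAndGet();
                return "constant#" + k;
            });
        }
        return new LazyGroup(rs);
    }

    /** Hypothetical BSM: returns its logic without resolving anything. */
    static IntFunction<Object> bootstrap(LazyGroup constants) {
        return constants::get;
    }

    public static void main(String[] args) {
        AtomicInteger resolutions = new AtomicInteger();
        IntFunction<Object> table = bootstrap(group(1000, resolutions));
        System.out.println(resolutions.get()); // 0: nothing resolved when the BSM returns
        System.out.println(table.apply(42));   // constant#42
        table.apply(42);
        System.out.println(resolutions.get()); // 1: resolved once, on first use
    }
}
```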

In order for the BootstrapCallInfo API to properly express unresolved constants,
we first need to land the JVM Constants API (JEP 334).
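For what it's worth, JEP 334 later shipped in JDK 12 as the java.lang.constant package. The following Java 12+ sketch uses it to build a symbolic, unresolved description of a List constant like the one discussed earlier, inspect it, and then resolve it explicitly. NominalConstantDemo and its helpers are illustrative; the java.lang.constant calls are real API. It does not show how a BootstrapCallInfo-style API would use these types, only that an unresolved constant can be handled as a plain value.

```java
import java.lang.constant.ConstantDescs;
import java.lang.constant.DirectMethodHandleDesc;
import java.lang.constant.DynamicConstantDesc;
import java.lang.constant.MethodHandleDesc;
import java.lang.constant.MethodTypeDesc;
import java.lang.invoke.MethodHandles;
import java.util.List;

public class NominalConstantDemo {
    // Symbolic reference to the varargs List.of(E...); nothing is looked up yet.
    static final DirectMethodHandleDesc LIST_OF = MethodHandleDesc.ofMethod(
            DirectMethodHandleDesc.Kind.INTERFACE_STATIC,
            ConstantDescs.CD_List, "of",
            MethodTypeDesc.of(ConstantDescs.CD_List, ConstantDescs.CD_Object.arrayType()));

    /** Describes, without resolving, a condy: ConstantBootstraps.invoke(List.of, a, b). */
    static DynamicConstantDesc<List<Object>> listDesc(int a, String b) {
        return DynamicConstantDesc.ofNamed(ConstantDescs.BSM_INVOKE, "_",
                ConstantDescs.CD_List, LIST_OF, a, b);
    }

    /** Resolves a description on request, wrapping the checked exception. */
    static List<Object> resolve(DynamicConstantDesc<List<Object>> desc) {
        try {
            return desc.resolveConstantDesc(MethodHandles.lookup());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        DynamicConstantDesc<List<Object>> desc = listDesc(1, "two");
        // Inspecting the description does not resolve it.
        System.out.println(desc.constantType().descriptorString()); // Ljava/util/List;
        System.out.println(desc.bootstrapArgsList().size());        // 3
        // Resolution happens only when explicitly requested.
        System.out.println(resolve(desc));                          // [1, two]
    }
}
```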

Given the number of people available to work on various projects, we are working
on these features as quickly as we can.  There's a mix of sizes:  Small things like
ByteBuffer upgrades, medium like CONSTANT_Dynamic, and large like value
types.  Because such features tend to be interrelated, there's also a natural order
in which we are approaching them.  Also, the speed of development depends not
only on the number of hands working on the technology but also on the surprising
complexity of properly developing core JVM features, even supposedly simple
ones like CONSTANT_Dynamic.  Coding is a fraction of the required work; there
is also test development, integration, spec. development, several kinds of review,
and iterative polishing; all are necessary to get a result our users will enjoy,
and we'll be proud of, even a decade later.  So even the simplest JVM feature
requires a total of years of labor.

We'll talk more about this at the JVM Language Summit and Oracle Committers'
Workshop.  I hope we can continue to spread the work around more to current
partners like Intel and Red Hat.  I will be delighted to talk to conference attendees
in great detail about this stuff.

HTH
— John