constant pool futures
forax at univ-mlv.fr
Tue Jun 26 08:41:22 UTC 2018
----- Original Message -----
> From: "John Rose" <john.r.rose at oracle.com>
> To: "Remi Forax" <forax at univ-mlv.fr>
> Cc: "Peter" <jini at zeus.net.au>, "valhalla-dev" <valhalla-dev at openjdk.java.net>
> Sent: Tuesday, June 26, 2018 09:28:15
> Subject: Re: constant pool futures
> On Jun 26, 2018, at 12:03 AM, forax at univ-mlv.fr wrote:
>>
>> Hi John,
>>
>> ----- Original Message -----
>>> From: "John Rose" <john.r.rose at oracle.com>
>>> To: "Remi Forax" <forax at univ-mlv.fr>, "Peter" <jini at zeus.net.au>
>>> Cc: "valhalla-dev" <valhalla-dev at openjdk.java.net>
>>> Sent: Monday, June 25, 2018 19:57:19
>>> Subject: constant pool futures
>>
>>>> On Jun 25, 2018, at 9:37 AM, Remi Forax <forax at univ-mlv.fr> wrote:
>>>>
>>>> Hi Peter,
>>>> you can simulate the equivalent of a Constant_Bytes by base64-encoding your
>>>> bytes and passing the resulting string to a Constant Dynamic, which at runtime
>>>> will load it, decode it, and return it as a byte array.
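
A minimal sketch of the decoding side of that simulation, assuming a bootstrap method with the usual condy shape (Lookup, name, type, then the static arguments). The class name Base64Bytes and the method name decode are hypothetical, and the bytecode that declares the CONSTANT_Dynamic entry is not shown; the method is simply called directly here.

```java
import java.lang.invoke.MethodHandles;
import java.util.Base64;

// Hypothetical holder class; the name is an assumption made for illustration.
final class Base64Bytes {
    private Base64Bytes() {}

    // Bootstrap-method shape for a dynamic constant: lookup, name, type,
    // followed by one static argument (the base64 text stored as a String constant).
    static byte[] decode(MethodHandles.Lookup lookup, String name,
                         Class<?> type, String base64) {
        if (type != byte[].class) {
            throw new IllegalArgumentException("expected byte[] but got " + type);
        }
        // The decoder allocates a fresh array, so the payload is copied at least once.
        return Base64.getDecoder().decode(base64);
    }

    public static void main(String[] args) {
        String encoded = Base64.getEncoder().encodeToString(new byte[] {1, 2, 3});
        byte[] decoded = decode(MethodHandles.lookup(), "_", byte[].class, encoded);
        System.out.println(decoded.length);
    }
}
```

In a real class file, a tool such as a bytecode generator would emit the constant pool entry that points at this method; that part is outside the sketch.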
>>>
>>>
>>> You can also simulate a CONSTANT_Group by using a long array of arguments
>>> to the bootstrap method.
>>>
>>> As of 10, the limit of 255 arguments has been lifted; the class file format can
>>> handle up to 2^16-1 extra arguments. The BSM has to be a varargs method of course,
>>> since no method can get more than 255 positional arguments on the stack,
>>> including receiver and (if present) method handle.
>>>
>>> As of 11, the CONSTANT_Dynamic constant allows direct "ldc" of a constant
>>> computed directly by a BSM (without an intermediate indy or CallSite). The
>>> constant can be of any type expressible by the JVM. Crucially, such a constant
>>> can be an input to a BSM for a larger constant or call site; this introduces
>>> something like expression trees into the constant pool. For example, if you
>>> combine List.of with ConstantBootstraps.invoke plus a list of constants, you get
>>> a constant List of anything expressible in the constant pool (including other
>>> Lists).
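
The List.of combination can be tried at the Java level by calling ConstantBootstraps.invoke by hand, which is roughly what the JVM would do when resolving such a constant. This is a sketch under that assumption; the class ConstantListSketch and its listOf helper are hypothetical names, and no class file with a real CONSTANT_Dynamic entry is generated.

```java
import java.lang.invoke.ConstantBootstraps;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.List;

// Hypothetical helper class, named here only for illustration.
final class ConstantListSketch {
    private ConstantListSketch() {}

    // Calls ConstantBootstraps.invoke directly with a handle to List.of(E...),
    // imitating the resolution of a dynamic constant whose static arguments
    // are the list elements.
    static List<?> listOf(Object... elements) {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            // List.of(E...) erases to (Object[])List; the handle keeps variable arity.
            MethodHandle listOf = lookup.findStatic(
                    List.class, "of", MethodType.methodType(List.class, Object[].class));
            return (List<?>) ConstantBootstraps.invoke(
                    lookup, "_", List.class, listOf, elements);
        } catch (Throwable t) {
            throw new IllegalStateException("list constant could not be built", t);
        }
    }

    public static void main(String[] args) {
        // A nested list plays the role of a constant used as input to a larger constant.
        List<?> inner = listOf("a", "b");
        List<?> outer = listOf(1, inner);
        System.out.println(outer);
    }
}
```

The nested call mirrors the expression-tree idea: the inner list is itself a constant that feeds the bootstrap of the outer one.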
>>>
>>> That's the present. The future of Java is shaped by the people who work hard
>>> to create it, but even they can't predict it accurately, although they try to
>>> write roadmaps and discuss their aspirations with the community. (We are not
>>> hiding secret plans or schedules!) I can say a few things about how constant
>>> pool features fit into the big picture, but I can't predict which releases they
>>> will land in.
>>>
>>> The future Bytes and Group constants are envisioned as helpers for scaling
>>> complex class files toward lower overheads and higher limits. The existing
>>> workarounds have certain overheads and limits that we would probably like to
>>> remove in the longer term. For example, a base64-encoded string has a maximum
>>> payload length of about 0.75 * 2^16 octets. *Also*, using base64 requires one or
>>> more throwaway intermediate copies of the envelope and payload; this is the
>>> overhead of using a simulation instead of direct expression of a sequence of
>>> octets.
>>>
>>> Personally, I'd like to see more *zero copy* data structures in classfiles, just
>>> like Linux object files support zero copy read-only data, using file mapping.
>>> But zero copy data requires every layer of the system to support either
>>> immutability or (at worst) lazy copy on write, and also support "views" on
>>> the original data. This means we have to tune certain Java APIs to avoid
>>> statefulness and avoid certain data types. Java arrays can't do zero copy
>>> views for the same reasons they can't do slices. Value types will eventually
>>> help reduce the overheads of views, reducing zero copy API overheads.
>>>
>>> The Bytes constant is envisioned as scaling to at least 2^31-1 and perhaps
>>> beyond, and will not require decoding or copying. As such it is a potential
>>> replacement for resource files, as well as a carrier for short "binary string"
>>> data. Before we can do this in a copy-free manner, we need either an interface
>>> like CharSequence for bytes, or else a kind of ByteBuffer which has no state and
>>> is read-only. IMO the ByteBuffer improvements should play out a little longer
>>> before we decide what is the type of an "ldc" of a CONSTANT_Bytes constant. The
>>> requirements on this structure are close to (but not identical with) a
>>> ByteBuffer. My money is on a simple ByteSequence interface which ByteBuffer (and
>>> other types) will implement for interoperability, but the "ldc" of a Bytes
>>> constant should load a flyweight object (perhaps a value type) which does little
>>> more than hold the virtual address of a slice of a classfile (perhaps mapped
>>> from disk); this is probably lighter than any ByteBuffer.
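
To make the ByteSequence idea concrete, here is one possible shape for such an interface, modeled on CharSequence. Nothing like this exists in the JDK: the method names (length, byteAt, subSequence) and the ArrayByteSequence class are assumptions made for illustration, and a heap array stands in for the mapped classfile slice described above.

```java
import java.util.Objects;

// Hypothetical read-only, stateless view of bytes, modeled on CharSequence.
interface ByteSequence {
    int length();
    byte byteAt(int index);
    ByteSequence subSequence(int start, int end);
}

// Illustrative implementation backed by a private array.
final class ArrayByteSequence implements ByteSequence {
    private final byte[] data;
    private final int offset;
    private final int length;

    private ArrayByteSequence(byte[] data, int offset, int length) {
        this.data = data;
        this.offset = offset;
        this.length = length;
    }

    // Copies once on entry so that later writes to the caller's array cannot be observed.
    static ByteSequence copyOf(byte[] source) {
        byte[] copy = source.clone();
        return new ArrayByteSequence(copy, 0, copy.length);
    }

    @Override
    public int length() {
        return length;
    }

    @Override
    public byte byteAt(int index) {
        return data[offset + Objects.checkIndex(index, length)];
    }

    // A slice shares the backing array: only offset and length change, no bytes are copied.
    @Override
    public ByteSequence subSequence(int start, int end) {
        Objects.checkFromToIndex(start, end, length);
        return new ArrayByteSequence(data, offset + start, end - start);
    }
}
```

Because the view has no position or limit, it can be shared freely between threads, which is the property a stateful ByteBuffer lacks.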
>>
>> Refactoring ByteBuffer (subclasses or at least the readonly impl) to be value
>> types is impossible with the current value type design (a value type cannot
>> inherit from an abstract class), so an interface like ByteSequence makes sense
>> but it means re-implementing a lot of methods (slice, bulk copy, etc) on
>> ByteSequence.
>
>
> Yes, it will be harder to put BSeq over B.B. than it was to put CSeq over String
> but I think we must.
>>
>>>
>>> Likewise, the simulation of groups using BSM arguments has a maximum payload
>>> length of 2^16-1, and burns a constant pool index for each (distinct) BSM
>>> argument. Since there are only 2^16-1 possible CP entries, this is a serious
>>> cost. The Group format is envisioned as expressing a range of CP entries which
>>> do *not* consume global CP entry indexes. Uses of Group constants which don't
>>> have a large number of resolved constants might be better coded as serialized
>>> bundles of bits, wrapped in a CONSTANT_Bytes envelope and deserialized from an
>>> ad hoc encoding. You can see how the simulation overheads and limits stack up if
>>> you then require such a bundle in a base64 string.
>>>
>>> It's not obvious at first glance, but if you think about it, a long group of
>>> constants has a likely scalability requirement that the constants not be
>>> resolved eagerly for their bootstrap method. I.e., the BSM should somehow be
>>> able to control the sequencing of constant resolution, including even deferring
>>> some constants to be resolved *after* the BSM returns (at some future point when
>>> the BSM logic needs the resolved constant). This requires another bit of VM
>>> functionality, the BootstrapCallInfo API, which allows a BSM to fully control
>>> resolution. This is still a WIP although it is (non-public) in the sources.
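
The deferred-resolution idea can be illustrated without the non-public BootstrapCallInfo API. The sketch below is a hypothetical stand-in, not that API: LazyConstantGroup and its methods are invented names, and each Supplier plays the role of "resolve this constant pool entry when asked".

```java
import java.util.List;
import java.util.Objects;
import java.util.function.Supplier;

// Hypothetical stand-in for a group of constants that resolve on demand.
// It does not reproduce the JDK-internal API; it only shows the laziness.
final class LazyConstantGroup {
    private final List<Supplier<?>> resolvers;
    private final Object[] resolved;
    private final boolean[] present;

    LazyConstantGroup(List<Supplier<?>> resolvers) {
        this.resolvers = List.copyOf(resolvers);
        this.resolved = new Object[this.resolvers.size()];
        this.present = new boolean[this.resolvers.size()];
    }

    int size() {
        return resolvers.size();
    }

    // True once the constant at this index has been resolved.
    synchronized boolean isPresent(int index) {
        return present[Objects.checkIndex(index, present.length)];
    }

    // Resolves the constant on first access and caches the result;
    // untouched indexes never run their resolver.
    synchronized Object get(int index) {
        Objects.checkIndex(index, present.length);
        if (!present[index]) {
            resolved[index] = resolvers.get(index).get();
            present[index] = true;
        }
        return resolved[index];
    }
}
```

A bootstrap method handed such a group could return early and keep a reference to it, so that expensive entries are resolved only when its logic later needs them.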
>>>
>>> In order for the BootstrapCallInfo API to properly express unresolved constants,
>>> we first need to land the JVM Constants API (JEP 334).
>>
>> I'm not sure I see the link here.
>
> With natural representation of unresolved CP items we can have a
> getUnresolved/getSymbolic access method on BCI that avoids resolving.
OK, got it: you want BootstrapCallInfo not only to provide lazy/deferred evaluation but also to be able to provide an unresolved version of each CP item, hence the link.
>>
>> Anyway, the BootstrapCallInfo API from a language runtime implementor because
>> currently I use some strings instead of the proper constants because of the
>> early initialization.
>
> It would be better to access the full structure of the CP instead of just the
> strings in it.
Yes, I fully agree (my sentence above would be far easier to understand with the words 'is a great addition' after implementor).
>>
>>>
>>> Given the number of people available to work on various projects, we are working
>>> on these features as quickly as we can. There's a mix of sizes: small things
>>> like ByteBuffer upgrades, medium like CONSTANT_Dynamic, and large like value
>>> types. Because such features tend to be interrelated, there's also a natural
>>> order in which we are approaching them. Also, the speed of development depends
>>> not only on the number of hands working on the technology but also on the
>>> surprising complexity of properly developing core JVM features, even supposedly
>>> simple ones like CONSTANT_Dynamic. Coding is a fraction of the required work;
>>> there is also test development, integration, spec. development, several kinds of
>>> review, and iterative polishing; all are necessary to get a result our users
>>> will enjoy, and we'll be proud of, even a decade later. So even the simplest JVM
>>> feature requires a total of years of labor.
>>>
>>> We'll talk more about this at the JVM Language Summit and Oracle Committers'
>>> Workshop. I hope we can continue to spread the work around more to current
>>> partners like Intel and Red Hat. I will be delighted to talk to conference
>>> attenders in great detail about this stuff.
>>
>> s/Oracle Committers' Workshop/OpenJDK Committers' Workshop ?
>
> Wow, just wow. Can I blame this one on autocomplete?? Of course I am referring
> to OpenJDK committers, not just the Oracle ones.
>
:)
Rémi