Please stop incrementing the classfile version number when there are no format changes

Sat Oct 12 04:53:17 UTC 2019

On Fri, Oct 11, 2019 at 2:21 PM Brian Goetz <brian.goetz at oracle.com> wrote:

> you're making the classic mistake of considering only the benefit, and
> not the cost.  For example, fatter classfile structures mean longer
> startup times for 100% of Java developers.  Should we really punish the
> 100% with worse startup for the sake of the .00001% of those who are
> maintaining bytecode-parsing tools, but aren't committed to keeping up?
> That seems like putting the burden in the wrong place.

Have you ever argued that length fields should be dropped throughout the
classfile format wherever they are not needed? Modified UTF-8 does not even
need a length field: because it never contains a zero byte, it can be
null-terminated. This means that CONSTANT_Utf8_info entries in the constant
pool don't strictly need a length field, and by your reasoning, that field
imposes a cost on 100% of users. Yet UTF-8 constant pool entries have always
had a length field, and I'm sure nobody has ever complained about this
inefficiency.
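The null-termination point rests on one property of Modified UTF-8: U+0000 is encoded as the two bytes 0xC0 0x80, so an encoded string never contains a 0x00 byte. Here is a minimal Java sketch that checks this using `java.io.DataOutputStream.writeUTF`, which writes exactly this encoding behind a u2 length prefix. The class and method names are hypothetical, for illustration only.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

/** Hypothetical helper: checks that Modified UTF-8 output never needs a 0x00 byte. */
final class ModifiedUtf8Check {
    /** Returns the Modified UTF-8 bytes of s, without the u2 length prefix. */
    static byte[] encode(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            // writeUTF writes a u2 length followed by Modified UTF-8, the same
            // layout a CONSTANT_Utf8_info entry uses after its tag byte.
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(s);
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = bytes.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length); // drop the u2 length
    }

    /** True if no byte is 0x00, i.e. a null terminator would be unambiguous. */
    static boolean isNullTerminable(byte[] encoded) {
        for (byte b : encoded) {
            if (b == 0) {
                return false;
            }
        }
        return true;
    }
}
```

For example, `encode("a\0b")` yields the four bytes 61 C0 80 62: the embedded NUL character becomes C0 80, so `isNullTerminable` returns true.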

On Fri, Oct 11, 2019 at 3:11 PM Brian Goetz <brian.goetz at oracle.com> wrote:

> Actually, this sort of change is virtually guaranteed in the next five
> years, when we get to specializable generics in Valhalla.  I would really,
> really like to kill the misassumption that "the classfile will always
> remain backward compatible" earlier rather than later, so when we pull this
> trigger -- and we absolutely will -- no one can reasonably claim surprise.
> The version number is the gate, and always has been.  Parsing classfile
> whose version you do not recognize is entirely a leap of faith, and someday
> you're going to fall on your face.  Better off to stop leaping before that
> happens!
>

You're talking about causing a potentially large number of users to "fall
on their face" (by libraries throwing unnecessary exceptions), forcing them
to upgrade their entire transitive dependency graph every 6 months, which is
often outside their control due to dependency hell, in order to keep them
from falling on their face once every 10 years or so due to an actual
breaking change. That is not a justifiable tradeoff. I'm totally happy
writing an entirely new parser once every 10 years to support major
breakage. I'm not happy pushing out a release every 6 months just to say
"no longer throw an exception for classfile format X+1, since it is no
different functionally from classfile format X".

The fallout of forcing parsers to never assume backwards compatibility can
be widespread. For example, it took Spring a long time to upgrade their ASM
dependency after JDK 9 came out, and tons of users' Spring code suddenly
bombed with an unrecoverable exception once they moved to JDK 9, even when
the users weren't doing bytecode manipulation or parsing of any sort; they
were simply scanning for Spring annotations (and annotation classfile
entries were, as far as I know, fully backwards compatible between JDK 9
and earlier). These days it is not uncommon for a large project to pull in
multiple classfile libraries. In fact, in one case I saw a project depending
upon at least three different classfile libraries, all of which broke with
the JDK 9 constant pool changes.
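To make the two policies in this argument concrete, here is a minimal Java sketch of the header check at the center of the debate. The class `VersionGate`, its `Policy` and `Verdict` enums, and the `maxKnownMajor` parameter are hypothetical and are not any real library's API. The layout it reads is the real one: u4 magic 0xCAFEBABE, u2 minor_version, u2 major_version, all big-endian. Major version 52 corresponds to Java 8 and 53 to Java 9.

```java
import java.nio.ByteBuffer;

/** Hypothetical classfile header gate contrasting strict and lenient version handling. */
final class VersionGate {
    enum Policy { STRICT, LENIENT }

    enum Verdict { ACCEPT, ACCEPT_UNVERIFIED, REJECT }

    static final int MAGIC = 0xCAFEBABE;

    static Verdict check(byte[] classBytes, int maxKnownMajor, Policy policy) {
        // ByteBuffer is big-endian by default, matching the classfile format.
        ByteBuffer buf = ByteBuffer.wrap(classBytes);
        if (buf.remaining() < 8 || buf.getInt() != MAGIC) {
            return Verdict.REJECT; // not a classfile at all
        }
        buf.getShort(); // minor_version, unused here
        int major = buf.getShort() & 0xFFFF;
        if (major <= maxKnownMajor) {
            return Verdict.ACCEPT;
        }
        // Unknown version: STRICT throws the file away; LENIENT proceeds on the
        // assumption that the parts it reads have not changed.
        return policy == Policy.STRICT ? Verdict.REJECT : Verdict.ACCEPT_UNVERIFIED;
    }
}
```

A parser written for Java 8 (`maxKnownMajor = 52`) that is handed a Java 9 classfile (major 53) returns `REJECT` under `STRICT` and `ACCEPT_UNVERIFIED` under `LENIENT`. The lenient caller still carries the risk that the format really did change, which is the "leap of faith" the quoted reply warns about.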

If JRE N+1 is supposed to always run JRE N code, but libraries that have
become almost ubiquitous (especially when transitive dependencies are
considered) cannot run on JRE N+1 until they are fixed, even when the fix is
simply "don't throw an exception on this larger classfile version number"
and nothing more, then the JRE's own backwards-compatibility guarantees will
start to be tarnished, particularly with the 6-month release cadence. The
damage will come from the very software that runs on the JRE being hardwired
to the current version N. This was the original poster's main point, and I
strongly agree with it.

In the Android world, a closely related sort of unnecessary breakage in
apps, along with significant version skew between different libraries,
became generally known as "fragmentation", and was one of the biggest
sources of consternation among developers until Google built a real
hardware abstraction layer (HAL), as well as API translation compatibility
layers that gave stronger guarantees of forwards and backwards
compatibility. The JDK
ecosystem will start to see fragmentation with the 6-month cadence unless
stronger guarantees can be made about graceful forwards and backwards
compatibility. As I said before, you simply cannot expect the entire Java
ecosystem to suddenly get on board with pushing out new compatible releases
right after each new JDK comes out, even if a few key libraries like ASM
and ClassGraph do push out new releases quickly -- so there will be a
massive amount of version skew / version staggering across complex library
dependency graphs.

> Except that would be the sort of incompatible change that this entire
> argument presupposes will never happen -- every classfile parser ever
> written assumes that the CP starts right after the version. Also, it's not
> simply a matter of a table that says "constant #19 is three bytes", because
> constant pool entries can be variably sized (though we do try to avoid
> this.)  A self-describing format would require a more complex grammar -- so
> now we'd be spending complexity budget as well as footprint budget.
>

OK, so in that case, introduce the constant pool sizes as a constant pool
"meta-entry". The size meta-entry would have tag 21, and contain the
referenced tag number, and the size of the entry.
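A minimal Java sketch of how such a size meta-entry might be encoded and decoded. The proposal does not fix field widths, so this sketch assumes a layout of u1 tag (21), u1 referenced_tag, u2 size, where size counts the bytes that follow the referenced entry's own tag byte. The class `SizeMetaEntry` and its methods are hypothetical; tag 21 is not part of the JVMS.

```java
import java.nio.ByteBuffer;
import java.util.Map;

/**
 * Hypothetical CONSTANT_Size meta-entry (tag 21). Assumed layout:
 * u1 tag = 21, u1 referenced_tag, u2 size (payload bytes after the tag byte).
 */
final class SizeMetaEntry {
    static final int TAG = 21;

    /** Encodes a meta-entry declaring that entries tagged referencedTag carry size payload bytes. */
    static byte[] encode(int referencedTag, int size) {
        return ByteBuffer.allocate(4)
                .put((byte) TAG)
                .put((byte) referencedTag)
                .putShort((short) size)
                .array();
    }

    /** Reads one meta-entry (tag byte included) and records the declared size in sizes. */
    static void decodeInto(ByteBuffer in, Map<Integer, Integer> sizes) {
        int tag = in.get() & 0xFF;
        if (tag != TAG) {
            throw new IllegalArgumentException("not a size meta-entry: tag " + tag);
        }
        int referencedTag = in.get() & 0xFF;
        int size = in.getShort() & 0xFFFF;
        sizes.put(referencedTag, size);
    }
}
```

For instance, a hypothetical future tag 23 with a 4-byte payload would be announced by the bytes produced by `encode(23, 4)`: 21, 23, 0, 4. A parser that has never heard of tag 23 can still step over every such entry.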

To solve the variable-sized entry problem: just as with the only current
variable-sized entry type, CONSTANT_Utf8_info, whose first field after the
tag is "u2 length", require all variable-sized entries to carry a u2 length
field as the first two bytes after the tag. Then define one more unique
meta-tag number (say 22), containing only the referenced tag number and no
size field. This would mark all subsequent CP entries with the referenced
tag number as carrying a u2 length field in the first two bytes after the
tag.
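Putting both meta-entries together, here is a minimal Java sketch of a constant-pool skipper that can step over entries it does not understand, as long as their sizes were announced earlier in the pool. Everything about tags 21 and 22 is an assumption taken from the proposal: a tag-21 entry is u1 tag, u1 referenced_tag, u2 size (payload bytes after the tag byte); a tag-22 entry is u1 tag, u1 referenced_tag; each meta-entry occupies one constant-pool slot. The class `TolerantConstantPool` is hypothetical. The built-in size table covers the real fixed-size tags defined through Java SE 13, including the rule that CONSTANT_Long and CONSTANT_Double take two slots.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical constant-pool skipper honoring the proposed meta-entries (tags 21 and 22). */
final class TolerantConstantPool {
    static final int SIZE_META = 21;     // u1 tag, u1 referenced_tag, u2 size
    static final int VARIABLE_META = 22; // u1 tag, u1 referenced_tag

    /** Skips constantPoolCount - 1 slots and returns the buffer position of access_flags. */
    static int skip(ByteBuffer buf, int constantPoolCount) {
        // Payload sizes (excluding the tag byte) of fixed-size tags through Java SE 13.
        int[][] known = {{3, 4}, {4, 4}, {5, 8}, {6, 8}, {7, 2}, {8, 2}, {9, 4}, {10, 4},
                {11, 4}, {12, 4}, {15, 3}, {16, 2}, {17, 4}, {18, 4}, {19, 2}, {20, 2}};
        Map<Integer, Integer> fixed = new HashMap<>();
        for (int[] k : known) {
            fixed.put(k[0], k[1]);
        }
        // CONSTANT_Utf8_info (tag 1) is the only variable-sized entry today.
        Set<Integer> variable = new HashSet<>();
        variable.add(1);

        for (int slot = 1; slot < constantPoolCount; slot++) {
            int tag = buf.get() & 0xFF;
            if (tag == SIZE_META) {
                int referencedTag = buf.get() & 0xFF;
                int size = buf.getShort() & 0xFFFF;
                fixed.put(referencedTag, size); // applies to later entries
            } else if (tag == VARIABLE_META) {
                variable.add(buf.get() & 0xFF); // later entries start with a u2 length
            } else if (variable.contains(tag)) {
                int length = buf.getShort() & 0xFFFF;
                buf.position(buf.position() + length);
            } else if (fixed.containsKey(tag)) {
                buf.position(buf.position() + fixed.get(tag));
                if (tag == 5 || tag == 6) {
                    slot++; // CONSTANT_Long and CONSTANT_Double occupy two slots
                }
            } else {
                throw new IllegalStateException("no size known for tag " + tag + " in slot " + slot);
            }
        }
        return buf.position();
    }
}
```

With the pool skipped, access_flags, this_class, and the length-prefixed fields, methods, and attributes that follow can be located without understanding any entry's contents. Only a tag that was never announced still stops the parser.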

With these two changes, assuming that the constant pool is the only place
that size information is needed (I think this is the case), then any
classfile parsing tool should be able to locate any piece of information in
a classfile without having to understand the whole classfile. How is this
not a highly desirable property for classfiles and their parsers? Why
should tools need to understand the entire classfile format just to locate
only a small subset of all the information the classfile contains? If you
do undertake a major redesign of the classfile format in the future,
whatever you do, please consider making this property an intentional design
choice.