Please stop incrementing the classfile version number when there are no format changes

Fri Oct 11 20:51:41 UTC 2019

On Fri, Oct 11, 2019 at 2:21 PM Brian Goetz <brian.goetz at oracle.com> wrote:

> Not true at all.  The classfile version is in a well-known place in the
> classfile format, and tools should not attempt to interpret classfiles
> for whose version they don't understand.  Your argument below is a stark
> demonstration of the moral hazard of our working hard to keep the
> classfile format stable -- users assume (with absolutely no basis) that
> it should be safe to just blindly parse a classfile whose version they
> do not recognize, and treat it as a bug that somehow this isn't
> possible.  The JVMS has been specified such that it is allowable to
> entirely redo the bytecode set in every version.  The protection against
> classfile misinterpretation is not trying to reason about classfile
> versions you don't recognize.
>

I think you are arguing in the abstract simply because of what is possible,
but not at all because of what is remotely likely. Sure, I accept your
point, but in practice, the classfile format has been almost a strict set
of growing supersets, going all the way back to JDK 1.1, right? And there
is a good reason for that, because Java has always cared a lot about
backwards compatibility.

Of course the JDK could switch to an entirely new format -- Android did
this already, and the JDK could theoretically even adopt DEX or something like
it. But really, if you want to talk about imposing difficulty on
developers, how likely would this be to ever happen? I assume that in
practical terms, the chance of such a change would be very close to zero.
Or maybe at some future point, there will be a major breaking change (like
maybe increasing the size of the constant pool beyond 2^16 entries) -- but
then we will be back to incremental supersets.

Of course if there is a major break in ABI for a file format, yes,
absolutely, it is the responsibility of library maintainers to update to
handle the new format as quickly as possible. I would never argue otherwise.

> I understand you are thinking "well, it's possible to define the
> classfile format in a way that is more self-describing, and then it
> would be easier, so isn't it obvious that we should just do that?" But,
> you're making the classic mistake of considering only the benefit, and
> not the cost.  For example, fatter classfile structures mean longer
> startup times for 100% of Java developers.  Should we really punish the
> 100% with worse startup for the sake of the .00001% of those who are
> maintaining bytecode-parsing tools, but aren't committed to keeping up?
> That seems like putting the burden in the wrong place.
>

OK, again I accept your general point, but I think you are arguing from an
ideological point of view without acknowledging the benefit of fixing the
issue I raised. There are ways to accomplish what I suggested (yes, making
the format more self-describing) that incur close to zero extra cost. For
example, I suggested adding a length field to every constant pool entry.
But instead, there could be some simple metadata before the constant pool
that gave the size for any newly introduced constant pool types. Total cost:
a few bytes per classfile. Total benefit: preventing much disruption for
many users.
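
To make that concrete, here is a rough sketch of how a parser could use such
metadata. It is not ClassGraph or ASM code, and the "declaredSizes" table is
purely hypothetical: it stands in for the proposed per-classfile metadata
mapping each newly introduced tag to its payload size. The class and method
names are made up for illustration. The fixed sizes for the older tags are the
ones given in the JVMS.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;

// Hypothetical sketch: walk a constant pool, skipping tags the parser predates.
class ConstantPoolSkipper {

    // Payload size in bytes (excluding the tag byte) for the fixed-size
    // constant pool tags a pre-Java-9 parser knows about; -1 if unknown.
    static int knownPayloadSize(int tag) {
        switch (tag) {
            case 7: case 8: case 16:
                return 2; // Class, String, MethodType
            case 15:
                return 3; // MethodHandle
            case 3: case 4: case 9: case 10: case 11: case 12: case 18:
                return 4; // Integer, Float, *ref, NameAndType, InvokeDynamic
            case 5: case 6:
                return 8; // Long, Double
            default:
                return -1;
        }
    }

    // Returns the number of entries that were skipped using declaredSizes,
    // the ASSUMED extra metadata (tag -> payload size) for unknown tags.
    static int scan(DataInputStream in, int cpCount,
                    Map<Integer, Integer> declaredSizes) throws IOException {
        int skipped = 0;
        for (int i = 1; i < cpCount; i++) { // entries are numbered 1..cpCount-1
            int tag = in.readUnsignedByte();
            // Tag 1 (Utf8) already carries its own u2 length prefix.
            int size = (tag == 1) ? in.readUnsignedShort() : knownPayloadSize(tag);
            if (size < 0) {
                Integer declared = declaredSizes.get(tag);
                if (declared == null) {
                    throw new IOException("Unknown constant pool tag " + tag);
                }
                size = declared;
                skipped++;
            }
            in.readFully(new byte[size]); // consume the payload
            if (tag == 5 || tag == 6) {
                i++; // Long and Double occupy two constant pool slots
            }
        }
        return skipped;
    }

    // Convenience wrapper so callers do not have to deal with IOException.
    static int scanBytes(byte[] pool, int cpCount, Map<Integer, Integer> declaredSizes) {
        try {
            return scan(new DataInputStream(new ByteArrayInputStream(pool)),
                        cpCount, declaredSizes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Tiny hand-built pool: Utf8 "hi", Module (19), Long, Package (20).
    static byte[] samplePool() {
        return new byte[] {
            1, 0, 2, 'h', 'i',
            19, 0, 1,
            5, 0, 0, 0, 0, 0, 0, 0, 42,
            20, 0, 1
        };
    }

    public static void main(String[] args) {
        // Tags 19 and 20 each have a 2-byte payload; a Java 8 era parser
        // only learns that from the (hypothetical) metadata table.
        int skipped = scanBytes(samplePool(), 6, Map.of(19, 2, 20, 2));
        System.out.println("Skipped " + skipped + " unknown entries");
    }
}
```

Without the size table, the same parser has no choice but to give up at the
first unknown tag, because it cannot tell where the next entry starts.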

> ASM works hard to have a release ready for JDK N by the time JDK is
> ready to ship.  If other bytecode libraries want to play in this space,
> that's the bar, and it's a reasonable one.  It's not like changes to the
> classfile format aren't broadly discussed in public forums for a long
> time prior to making a change.  The ASM guys have figured out what good
> citizenship looks like here (thanks Remi!); other libraries should take
> a page from their book.
>

One of the biggest sources of user disruption is N-step dependencies,
where N>1. Many bugs have been reported against ClassGraph that were
already fixed, by users who run into a bug due to depending upon library X
which itself depends upon an old version of ClassGraph. Specifically, this
happened about three times with the bump to JDK 9, which
introduced the new constant pool tag types I mentioned. To your point, ASM
ran into this problem with many downstream dependencies for the same
reason, again due to second- or nth-degree dependencies (I saw at least a
couple of GitHub bug reports and several different Stack Overflow
questions about exactly this issue, over a year after JDK 9 was released,
because it took a very long time for the Java ecosystem to upgrade to ASM
6.0 beta or later).  Even fixing the core issue in the library concerned
does not always fix the issue for the entire ecosystem, because literally
every library, and its entire transitive dependency graph, has to be updated
every time a version bump either actually breaks something or merely causes
a library to throw an exception because it does not recognize the new
version number. And very frequently, to fix an issue (such as bumping the
version number for the ASM dependency), you have to do a lot of work to
adapt to a new API or to fix other breakage. You simply cannot expect the
entire Java ecosystem to upgrade all transitive dependency chains every 6
months; that's just not the level of efficiency that the broader ecosystem
operates at. This is why it is very important to incur as little disruption
as possible, and to make libraries as backwards- and forwards-compatible as
possible.

Throwing an exception just because a library "might" not handle a file only
causes havoc. See also: https://en.wikipedia.org/wiki/Robustness_principle
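
For illustration, the difference between the two behaviors can be sketched
roughly as follows. This is not code from any real library: the class name,
the "strict" flag, and the NEWEST_TESTED_MAJOR constant (set here to 53, the
major version for Java 9) are assumptions made up for the example. Only the
header layout (u4 magic 0xCAFEBABE, u2 minor, u2 major) comes from the JVMS.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of a strict versus a tolerant classfile version check.
class LenientVersionCheck {

    // Assumption: the newest major version this imaginary library was tested with.
    static final int NEWEST_TESTED_MAJOR = 53;

    // Reads the classfile header and returns the major version.
    // In strict mode an unrecognized version is fatal; otherwise the parser
    // warns and lets the caller carry on with a best-effort parse.
    static int readMajorVersion(byte[] classfile, boolean strict) {
        ByteBuffer buf = ByteBuffer.wrap(classfile); // big-endian by default
        if (classfile.length < 8 || buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("Not a classfile");
        }
        int minor = buf.getShort() & 0xFFFF;
        int major = buf.getShort() & 0xFFFF;
        if (major > NEWEST_TESTED_MAJOR) {
            String msg = "Classfile version " + major + "." + minor
                    + " is newer than the newest tested version " + NEWEST_TESTED_MAJOR;
            if (strict) {
                throw new UnsupportedOperationException(msg);
            }
            System.err.println("Warning: " + msg + "; attempting to parse anyway");
        }
        return major;
    }

    public static void main(String[] args) {
        // Header only: magic, minor version 0, major version 57.
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 57};
        System.out.println(readMajorVersion(header, false));
    }
}
```

In the tolerant mode, a real failure would still surface later if the parser
actually hit something it could not handle, such as an unknown constant pool
tag.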