Condy bsm should be idempotent

Thu Aug 17 21:41:15 UTC 2017

On Aug 17, 2017, at 11:41 AM, Brian Goetz <brian.goetz at oracle.com> wrote:
> 
> I agree, and I think this is already implied by the race-arbitrating behavior of CP resolution.  If two threads race to resolve the same CP#, the VM will arbitrarily pick a winner, and toss the losing result.  Which means that both results must be, in some sense, equivalent.  But there's no harm in stating it (just as there's no harm in reminding people that these are supposed to be CONSTANTS.)

We are on tricky ground here, wanting to say something about
equivalent expressions yielding equivalent results.  (And yes,
it's like the similar desire to say that of course a condy result
is, somehow, constant.)  There's no good way to enforce these
constraints, short of inventing a restricted subset of Java
that can be proven to have the desired properties, and then
requiring that condy expressions use that subset.

What we can do is give advice to users of condy on how
to use it safely.  And then surround those good behaviors
with a spec that does something reasonably predictable
and safe even if the users go off the rails (by accident or
nefarious design).
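As a concrete picture of that advice, compare two bootstrap methods that take the leading (Lookup, String, Class) parameters a condy BSM receives. Both method names are hypothetical, and the methods are called directly here rather than through ldc. The first is idempotent: equal static arguments give an equal result, so it would not matter which racing thread's result the VM keeps. The second returns a fresh object on every call, which makes the outcome of such a race observable.

```java
import java.lang.invoke.MethodHandles;
import java.util.List;

/**
 * Two hypothetical condy bootstrap methods, using the (Lookup, String, Class, static args...)
 * shape of a condy BSM. They are invoked directly here, not via ldc.
 */
public class WellBehavedBsm {

    /** Well-behaved: the result depends only on the static arguments, with no side effects. */
    public static List<String> constantList(MethodHandles.Lookup lookup, String name,
                                            Class<?> type, String... elems) {
        return List.of(elems);
    }

    /** Badly behaved: every call yields a fresh object, so two evaluations are never equal. */
    public static Object freshObject(MethodHandles.Lookup lookup, String name, Class<?> type) {
        return new Object();
    }

    public static void main(String[] args) {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        List<String> a = constantList(lookup, "greetings", List.class, "hello", "world");
        List<String> b = constantList(lookup, "greetings", List.class, "hello", "world");
        System.out.println(a.equals(b)); // true: either result could win a resolution race
        Object x = freshObject(lookup, "token", Object.class);
        Object y = freshObject(lookup, "token", Object.class);
        System.out.println(x.equals(y)); // false: which result "wins" becomes observable
    }
}
```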

There are a lot of ways to win at this, without solving the
halting problem for full Java or designing a compile-time
execution mode for Java.  (BTW, I'd like to do the latter
someday, but for today let's suppose that condy BSMs
are completely unpredictable in their actions, unless
their authors take responsibility for them.)

The current position is for the JVM to uphold a very simple
contract:  Each CP entry is distinct (as a contract between
the classfile author and the JVM) and has independent
behavior, which is idempotent.  The linkage process
*behind* the CP is not, and cannot be, idempotent,
which is why we have to record both normal and
exceptional linkage results.
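To make the "record both normal and exceptional results" point concrete, here is a minimal Java model of a single condy CP entry. It is a hypothetical sketch, not HotSpot's implementation: the class CondySlot and its Supplier-based BSM are assumptions, and a RuntimeException stands in for a real linkage error. Racing threads may each run the BSM, but only the first outcome published by compareAndSet is kept, so every caller sees the same value or the same recorded exception. This is also why the losing results must be equivalent to the winner.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/**
 * Hypothetical model of one condy CP entry (not HotSpot's code): the first completed
 * resolution, normal or exceptional, is published with a CAS; racing losers discard
 * their own result and use the winner's.
 */
public class CondySlot {
    /** Either a normal value (possibly null) or a recorded linkage error. */
    private record Outcome(Object value, RuntimeException error) {}

    private final AtomicReference<Outcome> outcome = new AtomicReference<>();
    private final Supplier<Object> bsm;

    public CondySlot(Supplier<Object> bsm) { this.bsm = bsm; }

    public Object resolve() {
        Outcome current = outcome.get();
        if (current == null) {
            Outcome candidate;
            try {
                candidate = new Outcome(bsm.get(), null);
            } catch (RuntimeException e) {
                candidate = new Outcome(null, e);   // exceptional results are recorded too
            }
            outcome.compareAndSet(null, candidate); // only one candidate is installed
            current = outcome.get();                // losers adopt the winner's outcome
        }
        if (current.error() != null) throw current.error();
        return current.value();
    }

    public static void main(String[] args) {
        int[] calls = {0};
        CondySlot slot = new CondySlot(() -> ++calls[0]);
        System.out.println(slot.resolve() + " " + slot.resolve() + " calls=" + calls[0]);
        // prints 1 1 calls=1
    }
}
```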

Despite the inconvenience for Remi or for ASM users
(and likewise for Maurizio), I think this is the best way
to go because it's the simplest for the most delicate part
of the system, the JVMS.  (That's where the attackers
attack, and where needless complexity is to be avoided.)
So, I'd prefer to leave the JVMS as it is, and allow bytecode
generation APIs to cater *only* (or mainly) to well-behaved
authors who would never dream of writing non-idempotent
condys.

To complicate the JVMS in order to regularize the user
model of ASM would be a mistake.  But I don't advocate
complicating ASM either.  Instead, I think it is perfectly
reasonable to do any of three things in ASM (and other
tools like it):

A. Continue normalizing all CP entries, including the
new ones.  This means that a null translation (reading
a class file and writing it back unchanged) might
de-duplicate equivalent condy entries.  This will
only hurt people who are creating bad class files
on purpose, either as negative tests or to explore
the dark corners of the JVMS behavior.  (Remember,
the bright center requires human responsibility.)

B. For the new data-type used by ASM to describe
a condy constant, add a 32-bit "stamp" field which
participates in that type's equals/hashCode/toString
methods.  This "stamp" field is an arbitrary value
serving only to differentiate otherwise equivalent
condy constants.  User-built constants default
their stamp to zero.  Constants built during class
file reading default their stamp to the CP index
at which they occur.  New condy constants are
interned, old ones are retained distinct.  And
nobody needs to be the wiser, unless they choose
to look very, very close at the behavior of ASM.

(B2 Variation:  Give the stamp value of zero to
every unique condy constant encountered
in a class file.  For the edge case of non-unique
constants, give them stamps of their CP indexes.
Other variations are possible.  I don't think the
effort would be well spent, because it requires
extra stamp-suppressing comparison logic, which
goes against ASM's minimalist design, and
may slightly slow ASM's processing of condy.
Perhaps an optional method could be given
to find a pre-existing condy item that matches
a given one?  Nobody will use it, I think.)

C. Say that ASM is free to do either of behaviors
A (interning) or B (keeping distinct), as a matter
of implementation.  If you need to predict the
treatment of equivalent condy constants, you
need to find a workaround:  Either don't use
ASM, or add some salt to the name component
of the condy's name-and-type, and remove it as a
post-pass.
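Option B, with the B2 variation folded in, can be sketched as a Java record, since a record's generated equals/hashCode/toString include every component, stamp included. StampedCondy, its string-valued fields, and the helper names are all hypothetical; this is not ASM's actual API. Using 0 for user-built constants is safe because constant pool index 0 is never a valid entry, so a read-in stamp cannot collide with a user-built one. Option A is the degenerate case where every stamp is 0, and the unstamped() helper is the kind of extra stamp-suppressing comparison that B2 requires.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical ASM-style condy descriptor with an extra "stamp" component.
 * As a record, its equals/hashCode/toString include the stamp automatically.
 */
public record StampedCondy(String bsm, String name, String descriptor,
                           List<Object> bsmArgs, int stamp) {

    /** Option B: user-built constants default their stamp to zero. */
    public static StampedCondy userBuilt(String bsm, String name, String descriptor, List<Object> bsmArgs) {
        return new StampedCondy(bsm, name, descriptor, bsmArgs, 0);
    }

    /** Option B: constants read from a class file use their CP index (never 0) as the stamp. */
    public static StampedCondy readAt(int cpIndex, String bsm, String name, String descriptor, List<Object> bsmArgs) {
        return new StampedCondy(bsm, name, descriptor, bsmArgs, cpIndex);
    }

    /** The same constant with its stamp cleared, used only for structural comparison. */
    public StampedCondy unstamped() {
        return new StampedCondy(bsm, name, descriptor, bsmArgs, 0);
    }

    /** Variation B2: stamp 0 for structurally unique entries, the CP index for duplicates. */
    public static Map<Integer, StampedCondy> stampB2(Map<Integer, StampedCondy> readByIndex) {
        Map<StampedCondy, Integer> counts = new HashMap<>();
        for (StampedCondy c : readByIndex.values()) counts.merge(c.unstamped(), 1, Integer::sum);
        Map<Integer, StampedCondy> result = new TreeMap<>();
        readByIndex.forEach((index, c) -> {
            boolean unique = counts.get(c.unstamped()) == 1;
            result.put(index, unique ? c.unstamped()
                                     : readAt(index, c.bsm(), c.name(), c.descriptor(), c.bsmArgs()));
        });
        return result;
    }

    public static void main(String[] args) {
        List<Object> none = List.of();
        Map<StampedCondy, Integer> pool = new HashMap<>();
        // Two equivalent entries read at CP #5 and #9 stay distinct...
        pool.putIfAbsent(readAt(5, "Boot.nonce", "n", "Ljava/lang/Object;", none), 5);
        pool.putIfAbsent(readAt(9, "Boot.nonce", "n", "Ljava/lang/Object;", none), 9);
        // ...while two equivalent user-built entries collapse into one.
        pool.putIfAbsent(userBuilt("Boot.nonce", "n", "Ljava/lang/Object;", none), 10);
        pool.putIfAbsent(userBuilt("Boot.nonce", "n", "Ljava/lang/Object;", none), 11);
        System.out.println(pool.size()); // 3
    }
}
```

Under B2, a user who later builds a constant equal to a unique read-in one gets the existing entry, because both carry stamp 0.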

The choice between A/B/C can be adjusted over
time in response to bugs.  Perhaps C is the best
choice to start with, as a contract, with A as an
implementation, switching to B or B2 if users
run into actual problems with duplicate condys.
(They probably won't.)
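For a user who must live with contract C, the salting workaround from option C might look like the helpers below. The SALT_MARK separator and the helper names are hypothetical. The salted names keep equivalent condys structurally different while the class passes through an interning tool; a separate post-pass (not shown) would then apply unsalt to each condy's name in the emitted class file, leaving the entries equivalent but at distinct CP indexes.

```java
/** Hypothetical helpers for salting a condy's name so that an interning tool keeps it distinct. */
public class NameSalt {
    // Hypothetical separator, assumed never to appear in real condy names.
    private static final String SALT_MARK = "$salt$";

    /** Appends a salt, e.g. "nonce" -> "nonce$salt$1". */
    public static String salt(String name, int salt) {
        return name + SALT_MARK + salt;
    }

    /** Strips the last salt, if any; unsalted names are returned unchanged. */
    public static String unsalt(String name) {
        int at = name.lastIndexOf(SALT_MARK);
        return at < 0 ? name : name.substring(0, at);
    }

    public static void main(String[] args) {
        String a = salt("nonce", 1), b = salt("nonce", 2);
        System.out.println(a + " " + b + " " + a.equals(b)); // nonce$salt$1 nonce$salt$2 false
        System.out.println(unsalt(a) + " " + unsalt(b));     // nonce nonce
    }
}
```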

The JVM must retain the distinction between equivalent
condy constants at distinct CP indexes.  It cannot
do the interning (in A above) because that's too
expensive; that's an off-line tool's job.  It might specify
the equivalent of C (threaten to intern), but I think
that is an empty threat, and could only cause harm
down the road.
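The interning that the JVM should not do at run time is a routine job for an off-line tool. Here is a minimal sketch of such a pass (option A), assuming a hypothetical Key record for a condy's structure. It keeps the first of each group of equivalent entries and returns a remapping from old positions to survivors. A real tool would also have to rewrite every ldc and bootstrap argument that referred to a merged entry; that step is omitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical off-line interning pass over condy entries (option A). */
public class CondyInterner {
    /** Hypothetical structural key: BSM, name-and-type, and static arguments. */
    public record Key(String bsm, String name, String descriptor, List<Object> bsmArgs) {}

    /** Surviving entries, plus remap[i] = position in kept of input entry i. */
    public record Result(List<Key> kept, int[] remap) {}

    public static Result intern(List<Key> entries) {
        Map<Key, Integer> firstSeen = new HashMap<>();
        List<Key> kept = new ArrayList<>();
        int[] remap = new int[entries.size()];
        for (int i = 0; i < entries.size(); i++) {
            Key k = entries.get(i);
            Integer slot = firstSeen.get(k);
            if (slot == null) {      // first occurrence survives
                slot = kept.size();
                kept.add(k);
                firstSeen.put(k, slot);
            }
            remap[i] = slot;         // later duplicates point at the survivor
        }
        return new Result(kept, remap);
    }

    public static void main(String[] args) {
        Key nonce = new Key("Boot.nonce", "n", "Ljava/lang/Object;", List.of());
        Key list = new Key("Boot.list", "l", "Ljava/util/List;", List.of("a", "b"));
        Result r = intern(List.of(nonce, list, nonce));
        System.out.println(Arrays.toString(r.remap()) + " kept=" + r.kept().size());
        // prints [0, 1, 0] kept=2
    }
}
```

The two nonce entries collapse into one, which is exactly the behavior change that only affects deliberately non-idempotent class files.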

I'll go even further:  For the JVM, we should specifically
test that distinct condy constants with equivalent
structure *can* evaluate to distinct results.  The purpose
of this is not to encourage the use case (although it
could be used for things like cryptographic nonces)
but rather as a sort of edge behavior test, to ensure
that there is no "cross-talk" between constant pool entries.
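A real JVM test of this would emit a class file with two structurally identical CONSTANT_Dynamic entries and ldc each one. The self-contained Java sketch below only simulates the property being checked: Entry is a hypothetical, single-threaded stand-in for one CP entry, and nonce is a deliberately non-idempotent BSM. Each entry must be stable under re-resolution, yet the two entries must remain free to differ.

```java
import java.lang.invoke.MethodHandles;
import java.util.concurrent.atomic.AtomicLong;

/** Simulated cross-talk check: equivalent entries resolve independently (hypothetical sketch). */
public class NonceCrossTalkTest {
    private static final AtomicLong COUNTER = new AtomicLong();

    /** Hypothetical condy BSM that deliberately returns a fresh nonce on every call. */
    public static Long nonce(MethodHandles.Lookup lookup, String name, Class<?> type) {
        return COUNTER.incrementAndGet();
    }

    /** Simplified, single-threaded stand-in for one CP entry: resolve once, then reuse. */
    public static final class Entry {
        private Long value;

        public Long resolve() {
            if (value == null) value = nonce(MethodHandles.lookup(), "n", Long.class);
            return value;
        }
    }

    public static void main(String[] args) {
        Entry cp5 = new Entry(), cp9 = new Entry(); // structurally equivalent entries
        boolean stable = cp5.resolve().equals(cp5.resolve()) && cp9.resolve().equals(cp9.resolve());
        boolean distinct = !cp5.resolve().equals(cp9.resolve());
        System.out.println("stable=" + stable + " distinct=" + distinct); // stable=true distinct=true
    }
}
```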

— John
