The idea of implicit vs default
Jonathan F
livedinsquares at gmail.com
Sat Jan 20 22:34:26 UTC 2024
Hi John - thanks for the lucid explanation of why zero is special to the VM
for numerous optimisations etc. (I know you’ve written about this before.)
But just to be clear about my original point, I’m not advocating a non-zero
implicit constructor. I like zero 8^) . See my reply to Brian: I’m just
hoping we can have (or get away with…) a simpler way of describing value
object creation, to avoid the idea of ‘implicit’. I’d love a fairly tidy
view of the world for us non-expert developers, even if it’s an illusion!
best wishes
Jonathan Finn
On 20 January 2024 at 22:01:27, John Rose (john.r.rose at oracle.com) wrote:
Thanks, Brian. Let me add some more thoughts about this, because it really
isn’t a case of “you guys missed an obvious move” or “you don’t want us
programmers to have good tools”.
The VM really, really likes its zeroes. This is because zero is the initial
state of any scalar. Null is a kind of zero, from this point of view.
Low-level data structure always needs to bootstrap from something definite,
and Java bootstraps from a very small menu of zeroes (and null and false).
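
As a small illustration of that menu, here is a minimal sketch of what a freshly created object or array looks like before any user code has assigned to it. The class name InitialStates and its fields are made up for this example.

```java
// Hypothetical class, used only to show the VM's initial states.
final class InitialStates {
    int count;      // never assigned: reads as 0
    Object ref;     // never assigned: reads as null
    boolean flag;   // never assigned: reads as false

    public static void main(String[] args) {
        InitialStates s = new InitialStates();
        long[] longs = new long[3];          // every element starts as 0
        String[] strings = new String[3];    // every element starts as null

        System.out.println(s.count + " " + s.ref + " " + s.flag);
        System.out.println(longs[0] + " " + strings[0]);
    }
}
```

Nothing in the source asks for those values; they are simply what the storage holds before the first assignment.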
We could imagine a software stack where zeroes are not privileged. In fact,
at the source level, the special role of zeroes can be suppressed almost
completely, except for array creation. But it’s there, every time you start
creating an object or array. If we try to take the idea of de-privileging
zeroes and push it down into the VM, bad things happen. The VM physics are
not friendly; you will see poorer performance if you try to dictate
user-defined initial states. This is what Brian means when he talks about
“paint rollers”. Zero-colored paint is the standard paint in the Java
stack, and you get a volume discount on it.
On the other hand, it might seem to be just a “matter of software”,
arbitrarily adjustable, to allow programmers to create user-defined initial
states. To support a whole spectrum of “paint colors”, one for each job.
But for Java it is not a mere “matter of software” and that is why we
appeal to the (metaphor of) physics of computation.
So, forget for a second about values, and try the mental exercise of
redesigning the Java language (as of today), and its translation strategy
to a VM, and the VM itself, so that all initial states are user
controllable. You will need a few months to get a good start on this, and
you will find it touches many parts of the JLS and JVMS. Don’t forget the
Java Memory Model, and installing the correct happens-before states for a
reference that initializes to point to another object. In the end, you will
find you don’t want to finish this exercise. We’ve done enough of it,
ourselves, in the years we’ve been working on Valhalla, to know we won’t
enjoy it.
So we don’t want to do it in Valhalla either, even “just for value
objects”.
One place where things would go wrong is array creation performance. Recall
that null is privileged, so that when you have an array that is created
with reference elements they are set to null. (And if it has flattened value
objects, any and all of their reference fields are set to null, in every
array element.) That works so well and so simply it is easy to miss what
just happened: The GC, with all its complex invariants about what goes
where, starts “thinking about” an array element just after the zeroes are
stored, and it “knows there’s nothing there”. When you store a non-null
reference, the GC has to “start thinking some more” about that variable. It
might even update a transactional log for that store operation.
Now imagine a VM feature which made arrays initialized to some
non-zero/non-null pattern. What must happen? Well, for many GCs (those
with store barriers) the GC must register the value of each original
reference stored in that array. Even if you are going to overwrite it
immediately, the micro-states of the array (while it is under construction,
while it has a mix of default values and really useful values) must be
correctly managed. (Because the GC might have to collect storage while the
array is partially populated.) In the end, setting up an array to
user-defined default values turns into AN EXTRA PASS OVER EVERY ARRAY. (Put
another way, it is in effect an assignment operation to every array
element, not present in the code, but costly.) This extra complexity in VM
physics turns into costs at the level of hardware (memory fabric) physics.
(You might try something “lazier”, like an array fill pointer, but that has
its own costs, and bug tail.) In the end, after all the heroics are done,
what would we get in return? People who dislike zeroes could use non-zero
values in their value types. Not a real prize for any self-respecting hero;
not a good tradeoff.
As others have already pointed out, you, the value class author, can always
find a way to cope with those initial zeroes. If you really really are
stuck on 42, then write your field accessor to add or xor with 42. If you
really like some particular non-null reference, adjust the field accessor
accordingly. But don’t ask the VM to do these trivial chores for you,
because it will make the rest of the system slower and/or more complex.
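
A minimal sketch of that accessor trick follows, written as an ordinary final class because value-class syntax is not assumed here. The names Answer, MASK, raw and value() are illustrative, not taken from any Valhalla API.

```java
// Hypothetical class: the all-zero stored state is made to *mean* 42.
final class Answer {
    private static final int MASK = 42;

    // Stored representation. Left at the VM's zero initial state by the
    // no-arg constructor, so no extra store is needed for the "default".
    private int raw;

    Answer() {}                                    // raw stays 0

    Answer(int logical) { raw = logical ^ MASK; }  // encode on the way in

    int value() { return raw ^ MASK; }             // decode on the way out
}
```

Here `new Answer().value()` yields 42 while the stored bits are still all zero, and `new Answer(7).value()` yields 7. The cost of the encoding falls on the class author's accessor rather than on the VM's initialization path.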
For another system which did it the other way, please look at how C++
object constructors interact with C++ array creation. It is awkward, hard
to understand, bug-prone, and expensive. We don’t choose to adopt those
costs into the Java language or VM.
On the other hand, there will be frequent use cases where the user wants to
place a non-default value as the initial state of every element of some new
array. That’s part of the programmer’s toolkit, after all. That shouldn’t
be done at the level of the VM or language, obviously, since different use
cases will choose different initial values. So this is a job for library
APIs, not the language or VM. (Maybe the language should provide sugar; that
will come later, maybe.) And, as long as we are talking about use cases for
array construction, sometimes the initial array element is a FUNCTION of
the index. Obviously not a job for the VM or language (unless there’s
sugar); this is a library job.
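
For plain arrays today, the existing java.util.Arrays methods already cover both cases. The sketch below uses Arrays.fill for a single caller-chosen value and Arrays.setAll for a value computed from the index. The wrapper class ArrayInit and its method names are made up for illustration.

```java
import java.util.Arrays;

// Hypothetical helper showing library-level initialization of arrays.
final class ArrayInit {
    // Same caller-chosen value in every element. The array arrives from
    // the VM as all zeroes; the fill is an explicit second pass.
    static int[] filled(int length, int initial) {
        int[] a = new int[length];
        Arrays.fill(a, initial);
        return a;
    }

    // Each element is a function of its index (here, the square of i).
    static int[] squares(int length) {
        int[] a = new int[length];
        Arrays.setAll(a, i -> i * i);
        return a;
    }
}
```

The extra pass is visible in the code and is paid only by callers who ask for it, which is the point of leaving this to libraries.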
So we are not saying your flat value arrays must always have that one
globally defined zero-rich value. We are saying that zero-rich values have
a privileged position in the language and VM, but the real action will
always be in the library APIs.
Are arrays the only reason we are “sticking with zero”? They certainly make
the problem very notable, but any large collection of objects will also
have similar extra costs, analogous to the GC-related costs I pointed out
above, if their initialization is not allowed to be rich in zeros and
(especially) nulls. Surely many of you on this mailing list have had
moments when, as a Java programmer, you weighed the cost of leaving a field
uninitialized (and working with the resulting zero as the first state) vs.
initializing it in the constructor to a value that made more logical sense.
Sometimes that choice makes for better performance if you don’t execute
that first assignment. Now imagine that a value class you wish to use has a
non-zero default which makes variables of that type slightly slower to
initialize (because of impacts on the GC and maybe others). You wouldn’t
thank the value author for this; you might send them an email asking them
to push your desired embrace of zeroes into their class as well, so your
class instances (in their flat value fields) will set up faster.
Ultimately, our choice to support only zero-rich default/implicit/initial
values is a push like that, once and for all, everywhere. It helps all
programmers by helping the VM focus its optimizations on globally known
values. Only the one paint color that has the bulk discount. And there can
only be one that gets the full discount, since remembering one state
requires zero (lg 1) bits.
I hope this helps. I know it’s complex and subtle. We’ve been wrestling
with this particular issue for many years.
On 20 Jan 2024, at 12:48, Brian Goetz wrote:
> This is a nice idea, and it has come around several times in the design
> discussions. From a the-system-stops-at-the-source-code perspective, it
> seems fine; you declare a constructor to make "the default value", and
> arrange that this constructor is only ever called once (during class
> preparation, most likely), to initialize the "stamp". Then you use the
> stamp to stamp out default values. Easy, right?