The idea of implicit vs default

John Rose john.r.rose at oracle.com
Sat Jan 20 22:01:21 UTC 2024


Thanks, Brian.  Let me add some more thoughts about this, because it really isn’t a case of “you guys missed an obvious move” or “you don’t want us programmers to have good tools”.

The VM really, really likes its zeroes.  This is because zero is the initial state of any scalar.  Null is a kind of zero, from this point of view.  Low-level data structures always need to bootstrap from something definite, and Java bootstraps from a very small menu of zeroes (and null and false).

We could imagine a software stack where zeroes are not privileged.  In fact, at the source level, the special role of zeroes can be suppressed almost completely, except for array creation.  But it’s there, every time you start creating an object or array.  If we try to take the idea of de-privileging zeroes and push it down into the VM, bad things happen.  The VM physics are not friendly; you will see poorer performance if you try to dictate user-defined initial states.  This is what Brian means when he talks about “paint rollers”.  Zero-colored paint is the standard paint in the Java stack, and you get a volume discount on it.
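As a small illustration of that “very small menu”, here is a sketch in today’s Java.  The class name ZeroDefaults and its fields are made up for this example; the only thing it relies on is the standard rule that unassigned fields and fresh array elements start out as zero, false, or null.

```java
import java.util.Arrays;

// Hypothetical class, for illustration only.
final class ZeroDefaults {
    int count;      // never assigned: starts as 0
    boolean flag;   // never assigned: starts as false
    double ratio;   // never assigned: starts as 0.0
    String label;   // never assigned: starts as null

    // A fresh primitive array is all zeroes.
    static int[] freshInts(int n) {
        return new int[n];
    }

    // A fresh reference array is all nulls.
    static String[] freshStrings(int n) {
        return new String[n];
    }

    public static void main(String[] args) {
        ZeroDefaults d = new ZeroDefaults();
        System.out.println(d.count + " " + d.flag + " " + d.ratio + " " + d.label);
        System.out.println(Arrays.toString(freshInts(3)));
        System.out.println(Arrays.toString(freshStrings(3)));
    }
}
```

No constructor body and no array-fill loop appears anywhere in this code; the zeroes come with the storage.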

On the other hand, it might seem to be just a “matter of software”, arbitrarily adjustable, to allow programmers to create user-defined initial states.  To support a whole spectrum of “paint colors”, one for each job.  But for Java it is not a mere “matter of software” and that is why we appeal to the (metaphor of) physics of computation.

So, forget for a second about values, and try the mental exercise of redesigning the Java language (as of today), and its translation strategy to a VM, and the VM itself, so that all initial states are user controllable.  You will need a few months to get a good start on this, and you will find it touches many parts of the JLS and JVMS.  Don’t forget the Java Memory Model, and installing the correct happens-before states for a reference that initializes to point to another object.  In the end, you will find you don’t want to finish this exercise.  We’ve done enough of it, ourselves, in the years we’ve been working on Valhalla, to know we won’t enjoy it.

So we don’t want to do it in Valhalla either, even “just for value objects”.

One place where things would go wrong is array creation performance.  Recall that null is privileged, so that when an array is created with reference elements, they are set to null.  (And if it holds flattened value objects, any and all of their reference fields are set to null, in every array element.)  That works so well and so simply it is easy to miss what just happened:  The GC, with all its complex invariants about what goes where, starts “thinking about” an array element just after the zeroes are stored, and it “knows there’s nothing there”.  When you store a non-null reference, the GC has to “start thinking some more” about that variable.  It might even update a transactional log for that store operation.

Now imagine a VM feature which made arrays initialized to some non-zero/non-null pattern.  What must happen?  Well, for many GCs (those with store barriers) the GC must register the value of each original reference stored in that array.  Even if you are going to overwrite it immediately, the micro-states of the array (while it is under construction, while it has a mix of default values and really useful values) must be correctly managed.  (Because the GC might have to collect storage while the array is partially populated.)  In the end, setting up an array to user-defined default values turns into AN EXTRA PASS OVER EVERY ARRAY.  (Put another way, it is in effect an assignment operation to every array element, not present in the code, but costly.)  This extra complexity in VM physics turns into costs at the level of hardware (memory fabric) physics.  (You might try something “lazier”, like an array fill pointer, but that has its own costs, and bug tail.)  In the end, after all the heroics are done, what would we get in return?  People who dislike zeroes could use non-zero values in their value types.  Not a real prize for any self-respecting hero; not a good tradeoff.

As others have already pointed out, you, the value class author, can always find a way to cope with those initial zeroes.  If you really really are stuck on 42, then write your field accessor to add or xor with 42.  If you really like some particular non-null reference, adjust the field accessor accordingly.  But don’t ask the VM to do these trivial chores for you, because it will make the rest of the system slower and/or more complex.
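Here is a rough sketch of that accessor trick, written as an ordinary class since it does not depend on any Valhalla syntax.  Everything named here (Ticket, BIAS, DEFAULT_OWNER, ALL_ZERO) is hypothetical; ALL_ZERO merely stands in for whatever the VM would hand you as the all-zeroes instance.

```java
// Hypothetical class, for illustration only.
final class Ticket {
    private static final int BIAS = 42;                  // the "logical default" priority
    private static final String DEFAULT_OWNER = "nobody"; // the "logical default" owner

    // Stored form is (logical priority XOR BIAS), so stored 0 decodes to 42.
    private final int biasedPriority;
    // Stored null decodes to DEFAULT_OWNER.
    private final String ownerOrNull;

    // Stand-in for the all-zeroes state: 0 and null in every field.
    static final Ticket ALL_ZERO = new Ticket(0, null);

    private Ticket(int biasedPriority, String ownerOrNull) {
        this.biasedPriority = biasedPriority;
        this.ownerOrNull = ownerOrNull;
    }

    // Factory that encodes the logical values into the stored form.
    static Ticket of(int priority, String owner) {
        return new Ticket(priority ^ BIAS, owner);
    }

    // Accessors decode on the way out; the VM never sees the number 42.
    int priority() {
        return biasedPriority ^ BIAS;
    }

    String owner() {
        return ownerOrNull != null ? ownerOrNull : DEFAULT_OWNER;
    }
}
```

With this encoding, Ticket.ALL_ZERO.priority() is 42 and Ticket.ALL_ZERO.owner() is "nobody", while Ticket.of(7, "alice") round-trips to 7 and "alice".  The cost is one xor or one null check in the accessor, paid only by the class that wants the unusual default.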

For another system which did it the other way, please look at how C++ object constructors interact with C++ array creation.  It is awkward, hard to understand, bug-prone, and expensive.  We don’t choose to adopt those costs into the Java language or VM.

On the other hand, there will be frequent use cases where the user wants to place a non-default value as the initial state of every element of some new array.  That’s part of the programmer’s toolkit, after all.  That shouldn’t be done at the level of the VM or language, obviously, since different use cases will choose different initial values.  So this is a job for library APIs, not the language or VM.  (Maybe the language should provide sugar; that will come later, maybe.)  And, as long as we are talking about use cases for array construction, sometimes the initial array element is a FUNCTION of the index.  Obviously not a job for the VM or language (unless there’s sugar); this is a library job.
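For what it’s worth, today’s java.util.Arrays already covers both cases for ordinary arrays.  The sketch below uses the existing Arrays.fill and Arrays.setAll methods; the wrapper class ArrayInit and its method names are made up for the example, and nothing here is a proposal for what a future value-array API would look like.

```java
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
final class ArrayInit {
    // Every element gets the same caller-chosen value.
    static int[] filled(int n, int value) {
        int[] a = new int[n];      // starts as all zeroes
        Arrays.fill(a, value);     // explicit pass, visible in the code
        return a;
    }

    // Each element is a function of its index.
    static int[] squares(int n) {
        int[] a = new int[n];
        Arrays.setAll(a, i -> i * i);
        return a;
    }

    // Same idea for a reference array.
    static String[] labels(int n) {
        String[] a = new String[n]; // starts as all nulls
        Arrays.setAll(a, i -> "item-" + i);
        return a;
    }
}
```

Here filled(3, 42) yields [42, 42, 42], squares(4) yields [0, 1, 4, 9], and labels(2) yields [item-0, item-1].  The pass over the array is still there, but it is written by the caller who wants it, rather than imposed on every array creation.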

So we are not saying your flat value arrays must always have that one globally defined zero-rich value.  We are saying that it has a privileged position in the language and VM, but the real action will always be in the library APIs.

Are arrays the only reason we are “sticking with zero”?  They certainly make the problem very notable, but any large collection of objects will also have similar extra costs, analogous to the GC-related costs I pointed out above, if their initialization is not allowed to be rich in zeros and (especially) nulls.  Surely many of you on this mailing list have had moments when, as a Java programmer, you weighed the cost of leaving a field uninitialized (and working with the resulting zero as the first state) vs. initializing it in the constructor to a value that made more logical sense.  Sometimes that choice makes for better performance if you don’t execute that first assignment.  Now imagine that a value class you wish to use has a non-zero default which makes variables of that type slightly slower to initialize (because of impacts on the GC and maybe others).  You wouldn’t thank the value author for this; you might send them an email asking them to push your desired embrace of zeroes into their class as well, so your class instances (in their flat value fields) will set up faster.

Ultimately, our choice to support only zero-rich default/implicit/initial values is a push like that, once and for all, everywhere.  It helps all programmers by helping the VM focus its optimizations on globally known values.  Only the one paint color that has the bulk discount.  And there can only be one that gets the full discount, since remembering one state requires zero (lg 1) bits.

I hope this helps.  I know it’s complex and subtle.  We’ve been wrestling with this particular issue for many years.

On 20 Jan 2024, at 12:48, Brian Goetz wrote:

> This is a nice idea, and it has come around several times in the design discussions.  From a the-system-stops-at-the-source-code perspective, it seems fine; you declare a constructor to make "the default value", and arrange that this constructor is only ever called once (during class preparation, most likely), to initialize the "stamp".  Then you use the stamp to stamp out default values.  Easy, right?



