A class per static field? Why or why not?

Wed Dec 7 16:39:51 UTC 2022

Thanks, Dan, for sharing the investigation and for asking the right 
questions.  A few comments inline.

On 12/7/2022 10:52 AM, Dan Heidinga wrote:
> Continuing on the Class init progression discussion....
>
> Why don't we put every static field in its own class?

Pedantic correction: we're only talking about static finals with 
initializers.  Mutable statics have arbitrarily complicated 
initialization lifecycles, and that's just how it is; static finals that 
are initialized in `static { }` blocks already have their lifecycle 
complected with other writes in those blocks.

> The obvious answer is that it's too much mental load for developers. 
> But if we put that aside for a moment, and assume that we have 
> infinitely smart developers, it might be useful to understand why we 
> don't program like this now.  Or what programming like this might 
> actually look like.
>
> Putting every static field in its own class trivially gives us lazy 
> static fields (sorry John, no new JEP required in this world) with 
> each static only being initialized when actually accessed.
>
> It gives each static field a clear initialization point where we can 
> more easily tell what caused a particular static to be initialized.
>
> It makes it easier to determine the true dependency graph between 
> static fields rather than today's "soupy" model.

Some possible reasons (just brainstorming here):

  - It's more code, both at the declaration site (wrap it in a class) 
and the use site (qualify it with a class name).  Developers instantly 
see this cost, but it may take longer to see the benefit.
  - Perception that this is more heavyweight, since classes are 
"obviously" more heavyweight than variables.
  - Thinking about lifecycles is hard.  If the easy thing -- declare a 
bunch of statics and initialize them -- works, this is what developers 
will do, and are unlikely to revisit it until something doesn't work.
  - More importantly, lifecycle mostly becomes relevant when your code 
is used in a bigger system, and at coding time, that's a distant-future 
worry.  Like other crosscutting concerns such as concurrency and 
security, thinking about deployment / redeployment / startup 
characteristics is hard to focus on when you're trying to get your code 
to work, and it's easy to forget to go back and think about it after you 
get your code to work.
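
To make the declaration-site and use-site cost concrete, here is a minimal 
sketch of what "a class per static final" might look like when written by 
hand.  All names (`Config`, `Greeting`, `Farewell`, `init`, `LOG`) are 
hypothetical and chosen for illustration only; the `LOG` list exists just to 
make the initialization order observable.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: none of these names come from the thread.
final class Config {
    // Records which holders have been initialized, in order.
    static final List<String> LOG = new ArrayList<>();

    private Config() {}

    // One nested holder class per static final; each gets its own <clinit>,
    // which runs only when that holder's field is first accessed.
    static final class Greeting {
        static final String VALUE = init("Greeting", "hello");
        private Greeting() {}
    }

    static final class Farewell {
        static final String VALUE = init("Farewell", "goodbye");
        private Farewell() {}
    }

    // Stand-in for an arbitrary (non-constant) initializer expression.
    private static String init(String name, String value) {
        LOG.add(name);
        return value;
    }
}
```

The use site becomes `Config.Greeting.VALUE` rather than `Config.GREETING`, 
and reading `Greeting.VALUE` does not initialize `Farewell`.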

So, I think the answer is: people follow the path of least resistance, 
and the path of least resistance here leads to someplace "good enough" 
to get things working but which sows the seeds of long-term technical 
debt.  The PoLR today is good enough that people can get to something 
that mostly works without thinking very hard. If we can make the PoLR 
lead someplace better, that's what winning will look like.

> It doesn't solve the "soupy" <clinit> problem as developers can still 
> do arbitrary things in the <clinit> but it does reduce the problem as 
> it moves a lot of code out of the common <clinit> as each static now 
> has its own <clinit>.  Does this make analysis more tractable?

I agree with your (implicit) intuition that if we could get to a world 
where we only complected initialization lifecycles rarely, rather than 
routinely, then it would be more practical to characterize those as 
"weirdo" cases for which the answer is "rewrite/don't use that code if 
you want <benefit X>".  The problem today is that way too much code uses 
the existing soupy mechanisms -- but only some smaller fraction of it, 
which is hard to identify either by human or automated analysis, 
implicitly depends on the initialization-order semantics of the existing 
mechanisms.
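
As a small illustration of that kind of implicit dependence, consider the 
following sketch (the class and member names are hypothetical).  The value of 
`SNAPSHOT` is determined by the textual order of the two initializers, 
something neither a reader nor a simple tool is likely to notice.

```java
// Hypothetical example; names are illustrative only.
final class Soup {
    // Runs first in textual order, so it reads LIMIT before LIMIT's
    // initializer has executed.
    static final int SNAPSHOT = readLimit();

    // Not a compile-time constant (the initializer is a method call),
    // so it is assigned during <clinit>, after SNAPSHOT.
    static final int LIMIT = Integer.parseInt("42");

    private Soup() {}

    private static int readLimit() {
        return LIMIT; // still the default value 0 when called from SNAPSHOT's initializer
    }
}
```

Here `Soup.SNAPSHOT` ends up as 0 while `Soup.LIMIT` is 42.  Swapping the two 
declarations, or giving each field its own holder class, would change 
`SNAPSHOT` to 42, which is the sort of behavioral difference a migration would 
have to account for.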

> In our investigation [0], we focused on the underlying JVM physics of 
> classes and looked at the memory use of this approach.  Which was 
> estimated to average out to under 1K per class.

Semantics and boilerplate aside, this seems amenable to a "Loomy" move, 
which is: "make the expensive thing less expensive, rather than asking 
users to resort to complex workarounds."

> What do other languages do with their equivalent of static state? Are 
> there different design points for expressing static state we should be 
> investigating to better enable shifting computation to different 
> points in time?

One of the things that accidentally makes our lives harder here is that 
most other languages do not specify semantics as carefully as Java does, 
so the answer is sometimes "whatever the implementation does."  For 
better or worse, Java is much more precise at specifying what triggers 
class initialization.
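
One example of that precision, sketched with hypothetical classes (`Trace`, 
`Constants`, `Computed` are made up for illustration): reading a static final 
whose initializer is a compile-time constant does not trigger initialization 
of its class, while reading one with a computed initializer does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that records which classes have been initialized.
final class Trace {
    static final List<String> LOG = new ArrayList<>();
    private Trace() {}
}

final class Constants {
    // Constant variable: javac inlines the value at use sites, and reading
    // it is not an initialization trigger.
    static final int K = 7;
    static { Trace.LOG.add("Constants"); }
    private Constants() {}
}

final class Computed {
    // Initializer is a method call, so this is not a constant variable;
    // the first read of V initializes Computed.
    static final String V = String.valueOf(7);
    static { Trace.LOG.add("Computed"); }
    private Computed() {}
}
```

Reading `Constants.K` leaves `Trace.LOG` empty; reading `Computed.V` adds 
"Computed" to it.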

Looking at the most Java-like languages:

  - C# allows members to be declared static, supports field initializers 
like Java, and supports "static constructors" (similar to `static { }` 
blocks in Java, but with a constructor-like syntax) which are run at 
class initialization time.  If a static constructor is present, it does 
the same soupy thing, where field initializers are run in textual order 
prior to running the static constructor; if no static constructor is 
present, the spec is cagey about when static field initializers are run, 
but they appear to all be run in the textual order:

> 14.5.6.2 Static field initialization
> The static field variable initializers of a class correspond to a 
> sequence of assignments that are executed in the textual order in 
> which they appear in the class declaration (§14.5.6.1). Within a 
> partial class, the meaning of “textual order” is specified by 
> §14.5.6.1. If a static constructor (§14.12) exists in the class, 
> execution of the static field initializers occurs immediately prior to 
> executing that static constructor. Otherwise, the static field 
> initializers are executed at an implementation-dependent time prior to 
> the first use of a static field of that class.

  - Scala and Kotlin ditched "static" as a modifier, instead offering 
"companion objects" (singleton classes).  While the two models are 
equally expressive, companion objects have us syntactically segregate 
the static parts of a class into a single entity, and encourage us to 
think about the static parts as a whole rather than individual members.

Kotlin:
     class X {
         companion object {
             // per-class fields and methods here
         }
     }

Members of the companion object can be qualified with the class name, or 
used unqualified, just as in Java.

Scala lets you declare something similar as a top level entity:

     class X { ... }
     object X { ... }

with more complex rules that treat a class and an object with the same 
name as being two facets of the same entity.  (You can have an object 
separate from a class; it's just a class whose members are effectively 
static and which is initialized the first time one of its members is 
accessed.)

The approach of companion objects rather than static members provides a 
useful nudge to thinking of the static parts of a class as being a 
single, independent entity.
