A class per static field? Why or why not?
Continuing on the Class init progression discussion.... Why don't we put every static field in its own class?

The obvious answer is that it's too much mental load for developers. But if we put that aside for a moment and assume that we have infinitely smart developers, it might be useful to understand why we don't program like this now, or what programming like this might actually look like.

Putting every static field in its own class trivially gives us lazy static fields (sorry John, no new JEP required in this world), with each static only being initialized when actually accessed. It gives each static field a clear initialization point, where we can more easily tell what caused a particular static to be initialized. It makes it easier to determine the true dependency graph between static fields, rather than today's "soupy" model.

It doesn't solve the "soupy" <clinit> problem, as developers can still do arbitrary things in the <clinit>, but it does reduce the problem: it moves a lot of code out of the common <clinit>, as each static now has its own <clinit>. Does this make analysis more tractable?

In our investigation [0], we focused on the underlying JVM physics of classes and looked at the memory use of this approach, which was estimated to average out to under 1K per class.

What do other languages do with their equivalent of static state? Are there different design points for expressing static state we should be investigating to better enable shifting computation to different points in time?

--Dan

[0] http://cr.openjdk.java.net/~heidinga/leyden/ClassInitPlan.pdf
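To make the class-per-field idea concrete, here is a minimal Java sketch (all names are illustrative, not from the investigation) of how wrapping each static in its own holder class yields per-field laziness via the JVM's ordinary class-initialization rules:

```java
public class Demo {
    static String expensiveLookup(String key) {
        System.out.println("initializing " + key);
        return key.equals("port") ? "8080" : "example.org";
    }

    // Class-per-field style: each static lives in its own holder class.
    // JVM initialization rules then make each field lazy, with a distinct,
    // observable initialization point.
    static class HostHolder {
        static final String HOST = expensiveLookup("host");
    }
    static class PortHolder {
        static final int PORT = Integer.parseInt(expensiveLookup("port"));
    }

    public static void main(String[] args) {
        System.out.println("before first use");
        System.out.println(HostHolder.HOST); // triggers only HostHolder's <clinit>
        // PortHolder's initializer never runs unless PORT is actually read.
    }
}
```

Since HOST is initialized by a method call rather than a constant expression, reading it triggers HostHolder's initialization and nothing else; PortHolder stays uninitialized until PORT is touched.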
Thanks, Dan, for sharing the investigation and for asking the right questions. A few comments inline. On 12/7/2022 10:52 AM, Dan Heidinga wrote:
Continuing on the Class init progression discussion....
Why don't we put every static field in its own class?
Pedantic correction: we're only talking about static finals with initializers. Mutable statics have arbitrarily complicated initialization lifecycles, and that's just how it is; static finals that are initialized in `static { }` blocks already have their lifecycle complected with other writes in those blocks.
The obvious answer is that it's too much mental load for developers. But if we put that aside for a moment, and assume that we have infinitely smart developers, it might be useful to understand why we don't program like this now. Or what programming like this might actually look like.
Putting every static field in its own class trivially gives us lazy static fields (sorry John, no new JEP required in this world) with each static only being initialized when actually accessed.
It gives each static field a clear initialization point where we can more easily tell what caused a particular static to be initialized.
It makes it easier to determine the true dependency graph between static fields rather than today's "soupy" model.
Some possible reasons (just brainstorming here):

- It's more code, both at the declaration site (wrap it in a class) and the use site (qualify it with a class name). Developers instantly see this cost, but it may take longer to see the benefit.
- Perception that this is more heavyweight, since classes are "obviously" more heavyweight than variables.
- Thinking about lifecycles is hard. If the easy thing -- declare a bunch of statics and initialize them -- works, this is what developers will do, and they are unlikely to revisit it until something doesn't work.
- More importantly, lifecycle mostly becomes relevant when your code is used in a bigger system, and at coding time that's a distant-future worry. Like other crosscutting concerns such as concurrency and security, thinking about deployment / redeployment / startup characteristics is hard to focus on when you're trying to get your code to work, and it's easy to forget to go back and think about it after you get your code to work.

So, I think the answer is: people follow the path of least resistance, and the path of least resistance here leads to someplace "good enough" to get things working, but which sows the seeds of long-term technical debt. The PoLR today is good enough that people can get to something that mostly works without thinking very hard. If we can make the PoLR lead someplace better, that's what winning will look like.
It doesn't solve the "soupy" <clinit> problem as developers can still do arbitrary things in the <clinit> but it does reduce the problem as it moves a lot of code out of the common <clinit> as each static now has its own <clinit>. Does this make analysis more tractable?
I agree with your (implicit) intuition that if we could get to a world where we only complected initialization lifecycles rarely, rather than routinely, then it would be more practical to characterize those as "weirdo" cases for which the answer is "rewrite/don't use that code if you want <benefit X>". The problem today is that way too much code uses the existing soupy mechanisms -- but only some smaller fraction of it, which is hard to identify either by human or automated analysis, implicitly depends on the initialization-order semantics of the existing mechanisms.
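To make the soupiness concrete, a small Java example (names illustrative): all of a class's field initializers and `static { }` blocks fold into one <clinit> and run in textual order, so every static's lifecycle is implicitly tied to line order:

```java
import java.util.ArrayList;
import java.util.List;

public class Soup {
    static final List<String> LOG = new ArrayList<>();

    static final String A = record("A", "prefix");
    static {
        // Arbitrary code in the shared <clinit>; runs between A's and B's
        // initializers because of textual order alone.
        LOG.add("static block");
    }
    static final String B = record("B", Soup.A + "-suffix");

    static String record(String name, String value) {
        LOG.add(name);
        return value;
    }

    public static void main(String[] args) {
        System.out.println(LOG); // [A, static block, B]
        System.out.println(B);   // prefix-suffix
    }
}
```

Reordering the declarations silently changes behavior: if B's line moved above A's, the qualified read `Soup.A` would still compile but observe null at runtime, which is exactly the kind of hidden order dependence that resists both human and automated analysis.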
In our investigation [0], we focused on the underlying JVM physics of classes and looked at the memory use of this approach, which was estimated to average out to under 1K per class.
Semantics and boilerplate aside, this seems amenable to a "Loomy" move, which is: "make the expensive thing less expensive, rather than asking users to resort to complex workarounds."
What do other languages do with their equivalent of static state? Are there different design points for expressing static state we should be investigating to better enable shifting computation to different points in time?
One of the things that accidentally makes our lives harder here is that most other languages do not specify semantics as carefully as Java does, so the answer is sometimes "whatever the implementation does." For better or worse, Java is much more precise about specifying what triggers class initialization.

Looking at the most Java-like languages:

- C# allows members to be declared static, supports field initializers like Java, and supports "static constructors" (similar to `static { }` blocks in Java, but with a constructor-like syntax) which are run at class initialization time. If a static constructor is present, it does the same soupy thing, where field initializers are run in textual order prior to running the static constructor; if no static constructor is present, the spec is cagey about when static field initializers are run, but they all appear to be run in textual order:
14.5.6.2 Static field initialization

    The static field variable initializers of a class correspond to a sequence of assignments that are executed in the textual order in which they appear in the class declaration (§14.5.6.1). Within a partial class, the meaning of "textual order" is specified by §14.5.6.1. If a static constructor (§14.12) exists in the class, execution of the static field initializers occurs immediately prior to executing that static constructor. Otherwise, the static field initializers are executed at an implementation-dependent time prior to the first use of a static field of that class.
- Scala and Kotlin ditched "static" as a modifier, instead offering "companion objects" (singleton classes). While the two models are equally expressive, companion objects have us syntactically segregate the static parts of a class into a single entity, and encourage us to think about the static parts as a whole rather than as individual members.

Kotlin:

```kotlin
class X {
    companion object {
        // per-class fields and methods here
    }
}
```

Members of the companion object can be qualified with the class name, or used unqualified, just as in Java.

Scala lets you declare something similar as a top-level entity:

```scala
class X { ... }
object X { ... }
```

with more complex rules that treat a class and an object with the same name as two facets of the same entity. (You can have an object separate from a class; it's just a class whose members are effectively static and which is initialized the first time one of its members is accessed.)

The approach of companion objects rather than static members provides a useful nudge toward thinking of the static parts of a class as a single, independent entity.
On Wed, Dec 7, 2022 at 11:40 AM Brian Goetz <brian.goetz@oracle.com> wrote:
So, I think the answer is: people follow the path of least resistance, and the path of least resistance here leads to someplace "good enough" to get things working but which sows the seed for long-term technical debt. The PoLR today is good enough that people can get to something that mostly works without thinking very hard. If we can make the PoLR lead someplace better, that's what winning will look like.
+1. One additional challenge here is that the deployment model affects the end destination for the path. Paving the PoLR to make lazy init more common / easier can result in making earlier init (build time or shifted early) more difficult, and vice versa. The PoLR should ideally lead developers to say "as early as possible (build time) or as lazy as possible, I don't care which", so the VM has as much freedom as possible. Really, what they often want to say is "don't affect my startup time with this operation", but they don't have a good way to express that both early and late are valid solutions.
The approach of companion objects rather than static members provides a useful nudge to thinking of the static parts of a class as being a single, independent entity.
Independent entity, yes. Single, maybe. We group the statics of a class into a single <clinit> today but we may want multiple groupings if we can give them different initialization points (lifecycles). Maybe that's just putting them in a different class but whatever we pick here will affect the PoLR discussed above and "single" may not be the right model given the classes we already have. --Dan
The approach of companion objects rather than static members provides a useful nudge to thinking of the static parts of a class as being a single, independent entity.
Independent entity, yes. Single, maybe.
Right. The companion mechanism in Kotlin pushes pretty hard at "single"; the companion mechanism in Scala is somewhere in the middle, where it uses a magic name association between a class called X and an object called X, but you can also have objects whose name is separate from any class, and such an object can stand as an independent sub-part. If we went down this road, we would probably go even farther, where the analogue of `object` would be more like a general-purpose singleton class which you could freely mix and match with. It's not all that different from using IODH today from an expressiveness perspective, but (as with enums) it moves instance management from the user's side of the ledger to the language's side. Let's say that this is a possibility we could explore if we suspected there were a bigger potential payoff.
On 07/12/2022 15:52, Dan Heidinga wrote:
Why don't we put every static field in its own class?
There's a war story to be shared here :-) - you might want to take a look at what the jextract tool does: https://github.com/openjdk/jextract

To generate native bindings, jextract needs _a lot_ of constants, downcall method handles, memory layouts, var handles, ... The first iteration of jextract used static final fields, and startup time was horrible given the presence of so many field inits. The current iteration of jextract uses a class holder for N constants (with N configurable). This is a trade-off, to get good startup while reducing the number of classes generated by the tool.

That said, the number of classes generated by jextract is _still_ astonishingly high (with big libraries you can easily get 200 of them). While they are only loaded on demand, users are often scared by this. For this reason jextract also provides filtering capabilities: if you only ever need to interact with 5 native functions, skip generation for everything else. This model has been serving us well so far, which confirms your "hunch".

Also, a recent "tip" - since we can now have static fields inside methods (via local classes), we can also do this:

```java
X computeOnce() {
    class Holder {
        static final X x = ...
    }
    return Holder.x;
}
```

which seems better than having N unrelated classes scattered around (although you will still see the classfiles for them if you run javac). We have not tweaked jextract to do this (yet).

Of course, tools like jextract are a natural "killer application" for lazy statics. Lazy statics would allow jextract to generate a lot less code. Not only would the source code be more compact, but the generated classfiles would be much, much smaller too.

Cheers
Maurizio
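A hypothetical sketch (not jextract's actual generated code; all names invented) of the holder-per-N-constants trade-off Maurizio describes: grouping N constants per holder class means touching any one constant initializes its whole group, but divides the class count by N:

```java
// Illustrative only; real jextract output differs. Here N = 3:
// reading any one of A, B, C runs all three initializers, but three
// constants now cost one class instead of three holder classes.
final class ConstantsGroup0 {
    static final long A = init("A");
    static final long B = init("B");
    static final long C = init("C");

    private static long init(String name) {
        System.out.println("init " + name);
        return name.hashCode();
    }
}

public class Bindings {
    public static void main(String[] args) {
        // First touch of any member initializes the whole group of three.
        System.out.println(ConstantsGroup0.A == (long) "A".hashCode());
    }
}
```

Picking N is the startup-versus-laziness dial: N = 1 is the class-per-field model from the start of the thread, while a large N approaches today's single shared <clinit>.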
participants (3)
- Brian Goetz
- Dan Heidinga
- Maurizio Cimadamore