Thoughts on peeling and readability

Timo Kinnunen timo.kinnunen at gmail.com
Tue Dec 15 16:55:38 UTC 2015


Another use case I had in mind is a more compact storage form for values added to a collection. An ArrayList holding every value of a 32-bit value type at the same time would use about 17 GB of memory and would require splitting the backing array, at least currently. A Set holding the same values would use about 540 MB by using 1 bit per value to record set membership.
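The arithmetic behind those two figures can be checked with a small sketch. The class and method names below are made up for illustration, and the calculation assumes a densely packed array with no per-element overhead and a bit width that is a multiple of 8.

```java
// Sketch only: names are hypothetical; assumes bitsPerValue is a multiple
// of 8 and small enough that the results fit in a long.
final class DenseStorageEstimate {

    // Bytes needed to store every distinct value once in a flat array.
    static long arrayBytes(int bitsPerValue) {
        long distinctValues = 1L << bitsPerValue;
        return distinctValues * (bitsPerValue / 8);
    }

    // Bytes needed for a membership bitmap with one bit per distinct value.
    static long bitSetBytes(int bitsPerValue) {
        long distinctValues = 1L << bitsPerValue;
        return distinctValues / 8;
    }

    public static void main(String[] args) {
        System.out.println(arrayBytes(32));   // 17179869184 bytes, about 17 GB
        System.out.println(bitSetBytes(32));  // 536870912 bytes, about 540 MB
        // A single Java array cannot have 2^32 elements, hence the splitting.
        System.out.println((1L << 32) > Integer.MAX_VALUE); // true
    }
}
```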

Also, one final point I’d like to make about sentinels and generating them. While I can think of ways of using sentinels without making new ones myself, there’s a problem. I could decide to partition the values I am given into partition A and partition B. Then, when I need a new sentinel, I can use one value from partition B as a sentinel in partition A and vice versa. Neat! However, I may have just inadvertently used someone’s social security number as a sentinel, plastering it all over memory with no way of knowing that it contained PII. And if the program contains a data race, well, you can imagine the rest.
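A rough sketch of the partitioning trick, with int standing in for an arbitrary value type. Everything here is an assumption made for illustration: the sign bit as the partitioning rule, the fixed sentinel choices, and the class name. In generic code the borrowed sentinels would be values the collection was handed rather than constants it could pick.

```java
import java.util.Arrays;

// Sketch only: an open-addressing set split into two partitions, each using
// a value from the other partition as its "empty slot" marker.
final class PartitionedIntSet {
    private static final int EMPTY_A = -1; // a partition-B value, never stored in table A
    private static final int EMPTY_B = 0;  // a partition-A value, never stored in table B

    private final int[] tableA; // holds non-negative values
    private final int[] tableB; // holds negative values

    PartitionedIntSet(int capacityPerPartition) {
        tableA = new int[capacityPerPartition];
        tableB = new int[capacityPerPartition];
        Arrays.fill(tableA, EMPTY_A);
        Arrays.fill(tableB, EMPTY_B);
    }

    boolean add(int value) {
        return value >= 0 ? insert(tableA, EMPTY_A, value)
                          : insert(tableB, EMPTY_B, value);
    }

    boolean contains(int value) {
        return value >= 0 ? find(tableA, EMPTY_A, value)
                          : find(tableB, EMPTY_B, value);
    }

    // Linear probing; returns false if the value was already present.
    private static boolean insert(int[] table, int empty, int value) {
        int start = Math.floorMod(value, table.length);
        for (int i = 0; i < table.length; i++) {
            int slot = (start + i) % table.length;
            if (table[slot] == value) return false;
            if (table[slot] == empty) { table[slot] = value; return true; }
        }
        throw new IllegalStateException("partition full");
    }

    private static boolean find(int[] table, int empty, int value) {
        int start = Math.floorMod(value, table.length);
        for (int i = 0; i < table.length; i++) {
            int slot = (start + i) % table.length;
            if (table[slot] == value) return true;
            if (table[slot] == empty) return false;
        }
        return false;
    }
}
```

Note that every empty slot in table A holds a copy of a real partition-B value, which is exactly the concern: a borrowed value ends up replicated throughout memory regardless of what it means.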

What is needed is a way of generating values that are inert, structurally sound and semantically meaningless, possibly semi-random but otherwise plausible. Anything else just feels various degrees of wrong. Without that, the fact that values are one level of indirection less far away than usual, and that they are so neatly packaged, sure is making sun.misc.Unsafe start to look really tempting!






-- 
Have a nice day, 
Timo



From: Brian Goetz
Sent: Sunday, December 13, 2015 20:08
To: Timo Kinnunen
Cc: Maurizio Cimadamore; Paul Benedict; valhalla-dev at openjdk.java.net
Subject: Re: Thoughts on peeling and readability

Primitives like int and long are "special" in that all bit patterns are valid and there are no integrity constraints that would prevent a client from requesting a specific bit pattern.  But this is the special case, not the general case.  A linguistic construct like T.default would have to work for *all* T, not just the special cases.  

There's nothing to stop you from writing an implementation that takes advantage of knowledge of specific types like int; there's a range of options there.  What we're uninterested in doing is allowing clients to have unsafe bit-level access to the representation of all value types.  There's also nothing to stop you from writing value types that allow raw-bit operations; we're just not going to require that every value type support that (which is what asking for a bit-oriented T.default would be.)  

T.default will almost certainly not go through a constructor; the VM will zero out the bits as it does with the existing built-in types (the eight primitive types plus references.)  This process is pretty obvious from both a specification and implementation perspective, but it does create some responsibility for writers of value types -- specifically the need to deal with the default bit pattern.  
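A minimal sketch of what "dealing with the default bit pattern" might look like for the author of a value type. Value types are not available in released Java, so an ordinary final class stands in here, and the ZEROED constant only simulates what a zero-initialized T.default could produce; the class and its methods are hypothetical.

```java
// Sketch only: a plain class standing in for a value type.
final class Fraction {
    // Simulates the all-zero instance the VM would hand out without
    // running the validating constructor.
    static final Fraction ZEROED = new Fraction();

    private final long num;
    private final long den; // the validating constructor requires den != 0

    private Fraction() {
        this.num = 0;
        this.den = 0;
    }

    Fraction(long num, long den) {
        if (den == 0) throw new IllegalArgumentException("zero denominator");
        this.num = num;
        this.den = den;
    }

    // The author has to decide what the all-zero state means. Here it is
    // read as 0/1 instead of being allowed to divide by zero.
    double toDouble() {
        long effectiveDen = (den == 0) ? 1 : den;
        return (double) num / effectiveDen;
    }

    public static void main(String[] args) {
        System.out.println(new Fraction(1, 2).toDouble()); // 0.5
        System.out.println(ZEROED.toDouble());             // 0.0
    }
}
```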

As our friends in the .NET community have discovered, trying to enforce that the no-arg ctor is always executed before a value is exposed is a game of whack-a-mole, so having T.default go through a constructor simply reduces the probability of the "implicit null" surprise, but doesn't banish it.  One can be as snarky as one likes about the tradeoffs ("reinvented null" vs "reinvented serialization"), but the reality here is that there are risks lurking around both corners.  Personally I like the tradeoff we're converging towards, but we are aware it is not perfect.  


Stepping back, rather than arguing the merits or demerits of a particular solution (which at this point we've more than exhausted), it's far more helpful to talk about the problem instead of the solution.  So, let me ask -- what use cases are you concerned about, other than the manufacture of multiple sentinels for use inside data structure implementations?  

(I suspect in the end you will find that you will be able to accomplish what you need with the tools available, but the "let me specify the bit pattern for an arbitrary value type" approach is not the way to get there.)

On 12/13/2015 1:32 PM, Timo Kinnunen wrote:
Well, it’s ints and longs, and all primitive types copyable around as ints and longs, and all objects serializable to and from arrays of ints and longs, and all arrays of such, and all values made of such, and all arrays of such values, and all values made of such values, aaand I’m probably missing a dimension or two somewhere.
 
These values are just as valid regardless of which bit patterns, all-zero or not, were used to construct them. They are safe to be copied around and, if you implemented them yourself, hashCode, equals, toString and any component-wise operations could also be done safely. Such operations simply can’t call any foreign code of any of the value or reference types involved. We don’t expect that we can take an arbitrary Object, use reflection to zero out its fields and then be able to call its instance methods like nothing had happened either. So, for reference types these operations would have to be done using reflection; for value types VarHandles might give better performance.
 
Ultimately I guess it all depends on whether T.default invokes some constructor or not. If it doesn’t or if the constructor is specified to always succeed trivially then we have reinvented null. Our new nulls trade off a large number of NPEs for silent invariant violations. This could be a good tradeoff but without knowing about the consequences of the violations beforehand it’s gonna be hard to say for certain. The good news is that hey, null is back!
 
If T.default executes a constructor that can refuse an all-zero bit pattern, then we have reinvented serialization for value types and are requiring all value types support it. Our serialization protocol only recognizes one input value and can only deserialize, so it’s a bit useless. But with the addition of the missing serialize-function we can then define transforms for long[] <-> val <-> long[] and will have a quite general and capable system already.
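As an illustration of the long[] <-> val <-> long[] idea, here is a sketch of a single type that opts in to such transforms and refuses encodings that break its invariant, including the all-zero one. The type, its invariant and the method names toLongs/fromLongs are invented for this example and are not part of any Valhalla proposal.

```java
// Sketch only: a plain class standing in for a value type that chooses to
// expose a validated raw encoding.
final class Port {
    private final int number; // invariant: 1 <= number <= 65535

    private Port(int number) {
        this.number = number;
    }

    static Port of(int number) {
        if (number < 1 || number > 65535) {
            throw new IllegalArgumentException("invalid port: " + number);
        }
        return new Port(number);
    }

    // "Serialize": expose the state as an array of longs.
    long[] toLongs() {
        return new long[] { number };
    }

    // "Deserialize": accept only encodings that satisfy the invariant,
    // so the all-zero pattern is rejected.
    static Port fromLongs(long[] bits) {
        if (bits.length != 1 || bits[0] < 1 || bits[0] > 65535) {
            throw new IllegalArgumentException("not a valid Port encoding");
        }
        return new Port((int) bits[0]);
    }

    int number() {
        return number;
    }
}
```

A round trip such as Port.fromLongs(Port.of(8080).toLongs()) yields a Port with the same number, while Port.fromLongs(new long[] { 0 }) throws.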
 
Or are we gonna just include the worst parts from both?
 





-- 
Have a nice day, 
Timo

 
 

From: Brian Goetz
Sent: Sunday, December 13, 2015 01:20
To: Timo Kinnunen
Cc: Maurizio Cimadamore;Paul Benedict;valhalla-dev at openjdk.java.net
Subject: Re: Thoughts on peeling and readability
 
 
No.  In general, to/from raw bit operations are not safe except in a few corner cases (like int and long.)  Values are not uncontrolled buckets of bits.  
 
On the other hand, T.default *is* safe, because every type has a default bit pattern which is the initialization of any otherwise uninitialized field or array element.  (It so happens that this default bit pattern corresponds to all zero bits for all types, though this is mostly a convenience for VM implementors.)  For a composite value, the default value is comprised of the default value for all fields.  By *definition*, the all-zero bit pattern is a valid element of all value types.  However, there is no guarantee that any other bit pattern is valid for any given value type.  
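The existing behavior this builds on can be observed in current Java: fields and array elements that are never assigned start out at their type's default value. The sketch below uses invented class names, and the nested reference only approximates a composite, since today's classes are reference types rather than values.

```java
// Sketch only: demonstrates default initialization of fields and array
// elements in current Java.
final class Holder {
    int i;
    long l;
    double d;
    boolean b;
    char c;
    Object ref;
}

final class DefaultValuesDemo {
    public static void main(String[] args) {
        Holder h = new Holder();
        System.out.println(h.i);                             // 0
        System.out.println(h.l);                             // 0
        System.out.println(Double.doubleToRawLongBits(h.d)); // 0, i.e. all bits clear
        System.out.println(h.b);                             // false
        System.out.println((int) h.c);                       // 0
        System.out.println(h.ref);                           // null

        // Array elements are initialized the same way.
        int[] ints = new int[3];
        Holder[] holders = new Holder[3];
        System.out.println(ints[2]);    // 0
        System.out.println(holders[2]); // null
    }
}
```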
 
If a particular value type wants to expose to/from raw bit constructors, that’s fine — but you’re asking for a language feature that applies to *all* values — and there is no guarantee that this is a safe operation for all values.  
 
On Dec 12, 2015, at 5:44 PM, Timo Kinnunen <timo.kinnunen at gmail.com> wrote:



Field layout and bit fiddling isn’t exactly what I was thinking. Rather I was thinking something like Float.floatToRawIntBits() and Double.doubleToRawLongBits(), but without having to know about the types Float and Double or how many bits are in their raw bits. So something like this syntax:
 
    static <any T> T nextUp (T value) {
        <?missing type?> rawBits = T.toRawBits(value);
        T nextValue = T.fromRawBits(rawBits + 1);
        return nextValue;
    }
 
This should fit in Valhalla reasonably well, as it is just a generalization of T.default with its complement operation included. And as it is, all of the problems you listed already apply to T.default. For example, a value type with one long field: If the long value in the field is a handle pointing to a memory-mapped buffer then any use of a default value of such a type could cause a crash. Which can include asking a properly constructed value if it is equal to any of the values in an array you have.
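For comparison, this is roughly what the non-generic version looks like today for float and double, using the existing raw-bits methods. The class and method names are made up. The sketch assumes non-negative, finite, non-NaN inputs, where incrementing the raw bits agrees with Math.nextUp; for negative inputs the same increment moves away from zero instead.

```java
// Sketch only: type-specific counterparts of the generic nextUp idea.
final class RawBitsNextUp {

    // Valid for non-negative, non-NaN, finite inputs.
    static float nextUpFloat(float value) {
        int rawBits = Float.floatToRawIntBits(value);
        return Float.intBitsToFloat(rawBits + 1);
    }

    // Valid for non-negative, non-NaN, finite inputs.
    static double nextUpDouble(double value) {
        long rawBits = Double.doubleToRawLongBits(value);
        return Double.longBitsToDouble(rawBits + 1);
    }

    public static void main(String[] args) {
        System.out.println(nextUpFloat(1.0f) == Math.nextUp(1.0f));     // true
        System.out.println(nextUpDouble(1.0) == Math.nextUp(1.0));      // true
        // For a negative input the increment goes the other way.
        System.out.println(nextUpFloat(-1.0f) == Math.nextDown(-1.0f)); // true
    }
}
```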




-- 
Have a nice day, 
Timo

 
 

From: Brian Goetz
Sent: Saturday, December 12, 2015 18:21
To: Timo Kinnunen;Maurizio Cimadamore;Paul Benedict;valhalla-dev at openjdk.java.net
Subject: Re: Thoughts on peeling and readability
 
 
Precise layout and bit control of values are anti-goals of Valhalla, so
we're not really exploring this direction at this time.
 
The problem with approaches like the one you suggest is they fall apart
as soon as you leave the realm of "primitives modeled as values."  What
about values that have refs in them?  What about values whose
representations are private?  Their implementation is supposed to be in
sole control of their representation.  This runs contrary to the "codes
like a class" dictum.
 
 
 
 
 
 
On 12/12/2015 4:43 AM, Timo Kinnunen wrote:
> Hi,
> 
> One thing that I don’t remember seeing is any syntax for constructing arbitrary values in generic code without having to know about the precise field layouts and what the meaning of such fields is. Something like T.default but for values other than 0. Perhaps T.default(12345) or some such?
> 
> Or maybe this is slated to go with bytecode type specialization… What sort of syntax is envisioned to be driving that anyways?
> 
> 
> 
> 
> 
 
 
 



