Thoughts on peeling and readability

Vitaly Davidovich vitalyd at gmail.com
Wed Dec 16 13:42:40 UTC 2015


If you're rolling your own data structure that needs a sentinel, why not
pass a factory in the constructor that can instantiate a T from some
parameter (e.g. an int) and use that to create the sentinels? This assumes
there is no new generic constraint capability that lets you constrain T to
have a constructor of some shape.
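
To make the factory idea concrete, here is a rough sketch using today's
generics rather than any Valhalla syntax. SentinelSet, its methods and the
choice of 0 and 1 as factory arguments are all made up for illustration;
the only real API used is java.util.function.IntFunction. It assumes the
caller's factory returns values that will never be added as elements.

```java
import java.util.Arrays;
import java.util.function.IntFunction;

/** Hypothetical fixed-capacity open-addressing set whose sentinels come from a factory. */
final class SentinelSet<T> {
    private final T empty;    // marks a slot that was never used
    private final T deleted;  // marks a slot whose element was removed (tombstone)
    private final Object[] slots;

    SentinelSet(int capacity, IntFunction<T> sentinelFactory) {
        // Assumption: the factory yields two distinct values for 0 and 1,
        // and the caller never adds either of them as a real element.
        this.empty = sentinelFactory.apply(0);
        this.deleted = sentinelFactory.apply(1);
        this.slots = new Object[capacity];
        Arrays.fill(slots, empty);
    }

    private boolean isSentinel(Object o) {
        return o.equals(empty) || o.equals(deleted);
    }

    boolean add(T value) {
        if (isSentinel(value)) {
            throw new IllegalArgumentException("value collides with a sentinel");
        }
        int start = Math.floorMod(value.hashCode(), slots.length);
        int free = -1; // first reusable slot seen while probing
        for (int i = 0; i < slots.length; i++) {
            int idx = (start + i) % slots.length;
            Object s = slots[idx];
            if (s.equals(empty)) {
                if (free < 0) free = idx;
                break; // an empty slot ends the probe sequence
            }
            if (s.equals(deleted)) {
                if (free < 0) free = idx;
                continue;
            }
            if (s.equals(value)) return false; // already present
        }
        if (free < 0) throw new IllegalStateException("set is full");
        slots[free] = value;
        return true;
    }

    boolean contains(T value) {
        return indexOf(value) >= 0;
    }

    boolean remove(T value) {
        int idx = indexOf(value);
        if (idx < 0) return false;
        slots[idx] = deleted; // leave a tombstone so later probes keep going
        return true;
    }

    private int indexOf(T value) {
        if (isSentinel(value)) return -1;
        int start = Math.floorMod(value.hashCode(), slots.length);
        for (int i = 0; i < slots.length; i++) {
            int idx = (start + i) % slots.length;
            Object s = slots[idx];
            if (s.equals(empty)) return -1;
            if (s.equals(value)) return idx;
        }
        return -1;
    }
}
```

A caller who knows that, say, Integer.MIN_VALUE and Integer.MIN_VALUE + 1
never occur in their data could write
new SentinelSet<Integer>(8, i -> Integer.MIN_VALUE + i). The data structure
itself never has to invent a bit pattern for T; that responsibility stays
with whoever knows the type.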

I don't think we want to go the reflection or Unsafe route here.

On Tuesday, December 15, 2015, Timo Kinnunen <timo.kinnunen at gmail.com>
wrote:

> Another use-case I had in mind is a more compact storage form when added
> into a collection. An ArrayList permitting all values of a 32-bit value
> type at the same time would use about 17 GB of memory and require
> splitting the array, at least currently. A Set permitting the same would
> use about 540 MB by using 1 bit per value to record set membership.
>
> Also, one final point I’d like to make about sentinels and generating
> them. While I can think of ways of using sentinels without making new ones
> myself, there’s a problem. I could decide to partition the values I am
> given into partition A and partition B. Then, when I need a new sentinel, I
> can use one value from partition B as a sentinel in partition A and vice
> versa. Neat! However, I may have just inadvertently used someone’s social
> security number, plastering it all over memory with no way of knowing
> that it contained PII. And if the program contains a data race,
> well, you can imagine the rest.
>
> What is needed is a way of generating values that are inert, structurally
> sound and semantically meaningless, possibly semi-random but otherwise
> plausible. Anything else just feels various degrees of wrong. Without that
> the fact that values are one level of indirection less far away than usual
> and that they are so neatly packaged sure is making sun.misc.Unsafe start
> to look really tempting!
>
>
>
>
>
>
> --
> Have a nice day,
> Timo
>
> Sent from Mail for Windows 10
>
>
> From: Brian Goetz
> Sent: Sunday, December 13, 2015 20:08
> To: Timo Kinnunen
> Cc: Maurizio Cimadamore; Paul Benedict; valhalla-dev at openjdk.java.net
> Subject: Re: Thoughts on peeling and readability
>
> Primitives like int and long are "special" in that all bit patterns are
> valid and there are no integrity constraints that would prevent a client
> from requesting a specific bit pattern.  But this is the special case, not
> the general case.  A linguistic construct like T.default would have to work
> for *all* T, not just the special cases.
>
> There's nothing to stop you from writing an implementation that takes
> advantage of knowledge of specific types like int; there's a range of
> options there.  What we're uninterested in doing is allowing clients to
> have unsafe bit-level access to the representation of all value types.
> There's also nothing to stop you from writing value types that allow
> raw-bit operations; we're just not going to require that every value type
> support that (which is what asking for a bit-oriented T.default would be.)
>
> T.default will almost certainly not go through a constructor; the VM will
> zero out the bits as it does with the existing built-in types (the eight
> primitive types plus references.)  This process is pretty obvious from both
> a specification and implementation perspective, but it does create some
> responsibility for writers of value types -- specifically the need to deal
> with the default bit pattern.
>
> As our friends in the .NET community have discovered, trying to enforce
> that the no-arg ctor is always executed before a value is exposed is a game
> of whack-a-mole, so having T.default go through a constructor simply
> reduces the probability of the "implicit null" surprise, but doesn't banish
> it.  One can be as snarky as one likes about the tradeoffs ("reinvented
> null" vs "reinvented serialization"), but the reality here is that there
> are risks lurking around both corners.  Personally I like the tradeoff
> we're converging towards, but we are aware it is not perfect.
>
>
> Stepping back, rather than arguing the merits or demerits of a particular
> solution (which at this point we've more than exhausted), it's far more
> helpful to talk about the problem instead of the solution.  So, let me ask
> -- what use cases are you concerned about, other than the manufacture of
> multiple sentinels for use inside data structure implementations?
>
> (I suspect in the end you will find that you will be able to accomplish
> what you need with the tools available, but the "let me specify the bit
> pattern for an arbitrary value type" approach is not the way to get there.)
>
> On 12/13/2015 1:32 PM, Timo Kinnunen wrote:
> Well, it’s ints and longs, and all primitive types copyable around as ints
> and longs, and all objects serializable to and from arrays of ints and
> longs, and all arrays of such, and all values made of such, and all arrays
> of such values, and all values made of such values, aaand I’m probably
> missing a dimension or two somewhere.
>
> These values are just as valid regardless of which bit patterns, all-zero
> or not, were used to construct them. They are safe to be copied around and,
> if you implemented them yourself, hashCode, equals, toString and any
> component-wise operations could also be done safely. Such operations simply
> can’t call any foreign code of any of the value or reference types
> involved. We don’t expect that we can take an arbitrary Object, use
> reflection to zero out its fields and then be able to call its instance
> methods like nothing had happened either. So, for reference types these
> operations would have to be done using reflection; for value types VarHandles
> might give better performance.
>
> Ultimately I guess it all depends on whether T.default invokes some
> constructor or not. If it doesn’t or if the constructor is specified to
> always succeed trivially then we have reinvented null. Our new nulls trade
> off a large number of NPEs for silent invariant violations. This could be a
> good tradeoff but without knowing about the consequences of the violations
> beforehand it’s gonna be hard to say for certain. The good news is that
> hey, null is back!
>
> If T.default executes a constructor that can refuse an all-zero bit
> pattern, then we have reinvented serialization for value types and are
> requiring all value types support it. Our serialization protocol only
> recognizes one input value and can only deserialize, so it’s a bit useless.
> But with the addition of the missing serialize-function we can then define
> transforms for long[] <-> val <-> long[] and will have a quite general and
> capable system already.
>
> Or are we gonna just include the worst parts from both?
>
>
>
>
>
>
> --
> Have a nice day,
> Timo
>
> Sent from Mail for Windows 10
>
>
>
> From: Brian Goetz
> Sent: Sunday, December 13, 2015 01:20
> To: Timo Kinnunen
> Cc: Maurizio Cimadamore;Paul Benedict;valhalla-dev at openjdk.java.net
> Subject: Re: Thoughts on peeling and readability
>
>
> No.  In general, to/from raw bit operations are not safe except in a few
> corner cases (like int and long.)  Values are not uncontrolled buckets of
> bits.
>
> On the other hand, T.default *is* safe, because every type has a default
> bit pattern which is the initialization of any otherwise uninitialized
> field or array element.  (It so happens that this default bit pattern
> corresponds to all zero bits for all types, though this is mostly a
> convenience for VM implementors.)  For a composite value, the default value
> is comprised of the default value for all fields.  By *definition*, the
> all-zero bit pattern is a valid element of all value types.  However, there
> is no guarantee that any other bit pattern is valid for any given value
> type.
>
> If a particular value type wants to expose to/from raw bit constructors,
> that’s fine — but you’re asking for a language feature that applies to
> *all* values — and there is no guarantee that this is a safe operation for
> all values.
>
> On Dec 12, 2015, at 5:44 PM, Timo Kinnunen <timo.kinnunen at gmail.com>
> wrote:
>
>
>
> Field layout and bit fiddling isn’t exactly what I was thinking. Rather I
> was thinking something like Float.floatToRawIntBits() and
> Double.doubleToRawLongBits(), but without having to know about the types
> Float and Double or how many bits are in their raw bits. So something like
> this syntax:
>
>     static <any T> T nextUp(T value) {
>         <?missing type?> rawBits = T.toRawBits(value);
>         T nextValue = T.fromRawBits(rawBits + 1);
>         return nextValue;
>     }
>
> This should fit in Valhalla reasonably well, as it is just a
> generalization of T.default with its complement operation included. And as
> it is, all of the problems you listed already apply to T.default. For
> example, a value type with one long field: If the long value in the field
> is a handle pointing to a memory-mapped buffer then any use of a default
> value of such a type could cause a crash. Which can include asking a
> properly constructed value if it is equal to any of the values in an array
> you have.
>
>
>
>
> --
> Have a nice day,
> Timo
>
> Sent from Mail for Windows 10
>
>
>
> From: Brian Goetz
> Sent: Saturday, December 12, 2015 18:21
> To: Timo Kinnunen;Maurizio Cimadamore;Paul
> Benedict;valhalla-dev at openjdk.java.net
> Subject: Re: Thoughts on peeling and readability
>
>
> Precise layout and bit control of values are anti-goals of Valhalla, so
> we're not really exploring this direction at this time.
>
> The problem with approaches like the one you suggest is they fall apart
> as soon as you leave the realm of "primitives modeled as values."  What
> about values that have refs in them?  What about values whose
> representations are private?  Their implementation is supposed to be in
> sole control of their representation.  This runs contrary to the "codes
> like a class" dictum.
>
>
>
>
>
>
> On 12/12/2015 4:43 AM, Timo Kinnunen wrote:
> > Hi,
> >
> > One thing that I don’t remember seeing is any syntax for constructing
> > arbitrary values in generic code without having to know about the precise
> > field layouts and what the meaning of such fields is. Something like
> > T.default but for values other than 0. Perhaps T.default(12345) or some
> > such?
> >
> > Or maybe this is slated to go with bytecode type specialization… What
> > sort of syntax is envisioned to be driving that anyways?
> >
> >
> >
> >
> >
>
>
>
>
>
>

-- 
Sent from my phone


