Thoughts on peeling and readability
Vitaly Davidovich
vitalyd at gmail.com
Wed Dec 16 15:08:13 UTC 2015
On Wednesday, December 16, 2015, Timo Kinnunen <timo.kinnunen at gmail.com>
wrote:
> One reason is that requiring an additional constructor ripples all over
> the API, making it more cumbersome to use. For example, imagine an
> Arrays.asList() implementation that required passing in a constructor in
> addition to passing in the elements. This adds a lot of weight to an
> otherwise lightweight API.
>
You'd only need such a constraint where your generic type requires
constructing the objects. Specific to this example, only where sentinels
are used (not too common).
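
To make the factory idea from earlier in the thread concrete, here is a
minimal sketch in today's Java, assuming a caller-supplied
IntFunction<T> that maps small ints to reserved values. The SentinelSet
name, its methods, and the choice of linear probing are hypothetical and
only for illustration; nothing here is Valhalla or JDK API.

```java
import java.util.Arrays;
import java.util.Objects;
import java.util.function.IntFunction;

// Hypothetical open-addressing set. The class name and API are illustrative
// assumptions, not part of the JDK or of any Valhalla prototype.
final class SentinelSet<T> {
    private final T empty;    // sentinel marking a slot that was never used
    private final T deleted;  // sentinel marking a removed entry (tombstone)
    private final Object[] slots;
    private int size;

    // The caller owns T, so the caller's factory decides which values of T
    // are reserved as sentinels (index 0 = empty, index 1 = deleted).
    SentinelSet(IntFunction<T> sentinelFactory, int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        this.empty = Objects.requireNonNull(sentinelFactory.apply(0));
        this.deleted = Objects.requireNonNull(sentinelFactory.apply(1));
        if (empty.equals(deleted)) throw new IllegalArgumentException("sentinels must be distinct");
        this.slots = new Object[capacity];
        Arrays.fill(slots, empty);
    }

    int size() { return size; }

    boolean contains(T value) { return find(value) >= 0; }

    boolean add(T value) {
        if (isSentinel(value)) throw new IllegalArgumentException("value is reserved as a sentinel");
        if (find(value) >= 0) return false;  // already present
        int i = home(value);
        for (int probes = 0; probes < slots.length; probes++) {
            // Reuse the first never-used slot or tombstone on the probe path.
            if (isSentinel(slots[i])) { slots[i] = value; size++; return true; }
            i = (i + 1) % slots.length;
        }
        throw new IllegalStateException("set is full");
    }

    boolean remove(T value) {
        int i = find(value);
        if (i < 0) return false;
        slots[i] = deleted;  // leave a tombstone so later probes keep going
        size--;
        return true;
    }

    private boolean isSentinel(Object o) { return empty.equals(o) || deleted.equals(o); }

    private int home(T value) { return Math.floorMod(value.hashCode(), slots.length); }

    // Linear probe; returns the slot index of value, or -1 if absent.
    private int find(T value) {
        if (isSentinel(value)) return -1;  // sentinels are never members
        int i = home(value);
        for (int probes = 0; probes < slots.length; probes++) {
            if (empty.equals(slots[i])) return -1;  // never-used slot ends the probe
            if (value.equals(slots[i])) return i;
            i = (i + 1) % slots.length;
        }
        return -1;
    }
}
```

A caller storing Long values might pass i -> Long.MIN_VALUE + i as the
factory, which reserves two values the caller knows it will never store.
Only the data structure's constructor takes the extra argument; something
like Arrays.asList() never needs sentinels and would be unaffected.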
>
>
> Another reason is that then the code depends on an external party to
> generate good sentinels for its internal use. What is a “good sentinel
> value” is an implementation detail that might have to be documented. As the
> client is only concerned with generating real values for its own uses and
> the implementation cares about sentinels, an API where these are mixed goes
> against the separation of concerns design principle.
>
Yes it does "leak" into the client. But, the client/caller "owns" the
concrete type and so if you want to somehow piggyback on it for sentinel
use, getting the caller involved in figuring that out makes some sense.
Scribbling some random bits over their type seems worse.
>
>
> And I agree, we don’t want to go with Unsafe or reflection, but we may not
> be able to avoid it.
>
>
>
>
>
> --
> Have a nice day,
> Timo
>
> Sent from Mail for Windows 10
>
>
>
>
> *From: *Vitaly Davidovich <vitalyd at gmail.com>
> *Sent: *Wednesday, December 16, 2015 14:42
> *To: *Timo Kinnunen <timo.kinnunen at gmail.com>
> *Cc: *Brian Goetz <brian.goetz at oracle.com>;
> valhalla-dev at openjdk.java.net
> *Subject: *Re: Thoughts on peeling and readability
>
>
>
> If you're rolling your own data structure that needs a sentinel, why not
> pass a factory in the constructor that can instantiate a T from some
> parameter (e.g. int) and use that to create the sentinels? This is assuming
> no new generic constraint capability that allows constraining the T to
> having a constructor of some shape.
>
>
>
> I don't think we want to go the reflection or Unsafe route here.
>
> On Tuesday, December 15, 2015, Timo Kinnunen <timo.kinnunen at gmail.com>
> wrote:
>
> Another use-case I had in mind is a more compact storage form when added
> into a collection. An ArrayList permitting all values of a 32-bit value
> type at the same time would use about 17 GB of memory and require
> splitting the array, at least currently. A Set permitting the same would
> use about 540 MB by using 1 bit per value to record set membership.
>
> Also, one final point I’d like to make about sentinels and generating
> them. While I can think of ways of using sentinels without making new ones
> myself, there’s a problem. I could decide to partition the values I am
> given into partition A and partition B. Then, when I need a new sentinel, I
> can use one value from partition B as a sentinel in partition A and vice
> versa. Neat! However, I may have just inadvertently used someone’s social
> security number plastering it all over the memory with no way of knowing
> there was PII contained in them. And if the program contains a data race,
> well, you can imagine the rest.
>
> What is needed is a way of generating values that are inert, structurally
> sound and semantically meaningless, possibly semi-random but otherwise
> plausible. Anything else just feels various degrees of wrong. Without that
> the fact that values are one level of indirection less far away than usual
> and that they are so neatly packaged sure is making sun.misc.Unsafe start
> to look really tempting!
>
>
>
>
>
>
> --
> Have a nice day,
> Timo
>
> Sent from Mail for Windows 10
>
>
> From: Brian Goetz
> Sent: Sunday, December 13, 2015 20:08
> To: Timo Kinnunen
> Cc: Maurizio Cimadamore; Paul Benedict; valhalla-dev at openjdk.java.net
> Subject: Re: Thoughts on peeling and readability
>
> Primitives like int and long are "special" in that all bit patterns are
> valid and there are no integrity constraints that would prevent a client
> from requesting a specific bit pattern. But this is the special case, not
> the general case. A linguistic construct like T.default would have to work
> for *all* T, not just the special cases.
>
> There's nothing to stop you from writing an implementation that takes
> advantage of knowledge of specific types like int; there's a range of
> options there. What we're uninterested in doing is allowing clients to
> have unsafe bit-level access to the representation of all value types.
> There's also nothing to stop you from writing value types that allow
> raw-bit operations; we're just not going to require that every value type
> support that (which is what asking for a bit-oriented T.default would be.)
>
> T.default will almost certainly not go through a constructor; the VM will
> zero out the bits as it does with the existing built-in types (the eight
> primitive types plus references.) This process is pretty obvious from both
> a specification and implementation perspective, but it does create some
> responsibility for writers of value types -- specifically the need to deal
> with the default bit pattern.
>
> As our friends in the .NET community have discovered, trying to enforce
> that the no-arg ctor is always executed before a value is exposed is a game
> of whack-a-mole, so having T.default go through a constructor simply
> reduces the probability of the "implicit null" surprise, but doesn't banish
> it. One can be as snarky as one likes about the tradeoffs ("reinvented
> null" vs "reinvented serialization"), but the reality here is that there
> are risks lurking around both corners. Personally I like the tradeoff
> we're converging towards, but we are aware it is not perfect.
>
>
> Stepping back, rather than arguing the merits or demerits of a particular
> solution (which at this point we've more than exhausted), it's far more
> helpful to talk about the problem instead of the solution. So, let me ask
> -- what use cases are you concerned about, other than the manufacture of
> multiple sentinels for use inside data structure implementations?
>
> (I suspect in the end you will find that you will be able to accomplish
> what you need with the tools available, but the "let me specify the bit
> pattern for an arbitrary value type" approach is not the way to get there.)
>
> On 12/13/2015 1:32 PM, Timo Kinnunen wrote:
> Well, it’s ints and longs, and all primitive types copyable around as ints
> and longs, and all objects serializable to and from arrays of ints and
> longs, and all arrays of such, and all values made of such, and all arrays
> of such values, and all values made of such values, aaand I’m probably
> missing a dimension or two somewhere.
>
> These values are just as valid regardless of which bit patterns, all-zero
> or not, were used to construct them. They are safe to be copied around and,
> if you implemented them yourself, hashCode, equals, toString and any
> component-wise operations could also be done safely. Such operations simply
> can’t call any foreign code of any of the value or reference types
> involved. We don’t expect that we can take an arbitrary Object, use
> reflection to zero out its fields and then be able to call its instance
> methods like nothing had happened either. So, for reference types these
> operations would have to be done using reflection, for value types VarHandles
> might give better performance.
>
> Ultimately I guess it all depends on whether T.default invokes some
> constructor or not. If it doesn’t or if the constructor is specified to
> always succeed trivially then we have reinvented null. Our new nulls trade
> off a large number of NPEs for silent invariant violations. This could be a
> good tradeoff but without knowing about the consequences of the violations
> beforehand it’s gonna be hard to say for certain. The good news is that
> hey, null is back!
>
> If T.default executes a constructor that can refuse an all-zero bit
> pattern, then we have reinvented serialization for value types and are
> requiring all value types support it. Our serialization protocol only
> recognizes one input value and can only deserialize, so it’s a bit useless.
> But with the addition of the missing serialize-function we can then define
> transforms for long[] <-> val <-> long[] and will have a quite general and
> capable system already.
>
> Or are we gonna just include the worst parts from both?
>
>
>
>
>
>
> --
> Have a nice day,
> Timo
>
> Sent from Mail for Windows 10
>
>
>
> From: Brian Goetz
> Sent: Sunday, December 13, 2015 01:20
> To: Timo Kinnunen
> Cc: Maurizio Cimadamore;Paul Benedict;valhalla-dev at openjdk.java.net
> Subject: Re: Thoughts on peeling and readability
>
>
> No. In general, to/from raw bit operations are not safe except in a few
> corner cases (like int and long.) Values are not uncontrolled buckets of
> bits.
>
> On the other hand, T.default *is* safe, because every type has a default
> bit pattern which is the initialization of any otherwise uninitialized
> field or array element. (It so happens that this default bit pattern
> corresponds to all zero bits for all types, though this is mostly a
> convenience for VM implementors.) For a composite value, the default value
> is comprised of the default value for all fields. By *definition*, the
> all-zero bit pattern is a valid element of all value types. However, there
> is no guarantee that any other bit pattern is valid for any given value
> type.
>
> If a particular value type wants to expose to/from raw bit constructors,
> that’s fine — but you’re asking for a language feature that applies to
> *all* values — and there is no guarantee that this is a safe operation for
> all values.
>
> On Dec 12, 2015, at 5:44 PM, Timo Kinnunen <timo.kinnunen at gmail.com>
> wrote:
>
>
>
> Field layout and bit fiddling isn’t exactly what I was thinking. Rather I
> was thinking something like Float.floatToRawIntBits() and
> Double.doubleToRawLongBits(), but without having to know about the types
> Float and Double or how many bits are in their raw bits. So something like
> this syntax:
>
> static <any T> T nextUp(T value) {
>     <?missing type?> rawBits = T.toRawBits(value);
>     T nextValue = T.fromRawBits(rawBits + 1);
>     return nextValue;
> }
>
> This should fit in Valhalla reasonably well, as it is just a
> generalization of T.default with its complement operation included. And as
> it is, all of the problems you listed already apply to T.default. For
> example, a value type with one long field: If the long value in the field
> is a handle pointing to a memory-mapped buffer then any use of a default
> value of such a type could cause a crash. Which can include asking a
> properly constructed value if it is equal to any of the values in an array
> you have.
>
>
>
>
> --
> Have a nice day,
> Timo
>
> Sent from Mail for Windows 10
>
>
>
> From: Brian Goetz
> Sent: Saturday, December 12, 2015 18:21
> To: Timo Kinnunen; Maurizio Cimadamore; Paul Benedict;
> valhalla-dev at openjdk.java.net
> Subject: Re: Thoughts on peeling and readability
>
>
> Precise layout and bit control of values are anti-goals of Valhalla, so
> we're not really exploring this direction at this time.
>
> The problem with approaches like the one you suggest is they fall apart
> as soon as you leave the realm of "primitives modeled as values." What
> about values that have refs in them? What about values whose
> representations are private? Their implementation is supposed to be in
> sole control of their representation. This runs contrary to the "codes
> like a class" dictum.
>
>
>
>
>
>
> On 12/12/2015 4:43 AM, Timo Kinnunen wrote:
> > Hi,
> >
> > One thing that I don’t remember seeing is any syntax for constructing
> > arbitrary values in generic code without having to know about the precise
> > field layouts and what the meaning of such fields is. Something like
> > T.default but for values other than 0. Perhaps T.default(12345) or some
> > such?
> >
> > Or maybe this is slated to go with bytecode type specialization… What
> > sort of syntax is envisioned to be driving that anyways?
> >
> >
> >
> >
> >
>
>
>
>
>
>
> --
> Sent from my phone
>
>
>
--
Sent from my phone