sentinels & who owns the value space Re: valhalla-dev Digest, Vol 17, Issue 10

Thomas W twhitmore.nz at gmail.com
Fri Dec 18 23:52:22 UTC 2015


Hi Mohammad.

> A properly written data structure (see trove, gs collections and koloboke
for examples) does
> not preclude the client from using the sentinel values in the collection
or for other purposes. null is a sentinel in hashmap, yet
> null keys are fully supported. A properly written data structure hides
the actual sentinel values from the client.

Thanks for your email. You may be referring to a sentinel for _entries_,
which is a different issue from sentinels for _keys_ as _entries_ are not
the client value space. I'm not sure it would even be correct terminology
to describe null entries as "sentinels"; they're definitely not sentinel
key values.

If the collection requires a sentinel for internal meaning, an unwrapped
null sentinel value cannot be used in a collection allowing null values
from the client where the same storage element is required to store both.

This is due to the simple fact that the meaning of that sentinel cannot not
be distinguished from all valid client values. Don't forget, if you want to
consider eg. a byte type -- the client has 256 valid values, plus you're
saying the collection has 1 more (the sentinel) for a total of 257 values.

Bytes do not allow 257 values. When I said information theory, I meant it.

We are discussing use of sentinels in value space _directly_. I expect you
are referring to alternative approaches using additional storage/ or
indirection, which make it easy to solve the problem -- see below. However
your (very vague) references presumably to those approaches don't seem to
shed any light on the (different) topic which we were discussing.

Sentinels directly in value space, with no extra storage, indirection or
workarounds, is really what we were discussing.

> No information theory is broken, because the client does not need to be
aware of the internal sentinels used in the data
> structure.

Information theory then tells us that a collection allowing 256 valid
client values, plus distinguishing a sentinel, must using more than 1 byte
for storage. Indirection, reference-type sentinels or additional structure
are all possible. The collection will not be using just the declared
datatype for storage, since those 257 distinct meanings cannot be coded in
a single byte.

Once we accept a need for indirection/ extra storage, the 'empty cell'
problem is already solved -- the collection has space for a flag/ or can
easily mask null with a unique non-null reference, etc. Simplest solutions
are the most direct; which would seem to leave no benefit (only negatives)
for random-bit approaches.


Regards,
Thomas


More information about the valhalla-dev mailing list