Implementing Towards Better Serialization

Mon Aug 10 12:48:03 UTC 2020

> Most definitions come attached to a ball of domain theory or category
> theory.  Let me take a whack at it without all that.
>
> Imagine two domains, B and S,  for “big” and “small”.  The idea is that
> there is a subset of B that looks and behaves like S, so we are going to
> *embed* S in B.  We have two functions:
>
>     e : S -> B
>     p : B -> S
>
> We’ll start by saying that p is partial on B; for some values, it throws
> (this can be refined).  Let’s let B’ be the subset of B where it does not
> throw, and pg be the restriction of p  to B’ (the “good” part of p).  What
> we want is for the composition e-then-p to behave like an identity, and
> also for pg-then-e.  (Waving hands about whether I mean == or .equals or
> some sort of observational equivalence.)
>
> An example may help.  Let S=int and B=Integer.  We can embed S in B by
> boxing; we can project B to S by unboxing, for all of B except null.
> (In domains like this where things like + are defined on both, e and p are
> homomorphic.)  Similarly, there are e-p pairs from int to double (though
> this requires more mechanics since casting from 3.14 to int doesn’t throw,
> it truncates, but the notion can be extended by defining B’ to be the image
> of e(S), and defining a partial ordering for approximation, where throwing
> (bottom) is the worst possible approximation), and similar with widening
> between int and long.
>
> Here, the e-p pairs we want are between an object’s representation and a
> tuple that carries its external state; I can construct one between Point
> and tuples (x, y), or between List<T> and a tuple (T[]).  If the
> constructor checks invariants (Rational will reject denom=0), that’s where
> the B’ subset comes in.
>
> The point is, this relationship comes up all the time, when we want to
> describe one domain by a “richer” domain.  The set of (n,d) tuples is
> richer than the set of rationals, but if you restrict to the set of tuples
> where d != 0, the two are isomorphic.  The set of doubles is richer than
> the set of ints, but you only have to give a double one push, and then
> you’re in a domain that behaves isomorphically.
>
>
Thanks, this is really interesting, and I must admit I probably understand
at most 80% of it. Are there any texts you can suggest as a starting point
for domain theory or category theory?

So if I get this right, my job as a designer of an interoperable
serialization schema is to create a schema S that forms an
embedding-projection pair with each of the programming language domains B
it has to support. And to take it a step further, the encoded data of an
object forms an embedding-projection pair with the internal representation
of that object, and that internal representation forms an
embedding-projection pair with the target object we started with. Am I
generalizing this construct a bit too far?
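
To check my own reading of the e/p pair, here is a minimal sketch of the
Rational example. All names (Tuple, Rational, embed, project) are made up
for illustration, and the sketch deliberately does not reduce fractions,
since normalizing 2/4 to 1/2 would stop pg-then-e from being an exact
identity.

```java
// Hypothetical names throughout; this only illustrates the e/p idea.
record Tuple(int n, int d) {}          // the "big" domain B: any (n, d) tuple

final class Rational {                 // the "small" domain S
    private final int num;
    private final int den;

    // The constructor checks the invariant, so it is partial on tuples.
    Rational(int num, int den) {
        if (den == 0) throw new IllegalArgumentException("denominator is zero");
        this.num = num;
        this.den = den;
    }

    // e : S -> B, always succeeds.
    Tuple embed() { return new Tuple(num, den); }

    // p : B -> S, throws outside B' (the tuples with d != 0).
    static Rational project(Tuple t) { return new Rational(t.n(), t.d()); }
}

class RationalPairDemo {
    public static void main(String[] args) {
        Tuple good = new Tuple(3, 4);
        // pg-then-e gives back an equal tuple for anything inside B'.
        System.out.println(Rational.project(good).embed().equals(good));
        try {
            Rational.project(new Tuple(1, 0));   // outside B'
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

If this is the right picture, then the constructor is p, the deconstructor
is e, and the invariant check is what carves B' out of B.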

> There is some brittleness, but I think this is counterweighted by the fact
> that the design of the deconstruction feature is meant to mirror that of
> constructors, and I expect that “matching ctor/dtor pairs” will be a common
> programming pattern (because you can derive so many good properties, not
> just serialization, from them.)  So while “methods with names like getXxx”
> are a noisy space to draw from and try to match up with constructors, dtors
> are the natural dual of constructors and so I expect a lot less “drift”.
>
>
As all the examples so far were reasonably trivial from an S->B point of
view, with a direct mapping, I wanted to put together something a little
less trivial. This is something you might find in a protocol where the
header is decoded from two bytes. It's also an example of where the
ctor/dtor proposal wouldn't fit as nicely (unless the state byte was
updated after each change of flags or status).

public class Header {

   enum Status { GOOD(0), BAD(1), UNKNOWN(2), NORMAL(3); ... }

   private Status status;
   private boolean flagOne;
   private boolean flagTwo;
   private byte value;
   private int someCalculatedData;

   public record HeaderCoded(byte state, byte value) {}

   public Header(HeaderCoded coded) {
      // Bits 0-1 carry the status id; bits 2 and 3 carry the two flags.
      this.status = Status.fromId(coded.state() & 0x03);
      this.flagOne = (coded.state() & 0x04) != 0;
      this.flagTwo = (coded.state() & 0x08) != 0;
      this.value = coded.value();
      this.someCalculatedData = (this.flagOne && this.status == Status.GOOD)
            ? doSomeCalculation() : 0;
   }

   public HeaderCoded data() {
      return new HeaderCoded(
            (byte) ((flagTwo ? 0x08 : 0) | (flagOne ? 0x04 : 0) | status.id),
            value);
   }
   ... setters/getters/etc ...
}

Or better yet, the conversion pair can be put into the record. That makes
it easier to hide later if we don't want to expose it in the public API.

public record HeaderCoded(byte state, byte value) {
    // A non-canonical record constructor must delegate to the canonical one.
    HeaderCoded(Header header) {
       this((byte) ((header.flagTwo ? 0x08 : 0) | (header.flagOne ? 0x04 : 0)
             | header.status.id), header.value);
    }
    public Header convert() {
       // Bits 0-1 carry the status id; bits 2 and 3 carry the two flags.
       Status status = Status.fromId(state & 0x03);
       boolean flagOne = (state & 0x04) != 0;
       boolean flagTwo = (state & 0x08) != 0;
       return new Header(status, flagOne, flagTwo, value);
    }
}
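
To convince myself the bit layout round-trips, here is a cut-down,
self-contained sketch of just the state byte. Decoded and StateByte are
hypothetical stand-ins, not part of the classes above, and rejecting the
four unused high bits is my own assumption; the conversions above simply
ignore them. In this direction the byte is the "big" domain and the
(status, flagOne, flagTwo) triple is the "small" one.

```java
// Hypothetical, cut-down stand-ins for the Header example.
enum Status {
    GOOD(0), BAD(1), UNKNOWN(2), NORMAL(3);

    final int id;
    Status(int id) { this.id = id; }

    static Status fromId(int id) {
        for (Status s : values()) {
            if (s.id == id) return s;
        }
        throw new IllegalArgumentException("no status with id " + id);
    }
}

record Decoded(Status status, boolean flagOne, boolean flagTwo) {}

final class StateByte {
    // e: pack bits 0-1 = status id, bit 2 = flagOne, bit 3 = flagTwo.
    static byte encode(Decoded d) {
        return (byte) ((d.flagTwo() ? 0x08 : 0) | (d.flagOne() ? 0x04 : 0) | d.status().id);
    }

    // p: unpack with masks; throws if an unused high bit is set (assumption).
    static Decoded decode(byte state) {
        if ((state & 0xF0) != 0) throw new IllegalArgumentException("unused bits set");
        return new Decoded(Status.fromId(state & 0x03), (state & 0x04) != 0, (state & 0x08) != 0);
    }
}

class StateByteDemo {
    public static void main(String[] args) {
        Decoded d = new Decoded(Status.NORMAL, true, false);
        byte b = StateByte.encode(d);                      // 0x04 | 0x03
        System.out.println(StateByte.decode(b).equals(d)); // e-then-p round trip
    }
}
```

Note that decoding needs & (mask) rather than |, and the flag tests need
!= 0 to agree with the encoder, which sets a bit when the flag is true.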

Given the concept of the embedding-projection pair, I'll throw out a
far-out, left-field idea. The conversions are not really part of the
functionality of the Header class or the HeaderCoded record. They are only
there because we need somewhere to put those methods. We could write the
conversion pair as:

public Header convert(HeaderCoded coded) {
    // Bits 0-1 carry the status id; bits 2 and 3 carry the two flags.
    Status status = Status.fromId(coded.state() & 0x03);
    boolean flagOne = (coded.state() & 0x04) != 0;
    boolean flagTwo = (coded.state() & 0x08) != 0;
    return new Header(status, flagOne, flagTwo, coded.value());
}

public HeaderCoded convert(Header header) {
   return new HeaderCoded(
         (byte) ((header.flagTwo ? 0x08 : 0) | (header.flagOne ? 0x04 : 0)
               | header.status.id),
         header.value);
}

That starts to look a lot like a chance for type cast operator overloading.
Are these the droids I'm looking for? :) It would then be possible to cast
Header into the serialized form:

  Serialized serialized = (Serialized) header;

and back again:

  Header header = (Header) serialized;

Where S == B for an object, the cast can simply return the object itself.
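
Java has no user-defined cast operators today, so the closest thing I can
write now is a small object that holds the two conversion functions. This
is only a sketch; Conversion, serialize and deserialize are names I made
up, and nothing here is an existing or proposed API.

```java
import java.util.function.Function;

// Hypothetical type: a named pair of conversion functions between B and S.
record Conversion<B, S>(Function<B, S> toSerialized, Function<S, B> fromSerialized) {

    S serialize(B value) { return toSerialized.apply(value); }

    B deserialize(S data) { return fromSerialized.apply(data); }

    // Where S == B, both directions just return the object itself.
    static <T> Conversion<T, T> identity() {
        return new Conversion<>(Function.identity(), Function.identity());
    }
}

class ConversionDemo {
    public static void main(String[] args) {
        // Stand-in domains: Integer as B, its decimal String as S.
        Conversion<Integer, String> c =
            new Conversion<Integer, String>(n -> Integer.toString(n), s -> Integer.parseInt(s));
        String serialized = c.serialize(42);
        Integer back = c.deserialize(serialized);   // throws for non-numeric strings
        System.out.println(serialized + " -> " + back);
    }
}
```

The call sites are noisier than a cast, but the pair lives outside both
Header and HeaderCoded, which is the property I was after.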

Unfortunately, all of this still doesn't solve the ctor/accessors pairing.

>
>
> The second option is winning for me at the moment as it looks a little
> cleaner, even though there's more chance for errors.
>
>
> You gotta do what you gotta do :)
>
>
I would have preferred a magical solution to my problem right now. :)

I think I've dug into enough detail and used enough of your time and
mailing list space for now. If I make progress on an implementation I'll be
back with some of my other experiences.

Thanks,
David.