My experience with Sealed Types and Data-Oriented Programming

David Alayachew davidalayachew at gmail.com
Fri Sep 9 13:41:11 UTC 2022


Hello Amber Team,

I just wanted to share my experiences with Sealed Types and Data-Oriented
Programming. Specifically, I wanted to show how things turned out when I
used them in a project. This project was built from the ground up to use
Data-Oriented Programming to its fullest extent. If you want an in-depth
look at the project itself, here is a link to the GitHub repository. If you
clone it, you can run it right now using Java 18 with preview features
enabled.

The GitHub repo = https://github.com/davidalayachew/RulesEngine

Current version =
https://github.com/davidalayachew/RulesEngine/commit/0e7fa42db4bbebaa3aa30f882645226d28e63ff4

The project I am building is essentially an Inference Engine. This is very
similar to Prolog, where you can give the language a set of rules, and then
the language can use those rules to make logical deductions if you ask it a
question. The only difference is that my version accepts plain English
sentences, as opposed to requiring you to know syntax beforehand.

Here is a snippet from the logs to show how things work.

David is a programmer
-------- OK
Every programmer is an engineer
-------- OK
Every engineer is an artist
-------- OK
Is David an artist?
-------- CORRECT

As you can see, it takes in natural English and gleans rules from it, then
uses those rules to perform logical deductions when a query is later made.

Sealed types made this really powerful to work with because they helped me
ensure that I was covering every edge case. I used a sealed interface to
hold all of the possible rule types, then used switch expressions to ensure
that every branch of my code was handled whenever a parameter was of that
sealed type. For the most part, it was a pleasant experience
where the code more or less wrote itself.
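
A minimal sketch of that pattern, using two hypothetical rule types (IsA and
EveryIsA are simplified stand-ins, not the actual types in the repository).
Pattern matching for switch was a preview feature in Java 18 and became
final in Java 21.

```java
// Sketch only: IsA and EveryIsA are hypothetical rule types.
final class SealedSwitchDemo {

    // Every kind of rule must be listed here, so the compiler knows the full set.
    sealed interface Rule permits IsA, EveryIsA {}

    // e.g. "David is a programmer"
    record IsA(String subject, String category) implements Rule {}

    // e.g. "Every programmer is an engineer"
    record EveryIsA(String narrower, String broader) implements Rule {}

    // No default branch: adding a new permitted record makes this switch
    // stop compiling until a matching case is written.
    static String describe(Rule rule) {
        return switch (rule) {
            case IsA r -> r.subject() + " belongs to " + r.category();
            case EveryIsA r -> "each " + r.narrower() + " is also a kind of " + r.broader();
        };
    }

    public static void main(String[] args) {
        System.out.println(describe(new IsA("David", "programmer")));
        System.out.println(describe(new EveryIsA("programmer", "engineer")));
    }
}
```

Leaving out the default branch is the design choice that matters here: with
a default, a newly added record would silently fall into it instead of
producing a compile error.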

The part that I enjoyed the most about this was the ease of refactoring
that sealed types, records, and switch expressions allowed. This project
grew in difficulty very quickly, so I found myself refactoring my solution
many times. Records automatically update their generated accessors, equals,
hashCode, and toString when you realize that a record needs a new field or
should lose an existing one. And switch
expressions combined with sealed types ensured that if I added a new
permitted subclass, I would have to update all of my methods that used
switch expressions. That fact especially made me gravitate to using switch
expressions to get as much totality as possible. When refactoring your
code, totality is a massive time-saver and bug-preventer. Combine that with
the pre-existing fact that interfaces force all subclasses to have the
instance methods defined, and I had some powerful tools for refactoring
that allowed me to iterate very quickly. I found that to be especially
powerful because, when dealing with software that is exposed to the outside
world, making that code easy to refactor is a must. The outside world is
constantly changing, so it is important that we can change quickly too.
Therefore, I really want to congratulate you all on creating such a
powerful and expressive feature. I really enjoyed building this project,
and I'm excited to add a lot more functionality to it.

However, while I found working with sealed types and their permitted
subclasses to be a smooth experience, I found the process of turning data
from untyped Strings into one of the permitted subclasses to be a rather
frustrating and difficult experience.

At first glance, the solution looks really simple - just make a simple
parse method like this.

public static MySealedType parse(String sanitizedUserInput)
{

    // if the string can safely be turned into Subclass1, store it in
    // Subclass1 and return
    // else, attempt the same for all other subclasses
    // else, fail because the string must be invalid to get here
    throw new IllegalArgumentException("Invalid input: " + sanitizedUserInput);

}

Just like that, I should have my gateway into the world of strongly typed,
expressive data-oriented programming, right? Unfortunately, maintaining
that method got ugly fast. For starters, I don't have a small handful of
permitted subclasses, I have many of them. Currently, there are 11, but I'm
expecting my final design to have a little over 30 subclasses total. On top
of that, since my incoming strings are natural English, each of my if
branches carries non-trivial amounts of logic so that I can perform the
necessary validation against all edge cases.

To better explain the complexity, I had created a complex regex with
capture groups for each permitted subclass, and then used that to validate
the incoming String. If the regex matches, pass the values contained in the
capture groups on to the constructor as a List<String>, then return an
instance of the subclass containing the data from the string.
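
A rough sketch of that approach, assuming two hypothetical rule types and
deliberately simple patterns (the real regexes and records in the project
are more involved, and all names below are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: the rule types and patterns are hypothetical.
final class RegexParseDemo {

    sealed interface Rule permits IsA, EveryIsA {}

    record IsA(String subject, String category) implements Rule {
        // Alternative constructor fed by the capture groups, in regex order.
        IsA(List<String> groups) { this(groups.get(0), groups.get(1)); }
    }

    record EveryIsA(String narrower, String broader) implements Rule {
        EveryIsA(List<String> groups) { this(groups.get(0), groups.get(1)); }
    }

    // One pattern per permitted subclass, with one capture group per field.
    static final Pattern IS_A = Pattern.compile("(\\w+) is an? (\\w+)");
    static final Pattern EVERY_IS_A = Pattern.compile("Every (\\w+) is an? (\\w+)");

    // Collects capture groups 1..n of a successful match into a list.
    static List<String> groupsOf(Matcher matcher) {
        List<String> groups = new ArrayList<>();
        for (int i = 1; i <= matcher.groupCount(); i++) {
            groups.add(matcher.group(i));
        }
        return groups;
    }

    static Rule parse(String sanitizedUserInput) {
        Matcher isA = IS_A.matcher(sanitizedUserInput);
        if (isA.matches()) {
            return new IsA(groupsOf(isA));
        }
        Matcher everyIsA = EVERY_IS_A.matcher(sanitizedUserInput);
        if (everyIsA.matches()) {
            return new EveryIsA(groupsOf(everyIsA));
        }
        // Nothing matched, so the string must be invalid.
        throw new IllegalArgumentException("Unrecognized sentence: " + sanitizedUserInput);
    }

    public static void main(String[] args) {
        System.out.println(parse("David is a programmer"));
        System.out.println(parse("Every programmer is an engineer"));
    }
}
```

Note that nothing ties the number or order of capture groups to the record
components. The List<String> constructor simply trusts the regex.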

At first, this worked well, but as the number of subclasses grew, this got
very difficult to maintain as well. This difficulty was twofold.

Problem 1 - I found that my regex would frequently be misaligned with my
constructors during refactoring. If I decided that a record needed a new
field, or that a field should be removed, I would update the record but not
the regex, and then find errors at runtime. In fact, I sometimes didn't get
any error at runtime at all, because the List<String> had the same number
of elements as the constructor was expecting, but the values were not at
the indexes the constructor assumed. This cost me a lot of development time.
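
To illustrate the silent variant of this failure, here is a small sketch
with a hypothetical record whose components were swapped during a refactor
while the regex (and therefore the order of the captured values) stayed the
same:

```java
import java.util.List;

// Sketch only: IsA is a hypothetical record used to show the failure mode.
final class MisalignmentDemo {

    // Before the refactor the components were (subject, category).
    // They have since been swapped, but the regex still captures the subject first.
    record IsA(String category, String subject) {
        IsA(List<String> groups) {
            this(groups.get(0), groups.get(1)); // still assumes the old order
        }
    }

    public static void main(String[] args) {
        // The groups as an unchanged regex would capture them
        // from "David is a programmer".
        IsA rule = new IsA(List.of("David", "programmer"));

        // Compiles and runs without any exception, yet "David" is now the category.
        System.out.println(rule);
    }
}
```

Because every element is a String and the list has the expected size,
neither the compiler nor the runtime has anything to object to.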

Problem 2 - I found that there wasn't an easy way to make sure that all of
my subclasses followed all the rules that they were supposed to, and thus,
I kept forgetting to implement those rules in one way or another every time
I refactored. For example, for problem 1, I said that every subclass must
have a regex. However, I couldn't find any compiler-enforced way to
guarantee this.

* Interfaces are only able to enforce instance methods. However, I can't
have my regex be an instance method. That would be putting the cart before
the horse - I am using the regex to create an instance, so the instance
method is not helpful here.

* If I used a sealed abstract class instead and had permitted subclasses
instead of permitted records, I still couldn't store my regex as a final
instance field for the above reason.

* In Java, static methods cannot be overridden, so I can't use a static
method on my sealed interface. The static method would belong to the
interface, not to the child subclasses.

* And a static final field would not work for the same reason above.
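
The static method limitation from the list above can be sketched like this
(the rule types and the regex() method name are hypothetical):

```java
// Sketch only: shows that a static method on an interface is not a contract
// that implementing records must fulfil.
final class StaticContractDemo {

    sealed interface Rule permits IsA, EveryIsA {
        // Belongs to Rule itself; implementors neither inherit nor override it.
        static String regex() {
            return ".*";
        }
    }

    record IsA(String subject, String category) implements Rule {
        // An unrelated static method that merely shares the name.
        static String regex() {
            return "(\\w+) is an? (\\w+)";
        }
    }

    // No regex() declared here, and the compiler does not complain.
    record EveryIsA(String narrower, String broader) implements Rule {}

    public static void main(String[] args) {
        System.out.println(Rule.regex());
        System.out.println(IsA.regex());
        // EveryIsA.regex() would not compile, because static interface
        // methods are not inherited by implementing classes.
    }
}
```

So the omission in EveryIsA only shows up if some code happens to call the
missing method directly.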

I ran into similar troubles when creating the alternative constructors for
each permitted subrecord. Almost all of the above bulleted points apply,
with the only exception being that for an abstract class, you can
*technically* force your subclasses to call the super constructor. However,
that did very little to help me solve my problem. Maybe I'm wrong and this
is the silver bullet I am looking for, but it certainly doesn't seem like
it. Therefore, I stuck to my original solution of a sealed interface with
permitted subrecords.

But back to my original point. I had 2 problems - misalignment and no
enforcement of my abstract rules. Since I kept changing and creating and
recreating more and more subclasses, these 2 pain points became bigger and
bigger thorns in my side. Worse yet, I actually wanted to add more rules to
make these classes even easier to work with, but decided not to after
seeing the above difficulty.

To alleviate problem 1, I stored my regexes in the records themselves, so
that I would be forced to see the regex each time I looked at the record.
For the most part, that solution seems to be good enough to deal with regex
misalignment.

To alleviate problem 2, I decided to brute force some totality and
enforcement of my own. I fully admit, the solution I came up with here is a
bad practice and something no one should imitate, but I found this to be
the most effective way for me to enforce the rules I needed.

I used reflection on my sealed interface. I got the sealed type class,
called Class::getPermittedSubclasses, looped through the subclasses, did an
unsafe cast from Class<?> to Class<MySealedInterface> (because
::getPermittedSubclasses doesn't do that on its own for some reason???),
called Class::getConstructor with the parameter being List.class (to
represent the list of strings), and then used that to construct a
Map<Pattern, Function<List<String>, MySealedInterface>>. I didn't do the
same for the regex because that monstrosity of code included a Map::put
which would take in the regex and the constructor. Therefore, it was pretty
easy to remember both since they were right next to each other, and the
JVM will error out on startup if I forget to include my constructor. So, I have
effectively solved both of my problems, but in less than desirable ways.
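
A sketch of what such a reflective registry could look like. This is not
the code from the repository: the rule types are hypothetical, and the
sketch assumes each record exposes a public static REGEX field that is read
reflectively, which differs from pairing regex and constructor by hand. It
also uses Class::asSubclass rather than an unchecked cast.

```java
import java.lang.reflect.Constructor;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: hypothetical rule types and a hypothetical REGEX convention.
final class ReflectiveRegistryDemo {

    sealed interface Rule permits IsA, EveryIsA {}

    record IsA(String subject, String category) implements Rule {
        public static final Pattern REGEX = Pattern.compile("(\\w+) is an? (\\w+)");
        public IsA(List<String> groups) { this(groups.get(0), groups.get(1)); }
    }

    record EveryIsA(String narrower, String broader) implements Rule {
        public static final Pattern REGEX = Pattern.compile("Every (\\w+) is an? (\\w+)");
        public EveryIsA(List<String> groups) { this(groups.get(0), groups.get(1)); }
    }

    // Fails fast if any permitted subclass lacks REGEX or a List constructor.
    static Map<Pattern, Function<List<String>, Rule>> buildParsers() {
        Map<Pattern, Function<List<String>, Rule>> parsers = new LinkedHashMap<>();
        for (Class<?> raw : Rule.class.getPermittedSubclasses()) {
            Class<? extends Rule> type = raw.asSubclass(Rule.class);
            try {
                Pattern regex = (Pattern) type.getField("REGEX").get(null);
                Constructor<? extends Rule> ctor = type.getConstructor(List.class);
                parsers.put(regex, groups -> newInstance(ctor, groups));
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(
                        type.getSimpleName() + " does not follow the parsing rules", e);
            }
        }
        return parsers;
    }

    static Rule newInstance(Constructor<? extends Rule> ctor, List<String> groups) {
        try {
            return ctor.newInstance(groups);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Tries each registered pattern and builds the first rule that matches.
    static Rule parse(Map<Pattern, Function<List<String>, Rule>> parsers, String input) {
        for (Map.Entry<Pattern, Function<List<String>, Rule>> entry : parsers.entrySet()) {
            Matcher matcher = entry.getKey().matcher(input);
            if (matcher.matches()) {
                List<String> groups = new ArrayList<>();
                for (int i = 1; i <= matcher.groupCount(); i++) {
                    groups.add(matcher.group(i));
                }
                return entry.getValue().apply(groups);
            }
        }
        throw new IllegalArgumentException("Unrecognized sentence: " + input);
    }

    public static void main(String[] args) {
        Map<Pattern, Function<List<String>, Rule>> parsers = buildParsers();
        System.out.println(parse(parsers, "David is a programmer"));
        System.out.println(parse(parsers, "Every programmer is an engineer"));
    }
}
```

If buildParsers() is called from a static initializer, a subclass that
breaks the convention stops the program at startup rather than at the first
unlucky input. That is still a runtime check, not a compile-time one.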

For problem 2, one analogy that kept popping into my head was the idea of
there being 2 islands. The island on the right has strong types, totality,
pattern matching, and more. Meanwhile the island on the left is where
everything is untyped and just strings. There does exist a bridge between
the 2, but it's either difficult to make, doesn't scale very well, or isn't
very flexible.

This analogy really helped me understand my frustration because it
actually showed why I like Java enums so much. You can use the same analogy
as above. The island on the right has ::ordinal, ::name, ::values, enums
having their own instance fields and methods, and even some powerful tools
like EnumSet and EnumMap. But what really ties it all together is that
there is a very clear and defined bridge between the left and the right -
the ::valueOf method. Having this centralized pathway between the 2 made
working with enums a pleasure and something I always liked to use when
dealing with my code's interactions with the outside world. That ::valueOf
enforced a constraint against all incoming Strings. And therefore, it
allowed me to just perform some sanitizations along the way to make sure
that that method could do its job (uppercase all strings, remove
non-identifier characters, etc). If it wasn't for JEP 301, I would call
enums perfect.
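
As a small sketch of that bridge (the Quantifier enum and the exact
sanitization rules are hypothetical examples, not taken from the project):

```java
import java.util.Locale;

// Sketch only: Quantifier is a hypothetical enum used to show the bridge.
final class EnumBridgeDemo {

    enum Quantifier { EVERY, SOME, NO }

    // Uppercases the input and drops anything that cannot be part of an
    // enum constant name.
    static String sanitize(String raw) {
        return raw.trim().toUpperCase(Locale.ROOT).replaceAll("[^A-Z0-9_]", "");
    }

    // The single, well-defined crossing point from untyped strings to the enum.
    // valueOf throws IllegalArgumentException when no constant has that name.
    static Quantifier parse(String raw) {
        return Quantifier.valueOf(sanitize(raw));
    }

    public static void main(String[] args) {
        System.out.println(parse(" every! "));
    }
}
```

Every enum gets valueOf for free, so there is exactly one place where a
string either becomes a typed value or is rejected.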

I just wish that there was some similar centralized pathway between
data-oriented programming and the outside world. Some way for me to define
on my sealed type, a method to give me a pathway to all of the permitted
subclasses. Obviously, I can build it on my own, but that is where most of
my pain points came from. Really, what I am missing is some way to enforce
that all of my subclasses have similar class-level validation logic and a
similar class-level factory/constructor method.

That is the summary of my thoughts. Please do not misinterpret the extended
discussion on the negatives to mean that I found the negatives to be even
equal to, let alone more than, the positives. I found this to be an
overwhelmingly pleasant experience. Once I got my data turned into a type,
everything flowed perfectly. It was just difficult to get it into a type in
the first place, and it took a lot of words for me to explain why.

Thank you all for your time and your help!
David Alayachew


More information about the amber-dev mailing list