My experience with Sealed Types and Data-Oriented Programming

Fri Sep 9 16:38:00 UTC 2022

> From: "David Alayachew" <davidalayachew at gmail.com>
> To: "amber-dev" <amber-dev at openjdk.org>
> Sent: Friday, September 9, 2022 3:41:11 PM
> Subject: My experience with Sealed Types and Data-Oriented Programming

> Hello Amber Team,

> I just wanted to share my experiences with Sealed Types and Data-Oriented
> Programming. Specifically, I wanted to show how things turned out when I used
> them in a project. This project was built from the ground up to use
> Data-Oriented Programming to its fullest extent. If you want an in-depth look
> at the project itself, here is a link to the GitHub. If you clone it, you can
> run it right now using Java 18 + preview features.

> The GitHub repo = https://github.com/davidalayachew/RulesEngine

> Current version =
> https://github.com/davidalayachew/RulesEngine/commit/0e7fa42db4bbebaa3aa30f882645226d28e63ff4

> The project I am building is essentially an Inference Engine. This is very
> similar to Prolog, where you can give the language a set of rules, and then the
> language can use those rules to make logical deductions if you ask it a
> question. The only difference is that my version accepts plain English
> sentences, as opposed to requiring you to know syntax beforehand.

> Here is a snippet from the logs to show how things work.

> David is a programmer
> -------- OK
> Every programmer is an engineer
> -------- OK
> Every engineer is an artist
> -------- OK
> Is David an artist?
> -------- CORRECT

> As you can see, it takes in natural English and gleans rules from it, then uses
> those rules to perform logical deductions when a query is later made.

> Sealed types made this really powerful to work with because they helped me ensure
> that I was covering every edge case. I used a sealed interface to hold all of
> the possible rule types, then would use switch expressions to ensure that all
> possible branches of my code were handled if the parameter is of that sealed
> type. For the most part, it was a pleasant experience where the code more or
> less wrote itself.
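A minimal sketch of that combination, assuming Java 21 or later (record patterns in switch became final there; the project itself used Java 18 with preview features). IsA and EveryIsA are hypothetical rule types loosely modeled on the log snippet, not types from the RulesEngine repository.

```java
// Hypothetical rule types, loosely modeled on the log snippet
// ("David is a programmer", "Every programmer is an engineer").
// With no permits clause, the permitted subclasses are the types
// declared in this same file.
sealed interface Rule {}

record IsA(String individual, String category) implements Rule {}

record EveryIsA(String subCategory, String superCategory) implements Rule {}

final class Rules {
    private Rules() {}

    // Exhaustive switch with no default branch: adding a third record to
    // Rule makes this method fail to compile until the new case is handled.
    static String describe(Rule rule) {
        return switch (rule) {
            case IsA(String individual, String category) ->
                individual + " is-a " + category;
            case EveryIsA(String sub, String sup) ->
                "every " + sub + " is-a " + sup;
        };
    }
}
```

The missing default branch is the point: the compiler, not a code review, flags every switch that needs updating when the hierarchy grows.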

> The part that I enjoyed the most about this was the ease of refactoring that
> sealed types, records, and switch expressions allowed. This project grew in
> difficulty very quickly, so I found myself refactoring my solution many times.
> Records automatically update their generated methods (accessors, equals,
> hashCode, toString) when you decide that a record needs a new field or should
> drop one. And switch expressions combined with sealed types ensured that if I
> added a new permitted subclass, I would have to update every method that used
> a switch expression over that sealed type. That fact especially made me
> gravitate toward switch expressions to get as much totality as possible. When
> refactoring your code, totality is a massive time-saver and bug-preventer.
> Combine that with the pre-existing fact that interfaces force every
> implementing class to define the interface's abstract instance methods, and I
> had some powerful tools for refactoring that allowed me to iterate very
> quickly. I found that to be especially valuable because, when dealing with
> software that is exposed to the outside world, making that code easy to
> refactor is a must. The outside world is constantly changing, so it is
> important that we can change quickly too. Therefore, I really want to
> congratulate you all on creating such a powerful and expressive feature. I
> really enjoyed building this project, and I'm excited to add a lot more
> functionality to it.

> However, while I found working with sealed types and their permitted subclasses
> to be a smooth experience, I found the process of turning data from untyped
> Strings into one of the permitted subclasses to be a rather frustrating and
> difficult experience.

> At first glance, the solution looks really simple - just make a simple parse
> method like this.

> public static MySealedType parse(String sanitizedUserInput)
> {
>     // if string can safely be turned into Subclass1, then store into Subclass1 and return
>     // else, attempt for all other subclasses
>     // else, fail because string must be invalid to get here
> }

> Just like that, I should have my gateway into the world of strongly typed,
> expressive data-oriented programming, right? Unfortunately, maintaining that
> method got ugly fast. For starters, I don't have a small handful of permitted
> subclasses; I have many of them. Currently, there are 11, but I'm expecting my
> final design to have a little over 30 subclasses total. On top of that, since
> my incoming strings are natural English, each of my if branches carries
> non-trivial amounts of logic so that I can perform the necessary validation
> against all edge cases.

> To explain the complexity better, I created a complex regex with capture
> groups for each permitted subclass, and then used that to validate the incoming
> String. If the regex matches, pass the values contained in the capture groups
> onto the constructor as a List<String>, then return the subclass containing the
> data of the string.
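A minimal sketch of this regex-with-capture-groups approach, assuming Java 17+. The two patterns and record types are hypothetical and far simpler than natural-English validation would require; each record gets an extra constructor that takes the capture groups as a List<String>.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

sealed interface Rule {}

// Each record has an alternative constructor fed by the regex's capture groups.
record IsA(String individual, String category) implements Rule {
    IsA(List<String> groups) { this(groups.get(0), groups.get(1)); }
}

record EveryIsA(String subCategory, String superCategory) implements Rule {
    EveryIsA(List<String> groups) { this(groups.get(0), groups.get(1)); }
}

final class RuleParser {
    private RuleParser() {}

    // Hypothetical patterns, one per permitted subclass.
    private static final Pattern IS_A = Pattern.compile("(\\w+) is an? (\\w+)");
    private static final Pattern EVERY_IS_A = Pattern.compile("Every (\\w+) is an? (\\w+)");

    // Collects capture groups 1..n in order.
    private static List<String> groups(Matcher m) {
        List<String> out = new ArrayList<>();
        for (int i = 1; i <= m.groupCount(); i++) {
            out.add(m.group(i));
        }
        return out;
    }

    // One if-branch per subclass; empty if no pattern matches the whole input.
    static Optional<Rule> parse(String sanitizedUserInput) {
        Matcher m = EVERY_IS_A.matcher(sanitizedUserInput);
        if (m.matches()) return Optional.of(new EveryIsA(groups(m)));
        m = IS_A.matcher(sanitizedUserInput);
        if (m.matches()) return Optional.of(new IsA(groups(m)));
        return Optional.empty();
    }
}
```

The List<String> constructors pick values by position, so if a regex's groups were reordered during a refactor but the constructor was not, this would still compile and run, only with the fields swapped: the silent misalignment described in Problem 1.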

> At first, this worked well, but as the number of subclasses grew, this got very
> difficult to maintain as well. This difficulty was twofold.

> Problem 1 - I found that my regex would frequently be misaligned with my
> constructors during refactoring. If I decided that a record needed a new field,
> or that a field should be removed, I would update the record but not the regex,
> and then find errors at runtime. Worse, sometimes there was no error at all:
> the List<String> had the number of elements the constructor expected, but the
> values were not at the right indices, so the fields silently got the wrong
> data. This cost me a lot of development time.

> Problem 2 - I found that there wasn't an easy way to make sure that all of my
> subclasses followed all the rules that they were supposed to, and thus, I kept
> forgetting to implement those rules in one way or another every time I
> refactored. For example, for problem 1, I said that every subclass must have a
> regex. However, I couldn't find a compiler-enforced way to guarantee this.

> * Interfaces are only able to enforce instance methods. However, I can't have my
> regex be an instance method. That would be putting the cart before the horse -
> I am using the regex to create an instance, so the instance method is not
> helpful here.

> * If I used a sealed abstract class with permitted subclasses instead of
> permitted records, I still couldn't store my regex as a final instance field,
> for the above reason.

> * In Java, static methods cannot be overridden, and a static method declared on
> an interface is not inherited by its implementing classes, so I can't use a
> static method on my sealed interface. It would belong to the interface, not to
> the child subclasses.

> * And a static final field would not work for the same reason above.
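The third bullet can be seen in a small sketch (hypothetical Shape, Circle and Square types, Java 17+): a static method declared on an interface is not inherited, and a record's own static method with the same name is a separate method that nothing forces the other records to declare.

```java
import java.util.regex.Pattern;

// Hypothetical types used only to illustrate static members on a sealed interface.
sealed interface Shape {
    // Belongs to Shape itself; Circle and Square do not inherit it.
    static Pattern regex() {
        return Pattern.compile("shape");
    }
}

record Circle(double radius) implements Shape {
    // Compiles, but does not override Shape.regex(): it is an unrelated
    // static method that happens to share the name.
    static Pattern regex() {
        return Pattern.compile("circle of radius (\\d+)");
    }
}

// Declares no regex() at all, and the compiler does not complain.
// Writing Square.regex() elsewhere would be a compile error, because
// static interface methods are not inherited.
record Square(double side) implements Shape {}
```

Generic code holding a Class<? extends Shape> therefore has no compile-time way to reach Circle.regex(), which is what pushes toward reflection.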

> I ran into similar troubles when creating the alternative constructors for each
> permitted subrecord. Almost all of the above bulleted points apply, with the
> only exception being that for an abstract class, you can *technically* force
> your subclasses to call the super constructor. However, that did very little to
> help me solve my problem. Maybe I'm wrong and this is the silver bullet I am
> looking for, but it certainly doesn't seem like it. Therefore, I stuck to my
> original solution of a sealed interface with permitted subrecords.

> But back to my original point. I had 2 problems - misalignment and no
> enforcement of my abstract rules. Since I kept changing and creating and
> recreating more and more subclasses, these 2 pain points became bigger and
> bigger thorns in my side. Worse yet, I actually wanted to add more rules to
> make these classes even easier to work with, but decided not to after seeing
> the above difficulty.

> To alleviate problem 1, I stored my regexes in the records themselves, so that I
> would be forced to see the regex each time I looked at the record. For the most
> part, that solution seems to be good enough to deal with regex misalignment.

> To alleviate problem 2, I decided to brute force some totality and enforcement
> of my own. I fully admit, the solution I came up with here is a bad practice
> and something no one should imitate, but I found this to be the most effective
> way for me to enforce the rules I needed.

> I used reflection on my sealed interface. I got the sealed type class, called
> Class::getPermittedSubclasses, looped through the subclasses, did an unsafe
> cast from Class<?> to Class<SealedInterface> (because ::getPermittedSubclasses
> doesn't do that on its own for some reason???), called Class::getConstructor
> with the parameter being List.class (to represent the list of strings), and
> then used that to construct a Map<Pattern, Function<List<String>,
> MySealedInterface>>. I didn't do the same for the regex because that
> monstrosity of code included a Map::put which would take in the regex and the
> constructor. Therefore, it was pretty easy to remember both since they were
> right next to each other, and the JVM will error out on startup if I forget to
> include my constructor. So, I have effectively solved both of my problems, but
> in less than desirable ways.
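One possible reading of this reflection-based setup, sketched under stated assumptions: Java 17+ (for Class::getPermittedSubclasses), hypothetical IsA/EveryIsA records that each store their regex in a static REGEX field (as in the fix for problem 1), and a hypothetical RuleRegistry class. It is not the project's actual code. It uses Class::asSubclass, a checked alternative to the unchecked cast mentioned above, and Class::getDeclaredConstructor instead of Class::getConstructor because the sketch's constructors are package-private.

```java
import java.lang.reflect.Constructor;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

sealed interface Rule {}

record IsA(String individual, String category) implements Rule {
    static final Pattern REGEX = Pattern.compile("(\\w+) is an? (\\w+)");
    IsA(List<String> groups) { this(groups.get(0), groups.get(1)); }
}

record EveryIsA(String subCategory, String superCategory) implements Rule {
    static final Pattern REGEX = Pattern.compile("Every (\\w+) is an? (\\w+)");
    EveryIsA(List<String> groups) { this(groups.get(0), groups.get(1)); }
}

final class RuleRegistry {
    private RuleRegistry() {}

    // Built once, when RuleRegistry is first used.
    static final Map<Pattern, Function<List<String>, Rule>> PARSERS = build();

    private static Map<Pattern, Function<List<String>, Rule>> build() {
        Map<Pattern, Function<List<String>, Rule>> parsers = new LinkedHashMap<>();
        // Each regex is registered right next to the constructor it feeds.
        parsers.put(IsA.REGEX, constructorOf(IsA.class));
        parsers.put(EveryIsA.REGEX, constructorOf(EveryIsA.class));

        // Cross-check against the sealed hierarchy: every permitted subclass
        // needs a (List) constructor and a registration.
        Class<?>[] permitted = Rule.class.getPermittedSubclasses();
        for (Class<?> type : permitted) {
            constructorOf(type.asSubclass(Rule.class)); // checked, unlike a raw cast
        }
        if (parsers.size() != permitted.length) {
            throw new IllegalStateException("a permitted subclass of Rule is not registered");
        }
        return Collections.unmodifiableMap(parsers);
    }

    // Wraps the reflective (List) constructor of one subclass as a Function.
    private static Function<List<String>, Rule> constructorOf(Class<? extends Rule> type) {
        final Constructor<? extends Rule> ctor;
        try {
            ctor = type.getDeclaredConstructor(List.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(type.getName() + " has no (List) constructor", e);
        }
        return groups -> {
            try {
                return ctor.newInstance(groups);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
        };
    }

    // Tries each registered regex; empty if none matches the whole input.
    static Optional<Rule> parse(String sanitizedUserInput) {
        for (Map.Entry<Pattern, Function<List<String>, Rule>> entry : PARSERS.entrySet()) {
            Matcher m = entry.getKey().matcher(sanitizedUserInput);
            if (m.matches()) {
                List<String> groups = new ArrayList<>();
                for (int i = 1; i <= m.groupCount(); i++) {
                    groups.add(m.group(i));
                }
                return Optional.of(entry.getValue().apply(groups));
            }
        }
        return Optional.empty();
    }
}
```

Because PARSERS is built in a static initializer, a missing List<String> constructor or a forgotten put fails the first time RuleRegistry is used (as an ExceptionInInitializerError), rather than partway through parsing.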

> For problem 2, one analogy that kept popping into my head was the idea of there
> being 2 islands. The island on the right has strong types, totality, pattern
> matching, and more. Meanwhile the island on the left is where everything is
> untyped and just strings. There does exist a bridge between the 2, but it's
> either difficult to build, doesn't scale very well, or isn't very flexible.

> This analogy really helped me understand my frustration because it actually
> showed why I like Java enums so much. You can use the same analogy as above.
> The island on the right has ::ordinal, ::name, ::values, enums having their own
> instance fields and methods, and even some powerful tools like EnumSet and
> EnumMap. But what really ties it all together is that, there is a very clear
> and defined bridge between the left and the right - the ::valueOf method.
> Having this centralized pathway between the 2 made working with enums a
> pleasure and something I always liked to use when dealing with my code's
> interactions with the outside world. That ::valueOf enforced a constraint
> against all incoming Strings. And therefore, it allowed me to just perform some
> sanitizations along the way to make sure that that method could do its job
> (uppercase all strings, remove non-identifier characters, etc.). If it weren't
> for JEP 301, I would call enums perfect.
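For comparison, the enum bridge described above can be sketched as follows. The Role enum is hypothetical, and the sanitization (uppercase, then strip every character outside A-Z, 0-9 and underscore) is a simplified ASCII-only version of the steps listed.

```java
import java.util.Locale;

// Hypothetical enum; the constants are illustrative only.
enum Role {
    PROGRAMMER, ENGINEER, ARTIST;

    // Sanitize untrusted input so that Enum.valueOf can do its job:
    // uppercase it, then drop characters outside A-Z, 0-9 and underscore.
    static Role parse(String userInput) {
        String sanitized = userInput.toUpperCase(Locale.ROOT)
                .replaceAll("[^A-Z0-9_]", "");
        return Role.valueOf(sanitized); // throws IllegalArgumentException if unknown
    }
}
```

All of the type-safe machinery (values(), EnumSet, EnumMap) sits behind that single valueOf call, which is the centralized pathway the email wishes sealed types had.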

> I just wish that there was some similar centralized pathway between
> data-oriented programming and the outside world. Some way for me to define, on
> my sealed type, a method that gives me a pathway to all of the permitted
> subclasses. Obviously, I can build it on my own, but that is where most of my
> pain points came from. Really, having some way to enforce that all of my
> subclasses have a similar class level validation logic and a similar class
> level factory/constructor method is what I am missing.

> That is the summary of my thoughts. Please do not misinterpret the extended
> discussion of the negatives to mean that I found the negatives to be equal
> to, let alone greater than, the positives. I found this to be an overwhelmingly
> pleasant experience. Once I got my data turned into a type, everything flowed
> perfectly. It was just difficult to get it into a type in the first place, and
> it took a lot of words for me to explain why.

The solution is known as type classes (see [1] for type classes in Scala 3); sadly, Java does not support them (yet?).

Another solution would be to switch on classes:

    Class<? extends Parseable> parseableType = (...) Parseable.class.getPermittedSubclasses();
    switch (parseableType) {
        case Identifier.class -> ...
        case Type.class -> ...
        // etc.
    }

Here, because Parseable is sealed, the compiler knows all the permitted classes, so it could do an exhaustive check; sadly, Java does not support this either.

So if we cannot use the compiler, the best option is to write unit tests: one that uses reflection to check that every subclass has a field named "regex" of type Pattern, and one that checks, also by reflection on the unit tests, that there is a unit test for each subclass that verifies its string format. The idea is to replace the checks you would like the compiler to do for you with unit tests that ensure that everything follows the "meta-protocol" you have defined.

Doing the reflection in the tests instead of in the main code avoids slowing down the main code, and you can also check that the test coverage is good.
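A minimal sketch of the first of those checks, written as a plain method so it needs no test framework (in practice its result would be asserted empty inside a unit test). Parseable, Identifier and Type reuse the names from the switch example above; their bodies are hypothetical, and the Type record deliberately lacks its regex field so the check has something to report. Java 17+ is assumed for Class::getPermittedSubclasses.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

sealed interface Parseable {}

record Identifier(String name) implements Parseable {
    static final Pattern regex = Pattern.compile("[A-Za-z_]\\w*");
}

// Deliberately breaks the meta-protocol: no "regex" field.
record Type(String name) implements Parseable {}

final class MetaProtocolCheck {
    private MetaProtocolCheck() {}

    // Returns one message per permitted subclass of sealedType that does not
    // declare a static field named "regex" of type Pattern.
    static List<String> violations(Class<?> sealedType) {
        Class<?>[] permitted = sealedType.getPermittedSubclasses();
        if (permitted == null) {
            throw new IllegalArgumentException(sealedType.getName() + " is not sealed");
        }
        List<String> problems = new ArrayList<>();
        for (Class<?> sub : permitted) {
            try {
                Field field = sub.getDeclaredField("regex");
                if (field.getType() != Pattern.class || !Modifier.isStatic(field.getModifiers())) {
                    problems.add(sub.getSimpleName() + ".regex is not a static Pattern");
                }
            } catch (NoSuchFieldException e) {
                problems.add(sub.getSimpleName() + " has no regex field");
            }
        }
        return problems;
    }
}
```

Asserting that violations(Parseable.class) is empty in a test means a new record without a regex fails the test run instead of surfacing in production, and the reflection cost stays out of the main code.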

> Thank you all for your time and your help!
> David Alayachew

regards, 
Rémi 

[1] https://docs.scala-lang.org/scala3/book/ca-type-classes.html 