Newbie dumb question

Wed Jul 24 14:35:18 UTC 2024

Hi,

Got a newbie dumb question...

Why not have the compiler mandatorily generate code models for *all* class
files?

Motivation: I'm interested in how Babylon could facilitate the general use
case of remote execution, e.g. for database access (i.e., moving the code
to the data rather than the other way around, as we do today),
high-performance computing, homomorphic encryption, or some other purpose:

   1. We want to describe a computation/algorithm that will execute on some
   remote, non-Java runtime
   2. We want developers to be able to write the computation in completely
   normal Java and have it translated and serialized
   3. "Normal" includes the use of usual classes like java.util.HashMap,
   Guava utility classes, etc.
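
To make #3 concrete, here is a hypothetical example (the `Order` record and
`totalsByCustomer` method are invented for illustration) of the kind of
ordinary Java one might want to ship to a database. Translating it would
require code for `HashMap.merge` and whatever it calls, not just for
`totalsByCustomer` itself.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RemoteComputationExample {

    // Hypothetical order row, as it might be stored in a remote database.
    record Order(String customer, long cents) {}

    // Totals spend per customer using an ordinary java.util.HashMap.
    // To run this on a non-Java runtime, a translator would also need the
    // code of HashMap.merge and the methods it calls (String.hashCode,
    // String.equals, Long.sum, ...), not only this method's own body.
    static Map<String, Long> totalsByCustomer(List<Order> orders) {
        Map<String, Long> totals = new HashMap<>();
        for (Order o : orders) {
            totals.merge(o.customer(), o.cents(), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
                new Order("alice", 1_500),
                new Order("bob", 700),
                new Order("alice", 250));
        System.out.println(totalsByCustomer(orders));
    }
}
```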

This is similar to the GPU use case except for #3 - AFAICT, with current
Babylon your reflected code model doesn't include classes like HashMap
because they aren't code-reflected - correct?

In an ideal world, it should be possible to access the code model for every
non-native method, so there are no "holes" in your view of the code.
Otherwise, when coding up your algorithm you'll have one hand tightly tied
behind your back - no collections, no streams, no Guava, no Apache
commons-foobar, etc.

Then you could really go to town optimizing your code - virtual dispatch
becomes non-virtual, escape analysis eliminates heap allocations, lots of
methods inlined, etc. You could take a huge Java processing pipeline and
lower it down to target a relatively simplistic sandbox-style virtual
machine environment (could be WASM, SQL stored procedure, or even machine
code). For scenarios where the pipeline is going to be executed frequently
(e.g., database query) it is well worth it to pay this one-time, upfront
cost for these extensive optimizations.
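
Roughly, the kind of transformation I mean, written out by hand (this is not
Babylon output, and `before`/`after` are hypothetical names): if a
whole-program optimizer could prove the `IntUnaryOperator` is always
`x -> x * 2`, it could devirtualize and inline that call, inline
`Point.sum()`, and use escape analysis to eliminate the `Point` allocation.

```java
import java.util.function.IntUnaryOperator;

public class OptimizationSketch {

    // Small value type; the "before" version allocates one per element.
    record Point(int x, int y) {
        int sum() { return x + y; }
    }

    // "Before": an interface call (virtual dispatch) plus a heap allocation
    // per element that never escapes the loop body.
    static int before(int[] xs, IntUnaryOperator f) {
        int total = 0;
        for (int x : xs) {
            Point p = new Point(x, f.applyAsInt(x)); // virtual call + allocation
            total += p.sum();                         // call that can be inlined
        }
        return total;
    }

    // "After": hand-written equivalent assuming f is always x -> x * 2.
    // The call is devirtualized and inlined, and Point is scalar-replaced,
    // so no objects are allocated and no dispatch remains.
    static int after(int[] xs) {
        int total = 0;
        for (int x : xs) {
            total += x + x * 2;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] xs = {1, 2, 3, 4};
        System.out.println(before(xs, x -> x * 2));
        System.out.println(after(xs));
    }
}
```

Both versions compute the same result; the second has no interface calls and
no allocation, which is much closer to what a simple target (WASM, a stored
procedure) can express directly.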

Of course, the code couldn't do things that depend on native code, like
Thread.start(), without some kind of shim... and there may be more native code "gotchas"
lying around in the standard Java libraries than one might think. But I'm
guessing most stuff you would want to send over the network for the above
examples of remote execution would need few if any native methods.
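
One rough way to gauge the native "holes" in a class is to list its declared
native methods with plain reflection. This sketch (`nativeMethodNames` is a
hypothetical helper) only looks at methods declared directly in each class;
finding native methods reachable transitively from a computation would need
real call-graph analysis over the bytecode, which it does not attempt.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.List;
import java.util.TreeSet;

public class NativeMethodScan {

    // Returns the sorted, de-duplicated names of methods declared directly in
    // cls that are marked native, i.e. have no Java body to build a model from.
    static List<String> nativeMethodNames(Class<?> cls) {
        TreeSet<String> names = new TreeSet<>();
        for (Method m : cls.getDeclaredMethods()) {
            if (Modifier.isNative(m.getModifiers())) {
                names.add(m.getName());
            }
        }
        return List.copyOf(names);
    }

    public static void main(String[] args) {
        for (Class<?> cls : List.of(Object.class, Thread.class, HashMap.class)) {
            System.out.println(cls.getName() + ": " + nativeMethodNames(cls));
        }
    }
}
```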

Side note: This situation reminds me of when I first switched from svn to
git and gasped at the notion that git stores the *entire revision history*
on your local disk. It turns out to work great, and the scary-seeming
downsides are non-issues (e.g., disk space is cheap today, and git
compresses past history into large "packs"). Moreover, there are important
upsides like speed, private branches, repo portability, server/client
symmetry, the ability to work on an airplane, etc.

Similarly, I'm sure there are a lot of "obvious" reasons to not make code
models mandatory, like disk space, compiler speed, and fear of mass panic.
But in the end these may also be non-issues, especially if having code
models available for all classes opens up a lot of use cases that aren't
available now. Maybe there are some simple things that could be done to
quiet the naysayers, like storing the JRE code models in a separate JAR
file, etc.

Put another way, I think we need to think bigger... e.g., Project Babylon
could position Java as a universal "starting language", so to speak.

-Archie

-- 
Archie L. Cobbs