Feedback about an experiment to embed a Python interpreter with the FFI API

Tue Dec 28 12:06:00 UTC 2021

Hi,

Recently I tried experimenting with the FFI API by implementing a library
for embedding a Python interpreter inside a Java program. Such a library
could be a very powerful tool for bridging the gap between the vast number
of mathematical libraries in Python (numpy, scipy, various ML libraries and
the like) and the vast number of general-purpose libraries in Java (like
Netty and Akka). Due to its dynamic nature and rather simplistic and slow
GC, Python is not particularly well-suited for developing large-scale
applications (we easily hit 50 ms GC pauses under load, and we were unable
to do any refactoring even on a medium-scale project). Java, on the other
hand, lacks an ecosystem of scientific libraries: after all, Java is
primarily intended for building applications, and it is excellent at
exactly that. So writing the application in Java and using Python code as a
set of computational kernels would be especially beneficial.

During development, it turned out that tooling support is just as
important as runtime support, so this feedback will not touch the FFI API
itself. Python has many elaborate structures, and the JNI programming model
is beneficial here because it lets GCC handle all those structures
automatically, whereas wrapping every structure with 50 members by hand is
a laborious task. So the experiment failed, but the FFI API is not to blame,
and the situation will definitely improve as tooling improves.

The topic I want to talk about is a small flag, `--enable-native-access=X`,
that looks harmless and useful at first glance. tl;dr: it is neither
harmless nor useful. The reasoning that "the user must opt in to use unsafe
APIs" implies that: (1) unsafe APIs are something illegal and should be
avoided; (2) the user has the competence to make such decisions; (3) the
user has a choice other than using unsafe APIs. From my point of view, all
three assumptions are false.

When you call an unsafe API, you usually do so because you have no other
option. Take embedding a Python interpreter: you want not just any Python
interpreter, you want the Cython library ecosystem, which exists only for
that specific native interpreter and requires specific unsafe native calls
to interact with. You don't have the option to avoid the unsafe native
API, or at least that option is inseparable from avoiding Python itself.
And now you are forced to insert `--enable-native-access=MODULE` every time
you use PROGRAM. Not every time you write code (like Scala feature flags for
certain dangerous features), but every time you deploy an application. This
breaks library encapsulation: you can't just "bring and use" the library.
Want to use Python in project A? Go ahead and patch launcher scripts that
were otherwise perfectly good. Want to use Python in project B? Well, you
know what to do: go ahead and patch the launcher scripts. Some of your
libraries transitively use native code? You just got a new quest: find
which module did it.

There are many libraries that (currently) exist only as native code and
will probably remain so for a long time: the WebP image decoder (backed by
libwebp.so), the H.264 video decoder (backed by libavcodec), Python. With
this approach, they become "second-class citizens" that face deployment
difficulties just because they use large existing codebases that will
never be ported to Java. Never. You will forever have to add
`--enable-native-access=X` every time you use them.

So, the second assumption: that users are competent to make such
decisions. No, they aren't. In the server world, when people want to
install some application, they will even tweak kernel sysctls, not to
mention JVM command-line flags. Does that make the application any safer or
more secure? No. You choose which applications to deploy and use by their
business features, not by the APIs they use. If deployment requires some
specific flags, they will either be included in the default launcher
scripts or copy-pasted without any examination. Windows Vista experienced
this with UAC: it turns out that asking for consent for no reason just
makes users give their consent for no reason. It does not make the system
any safer or more secure.

A malicious (or poorly written) program can do much more harm to the JVM or
the whole OS: throwing an Error without a stack trace, calling System.exit,
or just running `rm -rf /` or `rm -rf ${HOME}`, to name a few. So, what else
should we expect? --enable-system-exit? --enable-file-io?
--enable-process-builder? --enable-override-stack-trace? Java has a long
history of trying to isolate untrusted code from the system, which ended
with the SecurityManager being terminally deprecated. When you have
untrusted code on your class path (hello, JNDI!), you are in much bigger
trouble than unrestricted native access. Even regular file I/O is
dangerous. For fault-tolerant systems, the right way to achieve fault
tolerance when native access is involved is to split components into
distinct processes and let the OS do its job. No flags will help there.

Programs crash, yes. We live in the real world, where nothing is perfect.
High-level programs give high-level crashes; low-level programs give
low-level crashes. Note that by "crashes" I mean unexpected malfunctions,
not something like FileNotFoundException that is expected to come from
file routines. Yes, low-level crashes cannot be caught and usually bring
down the whole JVM. Most high-level crashes are not better in any way: they
fail the environment in which they occur, be it a request, a thread, a
processing job or the whole application. But the feature flag is required
even for a well-debugged program "just because", and does not help at all
with buggy programs.

The third assumption has already been outlined, but just to repeat: if
someone wants to run a specific application, they will do it. The same
applies to app development: if you want to use a library, you usually want
it for specific reasons. The feature flag just discourages the use of all
native libraries simply for being inherently unsafe. Moreover, it pushes
the final decision to an entity that usually has no power to choose the
APIs used by an application. They just need to run it.

The constructive part: what would be more helpful and much less
burdensome? I would say that a much better alternative to a feature flag is
simply to record all modules (or classes) that perform native access,
either via FFI or (in the future) via JNI, and then include the list of
such modules or classes in the crash dump. Straightforward: the VM has
crashed, but classes A, B or C might be the cause since they used native
access. Try removing these classes first before reporting any bugs. No
deployment burden, no compatibility breakage from new restrictions, and
much more helpful crash dumps.
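To make the proposal above concrete, here is a minimal Java sketch of the
bookkeeping it implies. `NativeAccessRegistry`, `recordCaller` and
`recordNativeAccess` are hypothetical names, not part of the FFI API or the
JDK, and the sketch assumes the runtime would call `recordNativeAccess()` at
every native-access entry point. A real implementation would have to live
inside the VM so the list could be written to the fatal error log
(`hs_err_pid<pid>.log`), since no Java code runs after a hard crash.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: NativeAccessRegistry is NOT a JDK class. It models the
// proposed bookkeeping: remember which classes performed native access so the
// list can be printed when the VM crashes.
final class NativeAccessRegistry {
    // RETAIN_CLASS_REFERENCE is required for StackWalker.getCallerClass().
    private static final StackWalker WALKER =
            StackWalker.getInstance(StackWalker.Option.RETAIN_CLASS_REFERENCE);

    // Thread-safe set: native calls may come from any thread.
    private static final Set<String> CALLERS = ConcurrentHashMap.newKeySet();

    private NativeAccessRegistry() {}

    // Assumed hook: the runtime would call this at each native-access entry
    // point. getCallerClass() returns the class that invoked this method.
    static void recordNativeAccess() {
        recordCaller(WALKER.getCallerClass());
    }

    // Records "module/class". Class-path classes belong to the unnamed module,
    // reported as ALL-UNNAMED, the name --enable-native-access uses for it.
    static void recordCaller(Class<?> caller) {
        Module module = caller.getModule();
        String moduleName = module.isNamed() ? module.getName() : "ALL-UNNAMED";
        CALLERS.add(moduleName + "/" + caller.getName());
    }

    // Sorted copy, so the crash report is stable between runs.
    static Set<String> snapshot() {
        return new TreeSet<>(CALLERS);
    }

    // The extra section a crash dump could contain.
    static String crashReportSection() {
        StringBuilder sb = new StringBuilder(
                "Classes that performed native access (suspects):\n");
        for (String entry : snapshot()) {
            sb.append("  ").append(entry).append('\n');
        }
        return sb.toString();
    }
}
```

The design is deliberately passive: nothing is denied, so existing launcher
scripts keep working, and the information only surfaces once something has
already gone wrong.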

I hope that this feature flag will be turned into a helpful retrospective
debugging tool instead of a restrictive "yes, I want to run this program"
consent. The feature flag in its current implementation is no different
from requiring a government license to use a knife in the kitchen because
knives are so unsafe and someone could accidentally hurt themselves.

Otherwise, the FFI API is great and will surely be very useful in many
cases where JNI is overkill. Just don't introduce a deployment burden with
useless feature gates. Or at least invert the logic for the rare people who
really need to control which parts of the system have native access
capability: e.g. deny native access only if some flag like
--enable-native-access-control is set and the module is not explicitly
permitted.
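The inverted logic fits in a single predicate. This is a hypothetical model,
not JDK code: `NativeAccessPolicy` and the module name `org.example.python`
are made up for illustration, and `--enable-native-access-control` is the
flag name suggested above, which does not exist in the JDK. For contrast,
`optIn` models the opt-in check as this mail describes it.

```java
import java.util.Set;

// Hypothetical model, not JDK code. Module names are plain strings, as they
// would appear on the command line.
final class NativeAccessPolicy {
    private NativeAccessPolicy() {}

    // Opt-in model as described in this mail: native access is allowed only
    // for modules listed via --enable-native-access.
    static boolean optIn(Set<String> enabledModules, String module) {
        return enabledModules.contains(module);
    }

    // Inverted model: allowed by default; denied only when the (hypothetical)
    // --enable-native-access-control flag is set AND the module is not listed.
    static boolean inverted(boolean controlFlagSet, Set<String> permittedModules,
                            String module) {
        return !controlFlagSet || permittedModules.contains(module);
    }
}
```

With the control flag absent, `inverted` returns `true` for every module, so
a library embedding Python works without touching launcher scripts; only the
deployments that ask for control pay the cost of listing modules.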

With best hopes,
Maxim Karpov.