Feedback about an experiment to embed Python interpreter with FFI API

Mon Jan 10 19:13:35 UTC 2022

Hi Maurizio,

Well, yes, the definition of "programs" and "users" is rather 
oversimplified. So let's narrow it down: there are three distinct 
entities: (1) people who write the actual native code, (2) people who 
use it in their applications (that is, write business logic that 
directly or indirectly uses native libraries) and (3) people who 
deploy/run these applications. The current permission system, along 
with the amendments discussed in [1], mixes these groups together, 
making people suffer for things over which they have no control.

Let's start from the "far end": people who deploy or run the code on 
their machines. There are two points to consider: trust and fault 
tolerance. As was pointed out, the feature flag is by no means a 
security mechanism, and it cannot protect a machine from malicious 
code. On deployment, application trust is usually evaluated "as a 
whole" and restrictions are usually enforced at the OS level: 
restricted users, access permissions, containers, namespaces, firewalls 
and all those things. They work the same way regardless of whether a 
module calls native code or not -- after all, everything that executes 
on the CPU is native code. So the only area in which the flag can be 
helpful is fault tolerance, because errors in native programming could 
easily crash the JVM itself, bypassing any try-catch blocks or other 
measures placed to limit the impact of errors.

My personal opinion is that the feature flag is completely useless for 
improving fault tolerance: the only thing it can do is turn an 
application that crashes sometimes into an application that crashes 
always (albeit with a higher-level exception). It is directly harmful 
for well-written applications (by introducing a deployment burden) and 
useless for poorly written ones. DevOps takes most of the damage caused 
by the introduction of a feature flag, but they have no power to 
magically make the application use only safe APIs. They are deploying 
what they have to deploy. From the user (or DevOps) perspective, the 
usual reaction to a crash is (1) to disable or bypass the faulty 
component to restore current service operation and (2) to pass the 
crash log to the developers, who will investigate it in greater detail. 
The only thing that the feature flag could do here is make the 
application "fail fast" by denying native access in the first place. 
Surely, it could help to restore current service operation if some 
higher level traps the exception and, e.g., disables the faulty module. 
But the point is that all this logic applies only when the application 
has already crashed.

So I think that the feature flag should at least be inverted: it should 
allow all native access by default and provide a way to switch a module 
whitelist on. This way it would not hamper well-working applications 
(because the default policy would be "allow all access", even without 
warnings) and could be a useful tool to isolate faulty code from the 
rest of the system. Just implementing a tainting mechanism (listing all 
modules with native access in crash dumps) and providing a way to 
disable some of the modules in case of faults would be helpful for the 
DevOps side. I still don't think that it's necessary: you can usually 
disable faulty components in the application config, or remove a 
plugin, or make a similar application-level change. The only thing that 
you need is to identify the faulty code. The feature flag does not help 
here: it just gives an upper bound. You cannot easily identify which 
module from the allowed set caused a crash. So I don't think that there 
is a need for even the inverted version of the flag: a tainting 
mechanism and useful debug reports (mentioning the list of modules that 
have native access, the module that did the most recent native 
operation, and the stack trace of the Java code that did the most 
recent native operation in the crashed thread, if the stack is not 
corrupted) would be much more helpful. But an inverted flag would at 
least not be directly harmful.
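
To make the tainting idea concrete, here is a rough application-level 
sketch. It is only an approximation: a real implementation would have 
to live inside the VM so that the data ends up in the crash dump, since 
Java code cannot run after the JVM itself has crashed. The class name 
NativeAccessLog, its methods and the module names are hypothetical and 
not part of any JDK API.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Hypothetical helper (not a JDK API): remembers which modules performed
// native access and the Java stack of their most recent native call.
class NativeAccessLog {
    // owner (module or class name) -> Java frames of its latest native call
    private static final Map<String, List<String>> LAST_CALL = new ConcurrentHashMap<>();

    /** Call right before a native call to "taint" the owner. */
    static void record(String owner) {
        List<String> frames = StackWalker.getInstance().walk(stream -> stream
                .skip(1)                 // drop the record() frame itself
                .limit(8)                // keep the report short
                .map(Object::toString)
                .collect(Collectors.toList()));
        LAST_CALL.put(owner, frames);
    }

    /** Builds the text that a crash report could include. */
    static String report() {
        StringBuilder sb = new StringBuilder("Modules with native access:\n");
        // TreeMap gives a stable, sorted order of owners in the report
        new TreeMap<>(LAST_CALL).forEach((owner, frames) -> {
            sb.append("  ").append(owner).append('\n');
            frames.forEach(frame -> sb.append("    at ").append(frame).append('\n'));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        record("com.example.codec");   // hypothetical module names
        record("com.example.serial");
        System.out.print(report());
    }
}
```

With something like this in the crash log, the person running the 
application gets a short list of suspects instead of an upper bound.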

Also, I can't agree with the claim that FFI allows you to easily abuse 
native APIs. There are usually very few things that you can do "easily" 
in native code. For most use cases, FFI without proper tooling is 
actually much more complicated than JNI. FFI forces you to redo by 
yourself most of the work that would otherwise be done by gcc/clang, 
such as parsing include files, finding function signatures and 
determining structure layouts. So you either do one-shot simple calls 
(like ioctls on file descriptors), which I cannot call "abuse", or set 
up complex tooling to generate FFI wrappers. The only obstacle removed 
is, again, the deployment one: you don't have to carry an intermediate 
native library for every architecture you target. Maybe it also has 
performance benefits. But from a development perspective it is in no 
way easier (and is harder, I would say) than JNI.
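
As a small illustration of the "determining structure layouts" part, 
here is a sketch of the bookkeeping a C compiler does silently and that 
you have to redo by hand (or via tooling) without it. It deliberately 
uses plain Java rather than the FFI API itself. It assumes scalar 
fields whose alignment equals their size, as on a typical LP64 target; 
the class StructLayoutSketch and the example struct are made up for 
this message.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not a JDK API): computes C-style field offsets.
class StructLayoutSketch {
    /** Rounds offset up to a multiple of alignment (a power of two). */
    static long alignUp(long offset, long alignment) {
        return (offset + alignment - 1) & -alignment;
    }

    /**
     * Takes field name -> size in bytes, in declaration order, and returns
     * field name -> offset. Assumes alignment == size for every field.
     * The padded total size is stored under the key "<size>".
     */
    static Map<String, Long> layout(Map<String, Long> fieldSizes) {
        Map<String, Long> offsets = new LinkedHashMap<>();
        long offset = 0;
        long maxAlign = 1;
        for (Map.Entry<String, Long> field : fieldSizes.entrySet()) {
            long size = field.getValue();
            offset = alignUp(offset, size);      // insert padding before the field
            offsets.put(field.getKey(), offset);
            offset += size;
            maxAlign = Math.max(maxAlign, size);
        }
        offsets.put("<size>", alignUp(offset, maxAlign)); // tail padding
        return offsets;
    }

    public static void main(String[] args) {
        // struct { char tag; int count; long total; }
        Map<String, Long> fields = new LinkedHashMap<>();
        fields.put("tag", 1L);
        fields.put("count", 4L);
        fields.put("total", 8L);
        System.out.println(layout(fields)); // {tag=0, count=4, total=8, <size>=16}
    }
}
```

And this is the easy case: no nested structs, unions, bitfields or 
platform-dependent type sizes, which the headers and the compiler 
normally take care of.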

To summarize, I don't see any real-world situation where the permission 
mechanism could help, but I do see many situations where it is just a 
burden that you have to carry. Not to mention the fact that the 
adoption rate of the module ecosystem is far from stellar.

Best regards, Maxim.

[1] - 
https://mail.openjdk.java.net/pipermail/panama-dev/2021-September/015036.html

04.01.2022 12:09, Maurizio Cimadamore writes:
> Hi Maxim
>> The topic I want to talk about is a small flag 
>> `--enable-native-access=X`
>> that looks harmless and useful at first glance. tl;dr: it is neither
>> harmless nor useful. The reasoning that "user must opt-in to use unsafe
>> APIs" implies that: (1) unsafe APIs are something illegal and should be
>> avoided; (2) user has competencies to make such decisions; (3) user has
>> other choice than use unsafe APIs. From my point of view, all three
>> assumptions are false.
>
> This has been discussed before (see [1]). My main suggestion here is 
> to avoid over-generalization - e.g. speaking about "programs" and 
> "users" in general is almost always incorrect and prone to 
> simplifications and biases. Which programs? Which users? Note that 
> there are actual instances of users who found the flag very useful 
> (see [2]), which suggests that reality is probably more fragmented 
> than what we'd like it to be.
>
> Native access is not illegal, of course (though some of the wording 
> around the exceptions thrown when the flag isn't there might be 
> unfortunate and suggest that). But doing native access (of any kind) 
> puts the application under much greater risk of misuse, as there is no 
> way for the JVM to enforce safety for native code (e.g. module access 
> boundaries). The fact that other mechanisms such as System.exit, or 
> process builder exists by which bad things can happen and for which no 
> protection exists is not, in itself, a justification as to why 
> accessing native features of the Java API should not be protected.
>
> When a developer uses JNI a number of things have to happen for that 
> code to be in a workable state. When writing a program, native code 
> has to be compiled - when executing, native libraries must be made 
> available. These things all require command line flags (e.g. 
> -Djava.library.path). There are, of course, frameworks that will 
> side-step the library loading requirement by embedding the desired 
> library somewhere in a dependency jar file, extract it into a temp 
> folder, and then load the library directly from there, so that these 
> libraries appear to work "seamlessly".
>
> With the foreign API, when writing code there's no requirement to use 
> GCC/clang. So that's one obstacle removed for the developer. And, when 
> executing, assuming you are only using system libraries (e.g. posix, 
> windows API), an application could just start calling native functions 
> seamlessly (w/o any kind of workaround, unlike in JNI).
>
> This means that it is now much easier for seemingly innocuous Java 
> code to behave in unpredictable ways, hence the flag.
>
>
>>
>>
>> The constructive part: what will be more helpful and much less
>> burdensome? I would say that a much better alternative to a feature
>> flag is just to record all modules (or classes) which do native
>> access, either by FFI or (in future) by JNI. Then just include the
>> list of such modules or classes in the crash dump. Straightforward:
>> the VM has crashed, but classes A, B or C might be an issue since
>> they used native access. Try removing these classes first before
>> reporting any bugs. No deployment burden, no breaking of
>> compatibility by introducing new restrictions and much more helpful
>> crash dumps.
>
>
> I think a better way, which we have thought about (but have not got to 
> yet) is to think about native access as a permission that can be 
> granted to a module to other modules. Let's say you have an 
> application module App - and App depends on Foo and Bar, where Bar 
> requires native access. If the required information is captured in the 
> module system, then the application developer can simply grant native 
> access to App, and App will be responsible to transfer the native 
> access permission as required to the modules which requires it. I 
> think such a solution would address most of the issues with the 
> current flag (e.g. finding all the modules which need native access 
> and granting them explicitly) and retain the spirit of capturing 
> important permission information at the module-graph level (so that, 
> if an application is executed w/o enough permission, it would fail to 
> start early, rather than wait for the first unsafe method to be called).
>
> Another option we considered was to turn native access errors into 
> warnings if no --enable-native-access flag is specified (but retain 
> existing behavior if the flag is specified). This would allow for a 
> smoother migration, while allowing clients that want to be strict 
> about native access (see [2]) to still function in the way they do 
> today. To conclude, I think there are possible tweaks to make the flag 
> more useful w/o merely turning it into a debugging feature as you 
> propose.
>
> Maurizio
>
> [1] - 
> https://mail.openjdk.java.net/pipermail/panama-dev/2021-September/015036.html
> [2] - 
> https://mail.openjdk.java.net/pipermail/panama-dev/2021-December/015981.html
>