Getting a live view of environment variables (Gradle and JDK 9)

Thu May 11 10:50:44 UTC 2017

Hi Cedric,

On 11/05/2017 5:02 PM, Cédric Champeau wrote:
> Thanks for the answers, folks, but I think they are kind of missing the
> point. The fact is that environment variables *are* mutable. Java

Yes they are, provided you do it carefully and don't have to deal with 
the fact that its basically just a malloc'd char* array with no 
requirements for any kind of thread-safety. And given the JVM is 
multi-threaded almost as soon as it is launched we're not off to a great 
start in terms of safe and reliable access to a potentially concurrently 
mutable environment.

> happens to return an immutable view of the environment variables when
> the VM was started, which is not the reality. We cannot trust what
> `System.geteenv` returns. The Javadocs say "current system environment"
> but it's just not true. The method should be called
> `getEnvWhenTheVMWasStarted`.

It takes the snapshot on first call, but yes the docs are wrong.

> We also have the problem that the environment of the client and the
> daemon differ over time, as I explained in the previous email. And it is
> pretty easy to understand that _when the build runs_, we need to get the
> environment variables of the _client_, not the daemon. Imagine something
> as simple as this:
>
> String toolPath = System.getenv('SOMETOOL_HOME')
>
> and imagine that this code runs in the daemon. When the daemon is
> started, there's absolutely no guarantee that this variable is going to
> exist. However, we know that when we're going to execute the build, this
> variable *has* to be defined. Obviously, it's going to be done from the CLI:
>
> $ export SOMETOOL_HOME = ...
> $ ./gradlew run ---> connects to the daemon, passes environment
> variables, mutates those of the daemon, which then executes the build

Thanks for clarifying the scenario. So IIUC there is a set of env vars 
defined that provide the communication between the client and the 
daemon. The value of those vars in the client are passed to the daemon 
(via some normal IPC mechanism) and then JNI code on the daemon is used 
to update the daemon's process env so that if the Java getenv looked it 
would see the updated value.

> Similarly, from a *single* CLI/terminal, it's very common to change the
> values of environment variables (because, hey, they are mutable!):
> $ export SOMETOOL_HOME = another path I want to try out
> $ ./gradlew run --> ... must update env vars again
>
> So, to work around the problem that Java doesn't provide a way to mutate
> the current environment variables, we perform a JNI call to do it.
> *Then* we need to mutate the "immutable view" that Java provides, or all
> Java code is going to get wrong information, and we would never succeed.
> Note that having to resort on JNI to mutate the environment is not the
> problem. We can live with that. Once can think it's a bad idea to allow
> mutation, in practice, it is necessary, but I reckon it could be a bad
> idea to have an API for this. The *real* issue is that `System.getenv`
> does *not* return the real values, because it's an immutable view of
> what existed at the VM startup.

Understood.

> Note that it's not recommended to mutate environment variables in a
> multi-threaded environment. But in practice, you can assimilate what we
> do to the VM startup: we get environment variables, mutate them, _then_
> the build runs (and only at that moment we would effectively be
> multi-threaded). We can live with potential issues of mutating the
> environment: the benefits outperforms the drawbacks by orders of
> magnitude. When the daemon is activated, we're not talking about 10%
> faster builds. They can effectively be 3, 5 or 10x faster!
>
> Now, back to the problem with JDK 9:
>
> - first, the implementation has changed. But we could adapt our code.
> - second, calling `setAccessible` is not allowed anymore. And this is a
> MAJOR pita. The problem is that we're trying to access the java base
> module, and reflection on that module is not allowed anymore. There are

I was going to suggest --permit-illegal-access but from what you write 
below it isn't feasible to manage this through VM startup options. I 
don't know whether there is a programmatic equivalent to the "big kill 
switch" - probably not given it will only be there for 9.

> no good solutions for this: we could write a JVM agent, like you
> suggested, that rewrites bytecode. But since we're trying to rewrite a
> core class, it would have to be found on the invocation of `java`
> command itself, and cannot be dynamically loaded. In addition, we would
> have to add a flag to open core classes to rewrite. There are multiple
> problems with this approach: first, we don't know on which environment
> we run before we're started. Gradle can run on JDK 7, 8, 9, ... and we
> cannot have a startup script which tries to infer the version from
> whatever it finds, this is not realistic and would add unacceptable
> latency on the client side (plus, a lot of headaches to support the
> various environments we support: Linux, Mac, Windows, ...). Second, we
> may not have a hand on the CLI at all. For example, if the build runs in
> Travis CI, Amazon, or whatever cloudish thing that spawns its own
> containers, we cannot tweak the VM arguments. We just have to use
> whatever is there. Last, rewriting bytecode has a cost, that you pay at
> startup.
>
> Again, what we need is that Java just returns the live view of the
> environment variables. Allowing *mutation* is not necessary, technically
> speaking, because we can rely on JNI. However, I reckon that returning
> an immutable view is done for performance reasons. That's why adding a
> way to mutate "the view" would be an acceptable thing I think. No
> reflection, no bytecode rewriting, just give a way to mutate the map
> that `ProcessEnvironment` retains. The advantage of this is that you
> would not effectively mutate the environment variables, so you, on the
> JVM side, would be safe. What you would mutate is the view you are
> retaining. Alternatively, provide us with an API to "refresh" the view.
> Like, `System.getenv(true)`. The benefit would be that you would only
> have to get the new view of environment variables if the user asks for
> it. And, subsequent callers would get the refreshed view, so people
> using `gettenv` in their code would be up-to-date.

That may be feasible. I like the idea of getenv(true) to request a 
refresh of the underlying data structure.

> I'm really concerned that this problem is not taken seriously. It's a
> major performance problem for the most widely used build tool out there
> (considering all users, from native to Java and Android). Just saying
> "you're doing something nasty" is not a valid answer: we have to do it,
> and do it for good reasons.

It isn't a matter of not taking the problem seriously. You've raised an 
issue for discussion and it is being discussed, but it is very late in 
the JDK 9 end game to try and add new APIs. This issue has been known 
for a long time and a Gradle issue was filed last September [1]. There 
was some discussion on jigsaw-dev [2] last October, and Alan even 
suggested back then to raise the "setenv" issue on core-libs-dev.

Cheers,
David

[1] https://issues.gradle.org/browse/GRADLE-3565
[2] 
http://mail.openjdk.java.net/pipermail/jigsaw-dev/2016-October/009551.html
>
>
> 2017-05-11 4:50 GMT+02:00 Martin Buchholz <martinrb at google.com
> <mailto:martinrb at google.com>>:
>
>     Since you're OK with doing brain surgery on the JDK and you control
>     the entire process, consider controlling your daemon with a bytecode
>     rewriting agent (e.g. byteman) that changes the definition of
>     System.getenv etc.
>
>     (Whose side are you on Martin?! ...  I confess I also wish for a
>     faster gradle ...)
>
>