Getting a live view of environment variables (Gradle and JDK 9)

Thu May 11 07:02:09 UTC 2017

Thanks for the answers, folks, but I think they are kind of missing the
point. The fact is that environment variables *are* mutable. Java happens
to return an immutable view of the environment variables when the VM was
started, which is not the reality. We cannot trust what `System.geteenv`
returns. The Javadocs say "current system environment" but it's just not
true. The method should be called `getEnvWhenTheVMWasStarted`.

We also have the problem that the environment of the client and the daemon
differ over time, as I explained in the previous email. And it is pretty
easy to understand that _when the build runs_, we need to get the
environment variables of the _client_, not the daemon. Imagine something as
simple as this:

String toolPath = System.getenv('SOMETOOL_HOME')

and imagine that this code runs in the daemon. When the daemon is started,
there's absolutely no guarantee that this variable is going to exist.
However, we know that when we're going to execute the build, this variable
*has* to be defined. Obviously, it's going to be done from the CLI:

$ export SOMETOOL_HOME = ...
$ ./gradlew run ---> connects to the daemon, passes environment variables,
mutates those of the daemon, which then executes the build

Similarly, from a *single* CLI/terminal, it's very common to change the
values of environment variables (because, hey, they are mutable!):

$ export SOMETOOL_HOME = another path I want to try out
$ ./gradlew run --> ... must update env vars again

So, to work around the problem that Java doesn't provide a way to mutate
the current environment variables, we perform a JNI call to do it. *Then*
we need to mutate the "immutable view" that Java provides, or all Java code
is going to get wrong information, and we would never succeed. Note that
having to resort on JNI to mutate the environment is not the problem. We
can live with that. Once can think it's a bad idea to allow mutation, in
practice, it is necessary, but I reckon it could be a bad idea to have an
API for this. The *real* issue is that `System.getenv` does *not* return
the real values, because it's an immutable view of what existed at the VM
startup.

Note that it's not recommended to mutate environment variables in a
multi-threaded environment. But in practice, you can assimilate what we do
to the VM startup: we get environment variables, mutate them, _then_ the
build runs (and only at that moment we would effectively be
multi-threaded). We can live with potential issues of mutating the
environment: the benefits outperforms the drawbacks by orders of magnitude.
When the daemon is activated, we're not talking about 10% faster builds.
They can effectively be 3, 5 or 10x faster!

Now, back to the problem with JDK 9:

- first, the implementation has changed. But we could adapt our code.
- second, calling `setAccessible` is not allowed anymore. And this is a
MAJOR pita. The problem is that we're trying to access the java base
module, and reflection on that module is not allowed anymore. There are no
good solutions for this: we could write a JVM agent, like you suggested,
that rewrites bytecode. But since we're trying to rewrite a core class, it
would have to be found on the invocation of `java` command itself, and
cannot be dynamically loaded. In addition, we would have to add a flag to
open core classes to rewrite. There are multiple problems with this
approach: first, we don't know on which environment we run before we're
started. Gradle can run on JDK 7, 8, 9, ... and we cannot have a startup
script which tries to infer the version from whatever it finds, this is not
realistic and would add unacceptable latency on the client side (plus, a
lot of headaches to support the various environments we support: Linux,
Mac, Windows, ...). Second, we may not have a hand on the CLI at all. For
example, if the build runs in Travis CI, Amazon, or whatever cloudish thing
that spawns its own containers, we cannot tweak the VM arguments. We just
have to use whatever is there. Last, rewriting bytecode has a cost, that
you pay at startup.

Again, what we need is that Java just returns the live view of the
environment variables. Allowing *mutation* is not necessary, technically
speaking, because we can rely on JNI. However, I reckon that returning an
immutable view is done for performance reasons. That's why adding a way to
mutate "the view" would be an acceptable thing I think. No reflection, no
bytecode rewriting, just give a way to mutate the map that
`ProcessEnvironment` retains. The advantage of this is that you would not
effectively mutate the environment variables, so you, on the JVM side,
would be safe. What you would mutate is the view you are retaining.
Alternatively, provide us with an API to "refresh" the view. Like,
`System.getenv(true)`. The benefit would be that you would only have to get
the new view of environment variables if the user asks for it. And,
subsequent callers would get the refreshed view, so people using `gettenv`
in their code would be up-to-date.

I'm really concerned that this problem is not taken seriously. It's a major
performance problem for the most widely used build tool out there
(considering all users, from native to Java and Android). Just saying
"you're doing something nasty" is not a valid answer: we have to do it, and
do it for good reasons.

2017-05-11 4:50 GMT+02:00 Martin Buchholz <martinrb at google.com>:

> Since you're OK with doing brain surgery on the JDK and you control the
> entire process, consider controlling your daemon with a bytecode rewriting
> agent (e.g. byteman) that changes the definition of System.getenv etc.
>
> (Whose side are you on Martin?! ...  I confess I also wish for a faster
> gradle ...)
>