Getting a live view of environment variables (Gradle and JDK 9)

Tue May 16 17:08:01 UTC 2017

2017-05-16 18:57 GMT+02:00 Sanne Grinovero <sanne at redhat.com>:

> Hi Cédric,
>
> we use Gradle a lot in our team (Hibernate) and all your efforts to
> make it faster are highly appreciated.
>
> I can assure you that during a normal day of work while we might
> rebuild a project a few hundred times, I'll rarely have to change
> environment variables.
>
> Obviously I don't represent all users, but as you said yourself the
> need to change environment variables is both something you have to
> support and something which is extremely rare.

So can't you simply keep the daemon around, as long as environment
> variables don't change?
>

That's one of the suggestions in the original email, but as you can see,
it's not uncommon to have environment variables mutated "behind you back",
even in a simple command line. Sometimes, it's as easy as having 2
different terminals open (which I think is far from being a rare case):
then you cannot share the same daemon instance, which is either very bad
for performance or memory usage. Why? Because the open terminal has some
specific variables (WINDOWID, ...).

>
> For the rare case in which environment variables have been mutated,
> you could trigger a full restart.
>
> As a second step, I should mention that on a typical workstation I
> might have multiple terminals open, each configured with a different
> environment and possibly working on a different project or different
> branch - or just testing on a different environment.
> It would be even nicer if one could have a different Gradle daemon
> instances running for each environment; incidentally this might have
> other benefits, like we often need to switch JDK build or even vendor.
> I'd be happy to be able to give them some name as identifier.
>
>
That's something we consider, but it's not the same use case at all (my
typical setup is pretty much the opposite: a lot of terminals open for the
same environment/build), so it has to be supported differently. I don't
think we should mix this in the game.

> I think this solution would improve Gradle and allow the JDK to move
> on with this quite nice cleanup.
>
> HTH
>
> Thanks,
> Sanne
>
>
> On Tue, May 16, 2017 at 11:05 AM, Cédric Champeau
> <cedric.champeau at gmail.com> wrote:
> > Thanks Peter for your thoughts, but I don't think it's as simple as that.
> > As I explained in my original email, there are multiple ways the
> > environment variables can be mutated, and it can be done _externally_.
> > Typically, during a task execution, a JNI call performed by a random
> native
> > tool could mutate the environment. That's something we, as a build tool,
> > have to consider as a use case. It's very unlikely but it can happen.
> This
> > means that for the same classloader, the environment can change. And for
> > performance reasons, we cache classloaders between builds, and reuse them
> > as much as possible (because classloading is far from being cheap).
> >
> > 2017-05-14 19:51 GMT+02:00 Peter Levart <peter.levart at gmail.com>:
> >
> >> Hi Cedric,
> >>
> >> Chiming in with my thoughts...
> >>
> >> It's unfortunate that Gradle plugins or libraries used by plugins use
> >> environment variables at all. I wonder who was the first? Did Gradle
> >> introduce the feature of passing client environment to the daemon and
> >> establishing the view of System.getenv for plugins 1st or did libraries
> >> used by plugins happen to use environment variables using System.getenv
> and
> >> Gradle was just kind enough to provide them a dynamic view controlled by
> >> client?
> >>
> >> In the end it doesn't matter. The fact is that System.getenv is part of
> >> standard Java API and anyone using it should be aware that by doing so,
> >> they are limiting their software to be (re)configured only by spawning
> new
> >> process(es). UNIX environment was not designed to be mutated during the
> >> course of a long-running process. Maybe just at process startup/setup
> when
> >> conditions are under control (i.e. a single running thread) but later,
> the
> >> environment is meant to be read only.
> >>
> >> Maybe there is a solution for Gradle and othert though. But that
> solution,
> >> I think, is not in exposing a "live" view of process environment through
> >> System.getenv or new methods to "refresh" the view as you are proposing,
> >> since that would only encourage people to mutate the process environment
> >> which, as was established on this thread, is not safe in a long-running
> >> multi-threaded process, which Java processes usually are. Perhaps the
> >> solution is in extending the System.getenv API with ways to provide
> >> "custom" views of variables for code that runs in a "container".
> >>
> >> Gradle is, among other things, a container for plugins. Is Gradle
> running
> >> plugins in a separate ClassLoader? Does it use a separate ClassLoader
> for
> >> each "build" which might bring with it a new set of environment
> variables
> >> from the client? In such a setup, one could provide a separate set of
> >> environment variables for each ClassLoader, simply by passing them to
> the
> >> ClassLoader constructor. System.getenv would need to be a
> @CallerSensitive
> >> method which would return caller's ClassLoader view of variables, if any
> >> such view was established, or simply the view inherited from the parent
> >> ClassLoader.
> >>
> >> Such would be a "functional" approach, which would keep environment
> >> variables immutable, but allow child "contexts" to have different views
> of
> >> them. Such approach would also play well with idioms like:
> >>
> >> class AbcPlugin {
> >>     static final String SOME_SETTING = System.getenv("SOME_SETTING");
> >>
> >> ...and would also help other containers (such as app servers) support
> >> existing libraries that use environment variables to be configured
> >> differently in different apps deployed in the same VM process.
> >>
> >> Just a thought.
> >>
> >> Regards, Peter
> >>
> >>
> >>
> >> On 05/11/2017 09:02 AM, Cédric Champeau wrote:
> >>
> >> Thanks for the answers, folks, but I think they are kind of missing the
> >> point. The fact is that environment variables *are* mutable. Java
> happens
> >> to return an immutable view of the environment variables when the VM was
> >> started, which is not the reality. We cannot trust what `System.geteenv`
> >> returns. The Javadocs say "current system environment" but it's just not
> >> true. The method should be called `getEnvWhenTheVMWasStarted`.
> >>
> >> We also have the problem that the environment of the client and the
> daemon
> >> differ over time, as I explained in the previous email. And it is pretty
> >> easy to understand that _when the build runs_, we need to get the
> >> environment variables of the _client_, not the daemon. Imagine
> something as
> >> simple as this:
> >>
> >> String toolPath = System.getenv('SOMETOOL_HOME')
> >>
> >> and imagine that this code runs in the daemon. When the daemon is
> started,
> >> there's absolutely no guarantee that this variable is going to exist.
> >> However, we know that when we're going to execute the build, this
> variable
> >> *has* to be defined. Obviously, it's going to be done from the CLI:
> >>
> >> $ export SOMETOOL_HOME = ...
> >> $ ./gradlew run ---> connects to the daemon, passes environment
> variables,
> >> mutates those of the daemon, which then executes the build
> >>
> >> Similarly, from a *single* CLI/terminal, it's very common to change the
> >> values of environment variables (because, hey, they are mutable!):
> >>
> >> $ export SOMETOOL_HOME = another path I want to try out
> >> $ ./gradlew run --> ... must update env vars again
> >>
> >> So, to work around the problem that Java doesn't provide a way to mutate
> >> the current environment variables, we perform a JNI call to do it.
> *Then*
> >> we need to mutate the "immutable view" that Java provides, or all Java
> code
> >> is going to get wrong information, and we would never succeed. Note that
> >> having to resort on JNI to mutate the environment is not the problem. We
> >> can live with that. Once can think it's a bad idea to allow mutation, in
> >> practice, it is necessary, but I reckon it could be a bad idea to have
> an
> >> API for this. The *real* issue is that `System.getenv` does *not* return
> >> the real values, because it's an immutable view of what existed at the
> VM
> >> startup.
> >>
> >> Note that it's not recommended to mutate environment variables in a
> >> multi-threaded environment. But in practice, you can assimilate what we
> do
> >> to the VM startup: we get environment variables, mutate them, _then_ the
> >> build runs (and only at that moment we would effectively be
> >> multi-threaded). We can live with potential issues of mutating the
> >> environment: the benefits outperforms the drawbacks by orders of
> magnitude.
> >> When the daemon is activated, we're not talking about 10% faster builds.
> >> They can effectively be 3, 5 or 10x faster!
> >>
> >> Now, back to the problem with JDK 9:
> >>
> >> - first, the implementation has changed. But we could adapt our code.
> >> - second, calling `setAccessible` is not allowed anymore. And this is a
> >> MAJOR pita. The problem is that we're trying to access the java base
> >> module, and reflection on that module is not allowed anymore. There are
> no
> >> good solutions for this: we could write a JVM agent, like you suggested,
> >> that rewrites bytecode. But since we're trying to rewrite a core class,
> it
> >> would have to be found on the invocation of `java` command itself, and
> >> cannot be dynamically loaded. In addition, we would have to add a flag
> to
> >> open core classes to rewrite. There are multiple problems with this
> >> approach: first, we don't know on which environment we run before we're
> >> started. Gradle can run on JDK 7, 8, 9, ... and we cannot have a startup
> >> script which tries to infer the version from whatever it finds, this is
> not
> >> realistic and would add unacceptable latency on the client side (plus, a
> >> lot of headaches to support the various environments we support: Linux,
> >> Mac, Windows, ...). Second, we may not have a hand on the CLI at all.
> For
> >> example, if the build runs in Travis CI, Amazon, or whatever cloudish
> thing
> >> that spawns its own containers, we cannot tweak the VM arguments. We
> just
> >> have to use whatever is there. Last, rewriting bytecode has a cost, that
> >> you pay at startup.
> >>
> >> Again, what we need is that Java just returns the live view of the
> >> environment variables. Allowing *mutation* is not necessary, technically
> >> speaking, because we can rely on JNI. However, I reckon that returning
> an
> >> immutable view is done for performance reasons. That's why adding a way
> to
> >> mutate "the view" would be an acceptable thing I think. No reflection,
> no
> >> bytecode rewriting, just give a way to mutate the map that
> >> `ProcessEnvironment` retains. The advantage of this is that you would
> not
> >> effectively mutate the environment variables, so you, on the JVM side,
> >> would be safe. What you would mutate is the view you are retaining.
> >> Alternatively, provide us with an API to "refresh" the view. Like,
> >> `System.getenv(true)`. The benefit would be that you would only have to
> get
> >> the new view of environment variables if the user asks for it. And,
> >> subsequent callers would get the refreshed view, so people using
> `gettenv`
> >> in their code would be up-to-date.
> >>
> >> I'm really concerned that this problem is not taken seriously. It's a
> major
> >> performance problem for the most widely used build tool out there
> >> (considering all users, from native to Java and Android). Just saying
> >> "you're doing something nasty" is not a valid answer: we have to do it,
> and
> >> do it for good reasons.
> >>
> >>
> >> 2017-05-11 4:50 GMT+02:00 Martin Buchholz <martinrb at google.com> <
> martinrb at google.com>:
> >>
> >>
> >> Since you're OK with doing brain surgery on the JDK and you control the
> >> entire process, consider controlling your daemon with a bytecode
> rewriting
> >> agent (e.g. byteman) that changes the definition of System.getenv etc.
> >>
> >> (Whose side are you on Martin?! ...  I confess I also wish for a faster
> >> gradle ...)
> >>
> >>
> >>
> >>
>