Getting a live view of environment variables (Gradle and JDK 9)

Tue May 16 19:57:41 UTC 2017

I don't deny some people do not have the problem :)

2017-05-16 21:56 GMT+02:00 Uwe Schindler <uschindler at apache.org>:

> Hi,
>
> Another example is not having your problem is Jenkins CI: It maintains a
> map of environment variables throughout the build. And you have to use that
> a a state container throughout the build. I have never seen a plug-in not
> using that, because it's fundamental to the whole build server.
>
> If you don't have that, then you have a fundamental API design problem! I
> don't think Java should take care of that. Backwards compatibility for some
> broken plugins is a no-go. You have to communicate that to developers. I'd
> run the build in a security manager and whenever somebody calls getenv()
> print a warning and in later version fail completely. Elasticsearch is
> doing similar stuff for their plugins.
>
> Uwe
>
>
> Am 16. Mai 2017 20:11:10 MESZ schrieb "Cédric Champeau" <
> cedric.champeau at gmail.com>:
>>
>> Hi Uwe,
>>
>> I already explained multiple times why we need this. Short answer:
>> because we must. Slightly longer answer: because the build environment, the
>> daemon, has to reflect the environment from the CLI (or the IDE, in case we
>> call using the tooling API), at the moment the build is executed. Why?
>> Because a user wouldn't understand that a script which does:
>>
>> println System.getenv('MY_VAR')
>>
>> doesn't print "foo" after doing:
>>
>> MY_VAR=foo gradle printVar
>>
>> (that's a silly example, but the simplest we can come with). Not
>> everything runs in a separate process: there are things that execute in the
>> daemon itself. A lot of things (starting from build scripts). And yes, we
>> can provide a different API to get environment variables, but:
>>
>> 1. it's a breaking change
>> 2. there are lots of plugins which use libraries, which do NOT depend on
>> the Gradle API, so that API wouldn't be available
>>
>> I'm honestly starting to get tired of explaining again and again WHY we
>> need this. If it was easy, we would have done it differently for years. We
>> worked around a bug in the JDK, which doesn't return the true environment
>> variables but the ones snapshotted when the process started. Now in JDK 9
>> we cannot workaround anymore, which either just kills Gradle performance or
>> forces us to write even nastier code with bytecode manipulating agents to
>> replace what `System.getenv` does.
>>
>> 2017-05-16 19:46 GMT+02:00 Uwe Schindler <uschindler at apache.org>:
>>
>>> Hi,
>>>
>>> I still don't understand why you need to change the environment
>>> variables of the actual process. I was talking with Rémi about that last
>>> week, and it's not obvious to me why you need that. Sure, it is easier to
>>> change the environment of the actual process and then spawn a child process
>>> for some non-java build tool like GCC or even another standalone java
>>> program with option fork=true. But: Why not use the ProcessBuilder API when
>>> spawning those childs? So you just add an "official" build API inside
>>> Gradle (an official one, documented that allows Gradle plugins to modify
>>> the environment variables for the current build running). If you missed to
>>> add such an API from the beginning (IMHO its essential for a build system
>>> like Gradle), then you now have to tell your users: "Sorry we did something
>>> wrong and all our (bad) hacks to allow you to change environment variables
>>> are no longer working in the future, so please change your code. We are so
>>> sorry!"
>>>
>>> If some plugin is not using that new API, then it's not your problem.
>>> You document that you *have* to use the Environment API, because plugins
>>> cannot rely on the fact that another plugin or maybe another build running
>>> at a later time is changing the Gradle process environment.
>>>
>>> At some point you have to break backwards compatibility and tell users:
>>> Please update your code, otherwise plugin won't work anymore with Gradle
>>> version x.y (the one that's compatible to Java 9).
>>>
>>> Uwe
>>>
>>> -----
>>> Uwe Schindler
>>> uschindler at apache.org
>>> ASF Member, Apache Lucene PMC / Committer
>>> Bremen, Germany
>>> http://lucene.apache.org/
>>>
>>> > -----Original Message-----
>>> > From: core-libs-dev [mailto:core-libs-dev-bounces at openjdk.java.net] On
>>> > Behalf Of Cédric Champeau
>>> > Sent: Thursday, May 11, 2017 9:02 AM
>>> > To: Martin Buchholz <martinrb at google.com>
>>> > Cc: core-libs-dev <core-libs-dev at openjdk.java.net>
>>> > Subject: Re: Getting a live view of environment variables (Gradle and
>>> JDK 9)
>>> >
>>> > Thanks for the answers, folks, but I think they are kind of missing the
>>> > point. The fact is that environment variables *are* mutable. Java
>>> happens
>>> > to return an immutable view of the environment variables when the VM
>>> was
>>> > started, which is not the reality. We cannot trust what
>>> `System.geteenv`
>>> > returns. The Javadocs say "current system environment" but it's just
>>> not
>>> > true. The method should be called `getEnvWhenTheVMWasStarted`.
>>> >
>>> > We also have the problem that the environment of the client and the
>>> > daemon
>>> > differ over time, as I explained in the previous email. And it is
>>> pretty
>>> > easy to understand that _when the build runs_, we need to get the
>>> > environment variables of the _client_, not the daemon. Imagine
>>> something
>>> > as
>>> > simple as this:
>>> >
>>> > String toolPath = System.getenv('SOMETOOL_HOME')
>>> >
>>> > and imagine that this code runs in the daemon. When the daemon is
>>> started,
>>> > there's absolutely no guarantee that this variable is going to exist.
>>> > However, we know that when we're going to execute the build, this
>>> variable
>>> > *has* to be defined. Obviously, it's going to be done from the CLI:
>>> >
>>> > $ export SOMETOOL_HOME = ...
>>> > $ ./gradlew run ---> connects to the daemon, passes environment
>>> variables,
>>> > mutates those of the daemon, which then executes the build
>>> >
>>> > Similarly, from a *single* CLI/terminal, it's very common to change the
>>> > values of environment variables (because, hey, they are mutable!):
>>> >
>>> > $ export SOMETOOL_HOME = another path I want to try out
>>> > $ ./gradlew run --> ... must update env vars again
>>> >
>>> > So, to work around the problem that Java doesn't provide a way to
>>> mutate
>>> > the current environment variables, we perform a JNI call to do it.
>>> *Then*
>>> > we need to mutate the "immutable view" that Java provides, or all Java
>>> code
>>> > is going to get wrong information, and we would never succeed. Note
>>> that
>>> > having to resort on JNI to mutate the environment is not the problem.
>>> We
>>> > can live with that. Once can think it's a bad idea to allow mutation,
>>> in
>>> > practice, it is necessary, but I reckon it could be a bad idea to have
>>> an
>>> > API for this. The *real* issue is that `System.getenv` does *not*
>>> return
>>> > the real values, because it's an immutable view of what existed at the
>>> VM
>>> > startup.
>>> >
>>> > Note that it's not recommended to mutate environment variables in a
>>> > multi-threaded environment. But in practice, you can assimilate what
>>> we do
>>> > to the VM startup: we get environment variables, mutate them, _then_
>>> the
>>> > build runs (and only at that moment we would effectively be
>>> > multi-threaded). We can live with potential issues of mutating the
>>> > environment: the benefits outperforms the drawbacks by orders of
>>> > magnitude.
>>> > When the daemon is activated, we're not talking about 10% faster
>>> builds.
>>> > They can effectively be 3, 5 or 10x faster!
>>> >
>>> > Now, back to the problem with JDK 9:
>>> >
>>> > - first, the implementation has changed. But we could adapt our code.
>>> > - second, calling `setAccessible` is not allowed anymore. And this is a
>>> > MAJOR pita. The problem is that we're trying to access the java base
>>> > module, and reflection on that module is not allowed anymore. There
>>> are no
>>> > good solutions for this: we could write a JVM agent, like you
>>> suggested,
>>> > that rewrites bytecode. But since we're trying to rewrite a core
>>> class, it
>>> > would have to be found on the invocation of `java` command itself, and
>>> > cannot be dynamically loaded. In addition, we would have to add a flag
>>> to
>>> > open core classes to rewrite. There are multiple problems with this
>>> > approach: first, we don't know on which environment we run before we're
>>> > started. Gradle can run on JDK 7, 8, 9, ... and we cannot have a
>>> startup
>>> > script which tries to infer the version from whatever it finds, this
>>> is not
>>> > realistic and would add unacceptable latency on the client side (plus,
>>> a
>>> > lot of headaches to support the various environments we support: Linux,
>>> > Mac, Windows, ...). Second, we may not have a hand on the CLI at all.
>>> For
>>> > example, if the build runs in Travis CI, Amazon, or whatever cloudish
>>> thing
>>> > that spawns its own containers, we cannot tweak the VM arguments. We
>>> just
>>> > have to use whatever is there. Last, rewriting bytecode has a cost,
>>> that
>>> > you pay at startup.
>>> >
>>> > Again, what we need is that Java just returns the live view of the
>>> > environment variables. Allowing *mutation* is not necessary,
>>> technically
>>> > speaking, because we can rely on JNI. However, I reckon that returning
>>> an
>>> > immutable view is done for performance reasons. That's why adding a way
>>> > to
>>> > mutate "the view" would be an acceptable thing I think. No reflection,
>>> no
>>> > bytecode rewriting, just give a way to mutate the map that
>>> > `ProcessEnvironment` retains. The advantage of this is that you would
>>> not
>>> > effectively mutate the environment variables, so you, on the JVM side,
>>> > would be safe. What you would mutate is the view you are retaining.
>>> > Alternatively, provide us with an API to "refresh" the view. Like,
>>> > `System.getenv(true)`. The benefit would be that you would only have
>>> to get
>>> > the new view of environment variables if the user asks for it. And,
>>> > subsequent callers would get the refreshed view, so people using
>>> `gettenv`
>>> > in their code would be up-to-date.
>>> >
>>> > I'm really concerned that this problem is not taken seriously. It's a
>>> major
>>> > performance problem for the most widely used build tool out there
>>> > (considering all users, from native to Java and Android). Just saying
>>> > "you're doing something nasty" is not a valid answer: we have to do
>>> it, and
>>> > do it for good reasons.
>>> >
>>> >
>>> > 2017-05-11 4:50 GMT+02:00 Martin Buchholz <martinrb at google.com>:
>>> >
>>> > > Since you're OK with doing brain surgery on the JDK and you control
>>> the
>>> > > entire process, consider controlling your daemon with a bytecode
>>> rewriting
>>> > > agent (e.g. byteman) that changes the definition of System.getenv
>>> etc.
>>> > >
>>> > > (Whose side are you on Martin?! ...  I confess I also wish for a
>>> faster
>>> > > gradle ...)
>>> > >
>>>
>>>
>>