Class#getResource returns null in JDK9 b140 if security manager is enabled (was: RE: [JENKINS-EA] Lucene-Solr-master-Linux (32bit/jdk-9-ea+140) - Build # 18064 - Unstable!)
Wang Weijun
weijun.wang at oracle.com
Tue Oct 18 02:31:17 UTC 2016
I did some investigation. It looks like a method in randomizedtesting-runner-2.3.4.jar calls a method in morfologik-polish-2.1.0.jar, and this second method uses getResource() to read something inside that second jar.
This will fail because the 1st jar's ProtectionDomain is not granted the permission to read the 2nd jar. And yes, this means the 2nd jar should call getResource() in a doPrivileged block. Otherwise you need to grant that permission to every caller on the stack. This is not a bug.
This can be demonstrated with a simple example:
class T5 {
    public static void main(String[] args) throws Exception {
        go(args[0]);
    }
    public static void go(String arg) throws Exception {
        System.out.println(new T5().getClass().getResource(arg));
    }
}
class T6 {
    public static void main(String[] args) throws Exception {
        T5.go(args[0]);
    }
}
Now you pack T5.class into 5.jar and T6.class into 6.jar.
5.jar is automatically granted the permission to read itself:
$ java -Djava.security.manager -cp 5.jar:6.jar T5 T5.class
jar:file:/Volumes/ServerHD/old/work/space/nb/build/classes/5.jar!/T5.class
But not 6.jar:
$ java -Djava.security.manager -cp 5.jar:6.jar T6 T5.class
null
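
For comparison, here is a minimal sketch of what wrapping the lookup in a doPrivileged block could look like. T5Privileged and find() are hypothetical names used only for illustration; they are not part of the test above and this variant was not run against b140. The idea is that the permission check stops at the doPrivileged frame, so only the ProtectionDomain of the jar containing this class is consulted, not that of a caller such as T6.

```java
import java.net.URL;
import java.security.AccessController;
import java.security.PrivilegedAction;

// Hypothetical variant of T5: the resource lookup runs inside doPrivileged.
class T5Privileged {
    public static URL find(String name) {
        // The cast selects the PrivilegedAction overload of doPrivileged.
        // Callers further up the stack are not considered by the permission check.
        return AccessController.doPrivileged(
                (PrivilegedAction<URL>) () -> T5Privileged.class.getResource(name));
    }

    public static void main(String[] args) {
        System.out.println(find(args[0]));
    }
}
```

Note that getResource() still returns null for a resource that does not exist, so callers need a null check either way. On recent JDK releases AccessController is deprecated, so this compiles with a deprecation warning there.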
Thanks
Max
> On Oct 18, 2016, at 1:44 AM, Uwe Schindler <uschindler at apache.org> wrote:
>
> Hi,
>
> we already had off-list contact, initiated by Rory O'Donnel - thanks!
>
> The issue was indeed caused by symlinks. The Jenkins server where the tests are running had a home directory that was symlinked somewhere else. All file paths during test runs, including those of JAR files, used the "correct" (canonical) path. But the home directory was defined with the alternate, symlinked "old" path in /etc/passwd. The effect was that the test's policy file, which refers to the IVY cache in ~/.ivy/cache for loading JAR files, used the path extracted from ${user.home}. Of course this broke.
>
> I am sure this will hit many people, so I have a suggestion for how to solve this: In short, when parsing policy files and the FilePermissions inside, just "expand" the symlink to its canonical form and add *both* (the symlink and the canonical path) as two separate FilePermissions to the collection. This spares the runtime check on every file access but still catches all "known" paths.
>
> In addition, there is also a "bug" in the security permissions system that made the above extra permission needed. We have some third party JARs, that use Class#getResource() to load their own resources. But as getResource does not document any security exceptions or other implications, almost all code out there does not wrap with doPrivileged(). This very old bug required the extra permission to the lib/ folder. The workaround just broke.
>
> Here is what I wrote in the private discussion:
>
> --snip--
>> Yes, this is where the problem is.
>>
>> So it looks like the permission is granted in a policy file instead of being
>> granted by the class loader itself. In this case, the path of the permission
>> must match how you access that file.
>
> Yes. I think the problem is that the 3rd party JAR file does not have a doPrivileged block around the getResource/openStream part, so it is running with the permissions of the calling code (a different JAR file - the test runner). IMHO, this is one of the really strange things about the security model in Java, and most people get it wrong. In particular, it is not clear from Class#getResource that it can be affected by any security policy! It does not even throw a declared SecurityException (because it is swallowed).
>
> We have the extra path in our policy file for exactly that reason (to work around issues in 3rd party JARs) that don't wrap this into doPrivileged!
>
>> I'll think about your suggestion. However, can you guarantee the code always
>> accesses the file using the canonicalized path? For example, suppose the
>> actual file is /x, both /a and /b symlink to it, and you grant the permission on
>> /a in a policy file. Even if I canonicalize /a to /x and grant permissions on
>> both /a and /x, you will still see a failure if the code accesses /b.
>
> I am coming from the full text search engine people. I see the issue that you have with getting the canonical name on every file access (this slows down!). The approach the full text people use is to make "synonyms" of terms and index all of them. Somebody searching for the term will find it under any name. To be ported over to your issue: Instead of doing the canonicalized check on every access, just put *both* known file names into the "search index" (in your case policy file). Means: When parsing the policy file, create 2 file permissions: One as given in the policy and an additional one with the canonical name. This does not solve all problems, but helps around issues like the one we encountered.
>
> I changed the setup of the Jenkins machine that hit this issue first to not have a symlinked entry in /etc/passwd - instead I placed the real path there - so ${user.home} is right. I will see if the issues are gone. Nevertheless I wanted to bring this to your attention, so we can figure out what could go wrong on people's systems after this change. I will also figure out with my colleagues how to solve the permission checks in 3rd party jars - especially as they did not do anything wrong - why should one wrap Class#getResource() with doPrivileged?!
> --snip--
>
> Maybe we can discuss the ideas on the public mailing list. Was quite hard to figure out (with lots of debugging output) until I discovered the problem.
>
> Uwe
>
> -----
> Uwe Schindler
> uschindler at apache.org
> ASF Member, Apache Lucene PMC / Committer
> Bremen, Germany
> http://lucene.apache.org/
>
>> -----Original Message-----
>> From: Sean Mullan [mailto:sean.mullan at oracle.com]
>> Sent: Monday, October 17, 2016 7:33 PM
>> To: Uwe Schindler <uschindler at apache.org>; dev at lucene.apache.org
>> Cc: 'jdk9-dev' <jdk9-dev at openjdk.java.net>; 'Dawid Weiss'
>> <dawid.weiss at cs.put.poznan.pl>; balchandra.vaidya at oracle.com
>> Subject: Re: Class#getResource returns null in JDK9 b140 if security manager is enabled (was: RE: [JENKINS-EA] Lucene-Solr-master-Linux (32bit/jdk-9-ea+140) - Build # 18064 - Unstable!)
>>
>> Weijun Wang is the best person to respond as he is the RE of JDK-8164705
>> - right now it is the middle of the night for him, but I would expect a
>> response from him once he comes online.
>>
>> --Sean
>>
>> On 10/16/2016 06:09 PM, Uwe Schindler wrote:
>>> Hi again,
>>>
>>> with jdk.io.permissionsUseCanonicalPath=true it also works, so it is related to the new FilePermission code, so my first guess was true, the issue is JDK-8164705.
>>>
>>> Uwe
>>>
>>>> (I cc'ed jdk-dev at openjdk, reader there please read the previous mails
>>>> below, too).
>>>>
>>>> I analyzed the problem, although I don't know exactly why it happens:
>>>> - On Windows it does not happen on my machine (no idea why!)
>>>> - On Linux it happens when tests are running with security manager (this is the default for Lucene and Jenkins does this)
>>>> - On Linux it does not happen if I run Lucene tests with "-Dtests.useSecurityManager=false"
>>>>
>>>> This makes me think it is related to this: "Remove pathname canonicalization from FilePermission" (https://bugs.openjdk.java.net/browse/JDK-8164705)
>>>>
>>>> What seems to happen: The code calls Class.getResource to get back a URL. As the JAR file is somehow outside of the FilePermissions given to the test suite, it seems to fail. Maybe because some of the checks failed, Class.getResource then returns a null reference, because it was not able to access the JAR file.
>>>>
>>>> Were there some changes related to this: URLClassLoader and FilePermission checks?
>>>>
>>>> How should we proceed?
>>>>
>>>> Uwe
>>>>
>>>> -----
>>>> Uwe Schindler
>>>> uschindler at apache.org
>>>> ASF Member, Apache Lucene PMC / Committer
>>>> Bremen, Germany
>>>> http://lucene.apache.org/
>>>>
>>>>> -----Original Message-----
>>>>> From: Uwe Schindler [mailto:uwe at thetaphi.de]
>>>>> Sent: Sunday, October 16, 2016 10:10 PM
>>>>> To: dev at lucene.apache.org
>>>>> Cc: dalibor.topic at oracle.com; balchandra.vaidya at oracle.com; 'Muneer
>>>>> Kolarkunnu' <abdul.kolarkunnu at oracle.com>; 'Dawid Weiss'
>>>>> <dawid.weiss at cs.put.poznan.pl>
>>>>> Subject: RE: [JENKINS-EA] Lucene-Solr-master-Linux (32bit/jdk-9-ea+140) - Build # 18064 - Unstable!
>>>>>
>>>>> Hi,
>>>>>
>>>>> I reverted the Lucene builds to Java 9 build 138 for now. I will later check if this also happens with build 139, which I have to download first. I will also debug locally.
>>>>>
>>>>> The code fails because this code hits "null" on getResource() at morfologik.stemming.polish.PolishStemmer.<init>(PolishStemmer.java:34)
>>>>>
>>>>> https://github.com/morfologik/morfologik-stemming/blob/master/morfologik-polish/src/main/java/morfologik/stemming/polish/PolishStemmer.java#L32
>>>>>
>>>>> This should be impossible, because the dict file is in the same package. I have no idea why this only fails here and not at other places in Lucene. The main difference looks like the use of a URL instead of getResourceAsStream() like other places in Lucene.
>>>>>
>>>>> So this seems to be a major regression in Java 9 build 140.
>>>>>
>>>>> Uwe
>>>>>
>>>>> -----
>>>>> Uwe Schindler
>>>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>>>> http://www.thetaphi.de
>>>>> eMail: uwe at thetaphi.de
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Uwe Schindler [mailto:uwe at thetaphi.de]
>>>>>> Sent: Sunday, October 16, 2016 8:38 PM
>>>>>> To: dev at lucene.apache.org
>>>>>> Cc: dalibor.topic at oracle.com; balchandra.vaidya at oracle.com; 'Muneer Kolarkunnu' <abdul.kolarkunnu at oracle.com>; 'Dawid Weiss' <dawid.weiss at cs.put.poznan.pl>; dev at lucene.apache.org
>>>>>> Subject: RE: [JENKINS-EA] Lucene-Solr-master-Linux (32bit/jdk-9-ea+140) - Build # 18064 - Unstable!
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> this seems to be a new regression in Java 9 ea build 140. Interestingly this only affects 2 libraries (morfologik and commons-codec phonetic). We load resources from classloaders in many places; it is unclear to me why it only fails here. I will look into the code, but this is outside of Lucene. I think it might be some craziness like using the context class loader in improper ways or similar.
>>>>>>
>>>>>> Maybe it is a new bug in JDK 9 build 139 or build 140 (the last working one was build 138).
>>>>>>
>>>>>> Uwe
>>>>>>
>>>>>> -----
>>>>>> Uwe Schindler
>>>>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>>>>> http://www.thetaphi.de
>>>>>> eMail: uwe at thetaphi.de
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Policeman Jenkins Server [mailto:jenkins at thetaphi.de]