RFR: JDK-8210337: runtime/NMT/VirtualAllocTestType.java failed on RuntimeException missing from stdout/stderr

gary.adams at oracle.com gary.adams at oracle.com
Thu Nov 8 03:43:52 UTC 2018


There is recovery code that cleans up old attach files,
but a permission denied error would prevent the clean
up operation from taking place. We're looking at a case of
a file owned by another user as well as a pid recycling
and failed clean up situation.

On 11/7/18 3:09 PM, Chris Plummer wrote:
> Are you saying that the PID was recycled, so the issue is a test run 
> from long ago leaving behind the file? If so, I thought there was 
> something in the attach code that cleaned up these PID files that were 
> left behind. In general I would not count this as an infra issue. PID 
> files left behind are the fault of the JVM. Maybe there is something 
> wrong with the cleanup code on solaris, or maybe we don't clean up on 
> any platform, but cycle through PIDs a lot faster on solaris.
>
> Chris
>
> On 11/7/18 11:20 AM, Gary Adams wrote:
>> If there are no further suggestions on JDK-8210337,
>> I plan to close it out as cannot reproduce.
>>
>> Similar bugs had been filed for the "Permission denied" error
>> from the openDoor request failure and each was attributed
>> to an infrastructure issue. e.g. another user with the same
>> pid left a temporary file that is blocking the current test
>> from attaching correctly.
>>
>> On 10/4/18, 1:49 PM, Gary Adams wrote:
>>> My delay and retry did not fix the problem with permission denied.
>>>
>>> When I was diagnosing the problem I instrumented the code
>>> to catch an IOException and call checkPermission to get
>>> more detail about the IOException. The error reported
>>> from calling checkPermission was ENOENT (stat).
>>>
>>> The code change I then proposed was catch the IOException,
>>> delay, and retry the open. That fixed the problem of
>>> ENOENT, but had nothing to do with "permission denied".
>>>
>>> On 10/4/18, 1:25 PM, Chris Plummer wrote:
>>>> But I also thought you said the delay and retry fixed the problem. 
>>>> How could fix the problem if it is just duplicating something that 
>>>> is already in place?
>>>>
>>>> Chris
>>>>
>>>> On 10/4/18 9:48 AM, Gary Adams wrote:
>>>>> My delay and retry just duplicated the openDoor retry.
>>>>> The normal processing of FileNotFoundException(ENOENT) is to retry
>>>>> several times until the file is available.
>>>>>
>>>>> But the original problem reported is a "Permission denied" 
>>>>> (EACCESS or EPERM).
>>>>> Delay and retry will not resolve a permissions error.
>>>>>
>>>>> On 10/4/18, 12:30 PM, Chris Plummer wrote:
>>>>>> Didn't the retry after 100ms delay work? If yes, why would it if 
>>>>>> the problem is that a java_pid was not cleaned up?
>>>>>>
>>>>>> Chris
>>>>>>
>>>>>> On 10/4/18 8:54 AM, Gary Adams wrote:
>>>>>>> First, let me retract the proposed change,
>>>>>>> it is not the right solution to the problem originally
>>>>>>> reported.
>>>>>>>
>>>>>>> Second, as a bit of explanation consider the code fragments below.
>>>>>>>
>>>>>>> The high level processing calls openDoor which is willing to retry
>>>>>>> the operation as long as the error is flagged specifically
>>>>>>> as a FileNotFoundException.
>>>>>>>
>>>>>>>   VirtualMachineImpl.java:72
>>>>>>>   VirtualMachineImpl.c:81
>>>>>>>
>>>>>>> During my testing I had added a check VirtualMachineImpl.java:214
>>>>>>> and when an IOException was detected made a call to checkPermissions
>>>>>>> to get more detailed information about the IOException. The error
>>>>>>> I saw was an ENOENT from the stat call. And not the detailed 
>>>>>>> checks for
>>>>>>> specific permissions issues (VirtualMachineImpl.c:143)
>>>>>>>
>>>>>>>     VirtualMachineImpl.c:118
>>>>>>>     VirtualMachineImpl.c:147
>>>>>>>
>>>>>>> What I missed in the original proposed solution was a 
>>>>>>> FileNotFoundException
>>>>>>> extends IOException. That means my delay and retry just 
>>>>>>> duplicates the higher
>>>>>>> level retry around the openDoor call.
>>>>>>>
>>>>>>> Third, the original error message logged in the bug report :
>>>>>>>
>>>>>>> java.io.IOException: Permission denied
>>>>>>> at jdk.attach/sun.tools.attach.VirtualMachineImpl.open(Native 
>>>>>>> Method)
>>>>>>>
>>>>>>> had to have come from
>>>>>>>
>>>>>>>   VirtualMachineImpl.c:70
>>>>>>>   VirtualMachineImpl.c:84
>>>>>>>
>>>>>>> which means the actual open call reported the file does exist
>>>>>>> but the permissions do not allow the file to be accessed.
>>>>>>> That also means the normal mechanism of removing leftover
>>>>>>> java_pid files would not have cleaned up another user's
>>>>>>> java_pid files.
>>>>>>>
>>>>>>> =====
>>>>>>> src/jdk.attach/solaris/classes/sun/tools/attach/VirtualMachineImpl.java:
>>>>>>> ...
>>>>>>>     67            // Opens the door file to the target VM. If 
>>>>>>> the file is not
>>>>>>>     68            // found it might mean that the attach 
>>>>>>> mechanism isn't started in the
>>>>>>>     69            // target VM so we attempt to start it and retry.
>>>>>>>     70            try {
>>>>>>>     71                fd = openDoor(pid);
>>>>>>>     72            } catch (FileNotFoundException fnf1) {
>>>>>>>     73                File f = createAttachFile(pid);
>>>>>>>     74                try {
>>>>>>>     75                    sigquit(pid);
>>>>>>>     76
>>>>>>>     77                    // give the target VM time to start 
>>>>>>> the attach mechanism
>>>>>>>     78                    final int delay_step = 100;
>>>>>>>     79                    final long timeout = attachTimeout();
>>>>>>>     80                    long time_spend = 0;
>>>>>>>     81                    long delay = 0;
>>>>>>>     82                    do {
>>>>>>>     83                        // Increase timeout on each 
>>>>>>> attempt to reduce polling
>>>>>>>     84                        delay += delay_step;
>>>>>>>     85                        try {
>>>>>>>     86                            Thread.sleep(delay);
>>>>>>>     87                        } catch (InterruptedException x) { }
>>>>>>>     88                        try {
>>>>>>>     89                            fd = openDoor(pid);
>>>>>>>     90                        } catch (FileNotFoundException fnf2) {
>>>>>>>     91                            // pass
>>>>>>>     92                        }
>>>>>>>     93
>>>>>>>     94                        time_spend += delay;
>>>>>>>     95                        if (time_spend > timeout/2 && fd 
>>>>>>> == -1) {
>>>>>>>     96                            // Send QUIT again to give 
>>>>>>> target VM the last chance to react
>>>>>>>     97                            sigquit(pid);
>>>>>>>     98                        }
>>>>>>>     99                    } while (time_spend <= timeout && fd 
>>>>>>> == -1);
>>>>>>>    100                    if (fd  == -1) {
>>>>>>>    101                        throw new AttachNotSupportedException(
>>>>>>>    102 String.format("Unable to open door %s: " +
>>>>>>>    103                              "target process %d doesn't 
>>>>>>> respond within %dms " +
>>>>>>>    104                              "or HotSpot VM not loaded", 
>>>>>>> socket_path, pid, time_spend));
>>>>>>>    105                    }
>>>>>>> ...
>>>>>>>    212        // The door is attached to .java_pid<pid> in the 
>>>>>>> temporary directory.
>>>>>>>    213        private int openDoor(int pid) throws IOException {
>>>>>>>    214            socket_path = tmpdir + "/.java_pid" + pid;
>>>>>>>    215            fd = open(socket_path);
>>>>>>>    216
>>>>>>>    217            // Check that the file owner/permission to 
>>>>>>> avoid attaching to
>>>>>>>    218            // bogus process
>>>>>>>    219            try {
>>>>>>>    220                checkPermissions(socket_path);
>>>>>>>    221            } catch (IOException ioe) {
>>>>>>>    222                close(fd);
>>>>>>>    223                throw ioe;
>>>>>>>    224            }
>>>>>>>    225            return fd;
>>>>>>>    226        }
>>>>>>>
>>>>>>> =====
>>>>>>> src/jdk.attach/solaris/native/libattach/VirtualMachineImpl.c:
>>>>>>> ...
>>>>>>>     59    JNIEXPORT jint JNICALL 
>>>>>>> Java_sun_tools_attach_VirtualMachineImpl_open
>>>>>>>     60      (JNIEnv *env, jclass cls, jstring path)
>>>>>>>     61    {
>>>>>>>     62        jboolean isCopy;
>>>>>>>     63        const char* p = GetStringPlatformChars(env, path, 
>>>>>>> &isCopy);
>>>>>>>     64        if (p == NULL) {
>>>>>>>     65            return 0;
>>>>>>>     66        } else {
>>>>>>>     67            int fd;
>>>>>>>     68            int err = 0;
>>>>>>>     69
>>>>>>>     70            fd = open(p, O_RDWR);
>>>>>>>     71            if (fd == -1) {
>>>>>>>     72                err = errno;
>>>>>>>     73            }
>>>>>>>     74
>>>>>>>     75            if (isCopy) {
>>>>>>>     76 JNU_ReleaseStringPlatformChars(env, path, p);
>>>>>>>     77            }
>>>>>>>     78
>>>>>>>     79            if (fd == -1) {
>>>>>>>     80                if (err == ENOENT) {
>>>>>>>     81                    JNU_ThrowByName(env, 
>>>>>>> "java/io/FileNotFoundException", NULL);
>>>>>>>     82                } else {
>>>>>>>     83                    char* msg = strdup(strerror(err));
>>>>>>>     84                    JNU_ThrowIOException(env, msg);
>>>>>>>     85                    if (msg != NULL) {
>>>>>>>     86                        free(msg);
>>>>>>>     87                    }
>>>>>>>     88                }
>>>>>>>     89            }
>>>>>>>     90            return fd;
>>>>>>>     91        }
>>>>>>>     92    }
>>>>>>> ...
>>>>>>>     99    JNIEXPORT void JNICALL 
>>>>>>> Java_sun_tools_attach_VirtualMachineImpl_checkPermissions
>>>>>>>    100      (JNIEnv *env, jclass cls, jstring path)
>>>>>>>    101    {
>>>>>>>    102        jboolean isCopy;
>>>>>>>    103        const char* p = GetStringPlatformChars(env, path, 
>>>>>>> &isCopy);
>>>>>>>    104        if (p != NULL) {
>>>>>>>    105            struct stat64 sb;
>>>>>>>    106            uid_t uid, gid;
>>>>>>>    107            int res;
>>>>>>>    108
>>>>>>>    109            memset(&sb, 0, sizeof(struct stat64));
>>>>>>>    110
>>>>>>>    111            /*
>>>>>>>    112             * Check that the path is owned by the 
>>>>>>> effective uid/gid of this
>>>>>>>    113             * process. Also check that group/other access 
>>>>>>> is not allowed.
>>>>>>>    114             */
>>>>>>>    115            uid = geteuid();
>>>>>>>    116            gid = getegid();
>>>>>>>    117
>>>>>>>    118            res = stat64(p, &sb);
>>>>>>>    119            if (res != 0) {
>>>>>>>    120                /* save errno */
>>>>>>>    121                res = errno;
>>>>>>>    122            }
>>>>>>>    123
>>>>>>>    124            if (res == 0) {
>>>>>>>    125                char msg[100];
>>>>>>>    126                jboolean isError = JNI_FALSE;
>>>>>>>    127                if (sb.st_uid != uid && uid != ROOT_UID) {
>>>>>>>    128                    snprintf(msg, sizeof(msg),
>>>>>>>    129                        "file should be owned by the 
>>>>>>> current user (which is %d) but is owned by %d", uid, sb.st_uid);
>>>>>>>    130                    isError = JNI_TRUE;
>>>>>>>    131                } else if (sb.st_gid != gid && uid != 
>>>>>>> ROOT_UID) {
>>>>>>>    132                    snprintf(msg, sizeof(msg),
>>>>>>>    133                        "file's group should be the 
>>>>>>> current group (which is %d) but the group is %d", gid, sb.st_gid);
>>>>>>>    134                    isError = JNI_TRUE;
>>>>>>>    135                } else if ((sb.st_mode & 
>>>>>>> (S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH)) != 0) {
>>>>>>>    136                    snprintf(msg, sizeof(msg),
>>>>>>>    137                        "file should only be readable and 
>>>>>>> writable by the owner but has 0%03o access", sb.st_mode & 0777);
>>>>>>>    138                    isError = JNI_TRUE;
>>>>>>>    139                }
>>>>>>>    140                if (isError) {
>>>>>>>    141                    char buf[256];
>>>>>>>    142                    snprintf(buf, sizeof(buf), "well-known 
>>>>>>> file %s is not secure: %s", p, msg);
>>>>>>>    143                    JNU_ThrowIOException(env, buf);
>>>>>>>    144                }
>>>>>>>    145            } else {
>>>>>>>    146                char* msg = strdup(strerror(res));
>>>>>>>    147                JNU_ThrowIOException(env, msg);
>>>>>>>    148                if (msg != NULL) {
>>>>>>>    149                    free(msg);
>>>>>>>    150                }
>>>>>>>    151            }
>>>>>>>
>>>>>>> On 10/2/18, 6:23 PM, David Holmes wrote:
>>>>>>>> Minor correction: EPERM -> EACCES for Solaris
>>>>>>>>
>>>>>>>> Hard to see how to get a transient EACCES when opening a file 
>>>>>>>> ... though as it is really a door I guess there could be 
>>>>>>>> additional complexity.
>>>>>>>>
>>>>>>>> David
>>>>>>>>
>>>>>>>> On 3/10/2018 7:54 AM, Chris Plummer wrote:
>>>>>>>>> On 10/2/18 2:38 PM, David Holmes wrote:
>>>>>>>>>> Chris,
>>>>>>>>>>
>>>>>>>>>> On 3/10/2018 6:57 AM, Chris Plummer wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 10/2/18 1:44 PM, gary.adams at oracle.com wrote:
>>>>>>>>>>>> The general attach sequence ...
>>>>>>>>>>>>
>>>>>>>>>>>> src/jdk.attach/solaris/classes/sun/tools/attach/VirtualMachineImpl.java 
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>  the attacher creates an attach_pid file in a directory 
>>>>>>>>>>>> where the attachee is runnning
>>>>>>>>>>>>  issues a signal to the attacheee
>>>>>>>>>>>>
>>>>>>>>>>>>   loops waiting for the java_pid file to be created
>>>>>>>>>>>>   default timeout is 10 seconds
>>>>>>>>>>>>
>>>>>>>>>>> So getting a FileNotFoundException while in this loop is OK, 
>>>>>>>>>>> but IOException is not.
>>>>>>>>>>>
>>>>>>>>>>>> src/hotspot/os/solaris/attachListener_solaris.cpp
>>>>>>>>>>>>
>>>>>>>>>>>>    attachee creates the java_pid file
>>>>>>>>>>>>    listens til the attacher opens the door
>>>>>>>>>>>>
>>>>>>>>>>> I'm don't think this is related, but JDK-8199811 made a fix 
>>>>>>>>>>> in attachListener_solaris.cpp to make it wait up to 10 
>>>>>>>>>>> seconds for initialization to complete before failing the 
>>>>>>>>>>> enqueue.
>>>>>>>>>>>
>>>>>>>>>>>> ...
>>>>>>>>>>>> Not sure when a bare IOException is thrown rather than the
>>>>>>>>>>>> more specific FileNotFoundException.
>>>>>>>>>>> Where is the IOException originating from? I wonder if the 
>>>>>>>>>>> issue is that the file is in the process of being created, 
>>>>>>>>>>> but is not fully created yet. Maybe it is there, but 
>>>>>>>>>>> owner/group/permissions have not been set yet, and this 
>>>>>>>>>>> results in an IOException instead of FileNotFoundException.
>>>>>>>>>>
>>>>>>>>>> The exception is shown in the bug report:
>>>>>>>>>>
>>>>>>>>>>  [java.io.IOException: Permission denied
>>>>>>>>>> at jdk.attach/sun.tools.attach.VirtualMachineImpl.open(Native 
>>>>>>>>>> Method)
>>>>>>>>>> at 
>>>>>>>>>> jdk.attach/sun.tools.attach.VirtualMachineImpl.openDoor(VirtualMachineImpl.java:215) 
>>>>>>>>>>
>>>>>>>>>> at 
>>>>>>>>>> jdk.attach/sun.tools.attach.VirtualMachineImpl.<init>(VirtualMachineImpl.java:71) 
>>>>>>>>>>
>>>>>>>>>> at 
>>>>>>>>>> jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58) 
>>>>>>>>>>
>>>>>>>>>> at 
>>>>>>>>>> jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207) 
>>>>>>>>>>
>>>>>>>>>> at 
>>>>>>>>>> jdk.jcmd/sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:114)
>>>>>>>>>> at jdk.jcmd/sun.tools.jcmd.JCmd.main(JCmd.java:98)
>>>>>>>>>>
>>>>>>>>>> And if you look at the native code the EPERM from open will 
>>>>>>>>>> cause IOException to be thrown.
>>>>>>>>>>
>>>>>>>>>> ./jdk.attach/solaris/native/libattach/VirtualMachineImpl.c
>>>>>>>>>>
>>>>>>>>>> JNIEXPORT jint JNICALL 
>>>>>>>>>> Java_sun_tools_attach_VirtualMachineImpl_open
>>>>>>>>>>   (JNIEnv *env, jclass cls, jstring path)
>>>>>>>>>> {
>>>>>>>>>>     jboolean isCopy;
>>>>>>>>>>     const char* p = GetStringPlatformChars(env, path, &isCopy);
>>>>>>>>>>     if (p == NULL) {
>>>>>>>>>>         return 0;
>>>>>>>>>>     } else {
>>>>>>>>>>         int fd;
>>>>>>>>>>         int err = 0;
>>>>>>>>>>
>>>>>>>>>>         fd = open(p, O_RDWR);
>>>>>>>>>>         if (fd == -1) {
>>>>>>>>>>             err = errno;
>>>>>>>>>>         }
>>>>>>>>>>
>>>>>>>>>>         if (isCopy) {
>>>>>>>>>>             JNU_ReleaseStringPlatformChars(env, path, p);
>>>>>>>>>>         }
>>>>>>>>>>
>>>>>>>>>>         if (fd == -1) {
>>>>>>>>>>             if (err == ENOENT) {
>>>>>>>>>>                 JNU_ThrowByName(env, 
>>>>>>>>>> "java/io/FileNotFoundException", NULL);
>>>>>>>>>>             } else {
>>>>>>>>>>                 char* msg = strdup(strerror(err));
>>>>>>>>>>                 JNU_ThrowIOException(env, msg);
>>>>>>>>>>                 if (msg != NULL) {
>>>>>>>>>>                     free(msg);
>>>>>>>>>>                 }
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> We should add the path to the exception message.
>>>>>>>>>>
>>>>>>>>> Thanks David. So if EPERM is the error and a retry 100ms later 
>>>>>>>>> works, I think that supports my hypothesis that the file is 
>>>>>>>>> not quite fully created. So Gary's fix is probably fine. The 
>>>>>>>>> only other possible fix I can think of that wouldn't require 
>>>>>>>>> an explicit delay (or multiple retries) is probably not worth 
>>>>>>>>> the complexity. It would require that the attachee create two 
>>>>>>>>> files, and the attacher try to open the second file first. 
>>>>>>>>> When it either opens or returns EPERM, you know the first file 
>>>>>>>>> can safety be opened.
>>>>>>>>>
>>>>>>>>> Chris
>>>>>>>>>> David
>>>>>>>>>> -----
>>>>>>>>>>
>>>>>>>>>>> Chris
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 10/2/18 4:11 PM, Chris Plummer wrote:
>>>>>>>>>>>>> Can you summarize how the attach handshaking is suppose to 
>>>>>>>>>>>>> work? I'm just wondering why the attacher would ever be 
>>>>>>>>>>>>> looking for the file before the attachee has created it. 
>>>>>>>>>>>>> It seems a proper handshake would prevent this. Maybe 
>>>>>>>>>>>>> there's some sort of visibility issue where the attachee 
>>>>>>>>>>>>> has indeed created the file, but it is not immediately 
>>>>>>>>>>>>> visible to the attacher process.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Chris
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 10/2/18 12:27 PM, gary.adams at oracle.com wrote:
>>>>>>>>>>>>>> The problem reproduced pretty quickly.
>>>>>>>>>>>>>> I added a call to checkPermission and revealed the
>>>>>>>>>>>>>> "file not found" from the stat call when the IOException
>>>>>>>>>>>>>> was detected.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> There has been some flakiness from the Solaris test 
>>>>>>>>>>>>>> machines today,
>>>>>>>>>>>>>> so I'll continue with the testing a bit longer.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 10/2/18 3:12 PM, Chris Plummer wrote:
>>>>>>>>>>>>>>> Without the fix was this issue easy enough to reproduce 
>>>>>>>>>>>>>>> that you can be sure this is resolving it?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Chris
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 10/2/18 8:16 AM, Gary Adams wrote:
>>>>>>>>>>>>>>>> Solaris debug builds are failing tests that use the 
>>>>>>>>>>>>>>>> attach interface.
>>>>>>>>>>>>>>>> An IOException is reported when the java_pid file is 
>>>>>>>>>>>>>>>> not opened.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It appears that the attempt to attach is taking place 
>>>>>>>>>>>>>>>> too quickly.
>>>>>>>>>>>>>>>> This workaround will allow the open operation to be 
>>>>>>>>>>>>>>>> retried
>>>>>>>>>>>>>>>> after a short pause.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>   Webrev: 
>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~gadams/8210337/webrev/
>>>>>>>>>>>>>>>>   Issue: https://bugs.openjdk.java.net/browse/JDK-8210337
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Testing is in progress.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/serviceability-dev/attachments/20181107/71d7aa75/attachment-0001.html>


More information about the serviceability-dev mailing list