RFR: JDK-8210337: runtime/NMT/VirtualAllocTestType.java failed on RuntimeException missing from stdout/stderr

Gary Adams gary.adams at oracle.com
Thu Oct 4 17:49:35 UTC 2018


My delay and retry did not fix the problem with permission denied.

When I was diagnosing the problem I instrumented the code
to catch an IOException and call checkPermission to get
more detail about the IOException. The error reported
from calling checkPermission was ENOENT (stat).

The code change I then proposed was catch the IOException,
delay, and retry the open. That fixed the problem of
ENOENT, but had nothing to do with "permission denied".

On 10/4/18, 1:25 PM, Chris Plummer wrote:
> But I also thought you said the delay and retry fixed the problem. How 
> could fix the problem if it is just duplicating something that is 
> already in place?
>
> Chris
>
> On 10/4/18 9:48 AM, Gary Adams wrote:
>> My delay and retry just duplicated the openDoor retry.
>> The normal processing of FileNotFoundException(ENOENT) is to retry
>> several times until the file is available.
>>
>> But the original problem reported is a "Permission denied" (EACCESS 
>> or EPERM).
>> Delay and retry will not resolve a permissions error.
>>
>> On 10/4/18, 12:30 PM, Chris Plummer wrote:
>>> Didn't the retry after 100ms delay work? If yes, why would it if the 
>>> problem is that a java_pid was not cleaned up?
>>>
>>> Chris
>>>
>>> On 10/4/18 8:54 AM, Gary Adams wrote:
>>>> First, let me retract the proposed change,
>>>> it is not the right solution to the problem originally
>>>> reported.
>>>>
>>>> Second, as a bit of explanation consider the code fragments below.
>>>>
>>>> The high level processing calls openDoor which is willing to retry
>>>> the operation as long as the error is flagged specifically
>>>> as a FileNotFoundException.
>>>>
>>>>   VirtualMachineImpl.java:72
>>>>   VirtualMachineImpl.c:81
>>>>
>>>> During my testing I had added a check VirtualMachineImpl.java:214
>>>> and when an IOException was detected made a call to checkPermissions
>>>> to get more detailed information about the IOException. The error
>>>> I saw was an ENOENT from the stat call. And not the detailed checks for
>>>> specific permissions issues (VirtualMachineImpl.c:143)
>>>>
>>>>     VirtualMachineImpl.c:118
>>>>     VirtualMachineImpl.c:147
>>>>
>>>> What I missed in the original proposed solution was a 
>>>> FileNotFoundException
>>>> extends IOException. That means my delay and retry just duplicates 
>>>> the higher
>>>> level retry around the openDoor call.
>>>>
>>>> Third, the original error message logged in the bug report :
>>>>
>>>> java.io.IOException: Permission denied
>>>> at jdk.attach/sun.tools.attach.VirtualMachineImpl.open(Native Method)
>>>>
>>>> had to have come from
>>>>
>>>>   VirtualMachineImpl.c:70
>>>>   VirtualMachineImpl.c:84
>>>>
>>>> which means the actual open call reported the file does exist
>>>> but the permissions do not allow the file to be accessed.
>>>> That also means the normal mechanism of removing leftover
>>>> java_pid files would not have cleaned up another user's
>>>> java_pid files.
>>>>
>>>> =====
>>>> src/jdk.attach/solaris/classes/sun/tools/attach/VirtualMachineImpl.java:
>>>> ...
>>>>     67            // Opens the door file to the target VM. If the 
>>>> file is not
>>>>     68            // found it might mean that the attach mechanism 
>>>> isn't started in the
>>>>     69            // target VM so we attempt to start it and retry.
>>>>     70            try {
>>>>     71                fd = openDoor(pid);
>>>>     72            } catch (FileNotFoundException fnf1) {
>>>>     73                File f = createAttachFile(pid);
>>>>     74                try {
>>>>     75                    sigquit(pid);
>>>>     76
>>>>     77                    // give the target VM time to start the 
>>>> attach mechanism
>>>>     78                    final int delay_step = 100;
>>>>     79                    final long timeout = attachTimeout();
>>>>     80                    long time_spend = 0;
>>>>     81                    long delay = 0;
>>>>     82                    do {
>>>>     83                        // Increase timeout on each attempt 
>>>> to reduce polling
>>>>     84                        delay += delay_step;
>>>>     85                        try {
>>>>     86                            Thread.sleep(delay);
>>>>     87                        } catch (InterruptedException x) { }
>>>>     88                        try {
>>>>     89                            fd = openDoor(pid);
>>>>     90                        } catch (FileNotFoundException fnf2) {
>>>>     91                            // pass
>>>>     92                        }
>>>>     93
>>>>     94                        time_spend += delay;
>>>>     95                        if (time_spend > timeout/2 && fd == -1) {
>>>>     96                            // Send QUIT again to give target 
>>>> VM the last chance to react
>>>>     97                            sigquit(pid);
>>>>     98                        }
>>>>     99                    } while (time_spend <= timeout && fd == -1);
>>>>    100                    if (fd  == -1) {
>>>>    101                        throw new AttachNotSupportedException(
>>>>    102                            String.format("Unable to open 
>>>> door %s: " +
>>>>    103                              "target process %d doesn't 
>>>> respond within %dms " +
>>>>    104                              "or HotSpot VM not loaded", 
>>>> socket_path, pid, time_spend));
>>>>    105                    }
>>>> ...
>>>>    212        // The door is attached to .java_pid<pid> in the 
>>>> temporary directory.
>>>>    213        private int openDoor(int pid) throws IOException {
>>>>    214            socket_path = tmpdir + "/.java_pid" + pid;
>>>>    215            fd = open(socket_path);
>>>>    216
>>>>    217            // Check that the file owner/permission to avoid 
>>>> attaching to
>>>>    218            // bogus process
>>>>    219            try {
>>>>    220                checkPermissions(socket_path);
>>>>    221            } catch (IOException ioe) {
>>>>    222                close(fd);
>>>>    223                throw ioe;
>>>>    224            }
>>>>    225            return fd;
>>>>    226        }
>>>>
>>>> =====
>>>> src/jdk.attach/solaris/native/libattach/VirtualMachineImpl.c:
>>>> ...
>>>>     59    JNIEXPORT jint JNICALL 
>>>> Java_sun_tools_attach_VirtualMachineImpl_open
>>>>     60      (JNIEnv *env, jclass cls, jstring path)
>>>>     61    {
>>>>     62        jboolean isCopy;
>>>>     63        const char* p = GetStringPlatformChars(env, path, 
>>>> &isCopy);
>>>>     64        if (p == NULL) {
>>>>     65            return 0;
>>>>     66        } else {
>>>>     67            int fd;
>>>>     68            int err = 0;
>>>>     69
>>>>     70            fd = open(p, O_RDWR);
>>>>     71            if (fd == -1) {
>>>>     72                err = errno;
>>>>     73            }
>>>>     74
>>>>     75            if (isCopy) {
>>>>     76                JNU_ReleaseStringPlatformChars(env, path, p);
>>>>     77            }
>>>>     78
>>>>     79            if (fd == -1) {
>>>>     80                if (err == ENOENT) {
>>>>     81                    JNU_ThrowByName(env, 
>>>> "java/io/FileNotFoundException", NULL);
>>>>     82                } else {
>>>>     83                    char* msg = strdup(strerror(err));
>>>>     84                    JNU_ThrowIOException(env, msg);
>>>>     85                    if (msg != NULL) {
>>>>     86                        free(msg);
>>>>     87                    }
>>>>     88                }
>>>>     89            }
>>>>     90            return fd;
>>>>     91        }
>>>>     92    }
>>>> ...
>>>>     99    JNIEXPORT void JNICALL 
>>>> Java_sun_tools_attach_VirtualMachineImpl_checkPermissions
>>>>    100      (JNIEnv *env, jclass cls, jstring path)
>>>>    101    {
>>>>    102        jboolean isCopy;
>>>>    103        const char* p = GetStringPlatformChars(env, path, 
>>>> &isCopy);
>>>>    104        if (p != NULL) {
>>>>    105            struct stat64 sb;
>>>>    106            uid_t uid, gid;
>>>>    107            int res;
>>>>    108
>>>>    109            memset(&sb, 0, sizeof(struct stat64));
>>>>    110
>>>>    111            /*
>>>>    112             * Check that the path is owned by the effective 
>>>> uid/gid of this
>>>>    113             * process. Also check that group/other access is 
>>>> not allowed.
>>>>    114             */
>>>>    115            uid = geteuid();
>>>>    116            gid = getegid();
>>>>    117
>>>>    118            res = stat64(p, &sb);
>>>>    119            if (res != 0) {
>>>>    120                /* save errno */
>>>>    121                res = errno;
>>>>    122            }
>>>>    123
>>>>    124            if (res == 0) {
>>>>    125                char msg[100];
>>>>    126                jboolean isError = JNI_FALSE;
>>>>    127                if (sb.st_uid != uid && uid != ROOT_UID) {
>>>>    128                    snprintf(msg, sizeof(msg),
>>>>    129                        "file should be owned by the current 
>>>> user (which is %d) but is owned by %d", uid, sb.st_uid);
>>>>    130                    isError = JNI_TRUE;
>>>>    131                } else if (sb.st_gid != gid && uid != ROOT_UID) {
>>>>    132                    snprintf(msg, sizeof(msg),
>>>>    133                        "file's group should be the current 
>>>> group (which is %d) but the group is %d", gid, sb.st_gid);
>>>>    134                    isError = JNI_TRUE;
>>>>    135                } else if ((sb.st_mode & 
>>>> (S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH)) != 0) {
>>>>    136                    snprintf(msg, sizeof(msg),
>>>>    137                        "file should only be readable and 
>>>> writable by the owner but has 0%03o access", sb.st_mode & 0777);
>>>>    138                    isError = JNI_TRUE;
>>>>    139                }
>>>>    140                if (isError) {
>>>>    141                    char buf[256];
>>>>    142                    snprintf(buf, sizeof(buf), "well-known 
>>>> file %s is not secure: %s", p, msg);
>>>>    143                    JNU_ThrowIOException(env, buf);
>>>>    144                }
>>>>    145            } else {
>>>>    146                char* msg = strdup(strerror(res));
>>>>    147                JNU_ThrowIOException(env, msg);
>>>>    148                if (msg != NULL) {
>>>>    149                    free(msg);
>>>>    150                }
>>>>    151            }
>>>>
>>>> On 10/2/18, 6:23 PM, David Holmes wrote:
>>>>> Minor correction: EPERM -> EACCES for Solaris
>>>>>
>>>>> Hard to see how to get a transient EACCES when opening a file ... 
>>>>> though as it is really a door I guess there could be additional 
>>>>> complexity.
>>>>>
>>>>> David
>>>>>
>>>>> On 3/10/2018 7:54 AM, Chris Plummer wrote:
>>>>>> On 10/2/18 2:38 PM, David Holmes wrote:
>>>>>>> Chris,
>>>>>>>
>>>>>>> On 3/10/2018 6:57 AM, Chris Plummer wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 10/2/18 1:44 PM, gary.adams at oracle.com wrote:
>>>>>>>>> The general attach sequence ...
>>>>>>>>>
>>>>>>>>> src/jdk.attach/solaris/classes/sun/tools/attach/VirtualMachineImpl.java 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>  the attacher creates an attach_pid file in a directory where 
>>>>>>>>> the attachee is runnning
>>>>>>>>>  issues a signal to the attacheee
>>>>>>>>>
>>>>>>>>>   loops waiting for the java_pid file to be created
>>>>>>>>>   default timeout is 10 seconds
>>>>>>>>>
>>>>>>>> So getting a FileNotFoundException while in this loop is OK, 
>>>>>>>> but IOException is not.
>>>>>>>>
>>>>>>>>> src/hotspot/os/solaris/attachListener_solaris.cpp
>>>>>>>>>
>>>>>>>>>    attachee creates the java_pid file
>>>>>>>>>    listens til the attacher opens the door
>>>>>>>>>
>>>>>>>> I'm don't think this is related, but JDK-8199811 made a fix in 
>>>>>>>> attachListener_solaris.cpp to make it wait up to 10 seconds for 
>>>>>>>> initialization to complete before failing the enqueue.
>>>>>>>>
>>>>>>>>> ...
>>>>>>>>> Not sure when a bare IOException is thrown rather than the
>>>>>>>>> more specific FileNotFoundException.
>>>>>>>> Where is the IOException originating from? I wonder if the 
>>>>>>>> issue is that the file is in the process of being created, but 
>>>>>>>> is not fully created yet. Maybe it is there, but 
>>>>>>>> owner/group/permissions have not been set yet, and this results 
>>>>>>>> in an IOException instead of FileNotFoundException.
>>>>>>>
>>>>>>> The exception is shown in the bug report:
>>>>>>>
>>>>>>>  [java.io.IOException: Permission denied
>>>>>>> at jdk.attach/sun.tools.attach.VirtualMachineImpl.open(Native 
>>>>>>> Method)
>>>>>>> at 
>>>>>>> jdk.attach/sun.tools.attach.VirtualMachineImpl.openDoor(VirtualMachineImpl.java:215) 
>>>>>>>
>>>>>>> at 
>>>>>>> jdk.attach/sun.tools.attach.VirtualMachineImpl.<init>(VirtualMachineImpl.java:71) 
>>>>>>>
>>>>>>> at 
>>>>>>> jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58) 
>>>>>>>
>>>>>>> at 
>>>>>>> jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207) 
>>>>>>>
>>>>>>> at jdk.jcmd/sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:114)
>>>>>>> at jdk.jcmd/sun.tools.jcmd.JCmd.main(JCmd.java:98)
>>>>>>>
>>>>>>> And if you look at the native code the EPERM from open will 
>>>>>>> cause IOException to be thrown.
>>>>>>>
>>>>>>> ./jdk.attach/solaris/native/libattach/VirtualMachineImpl.c
>>>>>>>
>>>>>>> JNIEXPORT jint JNICALL 
>>>>>>> Java_sun_tools_attach_VirtualMachineImpl_open
>>>>>>>   (JNIEnv *env, jclass cls, jstring path)
>>>>>>> {
>>>>>>>     jboolean isCopy;
>>>>>>>     const char* p = GetStringPlatformChars(env, path, &isCopy);
>>>>>>>     if (p == NULL) {
>>>>>>>         return 0;
>>>>>>>     } else {
>>>>>>>         int fd;
>>>>>>>         int err = 0;
>>>>>>>
>>>>>>>         fd = open(p, O_RDWR);
>>>>>>>         if (fd == -1) {
>>>>>>>             err = errno;
>>>>>>>         }
>>>>>>>
>>>>>>>         if (isCopy) {
>>>>>>>             JNU_ReleaseStringPlatformChars(env, path, p);
>>>>>>>         }
>>>>>>>
>>>>>>>         if (fd == -1) {
>>>>>>>             if (err == ENOENT) {
>>>>>>>                 JNU_ThrowByName(env, 
>>>>>>> "java/io/FileNotFoundException", NULL);
>>>>>>>             } else {
>>>>>>>                 char* msg = strdup(strerror(err));
>>>>>>>                 JNU_ThrowIOException(env, msg);
>>>>>>>                 if (msg != NULL) {
>>>>>>>                     free(msg);
>>>>>>>                 }
>>>>>>>
>>>>>>>
>>>>>>> We should add the path to the exception message.
>>>>>>>
>>>>>> Thanks David. So if EPERM is the error and a retry 100ms later 
>>>>>> works, I think that supports my hypothesis that the file is not 
>>>>>> quite fully created. So Gary's fix is probably fine. The only 
>>>>>> other possible fix I can think of that wouldn't require an 
>>>>>> explicit delay (or multiple retries) is probably not worth the 
>>>>>> complexity. It would require that the attachee create two files, 
>>>>>> and the attacher try to open the second file first. When it 
>>>>>> either opens or returns EPERM, you know the first file can safety 
>>>>>> be opened.
>>>>>>
>>>>>> Chris
>>>>>>> David
>>>>>>> -----
>>>>>>>
>>>>>>>> Chris
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 10/2/18 4:11 PM, Chris Plummer wrote:
>>>>>>>>>> Can you summarize how the attach handshaking is suppose to 
>>>>>>>>>> work? I'm just wondering why the attacher would ever be 
>>>>>>>>>> looking for the file before the attachee has created it. It 
>>>>>>>>>> seems a proper handshake would prevent this. Maybe there's 
>>>>>>>>>> some sort of visibility issue where the attachee has indeed 
>>>>>>>>>> created the file, but it is not immediately visible to the 
>>>>>>>>>> attacher process.
>>>>>>>>>>
>>>>>>>>>> Chris
>>>>>>>>>>
>>>>>>>>>> On 10/2/18 12:27 PM, gary.adams at oracle.com wrote:
>>>>>>>>>>> The problem reproduced pretty quickly.
>>>>>>>>>>> I added a call to checkPermission and revealed the
>>>>>>>>>>> "file not found" from the stat call when the IOException
>>>>>>>>>>> was detected.
>>>>>>>>>>>
>>>>>>>>>>> There has been some flakiness from the Solaris test machines 
>>>>>>>>>>> today,
>>>>>>>>>>> so I'll continue with the testing a bit longer.
>>>>>>>>>>>
>>>>>>>>>>> On 10/2/18 3:12 PM, Chris Plummer wrote:
>>>>>>>>>>>> Without the fix was this issue easy enough to reproduce 
>>>>>>>>>>>> that you can be sure this is resolving it?
>>>>>>>>>>>>
>>>>>>>>>>>> Chris
>>>>>>>>>>>>
>>>>>>>>>>>> On 10/2/18 8:16 AM, Gary Adams wrote:
>>>>>>>>>>>>> Solaris debug builds are failing tests that use the attach 
>>>>>>>>>>>>> interface.
>>>>>>>>>>>>> An IOException is reported when the java_pid file is not 
>>>>>>>>>>>>> opened.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It appears that the attempt to attach is taking place too 
>>>>>>>>>>>>> quickly.
>>>>>>>>>>>>> This workaround will allow the open operation to be retried
>>>>>>>>>>>>> after a short pause.
>>>>>>>>>>>>>
>>>>>>>>>>>>>   Webrev: http://cr.openjdk.java.net/~gadams/8210337/webrev/
>>>>>>>>>>>>>   Issue: https://bugs.openjdk.java.net/browse/JDK-8210337
>>>>>>>>>>>>>
>>>>>>>>>>>>> Testing is in progress.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>
>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/serviceability-dev/attachments/20181004/b27c0245/attachment-0001.html>


More information about the serviceability-dev mailing list