RFR: JDK-8210337: runtime/NMT/VirtualAllocTestType.java failed on RuntimeException missing from stdout/stderr
Gary Adams
gary.adams at oracle.com
Thu Oct 4 17:49:35 UTC 2018
My delay and retry did not fix the "Permission denied" problem.
When I was diagnosing the problem, I instrumented the code
to catch an IOException and call checkPermissions to get
more detail about it. The error reported
by checkPermissions was ENOENT (from stat).
The code change I then proposed was to catch the IOException,
delay, and retry the open. That fixed the ENOENT problem,
but it had nothing to do with "Permission denied".
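To make the distinction concrete, here is a minimal sketch of a retry policy that waits only while the file is missing. It is not the attach code itself: OpenRetry, Opener and openWithRetry are hypothetical names, and Opener stands in for openDoor(pid). Because FileNotFoundException extends IOException, only the FileNotFoundException case is caught inside the loop. A broader catch of IOException would also swallow "Permission denied", and no amount of waiting can fix that.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

// Hypothetical retry helper: retries only while the target file is missing
// (ENOENT surfaces as FileNotFoundException). Any other IOException, such as
// "Permission denied", is not caught here and propagates on the first attempt.
final class OpenRetry {
    // Hypothetical stand-in for VirtualMachineImpl.openDoor(pid).
    interface Opener {
        int open() throws IOException;
    }

    static int openWithRetry(Opener opener, int attempts, long delayMs)
            throws IOException, InterruptedException {
        if (attempts < 1) {
            throw new IllegalArgumentException("attempts must be >= 1");
        }
        FileNotFoundException last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return opener.open();
            } catch (FileNotFoundException fnf) {
                // File not created yet: worth waiting and trying again.
                last = fnf;
                if (i + 1 < attempts) {
                    Thread.sleep(delayMs);
                }
            }
        }
        // Still missing after every attempt.
        throw last;
    }
}
```

With this shape, a transient ENOENT is absorbed, while an EACCES-style IOException reaches the caller immediately instead of being retried.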
On 10/4/18, 1:25 PM, Chris Plummer wrote:
> But I also thought you said the delay and retry fixed the problem. How
> could it fix the problem if it is just duplicating something that is
> already in place?
>
> Chris
>
> On 10/4/18 9:48 AM, Gary Adams wrote:
>> My delay and retry just duplicated the openDoor retry.
>> The normal processing of FileNotFoundException(ENOENT) is to retry
>> several times until the file is available.
>>
>> But the original problem reported is a "Permission denied" (EACCES
>> or EPERM).
>> Delay and retry will not resolve a permissions error.
>>
>> On 10/4/18, 12:30 PM, Chris Plummer wrote:
>>> Didn't the retry after a 100ms delay work? If yes, why would it if the
>>> problem is that a java_pid file was not cleaned up?
>>>
>>> Chris
>>>
>>> On 10/4/18 8:54 AM, Gary Adams wrote:
>>>> First, let me retract the proposed change,
>>>> it is not the right solution to the problem originally
>>>> reported.
>>>>
>>>> Second, by way of explanation, consider the code fragments below.
>>>>
>>>> The high level processing calls openDoor which is willing to retry
>>>> the operation as long as the error is flagged specifically
>>>> as a FileNotFoundException.
>>>>
>>>> VirtualMachineImpl.java:72
>>>> VirtualMachineImpl.c:81
>>>>
>>>> During my testing I added a check at VirtualMachineImpl.java:214:
>>>> when an IOException was detected, I called checkPermissions
>>>> to get more detailed information about it. The error
>>>> I saw was ENOENT from the stat call, not one of the detailed
>>>> checks for specific permission issues (VirtualMachineImpl.c:143).
>>>>
>>>> VirtualMachineImpl.c:118
>>>> VirtualMachineImpl.c:147
>>>>
>>>> What I missed in the original proposed solution was that
>>>> FileNotFoundException extends IOException. That means my delay
>>>> and retry just duplicates the higher-level retry around the
>>>> openDoor call.
>>>>
>>>> Third, the original error message logged in the bug report:
>>>>
>>>> java.io.IOException: Permission denied
>>>> at jdk.attach/sun.tools.attach.VirtualMachineImpl.open(Native Method)
>>>>
>>>> had to have come from
>>>>
>>>> VirtualMachineImpl.c:70
>>>> VirtualMachineImpl.c:84
>>>>
>>>> which means the actual open call reported the file does exist
>>>> but the permissions do not allow the file to be accessed.
>>>> That also means the normal mechanism of removing leftover
>>>> java_pid files would not have cleaned up another user's
>>>> java_pid files.
>>>>
>>>> =====
>>>> src/jdk.attach/solaris/classes/sun/tools/attach/VirtualMachineImpl.java:
>>>> ...
>>>>  67         // Opens the door file to the target VM. If the file is not
>>>>  68         // found it might mean that the attach mechanism isn't started in the
>>>>  69         // target VM so we attempt to start it and retry.
>>>>  70         try {
>>>>  71             fd = openDoor(pid);
>>>>  72         } catch (FileNotFoundException fnf1) {
>>>>  73             File f = createAttachFile(pid);
>>>>  74             try {
>>>>  75                 sigquit(pid);
>>>>  76
>>>>  77                 // give the target VM time to start the attach mechanism
>>>>  78                 final int delay_step = 100;
>>>>  79                 final long timeout = attachTimeout();
>>>>  80                 long time_spend = 0;
>>>>  81                 long delay = 0;
>>>>  82                 do {
>>>>  83                     // Increase timeout on each attempt to reduce polling
>>>>  84                     delay += delay_step;
>>>>  85                     try {
>>>>  86                         Thread.sleep(delay);
>>>>  87                     } catch (InterruptedException x) { }
>>>>  88                     try {
>>>>  89                         fd = openDoor(pid);
>>>>  90                     } catch (FileNotFoundException fnf2) {
>>>>  91                         // pass
>>>>  92                     }
>>>>  93
>>>>  94                     time_spend += delay;
>>>>  95                     if (time_spend > timeout/2 && fd == -1) {
>>>>  96                         // Send QUIT again to give target VM the last chance to react
>>>>  97                         sigquit(pid);
>>>>  98                     }
>>>>  99                 } while (time_spend <= timeout && fd == -1);
>>>> 100                 if (fd == -1) {
>>>> 101                     throw new AttachNotSupportedException(
>>>> 102                         String.format("Unable to open door %s: " +
>>>> 103                           "target process %d doesn't respond within %dms " +
>>>> 104                           "or HotSpot VM not loaded", socket_path, pid, time_spend));
>>>> 105                 }
>>>> ...
>>>> 212     // The door is attached to .java_pid<pid> in the temporary directory.
>>>> 213     private int openDoor(int pid) throws IOException {
>>>> 214         socket_path = tmpdir + "/.java_pid" + pid;
>>>> 215         fd = open(socket_path);
>>>> 216
>>>> 217         // Check that the file owner/permission to avoid attaching to
>>>> 218         // bogus process
>>>> 219         try {
>>>> 220             checkPermissions(socket_path);
>>>> 221         } catch (IOException ioe) {
>>>> 222             close(fd);
>>>> 223             throw ioe;
>>>> 224         }
>>>> 225         return fd;
>>>> 226     }
>>>>
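To see what the polling loop in the constructor fragment does when the door file never appears, here is a hypothetical simulation that replays its arithmetic without sleeping or signalling. AttachPollSchedule and simulate are made-up names; delay_step = 100 and the loop conditions are copied from lines 78-99 above, and the 10-second timeout is the default mentioned elsewhere in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical replay of the constructor's polling loop, assuming the door
// file never appears (fd stays -1). Nothing sleeps or sends signals; the class
// only records what the loop would do.
final class AttachPollSchedule {
    final List<Long> delays = new ArrayList<>();            // argument to each Thread.sleep()
    final List<Long> elapsed = new ArrayList<>();           // time_spend after each attempt
    final List<Integer> resendAttempts = new ArrayList<>(); // attempts that call sigquit(pid) again

    static AttachPollSchedule simulate(long timeout) {
        final int delayStep = 100;     // delay_step in the original loop
        AttachPollSchedule s = new AttachPollSchedule();
        long timeSpend = 0;
        long delay = 0;
        int attempt = 0;
        do {
            delay += delayStep;        // each wait is 100 ms longer than the previous one
            attempt++;
            s.delays.add(delay);
            timeSpend += delay;
            s.elapsed.add(timeSpend);
            if (timeSpend > timeout / 2) {
                s.resendAttempts.add(attempt);
            }
        } while (timeSpend <= timeout);
        return s;
    }
}
```

With timeout = 10000, the loop makes 14 attempts (waits of 100, 200, ..., 1400 ms) and stops at a time_spend of 10500 ms, the figure the AttachNotSupportedException message would report. As the fragment is written, the extra SIGQUIT is not sent only once: the condition on line 95 holds on every attempt from the 10th (5500 ms) onward, so sigquit(pid) is sent five more times.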
>>>> =====
>>>> src/jdk.attach/solaris/native/libattach/VirtualMachineImpl.c:
>>>> ...
>>>>  59 JNIEXPORT jint JNICALL Java_sun_tools_attach_VirtualMachineImpl_open
>>>>  60   (JNIEnv *env, jclass cls, jstring path)
>>>>  61 {
>>>>  62     jboolean isCopy;
>>>>  63     const char* p = GetStringPlatformChars(env, path, &isCopy);
>>>>  64     if (p == NULL) {
>>>>  65         return 0;
>>>>  66     } else {
>>>>  67         int fd;
>>>>  68         int err = 0;
>>>>  69
>>>>  70         fd = open(p, O_RDWR);
>>>>  71         if (fd == -1) {
>>>>  72             err = errno;
>>>>  73         }
>>>>  74
>>>>  75         if (isCopy) {
>>>>  76             JNU_ReleaseStringPlatformChars(env, path, p);
>>>>  77         }
>>>>  78
>>>>  79         if (fd == -1) {
>>>>  80             if (err == ENOENT) {
>>>>  81                 JNU_ThrowByName(env, "java/io/FileNotFoundException", NULL);
>>>>  82             } else {
>>>>  83                 char* msg = strdup(strerror(err));
>>>>  84                 JNU_ThrowIOException(env, msg);
>>>>  85                 if (msg != NULL) {
>>>>  86                     free(msg);
>>>>  87                 }
>>>>  88             }
>>>>  89         }
>>>>  90         return fd;
>>>>  91     }
>>>>  92 }
>>>> ...
>>>>  99 JNIEXPORT void JNICALL Java_sun_tools_attach_VirtualMachineImpl_checkPermissions
>>>> 100   (JNIEnv *env, jclass cls, jstring path)
>>>> 101 {
>>>> 102     jboolean isCopy;
>>>> 103     const char* p = GetStringPlatformChars(env, path, &isCopy);
>>>> 104     if (p != NULL) {
>>>> 105         struct stat64 sb;
>>>> 106         uid_t uid, gid;
>>>> 107         int res;
>>>> 108
>>>> 109         memset(&sb, 0, sizeof(struct stat64));
>>>> 110
>>>> 111         /*
>>>> 112          * Check that the path is owned by the effective uid/gid of this
>>>> 113          * process. Also check that group/other access is not allowed.
>>>> 114          */
>>>> 115         uid = geteuid();
>>>> 116         gid = getegid();
>>>> 117
>>>> 118         res = stat64(p, &sb);
>>>> 119         if (res != 0) {
>>>> 120             /* save errno */
>>>> 121             res = errno;
>>>> 122         }
>>>> 123
>>>> 124         if (res == 0) {
>>>> 125             char msg[100];
>>>> 126             jboolean isError = JNI_FALSE;
>>>> 127             if (sb.st_uid != uid && uid != ROOT_UID) {
>>>> 128                 snprintf(msg, sizeof(msg),
>>>> 129                     "file should be owned by the current user (which is %d) but is owned by %d", uid, sb.st_uid);
>>>> 130                 isError = JNI_TRUE;
>>>> 131             } else if (sb.st_gid != gid && uid != ROOT_UID) {
>>>> 132                 snprintf(msg, sizeof(msg),
>>>> 133                     "file's group should be the current group (which is %d) but the group is %d", gid, sb.st_gid);
>>>> 134                 isError = JNI_TRUE;
>>>> 135             } else if ((sb.st_mode & (S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH)) != 0) {
>>>> 136                 snprintf(msg, sizeof(msg),
>>>> 137                     "file should only be readable and writable by the owner but has 0%03o access", sb.st_mode & 0777);
>>>> 138                 isError = JNI_TRUE;
>>>> 139             }
>>>> 140             if (isError) {
>>>> 141                 char buf[256];
>>>> 142                 snprintf(buf, sizeof(buf), "well-known file %s is not secure: %s", p, msg);
>>>> 143                 JNU_ThrowIOException(env, buf);
>>>> 144             }
>>>> 145         } else {
>>>> 146             char* msg = strdup(strerror(res));
>>>> 147             JNU_ThrowIOException(env, msg);
>>>> 148             if (msg != NULL) {
>>>> 149                 free(msg);
>>>> 150             }
>>>> 151         }
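For comparison, here is a rough Java sketch of what checkPermissions verifies, written with java.nio.file instead of stat64. It is an approximation under stated assumptions: WellKnownFileCheck is a hypothetical name, it needs a POSIX file system, it compares the owner by name (supplied by the caller) rather than by uid, and it leaves out the gid check and the root exemption. A missing file surfaces as NoSuchFileException, the NIO counterpart of the ENOENT from stat64 on line 118; note that NoSuchFileException is an IOException but not a FileNotFoundException.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFileAttributes;
import java.nio.file.attribute.PosixFilePermission;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical NIO approximation of the native checkPermissions(). Assumes a
// POSIX file system; compares the owner by name and skips the gid/root checks.
final class WellKnownFileCheck {
    // Same bits as S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH in the C code.
    private static final Set<PosixFilePermission> GROUP_OTHER_RW = EnumSet.of(
            PosixFilePermission.GROUP_READ, PosixFilePermission.GROUP_WRITE,
            PosixFilePermission.OTHERS_READ, PosixFilePermission.OTHERS_WRITE);

    static void checkPermissions(Path p, String expectedOwner) throws IOException {
        // Throws NoSuchFileException (an IOException) if p does not exist,
        // mirroring the ENOENT that stat64() reports.
        PosixFileAttributes attrs = Files.readAttributes(p, PosixFileAttributes.class);

        String owner = attrs.owner().getName();
        if (!owner.equals(expectedOwner)) {
            throw new IOException("well-known file " + p + " is not secure: file should be owned by "
                    + expectedOwner + " but is owned by " + owner);
        }

        // Keep only the group/other read/write bits that are actually granted.
        Set<PosixFilePermission> granted = EnumSet.noneOf(PosixFilePermission.class);
        granted.addAll(attrs.permissions());
        granted.retainAll(GROUP_OTHER_RW);
        if (!granted.isEmpty()) {
            throw new IOException("well-known file " + p + " is not secure: group/other access "
                    + granted + " is not allowed");
        }
    }
}
```

A caller would typically pass System.getProperty("user.name") as expectedOwner.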
>>>>
>>>> On 10/2/18, 6:23 PM, David Holmes wrote:
>>>>> Minor correction: EPERM -> EACCES for Solaris
>>>>>
>>>>> Hard to see how to get a transient EACCES when opening a file ...
>>>>> though as it is really a door I guess there could be additional
>>>>> complexity.
>>>>>
>>>>> David
>>>>>
>>>>> On 3/10/2018 7:54 AM, Chris Plummer wrote:
>>>>>> On 10/2/18 2:38 PM, David Holmes wrote:
>>>>>>> Chris,
>>>>>>>
>>>>>>> On 3/10/2018 6:57 AM, Chris Plummer wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 10/2/18 1:44 PM, gary.adams at oracle.com wrote:
>>>>>>>>> The general attach sequence ...
>>>>>>>>>
>>>>>>>>> src/jdk.attach/solaris/classes/sun/tools/attach/VirtualMachineImpl.java
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> the attacher creates an attach_pid file in a directory where
>>>>>>>>> the attachee is running
>>>>>>>>> issues a signal to the attachee
>>>>>>>>>
>>>>>>>>> loops waiting for the java_pid file to be created
>>>>>>>>> default timeout is 10 seconds
>>>>>>>>>
>>>>>>>> So getting a FileNotFoundException while in this loop is OK,
>>>>>>>> but IOException is not.
>>>>>>>>
>>>>>>>>> src/hotspot/os/solaris/attachListener_solaris.cpp
>>>>>>>>>
>>>>>>>>> attachee creates the java_pid file
>>>>>>>>> listens until the attacher opens the door
>>>>>>>>>
>>>>>>>> I don't think this is related, but JDK-8199811 made a fix in
>>>>>>>> attachListener_solaris.cpp to make it wait up to 10 seconds for
>>>>>>>> initialization to complete before failing the enqueue.
>>>>>>>>
>>>>>>>>> ...
>>>>>>>>> Not sure when a bare IOException is thrown rather than the
>>>>>>>>> more specific FileNotFoundException.
>>>>>>>> Where is the IOException originating from? I wonder if the
>>>>>>>> issue is that the file is in the process of being created, but
>>>>>>>> is not fully created yet. Maybe it is there, but
>>>>>>>> owner/group/permissions have not been set yet, and this results
>>>>>>>> in an IOException instead of FileNotFoundException.
>>>>>>>
>>>>>>> The exception is shown in the bug report:
>>>>>>>
>>>>>>> [java.io.IOException: Permission denied
>>>>>>>     at jdk.attach/sun.tools.attach.VirtualMachineImpl.open(Native Method)
>>>>>>>     at jdk.attach/sun.tools.attach.VirtualMachineImpl.openDoor(VirtualMachineImpl.java:215)
>>>>>>>     at jdk.attach/sun.tools.attach.VirtualMachineImpl.<init>(VirtualMachineImpl.java:71)
>>>>>>>     at jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58)
>>>>>>>     at jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207)
>>>>>>>     at jdk.jcmd/sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:114)
>>>>>>>     at jdk.jcmd/sun.tools.jcmd.JCmd.main(JCmd.java:98)
>>>>>>>
>>>>>>> And if you look at the native code the EPERM from open will
>>>>>>> cause IOException to be thrown.
>>>>>>>
>>>>>>> ./jdk.attach/solaris/native/libattach/VirtualMachineImpl.c
>>>>>>>
>>>>>>> JNIEXPORT jint JNICALL Java_sun_tools_attach_VirtualMachineImpl_open
>>>>>>>   (JNIEnv *env, jclass cls, jstring path)
>>>>>>> {
>>>>>>>     jboolean isCopy;
>>>>>>>     const char* p = GetStringPlatformChars(env, path, &isCopy);
>>>>>>>     if (p == NULL) {
>>>>>>>         return 0;
>>>>>>>     } else {
>>>>>>>         int fd;
>>>>>>>         int err = 0;
>>>>>>>
>>>>>>>         fd = open(p, O_RDWR);
>>>>>>>         if (fd == -1) {
>>>>>>>             err = errno;
>>>>>>>         }
>>>>>>>
>>>>>>>         if (isCopy) {
>>>>>>>             JNU_ReleaseStringPlatformChars(env, path, p);
>>>>>>>         }
>>>>>>>
>>>>>>>         if (fd == -1) {
>>>>>>>             if (err == ENOENT) {
>>>>>>>                 JNU_ThrowByName(env, "java/io/FileNotFoundException", NULL);
>>>>>>>             } else {
>>>>>>>                 char* msg = strdup(strerror(err));
>>>>>>>                 JNU_ThrowIOException(env, msg);
>>>>>>>                 if (msg != NULL) {
>>>>>>>                     free(msg);
>>>>>>>                 }
>>>>>>>
>>>>>>>
>>>>>>> We should add the path to the exception message.
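One way to act on that suggestion, sketched here on the Java side rather than in the native code (PathMessages and withPath are hypothetical names): prefix the path onto the message but keep the exception type. If a FileNotFoundException were rewrapped as a plain IOException, the FileNotFoundException-only retry around openDoor would stop retrying.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

// Hypothetical helper: add the file path to an IOException's message while
// preserving its type, so FileNotFoundException stays retryable.
final class PathMessages {
    static IOException withPath(String path, IOException e) {
        String msg = (e.getMessage() == null) ? path : path + ": " + e.getMessage();
        if (e instanceof FileNotFoundException) {
            // FileNotFoundException has no (String, Throwable) constructor.
            FileNotFoundException fnf = new FileNotFoundException(msg);
            fnf.initCause(e);
            return fnf;
        }
        return new IOException(msg, e);
    }
}
```

In a hypothetical catch around open(socket_path), this would be used as `throw PathMessages.withPath(socket_path, ioe);`.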
>>>>>>>
>>>>>> Thanks David. So if EPERM is the error and a retry 100ms later
>>>>>> works, I think that supports my hypothesis that the file is not
>>>>>> quite fully created. So Gary's fix is probably fine. The only
>>>>>> other possible fix I can think of that wouldn't require an
>>>>>> explicit delay (or multiple retries) is probably not worth the
>>>>>> complexity. It would require that the attachee create two files,
>>>>>> and the attacher try to open the second file first. When it
>>>>>> either opens or returns EPERM, you know the first file can safely
>>>>>> be opened.
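A minimal sketch of that two-file idea using java.nio.file, purely for illustration: TwoFileHandshake, publish, doorReady and the ".ready" marker suffix are all hypothetical, and the real attachee is HotSpot C++ code creating a door, not a regular file. What it shows is the ordering: the marker is created only after the main file exists and has its final permissions, so an attacher that waits for the marker should not see a main file whose permissions are still being set.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;

// Hypothetical two-file handshake: the attachee creates and secures the main
// file first and creates the marker last; the attacher waits for the marker.
// Assumes a POSIX file system for setPosixFilePermissions.
final class TwoFileHandshake {
    static Path markerFor(Path door) {
        return door.resolveSibling(door.getFileName() + ".ready");
    }

    // Attachee side.
    static void publish(Path door) throws IOException {
        Files.createFile(door);
        Files.setPosixFilePermissions(door, PosixFilePermissions.fromString("rw-------"));
        // Only now signal that the main file is complete.
        Files.createFile(markerFor(door));
    }

    // Attacher side: the marker existing means the main file has been fully set up.
    static boolean doorReady(Path door) {
        return Files.exists(markerFor(door));
    }
}
```

The cost is the extra file and its cleanup, which is the added complexity weighed above against a simple delay and retry.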
>>>>>>
>>>>>> Chris
>>>>>>> David
>>>>>>> -----
>>>>>>>
>>>>>>>> Chris
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 10/2/18 4:11 PM, Chris Plummer wrote:
>>>>>>>>>> Can you summarize how the attach handshaking is supposed to
>>>>>>>>>> work? I'm just wondering why the attacher would ever be
>>>>>>>>>> looking for the file before the attachee has created it. It
>>>>>>>>>> seems a proper handshake would prevent this. Maybe there's
>>>>>>>>>> some sort of visibility issue where the attachee has indeed
>>>>>>>>>> created the file, but it is not immediately visible to the
>>>>>>>>>> attacher process.
>>>>>>>>>>
>>>>>>>>>> Chris
>>>>>>>>>>
>>>>>>>>>> On 10/2/18 12:27 PM, gary.adams at oracle.com wrote:
>>>>>>>>>>> The problem reproduced pretty quickly.
>>>>>>>>>>> I added a call to checkPermissions, which revealed the
>>>>>>>>>>> "file not found" from the stat call when the IOException
>>>>>>>>>>> was detected.
>>>>>>>>>>>
>>>>>>>>>>> The Solaris test machines have been somewhat flaky today,
>>>>>>>>>>> so I'll continue with the testing a bit longer.
>>>>>>>>>>>
>>>>>>>>>>> On 10/2/18 3:12 PM, Chris Plummer wrote:
>>>>>>>>>>>> Without the fix was this issue easy enough to reproduce
>>>>>>>>>>>> that you can be sure this is resolving it?
>>>>>>>>>>>>
>>>>>>>>>>>> Chris
>>>>>>>>>>>>
>>>>>>>>>>>> On 10/2/18 8:16 AM, Gary Adams wrote:
>>>>>>>>>>>>> Solaris debug builds are failing tests that use the attach
>>>>>>>>>>>>> interface.
>>>>>>>>>>>>> An IOException is reported when the java_pid file cannot be
>>>>>>>>>>>>> opened.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It appears that the attempt to attach is taking place too
>>>>>>>>>>>>> quickly.
>>>>>>>>>>>>> This workaround will allow the open operation to be retried
>>>>>>>>>>>>> after a short pause.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Webrev: http://cr.openjdk.java.net/~gadams/8210337/webrev/
>>>>>>>>>>>>> Issue: https://bugs.openjdk.java.net/browse/JDK-8210337
>>>>>>>>>>>>>
>>>>>>>>>>>>> Testing is in progress.
More information about the serviceability-dev mailing list