RFR: JDK-8210337: runtime/NMT/VirtualAllocTestType.java failed on RuntimeException missing from stdout/stderr
Gary Adams
gary.adams at oracle.com
Wed Nov 7 19:20:12 UTC 2018
If there are no further suggestions on JDK-8210337,
I plan to close it out as cannot reproduce.
Similar bugs had been filed for the "Permission denied" error
from the openDoor request failure, and each was attributed
to an infrastructure issue, e.g. another user's process with the same
pid left behind a temporary java_pid file that blocks the current test
from attaching correctly.
On 10/4/18, 1:49 PM, Gary Adams wrote:
> My delay and retry did not fix the "Permission denied" problem.
>
> When I was diagnosing the problem I instrumented the code
> to catch an IOException and call checkPermissions to get
> more detail about the IOException. The error reported
> from calling checkPermissions was ENOENT (from stat).
>
> The code change I then proposed was to catch the IOException,
> delay, and retry the open. That fixed the ENOENT problem,
> but had nothing to do with "Permission denied".
>
> On 10/4/18, 1:25 PM, Chris Plummer wrote:
>> But I also thought you said the delay and retry fixed the problem.
>> How could it fix the problem if it is just duplicating something that is
>> already in place?
>>
>> Chris
>>
>> On 10/4/18 9:48 AM, Gary Adams wrote:
>>> My delay and retry just duplicated the openDoor retry.
>>> The normal processing of FileNotFoundException(ENOENT) is to retry
>>> several times until the file is available.
>>>
>>> But the original problem reported is a "Permission denied" (EACCES
>>> or EPERM).
>>> Delay and retry will not resolve a permissions error.
>>>
>>> On 10/4/18, 12:30 PM, Chris Plummer wrote:
>>>> Didn't the retry after a 100ms delay work? If so, why would it, if
>>>> the problem is that a java_pid file was not cleaned up?
>>>>
>>>> Chris
>>>>
>>>> On 10/4/18 8:54 AM, Gary Adams wrote:
>>>>> First, let me retract the proposed change,
>>>>> it is not the right solution to the problem originally
>>>>> reported.
>>>>>
>>>>> Second, as a bit of explanation consider the code fragments below.
>>>>>
>>>>> The high level processing calls openDoor which is willing to retry
>>>>> the operation as long as the error is flagged specifically
>>>>> as a FileNotFoundException.
>>>>>
>>>>> VirtualMachineImpl.java:72
>>>>> VirtualMachineImpl.c:81
>>>>>
>>>>> During my testing I added a check at VirtualMachineImpl.java:214,
>>>>> and when an IOException was detected I made a call to checkPermissions
>>>>> to get more detailed information about the IOException. The error
>>>>> I saw was an ENOENT from the stat call, not one of the detailed
>>>>> checks for specific permission issues (VirtualMachineImpl.c:143).
>>>>>
>>>>> VirtualMachineImpl.c:118
>>>>> VirtualMachineImpl.c:147
>>>>>
>>>>> What I missed in the original proposed solution was that
>>>>> FileNotFoundException extends IOException. That means my delay and
>>>>> retry just duplicates the higher-level retry around the openDoor call.
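A toy model of the point above, assuming a hypothetical `Opener` interface in place of the native `open()` and leaving out the sleeps so the logic stays deterministic. The outer loop retries only on FileNotFoundException, as the attach code does. `openWithRetry` stands in for the proposed catch-IOException-and-retry workaround. Because FileNotFoundException is an IOException, the workaround intercepts the ENOENT case as well. In this model, a missing file ends with the same result and the same number of open calls, and "Permission denied" simply fails a second time.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

// Hypothetical model of the retry discussion; not the real attach code.
public class RetryDuplication {
    interface Opener {
        int open() throws IOException;
    }

    // Fails the first `failures` calls with `failure`, then returns a pretend fd.
    static final class FlakyOpener implements Opener {
        private final int failures;
        private final IOException failure;
        int calls = 0;

        FlakyOpener(int failures, IOException failure) {
            this.failures = failures;
            this.failure = failure;
        }

        @Override
        public int open() throws IOException {
            calls++;
            if (calls <= failures) {
                throw failure;
            }
            return 42;
        }
    }

    // The proposed workaround: catch any IOException and retry once.
    // FileNotFoundException extends IOException, so ENOENT is caught here too.
    static int openWithRetry(Opener o) throws IOException {
        try {
            return o.open();
        } catch (IOException e) {
            return o.open();
        }
    }

    // Outer loop shaped like the attach code: keep polling only on FileNotFoundException;
    // any other IOException (e.g. "Permission denied") propagates immediately.
    static int attach(Opener o, int maxPolls, boolean useWorkaround) throws IOException {
        for (int i = 0; i < maxPolls; i++) {
            try {
                return useWorkaround ? openWithRetry(o) : o.open();
            } catch (FileNotFoundException fnf) {
                // pass: the file may not exist yet
            }
        }
        return -1;
    }
}
```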
>>>>>
>>>>> Third, the original error message logged in the bug report:
>>>>>
>>>>> java.io.IOException: Permission denied
>>>>> at jdk.attach/sun.tools.attach.VirtualMachineImpl.open(Native Method)
>>>>>
>>>>> had to have come from
>>>>>
>>>>> VirtualMachineImpl.c:70
>>>>> VirtualMachineImpl.c:84
>>>>>
>>>>> which means the actual open call reported that the file does exist
>>>>> but the permissions do not allow the file to be accessed.
>>>>> That also means the normal mechanism of removing leftover
>>>>> java_pid files would not have cleaned up another user's
>>>>> java_pid files.
>>>>>
>>>>> =====
>>>>> src/jdk.attach/solaris/classes/sun/tools/attach/VirtualMachineImpl.java:
>>>>> ...
>>>>>  67         // Opens the door file to the target VM. If the file is not
>>>>>  68         // found it might mean that the attach mechanism isn't started in the
>>>>>  69         // target VM so we attempt to start it and retry.
>>>>>  70         try {
>>>>>  71             fd = openDoor(pid);
>>>>>  72         } catch (FileNotFoundException fnf1) {
>>>>>  73             File f = createAttachFile(pid);
>>>>>  74             try {
>>>>>  75                 sigquit(pid);
>>>>>  76
>>>>>  77                 // give the target VM time to start the attach mechanism
>>>>>  78                 final int delay_step = 100;
>>>>>  79                 final long timeout = attachTimeout();
>>>>>  80                 long time_spend = 0;
>>>>>  81                 long delay = 0;
>>>>>  82                 do {
>>>>>  83                     // Increase timeout on each attempt to reduce polling
>>>>>  84                     delay += delay_step;
>>>>>  85                     try {
>>>>>  86                         Thread.sleep(delay);
>>>>>  87                     } catch (InterruptedException x) { }
>>>>>  88                     try {
>>>>>  89                         fd = openDoor(pid);
>>>>>  90                     } catch (FileNotFoundException fnf2) {
>>>>>  91                         // pass
>>>>>  92                     }
>>>>>  93
>>>>>  94                     time_spend += delay;
>>>>>  95                     if (time_spend > timeout/2 && fd == -1) {
>>>>>  96                         // Send QUIT again to give target VM the last chance to react
>>>>>  97                         sigquit(pid);
>>>>>  98                     }
>>>>>  99                 } while (time_spend <= timeout && fd == -1);
>>>>> 100                 if (fd == -1) {
>>>>> 101                     throw new AttachNotSupportedException(
>>>>> 102                         String.format("Unable to open door %s: " +
>>>>> 103                             "target process %d doesn't respond within %dms " +
>>>>> 104                             "or HotSpot VM not loaded", socket_path, pid, time_spend));
>>>>> 105                 }
>>>>> ...
>>>>> 212     // The door is attached to .java_pid<pid> in the temporary directory.
>>>>> 213     private int openDoor(int pid) throws IOException {
>>>>> 214         socket_path = tmpdir + "/.java_pid" + pid;
>>>>> 215         fd = open(socket_path);
>>>>> 216
>>>>> 217         // Check that the file owner/permission to avoid attaching to
>>>>> 218         // bogus process
>>>>> 219         try {
>>>>> 220             checkPermissions(socket_path);
>>>>> 221         } catch (IOException ioe) {
>>>>> 222             close(fd);
>>>>> 223             throw ioe;
>>>>> 224         }
>>>>> 225         return fd;
>>>>> 226     }
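The polling loop at lines 78-99 sleeps for 100, 200, 300 ms and so on, so after n polls `time_spend` is 50·n·(n+1) ms. The sketch below only replays that arithmetic. It assumes openDoor() never succeeds and uses the 10-second default timeout mentioned later in this thread. Under those assumptions the loop gives up after 14 polls, at 10,500 ms. The QUIT re-send starts at poll 10 (5,500 ms). Because the check sits inside the loop, SIGQUIT is re-sent on every remaining poll, not just once. The class and field names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Replays the delay/timeout arithmetic of the attach polling loop without sleeping,
// assuming fd stays -1 so the loop only ends on the timeout.
public class AttachPollSchedule {
    static final class Attempt {
        final int number;          // 1-based poll number
        final long sleepMs;        // delay slept before this poll
        final long elapsedMs;      // time_spend after this poll
        final boolean resendQuit;  // whether sigquit(pid) is sent again after this poll

        Attempt(int number, long sleepMs, long elapsedMs, boolean resendQuit) {
            this.number = number;
            this.sleepMs = sleepMs;
            this.elapsedMs = elapsedMs;
            this.resendQuit = resendQuit;
        }

        @Override
        public String toString() {
            return String.format("poll %2d: sleep %5d ms, elapsed %6d ms%s",
                    number, sleepMs, elapsedMs, resendQuit ? ", resend SIGQUIT" : "");
        }
    }

    static List<Attempt> schedule(long timeout, int delayStep) {
        List<Attempt> polls = new ArrayList<>();
        long timeSpend = 0;
        long delay = 0;
        do {
            delay += delayStep;                        // linearly increasing delay
            timeSpend += delay;                        // time_spend += delay
            boolean resend = timeSpend > timeout / 2;  // time_spend > timeout/2 && fd == -1
            polls.add(new Attempt(polls.size() + 1, delay, timeSpend, resend));
        } while (timeSpend <= timeout);
        return polls;
    }

    public static void main(String[] args) {
        for (Attempt a : schedule(10_000, 100)) {
            System.out.println(a);
        }
    }
}
```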
>>>>>
>>>>> =====
>>>>> src/jdk.attach/solaris/native/libattach/VirtualMachineImpl.c:
>>>>> ...
>>>>>  59 JNIEXPORT jint JNICALL Java_sun_tools_attach_VirtualMachineImpl_open
>>>>>  60   (JNIEnv *env, jclass cls, jstring path)
>>>>>  61 {
>>>>>  62     jboolean isCopy;
>>>>>  63     const char* p = GetStringPlatformChars(env, path, &isCopy);
>>>>>  64     if (p == NULL) {
>>>>>  65         return 0;
>>>>>  66     } else {
>>>>>  67         int fd;
>>>>>  68         int err = 0;
>>>>>  69
>>>>>  70         fd = open(p, O_RDWR);
>>>>>  71         if (fd == -1) {
>>>>>  72             err = errno;
>>>>>  73         }
>>>>>  74
>>>>>  75         if (isCopy) {
>>>>>  76             JNU_ReleaseStringPlatformChars(env, path, p);
>>>>>  77         }
>>>>>  78
>>>>>  79         if (fd == -1) {
>>>>>  80             if (err == ENOENT) {
>>>>>  81                 JNU_ThrowByName(env, "java/io/FileNotFoundException", NULL);
>>>>>  82             } else {
>>>>>  83                 char* msg = strdup(strerror(err));
>>>>>  84                 JNU_ThrowIOException(env, msg);
>>>>>  85                 if (msg != NULL) {
>>>>>  86                     free(msg);
>>>>>  87                 }
>>>>>  88             }
>>>>>  89         }
>>>>>  90         return fd;
>>>>>  91     }
>>>>>  92 }
>>>>> ...
>>>>>  99 JNIEXPORT void JNICALL Java_sun_tools_attach_VirtualMachineImpl_checkPermissions
>>>>> 100   (JNIEnv *env, jclass cls, jstring path)
>>>>> 101 {
>>>>> 102     jboolean isCopy;
>>>>> 103     const char* p = GetStringPlatformChars(env, path, &isCopy);
>>>>> 104     if (p != NULL) {
>>>>> 105         struct stat64 sb;
>>>>> 106         uid_t uid, gid;
>>>>> 107         int res;
>>>>> 108
>>>>> 109         memset(&sb, 0, sizeof(struct stat64));
>>>>> 110
>>>>> 111         /*
>>>>> 112          * Check that the path is owned by the effective uid/gid of this
>>>>> 113          * process. Also check that group/other access is not allowed.
>>>>> 114          */
>>>>> 115         uid = geteuid();
>>>>> 116         gid = getegid();
>>>>> 117
>>>>> 118         res = stat64(p, &sb);
>>>>> 119         if (res != 0) {
>>>>> 120             /* save errno */
>>>>> 121             res = errno;
>>>>> 122         }
>>>>> 123
>>>>> 124         if (res == 0) {
>>>>> 125             char msg[100];
>>>>> 126             jboolean isError = JNI_FALSE;
>>>>> 127             if (sb.st_uid != uid && uid != ROOT_UID) {
>>>>> 128                 snprintf(msg, sizeof(msg),
>>>>> 129                     "file should be owned by the current user (which is %d) but is owned by %d", uid, sb.st_uid);
>>>>> 130                 isError = JNI_TRUE;
>>>>> 131             } else if (sb.st_gid != gid && uid != ROOT_UID) {
>>>>> 132                 snprintf(msg, sizeof(msg),
>>>>> 133                     "file's group should be the current group (which is %d) but the group is %d", gid, sb.st_gid);
>>>>> 134                 isError = JNI_TRUE;
>>>>> 135             } else if ((sb.st_mode & (S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH)) != 0) {
>>>>> 136                 snprintf(msg, sizeof(msg),
>>>>> 137                     "file should only be readable and writable by the owner but has 0%03o access", sb.st_mode & 0777);
>>>>> 138                 isError = JNI_TRUE;
>>>>> 139             }
>>>>> 140             if (isError) {
>>>>> 141                 char buf[256];
>>>>> 142                 snprintf(buf, sizeof(buf), "well-known file %s is not secure: %s", p, msg);
>>>>> 143                 JNU_ThrowIOException(env, buf);
>>>>> 144             }
>>>>> 145         } else {
>>>>> 146             char* msg = strdup(strerror(res));
>>>>> 147             JNU_ThrowIOException(env, msg);
>>>>> 148             if (msg != NULL) {
>>>>> 149                 free(msg);
>>>>> 150             }
>>>>> 151         }
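The sketch below is a rough java.nio analogue of the mode check at lines 135-139. It is not how the JDK implements it. It assumes a POSIX file system, and it leaves out the owner and group comparisons against the effective uid/gid. The message reports the permissions in symbolic form, while the C code prints octal. As with stat64() returning ENOENT, `Files.getPosixFilePermissions` throws NoSuchFileException, itself an IOException, when the file is missing. That matches the error Gary saw when he called checkPermissions after the failure.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical Java-level analogue of the group/other part of checkPermissions.
public class WellKnownFileCheck {
    // The same bits as S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH.
    private static final Set<PosixFilePermission> GROUP_OTHER_RW = EnumSet.of(
            PosixFilePermission.GROUP_READ, PosixFilePermission.GROUP_WRITE,
            PosixFilePermission.OTHERS_READ, PosixFilePermission.OTHERS_WRITE);

    static void checkPermissions(Path p) throws IOException {
        // Throws NoSuchFileException (an IOException) if p does not exist.
        Set<PosixFilePermission> perms = Files.getPosixFilePermissions(p);
        Set<PosixFilePermission> insecure = EnumSet.noneOf(PosixFilePermission.class);
        insecure.addAll(perms);
        insecure.retainAll(GROUP_OTHER_RW);
        if (!insecure.isEmpty()) {
            throw new IOException("well-known file " + p + " is not secure: file should only be"
                    + " readable and writable by the owner but has "
                    + PosixFilePermissions.toString(perms) + " access");
        }
    }
}
```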
>>>>>
>>>>> On 10/2/18, 6:23 PM, David Holmes wrote:
>>>>>> Minor correction: EPERM -> EACCES for Solaris
>>>>>>
>>>>>> Hard to see how to get a transient EACCES when opening a file ...
>>>>>> though as it is really a door I guess there could be additional
>>>>>> complexity.
>>>>>>
>>>>>> David
>>>>>>
>>>>>> On 3/10/2018 7:54 AM, Chris Plummer wrote:
>>>>>>> On 10/2/18 2:38 PM, David Holmes wrote:
>>>>>>>> Chris,
>>>>>>>>
>>>>>>>> On 3/10/2018 6:57 AM, Chris Plummer wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 10/2/18 1:44 PM, gary.adams at oracle.com wrote:
>>>>>>>>>> The general attach sequence ...
>>>>>>>>>>
>>>>>>>>>> src/jdk.attach/solaris/classes/sun/tools/attach/VirtualMachineImpl.java
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> - the attacher creates an attach_pid file in a directory where
>>>>>>>>>>   the attachee is running
>>>>>>>>>> - issues a signal (SIGQUIT) to the attachee
>>>>>>>>>> - loops waiting for the java_pid file to be created
>>>>>>>>>>   (default timeout is 10 seconds)
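The sketch below is a single-JVM toy simulation of the handshake described above, intended only to make the ordering concrete. A CountDownLatch stands in for SIGQUIT, and plain files in a temporary directory stand in for the door. The `.attach_pid<pid>` and `.java_pid<pid>` names follow the thread. The helper names, the fixed polling interval, and the simulated startup delay are assumptions for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CountDownLatch;

// Toy simulation of the attach handshake; not the real HotSpot/attach implementation.
public class AttachHandshakeSim {
    static Path attachFile(Path dir, long pid) { return dir.resolve(".attach_pid" + pid); }
    static Path javaPidFile(Path dir, long pid) { return dir.resolve(".java_pid" + pid); }

    // Attachee: wait for the "signal"; if an attach_pid file is present, create java_pid.
    static Thread startAttachee(Path dir, long pid, CountDownLatch sigquit, long startupMs) {
        Thread t = new Thread(() -> {
            try {
                sigquit.await();
                if (Files.exists(attachFile(dir, pid))) {
                    Thread.sleep(startupMs);                 // simulated listener startup time
                    Files.createFile(javaPidFile(dir, pid));
                }
            } catch (IOException | InterruptedException e) {
                throw new RuntimeException(e);
            }
        });
        t.start();
        return t;
    }

    // Attacher: create attach_pid, "signal", then poll for java_pid until the timeout.
    static boolean attach(Path dir, long pid, CountDownLatch sigquit, long timeoutMs, long pollMs)
            throws IOException, InterruptedException {
        Files.createFile(attachFile(dir, pid));
        sigquit.countDown();                                 // stands in for sigquit(pid)
        for (long waited = 0; waited <= timeoutMs; waited += pollMs) {
            if (Files.exists(javaPidFile(dir, pid))) {
                return true;
            }
            Thread.sleep(pollMs);
        }
        return false;
    }
}
```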
>>>>>>>>>>
>>>>>>>>> So getting a FileNotFoundException while in this loop is OK,
>>>>>>>>> but IOException is not.
>>>>>>>>>
>>>>>>>>>> src/hotspot/os/solaris/attachListener_solaris.cpp
>>>>>>>>>>
>>>>>>>>>> - the attachee creates the java_pid file
>>>>>>>>>> - listens until the attacher opens the door
>>>>>>>>>>
>>>>>>>>> I don't think this is related, but JDK-8199811 made a fix in
>>>>>>>>> attachListener_solaris.cpp to make it wait up to 10 seconds
>>>>>>>>> for initialization to complete before failing the enqueue.
>>>>>>>>>
>>>>>>>>>> ...
>>>>>>>>>> Not sure when a bare IOException is thrown rather than the
>>>>>>>>>> more specific FileNotFoundException.
>>>>>>>>> Where is the IOException originating from? I wonder if the
>>>>>>>>> issue is that the file is in the process of being created, but
>>>>>>>>> is not fully created yet. Maybe it is there, but
>>>>>>>>> owner/group/permissions have not been set yet, and this
>>>>>>>>> results in an IOException instead of FileNotFoundException.
>>>>>>>>
>>>>>>>> The exception is shown in the bug report:
>>>>>>>>
>>>>>>>> [java.io.IOException: Permission denied
>>>>>>>>     at jdk.attach/sun.tools.attach.VirtualMachineImpl.open(Native Method)
>>>>>>>>     at jdk.attach/sun.tools.attach.VirtualMachineImpl.openDoor(VirtualMachineImpl.java:215)
>>>>>>>>     at jdk.attach/sun.tools.attach.VirtualMachineImpl.<init>(VirtualMachineImpl.java:71)
>>>>>>>>     at jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58)
>>>>>>>>     at jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207)
>>>>>>>>     at jdk.jcmd/sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:114)
>>>>>>>>     at jdk.jcmd/sun.tools.jcmd.JCmd.main(JCmd.java:98)
>>>>>>>>
>>>>>>>> And if you look at the native code, you'll see that an EPERM from
>>>>>>>> open will cause an IOException to be thrown.
>>>>>>>>
>>>>>>>> ./jdk.attach/solaris/native/libattach/VirtualMachineImpl.c
>>>>>>>>
>>>>>>>> JNIEXPORT jint JNICALL Java_sun_tools_attach_VirtualMachineImpl_open
>>>>>>>>   (JNIEnv *env, jclass cls, jstring path)
>>>>>>>> {
>>>>>>>>     jboolean isCopy;
>>>>>>>>     const char* p = GetStringPlatformChars(env, path, &isCopy);
>>>>>>>>     if (p == NULL) {
>>>>>>>>         return 0;
>>>>>>>>     } else {
>>>>>>>>         int fd;
>>>>>>>>         int err = 0;
>>>>>>>>
>>>>>>>>         fd = open(p, O_RDWR);
>>>>>>>>         if (fd == -1) {
>>>>>>>>             err = errno;
>>>>>>>>         }
>>>>>>>>
>>>>>>>>         if (isCopy) {
>>>>>>>>             JNU_ReleaseStringPlatformChars(env, path, p);
>>>>>>>>         }
>>>>>>>>
>>>>>>>>         if (fd == -1) {
>>>>>>>>             if (err == ENOENT) {
>>>>>>>>                 JNU_ThrowByName(env, "java/io/FileNotFoundException", NULL);
>>>>>>>>             } else {
>>>>>>>>                 char* msg = strdup(strerror(err));
>>>>>>>>                 JNU_ThrowIOException(env, msg);
>>>>>>>>                 if (msg != NULL) {
>>>>>>>>                     free(msg);
>>>>>>>>                 }
>>>>>>>>
>>>>>>>>
>>>>>>>> We should add the path to the exception message.
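One way to act on this suggestion is shown below as a Java-level sketch. The thread presumably means changing the native message; this wrapper and its `NativeOpen` interface are hypothetical. The point the sketch makes is that the FileNotFoundException subtype must be preserved when the path is added. The caller's retry loop keys on that exact type, so rewrapping everything as a plain IOException would silently turn ENOENT into a hard failure.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

// Hypothetical wrapper that adds the path to open() failures without changing exception types.
public class PathInMessage {
    interface NativeOpen {
        int open(String path) throws IOException;
    }

    static int openDoor(NativeOpen nativeOpen, String path) throws IOException {
        try {
            return nativeOpen.open(path);
        } catch (FileNotFoundException fnf) {
            // Keep the subtype: the attach retry loop only retries on FileNotFoundException.
            FileNotFoundException e = new FileNotFoundException(path);
            e.initCause(fnf);
            throw e;
        } catch (IOException ioe) {
            // e.g. "Permission denied" becomes "/tmp/.java_pid123: Permission denied"
            throw new IOException(path + ": " + ioe.getMessage(), ioe);
        }
    }
}
```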
>>>>>>>>
>>>>>>> Thanks David. So if EPERM is the error and a retry 100ms later
>>>>>>> works, I think that supports my hypothesis that the file is not
>>>>>>> quite fully created. So Gary's fix is probably fine. The only
>>>>>>> other possible fix I can think of that wouldn't require an
>>>>>>> explicit delay (or multiple retries) is probably not worth the
>>>>>>> complexity. It would require that the attachee create two files,
>>>>>>> and the attacher try to open the second file first. When it
>>>>>>> either opens or returns EPERM, you know the first file can
>>>>>>> safely be opened.
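Chris's two-file idea, sketched under loud assumptions: the `.ready` marker name and both helpers are hypothetical, a plain file stands in for the door, and a POSIX file system is assumed. The attachee fully creates and secures the door file before publishing the marker. The attacher treats the marker's existence as the signal that the door file is complete. In Chris's terms, the marker either opens or fails with a permission error, but not with ENOENT. Whether another process observes the two creates in this order depends on the file system; the sketch shows only the protocol.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;

// Hypothetical two-file handshake; not part of the real attach protocol.
public class TwoFileHandshake {
    static Path marker(Path door) {
        return door.resolveSibling(door.getFileName() + ".ready");
    }

    // Attachee: create and secure the door file first, then publish the marker.
    static void publish(Path door) throws IOException {
        Files.createFile(door);
        Files.setPosixFilePermissions(door, PosixFilePermissions.fromString("rw-------"));
        Files.createFile(marker(door));
    }

    // Attacher: only touch the door file once the marker is visible.
    static boolean readyToOpen(Path door) {
        return Files.exists(marker(door));
    }
}
```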
>>>>>>>
>>>>>>> Chris
>>>>>>>> David
>>>>>>>> -----
>>>>>>>>
>>>>>>>>> Chris
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 10/2/18 4:11 PM, Chris Plummer wrote:
>>>>>>>>>>> Can you summarize how the attach handshaking is supposed to
>>>>>>>>>>> work? I'm just wondering why the attacher would ever be
>>>>>>>>>>> looking for the file before the attachee has created it. It
>>>>>>>>>>> seems a proper handshake would prevent this. Maybe there's
>>>>>>>>>>> some sort of visibility issue where the attachee has indeed
>>>>>>>>>>> created the file, but it is not immediately visible to the
>>>>>>>>>>> attacher process.
>>>>>>>>>>>
>>>>>>>>>>> Chris
>>>>>>>>>>>
>>>>>>>>>>> On 10/2/18 12:27 PM, gary.adams at oracle.com wrote:
>>>>>>>>>>>> The problem reproduced pretty quickly.
>>>>>>>>>>>> I added a call to checkPermissions, which revealed the
>>>>>>>>>>>> "file not found" from the stat call when the IOException
>>>>>>>>>>>> was detected.
>>>>>>>>>>>>
>>>>>>>>>>>> There has been some flakiness from the Solaris test
>>>>>>>>>>>> machines today,
>>>>>>>>>>>> so I'll continue with the testing a bit longer.
>>>>>>>>>>>>
>>>>>>>>>>>> On 10/2/18 3:12 PM, Chris Plummer wrote:
>>>>>>>>>>>>> Without the fix was this issue easy enough to reproduce
>>>>>>>>>>>>> that you can be sure this is resolving it?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Chris
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 10/2/18 8:16 AM, Gary Adams wrote:
>>>>>>>>>>>>>> Solaris debug builds are failing tests that use the
>>>>>>>>>>>>>> attach interface.
>>>>>>>>>>>>>> An IOException is reported when the java_pid file cannot be
>>>>>>>>>>>>>> opened.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It appears that the attempt to attach is taking place too
>>>>>>>>>>>>>> quickly.
>>>>>>>>>>>>>> This workaround will allow the open operation to be retried
>>>>>>>>>>>>>> after a short pause.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Webrev: http://cr.openjdk.java.net/~gadams/8210337/webrev/
>>>>>>>>>>>>>> Issue: https://bugs.openjdk.java.net/browse/JDK-8210337
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Testing is in progress.
More information about the serviceability-dev mailing list