From Alan.Bateman at oracle.com Thu Jan 2 08:18:01 2014
From: Alan.Bateman at oracle.com (Alan Bateman)
Date: Thu, 02 Jan 2014 16:18:01 +0000
Subject: 8031113: TEST_BUG: java/nio/channels/AsynchronousChannelGroup/Basic.java fails intermittently
Message-ID: <52C59139.3080707@oracle.com>

The test uses a relatively short timeout (3 seconds) when checking that a channel group has terminated. It turns out that this timeout is insufficient on some machines, so I'd like to bump it up to 20 seconds (which should be more than ample). There is another case in GroupOfOne.java, so I've bumped that one up too. Also, these tests were originally written when interruptible I/O was enabled on Solaris. This is no longer the case, so there isn't any need for the special @run command for these tests.

The webrev with the changes is here:

http://cr.openjdk.java.net/~alanb/8031113/webrev/

Thanks,
Alan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/nio-dev/attachments/20140102/1387dbf9/attachment.html

From Alan.Bateman at oracle.com Fri Jan 3 04:41:58 2014
From: Alan.Bateman at oracle.com (Alan Bateman)
Date: Fri, 03 Jan 2014 12:41:58 +0000
Subject: 8029018: (bf) Check src/share/native/java/nio/Bits.c for JNI pending exceptions
Message-ID: <52C6B016.6010400@oracle.com>

This is a trivial update to Bits.c to squash a warning about potential JNI usage while there is a pending exception. This is the code that is used when copying between direct buffers of different endianness. The warning is that GetPrimitiveArrayCritical could potentially fail (with a pending exception), although in the case of HotSpot I don't think this is possible, so this change is really just to squash the warning.

The webrev with the change is here:

http://cr.openjdk.java.net/~alanb/8029018/webrev/

-Alan.
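For context, the swap-copy path in Bits.c described above is reachable from pure Java whenever a bulk transfer is done against a direct buffer whose byte order is not the platform's native order. The sketch below is illustrative only: the class name is invented, and it assumes the transfer is large enough to take the bulk native path on HotSpot rather than the element-by-element loop.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;
import java.util.Arrays;

public class SwapCopy {
    public static void main(String[] args) {
        // Pick the non-native order so the int view below is a "swapped" buffer
        ByteOrder nonNative = (ByteOrder.nativeOrder() == ByteOrder.BIG_ENDIAN)
                ? ByteOrder.LITTLE_ENDIAN : ByteOrder.BIG_ENDIAN;

        int[] src = new int[16];
        for (int i = 0; i < src.length; i++) src[i] = i * i;

        // Bulk put/get between a heap int[] and a swapped direct buffer is
        // where the native copy helpers in Bits.c (guarded by GETCRITICAL)
        // come into play on HotSpot, once the transfer exceeds a small threshold
        IntBuffer view = ByteBuffer.allocateDirect(src.length * 4)
                                   .order(nonNative)
                                   .asIntBuffer();
        view.put(src);
        view.flip();

        int[] dst = new int[src.length];
        view.get(dst);
        System.out.println(Arrays.equals(src, dst)); // round trip preserves values
    }
}
```

The byte-swapping is invisible at this level; it only shows up as which native helper the bulk copy dispatches to, which is exactly why a silent failure in that helper would be hard to spot.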
From chris.hegarty at oracle.com Fri Jan 3 05:33:37 2014
From: chris.hegarty at oracle.com (Chris Hegarty)
Date: Fri, 3 Jan 2014 13:33:37 +0000
Subject: 8029018: (bf) Check src/share/native/java/nio/Bits.c for JNI pending exceptions
In-Reply-To: <52C6B016.6010400@oracle.com>
References: <52C6B016.6010400@oracle.com>
Message-ID: 

The changes look fine to me.

I assume that the hotspot VM will not attempt to allocate any memory and will always return a pointer to the array. So as you say, this is really just to squash the warning, but would it not be better to continue to throw the Error AND then return, rather than just returning?

-Chris.

On 3 Jan 2014, at 12:41, Alan Bateman wrote:

> 
> This is a trivial update to Bits.c to squash a warning about potential JNI usage while there is a pending exception. This is the code that is used when copying between direct buffers of different endianness. The warning is that GetPrimitiveArrayCritical could potentially fail (with a pending exception), although in the case of HotSpot I don't think this is possible, so this change is really just to squash the warning.
> 
> The webrev with the change is here:
> 
> http://cr.openjdk.java.net/~alanb/8029018/webrev/
> 
> -Alan.

From chris.hegarty at oracle.com Fri Jan 3 05:36:18 2014
From: chris.hegarty at oracle.com (Chris Hegarty)
Date: Fri, 3 Jan 2014 13:36:18 +0000
Subject: 8029018: (bf) Check src/share/native/java/nio/Bits.c for JNI pending exceptions
In-Reply-To: 
References: <52C6B016.6010400@oracle.com>
Message-ID: <52F740AF-8822-4038-B8AF-A7FB03EEF1FD@oracle.com>

On 3 Jan 2014, at 13:33, Chris Hegarty wrote:

> The changes look fine to me.
> 
> I assume that the hotspot VM will not attempt to allocate any memory and will always return a pointer to the array. So as you say, this is really just to squash the warning, but would it not be better to continue to throw the Error AND then return, rather than just returning?
D'oh, you are assuming that on other VMs there will be a pending exception if GetPrimitiveArrayCritical returns NULL. In which case, you could check for a pending exception and set one, if not already set, before returning. Or maybe this is just not worth it.

-Chris.

> 
> -Chris.
> 
> On 3 Jan 2014, at 12:41, Alan Bateman wrote:
> 
>> 
>> This is a trivial update to Bits.c to squash a warning about potential JNI usage while there is a pending exception. This is the code that is used when copying between direct buffers of different endianness. The warning is that GetPrimitiveArrayCritical could potentially fail (with a pending exception), although in the case of HotSpot I don't think this is possible, so this change is really just to squash the warning.
>> 
>> The webrev with the change is here:
>> 
>> http://cr.openjdk.java.net/~alanb/8029018/webrev/
>> 
>> -Alan.
> 

From Alan.Bateman at oracle.com Fri Jan 3 05:51:41 2014
From: Alan.Bateman at oracle.com (Alan Bateman)
Date: Fri, 03 Jan 2014 13:51:41 +0000
Subject: 8029018: (bf) Check src/share/native/java/nio/Bits.c for JNI pending exceptions
In-Reply-To: <52F740AF-8822-4038-B8AF-A7FB03EEF1FD@oracle.com>
References: <52C6B016.6010400@oracle.com> <52F740AF-8822-4038-B8AF-A7FB03EEF1FD@oracle.com>
Message-ID: <52C6C06D.9040108@oracle.com>

On 03/01/2014 13:36, Chris Hegarty wrote:
> :
> D'oh, you are assuming that on other VMs there will be a pending exception if GetPrimitiveArrayCritical returns NULL. In which case, you could check for a pending exception and set one, if not already set, before returning. Or maybe this is just not worth it.
> 
Right, if GetPrimitiveArrayCritical were to return NULL (and I don't think it can in the HotSpot implementation) then it would do so with a pending exception. Checking for the exception is a good idea as it would eliminate any doubt around this.
That would give us:

-#define GETCRITICAL(bytes, env, obj) { \
+#define GETCRITICAL_OR_RETURN(bytes, env, obj) { \
     bytes = (*env)->GetPrimitiveArrayCritical(env, obj, NULL); \
-    if (bytes == NULL) \
-        JNU_ThrowInternalError(env, "Unable to get array"); \
+    if (bytes == NULL) { \
+        if ((*env)->ExceptionOccurred(env) == NULL) \
+            JNU_ThrowInternalError(env, "Unable to get array"); \
+        return; \
+    } \
 }

-Alan.

From chris.hegarty at oracle.com Fri Jan 3 06:09:45 2014
From: chris.hegarty at oracle.com (Chris Hegarty)
Date: Fri, 3 Jan 2014 14:09:45 +0000
Subject: 8029018: (bf) Check src/share/native/java/nio/Bits.c for JNI pending exceptions
In-Reply-To: <52C6C06D.9040108@oracle.com>
References: <52C6B016.6010400@oracle.com> <52F740AF-8822-4038-B8AF-A7FB03EEF1FD@oracle.com> <52C6C06D.9040108@oracle.com>
Message-ID: 

On 3 Jan 2014, at 13:51, Alan Bateman wrote:

> On 03/01/2014 13:36, Chris Hegarty wrote:
>> :
>> D'oh, you are assuming that on other VMs there will be a pending exception if GetPrimitiveArrayCritical returns NULL. In which case, you could check for a pending exception and set one, if not already set, before returning. Or maybe this is just not worth it.
>> 
> Right, if GetPrimitiveArrayCritical were to return NULL (and I don't think it can in the HotSpot implementation) then it would do so with a pending exception. Checking for the exception is a good idea as it would eliminate any doubt around this. That would give us:
> 
> -#define GETCRITICAL(bytes, env, obj) { \
> +#define GETCRITICAL_OR_RETURN(bytes, env, obj) { \
>      bytes = (*env)->GetPrimitiveArrayCritical(env, obj, NULL); \
> -    if (bytes == NULL) \
> -        JNU_ThrowInternalError(env, "Unable to get array"); \
> +    if (bytes == NULL) { \
> +        if ((*env)->ExceptionOccurred(env) == NULL) \
> +            JNU_ThrowInternalError(env, "Unable to get array"); \
> +        return; \
> +    } \
>  }

Perfect.

-Chris.

> 
> -Alan.
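Returning briefly to the AsynchronousChannelGroup timeout thread at the top of this digest: the termination check that the test performs can be sketched stand-alone as follows. This is an illustrative snippet, not the test's code; the class name is invented. With no channels bound to the group, termination completes almost immediately, so the 20-second bound is purely a safety margin for slow machines.

```java
import java.nio.channels.AsynchronousChannelGroup;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class GroupTermination {
    public static void main(String[] args) throws Exception {
        // A small channel group backed by a fixed thread pool
        AsynchronousChannelGroup group = AsynchronousChannelGroup
                .withFixedThreadPool(2, Executors.defaultThreadFactory());

        // Begin an orderly shutdown; with no open channels in the group
        // it terminates as soon as the pool threads exit
        group.shutdown();

        // A 3-second timeout here proved insufficient on some machines;
        // a generous bound such as 20 seconds leaves ample headroom
        boolean terminated = group.awaitTermination(20, TimeUnit.SECONDS);
        System.out.println("terminated = " + terminated);
    }
}
```

The point of the fix under review is exactly this asymmetry: a timeout that is too generous costs nothing when the group terminates promptly, while one that is too tight turns machine load into spurious test failures.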
From chris.hegarty at oracle.com Fri Jan 3 07:47:43 2014
From: chris.hegarty at oracle.com (Chris Hegarty)
Date: Fri, 3 Jan 2014 15:47:43 +0000
Subject: 8031113: TEST_BUG: java/nio/channels/AsynchronousChannelGroup/Basic.java fails intermittently
In-Reply-To: <52C59139.3080707@oracle.com>
References: <52C59139.3080707@oracle.com>
Message-ID: <283B0128-D29A-4076-BB46-53CC95AA85FA@oracle.com>

The changes look good to me.

-Chris.

On 2 Jan 2014, at 16:18, Alan Bateman wrote:

> 
> The test uses a relatively short timeout (3 seconds) when checking that a channel group has terminated. It turns out that this timeout is insufficient on some machines, so I'd like to bump it up to 20 seconds (which should be more than ample). There is another case in GroupOfOne.java, so I've bumped that one up too. Also, these tests were originally written when interruptible I/O was enabled on Solaris. This is no longer the case, so there isn't any need for the special @run command for these tests. The webrev with the changes is here:
> 
> http://cr.openjdk.java.net/~alanb/8031113/webrev/
> 
> Thanks,
> Alan

From jwha at google.com Thu Jan 9 16:43:54 2014
From: jwha at google.com (Jungwoo Ha)
Date: Thu, 9 Jan 2014 16:43:54 -0800
Subject: Performance regression on jdk7u25 vs jdk7u40 due to EPollArrayWrapper.eventsLow
Message-ID: 

Hi,

I found a performance issue on the DaCapo tradesoap benchmark.

*Commandline*
$ java -XX:+UseConcMarkSweepGC -Xmx76m -jar dacapo-9.12-bach.jar tradesoap -n 7

76MB is 2 times the minimum heap size requirement of tradesoap, i.e., tradesoap can run on 38MB but not less.
I measure the last iteration (steady-state performance).

*Execution time on the last iteration*
7u25: 17910ms
7u40: 21263ms

So I compared the GC behavior using -XX:+PrintGCDetails, and noticed that 7u40 executed far more concurrent mode failures:
7u25: 2 Full GC, 60 concurrent-mode-failure
7u40: 9 Full GC, 70 concurrent-mode-failure
and this is the cause of the slowdown.
Looking at the GC log, I noticed that 7u40 uses more memory.
7u25 : [Full GC .... (concurrent mode failure): 48145K->*42452K*(51904K), 0.2212080 secs]
7u40 : [Full GC .... (concurrent mode failure): 47923K->*44672K*(51904K), 0.2138640 secs]

After the Full GC, 7u40 has 2.2MB more live objects. This is always repeatable.

So I got the heapdump of live objects and found that the most noticeable difference is the byte[] of *EPollArrayWrapper.eventsLow.*
I think this field was added in 7u40 and was occupying 122 instances * 32K = 3.8MB.

Here goes my question.
1) How is the # of instances of this type expected to grow on large heap sizes? How does it correlate to the network usage of typical server applications?
2) Is there a way to reduce the memory?

Thanks,
Jungwoo
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/nio-dev/attachments/20140109/8c584234/attachment.html

From vitalyd at gmail.com Thu Jan 9 18:20:37 2014
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Thu, 9 Jan 2014 21:20:37 -0500
Subject: Performance regression on jdk7u25 vs jdk7u40 due to EPollArrayWrapper.eventsLow
In-Reply-To: 
References: 
Message-ID: 

Having 122 instances of the epollarraywrapper seems odd - that's basically 122 selectors monitoring connections. Typically you'd have just one selector and thus one epollarraywrapper. I'm not familiar with tradesoap so don't know what it's doing internally.

One could probably slim down epollarraywrapper a bit, but I think the reason the eventsLow[] is pre-allocated with a large value is probably because the process is expected to have just one or a few of them.

Sent from my phone
On Jan 9, 2014 7:44 PM, "Jungwoo Ha" wrote:

> Hi,
>
> I found a performance issue on the DaCapo tradesoap benchmark.
>
> *Commandline*
> $ java -XX:+UseConcMarkSweepGC -Xmx76m -jar dacapo-9.12-bach.jar tradesoap -n 7
>
> 76MB is 2 times the minimum heap size requirement of tradesoap, i.e., tradesoap can run on 38MB but not less.
> I measure the last iteration (steady-state performance).
>
> *Execution time on the last iteration*
> 7u25: 17910ms
> 7u40: 21263ms
>
> So I compared the GC behavior using -XX:+PrintGCDetails, and noticed that 7u40 executed far more concurrent mode failures:
> 7u25: 2 Full GC, 60 concurrent-mode-failure
> 7u40: 9 Full GC, 70 concurrent-mode-failure
> and this is the cause of the slowdown.
>
> Looking at the GC log, I noticed that 7u40 uses more memory.
> 7u25 : [Full GC .... (concurrent mode failure): 48145K->*42452K*(51904K), 0.2212080 secs]
> 7u40 : [Full GC .... (concurrent mode failure): 47923K->*44672K*(51904K), 0.2138640 secs]
>
> After the Full GC, 7u40 has 2.2MB more live objects. This is always repeatable.
>
> So I got the heapdump of live objects and found that the most noticeable difference is the byte[] of *EPollArrayWrapper.eventsLow.*
> I think this field was added in 7u40 and was occupying 122 instances * 32K = 3.8MB.
>
> Here goes my question.
> 1) How is the # of instances of this type expected to grow on large heap sizes? How does it correlate to the network usage of typical server applications?
> 2) Is there a way to reduce the memory?
>
> Thanks,
> Jungwoo
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/nio-dev/attachments/20140109/3ee7005e/attachment.html

From jwha at google.com Thu Jan 9 23:14:20 2014
From: jwha at google.com (Jungwoo Ha)
Date: Thu, 9 Jan 2014 23:14:20 -0800
Subject: Performance regression on jdk7u25 vs jdk7u40 due to EPollArrayWrapper.eventsLow
In-Reply-To: 
References: 
Message-ID: 

tradesoap is a benchmark in the DaCapo suite (http://dacapobench.org). It is one of the popular public benchmarks used in both industry and academia.
What are the scenarios that EPollArrayWrapper has more than one instance? On Thu, Jan 9, 2014 at 6:20 PM, Vitaly Davidovich wrote: > Having 122 instances of the epollarraywrapper seems odd - that's basically > 122 selectors monitoring connections. Typically you'd have just one > selector and thus one epollarraywrapper. I'm not familiar with tradesoap > so don't know what it's doing internally. > > One could probably slim down epollarraywrapper a bit but I think the > reason the eventsLow[] is pre allocated with a large value is probably > because it's expected to just have one or a few of them in the process. > > Sent from my phone > On Jan 9, 2014 7:44 PM, "Jungwoo Ha" wrote: > >> Hi, >> >> I found a performance issues on DaCapo tradesoap benchmark. >> >> *Commandline* >> $ java -XX:+UseConcMarkSweepGC -Xmx76m -jar dacapo-9.12-bach.jar >> tradesoap -n 7 >> >> 76MB is 2 times of minimum heap size requirement on tradesoap, i.e., >> tradesoap can run on 38MB but not less. >> Measure the last iteration (steady state performance) >> >> *Execution time on the last iteration* >> 7u25: 17910ms >> 7u40: 21263ms >> >> So I compared the GC behavior using -XX:+PrintGCDetails, and noticed that >> 7u40 executed far more concurrent-mode-failure. >> 7u25: 2 Full GC, 60 concurrent-mode-failure >> 7u40: 9 Full GC, 70 concurrent-mode-failure >> and this is the cause of slowdown. >> >> Looking at the GC log, I noticed that 7u40 uses more memory. >> 7u25 : [Full GC .... (concurrent mode failure): 48145K->*42452K*(51904K), >> 0.2212080 secs] >> 7u40 : [Full GC .... (concurrent mode failure): 47923K->*44672K*(51904K), >> 0.2138640 secs] >> >> After the Full GC, 7u40 has 2.2MB more live objects. This is always >> repeatable. >> >> So I got the heapdump of live objects and found that the most noticeable >> difference is the byte[] of *EPollArrayWrapper.eventsLow.* >> I think this field is added on 7u40 and was occupying 122 instances * 32K >> = 3.8MB. >> >> Here goes my question. 
>> 1) How is the # of instances of this type expected to grow on large heap sizes?
>> How does it correlate to the network usage of typical server applications?
>> 2) Is there a way to reduce the memory?
>>
>> Thanks,
>> Jungwoo
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/nio-dev/attachments/20140109/9355606b/attachment.html

From Alan.Bateman at oracle.com Fri Jan 10 01:07:14 2014
From: Alan.Bateman at oracle.com (Alan Bateman)
Date: Fri, 10 Jan 2014 09:07:14 +0000
Subject: Performance regression on jdk7u25 vs jdk7u40 due to EPollArrayWrapper.eventsLow
In-Reply-To: 
References: 
Message-ID: <52CFB842.3050705@oracle.com>

On 10/01/2014 00:43, Jungwoo Ha wrote:
> Hi,
>
> I found a performance issue on the DaCapo tradesoap benchmark.

Thanks for the mail and the analysis. As Vitaly mentioned, the number of Selectors is typically small. However, I will add that sometimes you can encounter cases where there are a lot of "temporary Selectors", basically short lived and only used to select on a single channel. One of the main causes of temporary Selectors used to be the "socket adapters", basically the Socket you get when you invoke a SocketChannel's socket method. In JDK 7 and older, doing a timed connect or timed read with one of these Sockets involves the use of a temporary Selector. Temporary Selectors are cached (and so are long lived), so I wouldn't expect to see them allocated very often, unless of course tradesoap or some library that is used is doing this.

Do you have the environment/motivation to repeat the exercise with JDK 8 to see if there is a difference? JDK 8 eliminates the use of temporary Selectors and so might quickly answer the question as to whether the issue is temporary Selectors or not.

As regards the size of eventsLow (from your analysis), there is an undocumented/unsupported system property that you could use in experiments.
The property is sun.nio.ch.maxUpdateArraySize and it defaults to the file descriptor limit or 64k (whichever is smaller). From your mail then I will guess that you might have the hard limit set to unlimited (or maybe 64k). So you would run with -Dsun.nio.ch.maxUpdateArraySize=8192 for example, just don't set it to a value larger than the hard limit. -Alan. From martinrb at google.com Fri Jan 10 08:37:10 2014 From: martinrb at google.com (Martin Buchholz) Date: Fri, 10 Jan 2014 08:37:10 -0800 Subject: Performance regression on jdk7u25 vs jdk7u40 due to EPollArrayWrapper.eventsLow In-Reply-To: References: Message-ID: I took a look at EPollArrayWrapper, it's basically implementing a map int -> byte by combining a byte array for "small" integers and a HashMap for large ones. The 64k byte array does look like it may be spending too much memory for the performance gain - typical java memory bloat. In the common case file descriptors will be "small". One simple approach to economizing in the common case is to initialize the byte array eventsLow to a much smaller size, and grow it if a sufficiently large file descriptor is encountered. In fact, looking closer, you already have a data structure here that works that way - BitSet registered is a map int -> boolean that grows only up to the max registered fd. The jdk doesn't have a ByteSet, but it seems that's what we want here. It's not too painful to roll our own. A lock is already held whenever accessing any of the internal data here. Minor things to fix in EPollArrayWrapper: // maximum size of updatesLow comment is wrong: s/updatesLow/eventsLow/ -- short events = getUpdateEvents(fd); Using short here is really WEIRD. Either leave it as a byte or promote to int. --- private static final byte KILLED = (byte)-1; Remove stray SPACE. On Thu, Jan 9, 2014 at 6:20 PM, Vitaly Davidovich wrote: > Having 122 instances of the epollarraywrapper seems odd - that's basically > 122 selectors monitoring connections. 
Typically you'd have just one > selector and thus one epollarraywrapper. I'm not familiar with tradesoap > so don't know what it's doing internally. > > One could probably slim down epollarraywrapper a bit but I think the > reason the eventsLow[] is pre allocated with a large value is probably > because it's expected to just have one or a few of them in the process. > > Sent from my phone > On Jan 9, 2014 7:44 PM, "Jungwoo Ha" wrote: > >> Hi, >> >> I found a performance issues on DaCapo tradesoap benchmark. >> >> *Commandline* >> $ java -XX:+UseConcMarkSweepGC -Xmx76m -jar dacapo-9.12-bach.jar >> tradesoap -n 7 >> >> 76MB is 2 times of minimum heap size requirement on tradesoap, i.e., >> tradesoap can run on 38MB but not less. >> Measure the last iteration (steady state performance) >> >> *Execution time on the last iteration* >> 7u25: 17910ms >> 7u40: 21263ms >> >> So I compared the GC behavior using -XX:+PrintGCDetails, and noticed that >> 7u40 executed far more concurrent-mode-failure. >> 7u25: 2 Full GC, 60 concurrent-mode-failure >> 7u40: 9 Full GC, 70 concurrent-mode-failure >> and this is the cause of slowdown. >> >> Looking at the GC log, I noticed that 7u40 uses more memory. >> 7u25 : [Full GC .... (concurrent mode failure): 48145K->*42452K*(51904K), >> 0.2212080 secs] >> 7u40 : [Full GC .... (concurrent mode failure): 47923K->*44672K*(51904K), >> 0.2138640 secs] >> >> After the Full GC, 7u40 has 2.2MB more live objects. This is always >> repeatable. >> >> So I got the heapdump of live objects and found that the most noticeable >> difference is the byte[] of *EPollArrayWrapper.eventsLow.* >> I think this field is added on 7u40 and was occupying 122 instances * 32K >> = 3.8MB. >> >> Here goes my question. >> 1) How are the # of instances of this type expected grow on large heap >> size? >> How does it correlate to the network usage or typical server >> applications? >> 2) Is there a way to reduce the memory? 
>> >> Thanks, >> Jungwoo >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/nio-dev/attachments/20140110/2f62a38f/attachment-0001.html From vitalyd at gmail.com Fri Jan 10 09:04:10 2014 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Fri, 10 Jan 2014 12:04:10 -0500 Subject: Performance regression on jdk7u25 vs jdk7u40 due to EPollArrayWrapper.eventsLow In-Reply-To: References: Message-ID: You can also just do byte[][] chunks up to 64k total, allocating new ones only when necessary. Empty slots are word size waste so very minimal. However, I'm guessing this class isn't optimized for space because you're not supposed to have many of these, never mind >100 as in tradesoap. Sent from my phone On Jan 10, 2014 11:37 AM, "Martin Buchholz" wrote: > I took a look at EPollArrayWrapper, it's basically implementing a map int > -> byte by combining a byte array for "small" integers and a HashMap for > large ones. The 64k byte array does look like it may be spending too much > memory for the performance gain - typical java memory bloat. In the > common case file descriptors will be "small". > > One simple approach to economizing in the common case is to initialize > the byte array eventsLow to a much smaller size, and grow it if a > sufficiently large file descriptor is encountered. In fact, looking > closer, you already have a data structure here that works that way - BitSet > registered is a map int -> boolean that grows only up to the max registered > fd. The jdk doesn't have a ByteSet, but it seems that's what we want here. > It's not too painful to roll our own. A lock is already held whenever > accessing any of the internal data here. > > Minor things to fix in EPollArrayWrapper: > > // maximum size of updatesLow > > comment is wrong: s/updatesLow/eventsLow/ > > -- > > short events = getUpdateEvents(fd); > > Using short here is really WEIRD. Either leave it as a byte or promote to > int. 
> > --- > > private static final byte KILLED = (byte)-1; > > Remove stray SPACE. > > > > > On Thu, Jan 9, 2014 at 6:20 PM, Vitaly Davidovich wrote: > >> Having 122 instances of the epollarraywrapper seems odd - that's >> basically 122 selectors monitoring connections. Typically you'd have just >> one selector and thus one epollarraywrapper. I'm not familiar with >> tradesoap so don't know what it's doing internally. >> >> One could probably slim down epollarraywrapper a bit but I think the >> reason the eventsLow[] is pre allocated with a large value is probably >> because it's expected to just have one or a few of them in the process. >> >> Sent from my phone >> On Jan 9, 2014 7:44 PM, "Jungwoo Ha" wrote: >> >>> Hi, >>> >>> I found a performance issues on DaCapo tradesoap benchmark. >>> >>> *Commandline* >>> $ java -XX:+UseConcMarkSweepGC -Xmx76m -jar dacapo-9.12-bach.jar >>> tradesoap -n 7 >>> >>> 76MB is 2 times of minimum heap size requirement on tradesoap, i.e., >>> tradesoap can run on 38MB but not less. >>> Measure the last iteration (steady state performance) >>> >>> *Execution time on the last iteration* >>> 7u25: 17910ms >>> 7u40: 21263ms >>> >>> So I compared the GC behavior using -XX:+PrintGCDetails, and noticed >>> that 7u40 executed far more concurrent-mode-failure. >>> 7u25: 2 Full GC, 60 concurrent-mode-failure >>> 7u40: 9 Full GC, 70 concurrent-mode-failure >>> and this is the cause of slowdown. >>> >>> Looking at the GC log, I noticed that 7u40 uses more memory. >>> 7u25 : [Full GC .... (concurrent mode failure): 48145K->*42452K*(51904K), >>> 0.2212080 secs] >>> 7u40 : [Full GC .... (concurrent mode failure): 47923K->*44672K*(51904K), >>> 0.2138640 secs] >>> >>> After the Full GC, 7u40 has 2.2MB more live objects. This is always >>> repeatable. 
>>>
>>> So I got the heapdump of live objects and found that the most noticeable difference is the byte[] of *EPollArrayWrapper.eventsLow.*
>>> I think this field was added in 7u40 and was occupying 122 instances * 32K = 3.8MB.
>>>
>>> Here goes my question.
>>> 1) How is the # of instances of this type expected to grow on large heap sizes?
>>> How does it correlate to the network usage of typical server applications?
>>> 2) Is there a way to reduce the memory?
>>>
>>> Thanks,
>>> Jungwoo
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/nio-dev/attachments/20140110/513a8444/attachment.html

From srikalyan.chandrashekar at oracle.com Fri Jan 10 11:11:58 2014
From: srikalyan.chandrashekar at oracle.com (srikalyan)
Date: Fri, 10 Jan 2014 11:11:58 -0800
Subject: RFR for JDK-6963118 Intermittent test failure: test/java/nio/channels/Selector/Wakeup.java fail intermittently (win)
In-Reply-To: <52C1BFBA.1050808@oracle.com>
References: <52B0ABBE.8050903@oracle.com> <52B33555.2090002@oracle.com> <52B3461C.1080903@oracle.com> <52B4711A.90701@oracle.com> <52C18204.3010402@oracle.com> <52C1BFBA.1050808@oracle.com>
Message-ID: <52D045FE.7060106@oracle.com>

Hi Alan, please find the new webrev at http://cr.openjdk.java.net/~srikchan/Regression/6963118-Wakeup-webrev-V2/ ; this includes the refactoring you did + timing adjustments. Ideally, with the fix, the test should never fail, which holds true on all platforms except Windows (though failures reduced to 2/1000 as opposed to 10/1000, per our experiments).

Possible root cause:
There are at least 2 places in the Windows-specific Selector implementation (WindowsSelectorImpl) that have wakeup() calls; this is not the case in the Solaris/Linux implementations (DevPollSelectorImpl and EPollSelectorImpl).
It would be helpful if someone from the NIO team could explain when the wakeup() calls in these 2 places are invoked, which I suspect could cause a spurious wakeup from select (during the double-wakeup phase of test/java/nio/channels/Selector/Wakeup.java on Windows). Please see the attachment in JDK-6963118.

---
Thanks
kalyan

On 12/30/2013 10:47 AM, srikalyan wrote:
> Hi Alan, sorry for the delay. I have been trying to incorporate sleep and make the code succeed, but am getting intermittent failures. The reason is that it is extremely difficult to synchronize the Sleeper and Checker (main) threads, as there is no way the Sleeper could communicate to the Checker that it is successfully blocked on the select() call. At most we can mitigate by bringing both threads to synchronize at a certain point and then making the Checker wait for some long time to allow the Sleeper to march into and wait at the select() call. I will do some repeated runs of your version as well (meanwhile trying other ways) and let you know before we flag ok on this.
>
> --
> Thanks
> kalyan
> Ph: (408)-585-8040
>
>
> On 12/30/13, 6:24 AM, Alan Bateman wrote:
>>
>> I didn't see any more on this one but I think it would be good to get it fixed and removed from the exclude list. Here's a slightly modified fix that might be a bit clearer:
>>
>> http://cr.openjdk.java.net/~alanb/6963118/webrev/
>>
>> If we can agree on this then I'll get it into jdk9/dev.
>>
>> -Alan.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/nio-dev/attachments/20140110/bf8e5e2e/attachment.html

From Alan.Bateman at oracle.com Fri Jan 10 11:43:24 2014
From: Alan.Bateman at oracle.com (Alan Bateman)
Date: Fri, 10 Jan 2014 19:43:24 +0000
Subject: RFR for JDK-6963118 Intermittent test failure: test/java/nio/channels/Selector/Wakeup.java fail intermittently (win)
In-Reply-To: <52D045FE.7060106@oracle.com>
References: <52B0ABBE.8050903@oracle.com> <52B33555.2090002@oracle.com> <52B3461C.1080903@oracle.com> <52B4711A.90701@oracle.com> <52C18204.3010402@oracle.com> <52C1BFBA.1050808@oracle.com> <52D045FE.7060106@oracle.com>
Message-ID: <52D04D5C.20404@oracle.com>

On 10/01/2014 19:11, srikalyan wrote:
> Hi Alan, please find the new webrev at http://cr.openjdk.java.net/~srikchan/Regression/6963118-Wakeup-webrev-V2/ ; this includes the refactoring you did + timing adjustments. Ideally, with the fix, the test should never fail, which holds true on all platforms except Windows (though failures reduced to 2/1000 as opposed to 10/1000, per our experiments).

Thanks for checking. So if it's still failing 1 in 500 times then we'll need to leave it excluded on Windows until we've solved the underlying issue.

>
> Possible root cause:
> There are at least 2 places in the Windows-specific Selector implementation (WindowsSelectorImpl) that have wakeup() calls; this is not the case in the Solaris/Linux implementations (DevPollSelectorImpl and EPollSelectorImpl). It would be helpful if someone from the NIO team could explain when the wakeup() calls in these 2 places are invoked, which I suspect could cause a spurious wakeup from select (during the double-wakeup phase of test/java/nio/channels/Selector/Wakeup.java on Windows).

Okay, I will try to look at this soon.

-Alan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/nio-dev/attachments/20140110/49b5dff6/attachment.html

From Alan.Bateman at oracle.com Fri Jan 10 13:09:32 2014
From: Alan.Bateman at oracle.com (Alan Bateman)
Date: Fri, 10 Jan 2014 21:09:32 +0000
Subject: Performance regression on jdk7u25 vs jdk7u40 due to EPollArrayWrapper.eventsLow
In-Reply-To: 
References: 
Message-ID: <52D0618C.4080002@oracle.com>

On 10/01/2014 16:37, Martin Buchholz wrote:
> I took a look at EPollArrayWrapper, it's basically implementing a map int -> byte by combining a byte array for "small" integers and a HashMap for large ones. The 64k byte array does look like it may be spending too much memory for the performance gain - typical java memory bloat. In the common case file descriptors will be "small".

These are the data structures for queuing up the updates to epoll. The original assumption was that if the file descriptor limit is small then the updates could be directly addressed in eventsLow. When the file descriptor limit is very high or unlimited, anything above the threshold goes into the fd->event map. Looking at it now, maybe the threshold is too high (meaning the maximum size of eventsLow is too big), but at least it is configurable and we'll see if the original poster comes back with any results from tuning that. As you noted, this is self-contained so it can easily be changed if we need to (but typically the number of Selectors is small, so it hasn't been an issue).

-Alan

From Alan.Bateman at oracle.com Fri Jan 17 03:44:48 2014
From: Alan.Bateman at oracle.com (Alan Bateman)
Date: Fri, 17 Jan 2014 11:44:48 +0000
Subject: RFR(M): 8031997: PPC64: Make the various POLL constants system dependant
In-Reply-To: 
References: 
Message-ID: <52D917B0.9090407@oracle.com>

On 17/01/2014 11:25, Volker Simonis wrote:
> :
>
> I've compiled and smoke-tested the changes on Linux/x86_64/PPC64, Windows/x86_64, Solaris/SPARC, MacOS X and AIX.
On all these platforms > they pass all the java/nio JTREG tests in the same way as without > this change. This means that on Linux/MacOS they pass all 261/256 > tests, on Windows, they pass 258 tests while the following two tests > fail: > > java/nio/channels/DatagramChannel/MulticastSendReceiveTests.java > java/nio/channels/DatagramChannel/Promiscuous.java > > But as I wrote before, these two tests also fail without my changes > applied, so I'm confident that the failures aren't related to this > change. Any chance that this is firewall configuration or VPN that might be causing packets to be dropped? These tests should otherwise pass consistently on all platforms. If you have output from Linux or Solaris or OS X that you could send then it might help to diagnose this. -Alan From volker.simonis at gmail.com Fri Jan 17 05:45:26 2014 From: volker.simonis at gmail.com (Volker Simonis) Date: Fri, 17 Jan 2014 14:45:26 +0100 Subject: RFR(M): 8031997: PPC64: Make the various POLL constants system dependant In-Reply-To: <52D917B0.9090407@oracle.com> References: <52D917B0.9090407@oracle.com> Message-ID: On Fri, Jan 17, 2014 at 12:44 PM, Alan Bateman wrote: > On 17/01/2014 11:25, Volker Simonis wrote: >> >> : >> >> I've compiled and smoke-tested the changes on Linux/x86_64/PPC64, >> Windows/x86_64, Solaris/SPARC, MacOS X and AIX. On all these platforms >> they pass all the java/nio JTREG tests in the same way as without >> this change. This means that on Linux/MacOS they pass all 261/256 >> tests, on Windows, they pass 258 tests while the following two tests >> fail: >> >> java/nio/channels/DatagramChannel/MulticastSendReceiveTests.java >> java/nio/channels/DatagramChannel/Promiscuous.java >> >> But as I wrote before, these two tests also fail without my changes >> applied, so I'm confident that the failures aren't related to this >> change. > > Any chance that this is firewall configuration or VPN that might be causing > packets to be dropped?
These tests should otherwise pass consistently on all > platforms. If you have output from Linux or Solaris or OS X that you could > send then it might help to diagnose this. > Yes, you're right - it was because of a "VirtualBox Host-Only Network" network device which seems to fool the test. After I disabled it, all tests passed successfully! And what about the change itself :) > -Alan From Alan.Bateman at oracle.com Fri Jan 17 07:16:54 2014 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Fri, 17 Jan 2014 15:16:54 +0000 Subject: RFR(M): 8031997: PPC64: Make the various POLL constants system dependant In-Reply-To: References: <52D917B0.9090407@oracle.com> Message-ID: <52D94966.8090407@oracle.com> On 17/01/2014 13:45, Volker Simonis wrote: > : > Yes, you're right - it was because of a "VirtualBox Host-Only Network" > network device which seems to fool the test. After I disabled it, all > tests passed successfully! > > And what about the change itself :) > The change itself looks mostly okay. For naming then I think I would have a slight preference for something like pollinValue to getNativePollin so that it is somewhat consistent with the other places where we do this (like in the epoll code with eventSize, eventsOffset, ...). Naming is subjective of course so this isn't a big issue. I suspect you can drop POLLREMOVE, that is only used in the /dev/poll Selector and it has its own definition (and shouldn't be compiled on anything other than Solaris anyway). A minor comment on DatagramChannelImpl, SourceChannelImpl and a few more where replacing PollArrayWrapper.POLL* with Net.POLL* means it is no longer necessary to split lines (it just might be neater to bring these cases back onto one line again). I suspect we will be able to drop the changes to the Windows nio_util.h soon as these older versions of Windows are not long for this world. I assume that by taking on a newer VC++ it won't even be possible to build or run either. -Alan.
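The pattern under review in this thread can be illustrated with a rough sketch. This is not the actual sun.nio.ch.Net code: the accessor names and the numeric values (the common Linux/x86 poll constants) are assumptions for illustration, and in the real change the accessors are native methods compiled per platform rather than Java stubs:

```java
// Hypothetical sketch of system-dependent poll constants: the values are
// queried once through per-platform accessors and cached in static finals,
// so shared Java code never hard-codes one platform's numeric values.
class PollConstants {
    static final short POLLIN  = pollinValue();
    static final short POLLOUT = polloutValue();
    static final short POLLERR = pollerrValue();

    // In the real implementation these would be native methods, e.g.
    // static native short pollinValue(); compiled against <poll.h>.
    private static short pollinValue()  { return (short) 0x0001; }
    private static short polloutValue() { return (short) 0x0004; }
    private static short pollerrValue() { return (short) 0x0008; }
}
```

Callers then reference the cached constants (Net.POLLIN-style in the actual change), and only the native layer knows the platform's definitions.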
From volker.simonis at gmail.com Fri Jan 17 09:42:26 2014 From: volker.simonis at gmail.com (Volker Simonis) Date: Fri, 17 Jan 2014 18:42:26 +0100 Subject: RFR(M): 8031997: PPC64: Make the various POLL constants system dependant In-Reply-To: <52D94966.8090407@oracle.com> References: <52D917B0.9090407@oracle.com> <52D94966.8090407@oracle.com> Message-ID: On Fri, Jan 17, 2014 at 4:16 PM, Alan Bateman wrote: > On 17/01/2014 13:45, Volker Simonis wrote: >> >> : >> Yes, you're right - it was because of a "VirtualBox Host-Only Network" >> network device which seems to fool the test. After I disabled it, all >> tests passed successfully! >> >> And what about the change itself :) >> > The change itself looks mostly okay. > > For naming then I think I would have a slight preference for something like > pollinValue to getNativePollin so that it is somewhat consistent with the other > places where we do this (like in epoll code with eventSize, eventsOffset, > ...). Naming is subjective of course so this isn't a big issue. > Done. > I suspect you can drop POLLREMOVE, that is only used in the /dev/poll > Selector and it has its own definition (and shouldn't be compiled on > anything other than Solaris anyway). > Removed POLLREMOVE from sun.nio.ch.Net. > A minor comment on DatagramChannelImpl, SourceChannelImpl and a few more > where replacing PollArrayWrapper.POLL* with Net.POLL* means it is no > longer necessary to split lines (just might be neater to bring these cases > back on the one line again). > Done. > I suspect we will be able to drop the changes to the Windows nio_util.h soon > as these older versions of Windows are not long for this world. I assume > that by taking on a newer VC++ it won't even be possible to build or run > either. > We still support Server 2003 :( > -Alan. Here's the new webrev: http://cr.openjdk.java.net/~simonis/webrevs/8031997_2/ Built and tested like before. Everything OK. Is this now ready for push into ppc-aix-port/stage-9?
Regards, Volker PS: I've added you as reviewer, but unfortunately after I created the webrev. From Alan.Bateman at oracle.com Fri Jan 17 13:21:26 2014 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Fri, 17 Jan 2014 21:21:26 +0000 Subject: RFR(M): 8031997: PPC64: Make the various POLL constants system dependant In-Reply-To: References: <52D917B0.9090407@oracle.com> <52D94966.8090407@oracle.com> Message-ID: <52D99ED6.1070806@oracle.com> On 17/01/2014 17:42, Volker Simonis wrote: > : > Here's the new webrev: > > http://cr.openjdk.java.net/~simonis/webrevs/8031997_2/ > > Built and tested like before. Everything OK. > > Is this now ready for push into ppc-aix-port/stage-9? > Thanks for the updates, this looks good to me. -Alan. From Alan.Bateman at oracle.com Mon Jan 20 04:03:45 2014 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Mon, 20 Jan 2014 12:03:45 +0000 Subject: 7133499: (fc) FileChannel.read not preempted by asynchronous close on OS X Message-ID: <52DD10A1.3040202@oracle.com> One of the outstanding issues from the OS X port is that the async close of a FileChannel where there are threads blocked doing I/O operations does not work; instead close hangs (potentially indefinitely, say when another thread is blocked waiting for a file lock to be acquired or where the file is something like a pipe or other type of file where you can block indefinitely). From what I can tell, it wasn't implemented in Apple's JDK6 either. In order to fix this on OS X then close needs to signal all threads that are blocked in I/O operations, something we already do on Linux. The other part is removing the preClose (the dup2) from the closing of FileChannels as it is not needed when you can signal. The webrev with the proposed changes is here: http://cr.openjdk.java.net/~alanb/7133499/webrev/ Fixing this issue means that two tests can be removed from the exclude list (there is a third test removed from the exclude list too, that shouldn't have been there). -Alan.
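The close protocol described above, signal every thread blocked in an I/O operation on the channel before releasing the descriptor, can be sketched in outline. This is an illustrative stand-in rather than JDK code: the class name IoThreadSet is invented, and Thread.interrupt() stands in for NativeThread.signal(), which in the JDK wakes a thread blocked in a system call by delivering a signal:

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative analogue of the thread set kept by FileChannelImpl: threads
// register before blocking in an I/O operation and deregister afterwards;
// close marks the set closed and wakes every registered thread so the
// descriptor can then be closed safely.
class IoThreadSet {
    private final Set<Thread> threads = new HashSet<>();
    private boolean closed;

    // Register the current thread; returns false if already closed.
    synchronized boolean add() {
        if (closed)
            return false;
        threads.add(Thread.currentThread());
        return true;
    }

    synchronized void remove() {
        threads.remove(Thread.currentThread());
    }

    // Called from close(): wake all threads still blocked in I/O.
    synchronized void signalAndClose() {
        closed = true;
        for (Thread t : threads)
            t.interrupt();   // NativeThread.signal(t) in the real code
    }
}
```

A blocked reader or writer returns with a short/interrupted result, notices the channel is closed, and throws AsynchronousCloseException; that is the behavior the real change brings to OS X.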
From Alan.Bateman at oracle.com Mon Jan 20 11:36:23 2014 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Mon, 20 Jan 2014 19:36:23 +0000 Subject: 8032220: Files.createDirectories throws exception with confusing message for root directories that exist Message-ID: <52DD7AB7.4080205@oracle.com> This one came up on nio-discuss a few days ago. The issue is that Files.createDirectories fails on some platforms when called with a path to a root directory that exists. The exception (which is misleading) happens on OS X and Windows (Linux and Solaris are okay). On OS X then the issue is that mkdir fails with EISDIR when you attempt to use it to create /. On Windows it's because the win32 CreateDirectory fails with an ERROR_ACCESS_DENIED for this case. I've updated both implementations to better handle this case so that createDirectory throws the expected FileAlreadyExistsException. I've also tweaked the exception thrown by createDirectories so it's less confusing for the case that the root directory is not accessible. The webrev with the changes is here: http://cr.openjdk.java.net/~alanb/8032220/webrev/ -Alan. From volker.simonis at gmail.com Mon Jan 20 11:57:03 2014 From: volker.simonis at gmail.com (Volker Simonis) Date: Mon, 20 Jan 2014 20:57:03 +0100 Subject: 7133499: (fc) FileChannel.read not preempted by asynchronous close on OS X In-Reply-To: <52DD10A1.3040202@oracle.com> References: <52DD10A1.3040202@oracle.com> Message-ID: Hi Alan, I've tried your patch with our port on AIX. The good news is that it fixes: java/nio/channels/AsynchronousFileChannel/Lock.java on AIX as well.
The bad news is that it doesn't seem to help for: java/nio/channels/AsyncCloseAndInterrupt.java Here's a stack trace of where the VM gets stuck: "TestThread-FileChannel/transferTo/interrupt" #12 daemon prio=5 os_prio=57 tid=0x0000000119c07800 nid=0x1310131 runnable [0x000000011a1e3000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.FileDispatcherImpl.write0(Native Method) at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60) at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) at sun.nio.ch.IOUtil.write(IOUtil.java:51) at sun.nio.ch.SinkChannelImpl.write(SinkChannelImpl.java:167) - locked <0x0a000100255c9330> (a java.lang.Object) at sun.nio.ch.FileChannelImpl.transferToTrustedChannel(FileChannelImpl.java:468) at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:564) at AsyncCloseAndInterrupt$18.doIO(AsyncCloseAndInterrupt.java:391) at AsyncCloseAndInterrupt$Tester.go(AsyncCloseAndInterrupt.java:485) at TestThread.run(TestThread.java:55) "MainThread" #9 prio=5 os_prio=57 tid=0x0000000119289800 nid=0x2d50115 runnable [0x0000000119497000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.FileDispatcherImpl.preClose0(Native Method) at sun.nio.ch.FileDispatcherImpl.preClose(FileDispatcherImpl.java:102) at sun.nio.ch.SinkChannelImpl.implCloseSelectableChannel(SinkChannelImpl.java:88) - locked <0x0a000100255c9340> (a java.lang.Object) at java.nio.channels.spi.AbstractSelectableChannel.implCloseChannel(AbstractSelectableChannel.java:234) at java.nio.channels.spi.AbstractInterruptibleChannel$1.interrupt(AbstractInterruptibleChannel.java:165) - locked <0x0a000100255c9300> (a java.lang.Object) at java.lang.Thread.interrupt(Thread.java:918) - locked <0x0a000100255ceca0> (a java.lang.Object) at AsyncCloseAndInterrupt.test(AsyncCloseAndInterrupt.java:573) at AsyncCloseAndInterrupt.test(AsyncCloseAndInterrupt.java:609) at AsyncCloseAndInterrupt.main(AsyncCloseAndInterrupt.java:680) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at com.sun.javatest.regtest.MainWrapper$MainThread.run(MainWrapper.java:94) at java.lang.Thread.run(Thread.java:744) As you can see, it hangs in preClose, because it is also blocked in write. I think this is where calling the interruptible write in my initial change helped. But now, after I saw your solution here for 7133499, I wonder if the same technique you applied in FileChannelImpl.implCloseChannel() wouldn't work here as well. However, if I naively call NativeThread.signal(th) before calling nd.preClose(fd) in SinkChannelImpl.implCloseSelectableChannel() this improves the situation only slightly, because the VM now hangs in a read(): "TestThread-FileChannel/transferFrom/interrupt" #12 daemon prio=5 os_prio=57 tid=0x00000001199a0800 nid=0x429000f runnable [0x0000000119f02000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.FileDispatcherImpl.read0(Native Method) at sun.nio.ch.FileDispatcherImpl.read(FileDispatcherImpl.java:46) at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) at sun.nio.ch.IOUtil.read(IOUtil.java:192) at sun.nio.ch.SourceChannelImpl.read(SourceChannelImpl.java:167) - locked <0x0a0001003ab1ee68> (a java.lang.Object) at sun.nio.ch.FileChannelImpl.transferFromArbitraryChannel(FileChannelImpl.java:625) at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:663) at AsyncCloseAndInterrupt$19.doIO(AsyncCloseAndInterrupt.java:401) at AsyncCloseAndInterrupt$Tester.go(AsyncCloseAndInterrupt.java:485) at TestThread.run(TestThread.java:55) "main" #1 prio=5 os_prio=57 tid=0x000000011022c800 nid=0x50e0101 runnable [0x000000011021d000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.FileDispatcherImpl.preClose0(Native Method) at sun.nio.ch.FileDispatcherImpl.preClose(FileDispatcherImpl.java:102) at
sun.nio.ch.SourceChannelImpl.implCloseSelectableChannel(SourceChannelImpl.java:88) - locked <0x0a0001003ab1ee78> (a java.lang.Object) at java.nio.channels.spi.AbstractSelectableChannel.implCloseChannel(AbstractSelectableChannel.java:234) at java.nio.channels.spi.AbstractInterruptibleChannel$1.interrupt(AbstractInterruptibleChannel.java:165) - locked <0x0a0001003ab1ee38> (a java.lang.Object) at java.lang.Thread.interrupt(Thread.java:918) - locked <0x0a0001003ab1f5b8> (a java.lang.Object) at AsyncCloseAndInterrupt.test(AsyncCloseAndInterrupt.java:573) at AsyncCloseAndInterrupt.test(AsyncCloseAndInterrupt.java:609) at AsyncCloseAndInterrupt.main(AsyncCloseAndInterrupt.java:681) So I wonder if we would have to wrap all Java calls to close()/preClose() with NativeThread.signal() and all calls to I/O functions like read/write/fcntl with NativeThreadSet.add()/remove()? Maybe then it's easier doing it in the native interface (i.e. the NET_ wrappers) to just mimic the "usual" behaviour on AIX? Regards, Volker On Mon, Jan 20, 2014 at 1:03 PM, Alan Bateman wrote: > > One of the outstanding issues from the OS X port is that the async close of > a FileChannel where there are threads blocked doing I/O operations does not > work; instead close hangs (potentially indefinitely, say when another thread > is blocked waiting for a file lock to be acquired or where the file is > something like a pipe or other type of file where you can block > indefinitely). From what I can tell, it wasn't implemented in Apple's JDK6 > either. > > In order to fix this on OS X then close needs to signal all threads that are > blocked in I/O operations, something we already do on Linux. The other part > is removing the preClose (the dup2) from the closing of FileChannels as it > is not needed when you can signal.
The webrev with the proposed changes is > here: > > http://cr.openjdk.java.net/~alanb/7133499/webrev/ > > Fixing this issue means that two tests can be removed from the exclude list > (there is a third test removed from the exclude list too, that shouldn't have > been there). > > -Alan. From chris.hegarty at oracle.com Mon Jan 20 12:54:45 2014 From: chris.hegarty at oracle.com (Chris Hegarty) Date: Mon, 20 Jan 2014 20:54:45 +0000 Subject: 8032220: Files.createDirectories throws exception with confusing message for root directories that exist In-Reply-To: <52DD7AB7.4080205@oracle.com> References: <52DD7AB7.4080205@oracle.com> Message-ID: <40C56DE5-F491-4426-95F6-EBB0E51AB07A@oracle.com> Looks ok to me Alan. -Chris. On 20 Jan 2014, at 19:36, Alan Bateman wrote: > > This one came up on nio-discuss a few days ago. > > The issue is that Files.createDirectories fails on some platforms when called with a path to a root directory that exists. The exception (which is misleading) happens on OS X and Windows (Linux and Solaris are okay). > > On OS X then the issue is that mkdir fails with EISDIR when you attempt to use it to create /. On Windows it's because the win32 CreateDirectory fails with an ERROR_ACCESS_DENIED for this case. I've updated both implementations to better handle this case so that createDirectory throws the expected FileAlreadyExistsException. I've also tweaked the exception thrown by createDirectories so it's less confusing for the case that the root directory is not accessible. The webrev with the changes is here: > > http://cr.openjdk.java.net/~alanb/8032220/webrev/ > > -Alan.
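The semantics the 8032220 change aims for can be approximated in user code. A minimal sketch under the assumption that createDirectory raises FileAlreadyExistsException when the target exists (which is what the fix makes happen for root directories on OS X and Windows); the helper class and method names are hypothetical:

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

class EnsureDir {
    // Create the directory if absent; succeed quietly if it already
    // exists as a directory -- the case that produced a misleading
    // error for roots like "/" before the change discussed above.
    static void ensure(Path dir) throws IOException {
        try {
            Files.createDirectory(dir);
        } catch (FileAlreadyExistsException x) {
            if (!Files.isDirectory(dir))
                throw x;   // exists, but is not a directory
        }
    }
}
```

Calling ensure() twice on the same path is therefore harmless, which mirrors what callers expect from Files.createDirectories.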
From Alan.Bateman at oracle.com Mon Jan 20 13:34:18 2014 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Mon, 20 Jan 2014 21:34:18 +0000 Subject: 7133499: (fc) FileChannel.read not preempted by asynchronous close on OS X In-Reply-To: References: <52DD10A1.3040202@oracle.com> Message-ID: <52DD965A.1030505@oracle.com> On 20/01/2014 19:57, Volker Simonis wrote: > Hi Alan, > > I've tried your patch with our port on AIX. > The good news is that it fixes: > > java/nio/channels/AsynchronousFileChannel/Lock.java > > on AIX as well. > > The bad news is that it doesn't seem to help for: > > java/nio/channels/AsyncCloseAndInterrupt.java > > Here's a stack trace of where the VM gets stuck: In these stack traces then the channels are Pipe.SourceChannel or Pipe.SinkChannel where the file descriptor is to one end of a pipe. Are these the only cases where you see these hangs? I'm interested to know if the async close of SocketChannel and ServerSocketChannel when configured blocking also hangs (I will guess that it will as the behavior is likely to be the same as pipe). I have an idea on how to fix this so that the preClose isn't used when the channel is configured blocking (or isn't registered with a Selector). The patch doesn't use NativeThreadSet because it isn't efficient when the number of threads is limited to 1 or 2 (FileChannel uses NativeThreadSet because it defines positional read/write and so the number of concurrent reader/writers is unlimited). I'll send a patch for the other channels soon and we can see if this works for you. If it does work then I have a bit of a preference for bringing it in via jdk9/dev rather than via the AIX staging forest because the changes impact all platforms. That said, if we bring this FileChannel fix for OS X in then it means that the platform specific changes would be in, and the patch for the other platforms shouldn't require porting. -Alan.
From chris.hegarty at oracle.com Tue Jan 21 07:49:55 2014 From: chris.hegarty at oracle.com (Chris Hegarty) Date: Tue, 21 Jan 2014 15:49:55 +0000 Subject: 7133499: (fc) FileChannel.read not preempted by asynchronous close on OS X In-Reply-To: <52DD10A1.3040202@oracle.com> References: <52DD10A1.3040202@oracle.com> Message-ID: The changes in the webrev look ok to me Alan, we'll now be using signals on all 'unix' platforms. -Chris. On 20 Jan 2014, at 12:03, Alan Bateman wrote: > > One of the outstanding issues from the OS X port is that the async close of a FileChannel where there are threads blocked doing I/O operations does not work; instead close hangs (potentially indefinitely, say when another thread is blocked waiting for a file lock to be acquired or where the file is something like a pipe or other type of file where you can block indefinitely). From what I can tell, it wasn't implemented in Apple's JDK6 either. > > In order to fix this on OS X then close needs to signal all threads that are blocked in I/O operations, something we already do on Linux. The other part is removing the preClose (the dup2) from the closing of FileChannels as it is not needed when you can signal. The webrev with the proposed changes is here: > > http://cr.openjdk.java.net/~alanb/7133499/webrev/ > > Fixing this issue means that two tests can be removed from the exclude list (there is a third test removed from the exclude list too, that shouldn't have been there). > > -Alan. From volker.simonis at gmail.com Tue Jan 21 09:49:07 2014 From: volker.simonis at gmail.com (Volker Simonis) Date: Tue, 21 Jan 2014 18:49:07 +0100 Subject: 7133499: (fc) FileChannel.read not preempted by asynchronous close on OS X In-Reply-To: <52DD965A.1030505@oracle.com> References: <52DD10A1.3040202@oracle.com> <52DD965A.1030505@oracle.com> Message-ID: On Mon, Jan 20, 2014 at 10:34 PM, Alan Bateman wrote: > On 20/01/2014 19:57, Volker Simonis wrote: >> >> Hi Alan, >> >> I've tried your patch with our port on AIX.
>> The good news is that it fixes: >> >> java/nio/channels/AsynchronousFileChannel/Lock.java >> >> on AIX as well. >> >> The bad news is that it doesn't seem to help for: >> >> java/nio/channels/AsyncCloseAndInterrupt.java >> >> Here's a stack trace of where the VM gets stuck: > In these stack traces then the channels are Pipe.SourceChannel or > Pipe.SinkChannel where the file descriptor is to one end of a pipe. Are > these the only cases where you see these hangs? I'm interested to know if > the async close of SocketChannel and ServerSocketChannel when configured > blocking also hangs (I will guess that it will as the behavior is likely to > be the same as pipe). I also think they will hang, but I'm not sure how to test it. The java/nio/channels/AsynchronousServerSocketChannel and java/nio/channels/AsynchronousSocketChannel tests all pass, but I'm not sure if they test the same thing. In the NIO area, I currently (with your change) have problems with the following tests: java/nio/channels/AsyncCloseAndInterrupt.java (hangs) java/nio/channels/AsynchronousChannelGroup/Basic.java (hangs sometimes) java/nio/channels/AsynchronousChannelGroup/GroupOfOne.java (hangs) java/nio/channels/AsynchronousChannelGroup/Unbounded.java (hangs sometimes) java/nio/channels/Selector/RacyDeregister.java (fails) However, java/nio/channels/AsynchronousChannelGroup/Unbounded.java hangs in AixPollPort.pollset_poll() (which is our implementation of AsynchronousChannelGroup) so that may be a completely different problem. I'm currently trying to debug it. > I have an idea on how to fix this so that the preClose isn't used when the > channel is configured blocking (or isn't registered with a Selector). The > patch doesn't use NativeThreadSet because it isn't efficient when the number > of threads is limited to 1 or 2 (FileChannel uses NativeThreadSet because it > defines positional read/write and so the number of concurrent reader/writers > is unlimited).
> Yes, I'm definitely interested to see and test your patch on AIX. > I'll send a patch for the other channels soon and we can see if this works for > you. If it does work then I have a bit of a preference for bringing it in via > jdk9/dev rather than via the AIX staging forest because the changes impact > all platforms. Yes, that's no problem. I think the class library for AIX will be fine and ready for integration into jdk9/dev without these changes. We can fix that later and backport it to 8u-dev as required. > That said, if we bring this FileChannel fix for OS X in then > it means that the platform specific changes would be in, the patch for the > other platforms shouldn't require porting. > > -Alan. > From Alan.Bateman at oracle.com Tue Jan 21 12:41:03 2014 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Tue, 21 Jan 2014 20:41:03 +0000 Subject: 7133499: (fc) FileChannel.read not preempted by asynchronous close on OS X In-Reply-To: References: <52DD10A1.3040202@oracle.com> <52DD965A.1030505@oracle.com> Message-ID: <52DEDB5F.6050301@oracle.com> On 21/01/2014 17:49, Volker Simonis wrote: > : > I also think they will hang, but I'm not sure how to test it. The > java/nio/channels/AsynchronousServerSocketChannel and > java/nio/channels/AsynchronousSocketChannel all pass, but I'm not sure > if they test the same thing. > > In the NIO area, I currently (with your change) have problems with the > following tests: > > java/nio/channels/AsyncCloseAndInterrupt.java (hangs) > java/nio/channels/AsynchronousChannelGroup/Basic.java (hangs sometimes) > java/nio/channels/AsynchronousChannelGroup/GroupOfOne.java (hangs) > java/nio/channels/AsynchronousChannelGroup/Unbounded.java (hangs sometimes) > java/nio/channels/Selector/RacyDeregister.java (fails) > > However, java/nio/channels/AsynchronousChannelGroup/Unbounded.java > hangs in AixPollPort.pollset_poll() (which is our implementation of > AsynchronousChannelGroup) so that may be a completely different > problem.
I'm currently trying to debug it. The SelectableChannel and AsynchronousChannel implementations are very different. In the SelectableChannel implementations then closing is complicated due to the possibility of threads being blocked in I/O operations. From the mails then it is clear that AIX hangs in dup2 but an alternative approach to initially signal the blocked threads should work there. One of the reasons for not agreeing to calling into the NET_* functions is that it results in double accounting, the selectable channels already track it. I'll send a patch soon to try and we can see about resolving this once the changes are in jdk9/dev. On testing it then AsyncCloseAndInterrupt will close and interrupt on each of the channels so it is a useful test. I don't know what to say about the AsynchronousChannelGroup tests that are hanging, I think I'd need to see the full stack trace. Closing of these channels is cooperative as there isn't any blocking so it's much simpler. So is pollset_poll your implementation of Port.startPoll? > : > Yes, that's no problem. I think the class library for AIX will be fine > and ready for integration into jdk9/dev without these changes. We can > fix that later and backport it to 8u-dev as required. I think this makes sense. -Alan. From Alan.Bateman at oracle.com Tue Jan 21 12:49:50 2014 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Tue, 21 Jan 2014 20:49:50 +0000 Subject: 7133499: (fc) FileChannel.read not preempted by asynchronous close on OS X In-Reply-To: References: <52DD10A1.3040202@oracle.com> Message-ID: <52DEDD6E.3040602@oracle.com> On 21/01/2014 15:49, Chris Hegarty wrote: > The changes in the webrev look ok to me Alan, we'll now be using signals on all 'unix' platforms. > > -Chris. Yes, as we have to do it on Linux and OS X anyway. -Alan.
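The idea Alan outlines for the selectable channels, that at most one reader and one writer thread need tracking for a blocking channel, so plain fields suffice instead of a NativeThreadSet, can be sketched roughly as below. The class and field names are invented, and Thread.interrupt() again stands in for the native thread signal used in the JDK:

```java
// Hypothetical sketch: a blocking channel records at most one reader and
// one writer thread; close() signals both (instead of relying on the
// preClose/dup2 trick) so blocked I/O calls return before the fd is freed.
class BlockingChannelSketch {
    private final Object stateLock = new Object();
    private Thread reader;   // thread blocked in a read, if any
    private Thread writer;   // thread blocked in a write, if any
    private boolean open = true;

    void beginRead() {
        synchronized (stateLock) {
            if (!open) throw new IllegalStateException("closed");
            reader = Thread.currentThread();
        }
    }

    void endRead() {
        synchronized (stateLock) { reader = null; }
    }

    void close() {
        synchronized (stateLock) {
            open = false;
            if (reader != null) reader.interrupt();  // NativeThread.signal
            if (writer != null) writer.interrupt();
        }
    }

    boolean isOpen() {
        synchronized (stateLock) { return open; }
    }
}
```

Two fields make the bookkeeping O(1) with no allocation on the I/O path, which is why a set-based structure is reserved for FileChannel's unbounded positional readers and writers.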
From Alan.Bateman at oracle.com Wed Jan 22 06:28:36 2014 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Wed, 22 Jan 2014 14:28:36 +0000 Subject: 8032451: (dc) DatagramChannel.join should support include-mode filtering on OS X Message-ID: <52DFD594.3090006@oracle.com> This is another OS X specific issue (again going back to the original port where a number of things were left out). DatagramChannel has API support for source-specific multicasting but it has always been disabled on OS X (this is allowed by the spec). Part of the issue seems to be the definition of ip_mreq_source that came from the BSD porting project and doesn't seem to match the header files on OS X. With this removed (meaning using the definitions from the system header files) then include-mode filtering with IPv4 works and the tests pass on 10.7, 10.8 and 10.9. There are still problems with IPv6 source-specific multicasting where it's not completely clear what each OS X release supports. So for now I would like to enable include-mode filtering for IPv4 with the following changes: http://cr.openjdk.java.net/~alanb/8032451/webrev/ -Alan. From chris.hegarty at oracle.com Fri Jan 24 02:27:18 2014 From: chris.hegarty at oracle.com (Chris Hegarty) Date: Fri, 24 Jan 2014 10:27:18 +0000 Subject: 8032451: (dc) DatagramChannel.join should support include-mode filtering on OS X In-Reply-To: <52DFD594.3090006@oracle.com> References: <52DFD594.3090006@oracle.com> Message-ID: <52E24006.8080005@oracle.com> The changes look good to me Alan. I always like it when a bunch of these definitions can be cleaned up/removed. -Chris. On 22/01/14 14:28, Alan Bateman wrote: > > This is another OS X specific issue (again going back to the original port > where a number of things were left out). > > DatagramChannel has API support for source-specific multicasting but it > has always been disabled on OS X (this is allowed by the spec).
Part of > the issue seems to be the definition of ip_mreq_source that came from > the BSD porting project and doesn't seem to match the header files on OS > X. With this removed (meaning using the definitions from the system > header files) then include-mode filtering with IPv4 works and the tests > pass on 10.7, 10.8 and 10.9. There are still problems with IPv6 > source-specific multicasting where it's not completely clear what each > OS X release supports. So for now I would like to enable include-mode > filtering for IPv4 with the following changes: > > http://cr.openjdk.java.net/~alanb/8032451/webrev/ > > -Alan. From Alan.Bateman at oracle.com Thu Jan 30 09:31:29 2014 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Thu, 30 Jan 2014 17:31:29 +0000 Subject: 8030795: java/nio/file/Files/probeContentType/ForceLoad.java failing with ServiceConfigurationError without jtreg -agentvm option Message-ID: <52EA8C71.3060708@oracle.com> This test fails when run in jtreg othervm mode because there is a service configuration file in the directory that lists a FileTypeDetector that isn't compiled by the test. There are many ways to fix this; the simplest is to just add a @build tag to compile the dependency. I've also changed it to run in othervm mode because the test is to ensure that the underlying native code can be loaded before any other parts of the API are used. The proposed change is below. -Alan diff --git a/test/java/nio/file/Files/probeContentType/ForceLoad.java b/test/java/nio/file/Files/probeContentType/ForceLoad.java --- a/test/java/nio/file/Files/probeContentType/ForceLoad.java +++ b/test/java/nio/file/Files/probeContentType/ForceLoad.java @@ -25,6 +25,8 @@ * @bug 4313887 * @summary Test library dependencies by invoking Files.probeContentType * before other methods that would cause nio.dll to be loaded. + * @build ForceLoad SimpleFileTypeDetector + * @run main/othervm ForceLoad */
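For context, an installed FileTypeDetector provider of the kind the test's service configuration file references looks roughly like this. The class name matches the SimpleFileTypeDetector in the @build line above, but the body and the ".grape" extension are invented for illustration:

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.spi.FileTypeDetector;

// Illustrative installed provider; it is registered by listing the class
// name in META-INF/services/java.nio.file.spi.FileTypeDetector. If the
// class is listed but never compiled, ServiceLoader fails with
// ServiceConfigurationError -- the failure mode described above.
public class SimpleFileTypeDetector extends FileTypeDetector {
    @Override
    public String probeContentType(Path path) throws IOException {
        String name = path.getFileName().toString();
        return name.endsWith(".grape") ? "grape/unknown" : null;  // assumed mapping
    }
}
```

Returning null tells Files.probeContentType to fall through to the next installed detector, so a provider only needs to recognize its own file types.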