Performance regression on jdk7u25 vs jdk7u40 due to EPollArrayWrapper.eventsLow
Martin Buchholz
martinrb at google.com
Fri Jan 10 08:37:10 PST 2014
I took a look at EPollArrayWrapper; it's basically implementing a map int
-> byte by combining a byte array for "small" integers and a HashMap for
large ones. The 64KB byte array does look like it may be spending too
much memory for the performance gain - typical Java memory bloat. In the
common case file descriptors will be "small".
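For reference, the scheme looks roughly like this (a simplified sketch,
not the actual source; I'm calling the HashMap half eventsHigh to match
the eventsLow naming):

// Inside EPollArrayWrapper (sketch): small fds index directly into a
// pre-sized byte array; large fds fall back to a lazily created boxed
// map (java.util.Map).
private final byte[] eventsLow = new byte[64 * 1024];
private Map<Integer, Byte> eventsHigh;  // created on first large fd

private byte getUpdateEvents(int fd) {
    if (fd < eventsLow.length)
        return eventsLow[fd];
    Byte events = (eventsHigh != null) ? eventsHigh.get(fd) : null;
    return (events != null) ? events : 0;
}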
One simple approach to economizing in the common case is to initialize
the byte array eventsLow to a much smaller size, and grow it if a
sufficiently large file descriptor is encountered. In fact, looking
closer, you already have a data structure here that works that way -
BitSet registered is a map int -> boolean that grows only up to the max
registered fd. The JDK doesn't have a ByteSet, but it seems that's what
we want here, and it's not too painful to roll our own - see the sketch
below. A lock is already held whenever accessing any of the internal
data here, so it needs no synchronization of its own.
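An untested sketch (all names hypothetical):

import java.util.Arrays;

// A minimal "ByteSet": a map int -> byte backed by an array that grows
// on demand up to the largest key seen, the same way BitSet grows.
// Not thread-safe; the wrapper's existing lock is assumed to guard it.
class ByteSet {
    private byte[] bytes = new byte[64];  // start small

    byte get(int i) {
        return (i < bytes.length) ? bytes[i] : 0;
    }

    void set(int i, byte value) {
        if (i >= bytes.length)
            bytes = Arrays.copyOf(bytes, Math.max(bytes.length * 2, i + 1));
        bytes[i] = value;
    }
}

Replacing the fixed 64KB eventsLow with something like this would keep
the array proportional to the largest fd actually registered.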
Minor things to fix in EPollArrayWrapper:
// maximum size of updatesLow
comment is wrong: s/updatesLow/eventsLow/
---
short events = getUpdateEvents(fd);
Using short here is really WEIRD. Either leave it as a byte or promote to
int.
---
private static final byte KILLED = (byte)-1;
Remove stray SPACE.
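For the record, the corrected lines would read (choosing byte for the
second one):

// maximum size of eventsLow

byte events = getUpdateEvents(fd);  // or promote: int events = getUpdateEvents(fd);

private static final byte KILLED = (byte)-1;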
On Thu, Jan 9, 2014 at 6:20 PM, Vitaly Davidovich <vitalyd at gmail.com> wrote:
> Having 122 instances of EPollArrayWrapper seems odd - that's basically
> 122 selectors monitoring connections. Typically you'd have just one
> selector and thus one EPollArrayWrapper. I'm not familiar with
> tradesoap, so I don't know what it's doing internally.
>
> One could probably slim down EPollArrayWrapper a bit, but I think the
> reason eventsLow[] is pre-allocated with a large size is probably that
> it's expected there will be just one or a few of them in the process.
>
> Sent from my phone
> On Jan 9, 2014 7:44 PM, "Jungwoo Ha" <jwha at google.com> wrote:
>
>> Hi,
>>
>> I found a performance issue in the DaCapo tradesoap benchmark.
>>
>> *Commandline*
>> $ java -XX:+UseConcMarkSweepGC -Xmx76m -jar dacapo-9.12-bach.jar
>> tradesoap -n 7
>>
>> 76MB is 2 times the minimum heap size requirement of tradesoap, i.e.,
>> tradesoap can run on 38MB but not less.
>> I measured the last iteration (steady-state performance).
>>
>> *Execution time on the last iteration*
>> 7u25: 17910ms
>> 7u40: 21263ms
>>
>> So I compared the GC behavior using -XX:+PrintGCDetails and noticed
>> that 7u40 executed far more concurrent mode failures:
>> 7u25: 2 Full GCs, 60 concurrent mode failures
>> 7u40: 9 Full GCs, 70 concurrent mode failures
>> This is the cause of the slowdown.
>>
>> Looking at the GC log, I noticed that 7u40 uses more memory.
>> 7u25 : [Full GC .... (concurrent mode failure): 48145K->*42452K*(51904K),
>> 0.2212080 secs]
>> 7u40 : [Full GC .... (concurrent mode failure): 47923K->*44672K*(51904K),
>> 0.2138640 secs]
>>
>> After the Full GC, 7u40 has 2.2MB more live objects. This is always
>> repeatable.
>>
>> So I got a heap dump of live objects and found that the most noticeable
>> difference is the byte[] of *EPollArrayWrapper.eventsLow*.
>> I think this field was added in 7u40; it was occupying 122 instances
>> * 32K = 3.8MB.
>>
>> Here are my questions.
>> 1) How is the number of instances of this type expected to grow with
>> heap size?
>> How does it correlate with network usage in typical server
>> applications?
>> 2) Is there a way to reduce the memory?
>>
>> Thanks,
>> Jungwoo
>>
>