Performance regression on jdk7u25 vs jdk7u40 due to EPollArrayWrapper.eventsLow

Martin Buchholz martinrb at google.com
Fri Jan 10 08:37:10 PST 2014


I took a look at EPollArrayWrapper.  It's basically implementing a map from
int to byte by combining a byte array for "small" integers with a HashMap for
large ones.  The 64k byte array does look like it may be trading too much
memory for the performance gain - typical Java memory bloat.  In the
common case file descriptors will be "small".
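
For reference, the shape of the 7u40 code is roughly the following (a
paraphrased sketch from memory, not the exact source; as I recall the real
MAX_UPDATE_ARRAY_SIZE is also capped by OPEN_MAX / a system property):

    import java.util.HashMap;
    import java.util.Map;

    // Paraphrased sketch of the two-level int -> byte map, not the real class.
    class UpdateEventsSketch {
        private static final int MAX_UPDATE_ARRAY_SIZE = 64 * 1024;

        private final byte[] eventsLow = new byte[MAX_UPDATE_ARRAY_SIZE]; // one per Selector
        private Map<Integer, Byte> eventsHigh;            // lazily created for large fds

        void setUpdateEvents(int fd, byte events) {
            if (fd < MAX_UPDATE_ARRAY_SIZE) {
                eventsLow[fd] = events;                   // small fd: direct array index
            } else {
                if (eventsHigh == null)
                    eventsHigh = new HashMap<Integer, Byte>();
                eventsHigh.put(fd, events);               // large fd: fall back to the map
            }
        }

        byte getUpdateEvents(int fd) {
            if (fd < MAX_UPDATE_ARRAY_SIZE)
                return eventsLow[fd];
            Byte v = (eventsHigh != null) ? eventsHigh.get(fd) : null;
            return (v != null) ? v.byteValue() : 0;
        }
    }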

One simple approach to economizing in the common case is to initialize the
byte array eventsLow to a much smaller size, and grow it if a sufficiently
large file descriptor is encountered.  In fact, looking closer, you already
have a data structure here that works that way - the BitSet "registered" is a
map from int to boolean that grows only up to the max registered fd.  The JDK
doesn't have a ByteSet, but it seems that's what we want here.  It's not
too painful to roll our own.  A lock is already held whenever accessing any
of the internal data here.
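
Something along these lines (just a sketch - the class name and starting size
are made up, and callers are assumed to hold the existing lock):

    import java.util.Arrays;

    // Hypothetical grow-on-demand replacement for the fixed 64K eventsLow array.
    class GrowableByteMap {
        private byte[] bytes = new byte[64];   // start small; fds are usually small

        void set(int fd, byte value) {
            if (fd >= bytes.length) {
                // grow to at least fd+1, doubling to keep the amortized cost low
                bytes = Arrays.copyOf(bytes, Math.max(fd + 1, bytes.length * 2));
            }
            bytes[fd] = value;
        }

        byte get(int fd) {
            return (fd < bytes.length) ? bytes[fd] : 0;
        }
    }

That keeps the footprint proportional to the largest fd actually registered
with the Selector instead of a flat 64K per instance.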

Minor things to fix in EPollArrayWrapper:

    // maximum size of updatesLow

comment is wrong: s/updatesLow/eventsLow/

--

                short events = getUpdateEvents(fd);

Using short here is really WEIRD.  Either leave it as a byte or promote to
int.
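
I.e. (paraphrasing the surrounding code):

    byte events = getUpdateEvents(fd);   // or: int events = getUpdateEvents(fd);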

---

    private static final byte  KILLED = (byte)-1;

Remove stray SPACE.




On Thu, Jan 9, 2014 at 6:20 PM, Vitaly Davidovich <vitalyd at gmail.com> wrote:

> Having 122 instances of EPollArrayWrapper seems odd - that's basically
> 122 selectors monitoring connections.  Typically you'd have just one
> selector and thus one EPollArrayWrapper.  I'm not familiar with tradesoap,
> so I don't know what it's doing internally.
>
> One could probably slim down EPollArrayWrapper a bit, but I think the
> reason eventsLow[] is pre-allocated with a large value is probably that
> it's expected there will be just one or a few of them in the process.
>
> Sent from my phone
> On Jan 9, 2014 7:44 PM, "Jungwoo Ha" <jwha at google.com> wrote:
>
>> Hi,
>>
>> I found a performance issue on the DaCapo tradesoap benchmark.
>>
>> *Commandline*
>> $ java -XX:+UseConcMarkSweepGC -Xmx76m -jar dacapo-9.12-bach.jar
>> tradesoap -n 7
>>
>>   76MB is 2x the minimum heap size requirement for tradesoap, i.e.,
>> tradesoap can run on 38MB but not less.
>>   I measure the last iteration (steady-state performance).
>>
>> *Execution time on the last iteration*
>>   7u25: 17910ms
>>   7u40: 21263ms
>>
>> So I compared the GC behavior using -XX:+PrintGCDetails and noticed that
>> 7u40 executed far more concurrent mode failures:
>>   7u25: 2 Full GCs, 60 concurrent mode failures
>>   7u40: 9 Full GCs, 70 concurrent mode failures
>> This is the cause of the slowdown.
>>
>> Looking at the GC log, I noticed that 7u40 uses more memory.
>> 7u25 : [Full GC .... (concurrent mode failure): 48145K->*42452K*(51904K),
>> 0.2212080 secs]
>> 7u40 : [Full GC .... (concurrent mode failure): 47923K->*44672K*(51904K),
>> 0.2138640 secs]
>>
>> After the Full GC, 7u40 has 2.2MB more live objects. This is always
>> repeatable.
>>
>> So I got a heap dump of the live objects and found that the most noticeable
>> difference is the byte[] of *EPollArrayWrapper.eventsLow*.
>> I think this field was added in 7u40 and was occupying 122 instances * 32K
>> = 3.8MB.
>>
>> Here go my questions.
>> 1) How is the number of instances of this type expected to grow with large
>> heap sizes?
>>     How does it correlate with network usage in typical server
>> applications?
>> 2) Is there a way to reduce the memory?
>>
>> Thanks,
>> Jungwoo
>>
>