[sctp-dev] Linux Kernel Failure (Fedora) and sctp close()
Danny
danny at tower.telenet.be
Tue Feb 9 03:40:12 PST 2010
Chris,
I'll try to answer your questions as accurately as possible.
1) Feel free to forward the trace to the lksctp developers. So far I
have only submitted it to kerneloops.org, as the kernel failure pop-up
suggested.
2) Yes, I am running the second-to-last kernel update. I am waiting for
the nvidia drivers to support the latest kernel; if the upgrade solves
the problem I will report it. Fedora version 11.
3) Is it reproducible? Yes. I did further tests to narrow it down and
can now provide detailed test results, which I include here, plus
another trace for this failure which will probably confirm my reasoning.
Test results to narrow down the sctp_close() kernel failure, obtained
by moving "if(true) { return; }" statements through the code in order to
see which particular statement triggers the failure.
1) The failure is not due to simply loading SCTP API/lksctp. Tested with
an if(false) return; in the first line of the test method.
2) An SCTP server (open, bind, socket options, connect, wait for accept
on a Selector, and close) without any connection to the server being set
up does not trigger the failure.
3) Same as 2), but with a connection to the server (which results in an
ACCEPT from the selector and a call to accept()): this does trigger the
kernel failure when close() is called LATER on the server.
4) Same as 3), but close() is not called explicitly: in that case the
failure does not occur immediately but when I stop the program (at which
point, I guess, an auto-close is performed implicitly).
When the failure is provoked in the way described in point 4, the
kernel trace is somewhat different (see the attached trace) from the one
I posted before, where close() was called explicitly.
To summarize, I would say that for the failure to occur (and remember I
only tested it with an SCTP server) the server must be set up and must
have accepted at least one incoming connection; the failure then occurs
when the server channel is closed, either by calling close() explicitly
or through the implicit automatic close.
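The sequence summarized above can be sketched at the plain socket level
as follows. This is only an illustration of the order of operations,
written in Python for brevity and not the Java test program itself; the
function name `accept_then_close_server` and the `proto` parameter are
hypothetical, and whether this one-to-one style sequence triggers the
same warning has not been verified.

```python
import socket

# IPPROTO_SCTP is protocol number 132; fall back to the number if this
# Python build does not expose the constant.
SCTP = getattr(socket, "IPPROTO_SCTP", 132)


def accept_then_close_server(proto=SCTP):
    """Set up a server, accept one connection, then close the server first.

    Returns the bytes received on the accepted connection after the
    listening socket has already been closed.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM, proto)
    server.bind(("127.0.0.1", 0))      # port 0: let the kernel pick a port
    server.listen(1)
    port = server.getsockname()[1]

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM, proto)
    client.connect(("127.0.0.1", port))
    accepted, _ = server.accept()      # at least one accepted connection

    server.close()                     # the step after which the warning appeared

    # The established session is reported to stay usable after the close.
    client.sendall(b"ping")
    data = accepted.recv(16)
    client.close()
    accepted.close()
    return data
```

Calling it without arguments needs a kernel with SCTP support (the sctp
module loaded); passing `socket.IPPROTO_TCP` runs the same sequence over
TCP for comparison.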
Extra information:
- accept(), send() and receive() are called from a different thread
than open(), bind(), connect() and close() because a Selector is used.
Therefore the API and lksctp will see different thread IDs if they use
them.
- At the moment the explicit close() is called on the server, which
makes the failure occur immediately (as in point 3 above), close() has
not yet been called on either the out-bound or the in-bound SCTPClients
and the session is still connected.
- The in-bound and out-bound client sessions do not seem to be affected
by the kernel failure (I can send() and receive() data on them and
perform a graceful shutdown()).
- The failure also occurs if, after closing the server, I block my code
indefinitely in a waiting state to make sure no SCTP statements are
called after the server close().
- Last but not least, I am under the impression, and I emphasize the
word 'impression', that if I do not stop the program after the kernel
failure occurs, my system fan speeds up after a while, which could
point to some endless loop in lksctp. However, I have no other proof
or indication of that. The system CPU continues to perform within
normal parameters, hence the word 'impression'.
Danny
Christopher Hegarty - Sun Microsystems Ireland wrote:
> Danny,
>
> I can say for sure that I have never seen this issue before, and it
> certainly looks like the problem is in the lksctp kernel
> implementation. I think you should send the stack trace to
> lksctp-developers. Or I can do this. Is this reproducible? It looks
> like you are on a fairly recent kernel, good.
>
> On the Java side I can let you know how we are handling closing of the
> sockets.
> 1) On initialization of the Java SCTP implementation.
> Create an open connected pair of sockets. Close one and keep the
> other for later closures, preCloseFD.
> 2) When closing a specific socket.
> If it is known that the socket is being read from on another thread,
> is being sent on from another thread, or is registered with a
> selector, dup the file descriptor to preCloseFD
> ( dup2(preCloseFD, fd) ).
> 3) When all other threads/selectors are finished with the socket,
> close is called on the underlying file descriptor.
>
> This is a common strategy employed by the Java platform across TCP and
> UDP sockets as well, in order to support the semantics we require for
> asynchronous closing of these sockets.
>
> -Chris.
>
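The three closing steps described above can be illustrated with a small
sketch. It is written in Python for brevity and is not the JDK's actual
code; the names `make_preclose_fd`, `preclose` and `demo` are
hypothetical, and a Unix-domain socket pair stands in for the SCTP
socket. Note that the trace quoted further down reaches sctp_close() via
sys_dup2, which is consistent with step 2.

```python
import os
import socket


def make_preclose_fd():
    """Step 1: create a connected pair, close one end, keep the other."""
    keep, discard = socket.socketpair()
    discard.close()
    return keep.detach()               # raw descriptor, owned by the caller


def preclose(fd, preclose_fd):
    """Step 2: redirect fd to the half-dead pair.

    dup2 implicitly closes whatever fd referred to before, so the original
    socket is released here while the descriptor number stays valid.
    """
    os.dup2(preclose_fd, fd)


def demo():
    preclose_fd = make_preclose_fd()
    a, b = socket.socketpair()         # 'a' stands in for the socket being closed
    preclose(a.fileno(), preclose_fd)
    peer_sees = b.recv(16)             # EOF: the original socket was released
    reader_sees = a.recv(16)           # EOF at once instead of blocking
    a.close()                          # step 3: really close the descriptor
    b.close()
    os.close(preclose_fd)
    return peer_sees, reader_sees
```

The point of the design is that a thread still using the descriptor sees
end-of-file rather than a descriptor number that may already have been
reused for an unrelated file.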
> On 06/02/2010 11:10, Danny wrote:
>> Hi,
>>
>> Since I started using the lksctp lib and the SCTP API (JDK 7), and
>> only since then, I have been getting Kernel Failure warnings on
>> Fedora 11.
>> Looking into the details, I found out that it has something to do with
>> closing an sctp socket.
>>
>> I was wondering if others have this too. I haven't posted the failure
>> to the Linux Kernel group yet.
>> I would like some feedback from this mailing list first, to find out
>> whether I am the only one and should look into my system, or whether I
>> should forward the problem.
>> I am also not sure if this is lksctp related or SCTP API related. The
>> word 'kernel' makes me believe the former, but the cause could be in
>> the latter.
>>
>> Here is the trace :
>>
>> Kernel failure message 1:
>> ------------[ cut here ]------------
>> WARNING: at net/ipv4/af_inet.c:155 inet_sock_destruct+0x100/0x11e()
>> (Tainted: P W )
>> Hardware name: Aspire 8920
>> Modules linked in: sctp libcrc32c fuse rfcomm bridge stp llc bnep sco
>> l2cap vboxnetadp vboxnetflt vboxdrv sunrpc ip6t_REJECT nf_conntrack_ipv6
>> ip6table_filter ip6_tables ipv6 cpufreq_ondemand acpi_cpufreq freq_table
>> dm_multipath uinput nvidia(P) btusb snd_hda_codec_realtek bluetooth
>> snd_hda_intel snd_hda_codec arc4 ecb snd_hwdep snd_seq snd_seq_device
>> iwlagn snd_pcm uvcvideo iwlcore acer_wmi snd_timer rfkill video videodev
>> lib80211 i2c_i801 snd mac80211 iTCO_wdt v4l1_compat i2c_core atl1e
>> soundcore pcspkr v4l2_compat_ioctl32 wmi serio_raw iTCO_vendor_support
>> output joydev snd_page_alloc cfg80211 [last unloaded: microcode]
>> Pid: 5010, comm: java Tainted: P W 2.6.30.10-105.2.4.fc11.x86_64 #1
>> Call Trace:
>> [<ffffffff81049505>] warn_slowpath_common+0x84/0x9c
>> [<ffffffff81049531>] warn_slowpath_null+0x14/0x16
>> [<ffffffff813921da>] inet_sock_destruct+0x100/0x11e
>> [<ffffffff813337ca>] sk_free+0x23/0xf4
>> [<ffffffffa0e21dbc>] sctp_close+0x1c1/0x1d0 [sctp]   <---- see here
>> [<ffffffff8104e6cc>] ? local_bh_enable_ip+0xe/0x10
>> [<ffffffff813db5b8>] ? _write_unlock_bh+0x19/0x1b
>> [<ffffffff81391ce4>] inet_release+0x55/0x5c
>> [<ffffffffa0b52f32>] inet6_release+0x35/0x3a [ipv6]
>> [<ffffffff81330142>] sock_release+0x1f/0x6c
>> [<ffffffff813301b6>] sock_close+0x27/0x2b
>> [<ffffffff810e4988>] __fput+0xf9/0x1a0
>> [<ffffffff810e4a49>] fput+0x1a/0x1c
>> [<ffffffff810e1c2d>] filp_close+0x68/0x72
>> [<ffffffff810ef880>] sys_dup3+0x11e/0x143
>> [<ffffffff810ef8ed>] sys_dup2+0x48/0x4b
>> [<ffffffff81010c82>] system_call_fastpath+0x16/0x1b
>> ---[ end trace 061fb956b7b1c9d5 ]---
>>
>>
>> Greetz,
>> Danny
>
>