TCP connection crash detection

Serge Wenger
Posted: Fri Aug 26, 2005 2:07 am
++++++++++++++ **************
+ TCP Server +-------<--------* TCP Client *
++++++++++++++ **************
Hello,
I am working on a TCP client (Linux 2.4.25, ELinOS) that connects to an external TCP
server (I cannot modify anything on the TCP server).
The client sends data to the server. If I switch off the server, the TCP
client sees an error only after more than 15 min.
How can I modify this timeout? On Windows (Winsock) the timeout is less
than one minute with the same program.
I tried setsockopt() with SO_SNDTIMEO and SO_RCVTIMEO (it returns 0, no error),
but it changes nothing.
Does anybody have an idea how to change this timeout?
Thanks
Serge

Jan Panteltje
Posted: Fri Aug 26, 2005 6:28 am
Well, assuming your client does a read() from the socket on a regular
basis, test for an error return from read(); it will return EOF if the server
shuts down.
You can also look for other errors and make a timeout loop:
time_t read_timer;
int saved_errno;

#define READ_TIMEOUT 30 /* seconds */

/* read from socket (the EAGAIN test below assumes a non-blocking socket) */
while(1)
    {
    if(debug_flag)
        {
        fprintf(stderr, "before read(): socketfd=%d http_buffer=%p http_bptr=%p\n",
            socketfd, http_buffer, http_bptr);
        }

    /* start the read timer */
    read_timer = time(0);
    while(1)
        {
        a = read(socketfd, http_bptr, content_length / 2);
        saved_errno = errno; /* the fprintf() calls below may change errno */
        if(debug_flag)
            {
            fprintf(stderr, "read() returned a=%d\n", a);
            }

        if(a > 0) break;
        else if(a == 0) /* EOF, server closed connection? */
            {
            fprintf(stderr, "read(): returned EOF (power failure, network, interference?)\n");
            report_read_timeout_flag = 1;
            restart_flag = 1;
            break;
            }

        if(debug_flag)
            {
            /* strerror() needs <string.h> */
            fprintf(stderr, "read() returned error because: %s\n", strerror(saved_errno));
            }

        if(saved_errno == EAGAIN) /* no data available yet */
            {
            /* test for read time out */
            if( (time(0) - read_timer) > READ_TIMEOUT)
                {
                fprintf(stderr, "timeout in read(): (power failure, network, interference?)\n");
                report_read_timeout_flag = 1;
                restart_flag = 1;
                break;
                }

            usleep(1000);
            continue;
            }
        else
            {
            fprintf(stderr, "fatal error in read() because: %s\n", strerror(saved_errno));
            report_read_timeout_flag = 1;
            restart_flag = 1;
            break;
            }
        } /* end while read timer */

    if(restart_flag) break;
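
The polling loop above can also be written with select() doing the waiting, so the process sleeps until data arrives or the timeout expires. This is only a minimal sketch under that assumption; read_with_timeout and its return codes are hypothetical and not taken from the program quoted above.

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_sec for data, then read().
 * Returns the number of bytes read (> 0), 0 on EOF,
 * -1 on error (errno is set), -2 if nothing arrived in time. */
ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_sec)
{
    fd_set rfds;
    struct timeval tv;
    int ready;

    for (;;) {
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;

        ready = select(fd + 1, &rfds, NULL, NULL, &tv);
        if (ready < 0 && errno == EINTR)
            continue;           /* interrupted by a signal: wait again (full timeout) */
        if (ready < 0)
            return -1;          /* select() itself failed */
        if (ready == 0)
            return -2;          /* nothing arrived within timeout_sec */
        return read(fd, buf, len);
    }
}
```

A timeout here only says that the peer was silent for timeout_sec seconds. For a server that rarely sends anything, silence alone does not prove that the server is down.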

Serge Wenger
Posted: Fri Aug 26, 2005 7:19 am
Quote: Well, assuming your client does a read() from the socket at a regular
basis, test for an error return in read(), it will return EOF if the
server shuts down.
You can also look for other errors and make a timeout loop:
[...]
Thanks for your suggestion, but that is exactly what I do. The problem is that
recv() (a blocking call) returns only after 15 min. Only then do I see the error
(EOF).
Serge

Jan Panteltje
Posted: Fri Aug 26, 2005 8:02 am
Yes, but if you use read() you should use a non-blocking socket!
For a complete program, see this (the routines below are taken from it);
please note the GPL license, violators will be sued.
http://panteltje.com/panteltje/mcamip/
hp = gethostbyname(server);
if(hp == 0)
    {
    fprintf(stderr,
        "gethostbyname: returned NULL cannot get host %s by name.\n", server);

    /* signal FD_SET (main) that this is no longer a valid filedescriptor */
    *socketfd = -1;
    return 0;
    }

/* gethostbyname() leaves the host address in network byte order */
bzero(&sa, sizeof(sa) );
bcopy(hp->h_addr, (char *)&sa.sin_addr, hp->h_length);
sa.sin_family = AF_INET;
sa.sin_port = htons( (u_short)port);
/* sa.sin_addr and sa.sin_port now in network byte order */

/* create a socket */
*socketfd = socket(hp->h_addrtype, SOCK_STREAM, 0);
if(*socketfd < 0)
    {
    fprintf(stderr, "socket failed\n");
    *socketfd = -1;
    return(0);
    }

/* set for nonblocking socket */
if (fcntl(*socketfd, F_SETFL, O_NONBLOCK) < 0)
    {
    return(0);
    }

sprintf(server_ip_address, "%s", inet_ntoa (sa.sin_addr) );
fprintf(stderr,
    "connecting to %s (%s) port %d timeout %d\n",
    server, server_ip_address, port, connect_to_http_server_timeout);

/* prevent the program from hanging if connect takes a long time, now a return 0 is forced. */

/* start the timer */
connect_timer = time(0);

/* keep testing for a connect */
while(1)
    {
    /* connect */
    a = connect(*socketfd, (struct sockaddr*)&sa, sizeof(sa) );
    if(a == 0) break; /* connected */
    if(a < 0)
        {
        if(errno == EISCONN) break; /* an earlier attempt already completed */

        if(debug_flag)
            {
            fprintf(stderr, "connect() failed because: ");
            perror("");
            }

        /* test for connect time out */
        if( (time(0) - connect_timer) > connect_to_http_server_timeout)
            {
            /* close the socket */
            close(*socketfd);

            /* set socketfd to invalid, it was valid! */
            *socketfd = -1;
            fprintf(stderr, "connect timeout\n");
            return 0;
            }

        usleep(1000); /* do not spin at full CPU speed while waiting */
        }/* end connect < 0 */
    }/* end while test for a connect */

/*
NOTE: KEEP_ALIVE is defined here, but the test below is for KEEP_ALIVE_
(trailing underscore), so as posted the keepalive block is compiled out.
*/
#define KEEP_ALIVE

#ifdef KEEP_ALIVE_
b = sizeof(int);
//int getsockopt (int SOCKET, int LEVEL, int OPTNAME, void *OPTVAL, socklen_t *OPTLEN-PTR)
a = getsockopt(*socketfd, SOL_SOCKET, SO_KEEPALIVE, &optval, &b);
if(debug_flag)
    {
    fprintf(stderr, "SO_KEEPALIVE: a=%d optval=%d\n", a, optval);
    }

optval = 1;
//int setsockopt (int SOCKET, int LEVEL, int OPTNAME, void *OPTVAL, socklen_t OPTLEN);
a = setsockopt(*socketfd, SOL_SOCKET, SO_KEEPALIVE, &optval, sizeof(int) );
if(debug_flag)
    {
    fprintf(stderr, "setsockopt returned a=%d b=%d\n", a, b);
    }

a = getsockopt(*socketfd, SOL_SOCKET, SO_KEEPALIVE, &optval, &b);
if(debug_flag)
    {
    fprintf(stderr, "getsockopt: SO_KEEPALIVE: a=%d optval=%d b=%d\n", a, optval, b);
    }
#endif // KEEP_ALIVE_

Serge Wenger
Posted: Mon Aug 29, 2005 8:44 am
Quote: Yes, but if you use read you should use non-blocking!
Thanks for your answer, but I have the same problem with a non-blocking socket.
The server never sends data, so I cannot use your method. The EOF arrives only
after 15 min. How can I change this timeout?
Help please.
Serge

Jan Panteltje
Posted: Mon Aug 29, 2005 9:17 am
Well, I dunno, but if the client only sends data and the server only listens,
and infrequently at that, then why not use a procedure / protocol that goes like:
start:
  client tries to connect to server
  server accepts connection
  client sends data
  client closes connection
  wait until enough data
  goto start
For example, in the case of an HTTP server to which you upload something, the client
would likely close the connection after uploading.
?

Grant Edwards
Posted: Mon Aug 29, 2005 10:20 am
On 2005-08-26, Serge Wenger <serge_wenger001@hotmail.com> wrote:
Quote: I work on a TCP client (Linux 2.4.25, ELinOS) that connects to
an external TCP server (I cannot modify anything on the TCP
server). The client sends data to the server. If I switch off
the server, the TCP client sees an error after more than 15
min.
15 minutes?!?? It should take 2 hours IF you've got the
KEEPALIVE option set. Without the KEEPALIVE option, it should
take forever.
Quote: How can I modify this timeout?
You apparently already have.
Quote: On Windows (Winsock) the timeout is less than one minute with the
same program.
I tried setsockopt() with SO_SNDTIMEO and SO_RCVTIMEO (it returns
0, no error), but it changes nothing.
It's the KEEPALIVE timer values that you need to change. The odd
thing is that you detect the link's absence in 15 minutes. The
default KEEPALIVE timeout is 2 hours.
Quote: Does anybody have an idea how to change this timeout?
Let's ask Google...
You can get at it via /proc or via a sysctl call:
http://www.linux.com/howtos/Adv-Routing-HOWTO/lartc.kernel.obscure.shtml
http://linux.about.com/od/commands/l/blcmdl7_tcp.htm
With a recent kernel you can tweak them on a per-socket basis:
http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg91834.html
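
For the per-socket variant, a minimal sketch could look like the following. It assumes a Linux kernel that supports the TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT socket options described in tcp(7); enable_keepalive is a hypothetical helper name and the values in the usage note are only examples.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: turn on keepalive and set its timers for one socket.
 * idle_sec:  idle time before the first probe is sent
 * intvl_sec: time between probes
 * count:     unanswered probes before the connection is dropped
 * Returns 0 on success, -1 on error (errno is set by setsockopt()). */
int enable_keepalive(int fd, int idle_sec, int intvl_sec, int count)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_sec, sizeof(idle_sec)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_sec, sizeof(intvl_sec)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof(count)) < 0)
        return -1;
    return 0;
}
```

With enable_keepalive(fd, 60, 10, 3), an idle connection to a dead peer should be reported after roughly 60 + 3 * 10 = 90 seconds. The system-wide defaults live in /proc/sys/net/ipv4/tcp_keepalive_time, tcp_keepalive_intvl and tcp_keepalive_probes.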

David Schwartz
Posted: Mon Aug 29, 2005 11:54 am
Quote: Does anybody have an idea how to change this timeout?
Follow the protocol specification. I'm assuming, since you didn't write
the server and can't change it, that there's a protocol specification you're
following. Are you dealing with a well-known protocol?
DS

Serge Wenger
Posted: Wed Aug 31, 2005 5:47 am
Quote: [...] why not use a procedure / protocol that goes like:
start:
  client tries to connect to server
  server accepts connection
  client sends data
  client closes connection
  wait until enough data
  goto start
You are right, but in my case the server does sometimes send something. It sends
only when it detects a customer event. This happens very rarely (once or twice
per day)...
Serge

Serge Wenger
Posted: Wed Aug 31, 2005 5:55 am
Quote: Follow the protocol specification. I'm assuming, since you didn't write
the server and can't change it, that there's a protocol specification
you're following. Are you dealing with a well-known protocol?
Exactly, I didn't write the server. No, there is no protocol specification and
no well-known port. The server is an RS232->Ethernet converter.
Serge

Serge Wenger
Posted: Wed Aug 31, 2005 7:18 am
"Len Holgate" <Len.Holgate@jetbyte.com> wrote in message news:
43138310$0$19759$cc9e4d1f@news.dial.pipex.com...
Quote (Grant Edwards): 15 minutes?!?? It should take 2 hours IF you've got the
KEEPALIVE option set. Without the KEEPALIVE option, it should
take forever.
Quote: You're confused.
The 15 mins he's seeing is probably the retransmission timeout eventually
expiring. See Stevens TCP/IP Illustrated Vol 1 (21.2). The client has sent
data. No response from the server has reached the client. TCP is
retransmitting the data using exponential backoff and it takes a while
before it gives up and errors. This should be around 9mins...
I think you are right. 15 min seems to be common
(http://www.developerweb.net/sock-faq/flatfaq.php#faq14)
Quote: The 2 hr keepalive is only relevant if you don't send any data over the
connection (and if keepalive is turned on).
Quote (Serge Wenger): On Windows (winsock) the timeout is less than one min with the
same program.
Quote: Have you shutdown the server or just pulled out a plug or turned off a
router? I would have expected Windows to take around the same time to
discover the problem.
I switched off the server, not the router.
Quote: It's best to use a protocol level timeout...
Not possible in my case.
Thanks for your help. Now I am looking for a way to modify the retransmission
timeout. It does not seem so easy...
Serge
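
The 15 minutes can be reproduced with a little arithmetic on the exponential backoff described above. The sketch below sums the waits when each timeout doubles the previous one up to a cap. The numbers in the usage note (15 retransmissions, 0.2 s first timeout, 120 s cap) are assumptions that match the usual Linux defaults, not values measured on the system in this thread, and total_backoff_seconds is a hypothetical name.

```c
/* Sum of the waits for the first transmission plus `retries` retransmissions,
 * when each timeout is double the previous one, starting at rto_min seconds
 * and never exceeding rto_max seconds. */
double total_backoff_seconds(int retries, double rto_min, double rto_max)
{
    double rto = rto_min;
    double total = 0.0;
    int i;

    for (i = 0; i <= retries; i++) {
        total += rto;           /* wait for this (re)transmission to time out */
        rto *= 2.0;             /* exponential backoff */
        if (rto > rto_max)
            rto = rto_max;      /* capped at the maximum timeout */
    }
    return total;
}
```

Under those assumptions, total_backoff_seconds(15, 0.2, 120.0) gives 924.6 seconds, about 15.4 minutes. A real connection starts from a timeout based on the measured round-trip time, so this is a lower bound rather than an exact figure. On Linux the retry count is the system-wide tcp_retries2 setting in /proc/sys/net/ipv4/; changing it affects every TCP connection on the machine.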

Grant Edwards
Posted: Wed Aug 31, 2005 7:49 am
On 2005-08-31, Serge Wenger <serge_wenger001@hotmail.com> wrote:
Quote: You're confused.
Could be. :)
Quote: The 2 hr keepalive is only relevant if you don't send any data over the
connection (and if keepalive is turned on).
I was under the impression he wasn't sending data. If he is,
then it is the transmit failing (and the 15 minutes makes more
sense).
Quote: Thanks for your help. Now I am looking for a way to modify the retransmission
timeout. It does not seem so easy...
You could enable KEEPALIVE, and then change the keepalive
timeout.

David Schwartz
Posted: Wed Aug 31, 2005 11:20 am
"Serge Wenger" <serge_wenger001@hotmail.com> wrote in message
news:df4aml$q8r$1@atlas.ip-plus.net...
Quote: It's best to use a protocol level timeout...
not possible in my case.
Are you really, absolutely, 100% sure?
Quote: Thanks for your help. Now I am looking for a way to modify the retransmission
timeout. It does not seem so easy...
It's very hard to believe this could possibly be the right solution.
DS

All times are GMT - 5 Hours