Ubuntu + sysklogd + remote logging = slow
Tuesday, July 21, 2009 11:01:32 PM
Fortunately, today I found the reason why it was slow, and how to "solve" it.
Details
- Operating system
- Ubuntu server 8.04
- linux 2.6.24-24-server SMP
- Hardware
- 1 AMD 64 CPU, single core
- Package versions
- klogd 1.5-1ubuntu1
- sysklogd 1.5-1ubuntu1
/etc/default/syslogd contains "-r" as the only parameter. This parameter has been added because this machine receives logs from many devices.
The symptoms
Opening a ssh connection to that machine was taking too long, about a few minutes. However, once the shell was open, the ssh session was fast and responsive. CPU usage showed around more than 90% of idle CPU time. Thus, that the machine was not overloaded at all.
Not just ssh was taking too long, but also Apache. Even though there was more than 90% of free CPU time and lots of unused bandwidth, web pages served from that machine were being delivered after some long seconds of waiting.
The clue
Then... I ran crontab -l to print the current crontab (actually, it wasn't me who ran it, but nevermind), and it took about half minute to print the output... Wait a minute! That is very odd, since it should print the output instantly!
Okay, now I know a simple command that triggers that slow-ish behavior. I just need to find out exactly what is slowing it down. For this job, I'll use strace, which is a command that traces all system calls and signals of a process.
strace -o LOG -r -f crontab -l
The above command will run crontab -l, follow any forks (-f), log everything to a file called LOG, and also print a relative timestamp before each system call (-r). With this, I was able to track down the exact issue: it was taking about 40 seconds to send a message through /dev/log socket.
Then, I decided to write a small Python program that just writes some test string to syslog and nothing more. I wanted to reproduce this issue on my own, just o be sure I am on the right track.
import socket
#client = socket.socket(socket.PF_FILE, socket.SOCK_DGRAM)
try:
client = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
client.connect("/dev/log")
except socket.error:
client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
client.connect("/dev/log")
client.send("%s: TESTING - writing directly to /dev/log" % sys.argv[0])
client.close()
Yeah, this issue is reproducible. This simple Python program also spent about 30~40 seconds running.
Google time
After a quick search on Google, I found basically two bugs:
- Bug #26986 in sysklogd (Ubuntu): “Remote syslog logging paralyzes system”
- #334295 - slow syslogd makes system unusable - Debian Bug report logs
It seems that very few people have remote logging enabled in syslog, and probably this is the reason why there are not many reports about this issue.
What I've learnt is that sysklogd might have a bug somewhere, but also it seems to do a reverse DNS lookup for each line it receives. Since this machine receives a lot of log (coming from remote machines and also generated from local services), this might explain the slowdown. What's more, this same machine also runs Bind and acts as a DNS server, and named (the Bind daemon) also logs on syslog.
I've read that one guy could "solve" the slowdown by adding entries to the /etc/hosts file. That's not a good solution for me.
I've also read that Ubuntu was planning to replace sysklogd by rsyslog, and also that another guy "solved" his issue by doing exactly this.
Solution
So, I removed sysklogd (and klogd) and installed rsyslog. So far, everything is working fine, no more slowdowns.
Questions?
Huh... Apache uses its own logging code, and thus does not use syslog. So... Why apache was slowed down by sysklogd? I have no idea.
Could I have used syslog-ng instead of rsyslog? Probably yes.








How to use Quote function: