How to optimize Ganglia for too much Disk IO
My Ganglia server was having performance problems: both disk I/O and load average were far too high, so I decided to optimize it. My whole setup runs on AWS, and even with a 10,000 IOPS EBS volume attached to the server I was still hitting disk performance limits.
My Server Details:
8 CPU – 2.1 GHz
RAM – 8 GB
Number of hosts graphed – ~150
Current load on the server (iostat output):
avg-cpu: %user %nice %system %iowait %steal %idle
9.88 0.00 9.11 20.02 1.28 59.72
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
xvdap1 0.00 8.00 0.00 13.00 0.00 0.33 51.32 0.09 7.02 1.11 1.44
xvdh 0.00 473.00 98.60 9700.00 0.39 40.39 8.52 145.26 14.82 0.10 100.00
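The output above shows xvdh at 100% utilization with roughly 9,700 writes per second, which is close to the volume's 10,000 provisioned IOPS. As a rough sketch (the function name `busy_devices` and the 90% default threshold are my own choices, not part of Ganglia or sysstat), the saturated devices can be picked out of extended iostat output like this:

```shell
# Print every device whose %util is at or above a threshold (default 90).
# Reads extended iostat output on stdin; the %util column is located from
# the "Device" header line, so the exact column order does not matter.
busy_devices() {
    awk -v limit="${1:-90}" '
        $1 ~ /^Device/ {
            for (i = 1; i <= NF; i++) if ($i == "%util") col = i
            next
        }
        col && NF >= col && $col + 0 >= limit + 0 { print $1, $col }
    '
}

# Example usage on a live system (requires the sysstat package):
#   iostat -dxm 1 2 | busy_devices 90
```

Fed the two device lines shown above, this prints only the xvdh line, since xvdap1 is at 1.44%.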
I used rrdcached to reduce the disk I/O. The setup steps are below.
* Uninstall the existing rrdtool RPMs and install newer ones from RPMforge (or download them from the apt.sw.be site), as below:
rpm -e rrdtool --nodeps
rpm -e rrdtool-perl --nodeps
yum install -y perl-Time-HiRes
yum install -y libdbi
yum install -y xorg-x11-fonts-Type1
rpm -ivh rrdtool-1.4.7-1.el5.rf.x86_64.rpm perl-rrdtool-1.4.7-1.el5.rf.x86_64.rpm rrdtool-devel-1.4.7-1.el5.rf.x86_64.rpm
* gmetad runs as the ganglia user, rrdcached needs write access to the RRD files, and Apache needs access to the same directory. So add the ganglia user to the apache group:
usermod -a -G apache ganglia
* Give the apache group access to the RRD directory. In my case the RRDs are saved on the /ganglia partition; by default it's /var/lib/ganglia.
chown -R ganglia:apache /ganglia/rrds/
* Now change the rrdcached startup options:
OPTIONS="rrdcached -s apache -m 664 -l unix:/tmp/rrdcached.sock -s apache -m 777 -P FLUSH,STATS,HELP -l unix:/tmp/rrdcached.limited.sock -b /ganglia/rrds -B"
* Also update the gmetad startup variables so that gmetad uses the rrdcached socket file.
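The post does not show the exact variable, so the following is only a sketch under assumptions: that the gmetad init script sources /etc/sysconfig/gmetad (typical on RHEL/CentOS), and that gmetad is linked against an rrdtool version that reads the `RRDCACHED_ADDRESS` environment variable. The socket path is the unrestricted one from the OPTIONS line above.

```shell
# Assumed file: /etc/sysconfig/gmetad (sourced by the gmetad init script).
# RRDCACHED_ADDRESS is read by the rrdtool library; when it is set,
# updates go to the rrdcached daemon instead of straight to the RRD files.
RRDCACHED_ADDRESS="unix:/tmp/rrdcached.sock"
export RRDCACHED_ADDRESS
```

If your init script does not source a sysconfig file, exporting the same variable inside the script before gmetad starts should have the same effect.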
* Now tell the Ganglia web frontend to read from the socket file.
Set the variable below to the socket file location. By default it has no value.
$conf['rrdcached_socket'] = "/tmp/rrdcached.sock";
* Always make sure rrdcached is started before the gmetad process.
* Now monitor the Ganglia logs:
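One way to enforce that ordering is to wait for the rrdcached socket to appear before starting gmetad. This is a minimal sketch; the helper name `start_after_socket` is hypothetical, and the `service` commands in the usage comment assume SysV-style init scripts named rrdcached and gmetad.

```shell
# Run a command only once the given socket path exists.
# Usage: start_after_socket <socket-path> <max-wait-seconds> <command...>
start_after_socket() {
    sock=$1
    limit=$2
    shift 2
    waited=0
    while [ ! -e "$sock" ]; do
        if [ "$waited" -ge "$limit" ]; then
            echo "timed out waiting for $sock" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    "$@"
}

# Example usage (as root; service names are assumptions):
#   service gmetad stop
#   service rrdcached restart
#   start_after_socket /tmp/rrdcached.sock 30 service gmetad start
```

If the socket never shows up, the helper returns non-zero and gmetad is not started, which is easier to spot than gmetad quietly failing to reach the cache.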
tail -f /var/log/messages
If you don’t see an error like “Unable to connect to rrdcache: No such file or directory”, you can assume your rrdcached settings are correct.
Now check “ps aux | grep -i rrdcached”. If you see a couple of rrdcached processes running as the ganglia user, you are good.
If both of the checks above pass, you can consider rrdcached to be working fine.
* Alternatively, you can check whether rrdcached is working with:
while true; do clear ; echo STATS | socat - /tmp/rrdcached.sock; sleep 1; done
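To turn the STATS reply into a single number, you can compare how many data sets were written with how many disk updates it took. This is a sketch only: the function name `rrdcached_batching` is my own, and it relies on the `UpdatesWritten` and `DataSetsWritten` counters documented in the rrdcached man page.

```shell
# Summarise an rrdcached STATS reply read from stdin.
# DataSetsWritten / UpdatesWritten = average number of data sets
# flushed per disk write; higher means the cache is batching more.
rrdcached_batching() {
    awk -F': *' '
        $1 == "UpdatesWritten"  { writes = $2 }
        $1 == "DataSetsWritten" { sets = $2 }
        END {
            if (writes > 0)
                printf "%.1f data sets per disk write\n", sets / writes
            else
                print "no updates written yet"
        }
    '
}

# Example usage against the socket configured earlier:
#   echo STATS | socat - UNIX-CONNECT:/tmp/rrdcached.sock | rrdcached_batching
```

A value close to 1 suggests updates are going to disk almost one by one, so the cache is doing little batching.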
Now check your iostat output again and see the I/O difference. It should have decreased by at least 10 times. 🙂
There is another way to improve Ganglia performance: moving the RRD directory to tmpfs.