Technology Notes


Scalr runs out of disk space if sendmail is not running!

I recently hit another quirky issue with Scalr.

Scalr generates a lot of email notifications through its cron jobs.  On my instance I have sendmail turned off so that these email notifications don’t go out.

After running continuously for about 4 months, the Scalr instance started failing with an “Out of diskspace” error.  This was very strange since there was still about 25GB free on the disk.

After checking all kinds of configurations and permissions, a Google search suggested a lack of free inodes as a possible cause.

A quick “df -i” confirmed it: IUse% was at 99 and only a handful of inodes were free.  Tracing where all the inodes had been consumed took longer.  After a lot of hunting I ended up at the /var/spool/clientmqueue directory, which is where outgoing mail queues up when sendmail is not running to deliver it.  An “ls” in this folder left my terminal “hanging” because there were so many files in it.  Even an “rm -rf *” in the clientmqueue folder will not work: the shell expands * into more file names than a single command can accept, and after some time you will see “Argument list too long”.
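The hunt for the directory that is eating the inodes can be shortened by counting entries per directory. The following is a minimal sketch, not the exact steps I used at the time: the function name top_inode_dirs is made up for illustration, and it assumes GNU find (for -printf), which is what most Linux distributions ship.

```shell
# Print the directories under a starting point that hold the most entries,
# largest first. Every file or directory consumes one inode.
# Usage: top_inode_dirs START_DIR [HOW_MANY]
# (top_inode_dirs is a hypothetical helper name, not a standard tool.)
top_inode_dirs() {
    start_dir=$1
    how_many=${2:-10}
    # -xdev keeps find on a single filesystem; %h prints the parent
    # directory of each entry, so counting identical lines counts entries.
    find "$start_dir" -xdev -printf '%h\n' 2>/dev/null |
        sort | uniq -c | sort -rn | head -n "$how_many"
}

# Example (read-only, safe to run):
#   df -i /var
#   top_inode_dirs /var 5
```

Because find streams its output, this keeps working on a directory where “ls” appears to hang, although it can still take a while on millions of files.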

Instead, run the “find . -type f -print0 | xargs -0 rm” command from inside the /var/spool/clientmqueue folder.  On my instance this command ran for a few hours due to the large number of files.  (You may want to run the command in the background inside a screen session.)
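For anyone who prefers not to depend on the current directory, the same find and xargs pipeline can be wrapped in a small function. This is only a sketch: the name purge_files and the safety check are my additions, not part of the original command.

```shell
# Delete every regular file under a directory without expanding a glob.
# Usage: purge_files DIR
# (purge_files is a hypothetical helper name used for illustration.)
purge_files() {
    target_dir=$1
    # Refuse to run on an empty or missing directory argument.
    [ -n "$target_dir" ] && [ -d "$target_dir" ] || return 1
    # find emits NUL-separated names and xargs passes them to rm in
    # batches that fit the argument-size limit, so the
    # "Argument list too long" error does not occur.
    find "$target_dir" -type f -print0 | xargs -0 rm -f
}

# Example (destructive, check the path first):
#   purge_files /var/spool/clientmqueue
```

The directory itself is left in place, so sendmail can keep using it if it is ever turned back on.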

Once this was done, inode utilization fell to 7% and the system has been fine since.  To avoid future issues, I added a cron job that deletes the files in the clientmqueue folder periodically.
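A periodic cleanup along these lines might look like the sketch below. The age threshold, the function name purge_old_files, the script path and the schedule are all assumptions made for illustration; the actual cron job simply deletes the files in the folder.

```shell
# Delete regular files in DIR that were last modified more than DAYS days ago.
# Usage: purge_old_files DIR DAYS
# (purge_old_files and the age filter are assumptions, not the original job.)
purge_old_files() {
    target_dir=$1
    days=$2
    [ -d "$target_dir" ] || return 1
    # -mtime +N matches files whose age in whole days is greater than N.
    find "$target_dir" -type f -mtime +"$days" -print0 | xargs -0 rm -f
}

# Hypothetical crontab entry (script path and schedule are assumptions),
# running a script that calls purge_old_files every day at 03:00:
#   0 3 * * * /usr/local/bin/purge-clientmqueue.sh
```

Keeping a few days of queued files is optional; with sendmail permanently off, deleting everything on each run works just as well.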

Note: Scalr is fantastic software that works great.  This blog post only addresses one of the issues I faced in running Scalr and has nothing to do with the Scalr software itself.

    • #Scalr
  • 7 months ago

Scalr cron daemon tasks lose MySQL database connections over time

We use Scalr to manage some of our cloud infrastructure.  Scalr is a great Open Source alternative to more pricey solutions and has a fantastic feature set that is just about enough.

Using Scalr in a self-hosted mode does have some challenges.  One of the most important components in the Scalr setup is the set of cron jobs that handle all the batch work.  Cron is an extremely significant part of Scalr because the messaging depends on tasks being run repeatedly.  Most tasks run on demand and terminate.

However, two tasks run indefinitely as daemons.  The first is SzrMessaging and the second is DBQueueEvent.

I had a strange issue on my setup where, after running for a day or thereabouts, the messages from the EC2 instances would never reach the Scalr site and vice versa.  Restarting apache/httpd, cron etc. didn’t help.  I finally figured out that restarting mysqld and killing the daemon threads set it right.

Digging deeper into the code of class.SzrMessagingProcess.php, I found that the DB connection, once obtained, is held for as long as the daemon is alive.  The daemon is killed only if its memory usage goes beyond a preset limit.

This approach evidently works for most setups, but unfortunately it did not for mine.  I modified the code after the memory limit check so that the daemon thread is also killed if it has been running for a few hours.
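The change itself lives in Scalr's PHP code, which is not reproduced here. The decision it makes can be sketched in shell as follows, purely to illustrate the logic: the function name should_exit, its parameters and the units are all hypothetical and do not come from the Scalr source.

```shell
# Return success (0) when the daemon should exit so that a fresh process,
# with a fresh DB connection, can take over.
# Usage: should_exit START_EPOCH NOW_EPOCH MAX_AGE_SECONDS MEM_KB MEM_LIMIT_KB
# (All names and units here are hypothetical, for illustration only.)
should_exit() {
    start=$1; now=$2; max_age=$3; mem=$4; mem_limit=$5
    # Existing rule: exit once memory use passes the preset limit.
    [ "$mem" -gt "$mem_limit" ] && return 0
    # Added rule: also exit once the process has been alive too long.
    [ $((now - start)) -ge "$max_age" ] && return 0
    return 1
}

# Example: a daemon started 4 hours ago with a 3 hour limit should exit.
#   should_exit 0 14400 10800 50000 100000 && echo "restart"
```

The age check is a blunt instrument, since it recycles a healthy daemon as well, but it guarantees that no connection is held for more than a few hours.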

I restarted cron, the tasks now run smoothly and the problem has gone away!  I don’t know why this happened on my setup, but it could be a gotcha for others as well.

    • #Scalr
  • 7 months ago

About

Miscellaneous notes on Mobile, Cloud and other interesting technologies
