I’ve recently moved one of the products I maintain to a new server, because it wasn’t performing as well as it was supposed to. In the process I’ve spent some time tweaking the server and simplifying the setup, and the following is an overview of the tools I’ve found most useful.
First, a basic idea of what the application does: TidTilMere is a highly specialized administration app for Danish folk high schools (højskoler). It is built around the idea of having a single place to store information on everyone the school has had contact with, and it therefore spans many different areas.
The application is used on a daily basis by the school’s staff (1-4 people), and has to be very responsive. It is not likely to win any awards for front-end design, but it gets the job done. If you want to know more (and happen to speak Danish), you can visit TidTilMere.net.
With that out of the way, let’s get to the setup.
I’ve been hosting the application with HEXONET for the past two years, and have been very happy with them. I went with them because they host out of Germany, which means lower round-trip times than a server farm in the US (~20ms vs. ~150ms), and we wanted the application to be very responsive.
HEXONET only sells to companies (not individuals), but has a wide range of virtual server offerings. The few times I’ve needed to contact support, they’ve been quick in getting back to me. The old server was a “vServer Platinum Pro”, whereas the new server is a “dServer VNS Premium”, which translates into 28€ for a Debian etch with 1500MHz CPU, 45GB HD, 1000GB traffic and 1024MB RAM (1536MB burstable). You can see more options on their feature/price list.
If you need to service customers in the EU, I think HEXONET is a good option.
A couple of weeks ago I switched all my applications from Apache 2 + Mongrel to Apache 2 + Passenger, and I’m glad I did. I haven’t noticed any performance gains or hits, but the applications are much easier to deploy and maintain than before, and installation was simpler than getting Mongrel to work with Apache 2 through mod_proxy_balancer.
Passenger automatically launches additional workers according to load, and shuts them down again when they are no longer needed. A side effect of this is that if no one accesses your site for 5 minutes (the default), all the workers are killed, and Passenger has to start a new worker and load the Rails environment when the next visitor comes by. This can take some time, and proved a little annoying for my users, because they were often busy with other tasks for 5 minutes or more. Fortunately, it is very easy to set up a cron job (or something similar) that polls your application every once in a while, to make sure at least one worker is always ready:
# m h dom mon dow command
*/3 * * * * wget -O /dev/null http://localhost:80/admin/login --no-check-certificate 2>/dev/null
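If you would rather do the polling from Ruby, for instance to log how long the warm-up request took, a minimal sketch could look like the following. The method name `warm_up`, the `COLD_START_SECONDS` threshold and the injectable `fetcher` argument are my own illustrative choices and not part of Passenger. The URL is the one from the cron line above. Unlike wget, this sketch does not follow redirects.

```ruby
require "net/http"
require "uri"
require "benchmark"

# Hypothetical threshold: a slower response is treated as a likely cold start.
COLD_START_SECONDS = 2.0

# Requests the URL once and reports the status code and the elapsed time.
# `fetcher` can be swapped out so the logic can be tried without a network.
# Note: Net::HTTP.get_response does not follow redirects.
def warm_up(url, fetcher: ->(uri) { Net::HTTP.get_response(uri) })
  uri = URI.parse(url)
  response = nil
  elapsed = Benchmark.realtime { response = fetcher.call(uri) }
  {
    code: response.code.to_i,
    seconds: elapsed,
    cold: elapsed > COLD_START_SECONDS
  }
end
```

Calling `warm_up("http://localhost:80/admin/login")` from a script run by cron would then do roughly the same job as the wget line, and the returned hash can be written to a log to see how often a cold start actually happens.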
I highly recommend giving Passenger a try.
I use Ruby Enterprise Edition together with Passenger, to improve performance and decrease memory consumption.
Recently though, comparisons here and here indicate that the tides may be changing, and that Ruby 1.9 or JRuby might be faster. For now, I’m sticking with REE, since Passenger doesn’t support JRuby yet, and Ruby 1.8.x is still what the Rails people recommend. I’m keeping an eye out though.
Monit is a monitoring tool for Unix systems that can monitor processes, files etc. It can both alert the administrator when something is wrong, and in many cases automatically correct the problem.
I found Monit to be crucial when I had my application on Mongrel, because a Mongrel instance would crash once in a while. Monit would then automatically restart the instance and notify me, without the users experiencing any down-time. Now that Passenger takes care of always having running workers, Monit rarely has to restart anything, but I still count on it for notifying me if anything out of the ordinary happens.
I use Monit for monitoring my system for abnormal load, for checking that apache, mysql and my mailer daemon are running (and automatically restarting them if they are not), and for making sure that I don’t accidentally run out of disk space (this has happened once or twice, and took me longer than I liked to pinpoint). This is an example from my Monit configuration file, which also shows some of Monit’s power:
set daemon 60 # Poll at 1-minute intervals
set alert erik@underbjerg.com

# Check for abnormal load
check system localhost
  if loadavg (1min) > 4 for 3 cycles then alert
  if loadavg (5min) > 3 for 3 cycles then alert
  if memory usage > 90% then alert
  if cpu usage (user) > 70% for 5 cycles then alert
  if cpu usage (system) > 30% for 5 cycles then alert
  if cpu usage (wait) > 20% for 5 cycles then alert

# Check apache2
check process apache with pidfile /var/run/apache2.pid
  start program = "/etc/init.d/apache2 start"
  stop program = "/etc/init.d/apache2 stop"
  if cpu > 60% for 2 cycles then alert
  if cpu > 80% for 5 cycles then restart
  if totalmem > 200.0 MB for 5 cycles then restart
  if children > 250 then restart
  if loadavg(5min) greater than 10 for 8 cycles then stop
  if failed host localhost port 80 protocol http
    with timeout 15 seconds
    then restart
  if 3 restarts within 5 cycles then timeout
  group server

# Check that mysql is running
check process mysql with pidfile /var/run/mysqld/mysqld.pid
  start program = "/etc/init.d/mysql start"
  stop program = "/etc/init.d/mysql stop"

# Check that the ar_sendmail mailer daemon is running
check process ar_sendmail
  with pidfile /home/erik/apps/hojskole_sys/shared/log/ar_sendmail.pid
  start program = "/etc/init.d/ar_sendmail start" with uid erik and gid erik
  stop program = "/etc/init.d/ar_sendmail stop" with uid erik and gid erik
  if totalmem is greater than 65.0 MB for 2 cycles then restart # eating up memory?
  if loadavg(5min) greater than 10 for 8 cycles then restart # bad, bad, bad
  if 20 restarts within 20 cycles then timeout # something is wrong, call the sys-admin
  group ar_sendmail

# Check disk space
check device vzfs with path /
  if space usage > 85% then alert
  group server
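The “for N cycles” part of these rules is what keeps Monit from reacting to a single spike. As I understand it, the condition has to hold on N consecutive polls before the action is taken. The following Ruby sketch is only a rough model of that idea, not Monit’s actual implementation, and the class name `CycleRule` is made up for the illustration.

```ruby
# Rough model of a rule such as "if loadavg (1min) > 4 for 3 cycles then alert".
# It fires only once the value has exceeded the threshold on `cycles`
# consecutive polls; any poll below the threshold resets the streak.
class CycleRule
  def initialize(threshold:, cycles:)
    @threshold = threshold
    @cycles = cycles
    @streak = 0
  end

  # Feed one sample per poll. Returns true when the rule should fire.
  def observe(value)
    @streak = value > @threshold ? @streak + 1 : 0
    @streak >= @cycles
  end
end

rule = CycleRule.new(threshold: 4, cycles: 3)
samples = [5, 5, 3, 5, 5, 5]
fired = samples.map { |load| rule.observe(load) }
# fired => [false, false, false, false, false, true]
```

With a 60-second poll interval, a rule like this would need the load to stay high for about three minutes before anyone is alerted.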
So if you don’t already have a way to automatically check the health of your server, I recommend giving Monit a try. Note that Monit is in no way specific to Rails, so you should be able to use it for any kind of Unix server.
The New Relic RPM performance monitoring tool is a real gem that I wish I had had when I was trying to optimize performance two years ago. It is an amazing piece of software that plugs right into your Rails app and gives you detailed breakdowns, for each browser request, of where time is spent. It’s incredibly easy to install, and you can start using it in development mode right away.
It has never been this easy to figure out exactly where to optimize your Rails app.
New Relic is free to use in development mode on your local machine, where you can get extremely detailed info, and in production, where the Lite edition gives you basic metrics for the last 30 minutes. If you want more metrics or longer time periods in production, you will have to buy a licence, which I think is fair, since the Lite edition is already so useful. So far, I’ve gotten tremendous results out of using the Lite edition on my local machine, where I have been able to find exactly the places that were slow, and in production, where I’ve been able to see how the system was performing under real load.
If you are running a Rails application in production, and if you are just the slightest bit interested in how it performs (why wouldn’t you be?), download New Relic right now.
I hope you have found this overview useful. Please feel free to leave any comments, suggestions or questions you might have, and I’ll try to answer them.
there is a mod rails setup which you can set to have it not “turn off” your rails app for X seconds. Setting it high is almost as good as your cron suggestion
-=r
Thanks for the tip faith – I’ll check it out. I thought there might be something like that, but I wanted to make sure that there was no startup-delay, even if no-one had accessed the application for several hours.