On Performance and Scalability
If you're an avid follower of the all-seeing, all-knowing GrabPerf, or just use the Blogdigger web site a lot, you may have noticed we've been sucking wind quite a bit lately. A number of factors have combined to push usage of Blogdigger to new levels. Although I haven't blogged much about it, dealing with this rapid growth has been pretty much the main focus of my time over the past two months. We have made several improvements, each of which, for one reason or another, soon became irrelevant as site usage continued to increase*. We pushed out some changes last night, which seem to have made things happy for now (fingers crossed, no jinxes). Feb. 1 was our highest traffic day yet, which says to me that if we can keep the site up and running, we've got an audience just waiting to use the service.
Folks talk about scalability, etc., but the truth of the matter is, when you're running on limited resources (Blogdigger currently runs on two boxes), you can design your software to be as scalable as you like, but until you have the money to grow your infrastructure, it's a bit of a moot point. I'd argue that is a good thing; it's easy to take a system, throw more hardware at it, and walk away without really attempting to understand where things are going wrong. The first step in addressing scalability should be to look at the application, your deployment, configuration, etc., and see if you can improve things without adding more hardware. From what I've seen, most performance problems come from misuse or misconfiguration of a software component, rather than limited resources. In other words, sometimes you don't need another database server; you just need more database connections in your pool.
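To make the connection pool point a bit more concrete, here's a rough sketch in Python. It is not Blogdigger's code, and every number in it is made up for illustration: `simulate_pool`, the 10 ms arrival spacing, and the 50 ms hold time are all assumptions. It simply models requests queuing for a fixed number of pooled connections and reports how long they wait.

```python
import heapq


def simulate_pool(pool_size, arrivals_ms, hold_ms):
    """Return how long (in ms) each request waits for a free connection.

    arrivals_ms: sorted request arrival times, in integer milliseconds.
    hold_ms: how long each request keeps its connection checked out.
    All of these are hypothetical inputs, not measurements from a real site.
    """
    # Min-heap holding the time at which each pooled connection is next free.
    free_at = [0] * pool_size
    heapq.heapify(free_at)

    waits = []
    for arrival in arrivals_ms:
        earliest = heapq.heappop(free_at)  # the connection that frees up soonest
        start = max(arrival, earliest)     # queue if nothing is free yet
        waits.append(start - arrival)
        heapq.heappush(free_at, start + hold_ms)
    return waits


if __name__ == "__main__":
    # Made-up load: 100 requests, one every 10 ms, each holding a connection 50 ms.
    arrivals = [i * 10 for i in range(100)]
    for size in (2, 5, 8):
        waits = simulate_pool(size, arrivals, hold_ms=50)
        mean = sum(waits) / len(waits)
        print(f"pool={size}: mean wait {mean:.0f} ms, max wait {max(waits)} ms")
```

Under these assumed numbers the database itself is never the bottleneck; the same "server" goes from an ever-growing queue to no queue at all just by allowing a few more connections. A real pool has other limits (memory, the database's own connection cap), so treat this as a way to reason about the setting rather than a recommendation for any particular value.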
Just so you know, we're continually monitoring the site for performance, and are always looking at ways to make things better. If you have ideas, questions, comments, please don't hesitate to let us know.
* - As an example, after we made a performance enhancement sometime in December, Bloglines started up their new crawler. Bloglines' new crawler is a hungry little bugger; it likes to visit each of its feeds twice an hour, independent of any other variables such as whether the feed has recently updated/pinged, how frequently new stuff is available on a feed, or what the ttl value for the feed is set to. Another example: just a few days after we pushed out a really nice performance enhancement (Steven from GrabPerf even emailed me to check if we'd gotten more boxes), one of our partner sites, Webjay, went and got acquired by Yahoo.
