There has been a lot of discussion lately about scalability, brought about by Friendster's move to PHP. Once again, I am amazed at how many people don't understand what scalability means (even though I'm glad to see fewer and fewer people misspelling it). Scalability means "How well a solution to some problem will work when the size of the problem increases" (from Dictionary.com). This is interpreted in drastically different ways, and you can find my interpretation in What Is Scalability?.
Before I continue, let's look at some of the clueful comments from Joyce Park's blog entry:
Rasmus Lerdorf writes:
Scalability is gained by using a shared-nothing architecture where you can scale horizontally infinitely. A typical Java application will make use of the fact that it is running under a JVM in which you can store session and state data very easily and you can effectively write a web application very much the same way you would write a desktop application. This is very convenient, but it doesn't scale. To scale this you then have to add other mechanisms to do intra-JVM message passing which adds another level of complexity and performance issues. There are of course ways to avoid this, but the typical first Java implementation of something will fall into this trap.
PHP has no scalability issues of this nature. Each request is completely sandboxed from every other request and there is nothing in the language that leads people towards writing applications that don't scale.
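To make Rasmus's point concrete, here is a minimal Python sketch that does not come from any of the quoted posts. It simulates two application servers behind a round-robin load balancer. In the first variant, each server keeps session data in its own memory, much like state stored inside a single JVM. In the second, every request is sandboxed and reads session data from one shared external store (a plain dict standing in for a database or cache). All class and function names are hypothetical.

```python
from itertools import cycle


class InProcessServer:
    """Keeps session state in its own memory, like state held inside one JVM."""

    def __init__(self):
        self.sessions = {}

    def handle(self, session_id, item=None):
        cart = self.sessions.setdefault(session_id, [])
        if item is not None:
            cart.append(item)
        return list(cart)


class SharedNothingServer:
    """Holds no state between requests; session data lives in an external store."""

    def __init__(self, store):
        self.store = store  # hypothetical stand-in for a database or cache

    def handle(self, session_id, item=None):
        cart = list(self.store.get(session_id, []))
        if item is not None:
            cart.append(item)
            self.store[session_id] = cart
        return cart


def run(servers, requests):
    """Send each request to the next server in round-robin order; return the last response."""
    balancer = cycle(servers)
    response = None
    for session_id, item in requests:
        response = next(balancer).handle(session_id, item)
    return response


requests = [("alice", "book"), ("alice", "pen"), ("alice", None)]

in_process = run([InProcessServer(), InProcessServer()], requests)
store = {}
shared_nothing = run([SharedNothingServer(store), SharedNothingServer(store)], requests)

print(in_process)      # ['book']  ('pen' landed in the other server's memory)
print(shared_nothing)  # ['book', 'pen']
```

The shared-nothing variant does not make state disappear. It moves that state into a dedicated external store, and that store can then become the component that limits growth.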
Harry Fuecks writes (in response to someone citing performance benchmarks to support a scalability argument):
But performance != scalability.
Joyce Park writes (in response to someone suggesting that Friendster's Java developers must have been sub-par):
1) We had not one but TWO guys here who had written bestselling JSP books. Not that this necessarily means they're great Java devs, but I actually think our guys were as good as any team.
2) We tried rewriting the site in Java twice, using MVC and all available best practices. It actually got slower. Anyway, what does MVC have to do with speed or scalability? I thought it was a design cleanliness and maintainability thing.
3) We tried different app servers, different JVMs, different machines.
4) Anything that money could do, it did.
There has been a lot of discussion elsewhere, too. Harry Fuecks explains that The J2EE guy still doesn't get PHP and discusses Why PHP Scales. Harry understands what scalability means and takes the time to try to explain it to everyone else. If you have read The PHP Scalability Myth or think that scalability is a measure of performance (or both), please take the time to read what Harry has written.
Jeff Moore, in The PHP scalability saga continues, writes:
I think I'll end this post with heresy. The field of web development seems to have a mental model of application development forged from the dot-com boom era. We operate with the vision that our applications are going to experience exponential usage growth. Perhaps this leads to an unhealthy focus on scalability in web applications versus other requirements. Perhaps this also leads us to employ optimizations prematurely before we can even understand their impact or even have a need for them. Perhaps these premature optimizations even hurt scalability and performance and needlessly complicate our applications.
Perhaps the Java Culture is more infected with "dot-com-itis" than the php culture?
George writes:
Technical details aside, I think PHP can be made to scale because so many people think that it can't. This skepticism means that people buy into the fact that it takes hard work and intelligent design to make a PHP-based system work right. 'Intelligent design' doesn't mean adhering to MVC or design patterns, writing OO code or assembler. It means looking at your system as a whole, figuring out what it needs to do, and then devising a plan for doing that as cheaply as possible. The critical bit, of course, is that you need to put that sort of work into any large architecture; PHP doesn't magically scale 'naturally', but neither will planting a Java Bean in your backyard create a magic scalable beanstalk.
His entire "answer" is very informative, even if most of it is obvious. Sometimes what people need is for someone to stand up and state the obvious, and I think now is such a time.
Of course, there are plenty of people who aren't as clueful as Harry and George. Unfortunately, it's difficult to know who to listen to. John Lim says "High Performance, High Scalability PHP is a Lie". I assume that he just wanted a nice headline, but his statement couldn't be further from the truth.
Last October, I briefly answered the question What Is Scalability?. Perhaps my use of Big O notation wasn't the best approach, since most people who truly understand my point likely already know what scalability means. A simpler explanation might be better. In fact, we need to eliminate computers from the explanation altogether, because that alone seems to confuse people.
Compare a truck and a tractor (hypothetically). To simplify our comparison, let's assume that both have the exact same towing capacity (this might be unrealistic, but such is the beauty of hypothetical situations). With no load, the truck has a maximum speed of 125 mph (about 200 kph), and the tractor has a maximum speed of 15 mph (about 25 kph). With a load equivalent to their maximum towing capacity, the truck has a maximum speed of 45 mph (about 70 kph), and the tractor has a maximum speed of 10 mph (about 15 kph). Which scales better? If you think the truck does, you're wrong. Although the truck is faster in all cases (loaded, it is even faster than the tractor with no load), it slows down the most under load, proportionately: it keeps only 36% of its unloaded speed, while the tractor keeps about 67% of its own.
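The comparison can be checked with a few lines of Python. Here, as in the analogy, scalability is measured by the fraction of unloaded top speed each vehicle keeps at full load; the function name is illustrative only.

```python
def retained_fraction(unloaded_mph, loaded_mph):
    """Fraction of top speed kept at full towing load (1.0 means no slowdown)."""
    return loaded_mph / unloaded_mph


vehicles = {
    "truck": (125, 45),
    "tractor": (15, 10),
}

for name, (unloaded, loaded) in vehicles.items():
    kept = retained_fraction(unloaded, loaded)
    print(f"{name}: {loaded} mph loaded, keeps {kept:.0%} of its unloaded speed")

# truck: 45 mph loaded, keeps 36% of its unloaded speed
# tractor: 10 mph loaded, keeps 67% of its unloaded speed
```

Performance (the loaded speed) favors the truck, while scalability (the retained fraction) favors the tractor, which is exactly Harry's point that performance != scalability.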
If you're only concerned with speed, you should choose a Ferrari Modena rather than decide between the truck and the tractor. If you're only concerned with scalability (which is highly unlikely), you should choose the tractor. If you're concerned with the best combination of speed and scalability, the truck is a good choice.
So how does scalability apply to the Web? First, you should ask yourself whether the Web's fundamental architecture is scalable. The answer is yes. Some people will describe HTTP's statelessness in a derogatory manner. The more enlightened people, however, understand that this is one of the key characteristics that make HTTP such a scalable protocol. What makes it scalable? With every HTTP transaction being completely independent, the amount of resources necessary grows linearly with the number of requests received. In a system that does not scale (where "does not scale" means that it scales poorly), the amount of resources necessary would increase at a higher rate than the number of requests. While HTTP has its flaws (the proper spelling of referrer being one), there's no arguing that it scales, and this is one of the things that made the Web's early explosive growth less painful than it otherwise would have been.
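A toy model can illustrate the difference between linear and faster-than-linear growth. It assumes, purely for illustration, that a stateless request costs one unit of work, while in a hypothetical stateful design each new request must also synchronize with every request already in the system. The cost units and function names are made up.

```python
def stateless_cost(requests):
    """Each request is independent, so total work grows linearly."""
    return requests * 1


def coordinated_cost(requests):
    """Hypothetical design: each request also syncs with every earlier request."""
    return sum(1 + earlier for earlier in range(requests))


for n in (10, 100, 1000):
    print(n, stateless_cost(n), coordinated_cost(n))

# 10 10 55
# 100 100 5050
# 1000 1000 500500
```

Multiplying the number of requests by ten multiplies the stateless cost by ten, but multiplies the coordinated cost by roughly a hundred; that faster-than-linear growth is what "does not scale" means here.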
The present discussion is about developing Web applications that scale well, and whether particular languages, technologies, and platforms are more appropriate than others. My opinion is that some things scale more naturally than others, and Rasmus's explanation above touches on this. PHP, when compiled as an Apache module (mod_php), fits nicely into the basic Web paradigm. In fact, it might be easier to imagine PHP as a new skill that Apache can learn. HTTP requests are still handled by Apache, and unless your programming logic specifically requires interaction with another source (database, filesystem, network), your application will scale as well as Apache (with a decrease in performance based upon the complexity of your programming logic). This is why PHP naturally scales. The caveat, interaction with another source such as a database, is why your PHP application may not scale.
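The caveat can be sketched the same way. In this hypothetical model, each Apache/PHP web server handles a fixed number of requests per second on its own, so adding servers adds capacity linearly, until every request also needs one shared backend (such as a database) with its own fixed capacity. The numbers are placeholders, not measurements, and the function name is made up.

```python
def max_throughput(web_servers, per_server_rps, shared_backend_rps=None):
    """Requests per second the whole system can sustain.

    Web servers share nothing, so their capacity adds up. If every request
    also touches one shared backend, that backend caps the total.
    """
    web_capacity = web_servers * per_server_rps
    if shared_backend_rps is None:
        return web_capacity
    return min(web_capacity, shared_backend_rps)


for servers in (1, 2, 4, 8):
    print(
        servers,
        max_throughput(servers, per_server_rps=500),
        max_throughput(servers, per_server_rps=500, shared_backend_rps=1500),
    )

# 1 500 500
# 2 1000 1000
# 4 2000 1500
# 8 4000 1500
```

Past four servers, adding more web servers buys nothing in the second column; the scaling work has moved to the shared backend.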
A common (and somewhat trite) argument being tossed around is that scalability has nothing to do with the programming language. While it is true that language syntax is irrelevant, the environments in which languages typically operate can vary drastically, and this makes a big difference. PHP is much different from ColdFusion or JSP. In terms of scalability, PHP has an advantage, but it loses a few features that some developers miss (which is why there are efforts to create application servers for PHP). The PHP versus JSP argument should focus on environment; otherwise, the point gets lost.
I actually disagree with George's statement, "PHP doesn't magically scale 'naturally'". Of course, I understand and agree with the spirit of what he's trying to say, which is that using PHP isn't going to make your applications magically scale well, but I do believe that PHP has a natural advantage, as I just described. Rasmus seems to agree with me, and George might also agree, despite his statement.
I think PHP scales well because Apache scales well because the Web scales well. PHP doesn't try to reinvent the wheel; it simply tries to fit into the existing paradigm, and this is the beauty of it.