The Economics of Google's Hardware Infrastructure
FastCompany has a piece titled "How Google Grows...and Grows...and Grows". It has some good insight into what makes Google successful. However, I found some more interesting financial calculations while reading "Web Search for a Planet: The Google Cluster Architecture", which explains that success from a more technical and financial perspective.
Because Google servers are custom made, we’ll use pricing information for comparable PC-based server racks for illustration. For example, in late 2002 a rack of 88 dual-CPU 2-GHz Intel Xeon servers with 2 Gbytes of RAM and an 80-Gbyte hard disk was offered on RackSaver.com for around $278,000. This figure translates into a monthly capital cost of $7,700 per rack over three years.
The cost advantages of using inexpensive, PC-based clusters over high-end multiprocessor servers can be quite substantial, at least for a highly parallelizable application like ours. The example $278,000 rack contains 176 2-GHz Xeon CPUs, 176 Gbytes of RAM, and 7 Tbytes of disk space. In comparison, a typical x86-based server contains eight 2-GHz Xeon CPUs, 64 Gbytes of RAM, and 8 Tbytes of disk space; it costs about $758,000. In other words, the multiprocessor server is about three times more expensive but has 22 times fewer CPUs, three times less RAM, and slightly more disk space. Much of the cost difference derives from the much higher interconnect bandwidth and reliability of a high-end server, but again, Google’s highly redundant architecture does not rely on either of these attributes.
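The arithmetic in the quoted passage is easy to check with a short sketch. The prices and hardware counts come straight from the quote; the straight-line, interest-free amortization and all variable names are my own assumptions, and the paper's $7,700 per month is presumably a rounded figure.

```python
# Figures quoted from the paper (late-2002 prices).
RACK_COST = 278_000      # rack of 88 dual-CPU servers, in dollars
RACK_CPUS = 176
RACK_RAM_GB = 176
SMP_COST = 758_000       # high-end multiprocessor server, in dollars
SMP_CPUS = 8
SMP_RAM_GB = 64

# Assumption: simple straight-line amortization over 36 months, no interest.
AMORTIZATION_MONTHS = 36
monthly_rack_cost = RACK_COST / AMORTIZATION_MONTHS

# How the two options compare.
cost_ratio = SMP_COST / RACK_COST        # "about three times more expensive"
cpu_ratio = RACK_CPUS / SMP_CPUS         # "22 times fewer CPUs"
ram_ratio = RACK_RAM_GB / SMP_RAM_GB     # "three times less RAM"

# Capital cost per CPU, which is where the gap really shows.
rack_cost_per_cpu = RACK_COST / RACK_CPUS
smp_cost_per_cpu = SMP_COST / SMP_CPUS

print(f"Monthly rack cost: ${monthly_rack_cost:,.0f}")
print(f"Cost ratio (server / rack): {cost_ratio:.2f}")
print(f"CPU ratio (rack / server): {cpu_ratio:.0f}")
print(f"RAM ratio (rack / server): {ram_ratio:.2f}")
print(f"Cost per CPU: rack ${rack_cost_per_cpu:,.0f}, server ${smp_cost_per_cpu:,.0f}")
```

Per CPU, the multiprocessor server works out to roughly 60 times the cost of the rack, which says more than the headline "three times more expensive" does.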
In short, Google doesn't invest in high-end hardware; instead it uses the commodity white-box hardware that you can find in a typical desktop PC. In fact, it doesn't even buy the fastest microprocessors; it buys the ones that give the best "cost per query". It's an extremely interesting article that goes into calculating depreciation, operating costs and even optimizing watts per unit of performance. It's impressive to see how tightly Google's financial viability is intertwined with its choice of hardware. Google has made sure that as the web grows, its infrastructure grows with it at a financially viable pace.
It's well known that Google puts together its own hardware. What struck me is that until you've seen it presented in naked numeric terms, you don't fully grasp the significance of the strategy. Commodity hardware inside the datacenter is going to be an extremely powerful trend, and it is in fact being driven by commodity software (i.e. open source). If software is priced per CPU or per server, its share of the total cost becomes prohibitively large. A few months ago I put together a decently equipped machine for around 300 dollars. If you add, say, a Microsoft OS that costs over 100 dollars, then software makes up at least one fourth of the overall cost. Multiply that by thousands of servers and you can see how big an obstacle it is.
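To put rough numbers on that, here is a minimal sketch of the per-server software share. The 300 and 100 dollar figures are the ones from my own machine above; the fleet size of 1,000 servers and the function name `software_share` are purely illustrative assumptions.

```python
def software_share(hardware_cost: float, license_cost: float) -> float:
    """Fraction of a server's total cost that goes to per-server software."""
    return license_cost / (hardware_cost + license_cost)


HARDWARE_COST = 300   # dollars, commodity box
LICENSE_COST = 100    # dollars, per-server OS license (lower bound)
FLEET_SIZE = 1_000    # hypothetical number of servers

share = software_share(HARDWARE_COST, LICENSE_COST)
fleet_license_bill = LICENSE_COST * FLEET_SIZE

print(f"Software share per server: {share:.0%}")
print(f"License bill for {FLEET_SIZE:,} servers: ${fleet_license_bill:,}")
```

The share per server does not change with fleet size, but the absolute license bill scales linearly with it, and the cheaper the hardware gets, the larger the software fraction becomes.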
However, simply slapping together hundreds or thousands of boxes isn't going to cut it. You've got to add some "manageability" into the equation. Presented against this backdrop, Grid computing now seems an extremely compelling proposition.

