Scaling the YouTube Heights with MySQL


One of the comforting facts about open source enterprise software is that some very big names depend on it. Google, with its hundreds of thousands of servers running a custom version of Linux, is probably the best-known example, but practically all the leading Web 2.0 companies (del.icio.us, Digg, Flickr etc.) also utilise buckets of free software to run their sites.

As this recording of an entertaining talk by YouTube's Paul Tuckfield explains, Google (through YouTube) also uses a lot of MySQL, and pushes it very hard:

YouTube uses MySQL as the back-end. When Paul joined YouTube, he had 15 years of experience solving database scalability problems and administering computer networks. However, he was completely new to MySQL. Within weeks, the set of challenges he faced about scaling MySQL taught him so much more than one could learn over years. He's all excited about sharing his insights.

That's obviously an indication of how easy it is (well, relatively) to implement enterprise-level MySQL solutions. Tuckfield's talk offers some tips on how to attain YouTube-like levels of scalability using MySQL:

According to him, the three important reasons for YouTube's scalability are Python, Memcache and MySQL replication, the last having the most impact. Most people think that the answer to scalability is in upgrading hardware and CPU power. Adding CPUs doesn't work on its own; wisdom is in getting the maximum amount of RAM for the CPU and then fine tuning. He talks about replication in detail, sharing his experience in dealing with problems such as time lags in replication between master and slave, RAID caching, OS-level caching on Linux and caching at the database.
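To make the interplay between a cache, a master and lagging replicas a little more concrete, here is a minimal Python sketch. It is not YouTube's code and it does not talk to MySQL or Memcache at all: plain dictionaries stand in for the master, the replicas and the cache, and the class name `ReplicatedStore` and its methods are invented purely for illustration. It only shows the general pattern of sending writes to a master, serving reads from replicas through a cache, and what replication lag looks like to a reader.

```python
import random


class ReplicatedStore:
    """Toy model: one master, several read replicas, and a cache in front.

    Everything here is hypothetical; dicts stand in for real servers.
    """

    def __init__(self, replica_count=2):
        self.master = {}
        self.replicas = [{} for _ in range(replica_count)]
        self.cache = {}
        self.pending = []  # writes the replicas have not applied yet (the lag)

    def write(self, key, value):
        # All writes go to the master; replicas only catch up in replicate().
        self.master[key] = value
        self.pending.append((key, value))
        # Drop any cached copy so it is not served after the write.
        self.cache.pop(key, None)

    def replicate(self):
        # Apply queued writes to every replica in the order they were made.
        for key, value in self.pending:
            for replica in self.replicas:
                replica[key] = value
            # A value cached during the lag may be stale, so drop it too.
            self.cache.pop(key, None)
        self.pending.clear()

    def read(self, key):
        # Cache first, then a randomly chosen replica; cache whatever is found.
        if key in self.cache:
            return self.cache[key]
        value = random.choice(self.replicas).get(key)
        if value is not None:
            self.cache[key] = value
        return value
```

In this toy model, a read issued right after a write still sees the old data until `replicate()` runs, which is the kind of time lag between master and slave that the talk discusses. A real deployment would need to decide which reads can tolerate that lag and which must go to the master.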