Scaling MySQL: thoughts on replication, sharding, and MySQL Cluster
I’ve been reading through Building Scalable Web Sites, by Cal Henderson, and he makes a useful distinction between speed and scalability. For instance, PHP is comparatively slow. If you really want your program to run fast, write it in C or something similar. However, PHP applications can rather easily be made to scale horizontally; so, while PHP is not fast, it is scalable. Flickr and Friendster are examples of PHP working on a pretty large scale.
MySQL Replication and Sharding
MySQL is another animal altogether. For instance, MySQL replication only natively allows for a single master (CORRECTION: it is possible to have a multi-master setup, but it’s not very fault-tolerant). Since every write has to go through that one master, that’s a pretty concrete ceiling on UPDATE and INSERT handling. If you reach that ceiling, assuming you’ve expended every reasonable effort to improve performance, you have to start sharding. Since sharding means recoding the part of your application that is coupled to the database setup, I would call it a failure to scale on the part of the database solution. Let’s face it, you lose a lot of the great features of your relational database when you shard (each type of sharding represents a different compromise). Not to mention, even after you shard, those master DBs are still single points of failure. Automatic fail-over can mitigate that issue, but it’s not very classy.
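To make that coupling concrete, here is a minimal sketch of what shard routing tends to look like once it leaks into application code. It assumes a simple modulo scheme keyed on a user ID; the shard names, host names, and the functions shard_for and group_by_shard are all hypothetical and only there for illustration, not taken from any real setup.

```python
# Hypothetical shard map: names and hosts are made up for illustration.
SHARDS = [
    {"name": "shard0", "master": "db0-master.example.internal"},
    {"name": "shard1", "master": "db1-master.example.internal"},
    {"name": "shard2", "master": "db2-master.example.internal"},
]


def shard_for(user_id: int) -> dict:
    """Pick the shard that owns a user's rows using simple modulo hashing."""
    return SHARDS[user_id % len(SHARDS)]


def group_by_shard(user_ids):
    """Group user IDs by owning shard.

    A query that used to be one SELECT ... WHERE user_id IN (...) now has to
    be issued once per shard and merged in the application.
    """
    groups = {}
    for uid in user_ids:
        groups.setdefault(shard_for(uid)["name"], []).append(uid)
    return groups


if __name__ == "__main__":
    print(shard_for(7)["master"])
    print(group_by_shard([1, 2, 3, 4]))
```

Notice that anything spanning users (a JOIN, a transaction, an aggregate) now has to be stitched together by hand across shards, and adding a fourth shard changes the result of the modulo for most existing IDs. That is the kind of compromise I mean.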
MySQL Cluster
MySQL Cluster seems like a better approach to scaling MySQL, since it is a more horizontal solution. SQL nodes and data nodes can be added fairly easily, and data redundancy within a cluster is easy to set up. On top of that, clusters can be replicated, allowing for a master-slave relationship between clusters.
In reality, MySQL Cluster isn’t quite ready for the enterprise, and I don’t imagine it plays a central role in any of the big guys’ scaling schemes quite yet. Bugs, along with major limitations such as support for only READ COMMITTED transaction isolation and the NDB storage engine ignoring foreign keys, mean MySQL Cluster still has some growing up to do before even the bold can use it for business-critical data management.






