The Difference Between TokuMX Partitioning and Sharding

In my last post, I described a new feature in TokuMX 1.5—partitioned collections—that’s aimed at making it easier and faster to work with time series data. Feedback from that post made me realize that some users may not immediately understand the differences between partitioning a collection and sharding a collection. In this post, I hope to clear that up.

On the surface, partitioning a collection and sharding a collection seem similar. Both take a collection and break it into smaller pieces for some performance benefit, and the two terms are sometimes used interchangeably when discussing other technologies. In TokuMX, however, the two features differ greatly in both purpose and implementation, so I'll describe each one's purpose and implementation in turn.