How to Back Up and Restore a 10-TB Cluster at 1+ GB/s

Backing up or restoring a large-scale distributed database is time-consuming. When a backup or restore runs for a long time, garbage collection (GC) might break the snapshot that the process relies on, so some changes might be missing from the result. This threatens data safety.
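To make this failure mode concrete, here is a minimal sketch under a simplified model that is an assumption, not TiDB's actual GC implementation: GC keeps moving a "safe point" forward to roughly the current time minus a configured GC life time, and data versions older than the safe point may be removed. A snapshot therefore stays intact only if the safe point has not passed the snapshot timestamp by the time the backup finishes. The function `snapshot_survives_gc` and its parameters are hypothetical names for illustration.

```python
from datetime import datetime, timedelta


def snapshot_survives_gc(snapshot_ts: datetime, backup_end: datetime,
                         gc_life_time: timedelta) -> bool:
    """Return True if the snapshot would still be readable at backup_end.

    Simplified model (assumption): the GC safe point trails the current
    time by gc_life_time, and versions older than the safe point may be
    collected. The snapshot is safe only while the safe point is at or
    before snapshot_ts.
    """
    safe_point_at_end = backup_end - gc_life_time
    return safe_point_at_end <= snapshot_ts


start = datetime(2020, 1, 1, 0, 0)
# A 5-minute backup with a 10-minute GC life time keeps its snapshot.
print(snapshot_survives_gc(start, start + timedelta(minutes=5), timedelta(minutes=10)))
# A 3-hour backup with the same GC life time does not.
print(snapshot_survives_gc(start, start + timedelta(hours=3), timedelta(minutes=10)))
```

In this model the condition reduces to "backup duration must not exceed the GC life time", which is why long-running backups of large clusters are exposed to the problem unless GC is held back for their duration.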

As an open-source, distributed SQL database, TiDB meets the need to back up and restore large-scale clusters. The TiDB 4.0 release candidate (RC) introduced Backup & Restore (BR), a distributed backup and restore tool that offers high backup and restore speeds: 1 GB/s or more for 10 TB of data.
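As a rough check on what that speed means in practice, assuming decimal units (1 TB = 10^12 bytes) and a sustained rate of exactly 1 GB/s, backing up or restoring 10 TB takes on the order of three hours:

```latex
\frac{10\ \text{TB}}{1\ \text{GB/s}}
  = \frac{10 \times 10^{12}\ \text{B}}{10^{9}\ \text{B/s}}
  = 10^{4}\ \text{s}
  \approx 167\ \text{min}
  \approx 2.8\ \text{h}
```

With binary units (10 TiB at 1 GiB/s) the result is 10,240 s, which is practically the same.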

A Quick Look Into TiDB Performance on a Single Server

TiDB is an open-source, distributed database developed by PingCAP. This is a very interesting project because it can be used as a drop-in replacement for MySQL: it implements the MySQL protocol and basically emulates MySQL. PingCAP defines TiDB as a “one-stop data warehouse for both OLTP (Online Transactional Processing) and OLAP (Online Analytical Processing) workloads.” In this blog post, I test how TiDB performs on a single server compared to MySQL for both OLTP and OLAP workloads. Please note that this benchmark is very limited in scope: we are only testing TiDB and MySQL on a single server, whereas TiDB is designed as a distributed database out of the box.

Short version: TiDB supports parallel query execution for selects and can use many more CPU cores, while MySQL is limited to a single CPU core for a single select query. On higher-end hardware (EC2 instances in my case), TiDB can be 3-4 times faster for complex select queries (OLAP workload) that do not use, or benefit from, indexes. At the same time, point selects and writes, especially inserts, can be 5-10 times slower. Again, please note that this test was on a single server with a single TiKV process.
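To illustrate why parallel execution helps full-scan aggregate queries, here is a minimal sketch of the general partition-and-merge pattern: the input is split into ranges, each worker computes a partial result, and the partials are combined into the final answer. This is a conceptual illustration only, not TiDB's executor; `parallel_avg` and `partial_sum_count` are hypothetical names. Python threads are used just to show the shape of the computation (because of the GIL they will not speed up CPU-bound work the way multiple cores do in a database engine).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Sequence, Tuple


def partial_sum_count(chunk: Sequence[float]) -> Tuple[float, int]:
    # Partial aggregate for one range: (sum, row count).
    return sum(chunk), len(chunk)


def parallel_avg(values: Sequence[float], workers: int = 4) -> float:
    """Compute AVG(values) by aggregating ranges in parallel, then merging."""
    if not values:
        raise ValueError("AVG of an empty input is undefined")
    n = len(values)
    size = max(1, -(-n // workers))  # ceiling division: rows per range
    chunks = [values[i:i + size] for i in range(0, n, size)]
    # Each worker scans one range independently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum_count, chunks))
    # Final aggregation: combine partial sums and counts.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


print(parallel_avg(list(range(1, 101))))
```

Merging (sum, count) pairs rather than per-range averages keeps the final result exact regardless of how unevenly the ranges are sized. Point selects and single-row inserts gain nothing from this pattern, since there is only one row to work on.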