ShardingSphere’s Built-In Metadata Handling Function for Sharded Database Environments

Apache ShardingSphere is a widely recognized and trusted open-source data management platform that provides robust support for key functionalities such as sharding, encryption, read/write splitting, transactions, and high availability. The metadata of ShardingSphere encompasses essential components such as rules, data sources, and table structures, which are fundamental for the smooth operation of the platform. ShardingSphere leverages the advanced capabilities of governance centers like ZooKeeper and etc., for efficient sharing and modification of cluster configurations, enabling seamless horizontal expansion of computing nodes.

In this informative blog post, our emphasis will be on gaining a comprehensive understanding of the metadata structure employed by Apache ShardingSphere. We will delve into the intricacies of the three-layer metadata structure within ZooKeeper, which encompasses crucial components such as metadata information, built-in metadata database, and simulated MySQL database.

Metadata Structure

For a comprehensive grasp of the metadata structure utilized in Apache ShardingSphere, a closer examination of the cluster mode of ShardingSphere-Proxy can be beneficial. The metadata structure in ZooKeeper adopts a three-layer hierarchy, with the first layer being the governance_ds. This layer encompasses critical components such as metadata information, built-in metadata database, and simulated MySQL database.

governance_ds
--metadata (metadata information)
----sharding_db (logical database name)
------active_version (currently active version)
------versions
--------0
----------data_sources (underlying database information)
----------rules (rules of logical database, such as sharding, encryption, etc.)
------schemas (table and view information)
--------sharding_db
----------tables
------------t_order
------------t_single
----------views
----shardingsphere (built-in metadata database)
------schemas
--------shardingsphere
----------tables
------------sharding_table_statics (sharding statistics table)
------------cluster_information （version information）
----performance_schema (simulated MySQL database)
------schemas
--------performance_schema
----------tables
------------accounts
----information_schema (simulated MySQL database)
------schemas
--------information_schema
----------tables
------------tables
------------schemata
------------columns
------------engines
------------routines
------------parameters
------------views
----mysql
----sys
--sys_data (specific row information of built-in metadata database)
----shardingsphere
------schemas
--------shardingsphere
----------tables
------------sharding_table_statistics
--------------79ff60bc40ab09395bed54cfecd08f94
--------------e832393209c9a4e7e117664c5ff8fc61
------------cluster_information
--------------d387c4f7de791e34d206f7dd59e24c1c

The metadata directory serves as a repository for storing essential rules and data source information, including the currently active metadata version, which is stored under the active_version node. Meanwhile, the versions stored within the metadata directory house different iterations of rules and database connection details.

On the other hand, the schemas directory is designated for storing comprehensive tables and viewing information from the logical database. ShardingSphere meticulously preserves the decorated table structure information after applying the rules. For instance, in the case of sharding tables, it retrieves the structure from one of the actual tables, replaces the table name, and omits the real encrypted column information in the table structure, allowing users to conveniently operate on the logical database directly. The built-in metadata database, located within the metadata directory, boasts a structure that resembles that of the logical database. However, it is specifically designed to house certain built-in table structures such as sharding_table_statics and cluster_information, which will be elaborated on in subsequent discussions. In addition, the metadata directory also includes other nodes such as performance_schema, information_schema, mysql, sys, and more, which emulate the data dictionary of MySQL. These nodes serve the purpose of supporting various client tools to connect to the proxy, and future plans involve expanding data collection to facilitate queries on these data dictionaries.

The three-layer metadata structure of ShardingSphere consists of governance_ds, metadata, and a built-in metadata database, is designed to provide compatibility with different database formats. For instance, PostgreSQL has a three-layer structure consisting of instance, database, and schema, whereas MySQL has a two-layer structure of database and table. Therefore, ShardingSphere adds an identical logical schema layer for MySQL to ensure logical uniformity. The meticulously designed three-layer metadata structure of ShardingSphere, encompassing governance_ds, metadata, and a built-in metadata database, has been formulated to ensure seamless compatibility with diverse database formats. For instance, while PostgreSQL follows a three-layer structure comprising instance, database, and schema, MySQL adopts a two-layer structure of database and table.

To ensure logical uniformity, ShardingSphere introduces an additional logical schema layer for MySQL.Gaining a comprehensive understanding of the metadata structure employed in Apache ShardingSphere is of utmost significance for developers seeking to utilize the platform optimally. By thoroughly examining the metadata structure of ShardingSphere, developers can gain valuable insights into how the platform effectively stores and manages data sources and table structures.

In the preceding section, we examined ShardingSphere's integrated metadata database, encompassing two tables: sharding_table_statistics (a table for collecting sharding information) and cluster_information(a table for storing version information). We also explored the potential of the metadata database to house both internal collection data and user-defined information (yet to be implemented).In this section, we will delve into the inner workings of the built-in metadata database, including its data collection and query implementation mechanisms.

Data Collection

The ShardingSphere's integrated metadata database relies on data collection to aggregate information into memory and synchronizes it with the governance center to ensure consistency across clusters. To illustrate the process of data collection into memory, let's use the sharding_table_statistics table as an example. The ShardingSphereDataCollectorinterface outlines a method for data collection:

     Java
    
    public interface ShardingSphereDataCollector extends TypedSPI {
    Optional<ShardingSphereTableData> collect(String databaseName, ShardingSphereTable table, Map<String, ShardingSphereDatabase> shardingSphereDatabases) throws SQLException;
}

The aforementioned method is invoked by the ShardingSphereDataCollectorRunnable scheduled task. The current implementation initiates a scheduled task on the Proxy for data collection, utilizing the built-in metadata table to differentiate data collectors for specific data collection tasks. It is worth noting that based on feedback from the community, this approach may evolve into an e-job trigger method for collection in the future. The logic for collecting information is encapsulated in the ShardingStatisticsTableCollectorclass. This class employs the underlying data source and sharding rules to query relevant database information and extract statistical data.

Query Implementation

Upon completion of the data collection process, the ShardingSphereDataScheduleCollector class compares the collected information and the data stored in memory. In the event of any inconsistencies, it triggers an event EVENTBUSto notify the governance center. Subsequently, upon receiving the event, the governance center updates the information of other nodes and executes memory synchronization accordingly. The code for the event listening class is depicted below:

     Java
    
 

    public final class ShardingSphereSchemaDataRegistrySubscriber {
    
    private final ShardingSphereDataPersistService persistService;
    
    private final GlobalLockPersistService lockPersistService;
    
    public ShardingSphereSchemaDataRegistrySubscriber(final ClusterPersistRepository repository, final GlobalLockPersistService globalLockPersistService, final EventBusContext eventBusContext) {
        persistService = new ShardingSphereDataPersistService(repository);
        lockPersistService = globalLockPersistService;
        eventBusContext.register(this);
    }
    
    @Subscribe
    public void update(final ShardingSphereSchemaDataAlteredEvent event) {
        String databaseName = event.getDatabaseName();
        String schemaName = event.getSchemaName();
        GlobalLockDefinition lockDefinition = new GlobalLockDefinition("sys_data_" + event.getDatabaseName() + event.getSchemaName() + event.getTableName());
        if (lockPersistService.tryLock(lockDefinition, 10_000)) {
            try {
                persistService.getTableRowDataPersistService().persist(databaseName, schemaName, event.getTableName(), event...