Alibaba, one of the largest Chinese IT companies, has open-sourced PolarDB, a distributed DBMS based on PostgreSQL. PolarDB extends PostgreSQL with tools for distributed data storage that preserve integrity and support ACID transactions across the entire global database, which is spread over different cluster nodes. PolarDB also supports distributed SQL query processing and provides fault tolerance and redundant data storage, so that information can be recovered after one or more nodes fail. To expand storage, it is enough to add new nodes to the cluster. The code is released under the Apache 2.0 license.
PolarDB consists of two components: extensions and a set of patches for PostgreSQL. The patches extend the capabilities of the PostgreSQL core, while the extensions include components implemented separately from PostgreSQL, such as a distributed transaction management mechanism, global services, a distributed SQL query processor, additional metadata, and tools for managing the cluster, deploying it, and simplifying the migration of existing systems to it.
The patches add to the PostgreSQL core a distributed variant of multiversion concurrency control (MVCC), which manages parallel access to data at different isolation levels. Most of PolarDB's functionality lives in the extensions, which reduces the dependence on PostgreSQL and simplifies updating and deploying PolarDB-based solutions (it eases the transition to new PostgreSQL versions and the maintenance of full compatibility with PostgreSQL). The cluster is managed with the pgxc_ctl toolkit, which is based on a similar utility from PostgreSQL-XC and PostgreSQL-XL.
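To make the idea of multiversion concurrency control more concrete, here is a minimal sketch of snapshot-based visibility in Python. It is not PolarDB's implementation: the `RowVersion` class, the integer commit timestamps, and the `read` helper are all hypothetical and serve only to show how a reader with a given snapshot picks one version of a row without blocking writers.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RowVersion:
    """One version of a row (hypothetical model, not PolarDB's structures)."""
    value: str
    created_at: int                   # commit timestamp of the writing transaction
    deleted_at: Optional[int] = None  # commit timestamp of the superseding transaction


def visible(version: RowVersion, snapshot_ts: int) -> bool:
    """A version is visible if it was committed at or before the snapshot
    and had not yet been superseded at the snapshot."""
    if version.created_at > snapshot_ts:
        return False
    return version.deleted_at is None or version.deleted_at > snapshot_ts


def read(versions: List[RowVersion], snapshot_ts: int) -> Optional[str]:
    """Return the value seen by a reader with the given snapshot, if any."""
    for version in versions:
        if visible(version, snapshot_ts):
            return version.value
    return None


# The row was written at timestamp 10 and updated at timestamp 20.
history = [RowVersion("v1", created_at=10, deleted_at=20),
           RowVersion("v2", created_at=20)]

old_reader = read(history, snapshot_ts=15)   # sees "v1"
new_reader = read(history, snapshot_ts=25)   # sees "v2"
early_reader = read(history, snapshot_ts=5)  # sees nothing (None)
```

In stock PostgreSQL the isolation level determines when the snapshot is taken: Read Committed takes a new snapshot for each statement, while Repeatable Read keeps one snapshot for the whole transaction. In a distributed setting, all nodes would additionally have to agree on the ordering that the timestamps stand in for here.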
The cluster has three basic components: database nodes (DN), the cluster manager (CM), and the transaction management service (TM). A proxy load balancer can additionally be used. Each component is a separate process and can run on a different server. The database nodes serve SQL queries from clients and at the same time act as coordinators of distributed queries that involve other database nodes. The cluster manager tracks the status of each database node, stores the cluster configuration, and provides tools for management, backup, load balancing, updates, and starting and stopping nodes. The transaction management service is responsible for maintaining overall integrity across the cluster.
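The division of roles described above can be illustrated with a small Python model. This is only a sketch under stated assumptions: the `ClusterManager` class, the `NodeStatus` enum, the `participants` function, and the node names `dn1` to `dn3` are hypothetical and do not correspond to PolarDB's actual interfaces. It shows a cluster manager that tracks node status and a database node that, acting as coordinator, involves only the other live nodes in a distributed query.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class NodeStatus(Enum):
    UP = "up"
    DOWN = "down"


@dataclass
class ClusterManager:
    """Hypothetical CM: stores the cluster configuration and node status."""
    nodes: Dict[str, NodeStatus] = field(default_factory=dict)

    def add_node(self, name: str) -> None:
        # New nodes are assumed to join in the UP state.
        self.nodes[name] = NodeStatus.UP

    def mark_down(self, name: str) -> None:
        if name not in self.nodes:
            raise KeyError(name)
        self.nodes[name] = NodeStatus.DOWN

    def live_nodes(self) -> List[str]:
        return sorted(n for n, s in self.nodes.items() if s is NodeStatus.UP)


def participants(cm: ClusterManager, coordinator: str) -> List[str]:
    """Other live database nodes that a coordinator DN would involve."""
    live = cm.live_nodes()
    if coordinator not in live:
        raise ValueError(f"coordinator {coordinator!r} is not a live node")
    return [n for n in live if n != coordinator]


cm = ClusterManager()
for name in ("dn1", "dn2", "dn3"):
    cm.add_node(name)

cm.mark_down("dn2")               # the CM notices that dn2 has failed
others = participants(cm, "dn1")  # dn1 coordinates; only dn3 takes part
```

The transaction management service is deliberately left out of the sketch, since the text only states its responsibility (cluster-wide integrity) and not how it works.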