Oracle Sharding is a scalability, availability and geo-distribution feature for applications that enables distribution and replication of data across a pool of Oracle databases that share no hardware or software. Applications elastically scale (data, transactions and concurrent users) to any level, on any platform, simply by adding additional databases (shards) to the pool. Oracle Sharding allows applications to perform high velocity relational transactions. It is applicable to applications (OLTP and Data Analytics) where the primary access pattern is via the Sharding key. Sharding supports multi-shard operations as well.
Oracle Sharding provides the advantages of an enterprise DBMS, including: relational schema, SQL, and other programmatic interfaces, support for complex data types, online schema changes, multi-core scalability, advanced security, compression, high-availability, ACID properties, consistent reads, developer agility with JSON, and much more.
In this blog, I will go over the various components of a Sharded Database (SDB) – Shards, Shard Catalog, Shard Directors (GSM), Global Service and Connection Pools. Figure 1 showcases the architectural components of Oracle Sharding:
Figure 1: Oracle Sharding Architecture with no SPOF
Sharded Database and Shards
Shards are independent Oracle databases that are hosted on database servers which have their own local resources – CPU, memory, and disk . No shared-storage is required across the shards. A sharded database is a collection of shards. Shards can all be placed in one region (datacenter[s]) or can be placed in different regions. A region in the context of Oracle Sharding represents a datacenter or a multiple datacenters that are in close network proximity.
Shards are replicated for High Availability (HA) and Disaster Recovery (DR) with Oracle replication technologies such as Active Data Guard or Oracle GoldenGate. For HA, the standby shards can be placed in the same region where the primary shards are placed. For DR, the standby shards are located in another region.
The shard catalog is a special-purpose Oracle Database that is a persistent store for SDB configuration data and plays a key role in automated deployment and centralized management of a sharded database. It also hosts the gold schema of the application and the master copies of common reference data (duplicated tables). The shard catalog database also acts as a query coordinator used to process multi-shard queries and queries that do not specify a sharding key.
All configuration changes, such as adding and removing shards and global services, are initiated on the shard catalog. All DDLs in an SDB are executed by connecting to the shard catalog.
An outage of the shard catalog does not affect the availability of the SDB. A shard catalog outage only affects the ability to perform maintenance operations or multi-shard queries during the brief period required to failover to a standby shard catalog. OLTP transactions are unaffected; they continue to be routed and executed on the shards. Oracle MAA recommends that a local Active Data Guard standby database be configured with Maximum Availability database protection mode with Fast-Start Failover and a remote physical standby database.
The Shard Director is a regional network listener for clients that connect to an SDB. It maintains an up-to-date topology of the sharded database. Shard Directors route connections to the appropriate shards based on the sharding key passed during a connection request. The Shard Director is built upon the Oracle Global Service Manager (GSM). GSMs route connections based on database role, load, replication lag, and locality. For Oracle Sharding, GSMs have been enhanced to support management of SDB and routing of connections based on the location of data.
For a typical sharded database, shard directors (GSMs) are installed on dedicated low-end commodity servers in each region. Multiple shard directors should be deployed for high availability. In Oracle Database 12.2, up to 5 shard directors can be deployed in a given region. Oracle MAA recommends deploying three shard directors per region for availability and scalability.
Shard Directors provide the following set of functions
- Maintain runtime metadata about SDB configuration and availability of shards
- Publish SDB topology changes to connection pools
- Measure network latency between its own and other regions
- Act as a regional listeners for clients to connect to an SDB
- Manage global services
- Perform connection load balancing
- Publish runtime load balancing information to connection pools
- Monitor availability of database instances and global services, and notify clients via FAN HA events upon failure incidents
- Aid in schema propagation to all the shards
A Global Service is a database service that can run across multiple databases. A global service allows clients to access data on any shard in the SDB. Global Services offer additional properties for sharded databases – e.g., database role, replication lag tolerance, region affinity, etc. You can create role-based global services: create a read-write global service that accesses data from primary shards, and a separate read-only global service for Active Data Guard shards.
At runtime, connection pools act as shard directors by routing database requests across pooled connections. Key enhancements have been made to Oracle connection pools and drivers to support Sharding. Starting from 12.2, JDBC/UCP, OCI and Oracle Data Provider for .NET (ODP.NET) recognize the sharding keys as part of the connection check out. Apache Tomcat, JBoss, IBM WebSphere and Oracle WebLogic can use UCP support for sharding. PHP, Python, Perl, and Node.js can use OCI support.
I will go over the details of the data-dependent routing against an SDB in the next blog.