In Oracle Database 12.2.0.1, Oracle Sharding supports the complete lifecycle of a Sharded Database (SDB). In this three-part blog post series, we will review the capabilities of Oracle Sharding. In this first part, we will study automated data distribution and centralized schema management.
Automated Data Distribution
In Oracle Database 12.2.0.1, Oracle Sharding supports two methods of sharding: system-managed and composite.
System-managed sharding:
The system-managed sharding method automatically distributes data across shards using consistent hash (sharding_key). This algorithm evenly and randomly distributes data across shards to eliminate hot spots and provide uniform performance across shards. Oracle Sharding automatically rebalances data when shards are added to or removed from the SDB. System-managed sharding is the most commonly used form of sharding.
Oracle Sharding uses the familiar SQL syntax for table partitioning to specify how table rows are partitioned across shards. For example, the following CREATE SHARDED TABLE statement creates a sharded table with cust_id as the sharding key.
CREATE SHARDED TABLE customers
( cust_id NUMBER NOT NULL
, name VARCHAR2(50)
, address VARCHAR2(250)
, region VARCHAR2(20)
, class VARCHAR2(3)
, signup DATE
, CONSTRAINT cust_pk PRIMARY KEY(cust_id)
)
PARTITION BY CONSISTENT HASH (cust_id)
TABLESPACE SET ts1
PARTITIONS AUTO;
Oracle Sharding automates the creation of tablespaces on all the shards as a unit called a tablespace set. The PARTITIONS AUTO clause specifies that the number of partitions should be automatically determined and mapped to the tablespaces.
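A tablespace set can also be created explicitly before it is referenced by a CREATE SHARDED TABLE statement. The following is a minimal sketch; the storage attributes in the template are illustrative:
-- Issued once against the shard catalog; the corresponding
-- tablespaces are created on every shard automatically.
CREATE TABLESPACE SET ts1
USING TEMPLATE
( DATAFILE SIZE 100M
  EXTENT MANAGEMENT LOCAL
  SEGMENT SPACE MANAGEMENT AUTO
);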
The unit of data migration between shards is a chunk. A chunk is a set of tablespaces that collocates corresponding partitions of all tables in a table family. A chunk contains a single partition from each table of a set of related tables, which guarantees that related data from different sharded tables can be moved together. The number of chunks within each shard is specified when the SDB is created (the default is 120 chunks per shard).
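For example, when the shard catalog is created with the GDSCTL utility, the chunk count can be set explicitly. In the sketch below, the connect string, user credentials, and SDB name are placeholders:
GDSCTL> create shardcatalog -database cathost:1521/catpdb -user mysdbadmin/mypassword -sdb cust_sdb -chunks 120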
In addition to sharded tables, many applications require common reference data (e.g., Products) that needs to be accessed as part of the queries in the transactions. Multi-shard queries can be avoided by replicating the small number of read-only or read-mostly “non-shardable” tables across all shards. Replication of complete tables is a good choice for relatively small tables that are often accessed together with sharded tables. A table with the same contents in each shard is called a duplicated table. For example, a Customers–Orders–Line Items schema may also include a Products table. This table contains data shared by all customers and cannot be sharded by the customer number. Instead, the entire table is duplicated on all shards to prevent multi-shard queries during order processing. A duplicated table such as Products is created using the CREATE DUPLICATED TABLE statement.
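A minimal sketch of such a duplicated table follows; the column names are illustrative:
CREATE DUPLICATED TABLE products
( sku   NUMBER NOT NULL
, name  VARCHAR2(50)
, price NUMBER(6,2)
, CONSTRAINT products_pk PRIMARY KEY (sku)
);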
The combination of sharded and duplicated tables enables all transactions associated with a sharding key to be processed by a single shard. This technique enables linear scalability and fault isolation.
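For instance, a query that supplies the sharding key and joins a sharded table with the duplicated Products table can run entirely on one shard. In the sketch below, the line_items table and its columns are assumed for illustration:
-- All rows for a given cust_id reside in one chunk on one shard, and
-- products is duplicated on every shard, so this query executes on a
-- single shard with a local join.
SELECT c.name, p.name, l.quantity
FROM customers c, line_items l, products p
WHERE c.cust_id = l.cust_id
AND l.sku = p.sku
AND c.cust_id = 123;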
Composite sharding:
With the composite sharding method, data is first partitioned by list or range (super_sharding_key) and then further partitioned by consistent hash (sharding_key). The two levels of sharding make it possible to map data to a set of shards and then automatically maintain a balanced distribution of data across that set of shards. Composite sharding is ideal for global data distribution: shards are placed in each geography, and within a given geography the data is uniformly distributed, enabling linear scalability.
The composite sharding method allows you to create multiple shardspaces for different subsets of data in a table partitioned by consistent hash. A shardspace is a set of shards that stores data corresponding to a range or list of key values. For example, the following CREATE SHARDED TABLE statement creates a table with the composite sharding method, based on geo as the super_sharding_key and cust_id as the sharding key.
CREATE SHARDED TABLE customers
( cust_id NUMBER NOT NULL
, name VARCHAR2(50)
, address VARCHAR2(250)
, geo VARCHAR2(20)
, class VARCHAR2(3)
, signup_date DATE
, CONSTRAINT cust_pk PRIMARY KEY(geo, cust_id)
)
PARTITIONSET BY LIST (geo)
PARTITION BY CONSISTENT HASH (cust_id)
PARTITIONS AUTO
(PARTITIONSET AMERICA VALUES ('AMERICA') TABLESPACE SET tbs1,
 PARTITIONSET ASIA VALUES ('ASIA') TABLESPACE SET tbs2);
Oracle Sharding also supports the subpartitioning methods provided by Oracle Database and enables information lifecycle management (ILM) by placing subpartitions in separate tablespaces and moving them between storage tiers. Migration of subpartitions between storage tiers can be done without sacrificing the scalability and availability benefits of sharding or the ability to perform partition pruning and partition-wise joins on a primary key. Composite sharding with subpartitioning provides three levels of data organization.
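For example, the customers table can be subpartitioned by range on the signup date, with each date range placed in its own tablespace set. The sketch below uses system-managed sharding with a subpartition template; the tablespace sets ts1 through ts3 and the date boundaries are illustrative:
CREATE SHARDED TABLE customers
( cust_id NUMBER NOT NULL
, name VARCHAR2(50)
, address VARCHAR2(250)
, signup_date DATE
-- The subpartitioning column is included in the key so that the
-- primary key can be enforced with local indexes.
, CONSTRAINT cust_pk PRIMARY KEY (cust_id, signup_date)
)
TABLESPACE SET ts1
PARTITION BY CONSISTENT HASH (cust_id)
SUBPARTITION BY RANGE (signup_date)
SUBPARTITION TEMPLATE
( SUBPARTITION per1 VALUES LESS THAN (TO_DATE('01/01/2010','DD/MM/YYYY')) TABLESPACE SET ts1
, SUBPARTITION per2 VALUES LESS THAN (TO_DATE('01/01/2020','DD/MM/YYYY')) TABLESPACE SET ts2
, SUBPARTITION future VALUES LESS THAN (MAXVALUE) TABLESPACE SET ts3
)
PARTITIONS AUTO;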
Centralized Schema Management
The SDB schema, including the structure of sharded and duplicated tables and the data distribution, is maintained in the Shard Catalog. The Shard Directors, in conjunction with the Shard Catalog, propagate the schema to all the shards.
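In practice, the schema is defined once against the shard catalog database. After shard DDL is enabled for the session, subsequent DDL statements are propagated to all the shards:
-- Issued in a session connected to the shard catalog database.
ALTER SESSION ENABLE SHARD DDL;
-- DDL from this point on (for example, the CREATE SHARDED TABLE and
-- CREATE DUPLICATED TABLE statements above) is recorded in the
-- catalog and executed on every shard.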
Figure 1. Schema management in a Sharded Database
Figure 1 illustrates how the schema is propagated by the Shard Directors and the Shard Catalog to all the shards.
In this first part of the three-part blog series, we have looked at automated data distribution and centralized schema management. In the second part, we will go over the automated creation and replication of shards and data-dependent routing.