Daily Archives: July 31, 2018

Oracle Sharding 18c Blog Post Series – #1: User-defined Sharding

When the Oracle Sharding feature was introduced in Oracle Database 12c Release 2, the focus was on internet scale applications that require linear scalability and fault isolation. These applications deal with 100s of millions or billions of users, massive concurrent user base and extremely large databases. With Oracle 18c, Oracle Sharding has been made more inclusive to applications and various other use cases by increasing its flexibility and strategies.

Here is the list of the salient Oracle Database 18c Sharding features:

  1. User-defined Sharding
  2. PDB Sharding
  3. RAC Sharding
  4. JSON & Spatial capabilities and Multi-shard query enhancements
  5. Midtier Sharding

I plan to cover the details of these Sharding features in this blog post series. In the first post, we will discuss the new user-defined sharding method that aids in global data distribution and hybrid cloud deployment.

#1 User-defined Sharding:

In Oracle Database 12c, we introduced two sharding methods – system-managed sharding and composite sharding. 

System-managed sharding is based on partitioning by consistent hash. This sharding method randomly and evenly distributes data across shards and automatically redistributes it when shards are added to or removed from the sharded database. Consistent Hash is good for application where there are millions and even billions of values of the sharding key and it is not practical to manage them individually. This method enables linear scalability of transactions, concurrent users and the database capacity.

With composite sharding method, data is first partitioned by list or range (super_sharding_key) and then further partitioned by consistent hash (sharding_key). The two levels of sharding make it possible to map data to a set of shards, and then automatically maintain balanced distribution of data across that set of shards. Composite sharding is ideal for global data distribution where shards are placed in each geography and within a given geography data is uniformly distributed. Another use case for this method is – a set of powerful shards dedicated to Gold class users and set of low powered shards for Silver class users. Within each class of users, data is uniformly distributed.

But what if there are much less distinct values of the sharding key, maybe thousands, and the customer wants tighter control in mapping data to shards. For such cases, in Oracle Database 18c, we introduced user-defined sharding – a sharding method which is based on partitioning by RANGE or LIST and allows the user to explicitly specify mapping of data to shards. User-defined sharding is ideal for performance, regulatory, or other reasons when the user needs to store related data in the same shard and have full control on moving data between shards.

User-controlled data distribution provides:

  • Regulatory compliance – data remains in country of origin
  • Hybrid clouds – some shards are on-premises and other shards in the cloud
  • Cloud bursting – ability to move data from on-premises to the cloud during peak seasons
  • Increased visibility into planned maintenance – when a shard needs to be brought down for maintenance, the administrator knows exactly which data will not be available
  • Each shard can have different hardware and HA configuration
  • More efficient range queries

With user-defined sharding, the user owns the control to maintain balanced data distribution.

The term shardspace is used to describe a shard or a set of fully replicated shards that is used to store a part of an SDB partitioned by user-defined subset of key values. The shardspaces comprising an SDB may be in the same data center or in different data centers. With user-defined sharding, you can scale by adding additional shardspaces – extend the range or list of sharding_keys. Each shardspace can have different location, platform and replication topology.

UDS1

Figure 1.  Oracle Sharded Database with User-defined sharding

High-Level Steps for Creating a User-defined Sharded Schema

  • Create Shardspaces for each shard and all of its replicas
  • Manually create a tablespace for each partition in the primary shard of a given shardspace
  • Specify partitioning scheme and column that is the sharding key for the root table
    • The DBA explicitly maps partitions to the tablespaces
    • Child and grandchild tables all inherit the same partitioning scheme using the partition by reference clause
  • Specify which tables are duplicated on all shards (common reference data)

Here are the GDSCTL commands to create a sharded table for data sovereignty using the user-defined sharding method. In this example there are three shardspaces: NA_space, EU_space and ASIA_space.

ADD SHARDSPACE –SHARDSPACE NA_space

ADD SHARDSPACE –SHARDSPACE EU_space

ADD SHARDSPACE –SHARDSPACE ASIA_space

CREATE SHARD -SHARDSPACE NA_space -DEPLOY_AS primary  -DESTINATION host01 -CREDENTIAL oracle_cred -REGION NA

CREATE SHARD -SHARDSPACE EU_space -DEPLOY_AS primary -DESTINATION host02 -CREDENTIAL oracle_cred -REGION EU

CREATE SHARD -SHARDSPACE ASIA_space -DEPLOY_AS primary -DESTINATION host03 -CREDENTIAL oracle_cred -REGION ASIA

DEPLOY

UDS2

Figure 2.  Dedicated partitions for different user-defined shardspaces

Now create the shardspace specific tablespaces using SQL.

CREATE TABLESPACE NA_tbs IN SHARDSPACE NA_space;

CREATE TABLESPACE EU_tbs IN SHARDSPACE EU_space;

CREATE TABLESPACE ASIA_tbs IN SHARDSPACE ASIA_space;

Once the shards and the tablespaces are created, you can go ahead and create the sharded tables using SQL as shown below.

CREATE SHARDED TABLE accounts

( id NUMBER, account_number NUMBER , customer_id NUMBER

, branch_id NUMBER , geo VARCHAR2(20), status VARCHAR2(1)
)
PARTITION BY LIST (geo)
( PARTITION p_NA VALUES (‘NA’) TABLESPACE NA_tbs

, PARTITION p_EU VALUES (‘EU’) TABLESPACE EU_tbs

, PARTITION p_ASIA VALUES (‘ASIA’) TABLESPACE ASIA_tbs
);

It is as simple as that. This blog covered how user-defined sharding helps prepare organizations to honor data sovereignty requirements by preventing access to customer data outside of their home legal jurisdiction.

For more information on user-defined sharding, please peruse the following documentation links:

Sharding Methods:

https://docs.oracle.com/en/database/oracle/oracle-database/18/shard/sharding-methods.html#GUID-3B07D91C-CEAA-4170-A94B-ACF47BEE617B

SDB Deployment:

https://docs.oracle.com/en/database/oracle/oracle-database/18/shard/sharding-deployment.html#GUID-F99B8742-4089-4E77-87D4-4691EA932207

For in-depth collateral on Sharding, visit the Oracle Sharding OTN portal – https://www.oracle.com/goto/sharding and do follow me on Twitter  – https://twitter.com/nageshbattula