Tag Archives: Sharding

Oracle Sharding – Capabilities – Part 1 of 3

In Oracle Database 12.2.0.1, Oracle Sharding supports the complete lifecycle of a Sharded Database. In this three part blog post series, we will review the capabilities of Oracle Sharding. In the first part, we will study the automated data distribution and centralized schema management.

Automated Data Distribution

In Oracle Database 12.2.0.1, Oracle Sharding supports two methods of sharding: system-managed and composite.

System-managed sharding:

System-managed sharding method automatically distributes data across shards using consistent hash (sharding_key). This algorithm evenly and randomly distributes data across shards to eliminate hot spots and provide uniform performance across shards. Oracle Sharding automatically rebalances data when shards are added to or removed. System-managed sharding is the most used form of sharding.

Oracle Sharding uses the familiar SQL syntax for table partitioning to specify how table rows are partitioned across shards.  For example, the CREATE SHARDED TABLE statement is used to create a sharded table based on cust_id as the sharding key.

CREATE SHARDED TABLE customers

( cust_id     NUMBER NOT NULL

, name       VARCHAR2(50)

, address     VARCHAR2(250)

, region     VARCHAR2(20)

, class       VARCHAR2(3)

, signup     DATE

CONSTRAINT cust_pk PRIMARY KEY(cust_id)

)

PARTITION BY CONSISTENT HASH (cust_id)

TABLESPACE SET ts1

PARTITIONS AUTO;

Oracle Sharding automates the creation of tablespaces on all the shards as a unit called a tablespace set. The PARTITIONS AUTO clause specifies that the number of partitions should be automatically determined and mapped to the tablespaces.

The unit of data migration between shards is a chunk. A chunk is a set of tablespaces that collocate corresponding partitions of all tables in a table family. A chunk contains a single partition from each table of a set of related tables. This guarantees that related data from different sharded tables can be moved together. The number of chunks within each shard is specified when the SDB is created (default is 120 chunks per shard).

In addition to sharded tables, many applications require common reference data (e.g. Products) that needs to be accessed as part of the queries in the transactions. Multi-shard queries can be avoided by replicating the small number of read-only or read-mostly “non-shardable” tables across all shards. Replication of complete tables is a good choice for relatively small tables that are often accessed together with sharded tables. A table with the same contents in each shard is called a Duplicated Table. For example a Customers–Orders–Line Items schema may also include a Products table. This table contains data shared by all customers and cannot be sharded by the customer number. Instead, the entire table is duplicated on all databases to prevent multi-shard queries during order processing. In the example of the Products table, it is created using the CREATE DUPLICATED TABLE statement.

The combination of sharded and duplicated tables enables all transactions associated with a sharding key to be processed by a single shard. This technique enables linear scalability and fault isolation.

Composite Sharding:

With composite sharding method, data is first partitioned by list or range (super_sharding_key) and then further partitioned by consistent hash (sharding_key). The two levels of sharding make it possible to map data to a set of shards, and then automatically maintain balanced distribution of data across that set of shards. Composite sharding is ideal for global data distribution where shards are placed in each geography and within a given geography data is uniformly distributed and enables linear scalability

The composite sharding method allows you to create multiple shardspaces for different subsets of data in a table partitioned by consistent hash. A shardspace is set of shards that store data that corresponds to a range or list of key values. For example, the following CREATE SHARDED TABLE statement is used to create a table with composite sharding method based on geo as the super_sharding_key and cust_id as the sharding key.

CREATE SHARDED TABLE customers
( cust_id NUMBER NOT NULL
, name VARCHAR2(50)
, address VARCHAR2(250)
, geo VARCHAR2(20)
, class VARCHAR2(3)
, signup_date DATE
, CONSTRAINT cust_pk PRIMARY KEY(geo, cust_id)
)
PARTITIONSET BY LIST (geo)
PARTITION BY CONSISTENT HASH (cust_id)
PARTITIONS AUTO
(PARTITIONSET AMERICA VALUES (‘AMERICA’) TABLESPACE SET tbs1,
PARTITIONSET ASIA VALUES (‘ASIA’) TABLESPACE SET tbs2);

Oracle Sharding also supports subpartitioning methods provided by Oracle Database and enables information lifecycle management (ILM) by placing subpartitions on separate tablespaces and moving them between storage tiers. Migration of subpartitions between storage tiers can be done without sacrificing the scalability and availability benefits of sharding and the ability to perform partition pruning and partition-wise joins on a primary key. This composite Sharding with sub-partitioning provides three-levels of data organization.

Centralized schema management

The SDB schema, including the structure of sharded and duplicated tables and the data distribution, is maintained in the Shard Catalog. The Shard Directors in conjunction with the Shard Catalog propagate the schema to all the shards.

central

Figure 1 . Schema management in a Sharded Database

The above pictorial illustrates how the schema is propagated by the shard director and the shard catalog to all the shards.

In the first part of the three part blog series , we have looked at the automated data distribution and centralized schema management. In the second part, we will go over the  automated creation and replication of shards and data-dependent routing.

 

Examples of target customers and applications for Oracle Sharding

In the last blog post, we looked at why a given customer would be interested in Oracle Sharding.  In this post, let’s see some examples of various customers and their applications that can benefit from Oracle Sharding. These target customers for Oracle sharding can come from any industry vertical. Examples include:

  • Mass Media and Financial Information Services providers who need massive scalability with high availability for online storage and retrieval of information.
  • Airline ticketing systems whose main driver for sharding is fault isolation. They want to shard across tens of independent databases. Failure of a database only makes 1/N of the data momentarily unavailable.
  • Social Media companies who may wish to allocate different shards for different classes of users/customer profiles, at different price levels.
  • Online Payment Systems that shard for linear scalability and fault isolation, and who may need to satisfy regulatory requirements for storing user data in the country of citizenship.
  • Financial and Tax preparation companies who shard by customer id to scale users, workload and transactions. Sharding provides these companies with elasticity required when demand for service peaks during tax filing season.
  • Large billing systems where each customer can be identified by a customer ID, phone number, or user ID.

In summary, Oracle Sharding 12.2.0.1 is a good fit for the applications (OLTP and data analytics) whose primary access pattern is based on a sharding key. The secondary pattern – multi-shard querying is also supported for reporting and data aggregation use cases. In the next blog post, I will cover the salient capabilities of Oracle Sharding.

For more info, do visit the Oracle Sharding portal on Oracle Technology Network (OTN). Follow me on Twitter @nageshbattula

Why would a customer be attracted to Oracle Sharding?

Oracle Sharding is a response to customer demand for a relational database architecture that can deliver the combination of linear scalability and complete fault isolation without requiring shared storage or compromising on the enterprise qualities of the Oracle Database: strict consistency, the full power of SQL, developer agility with JSON, security, high availability, backup and recovery, life-cycle management, etc.

OLTP applications designed for Oracle sharding can elastically scale (data, transactions and users) to any level, on any platform, simply by deploying new shards on additional stand-alone servers. The unavailability or slowdown of a shard due to either an unplanned outage or planned maintenance affects only the users of that shard, it does not affect the availability or performance of the application for users of other shards. Each shard may run a different release of the Oracle Database as long as the application is backward compatible with the oldest running version – making it simple to maintain availability of an application while performing database maintenance.

Oracle Sharding also does more than just extend the enterprise qualities of Oracle to a sharded database architecture. Oracle Sharding uses automation to simplify life- cycle management, advanced partitioning methods to address a wide array of use-cases, and data-dependent routing for superior runtime performance. Collectively these capabilities provide customers substantial advantages compared to competitive sharding solutions or custom deployments.

For more info, do visit the Oracle Sharding portal on Oracle Technology Network (OTN).

Benefits of Oracle Sharding

In this post, we will take a look at the advantages of Oracle Sharding.

  • Linear scalability with complete fault isolation. OLTP applications designed for Oracle sharding can elastically scale (data, transactions and users) to any level, on any platform, simply by deploying new shards on additional stand-alone servers. The unavailability or slowdown of a shard due to either an unplanned outage or planned maintenance affects only the users of that shard, it does not affect the availability or performance of the application for users of other shards. Upon the unavailability of a shard, failover is initiated automatically to another copy of the data. Each shard may run a different release of the Oracle Database as long as the application is backward compatible with the oldest running version – making it simple to maintain availability of an application while performing database maintenance.

linear-scaling

  • Global data distribution for data proximity – to bring data closer to the consumers and data sovereignty – to meet data privacy regulations.

geo-distrib

  • Simplicity via automation of many life-cycle management tasks including: automatic creation of shards and replication, system managed partitioning, single command deployment, elastic scale-out and fine-grained resharding.
  • Superior run-time performance using intelligent, data-dependent routing.
  • All of the advantages of sharding without sacrificing the capabilities of an enterprise RDBMS, including: relational schema, SQL, and other programmatic interfaces, complex data types, online schema changes, multi-core scalability, advanced security, compression, high-availability, ACID properties, consistent reads, developer agility with JSON, and much more.

Having looked at the benefits, in the next post we will study the key capabilities of Oracle Sharding.

Oracle Sharding – Introduction

Oracle Database 12c Release 2 has been available on Oracle Cloud since Nov 4, 2016.  On March 1, 2017, we have announced Oracle Database 12c Release 2 for on-premises as well. Oracle Sharding is one of the marquee features of Oracle Database 12.2.

We, from the Oracle Sharding Product Development team will be publishing periodic blog posts on various topics on Sharding.  The topics will include – Oracle Sharding benefits, capabilities, methods, data modeling and application requirements, high availability architecture, replication, deployment automation, direct and proxy routing, life cycle management, benchmarking results, monitoring, patching and many others that you will find interesting.

So, what is Oracle Sharding? It is a scalability, availability and geo-distribution feature for suitable OLTP applications that enables distribution and replication of data across a pool of discrete Oracle databases that share no hardware or software. Each database in the pool is referred to as a shard. The pool of shards is presented to an application as a single logical Oracle database (a sharded database or SDB).

Oracle sharding distributes data across shards using horizontal partitioning. Horizontal partitioning splits a database table across shards so that each shard contains the table with the same columns but a different subset of rows.The number of shards and the distribution of data across them are completely transparent to database applications. SQL statements issued by an application do not refer to shards nor are they dependent on the number of shards and their configuration.OLTP applications must be explicitly designed for a sharded database architecture in order to realize the benefits of scalability and availability. This is different from an HA architecture based upon Oracle Real Application Clusters (Oracle RAC) where scalability and availability are achieved transparent to an application. Applications that use a sharded database must have a well-defined data model and data distribution strategy (consistent hash, range, list or composite) that primarily accesses data via a sharding key. Examples of a shard key includes customer_id, account_no, country_id, etc. Oracle Sharding also supports data placement policies (rack and geo awareness) and all deployment models: on-premises and public or hybrid clouds.Transactions that require high performance must be single-shard transactions. For example, lookup and update of a customer’s billing record, lookup and update of a subscriber’s documents etc. There is no communication or coordination between shards for high performance transactions. Multi-shard operations and non-sharding key access are also supported. Such transactions include simple aggregations, reporting, etc. In return for these design considerations, applications that run on a sharded database architecture can achieve even higher levels of scalability and availability. Performance scales linearly as shards are added to the pool because each shard is completely independent from other shards. Each shard typically uses local storage, flash, and memory offering customers a further opportunity to optimize performance at relatively low cost. The first release of Oracle Sharding is designed to scale up to 1,000 shards. Isolation between shards also means that outages or poor performance of one shard does not impact the availability or performance of transactions executing at other shards.

Fig0

High Availability (HA) for individual shards is provided by automatic deployment of database replication. Simple, one-way Data Guard physical replication with automatic database failover is the default configuration. Active Data Guard (copies open read-only) or Oracle GoldenGate (bi-directional replication with all copies open read-write) may also be automatically deployed. Shards may be replicated within and across data centers. Replication is data-center and rack aware using data placement policies supported by Oracle Sharding. Optionally, Oracle RAC may be manually configured to provide Shard HA.

Shards are front-ended by a set of replicated listeners called Shard Directors that act as routers. Oracle clients (JDBC, OCI, and ODP.net) and the Oracle Universal Connection Pool (UCP) have been enhanced to recognize shard keys specified in a connection string and to insure availability by controlling the maximum number of connections allowed per shard. A shard routing cache in the connection layer (populated by the initial request to a shard) is used to route requests directly to the shard where the data resides for optimal runtime performance. The shard routing cache is automatically refreshed if there is any change made to the sharded database (e.g. automatic rebalancing or add/delete of shards).

In  this post, we have introduced you to Oracle Sharding at a high level. In the next post, we will look at the benefits of Oracle Sharding.