Oracle Sharding – Introduction

Oracle Database 12c Release 2 has been available on Oracle Cloud since Nov 4, 2016.  On March 1, 2017, we have announced Oracle Database 12c Release 2 for on-premises as well. Oracle Sharding is one of the marquee features of Oracle Database 12.2.

We, from the Oracle Sharding Product Development team will be publishing periodic blog posts on various topics on Sharding.  The topics will include – Oracle Sharding benefits, capabilities, methods, data modeling and application requirements, high availability architecture, replication, deployment automation, direct and proxy routing, life cycle management, benchmarking results, monitoring, patching and many others that you will find interesting.

So, what is Oracle Sharding? It is a scalability, availability and geo-distribution feature for suitable OLTP applications that enables distribution and replication of data across a pool of discrete Oracle databases that share no hardware or software. Each database in the pool is referred to as a shard. The pool of shards is presented to an application as a single logical Oracle database (a sharded database or SDB).

Oracle sharding distributes data across shards using horizontal partitioning. Horizontal partitioning splits a database table across shards so that each shard contains the table with the same columns but a different subset of rows.The number of shards and the distribution of data across them are completely transparent to database applications. SQL statements issued by an application do not refer to shards nor are they dependent on the number of shards and their configuration.OLTP applications must be explicitly designed for a sharded database architecture in order to realize the benefits of scalability and availability. This is different from an HA architecture based upon Oracle Real Application Clusters (Oracle RAC) where scalability and availability are achieved transparent to an application. Applications that use a sharded database must have a well-defined data model and data distribution strategy (consistent hash, range, list or composite) that primarily accesses data via a sharding key. Examples of a shard key includes customer_id, account_no, country_id, etc. Oracle Sharding also supports data placement policies (rack and geo awareness) and all deployment models: on-premises and public or hybrid clouds.Transactions that require high performance must be single-shard transactions. For example, lookup and update of a customer’s billing record, lookup and update of a subscriber’s documents etc. There is no communication or coordination between shards for high performance transactions. Multi-shard operations and non-sharding key access are also supported. Such transactions include simple aggregations, reporting, etc. In return for these design considerations, applications that run on a sharded database architecture can achieve even higher levels of scalability and availability. Performance scales linearly as shards are added to the pool because each shard is completely independent from other shards. Each shard typically uses local storage, flash, and memory offering customers a further opportunity to optimize performance at relatively low cost. The first release of Oracle Sharding is designed to scale up to 1,000 shards. Isolation between shards also means that outages or poor performance of one shard does not impact the availability or performance of transactions executing at other shards.

Fig0

High Availability (HA) for individual shards is provided by automatic deployment of database replication. Simple, one-way Data Guard physical replication with automatic database failover is the default configuration. Active Data Guard (copies open read-only) or Oracle GoldenGate (bi-directional replication with all copies open read-write) may also be automatically deployed. Shards may be replicated within and across data centers. Replication is data-center and rack aware using data placement policies supported by Oracle Sharding. Optionally, Oracle RAC may be manually configured to provide Shard HA.

Shards are front-ended by a set of replicated listeners called Shard Directors that act as routers. Oracle clients (JDBC, OCI, and ODP.net) and the Oracle Universal Connection Pool (UCP) have been enhanced to recognize shard keys specified in a connection string and to insure availability by controlling the maximum number of connections allowed per shard. A shard routing cache in the connection layer (populated by the initial request to a shard) is used to route requests directly to the shard where the data resides for optimal runtime performance. The shard routing cache is automatically refreshed if there is any change made to the sharded database (e.g. automatic rebalancing or add/delete of shards).

In  this post, we have introduced you to Oracle Sharding at a high level. In the next post, we will look at the benefits of Oracle Sharding.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s