Tag Archives: Sharding

Benefits of Oracle Sharding

In this post, we will take a look at the advantages of Oracle Sharding.

  • Linear scalability with complete fault isolation. OLTP applications designed for Oracle sharding can elastically scale (data, transactions and users) to any level, on any platform, simply by deploying new shards on additional stand-alone servers. The unavailability or slowdown of a shard due to either an unplanned outage or planned maintenance affects only the users of that shard, it does not affect the availability or performance of the application for users of other shards. Upon the unavailability of a shard, failover is initiated automatically to another copy of the data. Each shard may run a different release of the Oracle Database as long as the application is backward compatible with the oldest running version – making it simple to maintain availability of an application while performing database maintenance.

linear-scaling

  • Global data distribution for data proximity – to bring data closer to the consumers and data sovereignty – to meet data privacy regulations.

geo-distrib

  • Simplicity via automation of many life-cycle management tasks including: automatic creation of shards and replication, system managed partitioning, single command deployment, elastic scale-out and fine-grained resharding.
  • Superior run-time performance using intelligent, data-dependent routing.
  • All of the advantages of sharding without sacrificing the capabilities of an enterprise RDBMS, including: relational schema, SQL, and other programmatic interfaces, complex data types, online schema changes, multi-core scalability, advanced security, compression, high-availability, ACID properties, consistent reads, developer agility with JSON, and much more.

Having looked at the benefits, in the next post we will study the key capabilities of Oracle Sharding.

Oracle Sharding – Introduction

Oracle Database 12c Release 2 has been available on Oracle Cloud since Nov 4, 2016.  On March 1, 2017, we have announced Oracle Database 12c Release 2 for on-premises as well. Oracle Sharding is one of the marquee features of Oracle Database 12.2.

We, from the Oracle Sharding Product Development team will be publishing periodic blog posts on various topics on Sharding.  The topics will include – Oracle Sharding benefits, capabilities, methods, data modeling and application requirements, high availability architecture, replication, deployment automation, direct and proxy routing, life cycle management, benchmarking results, monitoring, patching and many others that you will find interesting.

So, what is Oracle Sharding? It is a scalability, availability and geo-distribution feature for suitable OLTP applications that enables distribution and replication of data across a pool of discrete Oracle databases that share no hardware or software. Each database in the pool is referred to as a shard. The pool of shards is presented to an application as a single logical Oracle database (a sharded database or SDB).

Oracle sharding distributes data across shards using horizontal partitioning. Horizontal partitioning splits a database table across shards so that each shard contains the table with the same columns but a different subset of rows.The number of shards and the distribution of data across them are completely transparent to database applications. SQL statements issued by an application do not refer to shards nor are they dependent on the number of shards and their configuration.OLTP applications must be explicitly designed for a sharded database architecture in order to realize the benefits of scalability and availability. This is different from an HA architecture based upon Oracle Real Application Clusters (Oracle RAC) where scalability and availability are achieved transparent to an application. Applications that use a sharded database must have a well-defined data model and data distribution strategy (consistent hash, range, list or composite) that primarily accesses data via a sharding key. Examples of a shard key includes customer_id, account_no, country_id, etc. Oracle Sharding also supports data placement policies (rack and geo awareness) and all deployment models: on-premises and public or hybrid clouds.Transactions that require high performance must be single-shard transactions. For example, lookup and update of a customer’s billing record, lookup and update of a subscriber’s documents etc. There is no communication or coordination between shards for high performance transactions. Multi-shard operations and non-sharding key access are also supported. Such transactions include simple aggregations, reporting, etc. In return for these design considerations, applications that run on a sharded database architecture can achieve even higher levels of scalability and availability. Performance scales linearly as shards are added to the pool because each shard is completely independent from other shards. Each shard typically uses local storage, flash, and memory offering customers a further opportunity to optimize performance at relatively low cost. The first release of Oracle Sharding is designed to scale up to 1,000 shards. Isolation between shards also means that outages or poor performance of one shard does not impact the availability or performance of transactions executing at other shards.

Fig0

High Availability (HA) for individual shards is provided by automatic deployment of database replication. Simple, one-way Data Guard physical replication with automatic database failover is the default configuration. Active Data Guard (copies open read-only) or Oracle GoldenGate (bi-directional replication with all copies open read-write) may also be automatically deployed. Shards may be replicated within and across data centers. Replication is data-center and rack aware using data placement policies supported by Oracle Sharding. Optionally, Oracle RAC may be manually configured to provide Shard HA.

Shards are front-ended by a set of replicated listeners called Shard Directors that act as routers. Oracle clients (JDBC, OCI, and ODP.net) and the Oracle Universal Connection Pool (UCP) have been enhanced to recognize shard keys specified in a connection string and to insure availability by controlling the maximum number of connections allowed per shard. A shard routing cache in the connection layer (populated by the initial request to a shard) is used to route requests directly to the shard where the data resides for optimal runtime performance. The shard routing cache is automatically refreshed if there is any change made to the sharded database (e.g. automatic rebalancing or add/delete of shards).

In  this post, we have introduced you to Oracle Sharding at a high level. In the next post, we will look at the benefits of Oracle Sharding.