Oracle Sharding – A Comprehensive White Paper

Oracle Sharding is a scalability, availability and geo-distribution feature for OLTP applications that enables distribution and replication of data across a pool of Oracle databases that share no hardware or software. Applications elastically scale (data, transactions and concurrent users) to any level, on any platform, simply by adding additional databases (shards) to the pool. Oracle Sharding allows applications to perform high velocity relational transactions.

Oracle Sharding automates the entire lifecycle of a sharded database – deployment, schema creation, data-dependent routing with superior run-time performance, elastic scaling and simpler life-cycle management.

This white paper on Oracle Sharding is intended for Enterprise Architects, Database Architects, Database Administrators, Application Architects and those who are involved in the design and architecture of distributed database systems. In this paper, I cover various aspects of Oracle Sharding – concepts, architecture, capabilities, application requirements, benchmark results, licensing and many more. Read this paper to get answers to many of the questions that you may have on Oracle Sharding.

Oracle Sharding Benchmark on Oracle Bare Metal Cloud (IaaS)

As Oracle Sharding is built on shared-nothing hardware architecture, OLTP applications designed to run on it can elastically scale (data, transactions and concurrent users) to any level, on any platform, simply by deploying new shards on additional stand-alone servers. Also, the unavailability or slowdown of a shard due to either an unplanned outage or planned maintenance affects only the users of that shard; it does not affect the availability or performance of the application for users of other shards (i.e. faults are contained). This blog covers the results of the Oracle Sharding MAA benchmark on Oracle Bare Metal Cloud.

The objectives of this benchmark are to:

  • Elastically scale-out the Sharded Database (SDB) on Oracle Bare Metal IaaS Cloud – following the Oracle Maximum Availability Architecture (MAA) best practices
  • Demonstrate the linear scalability of relational transactions with Oracle Sharding
  • Observe the fault isolation aspects of a sharded database

SDB Benchmark

Figure 1. Sharded Database Topology on Oracle Bare Metal Cloud

As shown in Figure 1, the sharded database used in this benchmark has two shard groups – one in each of the availability domains of the Oracle Bare Metal Cloud. The network latency between the availability domains is less than 1ms. Shardgroup1 in Availability_Domain1 has 100 primary shards and the shardgroup2 in Availability_Domain2 has 100 HA standby shards. Each shard is hosted on dedicated bare metal server, which has 36 cores, 512G RAM and four 12.8TB NVMe flash disks. These flash disks are used for the creation of two ASM Diskgroups (+DATA, +RECO) with normal redundancy. Three shard directors were deployed in each of the Availability Domains for high availability. The shard catalog is placed in Availability_Domain1 and its standby is located in Availability_Domain2.

For this benchmark, Swingbench Order-entry application has been used. Customer_id column is defined as the sharding key. The data model has been modified so that every sharded table in the table family contains the sharding key. Tables used for reference data have been defined as duplicated tables. From the application stand point, the connection pooling code has been modified to use the new Oracle Sharding API calls – to check out a connection on a given shard based on the sharding key. Once the connection is checked out, transactions and queries are executed within the scope of that shard.

Role-based global services have been created so that read-write transactions run on primary shards and queries run on standby shards. The total size of the sharded database is 50TB with 500G on each of the 200 shards.

The replication chosen for this sharded database is Active Data Guard in Max Availability mode. SDB automated deployment method using the “CREATE SHARD” command is used. So, the creation of shards and the configuration of the Active Data Guard replication setup are automatically done by the Oracle Sharding management tier. After which the Data Guard Broker configurations with Fast-Start Failover (FSFO) are automatically enabled for all the shards. The FSFO observers are also automatically started on the regional shard director.

Here are the steps taken for the execution of the benchmark:

  • Begin the application workload on 25 primary shards and 25 standby shards
  • Bring up 25 more primary and 25 more standby shards
  • Repeat step #2 until all the 200 shards are online on both Availability Domains
  • Observe linear scaling of transactions per second (TPS) and queries per second (QPS)
  • Compute the total read-write transaction per second and queries per second across all shards in the sharded database

The following table (Table #1) shows the raw data of the TPS and QPS observed, as the sharded database is elastically scaled-out.

Primary shards Standby  shards Read/Write
Transactions per Second
Ready Only
Queries per Second
25 25 1,180,000 1,625,000
50 50 2,110,000 3,260,000
75 75 3,570,000 5,050,000
100 100 4,380,000 6,820,000

Table 1. Linear scalability of relational transactions as shards are added

linear scaling - graph

Figure 2. Transactions and queries scaled linearly as shards are added

Figure 2 illustrates that as shards were doubled, tripled and quadrupled, we were able to observe that the rate of transactions and queries doubled, tripled and quadrupled accordingly. This demonstrated the frictionless linear scaling due to shared-nothing hardware architecture of Oracle Sharding. Altogether we were able to execute 11.2 Million transactions per second that includes 4.38 Million read-write transactions per second across all the 100 primary shards and 6.82 Million read-only transactions per second across all the 100 active standby shards.

The study also illustrated that Oracle Sharding provides extreme data availability. When a fault was induced on a given shard, there was absolutely no impact to the other shards in the SDB. This is due to zero shared hardware or software among the shards.

fault isolationFigure 3. A shard outage has zero impact on surviving shards

Figure 3 depicts that the application remained available on other shards even when a fault is induced on a given shard.

This blog covered the results of the Oracle MAA study on Oracle Sharding and highlighted the linear scalability and fault isolation benefits. This study showcased that high-performance and reliable Oracle Bare Metal Cloud infrastructure is an ideal platform for running Oracle sharded databases. Visit www.oracle.com/goto/oraclesharding for cookbooks on SDB deployment on Oracle Cloud and Oracle Sharding MAA Best Practices White Paper (To be published soon). Follow me on Twitter @nageshbattula

Oracle Global Data Services (GDS): Part 2 – Load Balancing Use Cases

Oracle Database 12c Global Data Services galvanizes the asset utilization of replicated database resources. It allows connect-time and run-time load balancing, routing and service failover across replicated databases situated in any data center in any geographical region. With GDS, customers can now achieve these capabilities without the need to either integrate their High Availability stack with hardware load balancers or write custom homegrown connection managers. And remember that GDS comes with the Active Data Guard license and is also available to Oracle GoldenGate customers at no additional charge as well.

In this blog, we follow up on the introduction to GDS from Part 1 and walk through key use cases for workload balancing:
1. The first use case (shown below) is load balancing for reader farms:
readerfarmFigure 1: Load balancing of Read-Only workloads on a reader farm
Imagine a scenario where GDS is enabled for an Active Data Guard or Oracle GoldenGate reader farm with standby replicas located in both local and remote data centers. Let’s say a Read Write global service for Order Entry runs on the Primary database and the Read Only Global Services for Reporting run on the reader farm. Using GDS, the client connections are automatically load balanced among the Read Only global services running on the reader farm (across data centers). This capability improves resource utilization, performance and scalability with Read Only workload balancing on Active Data Guard or Oracle GoldenGate reader farms.

2. Another use case (as shown below) is load balancing of Read Write services among multi-masters within and across regions:

gds_ogg_lbsFigure 2: Load balancing for Read-Write workloads with GDS
Let’s take a scenario of active/active databases using Oracle GoldenGate in a GDS configuration. In this case the Read Write global service is configured to run on each of the masters. For this scenario, GDS automatically balances the Read Write workload among the databases in the GoldenGate multi-master configuration.

This wraps up our exploration of key Oracle Database 12c GDS load balancing use cases. In the next installment of the GDS blog series (Part 3), we will take a look at the use cases on Global Service failover across databases.

Oracle Global Data Services (GDS): Part 1 – Automated Workload Management for Replicated Databases

Introduction
Global Data Services is a key offering within Oracle’s Maximum Availability Architecture. It’s really a must-have for organizations that are using Oracle high availability technologies such as Active Data Guard or Oracle GoldenGate to replicate data across multiple databases. With automated workload balancing and service failover capabilities, GDS improves performance, availability, scalability, and manageability for all databases that are replicated within a data center and across the globe. And GDS boosts resource utilization, which really improves the ROI of Active Data Guard and GoldenGate investments. It does this in an integrated, automated way that no other technology can match. Plus it’s included with the Active Data Guard license – and since GoldenGate customers have the right to use Active Data Guard, it’s available to them at no additional charge as well.

Customer Challenges
Enterprises typically deploy replication technologies for various business requirements – high availability and disaster recovery, content localization and caching, scalability, performance optimization for local clients or for compliance in accordance with local laws. Oracle customers use Active Data Guard and Oracle GoldenGate to address all of these business requirements. They use Active Data Guard to distribute their Read-Only workload and GoldenGate to distribute not only Read workloads but also Read Write workloads across their replicated databases.

However when you’re trying to optimize workload management across multiple database replicas, you run into certain challenges that simply extend beyond the capabilities of replication technology. That’s because customers are unable to manage replicated databases with a unified framework and instead have to deal with database silos from an application and DBA’s perspective.

Let’s look at a couple of the main problems with database silos.
The first is under-utilized resources – for example, when one replica cannot be leveraged to shoulder the workload of another over-utilized database. This leads to suboptimal resource utilization, which can adversely affect performance, availability and of course cost. The other problem with silos is the inability to automatically fail over a service across databases – let’s say a production application workload is running against a particular replica. If that replica goes down due to an unplanned event, customers don’t have a mechanism that automatically and transparently relocates the Service to another available replica. When a replica fails that can lead to application outages.

Until the introduction of Oracle Global Data Services (GDS), there really wasn’t a way for enterprises to achieve Service Failover and load balancing across replicas out of the Oracle Stack. To address this, some customers have chosen to compile their own homegrown connection managers and others have integrated their HA stack with hardware load balancers. But these solutions still don’t address all of the issues:

Manual load balancing using homegrown connection managers, for example, incurs huge development costs and yet cannot optimize performance and availability for replicated systems. Special purpose network load balancers can help but they introduce additional cost and complexity – and they still can’t offer database service failover and centralized workload management.

Global Data Services Overview
Global Data Services delivers automated workload management, which addresses all of these key pain points. It eliminates the need for custom connection managers and load balancers for database workloads.

With a newly created concept called Global Service, Oracle Global Data Services extends the familiar Oracle RAC-style connect-time and run-time load balancing, service failover and management capabilities beyond a single clustered database. Capabilities that were so far applicable only to a single database can now be applied to a set of replicated databases that may reside within or across data centers. Customers can achieve these capabilities by simply setting the pertinent attributes of the Global Service.

workload balance

Figure 1: Workload balance – Maximize application performance with GDS

service_failover

Figure 2: Global Service Failover – Maximize application availability with GDS

GDS sits between the application tier and the database tiers of the stack. It orchestrates the Service high availability, Service level load balancing and routing. Global Services run on the databases but are managed by GDS. GDS algorithms take into account DB instance load, network latency between data centers and the workload management policies (region affinity, load balancing goals, DB cardinality, DB role, replication lag tolerance) that the customers can configure. These workload management policies are enabled via the attributes of a given Global Service.

What are the key capabilities that are really unique to GDS?

1. For performance optimization, there’s region-based workload routing, which automatically routes workloads to the database closest to the clients. For example, what if the customer has a requirement that all the clients/applications closer to the North American data center need to be routed to the database in the North American data center? Likewise, European clients may need to be routed to the European database. GDS addresses this problem by managing this workload routing automatically.

2. In addition, GDS provides connect time load balancing and supports run time load balancing – another key performance advantage.

3. For higher application availability, GDS enables inter-database service failover. If a replica goes down as a result of a planned or unplanned event, GDS fails over the service to another replica.

4. And it also offers role based global services. GDS will make sure that the global services are always started on those databases whose database role matches the role specified for the service. For example, if Data Guard undergoes role transitions, the global services are relocated accordingly, maintaining availability requirements.

5. For improved data quality, there’s also replication lag-based workload routing. This capability routes read workloads to a Data Guard standby whose replication lag is within a customer-specified threshold that’s based on business needs

6. By managing all of the resources of the replicas efficiently, customers are able to maximize their ROI because there are no longer any under-utilized servers.

This wraps up the introductory blog post on Oracle Database 12c GDS. We looked at the challenges of workload management for replicated databases and how GDS addresses those challenges. In the next blog, we will review some of the key capabilities of GDS and the tangible business benefits.

Benefits of Oracle Sharding

In this post, we will take a look at the advantages of Oracle Sharding.

  • Linear scalability with complete fault isolation. OLTP applications designed for Oracle sharding can elastically scale (data, transactions and users) to any level, on any platform, simply by deploying new shards on additional stand-alone servers. The unavailability or slowdown of a shard due to either an unplanned outage or planned maintenance affects only the users of that shard, it does not affect the availability or performance of the application for users of other shards. Upon the unavailability of a shard, failover is initiated automatically to another copy of the data. Each shard may run a different release of the Oracle Database as long as the application is backward compatible with the oldest running version – making it simple to maintain availability of an application while performing database maintenance.

linear-scaling

  • Global data distribution for data proximity – to bring data closer to the consumers and data sovereignty – to meet data privacy regulations.

geo-distrib

  • Simplicity via automation of many life-cycle management tasks including: automatic creation of shards and replication, system managed partitioning, single command deployment, elastic scale-out and fine-grained resharding.
  • Superior run-time performance using intelligent, data-dependent routing.
  • All of the advantages of sharding without sacrificing the capabilities of an enterprise RDBMS, including: relational schema, SQL, and other programmatic interfaces, complex data types, online schema changes, multi-core scalability, advanced security, compression, high-availability, ACID properties, consistent reads, developer agility with JSON, and much more.

Having looked at the benefits, in the next post we will study the key capabilities of Oracle Sharding.

Oracle Sharding – Introduction

Oracle Database 12c Release 2 has been available on Oracle Cloud since Nov 4, 2016.  On March 1, 2017, we have announced Oracle Database 12c Release 2 for on-premises as well. Oracle Sharding is one of the marquee features of Oracle Database 12.2.

We, from the Oracle Sharding Product Development team will be publishing periodic blog posts on various topics on Sharding.  The topics will include – Oracle Sharding benefits, capabilities, methods, data modeling and application requirements, high availability architecture, replication, deployment automation, direct and proxy routing, life cycle management, benchmarking results, monitoring, patching and many others that you will find interesting.

So, what is Oracle Sharding? It is a scalability, availability and geo-distribution feature for suitable OLTP applications that enables distribution and replication of data across a pool of discrete Oracle databases that share no hardware or software. Each database in the pool is referred to as a shard. The pool of shards is presented to an application as a single logical Oracle database (a sharded database or SDB).

Oracle sharding distributes data across shards using horizontal partitioning. Horizontal partitioning splits a database table across shards so that each shard contains the table with the same columns but a different subset of rows.The number of shards and the distribution of data across them are completely transparent to database applications. SQL statements issued by an application do not refer to shards nor are they dependent on the number of shards and their configuration.OLTP applications must be explicitly designed for a sharded database architecture in order to realize the benefits of scalability and availability. This is different from an HA architecture based upon Oracle Real Application Clusters (Oracle RAC) where scalability and availability are achieved transparent to an application. Applications that use a sharded database must have a well-defined data model and data distribution strategy (consistent hash, range, list or composite) that primarily accesses data via a sharding key. Examples of a shard key includes customer_id, account_no, country_id, etc. Oracle Sharding also supports data placement policies (rack and geo awareness) and all deployment models: on-premises and public or hybrid clouds.Transactions that require high performance must be single-shard transactions. For example, lookup and update of a customer’s billing record, lookup and update of a subscriber’s documents etc. There is no communication or coordination between shards for high performance transactions. Multi-shard operations and non-sharding key access are also supported. Such transactions include simple aggregations, reporting, etc. In return for these design considerations, applications that run on a sharded database architecture can achieve even higher levels of scalability and availability. Performance scales linearly as shards are added to the pool because each shard is completely independent from other shards. Each shard typically uses local storage, flash, and memory offering customers a further opportunity to optimize performance at relatively low cost. The first release of Oracle Sharding is designed to scale up to 1,000 shards. Isolation between shards also means that outages or poor performance of one shard does not impact the availability or performance of transactions executing at other shards.

Fig0

High Availability (HA) for individual shards is provided by automatic deployment of database replication. Simple, one-way Data Guard physical replication with automatic database failover is the default configuration. Active Data Guard (copies open read-only) or Oracle GoldenGate (bi-directional replication with all copies open read-write) may also be automatically deployed. Shards may be replicated within and across data centers. Replication is data-center and rack aware using data placement policies supported by Oracle Sharding. Optionally, Oracle RAC may be manually configured to provide Shard HA.

Shards are front-ended by a set of replicated listeners called Shard Directors that act as routers. Oracle clients (JDBC, OCI, and ODP.net) and the Oracle Universal Connection Pool (UCP) have been enhanced to recognize shard keys specified in a connection string and to insure availability by controlling the maximum number of connections allowed per shard. A shard routing cache in the connection layer (populated by the initial request to a shard) is used to route requests directly to the shard where the data resides for optimal runtime performance. The shard routing cache is automatically refreshed if there is any change made to the sharded database (e.g. automatic rebalancing or add/delete of shards).

In  this post, we have introduced you to Oracle Sharding at a high level. In the next post, we will look at the benefits of Oracle Sharding.