
Oracle Sharding Capabilities – Part 2 of 3

In the first part of the three-part blog series, we reviewed the automated data distribution and centralized schema management capabilities of Oracle Sharding. In part 2, I will cover the automated deployment of a sharded database (SDB) and data-dependent routing against an SDB.

Automated creation and replication of shards

In the Oracle Database 12.2.0.1 release, Oracle Sharding supports three automatically configured replication options: Data Guard, Active Data Guard, or Oracle GoldenGate (Oracle GoldenGate 12.3 supports Oracle Sharding).


Figure 2. Shard-level replication with Active Data Guard


Figure 3. Chunkset-level replication with Oracle GoldenGate

Sharding supports two SDB deployment methods.

The first method uses the “CREATE SHARD” GDSCTL command. With this method, the shards and their respective listeners are created automatically. Once the primary shards are created, the corresponding standby shards are automatically built using the RMAN ‘duplicate’ command, after which the Data Guard broker configuration with Fast-Start Failover (FSFO) is automatically enabled. The FSFO observers are automatically started on the regional shard director.
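
For illustration, a CREATE SHARD based deployment might look like this hedged GDSCTL sketch, where the shardgroup name, destination host, and OS credential are placeholders, and the deploy command performs the actual build:

GDSCTL> create shard -shardgroup primary_shardgroup -destination shard_host01 -credential os_oracle_cred
GDSCTL> deploy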

The second method uses the “ADD SHARD” GDSCTL command. Many organizations have their own database creation standards and may opt to deploy the SDB using their own pre-created databases (shards). The ADD SHARD deployment method supports this requirement: the user simply registers the pre-built shards with the sharded database configuration.
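
A corresponding hedged sketch for registering a pre-created shard (the connect string and shardgroup name are illustrative):

GDSCTL> add shard -shardgroup primary_shardgroup -connect shard_host01:1521/shpdb1
GDSCTL> deploy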

These two deployment methods support both the initial and incremental deployments.

Data-dependent routing

Oracle Sharding supports Direct and Proxy routing of database requests to shards.

Direct Routing:

Key enhancements have been made to Oracle connection pools and drivers to support sharding. Starting with 12.2, JDBC/UCP, OCI, and Oracle Data Provider for .NET (ODP.NET) recognize sharding keys as part of connection checkout. Apache Tomcat, JBoss, IBM WebSphere, and Oracle WebLogic can use the UCP support for sharding; PHP, Python, Perl, and Node.js can use the OCI support.

The sharding key is used to route database connection requests at the user session level during connection checkout. Based on this key, a connection is established to the shard that contains the data pertinent to the given sharding_key. Once a session is established on a shard, all SQL queries and DML statements are executed in the scope of that shard and require no modification.
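
As a minimal sketch of what this checkout looks like with JDBC/UCP (the global service URL, credentials, and key value are illustrative placeholders, not values from this post):

import java.sql.Connection;
import java.sql.SQLException;
import oracle.jdbc.OracleShardingKey;
import oracle.jdbc.OracleType;
import oracle.ucp.jdbc.PoolDataSource;
import oracle.ucp.jdbc.PoolDataSourceFactory;

public class DirectRoutingSketch {
    public static void main(String[] args) throws SQLException {
        PoolDataSource pds = PoolDataSourceFactory.getPoolDataSource();
        pds.setConnectionFactoryClassName("oracle.jdbc.pool.OracleDataSource");
        // Placeholder URL for a global service registered with the shard directors.
        pds.setURL("jdbc:oracle:thin:@//gsm_host:1522/oltp_rw_srvc.cust_sdb.oradbcloud");
        pds.setUser("app_schema");
        pds.setPassword("app_schema_pwd");

        // Build a sharding key; a NUMBER key to match the cust_id column
        // used as the sharding key in Part 1 of this series.
        OracleShardingKey key = pds.createShardingKeyBuilder()
                .subkey(100, OracleType.NUMBER)
                .build();

        // Check out a connection for this key; the pool uses its cached
        // routing topology to connect directly to the shard owning the key.
        try (Connection conn = pds.createConnectionBuilder()
                .shardingKey(key)
                .build()) {
            // All SQL on conn executes in the scope of that single shard.
        }
    }
}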

Upon the first connection to a given shard, the sharding key range mapping is collected from the shard to dynamically build the shard topology cache. This routing map is cached in the client, which allows subsequent requests for sharding keys within the cached ranges to be routed directly to the shard, bypassing the shard director. Such data-dependent routing of database requests eliminates an extra network hop, decreasing latency for high-volume OLTP applications.

When a connection request is made with a sharding key, the connection pool looks up the corresponding shards on which this sharding key exists (from its topology cache). If a matching connection is available in the pool, the pool returns a connection to one of these shards by applying its internal connection selection algorithm. If no matching connection is available, a new connection is created by forwarding the connection request, along with the sharding key, to the shard director.

As illustrated in Figure 4, a database connection request for a sharding key that is present in the cached topology map goes directly to the shard, bypassing the shard director.


Figure 4. Logical flow of direct routing – Connection pool as a Shard Director

Note: The super_sharding_key is needed only in the case of composite sharding.

The routing map is automatically refreshed when a shard becomes unavailable or changes occur in the sharding topology. This is enabled by Fast Application Notification (FAN) events published by the shard director via the Oracle Notification Service (ONS).
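
Continuing the UCP sketch above, a pool would typically subscribe to these FAN events with configuration along these lines (the ONS endpoint is a placeholder):

// Hypothetical FAN setup on the pool from the direct routing sketch.
pds.setFastConnectionFailoverEnabled(true);
pds.setONSConfiguration("nodes=gsm_host:6200");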

Proxy Routing for Multi-Shard Queries:

Proxy routing is an ancillary usage pattern targeted at developer convenience. It requires that the connection be established to the coordinator. In Oracle Database 12.2.0.1, the shard catalog database assumes the role of the coordinator database. Once a session is established on the coordinator, SQL queries and DML statements are executed there and require no modification. Proxy routing is suitable for the following scenarios:

»  When the application cannot pass the sharding key during connect

»  When the application needs to access data from multiple shards in the same query, for example a query that aggregates data across all shards (see the example after Figure 5).


Figure 5. Logical flow of proxy routing

As illustrated in Figure 5, routing via the coordinator allows users to submit SQL statements without passing a sharding key value at connect time. The coordinator’s SQL compiler analyzes the query and rewrites it into query fragments that are sent to and executed by the participating shards. The queries are rewritten so that most of the processing is done on the participating shards, with the results then aggregated by the coordinator.
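
For example, a cross-shard aggregation over the customers table from Part 1 might look like the following; submitted on a coordinator session, it is decomposed into per-shard fragments whose partial results the coordinator combines (the query itself is illustrative):

SELECT class, COUNT(*) AS num_customers
FROM customers
GROUP BY class;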

Applications should separate their workloads for direct vs. proxy routing. Separate connection pools must be created for these workloads.

In this second part of the three-part blog series, we looked at the automated creation of shards and the replication setup, and at how requests are routed against a sharded database. In the final part, we will go over the lifecycle management aspects of a sharded database.

For more information, visit the Oracle Sharding portal on Oracle Technology Network (OTN). Follow me on Twitter @nageshbattula


Oracle Sharding Capabilities – Part 1 of 3

In Oracle Database 12.2.0.1, Oracle Sharding supports the complete lifecycle of a sharded database (SDB). In this three-part blog post series, we will review the capabilities of Oracle Sharding. In this first part, we will study automated data distribution and centralized schema management.

Automated Data Distribution

In Oracle Database 12.2.0.1, Oracle Sharding supports two methods of sharding: system-managed and composite.

System-managed sharding:

The system-managed sharding method automatically distributes data across shards using consistent hash of the sharding_key. This algorithm evenly and randomly distributes data across shards to eliminate hot spots and provide uniform performance across shards. Oracle Sharding automatically rebalances data when shards are added or removed. System-managed sharding is the most commonly used form of sharding.

Oracle Sharding uses the familiar SQL syntax for table partitioning to specify how table rows are partitioned across shards. For example, the following CREATE SHARDED TABLE statement creates a sharded table with cust_id as the sharding key.

CREATE SHARDED TABLE customers
( cust_id   NUMBER NOT NULL
, name      VARCHAR2(50)
, address   VARCHAR2(250)
, region    VARCHAR2(20)
, class     VARCHAR2(3)
, signup    DATE
, CONSTRAINT cust_pk PRIMARY KEY(cust_id)
)
PARTITION BY CONSISTENT HASH (cust_id)
TABLESPACE SET ts1
PARTITIONS AUTO;

Oracle Sharding automates the creation of tablespaces on all the shards as a unit called a tablespace set. The PARTITIONS AUTO clause specifies that the number of partitions should be automatically determined and mapped to the tablespaces.

The unit of data migration between shards is a chunk. A chunk is a set of tablespaces that collocate corresponding partitions of all tables in a table family. A chunk contains a single partition from each table of a set of related tables. This guarantees that related data from different sharded tables can be moved together. The number of chunks within each shard is specified when the SDB is created (default is 120 chunks per shard).
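
For instance, the chunk count can be set when the shard catalog is created, as in this hedged GDSCTL sketch (the connect string, user, SDB name, and region are placeholders):

GDSCTL> create shardcatalog -database catalog_host:1521/catpdb -user mysdbadmin/mysdbadmin_pwd -sdb cust_sdb -chunks 120 -region region1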

In addition to sharded tables, many applications require common reference data (e.g., Products) that is accessed as part of queries in transactions. Multi-shard queries can be avoided by replicating the small number of read-only or read-mostly “non-shardable” tables across all shards. Replication of complete tables is a good choice for relatively small tables that are often accessed together with sharded tables. A table with the same contents in each shard is called a duplicated table. For example, a Customers–Orders–Line Items schema may also include a Products table. This table contains data shared by all customers and cannot be sharded by the customer number. Instead, the entire table is duplicated on all databases to prevent multi-shard queries during order processing. A duplicated table such as Products is created using the CREATE DUPLICATED TABLE statement.
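
A minimal sketch of such a statement (the column definitions are illustrative):

CREATE DUPLICATED TABLE products
( product_id  NUMBER PRIMARY KEY
, name        VARCHAR2(100)
, list_price  NUMBER(10,2)
);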

The combination of sharded and duplicated tables enables all transactions associated with a sharding key to be processed by a single shard. This technique enables linear scalability and fault isolation.

Composite Sharding:

With the composite sharding method, data is first partitioned by list or range (super_sharding_key) and then further partitioned by consistent hash (sharding_key). The two levels of sharding make it possible to map data to a set of shards, and then automatically maintain a balanced distribution of data across that set of shards. Composite sharding is ideal for global data distribution, where shards are placed in each geography and data is uniformly distributed within each geography, enabling linear scalability.

The composite sharding method allows you to create multiple shardspaces for different subsets of data in a table partitioned by consistent hash. A shardspace is a set of shards that store data corresponding to a range or list of key values. For example, the following CREATE SHARDED TABLE statement creates a table with the composite sharding method, based on geo as the super_sharding_key and cust_id as the sharding key.

CREATE SHARDED TABLE customers
( cust_id NUMBER NOT NULL
, name VARCHAR2(50)
, address VARCHAR2(250)
, geo VARCHAR2(20)
, class VARCHAR2(3)
, signup_date DATE
, CONSTRAINT cust_pk PRIMARY KEY(geo, cust_id)
)
PARTITIONSET BY LIST (geo)
PARTITION BY CONSISTENT HASH (cust_id)
PARTITIONS AUTO
(PARTITIONSET AMERICA VALUES ('AMERICA') TABLESPACE SET tbs1,
PARTITIONSET ASIA VALUES ('ASIA') TABLESPACE SET tbs2);

Oracle Sharding also supports the subpartitioning methods provided by Oracle Database and enables information lifecycle management (ILM) by placing subpartitions in separate tablespaces and moving them between storage tiers. Migration of subpartitions between storage tiers can be done without sacrificing the scalability and availability benefits of sharding, or the ability to perform partition pruning and partition-wise joins on a primary key. Composite sharding with subpartitioning provides three levels of data organization, as sketched below.
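
A hedged sketch of layering a range subpartition onto the composite example above; the date boundary, subpartition names, and the assumption that the tablespace sets already exist are all illustrative:

CREATE SHARDED TABLE customers
( cust_id     NUMBER NOT NULL
, geo         VARCHAR2(20)
, signup_date DATE
, CONSTRAINT cust_pk PRIMARY KEY(geo, cust_id, signup_date)
)
PARTITIONSET BY LIST (geo)
PARTITION BY CONSISTENT HASH (cust_id)
SUBPARTITION BY RANGE (signup_date)
SUBPARTITION TEMPLATE
( SUBPARTITION old_data VALUES LESS THAN (DATE '2016-01-01')
, SUBPARTITION recent_data VALUES LESS THAN (MAXVALUE)
)
PARTITIONS AUTO
(PARTITIONSET america VALUES ('AMERICA') TABLESPACE SET tbs1,
 PARTITIONSET asia VALUES ('ASIA') TABLESPACE SET tbs2);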

Centralized schema management

The SDB schema, including the structure of sharded and duplicated tables and the data distribution, is maintained in the Shard Catalog. The Shard Directors, in conjunction with the Shard Catalog, propagate the schema to all the shards.


Figure 1. Schema management in a Sharded Database

The figure above illustrates how the schema is propagated by the shard director and the shard catalog to all the shards.

In this first part of the three-part blog series, we looked at automated data distribution and centralized schema management. In the second part, we will go over the automated creation and replication of shards, and data-dependent routing.