DynomiteDB’s network topology defines the overall structure of a DynomiteDB cluster including the location and configuration of nodes within each data center (DC), rack and server.
System Administrators and DevOps staff must fully understand DynomiteDB’s network topology. However, application developers do not require a deep knowledge of DynomiteDB’s topology in order to develop applications that use DynomiteDB.
Importantly, DynomiteDB’s topology determines the replication factor (RF). In a standard configuration, the RF within a DC is equal to the number of racks in that DC. For example, if your company has a San Francisco DC (named SFO) with 3 racks, then the RF for SFO is 3.
A cluster’s topology includes the following elements:
On Amazon Web Services (AWS), a DC = region, a rack = availability zone (AZ) and a server = an EC2 instance.
Topology hierarchy
A DynomiteDB cluster has a well-defined hierarchy: a cluster contains one or more DCs, a DC contains racks, a rack contains servers, and each server contains a node (a `dynomite` instance) and a backend (ex. Redis or ForestDB).
The DynomiteDB cluster is a conceptual container that contains one or more DCs. The sections below discuss various topologies.
DynomiteDB uses the term token in three different contexts: the token range, node tokens and data tokens.
The important point is that these three uses of token overlap, as explained below.
The token range is the entire range of allowed tokens, from 0 to 4,294,967,295 (2^32 - 1).
The token range defines the minimum and maximum values of both node tokens and data tokens.
Each node has a node token that indicates the beginning of the range of data tokens that the node owns. The node token is a value within the token range.
The token range is typically divided equally between each of the servers within a rack. For example, if a rack has 3 servers of equal size (same CPU, memory, disk, etc.) then the node on each server will own 1⁄3 of the token range.
The formula to calculate the node token is displayed below. `numberOfNodesInRack` is a count of the servers/nodes within a rack. `nodeIndex` is the node’s index (starting at 0) within the rack.
nodeToken = (4294967295 / numberOfNodesInRack) * nodeIndex
Using the above formula, we can calculate the node tokens for 3 nodes within a rack as follows:
firstNodeToken = (4294967295 / 3) * 0 = 0
secondNodeToken = (4294967295 / 3) * 1 = 1431655765
thirdNodeToken = (4294967295 / 3) * 2 = 2863311530
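The node token formula can be sketched in Python as follows. This is a minimal illustration rather than DynomiteDB code: the function name `node_tokens` and the constant `TOKEN_RANGE_MAX` are hypothetical, and integer division is assumed so that the tokens are whole numbers.

```python
TOKEN_RANGE_MAX = 4294967295  # upper bound of the token range (2**32 - 1)


def node_tokens(number_of_nodes_in_rack: int) -> list[int]:
    """Return the node token for every node index (0-based) in one rack."""
    if number_of_nodes_in_rack < 1:
        raise ValueError("a rack needs at least one node")
    # Assumption: integer division, so every token is a whole number.
    step = TOKEN_RANGE_MAX // number_of_nodes_in_rack
    return [step * node_index for node_index in range(number_of_nodes_in_rack)]


print(node_tokens(3))  # [0, 1431655765, 2863311530]
```

The printed values match the three node tokens calculated by hand above.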
A data token is a hash of the key portion of a key/value pair. The key is hashed using a consistent hashing algorithm which outputs the data token. The data token is then compared against the node tokens to determine which nodes own the data token. In other words, the data token and node token determine where each key/value pair is located within a rack.
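As a rough illustration of turning a key into a data token, the sketch below hashes a key into the 32-bit token range. The hash function is an assumption: `zlib.crc32` is only a stand-in that happens to produce a 32-bit value, and it is not claimed to be the algorithm DynomiteDB uses. The function name `data_token` is hypothetical.

```python
import zlib

TOKEN_RANGE_MAX = 4294967295  # upper bound of the token range (2**32 - 1)


def data_token(key: str) -> int:
    """Map a key to a token in the range 0..TOKEN_RANGE_MAX.

    zlib.crc32 is a placeholder hash, NOT DynomiteDB's actual algorithm.
    """
    return zlib.crc32(key.encode("utf-8")) & 0xFFFFFFFF


token = data_token("user:42")
print(0 <= token <= TOKEN_RANGE_MAX)  # True
```

The same key always yields the same token, which is what lets every rack place a given key/value pair deterministically.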
A node owns all data tokens within the data token range, which is calculated as follows:
minDataToken = nodeToken
maxDataToken = nextNodeToken - 1
A node owns all key/value pairs with a data token from `minDataToken` to `maxDataToken`.
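The ownership rule can be sketched as follows. The helper names `ownership_ranges` and `owner_index` are hypothetical. The sketch assumes the node tokens are sorted and start at 0, and that the last node in the rack owns everything up to the end of the token range (there is no next node token to subtract from).

```python
from bisect import bisect_right

TOKEN_RANGE_MAX = 4294967295  # upper bound of the token range (2**32 - 1)


def ownership_ranges(node_tokens: list[int]) -> list[tuple[int, int]]:
    """Return (minDataToken, maxDataToken) for each node, given sorted node tokens."""
    ranges = []
    for i, token in enumerate(node_tokens):
        if i + 1 < len(node_tokens):
            max_token = node_tokens[i + 1] - 1  # maxDataToken = nextNodeToken - 1
        else:
            max_token = TOKEN_RANGE_MAX  # assumption: last node owns the tail of the range
        ranges.append((token, max_token))
    return ranges


def owner_index(node_tokens: list[int], data_token: int) -> int:
    """Return the index of the node whose range contains data_token."""
    # Assumes node_tokens[0] == 0, so every valid data token has an owner.
    return bisect_right(node_tokens, data_token) - 1


tokens = [0, 1431655765, 2863311530]
print(ownership_ranges(tokens))
print(owner_index(tokens, 100))  # 0
```

A binary search over the sorted node tokens is one simple way to find the owner; a linear scan would work equally well for a handful of nodes.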
1 DC, 1 rack, 3 servers
Never use a single DC with a single rack in production as this topology lacks replication, redundancy and HA. It is discussed here for learning purposes only.
A cluster with a single DC that contains a single rack is an overly simple topology that can be used during development but should not be used for production deployments. The single rack contains three servers whose node tokens span the entire token range.
The following are the names of the various topology elements (indentation indicates the topology hierarchy):
Each rack in a DynomiteDB cluster contains the entire token range which means that the number of racks per DC is how you determine the replication factor (RF) per DC. Replication factor (RF) is the number of replicas (i.e. copies) of each key/value pair.
In this example we have one rack which means `RF = 1`.
A rack contains the entire token range. The number of racks per data center (DC) sets the replication factor (RF) per DC.
In the diagram above, the rack contains 3 servers with the node token specified in the center of each node. Each server contains one instance of `dynomite` (i.e. the node) and one instance of `redis-server` (i.e. the backend). Please note that DynomiteDB supports pluggable backends which means that you can replace `redis-server` with a different backend.
The entire cluster is exposed to clients via a Redis-compatible API. From an application developer’s perspective the DynomiteDB cluster operates similarly to a single Redis server. The fact that an application developer does not have to think about sharding, replication, redundancy or any other distributed database concepts is extremely powerful as it contributes to faster development velocity.
Intra-server architecture
A server is either a physical server or a virtual machine (such as an AWS EC2 instance).
Each server within the rack contains a node (a `dynomite` instance) and a backend (a `redis-server` instance or other backend). A key feature of DynomiteDB is pluggable backends which means that you can use DynomiteDB as a big data cache or database. To use DynomiteDB as a cache use an in-memory backend, such as Redis. To use DynomiteDB as a database use a persistent backend, such as ForestDB.
The node (i.e. the `dynomite` instance) is a Dynamo-inspired server that provides the following capabilities:
DynomiteDB currently supports Redis as its primary backend. However, DynomiteDB supports pluggable backends and will add support for other backends, such as ForestDB, LMDB and Memcached, in the future.
A backend, which is technically a database storage engine, can provide volatile storage, persistent storage or a combination of both. Examples of volatile storage engines include Redis and Memcached. Examples of persistent storage engines include LMDB, RocksDB, LevelDB and ForestDB.
The following table shows the node tokens and data token ownership range.
Server | Node token | Min data token | Max data token |
---|---|---|---|
sfo-r1-s1 | 0 | 0 | 1431655764 |
sfo-r1-s2 | 1431655765 | 1431655765 | 2863311529 |
sfo-r1-s3 | 2863311530 | 2863311530 | 4294967295 |
The formulas used to calculate the data token ownership range follow below:
minDataToken = nodeToken
maxDataToken = nextNodeToken - 1
1 DC with 3 racks
The cluster is comprised of a single DC which contains three racks and each rack contains three servers. Each server contains a node (a `dynomite` instance) and a backend (a `redis-server` instance).
A cluster with a single DC that contains three racks is a production-ready topology when HA in case of DC failure is not required.
The following are the names of the various topology elements (indentation indicates the topology hierarchy):
Each rack contains the entire token range which means that `RF = 3` in this example. Specifically, a key/value pair where the `data token = 100` will be replicated on sfo-r1-s1, sfo-r2-s1 and sfo-r3-s1.
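The replica placement described above can be sketched as a lookup that picks one owning server per rack. The server names and node tokens come from this example; the `SFO_RACKS` dictionary layout and the `replicas` function are hypothetical and only illustrate the idea.

```python
from bisect import bisect_right

# Rack -> list of (node token, server name), sorted by node token.
# The data layout is illustrative, not a DynomiteDB configuration format.
SFO_RACKS = {
    "r1": [(0, "sfo-r1-s1"), (1431655765, "sfo-r1-s2"), (2863311530, "sfo-r1-s3")],
    "r2": [(0, "sfo-r2-s1"), (1431655765, "sfo-r2-s2"), (2863311530, "sfo-r2-s3")],
    "r3": [(0, "sfo-r3-s1"), (1431655765, "sfo-r3-s2"), (2863311530, "sfo-r3-s3")],
}


def replicas(racks: dict, data_token: int) -> list[str]:
    """Return the server in each rack that owns data_token (one replica per rack)."""
    owners = []
    for servers in racks.values():
        tokens = [token for token, _ in servers]
        # The owner is the server with the largest node token <= data_token.
        owners.append(servers[bisect_right(tokens, data_token) - 1][1])
    return owners


print(replicas(SFO_RACKS, 100))  # ['sfo-r1-s1', 'sfo-r2-s1', 'sfo-r3-s1']
```

Because every rack spans the full token range, the lookup returns exactly one server per rack, which is why the number of racks equals the RF.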
In this example there are three racks, each of which contains three servers of equal size. The following table shows the node tokens and data token ownership range.
The following are some salient points about the topology described in the table below:
Rack | Server | Node token | Min data token | Max data token |
---|---|---|---|---|
r1 | sfo-r1-s1 | 0 | 0 | 1431655764 |
r1 | sfo-r1-s2 | 1431655765 | 1431655765 | 2863311529 |
r1 | sfo-r1-s3 | 2863311530 | 2863311530 | 4294967295 |
r2 | sfo-r2-s1 | 0 | 0 | 1431655764 |
r2 | sfo-r2-s2 | 1431655765 | 1431655765 | 2863311529 |
r2 | sfo-r2-s3 | 2863311530 | 2863311530 | 4294967295 |
r3 | sfo-r3-s1 | 0 | 0 | 1431655764 |
r3 | sfo-r3-s2 | 1431655765 | 1431655765 | 2863311529 |
r3 | sfo-r3-s3 | 2863311530 | 2863311530 | 4294967295 |
The formulas used to calculate the data token ownership range follow below:
minDataToken = nodeToken
maxDataToken = nextNodeToken - 1
2 DCs with 3 racks per DC
The cluster is comprised of two DCs, each of which contains three racks, and each rack contains three servers. Each server contains a node (a `dynomite` instance) and a backend (a `redis-server` instance).
A cluster with two DCs, three racks per DC and three servers per rack is a robust, production-ready topology that is resilient against server, rack and DC failure.
The following are the names of the various topology elements (indentation indicates the topology hierarchy):
Each rack contains the entire token range which means that `RF = 3` for the SFO DC and JFK DC. Specifically, a key/value pair where the `data token = 100` will be replicated on sfo-r1-s1, sfo-r2-s1 and sfo-r3-s1 within the SFO DC, plus jfk-r1-s1, jfk-r2-s1 and jfk-r3-s1 within the JFK DC.
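To make the rule "RF per DC equals the number of racks in that DC" concrete, the sketch below counts racks in a nested description of this two-DC example. The `CLUSTER` dictionary layout and the function name `replication_factor_per_dc` are hypothetical; only the DC, rack and server names come from the example.

```python
# DC -> rack -> servers, mirroring the example topology.
# The nested-dict layout is illustrative, not a DynomiteDB configuration format.
CLUSTER = {
    "sfo": {
        "sfo-r1": ["sfo-r1-s1", "sfo-r1-s2", "sfo-r1-s3"],
        "sfo-r2": ["sfo-r2-s1", "sfo-r2-s2", "sfo-r2-s3"],
        "sfo-r3": ["sfo-r3-s1", "sfo-r3-s2", "sfo-r3-s3"],
    },
    "jfk": {
        "jfk-r1": ["jfk-r1-s1", "jfk-r1-s2", "jfk-r1-s3"],
        "jfk-r2": ["jfk-r2-s1", "jfk-r2-s2", "jfk-r2-s3"],
        "jfk-r3": ["jfk-r3-s1", "jfk-r3-s2", "jfk-r3-s3"],
    },
}


def replication_factor_per_dc(cluster: dict) -> dict:
    """RF per DC is the number of racks in that DC (each rack holds the full token range)."""
    return {dc: len(racks) for dc, racks in cluster.items()}


rf = replication_factor_per_dc(CLUSTER)
print(rf)                # {'sfo': 3, 'jfk': 3}
print(sum(rf.values()))  # 6 copies of each key/value pair across the cluster
```

Summing the per-DC values gives the six replicas listed above for a single key/value pair.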
In this example there are six racks, each of which contains three servers of equal size. The following table shows the node tokens and data token ownership range.
The following are some salient points about the topology described in the table below:
DC | Rack | Server | Node token | Min data token | Max data token |
---|---|---|---|---|---|
sfo | sfo-r1 | sfo-r1-s1 | 0 | 0 | 1431655764 |
sfo | sfo-r1 | sfo-r1-s2 | 1431655765 | 1431655765 | 2863311529 |
sfo | sfo-r1 | sfo-r1-s3 | 2863311530 | 2863311530 | 4294967295 |
sfo | sfo-r2 | sfo-r2-s1 | 0 | 0 | 1431655764 |
sfo | sfo-r2 | sfo-r2-s2 | 1431655765 | 1431655765 | 2863311529 |
sfo | sfo-r2 | sfo-r2-s3 | 2863311530 | 2863311530 | 4294967295 |
sfo | sfo-r3 | sfo-r3-s1 | 0 | 0 | 1431655764 |
sfo | sfo-r3 | sfo-r3-s2 | 1431655765 | 1431655765 | 2863311529 |
sfo | sfo-r3 | sfo-r3-s3 | 2863311530 | 2863311530 | 4294967295 |
jfk | jfk-r1 | jfk-r1-s1 | 0 | 0 | 1431655764 |
jfk | jfk-r1 | jfk-r1-s2 | 1431655765 | 1431655765 | 2863311529 |
jfk | jfk-r1 | jfk-r1-s3 | 2863311530 | 2863311530 | 4294967295 |
jfk | jfk-r2 | jfk-r2-s1 | 0 | 0 | 1431655764 |
jfk | jfk-r2 | jfk-r2-s2 | 1431655765 | 1431655765 | 2863311529 |
jfk | jfk-r2 | jfk-r2-s3 | 2863311530 | 2863311530 | 4294967295 |
jfk | jfk-r3 | jfk-r3-s1 | 0 | 0 | 1431655764 |
jfk | jfk-r3 | jfk-r3-s2 | 1431655765 | 1431655765 | 2863311529 |
jfk | jfk-r3 | jfk-r3-s3 | 2863311530 | 2863311530 | 4294967295 |
The formulas used to calculate the data token ownership range follow below:
minDataToken = nodeToken
maxDataToken = nextNodeToken - 1
Understanding the network topology is a critical task for System Administrators and operations staff. However, application developers are free to focus on application requirements as DynomiteDB’s Redis-compatible API provides a simple API that abstracts away the implementation details of DynomiteDB’s distributed nature.
The key elements in a DynomiteDB cluster’s topology are:
The number of racks per DC defines the replication factor (RF).
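If your infrastructure runs on Amazon Web Services (AWS), then the topology elements map to AWS concepts as follows: a DC maps to a region, a rack maps to an availability zone (AZ) and a server maps to an EC2 instance.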
If your application runs within a single DC, then you may choose to run a topology with a single DC and three racks. However, if your application has high availability, scalability and performance requirements then a multi-DC, multi-rack per DC topology is a better choice.