Disaster tolerance with Apache Cassandra
Highly Available
The size and scope of today's Internet companies require more than your average SQL. Apache Cassandra is one of the NoSQL systems filling the need for high availability at scale.
Apache Cassandra is an open source NoSQL distributed database that stores and manages large volumes of data on standard servers. Cloud providers use Cassandra for configurations with many data centers spread across global networks.
The story of Apache Cassandra began in 2007, when Facebook engineers Prashant Malik and Avinash Lakshman developed a very early version for Facebook's inbox search. The challenge was to store and search huge datasets residing on hundreds of servers. A year later, Facebook released Cassandra on Google Code, making it an open source project. In 2009, it joined the Apache incubator, paving the way to becoming a top-level Apache Foundation project. Since then, many well-known companies have implemented Cassandra or its commercial version (DataStax Enterprise), including Apple, Netflix, Twitter, Sony, eBay, Walmart, and FedEx.
Cassandra and other NoSQL alternatives are part of a new generation of data tools designed to fulfill the massive storage needs of the Internet era. A conventional relational database, such as an SQL database, is difficult to cluster, subdivide, or scale horizontally. Companies can either keep their data at a single location and let their customers contend with long wait times to access it remotely, or they can operate two instances of the database. Neither of these scenarios is viable for a modern international company that needs both global data availability and the ability to grow without incurring excessive costs.
NoSQL systems, in contrast, are built to be extremely scalable. To increase performance, you simply add nodes to the cluster on the fly; roughly speaking, to double the performance of the database, you double the number of nodes in the cluster.
Apache Cassandra is based on Java and uses symmetrical nodes organized in clusters, rather than the master and worker nodes found in many SQL implementations. Cassandra is useful for real-time data storage for online applications with many concurrent transactions. You can also use Cassandra as a read-intensive database for business intelligence systems. If you're accustomed to SQL, you'll find that the Cassandra Query Language (CQL) is strongly reminiscent of SQL in terms of syntax and keywords.
Cassandra is designed for a distributed environment. To fully exploit Cassandra's disaster tolerance capabilities on a massive scale, companies need to distribute the data across different regions or even different cloud providers. If one instance fails, some latency may occur, but the data remains available.
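Despite all of this distributed machinery, day-to-day work with Cassandra feels familiar to SQL users. The following minimal cqlsh sketch shows CQL in action; the inbox keyspace, the messages table, and the sample data are invented for this example:

```
-- Hypothetical schema: keyspace, table, and data are invented.
CREATE KEYSPACE IF NOT EXISTS inbox
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

CREATE TABLE IF NOT EXISTS inbox.messages (
  user_id uuid,       -- partition key: determines which nodes own the row
  sent_at timestamp,  -- clustering column: sort order within a partition
  sender  text,
  body    text,
  PRIMARY KEY (user_id, sent_at)
);

INSERT INTO inbox.messages (user_id, sent_at, sender, body)
VALUES (uuid(), toTimestamp(now()), 'alice', 'Hello from CQL');

-- Reads should restrict the partition key, so Cassandra knows
-- which nodes hold the data:
SELECT sender, body FROM inbox.messages
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
```

The statements look like SQL, but the data model differs: The primary key determines how data is distributed across the cluster, which is why queries are expected to name the partition key.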
CAP Theorem
The CAP theorem is a principle of computer science that helps to explain why NoSQL systems like Cassandra differ from conventional data tools. The CAP theorem (or Brewer's theorem), which describes the relationship between consistency (C), availability (A), and partition tolerance (P), was first articulated by Eric Allen Brewer, Professor Emeritus of Computer Science at the University of California, Berkeley, and Vice President of Infrastructure at Google. CAP forms the basis for planning a distributed architecture. The basic parts of the CAP decision framework are:
- Consistency: "Every read receives the most recent write or an error." A consistent system returns the same value from every node that is queried.
- Availability: "Every request receives a (non-error) response." Whatever happens within the cluster does not affect the clients: A highly available system always sends an answer, even if half of the cluster is already dead.
- Partition tolerance: "The system continues to work despite network problems between nodes." A partition-tolerant system continues to run even if there are serious communication problems within the cluster.
The CAP theorem states that no distributed system can fully achieve all three of these objectives. Because a distributed database must continue to operate if the network stops or part of the system is down, the third objective (partition tolerance) is required. That means a distributed database can either be consistent and partition-tolerant (CP) but less available, or it can be highly available and partition-tolerant (AP) but less consistent (Figure 1).
These two mutually exclusive options are best understood if you consider the basic trade-off between consistency and availability. If you wish to maximize availability, the system must continue to accept requests when the distributed nodes are not able to communicate with each other (i.e., it can't just stop working and send an error message). But a scenario that calls for a node to provide data to a user when it is unable to verify that the data is up to date does not fulfill the ideal of consistency. On the other hand, if you wish to maximize consistency, the system needs to ensure that the data provided to the user is the latest version or else return an error, which means that, if the nodes cannot communicate, the system would not be fully available.
Cassandra is known as an AP system because it maximizes availability and partition tolerance at the expense of consistency. Cassandra's developers are willing to tolerate some inconsistency to ensure that the database remains available when operating in a partitioned state. This emphasis on availability over consistency is one reason why Cassandra is so highly scalable compared with many conventional database options: The coordination required to guarantee strict consistency does not scale to many nodes and large datasets. However, although Cassandra emphasizes availability in the partitioned state, it does include synchronization features that keep the distributed nodes consistent under normal operating conditions.
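In practice, this trade-off is exposed as tunable consistency: You choose per session or per statement how many replicas must respond. A minimal cqlsh sketch, reusing the hypothetical inbox.messages table from the earlier example:

```
-- cqlsh session: trade consistency against latency per request.
CONSISTENCY QUORUM;  -- a majority of replicas must acknowledge
SELECT body FROM inbox.messages
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;

CONSISTENCY ONE;     -- one replica suffices: faster and more
                     -- available, but the answer may be stale
SELECT body FROM inbox.messages
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
```

If both reads and writes use QUORUM, a read is guaranteed to overlap with the latest acknowledged write as long as enough replicas are reachable; with ONE, Cassandra leans fully toward the AP side of the theorem.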
No SPoF
Cassandra's AP design, with its emphasis on availability, requires that the system eliminate all Single Points of Failure (SPoF). (In a relational database, by contrast, each master node is a potential SPoF.) To eliminate SPoFs, either all components must be designed redundantly, or the design must reflect a masterless architecture in which the nodes are peers. In Cassandra, every node can process any request, whether it needs to read or write. If one of these nodes fails, its data must be available at another location – waiting to restore a backup is not an option for a system designed to achieve zero downtime. Instead, the data is replicated preemptively, before anyone needs it. Cassandra lets you define a replication factor: If you set the replication factor to 3 or 5, each data element is replicated on the corresponding number of nodes. Redundant replication causes additional costs; however, the cost of storage is small compared with the loss of reputation and long-term economic damage associated with lost data. It is also important to remember that replicas should not reside next to each other on the servers: Servers in the same rack tend to fail together, and a single event can paralyze not only the rack, but the entire data center. It is therefore advisable to opt for georedundant replication.
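In CQL terms, the replication factor and its geographic distribution are properties of a keyspace. The following sketch switches the hypothetical inbox keyspace from the earlier example to a georedundant setup; the data center names europe and us_east are assumptions and must match the names your snitch reports:

```
-- Switch the hypothetical keyspace to georedundant replication.
-- NetworkTopologyStrategy counts replicas per data center and
-- avoids placing copies in the same rack where possible.
ALTER KEYSPACE inbox
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'europe': 3,
    'us_east': 3
  };
```

Existing data is not redistributed automatically after such a change; a full repair (nodetool repair --full) on the affected nodes streams the new replicas into place.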
Automatic Recovery
The more servers a cluster contains, the higher the probability of a failure. A cluster must therefore be capable of restoring itself independently. Two mechanisms handle this restoration:
- Hinted handoffs: When a node fails, the other nodes temporarily store the updates destined for the failed node. If the node returns within a reasonable amount of time (typically one to four hours, depending on the configuration), these handoff packages are replayed to the node, restoring it completely and autonomously.
- Repair/NodeSync: If network delays or similar issues cause the replicas to drift apart, the cluster performs a health check and recovery operation that compares and reconciles the data. In Apache Cassandra, this is known as a "repair"; DataStax Enterprise additionally offers a continuous background variant called NodeSync.
The nodes constantly communicate with one another in order to implement workarounds when needed. In Cassandra, nodes immediately report when a new node enters the cluster or an old node fails. When a client application connects to the database through a specified IP address, it loads the cluster metadata and is prepared to send the request to another node if the actual target node cannot be reached. Depending on the settings, an application may retry the request against several other nodes. Each node in the cluster can receive a request for data: A node receiving a request acts as the "coordinator node" and forwards the request to the nodes responsible for the data. This system requires knowledge of the cluster metadata, which the nodes constantly exchange. Each node knows the cluster schema and the position of every usable node.