HA clustering for all with Corosync and Pacemaker
In the previous recipes, we addressed HA by distributing traffic between two active application servers. However, this method only works well for stateless applications, where no user or session data is held on a particular server. Stateful applications, or applications that run on a complex server setup, need a different approach to HA: the application components are started and stopped on different servers as needed, so that each component keeps running somewhere even when a server fails. Pacemaker and Corosync provide this capability. These two open source projects work together to provide HA clustering for Linux-based systems, coordinating and managing multiple nodes in a cluster so that critical services remain available during hardware or software failures.
Corosync serves as the communication layer of the HA cluster stack, providing reliable messaging between nodes. It uses a membership and quorum system to track which nodes are active and to ensure that only the group of nodes holding quorum can make decisions, so that two parts of the cluster never act as the primary (or master) at the same time. This messaging layer carries information about the cluster’s state, node status, and resource conditions. Corosync provides the following key features:
- Cluster communication: Corosync enables nodes to exchange messages reliably and efficiently, allowing them to coordinate and synchronize their actions.
- Membership and quorum: Corosync keeps track of the active nodes in a cluster and uses a quorum algorithm to determine whether enough nodes are present to make decisions. This helps avoid split-brain scenarios by ensuring that only one group of nodes, the one holding quorum, can run services. A split-brain scenario occurs when nodes in a cluster lose communication with each other, so that each node (or group of nodes) believes it is the only active one in the cluster. This can happen because of network issues, communication failures, or misconfigurations. Avoiding split-brain is crucial because it can cause data inconsistencies, corruption, and service disruptions.
Note
In a split-brain scenario, several nodes within the cluster may begin running services or using shared resources on their own, each believing it is the only active node. Because these nodes operate without any coordination, this can cause conflicts and data inconsistencies. When possible, use an odd number of nodes in a cluster, so that a network split into two groups cannot end in a tie and one side always keeps a majority; otherwise, add extra protection through the cluster’s quorum configuration.
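The majority rule behind quorum, and the reason odd node counts help, can be sketched in a few lines of Python. This is a simplified model, assuming every node has exactly one vote and that a group of nodes is quorate when it holds more than half of all expected votes, which broadly mirrors the default behavior of Corosync’s votequorum service. The function names are hypothetical and are not part of any Corosync API.

```python
from typing import List


def quorum_threshold(expected_votes: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    return expected_votes // 2 + 1


def is_quorate(partition_votes: int, expected_votes: int) -> bool:
    """A partition may run services only if it holds a majority of all votes."""
    return partition_votes >= quorum_threshold(expected_votes)


def split_outcome(partition_sizes: List[int]) -> List[bool]:
    """For a cluster split into isolated partitions (one vote per node),
    report which partitions keep quorum."""
    expected_votes = sum(partition_sizes)
    return [is_quorate(size, expected_votes) for size in partition_sizes]


if __name__ == "__main__":
    for nodes in (2, 3, 4, 5):
        print(f"{nodes} nodes: quorum needs {quorum_threshold(nodes)} votes")
    # Even cluster split down the middle: no side is quorate, so nothing runs.
    print("4 nodes split 2+2:", split_outcome([2, 2]))
    # Odd cluster: any two-way split leaves exactly one quorate side.
    print("5 nodes split 3+2:", split_outcome([3, 2]))
```

With four nodes, a 2+2 split leaves neither half quorate, so both halves must stop their services; with five nodes, a 3+2 split lets exactly one side continue. The output also shows why a two-node cluster needs special handling: it needs 2 votes, so if one node is lost, the survivor alone cannot reach quorum.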
Pacemaker is a cluster resource manager that utilizes Corosync’s messaging and membership features to manage cluster resources and handle resource failover. It determines which node in the cluster should run specific services (resources) based on established policies and constraints. Pacemaker brings the following features to the cluster:
- Resource management: With Pacemaker, administrators can set up resources that must be highly available, such as IP addresses, services, databases, and applications
- Resource monitoring: Pacemaker continuously monitors the status of resources and nodes to detect failures or changes in the cluster
- Resource failover: If a node fails or a resource runs into problems, Pacemaker starts a failover process, moving the affected resources to healthy nodes to restore service with minimal interruption
- Resource constraints: Administrators can set constraints and rules for resource placement and failover, defining which nodes are preferred or prohibited for specific resources
- Colocation and order constraints: Pacemaker allows you to define relationships between resources, specifying which resources must run together on the same node (colocation) and in which order they must be started and stopped (order)
- Cluster management: Pacemaker comes with command-line utilities for managing and configuring the cluster, and graphical front ends such as the Hawk web interface are also available
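To make the placement, failover, and constraint ideas above more concrete, here is a deliberately simplified Python model. It is not how Pacemaker’s scheduler actually works and it uses no Pacemaker API. It only borrows the idea that location preferences are expressed as scores, with a -INFINITY score meaning that a resource must never run on a node. The resource names (`vip`, `web`), node names, and all functions are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Optional

BANNED = float("-inf")  # plays the role of Pacemaker's -INFINITY score


def start_order(order: Dict[str, List[str]]) -> List[str]:
    """Order resources so each starts after the resources it depends on.
    Every resource must appear as a key, even with an empty dependency list."""
    return list(TopologicalSorter(order).static_order())


def choose_node(resource: str, online: List[str],
                scores: Dict[str, Dict[str, float]],
                colocated_with: Dict[str, str],
                placed: Dict[str, str]) -> Optional[str]:
    # Colocation: a dependent resource may only run where its partner runs.
    partner = colocated_with.get(resource)
    if partner is not None:
        node = placed.get(partner)
        return node if node in online else None
    # Location constraints: skip banned nodes, then take the highest score.
    prefs = scores.get(resource, {})
    allowed = [n for n in sorted(online) if prefs.get(n, 0) != BANNED]
    if not allowed:
        return None
    # max() keeps the first of equal scores, so ties go to the first node name.
    return max(allowed, key=lambda n: prefs.get(n, 0))


def place(online: List[str], scores, colocated_with, order) -> Dict[str, str]:
    """Assign each resource to a node, visiting resources in start order.
    Assumes order constraints list each colocation partner before its dependent."""
    placed: Dict[str, str] = {}
    for resource in start_order(order):
        node = choose_node(resource, online, scores, colocated_with, placed)
        if node is not None:
            placed[resource] = node
    return placed


# Hypothetical example: a virtual IP ("vip") and a web server ("web").
SCORES = {"vip": {"node1": 100, "node2": 50, "node3": BANNED}}
COLOCATED_WITH = {"web": "vip"}      # web must run on the same node as vip
ORDER = {"vip": [], "web": ["vip"]}  # start vip before web

if __name__ == "__main__":
    print(place(["node1", "node2", "node3"], SCORES, COLOCATED_WITH, ORDER))
    # node1 fails: both resources fail over together to node2, never node3.
    print(place(["node2", "node3"], SCORES, COLOCATED_WITH, ORDER))
```

Running it prints `{'vip': 'node1', 'web': 'node1'}` and then `{'vip': 'node2', 'web': 'node2'}`: when node1 disappears, the virtual IP moves to the best remaining allowed node, and the colocation rule pulls the web server along with it. In a real cluster, you would express these rules as Pacemaker constraints and let Pacemaker compute the placement itself.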