High Availability

Introduction

About this Document

The aim of this guide is to provide the essential steps to get a new High Availability (HA) cluster up and running as quickly as possible and to provide general maintenance guidance. This document is intended for technical personnel, system administrators and network administrators.

Examples used in this Guide

All setups and configuration examples in this guide are executed as an advanced administrator. Some options are not available if you log in as a full administrator or a user with lower access privileges.

As software development and documentation are ongoing processes, the screenshots shown in this guide may slightly deviate from the current user interface.

High Availability Concepts

Overview

In this chapter, we explain the aim of a High Availability (HA) system and some important terminology associated with this technology.

Topics covered in this section include:

  • HA Prerequisites

  • What is High Availability?

  • The Heartbeat process

  • Virtual IP Addresses and Fail-over

  • HA and Load Balancing

Prerequisites

You cannot enable the HA Feature without a valid Able license. With a licensed appliance, configurations for all purchased options are possible for a user with full administration rights. The licensing process is explained in the AXS Guard System Administration How To and the Getting Started guide, which are accessible by clicking on the permanently available Documentation button in the Administrator Tool.

Licensing HA appliances

The same steps apply as explained in the System Administration and Getting Started Guides. However, HA is a special case: you must import a separate license file on each AXS Guard appliance in the HA cluster (an HA cluster typically consists of a Master and a Slave appliance). You can only enable the HA Feature on an appliance with the correct license. To upgrade your existing installation (e.g. a single AXS Guard) to an HA cluster, your appliance has to be replaced by 2 or more HA capable machines. Contact your reseller for information.

To correctly import an Able license file on your HA appliance, you have to:

  1. Follow the Licensing steps as explained in the System Administration How To and acquire an Able license file for the Master.

  2. Import the license.dat file on the Master.

  3. Follow the same steps to import a license on the Slave. Use the Contract ID of the Master to register the Slave and obtain a separate license file.

What is High Availability?

High-Availability clusters (also known as failover Clusters) are implemented primarily for the purpose of improving the availability of services that the cluster provides. They operate by having redundant nodes, which are then used to provide services when system components fail. The most common size for an HA cluster is two nodes, a Master and a Slave, which is the minimum requirement to provide redundancy. HA cluster implementations attempt to use redundancy of cluster components to eliminate single points of failure.

The Heartbeat Process

Heartbeat is a daemon (process) that checks which peers in the HA cluster are online or offline. The Heartbeat communications are transmitted over ethernet or over a serial connection (RS-232). The serial cable is included with your delivery.

Virtual IP Addresses

A virtual IP address (VIP or VIPA) is an IP address that is not connected to a specific computer or network interface card (NIC) on a computer. Packets that are sent to the VIP address are automatically redirected to the appropriate physical network interface. Virtual IPs are used for connection redundancy or Fail-over; a VIP address remains available if a computer or a NIC fails, because an alternative computer or NIC automatically assigns itself the VIP address and replies to incoming connections.

Virtual IPs can be assigned in the following AXS Guard network zones:

  • The Secure LAN

  • The DMZ

  • The Internet (WAN)

For details about the AXS Guard network zones, see the System Administration How To, which can be accessed via the Documentation button in the Administrator Tool.

Example: HA configuration with 2 SMTP servers

In the simple case of an AXS Guard HA configuration with 2 SMTP servers, a single virtual IP address is assigned to the virtual IP address "pool." The virtual IP address is assigned to the first SMTP server (HA Master), which handles the mail traffic that is explicitly routed to it. The second SMTP server, the HA Slave, assumes the virtual IP address and handles SMTP traffic only if the Master fails.
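
The take-over behaviour in this example can be modelled in a few lines of Python. This is an illustrative sketch only, not AXS Guard code: the `Node` class, the `vip_owner` function and the 192.0.2.25 address are hypothetical and serve only to show that the VIP follows the first node that is still alive.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """One appliance in the cluster (hypothetical model)."""
    name: str
    alive: bool = True


def vip_owner(nodes: Sequence[Node]) -> Optional[Node]:
    """Return the first live node in priority order, or None if all are down.

    The list order encodes priority: Master first, Slave second.
    """
    for node in nodes:
        if node.alive:
            return node
    return None


master = Node("master")
slave = Node("slave")
cluster = [master, slave]
VIP = "192.0.2.25"  # documentation-range address, purely illustrative

print(VIP, "is held by", vip_owner(cluster).name)  # Master handles SMTP traffic
master.alive = False                                # simulate a Master failure
print(VIP, "is held by", vip_owner(cluster).name)  # Slave assumes the VIP
```

Because mail clients only ever address the VIP, they do not need to know which of the two SMTP servers is currently answering.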

HA as a Load Balancer

Load balancing is a setup in which multiple computers are linked together to share computational workloads. Physically, there are multiple machines (the Master and the Slave), but logically they function as a single “virtual” machine. User requests are distributed among the computers in the cluster, which balances the computational work across the machines and improves the performance of the cluster.

Do not confuse HA with Load-balancing.

Load-balancing is provided by the AXS Guard Internet Redundancy Feature. For more information, see the Internet Redundancy How To, which is accessible via the Documentation button in the Administrator Tool. In some specific cases, HA may provide Load-balancing, as explained below.

Load balancing can be provided for, but is not limited to, the following services:

  • Web Access: rather than being sent to the Master, all Web Access (proxy) requests are sent to the Slave. This saves a lot of processing power on the Master.

  • Internet Redundancy: If you have 2 Internet Lines, one on the Master and one on the Slave, traffic from your LAN towards the Internet can be balanced over both lines, reducing the workload on the Master. This can only be set up by specialized Able personnel. Contact your reseller for details.

High Availability Configuration

Before you Start

In this chapter we explain how to set up an AXS Guard HA cluster from scratch. This is the most common setup in the field, but other setups are possible.

Important

  • Disable automated version and revision updates on the master and the slave unit; manual intervention is required to upgrade your HA cluster.

  • Keep one of the following guides in reach, as they explain the correct procedures to create administrator accounts, upgrade a system and how to obtain a valid license for your AXS Guard appliances:
    - The Installation Guide
    - The System Administration Guide

Feature Activation

  1. Go to System > Feature Activation.

  2. Enable the High Availability option.

  3. Update your configuration.

    High Availability Feature Activation

Licensing your HA Cluster

You must acquire a valid Able license before you can start configuring and using your HA cluster. The same Contract ID may be used to register the appliances of the HA cluster, but a separate license file has to be uploaded to each appliance, starting with the Master.

  1. Start the new Master System and log on as the sysadmin user.

  2. Create a new Advanced Administrator account and use it to log on to the Master.

  3. Navigate to System > Feature Activation and enable the High Availability option.

  4. Acquire a license.dat file.

  5. Save this file to the desired location and upload it to the Master.

  6. Repeat the same registration steps for the Slave. A new license.dat file is required for the Slave, but you can use the Master’s Contract ID to register the Slave.

    Importing an Able License

Setting up the Master Unit

Setup Sequence

Important

It is crucial to configure the HA Master before the Slave.

Viewing / Changing the Hostname

Important

The hostname and the domain name must be identical on all appliances in the HA cluster.

  1. Log on to the AXS Guard as explained in the System Administration Guide or the Installation Guide.

  2. Navigate to System > General. The screen below is displayed.

  3. Configure the fields as explained in the System Administration Guide or the Installation Guide.

  4. Click on Update to finish.

    Setting / Viewing the Host Name

PSTN Device Configuration

You need to set your PSTN device to Heartbeat.

  1. Navigate to Network > Devices > PSTN.

  2. Click on pstn0.

  3. Set the type to Heartbeat.

  4. Update your configuration.

    PSTN Device Configuration

Master General Settings

Important

  • The Master’s secure LAN IP must be one of the addresses specified for the Master’s Primary Secure Device, typically eth0 (configured under Network > General).

  • If a device other than the standard eth0 device is selected, you must adjust the Primary Secure Device setting under Network > General.

  • If you change the IP address of a secure device which handles the HA connection under Network > Devices > Eth, you must also change the addresses under High Availability > General and High Availability > Virtual IP Addresses, otherwise severe problems will occur after a reboot.

  1. Navigate to High Availability > General.

  2. Check the High Availability Active option.

  3. Enter the secure LAN IP of the Master.

  4. Enter the secure LAN IP of the Slave. This IP address must be the same as one of the addresses specified for the Slave’s Primary Secure Device.

  5. Leave the other options unchecked.

  6. When finished, proceed to the Heartbeat Tab.

    Master General Settings

Option Description

High Availability Active

Check to enable HA. The option should be enabled on the master first, then on the slave. A system reboot is required. Ensure that the master is completely up and running before rebooting the slave.

Master primary secure LAN IP address

Enter the secure LAN IP address of the master system. The IP must be in the same range as the master’s primary secure device (configured under Network > General).

Fail-over if Internet Down

If checked, the slave system will take over if all Internet devices on the master are down and if the Connectivity Check option is enabled for the Internet devices in question.

Time Internet must be down for fail-over

The Internet links are monitored on the master and the slave system. Specify the time in minutes during which the Internet link(s) on the master system is/are allowed to be down or unavailable before the slave system takes over the Internet services otherwise delivered by the master system.

Fail-over if Secure LAN down

If checked, the slave system will take over if all secure devices on the master are down and if the Connectivity Check option is enabled for the secure devices in question. Don’t use master or slave IP addresses for network connectivity checks.

Time Secure Lan must be down for fail-over

The Secure LAN links are monitored on the master and the slave system. Specify the time in minutes during which the Secure LAN link(s) on the master system is/are allowed to be down or unavailable before the slave system takes over the Secure LAN services otherwise delivered by the master system.

Use the Slave Proxy when available

Check this option if you want the slave system to handle all Web Access (proxy server) requests. This reduces the workload of the master system and optimizes performance.
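
The fail-over options described above combine three conditions: the option is checked, every monitored device is down with Connectivity Check enabled, and the outage has lasted for the configured number of minutes. The Python sketch below shows one way to read that rule. It is not the appliance's implementation: the `Device` class, the `should_fail_over` function, the device names and the choice of "greater than or equal" for the time comparison are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Device:
    """A monitored network device on the Master (hypothetical model)."""
    name: str
    link_up: bool
    connectivity_check: bool   # Connectivity Check option for this device
    minutes_down: float = 0.0  # how long the link has been down


def should_fail_over(devices: Sequence[Device],
                     option_checked: bool,
                     grace_minutes: float) -> bool:
    """Decide whether the Slave should take over for one zone.

    Returns True only if the fail-over option is checked and every device
    is down, has Connectivity Check enabled and has been down for at least
    grace_minutes (assumed comparison).
    """
    if not option_checked or not devices:
        return False
    return all(
        d.connectivity_check and not d.link_up and d.minutes_down >= grace_minutes
        for d in devices
    )


# Two Internet lines on the Master; the names eth2 and eth3 are made up.
wan = [
    Device("eth2", link_up=False, connectivity_check=True, minutes_down=6.0),
    Device("eth3", link_up=False, connectivity_check=True, minutes_down=7.5),
]
print(should_fail_over(wan, option_checked=True, grace_minutes=5))  # all lines down
wan[1].link_up = True  # one line recovers
print(should_fail_over(wan, option_checked=True, grace_minutes=5))  # no take-over
```

The same function can be applied to the Secure LAN devices with their own time setting, since both options follow the same pattern in this guide.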

Master Heartbeat Settings

  1. Leave the Time-out other node down setting at its system default (20 seconds).

  2. Ensure the heartbeat via serial port option is enabled.

  3. Ensure that the serial cable is connected to the 1st port on the Master and set the Master heartbeat serial device to pstn0-pstn0.

  4. Set the Slave heartbeat serial device to pstn0-pstn0.

  5. When finished, proceed to the Disk Replication Tab.

    Master Heartbeat Settings

Option Description

Time-out other node down

Beyond this time the other HA node is considered down. 20 seconds is the system default.

Heartbeat via serial port

Enable this option to allow Heartbeat communications between the Master and Slave units over a serial link (RS-232). If disabled, this communication occurs over the LAN devices, which is less efficient and causes problems in case of network congestion or a primary secure device failure. This option is highly recommended.

Master heartbeat serial device

The serial device used for Heartbeat communications on the Master unit. The Master PSTN device type must be set to Heartbeat (See Network > Devices > PSTN).

Slave heartbeat serial device

The serial device used for Heartbeat communications on the Slave unit. The Slave PSTN device type must be set to Heartbeat (See Network > Devices > PSTN).
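
The Time-out other node down rule can be sketched in Python as follows. This is a simplified model, not the Heartbeat daemon itself: the `PeerMonitor` class and its method names are hypothetical, and treating a peer that has never been heard from as down is an assumption of the sketch. Only the 20-second default comes from this guide.

```python
import time
from typing import Optional

DEFAULT_TIMEOUT_S = 20.0  # system default stated in this guide


class PeerMonitor:
    """Tracks when the last heartbeat from the peer was seen (hypothetical)."""

    def __init__(self, timeout_s: float = DEFAULT_TIMEOUT_S) -> None:
        self.timeout_s = timeout_s
        self.last_seen: Optional[float] = None

    def record_heartbeat(self, now: Optional[float] = None) -> None:
        """Store the arrival time of a heartbeat (monotonic clock by default)."""
        self.last_seen = time.monotonic() if now is None else now

    def peer_is_down(self, now: Optional[float] = None) -> bool:
        """True once more than timeout_s seconds passed without a heartbeat."""
        if self.last_seen is None:
            return True  # assumption: a peer never heard from counts as down
        current = time.monotonic() if now is None else now
        return (current - self.last_seen) > self.timeout_s


monitor = PeerMonitor()
monitor.record_heartbeat(now=100.0)
print(monitor.peer_is_down(now=115.0))  # 15 s of silence: still considered up
print(monitor.peer_is_down(now=121.0))  # 21 s of silence: considered down
```

A monotonic clock is used so that adjustments to the system time cannot trigger or mask a time-out.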

Master Disk Replication Settings

  1. Set the Disk replication speed to a speed supported by the network devices. The entered value is expressed in Kilobytes per second.

  2. Leave the replication via primary secure LAN option disabled.

  3. Enter the Master Disk replication IP address. This must be an IP address in the same range as another Master secure LAN device which can access the Slave, e.g. eth1.

  4. Enter the Slave Disk replication IP address. This must be an IP address in the same range as another Slave secure LAN device which can access the Master, e.g. eth1.

  5. Click on Update and proceed to the Virtual IP Address settings.

    Master Disk Replication Settings

Option Description

Disk replication speed

Set the disk replication speed (expressed in Kilobits per second). Set a speed supported by the network devices.

Disk Replication via primary secure LAN

Check this option to transmit hard drive replication data over the primary secure LAN devices (not recommended). This option is disabled by default. It is highly recommended to assign other network devices to this task, due to the large amount of data which is being transferred during replication.

Master Disk Replication Address

This option is only available if the option above is disabled. Enter the Master Disk replication IP address. This IP address must be in the same range as a secure LAN device on the Master unit through which the Slave unit is accessible, e.g. eth1.

Slave Disk Replication Address

This option is only available if the Disk Replication via primary secure LAN option is disabled. Enter the Slave Disk replication IP address. This IP address must be in the same range as a secure LAN device on the Slave unit through which the Master unit is accessible, e.g. eth1.

Master Virtual IP Settings

  1. Navigate to High Availability > Virtual IP Addresses.

  2. Click on Add New.

  3. Check the Enabled option.

  4. Enter a Virtual IP in the same range as the Master’s and the Slave’s secure LAN.

  5. Enter the IP Address of the Master unit.

  6. Enter the IP Address of the Slave unit.

  7. Click on Update.

  8. Navigate to System > Tools > Actions and Reboot the Master.

Important

  • A Virtual IP must be unique; you may not create additional entries for the same VIP.

  • You may only assign one VIP per AXS Guard network device (one-to-one correlation). You may add as many Virtual IP Alias(es) per VIP as needed.

Master Virtual IP Settings

Option Description

Enabled

Enables the Virtual IP address. A system reboot is required. Reboot the Master first and wait until it is completely up and running before enabling this option on the Slave unit.

Virtual IP address

Enter a Virtual IP in the same range as the secure LAN device which connects the Master and the Slave units.

Virtual IP Aliases

Add one or more Aliases for the Virtual IP.

Master IP address

Enter the IP Address of the Master unit.

Slave IP address

Enter the IP Address of the Slave unit.
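
Before entering a Virtual IP, it can help to check the address plan on paper. The Python sketch below uses the standard `ipaddress` module to verify two rules from this section: the VIP, Master and Slave addresses lie in the same range, and the VIP is not already defined. The function name `check_vip_entry` and all addresses are hypothetical; the appliance performs its own validation, which may differ.

```python
import ipaddress
from typing import Iterable, List


def check_vip_entry(vip: str,
                    master_ip: str,
                    slave_ip: str,
                    network: str,
                    existing_vips: Iterable[str] = ()) -> List[str]:
    """Return a list of problems found in a planned VIP entry (empty if none)."""
    net = ipaddress.ip_network(network, strict=False)
    problems: List[str] = []

    # All three addresses must be in the range of the secure LAN device.
    for label, value in (("Virtual IP", vip),
                         ("Master IP", master_ip),
                         ("Slave IP", slave_ip)):
        if ipaddress.ip_address(value) not in net:
            problems.append(f"{label} {value} is outside {net}")

    # A Virtual IP must be unique across all entries.
    known = {ipaddress.ip_address(v) for v in existing_vips}
    if ipaddress.ip_address(vip) in known:
        problems.append(f"Virtual IP {vip} is already defined")

    return problems


# Example address plan for a secure LAN; all values are illustrative.
print(check_vip_entry("192.168.1.10", "192.168.1.1", "192.168.1.2",
                      "192.168.1.0/24"))
print(check_vip_entry("10.0.0.10", "192.168.1.1", "192.168.1.2",
                      "192.168.1.0/24", existing_vips=["10.0.0.10"]))
```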

Setting up the Slave Unit

Follow the same procedures as used for the master unit.

Important

Only reboot the Slave unit when the Master unit is completely up and running.

HA Status and Logs

Overview

In this chapter we explain how to verify the status of your AXS Guard HA cluster and consult its logs, once you have completed all the configuration steps on the Master and the Slave.

Unsynced Items

Some configuration items are not synced between the master and the slave unit. A warning is displayed in the AXS Guard configuration tool for unsynced items.

Verifying the Status

  1. Log on to the AXS Guard Master / Slave, as explained in the System Administration guide.

  2. Navigate to High Availability > Status. A screen as shown below will appear. HA errors are listed on this screen.

    HA Cluster Status Information Example

Info

You can also use the Virtual IP address to log in to the HA Master. If the Slave has taken over, e.g. during maintenance of the Master, the Slave’s information is shown instead.

Accessing the Logs

  1. Navigate to High Availability > Logs.

  2. Select either:

    • Heartbeat and DRBD to view logging of Heartbeat and Synchronization events.

    • Failover to view information about Master / Slave alternation.

    High Availability Logs

HA Services

Overview

In this chapter we explain how we can divide tasks between the Master and the Slave. By default, the Slave only replicates the Master and has a standby function, in case the Master fails.

It is possible to configure the HA cluster so that certain services only run on the active Master, on the Slave or on both systems simultaneously. Running a service on both systems provides redundancy, while running it on only one system spreads the processing load over the appliances in the HA cluster.

Adding / Modifying HA Services

HA Services

  1. Log on to the AXS Guard Master, as explained in the System Administration Guide.

  2. Navigate to High Availability > Services.

  3. Click on Add to configure new settings or select an existing configuration to modify.

  4. Select the desired settings as explained in the table below.

    HA Services

Option Setting

Service Name

Select the desired service from the drop-down list.

Start

Choose to start the selected service either on the active Master, on the Slave or on both systems.

List of available Services

  • IPsec

  • OpenVPN

  • L2TP

  • PPTP

  • SSL VPN

  • SSL Web Portal
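
The effect of the Start setting can be pictured as a simple mapping from service to placement. The Python sketch below is only a planning aid under stated assumptions: the `StartOn` enum, the `services_for` function and the example placement are hypothetical, while the service names are taken from the list above.

```python
from enum import Enum
from typing import Dict, List


class StartOn(Enum):
    """Where a service is started (mirrors the three Start choices)."""
    MASTER = "active master"
    SLAVE = "slave"
    BOTH = "both"


def services_for(role: str, placement: Dict[str, StartOn]) -> List[str]:
    """List the services that run on the node with the given role.

    role must be "master" or "slave".
    """
    wanted = {
        "master": {StartOn.MASTER, StartOn.BOTH},
        "slave": {StartOn.SLAVE, StartOn.BOTH},
    }[role]
    return sorted(name for name, start in placement.items() if start in wanted)


# Hypothetical placement, chosen only to show all three settings.
placement = {
    "IPsec": StartOn.MASTER,
    "OpenVPN": StartOn.BOTH,
    "SSL VPN": StartOn.SLAVE,
}
print("master:", services_for("master", placement))
print("slave: ", services_for("slave", placement))
```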

HA Tools

Overview

In this chapter we explain the use of the AXS Guard HA tools. You should only use these tools if a “split-brain” situation occurs.

What is a Split-Brain?

A split-brain is a situation in which the data on the Slave and on the Master is different and cannot be reconciled. Envision the following scenario:

  • The HA cluster is in sync

  • Turn off the Slave

  • Leave the Master running

  • When the Slave has shut down completely, turn off the Master

  • Turn the Slave back on and let it run for a while

  • Turn the Master back on

What happens in this scenario is that the data from the Slave is stale when it boots, but it has no way of detecting this. It boots and sees that the data on its disk is consistent and usable, so it uses it. When the Master boots up, it detects that its data is also consistent and usable, but different from the Slave. The cluster can only choose which data to use; it cannot merge it. If it cannot decide, it will refuse to damage either dataset, since data integrity is the first priority. In this split-brain situation, the cluster will usually continue its services, but it cannot perform a failover. The longer the situation persists, the larger the chance becomes that both datasets cannot be merged at all, resulting in eventual data loss.

If an e-mail arrives when only the first node is running, it will not be present on the disk of the second node when it boots. If an e-mail arrives after the second node has taken over the services, we have one e-mail in each dataset and no way to merge them. One e-mail will be lost.
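
The e-mail example can be reproduced with two Python sets, one per node. This is a conceptual sketch only: replication software does not compare datasets item by item like this, and the `classify` function and message names are hypothetical. The sets merely show why choosing either dataset loses something once both nodes have written independently.

```python
from typing import Set


def classify(first: Set[str], second: Set[str]) -> str:
    """Describe the relationship between the two nodes' datasets."""
    if first == second:
        return "in sync"
    if first < second or second < first:
        # One dataset contains everything the other has: a resync is enough.
        return "one node is stale"
    # Each side holds data the other lacks: picking one loses the other's.
    return "split-brain"


synced = {"mail-1", "mail-2"}
first_node = synced | {"mail-3"}  # arrived while only the first node ran
second_node = set(synced)         # second node boots with stale data
print(classify(first_node, second_node))

second_node.add("mail-4")         # arrived after the second node took over
print(classify(first_node, second_node))
print("lost if the first node's data is kept: ", second_node - first_node)
print("lost if the second node's data is kept:", first_node - second_node)
```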

Important

Contact your reseller as soon as possible in case of a problem.

Using the HA Tools

  1. Log on to the AXS Guard Master / Slave, as explained in the System Administration Guide.

  2. Navigate to High Availability > Tools.

  3. Start the appropriate tool as explained in the table below.

    HA Tools

Button Description

Make myself the running master

The other appliance will assume the slave function if it is currently running as master.

Make myself the running slave

The other appliance will assume the master function if it is currently running as slave. It will also become the DRBD primary.

High Availability Maintenance

Overview

In this chapter, we explain the principles behind a High Availability cluster. Understanding these principles will help you to use your cluster correctly and minimize downtime while ensuring maximum availability and reliability.

System Priorities

The AXS Guard High Availability solution has two main priorities. The priorities are listed by order of importance:

  1. Data Integrity

  2. Service Integrity

Conservation of your data has the highest priority. Keeping the services up and running has a lower priority. Services are only interrupted in case of problems or if the first priority (i.e. data integrity) requires maintenance. At all times, the cluster attempts to deliver the best possible service.

About Reboot Procedures

To allow the second node to take over from the first node, the second node requires a copy of the very latest data generated by the first node. Therefore, the cluster keeps a copy of its most current data on each node. To enable the HA cluster to keep this data synchronized, you must follow strict procedures when rebooting the HA nodes. Failure to follow the correct procedure will result in irrevocable loss of data and cause a disruption of services.

Example Scenarios

Rebooting the Second Node without Service Interruptions

  • The HA cluster is in sync

  • Reboot the second node

  • The second node will synchronize with the first node

  • The cluster is in sync

This procedure is highly recommended. The second node will disconnect from the cluster, reboot and reconnect. It will then start a resync process with the first node and update its data until the cluster is in sync again. This is a normal working scenario. There will be no interruption of services.

Rebooting the First Node with some Service Interruptions

  • The cluster is in sync

  • Reboot the first node

  • The second node will take over

  • The first node reboots

  • The first node synchronizes with the second node

  • The cluster is in sync

When you reboot the first node (which is at that time running the services), the second node takes over from the first node and continues to provide its services. The first node reboots, rejoins the HA cluster and resynchronizes with the second node until its data is up-to-date again. This is also a normal scenario. With a default configuration, the first node will take over the services again.

In this scenario, there are two service interruptions: the first one occurs when the second node takes over and the second one (if configured) occurs when the first node takes over the services from the second node.

Split-Brain Situation

Important

Contact your reseller as soon as possible in case of a problem.

Keeping your HA Cluster Healthy

First of all, do not shut down or reboot your HA nodes unless you really have to. Turning off the power also increases the risk of damaging the data on your disks. If you wish to reboot a node because it seems to have become unstable, use the Administrator Tool (web interface) or as a last resort, use the Ctrl-Alt-Del combination on the console. You should only shut down a node if it fails to respond on the console.

Important

Never reboot a node unless the other node is up and synchronized. Never reboot your system if it is out of sync. You will get messages on the console and via the web interface when the system is synchronizing.
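
The two rules above can be expressed as a single pre-reboot check. The Python sketch below is a hypothetical helper, not part of the Administrator Tool: the `NodeState` class and the `safe_to_reboot` function are assumptions, and the status values would have to be read manually from the High Availability > Status screen.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    """Status of one HA node as observed by the administrator (hypothetical)."""
    name: str
    up: bool
    in_sync: bool


def safe_to_reboot(target: NodeState, peer: NodeState) -> bool:
    """Apply both rules: the peer is up and synchronized, and the node
    to be rebooted is not out of sync itself."""
    return peer.up and peer.in_sync and target.in_sync


first = NodeState("first", up=True, in_sync=True)
second = NodeState("second", up=True, in_sync=False)  # still resynchronizing
print(safe_to_reboot(first, second))  # wait until the resync has finished

second.in_sync = True
print(safe_to_reboot(first, second))  # both nodes up and synchronized
```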

Procedures to follow in case of a Split-Brain

Overview

In case a split-brain situation occurs, contact Able support immediately.

Your cluster might seem to function normally, but it will not perform a failover if something goes wrong. There is no protection against disk failures. The longer you delay your call or e-mail to Able support, the higher the risk of losing data. To solve the split-brain situation, our intervention is required. Do not reboot the nodes, as it will only make matters worse.

Take-Over

When the active node detects problems with its currently active services, it will attempt to fail over to the other node. Common reasons for a take-over include an Internet failure, a failing LAN interface or problems writing to the disk. First the active node shuts down its current resources, then the other node starts the services. Keep in mind that this may take a while, especially if your system is already under high load.

If you think it’s taking too long, please use the Administrator Tool (web interface) to monitor what the cluster is doing. Do NOT REBOOT, unless you are absolutely sure something is wrong. When in doubt, contact Able support.

Info

A take-over event takes up a lot of system resources. All services are started and the system load will increase. Your system should be tailored to handle this extra load. If you notice that a fail-over routinely takes too long, please consider upgrading your AXS Guard hardware.

Summary: General Recommendations

Do not reboot unless you are sure that it will not hurt the HA cluster. A reboot in the hope of quickly restoring services could damage your data and keep the system offline for an extended period.

If you invest in a High-Availability system, make sure to acquire hardware powerful enough to handle the demanding system load of an HA cluster. The cluster will not function well if it is overloaded, just like any other server.

A High-Availability system is not a replacement for good system administration practices. Make sure to frequently back up your configuration and user data on the AXS Guard. The daily backup feature is easy to use and well documented. For more information, see the AXS Guard System Administration How To, which is accessible by clicking on the permanently available Documentation button in the Administrator Tool.

Upgrading and Updating your HA Cluster

Preparing the HA Upgrade

  • Before you upgrade your HA cluster to the next version or revision, we recommend that you back up all your AXS Guard user and configuration data as a precautionary measure.

  • It is highly recommended to reboot the slave unit prior to upgrading.

  • Do not skip or alter any steps provided in the following paragraphs, otherwise the upgrade may fail and result in a split-brain and unrecoverable data.

  • Disable automated version or revision updates on HA clusters. Manual intervention is always required.

The procedures to back up and restore your AXS Guard configuration and user data are explained in the AXS Guard System Administration How To, which can be accessed by clicking on the permanently available Documentation button in the Administrator Tool.

Upgrade Steps

  1. Shut down and completely power off the configured slave unit.

  2. Once the slave unit is shut down, ensure it no longer has any network connectivity (unplug the network cables if you are unsure).

  3. Log in to the master unit and update the system to the latest version.

  4. Once the update is complete and the master system is again fully operational, reconnect the slave unit back to the network and power it on.

  5. Log in to the slave unit and update the system to the latest AXS Guard version.

Info

A revision upgrade requires you to reboot a system twice. Version upgrades require three consecutive reboots.

Troubleshooting

The slave upgrade fails

The slave automatically reboots and reverts to the previous version. The cluster remains operational. Contact our Support Department for assistance.

The slave upgrade succeeds, but the master upgrade fails

Contact our Support Department ASAP for assistance. Try rebooting the master system manually.

Support

If you encounter a problem

If you encounter a problem with AXS Guard, follow the steps below:

  1. Check the troubleshooting section of the feature-specific manual.

  2. Check the knowledge base on this site for information about special configurations.

  3. If no solution is available in any of the above sources, contact your AXS Guard vendor.

Contact Information

(+32) 15-504-400
support@axsguard.com