Primus HSM - High Availability Cloning
High Availability (HA) is a feature focused on ensuring continuous operation and minimal downtime by using redundant systems that can take over seamlessly in the event of failure or maintenance. Primus HSM HA enables multiple devices to work together in a cluster, synchronizing cryptographic data to maintain service availability and load distribution without manual intervention. This feature is licensed on all Primus HSMs.
Devices within a High Availability (HA) cluster are configured to synchronize continuously, ensuring real-time redundancy and balanced load distribution. This synchronization eliminates the need for manual cloning whenever keys, objects, or user data are created or modified. Unlike static manual cloning, which captures only a one-time snapshot, dynamic HA cloning keeps all user data - such as partitions, keys, certificates, objects, and security configurations - up to date across all connected devices in the cluster. As a result, cryptographic operations like signing and decryption can be executed on any HA node, improving geographic performance, system resiliency, and fault tolerance.
Setting up HA Cloning requires careful preparation. The devices are paired using a USB stick to securely exchange configuration data. After the HA pairing, all further synchronization occurs over the network, making the initial USB exchange the only manual step in establishing HA.
Certain operations require an HA Master device:
- Key and object creation, deletion, or modification must be verified by the Master. If the Master is temporarily unreachable, the operations are queued.
- For SKA configurations, the blocked key status is verified by the Master. This feature is enabled by default but can be disabled per Partition if not required.
- Administrative tasks such as Decanus Partition Administration (PSO) and Partition Audit are only supported on the HA Master.
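The queuing behavior in the first item can be pictured with a small model. This is only an illustrative sketch, not the device's implementation: the class `CloneWriteQueue`, the `master_reachable` probe, and the assumption that queued operations are forwarded in submission order are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List


@dataclass
class CloneWriteQueue:
    """Toy model: defer write operations while the Master is unreachable."""

    master_reachable: Callable[[], bool]  # hypothetical connectivity probe
    pending: Deque[str] = field(default_factory=deque)
    verified: List[str] = field(default_factory=list)

    def submit(self, operation: str) -> str:
        """Queue the operation, try to flush, and report where it ended up."""
        self.pending.append(operation)
        self.flush()
        # The queue is FIFO, so a non-empty queue means this operation is still waiting.
        return "queued" if self.pending else "verified"

    def flush(self) -> None:
        """Forward queued operations to the Master in submission order (assumed)."""
        while self.pending and self.master_reachable():
            self.verified.append(self.pending.popleft())
```

In this model, operations submitted during an outage stay in `pending` and are handed over, oldest first, the next time the Master answers.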
Unless configured otherwise, a Clone connects to the Master using the configured Master URLs and synchronizes continuously. If the number of unsynchronized HA objects exceeds a system-defined threshold (because HA is paused, the network is disrupted, or bandwidth is limited), the device automatically pauses its Client APIs. The pause blocks new client connections and gives existing sessions up to two hours to close, which encourages clients to fail over to other cluster members. The APIs stay paused across a reboot until the HA object backlog falls below the limit.
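The pause rule can be read as a simple gate with two edges: pause when the backlog exceeds the limit, resume only once it drops below it. The sketch below is a hedged illustration of that reading. The class name `ClientApiGate` and the `backlog_limit` value are hypothetical, since the real threshold is system-defined and not stated here.

```python
from dataclasses import dataclass


@dataclass
class ClientApiGate:
    """Toy model of the Client API pause driven by the HA object backlog."""

    backlog_limit: int    # hypothetical value; the real threshold is system-defined
    paused: bool = False  # assumed to be persisted, so a reboot does not clear it

    def update(self, unsynced_objects: int) -> bool:
        """Re-evaluate the gate and return True while Client APIs are paused."""
        if unsynced_objects > self.backlog_limit:
            self.paused = True    # backlog exceeds the threshold: pause
        elif unsynced_objects < self.backlog_limit:
            self.paused = False   # backlog reduced below the limit: resume
        # A backlog exactly at the limit leaves the current state unchanged.
        return self.paused
```

For example, with `backlog_limit=100` the gate pauses at 101 unsynchronized objects, stays paused at 100, and reopens at 99.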
Prerequisites for HA
All devices must be properly initialized before starting the HA cloning process. This includes completing the Initial Wizard. For older firmware versions, at least one user Partition must also be created before HA cloning. Additionally, High Availability must be enabled on each device through both the licensing system and the device's security configuration. See Configuration for more information.
All cluster HSMs must have a license that supports at least as many user Partitions as those to be synchronized across the cluster, along with the licensed features in use. During cloning, the user configuration is replicated from the Master to each Clone. All user management operations are performed exclusively on the Master.
Any existing users (Partitions) and configuration (excluding network configuration) on a Clone will be overwritten during the cloning process to ensure alignment with the Master.
All devices in an HA cluster should use the same authentication mode. See Authentication Mode for more information.
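The prerequisites above lend themselves to a pre-flight checklist. The following sketch assumes the relevant facts about each device have already been collected by some other means. `DeviceInfo`, its fields, and `check_clone_prerequisites` are hypothetical names for illustration and do not correspond to a Primus API.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class DeviceInfo:
    """Hypothetical summary of one HSM, filled in by the operator."""

    name: str
    initialized: bool            # Initial Wizard completed
    ha_licensed: bool            # HA enabled in the license
    ha_enabled_in_config: bool   # HA enabled in the security configuration
    licensed_partitions: int
    licensed_features: FrozenSet[str]
    authentication_mode: str


def check_clone_prerequisites(master: DeviceInfo, clone: DeviceInfo,
                              partitions_to_sync: int,
                              features_in_use: FrozenSet[str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    problems: List[str] = []
    for dev in (master, clone):
        if not dev.initialized:
            problems.append(f"{dev.name}: not initialized")
        if not (dev.ha_licensed and dev.ha_enabled_in_config):
            problems.append(f"{dev.name}: HA not enabled in license and configuration")
        if dev.licensed_partitions < partitions_to_sync:
            problems.append(f"{dev.name}: license covers too few user Partitions")
        missing = features_in_use - dev.licensed_features
        if missing:
            problems.append(f"{dev.name}: missing licensed features {sorted(missing)}")
    if master.authentication_mode != clone.authentication_mode:
        problems.append("authentication mode differs between Master and Clone")
    return problems
```

Running such a check before pairing is worthwhile because cloning overwrites the Clone's existing users and configuration.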
HA Cloning Procedure
Devices join a High Availability (HA) cluster through a secure pairing procedure, which establishes the trust relationship between them.
The steps below outline the pairing procedure:
- Create an HA clone key on the Clone device.
- Export HA master data from the Master.
- Import the HA master data into the Clone device.
Once paired, the devices synchronize dynamically over Ethernet, provided that network connectivity is maintained.
Before initiating pairing, the Security Officer (SO) of the Master must configure an HA Master URL in the device’s Security Configuration. This is typically an IP address or DNS-resolvable hostname and remains valid even if another device is promoted to Master. All Clones will connect to this HA Master URL to maintain synchronization. The network must allow routing to the HA Master URL at all times.
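Because Clones depend on reaching the HA Master URL, it can be useful to verify basic TCP reachability from the Clone's network segment before pairing. The sketch below is a generic check written under assumptions: the function `master_reachable` is hypothetical, the URL is treated as `host` or `host:port`, and the port to test is left to the caller because it is not specified here.

```python
import socket
from urllib.parse import urlsplit


def master_reachable(master_url: str, default_port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the given host and port succeeds.

    master_url may be "host", "host:port", or carry a scheme prefix.
    default_port is used when the URL has no port (value is an assumption
    supplied by the caller, not a documented Primus port).
    """
    try:
        # Prefix "//" so urlsplit treats a bare "host:port" as a network location.
        parts = urlsplit(master_url if "//" in master_url else "//" + master_url)
        host = parts.hostname
        port = parts.port or default_port
        if host is None:
            return False
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except (OSError, ValueError):
        # OSError covers DNS failures, refusals, and timeouts; ValueError a bad port.
        return False
```

A successful connection only shows that routing and name resolution work. It says nothing about whether the HA service on the Master accepts the Clone.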
HA Cluster Management
The following HA cluster management options are available.
Updating HA cluster
As a best practice, keep all devices in the cluster on the same, latest firmware release at all times.
See Firmware Update for how to update the cluster to a new release.
Pause / Terminate High Availability
High Availability (HA) functionality can be temporarily paused and resumed, allowing maintenance operations such as updates or backups without losing synchronization data. During a pause, the device is not actively participating in the cluster, but retains all HA-related information for later reactivation.
A device can also be permanently removed from HA operation. The Master can remove a Clone, excluding it from future cluster synchronization. Alternatively, a Clone can deactivate its own cluster participation, but it cannot rejoin on its own.
HA Master Relocation (IP Address Change)
When relocating the Master device or changing its network address, particularly in environments without DNS resolution, Clones can be temporarily directed to the new Master using the "Temporary HA Master" setting. This allows them to fetch the updated configuration from the Master and continue synchronization.
Before the Clone connects to the new Master, ensure the HA Master URL/IP setting on the Master is updated accordingly.
Become Master
In a high availability cluster, promoting a device to Master is a controlled action that ensures the integrity and security of the system. A device can only become the new Master if both the Security Officer (SO) and the Master SO roles are activated, preventing unauthorized promotion from Clone status. This safeguard ensures that a Clone cannot assume control without prior authorization. It is essential to retain the Master SO Cards, as they are required for promotion. A cluster should only contain a single active Master.
This procedure is typically used when the current Master is no longer available. After promotion, the HA Master URL must be updated on the new Master and all Clones, unless the network was pre-configured for the change (see HA Master Relocation).
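The promotion rule and the single-Master guideline can be expressed as two small checks. This is a conceptual sketch only. `Node`, `may_become_master`, and `active_masters` are hypothetical names, and the role flags would in practice reflect card-based role activation on the device.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Node:
    """Hypothetical view of one cluster member."""

    name: str
    role: str                        # "master" or "clone"
    so_activated: bool = False       # Security Officer role activated
    master_so_activated: bool = False  # Master SO role activated


def may_become_master(node: Node) -> bool:
    """Promotion is allowed only when both SO and Master SO are activated."""
    return node.so_activated and node.master_so_activated


def active_masters(cluster: Iterable[Node]) -> List[str]:
    """List the names of all nodes currently acting as Master."""
    return [n.name for n in cluster if n.role == "master"]
```

A cluster description in which `active_masters` returns more than one name would violate the single-Master guideline and should be resolved before further changes.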
Become Clone
This feature enables the demotion of a Master device to a Clone, allowing for flexible reconfiguration of the HA cluster. When this action is triggered, the device contacts the current HA Master using the configured HA Master URL/IP and assumes the role of a Clone.
This feature can be used to transition a standalone device into a high availability setup. Initially, the device may become a Clone without the presence of Master SO cards. However, once part of the cluster, the Master SO credentials are inherited from the active Master. To later promote this device back to Master status, the Master SO role will be required (see Become Master).