RedHat/CentOS (Stream) Alpaca Highly Available Clustering Setup

This guide prepares each Alpaca component to be set up as a cluster. Each individual component can be made highly available by registering additional nodes.

HA support is not included as a part of regular software support unless ECG performs the update.

Assumptions

  • Credentials to access the Alpaca servers as root.
  • Working knowledge of ssh and a file transfer tool such as scp.
  • Each Alpaca instance should have a resolvable name to a private IP address that is accessible from the other instances.
  • Alpaca is configured to use an S3 bucket as its file store. Amazon S3 and MinIO are the currently supported S3 providers. This is required for all Alpaca Server instances to store and retrieve files in a distributed way. If this is not configured, downloads and logs for tasks will be incomplete and potentially unretrievable.

Configurations

  • Each server instance hosts the complete set of processes required to create a clustered Alpaca setup. The image below shows an example of such a setup.

Steps

RabbitMQ

  • Please reference the official guides for troubleshooting.
  • Ensure that /etc/hostname is set correctly on each node and that /etc/hosts lists all nodes.
  • Ensure that the Erlang cookie file is identical on all nodes (an example of propagating the cookie follows the configuration file below).
  • Ensure that the RabbitMQ node name is correctly set. This can be accomplished by creating an /etc/rabbitmq/rabbitmq-env.conf file and specifying the correct environment variables. After adding this file you will need to restart RabbitMQ with service rabbitmq-server restart. An example of this file is below.

    NODENAME=rabbit@alpaca-2
    NODE_IP_ADDRESS=10.10.0.254
    NODE_PORT=5672
    
    HOME=/var/lib/rabbitmq
    LOG_BASE=/var/log/rabbitmq
    MNESIA_BASE=/var/lib/rabbitmq/mnesia
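
  • As noted above, the Erlang cookie must be identical on every node. On RPM-based installs the cookie typically lives at /var/lib/rabbitmq/.erlang.cookie; a minimal sketch of copying it from the first node is below (hostnames are examples — adjust paths and ownership to match your install).

    # run on alpaca-1; copy the cookie to alpaca-2, then restore ownership/permissions and restart
    scp /var/lib/rabbitmq/.erlang.cookie root@alpaca-2:/var/lib/rabbitmq/.erlang.cookie
    ssh root@alpaca-2 'chown rabbitmq:rabbitmq /var/lib/rabbitmq/.erlang.cookie && chmod 400 /var/lib/rabbitmq/.erlang.cookie && service rabbitmq-server restart'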
    
  • Enable clustering by stopping the RabbitMQ application, resetting the node, and joining it to one of the other servers in the cluster.

    rabbitmqctl stop_app
    # => Stopping node rabbit@alpaca-2 ...done.
    
    rabbitmqctl reset
    # => Resetting node rabbit@alpaca-2 ...
    
    rabbitmqctl join_cluster rabbit@alpaca-1
    # => Clustering node rabbit@alpaca-2 with [rabbit@alpaca-1] ...done.
    
    rabbitmqctl start_app
    # => Starting node rabbit@alpaca-2 ...done.
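
  • After starting the app, you can confirm that the node joined the cluster from any node:

    rabbitmqctl cluster_status
    # => lists the cluster nodes and which of them are currently running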
    
  • Ensure that all queues are in HA mode. This should be run on the existing RabbitMQ server, although it can be run from any node after an Alpaca Server has been started.

    • rabbitmqctl set_policy HA '^(?!amq\.).*' '{"ha-mode": "all"}'
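    • Optionally confirm that the policy was applied (it should list the HA policy for the default vhost):

      rabbitmqctl list_policies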
  • Update application-prod.yml for each Alpaca Server configuration.

    • Remove the host field.
    • Replace it with a comma-separated addresses field.

      spring:
        rabbitmq:
          addresses: alpaca-2.lab.ecg.co:5672,alpaca-1.lab.ecg.co:5672
      

MongoDB

  • Please note that MongoDB requires more than two nodes (preferably an odd number) to perform HA failover, so that a majority can vote in a new primary. With only a pair of nodes, failover will not occur seamlessly.
  • Update /etc/mongod.conf to bind to an external IP address on each Mongo instance.

    net:
      port: 27017
      bindIp: 0.0.0.0
    
  • Update /etc/mongod.conf to assign the same replica set name to each Mongo instance.

    replication:
      replSetName: "alpaca-set"
    
  • After updating the Mongo configuration, restart the service using service mongod restart.

  • Enable the replica set by running the following command in the Mongo shell, listing all Mongo instances.

    rs.initiate({_id:'alpaca-set',members:[{_id:0,host:"alpaca-1.lab.ecg.co:27017"},{_id:1,host:"alpaca-2.lab.ecg.co:27017"}]})
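
  • You can verify the replica set state from the Mongo shell with rs.status(); once initiation completes, one member should report PRIMARY and the others SECONDARY. A quick check:

    rs.status().members.forEach(function(m) { print(m.name + " : " + m.stateStr) })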
    
  • If you are adding a new Mongo instance to an existing replica set, you must connect to the current primary instance and issue the rs.add() command instead.

    rs.add({ host: "alpaca-3.lab.ecg.co:27017" })

  • Update the Alpaca Server config to include each Mongo instance in its pool of available databases.

    spring:
      data:
        mongodb:
          uri: mongodb://alpaca-1.lab.ecg.co:27017,alpaca-2.lab.ecg.co:27017
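
  • Optionally, the connection string can also name the replica set explicitly so the driver only accepts members of that set. This is a standard MongoDB URI option (the set name below matches the replSetName used above); adjust only if Alpaca's configuration expects a different URI format.

    spring:
      data:
        mongodb:
          uri: mongodb://alpaca-1.lab.ecg.co:27017,alpaca-2.lab.ecg.co:27017/?replicaSet=alpaca-set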
    
  • Follow the official replica set documentation for additional troubleshooting.

S3

Alpaca supports either AWS S3 or MinIO for S3 buckets.

AWS S3

Follow the official AWS documentation for setting up S3 buckets.

  • Notes:
    • Verify that the AWS credentials used for Alpaca have the correct read/write permissions for each bucket.
    • The Alpaca configuration placed in the bucket must be named alpaca-server-prod.yml.
    • It is recommended to use at least two buckets: one for configuration files and another for regular file-store files.
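    • For example, the buckets can be created and versioned from the AWS CLI (the bucket names and region below are placeholders; versioning is optional but mirrors the MinIO setup later in this guide):

      aws s3api create-bucket --bucket alpaca-config --region us-east-1
      aws s3api put-bucket-versioning --bucket alpaca-config --versioning-configuration Status=Enabled
      aws s3api create-bucket --bucket alpaca-file-store --region us-east-1
      aws s3api put-bucket-versioning --bucket alpaca-file-store --versioning-configuration Status=Enabled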

MinIO

MinIO is a free, self-hosted, S3-compatible alternative to AWS S3.

Steps for each server in the HA Cluster
  1. Create the MinIO user.
    • > useradd -s /sbin/nologin -d /opt/minio minio-user
  2. Create MinIO bin and data directories.
    • > mkdir -p /opt/minio/bin /opt/minio/data
  3. Install MinIO application in bin directory.
    • > wget https://dl.minio.io/server/minio/release/linux-amd64/minio -O /opt/minio/bin/minio
    • > chmod +x /opt/minio/bin/minio
  4. Create the MinIO configuration file.
    • > touch /opt/minio/minio.conf
  5. Configure MinIO.

    • > vi /opt/minio/minio.conf
    • Provide the following configurations:

       MINIO_VOLUMES="/opt/minio/data"
       MINIO_ROOT_USER="<YOUR_USERNAME>"
       MINIO_ROOT_PASSWORD="<YOUR_PASSWORD>"
       MINIO_OPTS="--console-address :9001"
      
    • <YOUR_USERNAME>: This can be any username of your choosing. You will need it later during Alpaca configuration.

    • <YOUR_PASSWORD>: This can be any password of your choosing. You will need it later during Alpaca configuration.

  6. Set file permissions.

    • > chown -R minio-user:minio-user /opt/minio
  7. Download MinIO start script.

    • > ( cd /etc/systemd/system/; curl -O https://raw.githubusercontent.com/minio/minio-service/master/linux-systemd/minio.service )
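    • Depending on the version of the downloaded unit file, the binary path and environment file it references may differ from the paths used in this guide (/opt/minio/bin/minio and /opt/minio/minio.conf). If so, either edit the unit or add a drop-in such as the sketch below (an assumption about your unit layout, not part of the upstream file), then run systemctl daemon-reload.

      # /etc/systemd/system/minio.service.d/override.conf
      [Service]
      # clear and re-point the environment file and start command at this guide's paths
      EnvironmentFile=
      EnvironmentFile=/opt/minio/minio.conf
      ExecStart=
      ExecStart=/opt/minio/bin/minio server $MINIO_OPTS $MINIO_VOLUMES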
  8. Enable MinIO service.

    • > systemctl enable minio.service
  9. Start MinIO.

    • > systemctl start minio
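    • Optionally confirm the service is active (a standard systemd check):
    • > systemctl status minio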
  10. Install the MinIO Client (mc) CLI with root-only access.

    • > curl https://dl.min.io/client/mc/release/linux-amd64/mc --create-dirs -o $HOME/minio-binaries/mc
    • > chmod +x $HOME/minio-binaries/mc
    • > export PATH=$PATH:$HOME/minio-binaries/
  11. Set alias for MinIO Client CLI configuration.

    • > mc alias set <ALIAS> http://localhost:9000 <YOUR_USERNAME> <YOUR_PASSWORD>
      • <ALIAS> - A name for your MinIO instance. This alias will be used later. For the purposes of this guide, myminio will be used as the alias.
      • <YOUR_USERNAME> - The username set in step 5.
      • <YOUR_PASSWORD> - The password set in step 5.
  12. Add configuration bucket.

    • > mc mb --with-versioning myminio/config
  13. Add file store bucket.

    • > mc mb --with-versioning myminio/file-store
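  14. Optionally verify that both buckets were created (the alias is the one set in step 11).

    • > mc ls myminio
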
Replication Steps

Once MinIO is set up on all members of the HA cluster, it is time to set up replication.

  • Each server needs to add all other servers in the HA cluster as replication members.
  • This example will use two HA members - alpaca1 (alpaca1.acme.com) and alpaca2 (alpaca2.acme.com).
    • Note that IP addresses can be used in place of FQDNs.
  • The username/password used in the below steps are from step 5 above.
  1. alpaca1
    • Replicate config bucket.
      • > mc replicate add myminio/config --remote-bucket 'http://<YOUR_USERNAME>:<YOUR_PASSWORD>@alpaca2.acme.com:9000/config' --replicate "delete,delete-marker,existing-objects"
    • Replicate file-store bucket.
      • > mc replicate add myminio/file-store --remote-bucket 'http://<YOUR_USERNAME>:<YOUR_PASSWORD>@alpaca2.acme.com:9000/file-store' --replicate "delete,delete-marker,existing-objects"
  2. alpaca2
    • Replicate config bucket.
      • > mc replicate add myminio/config --remote-bucket 'http://<YOUR_USERNAME>:<YOUR_PASSWORD>@alpaca1.acme.com:9000/config' --replicate "delete,delete-marker,existing-objects"
    • Replicate file-store bucket.
      • > mc replicate add myminio/file-store --remote-bucket 'http://<YOUR_USERNAME>:<YOUR_PASSWORD>@alpaca1.acme.com:9000/file-store' --replicate "delete,delete-marker,existing-objects"
  3. Once replication is set up, you can begin adding files to the replication set. This only needs to be performed on one member of the HA cluster.
    • Adding Alpaca server configuration to bucket.
      • Once your configuration is ready (a template can be found at /etc/alpaca/eureka/cloud-config/alpaca-server-template.yml), it can be uploaded to the bucket. Note that the Alpaca configuration file uploaded to the bucket must be named alpaca-server-prod.yml.
        • > mc cp /path/to/alpaca-server-prod.yml myminio/config/alpaca-server-prod.yml
      • The Alpaca server configuration is now uploaded to the bucket and ready for use by Alpaca.
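  4. Optionally verify the replication rules and status from any member (these are standard MinIO Client subcommands; output format varies by MinIO version).
    • > mc replicate status myminio/config
    • > mc replicate status myminio/file-store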

Eureka

  • Update the /etc/alpaca/eureka/config/application-prod.yml file to have each instance register with the other nodes. The defaultZone value is a comma-separated list of all other Eureka nodes. The below example is for two nodes; the <additional nodes> placeholder only illustrates where entries for more than two nodes would be appended.
  • Server 1:

    eureka:
      instance:
        hostname: alpaca-1.lab.ecg.co
      client:
        serviceUrl:
          defaultZone: http://alpaca:alpaca@alpaca-2.lab.ecg.co:8761/eureka,<additional nodes>
    
  • Server 2:

    eureka:
      instance:
        hostname: alpaca-2.lab.ecg.co
      client:
        serviceUrl:
          defaultZone: http://alpaca:alpaca@alpaca-1.lab.ecg.co:8761/eureka,<additional nodes>
    
  • Centralized configuration also occurs on the Eureka server. This can be enabled using an S3 compatible storage or various other storage options. See the Eureka configuration guide for more details.

Gateway

  • Update the /etc/alpaca/gateway/config/application-prod.yml file to have each instance register all appropriate Eureka nodes. This is the complete list of Eureka instances similar to the Eureka configuration. The below example is for two nodes.

    eureka:
      client:
        service-url:
          defaultZone: http://alpaca:alpaca@alpaca-1.lab.ecg.co:8761/eureka,http://alpaca:alpaca@alpaca-2.lab.ecg.co:8761/eureka
    

Server

  • The server requires configuration elements for all the other nodes. Verify that the Eureka, Mongo, and RabbitMQ configuration fields are all set up to list their various nodes. This configuration will be served by the centralized configuration server. For this to occur, bootstrap.yml must be configured so that the server can connect to the Eureka Server; a sketch follows below. See the configuration guide for a complete set of instructions.
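
  • A minimal bootstrap.yml sketch is below, assuming Alpaca uses the standard Spring Cloud discovery-first configuration lookup. The service-id is a placeholder, and the credentials/hosts mirror the Eureka examples above; consult the configuration guide for the exact values used by your installation.

    spring:
      cloud:
        config:
          discovery:
            enabled: true
            service-id: <config server service id>
    eureka:
      client:
        service-url:
          defaultZone: http://alpaca:alpaca@alpaca-1.lab.ecg.co:8761/eureka,http://alpaca:alpaca@alpaca-2.lab.ecg.co:8761/eureka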