Reconfiguration or recovery of etcd cluster in Management Service

This document describes how to reconfigure a node or restore the whole etcd cluster used by Management Service.

Follow this guide only if you were referred to it from another chapter of the documentation (e.g. How to change the IP address of Dispatcher Paragon Management Server).

Check etcd cluster health

See the page 'The Recovery Procedure for a Dispatcher Paragon Management Service Cluster Node (Standalone Installer)'.

When etcd cluster is healthy

See the page 'MGMT etcd Cluster Is Healthy'.

When etcd cluster is unhealthy

You cannot add or remove nodes once the etcd quorum has been lost. You must manually restore the etcd cluster health before you can install or reconfigure the other node again.

Example two-node environment:

  • The first node is installed on a server with hostname MGMT1 and IP 10.0.5.147

  • The second node is installed on a server with hostname MGMT2 and IP 10.0.5.155

Create dump of etcd storage content from a node which is reported as healthy

  1. Connect to the node which is still functional

  2. Start PowerShell and move to the "DispatcherParagon installation directory\Management\etcd\" folder

  3. Run the following command. It creates the file etcddump.ps1, which contains the commands for restoring the content of etcd:

    rm etcddump.ps1 -ea silentlycontinue; .\etcdctl.exe ls / | %{" .\etcdctl.exe --endpoint http://127.0.0.1:2379 mk $($_ -replace `"/`", `"`") `"$(.\etcdctl.exe get $_)`"" | out-file etcddump.ps1 -append }

  4. Copy etcddump.ps1 to a safe place
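
The one-liner in step 3 is dense, so here is a rough, more readable sketch of the same idea: one restore line is built per key. Format-EtcdRestoreLine is a hypothetical helper name introduced only for illustration (it is not part of the product or of etcd), and the sketch assumes that stored values contain no double quotation marks.

```powershell
# Hypothetical helper (not part of Dispatcher Paragon or etcd).
# Builds one "etcdctl mk" line that re-creates a single key on restore.
function Format-EtcdRestoreLine {
    param(
        [string]$Key,    # key as printed by "etcdctl ls /", e.g. /dbHost
        [string]$Value   # value as printed by "etcdctl get <key>"
    )
    # Strip slashes from the key name, as the original one-liner does.
    $name = $Key -replace '/', ''
    # Assumption: the value contains no double quotation marks.
    return ".\etcdctl.exe --endpoint http://127.0.0.1:2379 mk $name `"$Value`""
}

# Usage, run from the etcd folder on the healthy node (commented out so the
# sketch can be loaded on a machine without etcdctl.exe):
# .\etcdctl.exe ls / |
#     ForEach-Object { Format-EtcdRestoreLine -Key $_ -Value (.\etcdctl.exe get $_) } |
#     Out-File etcddump.ps1
```

Review the generated etcddump.ps1 before storing it; it contains the encrypted passwords in plain view.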

Check if the healthy node is the first or an additional node

That is, determine which node remained running, or which node kept its IP address.

  1. Start PowerShell and move to the "DispatcherParagon installation directory\Management\etcd\" folder

  2. Run this command:

    .\prunmgr.exe //ES//YSoftEtcd

  3. The Dispatcher Paragon Bundled Etcd Properties dialog opens

  4. Move to the Startup tab and scroll down to the end of the Arguments: field

    1. The first installed node has the following line there:

      -initial-cluster-state=new

    2. Any additional node has the following line there:

      -initial-cluster-state=existing
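
If you have pasted the content of the Arguments: field into a text file or variable, the check in step 4 can be sketched as a simple pattern match. Get-EtcdNodeRole is a hypothetical helper name, and the return values 'first', 'additional' and 'unknown' are labels chosen for this illustration only.

```powershell
# Hypothetical helper: classifies a node from the text of the Arguments: field.
function Get-EtcdNodeRole {
    param([string]$Arguments)
    # Look for the -initial-cluster-state argument described in step 4.
    if ($Arguments -match '-initial-cluster-state=(new|existing)') {
        if ($Matches[1] -eq 'new') { return 'first' }
        return 'additional'
    }
    # The argument was not found; inspect the field manually.
    return 'unknown'
}

# Usage with text copied from the dialog:
# Get-EtcdNodeRole -Arguments (Get-Clipboard -Raw)
```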

Restore etcd cluster health

  1. Stop the Dispatcher Paragon Bundled Etcd service on all nodes

    1. Back up the folder SERVER_HOSTNAME.etcd in "DispatcherParagon installation directory\Management\etcd\" on all nodes (MGMT1.etcd and MGMT2.etcd in the example).

    2. Delete the folder SERVER_HOSTNAME.etcd in "DispatcherParagon installation directory\Management\etcd\" on all nodes.

  2. When the healthy node is the first node:

    1. Start Dispatcher Paragon Bundled Etcd service on the first node

  3. When the healthy node is an additional node:

    1. Reconfigure etcd to act as the first node of the new etcd cluster:

      1. Return to the Dispatcher Paragon Bundled Etcd Properties dialog that was opened in the section "Check if the healthy node is the first or an additional node" (step 3)

      2. Copy the whole content of the Arguments: field and paste it into a plain text editor (e.g. Notepad) for easier editing

      3. Change the last two lines of the text:

        Remove the non-local cluster node from the -initial-cluster line

        Change -initial-cluster-state to new

        1. original

          -initial-cluster=MGMT1=http://10.0.5.147:2380,MGMT2=http://10.0.5.155:2380
          -initial-cluster-state=existing

        2. after change

          -initial-cluster=MGMT2=http://10.0.5.155:2380
          -initial-cluster-state=new

      4. Copy the whole changed text (not only the last two lines) and paste it into the Arguments: field of the Dispatcher Paragon Bundled Etcd Properties dialog

      5. Use the OK or Apply button to confirm the changes

    2. Start Dispatcher Paragon Bundled Etcd service on the healthy node

  4. Check the etcd cluster health on the node where the Dispatcher Paragon Bundled Etcd service was started in the previous steps:

    1. Start PowerShell and move to the "DispatcherParagon installation directory\Management\etcd\" folder

    2. Run this command:

      .\etcdctl.exe --endpoint http://127.0.0.1:2379 cluster-health

    3. Output should look like this:

      member 944c8b2d3903fd86 is healthy: got healthy result from http://10.0.5.147:2379
      cluster is healthy
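
The manual edit of the Arguments: text in step 3 (the case where the healthy node is an additional node) can be sketched as a small text transformation. Convert-ToSingleNodeArguments is a hypothetical helper name; the sketch assumes one argument per line, as in the example, and only produces the edited text. You still paste the result into the dialog yourself.

```powershell
# Hypothetical helper: rewrites the Arguments: text so that the local node
# becomes the first (and only) member of a new etcd cluster.
function Convert-ToSingleNodeArguments {
    param(
        [string]$Arguments,  # whole content of the Arguments: field
        [string]$LocalName   # etcd member name of the healthy node, e.g. MGMT2
    )
    $result = foreach ($line in ($Arguments -split '\r?\n')) {
        if ($line -match '^\s*-initial-cluster=(.*)$') {
            # Keep only the member entry that belongs to the local node.
            $kept = $Matches[1].Trim() -split ',' |
                Where-Object { $_ -like "$LocalName=*" }
            "-initial-cluster=$kept"
        }
        elseif ($line -match '^\s*-initial-cluster-state=') {
            '-initial-cluster-state=new'
        }
        else {
            # Every other argument is passed through unchanged.
            $line
        }
    }
    return ($result -join "`r`n")
}
```

For the example environment, calling the helper with -LocalName MGMT2 turns the "original" pair of lines shown in step 3 into the "after change" pair.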

Restore etcd storage content

If you have a dump of etcd storage available

Use the previously created dump to restore etcd storage content.

  1. Make sure the etcddump.ps1 file created in the section "Create dump of etcd storage content from a node which is reported as healthy" is in the "DispatcherParagon installation directory\Management\etcd\" folder. If it is not there, copy it from the safe place (step 4 of that section)

  2. Start PowerShell and move to the "DispatcherParagon installation directory\Management\etcd\" folder

  3. Run this command:

    powershell.exe -executionpolicy bypass .\etcddump.ps1

  4. The etcd storage is now re-populated. To list all keys and their values, run the following command:

    .\etcdctl.exe ls / | %{write-host "$($_): $(.\etcdctl.exe get $_)" }

If you do not have a dump of etcd storage available

If you do not have a dump of etcd storage available, the etcd key/value storage must be re-populated manually with values stored in the safeq.properties file and in the Dispatcher Paragon database.

For easier handling, open the safeq.properties file located in "DispatcherParagon installation directory\Management\conf\".

Each key that should be re-populated with a value from the safeq.properties file is expressed below as the property name with the suffix .value.

  1. Connect to the node which is still functional

  2. Start PowerShell and move to the "DispatcherParagon installation directory\Management\etcd\" folder

    1. Run the following commands (replace each parameter with the suffix .value by the corresponding value from the safeq.properties file). For embeddedDb, use 0 for an external database or 1 for the embedded database. Create dbInstanceName only if a named instance is configured on the MSSQL server, and dbDomain only if Windows authentication is used for the MSSQL server:

      .\etcdctl.exe --endpoint http://127.0.0.1:2379 mk embeddedDb <0 or 1>
      .\etcdctl.exe --endpoint http://127.0.0.1:2379 mk dbClass database.type.value
      .\etcdctl.exe --endpoint http://127.0.0.1:2379 mk dbDbName database.name.value
      .\etcdctl.exe --endpoint http://127.0.0.1:2379 mk dbHost database.host.value
      .\etcdctl.exe --endpoint http://127.0.0.1:2379 mk dbPort database.port.value
      .\etcdctl.exe --endpoint http://127.0.0.1:2379 mk dbInstanceName database.msSql.instance.value
      .\etcdctl.exe --endpoint http://127.0.0.1:2379 mk dbDomain database.global.management.domain.value
      .\etcdctl.exe --endpoint http://127.0.0.1:2379 mk dbDbUsername database.global.management.username.without.domain.value
      .\etcdctl.exe --endpoint http://127.0.0.1:2379 mk encryptedUserPassword "database.global.management.password.value"
      .\etcdctl.exe --endpoint http://127.0.0.1:2379 mk encryptedClusterPassword "database.cluster.management.password.value"
      .\etcdctl.exe --endpoint http://127.0.0.1:2379 mk encryptedClusterGuestPassword "database.cluster.guest.password.value"

    2. If Data Warehouse uses a different database located on the same database server as the SafeQ database (SSMD deployment), also add the following record:

      .\etcdctl.exe --endpoint http://127.0.0.1:2379 mk DWdbName databaseWarehouse.name.value


    3. If Data Warehouse uses a different database located on a different database server than the SafeQ database (MSMD deployment), also add the following records:

      .\etcdctl.exe --endpoint http://127.0.0.1:2379 mk DWdbName databaseWarehouse.name.value
      .\etcdctl.exe --endpoint http://127.0.0.1:2379 mk DWdbHost databaseWarehouse.host.value
      .\etcdctl.exe --endpoint http://127.0.0.1:2379 mk DWdbPort databaseWarehouse.port.value
      .\etcdctl.exe --endpoint http://127.0.0.1:2379 mk DWdbUsername databaseWarehouse.global.management.username.without.domain.value
      .\etcdctl.exe --endpoint http://127.0.0.1:2379 mk DWencryptedUserPassword "databaseWarehouse.global.management.password.value"
      .\etcdctl.exe --endpoint http://127.0.0.1:2379 mk DWencryptedClusterPassword "databaseWarehouse.cluster.management.password.value"
      .\etcdctl.exe --endpoint http://127.0.0.1:2379 mk DWencryptedClusterGuestPassword "databaseWarehouse.cluster.guest.password.value"


  3. Open the database management tool (pgAdmin III or MS SQL Management Studio)

    1. Log in to the Dispatcher Paragon database

    2. Navigate to the SQDB6 database

    3. Open table cluster_mngmt.tenants

    4. Find a row where the column schema_name is equal to the name of your tenant ("tenant_1" by default)

    5. Copy the content of db_pass column to clipboard or write it down

  4. Open Internet browser

    1. Navigate to Dispatcher Paragon administration web interface

    2. Log in as an administrator

    3. On the Dashboard, navigate to the widget for text encryption

    4. Enter the password (either paste it from the clipboard or type it in)

    5. Press Encode

    6. Copy the encoded password to clipboard or write it down

  5. Go to the PowerShell prompt

    1. Move to "DispatcherParagon installation directory\Management\etcd\" folder

    2. Run the following command:

      .\etcdctl.exe --endpoint http://127.0.0.1:2379 mk encryptedTenantPassword "<encrypted password>"

      Type or paste the encrypted password in place of the placeholder and enclose it in quotation marks, e.g. "code,-3,5,98,45,18,-7,-125,-92"

    3. The etcd storage is now re-populated. To list all keys and their values, run the following command:

      .\etcdctl.exe ls / | %{write-host "$($_): $(.\etcdctl.exe get $_)" }

    4. If a key contains a wrong value, delete the key with the command below and create it again with the correct value:

      .\etcdctl.exe --endpoint http://127.0.0.1:2379 rm <keyName>
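
Part of step 2 can be prepared from the safeq.properties file instead of being typed by hand. The sketch below only prints candidate commands for review; it does not run them. Both function names are hypothetical. It rests on one reading of the notation used above, namely that database.host.value means "the value of the property database.host", and it covers only the four keys where that reading is unambiguous. Verify the property names against your own safeq.properties before relying on it, and create the remaining keys manually as described.

```powershell
# Hypothetical helpers; a sketch only. Assumes simple "name=value" lines
# without line continuations or escaped characters.
function ConvertFrom-PropertiesText {
    param([string[]]$Lines)
    $map = @{}
    foreach ($line in $Lines) {
        $trimmed = $line.Trim()
        # Skip blank lines and comments.
        if ($trimmed -eq '' -or $trimmed.StartsWith('#')) { continue }
        $idx = $trimmed.IndexOf('=')
        if ($idx -lt 1) { continue }
        $map[$trimmed.Substring(0, $idx).Trim()] = $trimmed.Substring($idx + 1).Trim()
    }
    return $map
}

function Get-EtcdMkCommands {
    param([hashtable]$Properties)
    # etcd key -> property name. Assumption: "database.type.value" in the
    # list above means "the value of the property database.type".
    $keyMap = [ordered]@{
        dbClass  = 'database.type'
        dbDbName = 'database.name'
        dbHost   = 'database.host'
        dbPort   = 'database.port'
    }
    foreach ($etcdKey in $keyMap.Keys) {
        $value = $Properties[$keyMap[$etcdKey]]
        # Emit a command only for properties that are present and non-empty.
        if ($null -ne $value -and $value -ne '') {
            ".\etcdctl.exe --endpoint http://127.0.0.1:2379 mk $etcdKey `"$value`""
        }
    }
}

# Usage from the etcd folder (commented out; prints commands for review):
# $props = ConvertFrom-PropertiesText -Lines (Get-Content '..\conf\safeq.properties')
# Get-EtcdMkCommands -Properties $props
```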


Add or reconfigure the remaining node(s)

If the remaining node is not installed and needs to be reinstalled

For example, a Management node has crashed, the etcd cluster has lost quorum, and the Management node needs to be reinstalled and added to the restored etcd cluster.

  1. The etcd cluster is now healthy again, so you can reinstall the node using the installer.

  2. Choose the Add or Replace node option, because you are adding an additional node to the etcd cluster.

    The first node is either available, or an additional node was reconfigured to act as the first etcd node in the section "Restore etcd cluster health" (step 3).

If the remaining node is already installed and needs to be reconfigured

For example, the IP address of a Management node was changed without following the proper procedure, the etcd cluster has lost quorum, and the Management node needs to be added to the restored etcd cluster.

  1. Connect to the node which is now acting as the first node

  2. Start PowerShell and move to "DispatcherParagon installation directory\Management\etcd\" folder

    1. Run this command:

      .\etcdctl.exe --endpoint http://127.0.0.1:2379 member add MGMT2 http://10.0.5.155:2380

      Replace MGMT2 with the actual hostname of the other node and 10.0.5.155 with the actual IP address of the other node.

    2. Output should look like this:

      Added member named MGMT2 with ID a522606ea77f5003 to cluster
       
      ETCD_NAME="MGMT2"
      ETCD_INITIAL_CLUSTER="MGMT1=http://10.0.5.147:2380,MGMT2=http://10.0.5.155:2380"
      ETCD_INITIAL_CLUSTER_STATE="existing"

  3. Connect to the remaining node, start PowerShell and move to the "DispatcherParagon installation directory\Management\etcd\" folder

    1. Run this command:

      .\prunmgr.exe //ES//YSoftEtcd

    2. On the General tab, use the Start button to start the Dispatcher Paragon Bundled Etcd service

      The service must be started this way so that the proper etcd configuration is created. This is needed only once after the changes.

    3. The Dispatcher Paragon Bundled Etcd service starts

  4. Check etcd cluster health again

    1. Run this command:

      .\etcdctl.exe --endpoint http://127.0.0.1:2379 cluster-health

    2. Output should look like this:

      member 944c8b2d3903fd86 is healthy: got healthy result from http://10.0.5.147:2379
      member a522606ea77f5003 is healthy: got healthy result from http://10.0.5.155:2379
      cluster is healthy

    3. You can now close the Dispatcher Paragon Bundled Etcd Properties dialog on the remaining node. The Dispatcher Paragon Bundled Etcd service will remain running.

    4. The etcd cluster is now reconfigured.
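
If you prefer to check the cluster-health output in a script rather than by eye, the comparison can be sketched as below. Get-EtcdHealthSummary is a hypothetical helper name, and the sketch assumes the output has the same shape as the examples in this document.

```powershell
# Hypothetical helper: summarises the text printed by "etcdctl cluster-health".
function Get-EtcdHealthSummary {
    param([string[]]$Output)
    # Count the lines that report a healthy member.
    $healthy = @($Output | Where-Object { $_ -match '^\s*member \S+ is healthy' }).Count
    # The final verdict line printed by etcdctl for a healthy cluster.
    $clusterOk = [bool]($Output | Where-Object { $_ -match '^\s*cluster is healthy' })
    [pscustomobject]@{
        HealthyMembers = $healthy
        ClusterHealthy = $clusterOk
    }
}

# Usage from the etcd folder (commented out so the sketch loads anywhere):
# $summary = Get-EtcdHealthSummary -Output (.\etcdctl.exe --endpoint http://127.0.0.1:2379 cluster-health)
# $summary.ClusterHealthy -and $summary.HealthyMembers -eq 2
```

In the two-node example, the reconfiguration is complete when two healthy members are reported and the cluster itself is reported as healthy.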