* geo-replication: fix for secondary node fail-over

Problem: When a geo-replication session is set up, all the gsyncd slave processes come up on the host that was used to create the geo-rep session. When this primary slave node goes down, all the bricks go into Faulty state.

Cause: When the monitor process tries to connect to the remote secondary node, it always uses remote_addr as the hostname. This variable holds the hostname of the node that was used to create the geo-rep session, so the gsyncd slave processes always come up on the primary slave node. When this node goes down, the monitor process cannot bring up a gsyncd slave process and the bricks go into Faulty state.

Fix: Instead of remote_addr, use resource_remote, which holds the hostname of a randomly picked remote node. This way, when a geo-rep session is created and started, the gsyncd slave processes are distributed across the secondary cluster. If the node used to create the session goes down, the monitor process brings up the gsyncd slave process on a randomly picked remote node (chosen from the nodes that are up at that moment), and the bricks do not go into Faulty state.

fixes: #3956

Signed-off-by: Sanju Rakonde <sanju.rakonde@phonepe.com>
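For illustration, a minimal sketch of the node-selection behaviour this fix relies on, written as a standalone Python helper (pick_remote_node, secondary_nodes and primary_addr are made-up names for this sketch, not the actual gsyncd code; the real change is switching the monitor from remote_addr to resource_remote):

    import random

    def pick_remote_node(secondary_nodes, primary_addr):
        """Choose the secondary host a gsyncd worker should connect to.

        secondary_nodes: hostnames of secondary-cluster nodes currently up.
        primary_addr:    hostname used while creating the geo-rep session
                         (the old behaviour effectively always returned this).
        """
        if secondary_nodes:
            # Spread workers across the secondary cluster instead of pinning
            # every worker to the session-creation node; if that node goes
            # down, workers can still be respawned on the remaining nodes.
            return random.choice(secondary_nodes)
        # No remote node is reachable: fall back to the session-creation host
        # so the worker keeps retrying against it.
        return primary_addr

With this kind of selection, if the session-creation node is down but other secondary nodes are up, the monitor still gets a usable target and the corresponding brick does not stay Faulty.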