Rancher runs up to 3 instances of etcd on 3 different hosts. If a majority of hosts running etcd fail, follow these steps:
1. Find a `kubernetes-etcd` container that is in the `running` state. Exec into the container by using Execute Shell. In the shell, run `etcdctl cluster-health`.
   - If the output reports `cluster is healthy`, there is no disaster and the etcd cluster is fine.
   - If the output reports `cluster is unhealthy`, make a note of this `kubernetes-etcd` container. This is your sole survivor. All other containers can be replaced, because you will use this container to scale up.
2. Delete hosts in the `Disconnected` state. Before doing so, confirm that none of these hosts is running your sole survivor.
3. Exec into the sole survivor container and run `disaster`. The container will restart automatically and etcd will heal itself to become a single-node cluster. System functionality will be restored.
4. Add the label `etcd=true` to your hosts. Etcd will scale back up as hosts are added and start running the etcd service. In most cases, everything will heal automatically. If new or dead containers are still stuck in the `initializing` state after three minutes, exec into those containers and run `delete`. Do not, under any circumstance, run the `delete` command on your sole survivor.
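
The health check in the first step can be scripted if you need to inspect several containers. The following is a minimal sketch, not part of Rancher: the function name `classify_cluster_health` is hypothetical, and it assumes the verdict appears in the output as the text `cluster is healthy` or `cluster is unhealthy`, as quoted above. Any other output is reported as `unknown` so that it is checked by hand rather than guessed at.

```shell
#!/bin/sh
# Hypothetical helper: classify captured `etcdctl cluster-health` output.
# Assumption: the verdict is the text "cluster is healthy" or
# "cluster is unhealthy"; anything else is treated as "unknown".
classify_cluster_health() {
    case "$1" in
        *"cluster is unhealthy"*) echo "unhealthy" ;;
        *"cluster is healthy"*)   echo "healthy" ;;
        *)                        echo "unknown" ;;
    esac
}

# Intended use inside a kubernetes-etcd container (left commented out so
# the sketch can be sourced on a machine without etcdctl):
#   classify_cluster_health "$(etcdctl cluster-health 2>&1)"
```

A result of `unhealthy` only marks the container as a candidate sole survivor. The `disaster` and `delete` commands are deliberately left out of the script, because running `delete` on the wrong container is not recoverable.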