This blog post is about replacing a dead Cassandra node. I recently faced this situation and struggled with a few issues, so I would like to describe each of them and point to the right resolution for every case.
1. First, the normal replace procedure. This should be your first attempt; it only works in ideal situations, but nevertheless, try it.
a. Check the status of the nodes using "nodetool status". Any node that is down will show up with status "DN".
e.g.:
Assuming you have 6 nodes, here is how it looks. I have masked the details and renamed the host IDs and IPs.
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load      Tokens  Owns (effective)  Host ID                               Rack
UN  1.2.3.4   7.57 GiB  256     50.2%             2ssbaar-9dbdaf-4r324-8697-f80b9351e7  --
UN  1.2.3.5   7.4 GiB   256     50.1%             9ssbaar-9dbdaf-4r324-8697-f80b9351e7  --
UN  1.2.3.6   7.82 GiB  256     51.3%             1ssbaar-9dbdaf-4r324-8697-f80b9351e7  --
UN  1.2.3.7   7.62 GiB  256     48.9%             2f71f53-9dbdaf-4r324-8697-f80b9351e7  --
DN  1.2.3.8   6.91 GiB  256     47.8%             38abaar-9dbdaf-4r324-8697-f80b9351e7  --
UN  1.2.3.9   7.92 GiB  256     51.7%             7cddaar-9dbdaf-4r324-8697-f80b9351e7  --
In the status above, the node with IP 1.2.3.8 is down.
Assuming we have to replace it with a new machine, here are the steps.
1) Install Cassandra on the new node, but do not start it yet.
2) Make sure the seed list and everything else in cassandra.yaml is correct (see the snippet after these steps).
3) Start Cassandra with the following command (assuming your Cassandra installation directory is /usr/lib/cassandra):
/usr/lib/cassandra/bin/cassandra -Dcassandra.replace_address_first_boot=1.2.3.8
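For step 2, this is the kind of cassandra.yaml check I do on the new node. The values below are only an example for this cluster and assume a tarball install under /usr/lib/cassandra; adjust the cluster name, IPs and paths to your own setup.

# /usr/lib/cassandra/conf/cassandra.yaml on the new node (example values)
cluster_name: 'MyCluster'        # must match the existing cluster exactly
listen_address: 1.2.3.10         # the new machine's own IP
rpc_address: 1.2.3.10
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "1.2.3.4,1.2.3.5"   # live nodes only; do not list the dead 1.2.3.8
data_file_directories:
    - /usr/lib/cassandra/data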
Now, the story begins.
Case 1: If it starts without any problems, you are lucky. Go to each node and run a repair on it; that should be it.
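A minimal sketch of that repair step, run on each node one after another; -pr repairs only the node's primary ranges so the same data is not repaired several times across the cluster, and the keyspace name is just a placeholder:

# run on every node, one at a time
nodetool repair -pr                # all keyspaces
nodetool repair -pr my_keyspace    # or limit it to a single keyspace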
Case 2: If you get a warning saying it is unsafe to replace and asking you to use cassandra.allow_unsafe_replace, then run:
/usr/lib/cassandra/bin/cassandra -Dcassandra.replace_address_first_boot=1.2.3.8 -Dcassandra.allow_unsafe_replace=true
If it starts after that, you can still consider yourself lucky. Go ahead with a repair on each node and you will be done.
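If Cassandra runs as a service on your machines (for example a package install), you may not be able to pass these -D flags on the command line. One alternative, assuming the standard cassandra-env.sh, is to append the same properties to JVM_OPTS before the first start and remove them once the node has joined:

# temporary addition to cassandra-env.sh on the new node; remove after the first successful boot
JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address_first_boot=1.2.3.8"
JVM_OPTS="$JVM_OPTS -Dcassandra.allow_unsafe_replace=true"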
Case 3: If it screams with an error like
java.lang.RuntimeException: Host ID collision between active endpoint
it means the cluster information the seeds hand out still lists the dead machine in its gossip / system information. If you get into this situation, proceed as follows.
i) Remove the dead node from the cluster, passing the Host ID of the dead node (shown in nodetool status), not its IP:
nodetool removenode 38abaar-9dbdaf-4r324-8697-f80b9351e7
Then run the Cassandra start command with the replace option again, as in step 3 above (with allow_unsafe_replace=true if you hit case 2).
ii) If it still screams at you,
go to the Cassandra data folder. Its location is configured in cassandra.yaml; by default it is <cassandra_installation_directory>/data.
Check the system directory inside that data directory. It holds the system tables, i.e. the cluster information this node has collected from the other machines. Once it has been written with the old machine's details, you end up in this situation.
Run the following command on the new/fresh node that is doing the replacement.
P.S. Do not run this command on any existing machine. It will destroy the complete cluster information on that node if misused.
rm -r <data_directory>/system/*
What it means: remove all system-table data from the new Cassandra node so it boots with a clean state.
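If you are not sure where the data directory lives on your node, check data_file_directories in cassandra.yaml first. The paths below assume the /usr/lib/cassandra tarball install used in this post; package installs usually keep the config in /etc/cassandra instead.

grep -A 1 'data_file_directories' /usr/lib/cassandra/conf/cassandra.yaml
# data_file_directories:
#     - /usr/lib/cassandra/data
rm -r /usr/lib/cassandra/data/system/*    # ONLY on the new, empty node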
Now run the start command with the replace_address option again. If you run into case 2, add allow_unsafe_replace=true as well.
The node should now join the cluster without any issues. When you check nodetool status,
you should see only the new node (the dead one is gone), but with the same host ID as the old machine, like below.
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load      Tokens  Owns (effective)  Host ID                               Rack
UN  1.2.3.4   7.57 GiB  256     50.2%             2ssbaar-9dbdaf-4r324-8697-f80b9351e7  --
UN  1.2.3.5   7.4 GiB   256     50.1%             9ssbaar-9dbdaf-4r324-8697-f80b9351e7  --
UN  1.2.3.6   7.82 GiB  256     51.3%             1ssbaar-9dbdaf-4r324-8697-f80b9351e7  --
UN  1.2.3.7   7.62 GiB  256     48.9%             2f71f53-9dbdaf-4r324-8697-f80b9351e7  --
UN  1.2.3.10  6.91 GiB  256     47.8%             38abaar-9dbdaf-4r324-8697-f80b9351e7  --
UN  1.2.3.9   7.92 GiB  256     51.7%             7cddaar-9dbdaf-4r324-8697-f80b9351e7  --
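To double-check that the replacement really took over the old node's identity, you can also look at the host ID on the new node itself; a quick sketch, assuming the default local JMX setup:

# on the new node (1.2.3.10 in this example)
nodetool info | grep -i '^ID'
# ID : 38abaar-9dbdaf-4r324-8697-f80b9351e7   <- same host ID the dead 1.2.3.8 node had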
Run repair on all the machines, and you are ready to go.
--------------
Thank You.