Version v3.4 of the documentation is no longer actively maintained. The page that you are viewing is the last archived version. For the latest stable documentation, see v3.5.
Data Corruption
etcd has built-in automated data corruption detection to prevent member state from diverging.
Enabling data corruption detection
Data corruption detection can be done in two ways:
- Initial check, enabled with the --experimental-initial-corrupt-check flag.
- Periodic check, enabled with the --experimental-corrupt-check-time flag.
The initial check is executed during bootstrap of an etcd member. The member compares its persistent state against the other members and exits if there is a mismatch.
The periodic check is executed by the cluster leader in a cluster that is already running.
The leader compares its persistent state against the other members and raises a CORRUPT ALARM if there is a mismatch.
The period between checks is configured as a duration, for example 1m for every minute or 1h for every hour.
The recommended period is a couple of hours, because each check has a high performance cost.
Running a check requires computing a checksum by scanning the entire etcd content at a given revision.
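As a rough sketch, both checks could be enabled together when starting a member. The 4h period is an assumed value picked to match the "couple of hours" recommendation, and CHECK_PERIOD and corruption_check_flags are illustrative names that are not part of etcd. The script prints the command line instead of running it:

```shell
#!/bin/sh
# CHECK_PERIOD is a hypothetical variable; 4h is an assumed example value.
CHECK_PERIOD="${CHECK_PERIOD:-4h}"

# Emit the two corruption-detection flags on one line, space-separated.
corruption_check_flags() {
  printf '%s %s\n' \
    "--experimental-initial-corrupt-check=true" \
    "--experimental-corrupt-check-time=${CHECK_PERIOD}"
}

# Print rather than execute. Append the member's usual flags
# (name, data directory, URLs) before running it for real.
echo "etcd $(corruption_check_flags)"
```

Keeping the period in one variable makes it easy to lengthen the interval if the periodic scan turns out to be too expensive for a given cluster.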
Restoring a corrupted member
There are three ways to restore a corrupted member:
- Purge member persistent state
- Replace member
- Restore whole cluster
After the corrupted member is restored, the CORRUPT ALARM can be removed.
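One possible way to remove the alarm is with the etcdctl alarm commands. This is a minimal sketch: clear_alarms and the ETCDCTL override are illustrative names, and endpoint and TLS flags are omitted.

```shell
#!/bin/sh
# ETCDCTL is a hypothetical override; set it to "echo" for a dry run.
# It is left unquoted below so it may also carry extra flags.
ETCDCTL="${ETCDCTL:-etcdctl}"

# List the active alarms, then disarm them.
# Intended to be run only after the corrupted member has been restored.
clear_alarms() {
  $ETCDCTL alarm list &&
  $ETCDCTL alarm disarm
}

# Usage:
#   clear_alarms
```

Listing the alarms first gives a record of what was raised before it is disarmed.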
Purge member persistent state
A member's state can be purged by:
- Stopping the etcd instance.
- Backing up the etcd data directory.
- Moving the snap subdirectory out of the etcd data directory.
- Starting etcd with --initial-cluster-state=existing and the cluster members listed in --initial-cluster.
The etcd member is then expected to download an up-to-date snapshot from the leader.
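The backup and move steps of the purge procedure might be scripted roughly as follows. This is a sketch under assumptions: purge_member_state is a hypothetical helper, the snap directory is assumed to live at member/snap inside the data directory, and stopping and starting etcd are left to whatever service manager is in use.

```shell
#!/bin/sh
# purge_member_state DATA_DIR BACKUP_DIR   (hypothetical helper)
# Copies DATA_DIR into BACKUP_DIR, then moves the snap subdirectory out.
# Assumes the snap directory is at DATA_DIR/member/snap.
purge_member_state() {
  data_dir="$1"
  backup_dir="$2"
  snap_dir="${data_dir}/member/snap"

  # Refuse to run on an unexpected layout or over an existing backup.
  [ -d "$snap_dir" ] || { echo "no snap directory at $snap_dir" >&2; return 1; }
  [ ! -e "$backup_dir" ] || { echo "$backup_dir already exists" >&2; return 1; }

  # Back up the whole data directory first, then move snap out of it.
  mkdir -p "$backup_dir" &&
  cp -Rp "$data_dir" "${backup_dir}/data" &&
  mv "$snap_dir" "${backup_dir}/snap-moved-out"
}

# Usage, with etcd already stopped (paths are placeholders):
#   purge_member_state <data-dir> <backup-dir>
#   etcd --initial-cluster-state=existing --initial-cluster=<name>=<peer-url>,...
```

Moving the directory rather than deleting it keeps the old state available for inspection if the member fails to rejoin.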
Replace member
A member can be replaced by:
- Stopping the etcd instance.
- Backing up the etcd data directory.
- Removing the data directory.
- Removing the member from the cluster by running etcdctl member remove.
- Adding it back by running etcdctl member add.
- Starting etcd with --initial-cluster-state=existing and the cluster members listed in --initial-cluster.
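The two etcdctl steps of the replacement procedure could look roughly like this. It is a sketch only: replace_member and the ETCDCTL override are illustrative names, the arguments are placeholders, and endpoint and TLS flags are omitted.

```shell
#!/bin/sh
# ETCDCTL is a hypothetical override; set it to "echo" for a dry run.
ETCDCTL="${ETCDCTL:-etcdctl}"

# replace_member MEMBER_ID MEMBER_NAME PEER_URL   (hypothetical helper)
# Removes the old member by ID, then adds it back under its name and peer URL.
replace_member() {
  member_id="$1"
  member_name="$2"
  peer_url="$3"

  $ETCDCTL member remove "$member_id" &&
  $ETCDCTL member add "$member_name" --peer-urls="$peer_url"
}

# Usage, after stopping etcd and backing up and removing the data directory:
#   etcdctl member list                       # find the member ID
#   replace_member <member-id> <name> <peer-url>
#   etcd --initial-cluster-state=existing --initial-cluster=<name>=<peer-url>,...
```

Chaining the two commands with && avoids adding a new member if the removal did not succeed.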
Restore whole cluster
The cluster can be restored by saving a snapshot from the current leader and restoring it to all members.
Run etcdctl snapshot save against the leader and follow the restoring a cluster procedure.
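Saving the snapshot from the leader might be sketched as below. The helper name save_leader_snapshot, the ETCDCTL override, and the file name snapshot.db are assumptions for illustration. The restore itself is not shown, since it follows the separate restoring a cluster procedure.

```shell
#!/bin/sh
# ETCDCTL is a hypothetical override; set it to "echo" for a dry run.
ETCDCTL="${ETCDCTL:-etcdctl}"

# save_leader_snapshot LEADER_ENDPOINT SNAPSHOT_FILE   (hypothetical helper)
# Saves a snapshot from the given endpoint, which should be the current leader.
save_leader_snapshot() {
  $ETCDCTL --endpoints="$1" snapshot save "$2"
}

# Usage (endpoints are placeholders):
#   etcdctl --endpoints=<all-endpoints> endpoint status -w table   # see IS LEADER
#   save_leader_snapshot <leader-endpoint> snapshot.db
```

Passing a single endpoint ensures the snapshot comes from the intended member rather than from whichever endpoint the client happens to reach.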