Version v3.6-DRAFT of the documentation is in DRAFT status. For the latest stable documentation, see v3.5.
etcd has built-in automated data corruption detection to prevent member state from diverging.
Data corruption detection can be enabled using the following flags:
- --experimental-initial-corrupt-check (initial check)
- --experimental-compact-hash-check-enabled (periodic compact hash check)
- --experimental-corrupt-check-time (periodic full-scan check)
The initial check is executed during bootstrap of an etcd member. The member compares its persistent state against the other members and exits if there is a mismatch.
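As a minimal sketch of enabling the initial check, the following assembles an etcd command line with the flag set. The member name and data directory are placeholder assumptions, not values from this page, and the script only prints the command so it can be reviewed before use.

```shell
# Placeholder values (assumptions): adjust to your own member.
NAME="infra1"
DATA_DIR="/var/lib/etcd"

# Build the argument list with the bootstrap-time corruption check enabled.
set -- etcd \
  --name "$NAME" \
  --data-dir "$DATA_DIR" \
  --experimental-initial-corrupt-check=true

# Print the command for review. To start the member, use: exec "$@"
CMD="$*"
echo "$CMD"
```

Any other flags the member normally starts with (peer URLs, client URLs, cluster membership) would be added to the same argument list.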
Both periodic checks are executed by the cluster leader in a cluster that is already running. The leader compares its persistent state against the other members and raises a CORRUPT ALARM if there is a mismatch. Both checks serve the same purpose, but they differ in cost and in how quickly they detect corruption, so enabling both balances performance against time to detection.
The compact hash check, when enabled using the --experimental-compact-hash-check-enabled flag, is executed once every minute. The period can be adjusted using the --experimental-compact-hash-check-time flag, which takes a duration such as 1m (every minute) or 1h (every hour).
This check extends compaction to also calculate a checksum that can be compared between cluster members.
It does not cause an additional database scan, which makes it very cheap, but it requires regular compaction in the cluster.
The full-scan check is enabled using the --experimental-corrupt-check-time flag, which requires an execution period given as a duration such as 1m (every minute) or 1h (every hour).
The recommended period is a couple of hours because of the high performance cost: running the check requires computing a checksum by scanning the entire etcd content at a given revision.
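The two periodic checks can be enabled together. The sketch below is one possible combination, not a recommended configuration: the member name, data directory, and chosen periods are assumptions, and the auto-compaction flags are included only as one way to get the regular compaction that the compact hash check depends on. The script prints the command rather than running it.

```shell
# Placeholder values (assumptions): adjust to your own member.
NAME="infra1"
DATA_DIR="/var/lib/etcd"

# Compact hash check every 5 minutes, full-scan check every 4 hours.
# Periodic auto-compaction keeps compactions happening regularly.
set -- etcd \
  --name "$NAME" \
  --data-dir "$DATA_DIR" \
  --experimental-compact-hash-check-enabled=true \
  --experimental-compact-hash-check-time=5m \
  --experimental-corrupt-check-time=4h \
  --auto-compaction-mode=periodic \
  --auto-compaction-retention=1h

# Print the command for review. To start the member, use: exec "$@"
CMD="$*"
echo "$CMD"
```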
There are three ways to restore a corrupted member: purging the member's persistent state, replacing the member, or restoring the whole cluster.
After the corrupted member is restored, the CORRUPT ALARM can be removed.
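Inspecting and clearing the alarm might look like the sketch below. It assumes etcdctl is on the PATH and uses a placeholder client endpoint. The run helper is a hypothetical dry-run wrapper, not part of etcd: it prints each command unless DRY_RUN=0 is set.

```shell
# Hypothetical helper: print each command instead of executing it
# unless DRY_RUN=0 is set in the environment.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Placeholder client endpoint (assumption).
ENDPOINTS="http://10.0.0.1:2379"

# List the alarms currently raised in the cluster.
run etcdctl --endpoints "$ENDPOINTS" alarm list

# Disarm the alarms once the corrupted member has been restored.
run etcdctl --endpoints "$ENDPOINTS" alarm disarm
```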
A member's state can be purged by:
1. Stopping the etcd instance
2. Backing up the etcd data directory
3. Moving the snap subdirectory out of the etcd data directory
4. Starting etcd with --initial-cluster-state=existing and the cluster members listed in --initial-cluster
The etcd member is expected to download an up-to-date snapshot from the leader.
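The purge procedure can be sketched as a dry-run script. Everything specific here is an assumption: the paths, the member name, the cluster URLs, the use of systemd to stop etcd, and the usual member/snap layout under the data directory. The run helper is hypothetical and prints each command unless DRY_RUN=0 is set.

```shell
# Hypothetical helper: print each command instead of executing it
# unless DRY_RUN=0 is set in the environment.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Placeholder values (assumptions).
NAME="infra1"
DATA_DIR="/var/lib/etcd"
BACKUP_DIR="/var/backups/etcd-data"
SNAP_BACKUP="/var/backups/etcd-snap"
INITIAL_CLUSTER="infra1=http://10.0.0.1:2380,infra2=http://10.0.0.2:2380,infra3=http://10.0.0.3:2380"

# Stop the member (assumes a systemd unit named etcd).
run systemctl stop etcd

# Back up the whole data directory before touching it.
run cp -a "$DATA_DIR" "$BACKUP_DIR"

# Move the snap subdirectory out of the data directory.
run mv "$DATA_DIR/member/snap" "$SNAP_BACKUP"

# Start the member again as part of the existing cluster.
run etcd --name "$NAME" --data-dir "$DATA_DIR" \
  --initial-cluster "$INITIAL_CLUSTER" \
  --initial-cluster-state=existing
```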
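A member can be replaced by:
1. Stopping the etcd instance
2. Backing up the etcd data directory
3. Removing the data directory
4. Removing the member from the cluster by running etcdctl member remove
5. Adding it back by running etcdctl member add
6. Starting etcd with --initial-cluster-state=existing and the cluster members listed in --initial-cluster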
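A dry-run sketch of replacing a member follows. The member ID, name, URLs, paths, and the systemd unit are placeholder assumptions, and the etcdctl commands are expected to be sent to a healthy member rather than the corrupted one. The run helper is hypothetical and prints each command unless DRY_RUN=0 is set.

```shell
# Hypothetical helper: print each command instead of executing it
# unless DRY_RUN=0 is set in the environment.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Placeholder values (assumptions).
NAME="infra1"
MEMBER_ID="8e9e05c52164694d"             # as shown by: etcdctl member list
PEER_URL="http://10.0.0.1:2380"
HEALTHY_ENDPOINT="http://10.0.0.2:2379"  # client URL of a healthy member
DATA_DIR="/var/lib/etcd"
BACKUP_DIR="/var/backups/etcd-data"
INITIAL_CLUSTER="infra1=http://10.0.0.1:2380,infra2=http://10.0.0.2:2380,infra3=http://10.0.0.3:2380"

# Stop the corrupted member (assumes a systemd unit named etcd).
run systemctl stop etcd

# Moving the data directory aside both backs it up and removes it.
run mv "$DATA_DIR" "$BACKUP_DIR"

# Remove the member from the cluster, then add it back.
run etcdctl --endpoints "$HEALTHY_ENDPOINT" member remove "$MEMBER_ID"
run etcdctl --endpoints "$HEALTHY_ENDPOINT" member add "$NAME" --peer-urls="$PEER_URL"

# Start the member with an empty data directory, joining the existing cluster.
run etcd --name "$NAME" --data-dir "$DATA_DIR" \
  --initial-cluster "$INITIAL_CLUSTER" \
  --initial-cluster-state=existing
```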
The cluster can be restored by saving a snapshot from the current leader and restoring it to all members.
Run etcdctl snapshot save against the leader and follow the restoring a cluster procedure.
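Taking the snapshot might look like the dry-run sketch below. The endpoints and snapshot path are placeholder assumptions, and the leader has to be identified by hand from the status output before the second command is pointed at it. The run helper is hypothetical and prints each command unless DRY_RUN=0 is set.

```shell
# Hypothetical helper: print each command instead of executing it
# unless DRY_RUN=0 is set in the environment.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Placeholder values (assumptions).
ANY_ENDPOINT="http://10.0.0.1:2379"
LEADER_ENDPOINT="http://10.0.0.2:2379"   # fill in after checking the status table
SNAPSHOT="/var/backups/etcd-snapshot.db"

# Show the status of every member; the table marks which one is the leader.
run etcdctl --endpoints "$ANY_ENDPOINT" endpoint status --cluster -w table

# Save a snapshot from the leader only.
run etcdctl --endpoints "$LEADER_ENDPOINT" snapshot save "$SNAPSHOT"
```

The saved file is then the input to the restore procedure, which is run once per member.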