etcd-io/etcd

panic when two snapshots are received in short period

Open

#18,055 created on May 22, 2024

(11 comments) (0 reactions) (1 assignee) Go (51,701 stars) (10,352 forks) batch import
area/robustness-testing, help wanted, stage/tracked, type/bug

Description

Bug report criteria

What happened?

https://github.com/etcd-io/etcd/actions/runs/9188844320/job/25269542381

{"level":"panic","ts":"2024-05-22T10:31:40.994846Z","caller":"etcdserver/server.go:1010","msg":"failed to open snapshot backend","error":"failed to find database snapshot file (snap: snapshot file doesn't exist)","stacktrace":"go.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applySnapshot\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:1010\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyAll\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:947\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).run.func6\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:840\ngo.etcd.io/etcd/pkg/v3/schedule.job.Do\n\tgo.etcd.io/etcd/pkg/v3@v3.6.0-alpha.0/schedule/schedule.go:41\ngo.etcd.io/etcd/pkg/v3/schedule.(*fifo).executeJob\n\tgo.etcd.io/etcd/pkg/v3@v3.6.0-alpha.0/schedule/schedule.go:206\ngo.etcd.io/etcd/pkg/v3/schedule.(*fifo).run\n\tgo.etcd.io/etcd/pkg/v3@v3.6.0-alpha.0/schedule/schedule.go:187"}
2024-05-22T10:31:41.0044258Z /home/runner/actions-runner/_work/etcd/etcd/bin/etcd (TestRobustnessExploratoryKubernetesHighTrafficClusterOfSize3-test-0) (40381): {"level":"info","ts":"2024-05-22T10:31:40.994887Z","caller":"etcdserver/server.go:984","msg":"applied snapshot","current-snapshot-index":1590,"current-applied-index":1590,"incoming-leader-snapshot-index":1857,"incoming-leader-snapshot-term":2}
2024-05-22T10:31:41.0060472Z /home/runner/actions-runner/_work/etcd/etcd/bin/etcd (TestRobustnessExploratoryKubernetesHighTrafficClusterOfSize3-test-0) (40381): {"level":"panic","ts":"2024-05-22T10:31:40.994904Z","caller":"schedule/schedule.go:202","msg":"execute job failed","job":"server_applyAll","panic":"failed to open snapshot backend","stacktrace":"go.etcd.io/etcd/pkg/v3/schedule.(*fifo).executeJob.func1\n\tgo.etcd.io/etcd/pkg/v3@v3.6.0-alpha.0/schedule/schedule.go:202\nruntime.gopanic\n\truntime/panic.go:770\ngo.uber.org/zap/zapcore.CheckWriteAction.OnWrite\n\tgo.uber.org/zap@v1.27.0/zapcore/entry.go:196\ngo.uber.org/zap/zapcore.(*CheckedEntry).Write\n\tgo.uber.org/zap@v1.27.0/zapcore/entry.go:262\ngo.uber.org/zap.(*Logger).Panic\n\tgo.uber.org/zap@v1.27.0/logger.go:285\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applySnapshot\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:1010\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyAll\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:947\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).run.func6\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:840\ngo.etcd.io/etcd/pkg/v3/schedule.job.Do\n\tgo.etcd.io/etcd/pkg/v3@v3.6.0-alpha.0/schedule/schedule.go:41\ngo.etcd.io/etcd/pkg/v3/schedule.(*fifo).executeJob\n\tgo.etcd.io/etcd/pkg/v3@v3.6.0-alpha.0/schedule/schedule.go:206\ngo.etcd.io/etcd/pkg/v3/schedule.(*fifo).run\n\tgo.etcd.io/etcd/pkg/v3@v3.6.0-alpha.0/schedule/schedule.go:187"}
2024-05-22T10:31:41.0074671Z /home/runner/actions-runner/_work/etcd/etcd/bin/etcd (TestRobustnessExploratoryKubernetesHighTrafficClusterOfSize3-test-0) (40381): panic: failed to open snapshot backend [recovered]
2024-05-22T10:31:41.0077589Z /home/runner/actions-runner/_work/etcd/etcd/bin/etcd (TestRobustnessExploratoryKubernetesHighTrafficClusterOfSize3-test-0) (40381): 	panic: execute job failed

What did you expect to happen?

Etcd not to panic

How can we reproduce it (as minimally and precisely as possible)?

TODO

Anything else we need to know?

No response

Etcd version (please run commands below)

v3.6

Etcd configuration (command line flags or environment variables)

paste your configuration here

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

$ etcdctl member list -w table
# paste output here

$ etcdctl --endpoints=<member list> endpoint status -w table
# paste output here

Relevant log output

No response
