2020/02/25

How to recover an InnoDB Cluster after cleanly shutting down every node

TL;DR

  • Run dba.rebootClusterFromCompleteOutage() in MySQL Shell

If you stop every Group Replication node without thinking too hard about it, the next startup keeps spitting out errors like
2020-02-25T09:14:08.497656Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Error on opening a connection to xxx.xxx.xxx.xxx:33061 on local port: 33061.'
and in the end
2020-02-25T09:14:08.497685Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Error connecting to all peers. Member join failed. Local port: 33061'
2020-02-25T09:14:09.500209Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] The member was unable to join the group. Local port: 33061'
2020-02-25T09:14:12.789547Z 2 [ERROR] [MY-011640] [Repl] Plugin group_replication reported: 'Timeout on wait for view after joining group'
2020-02-25T09:14:12.789633Z 2 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] The member is leaving a group without being on one.'
Group Replication fails to start.
The log calls it a timeout, but I think what actually happens is this: each node tries to join a group that already holds a majority, and none of the peers it contacts belongs to such a group, because every node was stopped and no group exists any more.
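To make the "majority" part concrete, here is a minimal sketch of the arithmetic in plain JavaScript. The function names majority and hasQuorum are mine, made up for illustration; this is not how Group Replication implements it internally, only the general "more than half" rule.

```javascript
// Hypothetical helpers illustrating the majority rule; not Group Replication code.

// Smallest number of members that is more than half of the group.
function majority(groupSize) {
  return Math.floor(groupSize / 2) + 1;
}

// True if the ONLINE members can still form a majority of the group.
function hasQuorum(onlineMembers, groupSize) {
  return onlineMembers >= majority(groupSize);
}

// Right after a full shutdown, a restarting node finds no ONLINE peers at all.
console.log(hasQuorum(0, 3)); // false
// A healthy 3-node group tolerates losing one member.
console.log(hasQuorum(2, 3)); // true
```

With three nodes the majority is two, so a node that restarts into a group with zero ONLINE members has nothing to join.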
### node1
$ mysql -e "SELECT member_host, member_state, member_role FROM performance_schema.replication_group_members"
+-------------+--------------+-------------+
| member_host | member_state | member_role |
+-------------+--------------+-------------+
| node1       | OFFLINE      |             |
+-------------+--------------+-------------+

### node2
$ mysql -e "SELECT member_host, member_state, member_role FROM performance_schema.replication_group_members"
+-------------+--------------+-------------+
| member_host | member_state | member_role |
+-------------+--------------+-------------+
| node2       | OFFLINE      |             |
+-------------+--------------+-------------+

### node3
$ mysql -e "SELECT member_host, member_state, member_role FROM performance_schema.replication_group_members"
+-------------+--------------+-------------+
| member_host | member_state | member_role |
+-------------+--------------+-------------+
| node3       | OFFLINE      |             |
+-------------+--------------+-------------+
Judging from performance_schema.replication_group_members, each node can see only itself.
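If you collect those rows from every node, telling "complete outage" apart from "one node fell out of a running group" is a simple check: no node reports any ONLINE member. A rough sketch in plain JavaScript follows; the reports object and the isCompleteOutage function are hypothetical names of mine, and the data is just the three result sets above typed in by hand.

```javascript
// Hand-copied from the query results above: each node reports only itself, OFFLINE.
const reports = {
  node1: [{ member_host: 'node1', member_state: 'OFFLINE', member_role: '' }],
  node2: [{ member_host: 'node2', member_state: 'OFFLINE', member_role: '' }],
  node3: [{ member_host: 'node3', member_state: 'OFFLINE', member_role: '' }],
};

// Hypothetical helper: true when no node sees a single ONLINE member,
// meaning there is no running group left for anyone to join.
function isCompleteOutage(reportsByNode) {
  const nodes = Object.keys(reportsByNode);
  return nodes.length > 0 && nodes.every((node) =>
    reportsByNode[node].every((row) => row.member_state !== 'ONLINE'));
}

console.log(isCompleteOutage(reports));
```

In the situation above it returns true, which is exactly the case the reboot command is meant for.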
 MySQL  localhost:33060+ ssl  JS > dba.getCluster()
Dba.getCluster: This function is not available through a session to a standalone instance (metadata exists, instance belongs to that metadata, but GR is not active) (RuntimeError)
MySQL Shell errors out like this as well, but running dba.rebootClusterFromCompleteOutage() on any one of the nodes fixes it.
 MySQL  localhost:33060+ ssl  JS > dba.rebootClusterFromCompleteOutage()
Reconfiguring the default cluster from complete outage...

The instance 'node2:3306' was part of the cluster configuration.
Would you like to rejoin it to the cluster? [y/N]: y

The instance 'node3:3306' was part of the cluster configuration.
Would you like to rejoin it to the cluster? [y/N]: y

Disabling super_read_only mode on instance 'node1:3306'.
The cluster was successfully rebooted.

<Cluster:myfabric>
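It even uses the metadata that is still there to ask "do you want to rejoin the other nodes to this cluster too?", so the recovery (if that is even the right word here) is painless.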
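If you would rather not answer the prompts by hand, the same thing can probably be scripted. The sketch below assumes the rejoinInstances option that the MySQL Shell of this era (8.0.x) accepted; as far as I know later releases rejoin members automatically and deprecated that option, so check the documentation for your version first. buildRebootOptions is a hypothetical helper of mine, and the host names, port and cluster name are taken from the output above.

```javascript
// Hypothetical helper: turn host names into the 'host:port' list for rejoinInstances.
function buildRebootOptions(hosts, port) {
  return { rejoinInstances: hosts.map((host) => `${host}:${port}`) };
}

const options = buildRebootOptions(['node2', 'node3'], 3306);

// The global `dba` object only exists inside MySQL Shell, so guard the call.
// Assumption: this Shell version still accepts the rejoinInstances option.
if (typeof dba !== 'undefined') {
  dba.rebootClusterFromCompleteOutage('myfabric', options);
}
```

Run it while connected to the node you want to bring up first, the same way as the interactive call above.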

Nothing actually broke (though the changes I fiddled with while the nodes were offline did end up conflicting...)
