GA

2019/11/26

本番データ de CLONEプラグイン、そして思ったこと

TL;DR

  • みんなだいすきCLONEプラグインを使って本番のマスター(稼働中)から本番のスレーブ(諸事情により再構築が必要だった)を作った。
    • 何度かリトライはしたけど、ちゃんと動いているような気がする
  • CLONE INSTANCE FROM .. を実行するターミナルは tmux などで切断から保護しましょう…

CLONEプラグインって何的な人は先にこっちを読んでいただくと良いかも知れない。
取り敢えず
何も考えずにクローン用のアカウントをマスターに作成する。
mysql> CREATE USER cloner@xxx.xxx.xxx.xxx IDENTIFIED BY 'xxxx';
Query OK, 0 rows affected (0.00 sec)

mysql> GRANT BACKUP_ADMIN ON *.* TO cloner@xxx.xxx.xxx.xxx;
Query OK, 0 rows affected (0.00 sec)
スレーブ用地から CLONE INSTANCE FROM .. でクローンを開始しようとする。
mysql> SET GLOBAL SEESSION clone_valid_donor_list = 'yyy.yyy.yyy.yyy:3306';
Query OK, 0 rows affected (0.00 sec)

mysql> CLONE INSTANCE FROM cloner@yyy.yyy.yyy.yyy:3306 IDENTIFIED BY 'xxxx';
ERROR 3862 (HY000): Clone Donor Error: 1016 : Can't open file: './undo001' (errno: 2 - No such file or directory).
(つд⊂)ゴシゴシ
ERROR 3862は「ドナーからこんなエラーを受け取った」ためのエラーコードなので、ERROR 1016 Can’t open file .. がドナーから受け取った生のエラー。
クローンに関する進捗とかその他は全てドナーノード側のエラーログに出力される。 Note レベルなのでドナー側が log_error_verbosity= 3 でないと出力されないと思う。
こっちがドナー側のエラーログ。
2019-11-26T10:39:22.974397+09:00 34186 [Note] [MY-013273] [Clone] Plugin Clone reported: 'Server: Acquired backup lock.'
2019-11-26T10:39:22.974459+09:00 34186 [Note] [MY-013457] [InnoDB] Clone Begin Master Task by cloner@172.21.160.132
2019-11-26T10:39:22.979792+09:00 34186 [Note] [MY-013273] [Clone] Plugin Clone reported: 'Server: COM_INIT: Storage Initialize.'
2019-11-26T10:39:22.979825+09:00 34186 [Note] [MY-013273] [Clone] Plugin Clone reported: 'Server: COM_RES_COMPLETE.'
2019-11-26T10:39:51.981458+09:00 34187 [Note] [MY-013458] [InnoDB] Clone set state change ACK: 1
2019-11-26T10:39:51.981516+09:00 34187 [Note] [MY-013273] [Clone] Plugin Clone reported: 'Server: COM_ACK: Storage Ack.'
2019-11-26T10:39:51.981544+09:00 34187 [Note] [MY-013273] [Clone] Plugin Clone reported: 'Server: COM_RES_COMPLETE.'
2019-11-26T10:39:52.081398+09:00 34186 [Note] [MY-013458] [InnoDB] Clone Master received state change ACK
2019-11-26T10:39:52.081463+09:00 34186 [Note] [MY-013458] [InnoDB] Clone State Change : Number of tasks = 1
2019-11-26T10:39:52.081483+09:00 34186 [Note] [MY-013458] [InnoDB] Clone State BEGIN FILE COPY
2019-11-26T10:39:52.087133+09:00 34186 [Note] [MY-011845] [InnoDB] Clone Start PAGE ARCH : start LSN : 1927678359668, checkpoint LSN : 1927629671670
2019-11-26T10:39:52.094897+09:00 34186 [Note] [MY-013458] [InnoDB] Clone Set Error code: 1815 Saved Error code: 0
2019-11-26T10:39:52.094936+09:00 34186 [Note] [MY-013273] [Clone] Plugin Clone reported: 'Server: COM_EXECUTE: Storage Execute: error: 1815: Can't open file: './undo001' (errno: 2 - No such file or directory).'
2019-11-26T10:39:52.094954+09:00 34186 [Note] [MY-013273] [Clone] Plugin Clone reported: 'Server: Before sending COM_RES_ERROR:  : error: 1815: Can't open file: './undo001' (errno: 2 - No such file or directory).'
2019-11-26T10:39:52.095022+09:00 34186 [Note] [MY-013273] [Clone] Plugin Clone reported: 'Server: After sending COM_RES_ERROR.'
2019-11-26T10:39:52.095295+09:00 34187 [Note] [MY-013273] [Clone] Plugin Clone reported: 'Server: COM_EXIT: Storage End.'
2019-11-26T10:39:52.095347+09:00 34187 [Note] [MY-013273] [Clone] Plugin Clone reported: 'Server: COM_RES_COMPLETE.'
2019-11-26T10:39:52.095375+09:00 34187 [Note] [MY-013273] [Clone] Plugin Clone reported: 'Server: Exiting clone protocol.'
2019-11-26T10:39:52.096028+09:00 34186 [Note] [MY-011846] [InnoDB] Clone Stop  PAGE ARCH : end   LSN : 1927629671670, log sys LSN : 1927678363098
2019-11-26T10:39:52.096207+09:00 34186 [Note] [MY-013457] [InnoDB] Clone End Master Task ID: 0 Passed, code: 0:
2019-11-26T10:39:52.096234+09:00 34186 [Note] [MY-013273] [Clone] Plugin Clone reported: 'Server: COM_EXIT: Storage End.'
2019-11-26T10:39:52.096268+09:00 34186 [Note] [MY-013273] [Clone] Plugin Clone reported: 'Server: COM_RES_COMPLETE.'
2019-11-26T10:39:52.096302+09:00 34186 [Note] [MY-013273] [Clone] Plugin Clone reported: 'Server: Exiting clone protocol.'
こっちがレシピエント側のエラーログ。
2019-11-26T10:39:22.963763+09:00 139 [Note] [MY-013272] [Clone] Plugin Clone reported: 'Client: Task Connect.'
2019-11-26T10:39:22.974258+09:00 139 [Note] [MY-013272] [Clone] Plugin Clone reported: 'Client: Master ACK Connect.'
2019-11-26T10:39:22.974323+09:00 139 [Note] [MY-013457] [InnoDB] Clone Apply Begin Master Version Check
2019-11-26T10:39:22.975340+09:00 139 [Note] [MY-013457] [InnoDB] Clone Apply Version End Master Task ID: 0 Passed, code: 0:
2019-11-26T10:39:22.975382+09:00 139 [Note] [MY-013457] [InnoDB] Clone Apply Begin Master Task
2019-11-26T10:39:22.976280+09:00 139 [Warning] [MY-013460] [InnoDB] Clone removing all user data for provisioning: Started
2019-11-26T10:39:22.976316+09:00 139 [Note] [MY-011977] [InnoDB] Clone Drop all user data
2019-11-26T10:39:24.122046+09:00 139 [Note] [MY-011977] [InnoDB] Clone: Fix Object count: 2314 task: 0
2019-11-26T10:39:51.598531+09:00 139 [Note] [MY-011977] [InnoDB] Clone Drop User schemas
2019-11-26T10:39:51.598901+09:00 139 [Note] [MY-011977] [InnoDB] Clone: Fix Object count: 14 task: 0
2019-11-26T10:39:51.623511+09:00 139 [Note] [MY-011977] [InnoDB] Clone Drop User tablespaces
2019-11-26T10:39:51.624095+09:00 139 [Note] [MY-011977] [InnoDB] Clone: Fix Object count: 6 task: 0
2019-11-26T10:39:51.625091+09:00 132 [Note] [MY-010596] [Repl] Error reading relay log event for channel '': slave SQL thread was killed
2019-11-26T10:39:51.625136+09:00 132 [Note] [MY-010587] [Repl] Slave SQL thread for channel '' exiting, replication stopped in log 'FIRST' at position 0
2019-11-26T10:39:51.742987+09:00 139 [Note] [MY-011977] [InnoDB] Clone Drop: finished successfully
2019-11-26T10:39:51.743073+09:00 139 [Warning] [MY-013460] [InnoDB] Clone removing all user data for provisioning: Finished
2019-11-26T10:39:51.981055+09:00 139 [Note] [MY-013272] [Clone] Plugin Clone reported: 'Client: Command COM_INIT.'
2019-11-26T10:39:52.095229+09:00 139 [Note] [MY-013458] [InnoDB] Clone Set Error code: 3862 Saved Error code: 0
2019-11-26T10:39:52.095271+09:00 139 [Note] [MY-013458] [InnoDB] Clone Apply set error code: 3862: Clone Donor Error: 1016 : Can't open file: './undo001' (errno: 2 - No such file or directory).
2019-11-26T10:39:52.095290+09:00 139 [Note] [MY-013272] [Clone] Plugin Clone reported: 'Client: Wait for remote after local issue: error: 3862: Clone Donor Error: 1016 : Can't open file: './undo001' (errno: 2 - No such file or directory)..'
2019-11-26T10:39:52.095305+09:00 139 [Note] [MY-013272] [Clone] Plugin Clone reported: 'Client: Command COM_EXECUTE: error: 3862: Clone Donor Error: 1016 : Can't open file: './undo001' (errno: 2 - No such file or directory)..'
2019-11-26T10:39:52.095552+09:00 139 [Note] [MY-013272] [Clone] Plugin Clone reported: 'Client: Master ACK COM_EXIT.'
2019-11-26T10:39:52.095941+09:00 139 [Note] [MY-013272] [Clone] Plugin Clone reported: 'Client: Master ACK Disconnect : abort: false.'
2019-11-26T10:39:52.096479+09:00 139 [Note] [MY-013272] [Clone] Plugin Clone reported: 'Client: Task COM_EXIT.'
2019-11-26T10:39:52.096798+09:00 139 [Note] [MY-013272] [Clone] Plugin Clone reported: 'Client: Task Disconnect : abort: false.'
2019-11-26T10:39:52.096858+09:00 139 [Note] [MY-013458] [InnoDB] Clone Set Error code: 3862 Saved Error code: 3862
2019-11-26T10:39:52.096890+09:00 139 [Note] [MY-013458] [InnoDB] Clone Apply set error code: 3862: Clone Donor Error: 1016 : Can't open file: './undo001' (errno: 2 - No such file or directory).
2019-11-26T10:39:52.096944+09:00 139 [Note] [MY-013458] [InnoDB] Clone Set Error code: 3862 Saved Error code: 3862
2019-11-26T10:39:52.097189+09:00 139 [Note] [MY-013457] [InnoDB] Clone Apply End Master Task ID: 0 Failed, code: 3862: Clone Donor Error: 1016 : Can't open file: './undo001' (errno: 2 - No such file or directory).
目をこらして良く見ていただくとわかるのだが、 [Warning] [MY-013460] [InnoDB] Clone removing all user data for provisioning: Finished なので、クローンが始まった時点でレシピエントのdatadirからは諸々が消える。

ちなみにドナー側の undo001 、ちゃんとあるんですけどねってすったもんだした後、レシピエントを再起動したりドナーの innodb_undo_log_truncate = ON にしてみたり色々したけれど、結局ドナー(= 稼働中のマスター)を再起動するまでクローンは同じエラーで転け続けた。
  • ひょっとしたら、ドナーを5.7から8.0にバージョンアップした時に、「自動で更新かかるから良いっしょー」ってアップグレード後に再起動してなかったのがいけないんじゃないかなと思っている( mysql_upgrade の時は自分で再起動してたけど、すっかり忘れていたんじゃないかなと)
mysql> CLONE INSTANCE FROM cloner@yyy.yyy.yyy.yyy:3306 IDENTIFIED BY 'xxxx';
Query OK, 0 rows affected (37 min 23.83 sec)
再起動後はすんなり通った。
レシピエントの方のエラーログには
2019-11-26T11:08:19.464385+09:00 7 [Note] [MY-011978] [InnoDB] Clone estimated size: 188.15 GiB Available space: 359.75 GiB
と書いてあったので、「何も設定しない(= ほぼ無制限にCLONEがリソースを欲しがる)状態」で85MB/sくらい。
m4.xlarge + 900IOPSなgp2のEBSなので「マスターでトラフィックを受けながらにしてはまあまあ?」みたいな感じはする。
ただしまあ dstat を見てたら結構iowait出てるしストレージもかなり読むし出ていくパケットも多いし…で、2台目やる時はちょっと制限しようかなとか思った(↓ はドナー側)
$ dstat -amrl
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- ------memory-usage----- --io/total- ---load-avg---
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw | used  buff  cach  free| read  writ| 1m   5m  15m
 53   1  12  33   0   0|  90M  318k| 337k   90M|   0     0 |  20k 5285 |12.0G    0  1602M 1952M| 751   105 |5.57 6.21 5.83
 62   1   7  30   0   0|  90M  378k| 220k   90M|   0     0 |  16k 5335 |12.0G    0  1602M 1953M| 752   118 |5.57 6.21 5.83
 53   1  17  29   0   1|  89M  321k| 291k   89M|   0     0 |  18k 5428 |12.0G    0  1602M 1950M| 751   103 |5.57 6.21 5.83

2019-11-26T11:08:46.108105+09:00 112 [Note] [MY-013458] [InnoDB] Stage progress: 1% completed.
2019-11-26T11:10:31.546724+09:00 112 [Note] [MY-013458] [InnoDB] Stage progress: 11% completed.
制限するにはこの辺のパラメーターをいじるんだけれど、
複数個所いっぺんに変えるとわからなくなる(というか、なったので一度中断してやり直している)ので今回は clone_enable_compression = ONだけ変更して実行(↓もドナー側)
mysql> SET GLOBAL SEESSION clone_valid_donor_list = 'yyy.yyy.yyy.yyy:3306';
Query OK, 0 rows affected (0.00 sec)

mysql> SET GLOBAL clone_enable_compression = ON;
Query OK, 0 rows affected (0.00 sec)

mysql> CLONE INSTANCE FROM cloner@yyy.yyy.yyy.yyy:3306 IDENTIFIED BY 'xxxx';

----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- ------memory-usage----- --io/total-
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw | used  buff  cach  free| read  writ
 50   0  46   4   0   0|  14M 2675k|  47k 4043k|   0     0 |7780  6601 |12.0G    0  2152M 1425M| 143   277
 49   1  48   3   0   0|  16M  692k|  29k 4434k|   0     0 |5988  4010 |12.0G    0  2152M 1425M| 171  92.0
 50   1  47   3   0   0|  12M  843k|  35k 3330k|   0     0 |7091  5590 |12.0G    0  2152M 1425M| 127   141

2019-11-26T12:17:54.871875+09:00 474 [Note] [MY-013458] [InnoDB] Stage progress: 1% completed.
2019-11-26T12:27:37.153130+09:00 474 [Note] [MY-013458] [InnoDB] Stage progress: 11% completed.
( ゚д゚)
いやー、IOがぐっと下がったのは良いけど、遅い遅い。
読み出し速度が14MB/secしか出てなくて単純計算6倍近くかかるってことは3時間かー…ごはんでも食べてこようそうしよう。
_人人人人人_
> 時は流れた <
 ̄Y^Y^Y^Y^Y^ ̄
さーてクローンどうなったかな。
mysql> CLONE INSTANCE FROM cloner@yyy.yyy.yyy.yyy:3306 IDENTIFIED BY 'xxxx';
Timeout, server 192.168.211.58 not responding.
Σ(゚д゚lll) VPN切れてる!?
2019-11-26T15:31:25.203117+09:00 474 [Note] [MY-013458] [InnoDB] Stage progress: 96% completed.
2019-11-26T15:33:30.443342+09:00 475 [Note] [MY-013458] [InnoDB] Clone set error ACK: 1317
2019-11-26T15:33:30.443418+09:00 475 [Note] [MY-013458] [InnoDB] Clone Set Error code: 1317 Saved Error code: 0
2019-11-26T15:33:30.443440+09:00 475 [Note] [MY-013273] [Clone] Plugin Clone reported: 'Server: COM_ACK: Storage Ack.'
2019-11-26T15:33:30.443488+09:00 475 [Note] [MY-013273] [Clone] Plugin Clone reported: 'Server: COM_RES_COMPLETE.'
2019-11-26T15:33:30.443902+09:00 475 [Note] [MY-013273] [Clone] Plugin Clone reported: 'Server: COM_EXIT: Storage End.'
2019-11-26T15:33:30.443943+09:00 475 [Note] [MY-013273] [Clone] Plugin Clone reported: 'Server: COM_RES_COMPLETE.'
2019-11-26T15:33:30.443959+09:00 475 [Note] [MY-013273] [Clone] Plugin Clone reported: 'Server: Exiting clone protocol.'
2019-11-26T15:33:30.464396+09:00 474 [Note] [MY-013458] [InnoDB] Clone Set Error code: 1160 Saved Error code: 1317
2019-11-26T15:33:30.464443+09:00 474 [Note] [MY-013458] [InnoDB] Clone Set Error code: 1160 Saved Error code: 1317
2019-11-26T15:33:30.464465+09:00 474 [Note] [MY-013273] [Clone] Plugin Clone reported: 'Server: COM_EXECUTE: Storage Execute: error: 1160: Got an error writing communication packets.'
2019-11-26T15:33:30.464491+09:00 474 [Note] [MY-013273] [Clone] Plugin Clone reported: 'Server: Before sending COM_RES_ERROR: network : error: 1160: Got an error writing communication packets.'
2019-11-26T15:33:30.464566+09:00 474 [Note] [MY-013273] [Clone] Plugin Clone reported: 'Server: After sending COM_RES_ERROR: error: 1160: Got an error writing communication packets.'
2019-11-26T15:33:30.464646+09:00 474 [Note] [MY-013458] [InnoDB] Clone Set Error code: 1317 Saved Error code: 1317
2019-11-26T15:33:30.464687+09:00 474 [Note] [MY-011846] [InnoDB] Clone Stop  PAGE ARCH : end   LSN : 1928777840909, log sys LSN : 1928826565652
2019-11-26T15:33:30.464922+09:00 474 [Note] [MY-013457] [InnoDB] Clone End Master Task ID: 0 Failed, code: 1317: Got an error writing communication packets
2019-11-26T15:33:30.464955+09:00 474 [Note] [MY-013273] [Clone] Plugin Clone reported: 'Server: Exiting clone protocol: error: 1160: Got an error writing communication packets.'
Σ(゚д゚lll)
というわけで今、圧縮しない設定でCLONE取り直してます。ちゃんとtmux使って。
機会があれば他のパラメーターも試す。

0 件のコメント :

コメントを投稿