MHA切换日志分析

PHPABC Mysql 1,242 次浏览 , , 没有评论

下面是MHA从启动,到failover的完整manager.log,以及集群中所有mysql的general.log.我们可以通过这些日志,更多的了解和分析MHA的运行原理.对用好MHA有很多帮助。
架构如下:
DB1 MASTER
DB2 SLAVE,candidate_master
DB3 SLAVE

DB1:

 [root@localhost mysql]# cat /var/lib/mysql/localhost.log   140928 10:17:04     5 Connect   mha@192.168.153.150 on                     5 Query     set autocommit=1                     5 Query     SELECT CONNECTION_ID() AS Value                     6 Connect   mha@192.168.153.150 on                     6 Query     set autocommit=1                     6 Query     SELECT CONNECTION_ID() AS Value                     6 Query     SET wait_timeout=86400                     6 Query     SELECT @@global.server_id As Value                     6 Query     SELECT VERSION() AS Value                     6 Query     SHOW GLOBAL VARIABLES LIKE 'log_bin'                     6 Query     SHOW MASTER STATUS                     6 Query     SELECT @@global.relay_log_info_repository AS Value                     6 Query     SELECT @@global.datadir AS Value                     6 Query     SELECT @@global.relay_log_info_file AS Value                     6 Query     SELECT @@global.datadir AS Value                     6 Query     SHOW SLAVE STATUS                     6 Query     SELECT @@global.read_only As Value                     6 Query     SELECT @@global.relay_log_purge As Value 140928 10:17:11     7 Connect   mha@192.168.153.150 on                     7 Query     select @@version_comment limit 1                     7 Query     select count(*) as c from mysql.user                     6 Quit                     7 Quit                     8 Connect   mha@192.168.153.150 on                     8 Query     set autocommit=1                     8 Query     SET wait_timeout=6                     8 Query     SELECT GET_LOCK('MHA_Master_High_Availability_Monitor', '12') AS Value                     8 Query     SELECT 1 As Value 140928 10:17:14     8 Query     SELECT 1 As Value 140928 10:17:17     8 Query     SELECT 1 As Value 140928 10:17:20     8 Query     SELECT 1 As Value 140928 10:17:23     8 Query     SELECT 1 As Value 140928 10:17:26     8 Query     SELECT 1 As Value 140928 10:17:29     8 Query     SELECT 1 As Value 140928 10:17:32     8 Query     SELECT 1 As Value 140928 10:17:35     8 Query     SELECT 1 As Value 140928 10:17:38     8 Query     SELECT 1 As Value 

DB2:

 [root@localhost mysql]# cat /var/lib/mysql/localhost.log   140928 10:17:04   214 Connect   mha@192.168.153.150 on                   214 Query     set autocommit=1                   214 Query     SELECT CONNECTION_ID() AS Value                   215 Connect   mha@192.168.153.150 on                   215 Query     set autocommit=1                   215 Query     SELECT CONNECTION_ID() AS Value                   215 Query     SET wait_timeout=86400                   215 Query     SELECT @@global.server_id As Value                   215 Query     SELECT VERSION() AS Value                   215 Query     SHOW GLOBAL VARIABLES LIKE 'log_bin'                   215 Query     SHOW MASTER STATUS                   215 Query     SELECT @@global.relay_log_info_repository AS Value                   215 Query     SELECT @@global.datadir AS Value                   215 Query     SELECT @@global.relay_log_info_file AS Value                   215 Query     SELECT @@global.datadir AS Value                   215 Query     SHOW SLAVE STATUS                   215 Query     SELECT @@global.read_only As Value                   215 Query     SELECT @@global.relay_log_purge As Value                   215 Query     SHOW SLAVE STATUS                   215 Query     SELECT Repl_slave_priv AS Value FROM mysql.user WHERE user = 'slave' 140928 10:17:09   216 Connect   mha@192.168.153.151 on                   216 Query     select @@version_comment limit 1                   216 Query     set sql_log_bin=0                   216 Query     create table if not exists mysql.apply_diff_relay_logs_test(id int)                   216 Query     insert into mysql.apply_diff_relay_logs_test values(1)                   216 Query     update mysql.apply_diff_relay_logs_test set id=id+1 where id=1                   216 Query     delete from mysql.apply_diff_relay_logs_test                   216 Query     drop table mysql.apply_diff_relay_logs_test 140928 10:17:10   216 Quit 140928 10:17:11   215 Quit 140928 10:17:50   217 Connect   mha@192.168.153.150 on                   217 Query     set autocommit=1                   217 Query     SELECT CONNECTION_ID() AS Value                   218 Connect   mha@192.168.153.150 on                   218 Query     set autocommit=1                   218 Query     SELECT CONNECTION_ID() AS Value                   218 Query     SET wait_timeout=86400                   218 Query     SELECT @@global.server_id As Value                   218 Query     SELECT VERSION() AS Value                   218 Query     SHOW GLOBAL VARIABLES LIKE 'log_bin'                   218 Query     SHOW MASTER STATUS                   218 Query     SELECT @@global.relay_log_info_repository AS Value                   218 Query     SELECT @@global.datadir AS Value                   218 Query     SELECT @@global.relay_log_info_file AS Value                   218 Query     SELECT @@global.datadir AS Value                   218 Query     SHOW SLAVE STATUS                   218 Query     SELECT @@global.read_only As Value                   218 Query     SELECT @@global.relay_log_purge As Value                   218 Query     SHOW SLAVE STATUS                   219 Connect   mha@192.168.153.150 on                   219 Query     set autocommit=1                   219 Query     SELECT CONNECTION_ID() AS Value                   220 Connect   mha@192.168.153.150 on                   220 Query     set autocommit=1                   220 Query     SELECT CONNECTION_ID() AS Value                   220 Query     SET wait_timeout=86400                   220 Query     SELECT @@global.server_id As Value                   220 Query     SELECT VERSION() AS Value                   220 Query     SHOW GLOBAL VARIABLES LIKE 'log_bin'                   220 Query     SHOW MASTER STATUS                   220 Query     SELECT @@global.relay_log_info_repository AS Value                   220 Query     SELECT @@global.datadir AS Value                   220 Query     SELECT @@global.relay_log_info_file AS Value                   220 Query     SELECT @@global.datadir AS Value                   220 Query     SHOW SLAVE STATUS                   220 Query     SELECT @@global.read_only As Value                   220 Query     SELECT @@global.relay_log_purge As Value                   220 Query     SHOW SLAVE STATUS                   220 Query     SELECT GET_LOCK('MHA_Master_High_Availability_Failover', '1') AS Value                   220 Query     SHOW SLAVE STATUS                   220 Query     STOP SLAVE IO_THREAD                   220 Query     SHOW SLAVE STATUS 140928 10:17:51   220 Query     SHOW SLAVE STATUS                   220 Query     SHOW SLAVE STATUS                   220 Query     STOP SLAVE SQL_THREAD                   220 Query     SHOW SLAVE STATUS                   220 Query     SHOW MASTER STATUS                   220 Query     SELECT @@global.read_only As Value                   221 Connect   slave@192.168.153.152 on                   221 Query     SELECT UNIX_TIMESTAMP()                   221 Query     SHOW VARIABLES LIKE 'SERVER_ID'                   221 Query     SET @master_heartbeat_period= 1799999979520                   221 Query     SET @master_binlog_checksum= @@global.binlog_checksum                   221 Query     SELECT @master_binlog_checksum                   221 Query     SELECT @@GLOBAL.GTID_MODE                   221 Query     SHOW VARIABLES LIKE 'SERVER_UUID'                   221 Query     SET @slave_uuid= 'dda2f003-4244-11e4-b851-000c29da163f'                   221 Binlog Dump       Log: 'mysql-bin.000016'  Pos: 688                   220 Query     STOP SLAVE                   220 Query     SHOW SLAVE STATUS                   220 Query     RESET SLAVE /*!50516 ALL */                   220 Query     SHOW SLAVE STATUS                   220 Query     SELECT RELEASE_LOCK('MHA_Master_High_Availability_Failover') As Value                   220 Quit 

DB3:

 [root@localhost ~]# cat /var/lib/mysql/localhost.log 140928 10:17:04   223 Connect   mha@192.168.153.150 on                   223 Query     set autocommit=1                   223 Query     SELECT CONNECTION_ID() AS Value                   224 Connect   mha@192.168.153.150 on                   224 Query     set autocommit=1                   224 Query     SELECT CONNECTION_ID() AS Value                   224 Query     SET wait_timeout=86400                   224 Query     SELECT @@global.server_id As Value                   224 Query     SELECT VERSION() AS Value                   224 Query     SHOW GLOBAL VARIABLES LIKE 'log_bin'                   224 Query     SHOW MASTER STATUS                   224 Query     SELECT @@global.relay_log_info_repository AS Value                   224 Query     SELECT @@global.datadir AS Value                   224 Query     SELECT @@global.relay_log_info_file AS Value                   224 Query     SELECT @@global.datadir AS Value                   224 Query     SHOW SLAVE STATUS                   224 Query     SELECT @@global.read_only As Value                   224 Query     SELECT @@global.relay_log_purge As Value                   224 Query     SHOW SLAVE STATUS 140928 10:17:10   225 Connect   mha@192.168.153.152 on                   225 Query     select @@version_comment limit 1                   225 Query     set sql_log_bin=0                   225 Query     create table if not exists mysql.apply_diff_relay_logs_test(id int)                   225 Query     insert into mysql.apply_diff_relay_logs_test values(1)                   225 Query     update mysql.apply_diff_relay_logs_test set id=id+1 where id=1                   225 Query     delete from mysql.apply_diff_relay_logs_test                   225 Query     drop table mysql.apply_diff_relay_logs_test 140928 10:17:11   225 Quit                   224 Quit #开始failover流程的监测 140928 10:17:50   226 Connect   mha@192.168.153.150 on                   226 Query     set autocommit=1                   226 Query     SELECT CONNECTION_ID() AS Value                   227 Connect   mha@192.168.153.150 on                   227 Query     set autocommit=1                   227 Query     SELECT CONNECTION_ID() AS Value                   227 Query     SET wait_timeout=86400                   227 Query     SELECT @@global.server_id As Value                   227 Query     SELECT VERSION() AS Value                   227 Query     SHOW GLOBAL VARIABLES LIKE 'log_bin'                   227 Query     SHOW MASTER STATUS                   227 Query     SELECT @@global.relay_log_info_repository AS Value                   227 Query     SELECT @@global.datadir AS Value                   227 Query     SELECT @@global.relay_log_info_file AS Value                   227 Query     SELECT @@global.datadir AS Value                   227 Query     SHOW SLAVE STATUS                   227 Query     SELECT @@global.read_only As Value                   227 Query     SELECT @@global.relay_log_purge As Value                   227 Query     SHOW SLAVE STATUS #正式开始failover的监测                   228 Connect   mha@192.168.153.150 on                   228 Query     set autocommit=1                   228 Query     SELECT CONNECTION_ID() AS Value                   229 Connect   mha@192.168.153.150 on                   229 Query     set autocommit=1                   229 Query     SELECT CONNECTION_ID() AS Value                   229 Query     SET wait_timeout=86400                   229 Query     SELECT @@global.server_id As Value                   229 Query     SELECT VERSION() AS Value                   229 Query     SHOW GLOBAL VARIABLES LIKE 'log_bin'                   229 Query     SHOW MASTER STATUS                   229 Query     SELECT @@global.relay_log_info_repository AS Value                   229 Query     SELECT @@global.datadir AS Value                   229 Query     SELECT @@global.relay_log_info_file AS Value                   229 Query     SELECT @@global.datadir AS Value                   229 Query     SHOW SLAVE STATUS                   229 Query     SELECT @@global.read_only As Value                   229 Query     SELECT @@global.relay_log_purge As Value                   229 Query     SHOW SLAVE STATUS                   229 Query     SELECT GET_LOCK('MHA_Master_High_Availability_Failover', '1') AS Value                   229 Query     SHOW SLAVE STATUS                   229 Query     STOP SLAVE IO_THREAD                   229 Query     SHOW SLAVE STATUS 140928 10:17:51   229 Query     SHOW SLAVE STATUS                   229 Query     SHOW SLAVE STATUS                   229 Query     SHOW SLAVE STATUS                   229 Query     SHOW SLAVE STATUS                   229 Query     STOP SLAVE SQL_THREAD                   229 Query     SHOW SLAVE STATUS                   229 Query     STOP SLAVE                   229 Query     SHOW SLAVE STATUS                   229 Query     RESET SLAVE                   229 Query     CHANGE MASTER TO MASTER_HOST = '192.168.153.151' MASTER_USER = 'slave' MASTER_PASSWORD =  MASTER_PORT = 3306 MASTER_LOG_FILE = 'mysql-bin.000016' MASTER_LOG_POS = 688                   229 Query     START SLAVE                   230 Connect Out       slave@192.168.153.151:3306                   229 Query     SHOW SLAVE STATUS                   229 Query     SELECT RELEASE_LOCK('MHA_Master_High_Availability_Failover') As Value                   229 Quit 140928 10:18:07   236 Connect   root@localhost on                   236 Query     select @@version_comment limit 1                   236 Query     select USER() 140928 10:18:10   236 Quit 

MHA Manage.log:

 [root@localhost ~]# cat /var/log/masterha/app1/manager.log #启动前的准备工作  #检查服务器状态,获取相关参数设置 Sun Sep 28 10:17:03 2014 - [info] MHA::MasterMonitor version 0.55. Sun Sep 28 10:17:04 2014 - [info] Dead Servers: Sun Sep 28 10:17:04 2014 - [info] Alive Servers: Sun Sep 28 10:17:04 2014 - [info]   192.168.153.150(192.168.153.150:3306) Sun Sep 28 10:17:04 2014 - [info]   192.168.153.151(192.168.153.151:3306) Sun Sep 28 10:17:04 2014 - [info]   192.168.153.152(192.168.153.152:3306) Sun Sep 28 10:17:04 2014 - [info] Alive Slaves: Sun Sep 28 10:17:04 2014 - [info]   192.168.153.151(192.168.153.151:3306)  Version=5.6.20-68.0-log (oldest major version between slaves) log-bin:enabled Sun Sep 28 10:17:04 2014 - [info]     Replicating from 192.168.153.150(192.168.153.150:3306) Sun Sep 28 10:17:04 2014 - [info]     Primary candidate for the new Master (candidate_master is set) Sun Sep 28 10:17:04 2014 - [info]   192.168.153.152(192.168.153.152:3306)  Version=5.6.20-68.0-log (oldest major version between slaves) log-bin:enabled Sun Sep 28 10:17:04 2014 - [info]     Replicating from 192.168.153.150(192.168.153.150:3306) Sun Sep 28 10:17:04 2014 - [info]     Not candidate for the new Master (no_master is set) Sun Sep 28 10:17:04 2014 - [info] Current Alive Master: 192.168.153.150(192.168.153.150:3306) Sun Sep 28 10:17:04 2014 - [info] Checking slave configurations.. Sun Sep 28 10:17:04 2014 - [info]  read_only=1 is not set on slave 192.168.153.151(192.168.153.151:3306). Sun Sep 28 10:17:04 2014 - [warning]  relay_log_purge=0 is not set on slave 192.168.153.151(192.168.153.151:3306). Sun Sep 28 10:17:04 2014 - [info]  read_only=1 is not set on slave 192.168.153.152(192.168.153.152:3306). Sun Sep 28 10:17:04 2014 - [warning]  relay_log_purge=0 is not set on slave 192.168.153.152(192.168.153.152:3306). Sun Sep 28 10:17:04 2014 - [info] Checking replication filtering settings.. Sun Sep 28 10:17:04 2014 - [info]  binlog_do_db= , binlog_ignore_db= Sun Sep 28 10:17:04 2014 - [info]  Replication filtering check ok. #测试ssh连接是否成功 Sun Sep 28 10:17:04 2014 - [info] Starting SSH connection tests.. Sun Sep 28 10:17:07 2014 - [info] All SSH connection tests passed successfully. #测试MHA node是否可用 Sun Sep 28 10:17:07 2014 - [info] Checking MHA Node version.. Sun Sep 28 10:17:08 2014 - [info]  Version check ok. Sun Sep 28 10:17:08 2014 - [info] Checking SSH publickey authentication settings on the current master.. Sun Sep 28 10:17:08 2014 - [info] HealthCheck: SSH to 192.168.153.150 is reachable. Sun Sep 28 10:17:09 2014 - [info] Master MHA Node version is 0.54. Sun Sep 28 10:17:09 2014 - [info] Checking recovery script configurations on the current master.. Sun Sep 28 10:17:09 2014 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.55 --start_file=mysql-bin.000026 #创建MHA日志目录 Sun Sep 28 10:17:09 2014 - [info]   Connecting to root@192.168.153.150(192.168.153.150)..   Creating /var/log/masterha/app1 if not exists..    ok.   Checking output directory is accessible or not..    ok.   Binlog found at /var/lib/mysql, up to mysql-bin.000026 Sun Sep 28 10:17:09 2014 - [info] Master setting check done. #master的检查到此完毕  #开始检查slave的差异日志应用权限 Sun Sep 28 10:17:09 2014 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers.. Sun Sep 28 10:17:09 2014 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=192.168.153.151 --slave_ip=192.168.153.151 --slave_port=3306 --workdir=/var/log/masterha/app1 --target_version=5.6.20-68.0-log --manager_version=0.55 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx Sun Sep 28 10:17:09 2014 - [info]   Connecting to root@192.168.153.151(192.168.153.151:22)..   Checking slave recovery environment settings..     Opening /var/lib/mysql/relay-log.info ... ok.     Relay log found at /var/lib/mysql, up to mysql-relay.000003     Temporary relay log file is /var/lib/mysql/mysql-relay.000003     Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.  done.     Testing mysqlbinlog output.. done.     Cleaning up test file(s).. done. Sun Sep 28 10:17:10 2014 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=192.168.153.152 --slave_ip=192.168.153.152 --slave_port=3306 --workdir=/var/log/masterha/app1 --target_version=5.6.20-68.0-log --manager_version=0.55 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx Sun Sep 28 10:17:10 2014 - [info]   Connecting to root@192.168.153.152(192.168.153.152:22)..   Checking slave recovery environment settings..     Opening /var/lib/mysql/relay-log.info ... ok.     Relay log found at /var/lib/mysql, up to mysql-relay.000003     Temporary relay log file is /var/lib/mysql/mysql-relay.000003     Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.  done.     Testing mysqlbinlog output.. done.     Cleaning up test file(s).. done. Sun Sep 28 10:17:11 2014 - [info] Slaves settings check done. Sun Sep 28 10:17:11 2014 - [info] 192.168.153.150 (current master)  +--192.168.153.151  +--192.168.153.152  #调试master_ip_failover_script Sun Sep 28 10:17:11 2014 - [warning] master_ip_failover_script is not defined. #调试shutdown_script Sun Sep 28 10:17:11 2014 - [info] Checking shutdown script status: Sun Sep 28 10:17:11 2014 - [info]   /opt/master_ip_failover.sh --command=status --ssh_user=root --host=192.168.153.150 --ip=192.168.153.150 Sun Sep 28 10:17:11 2014 - [info]  OK. Sun Sep 28 10:17:11 2014 - [info] Set master ping interval 3 seconds. #设置二次检查的主机 Sun Sep 28 10:17:11 2014 - [info] Set secondary check script: masterha_secondary_check -s 192.168.153.151 -s 192.168.153.152 Sun Sep 28 10:17:11 2014 - [info] Starting ping health check on 192.168.153.150(192.168.153.150:3306)..  #MHA启动完毕,进入监测状态 Sun Sep 28 10:17:11 2014 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..  #监测DB1服务器挂了 Sun Sep 28 10:17:41 2014 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)  #通过定义的二次监测,确认master是否挂了 Sun Sep 28 10:17:41 2014 - [info] Executing seconary network check script: masterha_secondary_check -s 192.168.153.151 -s 192.168.153.152  --user=root  --master_host=192.168.153.150  --master_ip=192.168.153.150  --master_port=3306 Sun Sep 28 10:17:41 2014 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.55 --binlog_prefix=mysql-bin Monitoring server 192.168.153.151 is reachable, Master is not reachable from 192.168.153.151. OK. Sun Sep 28 10:17:42 2014 - [info] HealthCheck: SSH to 192.168.153.150 is reachable. Monitoring server 192.168.153.152 is reachable, Master is not reachable from 192.168.153.152. OK.  #确认master挂了,开始进入failover流程 #再试尝试连接master和master的ssh Sun Sep 28 10:17:42 2014 - [info] Master is not reachable from all other monitoring servers. Failover should start. Sun Sep 28 10:17:44 2014 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111) Sun Sep 28 10:17:44 2014 - [warning] Connection failed 1 time(s).. Sun Sep 28 10:17:47 2014 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111) Sun Sep 28 10:17:47 2014 - [warning] Connection failed 2 time(s).. Sun Sep 28 10:17:50 2014 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111) Sun Sep 28 10:17:50 2014 - [warning] Connection failed 3 time(s).. Sun Sep 28 10:17:50 2014 - [warning] Master is not reachable from health checker! Sun Sep 28 10:17:50 2014 - [warning] Master 192.168.153.150(192.168.153.150:3306) is not reachable! Sun Sep 28 10:17:50 2014 - [warning] SSH is reachable. #通过配置文件,监测其他slave的状态 Sun Sep 28 10:17:50 2014 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/app1.cnf again, and trying to connect to all servers to check server status.. Sun Sep 28 10:17:50 2014 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Sun Sep 28 10:17:50 2014 - [info] Reading application default configurations from /etc/app1.cnf.. Sun Sep 28 10:17:50 2014 - [info] Reading server configurations from /etc/app1.cnf.. Sun Sep 28 10:17:50 2014 - [info] Dead Servers: Sun Sep 28 10:17:50 2014 - [info]   192.168.153.150(192.168.153.150:3306) Sun Sep 28 10:17:50 2014 - [info] Alive Servers: Sun Sep 28 10:17:50 2014 - [info]   192.168.153.151(192.168.153.151:3306) Sun Sep 28 10:17:50 2014 - [info]   192.168.153.152(192.168.153.152:3306) Sun Sep 28 10:17:50 2014 - [info] Alive Slaves: Sun Sep 28 10:17:50 2014 - [info]   192.168.153.151(192.168.153.151:3306)  Version=5.6.20-68.0-log (oldest major version between slaves) log-bin:enabled Sun Sep 28 10:17:50 2014 - [info]     Replicating from 192.168.153.150(192.168.153.150:3306) Sun Sep 28 10:17:50 2014 - [info]     Primary candidate for the new Master (candidate_master is set) Sun Sep 28 10:17:50 2014 - [info]   192.168.153.152(192.168.153.152:3306)  Version=5.6.20-68.0-log (oldest major version between slaves) log-bin:enabled Sun Sep 28 10:17:50 2014 - [info]     Replicating from 192.168.153.150(192.168.153.150:3306) Sun Sep 28 10:17:50 2014 - [info]     Not candidate for the new Master (no_master is set) #再次监测slave的配置是否有变化,是否符合failover条件 Sun Sep 28 10:17:50 2014 - [info] Checking slave configurations.. Sun Sep 28 10:17:50 2014 - [info]  read_only=1 is not set on slave 192.168.153.151(192.168.153.151:3306). Sun Sep 28 10:17:50 2014 - [warning]  relay_log_purge=0 is not set on slave 192.168.153.151(192.168.153.151:3306). Sun Sep 28 10:17:50 2014 - [info]  read_only=1 is not set on slave 192.168.153.152(192.168.153.152:3306). Sun Sep 28 10:17:50 2014 - [warning]  relay_log_purge=0 is not set on slave 192.168.153.152(192.168.153.152:3306). Sun Sep 28 10:17:50 2014 - [info] Checking replication filtering settings.. Sun Sep 28 10:17:50 2014 - [info]  Replication filtering check ok. Sun Sep 28 10:17:50 2014 - [info] Master is down! Sun Sep 28 10:17:50 2014 - [info] Terminating monitoring script. Sun Sep 28 10:17:50 2014 - [info] Got exit code 20 (Master dead). Sun Sep 28 10:17:50 2014 - [info] MHA::MasterFailover version 0.55. #正式开始failover Sun Sep 28 10:17:50 2014 - [info] Starting master failover. Sun Sep 28 10:17:50 2014 - [info]   #再次对slave配置做检查 Sun Sep 28 10:17:50 2014 - [info] * Phase 1: Configuration Check Phase.. Sun Sep 28 10:17:50 2014 - [info] Sun Sep 28 10:17:50 2014 - [info] Dead Servers: Sun Sep 28 10:17:50 2014 - [info]   192.168.153.150(192.168.153.150:3306) Sun Sep 28 10:17:50 2014 - [info] Checking master reachability via mysql(double check).. Sun Sep 28 10:17:50 2014 - [info]  ok. Sun Sep 28 10:17:50 2014 - [info] Alive Servers: Sun Sep 28 10:17:50 2014 - [info]   192.168.153.151(192.168.153.151:3306) Sun Sep 28 10:17:50 2014 - [info]   192.168.153.152(192.168.153.152:3306) Sun Sep 28 10:17:50 2014 - [info] Alive Slaves: Sun Sep 28 10:17:50 2014 - [info]   192.168.153.151(192.168.153.151:3306)  Version=5.6.20-68.0-log (oldest major version between slaves) log-bin:enabled Sun Sep 28 10:17:50 2014 - [info]     Replicating from 192.168.153.150(192.168.153.150:3306) Sun Sep 28 10:17:50 2014 - [info]     Primary candidate for the new Master (candidate_master is set) Sun Sep 28 10:17:50 2014 - [info]   192.168.153.152(192.168.153.152:3306)  Version=5.6.20-68.0-log (oldest major version between slaves) log-bin:enabled Sun Sep 28 10:17:50 2014 - [info]     Replicating from 192.168.153.150(192.168.153.150:3306) Sun Sep 28 10:17:50 2014 - [info]     Not candidate for the new Master (no_master is set) Sun Sep 28 10:17:50 2014 - [info] ** Phase 1: Configuration Check Phase completed. Sun Sep 28 10:17:50 2014 - [info]   #对原Master做master_ip_failover_script和shutdown_script的操作 Sun Sep 28 10:17:50 2014 - [info] * Phase 2: Dead Master Shutdown Phase.. Sun Sep 28 10:17:50 2014 - [info] Sun Sep 28 10:17:50 2014 - [info] Forcing shutdown so that applications never connect to the current master.. Sun Sep 28 10:17:50 2014 - [warning] master_ip_failover_script is not set. Skipping invalidating dead master ip address. Sun Sep 28 10:17:50 2014 - [info] Executing SHUTDOWN script: Sun Sep 28 10:17:50 2014 - [info]   /opt/master_ip_failover.sh --command=stopssh --ssh_user=root  --host=192.168.153.150  --ip=192.168.153.150  --port=3306 Sun Sep 28 10:17:51 2014 - [info]  Power off done.  Sun Sep 28 10:17:51 2014 - [info] * Phase 2: Dead Master Shutdown Phase completed. Sun Sep 28 10:17:51 2014 - [info]   #开始差异日志的恢复:获取slave最后得到的binlog位置 Sun Sep 28 10:17:51 2014 - [info] * Phase 3: Master Recovery Phase.. Sun Sep 28 10:17:51 2014 - [info] Sun Sep 28 10:17:51 2014 - [info] * Phase 3.1: Getting Latest Slaves Phase.. Sun Sep 28 10:17:51 2014 - [info] Sun Sep 28 10:17:51 2014 - [info] The latest binary log file/position on all slaves is mysql-bin.000026:120 Sun Sep 28 10:17:51 2014 - [info] Latest slaves (Slaves that received relay log files to the latest): Sun Sep 28 10:17:51 2014 - [info]   192.168.153.151(192.168.153.151:3306)  Version=5.6.20-68.0-log (oldest major version between slaves) log-bin:enabled Sun Sep 28 10:17:51 2014 - [info]     Replicating from 192.168.153.150(192.168.153.150:3306) Sun Sep 28 10:17:51 2014 - [info]     Primary candidate for the new Master (candidate_master is set) Sun Sep 28 10:17:51 2014 - [info]   192.168.153.152(192.168.153.152:3306)  Version=5.6.20-68.0-log (oldest major version between slaves) log-bin:enabled Sun Sep 28 10:17:51 2014 - [info]     Replicating from 192.168.153.150(192.168.153.150:3306) Sun Sep 28 10:17:51 2014 - [info]     Not candidate for the new Master (no_master is set) Sun Sep 28 10:17:51 2014 - [info] The oldest binary log file/position on all slaves is mysql-bin.000026:120 Sun Sep 28 10:17:51 2014 - [info] Oldest slaves: Sun Sep 28 10:17:51 2014 - [info]   192.168.153.151(192.168.153.151:3306)  Version=5.6.20-68.0-log (oldest major version between slaves) log-bin:enabled Sun Sep 28 10:17:51 2014 - [info]     Replicating from 192.168.153.150(192.168.153.150:3306) Sun Sep 28 10:17:51 2014 - [info]     Primary candidate for the new Master (candidate_master is set) Sun Sep 28 10:17:51 2014 - [info]   192.168.153.152(192.168.153.152:3306)  Version=5.6.20-68.0-log (oldest major version between slaves) log-bin:enabled Sun Sep 28 10:17:51 2014 - [info]     Replicating from 192.168.153.150(192.168.153.150:3306) Sun Sep 28 10:17:51 2014 - [info]     Not candidate for the new Master (no_master is set) Sun Sep 28 10:17:51 2014 - [info] #获取原master的binlog日志 Sun Sep 28 10:17:51 2014 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase.. Sun Sep 28 10:17:51 2014 - [info] Sun Sep 28 10:17:51 2014 - [warning] Dead Master is not SSH reachable. Could not save it's binlogs. Transactions that were not sent to the latest slave (Read_Master_Log_Pos to the tail of the dead master's binlog) were lost. Sun Sep 28 10:17:51 2014 - [info]   #确定新的master Sun Sep 28 10:17:51 2014 - [info] * Phase 3.3: Determining New Master Phase.. Sun Sep 28 10:17:51 2014 - [info] Sun Sep 28 10:17:51 2014 - [info] Finding the latest slave that has all relay logs for recovering other slaves.. Sun Sep 28 10:17:51 2014 - [info] All slaves received relay logs to the same position. No need to resync each other. Sun Sep 28 10:17:51 2014 - [info] Searching new master from slaves.. Sun Sep 28 10:17:51 2014 - [info]  Candidate masters from the configuration file: Sun Sep 28 10:17:51 2014 - [info]   192.168.153.151(192.168.153.151:3306)  Version=5.6.20-68.0-log (oldest major version between slaves) log-bin:enabled Sun Sep 28 10:17:51 2014 - [info]     Replicating from 192.168.153.150(192.168.153.150:3306) Sun Sep 28 10:17:51 2014 - [info]     Primary candidate for the new Master (candidate_master is set) Sun Sep 28 10:17:51 2014 - [info]  Non-candidate masters: Sun Sep 28 10:17:51 2014 - [info]   192.168.153.152(192.168.153.152:3306)  Version=5.6.20-68.0-log (oldest major version between slaves) log-bin:enabled Sun Sep 28 10:17:51 2014 - [info]     Replicating from 192.168.153.150(192.168.153.150:3306) Sun Sep 28 10:17:51 2014 - [info]     Not candidate for the new Master (no_master is set) Sun Sep 28 10:17:51 2014 - [info]  Searching from candidate_master slaves which have received the latest relay log events.. Sun Sep 28 10:17:51 2014 - [info] New master is 192.168.153.151(192.168.153.151:3306) Sun Sep 28 10:17:51 2014 - [info] Starting master failover.. Sun Sep 28 10:17:51 2014 - [info] From: 192.168.153.150 (current master)  +--192.168.153.151  +--192.168.153.152  To: 192.168.153.151 (new master)  +--192.168.153.152 Sun Sep 28 10:17:51 2014 - [info] #在new master上应用差异的binlog日志 Sun Sep 28 10:17:51 2014 - [info] * Phase 3.3: New Master Diff Log Generation Phase.. Sun Sep 28 10:17:51 2014 - [info] Sun Sep 28 10:17:51 2014 - [info]  This server has all relay logs. No need to generate diff files from the latest slave. Sun Sep 28 10:17:51 2014 - [info] Sun Sep 28 10:17:51 2014 - [info] * Phase 3.4: Master Log Apply Phase.. Sun Sep 28 10:17:51 2014 - [info] Sun Sep 28 10:17:51 2014 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed. Sun Sep 28 10:17:51 2014 - [info] Starting recovery on 192.168.153.151(192.168.153.151:3306).. Sun Sep 28 10:17:51 2014 - [info]  This server has all relay logs. Waiting all logs to be applied.. Sun Sep 28 10:17:51 2014 - [info]   done. Sun Sep 28 10:17:51 2014 - [info]  All relay logs were successfully applied. #获取new master的binlog位置。 Sun Sep 28 10:17:51 2014 - [info] Getting new master's binlog name and position.. Sun Sep 28 10:17:51 2014 - [info]  mysql-bin.000016:688 Sun Sep 28 10:17:51 2014 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.153.151', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000016', MASTER_LOG_POS=688, MASTER_USER='slave', MASTER_PASSWORD='xxx'; #如果有master_ip_failover_script,那么给new master设置VIP Sun Sep 28 10:17:51 2014 - [warning] master_ip_failover_script is not set. Skipping taking over new master ip address. Sun Sep 28 10:17:51 2014 - [info] ** Finished master recovery successfully. Sun Sep 28 10:17:51 2014 - [info] * Phase 3: Master Recovery Phase completed. Sun Sep 28 10:17:51 2014 - [info]   #开始恢复其他slave,也是从原master的binlog对比来做恢复 Sun Sep 28 10:17:51 2014 - [info] * Phase 4: Slaves Recovery Phase.. Sun Sep 28 10:17:51 2014 - [info] Sun Sep 28 10:17:51 2014 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase.. Sun Sep 28 10:17:51 2014 - [info] Sun Sep 28 10:17:51 2014 - [info] -- Slave diff file generation on host 192.168.153.152(192.168.153.152:3306) started, pid: 8387. Check tmp log /var/log/masterha/app1/192.168.153.152_3306_20140928101750.log if it takes time.. Sun Sep 28 10:17:51 2014 - [info] Sun Sep 28 10:17:51 2014 - [info] Log messages from 192.168.153.152 ... Sun Sep 28 10:17:51 2014 - [info] Sun Sep 28 10:17:51 2014 - [info]  This server has all relay logs. No need to generate diff files from the latest slave. Sun Sep 28 10:17:51 2014 - [info] End of log messages from 192.168.153.152. Sun Sep 28 10:17:51 2014 - [info] -- 192.168.153.152(192.168.153.152:3306) has the latest relay log events. Sun Sep 28 10:17:51 2014 - [info] Generating relay diff files from the latest slave succeeded. Sun Sep 28 10:17:51 2014 - [info]   Sun Sep 28 10:17:51 2014 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase.. Sun Sep 28 10:17:51 2014 - [info] Sun Sep 28 10:17:51 2014 - [info] -- Slave recovery on host 192.168.153.152(192.168.153.152:3306) started, pid: 8389. Check tmp log /var/log/masterha/app1/192.168.153.152_3306_20140928101750.log if it takes time.. Sun Sep 28 10:17:51 2014 - [info] Sun Sep 28 10:17:51 2014 - [info] Log messages from 192.168.153.152 ... Sun Sep 28 10:17:51 2014 - [info] Sun Sep 28 10:17:51 2014 - [info] Starting recovery on 192.168.153.152(192.168.153.152:3306).. Sun Sep 28 10:17:51 2014 - [info]  This server has all relay logs. Waiting all logs to be applied.. Sun Sep 28 10:17:51 2014 - [info]   done. Sun Sep 28 10:17:51 2014 - [info]  All relay logs were successfully applied. #差异日志应用完成以后,切换所有slave到new master。 Sun Sep 28 10:17:51 2014 - [info]  Resetting slave 192.168.153.152(192.168.153.152:3306) and starting replication from the new master 192.168.153.151(192.168.153.151:3306).. Sun Sep 28 10:17:51 2014 - [info]  Executed CHANGE MASTER. Sun Sep 28 10:17:51 2014 - [info]  Slave started. Sun Sep 28 10:17:51 2014 - [info] End of log messages from 192.168.153.152. Sun Sep 28 10:17:51 2014 - [info] -- Slave recovery on host 192.168.153.152(192.168.153.152:3306) succeeded. Sun Sep 28 10:17:51 2014 - [info] All new slave servers recovered successfully. Sun Sep 28 10:17:51 2014 - [info] Sun Sep 28 10:17:51 2014 - [info] * Phase 5: New master cleanup phase.. Sun Sep 28 10:17:51 2014 - [info] Sun Sep 28 10:17:51 2014 - [info] Resetting slave info on the new master.. Sun Sep 28 10:17:51 2014 - [info]  192.168.153.151: Resetting slave info succeeded. Sun Sep 28 10:17:51 2014 - [info] Master failover to 192.168.153.151(192.168.153.151:3306) completed successfully. Sun Sep 28 10:17:51 2014 - [info]   #failover操作完成,生成failover报告 ----- Failover Report -----  app1: MySQL Master failover 192.168.153.150 to 192.168.153.151 succeeded  Master 192.168.153.150 is down!  Check MHA Manager logs at localhost.localdomain:/var/log/masterha/app1/manager.log for details.  Started automated(non-interactive) failover. Power off 192.168.153.150. The latest slave 192.168.153.151(192.168.153.151:3306) has all relay logs for recovery. Selected 192.168.153.151 as a new master. 192.168.153.151: OK: Applying all logs succeeded. 192.168.153.152: This host has the latest relay log events. Generating relay diff files from the latest slave succeeded. 192.168.153.152: OK: Applying all logs succeeded. Slave started, replicating from 192.168.153.151. 192.168.153.151: Resetting slave info succeeded. Master failover to 192.168.153.151(192.168.153.151:3306) completed successfully. 

发表评论

电子邮件地址不会被公开。 必填项已用*标注

此站点使用Akismet来减少垃圾评论。了解我们如何处理您的评论数据

Go