下面是MHA从启动,到failover的完整manager.log,以及集群中所有mysql的general.log.我们可以通过这些日志,更多的了解和分析MHA的运行原理.对用好MHA有很多帮助。
架构如下:
DB1 MASTER
DB2 SLAVE,candidate_master
DB3 SLAVE
DB1:
[root@localhost mysql]# cat /var/lib/mysql/localhost.log 140928 10:17:04 5 Connect mha@192.168.153.150 on 5 Query set autocommit=1 5 Query SELECT CONNECTION_ID() AS Value 6 Connect mha@192.168.153.150 on 6 Query set autocommit=1 6 Query SELECT CONNECTION_ID() AS Value 6 Query SET wait_timeout=86400 6 Query SELECT @@global.server_id As Value 6 Query SELECT VERSION() AS Value 6 Query SHOW GLOBAL VARIABLES LIKE 'log_bin' 6 Query SHOW MASTER STATUS 6 Query SELECT @@global.relay_log_info_repository AS Value 6 Query SELECT @@global.datadir AS Value 6 Query SELECT @@global.relay_log_info_file AS Value 6 Query SELECT @@global.datadir AS Value 6 Query SHOW SLAVE STATUS 6 Query SELECT @@global.read_only As Value 6 Query SELECT @@global.relay_log_purge As Value 140928 10:17:11 7 Connect mha@192.168.153.150 on 7 Query select @@version_comment limit 1 7 Query select count(*) as c from mysql.user 6 Quit 7 Quit 8 Connect mha@192.168.153.150 on 8 Query set autocommit=1 8 Query SET wait_timeout=6 8 Query SELECT GET_LOCK('MHA_Master_High_Availability_Monitor', '12') AS Value 8 Query SELECT 1 As Value 140928 10:17:14 8 Query SELECT 1 As Value 140928 10:17:17 8 Query SELECT 1 As Value 140928 10:17:20 8 Query SELECT 1 As Value 140928 10:17:23 8 Query SELECT 1 As Value 140928 10:17:26 8 Query SELECT 1 As Value 140928 10:17:29 8 Query SELECT 1 As Value 140928 10:17:32 8 Query SELECT 1 As Value 140928 10:17:35 8 Query SELECT 1 As Value 140928 10:17:38 8 Query SELECT 1 As Value
DB2:
[root@localhost mysql]# cat /var/lib/mysql/localhost.log 140928 10:17:04 214 Connect mha@192.168.153.150 on 214 Query set autocommit=1 214 Query SELECT CONNECTION_ID() AS Value 215 Connect mha@192.168.153.150 on 215 Query set autocommit=1 215 Query SELECT CONNECTION_ID() AS Value 215 Query SET wait_timeout=86400 215 Query SELECT @@global.server_id As Value 215 Query SELECT VERSION() AS Value 215 Query SHOW GLOBAL VARIABLES LIKE 'log_bin' 215 Query SHOW MASTER STATUS 215 Query SELECT @@global.relay_log_info_repository AS Value 215 Query SELECT @@global.datadir AS Value 215 Query SELECT @@global.relay_log_info_file AS Value 215 Query SELECT @@global.datadir AS Value 215 Query SHOW SLAVE STATUS 215 Query SELECT @@global.read_only As Value 215 Query SELECT @@global.relay_log_purge As Value 215 Query SHOW SLAVE STATUS 215 Query SELECT Repl_slave_priv AS Value FROM mysql.user WHERE user = 'slave' 140928 10:17:09 216 Connect mha@192.168.153.151 on 216 Query select @@version_comment limit 1 216 Query set sql_log_bin=0 216 Query create table if not exists mysql.apply_diff_relay_logs_test(id int) 216 Query insert into mysql.apply_diff_relay_logs_test values(1) 216 Query update mysql.apply_diff_relay_logs_test set id=id+1 where id=1 216 Query delete from mysql.apply_diff_relay_logs_test 216 Query drop table mysql.apply_diff_relay_logs_test 140928 10:17:10 216 Quit 140928 10:17:11 215 Quit 140928 10:17:50 217 Connect mha@192.168.153.150 on 217 Query set autocommit=1 217 Query SELECT CONNECTION_ID() AS Value 218 Connect mha@192.168.153.150 on 218 Query set autocommit=1 218 Query SELECT CONNECTION_ID() AS Value 218 Query SET wait_timeout=86400 218 Query SELECT @@global.server_id As Value 218 Query SELECT VERSION() AS Value 218 Query SHOW GLOBAL VARIABLES LIKE 'log_bin' 218 Query SHOW MASTER STATUS 218 Query SELECT @@global.relay_log_info_repository AS Value 218 Query SELECT @@global.datadir AS Value 218 Query SELECT @@global.relay_log_info_file AS Value 218 Query SELECT @@global.datadir AS Value 218 Query SHOW SLAVE STATUS 218 Query SELECT @@global.read_only As Value 218 Query SELECT @@global.relay_log_purge As Value 218 Query SHOW SLAVE STATUS 219 Connect mha@192.168.153.150 on 219 Query set autocommit=1 219 Query SELECT CONNECTION_ID() AS Value 220 Connect mha@192.168.153.150 on 220 Query set autocommit=1 220 Query SELECT CONNECTION_ID() AS Value 220 Query SET wait_timeout=86400 220 Query SELECT @@global.server_id As Value 220 Query SELECT VERSION() AS Value 220 Query SHOW GLOBAL VARIABLES LIKE 'log_bin' 220 Query SHOW MASTER STATUS 220 Query SELECT @@global.relay_log_info_repository AS Value 220 Query SELECT @@global.datadir AS Value 220 Query SELECT @@global.relay_log_info_file AS Value 220 Query SELECT @@global.datadir AS Value 220 Query SHOW SLAVE STATUS 220 Query SELECT @@global.read_only As Value 220 Query SELECT @@global.relay_log_purge As Value 220 Query SHOW SLAVE STATUS 220 Query SELECT GET_LOCK('MHA_Master_High_Availability_Failover', '1') AS Value 220 Query SHOW SLAVE STATUS 220 Query STOP SLAVE IO_THREAD 220 Query SHOW SLAVE STATUS 140928 10:17:51 220 Query SHOW SLAVE STATUS 220 Query SHOW SLAVE STATUS 220 Query STOP SLAVE SQL_THREAD 220 Query SHOW SLAVE STATUS 220 Query SHOW MASTER STATUS 220 Query SELECT @@global.read_only As Value 221 Connect slave@192.168.153.152 on 221 Query SELECT UNIX_TIMESTAMP() 221 Query SHOW VARIABLES LIKE 'SERVER_ID' 221 Query SET @master_heartbeat_period= 1799999979520 221 Query SET @master_binlog_checksum= @@global.binlog_checksum 221 Query SELECT @master_binlog_checksum 221 Query SELECT @@GLOBAL.GTID_MODE 221 Query SHOW VARIABLES LIKE 'SERVER_UUID' 221 Query SET @slave_uuid= 'dda2f003-4244-11e4-b851-000c29da163f' 221 Binlog Dump Log: 'mysql-bin.000016' Pos: 688 220 Query STOP SLAVE 220 Query SHOW SLAVE STATUS 220 Query RESET SLAVE /*!50516 ALL */ 220 Query SHOW SLAVE STATUS 220 Query SELECT RELEASE_LOCK('MHA_Master_High_Availability_Failover') As Value 220 Quit
DB3:
[root@localhost ~]# cat /var/lib/mysql/localhost.log 140928 10:17:04 223 Connect mha@192.168.153.150 on 223 Query set autocommit=1 223 Query SELECT CONNECTION_ID() AS Value 224 Connect mha@192.168.153.150 on 224 Query set autocommit=1 224 Query SELECT CONNECTION_ID() AS Value 224 Query SET wait_timeout=86400 224 Query SELECT @@global.server_id As Value 224 Query SELECT VERSION() AS Value 224 Query SHOW GLOBAL VARIABLES LIKE 'log_bin' 224 Query SHOW MASTER STATUS 224 Query SELECT @@global.relay_log_info_repository AS Value 224 Query SELECT @@global.datadir AS Value 224 Query SELECT @@global.relay_log_info_file AS Value 224 Query SELECT @@global.datadir AS Value 224 Query SHOW SLAVE STATUS 224 Query SELECT @@global.read_only As Value 224 Query SELECT @@global.relay_log_purge As Value 224 Query SHOW SLAVE STATUS 140928 10:17:10 225 Connect mha@192.168.153.152 on 225 Query select @@version_comment limit 1 225 Query set sql_log_bin=0 225 Query create table if not exists mysql.apply_diff_relay_logs_test(id int) 225 Query insert into mysql.apply_diff_relay_logs_test values(1) 225 Query update mysql.apply_diff_relay_logs_test set id=id+1 where id=1 225 Query delete from mysql.apply_diff_relay_logs_test 225 Query drop table mysql.apply_diff_relay_logs_test 140928 10:17:11 225 Quit 224 Quit #开始failover流程的监测 140928 10:17:50 226 Connect mha@192.168.153.150 on 226 Query set autocommit=1 226 Query SELECT CONNECTION_ID() AS Value 227 Connect mha@192.168.153.150 on 227 Query set autocommit=1 227 Query SELECT CONNECTION_ID() AS Value 227 Query SET wait_timeout=86400 227 Query SELECT @@global.server_id As Value 227 Query SELECT VERSION() AS Value 227 Query SHOW GLOBAL VARIABLES LIKE 'log_bin' 227 Query SHOW MASTER STATUS 227 Query SELECT @@global.relay_log_info_repository AS Value 227 Query SELECT @@global.datadir AS Value 227 Query SELECT @@global.relay_log_info_file AS Value 227 Query SELECT @@global.datadir AS Value 227 Query SHOW SLAVE STATUS 227 Query SELECT @@global.read_only As Value 227 Query SELECT @@global.relay_log_purge As Value 227 Query SHOW SLAVE STATUS #正式开始failover的监测 228 Connect mha@192.168.153.150 on 228 Query set autocommit=1 228 Query SELECT CONNECTION_ID() AS Value 229 Connect mha@192.168.153.150 on 229 Query set autocommit=1 229 Query SELECT CONNECTION_ID() AS Value 229 Query SET wait_timeout=86400 229 Query SELECT @@global.server_id As Value 229 Query SELECT VERSION() AS Value 229 Query SHOW GLOBAL VARIABLES LIKE 'log_bin' 229 Query SHOW MASTER STATUS 229 Query SELECT @@global.relay_log_info_repository AS Value 229 Query SELECT @@global.datadir AS Value 229 Query SELECT @@global.relay_log_info_file AS Value 229 Query SELECT @@global.datadir AS Value 229 Query SHOW SLAVE STATUS 229 Query SELECT @@global.read_only As Value 229 Query SELECT @@global.relay_log_purge As Value 229 Query SHOW SLAVE STATUS 229 Query SELECT GET_LOCK('MHA_Master_High_Availability_Failover', '1') AS Value 229 Query SHOW SLAVE STATUS 229 Query STOP SLAVE IO_THREAD 229 Query SHOW SLAVE STATUS 140928 10:17:51 229 Query SHOW SLAVE STATUS 229 Query SHOW SLAVE STATUS 229 Query SHOW SLAVE STATUS 229 Query SHOW SLAVE STATUS 229 Query STOP SLAVE SQL_THREAD 229 Query SHOW SLAVE STATUS 229 Query STOP SLAVE 229 Query SHOW SLAVE STATUS 229 Query RESET SLAVE 229 Query CHANGE MASTER TO MASTER_HOST = '192.168.153.151' MASTER_USER = 'slave' MASTER_PASSWORD =MASTER_PORT = 3306 MASTER_LOG_FILE = 'mysql-bin.000016' MASTER_LOG_POS = 688 229 Query START SLAVE 230 Connect Out slave@192.168.153.151:3306 229 Query SHOW SLAVE STATUS 229 Query SELECT RELEASE_LOCK('MHA_Master_High_Availability_Failover') As Value 229 Quit 140928 10:18:07 236 Connect root@localhost on 236 Query select @@version_comment limit 1 236 Query select USER() 140928 10:18:10 236 Quit
MHA Manage.log:
[root@localhost ~]# cat /var/log/masterha/app1/manager.log #启动前的准备工作 #检查服务器状态,获取相关参数设置 Sun Sep 28 10:17:03 2014 - [info] MHA::MasterMonitor version 0.55. Sun Sep 28 10:17:04 2014 - [info] Dead Servers: Sun Sep 28 10:17:04 2014 - [info] Alive Servers: Sun Sep 28 10:17:04 2014 - [info] 192.168.153.150(192.168.153.150:3306) Sun Sep 28 10:17:04 2014 - [info] 192.168.153.151(192.168.153.151:3306) Sun Sep 28 10:17:04 2014 - [info] 192.168.153.152(192.168.153.152:3306) Sun Sep 28 10:17:04 2014 - [info] Alive Slaves: Sun Sep 28 10:17:04 2014 - [info] 192.168.153.151(192.168.153.151:3306) Version=5.6.20-68.0-log (oldest major version between slaves) log-bin:enabled Sun Sep 28 10:17:04 2014 - [info] Replicating from 192.168.153.150(192.168.153.150:3306) Sun Sep 28 10:17:04 2014 - [info] Primary candidate for the new Master (candidate_master is set) Sun Sep 28 10:17:04 2014 - [info] 192.168.153.152(192.168.153.152:3306) Version=5.6.20-68.0-log (oldest major version between slaves) log-bin:enabled Sun Sep 28 10:17:04 2014 - [info] Replicating from 192.168.153.150(192.168.153.150:3306) Sun Sep 28 10:17:04 2014 - [info] Not candidate for the new Master (no_master is set) Sun Sep 28 10:17:04 2014 - [info] Current Alive Master: 192.168.153.150(192.168.153.150:3306) Sun Sep 28 10:17:04 2014 - [info] Checking slave configurations.. Sun Sep 28 10:17:04 2014 - [info] read_only=1 is not set on slave 192.168.153.151(192.168.153.151:3306). Sun Sep 28 10:17:04 2014 - [warning] relay_log_purge=0 is not set on slave 192.168.153.151(192.168.153.151:3306). Sun Sep 28 10:17:04 2014 - [info] read_only=1 is not set on slave 192.168.153.152(192.168.153.152:3306). Sun Sep 28 10:17:04 2014 - [warning] relay_log_purge=0 is not set on slave 192.168.153.152(192.168.153.152:3306). Sun Sep 28 10:17:04 2014 - [info] Checking replication filtering settings.. Sun Sep 28 10:17:04 2014 - [info] binlog_do_db= , binlog_ignore_db= Sun Sep 28 10:17:04 2014 - [info] Replication filtering check ok. #测试ssh连接是否成功 Sun Sep 28 10:17:04 2014 - [info] Starting SSH connection tests.. Sun Sep 28 10:17:07 2014 - [info] All SSH connection tests passed successfully. #测试MHA node是否可用 Sun Sep 28 10:17:07 2014 - [info] Checking MHA Node version.. Sun Sep 28 10:17:08 2014 - [info] Version check ok. Sun Sep 28 10:17:08 2014 - [info] Checking SSH publickey authentication settings on the current master.. Sun Sep 28 10:17:08 2014 - [info] HealthCheck: SSH to 192.168.153.150 is reachable. Sun Sep 28 10:17:09 2014 - [info] Master MHA Node version is 0.54. Sun Sep 28 10:17:09 2014 - [info] Checking recovery script configurations on the current master.. Sun Sep 28 10:17:09 2014 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.55 --start_file=mysql-bin.000026 #创建MHA日志目录 Sun Sep 28 10:17:09 2014 - [info] Connecting to root@192.168.153.150(192.168.153.150).. Creating /var/log/masterha/app1 if not exists.. ok. Checking output directory is accessible or not.. ok. Binlog found at /var/lib/mysql, up to mysql-bin.000026 Sun Sep 28 10:17:09 2014 - [info] Master setting check done. #master的检查到此完毕 #开始检查slave的差异日志应用权限 Sun Sep 28 10:17:09 2014 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers.. Sun Sep 28 10:17:09 2014 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=192.168.153.151 --slave_ip=192.168.153.151 --slave_port=3306 --workdir=/var/log/masterha/app1 --target_version=5.6.20-68.0-log --manager_version=0.55 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx Sun Sep 28 10:17:09 2014 - [info] Connecting to root@192.168.153.151(192.168.153.151:22).. Checking slave recovery environment settings.. Opening /var/lib/mysql/relay-log.info ... ok. Relay log found at /var/lib/mysql, up to mysql-relay.000003 Temporary relay log file is /var/lib/mysql/mysql-relay.000003 Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure. done. Testing mysqlbinlog output.. done. Cleaning up test file(s).. done. Sun Sep 28 10:17:10 2014 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=192.168.153.152 --slave_ip=192.168.153.152 --slave_port=3306 --workdir=/var/log/masterha/app1 --target_version=5.6.20-68.0-log --manager_version=0.55 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx Sun Sep 28 10:17:10 2014 - [info] Connecting to root@192.168.153.152(192.168.153.152:22).. Checking slave recovery environment settings.. Opening /var/lib/mysql/relay-log.info ... ok. Relay log found at /var/lib/mysql, up to mysql-relay.000003 Temporary relay log file is /var/lib/mysql/mysql-relay.000003 Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure. done. Testing mysqlbinlog output.. done. Cleaning up test file(s).. done. Sun Sep 28 10:17:11 2014 - [info] Slaves settings check done. Sun Sep 28 10:17:11 2014 - [info] 192.168.153.150 (current master) +--192.168.153.151 +--192.168.153.152 #调试master_ip_failover_script Sun Sep 28 10:17:11 2014 - [warning] master_ip_failover_script is not defined. #调试shutdown_script Sun Sep 28 10:17:11 2014 - [info] Checking shutdown script status: Sun Sep 28 10:17:11 2014 - [info] /opt/master_ip_failover.sh --command=status --ssh_user=root --host=192.168.153.150 --ip=192.168.153.150 Sun Sep 28 10:17:11 2014 - [info] OK. Sun Sep 28 10:17:11 2014 - [info] Set master ping interval 3 seconds. #设置二次检查的主机 Sun Sep 28 10:17:11 2014 - [info] Set secondary check script: masterha_secondary_check -s 192.168.153.151 -s 192.168.153.152 Sun Sep 28 10:17:11 2014 - [info] Starting ping health check on 192.168.153.150(192.168.153.150:3306).. #MHA启动完毕,进入监测状态 Sun Sep 28 10:17:11 2014 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond.. #监测DB1服务器挂了 Sun Sep 28 10:17:41 2014 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away) #通过定义的二次监测,确认master是否挂了 Sun Sep 28 10:17:41 2014 - [info] Executing seconary network check script: masterha_secondary_check -s 192.168.153.151 -s 192.168.153.152 --user=root --master_host=192.168.153.150 --master_ip=192.168.153.150 --master_port=3306 Sun Sep 28 10:17:41 2014 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.55 --binlog_prefix=mysql-bin Monitoring server 192.168.153.151 is reachable, Master is not reachable from 192.168.153.151. OK. Sun Sep 28 10:17:42 2014 - [info] HealthCheck: SSH to 192.168.153.150 is reachable. Monitoring server 192.168.153.152 is reachable, Master is not reachable from 192.168.153.152. OK. #确认master挂了,开始进入failover流程 #再试尝试连接master和master的ssh Sun Sep 28 10:17:42 2014 - [info] Master is not reachable from all other monitoring servers. Failover should start. Sun Sep 28 10:17:44 2014 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111) Sun Sep 28 10:17:44 2014 - [warning] Connection failed 1 time(s).. Sun Sep 28 10:17:47 2014 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111) Sun Sep 28 10:17:47 2014 - [warning] Connection failed 2 time(s).. Sun Sep 28 10:17:50 2014 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111) Sun Sep 28 10:17:50 2014 - [warning] Connection failed 3 time(s).. Sun Sep 28 10:17:50 2014 - [warning] Master is not reachable from health checker! Sun Sep 28 10:17:50 2014 - [warning] Master 192.168.153.150(192.168.153.150:3306) is not reachable! Sun Sep 28 10:17:50 2014 - [warning] SSH is reachable. #通过配置文件,监测其他slave的状态 Sun Sep 28 10:17:50 2014 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/app1.cnf again, and trying to connect to all servers to check server status.. Sun Sep 28 10:17:50 2014 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Sun Sep 28 10:17:50 2014 - [info] Reading application default configurations from /etc/app1.cnf.. Sun Sep 28 10:17:50 2014 - [info] Reading server configurations from /etc/app1.cnf.. Sun Sep 28 10:17:50 2014 - [info] Dead Servers: Sun Sep 28 10:17:50 2014 - [info] 192.168.153.150(192.168.153.150:3306) Sun Sep 28 10:17:50 2014 - [info] Alive Servers: Sun Sep 28 10:17:50 2014 - [info] 192.168.153.151(192.168.153.151:3306) Sun Sep 28 10:17:50 2014 - [info] 192.168.153.152(192.168.153.152:3306) Sun Sep 28 10:17:50 2014 - [info] Alive Slaves: Sun Sep 28 10:17:50 2014 - [info] 192.168.153.151(192.168.153.151:3306) Version=5.6.20-68.0-log (oldest major version between slaves) log-bin:enabled Sun Sep 28 10:17:50 2014 - [info] Replicating from 192.168.153.150(192.168.153.150:3306) Sun Sep 28 10:17:50 2014 - [info] Primary candidate for the new Master (candidate_master is set) Sun Sep 28 10:17:50 2014 - [info] 192.168.153.152(192.168.153.152:3306) Version=5.6.20-68.0-log (oldest major version between slaves) log-bin:enabled Sun Sep 28 10:17:50 2014 - [info] Replicating from 192.168.153.150(192.168.153.150:3306) Sun Sep 28 10:17:50 2014 - [info] Not candidate for the new Master (no_master is set) #再次监测slave的配置是否有变化,是否符合failover条件 Sun Sep 28 10:17:50 2014 - [info] Checking slave configurations.. Sun Sep 28 10:17:50 2014 - [info] read_only=1 is not set on slave 192.168.153.151(192.168.153.151:3306). Sun Sep 28 10:17:50 2014 - [warning] relay_log_purge=0 is not set on slave 192.168.153.151(192.168.153.151:3306). Sun Sep 28 10:17:50 2014 - [info] read_only=1 is not set on slave 192.168.153.152(192.168.153.152:3306). Sun Sep 28 10:17:50 2014 - [warning] relay_log_purge=0 is not set on slave 192.168.153.152(192.168.153.152:3306). Sun Sep 28 10:17:50 2014 - [info] Checking replication filtering settings.. Sun Sep 28 10:17:50 2014 - [info] Replication filtering check ok. Sun Sep 28 10:17:50 2014 - [info] Master is down! Sun Sep 28 10:17:50 2014 - [info] Terminating monitoring script. Sun Sep 28 10:17:50 2014 - [info] Got exit code 20 (Master dead). Sun Sep 28 10:17:50 2014 - [info] MHA::MasterFailover version 0.55. #正式开始failover Sun Sep 28 10:17:50 2014 - [info] Starting master failover. Sun Sep 28 10:17:50 2014 - [info] #再次对slave配置做检查 Sun Sep 28 10:17:50 2014 - [info] * Phase 1: Configuration Check Phase.. Sun Sep 28 10:17:50 2014 - [info] Sun Sep 28 10:17:50 2014 - [info] Dead Servers: Sun Sep 28 10:17:50 2014 - [info] 192.168.153.150(192.168.153.150:3306) Sun Sep 28 10:17:50 2014 - [info] Checking master reachability via mysql(double check).. Sun Sep 28 10:17:50 2014 - [info] ok. Sun Sep 28 10:17:50 2014 - [info] Alive Servers: Sun Sep 28 10:17:50 2014 - [info] 192.168.153.151(192.168.153.151:3306) Sun Sep 28 10:17:50 2014 - [info] 192.168.153.152(192.168.153.152:3306) Sun Sep 28 10:17:50 2014 - [info] Alive Slaves: Sun Sep 28 10:17:50 2014 - [info] 192.168.153.151(192.168.153.151:3306) Version=5.6.20-68.0-log (oldest major version between slaves) log-bin:enabled Sun Sep 28 10:17:50 2014 - [info] Replicating from 192.168.153.150(192.168.153.150:3306) Sun Sep 28 10:17:50 2014 - [info] Primary candidate for the new Master (candidate_master is set) Sun Sep 28 10:17:50 2014 - [info] 192.168.153.152(192.168.153.152:3306) Version=5.6.20-68.0-log (oldest major version between slaves) log-bin:enabled Sun Sep 28 10:17:50 2014 - [info] Replicating from 192.168.153.150(192.168.153.150:3306) Sun Sep 28 10:17:50 2014 - [info] Not candidate for the new Master (no_master is set) Sun Sep 28 10:17:50 2014 - [info] ** Phase 1: Configuration Check Phase completed. Sun Sep 28 10:17:50 2014 - [info] #对原Master做master_ip_failover_script和shutdown_script的操作 Sun Sep 28 10:17:50 2014 - [info] * Phase 2: Dead Master Shutdown Phase.. Sun Sep 28 10:17:50 2014 - [info] Sun Sep 28 10:17:50 2014 - [info] Forcing shutdown so that applications never connect to the current master.. Sun Sep 28 10:17:50 2014 - [warning] master_ip_failover_script is not set. Skipping invalidating dead master ip address. Sun Sep 28 10:17:50 2014 - [info] Executing SHUTDOWN script: Sun Sep 28 10:17:50 2014 - [info] /opt/master_ip_failover.sh --command=stopssh --ssh_user=root --host=192.168.153.150 --ip=192.168.153.150 --port=3306 Sun Sep 28 10:17:51 2014 - [info] Power off done. Sun Sep 28 10:17:51 2014 - [info] * Phase 2: Dead Master Shutdown Phase completed. Sun Sep 28 10:17:51 2014 - [info] #开始差异日志的恢复:获取slave最后得到的binlog位置 Sun Sep 28 10:17:51 2014 - [info] * Phase 3: Master Recovery Phase.. Sun Sep 28 10:17:51 2014 - [info] Sun Sep 28 10:17:51 2014 - [info] * Phase 3.1: Getting Latest Slaves Phase.. Sun Sep 28 10:17:51 2014 - [info] Sun Sep 28 10:17:51 2014 - [info] The latest binary log file/position on all slaves is mysql-bin.000026:120 Sun Sep 28 10:17:51 2014 - [info] Latest slaves (Slaves that received relay log files to the latest): Sun Sep 28 10:17:51 2014 - [info] 192.168.153.151(192.168.153.151:3306) Version=5.6.20-68.0-log (oldest major version between slaves) log-bin:enabled Sun Sep 28 10:17:51 2014 - [info] Replicating from 192.168.153.150(192.168.153.150:3306) Sun Sep 28 10:17:51 2014 - [info] Primary candidate for the new Master (candidate_master is set) Sun Sep 28 10:17:51 2014 - [info] 192.168.153.152(192.168.153.152:3306) Version=5.6.20-68.0-log (oldest major version between slaves) log-bin:enabled Sun Sep 28 10:17:51 2014 - [info] Replicating from 192.168.153.150(192.168.153.150:3306) Sun Sep 28 10:17:51 2014 - [info] Not candidate for the new Master (no_master is set) Sun Sep 28 10:17:51 2014 - [info] The oldest binary log file/position on all slaves is mysql-bin.000026:120 Sun Sep 28 10:17:51 2014 - [info] Oldest slaves: Sun Sep 28 10:17:51 2014 - [info] 192.168.153.151(192.168.153.151:3306) Version=5.6.20-68.0-log (oldest major version between slaves) log-bin:enabled Sun Sep 28 10:17:51 2014 - [info] Replicating from 192.168.153.150(192.168.153.150:3306) Sun Sep 28 10:17:51 2014 - [info] Primary candidate for the new Master (candidate_master is set) Sun Sep 28 10:17:51 2014 - [info] 192.168.153.152(192.168.153.152:3306) Version=5.6.20-68.0-log (oldest major version between slaves) log-bin:enabled Sun Sep 28 10:17:51 2014 - [info] Replicating from 192.168.153.150(192.168.153.150:3306) Sun Sep 28 10:17:51 2014 - [info] Not candidate for the new Master (no_master is set) Sun Sep 28 10:17:51 2014 - [info] #获取原master的binlog日志 Sun Sep 28 10:17:51 2014 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase.. Sun Sep 28 10:17:51 2014 - [info] Sun Sep 28 10:17:51 2014 - [warning] Dead Master is not SSH reachable. Could not save it's binlogs. Transactions that were not sent to the latest slave (Read_Master_Log_Pos to the tail of the dead master's binlog) were lost. Sun Sep 28 10:17:51 2014 - [info] #确定新的master Sun Sep 28 10:17:51 2014 - [info] * Phase 3.3: Determining New Master Phase.. Sun Sep 28 10:17:51 2014 - [info] Sun Sep 28 10:17:51 2014 - [info] Finding the latest slave that has all relay logs for recovering other slaves.. Sun Sep 28 10:17:51 2014 - [info] All slaves received relay logs to the same position. No need to resync each other. Sun Sep 28 10:17:51 2014 - [info] Searching new master from slaves.. Sun Sep 28 10:17:51 2014 - [info] Candidate masters from the configuration file: Sun Sep 28 10:17:51 2014 - [info] 192.168.153.151(192.168.153.151:3306) Version=5.6.20-68.0-log (oldest major version between slaves) log-bin:enabled Sun Sep 28 10:17:51 2014 - [info] Replicating from 192.168.153.150(192.168.153.150:3306) Sun Sep 28 10:17:51 2014 - [info] Primary candidate for the new Master (candidate_master is set) Sun Sep 28 10:17:51 2014 - [info] Non-candidate masters: Sun Sep 28 10:17:51 2014 - [info] 192.168.153.152(192.168.153.152:3306) Version=5.6.20-68.0-log (oldest major version between slaves) log-bin:enabled Sun Sep 28 10:17:51 2014 - [info] Replicating from 192.168.153.150(192.168.153.150:3306) Sun Sep 28 10:17:51 2014 - [info] Not candidate for the new Master (no_master is set) Sun Sep 28 10:17:51 2014 - [info] Searching from candidate_master slaves which have received the latest relay log events.. Sun Sep 28 10:17:51 2014 - [info] New master is 192.168.153.151(192.168.153.151:3306) Sun Sep 28 10:17:51 2014 - [info] Starting master failover.. Sun Sep 28 10:17:51 2014 - [info] From: 192.168.153.150 (current master) +--192.168.153.151 +--192.168.153.152 To: 192.168.153.151 (new master) +--192.168.153.152 Sun Sep 28 10:17:51 2014 - [info] #在new master上应用差异的binlog日志 Sun Sep 28 10:17:51 2014 - [info] * Phase 3.3: New Master Diff Log Generation Phase.. Sun Sep 28 10:17:51 2014 - [info] Sun Sep 28 10:17:51 2014 - [info] This server has all relay logs. No need to generate diff files from the latest slave. Sun Sep 28 10:17:51 2014 - [info] Sun Sep 28 10:17:51 2014 - [info] * Phase 3.4: Master Log Apply Phase.. Sun Sep 28 10:17:51 2014 - [info] Sun Sep 28 10:17:51 2014 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed. Sun Sep 28 10:17:51 2014 - [info] Starting recovery on 192.168.153.151(192.168.153.151:3306).. Sun Sep 28 10:17:51 2014 - [info] This server has all relay logs. Waiting all logs to be applied.. Sun Sep 28 10:17:51 2014 - [info] done. Sun Sep 28 10:17:51 2014 - [info] All relay logs were successfully applied. #获取new master的binlog位置。 Sun Sep 28 10:17:51 2014 - [info] Getting new master's binlog name and position.. Sun Sep 28 10:17:51 2014 - [info] mysql-bin.000016:688 Sun Sep 28 10:17:51 2014 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.153.151', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000016', MASTER_LOG_POS=688, MASTER_USER='slave', MASTER_PASSWORD='xxx'; #如果有master_ip_failover_script,那么给new master设置VIP Sun Sep 28 10:17:51 2014 - [warning] master_ip_failover_script is not set. Skipping taking over new master ip address. Sun Sep 28 10:17:51 2014 - [info] ** Finished master recovery successfully. Sun Sep 28 10:17:51 2014 - [info] * Phase 3: Master Recovery Phase completed. Sun Sep 28 10:17:51 2014 - [info] #开始恢复其他slave,也是从原master的binlog对比来做恢复 Sun Sep 28 10:17:51 2014 - [info] * Phase 4: Slaves Recovery Phase.. Sun Sep 28 10:17:51 2014 - [info] Sun Sep 28 10:17:51 2014 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase.. Sun Sep 28 10:17:51 2014 - [info] Sun Sep 28 10:17:51 2014 - [info] -- Slave diff file generation on host 192.168.153.152(192.168.153.152:3306) started, pid: 8387. Check tmp log /var/log/masterha/app1/192.168.153.152_3306_20140928101750.log if it takes time.. Sun Sep 28 10:17:51 2014 - [info] Sun Sep 28 10:17:51 2014 - [info] Log messages from 192.168.153.152 ... Sun Sep 28 10:17:51 2014 - [info] Sun Sep 28 10:17:51 2014 - [info] This server has all relay logs. No need to generate diff files from the latest slave. Sun Sep 28 10:17:51 2014 - [info] End of log messages from 192.168.153.152. Sun Sep 28 10:17:51 2014 - [info] -- 192.168.153.152(192.168.153.152:3306) has the latest relay log events. Sun Sep 28 10:17:51 2014 - [info] Generating relay diff files from the latest slave succeeded. Sun Sep 28 10:17:51 2014 - [info] Sun Sep 28 10:17:51 2014 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase.. Sun Sep 28 10:17:51 2014 - [info] Sun Sep 28 10:17:51 2014 - [info] -- Slave recovery on host 192.168.153.152(192.168.153.152:3306) started, pid: 8389. Check tmp log /var/log/masterha/app1/192.168.153.152_3306_20140928101750.log if it takes time.. Sun Sep 28 10:17:51 2014 - [info] Sun Sep 28 10:17:51 2014 - [info] Log messages from 192.168.153.152 ... Sun Sep 28 10:17:51 2014 - [info] Sun Sep 28 10:17:51 2014 - [info] Starting recovery on 192.168.153.152(192.168.153.152:3306).. Sun Sep 28 10:17:51 2014 - [info] This server has all relay logs. Waiting all logs to be applied.. Sun Sep 28 10:17:51 2014 - [info] done. Sun Sep 28 10:17:51 2014 - [info] All relay logs were successfully applied. #差异日志应用完成以后,切换所有slave到new master。 Sun Sep 28 10:17:51 2014 - [info] Resetting slave 192.168.153.152(192.168.153.152:3306) and starting replication from the new master 192.168.153.151(192.168.153.151:3306).. Sun Sep 28 10:17:51 2014 - [info] Executed CHANGE MASTER. Sun Sep 28 10:17:51 2014 - [info] Slave started. Sun Sep 28 10:17:51 2014 - [info] End of log messages from 192.168.153.152. Sun Sep 28 10:17:51 2014 - [info] -- Slave recovery on host 192.168.153.152(192.168.153.152:3306) succeeded. Sun Sep 28 10:17:51 2014 - [info] All new slave servers recovered successfully. Sun Sep 28 10:17:51 2014 - [info] Sun Sep 28 10:17:51 2014 - [info] * Phase 5: New master cleanup phase.. Sun Sep 28 10:17:51 2014 - [info] Sun Sep 28 10:17:51 2014 - [info] Resetting slave info on the new master.. Sun Sep 28 10:17:51 2014 - [info] 192.168.153.151: Resetting slave info succeeded. Sun Sep 28 10:17:51 2014 - [info] Master failover to 192.168.153.151(192.168.153.151:3306) completed successfully. Sun Sep 28 10:17:51 2014 - [info] #failover操作完成,生成failover报告 ----- Failover Report ----- app1: MySQL Master failover 192.168.153.150 to 192.168.153.151 succeeded Master 192.168.153.150 is down! Check MHA Manager logs at localhost.localdomain:/var/log/masterha/app1/manager.log for details. Started automated(non-interactive) failover. Power off 192.168.153.150. The latest slave 192.168.153.151(192.168.153.151:3306) has all relay logs for recovery. Selected 192.168.153.151 as a new master. 192.168.153.151: OK: Applying all logs succeeded. 192.168.153.152: This host has the latest relay log events. Generating relay diff files from the latest slave succeeded. 192.168.153.152: OK: Applying all logs succeeded. Slave started, replicating from 192.168.153.151. 192.168.153.151: Resetting slave info succeeded. Master failover to 192.168.153.151(192.168.153.151:3306) completed successfully.