导读
遇到SLAVE延迟很大,binlog apply position一直不动的情况如何排查?
问题描述
收到SLAVE延迟时间一直很大的报警,于是检查一下SLAVE状态(无关状态我给隐去了):
Slave_IO_State: Waiting for master to send event
Master_Log_File: mysql-bin.000605
Read_Master_Log_Pos: 1194
Relay_Log_File: mysql-relay-bin.003224
Relay_Log_Pos: 295105
Relay_Master_Log_File: mysql-bin.000604
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Last_Errno: 0
Last_Error:
Exec_Master_Log_Pos: 294959
Relay_Log_Space: 4139172581
Seconds_Behind_Master: 10905
可以看到,延迟确实很大,而且从多次show slave status的结果来看,发现binlog的position一直不动。
Read_Master_Log_Pos: 1194
Relay_Log_File: mysql-relay-bin.003224
Relay_Log_Pos: 295105
Relay_Master_Log_File: mysql-bin.000604
Exec_Master_Log_Pos: 294959
Relay_Log_Space: 4139172581
从processlist的中也看不出来有什么不对劲的SQL在跑:
******************** 1. row ******************
Id: 16273070
User: system user
Host:
db: NULL
Command: Connect
Time: 4828912
State: Waiting for master to send event
Info: NULL
********************* 2. row *****************
Id: 16273071
User: system user
Host:
db: NULL
Command: Connect
Time: 9798
State: Reading event from the relay log
Info: NULL
在master上查看相应binlog,确认都在干神马事:
[[email protected]]# mysqlbinlog -vvv --base64-output=decode-rows -j 294959 mysql-bin.000604 | more
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
**# at 294959**
#160204 6:16:30 server id 1 end_log_pos 295029 **Query thread_id=461151** **exec_time=2144** error_code=0
SET TIMESTAMP=1454537790/*!*/;
SET @@session.pseudo_thread_id=461151/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=0/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!\C latin1 *//*!*/;
SET @@session.character_set_client=8,@@session.collation_connection=8,@@session.collation_server=33/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
BEGIN
/*!*/;
# at 295029
# at 295085
# at 296040
# at 297047
# at 298056
# at 299068
# at 300104
上面这段内容的几个关键信息:
# at 294959 binlog起点 thread_id=461151 master上执行的线程ID exec_time=2144 该事务执行总耗时
再往下看都是一堆的binlog position信息,通过这种方式可读性不强,我们换一种姿势看看:
[[email protected] (test)]
|