﻿<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
<channel>
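<title><![CDATA[Ant Net (蚂蚁网): A Multidimensional Life, Standing on Three Virtues!]]></title>
<description><![CDATA[Authentic: no pretense, no fakery; be yourself and never act against your conscience.
Steady: not restless, not blindly following, not chasing quick success or instant gain.
Practical learning: no opportunism, no shortcuts; diligent in study, skilled in one's craft.]]></description>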
<link>http://www.vants.org/</link>
<language>zh-cn</language>
<generator>www.emlog.net</generator>
<item>
	<title>[Repost] Two problems caused by the nf_conntrack parameters of iptables</title>
	<link>http://www.vants.org/?post=266</link>
	<description><![CDATA[<p><span style="font-size:16px;">[<strong>Before we begin</strong></span><span style="font-size:16px;">]:</span></p>
<p><span style="font-size:16px;"><span style="font-size:14px;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;While working on an outage recently, I came across this article when searching online for related material. The author's attitude and reasoning while troubleshooting are a model for any engineer, so I am reposting it here to share with everyone.</span></span></p>
<p><span style="font-size:16px;">[<strong>Original author</strong></span><span style="font-size:16px;">]: <span style="font-family:微软雅黑;color:#666666;">phanx</span></span></p>
<p><span style="font-size:16px;">[<strong>Original link</strong></span><span style="font-size:16px;">]: </span><a href="http://blog.chinaunix.net/uid-7549563-id-4912055.html"><span style="font-size:16px;">http://blog.chinaunix.net/uid-7549563-id-4912055.html</span></a></p>
<p><span style="font-size:16px;">[<strong>Full original text</strong></span><span style="font-size:16px;">]:</span></p>
<p><span style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">========phanx.com========</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 宋体, Arial;widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">Author: &nbsp; phanx</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 宋体, Arial;widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">Updated: 2015-3-23</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 宋体, Arial;widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">Please retain the author information when reposting</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 宋体, Arial;widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 宋体, Arial;widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;"><span style="word-wrap:break-word;font-family:'Microsoft YaHei';">=========================</span></span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">A critical business system was seeing frequent transaction failures and had suffered one large-scale service outage.</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">The system uses two IBM Power 740 servers running AIX 6.1 + Oracle 11gR2 RAC as database servers, two DELL PowerEdge R720 servers as application servers, and an F5 at the front end as the load balancer.</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">A Cisco ASA5585 hardware firewall sits between the database servers and the application servers for access control.</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">The application servers run an in-house C program that accesses the RAC database over long-lived connections through the interface provided by Oracle Client.</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">During the failure, we first checked the system load on the database servers. Compared with normal periods, CPU load was somewhat high and I/O load had risen sharply, with IOPS reaching about 13,000.</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">Load under normal conditions:</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<img style="word-wrap:break-word;border-top:0px;border-right:0px;white-space:normal;word-spacing:0px;border-bottom:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';border-left:0px;widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" alt="" src="http://blog.chinaunix.net/attachment/201503/23/7549563_1427080650tjWs.png" width="555" height="237" /><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">Load during the failure:</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<img style="word-wrap:break-word;border-top:0px;border-right:0px;white-space:normal;word-spacing:0px;border-bottom:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';border-left:0px;widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" alt="" src="http://blog.chinaunix.net/attachment/201503/23/7549563_14270807344p2D.png" width="555" height="226" /><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">Checking the database logs, we found a large number of TNS errors:</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
</p>
<div id="codeText" class="codeText" style="overflow:auto;word-wrap:break-word;border-top:#dddddd 1px solid;border-right:#dddddd 1px solid;width:1041px;background:#ffffff;white-space:normal;word-spacing:0px;border-bottom:#dddddd 1px solid;text-transform:none;word-break:break-all;color:#666666;padding-bottom:0px;text-align:left;padding-top:0px;font:12px Consolas, monospace;padding-left:0px;margin:0px 0px 1.1em;border-left:#dddddd 1px solid;widows:1;letter-spacing:0px;padding-right:0px;text-indent:0px;-webkit-text-stroke-width:0px;font-stretch:normal;">
<ol class="dp-css none_number" style="word-wrap:break-word;background:#ffffff;color:#5c5c5c;padding-bottom:5px;padding-top:5px;padding-left:0pt;margin:0px 1px 0px 0px;list-style:none none outside;line-height:1.3;padding-right:0pt;">
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">Fatal NI connect error 12170.</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;"></li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">VERSION INFORMATION:&nbsp; VERSION INFORMATION:<br style="word-wrap:break-word;" />
TNS for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production<br style="word-wrap:break-word;" />
TCP/IP NT Protocol Adapter for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production<br style="word-wrap:break-word;" />
Oracle Bequeath NT Protocol Adapter for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production<br style="word-wrap:break-word;" />
<br style="word-wrap:break-word;" />
<br style="word-wrap:break-word;" />
&nbsp; VERSION INFORMATION:<br style="word-wrap:break-word;" />
TNS for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production<br style="word-wrap:break-word;" />
TCP/IP NT Protocol Adapter for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production<br style="word-wrap:break-word;" />
Oracle Bequeath NT Protocol Adapter for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production<br style="word-wrap:break-word;" />
<br style="word-wrap:break-word;" />
<br style="word-wrap:break-word;" />
&nbsp; VERSION INFORMATION:<br style="word-wrap:break-word;" />
TNS for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production<br style="word-wrap:break-word;" />
TCP/IP NT Protocol Adapter for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production<br style="word-wrap:break-word;" />
Oracle Bequeath NT Protocol Adapter for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production<br style="word-wrap:break-word;" />
<br style="word-wrap:break-word;" />
<br style="word-wrap:break-word;" />
&nbsp; VERSION INFORMATION:<br style="word-wrap:break-word;" />
TNS for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production<br style="word-wrap:break-word;" />
TCP/IP NT Protocol Adapter for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production<br style="word-wrap:break-word;" />
Oracle Bequeath NT Protocol Adapter for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production<br style="word-wrap:break-word;" />
Mon Feb 23 13:22:16 2015<br style="word-wrap:break-word;" />
&nbsp; Time: 23-Feb-2015 13:22:16<br style="word-wrap:break-word;" />
<br style="word-wrap:break-word;" />
<br style="word-wrap:break-word;" />
<br style="word-wrap:break-word;" />
<br style="word-wrap:break-word;" />
*********************************************************************** &nbsp;Time: 23-Feb-2015 13:22:16<br style="word-wrap:break-word;" />
<br style="word-wrap:break-word;" />
<br style="word-wrap:break-word;" />
&nbsp; Time: 23-Feb-2015 13:22:16<br style="word-wrap:break-word;" />
&nbsp; Time: 23-Feb-2015 13:22:16<br style="word-wrap:break-word;" />
&nbsp; Tracing not turned on.<br style="word-wrap:break-word;" />
&nbsp; Tracing not turned on.<br style="word-wrap:break-word;" />
<br style="word-wrap:break-word;" />
<br style="word-wrap:break-word;" />
Fatal NI connect error 12170.<br style="word-wrap:break-word;" />
&nbsp; Tracing not turned on.<br style="word-wrap:break-word;" />
&nbsp; Tracing not turned on.<br style="word-wrap:break-word;" />
&nbsp; Tns error struct:<br style="word-wrap:break-word;" />
&nbsp; Tns error struct:<br style="word-wrap:break-word;" />
<br style="word-wrap:break-word;" />
<br style="word-wrap:break-word;" />
&nbsp; VERSION INFORMATION:<br style="word-wrap:break-word;" />
TNS for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production<br style="word-wrap:break-word;" />
TCP/IP NT Protocol Adapter for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production<br style="word-wrap:break-word;" />
Oracle Bequeath NT Protocol Adapter for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production<br style="word-wrap:break-word;" />
&nbsp; Tns error struct:<br style="word-wrap:break-word;" />
&nbsp; Tns error struct:<br style="word-wrap:break-word;" />
&nbsp; &nbsp; ns main err code: 12535<br style="word-wrap:break-word;" />
&nbsp; &nbsp; ns main err code: 12535<br style="word-wrap:break-word;" />
&nbsp; Time: 23-Feb-2015 13:22:16<br style="word-wrap:break-word;" />
&nbsp; &nbsp; ns main err code: 12535<br style="word-wrap:break-word;" />
&nbsp; &nbsp;&nbsp;<br style="word-wrap:break-word;" />
&nbsp; &nbsp; ns main err code: 12535<br style="word-wrap:break-word;" />
&nbsp; &nbsp;&nbsp;<br style="word-wrap:break-word;" />
&nbsp; Tracing not turned on.<br style="word-wrap:break-word;" />
&nbsp; &nbsp;&nbsp;<br style="word-wrap:break-word;" />
&nbsp; &nbsp;&nbsp;<br style="word-wrap:break-word;" />
&nbsp; Tns error struct:<br style="word-wrap:break-word;" />
TNS-12535: TNS:operation timed out<br style="word-wrap:break-word;" />
TNS-12535: TNS:operation timed out<br style="word-wrap:break-word;" />
TNS-12535: TNS:operation timed out<br style="word-wrap:break-word;" />
&nbsp; &nbsp; ns secondary err code: 12560<br style="word-wrap:break-word;" />
&nbsp; &nbsp; ns main err code: 12535<br style="word-wrap:break-word;" />
TNS-12535: TNS:operation timed out<br style="word-wrap:break-word;" />
&nbsp; &nbsp; ns secondary err code: 12560<br style="word-wrap:break-word;" />
&nbsp; &nbsp; ns secondary err code: 12560<br style="word-wrap:break-word;" />
&nbsp; &nbsp; nt main err code: 505<br style="word-wrap:break-word;" />
&nbsp; &nbsp;&nbsp;<br style="word-wrap:break-word;" />
&nbsp; &nbsp; ns secondary err code: 12560<br style="word-wrap:break-word;" />
&nbsp; &nbsp; nt main err code: 505<br style="word-wrap:break-word;" />
&nbsp; &nbsp;&nbsp;<br style="word-wrap:break-word;" />
&nbsp; &nbsp; nt main err code: 505<br style="word-wrap:break-word;" />
&nbsp; &nbsp;&nbsp;<br style="word-wrap:break-word;" />
&nbsp; &nbsp; nt main err code: 505<br style="word-wrap:break-word;" />
TNS-12535: TNS:operation timed out<br style="word-wrap:break-word;" />
&nbsp; &nbsp; TNS-00505: Operation timed out<br style="word-wrap:break-word;" />
<br style="word-wrap:break-word;" />
<br style="word-wrap:break-word;" />
TNS-00505: Operation timed out<br style="word-wrap:break-word;" />
&nbsp; &nbsp;&nbsp;<br style="word-wrap:break-word;" />
&nbsp; &nbsp; ns secondary err code: 12560<br style="word-wrap:break-word;" />
TNS-00505: Operation timed out<br style="word-wrap:break-word;" />
&nbsp; &nbsp; nt secondary err code: 78<br style="word-wrap:break-word;" />
&nbsp; &nbsp; nt secondary err code: 78<br style="word-wrap:break-word;" />
&nbsp; &nbsp; nt main err code: 505<br style="word-wrap:break-word;" />
TNS-00505: Operation timed out<br style="word-wrap:break-word;" />
&nbsp; &nbsp; nt secondary err code: 78<br style="word-wrap:break-word;" />
&nbsp; &nbsp; nt OS err code: 0<br style="word-wrap:break-word;" />
&nbsp; &nbsp; nt OS err code: 0<br style="word-wrap:break-word;" />
&nbsp; &nbsp;&nbsp;<br style="word-wrap:break-word;" />
&nbsp; &nbsp; nt secondary err code: 78<br style="word-wrap:break-word;" />
&nbsp; &nbsp; nt OS err code: 0<br style="word-wrap:break-word;" />
&nbsp; Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=10.1.32.70)(PORT=37975))<br style="word-wrap:break-word;" />
TNS-00505: Operation timed out<br style="word-wrap:break-word;" />
&nbsp; &nbsp; nt OS err code: 0<br style="word-wrap:break-word;" />
&nbsp; Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=10.1.32.70)(PORT=25972))<br style="word-wrap:break-word;" />
&nbsp; Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=10.1.32.70)(PORT=9108))<br style="word-wrap:break-word;" />
&nbsp; &nbsp; nt secondary err code: 78<br style="word-wrap:break-word;" />
&nbsp; Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=10.1.32.70)(PORT=52073))<br style="word-wrap:break-word;" />
&nbsp; &nbsp; nt OS err code: 0<br style="word-wrap:break-word;" />
&nbsp; Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=10.1.32.70)(PORT=49148))<br style="word-wrap:break-word;" />
</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;"><span style="word-wrap:break-word;color:#5c5c5c;letter-spacing:0px;line-height:1.3;">Mon Feb 23 13:22:16 2015</span></li>
</ol>
</div>
<p><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">Next we checked the application servers and found that a large number of application service processes were stuck in the Busy state and could not process application data. We then checked the connections from the application server to the database and found that the connections the database had reported as timed out were still in the ESTABLISHED state on the application server.</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
</p>
<div id="codeText" class="codeText" style="overflow:auto;word-wrap:break-word;border-top:#dddddd 1px solid;border-right:#dddddd 1px solid;width:1041px;background:#ffffff;white-space:normal;word-spacing:0px;border-bottom:#dddddd 1px solid;text-transform:none;word-break:break-all;color:#666666;padding-bottom:0px;text-align:left;padding-top:0px;font:12px Consolas, monospace;padding-left:0px;margin:0px 0px 1.1em;border-left:#dddddd 1px solid;widows:1;letter-spacing:0px;padding-right:0px;text-indent:0px;-webkit-text-stroke-width:0px;font-stretch:normal;">
<ol class="dp-css none_number" style="word-wrap:break-word;background:#ffffff;color:#5c5c5c;padding-bottom:5px;padding-top:5px;padding-left:0pt;margin:0px 1px 0px 0px;list-style:none none outside;line-height:1.3;padding-right:0pt;">
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">[sysadmin@appsrv1 ~]$ netstat -an|grep 10.1.197.15</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">tcp 0 0 10.1.32.70:37975 10.1.197.15:1521 ESTABLISHED</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">tcp 0 0 10.1.32.70:25972 10.1.197.15:1521 ESTABLISHED</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">tcp 0 0 10.1.32.70:9108 10.1.197.15:1521 ESTABLISHED</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">tcp 0 0 10.1.32.70:52073 10.1.197.15:1521 ESTABLISHED</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">tcp 0 0 10.1.32.70:49148 10.1.197.15:1521 ESTABLISHED</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">.</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">.</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">.</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">.</li>
</ol>
</div>
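<p><span style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/21px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">The cross-check described above can be sketched in a few lines of Python. This is only an illustration, not the tooling used in the original investigation: the sample data is copied from the log excerpt and the netstat output above, and the function names timed_out_clients and established_locals are hypothetical.</span></p>
<pre>
```python
import re

# Client addresses copied from the TNS-12535 entries in the database log above.
db_log = """
Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=10.1.32.70)(PORT=37975))
Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=10.1.32.70)(PORT=25972))
Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=10.1.32.70)(PORT=9108))
Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=10.1.32.70)(PORT=52073))
Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=10.1.32.70)(PORT=49148))
"""

# netstat lines copied from the application server output above.
netstat_out = """
tcp 0 0 10.1.32.70:37975 10.1.197.15:1521 ESTABLISHED
tcp 0 0 10.1.32.70:25972 10.1.197.15:1521 ESTABLISHED
tcp 0 0 10.1.32.70:9108 10.1.197.15:1521 ESTABLISHED
tcp 0 0 10.1.32.70:52073 10.1.197.15:1521 ESTABLISHED
tcp 0 0 10.1.32.70:49148 10.1.197.15:1521 ESTABLISHED
"""


def timed_out_clients(log_text):
    """Return the (host, port) pairs the database reported as timed out."""
    pattern = r"HOST=([\d.]+)\)\(PORT=(\d+)\)"
    return {(host, int(port)) for host, port in re.findall(pattern, log_text)}


def established_locals(netstat_text):
    """Return the local (host, port) pairs still in the ESTABLISHED state."""
    result = set()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local, remote, state.
        if len(fields) == 6 and fields[5] == "ESTABLISHED":
            host, port = fields[3].rsplit(":", 1)
            result.add((host, int(port)))
    return result


# Connections the database gave up on but the app server still believes are alive.
half_open = sorted(
    timed_out_clients(db_log).intersection(established_locals(netstat_out))
)
for host, port in half_open:
    print(f"{host}:{port} timed out on the database but is ESTABLISHED on the app server")
```
</pre>
<p><span style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/21px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">With the sample data, all five client ports from the database log appear in the result, which is the one-sided teardown the article goes on to investigate.</span></p>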
<p><span style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/21px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">At this point we suspected that the ASA had cut the connections between the database and the application. On checking the ASA configuration, however, we found the timeout set to 8 hours. This workload is never idle for 8 hours even in off-peak periods, and the application periodically probes its long-lived database connections while idle to confirm they are still usable.</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/21px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/21px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">We therefore considered the usual cause, a connection dropped by an idle timeout, unlikely. Continuing the analysis, we found a fairly large number of direct path read wait events in the database. Looking at the execution plans of the application SQL, we found many full table scans, and some statements ran for a long time, over 60 seconds.</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">This is clearly the well-known side effect of 11g direct path reads: either the application development team optimizes the SQL, or serial direct path reads are turned off in 11g. Either would relieve the heavy I/O load and shorten SQL execution times, and if those times stayed within a reasonable range, this failure would not be triggered.</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">However, this alone should not cause TNS timeouts. Poor performance and long-running SQL merely brought the problem to the surface; they are not the root cause of this failure.</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<strong style="word-wrap:break-word;font-size:14px;font-family:'Microsoft YaHei';font-variant:normal;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;font-style:normal;text-align:left;widows:1;letter-spacing:normal;line-height:26px;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">So we still had to work out what was causing established connections between the application servers and the database servers to be torn down on one side only.</strong><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">The application team killed the hung service processes and restarted the service, and the business recovered for the time being.</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">We then asked the application team to identify the SQL statements and connection ports in use when the connections broke, and asked colleagues on the network team to extract from Riverbed Cascade the traffic between one RAC node and one of the application servers on the corresponding ports, which we opened in wireshark for analysis.</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">From the AWR and ASH reports we found that in the cases that led to a disconnection, the SQL execution times were all long, roughly 5 minutes. We then analyzed the connection traffic for these long-running statements: the application server submitted the SQL execution request and then waited for the database server to reply.</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">When the database server finished executing and returned data to the application server, the application server never received it. The database server kept retransmitting until it timed out and then reported the TNS timeout error.</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<img style="word-wrap:break-word;border-top:0px;border-right:0px;white-space:normal;word-spacing:0px;border-bottom:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';border-left:0px;widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" alt="" src="http://blog.chinaunix.net/attachment/201503/23/7549563_1427086929gYG0.png" width="700" height="313" /><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">This TCP stream clearly shows that appsrv1 submitted the SQL request at 12:20.040144 and immediately received an ACK from racdb1, which means racdb1 received the request successfully and began executing it.</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">At 15:55.30881, racdb1 finished executing the SQL and began returning the result to appsrv1; the SQL had run for about 215 seconds. At this point appsrv1 gave no response at all. Finally, after timing out, racdb1 sent an RST to reset the connection.</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">Something felt wrong about this: why would appsrv1 inexplicably stop responding? appsrv1 had not crashed and its network connectivity was fine, so I was baffled. In the end I decided that, if nothing else worked, I would have to capture packets on appsrv1 itself.</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">Because appsrv1 is fairly busy, capturing continuously without knowing when the fault would occur would produce a very large amount of data, put considerable load on the application server, and raise a storage-space problem as well.</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">Then the application team reported that one or two service processes were occasionally hanging. So I decided to try my luck: I set a capture filter to record only the traffic with racdb1 and to discard all packets exchanged with the other related application servers. After 5 minutes of capturing there was already 20 GB of data.</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">With this much data, just opening the file on my little i3 laptop with 4 GB of RAM was a problem, so I found a powerful test server and uploaded it there. &nbsp; Scrolling through, I found some TCP retransmissions that looked like the fault symptom, but appsrv1 had answered the retransmissions with</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">ICMP&nbsp;&nbsp;Host Administratively Prohibited, so it was not completely unresponsive. I then asked the network team to extract data for the fault time window and found the same pattern: while racdb1 was retransmitting, appsrv1 answered every retransmission with </span><span style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/21px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">ICMP&nbsp;</span><span style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/21px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">&nbsp;Host Administratively Prohibited.<br style="word-wrap:break-word;" />
<br style="word-wrap:break-word;" />
It turned out that when the network team extracted the traffic from Riverbed Cascade the first time, they had filtered on the relevant TCP ports only, so the ICMP packets were left out. We therefore re-extracted the TCP stream together with the related packets from the time of the fault.</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<img style="word-wrap:break-word;border-top:0px;border-right:0px;white-space:normal;word-spacing:0px;border-bottom:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';border-left:0px;widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" alt="" src="http://blog.chinaunix.net/attachment/201503/23/7549563_1427088446PDQa.png" width="700" height="405" /><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">Now the complete picture is visible: every retransmission did indeed get a response.</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">The TCP stream was established, and iptables should also hold normal state information for the flow, so why was appsrv1 rejecting it?</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">Continuing to inspect appsrv1, I found this line in /etc/sysctl.conf:</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">net.netfilter.nf_conntrack_tcp_timeout_established = 300</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">This is the setting that makes iptables remove the tracking entry for an established connection after 300 seconds without activity; the default is 432000 seconds (5 days).</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">I asked who had set this parameter. The answer was that the application team changed it on the application vendor's advice during performance testing before go-live. Nobody knew why it was changed, or why to this value&nbsp;</span><img style="word-wrap:break-word;border-top:0px;border-right:0px;white-space:normal;word-spacing:0px;border-bottom:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';border-left:0px;widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" border="0" alt="" src="http://blog.chinaunix.net/kindeditor/plugins/emoticons/images/40.gif" /><span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">.</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">Fine, this was probably the culprit: changing it back to the default of 432000 should solve the problem. With the application team also asked to optimize their SQL, I could write the incident report.</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">Is that the end of the story? &nbsp; &nbsp;Of course not. After the value was changed, things were calm at first. Some time later, the application team reported that a small number of</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">business timeouts had appeared again and were becoming more frequent, going from one or two a day to ten or twenty a day. Never a moment's peace...</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">Looking further into it, this time the database server did not seem to be involved. The problem was in the connections between this application server and the service bus application: when the service bus application server initiated a connection to this application server, it was occasionally rejected.</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">Since an F5 sits in front of the application servers as a load balancer, I first checked the application service status on the F5 for anomalies. Everything looked good, and the F5's health probes of the application showed nothing abnormal. So it was back to looking directly at the traffic to see why the application connections were being rejected.</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">The service bus application server and this application server communicate over short-lived connections: whenever there is a business call, the service bus side initiates the connection. The application team found the rejected connections, identified the relevant port information from the debug logs, and again asked the network team</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">to extract the packets for those connections.</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<img style="word-wrap:break-word;border-top:0px;border-right:0px;white-space:normal;word-spacing:0px;border-bottom:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';border-left:0px;widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" alt="" src="http://blog.chinaunix.net/attachment/201503/23/7549563_1427092058Q30a.png" width="700" height="369" /><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">Here we can see that after svcbus2 sent its request to appsrv1, the request was likewise acknowledged, but one minute later svcbus2 closed the connection. Three minutes after that, when appsrv1 finished processing the request and returned data to svcbus2, svcbus2 rejected it. Then, as before, came repeated retransmissions and finally a timeout.</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">The application team explained that the application on svcbus2 has a 60-second timeout mechanism: if the peer does not return a result within 60 seconds, it abandons the request and drops the connection.<span class="Apple-converted-space">&nbsp;</span></span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">The packets above also show that svcbus2 sent a FIN, but because appsrv1 had not finished processing, the connection was closed in one direction only. It stayed half-closed until appsrv1 finished, its data retransmissions timed out, and the connection was fully closed.</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">What does this have to do with new connections from svcbus2 to appsrv1 being rejected? I could not work it out at first. &nbsp;iptables on appsrv1 should always be open for the service port and should not reject new connections, unless there was something special about these new connections.</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">On the Red Hat Customer Portal I found some relevant information. &nbsp;</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">https://access.redhat.com/solutions/73353&nbsp; &nbsp;</span><strong style="word-wrap:break-word;font-size:14px;font-family:'Microsoft YaHei';font-variant:normal;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;font-style:normal;text-align:left;widows:1;letter-spacing:normal;line-height:26px;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">iptables randomly drops new connection requests</strong><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
</p>
<ul style="list-style-type:none;box-sizing:border-box;word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#333333;padding-bottom:0px;text-align:left;padding-top:0px;font:14px/21px Overpass, 'Open Sans', Helvetica, sans-serif;padding-left:0px;margin:0px 0px 1px;widows:1;letter-spacing:normal;padding-right:0px;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">
<li style="list-style-type:disc;box-sizing:border-box;word-wrap:break-word;list-style-position:outside;padding-bottom:0px;padding-top:0px;padding-left:0px;margin:0px;padding-right:0px;">if /proc/net/ipv4/netfilter/ip_conntrack_tcp_loose is set to 1, iptables creates a new connection-tracking entry after receiving any packet, not just packets with the SYN flag set</li>
<li style="list-style-type:disc;box-sizing:border-box;word-wrap:break-word;list-style-position:outside;padding-bottom:0px;padding-top:0px;padding-left:0px;margin:0px;padding-right:0px;">the ip_conntrack_tcp_loose setting is useful in firewalls for keeping already established connections unaffected if the firewall restarts, but this causes issues in the following scenario: <ul style="list-style-type:none;box-sizing:border-box;word-wrap:break-word;padding-bottom:0px;padding-top:0px;padding-left:1px;margin:0px;padding-right:0px;">
<li style="list-style-type:disc;box-sizing:border-box;word-wrap:break-word;list-style-position:outside;padding-bottom:0px;padding-top:0px;padding-left:0px;margin:0px;padding-right:0px;">a client initiates a new connection with the same source port it used in a previous connection</li>
<li style="list-style-type:disc;box-sizing:border-box;word-wrap:break-word;list-style-position:outside;padding-bottom:0px;padding-top:0px;padding-left:0px;margin:0px;padding-right:0px;">the server already has an iptables connection-tracking entry for the same IP address and port, with states ESTABLISHED and UNREPLIED</li>
<li style="list-style-type:disc;box-sizing:border-box;word-wrap:break-word;list-style-position:outside;padding-bottom:0px;padding-top:0px;padding-left:0px;margin:0px;padding-right:0px;">the default rule gets hit, instead of the desired one</li>
<li style="list-style-type:disc;box-sizing:border-box;word-wrap:break-word;list-style-position:outside;padding-bottom:0px;padding-top:0px;padding-left:0px;margin:0px;padding-right:0px;">the packet is dropped, and the server returns icmp-host-prohibited</li>
</ul>
</li>
</ul>
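<p>To check whether a host is exposed to the scenario listed above, the relevant settings can be read straight from procfs. The following is only a minimal sketch: the helper names <code>sysctl_path</code> and <code>read_sysctl</code> are hypothetical, and it assumes a kernel that exposes the conntrack settings under <code>net.netfilter.*</code> (older kernels use the <code>ip_conntrack_*</code> names quoted above). It relies on the usual mapping of a sysctl key to a file under /proc/sys with the dots replaced by slashes.</p>

```python
from pathlib import Path

# Keys of interest. These names are an assumption for the reader's system:
# they exist only when the nf_conntrack module is loaded.
KEYS = (
    "net.netfilter.nf_conntrack_tcp_loose",
    "net.netfilter.nf_conntrack_tcp_timeout_established",
)


def sysctl_path(name, root="/proc/sys"):
    """Map a sysctl key to its procfs file (dots become path separators)."""
    return Path(root) / name.replace(".", "/")


def read_sysctl(name, root="/proc/sys"):
    """Return the current value as a string, or None if the key is unavailable."""
    try:
        return sysctl_path(name, root).read_text().strip()
    except OSError:
        # Key missing (module not loaded, different kernel) or not readable.
        return None


if __name__ == "__main__":
    for key in KEYS:
        print(key, "=", read_sysctl(key))
```

<p>The <code>root</code> parameter exists only so that the helper can be pointed at a test directory instead of the live /proc/sys tree.</p>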
<p><span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">https://access.redhat.com/solutions/73273 &nbsp;&nbsp;</span><strong style="word-wrap:break-word;font-size:14px;font-family:'Microsoft YaHei';font-variant:normal;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;font-style:normal;text-align:left;widows:1;letter-spacing:normal;line-height:26px;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">Why iptables sporadically drops initial connections requests?</strong><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#333333;text-align:left;font:14px/21px Overpass, 'Open Sans', Helvetica, sans-serif;widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">This behavior is caused by enabling&nbsp;</span><strong style="box-sizing:border-box;word-wrap:break-word;font-size:14px;font-family:Overpass, 'Open Sans', Helvetica, sans-serif;font-variant:normal;white-space:normal;word-spacing:0px;text-transform:none;color:#333333;font-style:normal;text-align:left;widows:1;letter-spacing:normal;line-height:21px;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">ip_conntrack_tcp_loose</strong><span style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#333333;text-align:left;font:14px/21px Overpass, 'Open Sans', Helvetica, sans-serif;widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">&nbsp;sysctl parameter which enables iptables to create a conntrack entry whenever it sees any packet from any direction and not just SYN packet.<span class="Apple-converted-space">&nbsp;</span><br style="word-wrap:break-word;" />
This is called connection picking and is usually used by the firewalls to ensure that the ongoing end to end sessions remain active if for some reason the firewall gets restarted. But this feature is not needed in standalone servers.</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">These articles describe how iptables can drop certain packets even when the rules allow them. The gist is that by default (when </span><span style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#333333;text-align:left;font:14px/21px Overpass, 'Open Sans', Helvetica, sans-serif;widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">/proc/net/ipv4/netfilter/ip_conntrack_tcp_loose is 1</span><span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">), iptables records state even for TCP connections it has not seen in full, which avoids</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">disrupting existing connections when iptables is restarted, but causes problems in some special cases.</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">Our application hit exactly this special case. When svcbus2 timed out waiting for appsrv1's reply, it closed the connection. In the nf_conntrack table on appsrv1, when the FIN from svcbus2 arrives, the entry for this connection changes to the CLOSE_WAIT state, and 60 seconds later</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">the entry is removed. But when appsrv1 starts sending its reply, an entry for this connection reappears in the nf_conntrack table, this time in state </span><span style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#333333;text-align:left;font:14px/21px Overpass, 'Open Sans', Helvetica, sans-serif;widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">ESTABLISHED [UNREPLIED</span><span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">].</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
</p>
<div id="codeText" class="codeText" style="overflow:auto;word-wrap:break-word;border-top:#dddddd 1px solid;border-right:#dddddd 1px solid;width:1041px;background:#ffffff;white-space:normal;word-spacing:0px;border-bottom:#dddddd 1px solid;text-transform:none;word-break:break-all;color:#666666;padding-bottom:0px;text-align:left;padding-top:0px;font:12px Consolas, monospace;padding-left:0px;margin:0px 0px 1.1em;border-left:#dddddd 1px solid;widows:1;letter-spacing:0px;padding-right:0px;text-indent:0px;-webkit-text-stroke-width:0px;font-stretch:normal;">
<ol class="dp-css none_number" style="word-wrap:break-word;background:#ffffff;color:#5c5c5c;padding-bottom:5px;padding-top:5px;padding-left:0pt;margin:0px 1px 0px 0px;list-style:none none outside;line-height:1.3;padding-right:0pt;">
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">[root@appsrv1 ~]# grep 51522 /proc/net/nf_conntrack<br style="word-wrap:break-word;" />
&nbsp; &nbsp; ipv4 &nbsp; &nbsp; 2 tcp &nbsp; &nbsp; &nbsp;6 35 CLOSE_WAIT<span class="Apple-converted-space">&nbsp;</span><span style="word-wrap:break-word;font-size:12px;font-family:Consolas, monospace;white-space:normal;color:#5c5c5c;letter-spacing:0px;line-height:15px;background-color:#ffffff;">src=10.1.32.70 dst=10.1.41.192 sport=9102 dport=51522 src=10.1.41.192 dst=10.1.32.70 sport=51522 dport=9102</span><span class="Apple-converted-space">&nbsp;</span>[ASSURED] mark=0 secmark=0 use=2<br style="word-wrap:break-word;" />
.<br style="word-wrap:break-word;" />
</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">.</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">wait more seconds</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">.</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">.</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">[root@appsrv1 ~]# grep 51522&nbsp;/proc/net/nf_conntrack</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">ipv4 2 tcp 6<span class="Apple-converted-space">&nbsp;</span><span style="word-wrap:break-word;color:#e53333;background-color:#ffe500;">431965</span><span class="Apple-converted-space">&nbsp;</span>ESTABLISHED src=10.1.32.70 dst=10.1.41.192 sport=9102 dport=51522 [UNREPLIED] src=10.1.41.192 dst=10.1.32.70 sport=51522 dport=9102 mark=0 secmark=0 use=2</li>
</ol>
</div>
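<p>To spot entries like the second one without reading the table by eye, the lines can be parsed programmatically. The following is a minimal sketch, assuming Python 3 and the field layout seen in the sample output above; the names parse_conntrack_line and is_stale_established and the 3600-second threshold are hypothetical and not part of the original troubleshooting.</p>

```python
import re


def parse_conntrack_line(line):
    """Parse one TCP line from /proc/net/nf_conntrack into a dict.

    The field layout is assumed from the samples in this article:
    l3proto l3num l4proto l4num timeout state key=value ... [FLAG]
    """
    fields = line.split()
    entry = {
        "l3proto": fields[0],
        "l4proto": fields[2],
        "timeout": int(fields[4]),   # seconds left before the entry expires
        "state": fields[5],
        "flags": re.findall(r"\[(\w+)\]", line),
    }
    # The original direction is listed first, so keep only the first
    # value seen for each key (src, dst, sport, dport, ...).
    for token in fields[6:]:
        if "=" in token:
            key, value = token.split("=", 1)
            entry.setdefault(key, value)
    return entry


def is_stale_established(entry, threshold=3600):
    """True for ESTABLISHED + UNREPLIED entries that are far from expiring."""
    return (entry["state"] == "ESTABLISHED"
            and "UNREPLIED" in entry["flags"]
            and entry["timeout"] > threshold)


sample = ("ipv4 2 tcp 6 431965 ESTABLISHED src=10.1.32.70 dst=10.1.41.192 "
          "sport=9102 dport=51522 [UNREPLIED] src=10.1.41.192 dst=10.1.32.70 "
          "sport=51522 dport=9102 mark=0 secmark=0 use=2")
entry = parse_conntrack_line(sample)
print(entry["state"], entry["timeout"], entry["flags"], is_stale_established(entry))
```

<p>On a real host the same function could be applied to each line read from /proc/net/nf_conntrack; the threshold is only an illustrative cut-off.</p>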
<p><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">Because of the default </span><span style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/21px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">net.netfilter.nf_conntrack_tcp_timeout_established =</span><span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">&nbsp;</span><span style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/21px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">432000, this entry persists for 5 days and is removed only when the value highlighted in red counts down to 0. As a result, when svcbus2 again initiates a connection to appsrv1 from the same source port 51522, iptables on appsrv1
rejects the request.</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">So would setting net.ipv4.netfilter.ip_conntrack_tcp_loose=0 be enough, and how would its side effects be eliminated?</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">After thinking it over repeatedly, I was not willing to settle for that and decided to look for a better approach.</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">I reproduced the fault in a test environment, using TCP 8888 as the service port and TCP&nbsp;22222 as the client source port.</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
</p>
<div id="codeText" class="codeText" style="overflow:auto;word-wrap:break-word;border-top:#dddddd 1px solid;border-right:#dddddd 1px solid;width:1041px;background:#ffffff;white-space:normal;word-spacing:0px;border-bottom:#dddddd 1px solid;text-transform:none;word-break:break-all;color:#666666;padding-bottom:0px;text-align:left;padding-top:0px;font:12px Consolas, monospace;padding-left:0px;margin:0px 0px 1.1em;border-left:#dddddd 1px solid;widows:1;letter-spacing:0px;padding-right:0px;text-indent:0px;-webkit-text-stroke-width:0px;font-stretch:normal;">
<ol class="dp-css none_number" style="word-wrap:break-word;background:#ffffff;color:#5c5c5c;padding-bottom:5px;padding-top:5px;padding-left:0pt;margin:0px 1px 0px 0px;list-style:none none outside;line-height:1.3;padding-right:0pt;">
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;"><span style="word-wrap:break-word;color:#5c5c5c;letter-spacing:0px;line-height:1.3;">1. Configure iptables on the server to allow TCP port&nbsp;8888.</span></li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;"></li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">2. Use Python to start a listening service on TCP 8888.</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">[root@server ~]# python</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">Python 2.6.6 (r266:84292, Sep 4 2013, 07:46:00)</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">[GCC 4.4.7 20120313 (Red Hat 4.4.7-3)] on linux2</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">Type "help", "copyright", "credits" or "license" for more information.</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">&gt;&gt;&gt; import socket</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">&gt;&gt;&gt; serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">&gt;&gt;&gt; serversocket.bind(("0.0.0.0",8888))</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">&gt;&gt;&gt; serversocket.listen(5)</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">&gt;&gt;&gt; (clientsocket, address) = serversocket.accept()</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;"></li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">3. From the client, connect using TCP 22222 as the source port and send the request data.</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">[root@client ~]# python</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">Python 2.6.6 (r266:84292, Sep 4 2013, 07:46:00)</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">[GCC 4.4.7 20120313 (Red Hat 4.4.7-3)] on linux2</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">Type "help", "copyright", "credits" or "license" for more information.</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">&gt;&gt;&gt; import socket</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">&gt;&gt;&gt; s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">&gt;&gt;&gt; s.bind(("0.0.0.0",22222))</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">&gt;&gt;&gt; s.connect(("1.1.1.101",8888))</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">&gt;&gt;&gt; s.send("request",100)</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;"></li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;"></li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">Check the iptables conntrack state on the server.</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">[root@server ~]# grep 103 /proc/net/nf_conntrack</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">ipv4 2 tcp 6 431949 ESTABLISHED src=1.1.1.103 dst=1.1.1.101 sport=22222 dport=8888 src=1.1.1.101 dst=1.1.1.103 sport=8888 dport=22222 [ASSURED] mark=0 secmark=0 use=2</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;"></li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">Wait a few seconds, then close the connection on the client.</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">&gt;&gt;&gt; s.close()</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">&gt;&gt;&gt; exit()</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;"></li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">Check the iptables conntrack state on the server again.</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">[root@server ~]# grep 103 /proc/net/nf_conntrack</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">ipv4 2 tcp 6 54 CLOSE_WAIT src=1.1.1.103 dst=1.1.1.101 sport=22222 dport=8888 src=1.1.1.101 dst=1.1.1.103 sport=8888 dport=22222 [ASSURED] mark=0 secmark=0 use=2</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">[root@server ~]# sleep 55</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">[root@server ~]# grep 103 /proc/net/nf_conntrack</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;"></li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">The entry on the server has disappeared.</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;"></li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">4. Now the server sends the response data to the client.</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">&gt;&gt;&gt; clientsocket.recv(1000)</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">'request'</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">&gt;&gt;&gt; clientsocket.send("respond",100)</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;"></li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">Checking the iptables conntrack state on the server again now shows an&nbsp;ESTABLISHED [UNREPLIED] entry,</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">while the TCP connection itself is in the&nbsp;CLOSE_WAIT state.</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">[root@server ~]# grep 103 /proc/net/nf_conntrack</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">ipv4 2 tcp 6 431996<span class="Apple-converted-space">&nbsp;</span><span style="word-wrap:break-word;color:#e53333;background-color:#ffe500;">ESTABLISHED</span><span class="Apple-converted-space">&nbsp;</span>src=1.1.1.101 dst=1.1.1.103 sport=8888 dport=22222 [UNREPLIED] src=1.1.1.103 dst=1.1.1.101 sport=22222 dport=8888 mark=0 secmark=0 use=2</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">[root@server ~]# netstat -ntp|grep 103</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">tcp 1 7 1.1.1.101:8888 1.1.1.103:22222<span class="Apple-converted-space">&nbsp;</span><span style="word-wrap:break-word;color:#e53333;background-color:#ffe500;">CLOSE_WAIT</span><span class="Apple-converted-space">&nbsp;</span>28978/python</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;"></li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">At this point, a Wireshark capture also shows the client rejecting the server's response data and sending ICMP [host administratively prohibited].</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;"></li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">After the CLOSE_WAIT state of the TCP connection on the server times out, the ESTABLISHED [UNREPLIED] entry still remains in the nf_conntrack table, because its lifetime is governed by net.netfilter.nf_conntrack_tcp_timeout_established.</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">[root@server ~]# netstat -ntp|grep 103</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">[root@server ~]# grep 103 /proc/net/nf_conntrack</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">ipv4 2 tcp 6 431066 ESTABLISHED src=1.1.1.101 dst=1.1.1.103 sport=8888 dport=22222 [UNREPLIED] src=1.1.1.103 dst=1.1.1.101 sport=22222 dport=8888 mark=0 secmark=0 use=2</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;"></li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">The listening port is in a normal state.</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">[root@server ~]# netstat -nplt|grep 8888</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">tcp 0 0 0.0.0.0:8888 0.0.0.0:* LISTEN 28978/python</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;"></li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">5. Now, when the client connects to TCP&nbsp;8888 again from source port TCP&nbsp;22222, iptables on the server rejects the SYN packet and the client's connection attempt fails.</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">[root@client ~]# python</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">Python 2.6.6 (r266:84292, Sep 4 2013, 07:46:00)</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">[GCC 4.4.7 20120313 (Red Hat 4.4.7-3)] on linux2</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">Type "help", "copyright", "credits" or "license" for more information.</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">&gt;&gt;&gt; import socket</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">&gt;&gt;&gt; s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">&gt;&gt;&gt; s.bind(("0.0.0.0",22222))</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">&gt;&gt;&gt; s.connect(("1.1.1.101",8888))</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">Traceback (most recent call last):</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">File "&lt;stdin&gt;", line 1, in &lt;module&gt;</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">File "&lt;string&gt;", line 1, in connect</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">socket.error: [Errno 113] No route to host</li>
<li style="word-wrap:break-word;background:#ffffff;padding-bottom:0px;padding-top:0px;padding-left:10px;margin:0px;list-style:none none outside;padding-right:0px;">&gt;&gt;&gt;</li>
</ol>
</div>
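<p>The client side of steps 3 and 5 can be wrapped in a small helper so the test is easy to repeat. This is only a sketch, assuming Python 3 rather than the Python 2.6 used in the session above; connect_from_port and describe_error are hypothetical names, and setting SO_REUSEADDR is an addition that the original session did not use.</p>

```python
import errno
import socket


def connect_from_port(host, port, src_port, timeout=5.0):
    """Connect to (host, port) from a fixed local source port.

    Returns (socket, None) on success or (None, errno_code) on failure.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # SO_REUSEADDR allows the fixed source port to be bound again while an
    # earlier connection from it is still in TIME_WAIT on this machine.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.settimeout(timeout)
    try:
        s.bind(("0.0.0.0", src_port))
        s.connect((host, port))
        return s, None
    except OSError as exc:
        s.close()
        return None, exc.errno


def describe_error(code):
    """Map the errno values most relevant to this fault to a short note."""
    if code == errno.EHOSTUNREACH:
        # Errno 113 on Linux, the error seen in step 5.
        return "no route to host (possibly rejected by a firewall)"
    if code == errno.ECONNREFUSED:
        return "connection refused (nothing listening, or a TCP reset)"
    return "other error: %s" % code
```

<p>With the addresses from the test above, a call such as connect_from_port("1.1.1.101", 8888, 22222) would be expected to return errno 113 while the stale ESTABLISHED [UNREPLIED] entry is still present on the server.</p>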
<p><span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">This simulation confirmed the cause of the fault, which makes the fix straightforward.</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">There are two sides to work on. The first is appsrv1: if the entry expires sooner, the problem will not occur as long as ports are not reused too quickly.</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">Without modifying the application, this is a reasonable temporary workaround. As for how long the expiry should be, it must first cover the longest SQL execution time from the previous problem; then observe whether the port reuse interval is ever shorter than that longest SQL execution time.</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">The entry exists on appsrv1 because, after the returned data is rejected, the retransmitted TCP packets are recorded by iptables nf_conntrack, while svcbus2 never sends a corresponding TCP reply.</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">So the other approach is to get svcbus2 to respond normally. &nbsp; How can that be achieved?</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">iptables nf_conntrack has another parameter,&nbsp;net.netfilter.nf_conntrack_tcp_timeout_close_wait. Its default is 60 seconds, the same as the TCP CLOSE_WAIT timeout. Simulating the fault showed that if this value is made longer than the TCP&nbsp;</span><span style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/21px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">CLOSE_WAIT timeout</span><span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">,</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">then after the TCP connection is closed, the return packets from appsrv1 can still reach the kernel on svcbus2, and svcbus2 immediately sends a TCP RST to reset the connection. Both the TCP connection on appsrv1 and the nf_conntrack entry are then cleared.</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">Here is the simulated run.</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<img style="word-wrap:break-word;border-top:0px;border-right:0px;white-space:normal;word-spacing:0px;border-bottom:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';border-left:0px;widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" alt="" src="http://blog.chinaunix.net/attachment/201503/23/7549563_1427101142KXFO.png" width="700" height="118" /><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">当 appsrv-test 在svcbus-test发送FIN连接后过了127秒开始发送响应数据，这时候svcbus-test就立刻回应了RST报文，后来svcbus-test半个小时候在重新再用tcp 22222端口向appsrv-test tcp 8888发起连接的时候，问题问题，连接可以正常建立了。</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">当然这种处理方法应用层会收到错误，要对这个错误进行处理才行。</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">The second approach also requires changing and testing application code, so we turned to what we had observed: the longest SQL execution time is about 15 minutes, and the service bus reuses a source port roughly every 4 hours. Even when a source port is reused, the destination is not necessarily the same port on appsrv, so reuse after 4 hours does not necessarily produce a conflict.</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">All things considered, setting</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
</p>
<pre class="pcmTextBlock browserNotIE ng-binding" style="box-sizing:border-box;overflow:auto;word-wrap:normal;border-top:#ececec 1px solid;border-right:#ececec 1px solid;white-space:pre-wrap;word-spacing:0px;border-bottom:#ececec 1px solid;text-transform:none;word-break:normal;color:#222222;text-align:left;padding-top:0px;font:14px/1.4285 'Courier New', Courier, 'DejaVu Sans Mono', monospace;padding-left:0px;margin:0px 0px 10px;border-left:#ececec 1px solid;widows:1;letter-spacing:normal;padding-right:0px;background-color:#f5f5f5;text-indent:0px;border-radius:0px;-webkit-text-stroke-width:0px;" ng-hide="ie8 || ie9" ng-bind-html="comment.text | linky:'_blank'">net.netfilter.nf_conntrack_tcp_timeout_established=7200</pre> <p><span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">is a fairly suitable solution. Since the adjustment, recent observation has shown no further business failures.</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
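<span style="font-size:14px;">As a rough sanity check on the 7200-second value, the sketch below assumes the intent is that a conntrack entry should outlive the slowest query (about 15 minutes) yet expire before the same source port is likely to be reused (about 4 hours). The function name and constants are illustrative and are not taken from the original post.</span><br />

```python
# Hypothetical helper: the numbers come from the observations in the post
# (longest SQL about 15 minutes, source-port reuse about 4 hours).
def timeout_is_reasonable(timeout_s, longest_sql_s, port_reuse_s):
    """True when the entry outlives the slowest query
    but expires before the source port is likely to be reused."""
    return timeout_s > longest_sql_s and port_reuse_s > timeout_s


LONGEST_SQL_S = 15 * 60      # about 15 minutes
PORT_REUSE_S = 4 * 60 * 60   # about 4 hours
CHOSEN_TIMEOUT_S = 7200      # value set for nf_conntrack_tcp_timeout_established

print(timeout_is_reasonable(CHOSEN_TIMEOUT_S, LONGEST_SQL_S, PORT_REUSE_S))
```

<span style="font-size:14px;">Under these assumptions any value between 900 and 14400 seconds passes; 7200 sits comfortably inside that window.</span><br />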
<br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">Lessons learned:</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">Be careful when tuning system parameters, especially when the application's behavior is not well understood, and test thoroughly. This business system had not been rigorously tested; it was rushed into production to meet a milestone, and the later failures followed.</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">In addition, the document the administrators followed when deploying the system software had not been kept up to date, and it lacked the requirement to adjust the parameters of some new Oracle 11g features that tend to cause problems. Because the application had not been fully optimized, the accelerated performance degradation brought by this new feature also contributed indirectly to the failures. Keeping documentation current and the system's parameter baseline sound is therefore also important.</span><br style="word-wrap:break-word;white-space:normal;word-spacing:0px;text-transform:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;" />
<span style="white-space:normal;word-spacing:0px;text-transform:none;float:none;color:#666666;text-align:left;font:14px/26px 'Microsoft YaHei';widows:1;display:inline !important;letter-spacing:normal;background-color:#ffffff;text-indent:0px;-webkit-text-stroke-width:0px;">Finally, operations work is varied and complicated; only by slowing down and looking carefully will you notice the small details.</span></p> <a href="http://www.vants.org/?post=266">Read more&gt;&gt;</a><div id="related_log" style="font-size:12px"><p><b>Related posts:</b></p><p><a href="http://www.vants.org/?post=279">某业务系统访问慢分析</a></p><p><a href="http://www.vants.org/?post=48">疑难杂症分析案例-启明星辰竟然用的是我08年制作的PPT</a></p><p><a href="http://www.vants.org/?post=126">连接数相关知识</a></p><p><a href="http://www.vants.org/?post=127">ICMP通讯管理性过滤禁止差错报文（type 3，code 13）</a></p><p><a href="http://www.vants.org/?post=109">TCP MSS与PMTUD</a></p></div>]]></description>
	<pubDate>Sat, 23 Apr 2016 03:49:26 +0000</pubDate>
	<author>易隐者</author>
	<guid>http://www.vants.org/?post=266</guid>

</item>
<item>
	<title>PSH and RST Set Together, and Both System and Application Fall Silent!</title>
	<link>http://www.vants.org/?post=240</link>
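	<description><![CDATA[<p><span style="font-size:14px;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Before the Lunar New Year, a site maintained by a fellow engineer was hit by abnormal traffic and became unreachable. </span><span style="font-size:14px;">He sent me the captured packets and asked me to work out roughly what was going on. I was busy before the holiday and had no time to write up the analysis, so I finished the remaining parts afterwards and am posting it here for discussion.</span></p>
<p><span style="font-size:14px;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; I first noticed that the number of TCP sessions was high (more than 8,500), that most of them were between 219.140.167.122 and X.X.254.18, and that these sessions shared a fairly distinctive traffic pattern, as shown below:</span><span style="font-size:14px;">&nbsp;</span></p>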
<p align="center"><a href="/content/plugins/kl_album/upload/201402/e3ca4a9c5a31ccd234c9b9e1ebc2cf1120140207143005262804230.png" target="_blank"><img alt="点击查看原图" src="/content/plugins/kl_album/upload/201402/e3ca4a9c5a31ccd234c9b9e1ebc2cf1120140207143005262804230.png" height="353" border="0" width="480" /></a></p>
<p><span style="font-size:14px;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Picking any one of these TCP sessions from the capture, the packet exchange looks like this:</span><span style="font-size:14px;">&nbsp;</span></p>
<p align="center"><a href="/content/plugins/kl_album/upload/201402/0db844719b659d5ad141e33003410a1020140207143005125179546.png" target="_blank"><img style="width:490px;height:108px;" alt="点击查看原图" src="/content/plugins/kl_album/upload/201402/0db844719b659d5ad141e33003410a1020140207143005125179546.png" height="77" border="0" width="480" /></a></p>
<p><span style="font-size:14px;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; We can see that, after completing the three-way handshake, host 219.140.167.122 sent X.X.254.18 a packet with both PSH and RST set, as shown below:</span></p>
<p align="center"><a href="/content/plugins/kl_album/upload/201402/4f2f10f7ec45e04e9e44e8ea0cad7df8201402071430042117773941.png" target="_blank"><img alt="点击查看原图" src="/content/plugins/kl_album/upload/201402/4f2f10f7ec45e04e9e44e8ea0cad7df8201402071430042117773941.png" height="257" border="0" width="480" /></a></p>
<p><span style="font-size:14px;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; The decode of this packet and Follow TCP Stream both show that it is an HTTP GET request, as follows:</span><span style="font-size:14px;">&nbsp;</span></p>
<p align="center"><a href="/content/plugins/kl_album/upload/201402/b313f8ceaa8e1b6d8a58a1a4e5d090c620140207143004894065512.png" target="_blank"><img alt="点击查看原图" src="/content/plugins/kl_album/upload/201402/b313f8ceaa8e1b6d8a58a1a4e5d090c620140207143004894065512.png" height="360" border="0" width="479" /></a></p>
<p><span style="font-size:14px;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Decoding it gives the following:</span><span style="font-size:14px;">&nbsp;</span></p>
<p align="center"><a href="/content/plugins/kl_album/upload/201402/ac8b171d6f6414c0965a80800a4c7802201402071430031090577414.png" target="_blank"><img alt="点击查看原图" src="/content/plugins/kl_album/upload/201402/ac8b171d6f6414c0965a80800a4c7802201402071430031090577414.png" height="189" border="0" width="328" /></a></p>
<p><span style="font-size:14px;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;So this is a request for a PDF document on the site.</span></p>
<p align="center"><span style="font-size:14px;"><span style="font-size:14px;"><a href="/content/plugins/kl_album/upload/201402/8a3e32f6777d76f3184c2968630c489920140207143003184819981.png" target="_blank"><img alt="点击查看原图" src="/content/plugins/kl_album/upload/201402/8a3e32f6777d76f3184c2968630c489920140207143003184819981.png" height="105" border="0" width="480" /></a></span></span></p>
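<p><span style="font-size:14px;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span><span style="font-size:14px;">In this TCP exchange, the server did not reset the connection immediately after receiving the GET request with both PSH and RST set. Only 72 seconds later did it send an RST to the client to release the connection.</span></p>
<p><span style="font-size:14px;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Normally the transport layer releases the corresponding TCP connection as soon as it receives an RST. Why did the server wait 72 seconds before sending its own RST? And how should a TCP stack handle a segment in which both PSH and RST are set?</span></p>
<p><span style="font-size:14px;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Neither Google nor Baidu turned up any relevant documentation.</span></p>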
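<p><span style="font-size:14px;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Let us boldly guess how the server might handle a segment with both PSH and RST set.</span><br />
<strong><span style="font-size:14px;">Three hypotheses:</span><br />
</strong><span style="font-size:14px;">1. If the server processes the RST bit first, it immediately releases the corresponding TCP connection entry. The PSH bit then loses its meaning, and the transport layer does not hand the client's payload to the application layer.</span><br />
<span style="font-size:14px;">2. If the server processes the PSH bit first and the RST bit second, it passes the GET request to the application layer and then releases the TCP connection. Even if the application layer finishes handling the GET and hands its response to the transport layer, the transport layer finds no matching entry in its connection table and reports an error to the application layer, which gives up.</span><br />
<span style="font-size:14px;">3. If the server processes the PSH bit and ignores the RST bit, it sends the application-layer response to the client as usual.</span></p>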
<p><span style="font-size:14px;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Let's take another careful look at the exchange above:</span></p>
<p align="center"><span style="font-size:14px;"><a href="/content/plugins/kl_album/upload/201402/cb0d23186d7f5da91d0c739a43c3ed48201402071430021808486751.png" target="_blank"><img alt="点击查看原图" src="/content/plugins/kl_album/upload/201402/cb0d23186d7f5da91d0c739a43c3ed48201402071430021808486751.png" height="183" border="0" width="473" /></a><a href="/content/plugins/kl_album/upload/201402/8a3e32f6777d76f3184c2968630c489920140207143003184819981.png" target="_blank"></a></span></p>
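<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <span style="font-size:14px;">The RST that the server sent to the client after 72 seconds has its ACK bit set, with a relative acknowledgment number of 1. This shows that the server's TCP connection entry was intact when the RST was sent, and that the transport layer had not processed the segment with PSH and RST set; otherwise the relative acknowledgment number would be 805 rather than the 1 we see.</span></p>
<p><span style="font-size:14px;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Now let's look at the decode of the RST sent by the server, shown below:</span><span style="font-size:14px;">&nbsp;</span></p>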
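<p><span style="font-size:14px;">The acknowledgment-number argument is simple arithmetic. The sketch below assumes Wireshark-style relative numbering, in which the first data byte after the handshake has relative sequence number 1; the function name is illustrative.</span></p>

```python
def bytes_acknowledged(relative_ack, first_data_seq=1):
    """Number of payload bytes the receiver has accepted, given the
    relative acknowledgment number it sends back."""
    return relative_ack - first_data_seq


# Observed in the capture: the server's RST carries relative ACK 1.
print(bytes_acknowledged(1))
# What the server would have sent had it accepted the GET request.
print(bytes_acknowledged(805))
```

<p><span style="font-size:14px;">A relative ACK of 1 means that none of the GET payload was accepted, whereas 805 would mean that all 804 bytes had been.</span></p>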
<p align="center"><a href="/content/plugins/kl_album/upload/201402/4469843aed5ea7673d72382591064393201402071430021116908895.png" target="_blank"><img style="width:595px;height:515px;" alt="点击查看原图" src="/content/plugins/kl_album/upload/201402/4469843aed5ea7673d72382591064393201402071430021116908895.png" height="360" border="0" width="347" /></a></p>
<p><br />
<span style="font-size:14px;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp; This packet has TTL=64, which suggests two things:</span></p>
<p><span style="font-size:14px;">1. The packet really was sent by the server, not forged by a third party hijacking the TCP session;</span></p>
<p><span style="font-size:14px;">2. The server is probably a Linux server.</span></p>
<p><span style="font-size:14px;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; So what does all of this tell us?</span></p>
<p><span style="font-size:14px;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<strong><span style="color:#337fe5;">In my opinion, the cause of the behavior above is that the server filters out packets with the RST bit set!</span></strong></span></p>
<p><span style="font-size:14px;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Filtering on TCP flags is not hard for a server to do; iptables can do it. </span><span style="font-size:14px;">The following iptables command drops every packet that has the RST bit set:</span><br />
<span style="font-size:14px;">　　<strong><em><span style="color:#e53333;">iptables -A INPUT -p tcp --tcp-flags RST RST -j DROP</span></em></strong></span></p>
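<p><span style="font-size:14px;">To make the rule's matching logic concrete, here is a small model of how <code>--tcp-flags mask comp</code> is documented to behave: only the flags listed in the mask are examined, and of those, exactly the flags listed in comp must be set. The function below is an illustration written for this post and is not part of iptables.</span></p>

```python
def tcp_flags_match(packet_flags, mask, comp):
    """Model of `--tcp-flags mask comp`: look only at the flags named in
    mask; the rule matches when exactly the flags in comp are set."""
    examined = set(packet_flags).intersection(mask)
    return examined == set(comp)


# Rule from the post: --tcp-flags RST RST
print(tcp_flags_match({"PSH", "RST", "ACK"}, {"RST"}, {"RST"}))
print(tcp_flags_match({"PSH", "ACK"}, {"RST"}, {"RST"}))
```

<p><span style="font-size:14px;">Because the mask contains only RST, the state of PSH and ACK is irrelevant: the GET request with PSH and RST set matches the rule and is dropped, while an ordinary data segment is not.</span></p>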
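<p><span style="font-size:14px;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; The server filters packets with the RST bit set regardless of whether PSH is set. The client's HTTP GET packet with PSH and RST set therefore never reached the server's transport layer, and after waiting 72 seconds without receiving any request from the client, the server itself sent an RST to release the TCP connection.</span></p>
<p><span style="font-size:14px;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; The TCP sessions sharing this traffic pattern are all essentially the same: after the three-way handshake establishes the connection, the client sends the server an HTTP GET request with both PSH and RST set, as shown below:&nbsp;</span><span style="font-size:14px;">&nbsp;</span></p>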
<p align="center"><a href="/content/plugins/kl_album/upload/201402/6220c6d2e321a411e3a50d0ded9f3c58201402071430011052456779.png" target="_blank"><img style="width:522px;height:269px;" alt="点击查看原图" src="/content/plugins/kl_album/upload/201402/6220c6d2e321a411e3a50d0ded9f3c58201402071430011052456779.png" height="251" border="0" width="480" /></a></p>
<p><span style="font-size:14px;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;A large number of such packets from the client in a short time makes the server's connection table grow sharply, and the entries cannot be released for some time, which has the effect of a DoS attack on the server.</span></p> <a href="http://www.vants.org/?post=240">Read more&gt;&gt;</a><div id="related_log" style="font-size:12px"><p><b>Related posts:</b></p><p><a href="http://www.vants.org/?post=135">关于《DDoS攻击原理及防护方法论》一文的疑问</a></p><p><a href="http://www.vants.org/?post=285">MOTS攻击之TCP攻击</a></p><p><a href="http://www.vants.org/?post=289">某省厅门户网站A市局访问异常应急处置</a></p><p><a href="http://www.vants.org/?post=281">MOTS攻击技术分析</a></p><p><a href="http://www.vants.org/?post=147">凯创交换机端口镜像设置</a></p></div>]]></description>
	<pubDate>Fri, 07 Feb 2014 06:34:10 +0000</pubDate>
	<author>易隐者</author>
	<guid>http://www.vants.org/?post=240</guid>

</item>
<item>
	<title>Another TCP Stack Anomaly</title>
	<link>http://www.vants.org/?post=231</link>
	<description><![CDATA[<p><span style="font-size:14px;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Do you remember my earlier post 《TCP确认机制异常案例》 (a case of abnormal TCP acknowledgment behavior; link: http://www.vants.org/?post=200)? Today, at a customer site, I ran into another TCP stack anomaly.</span></p>
<p><span style="font-size:14px;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; The customer reported that business transactions were failing and that the cause was hard to pin down. On site, I analyzed the packet exchange at the moment of failure, shown below:</span></p>
<p align="center"><span style="font-size:14px;"><a href="/content/plugins/kl_album/upload/201310/48f4f684af4e99c606e0a538b4b6b7b820131029164759342554406.png" target="_blank"><img style="width:694px;height:318px;" alt="点击查看原图" src="/content/plugins/kl_album/upload/201310/48f4f684af4e99c606e0a538b4b6b7b820131029164759342554406.png" border="0" height="177" width="480" /></a></span></p>
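<p><span style="font-size:14px;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; The F5 device sends a SYN to the server, the server replies with SYN/ACK, and after confirming with an ACK the F5 sends three consecutive application request packets of 1514, 1514 and 350 bytes. However, 1.199 seconds later the F5 retransmits the first request packet (packet No. 7, which is a retransmission of packet No. 4). </span><span style="font-size:14px;">Then, 2 seconds later, we see the server retransmit its SYN/ACK (packet No. 8, a retransmission of packet No. 2). The next several tens of seconds consist almost entirely of the F5 retransmitting the request packets and the server retransmitting its SYN/ACK, until after 40 seconds the F5 sends an RST to release the TCP connection.</span></p>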
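<p><span style="font-size:14px;">The tell-tale pattern here, a SYN/ACK retransmitted after the client has already acknowledged it, can be checked mechanically. The sketch below works on a simplified trace of (sender, flags) pairs; the trace format and function name are assumptions made for illustration and are not an export format of any capture tool.</span></p>

```python
def synack_retransmitted_after_ack(packets):
    """packets: list of (sender, flags) in capture order, where sender is
    "client" or "server" and flags is a set such as {"SYN", "ACK"}.
    Returns True if the server repeats its SYN/ACK after the client
    has already sent a non-SYN segment carrying ACK."""
    synack_seen = False
    client_acked = False
    for sender, flags in packets:
        if sender == "server" and flags == {"SYN", "ACK"}:
            if synack_seen and client_acked:
                return True
            synack_seen = True
        elif sender == "client" and synack_seen:
            if "ACK" in flags and "SYN" not in flags:
                client_acked = True
    return False


trace = [
    ("client", {"SYN"}),          # No. 1
    ("server", {"SYN", "ACK"}),   # No. 2
    ("client", {"ACK"}),          # No. 3
    ("client", {"ACK"}),          # No. 4, 1514-byte request
    ("client", {"ACK"}),          # No. 5, 1514-byte request
    ("client", {"PSH", "ACK"}),   # No. 6, 350-byte request
    ("client", {"ACK"}),          # No. 7, retransmission of No. 4
    ("server", {"SYN", "ACK"}),   # No. 8, retransmission of No. 2
]
print(synack_retransmitted_after_ack(trace))
```

<p><span style="font-size:14px;">The flags on packets No. 4 to No. 7 are guesses, since the post only shows them in a screenshot; the check depends only on the order of the SYN/ACK and ACK segments.</span></p>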
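<p><span style="font-size:14px;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; From the exchange as a whole we can clearly see that the server keeps retransmitting its SYN/ACK, which shows that it did not properly process the F5's ACK (packet No. 3). From the F5's point of view the TCP connection is established; from the server's point of view it is not, so the server keeps retransmitting SYN/ACK. Why did the server fail to process the ACK that completes the three-way handshake, even though it clearly received it? Moreover, every request packet the F5 retransmitted afterwards (No. 7, No. 10, No. 11, No. 14, No. 17) also acknowledges the server's SYN/ACK, yet the server ignored all of them. We can therefore be fairly confident that the TCP stack on the server is misbehaving, and that this is what broke the application.</span></p> <a href="http://www.vants.org/?post=231">Read more&gt;&gt;</a><div id="related_log" style="font-size:12px"><p><b>Related posts:</b></p><p><a href="http://www.vants.org/?post=279">某业务系统访问慢分析</a></p><p><a href="http://www.vants.org/?post=285">MOTS攻击之TCP攻击</a></p><p><a href="http://www.vants.org/?post=289">某省厅门户网站A市局访问异常应急处置</a></p><p><a href="http://www.vants.org/?post=281">MOTS攻击技术分析</a></p><p><a href="http://www.vants.org/?post=300">SharkFest'19 Retrospective</a></p></div>]]></description>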
	<pubDate>Tue, 29 Oct 2013 08:47:12 +0000</pubDate>
	<author>易隐者</author>
	<guid>http://www.vants.org/?post=231</guid>

</item>
<item>
	<title>Case Study: A Policy False Positive Causes Application Save Failures</title>
	<link>http://www.vants.org/?post=216</link>
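	<description><![CDATA[<div><span style="line-height:1.5;font-size:14px;">&nbsp; &nbsp; &nbsp; &nbsp;One user's application needs to submit modified information to the business server to be saved, and the save fails (after clicking the Save button, the IE page's progress bar stays in the loading state for a long time). As a favor, I went on site and captured the packets exchanged during a save, shown below:</span></div>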
<div style="text-align:center;"><a target="_blank" href="/content/plugins/kl_album/upload/201304/6f94dab888a60f4ee853e1595dd1500e20130416141014591728737.jpg"></a><img src="/content/plugins/kl_album/upload/201304/6f94dab888a60f4ee853e1595dd1500e20130416141014591728737.jpg" title="点击查看原图" alt="点击查看原图" border="0" height="250" width="700" /><br />
</div>
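<p><span style="font-size:14px;">&nbsp; &nbsp; &nbsp; &nbsp;Looking at this exchange, we can see very clearly that the TCP three-way handshake between client and server is normal; the problem is that the packet carrying the client's POST request is dropped by an intermediate device. Interestingly, after two retransmission attempts the client rather cleverly splits the POST payload (859 B) into two segments of 536 B and 323 B and retransmits the 536 B segment first. The server acknowledges that segment, which shows that a segment of this length reached the application server intact, but the 323 B segment that follows keeps being dropped. This strongly suggests a policy false positive on an intermediate device that consistently drops certain specific packets. Let's see what application data this segment actually carries, starting with its decoded payload:</span></p>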
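<p><span style="font-size:14px;">The 536 B + 323 B split is what you get when an 859-byte payload is cut into segments of at most 536 bytes, the classic default TCP MSS. Whether the client actually fell back to that value is an assumption here; the helper below only reproduces the arithmetic.</span></p>

```python
def split_payload(total_len, segment_size):
    """Return the sizes of the segments needed to carry total_len bytes
    when no segment may exceed segment_size bytes."""
    sizes = []
    remaining = total_len
    while remaining > 0:
        chunk = min(remaining, segment_size)
        sizes.append(chunk)
        remaining -= chunk
    return sizes


print(split_payload(859, 536))  # [536, 323]
```
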
<p style="text-align:center;"><span style="font-size:14px;"><a target="_blank" href="/content/plugins/kl_album/upload/201304/7719843f06ef965f3ce5d35244b3725f201304161410171181527905.jpg"><img src="/content/plugins/kl_album/upload/201304/7719843f06ef965f3ce5d35244b3725f201304161410171181527905.jpg" title="点击查看原图" alt="点击查看原图" border="0" height="352" width="615" /></a><br />
</span></p>
<div><span style="font-size:14px;">&nbsp; &nbsp; &nbsp; &nbsp;</span><span style="font-size:14px;line-height:1.5;">That is hard on the eyes, so let's decode the string, as shown below:</span></div>
<div style="text-align:center;"><a target="_blank" href="/content/plugins/kl_album/upload/201304/5ca43a124ae480867981f0f521d1ccc420130416141009277170427.jpg"><img src="/content/plugins/kl_album/upload/201304/5ca43a124ae480867981f0f521d1ccc420130416141009277170427.jpg" title="点击查看原图" alt="点击查看原图" border="0" height="180" width="680" /></a><br />
</div>
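<div><span style="font-size:14px;">A decoding step like this can be sketched as follows. The post does not say which encoding the payload used, so URL encoding is assumed, and both the sample string and the keyword list are invented for illustration; they are not the rules of any real WAF or IPS.</span></div>

```python
from urllib.parse import unquote_plus

# Hypothetical keywords a naive SQL-injection signature might look for.
SUSPICIOUS_KEYWORDS = ("varchar(", "union select", "drop table")


def flagged_keywords(encoded_body, keywords=SUSPICIOUS_KEYWORDS):
    """URL-decode a form body and list the keywords that appear in it."""
    decoded = unquote_plus(encoded_body).lower()
    return [word for word in keywords if word in decoded]


# Made-up form body; the real payload is only shown as a screenshot.
sample = "field=title+varchar%2860%29"
print(unquote_plus(sample))
print(flagged_keywords(sample))
```

<div><span style="font-size:14px;">A device that matches on decoded keywords alone cannot tell a harmless form field from an injection attempt, which is how a false positive of this kind arises.</span></div>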
<div><span style="font-size:14px;line-height:1.5;">&nbsp; &nbsp; &nbsp; &nbsp;After decoding, we can see keywords such as "varchar(60)" in this payload, which may lead a WAF, IPS or similar device to misclassify the packet as an SQL injection attempt.</span><span style="line-height:1.5;font-size:14px;">&nbsp; &nbsp; &nbsp;&nbsp;</span></div> <a href="http://www.vants.org/?post=216">Read more&gt;&gt;</a><div id="related_log" style="font-size:12px"><p><b>Related posts:</b></p><p><a href="http://www.vants.org/?post=202">可能的URL超长导致丢包案例</a></p><p><a href="http://www.vants.org/?post=290">省厅A登陆省厅B预算系统异常应急处置</a></p><p><a href="http://www.vants.org/?post=231">又遇TCP协议栈异常问题</a></p><p><a href="http://www.vants.org/?post=291">省局门户网站地市信息公开栏目访问异常应急处置</a></p><p><a href="http://www.vants.org/?post=73">视频点播服务间歇性中断故障分析案例</a></p></div>]]></description>
	<pubDate>Sat, 20 Apr 2013 06:13:55 +0000</pubDate>
	<author>易隐者</author>
	<guid>http://www.vants.org/?post=216</guid>

</item>
<item>
	<title>My Take on 《某公司QQ掉线分析案例》 (A Company's QQ Disconnection Analysis Case)</title>
	<link>http://www.vants.org/?post=215</link>
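	<description><![CDATA[<div>&nbsp; &nbsp; &nbsp; &nbsp;<span style="font-size:14px;line-height:1.5;">A while ago I saw a QQ disconnection troubleshooting case on the CSNA forum. The original is a PDF, so I will not reproduce it in full; the link is http://www.csna.cn/network-analyst-50520-1-1.html, and you can download and read it from there.</span></div>
<p><span style="font-size:14px;">&nbsp; &nbsp; &nbsp; &nbsp;After reading through the analysis I had some doubts, so I set aside time to write a rough analysis of the case for discussion and reference. I had planned to draw the relevant interaction diagrams, but for lack of time I am skipping them this time; please bear with me.</span></p>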
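<p><b><span style="font-size:14px;">1.</span><span class="Apple-tab-span" style="white-space:pre;font-size:14px;">	</span><span style="font-size:14px;">Many QQ users drop offline at the same time and reconnect automatically after about 1 second; web browsing and other applications are unaffected &nbsp;</span></b></p>
<p><span style="font-size:14px;">&nbsp; &nbsp; &nbsp; &nbsp;QQ is an endpoint application. In general, when an application on a LAN drops or slows down for many users at once, a problem on a single endpoint can largely be ruled out: the symptom points to something global, such as the application itself, its servers, or the network carrying it. In this case the author notes that QQ dropped for everyone at once while web browsing and other applications were fine. Given that description, my first thought would be: could the whole Internet egress have suffered a momentary outage? This is only my guess. If the egress really did drop for a moment, HTTP and other TCP-based applications would ride it out thanks to TCP retransmission, and users would notice nothing unusual during an outage of about 1 second, whereas UDP-based applications might drop offline, with QQ logging back in automatically once the egress recovered. This guess at least explains the symptoms the author describes at the outset.</span></p>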
<p><b><span style="font-size:14px;">2.</span><span class="Apple-tab-span" style="white-space:pre;font-size:14px;">	</span><span style="font-size:14px;">The QQ disconnection was caused by the QQ server actively resetting (RST) the connection; Tencent's explanation was that the client's IP address had changed.</span></b></p>
<p><b><i><span style="font-size:14px;">1) </span><span style="font-size:14px;">QQ should be using UDP most of the time and TCP only in a minority of cases.</span></i></b></p>
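<p><span style="font-size:14px;line-height:1.5;">&nbsp; &nbsp; &nbsp; &nbsp;QQ is a fairly complex application. To keep working in all sorts of environments it supports several modes and network behaviors: by default it mostly uses UDP, in a minority of cases it uses TCP (including TCP ports 80 and 443), and in special cases, depending on need and configuration, it can also communicate through a proxy.</span></p>
<div><span style="font-size:14px;">&nbsp; &nbsp; &nbsp; &nbsp;It is exactly this variety and complexity in how QQ works that makes its faults hard to analyze and locate. In this case QQ dropped on a large scale, and when analyzing it we really cannot tell which mode each user's QQ was running in. Their network behavior differs widely, so it is hard to analyze and verify the cause of the fault comprehensively from a single test machine.</span></div>
<p><span style="font-size:14px;">&nbsp; &nbsp; &nbsp; &nbsp;In this case the QQ client on the test machine appears to have been using TCP port 443, but in the vast majority of cases QQ communicates over UDP. By my own fairly strict standards of analysis, the choice of test machine therefore deserves further scrutiny.</span></p>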
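<p><b><i><span style="font-size:14px;">2) </span><span style="font-size:14px;">Even if every QQ client on this customer's network were using TCP, and even if, as Tencent says, the QQ server sent the RST to the client because it noticed the user's IP had changed, then:</span></i></b></p>
<div><span style="font-size:14px;">&nbsp; &nbsp; &nbsp; &nbsp;First, each QQ client establishes its own connection to a QQ server. Leaving aside the variety of QQ servers (different clients may well be talking to N different servers), the clients and servers hold separate connections. Even if a bug in the firewall's NAT suddenly changed the source IP of some existing connections, it should very rarely change the source IP of the existing connections of N different QQ clients at the same moment. In other words, if the cause were what the author of that case says, the simultaneous disconnections would be hard to explain.</span></div>
<div><span style="font-size:14px;">&nbsp; &nbsp; &nbsp; &nbsp;Second, let's look at it from the QQ server's side: what would the server do if this really happened? </span><span style="font-size:14px;line-height:1.5;">On receiving a packet from a QQ client whose source IP has changed, the server would treat it as belonging to a new TCP connection (the five-tuple is different). The server has no state for that connection, so it would answer the packet directly with an RST, and an RST of this kind, sent in reply to a segment that itself carries an ACK but matches no connection, does not have its ACK bit set. Yet the screenshot from the author of the case shows the following:</span></div>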
<div style="text-align:center;"><span style="font-size:14px;">&nbsp;<a target="_blank" href="/content/plugins/kl_album/upload/201304/bb5c55ad1455c3151170ca3705e8a74420130414175201792293185.png"><img src="/content/plugins/kl_album/upload/201304/bb5c55ad1455c3151170ca3705e8a74420130414175201792293185.png" width="480" height="298" alt="点击查看原图" border="0" /></a></span></div>
<p><span style="font-size:14px;">The ACK bit of this RST is set, so the RST was not produced under the circumstances the author imagines.</span></p>
<p><span style="font-size:14px;">&nbsp; &nbsp; &nbsp; &nbsp;Third, let's look at another screenshot:</span></p>
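<p><span style="font-size:14px;">The claim about the ACK bit follows from the reset-generation rule in the TCP specification (RFC 793) for a segment that arrives for a connection that does not exist. A minimal model of that rule is sketched below; the function and field names are illustrative.</span></p>

```python
def reset_for_unknown_connection(seg_seq, seg_len, seg_ack=None):
    """RFC 793 reset generation when no matching connection exists.
    seg_ack is None when the incoming segment carries no ACK.
    seg_len is the sequence space the segment occupies."""
    if seg_ack is not None:
        # Incoming segment has ACK: RST takes its SEQ from that ACK field
        # and does not set the ACK bit.
        return {"flags": {"RST"}, "seq": seg_ack}
    # Incoming segment has no ACK: SEQ is 0 and the reset acknowledges
    # the segment, so both RST and ACK are set.
    return {"flags": {"RST", "ACK"}, "seq": 0, "ack": seg_seq + seg_len}


# A data segment of an already established connection always carries ACK.
reply = reset_for_unknown_connection(seg_seq=1000, seg_len=50, seg_ack=2000)
print(sorted(reply["flags"]))
```

<p><span style="font-size:14px;">Packets of an established QQ session all carry ACK, so a server that had no state for them would reply with a bare RST rather than the RST/ACK seen in the screenshot.</span></p>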
<div style="text-align:center;"><a target="_blank" href="/content/plugins/kl_album/upload/201304/84554f4ea7cdc4d0a3c795e6ba2460c1201304141752001075498348.png"><img src="/content/plugins/kl_album/upload/201304/84554f4ea7cdc4d0a3c795e6ba2460c1201304141752001075498348.png" width="480" height="275" alt="点击查看原图" border="0" /></a><br />
</div>
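<div><span style="font-size:14px;">&nbsp; &nbsp; &nbsp; &nbsp;&nbsp;</span><span style="font-size:14px;line-height:1.5;">In this screenshot the QQ server has already acknowledged the client's packet (packet 2230), and immediately afterwards (0.000045 seconds later) it sends the client an RST (packet 2232). That RST was certainly sent right after the server sent packet 2230. If the author's explanation were correct, the sequence would have to be: client packet 2231 passes through the firewall's source NAT (its source IP changes), reaches the QQ server, the server notices the changed source address, and only then sends an RST to the client. Only that order would make sense, but the screenshot above shows plainly that this is not what happened.</span></div>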
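<p><span style="font-size:14px;line-height:1.5;"><b>3. If the firewall's NAT really had such a basic bug, the applications hit hardest would be HTTP and other TCP-based applications. Many HTTP applications rely on several different TCP connections to the server for things like dynamic passwords and CAPTCHAs, so the firewall bug described by the author would break access to those applications. The problem would not be confined to QQ, a comparatively unimportant application.</b></span></p>
<p><span style="font-size:14px;line-height:1.5;"><b>4. QQ dropping while other TCP applications stay normal may well be caused by the firewall's UDP session timeout being too short.</b></span></p>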
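<p><span style="font-size:14px;line-height:1.5;">&nbsp; &nbsp; &nbsp; &nbsp;Many people leave QQ running without chatting all the time. To save resources, firewalls usually set the UDP session timeout to a fairly short value, such as 60 seconds. If a user exchanges no data within that timeout, the firewall removes the UDP session from its connection table. When the QQ client next tries to talk to the QQ server, the status update it sends is dropped by the firewall and the client goes offline. The client then initiates a new connection, which the firewall handles as a newly created UDP session, so QQ comes back online. Admittedly, this does not easily explain why QQ dropped for many users at the same time.</span></p>
<div><span style="font-size:14px;">&nbsp; &nbsp; &nbsp; &nbsp;Another possibility is that the firewall's UDP connection table malfunctions and the firewall suddenly flushes it, which could also produce the symptoms seen in this case.</span></div>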
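<p><span style="font-size:14px;">An idle-timeout session table of the kind described can be modeled in a few lines. This is a deliberately simplified sketch: the class and method names are invented, and it rejects reply traffic once an entry has gone idle, which is one common way such a timeout shows up and not necessarily how the firewall in that case behaved.</span></p>

```python
class UdpSessionTable:
    """Toy model of a firewall's UDP session table with an idle timeout."""

    def __init__(self, idle_timeout_s=60):
        self.idle_timeout_s = idle_timeout_s
        self.last_seen = {}  # flow -> time of last packet

    def outbound(self, flow, now):
        """A packet from the inside host creates or refreshes the session."""
        self.last_seen[flow] = now

    def inbound_allowed(self, flow, now):
        """A reply is allowed only while the session has not gone idle."""
        seen = self.last_seen.get(flow)
        if seen is None or now - seen > self.idle_timeout_s:
            self.last_seen.pop(flow, None)  # expired entry is removed
            return False
        self.last_seen[flow] = now
        return True


table = UdpSessionTable(idle_timeout_s=60)
flow = ("10.0.0.5", 4000, "198.51.100.7", 8000)  # made-up addresses and ports
table.outbound(flow, now=0)
print(table.inbound_allowed(flow, now=30))    # within the timeout
print(table.inbound_allowed(flow, now=120))   # idle for 90 s, entry expired
```

<p><span style="font-size:14px;">Once the client sends again, a new entry is created and traffic flows normally, which matches the "drops, then reconnects" symptom for a single user.</span></p>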
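<p><span style="font-size:14px;"><b>5. With a NAT pool, the NAT device reuses the pool's addresses according to some algorithm (based on source IP, on connection, and so on). Most algorithms are probably based on the source IP. If the choice is made per packet or per connection instead, all kinds of problems are likely to follow. I have a classic case on exactly this, 《移动无线VPN客户端隧道建立故障分析》 (a tunnel setup failure for a mobile wireless VPN client), which I will publish on my blog another time for reference.&nbsp;</b></span></p>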
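<p><span style="font-size:14px;">The difference between the two selection strategies can be shown with a small sketch. The pool addresses, the CRC32 hash and the function names are assumptions made for illustration; real NAT devices use their own algorithms.</span></p>

```python
import zlib

# Hypothetical NAT pool (documentation address range).
POOL = ["203.0.113.10", "203.0.113.11", "203.0.113.12"]


def pick_by_source_ip(src_ip, pool=POOL):
    """Every connection from the same inside host gets the same public IP."""
    return pool[zlib.crc32(src_ip.encode()) % len(pool)]


def pick_by_connection(src_ip, src_port, dst_ip, dst_port, pool=POOL):
    """The public IP depends on the whole connection, so one inside host
    can appear under several public IPs at once."""
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}"
    return pool[zlib.crc32(key.encode()) % len(pool)]


per_host = {pick_by_source_ip("10.0.0.5") for _ in range(100)}
per_conn = {
    pick_by_connection("10.0.0.5", port, "198.51.100.7", 443)
    for port in range(1024, 1124)
}
print(len(per_host), len(per_conn))
```

<p><span style="font-size:14px;">With per-connection selection, an application that spreads one login across several TCP connections may reach the server from different public addresses, which is the kind of problem the paragraph above warns about.</span></p>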
<p><span style="font-size:14px;line-height:1.5;"><b><span style="font-size:16px;">Brief summary:</span></b></span></p>
<div><span style="font-size:14px;">&nbsp; &nbsp; &nbsp; &nbsp;With a NAT pool it is normal for the firewall to vary the translated address; what matters is the firewall's NAT algorithm. The QQ disconnections are most likely unrelated to this, and the RST from the QQ server analyzed in that case has nothing to do with the source address changing after NAT.</span></div> <a href="http://www.vants.org/?post=215">Read more&gt;&gt;</a><div id="related_log" style="font-size:12px"><p><b>Related posts:</b></p><p><a href="http://www.vants.org/?post=135">关于《DDoS攻击原理及防护方法论》一文的疑问</a></p><p><a href="http://www.vants.org/?post=285">MOTS攻击之TCP攻击</a></p><p><a href="http://www.vants.org/?post=289">某省厅门户网站A市局访问异常应急处置</a></p><p><a href="http://www.vants.org/?post=281">MOTS攻击技术分析</a></p><p><a href="http://www.vants.org/?post=300">SharkFest'19 Retrospective</a></p></div>]]></description>
	<pubDate>Sun, 14 Apr 2013 09:52:03 +0000</pubDate>
	<author>易隐者</author>
	<guid>http://www.vants.org/?post=215</guid>

</item>
<item>
	<title>Case Study: An IDS Malfunction Causes a Business Access Failure</title>
	<link>http://www.vants.org/?post=214</link>
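	<description><![CDATA[<p><p><p><span style="font-size:14px;line-height:21px;"><b><span style="font-size:16px;">[Foreword]:</span></b></span></p>
<p><span style="font-size:14px;line-height:21px;">1. This is a difficult fault I resolved some time ago at a commercial bank. An IDS device malfunctioned and replayed previously captured raw packets back into the switch through the switch's mirror port. MAC address learning had not been disabled on that mirror port, so the MAC table of the business servers' switch became corrupted and access to the business servers failed.</span></p>
<p><span style="font-size:14px;line-height:21px;">2. This is a very unusual fault, so it is worth studying.</span></p>
<p><span style="font-size:14px;line-height:21px;">3. Because of my personal plans, my work is going through some changes and I have had a lot to deal with, so I have not updated the blog for a while. Please bear with me, and take this older write-up as my blog update for March.</span></p>
<p><b style="font-size:14px;line-height:21px;"><span style="font-size:16px;">[Full case]:</span></b></p>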
</p>
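<p><span style="font-size:14px;">&nbsp; &nbsp; &nbsp; &nbsp; One evening a while ago, a friend at a certain organization called me. One of their core business systems had failed during the day and had stayed unresolved for a long time; engineers from many vendors were already on site, and he asked me to come and help pin down the cause. I took a taxi straight there. The situation was as follows:</span></p>
<p><span style="font-size:14px;line-height:1.5;"><b>Fault topology</b></span></p>
<p><span style="font-size:14px;line-height:1.5;">&nbsp; &nbsp; &nbsp; &nbsp; The simplified network topology is as follows:&nbsp;</span></p>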
<p style="text-align:center;"><span style="font-size:14px;line-height:1.5;"><a target="_blank" href="/content/plugins/kl_album/upload/201303/046747e9eb99be00a19a6bcf1ce1c627201303291649201486978934.png"><img src="/content/plugins/kl_album/upload/201303/046747e9eb99be00a19a6bcf1ce1c627201303291649201486978934.png" width="480" height="258" alt="点击查看原图" border="0" /></a><br />
</span></p>
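<p><span style="font-size:14px;line-height:1.5;"><b>&nbsp;Symptoms</b></span></p>
<p><span style="font-size:14px;">&nbsp; &nbsp; &nbsp; &nbsp;Access to the core business system was abnormal and the service was interrupted. A test PC was connected directly to the business servers' switch and given an IP address in the servers' subnet; pinging the server from it then failed.</span></p>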
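<p><span style="font-size:14px;line-height:1.5;"><b>Fault analysis</b></span></p>
<p><span style="font-size:14px;line-height:1.5;">&nbsp; &nbsp; &nbsp; &nbsp;During the fault the switch's MAC table was abnormal: the MAC address of the firewall's ETH2 port and the MAC address of the core server both appeared on the switch's mirror destination port, which is what disrupted access to the core business system.</span></p>
<p><span style="font-size:14px;line-height:1.5;"><b>Why the business switch's MAC table became abnormal:</b></span></p>
<p><span style="font-size:14px;">&nbsp; &nbsp; &nbsp; &nbsp;A switch builds its MAC table by learning source addresses. On an H3C switch, the mirror destination port can by default still take part in forwarding. If the MAC addresses of the firewall's E2 port and of the business server appear on the mirror destination port, the only explanation is that this port received frames whose source MAC was the firewall's E2 port or the business server. Yet a mirror destination port connected to an IDS monitoring port should normally receive nothing at all, because the IDS monitoring port only listens passively to the frames sent to it and never sends frames into the network.</span></p>
<p><span style="font-size:14px;line-height:1.5;"><b>Here is a bold guess:</b></span></p>
<div><span style="font-size:14px;">&nbsp; &nbsp; &nbsp; &nbsp;Under some extreme condition the IDS malfunctioned and replayed previously captured packets out of its monitoring port. Following its source-address learning mechanism, switch 1 then recorded the MAC address of the firewall's E2 port and the MAC address of the business server against the mirror destination port. This explains the whole fault neatly, as shown below:&nbsp;</span></div>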
<div style="text-align:center;"><a target="_blank" href="/content/plugins/kl_album/upload/201303/3a21244b5103cc7dcf87ea86068ce6dc201303291649201342361459.png"><img src="/content/plugins/kl_album/upload/201303/3a21244b5103cc7dcf87ea86068ce6dc201303291649201342361459.png" width="480" height="265" alt="点击查看原图" border="0" /></a><br />
</div>
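<p><b style="font-size:14px;line-height:1.5;">&nbsp;How the fault was resolved:</b></p>
<p><span style="font-size:14px;line-height:1.5;">&nbsp; &nbsp; &nbsp; &nbsp;Disable MAC address learning on the switch's mirror destination port. Then, even if that port receives a large number of such frames, the switch's normal MAC table entries are not affected.</span></p>
<div><span style="font-size:14px;">&nbsp; &nbsp; &nbsp; &nbsp;Was the IDS really sending previously captured packets to the mirror destination port? We could confirm this by inserting a TAP inline and capturing the traffic on the IDS monitoring port. If so, why would the IDS do that? That remains an open question which I think only the vendor can answer. Technical matters can be pursued to the bottom; there are not that many unknowns.</span></div>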
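<p><span style="font-size:14px;">The effect of source-address learning on the mirror port, and of switching it off, can be reproduced with a toy model. The class, port numbers and MAC address below are invented for illustration and do not represent H3C configuration.</span></p>

```python
class LearningSwitch:
    """Toy switch that learns source MAC addresses per ingress port."""

    def __init__(self, no_learning_ports=()):
        self.mac_table = {}  # MAC address -> port
        self.no_learning_ports = set(no_learning_ports)

    def receive(self, ingress_port, src_mac):
        """Record the source MAC unless learning is disabled on the port."""
        if ingress_port not in self.no_learning_ports:
            self.mac_table[src_mac] = ingress_port

    def port_for(self, dst_mac):
        return self.mac_table.get(dst_mac)


SERVER_MAC = "00:11:22:33:44:55"   # hypothetical business server
SERVER_PORT, MIRROR_PORT = 1, 24   # hypothetical port numbers

# Default behavior: a replayed frame on the mirror port poisons the table.
switch = LearningSwitch()
switch.receive(SERVER_PORT, SERVER_MAC)
switch.receive(MIRROR_PORT, SERVER_MAC)   # frame replayed by the IDS
print(switch.port_for(SERVER_MAC))        # traffic for the server now goes to the IDS

# With learning disabled on the mirror port the entry is left alone.
fixed = LearningSwitch(no_learning_ports=[MIRROR_PORT])
fixed.receive(SERVER_PORT, SERVER_MAC)
fixed.receive(MIRROR_PORT, SERVER_MAC)
print(fixed.port_for(SERVER_MAC))
```
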
</p> <a href="http://www.vants.org/?post=214">Read more&gt;&gt;</a><div id="related_log" style="font-size:12px"><p><b>Related posts:</b></p><p><a href="http://www.vants.org/?post=161">交换机等网络设备端口镜像设置系列文章的说明</a></p><p><a href="http://www.vants.org/?post=155">Foundry 交换机端口镜像设置</a></p><p><a href="http://www.vants.org/?post=145">Linktrust SG端口镜像设置</a></p><p><a href="http://www.vants.org/?post=159">爱立信SE800型号的路由器端口镜像设置</a></p><p><a href="http://www.vants.org/?post=154">Extreme 交换机端口镜像设置</a></p></div>]]></description>
	<pubDate>Fri, 29 Mar 2013 08:46:13 +0000</pubDate>
	<author>易隐者</author>
	<guid>http://www.vants.org/?post=214</guid>

</item></channel>
</rss>