Symptom
Received an alert: service XX has too many open files, at YY% of its limit.
Verifying the problem
1. Check the file descriptors
Inside the container:
# lsof            # produces no output
# ls -lrt /proc/1/fd
lr-x------ 1 root root 64 Jun 14 17:04 258 -> pipe:[3198254937]
l-wx------ 1 root root 64 Jun 14 17:04 259 -> pipe:[3198254937]
There are indeed 20,000+ fds, and the pipe entries come in pairs (fd 258/259 above are the read and write ends of the same pipe:[3198254937]).
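Without lsof, the pairing can be confirmed by grouping the /proc/1/fd symlinks by target. A minimal sketch (the class name FdPairs is made up; it needs permission to read /proc/1/fd, e.g. run as root in the container):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FdPairs {
    public static void main(String[] args) throws IOException {
        Map<String, List<String>> byTarget;
        try (Stream<Path> fds = Files.list(Paths.get("/proc/1/fd"))) {
            // group fd numbers by what the symlink points at, e.g. "pipe:[3198254937]"
            byTarget = fds.collect(Collectors.groupingBy(FdPairs::linkTarget,
                    Collectors.mapping(p -> p.getFileName().toString(), Collectors.toList())));
        }
        long total = byTarget.values().stream().mapToLong(List::size).sum();
        System.out.println("total fds: " + total);
        byTarget.entrySet().stream()
                .filter(e -> e.getKey().startsWith("pipe:"))
                .limit(10)
                .forEach(e -> System.out.println(e.getKey() + " -> " + e.getValue()));
    }

    private static String linkTarget(Path p) {
        try {
            return Files.readSymbolicLink(p).toString();
        } catch (IOException e) {
            return "?";
        }
    }
}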
2. arthas
[arthas@1]$ jvm
...
--------------------------------------------------------------
FILE-DESCRIPTOR
--------------------------------------------------------------
MAX-FILE-DESCRIPTOR-COUNT 65536
OPEN-FILE-DESCRIPTOR-COUNT 22178
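The same two counters can also be read in-process through JMX; on HotSpot/Linux the OS bean is a com.sun.management.UnixOperatingSystemMXBean (minimal sketch, class name FdCount made up):

import java.lang.management.ManagementFactory;
import com.sun.management.UnixOperatingSystemMXBean;

public class FdCount {
    public static void main(String[] args) {
        // on HotSpot/Linux the platform OS bean implements UnixOperatingSystemMXBean
        UnixOperatingSystemMXBean os = (UnixOperatingSystemMXBean)
                ManagementFactory.getOperatingSystemMXBean();
        System.out.println("open: " + os.getOpenFileDescriptorCount()
                + " / max: " + os.getMaxFileDescriptorCount());
    }
}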
Analysis
1. Given a file descriptor id, how do we trace back to the code that created it?
2. In the Java world, what creates pipes? NIO? Executing shell commands? (A quick experiment below.)
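A quick experiment for question 2 (a sketch; the exact fd cost is JDK- and platform-specific, but on Linux JDK 8 each Selector takes one epoll fd plus a pipe pair):

import java.lang.management.ManagementFactory;
import java.nio.channels.Selector;
import com.sun.management.UnixOperatingSystemMXBean;

public class SelectorFdDemo {
    public static void main(String[] args) throws Exception {
        UnixOperatingSystemMXBean os = (UnixOperatingSystemMXBean)
                ManagementFactory.getOperatingSystemMXBean();
        long before = os.getOpenFileDescriptorCount();
        for (int i = 0; i < 100; i++) {
            Selector.open(); // never closed on purpose: each open selector keeps its fds
        }
        long after = os.getOpenFileDescriptorCount();
        // on Linux JDK 8 expect roughly +300: one epoll fd plus a pipe pair per selector
        System.out.println("before=" + before + " after=" + after);
    }
}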
Using file-leak-detector
wget -O file-leak-detector-1.15-jar-with-dependencies.jar https://repo.jenkins-ci.org/releases/org/kohsuke/file-leak-detector/1.15/file-leak-detector-1.15-jar-with-dependencies.jar
# attach the agent to pid 1: http=19999 serves the open-file report over HTTP,
# threshold=30000 dumps the traces once that many files are open, strong keeps strong references to the records
java -jar file-leak-detector-1.15-jar-with-dependencies.jar 1 http=19999,threshold=30000,strong
It found no suspicious files opened but never closed.
At this point the investigation stalled. Since the fds appear in pairs (read/write ends of the same pipe), we guessed they were created by NIO-style operations.
OQL query
SELECT * FROM INSTANCEOF sun.nio.ch.SelectorImpl
The heap dump did not contain large numbers of NIO objects, suggesting the Java-side objects had already been garbage-collected while their native fds stayed open.
Watching epoll fd creation and deletion with arthas
# attach arthas to the target process, then:
# watching JDK-internal classes requires unsafe mode
options unsafe true
# pipe fd creation: print the read/write fds
watch sun.nio.ch.EPollSelectorImpl <init> "{target,clazz,method,params,returnObj,throwExp,@java.lang.Thread@currentThread(),@org.apache.skywalking.apm.toolkit.trace.TraceContext@traceId()}" -x 2 -n 1000000 >> /logs/arthas.log &
stack sun.nio.ch.EPollSelectorImpl <init> -x 2 -n 1000000 >> /logs/arthas.log &
# watching IOUtil.makePipe directly does not work: native methods cannot be instrumented
# epoll fd creation
watch sun.nio.ch.EPollArrayWrapper <init> "{target,clazz,method,params,returnObj,throwExp,@java.lang.Thread@currentThread(),@org.apache.skywalking.apm.toolkit.trace.TraceContext@traceId()}" -x 2 -n 1000000 >> /logs/arthas.log &
stack sun.nio.ch.EPollArrayWrapper <init> >> /logs/arthas.log &
# pipe fd deletion
watch sun.nio.ch.EPollSelectorImpl implClose "{target,clazz,method,params,returnObj,throwExp,@java.lang.Thread@currentThread(),@org.apache.skywalking.apm.toolkit.trace.TraceContext@traceId()}" -x 2 -n 1000000 >> /logs/arthas.log &
stack sun.nio.ch.EPollSelectorImpl implClose -x 2 -n 1000000 >> /logs/arthas.log &
# epoll fd deletion
watch sun.nio.ch.EPollArrayWrapper closeEPollFD "{target,clazz,method,params,returnObj,throwExp,@java.lang.Thread@currentThread(),@org.apache.skywalking.apm.toolkit.trace.TraceContext@traceId()}" -x 2 -n 1000000 >> /logs/arthas.log &
stack sun.nio.ch.EPollArrayWrapper closeEPollFD -x 2 -n 1000000 >> /logs/arthas.log &
Conclusion
The problematic code (located via the stack traces captured above):
EventLoopGroup eventLoopGroup = new NioEventLoopGroup(4);
NioClient nioClient = new NioClient(eventLoopGroup); // this line throws an exception
nioClientMap.put(key, nioClient);
Because of the exception, eventLoopGroup was never stored or shut down: the EventLoopGroup object itself was eventually garbage-collected, but the fds its selectors had created were never released.
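A sketch of the fix along those lines (NioClient and nioClientMap are the application's own types from the snippet above; the key point is releasing the group on the failure path):

EventLoopGroup eventLoopGroup = new NioEventLoopGroup(4);
try {
    NioClient nioClient = new NioClient(eventLoopGroup); // may still throw
    nioClientMap.put(key, nioClient);
} catch (Exception e) {
    // close the selectors (and their pipe/epoll fds) the group has already opened;
    // GC alone never releases them
    eventLoopGroup.shutdownGracefully();
    throw e;
}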
Analyzing the NioEventLoopGroup creation flow
public NioEventLoopGroup(int nThreads, Executor executor) {
this(nThreads, executor, SelectorProvider.provider());
}
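nThreads matters for the fd accounting: each child NioEventLoop opens its own Selector from that provider (a simplified paraphrase of Netty 4.x, not verbatim source):

// io.netty.channel.nio.NioEventLoop, simplified:
// every event loop gets its own Selector, hence its own pipe + epoll fds
NioEventLoop(NioEventLoopGroup parent, Executor executor, SelectorProvider selectorProvider) {
    provider = selectorProvider;
    selector = provider.openSelector();
}

SelectorProvider.provider() below returns a process-wide singleton, so the provider itself is not what leaks; the per-selector fds are.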
public static SelectorProvider provider() {
synchronized (lock) {
if (provider != null)
return provider;
return AccessController.doPrivileged(
new PrivilegedAction<SelectorProvider>() {
public SelectorProvider run() {
if (loadProviderFromProperty())
return provider;
if (loadProviderAsService())
return provider;
provider = sun.nio.ch.DefaultSelectorProvider.create();
return provider;
}
});
}
}
public class DefaultSelectorProvider {
public static SelectorProvider create() {
String osname = AccessController
.doPrivileged(new GetPropertyAction("os.name"));
if (osname.equals("SunOS"))
return createProvider("sun.nio.ch.DevPollSelectorProvider");
if (osname.equals("Linux"))
return createProvider("sun.nio.ch.EPollSelectorProvider");
return new sun.nio.ch.PollSelectorProvider();
}
}
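On Linux the chain therefore continues through EPollSelectorProvider, whose openSelector() constructs the EPollSelectorImpl shown next (JDK 8 source, simplified):

public class EPollSelectorProvider extends SelectorProviderImpl {
    public AbstractSelector openSelector() throws IOException {
        return new EPollSelectorImpl(this);
    }
}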
EPollSelectorImpl(SelectorProvider sp) throws IOException {
super(sp);
    // create a pipe and return its two fds packed in a long; used for the wakeup/timeout mechanism
long pipeFds = IOUtil.makePipe(false);
    // unpack the read end (fd0) and write end (fd1)
fd0 = (int) (pipeFds >>> 32);
fd1 = (int) pipeFds;
    // create the epoll wrapper (its constructor creates the epoll fd)
pollWrapper = new EPollArrayWrapper();
    // register the interrupt pipe with epoll; later used for wakeup/timeouts
pollWrapper.initInterrupt(fd0, fd1);
fdToKey = new HashMap<>();
}
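Putting the numbers together: each EPollSelectorImpl holds three fds (the pipe pair fd0/fd1 plus the epoll fd created inside EPollArrayWrapper), so one leaked NioEventLoopGroup(4) pins roughly 3 × 4 = 12 fds. With the failing code path hit repeatedly, that is enough to climb to the 22,178 open fds observed, and it matches the paired pipe entries seen in /proc/1/fd.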