Notes on a File Descriptor Leak

2023-06-30 06:06:21

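Symptom

We received an alert: service XX had too many open files, reaching YY% of the limit.

Verifying the problem

1. Check the file descriptors

Enter the container:

# lsof produced no output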
# ls -lrt /proc/1/fd

lr-x------ 1 root root 64 Jun 14 17:04 258 -> pipe:[3198254937]
l-wx------ 1 root root 64 Jun 14 17:04 259 -> pipe:[3198254937]

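There were indeed more than 20,000 fds, and the pipe fds came in read/write pairs sharing one inode, as in the two lines above.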
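With that many entries, a per-kind summary is easier to read than the raw listing. The following is a minimal sketch, assuming a Linux /proc filesystem; the class name FdSummary and the coarse kinds it reports are illustrative choices, not something from the original investigation.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

public class FdSummary {
    /** Maps an fd link target such as "pipe:[3198254937]" to a coarse kind. */
    static String kindOf(String target) {
        if (target.startsWith("pipe:")) return "pipe";
        if (target.startsWith("socket:")) return "socket";
        if (target.startsWith("anon_inode:")) return target; // e.g. anon_inode:[eventpoll]
        return "file";
    }

    /** Counts the entries of a /proc/<pid>/fd directory by kind. */
    static Map<String, Integer> summarize(Path fdDir) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();
        try (DirectoryStream<Path> fds = Files.newDirectoryStream(fdDir)) {
            for (Path fd : fds) {
                String target;
                try {
                    target = Files.readSymbolicLink(fd).toString();
                } catch (IOException e) {
                    continue; // the fd was closed while we were listing
                }
                counts.merge(kindOf(target), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Pass a pid as the first argument, or omit it to inspect this JVM.
        String pid = args.length > 0 ? args[0] : "self";
        summarize(Paths.get("/proc", pid, "fd"))
                .forEach((kind, n) -> System.out.println(n + "\t" + kind));
    }
}
```

A count dominated by "pipe" and "anon_inode:[eventpoll]" would point toward selectors rather than ordinary files.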

2. Check with arthas

[arthas@1]$ jvm 
...
---------------------------------------------------
 FILE-DESCRIPTOR
---------------------------------------------------
 MAX-FILE-DESCRIPTOR-COUNT     65536
 OPEN-FILE-DESCRIPTOR-COUNT    22178
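The JDK also exposes an open and a maximum fd count through com.sun.management.UnixOperatingSystemMXBean, so the same check can be done from inside the process. This is a small sketch under the assumption of a Unix-like JVM; the class FdUsage and its helper names are made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

public class FdUsage {
    /** Returns {open, max} fd counts, or null where the Unix MXBean is unavailable. */
    static long[] openAndMax() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (!(os instanceof UnixOperatingSystemMXBean)) {
            return null;
        }
        UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
        return new long[] {
            unix.getOpenFileDescriptorCount(),
            unix.getMaxFileDescriptorCount()
        };
    }

    /** Percentage of the fd limit currently in use. */
    static double percentUsed(long open, long max) {
        return 100.0 * open / max;
    }

    public static void main(String[] args) {
        long[] counts = openAndMax();
        if (counts == null) {
            System.out.println("fd counts are not available on this platform");
            return;
        }
        System.out.printf("open=%d max=%d used=%.1f%%%n",
                counts[0], counts[1], percentUsed(counts[0], counts[1]));
    }
}
```

For the figures shown above, percentUsed(22178, 65536) comes to roughly 33.8%.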

 

Analysis

1. Given a file descriptor id, how do we find the method that created it?

2. In the Java world, what creates pipes? NIO? Running shell commands?

Using file-leak-detector

wget -O file-leak-detector-1.15-jar-with-dependencies.jar https://repo.jenkins-ci.org/releases/org/kohsuke/file-leak-detector/1.15/file-leak-detector-1.15-jar-with-dependencies.jar
java -jar file-leak-detector-1.15-jar-with-dependencies.jar 1 http=19999,threshold=30000,strong

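It reported no unusual files that had been opened but not closed.

At this point the investigation stalled. Because the file descriptors appeared in pairs, we guessed that they were created by NIO or something similar.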
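One way to test that guess in isolation is to count the process's fds before and after opening a few selectors. The sketch below assumes a Unix-like JVM; SelectorFdProbe is a hypothetical helper, and the exact number of fds per selector depends on the JDK version and platform, so it only reports the counts.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.channels.Selector;
import com.sun.management.UnixOperatingSystemMXBean;

public class SelectorFdProbe {
    static long openFds() {
        return ((UnixOperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean())
                .getOpenFileDescriptorCount();
    }

    /** Returns {before, whileOpen, afterClose} fd counts around opening n selectors. */
    static long[] probe(int n) throws IOException {
        // Warm-up, so that class and native library loading does not skew the counts.
        Selector.open().close();
        long before = openFds();

        Selector[] selectors = new Selector[n];
        for (int i = 0; i < n; i++) {
            selectors[i] = Selector.open();
        }
        long whileOpen = openFds();

        for (Selector s : selectors) {
            s.close();
        }
        long afterClose = openFds();
        return new long[] { before, whileOpen, afterClose };
    }

    public static void main(String[] args) throws IOException {
        long[] r = probe(4);
        System.out.println("before=" + r[0] + " whileOpen=" + r[1] + " afterClose=" + r[2]);
    }
}
```

If the count rises by several fds per selector and drops again only after close(), selectors that are never closed are a plausible source of the paired pipe fds.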

OQL query

SelectorImpl

No large number of NIO objects turned up.

Using arthas watch to trace the creation and closing of the epoll-related fds

# Attach to the target process and start monitoring
# Allow JDK classes to be monitored
options unsafe true
# Creation of the pipe fds; print the read and write fds
watch sun.nio.ch.EPollSelectorImpl <init> "{target,clazz,method,params,returnObj,throwExp,@java.lang.Thread@currentThread(),@org.apache.skywalking.apm.toolkit.trace.TraceContext@traceId()}" -x 2 -n 1000000 >> /logs/arthas.log &
stack sun.nio.ch.EPollSelectorImpl <init> -x 2 -n 1000000 >> /logs/arthas.log &
# Watching IOUtil.makePipe does not work, because native methods cannot be watched
# Creation of the epoll fd
watch sun.nio.ch.EPollArrayWrapper <init> "{target,clazz,method,params,returnObj,throwExp,@java.lang.Thread@currentThread(),@org.apache.skywalking.apm.toolkit.trace.TraceContext@traceId()}" -x 2 -n 1000000 >> /logs/arthas.log &
stack sun.nio.ch.EPollArrayWrapper <init> >> /logs/arthas.log &
# Closing of the pipe fds
watch sun.nio.ch.EPollSelectorImpl implClose "{target,clazz,method,params,returnObj,throwExp,@java.lang.Thread@currentThread(),@org.apache.skywalking.apm.toolkit.trace.TraceContext@traceId()}" -x 2 -n 1000000 >> /logs/arthas.log &
stack sun.nio.ch.EPollSelectorImpl implClose -x 2 -n 1000000 >> /logs/arthas.log &
# Closing of the epoll fd
watch sun.nio.ch.EPollArrayWrapper closeEPollFD "{target,clazz,method,params,returnObj,throwExp,@java.lang.Thread@currentThread(),@org.apache.skywalking.apm.toolkit.trace.TraceContext@traceId()}" -x 2 -n 1000000 >> /logs/arthas.log &
stack sun.nio.ch.EPollArrayWrapper closeEPollFD -x 2 -n 1000000 >> /logs/arthas.log &
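The idea behind pairing these commands is to record a stack trace for every creation and then cross off the ones that are later closed; whatever remains points at the leaking call site. The following is a minimal in-process sketch of that bookkeeping. OpenCloseTracker is a hypothetical class written for illustration and is not part of arthas or file-leak-detector.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class OpenCloseTracker {
    // resource -> stack trace captured at the moment it was opened
    private final Map<Object, Throwable> open =
            Collections.synchronizedMap(new IdentityHashMap<Object, Throwable>());

    /** Records where the resource was opened and returns it unchanged. */
    public <T> T opened(T resource) {
        open.put(resource, new Throwable("opened here"));
        return resource;
    }

    /** Marks the resource as closed. */
    public void closed(Object resource) {
        open.remove(resource);
    }

    /** Stack traces of everything that was opened but never closed. */
    public List<Throwable> stillOpen() {
        synchronized (open) {
            return new ArrayList<>(open.values());
        }
    }

    public static void main(String[] args) throws Exception {
        OpenCloseTracker tracker = new OpenCloseTracker();
        java.nio.channels.Selector leaked = tracker.opened(java.nio.channels.Selector.open());
        java.nio.channels.Selector closed = tracker.opened(java.nio.channels.Selector.open());
        closed.close();
        tracker.closed(closed);

        // Only the creation site of 'leaked' is reported.
        for (Throwable site : tracker.stillOpen()) {
            site.printStackTrace();
        }
        leaked.close();
    }
}
```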

Conclusion

The problematic code looked like this:

EventLoopGroup eventLoopGroup = new NioEventLoopGroup(4);
NioClient nioClient = new NioClient(eventLoopGroup); // this line throws an exception
nioClientMap.put(key, nioClient);

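The EventLoopGroup object was garbage-collected, but the file descriptors it had created were not released. Because the second line threw, the group was never stored in nioClientMap and, as far as this snippet shows, was never shut down either.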
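The usual remedy is to release the resource on the failure path before rethrowing. With Netty that would presumably mean calling eventLoopGroup.shutdownGracefully() in a catch block around the NioClient constructor. The sketch below shows the same pattern with JDK classes only, so it runs without Netty: SelectorGroup and Client are hypothetical stand-ins for NioEventLoopGroup and NioClient, not the real classes.

```java
import java.io.IOException;
import java.nio.channels.Selector;
import java.util.ArrayList;
import java.util.List;

public class LeakFreeSetup {
    /** Hypothetical stand-in for an event loop group: it owns n selectors. */
    static final class SelectorGroup implements AutoCloseable {
        private final List<Selector> selectors = new ArrayList<>();

        SelectorGroup(int n) throws IOException {
            try {
                for (int i = 0; i < n; i++) {
                    selectors.add(Selector.open());
                }
            } catch (IOException e) {
                close(); // do not leak the selectors opened so far
                throw e;
            }
        }

        @Override
        public void close() {
            for (Selector s : selectors) {
                try {
                    s.close(); // releases the selector's fds
                } catch (IOException ignored) {
                    // best effort: keep closing the rest
                }
            }
        }
    }

    /** Hypothetical client whose constructor may throw. */
    static final class Client implements AutoCloseable {
        private final SelectorGroup group;

        Client(SelectorGroup group, boolean fail) {
            if (fail) {
                throw new IllegalStateException("simulated constructor failure");
            }
            this.group = group;
        }

        @Override
        public void close() {
            group.close();
        }
    }

    static Client newClient(boolean fail) throws IOException {
        SelectorGroup group = new SelectorGroup(4);
        try {
            return new Client(group, fail);
        } catch (RuntimeException e) {
            group.close(); // the client never took ownership, so release here
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        try {
            newClient(true);
        } catch (IllegalStateException e) {
            System.out.println("construction failed, selectors were closed");
        }
    }
}
```

The design point is ownership: until the client has been constructed successfully, the code that created the group is still responsible for closing it.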

 

Tracing how NioEventLoopGroup is created

    public NioEventLoopGroup(int nThreads, Executor executor) {
        this(nThreads, executor, SelectorProvider.provider());
    }

 

   public static SelectorProvider provider() {
        synchronized (lock) {
            if (provider != null)
                return provider;
            return AccessController.doPrivileged(
                new PrivilegedAction<SelectorProvider>() {
                    public SelectorProvider run() {
                            if (loadProviderFromProperty())
                                return provider;
                            if (loadProviderAsService())
                                return provider;
                            provider = sun.nio.ch.DefaultSelectorProvider.create();
                            return provider;
                        }
                    });
        }
    }

 

public class DefaultSelectorProvider {
    public static SelectorProvider create() {
        String osname = AccessController
            .doPrivileged(new GetPropertyAction("os.name"));
        if (osname.equals("SunOS"))
            return createProvider("sun.nio.ch.DevPollSelectorProvider");
        if (osname.equals("Linux"))
            return createProvider("sun.nio.ch.EPollSelectorProvider");
        return new sun.nio.ch.PollSelectorProvider();
    }
}
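To confirm which provider a given JVM actually ends up with, it is enough to print the class of SelectorProvider.provider(). This is a small sketch; the class name ProviderCheck is illustrative. The result can differ if the java.nio.channels.spi.SelectorProvider system property or a service provider overrides the default.

```java
import java.nio.channels.spi.SelectorProvider;

public class ProviderCheck {
    /** Class name of the system-wide default selector provider. */
    static String providerName() {
        return SelectorProvider.provider().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(System.getProperty("os.name") + " -> " + providerName());
    }
}
```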

 

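EPollSelectorImpl(SelectorProvider sp) throws IOException {
    super(sp);
    // Create a pipe; both fds come back packed into one long.
    // The pipe is used to wake up a selector that is blocked in select().
    long pipeFds = IOUtil.makePipe(false);
    // Keep the read end (high 32 bits) and the write end (low 32 bits)
    fd0 = (int) (pipeFds >>> 32);
    fd1 = (int) pipeFds;
    // Create the epoll wrapper, which opens the epoll fd
    pollWrapper = new EPollArrayWrapper();
    // Register the pipe as the interrupt (wakeup) channel
    pollWrapper.initInterrupt(fd0, fd1);
    fdToKey = new HashMap<>();
}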
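Two details of this constructor can be tried out with plain JDK code: how two 32-bit fds fit into one long, and what the wakeup channel is for. The sketch below is illustrative only. WakeupSketch and its methods are hypothetical names, and pack() mirrors the layout implied by the shifts above rather than calling the native IOUtil.makePipe.

```java
import java.io.IOException;
import java.nio.channels.Selector;

public class WakeupSketch {
    /** Packs two fds into one long: read end in the high 32 bits, write end in the low. */
    static long pack(int readFd, int writeFd) {
        return ((long) readFd << 32) | (writeFd & 0xFFFFFFFFL);
    }

    static int readEnd(long packed) {
        return (int) (packed >>> 32);
    }

    static int writeEnd(long packed) {
        return (int) packed;
    }

    /** Calls wakeup() first, then measures how long select(timeout) blocks, in ms. */
    static long selectAfterWakeupMillis(long timeoutMs) throws IOException {
        try (Selector selector = Selector.open()) {
            selector.wakeup(); // a pending wakeup makes the next select return at once
            long start = System.nanoTime();
            selector.select(timeoutMs);
            return (System.nanoTime() - start) / 1_000_000;
        }
    }

    public static void main(String[] args) throws IOException {
        long packed = pack(258, 259); // fd numbers borrowed from the listing, for illustration
        System.out.println("read=" + readEnd(packed) + " write=" + writeEnd(packed));
        System.out.println("select returned after " + selectAfterWakeupMillis(5000) + " ms");
    }
}
```

Each selector therefore holds at least the pipe pair plus an epoll fd in this implementation, which is why a selector that is never closed shows up as paired pipe entries under /proc.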
Author: 朱****斌