Using the CPU profiler in gperftools

2023-10-07 07:55:25

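1. Building and installing the software

1.1 Building and installing libunwind

Download the latest libunwind source package from github.com/libunwind/libunwind/releases

Extract it into /usr/local/src

cd into the extracted source directory

./configure

make -j6

make install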

 

1.2 Building and installing gperftools

Download the latest gperftools source package from github.com/gperftools/gperftools/releases

Extract it into /usr/local/src

cd into the extracted source directory

./autogen.sh

./configure

make -j6

make install

 

2. Usage

2.1 Profiling a program that runs for a while and then exits normally

In this case the profiling calls can be inserted directly into the code. Example:

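#include <gperftools/profiler.h>
#include <stdlib.h>
#include <stdio.h>

/* Workload: repeatedly allocate and free a 120 MiB block. */
void func(void)
{
    int i;
    for (i = 0; i < 1024 * 1024; i++)
    {
        char *p = (char *)malloc(1024 * 1024 * 120);
        free(p);
    }
}

int main(void)
{
    ProfilerStart("test.prof");  /* start sampling; the profile goes to test.prof */
    func();
    ProfilerStop();              /* stop sampling and flush the profile to disk */
    return 0;
}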

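Save the ProfilerStart example as test.c, then compile and run it. Note the libraries on the link line: -lprofiler provides the CPU profiler, and -ltcmalloc makes malloc and free use tcmalloc, which is why tcmalloc functions appear in the report. Running the program produces test.prof, from which pprof can generate a text report:

Compile:

gcc -o test test.c -ltcmalloc -lprofiler

Run:

./test    # writes test.prof

View the profile report (text format):

pprof --text test test.prof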

[root@localhost gperftools]# pprof --text test test.prof
Using local file test.
Using local file test.prof.
Total: 35 samples
       5  14.3%  14.3%        5  14.3% std::_Rb_tree_rebalance_for_erase
       3   8.6%  22.9%       18  51.4% tcmalloc::allocate_full_malloc_oom
       2   5.7%  28.6%       14  40.0% ::do_malloc_pages
       2   5.7%  34.3%        2   5.7% Lock (inline)
       2   5.7%  40.0%        2   5.7% _ZNSt8_Rb_treeIN8tcmalloc17SpanPtrWithLengthES1_St9_IdentityIS1_ENS0_15SpanBestFitLessENS0_20STLPageHeapAllocatorIS1_vEEE14_M_upper_boundEPSt13_Rb_tree_nodeIS1_ESA_RKS1_.isra.13
       2   5.7%  45.7%        2   5.7% allocate (inline)
       2   5.7%  51.4%       10  28.6% tcmalloc::PageHeap::AllocLarge
       2   5.7%  57.1%        2   5.7% tcmalloc::PageHeap::CheckAndHandlePreMerge
       2   5.7%  62.9%        4  11.4% tcmalloc::PageHeap::MergeIntoFreeList
       1   2.9%  65.7%        1   2.9% Unlock (inline)
       1   2.9%  68.6%        1   2.9% _M_get_insert_unique_pos (inline)
       1   2.9%  71.4%        4  11.4% _M_insert_ (inline)
       1   2.9%  74.3%        1   2.9% _init
       1   2.9%  77.1%       15  42.9% do_allocate_full (inline)
       1   2.9%  80.0%        1   2.9% get (inline)
       1   2.9%  82.9%        1   2.9% madvise
       1   2.9%  85.7%        1   2.9% malloc_fast_path (inline)
       1   2.9%  88.6%        1   2.9% std::_Rb_tree_insert_and_rebalance
       1   2.9%  91.4%        9  25.7% tcmalloc::PageHeap::Delete
       1   2.9%  94.3%        3   8.6% tcmalloc::PageHeap::ReleaseAtLeastNPages
       1   2.9%  97.1%        1   2.9% tcmalloc::PageHeap::SearchFreeAndLargeLists
       1   2.9% 100.0%        1   2.9% tcmalloc::Sampler::RecordAllocationSlow
       0   0.0% 100.0%        1   2.9% GetDescriptor (inline)
       0   0.0% 100.0%        1   2.9% RecordAllocation (inline)
       0   0.0% 100.0%        1   2.9% SampleAllocation (inline)
       0   0.0% 100.0%        2   5.7% SpinLockHolder (inline)
       0   0.0% 100.0%        1   2.9% TCMalloc_SystemRelease
       0   0.0% 100.0%        2   5.7% _M_create_node (inline)
       0   0.0% 100.0%        5  14.3% _M_erase_aux (inline)
       0   0.0% 100.0%        2   5.7% _M_get_node (inline)
       0   0.0% 100.0%        5  14.3% _M_insert_unique (inline)
       0   0.0% 100.0%       12  34.3% _ZN12_GLOBAL__N_1L13do_free_pagesEPN8tcmalloc4SpanEPv.isra.19
       0   0.0% 100.0%       35 100.0% __libc_start_main
       0   0.0% 100.0%       35 100.0% _start
       0   0.0% 100.0%        1   2.9% do_free (inline)
       0   0.0% 100.0%        3   8.6% do_free_pages
       0   0.0% 100.0%        1   2.9% do_free_with_callback (inline)
       0   0.0% 100.0%       14  40.0% do_malloc (inline)
       0   0.0% 100.0%        5  14.3% erase (inline)
       0   0.0% 100.0%        1   2.9% free_fast_path (inline)
       0   0.0% 100.0%       35 100.0% func
       0   0.0% 100.0%        5  14.3% insert (inline)
       0   0.0% 100.0%       35 100.0% main
       0   0.0% 100.0%        1   2.9% tc_free
       0   0.0% 100.0%        1   2.9% tc_malloc
       0   0.0% 100.0%        6  17.1% tcmalloc::PageHeap::Carve
       0   0.0% 100.0%        1   2.9% tcmalloc::PageHeap::DecommitSpan
       0   0.0% 100.0%        3   8.6% tcmalloc::PageHeap::IncrementalScavenge
       0   0.0% 100.0%       11  31.4% tcmalloc::PageHeap::New
       0   0.0% 100.0%        5  14.3% tcmalloc::PageHeap::PrependToFreeList
       0   0.0% 100.0%        2   5.7% tcmalloc::PageHeap::ReleaseSpan
       0   0.0% 100.0%        5  14.3% tcmalloc::PageHeap::RemoveFromFreeList
       0   0.0% 100.0%        2   5.7% upper_bound (inline)
       0   0.0% 100.0%        1   2.9% ~SpinLockHolder (inline)

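Interpreting the output:

Each line has 6 columns, in this order:

1 Number of samples in the function itself (excluding the functions it calls)

2 Percentage of samples in the function itself (excluding the functions it calls)

3 Cumulative percentage of samples up to and including this line (again counting only each function's own samples)

4 Number of samples in the function and the functions it calls

5 Percentage of samples in the function and the functions it calls

6 Function name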

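The number of samples is proportional to the CPU time consumed: the profiler records the call stack at a fixed rate (100 times per second by default), so each sample stands for roughly the same slice of CPU time.

A function's total CPU time (columns 4 and 5) includes the time spent in the functions it calls.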
