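1. Build and install the software
1.1 Build and install libunwind
Download the latest libunwind source package from github.com/libunwind/libunwind/releases
Extract it into /usr/local/src
cd <extracted source directory>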
./configure
make -j6
make install
1.2 Build and install gperftools
Download the latest gperftools source package from github.com/gperftools/gperftools/releases
Extract it into /usr/local/src
cd <extracted source directory>
./autogen.sh
./configure
make -j6
make install
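2. Usage
2.1 Profiling a program that runs for a while and then exits normally
In this case, the profiling functions can be called directly from the code. Example: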
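#include <gperftools/profiler.h>
#include <stdlib.h>
#include <stdio.h>

/* Repeatedly allocate and free a 120 MB block to give the profiler work to sample. */
void func(void)
{
    int i;
    for (i = 0; i < 1024 * 1024; i++)
    {
        char *p = (char *)malloc(1024 * 1024 * 120);
        free(p);
    }
}

int main()
{
    ProfilerStart("test.prof");  /* start CPU profiling; samples are written to test.prof */
    func();
    ProfilerStop();              /* stop profiling and flush test.prof */
    return 0;
}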
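Compile and run it. Note that the program must be linked against the tcmalloc and profiler libraries. Running it produces the file test.prof, from which pprof can generate a text report, as follows:
Compile:
gcc -o test test.c -ltcmalloc -lprofiler
Run:
./test    # generates test.prof
View the performance report (text format):
pprof --text test test.prof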
[root@localhost gperftools]# pprof --text test test.prof
Using local file test.
Using local file test.prof.
Total: 35 samples
5 14.3% 14.3% 5 14.3% std::_Rb_tree_rebalance_for_erase
3 8.6% 22.9% 18 51.4% tcmalloc::allocate_full_malloc_oom
2 5.7% 28.6% 14 40.0% ::do_malloc_pages
2 5.7% 34.3% 2 5.7% Lock (inline)
2 5.7% 40.0% 2 5.7% _ZNSt8_Rb_treeIN8tcmalloc17SpanPtrWithLengthES1_St9_IdentityIS1_ENS0_15SpanBestFitLessENS0_20STLPageHeapAllocatorIS1_vEEE14_M_upper_boundEPSt13_Rb_tree_nodeIS1_ESA_RKS1_.isra.13
2 5.7% 45.7% 2 5.7% allocate (inline)
2 5.7% 51.4% 10 28.6% tcmalloc::PageHeap::AllocLarge
2 5.7% 57.1% 2 5.7% tcmalloc::PageHeap::CheckAndHandlePreMerge
2 5.7% 62.9% 4 11.4% tcmalloc::PageHeap::MergeIntoFreeList
1 2.9% 65.7% 1 2.9% Unlock (inline)
1 2.9% 68.6% 1 2.9% _M_get_insert_unique_pos (inline)
1 2.9% 71.4% 4 11.4% _M_insert_ (inline)
1 2.9% 74.3% 1 2.9% _init
1 2.9% 77.1% 15 42.9% do_allocate_full (inline)
1 2.9% 80.0% 1 2.9% get (inline)
1 2.9% 82.9% 1 2.9% madvise
1 2.9% 85.7% 1 2.9% malloc_fast_path (inline)
1 2.9% 88.6% 1 2.9% std::_Rb_tree_insert_and_rebalance
1 2.9% 91.4% 9 25.7% tcmalloc::PageHeap::Delete
1 2.9% 94.3% 3 8.6% tcmalloc::PageHeap::ReleaseAtLeastNPages
1 2.9% 97.1% 1 2.9% tcmalloc::PageHeap::SearchFreeAndLargeLists
1 2.9% 100.0% 1 2.9% tcmalloc::Sampler::RecordAllocationSlow
0 0.0% 100.0% 1 2.9% GetDescriptor (inline)
0 0.0% 100.0% 1 2.9% RecordAllocation (inline)
0 0.0% 100.0% 1 2.9% SampleAllocation (inline)
0 0.0% 100.0% 2 5.7% SpinLockHolder (inline)
0 0.0% 100.0% 1 2.9% TCMalloc_SystemRelease
0 0.0% 100.0% 2 5.7% _M_create_node (inline)
0 0.0% 100.0% 5 14.3% _M_erase_aux (inline)
0 0.0% 100.0% 2 5.7% _M_get_node (inline)
0 0.0% 100.0% 5 14.3% _M_insert_unique (inline)
0 0.0% 100.0% 12 34.3% _ZN12_GLOBAL__N_1L13do_free_pagesEPN8tcmalloc4SpanEPv.isra.19
0 0.0% 100.0% 35 100.0% __libc_start_main
0 0.0% 100.0% 35 100.0% _start
0 0.0% 100.0% 1 2.9% do_free (inline)
0 0.0% 100.0% 3 8.6% do_free_pages
0 0.0% 100.0% 1 2.9% do_free_with_callback (inline)
0 0.0% 100.0% 14 40.0% do_malloc (inline)
0 0.0% 100.0% 5 14.3% erase (inline)
0 0.0% 100.0% 1 2.9% free_fast_path (inline)
0 0.0% 100.0% 35 100.0% func
0 0.0% 100.0% 5 14.3% insert (inline)
0 0.0% 100.0% 35 100.0% main
0 0.0% 100.0% 1 2.9% tc_free
0 0.0% 100.0% 1 2.9% tc_malloc
0 0.0% 100.0% 6 17.1% tcmalloc::PageHeap::Carve
0 0.0% 100.0% 1 2.9% tcmalloc::PageHeap::DecommitSpan
0 0.0% 100.0% 3 8.6% tcmalloc::PageHeap::IncrementalScavenge
0 0.0% 100.0% 11 31.4% tcmalloc::PageHeap::New
0 0.0% 100.0% 5 14.3% tcmalloc::PageHeap::PrependToFreeList
0 0.0% 100.0% 2 5.7% tcmalloc::PageHeap::ReleaseSpan
0 0.0% 100.0% 5 14.3% tcmalloc::PageHeap::RemoveFromFreeList
0 0.0% 100.0% 2 5.7% upper_bound (inline)
0 0.0% 100.0% 1 2.9% ~SpinLockHolder (inline)
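Reading the output:
Each line has six columns, in this order:
1 Number of samples in the function itself (excluding functions it calls)
2 Percentage of samples in the function itself (excluding functions it calls)
3 Running total of column 2 up to and including this line
4 Number of samples in the function and the functions it calls
5 Percentage of samples in the function and the functions it calls
6 Function name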
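To make the relationship between the columns concrete, here is a minimal sketch that recomputes columns 2, 3 and 5 from the raw sample counts in columns 1 and 4. It is not how pprof itself is implemented; the struct report_row and the functions sample_pct and print_report are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* One report row. The struct and its field names are illustrative only. */
struct report_row {
    const char *name; /* column 6: function name */
    int flat;         /* column 1: samples in the function itself */
    int cum;          /* column 4: samples in the function and its callees */
};

/* Share of `count` in `total`, as a percentage. */
double sample_pct(int count, int total)
{
    assert(total > 0);
    return 100.0 * count / total;
}

/*
 * Print rows in the six-column layout. Column 3 is the running sum of
 * column 1, expressed as a percentage of `total`.
 * Returns the column 3 value of the last row printed (0 if n is 0).
 */
double print_report(const struct report_row *rows, int n, int total)
{
    int running = 0;
    double sum_pct = 0.0;
    for (int i = 0; i < n; i++) {
        running += rows[i].flat;
        sum_pct = sample_pct(running, total);
        printf("%8d %5.1f%% %5.1f%% %8d %5.1f%% %s\n",
               rows[i].flat, sample_pct(rows[i].flat, total), sum_pct,
               rows[i].cum, sample_pct(rows[i].cum, total), rows[i].name);
    }
    return sum_pct;
}
```

Feeding it the first three rows of the report above (5 and 5, 3 and 18, 2 and 14 samples, with a total of 35) reproduces the percentages shown there: 14.3% and 14.3%, then 8.6% and 22.9%, then 5.7% and 28.6% for columns 2 and 3, and 14.3%, 51.4% and 40.0% for column 5.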
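The sample count is proportional to the CPU time consumed.
The CPU time for a whole function (columns 4 and 5) includes the time consumed by the other functions it calls.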
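The two statements above can be turned into rough arithmetic. The sketch below assumes the profiler ran at its default sampling rate of 100 samples per second (adjustable through the CPUPROFILE_FREQUENCY environment variable); if a different rate was used, pass that instead. The function names callee_samples and samples_to_seconds are hypothetical helpers, not part of gperftools.

```c
#include <assert.h>

/* Samples spent in callees only: column 4 minus column 1. */
int callee_samples(int flat, int cum)
{
    assert(cum >= flat);
    return cum - flat;
}

/*
 * Approximate CPU time in seconds for a sample count.
 * frequency_hz is the sampling rate; 100 is assumed to be the default.
 */
double samples_to_seconds(int samples, int frequency_hz)
{
    assert(frequency_hz > 0);
    return (double)samples / frequency_hz;
}
```

For example, tcmalloc::PageHeap::AllocLarge has 2 samples of its own and 10 in total, so callee_samples(2, 10) attributes 8 samples to the functions it calls. Under the 100 Hz assumption, the 35 samples of the whole run correspond to roughly samples_to_seconds(35, 100), that is, about 0.35 seconds of CPU time.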