Profiling

From The DarkMod Wiki
Revision as of 10:09, 3 September 2010 by Tels (talk | contribs) (add link)
Jump to navigationJump to search

Profiling

To find code parts that are too slow, The Dark Mod can be run under a profiler. This article details how to do this under Linux:

Windows

No information on this yet.

Linux

gprof

It is not yet possible to use gprof with TDM, so use oprofile!

gprof is part of the GNU tools just like gcc is. It is an application-level profiler.

To compile the TDM source for profiling, you can use

./linuxBuild BUILD=profile

However, it is currently not possible to run TDM under gprof, as this requires to recompile the entire application with "gcc -pg". As D3 itself is not yet open source, this is not possible.

oprofile

oprofile is a system-wide profiler, that can also profile the Linux kernel. It will by default profile everything that is running, so to avoid noise it is better to close as many applications (chat, Firefox etc) and shut down as many system services (f.i. Mysql, Postgres, Apache) as possible.

Preparing oprofile

First, install the nec. packages:

sudo aptitude install oprofile oprofile-gui

Then, recompile the TDM source with debug to get symbol names:

./linuxBuild BUILD=debug

Note: "debug", not "release"!

There are two ways to use oprofile:

Using tdm_profile.sh

Instead of using tdm_launcher.linux, open a shell, change to the darkmod directory and run:

tdm_profile.sh fm_name

where fm_name is a previously installed FM. It has not nec. to be the current installed FM, but it must be installed already!

After loading a game, running around a bit and exiting TDM again, you can find a report in the file tdm_profile_report.txt inside the darkmod directory.

The disadvantage of this method is that you will also profile the mission load time, and for big missions that will easily swamp any work done during the mission run time unless you spent minutes there.

Sample output:

CPU: AMD64 processors, speed 1800 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        image name               app name                 symbol name
2034319  38.8674  doom.x86                 doom.x86                 /home/te/games/doom3/doom.x86
1303104  24.8969  libGLcore.so.185.18.36   doom.x86                 /usr/lib32/libGLcore.so.185.18.36
154842    2.9584  libmozjs.so              firefox                  /usr/lib/xulrunner-1.9.1.9/libmozjs.so
139323    2.6619  libasound_module_rate_speexrate.so doom.x86                 /usr/lib32/alsa-lib/libasound_module_rate_speexrate.so
124850    2.3854  gamex86.so               doom.x86                 CImage::LoadImage(CRenderPipe*)
80195     1.5322  gamex86.so               doom.x86                 iReadLump
66593     1.2723  libxul.so                firefox                  /usr/lib/xulrunner-1.9.1.9/libxul.so
62445     1.1931  gamex86.so               doom.x86                 idClip::ClipModelsTouchingBounds_r(clipSector_s const*, listParms_s&) const
56680     1.0829  zero                     doom.x86                 /dev/zero
43315     0.8276  libc-2.10.1.so           doom.x86                 memcpy
41921     0.8009  libGLcore.so.185.18.36   kwin                     /usr/lib/libGLcore.so.185.18.36
38338     0.7325  firefox                  firefox                  /usr/lib/firefox-3.5.9/firefox
35118     0.6710  [vdso] (tgid:2583 range:0x7fffe1fff000-0x7fffe2000000) kwin [vdso] (tgid:2583 range:0x7fffe1fff000-0x7fffe2000000)
33020     0.6309  libc-2.10.1.so           doom.x86                 memset
30151     0.5761  gamex86.so               doom.x86                 idGameLocal::AnalyzeRenderImage(CRenderPipe*, float*)
27437     0.5242  no-vmlinux               no-vmlinux               /no-vmlinux
26936     0.5146  libasound.so.2.0.0       doom.x86                 /usr/lib32/libasound.so.2.0.0
23572     0.4504  libQtCore.so.4.5.2       kwin                     /usr/lib/libQtCore.so.4.5.2
23031     0.4400  Xorg                     Xorg                     /usr/bin/Xorg
21347     0.4079  libQtGui.so.4.5.2        kdeinit4                 /usr/lib/libQtGui.so.4.5.2
20301     0.3879  oprofiled                oprofiled                /usr/bin/oprofiled
19825     0.3788  gamex86.so               doom.x86                 idInterpreter::Execute()
19785     0.3780  libGL.so.185.18.36       doom.x86                 /usr/lib32/libGL.so.185.18.36
19583     0.3742  gamex86.so               doom.x86                 idGameLocal::ProcessStimResponse(unsigned long)
18508     0.3536  nvidia_drv.so            Xorg                     /usr/lib/xorg/modules/drivers/nvidia_drv.so
13580     0.2595  gamex86.so               doom.x86                 idGameLocal::RunFrame(usercmd_t const*)

Using opcontrol manually

The opcontrol utility can be used to toggle the profiling on and off during runtime. Just start it, start TDM normally and after loading a mission, toggle back to a text-mode console with CTRL+ALT+F2, use:

opcontrol --init
opcontrol --start
opcontrol --reset

to start profiling, switch back to the desktop (CTRL+ALT+F7), then wait a bit, then turn profiling off again.

opcontrol --dump
opcontrol --stop
opreport --symbols --demangle=smart >tdm_profile_report.txt

Sample output:

After ack:

samples  %        image name   app name   symbol name

67453     2.4265  gamex86.so   doom.x86   CImage::LoadImage(CRenderPipe*)
42819     1.5403  gamex86.so   doom.x86   iReadLump
31089     1.1184  gamex86.so   doom.x86   idClip::ClipModelsTouchingBounds_r(clipSector_s const*, listParms_s&) const
16156     0.5812  gamex86.so   doom.x86   idGameLocal::AnalyzeRenderImage(CRenderPipe*, float*)
10535     0.3790  gamex86.so   doom.x86   idInterpreter::Execute()
10438     0.3755  gamex86.so   doom.x86   idGameLocal::ProcessStimResponse(unsigned long)
7241      0.2605  gamex86.so   doom.x86   idMD5Anim::GetInterpolatedFrame(frameBlend_t&, idJointQuat*, int const*, int) const
7134      0.2566  gamex86.so   doom.x86   idGameLocal::RunFrame(usercmd_t const*)
6680      0.2403  gamex86.so   doom.x86   idMoveable::Think()
5641      0.2029  gamex86.so   doom.x86   idClipModel::Link_r(clipSector_s*)
5133      0.1846  gamex86.so   doom.x86   idAnimator::CreateFrame(int, bool)
4836      0.1740  gamex86.so   doom.x86   darkModLAS::updateLASState()
4725      0.1700  gamex86.so   doom.x86   idQuat::Slerp(idQuat const&, idQuat const&, float)
4719      0.1698  gamex86.so   doom.x86   idEntity::RunPhysics()
4501      0.1619  gamex86.so   doom.x86   __i686.get_pc_thunk.bx
4365      0.1570  gamex86.so   doom.x86   idSIMD_Generic::TransformJoints(idJointMat*, int const*, int, int)
4265      0.1534  gamex86.so   doom.x86   idEntity::Present()
3844      0.1383  gamex86.so   doom.x86   idStr::Icmp(char const*, char const*)
3755      0.1351  gamex86.so   doom.x86   idAF::ChangePose(idEntity*, int)
3663      0.1318  gamex86.so   doom.x86   idAnim::Name() const
3540      0.1273  gamex86.so   doom.x86   idQuat::ToMat3() const
2953      0.1062  gamex86.so   doom.x86   idInterpreter::CallEvent(function_t const*, int)
2755      0.0991  gamex86.so   doom.x86   idEntity::Think()
2562      0.0922  gamex86.so   doom.x86   idPhysics_RigidBody::SaveState()
2440      0.0878  gamex86.so   doom.x86   idDict::FindKey(char const*) const
2429      0.0874  gamex86.so   doom.x86   idIK_Walk::Evaluate()
2302      0.0828  gamex86.so   doom.x86   idAnimator::SetJointAxis(jointHandle_t, jointModTransform_t, idMat3 const&)

Tips

Use ack (or grep) on the tdm_profile_report.txt to filter out anything not mentioning gamex86.so, this reduces the clutter:

ack gamex86.so tdm_profile_report.txt

See also