Profiling

From The DarkMod Wiki
Revision as of 14:17, 10 March 2012 by Tels (talk | contribs) (update)
Jump to navigationJump to search

Profiling

To find code parts that are too slow, The Dark Mod can be run under a profiler. This article details how to do this under Linux:

Windows

No information on this yet.

Linux

gprof

gprof is part of the GNU tools just like gcc is, and is an application-level profiler.

To compile the TDM source for profiling, you can use

./linuxBuild BUILD=profile

Then run:

gprof thedarkmod.x86

Template:Note: Further information is missing here.

oprofile

oprofile is a system-wide profiler, that can also profile the Linux kernel. It will by default profile everything that is running, so to avoid noise it is better to close as many applications (chat, Firefox etc) and shut down as many system services (f.i. Mysql, Postgres, Apache) as possible.

Preparing oprofile
  • First, install the nec. packages:
sudo aptitude install oprofile oprofile-gui

The next step is not nec., so you could f.i. also profile a release build, however, the output will be less informative without symbols.

  • Then, recompile the TDM source with debug to get symbol names.
./linuxBuild BUILD=debug

Note: "debug", not "release" nor "profile"!

There are two ways to use oprofile:

Using tdm_profile.sh

Instead of using tdm_launcher.linux, open a shell, change to the darkmod directory and run:

sudo ./tdm_profile.sh fm_name

where fm_name is a previously installed FM, e.g. the directory it resides in, f.i:

sudo ./tdm_profile.sh builders_influence

It has not nec. to be the current installed FM, but it must have been installed already!

After loading a game, running around a bit and exiting TDM again, you can find a report in the file tdm_profile_report.txt inside the darkmod directory.

The disadvantage of this method is that you will also profile the mission load time, and for big missions that will easily swamp any work done during the mission run time unless you spent minutes there.

Sample output:

CPU: AMD64 processors, speed 1800 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        image name               app name                 symbol name
2034319  38.8674  doom.x86                 doom.x86                 /home/te/games/doom3/doom.x86
1303104  24.8969  libGLcore.so.185.18.36   doom.x86                 /usr/lib32/libGLcore.so.185.18.36
154842    2.9584  libmozjs.so              firefox                  /usr/lib/xulrunner-1.9.1.9/libmozjs.so
139323    2.6619  libasound_module_rate_speexrate.so doom.x86                 /usr/lib32/alsa-lib/libasound_module_rate_speexrate.so
124850    2.3854  gamex86.so               doom.x86                 CImage::LoadImage(CRenderPipe*)
80195     1.5322  gamex86.so               doom.x86                 iReadLump
66593     1.2723  libxul.so                firefox                  /usr/lib/xulrunner-1.9.1.9/libxul.so
62445     1.1931  gamex86.so               doom.x86                 idClip::ClipModelsTouchingBounds_r(clipSector_s const*, listParms_s&) const
56680     1.0829  zero                     doom.x86                 /dev/zero
43315     0.8276  libc-2.10.1.so           doom.x86                 memcpy
41921     0.8009  libGLcore.so.185.18.36   kwin                     /usr/lib/libGLcore.so.185.18.36
38338     0.7325  firefox                  firefox                  /usr/lib/firefox-3.5.9/firefox
35118     0.6710  [vdso] (tgid:2583 range:0x7fffe1fff000-0x7fffe2000000) kwin [vdso] (tgid:2583 range:0x7fffe1fff000-0x7fffe2000000)
33020     0.6309  libc-2.10.1.so           doom.x86                 memset
30151     0.5761  gamex86.so               doom.x86                 idGameLocal::AnalyzeRenderImage(CRenderPipe*, float*)
27437     0.5242  no-vmlinux               no-vmlinux               /no-vmlinux
26936     0.5146  libasound.so.2.0.0       doom.x86                 /usr/lib32/libasound.so.2.0.0
23572     0.4504  libQtCore.so.4.5.2       kwin                     /usr/lib/libQtCore.so.4.5.2
23031     0.4400  Xorg                     Xorg                     /usr/bin/Xorg
21347     0.4079  libQtGui.so.4.5.2        kdeinit4                 /usr/lib/libQtGui.so.4.5.2
20301     0.3879  oprofiled                oprofiled                /usr/bin/oprofiled
19825     0.3788  gamex86.so               doom.x86                 idInterpreter::Execute()
19785     0.3780  libGL.so.185.18.36       doom.x86                 /usr/lib32/libGL.so.185.18.36
19583     0.3742  gamex86.so               doom.x86                 idGameLocal::ProcessStimResponse(unsigned long)
18508     0.3536  nvidia_drv.so            Xorg                     /usr/lib/xorg/modules/drivers/nvidia_drv.so
13580     0.2595  gamex86.so               doom.x86                 idGameLocal::RunFrame(usercmd_t const*)
Using opcontrol manually

The opcontrol utility can be used to toggle the profiling on and off during runtime. Just start it, start TDM normally and after loading a mission, toggle back to a text-mode console with CTRL+ALT+F2, use:

opcontrol --init
opcontrol --start
opcontrol --reset

to start profiling, switch back to the desktop (CTRL+ALT+F7), then wait a bit, then turn profiling off again.

opcontrol --dump
opcontrol --stop
opreport --symbols --demangle=smart >tdm_profile_report.txt

Sample output:

After ack:

samples  %        image name   app name   symbol name

67453     2.4265  gamex86.so   doom.x86   CImage::LoadImage(CRenderPipe*)
42819     1.5403  gamex86.so   doom.x86   iReadLump
31089     1.1184  gamex86.so   doom.x86   idClip::ClipModelsTouchingBounds_r(clipSector_s const*, listParms_s&) const
16156     0.5812  gamex86.so   doom.x86   idGameLocal::AnalyzeRenderImage(CRenderPipe*, float*)
10535     0.3790  gamex86.so   doom.x86   idInterpreter::Execute()
10438     0.3755  gamex86.so   doom.x86   idGameLocal::ProcessStimResponse(unsigned long)
7241      0.2605  gamex86.so   doom.x86   idMD5Anim::GetInterpolatedFrame(frameBlend_t&, idJointQuat*, int const*, int) const
7134      0.2566  gamex86.so   doom.x86   idGameLocal::RunFrame(usercmd_t const*)
6680      0.2403  gamex86.so   doom.x86   idMoveable::Think()
5641      0.2029  gamex86.so   doom.x86   idClipModel::Link_r(clipSector_s*)
5133      0.1846  gamex86.so   doom.x86   idAnimator::CreateFrame(int, bool)
4836      0.1740  gamex86.so   doom.x86   darkModLAS::updateLASState()
4725      0.1700  gamex86.so   doom.x86   idQuat::Slerp(idQuat const&, idQuat const&, float)
4719      0.1698  gamex86.so   doom.x86   idEntity::RunPhysics()
4501      0.1619  gamex86.so   doom.x86   __i686.get_pc_thunk.bx
4365      0.1570  gamex86.so   doom.x86   idSIMD_Generic::TransformJoints(idJointMat*, int const*, int, int)
4265      0.1534  gamex86.so   doom.x86   idEntity::Present()
3844      0.1383  gamex86.so   doom.x86   idStr::Icmp(char const*, char const*)
3755      0.1351  gamex86.so   doom.x86   idAF::ChangePose(idEntity*, int)
3663      0.1318  gamex86.so   doom.x86   idAnim::Name() const
3540      0.1273  gamex86.so   doom.x86   idQuat::ToMat3() const
2953      0.1062  gamex86.so   doom.x86   idInterpreter::CallEvent(function_t const*, int)
2755      0.0991  gamex86.so   doom.x86   idEntity::Think()
2562      0.0922  gamex86.so   doom.x86   idPhysics_RigidBody::SaveState()
2440      0.0878  gamex86.so   doom.x86   idDict::FindKey(char const*) const
2429      0.0874  gamex86.so   doom.x86   idIK_Walk::Evaluate()
2302      0.0828  gamex86.so   doom.x86   idAnimator::SetJointAxis(jointHandle_t, jointModTransform_t, idMat3 const&)

Tips

Use ack (or grep) on the tdm_profile_report.txt to filter out anything not mentioning gamex86.so, this reduces the clutter:

ack gamex86.so tdm_profile_report.txt

idTimer

Use #ifdef to include the following sections:

// uncomment to show timings
#define M_TIMINGS

// global
static idTimer timer_totals;

// somewhere in your code
#ifdef M_TIMINGS
    timer_total.Start();
#endif
... code to measure
#ifdef M_TIMINGS
    timer_total.Stop();
#endif

// for instance from time to time in Think():
#ifdef M_TIMINGS
        gameLocal.Printf( "%s: total %0.4f ms\n", GetName(), timer_total.Milliseconds() );
#endif

Timing Builds

There is also a class for managing timers, that is only available if you create a timing build. All code that uses this class should also wrapped in #ifdefs that are only active under a timing build. Example from actor.h:

#ifdef TIMING_BUILD
public:
        int actorGetObstaclesTimer;
        int actorGetPointOutsideObstaclesTimer;
        int actorGetWallEdgesTimer;
        int actorSortWallEdgesTimer;
        int actorBuildPathTreeTimer;
        int actorPrunePathTreeTimer;
        int actorFindOptimalPathTimer;
        int actorRouteToGoalTimer;
        int actorSubSampleWalkPathTimer;
        int actorWalkPathValidTimer;
#endif

And from from actor.cpp:

INIT_TIMER_HANDLE(actorGetObstaclesTimer);

and then in the function to be profiled:

START_SCOPED_TIMING(actor->actorRouteToGoalTimer, scopedRouteToGoalTimer);

The timer will end at the scope automatically.

After you recompiled the source for at timing build, f.i. with:

 ./linuxBuild.sh BASEFLAGS="-DTIMING_BUILD"

You can use the command listTimers on the console to get the timing information. More commands: writeTimerCSV, resetTimers

g_frametime

Set this CVAR to 1, and the code will emit how many milliseconds it took to do all the calculations for one frame, minus the render time. The output on the console will be something like this:

game 28080: all:0.2 th:0.2 ev:0.0 28 ents

The numbers are:

  • game: The current frame time (always increasing from map start)
  • all: The complete time to think + event handling
  • th: The thinking time (without event handling)
  • ev: The event handling time
  • ents: The number of entities that did thinking

g_timeentities

When set to > 0, shows entities whose Think() functions exceeded the number of milliseconds specified. Set it for instance to 2, and all entities that think longer than 2ms will be shown on the console.

See also