Profiling: Difference between revisions
m add timing build "how to compile" |
Added reference to timeline profiler, added Windows sampling profiler instructions, deleted non-universal TIMING_BUILD and M_TIMINGS (use Tracy instead). |
||
(3 intermediate revisions by one other user not shown) | |||
Line 1: | Line 1: | ||
== Profiling == | == Profiling == | ||
To find code parts that are too slow, The Dark Mod can be run under a profiler. | To find code parts that are too slow, The Dark Mod can be run under a profiler. | ||
The two main types of profilers are: | |||
* Sampling profiler: use platform tools | |||
* Timeline profiler: see [[Tracy: timeline profiler]] | |||
== Sampling == | |||
=== Windows === | === Windows === | ||
Sampling profiler is integrated into Visual C++ today: | |||
* Run Release build of the game under debugger. | |||
* Go to menu <tt>Debug -> Windows -> Diagnostic Tools</tt>. | |||
* Switch to <tt>CPU Usage</tt> tab. | |||
* Enable <tt>Record CPU Profile</tt> to start profiling. | |||
* Break/pause execution to see profiling results since start. Preferably click on <tt>Open Details</tt>. | |||
=== Linux === | === Linux === | ||
'''Note: This section is outdated!''' | |||
==== gprof ==== | ==== gprof ==== | ||
gprof is part of the GNU tools just like gcc is, and is an application-level profiler. | |||
gprof is part of the GNU tools just like gcc is | |||
To compile the TDM source for profiling, you can use | To compile the TDM source for profiling, you can use | ||
Line 19: | Line 33: | ||
./linuxBuild BUILD=profile | ./linuxBuild BUILD=profile | ||
Then run TDM normally: | |||
cd .doom3/ | |||
./thedarkmod.x86 | |||
This will produce a file named "gmon.out", which can then after TDM exits, be analyzed with gprof: | |||
gprof | |||
For further information see the man page of <tt>gprof</tt>. | |||
==== oprofile ==== | ==== oprofile ==== | ||
oprofile is a system-wide profiler, that can also profile the Linux kernel. It will by default profile everything that is running, so to avoid noise it is better to close as many applications (chat, Firefox etc) and shut down as many system services (f.i. Mysql, Postgres, Apache) as possible. | Another method of profiling is by using '''oprofile'''. '''oprofile''' is a system-wide profiler, that can also profile the Linux kernel. It will by default profile everything that is running, so to avoid noise it is better to close as many applications (chat, Firefox etc) and shut down as many system services (f.i. Mysql, Postgres, Apache) as possible. | ||
===== Preparing oprofile ===== | ===== Preparing oprofile ===== | ||
Line 147: | Line 170: | ||
ack gamex86.so tdm_profile_report.txt | ack gamex86.so tdm_profile_report.txt | ||
== g_frametime == | == g_frametime == |
Latest revision as of 19:47, 29 January 2024
Profiling
To find code parts that are too slow, The Dark Mod can be run under a profiler.
The two main types of profilers are:
- Sampling profiler: use platform tools
- Timeline profiler: see Tracy: timeline profiler
Sampling
Windows
Sampling profiler is integrated into Visual C++ today:
- Run Release build of the game under debugger.
- Go to menu Debug -> Windows -> Diagnostic Tools.
- Switch to CPU Usage tab.
- Enable Record CPU Profile to start profiling.
- Break/pause execution to see profiling results since start. Preferably click on Open Details.
Linux
Note: This section is outdated!
gprof
gprof is part of the GNU tools just like gcc is, and is an application-level profiler.
To compile the TDM source for profiling, you can use
./linuxBuild BUILD=profile
Then run TDM normally:
cd .doom3/ ./thedarkmod.x86
This will produce a file named "gmon.out", which can then after TDM exits, be analyzed with gprof:
gprof
For further information see the man page of gprof.
oprofile
Another method of profiling is by using oprofile. oprofile is a system-wide profiler, that can also profile the Linux kernel. It will by default profile everything that is running, so to avoid noise it is better to close as many applications (chat, Firefox etc) and shut down as many system services (f.i. Mysql, Postgres, Apache) as possible.
Preparing oprofile
- First, install the nec. packages:
sudo aptitude install oprofile oprofile-gui
The next step is not nec., so you could f.i. also profile a release build, however, the output will be less informative without symbols.
- Then, recompile the TDM source with debug to get symbol names.
./linuxBuild BUILD=debug
Note: "debug", not "release" nor "profile"!
There are two ways to use oprofile:
Using tdm_profile.sh
Instead of using tdm_launcher.linux, open a shell, change to the darkmod directory and run:
sudo ./tdm_profile.sh fm_name
where fm_name is a previously installed FM, e.g. the directory it resides in, f.i:
sudo ./tdm_profile.sh builders_influence
It has not nec. to be the current installed FM, but it must have been installed already!
After loading a game, running around a bit and exiting TDM again, you can find a report in the file tdm_profile_report.txt inside the darkmod directory.
The disadvantage of this method is that you will also profile the mission load time, and for big missions that will easily swamp any work done during the mission run time unless you spent minutes there.
Sample output:
CPU: AMD64 processors, speed 1800 MHz (estimated) Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000 samples % image name app name symbol name 2034319 38.8674 doom.x86 doom.x86 /home/te/games/doom3/doom.x86 1303104 24.8969 libGLcore.so.185.18.36 doom.x86 /usr/lib32/libGLcore.so.185.18.36 154842 2.9584 libmozjs.so firefox /usr/lib/xulrunner-1.9.1.9/libmozjs.so 139323 2.6619 libasound_module_rate_speexrate.so doom.x86 /usr/lib32/alsa-lib/libasound_module_rate_speexrate.so 124850 2.3854 gamex86.so doom.x86 CImage::LoadImage(CRenderPipe*) 80195 1.5322 gamex86.so doom.x86 iReadLump 66593 1.2723 libxul.so firefox /usr/lib/xulrunner-1.9.1.9/libxul.so 62445 1.1931 gamex86.so doom.x86 idClip::ClipModelsTouchingBounds_r(clipSector_s const*, listParms_s&) const 56680 1.0829 zero doom.x86 /dev/zero 43315 0.8276 libc-2.10.1.so doom.x86 memcpy 41921 0.8009 libGLcore.so.185.18.36 kwin /usr/lib/libGLcore.so.185.18.36 38338 0.7325 firefox firefox /usr/lib/firefox-3.5.9/firefox 35118 0.6710 [vdso] (tgid:2583 range:0x7fffe1fff000-0x7fffe2000000) kwin [vdso] (tgid:2583 range:0x7fffe1fff000-0x7fffe2000000) 33020 0.6309 libc-2.10.1.so doom.x86 memset 30151 0.5761 gamex86.so doom.x86 idGameLocal::AnalyzeRenderImage(CRenderPipe*, float*) 27437 0.5242 no-vmlinux no-vmlinux /no-vmlinux 26936 0.5146 libasound.so.2.0.0 doom.x86 /usr/lib32/libasound.so.2.0.0 23572 0.4504 libQtCore.so.4.5.2 kwin /usr/lib/libQtCore.so.4.5.2 23031 0.4400 Xorg Xorg /usr/bin/Xorg 21347 0.4079 libQtGui.so.4.5.2 kdeinit4 /usr/lib/libQtGui.so.4.5.2 20301 0.3879 oprofiled oprofiled /usr/bin/oprofiled 19825 0.3788 gamex86.so doom.x86 idInterpreter::Execute() 19785 0.3780 libGL.so.185.18.36 doom.x86 /usr/lib32/libGL.so.185.18.36 19583 0.3742 gamex86.so doom.x86 idGameLocal::ProcessStimResponse(unsigned long) 18508 0.3536 nvidia_drv.so Xorg /usr/lib/xorg/modules/drivers/nvidia_drv.so 13580 0.2595 gamex86.so doom.x86 idGameLocal::RunFrame(usercmd_t const*)
Using opcontrol manually
The opcontrol utility can be used to toggle the profiling on and off during runtime. Just start it, start TDM normally and after loading a mission, toggle back to a text-mode console with CTRL+ALT+F2, use:
opcontrol --init opcontrol --start opcontrol --reset
to start profiling, switch back to the desktop (CTRL+ALT+F7), then wait a bit, then turn profiling off again.
opcontrol --dump opcontrol --stop opreport --symbols --demangle=smart >tdm_profile_report.txt
Sample output:
After ack:
samples % image name app name symbol name 67453 2.4265 gamex86.so doom.x86 CImage::LoadImage(CRenderPipe*) 42819 1.5403 gamex86.so doom.x86 iReadLump 31089 1.1184 gamex86.so doom.x86 idClip::ClipModelsTouchingBounds_r(clipSector_s const*, listParms_s&) const 16156 0.5812 gamex86.so doom.x86 idGameLocal::AnalyzeRenderImage(CRenderPipe*, float*) 10535 0.3790 gamex86.so doom.x86 idInterpreter::Execute() 10438 0.3755 gamex86.so doom.x86 idGameLocal::ProcessStimResponse(unsigned long) 7241 0.2605 gamex86.so doom.x86 idMD5Anim::GetInterpolatedFrame(frameBlend_t&, idJointQuat*, int const*, int) const 7134 0.2566 gamex86.so doom.x86 idGameLocal::RunFrame(usercmd_t const*) 6680 0.2403 gamex86.so doom.x86 idMoveable::Think() 5641 0.2029 gamex86.so doom.x86 idClipModel::Link_r(clipSector_s*) 5133 0.1846 gamex86.so doom.x86 idAnimator::CreateFrame(int, bool) 4836 0.1740 gamex86.so doom.x86 darkModLAS::updateLASState() 4725 0.1700 gamex86.so doom.x86 idQuat::Slerp(idQuat const&, idQuat const&, float) 4719 0.1698 gamex86.so doom.x86 idEntity::RunPhysics() 4501 0.1619 gamex86.so doom.x86 __i686.get_pc_thunk.bx 4365 0.1570 gamex86.so doom.x86 idSIMD_Generic::TransformJoints(idJointMat*, int const*, int, int) 4265 0.1534 gamex86.so doom.x86 idEntity::Present() 3844 0.1383 gamex86.so doom.x86 idStr::Icmp(char const*, char const*) 3755 0.1351 gamex86.so doom.x86 idAF::ChangePose(idEntity*, int) 3663 0.1318 gamex86.so doom.x86 idAnim::Name() const 3540 0.1273 gamex86.so doom.x86 idQuat::ToMat3() const 2953 0.1062 gamex86.so doom.x86 idInterpreter::CallEvent(function_t const*, int) 2755 0.0991 gamex86.so doom.x86 idEntity::Think() 2562 0.0922 gamex86.so doom.x86 idPhysics_RigidBody::SaveState() 2440 0.0878 gamex86.so doom.x86 idDict::FindKey(char const*) const 2429 0.0874 gamex86.so doom.x86 idIK_Walk::Evaluate() 2302 0.0828 gamex86.so doom.x86 idAnimator::SetJointAxis(jointHandle_t, jointModTransform_t, idMat3 const&)
Tips
Use ack (or grep) on the tdm_profile_report.txt to filter out anything not mentioning gamex86.so, this reduces the clutter:
ack gamex86.so tdm_profile_report.txt
g_frametime
Set this CVAR to 1, and the code will emit how many milliseconds it took to do all the calculations for one frame, minus the render time. The output on the console will be something like this:
game 28080: all:0.2 th:0.2 ev:0.0 28 ents
The numbers are:
- game: The current frame time (always increasing from map start)
- all: The complete time to think + event handling
- th: The thinking time (without event handling)
- ev: The event handling time
- ents: The number of entities that did thinking
g_timeentities
When set to > 0, shows entities whose Think() functions exceeded the number of milliseconds specified. Set it for instance to 2, and all entities that think longer than 2ms will be shown on the console.