Mission Database: Difference between revisions

From The DarkMod Wiki
(Written Update Mission section.)
(Added Storage Concerns section: article done)
Revision as of 04:38, 25 May 2021

All released missions are stored in the FM database, which defines what is available in the in-game mission downloader.


Requirements

In order for a mission to be included in the database, it must satisfy the following conditions:

  1. It has gone through beta-testing by several forum members who did not take part in its creation.
    • In rare exceptional cases, a mission can be rejected from the database if the majority of beta-testers conclude that it is unplayable or that its quality is far too low.
  2. It does not contain any questionable material or potential intellectual property infringements, including:
    • using names from the original Thief games
    • using assets from other games
    • having content which is so bad that it is forbidden even on our forums (TODO: link to forum rules?)

A few missions, such as Saint Lucia and the Training Mission, are installed along with the game. These official missions are considered part of the game and are not included in the database.


SVN

Since 2021, the FM database has been stored in an SVN repository (#5551). Only TDM team members have access to this SVN.

Here is the SVN address:

https://svn.thedarkmod.com/project/missions/trunk

But please don't rush to check out the whole repo yet!

Checkout everything

Yes, you can simply paste the link above into SVN Checkout and get a single working copy with all the FMs. But keep in mind that you will have to download ~10 GB of data, and the working copy will take ~20 GB of space. In most cases you don't need everything in order to work with the database.

The typical reasons to check out the whole repo are:

  • You want to run an automated search or tests over all released FMs.
  • You are a regular committer to the FM database who adds many new missions and updates, so the investment is worth it.
[Figure: TortoiseSVN Repo-browser]
[Figure: Accountant 2 FM in SVN]

Checkout as needed

Instead of checking out the whole repo, you can check out only the few FMs you are going to modify. This is the recommended approach.

With this approach, you need the Repo-browser feature of TortoiseSVN. It lets you browse all the directories and files on the remote SVN server without checking them out first.

The directory structure of the SVN repo is shown in the picture. Most importantly, all information about the FM with internal name "qwerty" is stored in the fms/qwerty subdirectory of the repo. To work with this FM, you can check out only this directory using the following checkout address:

https://svn.thedarkmod.com/project/missions/trunk/fms/qwerty

Or find the FM directory in Repo-browser, right-click and select Checkout.

Detailed instructions in the rest of the article assume this approach.


Add New Mission

[Figure: Add FM directory in Repo-browser]
[Figure: Checkout FM]
[Figure: Commit changes in FM directory]
[Figure: Add new files in Commit dialog]
[Figure: Precommit hook failed due to wrong value of "type"]

Before adding a new FM, agree on an internal name with the FM author. It must differ from the names of all existing FMs, consist only of lowercase letters and digits, and be rather short (aim for 10-20 characters). When choosing among the words of the mission title, prefer rare words and proper nouns over common words.
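The naming rules above can be checked mechanically. The following is only a sketch: the hard limit of 20 characters, the restriction to ASCII letters, and the sample set of existing names are assumptions made for illustration, not rules enforced by the repository.

```python
import re

# Assumption: "rather short" is treated here as at most 20 characters;
# the article only recommends aiming for 10-20.
MAX_LENGTH = 20
# Assumption: "lowercase letters" means ASCII a-z.
NAME_PATTERN = re.compile(r"[a-z0-9]+")


def check_internal_name(name, existing_names):
    """Return a list of problems with a proposed internal name (empty if none)."""
    problems = []
    if not NAME_PATTERN.fullmatch(name):
        problems.append("must consist only of lowercase letters and digits")
    if len(name) > MAX_LENGTH:
        problems.append(f"longer than {MAX_LENGTH} characters")
    if name in existing_names:
        problems.append("already used by another FM")
    return problems


existing = {"hhta", "someotherfm"}  # hypothetical sample of existing names
print(check_internal_name("qwerty", existing))        # no problems
print(check_internal_name("Hidden Hands", existing))  # uppercase letters and a space
print(check_internal_name("hhta", existing))          # clashes with an existing name
```

In practice the list of existing names would come from the fms directory as seen in Repo-browser.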

Open the SVN repo in Repo-browser and find the fms directory, which contains all the FMs. Right-click on it and choose Add folder in the context menu. Then enter the internal name of the FM as the name of the new directory and proceed with the commit. Now that the directory has been created, you can check it out: find the directory in Repo-browser, right-click it and select Checkout.

The second step is to upload the pk4 files. Copy all the pk4 files of the FM into the working copy directory (i.e. the checkout directory). Open the directory in Windows Explorer, right-click and select SVN Commit. Select all pk4 files with Shift, right-click and select Add. Also set the checkboxes for these files. Write a commit message in the text area above: it should start with the internal name of the FM in brackets. When everything is done, hit OK to do the actual commit.

Although the pk4 files are now in the repository, they are not yet visible in the database. A mission is only added when its directory contains an fminfo.xml file, so now you need to add it. You can take this file from another FM (find it in Repo-browser, right-click, Open with, select a text editor) and adjust it for the FM being added. Here is an explanation of some fields:

  • internalName defines the name of the directory and of the pk4 file.
  • title is the name seen by players in-game.
  • author is the person or people who made the mission.
  • releaseDate shows when the very first version of the mission was added.
  • type is multi if the mission contains several playable .map files (i.e. it is a campaign), and single otherwise.
  • size is the size of the main pk4 file in megabytes, as displayed to users.
  • version is a natural number used by the in-game downloader to decide whether an update is available. It starts at 1.
  • description contains the text displayed in the in-game downloader when the player inspects mission details.
  • mainPack points to the main pk4 file of the mission. Note that the name of the file is fully determined by the internal name.
  • localisationPack points to the _l10n.pk4 file if it exists.

Note that XML cannot directly contain some characters, so they must be escaped:

  • Ampersand (&amp;), quotes (&quot; or &apos;), and angle brackets (&lt; or &gt;).
  • A line break can be inserted as &#10;.
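The escaping rules above can be applied with Python's standard library instead of by hand. This is a minimal sketch; the helper name escape_for_fminfo and the sample description text are made up for illustration.

```python
from xml.sax.saxutils import escape

# Extra replacements applied after escape() has handled &, < and >.
EXTRA_ENTITIES = {'"': "&quot;", "'": "&apos;", "\n": "&#10;"}


def escape_for_fminfo(text):
    """Escape text so it can be pasted into an XML file such as fminfo.xml."""
    return escape(text, EXTRA_ENTITIES)


description = 'Rob "Lord & Lady" Smith\nwithout being seen <by anyone>'
print(escape_for_fminfo(description))
```

Because escape() replaces the ampersand first, the entities inserted afterwards are not escaped a second time.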

When you have created the fminfo.xml file, double-check that all properties are correct. Then use SVN to add and commit the file, just like you did with the pk4 files.

If you did something wrong (which is quite likely), you will see an error saying "Commit blocked by pre-commit hook". A long stack trace from a Python script is included, and the meaningful message should be at the end of it. Typically it is either an XML validation error saying that something in fminfo.xml is wrong, or a message from a custom check that failed. Fix the errors and try to commit again until the commit succeeds.

As the last step, create a subdirectory named screenshots in the working copy directory. Put screenshots in .jpg or .png format into this directory: they will be displayed in the in-game mission downloader. Then add and commit all the screenshot files to the repository, the same way as you did for the pk4 and xml files.


Update Mission

This section covers the case where a mission has already been released and a new version must be uploaded.

First of all, make sure you have an up-to-date working copy of the FM directory. If you already have a working copy, right-click it in Windows Explorer and select SVN Update. If you don't have one yet, open Repo-browser, find the directory named after the FM's internal name, right-click and select Checkout. If you don't know the internal name, you can find it like this: install the FM in the game, then look at what is written in the currentfm.txt file. Suppose the internal name is "qwerty".

There are two kinds of FM update:

  1. A minor update changes one or several lines in some file, e.g. in a .map file or a .gui file. Such an update is usually done when the TDM engine has changed, the FM has become broken, and a small spawnarg change is enough to restore proper behavior. Usually such changes are made by TDM developers with the approval of the FM authors.
  2. A major update covers everything else. It happens when the FM author sends a new version of the pk4 file, with arbitrary changes in the .map file and possibly other changes.

When doing a minor update, prefer editing the existing pk4 archive over creating a new one from scratch. For instance, you can open the archive in 7-zip, extract only the file you want to modify, change it in a text editor, then add it back to the archive in 7-zip (overwriting the old version). It is a bad idea to extract everything, make some changes, and then create a brand new pk4 archive.

Regardless of how major the update is, keep SVN storage specifics in mind to avoid rapid growth of server storage requirements. Unless the full size of the FM package is smaller than 100 MB, please run the diff optimization tool. The tool is located in devel\pk4diff\bin in the assets SVN repo. With Python 3 installed, copy its files to the FM directory and run there:

 python pk4diff.py --optimize qwerty.pk4

The tool checks how much data has really changed inside the archive. If the changes account for less than 10% of the full package size, the tool repacks the archive so that this update takes very little additional space on the server. See the Storage Concerns section for more details.
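The 100 MB rule of thumb can be turned into a small pre-check. This is only a sketch: the article does not say whether "MB" means 10^6 or 2^20 bytes, so the threshold below is an assumption, and the helper names are made up.

```python
import os
import sys

# Assumption: "100 MB" is taken as 100 * 1024 * 1024 bytes.
SIZE_THRESHOLD = 100 * 1024 * 1024


def should_run_optimizer(pk4_size_bytes):
    """The tool should be run unless the package is smaller than 100 MB."""
    return pk4_size_bytes >= SIZE_THRESHOLD


def optimizer_command(pk4_name):
    """The command line from the article, as an argument list for subprocess.run()."""
    return [sys.executable, "pk4diff.py", "--optimize", pk4_name]


pk4_name = "qwerty.pk4"
if os.path.exists(pk4_name) and should_run_optimizer(os.path.getsize(pk4_name)):
    print("Run:", " ".join(optimizer_command(pk4_name)))
else:
    print("Optimization not required (or file not found).")
```

Running the tool on a smaller package is harmless too; the threshold only marks where it is worth the effort.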

Now you need to edit the fminfo.xml file:

  1. Increment the integer in the "version" property by one. Otherwise the in-game downloader won't know that the mission has been updated.
  2. Update the "size" property if it has changed significantly.
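Both edits can also be scripted. The sketch below assumes a hypothetical layout in which version and size are child elements of a root element named fm; the real fminfo.xml may be structured differently, so copy the layout from an existing FM before relying on it.

```python
import xml.etree.ElementTree as ET

# Hypothetical fminfo.xml fragment: the real file may use different element
# names or nesting, so adjust the lookups below to match it.
SAMPLE = "<fm><internalName>qwerty</internalName><version>3</version><size>12</size></fm>"


def bump_version(xml_text, new_size_mb=None):
    """Increment <version> by one and optionally replace <size>."""
    root = ET.fromstring(xml_text)
    version = root.find("version")
    version.text = str(int(version.text) + 1)
    if new_size_mb is not None:
        root.find("size").text = str(new_size_mb)
    return ET.tostring(root, encoding="unicode")


print(bump_version(SAMPLE, new_size_mb=14))
```

The pre-commit hook still validates the committed file, so a scripted edit gets the same checks as a manual one.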

After you have dealt with both the pk4 storage concerns and the XML file update, it's time to commit the changes. Open the working copy directory in Windows Explorer, right-click and select SVN Commit. You should see at least two files marked as modified with checkboxes set: qwerty.pk4 and fminfo.xml. Write a commit message which describes the changes. Start the commit message with the FM internal name in brackets. If the new version came from the author, write something like "New version: sent to me by XyMegaMapper yesterday". If the update did not come from the FM author, then all the changes must be described in full detail! In the case of a minor update, it can be something like "In guis/mainmenu_custom_defs.gui, removed MM_BRIEFING_VIDEO_MATERIAL_K defines for K > 1". Finally, click OK to commit the changes.


Storage Concerns

We store all missions in an SVN repository, so every version of every FM is saved forever. While the total size of all FMs is around 10 GB, the SVN repository can be larger because it stores the full history, especially if large FMs are updated many times. As of 2021, it is not yet clear how bad things will become. Most likely the repo won't grow too large, but it's better to be careful. In order to decide how to make the history smaller, we should first understand how an SVN repository is stored on the server.

Xdelta and Zip Format

SVN history is a series of revisions. For every revision, SVN stores the diff between the previous version and the new one for every modified file. So when we commit an update to a pk4 file, the size of the SVN repository grows by the size of the diff of the pk4 file. In the worst case the diff can be as large as the new version of the pk4 file. Unfortunately, this worst case easily happens even if only a few files inside the archive were modified.

A pk4 file is an ordinary zip archive, so it is stored in Zip format (https://en.wikipedia.org/wiki/ZIP_(file_format)#Structure). All files are stored sequentially inside the archive file, one after another. Every file inside a zip archive is compressed independently of all the other files and occupies its own contiguous segment of the archive. If some file was not changed and was not recompressed, then the new archive contains exactly the same bytes for this file as the old archive. In theory, a perfect diff algorithm can detect this and avoid including any data for such unchanged files in the diff.
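The sequential layout is easy to inspect with Python's zipfile module. The sketch below builds a tiny archive in memory (the member names are made up) and prints where each member starts and how many compressed bytes it occupies.

```python
import io
import zipfile

# Build a small archive in memory; the member names are made up for the demo.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("maps/qwerty.map", "x" * 5000)
    archive.writestr("readme.txt", "hello " * 100)


def member_layout(zip_bytes):
    """Return (name, header_offset, compressed_size) for every member."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return [(i.filename, i.header_offset, i.compress_size) for i in archive.infolist()]


layout = member_layout(buffer.getvalue())
for name, offset, size in layout:
    print(f"{name}: local header at byte {offset}, {size} compressed bytes")
```

The second member starts right after the first one's header and compressed data, which is why an untouched member keeps the same bytes from one archive version to the next.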

In SVN, the diff between revisions is computed using the xdelta algorithm with a search window limited to 100 KB. Due to this very limited search window, the algorithm cannot reliably detect that files inside the old archive are reused in the new one. Changing the order of files inside the zip archive, or removing files larger than 100 KB, is enough to completely break the diff algorithm, resulting in a maximum-size diff. That's why even using 7-zip to modify the old archive does not guarantee that your commit will produce a small diff. In fact, a maximum-size diff is almost guaranteed if you remove at least one file larger than 100 KB (the same can also happen when a file is modified).
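The effect of a limited window can be illustrated with a toy model. This is not SVN's actual algorithm: it simply assumes that each window of the new file may only copy blocks from the bytes of the old file located at the same offsets, and the window and block sizes are scaled down for the demo.

```python
import random


def toy_windowed_diff_size(old, new, window, block):
    """Count bytes of `new` that a toy windowed matcher must store literally.

    Toy model, not SVN's code: `new` is processed in windows of `window`
    bytes, and each window may only copy blocks from the bytes of `old`
    located at the same offsets.
    """
    literal = 0
    for start in range(0, len(new), window):
        source_view = old[start:start + window]
        target = new[start:start + window]
        for pos in range(0, len(target), block):
            piece = target[pos:pos + block]
            if piece not in source_view:
                literal += len(piece)
    return literal


rng = random.Random(0)
old = bytes(rng.getrandbits(8) for _ in range(10_000))
extra = bytes(rng.getrandbits(8) for _ in range(500))

appended = old + extra   # new data added at the end
shifted = old[1_500:]    # a chunk larger than the window removed from the start

print(toy_windowed_diff_size(old, appended, window=1_000, block=50))  # 500
print(toy_windowed_diff_size(old, shifted, window=1_000, block=50))   # 8500
```

Appending costs only the appended bytes, while removing a chunk larger than the window shifts everything out of reach and the whole file has to be stored again.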

Pk4diff Optimization

We have a special tool for "optimizing" a pk4 file to reduce the diff size. This tool inspects the old and the new versions of the archive and finds which files have equal contents. Then it repacks the archive in the following way:

  1. Take the old version of the archive.
  2. Rename all files which were modified or removed to __trash__/trashN._tbin.
  3. Append the files which were added or modified to the end of the archive.
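The repacking steps above can be sketched as a planning function. This is not the actual pk4diff code: it assumes files are matched by name and compared by some content fingerprint, and that trash numbering starts at 0.

```python
def plan_repack(old_files, new_files):
    """Plan the repack described above.

    Both arguments map a file name inside the archive to a fingerprint of
    its contents (for example a CRC or hash). Returns the renames to apply
    to the old archive and the names to append at its end.
    """
    renames = {}
    appended = []
    for name in sorted(old_files):
        if new_files.get(name) != old_files[name]:
            # Modified or removed: keep the old bytes in place as trash.
            renames[name] = f"__trash__/trash{len(renames)}._tbin"
    for name in sorted(new_files):
        if old_files.get(name) != new_files[name]:
            # Added or modified: the new bytes go to the end of the archive.
            appended.append(name)
    return renames, appended


# Hypothetical file lists with made-up fingerprints.
old = {"maps/qwerty.map": "a1", "guis/menu.gui": "b2", "sound/old.ogg": "c3"}
new = {"maps/qwerty.map": "a9", "guis/menu.gui": "b2", "readme.txt": "d4"}
renames, appended = plan_repack(old, new)
print(renames)
print(appended)
```

Here the unchanged guis/menu.gui stays where it is, the modified map and the removed sound become trash, and the new map and readme.txt are appended.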

The resulting pk4 archive is almost exactly the same as the old one, with new data appended at the end. It is almost certain that SVN will produce a diff which only contains the differences (the appended data). The downside is that the "optimized" pk4 file is slightly larger because it still stores the old data as "trash".

In order to run the optimizer script, Python 3 must be installed. SVN must also be available on the command line (for TortoiseSVN, make sure to check "command line client tools" during installation, see https://stackoverflow.com/a/9874961/556899). The tool is located in devel/pk4diff/bin in the assets repo and consists of a Python script, the pk4diff executable, and the xdelta3 executable. The easiest way to run it is to copy all three files into the directory with the pk4 file (which must be in an SVN working copy), then execute on the command line:

 python pk4diff.py --optimize qwerty.pk4

Here is sample output (from a run on hhta.pk4):

 CMD: svn export hhta.pk4@BASE __tmp_clean__.pk4
 A    __tmp_clean__.pk4
 Export complete.
 CMD: pk4diffexe __tmp_clean__.pk4 hhta.pk4
 Added size: 6676263
 Removed size: 33085296
 CMD: xdelta3 -e -f -B 524288 -W 524288 -s __tmp_clean__.pk4 hhta.pk4 __tmp_diff__.bin
 Xdelta diff size: 517134007
 CMD: pk4diffexe __tmp_clean__.pk4 hhta.pk4 __tmp_optimized__.pk4
 Added size: 6676263
 Removed size: 33085296
 CMD: xdelta3 -e -f -B 524288 -W 524288 -s __tmp_clean__.pk4 __tmp_optimized__.pk4 __tmp_diff__.bin
 Xdelta diff size: 6712187
 Added portion of dead data: 5.885588%
 Replacing pk4 file with optimized file

Each line starting with "CMD:" shows a program being run with its parameters. The script works like this:

  1. The procedure starts by exporting a clean version of hhta.pk4 using an SVN command.
  2. Then pk4diffexe is run: it displays how many bytes are added and removed in the update. The full FM package is about 500 MB, so the changes are pretty small in this example.
  3. xdelta3 is run to estimate the initial size of the diff. In this example it is a maximum-size diff (+ 500 MB to repo size).
  4. pk4diffexe is run again, but now it produces an optimized pk4 file.
  5. xdelta3 is run again on the optimized pk4 file. The diff size becomes about 6 MB, so the optimization has reduced the diff a lot.
  6. The optimized pk4 file contains some trash data, and we are told what portion of the optimized pk4 is trash. It is only about 6% in this case (33 MB).
  7. Since the portion of trash is lower than 10%, the pk4 file is replaced with the optimized one. If there is too much trash, the optimized pk4 is simply deleted and a different message is shown.
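The key numbers can be pulled out of the output with a few lines of Python. This is a sketch for reading the log, not part of the tool; it assumes the output format matches the sample, and the function name is made up.

```python
import re

# Selected lines from the sample output above.
SAMPLE_OUTPUT = """\
Xdelta diff size: 517134007
Xdelta diff size: 6712187
Added portion of dead data: 5.885588%
"""

TRASH_LIMIT_PERCENT = 10.0  # threshold stated in the article


def summarize(output):
    """Extract both diff sizes and the trash percentage from the tool output."""
    diff_sizes = [int(m) for m in re.findall(r"Xdelta diff size: (\d+)", output)]
    trash_percent = float(re.search(r"dead data: ([\d.]+)%", output).group(1))
    return {
        "diff_before": diff_sizes[0],
        "diff_after": diff_sizes[1],
        "reduction_factor": diff_sizes[0] / diff_sizes[1],
        "accepted": trash_percent < TRASH_LIMIT_PERCENT,
    }


print(summarize(SAMPLE_OUTPUT))
```

For the sample run, the estimated diff shrinks by a factor of roughly 77 and the optimized file is accepted.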

After running the program, the hhta.pk4 file is modified: it is now the optimized version. There is also a file hhta.pk4.old next to it: this is a copy of the modified pk4 before optimization, in case you decide to restore it. All that is left is to delete the .old file and commit the modified pk4 to SVN.

Note that xdelta3 provides only a rough estimate of the diff size, because 1) SVN uses xdelta 1 instead of xdelta 3, and 2) SVN uses a window size of 100 KB, while command-line xdelta3 does not allow a window size smaller than 512 KB. However, the estimated diff size should be about right in most cases.

For programmers: the source code for pk4diffexe is located in devel/pk4diff/src. It requires CMake and Conan to build. See the file conan_install.bat for how to build it.

Trash

An optimized pk4 file contains "trash" files. They are located in the __trash__ directory and have filenames of the form trashKKK._tbin. Since they have an unusual extension, they should never affect how the TDM game works.

If the total amount of trash is less than 100 KB, you can safely delete it using 7-zip before committing the update. If the total amount of trash is more than 100 KB, deleting it will break the SVN diff algorithm, most likely resulting in a maximum-size diff. In general, we should limit the amount of trash to strike a balance between reducing SVN storage on the server and reducing download traffic and storage on players' machines. That's why the pk4diff script only accepts the optimized package if the amount of trash is lower than 10%.
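To see how much trash an archive carries, the members under __trash__ can be summed with zipfile. This is a sketch only: it measures compressed member sizes relative to all member data, which may differ from how the pk4diff script computes its percentage.

```python
import io
import zipfile

TRASH_PREFIX = "__trash__/"


def trash_stats(zip_file):
    """Return (trash_bytes, total_bytes) using compressed member sizes."""
    with zipfile.ZipFile(zip_file) as archive:
        infos = archive.infolist()
    trash = sum(i.compress_size for i in infos if i.filename.startswith(TRASH_PREFIX))
    total = sum(i.compress_size for i in infos)
    return trash, total


# Demo archive built in memory, stored without compression so sizes are obvious.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w", zipfile.ZIP_STORED) as archive:
    archive.writestr("maps/qwerty.map", b"m" * 2700)
    archive.writestr("__trash__/trash0._tbin", b"t" * 300)

trash, total = trash_stats(demo)
print(f"trash: {trash} bytes, {100 * trash / total:.1f}% of member data")  # 10.0%
```

For a real package, pass the path to the pk4 file instead of the in-memory demo archive.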


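References

  • Thread on developer subforum: https://forums.thedarkmod.com/index.php?/topic/20624-store-missions-archive-in-svn/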