Multilanguage Display
Title: Multilanguage Display [PAGE IN PROGRESS]
Background
This page describes a particularly challenging use case of a TDM methodology that has been around for some time. The motivation was a 2026 update by Geep to the language map files, following a revision of the Carleton 24pt font used throughout the Main Menu.
In the TDM main menu, the Settings/Video/Language page attempts to list all the supported languages in their native form with respect to character set, that is, untranslated. Showing multiple language strings together is difficult, but some tricks are available. The result has been a reasonable near-term workaround for this particular page, applicable to comparable cases. (Comprehensive support for multilanguage display likely involves a major restructuring, for instance, moving to a native Unicode architecture with combined Latin and Cyrillic bitmaps.)
Recall that strings for the main menu are found in the UTF-8 file tdm_base01.pk4/strings/all.lang, where there are sections for [English], [German], etc. Each section is used to generate, for distribution, the corresponding language-specific file, e.g., french.lang, german.lang, etc. When you play TDM and select a given language, the corresponding .lang file (if needed) is read.
Except for Russian, these files use a specific 8-bit encoding within the ISO-8859 family. (The i18n – Char page has specifics here.) This specific encoding is a bottleneck: by design, not every character in one member of the ISO-8859 family is available in the other members.
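The bottleneck can be seen with Python's standard codecs. The following is a minimal sketch, not part of the TDM tooling: the helper name `members_that_fit` is hypothetical, and the list of five ISO-8859 members is taken from the encodings named on this page.

```python
# The five ISO-8859 members named on this page, by their Python codec names.
ISO_MEMBERS = ["iso8859_1", "iso8859_2", "iso8859_9", "iso8859_15", "iso8859_16"]

def members_that_fit(name):
    """Return the ISO-8859 members that can encode every character of name."""
    fits = []
    for enc in ISO_MEMBERS:
        try:
            name.encode(enc)
        except UnicodeEncodeError:
            continue  # at least one character has no codepoint in this member
        fits.append(enc)
    return fits

print(members_that_fit("Español"))
print(members_that_fit("Česky"))
```

A name that fits only some members cannot be typed as-is into the sections whose generated file uses one of the other members.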
Language Names in Idealized UTF8 Form
Only the TDM-exposed languages are shown here. The all.lang file has additional strings for other potential Latin and Cyrillic languages. For reference, the generated encoding for the <language>.lang file is shown in the comment.
"#str_02460" "English" // English [ISO-8859-1] "#str_02461" "Deutsch" // German [ISO-8859-1] "#str_02462" "Español" // Spanish [ISO-8859-1] "#str_02463" "Français" // French [ISO-8859-15] "#str_02464" "Português" // Portuguese [ISO-8859-1] "#str_02465" "Polski" // Polish [ISO-8859-2] "#str_02466" "Italiano" // Italian [ISO-8859-1] "#str_02467" "Česky" // Czech [ISO-8859-2] "#str_02468" "Русский" // Russian [WIN-1251] "#str_02469" "Català" // Catalan [ISO-8859-1] "#str_02470" "Dansk" // Danish [ISO-8859-1] "#str_02472" "Nederlands" // Dutch [ISO-8859-1] "#str_02474" "Magyar" // Hungarian [ISO-8859-2] "#str_02476" "Svenska" // Swedish [ISO-8859-1] "#str_02477" "Türkçe" // Turkish [ISO-8859-9] "#str_02479" "Română" // Romanian [ISO-8859-16] "#str_02480" "Slovenčina" // Slovak [ISO-8859-2]
First Compromise – Cyrillic vs. Latin Font
While there is some overlap in characters between Cyrillic and Latin encodings, it is insufficient for our purposes. So, for [English] and other European sections, a Latin transliteration from Cyrillic is used:
"#str_02468" "Russkiy" // Russian
Going the other way, for [Russian], the current treatment is not to do a Cyrillic transliteration from Latin, but instead to drop accents, rendering just in ASCII (except of course “Russian”), e.g.:
"#str_02460" "English" // English "#str_02461" "Deutsch" // German "#str_02462" "Espanol" // Spanish "#str_02463" "Francais" // French "#str_02464" "Portugues" // Portuguese "#str_02465" "Polski" // Polish "#str_02466" "Italiano" // Italian "#str_02467" "Cesky" // Czech "#str_02468" "Русский" // Russian "#str_02469" "Catala" // Catalan "#str_02470" "Dansk" // Danish "#str_02472" "Nederlands" // Dutch "#str_02474" "Magyar" // Hungarian "#str_02476" "Svenska" // Swedish "#str_02477" "Turkce" // Turkish "#str_02479" "Romana" // Romanian "#str_02480" "Slovencina" // Slovak
Moving between ISO-8859 Members
In the discussion here, let “L1” be the current language (from the player’s perspective) or the [language] section of interest in all.lang (from the translator’s perspective). Let “L2” be the language of the foreign word, i.e., L1 and L2 differ.
Type and Go
In several cases, the translator merely puts the ideal character in the all.lang strings. After <language>.lang generation, everything works. Cases are:
- L1 and L2 both use the same ISO-8859 encoding. (Specifically, they are either both ISO-8859-1, which doesn’t need <language>.map files, or both ISO-8859-2, and have <language>.map files with, by convention, identical contents.) NOTE: For TDM’s purposes, ISO-8859-15 (French) can be considered part of the ISO-8859-1 family.
- L2 character is ASCII (i.e., in ISO range 0x00-0x7F). This is always the case if L2 = English. Characters in this range have identical codepoints in all five TDM-supported ISO-8859 encodings and in the TDM target encoding.
- L2 character is in the ISO range 0xA0-0xFF, with additional constraints.
For the last case, the L2 character must be either:
- present at the same codepoint in all five TDM-supported ISO-8859 encodings, and in the TDM target encoding. For our Latin language names, these accented characters are like that:
ç in "Français" and "Türkçe" at 0xE7
à in "Català" at 0xE0 (ISO-8859-2 is the exception: it has ŕ at 0xE0 and no à)
â in "Română" at 0xE2
ü in "Türkçe" at 0xFC
- present at the same codepoint in ISO-8859 encodings for L1 and L2, and in the TDM target encoding.
- present in different codepoints in ISO-8859 encodings for L1 and L2, but with each being either the same as the TDM target encoding, or mapped to the TDM target encoding by the <language>.map file.
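The first of these conditions, on its ISO side, can be checked with Python's standard codecs. This is a hedged sketch: it knows nothing about the TDM target encoding or the <language>.map files, and the helper names `codepoints` and `same_everywhere` are hypothetical.

```python
ENCODINGS = ["iso8859_1", "iso8859_2", "iso8859_9", "iso8859_15", "iso8859_16"]

def codepoints(ch):
    """Map each encoding to the byte value of ch, or None if ch is absent."""
    result = {}
    for enc in ENCODINGS:
        try:
            result[enc] = ch.encode(enc)[0]
        except UnicodeEncodeError:
            result[enc] = None
    return result

def same_everywhere(ch):
    """True if ch sits at one and the same codepoint in all five encodings."""
    values = set(codepoints(ch).values())
    return len(values) == 1 and None not in values

for ch in "çâüàñ":
    print(ch, same_everywhere(ch), codepoints(ch))
```

By these codec tables, ç, â, and ü pass the check; à sits at 0xE0 in four of the encodings but is not encodable in ISO-8859-2, and ñ is missing from both ISO-8859-2 and ISO-8859-16.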
Tricks
If the L2 character is not represented in the ISO encoding for L1, then at <L1>.lang generation time, it will be replaced by a “?”. To work around this, the translator has two methods:
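The “?” substitution can be reproduced with Python's standard codecs, which is a convenient way to spot problem characters before generating a .lang file. This is only a sketch: the `generate` helper is hypothetical and merely stands in for TDM's actual generation step, which is assumed here to behave like an encode with replacement.

```python
def generate(text, encoding):
    """Encode text in the L1 encoding; unrepresentable characters become b'?'."""
    return text.encode(encoding, errors="replace")

print(generate("Česky", "iso8859_1"))   # Č is not in ISO-8859-1
print(generate("Česky", "iso8859_2"))   # Č is in ISO-8859-2
```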
Trick Method 1 – Direct Stuffing
When L1 encoding is ISO-8859-1 (or -9, which is identical to ISO-8859-1 in the 0xA0-0xBF range, or -15, which differs there at only a few codepoints) and the L2 character is in TDM’s 0xA0-0xBF range (or 0xD7, 0xF7), then you can look up the ISO character associated with that codepoint (shown in parentheses in cells of the i18n- charmap table), and type it. Examples:
[English] // and other ISO-8859-1, -9, or -15
"#str_02467" "¬esky" // our default mapping is ISO 8859-1, so ¬ is shown as Č (¬)
"#str_02480" "Sloven®ina" // Slovak (® in ISO-8859-1 is č in our font)
Here’s another example of stuffing, from a TDM-unexposed language, that also uses a diaeresis (0xA8):
"#str_02481" "Sloven¨®ina" // Slovenian (southern slovenia) (¨ => š)
In this case, ISO-8859-1 would need to use the diaeresis, but ISO-8859-15 (French) would not, i.e., could type and go with š. The spreadsheet shows all available substitutions in light blue.
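Direct stuffing can be illustrated by following the bytes. In this sketch, `TDM_GLYPHS` holds only the three font positions mentioned in the examples above; treating every other byte as its ISO-8859-1 character is an assumption made for illustration, and the `rendered` helper is hypothetical.

```python
# Glyphs the TDM font is said (in the examples above) to draw at these bytes.
TDM_GLYPHS = {0xA8: "š", 0xAC: "Č", 0xAE: "č"}

def rendered(text, encoding="iso8859_1"):
    """Encode text as the L1 encoding, then show what the font would draw."""
    raw = text.encode(encoding)
    # Assumption: bytes not listed in TDM_GLYPHS look like ISO-8859-1.
    return "".join(TDM_GLYPHS.get(b, bytes([b]).decode("iso8859_1")) for b in raw)

print(rendered("¬esky"))
print(rendered("Sloven¨®ina"))
print(rendered("Slovenš®ina", "iso8859_15"))
```

The last call shows why ISO-8859-15 can type and go with š: that encoding already places š at 0xA8, the same byte that ¨ produces in ISO-8859-1.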
Trick Method 2 – Special Mapping with Repurposed Character
If all else fails, a little-used character in L1 can be redirected to the L2 codepoint TDM wants, by an extra mapping command in the <L1>.map file. Current such tricks are...
To get ñ for "Español":
- for ISO-8859-2, put ¨ (diaeresis 0xA8)
- for ISO-8859-16, put ¶ (pilcrow 0xB6)
To get ê for "Português":
- for ISO-8859-2, put ´ (acute accent 0xBD)
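The effect of such an extra mapping can be modeled as a byte translation applied after encoding. This is a rough sketch only: it does not use the real <L1>.map file syntax, the `EXTRA_MAP` table and `to_font_bytes` helper are hypothetical, and the target byte 0xF1 assumes that the TDM font keeps ñ at its ISO-8859-1 position.

```python
# Hypothetical remap tables: byte in the L1 encoding -> byte in the TDM font.
# Assumption: the TDM font draws ñ at 0xF1, as ISO-8859-1 does.
EXTRA_MAP = {
    "iso8859_2": {0xA8: 0xF1},   # ¨ repurposed as ñ
    "iso8859_16": {0xB6: 0xF1},  # ¶ repurposed as ñ
}

def to_font_bytes(text, encoding):
    """Encode text in the L1 encoding, then apply that encoding's remapping."""
    raw = text.encode(encoding)
    remap = EXTRA_MAP.get(encoding, {})
    table = bytes(remap.get(b, b) for b in range(256))
    return raw.translate(table)

print(to_font_bytes("Espa¨ol", "iso8859_2"))
print(to_font_bytes("Espa¶ol", "iso8859_16"))
```

The cost of the trick is visible in the table: once ¨ or ¶ is redirected, that character can no longer be displayed as itself in the affected language.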
Details of TDM and ISO Char Sets and Potential Tricks and Limitations
[TO DO – Insert big table]