I18N - Character mapping: Difference between revisions

From The DarkMod Wiki
Tels (talk | contribs)
m remove outdated note
Geep (talk | contribs)
Encodings: Update/correct language encodings, default file info
 
(One intermediate revision by one other user not shown)
Line 3: Line 3:
== Encodings ==
== Encodings ==


Note that the language files (f.i. strings/german.lang) as well as the readables and the FM dictionariaries are expected to be in the following encodings:
Whether used to define translation dictionaries that are system-wide (in tdm_base01.pk4/strings/) or FM-specific (in <FM>/strings/), language files (f.i. german.lang), derived from Unicode all.lang, are expected to be in the following encodings:


* '''Czech''', '''Polish:''' [https://secure.wikimedia.org/wikipedia/en/wiki/ISO/IEC_8859-2 ISO-8859-2] ('''not WIN-1250!)
* '''Czech, Polish, Hungarian, Slovak:''' [https://secure.wikimedia.org/wikipedia/en/wiki/ISO/IEC_8859-2 ISO-8859-2] ('''not WIN-1250!)
* '''Russian:''' [https://secure.wikimedia.org/wikipedia/en/wiki/Win-1251 WIN-1251]
* '''Russian:''' [https://secure.wikimedia.org/wikipedia/en/wiki/Win-1251 WIN-1251]
* '''All other languages:''' [https://secure.wikimedia.org/wikipedia/en/wiki/ISO/IEC_8859-1 ISO-8859-1]
* '''Romanian:''' ISO-8859-16
* '''French:''' ISO-8859-15
* '''Turkish:''' ISO-8859-9
* '''All other languages:''' [https://secure.wikimedia.org/wikipedia/en/wiki/ISO/IEC_8859-1 ISO-8859-1]. This covers English, German, Italian, Spanish, Portuguese, Swedish, Danish, Dutch, and Catalan.


=== Remapping ===
=== Remapping ===


The characters are remapped upon loading the dictionary/readable from their source encoding (e.g. ISO 8859-2) to the special character map TDM uses. Responsible for this are mapping files, f.i. "strings/czech.map". If a map file for a specific language is not found, "strings/default.map" is used instead.  
The characters are remapped upon loading the dictionary/readable from their source encoding (e.g. ISO 8859-2) to the special character map TDM uses. Responsible for this are mapping files, f.i. "strings/czech.map". If a map file for a specific language is not found, "strings/default.map" is used instead. (Note: a default map is no longer shipped, at least since TDM 2.10 and probably much earlier. Generally, ISO-8859-1 languages don't need remapping.)
 
Remapping files are only looked for in the tdm_base01.pk4/strings directory. So they cannot be overwritten in any .map file placed in an FM's string directory.


The content of a map file is wrapped in '''{''' and '''}''', and each mapping consists of two hexadecimal numbers, the source and the target character number.
The content of a map file is wrapped in '''{''' and '''}''', and each mapping consists of two hexadecimal numbers, the source and the target character number.
Line 60: Line 65:
}
}
</pre>
</pre>
{{i18n}}* [[Font patcher]]
{{i18n}}* [[Font Patcher]]


[[Category:fonts]]
[[Category:fonts]]

Latest revision as of 18:08, 31 May 2026

The D3 code that handles the GUI bitmap font can only load a specific range of bytes as characters. To get the most out of the available entries, special fonts are used (f.i. Carleton for the menu). These fonts are built/patched so that the right characters appear in the right place.

Encodings

Language files (f.i. german.lang) are derived from the Unicode all.lang. Whether they define system-wide translation dictionaries (in tdm_base01.pk4/strings/) or FM-specific ones (in <FM>/strings/), they are expected to be in the following encodings:

  • Czech, Polish, Hungarian, Slovak: ISO-8859-2 (not WIN-1250!)
  • Russian: WIN-1251
  • Romanian: ISO-8859-16
  • French: ISO-8859-15
  • Turkish: ISO-8859-9
  • All other languages: ISO-8859-1. This covers English, German, Italian, Spanish, Portuguese, Swedish, Danish, Dutch, and Catalan.
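As an illustration of the list above, here is a minimal Python sketch that picks a codec per language and encodes Unicode text with it. The table `LANG_CODECS`, the fallback `DEFAULT_CODEC` and the helper `encode_for_lang` are hypothetical names invented for this example and are not part of TDM; only the codec names are standard Python ones.

```python
# Hypothetical helper, not part of TDM: choose the expected encoding
# for a language file and encode Unicode text with it.
LANG_CODECS = {
    "czech": "iso8859_2",
    "polish": "iso8859_2",
    "hungarian": "iso8859_2",
    "slovak": "iso8859_2",
    "russian": "cp1251",        # WIN-1251
    "romanian": "iso8859_16",
    "french": "iso8859_15",
    "turkish": "iso8859_9",
}
DEFAULT_CODEC = "latin_1"       # ISO-8859-1 for all other languages


def encode_for_lang(text: str, language: str) -> bytes:
    """Encode text in the encoding expected for the given language.

    Raises UnicodeEncodeError if a character does not exist in that encoding.
    """
    codec = LANG_CODECS.get(language.lower(), DEFAULT_CODEC)
    return text.encode(codec)


# The same letter lands on different bytes in ISO-8859-2 and WIN-1250,
# which is why the two must not be mixed up.
iso_byte = encode_for_lang("š", "czech")
win_byte = "š".encode("cp1250")
```

Here `iso_byte` and `win_byte` differ, so a file saved as WIN-1250 would show wrong characters for such letters once it is read as ISO-8859-2.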

Remapping

Upon loading a dictionary/readable, the characters are remapped from their source encoding (e.g. ISO 8859-2) to the special character map TDM uses. Mapping files, f.i. "strings/czech.map", are responsible for this. If a map file for a specific language is not found, "strings/default.map" is used instead. (Note: a default map is no longer shipped, at least since TDM 2.10 and probably much earlier. Generally, ISO-8859-1 languages don't need remapping.)

Remapping files are only looked for in the tdm_base01.pk4/strings directory, so they cannot be overridden by a .map file placed in an FM's strings directory.

The content of a map file is wrapped in { and }, and each mapping consists of two hexadecimal numbers, the source and the target character number.
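The format described above can be read with a few lines of Python. This is only a sketch under assumptions: the function name `parse_map` is hypothetical, `//` is treated as a line comment (as seen in the map files), and this is not the parser TDM itself uses.

```python
import re


def parse_map(text: str) -> dict[int, int]:
    """Parse map file content into {source_byte: target_byte}."""
    # Strip // comments first, so braces inside comments are ignored.
    no_comments = re.sub(r"//[^\n]*", "", text)

    # The mappings must be wrapped in { and }.
    start = no_comments.find("{")
    end = no_comments.rfind("}")
    if start == -1 or end == -1 or end < start:
        raise ValueError("map content must be wrapped in { and }")

    # Inside the braces: whitespace-separated pairs of hexadecimal numbers.
    tokens = no_comments[start + 1:end].split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected pairs of source/target numbers")

    mapping = {}
    for src, dst in zip(tokens[0::2], tokens[1::2]):
        mapping[int(src, 16)] = int(dst, 16)  # int() accepts the 0x prefix
    return mapping


sample = """// a comment
{
        0xF2    0xA1            // first pair
        0xAb    0xB2            // hex digits may be upper or lower case
}
"""
table = parse_map(sample)
```

After this, `table` holds one entry per mapping line, keyed by the source character number.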

Examples

For Russian:

{
        0xFF    0xB6            // я
}

For European languages in ISO 8859-2 charset (f.i. Czech):

// a comment
{
        0xF2    0xA1            // ň
        0xDB    0xA2            // Ű (similar to Ü, used in Hungarian)
        0xFB    0xA4            // ű
        0xA9    0xA6            // Š
        0xB9    0xA8            // š
        0xA1    0xAA            // Ą
        0xC8    0xAC            // Č
        0xCA    0xAB            // Ę
        0xE8    0xAE            // č
        0xD5    0xB0            // Ő (similar to Ö, used in Hungarian)
        0xA3    0xB1            // Ł
        0xAB    0xB2            // Ť
        0xCF    0xB3            // Ď
        0xAC    0xB4            // Ž
        0xB3    0xB5            // ł
        0xBF    0xB6            // ż
        0xEF    0xB7            // ď
        0xBE    0xB8            // ž
        0xF5    0xB9            // ő (similar to ö, used in Hungarian)
        0xB1    0xBA            // ą
        0xEA    0xBB            // ę
        0xF8    0xF7            // ř
        0xD8    0xD7            // Ř
        0xEC    0xA3            // ě
        0xCC    0xA5            // Ě
        0xD9    0xA9            // Ů
        0xF9    0xAF            // ů
        0xBB    0xB6            // ť
}
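To show what such a map does to a string, the following Python sketch applies three of the Czech entries above to ISO-8859-2 encoded bytes. The helper `remap` and the dict `czech_subset` are hypothetical names for this illustration, and passing unmapped bytes through unchanged is an assumption, not confirmed TDM behaviour.

```python
def remap(data: bytes, mapping: dict[int, int]) -> bytes:
    """Replace every byte that has a mapping entry by its target byte."""
    # Build a 256-entry translation table; unmapped bytes map to
    # themselves (assumption: TDM leaves them unchanged).
    table = bytes(mapping.get(b, b) for b in range(256))
    return data.translate(table)


# Three entries copied from the ISO 8859-2 example above.
czech_subset = {
    0xE8: 0xAE,  # č
    0xF8: 0xF7,  # ř
    0xB9: 0xA8,  # š
}

encoded = "čaj".encode("iso8859_2")      # source encoding of the language file
remapped = remap(encoded, czech_subset)  # bytes as the patched font expects them
```

Only the byte for č changes here; the plain ASCII letters keep their values.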

See Also

Translation resources

Overview of translations

Translation discussions