I18N - Character mapping
Latest revision as of 18:08, 31 May 2026
The Doom 3 (D3) code that handles GUI bitmap fonts can only load a specific range of bytes as characters. To make the most of the available entries, TDM uses special fonts (for instance, Carleton for the menu). These fonts are built or patched so that the right characters appear in the right positions.
Encodings
Language files (e.g. german.lang) are derived from the Unicode file all.lang. Whether they define system-wide translation dictionaries (in tdm_base01.pk4/strings/) or FM-specific ones (in <FM>/strings/), they are expected to use the following encodings:
- Czech, Polish, Hungarian, Slovak: ISO-8859-2 (not WIN-1250!)
- Russian: WIN-1251
- Romanian: ISO-8859-16
- French: ISO-8859-15
- Turkish: ISO-8859-9
- All other languages: ISO-8859-1. This covers English, German, Italian, Spanish, Portuguese, Swedish, Danish, Dutch, and Catalan.
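The encoding table above can be expressed with Python's standard codecs. What follows is a minimal sketch and is not part of TDM. The helper names and the lowercase language keys, which are modelled on file names such as german.lang, are hypothetical. Encoding with errors="strict" makes any character that the language's 8-bit encoding cannot hold raise an error, so it is not silently written as "?".

```python
# Hypothetical helpers: pick the 8-bit codec a language file should be saved in,
# following the encoding table. The lowercase language keys are an assumption
# modelled on file names such as german.lang, not TDM's own identifiers.
LANG_CODECS = {
    "czech": "iso8859_2",
    "polish": "iso8859_2",
    "hungarian": "iso8859_2",
    "slovak": "iso8859_2",
    "russian": "cp1251",       # WIN-1251
    "romanian": "iso8859_16",
    "french": "iso8859_15",
    "turkish": "iso8859_9",
}
DEFAULT_CODEC = "latin_1"  # ISO-8859-1 for all other languages


def codec_for(language: str) -> str:
    """Return the Python codec name for a language, defaulting to ISO-8859-1."""
    return LANG_CODECS.get(language.lower(), DEFAULT_CODEC)


def encode_lang_text(text: str, language: str) -> bytes:
    """Encode Unicode text (e.g. taken from all.lang) for one language file.

    errors="strict" raises UnicodeEncodeError for characters the target
    encoding lacks, e.g. "Ł" in an ISO-8859-1 file.
    """
    return text.encode(codec_for(language), errors="strict")


if __name__ == "__main__":
    print(encode_lang_text("čeština", "czech"))
    print(encode_lang_text("Grüße", "german"))
```

Writing the returned bytes with a file opened in binary mode ("wb") keeps them exactly as encoded.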
Remapping
When a dictionary or readable is loaded, its characters are remapped from their source encoding (e.g. ISO-8859-2) to the special character map that TDM's fonts use. This is done by mapping files such as "strings/czech.map". If no map file exists for a language, "strings/default.map" is used instead. (Note: a default map is no longer shipped, at least since TDM 2.10 and probably much earlier. ISO-8859-1 languages generally don't need remapping.)
Map files are only looked up in the tdm_base01.pk4/strings directory, so they cannot be overridden by a .map file placed in an FM's strings directory.
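The lookup and remapping rules above can be sketched as follows. This is an illustration under stated assumptions and is not TDM's engine code:

- It treats the strings directory of tdm_base01.pk4 as an already-extracted folder. The real file is a PK4 (ZIP) archive.
- The function names are hypothetical.
- It assumes that when no map file is found, the bytes are used unchanged. This page does not state that explicitly.

```python
from pathlib import Path
from typing import Dict, Optional


def find_map_file(base_strings_dir: Path, language: str) -> Optional[Path]:
    """Return <language>.map or default.map from the base strings directory.

    Only the base directory (assumed here to be an extracted copy of
    tdm_base01.pk4/strings) is searched. An FM's own strings/ directory is
    deliberately ignored, so FMs cannot override map files.
    """
    for name in (f"{language}.map", "default.map"):
        candidate = base_strings_dir / name
        if candidate.is_file():
            return candidate
    return None  # assumption: caller then uses the text without remapping


def remap_bytes(data: bytes, mapping: Dict[int, int]) -> bytes:
    """Translate each source byte to its target byte; others pass through."""
    table = bytearray(range(256))  # identity table for unmapped bytes
    for src, dst in mapping.items():
        table[src] = dst
    return data.translate(bytes(table))


if __name__ == "__main__":
    czech_text = "č".encode("iso8859_2")  # 0xE8 in ISO-8859-2
    print(remap_bytes(czech_text, {0xE8: 0xAE}).hex())
```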
The content of a map file is wrapped in { and }. Each mapping consists of two hexadecimal numbers: the source character code followed by the target character code. Text after // is a comment.
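A parser for this format can be sketched in a few lines. It is a hypothetical reader, written from the description and examples on this page, and is not TDM's own loader. In particular, it assumes that // comments may appear anywhere, including before the opening brace as in the Czech example.

```python
import re
from typing import Dict

_COMMENT = re.compile(r"//[^\n]*")  # a // comment runs to the end of the line


def parse_map(text: str) -> Dict[int, int]:
    """Parse a .map file body into {source_code: target_code}."""
    body = _COMMENT.sub("", text)  # drop comments first
    start, end = body.find("{"), body.rfind("}")
    if start == -1 or end < start:
        raise ValueError("map file must be wrapped in { and }")
    tokens = body[start + 1:end].split()
    if len(tokens) % 2:
        raise ValueError("each mapping needs a source and a target number")
    mapping: Dict[int, int] = {}
    for src, dst in zip(tokens[::2], tokens[1::2]):
        s, d = int(src, 16), int(dst, 16)  # accepts "0xAB" and "0xAb" alike
        if not (0 <= s <= 0xFF and 0 <= d <= 0xFF):
            raise ValueError(f"code out of byte range: {src} {dst}")
        mapping[s] = d
    return mapping


if __name__ == "__main__":
    print(parse_map("// a comment\n{\n0xFF 0xB6 // я\n}\n"))
```

Reading a map file with encoding="latin-1" decodes any byte sequence. The hex tokens therefore survive regardless of which encoding the comments were saved in.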
Examples
For Russian:
{
0xFF 0xB6 // я
}
For Central European languages using the ISO-8859-2 charset (e.g. Czech):
// a comment
{
0xF2 0xA1 // ň
0xDB 0xA2 // Ű (similar to Ü, used in Hungarian)
0xFB 0xA4 // ű
0xA9 0xA6 // Š
0xB9 0xA8 // š
0xA1 0xAA // Ą
0xC8 0xAC // Č
0xCA 0xAB // Ę
0xE8 0xAE // č
0xD5 0xB0 // Ő (similar to Ö, used in Hungarian)
0xA3 0xB1 // Ł
0xAB 0xB2 // Ť
0xCF 0xB3 // Ď
0xAC 0xB4 // Ž
0xB3 0xB5 // ł
0xBF 0xB6 // ż
0xEF 0xB7 // ď
0xBE 0xB8 // ž
0xF5 0xB9 // ő (similar to ö, used in Hungarian)
0xB1 0xBA // ą
0xEA 0xBB // ę
0xF8 0xF7 // ř
0xD8 0xD7 // Ř
0xEC 0xA3 // ě
0xCC 0xA5 // Ě
0xD9 0xA9 // Ů
0xF9 0xAF // ů
0xBB 0xB6 // ť
}
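Every entry's comment names the intended character, so the examples can be cross-checked mechanically. The hypothetical helper below is not a TDM tool. It decodes each source byte with the language's source encoding and flags target codes that are assigned more than once.

Applied to the Czech example above, it reports two entries worth reviewing:

- 0xAC decodes to Ź in ISO-8859-2, whereas Ž is 0xAE.
- Target 0xB6 is assigned to both ż and ť.

This page does not say which values are intended. Treat these as candidates to check against the shipped czech.map rather than confirmed bugs.

```python
from typing import Dict, List, Tuple


def check_map(entries: List[Tuple[int, int, str]], source_codec: str) -> List[str]:
    """Cross-check (source, target, commented character) map entries.

    Reports sources that do not decode to the character named in the
    comment, and targets assigned to more than one source.
    """
    problems: List[str] = []
    first_use: Dict[int, str] = {}  # target code -> first character mapped to it
    for src, dst, char in entries:
        decoded = bytes([src]).decode(source_codec, errors="replace")
        if decoded != char:
            problems.append(
                f"0x{src:02X} is {decoded!r} in {source_codec}, comment says {char!r}"
            )
        if dst in first_use:
            problems.append(
                f"target 0x{dst:02X} used for both {first_use[dst]!r} and {char!r}"
            )
        else:
            first_use[dst] = char
    return problems


if __name__ == "__main__":
    # A few entries copied from the Czech example above.
    czech_sample = [
        (0xE8, 0xAE, "č"),
        (0xAC, 0xB4, "Ž"),
        (0xBF, 0xB6, "ż"),
        (0xBB, 0xB6, "ť"),
    ]
    for problem in check_map(czech_sample, "iso8859_2"):
        print(problem)
```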
See Also
- I18N - Main article
Translation resources
- The charset TDM fonts use
- I18N.pl - a script to transform an FM into a mission and I18N data
- Bug Tracker entry #2779
- Text Decals for Signs etc.
- Fonts in TDM
- Font Conversion & Repair - with links to ExportFontToDoom3, Q3Font, Refont, and Font Patcher
Overview of translations
- I18N Status - Which FMs are translated into which language (not entirely up to date)
- Translating FMs
- List of translators
- Translator's Guide
Translation discussions