I18N - Charset
Latest revision as of 21:04, 14 April 2024
Introduction
The D3 code that handles the GUI bitmap font can only load a specific range of bytes as characters. To get the most out of the available entries, special charsets are used. The fonts (e.g. Carleton for the menu) are built/patched so that the right characters appear in the right place.
Encodings
all.lang
This file is in UTF-8 and is converted with the help of the script devel/gen_lang.pl:
perl devel/gen_lang.pl
This ensures that the generated language files are in their proper encodings (see below).
All other language files
Note that the language files (e.g. strings/german.lang), the readables, and the FM dictionaries are expected to be in the following encodings:
- Czech, Hungarian, Slovak, Polish: ISO-8859-2 (not WIN-1250!)
- Russian: WIN-1251
- French: ISO-8859-15
- Romanian: ISO-8859-16
- All other languages: ISO-8859-1 (German, Dutch, Danish, Swedish, Portuguese, etc.)
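As a rough illustration of the list above, the following Python sketch picks an encoding per language and transcodes UTF-8 text into it. It is not how devel/gen_lang.pl works internally; the function names and the lowercase language keys are assumptions made for this example, while the codec names are Python's standard ones.

```python
# Hypothetical helper: language names used as keys are illustrative only.
ENCODINGS = {
    "czech": "iso8859_2",
    "hungarian": "iso8859_2",
    "slovak": "iso8859_2",
    "polish": "iso8859_2",
    "russian": "cp1251",       # WIN-1251
    "french": "iso8859_15",
    "romanian": "iso8859_16",
}
DEFAULT_ENCODING = "latin_1"   # ISO-8859-1 for all other languages


def encoding_for(language: str) -> str:
    """Return the Python codec name expected for a language file."""
    return ENCODINGS.get(language.lower(), DEFAULT_ENCODING)


def transcode_utf8(data: bytes, language: str) -> bytes:
    """Convert UTF-8 bytes to the single-byte encoding of `language`."""
    return data.decode("utf-8").encode(encoding_for(language))
```

A character that does not exist in the target encoding raises UnicodeEncodeError here, which is a convenient way to spot strings saved in the wrong charset.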
Character remapping
The characters are remapped upon loading the dictionary/readable, from their native encoding to the special one that TDM uses and that is described here. The remapping is controlled by mapping files, e.g. "strings/czech.map". If a map file for a specific language is not found, "strings/default.map" is used instead. If that file is not found either, no remapping takes place.
See Character mapping for more information.
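The fallback order described above (language map, then default map, then no remapping) can be sketched in Python as follows. This is only an illustration: the maps are plain dictionaries rather than parsed .map files, the names `pick_map`, `remap` and `MAPS` are hypothetical, and the two Czech entries are derived from the table in this article, not copied from the real "strings/czech.map".

```python
from typing import Dict, Optional

ByteMap = Dict[int, int]  # source byte -> TDM byte


def pick_map(language: str, maps: Dict[str, ByteMap]) -> Optional[ByteMap]:
    """Language-specific map first, then the default map, else None."""
    return maps.get(language, maps.get("default"))


def remap(data: bytes, language: str, maps: Dict[str, ByteMap]) -> bytes:
    mapping = pick_map(language, maps)
    if mapping is None:
        return data  # no map found at all: no remapping takes place
    # Build a 256-entry translation table; unmapped bytes stay unchanged.
    table = bytes(mapping.get(code, code) for code in range(256))
    return data.translate(table)


# Illustrative entries only: ISO-8859-2 has ř at 0xF8 and Ř at 0xD8,
# while the TDM table places them at 0xF7 and 0xD7.
MAPS = {"czech": {0xF8: 0xF7, 0xD8: 0xD7}, "default": {}}
```

With these sample maps, `remap(b"\xf8", "czech", MAPS)` yields the TDM byte for ř, while any other language falls through to the empty default map and is left untouched.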
European Languages
This mapping is used for European languages, e.g. Czech, French, German, Spanish, Portuguese, Polish. Note that the double-accented Hungarian characters Ő, ő, Ű and ű look a bit different from Ö, ö, Ü and ü!
In the table below, the original ISO 8859-1 characters are given in parentheses after the TDM character.
Color code:
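- Unusable: character not usable by TDM
- Unused: character not yet used in TDM
- Usable in v1.08: character displayed in v1.08 or newer
- Usable in v2.03: character displayed in v2.03 or newer
- Changed from ISO 8859-1: changed from the ISO-8859-1 default, usable by TDM 1.0 or newer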
| | …0 | …1 | …2 | …3 | …4 | …5 | …6 | …7 | …8 | …9 | …A | …B | …C | …D | …E | …F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0… | 00 – | 01 – | 02 – | 03 – | 04 – | 05 – | 06 – | 07 – | 08 – | 09 – | 0A – | 0B – | 0C – | 0D – | 0E – | 0F – |
| 1… | 10 – | 11 – | 12 – | 13 – | 14 – | 15 – | 16 – | 17 – | 18 – | 19 – | 1A – | 1B – | 1C – | 1D – | 1E – | 1F – |
| 2… | 20 SP | 21 ! | 22 " | 23 # | 24 $ | 25 % | 26 & | 27 ' | 28 ( | 29 ) | 2A * | 2B + | 2C , | 2D - | 2E . | 2F / |
| 3… | 30 0 | 31 1 | 32 2 | 33 3 | 34 4 | 35 5 | 36 6 | 37 7 | 38 8 | 39 9 | 3A : | 3B ; | 3C < | 3D = | 3E > | 3F ? |
| 4… | 40 @ | 41 A | 42 B | 43 C | 44 D | 45 E | 46 F | 47 G | 48 H | 49 I | 4A J | 4B K | 4C L | 4D M | 4E N | 4F O |
| 5… | 50 P | 51 Q | 52 R | 53 S | 54 T | 55 U | 56 V | 57 W | 58 X | 59 Y | 5A Z | 5B [ | 5C \ | 5D ] | 5E ^ | 5F _ |
| 6… | 60 ` | 61 a | 62 b | 63 c | 64 d | 65 e | 66 f | 67 g | 68 h | 69 i | 6A j | 6B k | 6C l | 6D m | 6E n | 6F o |
| 7… | 70 p | 71 q | 72 r | 73 s | 74 t | 75 u | 76 v | 77 w | 78 x | 79 y | 7A z | 7B { | 7C \| | 7D } | 7E ~ | 7F � |
| 8… | 80 Ň | 81 Ś | 82 Ć | 83 Ż | 84 Ź | 85 Ŝ | 86 Ĉ | 87 Ẑ | 88 Ô | 89 Ŕ | 8A Ǔ | 8B Ă | 8C Ń | 8D Ș | 8E Ț | 8F � |
| 9… | 90 đ | 91 ś | 92 ć | 93 ż | 94 ź | 95 ŝ | 96 ĉ | 97 ẑ | 98 ô | 99 ŕ | 9A ǔ | 9B ă | 9C ń | 9D ș | 9E ț | 9F � |
| A… | A0 NBSP | A1 ň (¡) | A2 Ű (¢) | A3 ě (£) | A4 ű (¤) | A5 Ě (¥) | A6 Š (¦) | A7 § | A8 š (¨) | A9 Ů (©) | AA Ą (ª) | AB Ę («) | AC Č (¬) | AD SHY | AE č (®) | AF ů (¯) |
| B… | B0 Ő (°) | B1 Ł (±) | B2 Ť (²) | B3 Ď (³) | B4 Ž (´) | B5 ł (µ) | B6 ť (¶) | B7 ď (·) | B8 ž (¸) | B9 ő (¹) | BA ą (º) | BB ę (») | BC Œ (¼) | BD œ (½) | BE Ÿ (¾) | BF ¿ |
| C… | C0 À | C1 Á | C2 Â | C3 Ã | C4 Ä | C5 Å | C6 Æ | C7 Ç | C8 È | C9 É | CA Ê | CB Ë | CC Ì | CD Í | CE Î | CF Ï |
| D… | D0 Ð | D1 Ñ | D2 Ò | D3 Ó | D4 Ô | D5 Õ | D6 Ö | D7 Ř (×) | D8 Ø | D9 Ù | DA Ú | DB Û | DC Ü | DD Ý | DE Þ | DF ß |
| E… | E0 à | E1 á | E2 â | E3 ã | E4 ä | E5 å | E6 æ | E7 ç | E8 è | E9 é | EA ê | EB ë | EC ì | ED í | EE î | EF ï |
| F… | F0 ð | F1 ñ | F2 ò | F3 ó | F4 ô | F5 õ | F6 ö | F7 ř (÷) | F8 ø | F9 ù | FA ú | FB û | FC ü | FD ý | FE þ | FF ÿ |
For a mapping of these 256 codepoints to Unicode U+NNNN values and formal names, download TDM 8859-Sytle Font Map to Unicode-16.txt (https://drive.google.com/file/d/1UAz9jSZpT_j33STP3So_Re8JWa8QuAmz/view?usp=sharing). This file (by Geep, 2024) is in a standardized format, so it can also be imported into font design programs like FontForge as a custom 256-position map. The comments in the file contain additional information about:
- ISO 8859-x sourcing of each character.
- alternative representations of some European and control characters.
Russian
Characters conform to the WIN-1251 native encoding, as shown in the Wikipedia article on WIN-1251. There is one exception: the character 0xFF (я) is mapped to 0xB6 upon loading. Therefore any Russian font must contain я at position 0xB6.
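The single Russian exception can be illustrated with a short Python sketch. It assumes the text is first encoded as WIN-1251 (Python codec "cp1251") and then passed through a 256-byte translation table; the names `RUSSIAN_REMAP` and `load_russian` are hypothetical and do not refer to TDM code.

```python
# Identity table for all bytes except 0xFF, which moves to 0xB6.
RUSSIAN_REMAP = bytes(0xB6 if code == 0xFF else code for code in range(256))


def load_russian(text: str) -> bytes:
    """Encode text as WIN-1251, then apply the 0xFF -> 0xB6 exception."""
    return text.encode("cp1251").translate(RUSSIAN_REMAP)
```

Only the lowercase я is affected; every other WIN-1251 byte, including the uppercase Я at 0xDF, keeps its native position.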
Asian Languages (Korean, Chinese, Japanese)
The original D3 had support for these languages, so it might be possible to add them to TDM, too. At the moment, however, we lack the fonts and translators. Also, writing right-to-left (Hebrew) or top-down (Japanese) might be tricky or outright impossible in our GUI without more work in the C++ code. In addition, these languages use more than 256 different characters, which an 8-bit table cannot hold.
Statistics
Some of the special characters are used more often than others. Here is a statistic over the entire string set of the TDM core, from TDM v1.08, showing the top 50 most-used characters (excluding a-z, 0-9 and Russian characters):
| Rank | Letter | Occurrences | Remarks | Rank | Letter | Occurrences | Remarks |
|---|---|---|---|---|---|---|---|
| 1 | í | 715 | | 25 | ć | 67 | |
| 2 | é | 674 | | 26 | è | 65 | |
| 3 | á | 524 | | 27 | ú | 56 | |
| 4 | ø | 303 | Danish | 28 | ê | 52 | |
| 5 | č | 288 | | 29 | ö | 48 | German |
| 6 | ó | 283 | | 30 | É | 46 | |
| 7 | ü | 270 | German | 31 | ñ | 37 | |
| 8 | ł | 203 | Polish | 32 | õ | 32 | |
| 9 | æ | 200 | Danish | 33 | ń | 26 | |
| 10 | ě | 182 | | 34 | Ł | 24 | |
| 11 | ř | 175 | Czech | 35 | Š | 21 | |
| 12 | ã | 168 | | 36 | â | 21 | |
| 13 | ž | 148 | Czech | 37 | ź | 20 | |
| 14 | ý | 142 | | 38 | ß | 18 | German |
| 15 | ę | 141 | | 39 | Ó | 18 | |
| 16 | ą | 140 | | 40 | ň | 15 | |
| 17 | ż | 119 | | 41 | Ú | 15 | |
| 18 | å | 109 | Danish | 42 | Á | 13 | |
| 19 | š | 99 | | 43 | î | 12 | |
| 20 | ś | 97 | | 44 | ť | 11 | |
| 21 | ç | 91 | | 45 | ô | 9 | |
| 22 | ä | 86 | German | 46 | Ž | 8 | |
| 23 | à | 83 | | 47 | Ż | 7 | |
| 24 | ů | 77 | | 48 | Č | 7 | |
| | | | | 49 | ù | 6 | |
Although ö, ä and ü do not appear that often, these letters together with Ü, Ö, Ä and ß are all that is needed for the entire German language to work. So adding these letters to the fonts is quite important.
Preferably, all foreign letters would be added to the fonts (see Font Patcher). However, if time permits adding only a few, í would be more important than, say, ô.
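A ranking like the one in this section could be reproduced with a small counting script. The Python sketch below is an assumption about the approach, not the script that produced the published numbers: it skips all ASCII characters and the Cyrillic block (U+0400 to U+04FF) and counts the remaining letters. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def is_special(ch: str) -> bool:
    """True for non-ASCII, non-Cyrillic letters."""
    code = ord(ch)
    if code < 0x80:               # plain ASCII, including a-z and 0-9
        return False
    if 0x0400 <= code <= 0x04FF:  # Cyrillic block (Russian)
        return False
    return ch.isalpha()


def top_special_chars(strings: Iterable[str], limit: int = 50) -> List[Tuple[str, int]]:
    """Return the `limit` most frequent special letters with their counts."""
    counts = Counter(ch for s in strings for ch in s if is_special(ch))
    return counts.most_common(limit)
```

The input is expected to be already decoded text (Python str), so each language file would first have to be decoded from its own encoding.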