I18N - Charset

From The DarkMod Wiki
Latest revision as of 19:37, 6 March 2024

Introduction

The D3 code that handles the GUI bitmap font can only load a specific range of bytes as characters. To get the most out of the available entries, special charsets are used. The fonts (for instance Carleton for the menu) are built/patched so that the right characters appear in the right place.

Encodings

all.lang

This file is in UTF-8, and converted with the help of the script devel/gen_lang.pl:

perl devel/gen_lang.pl

This ensures that the generated language files are in their proper encodings (see below).

All other language files

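Note that the language files (for instance strings/german.lang), the readables and the FM dictionaries are expected to be in language-specific encodings, including:

Russian: WIN-1251
French: ISO-8859-15
Romanian: ISO-8859-16
All other languages: ISO-8859-1 (German, Dutch, Danish, Swedish, Portuguese, etc.)
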
The core dictionaries are automatically generated in the right encoding, but make sure that you use the right encoding for the FM dictionary, too!
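
As a minimal sketch of what "the right encoding" means in practice, the following Python snippet encodes a string for an FM dictionary. It assumes the assignments Russian: WIN-1251, French: ISO-8859-15, Romanian: ISO-8859-16 and ISO-8859-1 as the catch-all. The names CODECS and encode_for_language are hypothetical, and the table may be incomplete for languages not named here.

```python
# Assumed language-to-codec table, written with Python's codec names.
# This is an illustration, not the list used by devel/gen_lang.pl.
CODECS = {
    "russian": "cp1251",       # WIN-1251
    "french": "iso8859_15",    # ISO-8859-15
    "romanian": "iso8859_16",  # ISO-8859-16
}
DEFAULT_CODEC = "latin_1"      # ISO-8859-1, the catch-all entry


def encode_for_language(text: str, language: str) -> bytes:
    """Encode text in the charset assumed for the given language."""
    codec = CODECS.get(language.lower(), DEFAULT_CODEC)
    # Strict mode raises UnicodeEncodeError for characters the charset lacks,
    # which is safer than silently writing a wrong byte into a dictionary.
    return text.encode(codec, errors="strict")


if __name__ == "__main__":
    print(encode_for_language("Größe", "german"))
```

A UnicodeEncodeError from this function is a hint that the text contains a character the chosen charset cannot represent.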

Character remapping

The characters are remapped upon loading the dictionary/readable, from their native encoding to the special one that TDM uses and that is described here. The remapping is driven by mapping files, for instance "strings/czech.map". If no map file exists for a specific language, "strings/default.map" is used instead. If that file is not found either, no remapping takes place.

See Character mapping for more information.
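
The fallback order described in this section can be sketched in Python as follows. This is only an illustration: the real logic lives in the TDM C++ code, the function names pick_map_file and remap are hypothetical, and the on-disk format of the .map files is not modelled here (the mapping is simply passed in as a dict of byte values).

```python
from pathlib import Path
from typing import Optional


def pick_map_file(strings_dir: Path, language: str) -> Optional[Path]:
    """Return <language>.map if present, else default.map, else None."""
    for name in (f"{language}.map", "default.map"):
        candidate = strings_dir / name
        if candidate.is_file():
            return candidate
    return None  # neither file exists: no remapping takes place


def remap(data: bytes, mapping: dict[int, int]) -> bytes:
    """Replace every byte listed in mapping; leave all other bytes as-is."""
    table = bytes(mapping.get(b, b) for b in range(256))
    return data.translate(table)


if __name__ == "__main__":
    # Example mapping with a single entry, as used for Russian.
    print(remap(b"\xff", {0xFF: 0xB6}))
```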

European Languages

This mapping is used for European languages, for instance Czech, French, German, Spanish, Portuguese, Polish. Note that the double-accented Hungarian characters Ő, ő, Ű and ű look a bit different from Ö, ö, Ü and ü!

In the table below, the original ISO 8859-1 characters are given in () below the TDM character.

Color code:

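Unusable (#f0d0d0): character not usable by TDM
Unused (#d0e0d0): character not yet used in TDM
Usable in v1.08 (#c0ffc0): character displayed in v1.08 or newer
Usable in v2.03 (#80f080): character displayed in v2.03 or newer
Changed from ISO 8859-1 (#d0d0f0): changed from the ISO-8859-1 default, usable by TDM 1.0 or newer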

Row  Cells (hex code, then TDM character; a code without a character is an empty cell)
0…  00  01  02  03  04  05  06  07  08  09  0A  0B  0C  0D  0E  0F
1…  10  11  12  13  14  15  16  17  18  19  1A  1B  1C  1D  1E  1F
2…  20 SP  21 !  22 "  23 #  24 $  25 %  26 &  27 '  28 (  29 )  2A *  2B +  2C ,  2D -  2E .  2F /
3…  30 0  31 1  32 2  33 3  34 4  35 5  36 6  37 7  38 8  39 9  3A :  3B ;  3C <  3D =  3E >  3F ?
4…  40 @  41 A  42 B  43 C  44 D  45 E  46 F  47 G  48 H  49 I  4A J  4B K  4C L  4D M  4E N  4F O
5…  50 P  51 Q  52 R  53 S  54 T  55 U  56 V  57 W  58 X  59 Y  5A Z  5B [  5C \  5D ]  5E ^  5F _
6…  60 `  61 a  62 b  63 c  64 d  65 e  66 f  67 g  68 h  69 i  6A j  6B k  6C l  6D m  6E n  6F o
7…  70 p  71 q  72 r  73 s  74 t  75 u  76 v  77 w  78 x  79 y  7A z  7B {  7C |  7D }  7E ~  7F
8…  80 Ň  81 Ś  82 Ć  83 Ż  84 Ź  85 Ŝ  86 Ĉ  87  88 Ô  89 Ŕ  8A Ǔ  8B Ă  8C Ń  8D Ș  8E Ț  8F
9…  90 đ  91 ś  92 ć  93 ż  94 ź  95 ŝ  96 ĉ  97  98 ô  99 ŕ  9A ǔ  9B ă  9C ń  9D ș  9E ț  9F
A…  A0 NBSP  A1 ň (¡)  A2 Ű (¢)  A3 ě (£)  A4 ű (¤)  A5 Ě (¥)  A6 Š (¦)  A7 §  A8 š (¨)  A9 Ů (©)  AA Ą (ª)  AB Ę («)  AC Č (¬)  AD SHY  AE č (®)  AF ů (¯)
B…  B0 Ő (°)  B1 Ł (±)  B2 Ť (²)  B3 Ď (³)  B4 Ž (´)  B5 ł (µ)  B6 ť (¶)  B7 ď (·)  B8 ž (¸)  B9 ő (¹)  BA ą (º)  BB ę (»)  BC Œ (¼)  BD œ (½)  BE Ÿ (¾)  BF ¿
C…  C0 À  C1 Á  C2 Â  C3 Ã  C4 Ä  C5 Å  C6 Æ  C7 Ç  C8 È  C9 É  CA Ê  CB Ë  CC Ì  CD Í  CE Î  CF Ï
D…  D0 Ð  D1 Ñ  D2 Ò  D3 Ó  D4 Ô  D5 Õ  D6 Ö  D7 Ř (×)  D8 Ø  D9 Ù  DA Ú  DB Û  DC Ü  DD Ý  DE Þ  DF ß
E…  E0 à  E1 á  E2 â  E3 ã  E4 ä  E5 å  E6 æ  E7 ç  E8 è  E9 é  EA ê  EB ë  EC ì  ED í  EE î  EF ï
F…  F0 ð  F1 ñ  F2 ò  F3 ó  F4 ô  F5 õ  F6 ö  F7 ř (÷)  F8 ø  F9 ù  FA ú  FB û  FC ü  FD ý  FE þ  FF ÿ
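
To illustrate how the table is read, here is a small Python sketch that turns bytes in the TDM European charset back into Unicode text. It starts from ISO 8859-1 and overrides a handful of positions copied from the table. TDM_OVERRIDES and tdm_decode are hypothetical names, and the override table is deliberately partial, so this is a reading aid rather than a complete decoder.

```python
# A few positions where the TDM charset differs from ISO 8859-1,
# copied from the table. The list is intentionally incomplete.
TDM_OVERRIDES = {
    0x91: "ś", 0x92: "ć",
    0xA1: "ň", 0xA3: "ě", 0xA6: "Š", 0xA8: "š",
    0xB1: "Ł", 0xB5: "ł", 0xBC: "Œ", 0xBD: "œ",
    0xD7: "Ř", 0xF7: "ř",
}


def tdm_decode(data: bytes) -> str:
    """Decode TDM-charset bytes, falling back to ISO 8859-1 per byte."""
    chars = []
    for byte in data:
        if byte in TDM_OVERRIDES:
            chars.append(TDM_OVERRIDES[byte])
        else:
            # Positions not listed above are treated as plain ISO 8859-1.
            chars.append(bytes([byte]).decode("latin_1"))
    return "".join(chars)


if __name__ == "__main__":
    print(tdm_decode(b"\xd7e\xf7"))
```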

Russian

Characters conform to the WIN-1251 native encoding, shown in the Wikipedia article. Exception: the character 0xFF (я) is mapped to 0xB6 upon loading. Therefore any Russian font must contain я at position 0xB6.
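
The effect of this single exception can be shown with a short Python sketch. It uses Python's cp1251 codec for WIN-1251; the function name russian_to_tdm is hypothetical, and the actual remapping is done by TDM when it loads the strings.

```python
def russian_to_tdm(text: str) -> bytes:
    """Encode text as WIN-1251, then move 0xFF (я) to 0xB6."""
    raw = text.encode("cp1251")
    # After this step a byte 0xB6 stands for я, so the character that
    # WIN-1251 originally keeps at 0xB6 can no longer be told apart.
    return raw.replace(b"\xff", b"\xb6")


if __name__ == "__main__":
    print(russian_to_tdm("Земля").hex(" "))
```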

Asian Languages (Korean, Chinese, Japanese)

The original D3 had support for these languages, so it might be possible to add them to TDM, too. At the moment, however, we lack the fonts and translators. Also, writing from right to left (Hebrew) or top-down (Japanese) might be tricky or outright impossible in our GUI without more work in the C++ code. In addition, these languages use more than 256 different characters, which an 8-bit table cannot hold.

Statistics

Some of the special characters are used more often than others. The following statistic covers the entire string set of the TDM core, from TDM v1.08, and shows the top 50 most-used characters (excluding a-z, 0-9 and Russian characters):

Rank  Letter  Occurrences  Remarks
1     í       715
2     é       674
3     á       524
4     ø       303          Danish
5     č       288
6     ó       283
7     ü       270          German
8     ł       203          Polish
9     æ       200          Danish
10    ě       182
11    ř       175          Czech
12    ã       168
13    ž       148          Czech
14    ý       142
15    ę       141
16    ą       140
17    ż       119
18    å       109          Danish
19    š       99
20    ś       97
21    ç       91
22    ä       86           German
23    à       83
24    ů       77
25    ć       67
26    è       65
27    ú       56
28    ê       52
29    ö       48           German
30    É       46
31    ñ       37
32    õ       32
33    ń       26
34    Ł       24
35    Š       21
36    â       21
37    ź       20
38    ß       18           German
39    Ó       18
40    ň       15
41    Ú       15
42    Á       13
43    î       12
44    ť       11
45    ô       9
46    Ž       8
47    Ż       7
48    Č       7
49    ù       6

Although ö, ä and ü do not appear that often, with only these and Ü, Ö, Ä and ß, the entire German language works. So adding these letters to the fonts is quite important.

Preferably, all foreign letters would be added to the fonts (see Font Patcher). However, if time permits adding only a few, í would be more important than, say, ô.
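
A tally like the one in this section could be produced roughly as follows. This Python sketch is not the script that generated the numbers; count_special is a hypothetical name, and skipping all ASCII characters (not just a-z and 0-9) as well as anything whose Unicode name contains "CYRILLIC" is an assumption about how the exclusions were applied.

```python
from collections import Counter
import unicodedata


def count_special(strings):
    """Count non-ASCII, non-Cyrillic letters across an iterable of strings."""
    counts = Counter()
    for text in strings:
        for ch in text:
            if ch.isascii() or not ch.isalpha():
                continue  # assumed exclusion: ASCII, digits, punctuation
            if "CYRILLIC" in unicodedata.name(ch, ""):
                continue  # Russian characters are left out of the statistic
            counts[ch] += 1
    return counts


if __name__ == "__main__":
    sample = ["Příliš žluťoučký kůň", "Größe", "Земля"]
    for letter, occurrences in count_special(sample).most_common(5):
        print(letter, occurrences)
```

Sorting the result with most_common(50) gives a ranking in the same shape as the table.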


See Also

Translation resources

Overview of translations

Translation discussions