I18N - Charset: Difference between revisions

Revision as of 12:56, 3 June 2012

Introduction

The D3 code that handles the GUI bitmap font can only load a specific range of bytes as characters. To get the most out of the available entries, special charsets are used. The fonts (for instance Carleton for the menu) are built or patched so that the right characters appear in the right place.

Encodings

all.lang

This file is in UTF-8 and is converted with the script devel/gen_lang.pl:

perl devel/gen_lang.pl

This ensures that the generated language files are in their proper encodings (see below).

All other language files

Note that the language files (for instance strings/german.lang), the readables and the FM dictionaries are expected to be in the following encodings:

Czech, Hungarian, Slovak, Polish: ISO-8859-2 (not WIN-1250!)
Russian: WIN-1251
French: ISO-8859-15
All other languages: ISO-8859-1 (German, Dutch, Danish, etc.)

The core dictionaries are automatically generated in the right encoding, but make sure that you use the right encoding for the FM dictionary, too!
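As a rough illustration of the encoding list above, the following sketch re-encodes a UTF-8 text file into the encoding expected for a given language. It is not the logic of devel/gen_lang.pl; the names `ENCODINGS`, `encoding_for` and `convert_file`, as well as the lowercase language keys, are assumptions made for this example.

```python
# Expected encodings per language, taken from the list above.
# The dictionary keys are hypothetical language names; the values
# are Python codec names.
ENCODINGS = {
    "czech": "iso-8859-2",
    "hungarian": "iso-8859-2",
    "slovak": "iso-8859-2",
    "polish": "iso-8859-2",
    "russian": "cp1251",
    "french": "iso-8859-15",
}
DEFAULT_ENCODING = "iso-8859-1"  # German, Dutch, Danish, etc.


def encoding_for(language):
    """Return the codec name expected for the given language."""
    return ENCODINGS.get(language.lower(), DEFAULT_ENCODING)


def convert_file(src_path, dst_path, language):
    """Re-encode a UTF-8 file into the encoding for `language`.

    Raises UnicodeEncodeError if the text contains a character that
    the target encoding cannot represent.
    """
    # newline="" keeps the line endings exactly as they are in the source.
    with open(src_path, encoding="utf-8", newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding=encoding_for(language), newline="") as dst:
        dst.write(text)
```

Because the conversion fails loudly on characters outside the target encoding, it also works as a quick check that a dictionary only uses characters available for its language.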

Character remapping

The characters are remapped upon loading the dictionary or readable, from their native encoding to the special one that TDM uses and that is described here. Mapping files, for instance "strings/czech.map", are responsible for the remapping. If a map file for a specific language is not found, "strings/default.map" is used instead. If that is not found either, no remapping takes place.
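The fallback order described above can be sketched as follows. This is only an illustration of the lookup, not the engine code; the function name `find_map_file` and the `strings_dir` parameter are hypothetical, and the format of the map files themselves is not covered here.

```python
import os


def find_map_file(language, strings_dir="strings"):
    """Return the path of the map file to use, or None for no remapping.

    Lookup order: <language>.map first, then default.map.
    """
    for name in (language + ".map", "default.map"):
        path = os.path.join(strings_dir, name)
        if os.path.isfile(path):
            return path
    # Neither file exists: the text is used without remapping.
    return None
```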

European Languages

This mapping is used for European languages, for instance Czech, French, German, Spanish, Portuguese and Polish. Note that the double-acute characters in Hungarian (Ő, ő, Ű and ű) look a bit different from Ö, ö, Ü and ü!

In the table below, the original ISO 8859-1 characters are given in parentheses after the TDM character.

Color code:

Unusable, Usable in v1.08, Changed

	…0	…1	…2	…3	…4	…5	…6	…7	…8	…9	…A	…B	…C	…D	…E	…F
0…	00 –	01 –	02 –	03 –	04 –	05 –	06 –	07 –	08 –	09 –	0A –	0B –	0C –	0D –	0E –	0F –
1…	10 –	11 –	12 –	13 –	14 –	15 –	16 –	17 –	18 –	19 –	1A –	1B –	1C –	1D –	1E –	1F –
2…	20	21 !	22 "	23 #	24 $	25 %	26 &	27 '	28 (	29 )	2A *	2B +	2C ,	2D -	2E .	2F /
3…	30 0	31 1	32 2	33 3	34 4	35 5	36 6	37 7	38 8	39 9	3A :	3B ;	3C <	3D =	3E >	3F ?
4…	40 @	41 A	42 B	43 C	44 D	45 E	46 F	47 G	48 H	49 I	4A J	4B K	4C L	4D M	4E N	4F O
5…	50 P	51 Q	52 R	53 S	54 T	55 U	56 V	57 W	58 X	59 Y	5A Z	5B [	5C \	5D ]	5E ^	5F _
6…	60 `	61 a	62 b	63 c	64 d	65 e	66 f	67 g	68 h	69 i	6A j	6B k	6C l	6D m	6E n	6F o
7…	70 p	71 q	72 r	73 s	74 t	75 u	76 v	77 w	78 x	79 y	7A z	7B {	7C |	7D }	7E ~	7F –
8…	80 Ň	81 Ś	82 Ć	83 Ż	84 Ź	85 Ŝ	86 Ĉ	87 Ẑ	88 Ô	89 Ŕ	8A Ǔ	8B Ă	8C Ń	8D –	8E –	8F –
9…	90 –	91 ś	92 ć	93 ż	94 ź	95 ŝ	96 ĉ	97 ẑ	98 ô	99 ŕ	9A ǔ	9B ă	9C ń	9D –	9E –	9F –
A…	A0 NBSP	A1 ň (¡)	A2 Ű (¢)	A3 ě (£)	A4 ű (¤)	A5 Ě (¥)	A6 Š (¦)	A7 §	A8 š (¨)	A9 Ů (©)	AA Ą (ª)	AB Ę («)	AC Č (¬)	AD SHY	AE č (®)	AF ů (¯)
B…	B0 Ő (°)	B1 Ł (±)	B2 Ť (²)	B3 Ď (³)	B4 Ž (´)	B5 ł (µ)	B6 ť (¶)	B7 ď (·)	B8 ž (¸)	B9 ő (¹)	BA ą (º)	BB ę (»)	BC Œ (¼)	BD œ (½)	BE Ÿ (¾)	BF ¿
C…	C0 À	C1 Á	C2 Â	C3 Ã	C4 Ä	C5 Å	C6 Æ	C7 Ç	C8 È	C9 É	CA Ê	CB Ë	CC Ì	CD Í	CE Î	CF Ï
D…	D0 Ð	D1 Ñ	D2 Ò	D3 Ó	D4 Ô	D5 Õ	D6 Ö	D7 Ř (×)	D8 Ø	D9 Ù	DA Ú	DB Û	DC Ü	DD Ý	DE Þ	DF ß
E…	E0 à	E1 á	E2 â	E3 ã	E4 ä	E5 å	E6 æ	E7 ç	E8 è	E9 é	EA ê	EB ë	EC ì	ED í	EE î	EF ï
F…	F0 ð	F1 ñ	F2 ò	F3 ó	F4 ô	F5 õ	F6 ö	F7 ř (÷)	F8 ø	F9 ù	FA ú	FB û	FC ü	FD ý	FE þ	FF ÿ
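To show how the table is meant to be read, here is a small sketch that remaps Czech text from ISO-8859-2 to the TDM codes. The codes in `TDM_CODES` are read off the table above and cover only the Czech-specific letters; this is not the content of strings/czech.map, and the names `TDM_CODES`, `build_table` and `remap` are made up for this example.

```python
# TDM codes for the Czech-specific letters, read off the table above.
# This is a subset for illustration, not a complete mapping.
TDM_CODES = {
    "Ň": 0x80, "ň": 0xA1, "ě": 0xA3, "Ě": 0xA5, "Š": 0xA6, "š": 0xA8,
    "Ů": 0xA9, "Č": 0xAC, "č": 0xAE, "ů": 0xAF, "Ť": 0xB2, "Ď": 0xB3,
    "Ž": 0xB4, "ť": 0xB6, "ď": 0xB7, "ž": 0xB8, "Ř": 0xD7, "ř": 0xF7,
}


def build_table(source_encoding="iso-8859-2"):
    """Build a 256-entry byte translation table for bytes.translate()."""
    table = bytearray(range(256))  # start with the identity mapping
    for ch, tdm_code in TDM_CODES.items():
        # Position of the character in the native encoding.
        source_byte = ch.encode(source_encoding)[0]
        table[source_byte] = tdm_code
    return bytes(table)


def remap(data, table):
    """Remap natively encoded bytes to TDM codes in a single pass."""
    return data.translate(table)
```

For example, `remap("č".encode("iso-8859-2"), build_table())` moves č from its ISO-8859-2 position 0xE8 to the TDM position 0xAE. Letters such as á or é keep their code because they sit at the same position in both charsets.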

Russian

The character 0xFF (я) is mapped to 0xB6 upon loading. Therefore any Russian font must contain я at the place 0xB6.
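A minimal sketch of this single remapping, assuming the text is already encoded as WIN-1251 (Python codec `cp1251`); the function name `remap_russian` is hypothetical.

```python
def remap_russian(data):
    """Move byte 0xFF (я in WIN-1251) to 0xB6, as described above."""
    return data.replace(b"\xff", b"\xb6")


# я is stored as the single byte 0xFF in WIN-1251.
encoded = "я".encode("cp1251")
remapped = remap_russian(encoded)
```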

Statistics

Some of the special characters are used more often than others. The table below gives statistics over the entire string set of the TDM core, showing the top 50 most-used characters (excluding a-z, 0-9 and Russian characters):

Rank	Letter	Occurrences	Remarks
1	í	715
2	é	674
3	á	524
4	ø	303	Danish
5	č	288
6	ó	283
7	ü	270	German
8	ł	203	Polish
9	æ	200	Danish
10	ě	182
11	ř	175	Czech
12	ã	168
13	ž	148	Czech
14	ý	142
15	ę	141
16	ą	140
17	ż	119
18	å	109	Danish
19	š	99
20	ś	97
21	ç	91
22	ä	86	German
23	à	83
24	ů	77
25	ć	67
26	è	65
27	ú	56
28	ê	52
29	ö	48	German
30	É	46
31	ñ	37
32	õ	32
33	ń	26
34	Ł	24
35	Š	21
36	â	21
37	ź	20
38	ß	18	German
39	Ó	18
40	ň	15
41	Ú	15
42	Á	13
43	î	12
44	ť	11
45	ô	9
46	Ž	8
47	Ż	7
48	Č	7
49	ù	6
50	Ś	5
51	ő	5	Hungarian

Although ö, ä and ü do not appear that often, these three together with Ü, Ö, Ä and ß are all the special characters that German needs. Adding these letters is therefore quite important.
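A count like the one in the table above could be reproduced with a short script. The sketch below is an assumption about how such a statistic might be gathered, not the method actually used: it counts non-ASCII letters in a list of already decoded strings and skips the Cyrillic block. The function name `count_special` is hypothetical.

```python
from collections import Counter


def count_special(strings):
    """Count non-ASCII letters, skipping Cyrillic characters."""
    counts = Counter()
    for text in strings:
        for ch in text:
            if ord(ch) < 128:
                continue  # plain ASCII (a-z, 0-9, punctuation)
            if "\u0400" <= ch <= "\u04ff":
                continue  # Russian (Cyrillic) characters
            if not ch.isalpha():
                continue  # symbols such as § or ¿
            counts[ch] += 1
    return counts
```

Calling `count_special(lines).most_common(50)` then gives the ranked list, with ties in arbitrary but stable order.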

