I18N - Charset: Difference between revisions

Revision as of 18:39, 11 August 2014

Introduction

The D3 code that handles the GUI bitmap font can only load a specific range of bytes as characters. To get the most out of the available entries, special charsets are used. The fonts (for instance Carleton for the menu) are built/patched so that the right characters appear in the right place.

Encodings

all.lang

This file is in UTF-8 and is converted into the generated language files with the script devel/gen_lang.pl:

perl devel/gen_lang.pl

This ensures that the generated language files are in their proper encodings (see below).

All other language files

Note that the language files (for instance strings/german.lang), the readables and the FM dictionaries are expected to be in the following encodings:

Czech, Hungarian, Slovak, Polish: ISO-8859-2 (not WIN-1250!)
Russian: WIN-1251
French: ISO-8859-15
All other languages: ISO-8859-1 (German, Dutch, Danish, etc.)

The core dictionaries are automatically generated in the right encoding, but make sure that you use the right encoding for the FM dictionary, too!
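One way to check that the text of an FM dictionary fits the expected encoding is sketched below in Python. Only the encodings come from the list above; the lowercase language keys and the helper names `expected_encoding` and `encode_for_language` are assumptions made for this illustration and are not part of TDM or of gen_lang.pl.

```python
# Expected encodings per language, taken from the list above.
# The lowercase language keys are an assumption made for this sketch.
EXPECTED_ENCODINGS = {
    "czech": "iso8859_2",
    "hungarian": "iso8859_2",
    "slovak": "iso8859_2",
    "polish": "iso8859_2",
    "russian": "cp1251",
    "french": "iso8859_15",
}
DEFAULT_ENCODING = "iso8859_1"  # all other languages (German, Dutch, Danish, ...)


def expected_encoding(language: str) -> str:
    """Return the Python codec name expected for the given language."""
    return EXPECTED_ENCODINGS.get(language.lower(), DEFAULT_ENCODING)


def encode_for_language(text: str, language: str) -> bytes:
    """Encode text for a language file.

    Raises UnicodeEncodeError if a character does not exist
    in the expected encoding.
    """
    return text.encode(expected_encoding(language))
```

A `UnicodeEncodeError` from `encode_for_language` indicates that the text contains a character that the expected encoding for that language cannot represent.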

Character remapping

The characters are remapped when the dictionary/readable is loaded, from their native encoding to the special one that TDM uses and that is described here. The remapping is driven by mapping files, for instance "strings/czech.map". If no map file is found for a specific language, "strings/default.map" is used instead; if that is not found either, no remapping takes place.

See Character mapping for more information.
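The fallback order described above can be sketched as follows in Python. This is only an illustration of the lookup order, not the actual C++ code in TDM; the function name `find_map_file` and the `strings_dir` parameter are hypothetical, and parsing the contents of a map file is not covered here.

```python
from pathlib import Path
from typing import Optional


def find_map_file(strings_dir: Path, language: str) -> Optional[Path]:
    """Pick the map file for a language, following the fallback order.

    1. strings/<language>.map
    2. strings/default.map
    3. None, meaning that no remapping takes place
    """
    candidates = (
        strings_dir / f"{language}.map",
        strings_dir / "default.map",
    )
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

For example, `find_map_file(Path("strings"), "czech")` would return `strings/czech.map` if that file exists, `strings/default.map` if only the default exists, and `None` otherwise.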

European Languages

This mapping is used for European languages, for instance Czech, French, German, Spanish, Portuguese and Polish. Note that the double-accented Hungarian characters Ő, ő, Ű and ű look a bit different from Ö, ö, Ü and ü!

In the table below, the original ISO 8859-1 characters are given in () below the TDM character.

Color code:

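Unusable: character not displayed by TDM or not defined
Usable in v1.08: character displayed in v1.08 or newer
Changed: changed from the ISO-8859-1 default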

	…0	…1	…2	…3	…4	…5	…6	…7	…8	…9	…A	…B	…C	…D	…E	…F
0…	00 –	01 –	02 –	03 –	04 –	05 –	06 –	07 –	08 –	09 –	0A –	0B –	0C –	0D –	0E –	0F –
1…	10 –	11 –	12 –	13 –	14 –	15 –	16 –	17 –	18 –	19 –	1A –	1B –	1C –	1D –	1E –	1F –
2…	20	21 !	22 "	23 #	24 $	25 %	26 &	27 '	28 (	29 )	2A *	2B +	2C ,	2D -	2E .	2F /
3…	30 0	31 1	32 2	33 3	34 4	35 5	36 6	37 7	38 8	39 9	3A :	3B ;	3C <	3D =	3E >	3F ?
4…	40 @	41 A	42 B	43 C	44 D	45 E	46 F	47 G	48 H	49 I	4A J	4B K	4C L	4D M	4E N	4F O
5…	50 P	51 Q	52 R	53 S	54 T	55 U	56 V	57 W	58 X	59 Y	5A Z	5B [	5C \	5D ]	5E ^	5F _
6…	60 `	61 a	62 b	63 c	64 d	65 e	66 f	67 g	68 h	69 i	6A j	6B k	6C l	6D m	6E n	6F o
7…	70 p	71 q	72 r	73 s	74 t	75 u	76 v	77 w	78 x	79 y	7A z	7B {	7C |	7D }	7E ~	7F –
8…	80 Ň	81 Ś	82 Ć	83 Ż	84 Ź	85 Ŝ	86 Ĉ	87 Ẑ	88 Ô	89 Ŕ	8A Ǔ	8B Ă	8C Ń	8D –	8E –	8F –
9…	90 –	91 ś	92 ć	93 ż	94 ź	95 ŝ	96 ĉ	97 ẑ	98 ô	99 ŕ	9A ǔ	9B ă	9C ń	9D –	9E –	9F –
A…	A0 NBSP	A1 ň (¡)	A2 Ű (¢)	A3 ě (£)	A4 ű (¤)	A5 Ě (¥)	A6 Š (¦)	A7 §	A8 š (¨)	A9 Ů (©)	AA Ą (ª)	AB Ę («)	AC Č (¬)	AD SHY	AE č (®)	AF ů (¯)
B…	B0 Ő (°)	B1 Ł (±)	B2 Ť (²)	B3 Ď (³)	B4 Ž (´)	B5 ł (µ)	B6 ť (¶)	B7 ď (·)	B8 ž (¸)	B9 ő (¹)	BA ą (º)	BB ę (»)	BC Œ (¼)	BD œ (½)	BE Ÿ (¾)	BF ¿
C…	C0 À	C1 Á	C2 Â	C3 Ã	C4 Ä	C5 Å	C6 Æ	C7 Ç	C8 È	C9 É	CA Ê	CB Ë	CC Ì	CD Í	CE Î	CF Ï
D…	D0 Ð	D1 Ñ	D2 Ò	D3 Ó	D4 Ô	D5 Õ	D6 Ö	D7 Ř (×)	D8 Ø	D9 Ù	DA Ú	DB Û	DC Ü	DD Ý	DE Þ	DF ß
E…	E0 à	E1 á	E2 â	E3 ã	E4 ä	E5 å	E6 æ	E7 ç	E8 è	E9 é	EA ê	EB ë	EC ì	ED í	EE î	EF ï
F…	F0 ð	F1 ñ	F2 ò	F3 ó	F4 ô	F5 õ	F6 ö	F7 ř (÷)	F8 ø	F9 ù	FA ú	FB û	FC ü	FD ý	FE þ	FF ÿ
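To illustrate how the table is meant to be read, the Python sketch below derives a byte remapping from ISO-8859-2 to the TDM positions for a handful of characters. The positions in `TDM_POSITION` are copied from the table above; the names `TDM_POSITION`, `build_remap` and `to_tdm` are hypothetical, and the dictionary is deliberately incomplete, so characters that are not listed pass through unchanged and may end up in the wrong place.

```python
# TDM positions for a few characters, copied from the table above.
# This is a partial list for illustration only.
TDM_POSITION = {
    "ř": 0xF7, "Ř": 0xD7,
    "č": 0xAE, "Č": 0xAC,
    "ě": 0xA3, "Ě": 0xA5,
    "š": 0xA8, "Š": 0xA6,
    "ž": 0xB8, "Ž": 0xB4,
    "ł": 0xB5, "Ł": 0xB1,
}


def build_remap(source_encoding: str) -> dict:
    """Map each character's byte in the source encoding to its TDM position."""
    return {
        ch.encode(source_encoding)[0]: position
        for ch, position in TDM_POSITION.items()
    }


def to_tdm(text: str, source_encoding: str = "iso8859_2") -> bytes:
    """Encode text in its native encoding, then move bytes to TDM positions."""
    remap = build_remap(source_encoding)
    raw = text.encode(source_encoding)
    # Each byte is looked up once, so mappings cannot chain into each other.
    return bytes(remap.get(byte, byte) for byte in raw)
```

With this sketch, ř is byte 0xF8 in ISO-8859-2 and is moved to 0xF7, the position the table above assigns to it.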

Russian

The character 0xFF (я) is mapped to 0xB6 upon loading. Therefore any Russian font must contain я at position 0xB6.
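As a minimal Python sketch of this single remapping: я is byte 0xFF in WIN-1251 and is moved to 0xB6. The function name `remap_russian` is hypothetical, and the sketch leaves all other bytes untouched; what TDM does with other positions is not described here.

```python
def remap_russian(data: bytes) -> bytes:
    """Move byte 0xFF (я in WIN-1251) to 0xB6; other bytes are left as they are."""
    return data.replace(b"\xff", b"\xb6")


# "земля" ends in я, which is byte 0xFF in WIN-1251
raw = "земля".encode("cp1251")
remapped = remap_russian(raw)
```

After the call, the last byte of `remapped` is 0xB6, which is where the Russian font has to provide the glyph for я.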

Asian Languages (Korean, Chinese, Japanese)

The original D3 had support for these languages, so it might be possible to add them to TDM, too. At the moment, however, we lack the fonts and translators. Also, writing from right-to-left (Hebrew) or top-down (Japanese) might be tricky or outright impossible in our GUI without more work in the C++ code.

Statistics

Some of the special characters are used more often than others. The following statistic covers the entire string set of the TDM core and shows the top 50 most-used characters (excluding a-z, 0-9 and Russian characters):

Rank	Letter	Occurrences	Remarks	Rank	Letter	Occurrences	Remarks
1	í	715		25	ć	67
2	é	674		26	è	65
3	á	524		27	ú	56
4	ø	303	Danish	28	ê	52
5	č	288		29	ö	48	German
6	ó	283		30	É	46
7	ü	270	German	31	ñ	37
8	ł	203	Polish	32	õ	32
9	æ	200	Danish	33	ń	26
10	ě	182		34	Ł	24
11	ř	175	Czech	35	Š	21
12	ã	168		36	â	21
13	ž	148	Czech	37	ź	20
14	ý	142		38	ß	18	German
15	ę	141		39	Ó	18
16	ą	140		40	ň	15
17	ż	119		41	Ú	15
18	å	109	Danish	42	Á	13
19	š	99		43	î	12
20	ś	97		44	ť	11
21	ç	91		45	ô	9
22	ä	86	German	46	Ž	8
23	à	83		47	Ż	7
24	ů	77		48	Č	7
25	ć	67		49	ù	6

Although ö, ä and ü do not appear that often, these letters together with Ü, Ö, Ä and ß are all that is needed to make the entire German language work. Adding them is therefore quite important.
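A statistic like the one above could be gathered with a short Python sketch along these lines. How the original numbers were produced is not documented here, so this is only an assumption about the approach: the function name `count_special_characters` is hypothetical, and the sketch skips all ASCII characters (not just a-z and 0-9) as well as the Cyrillic block.

```python
from collections import Counter
from typing import Iterable


def count_special_characters(strings: Iterable[str]) -> Counter:
    """Count non-ASCII, non-Cyrillic characters over a set of strings."""
    counts = Counter()
    for text in strings:
        for ch in text:
            if ord(ch) < 128:
                continue  # plain ASCII, including a-z and 0-9
            if "\u0400" <= ch <= "\u04ff":
                continue  # Cyrillic, i.e. Russian characters
            counts[ch] += 1
    return counts
```

Calling `count_special_characters(all_strings).most_common(50)` on the decoded strings would then give the most-used characters in descending order; `all_strings` stands for whatever list of strings is being analysed.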


@@ Line 28: / Line 28: @@
 The characters are remapped upon loading the dictionary/readable, from their native encoding to the special one that TDM uses and that is described here.  Responsible for the remapping are [[I18N - Character mapping|mapping files]], f.i. "strings/czech.map". If a map file for a specific language is not found, "strings/default.map" is used instead, if this is not found, no remapping takes place.
+See '''[[I18N - Character mapping|Character mapping]]''' for more information.
 == European Languages ==
@@ Line 37: / Line 39: @@
 '''Color code:'''
-{{box|#f0d0d0|Character not displayed by D3 or not defined|Unusable}}{{box|#c0ffc0|Character displayed in v1.08 or newer|Usable in v1.08}}{{box|#d0d0f0|Changed from the ISO-8859-1 default|Changed}}
+{{box|#f0d0d0|Character not displayed by TDM or not defined|Unusable}}{{box|#c0ffc0|Character displayed in v1.08 or newer|Usable in v1.08}}{{box|#d0d0f0|Changed from the ISO-8859-1 default|Changed}}
 {|class="wikitable" border=1 style="border-collapse: collapse; font-size: 95%" cellspacing=0 cellpadding=2 width=100%
@@ Line 381: / Line 383: @@
 |-
+|Rank
+|Occurances
+|Letter
+|Remarks
 |Rank
 |Occurances
@@ Line 390: / Line 396: @@
 |í
 |715
+|
+|25
+|ć
+|67
 |
@@ Line 396: / Line 406: @@
 |é
 |674
+|
+|26
+|è
+|65
 |
@@ Line 402: / Line 416: @@
 |á
 |524
+|
+|27
+|ú
+|56
 |
@@ Line 409: / Line 427: @@
 |303
 |Danish
+|28
+|ê
+|52
+|
 |-
@@ Line 415: / Line 437: @@
 |288
 |
+|29
+|ö
+|48
+|German
 |-
@@ Line 420: / Line 446: @@
 |ó
 |283
+|
+|30
+|É
+|46
 |
@@ Line 427: / Line 457: @@
 |270
 |German
+|31
+|ñ
+|37
+|
 |-
@@ Line 433: / Line 467: @@
 |203
 |Polish
+|32
+|õ
+|32
+|
 |-
@@ Line 439: / Line 477: @@
 |200
 |Danish
+|33
+|ń
+|26
+|
 |-
@@ Line 444: / Line 486: @@
 |ě
 |182
+|
+|34
+|Ł
+|24
 |
@@ Line 451: / Line 497: @@
 |175
 |Czech
+|35
+|Š
+|21
+|
 |-
@@ Line 456: / Line 506: @@
 |ã
 |168
+|
+|36
+|â
+|21
 |
@@ Line 463: / Line 517: @@
 |148
 |Czech
+|37
+|ź
+|20
+|
 |-
@@ Line 469: / Line 527: @@
 |142
 |
+|38
+|ß
+|18
+|German
 |-
@@ Line 474: / Line 536: @@
 |ę
 |141
+|
+|39
+|Ó
+|18
 |
@@ Line 480: / Line 546: @@
 |ą
 |140
+|
+|40
+|ň
+|15
 |
@@ Line 486: / Line 556: @@
 |ż
 |119
+|
+|41
+|Ú
+|15
 |
@@ Line 493: / Line 567: @@
 |109
 |Danish
+|42
+|Á
+|13
+|
 |-
@@ Line 498: / Line 576: @@
 |š
 |99
+|
+|43
+|î
+|12
 |
@@ Line 504: / Line 586: @@
 |ś
 |97
+|
+|44
+|ť
+|11
 |
@@ Line 510: / Line 596: @@
 |ç
 |91
+|
+|45
+|ô
+|9
 |
@@ Line 517: / Line 607: @@
 |86
 |German
+|46
+|Ž
+|8
+|
 |-
@@ Line 522: / Line 616: @@
 |à
 |83
+|
+|47
+|Ż
+|7
 |
@@ Line 528: / Line 626: @@
 |ů
 |77
+|
+|48
+|Č
+|7
 |
@@ Line 535: / Line 637: @@
 |67
 |
-|-
-|26
-|è
-|65
-|
-|-
-|27
-|ú
-|56
-|
-|-
-|28
-|ê
-|52
-|
-|-
-|29
-|ö
-|48
-|German
-|-
-|30
-|É
-|46
-|
-|-
-|31
-|ñ
-|37
-|
-|-
-|32
-|õ
-|32
-|
-|-
-|33
-|ń
-|26
-|
-|-
-|34
-|Ł
-|24
-|
-|-
-|35
-|Š
-|21
-|
-|-
-|36
-|â
-|21
-|
-|-
-|37
-|ź
-|20
-|
-|-
-|38
-|ß
-|18
-|German
-|-
-|39
-|Ó
-|18
-|
-|-
-|40
-|ň
-|15
-|
-|-
-|41
-|Ú
-|15
-|
-|-
-|42
-|Á
-|13
-|
-|-
-|43
-|î
-|12
-|
-|-
-|44
-|ť
-|11
-|
-|-
-|45
-|ô
-|9
-|
-|-
-|46
-|Ž
-|8
-|
-|-
-|47
-|Ż
-|7
-|
-|-
-|48
-|Č
-|7
-|
-|-
 |49
 |ù
 |6
 |
-|-
-|50
-|Ś
-|5
-|
-|-
-|51
-|ő
-|5
-|Hungarian
 |}