Multilanguage Display

From The DarkMod Wiki
== Background ==
''This describes a particularly challenging use case of a long-standing TDM methodology. The motivation for documenting it was a 2026 update by Geep to the language map files, following a revision of the Carleton 24pt font used throughout the Main Menu.''
In TDM's main menu, the Settings/Video/Language page attempts to list all the supported languages in their native form with respect to character set, that is, untranslated. Showing multiple language strings together is difficult, but some tricks are available. The result has been a reasonable near-term workaround for this particular page, applicable to comparable cases. (Comprehensive support for multilanguage display likely involves a major restructuring, for instance, moving to a native Unicode architecture with combined Latin and Cyrillic bitmaps.)

Recall that strings for the main menu are found in the utf8 file tdm_base01.pk4/strings/all.lang, which has sections for [English], [German], etc. Each section is used to generate the corresponding distributed file (e.g., french.lang, german.lang) in a language-specific 8-bit encoding. When you play TDM and select a given language, the corresponding .lang file (if provided) is read.
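The generation step can be pictured with a small Python sketch. It is an illustration only, not TDM's actual build tool: it assumes each section begins with a header line such as [Polish] and that everything up to the next header belongs to that section, and it does not model the format of the individual string entries. The target encoding is passed in as a Python codec name; which encoding each language really uses is documented on [[I18N_-_Charset]].

```python
import re

# Assumed section header form: "[English]", "[Polish]", ... on a line by itself.
SECTION_HEADER = re.compile(r"^\[([^\]]+)\]\s*$")


def split_sections(all_lang_text: str) -> dict[str, str]:
    """Split the UTF-8 text of all.lang into {section name: section body}."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in all_lang_text.splitlines():
        header = SECTION_HEADER.match(line)
        if header:
            current = header.group(1)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(lines) for name, lines in sections.items()}


def build_lang_file(section_body: str, codec: str) -> bytes:
    """Encode one section into an 8-bit encoding.

    Raises UnicodeEncodeError if the section uses a character that the
    chosen encoding cannot represent.
    """
    return section_body.encode(codec)


# Hypothetical two-section sample; real entries also carry string IDs.
sample = "[English]\nLanguage\n[Polish]\nJęzyk\n"
sections = split_sections(sample)
polish_bytes = build_lang_file(sections["Polish"], "iso8859_2")
print(polish_bytes)  # b'J\xeazyk'
```

Encoding the same Polish section with iso8859_1 instead raises UnicodeEncodeError, because ę has no byte in that encoding.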

Except for Cyrillic Russian, these files use encodings within the ISO-8859 family (as detailed in [[I18N_-_Charset]]). Any single encoding is a bottleneck: by design, each member of the ISO-8859 family contains characters that some other members cannot represent.
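The bottleneck is easy to demonstrate with Python's standard codecs. This sketch checks which of the five ISO-8859 members compared later on this page can represent a given native language name; the names are just examples.

```python
# The five ISO-8859 members compared on this page, as Python codec names.
ISO_CODECS = ("iso8859_1", "iso8859_2", "iso8859_9", "iso8859_15", "iso8859_16")


def encodable_in(text: str, codecs: tuple[str, ...] = ISO_CODECS) -> list[str]:
    """Return the codecs (in order) that can represent every character of text."""
    usable = []
    for codec in codecs:
        try:
            text.encode(codec)
        except UnicodeEncodeError:
            continue  # at least one character has no byte in this codec
        usable.append(codec)
    return usable


for name in ("Español", "Português", "Čeština"):
    print(name, encodable_in(name))
# Español ['iso8859_1', 'iso8859_9', 'iso8859_15']
# Português ['iso8859_1', 'iso8859_9', 'iso8859_15', 'iso8859_16']
# Čeština ['iso8859_2', 'iso8859_16']
```

No single member covers all three names at once, which is exactly the problem the Language page faces when it lists every language in its native form.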
*for iso 2: Put ˝ (double acute accent, 0xBD)
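The stand-in hacks work because a .lang file carries only bytes, while the language's map decides which font glyph each byte shows. The Python sketch below illustrates this for ISO-8859-2, where byte 0xBD is ˝ (U+02DD, double acute accent) and byte 0xA8 is ¨ (diaeresis). The byte-to-glyph dictionary is hypothetical: it is not the syntax or content of a real TDM map file, and it simply assumes that the map sends 0xBD to the ê glyph slot (0xEA) and 0xA8 to the ñ glyph slot (0xF1), as given in the TDM column of the table below.

```python
# Stand-ins for characters that ISO-8859-2 lacks (from the hacks on this page).
STAND_INS_ISO2 = {
    "ê": "\u02dd",  # ˝ double acute accent -> byte 0xBD in ISO-8859-2
    "ñ": "\u00a8",  # ¨ diaeresis           -> byte 0xA8 in ISO-8859-2
}

# Hypothetical excerpt of an ISO-8859-2 language map: byte -> TDM glyph slot.
# A real map also relocates the native ISO-8859-2 letters; that is omitted here.
HYPOTHETICAL_ISO2_MAP = {0xBD: 0xEA, 0xA8: 0xF1}


def encode_with_stand_ins(text: str, stand_ins: dict[str, str],
                          codec: str = "iso8859_2") -> bytes:
    """Replace unrepresentable characters by their stand-ins, then encode."""
    return "".join(stand_ins.get(ch, ch) for ch in text).encode(codec)


def to_glyph_slots(data: bytes, byte_map: dict[int, int]) -> bytes:
    """Apply a byte -> glyph-slot remap; bytes not in the map pass through."""
    return bytes(byte_map.get(b, b) for b in data)


raw = encode_with_stand_ins("Português", STAND_INS_ISO2)
print(raw)                                         # b'Portugu\xbds'
print(to_glyph_slots(raw, HYPOTHETICAL_ISO2_MAP))  # b'Portugu\xeas'
```

Because TDM keeps ê and ñ at their Latin-1 positions, the remapped bytes match the Latin-1 encoding of "Português".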

== TDM & ISO Char Sets - Details, Potential Tricks, Limitations ==
[WORK IN PROGRESS][COLOR CODING NOT RIGHT YET]

The main table below (after the Key) was drafted by Google AI, from a prompt that defined the header and an example row, and specified:
  "Rows are a combination of all the printable 8-bit codepoints in the range 0x80-0xff defined in ISO-8859-1, -2, -9, -15, and -16. The rows are ordered by Unicode number. If a particular ISO standard does not include the character, leave that cell blank."
Subsequently, the TDM and Comments columns were filled in by Geep, and color coding added.
Language mappings were independently color coded, then verified against existing language maps.
This table skips codepoints in the range 0x00-0x7f, because they are the same in all ISO-8859 standards and need no mapping.
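Because the ISO columns follow mechanically from the prompt quoted above, they can be regenerated as a cross-check. The Python sketch below uses only the standard codecs and unicodedata; it treats "printable" as any character outside Unicode's control and format categories (so the C1 controls at 0x80-0x9F and the soft hyphen are skipped). It does not produce the TDM or Comments columns, which were filled in by hand, nor rows for characters that none of the five standards contain (such as Ĉ or Ǔ).

```python
import unicodedata

# The five ISO-8859 members compared in the table, as Python codec names.
ISO_CODECS = ("iso8859_1", "iso8859_2", "iso8859_9", "iso8859_15", "iso8859_16")


def build_rows(codecs=ISO_CODECS):
    """Return {character: {codec: byte}} for printable bytes 0x80-0xFF,
    ordered by Unicode code point."""
    rows = {}
    for codec in codecs:
        for byte in range(0x80, 0x100):
            ch = bytes([byte]).decode(codec)
            # Skip control/format characters: the C1 controls at 0x80-0x9F
            # and the soft hyphen at 0xAD.
            if unicodedata.category(ch).startswith("C"):
                continue
            rows.setdefault(ch, {})[codec] = byte
    return dict(sorted(rows.items(), key=lambda item: ord(item[0])))


def format_row(ch, cells, codecs=ISO_CODECS):
    """Format one wikitable row: Unicode, character, one cell per codec
    (blank when the codec lacks the character), then the Unicode name."""
    hex_cells = [f"0x{cells[c]:02X}" if c in cells else "" for c in codecs]
    name = unicodedata.name(ch, "").lower()
    return " || ".join([f"| U+{ord(ch):04X}", ch, *hex_cells, name])


rows = build_rows()
print(format_row("ñ", rows["ñ"]))
```

For ñ this prints the ISO cells 0xF1, blank, 0xF1, 0xF1, blank, matching the table's ñ row before the ISO-8859-2 and -16 hack values were added.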

|-
| U+00D0 || Ð || 0xD0 || 0xD0 || || style="background:#43a8cc;" | n/a || 0xD0 || || latin capital letter eth || Same glyph as U+0110 latin capital letter d with stroke
|-
| U+00EA || ê || 0xEA || 0xEA || style="background:#ebbb54;color:#ff0000;" | 0xBD || 0xEA || 0xEA || 0xEA || latin small letter e with circumflex || Hack for iso 2: Put ˝ (double acute accent, 0xBD) to get ê for "Português"
|-
| U+00F1 || ñ || 0xF1 || 0xF1 || style="background:#ebbb54;color:#ff0000;" | 0xA8 || 0xF1 || 0xF1 || style="background:#ebbb54;color:#ff0000;" | 0xB6 || latin small letter n with tilde || Hacks to get ñ for "Español": for iso 2, put ¨ (diaeresis 0xA8); for iso 16, put ¶ (0xB6)  
|-
| U+0102 || Ă || 0x8B || || style="background:#ebbb54;" | 0xC3 || || || style="background:#ebbb54;" | 0xC3 || latin capital letter a with breve ||  
|-
| U+0103 || ă || 0x9B || || style="background:#ebbb54;" | 0xE3 || || || style="background:#ebbb54;" | 0xE3 || latin small letter a with breve ||  
|-
| U+0104 || Ą || 0xAA || style="background:#9ab4e3" | ª || style="background:#ebbb54;" | 0xA1 || style="background:#9ab4e3" | ª || style="background:#9ab4e3" | ª || style="background:#ebbb54;" | 0xA1 || latin capital letter a with ogonek ||  
|-
| U+0105 || ą || 0xBA || style="background:#9ab4e3" | º || style="background:#ebbb54;" | 0xB1 || style="background:#9ab4e3" | º || style="background:#9ab4e3" | º || style="background:#ebbb54;" | 0xA2 || latin small letter a with ogonek ||  
|-
| U+0106 || Ć || 0x82 || || style="background:#ebbb54;" | 0xC6 || || || style="background:#ebbb54;" | 0xC5 || latin capital letter c with acute ||  
|-
| U+0107 || ć || 0x92 || || style="background:#ebbb54;" | 0xE6 || || || style="background:#ebbb54;" | 0xE5 || latin small letter c with acute ||  
|- style="background:#dbab8a;"
| U+0108 || Ĉ || 0x86 || || || || || || latin capital letter c with circumflex || Only in ISO-8859-3 (for Esperanto) at 0xC6
|- style="background:#dbab8a;"
| U+0109 || ĉ || 0x96 || || || || || || latin small letter c with circumflex || Only in ISO-8859-3 (for Esperanto) at 0xE6
|-
| U+010C || Č || 0xAC || style="background:#9ab4e3" | ¬ || style="background:#ebbb54;" | 0xC8 || style="background:#9ab4e3" | ¬ || style="background:#9ab4e3" | ¬ || style="background:#ebbb54;" | 0xB2 || latin capital letter c with caron || Substitute char "not sign" at 0xAC
|-
| U+010D || č || 0xAE || style="background:#9ab4e3" | ® || style="background:#ebbb54;" | 0xE8 || style="background:#9ab4e3" | ® || style="background:#9ab4e3" | ® || style="background:#ebbb54;" | 0xB9 || latin small letter c with caron || Substitute char "registered sign" at 0xAE
|-
| U+010E || Ď || 0xB3 || style="background:#9ab4e3" | ³ || style="background:#ebbb54;" | 0xCF || style="background:#9ab4e3" | ³ || style="background:#9ab4e3" | ³ || || latin capital letter d with caron || Substitute char "superscript three" at 0xB3
|-
| U+010F || ď || 0xB7 || style="background:#9ab4e3; font-weight:bold" | · || style="background:#ebbb54;" | 0xEF || style="background:#9ab4e3; font-weight:bold" | · || style="background:#9ab4e3; font-weight:bold" | · || || latin small letter d with caron || Substitute char "middle dot" at 0xB7
|-
| U+0110 || Đ || 0xD0 || 0xD0 || 0xD0 || style="background:#43a8cc;" | n/a || || 0xD0 || latin capital letter d with stroke || Same glyph as U+00D0 latin capital letter eth
|-
| U+0111 || đ || 0x90 || || style="background:#ebbb54;" | 0xF0 || style="background:#43a8cc;" | n/a || || style="background:#ebbb54;" | 0xF0 || latin small letter d with stroke ||  
|-
| U+0118 || Ę || 0xAB || style="background:#9ab4e3" | « || style="background:#ebbb54;" | 0xCA || style="background:#9ab4e3" | « || style="background:#9ab4e3" | « || style="background:#ebbb54;" | 0xDD || latin capital letter e with ogonek || Substitute char "left-pointing double angle quotation mark" at 0xAB
|-
| U+0119 || ę || 0xBB || style="background:#9ab4e3" | » || style="background:#ebbb54;" | 0xEA || style="background:#9ab4e3" | » || style="background:#9ab4e3" | » || style="background:#ebbb54;" | 0xFD || latin small letter e with ogonek || Substitute char "right-pointing double angle quotation mark" at 0xBB
|-
| U+011A || Ě || 0xA5 || style="background:#9ab4e3" | ¥ || style="background:#ebbb54;" | 0xCC || style="background:#9ab4e3" | ¥ || style="background:#9ab4e3" | ¥ || || latin capital letter e with caron || Substitute char "yen sign" at 0xA5
|-
| U+011B || ě || 0xA3 || style="background:#9ab4e3" | £ || style="background:#ebbb54;" | 0xEC || style="background:#9ab4e3" | £ || style="background:#9ab4e3" | £ || || latin small letter e with caron || Substitute char "pound sign" at 0xA3
|-
| U+011E || Ğ || 0x88 || || || style="background:#ebbb54;" | 0xD0 || || || latin capital letter g with breve || As of TDM 2.13 (TDM codemap), 2.15 (turkish.map)
|-
| U+011F || ğ || 0x98 || || || style="background:#ebbb54;" | 0xF0 || || || latin small letter g with breve || As of TDM 2.13 (TDM codemap), 2.15 (turkish.map)
|- style="color:#8c8c8c;"
| U+0130 || İ || || || || style="background:#ebbb54;" | 0xDD || || || latin capital letter i with dot above || Turkish: utf8 "İ" will be mapped to "Î" (0xCE)
|- style="color:#8c8c8c;"
| U+0131 || ı || || || || style="background:#ebbb54;" | 0xFD || || || latin small letter dotless i || Turkish: utf8 "ı" will be mapped to ASCII "i" (0x69)
|-
| U+0141 || Ł || 0xB1 || style="background:#9ab4e3" | ± || style="background:#ebbb54;" | 0xA3 || style="background:#9ab4e3" | ± || style="background:#9ab4e3" | ± || style="background:#ebbb54;" | 0xA3 || latin capital letter l with stroke || Substitute char "plus-minus sign" at 0xB1
|-
| U+0142 || ł || 0xB5 || style="background:#9ab4e3" | µ || style="background:#ebbb54;" | 0xB3 || style="background:#9ab4e3" | µ || style="background:#9ab4e3" | µ || style="background:#ebbb54;" | 0xB3 || latin small letter l with stroke || Substitute char "micro sign" at 0xB5
|-
| U+0143 || Ń || 0x8C || || style="background:#ebbb54;" | 0xD1 || || || style="background:#ebbb54;" | 0xD1 || latin capital letter n with acute ||  
|-
| U+0144 || ń || 0x9C || || style="background:#ebbb54;" | 0xF1 || || || style="background:#ebbb54;" | 0xF1 || latin small letter n with acute ||  
|-
| U+0147 || Ň || 0x80 || || style="background:#ebbb54;" | 0xD2 || || || || latin capital letter n with caron ||  
|-
| U+0148 || ň || 0xA1 || style="background:#9ab4e3" | ¡ || style="background:#ebbb54;" | 0xF2 || style="background:#9ab4e3" | ¡ || style="background:#9ab4e3" | ¡ || || latin small letter n with caron || Substitute char "inverted exclamation mark" at 0xA1
|-
| U+0150 || Ő || 0xB0 || style="background:#9ab4e3" | ° || style="background:#ebbb54;" | 0xD5 || style="background:#9ab4e3" | ° || style="background:#9ab4e3" | ° || style="background:#ebbb54;" | 0xD5 || latin capital letter o with double acute || Similar to Ö, used in Hungarian. Substitute char "degree sign" at 0xB0
|-
| U+0151 || ő || 0xB9 || style="background:#9ab4e3" | ¹ || style="background:#ebbb54;" | 0xF5 || style="background:#9ab4e3" | ¹ || style="background:#9ab4e3" | ¹ || style="background:#ebbb54;" | 0xF5 || latin small letter o with double acute || Similar to ö, used in Hungarian. Substitute char "superscript one" at 0xB9
|-
| U+0152 || Œ || 0xBC || style="background:#9ab4e3" | ¼ || || style="background:#9ab4e3" | ¼ || 0xBC || 0xBC || latin capital ligature oe || Substitute char "vulgar fraction one quarter" at 0xBC
|-
| U+0153 || œ || 0xBD || style="background:#9ab4e3" | ½ || || style="background:#9ab4e3" | ½ || 0xBD || 0xBD || latin small ligature oe || Substitute char "vulgar fraction one half" at 0xBD
|-
| U+0154 || Ŕ || 0x89 || || style="background:#ebbb54;" | 0xC0 || || || || latin capital letter r with acute ||  
|-
| U+0155 || ŕ || 0x99 || || style="background:#ebbb54;" | 0xE0 || || || || latin small letter r with acute ||  
|-
| U+0158 || Ř || 0xD7 || style="background:#9ab4e3" | × || style="background:#ebbb54;" | 0xD8 || style="background:#9ab4e3" | × || style="background:#9ab4e3" | × || || latin capital letter r with caron || Substitute char "multiplication sign" at 0xD7
|-
| U+0159 || ř || 0xF7 || style="background:#9ab4e3" | ÷ || style="background:#ebbb54;" | 0xF8 || style="background:#9ab4e3" | ÷ || style="background:#9ab4e3" | ÷ || || latin small letter r with caron || Substitute char "division sign" at 0xF7
|-
| U+015A || Ś || 0x81 || || style="background:#ebbb54;" | 0xA6 || || || style="background:#ebbb54;" | 0xD7 || latin capital letter s with acute ||  
|-
| U+015B || ś || 0x91 || || style="background:#ebbb54;" | 0xB6 || || || style="background:#ebbb54;" | 0xF7 || latin small letter s with acute ||  
|- style="background:#dbab8a;"
| U+015C || Ŝ || 0x85 || || || || || || latin capital letter s with circumflex || Only in ISO-8859-3 (for Esperanto) at 0xDE
|- style="background:#dbab8a;"
| U+015D || ŝ || 0x95 || || || || || || latin small letter s with circumflex || Only in ISO-8859-3 (for Esperanto) at 0xFE
|-
| U+015E || Ş || 0x8D || || style="background:#ebbb54;" | 0xAA || style="background:#ebbb54;" | 0xDE || || || latin capital letter s with cedilla || Can stand in for "...with comma below"
|-
| U+015F || ş || 0x9D || || style="background:#ebbb54;" | 0xBA || style="background:#ebbb54;" | 0xFE || || || latin small letter s with cedilla || Can stand in for "...with comma below"
|-
| U+0160 || Š || 0xA6 || style="background:#9ab4e3" | ¦ || style="background:#ebbb54;" | 0xA9 || style="background:#9ab4e3" | ¦ || 0xA6 || 0xA6 || latin capital letter s with caron || Substitute char "broken bar" at 0xA6
|-
| U+0161 || š || 0xA8 || style="background:#9ab4e3" | ¨ || style="background:#ebbb54;" | 0xB9 || style="background:#9ab4e3" | ¨ || 0xA8 || 0xA8 || latin small letter s with caron || Substitute char "diaeresis" at 0xA8
|-
| U+0162 || Ţ || 0x8E || || style="background:#ebbb54;" | 0xDE || || || || latin capital letter t with cedilla || Can stand in for "...with comma below"
|-
| U+0163 || ţ || 0x9E || || style="background:#ebbb54;" | 0xFE || || || || latin small letter t with cedilla || Can stand in for "...with comma below"
|-
| U+0164 || Ť || 0xB2 || style="background:#9ab4e3" | ² || style="background:#ebbb54;" | 0xAB || style="background:#9ab4e3" | ² || style="background:#9ab4e3" | ² || || latin capital letter t with caron || Substitute char "superscript two" at 0xB2
|-
| U+0165 || ť || 0xB6 || style="background:#9ab4e3" | ¶ || style="background:#ebbb54;" | 0xBB || style="background:#9ab4e3" | ¶ || style="background:#9ab4e3" | ¶ || || latin small letter t with caron || Substitute char "pilcrow sign" at 0xB6
|-
| U+016E || Ů || 0xA9 || style="background:#9ab4e3" | © || style="background:#ebbb54;" | 0xD9 || style="background:#9ab4e3" | © || style="background:#9ab4e3" | © || || latin capital letter u with ring above || Substitute char "copyright sign" at 0xA9
|-
| U+016F || ů || 0xAF || style="background:#9ab4e3" | ¯ || style="background:#ebbb54;" | 0xF9 || style="background:#9ab4e3" | ¯ || style="background:#9ab4e3" | ¯ || || latin small letter u with ring above || Substitute char "macron" at 0xAF
|-
| U+0170 || Ű || 0xA2 || style="background:#9ab4e3" | ¢ || style="background:#ebbb54;" | 0xDB || style="background:#9ab4e3" | ¢ || style="background:#9ab4e3" | ¢ || style="background:#ebbb54;" | 0xD8 || latin capital letter u with double acute || Similar to Ü, used in Hungarian. Substitute char "cent sign" at 0xA2
|-
| U+0171 || ű || 0xA4 || style="background:#9ab4e3" | ¤ || style="background:#ebbb54;" | 0xFB || style="background:#9ab4e3" | ¤ || style="background:#9ab4e3" | ¤ || style="background:#ebbb54;" | 0xF8 || latin small letter u with double acute || Similar to ü, used in Hungarian. Substitute char "currency sign" at 0xA4
|-
| U+0178 || Ÿ || 0xBE || style="background:#9ab4e3" | ¾ || || style="background:#9ab4e3" | ¾ || 0xBE || 0xBE || latin capital letter y with diaeresis || Substitute char "vulgar fraction three quarters" at 0xBE
|-
| U+0179 || Ź || 0x84 || || style="background:#ebbb54;" | 0xAC || || || style="background:#ebbb54;" | 0xAC || latin capital letter z with acute ||
|-
| U+017A || ź || 0x94 || || style="background:#ebbb54;" | 0xBC || || || style="background:#ebbb54;" | 0xAE || latin small letter z with acute ||
|-
| U+017B || Ż || 0x83 || || style="background:#ebbb54;" | 0xAF || || || style="background:#ebbb54;" | 0xAF || latin capital letter z with dot above ||
|-
| U+017C || ż || 0x93 || || style="background:#ebbb54;" | 0xBF || || || style="background:#ebbb54;" | 0xBF || latin small letter z with dot above ||
|-
| U+017D || Ž || 0xB4 || style="background:#9ab4e3" | ´ || style="background:#ebbb54;" | 0xAE || style="background:#9ab4e3" | ´ || 0xB4 || 0xB4 || latin capital letter z with caron || Substitute char "acute accent" at 0xB4
|-
| U+017E || ž || 0xB8 || style="background:#9ab4e3" | ¸ || style="background:#ebbb54;" | 0xBE || style="background:#9ab4e3" | ¸ || 0xB8 || 0xB8 || latin small letter z with caron || Substitute char "cedilla" at 0xB8
|- style="background:#dbab8a;"
| U+01D3 || Ǔ || 0x8A || || || || || || latin capital letter u with caron || Not found in ISO-8859. Pinyin tone marking
|- style="background:#dbab8a;"
| U+01D4 || ǔ || 0x9A || || || || || || latin small letter u with caron || Not found in ISO-8859. Pinyin tone marking
|-
| U+0218 || Ș || 0x8D || || || || || style="background:#ebbb54;" | 0xAA || latin capital letter s with comma below || See also "...with cedilla"
|-
| U+0219 || ș || 0x9D || || || || || style="background:#ebbb54;" | 0xBA || latin small letter s with comma below || See also "...with cedilla"
|-
| U+021A || Ț || 0x8E || || || || || style="background:#ebbb54;" | 0xDE || latin capital letter t with comma below || See also "...with cedilla"
|-
|-


| U+021B || ț || 0x9E || || || || || style="background:#ffe4b5;" | 0xFE || latin small letter t with comma below || See also "...with cedilla"
| U+021B || ț || 0x9E || || || || || style="background:#ebbb54;" | 0xFE || latin small letter t with comma below || See also "...with cedilla"
|- style="color:#8c8c8c;"
|- style="color:#8c8c8c;"


Line 655: Line 652:
|- style="color:#8c8c8c;"
|- style="color:#8c8c8c;"


| U+02D9 || ˙ || || || || style="background:#ffe4b5;" | 0xFF || || || dot above ||
| U+02D9 || ˙ || || || || 0xFF || || || dot above ||
|- style="color:#8c8c8c;"
|- style="color:#8c8c8c;"


Line 661: Line 658:
|- style="color:#8c8c8c;"
|- style="color:#8c8c8c;"


| U+02DD || ˝ || || || || style="background:#ffe4b5;" | 0xBD || || || double acute accent ||
| U+02DD || ˝ || || || || 0xBD || || || double acute accent ||
|-
|- style="background:#dbab8a;"


| U+1E90 || Ẑ || 0x87 || || || || || || latin capital z with circumflex || Not found in ISO-8859. Rare use in Cyrillic-to-Latin transliteration, or Pinyin
| U+1E90 || Ẑ || 0x87 || || || || || || latin capital z with circumflex || Not found in ISO-8859. Rare use in Cyrillic-to-Latin transliteration, or Pinyin
|-
|- style="background:#dbab8a;"


| U+1E91 || ẑ || 0x97 || || || || || || latin small z with circumflex || Not found in ISO-8859. Rare use in Cyrillic-to-Latin transliteration, or Pinyin
| U+1E91 || ẑ || 0x97 || || || || || || latin small z with circumflex || Not found in ISO-8859. Rare use in Cyrillic-to-Latin transliteration, or Pinyin
|- style="color:#8c8c8c;"
|- style="color:#8c8c8c;"


| U+201D || ” || || || || || || style="background:#ffe4b5;" | 0xB5 || right double quotation mark ||
| U+201D || ” || || || || || || 0xB5 || right double quotation mark ||
|- style="color:#8c8c8c;"
|- style="color:#8c8c8c;"


| U+201E || „ || || || || || || style="background:#ffe4b5;" | 0xA5 || double low-9 quotation mark ||
| U+201E || „ || || || || || || 0xA5 || double low-9 quotation mark ||
|- style="color:#8c8c8c;"
|- style="color:#8c8c8c;"


| U+20AC || € || || || || || 0xA4 || 0xA4 || euro sign ||
| U+20AC || € || || || || || 0xA4 || 0xA4 || euro sign ||
|}
|}
== For More ==
* [[I18N - Character mapping]] sketches the format and location of <language>.map files.
* [[I18N - Charset]] is the main article about TDM language use and various encodings.

Latest revision as of 17:44, 19 June 2026

Background

This describes a particularly challenging use case of a TDM methodology that has been around for some time. The motivation to detail it was a 2026 update by Geep to the language map files, following a revision of the Carleton 24pt font used throughout the Main Menu.

In TDM's main menu, the Settings/Video/Language page attempts to list all the supported languages in their native form with respect to character set, that is, untranslated. Showing strings from multiple languages together is difficult, but some tricks are available. The result is a reasonable near-term workaround for this particular page, applicable to comparable cases. (Comprehensive support for multilanguage display would likely involve a major restructuring, for instance, moving to a native Unicode architecture with combined Latin and Cyrillic bitmaps.)

Recall that strings for the main menu are found in the utf8 file tdm_base01.pk4/strings/all.lang, which has sections for [English], [German], etc. Each section is used to generate the corresponding distributed file (e.g., french.lang, german.lang) in a language-specific 8-bit encoding. When you play TDM and select a language, the corresponding .lang file (if provided) is read.
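The generation step can be pictured with a small Python sketch. It assumes a simplified all.lang layout (a `[Language]` header followed by `"#str_NNNNN"  "text"` lines). The `CODECS` table, the regular expression, and the function names are hypothetical and are not TDM's actual generator. Characters that the target encoding lacks come out as "?".

```python
import re

# Hypothetical language -> Python codec table (a subset of the encodings
# listed on this page); TDM's real generator may be organized differently.
CODECS = {
    "English": "iso8859_1",
    "French": "iso8859_15",
    "Polish": "iso8859_2",
    "Russian": "cp1251",
}

# Assumed entry layout:  "#str_02462"  "Español"  // optional comment
ENTRY = re.compile(r'^\s*"(#str_\d+)"\s+"([^"]*)"')


def parse_section(all_lang, language):
    """Return {string id: text} for the lines under the [language] header."""
    entries, inside = {}, False
    for line in all_lang.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            inside = stripped == "[" + language + "]"
            continue
        match = ENTRY.match(line)
        if inside and match:
            entries[match.group(1)] = match.group(2)
    return entries


def generate_lang(all_lang, language):
    """Encode one section in its 8-bit encoding; unencodable chars become '?'."""
    lines = ['"%s"\t"%s"' % item for item in parse_section(all_lang, language).items()]
    return "\n".join(lines).encode(CODECS[language], errors="replace")
```

Take a section containing "Česky" as an example. The [English] output holds `?esky`, while [Polish] (ISO-8859-2) keeps Č as byte 0xC8.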

Except for Cyrillic Russian, these files use encodings within the ISO-8859 family (as detailed in I18N - Charset). Any specific encoding is a bottleneck: by design, some characters in one member of the ISO-8859 family are not available in other members.
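One quick way to see the bottleneck is to compare encodings using Python's built-in ISO-8859 codecs. The sketch below uses only the standard codecs, not TDM's mapping, and its helper names are hypothetical. It lists the characters in one encoding's upper half (0xA0-0xFF) that another encoding cannot represent at all:

```python
def upper_half(codec):
    """Characters that `codec` assigns to bytes 0xA0-0xFF.

    The five ISO-8859 encodings discussed here define every byte in this range.
    """
    return {bytes([b]).decode(codec) for b in range(0xA0, 0x100)}


def missing_from(target, source):
    """Characters in `source`'s upper half that `target` cannot encode at all."""
    missing = set()
    for ch in upper_half(source):
        try:
            ch.encode(target)
        except UnicodeEncodeError:
            missing.add(ch)
    return missing
```

For example, `missing_from("iso8859_1", "iso8859_2")` includes Č, č, Ł and ő. In the other direction, `missing_from("iso8859_2", "iso8859_1")` includes ñ.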

Language Names in Idealized UTF8 Form

Only the TDM-exposed languages are shown here. The all.lang file has additional strings for other potential Latin and Cyrillic languages. For reference, the generated encoding for the <language>.lang file is shown in the comment.

	"#str_02460"	"English"		// English	[ISO-8859-1]
	"#str_02461"	"Deutsch"		// German	[ISO-8859-1]
	"#str_02462"	"Español"		// Spanish	[ISO-8859-1]
	"#str_02463"	"Français"		// French	[ISO-8859-15]
	"#str_02464"	"Português"		// Portuguese	[ISO-8859-1]
	"#str_02465"	"Polski"			// Polish	[ISO-8859-2]
	"#str_02466"	"Italiano"		// Italian	[ISO-8859-1]
	"#str_02467"	"Česky"			// Czech	[ISO-8859-2]
	"#str_02468"	"Русский"		// Russian	[WIN-1251]
	"#str_02469"	"Català"		// Catalan	[ISO-8859-1]
	"#str_02470"	"Dansk"			// Danish	[ISO-8859-1]
	"#str_02472"	"Nederlands"		// Dutch	[ISO-8859-1]
	"#str_02474"	"Magyar"		// Hungarian	[ISO-8859-2]
	"#str_02476"	"Svenska"		// Swedish	[ISO-8859-1]
	"#str_02477"	"Türkçe"		// Turkish	[ISO-8859-9]
	"#str_02479"	"Română"		// Romanian	[ISO-8859-16]
	"#str_02480"	"Slovenčina"		// Slovak	[ISO-8859-2]

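The same idea applies to whole names. The sketch below assumes that generation simply encodes each string with the L1 encoding. Under that assumption, it reports which of the native names above (a subset) would pick up a "?" for a given L1 section. The name list and the helper are illustrative only.

```python
# A subset of the native names listed above.
NATIVE_NAMES = ["English", "Español", "Français", "Česky", "Русский",
                "Català", "Türkçe", "Română", "Slovenčina"]


def damaged_names(l1_codec):
    """Names that would contain '?' after encoding a section with `l1_codec`."""
    damaged = []
    for name in NATIVE_NAMES:
        try:
            name.encode(l1_codec)
        except UnicodeEncodeError:
            damaged.append(name)
    return damaged
```

For instance, `damaged_names("iso8859_1")` returns Česky, Русский, Română and Slovenčina.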
First Compromise – Cyrillic vs. Latin Font

While there is some overlap in characters between Cyrillic and Latin encodings, it is insufficient for our purposes. So, for [English] and other European sections, a Latin transliteration from Cyrillic is used:

	"#str_02468"	"Russkiy"		// Russian

Going the other way, for [Russian], the current treatment is not to transliterate Latin into Cyrillic, but instead to drop accents, rendering the names in plain ASCII (except, of course, the Russian name itself), e.g.:

	"#str_02460"	"English"		// English
	"#str_02461"	"Deutsch"		// German
	"#str_02462"	"Espanol"		// Spanish
	"#str_02463"	"Francais"		// French
	"#str_02464"	"Portugues"		// Portuguese
	"#str_02465"	"Polski"			// Polish
	"#str_02466"	"Italiano"		// Italian
	"#str_02467"	"Cesky"			// Czech
	"#str_02468"	"Русский"		// Russian
	"#str_02469"	"Catala"		// Catalan
	"#str_02470"	"Dansk"			// Danish
	"#str_02472"	"Nederlands"		// Dutch
	"#str_02474"	"Magyar"		// Hungarian
	"#str_02476"	"Svenska"		// Swedish
	"#str_02477"	"Turkce"		// Turkish
	"#str_02479"	"Romana"		// Romanian
	"#str_02480"	"Slovencina"		// Slovak

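The accent-dropping above can be approximated with Unicode decomposition. Each character is split into a base letter plus combining marks (NFD), and the marks are then dropped. This is a sketch, not necessarily how the [Russian] strings were produced. The hypothetical helper leaves a name unchanged when the result would not be pure ASCII. That keeps "Русский" intact, because NFD would otherwise turn й into и.

```python
import unicodedata


def ascii_fold(name):
    """Drop accents via NFD decomposition; keep `name` if the result is not ASCII."""
    decomposed = unicodedata.normalize("NFD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped if stripped.isascii() else name
```

With this helper, "Español", "Česky" and "Română" become "Espanol", "Cesky" and "Romana", matching the list above.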
Moving between ISO-8859 Members

In the discussion here, let “L1” be the current language (from the player’s perspective) or the [language] section of interest in all.lang (from the translator’s perspective). Let “L2” be the language of the foreign word, i.e., L1 and L2 differ.

Type and Go

In several cases, the translator simply types the ideal character in the all.lang strings, and after <language>.lang generation everything works. These cases are:

  1. L1 and L2 both use the same ISO-8859 encoding. (Specifically, they are either both ISO-8859-1, which doesn’t need <language>.map files, or both ISO-8859-2, with <language>.map files that by convention have identical contents.) NOTE: For TDM’s purposes, ISO-8859-15 (French) can be considered part of the ISO-8859-1 family.
  2. L2 character is ASCII (i.e., in the ISO range 0x00-0x7F). This is always the case if L2 = English. Characters in this range have identical codepoints in all five TDM-supported ISO-8859 encodings and in the TDM target encoding.
  3. L2 character is in the ISO range 0xA0-0xFF, with additional constraints.

For the last case, the L2 character must be either:

  • present at the same codepoint in all five TDM-supported ISO-8859 encodings, and in the TDM target encoding. For our Latin language names, these accented characters qualify:
		ç in "Français" and "Türkçe" at 0xE7
		à in "Català" at 0xE0 (except in ISO-8859-2, which has ŕ at 0xE0)
		â in "Română" at 0xE2
		ü in "Türkçe" at 0xFC
  • present at the same codepoint in ISO-8859 encodings for L1 and L2, and in the TDM target encoding.
  • present at different codepoints in the ISO-8859 encodings for L1 and L2, but with each either equal to the TDM target codepoint or mapped to it by the <language>.map file.
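The ISO side of the first bullet can be checked mechanically. This sketch uses Python's standard codecs. TDM's target encoding is a custom charset with no Python codec, so the result still has to be compared by hand against the TDM column of the table on this page. The helper name is hypothetical.

```python
# The five ISO-8859 encodings used by TDM's Latin languages.
ISO_CODECS = ["iso8859_1", "iso8859_2", "iso8859_9", "iso8859_15", "iso8859_16"]


def shared_codepoint(ch):
    """Byte value of `ch` if all five encodings place it at the same codepoint, else None."""
    values = set()
    for codec in ISO_CODECS:
        try:
            values.add(ch.encode(codec)[0])
        except UnicodeEncodeError:
            return None  # absent from at least one encoding
    return values.pop() if len(values) == 1 else None
```

`shared_codepoint("ç")` gives 0xE7 and `shared_codepoint("ü")` gives 0xFC. `shared_codepoint("č")` gives None, because ISO-8859-1 has no č.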

Tricks

If the L2 character is not represented in the ISO encoding for L1, then at <L1>.lang generation time, it will be replaced by a “?”. To work around this, the translator has two methods:

Trick Method 1 – Direct Stuffing

Consider the case where the L1 encoding is ISO-8859-1, or -9 (which is identical to -1 in the 0xA0-0xBF range). If the L2 character sits at a TDM codepoint in the range 0xA0-0xBF (or at 0xD7 or 0xF7), you can look up the ISO character at that codepoint (shown in parentheses in cells of the i18n-charmap table) and type it. ISO-8859-15 works the same way, except at 0xA4, 0xA6, 0xA8, 0xB4, 0xB8 and 0xBC-0xBE, where its characters differ from ISO-8859-1. Examples:

[English] // and other ISO-8859-1, -9, or -15
	"#str_02467"	"¬esky"		// Czech (¬ in ISO-8859-1 is Č in our font)
	"#str_02480"	"Sloven®ina"		// Slovak (® in ISO-8859-1 is č in our font)

Here’s another example of that last stuffing, from a language not exposed in TDM, also using the diaeresis (0xA8):

	"#str_02481"	"Sloven¨®ina"		// Slovenian (southern slovenia) (¨ => š)

In this case, ISO-8859-1 needs the diaeresis, but ISO-8859-15 (French) does not: there, š can simply be typed (type and go). The spreadsheet shows all available substitutions in light blue.
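In effect, direct stuffing decodes the TDM codepoint with the L1 encoding. The sketch below assumes a small, hypothetical `TDM_CODEPOINT` dictionary copied from the table on this page (Č 0xAC, č 0xAE, š 0xA8). It only makes sense for TDM codepoints in the ranges this trick covers.

```python
# Hypothetical subset of TDM codepoints, copied from the table on this page.
TDM_CODEPOINT = {"Č": 0xAC, "č": 0xAE, "š": 0xA8}


def stuff(word, l1_codec="iso8859_1"):
    """Keep characters L1 can encode; replace the others with the L1 character
    found at the TDM codepoint (raises KeyError if the character is not listed)."""
    out = []
    for ch in word:
        try:
            ch.encode(l1_codec)
            out.append(ch)  # type and go
        except UnicodeEncodeError:
            out.append(bytes([TDM_CODEPOINT[ch]]).decode(l1_codec))  # stuffed
    return "".join(out)
```

For ISO-8859-1, `stuff("Slovenščina")` yields "Sloven¨®ina". `stuff("Slovenščina", "iso8859_15")` yields "Slovenš®ina" instead, since ISO-8859-15 can encode š directly.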

Trick Method 2 – Special Mapping with Repurposed Character

If all else fails, a little-used character in L1 can be redirected to the L2 codepoint TDM wants, by an extra mapping command in the <L1>.map file. Current tricks of this kind are:

To get ñ for "Español":

  • for ISO-8859-2, put ¨ (diaeresis, 0xA8)
  • for ISO-8859-16, put ¶ (pilcrow, 0xB6)

To get ê for "Português":

  • for ISO-8859-2, put ˝ (double acute accent, 0xBD)
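A repurposed character can be simulated by encoding for L1 and then rewriting the repurposed bytes. The `REDIRECTS` dictionary is a hypothetical in-memory stand-in for the extra <L1>.map entries; the real file format is described in I18N - Character mapping. It lists only the redirects above, whereas real map files contain many more entries.

```python
# Hypothetical in-memory form of the extra <L1>.map redirects listed above:
# repurposed L1 byte -> TDM codepoint.
REDIRECTS = {
    "iso8859_2": {0xA8: 0xF1, 0xBD: 0xEA},  # 0xA8 (¨) -> ñ, 0xBD (˝ in ISO-8859-2) -> ê
    "iso8859_16": {0xB6: 0xF1},             # 0xB6 (¶) -> ñ
}


def generate_and_map(text, l1_codec):
    """Encode for L1 ('?' for unencodable chars), then apply the redirects."""
    encoded = text.encode(l1_codec, errors="replace")
    redirects = REDIRECTS.get(l1_codec, {})
    return bytes(redirects.get(b, b) for b in encoded)
```

Under ISO-8859-2, "Español" typed directly comes out as b"Espa?ol". Typed as "Espa¨ol", it reaches TDM codepoint 0xF1.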

TDM & ISO Char Sets - Details, Potential Tricks, Limitations

The main table below (after the Key) was drafted by Google AI, from a prompt that defined the header and an example row, and specified:

"Rows are a combination of all the printable 8-bit codepoints in the range 0x80-0xff defined in ISO-8859-1, -2, -9, -15, and -16. The rows are ordered by Unicode number. If a particular ISO standard does not include the character, leave that cell blank."

Subsequently, after Excel import, Geep filled in the TDM and Comments columns and added color coding. Language mappings were independently color coded, then verified against existing language maps. Finally, the table was converted to wikitable format.

This table skips codepoints in the range 0x00-0x7F, because they are the same for all ISO-8859 standards, so no mapping is required.
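The prompt above describes a mechanical procedure, so the ISO columns can also be regenerated, or the AI draft cross-checked, with Python's standard codecs. This sketch omits the TDM column (TDM's custom charset has no Python codec) and the Comments column. The function names and the cell layout are illustrative.

```python
import unicodedata

# Column label -> Python codec, in the table's column order.
CODECS = {"1": "iso8859_1", "2": "iso8859_2", "9": "iso8859_9",
          "15": "iso8859_15", "16": "iso8859_16"}


def build_rows():
    """{char: {column: byte or None}} for every char at 0xA0-0xFF, by code point."""
    rows = {}
    for column, codec in CODECS.items():
        for b in range(0xA0, 0x100):
            ch = bytes([b]).decode(codec)
            rows.setdefault(ch, dict.fromkeys(CODECS))[column] = b
    return dict(sorted(rows.items(), key=lambda item: ord(item[0])))


def format_row(ch, columns):
    """One row's cells (Unicode, Symbol, ISO columns, name) joined with ' || '."""
    cells = ["0x%02X" % b if b is not None else "" for b in columns.values()]
    name = unicodedata.name(ch, "").lower()
    return " || ".join(["U+%04X" % ord(ch), ch] + cells + [name])
```

For ñ, the generated row shows 0xF1 for ISO-8859-1, -9 and -15, and empty cells for -2 and -16.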

Color Key
Cell: Meaning
Light gray text row: Symbol not included in the TDM custom character set (so the TDM column is blank). Can be used for character substitution tricks.
0xNN on white: Type the desired utf8 char (in the Symbol column) in all.lang. Generation goes directly to the TDM codepoint given.
0xNN, amber highlight: Type the desired utf8 char in all.lang. Generation plus mapping (in the <language>.map file) goes to the TDM codepoint via the codepoint shown.
0xNN in gray: Type the desired utf8 char in all.lang. Generation plus mapping (in the <language>.map file) goes to the TDM codepoint OF THE SUBSTITUTE CHARACTER via the codepoint shown.
0xNN in red: Character not part of this ISO encoding. The "0xNN" shown is not an ISO codepoint. It is used for trick mapping (in <language>.map) to support the main menu's multilingual "Languages" page.
Substitute char: Character not part of this ISO encoding, but as a trick, you can stuff in this substitute char at the needed TDM codepoint.
n/a: Substitute character not available, due to trick mapping (in <language>.map) elsewhere.
Empty cell: In an ISO column, the character is not part of this ISO encoding. It may be a candidate for future trick mapping, or it may simply not yet be fully analyzed with respect to substitute chars.
Tan row: Orphaned character in the TDM set, not part of the five ISO-8859 standards for current TDM-supported languages.


Unicode Coverage for TDM
Unicode Symbol TDM 1 2 9 15 16 Unicode Name Comments
U+00A0   0xA0 0xA0 0xA0 0xA0 0xA0 0xA0 no-break space
U+00A1 ¡ 0xA1 0xA1 0xA1 inverted exclamation mark
U+00A2 ¢ 0xA2 0xA2 0xA2 cent sign
U+00A3 £ 0xA3 0xA3 0xA3 pound sign
U+00A4 ¤ 0xA4 0xA4 0xA4 currency sign
U+00A5 ¥ 0xA5 0xA5 0xA5 yen sign
U+00A6 ¦ 0xA6 0xA6 broken bar
U+00A7 § 0xA7 0xA7 0xA7 0xA7 0xA7 0xA7 section sign
U+00A8 ¨ 0xA8 0xA8 0xA8 diaeresis See also hack use for ñ
U+00A9 © 0xA9 0xA9 0xA9 0xA9 copyright sign
U+00AA ª 0xAA 0xAA 0xAA feminine ordinal indicator
U+00AB « 0xAB 0xAB 0xAB 0xAB left-pointing double angle quotation mark
U+00AC ¬ 0xAC 0xAC 0xAC not sign
U+00AD 0xAD 0xAD 0xAD 0xAD 0xAD 0xAD soft hyphen
U+00AE ® 0xAE 0xAE 0xAE registered sign
U+00AF ¯ 0xAF 0xAF 0xAF macron
U+00B0 ° 0xB0 0xB0 0xB0 0xB0 0xB0 degree sign
U+00B1 ± 0xB1 0xB1 0xB1 0xB1 plus-minus sign
U+00B2 ² 0xB2 0xB2 0xB2 superscript two
U+00B3 ³ 0xB3 0xB3 0xB3 superscript three
U+00B4 ´ 0xB4 0xB4 0xB4 acute accent
U+00B5 µ 0xB5 0xB5 0xB5 micro sign
U+00B6 ¶ 0xB6 0xB6 0xB6 0xB6 pilcrow sign
U+00B7 · 0xB7 0xB7 0xB7 0xB7 middle dot
U+00B8 ¸ 0xB8 0xB8 0xB8 cedilla
U+00B9 ¹ 0xB9 0xB9 0xB9 superscript one
U+00BA º 0xBA 0xBA 0xBA masculine ordinal indicator
U+00BB » 0xBB 0xBB 0xBB 0xBB right-pointing double angle quotation mark
U+00BC ¼ 0xBC 0xBC vulgar fraction one quarter
U+00BD ½ 0xBD 0xBD vulgar fraction one half
U+00BE ¾ 0xBE 0xBE vulgar fraction three quarters
U+00BF ¿ 0xBF 0xBF 0xBF 0xBF inverted question mark
U+00C0 À 0xC0 0xC0 0xC0 0xC0 0xC0 latin capital letter a with grave
U+00C1 Á 0xC1 0xC1 0xC1 0xC1 0xC1 0xC1 latin capital letter a with acute
U+00C2 Â 0xC2 0xC2 0xC2 0xC2 0xC2 0xC2 latin capital letter a with circumflex
U+00C3 Ã 0xC3 0xC3 0xC3 0xC3 latin capital letter a with tilde
U+00C4 Ä 0xC4 0xC4 0xC4 0xC4 0xC4 0xC4 latin capital letter a with diaeresis
U+00C5 Å 0xC5 0xC5 0xC5 0xC5 latin capital letter a with ring above
U+00C6 Æ 0xC6 0xC6 0xC6 0xC6 0xC6 latin capital letter ae
U+00C7 Ç 0xC7 0xC7 0xC7 0xC7 0xC7 0xC7 latin capital letter c with cedilla
U+00C8 È 0xC8 0xC8 0xC8 0xC8 0xC8 latin capital letter e with grave
U+00C9 É 0xC9 0xC9 0xC9 0xC9 0xC9 0xC9 latin capital letter e with acute
U+00CA Ê 0xCA 0xCA 0xCA 0xCA 0xCA latin capital letter e with circumflex
U+00CB Ë 0xCB 0xCB 0xCB 0xCB 0xCB 0xCB latin capital letter e with diaeresis
U+00CC Ì 0xCC 0xCC 0xCC 0xCC 0xCC latin capital letter i with grave
U+00CD Í 0xCD 0xCD 0xCD 0xCD 0xCD 0xCD latin capital letter i with acute
U+00CE Î 0xCE 0xCE 0xCE 0xCE 0xCE 0xCE latin capital letter i with circumflex
U+00CF Ï 0xCF 0xCF 0xCF 0xCF 0xCF latin capital letter i with diaeresis
U+00D0 Ð 0xD0 0xD0 n/a 0xD0 latin capital letter eth Same glyph as U+0110 latin capital letter d with stroke
U+00D1 Ñ 0xD1 0xD1 0xD1 0xD1 latin capital letter n with tilde
U+00D2 Ò 0xD2 0xD2 0xD2 0xD2 0xD2 latin capital letter o with grave
U+00D3 Ó 0xD3 0xD3 0xC1 0xD3 0xD3 0xD3 latin capital letter o with acute
U+00D4 Ô 0xD4 0xD4 0xD4 0xD4 0xD4 0xD4 latin capital letter o with circumflex Formerly also mapped from 0x88; redundant, Ğ has 0x88 codepoint now
U+00D5 Õ 0xD5 0xD5 0xD5 0xD5 latin capital letter o with tilde
U+00D6 Ö 0xD6 0xD6 0xD6 0xD6 0xD6 0xD6 latin capital letter o with diaeresis
U+00D7 × 0xD7 0xD7 0xD7 0xD7 multiplication sign
U+00D8 Ø 0xD8 0xD8 0xD8 0xD8 latin capital letter o with stroke
U+00D9 Ù 0xD9 0xD9 0xD9 0xD9 0xD9 latin capital letter u with grave
U+00DA Ú 0xDA 0xDA 0xDA 0xDA 0xDA 0xDA latin capital letter u with acute
U+00DB Û 0xDB 0xDB 0xDB 0xDB 0xDB latin capital letter u with circumflex
U+00DC Ü 0xDC 0xDC 0xDC 0xDC 0xDC 0xDC latin capital letter u with diaeresis
U+00DD Ý 0xDD 0xDD 0xDD 0xDD latin capital letter y with acute
U+00DE Þ 0xDE 0xDE 0xDE latin capital letter thorn
U+00DF ß 0xDF 0xDF 0xDF 0xDF 0xDF 0xDF latin small letter sharp s
U+00E0 à 0xE0 0xE0 0xE0 0xE0 0xE0 latin small letter a with grave
U+00E1 á 0xE1 0xE1 0xE1 0xE1 0xE1 0xE1 latin small letter a with acute
U+00E2 â 0xE2 0xE2 0xE2 0xE2 0xE2 0xE2 latin small letter a with circumflex
U+00E3 ã 0xE3 0xE3 0xE3 0xE3 latin small letter a with tilde
U+00E4 ä 0xE4 0xE4 0xE4 0xE4 0xE4 0xE4 latin small letter a with diaeresis
U+00E5 å 0xE5 0xE5 0xE5 0xE5 latin small letter a with ring above
U+00E6 æ 0xE6 0xE6 0xE6 0xE6 0xE6 latin small letter ae
U+00E7 ç 0xE7 0xE7 0xE7 0xE7 0xE7 0xE7 latin small letter c with cedilla
U+00E8 è 0xE8 0xE8 0xE8 0xE8 0xE8 latin small letter e with grave
U+00E9 é 0xE9 0xE9 0xE9 0xE9 0xE9 0xE9 latin small letter e with acute
U+00EA ê 0xEA 0xEA 0xBD 0xEA 0xEA 0xEA latin small letter e with circumflex Hack for iso 2: Put ˝ (double acute accent, 0xBD) to get ê for "Português"
U+00EB ë 0xEB 0xEB 0xEB 0xEB 0xEB 0xEB latin small letter e with diaeresis
U+00EC ì 0xEC 0xEC 0xEC 0xEC 0xEC latin small letter i with grave
U+00ED í 0xED 0xED 0xED 0xED 0xED 0xED latin small letter i with acute
U+00EE î 0xEE 0xEE 0xEE 0xEE 0xEE 0xEE latin small letter i with circumflex
U+00EF ï 0xEF 0xEF 0xEF 0xEF 0xEF latin small letter i with diaeresis
U+00F0 ð 0xF0 0xF0 0xF0 latin small letter eth
U+00F1 ñ 0xF1 0xF1 0xA8 0xF1 0xF1 0xB6 latin small letter n with tilde Hacks to get ñ for "Español": for iso 2, put ¨ (diaeresis 0xA8); for iso 16, put ¶ (0xB6)
U+00F2 ò 0xF2 0xF2 0xF2 0xF2 0xF2 latin small letter o with grave
U+00F3 ó 0xF3 0xF3 0xF3 0xF3 0xF3 0xF3 latin small letter o with acute
U+00F4 ô 0xF4 0xF4 0xF4 0xF4 0xF4 0xF4 latin small letter o with circumflex Formerly also mapped from 0x88; redundant, ğ has TDM 0x88 codepoint now
U+00F5 õ 0xF5 0xF5 0xF5 0xF5 latin small letter o with tilde
U+00F6 ö 0xF6 0xF6 0xF6 0xF6 0xF6 0xF6 latin small letter o with diaeresis
U+00F7 ÷ 0xF7 0xF7 0xF7 0xF7 division sign
U+00F8 ø 0xF8 0xF8 0xF8 0xF8 latin small letter o with stroke
U+00F9 ù 0xF9 0xF9 0xF9 0xF9 0xF9 latin small letter u with grave
U+00FA ú 0xFA 0xFA 0xFA 0xFA 0xFA 0xFA latin small letter u with acute
U+00FB û 0xFB 0xFB 0xFB 0xFB 0xFB latin small letter u with circumflex
U+00FC ü 0xFC 0xFC 0xFC 0xFC 0xFC 0xFC latin small letter u with diaeresis
U+00FD ý 0xFD 0xFD 0xFD 0xFD latin small letter y with acute
U+00FE þ 0xFE 0xFE 0xFE latin small letter thorn
U+00FF ÿ 0xFF 0xFF 0xFF 0xFF 0xFF latin small letter y with diaeresis
U+0102 Ă 0x8B 0xC3 0xC3 latin capital letter a with breve
U+0103 ă 0x9B 0xE3 0xE3 latin small letter a with breve
U+0104 Ą 0xAA ª 0xA1 ª ª 0xA1 latin capital letter a with ogonek
U+0105 ą 0xBA º 0xB1 º º 0xA2 latin small letter a with ogonek
U+0106 Ć 0x82 0xC6 0xC5 latin capital letter c with acute
U+0107 ć 0x92 0xE6 0xE5 latin small letter c with acute
U+0108 Ĉ 0x86 latin capital letter c with circumflex Only in ISO-8859-3 (for Esperanto) at 0xC6
U+0109 ĉ 0x96 latin small letter c with circumflex Only in ISO-8859-3 (for Esperanto) at 0xE6
U+010C Č 0xAC ¬ 0xC8 ¬ ¬ 0xB2 latin capital letter c with caron Substitute char "not sign" at 0xAC
U+010D č 0xAE ® 0xE8 ® ® 0xB9 latin small letter c with caron Substitute char "registered sign" at 0xAE
U+010E Ď 0xB3 ³ 0xCF ³ ³ latin capital letter d with caron Substitute char "superscript three" at 0xB3
U+010F ď 0xB7 · 0xEF · · latin small letter d with caron Substitute char "middle dot" at 0xB7
U+0110 Đ 0xD0 0xD0 0xD0 n/a 0xD0 latin capital letter d with stroke Same glyph as U+00D0 latin capital letter eth
U+0111 đ 0x90 0xF0 n/a 0xF0 latin small letter d with stroke
U+0118 Ę 0xAB « 0xCA « « 0xDD latin capital letter e with ogonek Substitute char "left-pointing double angle quotation mark" at 0xAB
U+0119 ę 0xBB » 0xEA » » 0xFD latin small letter e with ogonek Substitute char "right-pointing double angle quotation mark" at 0xBB
U+011A Ě 0xA5 ¥ 0xCC ¥ ¥ latin capital letter e with caron Substitute char "yen sign" at 0xA5
U+011B ě 0xA3 £ 0xEC £ £ latin small letter e with caron Substitute char "pound sign" at 0xA3
U+011E Ğ 0x88 0xD0 latin capital letter g with breve As of TDM 2.13 (TDM codemap), 2.15 (turkish.map)
U+011F ğ 0x98 0xF0 latin small letter g with breve As of TDM 2.13 (TDM codemap), 2.15 (turkish.map)
U+0130 İ 0xDD latin capital letter i with dot above Turkish: utf8 "İ" will be mapped to "Î" (0xCE)
U+0131 ı 0xFD latin small letter dotless i Turkish: utf8 "ı" will be mapped to ASCII "i" (0x69)
U+0139 Ĺ 0xC5 latin capital letter l with acute
U+013A ĺ 0xE5 latin small letter l with acute
U+013D Ľ 0xA5 latin capital letter l with caron
U+013E ľ 0xB5 latin small letter l with caron
U+0141 Ł 0xB1 ± 0xA3 ± ± 0xA3 latin capital letter l with stroke Substitute char "plus-minus sign" at 0xB1
U+0142 ł 0xB5 µ 0xB3 µ µ 0xB3 latin small letter l with stroke Substitute char "micro sign" at 0xB5
U+0143 Ń 0x8C 0xD1 0xD1 latin capital letter n with acute
U+0144 ń 0x9C 0xF1 0xF1 latin small letter n with acute
U+0147 Ň 0x80 0xD2 latin capital letter n with caron
U+0148 ň 0xA1 ¡ 0xF2 ¡ ¡ latin small letter n with caron Substitute char "inverted exclamation mark" at 0xA1
U+0150 Ő 0xB0 ° 0xD5 ° ° 0xD5 latin capital letter o with double acute Similar to Ö, used in Hungarian. Substitute char "degree sign" at 0xB0
U+0151 ő 0xB9 ¹ 0xF5 ¹ ¹ 0xF5 latin small letter o with double acute Similar to ö, used in Hungarian. Substitute char "superscript one" at 0xB9
U+0152 Œ 0xBC ¼ ¼ 0xBC 0xBC latin capital ligature oe Substitute char "vulgar fraction one quarter" at 0xBC
U+0153 œ 0xBD ½ ½ 0xBD 0xBD latin small ligature oe Substitute char "vulgar fraction one half" at 0xBD
U+0154 Ŕ 0x89 0xC0 latin capital letter r with acute
U+0155 ŕ 0x99 0xE0 latin small letter r with acute
U+0158 Ř 0xD7 × 0xD8 × × latin capital letter r with caron Substitute char "multiplication sign" at 0xD7
U+0159 ř 0xF7 ÷ 0xF8 ÷ ÷ latin small letter r with caron Substitute char "division sign" at 0xF7
U+015A Ś 0x81 0xA6 0xD7 latin capital letter s with acute
U+015B ś 0x91 0xB6 0xF7 latin small letter s with acute
U+015C Ŝ 0x85 latin capital letter s with circumflex Only in ISO-8859-3 (for Esperanto) at 0xDE
U+015D ŝ 0x95 latin small letter s with circumflex Only in ISO-8859-3 (for Esperanto) at 0xFE
U+015E Ş 0x8D 0xAA 0xDE latin capital letter s with cedilla Can stand in for "...comma under"
U+015F ş 0x9D 0xBA 0xFE latin small letter s with cedilla Can stand in for "...comma under"
U+0160 Š 0xA6 ¦ 0xA9 ¦ 0xA6 0xA6 latin capital letter s with caron Substitute char "broken bar" at 0xA6
U+0161 š 0xA8 ¨ 0xB9 ¨ 0xA8 0xA8 latin small letter s with caron Substitute char "diaeresis" at 0xA8
U+0162 Ţ 0x8E 0xDE latin capital letter t with cedilla Can stand in for "...comma under"
U+0163 ţ 0x9E 0xFE latin small letter t with cedilla Can stand in for "...comma under"
U+0164 Ť 0xB2 ² 0xAB ² ² latin capital letter t with caron Substitute char "superscript two" at 0xB2
U+0165 ť 0xB6 0xBB latin small letter t with caron Substitute char "pilcrow sign" at 0xB6
U+016E Ů 0xA9 © 0xD9 © © latin capital letter u with ring above Substitute char "copyright sign" at 0xA9
U+016F ů 0xAF ¯ 0xF9 ¯ ¯ latin small letter u with ring above Substitute char "macron" at 0xAF
U+0170 Ű 0xA2 ¢ 0xDB ¢ ¢ 0xD8 latin capital letter u with double acute Similar to Ü, used in Hungarian. Substitute char "cent sign" at 0xA2
U+0171 ű 0xA4 ¤ 0xFB ¤ ¤ 0xF8 latin small letter u with double acute Similar to ü, used in Hungarian. Substitute char "currency sign" at 0xA4
U+0178 Ÿ 0xBE ¾ ¾ 0xBE 0xBE latin capital letter y with diaeresis Substitute char "vulgar fraction three quarters" at 0xBE
U+0179 Ź 0x84 0xAC 0xAC latin capital letter z with acute
U+017A ź 0x94 0xBC 0xAE latin small letter z with acute
U+017B Ż 0x83 0xAF 0xAF latin capital letter z with dot above
U+017C ż 0x93 0xBF 0xBF latin small letter z with dot above
U+017D Ž 0xB4 ´ 0xAE ´ 0xB4 0xB4 latin capital letter z with caron Substitute char "acute accent" at 0xB4
U+017E ž 0xB8 ¸ 0xBE ¸ 0xB8 0xB8 latin small letter z with caron Substitute char "cedilla" at 0xB8
U+01D3 Ǔ 0x8A latin capital letter u with caron Not found in ISO-8859. Pinyin tone marking
U+01D4 ǔ 0x9A latin small letter u with caron Not found in ISO-8859. Pinyin tone marking
U+0218 Ș 0x8D 0xAA latin capital letter s with comma below See also "...with cedilla"
U+0219 ș 0x9D 0xBA latin small letter s with comma below See also "...with cedilla"
U+021A Ț 0x8E 0xDE latin capital letter t with comma below See also "...with cedilla"
U+021B ț 0x9E 0xFE latin small letter t with comma below See also "...with cedilla"
U+02C7 ˇ 0xB7 caron
U+02D8 ˘ 0xA2 breve
U+02D9 ˙ 0xFF dot above
U+02DB ˛ 0xB2 ogonek
U+02DD ˝ 0xBD double acute accent
U+1E90 Ẑ 0x87 latin capital letter z with circumflex Not found in ISO-8859. Rare use in Cyrillic-to-Latin transliteration, or Pinyin
U+1E91 ẑ 0x97 latin small letter z with circumflex Not found in ISO-8859. Rare use in Cyrillic-to-Latin transliteration, or Pinyin
U+201D ” 0xB5 right double quotation mark
U+201E „ 0xA5 double low-9 quotation mark
U+20AC € 0xA4 0xA4 euro sign

For More