Multilanguage Display: Difference between revisions
|-
| U+00D0 || Ð || 0xD0 || 0xD0 || || style="background:#43a8cc;" | n/a || 0xD0 || || latin capital letter eth || Same glyph as U+0110 latin capital letter d with stroke
|-
| U+00EA || ê || 0xEA || 0xEA || style="background:#ebbb54;color:#ff0000;" | 0xBD || 0xEA || 0xEA || 0xEA || latin small letter e with circumflex || Hack for iso 2: Put ´ (acute accent, 0xBD) to get ê for "Português"
|-
| U+00F1 || ñ || 0xF1 || 0xF1 || style="background:#ebbb54;color:#ff0000;" | 0xA8 || 0xF1 || 0xF1 || style="background:#ebbb54;color:#ff0000;" | 0xB6 || latin small letter n with tilde || Hacks to get ñ for "Español": for iso 2, put ¨ (diaeresis 0xA8); for iso 16, put ¶ (0xB6)
|-
| U+0102 || Ă || 0x8B || || style="background:#ebbb54;" | 0xC3 || || || style="background:#ebbb54;" | 0xC3 || latin capital letter a with breve ||
|-
| U+0103 || ă || 0x9B || || style="background:#ebbb54;" | 0xE3 || || || style="background:#ebbb54;" | 0xE3 || latin small letter a with breve ||
|-
| U+0104 || Ą || 0xAA || style="background:#9ab4e3" | ª || style="background:#ebbb54;" | 0xA1 || style="background:#9ab4e3" | ª || style="background:#9ab4e3" | ª || style="background:#ebbb54;" | 0xA1 || latin capital letter a with ogonek ||
|-
| U+0105 || ą || 0xBA || style="background:#9ab4e3" | º || style="background:#ebbb54;" | 0xB1 || style="background:#9ab4e3" | º || style="background:#9ab4e3" | º || style="background:#ebbb54;" | 0xA2 || latin small letter a with ogonek ||
|-
| U+0106 || Ć || 0x82 || || style="background:#ebbb54;" | 0xC6 || || || style="background:#ebbb54;" | 0xC5 || latin capital letter c with acute ||
|-
| U+0107 || ć || 0x92 || || style="background:#ebbb54;" | 0xE6 || || || style="background:#ebbb54;" | 0xE5 || latin small letter c with acute ||
|- style="background:#dbab8a;"
| U+0108 || Ĉ || 0x86 || || || || || || latin capital c with circumflex || Only in ISO-8859-3 (for Esperanto) at 0xC6
|- style="background:#dbab8a;"
| U+0109 || ĉ || 0x96 || || || || || || latin small c with circumflex || Only in ISO-8859-3 (for Esperanto) at 0xE6
|-
| U+010C || Č || 0xAC || style="background:#9ab4e3" | ¬ || style="background:#ebbb54;" | 0xC8 || style="background:#9ab4e3" | ¬ || style="background:#9ab4e3" | ¬ || style="background:#ebbb54;" | 0xB2 || latin capital letter c with caron || Substitute char "not sign" at 0xAC
|-
| U+010D || č || 0xAE || style="background:#9ab4e3" | ® || style="background:#ebbb54;" | 0xE8 || style="background:#9ab4e3" | ® || style="background:#9ab4e3" | ® || style="background:#ebbb54;" | 0xB9 || latin small letter c with caron || Substitute char "registered sign" at 0xAE
|-
| U+010E || Ď || 0xB3 || style="background:#9ab4e3" | ³ || style="background:#ebbb54;" | 0xCF || style="background:#9ab4e3" | ³ || style="background:#9ab4e3" | ³ || || latin capital letter d with caron || Substitute char "superscript three" at 0xB3
|-
| U+010F || ď || 0xB7 || style="background:#9ab4e3; font-weight:bold" | · || style="background:#ebbb54;" | 0xEF || style="background:#9ab4e3; font-weight:bold" | · || style="background:#9ab4e3; font-weight:bold" | · || || latin small letter d with caron || Substitute char "middle dot" at 0xB7
|-
| U+0110 || Đ || 0xD0 || 0xD0 || 0xD0 || style="background:#43a8cc;" | n/a || || 0xD0 || latin capital letter d with stroke || Same glyph as U+00D0 latin capital letter eth
|-
| U+0111 || đ || 0x90 || || style="background:#ebbb54;" | 0xF0 || style="background:#43a8cc;" | n/a || || style="background:#ebbb54;" | 0xF0 || latin small letter d with stroke ||
|-
| U+0118 || Ę || 0xAB || style="background:#9ab4e3" | « || style="background:#ebbb54;" | 0xCA || style="background:#9ab4e3" | « || style="background:#9ab4e3" | « || style="background:#ebbb54;" | 0xDD || latin capital letter e with ogonek || Substitute char "left-pointing double angle quotation mark" at 0xAB
|-
| U+0119 || ę || 0xBB || style="background:#9ab4e3" | » || style="background:#ebbb54;" | 0xEA || style="background:#9ab4e3" | » || style="background:#9ab4e3" | » || style="background:#ebbb54;" | 0xFD || latin small letter e with ogonek || Substitute char "right-pointing double angle quotation mark" at 0xBB
|-
| U+011A || Ě || 0xA5 || style="background:#9ab4e3" | ¥ || style="background:#ebbb54;" | 0xCC || style="background:#9ab4e3" | ¥ || style="background:#9ab4e3" | ¥ || || latin capital letter e with caron || Substitute char "yen sign" at 0xA5
|-
| U+011B || ě || 0xA3 || style="background:#9ab4e3" | £ || style="background:#ebbb54;" | 0xEC || style="background:#9ab4e3" | £ || style="background:#9ab4e3" | £ || || latin small letter e with caron || Substitute char "pound sign" at 0xA3
|-
| U+011E || Ğ || 0x88 || || || style="background:#ebbb54;" | 0xD0 || || || latin capital letter g with breve || As of TDM 2.13 (TDM codemap), 2.15 (turkish.map)
|-
| U+011F || ğ || 0x98 || || || style="background:#ebbb54;" | 0xF0 || || || latin small letter g with breve || As of TDM 2.13 (TDM codemap), 2.15 (turkish.map)
|- style="color:#8c8c8c;"
| U+0130 || İ || || || || style="background:#ebbb54;" | 0xDD || || || latin capital letter i with dot above || Turkish: utf8 "İ" will be mapped to "Î" (0xCE)
|- style="color:#8c8c8c;"
| U+0131 || ı || || || || style="background:#ebbb54;" | 0xFD || || || latin small letter dotless i || Turkish: utf8 "ı" will be mapped to ASCII "i" (0x69)
|-
| U+0141 || Ł || 0xB1 || style="background:#9ab4e3" | ± || style="background:#ebbb54;" | 0xA3 || style="background:#9ab4e3" | ± || style="background:#9ab4e3" | ± || style="background:#ebbb54;" | 0xA3 || latin capital letter l with stroke || Substitute char "plus-minus sign" at 0xB1
|-
| U+0142 || ł || 0xB5 || style="background:#9ab4e3" | µ || style="background:#ebbb54;" | 0xB3 || style="background:#9ab4e3" | µ || style="background:#9ab4e3" | µ || style="background:#ebbb54;" | 0xB3 || latin small letter l with stroke || Substitute char "micro sign" at 0xB5
|-
| U+0143 || Ń || 0x8C || || style="background:#ebbb54;" | 0xD1 || || || style="background:#ebbb54;" | 0xD1 || latin capital letter n with acute ||
|-
| U+0144 || ń || 0x9C || || style="background:#ebbb54;" | 0xF1 || || || style="background:#ebbb54;" | 0xF1 || latin small letter n with acute ||
|-
| U+0147 || Ň || 0x80 || || style="background:#ebbb54;" | 0xD2 || || || || latin capital letter n with caron ||
|-
| U+0148 || ň || 0xA1 || style="background:#9ab4e3" | ¡ || style="background:#ebbb54;" | 0xF2 || style="background:#9ab4e3" | ¡ || style="background:#9ab4e3" | ¡ || || latin small letter n with caron || Substitute char "inverted exclamation mark" at 0xA1
|-
| U+0150 || Ő || 0xB0 || style="background:#9ab4e3" | ° || style="background:#ebbb54;" | 0xD5 || style="background:#9ab4e3" | ° || style="background:#9ab4e3" | ° || style="background:#ebbb54;" | 0xD5 || latin capital letter o with double acute || Similar to Ö, used in Hungarian. Substitute char "degree sign" at 0xB0
|-
| U+0151 || ő || 0xB9 || style="background:#9ab4e3" | ¹ || style="background:#ebbb54;" | 0xF5 || style="background:#9ab4e3" | ¹ || style="background:#9ab4e3" | ¹ || style="background:#ebbb54;" | 0xF5 || latin small letter o with double acute || Similar to ö, used in Hungarian. Substitute char "superscript 1" at 0xB9
|-
| U+0152 || Œ || 0xBC || style="background:#9ab4e3" | ¼ || || style="background:#9ab4e3" | ¼ || 0xBC || 0xBC || latin capital ligature oe || Substitute char "vulgar fraction one quarter" at 0xBC
|-
| U+0153 || œ || 0xBD || style="background:#9ab4e3" | ½ || || style="background:#9ab4e3" | ½ || 0xBD || 0xBD || latin small ligature oe || Substitute char "vulgar fraction one half" at 0xBD
|-
| U+0154 || Ŕ || 0x89 || || style="background:#ebbb54;" | 0xC0 || || || || latin capital letter r with acute ||
|-
| U+0155 || ŕ || 0x99 || || style="background:#ebbb54;" | 0xE0 || || || || latin small letter r with acute ||
|-
| U+0158 || Ř || 0xD7 || style="background:#9ab4e3" | × || style="background:#ebbb54;" | 0xD8 || style="background:#9ab4e3" | × || style="background:#9ab4e3" | × || || latin capital letter r with caron || Substitute char "multiplication sign" at 0xD7
|-
| U+0159 || ř || 0xF7 || style="background:#9ab4e3" | ÷ || style="background:#ebbb54;" | 0xF8 || style="background:#9ab4e3" | ÷ || style="background:#9ab4e3" | ÷ || || latin small letter r with caron || Substitute char "division sign" at 0xF7
|-
| U+015A || Ś || 0x81 || || style="background:#ebbb54;" | 0xA6 || || || style="background:#ebbb54;" | 0xD7 || latin capital letter s with acute ||
|-
| U+015B || ś || 0x91 || || style="background:#ebbb54;" | 0xB6 || || || style="background:#ebbb54;" | 0xF7 || latin small letter s with acute ||
|- style="background:#dbab8a;"
| U+015C || Ŝ || 0x85 || || || || || || latin capital letter s with circumflex || Only in ISO-8859-3 (for Esperanto) at 0xDE
|- style="background:#dbab8a;"
| U+015D || ŝ || 0x95 || || || || || || latin small letter s with circumflex || Only in ISO-8859-3 (for Esperanto) at 0xFE
|-
| U+015E || Ş || 0x8D || || style="background:#ebbb54;" | 0xAA || style="background:#ebbb54;" | 0xDE || || || latin capital letter s with cedilla || Can stand in for "...comma under"
|-
| U+015F || ş || 0x9D || || style="background:#ebbb54;" | 0xBA || style="background:#ebbb54;" | 0xFE || || || latin small letter s with cedilla || Can stand in for "...comma under"
|-
| U+0160 || Š || 0xA6 || style="background:#9ab4e3" | ¦ || style="background:#ebbb54;" | 0xA9 || style="background:#9ab4e3" | ¦ || 0xA6 || 0xA6 || latin capital letter s with caron || Substitute char "broken bar" at 0xA6
|-
| U+0161 || š || 0xA8 || style="background:#9ab4e3" | ¨ || style="background:#ebbb54;" | 0xB9 || style="background:#9ab4e3" | ¨ || 0xA8 || 0xA8 || latin small letter s with caron || Substitute char "diaeresis" at 0xA8
|-
| U+0162 || Ţ || 0x8E || || style="background:#ebbb54;" | 0xDE || || || || latin capital letter t with cedilla || Can stand in for "...comma under"
|-
| U+0163 || ţ || 0x9E || || style="background:#ebbb54;" | 0xFE || || || || latin small letter t with cedilla || Can stand in for "...comma under"
|-
| U+0164 || Ť || 0xB2 || style="background:#9ab4e3" | ² || style="background:#ebbb54;" | 0xAB || style="background:#9ab4e3" | ² || style="background:#9ab4e3" | ² || || latin capital letter t with caron || Substitute char "superscript two" at 0xB2
|-
| U+0165 || ť || 0xB6 || style="background:#9ab4e3" | ¶ || style="background:#ebbb54;" | 0xBB || style="background:#9ab4e3" | ¶ || style="background:#9ab4e3" | ¶ || || latin small letter t with caron || Substitute char "pilcrow sign" at 0xB6
|-
| U+016E || Ů || 0xA9 || style="background:#9ab4e3" | © || style="background:#ebbb54;" | 0xD9 || style="background:#9ab4e3" | © || style="background:#9ab4e3" | © || || latin capital letter u with ring above || Substitute char "copyright sign" at 0xA9
|-
| U+016F || ů || 0xAF || style="background:#9ab4e3" | ¯ || style="background:#ebbb54;" | 0xF9 || style="background:#9ab4e3" | ¯ || style="background:#9ab4e3" | ¯ || || latin small letter u with ring above || Substitute char "macron" at 0xAF
|-
| U+0170 || Ű || 0xA2 || style="background:#9ab4e3" | ¢ || style="background:#ebbb54;" | 0xDB || style="background:#9ab4e3" | ¢ || style="background:#9ab4e3" | ¢ || style="background:#ebbb54;" | 0xD8 || latin capital letter u with double acute || Similar to Ü, used in Hungarian. Substitute char "cent sign" at 0xA2
|-
| U+0171 || ű || 0xA4 || style="background:#9ab4e3" | ¤ || style="background:#ebbb54;" | 0xFB || style="background:#9ab4e3" | ¤ || style="background:#9ab4e3" | ¤ || style="background:#ebbb54;" | 0xF8 || latin small letter u with double acute || Similar to ü, used in Hungarian. Substitute char "currency sign" at 0xA4
|-
| U+0178 || Ÿ || 0xBE || style="background:#9ab4e3" | ¾ || || style="background:#9ab4e3" | ¾ || 0xBE || 0xBE || latin capital letter y with diaeresis || Substitute char "vulgar fraction three quarters" at 0xBE
|-
| U+0179 || Ź || 0x84 || || style="background:#ebbb54;" | 0xAC || || || style="background:#ebbb54;" | 0xAC || latin capital letter z with acute ||
|-
| U+017A || ź || 0x94 || || style="background:#ebbb54;" | 0xBC || || || style="background:#ebbb54;" | 0xAE || latin small letter z with acute ||
|-
| U+017B || Ż || 0x83 || || style="background:#ebbb54;" | 0xAF || || || style="background:#ebbb54;" | 0xAF || latin capital letter z with dot above ||
|-
| U+017C || ż || 0x93 || || style="background:#ebbb54;" | 0xBF || || || style="background:#ebbb54;" | 0xBF || latin small letter z with dot above ||
|-
| U+017D || Ž || 0xB4 || style="background:#9ab4e3" | ´ || style="background:#ebbb54;" | 0xAE || style="background:#9ab4e3" | ´ || 0xB4 || 0xB4 || latin capital letter z with caron || Substitute char "acute accent" at 0xB4
|-
| U+017E || ž || 0xB8 || style="background:#9ab4e3" | ¸ || style="background:#ebbb54;" | 0xBE || style="background:#9ab4e3" | ¸ || 0xB8 || 0xB8 || latin small letter z with caron || Substitute char "cedilla" at 0xB8
|- style="background:#dbab8a;"
| U+01D3 || Ǔ || 0x8A || || || || || || latin capital u with caron || Not found in ISO-8859. Pinyin tone marking
|- style="background:#dbab8a;"
| U+01D4 || ǔ || 0x9A || || || || || || latin small u with caron || Not found in ISO-8859. Pinyin tone marking
|-
| U+0218 || Ș || 0x8D || || || || || style="background:#ebbb54;" | 0xAA || latin capital letter s with comma below || See also "...with cedilla"
|-
| U+0219 || ș || 0x9D || || || || || style="background:#ebbb54;" | 0xBA || latin small letter s with comma below || See also "...with cedilla"
|-
| U+021A || Ț || 0x8E || || || || || style="background:#ebbb54;" | 0xDE || latin capital letter t with comma below || See also "...with cedilla"
|-
| U+021B || ț || 0x9E || || || || || style="background:#ebbb54;" | 0xFE || latin small letter t with comma below || See also "...with cedilla"
|- style="color:#8c8c8c;"
| U+02D9 || ˙ || || || || 0xFF || || || dot above ||
|- style="color:#8c8c8c;"
| U+02DD || ˝ || || || || 0xBD || || || double acute accent ||
|- style="background:#dbab8a;"
| U+1E90 || Ẑ || 0x87 || || || || || || latin capital z with circumflex || Not found in ISO-8859. Rare use in Cyrillic-to-Latin transliteration, or Pinyin
|- style="background:#dbab8a;"
| U+1E91 || ẑ || 0x97 || || || || || || latin small z with circumflex || Not found in ISO-8859. Rare use in Cyrillic-to-Latin transliteration, or Pinyin
|- style="color:#8c8c8c;"
| U+201D || ” || || || || || || 0xB5 || right double quotation mark ||
|- style="color:#8c8c8c;"
| U+201E || „ || || || || || || 0xA5 || double low-9 quotation mark ||
|- style="color:#8c8c8c;"
| U+20AC || € || || || || || 0xA4 || 0xA4 || euro sign ||
|}
== For More ==
* [[I18N - Character mapping]] sketches the format and location of <language>.map files.
* [[I18N - Charset]] is the main article about TDM language use and various encodings.
Latest revision as of 17:44, 19 June 2026
Background
This describes a particularly challenging use case of a TDM methodology that has been around for some time. The motivation to detail this was a 2026 update by Geep to the language map files, following a revision of the Carleton 24pt font used throughout the Main Menu.
In TDM's main menu, the Settings/Video/Language page attempts to list all the supported languages in their native form with respect to character set, that is, untranslated. Showing multiple language strings together is difficult, but some tricks are available. The result has been a reasonable near-term workaround for this particular page, applicable to comparable cases. (Comprehensive support for multilanguage display likely involves a major restructuring, for instance, moving to a native Unicode architecture with combined Latin and Cyrillic bitmaps.)
Recall that strings for the main menu are found in the utf8 file tdm_base01.pk4/strings/all.lang, where there are sections for [English], [German], etc. A particular section is used to generate for distribution the corresponding file, e.g., french.lang, german.lang, etc., in a language-specific 8-bit encoding. When you play TDM, and select a given language, the corresponding .lang file (if provided) is read.
Except for Cyrillic Russian, these files use encodings within the ISO-8859 family (as detailed in I18N_-_Charset). Any specific encoding is a bottleneck: by design, not every character in one member of the ISO-8859 family is available in the other members.
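The bottleneck can be illustrated with a short Python sketch. It is not TDM's actual generation tool; the function name `encode_for_language` is hypothetical, and Python's standard codecs merely stand in for the real converter. It shows that "Česky" survives in ISO-8859-2 but loses its first letter in ISO-8859-1.

```python
def encode_for_language(text: str, encoding: str) -> bytes:
    """Encode a UTF-8 source string into one 8-bit target encoding.

    Hypothetical helper: any character the target encoding lacks is
    replaced by b'?', which is what Python's errors="replace" does.
    """
    return text.encode(encoding, errors="replace")


if __name__ == "__main__":
    for enc in ("iso-8859-2", "iso-8859-1"):
        print(enc, encode_for_language("Česky", enc))
```

The Russian file would use `"cp1251"` in the same way.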
Language Names in Idealized UTF8 Form
Only the TDM-exposed languages are shown here. The all.lang file has additional strings for other potential Latin and Cyrillic languages. For reference, the generated encoding for the <language>.lang file is shown in the comment.
    "#str_02460" "English" // English [ISO-8859-1]
    "#str_02461" "Deutsch" // German [ISO-8859-1]
    "#str_02462" "Español" // Spanish [ISO-8859-1]
    "#str_02463" "Français" // French [ISO-8859-15]
    "#str_02464" "Português" // Portuguese [ISO-8859-1]
    "#str_02465" "Polski" // Polish [ISO-8859-2]
    "#str_02466" "Italiano" // Italian [ISO-8859-1]
    "#str_02467" "Česky" // Czech [ISO-8859-2]
    "#str_02468" "Русский" // Russian [WIN-1251]
    "#str_02469" "Català" // Catalan [ISO-8859-1]
    "#str_02470" "Dansk" // Danish [ISO-8859-1]
    "#str_02472" "Nederlands" // Dutch [ISO-8859-1]
    "#str_02474" "Magyar" // Hungarian [ISO-8859-2]
    "#str_02476" "Svenska" // Swedish [ISO-8859-1]
    "#str_02477" "Türkçe" // Turkish [ISO-8859-9]
    "#str_02479" "Română" // Romanian [ISO-8859-16]
    "#str_02480" "Slovenčina" // Slovak [ISO-8859-2]
First Compromise – Cyrillic vs. Latin Font
While there is some overlap in characters between Cyrillic and Latin encodings, it is insufficient for our purposes. So, for [English] and other European sections, a Latin transliteration from Cyrillic is used:
"#str_02468" "Russkiy" // Russian
Going the other way, for [Russian], the current treatment is not to do a Cyrillic transliteration from Latin, but instead to drop accents, rendering just in ASCII (except of course “Russian”), e.g.:
    "#str_02460" "English" // English
    "#str_02461" "Deutsch" // German
    "#str_02462" "Espanol" // Spanish
    "#str_02463" "Francais" // French
    "#str_02464" "Portugues" // Portuguese
    "#str_02465" "Polski" // Polish
    "#str_02466" "Italiano" // Italian
    "#str_02467" "Cesky" // Czech
    "#str_02468" "Русский" // Russian
    "#str_02469" "Catala" // Catalan
    "#str_02470" "Dansk" // Danish
    "#str_02472" "Nederlands" // Dutch
    "#str_02474" "Magyar" // Hungarian
    "#str_02476" "Svenska" // Swedish
    "#str_02477" "Turkce" // Turkish
    "#str_02479" "Romana" // Romanian
    "#str_02480" "Slovencina" // Slovak
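The accent-dropping used for the [Russian] section can be reproduced mechanically. The sketch below is only an illustration, not how the strings were actually produced; `drop_accents` is a hypothetical name. It decomposes each letter with Unicode NFD normalization and discards the combining marks. It should be applied to the Latin names only, since Cyrillic "й" also decomposes and would be altered.

```python
import unicodedata


def drop_accents(name: str) -> str:
    """Return the name with diacritics removed (Latin names only).

    NFD splits e.g. 'č' into 'c' + combining caron; the combining
    marks are then filtered out, leaving plain ASCII base letters.
    """
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    for native in ("Español", "Français", "Česky", "Türkçe", "Română"):
        print(native, "->", drop_accents(native))
```

Letters without a canonical decomposition (for example "ł" or "đ") would pass through unchanged, but none of the language names listed above contain one.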
Moving between ISO-8859 Members
In the discussion here, let “L1” be the current language (from the player’s perspective) or the [language] section of interest in all.lang (from the translator’s perspective). Let “L2” be the language of the foreign word, i.e., L1 and L2 differ.
Type and Go
In several cases, the translator merely puts the ideal character in the all.lang strings. After <language>.lang generation, everything works. Cases are:
- L1 and L2 both use the same ISO-8859 encoding. (Specifically, they are either both ISO-8859-1, which doesn’t need <language>.map files, or both ISO-8859-2, and have <language>.map files with, by convention, identical contents.) NOTE: For TDM’s purposes, ISO-8859-15 (French) can be considered part of the ISO-8859-1 family.
- L2 character is ASCII (i.e., in ISO range 0x00-0x7f). This is always the case if L2 = English. Characters in this range have identical codepoints in all five TDM-supported ISO-8859 encodings and in the TDM target encoding.
- L2 character is in the ISO range 0xA0-0xFF, with additional constraints.
For the last case, the L2 character must be either:
- present at the same codepoint in all five TDM-supported ISO-8859 encodings, and in the TDM target encoding. For our Latin language names, these accented characters are like that:
    ç in "Français" and "Türkçe" at 0xE7
    à in "Català" at 0xE0
    â in "Română" at 0xE2
    ü in "Türkçe" at 0xFC
- present at the same codepoint in ISO-8859 encodings for L1 and L2, and in the TDM target encoding.
- present in different codepoints in ISO-8859 encodings for L1 and L2, but with each being either the same as the TDM target encoding, or mapped to the TDM target encoding by the <language>.map file.
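The ISO side of these conditions can be checked with Python's standard codecs. This is a minimal sketch under stated assumptions: the names `TDM_ISO_ENCODINGS`, `codepoints` and `same_everywhere` are hypothetical, and the TDM target encoding and the <language>.map files are not modelled, so a positive result is necessary but not sufficient for "type and go".

```python
# The five ISO-8859 members used by TDM-supported languages.
TDM_ISO_ENCODINGS = (
    "iso-8859-1", "iso-8859-2", "iso-8859-9", "iso-8859-15", "iso-8859-16",
)


def codepoints(ch: str) -> dict:
    """Map each encoding to the byte value of ch, or None if absent."""
    result = {}
    for enc in TDM_ISO_ENCODINGS:
        try:
            result[enc] = ch.encode(enc)[0]
        except UnicodeEncodeError:
            result[enc] = None
    return result


def same_everywhere(ch: str) -> bool:
    """True if ch sits at one identical codepoint in all five encodings."""
    values = set(codepoints(ch).values())
    return None not in values and len(values) == 1


if __name__ == "__main__":
    for ch in "çâüČ":
        print(ch, same_everywhere(ch), codepoints(ch))
```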
Tricks
If the L2 character is not represented in the ISO encoding for L1, then at <L1>.lang generation time, it will be replaced by a “?”. To work around this, the translator has two methods:
Trick Method 1 – Direct Stuffing
When the L1 encoding is ISO-8859-1 (or -9 or -15, which share most of the 0xA0-0xBF range with it) and the L2 character is in TDM’s 0xA0-0xBF range (or at 0xD7 or 0xF7), you can look up the ISO character associated with that codepoint (shown in parentheses in cells of the i18n- charmap table), and type it. Examples:
    [English] // and other ISO-8859-1, -9, or -15
    "#str_02467" "¬esky" // our default mapping is ISO 8859-1, so ¬ is shown as Č (¬)
    "#str_02480" "Sloven®ina" // Slovak (® in ISO-8859-1 is č in our font)
Here’s another example of that last stuffing from a TDM-unexposed language, showing also a diaeresis (0xA8):
"#str_02481" "Sloven¨®ina" // Slovenian (southern slovenia) (¨ => š)
In this case, ISO-8859-1 would need to use the diaeresis, but ISO-8859-15 (French) would not, i.e., could type and go with š. The spreadsheet shows all available substitutions in light blue.
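Direct stuffing amounts to a per-character lookup, sketched below for an ISO-8859-1 section. The names `TDM_CODEPOINT` and `stuff_for_latin1` are hypothetical, and the dictionary holds only a few TDM codepoints copied from the TDM column of the table on this page. Each listed character is replaced by whatever ISO-8859-1 character lives at its TDM codepoint.

```python
# TDM codepoints for a few characters, taken from the TDM column of the table.
TDM_CODEPOINT = {
    "Č": 0xAC,  # shown in ISO-8859-1 as the not sign
    "č": 0xAE,  # shown in ISO-8859-1 as the registered sign
    "š": 0xA8,  # shown in ISO-8859-1 as the diaeresis
    "ž": 0xB8,  # shown in ISO-8859-1 as the cedilla
}


def stuff_for_latin1(text: str) -> str:
    """Replace each listed character by its ISO-8859-1 stand-in."""
    out = []
    for ch in text:
        if ch in TDM_CODEPOINT:
            # The stand-in is the Latin-1 character at the TDM codepoint.
            out.append(bytes([TDM_CODEPOINT[ch]]).decode("iso-8859-1"))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    for name in ("Česky", "Slovenčina", "Slovenščina"):
        print(name, "->", stuff_for_latin1(name))
```

For an ISO-8859-15 section the lookup would need fewer entries, because that encoding already has š and ž at 0xA8 and 0xB8.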
Trick Method 2 – Special Mapping with Repurposed Character
If all else fails, a little-used character in L1 can be redirected to the L2 codepoint TDM wants, by an extra mapping command in the <L1>.map file. Current such tricks are...
To get ñ for "Español":
- for iso 2, put ¨ (diaeresis 0xA8)
- for iso 16, put ¶ (pilcrow 0xB6)
To get ê for "Português":
- for iso 2: Put ´ (acute accent 0xBD)
TDM & ISO Char Sets - Details, Potential Tricks, Limitations
The main table below (after the Key) was drafted by Google AI, from a prompt that defined the header and an example row, and specified:
"Rows are a combination of all the printable 8-bit codepoints in the range 0x80-0xff defined in ISO-8859-1, -2, -9, -15, and -16. The rows are ordered by Unicode number. If a particular ISO standard does not include the character, leave that cell blank."
Subsequently, after Excel import, the TDM and Comments columns were filled in by Geep, and color coding was added. Language mappings were independently color coded, then verified against existing language maps. Finally, the table was converted to wikitable format.
This table skips codepoints in the range 0x00-0x7f, because they are the same for all ISO-8859 standards. No mapping required.
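The ISO columns of such a table can be cross-checked mechanically. The sketch below is not how the table was actually produced; it assumes Python's built-in codecs for the five standards and starts at 0xA0, since 0x80-0x9F hold only non-printable control codes in these encodings. The TDM and Comments columns are custom and cannot be derived this way.

```python
import unicodedata

# ISO-8859 part number -> Python codec name (all five are standard codecs).
CODECS = {"1": "iso8859_1", "2": "iso8859_2", "9": "iso8859_9",
          "15": "iso8859_15", "16": "iso8859_16"}

def iso_rows() -> dict[str, dict[str, int]]:
    """Map each character to {ISO part: codepoint} for bytes 0xA0-0xFF."""
    rows: dict[str, dict[str, int]] = {}
    for part, codec in CODECS.items():
        for byte in range(0xA0, 0x100):
            char = bytes([byte]).decode(codec)
            rows.setdefault(char, {})[part] = byte
    return dict(sorted(rows.items()))  # single-character keys sort by Unicode number

for char, cols in iso_rows().items():
    # Leave a cell blank when a standard does not include the character.
    cells = " | ".join(f"0x{cols[p]:02X}" if p in cols else "" for p in CODECS)
    name = unicodedata.name(char).lower()
    print(f"| U+{ord(char):04X} | {char} | {cells} | {name} |")
```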
| Cell | Meaning |
|---|---|
| Light Gray Text Row | Symbol not included in TDM custom character set (so TDM column blank). Can use for char substitution tricks. |
| 0xNN on white | Type desired utf8 char (in Symbol column) in all.lang. Generation goes directly to TDM codepoint given. |
| 0xNN, amber highlight | Type desired utf8 char in all.lang. Generation + mapping (in <language>.map file) goes to TDM codepoint via codepoint shown. |
| 0xNN in gray | Type desired utf8 char in all.lang. Generation + mapping (in <language>.map file) goes to TDM codepoint OF SUBSTITUTE CHARACTER via codepoint shown. |
| 0xNN in red | Character not part of this ISO. The "0xNN" shown is not an ISO codepoint, but is used for trick mapping (in <language>.map), to support the main menu's multilingual "Languages" page. |
| Substitute Char | Character not part of this ISO. But as trick, you can stuff in this substitute char at the needed TDM codepoint. |
| n/a | Not available to use substitute character, due to trick mapping (in <language>.map) elsewhere. |
| Empty Cell | In an ISO column, means the character is not part of this ISO (so it may be a candidate for future trick mapping, or was simply not fully analyzed with respect to substitute chars). |
| Tan Row | Orphaned character in TDM set, not part of 5 ISO-8859 standards for current TDM-supported languages. |
| Unicode | Symbol | TDM | 1 | 2 | 9 | 15 | 16 | Unicode Name | Comments |
|---|---|---|---|---|---|---|---|---|---|
| U+00A0 | | 0xA0 | 0xA0 | 0xA0 | 0xA0 | 0xA0 | 0xA0 | no-break space | |
| U+00A1 | ¡ | | 0xA1 | | 0xA1 | 0xA1 | | inverted exclamation mark | |
| U+00A2 | ¢ | | 0xA2 | | 0xA2 | 0xA2 | | cent sign | |
| U+00A3 | £ | | 0xA3 | | 0xA3 | 0xA3 | | pound sign | |
| U+00A4 | ¤ | | 0xA4 | 0xA4 | 0xA4 | | | currency sign | |
| U+00A5 | ¥ | | 0xA5 | | 0xA5 | 0xA5 | | yen sign | |
| U+00A6 | ¦ | | 0xA6 | | 0xA6 | | | broken bar | |
| U+00A7 | § | 0xA7 | 0xA7 | 0xA7 | 0xA7 | 0xA7 | 0xA7 | section sign | |
| U+00A8 | ¨ | | 0xA8 | 0xA8 | 0xA8 | | | diaeresis | See also hack use for ñ |
| U+00A9 | © | | 0xA9 | | 0xA9 | 0xA9 | 0xA9 | copyright sign | |
| U+00AA | ª | | 0xAA | | 0xAA | 0xAA | | feminine ordinal indicator | |
| U+00AB | « | | 0xAB | | 0xAB | 0xAB | 0xAB | left-pointing double angle quotation mark | |
| U+00AC | ¬ | | 0xAC | | 0xAC | 0xAC | | not sign | |
| U+00AD | | 0xAD | 0xAD | 0xAD | 0xAD | 0xAD | 0xAD | soft hyphen | |
| U+00AE | ® | | 0xAE | | 0xAE | 0xAE | | registered sign | |
| U+00AF | ¯ | | 0xAF | | 0xAF | 0xAF | | macron | |
| U+00B0 | ° | | 0xB0 | 0xB0 | 0xB0 | 0xB0 | 0xB0 | degree sign | |
| U+00B1 | ± | | 0xB1 | | 0xB1 | 0xB1 | 0xB1 | plus-minus sign | |
| U+00B2 | ² | | 0xB2 | | 0xB2 | 0xB2 | | superscript two | |
| U+00B3 | ³ | | 0xB3 | | 0xB3 | 0xB3 | | superscript three | |
| U+00B4 | ´ | | 0xB4 | 0xB4 | 0xB4 | | | acute accent | |
| U+00B5 | µ | | 0xB5 | | 0xB5 | 0xB5 | | micro sign | |
| U+00B6 | ¶ | | 0xB6 | | 0xB6 | 0xB6 | 0xB6 | pilcrow sign | |
| U+00B7 | · | | 0xB7 | | 0xB7 | 0xB7 | 0xB7 | middle dot | |
| U+00B8 | ¸ | | 0xB8 | 0xB8 | 0xB8 | | | cedilla | |
| U+00B9 | ¹ | | 0xB9 | | 0xB9 | 0xB9 | | superscript one | |
| U+00BA | º | | 0xBA | | 0xBA | 0xBA | | masculine ordinal indicator | |
| U+00BB | » | | 0xBB | | 0xBB | 0xBB | 0xBB | right-pointing double angle quotation mark | |
| U+00BC | ¼ | | 0xBC | | 0xBC | | | vulgar fraction one quarter | |
| U+00BD | ½ | | 0xBD | | 0xBD | | | vulgar fraction one half | |
| U+00BE | ¾ | | 0xBE | | 0xBE | | | vulgar fraction three quarters | |
| U+00BF | ¿ | 0xBF | 0xBF | | 0xBF | 0xBF | | inverted question mark | |
| U+00C0 | À | 0xC0 | 0xC0 | | 0xC0 | 0xC0 | 0xC0 | latin capital letter a with grave | |
| U+00C1 | Á | 0xC1 | 0xC1 | 0xC1 | 0xC1 | 0xC1 | 0xC1 | latin capital letter a with acute | |
| U+00C2 | Â | 0xC2 | 0xC2 | 0xC2 | 0xC2 | 0xC2 | 0xC2 | latin capital letter a with circumflex | |
| U+00C3 | Ã | 0xC3 | 0xC3 | | 0xC3 | 0xC3 | | latin capital letter a with tilde | |
| U+00C4 | Ä | 0xC4 | 0xC4 | 0xC4 | 0xC4 | 0xC4 | 0xC4 | latin capital letter a with diaeresis | |
| U+00C5 | Å | 0xC5 | 0xC5 | | 0xC5 | 0xC5 | | latin capital letter a with ring above | |
| U+00C6 | Æ | 0xC6 | 0xC6 | | 0xC6 | 0xC6 | 0xC6 | latin capital letter ae | |
| U+00C7 | Ç | 0xC7 | 0xC7 | 0xC7 | 0xC7 | 0xC7 | 0xC7 | latin capital letter c with cedilla | |
| U+00C8 | È | 0xC8 | 0xC8 | | 0xC8 | 0xC8 | 0xC8 | latin capital letter e with grave | |
| U+00C9 | É | 0xC9 | 0xC9 | 0xC9 | 0xC9 | 0xC9 | 0xC9 | latin capital letter e with acute | |
| U+00CA | Ê | 0xCA | 0xCA | | 0xCA | 0xCA | 0xCA | latin capital letter e with circumflex | |
| U+00CB | Ë | 0xCB | 0xCB | 0xCB | 0xCB | 0xCB | 0xCB | latin capital letter e with diaeresis | |
| U+00CC | Ì | 0xCC | 0xCC | | 0xCC | 0xCC | 0xCC | latin capital letter i with grave | |
| U+00CD | Í | 0xCD | 0xCD | 0xCD | 0xCD | 0xCD | 0xCD | latin capital letter i with acute | |
| U+00CE | Î | 0xCE | 0xCE | 0xCE | 0xCE | 0xCE | 0xCE | latin capital letter i with circumflex | |
| U+00CF | Ï | 0xCF | 0xCF | | 0xCF | 0xCF | 0xCF | latin capital letter i with diaeresis | |
| U+00D0 | Ð | 0xD0 | 0xD0 | | n/a | 0xD0 | | latin capital letter eth | Same glyph as U+0110 latin capital letter d with stroke |
| U+00D1 | Ñ | 0xD1 | 0xD1 | | 0xD1 | 0xD1 | | latin capital letter n with tilde | |
| U+00D2 | Ò | 0xD2 | 0xD2 | | 0xD2 | 0xD2 | 0xD2 | latin capital letter o with grave | |
| U+00D3 | Ó | 0xD3 | 0xD3 | 0xD3 | 0xD3 | 0xD3 | 0xD3 | latin capital letter o with acute | |
| U+00D4 | Ô | 0xD4 | 0xD4 | 0xD4 | 0xD4 | 0xD4 | 0xD4 | latin capital letter o with circumflex | Formerly also mapped from 0x88; redundant, Ğ has 0x88 codepoint now |
| U+00D5 | Õ | 0xD5 | 0xD5 | | 0xD5 | 0xD5 | | latin capital letter o with tilde | |
| U+00D6 | Ö | 0xD6 | 0xD6 | 0xD6 | 0xD6 | 0xD6 | 0xD6 | latin capital letter o with diaeresis | |
| U+00D7 | × | | 0xD7 | 0xD7 | 0xD7 | 0xD7 | | multiplication sign | |
| U+00D8 | Ø | 0xD8 | 0xD8 | | 0xD8 | 0xD8 | | latin capital letter o with stroke | |
| U+00D9 | Ù | 0xD9 | 0xD9 | | 0xD9 | 0xD9 | 0xD9 | latin capital letter u with grave | |
| U+00DA | Ú | 0xDA | 0xDA | 0xDA | 0xDA | 0xDA | 0xDA | latin capital letter u with acute | |
| U+00DB | Û | 0xDB | 0xDB | | 0xDB | 0xDB | 0xDB | latin capital letter u with circumflex | |
| U+00DC | Ü | 0xDC | 0xDC | 0xDC | 0xDC | 0xDC | 0xDC | latin capital letter u with diaeresis | |
| U+00DD | Ý | 0xDD | 0xDD | 0xDD | | 0xDD | | latin capital letter y with acute | |
| U+00DE | Þ | 0xDE | 0xDE | | | 0xDE | | latin capital letter thorn | |
| U+00DF | ß | 0xDF | 0xDF | 0xDF | 0xDF | 0xDF | 0xDF | latin small letter sharp s | |
| U+00E0 | à | 0xE0 | 0xE0 | | 0xE0 | 0xE0 | 0xE0 | latin small letter a with grave | |
| U+00E1 | á | 0xE1 | 0xE1 | 0xE1 | 0xE1 | 0xE1 | 0xE1 | latin small letter a with acute | |
| U+00E2 | â | 0xE2 | 0xE2 | 0xE2 | 0xE2 | 0xE2 | 0xE2 | latin small letter a with circumflex | |
| U+00E3 | ã | 0xE3 | 0xE3 | | 0xE3 | 0xE3 | | latin small letter a with tilde | |
| U+00E4 | ä | 0xE4 | 0xE4 | 0xE4 | 0xE4 | 0xE4 | 0xE4 | latin small letter a with diaeresis | |
| U+00E5 | å | 0xE5 | 0xE5 | | 0xE5 | 0xE5 | | latin small letter a with ring above | |
| U+00E6 | æ | 0xE6 | 0xE6 | | 0xE6 | 0xE6 | 0xE6 | latin small letter ae | |
| U+00E7 | ç | 0xE7 | 0xE7 | 0xE7 | 0xE7 | 0xE7 | 0xE7 | latin small letter c with cedilla | |
| U+00E8 | è | 0xE8 | 0xE8 | | 0xE8 | 0xE8 | 0xE8 | latin small letter e with grave | |
| U+00E9 | é | 0xE9 | 0xE9 | 0xE9 | 0xE9 | 0xE9 | 0xE9 | latin small letter e with acute | |
| U+00EA | ê | 0xEA | 0xEA | 0xBD | 0xEA | 0xEA | 0xEA | latin small letter e with circumflex | Hack for iso 2: Put ˝ (double acute accent, 0xBD) to get ê for "Português" |
| U+00EB | ë | 0xEB | 0xEB | 0xEB | 0xEB | 0xEB | 0xEB | latin small letter e with diaeresis | |
| U+00EC | ì | 0xEC | 0xEC | | 0xEC | 0xEC | 0xEC | latin small letter i with grave | |
| U+00ED | í | 0xED | 0xED | 0xED | 0xED | 0xED | 0xED | latin small letter i with acute | |
| U+00EE | î | 0xEE | 0xEE | 0xEE | 0xEE | 0xEE | 0xEE | latin small letter i with circumflex | |
| U+00EF | ï | 0xEF | 0xEF | | 0xEF | 0xEF | 0xEF | latin small letter i with diaeresis | |
| U+00F0 | ð | 0xF0 | 0xF0 | | | 0xF0 | | latin small letter eth | |
| U+00F1 | ñ | 0xF1 | 0xF1 | 0xA8 | 0xF1 | 0xF1 | 0xB6 | latin small letter n with tilde | Hacks to get ñ for "Español": for iso 2, put ¨ (diaeresis 0xA8); for iso 16, put ¶ (0xB6) |
| U+00F2 | ò | 0xF2 | 0xF2 | | 0xF2 | 0xF2 | 0xF2 | latin small letter o with grave | |
| U+00F3 | ó | 0xF3 | 0xF3 | 0xF3 | 0xF3 | 0xF3 | 0xF3 | latin small letter o with acute | |
| U+00F4 | ô | 0xF4 | 0xF4 | 0xF4 | 0xF4 | 0xF4 | 0xF4 | latin small letter o with circumflex | Formerly also mapped from 0x88; redundant, ğ has TDM 0x88 codepoint now |
| U+00F5 | õ | 0xF5 | 0xF5 | | 0xF5 | 0xF5 | | latin small letter o with tilde | |
| U+00F6 | ö | 0xF6 | 0xF6 | 0xF6 | 0xF6 | 0xF6 | 0xF6 | latin small letter o with diaeresis | |
| U+00F7 | ÷ | | 0xF7 | 0xF7 | 0xF7 | 0xF7 | | division sign | |
| U+00F8 | ø | 0xF8 | 0xF8 | | 0xF8 | 0xF8 | | latin small letter o with stroke | |
| U+00F9 | ù | 0xF9 | 0xF9 | | 0xF9 | 0xF9 | 0xF9 | latin small letter u with grave | |
| U+00FA | ú | 0xFA | 0xFA | 0xFA | 0xFA | 0xFA | 0xFA | latin small letter u with acute | |
| U+00FB | û | 0xFB | 0xFB | | 0xFB | 0xFB | 0xFB | latin small letter u with circumflex | |
| U+00FC | ü | 0xFC | 0xFC | 0xFC | 0xFC | 0xFC | 0xFC | latin small letter u with diaeresis | |
| U+00FD | ý | 0xFD | 0xFD | 0xFD | | 0xFD | | latin small letter y with acute | |
| U+00FE | þ | 0xFE | 0xFE | | | 0xFE | | latin small letter thorn | |
| U+00FF | ÿ | 0xFF | 0xFF | | 0xFF | 0xFF | 0xFF | latin small letter y with diaeresis | |
| U+0102 | Ă | 0x8B | | 0xC3 | | | 0xC3 | latin capital letter a with breve | |
| U+0103 | ă | 0x9B | | 0xE3 | | | 0xE3 | latin small letter a with breve | |
| U+0104 | Ą | 0xAA | ª | 0xA1 | ª | ª | 0xA1 | latin capital letter a with ogonek | |
| U+0105 | ą | 0xBA | º | 0xB1 | º | º | 0xA2 | latin small letter a with ogonek | |
| U+0106 | Ć | 0x82 | | 0xC6 | | | 0xC5 | latin capital letter c with acute | |
| U+0107 | ć | 0x92 | | 0xE6 | | | 0xE5 | latin small letter c with acute | |
| U+0108 | Ĉ | 0x86 | | | | | | latin capital letter c with circumflex | Only in ISO-8859-3 (for Esperanto) at 0xC6 |
| U+0109 | ĉ | 0x96 | | | | | | latin small letter c with circumflex | Only in ISO-8859-3 (for Esperanto) at 0xE6 |
| U+010C | Č | 0xAC | ¬ | 0xC8 | ¬ | ¬ | 0xB2 | latin capital letter c with caron | Substitute char "not sign" at 0xAC |
| U+010D | č | 0xAE | ® | 0xE8 | ® | ® | 0xB9 | latin small letter c with caron | Substitute char "registered sign" at 0xAE |
| U+010E | Ď | 0xB3 | ³ | 0xCF | ³ | ³ | | latin capital letter d with caron | Substitute char "superscript three" at 0xB3 |
| U+010F | ď | 0xB7 | · | 0xEF | · | · | | latin small letter d with caron | Substitute char "middle dot" at 0xB7 |
| U+0110 | Đ | 0xD0 | 0xD0 | 0xD0 | n/a | | 0xD0 | latin capital letter d with stroke | Same glyph as U+00D0 latin capital letter eth |
| U+0111 | đ | 0x90 | | 0xF0 | n/a | | 0xF0 | latin small letter d with stroke | |
| U+0118 | Ę | 0xAB | « | 0xCA | « | « | 0xDD | latin capital letter e with ogonek | Substitute char "left-pointing double angle quotation mark" at 0xAB |
| U+0119 | ę | 0xBB | » | 0xEA | » | » | 0xFD | latin small letter e with ogonek | Substitute char "right-pointing double angle quotation mark" at 0xBB |
| U+011A | Ě | 0xA5 | ¥ | 0xCC | ¥ | ¥ | | latin capital letter e with caron | Substitute char "yen sign" at 0xA5 |
| U+011B | ě | 0xA3 | £ | 0xEC | £ | £ | | latin small letter e with caron | Substitute char "pound sign" at 0xA3 |
| U+011E | Ğ | 0x88 | | | 0xD0 | | | latin capital letter g with breve | As of TDM 2.13 (TDM codemap), 2.15 (turkish.map) |
| U+011F | ğ | 0x98 | | | 0xF0 | | | latin small letter g with breve | As of TDM 2.13 (TDM codemap), 2.15 (turkish.map) |
| U+0130 | İ | | | | 0xDD | | | latin capital letter i with dot above | Turkish: utf8 "İ" will be mapped to "Î" (0xCE) |
| U+0131 | ı | | | | 0xFD | | | latin small letter dotless i | Turkish: utf8 "ı" will be mapped to ASCII "i" (0x69) |
| U+0139 | Ĺ | | | 0xC5 | | | | latin capital letter l with acute | |
| U+013A | ĺ | | | 0xE5 | | | | latin small letter l with acute | |
| U+013D | Ľ | | | 0xA5 | | | | latin capital letter l with caron | |
| U+013E | ľ | | | 0xB5 | | | | latin small letter l with caron | |
| U+0141 | Ł | 0xB1 | ± | 0xA3 | ± | ± | 0xA3 | latin capital letter l with stroke | Substitute char "plus-minus sign" at 0xB1 |
| U+0142 | ł | 0xB5 | µ | 0xB3 | µ | µ | 0xB3 | latin small letter l with stroke | Substitute char "micro sign" at 0xB5 |
| U+0143 | Ń | 0x8C | | 0xD1 | | | 0xD1 | latin capital letter n with acute | |
| U+0144 | ń | 0x9C | | 0xF1 | | | 0xF1 | latin small letter n with acute | |
| U+0147 | Ň | 0x80 | | 0xD2 | | | | latin capital letter n with caron | |
| U+0148 | ň | 0xA1 | ¡ | 0xF2 | ¡ | ¡ | | latin small letter n with caron | Substitute char "inverted exclamation mark" at 0xA1 |
| U+0150 | Ő | 0xB0 | ° | 0xD5 | ° | ° | 0xD5 | latin capital letter o with double acute | Similar to Ö, used in Hungarian. Substitute char "degree sign" at 0xB0 |
| U+0151 | ő | 0xB9 | ¹ | 0xF5 | ¹ | ¹ | 0xF5 | latin small letter o with double acute | Similar to ö, used in Hungarian. Substitute char "superscript one" at 0xB9 |
| U+0152 | Œ | 0xBC | ¼ | | ¼ | 0xBC | 0xBC | latin capital ligature oe | Substitute char "vulgar fraction one quarter" at 0xBC |
| U+0153 | œ | 0xBD | ½ | | ½ | 0xBD | 0xBD | latin small ligature oe | Substitute char "vulgar fraction one half" at 0xBD |
| U+0154 | Ŕ | 0x89 | | 0xC0 | | | | latin capital letter r with acute | |
| U+0155 | ŕ | 0x99 | | 0xE0 | | | | latin small letter r with acute | |
| U+0158 | Ř | 0xD7 | × | 0xD8 | × | × | | latin capital letter r with caron | Substitute char "multiplication sign" at 0xD7 |
| U+0159 | ř | 0xF7 | ÷ | 0xF8 | ÷ | ÷ | | latin small letter r with caron | Substitute char "division sign" at 0xF7 |
| U+015A | Ś | 0x81 | | 0xA6 | | | 0xD7 | latin capital letter s with acute | |
| U+015B | ś | 0x91 | | 0xB6 | | | 0xF7 | latin small letter s with acute | |
| U+015C | Ŝ | 0x85 | | | | | | latin capital letter s with circumflex | Only in ISO-8859-3 (for Esperanto) at 0xDE |
| U+015D | ŝ | 0x95 | | | | | | latin small letter s with circumflex | Only in ISO-8859-3 (for Esperanto) at 0xFE |
| U+015E | Ş | 0x8D | | 0xAA | 0xDE | | | latin capital letter s with cedilla | Can stand in for "...comma under" |
| U+015F | ş | 0x9D | | 0xBA | 0xFE | | | latin small letter s with cedilla | Can stand in for "...comma under" |
| U+0160 | Š | 0xA6 | ¦ | 0xA9 | ¦ | 0xA6 | 0xA6 | latin capital letter s with caron | Substitute char "broken bar" at 0xA6 |
| U+0161 | š | 0xA8 | ¨ | 0xB9 | ¨ | 0xA8 | 0xA8 | latin small letter s with caron | Substitute char "diaeresis" at 0xA8 |
| U+0162 | Ţ | 0x8E | | 0xDE | | | | latin capital letter t with cedilla | Can stand in for "...comma under" |
| U+0163 | ţ | 0x9E | | 0xFE | | | | latin small letter t with cedilla | Can stand in for "...comma under" |
| U+0164 | Ť | 0xB2 | ² | 0xAB | ² | ² | | latin capital letter t with caron | Substitute char "superscript two" at 0xB2 |
| U+0165 | ť | 0xB6 | ¶ | 0xBB | ¶ | ¶ | | latin small letter t with caron | Substitute char "pilcrow sign" at 0xB6 |
| U+016E | Ů | 0xA9 | © | 0xD9 | © | © | | latin capital letter u with ring above | Substitute char "copyright sign" at 0xA9 |
| U+016F | ů | 0xAF | ¯ | 0xF9 | ¯ | ¯ | | latin small letter u with ring above | Substitute char "macron" at 0xAF |
| U+0170 | Ű | 0xA2 | ¢ | 0xDB | ¢ | ¢ | 0xD8 | latin capital letter u with double acute | Similar to Ü, used in Hungarian. Substitute char "cent sign" at 0xA2 |
| U+0171 | ű | 0xA4 | ¤ | 0xFB | ¤ | ¤ | 0xF8 | latin small letter u with double acute | Similar to ü, used in Hungarian. Substitute char "currency sign" at 0xA4 |
| U+0178 | Ÿ | 0xBE | ¾ | | ¾ | 0xBE | 0xBE | latin capital letter y with diaeresis | Substitute char "vulgar fraction three quarters" at 0xBE |
| U+0179 | Ź | 0x84 | | 0xAC | | | 0xAC | latin capital letter z with acute | |
| U+017A | ź | 0x94 | | 0xBC | | | 0xAE | latin small letter z with acute | |
| U+017B | Ż | 0x83 | | 0xAF | | | 0xAF | latin capital letter z with dot above | |
| U+017C | ż | 0x93 | | 0xBF | | | 0xBF | latin small letter z with dot above | |
| U+017D | Ž | 0xB4 | ´ | 0xAE | ´ | 0xB4 | 0xB4 | latin capital letter z with caron | Substitute char "acute accent" at 0xB4 |
| U+017E | ž | 0xB8 | ¸ | 0xBE | ¸ | 0xB8 | 0xB8 | latin small letter z with caron | Substitute char "cedilla" at 0xB8 |
| U+01D3 | Ǔ | 0x8A | | | | | | latin capital letter u with caron | Not found in ISO-8859. Pinyin tone marking |
| U+01D4 | ǔ | 0x9A | | | | | | latin small letter u with caron | Not found in ISO-8859. Pinyin tone marking |
| U+0218 | Ș | 0x8D | | | | | 0xAA | latin capital letter s with comma below | See also "...with cedilla" |
| U+0219 | ș | 0x9D | | | | | 0xBA | latin small letter s with comma below | See also "...with cedilla" |
| U+021A | Ț | 0x8E | | | | | 0xDE | latin capital letter t with comma below | See also "...with cedilla" |
| U+021B | ț | 0x9E | | | | | 0xFE | latin small letter t with comma below | See also "...with cedilla" |
| U+02C7 | ˇ | | | 0xB7 | | | | caron | |
| U+02D8 | ˘ | | | 0xA2 | | | | breve | |
| U+02D9 | ˙ | | | 0xFF | | | | dot above | |
| U+02DB | ˛ | | | 0xB2 | | | | ogonek | |
| U+02DD | ˝ | | | 0xBD | | | | double acute accent | |
| U+1E90 | Ẑ | 0x87 | | | | | | latin capital letter z with circumflex | Not found in ISO-8859. Rare use in Cyrillic-to-Latin transliteration, or Pinyin |
| U+1E91 | ẑ | 0x97 | | | | | | latin small letter z with circumflex | Not found in ISO-8859. Rare use in Cyrillic-to-Latin transliteration, or Pinyin |
| U+201D | ” | | | | | | 0xB5 | right double quotation mark | |
| U+201E | „ | | | | | | 0xA5 | double low-9 quotation mark | |
| U+20AC | € | | | | | 0xA4 | 0xA4 | euro sign | |
For More
- I18N - Character mapping sketches the format and location of <language>.map files.
- I18N - Charset is the main article about TDM language use and various encodings.