Supplementary material for
"A New Input Technique for Accented Letters in Alphabetical Scripts"
by Uwe Waldmann,
20th International Unicode Conference, Washington, DC, USA, January 2002.
http://www.mpi-sb.mpg.de/~uwe/paper/AccInput-bibl.html

Input methods for accented letters in alphabetical scripts should
satisfy a number of (not fully compatible) requirements. In particular,
they should be easy to learn and to memorize, they should induce little
mental and physical stress, and they should make a large number of
characters accessible using few dedicated keys and short key sequences.

In the above-mentioned paper, we describe a new input technique called
"SITMO" (Single Iteratable Trailing Modifier), and compare it with
traditional modifier (dead key) and compose input methods. SITMO uses
one dedicated key. Typing this key replaces the character immediately
before the cursor by another character derived from the same base
character, for instance "a" by "a with diaeresis", "a with diaeresis"
by "a with grave", and "a with grave" by "a with acute", so that, say,
"a with grave" can be obtained by typing "a" followed by two presses of
the dedicated key. SITMO is parameterized by a language-dependent
"replacement scheme" defining the possible substitutions.

To evaluate the method, we have inspected sample newspaper texts in 28
European languages. In this file, we give for each language the letter
frequencies of the sample texts and one or more SITMO replacement
schemes.

The frequency tables contain the number of occurrences for each derived
letter (i.e., each letter outside of {A,...,Z,a,...,z}) that occurs in
the sample texts. They also contain those derived letters which are
used in the standard orthography for the language but for which the
number of occurrences in the sample texts is zero (e.g. "u with
diaeresis" in Spanish). In addition, the frequency tables show the
length of the sample text (always 100000 characters) and the total
number of derived letters.
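The replacement mechanism described above can be illustrated with a
minimal Python sketch. It is not the implementation from the paper: the
names EXAMPLE_ROWS, build_successors and press_modifier are
hypothetical, the single row is the "a" example from the text, and the
wrap-around from the last letter of a row back to the base letter is an
assumption that this file does not confirm.

```python
# Hypothetical single-row replacement scheme taken from the example in the
# text: "a" -> "a with diaeresis" -> "a with grave" -> "a with acute".
EXAMPLE_ROWS = [["a", "ä", "à", "á"]]


def build_successors(rows):
    """Map every letter of a row to the letter that replaces it."""
    successors = {}
    for row in rows:
        # Assumption: the last letter of a row wraps around to the base letter.
        for current, following in zip(row, row[1:] + row[:1]):
            successors[current] = following
    return successors


def press_modifier(text, successors):
    """Simulate one press of the dedicated key with the cursor at the end."""
    if not text:
        return text
    last = text[-1]
    # Characters without an entry in the scheme are left unchanged.
    return text[:-1] + successors.get(last, last)


successors = build_successors(EXAMPLE_ROWS)
typed = "voil" + "a"
typed = press_modifier(typed, successors)  # "a" becomes "a with diaeresis"
typed = press_modifier(typed, successors)  # ... and then "a with grave"
print(typed)
```

Storing the scheme as a successor map keeps each key press a constant-time
lookup, independent of the number of rows in the scheme.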
In the frequency tables, upper case letters are always mapped to the
corresponding lower case letters, with the exception of the dotted and
undotted "i" in Azeri and Turkish.

For each replacement scheme, the average number of keystrokes per
derived letter (for the sample texts) is given. If derived letters have
been turned into base letters (e.g. for French), the average numbers
are given both including and excluding the new base letters.

The replacement schemes presented here contain three kinds of letters:

(a) All derived letters that are used in the standard orthography of
    the language (not marked).
(b) All derived letters not occurring in the standard orthography but
    occurring in the sample texts in proper names, foreign loans, etc.
    (marked by [ ]).
(c) Some derived letters occurring neither in the sample texts nor in
    the standard orthography of the language which serve to regularize
    the replacement scheme (marked by { }).

It should be noted that replacement schemes used in a practical
implementation will differ from the schemes presented here in several
respects:

- They will contain non-letters, e.g., punctuation characters,
  mathematical symbols, letter-like symbols.
- They will contain many more derived letters not occurring in the
  standard orthography.
- The positions given here for letters of group (b) are often
  accidental, so many letters of this group will occur at different
  positions.

========================================================================
Albanian
------------------------------------------------------------------------
#chars    100000
ç            166
ë           7281
ü              3
#derived    7450
------------------------------------------------------------------------
// Frequency-based replacement scheme.
// All derived letters in the second column.
c  ç
e  ë
u  [ü]

average number of keystrokes per derived letter: 2.0000
========================================================================
Azeri
------------------------------------------------------------------------
#chars    100000
ç            634
ə           7870
è              1
ğ            335
ı           2957
İ            234
ö            645
ş           1278
ü           1519
#derived   15473
------------------------------------------------------------------------
// Frequency-based replacement scheme.
// All derived letters in the second column.
c  ç
e  ə  [è]
g  ğ
i  ı
I  İ
o  ö
s  ş
u  ü

average number of keystrokes per derived letter: 2.0001
========================================================================
Catalan
------------------------------------------------------------------------
#chars    100000
à            290
á             23
â              1
ç             76
é            345
è            245
í            227
ï             29
ñ             15
ó            453
ò            129
ú             78
ü             18
#derived    1929
------------------------------------------------------------------------
// Frequency-based replacement scheme.
a  à  [á]  [â]
c  ç
e  é  è
i  í  ï
n  [ñ]
o  ó  ò
u  ú  ü

average number of keystrokes per derived letter: 2.2312
------------------------------------------------------------------------
// Regularized replacement scheme.
// Grave and diaeresis in the third column, other derived letters in
// the second column.
a  [á]  à  [â]
c  ç
e  é  è
i  í  ï
n  [ñ]
o  ó  ò
u  ú  ü

average number of keystrokes per derived letter: 2.3696
========================================================================
Croatian
------------------------------------------------------------------------
#chars    100000
č            778
ć            584
đ            143
ő              1
š            683
ž            438
ź              1
#derived    2628
------------------------------------------------------------------------
// Frequency-based replacement scheme.
// Acute in the third column, other derived letters in the second
// column.
c  č  ć
d  đ
o  [ő]
s  š
z  ž  [ź]

average number of keystrokes per derived letter: 2.2226
------------------------------------------------------------------------
// Shifted replacement scheme.
// All derived letters in the second column (but "c" with acute in the
// "x" row).
c  č
d  đ
o  [ő]
s  š
x  ć
z  ž  [ź]

average number of keystrokes per derived letter: 2.0004
========================================================================
Czech
------------------------------------------------------------------------
#chars    100000
á           1612
ä              2
č            729
ď             14
ě           1012
é            788
í           2119
ň             49
ó             18
ö              2
ř            962
š            572
ť             30
ů            372
ú             81
ý            597
ž            749
#derived    9708
------------------------------------------------------------------------
// Frequency-based replacement scheme.
a  á  [ä]
c  č
d  ď
e  ě  é
i  í
n  ň
o  ó  [ö]
r  ř
s  š
t  ť
u  ů  ú
y  ý
z  ž

average number of keystrokes per derived letter: 2.0899
------------------------------------------------------------------------
// Regularized replacement scheme.
// Vowels with acute and consonants with caron in the second column,
// other derived letters in the third column.
a  á  [ä]
c  č
d  ď
e  é  ě
i  í
n  ň
o  ó  [ö]
r  ř
s  š
t  ť
u  ú  ů
y  ý
z  ž

average number of keystrokes per derived letter: 2.1430
========================================================================
Danish
------------------------------------------------------------------------
#chars    100000
å            824
æ            592
ä              5
à              2
é             13
ø            830
ö              4
ü             19
#derived    2289
------------------------------------------------------------------------
// Frequency-based replacement scheme.
// "ae" in the third column, other derived letters in the second
// column.
a  å  æ  [ä]  [à]
e  [é]
o  ø  [ö]
u  [ü]

average number of keystrokes per derived letter: 2.2674
------------------------------------------------------------------------
// Shifted replacement scheme.
// All derived letters in the second column (but "ae" in the "e" row).
a  å  [ä]  [à]
e  æ  [é]
o  ø  [ö]
u  [ü]

average number of keystrokes per derived letter: 2.0114
========================================================================
Dutch
------------------------------------------------------------------------
#chars    100000
á              4
ä              0
ë             45
é             18
è              3
ï              9
í              1
ó              9
ö              7
ü              2
ú              0
#derived      98
------------------------------------------------------------------------
// Frequency-based replacement scheme.
a  á  ä
e  ë  é  [è]
i  ï  í
o  ó  ö
u  ü  ú

average number of keystrokes per derived letter: 2.3265
------------------------------------------------------------------------
// Shifted replacement scheme.
// Diaeresis in the second column, acute in the third column.
a  ä  á
e  ë  é  [è]
i  ï  í
o  ö  ó
u  ü  ú

average number of keystrokes per derived letter: 2.3878
========================================================================
Estonian
------------------------------------------------------------------------
#chars    100000
ä           1086
õ            880
ö            390
š             11
ü            702
ž              0
#derived    3069
------------------------------------------------------------------------
// Frequency-based replacement scheme.
// "o" with diaeresis in the third column, other derived letters in
// the second column.
a  ä
o  õ  ö
s  š
u  ü
z  ž

average number of keystrokes per derived letter: 2.1271
------------------------------------------------------------------------
// Regularized replacement scheme.
// Tilde in the third column, other derived letters in the second
// column.
a  ä
o  ö  õ
s  š
u  ü
z  ž

average number of keystrokes per derived letter: 2.2867
========================================================================
Finnish
------------------------------------------------------------------------
#chars    100000
ä           3404
å              2
ö            490
š              0
ž              0
#derived    3896
------------------------------------------------------------------------
// Frequency-based replacement scheme.
// All derived letters in the second column.
a  ä  [å]
o  ö
s  š
z  ž

average number of keystrokes per derived letter: 2.0005
========================================================================
French
------------------------------------------------------------------------
#chars    100000
à            403
â             20
ã              3
ç             38
é           1808
è            323
ê            145
ë             21
î             38
ï              8
ô             55
œ             21
ù             23
û             18
ü              0
#derived    2924
------------------------------------------------------------------------
// Frequency-based replacement scheme.
a  à  â  [ã]
c  ç
e  é  è  ê  ë
i  î  ï
o  ô  œ
u  ù  û  ü

average number of keystrokes per derived letter: 2.2562
------------------------------------------------------------------------
// Regularized replacement scheme.
// Grave in the third column, circumflex in the fourth column,
// diaeresis in the fifth column, other derived letters in the second
// column.
a  {á}  à  â  [ã]
c  ç
e  é  è  ê  ë
i  {í}  {ì}  î  ï
o  œ  {ò}  ô
u  {ú}  ù  û  ü

average number of keystrokes per derived letter: 2.4778
------------------------------------------------------------------------
// Shifted replacement scheme.
// Circumflex in the third column, diaeresis in the fourth column,
// other derived letters in the second column (but "e" with grave in
// the "i" row).
a  à  â  [ã]
c  ç
e  é  ê  ë
i  è  î  ï
o  œ  ô
u  ù  û  ü

average number of keystrokes per derived letter: 2.1163
------------------------------------------------------------------------
// Regularized replacement scheme (using "e" with acute as a base
// letter).
// Circumflex in the third column, diaeresis in the fourth column,
// other derived letters in the second column.
a  à  â  [ã]
c  ç
e  è  ê  ë
é
i  {ì}  î  ï
o  œ  ô
u  ù  û  ü

average number of keystrokes per derived letter: 2.3047
(per derived letter including é: 1.4979)
========================================================================
German
------------------------------------------------------------------------
#chars    100000
ä            542
é             14
ë              1
ö            204
ß            282
ü            545
#derived    1588
------------------------------------------------------------------------
// Frequency-based replacement scheme.
// All derived letters in the second column.
a  ä
e  [é]  [ë]
o  ö
s  ß
u  ü

average number of keystrokes per derived letter: 2.0006
========================================================================
Hungarian
------------------------------------------------------------------------
#chars    100000
á           3310
é           2811
í            584
ö            889
ó            832
ő            662
ü            522
ú            228
ű            180
#derived   10018
------------------------------------------------------------------------
// Frequency-based replacement scheme.
a  á
e  é
i  í
o  ö  ó  ő
u  ü  ú  ű

average number of keystrokes per derived letter: 2.2739
------------------------------------------------------------------------
// Regularized replacement scheme.
// Acute in the second column, diaeresis in the third column,
// double acute in the fourth column.
a  á
e  é
i  í
o  ó  ö  ő
u  ú  ü  ű

average number of keystrokes per derived letter: 2.3089
------------------------------------------------------------------------
// Shifted replacement scheme.
// Double acute in the third column, other derived letters in the
// second column (but "o" and "u" with diaeresis or double acute in
// the "l" or "j" row, respectively).
a  á
e  é
i  í
o  ó
l  ö  ő
u  ú
j  ü  ű

average number of keystrokes per derived letter: 2.0840
------------------------------------------------------------------------
// Regularized replacement scheme (using "o" and "u" with diaeresis as
// base letters).
// All derived letters in the second column.
a  á
e  é
i  í
o  ó
ö  ő
u  ú
ü  ű

average number of keystrokes per derived letter: 2.0000
(per derived letter including ö and ü: 1.8592)
========================================================================
Icelandic
------------------------------------------------------------------------
#chars    100000
á           1346
æ            790
ð           3581
é            457
í           1292
ó            784
ö            653
þ           1100
ú            411
ý            245
#derived   10659
------------------------------------------------------------------------
// Frequency-based replacement scheme.
// Vowels with acute and derived consonants in the second column,
// other derived vowels in the third column.
a  á  æ
d  ð
e  é
i  í
o  ó  ö
t  þ
u  ú
y  ý

average number of keystrokes per derived letter: 2.1354
========================================================================
Irish Gaelic
------------------------------------------------------------------------
#chars    100000
á           1494
é           1208
í           1507
ó            715
ú            737
#derived    5661
------------------------------------------------------------------------
// Frequency-based replacement scheme.
// All derived letters in the second column.
a  á
e  é
i  í
o  ó
u  ú

average number of keystrokes per derived letter: 2.0000
========================================================================
Italian
------------------------------------------------------------------------
#chars    100000
à            189
è            244
é             44
ë              1
ì             39
ò             50
ù             75
#derived     642
------------------------------------------------------------------------
// Frequency-based replacement scheme.
// Grave in the second column, acute in the third column.
a  à
e  è  é  [ë]
i  ì
o  ò
u  ù

average number of keystrokes per derived letter: 2.0717
========================================================================
Latvian
------------------------------------------------------------------------
#chars    100000
ā           3427
č             88
ē           1493
ģ            107
ī           1718
ķ             95
ļ            226
ņ            337
ŗ              0
š            968
ū            404
ž            123
#derived    8986
------------------------------------------------------------------------
// Frequency-based replacement scheme.
// All derived letters in the second column.
a  ā
c  č
e  ē
g  ģ
i  ī
k  ķ
l  ļ
n  ņ
r  ŗ
s  š
u  ū
z  ž

average number of keystrokes per derived letter: 2.0000
========================================================================
Lithuanian
------------------------------------------------------------------------
#chars    100000
ą            594
č            355
ė           1469
ę            168
į            486
š            853
ų           1146
ū            426
ž            592
#derived    6089
------------------------------------------------------------------------
// Frequency-based replacement scheme.
a  ą
c  č
e  ė  ę
i  į
s  š
u  ų  ū
z  ž

average number of keystrokes per derived letter: 2.0976
------------------------------------------------------------------------
// Regularized replacement scheme.
// Macron and dot in the third column, other derived letters in the
// second column.
a  ą
c  č
e  ę  ė
i  į
s  š
u  ų  ū
z  ž

average number of keystrokes per derived letter: 2.3112
------------------------------------------------------------------------
// Shifted replacement scheme.
// Macron in the third column, other derived letters in the second
// column (but "e" with dot in the "d" row).
a  ą
c  č
d  ė
e  ę
i  į
s  š
u  ų  ū
z  ž

average number of keystrokes per derived letter: 2.0700
------------------------------------------------------------------------
// Regularized replacement scheme (using "e" with dot as a base letter).
// Macron in the third column, other derived letters in the second
// column.
a  ą
c  č
e  ę
ė
i  į
s  š
u  ų  ū
z  ž

average number of keystrokes per derived letter: 2.0922
(per derived letter including ė: 1.8287)
========================================================================
Maltese
------------------------------------------------------------------------
#chars    100000
à             94
ċ            406
è              8
ġ            517
ħ           2138
ì              1
ò              9
ù              2
ż            594
#derived    3769
------------------------------------------------------------------------
// Frequency-based replacement scheme.
// All derived letters in the second column.
a  à
c  ċ
e  è
g  ġ
h  ħ
i  ì
o  ò
u  ù
z  ż

average number of keystrokes per derived letter: 2.0000
========================================================================
Norwegian
------------------------------------------------------------------------
#chars    100000
å           1251
æ            145
ä              3
à              0
é              9
ê              0
ø            697
ö             15
ô              2
ò              1
ó              0
ü              2
#derived    2125
------------------------------------------------------------------------
// Frequency-based replacement scheme.
// "ae" in the third column, other derived letters in the second
// column (except for some very rare ones).
a  å  æ  [ä]  à
e  é  ê
o  ø  [ö]  ô  ò  ó
u  [ü]

average number of keystrokes per derived letter: 2.0814
========================================================================
Polish
------------------------------------------------------------------------
#chars    100000
ą            795
ć            368
ę            880
ł           1416
ń            156
ó            705
ś            516
ż            682
ź             50
ž              1
#derived    5569
------------------------------------------------------------------------
// Frequency-based replacement scheme.
// "z" with acute in the third column, other derived letters in the
// second column.
a  ą
c  ć
e  ę
l  ł
n  ń
o  ó
s  ś
z  ż  ź  [ž]

average number of keystrokes per derived letter: 2.0093
------------------------------------------------------------------------
// Regularized replacement scheme.
// Dot in the third column, other derived letters in the second
// column.
a  ą
c  ć
e  ę
l  ł
n  ń
o  ó
s  ś
z  ź  ż  [ž]

average number of keystrokes per derived letter: 2.1228
========================================================================
Portuguese
------------------------------------------------------------------------
#chars    100000
ã            665
á            327
à            108
â             38
ç            505
é            366
ê            139
è              1
í            230
ó            193
õ            139
ô             16
ú            102
ü              1
#derived    2830
------------------------------------------------------------------------
// Frequency-based replacement scheme.
a  ã  á  à  â
c  ç
e  é  ê  [è]
i  í
o  ó  õ  ô
u  ú  ü

average number of keystrokes per derived letter: 2.3428
------------------------------------------------------------------------
// Shifted replacement scheme.
// Circumflex and diaeresis in the third column, grave in the fourth
// column, other derived letters in the second column (but "a" and "o"
// with tilde in the "s" and "p" row, respectively).
a  á  â  à
s  ã
c  ç
e  é  ê  [è]
i  í
o  ó  ô
p  õ
u  ú  ü

average number of keystrokes per derived letter: 2.1456
------------------------------------------------------------------------
// Regularized replacement scheme (using "a" and "o" with tilde as base
// letters).
// Circumflex and diaeresis in the third column, grave in the fourth
// column, other derived letters in the second column.
a  á  â  à
ã
c  ç
e  é  ê  [è]
i  í
o  ó  ô
õ
u  ú  ü

average number of keystrokes per derived letter: 2.2034
(per derived letter including ã and õ: 1.8615)
========================================================================
Romanian
------------------------------------------------------------------------
#chars    100000
ă           2260
â            465
é              1
î            818
ș            960
ț           1077
#derived    5581
------------------------------------------------------------------------
// Frequency-based replacement scheme.
// "a" with circumflex in the third column, other derived letters in
// the second column.
a  ă  â
e  [é]
i  î
s  ș
t  ț

average number of keystrokes per derived letter: 2.0833
------------------------------------------------------------------------
// Regularized replacement scheme.
a  â  ă
e  [é]
i  î
s  ș
t  ț

average number of keystrokes per derived letter: 2.4049
========================================================================
Slovak
------------------------------------------------------------------------
#chars    100000
á           1688
ä             67
č            883
ď            120
é            596
í            935
ľ            318
ĺ             13
ň            113
ô            139
ó             92
ö              8
ŕ              4
š            761
ť            434
ú            747
ý            853
ž            605
ź              1
#derived    8377
------------------------------------------------------------------------
// Frequency-based replacement scheme.
a  á  ä
c  č
d  ď
e  é
i  í
l  ľ  ĺ
n  ň
o  ô  ó  [ö]
r  ŕ
s  š
t  ť
u  ú
y  ý
z  ž  [ź]

average number of keystrokes per derived letter: 2.0226
------------------------------------------------------------------------
// Regularized replacement scheme.
// Diaeresis, circumflex and "l" with acute in the third column, other
// derived letters in the second column.
a  á  ä
c  č
d  ď
e  é
i  í
l  ľ  ĺ
n  ň
o  ó  ô  [ö]
r  ŕ
s  š
t  ť
u  ú
y  ý
z  ž  [ź]

average number of keystrokes per derived letter: 2.0282
========================================================================
Slovene
------------------------------------------------------------------------
#chars    100000
č           1150
š            788
ž            533
#derived    2471
------------------------------------------------------------------------
// Frequency-based replacement scheme.
// All derived letters in the second column.
c  č
s  š
z  ž

average number of keystrokes per derived letter: 2.0000
========================================================================
Spanish
------------------------------------------------------------------------
#chars    100000
á            334
é            174
í            414
ñ            166
ó            766
ö              1
ú            105
ü              0
#derived    1960
------------------------------------------------------------------------
// Frequency-based replacement scheme.
// Diaeresis in the third column, other derived letters in the second
// column.
a  á
e  é
i  í
n  ñ
o  ó  [ö]
u  ú  ü

average number of keystrokes per derived letter: 2.0005
========================================================================
Swedish
------------------------------------------------------------------------
#chars    100000
ä           1538
å           1330
à              2
é             29
ö           1268
ü              5
#derived    4172
------------------------------------------------------------------------
// Frequency-based replacement scheme.
// "a" with ring in the third column, other derived letters in the
// second column.
a  ä  å  [à]
e  é
o  ö
u  [ü]

average number of keystrokes per derived letter: 2.3198
------------------------------------------------------------------------
// Shifted replacement scheme.
// All derived letters in the second column (but "a" with ring in the
// "s" row).
a  ä  [à]
s  å
e  é
o  ö
u  [ü]

average number of keystrokes per derived letter: 2.0005
------------------------------------------------------------------------
// Regularized replacement scheme (using "a" with ring as a base
// letter).
// All derived letters in the second column.
a  ä  à
å
e  é
o  ö
u  ü

average number of keystrokes per derived letter: 2.0007
(per derived letter including å: 1.6817)
========================================================================
Turkish
------------------------------------------------------------------------
#chars    100000
â              0
ç            817
ğ            872
ı           4058
İ            199
ö            677
ş           1181
ü           1529
û              1
#derived    9334
------------------------------------------------------------------------
// Frequency-based replacement scheme.
// Circumflex in the third column, other derived letters in the second
// column.
a  {ä}  [â]
c  ç
g  ğ
i  ı
I  İ
o  ö
s  ş
u  ü  [û]

average number of keystrokes per derived letter: 2.0001
========================================================================
West Frisian
------------------------------------------------------------------------
#chars    100000
â            213
ä              1
ê            239
ë              9
é              2
è              1
ï              2
í              1
ô             71
ö              2
û            327
ú            221
ü              0
#derived    1089
------------------------------------------------------------------------
// Frequency-based replacement scheme.
a  â  ä
e  ê  ë  é  [è]
i  ï  [í]
o  ô  ö
u  û  ú  ü

average number of keystrokes per derived letter: 2.2213
------------------------------------------------------------------------
// Regularized replacement scheme.
// Circumflex in the second column, acute in the third column,
// diaeresis in the fourth column.
a  â  {á}  ä
e  ê  é  ë  [è]
i  {î}  [í]  ï
o  ô  {ó}  ö
u  û  ú  ü

average number of keystrokes per derived letter: 2.2342
========================================================================
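
The averages listed in this file appear consistent with a simple cost
model: a letter in column n of a scheme row (the base letter being
column 1) costs n keystrokes, i.e. the base letter plus n - 1 presses
of the dedicated key. The Python sketch below recomputes two of the
Swedish figures under that assumption. The function name
average_keystrokes and the list-of-rows encoding are hypothetical, and
the [ ] marks of the schemes are dropped in the encoding.

```python
def average_keystrokes(rows, frequencies):
    """Average keystrokes per letter listed in `frequencies`.

    Assumption: the letter in column n of a row (base letter = column 1)
    costs n keystrokes, i.e. the base letter plus n - 1 presses of the
    dedicated key.
    """
    cost = {}
    for row in rows:
        for column, letter in enumerate(row, start=1):
            cost[letter] = column
    total_letters = sum(frequencies.values())
    total_keys = sum(cost[letter] * count for letter, count in frequencies.items())
    return total_keys / total_letters


# Swedish frequency table from this file (derived letters only).
SWEDISH = {"ä": 1538, "å": 1330, "à": 2, "é": 29, "ö": 1268, "ü": 5}

# Swedish frequency-based and shifted schemes, one list per row.
FREQUENCY_BASED = [["a", "ä", "å", "à"], ["e", "é"], ["o", "ö"], ["u", "ü"]]
SHIFTED = [["a", "ä", "à"], ["s", "å"], ["e", "é"], ["o", "ö"], ["u", "ü"]]

print(f"{average_keystrokes(FREQUENCY_BASED, SWEDISH):.4f}")  # 2.3198
print(f"{average_keystrokes(SHIFTED, SWEDISH):.4f}")          # 2.0005
```

For a scheme that turns a derived letter into a base letter, the same
function yields the "including" figure when that letter is given a row
of its own (cost 1), and the "excluding" figure when it is removed from
the frequency table before the call.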