KMKB0028: HOWTO: Map an ANSI keyboard to the same Unicode code-points

In some situations, a keyboard developer may wish to retain a legacy encoding for a keyboard but, for compatibility reasons, update the keyboard to use Unicode characters internally. The following table shows how to map the ANSI (Windows Western European, cp1252) characters in the range x80 to x9F to Unicode characters, without changing the legacy font mapping. All other cp1252 characters map to the Unicode code point with the same numeric value (for example, xE9 maps to U+00E9).

Important Note: This does not translate an ANSI keyboard into a true Unicode keyboard. However, it does provide a pathway for transition where an application does not support ANSI input correctly. One application that has had reported issues with ANSI input in this way is Paratext 6.

This table is based on data from http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT

ANSI Value   Unicode Value   Unicode Character Name
x80  d128    U+20AC          EURO_SIGN
x81  d129    *
x82  d130    U+201A          SINGLE_LOW-9_QUOTATION_MARK
x83  d131    U+0192          LATIN_SMALL_LETTER_F_WITH_HOOK
x84  d132    U+201E          DOUBLE_LOW-9_QUOTATION_MARK
x85  d133    U+2026          HORIZONTAL_ELLIPSIS
x86  d134    U+2020          DAGGER
x87  d135    U+2021          DOUBLE_DAGGER
x88  d136    U+02C6          MODIFIER_LETTER_CIRCUMFLEX_ACCENT
x89  d137    U+2030          PER_MILLE_SIGN
x8A  d138    U+0160          LATIN_CAPITAL_LETTER_S_WITH_CARON
x8B  d139    U+2039          SINGLE_LEFT-POINTING_ANGLE_QUOTATION_MARK
x8C  d140    U+0152          LATIN_CAPITAL_LIGATURE_OE
x8D  d141    *
x8E  d142    U+017D          LATIN_CAPITAL_LETTER_Z_WITH_CARON
x8F  d143    *
x90  d144    *
x91  d145    U+2018          LEFT_SINGLE_QUOTATION_MARK
x92  d146    U+2019          RIGHT_SINGLE_QUOTATION_MARK
x93  d147    U+201C          LEFT_DOUBLE_QUOTATION_MARK
x94  d148    U+201D          RIGHT_DOUBLE_QUOTATION_MARK
x95  d149    U+2022          BULLET
x96  d150    U+2013          EN_DASH
x97  d151    U+2014          EM_DASH
x98  d152    U+02DC          SMALL_TILDE
x99  d153    U+2122          TRADE_MARK_SIGN
x9A  d154    U+0161          LATIN_SMALL_LETTER_S_WITH_CARON
x9B  d155    U+203A          SINGLE_RIGHT-POINTING_ANGLE_QUOTATION_MARK
x9C  d156    U+0153          LATIN_SMALL_LIGATURE_OE
x9D  d157    *
x9E  d158    U+017E          LATIN_SMALL_LETTER_Z_WITH_CARON
x9F  d159    U+0178          LATIN_CAPITAL_LETTER_Y_WITH_DIAERESIS

* The 5 characters x81, x8D, x8F, x90 and x9D do not have a mapping to Unicode. For compatibility reasons, you should avoid using these characters even with pure ANSI keyboards, as some applications may strip them from your data.
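
The table above can be cross-checked programmatically. The following is a minimal sketch, not part of Keyman, that assumes Python's built-in cp1252 codec agrees with the Unicode mapping file cited above; the function name cp1252_table is chosen here for illustration only. It lists each value from x80 to x9F with its Unicode code point, and prints * where the codec has no mapping.

```python
def cp1252_table():
    """Return (ansi_value, code_point_or_None) pairs for x80..x9F."""
    rows = []
    for value in range(0x80, 0xA0):
        try:
            char = bytes([value]).decode("cp1252")
            rows.append((value, ord(char)))
        except UnicodeDecodeError:
            # Python's cp1252 codec defines no mapping for this byte.
            rows.append((value, None))
    return rows


for value, code_point in cp1252_table():
    target = "*" if code_point is None else f"U+{code_point:04X}"
    print(f"x{value:02X}  d{value}  {target}")
```

The rows printed with * should be the same five values listed in the note above.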

Example

The following rule is in an ANSI keyboard with a legacy font.

x9A + 'm' > x9B    c 'aa' + 'm' -> 'am'

That rule would be converted, according to the table above, as follows:

U+0161 + 'm' > U+203A    c 'aa' + 'm' -> 'am'
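
For a keyboard with many rules, the substitution could be scripted. The sketch below is an illustration under stated assumptions, not a Keyman tool: the helper name convert_rule is hypothetical, it relies on Python's cp1252 codec for the mapping, and it does not parse Keyman syntax. It simply rewrites bare xNN tokens in the range x80 to x9F, leaves quoted strings alone, and raises an error for the five unmapped values. Tokens inside comments are not treated specially, so the result should be reviewed by hand.

```python
import re

# Matches a quoted string (kept as-is) or a bare xNN token in x80..x9F.
TOKEN = re.compile(r"'[^']*'|\"[^\"]*\"|\bx([89][0-9A-Fa-f])\b")


def convert_rule(line):
    """Replace bare x80..x9F tokens in one rule line with U+XXXX notation."""

    def replace(match):
        hex_digits = match.group(1)
        if hex_digits is None:
            return match.group(0)  # quoted string: leave untouched
        value = int(hex_digits, 16)
        try:
            char = bytes([value]).decode("cp1252")
        except UnicodeDecodeError:
            # One of the five values with no Unicode mapping.
            raise ValueError(f"x{value:02X} has no Unicode mapping") from None
        return f"U+{ord(char):04X}"

    return TOKEN.sub(replace, line)


print(convert_rule("x9A + 'm' > x9B    c 'aa' + 'm' -> 'am'"))
```

Run on the example rule, this prints the converted rule shown above.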

Applies to:

  • Keyman Developer 5.0
  • Keyman Developer Professional 6.0
  • Keyman Developer Professional 6.2
  • Keyman Developer Standard 6.0
  • Keyman Developer Standard 6.2
  • Keyman Developer Professional 7.0

KB article KMKB0028 created on 03 Jul 2007
