User:HarJIT/sandbox/Cyrillic

From Wikipedia, the free encyclopedia
  • Telegraphy group (not represented in WHATWG)
    • Russian Morse code (uses Russian Cyrillic letters in place of English Roman letters)
    • MTK-2 (5-bit stateful, adapts ITA2 Baudot-Murray for Russian Morse equivalents)
  • 7-bit modified ASCII group (not represented in WHATWG)
    • KOI7-switched (basic Russian coverage without the Yo (or capital Hard Yer), 7-bit stateful, quasi-ASCII order by MTK-2 correspondence in last four rows)
    • Short-KOI (similar to KOI7-switched, not stateful, 7-bit unicase, no Hard Yer)
    • Other encodings with the same idea but other decisions and/or other languages (both Cyrillic YUSCII variants, the three WST Cyrillic variants, …)
  • KOI-8 group (represented in WHATWG by KOI8-R and KOI8-U)
    • KOI-8 (de jure basic KOI-8, like KOI7-switched but using the high bit, not locking shifts; GR region has only letters)
    • KOI8-B (de facto basic KOI-8, extends to full Russian (with Yo) and Bulgarian (with capital Hard Yer) coverage)
    • KOI8-R (add box drawing characters)
      • KOI8-U (sacrifices some box drawing for full Ukrainian coverage, otherwise KOI8-R)
        • KOI8-RU (sacrifices slightly more box drawing for full Belarusian coverage, otherwise KOI8-U)
    • KOI8-E (adds Serbian and Macedonian coverage, removes all box drawing, reduces Ukrainian coverage to Soviet spelling and puts some symbols in its place, otherwise those characters which are in common are in the same locations as in KOI8-RU)
      • KOI8-F (includes all Slavic KOI-8 letter allocations with some punctuation)
    • Various non-Slavic adaptations (KOI8-T, …)
  • Main (Osnovnaja) group (represented in WHATWG by ISO-8859-5 ("Cyrillic (ISO)"))
    • Main code page (designed from scratch, with the Russian alphabet in a more natural order, plus box drawing and symbols; the box drawing is included in rows 8_ through A_, meaning that letters start from row B_; basic Russian alphabet (without Yo) included in two rows for each case, then both cases of Yo at the start of the last row)
    • ISO-IR-153 (removes box drawing and symbols, moves capital Yo so that the A_ row case-mirrors the F_ row; its descent from Main code page is apparently to blame for fitting the non-Russian letters and Yo around the basic Russian ones rather than before them)
    • ISO-8859-5 (minimally, but of necessity entirely incompatibly, adapts KOI8-E to be a superset of ISO-IR-153; also changes the universal currency sign to a section sign)
    • IBM-915 (adds some box drawing to ISO-8859-5)
    • IBM-1124 (adds full Ukrainian coverage to ISO-8859-5 at the expense of strictly correct rendering of Macedonian, although it remains legible)
    • Various non-Slavic adaptations (ISO-IR-200, ISO-IR-201, …)
    • Cyrillic (Unicode block) (bases its layout on ISO-8859-5 but with many additions)
  • Alternative (Aľtjernativnaja) group (represented in WHATWG by IBM-866 ("Cyrillic (DOS)"))
    • Alternative code page (re-arranges Main code page so box drawing is compatible with OEM-US but letter order is preserved; last two rows unchanged)
    • RST 2018-91 (adds full Ukrainian support to Alternative code page over symbols)
      • IBM-848 (Euro sign update of RST 2018-91, replaces universal currency sign)
    • IBM-866 (adds basic Ukrainian and Belarusian letters to Alternative code page over symbols, omitting those non-Soviet or with Latin homoglyphs, changes a few more symbols to match OEM-US)
      • IBM-808 (Euro sign update of IBM-866, replaces universal currency sign)
      • IBM-1131 (adds the rest of the Ukrainian and Belarusian letters to IBM-866)
        • IBM-849 (Euro sign update of IBM-1131, replaces universal currency sign)
    • KOI-8 N2 (does not use the KOI-8 layout despite the name; redesigns the last row of Alternative code page, adding Ukrainian and Belarusian support but moving the Yo)
      • KOI-8 N1 (a subset of KOI-8 N2, removing the non-Russian letters and the mixed single/double-lined box drawing)
    • Various non-Slavic adaptations (FreeDOS extensions, …)
  • DOS-transformed 8859 group (not represented in WHATWG)
    • IBM-855 (somewhat analogous to IBM-850 but for the ISO-8859-5 and KOI8-E repertoires (consistent with IBM-853 being the one for ISO-8859-3); unlike all of the other groups, individual capitals always follow individual minuscules, with either nothing or only non-letters in between (there are no ranges containing multiple continuous Cyrillic letters of a single case); letter order almost follows KOI8-E but places the Hard Yer between the Dzhe and the Yu)
    • IBM-872 (Euro sign update of IBM-855, replaces universal currency sign)
  • Windows group (represented in WHATWG by Windows-1251 ("Cyrillic (Windows)"))
    • Windows-1251 (basic Russian letters (excluding Yo) in natural order in the last four rows, other Slavic letters (including Yo) scattered around)
    • Amiga-1251 (Russian letters (including Yo) from Windows-1251, otherwise mostly following ISO-8859-1; resembles an attempt to create an unofficial Russian adaptation of ECMA 94 (as opposed to ISO-8859-5, which is pretty much unrelated to ECMA 94 despite being an ISO 8859 part))
    • RFC 1345's "ECMA-Cyrillic" (purports to be ISO-IR-111 (that is, KOI8-E); actually merely re-orders the rows of ISO-8859-5 to move the basic Russian rows to the end (and re-instates the universal currency sign), in what at least resembles an unsuccessful attempt to reconstruct ISO-IR-111 from ISO-8859-5 and an incomplete description of the changes made; this does, however, mean that it is compatible with Windows-1251 for basic Russian support (not counting Yo))
  • Macintosh group (represented in WHATWG by Mac OS Cyrillic (Euro))
    • Mac OS Cyrillic (Original) (basic Russian capitals in natural order in first two non-ASCII rows, minuscules in last two, except the minuscule Ya which is displaced up by two rows and immediately preceded by both cases of the Yo; other Slavic letters are scattered amongst preserved MacRoman symbols and punctuation, but several (including the aforementioned Yo) are in pairs of capital followed by minuscule; Ukrainian limited to Soviet orthography)
    • Mac OS Ukrainian (full Ukrainian support, overwriting a couple of symbols)
    • Mac OS Cyrillic (Euro) (based on Ukrainian version, adds euro sign over universal currency sign)
    • Various non-Slavic adaptations (Mac OS Barents Cyrillic, Mac OS Turkic Cyrillic, …)
  • JIS group (represented in WHATWG by Windows-31J, EUC-JP, ISO-2022-JP, GBK, Unified Hangul Code)
    • (Unlike the groups above, and due to differing lead bytes and encoding schemes, these mostly aren't intercompatible (except EUC-JP with EUC-CN) for any Cyrillic subset despite having the same layout.)
    • Row 7 of JIS X 0208 and of GB 2312 (capitals in A_ through C_ (starting at A1), minuscules in homologous D_ through F_; full Russian alphabet (only) in natural order; includes Yo in the main alphabetical ordering unlike the other groups).
    • JIS X 0213 adds various symbols (plus the va/vi/ve/vo katakana) in the unallocated space in the row (ku), but otherwise leaves it unchanged.
    • Shift JIS obviously changes the encoding bytes.
    • Moved to row 12 in Wansung and row 5 in KPS 9566, but with the same layout; obviously this means different lead bytes.
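
Several of the layout claims above can be spot-checked with Python's built-in codecs. The following is a minimal sketch, not a definitive reference: the codec aliases are Python's own names for the encodings, and the helper names (`position`, `layout_table`, `strip_high_bit`) are hypothetical, introduced only for this illustration. It prints where the first and last basic Russian letters and both cases of Yo sit in one encoding per WHATWG-represented single-byte group, shows that dropping the high bit of KOI8-R text leaves a case-inverted Latin transliteration (a consequence of the quasi-ASCII ordering inherited from KOI7), and shows that the JIS group shares one layout under different bytes.

```python
# Illustrative sketch. Codec aliases are Python's built-in names; the helper
# names below are hypothetical and not part of any standard or library.

SINGLE_BYTE = {
    "KOI8-R": "koi8_r",
    "ISO-8859-5": "iso8859_5",
    "IBM-866": "cp866",
    "Windows-1251": "cp1251",
    "Mac OS Cyrillic": "mac_cyrillic",
}


def position(char: str, codec: str) -> str:
    """Return the byte a single-byte codec assigns to char, as two hex digits."""
    encoded = char.encode(codec)
    if len(encoded) != 1:
        raise ValueError(f"{codec} does not encode {char!r} as a single byte")
    return f"{encoded[0]:02X}"


def layout_table(chars: str = "АЯаяЁё") -> dict:
    """Map each group's representative encoding to the positions of chars."""
    return {
        name: [position(c, codec) for c in chars]
        for name, codec in SINGLE_BYTE.items()
    }


def strip_high_bit(text: str, codec: str = "koi8_r") -> str:
    """Encode text, clear bit 7 of every byte, and read the result as ASCII."""
    return bytes(b & 0x7F for b in text.encode(codec)).decode("ascii")


if __name__ == "__main__":
    print(f"{'':16} А  Я  а  я  Ё  ё")
    for name, row in layout_table().items():
        print(f"{name:16}", " ".join(row))

    # KOI8 ordering follows the Latin alphabet, so this stays legible.
    print(strip_high_bit("привет"))

    # Same row layout, different lead bytes and encoding schemes.
    for codec in ("euc_jp", "gb2312", "shift_jis", "euc_kr"):
        print(f"{codec:10}", "А".encode(codec).hex())
```

In the printed table, only KOI8-R places Я before the end of the capitals and а before А, reflecting its non-alphabetical order. The other four keep the basic Russian letters in natural order and differ mainly in which rows they occupy and where Yo is placed.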