ISO/IEC 10367
ISO/IEC 10367:1991 is a standard developed by ISO/IEC JTC 1/SC 2,[1] defining graphical character sets for use in character encodings implementing levels 2 and 3 of ISO/IEC 4873[2] (as opposed to ISO/IEC 8859, which defines character encodings at level 1 of ISO/IEC 4873).
Relationship to ISO/IEC 8859
The parts of ISO/IEC 8859 define complete encodings at level 1 of ISO/IEC 4873 (i.e. as stateless, single-byte extended ASCII encodings that reserve the C1 area) and do not provide for using several parts together. For use at levels 2 and 3 of ISO/IEC 4873 (i.e. with shift codes for additional graphical character sets), ISO/IEC 8859 stipulates that the equivalent sets from ISO/IEC 10367 be used instead.[3]
ISO/IEC 10367:1991 includes ASCII; sets matching the right-hand (non-ASCII) halves that ISO/IEC 6937 (ITU T.51) and ISO/IEC 8859 parts 1 through 9 use as their G1 sets (parts 1 through 9 being those that existed when ISO/IEC 10367 was published in 1991); a set of additional Latin characters supplementing some of those parts; and a set of box-drawing characters. The last two are described below.[2][4]
Supplementary G3 Latin set
ISO/IEC 10367 includes the ISO-IR-154 graphical set, which is intended to supplement Latin alphabets No. 1, 2 and 5 (i.e. ISO-8859-1, ISO-8859-2 and ISO-8859-9).[4] Specifically, it is intended for use as the G3 set in a profile of ISO/IEC 4873 in which the G1 and G2 sets are the right-hand side of ISO-8859-2 and the right-hand side of either ISO-8859-1 or ISO-8859-9.[5] These configurations allow the entire ISO/IEC 6937 repertoire (ITU T.51 Annex A) to be represented without the use of non-spacing codes.[6]
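For instance, if this set is included, the letter Ĉ would be encoded under ISO/IEC 4873 level 2 as 0x8F 0x23: the single-shift control SS3 (0x8F), which invokes the G3 set for one character, followed by the code of Ĉ in that set.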
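In ISO/IEC 6937 itself, by contrast, Ĉ is written with a non-spacing code: a circumflex diacritic byte followed by the base letter C. The minimal Python sketch below shows that two-byte form. It assumes the T.51 diacritic positions 0xC3 (circumflex), 0xC5 (macron) and 0xC7 (dot above), covers only those three, and `encode_6937_letter` is a hypothetical name.

```python
import unicodedata

# Assumed ISO/IEC 6937 (ITU T.51) non-spacing diacritic bytes; deliberately partial.
NON_SPACING = {
    "\u0302": 0xC3,  # combining circumflex accent
    "\u0304": 0xC5,  # combining macron
    "\u0307": 0xC7,  # combining dot above
}


def encode_6937_letter(ch: str) -> bytes:
    """Encode one accented letter as <non-spacing diacritic byte><ASCII base letter>."""
    decomposed = unicodedata.normalize("NFD", ch)
    base, mark = decomposed[0], decomposed[1:]
    if not base.isascii() or mark not in NON_SPACING:
        raise ValueError(f"{ch!r} is not an ASCII letter plus one supported diacritic")
    # The diacritic byte comes first; the base letter follows it.
    return bytes([NON_SPACING[mark], ord(base)])


print(encode_6937_letter("Ĉ"))  # b'\xc3C'
```

For Ĉ this yields 0xC3 0x43, a non-spacing circumflex followed by an ASCII C, where the ISO/IEC 10367 profile instead reaches a single precomposed G3 character through SS3.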
Of the characters in this set, ð, Þ, Ý, þ and ý also appear in ISO-8859-1, while Ğ, ğ, İ and ı also appear in ISO-8859-9. Under the current edition of ISO/IEC 4873 / ECMA-43 (although not earlier editions),[7] each character must be taken from the lowest-numbered working set that contains it; hence these characters are not taken from this G3 set when the corresponding ISO-8859 right-hand side is used as the G1 or G2 set.[8]
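The lowest-numbered-set rule can be sketched as an encoder that tries G0, G1, G2 and G3 in that order. This Python is an illustration under stated assumptions, not a normative encoder: it assumes ASCII as G0, the ISO-8859-1 right-hand side as G1 (in GR) and the ISO-8859-2 right-hand side as G2 (via SS2, 0x8E). By analogy with the Ĉ example, it also assumes that SS2 is followed by the GL form of the G2 code. `G3_EXCERPT` and `encode_level2` are hypothetical names, and the excerpt holds only Ĉ (0x23, from the example) and þ (row 7_, column _7 of the table below).

```python
SS2, SS3 = 0x8E, 0x8F  # C1 single-shift controls

# Hypothetical excerpt of the ISO-IR-154 G3 set (GL form of each code).
G3_EXCERPT = {
    "\u0108": 0x23,  # Ĉ, from the 0x8F 0x23 example
    "\u00FE": 0x77,  # þ, row 7_, column _7; also present in ISO-8859-1
}


def encode_level2(text: str) -> bytes:
    """Take each character from the lowest-numbered working set that contains it.

    Assumed profile: G0 = ASCII, G1 = ISO-8859-1 right-hand side (GR),
    G2 = ISO-8859-2 right-hand side (SS2), G3 = ISO-IR-154 excerpt (SS3).
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:                      # G0: ASCII in GL
            out.append(cp)
            continue
        if 0xA0 <= cp <= 0xFF:             # G1: Latin-1 bytes equal the code points
            out.append(cp)
            continue
        try:
            latin2 = ch.encode("iso8859_2")[0]
        except UnicodeEncodeError:
            latin2 = None
        if latin2 is not None and latin2 >= 0xA0:   # G2, assumed GL form after SS2
            out += bytes([SS2, latin2 & 0x7F])
        elif ch in G3_EXCERPT:                      # G3 only if no lower set has it
            out += bytes([SS3, G3_EXCERPT[ch]])
        else:
            raise ValueError(f"{ch!r} is not in any working set of this sketch")
    return bytes(out)


print(encode_level2("þĈ"))  # b'\xfe\x8f#'
```

Although þ is also in the G3 excerpt, the sketch emits it as the single GR byte 0xFE from G1, as the rule requires. Only Ĉ, which is in neither Latin-1 nor Latin-2, falls through to SS3.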
| Row | Characters in code-position order, with Unicode code points (unassigned positions omitted) |
|---|---|
| 2_/A_ | Ā 0100, Ĉ 0108, Ċ 010A, Ė 0116, Ē 0112, Ĝ 011C, ‘ 2018, “ 201C, ™ 2122, ← 2190, ↑ 2191, → 2192, ↓ 2193 |
| 3_/B_ | ā 0101, ĉ 0109, ċ 010B, ð 00F0, ė 0117, ē 0113, ĝ 011D, ’ 2019, ” 201D, ♪ 266A, ⅛ 215B, ⅜ 215C, ⅝ 215D, ⅞ 215E |
| 4_/C_ | Ğ 011E, Ġ 0120, Ģ 0122, Ĥ 0124, Ħ 0126, Ĩ 0128, İ 0130, Ī 012A, Į 012E, Ĳ 0132, Ĵ 0134, Ķ 0136, Ļ 013B, Ŀ 013F, Ņ 0145 |
| 5_/D_ | — 2014, Ŋ 014A, Ō 014C, Œ 0152, Ŗ 0156, Ŝ 015C, Ŧ 0166, Þ 00DE, Ũ 0168, Ŭ 016C, Ū 016A, Ų 0172, Ŵ 0174, Ý 00DD, Ŷ 0176, Ÿ 0178 |
| 6_/E_ | Ω 2126, ğ 011F, ġ 0121, ģ 0123, ĥ 0125, ħ 0127, ĩ 0129, ı 0131, ī 012B, į 012F, ĳ 0133, ĵ 0135, ķ 0137, ļ 013C, ŀ 0140, ņ 0146 |
| 7_/F_ | ĸ 0138, ŋ 014B, ō 014D, œ 0153, ŗ 0157, ŝ 015D, ŧ 0167, þ 00FE, ũ 0169, ŭ 016D, ū 016B, ų 0173, ŵ 0175, ý 00FD, ŷ 0177, ŉ 0149 |
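Because SS3 affects only the next byte, decoding the level 2 form needs just a lookup from G3 code to character. The Python sketch below builds that lookup from rows 5_/D_ to 7_/F_ of the table above, where all sixteen columns are filled so each code follows from its column, plus the Ĉ entry from the 0x8F 0x23 example. Rows 2_ to 4_ are left out because the table does not show which of their columns are empty. The sketch handles only ASCII and SS3 sequences, and `decode_level2` is a hypothetical name.

```python
SS3 = 0x8F  # C1 control SINGLE SHIFT THREE

# Code points for rows 5_, 6_ and 7_ of the ISO-IR-154 table above.
# Each of these rows has all 16 columns filled, so column n holds code row + n.
G3_ROWS = {
    0x50: [0x2014, 0x014A, 0x014C, 0x0152, 0x0156, 0x015C, 0x0166, 0x00DE,
           0x0168, 0x016C, 0x016A, 0x0172, 0x0174, 0x00DD, 0x0176, 0x0178],
    0x60: [0x2126, 0x011F, 0x0121, 0x0123, 0x0125, 0x0127, 0x0129, 0x0131,
           0x012B, 0x012F, 0x0133, 0x0135, 0x0137, 0x013C, 0x0140, 0x0146],
    0x70: [0x0138, 0x014B, 0x014D, 0x0153, 0x0157, 0x015D, 0x0167, 0x00FE,
           0x0169, 0x016D, 0x016B, 0x0173, 0x0175, 0x00FD, 0x0177, 0x0149],
}
G3 = {row + col: chr(cp) for row, cps in G3_ROWS.items() for col, cp in enumerate(cps)}
G3[0x23] = "\u0108"  # Ĉ, from the 0x8F 0x23 example


def decode_level2(data: bytes) -> str:
    """Decode ASCII (G0) and SS3 + one GL byte (G3); reject everything else."""
    chars = []
    i = 0
    while i < len(data):
        byte = data[i]
        if byte == SS3:
            if i + 1 >= len(data) or not 0x20 <= data[i + 1] <= 0x7F:
                raise ValueError(f"SS3 at offset {i} is not followed by a GL byte")
            code = data[i + 1]
            if code not in G3:
                raise ValueError(f"G3 code 0x{code:02X} is not in this partial table")
            chars.append(G3[code])
            i += 2  # SS3 applies to exactly one following character
        elif byte < 0x80:
            chars.append(chr(byte))  # G0: ASCII
            i += 1
        else:
            raise ValueError(f"byte 0x{byte:02X} is not handled by this sketch")
    return "".join(chars)


print(decode_level2(b"\x8f\x23\x8f\x52"))  # ĈŌ
```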
Box drawing set
The box-drawing set from ISO/IEC 10367, shown in the table below, is registered for ISO/IEC 2022 use as ISO-IR-155. Although it does not use the 0x20/A0 or 0x7F/FF positions, it is registered as a 96-character set.[9]
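Registration as a 96-character set determines how the set is designated under ISO/IEC 2022. 96-character sets use the intermediate bytes 0x2D, 0x2E or 0x2F (for G1, G2 or G3) rather than the 94-set bytes 0x28 to 0x2B (for G0 to G3), and cannot be designated as G0. The helper below builds such an escape sequence. Its name is hypothetical, and because the final byte assigned to ISO-IR-155 is not given here, the usage example uses 0x41, the final byte registered for the ISO-8859-1 right-hand side (ISO-IR-100).

```python
ESC = 0x1B

# ISO/IEC 2022 intermediate bytes for designating single-byte graphic sets.
INTERMEDIATE_94 = {0: 0x28, 1: 0x29, 2: 0x2A, 3: 0x2B}  # ( ) * +
INTERMEDIATE_96 = {1: 0x2D, 2: 0x2E, 3: 0x2F}           # - . /  (no G0 form)


def designate(set_size: int, g: int, final_byte: int) -> bytes:
    """Return the ESC sequence designating a 94- or 96-character set as G<g>."""
    table = {94: INTERMEDIATE_94, 96: INTERMEDIATE_96}.get(set_size)
    if table is None:
        raise ValueError("set_size must be 94 or 96")
    if g not in table:
        raise ValueError(f"a {set_size}-character set cannot be designated as G{g}")
    if not 0x30 <= final_byte <= 0x7E:
        raise ValueError("final byte must be in the range 0x30-0x7E")
    return bytes([ESC, table[g], final_byte])


print(designate(96, 1, 0x41))  # b'\x1b-A'
```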
The libintl-perl library includes an "ISO_10367-BOX" codec, which maps ASCII to GL and the ISO-IR-155 box-drawing set to GR, with a few deviations: it uses double-lined box-drawing characters in place of the heavy-lined ones, and it replaces the upper half block (▀) at 0xCB with the private-use character U+E019, documented as "Unit space B".[10]
| | _0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2_/A_ | | | | | | | | | | | | | | | | |
| 3_/B_ | | | | | | | | | | | | | | | | |
| 4_/C_ | ┃ 2503 | ━ 2501 | ┏ 250F | ┓ 2513 | ┗ 2517 | ┛ 251B | ┣ 2523 | ┫ 252B | ┳ 2533 | ┻ 253B | ╋ 254B | ▀ 2580 | ▄ 2584 | █ 2588 | ▪ 25AA | |
| 5_/D_ | │ 2502 | ─ 2500 | ┌ 250C | ┐ 2510 | └ 2514 | ┘ 2518 | ├ 251C | ┤ 2524 | ┬ 252C | ┴ 2534 | ┼ 253C | ░ 2591 | ▒ 2592 | ▓ 2593 | | |
| 6_/E_ | | | | | | | | | | | | | | | | |
| 7_/F_ | | | | | | | | | | | | | | | | |
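The difference between the registered set and the libintl-perl variant can be sketched as two GR decoding tables. This Python sketch is not the libintl-perl code. It reads positions from the table above and assumes that row 5_/D_ starts at column _0, like row 4_/C_. It also assumes that each heavy-lined character is replaced by the double-lined character of the same shape (for example ┃ by ║); only the 0xCB substitution is spelled out in the text above. `gr_table` and `decode_box` are hypothetical names.

```python
# GR rows of ISO-IR-155, following the table above (rows 4_/C_ and 5_/D_).
ROW_C = "┃━┏┓┗┛┣┫┳┻╋▀▄█▪"  # 0xC0-0xCE
ROW_D = "│─┌┐└┘├┤┬┴┼░▒▓"   # 0xD0-0xDD (assumed to start at column _0)

# Assumed heavy-to-double pairing for the libintl-perl-style variant.
HEAVY_TO_DOUBLE = str.maketrans("┃━┏┓┗┛┣┫┳┻╋", "║═╔╗╚╝╠╣╦╩╬")


def gr_table(perl_variant: bool = False) -> dict:
    """Map GR bytes to characters for the registered set or the variant."""
    table = {0xC0 + i: ch for i, ch in enumerate(ROW_C)}
    table.update({0xD0 + i: ch for i, ch in enumerate(ROW_D)})
    if perl_variant:
        table = {byte: ch.translate(HEAVY_TO_DOUBLE) for byte, ch in table.items()}
        table[0xCB] = "\uE019"  # ▀ replaced by the private-use "Unit space B"
    return table


def decode_box(data: bytes, perl_variant: bool = False) -> str:
    """Decode ASCII in GL and the box-drawing set in GR; unmapped bytes raise KeyError."""
    table = gr_table(perl_variant)
    return "".join(chr(b) if b < 0x80 else table[b] for b in data)


print(decode_box(b"\xc2\xc1\xc3"))                     # ┏━┓
print(decode_box(b"\xc2\xc1\xc3", perl_variant=True))  # ╔═╗
```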
References
- ISO/IEC JTC 1/SC 2 (1991). "Information technology — Standardized coded graphic character sets for use in 8-bit codes". ISO. ISO/IEC 10367:1991.
- van Wingen, Johan W (1999). "8. Code Extension, ISO 2022 and 2375, ISO 4873 and 10367". Character sets. Letters, tokens and codes. Terena.
- ISO/IEC JTC 1/SC 2 (1998-02-12). Final Text of DIS 8859-10, Information Technology — 8-bit single-byte coded graphic character sets — Part 10: Latin alphabet No. 6 (PDF). ISO/IEC FDIS 8859-10:1998, JTC1/SC2 N2992, WG3 N415.
- "8-Bit Character Sets - ISO/IEC 10367". Guide to the use of Character Sets in Europe. DKUUG.
- ECMA (1990-03-01). "Supplementary Set for Latin Alphabets 1, 2 and 5" (PDF). ITSCJ/IPSJ. ISO-IR-154.
- ISO/IEC JTC 1/SC 2/WG 3 (1998-04-15). "Annex E: Alternative coded representation of the repertoire with no non-spacing diacritical marks". WD 6937, Coded graphic character set for text communication - Latin alphabet (PDF). p. 37. JTC1/SC2/N454.
- ECMA (1991). "Main differences between the second edition (1985) and the present (third) edition of this ECMA Standard". ECMA-43: 8-Bit Coded Character Set Structure and Rules (PDF) (ECMA Standard) (3rd ed.). p. 23.
- ECMA (1991). "Unique coding of characters". ECMA-43: 8-Bit Coded Character Set Structure and Rules (PDF) (ECMA Standard) (3rd ed.). p. 10.
- ISO/IEC JTC 1/SC 2/WG 3 (1990-04-16). "Basic Box-Drawings Set" (PDF). ITSCJ/IPSJ. ISO-IR-155.
- Flohr, Guido. "Conversion routines for ISO_10367_BOX". libintl-perl. Locale::RecodeData::ISO_10367_BOX.