ISO/IEC 8859-8

ISO/IEC 8859-8, Information technology — 8-bit single-byte coded graphic character sets — Part 8: Latin/Hebrew alphabet, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings. Its second and current edition, ISO/IEC 8859-8:1999, superseded the first edition, ISO/IEC 8859-8:1988. It is informally referred to as Latin/Hebrew. ISO/IEC 8859-8 covers all the Hebrew letters but none of the Hebrew vowel signs. IBM assigned code page 916 (CCSIDs 916 and 5012) to it.[2][3][4] The character set was also adopted, with some extensions, by Israeli Standard SI1311:2002.
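As a rough illustration of that coverage, the sketch below uses Python's built-in `iso8859_8` codec, which is assumed here to follow the ISO/IEC 8859-8:1999 table. The helper names are hypothetical. Each Hebrew letter encodes to a single byte, but a vowel sign such as qamats (U+05B8) has no byte at all.

```python
from typing import Optional

def encode_hebrew(text: str) -> bytes:
    """Encode text with Python's built-in ISO/IEC 8859-8 codec."""
    return text.encode("iso8859_8")

def first_unencodable(text: str) -> Optional[str]:
    """Return the first character the codec cannot represent, or None."""
    try:
        text.encode("iso8859_8")
    except UnicodeEncodeError as err:
        return err.object[err.start]
    return None

shalom = "\u05e9\u05dc\u05d5\u05dd"               # shin, lamed, vav, final mem
pointed = "\u05e9\u05b8\u05dc\u05d5\u05b9\u05dd"  # same letters plus qamats and holam

print(encode_hebrew(shalom).hex(" "))        # f9 ec e5 ed
print(hex(ord(first_unencodable(pointed))))  # 0x5b8
```

Pointed (vocalized) Hebrew therefore has to be stripped of its vowel signs, or stored in an encoding that includes them, such as Windows-1255 or a Unicode encoding.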

ISO-8859-8: Latin/Hebrew
MIME / IANA: ISO-8859-8
Alias(es): iso-ir-138, hebrew, csISOLatinHebrew[1]
Language(s): Hebrew, English
Standard: ISO/IEC 8859-8, ECMA-121, SI 1311
Classification: extended ASCII, ISO 8859
Based on: DEC Hebrew (8-bit), ISO/IEC 8859-1
Other related encoding(s): Windows-1255

ISO-8859-8 is the IANA preferred charset name for this standard when supplemented with the C0 and C1 control codes from ISO/IEC 6429. Text in this encoding is usually stored in logical (reading) order, so displaying it requires bidirectional (bidi) processing. Nominally, ISO-8859-8 (code page 28598) denotes visual order and ISO-8859-8-I (code page 38598) denotes logical order. In practice, however, ISO-8859-8 usually also stands for logical-order text, and XML documents require that interpretation. The WHATWG Encoding Standard used by HTML5 treats ISO-8859-8 and ISO-8859-8-I as distinct encodings with the same byte-to-character mapping, because the ISO-8859-8 label influences layout direction. It notes that although this may historically have been true of ISO-8859-6 (Arabic) as well, it now applies only to ISO-8859-8.[5]
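The difference between logical and visual order can be sketched for the simplest case: a line consisting only of Hebrew letters and interior spaces, which forms a single right-to-left run. In that case visual order (leftmost displayed character first) is just the reverse of logical order. The function `logical_to_visual_single_run` is a hypothetical helper and is not part of any standard library. Text that mixes directions needs the full Unicode Bidirectional Algorithm (UAX #9) instead.

```python
import unicodedata

def logical_to_visual_single_run(text: str) -> str:
    """Hypothetical helper: visual order for a line that is one RTL run.

    Accepts only characters of bidi class R (e.g. Hebrew letters) and
    interior whitespace; mixed-direction text needs the full UAX #9 algorithm.
    """
    if text != text.strip():
        raise ValueError("leading/trailing whitespace is not handled")
    for ch in text:
        if unicodedata.bidirectional(ch) not in ("R", "WS"):
            raise ValueError(f"not a single right-to-left run: {ch!r}")
    # A pure RTL line is displayed right to left, so the leftmost displayed
    # character is the last one in logical (reading) order.
    return text[::-1]

logical = "\u05e9\u05dc\u05d5\u05dd"  # shin first, as read
visual = logical_to_visual_single_run(logical)

print(logical.encode("iso8859_8").hex(" "))  # f9 ec e5 ed
print(visual.encode("iso8859_8").hex(" "))   # ed e5 ec f9
```

The encoder never reorders anything. Whether the bytes are in logical or visual order is purely a convention between writer and reader, which is why the charset label has to carry that information.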

A further variant, ISO-8859-8-E, nominally requires directionality to be specified explicitly with special control characters. It is unused in practice.

The Microsoft Windows code page for Hebrew, Windows-1255, is mostly an extension of ISO/IEC 8859-8 without the C1 controls. There are two exceptions: it omits the double low line (‗), and it replaces the generic currency sign (¤) with the sheqel sign (₪). It also adds vowel points as combining characters, along with some additional punctuation.
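One way to check these differences is to compare Python's `iso8859_8` and `cp1255` codecs byte by byte over A0–FF. This assumes both codecs follow the published mapping tables. The helper names are hypothetical.

```python
from typing import Dict, Optional, Tuple

def decode_byte(b: int, codec: str) -> Optional[str]:
    """Decode a single byte, or return None if the codec leaves it unassigned."""
    try:
        return bytes([b]).decode(codec)
    except UnicodeDecodeError:
        return None

def upper_half_differences() -> Dict[int, Tuple[Optional[str], Optional[str]]]:
    """Bytes A0-FF on which iso8859_8 and cp1255 disagree: {byte: (iso, win)}."""
    diffs: Dict[int, Tuple[Optional[str], Optional[str]]] = {}
    for b in range(0xA0, 0x100):
        iso = decode_byte(b, "iso8859_8")
        win = decode_byte(b, "cp1255")
        if iso != win:
            diffs[b] = (iso, win)
    return diffs

diffs = upper_half_differences()
# Bytes assigned in ISO-8859-8 that Windows-1255 changes or drops.
changed = [b for b, (iso, _win) in diffs.items() if iso is not None]
print([hex(b) for b in changed])
print(diffs[0xA4])  # ISO-8859-8 currency sign vs Windows-1255 sheqel sign
```

Under that assumption, `changed` should list only A4 and DF. The remaining differences are bytes that ISO-8859-8 leaves unassigned and Windows-1255 fills, for example C0, which Windows-1255 uses for the vowel point sheva (U+05B0).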

Unicode has since become the preferred choice, at least on the Internet,[6] in practice meaning UTF-8, the dominant encoding for web pages. As of January 2019, ISO-8859-8 was used by less than 0.1% of websites.[7]
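Moving legacy data to UTF-8 is a decode/encode round trip. The sketch below assumes the input is logical-order ISO-8859-8, which is the usual case. Visual-order data would also need reordering, and this sketch does not do that. The function name is hypothetical.

```python
def iso8859_8_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-8 bytes as UTF-8.

    Decoding is strict, so unassigned bytes (e.g. C0-DE) raise
    UnicodeDecodeError instead of being silently replaced. Character order
    is kept as-is: logical-order input stays logical-order output.
    """
    return data.decode("iso8859_8").encode("utf-8")

legacy = b"\xf9\xec\xe5\xed"  # shin, lamed, vav, final mem in ISO-8859-8
utf8 = iso8859_8_to_utf8(legacy)
print(len(legacy), len(utf8))  # 4 8
print(utf8.hex(" "))           # d7 a9 d7 9c d7 95 d7 9d
```

Each Hebrew letter takes one byte in ISO-8859-8 but two in UTF-8, because U+05D0–U+05EA fall in UTF-8's two-byte range (U+0080–U+07FF).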

Code page layout

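ISO/IEC 8859-8[8][9][10][11]
Byte (hex), character and Unicode code point. Bytes 00–1F and 7F–9F are control codes and are not listed; bytes A1, BF, C0–DE, FB, FC and FF are unassigned.

20 SP   U+0020   21 !    U+0021   22 "    U+0022   23 #    U+0023
24 $    U+0024   25 %    U+0025   26 &    U+0026   27 '    U+0027
28 (    U+0028   29 )    U+0029   2A *    U+002A   2B +    U+002B
2C ,    U+002C   2D -    U+002D   2E .    U+002E   2F /    U+002F

30 0    U+0030   31 1    U+0031   32 2    U+0032   33 3    U+0033
34 4    U+0034   35 5    U+0035   36 6    U+0036   37 7    U+0037
38 8    U+0038   39 9    U+0039   3A :    U+003A   3B ;    U+003B
3C <    U+003C   3D =    U+003D   3E >    U+003E   3F ?    U+003F

40 @    U+0040   41 A    U+0041   42 B    U+0042   43 C    U+0043
44 D    U+0044   45 E    U+0045   46 F    U+0046   47 G    U+0047
48 H    U+0048   49 I    U+0049   4A J    U+004A   4B K    U+004B
4C L    U+004C   4D M    U+004D   4E N    U+004E   4F O    U+004F

50 P    U+0050   51 Q    U+0051   52 R    U+0052   53 S    U+0053
54 T    U+0054   55 U    U+0055   56 V    U+0056   57 W    U+0057
58 X    U+0058   59 Y    U+0059   5A Z    U+005A   5B [    U+005B
5C \    U+005C   5D ]    U+005D   5E ^    U+005E   5F _    U+005F

60 `    U+0060   61 a    U+0061   62 b    U+0062   63 c    U+0063
64 d    U+0064   65 e    U+0065   66 f    U+0066   67 g    U+0067
68 h    U+0068   69 i    U+0069   6A j    U+006A   6B k    U+006B
6C l    U+006C   6D m    U+006D   6E n    U+006E   6F o    U+006F

70 p    U+0070   71 q    U+0071   72 r    U+0072   73 s    U+0073
74 t    U+0074   75 u    U+0075   76 v    U+0076   77 w    U+0077
78 x    U+0078   79 y    U+0079   7A z    U+007A   7B {    U+007B
7C |    U+007C   7D }    U+007D   7E ~    U+007E

A0 NBSP U+00A0   A2 ¢    U+00A2   A3 £    U+00A3
A4 ¤    U+00A4   A5 ¥    U+00A5   A6 ¦    U+00A6   A7 §    U+00A7
A8 ¨    U+00A8   A9 ©    U+00A9   AA ×    U+00D7   AB «    U+00AB
AC ¬    U+00AC   AD SHY  U+00AD   AE ®    U+00AE   AF ¯    U+00AF

B0 °    U+00B0   B1 ±    U+00B1   B2 ²    U+00B2   B3 ³    U+00B3
B4 ´    U+00B4   B5 µ    U+00B5   B6 ¶    U+00B6   B7 ·    U+00B7
B8 ¸    U+00B8   B9 ¹    U+00B9   BA ÷    U+00F7   BB »    U+00BB
BC ¼    U+00BC   BD ½    U+00BD   BE ¾    U+00BE

DF ‗    U+2017

E0 א    U+05D0   E1 ב    U+05D1   E2 ג    U+05D2   E3 ד    U+05D3
E4 ה    U+05D4   E5 ו    U+05D5   E6 ז    U+05D6   E7 ח    U+05D7
E8 ט    U+05D8   E9 י    U+05D9   EA ך    U+05DA   EB כ    U+05DB
EC ל    U+05DC   ED ם    U+05DD   EE מ    U+05DE   EF ן    U+05DF

F0 נ    U+05E0   F1 ס    U+05E1   F2 ע    U+05E2   F3 ף    U+05E3
F4 פ    U+05E4   F5 ץ    U+05E5   F6 צ    U+05E6   F7 ק    U+05E7
F8 ר    U+05E8   F9 ש    U+05E9   FA ת    U+05EA
FD LRM  U+200E   FE RLM  U+200F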
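The mapping in the table follows a few simple rules, so it can be written as a function and cross-checked against an existing implementation. The sketch below encodes those rules in a hypothetical `table_code_point` function and compares it with Python's `iso8859_8` codec for all 256 byte values. Bytes 00–9F are assumed to map to identical code points, which is how Python's ISO 8859 codecs treat ASCII and the C0/C1 controls.

```python
from typing import List, Optional

def table_code_point(b: int) -> Optional[int]:
    """Code point for byte b per the ISO/IEC 8859-8 table; None if unassigned."""
    if b < 0xA0:
        return b                    # 00-9F: ASCII, DEL and C0/C1 controls map to themselves
    if b in (0xA1, 0xBF):
        return None                 # gaps in rows A_ and B_
    if b == 0xAA:
        return 0x00D7               # multiplication sign
    if b == 0xBA:
        return 0x00F7               # division sign
    if b <= 0xBE:
        return b                    # rest of A0-BE: same code points as ISO/IEC 8859-1
    if b == 0xDF:
        return 0x2017               # double low line
    if 0xE0 <= b <= 0xFA:
        return 0x05D0 + (b - 0xE0)  # alef (E0) through tav (FA)
    if b == 0xFD:
        return 0x200E               # left-to-right mark
    if b == 0xFE:
        return 0x200F               # right-to-left mark
    return None                     # C0-DE, FB, FC and FF are unassigned

def mismatches_with_python_codec() -> List[int]:
    """Bytes where table_code_point disagrees with Python's iso8859_8 codec."""
    bad: List[int] = []
    for b in range(256):
        try:
            actual: Optional[int] = ord(bytes([b]).decode("iso8859_8"))
        except UnicodeDecodeError:
            actual = None
        if actual != table_code_point(b):
            bad.append(b)
    return bad

print(mismatches_with_python_codec())  # an empty list means the rules agree
```

Only two upper-half positions break the "same as ISO/IEC 8859-1" pattern (AA and BA). The Hebrew block is a straight offset, so alef-to-tav arithmetic is a single addition.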


Byte FD is the left-to-right mark (U+200E) and byte FE is the right-to-left mark (U+200F). Both were introduced by the 1999 revision, ISO/IEC 8859-8:1999.
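A typical use of these marks is to attach neutral punctuation to an adjacent right-to-left run. In the sketch below, an RLM placed after a trailing "!" puts the exclamation mark between two right-to-left characters, so bidi processing treats it as part of the Hebrew run. In the encoded data the mark is the single byte FE. The helper `direction_mark_offsets` is hypothetical.

```python
from typing import Dict

RLM = "\u200f"  # right-to-left mark, byte FE in ISO-8859-8

def direction_mark_offsets(data: bytes) -> Dict[int, str]:
    """Hypothetical helper: byte offsets of LRM (FD) and RLM (FE) in the data."""
    names = {0xFD: "LRM", 0xFE: "RLM"}
    return {i: names[b] for i, b in enumerate(data) if b in names}

# Hebrew word, "!", then RLM: the "!" now sits between two right-to-left
# characters, so bidi processing keeps it with the Hebrew run.
text = "\u05e9\u05dc\u05d5\u05dd" + "!" + RLM
data = text.encode("iso8859_8")

print(data.hex(" "))                 # f9 ec e5 ed 21 fe
print(direction_mark_offsets(data))  # {5: 'RLM'}
```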

2002 Israeli Standard extensions

Israeli Standard SI1311:2002 matches ISO/IEC 8859-8:1999 except for a number of additional character allocations: the euro sign, the new shekel sign and further explicit bidirectional formatting characters (LRO, RLO, PDF, LRE and RLE).[12]

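SI1311:2002[12]
Byte (hex), character and code point for rows D_–F_:

D0–DE (in ascending byte order): €* U+20AC, ₪* U+20AA, LRO* U+202D, RLO* U+202E, PDF* U+202C
DF ‗    U+2017

E0 א    U+05D0   E1 ב    U+05D1   E2 ג    U+05D2   E3 ד    U+05D3
E4 ה    U+05D4   E5 ו    U+05D5   E6 ז    U+05D6   E7 ח    U+05D7
E8 ט    U+05D8   E9 י    U+05D9   EA ך    U+05DA   EB כ    U+05DB
EC ל    U+05DC   ED ם    U+05DD   EE מ    U+05DE   EF ן    U+05DF

F0 נ    U+05E0   F1 ס    U+05E1   F2 ע    U+05E2   F3 ף    U+05E3
F4 פ    U+05E4   F5 ץ    U+05E5   F6 צ    U+05E6   F7 ק    U+05E7
F8 ר    U+05E8   F9 ש    U+05E9   FA ת    U+05EA   FB LRE* U+202A
FC RLE* U+202B   FD LRM  U+200E   FE RLM  U+200F

* Absent from ISO/IEC 8859-8:1999, added in SI1311:2002.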
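Assuming Python's standard library has no SI1311:2002 codec, the sketch below layers the two F_-row additions (FB as LRE, FC as RLE) over the built-in `iso8859_8` codec. It is deliberately partial. The D_-row additions are not mapped, so those bytes still raise `UnicodeDecodeError`, and their exact positions would have to be taken from the standard itself.[12] The table and function names are hypothetical.

```python
from typing import Dict, List

# Hypothetical overlay: only the F_-row additions shown in the SI1311:2002 table.
SI1311_F_ROW_ADDITIONS: Dict[int, str] = {
    0xFB: "\u202a",  # LEFT-TO-RIGHT EMBEDDING (LRE)
    0xFC: "\u202b",  # RIGHT-TO-LEFT EMBEDDING (RLE)
}

def decode_si1311_partial(data: bytes) -> str:
    """Partial SI1311:2002 decoder built on Python's iso8859_8 codec."""
    chars: List[str] = []
    for b in data:
        if b in SI1311_F_ROW_ADDITIONS:
            chars.append(SI1311_F_ROW_ADDITIONS[b])
        else:
            # Other bytes decode as ISO/IEC 8859-8; unassigned ones, including
            # the D_-row additions not covered here, raise UnicodeDecodeError.
            chars.append(bytes([b]).decode("iso8859_8"))
    return "".join(chars)

print([hex(ord(c)) for c in decode_si1311_partial(b"\xfc\xf9\xec\xe5\xed")])
# ['0x202b', '0x5e9', '0x5dc', '0x5d5', '0x5dd']
```

Because SI1311:2002 only fills positions that ISO/IEC 8859-8:1999 leaves unassigned, an overlay of this kind decodes every valid ISO-8859-8 byte sequence unchanged.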


References

  1. Character Sets, Internet Assigned Numbers Authority (IANA), 2018-12-12
  2. "Code page 916 information document". Archived from the original on 2017-02-16.
  3. "CCSID 916 information document". Archived from the original on 2014-11-29.
  4. "CCSID 5012 information document". Archived from the original on 2016-03-27.
  5. van Kesteren, Anne. "9. Legacy single-byte encodings". Encoding Standard. WHATWG. Note: ISO-8859-8 and ISO-8859-8-I are distinct encoding names, because ISO-8859-8 has influence on the layout direction. And although historically this might have been the case for ISO-8859-6 and "ISO-8859-6-I" as well, that is no longer true.
  6. John, Nicholas A. (2013). "The Construction of the Multilingual Internet: Unicode, Hebrew, and Globalization". Journal of Computer-Mediated Communication. 18 (3): 321–338. doi:10.1111/jcc4.12015. ISSN 1083-6101. Background: the problem of Hebrew and the Internet
  7. "Usage Statistics of ISO-8859-8 for Websites, January 2019". w3techs.com. Retrieved 2019-01-17.
  8. Code Page CPGID 00916 (PDF), IBM
  9. Code Page CPGID 00916 (txt), IBM
  10. International Components for Unicode (ICU), ibm-916_P100-1995.ucm, 2002-12-03
  11. International Components for Unicode (ICU), ibm-5012_P100-1999.ucm, 2002-12-03
  12. Standards Institution of Israel. "ISO-IR 234: Latin/Hebrew character set for 8-bit codes" (PDF). Information Technology Standards Commission of Japan (ITSCJ/IPSJ).
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.