Unicode character property

The Unicode Standard assigns character properties to each code point.[1] These properties can be used to handle "characters" (code points) in processes such as line breaking, right-to-left script direction, or applying controls. Somewhat inconsistently, some "character properties" are also defined for code points that have no character assigned, and for noncharacter code points (labeled "<not a character>"). The character properties are described in Unicode Standard Annex #44.[2]

Each property has a status that indicates how binding it is: normative, informative, contributory, or provisional. For simplicity of specification, a property value can be assigned to a whole contiguous range of code points that share it.

Name

A Unicode character is assigned a unique Name (na).[1] The name is composed of uppercase letters A–Z, digits 0–9, hyphen-minus (-) and space. Some sequences are excluded: a name may not begin or end with a space or hyphen, may not contain repeated spaces or hyphens, and may not contain a space after a hyphen. The name is guaranteed to be unique within Unicode and can be used to identify a code point and its character. Ideographic characters, of which there are tens of thousands, are named in the pattern "CJK UNIFIED IDEOGRAPH-hhhh", for example U+4E00 CJK UNIFIED IDEOGRAPH-4E00. Space and formatting characters are named too, for example U+00A0 NO-BREAK SPACE.

The following classes of code point do not have a Name (na=""): controls (General Category Cc), private use (Co), surrogates (Cs), noncharacters (Cn) and reserved code points (Cn). They may be referenced informally by a generic or specific meta-name, called a "Code Point Label": <control>, <control-0088>, <reserved>, <noncharacter-hhhh>, <private-use-hhhh>, <surrogate>. Because these labels contain angle brackets, they can never appear as a Name, which prevents confusion.
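As a minimal sketch, assuming Python's standard `unicodedata` module (whose data follows the Unicode version the interpreter was built with), the following returns a code point's Name, or builds a label in the `<control-0088>` style for the nameless classes listed above. The function names are hypothetical; the noncharacter test encodes the fixed set of 66 noncharacters (U+FDD0..U+FDEF plus the last two code points of every plane), which General Category alone cannot separate from reserved code points.

```python
import unicodedata


def is_noncharacter(cp: int) -> bool:
    """True for the 66 noncharacters: U+FDD0..U+FDEF and U+nFFFE/U+nFFFF in every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def name_or_label(cp: int) -> str:
    """Return the Name of a code point, or a code point label if it has no Name."""
    ch = chr(cp)
    gc = unicodedata.category(ch)
    hexcp = f"{cp:04X}"  # at least four uppercase hex digits, as in U+ notation
    if gc == "Cc":
        return f"<control-{hexcp}>"
    if gc == "Cs":
        return f"<surrogate-{hexcp}>"
    if gc == "Co":
        return f"<private-use-{hexcp}>"
    if gc == "Cn":
        # Cn covers both noncharacters and reserved (unassigned) code points.
        kind = "noncharacter" if is_noncharacter(cp) else "reserved"
        return f"<{kind}-{hexcp}>"
    return unicodedata.name(ch)


for cp in (0x0041, 0x4E00, 0x0088, 0xFFFE, 0xE000):
    print(f"U+{cp:04X}", name_or_label(cp))
```

For these inputs the loop prints LATIN CAPITAL LETTER A, CJK UNIFIED IDEOGRAPH-4E00, <control-0088>, <noncharacter-FFFE> and <private-use-E000>. Because the labels contain angle brackets, a caller can always tell them apart from real Names.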

Version 1.0 names

In version 2.0 of Unicode, many names were changed. From then on the rule "a name will never change" came into effect; errors are instead addressed through the strict (normative) use of alias names. Disused version 1.0 names were kept in the property Unicode_1_Name, to provide some backward compatibility.

Character name alias

Starting from Unicode version 2.0, the published name for a code point will never change. Therefore, in the event of a character name being misspelled or if the character name is completely wrong or seriously misleading, a formal Character Name Alias may be assigned to the character, and this alias may be used by applications instead of the actual defective character name.[1] For example, U+FE18 PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRAKCET has the character name alias "PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET" in order to mitigate the misspelling of "bracket" as "brakcet" in the actual character name; U+A015 YI SYLLABLE WU has the character name alias "YI SYLLABLE ITERATION MARK" because contrary to the character name it does not have a fixed syllabic value.

In addition to character name aliases which are corrections to defective character names, some characters are assigned aliases which are alternative names or abbreviations. Five types of character name aliases are defined in the Unicode Standard:

  • Correction: corrections for misspelled or seriously incorrect character names;
  • Control: ISO 6429 names for C0 and C1 control functions (which are not assigned character names in the Unicode Standard);
  • Alternate: alternative names for some format characters (only U+FEFF "ZERO WIDTH NO-BREAK SPACE", which has the alias "BYTE ORDER MARK");
  • Figment: documented labels for some C1 control code functions which are not actual names in any standard;
  • Abbreviation: abbreviations or acronyms for control codes, format characters, spaces, and variation selectors.

All formal character name aliases follow the rules for permissible character names, and are guaranteed to be unique within both the character name alias and the character name namespaces (for this reason, the ISO 6429 name "BELL" is not defined as an alias for U+0007 because U+1F514 is named "BELL").[1]
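Python's `unicodedata.lookup()` and the `\N{...}` string escape accept formal name aliases as well as Names, while `unicodedata.name()` returns only the immutable Name. A short sketch, assuming Python 3.3 or later (which added alias support); the alias ALERT for U+0007 comes from NameAliases.txt, not from the text above:

```python
import unicodedata

# The Name keeps its misspelling forever...
print(unicodedata.name("\uFE18"))
# PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRAKCET

# ...but lookup() also accepts the correction alias.
corrected = "PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET"
print(unicodedata.lookup(corrected) == "\uFE18")  # True

# Alternate alias of U+FEFF, usable in \N{...} escapes too.
print("\N{BYTE ORDER MARK}" == "\uFEFF")  # True

# "BELL" is the Name of U+1F514, so U+0007 is reachable through its control alias instead.
print(hex(ord(unicodedata.lookup("BELL"))), hex(ord(unicodedata.lookup("ALERT"))))
# 0x1f514 0x7
```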

As of Unicode version 12.1, twenty-eight formal character name aliases are defined as corrections for defective character names.[3]

| Code point | Character | Name | Name alias |
|---|---|---|---|
| U+01A2 | Ƣ | LATIN CAPITAL LETTER OI | LATIN CAPITAL LETTER GHA |
| U+01A3 | ƣ | LATIN SMALL LETTER OI | LATIN SMALL LETTER GHA |
| U+0709 | ܉ | SYRIAC SUBLINEAR COLON SKEWED RIGHT | SYRIAC SUBLINEAR COLON SKEWED LEFT |
| U+0CDE | | KANNADA LETTER FA | KANNADA LETTER LLLA |
| U+0E9D | | LAO LETTER FO TAM | LAO LETTER FO FON |
| U+0E9F | | LAO LETTER FO SUNG | LAO LETTER FO FAY |
| U+0EA3 | | LAO LETTER LO LING | LAO LETTER RO |
| U+0EA5 | | LAO LETTER LO LOOT | LAO LETTER LO |
| U+0FD0 | | TIBETAN MARK BSKA- SHOG GI MGO RGYAN | TIBETAN MARK BKA- SHOG GI MGO RGYAN |
| U+11EC | | HANGUL JONGSEONG IEUNG-KIYEOK | HANGUL JONGSEONG YESIEUNG-KIYEOK |
| U+11ED | | HANGUL JONGSEONG IEUNG-SSANGKIYEOK | HANGUL JONGSEONG YESIEUNG-SSANGKIYEOK |
| U+11EE | | HANGUL JONGSEONG SSANGIEUNG | HANGUL JONGSEONG SSANGYESIEUNG |
| U+11EF | | HANGUL JONGSEONG IEUNG-KHIEUKH | HANGUL JONGSEONG YESIEUNG-KHIEUKH |
| U+2118 | | SCRIPT CAPITAL P | WEIERSTRASS ELLIPTIC FUNCTION |
| U+2448 | | OCR DASH | MICR ON US SYMBOL |
| U+2449 | | OCR CUSTOMER ACCOUNT NUMBER | MICR DASH SYMBOL |
| U+2B7A | | LEFTWARDS TRIANGLE-HEADED ARROW WITH DOUBLE HORIZONTAL STROKE | LEFTWARDS TRIANGLE-HEADED ARROW WITH DOUBLE VERTICAL STROKE |
| U+2B7C | | RIGHTWARDS TRIANGLE-HEADED ARROW WITH DOUBLE HORIZONTAL STROKE | RIGHTWARDS TRIANGLE-HEADED ARROW WITH DOUBLE VERTICAL STROKE |
| U+A015 | | YI SYLLABLE WU | YI SYLLABLE ITERATION MARK |
| U+FE18 | | PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRAKCET | PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET |
| U+122D4 | 𒋔 | CUNEIFORM SIGN SHIR TENU | CUNEIFORM SIGN NU11 TENU |
| U+122D5 | 𒋕 | CUNEIFORM SIGN SHIR OVER SHIR BUR OVER BUR | CUNEIFORM SIGN NU11 OVER NU11 BUR OVER BUR |
| U+16E56 | 𖹖 | MEDEFAIDRIN CAPITAL LETTER HP | MEDEFAIDRIN CAPITAL LETTER H |
| U+16E57 | 𖹗 | MEDEFAIDRIN CAPITAL LETTER NY | MEDEFAIDRIN CAPITAL LETTER NG |
| U+16E76 | 𖹶 | MEDEFAIDRIN SMALL LETTER HP | MEDEFAIDRIN SMALL LETTER H |
| U+16E77 | 𖹷 | MEDEFAIDRIN SMALL LETTER NY | MEDEFAIDRIN SMALL LETTER NG |
| U+1B001 | 𛀁 | HIRAGANA LETTER ARCHAIC YE | HENTAIGANA LETTER E-1 |
| U+1D0C5 | 𝃅 | BYZANTINE MUSICAL SYMBOL FHTORA SKLIRON CHROMA VASIS | BYZANTINE MUSICAL SYMBOL FTHORA SKLIRON CHROMA VASIS |

Apart from these normative names, informal names may be shown in the Unicode code charts. These are other commonly used names for a character, and need not be restricted to letters A–Z, digits 0–9, - (hyphen-minus) and <space>. These informal names are not guaranteed to be unique, and may be changed or removed in later versions of the standard.

General Category

Each code point is assigned a value for General Category. This is one of the character properties that are also defined for unassigned code points and for noncharacters.

General Category (Unicode Character Property)[note 1]

| Value | Category (major, minor) | Basic type[note 2] | Character assigned[note 2] | Count (as of 13.0) | Remarks |
|---|---|---|---|---|---|
| L, Letter | | | | | |
| Lu | Letter, uppercase | Graphic | Character | 1,791 | |
| Ll | Letter, lowercase | Graphic | Character | 2,155 | |
| Lt | Letter, titlecase | Graphic | Character | 31 | Ligatures containing uppercase followed by lowercase letters (e.g., Dž, Lj, Nj, and Dz) |
| Lm | Letter, modifier | Graphic | Character | 260 | A modifier letter |
| Lo | Letter, other | Graphic | Character | 127,004 | An ideograph or a letter in a unicase alphabet |
| M, Mark | | | | | |
| Mn | Mark, nonspacing | Graphic | Character | 1,839 | |
| Mc | Mark, spacing combining | Graphic | Character | 443 | |
| Me | Mark, enclosing | Graphic | Character | 13 | |
| N, Number | | | | | |
| Nd | Number, decimal digit | Graphic | Character | 650 | All these, and only these, have Numeric Type = De[note 3] |
| Nl | Number, letter | Graphic | Character | 236 | Numerals composed of letters or letterlike symbols (e.g., Roman numerals) |
| No | Number, other | Graphic | Character | 895 | E.g., vulgar fractions, superscript and subscript digits |
| P, Punctuation | | | | | |
| Pc | Punctuation, connector | Graphic | Character | 10 | Includes "_" underscore |
| Pd | Punctuation, dash | Graphic | Character | 25 | Includes several hyphen characters |
| Ps | Punctuation, open | Graphic | Character | 75 | Opening bracket characters |
| Pe | Punctuation, close | Graphic | Character | 73 | Closing bracket characters |
| Pi | Punctuation, initial quote | Graphic | Character | 12 | Opening quotation mark. Does not include the ASCII "neutral" quotation mark. May behave like Ps or Pe depending on usage |
| Pf | Punctuation, final quote | Graphic | Character | 10 | Closing quotation mark. May behave like Ps or Pe depending on usage |
| Po | Punctuation, other | Graphic | Character | 593 | |
| S, Symbol | | | | | |
| Sm | Symbol, math | Graphic | Character | 948 | Mathematical symbols (e.g., +, =, ×, ÷). Does not include parentheses and brackets, which are in categories Ps and Pe. Also does not include !, *, -, or /, which, despite frequent use as mathematical operators, are primarily considered to be "punctuation". |
| Sc | Symbol, currency | Graphic | Character | 62 | Currency symbols |
| Sk | Symbol, modifier | Graphic | Character | 123 | |
| So | Symbol, other | Graphic | Character | 6,431 | |
| Z, Separator | | | | | |
| Zs | Separator, space | Graphic | Character | 17 | Includes the space, but not TAB, CR, or LF, which are Cc |
| Zl | Separator, line | Format | Character | 1 | Only U+2028 LINE SEPARATOR (LSEP) |
| Zp | Separator, paragraph | Format | Character | 1 | Only U+2029 PARAGRAPH SEPARATOR (PSEP) |
| C, Other | | | | | |
| Cc | Other, control | Control | Character | 65 (will never change)[note 3] | No name,[note 4] <control> |
| Cf | Other, format | Format | Character | 161 | Includes the soft hyphen, joining control characters (ZWNJ and ZWJ), control characters to support bidirectional text, and language tag characters |
| Cs | Other, surrogate | Surrogate | Not (but abstract) | 2,048 (will never change)[note 3] | No name,[note 4] <surrogate> |
| Co | Other, private use | Private-use | Not (but abstract) | 137,468 total (will never change)[note 3] (6,400 in BMP, 131,068 in Planes 15–16) | No name,[note 4] <private-use> |
| Cn | Other, not assigned | Noncharacter | Not | 66 (will never change)[note 3] | No name,[note 4] <noncharacter> |
| Cn | Other, not assigned | Reserved | Not | 830,606 | No name,[note 4] <reserved> |

  1. "Table 4-4: General Category" (PDF). The Unicode Standard. Unicode Consortium. March 2020.
  2. "Table 2-3: Types of code points" (PDF). The Unicode Standard. Unicode Consortium. March 2020.
  3. Unicode Character Encoding Stability Policies: Property Value Stability. Some gc groups will never change; gc=Nd corresponds with Numeric Type=De (decimal).
  4. "Table 4-9: Construction of Code Point Labels" (PDF). The Unicode Standard. Unicode Consortium. March 2020. A Code Point Label may be used to identify a nameless code point, e.g. <control-hhhh>, <control-0088>. The Name remains blank, which can prevent inadvertently replacing, in documentation, a control Name with an actual control code. Unicode also uses <not a character> for <noncharacter>.
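To check the stable rows of this table against a live copy of the Unicode Character Database, a short Python sketch can tally `unicodedata.category()` over all 1,114,112 code points. It assumes Python's bundled database, whose version (reported by `unicodedata.unidata_version`) may be newer than 13.0, so only the counts marked "will never change" are expected to match exactly. Python reports both noncharacters and reserved code points as Cn.

```python
import unicodedata
from collections import Counter


def general_category_counts() -> Counter:
    """Tally the General Category value of every code point U+0000..U+10FFFF."""
    return Counter(unicodedata.category(chr(cp)) for cp in range(0x110000))


counts = general_category_counts()
print("Unicode data version:", unicodedata.unidata_version)

# These three counts are fixed by the stability policy, whatever the version.
print(counts["Cc"], counts["Cs"], counts["Co"])  # 65 2048 137468

# Counts for assigned categories (Lu, Lo, So, ...) grow with each version;
# Cn lumps noncharacters and reserved code points together.
by_major_class = Counter()
for gc, n in counts.items():
    by_major_class[gc[0]] += n  # first letter = major class (L, M, N, P, S, Z, C)
print(sorted(by_major_class.items()))
```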

Punctuation

Besides General Category, several binary (Yes/No) properties mark characters with particular punctuation roles: Dash, Quotation_Mark, Sentence_Terminal and Terminal_Punctuation.
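Python's `unicodedata` module does not expose these binary properties; they are listed in the UCD file PropList.txt, one code point or range per line as `code..code ; Property # comment`. A minimal parser sketch, run here on a few sample lines written in that format (an abbreviated excerpt, not the complete file); the function name is hypothetical:

```python
from collections import defaultdict

# Sample lines in PropList.txt format (abbreviated; not the complete file).
SAMPLE = """\
002C          ; Terminal_Punctuation # Po       COMMA
002E          ; Terminal_Punctuation # Po       FULL STOP
002E          ; Sentence_Terminal # Po       FULL STOP
0022          ; Quotation_Mark # Po       QUOTATION MARK
002D          ; Dash # Pd       HYPHEN-MINUS
2010..2015    ; Dash # Pd   [6] HYPHEN..HORIZONTAL BAR
"""


def parse_binary_properties(text: str) -> dict:
    """Map each property name to the set of code points that have value Yes."""
    props = defaultdict(set)
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop comments; skip blank lines
        if not data:
            continue
        field, prop = (part.strip() for part in data.split(";", 1))
        if ".." in field:
            lo, hi = (int(x, 16) for x in field.split(".."))
        else:
            lo = hi = int(field, 16)
        props[prop].update(range(lo, hi + 1))
    return props


props = parse_binary_properties(SAMPLE)
# A comma ends a clause (Terminal_Punctuation) but not a sentence.
print(0x2C in props["Terminal_Punctuation"], 0x2C in props["Sentence_Terminal"])  # True False
print(ord("\u2013") in props["Dash"])  # True (EN DASH lies in 2010..2015)
```

Because the properties are independent of General Category, a character such as the full stop can be both Terminal_Punctuation and Sentence_Terminal.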

Whitespace

Whitespace is a commonly used concept for a typographic effect: it covers characters that are invisible but have a spacing effect in rendered text, including spaces, tabs, and line-break formatting controls. In Unicode, such a character has the binary property White_Space=yes (short name WSpace). As of version 13.0, there are 25 whitespace characters.
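A programming language's "is space" test does not necessarily match White_Space. For example, Python documents `str.isspace()` as true when a character's General Category is Zs or its Bidi_Class is WS, B or S. The sketch below, assuming Python 3.4 or later (Unicode 6.3 or newer data), compares that with the 25 White_Space code points listed in the table that follows:

```python
# The 25 code points with White_Space=yes (PropList.txt, Unicode 13.0).
WHITE_SPACE = frozenset(
    list(range(0x0009, 0x000E))     # TAB, LF, VT, FF, CR
    + [0x0020, 0x0085, 0x00A0, 0x1680]
    + list(range(0x2000, 0x200B))   # EN QUAD .. HAIR SPACE
    + [0x2028, 0x2029, 0x202F, 0x205F, 0x3000]
)

# Every code point Python's str.isspace() accepts.
python_space = frozenset(cp for cp in range(0x110000) if chr(cp).isspace())

print(len(WHITE_SPACE))             # 25
print(WHITE_SPACE <= python_space)  # True
print([f"U+{cp:04X}" for cp in sorted(python_space - WHITE_SPACE)])
# ['U+001C', 'U+001D', 'U+001E', 'U+001F']
```

The difference is the four ASCII information separators U+001C..U+001F, which have Bidi_Class B or S but are not White_Space, so code that needs the Unicode definition should test the property rather than rely on `isspace()`.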

Characters with White_Space=yes[note 1][note 2]

| Name | Code point | Decimal | May break? | In IDN? | Script | Block | General category | Notes |
|---|---|---|---|---|---|---|---|---|
| character tabulation | U+0009 | 9 | Yes | No | Common | Basic Latin | Other, control | HT, Horizontal Tab. HTML/XML named entity: &Tab;, LaTeX: '\tab' |
| line feed | U+000A | 10 | Is a line-break | | Common | Basic Latin | Other, control | LF, Line feed. HTML/XML named entity: &NewLine; |
| line tabulation | U+000B | 11 | Is a line-break | | Common | Basic Latin | Other, control | VT, Vertical Tab |
| form feed | U+000C | 12 | Is a line-break | | Common | Basic Latin | Other, control | FF, Form feed |
| carriage return | U+000D | 13 | Is a line-break | | Common | Basic Latin | Other, control | CR, Carriage return |
| space | U+0020 | 32 | Yes | No | Common | Basic Latin | Separator, space | Most common (normal ASCII space) |
| next line | U+0085 | 133 | Is a line-break | | Common | Latin-1 Supplement | Other, control | NEL, Next line |
| no-break space | U+00A0 | 160 | No | No | Common | Latin-1 Supplement | Separator, space | Non-breaking space: identical to U+0020, but not a point at which a line may be broken. HTML/XML named entity: &nbsp;, LaTeX: '~' |
| ogham space mark | U+1680 | 5760 | Yes | No | Ogham | Ogham | Separator, space | Used for interword separation in Ogham text. Normally a vertical line in vertical text or a horizontal line in horizontal text, but may also be a blank space in "stemless" fonts. Requires an Ogham font. |
| en quad | U+2000 | 8192 | Yes | No | Common | General Punctuation | Separator, space | Width of one en. U+2002 is canonically equivalent to this character; U+2002 is preferred. |
| em quad | U+2001 | 8193 | Yes | No | Common | General Punctuation | Separator, space | Also known as "mutton quad". Width of one em. U+2003 is canonically equivalent to this character; U+2003 is preferred. |
| en space | U+2002 | 8194 | Yes | No | Common | General Punctuation | Separator, space | Also known as "nut". Width of one en. U+2000 En Quad is canonically equivalent to this character; U+2002 is preferred. HTML/XML named entity: &ensp;, LaTeX: '\enspace' |
| em space | U+2003 | 8195 | Yes | No | Common | General Punctuation | Separator, space | Also known as "mutton". Width of one em. U+2001 Em Quad is canonically equivalent to this character; U+2003 is preferred. HTML/XML named entity: &emsp;, LaTeX: '\quad' |
| three-per-em space | U+2004 | 8196 | Yes | No | Common | General Punctuation | Separator, space | Also known as "thick space". One third of an em wide. HTML/XML named entity: &emsp13; |
| four-per-em space | U+2005 | 8197 | Yes | No | Common | General Punctuation | Separator, space | Also known as "mid space". One fourth of an em wide. HTML/XML named entity: &emsp14; |
| six-per-em space | U+2006 | 8198 | Yes | No | Common | General Punctuation | Separator, space | One sixth of an em wide. In computer typography, sometimes equated to U+2009. |
| figure space | U+2007 | 8199 | No | No | Common | General Punctuation | Separator, space | Figure space. In fonts with monospaced digits, equal to the width of one digit. HTML/XML named entity: &numsp; |
| punctuation space | U+2008 | 8200 | Yes | No | Common | General Punctuation | Separator, space | As wide as the narrow punctuation in a font, i.e. the advance width of the period or comma.[4] HTML/XML named entity: &puncsp; |
| thin space | U+2009 | 8201 | Yes | No | Common | General Punctuation | Separator, space | Thin space; one-fifth (sometimes one-sixth) of an em wide. Recommended for use as a thousands separator for measures made with SI units. Unlike U+2002 to U+2008, its width may get adjusted in typesetting.[5] HTML/XML named entity: &thinsp;; LaTeX: '\,' |
| hair space | U+200A | 8202 | Yes | No | Common | General Punctuation | Separator, space | Thinner than a thin space. HTML/XML named entity: &hairsp; (does not work in all browsers) |
| line separator | U+2028 | 8232 | Is a line-break | | Common | General Punctuation | Separator, line | |
| paragraph separator | U+2029 | 8233 | Is a line-break | | Common | General Punctuation | Separator, paragraph | |
| narrow no-break space | U+202F | 8239 | No | No | Common | General Punctuation | Separator, space | Narrow no-break space. Similar in function to U+00A0 No-Break Space. When used with Mongolian, its width is usually one third of the normal space; in other contexts, its width sometimes resembles that of the Thin Space (U+2009). |
| medium mathematical space | U+205F | 8287 | Yes | No | Common | General Punctuation | Separator, space | MMSP. Used in mathematical formulae. Four-eighteenths of an em.[6] In mathematical typography, the widths of spaces are usually given in integral multiples of an eighteenth of an em, and 4/18 em may be used in several situations, for example between the a and the + and between the + and the b in the expression a + b.[7] HTML/XML named entity: &MediumSpace; |
| ideographic space | U+3000 | 12288 | Yes | No | Common | CJK Symbols and Punctuation | Separator, space | As wide as a CJK character cell (fullwidth). Used, for example, in tai tou. |
Related characters (White_Space=no)

| Name | Code point | Decimal | May break? | In IDN? | Script | Block | General category | Notes |
|---|---|---|---|---|---|---|---|---|
| mongolian vowel separator | U+180E | 6158 | Yes | No | Mongolian | Mongolian | Other, format | MVS. A narrow space character, used in Mongolian to cause the final two characters of a word to take on different shapes.[8] It has not been classified as a space character (General Category Zs) since Unicode 6.3.0, although it was in previous versions of the standard. |
| zero width space | U+200B | 8203 | Yes | No | Common | General Punctuation | Other, format | ZWSP, zero-width space. Used to indicate word boundaries to text processing systems when using scripts that do not use explicit spacing. It is similar to the soft hyphen, with the difference that the latter is used to indicate syllable boundaries, and should display a visible hyphen when the line breaks at it. HTML/XML named entity: &ZeroWidthSpace;[9][note 3] |
| zero width non-joiner | U+200C | 8204 | Yes | Context-dependent[14] | Inherited | General Punctuation | Other, format | ZWNJ, zero-width non-joiner. When placed between two characters that would otherwise be connected, a ZWNJ causes them to be printed in their final and initial forms, respectively. HTML/XML named entity: &zwnj; |
| zero width joiner | U+200D | 8205 | Yes | Context-dependent[15] | Inherited | General Punctuation | Other, format | ZWJ, zero-width joiner. When placed between two characters that would otherwise not be connected, a ZWJ causes them to be printed in their connected forms. Can also be used to display joining forms in isolation. Depending on whether a ligature or conjunct is expected by default, can either induce (as in emoji and in Sinhala) or suppress (as in Devanagari) substitution with a single glyph, whilst still permitting use of individual joining forms (unlike ZWNJ). HTML/XML named entity: &zwj; |
| word joiner | U+2060 | 8288 | No | No | Common | General Punctuation | Other, format | WJ, word joiner. Similar to U+200B, but not a point at which a line may be broken. HTML/XML named entity: &NoBreak; |
| zero width non-breaking space | U+FEFF | 65279 | No | No | Common | Arabic Presentation Forms-B | Other, format | Zero-width non-breaking space. Used primarily as a Byte Order Mark. Use as an indication of non-breaking is deprecated as of Unicode 3.2; see U+2060 instead. |

  1. White_Space is a binary Unicode property.[note 4]
  2. "Unicode 13.0 UCD: PropList.txt". 2019-11-27. Retrieved 2020-03-12.
  3. Although &ZeroWidthSpace; is one HTML5 named entity for U+200B, the additional names NegativeMediumSpace, NegativeThickSpace, NegativeThinSpace and NegativeVeryThinSpace (which are names used in the Wolfram Language for negative-advance spaces, which it maps to the Private Use Area)[10][11][12][13] are also defined by HTML5 as aliases for U+200B (e.g. &NegativeMediumSpace;).[9]
  4. "Unicode Standard Annex #44, Unicode Character Database".

Other general characteristics

Ideographic, alphabetic, noncharacter.

Shaping, width.

Bidirectional writing

Six character properties pertain to bidirectional writing: Bidi_Class, Bidi_Control, Bidi_Mirrored, Bidi_Mirroring_Glyph, Bidi_Paired_Bracket and Bidi_Paired_Bracket_Type.

One of Unicode's major features is support for bidirectional (Bidi) text display, right-to-left (R-to-L) and left-to-right (L-to-R). The Unicode Bidirectional Algorithm (UAX #9)[16] describes how to present text in which script directions alternate; for example, it enables a Hebrew quote in an English text. To override a direction, Unicode has defined special formatting characters (Bidi controls). These characters can enforce a direction and, by definition, affect only bidirectional layout.

Each code point has a property called Bidi_Class. It defines its behaviour in a bidirectional text as interpreted by the algorithm:

Bidirectional character type (Unicode character property Bidi_Class)[note 1]

| Type[note 2] | Description | Strength | Directionality | General scope | Bidi_Control character[note 3] |
|---|---|---|---|---|---|
| L | Left-to-Right | Strong | L-to-R | Most alphabetic and syllabic characters, Chinese characters, non-European or non-Arabic digits, LRM character, ... | U+200E LEFT-TO-RIGHT MARK (LRM) |
| R | Right-to-Left | Strong | R-to-L | Adlam, Hebrew, Mandaic, Mende Kikakui, N'Ko, Samaritan, ancient scripts like Kharoshthi and Nabataean, RLM character, ... | U+200F RIGHT-TO-LEFT MARK (RLM) |
| AL | Arabic Letter | Strong | R-to-L | Arabic, Hanifi Rohingya, Sogdian, Syriac, and Thaana alphabets, and most punctuation specific to those scripts, ALM character, ... | U+061C ARABIC LETTER MARK (ALM) |
| EN | European Number | Weak | | European digits, Eastern Arabic-Indic digits, Coptic epact numbers, ... | |
| ES | European Separator | Weak | | plus sign, minus sign, ... | |
| ET | European Number Terminator | Weak | | degree sign, currency symbols, ... | |
| AN | Arabic Number | Weak | | Arabic-Indic digits, Arabic decimal and thousands separators, Rumi digits, Hanifi Rohingya digits, ... | |
| CS | Common Number Separator | Weak | | colon, comma, full stop, no-break space, ... | |
| NSM | Nonspacing Mark | Weak | | Characters in General Categories Mark, nonspacing, and Mark, enclosing (Mn, Me) | |
| BN | Boundary Neutral | Weak | | Default ignorables, non-characters, control characters other than those explicitly given other types | |
| B | Paragraph Separator | Neutral | | paragraph separator, appropriate Newline Functions, higher-level protocol paragraph determination | |
| S | Segment Separator | Neutral | | Tabs | |
| WS | Whitespace | Neutral | | space, figure space, line separator, form feed, General Punctuation block spaces (smaller set than the Unicode whitespace list) | |
| ON | Other Neutrals | Neutral | | All other characters, including object replacement character | |
| LRE | Left-to-Right Embedding | Explicit | L-to-R | LRE character only | U+202A LEFT-TO-RIGHT EMBEDDING (LRE) |
| LRO | Left-to-Right Override | Explicit | L-to-R | LRO character only | U+202D LEFT-TO-RIGHT OVERRIDE (LRO) |
| RLE | Right-to-Left Embedding | Explicit | R-to-L | RLE character only | U+202B RIGHT-TO-LEFT EMBEDDING (RLE) |
| RLO | Right-to-Left Override | Explicit | R-to-L | RLO character only | U+202E RIGHT-TO-LEFT OVERRIDE (RLO) |
| PDF | Pop Directional Format | Explicit | | PDF character only | U+202C POP DIRECTIONAL FORMATTING (PDF) |
| LRI | Left-to-Right Isolate | Explicit | L-to-R | LRI character only | U+2066 LEFT-TO-RIGHT ISOLATE (LRI) |
| RLI | Right-to-Left Isolate | Explicit | R-to-L | RLI character only | U+2067 RIGHT-TO-LEFT ISOLATE (RLI) |
| FSI | First Strong Isolate | Explicit | | FSI character only | U+2068 FIRST STRONG ISOLATE (FSI) |
| PDI | Pop Directional Isolate | Explicit | | PDI character only | U+2069 POP DIRECTIONAL ISOLATE (PDI) |

Notes
1. Unicode Bidirectional Algorithm (UAX #9), as of Unicode version 12.0.
2. Possible bidirectional character types for the character property Bidi_Class (or 'type').
3. Bidi_Control characters: twelve Bidi_Control formatting characters are defined. They are invisible and have no effect apart from directionality. Nine of them have a unique, overruling Bidi type that is used by the algorithm; their type is also their acronym (e.g. the character LRE has Bidi type LRE).

In most situations, the algorithm can determine the direction of a text from this character property alone. For more complex cases, e.g. when an English text contains a Hebrew quote, Unicode provides explicit controls. Twelve characters have the property Bidi_Control=Yes: ALM, FSI, LRE, LRI, LRM, LRO, PDF, PDI, RLE, RLI, RLM and RLO, as named in the table. These are invisible formatting characters, used only by the algorithm and with no effect outside of bidirectional formatting.[16] Despite the name, they are formatting characters, not control characters, and have General Category "Other, format (Cf)" in the Unicode definition.

In outline, the algorithm identifies runs of characters with the same strong direction type (R-to-L or L-to-R), taking into account any overrides by the Bidi controls. Number strings (weak types) and neutral characters are then assigned a direction according to their strong surroundings. Finally, the characters are displayed according to each run's direction.
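The first step, choosing a paragraph's base direction (rules P2 and P3 of UAX #9), needs only the Bidi_Class property. A minimal Python sketch using `unicodedata.bidirectional()`: it returns the direction of the first strong character (L, R or AL), skipping any text between an isolate initiator and its matching PDI, and falls back to left-to-right. It does not implement the later rules that resolve weak and neutral types or reorder text; the function name and the "ltr"/"rtl" strings are hypothetical.

```python
import unicodedata

STRONG = {"L": "ltr", "R": "rtl", "AL": "rtl"}
ISOLATE_INITIATORS = {"LRI", "RLI", "FSI"}


def paragraph_direction(text: str, default: str = "ltr") -> str:
    """Base direction from the first strong character outside isolates (UAX #9 P2/P3)."""
    depth = 0  # how many isolates we are currently inside
    for ch in text:
        bc = unicodedata.bidirectional(ch)
        if bc in ISOLATE_INITIATORS:
            depth += 1
        elif bc == "PDI":
            if depth > 0:  # an unmatched PDI is ignored
                depth -= 1
        elif depth == 0 and bc in STRONG:
            return STRONG[bc]
    return default


print(paragraph_direction("Hello, world"))                   # ltr
print(paragraph_direction("2020: \u05D0"))                   # rtl (digits are weak)
print(paragraph_direction("\u2067\u05D0\u2069 abc"))         # ltr (Hebrew is isolated)
```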

Two character properties are relevant to determining a mirror image of a glyph in bidirectional text: Bidi_Mirrored=Yes indicates that the glyph should be mirrored when written R-to-L, and the property Bidi_Mirroring_Glyph=U+hhhh can then point to the character whose glyph is the mirror image. For example, the brackets "(" and ")" are mirrored this way. Shaping of cursive scripts such as Arabic, and mirroring of glyphs that have a direction, are not part of the algorithm.
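Python's `unicodedata.mirrored()` exposes Bidi_Mirrored, but not Bidi_Mirroring_Glyph. The sketch below therefore hard-codes a few pairs from BidiMirroring.txt (an assumed subset, not the full mapping) and shows how a run laid out right-to-left is reversed, with each mirrored character replaced by its counterpart, as rule L4 of UAX #9 requires for characters resolved to R. The function name is hypothetical.

```python
import unicodedata

# A few Bidi_Mirroring_Glyph pairs (subset of BidiMirroring.txt).
MIRROR_PAIRS = {"(": ")", "<": ">", "[": "]", "{": "}", "\u00AB": "\u00BB"}
MIRROR = {**MIRROR_PAIRS, **{v: k for k, v in MIRROR_PAIRS.items()}}


def display_rtl_run(run: str) -> str:
    """Visual order of a run at a right-to-left level: reversed, with mirrored glyphs."""
    return "".join(
        MIRROR.get(ch, ch) if unicodedata.mirrored(ch) else ch
        for ch in reversed(run)
    )


print(unicodedata.mirrored("("), unicodedata.mirrored("a"))  # 1 0
print(display_rtl_run("(a<b)"))                               # (b>a)
```

Without the mirroring step, the reversed run would show ")b<a(" with the brackets facing the wrong way.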

Casing

Case is a normative property in Unicode. It pertains to scripts that have uppercase (also called capital or majuscule) and lowercase (also called small or minuscule) letters. Case differences occur in the Adlam, Armenian, Cherokee, Coptic, Cyrillic, Deseret, Glagolitic, Greek, Khutsuri and Mkhedruli Georgian, Latin, Medefaidrin, Old Hungarian, Osage and Warang Citi scripts.

(Case mappings: upper, lower, title, and case folding, each in a simple and a full form.)
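Python's string methods apply the full case mappings (including those from SpecialCasing.txt), and `str.casefold()` applies full case folding, so they illustrate the difference between simple and full mappings. A short sketch; the helper `caseless_equal` is hypothetical:

```python
# Full uppercase mapping of U+00DF LATIN SMALL LETTER SHARP S is two letters;
# its simple (one-to-one) uppercase mapping leaves it unchanged.
print("\u00DF".upper())     # SS
print("\u00DF".casefold())  # ss

# Titlecase differs from uppercase for the digraph U+01C6 (see the Lt row of the
# General Category table): upper -> U+01C4, title -> U+01C5.
print("\u01C6".upper() == "\u01C4", "\u01C6".title() == "\u01C5")  # True True


def caseless_equal(a: str, b: str) -> bool:
    """Caseless matching: compare the full case foldings of both strings."""
    return a.casefold() == b.casefold()


print(caseless_equal("Stra\u00DFe", "STRASSE"))  # True
```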

Numeric values and types

Decimal

Characters are classified with a Numeric type.[1] Characters such as fractions, subscripts, superscripts, Roman numerals, currency numerators, encircled numbers, and script-specific digits have a Numeric type other than "None", and a numeric value. That value can be an integer or a vulgar fraction, and can be zero or negative. Most characters have no such value; their numeric type is "None".

The characters that do have a numeric value fall into three groups: Decimal (De), Digit (Di) and Numeric (Nu, i.e. all others). "Decimal" means the character is a straight decimal digit; only characters that are part of a contiguous encoded range 0..9 have numeric type Decimal. Other digits, like superscripts, have numeric type Digit. All remaining numeric characters, such as fractions and Roman numerals, have the type "Numeric". The intended effect is that a simple parser can use the decimal values without being confused by, say, a numeric superscript or a fraction. Seventy-three CJK ideographs that represent a number, including those used for accounting, are typed Numeric.

On the other hand, characters that could have a numeric value as a second meaning are still marked Numeric type "None", and have no numeric value (""). E.g. Latin letters can be used in paragraph numbering like "II.A.1.b", but the letters "I", "A" and "b" are not numeric (type "None") and have no numeric value.
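Python's `unicodedata` exposes the three numeric fields separately: `decimal()` (De only), `digit()` (De and Di) and `numeric()` (any numeric type, including Han numerals from the Unihan database). A minimal sketch derives the Numeric type from them; the helper name is hypothetical:

```python
import unicodedata


def numeric_type(ch: str) -> str:
    """Derive Numeric type (De, Di, Nu or None) from the unicodedata accessors."""
    if unicodedata.decimal(ch, None) is not None:
        return "De"
    if unicodedata.digit(ch, None) is not None:
        return "Di"
    if unicodedata.numeric(ch, None) is not None:
        return "Nu"
    return "None"


# 9, Devanagari 6, superscript one, three quarters, Roman numeral twelve, Han 6, Latin A
for ch in ["9", "\u096C", "\u00B9", "\u00BE", "\u216B", "\u516D", "A"]:
    print(f"U+{ord(ch):04X}", numeric_type(ch), unicodedata.numeric(ch, None))
```

A simple decimal parser can then accept only characters of type De, which, as noted for the General Category table, are exactly the characters with General Category Nd.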

Numeric Type[a][b] (Unicode character property)

| Numeric type | Code | Has numeric value | Examples | Remarks |
|---|---|---|---|---|
| Not numeric | None | No | A, X (Latin), !, Д, μ | Numeric Value="NaN" |
| Decimal | De | Yes | 0, 1, 9, (Devanagari 6), (Kannada 6), 𝟨 (Mathematical, styled sans serif) | Straight digit (decimal-radix). Corresponds both ways with General Category=Nd[a] |
| Digit | Di | Yes | ¹ (superscript), (digit with full stop) | Decimal, but in typographic context |
| Numeric | Nu | Yes | ¾, (Tamil number ten), (Roman numeral), (Han number 6) | Numeric value, but not decimal-radix |

a. "Section 4.6: Numeric Value" (PDF). The Unicode Standard. Unicode Consortium. March 2020.
b. "Unicode 13.0 Derived Numeric Types". Unicode Character Database. Unicode Consortium. 2019-09-08.

Hexadecimal digits

Hexadecimal characters are those in the series with hexadecimal values 0...9ABCDEF (sixteen characters, decimal value 0–15). The character property Hex_Digit is set to Yes when a character is in such a series:

Characters in Unicode marked Hex_Digit=Yes[a]

| Characters | Block, form | Remarks |
|---|---|---|
| 0123456789ABCDEF | Basic Latin, capitals | Also ASCII_Hex_Digit=Yes |
| 0123456789abcdef | Basic Latin, small letters | Also ASCII_Hex_Digit=Yes |
| ０１２３４５６７８９ＡＢＣＤＥＦ | Fullwidth forms, capitals | |
| ０１２３４５６７８９ａｂｃｄｅｆ | Fullwidth forms, small letters | |

a. "Unicode 13.0 UCD: PropList.txt". 2019-11-27. Retrieved 2020-03-12.

Forty-four characters are marked as Hex_Digit. The ones in the Basic Latin block are also marked as ASCII_Hex_Digit.

Unicode has no separate characters for hexadecimal values. Consequently, from regular characters alone it is not possible to determine whether a hexadecimal value is intended, or even whether a value is intended at all. That must be determined at a higher level, e.g. by prefixing "0x" to a hexadecimal number, or by context. Unicode can only indicate whether a sequence could be a hexadecimal value.
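A minimal Python sketch of that division of labour: the Hex_Digit set below is written out from the ranges in PropList.txt (assumed to be the 44 characters in the table above), `could_be_hex` only reports that a string is possible as a hexadecimal value, and the hypothetical `hex_value` interprets it once a higher level has decided it is one, folding fullwidth forms to ASCII with NFKC normalization first.

```python
import unicodedata

# Hex_Digit=Yes code points as listed in PropList.txt (Unicode 13.0).
HEX_DIGITS = frozenset(
    chr(cp)
    for lo, hi in [
        (0x0030, 0x0039), (0x0041, 0x0046), (0x0061, 0x0066),  # ASCII 0-9, A-F, a-f
        (0xFF10, 0xFF19), (0xFF21, 0xFF26), (0xFF41, 0xFF46),  # fullwidth forms
    ]
    for cp in range(lo, hi + 1)
)
# ASCII_Hex_Digit=Yes is the Basic Latin subset.
ASCII_HEX_DIGITS = frozenset(ch for ch in HEX_DIGITS if ord(ch) < 0x80)


def could_be_hex(text: str) -> bool:
    """True if every character is a Hex_Digit; says nothing about intent."""
    return bool(text) and all(ch in HEX_DIGITS for ch in text)


def hex_value(text: str) -> int:
    """Interpret text as hexadecimal, folding fullwidth forms to ASCII via NFKC."""
    if not could_be_hex(text):
        raise ValueError(f"not a hexadecimal digit sequence: {text!r}")
    return int(unicodedata.normalize("NFKC", text), 16)


print(len(HEX_DIGITS), len(ASCII_HEX_DIGITS))  # 44 22
print(could_be_hex("BEEF"), hex_value("BEEF"))  # True 48879
print(hex_value("\uFF26\uFF26"))                # 255 (fullwidth "FF")
```

Whether "BEEF" is meant as an English word or as the number 0xBEEF still has to be decided by the surrounding protocol; the property only makes the second reading possible.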

Block

A block is a uniquely named, contiguous range of code points, identified by its first and last code point. Blocks do not overlap. A block may contain code points that are reserved, noncharacters, and so on. Every code point inside a block has that block's name as its Block value; there are 308 block names as of Unicode version 13.0. Code points outside any existing block have the default value "No_Block".
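Python's `unicodedata` does not expose the Block property. Because blocks are non-overlapping ranges, a Blocks.txt-style list can be parsed once and searched with binary search. In the sketch below the sample data is only a handful of rows copied from the table that follows (an assumption for illustration, not the full file), so lookups outside those rows would wrongly report No_Block; the function names are hypothetical.

```python
import bisect

# A few lines in the Blocks.txt format ("start..end; Block name").
BLOCKS_TXT = """\
0000..007F; Basic Latin
0080..00FF; Latin-1 Supplement
0860..086F; Syriac Supplement
08A0..08FF; Arabic Extended-A
0900..097F; Devanagari
"""


def parse_blocks(text: str) -> list:
    """Return (start, end, name) tuples sorted by start code point."""
    blocks = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # ignore comments and blank lines
        if not line:
            continue
        rng, name = (part.strip() for part in line.split(";", 1))
        start, end = (int(x, 16) for x in rng.split(".."))
        blocks.append((start, end, name))
    blocks.sort()
    return blocks


BLOCKS = parse_blocks(BLOCKS_TXT)
STARTS = [start for start, _, _ in BLOCKS]


def block_of(cp: int) -> str:
    """Block name of a code point, or No_Block if no listed range contains it."""
    i = bisect.bisect_right(STARTS, cp) - 1  # last block starting at or before cp
    if i >= 0 and BLOCKS[i][0] <= cp <= BLOCKS[i][1]:
        return BLOCKS[i][2]
    return "No_Block"


print(block_of(0x0041), "|", block_of(0x0870), "|", block_of(0x0905))
# Basic Latin | No_Block | Devanagari
```

U+0870 falls in the gap between Syriac Supplement and Arabic Extended-A in the Unicode 13.0 table, so it gets the default value.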

| Plane | Block range | Block name | Code points[note 1] | Assigned characters | Scripts[note 2][note 3][note 4][note 5][note 6] |
|---|---|---|---|---|---|
| 0 BMP | U+0000..U+007F | Basic Latin[note 7] | 128 | 128 | Latin (52 characters), Common (76 characters) |
| 0 BMP | U+0080..U+00FF | Latin-1 Supplement[note 8] | 128 | 128 | Latin (64 characters), Common (64 characters) |
| 0 BMP | U+0100..U+017F | Latin Extended-A | 128 | 128 | Latin |
| 0 BMP | U+0180..U+024F | Latin Extended-B | 208 | 208 | Latin |
| 0 BMP | U+0250..U+02AF | IPA Extensions | 96 | 96 | Latin |
| 0 BMP | U+02B0..U+02FF | Spacing Modifier Letters | 80 | 80 | Bopomofo (2 characters), Latin (14 characters), Common (64 characters) |
| 0 BMP | U+0300..U+036F | Combining Diacritical Marks | 112 | 112 | Inherited |
| 0 BMP | U+0370..U+03FF | Greek and Coptic | 144 | 135 | Coptic (14 characters), Greek (117 characters), Common (4 characters) |
| 0 BMP | U+0400..U+04FF | Cyrillic | 256 | 256 | Cyrillic (254 characters), Inherited (2 characters) |
| 0 BMP | U+0500..U+052F | Cyrillic Supplement | 48 | 48 | Cyrillic |
| 0 BMP | U+0530..U+058F | Armenian | 96 | 91 | Armenian |
| 0 BMP | U+0590..U+05FF | Hebrew | 112 | 88 | Hebrew |
| 0 BMP | U+0600..U+06FF | Arabic | 256 | 255 | Arabic (237 characters), Common (6 characters), Inherited (12 characters) |
| 0 BMP | U+0700..U+074F | Syriac | 80 | 77 | Syriac |
| 0 BMP | U+0750..U+077F | Arabic Supplement | 48 | 48 | Arabic |
| 0 BMP | U+0780..U+07BF | Thaana | 64 | 50 | Thaana |
| 0 BMP | U+07C0..U+07FF | NKo | 64 | 62 | Nko |
| 0 BMP | U+0800..U+083F | Samaritan | 64 | 61 | Samaritan |
| 0 BMP | U+0840..U+085F | Mandaic | 32 | 29 | Mandaic |
| 0 BMP | U+0860..U+086F | Syriac Supplement | 16 | 11 | Syriac |
| 0 BMP | U+08A0..U+08FF | Arabic Extended-A | 96 | 84 | Arabic (83 characters), Common (1 character) |
| 0 BMP | U+0900..U+097F | Devanagari | 128 | 128 | Devanagari (122 characters), Common (2 characters), Inherited (4 characters) |
| 0 BMP | U+0980..U+09FF | Bengali | 128 | 96 | Bengali |
| 0 BMP | U+0A00..U+0A7F | Gurmukhi | 128 | 80 | Gurmukhi |
| 0 BMP | U+0A80..U+0AFF | Gujarati | 128 | 91 | Gujarati |
| 0 BMP | U+0B00..U+0B7F | Oriya | 128 | 91 | Oriya |
| 0 BMP | U+0B80..U+0BFF | Tamil | 128 | 72 | Tamil |
| 0 BMP | U+0C00..U+0C7F | Telugu | 128 | 98 | Telugu |
| 0 BMP | U+0C80..U+0CFF | Kannada | 128 | 89 | Kannada |
| 0 BMP | U+0D00..U+0D7F | Malayalam | 128 | 118 | Malayalam |
| 0 BMP | U+0D80..U+0DFF | Sinhala | 128 | 91 | Sinhala |
| 0 BMP | U+0E00..U+0E7F | Thai | 128 | 87 | Thai (86 characters), Common (1 character) |
| 0 BMP | U+0E80..U+0EFF | Lao | 128 | 82 | Lao |
| 0 BMP | U+0F00..U+0FFF | Tibetan | 256 | 211 | Tibetan (207 characters), Common (4 characters) |
| 0 BMP | U+1000..U+109F | Myanmar | 160 | 160 | Myanmar |
| 0 BMP | U+10A0..U+10FF | Georgian | 96 | 88 | Georgian (87 characters), Common (1 character) |
| 0 BMP | U+1100..U+11FF | Hangul Jamo | 256 | 256 | Hangul |
| 0 BMP | U+1200..U+137F | Ethiopic | 384 | 358 | Ethiopic |
| 0 BMP | U+1380..U+139F | Ethiopic Supplement | 32 | 26 | Ethiopic |
| 0 BMP | U+13A0..U+13FF | Cherokee | 96 | 92 | Cherokee |
| 0 BMP | U+1400..U+167F | Unified Canadian Aboriginal Syllabics | 640 | 640 | Canadian Aboriginal |
| 0 BMP | U+1680..U+169F | Ogham | 32 | 29 | Ogham |
| 0 BMP | U+16A0..U+16FF | Runic | 96 | 89 | Runic (86 characters), Common (3 characters) |
| 0 BMP | U+1700..U+171F | Tagalog | 32 | 20 | Tagalog |
| 0 BMP | U+1720..U+173F | Hanunoo | 32 | 23 | Hanunoo (21 characters), Common (2 characters) |
| 0 BMP | U+1740..U+175F | Buhid | 32 | 20 | Buhid |
| 0 BMP | U+1760..U+177F | Tagbanwa | 32 | 18 | Tagbanwa |
| 0 BMP | U+1780..U+17FF | Khmer | 128 | 114 | Khmer |
| 0 BMP | U+1800..U+18AF | Mongolian | 176 | 157 | Mongolian (154 characters), Common (3 characters) |
| 0 BMP | U+18B0..U+18FF | Unified Canadian Aboriginal Syllabics Extended | 80 | 70 | Canadian Aboriginal |
| 0 BMP | U+1900..U+194F | Limbu | 80 | 68 | Limbu |
| 0 BMP | U+1950..U+197F | Tai Le | 48 | 35 | Tai Le |
| 0 BMP | U+1980..U+19DF | New Tai Lue | 96 | 83 | New Tai Lue |
| 0 BMP | U+19E0..U+19FF | Khmer Symbols | 32 | 32 | Khmer |
| 0 BMP | U+1A00..U+1A1F | Buginese | 32 | 30 | Buginese |
| 0 BMP | U+1A20..U+1AAF | Tai Tham | 144 | 127 | Tai Tham |
| 0 BMP | U+1AB0..U+1AFF | Combining Diacritical Marks Extended | 80 | 17 | Inherited |
| 0 BMP | U+1B00..U+1B7F | Balinese | 128 | 121 | Balinese |
| 0 BMP | U+1B80..U+1BBF | Sundanese | 64 | 64 | Sundanese |
| 0 BMP | U+1BC0..U+1BFF | Batak | 64 | 56 | Batak |
| 0 BMP | U+1C00..U+1C4F | Lepcha | 80 | 74 | Lepcha |
| 0 BMP | U+1C50..U+1C7F | Ol Chiki | 48 | 48 | Ol Chiki |
| 0 BMP | U+1C80..U+1C8F | Cyrillic Extended-C | 16 | 9 | Cyrillic |
| 0 BMP | U+1C90..U+1CBF | Georgian Extended | 48 | 46 | Georgian |
| 0 BMP | U+1CC0..U+1CCF | Sundanese Supplement | 16 | 8 | Sundanese |
| 0 BMP | U+1CD0..U+1CFF | Vedic Extensions | 48 | 43 | Common (16 characters), Inherited (27 characters) |
| 0 BMP | U+1D00..U+1D7F | Phonetic Extensions | 128 | 128 | Cyrillic (2 characters), Greek (15 characters), Latin (111 characters) |
| 0 BMP | U+1D80..U+1DBF | Phonetic Extensions Supplement | 64 | 64 | Greek (1 character), Latin (63 characters) |
| 0 BMP | U+1DC0..U+1DFF | Combining Diacritical Marks Supplement | 64 | 63 | Inherited |
| 0 BMP | U+1E00..U+1EFF | Latin Extended Additional | 256 | 256 | Latin |
| 0 BMP | U+1F00..U+1FFF | Greek Extended | 256 | 233 | Greek |
| 0 BMP | U+2000..U+206F | General Punctuation | 112 | 111 | Common (109 characters), Inherited (2 characters) |
| 0 BMP | U+2070..U+209F | Superscripts and Subscripts | 48 | 42 | Latin (15 characters), Common (27 characters) |
| 0 BMP | U+20A0..U+20CF | Currency Symbols | 48 | 32 | Common |
| 0 BMP | U+20D0..U+20FF | Combining Diacritical Marks for Symbols | 48 | 33 | Inherited |
| 0 BMP | U+2100..U+214F | Letterlike Symbols | 80 | 80 | Greek (1 character), Latin (4 characters), Common (75 characters) |
| 0 BMP | U+2150..U+218F | Number Forms | 64 | 60 | Latin (41 characters), Common (19 characters) |
| 0 BMP | U+2190..U+21FF | Arrows | 112 | 112 | Common |
| 0 BMP | U+2200..U+22FF | Mathematical Operators | 256 | 256 | Common |
| 0 BMP | U+2300..U+23FF | Miscellaneous Technical | 256 | 256 | Common |
| 0 BMP | U+2400..U+243F | Control Pictures | 64 | 39 | Common |
| 0 BMP | U+2440..U+245F | Optical Character Recognition | 32 | 11 | Common |
| 0 BMP | U+2460..U+24FF | Enclosed Alphanumerics | 160 | 160 | Common |
| 0 BMP | U+2500..U+257F | Box Drawing | 128 | 128 | Common |
| 0 BMP | U+2580..U+259F | Block Elements | 32 | 32 | Common |
| 0 BMP | U+25A0..U+25FF | Geometric Shapes | 96 | 96 | Common |
| 0 BMP | U+2600..U+26FF | Miscellaneous Symbols | 256 | 256 | Common |
| 0 BMP | U+2700..U+27BF | Dingbats | 192 | 192 | Common |
| 0 BMP | U+27C0..U+27EF | Miscellaneous Mathematical Symbols-A | 48 | 48 | Common |
| 0 BMP | U+27F0..U+27FF | Supplemental Arrows-A | 16 | 16 | Common |
| 0 BMP | U+2800..U+28FF | Braille Patterns | 256 | 256 | Braille |
| 0 BMP | U+2900..U+297F | Supplemental Arrows-B | 128 | 128 | Common |
| 0 BMP | U+2980..U+29FF | Miscellaneous Mathematical Symbols-B | 128 | 128 | Common |
| 0 BMP | U+2A00..U+2AFF | Supplemental Mathematical Operators | 256 | 256 | Common |
| 0 BMP | U+2B00..U+2BFF | Miscellaneous Symbols and Arrows | 256 | 253 | Common |
| 0 BMP | U+2C00..U+2C5F | Glagolitic | 96 | 94 | Glagolitic |
| 0 BMP | U+2C60..U+2C7F | Latin Extended-C | 32 | 32 | Latin |
| 0 BMP | U+2C80..U+2CFF | Coptic | 128 | 123 | Coptic |
| 0 BMP | U+2D00..U+2D2F | Georgian Supplement | 48 | 40 | Georgian |
| 0 BMP | U+2D30..U+2D7F | Tifinagh | 80 | 59 | Tifinagh |
| 0 BMP | U+2D80..U+2DDF | Ethiopic Extended | 96 | 79 | Ethiopic |
| 0 BMP | U+2DE0..U+2DFF | Cyrillic Extended-A | 32 | 32 | Cyrillic |
| 0 BMP | U+2E00..U+2E7F | Supplemental Punctuation | 128 | 83 | Common |
| 0 BMP | U+2E80..U+2EFF | CJK Radicals Supplement | 128 | 115 | Han |
| 0 BMP | U+2F00..U+2FDF | Kangxi Radicals | 224 | 214 | Han |
| 0 BMP | U+2FF0..U+2FFF | Ideographic Description Characters | 16 | 12 | Common |
| 0 BMP | U+3000..U+303F | CJK Symbols and Punctuation | 64 | 64 | Han (15 characters), Hangul (2 characters), Common (43 characters), Inherited (4 characters) |
| 0 BMP | U+3040..U+309F | Hiragana | 96 | 93 | Hiragana (89 characters), Common (2 characters), Inherited (2 characters) |
| 0 BMP | U+30A0..U+30FF | Katakana | 96 | 96 | Katakana (93 characters), Common (3 characters) |
| 0 BMP | U+3100..U+312F | Bopomofo | 48 | 43 | Bopomofo |
| 0 BMP | U+3130..U+318F | Hangul Compatibility Jamo | 96 | 94 | Hangul |
| 0 BMP | U+3190..U+319F | Kanbun | 16 | 16 | Common |
| 0 BMP | U+31A0..U+31BF | Bopomofo Extended | 32 | 32 | Bopomofo |
| 0 BMP | U+31C0..U+31EF | CJK Strokes | 48 | 36 | Common |
| 0 BMP | U+31F0..U+31FF | Katakana Phonetic Extensions | 16 | 16 | Katakana |
| 0 BMP | U+3200..U+32FF | Enclosed CJK Letters and Months | 256 | 255 | Hangul (62 characters), Katakana (47 characters), Common (146 characters) |
U+3300..U+33FFCJK Compatibility256256Katakana (88 characters), Common (168 characters)
U+3400..U+4DBFCJK Unified Ideographs Extension A6,5926,592Han
U+4DC0..U+4DFFYijing Hexagram Symbols6464Common
U+4E00..U+9FFFCJK Unified Ideographs20,99220,989Han
0 BMPU+A000..U+A48FYi Syllables1,1681,165Yi
U+A490..U+A4CFYi Radicals6455Yi
U+A4D0..U+A4FFLisu4848Lisu
U+A500..U+A63FVai320300Vai
U+A640..U+A69FCyrillic Extended-B9696Cyrillic
U+A6A0..U+A6FFBamum9688Bamum
U+A700..U+A71FModifier Tone Letters3232Common
U+A720..U+A7FFLatin Extended-D224180Latin (175 characters), Common (5 characters)
U+A800..U+A82FSyloti Nagri4845Syloti Nagri
U+A830..U+A83FCommon Indic Number Forms1610Common
0 BMPU+A840..U+A87FPhags-pa6456Phags Pa
U+A880..U+A8DFSaurashtra9682Saurashtra
U+A8E0..U+A8FFDevanagari Extended3232Devanagari
U+A900..U+A92FKayah Li4848Kayah Li (47 characters), Common (1 character)
U+A930..U+A95FRejang4837Rejang
U+A960..U+A97FHangul Jamo Extended-A3229Hangul
U+A980..U+A9DFJavanese9691Javanese (90 characters), Common (1 character)
U+A9E0..U+A9FFMyanmar Extended-B3231Myanmar
U+AA00..U+AA5FCham9683Cham
U+AA60..U+AA7FMyanmar Extended-A3232Myanmar
0 BMPU+AA80..U+AADFTai Viet9672Tai Viet
U+AAE0..U+AAFFMeetei Mayek Extensions3223Meetei Mayek
U+AB00..U+AB2FEthiopic Extended-A4832Ethiopic
U+AB30..U+AB6FLatin Extended-E6460Latin (56 characters), Greek (1 character), Common (3 characters)
U+AB70..U+ABBFCherokee Supplement8080Cherokee
U+ABC0..U+ABFFMeetei Mayek6456Meetei Mayek
U+AC00..U+D7AFHangul Syllables11,18411,172Hangul
U+D7B0..U+D7FFHangul Jamo Extended-B8072Hangul
U+D800..U+DB7FHigh Surrogates8960Unknown
U+DB80..U+DBFFHigh Private Use Surrogates1280Unknown
0 BMPU+DC00..U+DFFFLow Surrogates1,0240Unknown
U+E000..U+F8FFPrivate Use Area6,4006,400Unknown
U+F900..U+FAFFCJK Compatibility Ideographs512472Han
U+FB00..U+FB4FAlphabetic Presentation Forms8058Armenian (5 characters), Hebrew (46 characters), Latin (7 characters)
U+FB50..U+FDFFArabic Presentation Forms-A688611Arabic (609 characters), Common (2 characters)
U+FE00..U+FE0FVariation Selectors1616Inherited
U+FE10..U+FE1FVertical Forms1610Common
U+FE20..U+FE2FCombining Half Marks1616Cyrillic (2 characters), Inherited (14 characters)
U+FE30..U+FE4FCJK Compatibility Forms3232Common
U+FE50..U+FE6FSmall Form Variants3226Common
U+FE70..U+FEFFArabic Presentation Forms-B144141Arabic (140 characters), Common (1 character)
U+FF00..U+FFEFHalfwidth and Fullwidth Forms240225Hangul (52 characters), Katakana (55 characters), Latin (52 characters), Common (66 characters)
U+FFF0..U+FFFFSpecials165Common
1 SMPU+10000..U+1007FLinear B Syllabary12888Linear B
U+10080..U+100FFLinear B Ideograms128123Linear B
U+10100..U+1013FAegean Numbers6457Common
U+10140..U+1018FAncient Greek Numbers8079Greek
U+10190..U+101CFAncient Symbols6414Greek (1 character), Common (13 characters)
U+101D0..U+101FFPhaistos Disc4846Common (45 characters), Inherited (1 character)
U+10280..U+1029FLycian3229Lycian
U+102A0..U+102DFCarian6449Carian
U+102E0..U+102FFCoptic Epact Numbers3228Common (27 characters), Inherited (1 character)
U+10300..U+1032FOld Italic4839Old Italic
1 SMPU+10330..U+1034FGothic3227Gothic
U+10350..U+1037FOld Permic4843Old Permic
U+10380..U+1039FUgaritic3231Ugaritic
U+103A0..U+103DFOld Persian6450Old Persian
U+10400..U+1044FDeseret8080Deseret
U+10450..U+1047FShavian4848Shavian
U+10480..U+104AFOsmanya4840Osmanya
U+104B0..U+104FFOsage8072Osage
U+10500..U+1052FElbasan4840Elbasan
U+10530..U+1056FCaucasian Albanian6453Caucasian Albanian
1 SMPU+10600..U+1077FLinear A384341Linear A
U+10800..U+1083FCypriot Syllabary6455Cypriot
U+10840..U+1085FImperial Aramaic3231Imperial Aramaic
U+10860..U+1087FPalmyrene3232Palmyrene
U+10880..U+108AFNabataean4840Nabataean
U+108E0..U+108FFHatran3226Hatran
U+10900..U+1091FPhoenician3229Phoenician
U+10920..U+1093FLydian3227Lydian
U+10980..U+1099FMeroitic Hieroglyphs3232Meroitic Hieroglyphs
U+109A0..U+109FFMeroitic Cursive9690Meroitic Cursive
1 SMPU+10A00..U+10A5FKharoshthi9668Kharoshthi
U+10A60..U+10A7FOld South Arabian3232Old South Arabian
U+10A80..U+10A9FOld North Arabian3232Old North Arabian
U+10AC0..U+10AFFManichaean6451Manichaean
U+10B00..U+10B3FAvestan6461Avestan
U+10B40..U+10B5FInscriptional Parthian3230Inscriptional Parthian
U+10B60..U+10B7FInscriptional Pahlavi3227Inscriptional Pahlavi
U+10B80..U+10BAFPsalter Pahlavi4829Psalter Pahlavi
U+10C00..U+10C4FOld Turkic8073Old Turkic
U+10C80..U+10CFFOld Hungarian128108Old Hungarian
1 SMPU+10D00..U+10D3FHanifi Rohingya6450Hanifi Rohingya
U+10E60..U+10E7FRumi Numeral Symbols3231Arabic
U+10E80..U+10EBFYezidi6447Yezidi
U+10F00..U+10F2FOld Sogdian4840Old Sogdian
U+10F30..U+10F6FSogdian6442Sogdian
U+10FB0..U+10FDFChorasmian4828Chorasmian
U+10FE0..U+10FFFElymaic3223Elymaic
U+11000..U+1107FBrahmi128109Brahmi
U+11080..U+110CFKaithi8067Kaithi
U+110D0..U+110FFSora Sompeng4835Sora Sompeng
1 SMPU+11100..U+1114FChakma8071Chakma
U+11150..U+1117FMahajani4839Mahajani
U+11180..U+111DFSharada9696Sharada
U+111E0..U+111FFSinhala Archaic Numbers3220Sinhala
U+11200..U+1124FKhojki8062Khojki
U+11280..U+112AFMultani4838Multani
U+112B0..U+112FFKhudawadi8069Khudawadi
U+11300..U+1137FGrantha12886Grantha (85 characters), Inherited (1 character)
U+11400..U+1147FNewa12897Newa
U+11480..U+114DFTirhuta9682Tirhuta
1 SMPU+11580..U+115FFSiddham12892Siddham
U+11600..U+1165FModi9679Modi
U+11660..U+1167FMongolian Supplement3213Mongolian
U+11680..U+116CFTakri8067Takri
U+11700..U+1173FAhom6458Ahom
U+11800..U+1184FDogra8060Dogra
U+118A0..U+118FFWarang Citi9684Warang Citi
U+11900..U+1195FDives Akuru9672Dives Akuru
U+119A0..U+119FFNandinagari9665Nandinagari
U+11A00..U+11A4FZanabazar Square8072Zanabazar Square
1 SMPU+11A50..U+11AAFSoyombo9683Soyombo
U+11AC0..U+11AFFPau Cin Hau6457Pau Cin Hau
U+11C00..U+11C6FBhaiksuki11297Bhaiksuki
U+11C70..U+11CBFMarchen8068Marchen
U+11D00..U+11D5FMasaram Gondi9675Masaram Gondi
U+11D60..U+11DAFGunjala Gondi8063Gunjala Gondi
U+11EE0..U+11EFFMakasar3225Makasar
U+11FB0..U+11FBFLisu Supplement161Lisu
U+11FC0..U+11FFFTamil Supplement6451Tamil
U+12000..U+123FFCuneiform1,024922Cuneiform
1 SMPU+12400..U+1247FCuneiform Numbers and Punctuation128116Cuneiform
U+12480..U+1254FEarly Dynastic Cuneiform208196Cuneiform
U+13000..U+1342FEgyptian Hieroglyphs1,0721,071Egyptian Hieroglyphs
U+13430..U+1343FEgyptian Hieroglyph Format Controls169Egyptian Hieroglyphs
U+14400..U+1467FAnatolian Hieroglyphs640583Anatolian Hieroglyphs
U+16800..U+16A3FBamum Supplement576569Bamum
U+16A40..U+16A6FMro4843Mro
U+16AD0..U+16AFFBassa Vah4836Bassa Vah
U+16B00..U+16B8FPahawh Hmong144127Pahawh Hmong
U+16E40..U+16E9FMedefaidrin9691Medefaidrin
1 SMPU+16F00..U+16F9FMiao160149Miao
U+16FE0..U+16FFFIdeographic Symbols and Punctuation327Han (2 characters), Khitan Small Script (1 character), Nushu (1 character), Tangut (1 character), Common (2 characters)
U+17000..U+187FFTangut6,1446,136Tangut
U+18800..U+18AFFTangut Components768768Tangut
U+18B00..U+18CFFKhitan Small Script512470Khitan small script
U+18D00..U+18D8FTangut Supplement1449Tangut
U+1B000..U+1B0FFKana Supplement256256Hiragana (255 characters), Katakana (1 character)
U+1B100..U+1B12FKana Extended-A4831Hiragana
U+1B130..U+1B16FSmall Kana Extension647Hiragana (3 characters), Katakana (4 characters)
U+1B170..U+1B2FFNushu400396Nüshu
1 SMPU+1BC00..U+1BC9FDuployan160143Duployan
U+1BCA0..U+1BCAFShorthand Format Controls164Common
U+1D000..U+1D0FFByzantine Musical Symbols256246Common
U+1D100..U+1D1FFMusical Symbols256231Common (209 characters), Inherited (22 characters)
U+1D200..U+1D24FAncient Greek Musical Notation8070Greek
U+1D2E0..U+1D2FFMayan Numerals3220Common
U+1D300..U+1D35FTai Xuan Jing Symbols9687Common
U+1D360..U+1D37FCounting Rod Numerals3225Common
U+1D400..U+1D7FFMathematical Alphanumeric Symbols1,024996Common
U+1D800..U+1DAAFSutton SignWriting688672SignWriting
1 SMPU+1E000..U+1E02FGlagolitic Supplement4838Glagolitic
U+1E100..U+1E14FNyiakeng Puachue Hmong8071Nyiakeng Puachue Hmong
U+1E2C0..U+1E2FFWancho6459Wancho
U+1E800..U+1E8DFMende Kikakui224213Mende Kikakui
U+1E900..U+1E95FAdlam9688Adlam
U+1EC70..U+1ECBFIndic Siyaq Numbers8068Common
U+1ED00..U+1ED4FOttoman Siyaq Numbers8061Common
U+1EE00..U+1EEFFArabic Mathematical Alphabetic Symbols256143Arabic
U+1F000..U+1F02FMahjong Tiles4844Common
U+1F030..U+1F09FDomino Tiles112100Common
1 SMPU+1F0A0..U+1F0FFPlaying Cards9682Common
U+1F100..U+1F1FFEnclosed Alphanumeric Supplement256200Common
U+1F200..U+1F2FFEnclosed Ideographic Supplement25664Hiragana (1 character), Common (63 characters)
U+1F300..U+1F5FFMiscellaneous Symbols and Pictographs768768Common
U+1F600..U+1F64FEmoticons8080Common
U+1F650..U+1F67FOrnamental Dingbats4848Common
U+1F680..U+1F6FFTransport and Map Symbols128114Common
1 SMPU+1F700..U+1F77FAlchemical Symbols128116Common
U+1F780..U+1F7FFGeometric Shapes Extended128101Common
U+1F800..U+1F8FFSupplemental Arrows-C256150Common
U+1F900..U+1F9FFSupplemental Symbols and Pictographs256254Common
U+1FA00..U+1FA6FChess Symbols11298Common
U+1FA70..U+1FAFFSymbols and Pictographs Extended-A14457Common
U+1FB00..U+1FBFFSymbols for Legacy Computing256212Common
2 SIPU+20000..U+2A6DFCJK Unified Ideographs Extension B42,72042,718Han
U+2A700..U+2B73FCJK Unified Ideographs Extension C4,1604,149Han
U+2B740..U+2B81FCJK Unified Ideographs Extension D224222Han
U+2B820..U+2CEAFCJK Unified Ideographs Extension E5,7765,762Han
U+2CEB0..U+2EBEFCJK Unified Ideographs Extension F7,4887,473Han
U+2F800..U+2FA1FCJK Compatibility Ideographs Supplement544542Han
3 TIPU+30000..U+3134FCJK Unified Ideographs Extension G4,9444,939Han
14 SSPU+E0000..U+E007FTags12897Common
U+E0100..U+E01EFVariation Selectors Supplement240240Inherited
15 PUA-AU+F0000..U+FFFFFSupplementary Private Use Area-A65,53665,534Unknown
16 PUA-BU+100000..U+10FFFFSupplementary Private Use Area-B65,53665,534Unknown
  1. Code point count includes unassigned code points: non-character, reserved
  2. The script has one or more characters in the block, as defined by the Script property. This is independent of the block name.
  3. "Common" (Zyyy), "Inherited" (Zinh, formerly Qaai) and "Unknown" (Zzzz) refer to scripts in ISO 15924.
  4. Unicode Blocks data file. As of Unicode version 13.0
  5. UAX 24: Unicode Script Property (4 alpha code)
  6. UAX 24: Script data file
  7. Called "C0 Controls and Basic Latin" in ISO/IEC 10646
  8. Called "C1 Controls and Latin-1 Supplement" in ISO/IEC 10646
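Because every block is a single contiguous range, finding the block of a code point amounts to a range lookup. The following is a minimal Python sketch, assuming input lines in the "start..end; Block Name" format of the Unicode Blocks data file (Blocks.txt); the sample contains only a few rows from the table above, and the helper names are illustrative. Code points outside every listed range get the Unicode default value No_Block.

```python
import bisect

# A few rows from the table above, in the "start..end; Block Name" line
# format of Blocks.txt. A real program would load the whole file.
SAMPLE_BLOCKS = """\
# Blocks.txt-style sample
13A0..13FF; Cherokee
1400..167F; Unified Canadian Aboriginal Syllabics
4E00..9FFF; CJK Unified Ideographs
1F600..1F64F; Emoticons
"""

def parse_blocks(text):
    """Return parallel lists (starts, ends, names), sorted by start code point."""
    rows = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        rng, name = (part.strip() for part in line.split(";", 1))
        start, end = (int(x, 16) for x in rng.split(".."))
        rows.append((start, end, name))
    rows.sort()
    return [r[0] for r in rows], [r[1] for r in rows], [r[2] for r in rows]

def block_of(cp, starts, ends, names):
    """Binary-search for the block containing cp; 'No_Block' if there is none."""
    i = bisect.bisect_right(starts, cp) - 1  # last block starting at or before cp
    if i >= 0 and cp <= ends[i]:
        return names[i]
    return "No_Block"

starts, ends, names = parse_blocks(SAMPLE_BLOCKS)
print(block_of(0x4E00, starts, ends, names))   # CJK Unified Ideographs
print(block_of(0x1F600, starts, ends, names))  # Emoticons
print(block_of(0x0041, starts, ends, names))   # No_Block (Basic Latin is not in this sample)
```

Sorting the ranges once and using binary search keeps each lookup logarithmic in the number of blocks.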

Script

Every code point has exactly one value for its "Script" property; for an assigned character, this value identifies the script to which it belongs.[17] The value corresponds to a four-letter code in the range Aaaa-Zzzz from ISO 15924, each of which identifies a writing system. Except when describing the background and usage of a script, Unicode does not associate scripts with the languages written in them. So "Hebrew" refers to the Hebrew script, not to the Hebrew language.

The special code Zyyy, "Common", gives a single value to characters that are used in multiple scripts. The code Zinh, "Inherited", used for combining characters and certain other special-purpose code points, indicates that a character "inherits" its script identity from the character with which it is combined. (Unicode formerly used the private code Qaai for this purpose.) The code Zzzz, "Unknown", is the default value: it is used for code points that do not belong to any script, such as unassigned, private-use and surrogate code points. The characters of a single script can be scattered over multiple blocks, as Latin characters are. Conversely, a single block can contain characters from multiple scripts: the Letterlike Symbols block, for example, contains characters from the Latin, Greek and Common scripts.
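The Inherited rule can be illustrated with a short sketch. It assumes input lines in the "range ; Script" format of the Unicode Scripts data file (Scripts.txt), uses a handful of sample rows, and defaults unlisted code points to Unknown. The helper names are hypothetical, and the sketch deliberately ignores refinements such as the Script_Extensions property described in UAX #24; it only shows a combining mark taking the script of the preceding character.

```python
# Sample rows in the "range ; Script" format of Scripts.txt.
SAMPLE_SCRIPTS = """\
# Scripts.txt-style sample
0020          ; Common
0041..005A    ; Latin
0061..007A    ; Latin
0300..036F    ; Inherited
0391..03A1    ; Greek
05D0..05EA    ; Hebrew
"""

def parse_scripts(text):
    """Return a list of (first, last, script) tuples."""
    table = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        rng, script = (p.strip() for p in line.split(";", 1))
        lo, _, hi = rng.partition("..")      # single code points have no ".."
        table.append((int(lo, 16), int(hi or lo, 16), script))
    return table

def script_of(cp, table):
    """Script property value of code point cp, defaulting to Unknown (Zzzz)."""
    for lo, hi, script in table:
        if lo <= cp <= hi:
            return script
    return "Unknown"

def resolved_scripts(text, table):
    """Per-character scripts, with Inherited replaced by the preceding character's script.

    A combining mark at the very start of the text has nothing to inherit
    from and stays Inherited.
    """
    result = []
    for ch in text:
        sc = script_of(ord(ch), table)
        if sc == "Inherited" and result:
            sc = result[-1]
        result.append(sc)
    return result

table = parse_scripts(SAMPLE_SCRIPTS)
print(resolved_scripts("e\u0301", table))   # ['Latin', 'Latin']
print(resolved_scripts("A \u05D0", table))  # ['Latin', 'Common', 'Hebrew']
print(script_of(0xE000, table))             # Unknown (private use)
```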

Unicode does not leave the Script property blank. Symbols, punctuation and other characters shared across scripts, including control characters, have the value Common (Zyyy), because the ISO 15924 codes "Zmth" (Mathematical notation), "Zsym" (Symbols), and "Zsye" (Symbols, emoji variant) are not used in Unicode. Code points that are not assigned to a character, such as reserved, private-use and surrogate code points, have the value Unknown (Zzzz).

If ISO 15924 lists a Unicode alias name for a script, that name is normally used in the names of the script's characters: U+0041 A LATIN CAPITAL LETTER A, and U+05D0 א HEBREW LETTER ALEF.

ISO 15924 Script in Unicode[e]
Code ISO formal name Directionality Unicode Alias[f] Version Characters Notes Description
Adlm Adlam R-to-L Adlam 9.0 88 Ch 19.9
Afak Afaka Varies ZZ Not in Unicode, proposal is explored[18]
Aghb Caucasian Albanian L-to-R Caucasian Albanian 7.0 53 Ancient/historic Ch 8.10
Ahom Ahom, Tai Ahom L-to-R Ahom 8.0 58 Ancient/historic Ch 15.15
Arab Arabic R-to-L Arabic 1.0 1,291 Ch 9.2
Aran Arabic (Nastaliq variant) Mixed ZZ Typographic variant of Arabic (§ Arab)
Armi Imperial Aramaic R-to-L Imperial Aramaic 5.2 31 Ancient/historic Ch 10.4
Armn Armenian L-to-R Armenian 1.0 96 Ch 7.6
Avst Avestan R-to-L Avestan 5.2 61 Ancient/historic Ch 10.7
Bali Balinese L-to-R Balinese 5.0 121 Ch 17.3
Bamu Bamum L-to-R Bamum 5.2 657 Ch 19.6
Bass Bassa Vah L-to-R Bassa Vah 7.0 36 Ancient/historic Ch 19.7
Batk Batak L-to-R Batak 6.0 56 Ch 17.6
Beng Bengali (Bangla) L-to-R Bengali 1.0 96 Ch 12.2
Bhks Bhaiksuki L-to-R Bhaiksuki 9.0 97 Ancient/historic Ch 14.3
Blis Blissymbols Varies ZZ Not in Unicode, proposal is explored[18]
Bopo Bopomofo L-to-R Bopomofo 1.0 77 Ch 18.3
Brah Brahmi L-to-R Brahmi 6.0 109 Ancient/historic Ch 14.1
Brai Braille L-to-R Braille 3.0 256 Ch 21.1
Bugi Buginese L-to-R Buginese 4.1 30 Ch 17.2
Buhd Buhid L-to-R Buhid 3.2 20 Ch 17.1
Cakm Chakma L-to-R Chakma 6.1 71 Ch 13.11
Cans Unified Canadian Aboriginal Syllabics L-to-R Canadian Aboriginal 3.0 710 Ch 20.2
Cari Carian L-to-R Carian 5.1 49 Ancient/historic Ch 8.4
Cham Cham L-to-R Cham 5.1 83 Ch 16.10
Cher Cherokee L-to-R Cherokee 3.0 172 Ch 20.1
Chrs Chorasmian Mixed Chorasmian 13.0 28 Ancient/historic Ch 10.8
Cirt Cirth Varies ZZ Not in Unicode
Copt Coptic L-to-R Coptic 1.0 137 Ancient/historic, Disunified from Greek in 4.1 Ch 7.3
Cpmn Cypro-Minoan L-to-R ZZ Not in Unicode
Cprt Cypriot syllabary R-to-L Cypriot 4.0 55 Ancient/historic Ch 8.3
Cyrl Cyrillic L-to-R Cyrillic 1.0 443 Includes typographic variant Old Church Slavonic (§ Cyrs) Ch 7.4
Cyrs Cyrillic (Old Church Slavonic variant) Varies ZZ Typographic variant of Cyrillic (§ Cyrl) Ancient/historic
Deva Devanagari (Nagari) L-to-R Devanagari 1.0 154 Ch 12.1
Diak Dives Akuru L-to-R Dives Akuru 13.0 72 Ancient/historic Ch 15.14
Dogr Dogra L-to-R Dogra 11.0 60 Ancient/historic Ch 15.17
Dsrt Deseret (Mormon) L-to-R Deseret 3.1 80 Ch 20.4
Dupl Duployan shorthand, Duployan stenography L-to-R Duployan 7.0 143 Ch 21.5
Egyd Egyptian demotic Mixed ZZ Not in Unicode
Egyh Egyptian hieratic Mixed ZZ Not in Unicode
Egyp Egyptian hieroglyphs L-to-R Egyptian Hieroglyphs 5.2 1,080 Ancient/historic Ch 11.4
Elba Elbasan L-to-R Elbasan 7.0 40 Ancient/historic Ch 8.9
Elym Elymaic R-to-L Elymaic 12.0 23 Ancient/historic Ch 10.9
Ethi Ethiopic (Geʻez) L-to-R Ethiopic 3.0 495 Ch 19.1
Geok Khutsuri (Asomtavruli and Nuskhuri) L-to-R Georgian Unicode groups "Khutsuri" ("Asomtavruli" and "Nuskhuri") into the Georgian script (§ Geor), together with "Mkhedruli" and "Mtavruli" Ch 7.7
Geor Georgian (Mkhedruli and Mtavruli) L-to-R Georgian 1.0 173 In Unicode, also includes Geok (Nuskhuri) Ch 7.7
Glag Glagolitic L-to-R Glagolitic 4.1 132 Ancient/historic Ch 7.5
Gong Gunjala Gondi L-to-R Gunjala Gondi 11.0 63 Ch 13.15
Gonm Masaram Gondi L-to-R Masaram Gondi 10.0 75 Ch 13.14
Goth Gothic L-to-R Gothic 3.1 27 Ancient/historic Ch 8.8
Gran Grantha L-to-R Grantha 7.0 85 Ancient/historic Ch 15.13
Grek Greek L-to-R Greek 1.0 518 Directionality sometimes as boustrophedon Ch 7.2
Gujr Gujarati L-to-R Gujarati 1.0 91 Ch 12.4
Guru Gurmukhi L-to-R Gurmukhi 1.0 80 Ch 12.3
Hanb Han with Bopomofo (alias for Han + Bopomofo) Varies ZZ See § Hani, § Bopo
Hang Hangul (Hangŭl, Hangeul) L-to-R Hangul 1.0 11,739 Hangul syllables relocated in 2.0 Ch 18.6
Hani Han (Hanzi, Kanji, Hanja) L-to-R Han 1.0 94,204 Ch 18.1
Hano Hanunoo (Hanunóo) L-to-R Hanunoo 3.2 21 Ch 17.1
Hans Han (Simplified variant) Varies ZZ Subset of Han (Hanzi, Kanji, Hanja) (§ Hani)
Hant Han (Traditional variant) Varies ZZ Subset of § Hani
Hatr Hatran R-to-L Hatran 8.0 26 Ancient/historic Ch 10.12
Hebr Hebrew R-to-L Hebrew 1.0 134 Ch 9.1
Hira Hiragana L-to-R Hiragana 1.0 379 Ch 18.4
Hluw Anatolian Hieroglyphs (Luwian Hieroglyphs, Hittite Hieroglyphs) L-to-R Anatolian Hieroglyphs 8.0 583 Ancient/historic Ch 11.6
Hmng Pahawh Hmong L-to-R Pahawh Hmong 7.0 127 Ch 16.11
Hmnp Nyiakeng Puachue Hmong L-to-R Nyiakeng Puachue Hmong 12.0 71 Ch 16.12
Hrkt Japanese syllabaries (alias for Hiragana + Katakana) Varies Katakana or Hiragana See § Hira, § Kana Ch 18.4
Hung Old Hungarian (Hungarian Runic) R-to-L Old Hungarian 8.0 108 Ancient/historic Ch 8.7
Inds Indus (Harappan) Mixed ZZ Not in Unicode, proposal is explored[18]
Ital Old Italic (Etruscan, Oscan, etc.) L-to-R Old Italic 3.1 39 Ancient/historic Ch 8.5
Jamo Jamo (alias for Jamo subset of Hangul) Varies ZZ Subset of § Hang
Java Javanese L-to-R Javanese 5.2 90 Ch 17.4
Jpan Japanese (alias for Han + Hiragana + Katakana) Varies ZZ See § Hani, § Hira and § Kana
Jurc Jurchen L-to-R ZZ Not in Unicode
Kali Kayah Li L-to-R Kayah Li 5.1 47 Ch 16.9
Kana Katakana L-to-R Katakana 1.0 304 Ch 18.4
Khar Kharoshthi R-to-L Kharoshthi 4.1 68 Ancient/historic Ch 14.2
Khmr Khmer L-to-R Khmer 3.0 146 Ch 16.4
Khoj Khojki L-to-R Khojki 7.0 62 Ancient/historic Ch 15.7
Kitl Khitan large script L-to-R ZZ Not in Unicode
Kits Khitan small script T-to-B Khitan Small Script 13.0 471 Ancient/historic Ch 18.12
Knda Kannada L-to-R Kannada 1.0 89 Ch 12.8
Kore Korean (alias for Hangul + Han) L-to-R ZZ See § Hani, § Hang
Kpel Kpelle L-to-R ZZ Not in Unicode, proposal is explored[18]
Kthi Kaithi L-to-R Kaithi 5.2 67 Ancient/historic Ch 15.2
Lana Tai Tham (Lanna) L-to-R Tai Tham 5.2 127 Ch 16.7
Laoo Lao L-to-R Lao 1.0 82 Ch 16.2
Latf Latin (Fraktur variant) Varies ZZ Typographic variant of Latin (§ Latn)
Latg Latin (Gaelic variant) L-to-R ZZ Typographic variant of Latin (§ Latn)
Latn Latin L-to-R Latin 1.0 1,374 See also: Latin script in Unicode Ch 7.1
Leke Leke L-to-R ZZ Not in Unicode
Lepc Lepcha (Róng) L-to-R Lepcha 5.1 74 Ch 13.12
Limb Limbu L-to-R Limbu 4.0 68 Ch 13.6
Lina Linear A L-to-R Linear A 7.0 341 Ancient/historic Ch 8.1
Linb Linear B L-to-R Linear B 4.0 211 Ancient/historic Ch 8.2
Lisu Lisu (Fraser) L-to-R Lisu 5.2 49 Ch 18.9
Loma Loma L-to-R ZZ Not in Unicode, proposal is explored[18]
Lyci Lycian L-to-R Lycian 5.1 29 Ancient/historic Ch 8.4
Lydi Lydian R-to-L Lydian 5.1 27 Ancient/historic Ch 8.4
Mahj Mahajani L-to-R Mahajani 7.0 39 Ancient/historic Ch 15.6
Maka Makasar L-to-R Makasar 11.0 25 Ancient/historic Ch 17.8
Mand Mandaic, Mandaean R-to-L Mandaic 6.0 29 Ch 9.5
Mani Manichaean R-to-L Manichaean 7.0 51 Ancient/historic Ch 10.5
Marc Marchen L-to-R Marchen 9.0 68 Ancient/historic Ch 14.5
Maya Mayan hieroglyphs Mixed ZZ Not in Unicode
Medf Medefaidrin (Oberi Okaime, Oberi Ɔkaimɛ) L-to-R Medefaidrin 11.0 91 Ch 19.10
Mend Mende Kikakui R-to-L Mende Kikakui 7.0 213 Ch 19.8
Merc Meroitic Cursive R-to-L Meroitic Cursive 6.1 90 Ancient/historic Ch 11.5
Mero Meroitic Hieroglyphs R-to-L Meroitic Hieroglyphs 6.1 32 Ancient/historic Ch 11.5
Mlym Malayalam L-to-R Malayalam 1.0 118 Ch 12.9
Modi Modi, Moḍī L-to-R Modi 7.0 79 Ancient/historic Ch 15.11
Mong Mongolian T-to-B Mongolian 3.0 167 Mong includes Clear and Manchu scripts Ch 13.5
Moon Moon (Moon code, Moon script, Moon type) Mixed ZZ Not in Unicode, proposal is explored[18]
Mroo Mro, Mru L-to-R Mro 7.0 43 Ch 13.8
Mtei Meitei Mayek (Meithei, Meetei) L-to-R Meetei Mayek 5.2 79 Ch 13.7
Mult Multani L-to-R Multani 8.0 38 Ancient/historic Ch 15.9
Mymr Myanmar (Burmese) L-to-R Myanmar 3.0 223 Ch 16.3
Nand Nandinagari L-to-R Nandinagari 12.0 65 Ancient/historic Ch 15.12
Narb Old North Arabian (Ancient North Arabian) R-to-L Old North Arabian 7.0 32 Ancient/historic Ch 10.1
Nbat Nabataean R-to-L Nabataean 7.0 40 Ancient/historic Ch 10.10
Newa Newa, Newar, Newari, Nepāla lipi L-to-R Newa 9.0 97 Ch 13.3
Nkdb Naxi Dongba (na²¹ɕi³³ to³³ba²¹, Nakhi Tomba) L-to-R ZZ Not in Unicode
Nkgb Nakhi Geba (na²¹ɕi³³ gʌ²¹ba²¹, 'Na-'Khi ²Ggŏ-¹baw, Nakhi Geba) L-to-R ZZ Not in Unicode, proposal is explored[18]
Nkoo N’Ko R-to-L NKo 5.0 62 Ch 19.4
Nshu Nüshu L-to-R Nushu 10.0 397 Ch 18.8
Ogam Ogham Mixed Ogham 3.0 29 Ancient/historic Ch 8.12
Olck Ol Chiki (Ol Cemet’, Ol, Santali) L-to-R Ol Chiki 5.1 48 Ch 13.10
Orkh Old Turkic, Orkhon Runic R-to-L Old Turkic 5.2 73 Ancient/historic Ch 14.8
Orya Oriya (Odia) L-to-R Oriya 1.0 91 Ch 12.5
Osge Osage L-to-R Osage 9.0 72 Ch 20.3
Osma Osmanya L-to-R Osmanya 4.0 40 Ch 19.2
Ougr Old Uyghur Mixed ZZ Not in Unicode
Palm Palmyrene R-to-L Palmyrene 7.0 32 Ancient/historic Ch 10.11
Pauc Pau Cin Hau L-to-R Pau Cin Hau 7.0 57 Ch 16.13
Pcun Proto-Cuneiform L-to-R ZZ Not in Unicode
Pelm Proto-Elamite L-to-R ZZ Not in Unicode
Perm Old Permic L-to-R Old Permic 7.0 43 Ancient/historic Ch 8.11
Phag Phags-pa T-to-B Phags-pa 5.0 56 Ancient/historic Ch 14.4
Phli Inscriptional Pahlavi R-to-L Inscriptional Pahlavi 5.2 27 Ancient/historic Ch 10.6
Phlp Psalter Pahlavi R-to-L Psalter Pahlavi 7.0 29 Ancient/historic Ch 10.6
Phlv Book Pahlavi Mixed ZZ Not in Unicode
Phnx Phoenician R-to-L Phoenician 5.0 29 Ancient/historic[g] Ch 10.3
Piqd Klingon (KLI pIqaD) L-to-R ZZ Rejected for inclusion in Unicode[19][20]
Plrd Miao (Pollard) L-to-R Miao 6.1 149 Ch 18.10
Prti Inscriptional Parthian R-to-L Inscriptional Parthian 5.2 30 Ancient/historic Ch 10.6
Psin Proto-Sinaitic Mixed ZZ Not in Unicode
Qaaa-Qabx Reserved for private use (range) ZZ Not in Unicode
Ranj Ranjana L-to-R ZZ Not in Unicode
Rjng Rejang (Redjang, Kaganga) L-to-R Rejang 5.1 37 Ch 17.5
Rohg Hanifi Rohingya R-to-L Hanifi Rohingya 11.0 50 Ch 16.14
Roro Rongorongo Mixed ZZ Not in Unicode, proposal is explored[18]
Runr Runic L-to-R Runic 3.0 86 Ancient/historic Ch 8.6
Samr Samaritan R-to-L Samaritan 5.2 61 Ch 9.4
Sara Sarati Mixed ZZ Not in Unicode
Sarb Old South Arabian R-to-L Old South Arabian 5.2 32 Ancient/historic Ch 10.2
Saur Saurashtra L-to-R Saurashtra 5.1 82 Ch 13.13
Sgnw SignWriting T-to-B SignWriting 8.0 672 Ch 21.6
Shaw Shavian (Shaw) L-to-R Shavian 4.0 48 Ch 8.13
Shrd Sharada, Śāradā L-to-R Sharada 6.1 96 Ch 15.3
Shui Shuishu L-to-R ZZ Not in Unicode
Sidd Siddham, Siddhaṃ, Siddhamātṛkā L-to-R Siddham 7.0 92 Ancient/historic Ch 15.5
Sind Khudawadi, Sindhi L-to-R Khudawadi 7.0 69 Ch 15.8
Sinh Sinhala L-to-R Sinhala 3.0 111 Ch 13.2
Sogd Sogdian R-to-L Sogdian 11.0 42 Ancient/historic Ch 14.10
Sogo Old Sogdian R-to-L Old Sogdian 11.0 40 Ancient/historic Ch 14.9
Sora Sora Sompeng L-to-R Sora Sompeng 6.1 35 Ch 15.16
Soyo Soyombo L-to-R Soyombo 10.0 83 Ancient/historic Ch 14.7
Sund Sundanese L-to-R Sundanese 5.1 72 Ch 17.7
Sylo Syloti Nagri L-to-R Syloti Nagri 4.1 45 Ancient/historic Ch 15.1
Syrc Syriac R-to-L Syriac 3.0 88 Includes variants Estrangelo (§ Syre), Western (§ Syrj), and Eastern (§ Syrn) Ch 9.3
Syre Syriac (Estrangelo variant) Mixed ZZ Typographic variant of Syriac (§ Syrc)
Syrj Syriac (Western variant) Mixed ZZ Typographic variant of Syriac (§ Syrc)
Syrn Syriac (Eastern variant) Mixed ZZ Typographic variant of Syriac (§ Syrc)
Tagb Tagbanwa L-to-R Tagbanwa 3.2 18 Ch 17.1
Takr Takri, Ṭākrī, Ṭāṅkrī L-to-R Takri 6.1 67 Ch 15.4
Tale Tai Le L-to-R Tai Le 4.0 35 Ch 16.5
Talu New Tai Lue L-to-R New Tai Lue 4.1 83 Ch 16.6
Taml Tamil L-to-R Tamil 1.0 123 Ch 12.6
Tang Tangut L-to-R Tangut 9.0 6,914 Ancient/historic Ch 18.11
Tavt Tai Viet L-to-R Tai Viet 5.2 72 Ch 16.8
Telu Telugu L-to-R Telugu 1.0 98 Ch 12.7
Teng Tengwar L-to-R ZZ Not in Unicode
Tfng Tifinagh (Berber) L-to-R Tifinagh 4.1 59 Ch 19.3
Tglg Tagalog (Baybayin, Alibata) L-to-R Tagalog 3.2 20 Ch 17.1
Thaa Thaana R-to-L Thaana 3.0 50 Ch 13.1
Thai Thai L-to-R Thai 1.0 86 Ch 16.1
Tibt Tibetan L-to-R Tibetan 2.0 207 Added in 1.0, removed in 1.1 and reintroduced in 2.0 Ch 13.4
Tirh Tirhuta L-to-R Tirhuta 7.0 82 Ch 15.10
Toto Toto L-to-R ZZ Not in Unicode
Ugar Ugaritic L-to-R Ugaritic 4.0 31 Ancient/historic Ch 11.2
Vaii Vai L-to-R Vai 5.1 300 Ch 19.5
Visp Visible Speech L-to-R ZZ Not in Unicode
Wara Warang Citi (Varang Kshiti) L-to-R Warang Citi 7.0 84 Ch 13.9
Wcho Wancho L-to-R Wancho 12.0 59 Ch 13.16
Wole Woleai Mixed ZZ Not in Unicode, proposal is explored[18]
Xpeo Old Persian L-to-R Old Persian 4.1 50 Ancient/historic Ch 11.3
Xsux Cuneiform, Sumero-Akkadian L-to-R Cuneiform 5.0 1,234 Ancient/historic Ch 11.1
Yezi Yezidi R-to-L Yezidi 13.0 47 Ancient/historic Ch 9.6
Yiii Yi L-to-R Yi 3.0 1,220 Ch 18.7
Zanb Zanabazar Square (Zanabazarin Dörböljin Useg, Xewtee Dörböljin Bicig, Horizontal Square Script) L-to-R Zanabazar Square 10.0 72 Ancient/historic Ch 14.6
Zinh Code for inherited script Inherited Inherited 573
Zmth Mathematical notation L-to-R ZZ Not a 'script' in Unicode
Zsym Symbols ZZ Not a 'script' in Unicode
Zsye Symbols (emoji variant) ZZ Not a 'script' in Unicode
Zxxx Code for unwritten documents ZZ Not a 'script' in Unicode
Zyyy Code for undetermined script Mixed Common 8,087
Zzzz Code for uncoded script Unknown 970,188 In Unicode: All other code points
Notes
  a. ISO 15924 publications, as of 25 January 2021
  b. ISO 15924 normative text file, as of 25 January 2021
  c. ISO 15924 changes (including aliases for Unicode), as of 25 January 2021
  d. Unicode version 13.0
  f. Unicode uses the "Property Value Alias" (Alias) as the script name. These Alias names are part of Unicode and are published informatively next to ISO 15924. An alias script name may be used in a character name: Palm, Palmyrene U+10860 𐡠 PALMYRENE LETTER ALEPH.
  g. In Unicode, the Phoenician script is intended for the representation of text in Paleo-Hebrew, Archaic Phoenician, Phoenician, Early Aramaic, Late Phoenician cursive, Phoenician papyri, Siloam Hebrew, Hebrew seals, Ammonite, Moabite, and Punic.[21]

Normalization properties

Normalization-related properties include the decomposition mapping, the decomposition type, the canonical combining class, and composition exclusions, among others.
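Several of these properties can be queried through Python's standard unicodedata module, which reflects the Unicode version bundled with the interpreter (not necessarily 13.0). In the sketch below, decomposition() returns the decomposition mapping, prefixed by a tag such as <compat> for compatibility decompositions, and combining() returns the canonical combining class. The helper decomposition_type is hypothetical: it simply reads the type off that tag and is not part of the module.

```python
import unicodedata

def decomposition_type(ch):
    """Hypothetical helper: derive the decomposition type from the mapping string.

    Returns None when there is no decomposition, 'canonical' when the mapping
    carries no tag, and otherwise the tag text (for example 'compat').
    """
    mapping = unicodedata.decomposition(ch)
    if not mapping:
        return None
    if mapping.startswith("<"):
        return mapping[1:mapping.index(">")]
    return "canonical"

e_acute = "\u00E9"  # LATIN SMALL LETTER E WITH ACUTE

print(unicodedata.decomposition(e_acute))   # 0065 0301
print(decomposition_type(e_acute))          # canonical
print(unicodedata.decomposition("\u0149"))  # <compat> 02BC 006E
print(decomposition_type("\u0149"))         # compat
print(unicodedata.combining("\u0301"))      # 230 (canonical combining class "Above")
# NFD applies canonical decompositions; NFC recomposes them.
print(unicodedata.normalize("NFD", e_acute) == "e\u0301")  # True
print(unicodedata.normalize("NFC", "e\u0301") == e_acute)  # True
```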

Age

Age is the version of the Standard in which the code point was first designated. Only the major.minor part of the version number is used, even though releases carry more detailed version numbers: versions 4.0.0 and 4.0.1 are both represented as Age 4.0. Given the releases, Age can take the values 1.1, 2.0, 2.1, 3.0, 3.1, 3.2, 4.0, 4.1, 5.0, 5.1, 5.2, 6.0, 6.1, 6.2, 6.3, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 12.1, and 13.0.[22] The long values for Age begin with a V and use an underscore instead of a dot: V1_1, for example.[2] Code points without an assigned age value have the value "NA", with the long form "Unassigned".
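Age values are version numbers, so they must be compared numerically: compared as strings, "10.0" would sort before "2.0". Python's standard unicodedata module does not expose Age, so the following sketch works only on the short and long value forms described above. All function names are hypothetical.

```python
def age_key(age):
    """Numeric sort key for a short Age value, so that '10.0' sorts after '2.0'."""
    major, minor = age.split(".")
    return (int(major), int(minor))

def to_long(age):
    """Short Age value ('1.1', 'NA') -> long value ('V1_1', 'Unassigned')."""
    return "Unassigned" if age == "NA" else "V" + age.replace(".", "_")

def from_long(long_age):
    """Long Age value ('V12_1', 'Unassigned') -> short value ('12.1', 'NA')."""
    return "NA" if long_age == "Unassigned" else long_age[1:].replace("_", ".")

def assigned_in(age, version):
    """True if a code point with this Age had been designated by `version`."""
    return age != "NA" and age_key(age) <= age_key(version)

print(sorted(["10.0", "2.0", "1.1", "6.3"], key=age_key))  # ['1.1', '2.0', '6.3', '10.0']
print(to_long("1.1"), to_long("NA"))                       # V1_1 Unassigned
print(assigned_in("2.1", "3.0"), assigned_in("13.0", "12.1"))  # True False
```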

Deprecated

Once a character has been encoded, it is never withdrawn, and its defining properties (code point and name) do not change. It can, however, be declared deprecated: "a coded character whose use is strongly discouraged".[23] As of Unicode version 10.0, fifteen characters are deprecated:

  • U+0149 LATIN SMALL LETTER N PRECEDED BY APOSTROPHE: use the sequence 02BC 006E (ʼn) instead
  • U+0673 ARABIC LETTER ALEF WITH WAVY HAMZA BELOW: use the sequence 0627 065F (اٟ) instead
  • U+0F77 TIBETAN VOWEL SIGN VOCALIC RR: use the sequence 0FB2 0F81 (ྲཱྀ) instead
  • U+0F79 TIBETAN VOWEL SIGN VOCALIC LL: use the sequence 0FB3 0F81 (ླཱྀ) instead
  • U+17A3 KHMER INDEPENDENT VOWEL QAQ: use 17A2 KHMER LETTER QA (អ) instead
  • U+17A4 KHMER INDEPENDENT VOWEL QAA: use the sequence 17A2 17B6 (អា) instead
  • U+206A INHIBIT SYMMETRIC SWAPPING
  • U+206B ACTIVATE SYMMETRIC SWAPPING
  • U+206C INHIBIT ARABIC FORM SHAPING
  • U+206D ACTIVATE ARABIC FORM SHAPING
  • U+206E NATIONAL DIGIT SHAPES
  • U+206F NOMINAL DIGIT SHAPES
  • U+2329 LEFT-POINTING ANGLE BRACKET: use U+3008 LEFT ANGLE BRACKET (〈) instead
  • U+232A RIGHT-POINTING ANGLE BRACKET: use U+3009 RIGHT ANGLE BRACKET (〉) instead
  • U+E0001 LANGUAGE TAG

The format characters U+206A through U+206F and U+E0001 should not be used at all, but for the other deprecated characters there are recommended alternatives, as shown above.
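A text-cleaning step can apply the recommendations above directly. The following Python sketch builds a table from the list, reports any deprecated characters it finds, and substitutes the recommended alternatives with str.translate. Because U+206A through U+206F and U+E0001 have no recommended alternative, this sketch simply deletes them; that choice, like the function names, is an assumption made for illustration.

```python
# Recommended replacements for the deprecated characters listed above.
# Mapping to "" deletes the character; doing so for the six format
# characters and LANGUAGE TAG is this sketch's assumption.
DEPRECATED = {
    0x0149: "\u02BC\u006E",  # LATIN SMALL LETTER N PRECEDED BY APOSTROPHE
    0x0673: "\u0627\u065F",  # ARABIC LETTER ALEF WITH WAVY HAMZA BELOW
    0x0F77: "\u0FB2\u0F81",  # TIBETAN VOWEL SIGN VOCALIC RR
    0x0F79: "\u0FB3\u0F81",  # TIBETAN VOWEL SIGN VOCALIC LL
    0x17A3: "\u17A2",        # KHMER INDEPENDENT VOWEL QAQ
    0x17A4: "\u17A2\u17B6",  # KHMER INDEPENDENT VOWEL QAA
    0x2329: "\u3008",        # LEFT-POINTING ANGLE BRACKET
    0x232A: "\u3009",        # RIGHT-POINTING ANGLE BRACKET
    **{cp: "" for cp in range(0x206A, 0x2070)},  # U+206A..U+206F
    0xE0001: "",             # LANGUAGE TAG
}

def find_deprecated(text):
    """List the deprecated characters in text as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in text if ord(ch) in DEPRECATED]

def replace_deprecated(text):
    """Substitute the recommended alternatives (str.translate accepts an int -> str dict)."""
    return text.translate(DEPRECATED)

sample = "\u2329x\u232A"
print(find_deprecated(sample))                   # ['U+2329', 'U+232A']
print(replace_deprecated(sample) == "\u3008x\u3009")  # True
```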

Boundaries

The Unicode Standard specifies the following boundary-related properties:

  • Grapheme cluster
  • Word
  • Line
  • Sentence

References

  1. "The Unicode Standard, Chapter 4: Character Properties" (PDF). Unicode, Inc. March 2020. Retrieved 2020-03-15.
  2. "Unicode Standard Annex #44: Unicode Character Database". The Unicode Standard. 2017-06-14.
  3. "UCD: Name Aliases". Unicode Character Database. Unicode Consortium. 2019-03-08.
  4. "Character design standards – space characters". Character design standards. Microsoft. 1998–1999. Archived from the original on August 23, 2000. Retrieved 2009-05-18.
  5. The Unicode Standard 5.0, printed edition, p.205
  6. "General Punctuation" (PDF). The Unicode Standard 5.1. Unicode Inc. 1991–2008. Retrieved 2009-05-13.
  7. Sargent, Murray III (2006-08-29). "Unicode Nearly Plain Text Encoding of Mathematics (Version 2)". Unicode Technical Note #28. Unicode Inc. pp. 19–20. Retrieved 2009-05-19.
  8. Gillam, Richard (2002). Unicode Demystified: A Practical Programmer's Guide to the Encoding Standard. Addison-Wesley. ISBN 0-201-70052-2.
  9. Hickson, Ian. "12.5 Named character references". HTML Standard. WHATWG.
  10. Wolfram. "\[NegativeThickSpace]". Wolfram Language Documentation.
  11. Wolfram. "\[NegativeMediumSpace]". Wolfram Language Documentation.
  12. Wolfram. "\[NegativeThinSpace]". Wolfram Language Documentation.
  13. Wolfram. "\[NegativeVeryThinSpace]". Wolfram Language Documentation.
  14. Faltstrom, P., ed. (August 2010). "Zero Width Non-Joiner". The Unicode Code Points and Internationalized Domain Names for Applications (IDNA). IETF. sec. A.1. doi:10.17487/RFC5892. RFC 5892. Retrieved September 4, 2019.
  15. Faltstrom, P., ed. (August 2010). "Zero Width Joiner". The Unicode Code Points and Internationalized Domain Names for Applications (IDNA). IETF. sec. A.2. doi:10.17487/RFC5892. RFC 5892. Retrieved September 4, 2019.
  16. "Unicode Standard Annex #9: Unicode Bidirectional Algorithm". The Unicode Standard. 2017-05-14.
  17. "Unicode Standard Annex #24: Unicode Script Property". The Unicode Standard. 2015-06-01.
  18. "Proposed New Scripts". Unicode Consortium. 2018-05-25. Retrieved 2019-09-12.
  19. Michael Everson (1997-09-18). "Proposal to encode Klingon in Plane 1 of ISO/IEC 10646-2".
  20. The Unicode Consortium (2001-08-14). "Approved Minutes of the UTC 87 / L2 184 Joint Meeting".
  21. "Middle East-II, Ancient Scripts" (PDF). The Unicode Standard, Version 13.0.0. The Unicode Consortium. Retrieved 2021-01-28.
  22. "UCD: Derived Age". Unicode Character Database. Unicode Consortium. 2019-09-08.
  23. "The Unicode Standard, Chapter 3.4 Characters and Encoding, D13: Deprecated character" (PDF). The Unicode Standard. March 2020.
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.