blob: d97210ac346e8b2a21218f9fc9650329e285f1c6 [file] [log] [blame]
# ScriptMetadata.txt
# Copyright © 1991-2016 Unicode, Inc.
# CLDR data files are interpreted according to the LDML specification (http://unicode.org/reports/tr35/)
# For terms of use, see http://www.unicode.org/copyright.html
#
# This file provides general information about scripts that may be useful to implementations processing text.
# The information is the best currently available, and may change between versions of CLDR.
#
# Format:
# The data is not in XML; instead it uses the semicolon-delimited format from the Unicode Character Database (UCD).
# This is so that parsers of the UCD can more easily be adapted to read the data.
# Additional fields may be added in future versions; parsers may be designed to ignore those fields until they are revised.
#
# Field - Description
#
# 0 - Script Identifier
# 1 - Web Rank:
# The approximate rank of this script from a large sample of the web,
# in terms of the number of characters found in that script.
# Below 32 the ranking is not statistically significant.
# 2 - Sample Character:
# A sample character for use in "Last Resort" style fonts.
# For printing the combining mark for Zinh in a chart, U+25CC can be prepended.
# See http://unicode.org/policies/lastresortfont_eula.html
# 3 - Origin country:
# The approximate area where the script originated, expressed as a BCP47 region code.
# 4 - Density:
# The approximate information density of characters in this script, based on comparison of bilingual texts.
# 5 - ID Usage:
# The usage for IDs (tables 4-7) according to UAX #31.
# For a description of values, see
# http://unicode.org/reports/tr31/#Table_Candidate_Characters_for_Exclusion_from_Identifiers
# 6 - RTL:
# YES if the script is RTL
# Derived from whether the script contains RTL letters according to the Bidi_Class property
# 7 - LB letters:
# YES if the major languages using the script allow linebreaks between letters (excluding hyphenation).
# Derived from LB property.
# 8 - Shaping Required:
# YES if shaping is required for the major languages using that script for NFC text.
# This includes not only ligation (and Indic conjuncts), Indic vowel splitting/reordering, and
# Arabic-style contextual shaping, but also cases where NSM placement is required, like Thai.
# MIN if NSM placement is sufficient, not the more complex shaping.
# The NSM placement may only be necessary for some major languages using the script.
# 9 - Input Method Engine Required:
# YES if the major languages using the script require IMEs.
# In particular, users (of languages for that script) would be accustomed to using IMEs (such as Japanese)
# and typical commercial products for those languages would need IME support in order to be competitive.
# 10- Cased
# YES if in modern (or most recent) usage case distinctions are customary.
#
# Sometimes a script is included here before it is added in the Unicode Standard.
# Such scripts are marked with a "provisional" comment.
#
# Note: For the most likely language for each script, see
# http://unicode.org/repos/cldr-tmp/trunk/diff/supplemental/likely_subtags.html
#