 # ScriptMetadata.txt
 # Copyright © 1991-2016 Unicode, Inc.
 # CLDR data files are interpreted according to the LDML specification (http://unicode.org/reports/tr35/)
 # For terms of use, see http://www.unicode.org/copyright.html
 #
 # This file provides general information about scripts that may be useful to implementations processing text.
 # The information is the best currently available, and may change between versions of CLDR.
 #
 # Format:
 #	The data is not in XML; instead it uses the semicolon-delimited format from the Unicode Character Database (UCD).
 #	This is so that parsers of the UCD can more easily be adapted to read the data.
 #	Additional fields may be added in future versions; parsers may be designed to ignore those fields until they are revised.
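 #
 #	As an illustration only, a reader for this format might look like the Python sketch below.
 #	The names parse_line and KNOWN_FIELD_COUNT and the sample line are hypothetical, not part of CLDR,
 #	and the sketch assumes that "#" always starts a comment and never occurs inside a field value.

```python
KNOWN_FIELD_COUNT = 11  # assumption: this reader understands fields 0 through 10


def parse_line(line, known=KNOWN_FIELD_COUNT):
    """Return the first `known` fields of a data line, or None for a comment or blank line."""
    # Assumption: "#" always begins a comment and never appears inside a field.
    content = line.split("#", 1)[0].strip()
    if not content:
        return None
    fields = [part.strip() for part in content.split(";")]
    if len(fields) < known:
        raise ValueError(f"expected at least {known} fields, got {len(fields)}")
    # Fields beyond the known ones are dropped, so later additions do not break the reader.
    return fields[:known]


# Hypothetical line (made-up values, private-use script code) with one extra trailing field.
sample = "Qaaa; 40; 0041; ZZ; 1; EXCLUSION; NO; NO; NO; NO; NO; FUTURE # illustrative only"
fields = parse_line(sample)
```

 #	Truncating to the known field count is one way to tolerate fields added in later versions;
 #	a stricter reader could instead log the extra fields it skipped.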
 #
 # Field - Description
 #
 # 0 - Script Identifier
 # 1 - Web Rank:
 #		The approximate rank of this script from a large sample of the web,
 #		in terms of the number of characters found in that script.
 #		Below 32 the ranking is not statistically significant.
 # 2 - Sample Character:
 #		A sample character for use in "Last Resort" style fonts.
 #		For printing the combining mark for Zinh in a chart, U+25CC can be prepended.
 #		See http://unicode.org/policies/lastresortfont_eula.html
 # 3 - Origin country:
 #		The approximate area where the script originated, expressed as a BCP47 region code.
 # 4 - Density:
 #		The approximate information density of characters in this script, based on comparison of bilingual texts.
 # 5 - ID Usage:
 #		The usage for IDs (tables 4-7) according to UAX #31.
 #		For a description of values, see
 #		http://unicode.org/reports/tr31/#Table_Candidate_Characters_for_Exclusion_from_Identifiers
 # 6 - RTL:
 #		YES if the script is right-to-left (RTL).
 #		Derived from whether the script contains RTL letters according to the Bidi_Class property.
 # 7 - LB letters:
 #		YES if the major languages using the script allow line breaks between letters (excluding hyphenation).
 #		Derived from the Line_Break (LB) property.
 # 8 - Shaping Required:
 #		YES if shaping is required for the major languages using that script for NFC text.
 #			This includes not only ligation (and Indic conjuncts), Indic vowel splitting/reordering, and
 #			Arabic-style contextual shaping, but also cases where NSM placement is required, like Thai.
 #		MIN if NSM placement is sufficient, not the more complex shaping.
 #			The NSM placement may only be necessary for some major languages using the script.
 # 9 - Input Method Engine Required:
 #		YES if the major languages using the script require IMEs.
 #		In particular, users (of languages for that script) would be accustomed to using IMEs (as with Japanese)
 #		and typical commercial products for those languages would need IME support in order to be competitive.
 # 10 - Cased:
 #		YES if in modern (or most recent) usage case distinctions are customary.
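 #
 #	The field list above could be mapped onto a typed record roughly as in the Python sketch below.
 #	The class ScriptInfo, the function to_script_info, and the sample values are hypothetical.
 #	The sketch assumes that Web Rank is a decimal integer and that the flag fields are spelled YES or NO;
 #	Sample Character, Origin, Density, and ID Usage are kept as raw strings because their exact
 #	encoding is not specified here. Shaping is kept as text since it can also take the value MIN.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScriptInfo:
    script: str        # field 0
    web_rank: int      # field 1 (assumed to be a decimal integer)
    sample_char: str   # field 2, kept exactly as written in the file
    origin: str        # field 3
    density: str       # field 4, kept as raw text
    id_usage: str      # field 5
    rtl: bool          # field 6
    lb_letters: bool   # field 7
    shaping: str       # field 8: e.g. "YES" or "MIN"
    ime: bool          # field 9
    cased: bool        # field 10


def _is_yes(value):
    """Treat only the literal YES (case-insensitive) as true."""
    return value.strip().upper() == "YES"


def to_script_info(fields):
    """Build a ScriptInfo from at least 11 already-split fields; extras are ignored."""
    if len(fields) < 11:
        raise ValueError(f"expected at least 11 fields, got {len(fields)}")
    f = [value.strip() for value in fields[:11]]
    return ScriptInfo(
        script=f[0], web_rank=int(f[1]), sample_char=f[2], origin=f[3],
        density=f[4], id_usage=f[5], rtl=_is_yes(f[6]), lb_letters=_is_yes(f[7]),
        shaping=f[8].upper(), ime=_is_yes(f[9]), cased=_is_yes(f[10]),
    )


# Hypothetical values for a private-use script code; not taken from the real data.
info = to_script_info("Qaaa; 40; 0041; ZZ; 1; EXCLUSION; YES; NO; MIN; NO; NO".split(";"))
```

 #	Keeping Shaping as a three-way value rather than a boolean lets callers distinguish
 #	scripts that need only NSM placement (MIN) from those that need full shaping (YES).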
 #
 # Sometimes a script is included here before it is added in the Unicode Standard.
 # Such scripts are marked with a "provisional" comment.
 #
 # Note: For the most likely language for each script, see
 #		http://unicode.org/repos/cldr-tmp/trunk/diff/supplemental/likely_subtags.html
 #