| # ScriptMetadata.txt |
| # Copyright © 1991-2016 Unicode, Inc. |
| # CLDR data files are interpreted according to the LDML specification (http://unicode.org/reports/tr35/) |
| # For terms of use, see http://www.unicode.org/copyright.html |
| # |
| # This file provides general information about scripts that may be useful to implementations processing text. |
| # The information is the best currently available, and may change between versions of CLDR. |
| # |
| # Format: |
| # The data is not in XML; instead it uses the semicolon-delimited format from the Unicode Character Database (UCD). |
| # This is so that parsers of the UCD can more easily be adapted to read the data. |
| # Additional fields may be added in future versions; parsers may be designed to ignore those fields until they are revised. |
| # |
| # Field - Description |
| # |
| # 0 - Script Identifier |
| # 1 - Web Rank: |
| # The approximate rank of this script from a large sample of the web, |
| # in terms of the number of characters found in that script. |
| # Below 32 the ranking is not statistically significant. |
| # 2 - Sample Character: |
| # A sample character for use in "Last Resort" style fonts. |
| # For printing the combining mark for Zinh in a chart, U+25CC can be prepended. |
| # See http://unicode.org/policies/lastresortfont_eula.html |
| # 3 - Origin country: |
| # The approximate area where the script originated, expressed as a BCP47 region code. |
| # 4 - Density: |
| # The approximate information density of characters in this script, based on comparison of bilingual texts. |
| # 5 - ID Usage: |
| # The usage for IDs (tables 4-7) according to UAX #31. |
| # For a description of values, see |
| # http://unicode.org/reports/tr31/#Table_Candidate_Characters_for_Exclusion_from_Identifiers |
| # 6 - RTL: |
| # YES if the script is RTL |
| # Derived from whether the script contains RTL letters according to the Bidi_Class property |
| # 7 - LB letters: |
| # YES if the major languages using the script allow linebreaks between letters (excluding hyphenation). |
| # Derived from LB property. |
| # 8 - Shaping Required: |
| # YES if shaping is required for the major languages using that script for NFC text. |
| # This includes not only ligation (and Indic conjuncts), Indic vowel splitting/reordering, and |
| # Arabic-style contextual shaping, but also cases where NSM placement is required, like Thai. |
| # MIN if NSM placement is sufficient, not the more complex shaping. |
| # The NSM placement may only be necessary for some major languages using the script. |
| # 9 - Input Method Engine Required: |
| # YES if the major languages using the script require IMEs. |
| # In particular, users (of languages for that script) would be accustomed to using IMEs (such as Japanese) |
| # and typical commercial products for those languages would need IME support in order to be competitive. |
| # 10- Cased |
| # YES if in modern (or most recent) usage case distinctions are customary. |
| # |
| # Sometimes a script is included here before it is added in the Unicode Standard. |
| # Such scripts are marked with a "provisional" comment. |
| # |
| # Note: For the most likely language for each script, see |
| # http://unicode.org/repos/cldr-tmp/trunk/diff/supplemental/likely_subtags.html |
| # |