Internet Engineering Task Force M. Davis
Internet-Draft Google
Intended status: BCP A. Phillips
Expires: July 17, 2010 Lab126
Y. Umaoka
IBM
January 13, 2010
BCP 47 Extension U
draft-davis-u-langtag-ext-00
Abstract
This document specifies an Extension to BCP 47 which provides subtags
that specify language and/or locale-based behavior or refinements to
language tags, according to work done by the Unicode Consortium.
Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on July 17, 2010.
Copyright Notice
Copyright (c) 2010 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
Davis, et al. Expires July 17, 2010 [Page 1]
Internet-Draft BCP 47 Unicode Locale Extension January 2010
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the BSD License.
Table of Contents
1. Introduction
1.1. Requirements Language
2. BCP47 Required Information
2.1. Summary
2.1.1. Canonicalization
2.2. Registration Form
3. Acknowledgements
4. IANA Considerations
5. Security Considerations
6. References
6.1. Normative References
6.2. Informative References
Authors' Addresses
1. Introduction
[BCP47] permits the definition and registration of language tag
extensions "that contain a language component and are compatible with
applications that understand language tags". This document defines
an extension for identifying Unicode locale-based variations using
language tags. The "singleton" identifier for this extension is 'u'.
1.1. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119.
2. BCP47 Required Information
Language tags, as defined by [BCP47], are useful for identifying the
language of content. They are also used as locale identifiers (or
can be mapped to locales) in many operating environments and APIs.
However, most such locale identifiers also provide additional
"tailorings" or options for specific values within a language,
culture, region, or other variation. This extension provides a
mechanism for using these additional tailorings within language tags
for general interchange.
The maintaining authority for this extension's registry is the
Unicode Consortium. Unicode defines common locale data and
identifiers for this data:
+---------------+---------------------------------------------------+
| Item | Value |
+---------------+---------------------------------------------------+
| Name | Unicode Consortium |
| Contact Email | cldr@unicode.org |
| Discussion | cldr-users@unicode.org |
| List Email | |
| URL Location | cldr.unicode.org |
| Specification | Unicode Technical Standard #35 Unicode Locale |
| | Data Markup Language (LDML), |
| | http://unicode.org/reports/tr35/ |
| Section | Section 3.2 BCP 47 Tag Conversion |
+---------------+---------------------------------------------------+
The specification of extension subtags is provided by Section 3 of
Unicode Technical Standard #35 Unicode Locale Data Markup Language
[LDML]. As required by BCP 47, subtags follow the language tag ABNF
and other rules for the formation of language tags and subtags, are
restricted to the ASCII letters and digits, are not case sensitive,
and do not exceed eight characters in length.
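The constraints above can be checked mechanically. The following is a minimal sketch, not part of the extension's specification: the function name is hypothetical, and the lower bound of two characters is an assumption taken from the two-character 'key' subtags described in Section 2.1.

```python
import re

# Assumption: a subtag in this extension is 2 to 8 ASCII letters or digits
# (2 for keys, 3 to 8 for attributes and types). Case is not significant,
# so both upper and lower case letters are accepted.
_SUBTAG = re.compile(r"[A-Za-z0-9]{2,8}")


def is_wellformed_subtag(subtag: str) -> bool:
    """Return True if the subtag satisfies the length and ASCII rules."""
    return _SUBTAG.fullmatch(subtag) is not None
```

This checks form only. Whether a subtag is actually defined is determined by the data published with each version of [LDML].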
[LDML] specifies a canonical representation. LDML is available over
the Internet at no cost under a royalty-free license; see
http://unicode.org/copyright.html. LDML is versioned, and each
version of LDML is numbered, dated, and stable. Extension subtags,
once defined by LDML, are never retracted and never change in meaning
in a substantial way.
2.1. Summary
The subtags available for use in the 'u' extension consist of a set
of attributes, keys, and types. Attributes, keys, types, and their
respective meanings are defined in Section 3 (Unicode Language and
Locale Identifiers) of [LDML]. The following is a summary of that
definition (for details see Section 3):
o An 'attribute' is a subtag with a length of three or more
characters following the singleton and preceding any 'keyword'
sequences. No attributes were defined at the time of this
document's publication.
o A 'keyword' is a sequence of subtags consisting of a 'key' subtag,
followed by zero or more 'type' subtags. Each 'key' MUST be
unique within the extension. The order of the 'type' subtags
within a 'keyword' is sometimes significant to their
interpretation. Note that 'keys' can appear without a subsequent
'type' subtag.
A. A 'key' is a subtag with a length of exactly two characters.
Each 'key' is followed by zero or more 'type' subtags.
B. A 'type' is a subtag with a length of three or more characters
following a key. 'Type' subtags are specific to a particular
'key' and the order of the 'type' subtags MAY be significant
to the interpretation of the 'keyword'.
For example, the language tag "de-DE-u-attr-co-phonebk" consists of:
o The base language tag "de-DE" (German as used in Germany), exactly
as defined by [BCP47] using subtags from the IANA Language Subtag
Registry.
o The singleton 'u', identifying this extension.
o The attribute 'attr', which is an example for illustration (no
attributes were defined at the time this document was published).
o The keyword 'co-phonebk', consisting of the key 'co' (Collation)
and the type 'phonebk' (Phonebook collation order).
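The decomposition in this example can be sketched in code. This is an illustrative reading of the summary, not a normative parser: the function name is hypothetical, the tag is assumed to be well-formed, and subtags are not checked against the LDML data files. The sketch treats everything after a private use singleton 'x' as opaque and ends the extension at the next singleton.

```python
from typing import Dict, List, Tuple


def parse_u_extension(tag: str) -> Tuple[str, List[str], Dict[str, List[str]]]:
    """Split a tag into (prefix before 'u', attributes, keywords).

    Hypothetical helper: subtags of the extension are returned in lowercase,
    and any extensions that follow the 'u' extension are ignored.
    """
    subtags = tag.split("-")
    start = None
    for i, subtag in enumerate(subtags):
        low = subtag.lower()
        if low == "x":            # private use: the rest of the tag is opaque
            break
        if low == "u" and i > 0:  # the singleton identifying this extension
            start = i
            break
    if start is None:
        return tag, [], {}

    # The extension runs until the next singleton or the end of the tag.
    end = len(subtags)
    for j in range(start + 1, len(subtags)):
        if len(subtags[j]) == 1:
            end = j
            break

    attributes: List[str] = []
    keywords: Dict[str, List[str]] = {}
    key = None
    for subtag in subtags[start + 1:end]:
        low = subtag.lower()
        if len(low) == 2:          # a two-character subtag is a key
            if low in keywords:
                raise ValueError("duplicate key: " + low)
            key = low
            keywords[key] = []
        elif key is None:          # before any key: an attribute
            attributes.append(low)
        else:                      # after a key: a type, order preserved
            keywords[key].append(low)
    return "-".join(subtags[:start]), attributes, keywords
```

Calling parse_u_extension("de-DE-u-attr-co-phonebk") yields the prefix "de-DE", the attribute list ["attr"], and the keyword mapping {"co": ["phonebk"]}.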
With successive versions of [LDML], additional attributes, keys, and
types MAY be defined. Once defined, attributes, keys, and types will
never be removed. Machine-readable files listing the valid
attributes, keys, and types are available in the CLDR repository for
each version. For example, for version 1.7.2, the files are located
at http://unicode.org/repos/cldr/tags/release-1-7-2/common/bcp47/.
These files can also contain aliases that were used in previous
versions of [LDML].
2.1.1. Canonicalization
As required by [BCP47], case is not significant. The canonical form
for all subtags in the extension is lowercase. The canonical order
of attributes is in [US-ASCII] order (that is, digits before
letters, with letters sorted as lowercase US-ASCII code points). The
canonical order of keywords is in [US-ASCII] order by key. The order
of subtags within a keyword is significant; the meaning of this
extension is altered if those subtags are rearranged. Thus, the
canonical form of the extension never reorders the subtags within a
keyword.
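These rules can be sketched as follows. This is one possible reading, not a reference implementation: the function name is hypothetical, the input is assumed to be a well-formed extension beginning with the singleton 'u', and the attributes and keys used in the usage note are illustrative only.

```python
from typing import List


def canonicalize_u_extension(extension: str) -> str:
    """Return the canonical form of a 'u' extension such as 'u-co-phonebk'."""
    subtags = extension.lower().split("-")   # canonical case is lowercase
    if subtags[0] != "u":
        raise ValueError("not a 'u' extension: " + extension)

    attributes: List[str] = []
    keywords: List[List[str]] = []           # each entry: [key, type, ...]
    for subtag in subtags[1:]:
        if len(subtag) == 2:                 # a key starts a new keyword
            keywords.append([subtag])
        elif keywords:                       # a type of the current keyword
            keywords[-1].append(subtag)
        else:                                # before any key: an attribute
            attributes.append(subtag)

    # For lowercase ASCII letters and digits, Python's default string
    # ordering is code point order, so digits sort before letters.
    attributes.sort()
    # Keywords are ordered by key; the types inside each keyword are
    # deliberately left in their original order.
    keywords.sort(key=lambda keyword: keyword[0])

    parts = ["u"] + attributes + [s for keyword in keywords for s in keyword]
    return "-".join(parts)
```

For instance, the input "U-Foo-Bar-NU-Latn-CO-Phonebk" becomes "u-bar-foo-co-phonebk-nu-latn": the attributes and the keywords are each sorted, while the subtags inside each keyword stay where they were.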
2.2. Registration Form
Per [RFC5646], Section 3.7:
%%
Identifier: u
Description: Unicode Locale
Comments: Subtags for the identification of language and cultural
variations. Used to set behavior in locale APIs.
Added: 2009-mm-dd
RFC: [TBD]
Authority: Unicode Consortium
Contact_Email: cldr@unicode.org
Mailing_List: cldr-users@unicode.org
URL: http://cldr.unicode.org
%%
3. Acknowledgements
Thanks to John Emmons and the rest of the Unicode CLDR Technical
Committee for their work in developing the BCP 47 subtags for LDML.
4. IANA Considerations
This document will require IANA to insert the record in Section 2.2
into the Language Extensions Registry, according to Section 3.7
(Extensions and the Extensions Registry) of "Tags for Identifying
Languages" in [BCP47]. There might be occasional maintenance of this
record. This document does not require IANA to create or maintain a
new registry or otherwise impact IANA.
5. Security Considerations
The security considerations for this extension are the same as those
for [RFC5646] (or its successors). See Section 6 (Security
Considerations) of [RFC5646].
6. References
6.1. Normative References
[BCP47] Davis, M., Ed., "Tags for the Identification of Language
(BCP47)", September 2009.
[LDML] Davis, M., "Unicode Technical Standard #35: Locale Data
Markup Language (LDML)", December 2007,
<http://www.unicode.org/reports/tr35/>.
[RFC5646] Phillips, A. and M. Davis, "Tags for Identifying
Languages", BCP 47, RFC 5646, September 2009.
[US-ASCII]
International Organization for Standardization, "ISO/IEC
646:1991, Information technology -- ISO 7-bit coded
character set for information interchange.", 1991.
6.2. Informative References
[ldml-registry]
"Registry for Common Locale Data Repository tag elements",
September 2009.
Authors' Addresses
Mark Davis
Google
Email: mark@macchiato.com
Addison Phillips
Lab126
Email: addison@inter-locale.com
Yoshito Umaoka
IBM
Email: yoshito_umaoka@us.ibm.com