| WHATWG |
| |
| HTML 5 |
| |
| Draft Recommendation — 13 January 2009 |
| |
| 8.2.5 Tree construction |
| |
| The input to the tree construction stage is a sequence of tokens from |
| the tokenization stage. The tree construction stage is associated with |
| a DOM Document object when a parser is created. The "output" of this |
| stage consists of dynamically modifying or extending that document's |
| DOM tree. |
| |
| This specification does not define when an interactive user agent has |
| to render the Document so that it is available to the user, or when it |
| has to begin accepting user input. |
| |
| As each token is emitted from the tokeniser, the user agent must |
| process the token according to the rules given in the section |
| corresponding to the current insertion mode. |
| |
| When the steps below require the UA to insert a character into a node, |
| if that node has a child immediately before where the character is to |
| be inserted, and that child is a Text node, and that Text node was the |
| last node that the parser inserted into the document, then the |
| character must be appended to that Text node; otherwise, a new Text |
| node whose data is just that character must be inserted in the |
| appropriate place. |
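| |
| The following non-normative sketch illustrates the coalescing rule |
| above. The Node and Inserter classes are hypothetical stand-ins, not |
| DOM interfaces, and the sketch assumes that characters are always |
| inserted at the end of the parent's child list. |

```python
class Node:
    """Tiny stand-in for a DOM node (hypothetical, not the real DOM API)."""

    def __init__(self, kind, data=""):
        self.kind = kind      # "element" or "text"
        self.data = data      # character data, used by text nodes only
        self.children = []


class Inserter:
    """Tracks the last node the parser inserted into the document."""

    def __init__(self):
        self.last_inserted = None

    def append_node(self, parent, node):
        parent.children.append(node)
        self.last_inserted = node

    def insert_character(self, parent, ch):
        prev = parent.children[-1] if parent.children else None
        # Reuse the preceding Text node only if it is also the last node
        # that the parser inserted; otherwise create a fresh Text node.
        if prev is not None and prev.kind == "text" and prev is self.last_inserted:
            prev.data += ch
        else:
            self.append_node(parent, Node("text", ch))
```

| Appending "h" and then "i" to an empty element yields one Text node |
| with the data "hi"; inserting any other node in between forces a new |
| Text node for the next character. |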
| |
| DOM mutation events must not fire for changes caused by the UA parsing |
| the document. (Conceptually, the parser is not mutating the DOM; it is |
| constructing it.) This includes the parsing of any content inserted |
| using document.write() and document.writeln() calls. [DOM3EVENTS] |
| |
| Not all of the tag names mentioned below are conformant tag names in |
| this specification; many are included to handle legacy content. They |
| still form part of the algorithm that implementations are required to |
| implement to claim conformance. |
| |
| The algorithm described below places no limit on the depth of the DOM |
| tree generated, or on the length of tag names, attribute names, |
| attribute values, text nodes, etc. While implementors are encouraged to |
| avoid arbitrary limits, it is recognized that practical concerns will |
| likely force user agents to impose limits on nesting depth. |
| |
| 8.2.5.1 Creating and inserting elements |
| |
| When the steps below require the UA to create an element for a token in |
| a particular namespace, the UA must create a node implementing the |
| interface appropriate for the element type corresponding to the tag |
| name of the token in the given namespace (as given in the specification |
| that defines that element, e.g. for an a element in the HTML namespace, |
| this specification defines it to be the HTMLAnchorElement interface), |
| with the tag name being the name of that element, with the node being |
| in the given namespace, and with the attributes on the node being those |
| given in the given token. |
| |
| The interface appropriate for an element in the HTML namespace that is |
| not defined in this specification is HTMLElement. The interface |
| appropriate for an element in another namespace that is not defined by |
| that namespace's specification is Element. |
| |
| When a resettable element is created in this manner, its reset |
| algorithm must be invoked once the attributes are set. (This |
| initializes the element's value and checkedness based on the element's |
| attributes.) |
| __________________________________________________________________ |
| |
| When the steps below require the UA to insert an HTML element for a |
| token, the UA must first create an element for the token in the HTML |
| namespace, and then append this node to the current node, and push it |
| onto the stack of open elements so that it is the new current node. |
| |
| The steps below may also require that the UA insert an HTML element in |
| a particular place, in which case the UA must follow the same steps |
| except that it must insert or append the new node in the location |
| specified instead of appending it to the current node. (This happens in |
| particular during the parsing of tables with invalid content.) |
| |
| If an element created by the insert an HTML element algorithm is a |
| form-associated element, and the form element pointer is not null, and |
| the newly created element doesn't have a form attribute, the user agent |
| must associate the newly created element with the form element pointed |
| to by the form element pointer before inserting it wherever it is to be |
| inserted. |
| __________________________________________________________________ |
| |
| When the steps below require the UA to insert a foreign element for a |
| token, the UA must first create an element for the token in the given |
| namespace, and then append this node to the current node, and push it |
| onto the stack of open elements so that it is the new current node. If |
| the newly created element has an xmlns attribute in the XMLNS namespace |
| whose value is not exactly the same as the element's namespace, that is |
| a parse error. |
| |
| When the steps below require the user agent to adjust MathML attributes |
| for a token, then, if the token has an attribute named definitionurl, |
| change its name to definitionURL (note the case difference). |
| |
| When the steps below require the user agent to adjust foreign |
| attributes for a token, then, if any of the attributes on the token |
| match the strings given in the first column of the following table, let |
| the attribute be a namespaced attribute, with the prefix being the |
| string given in the corresponding cell in the second column, the local |
| name being the string given in the corresponding cell in the third |
| column, and the namespace being the namespace given in the |
| corresponding cell in the fourth column. (This fixes the use of |
| namespaced attributes, in particular xml:lang.) |
| |
| Attribute name   Prefix   Local name   Namespace       |
| xlink:actuate    xlink    actuate      XLink namespace |
| xlink:arcrole    xlink    arcrole      XLink namespace |
| xlink:href       xlink    href         XLink namespace |
| xlink:role       xlink    role         XLink namespace |
| xlink:show       xlink    show         XLink namespace |
| xlink:title      xlink    title        XLink namespace |
| xlink:type       xlink    type         XLink namespace |
| xml:base         xml      base         XML namespace   |
| xml:lang         xml      lang         XML namespace   |
| xml:space        xml      space        XML namespace   |
| xmlns            (none)   xmlns        XMLNS namespace |
| xmlns:xlink      xmlns    xlink        XMLNS namespace |
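| |
| As a non-normative illustration, the table above can be expressed as |
| a lookup. The function name, the tuple layout, and the use of None |
| for "no prefix" and "no namespace" are assumptions of this sketch. |
| The namespace URIs are the standard XLink, XML, and XMLNS namespace |
| names, which this section does not itself spell out. |

```python
XLINK_NS = "http://www.w3.org/1999/xlink"
XML_NS = "http://www.w3.org/XML/1998/namespace"
XMLNS_NS = "http://www.w3.org/2000/xmlns/"

# attribute name -> (prefix, local name, namespace)
FOREIGN_ATTRIBUTES = {
    "xlink:actuate": ("xlink", "actuate", XLINK_NS),
    "xlink:arcrole": ("xlink", "arcrole", XLINK_NS),
    "xlink:href": ("xlink", "href", XLINK_NS),
    "xlink:role": ("xlink", "role", XLINK_NS),
    "xlink:show": ("xlink", "show", XLINK_NS),
    "xlink:title": ("xlink", "title", XLINK_NS),
    "xlink:type": ("xlink", "type", XLINK_NS),
    "xml:base": ("xml", "base", XML_NS),
    "xml:lang": ("xml", "lang", XML_NS),
    "xml:space": ("xml", "space", XML_NS),
    "xmlns": (None, "xmlns", XMLNS_NS),
    "xmlns:xlink": ("xmlns", "xlink", XMLNS_NS),
}


def adjust_foreign_attributes(attrs):
    """Map a token's {name: value} attributes to
    (prefix, local name, namespace, value) tuples.

    Names not listed in the table are left un-namespaced.
    """
    adjusted = []
    for name, value in attrs.items():
        prefix, local, namespace = FOREIGN_ATTRIBUTES.get(name, (None, name, None))
        adjusted.append((prefix, local, namespace, value))
    return adjusted
```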
| __________________________________________________________________ |
| |
| The generic CDATA element parsing algorithm and the generic RCDATA |
| element parsing algorithm consist of the following steps. These |
| algorithms are always invoked in response to a start tag token. |
| 1. Insert an HTML element for the token. |
| 2. If the algorithm that was invoked is the generic CDATA element |
| parsing algorithm, switch the tokeniser's content model flag to the |
| CDATA state; otherwise (the algorithm invoked was the generic RCDATA |
| element parsing algorithm), switch the tokeniser's content model |
| flag to the RCDATA state. |
| 3. Let the original insertion mode be the current insertion mode. |
| 4. Then, switch the insertion mode to "in CDATA/RCDATA". |
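| |
| The four steps above can be sketched as follows (non-normative). The |
| Parser class, its attribute names, and the representation of the |
| stack of open elements as a list of tag names are assumptions made |
| for illustration. The initial values "PCDATA" and "in head" are |
| likewise only example starting states. |

```python
class Parser:
    """Minimal parser state needed by the generic CDATA/RCDATA algorithm."""

    def __init__(self, insertion_mode="in head"):
        self.content_model = "PCDATA"        # tokeniser's content model flag
        self.insertion_mode = insertion_mode
        self.original_insertion_mode = None
        self.open_elements = []              # tag names only, for brevity

    def insert_html_element(self, tag_name):
        # Stand-in for "insert an HTML element": only the stack is modelled.
        self.open_elements.append(tag_name)

    def generic_element_parsing(self, tag_name, kind):
        if kind not in ("CDATA", "RCDATA"):
            raise ValueError("kind must be 'CDATA' or 'RCDATA'")
        self.insert_html_element(tag_name)                   # step 1
        self.content_model = kind                            # step 2
        self.original_insertion_mode = self.insertion_mode   # step 3
        self.insertion_mode = "in CDATA/RCDATA"              # step 4
```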
| |
| 8.2.5.2 Closing elements that have implied end tags |
| |
| When the steps below require the UA to generate implied end tags, then, |
| while the current node is a dd element, a dt element, an li element, an |
| option element, an optgroup element, a p element, an rp element, or an |
| rt element, the UA must pop the current node off the stack of open |
| elements. |
| |
| If a step requires the UA to generate implied end tags but lists an |
| element to exclude from the process, then the UA must perform the above |
| steps as if that element was not in the above list. |
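| |
| A non-normative sketch of this loop, including the optional |
| exclusion, is shown below. Modelling the stack of open elements as a |
| list of tag names, with the current node last, is an assumption of |
| the sketch. |

```python
IMPLIED_END_TAG_ELEMENTS = frozenset(
    {"dd", "dt", "li", "option", "optgroup", "p", "rp", "rt"}
)


def generate_implied_end_tags(open_elements, exclude=None):
    """Pop current nodes that have implied end tags, in place.

    `exclude` names one element that is treated as if it were not in
    the list, so popping stops when it becomes the current node.
    """
    while (
        open_elements
        and open_elements[-1] in IMPLIED_END_TAG_ELEMENTS
        and open_elements[-1] != exclude
    ):
        open_elements.pop()
```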
| |
| 8.2.5.3 Foster parenting |
| |
| Foster parenting happens when content is misnested in tables. |
| |
| When a node, node, is to be foster parented, node must be |
| inserted into the foster parent element, and the current table must be |
| marked as tainted. (Once the current table has been tainted, whitespace |
| characters are inserted into the foster parent element instead of the |
| current node.) |
| |
| The foster parent element is the parent element of the last table |
| element in the stack of open elements, if there is a table element and |
| it has such a parent element. If there is no table element in the stack |
| of open elements (fragment case), then the foster parent element is the |
| first element in the stack of open elements (the html element). |
| Otherwise, if there is a table element in the stack of open elements, |
| but the last table element in the stack of open elements has no parent, |
| or its parent node is not an element, then the foster parent element is |
| the element before the last table element in the stack of open |
| elements. |
| |
| If the foster parent element is the parent element of the last table |
| element in the stack of open elements, then node must be inserted |
| immediately before the last table element in the stack of open elements |
| in the foster parent element; otherwise, node must be appended to the |
| foster parent element. |
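| |
| The selection of the foster parent element and the insertion position |
| can be sketched as follows (non-normative). The Node class is a |
| hypothetical stand-in for DOM nodes, and marking the current table as |
| tainted is deliberately left out of the sketch. |

```python
class Node:
    """Hypothetical minimal node; not a DOM interface."""

    def __init__(self, name, is_element=True):
        self.name = name
        self.is_element = is_element
        self.parent = None
        self.children = []

    def append(self, child):
        child.parent = self
        self.children.append(child)

    def insert_before(self, child, reference):
        child.parent = self
        self.children.insert(self.children.index(reference), child)


def foster_parent(open_elements, node):
    """Insert `node` according to the foster parenting rules."""
    # Last table element in the stack of open elements, if any.
    table = next((el for el in reversed(open_elements) if el.name == "table"), None)
    if table is None:
        # Fragment case: use the first element in the stack (the html element).
        open_elements[0].append(node)
    elif table.parent is not None and table.parent.is_element:
        # The table has a parent element: insert immediately before the table.
        table.parent.insert_before(node, table)
    else:
        # No usable parent: append to the element before the table in the stack.
        open_elements[open_elements.index(table) - 1].append(node)
```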
| |
| 8.2.5.4 The "initial" insertion mode |
| |
| When the insertion mode is "initial", tokens must be handled as |
| follows: |
| |
| A character token that is one of U+0009 CHARACTER TABULATION, |
| U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE |
| Ignore the token. |
| |
| A comment token |
| Append a Comment node to the Document object with the data |
| attribute set to the data given in the comment token. |
| |
| A DOCTYPE token |
| If the DOCTYPE token's name is not a case-sensitive match for |
| the string "html", or if the token's public identifier is |
| neither missing nor a case-sensitive match for the string |
| "XSLT-compat", or if the token's system identifier is not |
| missing, then there is a parse error (this is the DOCTYPE parse |
| error). Conformance checkers may, instead of reporting this |
| error, switch to a conformance checking mode for another |
| language (e.g. based on the DOCTYPE token a conformance checker |
| could recognize that the document is an HTML4-era document, and |
| defer to an HTML4 conformance checker). |
| |
| Append a DocumentType node to the Document node, with the name |
| attribute set to the name given in the DOCTYPE token; the |
| publicId attribute set to the public identifier given in the |
| DOCTYPE token, or the empty string if the public identifier was |
| missing; the systemId attribute set to the system identifier |
| given in the DOCTYPE token, or the empty string if the system |
| identifier was missing; and the other attributes specific to |
| DocumentType objects set to null and empty lists as appropriate. |
| Associate the DocumentType node with the Document object so that |
| it is returned as the value of the doctype attribute of the |
| Document object. |
| |
| Then, if the DOCTYPE token matches one of the conditions in the |
| following list, then set the document to quirks mode: |
| |
| + The force-quirks flag is set to on. |
| + The name is set to anything other than "HTML". |
| + The public identifier starts with: "+//Silmaril//dtd html Pro |
| v0r11 19970101//" |
| + The public identifier starts with: "-//AdvaSoft Ltd//DTD HTML |
| 3.0 asWedit + extensions//" |
| + The public identifier starts with: "-//AS//DTD HTML 3.0 |
| asWedit + extensions//" |
| + The public identifier starts with: "-//IETF//DTD HTML 2.0 |
| Level 1//" |
| + The public identifier starts with: "-//IETF//DTD HTML 2.0 |
| Level 2//" |
| + The public identifier starts with: "-//IETF//DTD HTML 2.0 |
| Strict Level 1//" |
| + The public identifier starts with: "-//IETF//DTD HTML 2.0 |
| Strict Level 2//" |
| + The public identifier starts with: "-//IETF//DTD HTML 2.0 |
| Strict//" |
| + The public identifier starts with: "-//IETF//DTD HTML 2.0//" |
| + The public identifier starts with: "-//IETF//DTD HTML 2.1E//" |
| + The public identifier starts with: "-//IETF//DTD HTML 3.0//" |
| + The public identifier starts with: "-//IETF//DTD HTML 3.2 |
| Final//" |
| + The public identifier starts with: "-//IETF//DTD HTML 3.2//" |
| + The public identifier starts with: "-//IETF//DTD HTML 3//" |
| + The public identifier starts with: "-//IETF//DTD HTML Level |
| 0//" |
| + The public identifier starts with: "-//IETF//DTD HTML Level |
| 1//" |
| + The public identifier starts with: "-//IETF//DTD HTML Level |
| 2//" |
| + The public identifier starts with: "-//IETF//DTD HTML Level |
| 3//" |
| + The public identifier starts with: "-//IETF//DTD HTML Strict |
| Level 0//" |
| + The public identifier starts with: "-//IETF//DTD HTML Strict |
| Level 1//" |
| + The public identifier starts with: "-//IETF//DTD HTML Strict |
| Level 2//" |
| + The public identifier starts with: "-//IETF//DTD HTML Strict |
| Level 3//" |
| + The public identifier starts with: "-//IETF//DTD HTML |
| Strict//" |
| + The public identifier starts with: "-//IETF//DTD HTML//" |
| + The public identifier starts with: "-//Metrius//DTD Metrius |
| Presentational//" |
| + The public identifier starts with: "-//Microsoft//DTD Internet |
| Explorer 2.0 HTML Strict//" |
| + The public identifier starts with: "-//Microsoft//DTD Internet |
| Explorer 2.0 HTML//" |
| + The public identifier starts with: "-//Microsoft//DTD Internet |
| Explorer 2.0 Tables//" |
| + The public identifier starts with: "-//Microsoft//DTD Internet |
| Explorer 3.0 HTML Strict//" |
| + The public identifier starts with: "-//Microsoft//DTD Internet |
| Explorer 3.0 HTML//" |
| + The public identifier starts with: "-//Microsoft//DTD Internet |
| Explorer 3.0 Tables//" |
| + The public identifier starts with: "-//Netscape Comm. |
| Corp.//DTD HTML//" |
| + The public identifier starts with: "-//Netscape Comm. |
| Corp.//DTD Strict HTML//" |
| + The public identifier starts with: "-//O'Reilly and |
| Associates//DTD HTML 2.0//" |
| + The public identifier starts with: "-//O'Reilly and |
| Associates//DTD HTML Extended 1.0//" |
| + The public identifier starts with: "-//O'Reilly and |
| Associates//DTD HTML Extended Relaxed 1.0//" |
| + The public identifier starts with: "-//SoftQuad Software//DTD |
| HoTMetaL PRO 6.0::19990601::extensions to HTML 4.0//" |
| + The public identifier starts with: "-//SoftQuad//DTD HoTMetaL |
| PRO 4.0::19971010::extensions to HTML 4.0//" |
| + The public identifier starts with: "-//Spyglass//DTD HTML 2.0 |
| Extended//" |
| + The public identifier starts with: "-//SQ//DTD HTML 2.0 |
| HoTMetaL + extensions//" |
| + The public identifier starts with: "-//Sun Microsystems |
| Corp.//DTD HotJava HTML//" |
| + The public identifier starts with: "-//Sun Microsystems |
| Corp.//DTD HotJava Strict HTML//" |
| + The public identifier starts with: "-//W3C//DTD HTML 3 |
| 1995-03-24//" |
| + The public identifier starts with: "-//W3C//DTD HTML 3.2 |
| Draft//" |
| + The public identifier starts with: "-//W3C//DTD HTML 3.2 |
| Final//" |
| + The public identifier starts with: "-//W3C//DTD HTML 3.2//" |
| + The public identifier starts with: "-//W3C//DTD HTML 3.2S |
| Draft//" |
| + The public identifier starts with: "-//W3C//DTD HTML 4.0 |
| Frameset//" |
| + The public identifier starts with: "-//W3C//DTD HTML 4.0 |
| Transitional//" |
| + The public identifier starts with: "-//W3C//DTD HTML |
| Experimental 19960712//" |
| + The public identifier starts with: "-//W3C//DTD HTML |
| Experimental 970421//" |
| + The public identifier starts with: "-//W3C//DTD W3 HTML//" |
| + The public identifier starts with: "-//W3O//DTD W3 HTML 3.0//" |
| + The public identifier is set to: "-//W3O//DTD W3 HTML Strict |
| 3.0//EN//" |
| + The public identifier starts with: "-//WebTechs//DTD Mozilla |
| HTML 2.0//" |
| + The public identifier starts with: "-//WebTechs//DTD Mozilla |
| HTML//" |
| + The public identifier is set to: "-/W3C/DTD HTML 4.0 |
| Transitional/EN" |
| + The public identifier is set to: "HTML" |
| + The system identifier is set to: |
| "http://www.ibm.com/data/dtd/v11/ibmxhtml1-transitional.dtd" |
| + The system identifier is missing and the public identifier |
| starts with: "-//W3C//DTD HTML 4.01 Frameset//" |
| + The system identifier is missing and the public identifier |
| starts with: "-//W3C//DTD HTML 4.01 Transitional//" |
| |
| Otherwise, if the DOCTYPE token matches one of the conditions in |
| the following list, then set the document to limited quirks |
| mode: |
| |
| + The public identifier starts with: "-//W3C//DTD XHTML 1.0 |
| Frameset//" |
| + The public identifier starts with: "-//W3C//DTD XHTML 1.0 |
| Transitional//" |
| + The system identifier is not missing and the public identifier |
| starts with: "-//W3C//DTD HTML 4.01 Frameset//" |
| + The system identifier is not missing and the public identifier |
| starts with: "-//W3C//DTD HTML 4.01 Transitional//" |
| |
| The name, system identifier, and public identifier strings must |
| be compared to the values given in the lists above in an ASCII |
| case-insensitive manner. A system identifier whose value is the |
| empty string is not considered missing for the purposes of the |
| conditions above. |
| |
| Then, switch the insertion mode to "before html". |
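| |
| The mode selection above can be sketched as follows (non-normative). |
| The prefix tuples are deliberately abbreviated to a few entries from |
| the lists above and are not a complete table. The function name, the |
| use of None for "missing", and the return value "no quirks" for the |
| remaining case are assumptions of this sketch. |

```python
import string

_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

# Abbreviated samples of the lists above, stored in ASCII lowercase.
QUIRKS_PUBLIC_PREFIXES = (
    "-//ietf//dtd html 2.0//",
    "-//w3c//dtd html 3.2 final//",
    "-//w3c//dtd html 4.0 transitional//",
)
QUIRKS_PUBLIC_EXACT = (
    "-//w3o//dtd w3 html strict 3.0//en//",
    "-/w3c/dtd html 4.0 transitional/en",
    "html",
)
QUIRKS_SYSTEM_EXACT = (
    "http://www.ibm.com/data/dtd/v11/ibmxhtml1-transitional.dtd",
)
HTML401_LOOSE_PREFIXES = (
    "-//w3c//dtd html 4.01 frameset//",
    "-//w3c//dtd html 4.01 transitional//",
)
LIMITED_QUIRKS_PREFIXES = (
    "-//w3c//dtd xhtml 1.0 frameset//",
    "-//w3c//dtd xhtml 1.0 transitional//",
)


def _lower(value):
    """ASCII-only lowercasing; None (missing) is passed through."""
    return None if value is None else value.translate(_ASCII_LOWER)


def document_mode(name, public_id=None, system_id=None, force_quirks=False):
    name, pub, sys_id = _lower(name), _lower(public_id), _lower(system_id)
    if force_quirks or name != "html":
        return "quirks"
    if pub is not None and (
        pub.startswith(QUIRKS_PUBLIC_PREFIXES) or pub in QUIRKS_PUBLIC_EXACT
    ):
        return "quirks"
    if sys_id in QUIRKS_SYSTEM_EXACT:
        return "quirks"
    if pub is not None and pub.startswith(HTML401_LOOSE_PREFIXES):
        # An empty system identifier is present, not missing.
        return "quirks" if sys_id is None else "limited quirks"
    if pub is not None and pub.startswith(LIMITED_QUIRKS_PREFIXES):
        return "limited quirks"
    return "no quirks"
```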
| |
| Anything else |
| Parse error. |
| |
| Set the document to quirks mode. |
| |
| Switch the insertion mode to "before html", then reprocess the |
| current token. |
| |
| 8.2.5.5 The "before html" insertion mode |
| |
| When the insertion mode is "before html", tokens must be handled as |
| follows: |
| |
| A DOCTYPE token |
| Parse error. Ignore the token. |
| |
| A comment token |
| Append a Comment node to the Document object with the data |
| attribute set to the data given in the comment token. |
| |
| A character token that is one of U+0009 CHARACTER TABULATION, |
| U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE |
| Ignore the token. |
| |
| A start tag whose tag name is "html" |
| Create an element for the token in the HTML namespace. Append it |
| to the Document object. Put this element in the stack of open |
| elements. |
| |
| If the token has an attribute "manifest", then resolve the value |
| of that attribute to an absolute URL, and if that is successful, |
| run the application cache selection algorithm with the resulting |
| absolute URL. Otherwise, if there is no such attribute or |
| resolving it fails, run the application cache selection |
| algorithm with no manifest. The algorithm must be passed the |
| Document object. |
| |
| Switch the insertion mode to "before head". |
| |
| Anything else |
| Create an HTMLElement node with the tag name html, in the HTML |
| namespace. Append it to the Document object. Put this element in |
| the stack of open elements. |
| |
| Run the application cache selection algorithm with no manifest, |
| passing it the Document object. |
| |
| Switch the insertion mode to "before head", then reprocess the |
| current token. |
| |
| Should probably make end tags be ignored, so that "</head><!-- |
| --><html>" puts the comment before the root node (or should we?) |
| |
| The root element can end up being removed from the Document object, |
| e.g. by scripts; nothing in particular happens in such cases, and content |
| continues being appended to the nodes as described in the next section. |
| |
| 8.2.5.6 The "before head" insertion mode |
| |
| When the insertion mode is "before head", tokens must be handled as |
| follows: |
| |
| A character token that is one of U+0009 CHARACTER TABULATION, |
| U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE |
| Ignore the token. |
| |
| A comment token |
| Append a Comment node to the current node with the data |
| attribute set to the data given in the comment token. |
| |
| A DOCTYPE token |
| Parse error. Ignore the token. |
| |
| A start tag whose tag name is "html" |
| Process the token using the rules for the "in body" insertion |
| mode. |
| |
| A start tag whose tag name is "head" |
| Insert an HTML element for the token. |
| |
| Set the head element pointer to the newly created head element. |
| |
| Switch the insertion mode to "in head". |
| |
| An end tag whose tag name is one of: "head", "br" |
| Act as if a start tag token with the tag name "head" and no |
| attributes had been seen, then reprocess the current token. |
| |
| Any other end tag |
| Parse error. Ignore the token. |
| |
| Anything else |
| Act as if a start tag token with the tag name "head" and no |
| attributes had been seen, then reprocess the current token. |
| |
| This will result in an empty head element being generated, with |
| the current token being reprocessed in the "after head" |
| insertion mode. |
| |
| 8.2.5.7 The "in head" insertion mode |
| |
| When the insertion mode is "in head", tokens must be handled as |
| follows: |
| |
| A character token that is one of U+0009 CHARACTER TABULATION, |
| U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE |
| Insert the character into the current node. |
| |
| A comment token |
| Append a Comment node to the current node with the data |
| attribute set to the data given in the comment token. |
| |
| A DOCTYPE token |
| Parse error. Ignore the token. |
| |
| A start tag whose tag name is "html" |
| Process the token using the rules for the "in body" insertion |
| mode. |
| |
| A start tag whose tag name is one of: "base", "command", "eventsource", |
| "link" |
| Insert an HTML element for the token. Immediately pop the |
| current node off the stack of open elements. |
| |
| Acknowledge the token's self-closing flag, if it is set. |
| |
| A start tag whose tag name is "meta" |
| Insert an HTML element for the token. Immediately pop the |
| current node off the stack of open elements. |
| |
| Acknowledge the token's self-closing flag, if it is set. |
| |
| If the element has a charset attribute, and its value is a |
| supported encoding, and the confidence is currently tentative, |
| then change the encoding to the encoding given by the value of |
| the charset attribute. |
| |
| Otherwise, if the element has a content attribute, and applying |
| the algorithm for extracting an encoding from a Content-Type to |
| its value returns a supported encoding encoding, and the |
| confidence is currently tentative, then change the encoding to |
| the encoding encoding. |
| |
| A start tag whose tag name is "title" |
| Follow the generic RCDATA element parsing algorithm. |
| |
| A start tag whose tag name is "noscript", if the scripting flag is |
| enabled |
| A start tag whose tag name is one of: "noframes", "style" |
| Follow the generic CDATA element parsing algorithm. |
| |
| A start tag whose tag name is "noscript", if the scripting flag is |
| disabled |
| Insert an HTML element for the token. |
| |
| Switch the insertion mode to "in head noscript". |
| |
| A start tag whose tag name is "script" |
| |
| 1. Create an element for the token in the HTML namespace. |
| 2. Mark the element as being "parser-inserted". |
| This ensures that, if the script is external, any |
| document.write() calls in the script will execute in-line, |
| instead of blowing the document away, as would happen in most |
| other cases. It also prevents the script from executing until |
| the end tag is seen. |
| 3. If the parser was originally created for the HTML fragment |
| parsing algorithm, then mark the script element as "already |
| executed". (fragment case) |
| 4. Append the new element to the current node. |
| 5. Switch the tokeniser's content model flag to the CDATA state. |
| 6. Let the original insertion mode be the current insertion mode. |
| 7. Switch the insertion mode to "in CDATA/RCDATA". |
| |
| An end tag whose tag name is "head" |
| Pop the current node (which will be the head element) off the |
| stack of open elements. |
| |
| Switch the insertion mode to "after head". |
| |
| An end tag whose tag name is "br" |
| Act as described in the "anything else" entry below. |
| |
| A start tag whose tag name is "head" |
| Any other end tag |
| Parse error. Ignore the token. |
| |
| Anything else |
| Act as if an end tag token with the tag name "head" had been |
| seen, and reprocess the current token. |
| |
| In certain UAs, some elements don't trigger the "in body" mode |
| straight away, but instead get put into the head. Do we want to |
| copy that? |
| |
| 8.2.5.8 The "in head noscript" insertion mode |
| |
| When the insertion mode is "in head noscript", tokens must be handled |
| as follows: |
| |
| A DOCTYPE token |
| Parse error. Ignore the token. |
| |
| A start tag whose tag name is "html" |
| Process the token using the rules for the "in body" insertion |
| mode. |
| |
| An end tag whose tag name is "noscript" |
| Pop the current node (which will be a noscript element) from the |
| stack of open elements; the new current node will be a head |
| element. |
| |
| Switch the insertion mode to "in head". |
| |
| A character token that is one of U+0009 CHARACTER TABULATION, |
| U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE |
| A comment token |
| A start tag whose tag name is one of: "link", "meta", "noframes", |
| "style" |
| Process the token using the rules for the "in head" insertion |
| mode. |
| |
| An end tag whose tag name is "br" |
| Act as described in the "anything else" entry below. |
| |
| A start tag whose tag name is one of: "head", "noscript" |
| Any other end tag |
| Parse error. Ignore the token. |
| |
| Anything else |
| Parse error. Act as if an end tag with the tag name "noscript" |
| had been seen and reprocess the current token. |
| |
| 8.2.5.9 The "after head" insertion mode |
| |
| When the insertion mode is "after head", tokens must be handled as |
| follows: |
| |
| A character token that is one of U+0009 CHARACTER TABULATION, |
| U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE |
| Insert the character into the current node. |
| |
| A comment token |
| Append a Comment node to the current node with the data |
| attribute set to the data given in the comment token. |
| |
| A DOCTYPE token |
| Parse error. Ignore the token. |
| |
| A start tag whose tag name is "html" |
| Process the token using the rules for the "in body" insertion |
| mode. |
| |
| A start tag whose tag name is "body" |
| Insert an HTML element for the token. |
| |
| Switch the insertion mode to "in body". |
| |
| A start tag whose tag name is "frameset" |
| Insert an HTML element for the token. |
| |
| Switch the insertion mode to "in frameset". |
| |
| A start tag token whose tag name is one of: "base", "link", "meta", |
| "noframes", "script", "style", "title" |
| Parse error. |
| |
| Push the node pointed to by the head element pointer onto the |
| stack of open elements. |
| |
| Process the token using the rules for the "in head" insertion |
| mode. |
| |
| Remove the node pointed to by the head element pointer from the |
| stack of open elements. |
| |
| An end tag whose tag name is "br" |
| Act as described in the "anything else" entry below. |
| |
| A start tag whose tag name is "head" |
| Any other end tag |
| Parse error. Ignore the token. |
| |
| Anything else |
| Act as if a start tag token with the tag name "body" and no |
| attributes had been seen, and then reprocess the current token. |
| |
| 8.2.5.10 The "in body" insertion mode |
| |
| When the insertion mode is "in body", tokens must be handled as |
| follows: |
| |
| A character token |
| Reconstruct the active formatting elements, if any. |
| |
| Insert the token's character into the current node. |
| |
| A comment token |
| Append a Comment node to the current node with the data |
| attribute set to the data given in the comment token. |
| |
| A DOCTYPE token |
| Parse error. Ignore the token. |
| |
| A start tag whose tag name is "html" |
| Parse error. For each attribute on the token, check to see if |
| the attribute is already present on the top element of the stack |
| of open elements. If it is not, add the attribute and its |
| corresponding value to that element. |
| |
| A start tag token whose tag name is one of: "base", "command", |
| "eventsource", "link", "meta", "noframes", "script", "style", |
| "title" |
| Process the token using the rules for the "in head" insertion |
| mode. |
| |
| A start tag whose tag name is "body" |
| Parse error. |
| |
| If the second element on the stack of open elements is not a |
| body element, or, if the stack of open elements has only one |
| node on it, then ignore the token. (fragment case) |
| |
| Otherwise, for each attribute on the token, check to see if the |
| attribute is already present on the body element (the second |
| element) on the stack of open elements. If it is not, add the |
| attribute and its corresponding value to that element. |
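| |
| The attribute handling for stray "html" and "body" start tags can be |
| sketched as follows (non-normative). Elements are modelled here as |
| dictionaries with "name" and "attrs" keys, and both function names |
| are hypothetical. Reporting the parse error is left out. |

```python
def merge_missing_attributes(element_attrs, token_attrs):
    """Add each token attribute the element lacks; never overwrite."""
    for name, value in token_attrs.items():
        element_attrs.setdefault(name, value)


def handle_stray_body_start_tag(open_elements, token_attrs):
    """Return False if the token is ignored (fragment case), else True."""
    # The body element, if any, is the second element on the stack.
    if len(open_elements) < 2 or open_elements[1]["name"] != "body":
        return False
    merge_missing_attributes(open_elements[1]["attrs"], token_attrs)
    return True
```

| For a stray "html" start tag, the same merge is applied to the top |
| element of the stack (the first entry in this model) instead. |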
| |
| An end-of-file token |
| If there is a node in the stack of open elements that is not |
| either a dd element, a dt element, an li element, a p element, a |
| tbody element, a td element, a tfoot element, a th element, a |
| thead element, a tr element, the body element, or the html |
| element, then this is a parse error. |
| |
| Stop parsing. |
| |
| An end tag whose tag name is "body" |
| If the stack of open elements does not have a body element in |
| scope, this is a parse error; ignore the token. |
| |
| Otherwise, if there is a node in the stack of open elements that |
| is not either a dd element, a dt element, an li element, a p |
| element, a tbody element, a td element, a tfoot element, a th |
| element, a thead element, a tr element, the body element, or the |
| html element, then this is a parse error. |
| |
| Switch the insertion mode to "after body". |
| |
| An end tag whose tag name is "html" |
| Act as if an end tag with tag name "body" had been seen, then, |
| if that token wasn't ignored, reprocess the current token. |
| |
| The fake end tag token here can only be ignored in the fragment |
| case. |
| |
| A start tag whose tag name is one of: "address", "article", "aside", |
| "blockquote", "center", "datagrid", "details", "dialog", "dir", |
| "div", "dl", "fieldset", "figure", "footer", "header", "menu", |
| "nav", "ol", "p", "section", "ul" |
| If the stack of open elements has a p element in scope, then act |
| as if an end tag with the tag name "p" had been seen. |
| |
| Insert an HTML element for the token. |
| |
| A start tag whose tag name is one of: "h1", "h2", "h3", "h4", "h5", |
| "h6" |
| If the stack of open elements has a p element in scope, then act |
| as if an end tag with the tag name "p" had been seen. |
| |
| If the current node is an element whose tag name is one of "h1", |
| "h2", "h3", "h4", "h5", or "h6", then this is a parse error; pop |
| the current node off the stack of open elements. |
| |
| Insert an HTML element for the token. |
| |
| A start tag whose tag name is one of: "pre", "listing" |
| If the stack of open elements has a p element in scope, then act |
| as if an end tag with the tag name "p" had been seen. |
| |
| Insert an HTML element for the token. |
| |
| If the next token is a U+000A LINE FEED (LF) character token, |
| then ignore that token and move on to the next one. (Newlines at |
| the start of pre blocks are ignored as an authoring |
| convenience.) |
| |
| A start tag whose tag name is "form" |
| If the form element pointer is not null, then this is a parse |
| error; ignore the token. |
| |
| Otherwise: |
| |
| If the stack of open elements has a p element in scope, then act |
| as if an end tag with the tag name "p" had been seen. |
| |
| Insert an HTML element for the token, and set the form element |
| pointer to point to the element created. |
| |
| A start tag whose tag name is "li" |
| Run the following algorithm: |
| |
| 1. Initialize node to be the current node (the bottommost node of |
| the stack). |
| 2. If node is an li element, then act as if an end tag with the |
| tag name "li" had been seen, then jump to the last step. |
| 3. If node is not in the formatting category, and is not in the |
| phrasing category, and is not an address, div, or p element, |
| then jump to the last step. |
| 4. Otherwise, set node to the previous entry in the stack of open |
| elements and return to step 2. |
| 5. This is the last step. |
| If the stack of open elements has a p element in scope, then |
| act as if an end tag with the tag name "p" had been seen. |
| Finally, insert an HTML element for the token. |
| |
| A start tag whose tag name is one of: "dd", "dt" |
| Run the following algorithm: |
| |
| 1. Initialize node to be the current node (the bottommost node of |
| the stack). |
| 2. If node is a dd or dt element, then act as if an end tag with |
| the same tag name as node had been seen, then jump to the last |
| step. |
| 3. If node is not in the formatting category, and is not in the |
| phrasing category, and is not an address, div, or p element, |
| then jump to the last step. |
| 4. Otherwise, set node to the previous entry in the stack of open |
| elements and return to step 2. |
| 5. This is the last step. |
| If the stack of open elements has a p element in scope, then |
| act as if an end tag with the tag name "p" had been seen. |
| Finally, insert an HTML element for the token. |
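Steps 1 to 4 of the "li" and "dd"/"dt" algorithms share one shape: walk up the stack from the current node until an element to close is found or a blocking element is reached. The Python sketch below illustrates that walk under stated assumptions: the stack is a list of tag names with the current node last, and the caller supplies `is_inline`, a hypothetical predicate standing in for "is in the formatting or phrasing category" (those categories are defined elsewhere in the specification).

```python
def element_to_close(stack, closes, is_inline):
    """Walk up from the current node (the last list item) looking for an
    open element whose tag name is in `closes`.

    Returns the tag name to close implicitly, or None when the walk
    reaches a node that ends the search (step 3).
    """
    for node in reversed(stack):          # steps 1 and 4: current node, then upward
        if node in closes:                # step 2: found an element to close
            return node
        if not is_inline(node) and node not in ("address", "div", "p"):
            return None                   # step 3: anything else blocks the search
    return None
```

For an "li" start tag, `closes` would be `{"li"}`; for "dd" or "dt" it would be `{"dd", "dt"}`. The final step (closing an open p element and inserting the new element) is not modeled here.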
| |
| A start tag whose tag name is "plaintext" |
| If the stack of open elements has a p element in scope, then act |
| as if an end tag with the tag name "p" had been seen. |
| |
| Insert an HTML element for the token. |
| |
| Switch the content model flag to the PLAINTEXT state. |
| |
| Once a start tag with the tag name "plaintext" has been seen, |
| that will be the last token ever seen other than character |
| tokens (and the end-of-file token), because there is no way to |
| switch the content model flag out of the PLAINTEXT state. |
| |
| An end tag whose tag name is one of: "address", "article", "aside", |
| "blockquote", "center", "datagrid", "details", "dialog", "dir", |
| "div", "dl", "fieldset", "figure", "footer", "header", |
| "listing", "menu", "nav", "ol", "pre", "section", "ul" |
| If the stack of open elements does not have an element in scope |
| with the same tag name as that of the token, then this is a |
| parse error; ignore the token. |
| |
| Otherwise, run these steps: |
| |
| 1. Generate implied end tags. |
| 2. If the current node is not an element with the same tag name |
| as that of the token, then this is a parse error. |
| 3. Pop elements from the stack of open elements until an element |
| with the same tag name as the token has been popped from the |
| stack. |
| |
| An end tag whose tag name is "form" |
| Let node be the element that the form element pointer is set to. |
| |
| Set the form element pointer to null. |
| |
| If node is null or the stack of open elements does not have node |
| in scope, then this is a parse error; ignore the token. |
| |
| Otherwise, run these steps: |
| |
| 1. Generate implied end tags. |
| 2. If the current node is not node, then this is a parse error. |
| 3. Remove node from the stack of open elements. |
| |
| An end tag whose tag name is "p" |
| If the stack of open elements does not have an element in scope |
| with the same tag name as that of the token, then this is a |
| parse error; act as if a start tag with the tag name "p" had been |
| seen, then reprocess the current token. |
| |
| Otherwise, run these steps: |
| |
| 1. Generate implied end tags, except for elements with the same |
| tag name as the token. |
| 2. If the current node is not an element with the same tag name |
| as that of the token, then this is a parse error. |
| 3. Pop elements from the stack of open elements until an element |
| with the same tag name as the token has been popped from the |
| stack. |
| |
| An end tag whose tag name is one of: "dd", "dt", "li" |
| If the stack of open elements does not have an element in scope |
| with the same tag name as that of the token, then this is a |
| parse error; ignore the token. |
| |
| Otherwise, run these steps: |
| |
| 1. Generate implied end tags, except for elements with the same |
| tag name as the token. |
| 2. If the current node is not an element with the same tag name |
| as that of the token, then this is a parse error. |
| 3. Pop elements from the stack of open elements until an element |
| with the same tag name as the token has been popped from the |
| stack. |
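The end tag handling for "dd", "dt", and "li" can be sketched in Python as below. Two definitions it relies on live elsewhere in the specification: "has an element in scope" and "generate implied end tags". The sets `SCOPE_BARRIERS` and `IMPLIED_END` are assumptions written from memory of those definitions, and the function names are hypothetical.

```python
# Assumption: elements that end an "in scope" search (defined elsewhere in the spec).
SCOPE_BARRIERS = {"applet", "button", "caption", "html", "marquee",
                  "object", "table", "td", "th"}
# Assumption: elements closed by "generate implied end tags".
IMPLIED_END = {"dd", "dt", "li", "option", "optgroup", "p", "rp", "rt"}


def has_in_scope(stack, tag, barriers=SCOPE_BARRIERS):
    """True if `tag` is open and no barrier element sits between it and
    the current node (the last item of `stack`)."""
    for node in reversed(stack):
        if node == tag:
            return True
        if node in barriers:
            return False
    return False


def close_list_item(stack, tag, implied=IMPLIED_END):
    """Handle an end tag for dd, dt, or li. Mutates `stack` and returns
    the pair (token_ignored, parse_error)."""
    if not has_in_scope(stack, tag):
        return True, True                      # parse error; ignore the token
    # Step 1: implied end tags, except for elements named like the token.
    while stack[-1] in implied and stack[-1] != tag:
        stack.pop()
    parse_error = stack[-1] != tag             # step 2
    while stack.pop() != tag:                  # step 3: pop through the match
        pass
    return False, parse_error
```

With the stack `html, body, ul, li, p`, an `</li>` token closes the p element silently and then pops the li; with `html, body, ul, li, b` the same token still pops both elements but reports a parse error.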
| |
| An end tag whose tag name is one of: "h1", "h2", "h3", "h4", "h5", "h6" |
| If the stack of open elements does not have an element in scope |
| whose tag name is one of "h1", "h2", "h3", "h4", "h5", or "h6", |
| then this is a parse error; ignore the token. |
| |
| Otherwise, run these steps: |
| |
| 1. Generate implied end tags. |
| 2. If the current node is not an element with the same tag name |
| as that of the token, then this is a parse error. |
| 3. Pop elements from the stack of open elements until an element |
| whose tag name is one of "h1", "h2", "h3", "h4", "h5", or "h6" |
| has been popped from the stack. |
| |
| An end tag whose tag name is "sarcasm" |
| Take a deep breath, then act as described in the "any other end |
| tag" entry below. |
| |
| A start tag whose tag name is "a" |
| If the list of active formatting elements contains an element |
| whose tag name is "a" between the end of the list and the last |
| marker on the list (or the start of the list if there is no |
| marker on the list), then this is a parse error; act as if an |
| end tag with the tag name "a" had been seen, then remove that |
| element from the list of active formatting elements and the |
| stack of open elements if the end tag didn't already remove it |
| (it might not have if the element is not in table scope). |
| |
| In the non-conforming stream |
| <a href="a">a<table><a href="b">b</table>x, the first a element |
| would be closed upon seeing the second one, and the "x" |
| character would be inside a link to "b", not to "a". This is |
| despite the fact that the outer a element is not in table scope |
| (meaning that a regular </a> end tag at the start of the table |
| wouldn't close the outer a element). |
| |
| Reconstruct the active formatting elements, if any. |
| |
| Insert an HTML element for the token. Add that element to the |
| list of active formatting elements. |
| |
| A start tag whose tag name is one of: "b", "big", "em", "font", "i", |
| "s", "small", "strike", "strong", "tt", "u" |
| Reconstruct the active formatting elements, if any. |
| |
| Insert an HTML element for the token. Add that element to the |
| list of active formatting elements. |
| |
| A start tag whose tag name is "nobr" |
| Reconstruct the active formatting elements, if any. |
| |
| If the stack of open elements has a nobr element in scope, then |
| this is a parse error; act as if an end tag with the tag name |
| "nobr" had been seen, then once again reconstruct the active |
| formatting elements, if any. |
| |
| Insert an HTML element for the token. Add that element to the |
| list of active formatting elements. |
| |
| An end tag whose tag name is one of: "a", "b", "big", "em", "font", |
| "i", "nobr", "s", "small", "strike", "strong", "tt", "u" |
| Follow these steps: |
| |
| 1. Let the formatting element be the last element in the list of |
| active formatting elements that: |
| o is between the end of the list and the last scope marker |
| in the list, if any, or the start of the list otherwise, |
| and |
| o has the same tag name as the token. |
| If there is no such node, or, if that node is also in the |
| stack of open elements but the element is not in scope, then |
| this is a parse error; ignore the token, and abort these |
| steps. |
| Otherwise, if there is such a node, but that node is not in |
| the stack of open elements, then this is a parse error; remove |
| the element from the list, and abort these steps. |
| Otherwise, there is a formatting element and that element is |
| in the stack and is in scope. If the element is not the |
| current node, this is a parse error. In any case, proceed with |
| the algorithm as written in the following steps. |
| 2. Let the furthest block be the topmost node in the stack of |
| open elements that is lower in the stack than the formatting |
| element, and is not an element in the phrasing or formatting |
| categories. There might not be one. |
| 3. If there is no furthest block, then the UA must skip the |
| subsequent steps and instead just pop all the nodes from the |
| bottom of the stack of open elements, from the current node up |
| to and including the formatting element, and remove the |
| formatting element from the list of active formatting |
| elements. |
| 4. Let the common ancestor be the element immediately above the |
| formatting element in the stack of open elements. |
| 5. If the furthest block has a parent node, then remove the |
| furthest block from its parent node. |
| 6. Let a bookmark note the position of the formatting element in |
| the list of active formatting elements relative to the |
| elements on either side of it in the list. |
| 7. Let node and last node be the furthest block. Follow these |
| steps: |
| 1. Let node be the element immediately above node in the |
| stack of open elements. |
| 2. If node is not in the list of active formatting elements, |
| then remove node from the stack of open elements and then |
| go back to step 1. |
| 3. Otherwise, if node is the formatting element, then go to |
| the next step in the overall algorithm. |
| 4. Otherwise, if last node is the furthest block, then move |
| the aforementioned bookmark to be immediately after the |
| node in the list of active formatting elements. |
| 5. If node has any children, perform a shallow clone of |
| node, replace the entry for node in the list of active |
| formatting elements with an entry for the clone, replace |
| the entry for node in the stack of open elements with an |
| entry for the clone, and let node be the clone. |
| 6. Insert last node into node, first removing it from its |
| previous parent node if any. |
| 7. Let last node be node. |
| 8. Return to step 1 of this inner set of steps. |
| 8. If the common ancestor node is a table, tbody, tfoot, thead, |
| or tr element, then, foster parent whatever last node ended up |
| being in the previous step. |
| Otherwise, append whatever last node ended up being in the |
| previous step to the common ancestor node, first removing it |
| from its previous parent node if any. |
| 9. Perform a shallow clone of the formatting element. |
| 10. Take all of the child nodes of the furthest block and append |
| them to the clone created in the last step. |
| 11. Append that clone to the furthest block. |
| 12. Remove the formatting element from the list of active |
| formatting elements, and insert the clone into the list of |
| active formatting elements at the position of the |
| aforementioned bookmark. |
| 13. Remove the formatting element from the stack of open elements, |
| and insert the clone into the stack of open elements |
| immediately below the position of the furthest block in that |
| stack. |
| 14. Jump back to step 1 in this series of steps. |
| |
| The way these steps are defined, only elements in the formatting |
| category ever get cloned by this algorithm. |
| |
| Because of the way this algorithm causes elements to change |
| parents, it has been dubbed the "adoption agency algorithm" (in |
| contrast with other possible algorithms for dealing with |
| misnested content, which included the "incest algorithm", the |
| "secret affair algorithm", and the "Heisenberg algorithm"). |
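The first decisions of the algorithm above (which formatting element is meant, and whether there is a furthest block) can be sketched in Python. This is a partial illustration only: it covers the lookup in step 1 and the search in step 2, and leaves out the scope checks of step 1 and all of the reparenting in steps 4 to 14. The `Element` class, the `MARKER` sentinel, and the `is_inline` predicate (standing in for "is in the phrasing or formatting category") are assumptions introduced for the example.

```python
MARKER = object()  # stands in for a scope marker in the list of active formatting elements


class Element:
    """Minimal stand-in for a DOM element; compared by identity."""
    def __init__(self, tag):
        self.tag = tag

    def __repr__(self):
        return f"<{self.tag}>"


def find_formatting_element(active, tag):
    """Step 1 (lookup only): last entry named `tag` after the last marker."""
    for entry in reversed(active):
        if entry is MARKER:
            return None
        if entry.tag == tag:
            return entry
    return None


def find_furthest_block(stack, formatting_element, is_inline):
    """Step 2: topmost node below the formatting element in the stack that
    is not inline. `stack` lists elements from html (first) to the current
    node (last), so "lower in the stack" means later in the list."""
    below = stack[stack.index(formatting_element) + 1:]
    for node in below:
        if not is_inline(node.tag):
            return node
    return None  # step 3 applies: just pop up to the formatting element
```

For the misnested input `<b><p><i>x</b>`, the stack is `html, body, b, p, i` when `</b>` arrives, so the furthest block is the p element. For `<b><i>x</b>` there is no furthest block and step 3 simply pops the stack.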
| |
| A start tag whose tag name is "button" |
| If the stack of open elements has a button element in scope, |
| then this is a parse error; act as if an end tag with the tag |
| name "button" had been seen, then reprocess the token. |
| |
| Otherwise: |
| |
| Reconstruct the active formatting elements, if any. |
| |
| Insert an HTML element for the token. |
| |
| Insert a marker at the end of the list of active formatting |
| elements. |
| |
| A start tag token whose tag name is one of: "applet", "marquee", |
| "object" |
| Reconstruct the active formatting elements, if any. |
| |
| Insert an HTML element for the token. |
| |
| Insert a marker at the end of the list of active formatting |
| elements. |
| |
| An end tag token whose tag name is one of: "applet", "button", |
| "marquee", "object" |
| If the stack of open elements does not have an element in scope |
| with the same tag name as that of the token, then this is a |
| parse error; ignore the token. |
| |
| Otherwise, run these steps: |
| |
| 1. Generate implied end tags. |
| 2. If the current node is not an element with the same tag name |
| as that of the token, then this is a parse error. |
| 3. Pop elements from the stack of open elements until an element |
| with the same tag name as the token has been popped from the |
| stack. |
| 4. Clear the list of active formatting elements up to the last |
| marker. |
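Step 4 above refers to clearing the list of active formatting elements up to the last marker, an operation defined elsewhere in the specification. The Python sketch below assumes it means removing entries from the end of the list up to and including the most recent marker; `MARKER` and the function name are hypothetical.

```python
MARKER = object()  # stands in for a marker entry


def clear_to_last_marker(active):
    """Remove entries from the end of `active` until a marker has been
    removed, or the list is empty. Mutates the list in place."""
    while active:
        entry = active.pop()
        if entry is MARKER:
            break
```

This is why the start tag rules for "applet", "button", "marquee", and "object" insert a marker: formatting elements opened inside such an element are discarded when it closes, while those opened before it survive.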
| |
| A start tag whose tag name is "xmp" |
| Reconstruct the active formatting elements, if any. |
| |
| Follow the generic CDATA element parsing algorithm. |
| |
| A start tag whose tag name is "table" |
| If the stack of open elements has a p element in scope, then act |
| as if an end tag with the tag name "p" had been seen. |
| |
| Insert an HTML element for the token. |
| |
| Switch the insertion mode to "in table". |
| |
| A start tag whose tag name is one of: "area", "basefont", "bgsound", |
| "br", "embed", "img", "input", "spacer", "wbr" |
| Reconstruct the active formatting elements, if any. |
| |
| Insert an HTML element for the token. Immediately pop the |
| current node off the stack of open elements. |
| |
| Acknowledge the token's self-closing flag, if it is set. |
| |
| A start tag whose tag name is one of: "param", "source" |
| Insert an HTML element for the token. Immediately pop the |
| current node off the stack of open elements. |
| |
| Acknowledge the token's self-closing flag, if it is set. |
| |
| A start tag whose tag name is "hr" |
| If the stack of open elements has a p element in scope, then act |
| as if an end tag with the tag name "p" had been seen. |
| |
| Insert an HTML element for the token. Immediately pop the |
| current node off the stack of open elements. |
| |
| Acknowledge the token's self-closing flag, if it is set. |
| |
| A start tag whose tag name is "image" |
| Parse error. Change the token's tag name to "img" and reprocess |
| it. (Don't ask.) |
| |
| A start tag whose tag name is "isindex" |
| Parse error. |
| |
| If the form element pointer is not null, then ignore the token. |
| |
| Otherwise: |
| |
| Acknowledge the token's self-closing flag, if it is set. |
| |
| Act as if a start tag token with the tag name "form" had been |
| seen. |
| |
| If the token has an attribute called "action", set the action |
| attribute on the resulting form element to the value of the |
| "action" attribute of the token. |
| |
| Act as if a start tag token with the tag name "hr" had been |
| seen. |
| |
| Act as if a start tag token with the tag name "p" had been seen. |
| |
| Act as if a start tag token with the tag name "label" had been |
| seen. |
| |
| Act as if a stream of character tokens had been seen (see below |
| for what they should say). |
| |
| Act as if a start tag token with the tag name "input" had been |
| seen, with all the attributes from the "isindex" token except |
| "name", "action", and "prompt". Set the name attribute of the |
| resulting input element to the value "isindex". |
| |
| Act as if a stream of character tokens had been seen (see below |
| for what they should say). |
| |
| Act as if an end tag token with the tag name "label" had been |
| seen. |
| |
| Act as if an end tag token with the tag name "p" had been seen. |
| |
| Act as if a start tag token with the tag name "hr" had been |
| seen. |
| |
| Act as if an end tag token with the tag name "form" had been |
| seen. |
| |
| If the token has an attribute with the name "prompt", then the |
| first stream of characters must be the same string as given in |
| that attribute, and the second stream of characters must be |
| empty. Otherwise, the two streams of character tokens together |
| should, together with the input element, express the equivalent |
| of "This is a searchable index. Insert your search keywords |
| here: (input field)" in the user's preferred language. |
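The "isindex" expansion above amounts to replacing one token with a fixed sequence of fake tokens. The Python sketch below lists that sequence under several assumptions: tokens are modeled as tuples, the "action" attribute is carried on the fake form token rather than set on the resulting element, and `DEFAULT_PROMPT` is an English rendering that puts the whole suggested text in the first stream. All names are hypothetical.

```python
# Assumption: English wording, with all of it placed before the input field.
DEFAULT_PROMPT = "This is a searchable index. Insert your search keywords here: "


def expand_isindex(attrs, form_pointer_is_null=True):
    """Return the fake tokens that stand in for an <isindex> start tag.

    `attrs` is a dict of the isindex token's attributes. An empty list
    means the token is ignored (the form element pointer is not null).
    """
    if not form_pointer_is_null:
        return []
    if "prompt" in attrs:
        first, second = attrs["prompt"], ""   # second stream must be empty
    else:
        first, second = DEFAULT_PROMPT, ""
    form_attrs = {"action": attrs["action"]} if "action" in attrs else {}
    # The input gets every attribute except name, action, and prompt.
    input_attrs = {k: v for k, v in attrs.items()
                   if k not in ("name", "action", "prompt")}
    input_attrs["name"] = "isindex"
    return [
        ("start", "form", form_attrs),
        ("start", "hr", {}),
        ("start", "p", {}),
        ("start", "label", {}),
        ("chars", first),
        ("start", "input", input_attrs),
        ("chars", second),
        ("end", "label"),
        ("end", "p"),
        ("start", "hr", {}),
        ("end", "form"),
    ]
```

The parse error reported for every "isindex" token and the acknowledgement of the self-closing flag are left out of the sketch.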
| |
| A start tag whose tag name is "textarea" |
| |
| 1. Insert an HTML element for the token. |
| 2. If the next token is a U+000A LINE FEED (LF) character token, |
| then ignore that token and move on to the next one. (Newlines |
| at the start of textarea elements are ignored as an authoring |
| convenience.) |
| 3. Switch the tokeniser's content model flag to the RCDATA state. |
| 4. Let the original insertion mode be the current insertion mode. |
| 5. Switch the insertion mode to "in CDATA/RCDATA". |
| |
| A start tag whose tag name is one of: "iframe", "noembed" |
| A start tag whose tag name is "noscript", if the scripting flag is |
| enabled |
| Follow the generic CDATA element parsing algorithm. |
| |
| A start tag whose tag name is "select" |
| Reconstruct the active formatting elements, if any. |
| |
| Insert an HTML element for the token. |
| |
| If the insertion mode is one of "in table", "in caption", "in |
| column group", "in table body", "in row", or "in cell", then |
| switch the insertion mode to "in select in table". Otherwise, |
| switch the insertion mode to "in select". |
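The choice of insertion mode after a "select" start tag is a simple lookup, sketched here in Python. Insertion modes are modeled as plain strings, and the function name is hypothetical.

```python
# The six table-related insertion modes named in the rule above.
TABLE_MODES = {"in table", "in caption", "in column group",
               "in table body", "in row", "in cell"}


def mode_after_select(current_mode):
    """Return the insertion mode to switch to after inserting a select element."""
    return "in select in table" if current_mode in TABLE_MODES else "in select"
```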
| |
| A start tag whose tag name is one of: "optgroup", "option" |
| If the stack of open elements has an option element in scope, |
| then act as if an end tag with the tag name "option" had been |
| seen. |
| |
| Reconstruct the active formatting elements, if any. |
| |
| Insert an HTML element for the token. |
| |
| A start tag whose tag name is one of: "rp", "rt" |
| If the stack of open elements has a ruby element in scope, then |
| generate implied end tags. If the current node is not then a |
| ruby element, this is a parse error; pop all the nodes from the |
| current node up to the node immediately before the bottommost |
| ruby element on the stack of open elements. |
| |
| Insert an HTML element for the token. |
| |
| An end tag whose tag name is "br" |
| Parse error. Act as if a start tag token with the tag name "br" |
| had been seen. Ignore the end tag token. |
| |
| A start tag whose tag name is "math" |
| Reconstruct the active formatting elements, if any. |
| |
| Adjust MathML attributes for the token. (This fixes the case of |
| MathML attributes that are not all lowercase.) |
| |
| Adjust foreign attributes for the token. (This fixes the use of |
| namespaced attributes, in particular XLink.) |
| |
| Insert a foreign element for the token, in the MathML namespace. |
| |
| If the token has its self-closing flag set, pop the current node |
| off the stack of open elements and acknowledge the token's |
| self-closing flag. |
| |
| Otherwise, let the secondary insertion mode be the current |
| insertion mode, and then switch the insertion mode to "in |
| foreign content". |
| |
| A start tag whose tag name is one of: "caption", "col", "colgroup", |
| "frame", "frameset", "head", "tbody", "td", "tfoot", "th", |
| "thead", "tr" |
| Parse error. Ignore the token. |
| |
| Any other start tag |
| Reconstruct the active formatting elements, if any. |
| |
| Insert an HTML element for the token. |
| |
| This element will be a phrasing element. |
| |
| Any other end tag |
| Run the following steps: |
| |
| 1. Initialize node to be the current node (the bottommost node of |
| the stack). |
| 2. If node has the same tag name as the end tag token, then: |
| 1. Generate implied end tags. |
| 2. If the tag name of the end tag token does not match the |
| tag name of the current node, this is a parse error. |
| 3. Pop all the nodes from the current node up to node, |
| including node, then stop these steps. |
| 3. Otherwise, if node is in neither the formatting category nor |
| the phrasing category, then this is a parse error; ignore the |
| token, and abort these steps. |
| 4. Set node to the previous entry in the stack of open elements. |
| 5. Return to step 2. |
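The "any other end tag" steps can be sketched in Python as a walk up the stack. As before, the stack is assumed to be a list of tag names with the current node last, `is_inline` is a hypothetical predicate standing in for "is in the formatting or phrasing category", and `IMPLIED_END` is an assumption about the "generate implied end tags" definition found elsewhere in the specification. One deliberate simplification is noted in the comments.

```python
# Assumption: elements closed by "generate implied end tags".
IMPLIED_END = {"dd", "dt", "li", "option", "optgroup", "p", "rp", "rt"}


def any_other_end_tag(stack, tag, is_inline, implied=IMPLIED_END):
    """Mutates `stack`; returns "closed", "closed-with-error", or "ignored"."""
    i = len(stack) - 1                        # step 1: start at the current node
    while i >= 0:
        node = stack[i]
        if node == tag:                       # step 2
            # 2.1: generate implied end tags. Simplification: the loop
            # never pops the matched node itself.
            while len(stack) - 1 > i and stack[-1] in implied:
                stack.pop()
            error = stack[-1] != tag          # 2.2: current node should match
            del stack[i:]                     # 2.3: pop up to and including node
            return "closed-with-error" if error else "closed"
        if not is_inline(node):               # step 3: a block element stops the walk
            return "ignored"
        i -= 1                                # steps 4 and 5
    return "ignored"
```

The effect is that a stray end tag can close intervening inline elements but never reaches past a block-level element.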
| |
| 8.2.5.11 The "in CDATA/RCDATA" insertion mode |
| |
| When the insertion mode is "in CDATA/RCDATA", tokens must be handled as |
| follows: |
| |
| A character token |
| Insert the token's character into the current node. |
| |
| An end-of-file token |
| Parse error. |
| |
| If the current node is a script element, mark the script element |
| as "already executed". |
| |
| Pop the current node off the stack of open elements. |
| |
| Switch the insertion mode to the original insertion mode and |
| reprocess the current token. |
| |
| An end tag whose tag name is "script" |
| Let script be the current node (which will be a script element). |
| |
| Pop the current node off the stack of open elements. |
| |
| Switch the insertion mode to the original insertion mode. |
| |
| Let the old insertion point have the same value as the current |
| insertion point. Let the insertion point be just before the next |
| input character. |
| |
| Increment the parser's script nesting level by one. |
| |
| Run the script. This might cause some script to execute, which |
| might cause new characters to be inserted into the tokeniser, |
| and might cause the tokeniser to output more tokens, resulting |
| in a reentrant invocation of the parser. |
| |
| Decrement the parser's script nesting level by one. If the |
| parser's script nesting level is zero, then set the parser pause |
| flag to false. |
| |
| Let the insertion point have the value of the old insertion |
| point. (In other words, restore the insertion point to the value |
| it had before the previous paragraph. This value might be the |
| "undefined" value.) |
| |
| At this stage, if there is a pending external script, then: |
| |
| If the tree construction stage is being called reentrantly, say |
| from a call to document.write(): |
| Set the parser pause flag to true, and abort the |
| processing of any nested invocations of the tokeniser, |
| yielding control back to the caller. (Tokenization will |
| resume when the caller returns to the "outer" tree |
| construction stage.) |
| |
| Otherwise: |
| Follow these steps: |
| |
| 1. Let the script be the pending external script. There is |
| no longer a pending external script. |
| 2. Pause until the script has completed loading. |
| 3. Let the insertion point be just before the next input |
| character. |
| 4. Execute the script. |
| 5. Let the insertion point be undefined again. |
| 6. If there is once again a pending external script, then |
| repeat these steps from step 1. |
| |
| Any other end tag |
| Pop the current node off the stack of open elements. |
| |
| Switch the insertion mode to the original insertion mode. |
| |
| 8.2.5.12 The "in table" insertion mode |
| |
| When the insertion mode is "in table", tokens must be handled as |
| follows: |
| |
| A character token that is one of U+0009 CHARACTER TABULATION, |
| U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE |
| If the current table is tainted, then act as described in the |
| "anything else" entry below. |
| |
| Otherwise, insert the character into the current node. |
| |
| A comment token |
| Append a Comment node to the current node with the data |
| attribute set to the data given in the comment token. |
| |
| A DOCTYPE token |
| Parse error. Ignore the token. |
| |
| A start tag whose tag name is "caption" |
| Clear the stack back to a table context. (See below.) |
| |
| Insert a marker at the end of the list of active formatting |
| elements. |
| |
| Insert an HTML element for the token, then switch the insertion |
| mode to "in caption". |
| |
| A start tag whose tag name is "colgroup" |
| Clear the stack back to a table context. (See below.) |
| |
| Insert an HTML element for the token, then switch the insertion |
| mode to "in column group". |
| |
| A start tag whose tag name is "col" |
| Act as if a start tag token with the tag name "colgroup" had |
| been seen, then reprocess the current token. |
| |
| A start tag whose tag name is one of: "tbody", "tfoot", "thead" |
| Clear the stack back to a table context. (See below.) |
| |
| Insert an HTML element for the token, then switch the insertion |
| mode to "in table body". |
| |
| A start tag whose tag name is one of: "td", "th", "tr" |
| Act as if a start tag token with the tag name "tbody" had been |
| seen, then reprocess the current token. |
| |
| A start tag whose tag name is "table" |
| Parse error. Act as if an end tag token with the tag name |
| "table" had been seen, then, if that token wasn't ignored, |
| reprocess the current token. |
| |
| The fake end tag token here can only be ignored in the fragment |
| case. |
| |
| An end tag whose tag name is "table" |
| If the stack of open elements does not have an element in table |
| scope with the same tag name as the token, this is a parse |
| error. Ignore the token. (fragment case) |
| |
| Otherwise: |
| |
| Pop elements from this stack until a table element has been |
| popped from the stack. |
| |
| Reset the insertion mode appropriately. |
| |
| An end tag whose tag name is one of: "body", "caption", "col", |
| "colgroup", "html", "tbody", "td", "tfoot", "th", "thead", "tr" |
| Parse error. Ignore the token. |
| |
| A start tag whose tag name is one of: "style", "script" |
| If the current table is tainted then act as described in the |
| "anything else" entry below. |
| |
| Otherwise, process the token using the rules for the "in head" |
| insertion mode. |
| |
| A start tag whose tag name is "input" |
| If the token does not have an attribute with the name "type", or |
| if it does, but that attribute's value is not an ASCII |
| case-insensitive match for the string "hidden", or, if the |
| current table is tainted, then: act as described in the |
| "anything else" entry below. |
| |
| Otherwise: |
| |
| Parse error. |
| |
| Insert an HTML element for the token. |
| |
| Pop that input element off the stack of open elements. |
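The test applied to "input" start tags in a table can be sketched in Python. The point worth illustrating is the ASCII case-insensitive match: Python's `str.lower()` also folds some non-ASCII characters, so the sketch lowercases only A to Z. Attribute dicts, the return labels, and the function names are assumptions made for the example.

```python
import string

# Maps only A-Z to a-z, leaving every other character untouched.
_TO_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_case_insensitive_equal(a, b):
    """Compare two strings, ignoring case for ASCII letters only."""
    return a.translate(_TO_ASCII_LOWER) == b.translate(_TO_ASCII_LOWER)


def input_handling_in_table(attrs, table_is_tainted=False):
    """Return "insert-and-pop" for a hidden input in an untainted table,
    and "anything-else" when the token falls through to that entry."""
    type_value = attrs.get("type")
    if (type_value is None
            or not ascii_case_insensitive_equal(type_value, "hidden")
            or table_is_tainted):
        return "anything-else"
    return "insert-and-pop"     # still a parse error, but no foster parenting
```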
| |
| An end-of-file token |
| If the current node is not the root html element, then this is a |
| parse error. |
| |
| It can only be the current node in the fragment case. |
| |
| Stop parsing. |
| |
| Anything else |
| Parse error. Process the token using the rules for the "in body" |
| insertion mode, except that if the current node is a table, |
| tbody, tfoot, thead, or tr element, then, whenever a node would |
| be inserted into the current node, it must instead be foster |
| parented. |
| |
| When the steps above require the UA to clear the stack back to a table |
| context, it means that the UA must, while the current node is not a |
| table element or an html element, pop elements from the stack of open |
| elements. |
| |
| The current node being an html element after this process is a fragment |
| case. |
| |
| 8.2.5.13 The "in caption" insertion mode |
| |
| When the insertion mode is "in caption", tokens must be handled as |
| follows: |
| |
| An end tag whose tag name is "caption" |
| If the stack of open elements does not have an element in table |
| scope with the same tag name as the token, this is a parse |
| error. Ignore the token. (fragment case) |
| |
| Otherwise: |
| |
| Generate implied end tags. |
| |
| Now, if the current node is not a caption element, then this is |
| a parse error. |
| |
| Pop elements from this stack until a caption element has been |
| popped from the stack. |
| |
| Clear the list of active formatting elements up to the last |
| marker. |
| |
| Switch the insertion mode to "in table". |
| |
| A start tag whose tag name is one of: "caption", "col", "colgroup", |
| "tbody", "td", "tfoot", "th", "thead", "tr" |
| An end tag whose tag name is "table" |
| Parse error. Act as if an end tag with the tag name "caption" |
| had been seen, then, if that token wasn't ignored, reprocess the |
| current token. |
| |
| The fake end tag token here can only be ignored in the fragment |
| case. |
| |
| An end tag whose tag name is one of: "body", "col", "colgroup", "html", |
| "tbody", "td", "tfoot", "th", "thead", "tr" |
| Parse error. Ignore the token. |
| |
| Anything else |
| Process the token using the rules for the "in body" insertion |
| mode. |
| |
| 8.2.5.14 The "in column group" insertion mode |
| |
| When the insertion mode is "in column group", tokens must be handled as |
| follows: |
| |
| A character token that is one of U+0009 CHARACTER TABULATION, |
| U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE |
| Insert the character into the current node. |
| |
| A comment token |
| Append a Comment node to the current node with the data |
| attribute set to the data given in the comment token. |
| |
| A DOCTYPE token |
| Parse error. Ignore the token. |
| |
| A start tag whose tag name is "html" |
| Process the token using the rules for the "in body" insertion |
| mode. |
| |
| A start tag whose tag name is "col" |
| Insert an HTML element for the token. Immediately pop the |
| current node off the stack of open elements. |
| |
| Acknowledge the token's self-closing flag, if it is set. |
| |
| An end tag whose tag name is "colgroup" |
| If the current node is the root html element, then this is a |
| parse error; ignore the token. (fragment case) |
| |
| Otherwise, pop the current node (which will be a colgroup |
| element) from the stack of open elements. Switch the insertion |
| mode to "in table". |
| |
| An end tag whose tag name is "col" |
| Parse error. Ignore the token. |
| |
| An end-of-file token |
| If the current node is the root html element, then stop parsing. |
| (fragment case) |
| |
| Otherwise, act as described in the "anything else" entry below. |
| |
| Anything else |
| Act as if an end tag with the tag name "colgroup" had been seen, |
| and then, if that token wasn't ignored, reprocess the current |
| token. |
| |
| The fake end tag token here can only be ignored in the fragment |
| case. |
| |
| 8.2.5.15 The "in table body" insertion mode |
| |
| When the insertion mode is "in table body", tokens must be handled as |
| follows: |
| |
| A start tag whose tag name is "tr" |
| Clear the stack back to a table body context. (See below.) |
| |
| Insert an HTML element for the token, then switch the insertion |
| mode to "in row". |
| |
| A start tag whose tag name is one of: "th", "td" |
| Parse error. Act as if a start tag with the tag name "tr" had |
| been seen, then reprocess the current token. |
| |
| An end tag whose tag name is one of: "tbody", "tfoot", "thead" |
| If the stack of open elements does not have an element in table |
| scope with the same tag name as the token, this is a parse |
| error. Ignore the token. |
| |
| Otherwise: |
| |
| Clear the stack back to a table body context. (See below.) |
| |
| Pop the current node from the stack of open elements. Switch the |
| insertion mode to "in table". |
| |
| A start tag whose tag name is one of: "caption", "col", "colgroup", |
| "tbody", "tfoot", "thead" |
| An end tag whose tag name is "table" |
| If the stack of open elements does not have a tbody, thead, or |
| tfoot element in table scope, this is a parse error. Ignore the |
| token. (fragment case) |
| |
| Otherwise: |
| |
| Clear the stack back to a table body context. (See below.) |
| |
| Act as if an end tag with the same tag name as the current node |
| ("tbody", "tfoot", or "thead") had been seen, then reprocess the |
| current token. |
| |
| An end tag whose tag name is one of: "body", "caption", "col", |
| "colgroup", "html", "td", "th", "tr" |
| Parse error. Ignore the token. |
| |
| Anything else |
| Process the token using the rules for the "in table" insertion |
| mode. |
| |
| When the steps above require the UA to clear the stack back to a table |
| body context, it means that the UA must, while the current node is not |
| a tbody, tfoot, thead, or html element, pop elements from the stack of |
| open elements. |
| |
| The current node being an html element after this process is a fragment |
| case. |
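| |
| As a minimal sketch of this step, assume the stack of open elements is |
| modelled as a Python list of lowercase tag names with the current node |
| last. That is a simplification, since real implementations hold |
| element nodes. The function and constant names are illustrative and |
| are not defined by this specification. |
| |
```python
# Elements at which the "table body context" clearing loop stops.
TABLE_BODY_CONTEXT = frozenset({"tbody", "tfoot", "thead", "html"})


def clear_stack_back_to_table_body_context(open_elements):
    """Pop the current node until it is a tbody, tfoot, thead, or html element.

    `open_elements` is a list of tag names; the last entry is the current node.
    The list is modified in place and also returned for convenience.
    """
    while open_elements and open_elements[-1] not in TABLE_BODY_CONTEXT:
        open_elements.pop()
    return open_elements
```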
| |
| 8.2.5.16 The "in row" insertion mode |
| |
| When the insertion mode is "in row", tokens must be handled as follows: |
| |
| A start tag whose tag name is one of: "th", "td" |
| Clear the stack back to a table row context. (See below.) |
| |
| Insert an HTML element for the token, then switch the insertion |
| mode to "in cell". |
| |
| Insert a marker at the end of the list of active formatting |
| elements. |
| |
| An end tag whose tag name is "tr" |
| If the stack of open elements does not have an element in table |
| scope with the same tag name as the token, this is a parse |
| error. Ignore the token. (fragment case) |
| |
| Otherwise: |
| |
| Clear the stack back to a table row context. (See below.) |
| |
| Pop the current node (which will be a tr element) from the stack |
| of open elements. Switch the insertion mode to "in table body". |
| |
| A start tag whose tag name is one of: "caption", "col", "colgroup", |
| "tbody", "tfoot", "thead", "tr" |
| |
| An end tag whose tag name is "table" |
| Act as if an end tag with the tag name "tr" had been seen, then, |
| if that token wasn't ignored, reprocess the current token. |
| |
| The fake end tag token here can only be ignored in the fragment |
| case. |
| |
| An end tag whose tag name is one of: "tbody", "tfoot", "thead" |
| If the stack of open elements does not have an element in table |
| scope with the same tag name as the token, this is a parse |
| error. Ignore the token. |
| |
| Otherwise, act as if an end tag with the tag name "tr" had been |
| seen, then reprocess the current token. |
| |
| An end tag whose tag name is one of: "body", "caption", "col", |
| "colgroup", "html", "td", "th" |
| Parse error. Ignore the token. |
| |
| Anything else |
| Process the token using the rules for the "in table" insertion |
| mode. |
| |
| When the steps above require the UA to clear the stack back to a table |
| row context, it means that the UA must, while the current node is not a |
| tr element or an html element, pop elements from the stack of open |
| elements. |
| |
| The current node being an html element after this process is a fragment |
| case. |
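| |
| The handling of a "tr" end tag in this mode could be sketched as |
| follows, again modelling the stack of open elements as a list of tag |
| names with the current node last. The table scope check is defined |
| elsewhere in this specification; the version here is an assumed |
| simplification that stops at a table or html element, and all |
| function names are hypothetical. |
| |
```python
def has_element_in_table_scope(open_elements, target):
    # Assumed simplification of the table scope check: walk from the
    # current node towards the root, stopping at a table or html element.
    for name in reversed(open_elements):
        if name == target:
            return True
        if name in ("table", "html"):
            return False
    return False


def handle_tr_end_tag(open_elements):
    """Handle </tr> in the "in row" mode.

    Returns a tuple (insertion_mode, parse_error).
    """
    if not has_element_in_table_scope(open_elements, "tr"):
        # Fragment case: parse error, token ignored, mode unchanged.
        return "in row", True
    # Clear the stack back to a table row context.
    while open_elements[-1] not in ("tr", "html"):
        open_elements.pop()
    # Pop the tr element itself.
    open_elements.pop()
    return "in table body", False
```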
| |
| 8.2.5.17 The "in cell" insertion mode |
| |
| When the insertion mode is "in cell", tokens must be handled as |
| follows: |
| |
| An end tag whose tag name is one of: "td", "th" |
| If the stack of open elements does not have an element in table |
| scope with the same tag name as that of the token, then this is |
| a parse error and the token must be ignored. |
| |
| Otherwise: |
| |
| Generate implied end tags. |
| |
| Now, if the current node is not an element with the same tag |
| name as the token, then this is a parse error. |
| |
| Pop elements from this stack until an element with the same tag |
| name as the token has been popped from the stack. |
| |
| Clear the list of active formatting elements up to the last |
| marker. |
| |
| Switch the insertion mode to "in row". (The current node will be |
| a tr element at this point.) |
| |
| A start tag whose tag name is one of: "caption", "col", "colgroup", |
| "tbody", "td", "tfoot", "th", "thead", "tr" |
| If the stack of open elements does not have a td or th element |
| in table scope, then this is a parse error; ignore the token. |
| (fragment case) |
| |
| Otherwise, close the cell (see below) and reprocess the current |
| token. |
| |
| An end tag whose tag name is one of: "body", "caption", "col", |
| "colgroup", "html" |
| Parse error. Ignore the token. |
| |
| An end tag whose tag name is one of: "table", "tbody", "tfoot", |
| "thead", "tr" |
| If the stack of open elements does not have an element in table |
| scope with the same tag name as that of the token (which can |
| only happen for "tbody", "tfoot" and "thead", or, in the |
| fragment case), then this is a parse error and the token must be |
| ignored. |
| |
| Otherwise, close the cell (see below) and reprocess the current |
| token. |
| |
| Anything else |
| Process the token using the rules for the "in body" insertion |
| mode. |
| |
| Where the steps above say to close the cell, they mean to run the |
| following algorithm: |
| 1. If the stack of open elements has a td element in table scope, then |
| act as if an end tag token with the tag name "td" had been seen. |
| 2. Otherwise, the stack of open elements will have a th element in |
| table scope; act as if an end tag token with the tag name "th" had |
| been seen. |
| |
| The stack of open elements cannot have both a td and a th element in |
| table scope at the same time, nor can it have neither when the |
| insertion mode is "in cell". |
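| |
| A rough sketch of closing the cell, under the same list-of-tag-names |
| model of the stack of open elements, might look like this. It folds in |
| the effect of the fake "td" or "th" end tag but leaves out the |
| generation of implied end tags and the parse error reporting, which do |
| not change the final stack here. The MARKER sentinel, the table scope |
| helper, and the function names are assumptions made for illustration. |
| |
```python
MARKER = object()  # hypothetical sentinel for a marker in the formatting list


def has_element_in_table_scope(open_elements, target):
    # Assumed simplification of the table scope check defined elsewhere.
    for name in reversed(open_elements):
        if name == target:
            return True
        if name in ("table", "html"):
            return False
    return False


def close_cell(open_elements, active_formatting):
    """Close the open td or th cell and return the new insertion mode."""
    # Step 1 / step 2: exactly one of td and th is in table scope.
    cell = "td" if has_element_in_table_scope(open_elements, "td") else "th"
    # Pop elements until the cell element itself has been popped.
    while open_elements.pop() != cell:
        pass
    # Clear the list of active formatting elements up to the last marker,
    # removing the marker as well.
    while active_formatting and active_formatting.pop() is not MARKER:
        pass
    return "in row"
```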
| |
| 8.2.5.18 The "in select" insertion mode |
| |
| When the insertion mode is "in select", tokens must be handled as |
| follows: |
| |
| A character token |
| Insert the token's character into the current node. |
| |
| A comment token |
| Append a Comment node to the current node with the data |
| attribute set to the data given in the comment token. |
| |
| A DOCTYPE token |
| Parse error. Ignore the token. |
| |
| A start tag whose tag name is "html" |
| Process the token using the rules for the "in body" insertion |
| mode. |
| |
| A start tag whose tag name is "option" |
| If the current node is an option element, act as if an end tag |
| with the tag name "option" had been seen. |
| |
| Insert an HTML element for the token. |
| |
| A start tag whose tag name is "optgroup" |
| If the current node is an option element, act as if an end tag |
| with the tag name "option" had been seen. |
| |
| If the current node is an optgroup element, act as if an end tag |
| with the tag name "optgroup" had been seen. |
| |
| Insert an HTML element for the token. |
| |
| An end tag whose tag name is "optgroup" |
| First, if the current node is an option element, and the node |
| immediately before it in the stack of open elements is an |
| optgroup element, then act as if an end tag with the tag name |
| "option" had been seen. |
| |
| If the current node is an optgroup element, then pop that node |
| from the stack of open elements. Otherwise, this is a parse |
| error; ignore the token. |
| |
| An end tag whose tag name is "option" |
| If the current node is an option element, then pop that node |
| from the stack of open elements. Otherwise, this is a parse |
| error; ignore the token. |
| |
| An end tag whose tag name is "select" |
| If the stack of open elements does not have an element in table |
| scope with the same tag name as the token, this is a parse |
| error. Ignore the token. (fragment case) |
| |
| Otherwise: |
| |
| Pop elements from the stack of open elements until a select |
| element has been popped from the stack. |
| |
| Reset the insertion mode appropriately. |
| |
| A start tag whose tag name is "select" |
| Parse error. Act as if the token had been an end tag with the |
| tag name "select" instead. |
| |
| A start tag whose tag name is one of: "input", "textarea" |
| Parse error. Act as if an end tag with the tag name "select" had |
| been seen, and reprocess the token. |
| |
| A start tag whose tag name is "script" |
| Process the token using the rules for the "in head" insertion |
| mode. |
| |
| An end-of-file token |
| If the current node is not the root html element, then this is a |
| parse error. |
| |
| The root html element can only be the current node in the fragment |
| case. |
| |
| Stop parsing. |
| |
| Anything else |
| Parse error. Ignore the token. |
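| |
| The "option" and "optgroup" start tag rules above amount to implicitly |
| closing any open option (and, for "optgroup", any open optgroup) |
| before inserting the new element. A minimal sketch, assuming the |
| stack of open elements is a list of tag names with the current node |
| last and ignoring DOM insertion, could be written as below. The |
| function names are hypothetical. |
| |
```python
def start_option(open_elements):
    """Apply the "option" start tag rule of the "in select" mode."""
    # Acting as if an "option" end tag had been seen pops an open option.
    if open_elements[-1] == "option":
        open_elements.pop()
    open_elements.append("option")


def start_optgroup(open_elements):
    """Apply the "optgroup" start tag rule of the "in select" mode."""
    if open_elements[-1] == "option":
        open_elements.pop()
    # Acting as if an "optgroup" end tag had been seen pops an open optgroup.
    if open_elements[-1] == "optgroup":
        open_elements.pop()
    open_elements.append("optgroup")
```
| |
| With this model, feeding the start tags option, option, optgroup, |
| option, optgroup in turn never nests an option inside an option or an |
| optgroup inside an optgroup. |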
| |
| 8.2.5.19 The "in select in table" insertion mode |
| |
| When the insertion mode is "in select in table", tokens must be handled |
| as follows: |
| |
| A start tag whose tag name is one of: "caption", "table", "tbody", |
| "tfoot", "thead", "tr", "td", "th" |
| Parse error. Act as if an end tag with the tag name "select" had |
| been seen, and reprocess the token. |
| |
| An end tag whose tag name is one of: "caption", "table", "tbody", |
| "tfoot", "thead", "tr", "td", "th" |
| Parse error. |
| |
| If the stack of open elements has an element in table scope with |
| the same tag name as that of the token, then act as if an end |
| tag with the tag name "select" had been seen, and reprocess the |
| token. Otherwise, ignore the token. |
| |
| Anything else |
| Process the token using the rules for the "in select" insertion |
| mode. |
| |
| 8.2.5.20 The "in foreign content" insertion mode |
| |
| When the insertion mode is "in foreign content", tokens must be handled |
| as follows: |
| |
| A character token |
| Insert the token's character into the current node. |
| |
| A comment token |
| Append a Comment node to the current node with the data |
| attribute set to the data given in the comment token. |
| |
| A DOCTYPE token |
| Parse error. Ignore the token. |
| |
| A start tag whose tag name is neither "mglyph" nor "malignmark", if the |
| current node is an mi element in the MathML namespace. |
| |
| A start tag whose tag name is neither "mglyph" nor "malignmark", if the |
| current node is an mo element in the MathML namespace. |
| |
| A start tag whose tag name is neither "mglyph" nor "malignmark", if the |
| current node is an mn element in the MathML namespace. |
| |
| A start tag whose tag name is neither "mglyph" nor "malignmark", if the |
| current node is an ms element in the MathML namespace. |
| |
| A start tag whose tag name is neither "mglyph" nor "malignmark", if the |
| current node is an mtext element in the MathML namespace. |
| |
| A start tag, if the current node is an element in the HTML namespace. |
| An end tag |
| Process the token using the rules for the secondary insertion |
| mode. |
| |
| If, after doing so, the insertion mode is still "in foreign |
| content", but there is no element in scope that has a namespace |
| other than the HTML namespace, switch the insertion mode to the |
| secondary insertion mode. |
| |
| A start tag whose tag name is one of: "b", "big", "blockquote", "body", |
| "br", "center", "code", "dd", "div", "dl", "dt", "em", "embed", |
| "h1", "h2", "h3", "h4", "h5", "h6", "head", "hr", "i", "img", |
| "li", "listing", "menu", "meta", "nobr", "ol", "p", "pre", |
| "ruby", "s", "small", "span", "strong", "strike", "sub", "sup", |
| "table", "tt", "u", "ul", "var" |
| |
| A start tag whose tag name is "font", if the token has any attributes |
| named "color", "face", or "size" |
| |
| An end-of-file token |
| Parse error. |
| |
| Pop elements from the stack of open elements until the current |
| node is in the HTML namespace. |
| |
| Switch the insertion mode to the secondary insertion mode, and |
| reprocess the token. |
| |
| Any other start tag |
| If the current node is an element in the MathML namespace, |
| adjust MathML attributes for the token. (This fixes the case of |
| MathML attributes that are not all lowercase.) |
| |
| Adjust foreign attributes for the token. (This fixes the use of |
| namespaced attributes, in particular XLink in SVG.) |
| |
| Insert a foreign element for the token, in the same namespace as |
| the current node. |
| |
| If the token has its self-closing flag set, pop the current node |
| off the stack of open elements and acknowledge the token's |
| self-closing flag. |
| |
| 8.2.5.21 The "after body" insertion mode |
| |
| When the insertion mode is "after body", tokens must be handled as |
| follows: |
| |
| A character token that is one of U+0009 CHARACTER TABULATION, |
| U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE |
| Process the token using the rules for the "in body" insertion |
| mode. |
| |
| A comment token |
| Append a Comment node to the first element in the stack of open |
| elements (the html element), with the data attribute set to the |
| data given in the comment token. |
| |
| A DOCTYPE token |
| Parse error. Ignore the token. |
| |
| A start tag whose tag name is "html" |
| Process the token using the rules for the "in body" insertion |
| mode. |
| |
| An end tag whose tag name is "html" |
| If the parser was originally created as part of the HTML |
| fragment parsing algorithm, this is a parse error; ignore the |
| token. (fragment case) |
| |
| Otherwise, switch the insertion mode to "after after body". |
| |
| An end-of-file token |
| Stop parsing. |
| |
| Anything else |
| Parse error. Switch the insertion mode to "in body" and |
| reprocess the token. |
| |
| 8.2.5.22 The "in frameset" insertion mode |
| |
| When the insertion mode is "in frameset", tokens must be handled as |
| follows: |
| |
| A character token that is one of U+0009 CHARACTER TABULATION, |
| U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE |
| Insert the character into the current node. |
| |
| A comment token |
| Append a Comment node to the current node with the data |
| attribute set to the data given in the comment token. |
| |
| A DOCTYPE token |
| Parse error. Ignore the token. |
| |
| A start tag whose tag name is "html" |
| Process the token using the rules for the "in body" insertion |
| mode. |
| |
| A start tag whose tag name is "frameset" |
| Insert an HTML element for the token. |
| |
| An end tag whose tag name is "frameset" |
| If the current node is the root html element, then this is a |
| parse error; ignore the token. (fragment case) |
| |
| Otherwise, pop the current node from the stack of open elements. |
| |
| If the parser was not originally created as part of the HTML |
| fragment parsing algorithm (fragment case), and the current node |
| is no longer a frameset element, then switch the insertion mode |
| to "after frameset". |
| |
| A start tag whose tag name is "frame" |
| Insert an HTML element for the token. Immediately pop the |
| current node off the stack of open elements. |
| |
| Acknowledge the token's self-closing flag, if it is set. |
| |
| A start tag whose tag name is "noframes" |
| Process the token using the rules for the "in head" insertion |
| mode. |
| |
| An end-of-file token |
| If the current node is not the root html element, then this is a |
| parse error. |
| |
| The root html element can only be the current node in the fragment |
| case. |
| |
| Stop parsing. |
| |
| Anything else |
| Parse error. Ignore the token. |
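| |
| The "frameset" end tag rule above can be sketched as follows, assuming |
| the stack of open elements is a list of tag names with the current |
| node last and that an "html" entry stands for the root html element. |
| The function name and the fragment_case flag are hypothetical. |
| |
```python
def handle_frameset_end_tag(open_elements, fragment_case=False):
    """Handle </frameset> in the "in frameset" mode.

    Returns a tuple (insertion_mode, parse_error).
    """
    if open_elements[-1] == "html":
        # Only possible in the fragment case: ignore the token.
        return "in frameset", True
    open_elements.pop()
    # Leave the mode once the outermost frameset has been closed,
    # unless the parser was created for fragment parsing.
    if not fragment_case and open_elements[-1] != "frameset":
        return "after frameset", False
    return "in frameset", False
```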
| |
| 8.2.5.23 The "after frameset" insertion mode |
| |
| When the insertion mode is "after frameset", tokens must be handled as |
| follows: |
| |
| A character token that is one of U+0009 CHARACTER TABULATION, |
| U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE |
| Insert the character into the current node. |
| |
| A comment token |
| Append a Comment node to the current node with the data |
| attribute set to the data given in the comment token. |
| |
| A DOCTYPE token |
| Parse error. Ignore the token. |
| |
| A start tag whose tag name is "html" |
| Process the token using the rules for the "in body" insertion |
| mode. |
| |
| An end tag whose tag name is "html" |
| Switch the insertion mode to "after after frameset". |
| |
| A start tag whose tag name is "noframes" |
| Process the token using the rules for the "in head" insertion |
| mode. |
| |
| An end-of-file token |
| Stop parsing. |
| |
| Anything else |
| Parse error. Ignore the token. |
| |
| This doesn't handle UAs that don't support frames, or that do support |
| frames but want to show the NOFRAMES content. Supporting the former is |
| easy; supporting the latter is harder. |
| |
| 8.2.5.24 The "after after body" insertion mode |
| |
| When the insertion mode is "after after body", tokens must be handled |
| as follows: |
| |
| A comment token |
| Append a Comment node to the Document object with the data |
| attribute set to the data given in the comment token. |
| |
| A DOCTYPE token |
| A character token that is one of U+0009 CHARACTER TABULATION, |
| U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE |
| |
| A start tag whose tag name is "html" |
| Process the token using the rules for the "in body" insertion |
| mode. |
| |
| An end-of-file token |
| Stop parsing. |
| |
| Anything else |
| Parse error. Switch the insertion mode to "in body" and |
| reprocess the token. |
| |
| 8.2.5.25 The "after after frameset" insertion mode |
| |
| When the insertion mode is "after after frameset", tokens must be |
| handled as follows: |
| |
| A comment token |
| Append a Comment node to the Document object with the data |
| attribute set to the data given in the comment token. |
| |
| A DOCTYPE token |
| A character token that is one of U+0009 CHARACTER TABULATION, |
| U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE |
| |
| A start tag whose tag name is "html" |
| Process the token using the rules for the "in body" insertion |
| mode. |
| |
| An end-of-file token |
| Stop parsing. |
| |
| A start tag whose tag name is "noframes" |
| Process the token using the rules for the "in head" insertion |
| mode. |
| |
| Anything else |
| Parse error. Ignore the token. |
| |
| 8.2.6 The end |
| |
| Once the user agent stops parsing the document, the user agent must |
| follow the steps in this section. |
| |
| First, the current document readiness must be set to "interactive". |
| |
| Then, the rules for when a script completes loading start applying |
| (script execution is no longer managed by the parser). |
| |
| If any of the scripts in the list of scripts that will execute as soon |
| as possible have completed loading, or if the list of scripts that will |
| execute asynchronously is not empty and the first script in that list |
| has completed loading, then the user agent must act as if those scripts |
| just completed loading, following the rules given for that in the |
| script element definition. |
| |
| Then, if the list of scripts that will execute when the document has |
| finished parsing is not empty, and the first item in this list has |
| already completed loading, then the user agent must act as if that |
| script just finished loading. |
| |
| By this point, there will be no scripts that have loaded but have not |
| yet been executed. |
| |
| The user agent must then fire a simple event called DOMContentLoaded at |
| the Document. |
| |
| Once everything that delays the load event has completed, the user |
| agent must set the current document readiness to "complete", and then |
| fire a load event at the body element. |
| |
| Delaying the load event for things like image loads allows for |
| intranet port scans (even without JavaScript!). Should we really |
| encode that into the spec? |
| |
| 8.2.7 Coercing an HTML DOM into an infoset |
| |
| When an application uses an HTML parser in conjunction with an XML |
| pipeline, it is possible that the constructed DOM is not compatible |
| with the XML toolchain in certain subtle ways. For example, an XML |
| toolchain might not be able to represent attributes with the name |
| xmlns, since they conflict with the Namespaces in XML syntax. There is |
| also some data that the HTML parser generates that isn't included in |
| the DOM itself. This section specifies some rules for handling these |
| issues. |
| |
| If the XML API being used doesn't support DOCTYPEs, the tool may drop |
| DOCTYPEs altogether. |
| |
| If the XML API doesn't support attributes in no namespace that are |
| named "xmlns", attributes whose names start with "xmlns:", or |
| attributes in the XMLNS namespace, then the tool may drop such |
| attributes. |
| |
| The tool may annotate the output with any namespace declarations |
| required for proper operation. |
| |
| If the XML API being used restricts the allowable characters in the |
| local names of elements and attributes, then the tool may map all |
| element and attribute local names that the API wouldn't support to a |
| set of names that are allowed, by replacing any character that isn't |
| supported with the uppercase letter U and the five digits of the |
| character's Unicode codepoint when expressed in hexadecimal, using |
| digits 0-9 and capital letters A-F as the symbols, in increasing |
| numeric order. |
| |
| For example, the element name foo<bar, which can be output by the HTML |
| parser, though it is neither a legal HTML element name nor a |
| well-formed XML element name, would be converted into fooU0003Cbar, |
| which is a well-formed XML element name (though it's still not legal in |
| HTML by any means). |
| |
| As another example, consider the attribute xlink:href. Used on a MathML |
| element, it becomes, after being adjusted, an attribute with a prefix |
| "xlink" and a local name "href". However, used on an HTML element, it |
| becomes an attribute with no prefix and the local name "xlink:href", |
| which is not a valid NCName, and thus might not be accepted by an XML |
| API. It could thus get converted, becoming "xlinkU0003Ahref". |
| |
| The resulting names from this conversion conveniently can't clash with |
| any attribute generated by the HTML parser, since those are all either |
| lowercase or those listed in the adjust foreign attributes algorithm's |
| table. |
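| |
| The name mapping described above could be sketched like this. Which |
| characters an XML API accepts depends on that API, so the default |
| predicate below (ASCII letters, digits, underscore, hyphen, full stop) |
| is only a stand-in assumption, and both function names are |
| hypothetical. Code points above U+FFFFF would need six hexadecimal |
| digits, a case the five-digit form does not cover. |
| |
```python
def default_is_supported(ch):
    # Hypothetical stand-in for the XML API's real rules; it ignores the
    # stricter rules that apply to the first character of a name.
    return ch.isascii() and (ch.isalnum() or ch in "_-.")


def coerce_local_name(name, is_supported=default_is_supported):
    """Replace each unsupported character with U plus five hex digits."""
    return "".join(
        ch if is_supported(ch) else "U%05X" % ord(ch) for ch in name
    )
```
| |
| Under that assumed predicate, the two examples given above come out as |
| described: foo<bar maps to fooU0003Cbar and xlink:href maps to |
| xlinkU0003Ahref. |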
| |
| If the XML API restricts comments from having two consecutive U+002D |
| HYPHEN-MINUS characters (--), the tool may insert a single U+0020 SPACE |
| character between any such offending characters. |
| |
| If the XML API restricts comments from ending in a U+002D HYPHEN-MINUS |
| character (-), the tool may insert a single U+0020 SPACE character at |
| the end of such comments. |
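| |
| A minimal sketch of the two comment adjustments above, applied to the |
| comment's data as a Python string, might be the following. The |
| function name is illustrative only. |
| |
```python
def coerce_comment_data(data):
    """Make comment data acceptable to an XML API that forbids "--"
    inside comments and "-" at the end of comments."""
    # Insert a space between consecutive hyphens; repeat because a run of
    # three or more hyphens needs more than one pass.
    while "--" in data:
        data = data.replace("--", "- -")
    # A trailing hyphen would collide with the closing "-->" delimiter.
    if data.endswith("-"):
        data += " "
    return data
```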
| |
| If the XML API restricts allowed characters in character data, the tool |
| may replace any U+000C FORM FEED (FF) character with a U+0020 SPACE |
| character, and any other literal non-XML character with a U+FFFD |
| REPLACEMENT CHARACTER. |
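| |
| The character data rule above could be sketched as below. The set of |
| allowed characters is taken here to be the Char production of XML 1.0, |
| which is an assumption about the XML API in use; the function names |
| are hypothetical. |
| |
```python
def is_xml_char(code_point):
    # Char production from XML 1.0 (assumed to be what the API enforces).
    return (
        code_point in (0x9, 0xA, 0xD)
        or 0x20 <= code_point <= 0xD7FF
        or 0xE000 <= code_point <= 0xFFFD
        or 0x10000 <= code_point <= 0x10FFFF
    )


def coerce_character_data(text):
    """Replace form feeds with spaces and other non-XML characters
    with U+FFFD REPLACEMENT CHARACTER."""
    out = []
    for ch in text:
        if ch == "\f":
            out.append(" ")
        elif is_xml_char(ord(ch)):
            out.append(ch)
        else:
            out.append("\ufffd")
    return "".join(out)
```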
| |
| If the tool has no way to convey out-of-band information, then the tool |
| may drop the following information: |
| * Whether the document is set to no quirks mode, limited quirks mode, |
| or quirks mode |
| * The association between form controls and forms that aren't their |
| nearest form element ancestor (use of the form element pointer in |
| the parser) |
| |
| The mutations allowed by this section apply after the HTML parser's |
| rules have been applied. For example, a <a::> start tag will be closed |
| by a </a::> end tag, and never by a </aU0003AU0003A> end tag, even if |
| the user agent is using the rules above to then generate an actual |
| element in the DOM with the name aU0003AU0003A for that start tag. |
| |
| 8.3 Namespaces |
| |
| The HTML namespace is: http://www.w3.org/1999/xhtml |
| |
| The MathML namespace is: http://www.w3.org/1998/Math/MathML |
| |
| The SVG namespace is: http://www.w3.org/2000/svg |
| |
| The XLink namespace is: http://www.w3.org/1999/xlink |
| |
| The XML namespace is: http://www.w3.org/XML/1998/namespace |
| |
| The XMLNS namespace is: http://www.w3.org/2000/xmlns/ |