| <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" |
| "http://www.w3.org/TR/html4/strict.dtd"> |
| <html> |
| <head> |
| <meta http-equiv="Content-Type" content="text/html;charset=US-ASCII"> |
| <title>Text Formatting</title> |
| |
| <style type="text/css"> |
| |
| body { color: #000000; background-color: #FFFFFF; } |
| del { text-decoration: line-through; color: #8B0040; } |
| ins { text-decoration: underline; color: #005100; } |
| |
| p.example { margin-left: 2em; } |
| pre.example { margin-left: 2em; } |
| div.example { margin-left: 2em; } |
| |
| code.extract { background-color: #F5F6A2; } |
| pre.extract { margin-left: 2em; background-color: #F5F6A2; |
| border: 1px solid #E1E28E; } |
| |
| p.function { } |
| .attribute { margin-left: 2em; } |
| .attribute dt { float: left; font-style: italic; |
| padding-right: 1ex; } |
| .attribute dd { margin-left: 0em; } |
| |
| blockquote.std { color: #000000; background-color: #F1F1F1; |
| border: 1px solid #D1D1D1; |
| padding-left: 0.5em; padding-right: 0.5em; } |
| blockquote.stddel { text-decoration: line-through; |
| color: #000000; background-color: #FFEBFF; |
| border: 1px solid #ECD7EC; |
| padding-left: 0.5em; padding-right: 0.5em; } |
| |
| blockquote.stdins { text-decoration: underline; |
| color: #000000; background-color: #C8FFC8; |
| border: 1px solid #B3EBB3; padding: 0.5em; } |
| |
| table { border: 1px solid black; border-spacing: 0px; |
| margin-left: auto; margin-right: auto; } |
| th { text-align: left; vertical-align: top; |
| padding-left: 0.8em; border: none; } |
| td { text-align: left; vertical-align: top; |
| padding-left: 0.8em; border: none; } |
| |
| </style> |
| |
| </head> |
| <body> |
| <h1>Text Formatting</h1> |
| |
| <p> |
| 2016-08-19 |
| </p> |
| |
| <address> |
| Victor Zverovich, victor.zverovich@gmail.com |
| </address> |
| |
| <p> |
| <a href="#Introduction">Introduction</a><br> |
| <a href="#Design">Design</a><br> |
| <a href="#Syntax">Format String Syntax</a><br> |
| <a href="#Extensibility">Extensibility</a><br> |
| <a href="#Locale">Locale Support</a><br> |
| <a href="#PosArguments">Positional Arguments</a><br> |
| <a href="#Wording">Wording</a><br> |
| <a href="#Implementation">Implementation</a><br> |
| <a href="#References">References</a><br> |
| </p> |
| |
| <h2><a name="Introduction">Introduction</a></h2> |
| |
| <p> |
| This paper proposes new text formatting functionality that can be used as a |
| safe and extensible alternative to the <code>printf</code> family of functions. |
| It is intended to complement the existing C++ I/O streams library and reuse |
| some of its infrastructure such as overloaded insertion operators for |
| user-defined types. |
| </p> |
| |
| <p> |
| Example: |
| </p> |
| |
| <pre class="example"> |
| <code>std::string message = std::format("The answer is {}.", 42);</code> |
| </pre> |
| |
| <h2><a name="Design">Design</a></h2> |
| |
| <h3><a name="Syntax">Format String Syntax</a></h3> |
| |
| <p> |
| Variations of the printf format string syntax are arguably the most popular |
| among programming languages, and C++ itself inherits <code>printf</code> |
| from C <a href="#1">[1]</a>. The advantage of the printf syntax is that many |
| programmers are familiar with it. However, in its current form it has a number |
| of issues: |
| </p> |
| |
| <ul> |
| <li>Many format specifiers, such as <code>hh</code>, <code>h</code>, <code>l</code>, |
| and <code>j</code>, are used only to convey type information. |
| They are redundant in type-safe formatting and would unnecessarily |
| complicate specification and parsing.</li> |
| <li>There is no standard way to extend the syntax for user-defined types.</li> |
| <li>There are subtle differences between different implementations. For example, |
| POSIX positional arguments <a href="#2">[2]</a> are not supported on |
| some systems <a href="#6">[6]</a>.</li> |
| <li>Using <code>'%'</code> in a custom format specifier, e.g. for |
| <code>put_time</code>-like time formatting, poses difficulties.</li> |
| </ul> |
| |
| <p> |
| Although it is possible to address these issues, doing so would break |
| compatibility and could be more confusing to users than introducing a |
| different syntax. |
| </p> |
| |
| <p> |
| Therefore, we propose a new syntax based on the ones used in Python |
| <a href="#3">[3]</a>, the .NET family of languages <a href="#4">[4]</a>, |
| and Rust <a href="#5">[5]</a>. This syntax employs <code>'{'</code> and |
| <code>'}'</code> as replacement field delimiters instead of <code>'%'</code>, |
| and it is described in detail in TODO:link. Here are some of the advantages: |
| </p> |
| |
| <ul> |
| <li>Consistent and easy to parse mini-language focused on formatting rather |
| than conveying type information</li> |
| <li>Extensibility and support for custom format strings for user-defined |
| types</li> |
| <li>Positional arguments</li> |
| <li>Support for both locale-specific and locale-independent formatting (see |
| <a href="#Locale">Locale Support</a>)</li> |
| <li>Minor formatting improvements such as center alignment and binary format</li> |
| </ul> |
| |
| <p> |
| The syntax is expressive enough to enable translation, possibly automated, |
| of most printf format strings. The correspondence between <code>printf</code> |
| and the new syntax is given in the following table. |
| </p> |
| |
| <table> |
| <thead> |
| <tr><th>printf</th><th>new</th><th>comment</th></tr> |
| </thead> |
| <tbody> |
| <tr><td>-</td><td>&lt;</td><td>left alignment</td></tr> |
| <tr><td>+</td><td>+</td><td></td></tr> |
| <tr><td><em>space</em></td><td><em>space</em></td><td></td></tr> |
| <tr><td>#</td><td>#</td><td></td></tr> |
| <tr><td>0</td><td>0</td><td></td></tr> |
| <tr><td>hh</td><td>unused</td><td></td></tr> |
| <tr><td>h</td><td>unused</td><td></td></tr> |
| <tr><td>l</td><td>unused</td><td></td></tr> |
| <tr><td>ll</td><td>unused</td><td></td></tr> |
| <tr><td>j</td><td>unused</td><td></td></tr> |
| <tr><td>z</td><td>unused</td><td></td></tr> |
| <tr><td>t</td><td>unused</td><td></td></tr> |
| <tr><td>L</td><td>unused</td><td></td></tr> |
| <tr><td>c</td><td>c (optional)</td><td></td></tr> |
| <tr><td>s</td><td>s (optional)</td><td></td></tr> |
| <tr><td>d</td><td>d (optional)</td><td></td></tr> |
| <tr><td>i</td><td>d (optional)</td><td></td></tr> |
| <tr><td>o</td><td>o</td><td></td></tr> |
| <tr><td>x</td><td>x</td><td></td></tr> |
| <tr><td>X</td><td>X</td><td></td></tr> |
| <tr><td>u</td><td>d (optional)</td><td></td></tr> |
| <tr><td>f</td><td>f</td><td></td></tr> |
| <tr><td>F</td><td>F</td><td></td></tr> |
| <tr><td>e</td><td>e</td><td></td></tr> |
| <tr><td>E</td><td>E</td><td></td></tr> |
| <tr><td>a</td><td>a</td><td></td></tr> |
| <tr><td>A</td><td>A</td><td></td></tr> |
| <tr><td>g</td><td>g (optional)</td><td></td></tr> |
| <tr><td>G</td><td>G</td><td></td></tr> |
| <tr><td>n</td><td>unused</td><td></td></tr> |
| <tr><td>p</td><td>p (optional)</td><td></td></tr> |
| </tbody> |
| </table> |
| |
| <p> |
| Width and precision are represented similarly in <code>printf</code> and the |
| proposed syntax. The only difference is that a runtime value is specified by |
| <code>*</code> in the former and <code>{}</code> in the latter, possibly with |
| the index of the argument inside the braces. |
| </p> |
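| |
| <p> |
| For reference, the <code>printf</code> side of this correspondence can be |
| sketched as follows. The helper name <code>format_runtime</code> and the fixed |
| buffer size are assumptions made for illustration and are not part of the |
| proposal. |
| </p> |

```cpp
#include <cstdio>
#include <string>

// Formats `value` in a field whose width and precision are only known at
// run time, using printf's '*' placeholders.
// Hypothetical helper for illustration; not part of the proposal.
std::string format_runtime(double value, int width, int precision) {
    char buffer[64];  // assumed large enough for this sketch; longer output is truncated
    // Each '*' consumes one int argument that precedes the value itself.
    std::snprintf(buffer, sizeof buffer, "%*.*f", width, precision, value);
    return buffer;
}
```

| <p> |
| Following the paragraph above, the proposed syntax would presumably replace |
| each <code>*</code> with a nested <code>{}</code>, for example |
| <code>"{:{}.{}f}"</code>. The exact form depends on the format specification, |
| which this paper does not yet spell out. |
| </p> |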
| |
| <p> |
| As can be seen from the table above, most of the specifiers remain the same, |
| which simplifies migration from <code>printf</code>. A notable difference is in |
| the alignment specification. The proposed syntax allows left, center, and right |
| alignment, represented by <code>'&lt;'</code>, <code>'^'</code>, and |
| <code>'&gt;'</code> respectively, which is more expressive than the corresponding |
| <code>printf</code> syntax. The latter only supports left and right (the default) |
| alignment. |
| </p> |
| |
| <p> |
| The following example uses center alignment and <code>'*'</code> as a fill |
| character: |
| </p> |
| |
| <pre class="example"> |
| <code>std::format("{:*^30}", "centered");</code> |
| </pre> |
| |
| <p> |
| resulting in <code>"***********centered***********"</code>. |
| The same formatting cannot be easily achieved with <code>printf</code>. |
| </p> |
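| |
| <p> |
| To illustrate why this is awkward with <code>printf</code>, a minimal sketch |
| of the manual work involved is given below. The helper <code>center</code> is |
| hypothetical, and placing the extra fill character on the right when the |
| padding is odd is an assumption, not something this paper specifies. |
| </p> |

```cpp
#include <cstddef>
#include <string>

// Pads `text` to `width` characters with `fill`, centering it.
// Hypothetical helper: when the padding is odd, the extra fill character
// goes on the right (an assumption made for this sketch).
std::string center(const std::string& text, std::size_t width, char fill) {
    if (text.size() >= width) return text;  // already wide enough, nothing to pad
    std::size_t total = width - text.size();
    std::size_t left = total / 2;
    std::size_t right = total - left;
    return std::string(left, fill) + text + std::string(right, fill);
}
```

| <p> |
| The padded string would then be printed with a plain <code>%s</code>, for |
| example <code>std::printf("%s", center("centered", 30, '*').c_str())</code>. |
| </p> |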
| |
| <h3><a name="Extensibility">Extensibility</a></h3> |
| |
| <p> |
| Both the format string syntax and the API are designed with extensibility in mind. |
| The mini-language can be extended for user-defined types and users can provide |
| functions that do parsing and formatting for such types. |
| </p> |
| |
| <p>The general syntax of a replacement field in a format string is</p> |
| |
| <dl> |
| <dt><em>replacement-field</em>:</dt> |
| <dd> |
| <code>{</code> <em>integer<sub>opt</sub></em> <code>}</code><br/> |
| <code>{</code> <em>integer<sub>opt</sub></em> |
| <code>:</code> <em>format-spec</em> <code>}</code> |
| </dd> |
| </dl> |
| |
| <p> |
| where <em>format-spec</em> is predefined for built-in types, but can be |
| customized for user-defined types. For example, the syntax can be extended |
| for <code>put_time</code>-like date and time formatting: |
| </p> |
| |
| <pre class="example"> |
| <code>std::time_t t = std::time(nullptr); |
| std::string date = std::format("The date is {0:%Y-%m-%d}.", *std::localtime(&amp;t));</code> |
| </pre> |
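| |
| <p> |
| For comparison, the existing <code>std::put_time</code> manipulator accepts the |
| same conversion specifiers through the I/O streams interface. The sketch below |
| is only meant to show the facility that such a replacement field mirrors; |
| <code>date_message</code> is a hypothetical helper name. |
| </p> |

```cpp
#include <ctime>
#include <iomanip>
#include <sstream>
#include <string>

// Builds the same message with the existing std::put_time manipulator.
// Hypothetical helper for comparison; not part of the proposal.
std::string date_message(const std::tm& time) {
    std::ostringstream out;
    // std::put_time interprets the '%' conversions, as strftime does.
    out << "The date is " << std::put_time(&time, "%Y-%m-%d") << '.';
    return out.str();
}
```

| <p> |
| With the stream version the literal text and the time format are split across |
| several insertions, whereas the replacement field keeps them in one format |
| string. |
| </p> |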
| |
| <p>TODO: API</p> |
| |
| <h3><a name="Locale">Locale Support</a></h3> |
| |
| <p>TODO</p> |
| |
| <h3><a name="PosArguments">Positional Arguments</a></h3> |
| |
| <p>TODO</p> |
| |
| <h2><a name="Wording">Wording</a></h2> |
| |
| <p>TODO</p> |
| |
| <h2><a name="Implementation">Implementation</a></h2> |
| |
| <p> |
| The ideas proposed in this paper have been implemented in the open-source fmt |
| library. TODO: link |
| </p> |
| |
| <h2><a name="References">References</a></h2> |
| |
| <p> |
| <a name="1">[1]</a> |
| <cite>The <code>fprintf</code> function. ISO/IEC 9899:2011. 7.21.6.1.</cite><br/> |
| <a name="2">[2]</a> |
| <cite><a href="http://pubs.opengroup.org/onlinepubs/009695399/functions/fprintf.html"> |
| fprintf, printf, snprintf, sprintf - print formatted output</a>. The Open |
| Group Base Specifications Issue 6 IEEE Std 1003.1, 2004 Edition.</cite><br/> |
| <a name="3">[3]</a> |
| <cite><a href="https://docs.python.org/3/library/string.html#format-string-syntax"> |
| 6.1.3. Format String Syntax</a>. Python 3.5.2 documentation.</cite><br/> |
| <a name="4">[4]</a> |
| <cite><a href="https://msdn.microsoft.com/en-us/library/system.string.format(v=vs.110).aspx"> |
| String.Format Method</a>. .NET Framework Class Library.</cite><br/> |
| <a name="5">[5]</a> |
| <cite><a href="https://doc.rust-lang.org/std/fmt/"> |
| Module <code>std::fmt</code></a>. The Rust Standard Library.</cite><br/> |
| <a name="6">[6]</a> |
| <cite><a href="https://msdn.microsoft.com/en-us/library/56e442dc(v=vs.120).aspx"> |
| Format Specification Syntax: printf and wprintf Functions</a>. C++ Language and |
| Standard Libraries.</cite><br/> |
| </p> |
| |
| </body> |