 <?xml version="1.0" encoding="UTF-8" standalone="no"?>
 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
 <html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>Tokenizing</title><meta name="generator" content="DocBook XSL Stylesheets V1.73.2" /><meta name="keywords" content="&#10;      ISO C++&#10;    , &#10;      library&#10;    " /><link rel="start" href="../spine.html" title="The GNU C++ Library Documentation" /><link rel="up" href="bk01pt05ch13.html" title="Chapter 13. String Classes" /><link rel="prev" href="bk01pt05ch13s03.html" title="Arbitrary Character Types" /><link rel="next" href="bk01pt05ch13s05.html" title="Shrink to Fit" /></head><body><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Tokenizing</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="bk01pt05ch13s03.html">Prev</a> </td><th width="60%" align="center">Chapter 13. String Classes</th><td width="20%" align="right"> <a accesskey="n" href="bk01pt05ch13s05.html">Next</a></td></tr></table><hr /></div><div class="sect1" lang="en" xml:lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a id="strings.string.token"></a>Tokenizing</h2></div></div></div><p>
     </p><p>The Standard C (and C++) function <code class="code">strtok()</code> leaves a lot to
       be desired in terms of user-friendliness.  It's unintuitive, it
       destroys the character string on which it operates, and it leaves
       all of the memory management to you.  But it does let the client
       code decide what to use to break the string into pieces; it allows
       you to choose the "whitespace," so to speak.
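    </p><p>To make those complaints concrete, here is a minimal sketch of how
       <code class="code">strtok()</code> is typically called.  The helper name
       <code class="code">strtok_tokens</code> is hypothetical and exists only for
       this illustration.
    </p><pre class="programlisting">
 #include &lt;cstring&gt;
 #include &lt;string&gt;
 #include &lt;vector&gt;

 // Hypothetical helper (not part of any library): split a writable
 // C string with strtok().  The buffer is modified in place, because
 // strtok() overwrites the delimiter that ends each token with '\0'.
 std::vector&lt;std::string&gt;
 strtok_tokens(char *buffer, const char *delimiters = " \t\n")
 {
     std::vector&lt;std::string&gt; tokens;

     // The first call passes the buffer; every later call passes a
     // null pointer so that strtok() resumes from its saved position.
     for (char *tok = std::strtok(buffer, delimiters);
          tok != NULL;
          tok = std::strtok(NULL, delimiters))
     {
         tokens.push_back(tok);
     }
     return tokens;
 }
 </pre><p>After the call the buffer contains embedded null characters, so the
       original text can no longer be read back as a single string.  The
       function also keeps its position in hidden static state, so two
       tokenizations cannot be interleaved.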
    </p><p>A C++ implementation lets us keep the good things and fix those
       annoyances.  The implementation here is more intuitive (you
       call it only once, not in a loop with a varying argument), it does not
       affect the original string at all, and all the memory allocation
       is handled for you.
    </p><p>It's called <code class="code">stringtok</code>, and it's a function template.  The source is
    given below, in a less portable form than it could be, to keep this
    example simple (for example, it accepts only
    <code class="code">std::string</code> input).
    </p><pre class="programlisting">
 #include &lt;string&gt;

 // Split 'in' at any of the characters in 'delimiters' and append each
 // token to 'container'.  Accepts only std::string input; Container
 // needs a push_back() that takes a std::string.
 template &lt;typename Container&gt;
 void
 stringtok(Container &amp;container, std::string const &amp;in,
           const char * const delimiters = " \t\n")
 {
     const std::string::size_type len = in.length();
           std::string::size_type i = 0;

     while (i &lt; len)
     {
         // Eat leading whitespace
         i = in.find_first_not_of(delimiters, i);
         if (i == std::string::npos)
             return;   // Nothing left but white space

         // Find the end of the token
         std::string::size_type j = in.find_first_of(delimiters, i);

         // Push token
         if (j == std::string::npos)
         {
             container.push_back(in.substr(i));
             return;
         }
         else
             container.push_back(in.substr(i, j-i));

         // Set up for next loop
         i = j + 1;
     }
 }
 </pre><p>
      The author uses a more general (but less readable) form of it for
      parsing command strings and the like.  If you compiled and ran this
      code using it:
    </p><pre class="programlisting">
    // Requires &lt;iostream&gt; and &lt;list&gt;
    std::list&lt;std::string&gt;  ls;
    stringtok (ls, " this  \t is\t\n  a test  ");
    for (std::list&lt;std::string&gt;::const_iterator i = ls.begin();
         i != ls.end(); ++i)
    {
        std::cerr &lt;&lt; ':' &lt;&lt; (*i) &lt;&lt; ":\n";
    } </pre><p>You would see this as output:
    </p><pre class="programlisting">
    :this:
    :is:
    :a:
    :test: </pre><p>with all the whitespace removed.  The input string is not
       modified, <code class="code">ls</code> will clean up after itself, and
       <code class="code">ls.size()</code> will return how many tokens there were.
    </p><p>As always, there is a price paid here: <code class="code">stringtok</code>, which
       copies each token into a new string, is not as fast as
       <code class="code">strtok</code>.  The other benefits usually outweigh that, however.
       <a class="ulink" href="stringtok_std_h.txt" target="_top">Another version of stringtok is given
       here</a>, suggested by Chris King and tweaked by Petr Prikryl,
       and this one uses the
       transformation functions mentioned below.  If you are comfortable
       with reading the new function names, this version is recommended
       as an example.
    </p><p><span class="emphasis"><em>Added February 2001:</em></span>  Mark Wilden pointed out that the
       standard <code class="code">std::getline()</code> function can be used with standard
       <a class="ulink" href="../27_io/howto.html" target="_top">istringstreams</a> to perform
       tokenizing as well.  Build an istringstream from the input text,
       and then use std::getline with varying delimiters (the three-argument
       signature) to extract tokens into a string.
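    </p><p>A minimal sketch of that approach follows.  The helper names
       <code class="code">getline_tokens</code> and <code class="code">parse_pairs</code>,
       and the <code class="code">name=value;</code> input format, are assumptions made
       for illustration; only <code class="code">std::istringstream</code> and
       <code class="code">std::getline</code> come from the library.
    </p><pre class="programlisting">
 #include &lt;sstream&gt;
 #include &lt;string&gt;
 #include &lt;utility&gt;
 #include &lt;vector&gt;

 // Hypothetical helper: split 'in' on a single delimiter character.
 std::vector&lt;std::string&gt;
 getline_tokens(const std::string &amp;in, char delimiter)
 {
     std::vector&lt;std::string&gt; tokens;
     std::istringstream stream(in);
     std::string token;

     // The three-argument getline reads up to the delimiter and
     // discards it; the loop ends when nothing is left to read.
     while (std::getline(stream, token, delimiter))
         tokens.push_back(token);
     return tokens;
 }

 // Hypothetical helper: the delimiter may change from call to call.
 // Here it alternates between '=' and ';' to read "name=value;" pairs.
 std::vector&lt;std::pair&lt;std::string, std::string&gt; &gt;
 parse_pairs(const std::string &amp;in)
 {
     std::vector&lt;std::pair&lt;std::string, std::string&gt; &gt; pairs;
     std::istringstream stream(in);
     std::string name, value;

     while (std::getline(stream, name, '=')
            &amp;&amp; std::getline(stream, value, ';'))
         pairs.push_back(std::make_pair(name, value));
     return pairs;
 }
 </pre><p>Unlike <code class="code">stringtok</code>, each call accepts only one delimiter
       character, and two adjacent delimiters yield an empty token rather
       than being skipped.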
    </p></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="bk01pt05ch13s03.html">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="bk01pt05ch13.html">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="bk01pt05ch13s05.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">Arbitrary Character Types </td><td width="20%" align="center"><a accesskey="h" href="../spine.html">Home</a></td><td width="40%" align="right" valign="top"> Shrink to Fit</td></tr></table></div></body></html>