| :mod:`shlex` --- Simple lexical analysis | 
 | ======================================== | 
 |  | 
 | .. module:: shlex | 
 |    :synopsis: Simple lexical analysis for Unix shell-like languages. | 
 |  | 
 | .. moduleauthor:: Eric S. Raymond <esr@snark.thyrsus.com> | 
 | .. moduleauthor:: Gustavo Niemeyer <niemeyer@conectiva.com> | 
 | .. sectionauthor:: Eric S. Raymond <esr@snark.thyrsus.com> | 
 | .. sectionauthor:: Gustavo Niemeyer <niemeyer@conectiva.com> | 
 |  | 
 | **Source code:** :source:`Lib/shlex.py` | 
 |  | 
 | -------------- | 
 |  | 
 | The :class:`~shlex.shlex` class makes it easy to write lexical analyzers for | 
 | simple syntaxes resembling that of the Unix shell.  This will often be useful | 
 | for writing minilanguages, (for example, in run control files for Python | 
 | applications) or for parsing quoted strings. | 
 |  | 
 | The :mod:`shlex` module defines the following functions: | 
 |  | 
 |  | 
 | .. function:: split(s, comments=False, posix=True) | 
 |  | 
 |    Split the string *s* using shell-like syntax. If *comments* is :const:`False` | 
 |    (the default), the parsing of comments in the given string will be disabled | 
 |    (setting the :attr:`~shlex.commenters` attribute of the | 
 |    :class:`~shlex.shlex` instance to the empty string).  This function operates | 
 |    in POSIX mode by default, but uses non-POSIX mode if the *posix* argument is | 
 |    false. | 
 |  | 
 |    .. versionchanged:: 3.12 | 
 |       Passing ``None`` for *s* argument now raises an exception, rather than | 
 |       reading :data:`sys.stdin`. | 
 |  | 
 | .. function:: join(split_command) | 
 |  | 
 |    Concatenate the tokens of the list *split_command* and return a string. | 
 |    This function is the inverse of :func:`split`. | 
 |  | 
 |       >>> from shlex import join | 
 |       >>> print(join(['echo', '-n', 'Multiple words'])) | 
 |       echo -n 'Multiple words' | 
 |  | 
 |    The returned value is shell-escaped to protect against injection | 
 |    vulnerabilities (see :func:`quote`). | 
 |  | 
 |    .. versionadded:: 3.8 | 
 |  | 
 |  | 
 | .. function:: quote(s) | 
 |  | 
 |    Return a shell-escaped version of the string *s*.  The returned value is a | 
 |    string that can safely be used as one token in a shell command line, for | 
 |    cases where you cannot use a list. | 
 |  | 
 |    .. _shlex-quote-warning: | 
 |  | 
 |    .. warning:: | 
 |  | 
 |       The ``shlex`` module is **only designed for Unix shells**. | 
 |  | 
 |       The :func:`quote` function is not guaranteed to be correct on non-POSIX | 
 |       compliant shells or shells from other operating systems such as Windows. | 
 |       Executing commands quoted by this module on such shells can open up the | 
 |       possibility of a command injection vulnerability. | 
 |  | 
 |       Consider using functions that pass command arguments with lists such as | 
 |       :func:`subprocess.run` with ``shell=False``. | 
 |  | 
 |    This idiom would be unsafe: | 
 |  | 
 |       >>> filename = 'somefile; rm -rf ~' | 
 |       >>> command = 'ls -l {}'.format(filename) | 
 |       >>> print(command)  # executed by a shell: boom! | 
 |       ls -l somefile; rm -rf ~ | 
 |  | 
 |    :func:`quote` lets you plug the security hole: | 
 |  | 
 |       >>> from shlex import quote | 
 |       >>> command = 'ls -l {}'.format(quote(filename)) | 
 |       >>> print(command) | 
 |       ls -l 'somefile; rm -rf ~' | 
 |       >>> remote_command = 'ssh home {}'.format(quote(command)) | 
 |       >>> print(remote_command) | 
 |       ssh home 'ls -l '"'"'somefile; rm -rf ~'"'"'' | 
 |  | 
 |    The quoting is compatible with UNIX shells and with :func:`split`: | 
 |  | 
 |       >>> from shlex import split | 
 |       >>> remote_command = split(remote_command) | 
 |       >>> remote_command | 
 |       ['ssh', 'home', "ls -l 'somefile; rm -rf ~'"] | 
 |       >>> command = split(remote_command[-1]) | 
 |       >>> command | 
 |       ['ls', '-l', 'somefile; rm -rf ~'] | 
 |  | 
 |    .. versionadded:: 3.3 | 
 |  | 
 | The :mod:`shlex` module defines the following class: | 
 |  | 
 |  | 
 | .. class:: shlex(instream=None, infile=None, posix=False, punctuation_chars=False) | 
 |  | 
 |    A :class:`~shlex.shlex` instance or subclass instance is a lexical analyzer | 
 |    object.  The initialization argument, if present, specifies where to read | 
 |    characters from.  It must be a file-/stream-like object with | 
 |    :meth:`~io.TextIOBase.read` and :meth:`~io.TextIOBase.readline` methods, or | 
 |    a string.  If no argument is given, input will be taken from ``sys.stdin``. | 
 |    The second optional argument is a filename string, which sets the initial | 
 |    value of the :attr:`~shlex.infile` attribute.  If the *instream* | 
 |    argument is omitted or equal to ``sys.stdin``, this second argument | 
 |    defaults to "stdin".  The *posix* argument defines the operational mode: | 
 |    when *posix* is not true (default), the :class:`~shlex.shlex` instance will | 
 |    operate in compatibility mode.  When operating in POSIX mode, | 
 |    :class:`~shlex.shlex` will try to be as close as possible to the POSIX shell | 
 |    parsing rules.  The *punctuation_chars* argument provides a way to make the | 
 |    behaviour even closer to how real shells parse.  This can take a number of | 
 |    values: the default value, ``False``, preserves the behaviour seen under | 
 |    Python 3.5 and earlier.  If set to ``True``, then parsing of the characters | 
 |    ``();<>|&`` is changed: any run of these characters (considered punctuation | 
 |    characters) is returned as a single token.  If set to a non-empty string of | 
 |    characters, those characters will be used as the punctuation characters.  Any | 
 |    characters in the :attr:`wordchars` attribute that appear in | 
 |    *punctuation_chars* will be removed from :attr:`wordchars`.  See | 
 |    :ref:`improved-shell-compatibility` for more information. *punctuation_chars* | 
 |    can be set only upon :class:`~shlex.shlex` instance creation and can't be | 
 |    modified later. | 
 |  | 
 |    .. versionchanged:: 3.6 | 
 |       The *punctuation_chars* parameter was added. | 
 |  | 
 | .. seealso:: | 
 |  | 
 |    Module :mod:`configparser` | 
 |       Parser for configuration files similar to the Windows :file:`.ini` files. | 
 |  | 
 |  | 
 | .. _shlex-objects: | 
 |  | 
 | shlex Objects | 
 | ------------- | 
 |  | 
 | A :class:`~shlex.shlex` instance has the following methods: | 
 |  | 
 |  | 
 | .. method:: shlex.get_token() | 
 |  | 
 |    Return a token.  If tokens have been stacked using :meth:`push_token`, pop a | 
 |    token off the stack.  Otherwise, read one from the input stream.  If reading | 
 |    encounters an immediate end-of-file, :attr:`eof` is returned (the empty | 
 |    string (``''``) in non-POSIX mode, and ``None`` in POSIX mode). | 
 |  | 
 |  | 
 | .. method:: shlex.push_token(str) | 
 |  | 
 |    Push the argument onto the token stack. | 
 |  | 
 |  | 
 | .. method:: shlex.read_token() | 
 |  | 
 |    Read a raw token.  Ignore the pushback stack, and do not interpret source | 
 |    requests.  (This is not ordinarily a useful entry point, and is documented here | 
 |    only for the sake of completeness.) | 
 |  | 
 |  | 
 | .. method:: shlex.sourcehook(filename) | 
 |  | 
 |    When :class:`~shlex.shlex` detects a source request (see :attr:`source` | 
 |    below) this method is given the following token as argument, and expected | 
 |    to return a tuple consisting of a filename and an open file-like object. | 
 |  | 
 |    Normally, this method first strips any quotes off the argument.  If the result | 
 |    is an absolute pathname, or there was no previous source request in effect, or | 
 |    the previous source was a stream (such as ``sys.stdin``), the result is left | 
 |    alone.  Otherwise, if the result is a relative pathname, the directory part of | 
 |    the name of the file immediately before it on the source inclusion stack is | 
 |    prepended (this behavior is like the way the C preprocessor handles ``#include | 
 |    "file.h"``). | 
 |  | 
 |    The result of the manipulations is treated as a filename, and returned as the | 
 |    first component of the tuple, with :func:`open` called on it to yield the second | 
 |    component. (Note: this is the reverse of the order of arguments in instance | 
 |    initialization!) | 
 |  | 
 |    This hook is exposed so that you can use it to implement directory search paths, | 
 |    addition of file extensions, and other namespace hacks. There is no | 
 |    corresponding 'close' hook, but a shlex instance will call the | 
 |    :meth:`~io.IOBase.close` method of the sourced input stream when it returns | 
 |    EOF. | 
 |  | 
 |    For more explicit control of source stacking, use the :meth:`push_source` and | 
 |    :meth:`pop_source` methods. | 
 |  | 
 |  | 
 | .. method:: shlex.push_source(newstream, newfile=None) | 
 |  | 
 |    Push an input source stream onto the input stack.  If the filename argument is | 
 |    specified it will later be available for use in error messages.  This is the | 
 |    same method used internally by the :meth:`sourcehook` method. | 
 |  | 
 |  | 
 | .. method:: shlex.pop_source() | 
 |  | 
 |    Pop the last-pushed input source from the input stack. This is the same method | 
 |    used internally when the lexer reaches EOF on a stacked input stream. | 
 |  | 
 |  | 
 | .. method:: shlex.error_leader(infile=None, lineno=None) | 
 |  | 
 |    This method generates an error message leader in the format of a Unix C compiler | 
 |    error label; the format is ``'"%s", line %d: '``, where the ``%s`` is replaced | 
 |    with the name of the current source file and the ``%d`` with the current input | 
 |    line number (the optional arguments can be used to override these). | 
 |  | 
 |    This convenience is provided to encourage :mod:`shlex` users to generate error | 
 |    messages in the standard, parseable format understood by Emacs and other Unix | 
 |    tools. | 
 |  | 
 | Instances of :class:`~shlex.shlex` subclasses have some public instance | 
 | variables which either control lexical analysis or can be used for debugging: | 
 |  | 
 |  | 
 | .. attribute:: shlex.commenters | 
 |  | 
 |    The string of characters that are recognized as comment beginners. All | 
 |    characters from the comment beginner to end of line are ignored. Includes just | 
 |    ``'#'`` by default. | 
 |  | 
 |  | 
 | .. attribute:: shlex.wordchars | 
 |  | 
 |    The string of characters that will accumulate into multi-character tokens.  By | 
 |    default, includes all ASCII alphanumerics and underscore.  In POSIX mode, the | 
 |    accented characters in the Latin-1 set are also included.  If | 
 |    :attr:`punctuation_chars` is not empty, the characters ``~-./*?=``, which can | 
 |    appear in filename specifications and command line parameters, will also be | 
 |    included in this attribute, and any characters which appear in | 
 |    ``punctuation_chars`` will be removed from ``wordchars`` if they are present | 
 |    there. If :attr:`whitespace_split` is set to ``True``, this will have no | 
 |    effect. | 
 |  | 
 |  | 
 | .. attribute:: shlex.whitespace | 
 |  | 
 |    Characters that will be considered whitespace and skipped.  Whitespace bounds | 
 |    tokens.  By default, includes space, tab, linefeed and carriage-return. | 
 |  | 
 |  | 
 | .. attribute:: shlex.escape | 
 |  | 
 |    Characters that will be considered as escape. This will be only used in POSIX | 
 |    mode, and includes just ``'\'`` by default. | 
 |  | 
 |  | 
 | .. attribute:: shlex.quotes | 
 |  | 
 |    Characters that will be considered string quotes.  The token accumulates until | 
 |    the same quote is encountered again (thus, different quote types protect each | 
 |    other as in the shell.)  By default, includes ASCII single and double quotes. | 
 |  | 
 |  | 
 | .. attribute:: shlex.escapedquotes | 
 |  | 
 |    Characters in :attr:`quotes` that will interpret escape characters defined in | 
 |    :attr:`escape`.  This is only used in POSIX mode, and includes just ``'"'`` by | 
 |    default. | 
 |  | 
 |  | 
 | .. attribute:: shlex.whitespace_split | 
 |  | 
 |    If ``True``, tokens will only be split in whitespaces.  This is useful, for | 
 |    example, for parsing command lines with :class:`~shlex.shlex`, getting | 
 |    tokens in a similar way to shell arguments.  When used in combination with | 
 |    :attr:`punctuation_chars`, tokens will be split on whitespace in addition to | 
 |    those characters. | 
 |  | 
 |    .. versionchanged:: 3.8 | 
 |       The :attr:`punctuation_chars` attribute was made compatible with the | 
 |       :attr:`whitespace_split` attribute. | 
 |  | 
 |  | 
 | .. attribute:: shlex.infile | 
 |  | 
 |    The name of the current input file, as initially set at class instantiation time | 
 |    or stacked by later source requests.  It may be useful to examine this when | 
 |    constructing error messages. | 
 |  | 
 |  | 
 | .. attribute:: shlex.instream | 
 |  | 
 |    The input stream from which this :class:`~shlex.shlex` instance is reading | 
 |    characters. | 
 |  | 
 |  | 
 | .. attribute:: shlex.source | 
 |  | 
 |    This attribute is ``None`` by default.  If you assign a string to it, that | 
 |    string will be recognized as a lexical-level inclusion request similar to the | 
 |    ``source`` keyword in various shells.  That is, the immediately following token | 
 |    will be opened as a filename and input will be taken from that stream until | 
 |    EOF, at which point the :meth:`~io.IOBase.close` method of that stream will be | 
 |    called and the input source will again become the original input stream.  Source | 
 |    requests may be stacked any number of levels deep. | 
 |  | 
 |  | 
 | .. attribute:: shlex.debug | 
 |  | 
 |    If this attribute is numeric and ``1`` or more, a :class:`~shlex.shlex` | 
 |    instance will print verbose progress output on its behavior.  If you need | 
 |    to use this, you can read the module source code to learn the details. | 
 |  | 
 |  | 
 | .. attribute:: shlex.lineno | 
 |  | 
 |    Source line number (count of newlines seen so far plus one). | 
 |  | 
 |  | 
 | .. attribute:: shlex.token | 
 |  | 
 |    The token buffer.  It may be useful to examine this when catching exceptions. | 
 |  | 
 |  | 
 | .. attribute:: shlex.eof | 
 |  | 
 |    Token used to determine end of file. This will be set to the empty string | 
 |    (``''``), in non-POSIX mode, and to ``None`` in POSIX mode. | 
 |  | 
 |  | 
 | .. attribute:: shlex.punctuation_chars | 
 |  | 
 |    A read-only property. Characters that will be considered punctuation. Runs of | 
 |    punctuation characters will be returned as a single token. However, note that no | 
 |    semantic validity checking will be performed: for example, '>>>' could be | 
 |    returned as a token, even though it may not be recognised as such by shells. | 
 |  | 
 |    .. versionadded:: 3.6 | 
 |  | 
 |  | 
 | .. _shlex-parsing-rules: | 
 |  | 
 | Parsing Rules | 
 | ------------- | 
 |  | 
 | When operating in non-POSIX mode, :class:`~shlex.shlex` will try to obey to the | 
 | following rules. | 
 |  | 
 | * Quote characters are not recognized within words (``Do"Not"Separate`` is | 
 |   parsed as the single word ``Do"Not"Separate``); | 
 |  | 
 | * Escape characters are not recognized; | 
 |  | 
 | * Enclosing characters in quotes preserve the literal value of all characters | 
 |   within the quotes; | 
 |  | 
 | * Closing quotes separate words (``"Do"Separate`` is parsed as ``"Do"`` and | 
 |   ``Separate``); | 
 |  | 
 | * If :attr:`~shlex.whitespace_split` is ``False``, any character not | 
 |   declared to be a word character, whitespace, or a quote will be returned as | 
 |   a single-character token. If it is ``True``, :class:`~shlex.shlex` will only | 
 |   split words in whitespaces; | 
 |  | 
 | * EOF is signaled with an empty string (``''``); | 
 |  | 
 | * It's not possible to parse empty strings, even if quoted. | 
 |  | 
 | When operating in POSIX mode, :class:`~shlex.shlex` will try to obey to the | 
 | following parsing rules. | 
 |  | 
 | * Quotes are stripped out, and do not separate words (``"Do"Not"Separate"`` is | 
 |   parsed as the single word ``DoNotSeparate``); | 
 |  | 
 | * Non-quoted escape characters (e.g. ``'\'``) preserve the literal value of the | 
 |   next character that follows; | 
 |  | 
 | * Enclosing characters in quotes which are not part of | 
 |   :attr:`~shlex.escapedquotes` (e.g. ``"'"``) preserve the literal value | 
 |   of all characters within the quotes; | 
 |  | 
 | * Enclosing characters in quotes which are part of | 
 |   :attr:`~shlex.escapedquotes` (e.g. ``'"'``) preserves the literal value | 
 |   of all characters within the quotes, with the exception of the characters | 
 |   mentioned in :attr:`~shlex.escape`.  The escape characters retain its | 
 |   special meaning only when followed by the quote in use, or the escape | 
 |   character itself. Otherwise the escape character will be considered a | 
 |   normal character. | 
 |  | 
 | * EOF is signaled with a :const:`None` value; | 
 |  | 
 | * Quoted empty strings (``''``) are allowed. | 
 |  | 
 | .. _improved-shell-compatibility: | 
 |  | 
 | Improved Compatibility with Shells | 
 | ---------------------------------- | 
 |  | 
 | .. versionadded:: 3.6 | 
 |  | 
 | The :class:`shlex` class provides compatibility with the parsing performed by | 
 | common Unix shells like ``bash``, ``dash``, and ``sh``.  To take advantage of | 
 | this compatibility, specify the ``punctuation_chars`` argument in the | 
 | constructor.  This defaults to ``False``, which preserves pre-3.6 behaviour. | 
 | However, if it is set to ``True``, then parsing of the characters ``();<>|&`` | 
 | is changed: any run of these characters is returned as a single token.  While | 
 | this is short of a full parser for shells (which would be out of scope for the | 
 | standard library, given the multiplicity of shells out there), it does allow | 
 | you to perform processing of command lines more easily than you could | 
 | otherwise.  To illustrate, you can see the difference in the following snippet: | 
 |  | 
 | .. doctest:: | 
 |    :options: +NORMALIZE_WHITESPACE | 
 |  | 
 |     >>> import shlex | 
 |     >>> text = "a && b; c && d || e; f >'abc'; (def \"ghi\")" | 
 |     >>> s = shlex.shlex(text, posix=True) | 
 |     >>> s.whitespace_split = True | 
 |     >>> list(s) | 
 |     ['a', '&&', 'b;', 'c', '&&', 'd', '||', 'e;', 'f', '>abc;', '(def', 'ghi)'] | 
 |     >>> s = shlex.shlex(text, posix=True, punctuation_chars=True) | 
 |     >>> s.whitespace_split = True | 
 |     >>> list(s) | 
 |     ['a', '&&', 'b', ';', 'c', '&&', 'd', '||', 'e', ';', 'f', '>', 'abc', ';', | 
 |     '(', 'def', 'ghi', ')'] | 
 |  | 
 | Of course, tokens will be returned which are not valid for shells, and you'll | 
 | need to implement your own error checks on the returned tokens. | 
 |  | 
 | Instead of passing ``True`` as the value for the punctuation_chars parameter, | 
 | you can pass a string with specific characters, which will be used to determine | 
 | which characters constitute punctuation. For example:: | 
 |  | 
 |     >>> import shlex | 
 |     >>> s = shlex.shlex("a && b || c", punctuation_chars="|") | 
 |     >>> list(s) | 
 |     ['a', '&', '&', 'b', '||', 'c'] | 
 |  | 
 | .. note:: When ``punctuation_chars`` is specified, the :attr:`~shlex.wordchars` | 
 |    attribute is augmented with the characters ``~-./*?=``.  That is because these | 
 |    characters can appear in file names (including wildcards) and command-line | 
 |    arguments (e.g. ``--color=auto``). Hence:: | 
 |  | 
 |       >>> import shlex | 
 |       >>> s = shlex.shlex('~/a && b-c --color=auto || d *.py?', | 
 |       ...                 punctuation_chars=True) | 
 |       >>> list(s) | 
 |       ['~/a', '&&', 'b-c', '--color=auto', '||', 'd', '*.py?'] | 
 |  | 
 |    However, to match the shell as closely as possible, it is recommended to | 
 |    always use ``posix`` and :attr:`~shlex.whitespace_split` when using | 
 |    :attr:`~shlex.punctuation_chars`, which will negate | 
 |    :attr:`~shlex.wordchars` entirely. | 
 |  | 
 | For best effect, ``punctuation_chars`` should be set in conjunction with | 
 | ``posix=True``. (Note that ``posix=False`` is the default for | 
 | :class:`~shlex.shlex`.) |