| <html> |
| <head> |
| <title>pcre2convert specification</title> |
| </head> |
| <body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB"> |
| <h1>pcre2convert man page</h1> |
| <p> |
| Return to the <a href="index.html">PCRE2 index page</a>. |
| </p> |
| <p> |
| This page is part of the PCRE2 HTML documentation. It was generated |
| automatically from the original man page. If there is any nonsense in it, |
| please consult the man page, in case the conversion went wrong. |
| <br> |
| <ul> |
| <li><a name="TOC1" href="#SEC1">EXPERIMENTAL PATTERN CONVERSION FUNCTIONS</a> |
| <li><a name="TOC2" href="#SEC2">THE CONVERT CONTEXT</a> |
| <li><a name="TOC3" href="#SEC3">THE CONVERSION FUNCTION</a> |
| <li><a name="TOC4" href="#SEC4">CONVERTING GLOBS</a> |
| <li><a name="TOC5" href="#SEC5">CONVERTING POSIX PATTERNS</a> |
| <li><a name="TOC6" href="#SEC6">AUTHOR</a> |
| <li><a name="TOC7" href="#SEC7">REVISION</a> |
| </ul> |
| <br><a name="SEC1" href="#TOC1">EXPERIMENTAL PATTERN CONVERSION FUNCTIONS</a><br> |
| <P> |
| This document describes a set of functions that can be used to convert |
| "foreign" patterns into PCRE2 regular expressions. This facility is currently |
| experimental, and may be changed in future releases. Two kinds of pattern, |
| globs and POSIX patterns, are supported. |
| </P> |
| <br><a name="SEC2" href="#TOC1">THE CONVERT CONTEXT</a><br> |
| <P> |
| <b>pcre2_convert_context *pcre2_convert_context_create(</b> |
| <b> pcre2_general_context *<i>gcontext</i>);</b> |
| <br> |
| <br> |
| <b>pcre2_convert_context *pcre2_convert_context_copy(</b> |
| <b> pcre2_convert_context *<i>cvcontext</i>);</b> |
| <br> |
| <br> |
| <b>void pcre2_convert_context_free(pcre2_convert_context *<i>cvcontext</i>);</b> |
| <br> |
| <br> |
| <b>int pcre2_set_glob_escape(pcre2_convert_context *<i>cvcontext</i>,</b> |
| <b> uint32_t <i>escape_char</i>);</b> |
| <br> |
| <br> |
| <b>int pcre2_set_glob_separator(pcre2_convert_context *<i>cvcontext</i>,</b> |
| <b> uint32_t <i>separator_char</i>);</b> |
| <br> |
| <br> |
| A convert context is used to hold parameters that affect the way that pattern |
| conversion works. Like all PCRE2 contexts, you need to use a context only if |
| you want to override the defaults. There are the usual create, copy, and free |
| functions. If custom memory management functions are set in a general context |
| that is passed to <b>pcre2_convert_context_create()</b>, they are used for all |
| memory management within the conversion functions. |
| </P> |
| <P> |
| There are only two parameters in the convert context at present. Both apply |
| only to glob conversions. The escape character defaults to grave accent under |
| Windows, otherwise backslash. It can be set to zero, meaning no escape |
| character, or to any punctuation character with a code point less than 256. |
| The separator character defaults to backslash under Windows, otherwise forward |
| slash. It can be set to forward slash, backslash, or dot. |
| </P> |
| <P> |
| The two setting functions return zero on success, or PCRE2_ERROR_BADDATA if |
| their second argument is invalid. |
| </P> |
| <br><a name="SEC3" href="#TOC1">THE CONVERSION FUNCTION</a><br> |
| <P> |
| <b>int pcre2_pattern_convert(PCRE2_SPTR <i>pattern</i>, PCRE2_SIZE <i>length</i>,</b> |
| <b> uint32_t <i>options</i>, PCRE2_UCHAR **<i>buffer</i>,</b> |
| <b> PCRE2_SIZE *<i>blength</i>, pcre2_convert_context *<i>cvcontext</i>);</b> |
| <br> |
| <br> |
| <b>void pcre2_converted_pattern_free(PCRE2_UCHAR *<i>converted_pattern</i>);</b> |
| <br> |
| <br> |
| The first two arguments of <b>pcre2_pattern_convert()</b> define the foreign |
| pattern that is to be converted. The length may be given as |
| PCRE2_ZERO_TERMINATED. The <b>options</b> argument defines how the pattern is to |
| be processed. If the input is UTF, the PCRE2_CONVERT_UTF option should be set. |
| PCRE2_CONVERT_NO_UTF_CHECK may also be set if you are sure the input is valid. |
| One or more of the glob options, or one of the following POSIX options must be |
| set to define the type of conversion that is required: |
| <pre> |
| PCRE2_CONVERT_GLOB |
| PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR |
| PCRE2_CONVERT_GLOB_NO_STARSTAR |
| PCRE2_CONVERT_POSIX_BASIC |
| PCRE2_CONVERT_POSIX_EXTENDED |
| </pre> |
| Details of the conversions are given below. The <b>buffer</b> and <b>blength</b> |
| arguments define how the output is handled: |
| </P> |
| <P> |
| If <b>buffer</b> is NULL, the function just returns the length of the converted |
| pattern via <b>blength</b>. This is one less than the length of buffer needed, |
| because a terminating zero is always added to the output. |
| </P> |
| <P> |
| If <b>buffer</b> points to a NULL pointer, an output buffer is obtained using |
| the allocator in the context or <b>malloc()</b> if no context is supplied. A |
| pointer to this buffer is placed in the variable to which <b>buffer</b> points. |
| When no longer needed the output buffer must be freed by calling |
| <b>pcre2_converted_pattern_free()</b>. If this function is called with a NULL |
| argument, it returns immediately without doing anything. |
| </P> |
| <P> |
| If <b>buffer</b> points to a non-NULL pointer, <b>blength</b> must be set to the |
| actual length of the buffer provided (in code units). |
| </P> |
| <P> |
| In all cases, after successful conversion, the variable pointed to by |
| <b>blength</b> is updated to the length actually used (in code units), excluding |
| the terminating zero that is always added. |
| </P> |
| <P> |
| If an error occurs, the length (via <b>blength</b>) is set to the offset |
| within the input pattern where the error was detected. Only gross syntax errors |
| are caught; there are plenty of errors that will get passed on for |
| <b>pcre2_compile()</b> to discover. |
| </P> |
| <P> |
| The return from <b>pcre2_pattern_convert()</b> is zero on success or a non-zero |
| PCRE2 error code. Note that PCRE2 error codes may be positive or negative: |
| <b>pcre2_compile()</b> uses mostly positive codes and <b>pcre2_match()</b> |
| negative ones; <b>pcre2_convert()</b> uses existing codes of both kinds. A |
| textual error message can be obtained by calling |
| <b>pcre2_get_error_message()</b>. |
| </P> |
| <br><a name="SEC4" href="#TOC1">CONVERTING GLOBS</a><br> |
| <P> |
| Globs are used to match file names, and consequently have the concept of a |
| "path separator", which defaults to backslash under Windows and forward slash |
| otherwise. If PCRE2_CONVERT_GLOB is set, the wildcards * and ? are not |
| permitted to match separator characters, but the double-star (**) feature |
| (which does match separators) is supported. |
| </P> |
| <P> |
| PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR matches globs with wildcards allowed to |
| match separator characters. PCRE2_GLOB_NO_STARSTAR matches globs with the |
| double-star feature disabled. These options may be given together. |
| </P> |
| <br><a name="SEC5" href="#TOC1">CONVERTING POSIX PATTERNS</a><br> |
| <P> |
| POSIX defines two kinds of regular expression pattern: basic and extended. |
| These can be processed by setting PCRE2_CONVERT_POSIX_BASIC or |
| PCRE2_CONVERT_POSIX_EXTENDED, respectively. |
| </P> |
| <P> |
| In POSIX patterns, backslash is not special in a character class. Unmatched |
| closing parentheses are treated as literals. |
| </P> |
| <P> |
| In basic patterns, ? + | {} and () must be escaped to be recognized |
| as metacharacters outside a character class. If the first character in the |
| pattern is * it is treated as a literal. ^ is a metacharacter only at the start |
| of a branch. |
| </P> |
| <P> |
| In extended patterns, a backslash not in a character class always |
| makes the next character literal, whatever it is. There are no backreferences. |
| </P> |
| <P> |
| Note: POSIX mandates that the longest possible match at the first matching |
| position must be found. This is not what <b>pcre2_match()</b> does; it yields |
| the first match that is found. An application can use <b>pcre2_dfa_match()</b> |
| to find the longest match, but that does not support backreferences (but then |
| neither do POSIX extended patterns). |
| </P> |
| <br><a name="SEC6" href="#TOC1">AUTHOR</a><br> |
| <P> |
| Philip Hazel |
| <br> |
| University Computing Service |
| <br> |
| Cambridge, England. |
| <br> |
| </P> |
| <br><a name="SEC7" href="#TOC1">REVISION</a><br> |
| <P> |
| Last updated: 28 June 2018 |
| <br> |
| Copyright © 1997-2018 University of Cambridge. |
| <br> |
| <p> |
| Return to the <a href="index.html">PCRE2 index page</a>. |
| </p> |