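Update documentation of \Q...\E

Document that \Q or \E inside what would otherwise be a quantifier stops
PCRE2 from recognizing it as one, which differs from Perl, and point to
this from pcre2syntax. Also fix a verb agreement error in the
PCRE2_EXTENDED description, note that a comment cannot start inside a
Unicode property name, and update the generated HTML and text
documentation to match.
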
diff --git a/doc/html/pcre2compat.html b/doc/html/pcre2compat.html
index 6256de5..7ed2360 100644
--- a/doc/html/pcre2compat.html
+++ b/doc/html/pcre2compat.html
@@ -99,7 +99,12 @@
\Q\\E \ \\E
</pre>
The \Q...\E sequence is recognized both inside and outside character classes
-by both PCRE2 and Perl.
+by both PCRE2 and Perl. Another difference from Perl is that any appearance of
+\Q or \E inside what might otherwise be a quantifier causes PCRE2 not to
+recognize the sequence as a quantifier. Perl recognizes a quantifier if
+(redundantly) either of the numbers is inside \Q...\E, but not if the
+separating comma is. When not recognized as a quantifier, a sequence such as
+{\Q1\E,2} is treated as the literal string "{1,2}".
</P>
<P>
9. Fairly obviously, PCRE2 does not support the (?{code}) and (??{code})
@@ -267,7 +272,7 @@
REVISION
</b><br>
<P>
-Last updated: 22 July 2024
+Last updated: 12 August 2024
<br>
Copyright © 1997-2024 University of Cambridge.
<br>
diff --git a/doc/html/pcre2pattern.html b/doc/html/pcre2pattern.html
index f19abcd..4886cd0 100644
--- a/doc/html/pcre2pattern.html
+++ b/doc/html/pcre2pattern.html
@@ -346,7 +346,7 @@
</pre>
If a pattern is compiled with the PCRE2_EXTENDED option, most white space in
the pattern, other than in a character class, within a \Q...\E sequence, or
-between a # outside a character class and the next newline, inclusive, are
+between a # outside a character class and the next newline, inclusive, is
ignored. An escaping backslash can be used to include a white space or a #
character as part of the pattern. If the PCRE2_EXTENDED_MORE option is set, the
same applies, but in addition unescaped space and horizontal tab characters are
@@ -404,6 +404,14 @@
the pattern (that is, \E is assumed at the end). If the isolated \Q is inside
a character class, this causes an error, because the character class is then
not terminated by a closing square bracket.
+</P>
+<P>
+Another difference from Perl is that any appearance of \Q or \E inside what
+might otherwise be a quantifier causes PCRE2 not to recognize the sequence as a
+quantifier. Perl recognizes a quantifier if (redundantly) either of the numbers
+is inside \Q...\E, but not if the separating comma is. When not recognized as
+a quantifier, a sequence such as {\Q1\E,2} is treated as the literal string
+"{1,2}".
<a name="digitsafterbackslash"></a></P>
<br><b>
Non-printing characters
@@ -2970,8 +2978,8 @@
There are two ways of including comments in patterns that are processed by
PCRE2. In both cases, the start of the comment must not be in a character
class, nor in the middle of any other sequence of related characters such as
-(?: or a group name or number. The characters that make up a comment play
-no part in the pattern matching.
+(?: or a group name or number or a Unicode property name. The characters that
+make up a comment play no part in the pattern matching.
</P>
<P>
The sequence (?# marks the start of a comment that continues up to the next
@@ -3864,7 +3872,7 @@
</P>
<br><a name="SEC32" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 28 July 2024
+Last updated: 12 August 2024
<br>
Copyright © 1997-2024 University of Cambridge.
<br>
diff --git a/doc/html/pcre2syntax.html b/doc/html/pcre2syntax.html
index 86f3f7c..fa3b275 100644
--- a/doc/html/pcre2syntax.html
+++ b/doc/html/pcre2syntax.html
@@ -60,7 +60,10 @@
\Q...\E treat enclosed characters as literal
</pre>
Note that white space inside \Q...\E is always treated as literal, even if
-PCRE2_EXTENDED is set, causing most other white space to be ignored.
+PCRE2_EXTENDED is set, causing most other white space to be ignored. Note also
+that PCRE2's handling of \Q...\E has some differences from Perl's. See the
+<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
+documentation for details.
</P>
<br><a name="SEC3" href="#TOC1">BRACED ITEMS</a><br>
<P>
@@ -629,7 +632,7 @@
</P>
<br><a name="SEC32" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 29 July 2024
+Last updated: 12 August 2024
<br>
Copyright © 1997-2024 University of Cambridge.
<br>
diff --git a/doc/pcre2.txt b/doc/pcre2.txt
index 9f052f8..510ad4e 100644
--- a/doc/pcre2.txt
+++ b/doc/pcre2.txt
@@ -5176,131 +5176,137 @@
\Q\\E \ \\E
The \Q...\E sequence is recognized both inside and outside character
- classes by both PCRE2 and Perl.
+ classes by both PCRE2 and Perl. Another difference from Perl is that
+ any appearance of \Q or \E inside what might otherwise be a quantifier
+ causes PCRE2 not to recognize the sequence as a quantifier. Perl recog-
+ nizes a quantifier if (redundantly) either of the numbers is inside
+ \Q...\E, but not if the separating comma is. When not recognized as a
+ quantifier, a sequence such as {\Q1\E,2} is treated as the literal
+ string "{1,2}".
- 9. Fairly obviously, PCRE2 does not support the (?{code}) and
+ 9. Fairly obviously, PCRE2 does not support the (?{code}) and
(??{code}) constructions. However, PCRE2 does have a "callout" feature,
which allows an external function to be called during pattern matching.
See the pcre2callout documentation for details.
- 10. Subroutine calls (whether recursive or not) were treated as atomic
- groups up to PCRE2 release 10.23, but from release 10.30 this changed,
+ 10. Subroutine calls (whether recursive or not) were treated as atomic
+ groups up to PCRE2 release 10.23, but from release 10.30 this changed,
and backtracking into subroutine calls is now supported, as in Perl.
- 11. In PCRE2, if any of the backtracking control verbs are used in a
- group that is called as a subroutine (whether or not recursively),
- their effect is confined to that group; it does not extend to the sur-
- rounding pattern. This is not always the case in Perl. In particular,
- if (*THEN) is present in a group that is called as a subroutine, its
+ 11. In PCRE2, if any of the backtracking control verbs are used in a
+ group that is called as a subroutine (whether or not recursively),
+ their effect is confined to that group; it does not extend to the sur-
+ rounding pattern. This is not always the case in Perl. In particular,
+ if (*THEN) is present in a group that is called as a subroutine, its
action is limited to that group, even if the group does not contain any
- | characters. Note that such groups are processed as anchored at the
+ | characters. Note that such groups are processed as anchored at the
point where they are tested.
- 12. If a pattern contains more than one backtracking control verb, the
- first one that is backtracked onto acts. For example, in the pattern
- A(*COMMIT)B(*PRUNE)C a failure in B triggers (*COMMIT), but a failure
+ 12. If a pattern contains more than one backtracking control verb, the
+ first one that is backtracked onto acts. For example, in the pattern
+ A(*COMMIT)B(*PRUNE)C a failure in B triggers (*COMMIT), but a failure
in C triggers (*PRUNE). Perl's behaviour is more complex; in many cases
it is the same as PCRE2, but there are cases where it differs.
- 13. There are some differences that are concerned with the settings of
- captured strings when part of a pattern is repeated. For example,
- matching "aba" against the pattern /^(a(b)?)+$/ in Perl leaves $2 un-
+ 13. There are some differences that are concerned with the settings of
+ captured strings when part of a pattern is repeated. For example,
+ matching "aba" against the pattern /^(a(b)?)+$/ in Perl leaves $2 un-
set, but in PCRE2 it is set to "b".
- 14. PCRE2's handling of duplicate capture group numbers and names is
- not as general as Perl's. This is a consequence of the fact the PCRE2
- works internally just with numbers, using an external table to trans-
- late between numbers and names. In particular, a pattern such as
- (?|(?<a>A)|(?<b>B)), where the two capture groups have the same number
- but different names, is not supported, and causes an error at compile
+ 14. PCRE2's handling of duplicate capture group numbers and names is
+ not as general as Perl's. This is a consequence of the fact the PCRE2
+ works internally just with numbers, using an external table to trans-
+ late between numbers and names. In particular, a pattern such as
+ (?|(?<a>A)|(?<b>B)), where the two capture groups have the same number
+ but different names, is not supported, and causes an error at compile
time. If it were allowed, it would not be possible to distinguish which
- group matched, because both names map to capture group number 1. To
+ group matched, because both names map to capture group number 1. To
avoid this confusing situation, an error is given at compile time.
15. Perl used to recognize comments in some places that PCRE2 does not,
- for example, between the ( and ? at the start of a group. If the /x
- modifier is set, Perl allowed white space between ( and ? though the
- latest Perls give an error (for a while it was just deprecated). There
+ for example, between the ( and ? at the start of a group. If the /x
+ modifier is set, Perl allowed white space between ( and ? though the
+ latest Perls give an error (for a while it was just deprecated). There
may still be some cases where Perl behaves differently.
- 16. Perl, when in warning mode, gives warnings for character classes
- such as [A-\d] or [a-[:digit:]]. It then treats the hyphens as liter-
+ 16. Perl, when in warning mode, gives warnings for character classes
+ such as [A-\d] or [a-[:digit:]]. It then treats the hyphens as liter-
als. PCRE2 has no warning features, so it gives an error in these cases
because they are almost certainly user mistakes.
17. In PCRE2, until release 10.45, the upper/lower case character prop-
- erties Lu and Ll were not affected when case-independent matching was
- specified. Perl has changed in this respect, and PCRE2 has now changed
- to match. When caseless matching is in force, Lu, Ll, and Lt (title
+ erties Lu and Ll were not affected when case-independent matching was
+ specified. Perl has changed in this respect, and PCRE2 has now changed
+ to match. When caseless matching is in force, Lu, Ll, and Lt (title
case) are all treated as Lc (cased letter).
18. From release 5.32.0, Perl locks out the use of \K in lookaround as-
- sertions. From release 10.38 PCRE2 does the same by default. However,
- there is an option for re-enabling the previous behaviour. When this
- option is set, \K is acted on when it occurs in positive assertions,
+ sertions. From release 10.38 PCRE2 does the same by default. However,
+ there is an option for re-enabling the previous behaviour. When this
+ option is set, \K is acted on when it occurs in positive assertions,
but is ignored in negative assertions.
- 19. PCRE2 provides some extensions to the Perl regular expression fa-
- cilities. Perl 5.10 included new features that were not in earlier
- versions of Perl, some of which (such as named parentheses) were in
+ 19. PCRE2 provides some extensions to the Perl regular expression fa-
+ cilities. Perl 5.10 included new features that were not in earlier
+ versions of Perl, some of which (such as named parentheses) were in
PCRE2 for some time before. This list is with respect to Perl 5.38:
- (a) If PCRE2_DOLLAR_ENDONLY is set and PCRE2_MULTILINE is not set, the
+ (a) If PCRE2_DOLLAR_ENDONLY is set and PCRE2_MULTILINE is not set, the
$ meta-character matches only at the very end of the string.
- (b) A backslash followed by a letter with no special meaning is
+ (b) A backslash followed by a letter with no special meaning is
faulted. (Perl can be made to issue a warning.)
- (c) If PCRE2_UNGREEDY is set, the greediness of the repetition quanti-
+ (c) If PCRE2_UNGREEDY is set, the greediness of the repetition quanti-
fiers is inverted, that is, by default they are not greedy, but if fol-
lowed by a question mark they are.
- (d) PCRE2_ANCHORED can be used at matching time to force a pattern to
+ (d) PCRE2_ANCHORED can be used at matching time to force a pattern to
be tried only at the first matching position in the subject string.
- (e) The PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY and
+ (e) The PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY and
PCRE2_NOTEMPTY_ATSTART options have no Perl equivalents.
- (f) The \R escape sequence can be restricted to match only CR, LF, or
+ (f) The \R escape sequence can be restricted to match only CR, LF, or
CRLF by the PCRE2_BSR_ANYCRLF option.
- (g) The callout facility is PCRE2-specific. Perl supports codeblocks
+ (g) The callout facility is PCRE2-specific. Perl supports codeblocks
and variable interpolation, but not general hooks on every match.
(h) The partial matching facility is PCRE2-specific.
- (i) The alternative matching function (pcre2_dfa_match() matches in a
+ (i) The alternative matching function (pcre2_dfa_match() matches in a
different way and is not Perl-compatible.
- (j) PCRE2 recognizes some special sequences such as (*CR) or (*NO_JIT)
- at the start of a pattern. These set overall options that cannot be
+ (j) PCRE2 recognizes some special sequences such as (*CR) or (*NO_JIT)
+ at the start of a pattern. These set overall options that cannot be
changed within the pattern.
- (k) PCRE2 supports non-atomic positive lookaround assertions. This is
+ (k) PCRE2 supports non-atomic positive lookaround assertions. This is
an extension to the lookaround facilities. The default, Perl-compatible
lookarounds are atomic.
- (l) There are three syntactical items in patterns that can refer to a
- capturing group by number: back references such as \g{2}, subroutine
- calls such as (?3), and condition references such as (?(4)...). PCRE2
- supports relative group numbers such as +2 and -4 in all three cases.
- Perl supports both plus and minus for subroutine calls, but only minus
+ (l) There are three syntactical items in patterns that can refer to a
+ capturing group by number: back references such as \g{2}, subroutine
+ calls such as (?3), and condition references such as (?(4)...). PCRE2
+ supports relative group numbers such as +2 and -4 in all three cases.
+ Perl supports both plus and minus for subroutine calls, but only minus
for back references, and no relative numbering at all for conditions.
20. Perl has different limits than PCRE2. See the pcre2limit documenta-
tion for details. Perl went with 5.10 from recursion to iteration keep-
ing the intermediate matches on the heap, which is ~10% slower but does
- not fall into any stack-overflow limit. PCRE2 made a similar change at
- release 10.30, and also has many build-time and run-time customizable
+ not fall into any stack-overflow limit. PCRE2 made a similar change at
+ release 10.30, and also has many build-time and run-time customizable
limits.
- 21. Unlike Perl, PCRE2 doesn't have character set modifiers and spe-
- cially no way to set characters by context just like Perl's "/d". A
+ 21. Unlike Perl, PCRE2 doesn't have character set modifiers and spe-
+ cially no way to set characters by context just like Perl's "/d". A
regular expression using PCRE2_UTF and PCRE2_UCP will use similar rules
- to Perl's "/u"; something closer to "/a" could be selected by adding
+ to Perl's "/u"; something closer to "/a" could be selected by adding
other PCRE2_EXTRA_ASCII* options on top.
- 22. Some recursive patterns that Perl diagnoses as infinite recursions
+ 22. Some recursive patterns that Perl diagnoses as infinite recursions
can be handled by PCRE2, either by the interpreter or the JIT. An exam-
ple is /(?:|(?0)abcd)(?(R)|\z)/, which matches a sequence of any number
of repeated "abcd" substrings at the end of the subject.
@@ -5315,11 +5321,11 @@
REVISION
- Last updated: 22 July 2024
+ Last updated: 12 August 2024
Copyright (c) 1997-2024 University of Cambridge.
-PCRE2 10.45 22 July 2024 PCRE2COMPAT(3)
+PCRE2 10.45 12 August 2024 PCRE2COMPAT(3)
------------------------------------------------------------------------------
@@ -6750,7 +6756,7 @@
If a pattern is compiled with the PCRE2_EXTENDED option, most white
space in the pattern, other than in a character class, within a \Q...\E
sequence, or between a # outside a character class and the next new-
- line, inclusive, are ignored. An escaping backslash can be used to in-
+ line, inclusive, is ignored. An escaping backslash can be used to in-
clude a white space or a # character as part of the pattern. If the
PCRE2_EXTENDED_MORE option is set, the same applies, but in addition
unescaped space and horizontal tab characters are ignored inside a
@@ -6808,6 +6814,13 @@
error, because the character class is then not terminated by a closing
square bracket.
+ Another difference from Perl is that any appearance of \Q or \E inside
+ what might otherwise be a quantifier causes PCRE2 not to recognize the
+ sequence as a quantifier. Perl recognizes a quantifier if (redundantly)
+ either of the numbers is inside \Q...\E, but not if the separating
+ comma is. When not recognized as a quantifier, a sequence such as
+ {\Q1\E,2} is treated as the literal string "{1,2}".
+
Non-printing characters
A second use of backslash provides a way of encoding non-printing char-
@@ -9206,8 +9219,9 @@
There are two ways of including comments in patterns that are processed
by PCRE2. In both cases, the start of the comment must not be in a
character class, nor in the middle of any other sequence of related
- characters such as (?: or a group name or number. The characters that
- make up a comment play no part in the pattern matching.
+ characters such as (?: or a group name or number or a Unicode property
+ name. The characters that make up a comment play no part in the pattern
+ matching.
The sequence (?# marks the start of a comment that continues up to the
next closing parenthesis. Nested parentheses are not permitted. If the
@@ -10047,11 +10061,11 @@
REVISION
- Last updated: 28 July 2024
+ Last updated: 12 August 2024
Copyright (c) 1997-2024 University of Cambridge.
-PCRE2 10.45 28 July 2024 PCRE2PATTERN(3)
+PCRE2 10.45 12 August 2024 PCRE2PATTERN(3)
------------------------------------------------------------------------------
@@ -10973,22 +10987,24 @@
Note that white space inside \Q...\E is always treated as literal, even
if PCRE2_EXTENDED is set, causing most other white space to be ignored.
+ Note also that PCRE2's handling of \Q...\E has some differences from
+ Perl's. See the pcre2pattern documentation for details.
BRACED ITEMS
- With one exception, wherever brace characters { and } are required to
- enclose data for constructions such as \g{2} or \k{name}, space and/or
- horizontal tab characters that follow { or precede } are allowed and
+ With one exception, wherever brace characters { and } are required to
+ enclose data for constructions such as \g{2} or \k{name}, space and/or
+ horizontal tab characters that follow { or precede } are allowed and
are ignored. In the case of quantifiers, they may also appear before or
- after the comma. The exception is \u{...} which is not Perl-compatible
+ after the comma. The exception is \u{...} which is not Perl-compatible
and is recognized only when PCRE2_EXTRA_ALT_BSUX is set. This is an EC-
MAScript compatibility feature, and follows ECMAScript's behaviour.
ESCAPED CHARACTERS
- This table applies to ASCII and Unicode environments. An unrecognized
+ This table applies to ASCII and Unicode environments. An unrecognized
escape sequence causes an error.
\a alarm, that is, the BEL character (hex 07)
@@ -11012,19 +11028,19 @@
\uhhhh character with hex code hhhh
\u{hh..} character with hex code hh.. but only for EXTRA_ALT_BSUX
- When \x is not followed by {, from zero to two hexadecimal digits are
- read, but in ALT_BSUX mode \x must be followed by two hexadecimal dig-
- its to be recognized as a hexadecimal escape; otherwise it matches a
- literal "x". Likewise, if \u (in ALT_BSUX mode) is not followed by
- four hexadecimal digits or (in EXTRA_ALT_BSUX mode) a sequence of hex
+ When \x is not followed by {, from zero to two hexadecimal digits are
+ read, but in ALT_BSUX mode \x must be followed by two hexadecimal dig-
+ its to be recognized as a hexadecimal escape; otherwise it matches a
+ literal "x". Likewise, if \u (in ALT_BSUX mode) is not followed by
+ four hexadecimal digits or (in EXTRA_ALT_BSUX mode) a sequence of hex
digits in curly brackets, it matches a literal "u".
Note that \0dd is always an octal code. The treatment of backslash fol-
- lowed by a non-zero digit is complicated; for details see the section
- "Non-printing characters" in the pcre2pattern documentation, where de-
- tails of escape processing in EBCDIC environments are also given.
+ lowed by a non-zero digit is complicated; for details see the section
+ "Non-printing characters" in the pcre2pattern documentation, where de-
+ tails of escape processing in EBCDIC environments are also given.
\N{U+hh..} is synonymous with \x{hh..} in PCRE2 but is not supported in
- EBCDIC environments. Note that \N not followed by an opening curly
+ EBCDIC environments. Note that \N not followed by an opening curly
bracket has a different meaning (see below).
@@ -11049,23 +11065,23 @@
\W a "non-word" character
\X a Unicode extended grapheme cluster
- \C is dangerous because it may leave the current matching point in the
+ \C is dangerous because it may leave the current matching point in the
middle of a UTF-8 or UTF-16 character. The application can lock out the
- use of \C by setting the PCRE2_NEVER_BACKSLASH_C option. It is also
+ use of \C by setting the PCRE2_NEVER_BACKSLASH_C option. It is also
possible to build PCRE2 with the use of \C permanently disabled.
- By default, \d, \s, and \w match only ASCII characters, even in UTF-8
+ By default, \d, \s, and \w match only ASCII characters, even in UTF-8
mode or in the 16-bit and 32-bit libraries. However, if locale-specific
- matching is happening, \s and \w may also match characters with code
+ matching is happening, \s and \w may also match characters with code
points in the range 128-255. If the PCRE2_UCP option is set, the behav-
iour of these escape sequences is changed to use Unicode properties and
- they match many more characters, but there are some option settings
- that can restrict individual sequences to matching only ASCII charac-
+ they match many more characters, but there are some option settings
+ that can restrict individual sequences to matching only ASCII charac-
ters.
Property descriptions in \p and \P are matched caselessly; hyphens, un-
- derscores, and ASCII white space characters are ignored, in accordance
- with Unicode's "loose matching" rules. For example, \p{Bidi_Class=al}
+ derscores, and ASCII white space characters are ignored, in accordance
+ with Unicode's "loose matching" rules. For example, \p{Bidi_Class=al}
is the same as \p{ bidi class = AL }.
@@ -11117,7 +11133,7 @@
Zp Paragraph separator
Zs Space separator
- From release 10.45, when caseless matching is set, Ll, Lu, and Lt are
+ From release 10.45, when caseless matching is set, Ll, Lu, and Lt are
all equivalent to Lc.
@@ -11136,9 +11152,9 @@
BINARY PROPERTIES FOR \p AND \P
- Unicode defines a number of binary properties, that is, properties
- whose only values are true or false. You can obtain a list of those
- that are recognized by \p and \P, along with their abbreviations, by
+ Unicode defines a number of binary properties, that is, properties
+ whose only values are true or false. You can obtain a list of those
+ that are recognized by \p and \P, along with their abbreviations, by
running this command:
pcre2test -LP
@@ -11146,8 +11162,8 @@
SCRIPT MATCHING WITH \p AND \P
- Many script names and their 4-letter abbreviations are recognized in
- \p{sc:...} or \p{scx:...} items, or on their own with \p (and also \P
+ Many script names and their 4-letter abbreviations are recognized in
+ \p{sc:...} or \p{scx:...} items, or on their own with \p (and also \P
of course). You can obtain a list of these scripts by running this com-
mand:
@@ -11209,8 +11225,8 @@
word same as \w
xdigit hexadecimal digit
- In PCRE2, POSIX character set names recognize only ASCII characters by
- default, but some of them use Unicode properties if PCRE2_UCP is set.
+ In PCRE2, POSIX character set names recognize only ASCII characters by
+ default, but some of them use Unicode properties if PCRE2_UCP is set.
You can use \Q...\E inside a character class.
@@ -11258,8 +11274,8 @@
\K set reported start of match
- From release 10.38 \K is not permitted by default in lookaround asser-
- tions, for compatibility with Perl. However, if the PCRE2_EXTRA_AL-
+ From release 10.38 \K is not permitted by default in lookaround asser-
+ tions, for compatibility with Perl. However, if the PCRE2_EXTRA_AL-
LOW_LOOKAROUND_BSK option is set, the previous behaviour is re-enabled.
When this option is set, \K is honoured in positive assertions, but ig-
nored in negative ones.
@@ -11280,8 +11296,8 @@
(?|...) non-capture group; reset group numbers for
capture groups in each alternative
- In non-UTF modes, names may contain underscores and ASCII letters and
- digits; in UTF modes, any Unicode letters and Unicode decimal digits
+ In non-UTF modes, names may contain underscores and ASCII letters and
+ digits; in UTF modes, any Unicode letters and Unicode decimal digits
are permitted. In both cases, a name must not start with a digit.
@@ -11297,7 +11313,7 @@
OPTION SETTING
- Changes of these options within a group are automatically cancelled at
+ Changes of these options within a group are automatically cancelled at
the end of the group.
(?a) all ASCII options
@@ -11319,10 +11335,10 @@
(?^) unset imnrsx options
(?aP) implies (?aT) as well, though this has no additional effect. How-
- ever, it means that (?-aP) is really (?-PT) which disables all ASCII
+ ever, it means that (?-aP) is really (?-PT) which disables all ASCII
restrictions for POSIX classes.
- Unsetting x or xx unsets both. Several options may be set at once, and
+ Unsetting x or xx unsets both. Several options may be set at once, and
a mixture of setting and unsetting such as (?i-x) is allowed, but there
may be only one hyphen. Setting (but no unsetting) is allowed after (?^
for example (?^in). An option setting may appear at the start of a non-
@@ -11344,11 +11360,11 @@
(*UTF) set appropriate UTF mode for the library in use
(*UCP) set PCRE2_UCP (use Unicode properties for \d etc)
- Note that LIMIT_DEPTH, LIMIT_HEAP, and LIMIT_MATCH can only reduce the
- value of the limits set by the caller of pcre2_match() or
- pcre2_dfa_match(), not increase them. LIMIT_RECURSION is an obsolete
+ Note that LIMIT_DEPTH, LIMIT_HEAP, and LIMIT_MATCH can only reduce the
+ value of the limits set by the caller of pcre2_match() or
+ pcre2_dfa_match(), not increase them. LIMIT_RECURSION is an obsolete
synonym for LIMIT_DEPTH. The application can lock out the use of (*UTF)
- and (*UCP) by setting the PCRE2_NEVER_UTF or PCRE2_NEVER_UCP options,
+ and (*UCP) by setting the PCRE2_NEVER_UTF or PCRE2_NEVER_UCP options,
respectively, at compile time.
@@ -11392,11 +11408,11 @@
(*nlb:...) ) negative lookbehind
(*negative_lookbehind:...) )
- Each top-level branch of a lookbehind must have a limit for the number
- of characters it matches. If any branch can match a variable number of
- characters, the maximum for each branch is limited to a value set by
- the caller of pcre2_compile() or defaulted. The default is set when
- PCRE2 is built (ultimate default 255). If every branch matches a fixed
+ Each top-level branch of a lookbehind must have a limit for the number
+ of characters it matches. If any branch can match a variable number of
+ characters, the maximum for each branch is limited to a value set by
+ the caller of pcre2_compile() or defaulted. The default is set when
+ PCRE2 is built (ultimate default 255). If every branch matches a fixed
number of characters, the limit for each branch is 65535 characters.
@@ -11474,16 +11490,16 @@
(?(VERSION[>]=n.m) test PCRE2 version
(?(assert) assertion condition
- Note the ambiguity of (?(R) and (?(Rn) which might be named reference
- conditions or recursion tests. Such a condition is interpreted as a
+ Note the ambiguity of (?(R) and (?(Rn) which might be named reference
+ conditions or recursion tests. Such a condition is interpreted as a
reference condition if the relevant named group exists.
BACKTRACKING CONTROL
- All backtracking control verbs may be in the form (*VERB:NAME). For
- (*MARK) the name is mandatory, for the others it is optional. (*SKIP)
- changes its behaviour if :NAME is present. The others just set a name
+ All backtracking control verbs may be in the form (*VERB:NAME). For
+ (*MARK) the name is mandatory, for the others it is optional. (*SKIP)
+ changes its behaviour if :NAME is present. The others just set a name
for passing back to the caller, but this is not a name that (*SKIP) can
see. The following act immediately they are reached:
@@ -11491,7 +11507,7 @@
(*FAIL) force backtrack; synonym (*F)
(*MARK:NAME) set name to be passed back; synonym (*:NAME)
- The following act only when a subsequent match failure causes a back-
+ The following act only when a subsequent match failure causes a back-
track to reach them. They all force a match failure, but they differ in
what happens afterwards. Those that advance the start-of-match point do
so only if the pattern is not anchored.
@@ -11503,7 +11519,7 @@
(*MARK:NAME); if not found, the (*SKIP) is ignored
(*THEN) local failure, backtrack to next alternation
- The effect of one of these verbs in a group called as a subroutine is
+ The effect of one of these verbs in a group called as a subroutine is
confined to the subroutine call.
@@ -11514,14 +11530,14 @@
(?C"text") callout with string data
The allowed string delimiters are ` ' " ^ % # $ (which are the same for
- the start and the end), and the starting delimiter { matched with the
- ending delimiter }. To encode the ending delimiter within the string,
+ the start and the end), and the starting delimiter { matched with the
+ ending delimiter }. To encode the ending delimiter within the string,
double it.
SEE ALSO
- pcre2pattern(3), pcre2api(3), pcre2callout(3), pcre2matching(3),
+ pcre2pattern(3), pcre2api(3), pcre2callout(3), pcre2matching(3),
pcre2(3).
@@ -11534,11 +11550,11 @@
REVISION
- Last updated: 29 July 2024
+ Last updated: 12 August 2024
Copyright (c) 1997-2024 University of Cambridge.
-PCRE2 10.45 29 July 2024 PCRE2SYNTAX(3)
+PCRE2 10.45 12 August 2024 PCRE2SYNTAX(3)
------------------------------------------------------------------------------
diff --git a/doc/pcre2compat.3 b/doc/pcre2compat.3
index 6dfbae2..145cec5 100644
--- a/doc/pcre2compat.3
+++ b/doc/pcre2compat.3
@@ -1,4 +1,4 @@
-.TH PCRE2COMPAT 3 "22 July 2024" "PCRE2 10.45"
+.TH PCRE2COMPAT 3 "12 August 2024" "PCRE2 10.45"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH "DIFFERENCES BETWEEN PCRE2 AND PERL"
@@ -85,7 +85,12 @@
\eQ\e\eE \e \e\eE
.sp
The \eQ...\eE sequence is recognized both inside and outside character classes
-by both PCRE2 and Perl.
+by both PCRE2 and Perl. Another difference from Perl is that any appearance of
+\eQ or \eE inside what might otherwise be a quantifier causes PCRE2 not to
+recognize the sequence as a quantifier. Perl recognizes a quantifier if
+(redundantly) either of the numbers is inside \eQ...\eE, but not if the
+separating comma is. When not recognized as a quantifier, a sequence such as
+{\eQ1\eE,2} is treated as the literal string "{1,2}".
.P
9. Fairly obviously, PCRE2 does not support the (?{code}) and (??{code})
constructions. However, PCRE2 does have a "callout" feature, which allows an
@@ -231,6 +236,6 @@
.rs
.sp
.nf
-Last updated: 22 July 2024
+Last updated: 12 August 2024
Copyright (c) 1997-2024 University of Cambridge.
.fi
diff --git a/doc/pcre2demo.3 b/doc/pcre2demo.3
index d518f2c..4dcf77d 100644
--- a/doc/pcre2demo.3
+++ b/doc/pcre2demo.3
@@ -1,4 +1,4 @@
-.TH PCRE2DEMO 3 " 4 August 2024" "PCRE2 10.44"
+.TH PCRE2DEMO 3 "12 August 2024" "PCRE2 10.44"
.\"AUTOMATICALLY GENERATED BY PrepareRelease - do not EDIT!
.SH NAME
PCRE2DEMO - A demonstration C program for PCRE2
diff --git a/doc/pcre2pattern.3 b/doc/pcre2pattern.3
index ed9b953..ed05a6e 100644
--- a/doc/pcre2pattern.3
+++ b/doc/pcre2pattern.3
@@ -1,4 +1,4 @@
-.TH PCRE2PATTERN 3 "28 July 2024" "PCRE2 10.45"
+.TH PCRE2PATTERN 3 "12 August 2024" "PCRE2 10.45"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
@@ -320,7 +320,7 @@
.sp
If a pattern is compiled with the PCRE2_EXTENDED option, most white space in
the pattern, other than in a character class, within a \eQ...\eE sequence, or
-between a # outside a character class and the next newline, inclusive, are
+between a # outside a character class and the next newline, inclusive, is
ignored. An escaping backslash can be used to include a white space or a #
character as part of the pattern. If the PCRE2_EXTENDED_MORE option is set, the
same applies, but in addition unescaped space and horizontal tab characters are
@@ -381,6 +381,13 @@
the pattern (that is, \eE is assumed at the end). If the isolated \eQ is inside
a character class, this causes an error, because the character class is then
not terminated by a closing square bracket.
+.P
+Another difference from Perl is that any appearance of \eQ or \eE inside what
+might otherwise be a quantifier causes PCRE2 not to recognize the sequence as a
+quantifier. Perl recognizes a quantifier if (redundantly) either of the numbers
+is inside \eQ...\eE, but not if the separating comma is. When not recognized as
+a quantifier, a sequence such as {\eQ1\eE,2} is treated as the literal string
+"{1,2}".
.
.
.\" HTML <a name="digitsafterbackslash"></a>
@@ -3004,8 +3011,8 @@
There are two ways of including comments in patterns that are processed by
PCRE2. In both cases, the start of the comment must not be in a character
class, nor in the middle of any other sequence of related characters such as
-(?: or a group name or number. The characters that make up a comment play
-no part in the pattern matching.
+(?: or a group name or number or a Unicode property name. The characters that
+make up a comment play no part in the pattern matching.
.P
The sequence (?# marks the start of a comment that continues up to the next
closing parenthesis. Nested parentheses are not permitted. If the
@@ -3909,6 +3916,6 @@
.rs
.sp
.nf
-Last updated: 28 July 2024
+Last updated: 12 August 2024
Copyright (c) 1997-2024 University of Cambridge.
.fi
diff --git a/doc/pcre2syntax.3 b/doc/pcre2syntax.3
index cc4ff78..d69e050 100644
--- a/doc/pcre2syntax.3
+++ b/doc/pcre2syntax.3
@@ -1,4 +1,4 @@
-.TH PCRE2SYNTAX 3 "29 July 2024" "PCRE2 10.45"
+.TH PCRE2SYNTAX 3 "12 August 2024" "PCRE2 10.45"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
@@ -19,7 +19,12 @@
\eQ...\eE treat enclosed characters as literal
.sp
Note that white space inside \eQ...\eE is always treated as literal, even if
-PCRE2_EXTENDED is set, causing most other white space to be ignored.
+PCRE2_EXTENDED is set, causing most other white space to be ignored. Note also
+that PCRE2's handling of \eQ...\eE has some differences from Perl's. See the
+.\" HREF
+\fBpcre2pattern\fP
+.\"
+documentation for details.
.
.
.SH "BRACED ITEMS"
@@ -617,6 +622,6 @@
.rs
.sp
.nf
-Last updated: 29 July 2024
+Last updated: 12 August 2024
Copyright (c) 1997-2024 University of Cambridge.
.fi