| Change Log for PCRE2 |
| -------------------- |
| |
| Version 10.20 xx-xx-2015 |
| ------------------------ |
| |
| 1. Callouts with string arguments have been added. |
| |
| 2. Assertion code generator in JIT has been optimized. |
| |
| 3. The invalid pattern (?(?C) has a missing assertion condition at the end. The |
| pcre2_compile() function read past the end of the input before diagnosing an |
| error. This bug was discovered by the LLVM fuzzer. |
| |
| 4. Implemented pcre2_callout_enumerate(). |
| |
| 5. Fix JIT compilation of conditional blocks whose assertion is converted to |
| (*FAIL). For example: /(?(?!))/. |
| |
| 6. The pattern /(?(?!)^)/ caused references to random memory. This bug was |
| discovered by the LLVM fuzzer. |
| |
| 7. The assertion (?!) is optimized to (*FAIL). This was not handled correctly |
| when this assertion was used as a condition, for example (?(?!)a|b). In |
| pcre2_match() it worked by luck; in pcre2_dfa_match() it gave an incorrect |
| error about an unsupported item. |
| |
| 8. For some types of pattern, for example /Z*(|d*){216}/, the auto- |
| possessification code could take exponential time to complete. A recursion |
| depth limit of 1000 has been imposed to limit the resources used by this |
| optimization. This infelicity was discovered by the LLVM fuzzer. |
| |
| 9. In a pattern such as /(*UTF)[\S\V\H]/, which contains a negated special |
| class such as \S in non-UCP mode, explicit wide characters (> 255) can be |
| ignored because \S ensures they are all in the class. The code for doing this |
| was interacting badly with the code for computing the amount of space needed |
| to compile the pattern, leading to a buffer overflow. This bug was discovered |
| by the LLVM fuzzer. |
| |
| 10. A pattern such as /((?2)+)((?1))/ which has mutual recursion nested inside |
| other kinds of group caused stack overflow at compile time. This bug was |
| discovered by the LLVM fuzzer. |
| |
| 11. A pattern such as /(?1)(?#?'){8}(a)/ which had a parenthesized comment |
| between a subroutine call and its quantifier was incorrectly compiled, leading |
| to buffer overflow or other errors. This bug was discovered by the LLVM fuzzer. |
| |
| 12. The illegal pattern /(?(?<E>.*!.*)?)/ was not being diagnosed as missing an |
| assertion after (?(. The code was failing to check the character after (?(?< |
| for the ! or = that would indicate a lookbehind assertion. This bug was |
| discovered by the LLVM fuzzer. |
| |
| 13. A pattern such as /X((?2)()*+){2}+/ which has a possessive quantifier with |
| a fixed maximum following a group that contains a subroutine reference was |
| incorrectly compiled and could trigger buffer overflow. This bug was discovered |
| by the LLVM fuzzer. |
| |
| 14. Negative relative recursive references such as (?-7) to non-existent |
| subpatterns were not being diagnosed and could lead to unpredictable behaviour. |
| This bug was discovered by the LLVM fuzzer. |
| |
| 15. The bug fixed in 14 was due to an integer variable that was unsigned when |
| it should have been signed. Some other "int" variables, having been checked, |
| have either been changed to uint32_t or commented as "must be signed". |
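| |
| The class of bug described in 14 and 15 can be sketched as below. This is a |
| hypothetical illustration, not the PCRE2 source: the function names, and the |
| assumption that (?-1) denotes the most recently opened group, are made up for |
| the example. |
| |
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, not the PCRE2 source. Resolves a negative relative
   reference such as (?-7) to an absolute group number, assuming (?-1) means
   the most recently opened group. Returns false if no such group exists. */
static bool resolve_signed(int32_t groups_so_far, int32_t back, int32_t *out)
{
  int32_t n;
  assert(back > 0 && out != NULL);
  n = groups_so_far - back + 1;     /* can legitimately be zero or negative */
  if (n <= 0) return false;         /* reference to a non-existent group */
  *out = n;
  return true;
}

/* The same arithmetic with unsigned variables: the subtraction wraps around
   modulo 2^32, so a "not positive" test can only ever catch an exact zero
   and a bogus group number slips through. */
static bool resolve_unsigned(uint32_t groups_so_far, uint32_t back,
  uint32_t *out)
{
  uint32_t n;
  assert(back > 0 && out != NULL);
  n = groups_so_far - back + 1u;    /* wraps instead of going negative */
  if (n == 0) return false;
  *out = n;
  return true;
}
```
| |
| With two groups opened so far, (?-7) is rejected by the signed version, while |
| the unsigned version wraps around and accepts a meaningless group number. |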
| |
| 16. A mutual recursion within a lookbehind assertion such as (?<=((?2))((?1))) |
| caused a stack overflow instead of the diagnosis of a non-fixed length |
| lookbehind assertion. This bug was discovered by the LLVM fuzzer. |
| |
| 17. The use of \K in a positive lookbehind assertion in a non-anchored pattern |
| (e.g. /(?<=\Ka)/) could make pcre2grep loop. |
| |
| 18. There was a similar problem to 17 in pcre2test for global matches, though |
| the code there did catch the loop. |
| |
| 19. If a greedy quantified \X was preceded by \C in UTF mode (e.g. \C\X*), |
| and a subsequent item in the pattern caused a non-match, backtracking over the |
| repeated \X did not stop, but carried on past the start of the subject, causing |
| reference to random memory and/or a segfault. There were also some other cases |
| where backtracking after \C could crash. This set of bugs was discovered by the |
| LLVM fuzzer. |
| |
| 20. The function for finding the minimum length of a matching string could take |
| a very long time if mutual recursion was present many times in a pattern, for |
| example, /((?2){73}(?2))((?1))/. A better mutual recursion detection method has |
| been implemented. This infelicity was discovered by the LLVM fuzzer. |
| |
| 21. Implemented PCRE2_NEVER_BACKSLASH_C. |
| |
| 22. The feature for string replication in pcre2test could read from freed |
| memory if the replication required a buffer to be extended, and it was not |
| working properly in 16-bit and 32-bit modes. This issue was discovered by a |
| fuzzer: see http://lcamtuf.coredump.cx/afl/. |
| |
| 23. Added the PCRE2_ALT_CIRCUMFLEX option. |
| |
| 24. Adjust the treatment of \8 and \9 to be the same as the current Perl |
| behaviour. |
| |
| 25. Static linking against the PCRE2 library using the pkg-config module was |
| failing on missing pthread symbols. |
| |
| 26. If a group that contained a recursive back reference also contained a |
| forward reference subroutine call followed by a non-forward-reference |
| subroutine call, for example /.((?2)(?R)\1)()/, pcre2_compile() failed to |
| compile correct code, leading to undefined behaviour or an internally detected |
| error. This bug was discovered by the LLVM fuzzer. |
| |
| 27. Quantification of certain items (e.g. atomic back references) could cause |
| incorrect code to be compiled when recursive forward references were involved. |
| For example, in this pattern: /(?1)()((((((\1++))\x85)+)|))/. This bug was |
| discovered by the LLVM fuzzer. |
| |
| 28. A repeated conditional group whose condition was a reference by name caused |
| a buffer overflow if there was more than one group with the given name. This |
| bug was discovered by the LLVM fuzzer. |
| |
| 29. A recursive back reference by name within a group that had the same name as |
| another group caused a buffer overflow. For example: /(?J)(?'d'(?'d'\g{d}))/. |
| This bug was discovered by the LLVM fuzzer. |
| |
| 30. A forward reference by name to a group whose number is the same as the |
| current group, for example in this pattern: /(?|(\k'Pm')|(?'Pm'))/, caused a |
| buffer overflow at compile time. This bug was discovered by the LLVM fuzzer. |
| |
| 31. Fix -fsanitize=undefined warnings for left shifts of 1 by 31 (it treats 1 |
| as an int; fixed by writing it as 1u). |
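| |
| A minimal sketch of the change in 31 follows. The helper name bit_mask is |
| hypothetical and does not appear in PCRE2; it only shows why the literal needs |
| the u suffix when the shift count can reach 31 and int is 32 bits wide. |
| |
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not PCRE2 code: returns a mask with only bit n set.
   Writing 1 << 31 shifts a signed int into its sign bit, which is undefined
   behaviour where int is 32 bits wide; an unsigned operand is well defined. */
static uint32_t bit_mask(unsigned int n)
{
  assert(n < 32);              /* shifting by 32 or more is also undefined */
  return (uint32_t)1u << n;
}
```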
| |
| |
| Version 10.10 06-March-2015 |
| --------------------------- |
| |
| 1. When a pattern is compiled, it remembers the highest back reference so that |
| when matching, if the ovector is too small, extra memory can be obtained to |
| use instead. A conditional subpattern whose condition is a check on a capture |
| having happened, as in the pattern /^(?:(a)|b)(?(1)A|B)/, is |
| another kind of back reference, but it was not setting the highest |
| backreference number. This mattered only if pcre2_match() was called with an |
| ovector that was too small to hold the capture, and there was no other kind of |
| back reference (a situation which is probably quite rare). The effect of the |
| bug was that the condition was always treated as FALSE when the capture could |
| not be consulted, leading to incorrect behaviour by pcre2_match(). This bug |
| has been fixed. |
| |
| 2. Functions for serialization and deserialization of sets of compiled patterns |
| have been added. |
| |
| 3. The value that is returned by PCRE2_INFO_SIZE has been corrected to remove |
| excess code units at the end of the data block that may occasionally occur if |
| the code for calculating the size over-estimates. This change stops the |
| serialization code copying uninitialized data, to which valgrind objects. The |
| documentation of PCRE2_INFO_SIZE was incorrect in stating that the size did not |
| include the general overhead. This has been corrected. |
| |
| 4. All code units in every slot in the table of group names are now set, again |
| in order to avoid accessing uninitialized data when serializing. |
| |
| 5. The (*NO_JIT) feature is implemented. |
| |
| 6. If a bug that caused pcre2_compile() to use more memory than allocated was |
| triggered when using valgrind, the code in (3) above passed a stupidly large |
| value to valgrind. This caused a crash instead of an "internal error" return. |
| |
| 7. A reference to a duplicated named group (either a back reference or a test |
| for being set in a conditional) that occurred in a part of the pattern where |
| PCRE2_DUPNAMES was not set caused the amount of memory needed for the pattern |
| to be incorrectly calculated, leading to overwriting. |
| |
| 8. A mutually recursive set of back references such as (\2)(\1) caused a |
| segfault at compile time (while trying to find the minimum matching length). |
| The infinite loop is now broken (with the minimum length unset, that is, zero). |
| |
| 9. If an assertion that was used as a condition was quantified with a minimum |
| of zero, matching went wrong. In particular, if the whole group had unlimited |
| repetition and could match an empty string, a segfault was likely. The pattern |
| (?(?=0)?)+ is an example that caused this. Perl allows assertions to be |
| quantified, but not if they are being used as conditions, so the above pattern |
| is faulted by Perl. PCRE2 has now been changed so that it also rejects such |
| patterns. |
| |
| 10. The error message for an invalid quantifier has been changed from "nothing |
| to repeat" to "quantifier does not follow a repeatable item". |
| |
| 11. If a bad UTF string is compiled with NO_UTF_CHECK, it may succeed, but |
| scanning the compiled pattern in subsequent auto-possessification can get out |
| of step and lead to an unknown opcode. Previously this could have caused an |
| infinite loop. Now it generates an "internal error" error. This is a tidyup, |
| not a bug fix; passing bad UTF with NO_UTF_CHECK is documented as having an |
| undefined outcome. |
| |
| 12. A UTF pattern containing a "not" match of a non-ASCII character and a |
| subroutine reference could loop at compile time. Example: /[^\xff]((?1))/. |
| |
| 13. The locale test (RunTest 3) has been upgraded. It now checks that a locale |
| that is found in the output of "locale -a" can actually be set by pcre2test |
| before it is accepted. Previously, in an environment where a locale was listed |
| but would not set (an example does exist), the test would "pass" without |
| actually doing anything. Also the fr_CA locale has been added to the list of |
| locales that can be used. |
| |
| 14. Fixed a bug in pcre2_substitute(). If a replacement string ended in a |
| capturing group number without parentheses, the last character was incorrectly |
| literally included at the end of the replacement string. |
| |
| 15. A possessive capturing group such as (a)*+ with a minimum repeat of zero |
| failed to allow the zero-repeat case if pcre2_match() was called with an |
| ovector too small to capture the group. |
| |
| 16. Improved error message in pcre2test when setting the stack size (-S) fails. |
| |
| 17. Fixed two bugs in CMakeLists.txt: (1) Some lines had got lost in the |
| transfer from PCRE1, meaning that CMake configuration failed if "build tests" |
| was selected. (2) The file src/pcre2_serialize.c had not been added to the list |
| of PCRE2 sources, which caused a failure to build pcre2test. |
| |
| 18. Fixed typo in pcre2_serialize.c (DECL instead of DEFN) that causes problems |
| only on Windows. |
| |
| 19. Use binary input when reading back saved serialized patterns in pcre2test. |
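| |
| The point of 19 can be sketched as below. The helper read_binary is |
| hypothetical and is not taken from pcre2test; it only shows the "rb" mode |
| string, which matters on platforms that distinguish text and binary streams. |
| |
```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not pcre2test code: reads up to cap bytes from path
   into buf and returns the number of bytes read, or 0 if the file cannot be
   opened. The "b" in the mode string disables newline and end-of-file
   translation on platforms (such as Windows) that distinguish text from
   binary streams; on POSIX systems it has no effect. */
static size_t read_binary(const char *path, unsigned char *buf, size_t cap)
{
  FILE *f;
  size_t n;
  assert(path != NULL && buf != NULL);
  f = fopen(path, "rb");
  if (f == NULL) return 0;
  n = fread(buf, 1, cap, f);
  fclose(f);
  return n;
}
```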
| |
| 20. Added RunTest.bat for running the tests under Windows. |
| |
| 21. "make distclean" was not removing config.h, a file that may be created for |
| use with CMake. |
| |
| 22. A pattern such as "((?2){0,1999}())?", which has a group containing a |
| forward reference repeated a large (but limited) number of times within a |
| repeated outer group that has a zero minimum quantifier, caused incorrect code |
| to be compiled, leading to the error "internal error: previously-checked |
| referenced subpattern not found" when an incorrect memory address was read. |
| This bug was reported as "heap overflow", discovered by Kai Lu of Fortinet's |
| FortiGuard Labs. (Added 24-March-2015: CVE-2015-2325 was given to this.) |
| |
| 23. A pattern such as "((?+1)(\1))/" containing a forward reference subroutine |
| call within a group that also contained a recursive back reference caused |
| incorrect code to be compiled. This bug was reported as "heap overflow", |
| discovered by Kai Lu of Fortinet's FortiGuard Labs. (Added 24-March-2015: |
| CVE-2015-2326 was given to this.) |
| |
| 24. Computing the size of the JIT read-only data in advance has been a source |
| of various issues, and new ones still appear, unfortunately. To fix |
| existing and future issues, size computation is eliminated from the code, |
| and replaced by on-demand memory allocation. |
| |
| 25. A pattern such as /(?i)[A-`]/, where characters in the other case are |
| adjacent to the end of the range, and the range contained characters with more |
| than one other case, caused incorrect behaviour when compiled in UTF mode. In |
| that example, the range a-j was left out of the class. |
| |
| |
| Version 10.00 05-January-2015 |
| ----------------------------- |
| |
| Version 10.00 is the first release of PCRE2, a revised API for the PCRE |
| library. Changes prior to 10.00 are logged in the ChangeLog file for the old |
| API, up to item 20 for release 8.36. |
| |
| The code of the library was heavily revised as part of the new API |
| implementation. Details of each and every modification were not individually |
| logged. In addition to the API changes, the following changes were made. They |
| are either new functionality, or bug fixes and other noticeable changes of |
| behaviour that were implemented after the code had been forked. |
| |
| 1. Including Unicode support at build time is now enabled by default, but it |
| can optionally be disabled. It is not enabled by default at run time (no |
| change). |
| |
| 2. The test program, now called pcre2test, was re-specified and almost |
| completely re-written. Its input is not compatible with input for pcretest. |
| |
| 3. Patterns may start with (*NOTEMPTY) or (*NOTEMPTY_ATSTART) to set the |
| PCRE2_NOTEMPTY or PCRE2_NOTEMPTY_ATSTART options for every subject line that is |
| matched by that pattern. |
| |
| 4. For the benefit of those who use PCRE2 via some other application, that is, |
| not writing the function calls themselves, it is possible to check the PCRE2 |
| version by matching a pattern such as /(?(VERSION>=10)yes|no)/ against a |
| string such as "yesno". |
| |
| 5. There are case-equivalent Unicode characters whose encodings use different |
| numbers of code units in UTF-8. U+023A and U+2C65 are one example. (It is |
| theoretically possible for this to happen in UTF-16 too.) If a backreference to |
| a group containing one of these characters was greedily repeated, and during |
| the match a backtrack occurred, the subject might be backtracked by the wrong |
| number of code units. For example, if /^(\x{23a})\1*(.)/ is matched caselessly |
| (and in UTF-8 mode) against "\x{23a}\x{2c65}\x{2c65}\x{2c65}", group 2 should |
| capture the final character, which is the three bytes E2, B1, and A5 in UTF-8. |
| Incorrect backtracking meant that group 2 captured only the last two bytes. |
| This bug has been fixed; the new code is slower, but it is used only when the |
| strings matched by the repetition are not all the same length. |
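| |
| The differing lengths mentioned in 5 can be checked with a small UTF-8 |
| encoder. The function below is an illustrative sketch, not PCRE2 code, and |
| its name is an assumption made for the example. |
| |
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative encoder, not PCRE2 code: writes the UTF-8 encoding of code
   point cp into buf (which must hold at least 4 bytes) and returns the
   number of bytes written. Surrogates are not rejected here; the caller is
   assumed to pass a valid Unicode scalar value. */
static size_t utf8_encode(uint32_t cp, unsigned char *buf)
{
  assert(cp <= 0x10FFFF);
  if (cp < 0x80) {
    buf[0] = (unsigned char)cp;
    return 1;
  }
  if (cp < 0x800) {
    buf[0] = (unsigned char)(0xC0 | (cp >> 6));
    buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
    return 2;
  }
  if (cp < 0x10000) {
    buf[0] = (unsigned char)(0xE0 | (cp >> 12));
    buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
  }
  buf[0] = (unsigned char)(0xF0 | (cp >> 18));
  buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
  buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
  buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
  return 4;
}
```
| |
| U+023A encodes as the two bytes C8, BA, whereas its case partner U+2C65 |
| encodes as the three bytes E2, B1, A5, so a caseless back reference can match |
| strings of different lengths in code units. |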
| |
| 6. A pattern such as /()a/ was not setting the "first character must be 'a'" |
| information. This applied to any pattern with a group that matched no |
| characters, for example: /(?:(?=.)|(?<!x))a/. |
| |
| 7. When an (*ACCEPT) is triggered inside capturing parentheses, it arranges for |
| those parentheses to be closed with whatever has been captured so far. However, |
| it was failing to mark any other groups between the highest capture so far and |
| the current group as "unset". Thus, the ovector for those groups contained |
| whatever was previously there. An example is the pattern /(x)|((*ACCEPT))/ when |
| matched against "abcd". |
| |
| 8. The pcre2_substitute() function has been implemented. |
| |
| 9. If an assertion used as a condition was quantified with a minimum of zero |
| (an odd thing to do, but it happened), SIGSEGV or other misbehaviour could |
| occur. |
| |
| 10. The PCRE2_NO_DOTSTAR_ANCHOR option has been implemented. |
| |
| **** |