PCREPARTIAL(3) | Library Functions Manual | PCREPARTIAL(3) |
NAME
PCRE - Perl-compatible regular expressions
PARTIAL MATCHING IN PCRE
In normal use of PCRE, if the subject string that is passed to a matching function matches as far as it goes, but is too short to match the entire pattern, PCRE_ERROR_NOMATCH is returned. There are circumstances where it might be helpful to distinguish this case from other cases in which there is no match. One example is input in a fixed format, such as a date in the form ddmmmyy, defined by this pattern:
^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$
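To use partial matching with just-in-time optimized code, the pattern must be studied with one or both of these options:
PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE
PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE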
PARTIAL MATCHING USING pcre_exec() OR pcre[16|32]_exec()
A partial match occurs during a call to pcre_exec() or pcre[16|32]_exec() when the end of the subject string is reached successfully, but matching cannot continue because more characters are needed. However, at least one character in the subject must have been inspected. This character need not form part of the final matched string; lookbehind assertions and the \K escape sequence provide ways of inspecting characters before the start of a matched substring. The requirement for inspecting at least one character exists because an empty string can always be matched; without such a restriction there would always be a partial match of an empty string at the end of the subject.
/(?<=abc)123/
PCRE_PARTIAL_SOFT WITH pcre_exec() OR pcre[16|32]_exec()
If PCRE_PARTIAL_SOFT is set when pcre_exec() or pcre[16|32]_exec() identifies a partial match, the partial match is remembered, but matching continues as normal, and other alternatives in the pattern are tried. If no complete match can be found, PCRE_ERROR_PARTIAL is returned instead of PCRE_ERROR_NOMATCH.
/123\w+X|dogY/
PCRE_PARTIAL_HARD WITH pcre_exec() OR pcre[16|32]_exec()
If PCRE_PARTIAL_HARD is set for pcre_exec() or pcre[16|32]_exec(), PCRE_ERROR_PARTIAL is returned as soon as a partial match is found, without continuing to search for possible complete matches. This option is "hard" because it prefers an earlier partial match over a later complete match. For this reason, the assumption is made that the end of the supplied subject string may not be the true end of the available data, and so, if \z, \Z, \b, \B, or $ are encountered at the end of the subject, the result is PCRE_ERROR_PARTIAL, provided that at least one character in the subject has been inspected.
Comparing hard and soft partial matching
The difference between the two partial matching options can be illustrated by a pattern such as:
/dog(sbody)?/
/dog(sbody)??/
/dog(sbody)?/ is the same as /dogsbody|dog/
/dog(sbody)??/ is the same as /dog|dogsbody/
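The difference between the two options can be modelled without PCRE itself. The following is a minimal sketch, not PCRE code: it assumes the pattern has already been expanded into literal alternatives that are tried in order, as in the equivalences above, and it anchors the attempt at the start of the subject. The names match_alternatives and enum outcome are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Outcome of one matching attempt (illustrative names, not PCRE's). */
enum outcome { NO_MATCH, PARTIAL_MATCH, COMPLETE_MATCH };

/*
 * Toy model: try literal alternatives in order against the start of the
 * subject. With hard != 0 the first partial match is returned at once;
 * otherwise it is remembered and the remaining alternatives are tried.
 */
enum outcome match_alternatives(const char *const *alts, size_t n_alts,
                                const char *subject, int hard)
{
    size_t subject_len = strlen(subject);
    int partial_seen = 0;

    for (size_t a = 0; a < n_alts; a++) {
        size_t alt_len = strlen(alts[a]);
        size_t i = 0;

        /* Advance while both strings have characters and they agree. */
        while (i < alt_len && i < subject_len && alts[a][i] == subject[i])
            i++;

        if (i == alt_len)
            return COMPLETE_MATCH;      /* a complete match ends the search */

        /* Subject ran out first, after at least one character matched. */
        if (i == subject_len && i > 0) {
            if (hard)
                return PARTIAL_MATCH;   /* hard: stop at the first partial */
            partial_seen = 1;           /* soft: remember it and carry on */
        }
    }
    return partial_seen ? PARTIAL_MATCH : NO_MATCH;
}
```

With the alternatives { "dogsbody", "dog" } and the subject "dog", the sketch returns COMPLETE_MATCH in soft mode and PARTIAL_MATCH in hard mode. With { "dog", "dogsbody" } both modes return COMPLETE_MATCH, because the complete match is found first.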
PARTIAL MATCHING USING pcre_dfa_exec() OR pcre[16|32]_dfa_exec()
The DFA functions move along the subject string character by character, without backtracking, searching for all possible matches simultaneously. If the end of the subject is reached before the end of the pattern, there is the possibility of a partial match, again provided that at least one character has been inspected.
/dog(sbody)??/
PARTIAL MATCHING AND WORD BOUNDARIES
If a pattern ends with one of the sequences \b or \B, which test for word boundaries, partial matching with PCRE_PARTIAL_SOFT can give counter-intuitive results. Consider this pattern:
/\bcat\b/
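The effect can be sketched with a hand-written search for this pattern. This is an illustrative model rather than PCRE code: find_cat and its hard flag are hypothetical names, and the model only considers the subject ending inside "cat" or at the trailing boundary test, not every way in which PCRE can reach the end of the subject.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum outcome { NO_MATCH, PARTIAL_MATCH, COMPLETE_MATCH };

int is_word_char(char c)
{
    return c == '_' || isalnum((unsigned char)c);
}

/*
 * Toy search for "cat" with a word boundary on each side.
 * hard != 0 models the hard option: the end of the subject is not
 * trusted to be the end of the data, so a boundary test there is
 * reported as a partial match.
 */
enum outcome find_cat(const char *subject, int hard)
{
    static const char word[] = "cat";
    const size_t word_len = sizeof word - 1;
    size_t len = strlen(subject);
    int partial_seen = 0;

    for (size_t start = 0; start < len; start++) {
        /* Leading \b: the previous character must not be a word character. */
        if (start > 0 && is_word_char(subject[start - 1]))
            continue;

        size_t i = 0;
        while (i < word_len && start + i < len && subject[start + i] == word[i])
            i++;

        if (i < word_len) {
            /* The subject ran out in the middle of "cat". */
            if (start + i == len && i > 0) {
                if (hard)
                    return PARTIAL_MATCH;
                partial_seen = 1;
            }
            continue;
        }

        /* Trailing \b tested at the very end of the subject. */
        if (start + i == len)
            return hard ? PARTIAL_MATCH : COMPLETE_MATCH;

        /* Trailing \b confirmed by a following non-word character. */
        if (!is_word_char(subject[start + i]))
            return COMPLETE_MATCH;
    }
    return partial_seen ? PARTIAL_MATCH : NO_MATCH;
}
```

For the subject "the cat", the sketch reports a complete match in soft mode, because the boundary test succeeds at the end of the subject, and a partial match in hard mode. For "the ca" both modes report a partial match.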
FORMERLY RESTRICTED PATTERNS
For releases of PCRE prior to 8.00, because of the way certain internal optimizations were implemented in the pcre_exec() function, the PCRE_PARTIAL option (predecessor of PCRE_PARTIAL_SOFT) could not be used with all patterns. From release 8.00 onwards, the restrictions no longer apply, and partial matching can be requested for any pattern.
EXAMPLE OF PARTIAL MATCHING USING PCRETEST
If the escape sequence \P is present in a pcretest data line, the PCRE_PARTIAL_SOFT option is used for the match. Here is a run of pcretest that uses the date example quoted above:
re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
data> 25jun04\P
0: 25jun04
1: jun
data> 25dec3\P
Partial match: 25dec3
data> 3ju\P
Partial match: 3ju
data> 3juj\P
No match
data> j\P
No match
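The outcomes in this run can be reproduced by a small hand-written checker. The sketch below does not call PCRE; it is a simplified model of the date pattern with soft semantics, in which the optional day digit is tried both ways and a complete match is preferred over a partial one. All function names are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Ordered so that a larger value is a better outcome. */
enum outcome { NO_MATCH, PARTIAL_MATCH, COMPLETE_MATCH };

/* Consume `count` digits starting at *pos. */
enum outcome take_digits(const char *s, size_t len, size_t *pos, int count)
{
    for (int i = 0; i < count; i++) {
        if (*pos == len)                 /* subject ended: partial, unless empty */
            return len > 0 ? PARTIAL_MATCH : NO_MATCH;
        if (!isdigit((unsigned char)s[*pos]))
            return NO_MATCH;
        (*pos)++;
    }
    return COMPLETE_MATCH;
}

/* Consume a three-letter month abbreviation starting at *pos. */
enum outcome take_month(const char *s, size_t len, size_t *pos)
{
    static const char *const months[12] = {
        "jan", "feb", "mar", "apr", "may", "jun",
        "jul", "aug", "sep", "oct", "nov", "dec"
    };
    size_t avail = len - *pos;
    size_t n = avail < 3 ? avail : 3;

    for (size_t m = 0; m < 12; m++) {
        if (strncmp(s + *pos, months[m], n) == 0) {
            *pos += n;
            /* Fewer than three characters left: only a prefix was seen. */
            return n == 3 ? COMPLETE_MATCH : PARTIAL_MATCH;
        }
    }
    return NO_MATCH;
}

/* Try the pattern with a fixed number of leading day digits (1 or 2). */
enum outcome try_date(const char *s, size_t len, int day_digits)
{
    size_t pos = 0;
    enum outcome r;

    if ((r = take_digits(s, len, &pos, day_digits)) != COMPLETE_MATCH) return r;
    if ((r = take_month(s, len, &pos)) != COMPLETE_MATCH) return r;
    if ((r = take_digits(s, len, &pos, 2)) != COMPLETE_MATCH) return r;
    return pos == len ? COMPLETE_MATCH : NO_MATCH;   /* the trailing $ */
}

/* Soft semantics: a complete match beats a partial one. */
enum outcome classify_date(const char *s)
{
    size_t len = strlen(s);
    enum outcome two = try_date(s, len, 2);   /* optional digit present */
    enum outcome one = try_date(s, len, 1);   /* optional digit absent */
    return two > one ? two : one;
}
```

An application validating typed input could use such a three-way result to accept the field, wait for more characters, or reject the input as soon as it can no longer match.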
MULTI-SEGMENT MATCHING WITH pcre_dfa_exec() OR pcre[16|32]_dfa_exec()
When a partial match has been found using a DFA matching function, it is possible to continue the match by providing additional subject data and calling the function again with the same compiled regular expression, this time setting the PCRE_DFA_RESTART option. You must pass the same working space as before, because this is where details of the previous partial match are stored. Here is an example using pcretest, using the \R escape sequence to set the PCRE_DFA_RESTART option (\D specifies the use of the DFA matching function):
re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
data> 23ja\P\D
Partial match: 23ja
data> n05\R\D
0: n05
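To illustrate why the same working space must be passed on restart, here is a toy matcher (not PCRE) for the same date pattern. It keeps its state in a caller-owned structure and advances every candidate match one character at a time, without backtracking. The type date_state and the functions date_start and date_feed are hypothetical, and the state layout is an assumption made for this sketch only.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

enum outcome { NO_MATCH, PARTIAL_MATCH, COMPLETE_MATCH };

static const char *const MONTHS[12] = {
    "jan", "feb", "mar", "apr", "may", "jun",
    "jul", "aug", "sep", "oct", "nov", "dec"
};

/* State carried between segments, in the role of the working space. */
struct date_state {
    uint32_t live;   /* bit t set: expansion t of the pattern is still viable */
    size_t pos;      /* characters consumed so far, across all segments */
};

/* Expansion t uses month t/2 and (t%2)+1 day digits: 24 expansions in all. */
int day_digits(int t) { return (t % 2) + 1; }
size_t expansion_len(int t) { return (size_t)day_digits(t) + 3 + 2; }

/* Does expansion t accept character c at position pos? */
int accepts(int t, size_t pos, char c)
{
    size_t d = (size_t)day_digits(t);
    if (pos < d) return isdigit((unsigned char)c) != 0;
    if (pos < d + 3) return c == MONTHS[t / 2][pos - d];
    if (pos < d + 5) return isdigit((unsigned char)c) != 0;
    return 0;   /* past the end: the pattern is anchored with $ */
}

void date_start(struct date_state *st)
{
    st->live = (UINT32_C(1) << 24) - 1;   /* all 24 expansions viable */
    st->pos = 0;
}

/* Feed one segment; every viable expansion advances in lockstep. */
enum outcome date_feed(struct date_state *st, const char *seg, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        for (int t = 0; t < 24; t++) {
            uint32_t bit = UINT32_C(1) << t;
            if ((st->live & bit) && !accepts(t, st->pos, seg[i]))
                st->live &= ~bit;
        }
        st->pos++;
    }
    if (st->live == 0 || st->pos == 0)
        return NO_MATCH;
    for (int t = 0; t < 24; t++)
        if ((st->live & (UINT32_C(1) << t)) && expansion_len(t) == st->pos)
            return COMPLETE_MATCH;
    return PARTIAL_MATCH;
}
```

Feeding "23ja" and then "n05" to the same date_state yields a partial match followed by a complete match, mirroring the run above. Resetting the state between the two calls would lose the record of the first segment, which is the reason the real functions need the original working space.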
MULTI-SEGMENT MATCHING WITH pcre_exec() OR pcre[16|32]_exec()
From release 8.00, the standard matching functions can also be used to do multi-segment matching. Unlike the DFA functions, it is not possible to restart the previous match with a new segment of data. Instead, new data must be added to the previous subject string, and the entire match re-run, starting from the point where the partial match occurred. Earlier data can be discarded.
re> /\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d/
data> The date is 23ja\P\P
Partial match: 23ja
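Preparing the subject for the re-run might look like the sketch below. It assumes the application keeps the subject in its own buffer and knows the offset from which data must be retained (for example, the start of the reported partial match). The helper carry_over is hypothetical and is not part of PCRE.

```c
#include <stddef.h>
#include <string.h>

/*
 * Drop everything before keep_from, then append the new segment.
 * buf currently holds len bytes and has room for cap bytes in total.
 * Returns the new length, or (size_t)-1 if the arguments are
 * inconsistent or the result would not fit.
 */
size_t carry_over(char *buf, size_t len, size_t cap, size_t keep_from,
                  const char *segment, size_t segment_len)
{
    if (len > cap || keep_from > len)
        return (size_t)-1;

    size_t kept = len - keep_from;
    if (segment_len > cap - kept)
        return (size_t)-1;

    memmove(buf, buf + keep_from, kept);        /* discard earlier data */
    memcpy(buf + kept, segment, segment_len);   /* add the new data */
    return kept + segment_len;
}
```

With the buffer holding "The date is 23ja", keep_from set to 12 and the new segment "n05", the buffer becomes "23jan05" and the match can be run again on it from the start.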
ISSUES WITH MULTI-SEGMENT MATCHING
Certain types of pattern may give problems with multi-segment matching, whichever matching function is used.
re> "(?<=123)abc"
data> xx123a\P\P
Partial match at offset 5: 123a
re> /c(?<=abc)x/
data> ab\P
No match
re> /dog(sbody)?/
data> dogsb\P
0: dog
data> do\P\D
Partial match: do
data> gsb\R\P\D
0: g
data> dogsbody\D
0: dogsbody
1: dog
re> /dog(sbody)?/
data> dogsb\P\P
Partial match: dogsb
data> do\P\D
Partial match: do
data> gsb\R\P\P\D
Partial match: gsb
1234|3789
1234|ABCD
re> /1234|3789/
data> ABC123\P\P
Partial match: 123
data> 1237890
0: 3789
02 July 2013 | PCRE 8.34 |