Parle pattern matching
Parle supports regex matching similar to flex. Also supported are the following POSIX character sets: [:alnum:], [:alpha:], [:blank:], [:cntrl:], [:digit:], [:graph:], [:lower:], [:print:], [:punct:], [:space:], [:upper:], [:xdigit:].
Unicode character classes are not enabled by default; pass --enable-parle-utf32 at compile time to make them available. A particular encoding can be mapped with a correctly constructed regex. For example, to match the EURO symbol encoded in UTF-8, the regular expression [\xe2][\x82][\xac] can be used. The pattern for a UTF-8 encoded string could be [ -\x7f]{+}[\x80-\xbf]{+}[\xc2-\xdf]{+}[\xe0-\xef]{+}[\xf0-\xff]+.
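The byte sequence in the EURO example can be checked independently of Parle. The sketch below uses Python's `re` module purely as a stand-in (it is not Parle, and its syntax differs in places, for instance it has no `{+}` operator); the variable names are illustrative.

```python
import re

# Encode the EURO sign (U+20AC) as UTF-8 and keep the raw bytes.
euro_bytes = "\u20ac".encode("utf-8")

# Byte-level pattern mirroring the [\xe2][\x82][\xac] example above.
euro_pattern = re.compile(rb"[\xe2][\x82][\xac]")

# Search a byte string that ends with the encoded EURO sign.
match = euro_pattern.search(b"price: 10 " + euro_bytes)
print(euro_bytes, match.span())
```

The match covers exactly the three bytes of the encoded character, which is why three single-byte classes are enough to describe it.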
Character representations
| Sequence | Description |
|---|---|
| \a | Alert (bell). |
| \b | Backspace. |
| \e | ESC character, \x1b. |
| \n | Newline. |
| \r | Carriage return. |
| \f | Form feed, \x0c. |
| \t | Horizontal tab, \x09. |
| \v | Vertical tab, \x0b. |
| \oct | Character specified by a three-digit octal code. |
| \xhex | Character specified by a hex code. |
| \cchar | Named control character. |
Character classes
| Sequence | Description |
|---|---|
| [...] | A single character listed or contained within a listed range. Ranges can be combined with the {+} and {-} operators. For example [a-z]{+}[0-9] is the same as [0-9a-z] and [a-z]{-}[aeiou] is the same as [b-df-hj-np-tv-z]. |
| [^...] | A single character not listed and not contained within a listed range. |
| . | Any character, default [^\n]. |
| \d | Digit character, [0-9]. |
| \D | Non-digit character, [^0-9]. |
| \s | White space character, [ \t\n\r\f\v]. |
| \S | Non-white space character, [^ \t\n\r\f\v]. |
| \w | Word character, [a-zA-Z0-9_]. |
| \W | Non-word character, [^a-zA-Z0-9_]. |
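The `{+}` and `{-}` operators on character classes behave like set union and set difference. A minimal sketch of that idea with plain Python sets follows; it models the semantics only and does not call Parle, and `char_range` is a hypothetical helper introduced for illustration.

```python
def char_range(first, last):
    """Return the set of characters from first to last, inclusive."""
    return {chr(code) for code in range(ord(first), ord(last) + 1)}

lower = char_range("a", "z")
digits = char_range("0", "9")

# [a-z]{+}[0-9] corresponds to a union, i.e. [0-9a-z].
union = lower | digits

# [a-z]{-}[aeiou] corresponds to a difference, i.e. [b-df-hj-np-tv-z].
consonants = lower - set("aeiou")

print("".join(sorted(consonants)))  # bcdfghjklmnpqrstvwxyz
```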
Unicode character classes
| Sequence | Description |
|---|---|
| \p{C} | Other. |
| \p{Cc} | Other, control. |
| \p{Cf} | Other, format. |
| \p{Co} | Other, private use. |
| \p{Cs} | Other, surrogate. |
| \p{L} | Letter. |
| \p{LC} | Letter, cased. |
| \p{Ll} | Letter, lowercase. |
| \p{Lm} | Letter, modifier. |
| \p{Lo} | Letter, other. |
| \p{Lt} | Letter, titlecase. |
| \p{Lu} | Letter, uppercase. |
| \p{M} | Mark. |
| \p{Mc} | Mark, spacing combining. |
| \p{Me} | Mark, enclosing. |
| \p{Mn} | Mark, nonspacing. |
| \p{N} | Number. |
| \p{Nd} | Number, decimal digit. |
| \p{Nl} | Number, letter. |
| \p{No} | Number, other. |
| \p{P} | Punctuation. |
| \p{Pc} | Punctuation, connector. |
| \p{Pd} | Punctuation, dash. |
| \p{Pe} | Punctuation, close. |
| \p{Pf} | Punctuation, final quote. |
| \p{Pi} | Punctuation, initial quote. |
| \p{Po} | Punctuation, other. |
| \p{Ps} | Punctuation, open. |
| \p{S} | Symbol. |
| \p{Sc} | Symbol, currency. |
| \p{Sk} | Symbol, modifier. |
| \p{Sm} | Symbol, math. |
| \p{So} | Symbol, other. |
| \p{Z} | Separator. |
| \p{Zl} | Separator, line. |
| \p{Zp} | Separator, paragraph. |
| \p{Zs} | Separator, space. |
These character classes are only available if the option --enable-parle-utf32 was passed at compile time.
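The two-letter codes in the table are the Unicode general categories. To see which category a given character falls into, Python's standard `unicodedata` module can be used as a quick lookup. This is only an aid for choosing the right `\p{...}` class; it does not involve Parle, and the sample characters are arbitrary.

```python
import unicodedata

# Map a few sample characters to their Unicode general category.
samples = ["A", "a", "5", " ", "_", "\u20ac"]
categories = {ch: unicodedata.category(ch) for ch in samples}

for ch, cat in categories.items():
    print(repr(ch), cat)
```

Under these categories, the EURO sign would be covered by \p{Sc} and the underscore by \p{Pc}.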
Alternation and repetition
| Sequence | Greedy | Description |
|---|---|---|
| ...\|... | - | Try sub-patterns in alternation. |
| * | yes | Match 0 or more times. |
| + | yes | Match 1 or more times. |
| ? | yes | Match 0 or 1 times. |
| {n} | no | Match exactly n times. |
| {n,} | yes | Match at least n times. |
| {n,m} | yes | Match at least n times but no more than m times. |
| *? | no | Match 0 or more times. |
| +? | no | Match 1 or more times. |
| ?? | no | Match 0 or 1 times. |
| {n,}? | no | Match at least n times. |
| {n,m}? | no | Match at least n times but no more than m times. |
| {MACRO} | - | Include the regex MACRO in the current regex. |
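The difference between the greedy and non-greedy forms in the table can be illustrated with a short sketch. It uses Python's `re` module as a stand-in, on the assumption that the general greedy/non-greedy distinction carries over; Parle's lexer may differ in details, so treat this as an illustration rather than a description of Parle's behaviour.

```python
import re

text = "<a><b>"

# Greedy: + consumes as much as possible while still allowing a match.
greedy = re.match(r"<.+>", text).group(0)

# Non-greedy: +? consumes as little as possible.
lazy = re.match(r"<.+?>", text).group(0)

print(greedy)  # <a><b>
print(lazy)    # <a>
```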
Anchors
| Sequence | Description |
|---|---|
| ^ | Start of string or after a newline. |
| $ | End of string or before a newline. |
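The anchor semantics described above (matching at line boundaries as well as at the string boundaries) correspond to what Python's `re` module does under `re.MULTILINE`. The sketch below uses that flag to imitate the behaviour; it is an analogy in Python, not Parle code.

```python
import re

text = "one\ntwo\nthree"

# ^ matches at the start of the string and after each newline.
starts = re.findall(r"^\w+", text, flags=re.MULTILINE)

# $ matches at the end of the string and before each newline.
ends = re.findall(r"\w$", text, flags=re.MULTILINE)

print(starts)  # ['one', 'two', 'three']
print(ends)    # ['e', 'o', 'e']
```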
Grouping
| Sequence | Description |
|---|---|
| (...) | Group a regular expression to override default operator precedence. |
| (?r-s:pattern) | Apply option r and omit option s while interpreting pattern. Options may be zero or more of the characters i, s, or x. i means case-insensitive. -i means case-sensitive. s alters the meaning of . to match any character whatsoever. -s alters the meaning of . to match any character except \n. x ignores comments and whitespace in patterns. Whitespace is ignored unless it is backslash-escaped, contained within double quotes, or appears inside a character range. These options can be applied globally at the rules level by passing a combination of the bit flags to the lexer. |
| (?# comment ) | Omit everything within (). The first ) character encountered ends the pattern. It is not possible for the comment to contain a ) character. The comment may span lines. |
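Python's `re` module happens to support a similar scoped-option syntax, so the effect of i, s, x and of `(?# ... )` comments can be tried out there. The sketch below is an analogy under that assumption: Parle's handling (for example of double-quoted strings under x) is not reproduced, and the sample patterns are illustrative.

```python
import re

# i: case-insensitive inside the group only.
scoped_hit = re.fullmatch(r"(?i:parle) lexer", "PaRlE lexer")
scoped_miss = re.fullmatch(r"(?i:parle) lexer", "PaRlE LEXER")

# s: lets . match a newline; without it, . stops at \n.
dot_default = re.fullmatch(r"a.b", "a\nb")
dot_all = re.fullmatch(r"a(?s:.)b", "a\nb")

# x: unescaped whitespace inside the group is ignored.
verbose = re.fullmatch(r"(?x: \d+ \. \d+ )", "3.14")

# (?# ... ): the comment is skipped entirely.
commented = re.fullmatch(r"ab(?# two letters, then a digit )\d", "ab1")

print(bool(scoped_hit), bool(scoped_miss), bool(dot_default),
      bool(dot_all), bool(verbose), bool(commented))
```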