Documentation
The claim at lines 253 to 255 in d0c6ba9:

> * Special characters lose their special meaning inside sets. For example,
> ``[(+*)]`` will match any of the literal characters ``'('``, ``'+'``,
> ``'*'``, or ``')'``.
seems wrong, at least for `\`.
Consider the following example:

    >>> import re
    >>> bool(re.search(string=b"a\\b", pattern=b"[\\\n\r]"))
    False
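The example involves two rounds of unescaping: first by the bytes literal, then by the regex parser. The minimal sketch below separates the two layers (assuming CPython 3 and the standard `re` module; the variable names are illustrative). It also contrasts the result with a raw literal, where the regex parser itself receives an escaped backslash:

```python
import re

# Layer 1, the bytes literal: b"[\\\n\r]" is five bytes: '[', '\', LF, CR, ']'.
pattern = b"[\\\n\r]"
assert pattern == bytes([0x5B, 0x5C, 0x0A, 0x0D, 0x5D])

subject = b"a\\b"  # three bytes: 'a', '\', 'b'

# Layer 2, the regex parser: it reads the lone '\' together with the LF
# after it as one escape, so the set holds only LF and CR.
print(bool(re.search(pattern, subject)))      # False

# A raw literal hands the parser '\\', '\n', '\r', so the set holds a literal
# backslash, LF and CR, and the backslash in `subject` is matched.
raw_pattern = rb"[\\\n\r]"
print(bool(re.search(raw_pattern, subject)))  # True
```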
My expectation would be that, after backslash-unescaping the `b"…"` string, `pattern` holds, inside the brackets, the sequence: a literal `\`, the line-feed character, the carriage-return character.
If it were true that "Special characters lose their special meaning inside sets.", then the resolved `\` in the unescaped pattern should match the one in my test string `b"a\\b"`. However, it does not.
I guess what Python actually "sees" inside the brackets is: a backslash-escaped line-feed character, followed by the carriage-return character, which presumably yields just the line-feed character and the carriage-return character.
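That guess can be checked directly by asking which single bytes the compiled set accepts. A small sketch, assuming the standard `re` module; `char_set` and `results` are illustrative names:

```python
import re

# Same five-byte pattern as in the example: '[', '\', LF, CR, ']'
char_set = re.compile(b"[\\\n\r]")

# Test each candidate byte on its own against the whole set.
results = {s: bool(char_set.fullmatch(s)) for s in (b"\n", b"\r", b"\\")}
for s, ok in results.items():
    print(s, ok)
# b'\n' True
# b'\r' True
# b'\\' False
```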
Now you could argue that `\` is not considered a special character in terms of the regular expression syntax... but it is, if only because of:
Lines 504 to 507 in d0c6ba9:

> The special sequences consist of ``'\'`` and a character from the list below.
> If the ordinary character is not an ASCII digit or an ASCII letter, then the
> resulting RE will match the second character. For example, ``\$`` matches the
> character ``'$'``.
and the lines that follow it.
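For contrast, outside a set the backslash clearly acts as a special character in the way the quoted passage describes. A minimal sketch with the standard `re` module (the variable names are illustrative):

```python
import re

# Unescaped, '$' is an end-of-string anchor, so it cannot consume the character '$'.
anchor = re.fullmatch(r"$", "$")
print(anchor)                        # None

# Escaped, '\$' matches the literal character '$'.
escaped = re.fullmatch(r"\$", "$")
print(escaped.group())               # $
```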
Also, even the section that explains `[…]` mentions the escaping function of `\`:
Lines 249 to 250 in d0c6ba9:

> ``[0-9A-Fa-f]`` will match any hexadecimal digit. If ``-`` is escaped (e.g.
> ``[a\-z]``) or if it's placed as the first or last character
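The difference the quoted passage refers to can be seen by comparing an escaped hyphen with an ordinary range. A short sketch using the standard `re` module; the pattern variable names are illustrative:

```python
import re

escaped_hyphen = re.compile(r"[a\-z]")  # three literals: 'a', '-', 'z'
char_range = re.compile(r"[a-z]")       # every character from 'a' to 'z'

for ch in "a-bz":
    print(ch, bool(escaped_hyphen.fullmatch(ch)), bool(char_range.fullmatch(ch)))
# a True True
# - True False
# b False True
# z True True
```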
I think the claim at lines 253 to 255 in d0c6ba9, quoted above, should be improved to document that:
`\` is exempt from this, and whether that applies:

- only to characters that are actually special within an RE bracket expression. For example, `[0\-9]` matches `0`, `-` and `9`, because the `-` would have been special in that position. But what about `[\-9]`? Here the `-` would not have been special, so does the set contain `\`, `-` and `9`, or just `-` and `9`?
- or simply to any character following the `\`: ones that are special outside an RE bracket expression, like `\$`, `\D`, `\w` or `\number`, and/or ones that are never special, like `\ü`.
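A small probe that could be used to check how the current `re` module resolves each of the cases above. `members` and `probe` are hypothetical names introduced only for this sketch. The output comments describe what current CPython 3 versions produce; that is implementation behavior, which is exactly what the documentation would need to pin down:

```python
import re

def members(pattern: str, candidates: str) -> str:
    """Hypothetical helper: the characters of `candidates` matched by `pattern`."""
    rx = re.compile(pattern)
    return "".join(ch for ch in candidates if rx.fullmatch(ch))

# Candidates: backslash, hyphen, two digits, '$', letters, '_', a non-ASCII letter
probe = "\\-09$aZ_ü"

cases = [r"[0\-9]", r"[\-9]", r"[\$]", r"[\D]", r"[\w]", r"[\ü]"]
for pattern in cases:
    print(pattern, repr(members(pattern, probe)))
# [0\-9] '-09'
# [\-9] '-9'
# [\$] '$'
# [\D] '\\-$aZ_ü'
# [\w] '09aZ_ü'
# [\ü] 'ü'

# Inside a set, '\' followed by an octal digit is parsed as an octal escape
# (here U+0001), not as a group reference.
print(repr(members(r"[\1]", "1\x01")))  # '\x01'
```

In none of these cases does the set contain the backslash itself.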
Thanks,
Chris.