-
-
Notifications
You must be signed in to change notification settings - Fork 9.7k
[Validator] Add NoBannedWords
constraint
#49868
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
src/Symfony/Component/Validator/Constraints/NoBannedWordsValidator.php
Outdated
Show resolved
Hide resolved
src/Symfony/Component/Validator/Constraints/NoBannedWordsValidator.php
Outdated
Show resolved
Hide resolved
src/Symfony/Component/Validator/Constraints/NoBannedWordsValidator.php
Outdated
Show resolved
Hide resolved
src/Symfony/Component/Validator/Constraints/NoBannedWordsValidator.php
Outdated
Show resolved
Hide resolved
src/Symfony/Component/Validator/Constraints/NoBannedWordsValidator.php
Outdated
Show resolved
Hide resolved
d7fdc10
to
41af25c
Compare
@nicolas-grekas, @stof: many thanks for the code review. All comments have been addressed 👍🏽. |
41af25c
to
b40df64
Compare
/** | ||
* @param array<string> $dictionary | ||
*/ | ||
public function __construct(array $dictionary = [], mixed $options = null, array $groups = null, mixed $payload = null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you should keep $options
as the first argument. Otherwise, this will break the XML and YAML loaders, which only pass a single argument with all options. Named arguments for all available options must come after it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK thanks.
'n' => '(n|\^/|\|V|\|\\\||/\\/|\[\\\]|\<\\\>|\{\\\}|\]\\\[|//|\^|\[\]|/V|₪)', | ||
'o' => '(o|0|\(\)|oh|\[\]|¤|°|\(\[\]\))', | ||
'p' => '(p|\|\*|\|o|\|º|\|\^\(o\)|\|\>|\|"|9|\[\]D|\|̊|\|7|\?|/\*|¶|\|D)', | ||
'q' => '(q|\(_,\)|\(\)_|0_|°\||\<\||0\.)', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All those groups should be non-capturing, to avoid reaching the limit on the number of capturing groups in PCRE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the brackets should just be removed actually
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, the behaviour is not the same with [] instead of ().
As an example, "$yMph0NY" is not detected because of "ph". Same if I change "symfony" into "symfohny". It looks like multicharacters are not understood when grouping with []
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I asked to use non-capturing groups, not character classes
throw new UnexpectedValueException($value, 'string'); | ||
} | ||
|
||
$toL33tRegex = fn (string $data): string => implode('', array_map(fn (string $char): string => strtr($char, self::LEET_MAP), str_split($data))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about all characters that are not keys in self::LEET_MAP
(which include uppercase letters btw, as you don't enforce lowercase in the dictionary) ? Having a .
in the dictionary would lead to pretty bad things (it gets even worse when you consider that anything else than a lowercase ASCII letter is injected as is in the regex, allowing to inject all regex special chars in it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For uppercase, I use the modifier i
. You are right, the dictionary shall only contain lowercase alpha otherwize it does not work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The strtr
does not care that you use the i
modifier in the final regex: it won't turn a A
in the dictionary word into l33t equivalents
You are right, the dictionary shall only contain lowercase alpha otherwize it does not work.
This must be enforced, because it opens the door to bad behavior otherwise (and it is actually a nasty requirement as it would also add that requirement for the non-l33t banned words, which don't have this restriction otherwise)
This constraint is not mature enough. There are many use cases where it doesn't work as expected. |
I think the whole part being immature is the attempt at forbidding l33t variants of the banned words. Without that, the constraint would be fine. |
Thanks @stof. I think you are right. I revised my mind and simplified the l33t part. Edit: I cannot reopen, I will recreate a new PR in the next days. |
This PR adds a new constraint
NoBannedWords
that can detect forbidden words in a text against a dictionary